Biclustering in bioinformatics using big data and High Performance Computing applications: challenges and perspectives, a review

Authors: López Fernández, Aurelio; Gómez-Vela, Francisco Antonio; Delgado Cháves, Fernando M.; Rodríguez Baena, Domingo Savio; González Dominguez, Jorge
Date issued: 2025-07-08 (made available in the repository: 2025-07-10)
Citation: J Supercomput 81, 1123 (2025)
DOI: 10.1007/s11227-025-07563-6
Handle: https://hdl.handle.net/10433/24404

Abstract: Biclustering is a powerful machine learning technique that simultaneously groups rows and columns in matrix-based datasets. In bioinformatics, where it is applied to gene expression data, its use has expanded alongside the rapid growth of high-throughput sequencing technologies, which generate massive and complex biological datasets. This review examines how biclustering methods and their validation strategies are evolving to meet the demands of High Performance Computing (HPC) and Big Data environments. We present a structured classification of existing approaches according to the computational paradigms they employ, including MPI/OpenMP, Apache Hadoop/Spark and GPU/CUDA. By synthesising these developments, we highlight current trends and outline key research challenges. The knowledge gathered in this work may help researchers adapt and scale biclustering algorithms to analyse large-scale biomedical data more efficiently. Our contribution is intended to bridge the gap between algorithmic innovation and computational scalability in bioinformatics and data-intensive applications.

Keywords: Big Data; Biological Databases; Data Analysis and Big Data; Functional clustering; Protein Databases; Bioinformatics
Type: journal article (open access)
Language: English
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
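To make concrete what "simultaneously grouping rows and columns" means, a bicluster can be viewed as a pair of index sets: a subset of rows (for example, genes) and a subset of columns (for example, experimental conditions). The sketch below scores such a candidate with the mean squared residue of Cheng and Church, a classic coherence measure from the biclustering literature. It is chosen here purely for illustration and is not presented as a method from this review. The function name `mean_squared_residue` and the toy matrix are hypothetical.

```python
from typing import Sequence


def mean_squared_residue(matrix: Sequence[Sequence[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].

    A value of 0.0 means the selected rows and columns follow a perfectly
    additive pattern (every row is a constant shift of every other row).
    """
    if not rows or not cols:
        raise ValueError("a bicluster needs at least one row and one column")

    # Extract the candidate bicluster as its own small matrix.
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])

    # Row means, column means and the overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average squared residue: how far each entry deviates from the
    # additive model row_mean + col_mean - overall_mean.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


if __name__ == "__main__":
    # Hypothetical toy expression matrix: 3 genes x 4 conditions.
    expr = [
        [1.0, 2.0, 3.0, 9.0],
        [2.0, 3.0, 4.0, 0.0],
        [5.0, 6.0, 7.0, 4.0],
    ]
    # Columns 0-2 form a shifted (additive) pattern across all genes.
    print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))
    # Adding column 3 breaks the pattern, so the score rises above zero.
    print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3]))
```

Lower scores indicate more coherent biclusters. Evaluating such scores over many candidate row/column subsets on large expression matrices is the kind of workload that motivates the MPI/OpenMP, Hadoop/Spark and GPU/CUDA approaches surveyed in the review.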