%0 Thesis %A Rodríguez Baena, Domingo Savio %T Extracción y validación de biclusters a partir de bases de datos binarios %D 2012 %U http://hdl.handle.net/10433/7298 %X Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. We propose a novel approach to extracting biclusters from binary datasets, BiBit. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression pre-processing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. Besides, in this work, we introduce a software tool, named CarGene (Characterization of Genes), that helps scientists to validate sets of genes using biological knowledge. The integration of huge databases with searching techniques in order to automatically validate results from different sources is a key factor in bioinformatics. Several tools have been developed for analysing gene-enrichment in terms. 
Most of them are Gene Ontology-based tools, i.e., they analyse gene-enrichment in GO annotations. CarGene uses metabolic pathways stored in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and provides a friendly graphical environment to analyse and compare results generated by different clustering and/or biclustering techniques. CarGene is based on the degree of coherence of genes in (bi)clusters with respect to metabolic pathways of organisms stored in KEGG, and provides an estimate of the probability of obtaining results by chance, including two statistical corrections (Bonferroni, and Westfall and Young). One of the most important features of CarGene is the possibility of simultaneously comparing and statistically analysing the information about many groups of genes in both visual and textual form. Furthermore, it includes its own web browser to explore in detail the information extracted from KEGG. %K Datasets binarios %K Algoritmos %K Biclusters %~