Clustering of SNP Data based on SCLIQUE
SNP clustering is an indispensable exploratory tool for biology researchers: it can identify co-expressed or co-regulated genes and predict the functions of unknown genes from the known genes that fall in the same cluster. The CLIQUE clustering algorithm is an effective way to solve high-dimensional clustering problems, but it is not applicable to categorical data. Single nucleotide polymorphisms (SNPs) are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s). SNP data consist of genotype values, which are categorical. In this paper, we adapt the CLIQUE algorithm to SNP clustering in three respects: redefining the grid division, redefining the common face between two units, and redefining the rules for generating high-dimensional candidate dense units. Experiments show that the proposed algorithm, SCLIQUE, not only retains the advantages of the CLIQUE algorithm but also extends it from numerical space to categorical space.
SNP clustering; high-dimensional clustering; SCLIQUE algorithm; categorical data
Min Jia, Yue Wu, Zhou Lei, Zongtian Liu
Computer Engineering and Science, Shanghai, China
International conference
Harbin
English
2359-2363
2011-12-24 (date first posted online on the Wanfang platform; not the paper's publication date)