A Residue-Based Cluster Validity Index for Gene Expression Data Biclustering
Biclustering simultaneously partitions a set of genes and the set of their conditions into biclusters using gene expression data. In theory, the automated variable weighting K-means clustering algorithm (W-K-means) is well suited to the biclustering problem. However, the number of biclusters, K, must be specified for the W-K-means algorithm, and the quality of the biclustering result depends heavily on this parameter. In this paper, we propose a novel residue-based cluster validity index to determine the value of K. A residue indicates how coherent its corresponding expression level is with the remaining expression levels within a bicluster. Coherent tendency is easier to evaluate using residues than using expression levels, so analyzing the mean squared residue (MSR) model, which takes residues into account, is helpful for the biclustering problem. The main idea of the proposed index is to translate the result of the W-K-means algorithm, including the gene-bicluster membership matrix and the condition-bicluster membership matrix, into a form that matches the MSR model. The appropriate number of biclusters generated by the W-K-means algorithm can therefore be determined on the basis of the MSR model, so that the result is meaningful and reasonable.
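The abstract does not state the residue formula. As a minimal sketch, assuming the commonly used Cheng and Church definition of the mean squared residue, the residue of an entry is its value minus its row mean, minus its column mean, plus the overall bicluster mean, and the MSR is the mean of the squared residues. The function name `mean_squared_residue` is hypothetical, and the sketch covers only the MSR of one bicluster, not the proposed validity index or the W-K-means algorithm.

```python
import numpy as np


def mean_squared_residue(bicluster):
    """Return the mean squared residue of one bicluster.

    Assumes the Cheng and Church style definition (an assumption,
    not taken from the abstract): rows are genes, columns are conditions.
    """
    a = np.asarray(bicluster, dtype=float)
    row_means = a.mean(axis=1, keepdims=True)   # mean of each gene
    col_means = a.mean(axis=0, keepdims=True)   # mean of each condition
    overall_mean = a.mean()                     # mean of the whole bicluster
    # Residue of each entry: how far it deviates from an additive pattern.
    residues = a - row_means - col_means + overall_mean
    return float(np.mean(residues ** 2))


# A perfectly coherent (additive) bicluster: rows differ by a constant shift.
coherent = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# An incoherent bicluster: rows move in opposite directions.
incoherent = [[1, 0], [0, 1]]

msr_coherent = mean_squared_residue(coherent)
msr_incoherent = mean_squared_residue(incoherent)
```

Under this definition, a lower MSR indicates a more coherent bicluster: the shifted-rows example gives an MSR of zero up to floating-point rounding, while the opposite-trend example gives a clearly positive value.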
biclustering; cluster validity index; residue; mean squared residue model; W-K-means algorithm
Chieh-Yuan Tsai Chuang-Cheng Chiu
Department of Industrial Engineering and Management, Yuan Ze University, Chung-Li, Taiwan
International conference
Beijing
English
1-4
2009-06-11 (date first made available online on the Wanfang platform; not the publication date of the paper)