Conference Topic

A Clustering System for Gene Expression Data Based upon Genetic Programming and the HS-Model

Cluster analysis is a major method for studying gene function and gene regulation, because prior knowledge about gene data is lacking. Many existing clustering methods require manual operations or predetermined parameters, which are difficult to supply for gene data. In addition, gene data have their own characteristics, such as large scale, high dimensionality, and noise. A systematic clustering algorithm is therefore needed to deal with gene data effectively. In this paper, a novel genetic programming (GP) clustering system for gene data, based on the hierarchical statistical model (HS-model), is proposed, together with an appropriate fitness function. The clustering system can largely eliminate the influence of data scale and dimension. The proposed GP clustering system is applied to cluster the whole intact yeast gene data set without dimensionality reduction. The experimental results indicate that the algorithm is highly efficient and can deal effectively with missing values in gene data sets.

genetic programming; cluster analysis; missing value; fitness function

Guiquan Liu, Xiufang Jiang, Lingyun Wen

Key Laboratory of Software in Computing and Communication, Anhui Province; School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China

International conference

The Third International Joint Conference on Computational Science and Optimization (CSO 2010)

Huangshan

English

238-241

2010-05-28 (date first posted online on the Wanfang platform; not the paper's publication date)