Principal component tests: applied to temporal gene expression data
Background: Cluster analysis is a common statistical tool for knowledge discovery. It is mainly conducted while a project is still in the exploratory phase, without any a priori hypotheses. However, testing the statistical significance of differences between clusters can help researchers assess whether the classification produced by a clustering algorithm needs to be improved, even after the number of clusters has been determined by a well-established criterion. This is important when the goal is to identify highly specific patterns through classification. Results: We proposed using a principal component (PC) test, an implementation of an exact F statistic for measures at multiple endpoints based on elliptical distribution theory, to assess the statistical significance of differences between clusters. A challenge in the implementation is choosing the number (q) of principal components to consider, which can severely influence the statistical power of the method. We optimized this choice via validation against a permutation test based on the clustering being evaluated. The method was applied to a public dataset to classify genes according to their temporal expression profiles. Conclusions: The results demonstrated that the PC test was useful for determining the optimal number of clusters.
Wensheng Zhang, Hong-bin Fang, Jiuzhou Song
Department of Animal and Avian Science, University of Maryland, College Park, MD 20742, USA; Division of Biostatistics, University of Maryland Greenebaum Cancer Center, Baltimore, MD 21201, USA
International conference
The 7th Asia-Pacific Bioinformatics Conference
Beijing
English
278-286
2009-01-01 (date first posted online on the Wanfang platform; not necessarily the paper's publication date)