Conference topic

Classification and identification of differential gene expression for microarray data: improvement of the random forest method

Classification and gene selection are important aspects of analyzing microarray gene expression data in biomedical research. The high dimensionality of such data poses a challenge for statistical methods, and random forest has been used to address it. We present a new classifier, Recursive Random Forest, which builds on random forest to select genes automatically and improve classification accuracy. Three microarray datasets (ALL-AML Leukemia, Colon Cancer, and Prostate Cancer) were analyzed using Recursive Random Forest. Although only a few genes were selected from the microarray data, they were effective for cancer prediction, and their biological functions have been confirmed. Notably, on the ALL-AML Leukemia data the method achieved perfect accuracy on the test set using only three genes (selected from over 7000). We also study the properties of random forest and Recursive Random Forest in simulation experiments. Because it performs gene selection, Recursive Random Forest provides more useful information than random forest alone for further biological experiments, clinical diagnosis, and disease therapy, and it may become an excellent tool for sample classification and gene selection in microarray data. Source code written in R for Recursive Random Forest is available from http://yxzy.hrbmu.edu.cn/gongwei/biostatistics/.

recursive random forest; random forest; microarray data; classification; gene selection

Xiaoyan Wu, Zhenyu Wu, Kang Li

Department of Biostatistics, Public Health College, Harbin Medical University, Harbin 150086, P.R. China

International conference

The 2nd International Conference on Bioinformatics and Biomedical Engineering (iCBBE 2008)

Shanghai

English

763-766

2008-05-16 (date first posted online on the Wanfang platform; not the paper's publication date)