Conference paper

Determining the repeat number of cross-validation

Cross-validation is probably the most popular approach for estimating the classification error rate on gene expression data. To reduce the variance of the estimate, the cross-validation procedure is usually repeated and the results averaged. However, the number of repetitions is generally set to an empirical value. This paper proposes two methods (FCI and TSE) for determining the repeat number of cross-validation based on an approximate confidence interval. Experimental results on real data show that an empirically chosen repeat number is usually unreliable, and that the proposed methods can determine the repeat number needed to achieve a pre-specified precision of the error rate estimate. Furthermore, both methods automatically adapt to changes in the data, the value of k in k-fold cross-validation, and the classification model.
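The abstract does not spell out how FCI and TSE work, so the following is only a minimal sketch of the general idea it describes: keep repeating cross-validation until an approximate confidence interval for the averaged error rate is narrower than a pre-specified precision. It is not the paper's FCI or TSE procedure. The function name `repeats_for_precision`, the callable `run_cv`, the normal-approximation interval, and the simulated error values are all assumptions introduced for illustration.

```python
import math
import random
import statistics


def repeats_for_precision(run_cv, half_width, z=1.96, min_repeats=5, max_repeats=1000):
    """Repeat cross-validation until the approximate CI half-width is small enough.

    run_cv: hypothetical callable returning one k-fold CV error estimate per call
            (each call is assumed to use a fresh random partition of the data).
    half_width: pre-specified precision for the averaged error rate.
    z: normal quantile (1.96 gives an approximate 95% interval).
    min_repeats and max_repeats are assumed to be at least 2, so that a
    sample standard deviation can be computed.
    """
    errors = []
    while len(errors) < max_repeats:
        errors.append(run_cv())
        n = len(errors)
        if n < min_repeats:
            continue
        # Normal-approximation half-width of the mean of n repeated estimates.
        current = z * statistics.stdev(errors) / math.sqrt(n)
        if current <= half_width:
            return n, statistics.fmean(errors), current
    # Budget exhausted: report what was achieved with max_repeats runs.
    n = len(errors)
    current = z * statistics.stdev(errors) / math.sqrt(n)
    return n, statistics.fmean(errors), current


# Stand-in for a real CV run: simulated error estimates around 0.20 (assumption).
rng = random.Random(0)


def simulated_cv():
    return min(1.0, max(0.0, rng.gauss(0.20, 0.03)))


n_repeats, mean_error, achieved = repeats_for_precision(simulated_cv, half_width=0.01)
print(n_repeats, round(mean_error, 4), round(achieved, 4))
```

In practice `run_cv` would wrap a real k-fold run of the chosen classifier on the expression data. Because the stopping rule depends on the observed spread of the repeated estimates, the resulting repeat number changes with the data set, the value of k, and the model, which is the behaviour the abstract attributes to the proposed methods.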

microarray (gene expression) data; classification; error rate; cross-validation

Kun Yang, Haipeng Wang, Guojun Dai, Sanqing Hu, Yanbin Zhang, Jing Xu

School of Computer Science and Technology, Hangzhou Dianzi University, China, 310018; School of Statistics and Mathematics, Zhejiang Gongshang University, China, 310018

International conference

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI 2011)

Shanghai

English

1718-1722

2011-10-15 (date first posted online on the Wanfang platform; does not represent the paper's publication date)