Determining the repeat number of cross-validation

Abstract:

Cross-validation is probably the most popular approach for estimating the classification error rate when classifying gene expression data. To reduce the variance of the estimate, the cross-validation procedure is repeated and the results are averaged. However, the number of repetitions is generally set to an empirical value. This paper proposes two methods (FCI and TSE) for determining the repeat number of cross-validation based on an approximate confidence interval. Experimental results on real data show that empirically chosen repeat numbers are usually unreliable, and that the proposed methods can determine a repeat number that achieves a pre-specified precision of the error rate. Furthermore, both methods adjust automatically to changes in the data, the number of folds k, and the classification model.
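The idea of choosing the repeat number from an approximate confidence interval can be illustrated with a generic sequential stopping rule. The sketch below is not the paper's FCI or TSE method, whose details are not given in this abstract. It assumes a normal-approximation interval, and the names `repeats_for_precision`, `run_cv_once`, and `make_simulated_cv` are hypothetical. The simulated error rates stand in for real k-fold cross-validation runs.

```python
import math
import random
import statistics


def repeats_for_precision(run_cv_once, half_width, z=1.96,
                          min_repeats=5, max_repeats=1000):
    """Repeat cross-validation until the approximate CI is narrow enough.

    run_cv_once: hypothetical callable returning one k-fold CV error rate
                 (each call should use a fresh random partition).
    half_width:  target half-width of the approximate confidence interval.
    z:           normal quantile (1.96 for roughly 95% coverage).
    Returns (number of repeats used, mean error rate, achieved half-width).
    """
    errors = []
    achieved = float("inf")
    while len(errors) < max_repeats:
        errors.append(run_cv_once())
        n = len(errors)
        if n < min_repeats:
            continue
        # Normal-approximation interval: mean +/- z * s / sqrt(n)
        achieved = z * statistics.stdev(errors) / math.sqrt(n)
        if achieved <= half_width:
            break
    return len(errors), statistics.mean(errors), achieved


def make_simulated_cv(true_error=0.20, spread=0.03, seed=0):
    """Stand-in for a real CV run: draws a noisy error rate (assumption)."""
    rng = random.Random(seed)

    def run_cv_once():
        # Clamp to [0, 1] so the value is a valid error rate.
        return min(1.0, max(0.0, rng.gauss(true_error, spread)))

    return run_cv_once


if __name__ == "__main__":
    n, mean_err, hw = repeats_for_precision(make_simulated_cv(), half_width=0.01)
    print(f"repeats={n}, mean error={mean_err:.3f}, half-width={hw:.4f}")
```

In practice `run_cv_once` would wrap an actual classifier and a freshly shuffled k-fold split, so the number of repeats would adapt to the data set, the value of k, and the model, which is the behavior the abstract attributes to the proposed methods.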

Keywords: microarray (gene expression) data classification error rate cross-validation

Authors: Kun Yang, Haipeng Wang, Guojun Dai, Sanqing Hu, Yanbin Zhang, Jing Xu

Affiliations: School of Computer Science and Technology, Hangzhou Dianzi University, China, 310018; School of Statistics and Mathematics, Zhejiang Gongshang University, China, 310018

Conference type: International conference

Conference name: 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI 2011)

Conference location: Shanghai

Conference language: English

Pages: 1718-1722

Online publication date: 2011-10-15 (date first made available on the Wanfang platform; not the paper's publication date)
