Determining the repeat number of cross-validation
Cross-validation is probably the most popular approach for estimating the classification error rate when classifying gene expression data. To reduce the variance of the estimate, the cross-validation procedure is usually repeated and the results are averaged. However, the number of repetitions is generally set to an empirical value. This paper proposes two methods (FCI and TSE) for determining the repeat number of cross-validation based on an approximate confidence interval. Experimental results on real data show that empirically chosen repeat numbers are usually unreliable, and that the proposed methods can determine the repeat number needed to achieve a pre-specified precision of the error rate. Furthermore, both methods adjust automatically to changes in the data, the value of k in k-fold cross-validation, and the classification model.
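The idea of choosing the repeat number from an approximate confidence interval can be illustrated with a minimal sketch. This is not the paper's FCI or TSE method, whose details are not given in this abstract; it is a generic sequential stopping rule under a normal approximation. The names `repeats_for_precision`, `run_cv_once`, and `simulated_cv_error` are hypothetical, and the simulated error rates stand in for real repeats of k-fold cross-validation.

```python
import math
import random
import statistics


def repeats_for_precision(run_cv_once, half_width, z=1.96,
                          min_repeats=5, max_repeats=1000):
    """Repeat cross-validation until the approximate CI is narrow enough.

    run_cv_once: hypothetical callable returning the error rate of one
                 full k-fold cross-validation run.
    half_width:  target half-width of the confidence interval.
    z:           normal quantile (1.96 gives roughly 95% coverage).
    Returns (number of repeats, mean error rate, achieved half-width).
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    errors = [run_cv_once() for _ in range(min_repeats)]
    while True:
        n = len(errors)
        # Normal-approximation half-width for the mean error rate.
        current = z * statistics.stdev(errors) / math.sqrt(n)
        if current <= half_width or n >= max_repeats:
            return n, statistics.mean(errors), current
        errors.append(run_cv_once())


# Assumption: simulated error rates replace a real classifier and data set.
rng = random.Random(0)


def simulated_cv_error():
    # One "repeat": an error rate near 0.2 with some run-to-run noise.
    return min(1.0, max(0.0, rng.gauss(0.2, 0.03)))


n, mean_err, achieved = repeats_for_precision(simulated_cv_error, half_width=0.01)
print(n, round(mean_err, 3), round(achieved, 4))
```

In this sketch the repeat number grows with the run-to-run variability of the error rate, so a noisier data set, a different k, or a less stable classifier would lead to more repeats for the same target precision.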
microarray (gene expression) data; classification error rate; cross-validation
Kun Yang, Haipeng Wang, Guojun Dai, Sanqing Hu, Yanbin Zhang, Jing Xu
School of Computer Science and Technology, Hangzhou Dianzi University, China, 310018; School of Statistics and Mathematics, Zhejiang Gongshang University, China, 310018
International conference
Shanghai
English
1718-1722
2011-10-15 (date first posted online on the Wanfang platform; not the paper's publication date)