Decision tree based predictive models for breast cancer survivability on imbalanced data
Predictive models for 5-year breast cancer survivability are proposed, built with decision trees on imbalanced data. After preprocessing of the SEER breast cancer datasets, the class distribution of the data is clearly imbalanced. Under-sampling is used to offset the loss in model performance caused by the imbalanced data. Model performance is evaluated by the area under the ROC curve (AUC), accuracy, specificity, and sensitivity with 10-fold stratified cross-validation. The models perform best when the class distribution is approximately equal. The bagging algorithm is used to build an ensemble decision tree model for predicting breast cancer survivability.
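The sampling and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names, the `majority_per_minority` parameter, the choice of label 1 as the positive class, and the toy labels are all assumptions made for illustration, and the decision tree and bagging steps are left out. The sketch shows only random under-sampling of the majority class, construction of stratified folds, and the sensitivity, specificity, and AUC metrics.

```python
import random
from collections import defaultdict


def undersample(labels, majority_per_minority=1.0, seed=0):
    """Keep every minority case and a random subset of the majority (binary labels)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    # Sort the two classes by size: smaller first, larger second.
    small, large = sorted(by_class, key=lambda c: len(by_class[c]))
    n_keep = min(len(by_class[large]),
                 int(round(majority_per_minority * len(by_class[small]))))
    kept = rng.sample(by_class[large], n_keep)
    return sorted(by_class[small] + kept)


def stratified_folds(labels, k=10, seed=0):
    """Deal each class round-robin into k folds so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (true positive rate, true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, synthetic labels (not SEER data): 90 cases of one class, 10 of the other.
labels = [1] * 90 + [0] * 10
idx = undersample(labels, majority_per_minority=1.0)
balanced = [labels[i] for i in idx]
folds = stratified_folds(balanced, k=10)
```

In a full pipeline, each fold would serve once as the test set while a classifier is trained on the other nine, and the metrics would be averaged over the ten runs. Varying `majority_per_minority` is one way to compare model performance across different class distributions.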
imbalanced data; decision tree; predictive breast cancer survivability; 10-fold stratified cross-validation; bagging algorithm
Liu Ya-Qin, Wang Cheng, Zhang Lu
Dept. of Biomedical Engineering, School of Basic Medicine, Shanghai JiaoTong University, Shanghai, China
International conference
Beijing
English
1-4
2009-06-11 (date first made available online on the Wanfang platform; not the paper's publication date)