Conference Special Topic

Research and Application on Ensemble Learning Methods

  As previous data show, diabetes has led to increasing mortality and considerable financial expenditure in the US. Correct diagnosis and prescription play an important role in helping diabetic patients, so it is necessary to find out how to make them reliably. We therefore chose a dataset of diabetic inpatients diagnosed at hospitals in the US and predict how different treatments and medications influence patient outcomes, using readmission as the class attribute. Because the dataset is large and imbalanced, we first remove attributes with a high missing-value rate. We then reduce the class imbalance by oversampling and undersampling, and select attributes with several methods, such as Correlation-based Feature Selection, the Chi-Squared Attribute Evaluator and the Information Gain Attribute Evaluator. Three classification methods, C4.5, RIPPER and Random Forests, are applied in Weka to predict the class. We also apply the ensemble learning methods bagging and boosting to improve stability and accuracy. The analysis shows that C4.5 and RIPPER perform better, and that bagging and boosting both increase their accuracy to differing degrees, because both algorithms are somewhat unstable. Random Forests is clearly the best performer among the classification methods we use, and boosting it produces large increases in every evaluation metric we use. The final results are much better than random guessing.
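The abstract names its preprocessing steps without giving their settings, and the paper carries them out in Weka. The sketch below illustrates three of those steps in plain Python rather than Weka:

1. Dropping attributes whose missing-value rate is high.
2. Rebalancing classes by random oversampling and undersampling.
3. Ranking attributes by information gain, IG(Class; A) = H(Class) - H(Class | A). This is the quantity Weka's Information Gain Attribute Evaluator scores.

The following choices are hypothetical and are not the author's settings:

* the 50% missing-rate threshold
* using the mean class size as the rebalancing target
* the dict-per-record format with `None` for missing values
* the toy attribute names

```python
"""Hedged sketch of three preprocessing steps named in the abstract.

Assumptions (not from the paper): records are dicts, missing values are None,
the 0.5 missing-rate threshold and the mean-class-size target are hypothetical.
"""
import math
import random
from collections import Counter


def drop_sparse_attributes(rows, max_missing_rate=0.5):
    """Keep only attributes whose fraction of None values is <= the threshold."""
    keep = [a for a in rows[0].keys()
            if sum(r[a] is None for r in rows) / len(rows) <= max_missing_rate]
    return [{a: r[a] for a in keep} for r in rows]


def rebalance(rows, label, seed=0):
    """Randomly undersample large classes and oversample small ones
    so that every class ends up with the mean class size."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    target = round(len(rows) / len(by_class))
    balanced = []
    for cls in sorted(by_class):
        group = by_class[cls]
        if len(group) >= target:
            # Undersampling: draw without replacement.
            balanced.extend(rng.sample(group, target))
        else:
            # Oversampling: keep all originals, then duplicate random members.
            balanced.extend(group)
            balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def information_gain(rows, attr, label):
    """IG = H(class) - sum_v P(attr = v) * H(class | attr = v), nominal attrs only."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    conditional = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[label] for r in rows]) - conditional


def rank_attributes(rows, label):
    """Return (information gain, attribute) pairs, highest gain first."""
    attrs = [a for a in rows[0] if a != label]
    return sorted(((information_gain(rows, a, label), a) for a in attrs), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy records; "readmitted" stands in for the class attribute.
    data = [
        {"insulin": "up", "weight": None, "change": "ch", "readmitted": "<30"},
        {"insulin": "no", "weight": None, "change": "no", "readmitted": "NO"},
        {"insulin": "no", "weight": "75", "change": "no", "readmitted": "NO"},
        {"insulin": "up", "weight": None, "change": "no", "readmitted": "<30"},
        {"insulin": "no", "weight": None, "change": "ch", "readmitted": "NO"},
        {"insulin": "no", "weight": None, "change": "no", "readmitted": "NO"},
    ]
    cleaned = drop_sparse_attributes(data)          # "weight" is 5/6 missing, so it is dropped
    balanced = rebalance(cleaned, "readmitted")     # 2 vs 4 becomes 3 vs 3
    print(Counter(r["readmitted"] for r in balanced))
    print(rank_attributes(balanced, "readmitted"))
```

On this toy data, `insulin` separates the two classes perfectly after rebalancing, so it ranks first with a gain of 1 bit. Random resampling is only the simplest way to realise "oversampling and under-sampling". The abstract does not say which variant the author used.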

Dataset; Data Preprocessing; Ensemble Learning; Data Mining; Classification Models

Yuzhong Wang

Zhonghuan Information College, Tianjin University of Technology, Tianjin, China

International Conference

2019 Chinese Intelligent Automation Conference (CIA 2019)

Zhenjiang, Jiangsu, China

English

pp. 145-155

2019-09-20 (date first posted online on the Wanfang platform; not the paper's publication date)