Conference Topic

Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data

Random forest is an excellent ensemble learning method composed of multiple decision trees, each grown on a random sample of the input data and splitting nodes on a random subset of features. Due to its good classification and generalization ability, random forest has been successful in various domains. However, when random forest learns from a high-dimensional data set with many noisy features, it generates many noisy trees. These noisy trees degrade classification accuracy and may even lead to wrong decisions on new instances. In this paper, we present a new approach that addresses this problem by weighting the trees according to their classification ability, named Trees Weighting Random Forest (TWRF). The Out-Of-Bag set, i.e. the subset of training data generated by Bagging but not used to build a given decision tree, is used to evaluate that tree. For simplicity, we choose accuracy as the index of a tree's classification ability and set it as the tree's weight. Experiments show that TWRF performs better than the original random forest and other traditional methods such as C4.5 and Naive Bayes.
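As a rough illustration of the weighting scheme described in the abstract, the sketch below trains each tree on a bootstrap sample, scores it on its own Out-Of-Bag rows, and uses that accuracy as the tree's vote weight. This is not the authors' implementation; the class name, parameters, and use of scikit-learn's DecisionTreeClassifier are assumptions made for the sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class TreesWeightingRandomForest:
    """Toy tree-weighted forest: each tree's vote is scaled by its OOB accuracy."""

    def __init__(self, n_trees=100, max_features="sqrt", random_state=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = np.random.default_rng(random_state)
        self.trees = []
        self.weights = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n = len(X)
        self.classes_ = np.unique(y)
        for _ in range(self.n_trees):
            # Bagging: bootstrap sample; the rows left out form the Out-Of-Bag set.
            idx = self.rng.integers(0, n, size=n)
            oob = np.ones(n, dtype=bool)
            oob[idx] = False

            tree = DecisionTreeClassifier(max_features=self.max_features)
            tree.fit(X[idx], y[idx])

            # Tree weight = accuracy on its own OOB rows (0 if no rows were left out).
            self.weights.append(tree.score(X[oob], y[oob]) if oob.any() else 0.0)
            self.trees.append(tree)
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Weighted vote: add each tree's weight to the column of its predicted class.
        votes = np.zeros((len(X), len(self.classes_)))
        for tree, w in zip(self.trees, self.weights):
            pred = tree.predict(X)
            for k, c in enumerate(self.classes_):
                votes[pred == c, k] += w
        return self.classes_[votes.argmax(axis=1)]
```

Replacing the weighted sum in predict with an unweighted majority vote recovers the ordinary random forest, which makes the comparison in the paper's experiments straightforward.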

Data mining; Ensemble learning; Random forest; Classification

Hong Bo Li, Wei Wang, Hong Wei Ding, Jin Dong

IBM Research - China, Beijing, P. R. China

International Conference

2010 IEEE International Conference on e-Business Engineering (ICEBE 2010)

Shanghai

English

pp. 160-163

2010-11-10 (date first posted on the Wanfang platform; not the paper's publication date)