Conference Topic

Trees Weighting Random Forest Method for Classifying High-Dimensional Noisy Data

Random forest is an excellent ensemble learning method composed of multiple decision trees, each grown on a random sample of the input data and splitting nodes on a random subset of features. Due to its good classification and generalization ability, random forest has been successful in various domains. However, when random forest learns from a high-dimensional data set with many noisy features, it generates many noisy trees. These noisy trees degrade classification accuracy and may even lead to wrong decisions on new instances. In this paper, we present a new approach that addresses this problem by weighting the trees according to their classification ability, named Trees Weighting Random Forest (TWRF). The Out-Of-Bag set, i.e. the subset of training data generated by Bagging but not used to build a given decision tree, is used to evaluate that tree. For simplicity, we choose accuracy as the index of a tree's classification ability and set it as the tree's weight. Experiments show that TWRF performs better than the original random forest and other traditional methods such as C4.5 and Naive Bayes.
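As a rough illustration of the weighting scheme described in the abstract, the sketch below trains each tree on a bootstrap sample, scores it on its own Out-Of-Bag rows, and uses that accuracy as the tree's vote weight. This is not the authors' implementation; the class name, parameters, and use of scikit-learn's DecisionTreeClassifier are assumptions made for the sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class TreesWeightingRandomForest:
    """Toy tree-weighted forest: each tree's vote is scaled by its OOB accuracy."""

    def __init__(self, n_trees=100, max_features="sqrt", random_state=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = np.random.default_rng(random_state)
        self.trees = []
        self.weights = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n = len(X)
        self.classes_ = np.unique(y)
        for _ in range(self.n_trees):
            # Bagging: bootstrap sample; the rows left out form the Out-Of-Bag set.
            idx = self.rng.integers(0, n, size=n)
            oob = np.ones(n, dtype=bool)
            oob[idx] = False

            tree = DecisionTreeClassifier(max_features=self.max_features)
            tree.fit(X[idx], y[idx])

            # Tree weight = accuracy on its own OOB rows (0 if no rows were left out).
            self.weights.append(tree.score(X[oob], y[oob]) if oob.any() else 0.0)
            self.trees.append(tree)
        return self

    def predict(self, X):
        X = np.asarray(X)
        # Weighted vote: add each tree's weight to the column of its predicted class.
        votes = np.zeros((len(X), len(self.classes_)))
        for tree, w in zip(self.trees, self.weights):
            pred = tree.predict(X)
            for k, c in enumerate(self.classes_):
                votes[pred == c, k] += w
        return self.classes_[votes.argmax(axis=1)]
```

Replacing the weighted sum in predict with an unweighted majority vote recovers the ordinary random forest, which makes the comparison in the paper's experiments straightforward.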

Data mining; Ensemble learning; Random forest; Classification

Hong Bo Li, Wei Wang, Hong Wei Ding, Jin Dong

IBM Research - China, Beijing, P. R. China

International Conference

2010 IEEE International Conference on e-Business Engineering (ICEBE 2010)

Shanghai

English

pp. 160-163

2010-11-10 (date first posted on the Wanfang platform; not the paper's publication date)