Conference Papers

A Cluster-based Regrouping Approach for Imbalanced Data Distributions

In real-world applications, class imbalance (significant differences in class prior probabilities) has been observed to cause substantial deterioration in classifier performance, particularly for patterns belonging to the less represented classes. In this paper, we propose a Cluster-based Regrouping approach (CR), which divides the whole training set into a positive group and a negative group by clustering based on the outlier factor. As a result, similar samples fall into the same group and dissimilar samples fall into different groups. A base classifier is then used to build a separate model on each of the two groups. When a new object is classified, the model used to evaluate it is chosen according to the type of the group to which the object is nearest. Experimental results demonstrate that our approach achieves promising performance in some cases by directly or indirectly reducing the skewness of the class distribution.
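The regrouping idea summarized above can be pictured with a minimal sketch. Nearly every concrete choice in it is an assumption made for illustration and is not taken from the paper: the Euclidean distance, the `radius` threshold for the one-pass clustering, the outlier factor defined as the size-weighted distance from a cluster to all other clusters, the mean outlier factor used as the cut-off between groups, and a nearest-class-mean rule standing in for the base classifier. All function names are hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_pass_cluster(points, radius):
    """Single scan: join the nearest cluster if within `radius`, else open a new one."""
    clusters = []
    for i, p in enumerate(points):
        best = min(clusters, key=lambda c: dist(p, c["centroid"]), default=None)
        if best is not None and dist(p, best["centroid"]) <= radius:
            best["members"].append(i)
            n = len(best["members"])
            # Incremental update of the running mean.
            best["centroid"] = [(m * (n - 1) + x) / n
                                for m, x in zip(best["centroid"], p)]
        else:
            clusters.append({"members": [i], "centroid": list(p)})
    return clusters


def outlier_factor(clusters, k):
    """Assumed definition: size-weighted distance from cluster k to the others."""
    return sum(len(c["members"]) * dist(clusters[k]["centroid"], c["centroid"])
               for j, c in enumerate(clusters) if j != k)


def fit_class_means(X, y):
    """Stand-in base classifier: one mean vector per class label."""
    counts, sums = Counter(y), {}
    for p, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def fit_cr(X, y, radius):
    """Cluster, split clusters into two groups by outlier factor, fit one model per group."""
    clusters = one_pass_cluster(X, radius)
    ofs = [outlier_factor(clusters, k) for k in range(len(clusters))]
    cutoff = sum(ofs) / len(ofs)  # assumed cut-off: mean outlier factor
    for c, of in zip(clusters, ofs):
        c["group"] = "positive" if of > cutoff else "negative"
    models = {}
    for group in ("positive", "negative"):
        idx = [i for c in clusters if c["group"] == group for i in c["members"]]
        models[group] = fit_class_means([X[i] for i in idx], [y[i] for i in idx])
    return clusters, models


def predict_cr(clusters, models, p):
    """Route p to the model of the group that owns its nearest cluster."""
    nearest = min(clusters, key=lambda c: dist(p, c["centroid"]))
    means = models[nearest["group"]]
    return min(means, key=lambda label: dist(p, means[label]))


# Toy data: a large region near the origin and a small, distant region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3),
     (10, 10), (10, 11), (11, 10)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]
clusters, models = fit_cr(X, y, radius=3.0)
print([c["group"] for c in clusters])
print(predict_cr(clusters, models, (10, 11)))
```

In this toy setting the small, distant cluster receives the larger outlier factor and forms the positive group, so each group's model is trained on a subset whose class distribution differs from that of the whole training set. With a different cut-off or clustering threshold the grouping would change, which is why both are exposed as explicit assumptions here.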

Imbalanced data classification; One-pass clustering; Naïve Bayes; C4.5

Wen Yu; ShengYi Jiang

School of Management, Guangdong University of Foreign Studies, Guangzhou 510006, China; School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006, China

International conference

The 4th National Conference on Trusted Computing (第四届全国可信计算学术会议)

Yichang

English

121-124

2010-10-10 (date first posted online on the Wanfang platform; not the publication date of the paper)