The Problem of Classification in Imbalanced Data Sets in Knowledge Discovery

Abstract:

Classification in imbalanced data sets has drawn increasing attention from researchers in the knowledge discovery and data mining fields. In such problems, almost all samples are labeled as one class, while far fewer samples are labeled as the other class, which is usually the more important one. Traditional classifiers, which pursue overall accuracy across the full range of samples, are ill-suited to classification in imbalanced data sets, since they tend to be biased towards the majority class and pay less attention to the rare one. In the present work, we review the most important research lines on this topic and point out several directions for further investigation.
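
The bias described in the abstract can be illustrated with a small arithmetic sketch. The class counts below (990 majority samples, 10 minority samples) and the always-predict-majority "classifier" are hypothetical and are not taken from the paper; they only show why overall accuracy can look good while the rare class is ignored entirely.

```python
# Hypothetical data set: 990 majority samples (label 0), 10 minority samples (label 1).
labels = [0] * 990 + [1] * 10

# A trivial "classifier" that always predicts the majority class.
predictions = [0] * len(labels)

# Overall accuracy: fraction of all samples predicted correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Minority recall: fraction of minority samples that were actually detected.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
minority_recall = true_positives / sum(y == 1 for y in labels)

print(f"accuracy = {accuracy:.2f}")                # accuracy = 0.99
print(f"minority recall = {minority_recall:.2f}")  # minority recall = 0.00
```

Under these assumed counts the classifier scores 99% accuracy without identifying a single minority sample, which is why accuracy alone is a poor objective on imbalanced data.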

Keywords: knowledge discovery; classification; imbalanced data sets; sampling; ensemble

Authors: Haifeng Sui, Bingru Yang, Yun Zhai, Wu Qu, Bing An

Affiliations: School of Information Engineering, University of Science and Technology Beijing, Beijing, China; School of Computer Science, Liaocheng University, Liaocheng, China

Conference type: International conference

Conference name: The 2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

Conference location: Taiyuan

Conference language: English

Pages: 658-661

Online publication date: 2010-10-22 (date the record first appeared online on the Wanfang platform; not the paper's publication date)
