A Classification for Short Text Based on Category Distinguishing Features
Short text is characterized by sparseness and weak concept description, which makes traditional classification methods unsuitable for it. Existing classification methods for short text can be divided into two categories. One expands the feature space with the help of external resources such as wikis. This type of method is time-consuming, and its results depend largely on the quality of the external resources. The other selects features and instances in an iterative process, in which feature selection is the key to classification. In this paper, we follow the latter approach and propose a short text classification method based on the category-distinguishing ability of features. First, we select the features with higher category-distinguishing ability and extract the training and test subsets with those selected features. Second, the training subset is used to pre-classify the test subset. The process is iterated until all test data are labeled. Experimental results show the effectiveness of the proposed method and its superiority over existing methods.
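The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring formula or the classifier, so the score used here (class purity of a feature times its document count in the dominant class), the weighted-vote pre-classification, the `top_k` parameter with its doubling rule, and the majority-class fallback are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def feature_scores(docs, labels):
    """Return {feature: (score, dominant_class)}.

    ASSUMPTION: the score is (docs in dominant class)^2 / (docs containing the
    feature), i.e. class purity weighted by support. The paper's own measure
    of category-distinguishing ability is not given in the abstract.
    """
    df = defaultdict(Counter)  # feature -> class -> number of docs containing it
    for tokens, label in zip(docs, labels):
        for tok in set(tokens):
            df[tok][label] += 1
    scores = {}
    for tok, per_class in df.items():
        cls, top = max(per_class.items(), key=lambda kv: (kv[1], kv[0]))
        scores[tok] = (top * top / sum(per_class.values()), cls)
    return scores


def classify_iteratively(train_docs, train_labels, test_docs, top_k=5):
    """Label test docs in rounds, feeding each round's labels back into training."""
    if not train_docs:
        raise ValueError("at least one labeled training document is required")
    docs, labels = list(train_docs), list(train_labels)
    predicted = [None] * len(test_docs)
    pending = set(range(len(test_docs)))
    top_k = max(1, top_k)
    while pending:
        scores = feature_scores(docs, labels)
        ranked = sorted(scores, key=lambda f: (-scores[f][0], f))
        selected = set(ranked[:top_k])
        # Test subset: pending docs that contain at least one selected feature.
        newly = []
        for i in sorted(pending):
            votes = Counter()
            for tok in set(test_docs[i]) & selected:
                score, cls = scores[tok]
                votes[cls] += score  # weighted vote stands in for the classifier
            if votes:
                predicted[i] = max(votes.items(), key=lambda kv: (kv[1], kv[0]))[0]
                newly.append(i)
        if not newly:
            if top_k >= len(scores):
                # No pending doc shares a feature with labeled data (assumed fallback).
                fallback = Counter(labels).most_common(1)[0][0]
                for i in pending:
                    predicted[i] = fallback
                break
            top_k *= 2  # widen the selected feature set and retry
            continue
        for i in newly:
            docs.append(test_docs[i])
            labels.append(predicted[i])
            pending.discard(i)
    return predicted


train = [["cheap", "flight", "deal"], ["flight", "ticket"],
         ["goal", "match"], ["match", "score", "team"]]
train_y = ["travel", "travel", "sport", "sport"]
test = [["flight", "hotel"], ["team", "goal"], ["hotel", "booking"]]
print(classify_iteratively(train, train_y, test, top_k=2))
# → ['travel', 'sport', 'travel']
```

In this toy run the third document shares no word with the original training data; it is labeled only because "hotel" entered the training set through the first test document in an earlier round, which is the kind of effect the iteration is meant to exploit.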
Feature selection; Sparseness; Short text classification
Xuegang Hu, Chaoqun Yang, Yuhong Zhang
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China
International conference
Sanya
English
304-310
2015-12-26 (date first posted online on the Wanfang platform; does not represent the paper's publication date)