Conference Topic

Two-stage Feature Selection Method for Text Classification

Dimension reduction is the process of reducing the number of features under consideration, and can be divided into feature selection and feature extraction. This paper proposes a two-stage feature selection method based on the Regularized Least Squares-Multi Angle Regression and Shrinkage (RLS-MARS) model. In the first stage, a new weighting method, Term Frequency Inverse Document and Category Frequency Collection normalization (TF-IDCFC), is applied to measure the features and to select the important ones, using category information as a factor. In the second stage, the RLS-MARS model is used to select the relevant information; combining Regularized Least Squares (RLS) with Least Angle Regression and Shrinkage (LARS) can be viewed as an efficient approach. Experiments on the Fudan University Chinese Text Classification Corpus and 20 Newsgroups demonstrate the effectiveness of the new feature selection method for text classification with the classical algorithms KNN and SVMLight.

Text Classification; Feature Selection; TF-IDCFC; RLS; LARS; RLS-MARS

LI Xi; DAI Hang; WANG Mingwen

School of Mathematics & Computer Science, Jiangxi Science & Technology Normal University, Nanchang; School of Computer Information Engineering, Jiangxi Normal University, Nanchang, China

International conference

The First International Conference on Multimedia Information Networking and Security (MINES 2009)

Wuhan

English

234-238

2009-11-18 (date first posted online on the Wanfang platform; not the paper's publication date)