Conference Topic

CLASS-INDEXING: THE EFFECTIVENESS OF CLASS-SPACE-DENSITY IN HIGH AND LOW-DIMENSIONAL VECTOR SPACE FOR TEXT CLASSIFICATION

  Most previous studies of term weighting schemes emphasize the document-indexing-based term weighting approach to automatic text classification (ATC). In this study, we introduce class-indexing-based term weighting approaches and judge their effects on high-dimensional and comparatively low-dimensional datasets against the TF.IDF approach. First, we implement a class-indexing-based TF.IDF.ICF term weighting approach that incorporates the inverse class frequency (ICF). This approach provides positive discrimination on rare terms but is biased against frequent terms in the text classification (TC) task. We therefore revise the ICF function and introduce a new inverse class space density frequency (ICSδF), which is multiplied by TF.IDF to produce TF.IDF.ICSδF, a weighting that provides positive discrimination on both infrequent and frequent terms. We present a detailed evaluation of each category for the Reuters-21578 and 20 Newsgroups datasets. The experimental results show that the proposed class-indexing-based TF.IDF.ICSδF term weighting approach plays a significant role in overcoming the problem of dimensionality reduction in the feature space.
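  As a rough illustration of the class-indexing idea, the following minimal Python sketch weights each term by TF × IDF × ICF. The abstract does not give the formulas, so the smoothed forms used here, 1 + log(D/d(t)) for IDF and 1 + log(C/c(t)) for ICF, are assumptions, and the function name `tf_idf_icf` is hypothetical. The revised ICSδF factor is not sketched because its definition is not stated in the abstract.

```python
import math
from collections import Counter


def tf_idf_icf(docs, labels):
    """Weight terms by TF * IDF * ICF (illustrative sketch only).

    docs   : list of token lists, one per document
    labels : class label for each document
    Returns one {term: weight} dict per document.

    Assumed forms (not taken from the abstract):
      IDF(t) = 1 + log(D / d(t)),  d(t) = number of documents containing t
      ICF(t) = 1 + log(C / c(t)),  c(t) = number of classes containing t
    """
    n_docs = len(docs)
    n_classes = len(set(labels))

    doc_freq = Counter()      # term -> number of documents containing it
    term_classes = {}         # term -> set of classes in which it occurs
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            term_classes.setdefault(term, set()).add(label)

    weighted = []
    for tokens in docs:
        term_freq = Counter(tokens)   # raw term frequency in this document
        weighted.append({
            term: count
            * (1 + math.log(n_docs / doc_freq[term]))
            * (1 + math.log(n_classes / len(term_classes[term])))
            for term, count in term_freq.items()
        })
    return weighted
```

  Under these assumed forms, a term that occurs in every class receives an ICF factor of 1, while a term confined to a single class is boosted. This is the bias toward rare terms that the abstract describes as the motivation for revising ICF.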

Text classification; indexing; term weighting; machine learning; centroid classifier

Mohammad Golam Sohrab, Fuji Ren

Faculty of Engineering, The University of Tokushima, Japan

International conference

2012 2nd IEEE International Conference on Cloud Computing and Intelligence Systems (IEEE CCIS2012)

Hangzhou

English

2034-2042

2012-10-30 (date first posted online on the Wanfang platform; not the paper's publication date)