CLASS-INDEXING: THE EFFECTIVENESS OF CLASS-SPACE-DENSITY IN HIGH AND LOW-DIMENSIONAL VECTOR SPACE FOR TEXT CLASSIFICATION
Most previous studies of term weighting schemes for automatic text classification (ATC) emphasize the document-indexing-based term weighting approach. In this study, we introduce class-indexing-based term weighting approaches and assess their effects on high-dimensional and comparatively low-dimensional datasets relative to the TF.IDF approach. First, we implement a class-indexing-based TF.IDF.ICF term weighting approach that incorporates the inverse class frequency (ICF); it provides positive discrimination on rare terms but is biased against frequent terms in the text classification (TC) task. We therefore revise the ICF function and implement a new inverse class space density frequency (ICSδF), which is multiplied by TF.IDF to produce TF.IDF.ICSδF, a weighting that provides positive discrimination on both infrequent and frequent terms. We present a detailed evaluation of each category for the Reuters-21578 and 20 Newsgroups datasets. The experimental results show that the proposed class-indexing-based TF.IDF.ICSδF term weighting approach plays a significant role in overcoming the problem of dimensionality reduction in the feature space.
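The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption rather than the authors' definition: IDF and ICF are taken as log(1 + N/df) and log(1 + C/cf), and the class space density of a term is read as the sum, over classes, of the fraction of documents in each class that contain the term. The function name `class_indexing_weights` and its arguments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_indexing_weights(docs, labels):
    """Sketch of class-indexing-based term weights (assumed formulas).

    docs:   list of token lists, one per document
    labels: class label for each document
    Returns two lists of {term: weight} dicts, one dict per document:
    TF.IDF.ICF weights and TF.IDF.ICSdF weights.
    """
    n_docs = len(docs)
    n_classes = len(set(labels))
    class_size = Counter(labels)  # number of documents per class

    df = Counter()                      # documents containing each term
    df_in_class = defaultdict(Counter)  # term -> class -> documents containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_in_class[term][label] += 1

    idf, icf, icsdf = {}, {}, {}
    for term in df:
        idf[term] = math.log(1 + n_docs / df[term])
        # ICF only counts how many classes contain the term (assumed form).
        icf[term] = math.log(1 + n_classes / len(df_in_class[term]))
        # Assumed class space density: sum of per-class document fractions.
        density = sum(
            count / class_size[c] for c, count in df_in_class[term].items()
        )
        icsdf[term] = math.log(1 + n_classes / density)

    tf_idf_icf, tf_idf_icsdf = [], []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency in this document
        tf_idf_icf.append({t: tf[t] * idf[t] * icf[t] for t in tf})
        tf_idf_icsdf.append({t: tf[t] * idf[t] * icsdf[t] for t in tf})
    return tf_idf_icf, tf_idf_icsdf
```

Under these assumptions, a term that occurs in every class receives the same ICF regardless of how widely it is spread within each class, whereas the density-based factor still distinguishes a term found in a few documents of each class from one found in nearly all of them.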
Text classification; indexing; term weighting; machine learning; centroid classifier
Mohammad Golam Sohrab, Fuji Ren
Faculty of Engineering, The University of Tokushima, Japan
International conference
Hangzhou
English
2034-2042
2012-10-30 (date first posted online on the Wanfang platform; does not represent the paper's publication date)