Conference Papers

Cluster Based Symbolic Representation and Feature Selection for Text Classification

In this paper, we propose a new method of representing documents based on clustering of term frequency vectors. For each class of documents, we create multiple clusters to preserve the intraclass variations. The term frequency vectors of each cluster are used to form a symbolic representation by means of interval-valued features. We then propose a novel symbolic method for feature selection, and present the corresponding symbolic text classification. To corroborate the efficacy of the proposed model, we conducted experiments on various datasets. The experimental results show that the proposed method gives better results than state-of-the-art techniques. In addition, because the method is based on a simple matching scheme, it requires negligible time.
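The representation and matching idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the intervals are formed, so the sketch assumes each interval is [mean - std, mean + std] of a term's frequency within a cluster, assumes the cluster memberships are already known (the keywords suggest Fuzzy C Means is used for that step, which is not reproduced here), and omits the symbolic feature selection step. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def interval_representation(vectors):
    """Summarise one cluster of term-frequency vectors as per-term intervals.

    Assumption (not stated in the abstract): each interval is
    [mean - std, mean + std] of that term's frequency in the cluster.
    """
    columns = list(zip(*vectors))  # one tuple of frequencies per term
    return [(mean(c) - pstdev(c), mean(c) + pstdev(c)) for c in columns]


def match_score(doc, intervals):
    """Count how many term frequencies of `doc` fall inside the intervals."""
    return sum(1 for x, (lo, hi) in zip(doc, intervals) if lo <= x <= hi)


def classify(doc, class_clusters):
    """Return the label of the cluster representation that matches `doc` best."""
    best_label, best_score = None, -1
    for label, clusters in class_clusters.items():
        for intervals in clusters:
            score = match_score(doc, intervals)
            if score > best_score:
                best_label, best_score = label, score
    return best_label


# Hypothetical toy data: three terms, two classes.
# "sports" keeps two clusters to preserve its intraclass variation.
sports_a = [[5, 0, 1], [6, 1, 0], [4, 0, 1]]
sports_b = [[1, 0, 6], [0, 1, 7]]
politics = [[0, 5, 1], [1, 6, 0], [0, 4, 1]]

model = {
    "sports": [interval_representation(sports_a),
               interval_representation(sports_b)],
    "politics": [interval_representation(politics)],
}

predicted = classify([5, 0, 1], model)
print(predicted)
```

Because classification here reduces to interval containment checks per term, the cost of labelling one document is linear in the number of terms times the number of stored cluster representations, which is consistent with the abstract's remark about a simple matching scheme.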

Text Document; Term Frequency Vector; Fuzzy C Means; Symbolic Representation; Interval Valued Features; Symbolic Feature Selection; Text Classification

B.S. Harish, D.S. Guru, S. Manjunath, R. Dinesh

Department of Studies in Computer Science, University of Mysore, Mysore 570 006, India; Honeywell Technologies Ltd, Bangalore, India

International conference

6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Chongqing

English

158-166

2010-11-19 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)