Maximum Entropy Modeling with Feature Selection for Text Categorization
Maximum entropy provides a reasonable way of estimating probability distributions and has been widely used for a number of language processing tasks. In this paper, we explore the use of different feature selection methods for text categorization using maximum entropy modeling. We also propose a new feature selection method based on the difference between the relative document frequencies of a feature in the relevant and irrelevant classes. Our experiments on the Reuters RCV1 data set show that our feature selection method performs better than the other feature selection methods and that maximum entropy modeling is a competitive method for text categorization.
Text Categorization; Feature Selection; Maximum Entropy Modeling
Jihong Cai, Fei Song
Department of Computing and Information Science, University of Guelph, Guelph, Ontario, Canada N1G 2W1
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
549-554
2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)
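
The abstract describes the proposed feature selection method only as being based on the difference between a feature's relative document frequencies in the relevant and irrelevant classes. The following is a minimal sketch of one way such a score could be computed, not the authors' implementation. The function names `rdf_difference_scores` and `select_top_k`, the binary relevant/irrelevant labels, the toy documents, and the choice to rank features by the absolute value of the difference are all assumptions made for illustration.

```python
from collections import Counter


def rdf_difference_scores(docs, labels):
    """Score each term by the difference of its relative document frequencies.

    docs   -- list of token lists, one per document
    labels -- list of bools, True if the document is relevant to the class
    Returns a dict: term -> df_rel(term)/N_rel - df_irr(term)/N_irr.
    (Hypothetical helper; the exact scoring formula is an assumption.)
    """
    n_rel = sum(1 for y in labels if y)
    n_irr = len(labels) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("need at least one relevant and one irrelevant document")

    df_rel, df_irr = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term at most once per document (document frequency).
        (df_rel if y else df_irr).update(set(tokens))

    vocab = set(df_rel) | set(df_irr)
    return {t: df_rel[t] / n_rel - df_irr[t] / n_irr for t in vocab}


def select_top_k(scores, k):
    """Keep the k terms with the largest absolute score (an assumed ranking rule).

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(scores, key=lambda t: (-abs(scores[t]), t))[:k]


# Toy data, invented for this example only.
docs = [
    ["oil", "price"],
    ["oil", "export"],
    ["match", "goal"],
    ["goal", "price"],
]
labels = [True, True, False, False]

scores = rdf_difference_scores(docs, labels)
selected = select_top_k(scores, 2)
```

In this toy setting, a term that appears equally often in both classes (such as `price`) receives a score of zero and is dropped first, while terms concentrated in one class are kept. The selected terms would then serve as the features of a classifier such as a maximum entropy model.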