Conference Paper

Building Naive Bayes Document Classifier Using Word Clusters Based on Bootstrap Averaging

To address the low classification accuracy caused by poor distribution estimates when a naive Bayes document classifier is trained on word clusters, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then construct a sample set of the same size as the word list through bootstrap sampling, and use the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
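The bootstrap-averaging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say exactly what is resampled or how the mutual-information word list is used, so this sketch assumes that documents are already mapped to word-cluster ids, that training documents are resampled with replacement, and that Laplace-smoothed multinomial naive Bayes estimates are averaged over the rounds. The names `estimate_params`, `bootstrap_average`, `classify`, `n_rounds`, and `alpha` are hypothetical.

```python
import math
import random


def estimate_params(docs, labels, classes, n_clusters, alpha=1.0):
    """Laplace-smoothed multinomial naive Bayes estimates over cluster ids."""
    prior, cond = {}, {}
    for c in classes:
        counts = [alpha] * n_clusters  # smoothed cluster counts for class c
        n_docs = 0
        for doc, y in zip(docs, labels):
            if y == c:
                n_docs += 1
                for k in doc:
                    counts[k] += 1
        total = sum(counts)
        prior[c] = (n_docs + alpha) / (len(docs) + alpha * len(classes))
        cond[c] = [v / total for v in counts]
    return prior, cond


def bootstrap_average(docs, labels, n_clusters, n_rounds=50, seed=0):
    """Average the estimates over n_rounds bootstrap resamples.

    Assumption: whole training documents are resampled with replacement.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n = len(docs)
    avg_prior = {c: 0.0 for c in classes}
    avg_cond = {c: [0.0] * n_clusters for c in classes}
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        prior, cond = estimate_params(
            [docs[i] for i in idx], [labels[i] for i in idx], classes, n_clusters
        )
        for c in classes:
            avg_prior[c] += prior[c] / n_rounds
            for k in range(n_clusters):
                avg_cond[c][k] += cond[c][k] / n_rounds
    return avg_prior, avg_cond


def classify(doc, prior, cond):
    """Return the class with the highest log posterior score for doc."""
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond[c][k]) for k in doc)
    return max(prior, key=score)


# Toy data: each document is a list of word-cluster ids (hypothetical).
docs = [[0, 0, 1], [0, 1, 0], [2, 2, 1], [2, 1, 2]]
labels = ["a", "a", "b", "b"]
prior, cond = bootstrap_average(docs, labels, n_clusters=3)
predicted = classify([0, 0], prior, cond)
```

Averaging over resamples is meant to reduce the variance of estimates made from sparse counts, which is the motivation the abstract gives. Smoothing keeps every averaged probability strictly positive, so the log scores in `classify` are always defined.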

WANG Yuanzhe ZHANG Qiang BAI Liyuan

Institute of Information Engineering, Wuhan University of Technology, Wuhan, China, 430070; Henan University of Technology, Zhengzhou, China, 450052

International conference

2009 IEEE International Symposium on IT in Medicine & Education

Jinan

English

202-207

2009-08-14 (date first made available online on the Wanfang platform; does not represent the paper's publication date)