Building Naive Bayes Document Classifier Using Word Clusters Based on Bootstrap Averaging

Abstract:

Training a naive Bayes document classifier on word clusters can give low classification accuracy because the parameter distributions are poorly estimated. To address this problem, we build an ordered word list based on the mutual information between words and their semantic cluster labels. We then use bootstrap sampling to construct a sample set of the same size as the word list, and take the average of the corresponding parameters estimated from the sample set as the final parameters for classifying unknown documents. Experimental results on benchmark document data sets show that the proposed strategy achieves higher classification accuracy than naive Bayes document classifiers trained on word clusters or on words.
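
A minimal sketch of the bootstrap-averaging idea in Python follows. It rests on assumptions the abstract does not state. Each document is taken to be already mapped to a sequence of word-cluster ids, and the mutual-information word list and the clustering step are not reproduced. The bootstrap is assumed to resample training documents. The averaged parameters are assumed to be Laplace-smoothed multinomial naive Bayes estimates. The names `estimate_params`, `bootstrap_average`, and `classify` and the parameters `n_boot`, `alpha`, and `seed` are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
import random
from collections import Counter


def estimate_params(docs, labels, n_clusters, classes, alpha=1.0):
    """Laplace-smoothed multinomial naive Bayes estimates (hypothetical helper).

    docs: list of documents, each a list of cluster ids in range(n_clusters)
    labels: class label of each document
    Returns (priors, cond): priors[c] = P(c), cond[c][k] = P(cluster k | c).
    """
    class_counts = Counter(labels)
    n_docs = len(labels)
    priors = {c: (class_counts[c] + alpha) / (n_docs + alpha * len(classes))
              for c in classes}
    cond = {}
    for c in classes:
        counts = [0] * n_clusters
        for doc, y in zip(docs, labels):
            if y == c:
                for k in doc:
                    counts[k] += 1
        total = sum(counts)
        cond[c] = [(counts[k] + alpha) / (total + alpha * n_clusters)
                   for k in range(n_clusters)]
    return priors, cond


def bootstrap_average(docs, labels, n_clusters, n_boot=20, seed=0):
    """Average the NB parameters over n_boot bootstrap resamples of the documents.

    Assumption: each resample draws len(docs) documents with replacement.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    n = len(docs)
    avg_priors = {c: 0.0 for c in classes}
    avg_cond = {c: [0.0] * n_clusters for c in classes}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        priors, cond = estimate_params([docs[i] for i in idx],
                                       [labels[i] for i in idx],
                                       n_clusters, classes)
        for c in classes:
            avg_priors[c] += priors[c] / n_boot
            for k in range(n_clusters):
                avg_cond[c][k] += cond[c][k] / n_boot
    return avg_priors, avg_cond


def classify(doc, priors, cond):
    """Pick the class with the largest log posterior under the averaged parameters."""
    def score(c):
        return math.log(priors[c]) + sum(math.log(cond[c][k]) for k in doc)
    return max(priors, key=score)


if __name__ == "__main__":
    # Toy data: each document is already a sequence of word-cluster ids.
    docs = [[0, 0, 1], [0, 1, 1, 0], [2, 2, 3], [3, 2, 2, 3]]
    labels = ["med", "med", "edu", "edu"]
    priors, cond = bootstrap_average(docs, labels, n_clusters=4, n_boot=50, seed=1)
    print(classify([0, 1, 0], priors, cond))
```

Averaging is done on probabilities. Each bootstrap estimate is a probability distribution, and a mean of distributions is still a distribution. The averaged parameters can therefore go straight into the usual log-space decision rule.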

Authors: WANG Yuanzhe, ZHANG Qiang, BAI Liyuan

Affiliations: Institute of Information Engineering, Wuhan University of Technology, Wuhan, China, 430070; Henan University of Technology, Zhengzhou, China, 450052

Conference type: International conference

Conference: 2009 IEEE International Symposium on IT in Medicine & Education

Location: Jinan, China

Conference language: English

Pages: 202-207

Online publication date: 2009-08-14 (the date the record was first posted on the Wanfang platform, not the paper's publication date)