Multi-LDA hybrid topic model with boosting strategy and its application in text classification
Topic modeling, especially Latent Dirichlet Allocation (LDA), is an effective approach to feature selection and dimensionality reduction in text categorization tasks. Unlike the traditional Vector Space Model, LDA can readily overcome the curse of dimensionality and the feature sparsity problem. Mapping from the word space to the topic space brings further benefits, but at the same time the determination of model parameters becomes a new difficulty. This article proposes a novel classification algorithm that combines different models with different parameters via a boosting strategy. Moreover, Naïve Bayes and Support Vector Machine are employed as weak classifiers, and a weighted method is proposed to improve accuracy by integrating the weak classifiers into a strong classifier in a more rational way. Experimental results show that our method performs well in both accuracy and generalization.
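The abstract does not specify the weighting scheme, so the following is only a minimal sketch of how several weak classifiers could be merged into a strong one by a weighted vote. It assumes the standard AdaBoost weight, 0.5 * ln((1 - err) / err), which may differ from the paper's method. The function names, labels, and error rates are hypothetical and chosen for illustration; each weak classifier is imagined as a Naïve Bayes or SVM model trained on the topic features of one LDA model with its own topic count.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate, eps=1e-10):
    """AdaBoost-style weight (an assumption, not the paper's formula).

    A lower training error yields a larger vote; an error of 0.5 yields zero.
    """
    err = min(max(error_rate, eps), 1 - eps)  # clamp to avoid log(0)
    return 0.5 * math.log((1 - err) / err)


def weighted_vote(predictions, error_rates):
    """Combine one predicted label per weak classifier into a single label."""
    scores = defaultdict(float)
    for label, err in zip(predictions, error_rates):
        scores[label] += classifier_weight(err)
    return max(scores, key=scores.get)


# Hypothetical outputs of four weak classifiers for one document, e.g.
# Naive Bayes and SVM on LDA models with two different topic counts.
predictions = ["sports", "politics", "sports", "politics"]
error_rates = [0.10, 0.40, 0.30, 0.45]  # made-up training error rates
combined = weighted_vote(predictions, error_rates)
```

With these made-up numbers the two more accurate classifiers outweigh the two weaker ones, so the combined label follows them even though the raw vote is tied.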
Topic Model; Latent Dirichlet Allocation; Boosting
WANG Yongliang, GUO Qiao
School of Automation, Beijing Institute of Technology, Beijing 100081
International conference
The 33rd Chinese Control Conference
Nanjing
English
4802-4806
2014-07-28 (date first posted online on the Wanfang platform; not the paper's publication date)