Conference topic

Multi-LDA hybrid topic model with boosting strategy and its application in text classification

  Topic modeling, especially Latent Dirichlet Allocation (LDA), is an effective approach to feature selection and dimensionality reduction in text categorization tasks. Unlike the traditional Vector Space Model, LDA can readily overcome the curse of dimensionality and the feature sparsity problem. Mapping from the word space to the topic space brings further benefits, but at the same time the determination of model parameters becomes a new difficulty. This article proposes a novel classification algorithm that combines models with different parameters via a boosting strategy. Moreover, Naive Bayes and Support Vector Machine are employed as weak classifiers, and a weighting method is proposed to improve accuracy by integrating the weak classifiers into a strong classifier in a more rational way. Experimental results show that the method performs well in both accuracy and generalization.
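
  The weighted integration of weak classifiers mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the AdaBoost-style weight used here, the error rates, the labels, and the function names `classifier_weight` and `weighted_vote` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate, eps=1e-10):
    """AdaBoost-style weight (an assumption): lower error gives a larger vote."""
    # Clamp the error away from 0 and 1 so the logarithm stays finite.
    err = min(max(error_rate, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical validation error rates for four weak classifiers, e.g. Naive
# Bayes or SVM models trained on LDA features with different topic counts.
errors = [0.10, 0.30, 0.35, 0.40]
weights = [classifier_weight(e) for e in errors]

# Hypothetical predictions of the four weak classifiers for one document.
predictions = ["sports", "politics", "politics", "politics"]

print(weighted_vote(predictions, weights))
```

  In this toy setting the single most accurate classifier outweighs the three weaker ones, which a plain majority vote would not allow.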

Topic Model; Latent Dirichlet Allocation; Boosting

WANG Yongliang, GUO Qiao

School of Automation, Beijing Institute of Technology, Beijing 100081

International conference

The 33rd Chinese Control Conference

Nanjing

English

4802-4806

2014-07-28 (date first available online on the Wanfang platform; not the paper's publication date)