Research on text categorization model based on LDA-KNN

In text classification, the similarity between texts must be calculated, but existing classification methods consider only the similarity between feature words and categories and ignore the semantic similarity among feature words. This paper proposes a new classification model, LDA (Latent Dirichlet Allocation)-KNN (K-Nearest Neighbor). LDA is used to address the semantic similarity measurement problem of traditional text categorization: the sample space is modeled with LDA and features are selected from it. In the reduced feature space, a KNN classifier is then used to classify the samples. The experiments were run on the Matlab software platform with a data set drawn from the Chinese corpus of Fudan University, and achieved high-precision classification results with an average value of 0.933. The LDA-KNN model is compared with the MI (Mutual Information)-KNN model and the LSI (Latent Semantic Indexing)-KNN model. The results show that the LDA-KNN model has superior classification performance in automatic text categorization.
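The two-stage pipeline described in the abstract (LDA to map documents into a low-dimensional topic space, then KNN in that space) can be illustrated with a minimal sketch. This is not the authors' implementation, which the record says was done in Matlab. The toy corpus, the function names `lda_gibbs` and `knn_predict`, the hyperparameters (`alpha`, `beta`, `n_iter`, `k`), the use of collapsed Gibbs sampling, cosine similarity as the KNN distance, and fitting LDA jointly over training and test documents are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # document-topic counts
    nkw = [[0] * n_words for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                             # tokens assigned to each topic
    z = []                                          # topic assignment of every token
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = int(rng.random() * n_topics)        # random initial topic
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                r = rng.random() * sum(weights)
                k = 0
                while k < n_topics - 1 and r >= weights[k]:
                    r -= weights[k]
                    k += 1
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    # Smoothed document-topic proportions: the reduced feature vectors.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(ndk, docs)
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_predict(train_vecs, train_labels, query, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(train_vecs[i], query), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (the paper uses the Fudan University Chinese corpus).
train_docs = [
    "stock market price trade bank".split(),
    "bank loan interest market stock".split(),
    "trade price interest bank loan".split(),
    "team match goal player coach".split(),
    "player goal score team league".split(),
    "coach league match score team".split(),
]
train_labels = ["economy"] * 3 + ["sports"] * 3
test_doc = "goal team player match".split()

# Assumption: LDA is fitted over training and test documents together.
theta = lda_gibbs(train_docs + [test_doc], n_topics=2)
predicted = knn_predict(theta[:-1], train_labels, theta[-1], k=3)
print(predicted)
```

Because KNN compares topic proportions rather than raw word counts, two documents can be judged similar even when they share few exact words, which is the semantic-similarity motivation stated in the abstract. A practical system would usually infer topic proportions for unseen documents from an already fitted model rather than refitting jointly as done here.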
Text classification; LDA topic model; KNN classification algorithm; Feature selection
Weihua Chen, Xian Zhang
School of Computer Science and Technology, Wuhan University of Technology, Hubei, China
International conference
Chongqing
English
2719-2726
2017-03-25 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)