Conference topic

A TOPIC-BASED DOCUMENT CORRELATION MODEL

Document correlation analysis is now a focus of study in text mining. This paper proposes a Document Correlation Model that captures the correlation between documents at the topic level. The model represents the correlation between two documents as the optimal matching of a bipartite graph in which each partition is a document, each node is a topic, and each edge is weighted by the similarity between two topics. The topics of each document are extracted with the Latent Dirichlet Allocation model and Gibbs sampling. Experiments on correlated document search show that the Document Correlation Model outperforms the Vector Space Model in two respects: 1) it achieves higher average retrieval precision; 2) it needs less space to store a document's information.
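
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each topic is given as a word-weight vector over a shared vocabulary, uses cosine similarity as the edge weight, finds the maximum-weight matching by brute force (practical only for a handful of topics per document), and normalises the score by the smaller topic count. The function names `cosine` and `document_correlation` are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two equal-length topic-word weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = sqrt(sum(a * a for a in p)) * sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def document_correlation(topics_a, topics_b):
    """Score two documents by the best one-to-one matching of their topics.

    Assumption: cosine similarity as the edge weight and normalisation by
    the smaller topic count are illustrative choices, not taken from the paper.
    """
    # Make topics_a the smaller side so each of its topics gets one partner.
    if len(topics_a) > len(topics_b):
        topics_a, topics_b = topics_b, topics_a
    if not topics_a:
        return 0.0

    # Edge weights of the bipartite graph: sim[i][j] links topic i to topic j.
    sim = [[cosine(p, q) for q in topics_b] for p in topics_a]

    # Brute-force maximum-weight matching: try every injective assignment.
    best = max(
        sum(sim[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(len(topics_b)), len(topics_a))
    )
    return best / len(topics_a)


# Toy topic vectors over a three-word vocabulary (made-up values).
doc_a = [[1, 0, 0], [0, 1, 0]]
doc_b = [[0, 1, 0], [1, 0, 0]]
doc_c = [[0, 0, 1]]
score_ab = document_correlation(doc_a, doc_b)
score_ac = document_correlation(doc_a, doc_c)
```

For realistic topic counts, a polynomial-time assignment solver such as the Hungarian algorithm would replace the brute-force search; the score it returns would be the same.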

Topic; document correlation; document retrieval; text mining

XI-PING JIA, HONG PENG, QI-LUN ZHENG, ZHUO-LIN JIANG, ZHAO LI

School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

2487-2491

2008-07-12 (date first posted online on the Wanfang platform; not the paper's publication date)