Searching Semantically Similar Questions from a Large Community-based Question Archive

Abstract:

This paper presents a novel, purely statistical method for retrieving questions similar to a given query question from a large question archive. First, a word relevance model is trained on the whole archive, which consists of millions of natural language questions posted by users on the web. The model is used to find the words most semantically related to a given word. Second, to find semantically similar questions for a query question, each non-stop word in a question is expanded with the help of the word relevance model and represented as a word vector. The elements of this vector are the word itself and some of its semantically related words. They are weighted by combining a classical IR term weighting method with the word transformation probability learned from the relevance model. The question is then mapped to a question vector, defined as the normalized center of the word vectors of the words it contains. Question retrieval thus reduces to comparing the similarity between question vectors. The method is in effect a simple kernel approach based on question expansion. Experimental results indicate that the proposed method outperforms baseline methods such as the Vector Space Model (VSM) and the Language Model for Information Retrieval (LMIR).
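The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the relevance model is trained or how the two weights are combined, so the sketch takes the relevance model as a ready-made dictionary (`RELEVANCE`, hypothetical toy values), uses a smoothed IDF as the "classical IR term weighting", and combines the two by simple multiplication. The stop-word list, function names, and toy archive are likewise made up for illustration.

```python
import math
from collections import defaultdict

# Assumption: a tiny hand-written stop-word list, for illustration only.
STOP_WORDS = {"how", "do", "i", "a", "the", "to", "my", "can", "what", "is"}

def idf_weights(questions):
    """Smoothed IDF per word, standing in for the classical IR term weight."""
    n = len(questions)
    df = defaultdict(int)
    for q in questions:
        for w in set(q.lower().split()):
            df[w] += 1
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}

def word_vector(word, relevance, idf):
    """Expand a word into itself plus its related words.

    Assumption: weight = IDF(word) * transformation probability,
    with probability 1.0 for the word itself.
    """
    base = idf.get(word, 1.0)
    vec = {word: base}
    for related, prob in relevance.get(word, {}).items():
        vec[related] = base * prob
    return vec

def question_vector(question, relevance, idf):
    """Normalized center (mean, then L2 normalization) of the word vectors."""
    words = [w for w in question.lower().split() if w not in STOP_WORDS]
    centre = defaultdict(float)
    for w in words:
        for term, weight in word_vector(w, relevance, idf).items():
            centre[term] += weight / len(words)
    norm = math.sqrt(sum(v * v for v in centre.values()))
    return {t: v / norm for t, v in centre.items()} if norm > 0 else {}

def similarity(u, v):
    """Dot product of two normalized sparse vectors (cosine similarity)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical relevance model: word -> {related word: transformation probability}.
RELEVANCE = {
    "fix": {"repair": 0.6},
    "repair": {"fix": 0.6},
    "laptop": {"notebook": 0.5},
    "notebook": {"laptop": 0.5},
}

# Toy archive standing in for the large question archive.
ARCHIVE = [
    "how to fix my laptop",
    "repair a notebook",
    "what is the best bread recipe",
]

IDF = idf_weights(ARCHIVE)
query_vec = question_vector("how do i fix my laptop", RELEVANCE, IDF)
ranked = sorted(
    ARCHIVE,
    key=lambda q: similarity(query_vec, question_vector(q, RELEVANCE, IDF)),
    reverse=True,
)
```

In this toy setting the query and "repair a notebook" share no surface words, yet their expanded vectors overlap, which is the effect the expansion step is meant to achieve. A plain bag-of-words comparison would score that pair as zero.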

Authors: Mingrong LIU, Yicen LIU, Qing YANG

Affiliation: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-8

Online publication date: 2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)
