Searching Semantically Similar Questions from a Large Community-based Question Archive
This paper provides a novel and totally statistical method to search similar questions from a large question archive for a given queried question. Firstly, a word relevance model is trained based on the whole question archive which is made up of millions of natural language questions proposed by users on the web. The word relevance model is utilized to find most semantically related words to a specific word. Secondly, in order to find semantically similar questions for a queried question, each non-stop word in a question is expanded with the help of word relevance model and represented as a word vector. Elements of the vector include the word itself and some semantically related words to it. Elements of the word vector are weighted by combining both classical IR term weighting method and word transformation probability learned from the relevance model. Then the question is mapped to a question vector as the normalized center of the word vectors representing these words contained in it. The problem of question retrieval can be solved by comparing the similarity between question vectors. The method is actually a simple question expansion based Kernel approach. Experimental results indicate the proposed method outperforms the baseline methods such as Vector Space Model (VSM) and Language Model for Information Retrieval (LMIR).
Mingrong LIU Yicen LIU Qing YANG
National Laboratory of Pattern Recognition Institute of Automation Chinese Academy of Sciences, Beijing 100080, China
国际会议
大连
英文
1-8
2009-09-24(万方平台首次上网日期,不代表论文的发表时间)