AN IMPROVED MEASURING SIMILARITY FOR SHORT TEXT SNIPPETS AND ITS APPLICATION IN CLUSTERING SEARCH ENGINE

Abstract:

Measuring the similarity of short text snippets, such as search queries, plays an important role in information retrieval and natural language processing, yet it remains a challenging task. In this paper, we develop a new similarity measure that further improves the accuracy of semantic similarity for short text snippets, especially when content is insufficient, as with web page snippets. We then apply our similarity measure, combined with information entropy, to a clustering search engine to automatically determine the best number of clusters. We also rank the clusters with our method and illustrate the results.

Keywords: semantic similarity; clustering search engine; information entropy

Authors: ZHAO LI, HONG PENG, PENG PENG, XI-PING JIA, JIA-BING WANG

Affiliation: School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510640, China

Conference type: International conference

Conference name: 2008 International Conference on Machine Learning and Cybernetics

Conference location: Kunming

Conference language: English

Pages: 1581-1585

Online publication date: 2008-07-12 (date first made available online on the Wanfang platform; not the paper's publication date)
