LCA-based Keyword Search for Effectively Retrieving “Information Unit” from Web Pages

Abstract:

With the rapid development of Internet technology, structured data is becoming increasingly prevalent on the Internet. Moreover, most web sites organize their data systematically, and relevant data may be spread across different pages that are connected through hyperlinks. However, existing web search engines cannot integrate information from multiple interrelated pages to answer keyword queries meaningfully. Next-generation web search engines require link-awareness or, more generally, the capability of integrating correlated information items that are linked through hyperlinks. In this paper, we study the problem of identifying the “Information Unit” of relevant pages containing all the input keywords as the answer. We model a set of closely related web pages as a tree, where the nodes are the web pages and the edges are the links between them. We retrieve the “Information Unit” of the most related and connected subtrees, instead of a single web page, as the answer. To improve search effectiveness, we propose an efficient and effective LCA-based algorithm to identify the subtrees that are most related to the given input keywords. We have conducted an extensive set of experiments on the proposed algorithm. The experimental results show that our method achieves high search performance and significantly outperforms the existing alternative methods.
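The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea it describes: pages form a tree, and an answer is the connected subtree rooted at the lowest common ancestor (LCA) of pages that cover all query keywords. The site tree, the page keywords, the function names, and the "fewest pages wins" ranking are all hypothetical choices made for illustration; the exhaustive enumeration used here is a simplification and not the efficient method the paper proposes.

```python
from itertools import product

# Hypothetical site tree: child page -> parent page (the root has parent None).
PARENT = {
    "home": None,
    "dept": "home",
    "people": "dept",
    "alice": "people",
    "bob": "people",
    "courses": "dept",
    "db101": "courses",
}

# Hypothetical page contents, reduced to keyword sets.
KEYWORDS = {
    "home": {"university"},
    "dept": {"computer", "science"},
    "people": {"faculty"},
    "alice": {"alice", "database"},
    "bob": {"bob", "networks"},
    "courses": {"courses"},
    "db101": {"database", "lecture"},
}


def path_to_root(page):
    """Return the pages from `page` up to the root, starting with `page`."""
    path = []
    while page is not None:
        path.append(page)
        page = PARENT[page]
    return path


def lca(pages):
    """Return the lowest common ancestor of a non-empty collection of pages."""
    paths = [path_to_root(p) for p in pages]
    common = set(paths[0]).intersection(*map(set, paths[1:]))
    # Walking upward, the first shared page is the deepest shared ancestor.
    return next(node for node in paths[0] if node in common)


def connecting_subtree(pages):
    """Return (root, nodes) of the subtree joining `pages` at their LCA."""
    root = lca(pages)
    nodes = set()
    for page in pages:
        for node in path_to_root(page):
            nodes.add(node)
            if node == root:
                break
    return root, nodes


def search(query):
    """Return the smallest connecting subtree covering every keyword, or None.

    Ranking by number of pages is an assumption made for this sketch.
    """
    matches = [[p for p, kws in KEYWORDS.items() if k in kws] for k in query]
    if any(not m for m in matches):
        return None  # some keyword appears on no page
    best = None
    for combo in product(*matches):  # one matching page per keyword
        root, nodes = connecting_subtree(set(combo))
        if best is None or len(nodes) < len(best[1]):
            best = (root, nodes)
    return best
```

In this toy data, `search(["alice", "lecture"])` finds no single page with both keywords and instead returns the subtree rooted at `dept` that links the `alice` and `db101` pages, while `search(["alice", "database"])` collapses to the single page `alice`.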

Authors: Xiaoming Song, Jianhua Feng, Guoliang Li, Qin Hong

Affiliations: Computer Science and Technology, Tsinghua University, Beijing 100084, China; School of Physics and Opto-Electronics Technology, Fujian Normal University, Fuzhou, Fujian 350007, China

Conference type: International conference

Conference name: The Ninth International Conference on Web-Age Information Management (WAIM 2008)

Conference location: Zhangjiajie

Conference language: English

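Online publication date: 2008-07-20 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)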
