Query-Sets: Using Implicit Feedback and Query Patterns to Organize Web Documents
In this paper we present a new document representation model based on implicit user feedback obtained from search engine queries. The main objective of this model is to achieve better results in unsupervised tasks, such as clustering and labeling, by incorporating usage data obtained from search engine queries. This type of model allows us to discover users’ motivations for visiting a given document. The terms used in queries can provide a better choice of features, from the user’s point of view, for summarizing the Web pages that were clicked from these queries. In this work we extend and formalize, as the query model, an existing but not widely known idea: the query view for document representation. Furthermore, we create a novel model based on frequent query patterns, called the query-set model. Our evaluation shows that both query-based models outperform the vector-space model when used for clustering and labeling documents in a website. In our experiments, the query-set model reduces the number of features needed to represent a set of documents by more than 90% and improves the quality of the results by over 90%. We believe this is because our model chooses better features and provides labels that more accurately match users’ expectations.
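As a rough illustration of the idea of describing a page by the terms of the queries that led to clicks on it, the following minimal Python sketch builds a per-document bag of query terms from a click log. It is not the authors' formulation: the function name `build_query_representation`, the `(query, url)` log format, the whitespace tokenization, and the raw-count weighting are all assumptions made for this example, and the frequent-pattern mining behind the query-set model is not shown.

```python
from collections import Counter, defaultdict


def build_query_representation(click_log):
    """Map each clicked document to a bag of terms taken from the queries that led to it.

    click_log is an iterable of (query_text, clicked_url) pairs (an assumed format).
    """
    doc_terms = defaultdict(Counter)
    for query, url in click_log:
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        for term in query.lower().split():
            doc_terms[url][term] += 1
    return doc_terms


# Hypothetical click log, invented purely for illustration.
click_log = [
    ("cheap flights barcelona", "/flights"),
    ("barcelona flights", "/flights"),
    ("hotel barcelona", "/hotels"),
]

representation = build_query_representation(click_log)
for url, terms in representation.items():
    print(url, dict(terms))
```

In this toy log, each page ends up described only by the words users typed before clicking it, which is typically a far smaller vocabulary than the page's full text.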
Feature Selection; Labeling; Search Engine Queries; Usage Mining; Web Page Organization
Barbara Poblete; Ricardo Baeza-Yates
Web Research Group, University Pompeu Fabra, Barcelona, Spain; Yahoo! Research & Barcelona Media Innovation Center, Barcelona, Spain
International conference
The 17th International World Wide Web Conference (WWW08)
Beijing
English
2008-04-21 (date first posted online on the Wanfang platform; does not represent the paper's publication date)