Conference topic

Using Subspace Analysis for Event Detection from Web Click-through Data

Although most existing research detects events by analyzing the content or structural information of Web documents, a recent direction is to study usage data. In this paper, we focus on detecting events from Web click-through data generated by Web search engines. We propose a novel approach that detects events from click-through data based on robust subspace analysis. We first transform click-through data to the 2D polar space. Next, an algorithm based on Generalized Principal Component Analysis (GPCA) is used to estimate subspaces of the transformed data such that each subspace contains query sessions of similar topics. Then, we prune uninteresting subspaces, that is, those that do not contain query sessions corresponding to real events, by considering both the semantic certainty and the temporal certainty of the query sessions in each subspace. Finally, events are detected from the remaining interesting subspaces using a nonparametric clustering technique. Experimental results on real-life click-through data show that, compared with existing approaches, the proposed approach is more accurate in detecting real events and more effective in determining the number of events.

click-through data; event detection; subspace estimation; GPCA

Ling Chen, Yiqun Hu, Wolfgang Nejdl

L3S Research Center, University of Hannover, 30167 Hannover, Germany; School of Computer Engineering, Nanyang Technological University, Singapore 639798

International conference

The 17th International World Wide Web Conference (WWW08)

Beijing

English

2008-04-21 (date first made available online on the Wanfang platform; not the paper's publication date)