Conference Papers

Blog Post and Comment Extraction Using Information Quantity of Web Format

With the growth of research on the blogosphere, extracting the post and its comments from a blog page has become increasingly important for improving search performance. In this paper, we present a two-stage method. First, we combine the strengths of visual information and effective text information to locate the main text, which represents the theme of the blog page. Second, we use the information quantity of separators to detect the boundary between the post and the comments. Our experiments show that this method achieves good extraction performance and improves the performance of blog search.
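The abstract does not define "information quantity" or say how separators are scored. The toy sketch below illustrates the second stage under stated assumptions and is not the authors' method: it assumes information quantity means Shannon self-information, I(s) = -log2 p(s), where p(s) is the relative frequency of separator type s in the located main text. It also assumes a hypothetical rule that the comment region starts at the first occurrence of the most frequently repeated (lowest-information) separator, since comment lists often repeat one template. The function names `information_quantity` and `find_boundary`, the parameter `min_repeats`, and the separator labels are all hypothetical.

```python
import math
from collections import Counter


def information_quantity(separators):
    """Return the self-information -log2 p(s), in bits, of each separator type.

    Assumption: p(s) is estimated as the relative frequency of type s in the list.
    """
    counts = Counter(separators)
    total = len(separators)
    return {sep: -math.log2(n / total) for sep, n in counts.items()}


def find_boundary(separators, min_repeats=2):
    """Hypothetical heuristic: return the index where comments are assumed to start.

    Picks the lowest-information (most frequently repeated) separator type that
    occurs at least min_repeats times and returns the index of its first
    occurrence. Returns None if there are no separators or none repeats enough.
    """
    if not separators:
        return None
    info = information_quantity(separators)
    counts = Counter(separators)
    # Only separator types that repeat are treated as candidate comment delimiters.
    candidates = [s for s in info if counts[s] >= min_repeats]
    if not candidates:
        return None
    # Lowest self-information means most frequent; ties go to the earliest type.
    best = min(candidates, key=lambda s: (info[s], separators.index(s)))
    return separators.index(best)
```

For a hypothetical separator sequence such as `["h2.title", "p", "p", "div.comment", "div.comment", "div.comment"]`, `div.comment` has the lowest self-information (1 bit), so `find_boundary` returns 3, splitting the sequence into a post part and a comment part at that index. A real extractor would need a richer scoring rule; this only shows how a frequency-based information measure can rank separators.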

Donglin Cao, Xiangwen Liao, Hongbo Xu, Shuo Bai

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Graduate School of the Chinese Academy of Sciences

International conference

4th Asia Information Retrieval Symposium (AIRS 2008)

Harbin

English

pp. 298-309

2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)