Blog Post and Comment Extraction Using Information Quantity of Web Format
With the growth of research on the blogosphere, extracting the post and comments from a blog page has become increasingly important for improving search performance. In this paper, we present a two-stage method. First, we combine the advantages of visual information and effective text information to locate the main text, which represents the theme of the blog page. Second, we use the information quantity of separators to detect the boundary between the post and the comments. In our experiments, this method achieves good extraction performance and improves the performance of blog search.
Donglin Cao, Xiangwen Liao, Hongbo Xu, Shuo Bai
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Graduate School, the Chi
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
298-309
2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)