An Automatic Approach to Extracting Review Link from Chinese News Pages

Abstract:

Review links are widely used in some special kinds of web pages, especially news pages. They are very useful pieces of information in many applications, such as hot topic discovery and public opinion monitoring. Unfortunately, extracting review links manually from news pages is time-consuming and error-prone. Although much work on web data extraction has been done, we argue that this is still not a trivial problem because of the diversity in both DOM tree structure and visual presentation. In this paper, a novel approach is proposed for automatically extracting review links from web pages. The approach consists of two steps: first, segment each news page into a set of blocks; then, identify the block(s) that contain the review link using a machine learning technique. Experimental results over a large number of Chinese news pages indicate that this approach is highly accurate.
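
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the block tags, the keyword list, the four features, the weights, and the threshold are all assumptions made for illustration. The hand-set linear score stands in for a trained classifier, and the visual features mentioned in the keywords are not modeled because they would require a rendering engine.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit "blocks"; the paper's segmentation rule is not given here.
BLOCK_TAGS = {"div", "p", "li"}
# Assumption: words that often label review (comment) links on news pages.
REVIEW_WORDS = ("评论", "跟帖", "comment")


class BlockSegmenter(HTMLParser):
    """Step 1: split a page into blocks, each holding its own text and link targets."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of blocks whose end tag has not been seen yet
        self.blocks = []       # finished blocks, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.open_blocks.append({"text": "", "links": []})
        elif tag == "a" and self.open_blocks:
            href = dict(attrs).get("href")
            if href:
                self.open_blocks[-1]["links"].append(href)

    def handle_data(self, data):
        # Text is credited to the innermost open block only.
        if self.open_blocks:
            self.open_blocks[-1]["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.open_blocks:
            self.blocks.append(self.open_blocks.pop())


def features(block):
    """Hypothetical content features; each is 0.0 or 1.0."""
    text = block["text"]
    return [
        1.0 if any(w in text.lower() for w in REVIEW_WORDS) else 0.0,  # review keyword
        1.0 if any(ch.isdigit() for ch in text) else 0.0,              # e.g. a comment count
        1.0 if 0 < len(block["links"]) <= 2 else 0.0,                  # few links, not a nav bar
        1.0 if len(text) <= 20 else 0.0,                               # short label text
    ]


# Hand-set weights standing in for a learned model (assumption, not trained values).
WEIGHTS = [2.0, 0.5, 1.0, 0.5]
THRESHOLD = 3.0


def find_review_links(html):
    """Step 2: score every block and return the links of blocks above the threshold."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    found = []
    for block in segmenter.blocks:
        score = sum(w * f for w, f in zip(WEIGHTS, features(block)))
        if score >= THRESHOLD:
            found.extend(block["links"])
    return found


SAMPLE = """
<div><p>正文内容，这是一段比较长的新闻正文文字，用来模拟新闻页面的主体部分。</p></div>
<div><a href="/comment/123">评论(45)</a></div>
<div><a href="/a">首页</a><a href="/b">国内</a><a href="/c">国际</a></div>
"""

if __name__ == "__main__":
    print(find_review_links(SAMPLE))
```

In a real system the weights would be learned from labeled pages, and the feature list would likely be extended with layout information such as block position and size.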

Keywords: Web data extraction; Review link; Machine learning; Visual feature

Author: Wei Liu

Affiliation: Information Source Center, Institute of Scientific and Technical Information of China, Beijing 100038

Conference type: International conference

Conference name: 2011 6th Joint International Information Technology and Artificial Intelligence Conference (IEEE ITAIC 2011)

Conference location: Chongqing

Conference language: English

Pages: 914-918

Online publication date: 2011-08-20 (date first posted on the Wanfang platform; not the paper's publication date)
