Conference Papers

An Automatic Approach to Extracting Review Link from Chinese News Pages

Review links are widely used in some special kinds of web pages, especially news pages. They are very useful pieces of information in many applications, such as hot topic discovery and public opinion monitoring. Unfortunately, extracting review links manually from news pages is time-consuming and error-prone. Though many approaches to web data extraction have been developed, we argue that this is still not a trivial problem due to the diversity of both DOM tree structure and visual presentation. In this paper, a novel approach is proposed for automatically extracting the review links from web pages. This approach consists of two steps: first, segment each news page into a set of blocks, and then identify the block(s) that contain the review link using a machine learning technique. Experimental results over a large number of Chinese news pages indicate that this approach is highly accurate.

Web data extraction; Review link; Machine learning; Visual feature

Wei Liu

Information Source Center, Institute of Scientific and Technical Information of China, Beijing 100038

International conference

2011 6th Joint International Information Technology and Artificial Intelligence Conference (IEEE ITAIC 2011)

Chongqing

English

914-918

2011-08-20 (date first made available online on the Wanfang platform; not the paper's publication date)