An Automatic Approach to Extracting Review Link from Chinese News Pages
Review links are widely used in certain kinds of web pages, especially news pages. They are useful in many applications, such as hot topic discovery and public opinion monitoring. Unfortunately, extracting review links manually from news pages is time-consuming and error-prone. Although much work on web data extraction has been done, we argue that this is still not a trivial problem because of the diversity in both DOM tree structure and visual presentation. In this paper, a novel approach is proposed for automatically extracting review links from web pages. The approach consists of two steps: first, segment each news page into a set of blocks; then, identify the block(s) that contain the review link using a machine learning technique. Experimental results over a large number of Chinese news pages indicate that this approach is highly accurate.
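The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: blocks are taken to be `<div>` elements, the features are simple text and DOM cues rather than the visual features the paper refers to, the cue words in `CUES` are hypothetical, and a plain perceptron stands in for the unspecified machine learning technique.

```python
from html.parser import HTMLParser

# Hypothetical cue words ("comment", "follow-up post"); not taken from the paper.
CUES = ("评论", "跟帖")


class BlockSegmenter(HTMLParser):
    """Step 1 (assumed form): split a page into blocks, one per <div>."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished blocks, appended when a <div> closes
        self._stack = []   # <div> blocks that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append({"text": "", "links": 0})
        elif tag == "a" and self._stack:
            self._stack[-1]["links"] += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self.blocks.append(self._stack.pop())

    def handle_data(self, data):
        # Text is credited to the innermost open <div> only.
        if self._stack:
            self._stack[-1]["text"] += data.strip()


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def features(block):
    """Illustrative feature vector: bias, cue word, digit, contains a link."""
    text = block["text"]
    return [
        1.0,
        1.0 if any(cue in text for cue in CUES) else 0.0,
        1.0 if any(ch.isdigit() for ch in text) else 0.0,
        1.0 if block["links"] > 0 else 0.0,
    ]


def score(weights, x):
    return sum(w * v for w, v in zip(weights, x))


def train_perceptron(samples, epochs=20):
    """Step 2 (stand-in learner): a plain perceptron over block features."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if score(weights, x) > 0 else 0
            for i, v in enumerate(x):
                weights[i] += (y - pred) * v
    return weights


def review_blocks(html, weights):
    """Return the blocks of a page that the model labels as review-link blocks."""
    return [b for b in segment(html) if score(weights, features(b)) > 0]


# Toy labelled page: 1 marks the block that holds the review link.
TRAIN_PAGE = (
    "<div>某市今日举行新闻发布会,介绍有关情况。</div>"
    '<div><a href="/c/1">评论(128)</a></div>'
    '<div><a href="/about">关于我们</a></div>'
)
TRAIN_LABELS = [0, 1, 0]

samples = [(features(b), y) for b, y in zip(segment(TRAIN_PAGE), TRAIN_LABELS)]
weights = train_perceptron(samples)

NEW_PAGE = (
    "<div>本报讯 记者昨日获悉相关消息。</div>"
    '<div><a href="/c/9">已有56条评论</a></div>'
    '<div><a href="/s">分享</a></div>'
)
for block in review_blocks(NEW_PAGE, weights):
    print(block["text"])
```

A realistic system would need a rendering engine to obtain visual features such as block position and size, and far more training pages than this toy example; the sketch only shows how segmentation and block classification fit together.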
Web data extraction; Review link; Machine learning; Visual feature
Wei Liu
Information Source Center, Institute of Scientific and Technical Information of China, Beijing 100038
International conference
Chongqing
English
914-918
2011-08-20 (date first made available online on the Wanfang platform; not the paper's publication date)