Conference Paper

Web Page Sectioning Using Regex-based Template

This work presents a novel, site-specific algorithm for web page segmentation and section importance detection that leverages structural, content, and visual information. The structural and content information is captured by a template, a generalized regular expression learnt over a set of pages from the site. Combining the template with visual information yields high sectioning accuracy. The experimental results demonstrate the effectiveness of the approach.

Site-specific segmentation; Site-specific noise elimination; Tree-based reg-ex

Rupesh R. Mehta, Amit Madaan

Yahoo! R&D Bangalore, India

International conference

The 17th International World Wide Web Conference (WWW08)

Beijing

English

2008-04-21 (date first made available online on the Wanfang platform; this is not the paper's publication date)