Conference topic

An Approach of Web Page Information Extraction

The Web has become the largest information source, but noise content is an inevitable part of any web page. Noise content reduces the accuracy of search engines and increases server load. Information extraction technology has been developed to address this problem, and it is mostly based on page segmentation. After analyzing existing page segmentation methods, this paper proposes an approach to web page information extraction in which block nodes are identified by analyzing the attributes of HTML tags. The algorithm is easy to implement, and experiments show that it performs well.
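The abstract gives no algorithmic detail, so the following is only a minimal sketch of what identifying block nodes from HTML tag attributes could look like. The tag set `BLOCK_TAGS`, the attribute set `LAYOUT_ATTRS`, and the names `BlockCollector` and `segment` are all assumptions made for illustration; they are not taken from the paper.

```python
from html.parser import HTMLParser

# Assumed heuristics (hypothetical, not from the paper): a node is a candidate
# block if it is a container tag that carries at least one layout attribute.
BLOCK_TAGS = {"div", "table", "td", "ul", "section", "article"}
LAYOUT_ATTRS = {"id", "class", "width", "height", "align", "style"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class BlockCollector(HTMLParser):
    """Collects text under the nearest enclosing candidate block node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, block_index or None)
        self.blocks = []  # one dict per candidate block

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never contain text, so skip them
        index = None
        attr_names = {name for name, _ in attrs}
        if tag in BLOCK_TAGS and attr_names & LAYOUT_ATTRS:
            index = len(self.blocks)
            self.blocks.append({"tag": tag, "attrs": dict(attrs), "text": []})
        self.stack.append((tag, index))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing block, if there is one.
        for _, index in reversed(self.stack):
            if index is not None:
                self.blocks[index]["text"].append(text)
                break


def segment(html):
    """Return (tag, attributes, text) for each candidate block in the page."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return [(b["tag"], b["attrs"], " ".join(b["text"])) for b in parser.blocks]
```

A later step would still have to decide which blocks are noise (navigation, advertisements) and which hold the main content; the abstract does not say how the authors make that decision, so it is left out here.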

Information extraction; DOM; page segmentation; HTML tag

Yaohui Li, Yongqiang Wu, Zhenyan Wang, Liting Gao, Lixia Wang, Shucai Song

Department of Computer Science, Hebei Institute of Architecture & Civil Engineering, Zhangjiakou City; Academic Administration, Hebei Institute of Architecture & Civil Engineering, Zhangjiakou City, China

International conference

2011 3rd International Conference on Computer and Network Technology (ICCNT 2011) (2011 3rd IEEE International Conference on Computer and Network Technology)

Taiyuan

English

386-388

2011-02-26 (date first posted online on the Wanfang platform; not the paper's publication date)