Conference Papers

The Research of Applying Support Vector Machine into Web Page Information Extraction Algorithm Based on Visual Characteristics

With the development of the Internet, the amount of information on web pages keeps increasing, and pages are becoming ever more information-dense. However, the theme of a web page is often unclear, which makes extracting its thematic information very difficult. This paper presents a new web page information extraction algorithm. Guided by the visual characteristics of theme web pages, it constructs a web page tag tree, analyzes the page, splits it into blocks, and eliminates noise nodes. Index blocks and theme blocks differ in their features and semantics. Based on these differences, a trained Support Vector Machine (SVM) classifies each block as an index block or a theme block, and the topic information of the page is then extracted. The experimental results show that applying an SVM within this visual-feature-based extraction algorithm identifies theme web pages effectively and accurately extracts the text of theme-oriented web pages.
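The abstract does not state the paper's tag-tree rules, block features, or SVM settings. The following is therefore only a minimal sketch of the described pipeline, built on stated assumptions:

- Python's standard `html.parser` splits a page into leaf text blocks and drops noise elements such as `script` and `style`.
- Each block gets three hypothetical features (link-text density, capped text length, punctuation count). These are a rough stand-in for the paper's visual and semantic features.
- A linear SVM trained with the Pegasos stochastic sub-gradient method separates theme blocks from index blocks. Pegasos is swapped in for whatever solver and kernel the authors used.

The tag sets, normalization constants, training vectors, and helper names (`BlockSplitter`, `block_features`, `train_linear_svm`, `classify_blocks`) are all hypothetical.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical tag choices: block-level containers and noise elements to discard.
BLOCK_TAGS = {"div", "p", "td", "ul", "table", "section"}
NOISE_TAGS = {"script", "style", "iframe", "noscript"}


@dataclass
class Block:
    text: str = ""       # visible text directly inside this block
    link_text: str = ""  # the part of that text that lies inside <a> elements


class BlockSplitter(HTMLParser):
    """Split (assumed well-formed) HTML into leaf text blocks, dropping noise nodes."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of block elements that are still open
        self.noise_depth = 0   # > 0 while inside a noise element
        self.link_depth = 0    # > 0 while inside an <a> element
        self.blocks = []       # closed, non-empty blocks in closing order

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.open_blocks.append(Block())

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS and self.open_blocks:
            block = self.open_blocks.pop()
            if block.text.strip():  # keep only blocks that contain text
                self.blocks.append(block)

    def handle_data(self, data):
        # Text goes to the innermost open block; noise and stray text are dropped.
        if self.noise_depth or not self.open_blocks:
            return
        block = self.open_blocks[-1]
        block.text += data
        if self.link_depth:
            block.link_text += data


def block_features(block):
    """Hypothetical features: link density, capped length, capped punctuation."""
    text = " ".join(block.text.split())
    links = " ".join(block.link_text.split())
    link_density = len(links) / max(len(text), 1)
    length = min(len(text) / 100.0, 1.0)  # 100 characters is an arbitrary cap
    punct = min(sum(text.count(c) for c in ".,;:!?") / 5.0, 1.0)
    return [link_density, length, punct]


def train_linear_svm(X, y, lam=0.01, epochs=1000):
    """Pegasos training of a linear SVM; an appended constant 1.0 acts as the bias."""
    X = [x + [1.0] for x in X]
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 regularizer
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


def classify_blocks(html, w):
    """Return (label, text) pairs, where label is 'theme' or 'index'."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    result = []
    for block in splitter.blocks:
        label = "theme" if predict(w, block_features(block)) == 1 else "index"
        result.append((label, " ".join(block.text.split())))
    return result


# Hypothetical hand-labelled feature vectors: +1 = theme block, -1 = index block.
TRAIN_X = [
    [0.90, 0.10, 0.00],  # navigation bar
    [0.80, 0.20, 0.10],  # list of links to other pages
    [0.05, 0.90, 0.80],  # article paragraph
    [0.10, 1.00, 1.00],  # article paragraph
]
TRAIN_Y = [-1, -1, 1, 1]

SAMPLE_HTML = (
    "<html><head><style>body { color: red }</style>"
    "<script>var ad = 1;</script></head><body>"
    '<div id="nav"><a href="/">Home</a> | <a href="/news">News</a> | '
    '<a href="/sports">Sports</a></div>'
    '<div id="content"><p>The city council approved the new budget on Monday. '
    "Funding for schools, roads, and parks will increase next year, "
    "officials said.</p></div></body></html>"
)

if __name__ == "__main__":
    weights = train_linear_svm(TRAIN_X, TRAIN_Y)
    for label, text in classify_blocks(SAMPLE_HTML, weights):
        print(f"{label}: {text}")
```

In this sketch, link density carries most of the signal. Navigation bars and link lists (index blocks) consist mostly of anchor text, while article paragraphs (theme blocks) are long runs of punctuated prose. A method based on visual characteristics would likely also use rendered layout cues such as block position and size. This tag-only sketch ignores those cues.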

Keywords: Support Vector Machine; visual characteristics; information extraction; page splitting

JianJing Li, ChunYing Zhang, Xiao Chen, ChunBo Li

Qianan College, Hebei United University, Qianan, Hebei, China; Zhongxin Bank, Tangshan, Hebei, China

International conference

2012 International Conference on Electric Technology and Civil Engineering (ICETCE 2012)

Venue: Three Gorges, China

Language: English

Pages: 2025-2027

2012-05-18 (date first posted online on the Wanfang platform; not the paper's publication date)