Conference special topic

Design and Implementation of a Web News Extraction System

With the widespread use of the Internet and the development of information technology, a tremendous amount of news information is now available. Quickly obtaining useful material from this mass of news is a pressing problem. Based on an analysis of the structure of news portal pages, this paper combines regular expressions with HTML-Parser, presents a general method for automatically extracting news information, and implements an efficient, general-purpose news extraction system. The system not only correctly extracts the headline, release time, and body text of a news item, but also extracts news that is relevant or similar to a given subject.

Information Extraction; Regular Expressions; Index Page; Content Page

Hua-lin XIA; Yang-sen ZHANG

Institute of Intelligence Information Processing, Beijing Information Science and Technology University, Beijing 100192, China

International conference

2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2011)

Shanghai

English

1843-1847

2011-07-26 (date first posted online on the Wanfang platform; this does not represent the paper's publication date)