Conference Topic

Research and Implementation of Web Structure-Based News Gathering System

On the basis of an in-depth study of web information gathering technology, a web structure-based news gathering model is proposed. First, it loads the gathering entry address and finds the news list page using the Information Gathering and Filter Algorithm. It then automatically identifies and completes the link addresses of the news content pages according to the configured gathering rules combined with regular expressions. Finally, it loads the target page (the news content page) and automatically gathers the news information with the same algorithm. At the same time, it can filter out any specified information on this page, such as embedded advertising messages. Practical results show that the proposed model works well: it gathers news information efficiently and automatically.
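The pipeline described in the abstract (list page, link identification and completion by regular expression, content extraction with advertisement filtering) can be illustrated with a minimal sketch. This is not the authors' Information Gathering and Filter Algorithm, which this record does not describe in detail. The rule patterns, the function names `extract_content_links` and `extract_news_text`, and the sample URL layout are all hypothetical assumptions chosen for illustration.

```python
import re
from urllib.parse import urljoin

# Hypothetical gathering rule: content-page links end in news/<digits>.html.
LINK_RULE = re.compile(r'href="((?:[^"]*/)?news/\d+\.html)"')
# Hypothetical filter rule: advertisements sit in <div class="ad"> blocks.
AD_RULE = re.compile(r'<div class="ad">.*?</div>', re.DOTALL)
# Matches any remaining HTML tag so it can be stripped from the text.
TAG_RULE = re.compile(r'<[^>]+>')


def extract_content_links(list_html, base_url):
    """Find content-page links on a list page and complete them to absolute URLs."""
    links = []
    for href in LINK_RULE.findall(list_html):
        url = urljoin(base_url, href)  # complete relative addresses
        if url not in links:           # skip duplicates, keep page order
            links.append(url)
    return links


def extract_news_text(content_html):
    """Drop advertisement blocks, strip tags, and normalize whitespace."""
    without_ads = AD_RULE.sub('', content_html)
    text = TAG_RULE.sub(' ', without_ads)
    return ' '.join(text.split())
```

In this sketch the pages are passed in as strings, so fetching the entry address and the content pages is left to the caller. A real system would likely use an HTML parser rather than regular expressions alone for the content step.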

Web Structure; Web Gathering; News Gathering; Regular Expressions

Jianguo Chen, Minrong Lu, XiaoYu Ke

Software School, Hunan University, Changsha 410082, China; Software College, Fujian University of Technology, Fujian 350003, China

International conference

2010 Second Asia-Pacific Conference on Information Processing (APCIP 2010)

Nanchang

English

94-97

2010-09-17 (date first posted online on the Wanfang platform; not the paper's publication date)