Research on Web information extraction based on Spider Algorithm and DOM thinking

Abstract:

Website structures are complex, and Web information is neither fixed nor regular in structure, so capturing Web information at scale is inefficient and integrating it is very difficult. This paper studies Web information extraction technology and proposes and implements a new method based on a spider algorithm and DOM thinking. Experimental results show that the method can extract information from the Web efficiently and accurately.

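Keywords: Web information extraction; spider algorithm; DOM tree; website structure

Authors: Xinchao Han, XiangDong Li, Qiusheng Zheng

Affiliation: ZhongYuan University of Technology, HeNan, China

Conference type: International conference

Conference name: 2010 International Conference on Information, Networking and Automation (2010 IEEE International Conference on Information Networking and Automation, ICINA 2010)

Conference location: Kunming

Conference language: English

Pages: 182-185

Online publication date: 2010-10-17 (date first posted on the Wanfang platform; does not represent the paper's publication date)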
