Research on Web information extraction based on Spider Algorithm and DOM thinking
Website structures are complex, and Web information is neither fixed nor regular in structure, so capturing Web information at scale is inefficient and integrating it is very difficult. This paper studies Web information extraction technology and proposes and implements a new method based on a spider algorithm and DOM thinking. Experimental results show that the method can extract information from the Web efficiently and accurately.
Web information extraction; spider algorithm; DOM tree; website structure
Xinchao Han, XiangDong Li, Qiusheng Zheng
ZhongYuan University of Technology, HeNan, China
International conference
Kunming
English
182-185
2010-10-17 (date first made available online on the Wanfang platform; not the paper's publication date)