Research on Asynchronous Communication-Oriented Page Searching

Abstract:

Research on asynchronous communication-oriented page searching aims to solve the new problems that the adoption of asynchronous communication technology poses for search engines. At present, most full-text search engine crawlers use algorithms based on hyperlink analysis. Such a crawler searches only the content of the HTML page and ignores the code in the script region, yet it is through script code that asynchronous communication is realized. Since a great number of hyperlinks are hidden in the script region, current crawlers need to be improved so that they search the code in the script region and extract the hyperlinks hidden there. This paper proposes an approach that, with the help of a script execution environment, takes advantage of the Windows message mechanism and simulates clicks on script functions to extract hyperlinks. In addition, because asynchronous communication means that a returned web page is not a complete page, this paper loads the source page in which the hyperlink is located and uses the partial refresh mechanism to save the refreshed page, solving the problem that the returned information cannot be stored directly.
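To illustrate the problem the abstract describes, the following is a minimal sketch of a crawler step that looks inside script regions instead of skipping them. It is not the paper's method: the paper executes script functions through simulated clicks and the Windows message mechanism, whereas this sketch only scans script text statically for URL-like string literals. The names `ScriptLinkExtractor`, `extract_links`, and `URL_LITERAL`, and the example URLs, are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a hyperlink hidden in script code appears as a quoted string
# literal that starts with "http://", "https://", or "/".
URL_LITERAL = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


class ScriptLinkExtractor(HTMLParser):
    """Collects links from <a href> attributes and from <script> text."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_script = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        elif tag == "a":
            # Ordinary hyperlink analysis: what a conventional crawler sees.
            href = dict(attrs).get("href")
            if href and not href.startswith("javascript:"):
                self._add(href)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script region: scan the code for URL-like string literals.
        if self.in_script:
            for match in URL_LITERAL.finditer(data):
                self._add(match.group(1))

    def _add(self, url):
        absolute = urljoin(self.base_url, url)
        if absolute not in self.links:
            self.links.append(absolute)


def extract_links(html, base_url):
    parser = ScriptLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A static scan of this kind misses URLs that are assembled at run time (for example by string concatenation), which is one reason an approach that actually runs the script functions, as the paper proposes, can find links that pattern matching cannot.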

Keywords: asynchronous communication; search engine; crawler

Authors: Yulian Fei, Min Wang, Wenjuan Chen

Affiliation: Zhejiang Gongshang University, No. 18 Xuezheng Str., Xiasha University Town, Hangzhou, Zhejiang, China 310035

Conference type: International conference

Conference name: 4th Asia Information Retrieval Symposium (AIRS 2008)

Conference location: Harbin

Conference language: English

Pages: 412-417

Online publication date: 2008-01-16 (date first posted on the Wanfang platform; not the paper's publication date)
