Research on Asynchronous Communication-Oriented Page Searching
Research on asynchronous communication-oriented page searching aims to solve the new problems that the adoption of asynchronous communication technology poses for search engines. At present, most full-text search engine crawlers use algorithms based on hyperlink analysis. Such a crawler searches only the HTML content of a page and ignores the code in the script region, yet asynchronous communication is implemented precisely through that script code. Because a large number of hyperlinks are hidden in the script region, existing search engine crawlers must be improved so that they search the script code and extract the hyperlinks hidden there. This paper proposes an approach that runs in a script code execution environment, takes advantage of the Windows message mechanism, and simulates clicks on script functions to extract hyperlinks. A second problem is that, under asynchronous communication, the page returned to the user is not a complete, integrated page, so its information cannot be stored directly. To solve this, the paper loads the source page in which the hyperlinks are located and uses the partial refresh mechanism to save the refreshed page.
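The gap described above, where links live in script code rather than in `<a href>` attributes, can be made concrete with a minimal Python sketch. The sketch swaps in a simpler technique than the paper's. It is a static, regex-based scan of the script region and does not execute scripts, simulate clicks, or use Windows messages. The URL pattern, the names `LinkCollector` and `extract_links`, and the sample page are all hypothetical, chosen only for illustration.

```python
import re
from html.parser import HTMLParser

# Heuristic (assumption of this sketch): a quoted string literal in script code is
# treated as a link if it is an absolute http(s) URL or a path ending in a common
# page extension, optionally followed by a query string.
URL_LITERAL = re.compile(
    r"""["']((?:https?://[^"'\s]+)|(?:[\w./?=&%-]+\.(?:html?|php|jsp|aspx?)(?:\?[^"'\s]*)?))["']"""
)


class LinkCollector(HTMLParser):
    """Separates ordinary <a href> links from code found in the script region."""

    def __init__(self):
        super().__init__()
        self.href_links = []   # what an href-only (hyperlink-analysis) crawler sees
        self.script_code = []  # <script> bodies, on* handlers, javascript: URLs
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if tag == "a" and name == "href" and not value.startswith("javascript:"):
                self.href_links.append(value)
            elif name.startswith("on") or value.startswith("javascript:"):
                self.script_code.append(value)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script>...</script> is kept for scanning.
        if self._in_script:
            self.script_code.append(data)


def extract_links(html):
    """Return (href_links, script_links); script_links excludes href duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    script_links = []
    for code in collector.script_code:
        for url in URL_LITERAL.findall(code):
            if url not in script_links and url not in collector.href_links:
                script_links.append(url)
    return collector.href_links, script_links


# Hypothetical sample page: one plain link and three links reachable only via script.
SAMPLE_PAGE = """
<html><body>
<a href="about.html">About</a>
<a href="javascript:loadPage('news.php?id=3')">News</a>
<div onclick="location.href='list.jsp'">List</div>
<script>
  function more() { xhr.open("GET", "http://example.com/more.html", true); }
</script>
</body></html>
"""

if __name__ == "__main__":
    href_links, script_links = extract_links(SAMPLE_PAGE)
    print("href only:  ", href_links)
    print("from script:", script_links)
```

On the sample page, the href-only list contains just `about.html`. The script scan also recovers `news.php?id=3`, `list.jsp` and `http://example.com/more.html`. A pattern match of this kind cannot see URLs that are assembled by string concatenation or computed at runtime. An execution-based approach such as the one proposed here can, in principle, reach those as well.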
Keywords: asynchronous communication; search engine; crawler
Yulian Fei, Min Wang, Wenjuan Chen
Zhejiang Gongshang University, No. 18 Xuezheng Str., Xiasha University Town, Hangzhou, Zhejiang, China 310035
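Conference type: International conference
Conference: 4th Asia Information Retrieval Symposium (AIRS 2008)
Location: Harbin, China
Language: English
Pages: 412-417
Online date: 2008-01-16 (date first posted on the Wanfang platform; not the paper's publication date)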