VRPSOFC: A Framework for Focused Crawler Using Mutation Improving Particle Swarm Optimization Algorithm
The focused crawler is a key technology of the search engine. It filters webpages with relevance algorithms until certain stopping conditions are met. Current focused crawlers are prone to topic drift and low precision while crawling. This paper therefore proposes a focused crawler framework (VRPSOFC) based on mutation-improved particle swarm optimization. First, for each topic, VRPSOFC selects three types of seed pages that tend to aggregate large numbers of web pages, chosen according to page click rate in Google search: an official website, a Wikipedia page, and a forum or video page. VRPSOFC then crawls webpages with the mutation-improved particle swarm optimization algorithm proposed in this paper, using each seed page as an initial page. Finally, experiments are conducted in a real web environment and the results are analyzed. Compared with the traditional VSM and other methods, VRPSOFC obtains more accurate URL priorities and crawls higher-quality web pages. The proposed focused crawler framework is therefore effective and important.
mutation; particle swarm algorithm; focused crawler; topic-drift; precision
Guangxia Xu, Peng Jiang, Chuang Ma, Mahmoud Daneshmand, Shaoci Xie
School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China; School of Business, Stevens Institute of Technology, Hoboken, USA
International conference
2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)
Chengdu
English
339-345
2019-05-17 (date first made available online on the Wanfang platform; does not represent the paper's publication date)