A Distributed Web Crawler Model Based on Cloud Computing
With the rapid development of the Internet, distributed web crawlers were introduced to fetch massive numbers of web pages. However, traditional distributed web crawlers balance load poorly across nodes, and the number of pages fetched does not grow linearly as crawling nodes are added. This paper proposes a distributed web crawler model that runs on the Hadoop platform. The characteristics of Hadoop guarantee the scalability of the proposed model, while HBase provides storage for the massive volume of web content data. The paper also proposes a load-balancing method based on feedback from the crawling nodes. The model is shown to perform well in both load balancing and node extension.
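The abstract does not detail the feedback-based load-balancing method, but its core idea can be sketched. The outline below is a minimal illustration, assuming each crawling node periodically reports the length of its pending-URL queue to a central dispatcher, which assigns each new URL to the node currently reporting the lightest load. All names here (Dispatcher, CrawlNode, reportLoad, assign) are hypothetical and not the authors' implementation; Java is used to match the paper's Hadoop/HBase stack.

import java.util.ArrayList;
import java.util.List;

public class Dispatcher {

    // Hypothetical per-node state: node id plus its last reported queue length.
    static final class CrawlNode {
        final String id;
        int pendingUrls; // feedback value reported by the node
        CrawlNode(String id) { this.id = id; }
    }

    private final List<CrawlNode> nodes = new ArrayList<>();

    public void register(String nodeId) {
        nodes.add(new CrawlNode(nodeId));
    }

    // Called when a node reports its current load (the feedback signal).
    public void reportLoad(String nodeId, int pendingUrls) {
        for (CrawlNode n : nodes) {
            if (n.id.equals(nodeId)) {
                n.pendingUrls = pendingUrls;
            }
        }
    }

    // Assign a URL to the least-loaded node; the caller enqueues the URL there.
    public String assign(String url) {
        CrawlNode target = nodes.get(0);
        for (CrawlNode n : nodes) {
            if (n.pendingUrls < target.pendingUrls) {
                target = n;
            }
        }
        target.pendingUrls++; // optimistic update until the next feedback report
        return target.id;
    }

    public static void main(String[] args) {
        Dispatcher d = new Dispatcher();
        d.register("node-1");
        d.register("node-2");
        d.reportLoad("node-1", 120);
        d.reportLoad("node-2", 40);
        System.out.println(d.assign("http://example.com/")); // prints node-2
    }
}

Incrementing the chosen node's count optimistically between feedback reports keeps a burst of assignments from all landing on the same node before the next report arrives.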
Keywords: web crawler; cloud computing; distributed cloud-based web crawler
Jiankun Yu, Mengrong Li, Dengyin Zhang
College of Internet of Things, Nanjing University of Posts and Telecommunications, No. 66 Xin Mofan Road; College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications
Conference type: International conference
Location: Chongqing
Language: English
Pages: 276-279
Online date: 2016-03-21 (date the paper first appeared on the Wanfang platform; not necessarily its publication date)