THE DESIGN AND IMPLEMENTATION OF THE CRAWLER-INAR
This paper discusses the design and implementation of a web crawler, Inar, written in C++ and run on Linux. It is a single-threaded crawler based on asynchronous I/O, and it is still under development. The paper describes the architecture of the crawler and discusses the design and function of each of its components in detail. For some design problems that we met in practice, such as the design of the URL queues and of the hash algorithm, we propose our solutions.
Crawler; single thread; asynchronous I/O; web
YU-XIN DING, XIAO-LONG WANG, LE-BIN LIN, QI ZHANG, YONG-HUI WU
Department of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen 518055, China
International conference
2006 International Conference on Machine Learning and Cybernetics (IEEE Fifth Conference on Machine Learning and Cybernetics)
Dalian
English
4527-4530
2006-08-13 (date first made available online on the Wanfang platform; this does not represent the paper's publication date)