Reputation-based Contents Crawling in Web Archiving System

Abstract:

The size of web archives is increasing exponentially, and many national libraries are making efforts to preserve born-digital scientific, artistic, and cultural content. However, crawling and storing such a huge volume of digital information raises various problems that are very hard to resolve from social, legal, and technical viewpoints. In this paper, from the viewpoint of long-term preservation of digital content with a good reputation for trustworthiness, uniqueness, and value, we discuss strategies for preserving the monotonically increasing digital content on web servers. According to experimental results with our reputation model, it becomes possible to crawl socially valuable content for archiving.

Keywords: Web Archive; Web Crawling; Reputation Management

Author: Hiroyuki Kawano

Affiliation: Nanzan University, Aichi 4890863

Conference type: International conference

Conference name: The Seventh International Symposium (ISORA08) (Seventh International Symposium on Operations Research and Its Applications)

Conference location: Lijiang, Yunnan

Conference language: English

Pages: 317-324

Online publication date: 2008-10-31 (date first posted on the Wanfang platform; not the paper's publication date)
