
THE OPTIMIZATION OF HDFS BASED ON SMALL FILES

HDFS is a distributed file system that can process large amounts of data effectively across large clusters, and the Hadoop framework built on it has been widely used to construct large-scale, high-performance systems. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with a large number of small files. Many companies focus on cloud storage today, such as Amazon's S3, which provides data hosting. With the rapid development of the Internet, users may be increasingly inclined to store their data and programs on cloud computing platforms in the future. Personal data has an obvious feature: most of it consists of small files, so HDFS cannot meet this demand well. In this article, we optimize the HDFS I/O path for small files; the basic idea is to let one block store many small files and to let the datanode keep some metadata of these small files in its memory. Experiments show that our design provides better performance.
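To make the basic idea concrete, here is a minimal Java sketch of the packing scheme the abstract describes: many small files are appended into one block-sized container file, and an in-memory map records each file's offset and length, mirroring the datanode keeping small-file metadata in its memory. This is an illustration only; the class and method names (SmallFilePacker, put, get) are hypothetical and do not come from the paper's implementation.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/*
 * Illustrative sketch only: packs many small files into one
 * block-sized container and keeps their (offset, length) metadata
 * in memory, mirroring the paper's idea of storing several small
 * files per HDFS block while the datanode caches their metadata.
 * All names here are hypothetical, not the authors' implementation.
 */
public class SmallFilePacker {
    /* In-memory index: file name -> {offset, length} in the container. */
    private final Map<String, long[]> index = new HashMap<>();
    private final Path container;

    public SmallFilePacker(Path container) {
        this.container = container;
    }

    /* Append a small file to the container and record its location. */
    public void put(Path smallFile) throws IOException {
        byte[] data = Files.readAllBytes(smallFile);
        try (RandomAccessFile raf = new RandomAccessFile(container.toFile(), "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.write(data);
            index.put(smallFile.getFileName().toString(),
                      new long[] { offset, data.length });
        }
    }

    /* Read a small file back using only the in-memory metadata. */
    public byte[] get(String name) throws IOException {
        long[] loc = index.get(name);
        if (loc == null) {
            throw new IOException("not packed: " + name);
        }
        try (RandomAccessFile raf = new RandomAccessFile(container.toFile(), "r")) {
            raf.seek(loc[0]);
            byte[] data = new byte[(int) loc[1]];
            raf.readFully(data);
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        SmallFilePacker packer = new SmallFilePacker(Paths.get("container.bin"));
        for (String arg : args) {
            packer.put(Paths.get(arg));  // pack each input file
        }
    }
}

In a real HDFS deployment the container would correspond to a single block and the index would live in the datanode's memory; this standalone version only demonstrates the lookup-by-offset mechanism that avoids a namenode round trip per small file.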

Hadoop; HDFS; small files; I/O

Liu Jiang, Bing Li, Meina Song

Beijing University of Posts and Telecommunications, Beijing, China

International conference

2010 3rd IEEE International Conference on Broadband Network & Multimedia Technology (IC-BNMT 2010)

Beijing

English

pp. 912-915

2010-10-26 (date first posted on the Wanfang platform; not the paper's publication date)