THE OPTIMIZATION OF HDFS BASED ON SMALL FILES
HDFS is a distributed file system that can process large amounts of data effectively across large clusters, and the HADOOP framework built on it has been widely used to construct large-scale, high-performance systems. However, HDFS is designed to handle large files and suffers a performance penalty when dealing with large numbers of small files. Many companies focus on cloud storage today, such as Amazon's S3, which provides data hosting. With the rapid development of the Internet, users will increasingly tend to store their data and programs on cloud computing platforms, and personal data has an obvious feature: most of it consists of small files, a demand that HDFS cannot meet well. In this article, we optimize HDFS I/O for small files. The basic idea is to let one block hold many small files and to let the datanode keep some metadata of those small files in its memory. The experiments show that our design provides better performance.
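The paper's mechanism (packing many small files into a shared block and caching their metadata in datanode memory) requires changes inside HDFS itself. Purely as an illustration of the packing idea, the sketch below uses the standard Hadoop SequenceFile API to bundle many small local files into a single HDFS file, so that one block holds many logical files; the namenode address, target path, and class name are hypothetical, and this is not the authors' implementation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.nio.file.Files;

public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical target path: one HDFS file (and thus few blocks)
        // holding all of the small input files as key/value records.
        Path target = new Path("hdfs://namenode:9000/packed/small-files.seq");

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(target),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class));
        try {
            // Each local small file becomes one record: key = file name,
            // value = file contents.
            for (String name : args) {
                byte[] data = Files.readAllBytes(new File(name).toPath());
                writer.append(new Text(name), new BytesWritable(data));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```

Unlike this client-side packing, the design described in the abstract keeps per-file metadata in the datanode's memory, so individual small files remain addressable without a separate index file.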
HADOOP, HDFS, small files, I/O
Liu Jiang, Bing Li, Meina Song
Beijing University of Posts and Telecommunications, Beijing, China
International conference
Beijing
English
912-915
2010-10-26 (date first posted on the Wanfang platform; not necessarily the paper's publication date)