Hadoop-HBase for Large-Scale Data
Today we are inundated with digital data, yet we remain poor at managing and processing it. It is becoming increasingly difficult to store and analyze data efficiently and economically with conventional database management tools. Moreover, the types of data stored in databases are changing: binary large objects (BLOBs) have become a standard part of most databases. Researchers around the globe are grappling with the analysis of these ultra-large databases. Apache HBase is one such attempt: a NoSQL distributed database built on top of the Hadoop Distributed File System (HDFS). In this paper, we evaluate a hybrid architecture in which HDFS stores non-textual data such as images, while HBase stores the location of that data. This hybrid architecture enables faster search and retrieval of data, a growing need for any organization flooded with data. The paper evaluates the performance of random writes (storing data in HDFS and writing its location information to HBase) and random reads (reading the location information from HBase and retrieving the data from HDFS). We also present a comparative study of the HBase-HDFS architecture and a MySQL-HDFS architecture.
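The hybrid layout described above can be sketched in a few lines. This is a minimal in-memory model under stated assumptions, not the paper's implementation. `InMemoryBlobStore` stands in for HDFS. `KeyValueLocationTable` stands in for an HBase table. `SqlLocationTable` uses Python's `sqlite3` in place of MySQL and stands in for the relational alternative in the comparison. All class names, the `/images` directory, and the `loc:path` column are hypothetical. No real HBase or HDFS client API is used.

```python
import sqlite3


class InMemoryBlobStore:
    """Hypothetical stand-in for HDFS: maps a file path to raw bytes."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class KeyValueLocationTable:
    """Hypothetical stand-in for an HBase table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        return self.rows[row_key][column]


class SqlLocationTable:
    """Relational stand-in (sqlite3 in place of MySQL) with the same interface."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE locations (row_key TEXT, col TEXT, value TEXT, "
            "PRIMARY KEY (row_key, col))"
        )

    def put(self, row_key, column, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO locations VALUES (?, ?, ?)",
            (row_key, column, value),
        )

    def get(self, row_key, column):
        row = self.conn.execute(
            "SELECT value FROM locations WHERE row_key = ? AND col = ?",
            (row_key, column),
        ).fetchone()
        if row is None:
            raise KeyError((row_key, column))
        return row[0]


class HybridImageStore:
    """Payload goes to the blob store; only its location goes to the table."""

    def __init__(self, blobs, table, base_dir="/images"):
        self.blobs = blobs
        self.table = table
        self.base_dir = base_dir

    def store(self, image_id, data):
        path = f"{self.base_dir}/{image_id}"
        self.blobs.write(path, data)                # large binary payload -> "HDFS"
        self.table.put(image_id, "loc:path", path)  # small location record -> index table
        return path

    def fetch(self, image_id):
        path = self.table.get(image_id, "loc:path")  # random read of the location by row key
        return self.blobs.read(path)                 # then read the payload by path
```

Both location tables expose the same `put`/`get` interface. To compare the two layouts, the only change needed is to pass `SqlLocationTable()` instead of `KeyValueLocationTable()` to `HybridImageStore`. The large binary payload never passes through either index table, which is the reason for storing only its location there.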
Keywords: large-scale data; distributed storage; Hadoop; HDFS; MapReduce; HBase; NoSQL database
Mehul Nalin Vora
Innovation Labs, PERC, Tata Consultancy Services (TCS) Ltd., Mumbai, India
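International conference
Location: Harbin
Language: English
Pages: 601-605
2011-12-24 (date first posted online on the Wanfang platform; not the paper's publication date)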