Hadoop-HBase for Large-Scale Data

Abstract:

Today we are inundated with digital data, yet we remain poor at managing and processing it. It is becoming increasingly difficult to store and analyze data efficiently and economically with conventional database management tools. Moreover, the types of data appearing in databases are changing: binary large objects are now a standard, integral part of any database. Researchers all over the globe are grappling with the analysis of these ultra-large databases. Apache HBase is one attempt to address this problem. HBase is a NoSQL distributed database built on top of the Hadoop Distributed File System (HDFS). In this paper, we present an evaluation of a hybrid architecture in which HDFS holds the non-textual data, such as images, and the location of that data is stored in HBase. This hybrid architecture enables faster search and retrieval of data, a growing need in any organization that is flooded with data. The paper evaluates the performance of random reads and random writes of data-location information in HBase, together with the corresponding retrieval and storage of the data in HDFS. We also present a comparative study of the HBase-HDFS architecture and a MySQL-HDFS architecture.

Keywords: large-scale data; distributed storage; Hadoop; HDFS; MapReduce; HBase; NoSQL database

Author: Mehul Nalin Vora

Affiliation: Innovation Labs, PERC, Tata Consultancy Services (TCS) Ltd., Mumbai, India

Conference type: International conference

Conference name: 2011 International Conference on Computer Science and Network Technology (ICCSNT 2011)

Conference location: Harbin

Conference language: English

Pages: 601-605

Online publication date: 2011-12-24 (the date the paper first appeared on the Wanfang platform, not the date the paper was published)
