Matrix: A Large-scale Data Mining System for Structured Data

Abstract:

We present the design and implementation of Matrix, a hybrid data mining system for manipulating massive structured data. Parallel relational databases (PRDB) and MapReduce (MR) are two technologies commonly used for data mining, each with its own advantages and limitations. Matrix aims to inherit the best of both worlds: the efficiency of PRDB for handling structured data and the scalability of MR for parallel computing. Specifically, we address two key challenges in the system design and implementation: (a) scaling storage for massive structured data while guaranteeing query performance, and (b) providing a parallel computing model over PRDB that ensures scalability. The performance of Matrix has been evaluated through extensive experiments. Our results demonstrate that such a hybrid system can provide scalable and efficient data mining services for large-scale structured data.

Keywords: MapReduce; PRDB; Data Mining; Parallel Computing

Authors: Fuhan Chen, Hao Yin

Affiliation: Computer Science & Technology, Tsinghua National Laboratory of Information Science and Technology, Beijing, China

Conference type: International conference

Conference name: 2011 International Conference on Database and Data Mining (ICDDM 2011)

Conference location: Sanya

Conference language: English

Pages: 176-180

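Online publication date: 2011-03-25 (the date the record first went online on the Wanfang platform; this is not the paper's publication date)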
