Matrix: A Large-scale Data Mining System for Structured Data
We present the design and implementation of Matrix, a hybrid data mining system for manipulating massive structured data. Parallel relational databases (PRDB) and MapReduce (MR) are two common technologies for data mining, each with its own advantages and limitations. Matrix aims to inherit the best of both worlds: the efficiency of PRDB for handling structured data and the scalability of MR for parallel computing. Specifically, we address two key challenges in the system design and implementation: (a) scaling storage for massive structured data while guaranteeing query performance, and (b) building a parallel computing model over PRDB that ensures scalability. We evaluate the performance of Matrix through extensive experiments. Our results demonstrate that such a hybrid system can provide a scalable and efficient data mining service for large-scale structured data.
MapReduce; PRDB; Data Mining; Parallel Computing
Fuhan Chen, Hao Yin
Computer Science & Technology, Tsinghua National Laboratory of Information Science and Technology, Beijing, China
International conference
2011 International Conference on Database and Data Mining (ICDDM 2011)
Sanya
English
176-180
2011-03-25 (date first posted online on the Wanfang platform; this is not the paper's publication date)