Data Mining of Mass Storage based on Cloud Computing
Cloud computing is an elastic computing model in which users lease resources from a rentable infrastructure. It is gaining popularity because of its lower cost, high reliability and high availability. To exploit the large capacity of cloud computing, this paper applies it to the fields of data mining and machine learning. As one of the most influential open competitions in machine learning, the Netflix Prize, with its mass-storage dataset, drove thousands of teams across the world to attack the problem. The final winner was the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10%. Their solution is an ensemble of a large number of models, each of which specializes in a different aspect of the data. Among these models, k-nearest neighbors (KNN) and the Restricted Boltzmann Machine (RBM) are reported to be the two most important and successful. We therefore build two predictors, one based on each of these models, in order to test their performance on cloud computing platforms. The results show that KNN achieves a root mean square error (RMSE) of 0.9468 after Global Effect (GE) data preprocessing, which is better than Cinematch's RMSE of 0.951. The RMSE of the RBM algorithm is about 0.9670 on the raw dataset, and it can be further improved with the KNN model.
Cloud Computing; Mass Storage; Data Mining
Jianzong Wang, Jiguang Wan, Zhuo Liu, Peng Wang
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Wuhan National Laboratory for Optoelectronics, Wuhan 430073, China
International conference
The Ninth International Conference on Grid and Cloud Computing (GCC 2010)
Nanjing
English
426-431
2010-11-01 (date first posted online on the Wanfang platform; does not represent the paper's publication date)
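The RMSE figures quoted in the abstract can be read against the standard root-mean-square-error definition. The following is a minimal sketch of that definition, not the authors' implementation: the function names rmse and relative_improvement are hypothetical, and the toy ratings are invented purely for illustration. Only the two RMSE values 0.951 and 0.9468 are taken from the abstract.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Invented toy ratings on a 1-5 scale (not from the Netflix dataset).
toy_predicted = [3.5, 4.0, 2.0, 5.0]
toy_actual = [4, 4, 1, 5]
toy_rmse = rmse(toy_predicted, toy_actual)

# RMSE values quoted in the abstract: Cinematch 0.951, KNN after GE 0.9468.
knn_gain = relative_improvement(0.951, 0.9468)
print(f"toy RMSE = {toy_rmse:.4f}, KNN gain over Cinematch = {knn_gain:.2%}")
```

Under this definition a lower RMSE means better predictions, so the KNN result after GE preprocessing is a relative improvement of a little under half a percent over the Cinematch figure given in the abstract.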