Learning Decision Trees from Distributed Datasets
Decision trees are an important data mining tool with many applications. Like many classification techniques, decision tree induction processes the entire database to produce a generalization of the data that can later be used for classification. Distributed databases are not amenable to such a global approach to generalization. This paper describes an architecture for decision tree induction from distributed datasets, comprising a configuration manager, data retrieval from distributed sources, data pruning, and integration of partial decision trees and data. For data retrieval, we explore a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; we then devise a pruning algorithm to optimize data retrieval; finally, we integrate the distributed partial results into the final decision tree.
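The abstract does not spell out how a traditional learner is adapted to distributed data. One common way to do this, shown here only as a minimal sketch and not as the authors' algorithm, is to have each site send per-attribute class counts instead of raw rows, so that a coordinator can compute a split criterion from the merged counts. The sketch assumes horizontally partitioned data (every site holds the same attributes) and uses information gain as the criterion; the function names `local_counts`, `entropy`, and `info_gain` and the toy records are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(rows, attr, label):
    """Run at one site: count (attribute value, class) pairs.

    Only these counts leave the site, never the raw rows.
    """
    return Counter((r[attr], r[label]) for r in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum(c / total * log2(c / total) for c in class_counts.values() if c)


def info_gain(counts):
    """Information gain of splitting on the attribute, from merged counts only."""
    by_class = Counter()
    by_value = {}
    for (value, cls), n in counts.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(by_class.values())
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(by_class) - remainder


# Hypothetical toy data held at two sites.
site_a = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
site_b = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"},
          {"outlook": "rain", "play": "no"}]

# Coordinator: merge the per-site counts, then score the candidate split.
merged = local_counts(site_a, "outlook", "play") + local_counts(site_b, "outlook", "play")
pooled = local_counts(site_a + site_b, "outlook", "play")
gain = info_gain(merged)
```

Because the counts are additive, `merged` equals the counts obtained from the pooled rows, so the split chosen from distributed counts matches the one a centralized learner would choose under the same criterion.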
Decision Trees; Data Retrieval; Pruning; Data Integration
Xie Hongxia, Shi Liping, Meng Fanrong, Wang Chun
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 2; School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Ji; School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 2; SINOPEC Pipeline Storage & Transportation Corporation, Xuzhou, Jiangsu, 221000, China
International conference
Dalian
English
96-100
2008-07-27 (date first posted online on the Wanfang platform; not the paper's publication date)