Conference Topic

Learning Decision Trees from Distributed Datasets

Decision trees are an important data mining tool with many applications. Like many classification techniques, decision tree induction processes the entire database to produce a generalization of the data that can subsequently be used for classification. Distributed databases are not amenable to such a global approach to generalization. This paper describes an architecture for decision tree induction from distributed datasets, which comprises a configuration manager, data retrieval from the distributed sources, data pruning, and integration of the partial decision trees and data. For data retrieval, we explore a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; we then devise a pruning algorithm to optimize the data retrieval; finally, we integrate the distributed partial results into the final decision tree.
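The abstract does not spell out how a traditional learner is transformed into a distributed one. A minimal sketch of one common reading follows, assuming horizontally partitioned data (every site holds rows with the same attributes) and assuming each site returns only class-count tables rather than raw rows. The names `local_counts` and `best_split`, and the toy data, are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def local_counts(rows, attributes, label):
    """Run at one site: count (attribute, value, class) co-occurrences."""
    counts = Counter()
    for row in rows:
        for attr in attributes:
            counts[(attr, row[attr], row[label])] += 1
    return counts


def entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    counts = list(class_counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def best_split(merged, attributes):
    """Run at the coordinator: pick the attribute with the highest
    information gain, using only the merged count tables."""
    def gain(attr):
        by_value = {}          # attribute value -> Counter of classes
        by_class = Counter()   # overall class distribution
        for (a, value, cls), n in merged.items():
            if a == attr:
                by_value.setdefault(value, Counter())[cls] += n
                by_class[cls] += n
        total = sum(by_class.values())
        remainder = sum(
            sum(c.values()) / total * entropy(c.values())
            for c in by_value.values()
        )
        return entropy(by_class.values()) - remainder

    return max(attributes, key=gain)


# Toy data held at two separate sites (illustrative only).
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
site_b = [
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
attributes = ["outlook", "windy"]

# Each site sends its counts; the coordinator adds them up.
merged = local_counts(site_a, attributes, "play") + local_counts(site_b, attributes, "play")
chosen = best_split(merged, attributes)
```

Because counts are additive, the merged table equals the one a centralized learner would compute over the pooled rows, so under these assumptions the chosen split is the same as in the non-distributed case.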

Decision Trees; Data Retrieval; Pruning; Data Integration

Xie Hongxia, Shi Liping, Meng Fanrong, Wang Chun

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu; School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu; SINOPEC Pipeline Storage & Transportation Corporation, Xuzhou, Jiangsu, 221000, China

International conference

2008 International Symposium on Distributed Computing and Applications for Business Engineering and Science

Dalian

English

96-100

2008-07-27 (date first posted online on the Wanfang platform; not the paper's publication date)