Inductive data mining: automatic generation of decision trees from data for QSAR modelling and process historical data analysis

Abstract:

A new inductive data mining method for the automatic generation of decision trees from data (GPTree) is presented. Other decision tree induction techniques are based on recursive partitioning, which employs greedy searches to choose the best splitting attribute and value at each node and therefore will necessarily miss regions of the search space; GPTree can overcome this problem. In addition, the approach is extended to a new method (YAdapt) that models the original continuous endpoint by adaptively finding suitable ranges to describe the endpoint during the tree induction process. This removes the need for discretization prior to tree induction and allows the ordinal nature of the endpoint to be taken into account in the models built. A strategy for further improving predictive performance on previously unseen data is also investigated; it uses multiple decision trees, i.e. a decision forest, and a majority voting strategy to give predictions (GPForest). The methods were applied to QSAR (quantitative structure-activity relationship) modelling for eco-toxicity prediction of chemicals and to the analysis of a historical database for a wastewater treatment plant.
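
The majority-voting step of a decision forest mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' GPForest implementation: the trees below are hypothetical hand-written rules rather than trees evolved by genetic programming, and the names `majority_vote`, `tree_a`, `tree_b`, `tree_c`, the class labels and the feature values are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A "tree" here is any callable that maps a feature vector to a class label.
Tree = Callable[[Sequence[float]], Hashable]


def majority_vote(trees: Sequence[Tree], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the largest number of trees.

    Ties are resolved in favour of the label that was voted for first.
    """
    if not trees:
        raise ValueError("the forest must contain at least one tree")
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical toy trees, each a single hand-written split.
def tree_a(x):
    return "high" if x[0] > 2.0 else "low"


def tree_b(x):
    return "high" if x[1] > 0.5 else "low"


def tree_c(x):
    return "high" if x[0] + x[1] > 3.0 else "low"


forest = [tree_a, tree_b, tree_c]
prediction = majority_vote(forest, [2.5, 0.1])  # two of three trees vote "low"
```

An odd number of trees avoids ties in a two-class problem; with more classes a tie-breaking rule such as the first-seen rule above has to be chosen explicitly.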

Keywords: inductive data mining; decision trees; genetic programming; QSAR; process historical data analysis

Authors: Chao Y Ma, Frances V Buontempo, Xue Z Wang

Affiliation: Institute of Particle Science and Engineering, School of Process, Environmental and Materials Engineering, University of Leeds, Leeds LS2 9JT, UK

Conference type: International conference

Conference name: International Conference on Modelling, Identification and Control

Conference location: Shanghai

Conference language: English

Online publication date: 2008-06-29 (date first made available on the Wanfang platform; not the paper's publication date)
