STUDY OF CLUSTERING ALGORITHM BASED ON MODEL DATA

Abstract:

Clustering is a key tool in data mining and pattern recognition. Traditional clustering algorithms usually represent objects as vectors whose components describe features. In real tasks, however, the objects to be clustered may be models rather than data points, for example neural networks, decision trees, or support vector machines. This paper studies clustering algorithms for such model data. By defining an extended measure, clustering methods are studied for abstract data objects, and a framework for clustering models is presented. To validate the effectiveness of model clustering, a hierarchical model clustering algorithm is used in the experiments. The models are BP (Back Propagation) neural networks trained with the BP algorithm. Both the same-fault measure and the double-fault measure are used as pairwise measures between models, and the distances between clusters are the single link and the complete link, respectively. In this way, a subset of neural network models is drawn from each cluster, which improves the diversity of the selected models, and this subset is then combined into an ensemble. The relations among the number of clusters, the ensemble size, and the ensemble performance are also studied experimentally. Experimental results show that selecting a subset of models by model clustering improves the performance of ensemble learning.
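
The pipeline outlined in the abstract (pairwise measure between models, hierarchical clustering with single or complete link, one model taken from each cluster, then an ensemble) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: the double-fault measure is taken as the fraction of samples that both models misclassify, the distance between two models is taken as 1 minus that value, the representative of a cluster is its most accurate model, and the ensemble combines predictions by plurality vote. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def double_fault(correct_a, correct_b):
    """Fraction of samples that both models misclassify (assumed definition)."""
    both_wrong = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    return both_wrong / len(correct_a)


def cluster_models(correct, n_clusters, link="single"):
    """Agglomerative clustering of models from per-sample correctness flags.

    correct: one list of booleans per model, all of equal length.
    link: "single" (closest pair) or "complete" (farthest pair).
    Assumption: distance between two models = 1 - double_fault.
    """
    n = len(correct)
    dist = {
        (i, j): 1.0 - double_fault(correct[i], correct[j])
        for i, j in combinations(range(n), 2)
    }
    agg = min if link == "single" else max
    clusters = [[i] for i in range(n)]

    def cluster_dist(c1, c2):
        return agg(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


def select_representatives(clusters, correct):
    """Pick the most accurate model of each cluster (an assumed selection rule)."""
    return [max(c, key=lambda i: sum(correct[i])) for c in clusters]


def majority_vote(predictions):
    """Combine per-model label lists by plurality vote on each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy validation-set correctness flags for four hypothetical models.
correct = [
    [True, True, False, False],
    [True, True, False, False],
    [False, False, True, True],
    [False, True, True, True],
]
clusters = cluster_models(correct, n_clusters=2, link="complete")
chosen = select_representatives(clusters, correct)
```

In this toy setting, models that fail on the same samples end up in the same cluster, so taking one model per cluster tends to keep models whose errors do not coincide. The number of clusters directly sets the ensemble size here, which is the relation the abstract says is examined experimentally.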

Keywords: Model clustering; Measure space; Validation of clustering; Diversity

Authors: KAI LI, LI-JUAN CUI

Affiliations: School of Mathematics and Computer, Hebei University, Baoding 071002, China; Library, Hebei University, Baoding 071002, China

Conference type: International conference

Conference name: 2007 International Conference on Machine Learning and Cybernetics (IEEE 6th International Conference on Machine Learning and Cybernetics)

Conference location: Hong Kong

Conference language: English

Pages: 3961-3964

Online publication date: 2007-08-19 (date first posted on the Wanfang platform; not the paper's publication date)
