Entropy-based Sequence Clustering Algorithm for Analyzing Software Fault Feature
Sequence clustering is important for analyzing software faults, but the similarity measures used by existing sequence clustering methods are not precise enough for clustering software faults. In this paper, a software fault feature clustering algorithm called ECA is proposed. In ECA, the similarity between fault sequences is defined by a global and local similarity measure (CLSM), which considers both the items a sequence contains and the order in which those items occur. Clusters are formed according to the entropy of each sequence, which is computed from the global and local similarity. The sequence with the smallest entropy is selected as the centroid of each cluster, and each unselected sequence is then assigned to the cluster whose centroid it is most similar to. The optimal number of clusters is determined by the average silhouette coefficient. To analyze the fault type, each sequence under analysis is matched against every cluster and assigned to the most similar one. Experimental results show that ECA improves clustering precision and narrows the matching scope for software fault features.
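The abstract names the steps of ECA but not their formulas, so the following is only a toy sketch of that pipeline under stated assumptions, not the authors' method. Everything here is hypothetical: `similarity` stands in for CLSM as a weighted mix (`alpha`) of a global term (Jaccard overlap of item sets) and a local, order-aware term (longest common subsequence length); `entropy` is the Shannon entropy of a sequence's normalized similarities to the other sequences; the `k` lowest-entropy sequences are taken as centroids; and `k` is chosen by the average silhouette coefficient with distance `1 - similarity`.

```python
from math import log


def lcs_len(a, b):
    """Length of the longest common subsequence of a and b (order-aware)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def similarity(a, b, alpha=0.5):
    """Hypothetical stand-in for CLSM: alpha * global + (1 - alpha) * local.

    global: Jaccard overlap of the item sets (which items occur)
    local:  LCS length / longer length (order in which items occur)
    """
    sa, sb = set(a), set(b)
    glob = len(sa & sb) / len(sa | sb) if sa | sb else 1.0
    local = lcs_len(a, b) / max(len(a), len(b)) if a or b else 1.0
    return alpha * glob + (1 - alpha) * local


def entropy(i, sim):
    """Assumed definition: entropy of sequence i's normalized similarities to all others."""
    row = [s for j, s in enumerate(sim[i]) if j != i]
    total = sum(row)
    if total == 0:
        return 0.0  # isolated sequence: no similarity mass to spread
    return -sum(p * log(p) for p in (s / total for s in row) if p > 0)


def avg_silhouette(labels, sim):
    """Average silhouette coefficient, using distance = 1 - similarity."""
    n, scores = len(labels), []
    for i in range(n):
        groups = {}
        for j in range(n):
            if j != i:
                groups.setdefault(labels[j], []).append(1 - sim[i][j])
        own = groups.pop(labels[i], [])
        if not own or not groups:
            scores.append(0.0)  # singleton or single cluster: 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in groups.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def eca_sketch(seqs, alpha=0.5):
    """Return (score, centroids, labels) for the k in 2..n-1 with the best silhouette."""
    n = len(seqs)
    sim = [[similarity(a, b, alpha) for b in seqs] for a in seqs]
    order = sorted(range(n), key=lambda i: entropy(i, sim))  # lowest entropy first
    best = None
    for k in range(2, n):
        centroids = order[:k]
        # Each sequence joins the centroid it is most similar to.
        labels = [max(range(k), key=lambda c: sim[i][centroids[c]]) for i in range(n)]
        for c, i in enumerate(centroids):
            labels[i] = c  # a centroid always belongs to its own cluster
        score = avg_silhouette(labels, sim)
        if best is None or score > best[0]:
            best = (score, centroids, labels)
    return best


def classify(seq, seqs, centroids, alpha=0.5):
    """Assign a new fault sequence to the cluster whose centroid is most similar."""
    return max(range(len(centroids)),
               key=lambda c: similarity(seq, seqs[centroids[c]], alpha))


faults = [
    ["open", "read", "crash"], ["open", "read", "read", "crash"], ["open", "crash"],
    ["alloc", "write", "leak"], ["alloc", "write", "write", "leak"], ["alloc", "leak"],
]
score, centroids, labels = eca_sketch(faults)
print(round(score, 3), centroids, labels)
print(classify(["open", "read", "crash", "crash"], faults, centroids))
```

The event names in `faults` are invented toy data. `eca_sketch` needs at least three sequences (otherwise it returns `None`), and it recomputes nothing between values of `k` except the cluster assignment, since the entropy ordering does not depend on `k`.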
software fault feature; sequence; entropy; clustering
Yanyan Wang, Jiadong Ren, Jiaxin Liu, Jiadong Ren, Yanning Wang
College of Information Science and Engineering, Yanshan University, Qinhuangdao City, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing City, China; College of Sciences, Yanshan University, Qinhuangdao City, China
International conference
Chengdu
English
793-797
2010-12-17 (date first posted online on the Wanfang platform; not the paper's publication date)