A NEW ALGORITHM FOR TEXT CLUSTERING BASED ON PROJECTION PURSUIT
The Vector Space Model (VSM) is commonly used to represent text features in text mining, but its very high dimensionality obscures the structure of the text set and makes computation expensive. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection direction and projects text feature vectors from the high-dimensional space into a low-dimensional space (1 to 3 dimensions). The linear and non-linear structures and features of the original high-dimensional data can be expressed by their projection weights in the optimal projection direction. The optimal projection direction is found with a genetic algorithm, and the distribution of the texts can be visualized. Unlike k-means clustering, projection pursuit based text clustering does not require the number of clusters to be set in advance, and unlike latent semantic analysis, which discovers only linear structure, it reveals non-linear structure. Experiments demonstrate that the algorithm clusters texts effectively.
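The search described in the abstract can be illustrated with a minimal sketch. The abstract does not state which projection index or which genetic operators the authors use, so everything specific here is an assumption: the index (standard deviation of the projections multiplied by a local-density term), the blend crossover, the Gaussian mutation, and the names `projection_index`, `ga_search`, and `make_toy_corpus` are all hypothetical, and the data is a synthetic stand-in for VSM document vectors.

```python
import numpy as np

def make_toy_corpus(n_per_group=30, n_features=20, seed=0):
    """Synthetic stand-in for VSM vectors: two groups of non-negative rows."""
    rng = np.random.default_rng(seed)
    g1 = np.abs(rng.normal(0.0, 0.3, size=(n_per_group, n_features)))
    g2 = np.abs(rng.normal(0.0, 0.3, size=(n_per_group, n_features)))
    g2[:, : n_features // 2] += 1.0   # second group is heavier on half the terms
    return np.vstack([g1, g2])

def projection_index(a, X, radius_scale=0.1):
    """Assumed index: overall spread of the projections times local density."""
    z = X @ a                               # 1-D projection of every document
    s = z.std()                             # spread of the projected points
    r = np.abs(z[:, None] - z[None, :])     # pairwise distances after projection
    R = radius_scale * s + 1e-12            # window radius for "local" neighbours
    d = np.clip(R - r, 0.0, None).sum()     # only pairs closer than R contribute
    return s * d

def ga_search(X, pop_size=40, generations=60, mutation_scale=0.1, seed=0):
    """Simple elitist genetic search for a unit direction maximizing the index."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, X.shape[1]))
    pop /= np.linalg.norm(pop, axis=1, keepdims=True)
    for _ in range(generations):
        fitness = np.array([projection_index(a, X) for a in pop])
        elite = pop[np.argsort(fitness)[::-1][: pop_size // 2]]  # keep best half
        n_children = pop_size - len(elite)
        i = rng.integers(0, len(elite), size=n_children)
        j = rng.integers(0, len(elite), size=n_children)
        w = rng.random((n_children, 1))
        children = w * elite[i] + (1.0 - w) * elite[j]           # blend crossover
        children += rng.normal(scale=mutation_scale, size=children.shape)
        pop = np.vstack([elite, children])
        norms = np.linalg.norm(pop, axis=1, keepdims=True)
        pop /= np.maximum(norms, 1e-12)                          # back to unit length
    fitness = np.array([projection_index(a, X) for a in pop])
    return pop[np.argmax(fitness)], float(fitness.max())

if __name__ == "__main__":
    X = make_toy_corpus()
    direction, score = ga_search(X)
    z = X @ direction          # projected values; gaps in sorted z suggest clusters
    print(score, np.sort(z))
```

In this sketch, clusters would be read off from gaps in the sorted projected values rather than from a preset cluster count, which mirrors the contrast with k-means drawn in the abstract. Projecting onto two or three directions for visualization would need additional directions, which the sketch does not attempt.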
Text clustering; Projection pursuit; Dimension reduction; Genetic algorithm
MAO-TING GAO; ZHENG-OU WANG
Computer Science Department, Shanghai Maritime University, Shanghai 200135, China; Institute of Systems Engineering, Tianjin University, Tianjin 300072, China
International conference
2007 International Conference on Machine Learning and Cybernetics (IEEE 6th International Conference on Machine Learning and Cybernetics)
Hong Kong
English
3401-3405
2007-08-19 (date first posted online on the Wanfang platform; does not represent the paper's publication date)