A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering

Abstract:

The Fuzzy Kernel C-Means (FKCM) algorithm can significantly improve accuracy over the classical Fuzzy C-Means algorithm when the data are not linearly separable, are high-dimensional, or form overlapping clusters in the input space. Despite these advantages, several drawbacks limit its use in real-world applications: it can converge to local optima, it is sensitive to outliers, the number of clusters c must be specified in advance, and it converges slowly. To overcome these drawbacks, semi-supervised learning and a validity index are employed. Semi-supervised learning uses a small amount of labeled data to guide the clustering of a large body of unlabeled data, which helps FKCM avoid the drawbacks listed above. The number of clusters greatly affects clustering performance, and the optimal number cannot be assumed in advance, especially for large text corpora. A validity function makes it possible to determine a suitable number of clusters during the clustering process. A sparse storage format combined with a scatter-and-gather strategy saves considerable storage space and computation time. Experimental results on the Reuters-21578 benchmark dataset demonstrate that the proposed algorithm is more flexible and more accurate than the state-of-the-art FKCM.
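
The abstract does not give the update equations, so the following is only a minimal sketch of a common kernel fuzzy c-means variant (Gaussian kernel, prototypes kept in the input space), not the authors' algorithm. The function names, the `sigma` and `m` defaults, the explicit `init_centers` argument (which could, for instance, come from a few labeled documents), and the use of Bezdek's partition coefficient as the validity index are all assumptions made for illustration.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian (RBF) kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)


def memberships(points, centers, m, sigma, eps=1e-12):
    """Fuzzy memberships: u[k][i] is proportional to (1 - K(x_k, v_i))^(-1/(m-1))."""
    u = []
    for x in points:
        # eps avoids division by zero when a point coincides with a center.
        w = [(1.0 - gaussian_kernel(x, v, sigma) + eps) ** (-1.0 / (m - 1.0))
             for v in centers]
        total = sum(w)
        u.append([wi / total for wi in w])
    return u


def kernel_fcm(points, init_centers, m=2.0, sigma=1.0, max_iter=100, tol=1e-6):
    """Kernel fuzzy c-means sketch; the number of clusters is len(init_centers).

    Returns (centers, u), where u[k][i] is the membership of points[k]
    in cluster i and each row of u sums to 1.
    """
    centers = [list(v) for v in init_centers]
    for _ in range(max_iter):
        u = memberships(points, centers, m, sigma)
        new_centers = []
        for i, v in enumerate(centers):
            # Each point is weighted by u^m times its kernel similarity to v.
            weights = [(u[k][i] ** m) * gaussian_kernel(x, v, sigma)
                       for k, x in enumerate(points)]
            denom = sum(weights) + 1e-12
            new_centers.append([
                sum(wk * x[d] for wk, x in zip(weights, points)) / denom
                for d in range(len(v))
            ])
        shift = max(abs(a - b)
                    for v, nv in zip(centers, new_centers)
                    for a, b in zip(v, nv))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the final centers.
    return centers, memberships(points, centers, m, sigma)


def partition_coefficient(u):
    """Bezdek's partition coefficient: mean of squared memberships, in [1/c, 1]."""
    return sum(x * x for row in u for x in row) / len(u)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    centers, u = kernel_fcm(data, init_centers=[(0, 0), (5, 5)], sigma=2.0)
    print(centers)
    print(partition_coefficient(u))
```

One way to use a validity index of this kind is to run the clustering for several candidate values of c and keep the value with the best score; whether the paper selects c this way is not stated in the abstract.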

Keywords: Text clustering; Semi-supervised learning; Fuzzy Kernel C-Means; Kernel; Validity index

Authors: Yingshun Yin, Xiaobin Zhang, Baojun Miao, Lili Gao

Affiliations: School of Computer Science, Xi'an Polytechnic University, Shaanxi, China; School of Mathematical Science, Xuchang University, Henan, China

Conference type: International conference

Conference name: 4th Asia Information Retrieval Symposium (AIRS 2008)

Conference location: Harbin

Conference language: English

Pages: 418-423

Online publication date: 2008-01-16 (date first posted on the Wanfang platform; not the publication date of the paper)
