Web Document Clustering Research Based on Granular Computing

Abstract:

This paper presents a web document clustering method based on granular computing (WDCGrc). The method computes the weight of each word in a document using the TF-IDF principle. A document threshold and the average weight value are then combined to reduce dimensionality and extract the keywords of each document. The paper establishes a mapping between document keywords and binary granules, and applies an association rule algorithm based on granular computing to obtain frequent itemsets across documents. Drawing on set theory, the number of words shared by two documents is taken as their similarity, from which the clustering result is obtained. Experiments show that the method is practical and feasible and yields good clustering quality.
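The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "keep words at or above the document's average weight" rule, the restriction to two-word itemsets, and the use of Python integers as bit vectors for binary granules are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
from collections import Counter
from itertools import combinations

# Illustrative sketch only; every name and threshold rule here is an assumption.

def tfidf(docs):
    """docs: list of non-empty token lists. Returns one {word: weight} dict per document."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency of each word
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append({w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()})
    return weights

def keywords(weight):
    """Assumed reduction rule: keep words whose weight is at least the document average."""
    if not weight:
        return set()
    avg = sum(weight.values()) / len(weight)
    return {w for w, v in weight.items() if v >= avg}

def granules(keyword_sets):
    """Map each keyword to a binary granule: bit i is set if document i contains the keyword."""
    g = {}
    for i, ks in enumerate(keyword_sets):
        for w in ks:
            g[w] = g.get(w, 0) | (1 << i)
    return g

def frequent_pairs(g, min_support):
    """Keyword pairs whose granules, ANDed together, cover at least min_support documents."""
    out = {}
    for a, b in combinations(sorted(g), 2):
        both = g[a] & g[b]  # documents containing both keywords
        if bin(both).count("1") >= min_support:
            out[(a, b)] = both
    return out

def similarity(ks_a, ks_b):
    """Similarity of two documents as the number of keywords they share."""
    return len(ks_a & ks_b)
```

In this sketch, documents would be grouped by comparing `similarity` values for each pair of keyword sets; how the paper turns those counts into clusters is not stated in the abstract, so that step is left out.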

Keywords: Granular computing; Clustering; Association rules; Web documents

Authors: Zheng Shangzhi, Zhao Xiaolong, Zhang Buqun, Bu Hualong

Affiliations: Department of Computer Science and Technology, Chaohu University, Chaohu, P.R.C; Wuhan University of T; Department of Computer Science and Technology, Chaohu University, Chaohu, P.R.C

Conference type: International conference

Conference name: Second International Symposium on Electronic Commerce and Security (ISECS 2009)

Conference location: Nanchang

Conference language: English

Pages: 1102-1106

Online publication date: 2009-05-22 (date first posted on the Wanfang platform; not the paper's publication date)
