Web Document Clustering Research Based on Granular Computing
This paper presents a method for web document clustering based on granular computing (WDCGrc). The method computes the weight of each word in a document using the TF-IDF principle. A combination of a document threshold and the average weight value is then used to reduce dimensionality and extract the keywords of each document. The paper establishes a transformation between document keywords and binary granules, and applies a granular-computing-based association rule algorithm to obtain frequent itemsets between documents. Drawing on set theory, the number of words shared between documents is taken as the document similarity, from which the clustering result is obtained. Experiments show that the method is practical and feasible and yields good clustering quality.
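The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the exact thresholds or the association rule algorithm. The following are assumptions made only for illustration: keywords are the words whose TF-IDF weight is at least the document's average weight, each document's keyword set is encoded as an integer bitmask (a "binary granule"), similarity is the number of set bits in the bitwise AND of two granules, and documents are grouped whenever they share at least a hypothetical `min_shared` keywords. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs):
    """For each tokenized, non-empty document, keep the words whose TF-IDF
    weight is positive and at least the document's average weight
    (an assumed threshold rule)."""
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for d in docs for w in set(d))
    result = []
    for d in docs:
        tf = Counter(d)
        weights = {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        avg = sum(weights.values()) / len(weights)
        result.append({w for w, x in weights.items() if x > 0 and x >= avg})
    return result


def to_granules(keyword_sets):
    """Encode each keyword set as an integer bitmask over the shared vocabulary."""
    vocab = sorted(set().union(*keyword_sets))
    bit = {w: 1 << i for i, w in enumerate(vocab)}
    return [sum(bit[w] for w in ks) for ks in keyword_sets]


def shared_keywords(g1, g2):
    """Number of keywords two documents have in common (popcount of the AND)."""
    return bin(g1 & g2).count("1")


def cluster(granules, min_shared):
    """Give the same label to documents linked, directly or transitively,
    by at least min_shared common keywords (an assumed grouping rule)."""
    labels = list(range(len(granules)))
    for i in range(len(granules)):
        for j in range(i + 1, len(granules)):
            if shared_keywords(granules[i], granules[j]) >= min_shared:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels
```

A typical call chain would be `cluster(to_granules(tfidf_keywords(docs)), min_shared=2)`, where `docs` is a list of token lists. The bitmask encoding is chosen because intersecting two keyword sets then reduces to a single bitwise AND, which is the operation granular-computing approaches to association rules typically rely on.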
Granular computing; Clustering; Association rules; Web documents
Zheng Shangzhi; Zhao Xiaolong; Zhang Buqun; Bu Hualong
Department of Computer Science and Technology, Chaohu University Chaohu, P.R.C Wuhan University of T Department of Computer Science and Technology, Chaohu University Chaohu, P.R.C
International conference
Second International Symposium on Electronic Commerce and Security (ISECS 2009)
Nanchang
English
1102-1106
2009-05-22 (date first posted online on the Wanfang platform; not the paper's publication date)