Conference Topic

DFSSM Based Web Text Clustering Algorithm

Complex-type data are difficult to represent and manage because of both their complexity and their sheer volume. A key challenge in data mining is mining richly structured datasets such as Web pages. In this paper, we propose a generalized representation method for complex-type data, and then a Web text clustering algorithm (WTCA) based on it. The algorithm consists of a SOM training stage and a clustering stage. It can distinguish the most meaningful features from the Concept Space without an evaluation function. We applied the algorithm to the Chinese Modern Long-distance Education Network and compared it with several popular clustering algorithms. The experimental results show that the average accuracy of WTCA is higher than that of the three other algorithms.
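The two stages named in the abstract (SOM training, then clustering) can be illustrated with a minimal sketch. This is not the paper's WTCA or its DFSSM representation, whose details the abstract does not give. The term-frequency features, the one-dimensional map, the Gaussian neighbourhood, and all function names and parameters (`vectorize`, `train_som`, `cluster`, `n_nodes`, `lr0`, `radius0`) are assumptions chosen only to show the general shape of SOM-based text clustering.

```python
import math
import random
from collections import Counter


def vectorize(docs):
    """Turn whitespace-tokenized documents into L2-normalized term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def best_matching_unit(weights, vec):
    """Index of the SOM node whose weight vector is closest to vec (squared Euclidean)."""
    dists = [sum((w - x) ** 2 for w, x in zip(node, vec)) for node in weights]
    return dists.index(min(dists))


def train_som(vectors, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Training stage (assumed form): fit a 1-D SOM with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)  # neighbourhood shrinks over time
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for i in order:
            vec = vectors[i]
            bmu = best_matching_unit(weights, vec)
            for j, node in enumerate(weights):
                # Nodes near the winner on the 1-D map are pulled more strongly.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    node[k] += lr * h * (vec[k] - node[k])
    return weights


def cluster(weights, vectors):
    """Clustering stage (assumed form): label each document with its best-matching node."""
    return [best_matching_unit(weights, v) for v in vectors]


if __name__ == "__main__":
    # Hypothetical toy corpus, not data from the paper.
    docs = [
        "distance education course online",
        "online course education network",
        "football match score goal",
        "goal score football league",
    ]
    _, vecs = vectorize(docs)
    som = train_som(vecs)
    print(cluster(som, vecs))
```

Each map node acts as a cluster prototype, so the number of clusters is bounded by `n_nodes`. A real Web text pipeline would also need tokenization suited to the language and some form of feature selection, both of which the sketch leaves out.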

Data mining; Web text mining; clustering analysis; SOM

Rong Qian, Shiyuan Zhang, Kejun Zhang

Department of Computer Science, Beijing Electronic Science and Technology Institute, Beijing 100070, China

International conference

2011 3rd International Conference on Computer and Network Technology (ICCNT 2011) (2011 3rd IEEE International Conference on Computer and Network Technology)

Taiyuan

English

565-570

2011-02-26 (date first posted online on the Wanfang platform; not the paper's publication date)