Topic-Constrained Hierarchical Clustering for Document Datasets
In this paper, we propose topic-constrained hierarchical clustering, which organizes document datasets into hierarchical trees consistent with a given set of topics. The proposed algorithm is based on a constrained agglomerative clustering framework and a semi-supervised criterion function that simultaneously emphasizes the relationship between documents and topics and the relationships among the documents themselves. The experimental evaluation shows that our algorithm outperformed the traditional agglomerative algorithm by 7.8% to 11.4%.
Constrained hierarchical clustering; Semi-supervised learning; Criterion functions
Ying Zhao
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
International conference
6th International Conference on Advanced Data Mining and Applications (ADMA 2010)
Chongqing
English
181-192
2010-11-19 (date first posted online on the Wanfang platform; not the paper's publication date)