Conference Topic

Topic-Constrained Hierarchical Clustering for Document Datasets

In this paper, we propose topic-constrained hierarchical clustering, which organizes document datasets into hierarchical trees consistent with a given set of topics. The proposed algorithm is based on a constrained agglomerative clustering framework and a semi-supervised criterion function that simultaneously emphasizes the relationship between documents and topics and the relationship among the documents themselves. The experimental evaluation shows that our algorithm outperformed the traditional agglomerative algorithm by 7.8% to 11.4%.

Constrained hierarchical clustering; Semi-supervised learning; Criterion functions

Ying Zhao

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

International conference

6th International Conference on Advanced Data Mining and Applications (ADMA 2010)

Chongqing

English

181-192

2010-11-19 (date first posted online on the Wanfang platform; not the paper's publication date)