Conference topic

A No-Word-Segmentation Hierarchical Clustering Approach to Chinese Web Search Results

In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) to Chinese Web search results. The approach uses a new similarity measure between two documents, based on a variation of the Edit Distance, and then generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in a cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
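As a rough illustration of the first step, the sketch below computes a character-level similarity between two snippets without any word segmentation. It uses the classic Levenshtein edit distance normalized by the longer string's length; this is an assumption for illustration only, since the abstract does not specify the authors' variation of the Edit Distance. The function names `edit_distance` and `similarity` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance over characters (no word segmentation).

    Assumption: the paper uses an unspecified *variation* of edit distance;
    this is the standard version, shown only as a stand-in.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1] (hypothetical normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

Because the comparison works on raw characters, Chinese text can be passed in directly, for example `similarity("北京大学", "北京大学生")`, without running a segmenter first.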

hierarchical clustering; Chinese Web search results; no-word segmentation; Edit Distance

Hui Zhang, Liping Zhao, Rui Liu, Deqing Wang

State Key Lab. of Software Development Environment, Beihang University, 100083

International conference

4th Asia Information Retrieval Symposium (AIRS 2008)

Harbin

English

573-577

2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)