A No-Word-Segmentation Hierarchical Clustering Approach to Chinese Web Search Results
In this paper, we present a No-Word-Segmentation Hierarchical Clustering Approach (NWSHCA) for Chinese Web search results. The approach uses a new similarity measure between two documents based on a variation of the Edit Distance, and then generates preliminary clusters using a partitioning clustering method. Next, it ranks all common substrings in each cluster using a cluster-discriminative metric and takes the top K as cluster description labels. Finally, it applies hierarchical agglomerative clustering (HAC) to the top K cluster labels to form a navigational tree. In contrast to most clustering algorithms, NWSHCA can generate overlapping clusters. Experimental results show that the approach is feasible and effective.
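The abstract does not say which variation of the Edit Distance the authors use. As a minimal sketch, assuming plain character-level Levenshtein distance normalized by the length of the longer string (our assumption, not the paper's measure), the code below shows how two Chinese snippets can be compared without any word segmentation, and how their shared substrings (raw candidates for cluster labels) can be enumerated. The function names `edit_distance`, `similarity`, and `common_substrings` and the `min_len` parameter are hypothetical; the cluster-discriminative ranking metric is not reproduced.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance over characters (no word segmentation)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization of edit distance into [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def common_substrings(a: str, b: str, min_len: int = 2) -> set[str]:
    """All substrings of length >= min_len that occur in both a and b."""
    subs_a = {a[i:j] for i in range(len(a))
              for j in range(i + min_len, len(a) + 1)}
    return {s for s in subs_a if s in b}


if __name__ == "__main__":
    x, y = "中文搜索结果", "中文搜索引擎"
    print(similarity(x, y))                  # last two characters differ: 1 - 2/6
    print(sorted(common_substrings(x, y)))   # shared prefixes such as "中文搜索"
```

Working on raw characters is what makes segmentation unnecessary: "中文搜索结果" and "中文搜索引擎" differ in two characters, giving a similarity of about 0.67, and their longest shared substring "中文搜索" is the kind of phrase that could surface as a cluster label. Enumerating all substrings is quadratic in snippet length, which is tolerable for short search-result snippets; a suffix-tree approach would scale better.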
Keywords: hierarchical clustering; Chinese Web search results; no-word segmentation; Edit Distance
Hui Zhang, Liping Zhao, Rui Liu, Deqing Wang
State Key Lab. of Software Development Environment, Beihang University, 100083
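Venue type: International conference
Conference: 4th Asia Information Retrieval Symposium (AIRS 2008)
Location: Harbin, China
Language: English
Pages: 573-577
Online date: 2008-01-16 (date first posted on the Wanfang platform; not the paper's publication date)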