Clustering Deep Web Databases Semantically

Abstract:

Deep Web database clustering is a key operation in organizing Deep Web resources. Traditional approaches compute similarity as cosine similarity in the Vector Space Model (VSM). However, cosine similarity cannot capture the semantic similarity between the contents of two databases. This paper discusses how to cluster Deep Web databases semantically. First, a fuzzy semantic measure is proposed, which integrates ontology and fuzzy set theory to compute the semantic similarity between the visible features of two Deep Web forms. Then a hybrid Particle Swarm Optimization (PSO) algorithm is provided for Deep Web database clustering. Finally, the clustering results are evaluated according to the Average Similarity of Document to the Cluster Centroid (ASDC) and the Rand Index (RI). Experiments show that: 1) the hybrid PSO approach has higher ASDC values than the PSO and K-Means approaches, which means the hybrid PSO approach has higher intra-cluster similarity and lower inter-cluster similarity; 2) the clustering results based on fuzzy semantic similarity have higher ASDC values and higher RI values than those based on cosine similarity, which supports the conclusion that the fuzzy semantic similarity approach can explore latent semantics.
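
As a rough illustration of two quantities named in the abstract, the sketch below computes VSM cosine similarity over raw term counts and the Rand Index between two labelings. It is not the authors' implementation: the function names, the use of raw term counts as vector weights, and the sample form labels are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (raw term counts, an assumption)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree (same/different cluster)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical visible features of three search forms.
form_a = ["author", "title", "isbn"]
form_b = ["writer", "title", "isbn"]
form_c = ["writer", "name", "code"]

sim_ab = cosine_similarity(form_a, form_b)
sim_ac = cosine_similarity(form_a, form_c)
ri = rand_index([0, 0, 1, 1], [0, 1, 1, 1])
```

In this toy data, "author" and "writer" share no term, so they contribute nothing to the cosine score even though they are near-synonyms. That is the kind of gap the abstract says a semantic measure is meant to close.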

Keywords: Semantic Deep Web clustering Fuzzy set Ontology PSO K-Means

Authors: Ling Song, Jun Ma, Po Yan, Li Lian, Dongmei Zhang

Affiliation: School of Computer Science & Technology, Shandong University, 250061, China

Conference type: International conference

Conference name: 4th Asia Information Retrieval Symposium (AIRS 2008)

Conference location: Harbin

Conference language: English

Pages: 365-376

Online publication date: 2008-01-16 (date first posted on the Wanfang platform; not the paper's publication date)
