Domain-Independent Classification for Deep Web Interfaces

Abstract:

Deep Web data sources provide a tremendous amount of high-quality structured data. However, their query interfaces must be classified in a domain-independent way, because the domains of the huge-scale Deep Web are hard to predefine. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying the FIE algorithm that we propose. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to FGSTD, a similarity measure presented in this paper. Experiments demonstrate that our approach clusters interfaces well in a domain-independent manner.
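
The second and third stages can be illustrated with a minimal sketch. It is not the paper's implementation: FIE and FGSTD are not specified in this abstract, so the sketch assumes each interface has already been reduced to a set of terms, uses a plain Apriori-style search as the frequent pattern mining step, and substitutes Jaccard similarity for FGSTD. The function names (`frequent_itemsets`, `jaccard`, `similarity_matrix`) and the sample data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {itemset: count} for every itemset
    contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    # Level k: join frequent (k-1)-itemsets, keep candidates of size k.
    k = 2
    while current:
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def jaccard(a, b):
    """Jaccard similarity; a stand-in for FGSTD, which is not defined here."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(itemsets):
    """Pairwise similarity between itemsets, as a list of rows."""
    return [[jaccard(a, b) for b in itemsets] for a in itemsets]


# Hypothetical term sets, one per query interface.
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"make", "model", "price"},
    {"make", "model", "year"},
]

itemsets = frequent_itemsets(interfaces, min_support=2)
keys = sorted(itemsets, key=lambda s: (len(s), sorted(s)))
matrix = similarity_matrix(keys)
```

A matrix of this kind is the sort of input that an affinity propagation implementation accepting precomputed similarities could cluster. Whether FGSTD behaves like a set-overlap measure is an assumption of this sketch.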

Keywords: frequent itemset; clustering; domain classification; Deep Web

Authors: Yingjun Li, Siwei Wang, Derong Shen, Tiezheng Nie, Ge Yu

Affiliation: College of Information Science and Engineering, Northeastern University, 110004 Shenyang, China

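Conference type: International conference

Conference: 11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Conference location: Jiuzhaigou

Conference language: English

Pages: 453-458

Online publication date: 2010-07-14 (date first made available on the Wanfang platform; not the paper's publication date)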
