Conference Special Topic

Domain-Independent Classification for Deep Web Interfaces

The data sources of the Deep Web provide a tremendous amount of high-quality structured data. However, because the domains of the large-scale Deep Web are hard to predefine, its query interfaces need to be classified in a domain-independent way. In this paper, we propose a novel three-stage approach to this problem. First, we extract both the text and the structure of a query interface by applying the FIE algorithm that we propose. Then we construct frequent itemsets using a frequent pattern mining algorithm. Finally, we apply the AP clustering algorithm to cluster the frequent itemsets according to the similarity measure FGSTD presented in this paper. Experiments demonstrate that our approach clusters interfaces well in a domain-independent manner.
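The middle stage of the approach can be illustrated with a minimal sketch. The abstract does not specify the FIE extraction step, the frequent pattern mining algorithm, or the FGSTD measure, so everything below is an assumption: the interface tokens are made up, a plain Apriori-style miner stands in for the unnamed mining algorithm, and Jaccard similarity stands in for FGSTD. The clustering step itself is not shown. The resulting pairwise similarity matrix is the kind of input an affinity propagation implementation would take, if AP here denotes affinity propagation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style miner: return {itemset: support count} for all
    itemsets that occur in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def jaccard(a, b):
    """Jaccard similarity of two sets (a stand-in for FGSTD, which the
    abstract does not define)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Hypothetical tokens extracted from four query interfaces.
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"make", "model", "price"},
    {"make", "model", "year"},
]

itemsets = frequent_itemsets(interfaces, min_support=2)

# Describe each interface by the frequent itemsets it contains.
features = [{s for s in itemsets if s <= iface} for iface in interfaces]

# Pairwise similarity matrix, usable as input to a clustering step.
similarity = [[jaccard(f, g) for g in features] for f in features]
```

With these toy inputs, the two book-like interfaces share all of their frequent itemsets and none with the car-like interfaces, so no domain label is needed to separate the two groups.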

frequent itemset; clustering; domain classification; Deep Web

Yingjun Li, Siwei Wang, Derong Shen, Tiezheng Nie, Ge Yu

College of Information Science and Engineering, Northeastern University, 110004 Shenyang, China

International conference

11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Jiuzhaigou

English

453-458

2010-07-14 (date first posted online on the Wanfang platform; does not represent the paper's publication date)