A Multilevel and Domain-Independent Duplicate Detection Model for Scientific Database

Abstract:

Duplicate detection is one of the technical difficulties in the data cleaning area. At present, the data volume of scientific databases is increasing rapidly, bringing new challenges to duplicate detection. In a scientific database, the duplicate detection model should be suitable for massive numerical data, should be independent of the domain, should take the relationships among tables into account, and should focus on the common characteristics of scientific databases. In this paper, a multilevel duplicate detection model for scientific databases is proposed, which handles numerical data and general usage well. First, the challenges are identified by analyzing the duplicate-related characteristics of scientific data; second, the similarity measures of the proposed model are defined; then the details of the multilevel detection algorithms are introduced; finally, experiments and applications show that the proposed model is more domain-independent and effective, and is suitable for duplicate detection in scientific databases.

Authors: Jie Song, Yubin Bao, Ge Yu

Affiliation: Northeastern University, Shenyang 110004, China

Conference type: International conference

Conference name: 11th International Conference on Web-Age Information Management (WAIM 2010)

Conference location: Jiuzhaigou

Conference language: English

Pages: 729-741

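Online publication date: 2010-07-14 (the date the record first went online on the Wanfang platform; it does not indicate when the paper was published)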
