Conference Topic

A Multilevel and Domain-Independent Duplicate Detection Model for Scientific Database

Duplicate detection is one of the technical difficulties in the data cleaning area. At present, the data volume of scientific databases is increasing rapidly, bringing new challenges to duplicate detection. In a scientific database, the duplicate detection model should be suitable for massive and numerical data, should be independent of the domain, should take the relationships among tables into account, and should focus on the common ground of scientific databases. In this paper, a multilevel duplicate detection model for scientific databases is proposed, which handles numerical data and general usage well. First, the challenges are identified by analyzing the duplicate-related characteristics of scientific data; second, the similarity measures of the proposed model are defined; then the details of the multilevel detection algorithms are introduced; finally, experiments and applications show that the proposed model is more domain-independent and effective, and is suitable for duplicate detection in scientific databases.

Jie Song, Yubin Bao, Ge Yu

Northeastern University, Shenyang 110004, China

International conference

11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Jiuzhaigou

English

729-741

2010-07-14 (date first made available online on the Wanfang platform; not the paper's publication date)