A Multilevel and Domain-Independent Duplicate Detection Model for Scientific Database
Duplicate detection is one of the technical difficulties in data cleaning. The data volume of scientific databases is currently increasing rapidly, which brings new challenges to duplicate detection. In a scientific database, a duplicate detection model should be suitable for massive and numerical data, should be independent of domain, should take the relationships among tables into account, and should focus on the characteristics that scientific databases have in common. This paper proposes a multilevel duplicate detection model for scientific databases that handles numerical data well and is generally applicable. First, the challenges are identified by analyzing the duplicate-related characteristics of scientific data. Second, the similarity measures of the proposed model are defined. Then, the details of the multilevel detection algorithms are introduced. Finally, experiments and applications show that the proposed model is more domain-independent and effective, and is suitable for duplicate detection in scientific databases.
Jie Song, Yubin Bao, Ge Yu
Northeastern University, Shenyang 110004, China
International conference
11th International Conference on Web-Age Information Management, WAIM 2010
Jiuzhaigou
English
729-741
2010-07-14 (date first posted online on the Wanfang platform; not the publication date of the paper)