Euclidean-based Entity Resolution for Evolving Data
As large companies and corporations have become increasingly responsible for data collection, a growing number of researchers have in recent years proposed a variety of algorithms and theories to solve the problems of database integration. Although existing solutions are effective in many cases, many problems remain unsolved. Entity resolution (ER) is a crucial one. ER is used in many applications while big data sets are loaded and updated, and evolving data needs it most. Evolving data sets are now widely used in biology and computer information, where they contain microscope observations and biological information. Although researchers have proposed different ER methods, the cost of ER is usually unacceptably high. We use Euclidean vectors in a high-dimensional space to model the states of different entities in a big data set. We combine this approach with an improved parallel Top-K algorithm, devising a way to detect the identity of an entity more effectively. Theoretical analysis and experimental results show that the proposed method can perform entity resolution on evolving data effectively and efficiently.
Entity Resolution; Euclidean Vector; Top-K
Chang Lu, Hongzhi Wang, Yan Zhang, Hong Gao
CS Department, Harbin Institute of Technology, Harbin, China
International conference
Qinhuangdao
English
1547-1552
2015-09-18 (date first posted online on the Wanfang platform; not the paper's publication date)