Conference Topic

Euclidean-based Entity Resolution for Evolving Data

  As large companies and corporations have become increasingly responsible for data collection, a growing number of researchers in recent years have proposed a variety of algorithms and theories to address database problems. Although existing solutions are effective in many cases, many problems remain unsolved in database integration. Entity resolution (ER) is a crucial one. ER has been used in many applications during the updating and loading of big data sets, and evolving data needs it most. Evolving data sets are now widely used in biology and computer information science, where they contain microscope observations and biological information. Although researchers have proposed different ER methods, their cost is usually unacceptably high. We use vectors in a high-dimensional Euclidean space to model the states of different entities in a big data set. We combine this approach with an improved parallel Top-K algorithm, devising a way to identify entities more effectively. Theoretical analysis and experimental results show that the proposed method can perform entity resolution on evolving data effectively and efficiently.

Entity Resolution; Euclidean Vector; Top-K

Chang Lu, Hongzhi Wang, Yan Zhang, Hong Gao

CS Department, Harbin Institute of Technology, Harbin, China

International conference

2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC2015)

Qinhuangdao

English

1547-1552

2015-09-18 (date first posted online on the Wanfang platform; does not represent the paper's publication date)