Conference Proceedings

EIF: A Framework of Effective Entity Identification

Entity identification, the task of building correspondences between objects and entities in dirty data, plays an important role in data cleaning. Confusion between entities and their names is a common source of dirty data: different entities may share an identical name, and different names may refer to the same entity. The major task of entity identification is therefore to distinguish entities that share a name and to recognize different names that refer to the same entity. However, current research focuses on only one of these two aspects and so cannot solve the problem completely. To address this, this paper proposes EIF, an entity identification framework that considers both kinds of confusion. By combining effective clustering techniques, approximate string matching algorithms, and a flexible mechanism for knowledge integration, EIF can be applied to many different kinds of entity identification problems. As an application of EIF, we solve the author identification problem. The effectiveness of the framework is verified by extensive experiments.
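The two kinds of confusion described in the abstract can be illustrated with a small Python sketch. It is not the EIF algorithm, which this record does not specify. It assumes that approximate string matching can be stood in for by the standard library's `difflib.SequenceMatcher`, that clustering can be stood in for by connected components over a union-find structure, and that co-author overlap is the only knowledge used. The function names, the threshold of 0.7, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Approximate string match score in [0, 1] (stand-in, not EIF's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def identify_entities(records, name_threshold=0.7):
    """Group records that plausibly refer to the same entity.

    records: list of (name, set_of_coauthor_names) pairs (hypothetical schema).
    Two records are linked only if their names are similar AND they share
    at least one co-author; clusters are the connected components of links.
    Returns a sorted list of clusters, each a list of record indices.
    """
    n = len(records)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        name_i, coauthors_i = records[i]
        name_j, coauthors_j = records[j]
        similar_names = name_similarity(name_i, name_j) >= name_threshold
        shared_context = bool(coauthors_i & coauthors_j)
        if similar_names and shared_context:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sample data: records 0 and 1 use different names for one
# author; records 1 and 2 use the same name for two different authors.
records = [
    ("J. Smith", {"A. Jones"}),
    ("John Smith", {"A. Jones", "B. Lee"}),
    ("John Smith", {"C. Kim"}),
]
clusters = identify_entities(records)
print(clusters)
```

On this sample, records 0 and 1 are merged despite their different name strings, while record 2 stays separate despite an identical name, so a single pass handles both kinds of confusion. A real system would need richer evidence than co-author overlap and a tuned threshold.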

entity identification; data cleaning; graph partition

Lingli Li, Hongzhi Wang, Hong Gao, Jianzhong Li

Department of Computer Science and Engineering, Harbin Institute of Technology, China

International conference

11th International Conference, WAIM 2010 (11th International Conference on Web-Age Information Management)

Jiuzhaigou

English

717-728

2010-07-14 (date first posted online on the Wanfang platform; not the publication date of the paper)