Detection of Entity Mixture in Knowledge Bases Using Hierarchical Clustering

Abstract:

Entity mixture in a knowledge base refers to the situation in which some attributes of one entity are mistakenly assigned to another entity. It often occurs among homonymous entities, which share the same value of the attribute “Name”. Eliminating entity mixture is critical to ensuring data accuracy and validity for knowledge-based services. However, current research on entity disambiguation mainly focuses on determining the identity of entities mentioned in text during information extraction for building a knowledge base, while little work has been done to verify the information in an already built knowledge base. In this paper, we propose a generic method to detect mixed homonymous entities in a knowledge base using hierarchical clustering. Our methodology differentiates entities by detecting inconsistency among their attributes, based on an analysis of how their attribute values are distributed across the documents of a common corpus. Experiments on a data set from industry applications have been conducted to demonstrate the workflow of performing the clustering and detecting mixed entities in a knowledge base using our methodology.

Keywords: Entity; Entity Mixture; Hierarchical Clustering; Knowledge Base; Knowledge Graph; Homonymous Entities; Triple

Authors: Haihua Xie, Xiaoqing Lu, Zhi Tang, Xiaojun Huang

Affiliations: Institute of Computer Science & Technology, Peking University, Beijing, China; State Key Laboratory of D; Institute of Computer Science & Technology, Peking University, Beijing, China, 100871; Department of Knowledge Service Technology, Beijing Founder Apabi Technology Limited, Beijing, China, 10

Conference type: International conference

Conference name: The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL2016)

Conference location: Kunming

Conference language: English

Pages: 1-12

Online publication date: 2016-12-02 (date first posted on the Wanfang platform; not the paper's publication date)
