Using Active Learning to Improve Distantly Supervised Entity Typing in Multi-source Knowledge Bases

Abstract:

Entity typing is an essential task in constructing a knowledge base. Previous models mainly rely on manually annotated data or distant supervision. However, human annotation is expensive, and distantly supervised data suffers from the label noise problem. In addition, it suffers from the semantic heterogeneity problem in a multi-source knowledge base. To address these issues, we propose to use an active learning method to improve distantly supervised entity typing in the multi-source knowledge base, which aims to combine the benefits of human annotation for difficult instances with the coverage of a large distantly supervised dataset. However, existing active learning criteria do not consider the label noise and semantic heterogeneity problems, so much of the annotation effort is wasted on useless instances. In this paper, we develop a novel active learning pipeline framework to tackle the most difficult instances. Specifically, we first propose a noise reduction method to re-annotate the most difficult instances in the distantly supervised data. Then we propose a data augmentation method to annotate the most difficult instances in the unlabeled data. We propose two novel selection criteria to find the most difficult instances in the respective phases. Moreover, we propose a hybrid annotation strategy to reduce human labeling effort. Experimental results show the effectiveness of our method.

Authors: Bo Xu, Xiangsan Zhao, Qingxuan Kong

Affiliations: School of Computer Science and Technology, Donghua University, Shanghai, China; Glorious Sun School of Business and Management, Donghua University, Shanghai, China

Conference type: International conference

Conference name: 9th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2020)

Conference location: Zhengzhou

Conference language: English

Pages: 219-231

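Online publication date: 2020-10-14 (the date the record first appeared on the Wanfang platform; it does not represent the paper's publication date)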
