Using Active Learning to Improve Distantly Supervised Entity Typing in Multi-source Knowledge Bases
Entity typing in a knowledge base is an essential task in knowledge base construction. Previous models mainly rely on manually annotated data or distant supervision. However, human annotation is expensive, and distantly supervised data suffers from the label noise problem. In addition, it suffers from the semantic heterogeneity problem in a multi-source knowledge base. To address these issues, we propose to use an active learning method to improve distantly supervised entity typing in the multi-source knowledge base, which aims to combine the benefits of human annotation for difficult instances with the coverage of large distantly supervised data. However, existing active learning criteria do not consider the label noise and semantic heterogeneity problems, so much of the annotation effort is wasted on useless instances. In this paper, we develop a novel active learning pipeline framework to tackle the most difficult instances. Specifically, we first propose a noise reduction method to re-annotate the most difficult instances in the distantly supervised data. Then we propose a data augmentation method to annotate the most difficult instances in the unlabeled data. We propose two novel selection criteria to find the most difficult instances in these two phases, respectively. Moreover, we propose a hybrid annotation strategy to reduce human labeling effort. Experimental results show the effectiveness of our method.
Bo Xu, Xiangsan Zhao, Qingxuan Kong
School of Computer Science and Technology, Donghua University, Shanghai, China; Glorious Sun School of Business and Management, Donghua University, Shanghai, China
International conference
9th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2020)
Zhengzhou
English
219-231
2020-10-14 (date first made available online on the Wanfang platform; not the paper's publication date)