Reducing Human Effort in Named Entity Corpus Construction Based on Ensemble Learning and Annotation Categorization
Annotated named entity corpora play a significant role in many natural language processing applications. However, annotation by humans is time-consuming and costly. In this paper, we propose a high-recall pre-annotator that combines multiple existing named entity taggers based on ensemble learning, to reduce the number of annotations that humans have to add. In addition, annotations are categorized into normal annotations and candidate annotations based on their estimated confidence, to reduce the number of human corrective actions as well as the total annotation time. The experimental results show that our approach outperforms the baseline methods in reducing annotation time without loss in annotation performance (in terms of F-measure).
Corpus Construction; Named Entity Recognition; Assisted Annotation; Ensemble Learning
Tingming Lu, Man Zhu, Zhiqiang Gao
Key Lab of Computer Network and Information Integration (Southeast University), Ministry of Education, School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, China
International conference
The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL2016)
Kunming
English
1-12
2016-12-02 (date first made available online on the Wanfang platform; not the paper's publication date)