WORD-LEVEL INFORMATION EXTRACTION FROM SCIENCE AND TECHNOLOGY ANNOUNCEMENTS CORPUS BASED ON CRF

Abstract:

Conditional Random Fields (CRF) have been applied widely in information extraction and natural language processing. However, they have seen little use on corpora of science and technology announcements. In this paper, we extract word-level information from a large corpus of science and technology announcements and analyze the performance of CRF, using Naïve Bayes as a baseline. Our experiments show that CRF achieves high precision except on a small amount of unseen data. The Naïve Bayes model is satisfactory in closed domains, but it consistently makes mistakes when the data belong to a lower-weighted class.

Keywords: Conditional random field; Information extraction; Word-level; Science and technology corpus; Naïve Bayes

Authors: Yushu Cao, Jun Wang, Lei Li

Affiliations: School of Engineering and Applied Science, University of Pennsylvania, Philadelphia 19104, US; School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Conference type: International conference

Conference name: 2012 2nd IEEE International Conference on Cloud Computing and Intelligence Systems (IEEE CCIS2012)

Conference location: Hangzhou

Conference language: English

Pages: 2015-2019

Online publication date: 2012-10-30 (date first posted on the Wanfang platform; this does not represent the paper's publication date)
