STUDY ON AUTOMATIC TERM EXTRACTION BASED ON CRF MODEL FOR INFORMATION FIELD

Abstract:

Automatic Domain Term Extraction (ADTE) is important in natural language processing and is widely applied in information retrieval, information extraction, data mining, machine translation, and other information processing fields. In this paper, we propose an automatic domain term extraction method based on conditional random fields (CRF). We treat domain term extraction as a sequence labeling problem and use the distribution characteristics of terms as features of the CRF model. We then use a CRF tool to train a template for term extraction. Experimental results show that the method is simple, is applicable to common domains, and achieves good results. In the open test, precision was 73.24%, recall was 69.57%, and the F-measure was 71.36%.
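As an illustration only, the following sketch shows what casting term extraction as sequence labeling can look like, and checks the reported F-measure against the reported precision and recall. The B/I/O tag scheme, the function names, and the example tokens are assumptions made here for illustration; the abstract does not state which tag set or CRF tool the authors used. The F-measure is assumed to be the balanced harmonic mean, which is consistent with the reported figures.

```python
def to_bio(tokens, term_spans):
    """Tag each token as B (term start), I (inside term), or O (outside).

    term_spans holds (start, end) token indices with end exclusive.
    The BIO scheme is an assumption; the abstract names no tag set.
    """
    tags = ["O"] * len(tokens)
    for start, end in term_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence in which the first three tokens form one domain term.
tokens = ["conditional", "random", "fields", "label", "sequences"]
tags = to_bio(tokens, [(0, 3)])

# Precision and recall (in percent) as reported for the open test.
reported_f = round(f_measure(73.24, 69.57), 2)
print(tags, reported_f)
```

A CRF would then be trained to predict such tag sequences from per-token features; the features in the paper are the terms' distribution characteristics, which this sketch does not model.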

Keywords: Term extraction; CRF model; Information entropy; TF/IDF

Authors: Meiying Jia, Dequan Zheng, Bingru Yang, Jing Yang

Affiliations: School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, Chi; MOE-MS Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology, Har

Conference type: International conference

Conference name: China-Ireland International Conference on Information and Communications Technologies 2008 (CIICT 2008)

Conference location: Beijing

Conference language: English

Pages: 1-6

Online publication date: 2008-09-26 (date first made available on the Wanfang platform; not the paper's publication date)
