Auto-POS Templates and Mixed Metrics for Recognizing Terms in Scientific literature

Abstract:

Automatic Term Recognition (ATR) is an important task in Knowledge Acquisition. It aims to acquire formalized terms that have not yet been recorded in a glossary. In recent years, several statistical methods have proved effective, and emerging methods such as C-value, NC-value, and Term Extractor have shown clear advantages on this task. However, little work has been done on a metric mixing algorithm that combines these metrics as a whole. In this paper, we first collect part-of-speech (POS) templates automatically from already-known terms, which we call Auto-POS templates, instead of relying on manually written regular expressions. We then match these templates against POS strings to acquire candidate terms. Finally, we rank the candidates with the metric mixing algorithm. Experimental results on IEEE 2006-2007 metadata show that the metric mixing algorithm performs better than any single metric alone.
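The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Penn Treebank style tags, the `min_count` threshold, and in particular the mixing rule (a weighted sum of min-max normalized scores) are assumptions made for illustration, since the abstract does not specify how the metrics are combined.

```python
from collections import Counter

def collect_templates(tagged_terms, min_count=1):
    """Count the POS-tag sequences of already-known terms; keep the frequent ones."""
    counts = Counter(tuple(tag for _, tag in term) for term in tagged_terms)
    return {tpl for tpl, n in counts.items() if n >= min_count}

def match_candidates(tagged_sentence, templates):
    """Return every word span whose POS-tag sequence equals one of the templates."""
    words = [word for word, _ in tagged_sentence]
    tags = [tag for _, tag in tagged_sentence]
    found = []
    for tpl in templates:
        k = len(tpl)
        for i in range(len(tags) - k + 1):
            if tuple(tags[i:i + k]) == tpl:
                found.append(" ".join(words[i:i + k]))
    return found

def mix_scores(metric_scores, weights):
    """Assumed mixing rule: weighted sum of min-max normalized metric scores."""
    mixed = Counter()
    for name, scores in metric_scores.items():
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for cand, s in scores.items():
            mixed[cand] += weights[name] * (s - lo) / span
    # Highest mixed score first
    return sorted(mixed, key=mixed.get, reverse=True)

# Hypothetical toy data: known terms and one sentence, already POS-tagged.
known_terms = [
    [("support", "NN"), ("vector", "NN"), ("machine", "NN")],
    [("neural", "JJ"), ("network", "NN")],
]
sentence = [("we", "PRP"), ("train", "VBP"), ("a", "DT"),
            ("deep", "JJ"), ("model", "NN")]

templates = collect_templates(known_terms)
candidates = match_candidates(sentence, templates)
```

In practice the per-metric scores passed to `mix_scores` would come from measures such as C-value or NC-value computed over the whole corpus; the sketch only shows where such scores would plug in.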

Keywords: Automatic Term Recognition; Knowledge Acquisition; Information Extraction; Text Mining

Authors: Hongliang You, Wei Zhang, Junyi Shen, Yang Yu, Ting Liu

Affiliations: School of Electric and Information Engineering, Xi'an Jiaotong Univ., Xi'an, 710049, China; IT Laboratory, Beijing Document Service, Beijing, 100142, China; School of Electric and Information Engineering, Xi'an Jiaotong Univ., Xi'an, 710049, China; Information Retrieval Laboratory, Harbin Institute of Technology, Harbin, 150001, China

Conference type: International conference

Conference name: 2010 Third International Symposium on Knowledge Acquisition and Modeling (KAM 2010)

Conference location: Wuhan

Conference language: English

Pages: 84-87

Online publication date: 2010-10-20 (date first posted on the Wanfang platform; not the paper's publication date)
