Auto-POS Templates and Mixed Metrics for Recognizing Terms in Scientific Literature
Automatic Term Recognition (ATR) is an important task in Knowledge Acquisition that aims to acquire formalized terms which glossaries have not yet recorded. In recent years, several statistical methods have proved effective, and emerging methods such as C-value, NC-value, and Term Extractor have shown clear advantages on this task. However, little work has been done on metric mixing algorithms that combine these metrics into a single ranking. In this paper, we first automatically collect part-of-speech templates from already-known terms, called Auto-POS templates, instead of relying on hand-written regular expressions, and then match these templates against POS-tag strings to acquire candidate terms. Finally, we rank the candidates with a metric mixing algorithm. Experimental results on IEEE 2006-2007 metadata show that the metric mixing algorithm performs better than any individual metric alone.
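The abstract names the pipeline steps (collect POS templates from known terms, match them against tagged text, score candidates, mix the scores) without giving the algorithms, so the following is only a minimal Python sketch under stated assumptions. Templates are assumed to be tuples of Penn Treebank-style tags, matching is assumed to be an exact sliding-window comparison, C-value follows the standard Frantzi et al. formulation, and raw frequency stands in for the other metrics (NC-value and Term Extractor are not implemented). The mixing rule, a weighted sum of min-max normalized scores, is a hypothetical stand-in and not the paper's metric mixing algorithm. All function names are illustrative.

```python
import math
from collections import Counter


def collect_templates(known_term_tags):
    """Auto-POS templates: the set of POS-tag sequences observed on known terms."""
    return {tuple(tags) for tags in known_term_tags}


def match_candidates(tagged, templates):
    """Return every word n-gram whose POS sequence equals a collected template.

    `tagged` is a list of (word, tag) pairs for one sentence (assumed input format).
    """
    lengths = {len(t) for t in templates}
    found = []
    for i in range(len(tagged)):
        for n in lengths:
            window = tagged[i:i + n]
            if len(window) == n and tuple(tag for _, tag in window) in templates:
                found.append(tuple(word for word, _ in window))
    return found


def is_nested(a, b):
    """True if candidate `a` occurs as a contiguous part of a longer candidate `b`."""
    return len(b) > len(a) and any(
        b[i:i + len(a)] == a for i in range(len(b) - len(a) + 1)
    )


def c_value(freq):
    """C-value in the Frantzi et al. form; single-word terms score 0 (log2 1 = 0)."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]
        adjusted = f_a
        if containers:
            # Subtract the average frequency of the longer candidates nesting `a`.
            adjusted -= sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


def min_max(scores):
    """Rescale one metric's scores to [0, 1] so different metrics are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}


def mix_metrics(metrics, weights):
    """Hypothetical mixing rule: weighted sum of min-max normalized metric scores."""
    normed = [min_max(m) for m in metrics]
    keys = set().union(*normed)
    return {k: sum(w * m.get(k, 0.0) for w, m in zip(weights, normed)) for k in keys}


# Toy usage: in practice the templates would come from a glossary of known terms.
templates = collect_templates([("JJ", "NN"), ("NN", "NN"), ("JJ", "NN", "NN")])
tagged = [("neural", "JJ"), ("network", "NN"), ("training", "NN"), ("is", "VBZ"),
          ("neural", "JJ"), ("network", "NN"), ("pruning", "NN")]
freq = Counter(match_candidates(tagged, templates))
cv = c_value(freq)
mixed = mix_metrics([cv, dict(freq)], weights=[0.5, 0.5])
ranking = sorted(mixed, key=mixed.get, reverse=True)
print(ranking[0])  # prints ('neural', 'network')
```

In this toy run, "network training" only ever appears inside "neural network training", so its C-value drops to 0; penalizing such nested substrings is what C-value is designed to do. Mixing it with frequency then lifts "neural network", which is both frequent and occurs in more than one longer candidate.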
Automatic Term Recognition; Knowledge Acquisition; Information Extraction; Text Mining
Hongliang You, Wei Zhang, Junyi Shen, Yang Yu, Ting Liu
School of Electric and Information Engineering, Xi'an Jiaotong Univ., Xi'an, 710049, China; IT Laboratory, Beijing Document Service, Beijing, 100142, China; School of Electric and Information Engineering, Xi'an Jiaotong Univ., Xi'an, 710049, China; Information Retrieval Laboratory, Harbin Institute of Technology, Harbin, 150001, China
International conference
2010 Third International Symposium on Knowledge Acquisition and Modeling (KAM 2010)
Wuhan
English
84-87
2010-10-20 (date first posted online on the Wanfang platform; not the paper's publication date)