A Combined Method for Automatic Domain-Specific Terminology Extraction

Abstract:

In this paper we present a terminology extraction algorithm that combines machine learning with a corpus-based statistical model. We collect a balanced corpus in which all possible nominal terms of every domain are annotated, and use it as the training corpus. After selecting training features for terms, we use an SVM to recognize terminological candidates in the target corpus. We then calculate the Domain Relevance (DR) and Domain Consensus (DC) scores of the candidates to obtain domain-specific terms. We run four experiments on a tourism corpus and on short sentences, using two kinds of balanced training corpora. We evaluate the precision and recall of the algorithm by comparing the words in a gold standard with the words extracted by our system. The experiments show that the algorithm improves results in the automatic extraction of nominal domain-specific terms. A detailed analysis shows the advantages and disadvantages of the algorithm.
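The abstract does not give the DR and DC formulas. As a rough illustration only, the sketch below assumes the definitions commonly used in the terminology extraction literature: DR as the term's probability in the target domain divided by its highest probability in any domain, and DC as the entropy of the term's distribution over the documents of one domain. The function names, the input layout, and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def domain_relevance(term, domain_counts, target):
    """Assumed definition: P(term | target) / max over domains of P(term | domain).

    domain_counts maps a domain name to a Counter of term frequencies.
    """
    probs = {}
    for name, counts in domain_counts.items():
        total = sum(counts.values())
        probs[name] = counts.get(term, 0) / total if total else 0.0
    best = max(probs.values())
    return probs[target] / best if best > 0 else 0.0


def domain_consensus(term, documents):
    """Assumed definition: entropy of the term's frequency distribution
    over the documents (token lists) of a single domain."""
    freqs = [doc.count(term) for doc in documents]
    total = sum(freqs)
    if total == 0:
        return 0.0
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)


# Toy data, invented for illustration.
domains = {
    "tourism": Counter({"hotel": 6, "price": 4}),
    "finance": Counter({"hotel": 1, "price": 9}),
}
tourism_docs = [["hotel", "beach"], ["hotel"], ["museum"]]

dr_hotel = domain_relevance("hotel", domains, "tourism")
dc_hotel = domain_consensus("hotel", tourism_docs)
```

Under these assumptions, a candidate that is frequent in the target domain relative to other domains gets a DR close to 1, and a candidate spread evenly over many documents of the domain gets a higher DC than one concentrated in a single document.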

Keywords: terminology; SVM; GATE; domain relevance; domain consensus

Authors: Li Liu; Quan Qi

Affiliation: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

Conference type: International conference

Conference name: 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2011)

Conference location: Shanghai

Conference language: English

Pages: 1784-1787

Online publication date: 2011-07-26 (date first posted on the Wanfang platform; not the paper's publication date)
