会议专题

A Combined Method for Automatic Domain-Specific Terminology Extraction

In this paper we present a Terminology extraction algorithm combining with machine learning and corpus-based statistical model. We collect a balanced corpus with all the possible nominal terms of every domain annotated, and take this corpus as training corpus. After selecting training features for terms, we use SVTVI to recognize terminological candidates in target corpus. Then we calculate the Domain Relevance (DR) and Domain Consensus (DC) scores for the terminological candidates to acquire domain-specific Terminologies. We make 4 experiments on Tourism corpus and short sentences with two kinds of balanced training corpora. Furthermore, we evaluate the precision and recall of our Terminology extraction algorithm by comparing the words in a golden standard with the words extracted by our system. The experiments show that our algorithm can get improved result in automatic extraction of nominal domain-specific Terminologies. A detailed analysis shows the advantages and disadvantages of our algorithm.

Terminology SVM GATE domain relevance domain consensus

Li Liu Quan Qi

School of Computer Science and Technology Beijing Institute of Technology China, Beijing

国际会议

2011 Eighth International Conference on Fuzzy System and Knowledge Discovery(第八届模糊系统与知识发现国际会议 FSKD 2011)

上海

英文

1784-1787

2011-07-26(万方平台首次上网日期,不代表论文的发表时间)