A Combined Method for Automatic Domain-Specific Terminology Extraction
In this paper we present a terminology extraction algorithm that combines machine learning with a corpus-based statistical model. We collect a balanced corpus in which every possible nominal term of each domain is annotated, and use it as the training corpus. After selecting training features for terms, we use an SVM to recognize terminological candidates in the target corpus. We then calculate Domain Relevance (DR) and Domain Consensus (DC) scores for these candidates to acquire domain-specific terminology. We conduct four experiments on a tourism corpus and on short sentences, using two kinds of balanced training corpora. We evaluate the precision and recall of the algorithm by comparing the terms extracted by our system with those in a gold standard. The experiments show that our algorithm improves the automatic extraction of nominal domain-specific terminology, and a detailed analysis discusses its advantages and disadvantages.
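The abstract names the DR/DC filtering step and the precision/recall evaluation without giving formulas. The sketch below assumes the definitions commonly used in the terminology-extraction literature (e.g. Navigli and Velardi's OntoLearn work), which may differ from the paper's exact choices. DR(t, D_k) = P(t | D_k) / max_j P(t | D_j) measures how specific a term is to the target domain. DC(t, D_k) is the entropy of the term's distribution over the documents of D_k and rewards terms spread evenly across the domain. The SVM candidate-recognition step is not shown: candidates are taken as given. The helper names, the weights `alpha` and `beta`, the `threshold`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def domain_relevance(term, domain, term_freq):
    """DR(t, D_k) = P(t | D_k) / max_j P(t | D_j).

    term_freq maps a domain name to a Counter of term frequencies in that
    domain's corpus; P(t | D) is estimated as freq(t, D) / total tokens in D.
    """
    def prob(d):
        total = sum(term_freq[d].values())
        return term_freq[d][term] / total if total else 0.0

    best = max(prob(d) for d in term_freq)
    return prob(domain) / best if best > 0 else 0.0


def domain_consensus(term, docs):
    """DC(t, D_k) = -sum_d P_t(d) * log P_t(d): entropy of t over the domain's documents."""
    counts = [doc[term] for doc in docs if doc[term] > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts) if total else 0.0


def select_terms(candidates, domain, term_freq, docs, alpha=0.7, beta=0.3, threshold=0.6):
    """Hypothetical filter: keep candidates whose alpha*DR + beta*normalized DC >= threshold."""
    dr = {t: domain_relevance(t, domain, term_freq) for t in candidates}
    dc = {t: domain_consensus(t, docs) for t in candidates}
    max_dc = max(dc.values(), default=0.0)
    selected = set()
    for t in candidates:
        dc_norm = dc[t] / max_dc if max_dc > 0 else 0.0  # scale DC into [0, 1]
        if alpha * dr[t] + beta * dc_norm >= threshold:
            selected.add(t)
    return selected


def precision_recall(extracted, gold):
    """Precision and recall of the extracted term set against a gold-standard set."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): the target domain plus one contrast domain
term_freq = {
    "tourism": Counter({"hotel": 8, "itinerary": 4, "price": 4, "system": 4}),
    "finance": Counter({"price": 10, "system": 6, "hotel": 1}),
}
tourism_docs = [
    Counter({"hotel": 4, "itinerary": 4, "price": 2, "system": 4}),
    Counter({"hotel": 2, "price": 1}),
    Counter({"hotel": 2, "price": 1}),
]
candidates = ["hotel", "itinerary", "price", "system"]
extracted = select_terms(candidates, "tourism", term_freq, tourism_docs)
gold = {"hotel", "itinerary", "tour package"}
precision, recall = precision_recall(extracted, gold)
print(sorted(extracted), precision, round(recall, 3))  # prints ['hotel', 'itinerary'] 1.0 0.667
```

In this toy run "price" is well spread across tourism documents (high DC) but far more frequent in the finance corpus, so its low DR keeps it out. This illustrates why the two scores are combined rather than used alone.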
Terminology; SVM; GATE; domain relevance; domain consensus
Li Liu, Quan Qi
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
International conference
Shanghai
English
pp. 1784-1787
2011-07-26 (date first published online on the Wanfang platform; not the paper's publication date)