A Hybrid Approach for Arabic Multi-Word Term Extraction

Abstract:

Building a domain model from a specialized corpus requires identifying candidate terms as well as the semantic relations between them. Once constructed, such a model can be used for many information retrieval tasks. Multi-word terms are of great importance in this process. On the one hand, they constitute domain-relevant candidate terms. On the other hand, the syntactic relations linking their constituents can be used to infer semantic relations between terms. In this paper, we propose an approach for extracting multi-word terms from Arabic specialized corpora. It uses linguistic rules based on morphological features and POS (Part-Of-Speech) tags to parse documents and retrieve candidate terms. Statistical measures are used to resolve ambiguities generated by the linguistic tools and to rank candidate terms according to their relevance. We present experiments on a corpus from the environment domain and report high-quality results that confirm the targets set for the precision metric.
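The abstract does not give the authors' linguistic rules or name their statistical measures. The following is a minimal Python sketch, under stated assumptions, of the general two-stage idea: match POS tag patterns over already-tagged sentences to collect multi-word candidates, then rank the candidates statistically. The tag names (NOUN, ADJ), the pattern set PATTERNS, the transliterated toy corpus and the helper functions are all hypothetical. C-value is swapped in as one well-known ranking measure for nested multi-word terms; it is not necessarily the measure used in the paper. The sketch also skips the morphological analysis (clitic segmentation, definite-article handling) that a real Arabic pipeline would perform before tagging.

```python
import math
from collections import Counter

# Hypothetical POS patterns (an assumption, not the paper's rule set):
# noun+noun (construct state / idafa), noun+adjective, noun+noun+adjective.
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ"), ("NOUN", "NOUN", "ADJ")}


def extract_candidates(sentences, patterns=PATTERNS):
    """Count contiguous token spans whose tag sequence matches a pattern.

    Each sentence is a list of (token, tag) pairs, assumed to come from an
    upstream morphological analyser and POS tagger.
    """
    counts = Counter()
    lengths = sorted({len(p) for p in patterns})
    for sent in sentences:
        for n in lengths:
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                tags = tuple(tag for _, tag in window)
                if tags in patterns:
                    counts[tuple(tok for tok, _ in window)] += 1
    return counts


def is_nested(shorter, longer):
    """True if `shorter` is a proper contiguous sub-sequence of `longer`."""
    n = len(shorter)
    return n < len(longer) and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(counts):
    """Score candidates with C-value: log2(length) * frequency, discounted by
    the average frequency of the longer candidates that contain the term."""
    scores = {}
    for term, freq in counts.items():
        containers = [b for b in counts if is_nested(term, b)]
        if containers:
            avg_container_freq = sum(counts[b] for b in containers) / len(containers)
            scores[term] = math.log2(len(term)) * (freq - avg_container_freq)
        else:
            scores[term] = math.log2(len(term)) * freq
    return scores


# Toy tagged corpus in transliteration (hypothetical data, rough glosses).
corpus = [
    [("talawwuth", "NOUN"), ("al-hawa", "NOUN")],  # 'air pollution'
    [("talawwuth", "NOUN"), ("al-hawa", "NOUN"), ("al-hadari", "ADJ")],  # 'urban air pollution'
    [("al-miyah", "NOUN"), ("al-jawfiya", "ADJ")],  # 'groundwater'
]

# Rank candidates from most to least relevant by C-value.
ranked = sorted(c_value(extract_candidates(corpus)).items(),
                key=lambda kv: kv[1], reverse=True)
for term, score in ranked:
    print(" ".join(term), round(score, 3))
```

In this toy corpus, the bigram "al-hawa al-hadari" receives a C-value of 0 because it never occurs outside the longer candidate that contains it. This discounting of nested fragments is what makes C-value a common choice for ranking multi-word term candidates.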

Keywords: Arabic language processing; morpho-syntactic parsing; multi-word terms; terminology extraction.

Authors: Ibrahim BOUNHAS, Yahya SLIMANI

Affiliation: Department of Computer Science, Faculty of Sciences of Tunis, University of Tunis 1060, Tunis, Tunisia

Conference type: International conference

Conference: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-8

Online publication date: 2009-09-24 (the date the record first went online on the Wanfang platform; not the paper's publication date)

