A NEW COLLOCATION EXTRACTION METHOD COMBINING MULTIPLE ASSOCIATION MEASURES

Abstract:

As an important linguistic resource, collocations capture a significant relation between words. Automatic collocation extraction is important for many natural language processing applications, such as word sense disambiguation, machine translation, and information retrieval. Traditional collocation extraction approaches rely on a single statistical measure and may therefore be suboptimal, because they cannot take advantage of multiple statistical measures. In this paper, we propose a logistic linear regression model (LLRM) that combines five classical lexical association measures: χ²-test, t-test, co-occurrence frequency, log-likelihood ratio, and mutual information. Experiments show that our approach significantly improves both precision and recall compared with each individual measure.
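The abstract does not give the exact form of the model or its learned weights, so the following is only a minimal sketch of the general idea: compute the five association measures for a word pair from a 2x2 contingency table, using their standard textbook definitions, and pass a weighted sum of them through a logistic function. The function names, the dictionary keys, and any weights or bias supplied by the caller are hypothetical and are not taken from the paper.

```python
import math


def association_measures(a, b, c, d):
    """Five association scores for a word pair (w1, w2).

    Contingency counts (assumed layout, not taken from the paper):
      a = pairs (w1, w2)        b = pairs (w1, not w2)
      c = pairs (not w1, w2)    d = pairs (not w1, not w2)
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d  # row totals: w1 present / absent
    c1, c2 = a + c, b + d  # column totals: w2 present / absent
    if a <= 0 or min(r1, r2, c1, c2) <= 0:
        raise ValueError("need a > 0 and non-zero marginal totals")

    e11 = r1 * c1 / n  # expected count of (w1, w2) under independence
    observed = (a, b, c, d)
    expected = (e11, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)

    return {
        "freq": float(a),
        # pointwise mutual information, in bits
        "mi": math.log2(a / e11),
        # t-score with the usual approximation variance ~ observed count
        "t": (a - e11) / math.sqrt(a),
        # Pearson chi-square statistic for a 2x2 table
        "chi2": n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2),
        # log-likelihood ratio (G^2); empty cells contribute nothing
        "llr": 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0),
    }


def collocation_probability(scores, weights, bias):
    """Logistic combination of the scores; weights and bias are hypothetical
    values that would normally be fitted on labelled collocation data."""
    z = bias + sum(weights[name] * scores[name] for name in weights)
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

In practice the raw scores differ by orders of magnitude, so one would probably rescale them before fitting the weights; whether the paper does so is not stated in the abstract.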

Keywords: Collocation; Co-occurrence frequency; χ²-test; t-test; Mutual information; Log-likelihood ratio

Authors: JIAN-FANG LIN; SHENG LI; YUHAN CAI

Affiliations: MOE-MS Key Laboratory of NLP & Speech, Harbin Institute of Technology, Harbin 150001, China; Department of Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA

Conference type: International conference

Conference name: 2008 International Conference on Machine Learning and Cybernetics

Conference location: Kunming

Conference language: English

Pages: 12-17

Online publication date: 2008-07-12 (date first made available on the Wanfang platform; not the paper's publication date)
