Exploring the Data-Driven Prediction of Prepositions in English

Abstract:

Prepositions in English are a well-known challenge for language learners, and the computational analysis of preposition usage has attracted significant attention. Such research generally starts out by developing models of preposition usage for native English based on a range of features, from shallow surface evidence to deep linguistically-informed properties. While we agree that ultimately a combination of shallow and deep features is needed to balance the preciseness of exemplars with the usefulness of generalizations to avoid data sparsity, in this paper we explore the limits of a purely surface-based prediction of prepositions. Using a web-as-corpus approach, we investigate the classification based solely on the relative number of occurrences for target n-grams varying in preposition usage. We show that such a surface-based approach is competitive with the published state-of-the-art results relying on complex feature sets. Where enough data is available, in a surprising number of cases it is thus possible to obtain sufficient information from the relatively narrow window of context provided by n-grams which are small enough to occur frequently but large enough to contain enough predictive information about preposition usage.
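
The count-based classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate preposition list, the function names, and the toy counts below are all assumptions made for illustration, and a plain dictionary stands in for the web-derived n-gram counts.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical candidate set; the abstract does not list the prepositions used.
PREPOSITIONS: Tuple[str, ...] = ("in", "on", "at", "of", "to")


def predict_preposition(
    left: Sequence[str],
    right: Sequence[str],
    count: Callable[[str], int],
    candidates: Sequence[str] = PREPOSITIONS,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Pick the preposition whose n-gram variant is relatively most frequent.

    `count` is an assumed lookup returning the number of occurrences of an
    n-gram string (for example, from a web-scale corpus).
    """
    # Build one n-gram per candidate by filling the preposition slot.
    raw = {p: count(" ".join([*left, p, *right])) for p in candidates}
    total = sum(raw.values())
    if total == 0:
        # No evidence for any variant: abstain rather than guess.
        return None, {}
    # Relative number of occurrences of each variant.
    relative = {p: c / total for p, c in raw.items()}
    best = max(relative, key=relative.get)
    return best, relative


# Invented toy counts standing in for web-derived frequencies.
TOY_COUNTS = {
    "interested in music": 120,
    "interested on music": 3,
    "interested at music": 1,
}


def toy_count(ngram: str) -> int:
    return TOY_COUNTS.get(ngram, 0)


best, relative = predict_preposition(["interested"], ["music"], toy_count)
print(best, round(relative["in"], 3))
```

With these invented counts the sketch selects "in", since that variant accounts for the largest share of the observed n-grams. When no variant is attested, it abstains, which mirrors the abstract's caveat that the approach depends on enough data being available.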

Authors: Anas Elghafari, Detmar Meurers, Holger Wunsch

Affiliation: Seminar für Sprachwissenschaft, Universität Tübingen

Conference type: International conference

Conference name: The 23rd International Conference on Computational Linguistics

Conference location: Beijing

Conference language: English

Pages: 267-275

Online publication date: 2010-08-01 (date first made available on the Wanfang platform; not necessarily the paper's publication date)
