Research on the Limitations of Statistically-based NLP Models
This paper discusses the limitations of statistically-based natural language processing (NLP) models from the perspective of linguistic theory, by introducing and commenting on the working mechanism of statistical language models (SLM) and their application cases. First, it reviews studies of the statistical structure of language carried out under the influence of information theory, especially Chomsky's demonstration that a finite state grammar (FSG) based on a Markov process is not sufficient for describing natural language. It then gives a detailed account of the mechanism of SLM and its possible fields of application by discussing N-state grammar and its use in part-of-speech tagging. Next, it discusses the recursive property of linguistic structure and the structure-dependent property of linguistic knowledge. It points out that recursive nested structures upset statistical regularity, and that the structure-dependency of linguistic knowledge undermines the independence assumption on which SLM relies. Finally, it suggests that the right track for NLP may be a combination of the rule-based approach and the statistically-based approach, because natural language is a heterogeneous system.
statistical models; recursion; structure-dependent property; finite state grammar
GONG Zhiqi; YIN Xiaojing
School of Foreign Languages, University of Jinan, Jinan, Shandong, P.R. China, 250022; School of Foreign Languages, Shandong Economic University, Jinan, Shandong, P.R. China, 250014
International conference
Weihai
English
252-259
2010-07-24 (date first posted on the Wanfang platform; not the paper's publication date)