Automatic Cantonese POS Tagging with Information Transference
Cantonese enjoys great influence in several regions of China, Southeast Asia, and many Western countries. However, research on Cantonese language processing is quite rare. In this work, we aim to build a Cantonese part-of-speech (POS) tagger based on the averaged perceptron algorithm. In addition, we propose to take advantage of English POS information from the Penn Treebank and Mandarin POS information from the People's Daily annotated corpus to further improve the performance of the model. Our experimental results indicate the validity of such information transference. With accuracy comparable to state-of-the-art Mandarin POS taggers, the model is also expected to generalize to other text. We also anticipate that our work could provide some insights for subsequent research on natural language processing of less popular languages, especially when available resources for these languages are limited.
Cantonese; perceptron algorithm; part-of-speech tagging; information transference
Sihui Fu, Shengyi Jiang, Lindong Guo
School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China
Domestic conference
Beijing
English
561-564
2017-06-02 (date first posted online on the Wanfang platform; does not represent the paper's publication date)