Automatic Cantonese POS Tagging with Information Transference

Abstract:

Cantonese enjoys great influence in several regions of China, Southeast Asia, and many Western countries. However, research on language processing for Cantonese is quite rare. In this work, we aim to build a Cantonese part-of-speech (POS) tagger based on the averaged perceptron algorithm. We also propose to take advantage of English POS information from the Penn Treebank and Mandarin POS information from the People's Daily annotated corpus to further improve the performance of the model. Our experimental results indicate the validity of such information transference. With accuracy comparable to that of state-of-the-art Mandarin POS taggers, the model is also expected to generalize to other text. We also anticipate that our work could provide some insights for subsequent research on natural language processing of less popular languages, especially when the resources available for these languages are limited.
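For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of an averaged perceptron tagger. It is not the authors' implementation: the feature templates, the toy corpus, the tag labels, and the `aux_tags` lookup are all assumptions made for illustration. The `aux_tags` feature is only one plausible way to feed a tag from another language's resource into the model; the abstract does not say how the transference is actually done.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass averaged perceptron over sparse string features."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(lambda: defaultdict(float))  # feature -> tag -> weight
        self._totals = defaultdict(float)  # (feature, tag) -> accumulated weight
        self._stamps = defaultdict(int)    # (feature, tag) -> step of last change
        self.step = 0

    def predict(self, features):
        scores = {tag: 0.0 for tag in self.classes}
        for feat in features:
            for tag, w in self.weights.get(feat, {}).items():
                scores[tag] += w
        # Ties are broken by tag name so that results are deterministic.
        return max(self.classes, key=lambda tag: (scores[tag], tag))

    def update(self, truth, guess, features):
        self.step += 1
        if truth == guess:
            return
        for feat in features:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (feat, tag)
                # Credit the old weight for every step it was in force.
                self._totals[key] += (self.step - self._stamps[key]) * self.weights[feat][tag]
                self._stamps[key] = self.step
                self.weights[feat][tag] += delta

    def average(self):
        """Replace each weight with its mean over all training steps."""
        if self.step == 0:
            return
        for feat, per_tag in self.weights.items():
            for tag, w in per_tag.items():
                key = (feat, tag)
                total = self._totals[key] + (self.step - self._stamps[key]) * w
                per_tag[tag] = total / self.step


def extract_features(words, i, prev_tag, aux_tags=None):
    """Hypothetical feature templates; the paper's real templates are not given here."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    feats = {"bias", "word=" + word, "prev_tag=" + prev_tag,
             "prev_word=" + prev_word, "next_word=" + next_word}
    if aux_tags and word in aux_tags:
        # Assumption: a tag borrowed from another language's lexicon.
        feats.add("aux_tag=" + aux_tags[word])
    return feats


def train(sentences, epochs=5, aux_tags=None):
    tags = {tag for sent in sentences for _, tag in sent}
    model = AveragedPerceptron(tags)
    for _ in range(epochs):
        for sent in sentences:
            words = [w for w, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = extract_features(words, i, prev, aux_tags)
                guess = model.predict(feats)
                model.update(gold, guess, feats)
                prev = guess  # condition on the model's own history, as at test time
    model.average()
    return model


def tag(model, words, aux_tags=None):
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = model.predict(extract_features(words, i, prev, aux_tags))
        out.append(prev)
    return out


# Toy corpus with made-up coarse tags, for illustration only.
TOY_CORPUS = [
    [("我", "PRON"), ("食", "VERB"), ("飯", "NOUN")],
    [("佢", "PRON"), ("飲", "VERB"), ("茶", "NOUN")],
]
# Illustrative word-to-tag hints in the style of a Mandarin lexicon.
MANDARIN_HINTS = {"我": "r", "茶": "n"}

if __name__ == "__main__":
    model = train(TOY_CORPUS, epochs=5, aux_tags=MANDARIN_HINTS)
    print(tag(model, ["我", "飲", "茶"], aux_tags=MANDARIN_HINTS))
```

Averaging the weights over all training steps, rather than keeping only the final weights, is what distinguishes this from a plain perceptron; it tends to reduce the model's sensitivity to the last few training examples.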

Keywords: Cantonese; perceptron algorithm; part-of-speech tagging; information transference

Authors: Sihui Fu, Shengyi Jiang, Lindong Guo

Affiliation: School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China

Conference type: Domestic conference

Conference name: The 21st Global Chinese Conference on Computers in Education (GCCCE)

Conference location: Beijing

Conference language: English

Pages: 561-564

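Online publication date: 2017-06-02 (the date the record first went online on the Wanfang platform; this is not necessarily the paper's publication date)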
