Conference topic

Automatic Cantonese POS Tagging with Information Transference

Cantonese has considerable influence in several regions of China, Southeast Asia, and many Western countries. However, research on Cantonese language processing is scarce. In this work, we build a Cantonese part-of-speech (POS) tagger based on the averaged perceptron algorithm. We further propose exploiting English POS information from the Penn Treebank and Mandarin POS information from the People's Daily annotated corpus to improve the model's performance. Our experimental results support the validity of such information transference. With accuracy comparable to state-of-the-art Mandarin POS taggers, the model is also expected to generalize to other text. We also anticipate that our work could offer insights for subsequent research on natural language processing of less popular languages, especially when available resources for these languages are limited.

Cantonese; perceptron algorithm; part-of-speech tagging; information transference

Sihui Fu, Shengyi Jiang, Lindong Guo

School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou, China

Domestic conference

The 21st Global Chinese Conference on Computers in Education (第21届全球华人计算机教育应用大会)

Beijing

English

561-564

2017-06-02 (date first posted online on the Wanfang platform; not the paper's publication date)