STATISTIC-BASED CHINESE ORGANIZATION NAME RECOGNITION

Abstract:

Organization names are proper nouns that occur frequently in text but change constantly. Chinese organization name recognition is a non-trivial task in named entity recognition (NER). Compared with other entities such as person and location names, Chinese organization names are the most difficult to identify. Statistic-based approaches to automatic NER are currently widely studied. In this paper, we try to clarify several puzzling problems in statistic-based Chinese organization name recognition and report experimental conclusions. Does the encoding scheme used in a classification-based recognition system affect performance, and if so, by how much? Should we build one identification model for all types of named entities, or one model for each type? Does recognizing Chinese organization names after person and location identification outperform the parallel approach? Which is better, word-based or character-based Chinese organization name recognition? Our conclusions are drawn from the SIGHAN Bakeoff NER datasets.

Keywords: Chinese Organization Name Recognition Conditional Random Fields Model Feature

Authors: Ying Qin, Xiaojie Wang, Yixin Zhong

Affiliations: National Research Centre for Foreign Language Education, Beijing Foreign Studies University, Beijing; Beijing University of Posts and Telecommunications, Beijing 100876, China

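Conference type: International conference

Conference name: China-Ireland International Conference on Information and Communications Technologies 2008 (CIICT 2008)

Conference location: Beijing

Conference language: English

Pages: 1-5

Online publication date: 2008-09-26 (date first made available on the Wanfang platform; not the paper's publication date)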
