Conference Paper

Exploiting Rich Features for Chinese Named Entity Recognition

In this paper we design a multiple-feature template for a CRF-based Chinese named entity recognizer. The template includes basic features, prefix and suffix features, dictionary features, and combined features. We first apply pre-processing steps such as POS tagging and dictionary-based chunking. For the dictionary features, different proportions of the dictionaries are used in training and testing, which differs from the work reported in the literature; this applies especially to the person name, location name, and organization name dictionaries. For these three named entity dictionaries, the training dictionaries are only a subset of the testing dictionaries. Empirical results show that the multiple-feature template is comprehensive and that using different proportions of some dictionaries in training and testing improves performance significantly. Our final system achieved an F-measure of 91.27% on the MSRA test corpus, which is better than the SIGHAN 2006 results on the same test corpus.

Jianping Shen, Xuan Wang, Shaofeng Li, Lin Yao

Computer Application Research Center, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China, 518055

International conference

The 2010 International Conference on Intelligent Systems and Knowledge Engineering (the 5th International Conference on Intelligent Systems and Knowledge Engineering)

Hangzhou

English

278-282

2010-11-15 (date first made available online on the Wanfang platform; not the paper's publication date)