A Topic Model of Observing Chinese Characters

Abstract:

Topic models are a class of hierarchical statistical models for analyzing document collections, and they have become one of the most widely used techniques in natural language processing in recent years. They assume that each document can be expressed as a mixture of topics and that each topic can be characterized by a distribution over words. In previous research [6], topic models for Chinese have used words as the observed data, as is done for English. In this research, we demonstrate the effectiveness of using Chinese characters as the basic units of observed data. Comparisons with models based on Chinese words and on English words are presented.
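To make the idea of characters as observation units concrete, the following is a minimal sketch of how a document collection might be turned into character-level count vectors, the kind of input a topic model consumes. It is not the authors' preprocessing, which this record does not describe. The function names `char_units` and `bag_of_units` are hypothetical, and treating only the CJK Unified Ideographs range (U+4E00 to U+9FFF) as Chinese characters is a simplifying assumption.

```python
from collections import Counter


def char_units(text):
    """Split text into single Chinese characters.

    Assumption: only code points in U+4E00..U+9FFF count as characters;
    punctuation, Latin letters, and whitespace are dropped.
    """
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def bag_of_units(docs, tokenize):
    """Build a sorted vocabulary and one count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for units that do not occur in a document.
    matrix = [[c[u] for u in vocab] for c in counts]
    return vocab, matrix


# Two toy documents; each row of `matrix` is a character-count vector.
docs = ["我爱北京", "北京大学"]
vocab, matrix = bag_of_units(docs, char_units)
```

A word-based model would need a word segmenter in place of `char_units`, since Chinese text has no spaces between words; the character-level variant needs no segmentation step.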

Authors: Yunkai Zhang, Zengchang Qin

Affiliations: College of Software, Beihang University, Beijing, China, 100191; Intelligent Computing and Machine Learning Lab, School of Automation Science and Electrical Engineeri

Conference type: International conference

Conference name: 2010 Second International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC 2010)

Conference location: Nanjing

Conference language: English

Pages: 346-349

Online publication date: 2010-08-26 (date first posted on the Wanfang platform; not the paper's publication date)
