Which Embedding Level is Better for Semantic Representation? An Empirical Research on Chinese Phrases
Word embeddings have been used as popular features in various Natural Language Processing (NLP) tasks. To overcome the coverage problem of statistics, compositional models have been proposed, which embed the basic units of a language and compose them into representations of higher-level structures such as idioms, phrases, and named entities. In that setting, selecting the right level of basic-unit embedding to represent the semantics of higher-level units is crucial. This paper investigates this problem through a Chinese phrase representation task, in which characters and words are viewed as the basic units. We define phrase representation evaluation tasks by utilizing Wikipedia. We propose four intuitive composing methods that build higher-level representations from basic-unit embeddings, and compare the performance of the two kinds of basic units. Empirical results show that, with all composing methods, word embeddings outperform character embeddings on both tasks, which indicates that the word level is more suitable for composing semantic representations.
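The abstract contrasts composing a phrase vector from character embeddings versus word embeddings. The paper's four composing methods and its embedding tables are not given here, so the sketch below uses additive (mean) composition over tiny made-up vectors purely as an illustration of the two composition levels:

```python
import math

# Hypothetical toy embeddings (NOT the paper's vectors); mean composition
# stands in for whichever of the four composing methods is used.
char_emb = {
    "机": [0.2, 0.1],
    "器": [0.0, 0.3],
    "学": [0.4, 0.2],
    "习": [0.1, 0.5],
}
word_emb = {
    "机器": [0.1, 0.2],
    "学习": [0.3, 0.3],
}

def compose(units, table):
    """Average the basic-unit embeddings to obtain a phrase vector."""
    vecs = [table[u] for u in units]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# The phrase 机器学习 ("machine learning") composed at each level:
phrase_from_chars = compose(["机", "器", "学", "习"], char_emb)
phrase_from_words = compose(["机器", "学习"], word_emb)
```

The evaluation question the paper asks is then which of the two resulting phrase vectors better matches the phrase's semantics.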
Word embedding; Phrase representation; Composing model
Kunyuan Pang, Jintao Tang, Ting Wang
College of Computer, National University of Defense Technology, Changsha, Hunan 410073, People's Republic of China
International conference
2018 International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)
Hohhot
English
54-66
2018-08-26 (date first posted on the Wanfang platform; does not represent the paper's publication date)