Conference Proceedings

Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation

Large corpus-based embedding methods have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect the human intelligence contained in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embedding and Tongyici Cilin. We also utilize retrieval techniques to extend the contexts of word pairs and calculate similarity scores to weakly supervise the selection of a better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we ranked No. 2 with Spearman/Pearson rank correlation coefficients of 0.457/0.455. After the submission, we boosted the embedding model by merging an English model into the Chinese one and learning the co-occurrence sequence via LSTM networks. Our final results are 0.541/0.514, which, to the best of our knowledge, outperform the state-of-the-art performance.
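The abstract's core idea, interpolating an embedding-based similarity with a Tongyici Cilin-based lexicon score and then evaluating systems by Spearman/Pearson correlation against human ratings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cilin_sim`, the `embeddings` lookup, and the mixing weight `alpha` are hypothetical placeholders not specified in this record.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def cosine_sim(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_sim(w1, w2, embeddings, cilin_sim, alpha=0.5):
    # Linear interpolation of an embedding score and a lexicon score.
    # `embeddings` maps a word to its vector; `cilin_sim` is an assumed
    # function returning a Tongyici Cilin-based score in [0, 1];
    # `alpha` is an assumed mixing weight, not a value from the paper.
    emb = cosine_sim(embeddings[w1], embeddings[w2])
    lex = cilin_sim(w1, w2)
    return alpha * emb + (1 - alpha) * lex

def evaluate(pairs, gold, embeddings, cilin_sim):
    # Score word pairs and correlate predictions with gold human
    # ratings, the evaluation protocol used in the CLSC shared task.
    pred = [combined_sim(w1, w2, embeddings, cilin_sim) for w1, w2 in pairs]
    return spearmanr(gold, pred).correlation, pearsonr(gold, pred)[0]
```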

Chinese word similarity; Word embedding; Semantic Lexicons; LSTM networks

Jiahuan Pei, Cong Zhang, Degen Huang, Jianjun Ma

School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; School of Foreign Languages, Dalian University of Technology, Dalian, Liaoning 116024, China

International conference

The Fifth Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL 2016)

Kunming

English

1-12

2016-12-02 (date of first online posting on the Wanfang platform; does not represent the paper's publication date)