Conference Proceedings

Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation

Large corpus-based embedding methods have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect the human intelligence contained in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embedding and Tongyici Cilin. We also utilize retrieval techniques to extend the contexts of word pairs and calculate similarity scores to weakly supervise the selection of a better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we ranked No. 2 with Spearman/Pearson rank correlation coefficients of 0.457/0.455. After the submission, we boosted the embedding model by merging an English model into the Chinese one and learning the co-occurrence sequence via LSTM networks. Our final results are 0.541/0.514, which, to the best of our knowledge, outperform the state-of-the-art performance.
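The abstract's core idea, interpolating an embedding-based similarity with a Tongyici Cilin-based lexicon score and then evaluating systems by Spearman/Pearson correlation against human ratings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cilin_sim`, the `embeddings` lookup, and the mixing weight `alpha` are hypothetical placeholders not specified in this record.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def cosine_sim(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_sim(w1, w2, embeddings, cilin_sim, alpha=0.5):
    # Linear interpolation of an embedding score and a lexicon score.
    # `embeddings` maps a word to its vector; `cilin_sim` is an assumed
    # function returning a Tongyici Cilin-based score in [0, 1];
    # `alpha` is an assumed mixing weight, not a value from the paper.
    emb = cosine_sim(embeddings[w1], embeddings[w2])
    lex = cilin_sim(w1, w2)
    return alpha * emb + (1 - alpha) * lex

def evaluate(pairs, gold, embeddings, cilin_sim):
    # Score word pairs and correlate predictions with gold human
    # ratings, the evaluation protocol used in the CLSC shared task.
    pred = [combined_sim(w1, w2, embeddings, cilin_sim) for w1, w2 in pairs]
    return spearmanr(gold, pred).correlation, pearsonr(gold, pred)[0]
```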

Chinese word similarity; Word embedding; Semantic Lexicons; LSTM networks

Jiahuan Pei, Cong Zhang, Degen Huang, Jianjun Ma

School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China; School of Foreign Languages, Dalian University of Technology, Dalian, Liaoning 116024, China

International conference

The Fifth Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL 2016)

Kunming

English

1-12

2016-12-02 (date of first online posting on the Wanfang platform; does not represent the paper's publication date)