Measuring the Semantic Stability of Word Embedding
The techniques of word embedding have a wide range of applications in natural language processing (NLP). However, recent studies have revealed that word embeddings suffer from considerable instability, which degrades performance in downstream tasks and hinders their application in safety-critical fields such as medical diagnosis and financial analysis. Further research has found that the popular metric of Nearest Neighbors Stability (NNS) is unreliable for qualitative conclusions on diachronic semantic matters, which means NNS cannot fully capture the semantic fluctuations of word vectors. To measure semantic stability more accurately, we propose a novel metric that combines Nearest Senses Stability (NSS) and Aligned Sense Stability (ASS). Moreover, previous studies on word embedding stability focus on static embedding models such as Word2vec and ignore contextual embedding models such as BERT. In this work, we propose the SPIP metric based on Pairwise Inner Product (PIP) loss to extend the stability study to contextual embedding models. Finally, the experimental results demonstrate that the combined metric (CS) and SPIP are effective for choosing parameter configurations that minimize embedding instability without training downstream models, outperforming the state-of-the-art metric NNS.
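As a rough illustration of two quantities named in the abstract (not the authors' own implementation), Nearest Neighbors Stability can be read as the average overlap of k-nearest-neighbor sets between two embedding spaces trained under different conditions (e.g., different random seeds), while the PIP loss underlying SPIP compares the pairwise-inner-product matrices of two embedding matrices. The sketch below assumes row-aligned vocabularies, cosine similarity, and k = 10; these choices, and the file names in the usage comment, are assumptions for illustration only.

import numpy as np

def _topk_neighbors(E, k):
    # For each word (row of E), the set of its k nearest neighbors by cosine similarity.
    normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)                 # exclude the word itself
    idx = np.argpartition(-sims, k, axis=1)[:, :k]  # indices of the k largest similarities
    return [set(row) for row in idx]

def nns(E1, E2, k=10):
    # Nearest Neighbors Stability: mean overlap of k-NN sets across two embedding runs.
    n1, n2 = _topk_neighbors(E1, k), _topk_neighbors(E2, k)
    return float(np.mean([len(a & b) / k for a, b in zip(n1, n2)]))

def pip_loss(E1, E2):
    # PIP loss: Frobenius norm of the difference of pairwise-inner-product matrices.
    return float(np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro"))

# Hypothetical usage: two (V, d) embedding matrices over the same vocabulary,
# trained with different random seeds.
# E1, E2 = np.load("run1.npy"), np.load("run2.npy")
# print(nns(E1, E2, k=10), pip_loss(E1, E2))

Lower NNS overlap or larger PIP loss indicates less stable embeddings; how SPIP adapts the PIP loss to contextual models is detailed in the paper itself.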
Static word embeddings; Contextual word embeddings; Semantic stability
Zhenhao Huang; Chenxu Wang
School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; MOE Key Lab of Intelligent Netw
International conference
9th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2020)
Zhengzhou
English
1229-1241
2020-10-14 (date first posted online on the Wanfang platform; not the paper's publication date)