Measuring the Semantic Stability of Word Embedding
The techniques of word embedding have a wide range of applications in natural language processing (NLP). However, recent studies have revealed that word embeddings suffer from considerable instability, which degrades performance in downstream tasks and hinders their application in safety-critical fields such as medical diagnosis and financial analysis. Further research has found that the popular metric of Nearest Neighbors Stability (NNS) is unreliable for qualitative conclusions on diachronic semantic matters, which means NNS cannot fully capture the semantic fluctuations of word vectors. To measure semantic stability more accurately, we propose a novel metric that combines Nearest Senses Stability (NSS) and Aligned Sense Stability (ASS). Moreover, previous studies on word embedding stability focus on static embedding models such as Word2vec and ignore contextual embedding models such as BERT. In this work, we propose the SPIP metric based on Pairwise Inner Product (PIP) loss to extend the stability study to contextual embedding models. Finally, the experimental results demonstrate that the combined metric (CS) and SPIP are effective for choosing parameter configurations that minimize embedding instability without training downstream models, outperforming the state-of-the-art metric NNS.
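As a rough illustration of two quantities named in the abstract (not the authors' own implementation), Nearest Neighbors Stability can be read as the average overlap of k-nearest-neighbor sets between two embedding spaces trained under different conditions (e.g., different random seeds), while the PIP loss underlying SPIP compares the pairwise-inner-product matrices of two embedding matrices. The sketch below assumes row-aligned vocabularies, cosine similarity, and k = 10; these choices, and the file names in the usage comment, are assumptions for illustration only.

import numpy as np

def _topk_neighbors(E, k):
    # For each word (row of E), the set of its k nearest neighbors by cosine similarity.
    normed = E / np.linalg.norm(E, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)                 # exclude the word itself
    idx = np.argpartition(-sims, k, axis=1)[:, :k]  # indices of the k largest similarities
    return [set(row) for row in idx]

def nns(E1, E2, k=10):
    # Nearest Neighbors Stability: mean overlap of k-NN sets across two embedding runs.
    n1, n2 = _topk_neighbors(E1, k), _topk_neighbors(E2, k)
    return float(np.mean([len(a & b) / k for a, b in zip(n1, n2)]))

def pip_loss(E1, E2):
    # PIP loss: Frobenius norm of the difference of pairwise-inner-product matrices.
    return float(np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro"))

# Hypothetical usage: two (V, d) embedding matrices over the same vocabulary,
# trained with different random seeds.
# E1, E2 = np.load("run1.npy"), np.load("run2.npy")
# print(nns(E1, E2, k=10), pip_loss(E1, E2))

Lower NNS overlap or larger PIP loss indicates less stable embeddings; how SPIP adapts the PIP loss to contextual models is detailed in the paper itself.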
Static word embeddings; Contextual word embeddings; Semantic stability
Zhenhao Huang; Chenxu Wang
School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; MOE Key Lab of Intelligent Netw
International conference
9th CCF International Conference on Natural Language Processing and Chinese Computing (NLPCC 2020)
Zhengzhou
English
1229-1241
2020-10-14 (date first posted online on the Wanfang platform; not the paper's publication date)