Improving Word Embeddings for Low Frequency Words by Pseudo Contexts

This paper investigates the relation between word semantic density and word frequency. A word average similarity, computed from distributed representations, is defined as the measure of word semantic density. We find that the average similarities of low frequency words are always larger than those of high frequency words, and that when the frequency approaches around 400, the average similarity tends to stabilize. This finding holds under changes in the size of the training corpus, the dimension of the distributed representations, and the number of negative samples in the skip-gram model. It also holds across 17 different languages. Based on this finding, we propose a pseudo-context skip-gram model, which makes use of the context words of the semantic nearest neighbors of target words. Experimental results show our model achieves significant performance improvements in both word similarity and analogy tasks.
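The record only describes the average-similarity measure at a high level, so the following is a minimal sketch of one plausible reading: the average cosine similarity between a word's embedding and its k nearest neighbors, used as a proxy for semantic density. The neighborhood size k, the use of cosine similarity, and the function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def average_similarity(word, embeddings, k=10):
    """Average cosine similarity between `word` and its k nearest neighbors.

    Assumed reading of the paper's "word average similarity":
    `embeddings` maps words to unit-normalized vectors, so the dot
    product equals cosine similarity. k=10 is an arbitrary choice.
    """
    v = embeddings[word]
    sims = sorted(
        (float(np.dot(v, u)) for w, u in embeddings.items() if w != word),
        reverse=True,
    )
    return sum(sims[:k]) / k

# Toy usage with random unit vectors (illustration only; real use would
# load skip-gram embeddings trained on a corpus).
rng = np.random.default_rng(0)
vocab = ["cat", "dog", "car", "tree", "house"]
emb = {w: (v := rng.normal(size=50)) / np.linalg.norm(v) for w in vocab}
print(average_similarity("cat", emb, k=3))
```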
Word Embedding; Low Frequency Word
Fang Li, Xiaojie Wang
School of Computer, Beijing University of Posts and Telecommunications, Beijing, China
Domestic conference
The 16th China National Conference on Computational Linguistics (CCL 2017) and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2017)
Nanjing
English
1-11
2017-10-13 (date first posted on the Wanfang platform; does not represent the paper's publication date)