Paraphrase Identification Based on Weighted URAE, Unit Similarity and Context Correlation Feature

Abstract:

This paper proposes a deep learning model for paraphrase identification at both the sentence level and the article level. The model combines a pairwise unit similarity feature with a semantic context correlation feature. Sentences are represented by word and phrase embeddings, while articles are represented by sentence embeddings. The phrase and sentence embeddings are learned from parse trees by Weighted Unfolding Recursive Autoencoders (WURAE), an unsupervised learning algorithm. A unit similarity matrix is then computed by matching the two lists of embeddings, and the pairwise unit similarity feature is extracted from it through CNN and k-max pooling layers. The semantic context correlation feature is captured by a combination of CNN and LSTM: the CNN layers learn collocation information between adjacent units, and the LSTM extracts long-term dependency features of the text from the CNN output. The model is evaluated on MSRPC, a widely used English sentence paraphrase corpus, and on a Chinese article paraphrase corpus. The results show that deep semantic features of text can be extracted based on WURAE, unit similarity, and context correlation features. We release our WURAE code, the deep learning model for paraphrase identification, and pre-trained phrase and sentence embedding data for use by the community.
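The unit similarity matrix and k-max pooling steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and k-max pooling is applied directly to the flattened matrix rather than to CNN feature maps. The function names and the toy 2-dimensional embeddings are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def unit_similarity_matrix(units_a: Sequence[Vector],
                           units_b: Sequence[Vector]) -> List[List[float]]:
    """Entry [i][j] compares unit i of the first text with unit j of the second."""
    return [[cosine(u, v) for v in units_b] for u in units_a]


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values while preserving their original order."""
    if k <= 0:
        return []
    # Indices of the k largest values (ties resolved by earlier position).
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


# Hypothetical toy embeddings: two units in text A, three units in text B.
text_a = [[1.0, 0.0], [0.0, 1.0]]
text_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

matrix = unit_similarity_matrix(text_a, text_b)
flat = [score for row in matrix for score in row]
pooled = k_max_pooling(flat, k=3)
```

Because k-max pooling returns a fixed number of values whatever the matrix size, text pairs with different numbers of units yield feature vectors of the same length, which is one plausible reason for using it ahead of fixed-size layers.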

Keywords: Paraphrase identification; Recursive Autoencoders; Phrase embedding; Sentence embedding; Deep learning; Semantic feature

Authors: Jie Zhou, Gongshen Liu, Huanrong Sun

Affiliations: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; SJTU-Shanghai Songheng Information Content Analysis Joint Lab., Shanghai, China

Conference type: International conference

Conference name: 2018 International Conference on Natural Language Processing and Chinese Computing (NLPCC 2018)

Conference location: Hohhot

Conference language: English

Pages: 41-53

Online publication date: 2018-08-26 (the date the record first appeared on the Wanfang platform, not the publication date of the paper)
