Question Answering with Character-Level LSTM Encoders and Model-Based Data Augmentation
This paper presents a character-level encoder-decoder modeling method for question answering (QA) from large-scale knowledge bases (KBs). This method improves the existing approach [9] in three aspects. First, long short-term memory (LSTM) structures are adopted to replace the convolutional neural networks (CNNs) for encoding the candidate entities and predicates. Second, a new strategy of generating negative samples for model training is adopted. Third, a data augmentation strategy is applied to increase the size of the training set by generating factoid questions with another trained encoder-decoder model. Experimental results on the SimpleQuestions dataset and the Freebase5M KB demonstrate the effectiveness of the proposed method, which improves the state-of-the-art accuracy from 70.3% to 78.8% when the training set is augmented with 70,000 generated triple-question pairs.
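The core component the abstract describes is a character-level LSTM encoder: each character of an entity or predicate string is embedded and fed through an LSTM, and the final hidden state serves as the encoding. Below is a minimal sketch of that idea, not the authors' implementation; the vocabulary, dimensions, and random parameters are illustrative assumptions.

```python
# Sketch of a character-level LSTM encoder (illustrative only; sizes,
# vocabulary, and random weights are assumptions, not from the paper).
import numpy as np

rng = np.random.default_rng(0)

VOCAB = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}
EMB, HID = 8, 16  # embedding and hidden sizes (illustrative)

# Character embeddings and one weight matrix per LSTM gate
# (input, forget, output, candidate cell), acting on [x; h].
E = rng.standard_normal((len(VOCAB), EMB)) * 0.1
W = rng.standard_normal((4, HID, EMB + HID)) * 0.1
b = np.zeros((4, HID))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(text):
    """Return the final LSTM hidden state for a character sequence."""
    h = np.zeros(HID)
    c = np.zeros(HID)
    for ch in text.lower():
        x = E[VOCAB[ch]]
        z = np.concatenate([x, h])
        i = sigmoid(W[0] @ z + b[0])  # input gate
        f = sigmoid(W[1] @ z + b[1])  # forget gate
        o = sigmoid(W[2] @ z + b[2])  # output gate
        g = np.tanh(W[3] @ z + b[3])  # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

vec = encode("barack obama")
print(vec.shape)  # (16,)
```

In the paper's setting, encodings like `vec` would be produced for the question and for each candidate entity and predicate, then compared (e.g. by a similarity score) to rank candidates; working at the character level keeps the vocabulary tiny and handles unseen entity names.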
Question Answering, Knowledge Base, Long Short-Term Memory, Encoder-Decoder
Run-Ze Wang Chen-Di Zhan Zhen-Hua Ling
National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, China
Domestic conference
The 16th China National Conference on Computational Linguistics and the 5th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data
Nanjing
English
1-11
2017-10-13 (date the paper was first posted on the Wanfang platform, not necessarily its publication date)