The Accurate Guidance for Image Caption Generation

Abstract:

The image captioning task focuses on generating a descriptive sentence for a given image. In this work, we propose accurate guidance for image caption generation, which guides the caption model to focus more on the principal semantic objects while producing human-readable sentences, and to generate grammatically high-quality sentences. In particular, we replace the classification network with an object detection network as the multi-level feature extractor, to emphasize what humans care about and to avoid unnecessary model additions. An attention mechanism is used to align the features of the principal objects with the words in the semantic sentence. With this design, the object detection network and the text generation model are combined into a single end-to-end model with fewer parameters. Experimental results on the MSCOCO dataset show that our method is on par with, or even outperforms, the current state of the art.

Keywords: Image caption; Object detection; Attention mechanism; Deep learning

Authors: Xinyuan Qi, Zhiguo Cao, Yang Xiao, Jian Wang, Chao Zhang

Affiliation: National Key Lab of Science and Technology of Multispectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China

Conference type: International conference

Conference name: Chinese Conference on Pattern Recognition and Computer Vision (PRCV2018)

Conference location: Guangzhou

Conference language: English

Pages: 15-26

Online publication date: 2018-11-23 (date first posted on the Wanfang platform; not the paper's publication date)
