Policy Multi-Region Integration for Video Description

Abstract:

As a bridge between video frames and natural language, automatic video description has wide real-world applications. The key challenge of this task is to represent the dynamic information of a video as a compact vector. Most recent progress combines an attention mechanism with an encoder-decoder architecture. This paper introduces a novel Policy-based Multi-Region Attention Model (PMRAM) that integrates information from multiple local regions in a video frame by adaptively learning a location policy. Combined with a temporal attention mechanism, the model selectively focuses on regions across different frames to obtain a compact vector. Our model provides a natural way to fuse temporal and spatial information for video description. Although the model is non-differentiable, it can be trained using reinforcement learning methods. We evaluate our approach on two large-scale benchmark datasets, MSVD and TACoS-MultiLevel. Our approach outperforms the current state of the art on both datasets according to the BLEU, METEOR and CIDEr metrics.
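The abstract gives no implementation details, so the following is only a minimal sketch of two generic ingredients it names, written under stated assumptions: soft temporal attention that pools per-frame feature vectors into one compact context vector, and a REINFORCE-style policy-gradient update for a non-differentiable, sampled region choice. The function names (`temporal_attention`, `sample_region`, `reinforce_grad`), the dot-product scoring, the categorical region policy and the scalar reward and baseline are hypothetical illustrative choices, not the PMRAM implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def temporal_attention(frame_feats, query):
    """Soft attention over frames (assumed dot-product scoring).

    Scores each frame feature against a decoder query, normalises the
    scores with softmax, and returns the weighted sum of the frame
    features (a single compact vector) together with the weights.
    """
    weights = softmax([dot(f, query) for f in frame_feats])
    dim = len(frame_feats[0])
    context = [sum(w * f[d] for w, f in zip(weights, frame_feats))
               for d in range(dim)]
    return context, weights


def sample_region(logits, rng):
    """Hypothetical location policy: sample one region index from a
    categorical distribution given by per-region logits.

    Sampling is non-differentiable, which is why a policy-gradient
    estimator is needed to train the logits.
    """
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_grad(probs, idx, reward, baseline=0.0):
    """REINFORCE gradient of (reward - baseline) * log pi(idx) w.r.t. the logits.

    For a softmax policy, d log pi(idx) / d logit_j = 1[j == idx] - p_j.
    The reward is left abstract; it is an assumption of this sketch.
    """
    advantage = reward - baseline
    return [advantage * ((1.0 if j == idx else 0.0) - p)
            for j, p in enumerate(probs)]


# Usage with toy numbers (all values are illustrative).
rng = random.Random(0)
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = temporal_attention(frames, query=[0.5, 0.5])
logits = [0.2, -0.1, 0.4]
idx, probs = sample_region(logits, rng)
grad = reinforce_grad(probs, idx, reward=1.0, baseline=0.3)
logits = [z + 0.1 * g for z, g in zip(logits, grad)]  # gradient-ascent step
```

With a positive advantage (reward above the baseline), a small ascent step along this gradient raises the probability of the sampled region; a negative advantage lowers it. The abstract does not state which reward PMRAM optimises, so the reward here is a placeholder.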

Keywords: video description; PMRAM; attention mechanism; reinforcement learning

Authors: Junxian Ye, Le Dong, Wenpu Dong, Ning Feng, Ning Zhang

Affiliation: University of Electronic Science and Technology of China, Chengdu, Sichuan

Conference type: International conference

Conference name: 2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)

Conference location: Chengdu

Conference language: English

Pages: 163-167

Online publication date: 2019-05-17 (the date the paper was first posted on the Wanfang platform, not its publication date)
