Conference topic

Policy Multi-Region Integration for Video Description

  As a bridge between video frames and natural-language text, automatic video description technology can be widely used in real life. The key to this task is representing the dynamic information of a video in a compact vector. Most recent progress combines attention mechanisms with an encoder-decoder architecture. This paper introduces a novel Policy-based Multi-Region Attention Model (PMRAM) that integrates information from multiple local regions in a video frame by adaptively learning a location policy. The model also incorporates a temporal attention mechanism that selectively focuses on regions across different frames to obtain a compact vector. It thus provides a natural way to fuse temporal-spatial information for video description. Although the model is non-differentiable, it can be trained using reinforcement learning methods. We evaluate our approach on two large-scale benchmark datasets: MSVD and TACoS-MultiLevel. Our approach outperforms the current state of the art on both datasets according to the BLEU, METEOR, and CIDEr metrics.

video description; PMRAM; attention mechanism; reinforcement learning

Junxian Ye, Le Dong, Wenpu Dong, Ning Feng, Ning Zhang

University of Electronic Science and Technology of China, Chengdu, Sichuan

International conference

2019 China Turing Conference (ACM Turing Celebration Conference - China 2019)

Chengdu

English

163-167

2019-05-17 (date first posted online on the Wanfang platform; not the paper's publication date)