Conference topic

Convolutional LSTM Based Video Object Detection

State-of-the-art performance in object detection has improved significantly over the past two years. Despite this effectiveness on still images, obstacles remain in transferring these powerful detection networks to video object detection. In this work, we present a fast and accurate framework for video object detection that incorporates temporal and contextual information using convolutional LSTM [27]. Moreover, an Encoder-Decoder module is built on the convolutional LSTM to predict the feature map. The framework is trained end-to-end, and it is general and flexible when combined with still-image detection networks. It achieves significant improvements in both speed and accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [21], especially for the more challenging case of objects moving at high speed.

Video object detection; Convolutional LSTM; Encoder-Decoder module

Xiao Wang, Xiaohua Xie, Jianhuang Lai

School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Information Security Technology, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing of the Ministry of Education, Guangzhou, China

International conference

Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2018)

Guangzhou

English

99-109

2018-11-23 (date first made available online on the Wanfang platform; not the paper's publication date)