Convolutional LSTM Based Video Object Detection
The state of the art in object detection has improved significantly over the past two years. Despite their effectiveness on still images, powerful detection networks remain difficult to transfer to video object detection. In this work, we present a fast and accurate framework for video object detection that incorporates temporal and contextual information using convolutional LSTM [27]. Moreover, an Encoder-Decoder module is built on the convolutional LSTM to predict the feature map. The framework is trained end to end and is general and flexible when combined with still-image detection networks. It achieves significant improvements in both speed and accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [21], especially for the more challenging case of objects moving at high speed.
Video object detection; Convolutional LSTM; Encoder-Decoder module
Xiao Wang, Xiaohua Xie, Jianhuang Lai
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Information Security Technology, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing of the Ministry of Education, Guangzhou, China
International conference
Guangzhou
English
99-109
2018-11-23 (date first posted online on the Wanfang platform; not the paper's publication date)