Convolutional LSTM Based Video Object Detection

Abstract:

State-of-the-art object detection performance has improved significantly over the past two years. Despite their effectiveness on still images, obstacles remain to transferring these powerful detection networks to video object detection. In this work, we present a fast and accurate framework for video object detection that incorporates temporal and contextual information using convolutional LSTM [27]. Moreover, we build an Encoder-Decoder module on top of the convolutional LSTM to predict the feature map. The framework is trained end-to-end and is general and flexible enough to be combined with still-image detection networks. It achieves significant improvements in both speed and accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [21], especially for the more challenging fast-moving objects.
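The abstract relies on a convolutional LSTM to carry temporal information from frame to frame while keeping the spatial layout of the feature map. The sketch below illustrates how a generic convolutional LSTM cell does this. It is not the authors' implementation and does not model their Encoder-Decoder module. Assumptions: single-channel 2-D feature maps, zero-padded "same" convolutions (implemented as cross-correlation, as in most deep learning frameworks), scalar biases, and no peephole terms. The names conv2d_same, combine, ConvLSTMCell and run_sequence are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]  # one single-channel 2-D feature map


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Zero-padded 2-D cross-correlation that keeps the input height and width."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # positions outside the map count as zero
                        s += x[rr][cc] * k[i][j]
            out[r][c] = s
    return out


def combine(f, *grids: Grid) -> Grid:
    """Apply f element-wise across grids of identical shape."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def zeros(h: int, w: int) -> Grid:
    return [[0.0] * w for _ in range(h)]


class ConvLSTMCell:
    """Simplified ConvLSTM cell: every gate is a convolution of the input and hidden maps."""

    def __init__(self, params: Dict[str, Tuple[Grid, Grid, float]]):
        # params[gate] = (input kernel W_x, hidden kernel W_h, scalar bias b)
        # for gate in {"i", "f", "o", "g"}; "g" is the candidate cell update.
        self.params = params

    def _pre(self, gate: str, x: Grid, h: Grid) -> Grid:
        wx, wh, b = self.params[gate]
        return combine(lambda p, q: p + q + b, conv2d_same(x, wx), conv2d_same(h, wh))

    def step(self, x: Grid, h: Grid, c: Grid) -> Tuple[Grid, Grid]:
        i = combine(sigmoid, self._pre("i", x, h))    # input gate
        f = combine(sigmoid, self._pre("f", x, h))    # forget gate
        o = combine(sigmoid, self._pre("o", x, h))    # output gate
        g = combine(math.tanh, self._pre("g", x, h))  # candidate update
        c_new = combine(lambda fv, cv, iv, gv: fv * cv + iv * gv, f, c, i, g)
        h_new = combine(lambda ov, cv: ov * math.tanh(cv), o, c_new)
        return h_new, c_new


def run_sequence(cell: ConvLSTMCell, frames: List[Grid]) -> Tuple[Grid, Grid]:
    """Feed per-frame feature maps through the cell; return the final (hidden, cell) maps."""
    h = zeros(len(frames[0]), len(frames[0][0]))
    c = zeros(len(frames[0]), len(frames[0][0]))
    for x in frames:
        h, c = cell.step(x, h, c)
    return h, c
```

Because every gate is computed by convolution rather than by a fully connected layer, the hidden state stays a spatial map with the same height and width as each frame's feature map. This is what lets a detection head consume the temporally aggregated features in place of a single frame's features.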

Keywords: Video object detection; Convolutional LSTM; Encoder-Decoder module

Authors: Xiao Wang; Xiaohua Xie; Jianhuang Lai

Affiliations: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Information Security Technology, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing of the Ministry of Education, Guangzhou, China

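Conference type: International conference

Conference name: Chinese Conference on Pattern Recognition and Computer Vision (PRCV2018)

Conference location: Guangzhou

Conference language: English

Pages: 99-109

Online publication date: 2018-11-23 (the date the record first appeared on the Wanfang platform, not the paper's publication date)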
