Multi-level Three-Stream Convolutional Networks for Video-Based Action Recognition
Deep convolutional neural networks (ConvNets) have shown remarkable capability for visual feature learning and representation. In the field of video-based action recognition, much progress has been made with the development of ConvNets. However, mainstream ConvNets used for video-based action recognition, such as two-stream ConvNets and 3D ConvNets, still lack the ability to represent fine-grained features. In this paper, we propose a novel architecture named the multi-level three-stream convolutional network (MLTSN), which contains three streams, i.e., the spatial stream, the temporal stream, and the multi-level correlation stream (MLCS). The MLCS contains several correlation modules, which fuse appearance and motion features at the same levels and obtain spatial-temporal correlation maps. The correlation maps are further fed into several convolution layers to obtain refined features. The whole network is trained in a multi-step manner. Extensive experimental results show that the performance of the proposed network is competitive with state-of-the-art methods on HMDB51 and UCF101.
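The abstract describes correlation modules that fuse same-level appearance and motion features into spatial-temporal correlation maps. The exact fusion operation is not specified here, so the following is only a minimal NumPy sketch under the assumption of a channel-wise product fusion; the function name `correlation_map` and the normalization are illustrative choices, not the paper's method.

```python
import numpy as np

def correlation_map(appearance, motion):
    """Fuse same-level appearance and motion feature maps (C, H, W)
    into a single spatial-temporal correlation map (H, W).

    Assumed element-wise form: the paper's actual correlation
    operation may differ.
    """
    assert appearance.shape == motion.shape  # both (C, H, W)
    # Channel-wise product, summed over channels -> (H, W) map
    corr = (appearance * motion).sum(axis=0)
    # Scale by sqrt(C) to keep activations in a stable range
    return corr / np.sqrt(appearance.shape[0])

# Toy example: fuse random 64-channel, 7x7 feature maps
rng = np.random.default_rng(0)
app = rng.standard_normal((64, 7, 7))
mot = rng.standard_normal((64, 7, 7))
corr = correlation_map(app, mot)
print(corr.shape)  # (7, 7)
```

In the full architecture, such a map would then be passed through additional convolution layers to produce the refined features mentioned in the abstract.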
Action recognition · Convolutional networks · Multi-level correlation mechanism
Yijing Lv Huicheng Zheng Wei Zhang
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China; Guangdong Key Laboratory of Information Security Technology, 135 West Xingang Road, Guangzhou 510275, China
International conference
Guangzhou
English
237-249
2018-11-23 (date first posted on the Wanfang platform; does not represent the publication date of the paper)