End-to-End Bloody Video Recognition by Audio-Visual Feature Fusion
With the rapid development of Internet technology, the spread of bloody video has become an increasingly serious problem, causing great harm to society. In this paper, a bloody video recognition method based on audio-visual feature fusion is proposed to overcome the limitations of single-modality visual methods. In the absence of publicly available bloody video data, this paper first constructed a bloody video database through web crawling and data augmentation; it then used CNN and LSTM models to extract spatiotemporal features from the visual channel. Meanwhile, audio-channel features were extracted directly from the raw waveforms using a 1D convolutional network. Finally, a neural network with an audio-visual feature fusion layer was constructed to achieve early fusion of the multimodal cues. The proposed method achieves 95% accuracy on the bloody video test data. Experimental results on the self-built bloody video database demonstrate that the extracted audio-visual feature representations are effective and that the proposed multimodal fusion model achieves better and more discriminative recognition performance than single-channel models.
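The following is a minimal PyTorch sketch of the architecture described in the abstract, not the authors' exact model: the per-frame CNN backbone, layer widths, frame count, audio sample length, and fusion/classifier sizes are illustrative assumptions, since the abstract does not specify them. It shows a CNN+LSTM visual branch, a 1D-convolutional audio branch operating on the raw waveform, and early fusion by feature concatenation.

import torch
import torch.nn as nn


class VisualBranch(nn.Module):
    """Per-frame CNN features followed by an LSTM over the frame sequence."""

    def __init__(self, feat_dim=128, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # small per-frame CNN (assumed sizes)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frames):                         # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))             # (B*T, feat_dim)
        x = x.view(b, t, -1)
        _, (h, _) = self.lstm(x)                       # last hidden state as the clip feature
        return h[-1]                                   # (B, hidden_dim)


class AudioBranch(nn.Module):
    """1D convolutions applied directly to the raw waveform."""

    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, wave):                           # wave: (B, 1, num_samples)
        return self.net(wave)                          # (B, out_dim)


class AudioVisualFusionNet(nn.Module):
    """Early fusion: concatenate audio and visual features, then classify."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.visual = VisualBranch()
        self.audio = AudioBranch()
        self.classifier = nn.Sequential(
            nn.Linear(128 + 128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),                # bloody vs. non-bloody
        )

    def forward(self, frames, wave):
        fused = torch.cat([self.visual(frames), self.audio(wave)], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    model = AudioVisualFusionNet()
    frames = torch.randn(2, 16, 3, 112, 112)           # 16 frames per clip (assumed)
    wave = torch.randn(2, 1, 16000)                    # 1 s of 16 kHz audio (assumed)
    print(model(frames, wave).shape)                   # torch.Size([2, 2])

Concatenation before the classifier corresponds to the early-fusion layer mentioned in the abstract; either single-channel branch can be trained alone for the baseline comparison.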
Bloody video recognition; Feature extraction; Multimodal fusion
Congcong Hou, Xiaoyu Wu, Ge Wang
Communication University of China, Beijing, China; Columbia School of Engineering and Applied Science, Computer Science, Columbia University, New York, USA
International conference
Guangzhou
English
501-510
2018-11-23 (date first posted on the Wanfang platform; not the paper's publication date)