Conference Topic

End-to-End Bloody Video Recognition by Audio-Visual Feature Fusion

With the rapid development of Internet technology, the spread of bloody videos has become increasingly serious, causing great harm to society. This paper proposes a bloody video recognition method based on audio-visual feature fusion to overcome the limitations of methods that rely on the visual modality alone. Because no public bloody video dataset is available, this paper first builds a bloody video database using web crawlers and data augmentation. It then uses CNN and LSTM networks to extract the spatiotemporal features of the visual channel, while the audio-channel features are extracted directly from the raw waveforms by a 1D convolutional network. Finally, a neural network with an audio-visual feature fusion layer is constructed to achieve early fusion of the multimodal cues. The proposed method reaches an accuracy of 95% on the bloody video test data. Experimental results on the self-built bloody video database show that the extracted audio-visual feature representations are effective and that the proposed multimodal fusion model achieves better, more discriminative recognition performance than single-channel models.
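The abstract gives no layer configurations, so the following is only a minimal pure-Python sketch of the audio branch and the early-fusion step, under stated assumptions: a 1D convolution applied directly to raw waveform samples, followed by ReLU and global max pooling, yields one audio feature per kernel; early fusion is assumed to be plain concatenation of that audio vector with a visual vector that stands in for the CNN+LSTM output; and a single linear layer with softmax stands in for the classifier. All function names, kernel sizes, strides, and weights are hypothetical and randomly initialized, not the authors' trained model.

```python
import math
import random

# Hypothetical sketch: not the authors' architecture, hyperparameters, or weights.

def conv1d(signal, kernel, stride=1):
    """'Valid' 1D convolution (cross-correlation, as deep-learning libraries compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def audio_features(waveform, kernels, stride):
    """Audio branch on the raw waveform: conv -> ReLU -> global max pool, one value per kernel."""
    return [max(relu(conv1d(waveform, k, stride))) for k in kernels]

def early_fusion(visual_feat, audio_feat):
    """Assumed fusion layer: concatenate both modality vectors into one joint feature."""
    return list(visual_feat) + list(audio_feat)

def linear(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Usage with synthetic data (all values hypothetical).
random.seed(0)
waveform = [math.sin(0.05 * t) for t in range(800)]  # stand-in for raw audio samples
kernels = [[random.uniform(-1, 1) for _ in range(40)] for _ in range(4)]
audio_vec = audio_features(waveform, kernels, stride=4)
visual_vec = [0.2, -0.1, 0.7]  # stand-in for the CNN+LSTM visual feature
fused = early_fusion(visual_vec, audio_vec)
W = [[random.uniform(-0.1, 0.1) for _ in fused] for _ in range(2)]  # 2 classes: bloody / non-bloody
probs = softmax(linear(fused, W, [0.0, 0.0]))
print(len(fused), [round(p, 3) for p in probs])
```

Joining the modalities at the feature level, before any classifier sees them, is what makes this early fusion; in late fusion each modality would produce its own prediction and only the scores would be combined.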

Keywords: Bloody video recognition; Feature extraction; Multimodal fusion

Congcong Hou, Xiaoyu Wu, Ge Wang

Communication University of China, Beijing, China; Columbia School of Engineering and Applied Science, Computer Science, Columbia University, New York, USA

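International conference

Chinese Conference on Pattern Recognition and Computer Vision (PRCV2018)

Guangzhou

English

pp. 501-510

2018-11-23 (date first posted online on the Wanfang platform; not the paper's publication date)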