The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge
In this paper, we propose a multimodal emotion recognition system that combines information from facial, text, and speech data. First, we propose a residual network architecture within the convolutional neural network (CNN) framework to improve facial expression recognition performance; we also perform video frame selection to fine-tune our pre-trained model. Second, while text emotion recognition conventionally deals with clean, well-formed text, here we adopt an automatic speech recognition (ASR) engine to transcribe the speech into text and then apply a Support Vector Machine (SVM) on top of bag-of-words (BoW) features to predict the emotion labels. Third, we extract openSMILE-based utterance-level features and MFCC-GMM-based zero-order statistics features for the subsequent SVM modeling in the speech-based subsystem. Finally, score-level fusion is used to combine the multimodal information. Experiments were carried out on the CCPR 2016 Multimodal Emotion Recognition Challenge database; our proposed multimodal system achieved 36% macro average precision on the test set, outperforming the baseline by 6% absolute.
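The score-level fusion described above can be sketched as a weighted sum of the per-class scores produced by the three subsystems, with the final label taken as the highest-scoring class. The scores and weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical per-utterance emotion scores from the three subsystems
# (rows: utterances, columns: emotion classes). Values are illustrative only.
face_scores = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.5, 0.3]])
text_scores = np.array([[0.5, 0.4, 0.1],
                        [0.1, 0.3, 0.6]])
speech_scores = np.array([[0.7, 0.2, 0.1],
                          [0.3, 0.3, 0.4]])

# Score-level fusion: a weighted sum of the subsystem scores.
# These weights are assumptions; in practice they would be tuned
# on a development set.
weights = (0.4, 0.2, 0.4)
fused = (weights[0] * face_scores
         + weights[1] * text_scores
         + weights[2] * speech_scores)

# The fused prediction is the highest-scoring emotion class per utterance.
predictions = fused.argmax(axis=1)
print(predictions.tolist())  # → [0, 2]
```

Because fusion operates only on subsystem output scores, each modality can be trained and improved independently before combination.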
Multimodal emotion recognition; Residual network; Speech recognition; Text emotion recognition
Gaoyuan He, Jinkun Chen, Xuebo Liu, Ming Li
SYSU-CMU Shunde International Joint Research Institute, Foshan, China; School of Electronics and Inform
International conference
The 7th Chinese Conference on Pattern Recognition (CCPR 2016)
Chengdu
English
707-720
2016-11-03 (date first posted on the Wanfang platform; not necessarily the paper's publication date)