Emotion Recognition in Videos via Fusing Multimodal Features
Emotion recognition is a challenging task with a wide range of applications. In this paper, we present our system in the CCPR 2016 multimodal emotion recognition challenge. Multimodal features from acoustic signals, facial expressions, and speech contents are extracted to recognize the emotion of the character in the video. Among them, the facial CNN feature is the most discriminative feature for emotion recognition. We train SVM and random forest classifiers based on each type of feature and utilize early and late fusion to combine the different modality features. To deal with the class imbalance issue, we propose to adapt the probability thresholds for each emotion class. The macro precision of our best multimodal fusion system achieves 50.34% on the testing set, which significantly outperforms the baseline of 30.63%.
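The abstract mentions adapting per-class probability thresholds to cope with class imbalance. A minimal sketch of one common way such adaptation can work, assuming a scaling rule in which each class probability is divided by its threshold before taking the argmax (the rule and the `predict_with_thresholds` helper are illustrative assumptions, not the paper's exact method):

```python
# Hedged sketch: per-class probability thresholds for imbalanced classes.
# Dividing each class probability by its threshold and taking the argmax
# is one common implementation of threshold adaptation; the paper does
# not spell out its exact rule, so treat this as an assumption.

def predict_with_thresholds(probs, thresholds):
    """Pick the class whose probability most exceeds its threshold.

    probs: per-class probabilities, e.g. from an SVM or random forest
           with probability outputs.
    thresholds: per-class thresholds; lowering the threshold of a rare
                class makes it easier for that class to be predicted.
    """
    scores = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with three emotion classes.
probs = [0.40, 0.25, 0.35]
uniform = [1 / 3, 1 / 3, 1 / 3]   # equal thresholds reduce to plain argmax
adapted = [0.45, 0.35, 0.20]      # lower threshold favours the rare class 2

print(predict_with_thresholds(probs, uniform))   # prints 0 (plain argmax)
print(predict_with_thresholds(probs, adapted))   # prints 2 (rare class wins)
```

With equal thresholds the rule reduces to the usual argmax; lowering a rare class's threshold biases predictions toward it, which is one way to raise macro precision on imbalanced data.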
Emotion recognition · Multimodal feature fusion · CNN features
Shizhe Chen, Yujie Dian, Xiaozhu Lin, Qin Jin, Haibo Liu, Li Lu
Multimedia Computing Laboratory, School of Information, Renmin University of China, Beijing, People's Republic of China; Tencent Inc., Beijing, People's Republic of China
International conference
The 7th Chinese Conference on Pattern Recognition (CCPR 2016)
Chengdu
English
632-644
2016-11-03 (date first posted on the Wanfang platform; not necessarily the paper's publication date)