A Study on Sports Video Classification Based on Audio Analysis and Speech Recognition

Abstract:

This paper proposes a method for classifying sports video through audio analysis. First, a two-pass audio segmentation module is developed as the front end to extract the announcer's speech from the audio stream. Speech recognition is then applied to the speech segments to extract keywords, which serve as features for distinguishing between sports. Finally, based on the keyword spotting (KWS) results and a set of keywords selected for each sport, a score ranking strategy is designed to perform classification automatically. For robust KWS, adaptation techniques are applied to both the acoustic model and the language model, and both significantly improve KWS performance. Fifteen games from seven sports are used to evaluate the system. With all techniques integrated, an average figure of merit (FOM) of 70.74 is achieved on the KWS task, and 100% accuracy is achieved on the sports classification task when all detected keywords of each game are used.

Authors: Li Lu, Qingwei Zhao, Yonghong Yan, Kun Liu

Affiliations: THINKIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences; Sony China Research Lab, Beijing, P.R. China

Conference type: International conference

Conference name: 第十届中国虚拟现实年会 (10th China Annual Conference on Virtual Reality)

Conference location: Shanghai

Conference language: English

Pages: 737-742

Online publication date: 2010-10-20 (the date the record first went online on the Wanfang platform, not the paper's publication date)
