Robust Segmentation for Video Captions with Complex Backgrounds
Caption text contains rich information that can be used for video indexing and summarization. In this paper, we propose an effective caption text segmentation approach to improve OCR accuracy. First, an AlexNet CNN is trained with path signature for text tracking. Next, an improved adaptive thresholding method is used to segment caption text in individual frames. Finally, multi-frame integration is performed with gamma correction and region growing. In contrast to conventional methods, which extract video text from each frame independently, we exploit the specific temporal characteristics of videos to perform segmentation. Moreover, the proposed method can effectively remove complex backgrounds whose intensity is similar to that of the text. Experimental results on different videos and comparisons with other methods show the efficiency of our approach.
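As a rough illustration of why temporal integration helps, the sketch below combines gamma correction, a pixel-wise fusion of several frames, and a threshold. It is not the authors' implementation. It assumes bright, static caption text over a darker, changing background and frames that are already aligned. It substitutes a pixel-wise minimum for the paper's multi-frame integration and a global mean threshold for the improved adaptive thresholding, and it omits CNN-based tracking and region growing. All function names and the default `gamma` value are hypothetical.

```python
def gamma_correct(frame, gamma):
    """Apply gamma correction to a grayscale frame with values in 0..255."""
    return [[255.0 * (p / 255.0) ** gamma for p in row] for row in frame]


def integrate_min(frames):
    """Fuse aligned frames with a pixel-wise minimum.

    Static bright text stays bright in every frame, while a background
    pixel only needs to be dark in one frame to be suppressed.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [min(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def threshold_mean(frame):
    """Binarize with the global mean as threshold (stand-in for adaptive thresholding)."""
    mean = sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))
    return [[1 if p > mean else 0 for p in row] for row in frame]


def segment_caption(frames, gamma=2.0):
    """Gamma-correct each frame, fuse them, and return a binary text mask."""
    corrected = [gamma_correct(f, gamma) for f in frames]
    fused = integrate_min(corrected)
    return threshold_mean(fused)


# Two tiny aligned frames: the pixel at row 0, column 1 is caption text;
# the other bright pixels are background that changes between frames.
frame_a = [[30, 250, 240], [20, 25, 30]]
frame_b = [[235, 250, 40], [25, 20, 35]]
mask = segment_caption([frame_a, frame_b])
```

In this toy input, each frame taken alone contains a background pixel nearly as bright as the text. Only the fused result isolates the text pixel, which is the intuition behind using temporal information rather than segmenting frames independently.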
Caption text segmentation; Convolutional neural networks; Path signature; Multi-frame integration
Zong-Heng Xing, Fang Zhou, Shu Tian, Xu-Cheng Yin
Department of Computer Science and Technology, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
International conference
The 7th Chinese Conference on Pattern Recognition (CCPR 2016)
Chengdu
English
89-100
2016-11-03 (date first posted online on the Wanfang platform; not the publication date of the paper)