Broadcast News Story Segmentation Using Probabilistic Latent Semantic Analysis and Laplacian Eigenmaps
This paper proposes integrating probabilistic latent semantic analysis (PLSA) and Laplacian Eigenmaps (LE) for broadcast news story segmentation. PLSA can address synonymy and polysemy problems by exploring the underlying semantic relations beneath the actual occurrences of words. LE can provide a data transformation with the advantage of preserving the original temporal structure of sentence cohesive relations. We adopt PLSA statistics to replace term frequency as the representation of sentences and to measure their connective strength. LE analysis is then performed on the connective strength matrix so that the sentence relations become geometrically evident for discriminating different stories. A dynamic programming (DP) algorithm is used for story boundary identification. Experiments show that the proposed method achieves superior story segmentation performance, with the highest F1-measure of 0.7536 on the TDT2 Mandarin BN corpus.
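The LE step mentioned in the abstract can be illustrated with a minimal, generic sketch. It assumes only that a symmetric, non-negative connective strength matrix between sentences is already available; the abstract does not say how that matrix is built from PLSA statistics, so that part is left out. The names `laplacian_eigenmaps`, `W`, and `dim` are illustrative assumptions, and this is a textbook Laplacian Eigenmaps computation, not the authors' implementation.

```python
import numpy as np


def laplacian_eigenmaps(W, dim=2):
    """Embed items from a symmetric, non-negative affinity matrix W.

    Solves the generalized problem L y = lambda * D y (L = D - W) through the
    symmetric normalized Laplacian, and returns the `dim` eigenvectors with
    the smallest eigenvalues after the trivial constant one.
    """
    W = np.asarray(W, dtype=float)
    W = (W + W.T) / 2.0          # enforce symmetry (assumption: W is nearly symmetric)
    np.fill_diagonal(W, 0.0)     # self-affinity does not change L = D - W
    d = W.sum(axis=1)            # node degrees
    if np.any(d <= 0):
        raise ValueError("every item needs at least one positive affinity")

    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)

    # Map back to generalized eigenvectors (y = D^{-1/2} u) and skip the
    # first, trivial eigenvector.
    return d_inv_sqrt[:, None] * eigvecs[:, 1:dim + 1]
```

For a toy matrix in which sentences 0 to 2 are strongly connected to each other, sentences 3 to 5 likewise, and the two groups are only weakly linked, the first embedding coordinate takes opposite signs for the two groups, which is the kind of geometric separation a boundary detector could then exploit.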
Mimi Lu, Lilei Zheng, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Scie; Institute for Infocomm Research, ASTAR, Singapore; Shaanxi Provincial Key Laboratory of Speech and Image Information Processing, School of Computer Sci
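International conference
2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)
Xi'an
English
1-5
2011-10-18 (date first posted online on the Wanfang platform; not the paper's publication date)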