WEB Page Collection Using Automatic Document Segmentation for Spoken Document Retrieval

Abstract:

In spoken document retrieval, speech recognition errors are the main factor affecting retrieval performance. Refining speech recognition technology can improve recognition performance. However, if a query contains out-of-vocabulary words, the spoken documents related to the query cannot be retrieved. This paper describes spoken document retrieval using document expansion based on Web pages whose contents are similar to the spoken documents to be retrieved. Most spoken documents cover several topics. Therefore, each spoken document is automatically divided into segments by topic, and Web pages similar to the spoken document are then collected using a query derived from each segment. Document expansion using the Web improved spoken document retrieval performance from 0.364 to 0.401 in interpolated 11-point average precision.
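The figures above are reported as interpolated 11-point average precision. As a minimal sketch of how that metric is commonly computed for a single query (the function name, the input representation, and the single-query scope are assumptions made for illustration, not details taken from the paper), precision is interpolated at recall levels 0.0, 0.1, ..., 1.0 and the eleven values are averaged:

```python
def interpolated_11pt_average_precision(ranked_relevance, total_relevant):
    """Interpolated 11-point average precision for one query.

    ranked_relevance: list of booleans in rank order (True = relevant).
    total_relevant:   number of relevant documents in the whole collection.
    Both inputs are a hypothetical representation chosen for this sketch.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # Recall and precision observed after each rank position.
    hits = 0
    points = []
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at a recall level is the highest precision
    # observed at any point whose recall is at least that level.
    total = 0.0
    for i in range(11):
        level = i / 10
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Two relevant documents exist; they appear at ranks 1 and 3.
    score = interpolated_11pt_average_precision([True, False, True, False], 2)
    print(round(score, 3))
```

A score for a whole test collection is usually obtained by averaging this value over all queries; whether the paper averages in exactly this way is not stated in the abstract.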

Authors: Hiromitsu Nishizaki, Kiyotaka Sugimoto, Yoshihiro Sekiguchi

Affiliations: Department of Research Interdisciplinary Graduate School of Medicine and Engineering, University of; Department of Education Interdisciplinary Graduate School of Medicine and Engineering, University of

Conference type: International conference

Conference name: 2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Conference location: Xi'an

Conference language: English

Pages: 1-4

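Online publication date: 2011-10-18 (the date the record first went online on the Wanfang platform; it does not indicate when the paper was published)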
