Conference Paper

WEB Page Collection Using Automatic Document Segmentation for Spoken Document Retrieval

In spoken document retrieval, speech recognition errors are the main factor affecting retrieval performance. Refining speech recognition technology can improve recognition performance; however, if a query contains out-of-vocabulary (OOV) words, spoken documents related to the query still cannot be retrieved. This paper describes spoken document retrieval using document expansion based on Web pages whose contents are similar to the spoken documents being retrieved. Most spoken documents cover several topics, so each spoken document is first automatically divided into segments according to topic. Web pages similar to the spoken document are then collected using queries derived from each segment. Document expansion using the Web improved spoken document retrieval performance from 0.364 to 0.401 in interpolated 11-point average precision.
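The abstract reports its gains in interpolated 11-point average precision. The sketch below shows how this standard metric is usually computed: for each query, precision is interpolated at the recall levels 0.0, 0.1, ..., 1.0 (taking the maximum precision at any recall at or above each level), the 11 values are averaged, and the result is averaged over queries. The function names are hypothetical, and averaging uniformly over queries is an assumption; the record does not describe the authors' actual evaluation script.

```python
def eleven_point_interpolated_ap(ranking, relevant):
    """Interpolated 11-point average precision for a single query.

    ranking:  document IDs in the order the system retrieved them.
    relevant: set of document IDs judged relevant to the query.
    """
    if not relevant:
        raise ValueError("query has no relevant documents")
    # Record (recall, precision) after each retrieved document.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level;
        # 0.0 if that recall level is never reached.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


def mean_eleven_point(runs):
    """Average the per-query scores over (ranking, relevant) pairs (assumed uniform mean)."""
    return sum(eleven_point_interpolated_ap(r, rel) for r, rel in runs) / len(runs)
```

For example, a ranking `["d1", "d2", "d3", "d4"]` with relevant set `{"d1", "d3"}` reaches precision 1.0 at recall 0.5 and 2/3 at recall 1.0, giving (6 × 1.0 + 5 × 2/3) / 11 = 28/33 ≈ 0.848.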

Hiromitsu Nishizaki, Kiyotaka Sugimoto, Yoshihiro Sekiguchi

Department of Research, Interdisciplinary Graduate School of Medicine and Engineering, University of
Department of Education, Interdisciplinary Graduate School of Medicine and Engineering, University of

International Conference

2011 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2011)

Xi'an

Language: English

Pages 1-4

2011-10-18 (date first posted on the Wanfang platform; not the paper's publication date)