KEYWORD SPOTTING IN DEGRADED DOCUMENT USING MIXED OCR AND WORD SHAPE CODING

Abstract:

This paper presents a new method for keyword spotting in degraded imaged documents. Two prevalent word indexing techniques, OCR and word shape coding, are tightly combined based on an evaluation of recognition confidence. The basic procedure is as follows. First, OCR candidates are used for OCR indexing. Second, a new stroke feature and a convex-concave feature of words are adopted for word shape coding. Third, an intelligent indexing scheme based on recognition confidence is introduced, which adapts to image quality. Finally, inexact matching is used for word spotting. A collection from NLM, comprising 1553 scanned imaged documents, is used to evaluate the method. The results confirm the validity of the method.
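
The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the stroke or convex-concave features, so a simplified ascender/descender shape code is swapped in here, and the names `shape_code`, `build_index`, `spot`, the 0.8 confidence threshold, and the edit-distance tolerance are all hypothetical choices made for illustration.

```python
# Toy sketch: confidence-based mixed indexing with inexact matching.
# Assumption: a simple ascender/descender code stands in for the paper's
# stroke and convex-concave features, which the abstract does not define.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word):
    """Map each character to A (ascender), D (descender), or x (x-height)."""
    out = []
    for ch in word.lower():
        if ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("D")
        else:
            out.append("x")
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words, threshold=0.8):
    """words: iterable of (position, ocr_text, confidence, image_shape_code).

    Words recognized with high confidence are indexed by their OCR text;
    the rest fall back to the shape code extracted from the word image.
    """
    index = []
    for pos, text, conf, img_code in words:
        if conf >= threshold:
            index.append((pos, "ocr", text.lower()))
        else:
            index.append((pos, "shape", img_code))
    return index


def spot(keyword, index, max_dist=1):
    """Return positions whose index key is within max_dist edits of the keyword."""
    kw = keyword.lower()
    kw_code = shape_code(kw)
    hits = []
    for pos, kind, key in index:
        target = kw if kind == "ocr" else kw_code
        if edit_distance(target, key) <= max_dist:
            hits.append(pos)
    return hits


# Hypothetical data: the second word is badly recognized ("sp0tt1ng"), so it
# is indexed by a shape code (simulated here from the true word "spotting").
demo_index = build_index([
    (0, "keyword", 0.95, shape_code("keyword")),
    (1, "sp0tt1ng", 0.40, shape_code("spotting")),
    (2, "document", 0.90, shape_code("document")),
])
print(spot("spotting", demo_index))
```

In this sketch the low-confidence word is still found because the query is converted to the same shape code before matching, which is one plausible reading of how such indexing could adapt to image quality.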

Keywords: spotting; degraded imaged document; OCR indexing; word shape coding

Authors: Yong Xia, Guangri Quan, Yongdong Xu, Yushan Sun

Affiliation: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Conference type: International conference

Conference name: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2010)

Conference location: Xiamen

Conference language: English

Pages: 411-414

Online publication date: 2010-10-29 (date first posted on the Wanfang platform; not the publication date of the paper)
