Towards Improving the Accuracy of Telugu OCR Systems

Abstract:

Designing a high-accuracy OCR system is challenging because system performance depends on its component modules. Each module has its own impact on the overall accuracy of the OCR system, and an improvement in one module is reflected in overall system performance. In the present work, we have developed an OCR system for Telugu. Our experiments on a corpus of about 1000 images have shown that system performance is degraded by broken characters caused by the binarization module and by improper character segmentation. We therefore address the issues of handling broken characters and poor segmentation. To handle broken characters, we propose a novel approach based on feedback from the distance measure used by the classifier. For character segmentation, our proposed approach exploits the orthographic properties of Telugu script. As a result, a significant improvement is obtained in the performance of the system. These algorithms are generic and may be applicable to other Indian scripts, especially south Indian scripts. Our experiments evaluate end-to-end system performance, which has not been reported in the literature.

Authors: P. Pavan Kumar, Chakravarthy Bhagvati, Atul Negi, Arun Agarwal, B. L. Deekshatulu

Affiliation: Dept. of Computer and Information Sciences, University of Hyderabad, Hyderabad 500 046, India

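Conference type: International conference

Conference name: 11th International Conference on Document Analysis and Recognition (ICDAR)

Conference location: Beijing

Conference language: English

Pages: 910-914

Online publication date: 2011-09-01 (the date the record first went online on the Wanfang platform, not the paper's publication date)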
