Composite Script Identification and Orientation Detection for Indian Text Images

Abstract:

A major preprocessing step in a multi-script OCR system is to identify the script of the test document image. Published papers on script identification usually assume that the test image is in the correct orientation, i.e. 0°. However, a document may be fed to the system in the wrong orientation by mistake, say at an angle of nearly 180° or ±90°. In this paper we propose a script identification method that works under unknown orientation for all 11 official Indian scripts. We first estimate the skew and counter-rotate the document by the skew angle, which leaves it in either the correct (0°) or the upside-down (180°) orientation. Script identification is then performed by a multi-stage tree classifier using features invariant to 0°/180° orientation. Finally, the orientation of the image is determined by a two-class classifier for each script. The proposed method has been tested on a variety of documents, and promising results have been obtained.
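The central idea in the abstract is this: once skew is removed, only two orientations remain, 0° and 180°. Script features must therefore be unchanged by a 180° rotation, and a per-script two-class classifier then decides between the two orientations. The Python sketch below illustrates that idea on a binary text-line image stored as nested lists. It is not the authors' feature set. The peak-to-mean ratio of the horizontal projection profile and the "headline in the upper half" rule are hypothetical stand-ins, and they assume a script with a headline (shirorekha), such as Bangla or Devanagari. All function names are invented for illustration.

```python
from typing import List

Image = List[List[int]]  # binary text-line image: 1 = ink, 0 = background


def rotate180(img: Image) -> Image:
    """Rotate a binary image by 180 degrees (reverse row order and each row)."""
    return [row[::-1] for row in img[::-1]]


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def headline_ratio(img: Image) -> float:
    """Hypothetical 0/180-invariant feature: peak-to-mean ratio of the row profile.

    A 180-degree rotation only reverses the order of the profile, so its
    maximum and mean (and therefore this ratio) stay the same.
    """
    profile = row_profile(img)
    mean = sum(profile) / len(profile)
    return max(profile) / mean if mean > 0 else 0.0


def orientation_headline_script(img: Image) -> int:
    """Hypothetical two-class rule for a headline script: return 0 or 180.

    Assumes the headline (densest row) lies in the upper half of an upright
    text line and in the lower half after a 180-degree rotation.
    """
    profile = row_profile(img)
    peak_row = profile.index(max(profile))
    return 0 if peak_row < len(profile) / 2 else 180


if __name__ == "__main__":
    # Toy 6x8 "word": a full-width headline on row 1 with strokes hanging below it.
    word = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1, 0, 1, 0],
        [0, 1, 1, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    flipped = rotate180(word)
    # The invariant feature is identical for both orientations.
    print(headline_ratio(word), headline_ratio(flipped))
    # The orientation rule separates them.
    print(orientation_headline_script(word), orientation_headline_script(flipped))
```

A 180° rotation only reverses the row profile. Any statistic that ignores row order (maximum, mean, variance) is therefore automatically 0°/180° invariant and suits the script-identification stage. A statistic that depends on where the peak lies is exactly what the orientation stage needs.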

Authors: Shamita Ghosh, Bidyut B. Chaudhuri

Affiliation: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata-700 108, India

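Conference type: International conference

Conference name: 11th International Conference on Document Analysis and Recognition (ICDAR)

Conference location: Beijing

Conference language: English

Pages: 294-298

Online publication date: 2011-09-01 (the date the record was first posted on the Wanfang platform, not the paper's publication date)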
