Composite Script Identification and Orientation Detection for Indian Text Images
A major preprocessing step in multi-script OCR is to identify the script of the test document image. Published work on script identification usually assumes that the test image is in the correct, i.e. 0°, orientation. However, a document may be fed to the system in the wrong orientation by mistake, for example at an angle of nearly 180° or ±90°. In this paper we propose a script identification method that works under unknown orientation for all 11 official Indian scripts. We first estimate the skew and counter-rotate the document by the skew angle, which leaves the document either correctly oriented (0°) or upside down (180°). Script identification is then performed by a multi-stage tree classifier using features that are invariant to 0°/180° orientation. Finally, the orientation of the image is determined by a two-class classifier for each script. The proposed method has been tested on a variety of documents, and promising results have been obtained.
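The three stages described in the abstract can be traced with a minimal sketch. It is not the authors' implementation: the toy image type, the stub functions, and every name below are hypothetical, and the sketch assumes that skew estimation recovers the text-line direction only modulo 180°, which is why deskewing leaves a 0° or 180° ambiguity for the later stages to resolve.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy stand-in for a document image: (script label, orientation
# in degrees). A real system works on pixels; this only traces control flow.
ToyImage = Tuple[str, float]


def fold_to_line_angle(orientation_deg: float) -> float:
    """Assumption: text-line direction is observable only modulo 180 degrees,
    so the estimated skew is the orientation folded into (-90, 90]."""
    a = orientation_deg % 180.0
    return a - 180.0 if a > 90.0 else a


def residual_after_deskew(orientation_deg: float) -> float:
    """Orientation left after counter-rotating by the skew: 0 or 180."""
    return (orientation_deg - fold_to_line_angle(orientation_deg)) % 360.0


def identify(
    image: ToyImage,
    estimate_skew: Callable[[ToyImage], float],
    rotate: Callable[[ToyImage, float], ToyImage],
    classify_script: Callable[[ToyImage], str],
    upside_down_tests: Dict[str, Callable[[ToyImage], bool]],
) -> Tuple[str, float]:
    """Deskew, identify the script, then resolve 0 vs 180 per script."""
    skew = estimate_skew(image)
    deskewed = rotate(image, -skew)            # now at 0 or 180 degrees
    script = classify_script(deskewed)         # must be 0/180-invariant
    flipped = upside_down_tests[script](deskewed)  # two-class decision
    orientation = (skew + (180.0 if flipped else 0.0)) % 360.0
    return script, orientation


# Hypothetical stubs that read the answer from the toy image itself.
def toy_estimate_skew(img: ToyImage) -> float:
    return fold_to_line_angle(img[1])


def toy_rotate(img: ToyImage, angle: float) -> ToyImage:
    return (img[0], (img[1] + angle) % 360.0)


def toy_classify_script(img: ToyImage) -> str:
    return img[0]


def toy_is_upside_down(img: ToyImage) -> bool:
    return img[1] == 180.0


TOY_TESTS = {"Bangla": toy_is_upside_down, "Devanagari": toy_is_upside_down}

if __name__ == "__main__":
    for angle in (3.0, 178.0, -90.0, 90.0):
        result = identify(("Bangla", angle), toy_estimate_skew, toy_rotate,
                          toy_classify_script, TOY_TESTS)
        print(angle, residual_after_deskew(angle), result)
```

Keeping one orientation test per script in a dictionary mirrors the per-script two-class classifier mentioned in the abstract; in a real system each entry would be a trained classifier rather than a stub.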
Shamita Ghosh, Bidyut B. Chaudhuri
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata-700 108, India
International conference
Beijing
English
294-298
2011-09-01 (date first made available online on the Wanfang platform; not necessarily the paper's publication date)