Conference Papers

A Novel Method to Extract Text from Compound Document Images

To separate text embedded in colored and/or complex backgrounds, a novel segmentation algorithm was presented that separates the text from the image in complicated documents where the text overlaps the background. The work can be seen as a new way to realize multi-threshold segmentation. First, the image histogram was fitted with a curve using the least-squares method. Second, the image was split into several layers, including text layers and background layers, and these layers were merged according to a set of given rules to simplify subsequent processing. Finally, all text layers were processed with different techniques to extract the text. Experiments on a large number of such images show that the proposed method outperforms commonly used segmentation methods and has good applicability.
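The histogram-to-layers stage described in the abstract can be sketched in Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The abstract does not name the fitted model, so a least-squares polynomial of a chosen `degree` stands in for the fitted curve. Valleys (local minima) of that curve are assumed to be the layer boundaries. `merge_small_layers` and its `min_fraction` parameter are a hypothetical merge rule, since the paper's rules are not given. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[int, int]  # half-open gray-level range [lo, hi)


def histogram(pixels: Sequence[int], levels: int = 256) -> List[int]:
    """Count the pixels at each gray level 0..levels-1."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def fit_polynomial(ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of y = c0 + c1*x + ... + cd*x**d (assumed curve model).

    x is the bin index rescaled to [0, 1] to keep the normal equations
    well conditioned; they are solved by Gaussian elimination with
    partial pivoting.
    """
    n = len(ys)
    xs = [i / (n - 1) for i in range(n)]
    m = degree + 1
    # Normal equations A c = b: A[j][k] = sum x^(j+k), b[j] = sum y * x^j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = b[r] - sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = s / a[r][r]
    return coeffs


def evaluate(coeffs: Sequence[float], n: int) -> List[float]:
    """Evaluate the fitted polynomial at the n rescaled bin positions."""
    return [sum(c * (i / (n - 1)) ** j for j, c in enumerate(coeffs))
            for i in range(n)]


def valley_thresholds(curve: Sequence[float]) -> List[int]:
    """Local minima of the smoothed histogram, assumed to be layer boundaries."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]]


def split_layers(thresholds: Sequence[int], levels: int = 256) -> List[Layer]:
    """Cut the gray-level axis into contiguous layers at the thresholds."""
    edges = [0, *thresholds, levels]
    return [(lo, hi) for lo, hi in zip(edges, edges[1:]) if lo < hi]


def merge_small_layers(layers: List[Layer], hist: Sequence[int],
                       min_fraction: float = 0.05) -> List[Layer]:
    """Hypothetical merge rule: fold any layer holding less than
    min_fraction of all pixels into its left neighbour, or into its
    right neighbour if it is the first layer."""
    total = sum(hist) or 1
    merged = list(layers)
    i = 0
    while len(merged) > 1 and i < len(merged):
        lo, hi = merged[i]
        if sum(hist[lo:hi]) / total >= min_fraction:
            i += 1
            continue
        if i > 0:
            merged[i - 1] = (merged[i - 1][0], hi)
            del merged[i]
        else:
            merged[1] = (lo, merged[1][1])
            del merged[0]
    return merged
```

A typical chain would be `hist = histogram(gray_pixels)`, `curve = evaluate(fit_polynomial(hist, 6), len(hist))`, then `merge_small_layers(split_layers(valley_thresholds(curve)), hist)`. The polynomial degree acts as the smoothing knob: a low degree suppresses spurious valleys in noisy histograms but may blur genuinely distinct gray-level groups together. The final per-layer text processing is not modeled here because the abstract does not describe it.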

Keywords: image segmentation; complicated color document image; image layers; curve fitting

Huaibo Song, Dongjian He

College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, China

International Conference

2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2010)

Xiamen

Language: English

Pages: 143-146

2010-10-29 (date first posted online on the Wanfang platform; not the paper's publication date)