Character Segmentation for Classical Mongolian Words in Historical Documents
Many classical Mongolian historical documents are preserved only in image form, which makes it inconvenient to search and mine their content. To facilitate word recognition during document digitization, this paper proposes a novel approach to segmenting historical words, in which the characters are intrinsically connected and exhibit considerable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word’s external contour, (2) baseline locating based on a logistic regression model, and (3) segmentation path generation and validation based on heuristic rules and a neural network. The SCPs assist both baseline locating and segmentation path generation. Experiments on the historical Mongolian Kanjur demonstrate that the approach can effectively locate the words’ baselines and segment the words into characters.
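The abstract does not state which polygon approximation underlies step (1). As a minimal sketch only, assuming the widely used Ramer-Douglas-Peucker simplification and a contour given as a list of (x, y) points, the approximation could look like the following. The function names and the `epsilon` tolerance are hypothetical and are not taken from the paper.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Return the Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b coincide.
        return hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate_polygon(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline.

    Assumption: this stands in for the paper's unspecified polygon
    approximation. Points farther than epsilon from the current chord
    are kept as vertices; the rest are dropped.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # Every interior point is close to the chord: keep the endpoints only.
        return [first, last]
    # Split at the farthest point and simplify both halves recursively.
    left = approximate_polygon(points[:index + 1], epsilon)
    right = approximate_polygon(points[index:], epsilon)
    return left[:-1] + right
```

In such a sketch, the vertices that survive simplification would be natural candidates for significant contour points, although the paper's actual SCP criteria may differ.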
Classical Mongolian; Character Segmentation; Logistic Regression; Heuristic Rule; Neural Network
Xiangdong Su, Guanglai Gao, Weihua Wang, Feilong Bao, Hongxi Wei
School of Computer Science, Inner Mongolia University, Hohhot, China 010021
International conference
Chinese Conference on Pattern Recognition, CCPR (2014 National Academic Conference on Pattern Recognition)
Changsha
English
464-473
2014-11-01 (date first posted online on the Wanfang platform; not necessarily the paper's publication date)