Character Segmentation for Classical Mongolian Words in Historical Documents

Abstract:

Many classical Mongolian historical documents are preserved in image form, which makes it inconvenient to search them and mine the desired content. To facilitate word recognition during document digitization, this paper proposes a novel approach for segmenting historical words, in which the characters are intrinsically connected and exhibit considerable overlap and variation. The approach consists of three steps: (1) significant contour point (SCP) detection on the approximated polygon of the word's external contour, (2) baseline location based on a logistic regression model, and (3) segmentation path generation and validation based on heuristic rules and a neural network. The SCPs assist in both baseline location and segmentation path generation. Experiments on the historical Mongolian Kanjur demonstrate that the approach can effectively locate the words' baselines and segment the words into characters.
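To illustrate the polygon approximation that step (1) relies on, here is a minimal sketch. This record does not say which approximation algorithm the authors used, so the Ramer-Douglas-Peucker method, the function names, and the tolerance `epsilon` below are assumptions chosen for illustration only. The sketch simplifies an open polyline; a closed word contour would first have to be split at two points.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def approximate_polyline(points, epsilon):
    """Ramer-Douglas-Peucker simplification (assumed technique, not
    necessarily the one used in the paper).

    Keeps the end points and, recursively, every point that lies
    farther than epsilon from the current approximating segment.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        return [first, last]
    left = approximate_polyline(points[:index + 1], epsilon)
    right = approximate_polyline(points[index:], epsilon)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


# Hypothetical noisy L-shaped contour fragment.
contour = [(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0),
           (4, 1), (4.05, 2), (4, 3), (4, 4)]
vertices = approximate_polyline(contour, epsilon=0.2)
```

The surviving vertices are the candidates from which contour points such as SCPs could then be selected. The selection criteria themselves are not given in this record.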

Keywords: Classical Mongolian; Character Segmentation; Logistic Regression; Heuristic Rule; Neural Network

Authors: Xiangdong Su, Guanglai Gao, Weihua Wang, Feilong Bao, Hongxi Wei

Affiliation: School of Computer Science, Inner Mongolia University, Hohhot, China 010021

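Conference type: International conference

Conference name: Chinese Conference on Pattern Recognition, CCPR (2014 National Conference on Pattern Recognition)

Conference location: Changsha

Conference language: English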

Pages: 464-473

Online publication date: 2014-11-01 (date first posted on the Wanfang platform; not the paper's publication date)
