Conference Topic

Research on Text Line Segmentation of Historical Tibetan Documents Based on the Connected Component Analysis

  Text line segmentation is a critical step in handwritten document recognition, especially in the analysis and recognition of historical documents. Because of the low quality and complexity of these documents (background noise, scattered characters, touching components between consecutive lines), automatic text line segmentation remains an active research topic. In this paper we propose a new method for segmenting text lines from the historical Tibetan scripture "Kangjur", Beijing edition, printed on paper from woodblocks. The method first performs skew detection and correction on the document image and uses projection profiles to obtain the baseline of each text line; each connected component is then allocated to a text line according to its position relative to the baselines. For some connected components, location and shape are analyzed in order to assign them correctly. Because the method operates on connected components rather than pixels, it avoids the noise generated by splitting characters. Experiments show that the method copes effectively with touching text lines and is promising for text line segmentation of historical Tibetan documents.
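The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary image that is already deskewed (skew detection and correction are omitted), it takes the "location relationship" to mean the nearest baseline to a component's centroid, and it does not model the shape analysis used for ambiguous components. All function names and the `ratio` threshold are hypothetical.

```python
from collections import deque


def horizontal_profile(img):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in img]


def find_baselines(profile, ratio=0.5):
    """Return the centre row of each run of rows whose ink count
    is at least ratio * max(profile). The threshold is an assumption."""
    peak = max(profile, default=0)
    if peak == 0:
        return []
    threshold = ratio * peak
    baselines, start = [], None
    for y, count in enumerate(profile + [0]):  # sentinel closes a trailing run
        if count >= threshold and start is None:
            start = y
        elif count < threshold and start is not None:
            baselines.append((start + y - 1) // 2)
            start = None
    return baselines


def connected_components(img):
    """Collect 8-connected foreground regions as lists of (y, x) pixels."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def assign_to_lines(img, ratio=0.5):
    """Map each baseline row to the components whose centroid row is
    closest to it (an assumed stand-in for the location rule)."""
    baselines = find_baselines(horizontal_profile(img), ratio)
    lines = {b: [] for b in baselines}
    if not baselines:
        return lines
    for pixels in connected_components(img):
        cy = sum(y for y, _ in pixels) / len(pixels)
        nearest = min(baselines, key=lambda b: abs(b - cy))
        lines[nearest].append(pixels)
    return lines
```

Because whole components are assigned, a character is never cut in two by a line boundary, which is the advantage over pixel-level splitting that the abstract points to. Components that touch two lines would still need the additional location and shape analysis that the paper describes.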

Historical Tibetan document; Kangjur; Text line segmentation; Component analysis; Location; Shape

Yiqun Wang, Weilan Wang, Zhenjiang Li, Yuehui Han, Xiaojuan Wang

Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, North

International conference

Chinese Conference on Pattern Recognition and Computer Vision (PRCV 2018)

Guangzhou

English

74-87

2018-11-23 (date first posted online on the Wanfang platform; not the paper's publication date)