A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm

Abstract:

A benchmark database plays an essential role in evaluating the performance of touching character string segmentation algorithms. In this paper, we present a new database of touching Tibetan character strings. First, using previously proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and select the samples that are genuinely touching. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two touching characters and 1,399 images of more than two touching characters. It is suitable for evaluating segmentation algorithms for touching Tibetan character strings. For each image, the annotated ground truth file includes class labels, candidate segmentation points, the baseline, and the average stroke width of a single Tibetan character. According to the type of touching, we divide the touching character strings into three types: AB, OB and BB. We also count the samples of each type and find that 76.27% of the samples belong to the third type (BB). Lastly, we report the performance of an over-segmentation algorithm on this database for reference.
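The candidate-finding step mentioned in the abstract (connected component analysis on a text-line image) can be illustrated with a minimal sketch. The abstract does not say which rule the authors used to flag a component as a candidate, so the width-ratio test below, the `ratio` value, and the function names `connected_components` and `candidate_touching` are assumptions made purely for illustration, not the paper's method.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes (left, top, right, bottom) of 8-connected ink regions.

    `binary` is a list of rows; 1 marks an ink pixel, 0 marks background.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes


def candidate_touching(boxes, avg_char_width, ratio=1.5):
    """Keep components wider than ratio * avg_char_width.

    Hypothetical selection rule: an unusually wide component may contain
    several touching characters. The paper's actual criterion is not given.
    """
    return [b for b in boxes if (b[2] - b[0] + 1) > ratio * avg_char_width]


# Toy text line: a 2-pixel-wide glyph, a gap, then two glyphs joined along the top row.
line = [
    [1, 1, 0, 0, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0, 1, 1, 0],
]
boxes = connected_components(line)
candidates = candidate_touching(boxes, avg_char_width=2)
# Component widths are 2 and 5, so only the second component exceeds 1.5 * 2.
```

In a workflow like the one the abstract describes, components flagged this way would only be candidates; the abstract states that the genuinely touching samples were then selected and annotated manually.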

Keywords: Historical Tibetan documents; Touching character

Authors: Quanchao Zhao, Long-long Ma, Lijuan Duan

Affiliations: Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Key Laborat Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Key Laborat

Conference type: International conference

Conference name: Chinese Conference on Pattern Recognition and Computer Vision (PRCV2018)

Conference location: Guangzhou

Conference language: English

Pages: 309-321

Online publication date: 2018-11-23 (date the record first went online on the Wanfang platform; not the paper's publication date)
