Conference Topic

A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm

  A benchmarking database plays an essential role in evaluating the performance of touching character string segmentation algorithms. In this paper, we present a new database of touching Tibetan character strings. First, using previously proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and screen out the correct touching samples. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two touching characters and 1,399 images of more than two touching characters. It can be used to evaluate segmentation algorithms for touching Tibetan character strings. For each image, the annotated ground truth file includes the class labels, the candidate segmentation points, the baseline, and the average stroke width of a single Tibetan character. According to the type of touching, we divide the touching character strings into three types: AB, OB, and BB. We also count the number of samples of each type and find that 76.27% of the samples belong to the third type (BB). Finally, we measure the performance of an over-segmentation algorithm on this database for reference.
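The candidate-finding step mentioned in the abstract (connected component analysis on a text-line image) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names `connected_components` and `candidate_touching`, the 8-connectivity choice, and the width-based rule with its `width_factor` threshold are all assumptions, since the abstract does not say how candidates are selected before manual screening.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions in a binary image given as a list of 0/1 rows."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def candidate_touching(boxes, width_factor=1.5):
    """Hypothetical heuristic (an assumption, not from the paper): flag a
    component as a candidate touching string when it is much wider than
    the median component on the same text line."""
    widths = [right - left + 1 for _, left, _, right in boxes]
    typical = median(widths)
    return [box for box, w in zip(boxes, widths) if w > width_factor * typical]


# Toy text line: two narrow components and one wide component between them.
line_image = [
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
]
candidates = candidate_touching(connected_components(line_image))
```

In the described pipeline, candidates produced by such a step would still be screened and annotated manually before entering the database.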

Historical Tibetan documents; Touching character

Quanchao Zhao, Long-long Ma, Lijuan Duan

Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Key Laborat Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Key Laborat

International conference

Chinese Conference on Pattern Recognition and Computer Vision (PRCV2018)

Guangzhou

English

309-321

2018-11-23 (date first made available online on the Wanfang platform; not the paper's publication date)