Practical Study of Subclasses of Regular Expressions in DTD and XML Schema

Abstract:

DTD and XSD are two popular schema languages widely used for XML documents. Most content models used in DTD and XSD essentially consist of restricted subclasses of regular expressions. However, existing subclasses of content models are all defined on standard regular expressions, without considering counting and interleaving. Through an investigation of real-world data, this paper introduces a new subclass of regular expressions with counting and interleaving. We then give a practical study of this new subclass and five already known subclasses of content models. One distinguishing feature of this paper is that the data set is sufficiently large compared with previous relevant work; therefore, our results are more accurate. In addition, based on this large data set, we analyze the different features of regular expressions used in practice. We are also the first to simultaneously inspect the usage of the five subclasses and to analyze the different reasons why expressions fail to satisfy the corresponding definitions. Furthermore, since the W3C standard requires content models to be deterministic, the determinism of content models is also tested by our validation tools.

Keywords: XML; DTD; XML Schema; Interleaving; Counting

Authors: Yeting Li, Xiaolan Zhang, Feifei Peng, Haiming Chen

Author affiliations: University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 1; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 1

Conference type: International conference

Conference name: International Asia-Pacific Web Conference (18th edition)

Conference location: Suzhou

Conference language: English

Pages: 368-382

Online publication date: 2016-09-23 (the date the record first went online on the Wanfang platform; not the paper's publication date)
