Practical Study of Subclasses of Regular Expressions in DTD and XML Schema
DTD and XSD are two popular schema languages widely used for XML documents. Most content models used in DTD and XSD essentially consist of restricted subclasses of regular expressions. However, existing subclasses of content models are all defined on standard regular expressions, without considering counting and interleaving. Based on an investigation of real-world data, this paper introduces a new subclass of regular expressions with counting and interleaving. We then give a practical study of this new subclass and five already known subclasses of content models. One distinguishing feature of this paper is that the data set is considerably larger than those used in previous related work. Therefore, our results are more accurate. In addition, based on this large data set, we analyze the different features of regular expressions used in practice. We are also the first to simultaneously inspect the usage of the five subclasses and to analyze the different reasons why content models fail to satisfy the corresponding definitions. Furthermore, since the W3C standard requires content models to be deterministic, the determinism of content models is also tested by our validation tools.
XML; DTD; XML Schema; Interleaving; Counting
Yeting Li; Xiaolan Zhang; Feifei Peng; Haiming Chen
University of Chinese Academy of Sciences, Beijing, China; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing
International conference
18th International Asia-Pacific Web Conference
Suzhou
English
368-382
2016-09-23 (date first made available online on the Wanfang platform; not the paper's publication date)