DTD++ 2.0: Adding support for co-constraints Davide Fiorello Nicola Gessa Paolo Marinelli Fabio Vitali University of Bologna

Post on 18-Dec-2015


DTD++ 2.0: Adding support for co-constraints

Davide Fiorello, Nicola Gessa, Paolo Marinelli, Fabio Vitali

University of Bologna


Two sales pitches here

DTDs aren’t dead yet and should not be

Co-constraints are important, and the very next step in validation


The war of schema languages

DTD?

XML Schema?

Relax NG?

Schematron?

ISO/IEC 19757 DSDL (especially Part 9: “Data type- and namespace-aware DTDs”)


My own story

The project NormeInRete (http://www.normeinrete.it): XML-ization of national and regional laws and basically any kind of normative document in Italy

Supported by the Italian Office of the Prime Minister, the Ministry of Justice and the department for Informatics in Public Administration. All national laws and regional laws from 3 (soon 7) of the 20 regions are now available in XML and locatable through URNs.

Yours truly is the main author of the DTDs and documentation manuals providing guidance for conversion.

The document type contains 150+ elements and 50+ attributes, dealing with content, meta-content, evolution in time and space, non-ASCII characters. By the end of the year we will deal with judicial documents.


NormeInRete: DTD or XML Schema?

Started in 1999, the first version of the rules was ready in 2000: necessarily DTD!

The syntax is clear, easy to look up and use, well-known by the users and tool implementers.

The birth of XML Schema created many discussions on whether to switch:
    “All my friends use XML Schema”
    “XML Spy creates very nice drawings of an XML Schema”
    “XML Schema is the future”
    “Admit you don’t know the first thing about XML Schema”

In truth, there is very little real reason to switch: DTDs are fine for our purposes.

So far, the two sides are balanced. European integration may provide the necessary pressure.


But…

… is the switch inevitable?


Are DTDs dead?

The need for an XML-based syntax
    For automatic processing and generation

The presence of strong competition
    XML Schema
    Relax NG

The absence of many important features

Yes, but …
    DTDs are easier to learn
    DTDs are easier to read
    DTDs are easier to use
    Many people still think in terms of DTDs


So: DTD++ 1.0 (Extreme Markup 2003)

The idea: create a DTD-like language that is as powerful as the most powerful validation language: XML Schema.

Syntax from DTD, structures and concepts from XML Schema:
    Namespace support
    Complex types for managing markup structures
    Simple types for managing constraints on data containers

Use as much as possible of DTD syntax, invent as little as possible, recycle concepts with new meanings.


What about XML-based syntax?

Semantic equivalence to another XML-based schema language means this is no longer a problem.

Just convert it!

All human tasks use the original DTD++ form; all computer tasks use the corresponding XSD version. Conversion is easy and fast.


A taste of DTD++ (1)

Anonymous complex types in XSD are content models:
    <!ELEMENT X (A?, (B | C)[2-5], D*) >

Predefined simple types are predefined keywords:
    <!ELEMENT A (#PCDATA)> or <!ELEMENT A (#STRING)>
    <!ELEMENT B (#INTEGER)>
    <!ELEMENT C (#DATE)>

Anonymous simple types add facets to predefined simple types. The syntax for facets uses well-known mathematical constructs: for instance {} for lengths and [] for ranges.
    <!ELEMENT D (#INTEGER[,100])>
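To make the facet notation concrete, here is a minimal Python sketch of how such a simple type might be read as XSD facets. It is not the DTD++ converter: the function name `facets`, the keyword table, and the choice of XSD facet names are assumptions made for illustration, and so is the reading of a reversed bracket (as in `]0,]`) as an open bound.

```python
import re

# Assumed mapping from DTD++ keywords to XSD built-in types (illustrative only).
BUILTINS = {
    "PCDATA": "xsd:string",
    "STRING": "xsd:string",
    "INTEGER": "xsd:integer",
    "DECIMAL": "xsd:decimal",
    "DATE": "xsd:date",
}

# A keyword, optionally followed by a length facet {min,max} or by a range
# facet whose brackets may be reversed to mark an open bound, e.g. ]0,].
PATTERN = re.compile(
    r"#(?P<name>[A-Z]+)"
    r"(?:\{(?P<len>[^}]*)\}"
    r"|(?P<open>[\[\]])(?P<lo>[^,\[\]]*),(?P<hi>[^,\[\]]*)(?P<close>[\[\]]))?"
)


def facets(simple_type):
    """Return the assumed XSD base type and facets for a DTD++ simple type."""
    m = PATTERN.fullmatch(simple_type.strip())
    if m is None or m.group("name") not in BUILTINS:
        raise ValueError(f"unsupported simple type: {simple_type!r}")
    result = {"base": BUILTINS[m.group("name")]}
    if m.group("len") is not None:
        parts = [p.strip() for p in m.group("len").split(",")]
        if len(parts) == 1:
            result["length"] = parts[0]          # {10}: exact length
        else:
            if parts[0]:
                result["minLength"] = parts[0]   # {2,}: lower bound only
            if parts[1]:
                result["maxLength"] = parts[1]   # {,20}: upper bound only
    elif m.group("open") is not None:
        lo, hi = m.group("lo").strip(), m.group("hi").strip()
        if lo:
            key = "minInclusive" if m.group("open") == "[" else "minExclusive"
            result[key] = lo
        if hi:
            key = "maxInclusive" if m.group("close") == "]" else "maxExclusive"
            result[key] = hi
    return result
```

Under these assumptions, `facets("#INTEGER[,100]")` describes an integer with only an upper bound, which is how the last declaration above reads.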


A taste of DTD++ (2)

Named types are named entities, using different characters to differentiate themselves:
    <!ENTITY # myInt "(#INTEGER[0,100])">
    <!ELEMENT D #myInt; >

    <!ENTITY @ myType "(A?, (B | C)[2-5], D*)" >
    <!ELEMENT X @myType; >

Complex types that specify attributes have an additional block of quotes:
    <!ENTITY @ myType "(A?, (B | C)[2-5], D*)" "anAttr #STRING{10} #IMPLIED">
    <!ELEMENT X @myType; >


A taste of DTD++ (3)

Mixed content models extend the DTD syntax to allow any structure allowable with XSD:
    <!ENTITY @ myType "#PCDATA (A?, (B | C)[2-5], D*)" >
    <!ELEMENT X @myType; >

The ANY structure is extended:
    <!ELEMENT comment ANY[0,3]{http://www.foo.org}>

Target namespaces use the newly introduced TARGETNS structure:
    <!TARGETNS "http://www.foo.org">
    <!TARGETNS ns "http://www.bar.org">
    <!ELEMENT name (ns:firstname)>
    <!ELEMENT ns:firstname (#PCDATA)>


Limits

No support (yet) for keys, keyrefs, uniques.

No local elements.

No support for refs.

Only two design styles supported:
    Salami slices
    Garden of Eden

No redefine or include (but no need for them).


Co-constraints and what are they for

Better constraints

Real-life constraints

Constraints difficult to formalize


Is DTD++ 1.0 enough, then?

No, since XML Schema is not enough.

XML Schema cannot express all the structure and data constraints that document designers may need:
    Mutual exclusion (“element x may have either the a attribute or the b attribute, but not both”)
    Deep exclusions (“element x cannot contain, at any level of its subtree, element y”)
    Structure-dependent structures (“if the item is gratis, i.e., the attribute gratis is present, then no price should be specified, i.e., the element price should be absent”)
    Data-dependent structures (“if the address is a PO box, then the address must include a PO box number, otherwise it must include a street name and a street number”)

These kinds of constraints are known as co-constraints, or co-occurrence constraints. Most real life XML document types have one or more of those constraints.
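To show what these four kinds of rule amount to when checked by hand, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The function names are invented for illustration, and the address vocabulary (`kind="pobox"`, `poboxNumber`, `street`, `number`) is purely hypothetical, since the slide does not say how a PO box address is marked.

```python
import xml.etree.ElementTree as ET


def mutual_exclusion(x):
    """x may carry attribute a or attribute b, but not both."""
    return not ("a" in x.attrib and "b" in x.attrib)


def deep_exclusion(x):
    """x must not contain a y element at any level of its subtree."""
    return x.find(".//y") is None


def structure_dependent(item):
    """If the gratis attribute is present, the price child must be absent."""
    return not ("gratis" in item.attrib and item.find("price") is not None)


def data_dependent(address):
    """PO box addresses need a box number; others need street and number.

    The attribute and child names here are hypothetical.
    """
    if address.get("kind") == "pobox":
        return address.find("poboxNumber") is not None
    return (address.find("street") is not None
            and address.find("number") is not None)
```

Each function returns True when the constraint holds. Writing such checks is easy; the point of the following slides is that they end up scattered in application code instead of being declared in the schema.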


For example…

XHTML
    “a elements cannot contain other a elements” (appendix B)
    Neither the normative DTD nor the non-normative XML Schema can fully express this requirement (they only express a weaker form: “a elements cannot directly contain other a elements”).

XSLT
    “In a template element at least one of the match and name attributes must be present”
    Again, the DTD and XML Schema cannot express this requirement, and specify both attributes as optional.

XML Schema itself
    “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”
    The normative XML Schema can only specify all these elements and attributes as optional.

… and plenty more…


Who cares?

Documents that contain violations of these rules are still considered valid by the XML Schema validator.

Three solutions:
    Hope for the best (“It won’t happen”): subject to Murphy’s Law
    Provide a default behavior (“If both attributes are present, consider the first only”)
    Provide validation code within the downstream application

[Diagram: XML doc → DOM parser (rejects documents that are not well-formed) → DOM tree → Schema validator with its rules (rejects invalid documents) → DOM tree + PSVI → downstream application with its own rules]


SchemaPath and DTD++ 2.0

At the WWW2004 conference, we presented SchemaPath, our proposal to minimally extend XML Schema to handle co-constraints.

The idea is to find a way to conditionally assign types to elements and attributes. Furthermore, a non-satisfiable type is added for specifying error conditions to avoid.

SchemaPath maintains the XML Schema syntax, adds only ONE construct and ONE pre-defined simple type, maintains important XML Schema properties (the validation theorem and round-tripping and reverse round-tripping properties), and does not impact the PSVI for valid documents.

DTD++ 2.0 is the DTD-like syntax for SchemaPath


DTD++ 2.0

Conditional assignment of types
    Multiple definitions of the same element, each conditioned by an XPath expression. Implicit and explicit priorities are used.
    Each condition is tested on the instance element, and the one that holds with the highest priority is selected.
    The type specified by the selected definition is assigned to the element.
    This is NOT a way to provide conditional types: types are just plain old DTD++ 1.0 (XML Schema) types.

The #ERROR simple type
    When we want to specify the non-validity of a condition, we assign the element the #ERROR type.
    The #ERROR type is a non-satisfiable type, whose presence in the instance document always and automatically signals a validation error.
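The selection rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the DTD++ 2.0 engine: conditions are plain Python callables standing in for XPath expressions, priorities are always explicit integers (how implicit priorities are derived is not modelled), and the names `Definition`, `assign_type` and `passes_co_constraints` are invented here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable

ERROR = "#ERROR"  # stand-in for the non-satisfiable type


@dataclass
class Definition:
    condition: Callable[[ET.Element], bool]  # stands in for the XPath condition
    priority: int                            # explicit priority (assumption)
    type_name: str


def assign_type(element, definitions):
    """Return the type of the highest-priority definition whose condition holds."""
    holding = [d for d in definitions if d.condition(element)]
    if not holding:
        return None
    return max(holding, key=lambda d: d.priority).type_name


def passes_co_constraints(element, definitions):
    """True when some definition applies and it is not the error type.

    Validation against the assigned type itself is out of scope here.
    """
    return assign_type(element, definitions) not in (None, ERROR)


# Mutual exclusion on x: the error rule outranks the unconditional default.
X_DEFINITIONS = [
    Definition(lambda e: "a" in e.attrib and "b" in e.attrib, 1, ERROR),
    Definition(lambda e: True, 0, "myType"),
]
```

With these definitions an `x` element carrying only `a` is assigned `myType`, while one carrying both `a` and `b` is assigned the error type and is therefore rejected. The types themselves stay unconditional; only the assignment depends on the instance.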


Examples

Mutual exclusion
    “Element x may have either the a attribute or the b attribute but not both”.
    Suppose we have defined a type myType with both a and b attributes as optional.

    SchemaPath:
    <xsd:element name="x">
        <xsd:alt cond="(@a and @b)" type="xsd:error"/>
        <xsd:alt type="myType"/>
    </xsd:element>

    DTD++ 2.0:
    <!ELEMENT x "(@a and @b)" #ERROR>
    <!ELEMENT x "" @myType;>

Data-dependent structures
    “The element quantity must be an integer if the unit element is ‘items’, and it must be a decimal value if the unit element is ‘meters’”.
    Suppose we have already defined the data type for the unit element to only contain the values “meters” or “items”.

    SchemaPath:
    <xsd:element name="quantity">
        <xsd:alt cond="../unit='items'" type="xsd:integer"/>
        <xsd:alt cond="../unit='meters'" type="xsd:decimal"/>
    </xsd:element>


One possible solution to the W3C problems (1)

XHTML
    “a elements cannot contain other a elements” (appendix B)
    <!ELEMENT a ".//a" (#ERROR)>
    <!ELEMENT a "" (@inlineType;)>

XSLT
    “In a template element at least one of the match and name attributes must be present”
    <!ELEMENT template "not(@match) and not(@name)" (#ERROR) >
    <!ELEMENT template "" (@templateType;) >
    <!ENTITY @ templateType "%templateContent;" "match (#patternType;) name (#NCName;)">


One possible solution to the W3C problems (2)

XML Schema
    “An element definition must either contain a ref or a name attribute, but not both. Furthermore, if the name attribute is present, then the type attribute or one of the simpleType or complexType elements must be present, but not two.”

    <!ELEMENT simpleType (@localSimpleType;)>
    <!ELEMENT complexType (@localComplexType;)>
    <!ENTITY @ element "(simpleType|complexType)" "name (#NCName;) #IMPLIED ref (#QName;) #IMPLIED type (#QName;) #IMPLIED">
    <!ELEMENT element "@name and @ref":4 (#ERROR)>
    <!ELEMENT element "(@type or @ref) and (xsd:simpleType or xsd:complexType)":3 (#ERROR)>
    <!ELEMENT element "../xsd:schema and @ref":2 (#ERROR)>
    <!ELEMENT element "not(@ref) and not(@name)":1 (#ERROR)>
    <!ELEMENT element "":0 (@element;)>


The “Trojan Milestones” requirements

“1. the element must be empty exactly when its sID or eID attribute is set.
2. when eID is present, no other attributes are permitted.
3. each sID/eID value should occur only twice (once on sID and once on eID)
4. empty elements with matching sID and eID values should match up in proper pairs and in order.

Note that because of the second rule above, no attributes may be required for milestoneable elements. Schema languages that can make attributes optional or required depending on the presence of other attributes (in this case eID) do not suffer this problem.”

[DeRose, Extreme Markup 2004]


A DTD++ 2.0 solution to the Trojan Milestones requirements

<!ENTITY @ startMarker "EMPTY" "sID ID #REQUIRED %regularAtts;">
<!ENTITY @ endMarker "EMPTY" "eID IDREF #REQUIRED">

<!ELEMENT X "":0 %regularCM; >
<!ATTLIST X "":0 %regularAtts;>

<!ELEMENT X "@sID":2 @startMarker;>
<!ELEMENT X "@sID = preceding::*/@sID":3 #ERROR>

<!ELEMENT X "@eID=preceding::X/@sID":4 @endMarker;>
<!ELEMENT X "@eID = preceding::*/@eID":3 #ERROR>
<!ELEMENT X "@eID":2 #ERROR>
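For comparison, the four milestone requirements can also be checked procedurally. The Python sketch below is an assumption-laden illustration, not a translation of the declarations above: the function name `milestone_errors` is invented, rule 1 is checked in one direction only (a marker must be empty), and "in order" is read simply as "each eID closes an earlier, still open sID".

```python
import xml.etree.ElementTree as ET


def milestone_errors(root, tag="X"):
    """Check the milestone rules for elements named `tag`, in document order."""
    errors = []
    open_ids, closed_ids = set(), set()
    for el in root.iter(tag):  # iter() walks the tree in document order
        sid, eid = el.get("sID"), el.get("eID")
        if sid is None and eid is None:
            continue  # ordinary element, not a marker
        # Rule 1 (one direction only): a marker must be empty.
        if len(el) > 0 or (el.text or "").strip():
            errors.append("marker is not empty")
        # Rule 2: eID excludes every other attribute.
        if eid is not None and len(el.attrib) > 1:
            errors.append(f"eID={eid!r} appears with other attributes")
        # Rule 3: an sID value may be used only once.
        if sid is not None:
            if sid in open_ids or sid in closed_ids:
                errors.append(f"sID={sid!r} is repeated")
            else:
                open_ids.add(sid)
        # Rules 3 and 4: an eID must close an earlier, still open sID.
        if eid is not None:
            if eid in open_ids:
                open_ids.remove(eid)
                closed_ids.add(eid)
            else:
                errors.append(f"eID={eid!r} has no open sID before it")
    errors.extend(f"sID={s!r} is never closed" for s in sorted(open_ids))
    return errors
```

The declarative version states the same conditions as prioritized XPath tests on each X element, so no such traversal code has to live in the application.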


Implementation of the DTD++2.0 parser

A DTD++ 2.0 validator exists and can be tested online at http://tesi.fabio.web.cs.unibo.it/dpp

It is a Java application and a plain XML Schema validating engine (tested with Xalan and MS XML parsers).

The application is a pre-processor to any XML Schema validator. Given an XML document X and a DTD++ document D:
    It converts D into (one or more) equivalent SchemaPath files SP
    It converts SP into a plain XML Schema file XS
    It converts X into a different XML file X’, so that XS validates X’ if and only if SP validates X, and thus if and only if D validates X


… but who cares for DTD anyway?

This part is not in the published paper.

On July 21st, 2004 we ran a test on the relative speed and precision of DTD++ and XML Schema.

14 volunteers (10 M, 4 F) were recruited, all 3rd and 4th year computer science students, versed in both DTD and XML Schema (they had all passed with good marks both the Web Technologies exam and, specifically, the questions on DTDs and XML Schema).

The volunteers were divided into two groups and given 15 questions. Half had to solve them using XML Schema, half using DTD++.


The test

The 15 questions were identical in both tests, and covered:

A. Write XML: apply the rules from a schema and write valid XML fragments (5 questions)

B. Validate XML: apply the rules from a schema and find errors in XML fragments (5 questions)

C. Write Schemas: write a fragment of schema given a plain text description of the problem (5 questions)


A sample question

Verify whether the fragment:

<order>
    <to id="125">John Smith</to>
    <lines>
        <line>
            <art>130</art>
            <description>Some nice stuff</description>
            <col>Red</col>
            <price>0,65</price>
            <quant>130</quant>
        </line>
    </lines>
</order>

is valid with respect to the following DTD++ fragment:

<!ELEMENT order (to, lines) >
<!ELEMENT to (#STRING)>
<!ATTLIST to id ID #REQUIRED>
<!ELEMENT lines (line+) >
<!ELEMENT line (art, col, price, quant)>
<!ELEMENT art (#PCDATA{,20}) >
<!ENTITY # colors "(red | blue | green | yellow)" >
<!ELEMENT col (#colors;) >
<!ELEMENT quant (#INTEGER]0,]) >
<!ELEMENT price (#DECIMAL]0,]) >
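One way to work through the question is to hand-code the declarations as checks. The Python sketch below does that under stated assumptions: `ID` and `#DECIMAL` are taken to follow the XML and XSD lexical rules, `]0,]` is read as "strictly greater than 0", the enumeration is case-sensitive, and the name pattern is only an ASCII approximation. The function `check_order` is invented for illustration and does not check the content model of `order` itself.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """<order>
  <to id="125">John Smith</to>
  <lines><line>
    <art>130</art><description>Some nice stuff</description>
    <col>Red</col><price>0,65</price><quant>130</quant>
  </line></lines>
</order>"""

COLORS = {"red", "blue", "green", "yellow"}
NAME = re.compile(r"[A-Za-z_][\w.\-]*")          # ASCII approximation of an ID
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
INTEGER = re.compile(r"[+-]?\d+")


def check_order(order):
    """Return a list of rule violations found in an order element."""
    errors = []
    to = order.find("to")
    if to is None or not NAME.fullmatch(to.get("id", "")):
        errors.append("to/@id is not a valid ID")
    for line in order.iterfind("lines/line"):
        if [child.tag for child in line] != ["art", "col", "price", "quant"]:
            errors.append("line does not match (art, col, price, quant)")
        if len(line.findtext("art", "")) > 20:
            errors.append("art is longer than 20 characters")
        if line.findtext("col", "").strip() not in COLORS:
            errors.append("col is not one of the declared colors")
        price = line.findtext("price", "").strip()
        if not (DECIMAL.fullmatch(price) and float(price) > 0):
            errors.append("price is not a decimal greater than 0")
        quant = line.findtext("quant", "").strip()
        if not (INTEGER.fullmatch(quant) and int(quant) > 0):
            errors.append("quant is not an integer greater than 0")
    return errors
```

Under these assumptions `check_order(ET.fromstring(FRAGMENT))` reports four problems: the `id` value starts with a digit, `description` is not in the content model of `line`, `Red` is capitalized, and `0,65` uses a comma as decimal separator.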


The results

DTD++ was the clear winner in all categories:
    36% faster on group A (Write XML)
    53% faster on group B (Validate XML)
    Twice as fast (99%) on group C (Write Schemas)

The question on the previous slide was answered on average in 0:01:33 with DTD++, and in 0:03:03 with XML Schema.

Errors were slightly more frequent with DTD++ than with XML Schema (123%), but this might be due to the fact that the language was brand new.

Of course the volunteers were very few, and the test might be considered non-significant, but it gives at least an initial approximate measure of the relative value of the two languages.

An interesting note is that one of the volunteers converted the XML Schema into DTD fragments with textual annotations before answering each question.
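The slides do not say how the "faster" percentages were computed. As a worked example for the single sample question only, this short Python sketch turns the two reported average times into seconds and shows two common readings of the gap; the helper name `seconds` and both formulas are assumptions, and a per-question figure need not match a group average.

```python
def seconds(hms):
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, secs = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + secs


dtdpp = seconds("0:01:33")   # average time with DTD++
xsd = seconds("0:03:03")     # average time with XML Schema

speed_ratio = xsd / dtdpp        # how many times faster DTD++ was
time_saved = 1 - dtdpp / xsd     # fraction of the XML Schema time saved
```

On this question the DTD++ group took 93 seconds against 183, so DTD++ was a little under twice as fast, or equivalently needed roughly half the time.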


Demo

A demo of the validating engine and the full result of the tests can be found at

http://tesi.fabio.web.cs.unibo.it/dpp

Time for a demo?


Conclusions

DTDs are faster to learn and use.

XML Schema is powerful and expressive.

Schematron-like co-constraints are even more expressive.

Why learn three languages?

DTD++ 1.0 is semantically equivalent to a relevant subset of XML Schema.

SchemaPath provides co-constraints with a very limited syntax and the new idea of conditional assignment of types (rather than conditional typing).

DTD++ 2.0 uses the same principle with a DTD-like syntax.

What now? Maybe ISO/IEC 19757 - DSDL:
    Part 5 Data types
    Part 9 Data type- and namespace-aware DTDs