Using XML in Internet Protocols
Tim Bray
Distinguished Engineer
Director of Web Technologies
Sun Microsystems
Agenda
• Should you use XML?
• Should you invent a new XML language?
• If you’re inventing a new XML language, how do you maximize your chances of success?
Should You Use XML? Other options:
• Hardwired binary
• ASN.1
• Plain text
• JSON
• XML
Hardwired Binary: Issues
• Compact.
• (Potentially) high-performance parsing.
• Architecture-dependence.
• Severe debugging pain.
Example: IPV? packet headers
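The architecture-dependence point can be illustrated with a small Python sketch. It is not from the slides, and the 8-byte header layout below is hypothetical rather than a real IP header. It packs the same three values in big-endian and little-endian order and shows what happens when the reader assumes the wrong one.

```python
import struct

# Hypothetical header: 16-bit version, 16-bit flags, 32-bit length.
version, flags, length = 4, 0, 1500

big = struct.pack(">HHI", version, flags, length)     # big-endian (network order)
little = struct.pack("<HHI", version, flags, length)  # little-endian

# Same values, different bytes on the wire.
print(big.hex())
print(little.hex())

# Decoding big-endian bytes with the wrong byte order raises no error;
# it silently produces different numbers.
right = struct.unpack(">HHI", big)
wrong = struct.unpack("<HHI", big)
print(right, wrong)
```

Nothing in the bytes themselves says which order was used, which is part of why debugging such formats is painful.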
Use Hardwired Binary If:
• You’re way down the protocol stack.
• But even then, be nervous.
ASN.1: Issues
• Compact.
• IETF tradition.
• Lousy tools.
• Debugging hell.
• No community outside the IETF & ITU.
• Only metadata is data type.
Example: SNMP
Use ASN.1 If:
• You have to talk to other IETF stuff that’s locked in.
Plain Text: Issues
• The simplest possible option is often the best.
• Pretty efficient.
• Fits well with server-side Internet (Unix) culture.
• Watch out for I18n.
• Watch out for extensibility.
Example: HTTP
Use Plain Text If:
• ... you possibly can.
JSON: Example vs. XML
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>
JSON: Issues
• Superb browser integration.
• Knows about lists, tuples, hashes.
• Maps directly to programming-language structures.
• Hard-wired to UTF-8 (in theory).
• Awkward for deeply-nested or “document”-style structures.
• Watch out for extensibility.
• Browser security issues.
Example: Google Maps mashups
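As a rough illustration of "maps directly to programming-language structures", here is a minimal Python sketch (not part of the original talk) that loads the menu document from the JSON example slide. The variable names are illustrative only.

```python
import json

text = """{"menu": {"id": "file", "value": "File", "popup": {"menuitem": [
  {"value": "New", "onclick": "CreateNewDoc()"},
  {"value": "Open", "onclick": "OpenDoc()"},
  {"value": "Close", "onclick": "CloseDoc()"}]}}}"""

# json.loads yields plain dicts, lists and strings; no extra object model.
menu = json.loads(text)["menu"]
labels = [item["value"] for item in menu["popup"]["menuitem"]]
print(menu["id"], labels)
```

The parsed result is ordinary dictionaries and lists, so ordinary indexing and comprehensions are all that is needed to walk it.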
Use JSON If:
• You’re shipping structs and tuples around from program to program.
• You expect to implement client software in-browser.
• The expected lifetime of the data is short.
• It isn’t text-heavy.
XML: Issues
• Tons of excellent open-source tools.
• Programmers love XPath.
• Decent extensibility.
• I18n is nailed.
• Handles “document” structures well.
• Verbose & ugly.
• Doesn’t map naturally to programming-language structures.
• DOM API is programmer-hostile.
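To make the XPath point concrete, here is a hedged Python sketch (not from the slides) that reads the XML menu document from the earlier example slide with the standard library's ElementTree. ElementTree supports only a subset of XPath, so this is an approximation of what a full XPath engine offers.

```python
import xml.etree.ElementTree as ET

text = """<menu id="file" value="File">
  <popup>
    <menuitem value="New" onclick="CreateNewDoc()" />
    <menuitem value="Open" onclick="OpenDoc()" />
    <menuitem value="Close" onclick="CloseDoc()" />
  </popup>
</menu>"""

root = ET.fromstring(text)

# Path expressions select nodes; attribute values still need explicit .get() calls.
labels = [item.get("value") for item in root.findall("./popup/menuitem")]
open_handler = root.find("./popup/menuitem[@value='Open']").get("onclick")
print(labels, open_handler)
```

The path expressions are compact, but the result is a tree of element objects rather than native lists and dictionaries, which is the "doesn't map naturally" cost named above.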
Use XML If:
• Your data is document-flavored.
• You’re worried about i18n.
• You’re worried about extensibility.
• You’re worried about reusability.
So, you’re going to use XML...
Inventing New XML Languages:
• Time-consuming.
• Bureaucratic.
• Difficult.
• Unpleasant.
• Includes complex software development as a sub-task.
• Usually fails.
... so try not to!
Some Good XML Languages
• XHTML
• DocBook
• ODF
• Atom
• XMPP
• UBL
• RDF
So, you’re making your own language...
Design Issue: Semantics
• What does “Age” mean?
• What does “Version” mean?
• What does “Person” mean?
• What does “Update” mean?
• What does “Creator” mean?
Design Issue: Model vs. Syntax
“What matters is getting the data model right. The syntax is ephemeral.”
“The bits on the wire are the only reality.”
Design Issue: Minimalism vs. Completeness
“Let’s solve the whole problem.”
“Minimum progress required to declare victory.”
Design Issue: Specification Tools
• Human-readable prose.
• Examples.
• Validator.
• Schema.
But, first: Know Your Audience
Why specs matter
Most developers are morons, and the rest are assholes. I have at various times counted myself in both groups, so I can say this with the utmost confidence.
-Mark Pilgrim: http://diveintomark.org/archives/2004/08/16/specs
Design Issue: Specification Tools
• Human-readable prose.
• Examples.
• Validator.
• Schema.
Most important
Very important
Nice to have
XML Schema Language Options
• DTD
• XSD (W3C XML Schemas)
• RelaxNG
• Schematron
Document Type Definitions (DTDs)
• Constrain only what elements/attributes can appear, and where.
• Don’t say much about content.
• Allow the definition and use of “Entities”, macros of zero arguments. Don’t use them!
• Past their sell-by date.
W3C XML Schemas (XSD)
• Hard to understand, hard to implement, hard to interoperate.
• No underlying formalism.
• Limited in the set of markup idioms they can define.
• Includes (in “Part 2”) a usable set of primitive data types: integers, floats, dates, URIs, and so on.
• One of the reasons why the SOA/WS-* project is sinking.
RelaxNG
• Based on the hedge-automaton formalism.
• Written in XML, or a non-XML Compact Syntax.
• Good human-readability.
• Can specify a very wide range of markup idioms.
• Can use XSD Part 2 base datatypes.
• Validators only available in Java and C.
• For a good example, see RFC4287.
• ISO 19757-2.
Schematron
• Based on XPath.
• Assertions with associated error/success messages.
• Excellent for checking for specific error conditions or anomalies.
• Not really a language-specification tool.
• Several implementations.
• ISO 19757-3.
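The idea of path-based assertions with attached messages can be sketched in a few lines of Python. This is not Schematron itself and uses none of its syntax; the rule format, the `check` function and the `feed`/`entry` vocabulary are all hypothetical, and ElementTree's limited path support stands in for real XPath.

```python
import xml.etree.ElementTree as ET

# Each hypothetical rule: (context path, path that must match inside it, message).
RULES = [
    (".//entry", "title", "entry is missing a title"),
    (".//entry", "id", "entry is missing an id"),
]

def check(root, rules):
    """Return the message of every assertion that failed."""
    failures = []
    for context, required, message in rules:
        for node in root.findall(context):
            if node.find(required) is None:
                failures.append(message)
    return failures

doc = ET.fromstring(
    "<feed>"
    "<entry><title>First</title><id>1</id></entry>"
    "<entry><title>Second</title></entry>"
    "</feed>"
)
failures = check(doc, RULES)
print(failures)
```

Rules like these catch specific anomalies well, but they say nothing about what else a document may contain, which is why this style is not really a language-specification tool.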
XML Extensibility: Three Options
• No changes.
• Must-Understand policy (e.g. as in SOAP).
• Must-Ignore policy (e.g. as in Atom).
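The difference between the two policies can be sketched as follows. This is a simplified illustration under stated assumptions: the `KNOWN` vocabulary, the element names and both reader functions are hypothetical, and real SOAP marks individual header blocks as must-understand rather than rejecting every unknown element.

```python
import xml.etree.ElementTree as ET

KNOWN = {"title", "updated"}  # hypothetical vocabulary this reader understands

def read_must_ignore(root):
    """Keep what is understood; silently skip unknown elements."""
    return {child.tag: child.text for child in root if child.tag in KNOWN}

def read_must_understand(root):
    """Refuse the whole message if anything is not understood."""
    for child in root:
        if child.tag not in KNOWN:
            raise ValueError(f"unknown element: {child.tag}")
    return {child.tag: child.text for child in root}

# A newer sender has added a <rating> extension element.
doc = ET.fromstring("<entry><title>Hello</title><rating>5</rating></entry>")

print(read_must_ignore(doc))
try:
    read_must_understand(doc)
except ValueError as err:
    print("rejected:", err)
```

Under must-ignore, old readers keep working when the language grows; under must-understand, the sender gets a hard failure instead of a partial interpretation.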
XML Internationalization
• “An XML document knows what encoding it’s in.” -Larry Wall
• In an ideal world, everything would be in UTF-8.
• In the real world, people don’t understand this stuff and probably shouldn’t have to.
• XML makes this survivable in many circumstances... with most tools, they can suck up their Shift-JIS or Big5 or whatever and it’ll quite possibly Just Work.
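A small Python sketch of "knows what encoding it’s in", not taken from the talk. It uses ISO-8859-1 and UTF-8 because the standard-library parser is known to handle both; whether a given parser accepts Shift-JIS or Big5 varies by tool and is not shown here.

```python
import xml.etree.ElementTree as ET

word = "caf\u00e9"  # "café"

# The same document serialized in two encodings, each declaring its own.
latin1 = f'<?xml version="1.0" encoding="ISO-8859-1"?><word>{word}</word>'.encode("iso-8859-1")
utf8 = f'<?xml version="1.0" encoding="UTF-8"?><word>{word}</word>'.encode("utf-8")

# The bytes differ, but the parser reads the declaration and decodes accordingly.
from_latin1 = ET.fromstring(latin1).text
from_utf8 = ET.fromstring(utf8).text
print(latin1 != utf8, from_latin1 == from_utf8)
```

The receiving code never has to be told the encoding out of band; the declaration travels with the document.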
XML Security and Signatures
• Shouldn’t these two have the same signature?
• XML Canonicalization is the solution.
• Unfortunately, it’s also a problem.
• XML DigSig says how to apply a signature to c14n-ized XML.
• Or, you could just sign the bag-o’-bits.
<a b="1" c="1"/>
<a c='1' b='1'></a>
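The two elements on this slide make a handy test case. The sketch below is an illustration rather than an XML DigSig implementation: a SHA-256 digest stands in for a real signature, and canonicalization uses `xml.etree.ElementTree.canonicalize` from the Python standard library (available since Python 3.8).

```python
import hashlib
from xml.etree.ElementTree import canonicalize

doc1 = '<a b="1" c="1"/>'
doc2 = "<a c='1' b='1'></a>"

def digest(text):
    """SHA-256 over the UTF-8 bytes; a stand-in for a signature."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Signing the raw bytes: equivalent XML, different digests.
raw_match = digest(doc1) == digest(doc2)

# Signing the canonical form: attribute order, quote style and
# empty-element syntax are normalized first.
canon_match = digest(canonicalize(doc1)) == digest(canonicalize(doc2))

print(raw_match, canon_match)
```

Signing the bag-o’-bits is simpler, but any re-serialization between signer and verifier breaks the digest; canonicalizing first survives that, at the cost of every party needing a compatible canonicalizer.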
The Semantic Web
• The RDF view: Everything’s a graph of 3-tuple assertions: Resource/Property/Value.
• R, P, and V can each be a URI. Value can be a URI or a literal.
• Assertions can be resources.
• The RDF/XML serialization is ugly and annoying.
• Semantic Web project sees a bright future of operations on the Universal graph, once it’s built, so they’d like to use RDF/XML for everything.
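The 3-tuple model is easy to sketch without any RDF tooling. In this Python illustration every URI is a made-up example.org placeholder and the `values` helper is hypothetical; it only shows the shape of the data, not any RDF serialization.

```python
# Hypothetical URIs, used only for illustration.
DOC = "http://example.org/doc"
CREATOR = "http://example.org/terms/creator"
TITLE = "http://example.org/terms/title"
ALICE = "http://example.org/people/alice"

# A graph is just a set of (resource, property, value) assertions.
graph = {
    (DOC, CREATOR, ALICE),          # value is a URI
    (DOC, TITLE, "A sample title"),  # value is a literal
    (ALICE, TITLE, "Dr."),
}

def values(graph, resource, prop):
    """All values asserted for a given resource and property."""
    return sorted(v for r, p, v in graph if r == resource and p == prop)

print(values(graph, DOC, CREATOR))
print(values(graph, DOC, TITLE))
```

Because a value can itself be a resource, following `ALICE` as a subject walks further into the graph.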
Thank you!
[email protected]/ongoing/