Introduction to XML and Databases
Introduction to XML
Kristian Torp
Department of Computer Science, Aalborg University
people.cs.aau.dk/~[email protected]
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to XML November 3, 2015 1 / 42
Outline
1 Introduction
2 Anatomy of an XML Document
  Document Prolog
  Elements
  Attributes
  Entities
  Complete XML Document
3 Well-Formed and Valid XML Documents
4 Summary
Learning Goals
Goals
Know the basic differences between a table and an XML document
Know the different representations of an XML document
Know the basic parts of an XML document
Know the goals of designing XML
Know the difference between data-centric and document-centric XML
Be able to construct your own basic XML documents
Text Files (Déjà Vu?)
Example (A Text File)
P4 OOP 3 Object-oriented programming
P2 DB 7 Databases including SQL
Open Questions
What do the columns mean?
When does white space matter?
What are the types of the columns?
Note
No metadata whatsoever
Additional information is needed to parse the text file!
Could be a human looking at the file
Lowest common denominator: a CSV file
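To make the open questions concrete, here is a minimal Python sketch (Python is an assumption; the slides contain no code) that reads the two rows as a CSV file. The column names are hypothetical and must be supplied by the program, and every value comes back as a string.

```python
import csv
import io

# The two course rows from the slide, written as CSV (an assumed format).
raw = "P4,OOP,3,Object-oriented programming\nP2,DB,7,Databases including SQL\n"

# Hypothetical column names: nothing in the file itself provides them.
fieldnames = ["id", "name", "semester", "desc"]

rows = list(csv.DictReader(io.StringIO(raw), fieldnames=fieldnames))
for row in rows:
    # All values are strings; the file does not say that semester is a number.
    print(row["id"], row["name"], repr(row["semester"]))
```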
A First Look
Example (Table Look)
Id  Name  Semester  Desc
P4  OOP   3         Object-oriented programming
P2  DB    7         Databases including SQL
Example (XML Look)
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
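For comparison, a minimal sketch using Python's standard xml.etree.ElementTree module rebuilds the table rows from the XML version. The DOCTYPE line is left out because coursecatalog.dtd is not available here, and the tuple layout is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The "XML Look" document without the DOCTYPE line (the DTD file is assumed missing).
doc = """<?xml version="1.0"?>
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name><semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name><semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>"""

root = ET.fromstring(doc)

# Every value is labelled by its tag, so the table can be rebuilt
# without outside knowledge of the column order.
rows = [
    (c.get("cid"), c.findtext("name"), c.findtext("semester"), c.findtext("desc"))
    for c in root.findall("course")
]
for row in rows:
    print(row)
```

Note that semester is still a string: the markup names the data, but types come from a DTD or XML Schema.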
A Second Look
Example (XML Look (again))
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
  <course cid='P4'>
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented programming</desc>
  </course>
  <course cid='P2'>
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including SQL</desc>
  </course>
</coursecatalog>
Example (Tree Look)
/coursecatalog
  course
    id=4  name: OOP  sem: 3  dsc
  course
    id=2  name: DB  sem: 7  dsc
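The tree look can be produced from the textual form. This sketch (the helper name outline and its output format are hypothetical) parses the same catalog with ElementTree and prints one node per line, indented by depth.

```python
import xml.etree.ElementTree as ET

doc = ("<coursecatalog>"
       "<course cid='P4'><name>OOP</name><semester>3</semester>"
       "<desc>Object-oriented programming</desc></course>"
       "<course cid='P2'><name>DB</name><semester>7</semester>"
       "<desc>Databases including SQL</desc></course>"
       "</coursecatalog>")


def outline(elem, depth=0):
    """Return the subtree rooted at elem as indented lines (hypothetical helper)."""
    label = elem.tag
    attrs = " ".join(f"{k}={v}" for k, v in elem.attrib.items())
    if attrs:
        label += f" [{attrs}]"
    text = (elem.text or "").strip()
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(doc))
print("\n".join(lines))
```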
Something Well Known?
Example (XHTML)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A Simple XHTML Document</title>
  </head>
  <body>
    <p>Hello XHTML!</p>
  </body>
</html>
[Source: examples/xhtml_simple.xhtml]
XHTML versus HTML
XHTML is a cleaned-up version of HTML, reformulated as XML
Looks a lot like HTML
Much stricter requirements on XHTML than on HTML, e.g., an XHTML document must be well-formed XML
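Because XHTML is XML, a plain XML parser can read it. A minimal ElementTree sketch (the namespace prefix x and the helper is_well_formed are hypothetical names) finds the title and shows that a typical HTML fragment with an unclosed <br> is rejected. The parser does not download the PUBLIC DTD.

```python
import xml.etree.ElementTree as ET

# Passed as bytes because the document carries an encoding declaration.
xhtml = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>A Simple XHTML Document</title></head>
  <body><p>Hello XHTML!</p></body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}  # hypothetical prefix for the XHTML namespace
root = ET.fromstring(xhtml)
title = root.find("x:head/x:title", NS).text
print(title)  # A Simple XHTML Document


def is_well_formed(text):
    """Return True if an XML parser accepts text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML style: <br> and <p> are never closed, so this is not XML.
print(is_well_formed("<body><p>Hello<br></body>"))         # False
print(is_well_formed("<body><p>Hello<br/></p></body>"))    # True
```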
Data versus Document Centric
Example (Data Centric)
<rows>
  <row><name>Hans</name><address>Denmark</address></row>
  <row><name>Marge</name><address>Sweden</address></row>
</rows>
Example (Document Centric)
<lyric>Is it getting <it>better</it>?
Or do you feel the same?
Will it make it easier on you now?
You got someone to <em>blame</em>
You say
One love
One life</lyric>
Data Centric
Like a database table
Content in the leaves
Inflexible, but simple
Document Centric
Free format (almost)
Mixed content
Flexible, but complex
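The difference shows up when the two examples are parsed. In this ElementTree sketch (the lyric is shortened to its first two lines) data-centric content is read straight from the leaves, while mixed content is split between the .text of the parent and the .tail of the inline element.

```python
import xml.etree.ElementTree as ET

data = ET.fromstring("<rows><row><name>Hans</name><address>Denmark</address></row>"
                     "<row><name>Marge</name><address>Sweden</address></row></rows>")
lyric = ET.fromstring("<lyric>Is it getting <it>better</it>?\n"
                      "Or do you feel the same?</lyric>")

# Data centric: every value sits in a leaf element, so it reads like a table.
table = [(r.findtext("name"), r.findtext("address")) for r in data]
print(table)  # [('Hans', 'Denmark'), ('Marge', 'Sweden')]

# Document centric: text before the first child is in .text,
# text after a child is in that child's .tail.
print(repr(lyric.text))     # 'Is it getting '
print(repr(lyric[0].text))  # 'better'
print(repr(lyric[0].tail))  # '?\nOr do you feel the same?'

# itertext() follows document order and yields the plain text.
plain = "".join(lyric.itertext())
```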
Goals of XML
Goals
XML shall be straightforwardly usable over the Internet
XML shall support a wide variety of applications
XML shall be compatible with SGML
SGML = Standard Generalized Markup Language
Easy to write programs which process XML documents
Keep the number of optional features to an absolute minimum (ideally zero)
XML documents should be reasonably clear
The XML design should be prepared quickly
The design of XML shall be formal and concise
XML documents shall be easy to create
[Source: www.w3.org/TR/REC-xml/]
XML Family of Products
Products
Core
The basic XML recommendation
Add-ons: DTD, XML Namespace, XPath, XLink, XPointer, XQuery, etc.
Focus on layout: CSS, XSLT, and XSL-FO
XML applications: XHTML, DocBook, SVG, XForms, etc.
XML Applications
Web content syndication: RSS (www.rssboard.org)
Education: SCORM for teaching material (www.scorm.com)
Document metadata: Dublin Core (www.dublincore.org)
Summary: Introduction
Main Points
An XML document compared to a text file
More readable (without help)
More complicated to handle (if you already know the content)
Higher space usage
Data and metadata embedded in the same document
Markup and content clearly separated
An XML document can be represented in two ways
Textual structure
Tree structure
The goals of the XML design were set in the Internet age!
There is a very large set of XML technologies and applications
Note
XML and databases are not competing technologies
XML is not a replacement for HTML
Main Parts of an XML Document
Concepts
Document prolog
Elements
A root
Attributes
Entities
Example (XML Document)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd" [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL">
]>
<coursecatalog>
  <course id="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented &prg;</desc>
  </course>
  <course id="P2">
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including &sql;</desc>
  </course>
</coursecatalog>
Note
Elements are more flexible than attributes
XML supports Unicode (UTF-8, UTF-16) out of the box
Document Prolog
Example
<?xml version="1.0"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">
<coursecatalog>
Consists of
Version number and text encoding (the XML declaration)
Document type declaration, which points to the document type definition (DTD)
Instructions to the XML processor
Followed by the root element of the XML document
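The prolog can be inspected programmatically. A minimal sketch using Python's xml.dom.minidom, which records the document type declaration but (as assumed here) neither fetches nor checks coursecatalog.dtd:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd">'
    "<coursecatalog/>"
)

# The document type declaration names the root element and refers to the DTD.
print(doc.doctype.name)              # coursecatalog
print(doc.doctype.systemId)          # coursecatalog.dtd
# The root element follows the prolog.
print(doc.documentElement.tagName)   # coursecatalog
```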
Elements
Example
Start tag <state> or <course>
Start tag with attributes <state id="1" abbr="GA">
End tag </state>
Element with content <state>Georgia</state>
Empty element <state/>
Empty element with attributes <state id="1" abbr="GA"/>
Case matters: <state>, <State>, and <STaTE> are three different tags
Consists of
Start tag
Content, e.g., character data
End tag
Elements, cont.
Rules
The start tag must come before the end tag
An element's start and end tags must have the same parent
Wrong: <state><city></state></city>
Right: <state><city></city></state>
Content
Simple <outer><one>stuff</one></outer>
Mixed content <outer>More <one>stuff</one></outer>
Tag versus Element
<msg>Hello World</msg>
Element: <msg>Hello World</msg>
Tag: msg
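A short ElementTree sketch of the terms above: the tag name is elem.tag, the character data is elem.text, empty elements have no text, and names are case-sensitive. The helper parses is a hypothetical name.

```python
import xml.etree.ElementTree as ET

msg = ET.fromstring("<msg>Hello World</msg>")
print(msg.tag)   # msg (the tag name)
print(msg.text)  # Hello World (the element's character data)

# Both spellings of an empty element give an element without text.
empty = ET.fromstring("<state></state>")
empty_attrs = ET.fromstring('<state id="1" abbr="GA"/>')
print(empty.text, empty_attrs.text)  # None None

# Case matters: only one of the three children is named "state".
states = ET.fromstring("<states><state/><State/><STaTE/></states>")
print(len(states.findall("state")))  # 1


def parses(text):
    """Return True if text is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<state><city></city></state>"))  # True
print(parses("<state><city></state></city>"))  # False, wrong nesting
print(parses("<state>Georgia</State>"))        # False, case differs
```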
Attributes
Example
<state id="1" abbr="GA">
<country id="DK" date="2006-02-01">
Consists of
Name/value pairs
Note
Attributes cannot stand alone
Only start tags (including empty-element tags) can have attributes
There can be any number of attributes
Attribute names must be unique within a tag; wrong: <state id="GA" id="GE">
Attribute values must be in quotes; wrong: <state id=GA>
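A minimal ElementTree sketch of these rules: attributes come back as name/value pairs, and the two wrong examples are rejected by the parser. The helper parse_error is a hypothetical name.

```python
import xml.etree.ElementTree as ET

state = ET.fromstring('<state id="1" abbr="GA"/>')
# Attributes are exposed as a dict of name/value pairs; values are strings.
print(state.attrib)       # {'id': '1', 'abbr': 'GA'}
print(state.get("abbr"))  # GA
print(state.get("name"))  # None, the attribute is absent


def parse_error(text):
    """Return the parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


print(parse_error('<state id="GA" id="GE"/>'))  # an error about the duplicate attribute
print(parse_error("<state id=GA/>"))             # an error about the unquoted value
```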
Elements versus Attributes
Example (Elements versus Attributes)
<box height="20" width="20" depth="30" unit="cm">
  <content>Stuff</content>
</box>
<box>
  <height>
    <scalar>20</scalar><unit>cm</unit>
  </height>
  <width>
    <scalar>20</scalar><unit>cm</unit>
  </width>
  <depth>
    <scalar>30</scalar><unit>cm</unit>
  </depth>
  <content>Stuff</content>
</box>
Note
Attributes can always be converted to elements
Elements can only sometimes be converted to attributes, e.g., not when they have child elements or occur more than once
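The claim that attributes can always be converted to elements can be sketched mechanically. The helper attributes_to_elements is hypothetical: it turns each attribute into a child element holding the value as text, placed before the existing children, and recurses into the children.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Return a copy of elem where every attribute becomes a child element."""
    new = ET.Element(elem.tag)
    new.text = elem.text
    new.tail = elem.tail
    for name, value in elem.attrib.items():
        child = ET.SubElement(new, name)
        child.text = value
    for child in elem:
        new.append(attributes_to_elements(child))
    return new


box = ET.fromstring('<box height="20" width="20" depth="30" unit="cm">'
                    "<content>Stuff</content></box>")
converted = ET.tostring(attributes_to_elements(box), encoding="unicode")
print(converted)
```

The mechanical result keeps one shared <unit> element, so it is not the hand-designed version with a <scalar>/<unit> pair per dimension; that restructuring is a design decision, not a conversion rule.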
Elements versus Attributes, cont.
Example (Elements versus Attributes)
<box>
  <height><scalar>20</scalar><unit>cm</unit></height>
  <width><scalar>20</scalar><unit>cm</unit></width>
  <depth><scalar>30</scalar><unit>cm</unit></depth>
  <content>Stuff</content>
</box>
<box>
  <height unit="cm">20</height>
  <width unit="cm">20</width>
  <depth unit="cm">30</depth>
  <content>Stuff</content>
</box>
Note
Attributes are good for identifiers, units, and so on
Elements are good if there is a variable number of "stuff"
Entities
Example
<!ENTITY company "XML Lovers Inc.">
<!ENTITY sql "SQL">
Purpose
Make XML documents easier to maintain
Recurring text
Placeholders for content (abbreviations)
Types
Parameter entities, used in the DTD
General entities, used in the XML document itself
There are a lot of details about entities!
Using Entities
Example
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE coursecatalog SYSTEM "coursecatalog.dtd" [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL">
]>
<coursecatalog>
  <course id="P4">
    <name>OOP</name>
    <semester>3</semester>
    <desc>Object-oriented &prg;</desc>
  </course>
  <course id="P2">
    <name>DB</name>
    <semester>7</semester>
    <desc>Databases including &sql;</desc>
  </course>
</coursecatalog>
[Source: examples/coursecatalog_with_entity.xml]
Note
The entities prg and sql are declared in the internal DTD subset and used as &prg; and &sql;
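An XML parser replaces each entity reference with its declared text. A minimal ElementTree sketch with the internal DTD subset from the example; the external coursecatalog.dtd reference and the name/semester elements are left out to keep it self-contained.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE coursecatalog [
  <!ENTITY prg "programming">
  <!ENTITY sql "SQL">
]>
<coursecatalog>
  <course id="P4"><desc>Object-oriented &prg;</desc></course>
  <course id="P2"><desc>Databases including &sql;</desc></course>
</coursecatalog>"""

root = ET.fromstring(doc)
# Each entity reference has been expanded to its replacement text.
descs = [desc.text for desc in root.iter("desc")]
print(descs)  # ['Object-oriented programming', 'Databases including SQL']
```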
More Entity Examples
Entity Types
Predefined character entities: amp = &, lt = <, gt = >, quot = ", apos = '
Usage: <msg>Hello &amp; and &gt;</msg>
Numbered character references: &#230; = æ
Usage: <msg>This is a Danish letter &#230;</msg>
External entities: the definition is in another file
Internal entities: the definition is in the document itself
Unparsed entity <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
Note
There are a lot of details about entities!
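A small ElementTree sketch of predefined and numbered character references: the parser resolves them to characters, and serializing the tree escapes the special characters again.

```python
import xml.etree.ElementTree as ET

msg = ET.fromstring("<msg>Hello &amp; and &gt;</msg>")
print(msg.text)  # Hello & and >

# &#230; (decimal) and &#xE6; (hexadecimal) both denote U+00E6, the letter æ.
dk = ET.fromstring("<msg>&#230; &#xE6;</msg>")
print(dk.text)  # æ æ

# ElementTree escapes &, <, and > in text when writing XML.
print(ET.tostring(msg, encoding="unicode"))  # <msg>Hello &amp; and &gt;</msg>
```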
Various Comments on XML Documents
Comments
Characters are always Unicode; UTF-8 or UTF-16 is the default encoding
Whitespace is preserved (not the case in HTML)
Carriage return + line feed is converted to a single line feed
May surprise users used to MS Windows line endings
This is a comment: <!-- a comment in XML -->
Example (Comments in XML)
<?xml version="1.0"?>
<doc>
  <!-- A comment -->
  <row> </row>
  <row> <!-- Another comment --> </row>
</doc>
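A minimal ElementTree sketch of these points. By default comments are dropped from the tree; the insert_comments option of TreeBuilder (available since Python 3.8) keeps them. Whitespace inside an element is kept, and CR LF in the input becomes a single LF.

```python
import xml.etree.ElementTree as ET

doc = ("<doc>"
       "<!-- A comment --><row> </row>"
       "<row> <!-- Another comment --> </row>"
       "</doc>")

# Default parsing: comments are not part of the resulting tree.
plain = ET.fromstring(doc)
print(len(plain))           # 2 (just the two <row> elements)
print(repr(plain[0].text))  # ' ' (whitespace inside <row> is preserved)

# Keep comments as ET.Comment nodes (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(doc, parser=parser)
comments = [e.text for e in kept.iter() if e.tag == ET.Comment]
print(comments)             # [' A comment ', ' Another comment ']

# Line-end normalization: CR LF becomes a single LF.
normalized = ET.fromstring("<doc>line 1\r\nline 2</doc>").text
print(repr(normalized))     # 'line 1\nline 2'
```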
First Design
Example (1-n Relationship)
<order-db>
  <orders>
    <order id="117">
      <customer-name>Ann</customer-name>
    </order>
    <order id="341">
      <customer-name>Jim</customer-name>
    </order>
  </orders>
  <orderlines>
    <orderline id="117" line-no="1">
      <description>pizza</description>
      <quantity>1</quantity>
      <price-each>10.50</price-each>
    </orderline>
  </orderlines>
</order-db>
Note
Too close to first normal form (flat tables); does not use the tree hierarchy
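With the first design, finding the lines of an order means matching ids across two subtrees, much like a join on a foreign key. A sketch using ElementTree's limited XPath support (the helper lines_for is hypothetical):

```python
import xml.etree.ElementTree as ET

db = ET.fromstring(
    "<order-db><orders>"
    "<order id='117'><customer-name>Ann</customer-name></order>"
    "<order id='341'><customer-name>Jim</customer-name></order>"
    "</orders><orderlines>"
    "<orderline id='117' line-no='1'><description>pizza</description>"
    "<quantity>1</quantity><price-each>10.50</price-each></orderline>"
    "</orderlines></order-db>")


def lines_for(order):
    """Find the order lines whose id attribute matches the order's id."""
    return db.findall(f"orderlines/orderline[@id='{order.get('id')}']")


result = {order.get("id"): [line.findtext("description") for line in lines_for(order)]
          for order in db.iterfind("orders/order")}
print(result)  # {'117': ['pizza'], '341': []}
```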
Second Design
Example (1-n Relationship)
<order-db>
  <orders>
    <order id="O117">
      <customer-name>Ann</customer-name>
      <orderlines>
        <orderline line-no="1">
          <description>pizza</description>
          <quantity>1</quantity>
          <price-each>10.50</price-each>
        </orderline>
      </orderlines>
    </order>
    <order id="O341">
      <customer-name>Jim</customer-name>
    </order>
  </orders>
</order-db>
Note
All information related to a single order is stored together
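With the second design, each order's lines are reached by walking down from the order itself. A sketch computing a hypothetical per-order total (quantity times price-each, using Decimal for the prices) shows that no id matching is needed.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

db = ET.fromstring(
    "<order-db><orders>"
    "<order id='O117'><customer-name>Ann</customer-name><orderlines>"
    "<orderline line-no='1'><description>pizza</description>"
    "<quantity>1</quantity><price-each>10.50</price-each></orderline>"
    "</orderlines></order>"
    "<order id='O341'><customer-name>Jim</customer-name></order>"
    "</orders></order-db>")

# Each order's lines are its own descendants.
totals = {}
for order in db.iterfind("orders/order"):
    totals[order.get("id")] = sum(
        (int(line.findtext("quantity")) * Decimal(line.findtext("price-each"))
         for line in order.iterfind("orderlines/orderline")),
        Decimal(0),
    )
print(totals)  # {'O117': Decimal('10.50'), 'O341': Decimal('0')}
```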
Summary: Anatomy
Main Points
Elements
One is the root
Attributes
Limited set
Entities
Similar to a macro
There are many details
The prolog
Note
In doubt between element and attribute? Pick element
Remember good comments, for humans!
Non Well-Formed XML Document
Example (Missing Root)
<course id="P4">
  <name>OOP</name>
  <semester>3</semester>
  <desc>Object-oriented Prog.</desc>
</course>
<course id="P2">
  <name>DB</name>
  <semester>7</semester>
  <desc>Databases including SQL</desc>
</course>
Example (Nesting Wrong)
<person ssn="43">
  <name><first>James</first> <last>Bond</name></last>
  <job>agent</job>
</person>
Example (Missing Quotes)
<person ssn=43>
  <name>...</name>
</person>
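Any XML parser acts as a well-formedness checker. A minimal sketch (the helper check_well_formed is hypothetical) runs the three broken examples, shortened, plus a corrected version of the nesting example through ElementTree and reports the parser's error message.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


examples = {
    "missing root": "<course id='P4'><name>OOP</name></course>"
                    "<course id='P2'><name>DB</name></course>",
    "nesting wrong": "<person ssn='43'><name><first>James</first> "
                     "<last>Bond</name></last><job>agent</job></person>",
    "missing quotes": "<person ssn=43><name>...</name></person>",
    "fixed": "<person ssn='43'><name><first>James</first> "
             "<last>Bond</last></name><job>agent</job></person>",
}
results = {label: check_well_formed(text) for label, text in examples.items()}
for label, error in results.items():
    print(f"{label}: {error or 'well-formed'}")
```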
Well-Formed XML and Valid Document
Well-Formed XML Document
All XML elements must have a closing tag
Empty elements are allowed
Tags must be properly nested
Start and end tag must have the same parent
The XML document must have a single root element
Attribute values must be quoted
Valid XML Document
Is well-formed
Adheres to the rules of the specified DTD or XML Schema
Similar to a schema for a table, e.g., types and integrity constraints
Well-Formed and Valid
[Figure: three nested sets; XML Documents contains Well-Formed XML Documents, which contains Valid XML Documents]
Summary: Well-Formed and Valid
Main Points
Well-formed XML document
Structure must adhere to certain rules
Valid XML document
Types and constraints must match a schema (DTD or XML Schema)
Not covered in this lecture, more to come later
Note
Tools can check whether documents are well-formed and valid
Well-formedness is a huge plus over "flat" files
Why XML?
Many Good Reasons
Open
Specifications available to all
Platform neutral
Runs on Apple, Linux, Unix, Windows, ...
Vendor neutral
Competition among vendors
Standard
Changes done in open forums
Note
XML has support for checking structure/types/integrity constraints
DTD and XML Schema
XML has support for querying text documents
XPath and XQuery
Data vs. Document Centric
Data Centric
Database designer
Does not use document order
Only content at leaf level
Simple
Rigid
Example: data extracted from an RDBMS
Document Centric
Text author
Uses document order, e.g., for chapter and figure numbers
Mixed content
Complex
Flexible
Examples: DocBook, XHTML
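The data-centric case "data extracted from an RDBMS" can be sketched with Python's built-in sqlite3 module and ElementTree. The person table is hypothetical (it holds the rows of the earlier data-centric example), and the sketch assumes column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table holding the rows from the data-centric example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, address TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [("Hans", "Denmark"), ("Marge", "Sweden")])

# One <row> per tuple, one leaf element per column; no mixed content.
cursor = con.execute("SELECT name, address FROM person ORDER BY rowid")
columns = [d[0] for d in cursor.description]
rows = ET.Element("rows")
for record in cursor:
    row = ET.SubElement(rows, "row")
    for column, value in zip(columns, record):
        ET.SubElement(row, column).text = str(value)

xml_text = ET.tostring(rows, encoding="unicode")
print(xml_text)
```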
XML vs. DBMS
RDBMS                         XML
Structured data               Structured and unstructured data
Unordered                     Ordered
Flat information              Hierarchical information
Native format                 Standard format
Very compact format           Very verbose format
SQL                           XPath and XQuery
Fine-grained modifications    Coarse-grained modifications
Bad data exchange             Excellent data exchange
Integrity via SQL DDL         Integrity via XML Schema (DTD)
Supports data types           Supports data types
Extreme data volumes          Large data volumes
Additional Information
Web Sites
W3Schools free online tutorials, www.w3schools.com
Quite good for getting an overview of the various XML technologies.
Interactive XML Tutorials, www.xmlzoo.net
Covers several parts of XML.
The Annotated XML 1.0 Specification, www.xml.com/axml/testaxml.htm
The XML 1.0 specification with a lot of comments.
W3C XML recommendations, www.w3.org
The place to go if you want all the details.
Altova's home page (maker of XMLSpy), www.altova.com
If you are looking for a good XML tool.
IBM developerWorks overview "New to XML", www.ibm.com/developerworks/xml/newto/
Many links to additional information.