XML – a data sharing standard
DSC340
Mike Pangburn
Data-sharing challenge
Companies need to “talk” across different locations/systems and software
Most database systems (and other software applications, e.g., Excel) can read text files formatted as XML.
Blind Men and Elephants
The ol’ standby: tab (or comma) delimited data… can be pretty useless!
It leaves these questions unanswered:
Who: authored it? should be contacted about the data?
What: are the contents of the database?
When: was it collected? processed? finalized?
Where: was the study done?
Why: was the data collected?
How: were the data collected? processed? verified?
Metadata
Literally, “data about data”
“a set of data that describes and gives information about other data” (Oxford English Dictionary)
Early Example of Metadata
Is HTML a good way to share data?
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
XML: Tags provide meaning (metadata)
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
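Because each value sits inside a descriptive tag, a program can pull out fields by name instead of guessing from layout. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `summarize_books` is made up for illustration, and the full book title is borrowed from the HTML version of the same entry.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the slide (one book only).
bibliography_xml = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

def summarize_books(xml_text):
    """Return one dict per <book>, using the tag names to locate each field."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title").strip(),
            "authors": [a.text.strip() for a in book.findall("author")],
            "publisher": book.findtext("publisher").strip(),
            "year": int(book.findtext("year")),
        })
    return books

books = summarize_books(bibliography_xml)
print(books[0]["authors"])
```

The same request against the HTML version would have to rely on where the italics and line breaks happen to fall.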
XML vs. HTML
HTML
<h1> DELTA </h1>
<h2> 101 </h2>
<h3> Atlanta </h3>
<h3> Brussels </h3>
<p> flight information </p>
XML
<airline> DELTA </airline>
<number> 101 </number>
<from> Atlanta </from>
<to> Brussels </to>
<description> flight information </description>
Another example: iTunes Library is XML
<dict>
<key>Track ID</key><integer>617</integer>
<key>Name</key><string>Take Five</string>
<key>Artist</key><string>Dave Brubeck Quartet</string>
<key>Album</key><string>Time Out</string>
<key>Genre</key><string>Jazz</string>
<key>Kind</key><string>AAC audio file</string>
<key>Size</key><integer>5892093</integer>
<key>Total Time</key><integer>363578</integer>
….
</dict>
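In this format the children of `<dict>` alternate between a `<key>` and its value, so a reader has to pair them up. Here is a rough sketch of that pairing with Python's standard `xml.etree.ElementTree`; it uses only a few of the entries shown above, and the helper name `parse_track` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the track entry from the slide.
track_xml = """
<dict>
  <key>Track ID</key><integer>617</integer>
  <key>Name</key><string>Take Five</string>
  <key>Artist</key><string>Dave Brubeck Quartet</string>
  <key>Total Time</key><integer>363578</integer>
</dict>
"""

def parse_track(xml_text):
    """Pair each <key> with the element that follows it."""
    children = list(ET.fromstring(xml_text))
    track = {}
    # Children alternate: <key>, value, <key>, value, ...
    for key_el, value_el in zip(children[0::2], children[1::2]):
        value = value_el.text
        if value_el.tag == "integer":
            value = int(value)  # the value's tag name tells us the data type
        track[key_el.text] = value
    return track

track = parse_track(track_xml)
print(track["Name"], "by", track["Artist"])
```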
Acct./Fin XML example
XBRL (eXtensible Business Reporting Language) is a language for the electronic communication of business and financial data.
For example, company net profit has its own unique tag.
Edgar Online (SEC company information) uses XBRL
The solution will allow users to request XBRL-formatted data for all US equities from within Microsoft Excel, custom templates, and the web.
HTML vs. XML
HTML started with very few tags, but…
HTML now has many tags, and more keep being added over time
Messy, yet not customizable
XML has very few standard tags
You add custom tags that are particular to your data needs
XML helps solve the “Is this an elephant?” problem
More strict tagging rules than HTML
The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag, like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Opening and closing tags must have the same case:
<message>This is correct</message>
<Message>This is incorrect</message>
In HTML, improperly nested tags like the following are frowned upon, but will not cause an error:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other, like this:
<b><i>This text is bold and italic</i></b>
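These rules can be checked mechanically, because a strict XML parser rejects any input that breaks them. The sketch below feeds the examples above to Python's standard `xml.etree.ElementTree` parser; the helper `is_well_formed` and the wrapping `<doc>` element (added because an XML document needs a single root) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "closed tags": "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>",
    "unclosed tags": "<doc><p>This is a paragraph<p>This is another paragraph</doc>",
    "same case": "<message>This is correct</message>",
    "case mismatch": "<Message>This is incorrect</message>",
    "good nesting": "<b><i>This text is bold and italic</i></b>",
    "bad nesting": "<b><i>This text is bold and italic</b></i>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```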
Visualizing XML data: a “Tree”
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
data
  person
    id: o555
    name: Mary
    address
      street: Maple
      no: 345
      city: Seattle
  person
    name: John
    address: Thailand
    phone: 23456
Node types: element node (e.g., person), value node (e.g., Mary), attribute node (e.g., id)
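The tree view can also be produced by a program that visits each element and reports its attributes, its text value, and its children. This is a small sketch using Python's standard `xml.etree.ElementTree`; the function name `describe` and the output wording are illustrative choices, not part of any standard.

```python
import xml.etree.ElementTree as ET

data_xml = """
<data>
  <person id="o555">
    <name>Mary</name>
    <address>
      <street>Maple</street>
      <no>345</no>
      <city>Seattle</city>
    </address>
  </person>
  <person>
    <name>John</name>
    <address>Thailand</address>
    <phone>23456</phone>
  </person>
</data>
"""

def describe(element, depth=0):
    """Return indented lines listing element, attribute, and value nodes."""
    indent = "  " * depth
    lines = [f"{indent}element: {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f"{indent}  attribute: {name}={value}")
    text = (element.text or "").strip()
    if text:  # skip the whitespace between nested tags
        lines.append(f"{indent}  value: {text}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(ET.fromstring(data_xml))
print("\n".join(tree_lines))
```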
Database data vs. XML Data
<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>
XML:
persons
  row
    name: “John”
    phone: 3634
  row
    name: “Sue”
    phone: 6343
  row
    name: “Dick”
    phone: 6363
Database table:
name phone
John 3634
Sue 6343
Dick 6363
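When every row has the same child elements, copying XML into a table is mechanical: each `<row>` becomes a record and each child tag becomes a column. The sketch below illustrates this with Python's standard `xml.etree.ElementTree` and an in-memory SQLite database; the table name `persons` and the helper `load_persons` are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

persons_xml = """
<persons>
  <row><name>John</name><phone>3634</phone></row>
  <row><name>Sue</name><phone>6343</phone></row>
  <row><name>Dick</name><phone>6363</phone></row>
</persons>
"""

def load_persons(xml_text):
    """Copy each <row> into a two-column table and return the table's rows."""
    root = ET.fromstring(xml_text)
    records = [
        (row.findtext("name").strip(), row.findtext("phone").strip())
        for row in root.findall("row")
    ]
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    conn.execute("CREATE TABLE persons (name TEXT, phone TEXT)")
    conn.executemany("INSERT INTO persons VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT name, phone FROM persons ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows

table_rows = load_persons(persons_xml)
for name, phone in table_rows:
    print(name, phone)
```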
XML data can usually fit a DBMS table…
Consider the case of: missing attributes
An acceptable fit with a table, even though blanks are deemed a bit undesirable in database tables
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name></person>
no phone !
name phone
John 1234
Joe -
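In code, a missing element simply shows up as an empty cell. The following sketch assumes Python's standard `xml.etree.ElementTree`; the enclosing `<people>` root and the helper name `to_rows` are additions made so the two records form one parseable document.

```python
import xml.etree.ElementTree as ET

people_xml = """
<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
</people>
"""

def to_rows(xml_text):
    """Return (name, phone) tuples; phone is None when the element is missing."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.findall("person"):
        name = person.findtext("name").strip()
        phone = person.findtext("phone")  # None when there is no <phone>
        rows.append((name, phone.strip() if phone is not None else None))
    return rows

rows = to_rows(people_xml)
for name, phone in rows:
    print(name, phone if phone is not None else "-")
```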
…but the fit is not always good
Problematic case: repeated attributes
A poor fit with a table, because a database-design rule is that you should not have two columns with the same name:
<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>
two phones !
name phone phone
Mary 2345 3456
If the XML data had distinct “home” and “cell” elements (and therefore distinct columns), the issue goes away
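The problem is easy to reproduce: flattening a record into one column per tag name silently loses one of the repeated values. This is a sketch with Python's standard `xml.etree.ElementTree`; the `<home>` and `<cell>` tag names follow the suggestion above, and which number is the home phone is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET

mary_xml = "<person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>"
person = ET.fromstring(mary_xml)

# Naive flattening: one column per tag name, so the second <phone>
# overwrites the first.
flat_row = {child.tag: child.text for child in person}

# The XML itself still holds both values.
phones = [p.text for p in person.findall("phone")]

# With distinct tags (hypothetical assignment of numbers), every value
# gets its own column.
distinct_xml = "<person><name>Mary</name><home>2345</home><cell>3456</cell></person>"
distinct_row = {child.tag: child.text for child in ET.fromstring(distinct_xml)}

print(flat_row)
print(phones)
print(distinct_row)
```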
Formatting (visualizing) XML data
Formatting (visualizing in a web page) XML data requires a separate file
What is that separate file? It’s called a “style sheet”
There are a couple of different standards for style sheets
XML-specific standard: XSL
Flexible standard: CSS
CSS can be used with both .xml and .html files
We will look at CSS after XML