extensible markup language xml

45
Extensible Markup Language XML MSI 602 – Spring 2003

Upload: hiroshi-miura

Post on 02-Jan-2016

51 views

Category:

Documents


1 download

DESCRIPTION

Extensible Markup Language XML. MSI 602 – Spring 2003. Databases Types. Data is facts and figures Database is a related set of data Kinds of databases Unstructured Meaning of data interpreted by user Semi-Structured Structure of data wrapped around data Structured - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Extensible Markup Language XML

Extensible Markup LanguageXML

MSI 602 – Spring 2003

Page 2: Extensible Markup Language XML

DatabasesTypes

• Data is facts and figures• Database is a related set of data

Kinds of databases• Unstructured

– Meaning of data interpreted by user

• Semi-Structured– Structure of data wrapped around data

• Structured– Fixed structure of data

– Data added to the fixed structure

Page 3: Extensible Markup Language XML

XMLDefinition and Example

• XML is a text based markup language that is fast becoming a standard of data interchange– An open standard from W3C

– A direct descendant from SGML

Example: Product Inventory Data

<Product>

<Name>Refrigerator</Name>

<Model Number>R3456d2h</Model Number>

<Manufacturer>General Electric</Manufacturer>

<Price>1290.00</Price>

<Quantity>1200</Quantity>

</Product>

Page 4: Extensible Markup Language XML

XMLData Interchange

• XMLs key role is data interchange

• Two business partners want to exchange customer data– Agree on a set of tags

– Exchange data without having to change internal databases

• Other business partners can participate by using the same tagset– New tags can be added to extend the functionality

Key to successful data interchange is building

consensus and standardizing of tag sets

Page 5: Extensible Markup Language XML

XML Universal Data

• TCP/IP Universal Networking

• HTML Universal Rendering

• Java Universal Code

• XML Universal Data

• Numerous standard bodies are set up for standardization of tags in different domains– ebXML

– XBRL

– MML

– CML

Page 6: Extensible Markup Language XML

HTML vs. XML Comparison

• Both are markup languages– HTML has fixed set of tags

– XML allows user to specify the tags based on requirements

• Usage– HTML tags specify how to display data

– XML tags specify semantics of the data

• Tag Interpretation– HTML specifies what each tag and attribute means

– XML tags delimit data & leave interpretation to the parsing application

• Well formedness– HTML very tolerant of rule violations (nesting, matching tags)

– XML very strictly follows rules of well formedness

Page 7: Extensible Markup Language XML

XML Structure

• Prolog– Instructs the parser as to what it it parsing– Contains processing instructions for processor

• Body– Tags - Entities– Attributes - Properties of Entities– Comments - Statements for clarification in the document

Example<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?> Prolog

<contact><name>

<first name>Sanjay</first name><last name>Goel</last name>

</name><address> Body

<street>56 Della Street</street><city>Phoenix</city><state>AZ</state><zip>15784</zip>

</address></contact>

Page 8: Extensible Markup Language XML

XML Prolog

• Syntax: <?xml version=“1.0” encoding=“UTF-8”

standalone=“yes” ?>

• Contains eclaration that identifies a document as xml

• Version– Version of XML markup language used in the data

– Not optional

• Encoding– Identifies the character set used to encode the data

– Default compressed Unicode: UTF-8

• Standalone– Tells whether or not this document references external entity

• May contain entity definitions and tag specifications

Page 9: Extensible Markup Language XML

XML Syntax Elements & Attributes

• Uses less-than and greater-than characters (<…>) as delimiters

• Every opening tag must having an accompanying closing tag – <First Name>Sanjay</First Name>– Empty tags do not require an accompanying closing tag.– Empty tags have a forward slash before the greater-than sign e.g.

<Name/>

• Tags can have attributes which must be enclosed in double quotes– <name first=“Sanjay” last=“Goel”)

• Elements should be properly nested– The nesting can not be interleaved– Each document must have one single root element

• Elements and attribute names are case sensitive

Page 10: Extensible Markup Language XML

Tree Structure Elements

• XML documents have a tree structure containing multiple levels of nested tags. – Root element is a single XML element which encloses all of the other XML

elements and data in the document– All other elements are children of the root element

<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?>

<contact> Root Element<name>

<first name>Sanjay</first name><last name>Goel</last name>

</name><address>

<street>56 Della Street</street> Child Elements<city>Phoenix</city><state>AZ</state><zip>15784</zip>

</address></contact>

Page 11: Extensible Markup Language XML

Attributes Definition and Example

• Attributes are properties associated with an element• Each attribute is a name value pair

– No element may contain two attributes with same name– Name and value are strings

Example<?xml version=“1.0” encoding=“UTF-8” standalone=“yes” ?>

<contact> <name first=“Sanjay” last=“Goel”></name> Attributes<address>

<street>56 Della Street</street> Nested Elements<city>Phoenix</city><state>AZ</state><zip>15784</zip>

</address></contact>

Page 12: Extensible Markup Language XML

Elements vs. Attributes Comparison

• Data should be stored in Elements

• Information about data (meta-data) should be stored in attributes– When in doubt use elements

• Rules of thumb– Elements should have information which some one may want to read.

– Attributes are appropriate for information about document that has nothing to do with content of document

e.g. URLs, units, references, ids belong to attributes

– What is your meta-data may be some ones data

Page 13: Extensible Markup Language XML

CommentsBasics

• XML comments begin with “<!--”and end with “-->”– All data between these delimiters is discarded

– <!-- This is a list of names of people -->

• Comments should not come before XML declaration

• Comments can not be placed inside a tag

• Comments may be used to hide and surround tags<Name>

<first>Sanjay</first>

<!-- <last>Goel</last> --> Last tag is ignored

</Name>

• “--” string may not occur inside a comment except as part of its opening and closing tag– <!-- the Red door -- that is the second --> Illegal

Page 14: Extensible Markup Language XML

Namespaces Basics

• XML documents come from different sources– Combining elements from different sources can result in

name conflict– Namespaces allow the interpreter to resolve the elements

• Namespaces– Declared within element start-tag using attribute xmlns– Represented as an actual URI (since namespaces are

globally unique)– e.g. <Collection xmlns:book="http://www.mjyOnline.com/books"

xmlns:cd=http://www.mjyOnline.com/books>

– Here book and cd are short hands for the full namespace name

• Default namespace is used if no other namespace is defined– It does not have any prefix associated with it

Page 15: Extensible Markup Language XML

Namespaces Example

<?xml version="1.0"?><!-- File Name: Collection.xml --><COLLECTION xmlns:book="http://www.mjyOnline.com/books" xmlns:cd="http://www.mjyOnline.com/cds"> <ITEM Status="in"> <TITLE>The Adventures of Huckleberry

Finn</book:TITLE> <AUTHOR>Mark Twain</book:AUTHOR> <PRICE>$5.49</book:PRICE> </ITEM> <ITEM Status="in"> <TITLE>The Marble Faun</TITLE> <AUTHOR>Nathaniel Hawthorne</AUTHOR> <PRICE>$10.95</PRICE> </ITEM> <ITEM> <ITEM Status="out"> <TITLE>Leaves of Grass</TITLE> <AUTHOR>Walt Whitman</AUTHOR> <PRICE>$7.75</PRICE> </ITEM> <ITEM Status="out"> <TITLE>The Legend of Sleepy Hollow</TITLE> <AUTHOR>Washington Irving</AUTHOR> <PRICE>$2.95</PRICE> </ITEM>

<?xml version="1.0"?>

<!-- File Name: Collection.xml -->

<COLLECTION

<ITEM> <TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE> <COMPOSER>Mozart</COMPOSER> <PRICE>$16.49</PRICE> </ITEM> <TITLE>Violin Concerto in D</TITLE> <COMPOSER>Beethoven</COMPOSER> <PRICE>$14.95</PRICE> </ITEM></COLLECTION>

Books and CDs are tracked in different files if combined will lead to conflicts

Page 16: Extensible Markup Language XML

Namespaces Example

<?xml version="1.0"?>

<!-- File Name: Collection.xml -->

<COLLECTION

xmlns:book="http://www.mjyOnline.com/books"

xmlns:cd="http://www.mjyOnline.com/cds">

<book:ITEM Status="in">

<book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>

<book:AUTHOR>Mark Twain</book:AUTHOR>

<book:PRICE>$5.49</book:PRICE>

</book:ITEM>

<cd:ITEM>

<cd:TITLE>Violin Concerto in D</cd:TITLE>

<cd:COMPOSER>Beethoven</cd:COMPOSER>

<cd:PRICE>$14.95</cd:PRICE>

</cd:ITEM>

<book:ITEM Status="out">

<book:TITLE>Leaves of Grass</book:TITLE>

<book:AUTHOR>Walt Whitman</book:AUTHOR>

<book:PRICE>$7.75</book:PRICE>

</book:ITEM>

<cd:ITEM> <cd:TITLE>Violin Concertos Numbers 1, 2, and

3</cd:TITLE> <cd:COMPOSER>Mozart</cd:COMPOSER> <cd:PRICE>$16.49</cd:PRICE> </cd:ITEM> <book:ITEM Status="out"> <book:TITLE>The Legend of Sleepy Hollow</book:TITLE> <book:AUTHOR>Washington Irving</book:AUTHOR> <book:PRICE>$2.95</book:PRICE> </book:ITEM> <book:ITEM Status="in"> <book:TITLE>The Marble Faun</book:TITLE> <book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR> <book:PRICE>$10.95</book:PRICE> </book:ITEM></COLLECTION>

Page 17: Extensible Markup Language XML

Display XML Style Sheets

• A style sheet is a file that contains instructions for rendering individual elements in an XML document

• Two kinds of style sheets exist– Cascading Style Sheets (CSS)

– Extensible Stylesheet language (XSLT)

• Please refer to the following web site for comprehensive information on style sheets– http://www.w3schools.com/css/default.asp

Page 18: Extensible Markup Language XML

Cascading Style SheetsExample

<?xml version="1.0"?><!-- File Name: Inventory01.xml --><?xml-stylesheet type="text/css"

href="Inventory01.css"?>

<INVENTORY> <BOOK> <TITLE>The Adventures of Huckleberry

Finn</TITLE> <AUTHOR>Mark Twain</AUTHOR> <BINDING>mass market paperback</BINDING> <PAGES>298</PAGES> <PRICE>$5.49</PRICE> </BOOK> <BOOK> <TITLE>Leaves of Grass</TITLE> <AUTHOR>Walt Whitman</AUTHOR> <BINDING>hardcover</BINDING> <PAGES>462</PAGES> <PRICE>$7.75</PRICE> </BOOK>

<BOOK> <TITLE>The Legend of Sleepy Hollow</TITLE> <AUTHOR>Washington Irving</AUTHOR> <BINDING>mass market paperback</BINDING> <PAGES>98</PAGES> <PRICE>$2.95</PRICE> </BOOK> <BOOK> <TITLE>The Marble Faun</TITLE> <AUTHOR>Nathaniel Hawthorne</AUTHOR> <BINDING>trade paperback</BINDING> <PAGES>473</PAGES> <PRICE>$10.95</PRICE> </BOOK> <BOOK> <TITLE>Moby-Dick</TITLE> <AUTHOR>Herman Melville</AUTHOR> <BINDING>hardcover</BINDING> <PAGES>724</PAGES> <PRICE>$9.95</PRICE> </BOOK></INVENTORY>

Page 19: Extensible Markup Language XML

Cascading Style Sheets Example

/* File Name: Inventory02.css */

BOOK

{display:block;

margin-top:12pt;

font-size:10pt}

TITLE

{display:block;

font-size:12pt;

font-weight:bold;

font-style:italic}

AUTHOR

{display:block;

margin-left:15pt;

font-weight:bold}

BINDING {display:block; margin-left:15pt}

PAGES {display:none}

PRICE {display:block; margin-left:15pt}

Page 20: Extensible Markup Language XML

Cascading Style Sheets Display

Page 21: Extensible Markup Language XML

Formal Languages/Grammars Basics

• A formal language is a set of strings– It is characterized by a set of rules which determine which strings are

a part of the language and which are not

– In case of programming languages, programs which compile are grammatical corret (others are not)

– In a natural language, like English, correct sentences follows rules of the English language grammar

• More precisely grammar a defines four things– A vocabulary out of which the strings are constructed (terminal

symbols)

– Vocabulary that is used to formulate grammar rules (non terminal symbols)

– Grammar rules (productions), each of which has a lhs and a rhs

– A designated start symbol

Page 22: Extensible Markup Language XML

Validated XML Document Basics

• An XML document is valid if it conforms to the grammar of the language– Validity is different from well-formedness

• Two ways to specify the grammar of the language– Document Type Definition (DTD)

– XML Schema

• Why bother with the language grammar– It provides the blueprint of the language

– Ensures that the data is interchangable

– Eliminates processing errors in custom software which expects a particular document content and structure

• Validity of the document is checked by using a validator

Page 23: Extensible Markup Language XML

Document Type Declaration Basics

• Document type declaration is a block of XML markup added to the prolog of the document– It has to follow the XML declaration– It has to be outside of other markup language

• It defines the content and structure of the language– Without a document type declaration or schema a document is merely

checked for well-formedness and not validity

• Why bother with the language grammar– It provides the blueprint of the language– Ensures that the data is interchangable– Eliminates processing errors in custom software which expects a

particular document content and structure

• The form of a document type declaration is:– <!DOCTYPE Name DTD> – DTD is document type definition– Name specifies the name of the document element

Page 24: Extensible Markup Language XML

Document Type Definitions Basics

• Document type definition (DTD) consists of a series of markup declarations enclosed in square brackets

<?xml version=“1.0” standalone=“yes”?>

<!DOCTYPE GREETING [

<!ELEMENT GREETING (#PCDATA)>

]>

<GREETING>

Hello XML!

</GREETING>

• A DTD can also be stored separately from the XML document and referenced in it.

Page 25: Extensible Markup Language XML

Document Type Definitions Syntax

• Element Type Declaration– Syntax: <!Element Name contentspec>– Name is the name of the element– contentspec is the content specification

• Example: – <!Element Title (#PCDATA)>

• Content specification can have four types of values– EMPTY content – Element must not have content

<!Element Image EMPTY>– ANY Content – Can contain any thing

<!Element misc ANY>– Element Content – Child elements but no character data

<!DOCTYPE BOOK [<!ELEMENT BOOK (TITLE, AUTHOR)><!ELEMENT TITLE (#PCDATA)><!ELEMENT AUTHOR (#PCDATA)>

– Mixed Content – character data and child elements interspersed

Page 26: Extensible Markup Language XML

Element Content Specification Types

• Content Specification indicates allowed child elements and their order– If element has element content it can not contain any character data

• Types of content specifications– Sequence: Indicates that each element must have a specific sequence of

child elements– Example

<!Doctype Mountain [<!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)>

<!ELEMENT NAME (#PCDATA)<!ELEMENT HEIGHT (#PCDATA)<!ELEMENT STATE (#PCDATA)

]>

– Valid XML<MOUNTAIN>

<NAME>Wheeler</NAME><HEIGHT>13161</HEIGHT><STATE>New Mexico</STATE>

</MOUNTAIN>

Page 27: Extensible Markup Language XML

Element Content Specification Types

• Types of content specifications– Choice: Indicates that element can have one of a series of child elements– Each element is separated by a | sign– Example

<!Doctype FILM [<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>

<!ELEMENT STAR (#PCDATA)><!ELEMENT NARRATOR (#PCDATA)><!ELEMENT INSTRUCTOR (#PCDATA)>

]>

– Valid XML<FILM>

<STAR>ROBERT REDFORD</STAR></FILM>

– Invalid XML<FILM>

<NARRATOR>Sir Gregory Parsloe</NARRATOR><INSTRUCTOR>Galahad Threepwood</INSTRUCTOR>

</FILM>

Page 28: Extensible Markup Language XML

Element Content Specification Number of Elements

• Specifying the number of elements allowed– ? zero or one– + one or more– * zero or more– Example

<!Doctype Mountain [<!ELEMENT MOUNTAIN (NAME+, HEIGHT?, STATE)>

<!ELEMENT NAME (#PCDATA)<!ELEMENT HEIGHT (#PCDATA)<!ELEMENT STATE (#PCDATA)

]>

– Valid XML<MOUNTAIN>

<NAME>Peublo Peak</NAME><NAME>Taos Mountain</NAME><STATE>New Mexico</STATE>

</MOUNTAIN>

Page 29: Extensible Markup Language XML

Element Content Specification Modification

• Modifying a group of elements– Example

<!Doctype FILM [

<!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)+>

<!ELEMENT STAR (#PCDATA)>

<!ELEMENT NARRATOR (#PCDATA)>

<!ELEMENT INSTRUCTOR (#PCDATA)>

]>

– Valid XML<FILM>

<NARRATOR>Sir Gregory Parsloe</NARRATOR>

<STAR>ROBERT REDFORD</STAR>

<NARRATOR>PLUG BASHMAN</NARRATOR>

</FILM>

Page 30: Extensible Markup Language XML

Element Content Specification Nesting

• Nesting in specification– Example

<!Doctype FILM [

<!ELEMENT FILM TITLE, CLASS,(STAR | NARRATOR | INSTRUCTOR)+>

<!ELEMENT TITLE (#PCDATA)>

<!ELEMENT CLASS (#PCDATA)>

<!ELEMENT STAR (#PCDATA)>

<!ELEMENT NARRATOR (#PCDATA)>

<!ELEMENT INSTRUCTOR (#PCDATA)>

]>

– Valid XML<FILM>

<TITLE>The Net</TITLE>

<CLASS>Action</CLASS>

<STAR>Sandra Bullock</STAR>

</FILM>

Page 31: Extensible Markup Language XML

Element Content Specification Mixed Content Model

• Mixed Content Model: Allows element to contain– Character Data

– Child elements in any position and any frequency (zero or more repetitions)

– Child elements can be interspersed with data

• Character data only– Example

<!ELEMENT TITLE (#PCDATA)>

• Character data and elements– Example:

<!ELEMENT TITLE (#PCDATA | SUBTITLE)+>

<!ELEMENT SUBTITLE (#PCDATA)>

– Valid XML<TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>

<TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>

Page 32: Extensible Markup Language XML

Attribute Specification Basics

• All attributes in the document need to be specified using an attribute declaration list. It defines– Defines the name of the attribute– Defines the data type of each attribute– Specifies whether an attribute is required or noe

• Syntax: <!ATTLIST Name Attdefs>– Name is the name of the element– Attdefs is a series of one or more attribute definitions

• Attribute definition Syntax: Name AttType DefaultDecl– Name is the attribute name– AttType is the type of the attribute (CDATA, Token Type,

Enumerated)– DefaultDecl specifies if attribute is required & default values – Example:

<!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))><!ATTLIST FILM Class CDATA “fictional” Year CDATA #REQUIRED>

Page 33: Extensible Markup Language XML

Entity Specification Types

• There are two kinds of entities in XML documents1– Character entities (referred by character unicode number)

– Named entities, referred to by name

Page 34: Extensible Markup Language XML

XML Parsing Definition and Types

• An XML parser is a program that reads an XML document and makes its contents available for processing

• There are two standard types of parsers for XML– Document Object Model (DOM) which makes the document

available as a tree

– Simple XML Parser (SAX) which associates an event with each tag and each block of text

• XML parsers are available from many vendors– Each vendor conforms to the standardized XML interfaces

– One of the best parsers is the xerces parser

– Suns API for XML parsing is JAXP (supports basic classes and interfaces that a Java XML parser should support)

– Often SAX parsers are used for writing DOM parsers

Page 35: Extensible Markup Language XML

SAX Parser Basics

• As the parser scans the document it sends notifications of events, for instance– Element start

– Element end

– Character sequence between two elements is found

• SAX provides standard names for these callback functions that are triggerd by these eventsvoid characters (char[] ch, int start, int length): notification of character data

void startDocument(): notification of start of document

void endDocument(): notification of end of document

void startElement(String name, AttributeList atts): notification of start of element

void endElement(String name): notification of end of element

void processingInstruction(String target, String data): notification of processing instruction

Page 36: Extensible Markup Language XML

SAX Parser Example

From professional JSP page 658

Page 37: Extensible Markup Language XML

XSLT Parser Definition and Uses

• XSLT is an XML structure transforming language– Any treee transforming language needs an ability to refer

to tree paths– Xpath is the sub-language underneath XSLT for tree path

description

• There are two scenarios for use of XSLT– Browser contains an XSLT and uses it to render XML

documents– XSLT is used for changing the structure of an existing

XML document

• To run XSLT the following components are required– Java 1.4 standard development kit– James Clark’s xt (xt.jar)

Page 38: Extensible Markup Language XML

XSLT Parser Basics

• XSLT style sheet is an XML document

• Consists of two parts– Standard XML declaration including namespace declaractions

– Top level elements that set up the general framework for the output, e.g., variables or import parameters from the command line

• Processing involves the following – A current list of nodes from the source document is created by

matching a pattern

– Output to the current node is generated by instantiating a template corresponding the current pattern

– In process of transformation new nodes can be added to the list

– The processing begins by processing a list containing the entire document

– Transformation ends when the node list is empty

Page 39: Extensible Markup Language XML

XSLT Parser Example

• XSLT

Page 40: Extensible Markup Language XML

Web Services

Page 41: Extensible Markup Language XML

Web Services Definition

• Web Services are software programs that use XML to exchange information with other software programs via common Internet protocols. – Web services communicate over the network to provide

specific methods that other applications can invoke.– Thus applications residing on different computer can

work synergistically by invoking methods on each other– Http is the key protocol used for Web Services.

• Characteristics– Programmable– Encapsulate a task– XML based data exchange allows programs on

heterogenous platforms to communicate (SOAP)– Self-describing (WSDL)– Discoverable (UDDI)

Page 42: Extensible Markup Language XML

Web Services SOAP

• SOAP – Simple Object Access Protocol – Enables data transfer between systems distributed over a

network– A SOAP method send to the a Web Service invokes a method

provided by the service– Web Service may return the result via another SOAP

message

• SOAP consists of standardized XML schemas• Defines a format for transmitting XML messages over

network– Includes data types and message structure

• Layered over an Internet protocol, such as HTTP and can be used to transfer data across the Web and other networks– Http allows message transfer across firewall since Http

messages are usually accepted by firewalls

Page 43: Extensible Markup Language XML

Web Services SOAP

• SOAP message consists of three parts– Envelope– Header– Body

• Envelope wraps the entire message and contains header and body

• Header (optional) provides information on security and routing

• Body contains application specific data that is being transferred

• Other alternative to SOAP are XML-RPC– SOAP de facto standard due to simplicity, extensibility

and interoperability

Page 44: Extensible Markup Language XML

Web Services WSDL

• WSDL – Web Services Description Language• Provides means to provide information about a

web service– Instructions of its use– Capability of the service

• Provides information on connection to the service and communicate

• Syntax is fairly complex– Normally created using automated tools– Not important to understand the precise syntax of

WSDL while developing web services

Page 45: Extensible Markup Language XML

Web Services UDDI

• UDDI – Universal Description, Discovery and Integration– Allows developers and businesses to publish and locate

web services on a network via use of registries– The registries can be made private or public

• Structure similar to a phone book– White pages contain contact information and textual

description– Yellow pages provides classification information about

companies and details of company’s electronic capability– Green pages list technical data relating to services and

business processes