lecture introduction to xml lecture introduction to xml what is xml.pdfwhat is xml
TRANSCRIPT
-
7/25/2019 Lecture Introduction to XML Lecture Introduction to XML What is XML.pdfWhat is XML
1/34
Copyright IBM Corporation 2004
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Welcome to:
3.1
What Is XML?
-
Unit Objectives
After completing this unit, you should be able to:
Describe the basic rules of XML
Describe what it means for an XML document to be well-formed
List the components that make up an XML document
Differentiate between XML and HTML
Describe the internationalization support in XML
Define some best practices for XML
-
What Is XML?
At its core XML is text formatted to follow a well-defined set of rules.
XML documents consist primarily of tags and text.
If you've ever seen the source of an HTML document, then the XML structure should look familiar.
This text may be stored or represented in:
A normal file stored on disk
A message being sent over HTTP
A character string in a programming language
A CLOB (character large object) in a database
Any other way textual data can be used
XML documents do not need to exist as documents. They may be:
Byte streams sent between applications
Fields in a database record
Collections of XML Infoset information items
For simplicity they will be referred to as though they are documents and files.
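As a minimal sketch of the point that XML is just text, the same markup can be parsed from a character string or from a byte stream. The greeting element and its content are invented for illustration, and Python's standard xml.etree.ElementTree module is assumed as the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element name is illustrative only.
xml_as_string = "<greeting>Hello, XML</greeting>"
xml_as_bytes = xml_as_string.encode("utf-8")  # e.g. bytes received over HTTP

# The parser accepts either representation and builds the same element.
from_string = ET.fromstring(xml_as_string)
from_bytes = ET.fromstring(xml_as_bytes)

print(from_string.tag, from_string.text)
print(from_bytes.tag, from_bytes.text)
```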
-
Example Tree Representation of XML
XML documents should be thought of as a hierarchical tree structure.
[Figure: an XML document shown as a tree. A ROOT node branches down to leaf values "Tom Wolfe", "The Right Stuff", and "$6.00".]
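The tree view can be sketched in code by walking a parsed document from the root down. The element names (book, author, title, price) are assumptions made for this example; only the three text values come from the slide.

```python
import xml.etree.ElementTree as ET

# Element names are assumed for illustration; the values come from the slide.
doc = """<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>"""

def walk(element, depth=0):
    """Return (depth, tag, text) tuples for an element and its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

tree_rows = walk(ET.fromstring(doc))
for depth, tag, text in tree_rows:
    print("  " * depth + tag, text)
```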
-
A Simple XML Document - Basic Structure
"Optional" first line; only required if encoding IS NOT UTF-8 or UTF-16*
Root element start tag
Alphabet from A to Z
First child element with data
Empty element (no data)
Begin element tag
Boreng Riter
Nested child elements
End element tag
The letter A is the first in the alphabet. It is also the first of five vowels.
Element containing an attribute and parsed character data (PCDATA) [TBD]
Comment
The letter Z is the last letter in the alphabet.
Last element in document
Root element end tag
-
A Simple XML Document - Basic Nomenclature
The XML instance on the previous page consists of:
One main element: book
Subelements: title, isbn, author, chapter, and comment
author contains the subelements firstName and lastName
isbn and chapter contain the attributes number and title, respectively
title, firstName, and lastName contain only strings:
Elements that contain numbers, strings, dates, and so forth (TBD) but no subelements (or attributes) are said to have simple types
isbn and chapter carry attributes; author has subelements:
Elements that contain subelements or carry attributes are said to have complex types
Attributes always have simple types (that is, they are numbers, strings, dates, and so forth).
TBD -- In a later chapter we describe XML Schemas, which have access to a collection of built-in simple types
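The simple versus complex distinction can be sketched as a small check: an element counts as complex if it has child elements or attributes. The instance below reuses the element and attribute names from the slide, but the values are invented, and the kind helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Names follow the slide; all values are invented for illustration.
doc = """<book>
  <title>XML Basics</title>
  <isbn number="0000000000"/>
  <author>
    <firstName>Pat</firstName>
    <lastName>Example</lastName>
  </author>
  <chapter title="What is XML"/>
</book>"""

def kind(element):
    """Complex if the element has child elements or attributes, else simple."""
    return "complex" if len(element) > 0 or element.attrib else "simple"

# iter() visits the root and every descendant in document order.
kinds = {el.tag: kind(el) for el in ET.fromstring(doc).iter()}
print(kinds)
```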
-
Basics of Well-formed XML (1 of 2)
XML documents are considered to be well-formed when they adhere to a set of five rules that define basic XML syntax and structure, plus a sixth for worldwide conformity.
1. There must be a single root element:
All other elements are nested inside the root element
2. Elements must be properly terminated:
For every opening tag "<tag>" there must be a matching closing tag "</tag>"
The exception is an empty (no content or body) tag "<tag/>"
3. Elements must be properly nested underneath a parent tag (except for the single, root element):
A nested tag-pair may not overlap another tag
There is no limit to the nesting level of child elements
-
Basics of Well-formed XML (2 of 2)
4. Tag names are case sensitive:
All tag and attribute names, attribute values, and data must comply with XML naming rules.
5. Attributes, extra information that can be provided for elements, must be properly quoted:
That is, all attribute values must be in quotes.
6. The first line should/must contain the special tag that identifies the version of the XML specification to apply:
XML 1.0 is currently the most common.
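One way to see these rules in action is to hand small fragments to a parser and note which ones it rejects. This is a sketch using Python's standard xml.etree.ElementTree; the is_well_formed helper and the one-letter tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "single root":        is_well_formed("<a><b/></a>"),
    "two roots":          is_well_formed("<a/><b/>"),               # rule 1
    "unterminated tag":   is_well_formed("<a><b></a>"),             # rule 2
    "overlapping tags":   is_well_formed("<a><b><c></b></c></a>"),  # rule 3
    "case mismatch":      is_well_formed("<a></A>"),                # rule 4
    "unquoted attribute": is_well_formed("<a size=1/>"),            # rule 5
}
for name, ok in checks.items():
    print(name, "->", "well-formed" if ok else "rejected")
```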
-
Element Rules - Rule 1. Single Root Element
All XML documents must have a single root element.
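Legal:
<colors>
<color>red</color>
<color>green</color>
</colors>
The colors element is the root element for this XML.
Not legal:
<color>red</color>
<color>green</color>
The color elements represent multiple root elements.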
-
Element Rules - Rule 2. Element Tag Rules
Elements consist of start and end tags.
The end tag is identified by the /.
Example: red
Elements may contain attributes within the start tag.
Example:
Note: The attribute is isbn.
Empty elements contain no child elements or data.
These elements can be represented with a special shorthand notation.
Example:
Can be shortened to: (preferred)
Or, if the element has no data, as:
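A short sketch of the empty-element shorthand: a start tag immediately followed by its end tag and the single self-closing tag describe the same empty element. The note element and the is_empty helper are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# <note></note> and <note/> are hypothetical examples of the two notations.
long_form = ET.fromstring("<note></note>")
short_form = ET.fromstring("<note/>")

def is_empty(element):
    """An element is empty when it has no child elements and no text."""
    return len(element) == 0 and not element.text

print(is_empty(long_form), is_empty(short_form))

# Both parse to the same element, so both serialise identically.
same_output = ET.tostring(long_form) == ET.tostring(short_form)
print(same_output)
```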
-
Element Rules - Rule 3. Element Nesting
Elements must be properly nested.
The end tags of inner elements must occur before the end tags of outer elements.
Any number of child elements or data may be nested within the start and end tags of an element.
-
Element Nesting Example
Legal: All elements are properly nested.
Not legal: The element tags are mixed up and not ordered.
Data used in both examples: Polo, red, large
Best Practice: Use indentation to represent the document's hierarchy.
Important if your document will likely be read by humans.
Computers and programs don't usually care.
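The indentation best practice can be sketched with the standard library, assuming Python 3.9 or later for ET.indent. The tag names (shirt, name, color, size) are invented; only the values Polo, red, and large come from the slide.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed for illustration; the values come from the slide.
flat = "<shirt><name>Polo</name><color>red</color><size>large</size></shirt>"
root = ET.fromstring(flat)

# ET.indent (Python 3.9+) adds whitespace so nesting is visible to a human reader.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)

# The element data is unchanged; only whitespace between tags was added.
reparsed_color = ET.fromstring(pretty).findtext("color")
```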
-
Element Rules - Rule 4. XML Naming Rules
XML name construction:
The first character must be A-Z, a-z, or _ (underscore)
Any number of subsequent letters, numbers, hyphens, periods, colons, and underscore characters.
XML names are case sensitive.
Names cannot contain spaces.
Names must not have a prefix of xml in any case combination (such names are reserved).
Best Practice: Brevity in tag names is not necessary.
Use descriptive names for elements and attributes.
A descriptive name is far better than an abbreviated one.
Best Practice: Maintain standard naming conventions and quoting.
Camelback, dot and underscore notation are all common (for example, camelBackNotation, dot.notation, and underscore_notation).
-
Rule 4. Tag Naming - Samples
Legal: title, book.isdn, lastName, _street, addrLine1, name:first
Not legal: 1name, -street, &name
Comments: Examples of legal and illegal element names.
Sample data: red, small
Comments: Element names are case sensitive, and start and end tags must match.
Sample data: John
Comments: Element names must not contain spaces.
Sample data: John
Comments: Elements must not contain any W3C reserved words.
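The naming rules can be approximated with a regular expression. This is a sketch of the simplified rules as stated on the slides, not the full Name production of the XML 1.0 specification, which also admits many non-ASCII characters. The is_legal_name helper is a hypothetical name; "first name" and "XmlData" are extra samples added to cover the space and reserved-prefix rules.

```python
import re

# Simplified slide rule: first character A-Z, a-z, or underscore, then
# letters, digits, hyphens, periods, colons, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._:\-]*")

def is_legal_name(name):
    """Check a candidate name against the simplified slide rules."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml (any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None

legal = ["title", "book.isdn", "lastName", "_street", "addrLine1", "name:first"]
illegal = ["1name", "-street", "&name", "first name", "XmlData"]

for name in legal + illegal:
    print(repr(name), is_legal_name(name))
```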
-
Rule 4. Element Content (1 of 2): General
An XML instance is composed of elements expressed in tag pairs (except for empty tags), plus optional attributes that always have quoted values, and optional data that appears between the element start tag and the element end tag.
Mixed content - element content that contains data (PCDATA is shown) and other elements.
Example (snippet):
XMLExample
Chapter information
What is XML
What is HTML
More chapter information
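Mixed content can be sketched as follows. The tag names (chapter, section) are assumptions; only the text fragments are taken from the snippet. In Python's ElementTree, data before the first child is stored in text and data after a child in that child's tail.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed; only the text fragments come from the slide.
snippet = ("<chapter>Chapter information"
           "<section>What is XML</section>"
           "<section>What is HTML</section>"
           "More chapter information</chapter>")

chapter = ET.fromstring(snippet)
leading_text = chapter.text                # data before the first child element
child_texts = [s.text for s in chapter]    # data inside each child element
trailing_text = chapter[-1].tail           # data after the last child element

print(leading_text, child_texts, trailing_text)
```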
-
Rule 4. Element Content (2 of 2): Data
Element data content is handled in one of two ways:
1. Parsed Character Data (PCDATA): is examined by the XML parser to discover XML content embedded within it.
2. Character Data (CDATA): is delimited by the special syntax <![CDATA[ ... ]]> and is not processed by the parser.
-
Rule 4. PCDATA - Parsed Character Data
Predefined entities exist to address ambiguous syntax situations, situations where the literal would be interpreted as part of the XML document syntax rather than its content.
Examples: "&gt; 6 &amp; &lt; 20" is read by the parser as "> 6 & < 20"
Entity Description Character
&lt; "less than" <
&gt; "greater than" >
&amp; "ampersand" &
&apos; "apostrophe" '
&quot; "quote" "
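A brief sketch of how the predefined entities are used in practice: content is escaped before it is placed in the document, and the parser turns the entities back into characters. The condition element and its content are hypothetical; escape comes from Python's standard xml.sax.saxutils module.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "x > 6 & y < 20"      # hypothetical element content
escaped = escape(raw)       # escapes &, <, and > by default
print(escaped)

# The parser converts the entities back into the original characters.
element = ET.fromstring("<condition>" + escaped + "</condition>")
round_trip = element.text

# Quotes matter mainly inside attribute values; they are escaped on request.
quoted = escape('say "hi"', {'"': "&quot;"})
```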
-
Rule 4. CDATA - Character Data
Syntax: <![CDATA[ text ]]>
Note: text may be anything except the literal string "]]>"; to embed "]]>" use "]]&gt;"
CDATA is not parsed and is treated as-is.
Useful for embedding other languages within the XML:
HTML documents.
XML documents.
JavaScript source.
Or any other text with a lot of special characters.
Generally speaking the escaping rules inside a CDATA section are those of the embedded language.
For example, to escape an ampersand in JavaScript use &.
-
Rule 4. CDATA Examples
These script elements contain JavaScript:
{ return 1 } else { return 0 }}
]]>
{ return 1 } else { return 0 }}]]>
This nameXML element stores actual XML to be treated as text:
Sir Frederick of Ledyard's End
]]>
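The CDATA behaviour can be sketched as below. The script element and its JavaScript body are invented for illustration, and to_cdata is a hypothetical helper. It uses a common workaround for text that itself contains "]]>": the text is split across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET

# Hypothetical script element; the JavaScript body is illustrative only.
doc = "<script><![CDATA[if (a < b && b > 0) { return 1 } else { return 0 }]]></script>"
script_text = ET.fromstring(doc).text   # < and & arrive exactly as written

def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" so it cannot end the section early."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

tricky = "a[b[0]]>c"
wrapped = "<v>" + to_cdata(tricky) + "</v>"
recovered = ET.fromstring(wrapped).text  # adjacent sections are joined by the parser
print(script_text)
print(recovered)
```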
-
Element Rules - Rule 5. Element Attributes
Attributes are used to attach information to elements.
Attributes consist of a name="value" pair, where the name is a legal XML name. This is often referred to as a "key-value" pair.
Attributes are placed in the start tag of the element to which they apply.
An element may have several attributes, each uniquely named.
Examples:
XML overview
Yacht
Notice the different usage of the attribute "type" in the two elements; semantically they are not the same.
Attributes must have a value.
Values must be quoted with either double or single quotes.
Convention is to stick with one or the other.
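The attribute rules can be sketched with two hypothetical elements that both carry a type attribute. The element names and attribute values are assumptions; only the text XML overview and Yacht comes from the slide. The parses helper is also a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical elements; "type" means something different on each of them.
doc = """<items>
  <chapter type="introduction" number='1'>XML overview</chapter>
  <boat type="sail">Yacht</boat>
</items>"""

root = ET.fromstring(doc)
chapter_attrs = root.find("chapter").attrib   # mapping of attribute name to value
boat_type = root.find("boat").get("type")
print(chapter_attrs, boat_type)

def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

duplicate_ok = parses('<a id="1" id="2"/>')   # attribute names must be unique
valueless_ok = parses('<a checked/>')         # attributes must have a value
```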
-
Element Rules - Rule 6. XML Declaration (1 of 2)
The XML Declaration is an optional first line in all XML documents:
<?xml version="1.0" encoding="UTF-8"?>
If this declaration is used, the version attribute is mandatory.
The encoding attribute indicates the character encoding used in the document; if UTF-8 or UTF-16 is used it may be omitted.
ASCII is a subset of UTF-8 and need not be declared.
Comments are not allowed before this statement.
The XML Declaration follows the syntax of a Processing Instruction or PI, which is described on a subsequent chart, but it is considered to be unique and is treated separately in the 1.0 XML specification.
GENERAL NOTE OF CAUTION: You cannot always rely on a browser or tool to completely/correctly enforce the specifications. Nor are the specifications always written in language that, to a particular reader, is unambiguous. Still, the best advice is when in doubt, refer to the specification, which for XML is www.w3.org/XML.
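A sketch of the declaration rules, assuming Python 3.8 or later for the xml_declaration argument. The colors root element and the parses helper are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.Element("colors")   # hypothetical root element

# xml_declaration=True asks the serialiser to emit the declaration line
# (argument available in Python 3.8 and later).
with_decl = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(with_decl.decode("utf-8"))

def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

decl_first = parses('<?xml version="1.0"?><colors/>')
comment_first = parses('<!-- note --><?xml version="1.0"?><colors/>')  # comment before it
no_version = parses('<?xml encoding="UTF-8"?><colors/>')               # version missing
```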
-
Element Rules - Rule 6. XML Declaration (2 of 2)
The standalone attribute is included here for completeness: it is used to indicate if this XML document depends on information declared externally to this document (in a DTD or XSL file (TBD), for example); the value may be yes or no.
A value of "yes" indicates there are no external markup declarations; if there are no external markup declarations, the declaration has no meaning.
A value of "no" indicates there are or may be such external markup declarations; if there are such declarations but there is no standalone declaration, "no" is assumed, so it is typically not used.
In any event, the inclusion in the XML instance of references to external entities, such as those in an embedded DTD, does not change its standalone status.
A bigger issue associated with the standalone attribute is that of defining or setting values in any entity that may be external to the XML instance.
Arguably, the principal reason for using XML is that it explicitly defines the elements it includes. If attribute values are overridden then the XML instance before us is no longer declarative.
-
Comments
<!-- comment text --> defines a comment.
A space after the beginning and before the trailing hyphens is recommended but not required.
A is the first letter
Z is the last letter
Improper usage:
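As a sketch of proper and improper comment usage, the fragment below reads a comment with the standard xml.dom.minidom module and then checks two misuses that a parser rejects: a double hyphen inside a comment and a comment placed inside a tag. The alphabet and letter element names are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Element names are assumed; the comment text echoes the slide.
doc = "<alphabet><!-- A is the first letter --><letter>A</letter></alphabet>"
dom = minidom.parseString(doc)
comments = [n.data for n in dom.documentElement.childNodes
            if n.nodeType == n.COMMENT_NODE]
print(comments)

def parses(text):
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

double_hyphen = parses("<a><!-- not -- allowed --></a>")  # "--" inside a comment
inside_tag = parses("<a <!-- no --> />")                   # comment inside a tag
```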
-
Internationalization and Encoding (1 of 2)
Support for different character encodings is provided through the encoding attribute of the XML Declaration.
The encoding attribute indicates the set of characters that are permitted in the document.
In the absence of an encoding declaration, Unicode UTF-8 or UTF-16 characters may be used.
Documents exchanged via network may be presented to the processor in an encoding format other than the specified encoding as long as the transport protocol (for example, HTTP) indicates the encoding used.
-
Internationalization and Encoding (2 of 2)
It is very important that the editor and operating system used to write and save an XML document support the encoding specified in the XML Declaration.
Sample encoding declarations:
ASCII (subset of UTF-8)
16 bit UNICODE...
Japanese
...
Note: Encoding names are case-insensitive
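The role of the encoding declaration can be sketched by saving the same content in two encodings. ISO-8859-1 is chosen here only because the standard Python parser handles it directly; the name element and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

text = "Jos\u00e9"   # hypothetical content containing a non-ASCII character

# The bytes are ISO-8859-1, and the declaration says so.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?><name>'
              + text + "</name>").encode("iso-8859-1")
# With no declaration the parser assumes UTF-8 (or UTF-16 with a byte order mark).
utf8_doc = ("<name>" + text + "</name>").encode("utf-8")

from_latin1 = ET.fromstring(latin1_doc).text
from_utf8 = ET.fromstring(utf8_doc).text

# Encoding names are case-insensitive.
lower_case = ET.fromstring(latin1_doc.replace(b"ISO-8859-1", b"iso-8859-1")).text
print(from_latin1, from_utf8, lower_case)
```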
-
Processing Instruction
Syntax: <?target instruction?>
Processing Instruction is often abbreviated as PI in documentation.
A feature inherited from SGML.
Used to embed application-specific instructions in documents.
The target name immediately follows "<?".
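A short sketch of reading a processing instruction with the standard xml.dom.minidom module. The xml-stylesheet instruction is a commonly seen PI used here as an example; the href value and the doc element are invented. The XML Declaration itself is not reported as a PI.

```python
from xml.dom import minidom

# The stylesheet reference and the doc element are hypothetical.
doc = ('<?xml version="1.0"?>'
       '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
       '<doc/>')
dom = minidom.parseString(doc)

# Each PI node exposes its target (the name after "<?") and its data.
pis = [(n.target, n.data) for n in dom.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(pis)
```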
-
Well-formed versus Valid
A well-formed XML document:
Consists of XML elements that are nested within one another.
Has a unique root element.
Follows the XML naming conventions.
Follows the XML rules for quoting attributes.
Has tags that are properly terminated.
All XML parsers check for well-formedness.
A valid XML document has an associated vocabulary and obeys the structural rules specified by that vocabulary.
Associated vocabulary is typically defined by either a DTD or an XML Schema.
XML parsers may be validating or non-validating depending upon whether or not they can apply an associated grammar.
Studio is an example of a tool whose XML capabilities include validation.
-
HTML versus XML (1 of 2)
XML is about structured information interchange
HTML is about presentation and browsing
Java Programming
EECS
Paul Thompson
Ron Jones
Uma Abingdon
Lindsay Garmon
-
HTML versus XML (2 of 2)
HTML view: Course Roster
XML Programming
Department: EECS
Teacher: Paul Thompson
Student List: Ron Jones, Uma Abingdon, Lindsay Garmon
XML view: Course Roster
Java Programming EECS
Paul Thompson Ron Jones Uma Abingdon Lindsay Garmon
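The difference matters to programs: when the markup names the data, a program can ask for the teacher or the students directly instead of guessing from layout. The tag names in this sketch are assumptions; the course and people come from the roster on the slide.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed; the course and names come from the slide's roster.
roster = """<course>
  <title>Java Programming</title>
  <department>EECS</department>
  <teacher>Paul Thompson</teacher>
  <students>
    <student>Ron Jones</student>
    <student>Uma Abingdon</student>
    <student>Lindsay Garmon</student>
  </students>
</course>"""

course = ET.fromstring(roster)
teacher = course.findtext("teacher")                  # look up data by name
students = [s.text for s in course.iter("student")]
print(teacher, students)
```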
-
Checkpoint Questions (1 of 3)
1. Basic XML can be described as:
A. A hierarchical structure of tagged elements, attributes and text.
B. All the HTML tags plus a set of new XML only tags.
C. Object-oriented structure of rows and columns.
D. Processing instructions (PIs) for text data.
E. Textual data with tags for visual presentation.
2. Which of these XML fragments is not well-formed?
A. XML
B. XML
C.
D. XMLXML
E. XML
-
Checkpoint Questions (2 of 3)
3. XML Comments are allowed (Select all that apply):
A. Before the XML Declaration
B. Anywhere
C. Between element tags
D. Before the root element
E. All of the Above
4. Which of these XML elements with attributes is not well-formed?
A.
B.
C.
D.
E.
F. All of the Above
-
Checkpoint Questions (3 of 3)
5. Which of these comments regarding HTML and XML is not true?
A. HTML markup is focused on presentation.
B. XML markup is based on defining the data.
C. XML is based on HTML.
D. HTML tags are not case sensitive.
E. XML tags are case sensitive.
F. Both XML and HTML support attributes.
-
Unit Summary
Having completed this unit, you should be able to:
Describe the basic rules of XML
Describe what it means for an XML document to be well-formed
List the components that make up an XML document
Describe the differences between XML and HTML
Describe the internationalization support in XML
Describe some best practices in XML