Components of an XML Document


Upload: anishnirmal

Post on 11-Apr-2015


Page 1: Components of an XML Document

Components of an XML Document

Page 2: Components of an XML Document

Definition | Description

Elements | What XML elements are and requirements for working with them in XML documents.

Prolog | Outlines the order and contents of the initial prolog or XML document header in an XML document.

XML Declaration | Explains what the XML declaration is and its required placement if included in XML documents.

Processing Instructions | What processing instructions are in XML documents and their most frequent use, as a means of linking to an XML style sheet in the prolog of an XML document.

DOCTYPE Declaration | What the DOCTYPE declaration is and how it is used to reference an external or internal Document Type Definition (DTD) for XML documents that include it.

XML Comments | Explains how comments can be made in XML markup as a means of annotating and as a mechanism for including unparsed content in the XML document.

Textual Content | Outlines the rules for use and inclusion of textual content (also known as character data) in XML documents.

Character and Entity References | Describes XML character entities for escaping special or reserved characters that are used to delineate markup and node boundaries within the XML document.

CDATA Sections | Describes the use of the XML-specific CDATA (character data) sections for fully escaping text contents (including formatting or white space contents) in XML documents.

Attributes | What XML attributes are and requirements for working with them in XML elements.

White Space | The rules and options for how white space can be handled when parsing XML documents.

Page 3: Components of an XML Document

Elements

► Element Names: Element names are case-sensitive and must start with a letter or underscore.

► Start Tags, End Tags, and Empty Tags: An empty-element tag with attributes has the form <elementName att1Name="att1Value" att2Name="att2Value".../>. An element with no content can be written as <giggle></giggle> or <giggle/>.
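The naming and tag rules above can be checked with a small sketch using Python's standard xml.etree.ElementTree module. The sample markup (Book, title, Title) and the helper is_well_formed are invented for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <title/> and <Title></Title> are two different
# element names because XML names are case-sensitive.
root = ET.fromstring("<Book><title/><Title></Title></Book>")
names = [child.tag for child in root]

# Both spellings of an empty element carry no text and no children.
both_empty = all((child.text or "") == "" and len(child) == 0 for child in root)


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup (illustrative helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An end tag must match its start tag exactly, including case.
mismatched_ok = is_well_formed("<giggle></Giggle>")
print(names, both_empty, mismatched_ok)
```

The mismatched pair is rejected because giggle and Giggle count as different names.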

Page 4: Components of an XML Document

Prolog

► The prolog refers to the information that appears before the start tag of the document or root element. It includes information that applies to the document as a whole, such as character encoding, document structure, and style sheets.

Page 5: Components of an XML Document

► <?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<!--catalog last updated 2000-11-01-->

► <?xml-stylesheet type="text/xsl" href="show_book.xsl"?>

► <!--catalog last updated 2000-11-01-->
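As a rough sketch, the prolog shown above can be inspected with Python's standard xml.dom.minidom. The empty <catalog/> root element is an assumption added so that the sample is a complete document. This parser is non-validating, so catalog.dtd is not expected to be read.

```python
from xml.dom import minidom, Node

# Prolog from the slide, followed by an assumed empty root element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<!--catalog last updated 2000-11-01-->
<catalog/>
"""

doc = minidom.parseString(SAMPLE)

# The XML declaration is not exposed as a node; the other prolog items
# appear as children of the document, in order, before the root element.
kinds = [node.nodeType for node in doc.childNodes]
expected = [
    Node.PROCESSING_INSTRUCTION_NODE,
    Node.DOCUMENT_TYPE_NODE,
    Node.COMMENT_NODE,
    Node.ELEMENT_NODE,
]
print(kinds == expected)
```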

Page 6: Components of an XML Document

XML Declaration

► The version number, <?xml version="1.0"?>.

► The encoding declaration, <?xml version="1.0" encoding="UTF-8"?>.

► An XML declaration can also contain a standalone declaration, for example, <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
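One way to see the three parts of the declaration is to register a declaration handler on Python's standard expat parser. This is only a sketch: the function name read_declaration and the <doc/> root element are assumptions made for the example.

```python
from xml.parsers import expat


def read_declaration(data: bytes) -> dict:
    """Collect what the XML declaration states (illustrative helper)."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted).
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found


full = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>'
)
minimal = read_declaration(b'<?xml version="1.0"?><doc/>')
print(full, minimal)
```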

Page 7: Components of an XML Document

Processing Instructions

► Processing instructions can be used to pass information to applications in a way that escapes most XML rules. Processing instructions do not have to follow much internal syntax, can include markup characters without escaping them, and can appear anywhere in the document outside of other markup. They can appear in the prolog, including the document type definition (DTD), in textual content, or after the document. Their appearance is not noted by schema or DTD processors.

Page 8: Components of an XML Document

► The following is an xml-stylesheet processing instruction identifying a style sheet built using Cascading Style Sheets (CSS).

<?xml-stylesheet href="/style.css" type="text/css" title="default stylesheet"?>

► The following is an xml-stylesheet processing instruction identifying a style sheet built using Extensible Stylesheet Language (XSL).

<?xml-stylesheet href="/style.xsl" type="text/xsl" title="default stylesheet"?>
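Because a processing instruction has little internal syntax, a parser hands its content to the application as one raw string. The sketch below, using Python's standard xml.dom.minidom, pulls the href, type, and title pairs out of that string with a simple regular expression. The helper name stylesheet_instructions, the pattern, and the <doc/> root element are assumptions for illustration.

```python
import re
from xml.dom import minidom, Node

SAMPLE = (
    b'<?xml-stylesheet href="/style.css" type="text/css" '
    b'title="default stylesheet"?><doc/>'
)


def stylesheet_instructions(data: bytes) -> list:
    """Return one dict of name/value pairs per xml-stylesheet instruction."""
    doc = minidom.parseString(data)
    found = []
    for node in doc.childNodes:
        is_pi = node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-stylesheet":
            # node.data is the raw text after the target; the pairs only
            # look like attributes, so they are split by hand here.
            found.append(dict(re.findall(r'([\w-]+)="([^"]*)"', node.data)))
    return found


sheets = stylesheet_instructions(SAMPLE)
print(sheets)
```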

Page 9: Components of an XML Document

DOCTYPE Declaration

► A DOCTYPE declaration can contain:

The name of the document or root element. This is required if the DOCTYPE declaration is used.

System and public identifiers for the DTD that can be used to validate the document structure. If a public identifier is used, a system identifier must also be present.

An internal subset of DTD declarations. The internal subset appears between square brackets ([ ]).

Page 10: Components of an XML Document

► <!DOCTYPE rootElement PUBLIC "PublicIdentifier" "URIreference"[declarations]>

► The PublicIdentifier provides a separate identifier that some XML parsers can use to reference the DTD in place of the URIreference. This is useful if the parser is used on a system without a network connection or where that connection would slow down processing significantly.
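A minimal sketch of reading the parts of a DOCTYPE declaration with Python's standard xml.dom.minidom follows. The public identifier "-//Example//DTD Catalog//EN", the system identifier "catalog.dtd", and the <catalog/> root are hypothetical values chosen for the example.

```python
from xml.dom import minidom

# Hypothetical declaration with both a public and a system identifier.
SAMPLE = b"""<!DOCTYPE catalog PUBLIC "-//Example//DTD Catalog//EN" "catalog.dtd">
<catalog/>
"""

doctype = minidom.parseString(SAMPLE).doctype

# name is the root element name; publicId and systemId are the two
# identifiers in the order they appear in the declaration.
summary = (doctype.name, doctype.publicId, doctype.systemId)
print(summary)
```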

Page 11: Components of an XML Document

XML Comments

► Content that is not intended for the XML parser, such as notes about document structure or editing, can be included in a comment. Comments begin with <!-- and end with -->.

► <!--catalog last updated 2000-11-01-->

► <!--- <test pattern="SECAM" /><test pattern="NTSC" /> -->
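The effect of commenting out markup can be sketched with Python's standard xml.dom.minidom: elements inside the comment are not parsed as elements. The surrounding <catalog> element and the PAL entry are assumptions added to make a complete document.

```python
from xml.dom import minidom, Node

# The two commented-out <test> elements come from the slide; the live
# PAL element and the <catalog> wrapper are hypothetical.
SAMPLE = b"""<catalog>
<!-- <test pattern="SECAM" /><test pattern="NTSC" /> -->
<test pattern="PAL" />
</catalog>"""

doc = minidom.parseString(SAMPLE)

# Only markup outside the comment becomes element nodes.
live_patterns = [
    element.getAttribute("pattern")
    for element in doc.getElementsByTagName("test")
]

# The comment text is kept as plain data, markup characters included.
comments = [
    node.data
    for node in doc.documentElement.childNodes
    if node.nodeType == Node.COMMENT_NODE
]
print(live_patterns, comments)
```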

Page 12: Components of an XML Document

Textual Content

► Because XML supports the Unicode character set, it accepts a wide range of characters, including letters, digits, punctuation, and symbols. Most control characters and Unicode compatibility characters are not allowed. Because XML relies on <, >, and & to delimit markup, represent these characters using character and entity references or CDATA sections.

Page 13: Components of an XML Document

Character and Entity References

Character and entity references are useful in the following cases:

► Characters cannot be entered directly into a document because they would be interpreted as markup.

► Characters cannot be entered directly into a document because of input device limitations.

► Characters cannot be transported reliably through a processor limited to one-byte characters.

► A character string or document fragment appears repeatedly and can be

Page 14: Components of an XML Document

Entity | Reference | Character

lt | &lt; | < (less than)

gt | &gt; | > (greater than)

amp | &amp; | & (ampersand)

apos | &apos; | ' (apostrophe or single quote)

quot | &quot; | " (double quote)

► To write Me&You, for example, use Me&amp;You.

► For a<b, use a&lt;b.

► For b>c, use b&gt;c.

► &apos; is not recognized in HTML; a numeric character reference (&#....) must be used when transforming to HTML.
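The substitutions above can be reproduced with the escape function from Python's standard xml.sax.saxutils module, which handles &, <, and > by default. The extra QUOTES mapping and the helper name to_xml_text are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() covers &, <, and > on its own; the two quote characters are
# added through an extra mapping (an assumption for this sketch).
QUOTES = {"'": "&apos;", '"': "&quot;"}


def to_xml_text(raw: str) -> str:
    """Replace the five reserved characters with their entity references."""
    return escape(raw, QUOTES)


escaped = [to_xml_text(sample) for sample in ("Me&You", "a<b", "b>c")]

# A parser turns the references back into the original characters.
round_trip = ET.fromstring("<t>" + to_xml_text("Me&You") + "</t>").text
print(escaped, round_trip)
```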

Page 15: Components of an XML Document

CDATA Sections

► <![CDATA[An in-depth look at creating applications with XML, using <, >,]]>

► <![CDATA[if (c<10)]]>

Note: Content within CDATA sections must be within the range of characters permitted for XML content; control characters and compatibility characters cannot be escaped this way. In addition, the sequence ]]> cannot appear within a CDATA section because this sequence signals the end of the section. This means that CDATA sections cannot be nested. The sequence also appears in some scripts. Within scripts, it is usually possible to substitute ] ]> for ]]>.
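The sketch below wraps arbitrary text in CDATA using Python's standard xml.etree.ElementTree to read it back. Note that it does not use the ] ]> substitution mentioned in the note. Instead it uses a different, commonly seen technique: splitting the text into two adjacent CDATA sections wherever ]]> occurs, which keeps the text unchanged. The helper name wrap_cdata, the <code> element, and the script fragment are assumptions.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting the section at every ]]> sequence."""
    # "]]>" becomes "]]" at the end of one section and ">" at the start
    # of the next, so the terminator never appears inside a section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical script fragment that contains both < and ]]>.
script = "if (a[b[0]]>c && c<10)"
document = "<code>" + wrap_cdata(script) + "</code>"

# The parser joins the adjacent sections back into one text value.
parsed = ET.fromstring(document).text
print(parsed == script)
```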

Page 16: Components of an XML Document

Attributes

► Attributes allow you to add information about an element using name-value pairs. Attributes are often used to define properties of elements that are not considered the content of the element, though in some cases (for example, the HTML img element) the content of the element is determined by attribute values.

Page 17: Components of an XML Document

► <elementName att1Name="att1Value" att2Name="att2Value".../>

► <myElement question="They asked &quot;Why?&quot;" />

► <myElement contraction="isn't" question='They asked "Why?"' />
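The two quoting styles above yield the same attribute value once parsed, as this short sketch with Python's standard xml.etree.ElementTree suggests. The variable names are chosen for the example.

```python
import xml.etree.ElementTree as ET

# Double-quoted value with the inner quotes written as &quot; references.
escaped = ET.fromstring('<myElement question="They asked &quot;Why?&quot;" />')

# Single-quoted value with literal double quotes inside, plus a
# double-quoted value that contains an apostrophe.
mixed = ET.fromstring(
    """<myElement contraction="isn't" question='They asked "Why?"' />"""
)

same_question = escaped.get("question") == mixed.get("question")
print(same_question, mixed.get("contraction"))
```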

Page 18: Components of an XML Document

White Space

► White Space and the XML Declaration: According to the current XML 1.0 standard, white space is not allowed before the XML declaration.

<?xml version="1.0"?>
<BOOK>
<BOOKNAME>XML</BOOKNAME>
</BOOK>
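A quick way to observe this rule is to parse the same document with and without a leading space, here with Python's standard xml.etree.ElementTree. The helper name parses is an assumption for the sketch.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the document (illustrative helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


good = b'<?xml version="1.0"?>\n<BOOK>\n<BOOKNAME>XML</BOOKNAME>\n</BOOK>'

# The same document with a single space before the XML declaration.
bad = b" " + good

print(parses(good), parses(bad))
```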

Page 19: Components of an XML Document

► White Space in Element Content: XML parsers are required to report all white space that appears in element content within a document. For this reason, the following three documents are different to an XML parser.

► <document>
<data>1</data>
<data>2</data>
<data>3</data>
</document>

► <document><data>1</data><data>2</data><data>3</data></document>

► <document><data>1</data> <data>2</data> <data>3</data></document>
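The difference between the three documents can be made visible with Python's standard xml.etree.ElementTree, which stores the white space between child elements in the text and tail attributes. This is a sketch; the helper names whitespace_profile and data_values are assumptions.

```python
import xml.etree.ElementTree as ET

DOCS = [
    "<document>\n<data>1</data>\n<data>2</data>\n<data>3</data>\n</document>",
    "<document><data>1</data><data>2</data><data>3</data></document>",
    "<document><data>1</data> <data>2</data> <data>3</data></document>",
]


def whitespace_profile(markup: str) -> list:
    """Text before the first child, then the text after each child."""
    root = ET.fromstring(markup)
    return [root.text] + [child.tail for child in root]


def data_values(markup: str) -> list:
    """The content of each <data> element, ignoring what lies between."""
    return [child.text for child in ET.fromstring(markup)]


profiles = [whitespace_profile(doc) for doc in DOCS]
values = [data_values(doc) for doc in DOCS]
print(profiles, values)
```

The data values are identical in all three cases; only the reported white space between them differs.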

Page 20: Components of an XML Document

► White Space in Attributes

<whiteSpaceLoss note1="this is a note." note2="this
is
a
note.">

An XML parser reports both attribute values as this is a note., converting the line breaks to single spaces.
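The attribute example can be tried with Python's standard xml.etree.ElementTree. In this sketch the tag is written as an empty-element tag (ending in />) so that it forms a complete document, which is an assumption beyond the fragment on the slide.

```python
import xml.etree.ElementTree as ET

# note2 contains literal line breaks inside the attribute value.
element = ET.fromstring(
    '<whiteSpaceLoss note1="this is a note." note2="this\nis\na\nnote."/>'
)

note1 = element.get("note1")
note2 = element.get("note2")
print(note1 == note2)
```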

Page 21: Components of an XML Document

► End of Line Handling: XML processors treat the character sequence Carriage Return-Line Feed (CRLF) like single CR or LF characters. All are reported as a single LF character. Applications can save documents using the appropriate line-ending convention.
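End-of-line normalization can be observed by parsing content that mixes all three line-ending styles, sketched here with Python's standard xml.etree.ElementTree. The <lines> element and its content are invented for the example.

```python
import xml.etree.ElementTree as ET

# CRLF, a lone CR, and a lone LF all appear in the raw bytes.
raw = b"<lines>one\r\ntwo\rthree\nfour</lines>"

text = ET.fromstring(raw).text

# After parsing, every line break is reported as a single LF.
print(repr(text))
```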