notes from the library juice academy course, “introduction to xml”: university of florida...
Tech Talk
BIBFRAME Working Group
20 October 2015
Notes from the Library Juice Academy course, “Introduction to XML”
Allison Jai O’Dell | [email protected] || Hikaru Nakano | [email protected]
Douglas Smith | [email protected] || Gerald Langford | [email protected]
Introduction to XML
“The Extensible Markup Language (XML) is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more.” -- http://www.w3.org/standards/xml/core
Basic XML Structure: Elements and Attributes
Element: an element consists of an opening tag, a closing tag, and anything in between them: text content, other elements, or both in any combination. Any attributes are written inside the opening tag.
Example: <bunny>Frances</bunny>
Attribute: an attribute provides more information about an element. It is often the case that an attribute contains information that is not really part of the data in the element, but rather helps a user, software, or a processor understand or do something with the element and its data.
Example: <bunny type="mini_lop">Frances</bunny>
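A minimal sketch of how a parser exposes an element and its attribute, assuming Python's standard-library xml.etree.ElementTree module (the course notes do not prescribe a tool, so this choice is only for illustration):

```python
import xml.etree.ElementTree as ET

# Parse the attribute example (attribute value in straight quotes).
bunny = ET.fromstring('<bunny type="mini_lop">Frances</bunny>')

print(bunny.tag)          # the element name
print(bunny.text)         # the text content between the tags
print(bunny.attrib)       # all attributes, as a dictionary
print(bunny.get("type"))  # one attribute value, looked up by name
```

The element's data ("Frances") and the information about the element ("mini_lop") come back through separate accessors, which mirrors the distinction drawn above.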
Well-Formed XML
• XML elements must have an opening and closing tag (an empty element may be written as a single self-closing tag, such as <bunny/>)
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted
• XML documents should have an opening XML declaration
• XML documents must have exactly one root element
When an XML document follows all of the basic syntax rules, it is considered 'well-formed'. A well-formed XML document can be successfully parsed by any XML processor.
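The rules above can be checked mechanically. The following is a rough sketch, assuming Python's standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One good sample, then one sample per broken rule.
samples = {
    "well-formed": '<bunny type="mini_lop">Frances</bunny>',
    "missing closing tag": "<bunny>Frances",
    "case mismatch": "<bunny>Frances</Bunny>",
    "bad nesting": "<bunny><name>Frances</bunny></name>",
    "unquoted attribute": "<bunny type=mini_lop>Frances</bunny>",
    "two root elements": "<bunny>Frances</bunny><bunny>Howard</bunny>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; each of the others breaks one rule and is rejected. A missing XML declaration is not an error, which is why that rule says "should" rather than "must".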
XML Namespaces
Namespaces are defined in a companion standard to the W3C XML standard (see http://www.w3.org/TR/REC-xml-names/).
XML Namespaces provide a way to create uniquely named elements and attributes in an XML document.
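As a sketch of what "uniquely named" means in practice, the example below puts two elements with the same local name in one document. It assumes Python's standard-library xml.etree.ElementTree; the vet namespace URI and the element contents are hypothetical:

```python
import xml.etree.ElementTree as ET

# Two <notes> elements with the same local name, from different vocabularies.
doc = """
<bunny xmlns:food="http://www.example.com/food"
       xmlns:vet="http://www.example.com/vet">
  <food:notes>loves parsley</food:notes>
  <vet:notes>annual checkup due</vet:notes>
</bunny>
"""

root = ET.fromstring(doc)

# ElementTree expands each prefix to its URI, written as {uri}localname.
for child in root:
    print(child.tag)

# Map prefixes to URIs so the elements can be looked up separately.
ns = {
    "food": "http://www.example.com/food",
    "vet": "http://www.example.com/vet",
}
food_notes = root.find("food:notes", ns).text
vet_notes = root.find("vet:notes", ns).text
print(food_notes)
print(vet_notes)
```

The prefix is only shorthand. It is the namespace URI that makes the two names distinct, so a different document could bind another prefix to the same URI and mean the same element.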
XML Character Encoding
UTF-8 is the default character encoding for XML, which means that the full Unicode character set is available should you need to represent any language, character, or symbol in your document.
Best practice is to always declare an XML document's character encoding in the XML declaration at the top of the document, for example: <?xml version="1.0" encoding="UTF-8"?>
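A small sketch of why the declared encoding matters, assuming Python's standard-library xml.etree.ElementTree; the name "Françoise" is a made-up value chosen because it contains a non-ASCII character:

```python
import xml.etree.ElementTree as ET

text = "<name>Françoise</name>"

# The same document stored on disk in two different encodings.
utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + text).encode("utf-8")
latin1_bytes = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + text).encode("iso-8859-1")

# The raw bytes differ, but the declaration tells the parser how to decode
# them, so both documents yield the same characters.
name_from_utf8 = ET.fromstring(utf8_bytes).text
name_from_latin1 = ET.fromstring(latin1_bytes).text

print(utf8_bytes == latin1_bytes)
print(name_from_utf8 == name_from_latin1)
```

If the declaration did not match the bytes actually stored, a parser could misread the non-ASCII characters or reject the document.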
XML Example
<?xml version="1.0" encoding="UTF-8"?>
<bunnies xmlns:food="http://www.example.com/food">
<bunny>
<name>Frances</name>
<breed>mini lop</breed>
<gender>female</gender>
<color>white with brown spots</color>
<birth>January 10, 2009</birth>
<food:fave>strawberries, parsley,
cilantro, carrots</food:fave>
</bunny>
<bunny status="RBB">
<name>Howard</name>
<breed>mixed, dwarf</breed>
<gender>male</gender>
<color>light brown agouti</color>
<birth>March 15, 2009</birth>
<death>September 1, 2012</death>
</bunny>
</bunnies>
The example above illustrates:
• opening and closing tag
• case sensitive
• properly nested
• quoted attribute values
• opening XML declaration
• character encoding
• root element
• namespace declaration
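To show how a program might read such a document, here is a sketch that parses an abridged copy of the bunnies example. It assumes Python's standard-library xml.etree.ElementTree, and the summary list is only an illustrative way to collect the results:

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example document (some child elements omitted).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<bunnies xmlns:food="http://www.example.com/food">
  <bunny>
    <name>Frances</name>
    <breed>mini lop</breed>
    <food:fave>strawberries, parsley, cilantro, carrots</food:fave>
  </bunny>
  <bunny status="RBB">
    <name>Howard</name>
    <breed>mixed, dwarf</breed>
    <death>September 1, 2012</death>
  </bunny>
</bunnies>"""

root = ET.fromstring(sample)                  # root is the <bunnies> element
ns = {"food": "http://www.example.com/food"}  # prefix-to-URI map for lookups

summary = []
for bunny in root.findall("bunny"):
    name = bunny.findtext("name")
    status = bunny.get("status")  # None when the attribute is absent
    fave = bunny.findtext("food:fave", default=None, namespaces=ns)
    summary.append((name, status, fave))
    print(name, status, fave)
```

Optional pieces such as the status attribute or the food:fave element simply come back as None when a bunny does not have them.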
XML Entity References
In the XML standard, two characters are reserved and cannot appear literally in text content or attribute values; they must be written as entity references instead: < &
Three additional characters have a special meaning, and it is considered best practice to escape them as well: > ' "
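A minimal sketch of replacing these characters with their entity references, assuming Python's standard-library xml.sax.saxutils helpers; the sample string is made up:

```python
from xml.sax.saxutils import escape, unescape

raw = 'carrots & parsley < 5 "treats"'

# By default, escape() replaces &, < and >.
escaped = escape(raw)
print(escaped)

# Quotation marks and apostrophes are replaced only if an extra mapping is given.
escaped_all = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped_all)

# unescape() reverses the process, given the matching reverse mapping.
restored = unescape(escaped_all, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)
```

Most XML libraries perform this escaping automatically when they write a document, so manual escaping is mainly needed when XML is assembled by hand as plain text.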
Entity references for special characters:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
CDATA
CDATA is an abbreviation of the term 'Character Data'. In XML, a CDATA section is a way to 'escape' a block of text that might contain markup or special characters that you don't want the XML processor to parse.
In other words, it tells the XML processor not to interpret anything in the block as markup and to keep the text exactly as it is. The one sequence a CDATA section cannot contain is ]]>, which marks its end.
Example:
<![CDATA[
I can use characters like > < " and & or write things like
<foo></bar> but my document is still well formed!
]]>
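A short sketch of how a parser treats this block, assuming Python's standard-library xml.etree.ElementTree; the surrounding <note> element is a hypothetical wrapper, since a CDATA section has to sit inside an element:

```python
import xml.etree.ElementTree as ET

doc = """<note><![CDATA[I can use characters like > < " and & or write things like
<foo></bar> but my document is still well formed!]]></note>"""

note = ET.fromstring(doc)

# The CDATA content arrives as ordinary text, markup-like characters included.
print(note.text)

# Nothing inside the block was parsed as an element.
print(len(note))
```

The mismatched <foo></bar> pair would make the document fail to parse outside a CDATA section. Note that ElementTree does not keep the CDATA wrapper when writing the document back out; it emits the same text using entity references instead.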