intro to xml in libraries

Intro to XML in libraries Kyle Banerjee [email protected]

Upload: kyle-banerjee

Post on 29-Jun-2015


DESCRIPTION

Explanation of XML, how it is processed, and common examples of its application in libraries

TRANSCRIPT

Page 1: Intro to XML in libraries

Intro to XML in libraries

Kyle Banerjee

[email protected]

Page 2: Intro to XML in libraries

Why do libraries use XML?

• Easy to share information

• Strict syntax and human readability make it easy to work with

• Create any structure you need

• Many tools for all operating systems

• Schema support

• Namespace support


Page 3: Intro to XML in libraries

Disadvantages

• Requires an external application

• Verbose

• Inefficient

• Picky – everything stops when data is not well formed

• No intrinsic data types
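Two of these points ("picky" and "no intrinsic data types") are easy to see in practice. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name try_parse and the <copies> element are made up for illustration and are not from the slides.

```python
import xml.etree.ElementTree as ET

well_formed = "<inventory><book><title>My Dog</title></book></inventory>"
broken = "<inventory><book><title>My Dog</book></inventory>"  # </title> is missing


def try_parse(text):
    """Return the root element, or None if the text is not well formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # The parser gives up at the first well-formedness error
        print(f"Parser stopped: {err}")
        return None


root = try_parse(well_formed)
print(root.find("book/title").text)

# One missing end tag is enough to reject the whole document
print(try_parse(broken))

# No intrinsic data types: element content always arrives as a string,
# so the application has to convert it (hypothetical <copies> element)
copies = ET.fromstring("<copies>3</copies>").text
print(type(copies).__name__, int(copies) + 1)
```

Schemas can declare that a value should be a number or a date, but the application still has to do the conversion itself.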


Page 4: Intro to XML in libraries

Encoded Archival Description (EAD)


Page 5: Intro to XML in libraries

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
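OAI-PMH requests are ordinary HTTP requests whose parameters include a verb, and the responses come back as XML. As a rough sketch of what a harvester sends, the snippet below builds a ListRecords request URL; the base URL and the function name build_oai_request are hypothetical placeholders, not a real repository or library API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only
BASE_URL = "https://repository.example.org/oai"


def build_oai_request(verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb, **arguments}
    return f"{BASE_URL}?{urlencode(params)}"


# Ask for all records in unqualified Dublin Core
url = build_oai_request("ListRecords", metadataPrefix="oai_dc")
print(url)
```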


Page 6: Intro to XML in libraries

NISO Circulation Interchange Protocol (NCIP)


<!DOCTYPE NCIPMessage PUBLIC "-//NISO//NCIP DTD Version 1.0//EN" "http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
<NCIPMessage version="http://www.niso.org/ncip/v1_0/imp1/dtd/ncip_v1_0.dtd">
  <LookupUserResponse>
    <ResponseHeader>
      <FromAgencyId>
        <UniqueAgencyId>
          <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
          <Value>zv229</Value>
        </UniqueAgencyId>
      </FromAgencyId>
      <ToAgencyId>
        <UniqueAgencyId>
          <Scheme>http://136.181.125.166:6601/IRCIRCD?target=get_scheme_values&amp;scheme=UniqueAgencyId</Scheme>
          <Value>melir</Value>
        </UniqueAgencyId>
      </ToAgencyId>
    </ResponseHeader>

… [rest of entry deleted]

Page 7: Intro to XML in libraries

MARCXML

<record xmlns="http://www.loc.gov/MARC21/slim">

<leader>00000cas a2200000 4500</leader>

<controlfield tag="001">1798471</controlfield>

<controlfield tag="008">750909d19722001sw qx p ob 0 a0eng</controlfield>

<datafield ind1=" " ind2=" " tag="010">

<subfield code="a">75640778</subfield>

</datafield>

<datafield ind1=" " ind2=" " tag="022">

<subfield code="a">0105-0397</subfield>

<subfield code="l">0105-0397</subfield>

<subfield code="2">1</subfield>

</datafield>

… [rest of record deleted]
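Because MARCXML puts every element in the http://www.loc.gov/MARC21/slim namespace, lookups have to name that namespace. The sketch below uses Python's standard xml.etree.ElementTree on an abbreviated copy of the record above (only fields shown on the slide, closed so that it parses); the helper name subfield and the prefix marc are illustrative choices, not part of any MARC library.

```python
import xml.etree.ElementTree as ET

# Prefix chosen for this example; only the namespace URI has to match
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated record: a few of the fields shown on the slide
record_xml = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000cas a2200000 4500</leader>
  <controlfield tag="001">1798471</controlfield>
  <datafield ind1=" " ind2=" " tag="022">
    <subfield code="a">0105-0397</subfield>
    <subfield code="l">0105-0397</subfield>
  </datafield>
</record>"""

record = ET.fromstring(record_xml)


def subfield(record, tag, code):
    """Return the first matching subfield value, or None if absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text if node is not None else None


control_number = record.find("marc:controlfield[@tag='001']", MARC_NS).text
print(control_number)
print(subfield(record, "022", "a"))  # ISSN
print(subfield(record, "245", "a"))  # not in this abbreviated record
```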

Page 8: Intro to XML in libraries

Dublin Core (DC)

<qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://epubs.cclrc.ac.uk/xmlns/qdc/ http://epubs.cclrc.ac.uk/xsd/qdc.xsd">

<dc:creator>Huntington, C. L.</dc:creator>

<dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>

<dc:date>1908-00-00</dc:date>

<dc:date>1900-1909</dc:date>

<dc:subject>Railroad tracks; Forests; Railroad locomotives</dc:subject>

<dc:coverage>Josephine County (Ore.)</dc:coverage>

<dc:type>Image</dc:type>

<dc:source>Postcards</dc:source>

<dc:source>Gerald W. Williams Collection</dc:source>

<dc:title>Umpqua Album</dc:title>

<dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>

… [rest of record deleted]
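This record mixes elements from several namespaces (qdc, dc, dcterms), which is the namespace support mentioned earlier. As a small sketch, again with Python's standard xml.etree.ElementTree, the snippet below reads repeated dc:title elements and splits the semicolon-delimited subject string from an abbreviated copy of the record; the abbreviation and the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Abbreviated copy of the record on the slide, closed so that it parses
dc_xml = """<qdc:qualifieddc xmlns:qdc="http://epubs.cclrc.ac.uk/xmlns/qdc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>Huntington, C. L.</dc:creator>
  <dc:title>Horseshoe Bend near Wolf Creek, Southern Pacific Railroad, Shasta Route</dc:title>
  <dc:subject>Railroad tracks; Forests; Railroad locomotives</dc:subject>
  <dc:title>Umpqua Album</dc:title>
  <dcterms:isPartOf>WilliamsG:Horseshoe Bend</dcterms:isPartOf>
</qdc:qualifieddc>"""

root = ET.fromstring(dc_xml)

# Elements may repeat, so collect every dc:title rather than the first one
titles = [t.text for t in root.findall("dc:title", NS)]

# Several subjects share one element here, so split them in the application
subjects = [s.strip() for s in root.find("dc:subject", NS).text.split(";")]

print(titles)
print(subjects)
print(root.find("dcterms:isPartOf", NS).text)
```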

Page 9: Intro to XML in libraries

Search / Retrieve via URL (SRU)


Page 10: Intro to XML in libraries

And enough other stuff to blow your mind

• RDF

• Darwin Core

• VRA Core

• MODS


• MADS

• PBCore

• Webapps and other cool stuff

Page 11: Intro to XML in libraries

XML is not a language

• It’s a grammar that specifies a structure for exchanging information

• XML cannot do anything by itself

• When most people talk about XML, they are actually referring to a family of related technologies

• Don’t confuse XML (a data structure standard) with content standards such as AACR2R/RDA, DACS, LCNAF, LCSH, MeSH, and AAT


Page 12: Intro to XML in libraries

Interpreting XML

• Common methods are Document Object Model (DOM) and Simple API for XML (SAX)

• DOM is more common and far more powerful. Best for smaller files and documents

• SAX is much faster and requires much less memory. Best for large files


Page 13: Intro to XML in libraries

DOM vs. SAX

XML Document

<?xml version="1.0"?>
<inventory>
  <book>
    <title>My Dog</title>
  </book>
  <book>
    <title>My Cat</title>
  </book>
</inventory>

DOM (tree structure)

inventory
  book
    title
      My Dog
  book
    title
      My Cat

SAX (linear events)

Start document
Start element: inventory
Start element: book
Start element: title
Characters: My Dog
End element: title
End element: book
Start element: book
Start element: title
Characters: My Cat
End element: title
End element: book
End element: inventory
End document

Page 14: Intro to XML in libraries

DOM basics

• Platform independent way to represent and interact with XML documents

• All nodes and relationships are accessible

• Great for generating and displaying documents (e.g. EAD), interpreting messages (e.g. NCIP, OAI-PMH)

• Must load entire document into memory – terrible for transferring millions of records
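As an illustration of these points, here is a minimal sketch using xml.dom.minidom from the Python standard library on the small inventory document. It loads the whole document, reads values, follows a parent relationship, and adds a node; the added "My Fish" title is invented for the example.

```python
from xml.dom import minidom

xml_text = """<?xml version="1.0"?>
<inventory>
  <book><title>My Dog</title></book>
  <book><title>My Cat</title></book>
</inventory>"""

# The entire document is parsed into an in-memory tree
doc = minidom.parseString(xml_text)
inventory = doc.documentElement

# Any node can be reached from any other node
titles = [t.firstChild.data for t in inventory.getElementsByTagName("title")]
first_title = inventory.getElementsByTagName("title")[0]
parent = first_title.parentNode.tagName

# The tree can also be modified and written back out
new_book = doc.createElement("book")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("My Fish"))  # made-up example title
new_book.appendChild(new_title)
inventory.appendChild(new_book)

print(titles)
print(parent)
print(inventory.toxml())
```

Holding the full tree in memory is what makes this convenient for a single finding aid or protocol message and impractical for a file of millions of records.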


Page 15: Intro to XML in libraries

SAX (Simple API for XML)

• Not formally defined by a standards body; it is a de facto standard

• Relies on events – detects beginnings/ends of elements, attributes, etc.

• Does not require loading file into memory

• Great for extracting info from large files but awkward for interpreting documents
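The event-driven model can be sketched with Python's standard xml.sax module. The handler below only records the events it is sent for the small inventory document; the class name EventLogger is an illustrative choice.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event as a line of text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("Start document")

    def startElement(self, name, attrs):
        self.events.append(f"Start element: {name}")

    def characters(self, content):
        # Skip whitespace-only text between elements
        if content.strip():
            self.events.append(f"Characters: {content.strip()}")

    def endElement(self, name):
        self.events.append(f"End element: {name}")

    def endDocument(self):
        self.events.append("End document")


xml_bytes = (
    b"<inventory><book><title>My Dog</title></book>"
    b"<book><title>My Cat</title></book></inventory>"
)

handler = EventLogger()
xml.sax.parseString(xml_bytes, handler)
print("\n".join(handler.events))
```

The parser never builds a tree, so the handler has to keep track of any context it needs (for example, which element it is currently inside). That is why SAX suits extraction from large files better than interpreting whole documents.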


Page 16: Intro to XML in libraries

Common Alternatives to XML

XML Document

<?xml version="1.0"?>
<inventory>
  <book>
    <title>My Dog</title>
  </book>
  <book>
    <title>My Cat</title>
  </book>
</inventory>

JSON

{"inventory": {
  "book": [
    {"title": "My Dog"},
    {"title": "My Cat"}
  ]
}}

Delimited

Inventory

Item type    Title
book         My Dog
book         My Cat

Page 17: Intro to XML in libraries

Why Delimited or JSON?

• Delimited
  – Easiest to parse
  – Works great with tabular data
  – Not good for arbitrary and nested structures

• JSON
  – Much simpler and easier to use
  – Bad for situations where markup languages are appropriate (e.g. documents)
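To give a sense of how little code these formats need, here is a short sketch that reads the inventory example as JSON and as tab-delimited text with Python's standard json and csv modules. The sample strings are assumptions made for the example; the books are held in a JSON array because Python's json module keeps only the last value when a key is repeated in one object.

```python
import csv
import io
import json

json_text = '{"inventory": {"book": [{"title": "My Dog"}, {"title": "My Cat"}]}}'
delimited_text = "Item type\tTitle\nbook\tMy Dog\nbook\tMy Cat\n"

# JSON maps directly onto dictionaries and lists
json_titles = [b["title"] for b in json.loads(json_text)["inventory"]["book"]]

# Delimited data maps onto rows; the header line supplies the field names
rows = list(csv.DictReader(io.StringIO(delimited_text), delimiter="\t"))
delimited_titles = [r["Title"] for r in rows]

# Repeating a key loses data: only the last "book" survives
duplicate = json.loads('{"book": {"title": "My Dog"}, "book": {"title": "My Cat"}}')

print(json_titles)
print(delimited_titles)
print(duplicate)
```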


Page 18: Intro to XML in libraries

XML = Data Duct Tape

• Very useful and is here to stay

• Best uses are documents, messaging, and data transport

• Can be used for almost anything but sometimes not a good choice


Page 19: Intro to XML in libraries

XML and Life after MARC

• Use of XML will expand as the role of the traditional catalog wanes

• Expect growth as libraries need to provide access to a greater variety of resources

• XML will be critical as linked data becomes more common


Page 20: Intro to XML in libraries

What You Should Do Now

• Be aware of what XML is

• Know what it is good for

• Learn specifics on an as-needed basis


Page 21: Intro to XML in libraries

Thank You!

Kyle Banerjee

[email protected]