
Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships

Steven J. Baskauf – Vanderbilt University

Thomas Sasek - University of Louisiana at Monroe

GUIDs

Good for what ails you

Globally Unique Identifiers (GUIDs), a.k.a. Persistent Identifiers

Properties of GUIDs:
1. Globally unique (no two alike!)
2. Persistent (lasts forever!)
3. Actionable (explains itself to you and web crawlers on demand!)

. = technical detail warning

Identifiers that are persistent should be scalable

http://lod.geospecies.org/ses/4XSQO

• This URI could represent a passive file delivery system where ses is the name of a directory on the server and 4XSQO the name of a file in that directory (no illegal file characters).
• ses/4XSQO could also represent an identifier passed to a server-side script that generates a file on the fly from a database.
• In accordance with the principle of REST (representational state transfer), the client (i.e. a user with a web browser) doesn’t need to know how the server produces the file it sends; the method could change over time as needed.
• Other nice things about this style of URI:
– could correspond to a user’s hierarchy (e.g. collectionCode/catalogNumber)
– relatively short
– no characters that need to be escaped in XML
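The URI style described above can be sketched as a small helper that joins hierarchical segments and rejects anything that would need escaping. This is only an illustration: the function name build_uri and the rule "letters, digits, and hyphens only" are assumptions, not something the slides prescribe.

```python
import re

# Hypothetical rule: each path segment uses only letters, digits, and hyphens,
# so nothing needs escaping in a URL, an XML attribute, or a file name.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9-]+")


def build_uri(base, *segments):
    """Join a base URL and hierarchical segments into a short, escape-free URI."""
    for segment in segments:
        if not SAFE_SEGMENT.fullmatch(segment):
            raise ValueError(f"segment needs escaping: {segment!r}")
    return base.rstrip("/") + "/" + "/".join(segments)


# The example URI from the slide, rebuilt from its two segments.
print(build_uri("http://lod.geospecies.org", "ses", "4XSQO"))
```

Because the result is just directory/file, the same string works whether the server maps it to a static file or hands it to a script.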

Thanks for the example, Pete DeVries .

My grant got funded!

Identifiers that are persistent should be able to survive the apocalypse
• Grants end.
• People quit.
• People lose interest.

http://lsid.tdwg.org/urn:lsid:gdb.org:GenomicSegment:GDB132938

My grant ran out.

How can we provide actionability?

“Adoption of Persistent Identifiers for Biodiversity Informatics” GBIF, 2009.

ServerMan

We can do this easily with a mod_rewrite accessing a PHP script that uses our MySQL database!

If this is so easy, why aren’t people using actionable GUIDs with occurrence data???

The Chicken and Egg Problem of Actionability

• Nobody is going to go to the trouble of making their GUIDs actionable if the metadata that the GUIDs return aren’t ever going to be used for anything.

• Nobody is going to build a system that gleans data from actionable GUIDs if there aren’t any GUIDs from which to harvest metadata.

(Just like the early Internet where little content was available for users!)

Economics of investing in GUIDs

• The use of GUIDs for occurrences will increase when the benefits outweigh the costs of implementation.

• If no one uses the metadata from actionable GUIDs, then in order for them to be adopted either:
– the cost of implementation must be very low
– there must be other benefits
– or both!

SERNEC (Southeast Regional Network of Expertise and Collections): Representing herbaria in the Southeast USA

• 125 member herbaria
• 53 survey respondents
• 43% of institutions have negligible to no IT support.
• 40% have web pages (most are rudimentary)
• 3-4 serve data

Data courtesy of Zack Murrell of SERNEC

Economics 101

Databasing technology in SERNEC

• 75% are databasing
• approximately 35% are using Excel or nothing
• Although some are institutions with significant budgets and IT support, some are one-person operations with no budgets and no IT staff

Data courtesy of Zack Murrell of SERNEC

These people don’t need help

These people need a lot of help

Costs:

1. Risk: depending on someone else’s complicated solutions that may result in disaster.

Costs:

2. You may invest time in something that never happens.

Cost:

3. Unavailability of a template for generating RDF/XML

• The TDWG, GBIF, and Linked Data guidelines say we must use Resource Description Framework (RDF) in XML format to describe metadata.

• What is it? RDF describes metadata properties in a way that can be understood by computers.

• It looks like this:
<dcterms:description>Field individual of Arborus rarus</dcterms:description>
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
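To make "understood by computers" concrete, here is a minimal sketch that reads those two properties with Python's standard XML parser. The surrounding rdf:RDF and rdf:Description wrapper and the subject URI http://example.org/individual/1 are placeholders added so the fragment parses; they are not from the slides.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS_NS = "http://purl.org/dc/terms/"

# The two properties from the slide, wrapped in the rdf:RDF and rdf:Description
# elements that a complete file needs. The subject URI is a placeholder.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dcterms="{DCTERMS_NS}">
  <rdf:Description rdf:about="http://example.org/individual/1">
    <dcterms:description>Field individual of Arborus rarus</dcterms:description>
    <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
  </rdf:Description>
</rdf:RDF>"""


def read_properties(rdf_xml):
    """Return (subject, property, value) triples for each described resource."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # A property value is either a linked resource or literal text.
            value = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, prop.tag, value))
    return triples


for triple in read_properties(DOCUMENT):
    print(triple)
```

A real semantic client would use a full RDF parser; the point here is only that each property is machine-readable without any page scraping.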

Summary:

Users having few IT resources need a simple system:
– that requires little or no help to implement
– that can use existing database output
– that requires the least possible maintenance on the server

The cost of complex systems is too high for small users to implement without a very large benefit.

Methods for lowering the cost of implementing actionable GUIDs for small-scale users: RAX and REJAX

Review of Linked Data rules
1. URIs of physical or conceptual (non-information) resources must differ from the URLs of documents that describe them, e.g.:
http://bioimages.vanderbilt.edu/vanderbilt/7-314 is an oak tree
http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf is a metadata file describing the oak tree
2. Content negotiation for actionable non-information resource URIs should produce:
A. a web page for humans to see
B. an RDF/XML file for semantic clients (i.e. computers)

Extensible Stylesheet Language Transformations (XSLT)

RDF/XML metadata in the file 0134.rdf

XSLT stylesheet in the file guid-o-matic.xsl

XHTML web page as seen by a human being

RDF And XSLT (RAX) method

1. Client requests extension-less URI.
2. Server concatenates “.rdf” to the URI.
3. RDF/XML file delivered to client regardless of requested content-type.
4. Web browsers use an XSLT stylesheet to create an XHTML web page for humans from the RDF/XML.
5. Semantic clients just use the RDF/XML.
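The steps above can be sketched as two small functions. In practice a server rewrite rule does this work; the Python below only models the behaviour. The function names are hypothetical, and the xml-stylesheet line is an assumption about how the RDF/XML file points browsers at the stylesheet (the slides name guid-o-matic.xsl but do not show the file header).

```python
# Assumption: each RDF/XML file begins with an xml-stylesheet processing
# instruction naming the stylesheet. This is the standard way to ask a
# browser to apply XSLT, but the slides do not show the file header.
STYLESHEET_PI = '<?xml-stylesheet type="text/xsl" href="guid-o-matic.xsl"?>'


def rax_resolve(request_path, accept_header):
    """Steps 2-3: append ".rdf" and ignore what media type the client asked for."""
    return request_path + ".rdf"


def rax_document(rdf_body):
    """Step 4: prefix the RDF/XML so that browsers transform it into XHTML."""
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + STYLESHEET_PI + "\n" + rdf_body


# Both kinds of client are sent the same file.
for accept in ("application/rdf+xml", "text/html"):
    print(accept, "->", rax_resolve("/specimens/lsu000/0134", accept))
```

Semantic clients simply ignore the stylesheet instruction, which is why one static file can serve both audiences.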

RAX Content Negotiation

Client: “I am a computer. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”

GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: application/rdf+xml

Web server: “I cannot send a specimen!”

Response: the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf

RAX Content Negotiation

Client: “I am a human. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”

GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: text/html

Web server: “Duh, what’s that mean? He gets RDF anyway.”

Response: the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf

What the web browser shows: a web page built from the RDF/XML

Static file structure for RAX

The specimen having barcode LSU0000134 is identified by the URI
http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Its RDF-formatted metadata is in the file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
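A minimal sketch of that mapping, assuming the rule is "last four characters become the file name, the rest (lower-cased) becomes the directory". That rule is inferred from the single example on the slide and may not be the exact convention used; barcode_to_uri is a hypothetical name.

```python
BASE = "http://www.cyberfloralouisiana.com/specimens"


def barcode_to_uri(barcode, file_digits=4):
    """Split a barcode into a directory prefix and a file name (assumed rule)."""
    if len(barcode) <= file_digits:
        raise ValueError("barcode too short to split")
    directory = barcode[:-file_digits].lower()
    file_name = barcode[-file_digits:]
    return f"{BASE}/{directory}/{file_name}"


uri = barcode_to_uri("LSU0000134")
print(uri)           # the specimen's URI
print(uri + ".rdf")  # the static metadata file
```

Splitting the number this way keeps any one directory from holding more than ten thousand files.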

Asynchronous JavaScript And XML (AJAX)

RDF/XML metadata in the files vanderbilt/4-145.rdf (the tree), baskauf/79687.rdf (an image), baskauf/79695.rdf (another image), etc.

JavaScript in the file metadata.htm retrieves metadata

XHTML web page created using those metadata as seen by a human being

Redirection, Javascript, and XSLT (REJAX) method

1. Client requests extension-less URI.
2. Server does content negotiation based on requested content-type.
3. Semantic clients are sent the RDF/XML.
4. Web browsers are sent a text/html web page which uses JavaScript (i.e. AJAX) to open RDF/XML files and obtain the metadata required to construct the web page. The JavaScript can also retrieve blocks of XSLT-formatted RDF data.
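The negotiation in steps 2 to 4 can be sketched as a function that picks a redirect target from the media types the client lists. The function name is hypothetical, and the 303 status code is an assumption taken from the "Cool URIs for the Semantic Web" recommendation rather than from the slides.

```python
def rejax_resolve(request_path, accept_header):
    """Pick a redirect target from the media types the client lists.

    Simplification: q-values are ignored; any mention of RDF/XML wins.
    """
    requested = [part.split(";")[0].strip().lower()
                 for part in accept_header.split(",")]
    if "application/rdf+xml" in requested:
        # Semantic clients are redirected to the metadata file.
        return 303, request_path + ".rdf"
    # Everyone else is redirected to the page that loads the JavaScript.
    return 303, request_path + ".htm"


print(rejax_resolve("/vanderbilt/4-145", "application/rdf+xml"))
print(rejax_resolve("/vanderbilt/4-145", "text/html,application/xhtml+xml;q=0.9"))
```

On a real server the same decision would normally be made by a rewrite rule, so no script has to run per request.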

REJAX Content Negotiation

Client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”

GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: application/rdf+xml

Web server: “I cannot send a tree! I’ll send information about the tree.”

Response: the RDF/XML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf

REJAX Content Negotiation

Client: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”

GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: text/html

Web server: “Got it. I’ll send XHTML.”

Response: the XHTML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm

What the web browser shows: a web page created by JavaScript

Static file structure for REJAX

The tree identified by the URI
http://bioimages.vanderbilt.edu/vanderbilt/4-145
has RDF metadata in the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
while the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
passes information to the JavaScript in
http://bioimages.vanderbilt.edu/metadata.htm? vanderbilt/4-145/metadata/ind/etc.

Comparison of RAX and REJAX

Similarities:
• Both use static files.
• Both will work offline with at least some browsers.
• Both require modification of only a single file to change the appearance of the web page.

Differences:
• RAX uses metadata from a single RDF file while REJAX inputs metadata from several RDF files.
• RAX simply displays the metadata for one or more closely related resources while REJAX allows the user to interact with many resources in complex ways.

• RAX and REJAX are not programs or languages.
• They are simple content-negotiation methods that make use of the RDF/XML required by the Linked Data concept to create web pages.

Back to economics… Cost reduction

• Risk is lowered because both methods can operate on a generic web server with no server-side scripting. No maintenance is required once set up (although a minor server rewrite rule is required).

• Little time must be invested – existing database can be used to provide metadata and implementation can be immediate.

• Scalable: URIs are such that static files can be replaced at any time by server-side scripting.

What about the RDF?
RAX (specimen record): single RDF file using hash URIs

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
  … [metadata about the determination] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
  <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen" />
  … [metadata about the specimen] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img">
etc.
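Hash URIs work with a single static file because a client strips the fragment (the part after #) before sending the HTTP request. A small sketch using the four resource URIs from this slide shows that they all lead to one document URL; document_for is a hypothetical helper name.

```python
from urllib.parse import urldefrag

BASE = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428"
RESOURCES = [BASE, BASE + "#ind", BASE + "#39265b", BASE + "#img"]


def document_for(resource_uri):
    """Return the URL a client actually requests for a resource URI."""
    url, _fragment = urldefrag(resource_uri)
    return url


# Four distinct resources, but only one URL ever reaches the server.
documents = {document_for(uri) for uri in RESOURCES}
print(documents)
```

The individual, the determination, the specimen, and the image therefore keep separate identifiers while sharing one metadata file.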

What about the RDF?
REJAX (live plant image records): using multiple RDF files

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314#19287">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
  … [metadata about the determination] …
</rdf:Description>

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/79651">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
  <dwc:basisOfRecord>DigitalStillImage</dwc:basisOfRecord>
  … [metadata about the image] …
</rdf:Description>

The importance of separation of resources in the RDF

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  … [metadata about the specimen] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/12345">
  … [metadata about the image] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>

This file is served from the herbarium’s website

This file is served from the image repository’s website

See Biodiversity Informatics 7:17-44 for much more on this.
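The benefit of giving the individual its own URI can be sketched with a tiny client that reads two files from different servers and groups resources by the individual they document. The wrap helper builds stripped-down stand-ins for the two files on this slide; the grouping function is a hypothetical illustration, not software mentioned in the talk.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC_NS = "http://rs.tdwg.org/dwc/terms/"
IND = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"


def wrap(subject):
    """Build a minimal RDF/XML file in which `subject` points at the individual."""
    return (f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dwc="{DWC_NS}">'
            f'<rdf:Description rdf:about="{subject}">'
            f'<dwc:individualID rdf:resource="{IND}"/>'
            f'</rdf:Description></rdf:RDF>')


# Stand-ins for the herbarium's file and the image repository's file.
HERBARIUM_FILE = wrap("http://www.cyberfloralouisiana.com/specimens/lsu000/0428")
REPOSITORY_FILE = wrap("http://bioimages.vanderbilt.edu/baskauf/12345")


def resources_by_individual(*rdf_files):
    """Group described resources by the individual they document."""
    groups = defaultdict(list)
    for rdf_xml in rdf_files:
        for description in ET.fromstring(rdf_xml).iter(f"{{{RDF_NS}}}Description"):
            link = description.find(f"{{{DWC_NS}}}individualID")
            if link is not None:
                groups[link.get(f"{{{RDF_NS}}}resource")].append(
                    description.get(f"{{{RDF_NS}}}about"))
    return dict(groups)


print(resources_by_individual(HERBARIUM_FILE, REPOSITORY_FILE))
```

Because both records point at the same individual URI, the specimen and the image end up in one group even though neither file mentions the other.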

Guid-O-Matic
1. Create CSV export containing terms that vary among specimens.
2. Download guid-o-matic.exe (200 kB) from http://bioimages.vanderbilt.edu/guid-o-matic (no installation required).
3. Create a directory to hold the RDF files.
4. Enter (one time) the stuff about your institution that doesn’t change.
5. Click this button and poof! the RDF files appear in the directory you created.
6. Re-publish your website using WinSCP or whatever.
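The general idea behind steps 1, 3, and 5 (one static RDF file per CSV row) can be sketched as follows. This is not the Guid-O-Matic implementation, which the slides do not show; the column names id and description, the template, and the example.org base URI are all hypothetical.

```python
import csv
import io
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

# Hypothetical template; a real file would carry many more properties.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="{uri}">
    <dcterms:description>{description}</dcterms:description>
  </rdf:Description>
</rdf:RDF>
"""


def write_rdf_files(csv_text, base_uri, out_dir):
    """Write one static .rdf file per CSV row and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "id" and "description" are hypothetical column names.
        uri = f"{base_uri}/{row['id']}"
        path = out_dir / f"{row['id']}.rdf"
        path.write_text(
            TEMPLATE.format(uri=escape(uri, {'"': "&quot;"}),
                            description=escape(row["description"])),
            encoding="utf-8")
        written.append(path)
    return written


SAMPLE = "id,description\n0134,Specimen of Arborus rarus\n"
with tempfile.TemporaryDirectory() as tmp:
    for path in write_rdf_files(SAMPLE, "http://example.org/specimens/lsu000", tmp):
        print(path.name)
```

The output directory can then be uploaded as-is, since the files need nothing from the server beyond plain file delivery.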

What’s the point???
• Appropriate design of the RDF structure allows for both
– simple methods of generating a representation for humans
– semantic clients drawing correct inferences about the relationships among resources
• The human end user doesn’t care about this and doesn’t have to know about it (they’ll just see the web page).
• The raw data provider shouldn’t have to worry about what RDF is or how to use it (they just need some simple software to map their data correctly!).

Economics: benefits to small users

• Serving the files from the user’s own web server allows the users to brand their GUIDs by including their own domain name rather than that of an external host.

• Clickable attribution on websites
• Reference link in PDF publication citations.
• Instant iPhone “app” to access collection metadata.
• XSLT can easily be modified to meet the needs of the users, e.g. QR codes on displays.

QR code on a museum display

Try these on your portable device (iPhone=yes, others=?)

RAX example: Juncus diffusissimus specimen at the LSU herbarium
http://www.cyberfloralouisiana.com/specimens/lsu000/0428

REJAX example: The “Bicentennial Oak” in Vanderbilt’s arboretum
http://bioimages.vanderbilt.edu/vanderbilt/7-314

Summary
• It is possible for GUIDs of the HTTP URI form to be implemented right now, even by users with very few IT resources.

• Restricting the format of the URIs to a simple structure (no weird characters, short, slashes to indicate hierarchy) prevents dependence on a particular delivery method (you can change your mind later).

• Making HTTP URI GUIDs actionable (i.e. resolvable in XHTML) in a simple way provides immediate benefits to the issuer even if the RDF is never used by a semantic client.

• Making it practical to implement resolvable GUIDs on a large scale increases the likelihood that semantic web-based databases will evolve because the economics are shifted in their favor (a solution to the chicken and egg problem).

References

• Links from Bioimages GUID page http://bioimages.vanderbilt.edu/pages/guid.htm

• TDWG GUID/LSID applicability statement http://www.tdwg.org/stdtrack/article/download/150/51

• Cool URIs don't change (Tim Berners-Lee) http://www.w3.org/Provider/Style/URI

• Cool URIs for the Semantic Web http://www.w3.org/TR/cooluris/

• Recommendations for implementation of GUIDs in the SERNEC collections community http://bioimages.vanderbilt.edu/guid

• Biodiversity Informatics 7:17-44 https://journals.ku.edu/index.php/jbi/article/view/3664

Note: this PowerPoint will be linked from the Bioimages GUID page, http://bioimages.vanderbilt.edu/pages/guid.htm (the QR code at right loads the URL).

