Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships
Steven J. Baskauf – Vanderbilt University
Thomas Sasek - University of Louisiana at Monroe
GUIDs
Good for what ails you
Globally Unique Identifiers (GUIDs), a.k.a. Persistent Identifiers
Properties of GUIDs:
1. Globally unique (no two alike!)
2. Persistent (lasts forever!)
3. Actionable (explains itself to you and web crawlers on demand!)
Identifiers that are persistent should be scalable
http://lod.geospecies.org/ses/4XSQO
• This URI could represent a passive file delivery system in which ses is the name of a directory on the server and 4XSQO is the name of a file in that directory (no illegal file characters).
• ses/4XSQO could also represent an identifier passed to a server-side script that generates a file on the fly from a database.
• In accordance with the principle of REST (representational state transfer), the client (i.e. a user with a web browser) doesn’t need to know how the server produces the file it sends, so the method could change over time as needed.
• Other nice things about this style of URI:
– it could correspond to a user’s hierarchy (e.g. collectionCode/catalogNumber)
– it is relatively short
– it contains no characters that need to be escaped in XML
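The URI style described above can be checked mechanically. The Python sketch below is an illustration and is not part of any SERNEC tool. It assumes that a "simple" path segment contains only letters, digits, '.', '_' and '-', which is one reading of "no illegal file characters" and "no characters that need to be escaped in XML". The document root /var/www in `static_file_for` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed definition of a "simple" segment: letters, digits, '.', '_' and '-'.
# None of these needs escaping in XML or percent-encoding in a URI path,
# and all are legal in file names on common operating systems.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def is_simple_uri_path(path: str) -> bool:
    """True if every '/'-separated segment of the path uses only safe characters."""
    segments = path.strip("/").split("/")
    return all(
        SAFE_SEGMENT.fullmatch(seg) is not None and seg not in (".", "..")
        for seg in segments
    )


def static_file_for(path: str, docroot: str = "/var/www") -> PurePosixPath:
    """Map a URI path such as 'ses/4XSQO' to a file under a hypothetical docroot."""
    if not is_simple_uri_path(path):
        raise ValueError(f"not a simple URI path: {path!r}")
    return PurePosixPath(docroot, *path.strip("/").split("/"))


print(is_simple_uri_path("ses/4XSQO"))  # True
print(static_file_for("ses/4XSQO"))     # /var/www/ses/4XSQO
```

The same path works equally well as a file name or as a database key. A server could therefore switch from static files to a script without changing the URI.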
Thanks for the example, Pete DeVries.
My grant got funded!
Identifiers that are persistent should be able to survive the apocalypse
• Grants end.
• People quit.
• People lose interest.
http://lsid.tdwg.org/urn:lsid:gdb.org:GenomicSegment:GDB132938
My grant ran out.
How can we provide actionability?
“Adoption of Persistent Identifiers for Biodiversity Informatics” (GBIF, 2009)
ServerMan
We can do this easily with a mod_rewrite rule that calls a PHP script, which queries our MySQL database!
If this is so easy, why aren’t people using actionable GUIDs with occurrence data???
The Chicken and Egg Problem of Actionability
• Nobody is going to go to the trouble of making their GUIDs actionable if the metadata that the GUIDs return aren’t ever going to be used for anything.
• Nobody is going to build a system that gleans data from actionable GUIDs if there aren’t any GUIDs from which to harvest metadata.
(Just like the early Internet where little content was available for users!)
Economics of investing in GUIDs
• The use of GUIDs for occurrences will increase when the benefits outweigh the costs of implementation.
• If no one uses the metadata from actionable GUIDs, then for them to be adopted either:
– the cost of implementation must be very low,
– there must be other benefits,
– or both!
SERNEC (Southeast Regional Network of Expertise and Collections): Representing herbaria in the Southeast USA
• 125 member herbaria
• 53 survey respondents
• 43% of institutions have negligible to no IT support.
• 40% have web pages (most are rudimentary).
• 3-4 serve data.
Data courtesy of Zack Murrell of SERNEC
Economics 101
Databasing technology in SERNEC
• 75% are databasing.
• Approximately 35% are using Excel or nothing.
• Although some institutions have significant budgets and IT support, others are one-person operations with no budget and no IT staff.
Data courtesy of Zack Murrell of SERNEC
These people don’t need help
These people need a lot of help
Costs:
1. Risk: depending on someone else’s complicated solutions that may result in disaster.
2. You may invest time in something that never happens.
3. Unavailability of a template for generating RDF/XML
• The TDWG, GBIF, and Linked Data guidelines say we must use Resource Description Framework (RDF) in XML format to describe metadata.
• What is it? RDF describes metadata properties in a way that can be understood by computers.
• It looks like this:
  <dcterms:description>Field individual of Arborus rarus</dcterms:description>
  <dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>
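RDF/XML like this can be produced from a template using Python's standard xml.etree.ElementTree module. The sketch below is minimal. The function name `describe` and the example specimen URI are hypothetical. The two namespaces are the standard RDF and DCMI Terms namespaces.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
# Use the conventional prefixes instead of ElementTree's default ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)


def describe(uri: str, description: str, type_uri: str) -> str:
    """Return an RDF/XML document with one rdf:Description of the resource `uri`."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    # A literal value: the text goes inside the element (escaped automatically).
    ET.SubElement(desc, f"{{{DCTERMS}}}description").text = description
    # A link to another resource: the URI goes in the rdf:resource attribute.
    ET.SubElement(desc, f"{{{DCTERMS}}}type", {f"{{{RDF}}}resource": type_uri})
    return ET.tostring(root, encoding="unicode")


print(describe(
    "http://example.org/specimens/0001",  # hypothetical specimen URI
    "Field individual of Arborus rarus",
    "http://purl.org/dc/dcmitype/PhysicalObject",
))
```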
Summary:
Users having few IT resources need a simple system:
– that requires little or no help to implement
– that can use existing database output
– that requires the least possible maintenance on the server
The cost of complex systems is too high for small users to implement without a very large benefit.
Methods for lowering the cost of implementing actionable GUIDs for small-scale users: RAX and REJAX
Review of Linked Data rules
1. URIs of physical or conceptual (non-information) resources must differ from the URLs of documents that describe them, e.g.:
http://bioimages.vanderbilt.edu/vanderbilt/7-314 is an oak tree
http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf is a metadata file describing the oak tree
2. Content negotiation for actionable non-information resource URIs should produce:
A. a web page for humans to see
B. an RDF/XML file for semantic clients (i.e. computers)
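The two rules can be sketched as a server-side decision. The Cool URIs for the Semantic Web note listed in the references describes one way to handle a request for a non-information resource: reply with a 303 See Other redirect to a document that describes it. This sketch assumes that approach. Its Accept-header parser is deliberately simplified and ignores q-values. The .rdf/.htm naming is borrowed from the bioimages examples, and the function name `negotiate` is hypothetical.

```python
def negotiate(thing_uri: str, accept: str) -> tuple[int, str]:
    """Answer a GET for a non-information resource (a tree, a specimen).

    The thing itself cannot be sent, so the server replies 303 See Other with the
    URL of a document about it (rule 1: that URL differs from the thing's URI).
    Which document is chosen depends on the Accept header (rule 2).
    """
    # Simplified parsing: take media types in the order listed, ignoring q-values.
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    for media_type in media_types:
        if media_type == "application/rdf+xml":
            return 303, thing_uri + ".rdf"  # B. RDF/XML for semantic clients
        if media_type in ("text/html", "application/xhtml+xml"):
            return 303, thing_uri + ".htm"  # A. web page for humans
    return 303, thing_uri + ".htm"  # assumed default when nothing matches


tree = "http://bioimages.vanderbilt.edu/vanderbilt/4-145"
print(negotiate(tree, "application/rdf+xml"))
# (303, 'http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf')
print(negotiate(tree, "text/html,application/xhtml+xml;q=0.9"))
# (303, 'http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm')
```

Ignoring q-values keeps the sketch short. A production server should weigh them when a client lists several acceptable types.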
Extensible Stylesheet Language Transformations (XSLT)
The XSLT stylesheet in the file guid-o-matic.xsl transforms the RDF/XML metadata in the file 0134.rdf into an XHTML web page, as seen by a human being.
RDF And XSLT (RAX) method
1. Client requests an extension-less URI.
2. Server concatenates “.rdf” to the URI.
3. RDF/XML file is delivered to the client regardless of the requested content type (the HTTP Accept header).
4. Web browsers use an XSLT stylesheet to create an XHTML web page for humans from the RDF/XML.
5. Semantic clients just use the RDF/XML.
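The RAX steps above amount to a lookup that ignores the Accept header. In this sketch, `rax_resolve` and `with_stylesheet` are hypothetical names. The second function shows one standard way to tell a browser which XSLT stylesheet to apply: an `xml-stylesheet` processing instruction at the top of the RDF/XML file. Whether the guid-o-matic output does exactly this is an assumption.

```python
def rax_resolve(path: str, accept: str = "") -> str:
    """RAX step 2: append '.rdf' to the extension-less path.

    The Accept header is intentionally ignored (step 3): humans and
    machines receive the same RDF/XML file.
    """
    del accept  # unused: RAX does no content negotiation
    return path.rstrip("/") + ".rdf"


def with_stylesheet(rdf_xml: str, xsl_href: str = "guid-o-matic.xsl") -> str:
    """Prefix RDF/XML with an xml-stylesheet processing instruction (assumed setup).

    Browsers that support XSLT apply the referenced stylesheet and display
    the resulting XHTML (step 4); RDF parsers ignore the instruction and
    read the RDF (step 5).
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n' + rdf_xml
    )


print(rax_resolve("/specimens/lsu000/0134", accept="text/html"))
# /specimens/lsu000/0134.rdf
```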
RAX Content Negotiation (computer client)
Client: “I am a computer. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”
GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: application/rdf+xml
Web server: “I cannot send a specimen!”
Response: the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
RAX Content Negotiation (human client)
Client: “I am a human. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”
GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: text/html
Web server: “Duh, what’s that mean? He gets RDF anyway.”
Response: the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
[Screenshot: what the web browser shows]
Static file structure for RAX
The specimen having barcode LSU0000134 is identified by the URI
http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Its RDF-formatted metadata is in the file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
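The slide gives only one barcode-to-URI example, so the general rule in this sketch is inferred and hypothetical. It assumes every barcode is a three-letter herbarium prefix followed by seven digits. The lowercased prefix plus the first three digits name the directory, and the last four digits name the file.

```python
import re

BASE = "http://www.cyberfloralouisiana.com/specimens/"
# Hypothetical barcode format inferred from one example (LSU0000134):
# three letters, then 3 digits that join the directory name, then 4 digits for the file.
BARCODE = re.compile(r"([A-Za-z]{3})(\d{3})(\d{4})")


def barcode_to_uri(barcode: str) -> str:
    """Return the specimen URI for a barcode, e.g. LSU0000134 -> .../lsu000/0134."""
    match = BARCODE.fullmatch(barcode)
    if match is None:
        raise ValueError(f"unexpected barcode format: {barcode!r}")
    prefix, high, low = match.groups()
    return f"{BASE}{prefix.lower()}{high}/{low}"


def rdf_file_for(uri: str) -> str:
    """The RAX rule: the metadata file is the URI with '.rdf' appended."""
    return uri + ".rdf"


uri = barcode_to_uri("LSU0000134")
print(uri)                # http://www.cyberfloralouisiana.com/specimens/lsu000/0134
print(rdf_file_for(uri))  # http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
```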
Asynchronous JavaScript And XML (AJAX)
RDF/XML metadata in the files vanderbilt/4-145.rdf (the tree), baskauf/79687.rdf (an image), baskauf/79695.rdf (another image), etc.
JavaScript in the file metadata.htm retrieves the metadata.
XHTML web page created using those metadata, as seen by a human being
Redirection, JavaScript, and XSLT (REJAX) method
1. Client requests an extension-less URI.
2. Server does content negotiation based on the requested content type (the HTTP Accept header).
3. Semantic clients are sent the RDF/XML.
4. Web browsers are sent a text/html web page that uses JavaScript (i.e. AJAX) to open RDF/XML files and obtain the metadata required to construct the web page. The JavaScript can also retrieve blocks of RDF data formatted by XSLT.
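The REJAX page does this work in the browser with JavaScript. The logic of gathering metadata from several RDF/XML files is easy to show in Python, though. This sketch parses each file with the standard library and merges the rdf:Description blocks by their rdf:about URIs; the function name `collect` is hypothetical. It handles only the simple pattern used in these slides (literal text or an rdf:resource attribute), not the full RDF/XML syntax. The two sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"


def collect(rdf_documents: list[str]) -> dict[str, dict[str, list[str]]]:
    """Merge rdf:Description blocks from several RDF/XML documents.

    Returns {subject URI: {property in {namespace}name form: [values]}}.
    A value is the rdf:resource attribute if present (a link), else the
    element text (a literal).
    """
    graph: dict[str, dict[str, list[str]]] = {}
    for doc in rdf_documents:
        for desc in ET.fromstring(doc).iter(f"{{{RDF}}}Description"):
            props = graph.setdefault(desc.get(ABOUT), {})
            for prop in desc:
                props.setdefault(prop.tag, []).append(prop.get(RESOURCE, prop.text))
    return graph


NS = f'xmlns:rdf="{RDF}" xmlns:dwc="http://rs.tdwg.org/dwc/terms/"'
tree_rdf = (  # illustrative content of vanderbilt/4-145.rdf
    f"<rdf:RDF {NS}>"
    '<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/4-145">'
    '<rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual"/>'
    "</rdf:Description></rdf:RDF>"
)
image_rdf = (  # illustrative content of baskauf/79687.rdf
    f"<rdf:RDF {NS}>"
    '<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/79687">'
    "<dwc:basisOfRecord>DigitalStillImage</dwc:basisOfRecord>"
    '<dwc:individualID rdf:resource="http://bioimages.vanderbilt.edu/vanderbilt/4-145"/>'
    "</rdf:Description></rdf:RDF>"
)
graph = collect([tree_rdf, image_rdf])
```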
REJAX Content Negotiation (computer client)
Client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”
GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: application/rdf+xml
Web server: “I cannot send a tree! I’ll send information about the tree.”
Response: the RDF/XML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
REJAX Content Negotiation (human client)
Client: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”
GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: text/html
Web server: “Got it. I’ll send XHTML.”
Response: the XHTML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
[Screenshot: web page created by JavaScript]
Static file structure for REJAX
The tree identified by the URI
http://bioimages.vanderbilt.edu/vanderbilt/4-145
has RDF metadata in the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
while the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
passes information to the JavaScript in
http://bioimages.vanderbilt.edu/metadata.htm?vanderbilt/4-145/metadata/ind/etc.
Comparison of RAX and REJAX
Similarities:
• Both use static files.
• Both will work offline with at least some browsers.
• Both require modification of only a single file to change the appearance of the web page.
Differences:
• RAX uses metadata from a single RDF file, while REJAX inputs metadata from several RDF files.
• RAX simply displays the metadata for one or more closely related resources, while REJAX allows the user to interact with many resources in complex ways.
• RAX and REJAX are not programs or languages.
• They are simple content-negotiation methods that make use of the RDF/XML required by the Linked Data concept to create web pages.
Back to economics… Cost reduction
• Risk is lowered because both methods can operate on a generic web server with no server-side scripting. No maintenance is required once they are set up (although a minor server rewrite rule is required).
• Little time must be invested: the existing database can be used to provide metadata, and implementation can be immediate.
• Scalable: the URIs are structured so that the static files can be replaced at any time by server-side scripting without changing the URIs.
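The scalability point can be illustrated by showing that a static file and a server-side script return identical content for the same URI. In this sketch, the SQLite table `specimens`, its columns and all function names are hypothetical. `publish` writes the static snapshot that a RAX site would serve. `resolve_dynamic` produces the same text on demand, as a later script could.

```python
import sqlite3
from pathlib import Path
from typing import Optional
from xml.sax.saxutils import escape

BASE = "http://www.cyberfloralouisiana.com/specimens/"


def render_rdf(uri: str, description: str) -> str:
    """One template shared by the static generator and the dynamic script."""
    about = escape(uri, {'"': "&quot;"})  # make the URI safe inside an attribute
    return (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:dcterms="http://purl.org/dc/terms/">'
        f'<rdf:Description rdf:about="{about}">'
        f"<dcterms:description>{escape(description)}</dcterms:description>"
        "</rdf:Description></rdf:RDF>"
    )


def publish(db: sqlite3.Connection, docroot: Path) -> None:
    """Static (RAX-style) deployment: write every record to <path>.rdf once."""
    for path, description in db.execute("SELECT path, description FROM specimens"):
        target = docroot / (path + ".rdf")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_rdf(BASE + path, description), encoding="utf-8")


def resolve_static(docroot: Path, path: str) -> Optional[str]:
    """What a plain web server does: read the file if it exists."""
    target = docroot / (path + ".rdf")
    return target.read_text(encoding="utf-8") if target.is_file() else None


def resolve_dynamic(db: sqlite3.Connection, path: str) -> Optional[str]:
    """What a later server-side script could do for the same URI."""
    row = db.execute(
        "SELECT description FROM specimens WHERE path = ?", (path,)
    ).fetchone()
    return render_rdf(BASE + path, row[0]) if row else None
```

Because both paths share one template and the URI never encodes the delivery method, a site could swap one resolver for the other without breaking any links.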
What about the RDF? RAX (specimen record): a single RDF file using hash URIs
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual"/>
  … [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification"/>
  … [metadata about the determination] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
  <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
  … [metadata about the specimen] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img">etc.
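With hash URIs, a client strips the fragment before making the HTTP request. All the resources described in this file are therefore resolved by fetching one document. The sketch below uses `urllib.parse.urldefrag` from the Python standard library. Its only assumption is that the fragment-free URI is then mapped to its file by the RAX rule of appending “.rdf”.

```python
from urllib.parse import urldefrag


def document_for(uri: str) -> str:
    """Return the RDF file to fetch for a (possibly hash) URI.

    HTTP clients never send the fragment, so '.../0428#ind' and '.../0428'
    are both requested as '.../0428'; RAX then serves '.../0428.rdf'.
    """
    base, _fragment = urldefrag(uri)
    return base + ".rdf"


subjects = [
    "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind",     # individual
    "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b",  # determination
    "http://www.cyberfloralouisiana.com/specimens/lsu000/0428",         # specimen
    "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img",     # image
]
print({document_for(s) for s in subjects})
# {'http://www.cyberfloralouisiana.com/specimens/lsu000/0428.rdf'}
```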
What about the RDF? REJAX (live plant image records): multiple RDF files
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual"/>
  … [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314#19287">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification"/>
  … [metadata about the determination] …
</rdf:Description>
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/79651">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
  <dwc:basisOfRecord>DigitalStillImage</dwc:basisOfRecord>
  … [metadata about the image] …
</rdf:Description>
The importance of separation of resources in the RDF
This file is served from the herbarium’s website:
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  … [metadata about the individual] …
</rdf:Description>
<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  … [metadata about the specimen] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"/>
</rdf:Description>
This file is served from the image repository’s website:
<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/12345">
  … [metadata about the image] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"/>
</rdf:Description>
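Both files point to the same individual URI with dwc:individualID. A client that has harvested both can therefore connect the specimen and the image, even though they come from different servers. This minimal sketch works on plain (subject, predicate, object) tuples and assumes the RDF has already been parsed into that form. The function name is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

DWC_INDIVIDUAL_ID = "http://rs.tdwg.org/dwc/terms/individualID"


def resources_by_individual(triples):
    """Group subjects by the individual URI they reference via dwc:individualID.

    Each entry also records the host that issued the subject URI, showing that
    the linked records can live on different web servers.
    """
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == DWC_INDIVIDUAL_ID:
            groups[obj].append((subject, urlsplit(subject).netloc))
    return dict(groups)


IND = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"
herbarium = [("http://www.cyberfloralouisiana.com/specimens/lsu000/0428", DWC_INDIVIDUAL_ID, IND)]
image_repository = [("http://bioimages.vanderbilt.edu/baskauf/12345", DWC_INDIVIDUAL_ID, IND)]

for subject, host in resources_by_individual(herbarium + image_repository)[IND]:
    print(host, subject)
# www.cyberfloralouisiana.com http://www.cyberfloralouisiana.com/specimens/lsu000/0428
# bioimages.vanderbilt.edu http://bioimages.vanderbilt.edu/baskauf/12345
```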
See Biodiversity Informatics 7:17-44 for much more on this.
Guid-O-Matic
1. Create a CSV export containing the terms that vary among specimens.
2. Download guid-o-matic.exe (200 kB) from http://bioimages.vanderbilt.edu/guid-o-matic (no installation required).
3. Create a directory to hold the RDF files.
4. Enter (one time) the stuff about your institution that doesn’t change.
5. Click the button and poof! The RDF files appear in the directory you created.
6. Re-publish your website using WinSCP or whatever.
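The idea behind these steps can be sketched in Python: constant institution data plus one CSV row per specimen, rendered through a fixed template into one RDF file per URI. This is not Guid-O-Matic's code. The CSV column names (`localPath`, `catalogNumber`, `scientificName`), the `INSTITUTION` values and the template are all hypothetical.

```python
import csv
import io
from pathlib import Path
from xml.sax.saxutils import escape

# Step 4: values that do not change between specimens (hypothetical examples).
INSTITUTION = {
    "base": "http://www.example.org/specimens/",
    "institutionCode": "EXAMPLE",
}

# Fixed RDF/XML template; the XML declaration must be the very first line.
TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dwc="http://rs.tdwg.org/dwc/terms/">
  <rdf:Description rdf:about="{uri}">
    <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence"/>
    <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen"/>
    <dwc:institutionCode>{institutionCode}</dwc:institutionCode>
    <dwc:catalogNumber>{catalogNumber}</dwc:catalogNumber>
    <dwc:scientificName>{scientificName}</dwc:scientificName>
  </rdf:Description>
</rdf:RDF>
"""


def generate(csv_text: str, out_dir: Path) -> list[Path]:
    """Steps 1, 3 and 5: read the CSV export and write one .rdf file per row."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        local = row["localPath"]  # hypothetical column, e.g. "abc000/0001"
        target = out_dir / (local + ".rdf")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(TEMPLATE.format(
            uri=escape(INSTITUTION["base"] + local, {'"': "&quot;"}),
            institutionCode=escape(INSTITUTION["institutionCode"]),
            catalogNumber=escape(row["catalogNumber"]),
            scientificName=escape(row["scientificName"]),
        ), encoding="utf-8")
        written.append(target)
    return written


# Hypothetical one-row export using the slides' fictional species.
SAMPLE_CSV = "localPath,catalogNumber,scientificName\nabc000/0001,ABC0000001,Arborus rarus\n"
```

The files this produces are ordinary static files, so step 6 is just a copy to the web server.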
What’s the point???
• Appropriate design of the RDF structure allows for both:
– simple methods of generating a representation for humans
– semantic clients drawing correct inferences about the relationships among resources
• The human end user doesn’t care about this and doesn’t have to know about it (they’ll just see the web page).
• The raw data provider shouldn’t have to worry about what RDF is or how to use it (they just need some simple software to map their data correctly!).
Economics: benefits to small users
• Serving the files from the user’s own web server allows the users to brand their GUIDs by including their own domain name rather than that of an external host.
• Clickable attribution on websites
• Reference links in citations in PDF publications
• Instant iPhone “app” to access collection metadata
• XSLT can easily be modified to meet the needs of the users, e.g. QR codes on displays.
QR code on a museum display
Try these on your portable device (iPhone=yes, others=?)
RAX example: Juncus diffusissimus specimen at the LSU herbarium
http://www.cyberfloralouisiana.com/specimens/lsu000/0428
REJAX example: The “Bicentennial Oak” in Vanderbilt’s arboretum
http://bioimages.vanderbilt.edu/vanderbilt/7-314
Summary
• It is possible for GUIDs of the HTTP URI form to be implemented right now, even by users with very few IT resources.
• Restricting the format of the URIs to a simple structure (no weird characters, short, slashes to indicate hierarchy) prevents dependence on a particular delivery method (you can change your mind later).
• Making HTTP URI GUIDs actionable (i.e. resolvable in XHTML) in a simple way provides immediate benefits to the issuer even if the RDF is never used by a semantic client.
• Making it practical to implement resolvable GUIDs on a large scale increases the likelihood that semantic web-based databases will evolve, because the economics are shifted in their favor (a solution to the chicken-and-egg problem).
References
• Links from Bioimages GUID page http://bioimages.vanderbilt.edu/pages/guid.htm
• TDWG GUID/LSID applicability statement http://www.tdwg.org/stdtrack/article/download/150/51
• Cool URIs don't change (Tim Berners-Lee) http://www.w3.org/Provider/Style/URI
• Cool URIs for the Semantic Web http://www.w3.org/TR/cooluris/
• Recommendations for implementation of guids in the SERNEC collections community http://bioimages.vanderbilt.edu/guid
• Biodiversity Informatics 7:17-44 https://journals.ku.edu/index.php/jbi/article/view/3664
Note: this PowerPoint will be linked from the first URL above (a QR code on the slide loads the URL).