Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships Steven J. Baskauf – Vanderbilt University Thomas Sasek - University of Louisiana at Monroe


Post on 01-Jan-2016


Page 1

Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships

Steven J. Baskauf – Vanderbilt University

Thomas Sasek - University of Louisiana at Monroe

Page 2

GUIDs

Good for what ails you

Globally Unique Identifiers (GUIDs), a.k.a. Persistent Identifiers

Properties of GUIDs:
1. Globally unique (no two alike!)
2. Persistent (lasts forever!)
3. Actionable (explains itself to you and web crawlers on demand!)

. = technical detail warning

Page 3

Identifiers that are persistent should be scalable

http://lod.geospecies.org/ses/4XSQO

• This URI could represent a passive file delivery system where ses is the name of a directory on the server and 4XSQO the name of a file in that directory (no illegal file characters).
• ses/4XSQO could also represent an identifier passed to a server-side script that generates a file on the fly from a database.
• In accordance with the principle of REST (representational state transfer), the client (i.e. a user with a web browser) doesn’t need to know how the server produces the file it sends; the method could change over time as needed.
• Other nice things about this style of URI:
  – could correspond to a user’s hierarchy (e.g. collectionCode/catalogNumber)
  – relatively short
  – no characters that need to be escaped in XML
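As a rough illustration of the URI style described above, the following Python sketch builds a short hierarchical identifier from a collection code and a catalog number and rejects characters that would cause trouble in file names or XML. The base URI `http://example.org/specimens` and the function name `make_guid` are hypothetical, chosen only for this example.

```python
import re

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/specimens"

# Letters, digits, and hyphens only: safe as file names, in URLs, and in XML.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9-]+")


def make_guid(collection_code: str, catalog_number: str) -> str:
    """Build a URI of the form BASE/collectionCode/catalogNumber."""
    for segment in (collection_code, catalog_number):
        if not SAFE_SEGMENT.fullmatch(segment):
            raise ValueError(f"unsafe character in identifier segment: {segment!r}")
    return f"{BASE}/{collection_code}/{catalog_number}"


if __name__ == "__main__":
    print(make_guid("ses", "4XSQO"))
```

Because the identifier is only a path, the same URI can later be served by a static file or by a script without changing what clients see.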

Thanks for the example, Pete DeVries.

My grant got funded!

Page 4

Identifiers that are persistent should be able to survive the apocalypse
• Grants end.
• People quit.
• People lose interest.

http://lsid.tdwg.org/urn:lsid:gdb.org:GenomicSegment:GDB132938

My grant ran out.

Page 5

How can we provide actionability?

“Adoption of Persistent Identifiers for Biodiversity Informatics”, GBIF, 2009.

ServerMan

We can do this easily with a mod_rewrite rule accessing a PHP script that uses our MySQL database!

If this is so easy, why aren’t people using actionable GUIDs with occurrence data???

Page 6

The Chicken and Egg Problem of Actionability

• Nobody is going to go to the trouble of making their GUIDs actionable if the metadata that the GUIDs return aren’t ever going to be used for anything.

• Nobody is going to build a system that gleans data from actionable GUIDs if there aren’t any GUIDs from which to harvest metadata.

(Just like the early Internet where little content was available for users!)

Page 7

Economics of investing in GUIDs

• The use of GUIDs for occurrences will increase when the benefits outweigh the costs of implementation.

• If no one uses the metadata from actionable GUIDs, then in order for them to be adopted either:
  – the cost of implementation must be very low
  – there must be other benefits
  – or both!

Page 8

SERNEC (Southeast Regional Network of Expertise and Collections): Representing herbaria in the Southeast USA

• 125 member herbaria
• 53 survey respondents
• 43% of institutions have negligible to no IT support.
• 40% have web pages (most are rudimentary)
• 3-4 serve data

Data courtesy of Zack Murrell of SERNEC

Economics 101

Page 9

Databasing technology in SERNEC

• 75% are databasing
• approximately 35% are using Excel or nothing
• Although some are institutions with significant budgets and IT support, some are one-person operations with no budgets and no IT staff

Data courtesy of Zack Murrell of SERNEC

These people don’t need help

These people need a lot of help

Page 10

Costs:

1. Risk: depending on someone else’s complicated solutions that may result in disaster.

Page 11

Costs:

2. You may invest time in something that never happens.

Page 12

Costs:

3. Unavailability of a template for generating RDF/XML

• The TDWG, GBIF, and Linked Data guidelines say we must use Resource Description Framework (RDF) in XML format to describe metadata.

• What is it? RDF describes metadata properties in a way that can be understood by computers.

• It looks like this:
<dcterms:description>Field individual of Arborus rarus</dcterms:description>
<dcterms:type rdf:resource="http://purl.org/dc/dcmitype/PhysicalObject"/>

Page 13

Summary:

Users having few IT resources need a simple system:
– that requires little or no help to implement
– that can use existing database output
– that requires the least possible maintenance on the server

The cost of complex systems is too high for small users to implement without a very large benefit.

Page 14

Methods for lowering the cost of implementing actionable GUIDs for small-scale users: RAX and REJAX

Page 15

Review of Linked Data rules

1. URIs of physical or conceptual (non-information) resources must differ from the URLs of documents that describe them, e.g.:
   http://bioimages.vanderbilt.edu/vanderbilt/7-314 is an oak tree
   http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf is a metadata file describing the oak tree

2. Content negotiation for actionable non-information resource URIs should produce:
   A. a web page for humans to see
   B. an RDF/XML file for semantic clients (i.e. computers)

Page 16

EXtensible Stylesheet Language Transformation (XSLT)

RDF/XML metadata in the file 0134.rdf

XSLT stylesheet in the file guid-o-matic.xsl

XHTML web page as seen by a human being

Page 17

RDF And XSLT (RAX) method

1. Client requests extension-less URI.
2. Server concatenates “.rdf” to the URI.
3. RDF/XML file delivered to client regardless of requested content-type.
4. Web browsers use an XSLT stylesheet to create an XHTML web page for humans from the RDF/XML.
5. Semantic clients just use the RDF/XML.
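The server-side part of the steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not the rewrite rule used on the authors' servers; the function name `rax_resolve` is hypothetical. It assumes the RDF/XML file itself points the browser at the stylesheet (for example through an `xml-stylesheet` processing instruction), so the server never needs to inspect the request headers.

```python
RDF_MEDIA_TYPE = "application/rdf+xml"


def rax_resolve(request_path: str, accept_header: str = "") -> tuple:
    """Return (file path to serve, media type) for a requested URI path.

    RAX ignores the Accept header on purpose: every client, human or
    machine, receives the same RDF/XML file.
    """
    if request_path.endswith(".rdf"):
        # The client asked for the metadata document directly.
        return request_path, RDF_MEDIA_TYPE
    # Extension-less URI of the specimen itself: append ".rdf".
    return request_path + ".rdf", RDF_MEDIA_TYPE


if __name__ == "__main__":
    print(rax_resolve("/specimens/lsu000/0134", "text/html"))
```

The same mapping is small enough to express as a single rewrite rule on a generic web server, which is what keeps the maintenance burden low.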

Page 18

RAX Content Negotiation

Semantic client: “I am a computer. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”

GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: application/rdf+xml

Web server: “I cannot send a specimen!”

The server delivers the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf

Page 19

RAX Content Negotiation

Human with a web browser: “I am a human. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134”

GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Accept: text/html

Web server: “Duh, what’s that mean? He gets RDF anyway.”

The server delivers the RDF/XML file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf

What the web browser shows: a web page created from the RDF/XML by the XSLT stylesheet

Page 20

Static file structure for RAX

The specimen having barcode LSU0000134 is identified by the URI
http://www.cyberfloralouisiana.com/specimens/lsu000/0134
Its RDF formatted metadata is in the file
http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf
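The mapping from barcode to URI and file name could be generated mechanically. The Python sketch below reproduces the single example on this slide by lowercasing the barcode and treating its last four characters as the file name. That splitting rule is an assumption inferred from one example, not a documented convention, and the function names are hypothetical.

```python
BASE = "http://www.cyberfloralouisiana.com/specimens"


def barcode_to_uri(barcode: str, tail_length: int = 4) -> str:
    """Map a barcode to a specimen URI of the form BASE/directory/name.

    Assumption: the last `tail_length` characters become the file name and
    everything before them becomes the directory name.
    """
    code = barcode.strip().lower()
    if len(code) <= tail_length:
        raise ValueError(f"barcode too short to split: {barcode!r}")
    directory, name = code[:-tail_length], code[-tail_length:]
    return f"{BASE}/{directory}/{name}"


def metadata_file_for(uri: str) -> str:
    """The RDF/XML document describing the specimen lives at URI + '.rdf'."""
    return uri + ".rdf"


if __name__ == "__main__":
    uri = barcode_to_uri("LSU0000134")
    print(uri)
    print(metadata_file_for(uri))
```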

Page 21

Asynchronous JavaScript And XML (AJAX)

RDF/XML metadata in the files vanderbilt/4-145.rdf (the tree), baskauf/79687.rdf (an image), baskauf/79695.rdf (another image), etc.

JavaScript in the file metadata.htm retrieves the metadata

XHTML web page created using those metadata, as seen by a human being

Page 22

Redirection, Javascript, and XSLT (REJAX) method

1. Client requests extension-less URI.
2. Server does content negotiation based on requested content-type.
3. Semantic clients are sent the RDF/XML.
4. Web browsers are sent a text/html web page which uses JavaScript (i.e. AJAX) to open RDF/XML files and obtain the metadata required to construct the web page. The JavaScript can also retrieve blocks of XSLT-formatted RDF data.
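The negotiation in step 2 can be illustrated with a small Python sketch. It assumes the client states its preferred media types in the HTTP `Accept` header and that browsers should get the `.htm` page when nothing else matches; both the fallback and the function names are assumptions for illustration, not a description of the actual server configuration.

```python
def parse_accept(header: str) -> list:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def rejax_resolve(request_path: str, accept_header: str) -> tuple:
    """Return (file path to serve, media type) for an extension-less URI."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "application/rdf+xml":
            return request_path + ".rdf", "application/rdf+xml"
        if media_type in ("text/html", "application/xhtml+xml"):
            return request_path + ".htm", "text/html"
    # Assumed fallback: unknown or missing preferences get the web page.
    return request_path + ".htm", "text/html"


if __name__ == "__main__":
    print(rejax_resolve("/vanderbilt/4-145", "application/rdf+xml"))
    print(rejax_resolve("/vanderbilt/4-145", "text/html,*/*;q=0.8"))
```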

Page 23

REJAX Content Negotiation

Semantic client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”

GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: application/rdf+xml

Web server: “I cannot send a tree! I’ll send information about the tree.”

The server delivers the RDF/XML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf

Page 24

REJAX Content Negotiation

Human with a web browser: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145”

GET http://bioimages.vanderbilt.edu/vanderbilt/4-145
Accept: text/html

Web server: “Got it. I’ll send XHTML.”

The server delivers the XHTML file http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm

What the web browser shows: a web page created by JavaScript

Page 25

Static file structure for REJAX

The tree identified by the URI
http://bioimages.vanderbilt.edu/vanderbilt/4-145
has RDF metadata in the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf
while the file
http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm
passes information to the JavaScript in
http://bioimages.vanderbilt.edu/metadata.htm?vanderbilt/4-145/metadata/ind/etc.

Page 26

Comparison of RAX and REJAX

Similarities:
• Both use static files.
• Both will work offline with at least some browsers.
• Both require modification of only a single file to change the appearance of the web page.

Differences:
• RAX uses metadata from a single RDF file while REJAX inputs metadata from several RDF files.
• RAX simply displays the metadata for one or more closely related resources while REJAX allows the user to interact with many resources in complex ways.

• RAX and REJAX are not programs or languages.
• They are simple content-negotiation methods that make use of the RDF/XML required by the Linked Data concept to create web pages.

Page 27

Back to economics… Cost reduction

• Risk is lowered because RAX and REJAX can operate on a generic web server with no server-side scripting. No maintenance is required once set up (although a minor server rewrite rule is needed).

• Little time must be invested – existing database can be used to provide metadata and implementation can be immediate.

• Scalable: URIs are such that static files can be replaced at any time by server-side scripting.

Page 28

What about the RDF?
RAX (specimen record): single RDF file using hash URIs

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#39265b">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
  … [metadata about the determination] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
  <dwc:basisOfRecord rdf:resource="http://rs.tdwg.org/dwc/dwctype/PreservedSpecimen" />
  … [metadata about the specimen] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#img">
etc.
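One consequence of hash URIs is that a client strips the fragment before making the HTTP request, so every resource described in the file is retrieved from the same document. The Python sketch below shows this grouping using the standard library's `urldefrag`; the function name `group_by_document` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_by_document(resource_uris):
    """Group resource URIs by the document a client would actually request.

    The part after '#' is never sent to the server, so all hash URIs that
    share a base are served by one RDF/XML file.
    """
    groups = defaultdict(list)
    for uri in resource_uris:
        document, fragment = urldefrag(uri)
        groups[document].append(fragment)
    return dict(groups)


if __name__ == "__main__":
    base = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428"
    uris = [base + "#ind", base + "#39265b", base, base + "#img"]
    for document, fragments in group_by_document(uris).items():
        print(document, fragments)
```

This is why a single static RDF file is enough to describe the specimen, the individual, the determination, and the image in the RAX approach.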

Page 29

What about the RDF?
REJAX (live plant image records): using multiple RDF files

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
  <rdf:type rdf:resource="http://bioimages.vanderbilt.edu/rdf/terms#Individual" />
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314#19287">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Identification" />
  … [metadata about the determination] …
</rdf:Description>

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/79651">
  <rdf:type rdf:resource="http://rs.tdwg.org/dwc/terms/Occurrence" />
  <dwc:basisOfRecord>DigitalStillImage</dwc:basisOfRecord>
  … [metadata about the image] …
</rdf:Description>

Page 30

The importance of separation of resources in the RDF

This file is served from the herbarium’s website:

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind">
  … [metadata about the individual] …
</rdf:Description>

<rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
  … [metadata about the specimen] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>

This file is served from the image repository’s website:

<rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/12345">
  … [metadata about the image] …
  <dwc:individualID rdf:resource="http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind" />
</rdf:Description>

See Biodiversity Informatics 7:17-44 for much more on this.
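To show why the shared dwc:individualID matters, here is a minimal Python sketch of what a semantic client might do after retrieving both files: it collects every resource that points at the same individual. The two embedded documents are stripped-down stand-ins for the files on this slide (only the linking property is included), and the function name `resources_by_individual` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
INDIVIDUAL = "http://www.cyberfloralouisiana.com/specimens/lsu000/0428#ind"

# Stand-in for the file served by the herbarium.
HERBARIUM_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://www.cyberfloralouisiana.com/specimens/lsu000/0428">
    <dwc:individualID rdf:resource="{INDIVIDUAL}"/>
  </rdf:Description>
</rdf:RDF>"""

# Stand-in for the file served by the image repository.
IMAGE_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dwc="{DWC}">
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/baskauf/12345">
    <dwc:individualID rdf:resource="{INDIVIDUAL}"/>
  </rdf:Description>
</rdf:RDF>"""


def resources_by_individual(*documents):
    """Map each individual's URI to the resources that reference it."""
    linked = defaultdict(list)
    for document in documents:
        root = ET.fromstring(document)
        for description in root.findall(f"{{{RDF}}}Description"):
            subject = description.get(f"{{{RDF}}}about")
            for link in description.findall(f"{{{DWC}}}individualID"):
                linked[link.get(f"{{{RDF}}}resource")].append(subject)
    return dict(linked)


if __name__ == "__main__":
    for individual, resources in resources_by_individual(HERBARIUM_DOC, IMAGE_DOC).items():
        print(individual, resources)
```

Because the individual has its own URI, separate from the specimen's, the client can tell that the specimen and the image document the same plant even though the two files come from different servers.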

Page 31

Guid-O-Matic

1. Create CSV export containing terms that vary among specimens.
2. Download guid-o-matic.exe (200 kB) from http://bioimages.vanderbilt.edu/guid-o-matic (no installation required).
3. Create a directory to hold the RDF files.
4. Enter (one time) the stuff about your institution that doesn’t change.
5. Click this button and poof! the RDF files appear in the directory you created.
6. Re-publish your website using WinSCP or whatever.
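The general idea behind this workflow, one RDF/XML file per CSV row, can be sketched in Python. This is not the Guid-O-Matic program and does not reproduce its output; the column names `catalogNumber` and `description`, the function name `write_rdf_files`, and the choice of a single dcterms:description property are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dcterms", DCTERMS)


def write_rdf_files(csv_text: str, base_uri: str, out_dir: str) -> list:
    """Write one <catalogNumber>.rdf file per CSV row and return the paths.

    Assumes hypothetical CSV columns 'catalogNumber' and 'description'.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        root = ET.Element(f"{{{RDF}}}RDF")
        description = ET.SubElement(
            root,
            f"{{{RDF}}}Description",
            {f"{{{RDF}}}about": f"{base_uri}/{row['catalogNumber']}"},
        )
        ET.SubElement(description, f"{{{DCTERMS}}}description").text = row["description"]
        path = out / f"{row['catalogNumber']}.rdf"
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```

A call such as `write_rdf_files(open("export.csv").read(), "http://example.org/specimens/lsu000", "rdf_out")` would fill the output directory with static files ready to upload; the file name and base URI here are placeholders.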

Page 32

What’s the point???

• Appropriate design of the RDF structure allows for both:
  – simple methods of generating a representation for humans
  – semantic clients drawing correct inferences about the relationships among resources
• The human end user doesn’t care about this and doesn’t have to know about it (they’ll just see the web page).
• The raw data provider shouldn’t have to worry about what RDF is or how to use it (they just need some simple software to map their data correctly!).

Page 33

Economics: benefits to small users

• Serving the files from the user’s own web server allows the users to brand their GUIDs by including their own domain name rather than that of an external host.

• Clickable attribution on websites.
• Reference link in PDF publication citations.
• Instant iPhone “app” to access collection metadata.
• XSLT can easily be modified to meet the needs of the users, e.g. QR codes on displays.

Page 34

QR code on a museum display

Page 35

Try these on your portable device (iPhone=yes, others=?)

RAX example: Juncus diffusissimus specimen at the LSU herbarium
http://www.cyberfloralouisiana.com/specimens/lsu000/0428

REJAX example: The “Bicentennial Oak” in Vanderbilt’s arboretum
http://bioimages.vanderbilt.edu/vanderbilt/7-314

Page 36

Summary

• It is possible for GUIDs of the HTTP URI form to be implemented right now, even by users with very few IT resources.

• Restricting the format of the URIs to a simple structure (no weird characters, short, slashes to indicate hierarchy) prevents dependence on a particular delivery method (you can change your mind later).

• Making HTTP URI GUIDs actionable (i.e. resolvable in XHTML) in a simple way provides immediate benefits to the issuer even if the RDF is never used by a semantic client.

• Making it practical to implement resolvable GUIDs on a large scale increases the likelihood that semantic web-based databases will evolve, because the economics are shifted in their favor (a solution to the chicken and egg problem).

Page 37

References

• Links from Bioimages GUID page http://bioimages.vanderbilt.edu/pages/guid.htm

• TDWG GUID/LSID applicability statement http://www.tdwg.org/stdtrack/article/download/150/51

• Cool URIs don't change (Tim Berners-Lee) http://www.w3.org/Provider/Style/URI

• Cool URIs for the Semantic Web http://www.w3.org/TR/cooluris/

• Recommendations for implementation of guids in the SERNEC collections community http://bioimages.vanderbilt.edu/guid

• Biodiversity Informatics 7:17-44 https://journals.ku.edu/index.php/jbi/article/view/3664

Note: this PowerPoint will be linked from the first URL above (the QR code on the slide loads that URL).