![Page 1: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/1.jpg)
Implementing a Linked Open Data set
Joel Richard
Smithsonian Libraries
SLA Annual Conference, July 2012
![Page 2: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/2.jpg)
Who are the Smithsonian Libraries?
• 20 Libraries in the U.S. and Panama
• Supports research of staff and the public
• Strong effort to digitize pre-1923 texts
• Taxonomic Literature II is one of these texts
![Page 3: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/3.jpg)
Summary of Agenda
• Our data set and process
• Conversion to Linked Data
• Storing Linked Data
• Examples and More Info
• Summary
• … and Best brew pubs in Chicago
![Page 5: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/5.jpg)
What is Linked Data?
HTTP URIs identify things to humans and computers
Identifiers are related to other identifiers (or values) via predicates in a “triple”:
Charles Darwin // Creator // On the Origin of Species
See also:
http://linkeddata.org/
http://en.wikipedia.org/wiki/Linked_Data
http://richard.cyganiak.de/2007/10/lod/
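To make the "triple" idea concrete, here is a minimal sketch in plain Python. It only illustrates the subject // predicate // object shape from the Darwin example. The subject and object URIs follow the identifier pattern used later in this deck; the predicate URI is a hypothetical placeholder, not a real vocabulary term.

```python
from collections import namedtuple

# A triple is simply (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

darwin = "http://library.si.edu/tl-2/author/darwin"
origin = "http://library.si.edu/tl-2/title/origin_of_species"
creator = "http://example.org/vocab/creator"  # hypothetical predicate URI

# Charles Darwin // Creator // On the Origin of Species
triple = Triple(darwin, creator, origin)
print(" // ".join(triple))
```

Because all three parts are URIs, a computer can dereference each one, and other data sets can point at the same identifiers.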
![Page 6: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/6.jpg)
http://richard.cyganiak.de/2007/10/lod/
![Page 7: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/7.jpg)
Taxonomic Literature II
Essential Reference Tool for Botanists
Authors and their Publications from 1753 to 1940
It is a “database in book form.”
![Page 9: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/9.jpg)
Our process
Scanned the pages
Hired contractor for OCR and correction (99.97% accuracy)
Received XML dataset from Contractor
Verified and Imported to SQL Server
Built a website to search the data
![Page 11: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/11.jpg)
Great! Let’s make some linked data!
First...what does 99.97% accuracy mean?
~12,000 Errors
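A quick back-of-the-envelope check of that figure. The slides do not state the size of the text; the character count below is back-computed from the two numbers given (99.97% accuracy and about 12,000 errors), so treat it as an implied estimate, assuming accuracy is measured per character.

```python
# 100% - 99.97% = 0.03%, i.e. 3 errors per 10,000 characters.
ERRORS_PER_10K = 3
reported_errors = 12_000  # "~12,000 errors" from the slide

# Implied size of the text if accuracy is measured per character (an assumption).
implied_characters = reported_errors * 10_000 // ERRORS_PER_10K
print(f"{implied_characters:,} characters")  # prints "40,000,000 characters"
```

Even a very high accuracy rate leaves thousands of errors once the text runs to tens of millions of characters.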
![Page 12: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/12.jpg)
Great! Let’s make some linked data!
Select Identifiers for your data
http://library.si.edu/tl-2/author/darwin
http://library.si.edu/tl-2/title/origin_of_species
http://library.si.edu/tl-2/title/1313
Choose vocabularies for predicates (harder than it sounds)
OWL, FOAF, DublinCore, OpenGraph, SIOC, SKOS, BIBO, etc.
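One possible way to mint identifiers like the ones above, as a sketch only. The base URI is taken from this slide; the `slugify` and `mint_uri` helpers are hypothetical names for illustration and are not the code used for the actual site.

```python
import re

BASE = "http://library.si.edu/tl-2"  # base taken from the URIs on this slide

def slugify(text):
    """Lowercase the text and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")

def mint_uri(kind, key):
    """Build an identifier such as BASE/author/darwin or BASE/title/1313."""
    return f"{BASE}/{kind}/{slugify(str(key))}"

print(mint_uri("author", "Darwin"))
print(mint_uri("title", "Origin of Species"))
print(mint_uri("title", 1313))
```

Name-based slugs are readable but can collide (two authors named Darwin), which is one reason a numeric identifier such as a title number may be the safer choice.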
![Page 13: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/13.jpg)
Mondeca Labs
Linked Open Vocabularies (LOV)
Vocabulary of a Friend (VOAF)
A vocabulary for describing other vocabularies
http://labs.mondeca.com/dataset/lov
![Page 14: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/14.jpg)
http://library.si.edu/tl2/author/darwin
• tl2:creator http://library.si.edu/tl2/title/1313
• owl:sameAs http://viaf.org/viaf/27063124
http://library.si.edu/tl2/title/origin…
• dc:creator http://library.si.edu/tl2/author/darwin
• owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
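The links in this diagram can be written out as N-Triples, one statement per line. This is a sketch under several assumptions: the `tl2` namespace URI is not given in the slides, so the one below is hypothetical; `dc:` is assumed to mean Dublin Core Elements 1.1; the title node is assumed to be the same resource as `title/1313`; and the assignment of each link to the author or the title node is inferred from the diagram.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"  # assumes dc: = DC Elements 1.1
TL2_CREATOR = "http://library.si.edu/tl2/ns#creator"    # hypothetical namespace URI

AUTHOR = "http://library.si.edu/tl2/author/darwin"
TITLE = "http://library.si.edu/tl2/title/1313"

triples = [
    (AUTHOR, TL2_CREATOR, TITLE),
    (AUTHOR, OWL_SAME_AS, "http://viaf.org/viaf/27063124"),
    (TITLE, DC_CREATOR, AUTHOR),
    (TITLE, OWL_SAME_AS, "http://www.archive.org/details/originofspecies00darwuoft"),
]

def to_ntriples(rows):
    """Serialize URI-only triples as N-Triples: <s> <p> <o> ."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)

print(to_ntriples(triples))
```

The two `owl:sameAs` statements are what connect the local identifiers to the outside world (VIAF and the Internet Archive).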
![Page 15: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/15.jpg)
http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person
• foaf:lastName, foaf:familyName
• foaf:firstName, foaf:givenName
• foaf:name, skos:prefLabel
• tl2:birthYear
• tl2:deathYear
• skos:definition
• tl2:personAbbreviation
http://library.si.edu/tl2/title/origin…
RDF Type = bibo:Book
• tl2:titleNumber
• dc:title
• event:place
• dc:publisher
• dc:created
• tl2:titleAbbreviation
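A sketch of how one author record might be turned into triples using some of the predicates listed here. The record layout and the `author_to_triples` helper are hypothetical; only the prefixed predicate names and the URI pattern come from the slides. Prefixed names are kept as plain strings for readability rather than expanded to full URIs.

```python
# Hypothetical author record, shaped like a row from the SQL database.
author_row = {
    "slug": "darwin",
    "given_name": "Charles",
    "family_name": "Darwin",
    "birth_year": 1809,
    "death_year": 1882,
}

# Column -> predicate, using prefixed names listed on the slide.
AUTHOR_PREDICATES = {
    "given_name": "foaf:givenName",
    "family_name": "foaf:familyName",
    "birth_year": "tl2:birthYear",
    "death_year": "tl2:deathYear",
}

def author_to_triples(row):
    """Return (subject, predicate, object) tuples for one author record."""
    subject = f"http://library.si.edu/tl2/author/{row['slug']}"
    triples = [(subject, "rdf:type", "foaf:Person")]
    for column, predicate in AUTHOR_PREDICATES.items():
        value = row.get(column)
        if value is not None:  # skip missing values, e.g. an unknown death year
            triples.append((subject, predicate, value))
    full_name = f"{row['given_name']} {row['family_name']}"
    triples.append((subject, "foaf:name", full_name))
    return triples

for triple in author_to_triples(author_row):
    print(triple)
```

A title record would follow the same pattern with `bibo:Book` as the type and the title predicates (`dc:title`, `dc:publisher`, and so on).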
![Page 16: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/16.jpg)
Great! Let’s make some linked data!
How are we going to store all this?
We’re using Drupal. RDFa is built in; RDF Extensions is an add-on module.
Probably not a good idea for very large datasets.
TL-2: 10,000 authors + 37,000 titles becomes about 400,000 triples.
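For a sense of scale, the figures above work out to roughly eight or nine triples per record. The calculation uses only the numbers on this slide.

```python
authors = 10_000
titles = 37_000
triples = 400_000

records = authors + titles
print(f"{triples / records:.1f} triples per record")  # prints "8.5 triples per record"
```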
![Page 17: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/17.jpg)
Storage considerations
Performance of Drupal Import:
Feeds Import: 7 Hours for 35k Records
Other options? Still searching…
Our linked data set will grow to at least 600-700k Drupal nodes.
Is Drupal the best way to do this?
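A rough projection of what the measured import speed would mean at the expected size. This assumes import time scales linearly with the number of nodes and that the Feeds rate stays the same, neither of which is guaranteed.

```python
measured_records = 35_000
measured_hours = 7

records_per_hour = measured_records / measured_hours  # 5,000 records per hour

def projected_hours(node_count):
    """Estimate import time, assuming the measured rate holds (linear scaling)."""
    return node_count / records_per_hour

for nodes in (600_000, 700_000):
    print(f"{nodes:,} nodes -> about {projected_hours(nodes):.0f} hours")
```

Under those assumptions a full import would take on the order of five to six days, which is why the storage question matters.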
![Page 18: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/18.jpg)
Storage considerations
2000 US Census
19 million households received “long form”
Joshua Tauberer: converted to 1 billion triples
http://www.rdfabout.com/demo/census/
Carefully consider your storage options!
![Page 19: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/19.jpg)
Storage
ARC2 used by Drupal 7
RDBMS via D2RQ
RDBMS via Triplify
OpenLink Virtuoso
See Also:
http://www.w3.org/2001/sw/rdb2rdf/use-cases/
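The general idea behind the "RDBMS via ..." options is to leave the data in relational tables and compute triples from rows on request. The sketch below illustrates that idea with Python's built-in `sqlite3`; it is not D2RQ's or Triplify's actual mapping syntax, and the table layout and sample row are hypothetical.

```python
import sqlite3

BASE = "http://library.si.edu/tl2"

# Hypothetical table and sample row standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT, author_slug TEXT)")
conn.execute("INSERT INTO titles VALUES (1313, 'On the Origin of Species', 'darwin')")

def title_triples(connection):
    """Yield triples computed from rows at query time; nothing is stored as RDF."""
    rows = connection.execute("SELECT id, title, author_slug FROM titles ORDER BY id")
    for title_id, title, author_slug in rows:
        subject = f"{BASE}/title/{title_id}"
        yield (subject, "dc:title", title)
        yield (subject, "dc:creator", f"{BASE}/author/{author_slug}")

for triple in title_triples(conn):
    print(triple)
```

The trade-off against a dedicated triple store is that the relational database stays the single source of truth, at the cost of translating on every request.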
![Page 20: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/20.jpg)
Linked Data. What’s the point?
Disambiguation
Connecting Relevant Information
More visible via search
Enrichment of your data
Easier reuse of data
![Page 22: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/22.jpg)
http://en.openei.org/apps/mashathon2010/
![Page 23: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/23.jpg)
http://data.nytimes.com/schools/schools.html
![Page 24: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/24.jpg)
http://data.nytimes.com/N38444093941437235523
![Page 25: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/25.jpg)
http://www.worldcat.org/oclc/7619054
![Page 26: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/26.jpg)
Other Examples and Info
Library of Congress: Linked Data Services
http://id.loc.gov/
Schema.org
http://www.schema.org
Data.gov / Semantic
http://www.data.gov/semantic
Linked Data.org
http://linkeddata.org/
Stephen Dale: Linked Data in Action
http://www.slideshare.net/stephendale/linked-data-in-action-4487244
![Page 27: Building a Linked Open Data Set](https://reader033.vdocuments.us/reader033/viewer/2022060204/55a066b91a28ab3a728b4815/html5/thumbnails/27.jpg)
Thank you!
?
[email protected]
http://slideshare.net/joelrichard