A Lightning Case Study
FROM TAXONOMY
TO LINKED DATA
Taxonomy Boot Camp 2015 Lightning Session
November 4th, 2015
Bob Kasenchak, Director of Business Development
Access Innovations, Inc.
[email protected]
@taxobob
Copyright 2015 Access Innovations
What? Why?
How?
What Happened?
PLOS POC Project to establish links from terms in the thesaurus to corresponding concepts in DBpedia; used one branch for proof-of-concept
Use DBpedia Spotlight query tool to automatically generate candidates; validate by hand; query off abstracts from DBpedia and add to thesaurus; add backlinks to DBpedia
Results and lessons learned
What? Why?
Access Innovations and PLOS POC project to equate terms from the PLOS Thesaurus to open data concepts to:
• Enable addition of information to term records, e.g., definition/abstract
• Move towards Linked Open Data by finding an equivalent concept in DBpedia for each thesaurus term
• Provide linkouts from PLOS Subject Area pages to external references
DBpedia: Structured data from Wikipedia
Abstract/Definition
Images
External Links
Photos
Foreign language wikis
Subtopics
&c. &c.
PLOS Subject Area Landing Page
Establish Links to add information… (abstract, external resources, etc.)
…By Adding Information to Thesaurus
Linked Open Data Cloud
CKAN version: http://blog.okfn.org/2010/09/03/next-version-of-the-linked-open-data-cloud-based-on-ckan/
How? Establishing Links
• DBpedia Spotlight API (free) allows you to send text to be matched against DBpedia concepts
• Our input text was single terms from PLOS Thesaurus
• We hand QC’d all of the results
• Added a custom field in Data Harmony Thesaurus Master Tool to hold abstract/definition information
• Once links were verified, automatically queried abstracts from DBpedia and added to thesaurus term records
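The abstract lookup in the last step can be sketched as a SPARQL query against DBpedia. This is a minimal sketch, not the project's actual tooling: the public endpoint URL, its `format` parameter, the choice of `dbo:abstract` filtered to English, and all function names are assumptions. It only builds the request URL and parses a SPARQL JSON result, so nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint; the project may have used another.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def abstract_query(resource_uri, lang="en"):
    """Return a SPARQL query asking for one resource's abstract in one language."""
    # Reject anything that could break out of the <...> URI syntax.
    if any(ch in resource_uri for ch in '<>" \n'):
        raise ValueError("not a plain URI: %r" % resource_uri)
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def abstract_request_url(resource_uri):
    """Build the GET URL for the query (assumed endpoint parameters)."""
    params = {
        "query": abstract_query(resource_uri),
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def first_abstract(sparql_json):
    """Pull the first abstract out of a SPARQL JSON results document, or None."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else None
```

The returned string would then be written into the custom abstract/definition field of the term record; fetching the URL (for example with `urllib.request.urlopen`) is left out so that verified links can be processed in whatever batch fashion suits the thesaurus tool.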
DBpedia Spotlight: Matching to Thesaurus Terms
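Sending a single thesaurus term to Spotlight and reading back candidate concepts might look roughly like the following. This is a hedged sketch: the endpoint URL, the `confidence` value, and the function names are assumptions, and the response handling assumes Spotlight's JSON output with a `Resources` list whose entries carry an `@URI` key. The request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a public Spotlight endpoint; the 2015 project likely used a different host.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_spotlight_request(term, confidence=0.5):
    """Build (but do not send) a GET request that annotates one thesaurus term."""
    query = urlencode({"text": term, "confidence": confidence})
    return Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def candidate_uris(response_body):
    """Extract DBpedia URIs from a Spotlight JSON response, in returned order."""
    data = json.loads(response_body)
    # "Resources" is assumed to be missing when nothing was matched.
    return [resource["@URI"] for resource in data.get("Resources", [])]
```

Passing the request to `urllib.request.urlopen` and the response body to `candidate_uris` would yield the list of candidates to be checked by hand.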
How? Adding Backlinks to PLOS in DBpedia
This is accomplished by editing Wikipedia…
…but it’s complicated (more on this later)
What Happened? Results
Spotlight match?          Total Subject Area Terms   % of Subject Area Terms
Match is top hit          71                         59.7%
Match is in position 2-5  15                         12.6%
No – matched manually     10                         8.4%
No – no match found       21                         17.6%
Yes but false positive    2                          1.7%
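The percentages follow directly from the counts, which sum to 119 terms. A small sketch to recompute them; the dictionary keys are shortened labels introduced here for illustration:

```python
# Counts taken from the results table; keys are illustrative short labels.
counts = {
    "top_hit": 71,
    "position_2_to_5": 15,
    "matched_manually": 10,
    "no_match_found": 21,
    "false_positive": 2,
}

total = sum(counts.values())  # 119 terms in the proof-of-concept branch

# Share of each outcome, rounded to one decimal place as in the table.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Terms where the correct concept appeared somewhere in Spotlight's top five.
found_in_top_five = counts["top_hit"] + counts["position_2_to_5"]
top_five_share = round(100 * found_in_top_five / total, 1)

for name, pct in percentages.items():
    print(f"{name:18s} {counts[name]:3d} {pct:5.1f}%")
print(f"correct concept in top five: {found_in_top_five} ({top_five_share}%)")
```

Read this way, Spotlight surfaced the right concept within its first five candidates for 86 of the 119 terms (72.3%), with the rest needing a manual match or having no DBpedia equivalent.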
What Happened? Results
Lessons Learned during Matching Process:
• Your taxonomy is more granular than DBpedia: not every concept will have a match
• Spotlight performs better with a block of text than single terms
  – And our inputs were just terms from PLOS thesaurus
  – Results will HAVE to be QC’d; fully automating the process is a non-starter from an accuracy standpoint
  – Some of the false automatic matches were hilarious
• Overall:
  – Our methodology was basically sound
  – The process is pretty painless but requires QC
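One way to record the outcome of the hand QC is to compare the reviewer's validated URI against Spotlight's ranked candidates. This is only a sketch under assumptions: the function name and category labels are hypothetical, and the "false positive" outcome is left out because it depends on a human judgement that cannot be derived from rank alone.

```python
def classify_match(candidates, validated_uri):
    """Classify one term given Spotlight's ranked candidates and the QC'd URI.

    candidates: list of DBpedia URIs in the order Spotlight returned them.
    validated_uri: the URI a human reviewer accepted, or None if no DBpedia
    concept matches the thesaurus term.
    """
    if validated_uri is None:
        return "no_match_found"
    top_five = candidates[:5]
    if top_five[:1] == [validated_uri]:
        return "top_hit"
    if validated_uri in top_five:
        return "position_2_to_5"
    # The reviewer found a concept that Spotlight did not surface in its top five.
    return "matched_manually"
```

Tallying these labels over all reviewed terms gives the kind of breakdown shown in the results table.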
What Happened? Adding Backlinks
Lessons Learned during Backlinking Process:
• Can’t edit DBpedia directly; this information is crawled from Wikipedia pages
• Added links to some Wikipedia pages experimentally
• Eventually they should show up in DBpedia; but
• There is some question as to the appropriateness of the links (per Wikipedia), so
• Even though the PLOS subject area pages are stable URIs, have relevant content, etc.
• Best option is probably to publish the PLOS vocabulary (in OWL or perhaps SKOS) including the URI for each term, which would link to the URI for each Subject Area page
• Using owl:sameAs instead of dbo:wikiPageExternalLink
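Publishing the vocabulary with links to DBpedia could look something like the Turtle produced below. This is a sketch only: the helper name is hypothetical, the example.org term URI stands in for a real PLOS Subject Area URI, and modelling each term as a skos:Concept with an English skos:prefLabel is an assumption about how the vocabulary would be expressed.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)


def concept_to_turtle(term_uri, label, dbpedia_uri):
    """Serialize one thesaurus term as Turtle with an owl:sameAs link to DBpedia."""
    # Escape backslashes and double quotes so the label is a valid Turtle string.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"<{term_uri}> a skos:Concept ;\n"
        f'    skos:prefLabel "{safe_label}"@en ;\n'
        f"    owl:sameAs <{dbpedia_uri}> .\n"
    )


# Hypothetical term URI; a real one would be the stable Subject Area page URI.
print(PREFIXES)
print(concept_to_turtle(
    "http://example.org/subject-area/zoology",
    "Zoology",
    "http://dbpedia.org/resource/Zoology",
))
```

Because the link lives in the published vocabulary rather than on a Wikipedia page, it does not depend on Wikipedia's external-link policies.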
What Next?
• Present results
• Refine methodology
• Figure out best practices for backlinking
• Apply to entire PLOS thesaurus (~11,000 terms)
• Declare victory
Linked Data and Taxonomies
THANKS! ANY QUESTIONS?
Bob Kasenchak
Access Innovations, Inc.
[email protected]
@taxobob