A Lightning Case Study

FROM TAXONOMY TO LINKED DATA

Taxonomy Boot Camp 2015 Lightning Session
November 4th, 2015

Bob Kasenchak, Director of Business Development
Access Innovations, Inc.

[email protected]
@taxobob

Copyright 2015 Access Innovations

What? Why?
PLOS POC: project to establish links from terms in the thesaurus to corresponding concepts in DBpedia; used one branch for the proof of concept

How?
Use the DBpedia Spotlight query tool to automatically generate candidates; validate by hand; query abstracts from DBpedia and add them to the thesaurus; add backlinks to DBpedia

What Happened?
Results and lessons learned

What? Why?

Access Innovations and PLOS POC project to equate terms from the PLOS Thesaurus to open data concepts to:

• Enable addition of information to term records, e.g., definition/abstract
• Move towards Linked Open Data by finding an equivalent concept in DBpedia for each thesaurus term
• Provide linkouts from PLOS Subject Area pages to external references

DBpedia: Structured data from Wikipedia

Abstract/Definition

Images

External Links

Photos

Foreign language wikis

Subtopics

&c. &c.

PLOS Subject Area Landing Page

Establish Links to add information… (abstract, external resources, etc.)

…By Adding Information to Thesaurus

How? Establishing Links

• DBpedia Spotlight API (free) allows you to send text to be matched against DBpedia concepts
• Our input text was single terms from PLOS Thesaurus
• We hand QC’d all of the results
• Added a custom field in Data Harmony Thesaurus Master Tool to hold abstract/definition information
• Once links were verified, automatically queried abstracts from DBpedia and added to thesaurus term records
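
As a rough sketch of the last step, the abstract for a verified DBpedia URI could be fetched with a SPARQL query. The endpoint URL, the `format` parameter, the use of `dbo:abstract`, and all function names here are assumptions for illustration, not details taken from the project. The sample response is invented and only mirrors the standard SPARQL JSON results layout, so nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoint; the slides do not name one.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_abstract_query(resource_uri, lang="en"):
    """Return a SPARQL query asking for one resource's abstract in one language."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def build_request_url(resource_uri, lang="en"):
    """Return the GET URL that asks the endpoint for SPARQL JSON results."""
    params = {
        "query": build_abstract_query(resource_uri, lang),
        "format": "application/sparql-results+json",  # assumed parameter
    }
    return DBPEDIA_SPARQL + "?" + urlencode(params)


def extract_abstract(sparql_json):
    """Pull the first abstract literal out of a SPARQL JSON result, or None."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else None


# Invented sample response, used here instead of a live network call.
SAMPLE = json.dumps({
    "head": {"vars": ["abstract"]},
    "results": {"bindings": [
        {"abstract": {"type": "literal", "xml:lang": "en",
                      "value": "Placeholder abstract text."}}
    ]},
})

print(build_request_url("http://dbpedia.org/resource/Genetics"))
print(extract_abstract(SAMPLE))
```

The extracted string is what would be written into the custom abstract/definition field of the term record.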

DBpedia Spotlight: Matching to Thesaurus Terms
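
Submitting a single thesaurus term to Spotlight might look roughly like the sketch below. The endpoint URL, the `text` and `confidence` parameters, and the response keys (`Resources`, `@URI`) are assumptions about the public Spotlight web service, and the sample payload is invented. The sketch only builds the request and parses a canned response; it does not try to reproduce the ranked candidate lists referred to in the results.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the public Spotlight service; the slides do not give one.
SPOTLIGHT_ANNOTATE = "https://api.dbpedia-spotlight.org/en/annotate"


def build_spotlight_request(term, confidence=0.5):
    """Build (but do not send) a GET request that submits one thesaurus term."""
    query = urlencode({"text": term, "confidence": confidence})
    return Request(f"{SPOTLIGHT_ANNOTATE}?{query}",
                   headers={"Accept": "application/json"})


def candidate_uris(response_json):
    """List the DBpedia URIs in a response, in the order the service returned them."""
    payload = json.loads(response_json)
    return [res["@URI"] for res in payload.get("Resources", [])]


# Invented sample payload in the assumed response shape (no network call here).
SAMPLE = json.dumps({
    "@text": "Genetics",
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Genetics",
         "@surfaceForm": "Genetics",
         "@similarityScore": "0.99"},
    ],
})

request = build_spotlight_request("Genetics")
print(request.full_url)
print(candidate_uris(SAMPLE))
```

To actually send the request, pass it to `urllib.request.urlopen`; every URI that comes back would still go through hand QC before being stored.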

How? Adding Backlinks to PLOS in DBpedia

This is accomplished by editing Wikipedia…

…but it’s complicated (more on this later)

What Happened? Results

Spotlight match?           Total Subject Area Terms   % of Subject Area Terms
Match is top hit           71                         59.7%
Match is in position 2-5   15                         12.6%
No – matched manually      10                         8.4%
No – no match found        21                         17.6%
Yes but false positive     2                          1.7%
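
The percentages can be checked from the counts alone. The sketch below recomputes each share; the total of 119 terms and the two combined figures at the end are derived here and do not appear on the slide.

```python
# Counts copied from the results table above.
counts = {
    "Match is top hit": 71,
    "Match is in position 2-5": 15,
    "No - matched manually": 10,
    "No - no match found": 21,
    "Yes but false positive": 2,
}

total = sum(counts.values())


def share(n):
    """Percentage of all subject area terms, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# Derived groupings (not stated on the slide): automatic hits within the
# top five, and all terms that ended up with a DBpedia match.
in_top_five = counts["Match is top hit"] + counts["Match is in position 2-5"]
linked = in_top_five + counts["No - matched manually"]

print(total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"Spotlight hit in top five: {share(in_top_five)}%")
print(f"Linked to DBpedia overall: {share(linked)}%")
```

On these counts, Spotlight placed the correct concept in its top five for roughly 72% of the terms, and about 81% of the terms ended up linked once manual matches are included.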

What Happened? Results

Lessons Learned during Matching Process:

• Your taxonomy is more granular than DBpedia: not every concept will have a match
• Spotlight performs better with a block of text than single terms
  – And our inputs were just terms from PLOS thesaurus
  – Results will HAVE to be QC’d: fully automating the process is a non-starter from an accuracy standpoint
  – Some of the false automatic matches were hilarious
• Overall:
  – Our methodology was basically sound
  – The process is pretty painless but requires QC
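
One way to record the outcome of the QC step is to label each term by where the reviewer's accepted concept sits in the automatic candidate list. The function below is a hypothetical sketch: its name, its inputs, and the way it maps outcomes onto the result categories are one possible reading of the table, not the project's actual procedure.

```python
def classify_match(candidates, validated_uri):
    """Label one term by where the reviewer's chosen URI sits in the candidate list.

    candidates: ranked DBpedia URIs proposed automatically (best first).
    validated_uri: the URI a human reviewer accepted, or None if none exists.
    """
    if validated_uri is None:
        # Assumed reading: candidates were offered but none was acceptable.
        return "Yes but false positive" if candidates else "No - no match found"
    if validated_uri not in candidates[:5]:
        # The reviewer had to find the concept outside the top five.
        return "No - matched manually"
    if candidates[0] == validated_uri:
        return "Match is top hit"
    return "Match is in position 2-5"


# Placeholder URIs for illustration only.
A = "http://dbpedia.org/resource/A"
B = "http://dbpedia.org/resource/B"

print(classify_match([A, B], A))
print(classify_match([B, A], A))
print(classify_match([B], A))
```

Tallying these labels over a whole branch gives a breakdown of the same kind as the results table.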

What Happened? Adding Backlinks

Lessons Learned during Backlinking Process:

• Can’t edit DBpedia directly; this information is crawled from Wikipedia pages
• Added links to some Wikipedia pages experimentally
• Eventually they should show up in DBpedia; but
• There is some question as to the appropriateness of the links (per Wikipedia), so
• Even though the PLOS subject area pages are stable URIs, have relevant content etc.
• Best option is probably to publish the PLOS vocabulary (in OWL or perhaps SKOS) including the URI for each term, which would link to the URI for each Subject Area page
• Using owl:sameAs instead of dbo:wikiPageExternalLink
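
A published vocabulary of that kind could be generated along these lines. This is a minimal sketch that writes Turtle by hand with the standard library. The term record, the `example.org` URIs, and the function names are placeholders and assumptions, not real PLOS addresses or tooling.

```python
# Hypothetical term records: label, PLOS subject area URI, matched DBpedia URI.
# The example.org URIs are placeholders, not real PLOS addresses.
TERMS = [
    ("Genetics",
     "http://example.org/plos/subject-area/genetics",
     "http://dbpedia.org/resource/Genetics"),
]

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
)


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def term_to_turtle(label, term_uri, dbpedia_uri):
    """Serialize one thesaurus term as a SKOS concept with an owl:sameAs link."""
    return (
        f"<{term_uri}> a skos:Concept ;\n"
        f'    skos:prefLabel "{escape_literal(label)}"@en ;\n'
        f"    owl:sameAs <{dbpedia_uri}> .\n"
    )


def vocabulary_to_turtle(terms):
    """Serialize all terms as one Turtle document."""
    return PREFIXES + "\n" + "\n".join(term_to_turtle(*t) for t in terms)


print(vocabulary_to_turtle(TERMS))
```

Each term's own URI is the subject, so the link to DBpedia lives in the published vocabulary rather than in an edited Wikipedia page.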

What Next?

• Present results
• Refine methodology
• Figure out best practices for backlinking
• Apply to entire PLOS thesaurus (~11,000 terms)
• Declare victory

Linked Data and Taxonomies

THANKS! ANY QUESTIONS?

Bob Kasenchak
Access Innovations, Inc.

[email protected]
@taxobob