
MIT’s SIMILE Project: Demonstrating Practical Value of Semantic Web Technology for Digital Libraries

MacKenzie Smith, MIT Libraries

2©MIT CNI Spring 2006

Semantic Web Goals

Effective management and reuse of data across domains, at Web scale

Standards to reduce the social and technical costs of sharing data

i.e. data interoperability

Semantic Web Stack

RDF “Open World” Philosophy

The real world is very, very messy
• People lie, cheat, and make mistakes about their data
• They do what they must to get the job done
• Their decisions are subjective, inconsistent
• HTML, XML, other encodings are usually malformed

Standards for dealing with data must cope…

RDF allows for this better than most

RDF “Open World” Philosophy

Need to support a new kind of data?
• OK! Just add a new RDF statement, no need to change an XML or database schema

Need to mix data from several sources?
• OK! Just pour them together as RDF statements

Got data that contradicts itself?
• OK! Put both statements in RDF and equate or disambiguate them with more RDF statements; all points of view are possible
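
The "just pour them together" idea can be sketched in a few lines of Python. This is a minimal illustration, assuming statements are modeled as plain (subject, predicate, object) tuples held in a set; every `ex:` name and every value below is a hypothetical placeholder, and a real system would use an RDF library with full URIs.

```python
# RDF statements modeled as (subject, predicate, object) tuples.
# All "ex:" names and literal values are hypothetical placeholders.

catalog = {
    ("ex:book1", "ex:title", "Sample Title"),
    ("ex:book1", "ex:creator", "ex:person1"),
}

courseware = {
    ("ex:course7", "ex:contributor", "ex:person1"),
    ("ex:book1", "ex:title", "Sample Title"),  # same statement, other source
}

# Mixing sources is a set union; duplicate statements collapse on their own.
merged = catalog | courseware

# Supporting a new kind of data means adding a statement, not changing a schema.
merged.add(("ex:book1", "ex:reviewedIn", "ex:review42"))

# Contradictory statements can coexist; further statements can qualify them later.
merged.add(("ex:book1", "ex:title", "Sample Title, Revised"))


def about(graph, subject):
    """Return every (predicate, object) pair recorded for a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

Calling `about(merged, "ex:book1")` lists both titles side by side, which is the point: nothing is rejected at load time, and reconciliation is left to later statements.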

The Only Two Things You Really Need to Know about RDF…

1. Every piece of data has a URI
• i.e. a globally unique identifier
  http://web.mit.edu/simile/www/metadata/ocw/Contributor#john_dower
• Needn’t be resolvable on the Web

2. All data relationships are explicitly labeled
• Differs from XML, other data standards that hide relationships in their structure
• Can model any kind of data this way

The Digital Library Problem

Digital repositories manage metadata
descriptive, administrative, structural, technical/preservation

Metadata is highly diverse and it evolves

XML/RDBMS solutions are too brittle
need to reduce barriers to interoperability (e.g. cost, prior agreement)

Simple Example

Qualified DC for digital object description
• Supports display, search, browse, versioning, etc.
• Consistent across all collections/objects in repository
• Creates internal interoperability of the data model

But
• Metadata started out much richer (MARC, ONIX, PRISM, IMS LOM, VRA, DDI, FGDC, etc.)
• Many locally developed domain-based data models

So all of that rich description is lost

Simple Example

RDF for digital object description
• Supports display, search, browse, versioning, etc.
• Consistent across all collections, digital items
• Creates internal interoperability of the data model

And
• Still have all of the original metadata
• It’s just remodeled into RDF as a graph and each data element has a URI added

So all of that rich description is still there to use

Simple Example

But metadata consists of values that are interpreted, made sense of

• Different encodings
  e.g. Pablo Picasso == Picasso, Pablo, 1881-1973

• Typos
  e.g. Pablo Picasso == Pablo Picassso

• Homonyms, other collisions across domains
  e.g. apple the fruit vs apple the computer

Simple Example

Qualified DC case
Not much you can do except normalize values where possible

RDF case
No need to normalize, add more RDF!

Pablo Picasso sameAs Picasso, Pablo, 1881-1973

Pablo Picasso sameAs Pablo Picassso

Bank (the place where you put your money) differentFrom Bank (the place next to the river)

Not a complete solution, but a big improvement
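
One way an application might act on such sameAs statements is to group equated values into equivalence classes. The Python sketch below uses a small union-find structure for that; it is an illustration under stated assumptions, not how any SIMILE tool is known to work. The predicate names are treated as plain strings, and the two "Bank" labels are shortened from the slide.

```python
# Resolving "sameAs" statements with a small union-find structure.
# Values are plain strings here; a real graph would use URIs.

statements = [
    ("Pablo Picasso", "sameAs", "Picasso, Pablo, 1881-1973"),
    ("Pablo Picasso", "sameAs", "Pablo Picassso"),
    ("Bank (money)", "differentFrom", "Bank (river)"),
]

parent = {}


def find(x):
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps chains short
        x = parent[x]
    return x


def union(a, b):
    """Merge the equivalence classes of a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a


for s, p, o in statements:
    if p == "sameAs":
        union(s, o)
    else:
        # Register both values without merging them.
        find(s)
        find(o)


def same(a, b):
    """True if a and b have been equated by sameAs statements."""
    return find(a) == find(b)
```

The original values are never rewritten: the typo and the catalog-style heading both stay in the data, and the equivalence lives in separate statements that can be added or withdrawn.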

Mixing Data Quality, Semantics

But each collection offers a different set of qualities
e.g. level of granularity, correctness, consistency

[Diagram: collection C1 (quality goals Q1, metadata M1) + collection C2 (Q2, M2) = union C? (Q?, M?)]

What qualities does the union have? Is anybody happy?

RDF Challenges

RDF enthusiasts
• Very ivory tower, where the air is thin

RDF adoption rate
• Creates doubt in target audience

Future is cloudy on scalability
• Query engines look tractable
• Large-scale inferencing… ?

“It’s too complicated”
• RDF, RDF/S, OWL, OWL-lite, SPARQL, 50 years of AI research…

RDF To-Do

Scalability
• Just a matter of doing the work
• Not a problem for many domains, applications
• Do not necessarily use RDF internally (just where interoperability or schema evolution is a problem)

Real world, public demonstrations
• e.g. Piggy Bank
• Uptake in other domains (e.g. biomedical, Oracle db)
• Build more short term benefits for RDF adopters
• Demonstrations of interoperability wins

Lower the barrier to entry
• Open Source toolkits
• Tap into innovative energy on the periphery

SIMILE Goals

Make metadata interoperability easier for digital libraries by providing useful tools for browsing, searching and mapping heterogeneous metadata in RDF

Tools for Metadata Managers

Gadget
– XML inspector

RDFizers
– Batch tools to transform existing XML data into RDF

Solvent
– Firefox extension for JavaScript screen scraping

Welkin
– Graphical tool to inspect/edit RDF graph

Gadget

Works on any well-formed XML

Used for
• Data exploration, understanding
• Data migration, transformation
• Data cleanup
• Complexity evaluation
• Schema adherence understanding
• Schema emergence (if none provided)

Gadget – the big picture of your XML

OCW: 2,002,015 Lines of XML

[Figure: per-element summary showing avg. string length, # of instances, # of unique values]
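
The three measures named on this slide can be approximated for any well-formed XML with the Python standard library. The sketch below is not Gadget's implementation; it only illustrates the kind of per-element summary described, and the sample document and the `summarize` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; the OCW data from the slide is not reproduced here.
SAMPLE = """<courses>
  <course><title>Intro</title><term>Fall</term></course>
  <course><title>Advanced</title><term>Fall</term></course>
</courses>"""


def summarize(xml_text):
    """Collect instance counts, unique values, and average text length per path."""
    values = defaultdict(list)

    def walk(element, path):
        here = f"{path}/{element.tag}"
        text = (element.text or "").strip()
        if text:  # only elements that carry text are counted
            values[here].append(text)
        for child in element:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return {
        path: {
            "instances": len(texts),
            "unique": len(set(texts)),
            "avg_length": sum(map(len, texts)) / len(texts),
        }
        for path, texts in values.items()
    }


stats = summarize(SAMPLE)
```

A path whose unique count is far below its instance count (here `/courses/course/term`) hints at a controlled vocabulary, while unusually long averages may flag free text packed into a field.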

Gadget - the big picture of your XML

That’s Odd

RDFizers

Input types
• Simple Dublin Core (via OAI-PMH)
• MARC/MODS
• OCW (soon IMS LOM)
• VRA Core 3
• Email
• BibTeX

Done with XSLT style sheets, simple scripts

Need to define RDF “ontologies” for each
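
The slide says the RDFizers use XSLT style sheets and simple scripts. As a rough picture of what such a "simple script" could look like, here is a Python sketch (a swapped-in technique, not the project's code) that turns a simple Dublin Core record into statements. The record, the record URI, and the `rdfize` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record, for illustration only.
RECORD = f"""<record xmlns:dc="{DC_NS}">
  <dc:title>Sample Report</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
</record>"""


def rdfize(xml_text, subject_uri):
    """Turn each Dublin Core child element into one (s, p, o) statement."""
    triples = []
    for child in ET.fromstring(xml_text):
        # ElementTree spells qualified tags as "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            local_name = child.tag.split("}", 1)[1]
            value = (child.text or "").strip()
            if value:
                triples.append((subject_uri, DC_NS + local_name, value))
    return triples


# The subject URI is an assumption; real records would carry their own identifier.
triples = rdfize(RECORD, "http://example.org/record/1")
```

Repeated elements such as the two creators simply become two statements with the same subject and predicate, so nothing in the source record has to be flattened or dropped.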

Solvent - Easier Scraping to RDF

Tools for End-Users

Longwell
– Web-based RDF faceted metadata browser

Piggy Bank
– Firefox extension for personal information management of metadata in RDF

Semantic Bank
– Web-based server that allows data publishing and sharing by individuals, groups, or communities

Longwell – Faceted RDF Browser
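
Faceted browsing over RDF comes down to counting values per property and then narrowing the item set by a chosen value. The Python sketch below illustrates that idea only; it makes no claim about how Longwell is built, and the `ex:` items, property names, and helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical statements; property names are placeholders.
graph = [
    ("ex:item1", "ex:type", "thesis"),
    ("ex:item1", "ex:language", "en"),
    ("ex:item2", "ex:type", "article"),
    ("ex:item2", "ex:language", "en"),
    ("ex:item3", "ex:type", "thesis"),
    ("ex:item3", "ex:language", "fr"),
]


def facets(triples):
    """Count how many statements carry each value, grouped by property."""
    counts = defaultdict(Counter)
    for _subject, predicate, obj in triples:
        counts[predicate][obj] += 1
    return counts


def restrict(triples, predicate, value):
    """Keep only statements about subjects that have predicate == value."""
    keep = {s for s, p, o in triples if p == predicate and o == value}
    return [t for t in triples if t[0] in keep]


# Selecting the "thesis" facet value narrows the graph to two items.
theses = restrict(graph, "ex:type", "thesis")
```

Recomputing `facets(theses)` after each selection gives the updated counts a browser would show next to the remaining facet values.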

Example Collection – MIT Libraries

MIT Libraries public catalog
– books, other publications

MIT OpenCourseWare
– course material including visual images

DSpace@MIT
– articles, working papers, theses, images, datasets, etc.

FOAF for MIT people

RDF Ontologies

MODS
– for OPAC MARC data and DSpace data

OCW-specific
– Will migrate to IMS LOM eventually

FOAF for people

SIMILE specific (glue)

OpenCourseWare example (N3)

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lomEdu: <http://www.imsproject.org/rdf/imsmd_educationalv1p3#> .
@prefix ocw: <http://web.mit.edu/simile/www/2004/01/ocw#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcq: <http://dublincore.org/2000/03/13/dcq#> .
@prefix : <#> .

[…]

ocw:Lecture
    rdfs:subClassOf lomEdu:LearningResourceType ;
    rdfs:label "Lecture"@en .

ocw:Bibliography
    rdfs:subClassOf lomEdu:LearningResourceType ;
    rdfs:label "Bibliography"@en .

Piggy Bank

Firefox extension for managing metadata
• Loads RDF into local Longwell server

Search and faceted browse of local RDF
• Views defined by library, other users

Users can find, collect, annotate RDF
• Can then publish for access by others

Piggy Bank

Semantic Bank

To persist, share, publish data on a server

For individuals, groups, communities

e.g. conference proceedings

SIMILE Categories of Work

Demos