The Simple Knowledge Organization System (SKOS) in the context of Semantic Web Deployment Alistair Miles http://purl.org/net/aliman Library of Congress May 2008


DESCRIPTION

Links are valuable. Links between documents, between people, between ideas, between data. Data is now a first class Web citizen, and the Web is expanding as more of these valuable networks are deployed within its fabric. Well-established knowledge organization systems like the Library of Congress Subject Headings will play a major role within these networks, as hubs, connecting people with information and providing a firm foundation for network growth as many new routes to the discovery of information emerge through the collective action of individuals. Or will they?

This talk introduces the Simple Knowledge Organization System (SKOS), a soon-to-be-completed W3C standard for publishing thesauri, classification schemes and subject headings as linked data in the Web. This talk also presents SKOS in the context of the W3C’s Semantic Web Activity, and in particular the work of the W3C’s Semantic Web Deployment Working Group where other specifications are being developed for publishing linked data in the Web, for embedding linked data in Web pages, and for managing Semantic Web vocabularies.

Finally, this talk takes a mildly inquisitive look at the value propositions for linked data in the Web, and how LCSH might be deployed in the Web for better information discovery.

TRANSCRIPT

Page 1: Simple Knowledge Organization System (SKOS) in the Context of Semantic Web Deployment, Library of Congress, May 2008

The Simple Knowledge Organization System (SKOS) in the context of Semantic Web Deployment

Alistair Miles
http://purl.org/net/aliman

Library of Congress
May 2008


THE FUTURE OF THE WEB


• Testimony of Sir Timothy Berners-Lee, CSAIL Decentralized Information Group, Massachusetts Institute of Technology

• Before the United States House of Representatives, Committee on Energy and Commerce, Subcommittee on Telecommunications and the Internet

• http://dig.csail.mit.edu/2007/03/01-ushouse-future-of-the-web.html


I. Foundations of the Web

• “The success of the World Wide Web, itself built on the open Internet, has depended on three critical factors:

1. unlimited links from any part of the Web to any other;

2. open technical standards as the basis for continued growth of innovation applications; and

3. separation of network layers, enabling independent innovation for network transport, routing and information applications.”


A. Universal linking: Anyone can connect to anyone...

• “In simple terms, the Web has grown because it's easy to write a Web page and easy to link to other pages.”

• “What makes it easy to create links ... is that there is no limit to the number of pages or number of links possible on the Web.”

• “Adding a Web page requires no coordination with any central authority, and has an extremely low, often zero, additional cost.”


• “Adding a page provides content, but adding a link provides the organization, structure and endorsement to information on the Web which turn the content as a whole into something of great value.”


• “The universality and flexibility of the Web's linking architecture has a unique capacity to break down boundaries of distance, language, and domains of knowledge.”

• “These traditional barriers fall away because the cost and complexity of a link is unaffected by most boundaries that divide other media.”


• “The Web's ability to allow people to forge links is why we refer to it as an abstract information space, rather than simply a network.”


II. Looking forward

• “First, the Web will get better and better at helping us to manage, integrate, and analyze data.”

• “Today, the Web is quite effective at helping us to publish and discover documents, but the individual information elements within those documents ... cannot be handled directly as data.”


• “Today you can see the data with your browser, but can't get other computer programs to manipulate or analyze it without going through a lot of manual effort yourself.”

• “As this problem is solved, we can expect that Web as a whole to look more like a large database or spreadsheet, rather than just a set of linked documents.”


A. Data Integration

• “Locked within all of this data is the key to knowledge about how to cure diseases, create business value, and govern our world more effectively.”

• “The good news is that a number of technical innovations...

• ... (RDF which is to data what HTML is to documents, and the Web Ontology Language (OWL) which allows us to express how data sources connect together) ...

• ... along with more openness in information sharing practices are moving the World Wide Web toward what we call the Semantic Web.”


• “Progress toward better data integration will happen through use of the key piece of technology that made the World Wide Web so successful: the link.”

• “The power of the Web today, including the ability to find the pages we're looking for, derives from the fact that documents are put on the Web in standard form, and then linked together.”


• “The Semantic Web will enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats.”


DATA WEBS FOR E-SCIENCE


FlyWeb Project

• Fruit flies (Drosophila ...)
• Model organism
• Extensive body of genetic research
• Much of that knowledge is in journal papers
• Recognised value of research data
• Establish public databases
  – E.g. FlyBase
  – Centrally curated


Data Webs

• Link data resources
• Ask questions that no single data resource can answer

• What’s the easiest, cheapest, most scalable way to achieve this?

• Agile approach, add value incrementally, return value early and often...
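
As a rough illustration of the first two bullets, the sketch below links two tiny in-memory data resources through shared URIs and answers a question that neither can answer alone. The URIs, property names and data are all invented for illustration; nothing here is taken from FlyBase or any other real resource.

```python
# Two hypothetical data resources, each a set of (subject, property, value)
# triples. Every URI and value below is made up for illustration.
GENE = "http://example.org/gene/"

expression_db = {
    (GENE + "g1", "expressedIn", "wing"),
    (GENE + "g2", "expressedIn", "eye"),
}

literature_db = {
    (GENE + "g1", "mentionedIn", "http://example.org/paper/p10"),
    (GENE + "g2", "mentionedIn", "http://example.org/paper/p11"),
    (GENE + "g2", "mentionedIn", "http://example.org/paper/p12"),
}


def papers_for_tissue(tissue, expression, literature):
    """Return papers mentioning any gene expressed in the given tissue."""
    # Step 1: the expression resource tells us which genes are relevant.
    genes = {s for (s, p, o) in expression if p == "expressedIn" and o == tissue}
    # Step 2: the literature resource is joined on the shared gene URIs.
    return sorted(o for (s, p, o) in literature if p == "mentionedIn" and s in genes)


print(papers_for_tissue("eye", expression_db, literature_db))
```

The join only works because both resources name genes with the same URIs, which is the point taken up later in the talk under identifier management.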


Data Web Layer Cake (diagram)

• Level 0 – Any Data Resources in the Web
• Level 1 – SPARQL End-points
• Level 2 – SPARQL End-points (Schema Alignment)
• Level 3 – SPARQL End-points (Integrated Data)
• Other labels in the diagram: Vertical Web Apps, Web 2 Mash-ups, SPARQL Mash-ups, ??????


Example Application

• [insert screenshot of mashup]


Future, self-publishing

• As publishing data on the Web becomes easier...

• ...more research groups will publish their own data...

• ...rich network of data resources...
• ...challenging traditional view of scholarly life cycle & value chain ... value grid...


SKOS


Potted History

• SKOS 2001 (pre-alpha)
  – Thesaurus Interchange Format (TIF), LIMBER Project
• SKOS 2003 (alpha)
  – Semantic Web Advanced Development for Europe (SWAD-Europe)
• SKOS 2005 (beta)
  – W3C Semantic Web Best Practices and Deployment Working Group (SWBPD)
• SKOS 2008 (W3C Recommendation)
  – W3C Semantic Web Deployment Working Group (SWD)


http://www.w3.org/2007/Talks/1211-whit-tbl/


Layers in the Web

• http://www.w3.org/2007/Talks/1211-whit-tbl/#(23)

• Third layer is network (graph) of connections beyond documents...

• ... people, organisations, genes, proteins, concepts ...

• Represent these connections (data) in the (Semantic) Web


KOS e.g. LCSH

• Can be viewed as a network of interconnected concepts

• Represent LCSH as data in the Web
  – Make those concepts and their interconnections part of the Web


Publishing KOS in the Web?

• Use RDF
  – Basic framework for data in the Web – resources, literals, links... (“graphs” of data)


Publishing KOS in the Web?

• Use SKOS
• Standard set of...
  – Resource types (Classes)
  – Link types (Properties)
• ... for representing KOS as RDF data
• (N.B. Because URIs are used as names for the classes and properties, this is called an RDF vocabulary)


SKOS Resource Types (Classes)

• skos:Concept
  – E.g. Baseball in art
• skos:ConceptScheme
  – E.g. LCSH itself


SKOS Link Types (Properties)

• For labeling concepts
  – skos:prefLabel, skos:altLabel, skos:hiddenLabel
• For documenting concepts
  – skos:note, skos:scopeNote, skos:definition, skos:editorialNote...
• For linking concepts
  – skos:broader, skos:narrower, skos:related
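
To make the classes and properties above concrete, here is a minimal sketch that writes a two-concept scheme as N-Triples using only the Python standard library. The SKOS and RDF namespace URIs are the published W3C ones; the `http://example.org/subjects/` URIs, the "Sports in art" broader concept and the helper names `uri`, `lit` and `to_ntriples` are assumptions made up for this example, not real LCSH identifiers or relationships. `skos:inScheme` is a SKOS property that the slide does not list.

```python
# Published W3C namespace URIs.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs: real LCSH identifiers are not reproduced here.
EX = "http://example.org/subjects/"
scheme = EX + "scheme"
baseball_in_art = EX + "baseball-in-art"
sports_in_art = EX + "sports-in-art"  # assumed broader concept, illustration only


def uri(u):
    """Format a URI reference in N-Triples syntax."""
    return "<" + u + ">"


def lit(text, lang="en"):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


triples = [
    (uri(scheme), uri(RDF_TYPE), uri(SKOS + "ConceptScheme")),
    (uri(baseball_in_art), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(baseball_in_art), uri(SKOS + "inScheme"), uri(scheme)),
    (uri(baseball_in_art), uri(SKOS + "prefLabel"), lit("Baseball in art")),
    (uri(baseball_in_art), uri(SKOS + "broader"), uri(sports_in_art)),
    (uri(sports_in_art), uri(RDF_TYPE), uri(SKOS + "Concept")),
    (uri(sports_in_art), uri(SKOS + "prefLabel"), lit("Sports in art")),
    (uri(sports_in_art), uri(SKOS + "narrower"), uri(baseball_in_art)),
]


def to_ntriples(ts):
    """Serialise (subject, predicate, object) strings, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in ts)


print(to_ntriples(triples))
```

In practice an RDF library would handle parsing and serialisation; plain strings are used here only to keep the sketch dependency-free.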


• http://inkdroid.org/bzr/lcsh/docs/slides/


Publishing LCSH in the Web

• Project LCSH into RDF (i.e. create an RDF representation)

• Publish it in the Web as linked data
  – http://lcsh.info

• Ed Summers, Clay Redding, Dan Krech, Antoine Isaac


Scope of SKOS

• SKOS will be an all-encompassing standard for the lossless representation and exchange of all varieties of knowledge organisation system ... ?
  – No

• http://lists.w3.org/Archives/Public/public-swd-wg/2008Feb/0116.html -- Antoine Isaac


• “...the things that we aim at representing are very diverse: some classification schemes use ‘codes’ and refer to ‘classes’, thesauri have ‘terms’ and so on.”


• “Yet, it happens, looking at the way these things are used now and will be in the near future (with more and more links established between them), that (i) some standardisation has to take place, and that (ii) this standardisation can be actually grounded on some observed practical similarities (http://www.w3.org/TR/skos-ucr/)”

• “Our aim is not to replace the original objects in their initial context of use, but to allow to port them to a shared space, based on a simplified model, enabling wider re-use and better interoperability.”


Lessons from the Web

• Less is more ...
  – E.g. REST over SOAP

• SKOS should capture a small amount of common ground ... Just enough to enable KOS’s valuable concepts and connections to be deployed in the Web and be linked to/from

• N.B. SKOS is infinitely extensible!
  – Easy to mix & match
  – Easy to refine


THE VALUE OF LINKS


The value of links

• The Web showed that links between documents are really useful

• Google’s PageRank showed that the structure of the network means something (and is worth something!)

• Social networking Web sites showed how much we value other kinds of links


Linked Metadata

• You’ve got LCSH in the Web, what next?
• ... Linked metadata...?


• http://inkdroid.org/bzr/lcsh/docs/slides/


• [insert demo, show how links change topology of information space]


Value Proposition

• Links are paths to the discovery of information
• Links can be exploited in useful (and surprising) ways
• Well-established KOS like LCSH can be hubs in the network of linked metadata, bridging ...
• (On the Semantic Web, LCSH should get very high semantic PageRank!)


USING URIS


Why use URIs?

1. Identifier management
2. Data discovery


Identifier management

• Referring to things
• In a database, each table has a primary key
• What happens when you try to combine data from 2 databases?
  – Identifier clashes (ambiguous reference)
  – Identifier aliases (co-reference)
• Clashes hurt precision, give you nonsense
• Aliases hurt recall, miss important results/links


URIs & identifier management

• URIs are like a single, global pool of identifiers – one world-wide primary key
• Can claim ownership of parts of “URI space”
• Even though we’re all using the same primary key, this ownership gives a mechanism for avoiding URI clashes
• Can join data from multiple sources with confidence
• But ... doesn’t solve the alias problem, still need to find co-references
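
The clash and alias problems can be sketched with two toy databases. In the example below the table contents, the two `example.org` namespaces and the helper names `mint` and `find_aliases` are all hypothetical. Matching on identical labels is only a crude stand-in for real co-reference detection.

```python
# Two hypothetical databases that both use small integers as primary keys.
museum_db = {1: "Baseball in art", 2: "Cubism"}
library_db = {1: "Fruit flies", 2: "Baseball in art"}

# Naive merge on local keys: keys 1 and 2 clash, so the library records
# silently overwrite the museum records (ambiguous reference).
naive = {**museum_db, **library_db}


def mint(namespace, table):
    """Turn local keys into URIs inside a namespace that the owner controls."""
    return {namespace + str(key): label for key, label in table.items()}


# One hypothetical namespace per data owner, so identifiers cannot clash.
merged = {
    **mint("http://museum.example.org/id/", museum_db),
    **mint("http://library.example.org/id/", library_db),
}


def find_aliases(table):
    """Group URIs carrying the same label: candidate co-references only."""
    by_label = {}
    for identifier, label in table.items():
        by_label.setdefault(label, []).append(identifier)
    return {label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1}


print(len(naive), len(merged))
print(find_aliases(merged))
```

The naive merge keeps two records where the URI-based merge keeps all four. The alias report shows that "Baseball in art" still has two identifiers, which is the co-reference problem that URIs alone do not solve.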


Data discovery

• Problem with distributed data ... How do you find everything that’s “out there”?

• Two general approaches:
  – Centralised
  – Decentralised


Centralised discovery

• Someone somewhere keeps a “catalogue” of everything
• Everyone “knows” where that catalogue is
• New sources “tell” the catalogue about themselves (a.k.a. “register” themselves)
• E.g. Gas maintenance
• Works well at small-medium scales
• E.g. FlyWeb
• Rely on networks outside the Web (e.g. knowing the right people)


FlyWeb Project

• [small number of large data resources]


Decentralised discovery

• Data in one source refers to data in another (using a URI)

• Data from the other source can be retrieved directly, by “de-referencing” the URI via the Web

• So given one data source, you can “follow your nose” and retrieve data from all linked sources ...

• ... without needing a central catalogue or registry, just the Web

• Works well up to Web-scale
  – E.g. FOAF
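
The “follow your nose” idea can be sketched as a breadth-first traversal. To keep the example runnable without network access, the Web is simulated by a dictionary; `FAKE_WEB`, `dereference` and all of the FOAF-style URIs are hypothetical stand-ins, and a real client would issue HTTP requests and parse the returned RDF instead.

```python
from collections import deque

# A stand-in for the Web: each URI maps to the URIs that its data refers to.
# In a real deployment, dereference() would perform an HTTP GET instead.
FAKE_WEB = {
    "http://alice.example.org/foaf#me": ["http://bob.example.org/foaf#me"],
    "http://bob.example.org/foaf#me": [
        "http://carol.example.org/foaf#me",
        "http://alice.example.org/foaf#me",
    ],
    "http://carol.example.org/foaf#me": [],
    "http://dave.example.org/foaf#me": ["http://alice.example.org/foaf#me"],
}


def dereference(uri):
    """Return the URIs linked from the data retrieved for this URI."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start):
    """Visit every source reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for linked in dereference(current):
            if linked not in seen:  # avoid revisiting and looping forever
                seen.add(linked)
                queue.append(linked)
    return order


print(follow_your_nose("http://alice.example.org/foaf#me"))
```

Starting from Alice, the traversal reaches Bob and Carol without any registry. Dave is never found, because nothing reachable from Alice links to him: decentralised discovery only sees what the links lead to.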


Dereferenceable?

• For some URIs, can retrieve a “representation” of the “resource” via the Web

• (N.B. “resource” = “thing”)
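
One common way a client might ask for a data representation of a resource is HTTP content negotiation. The sketch below only builds such a request with Python's standard `urllib.request` and does not send it. The concept URI and the helper name `build_rdf_request` are hypothetical, and the choice of `application/rdf+xml` as the preferred format is an assumption made for this example rather than something the slides prescribe.

```python
import urllib.request


def build_rdf_request(uri):
    """Prepare (but do not send) a GET request that prefers RDF/XML over HTML."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml, text/html;q=0.5"},
        method="GET",
    )


# Hypothetical concept URI, used for illustration only.
request = build_rdf_request("http://example.org/subjects/baseball-in-art")

print(request.get_method(), request.full_url)
print(request.get_header("Accept"))

# Sending the request needs network access and a server that actually
# publishes the resource:
#     with urllib.request.urlopen(request) as response:
#         representation = response.read()
```

Whether a given server honours the `Accept` header depends on how it has been configured to publish its data.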


FOAF

• Very large number of relatively small data resources


Why use URIs?

• Identifier management
  – From 2 to 2 billion data sources, always a problem
• Data discovery
  – Ability to “de-reference” a URI opens possibility for decentralisation
  – Ability to “de-reference” is also useful in centralised models (e.g. registries can harvest)


SEMANTIC WEB DEPLOYMENT


W3C SWD WG

• SKOS
• RDFa
• Recipes for publishing RDF (linked data)
• Vocabulary management


W3C Semantic Web Activity

• Semantic Web Deployment
• Data Access (DAWG)
  – SPARQL query language, SPARQL protocol
• GRDDL
• OWL 2
• SWHCLSIG
• SWEO


SUMMARY


Suggestions

• Linked KOS
  – Project LCSH into RDF (SKOS) – done
  – Publish LCSH as linked data in the Web – done
  – Publish SPARQL endpoint for LCSH
• Linked metadata
  – Project LOC metadata into RDF
  – Publish LOC metadata as linked data in the Web
    • With links to LCSH & LCC
  – Publish SPARQL endpoint for LOC metadata
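
A small sketch of what metadata linked to a KOS could enable: finding records about a subject or any of its narrower subjects. The `skos:narrower` and `dcterms:subject` property URIs are real; the record and subject URIs, the tiny hierarchy and the function names are invented for illustration. Walking `skos:narrower` links recursively is an application choice made here, since SKOS itself does not declare that property transitive.

```python
SKOS_NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical fragment of a KOS published as triples.
kos = {
    (EX + "subjects/art", SKOS_NARROWER, EX + "subjects/sports-in-art"),
    (EX + "subjects/sports-in-art", SKOS_NARROWER, EX + "subjects/baseball-in-art"),
}

# Hypothetical bibliographic records that link to KOS concepts.
metadata = {
    (EX + "records/1", DCT_SUBJECT, EX + "subjects/baseball-in-art"),
    (EX + "records/2", DCT_SUBJECT, EX + "subjects/sports-in-art"),
    (EX + "records/3", DCT_SUBJECT, EX + "subjects/cookery"),
}


def concept_and_descendants(concept, triples):
    """Collect a concept plus everything reachable via skos:narrower links."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SKOS_NARROWER and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def records_about(concept, kos_triples, metadata_triples):
    """Return records whose subject is the concept or one of its descendants."""
    wanted = concept_and_descendants(concept, kos_triples)
    return sorted(s for s, p, o in metadata_triples if p == DCT_SUBJECT and o in wanted)


print(records_about(EX + "subjects/art", kos, metadata))
```

A search for the broad subject returns records 1 and 2, even though neither is indexed under it directly; the links in the KOS supply the path.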


Bibliographic Information as RDF?

• Projecting LCSH into RDF ... SKOS is standard vocabulary of resource & link types

• Projecting LCSH metadata into RDF ... Which vocabulary to use???

• Challenge – diversity of bibliographic information!


RDA -> RDF

• Joint DCMI/RDA task force
• Seed funding to develop initial prototype RDF vocabularies for bibliographic information
• Based on FRBR and data model implicit in RDA
• Early stages
• http://dublincore.org/dcmirdataskgroup/
• Karen Coyle


Thanks

• STFC Rutherford Appleton Lab
  – Brian Matthews, Michael Wilson, Juan Bicarregui
• Oxford Image Bioinformatics Research Group
  – David Shotton, Graham Klyne, Jun Zhao
• W3C Semantic Web Deployment WG
• Members of [email protected]
• Comments on SKOS -> [email protected]