
Upload: chad-carson

Post on 13-Dec-2015

Introduction to the Semantic Web and Linked Data

Module 1 - Unit 2

The Semantic Web and Linked Data Concepts

1-1

Library of Congress BIBFRAME Pilot Training for Catalogers

1: Objectives and overview

1. Context

2. Goals of the presentation

3. Outline of the content

1-2

Context

• AACR2, RDA, MARC 21 record environment

• Linked data techniques to use, share, and enhance library data

• Alternative to MARC 21 as the primary carrier of library data

1-3

Goals of the presentation

• Describe the basic components of semantic web and linked data technologies

• Identify library efforts to expose library data to the semantic web

• Describe how semantic web and linked data technologies are being used in the current environment

1-4

Outline

• Linked data defined
• RDF (Resource Description Framework)
• RDF data model
• Ontologies and vocabularies
• Query languages
• Linked open data on the web (LOD)

1-5

Linked data defined

• A set of best practices for publishing and connecting structured data on the Web http://linkeddata.org/faq

• Links are made between pieces of data (just as the web of documents links pages to each other)

• Allows machines to return meaningful results based on the semantic structure of data

1-6

URIs and IRIs

• URI: uniform resource identifier
– Sequence of characters used to identify a resource

• IRI: internationalized resource identifier
– Identifier with extended character set

• This presentation uses the term “URI” for both of these concepts

1-7

URIs on the semantic web

• On the web of documents, a URL is a type of URI that locates and links documents

• On the semantic web, URIs identify real-world objects
– People
– Cars
– Books
– Unicorns

1-8

Tim Berners-Lee’s rules for linked data

http://www.w3.org/DesignIssues/LinkedData.html
• Use URIs as names for things
• Use HTTP URIs so names can be looked up
• Provide useful information with common tools
• Include links to other URIs, so people can discover more useful resources

1. Use URIs to name resources

• Use URIs as names
• Don’t use strings as names
• Computers can match a URI to exactly one resource; a name string can be ambiguous or vary in form
– http://id.loc.gov/authorities/names/n79111488

NOT a string like:

Guthrie, Woody, 1912-1967
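A minimal sketch of why a URI works better than a string as a name. The URI is the one on the slide; the two records and their label variants are made up for illustration.

```python
from urllib.parse import urlparse

# URI taken from the slide; the records and labels are illustrative only.
WOODY_URI = "http://id.loc.gov/authorities/names/n79111488"
record_a = {"creator_uri": WOODY_URI, "creator_label": "Guthrie, Woody, 1912-1967"}
record_b = {"creator_uri": WOODY_URI, "creator_label": "Woody Guthrie"}


def same_creator_by_label(a, b):
    """Compare creators by name string (breaks when the form of the name varies)."""
    return a["creator_label"] == b["creator_label"]


def same_creator_by_uri(a, b):
    """Compare creators by URI (stable, whatever label is displayed)."""
    return a["creator_uri"] == b["creator_uri"]


# A URI also has a predictable structure that software can take apart.
parts = urlparse(WOODY_URI)

print(same_creator_by_label(record_a, record_b))
print(same_creator_by_uri(record_a, record_b))
print(parts.scheme, parts.netloc, parts.path)
```

The string comparison fails because the two labels differ, while the URI comparison succeeds.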

2. Use HTTP (web) URIs for names

HTTP URIs:
• Allow people to look up resources by name
• Work with a Web browser
• No special tools needed for look-up

http://id.loc.gov/authorities/names/n79111488

NOT

n79111488

2. Use HTTP (web) URIs for names

http://id.loc.gov/authorities/names/n82061739

Lookup on the Web:

3. Return useful information via standards

• When someone looks up a URI, provide useful information, using common linked data standards (RDF, SPARQL)

RDF = Resource Description Framework
– Data model based on triples

SPARQL = query language for RDF (something like Z39.50 for MARC)

3. Return useful information via standards

1-14

• Others can link to library data

3. Return useful information via standards

1-15

4. Include links to other URIs

1-16

• Libraries can enhance services for users

Linked data principles: take away

• HTTP URIs identify resources: people, books, serials, songs, “things”

• Useful web services can be built on library linked data and URIs

• Libraries can enhance services by linking out widely

1-17

RDF: Resource Description Framework

• Standard model for exchange of data on the Web

• Structures relationships between resources, people, and things on the web

• Uses graph model to represent database relationships

• RDF and related standards maintained by the World Wide Web Consortium (W3C)

1-18

RDF tools

• URIs are used to identify resources and relationships

• Vocabularies and ontologies: tools that define relationships between resources

• Triple statements are the core means of expressing relationships

1-19

RDF tools

• Standard languages and formats are used to express relationships in the RDF model

• Query languages allow people and machines to interact with RDF data stored in large data sets

1-20

RDF tools: Take away

• Widespread usage of common, openly available linked data tools promotes wide use and reuse of data on the web

1-21

RDF data model

• Triple statements• Graph data model• RDF XML (or other serialization format)• URIs• Ontologies and vocabularies• Namespaces

1-22

Triple statements

1-23

Subject → Predicate → Object

This work → Was written by → This author

Triple statements

• Subject identifies: “Resource of interest”
• Predicate identifies: Property of the “resource of interest,” a relationship
• Object identifies: Property value, a resource that has a relationship to the “resource of interest”
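The three parts of a triple can be sketched as a small Python data structure. The class name `Triple` and its field names are illustrative choices, not part of any RDF standard or library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical helper type)."""
    subject: str    # the "resource of interest"
    predicate: str  # a property of the subject, i.e. a relationship
    obj: str        # the property value, or a related resource


# The plain-language example from the slide.
statement = Triple("This work", "Was written by", "This author")

print(statement.subject)
print(statement.predicate)
print(statement.obj)
```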

1-24

Triple statements

1-25

Subject: This land is your land (URI for work)

Predicate: Was written by (URI for Dublin Core term: Creator [read: has creator])

Object: Woody Guthrie (URI for author)

Triple statements

The triple statement: This land is your land has creator Woody Guthrie

Can be expressed in a way that machines can interpret using URIs for name authorities and URIs for Dublin Core terms:

<http://id.loc.gov/authorities/names/n2013032388>
<http://purl.org/dc/terms/creator>
<http://id.loc.gov/authorities/names/n79111488>
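As a rough sketch, the three URIs above can be written out as one line in the N-Triples format, where each URI sits in angle brackets and the statement ends with a period. The function name `to_ntriples` is hypothetical, and this version handles only URIs, not literal values.

```python
WORK = "http://id.loc.gov/authorities/names/n2013032388"
HAS_CREATOR = "http://purl.org/dc/terms/creator"
AUTHOR = "http://id.loc.gov/authorities/names/n79111488"


def to_ntriples(subject, predicate, obj):
    """Format three URIs as a single N-Triples statement (URI objects only)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(WORK, HAS_CREATOR, AUTHOR)
print(line)
```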

1-26

Triple statements: take away

• Triple statements make it possible to make meaningful statements about resources on the semantic web

• Make use of URIs to identify subjects, predicates, and objects (ideally all three are URIs)

• Can be processed by computers and serve meaningful results to users

1-27

RDF serialization formats

• Languages for expressing RDF triples
• Common formats:
– Turtle
– N-Triples (a different format from Notation3, which is abbreviated N3)
– RDF XML (formally written “RDF/XML”)

• Can be easily parsed (processed) by machines
• Some (like Turtle) can also be easily read by humans
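To illustrate “easily parsed by machines,” here is a deliberately simplified reader for one N-Triples line. It assumes all three parts are URIs in angle brackets; real N-Triples also allows literals and blank nodes, which this sketch does not handle. The names `NT_LINE` and `parse_nt_line` are made up for the example.

```python
import re

# Matches: <subject> <predicate> <object> .   (URI-only simplification)
NT_LINE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def parse_nt_line(line):
    """Return (subject, predicate, object) from a URI-only N-Triples line."""
    match = NT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a URI-only N-Triples statement: {line!r}")
    return match.groups()


sample = (
    "<http://id.loc.gov/authorities/names/n2013032388> "
    "<http://purl.org/dc/terms/creator> "
    "<http://id.loc.gov/authorities/names/n79111488> ."
)
subject, predicate, obj = parse_nt_line(sample)
print(subject)
print(predicate)
print(obj)
```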

1-28

RDF XML

• Format for expressing triples
• Identifies the syntaxes and vocabularies used to express triple statements
• RDF XML can be modeled as graph data

1-29

RDF XML

• Uses XML structure to help computers read statements about resources

• URIs are used to identify resources and namespaces

• Namespaces identify vocabularies and syntaxes used to make semantic statements about resources

1-30

RDF XML- under the hood

<rdf:Description    [Beginning of triple]

rdf:about="http://id.loc.gov/authorities/names/n2013032388">    [Subject]

<dc:creator>http://id.loc.gov/authorities/names/n79111488</dc:creator>    [Predicate and Object]
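A minimal sketch of how a program might read this statement back out of RDF XML using Python's standard XML parser. The enclosing `rdf:RDF` element and the closing tags are assumptions added to make a complete document; the slide shows only the fragment. The function name `extract_triples` is hypothetical. The object is read as element text, mirroring the slide; in practice a URI object is more often given in an `rdf:resource` attribute.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Wrapper element and closing tags are assumed; the slide shows only the middle.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/names/n2013032388">
    <dc:creator>http://id.loc.gov/authorities/names/n79111488</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Collect (subject, predicate, object) from simple rdf:Description blocks."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; joining the two
            # parts gives the full predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


triples = extract_triples(DOC)
for triple in triples:
    print(triple)
```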

1-33

Graph of the RDF data model

1-34

Subject: Song: This land is your land

Predicate: has creator

Object: Guthrie, Woody, 1912-1967
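One way to picture the graph model in code: subjects and objects become nodes, and each predicate becomes a labeled edge from subject to object. This is only a sketch; the helper names `build_graph` and `nodes` are hypothetical, and the URIs are the ones used in the earlier triple example.

```python
from collections import defaultdict

WORK = "http://id.loc.gov/authorities/names/n2013032388"
HAS_CREATOR = "http://purl.org/dc/terms/creator"
AUTHOR = "http://id.loc.gov/authorities/names/n79111488"

triples = [(WORK, HAS_CREATOR, AUTHOR)]


def build_graph(triples):
    """Map each subject node to its outgoing (edge label, target node) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def nodes(triples):
    """Every subject and object is a node in the graph."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


graph = build_graph(triples)
print(graph[WORK])
print(len(nodes(triples)))
```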

URIs in RDF XML

• Retrieve content to be read by humans and machines
– Humans get an HTML page to read
– Machines retrieve (through redirect) an RDF XML format (or another format) that they can interpret and act on
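A sketch of how one URI can be requested in two forms: the client states the format it wants in the HTTP `Accept` header. The code only builds the requests and does not send them. Whether a particular server honors these exact media types is an assumption here; `application/rdf+xml` is the registered media type for RDF XML. The function name `build_lookup` is made up for the example.

```python
from urllib.request import Request

URI = "http://id.loc.gov/authorities/names/n79111488"


def build_lookup(uri, media_type):
    """Build (but do not send) a GET request asking for one representation."""
    return Request(uri, headers={"Accept": media_type})


human_request = build_lookup(URI, "text/html")              # page for people
machine_request = build_lookup(URI, "application/rdf+xml")  # data for programs

for request in (human_request, machine_request):
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing either request to `urllib.request.urlopen` would perform the actual lookup and follow any redirect the server issues.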

1-35

URI resolves to a form humans can read and a form machines can read

1-36

URIs in RDF XML

• URIs identify web resources
– Such as a book or author
– Namespaces of standards that have been used to encode RDF triple statements
– Vocabulary and ontology terms
– Subject, predicate, and object in triple statements

1-37

Namespaces

• Are declared in the root of an XML file
• Are identified by URIs
• Declare:
– Vocabularies
– Syntaxes
– Sources of terms used to describe and identify the resource

1-38

Namespaces

• Namespace declarations look like this in RDF XML:
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:lcnaf="http://id.loc.gov/authorities/names">

• But don’t worry! You won’t need to know all the details to use the BIBFRAME editor
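For the curious, a short sketch of what a namespace declaration buys you: a short prefixed name such as `dc:creator` stands for a full URI, formed by appending the local name to the namespace URI. The `rdf` and `dc` namespace URIs are the ones declared above; the function name `expand` is hypothetical.

```python
# Prefix-to-namespace map, using two of the declarations shown on the slide.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(prefixed_name):
    """Turn 'prefix:local' into the full URI it abbreviates."""
    prefix, local = prefixed_name.split(":", 1)
    return NAMESPACES[prefix] + local


print(expand("dc:creator"))
print(expand("rdf:about"))
```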

1-39

RDF XML and other formats: take away

• Allow computers to process triple statements in descriptions of resources

• URIs retrieve content to be read by humans and machines (in an RDF format)

• Namespaces are used to declare formats, syntaxes, sources of terms

1-40

Vocabularies and Ontologies

• Used to define concepts within a particular field of study (domain)

• The two terms are used somewhat interchangeably; ontologies are often described as “more complex” vocabularies

• Are necessary for discovering relationships on the Semantic Web

1-41

Vocabularies and Ontologies

• Define classes of objects
• Define relationships between objects
• Define properties of resources
• Can be expressed using RDF, so computers may interpret them
• Help retrieve meaningful search results for users

1-42

Vocabularies and Ontologies

• Example of discovering relationships:
– Data set says “Flipper is a dolphin”
– Ontology says “all dolphins are mammals”
– A semantic web program that understands the rule “if X is a dolphin, then X is a mammal”
– Can discover a new relationship: “Flipper is a mammal”
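The Flipper example can be sketched as a tiny inference routine. It applies one rule, the same shape as the RDFS subclass rule: if X has type C, and C is a subclass of D, then X also has type D. The `ex:` names are hypothetical placeholders, and the short `rdf:type` and `rdfs:subClassOf` strings stand in for full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

data = {("ex:Flipper", RDF_TYPE, "ex:Dolphin")}         # "Flipper is a dolphin"
ontology = {("ex:Dolphin", SUBCLASS_OF, "ex:Mammal")}   # "all dolphins are mammals"


def infer_types(triples):
    """Repeat the subclass rule until no new type statements appear."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for thing, p1, cls in list(known):
            if p1 != RDF_TYPE:
                continue
            for sub, p2, sup in list(known):
                new_fact = (thing, RDF_TYPE, sup)
                if p2 == SUBCLASS_OF and sub == cls and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


inferred = infer_types(data | ontology)
new_facts = inferred - data - ontology
print(new_facts)
```

The only new statement is that Flipper has type mammal, which appears in neither the data set nor the ontology on its own.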

1-43

Building vocabularies and ontologies

• RDF Schema (RDFS)
• Simple Knowledge Organization System (SKOS)
• Web Ontology Language (OWL)
• MADS/RDF

1-44

Vocabularies available in RDF formats

• BIBFRAME Vocabulary
• Dublin Core Abstract Model and Dublin Core metadata ontology
• FOAF
• Library of Congress authorities and vocabularies at http://id.loc.gov (“value vocabularies”)
• RDA vocabularies and registry: http://www.rdaregistry.info/
• Schema.org

1-45

BIBFRAME Vocabulary

• Creative Work - reflects a conceptual essence of the … resource.

• Instance - reflects an individual, material embodiment of the Work.

• Authority - defined relationships reflected in the Work and Instance: People, Places, Topics, Organizations, etc.

• Annotation - enhances our knowledge about another resource: Library Holdings, Cover Art and Reviews are examples.

1-46

BIBFRAME Model

1-47

BIBFRAME classes

1-48

BIBFRAME properties

1-49

BIBFRAME property description

1-50

Vocabularies, ontologies: take away

• Are necessary for discovering relationships on the semantic web

• Define classes and properties
• Define relationships between resources
• Facilitate “inference”: the discovery of new information

1-51

Triplestore

• A database for storing and searching triple statements

• Many triplestores are made openly available on the web

• Are searched using an RDF query language: SPARQL

• Searches result in meaningful inferences about resources
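A toy, in-memory sketch of the triplestore idea: store triples, then search by filling in the parts you know and leaving the rest open. Real triplestores are far more capable; the class `TripleStore` and its methods are invented for this illustration.

```python
class TripleStore:
    """Minimal in-memory store with wildcard pattern matching (illustrative)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return stored triples that agree with every part that is not None."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add(
    "http://id.loc.gov/authorities/names/n2013032388",
    "http://purl.org/dc/terms/creator",
    "http://id.loc.gov/authorities/names/n79111488",
)

# "What did this author create?" Leave the subject open.
works = store.match(
    predicate="http://purl.org/dc/terms/creator",
    obj="http://id.loc.gov/authorities/names/n79111488",
)
print(works)
```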

1-52

SPARQL

• RDF Query Language
• Searches triplestore data sets called: “SPARQL endpoints”
• Makes the semantic web “readable”
• http://dbpedia.org/ is an example of a SPARQL endpoint
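A sketch of what a SPARQL search looks like. The query asks for anything whose creator is the URI for Woody Guthrie, and the code builds the web address that would carry the query to an endpoint using the standard `query` parameter. The endpoint address `http://dbpedia.org/sparql` is an assumption, and nothing here guarantees that this endpoint holds matching data; the request is built but not sent. The function name `build_query_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint address; check the provider's documentation before use.
ENDPOINT = "http://dbpedia.org/sparql"

# ?work is a variable; the other two parts of the pattern are fixed URIs.
QUERY = """
SELECT ?work WHERE {
  ?work <http://purl.org/dc/terms/creator> <http://id.loc.gov/authorities/names/n79111488> .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```

Note how the query pattern has the same subject, predicate, object shape as a triple statement, with a variable standing in for the unknown part.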

1-53

SPARQL Endpoints

• Many organizations make their data freely available as SPARQL endpoints on the web

• Allow other data providers (including libraries) to make use of the data

• Free availability promotes experimentation with user interfaces

1-54

Linked Open Data (LOD)

• Interlinked data sets on the web
• Published using:
– the 4 principles of linked data
– common linked data tools such as RDF

• Global linked data space called: “Web of data”

1-55

1-56

Source: http://linkeddata.org/home

Linked Open Data (LOD)

• Billions of RDF statements covering:
– Geographic locations
– People
– Companies
– Books
– Scientific publications
– Films, music, television, and radio programs
– Genes, proteins, drugs and clinical trials, statistical data, census results, online communities, reviews and more

1-57

Source: http://linkeddata.org/home

Linked Open Data: take away

“…and the really important thing about data is the more things you have to connect together, the more powerful it is.”

1-58

Tim Berners-Lee, TED talk, “The Next Web”

1: Summary

1. Linked data techniques:
– Enhance the sharing of library data
– Allow libraries to enhance services using data from other sources

2. Linked data is a set of best practices for publishing and connecting structured data on the web

3. Using URIs to name resources on the web, together with common standards such as RDF, enables widespread sharing and reuse of data

1-59

2: Summary

4. Triple statements are at the heart of the semantic web

5. Triple statements can be expressed in a way that allows machines to serve meaningful results to users

6. Vocabularies and ontologies define relationships within a triple statement

7. BIBFRAME is a vocabulary built on linked data principles

1-60