introduction to linked data and semantic web technology · introduction to linked data and semantic...

40
1 Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for use of some of his slides

Upload: leduong

Post on 17-Jun-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

1

Introduction to linked data and Semantic Web technology

Dave Raggett, W3C

6 October 2009 With thanks to Ivan Herman for use of some of his slides

Page 2: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

2

The Unfinished Revolution

● Today's Web is designed for people to interpret○ Using your eyes and your mind

● Each website only covers part of your needs○ You have to do integrate information across websites○ This is time consuming and a waste of effort

● We should put computers to work on our behalf○ We need to find ways for software to query, combine and

interpret data accessible over the Web– Michael Dertouzos: “The Unfinished Revolution, How to Make

Technology Work for Us--Instead of the Other Way Around”

Page 3: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

3

So what is the Semantic Web?

Page 4: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

4

It is, essentially, the Web of Data and the technologies to realize that

Page 5: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

5

Is it that simple...

● Of course, the devil is in the details○ a common model has to be provided for machines

to describe and query the data and its connections○ the “classification” of the terms can become very

complex for specific knowledge areas: this is where ontologies, thesauri, etc, enter the game…

Page 6: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

6

Linked Data

Page 7: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

7

Data Integration withthe Semantic Web

● Map each data source into binary relations● Merge the relations● Start making queries

○ Uniform representation of relations as RDF Triples

subject objectVerb

All three are named with URIs

Page 8: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

8

A simplified book store example

ID Author Title Publisher YearISBN0-00-651409-X The Glass Palace 2000id_xyz id_qpr

ID Name Home Page

ID CityHarper Collins London

id_xyz Ghosh, Amitav http://www.amitavghosh.com

Publ. Nameid_qpr

SQL database:

Page 9: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

9

Export data as relations

Page 10: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

10

Another book store example

A B D E

1 ID Titre Original

2

ISBN0 2020386682 A13 ISBN-0-00-651409-X

3

6 ID Auteur7 ISBN-0-00-651409-X A12

11

12

13

TraducteurLe Palais des miroirs

NomGhosh, AmitavBesse, Christianne

Spreadsheet

Page 11: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

11

Export it as relations

Page 12: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

12

Merge the relations

Page 13: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

13

Merging continued...

Page 14: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

14

Merging identical nodes

Page 15: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

15

Add some missing knowledge

● We “feel” that a:author and f:auteur should be the same

● But an automatic merge doesn't know that without help

● We will add some extra information to the merged data:○ a:author same as f:auteur

○ both identify a “Person”

○ a term that a community may have already defined:

○ a “Person” is uniquely identified by his/her name and, say, homepage

○ it can be used as a “category” for certain type of resources

Page 16: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

16

The merged relations

Page 17: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

17

Start making queries

● You can now ask for the home page of the original author of a translated book

● This information is made available byreasoning over the merged datasets

Page 18: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

18

What did we do?

Page 19: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

19

Web of Data

● We should publish data on servers○ In standard ways rather than ad hoc approaches

● Set RDF links among the data items from different data sets○ URIs as globally unique names○ URIs for downloadable datasets○ URIs for Web APIs

● Encourage people to innovate○ More data○ More applications

● Watch the network effect work its magic!

Page 20: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

20

Linked Open Data Cloud,March 2008

Page 21: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

21

Linked Open Data Cloud,March 2009

Page 22: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

22

All this sounds nice, but isn't that just a dream?

Page 23: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

23

2007 Gartner Predictions

● During the next 10 years, Web-based technologies will improve the ability to embed semantic structures [… it] will occur in multiple evolutionary steps…

● By 2017, we expect the vision of the Semantic Web […] to coalesce […] and the majority of Web pages are decorated with some form of semantic hypertext.

● By 2012, 80% of public Web sites will use some level of semantic hypertext to create SW documents […] 15% of public Web sites will use more extensive Semantic Web-based ontologies to create semantic databases

Page 24: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

24

Corporate adoption

● Major companies offer (or will offer) Semantic Web tools or systems using Semantic Web: Adobe, Oracle, IBM, HP, Software AG, GE, Northrop Gruman, Altova, Microsoft, Dow Jones, …

● Others are using it (or consider using it) as part of their own operations: Novartis, Pfizer, Telefónica, …

● Some of the names of active participants in W3C SW related groups: ILOG, HP, Agfa, SRI International, Fair Isaac Corp., Oracle, Boeing, IBM, Chevron, Siemens, Nokia, Pfizer, Sun, Eli Lilly, …

Page 25: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

25

Query languages

Page 26: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

26

Querying RDF with SPARQL

● A query language for RDF data● Similar in syntax and spirit to SQL

SELECT ?p WHERE { ?L1 arcrole:parent-child ?b1 . ?b1 xl:type xl:link . ?b1 xl:from ?p OPTIONAL { ?L2 arcrole:parent-child ?b2 . ?b2 xl:type xl:link . ?b2 xl:to ?p } FILTER (!BOUND(?b2)) }

Page 27: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

27

Defining shared vocabularies

Page 28: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

28

Data Types

● RDFS defines some predicates for common datatypes, e.g.

○ Booleans

○ Numbers

○ Strings

As XML or as natural language, e.g. Spanish

○ Dates

○ Classes

● Resources can belong to several classes

Page 29: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

29

OWL for Ontologies

● RDFS is useful, but complex applications may want more

● OWL adds lots of possibilities○ Characterization of properties○ Disjointness or equivalence of classes○ In RDFS, you can subclass existing classes○ In OWL, you can construct classes from existing ones

– Through set intersection, union, complement, etc.

● But this comes at a cost...

Page 30: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

30

OWL Profiles

● Trade off between rich semantics for expressibility and ease of making inferences○ Simpler inference engines are possible with restrictions

on which terms can be used and under what circumstances

● OWL full○ Very expressive, but not computable in general

● OWL DL○ Popular computable subset of OWL full

● OWL 2 defines further profiles

Page 31: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

31

Rules

Page 32: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

32

Rule Languages

● May be more convenient than ontologies● Example

○ A cheap book is a novel with over 500 pages and costing less than $8

● W3C Rule Interchange Format (RIF)○ Family of languages for rule interchange

– For different kinds of rule language

○ Uses include– Negotiating eBusiness contracts across platforms

– Access to business rules of supply chain partners

– Managing inter-organizational business policies

Page 33: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

33

XBRL and the Semantic Web

Page 34: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

34

Why translate XBRLto another format?

● It is very expensive to process 10-50MB of XML on each query○ Memory and CPU intensive: about one second

of CPU time per 10MB of XML source

● Better to pre-process filings into a persistent format designed to match needs of queries○ Current tools use proprietary solutions

● RDF and OWL as natural choices○ Mature standards

○ Facilitate mashing financial data with other kinds of information available over the Web

○ Web APIs and standards would enable an ecosystem of value adding players

Page 35: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

35

XBRL as RDF/Turtle

Part of US GAAP taxonomy

@prefix usfr-pte: <http://www.xbrl.org/us/fr/common/pte/2005-02-28>.

usfr-pte:ChangeOtherCurrentAssets rdf:type xbrli:monetaryItemType; xbrli:periodType "duration".usfr-pte:ChangeOtherCurrentLiabilities rdf:type xbrli:monetaryItemType; xbrli:periodType "duration".

_:link155 arcrole:parent-child [ xl:type xl:link; xl:role role1:StatementFinancialPosition; xl:use "prohibited"; xl:priority "1"^^xsd:integer; xl:order "1.0"^^xsd:decimal; xl:from usfr-pte:IntangibleAssetsNetAbstract; xl:to usfr-pte:IntangibleAssetsGoodwill; ].

Page 36: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

36

XBRL as RDF/Turtle

Sample of an XBRL Instance file

_:context_FY07Q3 xl:type xbrli:context; xbrli:entity [ xbrli:identifier "0000789019"; xbrli:scheme <http://sec.gov/CIK>; ]; xbrli:period ( [ xbrli:startDate "2007-01-01"^^xsd:date; xbrli:endDate "2007-03-31"^^xsd:date; ] ).

_:unit_usd xbrli:measure iso4217:USD.

_:fact209 xl:type xbrli:fact; xl:provenance _:provenance1; rdf:type us-gaap:PaymentsToAcquireProductiveAssets; rdf:value "461000000"^^xsd:integer; xbrli:decimals "-6"^^xsd:integer; xbrli:unit _:unit_USD; xbrli:context _:context_FY07Q3.

Page 37: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

37

XBRL and OWL

● XBRL Taxonomy loosely equates to OWL ontology

○ But note XBRL's taxonomy overrides

● Automated mapping is mostly feasible

○ As demonstrated by Rhizomik XSD2OWL

● XBRL's formal semantics are weak

● XBRL versioning standard will describe differences between different versions of the same taxonomy, e.g. US GAAP 2008, 2009

○ Unaware of work on mapping this into OWL

○ Is it a good match to real world needs?

– e.g. rules of thumb for computing analytic ratios

● Reasoning across different taxonomies remains a major challenge

○ e.g. US GAAP vs IFRS

Page 38: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

38

Web-based ecosystemfor financial data

● Publishers of raw data○ Investor relation websites

○ Government agencies

○ News agencies

● Data aggregators○ Republish data as linkable triples, Sparql queries

○ Higher level APIs for common queries

– Results as charts or tables

○ Web of scripts that add value

– Custom analytics across filings

● Smart search engines

● Communities○ Share reviews, comments, analyses, mashups, ...

Page 39: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

39

Smart Search Engines

● Imagine search engines that provide selected financial highlights for each company that matches the search criteria you just entered○ With salient numbers and charts

● The search results tailor the data provided according to your interests○ Based upon analysis of the search criteria and other

information gleaned from previous searches○ Subject to your privacy preferences, of course! **

● Interactive data you can drill down on

** My other job is on privacy and identity management for an EU FP7 project

Page 40: Introduction to linked data and Semantic Web technology · Introduction to linked data and Semantic Web technology Dave Raggett, W3C 6 October 2009 With thanks to Ivan Herman for

40

Thank you for listening