Linked Data Applications: There is no One-Size-Fits-All Formula (Long Presentation)

Linked Data Applications: There is no One-Size-Fits-All Formula Asunción Gómez-Pérez Facultad de Informática, Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net [email protected] Acknowledgements: O.Corcho, D. Garijo, D. Vila, L.Vilches, B. Villazón Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Upload: asuncion-gomez-perez

Post on 01-Nov-2014


DESCRIPTION

Linked Data Applications: There is no One-Size-Fits-All Formula. (Long Presentation). Summer school on Ontological Engineering and the Semantic Web. Cercedilla, Spain

TRANSCRIPT

Page 1: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Linked Data Applications:

There is no One-Size-Fits-All

Formula

Asunción Gómez-Pérez

Facultad de Informática, Universidad Politécnica de Madrid

Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net

[email protected]

Acknowledgements:

O.Corcho, D. Garijo, D. Vila, L.Vilches, B. Villazón

Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0

Page 2: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

SSSW-12: 9th Summer School on Ontological Engineering and the Semantic Web. Cercedilla. Spain

Table of contents

1. The concept

2. Foundations

3. The process

4. Examples

• Libraries: http://datos.bne.es

• Geo: http://geo.linkeddata.es/

• Meteorology: http://aemet.linkeddata.es/

• Travelling: http://webenemasuno.linkeddata.es/

Page 3: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Complex queries using data from heterogeneous Web pages

Scenario: a Cervantes enthusiast from Germany is visiting Madrid and wants to know more about Cervantes' work and life.

Web pages shown:

• http://www.aemet

• http://www.viaf.org/

• http://www.bne.es/

• http://elviajero.elpais.com/

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

Page 4: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Data Integration

[Figure: one graph about M. Cervantes that integrates data from six databases (BD): BNE, VIAF, AEMET, IGN, Prisa and DBpedia. Recoverable node and edge labels:]

• M. Cervantes, creator, Don Quixote; translated into Hebrew; year of publication 1960; located, VIAF

• M. Cervantes, author (Autor), El Quijote; year of publication (Año de Publicación) 1605; located in (Ubicado en) BNE

• M. Cervantes, birthPlace, Alcalá de Henares

• Alcalá de Henares, same as, Alcalá de Henares

• Alcalá de Henares, temperature (Temperatura) 20º

• Alcalá de Henares, guide (guía), Tapas Siglo de Oro

Page 5: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Page 6: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

The model (Ontology) and the data

[Figure: two aligned graphs, one for the ontology and one for the data.]

Ontology: the classes Person, Work, Idiom, Year, Library and Place, related by the properties "Is creator of", "Has subject", "translation", "Publication date", "Located at" and "birthPlace".

Data: Cervantes, El Quijote, Vida de Cervantes, Catalán, 1960, BNE and Alcalá de Henares, related by the same properties (for example, Cervantes "Is creator of" El Quijote, and Cervantes "birthPlace" Alcalá de Henares).

Page 7: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

The model (Ontology) and the data

[Figure: the same two graphs, now with the URIs used at http://datos.bne.es/#]

Ontology (class labels: work, Language, Biblioteca (library), Person; properties: Is creator of (Es autor), Has subject, translation, Publication date, Located in, birthPlace):

http://iflastandards.info/ns/fr/frbr/frbrer/C1001
http://iflastandards.info/ns/fr/frbr/frbrer/C1002
http://iflastandards.info/ns/fr/frbr/frbrer/C1005
http://xmlns.com/foaf/0.1/Organization
http://geo.linkeddata.es/ontology/Municipio

Data (labels: Don Quijote de la Mancha; Cervantes Saavedra, Miguel de; Vida de Miguel de Cervantes Saavedra; Catalán; 1960; BNE):

http://datos.bne.es/resource/XX3383563
http://datos.bne.es/resource/XX1718747
http://datos.bne.es/resource/XX1924295
http://datos.bne.es/resource/bimo0002045496
http://geo.linkeddata.es/resource/Alcalá de Henares

Page 8: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Table of contents

1. The concept

2. Foundations

3. The process

4. Examples

[Figure: the activities of the process: Specification, Modelling, RDF Generation, Publication, Links Generation, Exploitation]

Page 9: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification

• Data sources analysis

• URI design

• License definition

Page 10: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification: URI design

• Meaningful URIs vs opaque URIs

• Separate TBox (ontology model) from ABox (data)

• Base URI

http://linkeddata.es/
http://geo.linkeddata.es/
http://otalex.linkeddata.es/

• Ontology (TBox URIs)

http://phenomenontology.linkeddata.es/ontology/{concept|property}
http://phenomenontology.linkeddata.es/ontology/Municipality

• Data (ABox URIs)

http://geo.linkeddata.es/resource/{resource type}/{resource name}
http://geo.linkeddata.es/resource/Municipio/Azuaga
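The two URI templates on this slide can be applied mechanically. The following is a minimal sketch, assuming that resource names are percent-encoded when they contain spaces (as in the Embalse%20de%20Orellana URI shown later in the deck); the function names tbox_uri and abox_uri are hypothetical and are not part of any tool mentioned in the talk.

```python
from urllib.parse import quote

# Base URIs copied from the slide; the helper names below are hypothetical.
ONTOLOGY_BASE = "http://phenomenontology.linkeddata.es/ontology"
RESOURCE_BASE = "http://geo.linkeddata.es/resource"


def tbox_uri(term: str) -> str:
    """URI of a concept or property in the ontology (TBox)."""
    return f"{ONTOLOGY_BASE}/{quote(term, safe='')}"


def abox_uri(resource_type: str, resource_name: str) -> str:
    """URI of an individual (ABox): /resource/{resource type}/{resource name}."""
    encoded_type = quote(resource_type, safe="")
    encoded_name = quote(resource_name, safe="")
    return f"{RESOURCE_BASE}/{encoded_type}/{encoded_name}"


print(tbox_uri("Municipality"))
print(abox_uri("Municipio", "Azuaga"))
print(abox_uri("Embalse", "Embalse de Orellana"))
```

Keeping the two helpers separate mirrors the slide's advice to separate the TBox from the ABox: the ontology and the data can then live under different base URIs.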

Page 11: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification: License definition

• Several possibilities

• The UK Open Government License

• Open Database License

• Public Domain Dedication and License

• Open Data Commons Attribution License

• The Creative Commons Licenses (CC)

• It is also possible to reuse and apply an existing license of the (government) data sources.

Page 12: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Modelling: Ontology

• Ontologies:

• A set of terms

• A set of explicit assumptions regarding the intended meaning of the terms.

• Almost always including concepts and their classification

• Almost always including properties between concepts

• Shared understanding of a domain of interest

• Ontologies are expressed in OWL or RDF(S), both based on RDF

• The NeOn methodology helps to build ontologies

Page 13: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

2. Vocabulary development

• Features

• Lightweight: taxonomies and a few properties

• Consensual vocabularies, to avoid mapping problems

• Multilingual: linked data are multilingual

• The NeOn methodology can help to

• Re-engineer non-ontological resources into ontologies. Pros: use domain terminology already agreed upon by domain experts

• Prune from heavyweight ontologies those features that you don't need

• Reuse existing vocabularies

[Figure: process steps: Identification of the data sources; Vocabulary development; Generation of the RDF data; Publication of the RDF data; Linking the RDF data; Data cleansing; Enable effective discovery]

Page 14: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

NeOn Methodology

[Figure: the NeOn scenarios, numbered 1 to 9. Recoverable labels:]

• 1: O. Specification, O. Conceptualization, O. Formalization, O. Implementation (RDF(S), OWL, Flogic)

• 2: Non Ontological Resource Reuse and Non Ontological Resource Reengineering (non-ontological resources: thesauri, dictionaries, glossaries, lexicons, taxonomies, classification schemas)

• 3: Ontological Resource Reuse (ontological resources from O. Repositories and Registries)

• 4: Ontological Resource Reengineering

• 5: O. Aligning, O. Merging (Alignments)

• 7: Ontology Design Pattern Reuse (O. Design Patterns)

• 8: Ontology Restructuring (Pruning, Extension, Specialization, Modularization)

• 9: O. Localization

• 1,2,3,4,5,6,7,8,9: Ontology Support Activities: Knowledge Acquisition (Elicitation); Documentation; Configuration Management; Evaluation (V&V); Assessment

Page 15: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Modelling: Reuse available vocabularies

[Figure: decision flow for building the vocabulary.]

• Reuse suitable ontologies and vocabularies (source: Linked Open Vocabularies)

• Search for suitable non-ontological resources (sources: domain-related sites, government catalogs, highly reliable Web sites)

• Are there suitable resources?

• Yes: build the vocabulary by transforming available resources

• No: build the vocabulary from scratch

Page 16: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Publication

• Data publication

• Metadata publication using VoID

• To facilitate discovery:

• Register your dataset in CKAN

• Use sitemap4rdf to generate the sitemap

• Upload the sitemap to Google and Sindice
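As an illustration of the metadata step, the sketch below assembles a small VoID description in Turtle. The terms void:Dataset, void:sparqlEndpoint and void:triples belong to the VoID vocabulary; the dataset URI, title, endpoint and triple count passed in at the bottom are placeholders chosen for the example, not values from the talk.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples):
    """Return a small VoID description of a dataset as a Turtle string."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {triples} .",
    ])


# Placeholder values, for illustration only.
turtle = void_description(
    dataset_uri="http://example.org/dataset/demo",
    title="Demo dataset",
    sparql_endpoint="http://example.org/sparql",
    triples=1000,
)
print(turtle)
```

The resulting file would typically be published alongside the data so that registries and crawlers can find the SPARQL endpoint and the size of the dataset.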

Page 17: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Table of contents

1. The concept

2. Foundations

3. The process

4. Examples

• Libraries: http://datos.bne.es

• http://linkeddata3.dia.fi.upm.es/bne-demo

• Geo: http://geo.linkeddata.es/

• Meteorology: http://aemet.linkeddata.es/

• Travelling: http://webenemasuno.linkeddata.es/

Page 18: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

MARC21

• Different communication formats:

• MARC 21 format for Bibliographic Data

• MARC 21 format for Authority Data

• Others: Holdings, Classification, etc.

• Three main elements:

• Record structure: ISO 2709. Fields, indicators, subfields…

• Content designation: "Meaning" of codes and conventions

• Content: Defined outside the MARC standard (ISBD, AACR..)

Page 19: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification @ BNE

• Records in the MARC 21 format

• 3.9 million bibliographic records

• 4.2 million authority records

• Version: November 2011

AUTHORITY: Persons, Corporate bodies, Conferences, Titles, Subject

BIBLIOGRAPHIC:

76576 Maps
320727 Sound recordings
166017 Engravings, drawings, pictures
35770 Manuscripts
143959 Ancient books
2696560 Modern books
178473 Scores
3021 Electronic resources
156634 Serials
96672 Videos

Page 20: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification: MARC21 record structure

• Authority record: Camus, Albert*

001 XX1721208
005 200012181124
008 901120nn aijnnaabn n aaa
016 $a BNE19900178994
040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
100 10 $a Camus, Albert $d 1913-1960
670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)

Callouts in the figure: Control Field, Field, Subfield, Content, Subfield Content, HEADING (1XX).

* http://datos.bne.es/resource/XX1721208
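To make the field, indicator and subfield structure concrete, here is a minimal sketch that parses the textual display of a record as printed on this slide. It handles only this human-readable layout, not the binary ISO 2709 encoding, and it assumes that "$" never occurs inside subfield content; parse_marc_line is a hypothetical helper name.

```python
def parse_marc_line(line):
    """Parse one display line such as '100 10 $a Camus, Albert $d 1913-1960'."""
    tag, _, rest = line.strip().partition(" ")
    rest = rest.strip()
    if "$" not in rest:
        # Control fields (for example 001) carry a plain value, no subfields.
        return {"tag": tag, "indicators": "", "subfields": [], "value": rest}
    head, _, tail = rest.partition("$")
    # Each chunk starts with the one-character subfield code.
    subfields = [
        (chunk[0], chunk[1:].strip())
        for chunk in tail.split("$")
        if chunk.strip()
    ]
    return {"tag": tag, "indicators": head.strip(), "subfields": subfields, "value": None}


lines = [
    "001 XX1721208",
    "016 $a BNE19900178994",
    "100 10 $a Camus, Albert $d 1913-1960",
]
record = [parse_marc_line(line) for line in lines]
for field in record:
    print(field)
```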

Page 21: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

MARC21 record content designation

• Authority record: Camus, Albert*

001 XX1721208 (Control Number)
100 10 $a Camus, Albert $d 1913-1960 (HEADING, Personal Name: $a Personal name, $d Dates associated with name)
670 $a El mite de Sísif, 1987 $b port. (Albert Camus) (Source consulted: $a Citation)

• Human reading: An authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960

* http://datos.bne.es/resource/XX1721208

Page 22: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Frequency of codes in records

Page 23: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification

• Source data: MARC 21 records, not a relational database (RDB). The structure is very flat and difficult to map to richer models.

• Domain experts (catalogers) need to be part of the mapping process.

• Data quality is good, but there are still many errors: reporting.

• Iterative and incremental transformation process: measure coverage and progress.

• Highly specialized library models: FRBR, ISBD.

• Multilinguality, collaboration with IFLA

Page 24: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Model: FRBR at a glance

[Figure: three levels: Works (Work 1, Work 2, Work 3), Expressions (Expression 1, Expression 2) and Manifestations (Manifestation 1, Manifestation 2).]

Page 25: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

The Ontology: based on IFLA vocabularies

Page 26: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Who will be the mapping generator?

[Figure: BNE, next to the MARC 21 authority record for Camus, Albert (001 XX1721208) already shown in the record structure slide.]

Page 27: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Similar to mapping ontologies

[Figure: MARC 21 elements mapped to ontology elements.]

• 100a maps to Person

• 100at maps to Work

• Subfield 100t maps to the property "title of work"

• Content (100a) contained in content (100at) maps to "is creator of"

Page 28: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Marimba allows librarians to create mappings

• Three spreadsheets: Classification mapping, Annotation mapping, Relationships mapping

Basic structure:

MARC21 info    Records count    Content sample               Mapping
100 $a $d      888.880          Camus, Albert 1913-1960      foaf:Person
100 $a         999.999          Cervantes, Miguel de         foaf:name
100 $a $m      10.000           Cervantes, iguel             ERROR

Page 29: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Librarians create mappings using Excel

[Figure: the Classification mapping spreadsheet, with the same basic structure: MARC21 info, Records count, Content sample, Mapping.]

Page 30: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Librarians create mappings using Excel

[Figure: the Annotation mapping and Relationships mapping spreadsheets; properties shown include "place of publication", "has dimensions" and "is part of work".]

Page 31: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Marimba interprets the mappings and generates the RDF

• Classify: exploiting the heading field and subfield codes.

100 $a $d → Person (it has a personal name)
100 $a $d $t → Work (it has a title)

• Annotate: using subfield codes and the content.

100 $a "Camus, Albert" → frbr:P3039 "Camus, Albert"
100 $t "La Peste" → frbr:P3001 "La Peste"

Example (BNE):

001 XX1721208
……
100 10 $a Camus, Albert $d 1913-1960
……

MARC 21 record (Input)    Action      RDF (Output)
100 $a $d                 Classify    rdf:type frbr:C1005
100 $a Camus, Albert      Annotate    frbr:P3039 "Camus, Albert"
100 $d 1913-1960          Annotate    frbr:P3040 "1913-1960"
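The classify and annotate steps can be sketched in a few lines. This is not MARiMbA's code: it is a simplified illustration that hard-codes the two classification rules and the two annotation rows from the input/output table on this slide (frbr:P3039 for $a, frbr:P3040 for $d), whereas the real tool reads such rules from the librarians' spreadsheets. The names classify and annotate are hypothetical.

```python
# Annotation rows taken from the slide's table; a real mapping sheet holds many more.
ANNOTATION_MAPPING = {"a": "frbr:P3039", "d": "frbr:P3040"}


def classify(subfield_codes):
    """Classify a 100 heading from the subfield codes it uses."""
    codes = set(subfield_codes)
    if "t" in codes:
        return "Work"    # it has a title
    if "a" in codes:
        return "Person"  # it has a personal name
    return None


def annotate(subject, subfields):
    """Turn mapped subfields into (subject, property, literal) triples."""
    return [
        (subject, ANNOTATION_MAPPING[code], value)
        for code, value in subfields
        if code in ANNOTATION_MAPPING
    ]


heading = [("a", "Camus, Albert"), ("d", "1913-1960")]
print(classify(code for code, _ in heading))
for triple in annotate("bne:XX1721208", heading):
    print(triple)
```

Subfields without a mapping row are skipped here; in the spreadsheets shown earlier, such combinations are the ones flagged as ERROR for the catalogers to review.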

Page 32: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Mapping process in more detail

• But what about the relationships between the entities?

• Relationships between records are not explicit in MARC.

Goal: The work "La Peste" was created by Albert Camus

R1:
001 XX1721208
100 10 $a Camus, Albert $d 1913-1960

R2*:
001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste

Common: $a Camus, Albert and $d 1913-1960. Diff: $t La peste (Work).

We know the types of R1 and R2, and we look at the heading diff:

bne:XX1721208 frbr:P2010 bne:XX1910518 (isCreatorOf)

* http://datos.bne.es/resource/XX1910518
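The heading diff idea can be sketched as follows. This is an illustration under simplifying assumptions, not the tool's implementation: headings are lists of (code, value) pairs, the rule only covers the Person/Work case from the slide, and the property is written as frbr:P2010, the identifier used in the SPARQL query on the Exploitation slide. The names heading_diff and relate are hypothetical.

```python
def heading_diff(shorter, longer):
    """Subfields of `longer` missing from `shorter`, or None if `shorter` is not contained in `longer`."""
    if all(subfield in longer for subfield in shorter):
        return [subfield for subfield in longer if subfield not in shorter]
    return None


def relate(r1, r2):
    """Emit an isCreatorOf triple when a Person heading is contained in a Work heading."""
    if r1["type"] != "Person" or r2["type"] != "Work":
        return None
    diff = heading_diff(r1["heading"], r2["heading"])
    # The only difference allowed is the title subfield ($t).
    if diff and all(code == "t" for code, _ in diff):
        return (r1["id"], "frbr:P2010", r2["id"])
    return None


person = {"id": "bne:XX1721208", "type": "Person",
          "heading": [("a", "Camus, Albert"), ("d", "1913-1960")]}
work = {"id": "bne:XX1910518", "type": "Work",
        "heading": [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")]}
print(relate(person, work))
```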

Page 33: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Marimba: Mapping process summary

(MARC records)

001 XX1721208
100 10 $a Camus, Albert $d 1913-1960

001 XX1910518
100 10 $a Camus, Albert $d 1913-1960 $t La peste

Classify

bne:XX1721208 a frbr:Person .
bne:XX1910518 a frbr:Work .

Annotate

bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" .

bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" .

Relate

bne:XX1721208 a frbr:Person ;
    frbr:name "Camus, Albert" ;
    frbr:hasDates "1913-1960" ;
    frbr:isCreatorOf bne:XX1910518 .

bne:XX1910518 a frbr:Work ;
    frbr:title "La Peste" ;
    frbr:isCreatedBy bne:XX1721208 .

Page 34: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Marimba uses the ontology to generate RDF

Page 35: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia

BNE: http://datos.bne.es/resource/XX1718747

Same As:

LIBRIS: http://libris.kb.se/resource/auth/45369
SUDOC: http://www.idref.fr/026774771/id
DNB: http://d-nb.info/gnd/11851993X
DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
VIAF: http://viaf.org/viaf/17220427
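Once the equivalent resources are known, the links themselves are simple to serialize. The sketch below writes the five links from this slide as N-Triples; it assumes that "Same As" stands for owl:sameAs, which the slide does not state explicitly, and same_as_ntriples is a hypothetical helper name.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed meaning of "Same As"


def same_as_ntriples(source, targets):
    """One N-Triples line per link from `source` to each target URI."""
    return [f"<{source}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


links = same_as_ntriples(
    "http://datos.bne.es/resource/XX1718747",
    [
        "http://libris.kb.se/resource/auth/45369",
        "http://www.idref.fr/026774771/id",
        "http://d-nb.info/gnd/11851993X",
        "http://dbpedia.org/resource/Miguel_de_Cervantes",
        "http://viaf.org/viaf/17220427",
    ],
)
print("\n".join(links))
```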

Page 36: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Page 37: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Publication

• Data publication

• Metadata publication using VoID

• To facilitate discovery:

• Register your dataset in CKAN

• Use sitemap4rdf to generate the sitemap

• Upload the sitemap to Google and Sindice

Page 38: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Exploitation

• SPARQL queries

• Web interface: http://bne.linkeddata.es/

Example: count the works (?Obras) of which Cervantes is the author. The subject is the URI of Cervantes and the property is "is author":

SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
  <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
  ?Obras
}
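A Web application would send such a query to a SPARQL endpoint over HTTP. The sketch below only builds the request URL, following the SPARQL protocol convention of passing the query in a "query" parameter of a GET request; it performs no network access. The endpoint address is a placeholder (the slides do not give the endpoint URL), and the variable names in the query are chosen for this example.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # placeholder; the slides do not give the endpoint
IS_CREATOR_OF = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"


def count_works_query(creator_uri):
    """SPARQL 1.1 query counting the works linked to a creator."""
    return (
        "SELECT (COUNT(DISTINCT ?work) AS ?total) WHERE {\n"
        f"  <{creator_uri}> <{IS_CREATOR_OF}> ?work\n"
        "}"
    )


def query_url(endpoint, query):
    """URL of a SPARQL protocol GET request for `query`."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = count_works_query("http://datos.bne.es/resource/XX1718747")
url = query_url(ENDPOINT, query)
print(url)
```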


Page 42: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Technological Support

• Modelling:

• Open Metadata Registry

• NeOn Toolkit

• Mapping and generation:

• MARiMbA: library-oriented; supports and facilitates the entire process of transformation from MARC21 to RDF

• Publication:

• Virtuoso Universal Server

• Pubby

• CKAN registry

• Sitemap4rdf

• Exploitation:

• Web applications that visualize data using SPARQL

Page 43: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Results: datos.bne.es (http://bne.linkeddata.es/)

• Total number of authority records: 4,100,000

• Total number of bibliographic records: 2,390,140

• Total number of RDF triples: 58,053,215

• Number of links (15% of authorities): 587,520

• Linked sources:

• VIAF

• SUDOC (French collective university catalogue)

• GND (authority file of the German National Library)

• LIBRIS (Sweden)

• DBpedia

• Soon: BNF

Page 44: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Page 45: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification: Identification and selection of data sources

• Spanish Geographical Institute

• Spanish Statistical Institute

Page 46: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

• Spanish Geographical Institute

• Multilingual (Spanish, Basque, Galician, Catalan)

• Conceptualization mismatches

• Granularity (scale concept)

• Textual information:

Hydrographic information (Inform. Hidrográfica): reservoir, river, etc.
Transport (Transportes): dual carriageway (Vía desdoblada), railway (Ferrocarril), …
Administrative units (Unidades Administrativas): municipality

• Particularities

• Longitude and latitude

• Spanish Statistical Institute

• Monolingual

• Numerical information

• Particularities

• Geo (textual level) and temporal

Page 47: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


1. Identification and selection of the data sources

IGN-E

Page 48: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Specification: Statistical information

Page 49: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

2. Lightweight Ontology Development

Following the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation, the ontology reuses hydrOntology, SCOVO, FAO Geopolitical, WGS84, GML and Time.

[Figure: ontology network with a legend. Recoverable labels:]

• hydrOntology: hydrographical phenomena (rivers, lakes, etc.)

• FAO Geopolitical ontology: names and international code systems for territories and groups

• WGS84 Geo Positioning (W3C vocabulary): an RDF vocabulary

• GML: ontology for the OGC Geography Markup Language

• SCOVO (statistics ontology): scv:Dimension, scv:Item, scv:Dataset

• W3C Time (time ontology): vocabulary for instants, intervals, durations, etc.

• Other resources shown: UNESCO Thesaurus, EGM / ERM, GeoNames

• Relations: hasStatisticalData, hasLat/Long, hasGeometry, hasLocation/isLocated

Ontology metrics (two columns in the figure, one labelled "reused"):

Classes              33     33
Object Properties    44     44
Data Properties      318    318

Page 50: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

hydrOntology

[Figure: upper level (Nivel superior) and lower level (Nivel inferior) of the ontology.]

Page 51: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Implementation

[Figure: five numbered steps (1 to 5), plus Pellet.]

Page 52: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Geospatial Model

Prefixes:

geoes: http://geo.linkeddata.es/
geo: http://www.w3.org/2003/01/geo/wgs84_pos#

• geo:Point, geoes:ontology/Curve and geoes:ontology/Polygon are each rdfs:subClassOf geoes:ontology/Geometry

• geo:Point has geo:lat and geo:long

• geoes:ontology/Curve is composed by a collection of 2 or more geo:Points

• geoes:ontology/Polygon is composed by a collection of 3 or more geo:Points
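The constraints in this model can be mirrored in ordinary data structures. The sketch below reads the figure as follows: a Curve is composed by 2 or more points and a Polygon by 3 or more, and all three classes specialize Geometry. That reading is an interpretation of the slide's labels, and the Python class names simply echo the ontology terms; nothing here comes from the project's code.

```python
from dataclasses import dataclass
from typing import Tuple


class Geometry:
    """Common superclass, mirroring geoes:ontology/Geometry."""


@dataclass(frozen=True)
class Point(Geometry):
    lat: float   # geo:lat
    long: float  # geo:long


@dataclass(frozen=True)
class Curve(Geometry):
    points: Tuple[Point, ...]

    def __post_init__(self):
        if len(self.points) < 2:
            raise ValueError("a Curve is composed by 2 or more points")


@dataclass(frozen=True)
class Polygon(Geometry):
    points: Tuple[Point, ...]

    def __post_init__(self):
        if len(self.points) < 3:
            raise ValueError("a Polygon is composed by 3 or more points")


corner = Point(lat=38.985, long=-5.498)
triangle = Polygon(points=(corner, Point(38.984, -5.499), Point(38.982, -5.495)))
print(len(triangle.points))
```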

Page 53: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

3. Generation of RDF

• From the data sources

• Geographic information (databases)

• Statistical information (.xls)

• Geospatial information

• Different technologies for RDF generation

• NOR2O (from Excel, XML, text files, …)

• R2O and ODEMapster (from databases)

• geometry2RDF and shp2RDF (for geo data)

Page 54: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

3. Generation of the RDF Data / instances

NOR2O

• PR-NORs (patterns for re-engineering non-ontological resources) define a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements. http://ontologydesignpatterns.org/

• Resources handled:

· Classification schemes
· Thesauri
· Lexicons

• Example: FAO Water classification

· Classification scheme
· Path enumeration data model
· Implemented in a database
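To illustrate what re-engineering a classification scheme stored with a path enumeration model involves, the sketch below turns rows of (path, label) into class hierarchy triples: the parent of a node is its path minus the last segment. This is not NOR2O itself; the base URI and the three sample rows are invented for the example and are not taken from the FAO classification, and the choice of rdfs:subClassOf as the target relation is one possible option.

```python
def path_enumeration_to_triples(rows, base="http://example.org/class/"):
    """Convert (path, label) rows, e.g. ('1.1.1', 'River'), into hierarchy triples."""
    triples = []
    for path, label in rows:
        node = base + path
        triples.append((node, "rdfs:label", label))
        if "." in path:
            # In a path enumeration model the parent is the path without its last segment.
            parent_path = path.rsplit(".", 1)[0]
            triples.append((node, "rdfs:subClassOf", base + parent_path))
    return triples


# Invented sample rows, for illustration only.
sample_rows = [("1", "Water"), ("1.1", "Surface water"), ("1.1.1", "River")]
hierarchy = path_enumeration_to_triples(sample_rows)
for triple in hierarchy:
    print(triple)
```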

Page 55: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

3. Generation of the RDF Data – NOR2O

[Figure: NOR2O applied to a table of the Industry Production Index by Province and Year.]

Page 56: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Geospatial RDF

[Figure: two generation tools, geometry2RDF and shp2RDF.]

Page 57: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Geospatial RDF generation (geometry2RDF)

Oracle SDO_UTIL package

SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry
FROM "BCN200"."BCN200_0301L_RIO" c
WHERE c.Etiqueta = 'Arroyo'

Page 58: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)

Geospatial RDF generation (shp2RDF)

Page 59: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


3. Generation of the RDF Data – Geometry2RDF

Page 60: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Geospatial RDF

Resource: geoes:resource/Embalse/Embalse%20de%20Orellana

• rdf:type geoes:ontology/Embalse (reservoir)

• rdfs:label Embalse de Orellana

• geo:geometry otalex:resource/4e994dad1c44d2b50597dd64ddfbcac30de06d80

Geometry: otalex:resource/4e994dad1c44d2b50597dd64ddfbcac30de06d80

• rdf:type geoes:Polígono (Polygon)

• Points: otalex:resource/38.984222213320045_-5.49938294416971, otalex:resource/wgs84/38.982575823226234_-5.495821779307759, otalex:resource/wgs84/38.98531526569159_-5.498594084713078, otalex:resource/wgs84/…

• geo:lat 38.985 and geo:long -5.498 (shown for one point)

Prefixes:

geoes: http://geo.linkeddata.es/

otalex: http://otalex.linkeddata.es/

geo: http://www.w3.org/2003/01/geo/wgs84_pos#
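The point resources in this graph follow a recognizable URI pattern (latitude and longitude joined by an underscore under otalex:resource/wgs84/). A hedged sketch of how such point triples might be emitted as N-Triples is given here; the function names and the use of xsd:double typed literals are assumptions for illustration, not the behaviour of geometry2RDF itself. Coordinates are passed as strings so that the digits in the URI and in the literal stay identical.

```python
from typing import List

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
# Base for point URIs, expanded from the otalex:resource/wgs84/ pattern.
POINT_BASE = "http://otalex.linkeddata.es/resource/wgs84/"


def point_uri(lat: str, lon: str) -> str:
    """Build the URI of a point as <base><lat>_<long>."""
    return f"{POINT_BASE}{lat}_{lon}"


def point_triples(lat: str, lon: str) -> List[str]:
    """Return the geo:lat and geo:long statements of one point as N-Triples."""
    subject = point_uri(lat, lon)
    return [
        f'<{subject}> <{GEO}lat> "{lat}"^^<{XSD_DOUBLE}> .',   # typed literal is an assumption
        f'<{subject}> <{GEO}long> "{lon}"^^<{XSD_DOUBLE}> .',
    ]


if __name__ == "__main__":
    for line in point_triples("38.982575823226234", "-5.495821779307759"):
        print(line)
```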

Page 61: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Specification

Modelling

RDF

Generation

Publication

Exploitation

Links

Generation

Page 62: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Statistics in RDF

RDF Data Cube: DataSet

• ota:ds/population rdf:type qb:DataSet

• ota:ds/population/Atalaya_2008 rdf:type qb:Observation; qb:dataSet ota:ds/population

• ota:ds/population/Azuaga_2008 rdf:type qb:Observation; qb:dataSet ota:ds/population; otaonto:population 8396; otaonto:geoArea ota:Municipio/Azuaga

• ……

Prefixes:

rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#

qb: http://purl.org/linked-data/cube#

ota: http://otalex.linkeddata.es/resource/
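As a rough illustration of this Data Cube layout, the following sketch builds the statements of one population observation. It mirrors the slide's abbreviated notation (ota:ds/population/<municipality>_<year>) rather than full URIs; the function name and the tuple representation are assumptions, and the otaonto: namespace is left unexpanded because the slide does not give it.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Dataset identifier written in the same abbreviated form as on the slide.
DATASET = "ota:ds/population"


def observation_triples(municipality: str, year: int, population: int) -> List[Triple]:
    """Describe one observation of the population dataset as triples."""
    obs = f"{DATASET}/{municipality}_{year}"
    return [
        (obs, "rdf:type", "qb:Observation"),
        (obs, "qb:dataSet", DATASET),                            # link to the dataset
        (obs, "otaonto:geoArea", f"ota:Municipio/{municipality}"),
        (obs, "otaonto:population", str(population)),            # observed value
    ]


if __name__ == "__main__":
    for triple in observation_triples("Azuaga", 2008, 8396):
        print(" ".join(triple))
```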

Page 63: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


IGN

Same As: Geonames http://www.geonames.org/2521436/

Same As: DBpedia http://dbpedia.org/resource/Azuaga

Specification

Modelling

RDF

Generation

Publication

Exploitation

Links

Generation

http://geo.linkeddata.es/page/resource/Municipio/Azuaga
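A small sketch of how the equivalence links for Azuaga could be serialized as N-Triples. It assumes that "Same As" stands for owl:sameAs and that the linked resource is http://geo.linkeddata.es/resource/Municipio/Azuaga (the non-page form of the URL above); the helper name is hypothetical.

```python
from typing import Iterable, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed meaning of "Same As"


def same_as_triples(resource: str, targets: Iterable[str]) -> List[str]:
    """Return one owl:sameAs N-Triples statement per target URI."""
    return [f"<{resource}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


AZUAGA = "http://geo.linkeddata.es/resource/Municipio/Azuaga"
LINKS = same_as_triples(
    AZUAGA,
    [
        "http://www.geonames.org/2521436/",
        "http://dbpedia.org/resource/Azuaga",
    ],
)

if __name__ == "__main__":
    print("\n".join(LINKS))
```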

Page 64: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Exploitation

Web Interface


Specification

Modelling

RDF

Generation

Publication

Exploitation

Links

Generation

Visualization

Exploitation

map4rdf:

• Google maps viewer of RDF resources

• Resources with spatial information

• Used in other applications like AEMET, GoodRelations, GeoLinked Data, El Viajero…

map4rdf

http://oegdev.dia.fi.upm.es/projects/map4rdf/

SPARQL

Triplestore

Page 65: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


• Simple SPARQL query:

• To get the RDF instances of the Laguna concept and their geometry property.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?x ?geo WHERE {
  ?x rdf:type <http://geo.linkeddata.es/ontology/Laguna> .
  ?x geo:geometry ?geo
}

• Complex SPARQL query:

• To get up to 50 resources with Spanish labels that lie within 0.1 degrees of latitude and longitude (roughly 10 km) of the city of Azuaga.

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?subject ?label ?latitude2 ?longitude2 WHERE {
  <http://geo.linkeddata.es/resource/Municipio/Azuaga> geo:geometry ?g .
  ?g geo:lat ?latitude .
  ?g geo:long ?longitude .
  ?subject geo:geometry ?g2 .
  ?g2 geo:lat ?latitude2 .
  ?g2 geo:long ?longitude2 .
  ?subject rdfs:label ?label .
  FILTER(xsd:double(?latitude2) - xsd:double(?latitude) <= 0.1 &&
         xsd:double(?latitude) - xsd:double(?latitude2) <= 0.1 &&
         xsd:double(?longitude2) - xsd:double(?longitude) <= 0.1 &&
         xsd:double(?longitude) - xsd:double(?longitude2) <= 0.1 &&
         lang(?label) = "es")
}
LIMIT 50
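The four inequalities in the FILTER of the complex query amount to a bounding-box test: a resource matches when both its latitude and its longitude differ from the reference point by at most 0.1 degrees. A minimal Python sketch of the same test follows; the function names and the sample coordinates are hypothetical and only illustrate the logic.

```python
def within_box(lat: float, lon: float, lat2: float, lon2: float, delta: float = 0.1) -> bool:
    """Bounding-box test written with absolute differences."""
    return abs(lat2 - lat) <= delta and abs(lon2 - lon) <= delta


def within_box_as_in_filter(lat: float, lon: float, lat2: float, lon2: float,
                            delta: float = 0.1) -> bool:
    """The same test spelled out as the four inequalities of the FILTER."""
    return (lat2 - lat <= delta and lat - lat2 <= delta
            and lon2 - lon <= delta and lon - lon2 <= delta)


if __name__ == "__main__":
    # Hypothetical coordinates: a reference point and two candidates.
    reference = (38.26, -5.68)
    print(within_box(*reference, 38.30, -5.60))  # inside the box
    print(within_box(*reference, 38.40, -5.68))  # too far north
```

Because the box is defined in degrees, its size in kilometres varies with latitude: 0.1 degrees of latitude is about 11 km, while 0.1 degrees of longitude is shorter at the latitude of Spain.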

Page 66: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Page 67: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


geo.linkeddata.es

Page 68: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


aemet.linkeddata.es

Page 69: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


webenemasuno.linkeddata.es/

Page 70: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Phase BNE IGN AEMET PRISA INE

Modeling

RDF generation

Links generation

Publication

Exploitation

Scovo

Data cube SSN ontology

SIOC DC

map4rdf SPARQL

geometry2rdf NOR2O

sitemap4rdf Pubby

MARiMbA

Silk Silk Silk NOR2O

DNB

VIAF

LIBRIS

DBPEDIA

DBPEDIA

Geonames

Geolinkeddata.es

DBPEDIA

Geolinkeddata.es Geolinkeddata.es

hydrontology

Wgs84

time

CSV parser CSV parser NOR2O

Page 71: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


RDF

Generation

Linking

Data & Knowledge

Visualization

Linked Data

Page 72: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Results

http://datos.bne.es

• Total number of authority records: 4.100.000

• Total number of bibliographic records: 2.390.140

• Total number of RDF triples: 58.053.215

• Links (15% of authority records): 587.520

• Linked sources:

• VIAF

• SUDOC (University Documentation System), France

• GND (authority file of the German National Library), Germany

• LIBRIS, Sweden

• DBPedia

http://webenemasuno.linkeddata.es/

• Total number of guides: 27.876

• Total number of posts: 32.502

• Total number of locations: 6.838

• Total number of RDF triples: 9.462.339

• Links to other sources: 12.750

• DBPedia (6024 links)

• GeoLinkedData (6726 links)

http://geo.linkeddata.es/

• Number of geographic phenomenon types: 95 (rivers, mountains, etc.)

• Number of geo entities: 155.000

• Total number of RDF triples: 21.564.199

• Links: 1002 (outgoing) and 6782 (incoming)

• Linked sources: DBpedia and GeoNames (outgoing); AEMET and El Viajero (incoming)

Page 73: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)


Lessons learnt

• URI

• Follow existing design guidelines for new URIs

• Reuse existing URIs from authoritative sources

• Models

• Reuse existing models when available

• Create new models from authoritative sources

• Do not forget to align your model with existing models

• Generation

• Vertical domains usually require specific tools for generation

• Link

• Generic link discovery tools perform well in vertical domains

• Link to other data sets using

• Equivalence links (sameAs)

• Typed links

• Discovery

• Use sitemap4rdf to allow search engines to find your data

• Use an iterative-incremental life cycle in your development

Typed link example: Person birthPlace Municipality

Equivalence link example: Dbpedia:cervantes sameAs bne:Cervantes

Learn about Linked Data

with UPM official courses in

one week

Page 74: Linked DAta Applications: There is no One-Size-Fits All Formula (Long presentation)
