© Copyright 2008 STI INNSBRUCK www.sti-innsbruck.at Towards long-term preservation of linked data The PRELIDA project Dieter Fensel & José M. García 11 th April 2014 – EOD Conference, Innsbruck


Towards long-term preservation of linked data

The PRELIDA project

Dieter Fensel & José M. García

11th April 2014 – EOD Conference, Innsbruck


Short biographies

• Univ.-Prof. Dr. Dieter Fensel
– Professor at the University of Innsbruck, Austria
– Director of Semantic Technology Institute (STI) Innsbruck
– Scientific director and coordinator of more than 70 ICT & IST projects

• Dr. José María García
– University assistant (Postdoc) at the University of Innsbruck, Austria
– Senior researcher at STI Innsbruck
– Participation in several national and EU-funded projects, including WP leadership in PRELIDA and BYTE

• STI Innsbruck (www.sti-innsbruck.at)
– A leading institute for semantic technologies
– Established in Innsbruck, Tyrol, in August 2002, initially as a “Next Web Generation” research group at the University of Innsbruck
– Possesses strong links to local start-ups and SMEs, especially in the areas of tourism and online marketing
– Participating in PRELIDA as leader of roadmapping, scientific event organization and dissemination


Outline

• Public Open Data

• Linked (Open) Data

• PRELIDA: Digital Preservation Linked Data

• What Digital Preservation community can provide

• What Linked Data community needs

• Outlook


Public Open Data


Public Open Data: What is Open Data?


Definitions:

• Open data is non-personally identifiable data produced in the course of an organisation’s ordinary business, which has been released under an unrestricted licence (like the Open Government Licence).

• Open public data is underpinned by the philosophy that data generated or collected by organisations in the public sector should belong to the taxpayers, wherever financially feasible and where releasing it won’t violate any laws or rights to privacy (either for citizens or government staff).

[linkedgov project] http://linkedgov.org


Public Open Data: Features of Open Data

Open Data principles [1]:

1. completeness – all data that can be open (w.r.t. privacy and security) should be open

2. primary source – all open data should be gathered at their source in raw format

3. temporal closeness – all open data should be up-to-date

4. easy access – all open data should be easily accessible

5. machine readability – all open data should be structured for machine processing


[1] Source [Kaltenböck M., Thurner T., (Hg.): Open Government Data Weißbuch, 2011]


Public Open Data: Features of Open Data

Open Data principles [1]:

6. non-discriminating – all open data should be accessible for everyone

7. open standards – all open data should use open standards

8. liberal licensing – all open data should use a liberal licensing without huge obligations for potential users

9. durability – all open data should be available on a long term basis

10. non-discriminating usage costs – some open data might involve usage costs. These should be kept as low as possible.


Public Open Data

• Openness: Open Data is about changing behaviour

• Heterogeneity: Different vocabularies are used

• Interlinking: Need to link these data sets to prevent data silos

• Linked Open Data


Linked (Open) Data


Motivation: From a Web of Documents to a Web of Data

• Web of Documents
• Fundamental elements:
1. Names (URIs)
2. Documents (Resources) described by HTML, XML, etc.
3. Interactions via HTTP
4. (Hyper)Links between documents or anchors in these documents

• Shortcomings:
– Untyped links
– Web search engines fail on complex queries

[Figure: “Documents” connected by hyperlinks]


Motivation: From a Web of Documents to a Web of Data

• Web of Documents: “Documents” connected by hyperlinks

• Web of Data: “Things” connected by typed links


Motivation: From a Web of Documents to a Web of Data

• Web of Data
• Characteristics:
– Links between arbitrary things (e.g., persons, locations, events, buildings)
– Structure of data on Web pages is made explicit
– Things described on Web pages are named and get URIs
– Links between things are made explicit and are typed

[Figure: “Things” connected by typed links]


Google Knowledge Graph

• “A huge knowledge graph of interconnected entities and their attributes”.

Amit Singhal, Senior Vice President at Google

• “A knowledge base used by Google to enhance its search engine’s results with semantic-search information gathered from a wide variety of sources”

http://en.wikipedia.org/wiki/Knowledge_Graph

• Based on information derived from many sources including Freebase, CIA World Factbook, Wikipedia

• Contains about 3.5 billion facts about 500 million objects


Linked Data – a definition and principles

• Linked Data is about the use of Semantic Web technologies to publish structured data on the Web and set links between data sources.


Figure from C. Bizer


Linked Data – a definition and principles

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up (dereference) those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.
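
The four principles can be illustrated with a very small sketch. It assumes a dataset held in memory as plain (subject, predicate, object) tuples; the `example.org` URIs and the `location` predicate are hypothetical and used only for illustration. Nothing is fetched from the network.

```python
from urllib.parse import urlparse

# Hypothetical dataset: things are named with HTTP URIs (principles 1 and 2),
# described with triples (principle 3) and linked to another dataset (principle 4).
TRIPLES = [
    ("http://example.org/id/eod2014",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "EOD Conference 2014"),
    ("http://example.org/id/eod2014",
     "http://example.org/vocab/location",
     "http://dbpedia.org/resource/Innsbruck"),
]


def is_http_uri(value):
    """True if the value looks like a dereferenceable HTTP(S) URI."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def external_links(triples, own_host):
    """Objects that are HTTP URIs hosted somewhere other than own_host."""
    return [o for _, _, o in triples
            if is_http_uri(o) and urlparse(o).netloc != own_host]
```

Calling `external_links(TRIPLES, "example.org")` lists the URIs through which a consumer could discover more data in other datasets.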


5-star Linked (Open) Data

★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★ As ★★, plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (URIs, RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
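
Because each star level builds on the ones below it, the rating can be read as a simple cumulative check. The function below is only a sketch of that reading; its name and boolean parameters are assumptions made for this example, not part of the 5-star scheme itself.

```python
def star_rating(open_license, machine_readable, non_proprietary,
                uses_w3c_standards, linked_to_other_data):
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    stars = 0
    for satisfied in (open_license, machine_readable, non_proprietary,
                      uses_w3c_standards, linked_to_other_data):
        if not satisfied:
            break
        stars += 1
    return stars
```

An openly licensed Excel file would score two stars under this reading, even if it happened to contain links, because the non-proprietary-format level is not met.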


LOD Cloud May 2007

Figure from http://linkeddata.org/


LOD Cloud May 2007

Basics: The Linked Open Data cloud is an interconnected set of datasets, all of which were published and interlinked following the Linked Data principles.

Facts:
• Focal points:
– DBpedia: RDFized version of Wikipedia; many ingoing and outgoing links
– Music-related datasets
• Big datasets include FOAF, US Census data
• Size approx. 1 billion triples, 250k links


LOD Cloud September 2011

Figure from http://linkeddata.org/


LOD Cloud September 2011

Facts:
• 295 data sets
• Over 31 billion triples
• Around 504 million RDF links between data sources


Challenges ahead

• Data economy
– Non-tangible assets (i.e. data) play a significant role in the creation of economic value
– Data generates the potential for many new types of products and services
– Sustainability of services depends on availability of data

• Linked Data community
– needs to preserve the data

• Digital Preservation community
– needs Linked Data for metadata
– faces new challenges posed by Linked Data

• So far, little or no interaction between the two communities


The PRELIDA project


PRELIDA

• PREserving LInked DAta

• EU-funded Coordination and Support Action (FP7)

• Running from 01/2013 to 12/2014

• Consortium
– Consiglio Nazionale delle Ricerche
– Alliance for Permanent Access
– University of Huddersfield
– Universität Innsbruck, STI Innsbruck
– Europeana Foundation
– STI International


PRELIDA – Objectives

Bridge the Linked Data and Digital Preservation communities for

• making the LD community aware of the existing DP results

• making the DP community aware of the challenges posed by LD
– intrinsic features of Linked Data, including their structuring, interlinking, dynamicity and distribution.


PRELIDA – Objectives

• collect, organize and publish use cases related to the long-term access to LD

• create a comprehensive state of the art on LD and DP technologies

• set up a technology observatory

• bring together scientists and stakeholders for identifying relevant challenges and paths for addressing them in the near future

• draw attention of standardization bodies


PRELIDA – Next event

• PRELIDA / ESWC Summer School
• Featuring specific track for Digital Preservation of Linked Data
• From 1st to 6th September 2014
• Venue: Kalamaki, Crete, Greece

• Application is now open!
• Deadline: June 20th

• Several grants available

More info at http://summerschool2014.eswc-conferences.org/


What Digital Preservation community can provide


Digital Preservation - definition

• “Digital preservation refers to the series of managed activities necessary to ensure continued access to digital objects for as long as necessary”

Neil Beagrie and Maggie Jones: “Preservation management of digital materials: The Handbook” (Digital Preservation Coalition, 2008)

• “The goal of digital preservation is, hence, the accurate rendering of authenticated content over time”

Wikipedia

• “The act of maintaining information, Independently Understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long Term”

Open Archival Information System (OAIS)


OAIS Reference model


Digital Preservation - some solutions

• Use file formats based on standards

• Use services of digital archives to store documents for the long-term

• Create and maintain high quality documentation

• Use multiple storage facilities (the LOCKSS method: “Lots Of Copies Keep Stuff Safe”)
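
The idea behind keeping several copies can be sketched as a fixity check: compute a checksum for each copy and flag the copies that disagree with the majority. This is only loosely inspired by the multiple-copies idea and is not the LOCKSS protocol itself; the site names are hypothetical, and the function assumes at least one copy is supplied.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict) -> list:
    """Return the sites whose copy disagrees with the majority digest.

    `copies` maps a (hypothetical) storage site name to the bytes held there.
    """
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, digest in digests.items()
                  if digest != majority_digest)
```

A copy flagged this way could then be repaired from one of the sites that agree with the majority.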


Web archiving – Internet Archive (archive.org)


Web archiving – National Libraries


Digital Preservation – topics relevant for LD

• Object classification and validation
– Rendered vs non-rendered
– Complex vs simple
– Dynamic vs static
– Active vs passive

• Representation information (representation network)
– The information model is key
– Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
– Does not demand that ALL Representation Information be collected at once
– A process which can be tested

• Persistent identifiers
• Audit & Certification / Trustworthy Digital Repositories


What Linked Data community needs


Particularities of Linked Data

• Differences between Linked Data and other types of data with respect to Digital Preservation requirements
– Is it only about reliably storing RDF data?

• Is LD preservation a special case of Web archiving?
• Differences with other special types of data (e.g. multimedia content)
• Functionality on top of RDF dataset (e.g. SPARQL endpoints, inference)
– Must it also be preserved?

• Evolving data
– Versioning
– Other datasets directly or indirectly connected

• Particularities of LD may complicate preservation requirements in terms of stakeholders, rights over data, ownership of an interlinked dataset and ownership of the archived version
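
One way to picture versioning of evolving data is as a delta between two snapshots. The sketch below assumes each snapshot is a collection of (subject, predicate, object) tuples and deliberately ignores blank nodes, which would need extra handling; the function name is an assumption for this example.

```python
def dataset_delta(old_version, new_version):
    """Triples added and removed between two snapshots of a dataset."""
    old, new = set(old_version), set(new_version)
    return {"added": new - old, "removed": old - new}
```

Storing one full snapshot plus such deltas is one possible archiving strategy; whether it suits a given dataset depends on how often and how much the data changes.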


Challenges ahead on LD preservation (I)

• Selection of LD sources
– Which data sources should be preserved?
– When do we stop crawling? -> Same as Web Archival?

• Who is responsible for the preservation?
• Which formats can we distinguish?
• Database approach
• What about Linked Data which is not Open?
– Rights and licenses are the main difference, preservation-wise

• Ownership and authenticity


Challenges ahead on LD preservation (II)

• Storage
– Multiple redundancy to reduce risks
– Trust
– Scalability (as in Web archival)

• Metadata and definitions
– Self-descriptiveness requires preserving the ontologies too
– Provenance and additional information
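
To preserve the ontologies alongside the data, an archive first has to know which vocabularies a dataset uses. A rough sketch, assuming triples are (subject, predicate, object) tuples and that a vocabulary namespace is whatever precedes the last `#` or `/` of a predicate URI (a common convention, not a guarantee):

```python
def namespace_of(uri):
    """Cut the URI after its last '#' or '/' to approximate the namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def vocabularies_used(triples):
    """Sorted list of namespaces of all predicates used in the dataset."""
    return sorted({namespace_of(p) for _, p, _ in triples})
```

The resulting list is a starting point for deciding which ontology documents need to be archived together with the dataset.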


Outlook


Digital preservation Linked Data

• Ultimately, both communities can benefit from each other

• Digital preservation using Linked Data
– Archival metadata

• Linked Data preservation
– Storage redundancy
– Archiving data
– Retrieving functionality
– Many other challenges to cover

• 5-star Preservable Linked Data
– Outcome of PRELIDA midterm workshop of last week
– Upcoming report to be published


PRELIDA Roadmap

• Gap analysis and consolidated state-of-the-art

• Defining a roadmap for long-term preservation of Linked Data

• PRELIDA Summer School
– 1st-6th September 2014
– Kalamaki, Crete, Greece
– More info at http://summerschool2014.eswc-conferences.org/

• Final PRELIDA workshop
– October 2014
– Location TBD (tentatively co-located with ISWC 2014 in Riva del Garda, Italy)

• Stay tuned at http://www.prelida.eu and @PRELIDA_project (Twitter)


Thanks for your attention

Any questions?

PRELIDA is co-funded by the EU Commission under FP7 grant no. 600663


Web archiving – MEMENTO project

• US-funded project

• Web archival with versioning

• Archived content accessible via the original URL
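
In the Memento protocol (RFC 7089), a client asks a TimeGate for the version of an original URL closest to a given time by sending an `Accept-Datetime` header. The sketch below only builds such a request and sends nothing; the TimeGate base URL is hypothetical, and the sketch assumes the common convention of appending the original URL to the TimeGate base. `when` must be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_request(timegate_base, original_url, when):
    """Return (url, headers) for a datetime-negotiation request to a TimeGate."""
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return timegate_base + original_url, {"Accept-Datetime": http_date}
```

A Memento-aware server would be expected to answer with a redirect to an archived version and to report its capture time in a `Memento-Datetime` response header.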


Related initiatives – DIACHRON

• FP7 EU-funded project
• Preserving the Evolving Data Web
• Making Open / Linked Data Diachronic


Public Open Data: What is Open Data?


Definitions:

The idea behind open data is that information held by government should be freely available to use and re-mix by the public. It’s a movement to make non-personal data:

• open so that it can be turned into useful applications
• support transparency and accountability
• make sharing data between public sector partners more efficient.

The Government is committed to making much more public data openly available. On 22 March 2010, the Prime Minister announced that the Government was going to:

“...use digital technology to open up data with the aim of providing every citizen in Britain with true ownership and accountability over the services they demand from government.”

http://www.idea.gov.uk/


Public Open Data: And it works (police.uk)


• Different apps such as „Vehicle Crime & Road Accident Map“, „Crime Sounds“ and „UK Crimeview“ are provided

• The user can get a quick idea about different areas of cities and towns and their crime statistics.


★ Available on the web (+ open license)

• Easy to publish web data

• Data can be easily accessed and stored locally

• Data can be entered manually into another system


Source [Bauer F., Kaltenböck, M.: Linked Open Data: The Essentials, 2012]


★★ Available as structured data

• All benefits from ★

• Data can be directly processed with proprietary software

• Easy to export it into another structured format


★★★ Non-proprietary format is used

• All benefits from ★★

• No need to pay for a format controlled by a single organization


★★★★ Use open standards to identify things

• All benefits from ★★★

• Link to data from anywhere, either on the web or locally

• It can be bookmarked and parts of the data can be reused

• Access to data items can be optimized (caching, load balancing, etc.)

• BUT the publisher needs to identify separable items, assign URIs to each one, and


★★★★★ Data is linked to provide context

• All benefits from ★★★★

• New data of interest can be discovered while consuming other data

• Data schema can be obtained

• Added value to the data

• Linked datasets are discoverable

• BUT resources have to be invested to link datasets


Linked Open Data – silver bullet for data integration

• Linked Open Data can be seen as a global data integration platform
– Heterogeneous data items from different data sets are linked to each other following the Linked Data principles
– Widely deployed vocabularies (e.g. FOAF) provide the predicates to specify links between data items

• Data integration with LOD requires:
1. Access to Linked Data
• HTTP, SPARQL endpoints, RDF dumps
• Crawling and caching
2. Normalize vocabularies – data sets that overlap in content use different vocabularies
• Use schema mapping techniques based on rules (e.g. RIF, SWRL) or query languages (e.g. SPARQL Construct, etc.)
3. Resolve identities – data sets that overlap in content use different URIs for the same real-world entities
• Use manual merging or approaches such as SILK (part of Linked Data Integration Framework) or LIMES
4. Filter data
• Use Sieve (part of Linked Data Integration Framework)
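
Step 3 can be sketched for the simple case where identity links already exist as `owl:sameAs` triples. The code below groups URIs connected by such links with a union-find structure and picks the lexicographically smallest URI of each group as its canonical name. It does not discover links the way SILK or LIMES do; it only consumes links that are already present, and the choice of canonical URI is an arbitrary assumption of this example.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def canonical_uris(triples):
    """Map every URI that occurs in an owl:sameAs link to one canonical URI."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # Keep the lexicographically smaller root as representative.
                low, high = sorted((root_s, root_o))
                parent[high] = low

    return {uri: find(uri) for uri in list(parent)}
```

Rewriting subjects and objects through the returned mapping merges the descriptions of the same real-world entity under a single URI.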


See: http://www4.wiwiss.fu-berlin.de/bizer/ldif/


Example - Mashup: DBPedia Mobile

• Geospatial entry point into the Web of Data.
• It exploits information coming from DBpedia, Revyu and Flickr data.
• It provides a way to explore maps of cities and gives pointers to more information which can be explored


Try yourself: http://wiki.dbpedia.org/DBpediaMobile

Pictures from DBPedia Mobile