Why 5-Star Data?

Dr. Sabin-Corneliu Buraga – http://profs.info.uaic.ro/~busaco/

Dr. Sabin Buraga, Faculty of Computer Science, UAIC Iasi, Romania

profs.info.uaic.ro/~busaco/ · slideshare.net/busaco


Posted on 09-Apr-2017



open participation

open data

open software

open app development

open web

open cloud

open (computing) hardware


World Wide Web = “a common information space

in which we communicate by sharing information”

Tim Berners-Lee (2013)


[Diagram: Client (user interface) ⇄ Internet (Web) ⇄ Web application server/framework ⇄ Storage (data persistence)]


URL – Uniform Resource Locator

addressability

for example: http://www.slideshare.net/busaco/presentations/
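To make addressability concrete, the following Python sketch uses the standard library's `urllib.parse.urlsplit` to break the example URL into the parts a client relies on: the protocol (scheme), the host to contact, and the path naming the resource on that host.

```python
from urllib.parse import urlsplit

# The example URL from the slide.
url = "http://www.slideshare.net/busaco/presentations/"

parts = urlsplit(url)
# scheme: which protocol to use
# netloc: which host to contact
# path:   which resource on that host
print(parts.scheme)  # http
print(parts.netloc)  # www.slideshare.net
print(parts.path)    # /busaco/presentations/
```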


HTTP – HyperText Transfer Protocol

access to resources

a browser asks a Web server to provide a resource representation
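The request a browser sends is plain text. The Python sketch below only builds such a message and opens no connection; `build_get_request` is a hypothetical helper, and real browsers send many more headers than the ones shown. The `Accept` header is how a client states which representation(s) it prefers.

```python
def build_get_request(host: str, path: str, accept: str = "text/html") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (a sketch, not a client)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, resource path, protocol version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        f"Accept: {accept}",     # preferred representation(s) of the resource
        "Connection: close",
    ]
    # HTTP uses CRLF line endings; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_get_request("www.slideshare.net", "/busaco/presentations/")
print(request)
```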


HTML, JSON, PDF, PNG, SVG,…

representation(s) of a resource

a Web page includes URLs to other resources: hypermedia
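Hypermedia means that links are part of the representation itself, so a program can discover further resources just by reading it. A minimal sketch with Python's built-in `html.parser.HTMLParser`; the `LinkCollector` class and the page fragment are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of <a> elements: the page's hypermedia links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page fragment, used only for illustration.
page = """<p>See <a href="http://opendefinition.org/">the Open Definition</a>
and <a href="https://creativecommons.org/">Creative Commons</a>.</p>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['http://opendefinition.org/', 'https://creativecommons.org/']
```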


Reusing & sharing data available on the Web

data access via a Web service

usually, by using an API

(Application Programming Interface)
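A Web API usually returns a structured representation such as JSON. In the sketch below the response body is inlined so that the code runs offline; in practice it would be obtained through an HTTP GET (e.g., with `urllib.request.urlopen`). The field names (`author`, `presentations`, `id`, `title`) are assumptions, not those of any real API.

```python
import json

# Hypothetical response body of a Web API; field names are invented.
body = """{
  "author": "busaco",
  "presentations": [
    {"id": 1, "title": "Why 5-Star Data?"}
  ]
}"""

data = json.loads(body)  # JSON text -> Python dicts and lists
titles = [p["title"] for p in data["presentations"]]
print(titles)  # ['Why 5-Star Data?']
```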


Web services, public APIs, mash-ups

www.programmableweb.com


APIs could be described via an open format (see OpenAPI specifications): http://theapistack.com/
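As an illustration of such an open description, here is a minimal, hypothetical OpenAPI 3.0 document built as a Python dictionary and serialized to JSON. The path `/authors/{author}/presentations` and its parameter are invented; only the overall structure (`openapi`, `info`, `paths`, operations with `responses`) follows the specification.

```python
import json

# Hypothetical OpenAPI 3.0 description of a read-only API.
api_description = {
    "openapi": "3.0.3",
    "info": {"title": "Presentations API (example)", "version": "1.0.0"},
    "paths": {
        "/authors/{author}/presentations": {
            "get": {
                "summary": "List the presentations of an author",
                "parameters": [
                    # Path parameters must be marked as required.
                    {"name": "author", "in": "path", "required": True,
                     "schema": {"type": "string"}}
                ],
                "responses": {
                    "200": {"description": "A JSON array of presentations"}
                },
            }
        }
    },
}

# Serialized as JSON, the API description is itself open, machine-readable data.
document = json.dumps(api_description, indent=2)
print(document)
```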


aging…

James Governor (2007)

software ≈ fish, data ≈ wine


open data

“A piece of content or data is open

if anyone is free to use, reuse, and redistribute it.”

http://opendefinition.org/


“If you have access to the data,

then you can achieve continuity

even if you don’t have access to

the underlying source of the application.

Open data is more important than open source. […]

Data persists, open data endures.”

Ian Davis (2009)


legal/technical openness

availability & access

reusing & sharing

universal participation

inter-operability

opendatahandbook.org


Reusing data available on the Web

necessity of adopting a (re)use license


fair use

public domain

copyleft


Creative Commons


openness, transparency, respect

https://creativecommons.org/


Data availability

on the Web

as an “opaque” document

(usually, in a proprietary format)

that does not refer, via current Web technologies,

to other resources of interest

Tom Heath (2007)


Data availability

in the Web

assuring discoverability via hypermedia

uses open data models/formats

(e.g., HTML, XML, JSON, CSV, RDF, etc.)

platform independent

Tom Heath (2007)


Can we evaluate the openness of data?


5 ★ Open Data

Tim Berners-Lee (2009)


1-star data

the content is available on the Web, in any format,

under an open license

http://opendefinition.org/licenses/


users can view, print, locally store,

and possibly modify the document

the document itself can be shared on the Internet


a PDF containing a scanned image ☹


the document can easily be published on the Web

to reuse the data stored in the document,

additional processing might be necessary


2-star data

additionally, the content must be available

as structured data (e.g., relations between entities)


users can process the document by using, in most cases,

a proprietary software application

the document can be exported

into another (structured) format


a proprietary format

containing structured data ☹


the document can easily be published on the Web

data is still “locked” in the document, and

processing depends on a specific application


3-star open data

using an open (non-proprietary) format

to make data available on the Web
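A minimal sketch of 3-star publishing with Python's standard `csv` module: the same rows are written as CSV text and read back, with no proprietary software involved. The rows and column names are illustrative assumptions.

```python
import csv
import io

# Illustrative rows; column names are assumptions.
rows = [
    {"title": "Why 5-Star Data?", "speaker": "Sabin-Corneliu Buraga",
     "start": "2016-05-08T11:45", "end": "2016-05-08T12:45"},
]

# Write CSV (an open, plain-text format) into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "speaker", "start", "end"])
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Any CSV-aware tool, on any platform, can read the same text back.
parsed = list(csv.DictReader(io.StringIO(text)))
```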


the same content as an HTML5 document ☺


<section class="timeslot" lang="en">
  <div class="timeslot-label">
    <time class="start-time" datetime="2016-05-08T11:45">11<span>45</span></time>
    <time class="end-time" datetime="2016-05-08T12:45">12<span>45</span></time>
  </div>
  <p class="title">Why 5-Star Data?</p>
  <p class="speaker">Sabin-Corneliu Buraga</p>
</section>

the class names denote a certain meaning from

the point of view of the document’s author
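Because HTML is an open format, any tool can extract the data. The sketch below, a hypothetical `TimeslotParser` built on Python's `html.parser`, recovers the title, the speaker, and the times. It repeats the timeslot markup inline, with `datetime` values in the `YYYY-MM-DDTHH:MM` form expected by the HTML `<time>` element, which `datetime.fromisoformat` can parse. Note that the parser has to know the author's own class names (`start-time`, `title`, and so on) in advance.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeslotParser(HTMLParser):
    """Extracts values from timeslot markup, keyed by the author's class names."""

    def __init__(self):
        super().__init__()
        self.times = {}         # class of <time> -> value of its datetime attribute
        self.fields = {}        # class of <p>    -> text content
        self._current_p = None  # class of the <p> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time" and "datetime" in attrs:
            self.times[attrs.get("class")] = attrs["datetime"]
        elif tag == "p":
            self._current_p = attrs.get("class")

    def handle_endtag(self, tag):
        if tag == "p":
            self._current_p = None

    def handle_data(self, data):
        if self._current_p:
            self.fields[self._current_p] = self.fields.get(self._current_p, "") + data


markup = """<section class="timeslot" lang="en"><div class="timeslot-label">
<time class="start-time" datetime="2016-05-08T11:45">11<span>45</span></time>
<time class="end-time" datetime="2016-05-08T12:45">12<span>45</span></time></div>
<p class="title">Why 5-Star Data?</p><p class="speaker">Sabin-Corneliu Buraga</p>
</section>"""

parser = TimeslotParser()
parser.feed(markup)
start = datetime.fromisoformat(parser.times["start-time"])
end = datetime.fromisoformat(parser.times["end-time"])
print(parser.fields["title"], (end - start).total_seconds() / 60)  # Why 5-Star Data? 60.0
```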


data can be managed (viewed, processed, filtered,

converted, shared, reused, etc.) in any manner

important aspect: platform independence


the document is still rather simple to publish on the Web

exporting data into a proprietary format

could be problematic


4-star open data

each “thing” (entity) of interest from the document

is denoted by a Web address – URL


data, information, and knowledge are identified via URLs

in order to be accessed and (re)used

RDF (Resource Description Framework) model: W3C standards

www.w3.org/standards/semanticweb/
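The RDF model can be sketched without any library: a graph is a set of (subject, predicate, object) triples whose terms are URLs (IRIs) or literals. The Python sketch below serializes a tiny graph in the W3C N-Triples syntax. The RDF and FOAF vocabulary IRIs are real; the person IRI `http://example.org/people#busaco` is a hypothetical placeholder, and only simple literals (no newlines) are handled.

```python
# Well-known vocabulary IRIs (RDF and FOAF).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical IRI identifying the person.
person = "http://example.org/people#busaco"

# An RDF graph: a set of (subject, predicate, object) triples.
# Literal objects are marked with a ("literal", text) tuple; other objects are IRIs.
triples = {
    (person, RDF_TYPE, FOAF + "Person"),
    (person, FOAF + "name", ("literal", "Sabin Buraga")),
    (person, FOAF + "homepage", "http://profs.info.uaic.ro/~busaco/"),
}


def term(value):
    """Serialize an IRI or a simple literal in N-Triples syntax."""
    if isinstance(value, tuple):  # ("literal", text)
        escaped = value[1].replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(graph):
    # One triple per line, each ending with " ."; sorted for a stable output.
    return "\n".join(f"<{s}> <{p}> {term(o)} ." for s, p, o in sorted(graph, key=str))


print(to_ntriples(triples))
```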


machine-friendly RDF assertions ☺

<!-- the thing identified by ‘busaco’ is a person -->

<div resource="#busaco" typeof="foaf:Person">
  <a property="foaf:homepage" href="..."><span property="foaf:name">Sabin Buraga</span></a>
</div>


machine-friendly RDF assertions ☺

towards classes of things: presentations, persons, organizations, ...

things, not strings


content publishing could be much more difficult, requiring

the adoption of Semantic Web (Web of Data)

technologies, tools, and methodologies

data in the Web: long-term implications


5-star open data

additionally, data is inter-connected to other

datasets, according to the linked data initiative

linkeddata.org
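One common way to interlink datasets is an `owl:sameAs` link stating that two URLs denote the same thing, so a consumer can merge what different publishers say about it. The Python sketch below follows such links across two hypothetical datasets; all `example.org`/`example.net` IRIs and the `describe` helper are invented for illustration, and only the `owl:sameAs` IRI is real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two small, hypothetical datasets published by different parties.
dataset_a = [
    ("http://example.org/city/iasi", "http://example.org/vocab#hasUniversity",
     "http://example.org/org/uaic"),
    ("http://example.org/city/iasi", OWL_SAME_AS, "http://example.net/place/42"),
]
dataset_b = [
    ("http://example.net/place/42", "http://example.net/vocab#country", "Romania"),
]


def describe(resource, datasets):
    """Collect (predicate, object) pairs about a resource, following owl:sameAs links."""
    graph = [t for ds in datasets for t in ds]
    seen, queue, facts = set(), [resource], []
    while queue:
        current = queue.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in graph:
            if p == OWL_SAME_AS and s == current:
                queue.append(o)  # same thing, described in another dataset
            elif p == OWL_SAME_AS and o == current:
                queue.append(s)  # owl:sameAs is symmetric
            elif s == current:
                facts.append((p, o))
    return facts


facts = describe("http://example.org/city/iasi", [dataset_a, dataset_b])
print(facts)
```

Neither dataset alone knows both facts; the link lets a consumer combine them.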


inter-connecting open datasets ☺

graphofthings.org


possibility to discover other (related) data of interest

while consuming the data: network effect

another advantage: Web-based automatic reasoning
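As a small taste of automatic reasoning, the sketch below applies two RDFS entailment rules, rdfs9 (an instance of a class is also an instance of its superclasses) and rdfs11 (`rdfs:subClassOf` is transitive), until no new triples appear. Prefixed names stand in for full IRIs, and `ex:Lecturer` and `ex:busaco` are hypothetical; a real reasoner supports many more rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical schema and instance data, using prefixed names for brevity.
triples = {
    ("ex:Lecturer", SUBCLASS, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("ex:busaco", RDF_TYPE, "ex:Lecturer"),
}


def infer_types(graph):
    """Apply rdfs9 and rdfs11 repeatedly until a fixed point is reached."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 == SUBCLASS and o == s2:
                    if p == RDF_TYPE:    # rdfs9: type propagates to superclass
                        new.add((s, RDF_TYPE, o2))
                    elif p == SUBCLASS:  # rdfs11: subclass transitivity
                        new.add((s, SUBCLASS, o2))
        if new <= inferred:
            return inferred
        inferred |= new


result = infer_types(triples)
print(sorted(o for s, p, o in result if s == "ex:busaco" and p == RDF_TYPE))
# ['ex:Lecturer', 'foaf:Agent', 'foaf:Person']
```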


difficulties:

assuring data/knowledge consistency

problems related to slow adoption


5stardata.info

Michael Hausenblas (2012)


★ make your stuff available on the Web (whatever format) under an open license

★★ make it available as structured data, e.g., Excel instead of an image scan of a table

★★★ use non-proprietary formats, e.g., CSV (Comma Separated Values) instead of Excel

★★★★ use Web addresses (URLs) to denote things, so that people can point at your stuff

★★★★★ link your data to other data (see http://datahub.io/) to provide context

Ed Summers (2010)
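Since the scheme is cumulative (each star presupposes the previous ones), a rating can be sketched as counting the criteria up to the first unmet one. The `Dataset` fields below are a hypothetical simplification; judging each criterion for a real dataset still requires human assessment.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (fields are illustrative)."""
    open_license: bool  # ★     on the Web, under an open license
    structured: bool    # ★★    structured data
    open_format: bool   # ★★★   non-proprietary format (e.g., CSV)
    uses_uris: bool     # ★★★★  URLs denote things
    linked: bool        # ★★★★★ linked to other data


def stars(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [ds.open_license, ds.structured, ds.open_format, ds.uses_uris, ds.linked]
    count = 0
    for met in criteria:
        if not met:
            break
        count += 1
    return count


excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(stars(excel_sheet), stars(csv_file))  # 2 3
```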


Several real-life examples?


augmenting the current Web search activities

via HTML5 + schema.org + RDFa (rdfa.info)


access to public datasets ☺


Academic Torrents: http://academictorrents.com/

Awesome Public Datasets: https://github.com/caesar0301/awesome-public-datasets

Awesome JSON Datasets: https://github.com/jdorfman/awesome-json-datasets

Common Crawl: http://commoncrawl.org/the-data/

DataHub: https://datahub.io/dataset


DBpedia.org: a crowd-sourced community effort

to extract structured information from Wikipedia

so that it can be “intelligently” processed by software
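DBpedia can be queried with SPARQL over HTTP. The sketch below only builds a request URL for DBpedia's public endpoint (`https://dbpedia.org/sparql`), following the SPARQL 1.1 Protocol convention of passing the query in the `query` parameter; it sends nothing. The resource IRI assumes the Wikipedia article "Linked data", and `dbo:abstract` is a DBpedia ontology property.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# DBpedia's public SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for the English abstract of the (assumed) resource describing "Linked data".
query = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Linked_data> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}"""

# SPARQL 1.1 Protocol: the query travels in the "query" parameter of a GET request.
url = ENDPOINT + "?" + urlencode({"query": query})
print(url)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]

# To run it, one would send a GET to `url` (e.g., with urllib.request), with an
# Accept header such as application/sparql-results+json.
```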


Wikidata.org – a free knowledge base that can be read

and edited by both humans & machines


open e-government: visualizing + comparing quality indicators

(license, formats, availability, metadata) regarding open datasets

opendatamonitor.eu


“Software – as a service or not – is just a container.

What makes software valuable has always been what

it does to data. Now, in the same spirit of SOA (Service

Oriented Architecture) and SaaS (Software As A Service),

a new concept is emerging, Data-as-a-Service – DaaS.”

Pete Soderling (2010)
