Metadata & brokering: a modern approach for INGV RI

Post on 13-Aug-2015

Category:

Engineering

METADATA: a modern approach

Daniele Bailo

CHARACTERS

Leading Actor

Digital Data

Sequence of (digital) symbols
- With a meaning
- Can be stored
- Can be transmitted
- Can be computed

Guest Star

Metadata

Data about Data (really?)

Functions
- Manage data (discovery, selection, etc.)

Issues (selection of)
- What is metadata to me can be data to others
- Many standards
- Ontologies

Actor

Broker(ing system)
Intermediary software

Functions
- Accesses several systems on your behalf
- Collects data for you (integration)

Issues (selection of)
- Performance
- Works better with metadata

Actor

The Triad

A set of 3 elements to fully manage data

Functions
- PID – persistent identifier
- Metadata – discovery & selection
- DO – data of interest

<PID, metadata, DO>

Technical support staff

Data Base

Collection of (organized) Data

Alias
Repository, Data Center etc.

Superpowers
- DBMS (allows definition, creation, querying, update, and administration of databases)

Technical support staff

APIs: Application Programming Interfaces

Standard procedures or instructions to access a service (or function)

Alias
Web service, RESTful service, [thin layer] etc.

Needs
- Standards for requests
- Standards for responses
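As a minimal sketch of what "standards for requests" and "standards for responses" could mean in practice, the snippet below builds a query URL from a fixed set of parameter names and checks a JSON response against a fixed set of required fields. The endpoint URL, the parameter names (`region`, `start`, `end`) and the response fields (`pid`, `format`, `url`) are hypothetical and chosen only for illustration; they do not come from any real service.

```python
import json
from urllib.parse import urlencode

# Hypothetical request standard: the only parameters a client may send.
ALLOWED_PARAMS = {"region", "start", "end"}
# Hypothetical response standard: fields every returned record must carry.
REQUIRED_FIELDS = {"pid", "format", "url"}


def build_request(base_url: str, **params: str) -> str:
    """Return a query URL, rejecting parameters outside the request standard."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def parse_response(body: str) -> list[dict]:
    """Parse a JSON list of records, checking each against the response standard."""
    records = json.loads(body)
    for record in records:
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"record lacks fields: {sorted(missing)}")
    return records


# Example exchange with an invented endpoint and an invented response body.
url = build_request("https://repository-a.example/api/datasets",
                    region="Irpinia", start="1980-01-01")
body = '[{"pid": "pid:0001", "format": "A", "url": "https://repository-a.example/d/1"}]'
records = parse_response(body)
```

When both sides agree on these two contracts, a client (or a broker) can talk to a repository without knowing anything about its internal database.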

Themes
1. Optimization of resources
2. Single point of access… to several databases and services
3. OPEN ACCESS obligations: Berlin Declaration, DPC…
4. Interoperation for data re-use: new multidisciplinary science
5. Citation and data provenance

Comments?

Questions?

SCENARIOS
1. Friendship based discovery
2. Manual discovery
3. Advanced manual discovery
4. Brokering (canonical form)
5. Metadata driven canonical brokering
6. Metadata driven canonical brokering with contextualization

PREMISE
Structured data (standards)

#0 Friendship based discovery
1. Data stored on USB pen drives, CDs etc.
2. Phone calls
3. Emails

Issues
Works well in masonry clubs

#1 Manual discovery

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, Data from Irpinia]

1. User discovers data
2. Repositories do not have web services
3. No metadata (or metadata embedded in the file or directory structure)
4. Manual match & mapping

Issues
Performance, efficiency, error-prone, partial datasets

#2 Advanced manual discovery

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API, Data from Irpinia]

1. User discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Manual match & mapping

Issues
- Performance, efficiency, error-prone
- Some standardization in place

#4 Brokering (canonical form)

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API, Broker, API, Metadata canonical form, Data from Irpinia]

1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Minimal metadata set
4. Minimal match & mapping
5. Multidisciplinary (ontologies)

Issues
- Single AP
- Development and maintenance
- “Hardcoded” metadata
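The brokering scenario above can be sketched as follows: one adapter per repository maps that repository's native records into a single canonical metadata form, and the broker exposes one search function over all of them. This is only an illustration under stated assumptions: the canonical field names, the three native record layouts and the dataset titles are all invented, and the repositories are stood in for by in-memory lists rather than real APIs.

```python
from typing import Callable

# Canonical metadata form used inside the broker (hypothetical field names).
CANONICAL_FIELDS = ("title", "region", "source")

# Hypothetical native records, one layout per repository format.
REPO_A = [{"name": "Strong motion", "area": "Irpinia"}]
REPO_B = [{"Title": "GPS velocities", "Place": "Irpinia"},
          {"Title": "Gas flux", "Place": "Etna"}]
REPO_C = [("Seismic catalogue", "Irpinia")]


def from_a(rec: dict) -> dict:
    """Map a Format A record to the canonical form."""
    return {"title": rec["name"], "region": rec["area"], "source": "A"}


def from_b(rec: dict) -> dict:
    """Map a Format B record to the canonical form."""
    return {"title": rec["Title"], "region": rec["Place"], "source": "B"}


def from_c(rec: tuple) -> dict:
    """Map a Format C record to the canonical form."""
    title, region = rec
    return {"title": title, "region": region, "source": "C"}


# One adapter per repository: the "hardcoded" mapping listed under Issues.
ADAPTERS: list[tuple[list, Callable]] = [
    (REPO_A, from_a), (REPO_B, from_b), (REPO_C, from_c),
]


def broker_search(region: str) -> list[dict]:
    """Single access point: query every repository, answer in canonical form."""
    hits = []
    for records, adapt in ADAPTERS:
        for native in records:
            canonical = adapt(native)
            if canonical["region"] == region:
                hits.append(canonical)
    return hits


results = broker_search("Irpinia")
```

Adding a fourth repository means writing a fourth adapter by hand, which is the development and maintenance cost the slide points at.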

#5 Metadata driven canonical brokering

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API, Broker, API, Metadata catalog, Data from Irpinia]

1. Broker discovers data
2. Access interfaces
3. Full metadata set
4. Advanced match & mapping
5. Multidisciplinary (ontologies)

Issues
- Single AP
- Stored graph metadata
- Huge metadata superset
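One way to picture "stored graph metadata" in a catalog is as a set of (subject, predicate, object) triples that the broker queries instead of contacting each repository. The sketch below is a toy in-memory version under that assumption; the identifiers, predicate names (`format`, `covers`, `heldBy`) and values are hypothetical and do not reflect any particular catalog product or schema.

```python
# Metadata stored as (subject, predicate, object) triples: a tiny graph.
# All identifiers and values below are hypothetical.
TRIPLES = {
    ("dataset:1", "format", "A"),
    ("dataset:1", "covers", "Irpinia"),
    ("dataset:1", "heldBy", "repository:A"),
    ("dataset:2", "format", "B"),
    ("dataset:2", "covers", "Irpinia"),
    ("dataset:2", "heldBy", "repository:B"),
    ("dataset:3", "format", "C"),
    ("dataset:3", "covers", "Etna"),
    ("dataset:3", "heldBy", "repository:C"),
}


def match(subject=None, predicate=None, obj=None):
    """Return the sorted triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def discover(region):
    """Find datasets covering a region and report which repository holds each."""
    found = {}
    for dataset, _, _ in match(predicate="covers", obj=region):
        holders = match(subject=dataset, predicate="heldBy")
        found[dataset] = holders[0][2]
    return found


irpinia = discover("Irpinia")
```

Every new kind of description adds more predicates to the same store, which hints at why the slide lists a "huge metadata superset" as an issue.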

#6 Metadata driven canonical brokering with contextualization

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API, Broker, API, Metadata catalog, Data from Irpinia]

1. Map & match only contextualization metadata
2. Pointers to detailed metadata

#6 Metadata driven canonical brokering with contextualization (continued)

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API. Arrow labels: Generate, Point to]

1. Map & match only contextualization metadata
2. Pointers to detailed metadata
3. Export metadata in any standard

3 layer metadata model
- Discovery (DC) and (CKAN, eGMS)
- Contextual (CERIF metadata model)
- Detailed (community specific)
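The three layers can be sketched as follows, assuming (from the "Generate" and "Point to" arrow labels) that the contextual layer is the one stored in the catalog, that discovery records are generated from it, and that it only points to the detailed, community-specific metadata. The `ContextualRecord` fields are illustrative and are not the CERIF schema; the output keys use Dublin Core element names (`identifier`, `title`, `creator`, `coverage`), and all values and URLs are invented.

```python
from dataclasses import dataclass


@dataclass
class ContextualRecord:
    """Contextual-layer entry (field names are illustrative, not CERIF's)."""
    pid: str
    title: str
    creator: str
    region: str
    detailed_url: str  # pointer to community-specific detailed metadata


def to_dublin_core(rec: ContextualRecord) -> dict:
    """Generate a discovery-layer record using Dublin Core element names."""
    return {
        "dc:identifier": rec.pid,
        "dc:title": rec.title,
        "dc:creator": rec.creator,
        "dc:coverage": rec.region,
    }


def detailed_pointer(rec: ContextualRecord) -> str:
    """Return the 'point to' link; the detailed metadata itself stays remote."""
    return rec.detailed_url


# Hypothetical catalog entry.
record = ContextualRecord(
    pid="pid:0001",
    title="Example dataset",
    creator="Example observatory",
    region="Irpinia",
    detailed_url="https://repository-a.example/metadata/0001",
)
discovery = to_dublin_core(record)
```

Exporting to another discovery standard would then be one more function of the same shape as `to_dublin_core`, without touching the detailed metadata held by each community.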

Question

There is a missing actor.

WHO?

[Diagram. Labels: Dataset, API; Discovery (DC) and (CKAN, eGMS); Contextual (CERIF metadata model); Detailed (community specific)]

<PID, metadata, DO>
1. PID uniquely identifies a Digital Object
2. Metadata provides a description of the Object
3. DO is the Digital Object… to be defined

[Diagram. Labels: Data from Irpinia; <PID, metadata, DO>; request, response]
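A minimal sketch of the triad as a data structure, with a request answered by a list of triads. The `Triad` class, the `REGISTRY` contents, the PID strings and the metadata keys are all hypothetical; the digital object is represented as raw bytes only as a placeholder, since the slide leaves its definition open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Triad:
    pid: str        # persistent identifier
    metadata: dict  # description used for discovery and selection
    do: bytes       # the digital object (placeholder representation)


# Hypothetical registry keyed by PID.
REGISTRY = {
    "pid:0001": Triad("pid:0001", {"region": "Irpinia", "format": "A"}, b"payload-1"),
    "pid:0002": Triad("pid:0002", {"region": "Etna", "format": "B"}, b"payload-2"),
}


def resolve(pid: str) -> Optional[Triad]:
    """Look up the single object a PID identifies, or None if unknown."""
    return REGISTRY.get(pid)


def request(region: str) -> list[Triad]:
    """Select objects by metadata and return them as complete triads."""
    return [t for t in REGISTRY.values() if t.metadata.get("region") == region]


response = request("Irpinia")
```

The PID gives a stable handle for citation, the metadata drives selection, and the object itself travels with both in the response.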

Wrapping up

We need
1. Metadata describing data
2. APIs & web services
3. Defined WS output format
4. PID system
5. Brokering system
6. Metadata catalogue supporting
   1. Ontologies
   2. Contextualization

Q&A

#3 Metadata driven canonical brokering

[Diagram. Legend: data Format A – repository A; data Format B – repository B; data Format C – repository C. Labels: Dataset, API, Broker, API, Metadata catalog, Data from Irpinia]

1. Broker discovers data
2. Repositories have access interfaces (APIs, WS…)
3. Significant metadata set
4. Good match & mapping

Issues
- Development and maintenance
- Single AP
- “Hardcoded” metadata

#4 Metadata driven canonical brokering

[Diagram. Legend: any data format; metadata format A; metadata format B. Labels: Dataset, Broker, catalog, Data from Irpinia]

Issues
1. Predefined tools for matching and mapping
2. Writing software: n conversion algorithms to canonical form
3. Ontologies
4. Multidisciplinary but many formats
5. Good data discovery
6. Not all metadata used

#1 Conventional brokering

[Diagram. Legend: data Format A; data Format B; data Format C. Labels: Dataset, Broker, Data from Irpinia]

Issues
1. Writing software: n*(n-1) conversion algorithms
2. Does not scale in costs of development and maintenance
3. Matching and mapping
4. Works within a restricted research domain
5. “Complex” data discovery

#2 Brokering with canonical form

[Diagram. Legend: data Format A; data Format B; data Format C; canonical Format A. Labels: Dataset, Broker, Data from Irpinia]

Issues
1. Writing software: n conversion algorithms to canonical form
2. Works within a restricted research domain
3. Matching and mapping
4. “Complex” data discovery
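The two converter counts quoted on these slides, n*(n-1) for conventional brokering and n for brokering with a canonical form, can be compared with a few lines of arithmetic. The function names are illustrative; the second count follows the slide in counting only conversions into the canonical form (converting back out as well would double it to 2n).

```python
def pairwise_converters(n: int) -> int:
    """Direct conversion between every ordered pair of n formats."""
    return n * (n - 1)


def canonical_converters(n: int) -> int:
    """One converter per format into the canonical form (the slide's count)."""
    return n


# Print: number of formats, pairwise converters, canonical converters.
for n in (3, 5, 10):
    print(n, pairwise_converters(n), canonical_converters(n))
```

For the three formats A, B and C this gives 6 converters against 3; at ten formats it is 90 against 10, which is why the pairwise approach "does not scale in costs of development and maintenance".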

#3 Metadata driven simple brokering

[Diagram. Legend: any data format; metadata format A; metadata format B. Labels: Dataset, Broker, METADATA, Data from Irpinia]

Issues
1. Good data discovery
2. Predefined tools for matching and mapping
3. Multidisciplinary but many formats
4. Writing software: n*(n-1) conversion algorithms
5. Ontologies

#2 Metadata driven canonical brokering

[Diagram. Legend: any data format; metadata format A; metadata format B. Labels: Dataset, Broker, catalog, METADATA, Data from Irpinia]

Issues
1. Predefined tools for matching and mapping
2. Writing software: n conversion algorithms to canonical form
3. Ontologies
4. Multidisciplinary but many formats
5. Good data discovery
