wp3: data provenance and access control

25
WP3: Data Provenance and Access Control Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg

Upload: hashim

Post on 19-Mar-2016

65 views

Category:

Documents


5 download

DESCRIPTION

WP3: Data Provenance and Access Control. Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg. Extended example scenario. GADM-RDF, Geospecies. Provenance and access control. SSN. ontologies. registry. C-SPARQL/ SPARQL-STR /HTTP. Quality control. relational DBMS. stream DB. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: WP3: Data Provenance and Access Control

WP3: Data Provenance and Access Control

Irini Fundulaki, FORTHDecember 11-12, 2012, Luxembourg

Page 2: WP3: Data Provenance and Access Control

Slide 2 of 25

Extended example scenario

relational DBMS

stream DB stream DB RDF (stream)

DB

CSV twitter

C-SPARQL/SPARQL-STR/HTTP

registry

Qual

ity c

ontro

l

Provenance and access control

ontologiesSSN

GADM-RDF, Geospecies

Page 3: WP3: Data Provenance and Access Control

Slide 3 of 25

First Year Review Comments Collaboration

◦ Provenance and Data Quality (with UPM, FUB for WP2)◦ Collaboration with KIT for a privacy aware access control

framework Real versus syntheric data

◦ PlanetData datasets (GADM-RDF1, GeoSpecies2 enhanced with links from DBPedia using LDSpider), GO3 and CIDOC4 ontologies

Scalability◦ Real datasets with up to 11M triples

1 GADM-RDF http://gadm.geovocab.org/2 GeoSpecies: http://lod.geospecies.org/3 Gene Ontology: http://www.geneontology.org/4 CIDOC-CRM: http://www.cidoc-crm.org/

Page 4: WP3: Data Provenance and Access Control

Slide 4 of 25

Access Control Access Control:

◦ Refers to the ability to allow or deny the use of a particular resource by a particular entity

◦ Crucial for sensitive content since it ensures the selective exposure of information to different classes of users

Access Control Enforcement techniques for RDF data in the presence of RDFS inference:◦ Materialization (forward chaining)◦ Query Rewriting (backward chaining)◦ Static Analysis

Page 5: WP3: Data Provenance and Access Control

Slide 5 of 25

Access Control ModelsStandard access control models associate a

concrete label to a triple to indicate whether the triple can be accessed or not

Built in rule: An implied RDF triple can be accessed if and only if all its implying triples can be accessed

s p o

&a type Student

label

yesStudent sc Person yes

&a type Person yes

s p o label

Page 6: WP3: Data Provenance and Access Control

Slide 6 of 25

Access Control Models In the case of any kind of update, the implied triples &

their labels must be re-computed

An implied RDF triple can be accessed if and only if all its implying triples can be accessed

s p o

&a type Student

label

yesStudent sc Person yesno

&a type Person yes

s p o label

no

the overhead can be substantial in large datasetswhen updates occur frequently

Page 7: WP3: Data Provenance and Access Control

Slide 8 of 25

Our Approach Fine-grained Abstract Access Control Model comprised of abstract

tokens and operators◦ focus at the RDF Triple level◦ focus on permissions for read-only queries◦ labeled triples are encoded as quadruples◦ supports RDFS inference and propagation of labels across the

RDFS hierarchies◦ encodes how the access label of an implied quadruple is

computed◦ concrete policies are used to assign specific values to the

abstract tokens and operators Implementation of a fine-grained, repository independent, portable

across platforms access control framework on top of the MonetDB column store

Page 8: WP3: Data Provenance and Access Control

Slide 9 of 25

Annotating Triples with Access Tokens

s p o

&a type Student

label

l1Student sc Person l2

• l1 , l2 : abstract tokens

Person type l3class

q1:

q2:

q3:

&a type Persons p o label

l1 ⊙ l2 ⊙ : entailment operator records the triples used in the implication Student type class l2 ⊙ l3

q4:

q5:

&a type Person

s p o label l3 : propagation operator q6:

&a fname Alice

s p o label

: default access token q7:

Page 9: WP3: Data Provenance and Access Control

Slide 10 of 25

Annotating Triples with Access TokensAuthorization Rules

A1: (construct {?x lname ?y} where {?x type Student},l1)

A2:(construct {?x sc ?y},l2)

A3:(construct {?x type Student},l3)

Authorizations(Query, abstract token)

A4:(construct {?x type class},l4)

A5:(construct {Student sc ?y},l5)

RDF quadruples

q1:

q2:

q3:

q4:

q5:

Student sc Person

Person sc Agent

&a type Student

&a fname Alice

Agent type class

l2

l2

l3

l4

q6: Student sc Person l5

s p o l

Page 10: WP3: Data Provenance and Access Control

Slide 11 of 25

Computing the labels of implied triples

RDFS Inference: generate new knowledge

RDF quadruples

q1:

q2:

q3:

q5:

Student sc Person

Person sc Agent

&a type Student

Agent type class

l2

l2

l3

l4

q6: Student sc Person l5

s p o lq8:

q9:

q10:

q11:

q12:

Student sc Agent

Student sc Agent

&a type Person

&a type Agent

&a type Agent

l2 ⊙ l2

l5 ⊙ l2

l3 ⊙ l2

(l3 ⊙ l2) ⊙ l2

(l5 ⊙ l2) ⊙ l2

s p o l

(A1, sc, A2, l1) (A2, sc, A3, l2) (A1, sc, A3, l1 ⊙ l2)

(&r1, type, A1, l1)

(A1, sc, A2, l2) (&r1, type, A2, l1 ⊙ l2)

- subClassOf

- typeOf

Page 11: WP3: Data Provenance and Access Control

Slide 12 of 25

Computing the labels of implied triples

RDFS Inference: generate new knowledge

RDF quadruples

q1:

q2:

q3:

q5:

Student sc Person

Person sc Agent

&a type Student

Agent type class

l2

l2

l3

l4

q6: Student sc Person l5

s p o l

q8:

q9:

q10:

q11:

q12:

Student sc Agent

Student sc Agent

&a type Person

&a type Agent

&a type Agent

l2 ⊙ l2

l5 ⊙ l2

l3 ⊙ l2

(l3 ⊙ l2) ⊙ l2

(l5 ⊙ l2) ⊙ l2

s p o l

(A1, sc, A2, l1) (A2, sc, A3, l2) (A1, sc, A3, l1 ⊙ l2)

(&r1, type, A1, l1)

(A1, sc, A2, l2) (&r1, type, A2, l1 ⊙ l2)

- subClassOf

- typeOf

Page 12: WP3: Data Provenance and Access Control

Slide 13 of 25

Computing the propagated labels Propagating labels along the RDFS hierarchies:

no new knowledge created(A1, type, class, l1)

(A2, type, A1, (l1 ))

(A2, type, A1, l2)

RDF quadruples

q1:

q2:

q3:

q5:

Student sc Person

Person sc Agent

&a type Student

Agent type class

l2

l2

l3

l4

q6: Student sc Person l5

s p o lq8:

q9:

q10:

q11:

q12:

Student sc Agent

Student sc Agent

&a type Person

&a type Agent

&a type Agent

l2 ⊙ l2

l5 ⊙ l2

l3 ⊙ l2

(l3 ⊙ l2) ⊙ l2

(l5 ⊙ l2) ⊙ l2

q13: &a type Agent l4

s p o l

Page 13: WP3: Data Provenance and Access Control

Slide 14 of 25

From Abstract Models to Concrete Policies Concrete Policies implement Abstract Model:

◦ Set of Concrete Tokens ◦ Mapping from abstract to concrete tokens◦ Set of concrete operators that implement the abstract

ones◦ Conflict resolution operator to resolve cases where the

same triple is assigned different labels◦ Access Function to decide when a triple is accessible

Page 14: WP3: Data Provenance and Access Control

Slide 15 of 25

Concrete Policy C1 Concrete Tokens: LP = { true, false} Concrete Operators:

◦ Entailment Operator ⊙

◦ Propagation Operator : Identity◦ Conflict Resolution Operator

Access function: triples with label true are accessible, otherwise, inaccessible

l1 l2 if l1 and l2 are different from l1 if l2 is

if l1 , l2 are equal to l1 ⊙ l2 =

false if value false is in Strue if false does not belong in S

otherwise S =

Page 15: WP3: Data Provenance and Access Control

Slide 16 of 25

Concrete Policy C1

q1:

q2:

q3:

q5:

Student sc Person

Person sc Agent

&a type Student

Agent type class

l2

l2

l3

l4

q6: Student sc Person l5

s p o l

q8:

q9:

Student sc Agent

Student sc Agent

l2 ⊙ l2

l5 ⊙ l2

q13: &a type Agent l4

Mapping: l2 true l4 true l5 false l3 true

q1:

q2:

q3:

q5:

Student sc Person

Person sc Agent

&a type Student

Agent type class

true

true

true

true

q6: Student sc Person false

s p o l

q8:

q9:

Student sc Agent

Student sc Agent

true

false

q13: &a type Agent true

Page 16: WP3: Data Provenance and Access Control

Slide 17 of 25

Evaluation Implementation:

◦ Relational Schema: Authorization(auth_id, query, access_token) ExplicitTriples(qid, s, p, o, auth_id): stores explicit triples InferopTriples(qid, s, p, o): stores implicit quadruples PropOp(qid, s, p, o, l): stores the propagated quadruples LabelStore(qid, qid_uses): stores for each implied and

propagated quadruple, the explicit quadruples that are used in its implication;

◦ Query Engine: CWI’s MonetDB Open Source Column Store Stored Procedures are used for the computation of the access

labels

Page 17: WP3: Data Provenance and Access Control

Slide 18 of 25

Benchmark Datasets:

◦ Synthetic Datasets: produced by PowerGen [1]◦ Real Datasets: CIDOC, GO, GeoSpecies and GADM Datasets

Authorizations:◦ 15 authorizations

Concrete Policy:◦ Access Tokens: {low, medium, high}

low < medium < high◦ Mapping:

same triple is assigned different labels and the same label is assigned to different triples ◦5 authorizations - assign to 35% of triples low label◦5 authorizations - assign to 55% of triples medium label◦5 authorizations - assign to 45% of triples high label

◦ Operators: Entailment, Conflict Resolution: min(), Propagation: identity

Page 18: WP3: Data Provenance and Access Control

Slide 19 of 25

ExperimentsExperiment 1: Annotation Time

◦the time required to compute the implied quadruples and the propagated labels

Experiment 2: Evaluation Time ◦the time needed to compute for a concrete

policy, the concrete access label of a percentage of the RDF triples

Page 19: WP3: Data Provenance and Access Control

Slide 20 of 25

Real DatasetsCharacteristics

Experiment 1: Annotation Time

Dataset C P CI PI SC SP Type Explicit Implied Totalgeospecies_plus-dbpedia_no-owl.nt111 40 88868 48145 97 14 127587 2591011 496028 3067278

go_3.5.11.nt 0 0 0 0 55169 0 35451 265355 395517 659054

gadm-rdf-0.94_plus-dbpedia-geo_no-owl.nt26 40 2915 41387 25 0 1856187 11303686 9 11303693

cidoc_crm_v5.0.2.nt 82 260 0 0 94 130 342 3282 451 3714

35451 35451 35451

Page 20: WP3: Data Provenance and Access Control

Slide 21 of 25

Real Datasets: Experiment 2

Page 21: WP3: Data Provenance and Access Control

Slide 22 of 25

Synthetic Datasets: Experiment 1

Annotation time does not depend on the number of explicit (i.e, initial triples in the dataset)

but on the number of produced implied quadruples

Page 22: WP3: Data Provenance and Access Control

Slide 23 of 25

Synthetic Datasets: Experiment 2 Characteristics

Observations ◦ Dataset D2 has a larger complexity than D1 (due to the

larger depth) leading to more complex abstract expressions

Page 23: WP3: Data Provenance and Access Control

Slide 24 of 25

Pros & Cons of Abstract Access Control Models Pros:

◦ Same application can experiment with different concrete policies over the same dataset e.g., liberal vs conservative policies for different classes

of users◦ Different applications can experiment with different

concrete policies for the same data◦ In the case of updates there is no need re-compute the

access labels of the inferred triples Cons:

◦ overhead in the required storage space expressions that describe how a label is computed can

become quite complex depending on the structure of the dataset

Page 24: WP3: Data Provenance and Access Control

Slide 25 of 25

WP3: Work Plan View18 24 30 366 120

Task 3.1ProvenanceManagement

Task 3.2Privacy, DRM and Access Control

Task 3.3Trust management

D 3.4 Trust management and inference system

FORTHFORTH

42D 3.2 Provenance management and propagation through SPARQL queryand update languages

D 3.1 Access control specification language, reasoning and enforcement mechanisms

FORTHFORTH

EPFLEPFL

D 3.3 Access control system andprivacy-aware language

Page 25: WP3: Data Provenance and Access Control

Slide 26 of 25

BackUp Slides