WP3: Data Provenance and Access Control

Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou, FORTH
September 9-10, 2013, Heraklion


Page 1: WP3: Data Provenance and Access Control

WP3: Data Provenance and Access Control

Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou, FORTH, September 9-10, 2013, Heraklion

Slide 2

Presentation Outline

WP3 status and outline
◦ Research achievements
D3.2 status
Review comments
Health use case description
Demo
Next steps (on demo)

Slide 3

WP3: Work Plan View

Timeline (Gantt chart): project months 0 to 42
Task 3.1 Provenance Management (FORTH)
Task 3.2 Privacy, DRM and Access Control (FORTH, KIT)
Task 3.3 Trust Management (EPFL)
Deliverables:
◦ D3.1 Access control specification language, reasoning and enforcement mechanisms
◦ D3.2 Provenance management and propagation through SPARQL query and update languages
◦ D3.3 Access control system and privacy-aware language
◦ D3.4 Trust management and inference system

Slide 4

Research So Far (Outline)

Abstract models for access control (FORTH)
Abstract models for provenance (FORTH)
◦ Provenance for SPARQL query
◦ Provenance for SPARQL update
Privacy (KIT)
◦ Privacy in smart grids (not integrated)
◦ Some integration in the demo
Problems (non-critical), to be discussed
Trust (EPFL)

Slide 5

Access Control

The selective exposure of information to different users/roles

Useful for applications involving sensitive information

In the context of LOD:
◦ Encourages publication of data that may include sensitive information
Standard approach:
◦ Data are annotated with specific tags that determine whether they should be accessible to specific users/roles
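Below is a minimal sketch of this standard tag-based approach. It assumes that each triple is tagged directly with the set of roles allowed to see it. The role names, the `ex:` identifiers and the helper `visible_to` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical annotated store: each triple carries the set of roles allowed to see it.
ANNOTATED = {
    Triple("ex:sally", "foaf:name", "Sally Berry"): {"doctor", "insurer"},
    Triple("ex:sally", "ex:hasDiagnosis", "ex:breastCarcinoma"): {"doctor"},
}


def visible_to(store: dict[Triple, set[str]], role: str) -> set[Triple]:
    """Return only the triples whose tags grant access to the given role."""
    return {t for t, roles in store.items() if role in roles}


print(sorted(visible_to(ANNOTATED, "insurer")))
```

Each tag fixes accessibility per triple and per role. Changing who may see what therefore means re-tagging the data, which is the limitation that the abstract labels of the next slide address.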

Slide 6

Abstract Labels
Triples are associated with abstract labels
A set of abstract tokens (a1, a2, …)
◦ Explicit triples are associated with such tokens via authorizations
Abstract operators (⊙, , )
◦ a1 ⊙ a2: the triple was derived via inference from triples with labels a1, a2
◦ a1: the triple was derived via propagation from a triple with label a1
◦ a1 a2: the triple was derived in two different ways, one via a1 and one via a2 (e.g., two different authorizations)
◦ a1 (a2 ⊙ ( a3)): …
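Abstract labels can be read as expression trees over tokens. The minimal Python sketch below encodes them as nested tuples. The tags `"infer"` (for ⊙), `"prop"` (propagation) and `"alt"` (alternative derivations) and the helpers `tokens` and `show` are hypothetical names chosen for illustration.

```python
from typing import Union

# Hypothetical encoding of abstract labels as nested tuples:
#   "a1"                 an abstract token (assigned via an authorization)
#   ("infer", l1, l2)    l1 ⊙ l2: inferred from triples labelled l1 and l2
#   ("prop", l)          obtained by propagation from a triple labelled l
#   ("alt", l1, l2)      obtained in two different ways, via l1 and via l2
Label = Union[str, tuple]


def tokens(label: Label) -> set[str]:
    """Collect the abstract tokens (authorizations) that a label depends on."""
    if isinstance(label, str):
        return {label}
    _op, *args = label
    return set().union(*(tokens(a) for a in args))


def show(label: Label) -> str:
    """Render a label; 'prop' and 'alt' are placeholder operator names."""
    if isinstance(label, str):
        return label
    op, *args = label
    if op == "infer":
        return f"({show(args[0])} ⊙ {show(args[1])})"
    return f"{op}({', '.join(show(a) for a in args)})"


example = ("alt", "a1", ("infer", "a2", ("prop", "a3")))
print(show(example))            # alt(a1, (a2 ⊙ prop(a3)))
print(sorted(tokens(example)))  # ['a1', 'a2', 'a3']
```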

Slide 7

Determining Accessibility

Concrete policy
◦ Associate tokens with concrete values
◦ Associate operators with concrete operations
◦ Determine whether the final value corresponds to an accessible triple (access function)
Example
◦ a1=1, a2=2, a3=3
◦ ⊙=min, =max, =ID function
◦ Accessible iff result >1
◦ a1 (a2 ⊙ ( a3)) evaluates to 2 (i.e., the triple is accessible)
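The worked example can be checked mechanically. The minimal Python sketch below makes two assumptions. Labels use a hypothetical tuple encoding in which `"infer"` stands for ⊙, `"prop"` for propagation and `"alt"` for alternative derivations. A hypothetical `ConcretePolicy` record bundles the token values, the concrete operations and the access function.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical label encoding: "a1" | ("infer", l1, l2) | ("prop", l) | ("alt", l1, l2)
Label = Union[str, tuple]


@dataclass(frozen=True)
class ConcretePolicy:
    values: dict[str, int]               # token -> concrete value
    infer: Callable[[int, int], int]     # concrete operation for ⊙
    prop: Callable[[int], int]           # concrete operation for propagation
    alt: Callable[[int, int], int]       # concrete operation for alternative derivations
    access: Callable[[int], bool]        # access function on the final value


def evaluate(label: Label, policy: ConcretePolicy) -> int:
    """Evaluate an abstract label bottom-up under a concrete policy."""
    if isinstance(label, str):
        return policy.values[label]
    op, *args = label
    vals = [evaluate(a, policy) for a in args]
    if op == "infer":
        return policy.infer(*vals)
    if op == "prop":
        return policy.prop(*vals)
    if op == "alt":
        return policy.alt(*vals)
    raise ValueError(f"unknown operator: {op}")


def accessible(label: Label, policy: ConcretePolicy) -> bool:
    return policy.access(evaluate(label, policy))


# The slide's example: a1=1, a2=2, a3=3; ⊙=min, alternatives=max, propagation=identity
policy = ConcretePolicy(
    values={"a1": 1, "a2": 2, "a3": 3},
    infer=min,
    prop=lambda v: v,
    alt=max,
    access=lambda v: v > 1,
)
label = ("alt", "a1", ("infer", "a2", ("prop", "a3")))
print(evaluate(label, policy), accessible(label, policy))  # 2 True
```

The evaluation is max(1, min(2, 3)) = 2, and 2 > 1, so the triple is accessible, as stated in the example.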

Slide 8

SPARQL Query Provenance
What is the provenance of the result of a complex SPARQL query?
Adapting relational solutions
◦ Positive fragment (semirings): works fine
◦ Non-monotonic fragment (m-semirings): problems with OPTIONAL and DIFFERENCE (different semantics than in SQL)
Two alternative approaches
◦ m-semirings: translation to SQL
◦ spm-semirings: a new operation (and the corresponding properties) to capture the provenance of OPTIONAL and DIFFERENCE
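For the positive fragment, the well-known relational technique annotates each input with a token and combines annotations with semiring operations. A join multiplies annotations, while union and projection add them; here the provenance polynomials (the N[X] semiring) are used. The minimal Python sketch below applies this technique to a toy graph. The triples, tokens and helper names are hypothetical. The sketch does not cover OPTIONAL or DIFFERENCE, which is exactly where the non-monotonic problems above arise.

```python
from collections import Counter
from itertools import product

# Provenance polynomials (N[X]): a Counter mapping a monomial, written as a
# sorted tuple of token names, to its integer coefficient.


def var(x: str) -> Counter:
    return Counter({(x,): 1})


def plus(p: Counter, q: Counter) -> Counter:
    """Semiring addition: alternative derivations (union, projection)."""
    return p + q


def times(p: Counter, q: Counter) -> Counter:
    """Semiring multiplication: joint use of inputs (join)."""
    r: Counter = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        r[tuple(sorted(m1 + m2))] += c1 * c2
    return r


def render(p: Counter) -> str:
    return " + ".join((f"{c}·" if c != 1 else "") + "·".join(m) for m, c in sorted(p.items()))


# Hypothetical annotated triples: each triple gets its own provenance token.
TRIPLES = {
    ("ex:alice", "ex:knows", "ex:bob"): var("x1"),
    ("ex:bob", "ex:name", "Bob"): var("x2"),
    ("ex:alice", "ex:knows", "ex:carol"): var("x3"),
    ("ex:carol", "ex:name", "Bob"): var("x4"),
}

# SELECT ?n WHERE { ex:alice ex:knows ?p . ?p ex:name ?n }, evaluated by hand:
# each join contributes a product; projecting away ?p sums the duplicates.
result: dict[str, Counter] = {}
for (s1, p1, o1), k1 in TRIPLES.items():
    if (s1, p1) != ("ex:alice", "ex:knows"):
        continue
    for (s2, p2, o2), k2 in TRIPLES.items():
        if s2 == o1 and p2 == "ex:name":
            result[o2] = plus(result.get(o2, Counter()), times(k1, k2))

print(render(result["Bob"]))  # x1·x2 + x3·x4
```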

Slide 9

SPARQL Update Provenance

What is the provenance of a new triple, inserted via a complex SPARQL Update?

Similar to CONSTRUCT (query), but still different
◦ CONSTRUCT creates new triples but does not modify the dataset
◦ Updates explicitly specify the named graph in which to put the new triple(s); triples with different provenance may end up in the same named graph
Named graphs alone are not sufficient for capturing the provenance of updates
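The minimal Python sketch below illustrates the last point with two different updates that insert triples into the same named graph. The graph name is identical for both triples, so any per-triple provenance has to be recorded separately. The identifiers, the `insert` helper and the free-text provenance records are hypothetical stand-ins for real provenance expressions.

```python
from collections import defaultdict

# Hypothetical quad store: (s, p, o, graph) quads plus a per-triple provenance log.
quads: set[tuple[str, str, str, str]] = set()
provenance: dict[tuple[str, str, str], list[str]] = defaultdict(list)


def insert(triple: tuple[str, str, str], graph: str, derived_from: str) -> None:
    """Record an INSERT into `graph`, remembering where the triple came from."""
    quads.add((*triple, graph))
    provenance[triple].append(derived_from)


# Two different updates put triples into the same named graph.
t1 = ("ex:sally", "ex:hasObservation", "ex:obs1")
t2 = ("ex:sally", "ex:eligibleFor", "ex:benefit")
insert(t1, "ex:target", derived_from="update U1 over graph ex:hospital")
insert(t2, "ex:target", derived_from="update U2 over graph ex:bcaf")

# The graph name alone cannot tell the two triples apart...
print({g for *_, g in quads})             # {'ex:target'}
# ...whereas the per-triple record can.
print(provenance[t1] != provenance[t2])   # True
```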

Slide 10

D3.2 Status

Contents of D3.2
◦ Abstract models for provenance (very similar to the abstract models for access control)
◦ Provenance for SPARQL query results
◦ Provenance for SPARQL update (inserted triples)
Review version uploaded on the wiki on 05/09/13
◦ http://wiki.planet-data.eu/web/D3.2
◦ Only one reviewer at the moment (Oscar)

Volunteers?

Slide 11

Review Comments

Reviewers generally happy (“impressed by D3.1”)
Applicability
◦ Usefulness: convince industry to look into it
◦ Focus on a real-world use case to demonstrate value
In a nutshell
◦ Some implementation to show value
Solution: demo (use case)
◦ Health use case
◦ Also suitable for showing synergy

Slide 12

Health Use Case

A use case to show applicability and usefulness
◦ In collaboration with the Computational Medicine Laboratory (CML) of FORTH
Health-related data are sensitive
Proposed by the reviewers (Anders Tornquist)
◦ Insurance companies need controlled access to sensitive medical data to determine premiums, insurance policies, contract terms, etc.
Relevant to access control/privacy challenges
◦ But also related to streaming, data quality and trust

Slide 13

Personal Health Record

Personal Health Record (PHR)
◦ Collection of data regarding a patient: diseases, personal information, medications, clinical observations and findings, measurements, …
Properties
◦ Sensitive
◦ Dynamic, sometimes streaming
◦ Not always of good quality

Slide 14

Relation to Other WPs
Relation to WP1
◦ Part of the PHR data may be streaming in nature, e.g., vital-sign measurements of hospitalized patients
Relation to WP2
◦ Data often of poor quality
◦ Up to 26.9% of the data can be erroneous (data provided by the patient, faulty readings, sensors, etc.)
Suggestion (for the review)
◦ Outline how the technologies developed in WP1 and WP2 could potentially be used to address these issues
◦ Specific and concrete, but no implementation needed

Slide 15

Access Control and Privacy

PHR (normally) accessible only by the patient
◦ Sensitive data
Doctors, nurses, hospitals, insurance companies and public services may require access
Informed Consent
◦ The patient allows access to (parts of) their PHR to specific entities, for a specific purpose, within a specific timeframe, etc.
Via Consent Forms
◦ Formal, legal documents
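A consent form can be viewed as a set of grants. Each grant names an entity, a purpose, the parts of the PHR concerned and a timeframe. The minimal Python sketch below implements such a check under that assumption. The `Consent` record, its field names, the PHR sections and the example values are hypothetical, not the project's schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Consent:
    entity: str              # who may access, e.g., an insurer or a fund
    purpose: str             # for which purpose
    sections: frozenset      # which parts of the PHR
    valid_from: date         # start of the consented timeframe
    valid_to: date           # end of the consented timeframe (inclusive)


def allowed(consents: list[Consent], entity: str, purpose: str,
            section: str, on: date) -> bool:
    """Grant access only if some consent matches entity, purpose, section and date."""
    return any(
        c.entity == entity
        and c.purpose == purpose
        and section in c.sections
        and c.valid_from <= on <= c.valid_to
        for c in consents
    )


# Hypothetical consent given by a patient for a benefit application.
consents = [
    Consent("BCAF", "benefit application", frozenset({"diagnoses"}),
            date(2013, 9, 1), date(2013, 12, 31)),
]
print(allowed(consents, "BCAF", "benefit application", "diagnoses", date(2013, 9, 10)))    # True
print(allowed(consents, "BCAF", "benefit application", "medications", date(2013, 9, 10)))  # False
```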

Slide 16

Objectives

We will use this use case to demonstrate the benefits of our approach

Different entities have access to the same data without accessing sensitive information
◦ Unless the owner of the data has explicitly allowed it (via the consent form)
Without replication

Slide 17

Health Use Case Setting

Dataset (collection of PHRs)

Slide 18

Architecture (Data Access)
User interface: authentication, queries
PACEM API
AUTH Module (AUTH API) over the AUTH DB
◦ User credentials for authentication
CPRP Module (CPRP API) over the CPRP DB
◦ Purpose and role hierarchy
◦ Assignment of concrete policies to accessing entities
SPARQL to SQL Translation Module
AC API: Annotation Module, Evaluation Module, Update Module
MonetDB (abstract expressions DB)
◦ Stores triples (s, p, o) together with their abstract labels, e.g., l1 ⊙ l2, l2 ⊙ l3 (example triples: Student type class, &a type Person)
Data flows shown: user request (accessing entity, SPARQL query); SPARQL; SQL, concrete policy; result (triples)
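The request path through these components can be sketched end to end. The minimal Python sketch below rests on several simplifying assumptions. An in-memory SQLite database stands in for MonetDB, and dictionaries stand in for the AUTH and CPRP databases. The SPARQL query is reduced to a single triple pattern, and labels are single tokens rather than full abstract expressions. All names are illustrative and none belong to the PACEM API.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical stand-ins for the AUTH DB, the CPRP DB and the annotated triple store.
CREDENTIALS = {"bcaf": "s3cret"}                   # AUTH: user -> password
POLICY_OF = {"bcaf": "benefit"}                    # CPRP: accessing entity -> concrete policy
POLICIES: dict[str, Callable[[str], bool]] = {
    "benefit": lambda label: label == "a1",        # access function (labels simplified to tokens)
}

db = sqlite3.connect(":memory:")                   # stand-in for MonetDB
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT, label TEXT)")
db.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?)",
    [
        ("ex:sally", "foaf:name", "Sally Berry", "a1"),
        ("ex:sally", "ex:hasObservation", "ex:obs1", "a2"),
    ],
)


def translate(pattern: tuple[Optional[str], ...]) -> tuple[str, list[str]]:
    """Translate one SPARQL triple pattern (None = variable) into parameterized SQL."""
    conds, params = [], []
    for col, term in zip(("s", "p", "o"), pattern):
        if term is not None:
            conds.append(f"{col} = ?")
            params.append(term)
    where = " AND ".join(conds) or "1 = 1"
    return f"SELECT s, p, o, label FROM triples WHERE {where}", params


def handle_request(user: str, password: str,
                   pattern: tuple[Optional[str], ...]) -> list[tuple[str, str, str]]:
    if CREDENTIALS.get(user) != password:          # AUTH module
        raise PermissionError("authentication failed")
    access = POLICIES[POLICY_OF[user]]             # CPRP module: concrete policy lookup
    sql, params = translate(pattern)               # SPARQL-to-SQL translation
    rows = db.execute(sql, params).fetchall()
    return [(s, p, o) for s, p, o, label in rows if access(label)]  # evaluation


print(handle_request("bcaf", "s3cret", ("ex:sally", None, None)))
```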

Slide 19

Dataset

Advanced Patient Data Generator (APDG)
◦ Synthetic but realistic data
◦ Developed in the context of EURECA (FP7 IP)
Data associated with large medical schemas
◦ HL7-RIM, SNOMED-CT
10K patients, 750K instance triples

Slide 20

Data on HL7-RIM (1/2)

Slide 21

Data on HL7-RIM (2/2)

HL7-RIM classes: Entity, Role, Participation, Observation, …
Entity http://kandel…./entityno/BC_ZSH2012A1000000, foaf:name “Sally Berry”
Observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3

Slide 22

Data on SNOMED-CT (1/2)

SNOMED-CT concept http://purl.bioontology…./408643008, skos:prefLabel “Infiltrating duct carcinoma of breast”
Observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3, indicating that the patient has “infiltrating duct carcinoma of breast”

Slide 23

Data on SNOMED-CT (2/2)

SNOMED-CT concepts (hierarchy fragment):
◦ Infiltrating duct carcinoma of breast
◦ Neoplasm of breast
◦ Malignant tumor of breast
◦ Carcinoma of breast
◦ Infiltrating lobular carcinoma of breast
◦ Carcinoma in situ of breast
◦ Lobular carcinoma in situ of breast
◦ Intraductal carcinoma in situ of breast

Slide 24

HL7-RIM and SNOMED-CT
Entity http://kandel…./entityno/BC_ZSH2012A1000000, foaf:name “Sally Berry”
Observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3
SNOMED-CT concepts: Infiltrating duct carcinoma of breast, Neoplasm of breast, Malignant tumor of breast, Carcinoma of breast, Infiltrating lobular carcinoma of breast, Carcinoma in situ of breast, Lobular carcinoma in situ of breast, Intraductal carcinoma in situ of breast

Slide 25

Demo Scenario

Breast Cancer Action Fund (BCAF) provides benefits for cancer patients

BCAF requires information on the patient's status in order to grant the benefit

Sally Berry wants to apply for the benefit

Alternative: an insurance company wants access to (part of) the data to determine the insurance premium and the contract terms

Demo: http://daphne.ics.forth.gr:8084/pd-demo/login.jsp

Slide 26

Next Steps

Make the benefit of abstract models more explicit
◦ Efficient updates (no recomputation required)
◦ Efficient change of policies (no recomputation required)
Try more scenarios
Purpose and role hierarchies
More functionality
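The "no recomputation" benefit can be sketched as follows. The sketch assumes that abstract labels are stored once per triple as expression trees and that a concrete policy only supplies token values and concrete operations. Switching policies then re-evaluates the stored labels without touching them or redoing inference. The tuple encoding, identifiers and token values are hypothetical. The operations and the "result > 1" access function follow the worked example from the "Determining Accessibility" slide.

```python
from typing import Callable, Union

# Hypothetical tuple encoding of abstract labels:
#   "a1" | ("infer", l1, l2) | ("prop", l) | ("alt", l1, l2)
Label = Union[str, tuple]

# Labels are computed once (including those of inferred triples) and stored.
STORE: dict[tuple[str, str, str], Label] = {
    ("ex:sally", "foaf:name", "Sally Berry"): "a1",
    ("ex:obs1", "ex:indicates", "ex:breastCarcinoma"):
        ("alt", "a1", ("infer", "a2", ("prop", "a3"))),
}

# Concrete operations: ⊙=min, alternatives=max, propagation=identity.
OPS: dict[str, Callable[..., int]] = {"infer": min, "alt": max, "prop": lambda v: v}


def evaluate(label: Label, values: dict[str, int]) -> int:
    if isinstance(label, str):
        return values[label]
    op, *args = label
    return OPS[op](*(evaluate(a, values) for a in args))


def accessible_triples(values: dict[str, int]) -> set[tuple[str, str, str]]:
    """Apply a policy (token values, access iff result > 1) to the stored labels."""
    return {t for t, label in STORE.items() if evaluate(label, values) > 1}


before = accessible_triples({"a1": 1, "a2": 2, "a3": 3})
# Changing the policy only changes token values; STORE is left untouched.
after = accessible_triples({"a1": 2, "a2": 1, "a3": 3})
print(len(before), len(after))  # 1 2
```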