An OWL based schema for personal data protection policies

Giles Hogben

Joint Research Centre, European Commission

Overview

• Introduction – what is P3P and the Base Data Schema

• Why do we need a generic data schema for personal data (outside of P3P)?

• Other schemas available

• Modelling the schema in OWL

– Model

– Reasoning

– Validation

• Further work

Intro

• P3P – Platform for Privacy Preferences

• W3C XML standard for expressing web site privacy policies (2001)

• Statements about data practices by data type

• Example of use of data schema

<STATEMENT>
  <PURPOSE><develop/></PURPOSE>
  <RECIPIENT><ours/></RECIPIENT>
  <RETENTION><indefinitely/></RETENTION>
  <DATA-GROUP>
    <DATA ref="#dynamic.cookies"/>
  </DATA-GROUP>
</STATEMENT>

Requirements

• P3P data schema works OK within P3P 1.0 and 1.1 but many uses outside of P3P scope.

• EPAL (Enterprise Privacy Authorization Language)

• CC/PP

• PRIME

– Obligations

– Credential metadata

– Data-handling

Requirements

– Reasoning about credential types (e.g. Driver’s licence valid => Over 18)

– Reasoning about data handling: e.g. purpose marketing, opt-out -> Risk of spam.

– Obligation management – attach obligations to triples without revealing content.

– Automatic form-filling – implies reasoning about data type equivalences between data store, data request and client preferences

– Identity management and privacy enhancing access control rules – reasoning about pseudonyms and linkability related to classes of data revealed.

Requirements

• Reusable data structures

• Type validation

• Efficient and extensible definition format

• Metadata on types

• Abstraction layer between privacy rules and enterprise data structures

Existing Schema Formats

• P3P1.0 Schema

– Quirky syntax only understood by 3 people worldwide

– Semantics understood by 2 people worldwide

– Customization format understood by 0 people worldwide

– But all other versions share the same semantics as they are required by the use cases (reusable, extensible, non-subclassed data structures)

E.g.

<DATA-DEF name="business.contact-info"
    short-description="Contact Information for the Organization"
    structref="#contact">
  <CATEGORIES><physical/><online/></CATEGORIES>
</DATA-DEF>

Existing Schema Formats

• P3P1.1 Schema

– Uses XML syntax + informal semantics

E.g.

<datatype>
  <dynamic>
    <cookies>
      <CATEGORIES type="preference"/>
    </cookies>
  </dynamic>
</datatype>

Existing Schema Formats

• RDFS Schema for P3P (http://www.w3.org/TR/p3p-rdfschema)

• Models every single class in the class hierarchy

• Models classes of data as properties.

– Difficult to describe instance data

– Metadata for properties less natural

• Email can be seen as a property, but what is the Dynamic/Cookies property?

OWL Schema

• Models semantics of P3P 1.0 data schema

• Allows reference from RDF -> reasoning

• Allows type validation

• Simplifies syntax, especially extensibility syntax

BUT

• Modelling P3P semantics exactly => Modal logic which makes some reasoning nasty

Structure of Existing Schema

[Figure: schema structure diagram. Node labels: User, Thirdparty, Personname, Bdate, Gender, Cert, Employer, Address, Name, Given, Prefix. Relation labels: Some Values From Only, subClass, May Collect (Entity, DataClassX).]

• A hierarchy of sorts

• but NOT subclass hierarchy

• Essentially semantic and syntactic validation scheme.

How to model the existing structure

• Formal set theory definition

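For A (User) SVFO ("Some Values From Only") L (Cert, Personname…):

$$\forall l \in L,\ \exists i\ \bigl(i \in (A \cap l)\bigr) \qquad \text{and} \qquad \forall a \in A,\ \exists l \in L : a \in l$$

[Figure: example classes User, Thirdparty, Personname, Bdate, Gender, Cert]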
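Read as "some values from each class in L, and only values from classes in L", the SVFO relation can be checked directly on finite sets. The following is a minimal Python sketch under that assumed reading, with classes modelled as plain sets of instance identifiers. The function name `svfo` and the sample data are illustrative and do not come from the paper.

```python
def svfo(a, classes):
    """Return True if set `a` is 'Some Values From Only' the given classes.

    Assumed reading:
      (1) every class in `classes` shares at least one member with `a`;
      (2) every member of `a` belongs to at least one of the classes.
    """
    some_from_each = all(a & c for c in classes)
    only_from_these = all(any(x in c for c in classes) for x in a)
    return some_from_each and only_from_these


# Hypothetical instance data, for illustration only.
cert = {"cert-1"}
personname = {"name-1", "name-2"}
user = {"cert-1", "name-1"}

print(svfo(user, [cert, personname]))              # True
print(svfo(user | {"stray"}, [cert, personname]))  # False: "stray" is in no class
```

The second call fails only the "only" half of the condition, which is the part that gives the structure its validation role.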

Shortcut

<owl:Class rdf:ID="A">
  <customNS:SVFO rdf:parseType="Collection">
    <B/>
    <C/>
  </customNS:SVFO>
</owl:Class>

Data handling statements and reasoning use case

[Figure: Entity "May Collect" DataClassX, shown with the structure User, Name, Given, Prefix and a subClass link]

A service states that it may collect any values from the class User data

A user agent rule says to block transfer to any services which might collect Given name data.

Note the modal predicate May collect, which changes the expected logic

Data handling statements and reasoning use case

The agent needs to deduce:

• If a service may collect values from User data, it may also collect values from Name.

• Applying the same rule again, if a service may collect values from Name, it may also collect values from GivenName.

-> If a service may collect values from User, it may collect them from GivenName.

For discussion of how this was achieved using Jena and OWL, see the paper.
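The deduction is a repeated application of one rule down the containment structure. The sketch below illustrates that propagation in plain Python; it is not the Jena/OWL approach described in the paper. The `CHILDREN` table encodes only the User, Name, Given, Prefix fragment shown on the slide, and the function names are hypothetical.

```python
# Containment structure from the slide: User contains Name, Name contains Given and Prefix.
CHILDREN = {
    "User": ["Name"],
    "Name": ["Given", "Prefix"],
}


def may_collect_closure(declared, children=CHILDREN):
    """Expand the declared data classes to everything reachable below them."""
    reachable = set()
    stack = list(declared)
    while stack:
        cls = stack.pop()
        if cls in reachable:
            continue
        reachable.add(cls)
        stack.extend(children.get(cls, []))
    return reachable


def should_block(declared, blocked_classes):
    """User-agent rule: block if any blocked class might be collected."""
    return bool(may_collect_closure(declared) & set(blocked_classes))


print(sorted(may_collect_closure(["User"])))  # ['Given', 'Name', 'Prefix', 'User']
print(should_block(["User"], ["Given"]))      # True
print(should_block(["Prefix"], ["Given"]))    # False
```

Note the direction of inference: a "may collect" statement about a parent propagates to its children, which is the opposite of what ordinary subclass reasoning about instances would give.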

Quickfix: Using shortcut classes

• Use of shortcut/convenience classes:

<owl:Class rdf:ID="User.Name.Given">
  <rdf:type rdf:resource="#Instantiateable"/>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#User"/>
    <owl:Class rdf:about="#Name"/>
    <owl:Class rdf:about="#Given"/>
  </owl:intersectionOf>
</owl:Class>

Advantage: More compact RDF

<prime-PII:hasData>
  <prime-PII:User.Name.Given>
    <rdf:value>Bob</rdf:value>
  </prime-PII:User.Name.Given>
</prime-PII:hasData>

Instead of

<prime-PII:hasData>
  <prime-PII:User>
    <rdf:value>Bob</rdf:value>
    <rdf:type rdf:resource="Name"/>
    <rdf:type rdf:resource="Given"/>
  </prime-PII:User>
</prime-PII:hasData>

(Important for adoption and acceptance by policy authors)

Advantage 2. Makes reasoning use case trivial

• Practical use cases only require matching concrete classes (described by the shortcut classes) with their ancestors in the hierarchy.

• By using shortcut classes in OWL, this is easily achieved, since a standard OWL reasoner concludes:

<owl:Class rdf:ID="User.Name.Given">
  <rdf:type rdf:resource="#Instantiateable"/>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#User"/>
    <owl:Class rdf:about="#Name"/>
    <owl:Class rdf:about="#Given"/>
  </owl:intersectionOf>
</owl:Class>

-> User.Name.Given rdfs:subClassOf User
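The inference relied on here is the standard one that an intersection class is a subclass of each of its operands. As a rough illustration of the matching step only, here is a small Python sketch that stands in for the OWL reasoner. The `INTERSECTIONS` table mirrors the shortcut class above; the function names and the `Business` class in the usage lines are hypothetical.

```python
# Shortcut class definition mirroring the owl:intersectionOf example.
INTERSECTIONS = {
    "User.Name.Given": ["User", "Name", "Given"],
}


def inferred_superclasses(cls, intersections=INTERSECTIONS):
    """An intersection class is a subclass of each of its operands."""
    found = set()
    pending = list(intersections.get(cls, []))
    while pending:
        parent = pending.pop()
        if parent not in found:
            found.add(parent)
            # Follow operands that are themselves defined as intersections.
            pending.extend(intersections.get(parent, []))
    return found


def matches(concrete_cls, policy_cls):
    """True if data typed `concrete_cls` falls under `policy_cls`."""
    return concrete_cls == policy_cls or policy_cls in inferred_superclasses(concrete_cls)


print(matches("User.Name.Given", "User"))      # True
print(matches("User.Name.Given", "Business"))  # False
```

A policy statement about User therefore covers any data typed with the shortcut class, with no modal reasoning needed.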

Validation

• Structure provides some semantic validation through disjoint classes (e.g. City disjoint from Gender – so if something is typed as both city and gender data, it flags an error)

• OWL supports XSD datatyping for syntactic validation (e.g. string and numeric types, plus customized types defined through regular expressions, such as email addresses)
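The two kinds of validation can be sketched side by side in Python: a disjointness check standing in for the semantic validation an OWL reasoner would perform, and a pattern check standing in for an XSD datatype restriction. The City/Gender pair is the example from the slide; the email pattern is a deliberately simplified assumption, not the schema's actual regular expression, and the function names are hypothetical.

```python
import re

# Pairs of classes declared disjoint (City/Gender is the slide's example).
DISJOINT = [("City", "Gender")]

# Deliberately simplified email pattern; an assumption, not the schema's regex.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def semantic_errors(types):
    """Report every declared-disjoint pair whose members both appear in `types`."""
    types = set(types)
    return [(a, b) for a, b in DISJOINT if a in types and b in types]


def is_valid_email(value):
    """Syntactic check: the whole string must match the pattern."""
    return EMAIL_RE.fullmatch(value) is not None


print(semantic_errors(["City", "Gender"]))  # [('City', 'Gender')]
print(semantic_errors(["City"]))            # []
print(is_valid_email("bob@example.org"))    # True
print(is_valid_email("not an email"))       # False
```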

Summary

• We need an ontological model which satisfies the requirements of the P3P 1.0 data schema

• We can use OWL for this

• OWL satisfies (with difficulty) reasoning requirements

• OWL provides validation features not provided by P3P syntax

Further work

• Rethink structure without trying to be backward compatible?

• Multi-language human-readable (HR) strings

• Support for numerical reasoning

– e.g. not just Drivers’ Licence -> Majority age, but ?x has Drivers’ Licence -> [?a >= 18 <- ?x has ?a, ?a isA age] so e.g. Drivers’ licence => age > 16.

• Other more complex reasoning

– e.g. ?x collects User.Name.Prefix -> [?x collects User.CivilStatus <- User.Name.Gender = ‘female’]
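The numerical case amounts to deriving a lower bound from a credential and then checking whether that bound entails a weaker threshold. A minimal Python sketch of that idea follows, purely as an illustration of the proposed further work and not something the schema currently supports. The rule table and function names are hypothetical, and the value 18 is taken from the slide's example.

```python
# Hypothetical rule table: credential -> lower bound on the holder's age.
# The 18 comes from the slide's example; real limits vary by jurisdiction.
MIN_AGE_BY_CREDENTIAL = {"DriversLicence": 18}


def age_lower_bound(credentials):
    """Largest age lower bound implied by the credentials, or None."""
    bounds = [MIN_AGE_BY_CREDENTIAL[c] for c in credentials if c in MIN_AGE_BY_CREDENTIAL]
    return max(bounds) if bounds else None


def entails_age_over(credentials, threshold):
    """True if the credentials guarantee age > threshold."""
    bound = age_lower_bound(credentials)
    return bound is not None and bound > threshold


print(entails_age_over(["DriversLicence"], 16))  # True: age >= 18 implies age > 16
print(entails_age_over(["DriversLicence"], 18))  # False: age >= 18 does not imply age > 18
print(entails_age_over([], 16))                  # False: no credential, no bound
```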

That’s all folks
