An OWL based schema for personal data protection policies
Giles Hogben
Joint Research Centre, European Commission
Overview
• Introduction – what is P3P and the Base Data Schema
• Why do we need a generic data schema for personal data (outside of P3P)?
• Other schemas available
• Modelling the schema in OWL
– Model
– Reasoning
– Validation
• Further work
Intro
• P3P – Platform for Privacy Preferences
• W3C XML standard for expressing web site privacy policies (2001)
• Statements about data practices by data type
• Example of use of data schema
<STATEMENT>
  <PURPOSE><develop/></PURPOSE>
  <RECIPIENT><ours/></RECIPIENT>
  <RETENTION><indefinitely/></RETENTION>
  <DATA-GROUP>
    <DATA ref="#dynamic.cookies"/>
  </DATA-GROUP>
</STATEMENT>
Requirements
• P3P data schema works OK within P3P 1.0 and 1.1, but there are many uses outside of P3P scope.
• EPAL (Enterprise Privacy Authorization Language)
• CC/PP
• PRIME
– Obligations
– Credential metadata
– Data-handling
Requirements
– Reasoning about credential types (e.g. Driver’s licence valid => Over 18)
– Reasoning about data handling: e.g. purpose marketing, opt-out -> Risk of spam.
– Obligation management – attach obligations to triples without revealing content.
– Automatic form-filling – implies reasoning about data type equivalences between data store, data request and client preferences
– Identity management and privacy enhancing access control rules – reasoning about pseudonyms and linkability related to classes of data revealed.
Requirements
• Reusable data structures
• Type validation
• Efficient and extensible definition format
• Metadata on types
• Abstraction layer between privacy rules and enterprise data structures
Existing Schema Formats
• P3P1.0 Schema
– Quirky syntax only understood by 3 people worldwide
– Semantics understood by 2 people worldwide
– Customization format understood by 0 people worldwide
– But all other versions share the same semantics as they are required by the use cases (Reusable, extensible, non-subclassed data structures)
E.g.
<DATA-DEF name="business.contact-info"
    short-description="Contact Information for the Organization"
    structref="#contact">
  <CATEGORIES><physical/><online/></CATEGORIES>
</DATA-DEF>
Existing Schema Formats
P3P1.1 Schema
Uses XML syntax + informal semantics:
E.g.
<datatype>
  <dynamic>
    <cookies>
      <CATEGORIES type="preference"/>
    </cookies>
  </dynamic>
</datatype>
Existing Schema Formats
• RDFS Schema for P3P (http://www.w3.org/TR/p3p-rdfschema)
• Models every single class in the class hierarchy
• Models classes of data as properties.
– Difficult to describe instance data
– Metadata for properties less natural
• Email can be seen as a property, but what is the Dynamic/Cookies property?
OWL Schema
• Models semantics of P3P 1.0 data schema
• Allows reference from RDF -> reasoning
• Allows type validation
• Simplifies syntax, especially extensibility syntax
BUT
• Modelling P3P semantics exactly => Modal logic which makes some reasoning nasty
Structure of Existing Schema
[Figure: example of the existing schema structure. Node labels: User, Thirdparty, Employer, Personname, Bdate, Gender, Cert, Address, Name, Given, Prefix; also Entity "May Collect" DataClassX. Legend: "Some Values From Only" links and subClass links.]
• A hierarchy of sorts
• but NOT subclass hierarchy
• Essentially semantic and syntactic validation scheme.
How to model the existing structure
• Formal set theory definition
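$\forall l \in L,\ (\exists i \in A)(i \in l)$ and $\forall a \in A,\ \exists l \in L : a \in l$
[Figure: schema fragment with nodes User, Thirdparty, Personname, Bdate, Gender, Cert]
For A (User) SVFO L (Cert, Personname…)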
Shortcut
<owl:Class rdf:ID="A">
<customNS:SVFO rdf:parseType="Collection">
<B/>
<C/>
</customNS:SVFO>
</owl:Class>
Data handling statements and reasoning use case
[Figure: an Entity "May Collect" DataClassX; hierarchy of User, Name, Given and Prefix joined by subClass links]
A service states that it may collect any values from the class User data.
A user agent rule says to block transfer to any services which might collect Given name data.
Note the modal predicate "May collect", which changes the expected logic.
Data handling statements and reasoning use case
The agent needs to deduce:
If a service may collect values from User data, it may also collect values from Name.
Applying the same rule again, if a service may collect values from Name, it may also collect values from Given.
->
If a service may collect values from User, it may collect them from Given.
For a discussion of how this was achieved using Jena and OWL, see the paper.
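The deduction above can be sketched in a few lines of Python. This is only an illustration of the rule being applied repeatedly; it is not the Jena/OWL implementation described in the paper. The CHILDREN table and the function names are hypothetical, with the class names taken from the slide's example.

```python
# Hypothetical containment links from the slide's example (User -> Name -> Given, Prefix).
CHILDREN = {
    "User": ["Name"],
    "Name": ["Given", "Prefix"],
}

def may_collect_closure(declared, children=CHILDREN):
    """Return every data class a service may collect, given the classes it declares."""
    reachable = set()
    stack = list(declared)
    while stack:
        cls = stack.pop()
        if cls in reachable:
            continue
        reachable.add(cls)
        # "May collect X" carries over to everything nested under X.
        stack.extend(children.get(cls, []))
    return reachable

def should_block(declared, blocked_class):
    """User agent rule: block if the service might collect blocked_class."""
    return blocked_class in may_collect_closure(declared)
```

With this sketch, a service declaring User is blocked by a rule on Given, while a service declaring only Given is not blocked by a rule on User.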
Quickfix: Using shortcut classes
• Use of shortcut/convenience classes:
<owl:Class rdf:ID="User.Name.Given">
  <rdf:type rdf:resource="#Instantiateable"/>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#User"/>
    <owl:Class rdf:about="#Name"/>
    <owl:Class rdf:about="#Given"/>
  </owl:intersectionOf>
</owl:Class>
Advantage: More compact RDF
<prime-PII:hasData>
  <prime-PII:User.Name.Given>
    <rdf:value>Bob</rdf:value>
  </prime-PII:User.Name.Given>
</prime-PII:hasData>
Instead of
<prime-PII:hasData>
  <prime-PII:User>
    <rdf:value>Bob</rdf:value>
    <rdf:type rdf:resource="Name"/>
    <rdf:type rdf:resource="Given"/>
  </prime-PII:User>
</prime-PII:hasData>
(Important for adoption and acceptance by policy authors)
Advantage 2. Makes reasoning use case trivial
• Practical use cases only require matching concrete classes (described by the shortcut classes) with their ancestors in the hierarchy.
• By using shortcut classes in OWL, this is simply achieved since a standard OWL reasoner concludes:
<owl:Class rdf:ID="User.Name.Given">
  <rdf:type rdf:resource="#Instantiateable"/>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#User"/>
    <owl:Class rdf:about="#Name"/>
    <owl:Class rdf:about="#Given"/>
  </owl:intersectionOf>
</owl:Class>
-> User.Name.Given rdfs:subClassOf User
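As a rough illustration of why the matching becomes trivial, the Python sketch below treats each shortcut class as the set of operands of its owl:intersectionOf and reads the subclass relation straight off that set. It stands in for what an OWL reasoner would conclude and is not a reasoner itself; the SHORTCUTS registry and function names are hypothetical.

```python
# Hypothetical registry: shortcut class name -> operands of its owl:intersectionOf.
SHORTCUTS = {
    "User.Name.Given": {"User", "Name", "Given"},
    "User.Name.Prefix": {"User", "Name", "Prefix"},
}

def superclasses(shortcut):
    """An intersection class is a subclass of each of its operands."""
    return set(SHORTCUTS[shortcut])

def matches(shortcut, rule_class):
    """True if data typed with the shortcut class falls under rule_class."""
    return rule_class == shortcut or rule_class in superclasses(shortcut)
```

A rule written against User then matches data typed as User.Name.Given without any chained inference over the hierarchy.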
Validation
• Structure provides some semantic validation through disjoint classes (e.g. City disjoint from Gender – so if something is typed as both city and gender data, it flags an error)
• OWL supports XSD datatyping for syntactic validation (e.g. string and numeric types, plus customized types such as email addresses defined through regular expressions)
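Both kinds of validation can be mimicked in a small Python sketch. The City/Gender disjointness comes from the slide; the email pattern is a deliberately simplified assumption and not a pattern taken from the schema, and the function names are hypothetical.

```python
import re

# Disjoint class pairs; City/Gender is the slide's example.
DISJOINT = {frozenset({"City", "Gender"})}

# Simplified email pattern, assumed here purely for illustration.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def semantic_errors(types):
    """Return the disjoint pairs that a single data item is typed with."""
    types = set(types)
    return [tuple(sorted(pair)) for pair in DISJOINT if pair <= types]

def is_valid_email(value):
    """Syntactic check, analogous to a regex-restricted XSD string type."""
    return EMAIL_PATTERN.fullmatch(value) is not None
```

An item typed as both City and Gender is reported as an error, and a value such as "bob" fails the syntactic check for an email-typed field.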
Summary
• We need an ontological model which satisfies the requirements of the P3P 1.0 data schema
• We can use OWL for this
• OWL satisfies (with difficulty) reasoning requirements
• OWL provides validation features not provided by P3P syntax
Further work
• Rethink structure without trying to be backward compatible?
• Multi language HR strings
• Support for numerical reasoning
– e.g. not just Drivers’ Licence -> Majority age, but ?x has Drivers’ Licence -> [?a >= 18 <- ?x has ?a, ?a isA age] so e.g. Drivers’ licence => age > 16.
• Other more complex reasoning
– e.g. ?x collects User.Name.Prefix -> [?x collects User.CivilStatus <- User.Name.Gender = ‘female’]
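The numerical reasoning item could look roughly like the Python sketch below: a credential implies a lower bound on age, and that bound is then compared with whatever threshold a policy asks about. The table and function names are hypothetical, and the value 18 is the one used in the slide's example.

```python
# Hypothetical table: credential -> minimum age it implies (18 as in the slide).
MIN_AGE_BY_CREDENTIAL = {"DriversLicence": 18}

def implied_min_age(credentials):
    """Largest lower bound on age implied by the held credentials (0 if none)."""
    return max((MIN_AGE_BY_CREDENTIAL.get(c, 0) for c in credentials), default=0)

def satisfies_age_over(credentials, threshold):
    """True if the credentials prove age > threshold (age >= bound and bound > threshold)."""
    return implied_min_age(credentials) > threshold
```

Holding a drivers' licence then proves age > 16, but it does not prove age > 18, since the licence only establishes age >= 18.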
That’s all folks