EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction. Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani. SEMANTiCS16 - 12th International Conference on Semantic Systems, Leipzig, September 12 - 15, 2016. DAAD: Deutscher Akademischer Austauschdienst (German Academic Exchange Service).


Page 1: Najmeh Mousavi Nejad, Simon Scerri, Sören Auer and Elisa M. Sibarani | EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction

EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction

Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani

SEMANTiCS16 - 12th International Conference on Semantic Systems

Leipzig, September 12 - 15, 2016

DAAD: Deutscher Akademischer Austauschdienst (German Academic Exchange Service)

Page 2:

Outline
• Motivation
• Related Work
• Approach: EULAide
• Evaluation & Results
• Conclusions & Future Work

Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction

Page 3:

Motivation

Online research commissioned by Skandia [1]:
• 7% read online EULAs when signing up for products & services
• 21% suffered as a result of ticking the EULA box without reading the agreement
• 10% were locked into a longer-term contract than they expected
• 5% lost money by not being able to cancel or amend hotels or holidays

[1] http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html

Page 4:

Problem Statement
• Given an EULA (End-User License Agreement), we want to extract permissions, prohibitions & duties from it
• Violation of an EULA: legal punishments, including federal fines
• Specifications:
  • Complexity
  • Emergence of new regulations
  • Possible change of regulations
  • Long texts

Page 5:

Involved Communities

Permissions & Obligations Expression Working Group [1]
• A World Wide Web Consortium (W3C) group
• Mission: defining a semantic data model for expressing permission and obligation statements

Spare Us the Small Print [2]
• A campaign run by Fairer Finance
• Mission: getting rid of lengthy terms & conditions

[1] https://www.w3.org/2016/poe/charter
[2] http://www.fairerfinance.com/campaigns/spare-us-the-small-print

Page 6:

Related Work

Manual
• Online service (tldrlegal.com)

Semi-automatic
• NLL2RDF (Cabrio et al., ESWC 2014)
  • First attempt to generate RDF expressions of EULAs
  • Exploits the CC REL & ODRL vocabularies
  • Supervised machine learning
  • Limitation: covers only a small number of rights

Page 7:

Vocabularies & Ontologies for EULAs

Name                            Domain Coverage                                        Last Release
CC REL                          Linked data                                            2013/11
ODRL                            Open digital content                                   2015/03
LDR (derived from ODRL)         Linked data resources                                  2014/09
LiMo                            Open data                                              2013/05
L4LOD                           Web of data                                            2013/05
ODRS                            Open data                                              2013/07
MPEG-21 Rights Data Dictionary  Contains the terms as standardized in ISO/IEC 21000-6  2005/07
IPROnto                         Intellectual property rights (focus: ecommerce)        2003/12
Copyright ontology              Digital rights management                              2014/01
Semantic Copyright - Basic      Works in digital formats                               2009/10
Semantic Copyright - Registry   Works in digital formats                               2009/10

Page 8:

Approach: OBIE

Why OBIE?
• Relying on domain expert knowledge
• Clear structure & terminology of EULAs

Our chosen ontology: ODRL
• Most recently updated
• Broad enough
• Highest community endorsement

[Figure: ODRL core model. Classes: Policy (e.g., Request), Asset, Constraint, Rule (e.g., Permission), Action (e.g., sell), Party (e.g., Individual). Relations: permission, prohibition, duty, constraint, function (e.g., assignee), action, target, output.]

Leveraging OBIE to classify EULAs into predefined categories
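The ODRL classes named on this slide can be pictured as a small data structure. The following is a minimal Python sketch, not the ODRL vocabulary itself: the class and field names (`Rule`, `Policy`, `by_kind`, `assignee`) are simplifications chosen for illustration, and the example policy is invented.

```python
from dataclasses import dataclass, field
from typing import List

# The three rule kinds that EULAide annotates.
RULE_KINDS = ("permission", "prohibition", "duty")


@dataclass
class Rule:
    """A simplified ODRL-style rule: who may / may not / must do what to which asset."""
    kind: str                 # one of RULE_KINDS
    actions: List[str]        # e.g. ["copy", "share"]
    target: str               # the Asset the rule applies to
    assignee: str             # the Party the rule is addressed to
    constraints: List[str] = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in RULE_KINDS:
            raise ValueError(f"unknown rule kind: {self.kind}")


@dataclass
class Policy:
    """A container of rules; policy_type is free text in this sketch."""
    policy_type: str
    rules: List[Rule] = field(default_factory=list)

    def by_kind(self, kind: str) -> List[Rule]:
        return [r for r in self.rules if r.kind == kind]


# Invented example content, for illustration only.
policy = Policy(
    policy_type="Agreement",
    rules=[
        Rule("permission", ["copy", "share", "reproduce"], "the product", "you"),
        Rule("prohibition", ["sell"], "the product", "you"),
    ],
)
```

Grouping rules by kind mirrors the three annotation types (Permission, Prohibition, Duty) that the approach targets.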

Page 9:

Black Box Architecture

[Figure: EULAide shown as a black box. Labels: EULA, Ontology, EULAide, Annotation Types (Permission, Prohibition, Duty).]

Page 10:

Architecture of EULAide

[Figure: GATE EULA OBIE Pipeline. Components: Linguistic Pre-Processing (Tokeniser, Sentence Splitter, Morphological Analyser, POS Tagger), Gazetteer, ODRL Enhancement, Ontology-based Gazetteer, EULA OBIE Transducer, User Interface. Data labels: EULA, Ontology, enhanced ODRL Ontology, Pre-processed EULA, Annotated concepts, Annotation Types (Permission, Prohibition, Duty).]

Page 11:

EULA OBIE Transducer

JAPE transducer based on the ODRL community specification documents

Example of Permission Rules

Phase: AnnotateClasses
• Rule: PermissionAction
  Example: Copy, Reproduce, Delete
• Rule: PermissionWords
  Example: May, grant, allow

Phase: ExtractPermissions
• Rule: [Subj][PermissionWords][PermissionAction]+[Asset]
  Example: [You][May][copy, share and reproduce][the product]
• Rule: [License][PermissionWords][object][PermissionAction]+[Asset]
  Example: [This license] [grants] [you] [to copy, share and reproduce] [the product]
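The two ExtractPermissions patterns can be imitated outside GATE to see how they behave. The sketch below is a plain-Python analogue and not JAPE: it maps each token to a one-letter class and matches the class sequence with a regular expression. The lexicon entries, the `FILLER` set and the function names are assumptions made for this illustration.

```python
import re

# Hypothetical mini-lexicon; the real system draws these from gazetteers and ODRL.
LEXICON = {
    "S": {"you"},                                       # Subj / object
    "L": {"license"},                                   # License
    "W": {"may", "grant", "grants", "allow", "allows"},  # PermissionWords
    "A": {"copy", "share", "reproduce", "delete"},      # PermissionAction
    "T": {"product"},                                   # Asset
}
FILLER = {"this", "the", "to", "and", ","}  # tokens skipped before matching

# [Subj][PermissionWords][PermissionAction]+[Asset]
# [License][PermissionWords][object][PermissionAction]+[Asset]
RULES = [re.compile("SWA+T"), re.compile("LWSA+T")]


def tag(token):
    """Return the one-letter class of a token, or 'o' for anything else."""
    word = token.lower()
    for label, words in LEXICON.items():
        if word in words:
            return label
    return "o"


def extract_permissions(sentence):
    """Return one dict per matched rule, listing the actions and the asset."""
    tokens = [t for t in re.findall(r"\w+|[^\w\s]", sentence)
              if t.lower() not in FILLER]
    tags = "".join(tag(t) for t in tokens)
    found = []
    for rule in RULES:
        for m in rule.finditer(tags):
            span = tokens[m.start():m.end()]
            span_tags = tags[m.start():m.end()]
            actions = [t.lower() for t, g in zip(span, span_tags) if g == "A"]
            found.append({"actions": actions, "asset": span[-1].lower()})
    return found


first = extract_permissions("You may copy, share and reproduce the product.")
second = extract_permissions(
    "This license grants you to copy, share and reproduce the product")
```

Both slide examples yield a single permission with the actions copy, share and reproduce on the asset "product". A sentence such as "You must not copy the product." yields nothing, because no permission word sits between the subject and the action.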

Page 12:

Example of Permission

Sentence: This license grants you to copy, share and reproduce the product.

• Text-file Gazetteer: License ("This license"), Asset ("the product")
• Ontology-based Gazetteer: Lookups
• annotateClasses phase: PermWords ("grants"), PermissionActions ("to copy, share and reproduce")
• extractPermissions phase: (License) (PermWords) (obj) (PermissionActions)+ (Asset), i.e. [This license] [grants] [you] [to copy, share and reproduce] [the product]
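The staged annotation of this sentence can be traced with a small stand-off annotation sketch. This is a minimal Python illustration under stated assumptions: the two dictionaries stand in for the text-file gazetteer and the ontology-based gazetteer, their entries are invented to fit the example, and `Annotation`, `gazetteer` and `annotate_classes` are hypothetical names, not GATE API calls.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    type: str    # annotation type, e.g. "Lookup" or "Asset"
    text: str    # covered text


SENTENCE = "This license grants you to copy, share and reproduce the product."

# Assumed gazetteer contents for this one sentence.
TEXT_FILE_GAZETTEER = {"this license": "License", "the product": "Asset"}
ONTOLOGY_CLASSES = {
    "grants": "PermWords",
    "copy": "PermissionActions",
    "share": "PermissionActions",
    "reproduce": "PermissionActions",
}


def gazetteer(text, entries, fixed_type=None):
    """Mark every whole-word occurrence of each entry; sort by position."""
    anns = []
    for phrase, label in entries.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            anns.append(Annotation(m.start(), m.end(), fixed_type or label, m.group()))
    return sorted(anns, key=lambda a: a.start)


def annotate_classes(lookups, classes):
    """Second stage: turn generic Lookup annotations into class annotations."""
    return [Annotation(a.start, a.end, classes[a.text.lower()], a.text)
            for a in lookups]


text_file = gazetteer(SENTENCE, TEXT_FILE_GAZETTEER)
lookups = gazetteer(SENTENCE, ONTOLOGY_CLASSES, fixed_type="Lookup")
classes = annotate_classes(lookups, ONTOLOGY_CLASSES)
```

Keeping annotations as offsets into the unchanged text lets later stages, such as the extraction rules, combine the results of earlier ones without rewriting the sentence.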

Page 13:

Implementation of EULAide using GATE

GATE: General Architecture for Text Engineering
• Open source software
• University of Sheffield
• Written in Java
• Initial release: 1995

We chose GATE for:
• Its ANNIE IE system & its support for JAPE grammar rules
• Excellent support for OBIE approaches
• Support for evaluation tools
• Embedded API

Page 14:

Gold Standard Creation
• 20 popular licenses including 193 permissions, 185 prohibitions & 168 duties
• Average words: 3,206
• Average characters without spaces: 16,815
• Using the GATE IAA plugin

IAA (Inter-Annotator Agreement) for two annotators

             Precision  Recall  F-measure
Permission   0.94       0.9     0.92
Prohibition  0.79       0.94    0.86
Duty         0.86       0.96    0.91
Summary      0.87       0.93    0.9
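The slide does not spell out the F-measure formula. Assuming it is the balanced F1, the harmonic mean of precision P and recall R, the Permission row can be reproduced as a worked example:

```latex
F_1 = \frac{2PR}{P + R},
\qquad
F_1^{\text{Permission}} = \frac{2 \times 0.94 \times 0.90}{0.94 + 0.90}
                        = \frac{1.692}{1.84} \approx 0.92
```

The other rows agree with the same formula to two decimal places, for example Duty: 2 × 0.86 × 0.96 / 1.82 ≈ 0.91.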

Page 15:

Evaluation: GATE Corpus Quality Assurance
• Apply EULAide to the gold standard
• F-measure calculation for two conditions

Evaluation of EULAide with Ontology Enhancement

             Precision  Recall  F0.5  F1    F2
Permission   0.74       0.75    0.74  0.74  0.75
Prohibition  0.89       0.63    0.82  0.74  0.66
Duty         0.66       0.67    0.67  0.67  0.67
Summary      0.75       0.68    0.74  0.72  0.7

Evaluation of EULAide without Ontology Enhancement

             Precision  Recall  F0.5  F1    F2
Permission   0.75       0.56    0.71  0.64  0.59
Prohibition  0.89       0.47    0.75  0.61  0.52
Duty         0.73       0.43    0.64  0.54  0.46
Summary      0.79       0.49    0.7   0.6   0.53
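The F0.5, F1 and F2 columns weight precision and recall differently. A minimal Python sketch of the standard F-beta formula follows. The function name `f_beta` is chosen here, and the check uses the Prohibition row with precision 0.89 and recall 0.63. Because the slide reports already-rounded precision and recall, recomputed values can differ from the slide by 0.01.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta = 1 is F1.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Prohibition row with precision 0.89 and recall 0.63.
p, r = 0.89, 0.63
scores = {beta: round(f_beta(p, r, beta), 2) for beta in (0.5, 1.0, 2.0)}
# scores == {0.5: 0.82, 1.0: 0.74, 2.0: 0.67}; the slide lists F2 as 0.66,
# which is consistent with rounding of the underlying precision and recall.
```

Reporting all three values shows how the ranking of a configuration shifts depending on whether missed rules (recall) or spurious rules (precision) matter more to the reader.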

Page 16:

Evaluation – Example of Failure

Permission:
• False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
• False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”

Page 17:

Example: Facebook EULA

Page 18:

Conclusions
• Identification of ontologies and vocabularies in the EULA domain
• EULAide: semi-automatic ontology-based annotation of EULAs
• Ontology enhancement by adding additional concepts
• IAA of 90%
• F-measure of more than 70%
• Applying the pipeline to different kinds of front ends (using the GATE API)

[Figure: GATE EULA OBIE Pipeline, the architecture diagram repeated from page 10.]

Page 19:

Future Work
• Extract more policies and rights (e.g., agreements, constraints, etc.)
• Compile a more comprehensive ontology based on ODRL
• Combine different IE methods
• Implement a RESTful application in Java
• Design a mobile app
• Integrate GATE with Stanford NLP for more accurate extraction (e.g., using coreference)

Email: [email protected]

Page 20:

References

• All images are from Pixabay and are released under Creative Commons CC0 into the public domain.

Page 21:

Backup Slides

• The plugin offers two types of IAA measurement: F-measure and agreement based on the kappa statistic. The latter has been criticized and has a number of well-known limitations. Kappa is suitable when annotators have the same number of instances but assign different class labels. It is not recommended for text mark-up tasks, such as named entity recognition and information extraction [9]. When the annotators themselves determine which text spans to annotate, the F-measure should be used. The F-measure has been less controversial and, given the nature of our annotation task, is also indicated as the most appropriate IAA measure in the GATE manual itself [5].

Page 22:

Evaluation – Example of Failure

Permission:
• False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
• False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”

Prohibition:
• False positive: “Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws.”
• False negative: “No one other than Sun has the right to modify the terms applicable to Covered Code created under this License”

Duty:
• False positive: “Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program”
• False negative: “The GNU GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions”