EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Najmeh Mousavi Nejad, Simon Scerri, Sören Auer & Elisa M. Sibarani
SEMANTiCS16 - 12th International Conference on Semantic Systems
Leipzig, September 12-15, 2016
DAAD: Deutscher Akademischer Austauschdienst (German Academic Exchange Service)
Outline
• Motivation
• Related Work
• Approach: EULAide
• Evaluation & Results
• Conclusions & Future Works
Najmeh Mousavi Nejad, EULAide: Interpretation of End-User License Agreements using Ontology-Based Information Extraction
Motivation
Online research commissioned by Skandia [1]:
• 7% read online EULAs when signing up for products & services
• 21% suffered as a result of ticking the EULA box without reading the agreement
• 10% were locked into a longer-term contract than they expected
• 5% lost money by not being able to cancel or amend hotels or holidays
1 http://www.prnewswire.co.uk/news-releases/skandia-takes-the-terminal-out-of-terms-and-conditions-145280565.html
Problem Statement
• Given an EULA (End-User License Agreement), we want to extract permissions, prohibitions & duties from it
• Violation of an EULA: legal punishments, including federal fines
• Specifications
  • Complexity
  • Emergence of new regulations
  • Possible change of regulations
  • Long texts
Involved Communities
Permissions & Obligations Expression Working Group [1]
• A World Wide Web Consortium (W3C) group
• Mission: defining a semantic data model for expressing permission and obligation statements
Spare Us the Small Print [2]
• A campaign run by Fairer Finance
• Mission: getting rid of lengthy terms & conditions
1 https://www.w3.org/2016/poe/charter
2 http://www.fairerfinance.com/campaigns/spare-us-the-small-print
Related Work
Manual
• Online service (tldrlegal.com)
Semi-automatic
• NLL2RDF (Cabrio et al., ESWC 2014)
  • First attempt to generate RDF expressions of EULAs
  • Exploits the CC REL & ODRL vocabularies
  • Supervised machine learning
  • Limitation: covers only a small number of rights
Vocabularies & Ontologies for EULAs
Name | Domain Coverage | Last Release
CC REL | Linked data | 2013/11
ODRL | Open digital content | 2015/03
LDR (derived from ODRL) | Linked data resources | 2014/09
LiMo | Open data | 2013/05
L4LOD | Web of data | 2013/05
ODRS | Open data | 2013/07
MPEG-21 Rights Data Dictionary | Contains the terms as standardized in ISO/IEC 21000-6 | 2005/07
IPROnto | Intellectual property rights (focus: ecommerce) | 2003/12
Copyright ontology | Digital rights management | 2014/01
Semantic Copyright - Basic | Works in digital formats | 2009/10
Semantic Copyright - Registry | Works in digital formats | 2009/10
Approach: OBIE (Ontology-Based Information Extraction)
Why OBIE?
• Relying on domain expert knowledge
• Clear structure & terminology of EULAs
Our chosen ontology: ODRL
• Most recently updated
• Broad enough
• Highest community endorsement
[Figure: ODRL core model. Classes: Policy (e.g., Request), Asset, Constraint, Rule (e.g., Permission), Action (e.g., sell), Party (e.g., Individual). Relations: permission, prohibition, duty, constraint, function (e.g., assignee), action, target, output.]
Leveraging OBIE to classify EULAs into predefined categories
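The ODRL classes named on this slide can be approximated with a few plain data classes. This is only a minimal sketch and not ODRL's RDF vocabulary: the class and field names (Rule, Policy, kind, assignee, rules_of_kind) are illustrative assumptions, and the sample policy is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """One rule of a policy (hypothetical simplification of an ODRL Rule)."""
    kind: str                        # "permission", "prohibition" or "duty"
    action: str                      # e.g. "sell", "copy"
    target: str                      # the Asset the action applies to
    assignee: Optional[str] = None   # the Party acting in the assignee function
    constraints: List[str] = field(default_factory=list)


@dataclass
class Policy:
    """A policy groups rules, e.g. a policy of type "Request"."""
    policy_type: str
    rules: List[Rule] = field(default_factory=list)

    def rules_of_kind(self, kind: str) -> List[Rule]:
        # Select only permissions, prohibitions or duties.
        return [rule for rule in self.rules if rule.kind == kind]


# Invented sample data, not taken from any real EULA.
policy = Policy("Request", [
    Rule("permission", "copy", "the product", assignee="you"),
    Rule("prohibition", "sell", "the product", assignee="you"),
])
permitted_actions = [rule.action for rule in policy.rules_of_kind("permission")]
```

Classifying an EULA into the predefined categories then amounts to filling such a structure with one rule per extracted permission, prohibition or duty.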
Black Box Architecture
[Figure: EULAide as a black box. Inputs: EULA, Ontology. Output annotation types: Permission, Prohibition, Duty.]
Architecture of EULAide
[Figure: GATE EULA OBIE Pipeline. Components: Linguistic Pre-Processing (Tokeniser, Sentence Splitter, POS Tagger, Morphological Analyser), Gazetteer, ODRL Enhancement, Ontology-based Gazetteer, EULA OBIE Transducer, User Interface. Data labels: EULA, Ontology, enhanced ODRL Ontology, pre-processed EULA, annotated concepts. Annotation types: Permission, Prohibition, Duty.]
EULA OBIE Transducer
• JAPE transducer based on the ODRL community specification documentation
Example of Permission Rules
Phase | Rule | Example
AnnotateClasses | PermissionAction | Copy, Reproduce, Delete
AnnotateClasses | PermissionWords | May, grant, allow
ExtractPermissions | [Subj][PermissionWords][PermissionAction]+[Asset] | [You][May][copy, share and reproduce][the product]
ExtractPermissions | [License][PermissionWords][object][PermissionAction]+[Asset] | [This license] [grants] [you] [to copy, share and reproduce] [the product]
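The two ExtractPermissions patterns can be imitated with regular expressions to see how the slots line up. This is a rough sketch in Python, not the JAPE grammar used in EULAide: the word lists, the regex encoding and the function name extract_permission are all assumptions made for illustration.

```python
import re

# Hypothetical word lists standing in for the AnnotateClasses phase.
PERMISSION_WORDS = ["may", "grants", "grant", "allows", "allow"]
PERMISSION_ACTIONS = ["copy", "share", "reproduce", "delete"]


def _alt(words):
    # Longest alternatives first, so "grants" is tried before "grant".
    return "|".join(sorted(map(re.escape, words), key=len, reverse=True))


WORD = rf"(?P<word>{_alt(PERMISSION_WORDS)})"
ACTION = rf"(?:{_alt(PERMISSION_ACTIONS)})"
# One or more actions joined by ",", "and" or ", and".
ACTION_LIST = rf"(?P<actions>{ACTION}(?:(?:\s*,\s*and\s+|\s*,\s*|\s+and\s+){ACTION})*)"
ASSET = r"(?P<asset>.+?)\.?$"

# [License][PermissionWords][object][PermissionAction]+[Asset]
RULE_LICENSE = re.compile(
    rf"^this license\s+{WORD}\s+(?P<obj>\w+)\s+(?:to\s+)?{ACTION_LIST}\s+{ASSET}",
    re.IGNORECASE,
)
# [Subj][PermissionWords][PermissionAction]+[Asset]
RULE_SUBJECT = re.compile(
    rf"^(?P<subj>\w+)\s+{WORD}\s+{ACTION_LIST}\s+{ASSET}",
    re.IGNORECASE,
)


def extract_permission(sentence):
    """Return the permitted actions and the asset, or None if no rule fires."""
    for rule in (RULE_LICENSE, RULE_SUBJECT):
        match = rule.match(sentence.strip())
        if match:
            actions = re.findall(ACTION, match.group("actions"), re.IGNORECASE)
            return {"actions": [a.lower() for a in actions],
                    "asset": match.group("asset")}
    return None


first = extract_permission("You may copy, share and reproduce the product.")
second = extract_permission(
    "This license grants you to copy, share and reproduce the product.")
```

Unlike this sketch, a JAPE rule matches over annotations produced by earlier phases rather than over raw text, which is why the class annotation phase runs first.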
EULA OBIE Transducer
Example of Permission
Sentence: This license grants you to copy, share and reproduce the product.
• Text-file Gazetteer: "This license" (License), "the product" (Asset)
• Ontology-based Gazetteer: (Lookups)
• annotateClasses phase: "grants" (PermWords), "copy, share and reproduce" (PermissionActions)
• extractPermissions: "This license" (License), "grants" (PermWords), "you" (obj), "to copy, share and reproduce" (PermissionActions)+, "the product" (Asset)
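The gazetteer step for this sentence can be pictured as a dictionary lookup that attaches typed annotations with character offsets to the text. The sketch below is a plain-Python approximation, not GATE's gazetteer: the GAZETTEER contents, the Annotation tuple and the lookup function are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    kind: str    # annotation type, e.g. "License" or "Asset"
    text: str    # the matched surface string


# Hypothetical gazetteer lists; a real pipeline would load these from
# text files and from ontology labels.
GAZETTEER: Dict[str, List[str]] = {
    "License": ["this license"],
    "PermWords": ["grants", "may", "allow"],
    "PermissionAction": ["copy", "share", "reproduce"],
    "Asset": ["the product"],
}


def lookup(text: str) -> List[Annotation]:
    """Find every gazetteer phrase in the text, ordered by position."""
    found = []
    for kind, phrases in GAZETTEER.items():
        for phrase in phrases:
            pattern = rf"\b{re.escape(phrase)}\b"
            for match in re.finditer(pattern, text, re.IGNORECASE):
                found.append(
                    Annotation(match.start(), match.end(), kind, match.group()))
    return sorted(found)


sentence = "This license grants you to copy, share and reproduce the product."
annotations = lookup(sentence)
kinds = [a.kind for a in annotations]
```

Later phases can then match over the sequence of annotation kinds instead of the raw characters.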
Implementation of EULAide using GATE
GATE: General Architecture for Text Engineering
• Open source software
• University of Sheffield
• Written in Java
• Initial release: 1995
We chose GATE for:
• Its ANNIE IE system & its support for JAPE grammar rules
• Excellent support for OBIE approaches
• Support for evaluation tools
• Embedded API
Gold Standard Creation
• 20 popular licenses including 193 permissions, 185 prohibitions & 168 duties
• Average words: 3,206
• Average characters without spaces: 16,815
• Using the GATE IAA plugin
IAA (Inter-Annotator Agreement) for two annotators
Annotation type | Precision | Recall | F-measure
Permission | 0.94 | 0.9 | 0.92
Prohibition | 0.79 | 0.94 | 0.86
Duty | 0.86 | 0.96 | 0.91
Summary | 0.87 | 0.93 | 0.9
Evaluation: GATE Corpus Quality Assurance
• Apply EULAide to the gold standard
• F-measure calculation for two conditions
Evaluation of EULAide with Ontology Enhancement
Annotation type | Precision | Recall | F0.5 | F1 | F2
Permission | 0.74 | 0.75 | 0.74 | 0.74 | 0.75
Prohibition | 0.89 | 0.63 | 0.82 | 0.74 | 0.66
Duty | 0.66 | 0.67 | 0.67 | 0.67 | 0.67
Summary | 0.75 | 0.68 | 0.74 | 0.72 | 0.7
Evaluation of EULAide without Ontology Enhancement
Annotation type | Precision | Recall | F0.5 | F1 | F2
Permission | 0.75 | 0.56 | 0.71 | 0.64 | 0.59
Prohibition | 0.89 | 0.47 | 0.75 | 0.61 | 0.52
Duty | 0.73 | 0.43 | 0.64 | 0.54 | 0.46
Summary | 0.79 | 0.49 | 0.7 | 0.6 | 0.53
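The F0.5, F1 and F2 columns follow from precision and recall through the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The sketch below recomputes one Permission row (precision 0.74, recall 0.75) from the slide; the function name f_beta is an assumption. Because the slide reports rounded precision and recall, other rows may differ from a recomputation by 0.01.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily, beta > 1 weights recall more.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Permission row with precision 0.74 and recall 0.75.
p, r = 0.74, 0.75
scores = {beta: round(f_beta(p, r, beta), 2) for beta in (0.5, 1.0, 2.0)}
# scores == {0.5: 0.74, 1.0: 0.74, 2.0: 0.75}
```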
Evaluation – Example of Failure
Permission:
• False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
• False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”
Example: Facebook EULA
Conclusions
[Figure: GATE EULA OBIE Pipeline architecture diagram, repeated from the Architecture of EULAide slide.]
• Identification of ontologies and vocabularies in the EULA domain
• EULAide: semi-automatic ontology-based annotation of EULAs
• Ontology enhancement by adding additional concepts
• IAA of 90%
• F-measure of more than 70%
• Applying the pipeline to different kinds of front ends (using the GATE API)
Future Works
• Extract more policies and rights (e.g., agreements, constraints)
• Compilation of a more comprehensive ontology based on ODRL
• Combining different IE methods
• Implementation of a RESTful application in Java
• Planning to design a mobile app
• Integrating GATE with Stanford NLP for more accurate extraction (e.g., using coreference)
Email: [email protected]
References
• All images are from Pixabay and are released into the public domain under Creative Commons CC0.
Backup Slides
• The GATE IAA plugin offers two types of IAA measurement: F-measure and agreement based on the kappa statistic. The latter has been criticized and has a number of well-known limitations. Kappa is suitable when the annotators label the same set of instances and differ only in the class labels they assign. It is not recommended for text mark-up tasks such as named entity recognition and information extraction [9]. When the annotators themselves determine which text spans to annotate, the F-measure should be used. The F-measure has been less controversial and, given the nature of our annotation task, is also indicated as the most appropriate IAA measure in the GATE manual itself [5].
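Why the F-measure fits span annotation can be seen in a small sketch: one annotator's spans are treated as the key, the other's as the response, and agreement is computed over the spans each of them chose to mark. The strict matching rule (same offsets and same type), the function name span_agreement and the sample spans are assumptions made for this illustration, not the plugin's implementation.

```python
def span_agreement(key, response):
    """Precision, recall and F1 of `response` spans against `key` spans.

    Each span is a (start, end, type) tuple. A span counts as matched only
    if offsets and type are identical (strict matching, assumed here).
    """
    key, response = set(key), set(response)
    matched = len(key & response)
    precision = matched / len(response) if response else 0.0
    recall = matched / len(key) if key else 0.0
    if matched == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented spans: the annotators agree on two spans and disagree on the rest.
annotator_a = {(0, 40, "Permission"), (55, 90, "Duty"), (120, 160, "Prohibition")}
annotator_b = {(0, 40, "Permission"), (55, 90, "Duty"),
               (200, 230, "Duty"), (300, 320, "Permission")}
precision, recall, f1 = span_agreement(annotator_a, annotator_b)
```

No fixed set of instances is needed here, which is the situation kappa cannot handle: the two annotators mark different numbers of spans, and the measure still applies.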
Evaluation – Example of Failure
Permission:
• False positive: “nothing other than this License grants you permission to propagate or modify any covered work.”
• False negative: “The number of permitted participants on a group video call varies from 3 to a maximum of 10, subject to system requirements”
Prohibition:
• False positive: “Do not use such Services in a way that distracts you and prevents you from obeying traffic or safety laws.”
• False negative: “No one other than Sun has the right to modify the terms applicable to Covered Code created under this License”
Duty:
• False positive: “Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program”
• False negative: “The GNU GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions”