TRANSCRIPT
Machine Learning Applied to Legal Practice
Romain Vial, Data Scientist @ Hyperlex
NLP Meetup Season 3 #3 - 23/01/2019
Romain Vial – NLP Meetup S3#3 – 23/01/2019
We create the AI which is able to search, understand and analyse legal terms and financial
data within millions of legal documents
Who is Hyperlex?
Use Case: Due diligence
During a due diligence (1,000+ docs), I would like to find all companies that signed an NDA with my client
● Retrieve all NDAs (document classification)
● Browse among particular clauses (document segmentation / clause classification)
● Look for companies (NER / disambiguation / Knowledge Graph)
NDA
Document recognition
Clause recognition
Key element detection
OCR & image processing
Import in different formats (pdf, docx, …)
How do we help you tackle this workload?
Main NLP Tasks @ Hyperlex
Document classification: Lease · Loan · NDA
Paragraph classification: Governing Law · Confidentiality · Severability
Named Entity Recognition: Organisation · Date · Duration
Main NLP Tasks @ Hyperlex: The standard classification pipeline in NLP
[Pipeline: input text → Representation → Classification → Lease / Governing Law / Date]
Main NLP Tasks @ Hyperlex: Representation is the key!
All the complexity lies in the representation of the input words/sentences/documents
● Labelled data is scarce
● Unlabelled data is not (or less)
● Representations can be shared among clients, classifiers cannot
Unsupervised methods are crucial
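As a toy illustration of this split, a shared representation can feed a lightweight per-client classifier. The vocabulary, labels and nearest-centroid classifier below are illustrative assumptions, not Hyperlex's actual models:

```python
# Toy representation → classification pipeline (illustrative only).

VOCAB = ["lease", "loan", "confidential", "disclose", "premises"]

def represent(text):
    """Shared representation: bag-of-words counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def train_centroids(labelled_docs):
    """Client-specific part: average the representation of each class."""
    sums, counts = {}, {}
    for text, label in labelled_docs:
        vec = represent(text)
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(centroids, text):
    """Assign the label whose centroid is closest to the document vector."""
    vec = represent(text)
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

Only `represent` can be shared among clients and pre-trained on unlabelled text; the small classifier on top is what each client's scarce labels have to pay for.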
How to learn representations in an unsupervised fashion?
Problem: the task is undefined
Goal: we want to learn features that will generalize well to many downstream tasks
Document classification · Paragraph classification · Named Entity Recognition
Unsupervised Methods in NLP: History
M. Ranzato. Unsupervised Learning Tutorial Part 2. NIPS 2018
Unsupervised Methods in NLP: Word Vectors
[Figure: word embedding space with “Confidential”, “Personal” and “cat”]
“The Issuer hereby agrees to hold and treat all Confidential Information”
Main conclusions:
- A word can be defined by its context!
- Two words are similar when they have similar contexts
Unsupervised Methods in NLP: Word Vectors
In practice, you learn representations that are good at predicting nearby words.
Such embeddings allow for computing semantic similarities!
T. Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013
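The context idea can be made concrete without any neural network: count the context words around each word and compare the resulting vectors by cosine similarity. Word2vec learns dense embeddings that behave like a compressed version of these counts; the toy corpus below is invented for illustration:

```python
# "A word is defined by its context": count-based sketch of word similarity.
import math
from collections import Counter, defaultdict

corpus = [
    "hold and treat all confidential information",
    "hold and treat all personal information",
    "the cat sat on the mat",
]

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen around it."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

vecs = context_vectors(corpus)
# "confidential" and "personal" share their contexts; "cat" does not
```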
Unsupervised Methods in NLP: Word Vectors
Unsupervised Methods in NLP: Word Vectors
Challenges:
- Learning needs a large amount of data
- How does one handle polysemous words like “party”?
- Word embeddings are poor at describing sentences: the signal becomes too noisy
Unsupervised Methods in NLP: Contextualizing your Word Vectors
“The Issuer hereby agrees to hold and treat all Confidential Information”
J. Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Pre-print 10/2018
1. Truly rely on sentence compositionality
Unsupervised Methods in NLP: Contextualizing your Word Vectors
From shallow to deep representations: each word is encoded via a sequence of computational blocks
Unsupervised Methods in NLP: Contextualizing your Word Vectors
The representation of a word at a given layer depends on all the contextualized words of the previous layer.
Use self-attention to both handle variable-length sentences and contextualize with respect to all the other words!
Unsupervised Methods in NLP: Contextualizing your Word Vectors
1. Given current representations: h_1, …, h_n
2. Compute similarity scores: a_ij = softmax_j(h_i · h_j)
3. Compute weighted sum: h'_i = Σ_j a_ij h_j
A. Vaswani et al. Attention is all you need. NIPS 2017
D. Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015
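Spelled out as code, the three steps above amount to the following single-head, projection-free sketch (the full Transformer additionally learns query/key/value projections and uses multiple heads):

```python
# Minimal self-attention: each output vector is a softmax-weighted sum
# of all input vectors, so sentence length can vary freely.
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(h):
    """Contextualize each vector h_i with respect to every h_j.

    h: list of n vectors (lists of floats), all of dimension d.
    """
    d = len(h[0])
    out = []
    for hi in h:
        # 2. similarity scores: scaled dot products, normalized by softmax
        weights = softmax([sum(a * b for a, b in zip(hi, hj)) / math.sqrt(d)
                           for hj in h])
        # 3. weighted sum over all input vectors
        out.append([sum(w * hj[k] for w, hj in zip(weights, h))
                    for k in range(d)])
    return out
```

Because the weights come from dot products rather than positions, the same code handles a 5-word clause and a 100-word clause.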
Unsupervised Methods in NLP: Contextualizing your Word Vectors
2. Find interesting unsupervised tasks
“The Issuer hereby agrees to hold and treat all Confidential Information”
a. Masked Language Model
“The Issuer hereby agrees to [...]” || “This Agreement shall terminate [...]”
b. Next sentence prediction
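Both objectives only need raw text. A simplified sketch of how such training examples can be built (assuming whitespace tokenization and a single [MASK] strategy, unlike the paper's 80/10/10 replacement scheme):

```python
# Toy construction of BERT's two unsupervised training tasks.
import random

def make_masked_lm_example(tokens, mask_rate=0.15, rng=random):
    """a. Masked LM: hide ~15% of tokens; the originals become targets."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # the model must recover this token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

def make_next_sentence_example(sent_a, sent_b, corpus_sentences, rng=random):
    """b. Next sentence prediction: keep the true next sentence (label 1)
    half of the time, otherwise sample a random one (label 0)."""
    if rng.random() < 0.5:
        return (sent_a, sent_b), 1
    return (sent_a, rng.choice(corpus_sentences)), 0
```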
Unsupervised Methods in NLP: Contextualizing your Word Vectors
3. Hope you still have those cloud TPU credits
Dataset: BookCorpus (800M words) + English Wikipedia (2500M words)
According to the paper: English models took 4 days to pre-train on 16 to 64 TPUs (~500 USD for a BERT-base model)
English + multilingual models released by Google
Unsupervised Methods in NLP: Contextualizing your Word Vectors
New SOTA on the GLUE benchmark (10 varied sentence or sentence-pair language understanding tasks)
Unsupervised Methods in NLP: One model to rule them all?
1. Start from a general representation trained on a large corpus of contracts
2. Finetune the representation on a smaller corpus of contracts more related to the task
3. Let transfer learning do the magic!
[Pipeline: input text → Representation → Classification → Lease / Governing Law / Date]
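Step-wise, this looks like training a small head on top of frozen features. In the sketch below, `extract` is a hypothetical toy stand-in for the pretrained representation, and the head is a logistic regression trained by gradient descent:

```python
# Transfer-learning sketch: frozen extractor + small trainable head.
# `extract` is an invented toy featurizer, NOT a real pretrained model.
import math

def extract(text):
    """Hypothetical frozen representation: bias + crude cue features."""
    t = text.lower()
    return [1.0, float("lease" in t), float("confidential" in t)]

def train_head(examples, epochs=200, lr=0.5):
    """Logistic-regression head trained on top of the frozen features."""
    dim = len(extract(examples[0][0]))
    w = [0.0] * dim
    for _ in range(epochs):
        for text, label in examples:
            x = extract(text)
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i in range(dim):
                w[i] += lr * (label - p) * x[i]   # gradient step on the head only
    return w

def predict(w, text):
    x = extract(text)
    return int(sum(wi * xi for wi, xi in zip(w, x)) > 0)
```

Replacing `extract` by a finetuned BERT encoder is exactly the "joint BERT + classifier" variant discussed below; here only the head's weights move.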
Unsupervised Methods in NLP: Our feedback on BERT
● Quite fast to finetune from BERT-base (minutes to hours)
● Finetuning on the training corpus is needed (compared to finetuning only on a general corpus)
● Finetuning only the extractor is already enough, but jointly learning BERT + classifier helps a little more
● More experiments should be done with >128 tokens and BERT-large
● Need to evaluate the performance/price ratio before pushing it to production
Unsupervised Methods in NLP
Take-home message:
Sentence representation starts to be well understood empirically
Large document representation is still an open (and interesting) problem!
For one contract, we can extract dozens to hundreds of legal entities and clauses!
How to go from simple predictions to knowledge?
[Annotated contract: effective date — 23/01/2019 · termination date — 23/01/2022 · organisation · person — John Doe]
From predictions to knowledge: Look at your contract!
[Annotations now linked by relations: LEGAL REPRESENTATIVE (person — organisation), CONTRACT DURATION (effective date — termination date)]
From predictions to knowledge: Look at your contract!
From predictions to knowledge: Look at your contract!
Two methods to extract knowledge:
- contextual (contextualize your entities to understand their type and relations)
- business rules (introduce some prior knowledge and business constraints)
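A business rule can be as simple as deriving a contract duration from two extracted entities and rejecting incoherent date pairs. The rule and entity names below are illustrative:

```python
# Toy business rule (assumed): if a contract has both an "effective date"
# and a "termination date" entity, derive the CONTRACT DURATION relation
# and flag incoherent date pairs.
from datetime import date

def contract_duration(entities):
    """entities: dict mapping entity type -> extracted value."""
    start = entities.get("effective date")
    end = entities.get("termination date")
    if start is None or end is None:
        return None                      # the rule does not apply
    if end <= start:                     # prior knowledge: must end after it starts
        raise ValueError("termination date precedes effective date")
    return (end - start).days

days = contract_duration({
    "effective date": date(2019, 1, 23),
    "termination date": date(2022, 1, 23),
})  # a three-year contract
```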
From predictions to knowledge: Look at your contracts!
Why should we only look at the current contract, when the information has probably been seen elsewhere?
[Entity mentions across the contract base: organisation — Big Corporation · person — John Doe · person — Mr. John Doe · relation: legal representative]
From predictions to knowledge: Look at your contracts!
[Across contracts, mentions get linked: “John Doe” — Same as — “Mr. John Doe”; a “legal representative” edge connects the person to Big Corporation]
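The “Same as” link can start from a simple heuristic: normalize mentions before comparing them across contracts. The honorific list below is an illustrative assumption, not Hyperlex's actual linking logic:

```python
# Toy "Same as" linking: drop honorifics and case before comparing mentions.

HONORIFICS = {"mr", "mrs", "ms", "dr", "me"}  # "me" = French "maître"

def normalize(mention):
    """Lowercase, strip periods and drop honorific words."""
    words = mention.replace(".", " ").lower().split()
    return " ".join(w for w in words if w not in HONORIFICS)

def same_as(a, b):
    """True when two mentions likely denote the same person."""
    return normalize(a) == normalize(b)
```

In practice this is only a first filter; context around the mentions is still needed to disambiguate different people with the same name.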
From predictions to knowledge: Look at your contracts!
Knowledge Graph construction by distant supervision
C. Lockard et al. CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web. VLDB 2018.
From predictions to knowledge: Look at your contracts!
Knowledge and relations can be inferred from our previous contracts!
Two nice byproducts:
- The contract base becomes queryable: “Hyperlex, give me the counterparties of the company Big Corporation”
- Inconsistencies can be spotted
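Once relations are stored as triples, the counterparty query becomes a simple lookup. The schema and the company names below are illustrative, not Hyperlex's actual ontology:

```python
# Minimal knowledge graph: facts as (subject, relation, object) triples
# inferred from contracts; a query walks the triples. All names invented.

TRIPLES = [
    ("Big Corporation", "counterparty", "ACME SAS"),
    ("Big Corporation", "counterparty", "Globex Ltd"),
    ("John Doe", "legal representative", "Big Corporation"),
    ("Mr. John Doe", "same as", "John Doe"),
]

def query(subject, relation):
    """Return all objects o such that (subject, relation, o) is known."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

# "Give me the counterparties of the company Big Corporation"
counterparties = query("Big Corporation", "counterparty")
```

Spotting inconsistencies then amounts to checking the triples against rules, e.g. a person should not be the legal representative of both parties to the same contract.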
From predictions to knowledge: Define your own ontology!
Client 1: Lawyer
- Few contract types
- Possibly huge volume, but by flow
- Managing needs (termination date, notice…)

Client 2: Logistics
- Lots of contract types
- Possibly huge volume, but by batch
- No need to manage the contract

User-defined ontology
From predictions to knowledge: Define your own ontology!
While users may look different, they probably share some common clauses (parties, duration, jurisdiction...) and entities (effective date, termination date…).
When possible, we suggest that users pick labels from our legal ontology, to further improve model performance.
Hyperlex Legal ontology
What’s next?
- Going deeper in contract understanding/contract summarization
What should the Lessee pay/do? What should the Lessor pay/do?
- Going broader in client’s contract base understanding
Bringing external knowledge to the contract base (SIREN, Légifrance, EDGAR...)
“The Lessee shall keep the leased premises in perfect condition at all times [...]” / “The Lessor undertakes to bear the cost of any works made necessary [...]”
Am I impacted by a change of regulation in this jurisdiction?