
Page 1: Efficient labeling with Active Learning

François HU, ENSAE - Société Générale Insurance

Joint work with Caroline HILLAIRET (CREST-ENSAE), Marc JUILLARD (Société Générale Insurance) and Romuald ELIE (CREST-ENSAE)

Online International Conference in Actuarial Science, Data Science and Finance (OICA), April 28, 2020

Page 2: Summary

1. Context: motivating application, notation, intuition
2. Active learning: uncertainty-based active learning, disagreement-based active learning, more algorithms
3. Experimentations: data, active learning, mini-batch active learning

Page 3: Motivating application

Insurance organisations store voluminous textual data on a daily basis:
- free text areas used by call center agents,
- e-mails,
- customer reviews, ...

These textual data are valuable and can be used in many use cases...
- optimize business processes,
- analyze customer expectations and opinions,
- control compliance (GDPR type) and fight against fraud, ...

... however:
- it is impossible for human experts to analyse all of this material,
- and the data usually comes unlabelled.

Solution: exploit this large pool of unlabelled data with Active Learning.

Page 7: Notation and goal

Notations:
- Let $\mathcal{X}$ be the instance space, $\mathcal{Y}$ the label space and $\mathcal{H}: \mathcal{X} \to \mathcal{Y}$ a class of hypotheses with finite VC dimension $d$.
- Let $P$ be the distribution over $\mathcal{X} \times \mathcal{Y}$ and $P_X$ the marginal of $P$ over $\mathcal{X}$. In practice, instead of $P_X$ we have a pool of unlabeled data $\mathcal{U} = (x_i^{(pool)})_{i=1}^{U}$.

Goal: label a sub-sample of $\mathcal{U}$ in order to construct an optimal training set $\mathcal{L} = \{(x_i^{(train)}, y_i^{(train)})\}_{i=1}^{L}$ for our learning algorithm $A$ (which gives us $h \in \mathcal{H}$).

For any $h \in \mathcal{H}$, define:
- Risk: $R(h) = P(h(x) \neq y)$
- Empirical risk: $R_{\{(x_i, y_i)\}_{i=1}^{n}}(h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(h(x_i) \neq y_i)$

Given a holdout set (or test set) $\mathcal{T} = \{(x_i^{(test)}, y_i^{(test)})\}_{i=1}^{T}$, our aim is to produce a highly-accurate classifier (i.e. minimize $R_{\mathcal{T}}(h)$) using as few labels as possible.
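As a quick illustration (not from the slides), the empirical risk above is just the misclassification rate on a labeled sample; a minimal sketch in Python, with illustrative names:

```python
import numpy as np

def empirical_risk(y_pred, y_true):
    """R_hat(h) = (1/n) * sum_i 1(h(x_i) != y_i), the misclassification rate."""
    return float(np.mean(np.asarray(y_pred) != np.asarray(y_true)))
```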


Page 11: Passive Learning: a naive solution

Passive Learning: sample $x_1^{(train)}, \dots, x_L^{(train)} \overset{i.i.d.}{\sim} P_X$, then request their labels.

Figure: an illustration of conventional passive learning

Page 12: Active Learning: a better solution

Active Learning: let a learning algorithm sequentially request the labels of instances in $\mathcal{U}$.

Figure: an illustration of conventional active learning

Page 13: Active Learning: a better solution

[Hanneke, 14]: let $h^*$ be the optimal Bayes classifier and $\varepsilon = R(h) - R(h^*)$ the excess risk.

Then, under a bounded-noise assumption and for the $A^2$ active learning algorithm:

- passive learning: $\varepsilon \sim \frac{d}{n}$
- active learning: $\varepsilon \sim \exp\left(-c \cdot \frac{n}{d}\right)$ for some constant $c$
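Read as label complexity, these rates say how many labels are needed to hit a target excess risk; a rough inversion (my gloss, not on the slide):

```latex
% Inverting the two rates above: labels n needed to reach excess risk epsilon.
\text{passive: } \varepsilon \sim \frac{d}{n} \;\Longrightarrow\; n \sim \frac{d}{\varepsilon},
\qquad
\text{active: } \varepsilon \sim e^{-c\,n/d} \;\Longrightarrow\; n \sim \frac{d}{c}\,\log\frac{1}{\varepsilon}.
```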

Page 14: Active learning (section outline)

2. Active learning: uncertainty-based active learning, disagreement-based active learning, more algorithms

Page 15: Uncertainty Sampling

Uncertainty Sampling: label the instances for which the current model is least certain as to what the correct output should be.

Example: for binary classification, label the instances whose posterior probability of being positive is nearest to 0.5:

$$x^{(train)} = \arg\min_{x \in \mathcal{U}} \left| P(y = 1 \mid x) - 0.5 \right|$$

Entropy-based active learning [Lewis and Gale, 94]:

$$x_H^{(train)} = \arg\max_{x \in \mathcal{U}} \left( -\sum_{y \in \mathcal{Y}} P(y \mid x) \log P(y \mid x) \right)$$
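A minimal sketch of the entropy criterion above, assuming a fitted scikit-learn-style classifier with predict_proba; function and variable names are illustrative, not from the talk:

```python
import numpy as np

def entropy_query(model, X_pool, k=1):
    """Return the indices of the k pool instances with the highest
    predictive entropy -sum_y P(y|x) log P(y|x)."""
    proba = model.predict_proba(X_pool)                        # shape (n_pool, n_classes)
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)   # epsilon guards log(0)
    return np.argsort(entropy)[-k:]                            # k largest entropies
```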

Page 16: Uncertainty Sampling

$$x_H^{(train)} = \arg\max_{x \in \mathcal{U}} \left( -\sum_{y \in \mathcal{Y}} P(y \mid x) \log P(y \mid x) \right)$$

Figure: entropy score for two instances: $x$ (left) and $x'$ (right)

Page 17: Query By Committee

Query By Committee: construct a committee of models $C = \{h_1, \dots, h_N\}$ trained on the current labeled data $\mathcal{L}$. The most informative query is considered to be the instance about which the committee disagrees the most.

Building the committee $C = \{h_1, \dots, h_N\}$:
- Query by bagging (Qbag) [Abe and Mamitsuka, 98]: bootstrap $\mathcal{L}$ $N$ times, then train a learning algorithm on each bootstrapped dataset.

Measure of disagreement:
- Average Kullback-Leibler divergence [McCallum and Nigam, 98]:

$$x_{KL}^{(train)} = \arg\max_{x \in \mathcal{U}} \left\{ \frac{1}{N} \sum_{i=1}^{N} D\left(P_{h_i} \,\|\, P_{committee}\right) \right\}$$

where

$$D\left(P_{h_i} \,\|\, P_{committee}\right) = \sum_{y \in \mathcal{Y}} P_{h_i}(y \mid x) \log \frac{P_{h_i}(y \mid x)}{P_{committee}(y \mid x)}$$
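A minimal sketch of query by bagging with the average-KL disagreement above, assuming scikit-learn-style estimators with predict_proba; the committee size, the names, and the assumption that every bootstrap sample contains all classes are illustrative, not from the talk:

```python
import numpy as np
from sklearn.base import clone
from sklearn.utils import resample

def qbag_query(base_model, X_train, y_train, X_pool, n_committee=5, k=1):
    """Return the k pool indices with the largest average KL divergence
    between each committee member and the committee's mean distribution."""
    probas = []
    for _ in range(n_committee):
        X_b, y_b = resample(X_train, y_train)          # bootstrap the labeled set L
        member = clone(base_model).fit(X_b, y_b)       # train one committee member
        probas.append(member.predict_proba(X_pool))
    probas = np.stack(probas)                          # shape (N, n_pool, n_classes)
    consensus = probas.mean(axis=0)                    # P_committee(y | x)
    kl = np.sum(probas * np.log((probas + 1e-12) / (consensus + 1e-12)), axis=2)
    return np.argsort(kl.mean(axis=0))[-k:]            # average KL over the N members
```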

Page 18: Query By Committee

$$x_{KL}^{(train)} = \arg\max_{x \in \mathcal{U}} \left\{ \frac{1}{N} \sum_{i=1}^{N} D\left(P_{h_i} \,\|\, P_{committee}\right) \right\}$$

Figure: distribution of the committee for two instances: $x$ (left) and $x'$ (right)

Page 19: Other types of Active Learning

- Another disagreement-based active learning method: Agnostic Active Learning ($A^2$) [Hanneke, 14]
- Expected Model Change: sample the instances that would cause the greatest change to the current model if we knew their labels (example: "expected gradient length" (EGL) [Settles et al, 08])
- Expected Error Reduction: sample the instances that are most likely to reduce the generalization error
- Density Weighted Sampling [Settles and Craven, 08] (see the sketch after this list):

$$x_{density}^{(train)} = \arg\max_{x \in \mathcal{U}} \left\{ \phi_A(x) \times \left( \frac{1}{U} \sum_{x' \in \mathcal{U}} sim(x, x') \right) \right\}$$

with $\phi_A$ a measure of the informativeness of $x$ according to some sampling strategy $A$ (uncertainty sampling, QBC, ...).
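A minimal sketch of density-weighted sampling, using cosine similarity as $sim(x, x')$ and any precomputed informativeness score $\phi_A$ (e.g. the entropy from the earlier sketch); the similarity choice and names are assumptions, not from the talk:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def density_weighted_query(phi, X_pool, k=1):
    """phi[i] is the informativeness of pool instance i; each score is
    re-weighted by the instance's average similarity to the rest of the pool."""
    density = cosine_similarity(X_pool).mean(axis=1)   # (1/U) * sum_{x'} sim(x, x')
    return np.argsort(phi * density)[-k:]
```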

Page 20: Experimentations (section outline)

3. Experimentations: data, active learning, mini-batch active learning

Page 21: Text data: Net Promoter Score (NPS)

About the data: Net Promoter Score
1. Score: the client's score for an insurance product (between 0 and 10)
2. Verbatim: the client's explanation (in French) of the score

Encoding the text data: word2vec [Mikolov 2013]

Sentiment analysis: $\mathcal{Y} = \{0, 1\} = \{\text{score} \leq 6, \text{score} > 6\}$

Mini-batch sampling algorithm (see the sketch below): until we reach a stopping criterion,
1. train our model $h$ on the training set $\mathcal{L}$
2. select the $k$ most informative samples $x_1^{(train)}, \dots, x_k^{(train)}$ from the pool set $\mathcal{U}$
3. set $\mathcal{U} \leftarrow \mathcal{U} \setminus \{x_1^{(train)}, \dots, x_k^{(train)}\}$ and $\mathcal{L} \leftarrow \mathcal{L} \cup \{x_1^{(train)}, \dots, x_k^{(train)}\}$ with their requested labels
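A minimal end-to-end sketch of the mini-batch loop above with entropy sampling and XGBoost, the model family used in the experiments. The word2vec averaging, the oracle array standing in for a human annotator, and all sizes are illustrative assumptions, not the talk's exact pipeline:

```python
import numpy as np
import xgboost as xgb
from gensim.models import Word2Vec

def encode(docs, w2v):
    """Encode each pre-tokenised verbatim as the mean of its word2vec vectors."""
    dim = w2v.vector_size
    return np.array([
        np.mean([w2v.wv[w] for w in doc if w in w2v.wv] or [np.zeros(dim)], axis=0)
        for doc in docs
    ])

def minibatch_active_learning(X_lab, y_lab, X_pool, y_oracle, k=200, budget=6000):
    """Grow the training set L in batches of the k highest-entropy pool points."""
    model = xgb.XGBClassifier(n_estimators=100)
    while len(y_lab) < budget and len(X_pool) > 0:
        model.fit(X_lab, y_lab)                            # 1. train h on L
        proba = model.predict_proba(X_pool)                # 2. score the pool U
        entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
        idx = np.argsort(entropy)[-k:]                     #    k most informative
        X_lab = np.vstack([X_lab, X_pool[idx]])            # 3. L <- L + batch, with
        y_lab = np.concatenate([y_lab, y_oracle[idx]])     #    labels from the oracle
        X_pool = np.delete(X_pool, idx, axis=0)            #    U <- U \ batch
        y_oracle = np.delete(y_oracle, idx)
    return model
```

Replacing the entropy line with a random permutation of the pool gives the passive-learning baseline of the next slide.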

Page 22: Passive Learning

- Learning model: XGBoost
- Sampling strategy: random sampling
- Initial training size / mini-batch size: 200 / 100
- Stopping criterion: 25,000 labels

Page 23: Active Learning

- Learning model: XGBoost
- Sampling strategies:
  - random sampling
  - uncertainty-based sampling (Entropy, Margin)
  - disagreement-based sampling (Qbag)
  - density-based sampling (DensityWeighted)
- Initial training size / mini-batch size: 800 / 200
- Stopping criterion: 6,000 labels

Page 24: Mini-batch active learning

- Learning model: XGBoost
- Sampling strategies: random sampling, entropy sampling
- Initial training size / mini-batch size: 800 / (50, 100, 200)
- Stopping criterion: 6,000 labels

Page 25: Mini-batch active learning

- Learning model: XGBoost
- Sampling strategy: entropy sampling
- Initial training size / mini-batch size: 1,000 / (200, 2000)
- Stopping criterion: 13,000 labels

Page 26: Conclusion

For a real text database:

- We can construct a good classifier once labeled data is available ⇒ power many use cases.
- Compared to passive learning, active sampling can construct a more accurate classifier ⇒ reduce the cost of annotation (here by a factor of at least 4).
- In this context, the mini-batch size can vary between 1 and 2,000 ⇒ speed up the annotation process.

Page 27: References

1. Steve Hanneke, "Theory of Active Learning", 2014.
2. Naoki Abe and Hiroshi Mamitsuka, "Query Learning Strategies using Boosting and Bagging", 1998.
3. D. Lewis and W. Gale, "A sequential algorithm for training text classifiers", ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.
4. A. McCallum and K. Nigam, "Employing EM in pool-based active learning for text classification", ICML Conference, 1998.
5. B. Settles and M. Craven, "An analysis of active learning strategies for sequence labeling tasks", EMNLP Conference, 2008.
6. Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality", Neural Information Processing Systems, 2013.
7. Burr Settles, "Survey of Active Learning", 2012.