Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov, Dan Roth
Department of Computer Science, University of Illinois at Urbana-Champaign
Paper presentation by: Drew Stone
Outline

1 Background: Semi-supervised learning; Constraint-driven learning
2 Constraint-driven learning with semi-supervision: Introduction; Model; Scoring; Constraint pipeline; Constraint-Driven Learning (CODL) algorithm
3 Experiments
Semi-supervised learning: Introduction

Question: When labelled data is scarce, how can we take advantage of unlabelled data for training?
Intuition: If there is structure in the underlying distribution of samples, points that are close together or clustered may share labels
Semi-supervised learning: Framework

Given a model M trained on labelled samples from a distribution D, and an unlabelled set of examples U ⊆ D^m
Learn labels for each example in U −→ Learn(U, M), then re-use the (now labelled) examples to tune M −→ M*
Benefit: Access to more training data
Drawback: The learned model might drift from the correct classifier if the assumptions on the distribution do not hold.
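A minimal sketch of this semi-supervised loop, assuming hypothetical `train` and `predict` callables that stand in for any supervised learner:

```python
# Self-training sketch; `train` and `predict` are hypothetical
# stand-ins for any supervised learner, not an API from the paper.

def self_train(labelled, unlabelled, train, predict, rounds=5):
    model = train(labelled)                                    # M, from labelled data
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabelled]  # Learn(U, M)
        model = train(labelled + pseudo)                       # tune M -> M*
    return model
```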
Constraint-driven learning: Introduction

Motivation: Keep the learned model simple, using constraints to compensate for its over-simplicity
Benefit: Simple models (fewer features) are more computationally efficient
Intuition: Encode task-specific knowledge as a fixed set of constraints, so that a simple machine learning model suffices while learning becomes both easier and more accurate
Constraint-driven learning: Framework

Given an objective function
    argmax_y λ · F(x, y)
Define a set of linear (or non-linear) constraints {C_i}_{i≤K}
    C_i : X × Y −→ {0, 1}
Solve the optimization problem subject to the constraints
Any Boolean rule can be encoded as a set of linear inequalities.
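As a small worked check of that last point (an illustration, not from the paper): the implication y1 → y2 over 0/1 variables is equivalent to the linear inequality y1 − y2 ≤ 0.

```python
# Enumerate all 0/1 assignments to confirm the Boolean rule y1 -> y2
# and its linear encoding y1 - y2 <= 0 agree everywhere.
for y1 in (0, 1):
    for y2 in (0, 1):
        boolean_ok = bool((not y1) or y2)  # the logical rule y1 -> y2
        linear_ok = (y1 - y2) <= 0         # the linear encoding
        assert boolean_ok == linear_ok
```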
Constraint-driven learning: Example task
Sequence tagging with HMM/CRF + global constraints
Constraint-driven learning: Sequence tagging constraints

Unique label for each word
Edges must form a path
A verb must exist
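A sketch of such constraints as 0/1 indicator functions C_i(x, y); the "VERB" tag name is an illustrative assumption. The path constraint is structural and is usually enforced by the decoder itself rather than checked after the fact.

```python
# Illustrative 0/1 constraint indicators C_i(x, y) for sequence
# tagging; the "VERB" tag name is an assumption, not from the paper.

def unique_label_per_word(x, y):
    return 1 if len(y) == len(x) else 0  # exactly one tag per token

def verb_must_exist(x, y):
    return 1 if any(tag == "VERB" for tag in y) else 0
```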
Constraint-driven learning: Example task

Semantic role labelling with independent classifiers plus global constraints
Constraint-driven learning: Example task
Each verb predicate carries with it a new inference problem
Constraint-driven learning: SRL constraints
No duplicate argument classes
Unique labels
Order constraints
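As an illustration of the first constraint (the core-argument labels follow PropBank conventions and are an assumption here):

```python
# "No duplicate argument classes": each core argument label may be
# assigned to at most one argument of a given verb predicate.
CORE_ARGS = {"A0", "A1", "A2", "A3", "A4"}  # assumed PropBank-style labels

def no_duplicate_argument_classes(arg_labels):
    core = [a for a in arg_labels if a in CORE_ARGS]
    return 1 if len(core) == len(set(core)) else 0
```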
Constraint-driven learning with semi-supervision: Introduction

Intuition: Use constraints to obtain better training samples from our unlabelled set of samples
Benefit: When the model deviates from the simple baseline, it does so towards a more expressive answer, since the constraints guide it
Constraint-driven learning with semi-supervision: Examples

Consider the constraint over {0, 1}^n that outputs match 1*0*. For every sequence of the form 1*010*, there are two good fixes, 1*110* and 1*000*. What is the best approach for learning?
Consider the constraint that state transitions must occur on punctuation marks. There could be many good sequences abiding by this constraint.
Constraint-driven learning with semi-supervision: Model

Consider a data domain X, an output domain Y, and a distribution D defined over X × Y
Input/output pairs (x, y) ∈ X × Y are given as sequence pairs:
    x = (x_1, ..., x_N), y = (y_1, ..., y_M)
We wish to find a structured output classifier h : X −→ Y that uses a structured scoring function f : X × Y −→ R to choose the most likely output sequence, where the correct output sequence is given the highest score
We are given (or define) a set of constraints {C_i}_{i≤K}
    C_i : X × Y −→ {0, 1}
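In code, these pieces reduce to two function types; a type-level sketch with assumed names:

```python
from typing import Callable, Sequence

X = Sequence[str]                    # x = (x_1, ..., x_N)
Y = Sequence[str]                    # y = (y_1, ..., y_M)

ScoringFn = Callable[[X, Y], float]  # f : X x Y -> R
Constraint = Callable[[X, Y], int]   # C_i : X x Y -> {0, 1}
```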
Constraint-driven learning with semi-supervision: Scoring

Scoring rule:
    f(x, y) = Σ_{i=1}^{M} λ_i f_i(x, y) = λ · F(x, y) ≡ w^T · φ(x, y)
Compatible with any linear discriminative or generative model, such as HMMs and CRFs
Local feature functions {f_i}_{i≤M} allow for tractable inference by capturing contextual structure
The space of structured pairs can be huge; locally there are smaller distances to account for
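A direct transcription of the scoring rule, where `features` is a hypothetical list of the local feature functions f_i:

```python
# f(x, y) = sum_i lambda_i * f_i(x, y): a weighted sum of local
# feature functions over the pair (x, y).

def score(weights, features, x, y):
    return sum(w * f(x, y) for w, f in zip(weights, features))
```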
Constraint-driven learning with semi-supervision: Sample constraints

(Figure: table of sample constraints; not preserved in the transcript)
Constraint-driven learning with semi-supervision: Constraint pipeline (hard)

Define 1_C(x) ⊆ Y as the set of output sequences y for which every constraint takes the value 1 on (x, y)
Our objective function then becomes
    argmax_{y ∈ 1_C(x)} λ · F(x, y)
Intuition: Find the best output sequence y, maximizing the score while satisfying the hard constraints
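A brute-force sketch of hard-constrained inference over an explicit candidate set; real systems would use ILP or constrained beam search rather than enumeration:

```python
# Keep only candidates in 1_C(x), i.e. those satisfying every
# constraint, then take the argmax of the model score.

def constrained_argmax(x, candidates, score, constraints):
    feasible = [y for y in candidates
                if all(C(x, y) == 1 for C in constraints)]
    return max(feasible, key=lambda y: score(x, y))
```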
Constraint-driven learning with semi-supervision: Constraint pipeline (soft)

Assume a suitable distance metric between an output sequence and the space of outputs respecting a single constraint
Given a set of soft constraints {C_i}_{i≤K} with penalties {ρ_i}_{i≤K}, we get a new objective function:
    argmax_y λ · F(x, y) − Σ_{i=1}^{K} ρ_i · d(y, 1_{C_i}(x))
Goal: Find the best output sequence under our model that violates the fewest constraints
Intuition: Given a learned model λ · F(x, y), we can bias its decisions by the amount by which the output sequence violates each soft constraint
Option: Use the minimal Hamming distance to a satisfying sequence
    d(y, 1_{C_i}(x)) = min_{y′ ∈ 1_{C_i}(x)} H(y, y′)
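A sketch of this soft objective under the Hamming-distance option; enumerating `candidates` stands in for real inference and is for illustration only:

```python
# Soft-constrained score: model score minus rho_i times the minimal
# Hamming distance to any output satisfying C_i.

def hamming(y, y_prime):
    return sum(a != b for a, b in zip(y, y_prime))

def soft_score(x, y, score, constraints, rhos, candidates):
    penalty = 0.0
    for C, rho in zip(constraints, rhos):
        satisfying = [c for c in candidates if C(x, c) == 1]  # 1_Ci(x)
        if satisfying:
            penalty += rho * min(hamming(y, c) for c in satisfying)
    return score(x, y) - penalty
```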
Constraint-driven learning with semi-supervision: Distance example

Taken from CCM-NAACL-12-Tutorial:
    d(y, 1_{C_i}(x)) = Σ_{j=1}^{M} φ_{C_i}(y_j) −→ count violations
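As a sketch, with `phi_Ci` a hypothetical per-position violation indicator:

```python
# Approximate d(y, 1_Ci(x)) by counting per-position violations,
# as on the slide; `phi_Ci` is an assumed indicator function.

def violation_count(y, phi_Ci):
    return sum(phi_Ci(y_j) for y_j in y)
```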
Constraint-driven learning: Recall sequence tagging

(a) correct citation parsing
(b) HMM-predicted citation parsing
Adding the punctuation state-transition constraint yields the correct parsing under the same HMM
Constraint-driven learning: Recall SRL

(Figures: SRL example revisited with constraints; not preserved in the transcript)
Constraint-driven learning with semi-supervision: CODL algorithm

Constraints are used in inference, not in learning
Maximum-likelihood estimation of λ (EM algorithm)
Semi-supervision occurs on lines 7 and 9 of the algorithm (i.e. for each unlabelled sample, we generate K training examples)
All unlabelled samples are re-labelled each training cycle, unlike the "self-training" perspective
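A high-level sketch of the loop under stated assumptions: `learn` is supervised parameter estimation and `top_k_inference` returns the K best outputs under the soft-constrained objective. The paper additionally smooths the re-estimated parameters with the supervised ones, a detail omitted here.

```python
# CODL training loop sketch; `learn` and `top_k_inference` are assumed
# callables, not an API from the paper.

def codl(labelled, unlabelled, learn, top_k_inference, K=5, cycles=10):
    model = learn(labelled)
    for _ in range(cycles):
        pseudo = [(x, y)
                  for x in unlabelled                      # re-label all of U
                  for y in top_k_inference(model, x, K)]   # K examples per x
        model = learn(labelled + pseudo)
    return model
```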
Constraint-driven learning with semi-supervision: Expectation Maximization (top-K hard EM)

EM procedures typically assign a distribution over all input/output pairs for a given unlabelled x ∈ X
Instead, we choose the top K outputs y ∈ Y maximizing our soft objective function and assign uniform probability to each
The constraints reshape the distribution at each step
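A sketch of the top-K selection, with `soft_score` the soft-constrained objective from earlier:

```python
import heapq

# Hard E-step: keep the K highest-scoring outputs; each is later
# treated as an equally likely pseudo-labelled training example.

def top_k_outputs(x, candidates, soft_score, K):
    return heapq.nlargest(K, candidates, key=lambda y: soft_score(x, y))
```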
Constraint-driven learning with semi-supervision: Motivation for K > 1

There may be multiple ways to correct constraint-violating samples
We gain access to more training data
Experiments: Citations

(Figure: results on citation-field segmentation; not preserved in the transcript)
Experiments: HMM

(Figure: HMM results; not preserved in the transcript)
Pictures taken from:
CCM-NAACL-12-Tutorial
Chang, M.-W., Ratinov, L., & Roth, D. (2007, June). Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL (pp. 280-287).