Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov, Dan Roth
Department of Computer Science, University of Illinois at Urbana-Champaign
Paper presentation by: Drew Stone
Outline

1 Background: Semi-supervised learning; Constraint-driven learning
2 Constraint-driven learning with semi-supervision: Introduction; Model; Scoring; Constraint pipeline; Constraint-Driven Learning (CODL) algorithm
3 Experiments
Semi-supervised learning: Introduction

Question: When labelled data is scarce, how can we take advantage of unlabelled data for training?
Intuition: If there is structure in the underlying distribution of samples, points that are close together or clustered may share labels
Semi-supervised learning: Framework

Given a model M trained on labelled samples from a distribution D, and an unlabelled set of examples U ⊆ D^m
Learn labels for each example in U −→ Learn(U, M), then re-use the (now labelled) examples to tune M −→ M*
Benefit: Access to more training data
Drawback: The learned model might drift from the correct classifier if the assumptions on the distribution do not hold.
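A minimal sketch of this semi-supervised loop, assuming hypothetical `train` and `predict` callables that stand in for any supervised learner:

```python
# Self-training sketch; `train` and `predict` are hypothetical
# stand-ins for any supervised learner, not an API from the paper.

def self_train(labelled, unlabelled, train, predict, rounds=5):
    model = train(labelled)                                    # M, from labelled data
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabelled]  # Learn(U, M)
        model = train(labelled + pseudo)                       # tune M -> M*
    return model
```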
Constraint-driven learning: Introduction

Motivation: Keep the learned model simple, using constraints to compensate for its over-simplicity
Benefit: Simple models (fewer features) are more computationally efficient
Intuition: Encode task-specific knowledge as a fixed set of constraints, so that a simple machine learning model suffices while learning becomes both easier and more accurate
Constraint-driven learning: Framework

Given an objective function
    argmax_y λ · F(x, y)
Define a set of linear (or non-linear) constraints {C_i}_{i≤K}
    C_i : X × Y −→ {0, 1}
Solve the optimization problem subject to the constraints
Any Boolean rule can be encoded as a set of linear inequalities.
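As a small worked check of that last point (an illustration, not from the paper): the implication y1 → y2 over 0/1 variables is equivalent to the linear inequality y1 − y2 ≤ 0.

```python
# Enumerate all 0/1 assignments to confirm the Boolean rule y1 -> y2
# and its linear encoding y1 - y2 <= 0 agree everywhere.
for y1 in (0, 1):
    for y2 in (0, 1):
        boolean_ok = bool((not y1) or y2)  # the logical rule y1 -> y2
        linear_ok = (y1 - y2) <= 0         # the linear encoding
        assert boolean_ok == linear_ok
```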
Constraint-driven learning: Example task
Sequence tagging with HMM/CRF + global constraints
Constraint-driven learning: Sequence tagging constraints

Unique label for each word
Edges must form a path
A verb must exist
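A sketch of such constraints as 0/1 indicator functions C_i(x, y); the "VERB" tag name is an illustrative assumption. The path constraint is structural and is usually enforced by the decoder itself rather than checked after the fact.

```python
# Illustrative 0/1 constraint indicators C_i(x, y) for sequence
# tagging; the "VERB" tag name is an assumption, not from the paper.

def unique_label_per_word(x, y):
    return 1 if len(y) == len(x) else 0  # exactly one tag per token

def verb_must_exist(x, y):
    return 1 if any(tag == "VERB" for tag in y) else 0
```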
Constraint-driven learning: Example task

Semantic role labelling with independent classifiers plus global constraints
Constraint-driven learning: Example task
Each verb predicate carries with it a new inference problem
Constraint-driven learning: SRL constraints
No duplicate argument classes
Unique labels
Order constraints
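As an illustration of the first constraint (the core-argument labels follow PropBank conventions and are an assumption here):

```python
# "No duplicate argument classes": each core argument label may be
# assigned to at most one argument of a given verb predicate.
CORE_ARGS = {"A0", "A1", "A2", "A3", "A4"}  # assumed PropBank-style labels

def no_duplicate_argument_classes(arg_labels):
    core = [a for a in arg_labels if a in CORE_ARGS]
    return 1 if len(core) == len(set(core)) else 0
```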
Constraint-driven learning with semi-supervision: Introduction

Intuition: Use constraints to obtain better training samples from our unlabelled set of samples
Benefit: When the model deviates from the simple baseline, it does so towards a more expressive answer, since the constraints guide it
Constraint-driven learning with semi-supervision: Examples

Consider the constraint over {0, 1}^n that outputs match 1*0*. For every sequence of the form 1*010*, there are two good fixes, 1*110* and 1*000*. What is the best approach for learning?
Consider the constraint that state transitions must occur on punctuation marks. There could be many good sequences abiding by this constraint.
Constraint-driven learning with semi-supervision: Model

Consider a data domain X, an output domain Y, and a distribution D defined over X × Y
Input/output pairs (x, y) ∈ X × Y are given as sequence pairs:
    x = (x_1, ..., x_N), y = (y_1, ..., y_M)
We wish to find a structured output classifier h : X −→ Y that uses a structured scoring function f : X × Y −→ R to choose the most likely output sequence, where the correct output sequence is given the highest score
We are given (or define) a set of constraints {C_i}_{i≤K}
    C_i : X × Y −→ {0, 1}
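In code, these pieces reduce to two function types; a type-level sketch with assumed names:

```python
from typing import Callable, Sequence

X = Sequence[str]                    # x = (x_1, ..., x_N)
Y = Sequence[str]                    # y = (y_1, ..., y_M)

ScoringFn = Callable[[X, Y], float]  # f : X x Y -> R
Constraint = Callable[[X, Y], int]   # C_i : X x Y -> {0, 1}
```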
Constraint-driven learning with semi-supervision: Scoring

Scoring rule:
    f(x, y) = Σ_{i=1}^{M} λ_i f_i(x, y) = λ · F(x, y) ≡ w^T · φ(x, y)
Compatible with any linear discriminative or generative model, such as HMMs and CRFs
Local feature functions {f_i}_{i≤M} allow for tractable inference by capturing contextual structure
The space of structured pairs can be huge; locally there are smaller distances to account for
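A direct transcription of the scoring rule, where `features` is a hypothetical list of the local feature functions f_i:

```python
# f(x, y) = sum_i lambda_i * f_i(x, y): a weighted sum of local
# feature functions over the pair (x, y).

def score(weights, features, x, y):
    return sum(w * f(x, y) for w, f in zip(weights, features))
```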
Constraint-driven learning with semi-supervision: Sample constraints

(Figure: table of sample constraints; not preserved in the transcript)
Constraint-driven learning with semi-supervision: Constraint pipeline (hard)

Define 1_C(x) ⊆ Y as the set of output sequences y for which every constraint takes the value 1 on (x, y)
Our objective function then becomes
    argmax_{y ∈ 1_C(x)} λ · F(x, y)
Intuition: Find the best output sequence y, maximizing the score while satisfying the hard constraints
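A brute-force sketch of hard-constrained inference over an explicit candidate set; real systems would use ILP or constrained beam search rather than enumeration:

```python
# Keep only candidates in 1_C(x), i.e. those satisfying every
# constraint, then take the argmax of the model score.

def constrained_argmax(x, candidates, score, constraints):
    feasible = [y for y in candidates
                if all(C(x, y) == 1 for C in constraints)]
    return max(feasible, key=lambda y: score(x, y))
```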
Constraint-driven learning with semi-supervision: Constraint pipeline (soft)

Assume a suitable distance metric between an output sequence and the space of outputs respecting a single constraint
Given a set of soft constraints {C_i}_{i≤K} with penalties {ρ_i}_{i≤K}, we get a new objective function:
    argmax_y λ · F(x, y) − Σ_{i=1}^{K} ρ_i · d(y, 1_{C_i}(x))
Goal: Find the best output sequence under our model that violates the fewest constraints
Intuition: Given a learned model λ · F(x, y), we can bias its decisions by the amount by which the output sequence violates each soft constraint
Option: Use the minimal Hamming distance to a satisfying sequence
    d(y, 1_{C_i}(x)) = min_{y′ ∈ 1_{C_i}(x)} H(y, y′)
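A sketch of this soft objective under the Hamming-distance option; enumerating `candidates` stands in for real inference and is for illustration only:

```python
# Soft-constrained score: model score minus rho_i times the minimal
# Hamming distance to any output satisfying C_i.

def hamming(y, y_prime):
    return sum(a != b for a, b in zip(y, y_prime))

def soft_score(x, y, score, constraints, rhos, candidates):
    penalty = 0.0
    for C, rho in zip(constraints, rhos):
        satisfying = [c for c in candidates if C(x, c) == 1]  # 1_Ci(x)
        if satisfying:
            penalty += rho * min(hamming(y, c) for c in satisfying)
    return score(x, y) - penalty
```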
Constraint-driven learning with semi-supervision: Distance example

Taken from CCM-NAACL-12-Tutorial:
    d(y, 1_{C_i}(x)) = Σ_{j=1}^{M} φ_{C_i}(y_j) −→ count violations
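As a sketch, with `phi_Ci` a hypothetical per-position violation indicator:

```python
# Approximate d(y, 1_Ci(x)) by counting per-position violations,
# as on the slide; `phi_Ci` is an assumed indicator function.

def violation_count(y, phi_Ci):
    return sum(phi_Ci(y_j) for y_j in y)
```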
Constraint-driven learning: Recall sequence tagging

(a) correct citation parsing
(b) HMM-predicted citation parsing
Adding the punctuation state-transition constraint yields the correct parsing under the same HMM
Constraint-driven learning: Recall SRL

(Figures: SRL example revisited with constraints; not preserved in the transcript)
Constraint-driven learning with semi-supervision: CODL algorithm

Constraints are used in inference, not in learning
Maximum-likelihood estimation of λ (EM algorithm)
Semi-supervision occurs on lines 7 and 9 of the algorithm (i.e. for each unlabelled sample, we generate K training examples)
All unlabelled samples are re-labelled each training cycle, unlike the "self-training" perspective
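A high-level sketch of the loop under stated assumptions: `learn` is supervised parameter estimation and `top_k_inference` returns the K best outputs under the soft-constrained objective. The paper additionally smooths the re-estimated parameters with the supervised ones, a detail omitted here.

```python
# CODL training loop sketch; `learn` and `top_k_inference` are assumed
# callables, not an API from the paper.

def codl(labelled, unlabelled, learn, top_k_inference, K=5, cycles=10):
    model = learn(labelled)
    for _ in range(cycles):
        pseudo = [(x, y)
                  for x in unlabelled                      # re-label all of U
                  for y in top_k_inference(model, x, K)]   # K examples per x
        model = learn(labelled + pseudo)
    return model
```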
Constraint-driven learning with semi-supervision: Expectation Maximization (top-K hard EM)

EM procedures typically assign a distribution over all input/output pairs for a given unlabelled x ∈ X
Instead, we choose the top K outputs y ∈ Y maximizing our soft objective function and assign uniform probability to each
The constraints reshape the distribution at each step
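A sketch of the top-K selection, with `soft_score` the soft-constrained objective from earlier:

```python
import heapq

# Hard E-step: keep the K highest-scoring outputs; each is later
# treated as an equally likely pseudo-labelled training example.

def top_k_outputs(x, candidates, soft_score, K):
    return heapq.nlargest(K, candidates, key=lambda y: soft_score(x, y))
```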
Constraint-driven learning with semi-supervision: Motivation for K > 1

There may be multiple ways to correct constraint-violating samples
We gain access to more training data
Experiments: Citations

(Figure: results on citation-field segmentation; not preserved in the transcript)
Experiments: HMM

(Figure: HMM results; not preserved in the transcript)
Pictures taken from:
CCM-NAACL-12-Tutorial
Chang, M.-W., Ratinov, L., & Roth, D. (2007, June). Guiding semi-supervision with constraint-driven learning. In Proceedings of ACL (pp. 280-287).