algorithms for nlptbergkir/11711fa17/fa17 11-711...proposal? en vertu de les nouvelle propositions,...

37
Classification II Taylor Berg-Kirkpatrick – CMU Slides: Dan Klein – UC Berkeley Algorithms for NLP

Upload: others

Post on 22-Dec-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

ClassificationIITaylorBerg-Kirkpatrick– CMU

Slides:DanKlein– UCBerkeley

AlgorithmsforNLP

Page 2: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

MinimizeTrainingError?§ Alossfunctiondeclareshowcostlyeachmistakeis

§ E.g.0lossforcorrectlabel,1lossforwronglabel§ Canweightmistakesdifferently(e.g.falsepositivesworsethanfalse

negativesorHammingdistanceoverstructuredlabels)

§ Wecould,inprinciple,minimizetrainingloss:

§ Thisisahard,discontinuousoptimizationproblem

Page 3: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Examples:Perceptron

§ SeparableCase

3

Page 4: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Examples:Perceptron

§ Non-SeparableCase

4

Page 5: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

ObjectiveFunctions

§ Whatdowewantfromourweights?§ Depends!§ Sofar:minimize(training)errors:

§ Thisisthe“zero-oneloss”§ Discontinuous,minimizingisNP-complete

§ MaximumentropyandSVMshaveotherobjectivesrelatedtozero-oneloss

Page 6: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

LinearModels:MaximumEntropy

§ Maximumentropy(logisticregression)§ Usethescoresasprobabilities:

§ Maximizethe(log)conditionallikelihoodoftrainingdata

Make positive

Normalize

Page 7: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

MaximumEntropyII

§ Motivationformaximumentropy:§ Connectiontomaximumentropyprinciple(sortof)§ Mightwanttodoagoodjobofbeinguncertainonnoisycases…

§ …inpractice,though,posteriorsareprettypeaked

§ Regularization(smoothing)

Page 8: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Log-Loss§ Ifweviewmaxentasaminimizationproblem:

§ Thisminimizesthe“logloss”oneachexample

§ Oneview:loglossisanupperbound onzero-oneloss

Page 9: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

MaximumMargin

§ Non-separableSVMs§ Addslacktotheconstraints§ Makeobjectivepay(linearly)forslack:

§ Ciscalledthecapacity oftheSVM– thesmoothingknob

§ Learning:§ CanstillstickthisintoMatlabifyouwant§ Constrainedoptimizationishard;bettermethods!§ We’llcomebacktothislater

Note: exist other choices of how to penalize slacks!

Page 10: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

RememberSVMs…

§ Wehadaconstrained minimization

§ …butwecansolveforxi

§ Giving

Page 11: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

HingeLoss

§ Thisiscalledthe“hingeloss”§ Unlikemaxent /logloss,youstop

gainingobjectiveoncethetruelabelwinsbyenough

§ YoucanstartfromhereandderivetheSVMobjective

§ Cansolvedirectlywithsub-gradientdecent(e.g.Pegasos:Shalev-Shwartz etal07)

§ Consider the per-instance objective:

Plot really only right in binary case

Page 12: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Maxvs“Soft-Max”Margin

§ SVMs:

§ Maxent:

§ Verysimilar!Bothtrytomakethetruescorebetterthanafunctionoftheotherscores§ TheSVMtriestobeattheaugmentedrunner-up§ TheMaxentclassifiertriestobeatthe“soft-max”

You can make this zero

… but not this one

Page 13: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

LossFunctions:Comparison

§ Zero-OneLoss

§ Hinge

§ Log

Page 14: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Structure

Page 15: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Handwritingrecognition

brace

Sequential structure

x y

[Slides:Taskar andKlein05]

Page 16: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

CFGParsing

The screen was a sea of red

Recursive structure

x y

Page 17: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

BilingualWordAlignment

What is the anticipated cost of collecting fees under the new proposal?

En vertu de nouvelle propositions, quel est le côut prévu de perception de les droits?

x yWhat

is the

anticipatedcost

ofcollecting

fees under

the new

proposal?

En vertu delesnouvelle propositions, quel est le côut prévu de perception de le droits?

Combinatorial structure

Page 18: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

StructuredModels

Assumption:

Scoreisasumoflocal“part”scores

Parts=nodes,edges,productions

space of feasible outputs

Page 19: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

NamedEntityRecognition

Apple Computer bought Smart Systems Inc. located in Arkansas.

ORG ORG --- ORG ORG ORG --- --- LOC

Page 20: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Bilingualwordalignment

§ association§ position§ orthography

Whatis

theanticipated

costof

collecting fees

under the

new proposal

?

En vertu delesnouvelle propositions, quel est le côut prévu de perception de le droits?

j

k

Page 21: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

EfficientDecoding§ Commoncase:youhaveablackboxwhichcomputes

atleastapproximately,andyouwanttolearnw

§ Easiestoptionisthestructuredperceptron[Collins01]§ Structureentershereinthatthesearchforthebestyistypicallya

combinatorialalgorithm(dynamicprogramming,matchings,ILPs,A*…)§ Predictionisstructured,learningupdateisnot

Page 22: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

StructuredMargin(Primal)

Rememberourprimalmarginobjective?

minw

1

2kwk22 + C

X

i

✓max

y

�w>fi(y) + `i(y)

�� w>fi(y

⇤i )

Stillapplieswithstructuredoutputspace!

Page 23: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

StructuredMargin(Primal)

minw

1

2kwk22 + C

X

i

�w>fi(y) + `i(y)� w>fi(y

⇤i )�

y = argmaxy�w>fi(y) + `i(y)

�Justneedefficientloss-augmenteddecode:

rw = w + CX

i

(fi(y)� fi(y⇤i ))

Stillusegeneralsubgradient descentmethods!(Adagrad)

Page 24: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

StructuredMargin(Dual)§ Remembertheconstrainedversionofprimal:

§ Dualhasavariableforeveryconstrainthere

Page 25: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

§ Wewant:

§ Equivalently:

FullMargin:OCR

a lot!…

“brace”

“brace”

“aaaaa”

“brace” “aaaab”

“brace” “zzzzz”

Page 26: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

§ Wewant:

§ Equivalently:

‘It was red’

Parsingexample

a lot!

SA B

C D

SA BD F

SA B

C D

SE F

G H

SA B

C D

SA B

C D

SA B

C D

‘It was red’

‘It was red’

‘It was red’

‘It was red’

‘It was red’

‘It was red’

Page 27: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

§ Wewant:

§ Equivalently:

‘What is the’‘Quel est le’

Alignmentexample

a lot!…

123

123

‘What is the’‘Quel est le’

123

123

‘What is the’‘Quel est le’

123

123

‘What is the’‘Quel est le’

123

123

123

123

123

123

123

123

‘What is the’‘Quel est le’

‘What is the’‘Quel est le’

‘What is the’‘Quel est le’

Page 28: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

CuttingPlane(Dual)§ Aconstraintinductionmethod[Joachimsetal09]

§ Exploitsthatthenumberofconstraintsyouactuallyneedperinstanceistypicallyverysmall

§ Requires(loss-augmented)primal-decodeonly

§ Repeat:§ Findthemostviolatedconstraintforaninstance:

§ Addthisconstraintandresolvethe(non-structured)QP(e.g.withSMOorotherQPsolver)

Page 29: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

CuttingPlane(Dual)§ Someissues:

§ CaneasilyspendtoomuchtimesolvingQPs§ Doesn’texploitsharedconstraintstructure§ Inpractice,worksprettywell;fastlikeperceptron/MIRA,morestable,noaveraging

Page 30: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Likelihood,Structured

§ Structureneededtocompute:§ Log-normalizer§ Expectedfeaturecounts

§ E.g.ifafeatureisanindicatorofDT-NNthenweneedtocomputeposteriormarginalsP(DT-NN|sentence)foreachpositionandsum

§ Alsoworkswithlatentvariables(morelater)

Page 31: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Comparison

Page 32: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Option0:Reranking

x = “The screen was a sea of red.”

…Baseline Parser

Input N-Best List(e.g. n=100)

Non-Structured Classification

Output

[e.g. Charniak and Johnson 05]

Page 33: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Reranking§ Advantages:

§ Directlyreducetonon-structuredcase§ Nolocalityrestrictiononfeatures

§ Disadvantages:§ Stuckwitherrorsofbaselineparser§ Baselinesystemmustproducen-bestlists§ But,feedbackispossible[McCloskey,Charniak,Johnson2006]

Page 34: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

M3Ns§ Anotheroption:expressallconstraintsinapackedform

§ MaximummarginMarkovnetworks[Taskar etal03]§ Integratessolutionstructuredeeplyintotheproblemstructure

§ Steps§ ExpressinferenceoverconstraintsasanLP§ Usedualitytotransformminimax formulationintomin-min§ Constraintsfactorinthedualalongthesamestructureastheprimal;

alphasessentiallyactasadual“distribution”§ Variousoptimizationpossibilitiesinthedual

Page 35: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Example:Kernels

§ Quadratickernels

Page 36: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

Non-LinearSeparators§ Anotherview:kernelsmapanoriginalfeaturespacetosome

higher-dimensionalfeaturespacewherethetrainingsetis(more)separable

Φ: y→ φ(y)

Page 37: Algorithms for NLPtbergkir/11711fa17/FA17 11-711...proposal? En vertu de les nouvelle propositions, quel est le côut prévu de perception de le droits? Combinatorial structure. Structured

WhyKernels?§ Can’tyoujustaddthesefeaturesonyourown(e.g.addall

pairsoffeaturesinsteadofusingthequadratickernel)?§ Yes,inprinciple,justcomputethem§ Noneedtomodifyanyalgorithms§ But,numberoffeaturescangetlarge(orinfinite)§ Somekernelsnotasusefullythoughtofintheirexpanded

representation,e.g.RBFordata-definedkernels[HendersonandTitov05]

§ Kernelsletuscomputewiththesefeaturesimplicitly§ Example:implicitdotproductinquadratickerneltakesmuchless

spaceandtimeperdotproduct§ Ofcourse,there’sthecostforusingthepuredualalgorithms…