ed oct2 2009 - simon fraser universityanoop/papers/pdf/ed_oct2_2009.pdfbrief history of...

55
1 Anoop Sarkar On Sabbatical at U. of Edinburgh (Informatics 4.18b) Simon Fraser University Vancouver, Canada natlang.cs.sfu.ca October 2, 2009 Bootstrapping a Classifier Using the Yarowsky Algorithm

Upload: others

Post on 10-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

1

AnoopSarkarOnSabbaticalatU.ofEdinburgh(Informatics4.18b)

SimonFraserUniversityVancouver,Canada

natlang.cs.sfu.ca October2,2009

BootstrappingaClassifier

UsingtheYarowskyAlgorithm

Page 2: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Acknowledgements

• ThisisjointworkwithmystudentsGholamrezaHaffari(Ph.D.)andMaxWhitney(B.Sc.)atSFU.

• ThankstoMichaelCollinsforprovidingthenamed‐entitydatasetandansweringourquestions.

• ThankstoDamianosKarakosandJasonEisnerforprovidingthewordsensedatasetandansweringourquestions.

2

Page 3: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

3

Bootstrapping

Page 4: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

4

Self‐Training

1.  Abasemodelistrainedwithasmall/largeamountoflabeleddata.

2.  Thebasemodelisthenusedtoclassifytheunlabeleddata.

3.  Onlythemostconfidentunlabeledpoints,alongwiththepredictedlabels,areincorporatedintothelabeledtrainingset(pseudo‐labeleddata).

4.  Thebasemodelisre‐trained,andtheprocessisrepeated.

Page 5: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

5

Self‐Training

•  Itcanbeappliedtoanybaselearningalgorithm:onlyneedconfidenceweightsforpredictions.

•  DifferenceswithEM:•  Self‐trainingonlyusesthemodeofpredictiondistribution.•  Unlikehard‐EM,itcanabstain:“Idonotknowthelabel”.

•  DifferenceswithCo‐training:•  Inco‐trainingtherearetwoviews,ineachofwhichamodel

islearned.•  Themodelinoneviewtrainsthemodelinanotherviewby

providingpseudo‐labeledexamples.

Page 6: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

6

Bootstrapping

•  Startwithafewseedrules(typicallyhighprecision,lowrecall).Buildinitialclassifier.

•  Useclassifiertolabelunlabeleddata.

•  Extractnewrulesfrompseudo‐labeleddataandbuildclassifierfornextiteration.

•  Exitiflabelsforunlabeleddataareunchanged.Else,applyclassifiertounlabeleddataandcontinue.

Page 7: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

77

DecisionList(DL)

•  ADecisionListisanorderedsetofrules.•  Givenaninstancex,thefirstapplicableruledeterminestheclass

label.

•  Insteadoforderingtherules,wecangiveweighttothem.•  Amongallapplicablerulestoaninstancex,applytherulewhich

hasthehighestweight.

•  Theparametersaretheweightswhichspecifytheorderingoftherules.

Rules:Ifxhasfeaturefclassk ,θf,k

parameters

Page 8: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

88

DLforWordSenseDisambiguation

Ifcompany+1,confidenceweight.97Iflife−1,confidenceweight.96…

(Yarowsky1995)

•  WSD:Specifythemostappropriatesense(meaning)ofawordinagivensentence.

  Considerthesetwosentences:  …companysaidtheplantisstilloperating.factorysense+  …anddividelifeintoplantandanimalkingdom.livingorganismsense‐

  Considerthesetwosentences:  …companysaidtheplantisstilloperating.sense+  …anddividelifeintoplantandanimalkingdom.sense‐

  Considerthesetwosentences:  …companysaidtheplantisstilloperating.(company,operating)sense+  …anddividelifeintoplantandanimalkingdom.(life,animal)sense‐

Sorted

Page 9: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Example:disambiguate2sensesofsentence

•  Seedrules:Ifcontextcontainsserved,label+1,conf=1.0Ifcontextcontainsreads,label‐1,conf=1.0

•  Seedruleslabel8outof303unlabeledexamples

•  These8pseudo‐labeledexamplesprovide6rulesabove0.95threshold(includingtheoriginalseedrules)e.g.Ifcontextcontainsread,label‐1,conf=0.953

•  These6ruleslabel151outof303unlabeledexamples

Page 10: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Example:disambiguate2sensesofsentence

•  These151pseudo‐labeledexamplesprovide60rulesabovethethreshold,e.g.Ifcontextcontainsprison,label+1,conf=0.989Ifprevwordislife,label+1,conf=0.986Ifprevwordishis,label+1,conf=0.983Ifnextwordisfrom,label‐1,conf=0.982Ifcontextcontainsrelevant,label‐1,conf=0.953Ifcontextcontainspage,label‐1,conf=0.953

•  After5iterations,297/303unlabeledexamplesarepermanentlylabeled(nochangespossible)

•  Buildingfinalclassifiergives67%accuracyontestsetof515sentences.Withsome“tricks”wecanget76%accuracy.

Page 11: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

11

BriefHistoryofBootstrapping

•  (Yarowsky1995)useditwithDecisionListbaseclassifierforWordSenseDisambiguation(WSD)task.•  Itachievedthesameperformancelevelasthesupervisedalgorithm,

usingonlyafewseedexamplesaslabeledtrainingdata.

•  (Collins&Singer1999)useditforNamedEntityClassificationtaskwithDecisionListbaseclassifier.•  Usingonly7initialrules,itachieved91%accuracy.•  ItachievedthesameperformancelevelasCo‐training(noneedfor2

views).

•  (AbneyACL2002)inapaperaboutco‐trainingcontrastsitwiththeYarowskyalgorithm.Initialanalysisabandonedlater.

Page 12: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

12

BriefHistoryofBootstrapping

•  (AbneyCL2004)providedanewanalysisoftheYarowskyalgorithm.•  ItcouldnotmathematicallyanalyzetheoriginalYarowskyalgorithm,

butintroducednewvariants(wewillseethemlater).

•  (Haffari&SarkarUAI2007)advancedAbney’sanalysisandgaveageneralframeworkthatshowedhowtheYarowskyalgorithmintroducedbyAbneyisrelatedtootherSSLmethods.

•  (EisnerandKarakos2005)examinestheconstructionofseedrulesforbootstrapping.

Page 13: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

13

AnalysisoftheYarowskyAlgorithm

Page 14: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

14

OriginalYarowskyAlgorithm

•  TheYarowskyalgorithmisabootstrappingalgorithmwithaDecisionListbaseclassifier.

•  Thepredictedlabelisk*iftheconfidenceoftheappliedruleisabovesomethresholdη.

•  Aninstancemaybecomeunlabeledinfutureiterations.

(Yarowsky1995)

Page 15: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

15

ModifiedYarowskyAlgorithm

•  Insteadofthefeaturewiththemaxscoreweusethesumofthescoresofallfeaturesactiveforanexampletobelabeled.

•  Thepredictedlabelisk*iftheconfidenceoftheappliedruleisabovethethreshold1/K.•  K:isthenumberoflabels.

•  Aninstancemuststaylabeledonceitbecomeslabeled,butthelabelmaychange.

•  Thesearetheconditionsinallthealgorithmswewillanalyzeintherestofthetalk.•  AnalyzingtheoriginalYarowskyalgorithmisstillanopenquestion.

(Abney2004)

Page 16: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

1616

BipartiteGraphRepresentation

+1companysaidtheplantisstilloperating

‐1dividelifeintoplantandanimalkingdom

company

operating

life

animal

(Features)F

X(Instances)

Unlabeled

(Cordunneanu2006,Haffari&Sarkar2007)

Weproposetoviewbootstrappingaspropagatingthelabelsofinitiallylabelednodestotherestofthegraphnodes.

Page 17: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

1717

Self‐TrainingontheGraph

f

(Features)F X(Instances)

… …

xπx qxLabelingdistribution

+ ‐

1qx

θfLabelingdistribution

+ ‐

.7.3θf

(Haffari&Sarkar2007)

+ ‐

.6.4

+ ‐

1qx

Page 18: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

1818

GoalsoftheAnalysis

•  Tofindreasonableobjectivefunctionsfortheself‐trainingalgorithmsonthebipartitegraph.

•  TheobjectivefunctionsmayshedlighttotheempiricalsuccessofdifferentDL‐basedself‐trainingalgorithms.

•  Itcantellusthekindofpropertiesinthedatawhicharewellexploitedandcapturedbythealgorithms.

•  Itisalsousefulinprovingtheconvergenceofthealgorithms.

Page 19: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

•  KL‐divergenceisameasureofdistancebetweentwoprobabilitydistributions:

•  EntropyHisameasureofrandomnessinadistribution:

•  Theobjectivefunction:

19

ObjectiveFunction

F X

Page 20: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

20

TheBregmanDistance

•  Examples:–  Ifψ(t)=tlogtThenBψ(α,β)=KL(α,β)–  Ifψ(t)=t2ThenBψ(α,β)=Σi(αi‐βi)2

ψ(αi)

αiβi

ψ(t)

t

Page 21: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

•  Givenastrictlyconvexfunctionψ,theBregmandistanceBψbetweentwoprobabilitydistributionsisdefinedas:

•  Theψ‐entropyHψisdefinedas:

•  Thegeneralizedobjectivefunction:

21

GeneralizingtheObjectiveFunction

F X

Page 22: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

22

OptimizingtheObjectiveFunctions

•  Inwhatfollows,wementionsomespecificobjectivefunctionstogetherwiththeiroptimizationalgorithms.

•  TheseoptimizationalgorithmscorrespondtosomevariantsofthemodifiedYarowskyalgorithm.

•  Itisnoteasytocomeupwithalgorithmsfordirectlyoptimizingthegeneralizedobjectivefunctions.

Page 23: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

2323

UsefulOperations

•  Average:takestheaveragedistributionoftheneighbors

•  Majority:takesthemajoritylabeloftheneighbors

(.2,.8)

(.4,.6)

(.3,.7)

(0,1)

(.2,.8)

(.4,.6)

Page 24: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

2424

AnalyzingSelf‐Training

 Theorem.Thefollowingobjectivefunctionsareoptimizedbythecorrespondinglabelpropagationalgorithmsonthebipartitegraph:

F X

where:ConvergesinPolytimeO(|F|2|X|2|)

Relatedtograph‐basedSSlearning(Zhuetal2003)

Abney’svariantofYarowskyalgorithm

Page 25: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

2525

WhataboutLog‐Likelihood?

•  Initially,thelabelingdistributionisuniformforunlabeledverticesandaδ‐likedistributionforlabeledvertices.

•  Bylearningtheparameters,wewouldliketoreducetheuncertaintyinthelabelingdistributionwhilerespectingthelabeleddata:

Negativelog‐Likelihoodoftheoldandnewlylabeleddata

Page 26: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

 Lemma.Ifmisthenumberoffeaturesconnectedtoaninstance,then:

2626

ConnectionbetweenthetwoAnalyses

ComparewithConditionalEntropyRegularization(GrandvaletandBengio2005)!

x!L

KL(qx||!x) + "!

x!U

H(!x)

Page 27: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

27

Experiments

Page 28: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NamedEntityClassification

• 971,476sentencesfromtheNYTwereparsedwiththeCollinsparser

• Thetaskistoidentifythreetypesofnamedentities:1.  Location(LOC)2.  Person(PER)3.  Organization(ORG)−1.notaNEor“don’tknow”

28

(CollinsandSinger,1999)

Page 29: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NamedEntityClassification

• Nounphraseswereextractedthatmetthefollowingconditions1.  TheNPcontainedonlywordstaggedasproper

nouns2.  TheNPappearedinthefollowingtwo

syntacticcontexts:  Modifiedbyanappositivewhoseheadisasingular

noun  InaprepositionalphrasemodifyinganNPwhose

headisasingularnoun29

(CollinsandSinger,1999)

Page 30: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NamedEntityClassification

• Nounphraseswereextractedthatmetthefollowingconditions1.  TheNPcontainedonlywordstaggedasproper

nouns2.  TheNPappearedinthefollowingtwo

syntacticcontexts:  Modifiedbyanappositivewhoseheadisasingular

noun  InaprepositionalphrasemodifyinganNPwhose

headisasingularnoun30

(CollinsandSinger,1999)

NP

NNP NNP NNPS

International Business Machines

…,says[NEMauryCooper],avice[CONTEXTpresident]atS.&P.

…,fraudrelatedtoworkonafederallyfundedsewage[CONTEXTplantin][NEGeorgia]

Page 31: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NamedEntityClassification

• Thetask:classifyNPsintoLOC,PER,ORG• 89,305trainingexampleswith68,475distinctfeaturetypes– 88,962wasusedinCS99experiments

•  1000testdataexamples(includesNPsthatarenotLOC,PERorORG)

•  Monthnamesareeasilyidentifiableasnotnamedentities:leaves962examples

•  Still85NPsthatarenotLOC,PER,ORG.•  Cleanaccuracyover877;Noisyover962 31

(CollinsandSinger,1999)

Page 32: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

YarowskyVariants

•  AtrickfromtheCo‐trainingpaper(BlumandMitchell1998)istobecautious.Don’taddallrulesabovethe0.95threshold

•  Addonlynrulesperlabel(say5)andincreasethisamountbynineachiteration

•  Changesthedynamicsoflearninginthealgorithmbutnottheobjectivefn

•  Twovariants:Yarowsky(basic),Yarowsky(cautious)

•  Withoutathreshold:Yarowsky(nothreshold)32

(Abney2004,CollinsandSinger,1999)

Page 33: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

ResultsLearning Algorithm Accuracy (Clean) Accuracy (Noisy) Baseline (all organization)

45.8 41.8

EM 83.1 75.8 Yarowsky (basic) 80.7 73.5 Yarowsky (no threshold) 80.3 73.2 Yarowsky (cautious) 91 83 DL-CoTrain 91 83

33

Page 34: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NumberofRules(basic)

34

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

1 2 3 4 5 6 7 8 9

Num

. R

ule

s

Iteration

Page 35: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

NumberofRules(cautious)

35 0

1000

2000

3000

4000

5000

6000

7000

0 50 100 150 200 250 300 350 400 450

Num

. R

ule

s

Iteration

Page 36: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Coverage(basic)

36 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6 7 8 9

Co

ve

rag

e

Iteration

Page 37: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Coverage(cautious)

37 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0 50 100 150 200 250 300 350 400 450

Co

vera

ge

Iteration

Page 38: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Accuracy(basic)

38

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

1 2 3 4 5 6 7 8 9 10

Accura

cy

Iteration

Page 39: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Accuracy(cautious)

39

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

0 50 100 150 200 250 300 350 400 450

Accura

cy

Iteration

Page 40: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Precision‐Recall(basic)

40

0.82

0.84

0.86

0.88

0.9

0.92

0.94

0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85

Pre

cis

ion

Recall

Page 41: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Precision‐Recall(cautious)

41

0.91

0.92

0.93

0.94

0.95

0.96

0.97

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Pre

cis

ion

Recall

Page 42: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Seeds

• Selectingseedrules:whatisagoodstrategy?–  Frequency:sortbyfrequencyoffeatureoccurrence

– Contexts:sortbynumberofotherfeaturesafeaturewasobservedwith

– Weighted:sortbyaweightedcountofotherfeaturesobservedwithfeature.Weight(f)=count(f)/Σf’count(f’)

42

(EisnerandKarakos2005,ZagibalovandCarroll2008)

Page 43: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Seeds

• Ineachcasethefrequenciesweretakenfromtheunlabeledtrainingdata

• Seedswereextractedfromthesortedlistoffeaturesbymanualinspectionandassignedalabel(theentireexamplewasused)

• Location(LOC)featuresappearinfrequentlyinallthreeorderings

• ItispossiblethatsomegoodLOCseedsweremissed

43

Page 44: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

SeedsNumber of Rules Frequency Contexts Weighted

(n/3) rules/label Clean Noisy Clean Noisy Clean Noisy 3 84 77 84 77 88 80 9 91 83 90 82 82 74 15 91 83 91 83 85 77 7 (CS99) 91 83

44

Page 45: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

WordSenseDisambiguation

• Datafrom(EisnerandKarakos2005)• Disambiguatetwosenseseachfordrug,duty,land,language,position,sentence(Galeet.al.1992)

• Sourceofunlabeleddata:14MwordCanadianHansards(Englishonly)

• Twoseedrulesforeachdisambiguationtaskfrom(EisnerandKarakos2005)

45

Page 46: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Results

46

Learning Algorithm drug land sentence Seeds alcohol medical acres courts served reads Train / Test size 134 / 386 1604 / 1488 303 / 515

Yarowsky (basic) 53.3 79.3 67.7

Yarowsky (no threshold) 52 79 64.8

Yarowsky (cautious) 55.9 79 76.1

DL-CoTrain (2 views = long distance v.s. immediate context)

53.1 77.7 75.9

Page 47: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

47

Self‐trainingforMachineTranslation

Page 48: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

48

Self‐TrainingforSMT

Train

MFE

Bilingualtext

F E

Monolingualtext

DecodeTranslatedtext

F E

F E

SelecthighqualitySent.pairs

Re‐Log‐linearModel

Re‐trainingtheSMTmodel

Page 49: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

49

SelectingSentencePairs

•  Firstgivescores:  Usenormalizeddecoder’sscore  Confidenceestimationmethod(Ueffing&Ney2007)

•  Thenselectbasedonthescores:  Importancesampling:  Thosewhosescoreisaboveathreshold  Keepallsentencepairs

Page 50: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

50

Re‐trainingtheSMTModel

•  UsenewsentencepairstotrainanadditionalphrasetableanduseitasanewfeaturefunctionintheSMTlog‐linearmodel  Onephrasetabletrainedonsentencesforwhichwe

havethetruetranslations  Onephrasetabletrainedonsentenceswiththeir

generatedtranslations

PhraseTable1 PhraseTable2

Page 51: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

51

ChinesetoEnglish(Transductive)

Selection Scoring BLEU%

Baseline 31.8±.7

Keepall 33.1

ImportanceSampling

Norm.score 33.5

Confidence 33.2

Threshold Norm.score 33.5

confidence 33.5

Bold:bestresult,italic:significantlybetter

NISTEval‐2004:train=8.2M,test=1788(4refs)

Train:news,magazines,laws+UN

Test:newswire,editorials,politicalspeeches

We use Portage from NRC as the underlying SMT system (Ueffing et al, 2007)

Page 52: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

52

ChinesetoEnglish(Inductive)

system BLEU%

Baseline 31.8±.7

AddChinesedata Iter1 32.8

Iter4 32.6

Iter10 32.5

Bold:bestresult,italic:significantlybetter

Usingimportancesampling

Before After

editorials 30.7 31.3

newswire 30.0 31.1

speeches 36.1 37.3

Page 53: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

53

Whydoesitwork?

•  Reinforcespartsofthephrasetranslationmodelwhicharerelevantfortestcorpus

•  Gluephrasesfromtestdatausedtocomposenewphrases(mostphrasesstillfromoriginaldata)

Page 54: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

54

Whydoesitwork?

Page 55: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense

Summary

•  ShouldweeveruseCo‐trainingforBootstrapping?• Per‐labelcautiousnessleadstoeffective

bootstrapping.–  ExploitedinYarowskyalgo.,DL‐CoTrain,Co‐Boosting

•  Thesedynamicscan/shouldbeexaminedmoreclosely.–  Perhapsusingtoolsfromtheanalysisoffeature

induction.

• Bootstrappingandself‐trainingmaybemoreeffectivethanyoumayhavethought.

55