ed oct2 2009 - simon fraser universityanoop/papers/pdf/ed_oct2_2009.pdfbrief history of...
TRANSCRIPT
![Page 1: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/1.jpg)
1
AnoopSarkarOnSabbaticalatU.ofEdinburgh(Informatics4.18b)
SimonFraserUniversityVancouver,Canada
natlang.cs.sfu.ca October2,2009
BootstrappingaClassifier
UsingtheYarowskyAlgorithm
![Page 2: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/2.jpg)
Acknowledgements
• ThisisjointworkwithmystudentsGholamrezaHaffari(Ph.D.)andMaxWhitney(B.Sc.)atSFU.
• ThankstoMichaelCollinsforprovidingthenamed‐entitydatasetandansweringourquestions.
• ThankstoDamianosKarakosandJasonEisnerforprovidingthewordsensedatasetandansweringourquestions.
2
![Page 3: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/3.jpg)
3
Bootstrapping
![Page 4: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/4.jpg)
4
Self‐Training
1. Abasemodelistrainedwithasmall/largeamountoflabeleddata.
2. Thebasemodelisthenusedtoclassifytheunlabeleddata.
3. Onlythemostconfidentunlabeledpoints,alongwiththepredictedlabels,areincorporatedintothelabeledtrainingset(pseudo‐labeleddata).
4. Thebasemodelisre‐trained,andtheprocessisrepeated.
![Page 5: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/5.jpg)
5
Self‐Training
• Itcanbeappliedtoanybaselearningalgorithm:onlyneedconfidenceweightsforpredictions.
• DifferenceswithEM:• Self‐trainingonlyusesthemodeofpredictiondistribution.• Unlikehard‐EM,itcanabstain:“Idonotknowthelabel”.
• DifferenceswithCo‐training:• Inco‐trainingtherearetwoviews,ineachofwhichamodel
islearned.• Themodelinoneviewtrainsthemodelinanotherviewby
providingpseudo‐labeledexamples.
![Page 6: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/6.jpg)
6
Bootstrapping
• Startwithafewseedrules(typicallyhighprecision,lowrecall).Buildinitialclassifier.
• Useclassifiertolabelunlabeleddata.
• Extractnewrulesfrompseudo‐labeleddataandbuildclassifierfornextiteration.
• Exitiflabelsforunlabeleddataareunchanged.Else,applyclassifiertounlabeleddataandcontinue.
![Page 7: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/7.jpg)
77
DecisionList(DL)
• ADecisionListisanorderedsetofrules.• Givenaninstancex,thefirstapplicableruledeterminestheclass
label.
• Insteadoforderingtherules,wecangiveweighttothem.• Amongallapplicablerulestoaninstancex,applytherulewhich
hasthehighestweight.
• Theparametersaretheweightswhichspecifytheorderingoftherules.
Rules:Ifxhasfeaturefclassk ,θf,k
parameters
![Page 8: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/8.jpg)
88
DLforWordSenseDisambiguation
Ifcompany+1,confidenceweight.97Iflife−1,confidenceweight.96…
(Yarowsky1995)
• WSD:Specifythemostappropriatesense(meaning)ofawordinagivensentence.
Considerthesetwosentences: …companysaidtheplantisstilloperating.factorysense+ …anddividelifeintoplantandanimalkingdom.livingorganismsense‐
Considerthesetwosentences: …companysaidtheplantisstilloperating.sense+ …anddividelifeintoplantandanimalkingdom.sense‐
Considerthesetwosentences: …companysaidtheplantisstilloperating.(company,operating)sense+ …anddividelifeintoplantandanimalkingdom.(life,animal)sense‐
Sorted
![Page 9: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/9.jpg)
Example:disambiguate2sensesofsentence
• Seedrules:Ifcontextcontainsserved,label+1,conf=1.0Ifcontextcontainsreads,label‐1,conf=1.0
• Seedruleslabel8outof303unlabeledexamples
• These8pseudo‐labeledexamplesprovide6rulesabove0.95threshold(includingtheoriginalseedrules)e.g.Ifcontextcontainsread,label‐1,conf=0.953
• These6ruleslabel151outof303unlabeledexamples
![Page 10: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/10.jpg)
Example:disambiguate2sensesofsentence
• These151pseudo‐labeledexamplesprovide60rulesabovethethreshold,e.g.Ifcontextcontainsprison,label+1,conf=0.989Ifprevwordislife,label+1,conf=0.986Ifprevwordishis,label+1,conf=0.983Ifnextwordisfrom,label‐1,conf=0.982Ifcontextcontainsrelevant,label‐1,conf=0.953Ifcontextcontainspage,label‐1,conf=0.953
• After5iterations,297/303unlabeledexamplesarepermanentlylabeled(nochangespossible)
• Buildingfinalclassifiergives67%accuracyontestsetof515sentences.Withsome“tricks”wecanget76%accuracy.
![Page 11: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/11.jpg)
11
BriefHistoryofBootstrapping
• (Yarowsky1995)useditwithDecisionListbaseclassifierforWordSenseDisambiguation(WSD)task.• Itachievedthesameperformancelevelasthesupervisedalgorithm,
usingonlyafewseedexamplesaslabeledtrainingdata.
• (Collins&Singer1999)useditforNamedEntityClassificationtaskwithDecisionListbaseclassifier.• Usingonly7initialrules,itachieved91%accuracy.• ItachievedthesameperformancelevelasCo‐training(noneedfor2
views).
• (AbneyACL2002)inapaperaboutco‐trainingcontrastsitwiththeYarowskyalgorithm.Initialanalysisabandonedlater.
![Page 12: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/12.jpg)
12
BriefHistoryofBootstrapping
• (AbneyCL2004)providedanewanalysisoftheYarowskyalgorithm.• ItcouldnotmathematicallyanalyzetheoriginalYarowskyalgorithm,
butintroducednewvariants(wewillseethemlater).
• (Haffari&SarkarUAI2007)advancedAbney’sanalysisandgaveageneralframeworkthatshowedhowtheYarowskyalgorithmintroducedbyAbneyisrelatedtootherSSLmethods.
• (EisnerandKarakos2005)examinestheconstructionofseedrulesforbootstrapping.
![Page 13: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/13.jpg)
13
AnalysisoftheYarowskyAlgorithm
![Page 14: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/14.jpg)
14
OriginalYarowskyAlgorithm
• TheYarowskyalgorithmisabootstrappingalgorithmwithaDecisionListbaseclassifier.
• Thepredictedlabelisk*iftheconfidenceoftheappliedruleisabovesomethresholdη.
• Aninstancemaybecomeunlabeledinfutureiterations.
(Yarowsky1995)
![Page 15: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/15.jpg)
15
ModifiedYarowskyAlgorithm
• Insteadofthefeaturewiththemaxscoreweusethesumofthescoresofallfeaturesactiveforanexampletobelabeled.
• Thepredictedlabelisk*iftheconfidenceoftheappliedruleisabovethethreshold1/K.• K:isthenumberoflabels.
• Aninstancemuststaylabeledonceitbecomeslabeled,butthelabelmaychange.
• Thesearetheconditionsinallthealgorithmswewillanalyzeintherestofthetalk.• AnalyzingtheoriginalYarowskyalgorithmisstillanopenquestion.
(Abney2004)
![Page 16: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/16.jpg)
1616
BipartiteGraphRepresentation
+1companysaidtheplantisstilloperating
‐1dividelifeintoplantandanimalkingdom
company
operating
life
animal
(Features)F
…
X(Instances)
…
Unlabeled
(Cordunneanu2006,Haffari&Sarkar2007)
Weproposetoviewbootstrappingaspropagatingthelabelsofinitiallylabelednodestotherestofthegraphnodes.
![Page 17: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/17.jpg)
1717
Self‐TrainingontheGraph
f
(Features)F X(Instances)
… …
xπx qxLabelingdistribution
+ ‐
1qx
θfLabelingdistribution
+ ‐
.7.3θf
(Haffari&Sarkar2007)
+ ‐
.6.4
+ ‐
1qx
![Page 18: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/18.jpg)
1818
GoalsoftheAnalysis
• Tofindreasonableobjectivefunctionsfortheself‐trainingalgorithmsonthebipartitegraph.
• TheobjectivefunctionsmayshedlighttotheempiricalsuccessofdifferentDL‐basedself‐trainingalgorithms.
• Itcantellusthekindofpropertiesinthedatawhicharewellexploitedandcapturedbythealgorithms.
• Itisalsousefulinprovingtheconvergenceofthealgorithms.
![Page 19: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/19.jpg)
• KL‐divergenceisameasureofdistancebetweentwoprobabilitydistributions:
• EntropyHisameasureofrandomnessinadistribution:
• Theobjectivefunction:
19
ObjectiveFunction
F X
![Page 20: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/20.jpg)
20
TheBregmanDistance
• Examples:– Ifψ(t)=tlogtThenBψ(α,β)=KL(α,β)– Ifψ(t)=t2ThenBψ(α,β)=Σi(αi‐βi)2
ψ(αi)
αiβi
ψ(t)
t
![Page 21: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/21.jpg)
• Givenastrictlyconvexfunctionψ,theBregmandistanceBψbetweentwoprobabilitydistributionsisdefinedas:
• Theψ‐entropyHψisdefinedas:
• Thegeneralizedobjectivefunction:
21
GeneralizingtheObjectiveFunction
F X
![Page 22: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/22.jpg)
22
OptimizingtheObjectiveFunctions
• Inwhatfollows,wementionsomespecificobjectivefunctionstogetherwiththeiroptimizationalgorithms.
• TheseoptimizationalgorithmscorrespondtosomevariantsofthemodifiedYarowskyalgorithm.
• Itisnoteasytocomeupwithalgorithmsfordirectlyoptimizingthegeneralizedobjectivefunctions.
![Page 23: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/23.jpg)
2323
UsefulOperations
• Average:takestheaveragedistributionoftheneighbors
• Majority:takesthemajoritylabeloftheneighbors
(.2,.8)
(.4,.6)
(.3,.7)
(0,1)
(.2,.8)
(.4,.6)
![Page 24: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/24.jpg)
2424
AnalyzingSelf‐Training
Theorem.Thefollowingobjectivefunctionsareoptimizedbythecorrespondinglabelpropagationalgorithmsonthebipartitegraph:
F X
where:ConvergesinPolytimeO(|F|2|X|2|)
Relatedtograph‐basedSSlearning(Zhuetal2003)
Abney’svariantofYarowskyalgorithm
![Page 25: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/25.jpg)
2525
WhataboutLog‐Likelihood?
• Initially,thelabelingdistributionisuniformforunlabeledverticesandaδ‐likedistributionforlabeledvertices.
• Bylearningtheparameters,wewouldliketoreducetheuncertaintyinthelabelingdistributionwhilerespectingthelabeleddata:
Negativelog‐Likelihoodoftheoldandnewlylabeleddata
![Page 26: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/26.jpg)
Lemma.Ifmisthenumberoffeaturesconnectedtoaninstance,then:
2626
ConnectionbetweenthetwoAnalyses
ComparewithConditionalEntropyRegularization(GrandvaletandBengio2005)!
x!L
KL(qx||!x) + "!
x!U
H(!x)
![Page 27: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/27.jpg)
27
Experiments
![Page 28: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/28.jpg)
NamedEntityClassification
• 971,476sentencesfromtheNYTwereparsedwiththeCollinsparser
• Thetaskistoidentifythreetypesofnamedentities:1. Location(LOC)2. Person(PER)3. Organization(ORG)−1.notaNEor“don’tknow”
28
(CollinsandSinger,1999)
![Page 29: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/29.jpg)
NamedEntityClassification
• Nounphraseswereextractedthatmetthefollowingconditions1. TheNPcontainedonlywordstaggedasproper
nouns2. TheNPappearedinthefollowingtwo
syntacticcontexts: Modifiedbyanappositivewhoseheadisasingular
noun InaprepositionalphrasemodifyinganNPwhose
headisasingularnoun29
(CollinsandSinger,1999)
![Page 30: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/30.jpg)
NamedEntityClassification
• Nounphraseswereextractedthatmetthefollowingconditions1. TheNPcontainedonlywordstaggedasproper
nouns2. TheNPappearedinthefollowingtwo
syntacticcontexts: Modifiedbyanappositivewhoseheadisasingular
noun InaprepositionalphrasemodifyinganNPwhose
headisasingularnoun30
(CollinsandSinger,1999)
NP
NNP NNP NNPS
International Business Machines
…,says[NEMauryCooper],avice[CONTEXTpresident]atS.&P.
…,fraudrelatedtoworkonafederallyfundedsewage[CONTEXTplantin][NEGeorgia]
![Page 31: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/31.jpg)
NamedEntityClassification
• Thetask:classifyNPsintoLOC,PER,ORG• 89,305trainingexampleswith68,475distinctfeaturetypes– 88,962wasusedinCS99experiments
• 1000testdataexamples(includesNPsthatarenotLOC,PERorORG)
• Monthnamesareeasilyidentifiableasnotnamedentities:leaves962examples
• Still85NPsthatarenotLOC,PER,ORG.• Cleanaccuracyover877;Noisyover962 31
(CollinsandSinger,1999)
![Page 32: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/32.jpg)
YarowskyVariants
• AtrickfromtheCo‐trainingpaper(BlumandMitchell1998)istobecautious.Don’taddallrulesabovethe0.95threshold
• Addonlynrulesperlabel(say5)andincreasethisamountbynineachiteration
• Changesthedynamicsoflearninginthealgorithmbutnottheobjectivefn
• Twovariants:Yarowsky(basic),Yarowsky(cautious)
• Withoutathreshold:Yarowsky(nothreshold)32
(Abney2004,CollinsandSinger,1999)
![Page 33: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/33.jpg)
ResultsLearning Algorithm Accuracy (Clean) Accuracy (Noisy) Baseline (all organization)
45.8 41.8
EM 83.1 75.8 Yarowsky (basic) 80.7 73.5 Yarowsky (no threshold) 80.3 73.2 Yarowsky (cautious) 91 83 DL-CoTrain 91 83
33
![Page 34: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/34.jpg)
NumberofRules(basic)
34
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
1 2 3 4 5 6 7 8 9
Num
. R
ule
s
Iteration
![Page 35: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/35.jpg)
NumberofRules(cautious)
35 0
1000
2000
3000
4000
5000
6000
7000
0 50 100 150 200 250 300 350 400 450
Num
. R
ule
s
Iteration
![Page 36: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/36.jpg)
Coverage(basic)
36 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6 7 8 9
Co
ve
rag
e
Iteration
![Page 37: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/37.jpg)
Coverage(cautious)
37 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0 50 100 150 200 250 300 350 400 450
Co
vera
ge
Iteration
![Page 38: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/38.jpg)
Accuracy(basic)
38
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
1 2 3 4 5 6 7 8 9 10
Accura
cy
Iteration
![Page 39: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/39.jpg)
Accuracy(cautious)
39
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 50 100 150 200 250 300 350 400 450
Accura
cy
Iteration
![Page 40: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/40.jpg)
Precision‐Recall(basic)
40
0.82
0.84
0.86
0.88
0.9
0.92
0.94
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85
Pre
cis
ion
Recall
![Page 41: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/41.jpg)
Precision‐Recall(cautious)
41
0.91
0.92
0.93
0.94
0.95
0.96
0.97
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Pre
cis
ion
Recall
![Page 42: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/42.jpg)
Seeds
• Selectingseedrules:whatisagoodstrategy?– Frequency:sortbyfrequencyoffeatureoccurrence
– Contexts:sortbynumberofotherfeaturesafeaturewasobservedwith
– Weighted:sortbyaweightedcountofotherfeaturesobservedwithfeature.Weight(f)=count(f)/Σf’count(f’)
42
(EisnerandKarakos2005,ZagibalovandCarroll2008)
![Page 43: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/43.jpg)
Seeds
• Ineachcasethefrequenciesweretakenfromtheunlabeledtrainingdata
• Seedswereextractedfromthesortedlistoffeaturesbymanualinspectionandassignedalabel(theentireexamplewasused)
• Location(LOC)featuresappearinfrequentlyinallthreeorderings
• ItispossiblethatsomegoodLOCseedsweremissed
43
![Page 44: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/44.jpg)
SeedsNumber of Rules Frequency Contexts Weighted
(n/3) rules/label Clean Noisy Clean Noisy Clean Noisy 3 84 77 84 77 88 80 9 91 83 90 82 82 74 15 91 83 91 83 85 77 7 (CS99) 91 83
44
![Page 45: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/45.jpg)
WordSenseDisambiguation
• Datafrom(EisnerandKarakos2005)• Disambiguatetwosenseseachfordrug,duty,land,language,position,sentence(Galeet.al.1992)
• Sourceofunlabeleddata:14MwordCanadianHansards(Englishonly)
• Twoseedrulesforeachdisambiguationtaskfrom(EisnerandKarakos2005)
45
![Page 46: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/46.jpg)
Results
46
Learning Algorithm drug land sentence Seeds alcohol medical acres courts served reads Train / Test size 134 / 386 1604 / 1488 303 / 515
Yarowsky (basic) 53.3 79.3 67.7
Yarowsky (no threshold) 52 79 64.8
Yarowsky (cautious) 55.9 79 76.1
DL-CoTrain (2 views = long distance v.s. immediate context)
53.1 77.7 75.9
![Page 47: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/47.jpg)
47
Self‐trainingforMachineTranslation
![Page 48: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/48.jpg)
48
Self‐TrainingforSMT
Train
MFE
Bilingualtext
F E
Monolingualtext
DecodeTranslatedtext
F E
F E
SelecthighqualitySent.pairs
Re‐Log‐linearModel
Re‐trainingtheSMTmodel
![Page 49: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/49.jpg)
49
SelectingSentencePairs
• Firstgivescores: Usenormalizeddecoder’sscore Confidenceestimationmethod(Ueffing&Ney2007)
• Thenselectbasedonthescores: Importancesampling: Thosewhosescoreisaboveathreshold Keepallsentencepairs
![Page 50: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/50.jpg)
50
Re‐trainingtheSMTModel
• UsenewsentencepairstotrainanadditionalphrasetableanduseitasanewfeaturefunctionintheSMTlog‐linearmodel Onephrasetabletrainedonsentencesforwhichwe
havethetruetranslations Onephrasetabletrainedonsentenceswiththeir
generatedtranslations
PhraseTable1 PhraseTable2
![Page 51: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/51.jpg)
51
ChinesetoEnglish(Transductive)
Selection Scoring BLEU%
Baseline 31.8±.7
Keepall 33.1
ImportanceSampling
Norm.score 33.5
Confidence 33.2
Threshold Norm.score 33.5
confidence 33.5
Bold:bestresult,italic:significantlybetter
NISTEval‐2004:train=8.2M,test=1788(4refs)
Train:news,magazines,laws+UN
Test:newswire,editorials,politicalspeeches
We use Portage from NRC as the underlying SMT system (Ueffing et al, 2007)
![Page 52: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/52.jpg)
52
ChinesetoEnglish(Inductive)
system BLEU%
Baseline 31.8±.7
AddChinesedata Iter1 32.8
Iter4 32.6
Iter10 32.5
Bold:bestresult,italic:significantlybetter
Usingimportancesampling
Before After
editorials 30.7 31.3
newswire 30.0 31.1
speeches 36.1 37.3
![Page 53: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/53.jpg)
53
Whydoesitwork?
• Reinforcespartsofthephrasetranslationmodelwhicharerelevantfortestcorpus
• Gluephrasesfromtestdatausedtocomposenewphrases(mostphrasesstillfromoriginaldata)
![Page 54: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/54.jpg)
54
Whydoesitwork?
![Page 55: ed oct2 2009 - Simon Fraser Universityanoop/papers/pdf/ed_oct2_2009.pdfBrief History of Bootstrapping • (Yarowsky 1995) used it with Decision List base classifier for Word Sense](https://reader034.vdocuments.us/reader034/viewer/2022050523/5fa6e400bda011618d34b8f8/html5/thumbnails/55.jpg)
Summary
• ShouldweeveruseCo‐trainingforBootstrapping?• Per‐labelcautiousnessleadstoeffective
bootstrapping.– ExploitedinYarowskyalgo.,DL‐CoTrain,Co‐Boosting
• Thesedynamicscan/shouldbeexaminedmoreclosely.– Perhapsusingtoolsfromtheanalysisoffeature
induction.
• Bootstrappingandself‐trainingmaybemoreeffectivethanyoumayhavethought.
55