cs4705 - columbia universitykathy/nlp/2019/classslides/... · • class parkcipaon using...
![Page 1: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/1.jpg)
CS4705 Probability Review and Naïve Bayes
Slides from Dragomir Radev and modified
![Page 2: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/2.jpg)
Announcements
• Reading for today: C.4, 4.5 NLP
• Reading for next class: C3, NLP
• Next class will be taught by Chris Kedzie
• For new students in class:
  • No laptop policy
  • Class participation using PollEverywhere or in-class comments
![Page 3: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/3.jpg)
Today
• SciKit Learn Tutorial
• Wrap up on optimization
• Generative methods
![Page 4: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/4.jpg)
Regularization
• Consider the case where one or more documents are mis-labeled
  • Text from a novel may be mis-labeled as social media if posted as a quote
  • The classifier will attempt to learn weights that promote words characteristic of novels as predictors of social media
• Overfitting can also occur when the social media documents in the training set are not representative
![Page 5: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/5.jpg)
Loss
• To prevent overfitting, a regularization term R(Θ) is added to the loss:
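A common way to write the regularized objective is sketched below. The per-example loss L, the model f, the training pairs (x_i, y_i), and the strength hyperparameter λ are assumptions introduced for illustration; only R(Θ) is named in the text above.

$$\hat{\Theta} = \arg\min_{\Theta} \; \frac{1}{n} \sum_{i=1}^{n} L\big(f(x_i; \Theta),\, y_i\big) + \lambda\, R(\Theta)$$

A larger λ puts more weight on keeping the parameters small and less on fitting the training labels exactly.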
![Page 6: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/6.jpg)
Two Common Regularizers
• L2 regularization
  • Keeps the sum of squares of the parameter values low
  • Gaussian prior or weight decay (here W is the weights, not including b)
  • Prefers to decrease one parameter with a high weight by 1 rather than 10 parameters with low weights
• L1 regularization
  • Keeps the sum of absolute values of the parameters low
  • Punished uniformly for high and low values
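The difference between the two penalties can be checked numerically. This is a minimal sketch: the weight values, and the choice to shrink the ten small weights by 0.1 each, are invented for illustration.

```python
import math

def l2_penalty(weights):
    """Sum of squared parameter values."""
    return sum(w * w for w in weights)

def l1_penalty(weights):
    """Sum of absolute parameter values."""
    return sum(abs(w) for w in weights)

# Hypothetical weight vector: one large weight and ten small ones.
weights = [5.0] + [0.5] * 10

# Option A: shrink the single large weight by 1.
option_a = [4.0] + [0.5] * 10
# Option B: shrink each of the ten small weights by 0.1 (same total change).
option_b = [5.0] + [0.4] * 10

for name, w in [("original", weights), ("A", option_a), ("B", option_b)]:
    print(name, "L2 =", round(l2_penalty(w), 3), "L1 =", round(l1_penalty(w), 3))
```

Under L2, option A lowers the penalty far more than option B, so L2 prefers shrinking the large weight. Under L1 both options lower the penalty by the same amount.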
![Page 7: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/7.jpg)
Gradient-Based Optimization
• Repeat until L (loss) < margin
  • Compute L over the training set
  • Compute the gradients of L with respect to Θ
  • Move the parameters in the opposite direction of the gradient
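The loop above can be sketched on a toy problem. The model (a line w·x + b), the mean squared error loss, the data, the learning rate, and the `max_steps` safety cap are all assumptions chosen to keep the example small; they do not come from the slides.

```python
def loss_and_grads(theta, data):
    """Mean squared error of the line w*x + b and its gradient w.r.t. (w, b)."""
    w, b = theta
    n = len(data)
    loss = sum((w * x + b - y) ** 2 for x, y in data) / n
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return loss, (grad_w, grad_b)

def gradient_descent(data, learning_rate=0.05, margin=1e-6, max_steps=10000):
    theta = (0.0, 0.0)
    loss = float("inf")
    for _ in range(max_steps):
        # Compute L and its gradient over the whole training set.
        loss, grads = loss_and_grads(theta, data)
        if loss < margin:
            break
        # Move the parameters against the gradient.
        theta = tuple(p - learning_rate * g for p, g in zip(theta, grads))
    return theta, loss

# Toy data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
theta, final_loss = gradient_descent(data)
print(theta, final_loss)
```

The step cap is a practical guard: on real data the loss may never drop below the margin, so a loop that tests only the loss might not terminate.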
![Page 8: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/8.jpg)
Stochastic Gradient Descent
![Page 9: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/9.jpg)
Problem
• Error is calculated based on just one training sample
• May not be representative of corpus-wide loss
• Instead, calculate the error based on a set of training examples: a minibatch
• -> Minibatch stochastic gradient descent
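A minimal sketch of minibatch stochastic gradient descent follows, again assuming a toy linear model with squared error. The batch size, learning rate, epoch count, and data are illustrative choices.

```python
import random

def minibatch_sgd(data, batch_size=2, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by updating on small batches instead of single examples."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    examples = list(data)
    for _ in range(epochs):
        rng.shuffle(examples)  # visit the examples in a new order each epoch
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            # The gradient is averaged over the minibatch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = minibatch_sgd(data)
print(w, b)
```

Setting `batch_size=1` gives single-example SGD, and setting it to `len(data)` gives full-batch gradient descent.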
![Page 10: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/10.jpg)
Computing Gradients
![Page 11: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/11.jpg)
Summary
• Smoothing helps to account for zero-valued n-grams
• Text classification using feature vectors representing n-grams and other properties
• Discriminative learning
• Methods for optimization, loss functions and regularization
![Page 12: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/12.jpg)
Classification using a Generative Approach
• Start with Naïve Bayes and Maximum Likelihood Estimation
• But we need some background in probability first
![Page 13: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/13.jpg)
Probabilities in NLP
• Very important for language processing
• Example in speech recognition:
  • “recognize speech” vs “wreck a nice beach”
• Example in machine translation:
  • “l’avocat general”: “the attorney general” vs. “the general avocado”
• Example in information retrieval:
  • If a document includes three occurrences of “stir” and one of “rice”, what is the probability that it is a recipe?
• Probabilities make it possible to combine evidence from multiple sources systematically
![Page 14: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/14.jpg)
Probabilities
• Probability theory
  • predicting how likely it is that something will happen
• Experiment (trial)
  • e.g., throwing a coin
• Possible outcomes
  • heads or tails
• Sample spaces
  • discrete (number of “rice”) or continuous (e.g., temperature)
• Events
  • Ω is the certain event
  • ∅ is the impossible event
  • event space: all possible events
![Page 15: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/15.jpg)
Sample Space
• Random experiment: an experiment with uncertain outcome
  • e.g., flipping a coin, picking a word from text
• Sample space: all possible outcomes, e.g.,
  • Tossing 2 fair coins, Ω = {HH, HT, TH, TT}
![Page 16: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/16.jpg)
Events
• Event: a subset of the sample space
  • E ⊆ Ω, E happens iff the outcome is in E, e.g.,
  • E = {HH} (all heads)
  • E = {HH, TT} (same face)
• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.
  • P(Ω) = 1 (outcome always in Ω)
  • P(A ∪ B) = P(A) + P(B), if (A ∩ B) = ∅ (e.g., A = same face, B = different face)
![Page 17: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/17.jpg)
Example: Toss a Die
• Sample space: Ω = {1, 2, 3, 4, 5, 6}
• Fair die:
  • p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
• Unfair die: p(1) = 0.3, p(2) = 0.2, ...
• N-dimensional die:
  • Ω = {1, 2, 3, 4, …, N}
• Example in modeling text:
  • Toss a die to decide which word to write in the next position
  • Ω = {cat, dog, tiger, …}
![Page 18: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/18.jpg)
Example: Flip a Coin
• Ω: {Head, Tail}
• Fair coin:
  • p(H) = 0.5, p(T) = 0.5
• Unfair coin, e.g.:
  • p(H) = 0.3, p(T) = 0.7
• Flipping two fair coins:
  • Sample space: {HH, HT, TH, TT}
• Example in modeling text:
  • Flip a coin to decide whether or not to include a word in a document
  • Sample space = {appear, absence}
![Page 19: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/19.jpg)
Probabilities
• Probabilities
  • numbers between 0 and 1
• Probability distribution
  • distributes a probability mass of 1 throughout the sample space Ω.
• Example:
  • A fair coin is tossed three times.
  • What is the probability of 3 heads?
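The three-toss question can be answered by enumerating the sample space. This is a small sketch in which every outcome is treated as equally likely because the coin is fair.

```python
from fractions import Fraction
from itertools import product

# All outcomes of tossing a fair coin three times: HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

# The event "three heads" contains exactly one outcome.
three_heads = [o for o in outcomes if o == ("H", "H", "H")]

p_three_heads = Fraction(len(three_heads), len(outcomes))
print(len(outcomes), p_three_heads)
```

With 8 equally likely outcomes and one favorable outcome, the probability is 1/8, which matches 0.5 × 0.5 × 0.5.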
![Page 20: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/20.jpg)
Probabilities
• Joint probability: P(A ∩ B), also written as P(A, B)
• Conditional probability: P(A|B) = P(A ∩ B) / P(B)
  • P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
  • So, P(A|B) = P(B|A) P(A) / P(B) (Bayes’ Rule)
  • For independent events, P(A ∩ B) = P(A) P(B), so P(A|B) = P(A)
• Total probability: if A1, …, An form a partition of S, then
  • P(B) = P(B ∩ S) = P(B, A1) + … + P(B, An)
  • So, P(Ai|B) = P(B|Ai) P(Ai) / P(B) = P(B|Ai) P(Ai) / [P(B|A1) P(A1) + … + P(B|An) P(An)]
  • This allows us to compute P(Ai|B) based on P(B|Ai)
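Bayes' rule combined with total probability can be worked through with concrete numbers. In this sketch the two-class partition and all probabilities are invented for illustration; `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """P(A_target | B) from P(A_i) and P(B | A_i) over a partition A_1..A_n."""
    # Total probability: P(B) = sum_i P(B | A_i) P(A_i)
    evidence = sum(likelihoods[a] * priors[a] for a in priors)
    # Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
    return likelihoods[target] * priors[target] / evidence

# Made-up numbers: 10% of documents are recipes; the event B is
# "the document contains the word stir".
priors = {"recipe": Fraction(1, 10), "other": Fraction(9, 10)}
likelihoods = {"recipe": Fraction(1, 2), "other": Fraction(1, 100)}

p_recipe_given_stir = posterior(priors, likelihoods, "recipe")
print(p_recipe_given_stir)
```

Although only 10% of the documents are recipes, observing the word raises the probability to 50/59, because the word is much more likely inside recipes than outside them.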
![Page 22: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/22.jpg)
Properties of Probabilities
• p(∅) = 0
• P(certain event) = 1
• p(X) ≤ p(Y), if X ⊆ Y
• p(X ∪ Y) = p(X) + p(Y), if X ∩ Y = ∅
![Page 23: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/23.jpg)
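Conditional Probability
• Prior and posterior probability
• Conditional probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

[Figure: Venn diagram of events A and B inside the sample space Ω, overlapping in A ∩ B]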
![Page 24: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/24.jpg)
Conditional Probability
• Six-sided fair die
  • P(D even) = ?
  • P(D >= 4) = ?
  • P(D even | D >= 4) = ?
  • P(D odd | D >= 4) = ?
• Multiple conditions
  • P(D odd | D >= 4, D <= 5) = ?
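The die questions can be answered by counting outcomes. In this sketch events are Python sets, and the helper names `prob` and `cond_prob` are chosen for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space of a fair six-sided die

def prob(event):
    """P(E) for a fair die: favorable outcomes over all outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

even = {2, 4, 6}
odd = {1, 3, 5}
at_least_4 = {4, 5, 6}
at_most_5 = {1, 2, 3, 4, 5}

print(prob(even))                           # P(D even)
print(prob(at_least_4))                     # P(D >= 4)
print(cond_prob(even, at_least_4))          # P(D even | D >= 4)
print(cond_prob(odd, at_least_4))           # P(D odd | D >= 4)
print(cond_prob(odd, at_least_4 & at_most_5))  # P(D odd | D >= 4, D <= 5)
```

Conditioning on several events at once is the same as conditioning on their intersection, which is why the last line passes `at_least_4 & at_most_5`.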
![Page 25: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/25.jpg)
![Page 26: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/26.jpg)
![Page 27: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/27.jpg)
![Page 28: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/28.jpg)
![Page 29: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/29.jpg)
![Page 30: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/30.jpg)
Independence
• Two events are independent when P(A ∩ B) = P(A) P(B)
• Unless P(B) = 0, this is equivalent to saying that P(A) = P(A|B)
• If two events are not independent, they are considered dependent
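The definition can be tested directly on the two-coin sample space. The specific events compared here are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: HH, HT, TH, TT.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event & omega), len(omega))

def independent(a, b):
    """True when P(A and B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)

first_heads = {o for o in omega if o[0] == "H"}
same_face = {o for o in omega if o[0] == o[1]}
both_heads = {("H", "H")}

print(independent(first_heads, same_face))   # first coin vs. "same face"
print(independent(first_heads, both_heads))  # first coin vs. "both heads"
```

Knowing the first coin is heads says nothing about whether the faces match, so those two events are independent. It does make "both heads" more likely, so that pair is dependent.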
![Page 31: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/31.jpg)
[slide from Brendan O’Connor]
![Page 32: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/32.jpg)
Naïve Bayes Classifier
• We use Bayes’ rule:
  • P(C|D) = P(D|C) P(C) / P(D), where C = class and D = document
• We can simplify and ignore P(D) since it is independent of the class choice
  • P(C|D) ∝ P(D|C) P(C) ≈ P(C) Π P(wi|C), for i = 1, …, n
  • This estimates the probability of D being in class C, assuming that D has n tokens and wi is a token in D.
![Page 33: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/33.jpg)
Use Labeled Training Data
• P(C) is estimated as the number of labeled documents in the class / the total number of documents:

$$P(C) = \frac{D_C}{D}$$

• P(wi|C) is estimated as the number of times wi occurs with label C / the number of times all words in the vocabulary (V) occur with label C:

$$P(w_i \mid C) = \frac{\mathrm{Count}(w_i, C)}{\sum_{v_i \in V} \mathrm{Count}(v_i, C)}$$
![Page 34: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/34.jpg)
Multinomial Naïve Bayes Independence Assumptions
• Bag of Words assumption
  • Assume position doesn’t matter
• Conditional Independence
  • Assume the feature probabilities P(wi|c) are independent given the class c.

$$P(w_1, \ldots, w_n \mid C) = \prod_{i=1}^{n} P(w_i \mid C)$$

[Jurafsky and Martin]
![Page 35: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/35.jpg)
Multinomial Naïve Bayes Classifier

$$C_{MAP} = \arg\max_{C} \; P(w_1, \ldots, w_n \mid C)\, P(C)$$

$$C_{NB} = \arg\max_{C_j} \; P(C_j) \prod_{w \in W} P(w \mid C_j)$$

This is why it’s naïve!

[Jurafsky and Martin]
![Page 36: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/36.jpg)
Laplace Smoothing: Needed because counts may be zero
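Maximum likelihood estimate (zero for any word never seen with class c):

$$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c)}{\sum_{w \in V} \mathrm{count}(w, c)}$$

Add-one (Laplace) estimate:

$$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V} \big(\mathrm{count}(w, c) + 1\big)} = \frac{\mathrm{count}(w_i, c) + 1}{\Big(\sum_{w \in V} \mathrm{count}(w, c)\Big) + |V|}$$

[Jurafsky and Martin]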
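The estimates and the argmax rule can be combined into a small classifier. This is a minimal sketch under stated assumptions: the training documents are made up, the function names are hypothetical, probabilities are summed in log space to avoid underflow, and words outside the training vocabulary are skipped at classification time.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Estimate log P(C) and add-one smoothed log P(w | C) from (tokens, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(doc_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in doc_counts.items()}
    log_likelihood = {}
    for c in doc_counts:
        # Denominator: all word tokens seen with class c, plus |V| for smoothing.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab

def classify(tokens, log_prior, log_likelihood, vocab):
    """Return argmax over classes of log P(C) + sum of log P(w | C)."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)

# Made-up training set for illustration only.
training = [
    ("stir rice stir".split(), "recipe"),
    ("boil rice".split(), "recipe"),
    ("vote senate".split(), "news"),
    ("senate vote vote".split(), "news"),
]
log_prior, log_likelihood, vocab = train_naive_bayes(training)
print(classify("stir rice".split(), log_prior, log_likelihood, vocab))
print(classify("senate vote today".split(), log_prior, log_likelihood, vocab))
```

Without the +1, "vote" would have probability zero under the recipe class, and any document containing it could never be scored as a recipe regardless of its other words.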
![Page 37: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/37.jpg)
Questions?
![Page 38: CS4705 - Columbia Universitykathy/NLP/2019/ClassSlides/... · • Class parKcipaon using PollEverywhere or in-class comments Today • SciKit Learn Tutorial • Wrap up on opKmizaon](https://reader036.vdocuments.us/reader036/viewer/2022071215/6044aa1ea972346ca272d2f8/html5/thumbnails/38.jpg)
SciKit Learn