Natural Language Inference, Reading Comprehension and Deep Learning
Christopher Manning @chrmanning • @stanfordnlp
Stanford University, SIGIR 2016
Machine Comprehension, tested by question answering (Burges)

“A machine comprehends a passage of text if, for any question regarding that text that can be answered correctly by a majority of native speakers, that machine can provide a string which those speakers would agree both answers that question, and does not contain information irrelevant to that question.”
IR needs language understanding

• There were some things that kept IR and NLP apart
  • IR was heavily focused on efficiency and scale
  • NLP was way too focused on form rather than meaning
• Now there are compelling reasons for them to come together
  • Taking IR precision and recall to the next level
  • [car parts for sale] should match: Selling automobile and pickup engines, transmissions
    (example from Jeff Dean’s WSDM 2016 talk)
• Information retrieval / question answering in mobile contexts
  • Web snippets no longer cut it on a watch!
Menu

1. Natural logic: a weak logic over human languages for inference
2. Distributed word representations
3. Deep, recursive neural network language understanding
How can information retrieval be viewed more as theorem proving (than matching)?
AI2 4th Grade Science Question Answering [Angeli, Nayak & Manning, ACL 2016]

Our “knowledge”:
Ovaries are the female part of the flower, which produces eggs that are needed for making seeds.

The question:
Which part of a plant produces the seeds?

The answer choices:
• the flower
• the leaves
• the stem
• the roots
How can we represent and reason with broad-coverage knowledge?

1. Rigid-schema knowledge bases with well-defined logical inference
2. Open-domain knowledge bases (OpenIE): no clear ontology or inference [Etzioni et al. 2007ff]
3. Human language text as the KB: no rigid schema, but with “natural logic” we can do formal inference over human language text [MacCartney and Manning 2008]
Natural Language Inference [Dagan 2005; MacCartney & Manning 2009]

Does a piece of text follow from or contradict another?

Two senators received contributions engineered by lobbyist Jack Abramoff in return for political favors.
⊨ Jack Abramoff attempted to bribe two legislators.  (Follows)

Here, try to prove or refute according to a large text collection:
1. The flower of a plant produces the seeds
2. The leaves of a plant produces the seeds
3. The stem of a plant produces the seeds
4. The roots of a plant produces the seeds
Text as Knowledge Base

Storing knowledge as text is easy!
Doing inference over text might be hard:
• We don’t want to run inference over every fact!
• We don’t want to store all the inferences!
Inferences … on demand from a query … [Angeli and Manning 2014]
… using text as the meaning representation
Natural Logic: logical inference over text

We are doing logical inference:
The cat ate a mouse ⊨ ¬ No carnivores eat animals

We do it with natural logic: if I mutate a sentence in this way, do I preserve its truth?
Post-Deal Iran Asks if U.S. Is Still ‘Great Satan,’ or Something Less ⊨ A Country Asks if U.S. Is Still ‘Great Satan,’ or Something Less

• A sound and complete weak logic [Icard and Moss 2014]
• Expressive for common human inferences*
• “Semantic” parsing is just syntactic parsing
• Tractable: polynomial-time entailment checking
• Plays nicely with lexical matching back-off methods
#1. Common sense reasoning

Polarity in Natural Logic

We order phrases in partial orders. The simplest one: is-a-kind-of. Also: geographical containment, etc.

Polarity: in a certain context, is it valid to move up or down in this order?
Example inferences

Quantifiers determine the polarity of phrases. Valid mutations consider polarity.

Successful toy inference:
• All cats eat mice ⊨ All house cats consume rodents
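The toy inference above can be sketched in code: a tiny is-a partial order plus a monotonicity check. The lexicon and polarity labels below are illustrative assumptions, not the actual system.

```python
# Natural Logic polarity as a toy: under "All", the restrictor (first argument)
# is downward monotone and the body is upward monotone.
ISA = {          # child -> parent in an is-a-kind-of partial order
    "house cat": "cat",
    "cat": "carnivore",
    "mouse": "rodent",
    "eat": "consume",
}

def is_a(x, y):
    """True if x is (transitively) a kind of y."""
    while x is not None:
        if x == y:
            return True
        x = ISA.get(x)
    return False

def valid_mutation(old, new, polarity):
    """A mutation preserves truth when it moves with the context's polarity:
    up the order in upward contexts, down the order in downward ones."""
    return is_a(old, new) if polarity == "up" else is_a(new, old)

# "All cats eat mice" ⊨ "All house cats consume rodents":
assert valid_mutation("cat", "house cat", "down")   # restrictor of "All"
assert valid_mutation("eat", "consume", "up")       # body of "All"
assert valid_mutation("mouse", "rodent", "up")
# but "All cats ..." ⊭ "All carnivores ...": generalizing the restrictor fails
assert not valid_mutation("cat", "carnivore", "down")
```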
“Soft” Natural Logic

• We also want to make likely (but not certain) inferences
• Same motivation as Markov logic, probabilistic soft logic, etc.
• Each mutation edge template feature f_i has a cost θ_i ≥ 0
• The cost of an edge is θ_i · f_i
• The cost of a path is θ · f
• We can learn the parameters θ
• Inference is then graph search
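Inference as search over costed mutation edges can be sketched with Dijkstra’s algorithm; the sentences, edges, and costs below are made up for illustration.

```python
import heapq

# "Soft" natural logic inference as graph search: each mutation edge carries a
# nonnegative cost (theta_i * f_i on the slide), and the cost of a proof is
# the cost of its path.
EDGES = {
    "all cats eat mice":    [("all cats eat rodents", 0.2),
                             ("all cats eat animals", 0.6)],
    "all cats eat rodents": [("all cats eat animals", 0.3)],
}

def cheapest_proof(query, knowledge_base):
    """Dijkstra over mutation edges; return (cost, path) to a known fact."""
    frontier = [(0.0, query, [query])]
    done = set()
    while frontier:
        cost, sent, path = heapq.heappop(frontier)
        if sent in done:
            continue
        done.add(sent)
        if sent in knowledge_base:
            return cost, path
        for nxt, c in EDGES.get(sent, []):
            heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return None

cost, path = cheapest_proof("all cats eat mice", {"all cats eat animals"})
assert cost == 0.5          # the 0.2 + 0.3 two-step proof beats the 0.6 edge
assert len(path) == 3
```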
#2. Dealing with real sentences

Natural logic works with facts like these in the knowledge base:
Obama was born in Hawaii

But real-world sentences are complex and long:
Born in Honolulu, Hawaii, Obama is a graduate of Columbia University and Harvard Law School, where he served as president of the Harvard Law Review.

Approach:
1. A classifier divides long sentences into entailed clauses
2. Natural logic inference can shorten these clauses
Universal Dependencies (UD)
http://universaldependencies.github.io/docs/

A single level of typed dependency syntax that (i) works for all human languages and (ii) gives a simple, human-friendly representation of sentences

Dependency syntax is better than a phrase-structure tree for machine interpretation: it’s almost a semantic network

UD aims to be linguistically better across languages than earlier representations, such as CoNLL dependencies
Generation of minimal clauses

1. Classification problem: given a dependency edge, does it introduce a clause?
2. Is the clause missing a controlled subject from the subject/object?
3. Shorten clauses while preserving validity, using natural logic!
• All young rabbits drink milk ⊭ All rabbits drink milk
• OK: SJC, the Bay Area’s third largest airport, often experiences delays due to weather.
• Often better: SJC often experiences delays.
#3. Add a lexical alignment classifier

• Sometimes we can’t quite make the inferences that we would like to make
• We use a simple lexical-match back-off classifier with features: matching words, mismatched words, unmatched words
• These always work pretty well; this was the lesson of RTE evaluations, and perhaps of IR in general
The full system

• We run our usual search over split-up, shortened clauses
• If we find a premise, great!
• If not, we use the lexical classifier as an evaluation function
• We work to do this quickly at scale:
  • Visit 1M nodes/second; don’t refeaturize, just delta
  • 32-byte search states (thanks Gabor!)
Solving NY State 4th grade science (Allen AI Institute datasets)

Multiple choice questions from real 4th grade science exams:
Which activity is an example of a good health habit?
(A) Watching television (B) Smoking cigarettes (C) Eating candy (D) Exercising every day

In our corpus knowledge base:
• Plasma TVs can display up to 16 million colors ... great for watching TV ... also make a good screen.
• Not smoking or drinking alcohol is good for health, regardless of whether clothing is worn or not.
• Eating candy for dinner is an example of a poor health habit.
• Healthy is exercising
Solving 4th grade science (Allen AI NDMC)

System                                                Dev   Test
KnowBot [Hixon et al. NAACL 2015]                      45     –
KnowBot (augmented with human in loop)                 57     –
IR baseline (Lucene)                                   49    42
NaturalLI                                              52    51
More data + IR baseline                                62    58
More data + NaturalLI                                  65    61
NaturalLI + lex. classifier                            74    67
Aristo [Clark et al. 2016], 6 systems, even more data   –    71

Test set: New York Regents 4th Grade Science exam multiple-choice questions from AI2. Training: Basic is Barron’s study guide; more data is the SciText corpus from AI2. Score: % correct.
Natural Logic

• Can we just use text as a knowledge base?
• Natural logic provides a useful, formal (weak) logic for textual inference
• Natural logic is easily combinable with lexical matching methods, including neural net methods
• The resulting system is useful for:
  • Common-sense reasoning
  • Question answering
  • Open information extraction, i.e., getting relation triples out of text
Can information retrieval benefit from distributed representations of words?
From symbolic to distributed representations

The vast majority of rule-based or statistical NLP and IR work regarded words as atomic symbols: hotel, conference, walk

In vector space terms, this is a vector with one 1 and a lot of zeroes:

[0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]

We now call this a “one-hot” representation.
From symbolic to distributed representations

Its problem:
• If a user searches for [Dell notebook battery size], we would like to match documents with “Dell laptop battery capacity”
• If a user searches for [Seattle motel], we would like to match documents containing “Seattle hotel”

But:
motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]ᵀ
hotel [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0] = 0

Our query and document vectors are orthogonal. There is no natural notion of similarity in a set of one-hot vectors.
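The orthogonality above is easy to see numerically; a minimal sketch with a 15-word vocabulary (the indices for “motel” and “hotel” are arbitrary):

```python
import numpy as np

# One-hot vectors for "motel" and "hotel": their dot product is 0, so this
# representation carries no notion of similarity at all.
V = 15
motel = np.zeros(V); motel[10] = 1.0
hotel = np.zeros(V); hotel[7] = 1.0

assert motel @ hotel == 0.0    # orthogonal: no similarity signal
assert motel @ motel == 1.0    # each word is only "similar" to itself
```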
Capturing similarity

There are many things you can do about similarity, many well known in IR:
• Query expansion with synonym dictionaries
• Learning word similarities from large corpora

But a word representation that encodes similarity wins:
• Fewer parameters to learn (per word, not per pair)
• More sharing of statistics
• More opportunities for multi-task learning
Distributional similarity-based representations

You can get a lot of value by representing a word by means of its neighbors

“You shall know a word by the company it keeps” (J. R. Firth 1957: 11)

One of the most successful ideas of modern NLP:

  government debt problems turning into banking crises as has happened in
  saying that Europe needs unified banking regulation to replace the hodgepodge

The surrounding words will represent banking.
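The idea of representing a word by its company can be sketched as simple context counting; the window size and the two-line corpus are illustrative:

```python
from collections import Counter

# "You shall know a word by the company it keeps": represent a word by the
# counts of the words appearing around it.
corpus = ("government debt problems turning into banking crises "
          "europe needs unified banking regulation").split()

def context_counts(target, tokens, window=2):
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

banking = context_counts("banking", corpus)
assert banking["crises"] == 1 and banking["regulation"] == 1
```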
Basic idea of learning neural network word embeddings

We define some model that aims to predict a word based on other words in its context, e.g., choose

  argmax_w  w · ((w_{j−1} + w_{j+1}) / 2)

which has a loss function, e.g. (for unit-norm vectors),

  J = 1 − w_j · ((w_{j−1} + w_{j+1}) / 2)

We look at many samples from a big language corpus, and we keep adjusting the vector representations of words to minimize this loss.
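One adjustment step of this objective can be sketched deterministically; the three 4-d vectors are made up, and the update is a single hand-derived gradient step rather than any particular training recipe:

```python
import numpy as np

# Toy version of the slide's objective, J = 1 - w_j . ((w_{j-1} + w_{j+1})/2),
# over unit-norm vectors: one gradient step moves the word vector toward the
# average of its context vectors and reduces the loss.
def unit(v):
    return v / np.linalg.norm(v)

w_prev = np.array([1.0, 0.0, 0.0, 0.0])
w_next = np.array([0.0, 1.0, 0.0, 0.0])
w_j    = np.array([0.0, 0.0, 1.0, 0.0])   # current guess, orthogonal to context

ctx = (w_prev + w_next) / 2
loss_before = 1 - w_j @ ctx               # = 1.0 here

w_j = unit(w_j + 0.5 * ctx)               # dJ/dw_j = -ctx: step, then renormalize
loss_after = 1 - w_j @ ctx

assert loss_before == 1.0
assert loss_after < loss_before
assert np.isclose(np.linalg.norm(w_j), 1.0)
```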
With distributed, distributional representations, syntactic and semantic similarity is captured

currency = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271]
Distributional representations can solve the fragility of NLP tools

Standard NLP systems (here, the Stanford Parser) are incredibly fragile because of symbolic representations

Crazy sentential complement, such as for “likes [(being) crazy]”
Distributional representations can capture the long tail of IR similarity

Google’s RankBrain: not necessarily as good for the head of the query distribution, but great for seeing similarity in the tail

3rd most important ranking signal (we’re told…)
LSA (Latent Semantic Analysis) vs. word2vec

LSA: Count! models
• Factorize a (maybe weighted, often log-scaled) term-document (Deerwester et al. 1990) or word-context matrix (Schütze 1992) into UΣVᵀ
• Retain only the k largest singular values, in order to generalize

[Cf. Baroni: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. ACL 2014]
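The LSA recipe above fits in a few lines; the 4×3 word-context counts are made up for illustration:

```python
import numpy as np

# LSA as on the slide: log-scale a word-context count matrix, factorize it
# into U Sigma V^T, and retain only the top k singular values to generalize.
X = np.array([[4.0, 0.0, 1.0],
              [3.0, 0.0, 0.0],
              [0.0, 5.0, 2.0],
              [0.0, 4.0, 1.0]])
X = np.log1p(X)                      # "often log-scaled"

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k reconstruction
word_vecs = U[:, :k] * s[:k]         # k-dimensional word representations

# Eckart-Young: the Frobenius error of the rank-k truncation is exactly the
# norm of the discarded singular values (here just s[2]).
assert np.isclose(np.linalg.norm(X - X_k), s[2])
# Rows 0 and 1 co-occur with the same contexts, so their reduced vectors align.
cos = word_vecs[0] @ word_vecs[1] / (
    np.linalg.norm(word_vecs[0]) * np.linalg.norm(word_vecs[1]))
assert cos > 0.5
```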
LSA vs. word2vec

word2vec CBOW/Skip-gram: Predict!
[Mikolov et al. 2013]: simple predict models for learning word vectors
• Train word vectors to try to either:
  • Predict a word given its bag-of-words context (CBOW); or
  • Predict a context word (position-independent) from the center word (Skip-gram)
• Update word vectors until they can do this prediction well
word2vec encodes semantic components as linear relations
COALS model (count-modified LSA) [Rohde, Gonnerman & Plaut, ms., 2005]
Count based vs. direct prediction

Count based: LSA, HAL (Lund & Burgess), COALS (Rohde et al.), Hellinger-PCA (Lebret & Collobert)
• Fast training
• Efficient usage of statistics
• Primarily used to capture word similarity
• May not use the best methods for scaling counts

Direct prediction: NNLM, HLBL, RNN, word2vec Skip-gram/CBOW (Bengio et al.; Collobert & Weston; Huang et al.; Mnih & Hinton; Mikolov et al.; Mnih & Kavukcuoglu)
• Scales with corpus size
• Inefficient usage of statistics
• Can capture complex patterns beyond word similarity
• Generates improved performance on other tasks
Encoding meaning in vector differences [Pennington, Socher, and Manning, EMNLP 2014]

Crucial insight: ratios of co-occurrence probabilities can encode meaning components

                     x = solid   x = gas   x = water   x = random
P(x|ice)             large       small     large       small
P(x|steam)           small       large     large       small
P(x|ice)/P(x|steam)  large       small     ~1          ~1
With actual co-occurrence probabilities:

                     x = solid   x = gas    x = water   x = fashion
P(x|ice)             1.9×10⁻⁴    6.6×10⁻⁵   3.0×10⁻³    1.7×10⁻⁵
P(x|steam)           2.2×10⁻⁵    7.8×10⁻⁴   2.2×10⁻³    1.8×10⁻⁵
P(x|ice)/P(x|steam)  8.9         8.5×10⁻²   1.36        0.96
Encoding meaning in vector differences

Q: How can we capture ratios of co-occurrence probabilities as meaning components in a word vector space?

A: A log-bilinear model,

  w_i · w̃_j = log P(i | j)

so that with vector differences

  w_x · (w_a − w_b) = log [ P(x | a) / P(x | b) ]
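The property can be sketched numerically with the slide’s ice/steam probabilities: if dot products match log co-occurrence probabilities, then a vector difference encodes exactly the log ratio.

```python
import math

# If w_x . w_a = log P(x|a), then w_x . (w_a - w_b) = log [P(x|a) / P(x|b)].
# The probabilities are the ice/steam numbers from the slide.
P = {("solid", "ice"): 1.9e-4, ("solid", "steam"): 2.2e-5,
     ("gas",   "ice"): 6.6e-5, ("gas",   "steam"): 7.8e-4}

def log_ratio(x, a, b):
    return math.log(P[(x, a)] / P[(x, b)])

assert log_ratio("solid", "ice", "steam") > 0   # solid discriminates toward ice
assert log_ratio("gas", "ice", "steam") < 0     # gas discriminates toward steam
# the ratio itself: 1.9e-4 / 2.2e-5 ≈ 8.6 (the slide's 8.9 uses unrounded counts)
assert math.isclose(math.exp(log_ratio("solid", "ice", "steam")), 8.6, rel_tol=0.05)
```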
GloVe word similarities [Pennington et al., EMNLP 2014]

Nearest words to frog:
1. frogs
2. toad
3. litoria
4. leptodactylidae
5. rana
6. lizard
7. eleutherodactylus

(Pictures shown: litoria, leptodactylidae, rana, eleutherodactylus)
http://nlp.stanford.edu/projects/glove/
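Nearest-neighbor lookup under cosine similarity, as in the frog example, is a few lines; the 3-d vectors below are made up (real GloVe vectors are 50-300 dimensional):

```python
import numpy as np

# Cosine nearest neighbors over a toy vocabulary.
vecs = {
    "frog":   np.array([0.9, 0.8, 0.1]),
    "toad":   np.array([0.8, 0.9, 0.2]),
    "lizard": np.array([0.6, 0.5, 0.3]),
    "paris":  np.array([-0.7, 0.2, 0.9]),
}

def nearest(word, k=2):
    q = vecs[word]
    sims = {w: (q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
            for w, v in vecs.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

assert nearest("frog") == ["toad", "lizard"]
```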
GloVe Visualizations

http://nlp.stanford.edu/projects/glove/
GloVe Visualizations: Company - CEO
Named Entity Recognition Performance

Model            CoNLL ’03 dev   CoNLL ’03 test   ACE 2   MUC 7
Categorical CRF  91.0            85.4             77.4    73.4
SVD (log tf)     90.5            84.8             73.6    71.5
HPCA             92.6            88.7             81.7    80.7
C&W              92.2            87.4             81.7    80.2
CBOW             93.1            88.2             82.2    81.1
GloVe            93.2            88.3             82.9    82.2

F1 score of a CRF trained on CoNLL 2003 English with 50-dim word vectors
Word embeddings: Conclusion

GloVe translates meaningful relationships between word-word co-occurrence counts into linear relations in the word vector space

GloVe shows the connection between Count! work and Predict! work: appropriate scaling of counts gives the properties and performance of Predict! models

A lot of other important work in this line of research: [Levy & Goldberg 2014], [Arora, Li, Liang, Ma & Risteski 2015], [Hashimoto, Alvarez-Melis & Jaakkola 2016]
Can we use neural networks to understand, not just word similarities, but language meaning in general?
Compositionality

Artificial intelligence requires being able to understand bigger things from knowing about smaller parts
We need more than word embeddings!

How can we know when larger linguistic units are similar in meaning?

The snowboarder is leaping over the mogul
A person on a snowboard jumps into the air

People interpret the meaning of larger text units (entities, descriptive terms, facts, arguments, stories) by semantic composition of smaller elements
Beyond the bag of words: sentiment detection

Is the tone of a piece of text positive, negative, or neutral?

• The claim is that sentiment is “easy”
• Detection accuracy for longer documents is ~90%, BUT

  …… loved …… great …… impressed …… marvelous ……
Stanford Sentiment Treebank

• 215,154 phrases labeled in 11,855 sentences
• Can train and test compositions

http://nlp.stanford.edu:8080/sentiment/
Tree-Structured Long Short-Term Memory Networks [Tai et al., ACL 2015]
Tree-structured LSTM

Generalizes the sequential LSTM to trees with any branching factor
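One node of the child-sum variant (Tai et al. 2015) can be sketched as follows; the weights are random stand-ins for learned parameters, and the dimensionality is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hidden size

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One child-sum Tree-LSTM node: unlike a chain LSTM, it combines any number
# of children, with a separate forget gate for each child.
W = {g: rng.normal(scale=0.1, size=(d, d)) for g in "iofu"}
U = {g: rng.normal(scale=0.1, size=(d, d)) for g in "iofu"}

def tree_lstm_node(x, children):
    """x: input vector; children: list of (h, c) pairs from child nodes."""
    h_sum = sum((h for h, _ in children), np.zeros(d))
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum)
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum)
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum)
    c = i * u + sum(sigmoid(W["f"] @ x + U["f"] @ h) * ch   # per-child forget
                    for h, ch in children)
    return o * np.tanh(c), c

leaf1 = tree_lstm_node(rng.normal(size=d), [])
leaf2 = tree_lstm_node(rng.normal(size=d), [])
root_h, root_c = tree_lstm_node(np.zeros(d), [leaf1, leaf2])  # binary node
assert root_h.shape == (d,)
```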
Positive/Negative Results on Treebank

[Bar chart: accuracy from 75-95% for BiNB, RNN, MV-RNN, RNTN, and TreeLSTM, comparing training with sentence labels only vs. training with the full Treebank.]
Experimental Results on Treebank

• TreeRNNs can capture constructions like X but Y
• Biword Naïve Bayes is only 58% on these
Stanford Natural Language Inference Corpus
http://nlp.stanford.edu/projects/snli/
570K Turker-judged pairs, based on an assumed picture

A man rides a bike on a snow covered road. / A man is outside. → ENTAILMENT
2 female babies eating chips. / Two female babies are enjoying chips. → NEUTRAL
A man in an apron shopping at a market. / A man in an apron is preparing dinner. → CONTRADICTION
NLI with Tree-RNNs [Bowman, Angeli, Potts & Manning, EMNLP 2015]

Approach: we would like to work out the meaning of each sentence separately, as a pure compositional model. Then we compare them with a neural network and classify for inference.

[Figure: learned word vectors are combined by a composition NN layer (e.g., in + snow → “in snow”, man + “in snow” → “man in snow”; man + outside → “man outside”); the two sentence vectors, “man in snow” vs. “man outside”, feed comparison NN layer(s) and a softmax classifier, giving, e.g., P(Entail) = 0.8.]
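The pipeline can be sketched end to end; here composition is a crude average of made-up word vectors rather than a TreeRNN, and the weights are random stand-ins for learned ones:

```python
import numpy as np

# Sketch of the slide's pipeline: compose each sentence to a vector
# separately, then compare the pair with an NN layer and a softmax over
# {entail, neutral, contradict}.
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["man", "outside", "in", "snow"]}

def compose(words):                      # crude stand-in for a TreeRNN
    return np.mean([vecs[w] for w in words], axis=0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

W_cmp = rng.normal(scale=0.1, size=(8, 16))   # comparison layer
W_out = rng.normal(scale=0.1, size=(3, 8))    # softmax classifier

premise = compose(["man", "in", "snow"])
hypothesis = compose(["man", "outside"])
pair = np.concatenate([premise, hypothesis])
probs = softmax(W_out @ np.tanh(W_cmp @ pair))

assert probs.shape == (3,) and np.isclose(probs.sum(), 1.0)
```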
Tree recursive NNs (TreeRNNs)

Theoretically appealing and empirically very competitive

But:
• Prohibitively slow
• Usually require an external parser
• Don’t exploit the complementary linear structure of language
A recurrent NN allows efficient batched computation on GPUs
TreeRNN: Input-specific structure undermines batched computation
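The contrast can be sketched numerically. This is a hypothetical illustration (not code from the talk) with made-up weights `W` and a toy `step` function: a batch of recurrent steps is a single matrix multiply, while TreeRNN compositions follow each example's own parse and must be applied node by node.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D = 4, 8                                  # batch size, hidden size
W = rng.standard_normal((2 * D, D)) * 0.1    # shared composition weights

def step(h, x):
    # Combine a state and an input (each D-dim) into a new D-dim state
    return np.tanh(np.concatenate([h, x], axis=-1) @ W)

# Recurrent NN: all B examples advance in lock-step, so each timestep
# is one batched matrix multiply over the whole batch.
h = np.zeros((B, D))
for x in rng.standard_normal((5, B, D)):     # 5 timesteps
    h = step(h, x)                           # (B, 2D) @ (2D, D)

# TreeRNN: each example has its own parse, so compositions are applied
# one node at a time in an input-specific order -- hard to batch.
a, b, c, d = rng.standard_normal((4, D))
root = step(step(a, b), step(c, d))          # the tree ((a b) (c d))
```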
The Shift-reduce Parser-Interpreter NN (SPINN) [Bowman, Gauthier et al. 2016]
Base model equivalent to a TreeRNN, but… supports batched computation: 25× speedups
Plus: an effective new hybrid that combines linear and tree-structured context
Can stand alone without a parser
Beginning observation: binary trees = transition sequences
SHIFT SHIFT REDUCE SHIFT SHIFT REDUCE REDUCE
SHIFT SHIFT SHIFT SHIFT REDUCE REDUCE REDUCE
SHIFT SHIFT SHIFT REDUCE SHIFT REDUCE REDUCE
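A transition sequence like the ones above can be executed with a stack and a buffer to recover the binary tree. A minimal sketch (hypothetical helper, not the authors' code):

```python
def interpret(tokens, transitions):
    """Execute SHIFT/REDUCE transitions over a token buffer,
    returning the binary tree as a nested bracketing."""
    buffer = list(tokens)
    stack = []
    for op in transitions:
        if op == "SHIFT":
            stack.append(buffer.pop(0))      # move next token onto the stack
        else:  # REDUCE: pop the top two entries and push their combination
            right, left = stack.pop(), stack.pop()
            stack.append((left, right))
    assert len(stack) == 1 and not buffer    # a valid sequence leaves one tree
    return stack[0]

seq = ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "SHIFT", "REDUCE", "REDUCE"]
tree = interpret(["the", "cat", "sat", "down"], seq)
# → (('the', 'cat'), ('sat', 'down'))
```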
The Shift-reduce Parser-Interpreter NN (SPINN)
The model includes a sequence LSTM RNN
• This acts as a simple parser by predicting SHIFT or REDUCE
• It also gives left sequence context as input to composition
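One SPINN step can be sketched in numpy. Everything here is hypothetical and untrained (random weights, tanh updates standing in for the model's LSTM tracker and TreeLSTM composition): the tracker reads the top of the stack and the buffer head, predicts the transition, and feeds left context into each REDUCE composition.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
W_track = rng.standard_normal((3 * D, D)) * 0.1  # tracker update (stand-in for LSTM)
W_pred  = rng.standard_normal((D, 2)) * 0.1      # SHIFT vs. REDUCE scores
W_comp  = rng.standard_normal((3 * D, D)) * 0.1  # composition (stand-in for TreeLSTM)

def spinn_step(stack, buffer, tracker):
    # Tracker reads the top two stack entries and the buffer head...
    s1 = stack[-1] if stack else np.zeros(D)
    s2 = stack[-2] if len(stack) > 1 else np.zeros(D)
    b  = buffer[0] if buffer else np.zeros(D)
    tracker = np.tanh(np.concatenate([s1, s2, b]) @ W_track)
    # ...predicts the next transition (REDUCE is illegal with < 2 stack entries)...
    scores = tracker @ W_pred
    op = "SHIFT" if (scores[0] >= scores[1] and buffer) or len(stack) < 2 else "REDUCE"
    # ...and supplies left sequence context to the composition on REDUCE.
    if op == "SHIFT":
        stack.append(buffer.pop(0))
    else:
        right, left = stack.pop(), stack.pop()
        stack.append(np.tanh(np.concatenate([left, right, tracker]) @ W_comp))
    return tracker

# Encode a 3-"word" sentence of random embeddings
buffer = [rng.standard_normal(D) for _ in range(3)]
stack, tracker = [], np.zeros(D)
while len(stack) != 1 or buffer:
    tracker = spinn_step(stack, buffer, tracker)
sentence_vec = stack[0]
```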
Implementing the stack
• Naïve implementation: simulate stacks in a batch with a fixed-size multidimensional array at each timestep
  • Backpropagation requires that each intermediate stack be maintained in memory
  • ⇒ A large amount of data copying and movement is required
• Efficient implementation:
  • Keep only one stack array for each example
  • At each timestep, append the current head of the stack
  • Keep a list of backpointers for REDUCE operations
  • Similar to zipper data structures employed elsewhere
A thinner stack:

| Step | Array entry pushed | Backpointers (current stack) |
|------|--------------------|------------------------------|
| 1 | Spot | 1 |
| 2 | sat | 1 2 |
| 3 | down | 1 2 3 |
| 4 | (sat down) | 1 4 |
| 5 | (Spot (sat down)) | 5 |
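The backpointer scheme can be demonstrated directly. A minimal hypothetical version, operating on strings rather than vectors: the array only ever grows (nothing is copied or overwritten, so every intermediate value stays available for backpropagation), and the backpointer list identifies which array entries currently form the stack.

```python
def thin_stack_run(tokens, transitions):
    """Append-only 'thin stack': the array only grows, and a list of
    backpointers (indices into it) represents the current stack."""
    array, stack_ptrs, buffer = [], [], list(tokens)
    for op in transitions:
        if op == "SHIFT":
            array.append(buffer.pop(0))
        else:  # REDUCE: combine the entries at the top two backpointers
            r, l = stack_ptrs.pop(), stack_ptrs.pop()
            array.append(f"({array[l]} {array[r]})")
        stack_ptrs.append(len(array) - 1)    # the new head of the stack
    return array, stack_ptrs

array, ptrs = thin_stack_run(
    ["Spot", "sat", "down"],
    ["SHIFT", "SHIFT", "SHIFT", "REDUCE", "REDUCE"])
# array → ['Spot', 'sat', 'down', '(sat down)', '(Spot (sat down))']
```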
Using SPINN for natural language inference
[Figure: two SPINN encoders run in parallel – one composes "the cat sat down" into ((the cat) (sat down)), the other composes "the cat is angry" into ((the cat) (is angry)); the resulting sentence vectors o₁ and o₂ are passed to the inference classifier]
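A common way to combine the two sentence vectors for inference classification is to feed their concatenation, difference, and elementwise product to a small MLP with a softmax output. A sketch with hypothetical random weights (the paper's exact comparison layer may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, C = 8, 16, 3             # sentence dim, hidden dim, 3 NLI classes
W1 = rng.standard_normal((4 * D, H)) * 0.1
W2 = rng.standard_normal((H, C)) * 0.1

def classify(o1, o2):
    # Standard comparison features: both vectors, their difference and product
    feats = np.concatenate([o1, o2, o1 - o2, o1 * o2])
    hidden = np.tanh(feats @ W1)
    logits = hidden @ W2
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    return exp / exp.sum()   # P(entailment), P(neutral), P(contradiction)

probs = classify(rng.standard_normal(D), rng.standard_normal(D))
```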
SNLI Results
| Model | % Accuracy (test set) |
|-------|-----------------------|
| Feature-based classifier | 78.2 |
| Previous SOTA sentence encoder [Mou et al. 2016] | 82.1 |
| LSTM RNN sequence model | 80.6 |
| Tree LSTM | 80.9 |
| SPINN | 83.2 |
| SOTA (sentence pair alignment model) [Parikh et al. 2016] | 86.8 |
Successes for SPINN over LSTM
Examples with negation
• P: The rhythmic gymnast completes her floor exercise at the competition.
• H: The gymnast cannot finish her exercise.

Long examples (>20 words)
• P: A man wearing glasses and a ragged costume is playing a Jaguar electric guitar and singing with the accompaniment of a drummer.
• H: A man with glasses and a disheveled outfit is playing a guitar and singing along with a drummer.
Envoi
• There are very good reasons for wanting to represent meaning with distributed representations
• So far, distributional learning has been most effective for this
  • But cf. [Young, Lai, Hodosh & Hockenmaier 2014] on denotational representations, using visual scenes
• However, we want not just word meanings, but also:
  • Meanings of larger units, calculated compositionally
  • The ability to do natural language inference
• The SPINN model is fast – close to recurrent networks!
  • Its hybrid sequence/tree structure is psychologically plausible and out-performs other sentence composition methods
Final Thoughts

[Figure: timeline of deep learning's takeover of the human language technology fields – speech (2011), vision (2013), NLP (2015), IR (2017?) – with a "You are here" marker placed between NLP and IR]
Final Thoughts
I'm certain that deep learning will come to dominate SIGIR over the next couple of years… just like speech, vision, and NLP before it. This is a good thing. Deep learning provides some powerful new techniques that are just being amazingly successful on many hard applied problems.

However, we should realize that there is also currently a huge amount of hype about deep learning and artificial intelligence. We should not let a genuine enthusiasm for important and successful new techniques lead to irrational exuberance or a diminished appreciation of other approaches.

Finally, despite the efforts of a number of people, in practice there has been a considerable division between the human language technology fields of IR, NLP, and speech. Partly this is due to organizational factors and partly that at one time the subfields each had a very different focus. However, recent changes in emphasis – with IR people wanting to understand the user better and NLP people much more interested in meaning and context – mean that there are a lot of common interests, and I would encourage much more collaboration between NLP and IR in the next decade.