ML4NLP Introduction to Syntax and Parsing CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu

Post on 10-Sep-2019

Page 1: ML4NLP - Purdue University · ML4NLP Introduction to Syntax and Parsing CS 590NLP DanGoldwasser Purdue University dgoldwas@purdue.edu

ML4NLP
Introduction to Syntax and Parsing

CS 590NLP

Dan Goldwasser
Purdue University

dgoldwas@purdue.edu

“I shot an elephant in my pajamas”

“How he got into my pajamas, I'll never know.”

Groucho Marx

“I shot an elephant in my pajamas.”

“I shot an elephant in the zoo.”

Parsing

Language is not just a stream of words. We want to represent linguistic structure!

• Two views:
– Constituency parsing: build a hierarchical phrase structure
– Dependency parsing: show word dependencies (dependency = modifiers or arguments)

Dependency and constituent parsing

Constituency Parsing
“The good old days...”: write a program!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a

Can you parse: “Mary saw John with binoculars”?
How about: “Mary saw a dog with binoculars”?

Constituency Parsing

• “The good old days...”: write a program!
• Can you treat natural and formal languages in the same way?

Constituency Parsing

“Fed raises interest rates 0.5% in effort to control inflation”

How many parsing options? Millions of possible parses in a broad-coverage grammar.

This explains the popularity of statistical methods in NLP: millions of options, but only a few are likely!

CFG

• Formally, a context-free grammar is:
• G = (T, N, S, R)
– T: terminal symbols
– N: non-terminals
– S: start symbol
– R: production rules X → Y (where X is in N, and Y is a sequence of symbols from T or N)

• A grammar G generates a language L

PCFG

• Formally, a probabilistic context-free grammar is:
• G = (T, N, S, R, P)
– T: terminal symbols
– N: non-terminals
– S: start symbol
– R: production rules X → Y (where X is in N, and Y is a sequence of symbols from T or N)
– P: probability function over R

$$\forall X \in N: \quad \sum_{X \to Y \in R} P(X \to Y) = 1$$

PCFG example

S → NP VP     1.0
NP → Det N    0.6
NP → NP PP    0.4
VP → V NP     0.6
VP → VP PP    0.4
PP → P NP     1.0

NP → John         0.3
NP → Mary         0.3
N → binoculars    0.2
N → dog           0.2
V → saw           0.2
P → with          0.4
Det → a           0.4
…

P(Tree): the probability of a tree is the product of the probabilities of the rules used to generate it.
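
As a minimal sketch of the product rule for P(Tree), the snippet below multiplies the rule probabilities along one small tree. The probabilities are copied from the slide as given (the list there ends in "…", so it is not complete); the nested-tuple tree encoding and the names `RULE_PROB`, `rules_used` and `tree_probability` are assumptions made for illustration.

```python
from math import prod

# Rule probabilities copied from the slide's toy PCFG (taken as given).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("NP", ("John",)): 0.3,
    ("NP", ("Mary",)): 0.3,
    ("N", ("binoculars",)): 0.2,
    ("N", ("dog",)): 0.2,
    ("V", ("saw",)): 0.2,
    ("P", ("with",)): 0.4,
    ("Det", ("a",)): 0.4,
}


def rules_used(tree):
    """Yield (lhs, rhs) for every rule application in a nested-tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))  # lexical rule, e.g. V -> saw
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules_used(child)


def tree_probability(tree):
    """Product of the probabilities of all rules used in the tree."""
    return prod(RULE_PROB[rule] for rule in rules_used(tree))


# "Mary saw John" encoded as (label, child, child, ...)
mary_saw_john = ("S", ("NP", "Mary"), ("VP", ("V", "saw"), ("NP", "John")))
p = tree_probability(mary_saw_john)  # 1.0 * 0.3 * 0.6 * 0.2 * 0.3
```

Comparing `tree_probability` across the competing trees of an ambiguous sentence is how a PCFG ranks parses.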

Constituency Parsing as Structured Learning

• Can you define PCFG as a structured prediction problem?
– How would you define the prediction problem?
– What are the dependencies in the model?
– What are the parameters you need to learn?
– What are good features?

CKY Algorithm

• Dynamic programming algorithm for parsing
• Given a CFG G and a string w, determine: can G parse w?

• We assume G is in CNF:
– Each rule has at most 2 symbols on the right: A → B C or A → B, A → a

• The algorithm maintains a triangular DP table.
– Bottom row: parse strings of size 1
– Second row from the bottom: parse strings of size 2
– ...
– Top row: parse the entire sentence!

DP Triangular Table

X1,5
X1,4  X2,5
X1,3  X2,4  X3,5
X1,2  X2,3  X3,4  X4,5
X1,1  X2,2  X3,3  X4,4  X5,5
w1    w2    w3    w4    w5

Table for a string w of length 5

Constructing the Triangular Table

{B}  {A,C}  {A,C}  {B}  {A,C}
b    a      a      b    a

At each point, consider the possible rules and their probabilities.

S → AB | BC
A → BA | a
B → CC | b
C → AB | a
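
The table construction can be sketched as a small CKY recognizer over the toy grammar on this slide. This is an illustrative sketch, not the course's implementation: it ignores probabilities and back pointers, and the names `BINARY`, `LEXICAL`, `cky_table` and `recognizes` are made up for the example. Cells are indexed (i, j) as in the X_{i,j} table, 1-based and inclusive.

```python
from itertools import product

# Toy CNF grammar from the slide:
# S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
BINARY = {
    ("A", "B"): {"S", "C"},
    ("B", "C"): {"S"},
    ("B", "A"): {"A"},
    ("C", "C"): {"B"},
}
LEXICAL = {"a": {"A", "C"}, "b": {"B"}}


def cky_table(w):
    """Return table[(i, j)] = set of non-terminals that derive w[i..j]."""
    n = len(w)
    table = {}
    for i in range(1, n + 1):  # bottom row: spans of length 1
        table[(i, i)] = set(LEXICAL.get(w[i - 1], set()))
    for length in range(2, n + 1):  # longer spans, built bottom up
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split into w[i..k] and w[k+1..j]
                for left, right in product(table[(i, k)], table[(k + 1, j)]):
                    cell |= BINARY.get((left, right), set())
            table[(i, j)] = cell
    return table


def recognizes(w, start="S"):
    """True if the start symbol appears in the top cell of the table."""
    return len(w) > 0 and start in cky_table(w)[(1, len(w))]


table = cky_table("baaba")
```

For the string "baaba" the bottom row matches the sets shown on the slide, and the top cell contains S, so the grammar accepts the string.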

CKY Algorithm

• Similar to Viterbi, keep back pointers to reconstruct the parse tree from the table

• The rule activation scoring function depends on the dependency assumptions
– Look at the probability of previous row activation, and consider the conditional probability of the rule given previous parses.

• Overall complexity: O(n^3 · |G|)
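
The slide does not commit to a specific scoring function. One common choice, assuming the standard PCFG independence assumptions, is the Viterbi-style recurrence below. Here π(i, j, A) is an assumed name for the probability of the best subtree rooted in A over words w_i … w_j, and bp(i, j, A) is the back pointer stored for that cell.

```latex
\begin{align*}
\pi(i, i, A) &= P(A \to w_i) \\
\pi(i, j, A) &= \max_{\substack{A \to B\,C \,\in\, R \\ i \le k < j}}
                P(A \to B\,C)\;\pi(i, k, B)\;\pi(k+1, j, C) \\
\mathrm{bp}(i, j, A) &= \operatorname*{arg\,max}_{(B,\,C,\,k)}
                P(A \to B\,C)\;\pi(i, k, B)\;\pi(k+1, j, C)
\end{align*}
```

There are on the order of n² cells, each tries up to n split points k, and each split consults the rules of G, which gives the cubic dependence on sentence length and the linear dependence on grammar size.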

Option 2: Dependency Parsing

• Key idea: syntactic structure is represented as relations between lexical items, called dependencies

Dependencies can be represented as a graph, where the nodes are words and the edges are dependencies, which are: (1) directional (2) often typed

[Figure: dependency graph over “Root Mary ate a banana”, with edges labeled Main verb, Subj, Obj, Det]

Non-Projective Structure

Root Mary ate a banana today that was yellow

Projective structure: no crossing edges

Are those really needed?

However, we will often assume projectivity.
- It makes life easier
- Non-projectivity doesn't occur often

Dependency Parsing

• We need to answer two questions:

– How can you make parsing decisions? (i.e., inference)

– How do you learn the parameters to score these decisions?

Dependency Parsing

• Parser: for each word, choose which other word it depends on.
– You can choose to label these dependencies

• Constraints:
– Only one root
– No cycles

⇒ Essentially, force a tree structure
• Additional constraint: no crossing dependencies
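
The constraints above can be checked mechanically. The sketch below assumes a head-array encoding that is not in the slides: `heads[i]` is the index of the head of word i, words are numbered from 1, and 0 stands for Root. The head assignment used for the non-projective sentence is one plausible analysis chosen for illustration, not necessarily the one drawn in the lecture figure.

```python
def is_tree(heads):
    """heads[i] is the head of word i (words 1..n, 0 is Root); heads[0] is unused."""
    n = len(heads) - 1
    if sum(1 for i in range(1, n + 1) if heads[i] == 0) != 1:
        return False  # exactly one word may attach to Root
    for i in range(1, n + 1):  # every word must reach Root without looping
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = heads[node]
    return True


def is_projective(heads):
    """True if no two arcs cross when drawn above the sentence."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads) if d > 0]
    return not any(a < c < b < d for a, b in arcs for c, d in arcs)


# Root(0) Mary(1) ate(2) a(3) banana(4)
simple = [None, 2, 0, 4, 2]

# Root(0) Mary(1) ate(2) a(3) banana(4) today(5) that(6) was(7) yellow(8)
# Assumed analysis: "was" attaches to "banana", "today" attaches to "ate".
relative = [None, 2, 0, 4, 2, 2, 7, 4, 7]
```

Under this encoding both sentences are valid trees, but the arc banana → was crosses the arc ate → today, so only the first sentence is projective.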

Parsing Approaches

• Two competing approaches:

• Exact inference: mostly graph-based algorithms (e.g., spanning tree), but also ILP

• Approximate inference: linear-time transition parser

• Transition-based parsers are very popular!

Greedy Transition-based Dependency Parsing (Nivre ’03)

• The parser operates by maintaining two data structures:
– Stack and Buffer

• Parsing is done via a sequence of operations.
– Pushing the words from the buffer to the stack, and associating dependency edges over words in the stack.

– Shift: take a word from the top of the buffer, and put it on top of the stack.

– Left/Right Arc: dependency operations associate dependencies between words in the stack, and remove the dependent word from the stack.

• The parsing sequence ends when the buffer is empty and only Root remains on the stack

Stack                  Buffer            Next operation
[Root]                 I like lettuce    Shift
[Root] I               like lettuce      Shift
[Root] I like          lettuce           Left Arc
[Root] like            lettuce           Shift
[Root] like lettuce                      Right Arc
[Root] like                              Right Arc
[Root]
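
The trace for “I like lettuce” can be replayed with a few lines of code. This is a minimal sketch assuming arc-standard semantics (Left Arc makes the second item on the stack a dependent of the top item, Right Arc makes the top item a dependent of the one below it); the function name `run_transitions` and the action strings are hypothetical.

```python
def run_transitions(words, actions):
    """Apply shift / left_arc / right_arc actions; return (arcs, stack, buffer)."""
    stack, buffer, arcs = ["Root"], list(words), []
    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))  # front of buffer -> top of stack
        elif action == "left_arc":
            dependent = stack.pop(-2)  # second-from-top depends on top
            arcs.append((stack[-1], dependent))  # stored as (head, dependent)
        elif action == "right_arc":
            dependent = stack.pop()  # top depends on the item below it
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs, stack, buffer


arcs, stack, buffer = run_transitions(
    ["I", "like", "lettuce"],
    ["shift", "shift", "left_arc", "shift", "right_arc", "right_arc"],
)
```

After the six operations the buffer is empty, only Root is left on the stack, and the collected arcs are like → I, like → lettuce and Root → like. Words are used directly as node names here, so a sentence with a repeated word would need indices instead.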

Learning for Dependency Parsing

• Learning a transition parser: use data to build a scoring function for parser operations.
– This should sound familiar...

• Break the data into a sequence of decisions, and train a “next-state” function.
– Local learning; (greedy) inference only at test time.

• Traditionally: SVM, LR, ...
– Essentially a multiclass classifier over the current state of the parser.
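
To make "break the data into a sequence of decisions" concrete, the sketch below turns one gold dependency tree into (parser state, correct action) pairs, which is the kind of data a multiclass classifier could be trained on. It assumes arc-standard transitions, a projective gold tree, and a head-array encoding (`heads[i]` is the head of word i, 0 is Root); the function name `oracle_actions` is hypothetical and feature extraction is left out.

```python
def oracle_actions(heads):
    """Return a list of (stack, buffer, action) training examples.

    heads[i] = gold head of word i (1..n), 0 = Root; heads[0] is unused.
    """
    n = len(heads) - 1
    pending = [0] * (n + 1)  # dependents not yet attached, per head
    for d in range(1, n + 1):
        pending[heads[d]] += 1
    stack, buffer, examples = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        top = stack[-1]
        below = stack[-2] if len(stack) > 1 else None
        if below not in (None, 0) and heads[below] == top:
            action = "left_arc"
        elif below is not None and heads[top] == below and pending[top] == 0:
            action = "right_arc"  # only once top has collected its dependents
        elif buffer:
            action = "shift"
        else:
            raise ValueError("gold tree is not reachable (non-projective?)")
        examples.append((tuple(stack), tuple(buffer), action))
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "left_arc":
            pending[top] -= 1
            del stack[-2]
        else:
            pending[below] -= 1
            stack.pop()
    return examples


# "I like lettuce": I <- like, Root -> like, like -> lettuce
examples = oracle_actions([None, 2, 0, 2])
actions = [action for _, _, action in examples]
```

For this sentence the oracle produces shift, shift, left_arc, shift, right_arc, right_arc. Each recorded state would then be mapped to features and paired with its action as one training example.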

Learning for Dependency Parsing

• Which features would you consider?

Deep Learning for Dependency Parsing (Chen, Manning ’14)

From Syntax to Semantics

• The syntactic structure of the sentence captures some semantic properties (e.g., recall the PP attachment problem).

• However, it does not account for meaning in a broad sense.

• Interesting question: What is a computational model for meaning?

What is the meaning of meaning?

Semantics

• We distinguish between:

• Lexical semantics: the meaning of words

• Compositional semantics: combine individual units to form the meaning of larger units.

Applications

• Semantics is what we really care about:
– Question answering
– Intelligent information access
– Robot communication
– Summarization
– …

Deep vs. Shallow Semantics

• Surprisingly, we tend to believe that dogs understand much more!

• Similarly, shallow NLP performs surprisingly well!

Semantics

• We will look at two semantics problems:

– Formal semantic representation: find a mathematical representation of meaning

– Information extraction: the “machine reading” view; populate a DB of facts from text.

Formal Models of Meaning

• Formal model for compositional semantics:
– Form the semantics of parents, based on the semantics of the children

• We assume a dictionary of items:
– Constant symbols
– Functions

[Figure: parse tree S → NP VP over “John Smokes”, with the meaning Smokes(John) at S]

Constants and Functions

• Constants:
– Purdue University
– Barack Obama

• Properties:
– Red(x), Small(x), ...

• Relations:
– Love(x, y), PresidentOf(Barack Obama, USA)

Generating a meaning representation

• We assume that syntactic representations and compositional semantics are highly dependent

• Simple algorithm:
– Create a parse tree
– Find the semantic representation of words (leaf nodes)
– Combine the semantics of children into the parent node (bottom up)
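
The three steps of the simple algorithm can be sketched for the “John Smokes” example. Everything here is a toy assumption: the parse tree is given by hand as a nested tuple, the lexicon `LEXICON` holds one constant and one function, and composition means applying the single function child to the remaining children.

```python
# Hypothetical toy lexicon: constants are strings, functions are callables.
LEXICON = {
    "John": "John",
    "Smokes": lambda x: f"Smokes({x})",
}


def meaning(tree):
    """Bottom-up composition over a nested-tuple parse tree (label, child, ...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[children[0]]  # leaf: look the word up
    parts = [meaning(child) for child in children]
    if len(parts) == 1:
        return parts[0]  # unary node: pass the meaning up unchanged
    functions = [p for p in parts if callable(p)]
    arguments = [p for p in parts if not callable(p)]
    if len(functions) != 1:
        raise ValueError(f"cannot compose children of {label}")
    return functions[0](*arguments)  # apply the function to its arguments


tree = ("S", ("NP", "John"), ("VP", "Smokes"))
result = meaning(tree)
```

Here the NP contributes the constant John, the VP contributes a one-argument function, and the S node applies one to the other to yield Smokes(John).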

Semantic Parser

• Key idea: augment syntactic parsing with meaning!

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP

NP → John
NP → Mary
N → binoculars
N → dog
V → saw
P → with
Det → a

Semantic Parser

• Key idea: augment syntactic parsing with meaning!

S [SM] → NP [NPM] VP [VPM], Apply(SM, NPM, VPM)
VP [VPM] → V [VM] NP [NPM], Apply(VPM, VM, NPM)
NP [NPM] → N [NM], Apply(NPM, NM)
...

N [john] → John
N [mary] → Mary
V [λx.saw(x)] → saw

This is sometimes called a lexicon

Generating a meaning representation

Learning for Semantic Parsing

• Similar to syntactic parsing, there are many possible meaning derivations for a single sentence
– Each could result in a different semantic representation!

• To help us disambiguate the meaning of a sentence, we can define a probabilistic parser:

N [john] → John           0.4
N [mary] → Mary           0.2
V [λx.saw(x)] → saw       0.9

Grounded Language Interpretation

• Compositionality: constructing meaning by composing the meaning of lower-level units.
– The lowest level (“leaves”) are typically constant symbols

• Where do the symbols come from?
• We assume a world model, providing the relevant set of symbols
– People on your smartphone, transactions in a DB, entities on Wikipedia, real-world objects (“pick up that block”)

• Analyzing the difficulty of semantic interpretation:
– Complexity of the input language, complexity of the set of symbols, complexity of their mapping

Grounded Language Interpretation

• “pick up the green piece”
• “pick up the green piece that’s next to the blue piece”
• “pick up the green piece that’s shaped like a lettuce”
• “pick up the green piece that’s at the left end of the bottom row.”

A Joint Model of Language and Perception for Grounded Attribute Learning. Matuszek et al. 2012

Grounded Language Interpretation

• Create a meaning representation capturing the mapping from language to percepts in the real world

The (probability of) truth value of these predicates depends on real-world grounding

Grounded Language Interpretation

• Create a scoring function, connecting the two representations

Natural Language Communication with Robots. Bisk et al. 2016

World representation

Grounded Language Interpretation

• Two competing approaches:
– Create an explicit meaning representation
– Create a scoring function that ranks meaning representations (or their outcomes)

• We discussed it in the context of grounded representations (“real-world objects”)
– Similar discussion for different settings (e.g., DB access).

• Which one will be easier to learn? What kind of supervision effort is needed in either?

Scaling up

Voters go to the polls in four states on Tuesday, with Michigan the biggest prize for both parties. Donald J. Trump seeks to strengthen his position as the Republican front-runner, while his rivals look to slow his drive toward the nomination. For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.

NYTimes article

Machine Reading

• More realistic task: given unstructured text, create structured knowledge

• Simple examples:
– Named Entity Recognition

• More complicated:
– Relationships between entities

IE Example

For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test in his upstart campaign to derail Hillary Clinton. Here are some of the things we will be watching in the contests in Hawaii, Idaho, Michigan and Mississippi.

Bernie Sanders is-a democrat

Bernie Sanders is-from Vermont

Relation Extraction

• We make a distinction between closed and open IE

• Closed: focus on a small set of relations
– Easy to think about as a supervised task

• Open: find all relations

Relation Extraction

• Popular task: ACE 2003 defined 4 types:
– Role: member, owner, affiliate, client
– Part: subsidiary, physical part-of, set membership
– At: location, based-in, residence
– Social: parent, sibling, spouse

• Realistic settings: Freebase has thousands of relations!

Building Relation Extractors

• Simple pattern recognition

Hearst (1992)

Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use.

What does Gelidium mean?
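
A rough sketch of this kind of pattern matching, applied to the Agar sentence: the regular expression below looks for “X, such as Y” and reads it as “Y is a kind of X”. It is a simplification made for illustration: only the single word before the comma is captured, whereas a real extractor would match whole noun phrases (here, “red algae”). The names `SUCH_AS` and `hearst_such_as` are hypothetical.

```python
import re

# "X, such as Y": Y is taken to be a kind of X. Only the single word before
# the comma is captured; a real system would match whole noun phrases.
SUCH_AS = re.compile(r"(?P<hypernym>\w+),? such as (?P<hyponym>\w+)")


def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    return [(m.group("hyponym"), m.group("hypernym")) for m in SUCH_AS.finditer(text)]


sentence = (
    "Agar is a substance prepared from a mixture of red algae, "
    "such as Gelidium, for laboratory or industrial use."
)
pairs = hearst_such_as(sentence)
```

Even without knowing the word Gelidium, the pattern suggests that it is a kind of algae.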

Pattern-based Relation Extraction

Bootstrapping

• Simple idea:
– Given a small seed set of relations (e.g., by mining patterns)
– And A LOT of unlabeled text
– Find mentions of the relations in the text
– Use the mentions to come up with new patterns!

Supervised Relation Extraction

• Given a sentence, find the list of entities, and predict if there is a relation.

• Key problem: finding a good feature representation

Zhou et al. 2005

Scaling up RE

• Key problem: realistic machine reading requires dealing with thousands of relations.

• Directly annotating for this task is not reasonable; how can we scale up?

• Key idea: distant supervision
– Similar to Bootstrapping + Learning

Distant Supervision

• Assume we have a collection of relations
– Easy! (e.g., Freebase, Wikipedia, ...)

• ...and that if two entities appear in a relation, sentences containing these two entities will express this relationship.

• Use such sentences as noisy training data!
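
The labeling heuristic can be sketched in a few lines. The one-fact knowledge base below is a hypothetical stand-in for a resource such as Freebase, built from the “Bernie Sanders is-from Vermont” example earlier in the lecture; the sentences are taken from the news excerpt used there, and entity matching is plain substring search, which a real system would replace with entity linking.

```python
# Hypothetical toy knowledge base of (entity1, relation, entity2) facts.
KB = [("Bernie Sanders", "is-from", "Vermont")]


def distant_labels(sentences, kb):
    """Label every sentence that mentions both entities of a KB fact."""
    examples = []
    for sentence in sentences:
        for e1, relation, e2 in kb:
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples


sentences = [
    "For the Democrats, Senator Bernie Sanders of Vermont faces a crucial test "
    "in his upstart campaign to derail Hillary Clinton.",
    "Here are some of the things we will be watching in the contests in Hawaii, "
    "Idaho, Michigan and Mississippi.",
]
training = distant_labels(sentences, KB)
```

Only the first sentence mentions both entities, so it becomes a (noisy) positive example for the is-from relation. The noise comes from the assumption itself: a sentence can mention both entities without expressing the relation.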

Distant Supervision Example