Loss-augmented Structured Prediction
CMSC723 / LING723 / INST725
Marine Carpuat
Figures, algorithms & equations from CIML chap. 17
POS tagging: Sequence labeling with the perceptron
Sequence labeling problem
• Input:
  • sequence of tokens x = [x1 … xL]
  • variable length L
• Output (aka label):
  • sequence of tags y = [y1 … yL]
  • # tags = K
  • Size of output space? K^L, i.e., exponential in the input length L
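To make the size of the output space concrete, here is a minimal Python sketch that enumerates every candidate tag sequence for a toy tag set. The tags "N", "V", "D" and the helper `all_label_sequences` are hypothetical, chosen only for illustration; the count grows as K^L, which is why exhaustive search for the argmax is not an option.

```python
from itertools import product

def all_label_sequences(length, tags):
    """Enumerate every candidate output y = [y1 ... yL] over the given tag set."""
    return list(product(tags, repeat=length))

tags = ["N", "V", "D"]  # hypothetical tag set, K = 3
for L in range(1, 6):
    n = len(all_label_sequences(L, tags))
    print(L, n, n == len(tags) ** L)  # the count always equals K ** L
```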
Structured Perceptron
• The perceptron algorithm can be used for sequence labeling
• But there are challenges:
  • How to compute the argmax efficiently?
  • What are appropriate features?
• Approach: leverage the structure of the output space
Solving the argmax problem for sequences with dynamic programming
• Efficient algorithms are possible if the feature function decomposes over the input
• This holds for the unary and Markov features used for POS tagging
Feature functions for sequence labeling
• Standard features of POS tagging
  • Unary features: # times word w has been labeled with tag l, for all words w and all tags l
  • Markov features: # times tag l is adjacent to tag l' in the output, for all tags l and l'
• Size of feature representation is constant w.r.t. input length
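A minimal sketch of the feature function φ(x, y) described above, storing only non-zero counts in a `Counter`. The feature keys `("unary", word, tag)` and `("markov", prev_tag, tag)` are our own naming, and we assume Markov features count ordered pairs (tag immediately followed by tag'). The set of possible keys depends on the vocabulary and tag set, not on the sentence length L.

```python
from collections import Counter

def features(x, y):
    """Unary + Markov feature counts phi(x, y) for a tagged sentence.

    ("unary", word, tag)    -> # times word is labeled with tag
    ("markov", prev, cur)   -> # times tag prev is immediately followed by tag cur
    """
    assert len(x) == len(y)
    phi = Counter()
    for word, tag in zip(x, y):
        phi[("unary", word, tag)] += 1
    for prev, cur in zip(y, y[1:]):
        phi[("markov", prev, cur)] += 1
    return phi

x = ["the", "dog", "saw", "the", "cat"]
y = ["D", "N", "V", "D", "N"]
phi = features(x, y)
print(phi[("unary", "the", "D")])  # 2
print(phi[("markov", "D", "N")])   # 2
```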
Solving the argmax problem for sequences
• Trellis sequence labeling
  • Any path represents a labeling of the input sentence
• Gold standard path in red
• Each edge receives a weight such that adding the weights along a path gives the score for that input/output configuration
• Any max-weight path algorithm can find the argmax
  • e.g., Viterbi algorithm, O(LK²)
Defining weights of edges in the trellis
• Weight of the edge that goes from time l-1 to time l and transitions from y to y':
  unary features at position l together with Markov features that end at position l
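The edge weight above can be written as follows, in CIML-style notation (the symbols ω and φ_l are our assumed notation): φ_l collects the unary features at position l plus the Markov feature (k', k), and at the first position only unary features apply.

```latex
% weight of the trellis edge into position l with tag k, coming from tag k' at position l-1
\omega_{l,k',k} = \mathbf{w}\cdot\phi_l\big(\mathbf{x}, \langle \dots, k', k\rangle\big)
% summing edge weights along the path of y recovers the full score
\mathbf{w}\cdot\phi(\mathbf{x},\mathbf{y}) = \sum_{l=1}^{L} \omega_{l,\,y_{l-1},\,y_l}
```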
Dynamic program
• Define α_{l,k}: the score of the best possible output prefix up to and including position l that labels the l-th word with label k
• With decomposable features, the alphas can be computed recursively
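A minimal Viterbi sketch for the recursion α_{1,k} = ω_{1,·,k}, α_{l,k} = max_{k'} (α_{l-1,k'} + ω_{l,k',k}), with the best sequence recovered through backpointers. Assumptions: weights live in a plain dict keyed like `("unary", word, tag)` / `("markov", prev, cur)`, positions are 0-indexed, there is no start-transition feature, and the weights in the usage example are hypothetical values picked by hand. The helper names `edge_weight`, `score`, `viterbi` are ours.

```python
from itertools import product

def edge_weight(w, x, l, prev, cur):
    """omega_{l,prev,cur}: unary feature at position l plus the Markov feature ending at l."""
    s = w.get(("unary", x[l], cur), 0.0)
    if prev is not None:  # position 0 has no incoming Markov feature
        s += w.get(("markov", prev, cur), 0.0)
    return s

def score(w, x, y):
    """w . phi(x, y) computed as the sum of edge weights along the path of y."""
    return sum(edge_weight(w, x, l, y[l - 1] if l > 0 else None, y[l]) for l in range(len(x)))

def viterbi(w, x, tags):
    """Return (best_score, best_tags) maximizing w . phi(x, y) in O(L K^2) time."""
    alpha = [{k: edge_weight(w, x, 0, None, k) for k in tags}]
    back = [{}]
    for l in range(1, len(x)):
        alpha.append({})
        back.append({})
        for k in tags:
            best_prev = max(tags, key=lambda kp: alpha[l - 1][kp] + edge_weight(w, x, l, kp, k))
            alpha[l][k] = alpha[l - 1][best_prev] + edge_weight(w, x, l, best_prev, k)
            back[l][k] = best_prev
    last = max(tags, key=lambda k: alpha[-1][k])
    path = [last]
    for l in range(len(x) - 1, 0, -1):  # follow backpointers from the end
        path.append(back[l][path[-1]])
    return alpha[-1][last], path[::-1]

tags = ["D", "N", "V"]
x = ["the", "dog", "barks"]
w = {  # hypothetical weights, chosen by hand for illustration
    ("unary", "the", "D"): 2.0, ("unary", "dog", "N"): 1.5, ("unary", "dog", "V"): 1.0,
    ("unary", "barks", "V"): 2.0, ("unary", "barks", "N"): 0.5,
    ("markov", "D", "N"): 1.0, ("markov", "N", "V"): 1.0, ("markov", "D", "V"): -1.0,
}
print(viterbi(w, x, tags))                                        # (7.5, ['D', 'N', 'V'])
print(max(score(w, x, y) for y in product(tags, repeat=len(x))))  # 7.5, brute force agrees
```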
A more general approach for argmax: Integer Linear Programming
• ILP: optimization problem of the form max_z a·z subject to linear constraints on z, for a fixed vector a
• With integer constraints on z
• Pro: can leverage well-engineered solvers (e.g., Gurobi)
• Con: not always the most efficient
POS tagging as ILP
• Markov features as binary indicator variables
• Output sequence: y(z) obtained by reading off the variables z
• Define a such that a·z is equal to the score
• Enforce constraints for well-formed solutions
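The encoding above can be checked with a small sketch. We assume binary variables z_{l,k',k} (position l has tag k, position l-1 has tag k'), a hypothetical start symbol `"<s>"` for position 0, and two constraint families: exactly one active variable per position, and adjacent variables agreeing on the shared tag. No ILP solver is called here: a solver such as Gurobi would maximize a·z under these constraints, while this sketch only verifies that every y yields a well-formed z with a·z equal to the score, and that decoding recovers y. All function names are ours.

```python
from itertools import product

START = "<s>"  # hypothetical start symbol so that position 0 has its own variables

def edge_weight(w, x, l, prev, cur):
    """omega_{l,prev,cur}: unary feature at l plus Markov feature (prev, cur) if prev is a real tag."""
    s = w.get(("unary", x[l], cur), 0.0)
    if prev != START:
        s += w.get(("markov", prev, cur), 0.0)
    return s

def variables(L, tags):
    """Indices (l, k_prev, k) of the binary variables z_{l,k',k}."""
    return [(0, START, k) for k in tags] + [
        (l, kp, k) for l in range(1, L) for kp in tags for k in tags]

def objective(w, x, tags):
    """Vector a with a_{l,k',k} = omega_{l,k',k}, so that a . z(y) equals w . phi(x, y)."""
    return {v: edge_weight(w, x, *v) for v in variables(len(x), tags)}

def encode(y, tags):
    """z(y): switch on exactly the variables along the path of y."""
    z = dict.fromkeys(variables(len(y), tags), 0)
    z[(0, START, y[0])] = 1
    for l in range(1, len(y)):
        z[(l, y[l - 1], y[l])] = 1
    return z

def well_formed(z, L, tags):
    """Linear constraints: one active variable per position; adjacent variables agree on the shared tag."""
    for l in range(L):
        if sum(val for (m, _, _), val in z.items() if m == l) != 1:
            return False
    for l in range(L - 1):
        for k in tags:
            incoming = sum(z[(l, kp, k)] for kp in ([START] if l == 0 else tags))
            outgoing = sum(z[(l + 1, k, kn)] for kn in tags)
            if incoming != outgoing:
                return False
    return True

def decode(z):
    """y(z): read off the current tag of the active variable at each position."""
    return [k for (l, _, k), val in sorted(z.items(), key=lambda item: item[0][0]) if val == 1]

tags = ["D", "N", "V"]
x = ["the", "dog", "barks"]
w = {  # hypothetical weights, chosen by hand for illustration
    ("unary", "the", "D"): 2.0, ("unary", "dog", "N"): 1.5, ("unary", "dog", "V"): 1.0,
    ("unary", "barks", "V"): 2.0, ("unary", "barks", "N"): 0.5,
    ("markov", "D", "N"): 1.0, ("markov", "N", "V"): 1.0, ("markov", "D", "V"): -1.0,
}
a = objective(w, x, tags)
for y in product(tags, repeat=len(x)):  # each y yields a well-formed z that decodes back to y
    z = encode(list(y), tags)
    assert well_formed(z, len(x), tags) and decode(z) == list(y)

def a_dot_z(y):
    return sum(a[v] * val for v, val in encode(list(y), tags).items())

best = max(product(tags, repeat=len(x)), key=a_dot_z)
print(best, a_dot_z(best))  # ('D', 'N', 'V') 7.5
```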
Sequence labeling
• Structured perceptron
  • A general algorithm for structured prediction problems such as sequence labeling
• The argmax problem
  • Efficient argmax for sequences with the Viterbi algorithm, given some assumptions on feature structure
  • A more general solution: Integer Linear Programming
• Loss-augmented structured prediction
  • Training algorithm
  • Loss-augmented argmax
In the structured perceptron, all errors are equally bad
Not all bad output sequences are equally bad
• Consider
  • ŷ₁ = [A, A, A, A]
  • ŷ₂ = [N, V, N, N]
• Hamming loss: ℓ^(Ham)(y, ŷ) = Σ_l 1[y_l ≠ ŷ_l], the number of positions with a wrong tag
  • Gives a more nuanced evaluation of output than 0–1 loss
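A minimal sketch contrasting Hamming loss with 0–1 loss on the two outputs above. The gold sequence used here, [N, V, D, N], is hypothetical (chosen only for illustration): both outputs are simply "wrong" under 0–1 loss, while Hamming loss separates them.

```python
def hamming_loss(y, y_hat):
    """Number of positions where the predicted tag differs from the gold tag."""
    assert len(y) == len(y_hat)
    return sum(1 for a, b in zip(y, y_hat) if a != b)

def zero_one_loss(y, y_hat):
    """1 if the whole sequence is wrong anywhere, else 0."""
    return int(list(y) != list(y_hat))

gold = ["N", "V", "D", "N"]  # hypothetical gold sequence
y_hat_1 = ["A", "A", "A", "A"]
y_hat_2 = ["N", "V", "N", "N"]
print(zero_one_loss(gold, y_hat_1), zero_one_loss(gold, y_hat_2))  # 1 1
print(hamming_loss(gold, y_hat_1), hamming_loss(gold, y_hat_2))    # 4 1
```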
Loss functions for structured prediction
• Recall learning as optimization for classification
  • e.g.,
• Let's define a structure-aware optimization objective
  • e.g.,
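One common instance of each objective, sketched under the assumption of an L2 regularizer (λ and C are regularization constants; the exact form used in the lecture may differ): a regularized hinge loss for binary classification, and its structured counterpart, which swaps in a structured loss ℓ^(s-h).

```latex
% binary classification: L2-regularized hinge loss
\min_{\mathbf{w}} \; \frac{\lambda}{2}\lVert\mathbf{w}\rVert^2 + \sum_{n} \max\big(0,\; 1 - y_n\,\mathbf{w}\cdot\mathbf{x}_n\big)
% structured prediction: same shape, with a structured loss per example
\min_{\mathbf{w}} \; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{n} \ell^{(\mathrm{s\text{-}h})}\big(\mathbf{y}_n, \mathbf{x}_n, \mathbf{w}\big)
```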
Structured hinge loss
• 0 if the true output beats the score of every imposter output
• Otherwise: scales linearly as a function of the score difference between the most confusing imposter and the true output
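A brute-force sketch of the structured hinge loss with Hamming loss as the margin: ℓ^(s-h)(y, x, w) = max_ŷ [w·φ(x, ŷ) + ℓ^(Ham)(y, ŷ)] − w·φ(x, y). Because ŷ = y is included in the max, the loss is never negative, and it is 0 exactly when the true output beats every imposter by at least that imposter's Hamming loss. Enumerating all K^L outputs is only for illustration; the weights and feature keys are hypothetical.

```python
from itertools import product

def score(w, x, y):
    """w . phi(x, y) with unary + Markov features (no start-transition feature)."""
    s = sum(w.get(("unary", word, tag), 0.0) for word, tag in zip(x, y))
    s += sum(w.get(("markov", p, c), 0.0) for p, c in zip(y, y[1:]))
    return s

def hamming(y, y_hat):
    return sum(a != b for a, b in zip(y, y_hat))

def structured_hinge(w, x, y, tags):
    """max over all K^L outputs of [score + Hamming loss], minus the gold score (brute force)."""
    best = max(score(w, x, list(yh)) + hamming(y, yh) for yh in product(tags, repeat=len(x)))
    return best - score(w, x, y)

tags = ["D", "N", "V"]
x, y = ["the", "dog", "barks"], ["D", "N", "V"]
w = {  # hypothetical weights, chosen by hand for illustration
    ("unary", "the", "D"): 2.0, ("unary", "dog", "N"): 1.5, ("unary", "dog", "V"): 1.0,
    ("unary", "barks", "V"): 2.0, ("unary", "barks", "N"): 0.5,
    ("markov", "D", "N"): 1.0, ("markov", "N", "V"): 1.0, ("markov", "D", "V"): -1.0,
}
print(structured_hinge(w, x, y, tags))   # 0.0: gold wins by a large enough margin
print(structured_hinge({}, x, y, tags))  # 3.0: all scores tie, worst imposter gets all 3 tags wrong
```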
Optimization: stochastic subgradient descent
• Subgradients of the structured hinge loss?
Optimization: stochastic subgradient descent
• Subgradients of the structured hinge loss
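The structured hinge loss is a max of functions that are affine in w, so the gradient of the maximizing piece is a subgradient. A sketch, assuming the objective ½‖w‖² + C Σ_{n=1}^{N} ℓ^(s-h) with the regularizer split evenly over the N examples and step size η:

```latex
% loss-augmented argmax for example n
\hat{\mathbf{y}}_n = \arg\max_{\hat{\mathbf{y}}\in\mathcal{Y}(\mathbf{x}_n)} \Big[\mathbf{w}\cdot\phi(\mathbf{x}_n,\hat{\mathbf{y}}) + \ell^{(\mathrm{Ham})}(\mathbf{y}_n,\hat{\mathbf{y}})\Big]
% a subgradient of the structured hinge loss (zero when \hat{y}_n = y_n)
\phi(\mathbf{x}_n,\hat{\mathbf{y}}_n) - \phi(\mathbf{x}_n,\mathbf{y}_n) \;\in\; \partial_{\mathbf{w}}\, \ell^{(\mathrm{s\text{-}h})}(\mathbf{y}_n,\mathbf{x}_n,\mathbf{w})
% one stochastic subgradient step on example n
\mathbf{w} \leftarrow \mathbf{w} - \eta\Big[\tfrac{1}{N}\mathbf{w} + C\big(\phi(\mathbf{x}_n,\hat{\mathbf{y}}_n) - \phi(\mathbf{x}_n,\mathbf{y}_n)\big)\Big]
```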
Optimization: stochastic subgradient descent
Resulting training algorithm
Only 2 differences compared to the structured perceptron!
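A minimal training-loop sketch. In this sketch, the two changes relative to a structured perceptron are (1) using the loss-augmented argmax instead of the plain argmax and (2) shrinking w each step because of the L2 regularizer; this is our reading of the two differences, and the hyperparameters `eta`, `C`, `epochs`, the toy data, and the brute-force argmax (standing in for loss-augmented Viterbi) are all assumptions for illustration.

```python
import random
from collections import Counter
from itertools import product

def features(x, y):
    """Unary + Markov feature counts phi(x, y)."""
    phi = Counter()
    for word, tag in zip(x, y):
        phi[("unary", word, tag)] += 1
    for prev, cur in zip(y, y[1:]):
        phi[("markov", prev, cur)] += 1
    return phi

def dot(w, phi):
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def candidates(x, tags):
    return (list(yh) for yh in product(tags, repeat=len(x)))

def predict(w, x, tags):
    """Plain argmax (what the structured perceptron uses); brute force for brevity."""
    return max(candidates(x, tags), key=lambda yh: dot(w, features(x, yh)))

def loss_augmented_argmax(w, x, y, tags):
    """argmax of w . phi(x, y_hat) + Hamming(y, y_hat); brute force stands in for Viterbi."""
    return max(candidates(x, tags),
               key=lambda yh: dot(w, features(x, yh)) + sum(a != b for a, b in zip(y, yh)))

def train(data, tags, epochs=10, eta=0.1, C=1.0, seed=0):
    """Stochastic subgradient descent on 1/2 ||w||^2 + C * sum_n structured hinge loss."""
    rng = random.Random(seed)
    w, N = {}, len(data)
    for _ in range(epochs):
        for n in rng.sample(range(N), N):                 # visit examples in random order
            x, y = data[n]
            y_hat = loss_augmented_argmax(w, x, y, tags)  # change 1: loss-augmented argmax
            for f in w:                                   # change 2: shrink w (regularizer)
                w[f] *= 1.0 - eta / N
            if y_hat != y:                                # subgradient is zero when y_hat == y
                for f, v in features(x, y).items():
                    w[f] = w.get(f, 0.0) + eta * C * v
                for f, v in features(x, y_hat).items():
                    w[f] = w.get(f, 0.0) - eta * C * v
    return w

tags = ["D", "N", "V"]
data = [(["the", "dog", "barks"], ["D", "N", "V"]),  # hypothetical toy training data
        (["the", "cat", "sleeps"], ["D", "N", "V"])]
w = train(data, tags)
print([predict(w, x, tags) for x, _ in data])  # [['D', 'N', 'V'], ['D', 'N', 'V']]
```

Replacing `loss_augmented_argmax` with `predict` and removing the shrinkage loop would turn this back into a structured perceptron with step size `eta * C`.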
Loss-augmented inference/search
Recall the dynamic programming solution without Hamming loss
Loss-augmented inference/search
Dynamic programming with Hamming loss
We can use the Viterbi algorithm as before, as long as the loss function decomposes over the input consistently with the features!
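Because Hamming loss decomposes per position, loss-augmented inference only changes the edge weights: every edge entering tag k at position l gets +1 when k ≠ y_l. A minimal sketch under the same assumptions as a plain Viterbi (dict weights keyed by our own `("unary", …)` / `("markov", …)` naming, 0-indexed positions, no start-transition feature, hand-picked hypothetical weights), checked against brute-force enumeration.

```python
from itertools import product

def edge_weight(w, x, l, prev, cur):
    """omega_{l,prev,cur}: unary feature at l plus Markov feature ending at l."""
    s = w.get(("unary", x[l], cur), 0.0)
    if prev is not None:
        s += w.get(("markov", prev, cur), 0.0)
    return s

def loss_augmented_viterbi(w, x, y, tags):
    """argmax over y_hat of w . phi(x, y_hat) + Hamming(y, y_hat), in O(L K^2) time."""
    def aug(l, prev, cur):  # Hamming loss folded into each edge weight
        return edge_weight(w, x, l, prev, cur) + (1.0 if cur != y[l] else 0.0)

    alpha = [{k: aug(0, None, k) for k in tags}]
    back = [{}]
    for l in range(1, len(x)):
        alpha.append({})
        back.append({})
        for k in tags:
            best = max(tags, key=lambda kp: alpha[l - 1][kp] + aug(l, kp, k))
            alpha[l][k] = alpha[l - 1][best] + aug(l, best, k)
            back[l][k] = best
    last = max(tags, key=lambda k: alpha[-1][k])
    path = [last]
    for l in range(len(x) - 1, 0, -1):
        path.append(back[l][path[-1]])
    return alpha[-1][last], path[::-1]

def brute_force(w, x, y, tags):
    """Same objective by enumerating all K^L outputs (for checking only)."""
    return max(sum(edge_weight(w, x, l, yh[l - 1] if l else None, yh[l]) for l in range(len(x)))
               + sum(a != b for a, b in zip(y, yh))
               for yh in product(tags, repeat=len(x)))

tags = ["D", "N", "V"]
x, y = ["the", "dog", "barks"], ["D", "N", "V"]
w = {  # hypothetical weights, chosen by hand for illustration
    ("unary", "the", "D"): 2.0, ("unary", "dog", "N"): 1.5, ("unary", "dog", "V"): 1.0,
    ("unary", "barks", "V"): 2.0, ("unary", "barks", "N"): 0.5,
    ("markov", "D", "N"): 1.0, ("markov", "N", "V"): 1.0, ("markov", "D", "V"): -1.0,
}
print(loss_augmented_viterbi(w, x, y, tags), brute_force(w, x, y, tags))    # (7.5, ['D', 'N', 'V']) 7.5
print(loss_augmented_viterbi({}, x, y, tags)[0], brute_force({}, x, y, tags))  # 3.0 3
```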
Syntax & Grammars: From Sequences to Trees
Syntax & Grammar
• Syntax
  • From Greek syntaxis, meaning "setting out together"
  • Refers to the way words are arranged together
• Grammar
  • Set of structural rules governing the composition of clauses, phrases, and words in any given natural language
  • Descriptive, not prescriptive
  • Panini's grammar of Sanskrit ~2000 years ago
Syntax and Grammar
• Goal of syntactic theory
  • "explain how people combine words to form sentences and how children attain knowledge of sentence structure"
• Grammar
  • implicit knowledge of a native speaker
  • acquired without explicit instruction
  • minimally able to generate all and only the possible sentences of the language
[Philips, 2003]
Syntax in NLP
• Syntactic analysis is often a key component in applications
  • Grammar checkers
  • Dialogue systems
  • Question answering
  • Information extraction
  • Machine translation
  • …
Two views of syntactic structure
• Constituency (phrase structure)
  • Phrase structure organizes words in nested constituents
• Dependency structure
  • Shows which words depend on (modify or are arguments of) which other words
Constituency
• Basic idea: groups of words act as a single unit
• Constituents form coherent classes that behave similarly
  • With respect to their internal structure: e.g., at the core of a noun phrase is a noun
  • With respect to other constituents: e.g., noun phrases generally occur before verbs
Constituency: Example
• The following are all noun phrases in English...
• Why?
  • They can all precede verbs
  • They can all be preposed/postposed
  • …
Grammars and Constituency
• For a particular language:
  • What is the "right" set of constituents?
  • What rules govern how they combine?
• Answer: not obvious and difficult
  • That's why there are many different theories of grammar and competing analyses of the same data!
• Our approach
  • Focus primarily on the "machinery"
Context-Free Grammars
• Context-free grammars (CFGs)
  • Aka phrase structure grammars
  • Aka Backus-Naur form (BNF)
• Consist of
  • Rules
  • Terminals
  • Non-terminals
Context-Free Grammars
• Terminals
  • We'll take these to be words
• Non-terminals
  • The constituents in a language (e.g., noun phrase)
• Rules
  • Consist of a single non-terminal on the left and any number of terminals and non-terminals on the right
An Example Grammar
Parse Tree: Example
Note: equivalence between parse trees and bracket notation
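A minimal sketch of a CFG and of the tree/bracket-notation equivalence. The toy grammar `RULES`, the example tree, and all function names are hypothetical, written only for illustration (the PP is attached to the VP here, one of two possible attachments). A tree is a nested tuple `(label, child, ...)` with words as string leaves; `is_licensed` checks that each node matches some rule, and `to_brackets` / `from_brackets` convert losslessly in both directions.

```python
# Hypothetical toy grammar: each non-terminal maps to its allowed right-hand sides.
RULES = {
    "S":   [("NP", "VP")],
    "NP":  [("Det", "N"), ("Pro",), ("NP", "PP")],
    "VP":  [("V", "NP"), ("VP", "PP")],
    "PP":  [("P", "NP")],
    "Det": [("the",)], "N": [("letter",), ("shelf",)],
    "Pro": [("they",)], "V": [("hid",)], "P": [("on",)],
}

def is_licensed(tree):
    """True if every node (label, child1, ...) matches some rule; leaves are words (str)."""
    if isinstance(tree, str):
        return True
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rhs in RULES.get(label, []) and all(is_licensed(c) for c in children)

def to_brackets(tree):
    """Tree -> bracket notation, e.g. (NP (Det the) (N shelf))."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "(" + label + " " + " ".join(to_brackets(c) for c in children) + ")"

def from_brackets(s):
    """Bracket notation -> tree (inverse of to_brackets)."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def parse():
        nonlocal pos
        if tokens[pos] == "(":
            label = tokens[pos + 1]
            pos += 2
            children = []
            while tokens[pos] != ")":
                children.append(parse())
            pos += 1  # consume ")"
            return (label, *children)
        word = tokens[pos]
        pos += 1
        return word
    return parse()

tree = ("S", ("NP", ("Pro", "they")),
        ("VP", ("VP", ("V", "hid"), ("NP", ("Det", "the"), ("N", "letter"))),
               ("PP", ("P", "on"), ("NP", ("Det", "the"), ("N", "shelf")))))
print(is_licensed(tree))
print(to_brackets(tree))
print(from_brackets(to_brackets(tree)) == tree)  # True: the two notations are equivalent
```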
Dependency Grammars
• CFGs focus on constituents
  • Non-terminals don't actually appear in the sentence
• In dependency grammar, a parse is a graph (usually a tree) where:
  • Nodes represent words
  • Edges represent dependency relations between words (typed or untyped, directed or undirected)
Dependency Grammars
• Syntactic structure = lexical items linked by binary asymmetrical relations called dependencies
Dependency Relations
Example Dependency Parse
They hid the letter on the shelf
Compare with constituent parse… What's the relation?
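A minimal sketch of a dependency parse stored as a head array, plus a check that it forms a tree. The analysis of "They hid the letter on the shelf" below is hypothetical: it uses UD-style labels and attaches "on the shelf" to the verb, though attaching it to "letter" is the other possible reading. Heads are 1-based indices, with 0 standing for ROOT; `is_tree` and `arcs` are our own helper names.

```python
words = ["They", "hid", "the", "letter", "on", "the", "shelf"]
# Hypothetical UD-style analysis (PP attached to the verb); heads are 1-based, 0 = ROOT.
heads = [2, 0, 4, 2, 7, 7, 2]
labels = ["nsubj", "root", "det", "obj", "case", "det", "obl"]

def is_tree(heads):
    """Exactly one word attaches to ROOT and following heads from any word reaches ROOT without a cycle."""
    n = len(heads)
    if any(h < 0 or h > n for h in heads) or heads.count(0) != 1:
        return False
    for i in range(1, n + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = heads[node - 1]
    return True

def arcs(words, heads, labels):
    """Typed, directed edges written as label(head, dependent)."""
    return [f"{lab}({'ROOT' if h == 0 else words[h - 1]}, {w})"
            for w, h, lab in zip(words, heads, labels)]

print(is_tree(heads))  # True
for arc in arcs(words, heads, labels):
    print(arc)  # nsubj(hid, They), root(ROOT, hid), det(letter, the), ...
```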
Universal Dependencies project
• Set of dependency relations that are
  • Linguistically motivated
  • Computationally useful
  • Cross-linguistically applicable
  • [Nivre et al. 2016]
• Universaldependencies.org
Summary
• Syntax & Grammar
• Two views of syntactic structures
  • Context-Free Grammars
  • Dependency grammars
  • Can be used to capture various facts about the structure of language (but not all!)
• Treebanks as an important resource for NLP