Kernels for Structured Output SPFLODD November 10, 2011



Plan for Today

1.  Tree kernels (Collins and Duffy, 2002)
2.  Why (input, output) and output kernels aren’t really available
3.  Reranking
4.  Kernelizing CRFs
5.  Rational kernels
6.  Kernel dependency estimation

Kernels on Structures

•  Last time, William talked about kernels on factorial objects (tree paths), and also about string kernels.
–  I did not mention it in September, but the M3N paper (generalizing SVMs to structured outputs) uses kernels as well – on inputs.
•  The idea generalizes nicely to trees.
•  Key assumption: learning and inference can be accomplished if we can efficiently calculate f(x)ᵀf(x′), where f is our implied feature space.

A Bit of History: “DOP”

•  Data-oriented parsing: a bad name for an interesting idea (Bod, 1998).
–  Every contiguous subtree is a feature.
–  Lots of papers on how to do this efficiently.
–  Most closely related to memory-based or instance-based learning (along the lines of KNN).
–  Goodman (1996) approximated it with a PCFG.
•  The part to remember: every tree fragment is a feature.
•  Related to tree substitution grammar.

All Tree Fragments Feature Vector

•  Every possible fragment corresponds to a dimension in the vector f(x).
•  fᵢ(x) = the number of times the ith fragment occurs in x.
•  f(x)ᵀf(x′) = the number of exactly matching fragment tokens in x and x′.

Tree Kernel (Collins and Duffy, 2002)

$$
\begin{aligned}
f(x)^\top f(x') &= \sum_i f_i(x)\, f_i(x') \\
&= \sum_i \Big(\sum_{n \in x} I_i(n)\Big) \Big(\sum_{n' \in x'} I_i(n')\Big),
\quad \text{where } I_i(n) = [\text{$i$th fragment matches at } n] \\
&= \sum_{n \in x} \sum_{n' \in x'} \underbrace{\sum_i I_i(n)\, I_i(n')}_{\Delta(n, n')}
\end{aligned}
$$

$$
\Delta(n, n') =
\begin{cases}
0 & \text{if the productions at } n \text{ and } n' \text{ differ} \\
1 & \text{if } n \text{ and } n' \text{ are preterminals} \\
\displaystyle\prod_{j=1}^{\#\text{kids}(n)} \bigl(1 + \Delta(\text{$j$th child of } n,\ \text{$j$th child of } n')\bigr) & \text{otherwise}
\end{cases}
$$
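The Δ recursion above can be sketched directly in Python. This is a minimal illustration, not the original implementation: the `Node` class, the production-string representation, and the lack of memoization are all assumptions made for brevity.

```python
# Minimal sketch of the Collins-Duffy tree kernel via the Delta recursion.
# The Node class and production strings are illustrative assumptions.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def production(self):
        # e.g. "NP -> DT NN"; a preterminal yields e.g. "NN -> dog"
        return self.label + " -> " + " ".join(c.label for c in self.children)

    def is_preterminal(self):
        # exactly one child, and that child is a leaf (a word)
        return len(self.children) == 1 and not self.children[0].children

def internal_nodes(tree):
    """Yield every node that has children (i.e., skip word leaves)."""
    yield tree
    for c in tree.children:
        if c.children:
            yield from internal_nodes(c)

def delta(n, np):
    """Number of matching fragments rooted at the node pair (n, n')."""
    if n.production() != np.production():
        return 0          # differing productions: no common fragment here
    if n.is_preterminal():
        return 1          # identical preterminal production: one fragment
    prod = 1
    for c, cp in zip(n.children, np.children):
        prod *= 1 + delta(c, cp)
    return prod

def tree_kernel(x, xp):
    # f(x)^T f(x') = sum over node pairs of delta(n, n')
    return sum(delta(n, np)
               for n in internal_nodes(x)
               for np in internal_nodes(xp))
```

For example, the tree (NP (DT the) (NN dog)) has six fragments (two lexicalized preterminals, and the NP production with 0, 1, or 2 of its children expanded), so the kernel of that tree with itself is 6.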

Notes

•  O(|x||x′|) runtime (in the number of nodes in each tree).
–  Collins and Duffy claim it’s closer to linear in practice.
•  Labeled sequences are a kind of tree.
•  You can use word similarity functions instead of 0/1 for matching words.
•  Collins and Duffy used the Collins parser (model 2) to:
–  provide a likelihood to use alongside the kernel as a feature
–  provide “multiple hypotheses” for use in the voted perceptron algorithm
•  Parsing gains on the WSJ Penn Treebank task.

“Multiple Hypotheses”?

•  The structured perceptron as we learned it (and also CRF, SSVM, etc.) assumes we reason about the entire set of possible outputs y for each input x.
–  Decoding, summing, cost-augmented decoding.
•  Here, a reranking approach is assumed.
–  Use some other model to provide candidates.
–  The discriminative, kernelized model (here, a perceptron in the dual) only gets to rerank the candidates.
–  Charniak and Johnson (2005) ran with the reranking idea but went back to log-linear models, and by engineering good features did quite well.
•  Reranking: a popular idea in the early 2000s, regardless of whether you use kernels.
•  Understudied challenge: diversity of the n-best list.
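The reranking setup can be sketched as a perceptron in the dual: training stores (sign, example) support pairs instead of a weight vector, and scoring only ever touches the kernel. The kernel, candidate lists, and update rule below are simplified assumptions for illustration, not the exact Collins-Duffy procedure (which used the voted perceptron).

```python
# Sketch of reranking with a dual (kernelized) perceptron.
# `kernel` and the candidate representation are illustrative assumptions.

def rerank(candidates, support, kernel):
    """Pick the candidate with the highest dual score:
    score(c) = sum over support pairs of alpha * K(s, c)."""
    def score(c):
        return sum(alpha * kernel(s, c) for alpha, s in support)
    return max(candidates, key=score)

def train_dual_perceptron(data, kernel, epochs=1):
    """data: list of (candidates, gold_index) pairs.
    Returns the support set of (alpha, candidate) pairs."""
    support = []
    for _ in range(epochs):
        for candidates, gold in data:
            pred = rerank(candidates, support, kernel) if support else candidates[0]
            if pred != candidates[gold]:
                support.append((+1.0, candidates[gold]))  # promote the gold candidate
                support.append((-1.0, pred))              # demote the mistaken prediction
    return support
```

With a set-overlap kernel and two toy candidates, one mistake-driven update is enough to make the gold candidate win the rerank.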

Grumpy Aside: Know Thy Kernel

•  Kernel = set of features.
•  You’re pretty much always using a kernel.
•  Empirically it seems that:
–  knowing your problem and designing good features to add to your “kernel” is a win
–  trying all the different kernels implemented in SVMlight (without understanding the differences) may help a little, but nobody cares.
•  For language, anything beyond a linear kernel usually needs some justification.

Kernels and Decoding

•  Ideally, we would like kernels on entire inputs and outputs, as in Collins and Duffy, but learned directly from the data, not in a secondary reranking stage.
•  Why won’t this work?

$$
\operatorname{decode}(x) = \arg\max_{y}\ \mathbf{w}^\top f(x, y)
= \arg\max_{y} \sum_{i=1}^{N} \sum_{y' \in \mathcal{Y}(x_i)} \alpha_{i, y'}\, K\bigl((x_i, y'), (x, y)\bigr)
$$

Kernels on Outputs

•  In practice, apart from reranking, this is not done yet.
•  There are a few interesting papers that explore various possibilities, and I want to discuss some of them.
–  Kernel CRFs
–  Rational kernels
–  Kernel dependency estimation

Kernel CRFs (Lafferty et al., 2004)

•  Don’t try for an arbitrary K((x, y), (x′, y′)).
•  Instead, define your structure y as an assignment of values to variables Y in a Markov network.
•  Kernels are now on cliques: K((x, y_c), (x′, y′_{c′})).
–  Any two clique assignments in any two graphs.
•  Representer theorem: in the model that minimizes the regularized log-loss:

$$
\operatorname{score}(x, y) = \sum_{i=1}^{N}\ \sum_{c \in \operatorname{cliques}(\operatorname{graph}(x_i))}\ \sum_{y'_c \in \mathcal{Y}_c} \alpha_{i, c, y'_c}\, K_c\bigl((x_i, y'_c), (x, y_c)\bigr)
$$

Learning Algorithm

•  Too many cliques!
•  Greedy forward selection (much like older feature selection algorithms, e.g., Della Pietra et al., 1997).
•  The basic idea is to iterate:
–  For every labeled clique in the training data, calculate the first derivative of the objective (the regularized log-likelihood) w.r.t. the clique.
•  This is done approximately, for efficiency.
–  Add the clique with the largest gradient to the active set.
–  Optimize the likelihood for the current active set of cliques; this is done in the dual.
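The iteration above has the shape of classic greedy forward selection. Here is a generic sketch of that loop; the `gradient` and `optimize` callbacks are placeholders for the approximate gradient computation and dual optimization that Lafferty et al. actually use.

```python
# Generic greedy forward selection, in the spirit of the loop above.
# `gradient` and `optimize` are placeholder callbacks, not the real
# kernel CRF computations.

def greedy_selection(candidates, gradient, optimize, rounds=5):
    """candidates: candidate basis functions (cliques);
    gradient(f, active): derivative of the objective w.r.t. f's weight;
    optimize(active): refit weights for the current active set."""
    active = []
    for _ in range(rounds):
        pool = [f for f in candidates if f not in active]
        if not pool:
            break
        # add the candidate whose gradient is largest in magnitude
        best = max(pool, key=lambda f: abs(gradient(f, active)))
        active.append(best)
        optimize(active)   # e.g., dual optimization over the active set
    return active
```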

But…

•  This technique is not widely used.
•  In NLP, most reported results stick with linear kernels; lots of results include some “feature engineering.”
–  Some researchers see “feature engineering” as good, honest work.
–  Others see it as a distraction from “general” methods.
–  What do you think?

Rational Kernels (Cortes et al., 2004)

•  Under some conditions, you can use WFSTs to define a kernel between strings.
–  Or between sets of strings represented as FSAs.
•  The kernel function is defined by doing weighted composition x ∘ T ∘ y, and then taking the semiring path sum.
–  Edit distance uses min-plus.
–  String kernels use plus-times.
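As a concrete plus-times instance, consider the kernel you get when the transducer T counts shared substrings: K(x, y) = Σᵤ countₓ(u) · count_y(u). The sketch below skips the WFST machinery entirely (no composition or path-sum code) and expands the feature map directly; it is an illustration of the value such a rational kernel computes, not of how one computes it with transducers.

```python
from collections import Counter

# The plus-times case by brute force: a kernel counting shared
# substrings, K(x, y) = sum_u count_x(u) * count_y(u). A rational
# kernel would obtain the same value as the (+, x) path sum of the
# composition x . T . y; here the feature map is expanded directly.

def substring_counts(s):
    """Multiset of all nonempty contiguous substrings of s."""
    return Counter(s[i:j]
                   for i in range(len(s))
                   for j in range(i + 1, len(s) + 1))

def substring_kernel(x, y):
    cx, cy = substring_counts(x), substring_counts(y)
    return sum(n * cy[u] for u, n in cx.items())
```

For example, "ab" and "ba" share the substrings "a" and "b" (but not "ab"), so the kernel value is 2.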

PDS Kernels

•  Not all kernels are positive definite and symmetric.
–  Those are necessary conditions for learning algorithms to “work” with a kernel.
•  Cortes et al. define some formal properties (closure under various operations).
•  They characterize some existing kernels as PDS.
•  Experiments are included, but not for structured outputs.

Kernel Dependency Estimation

PCA and Kernel PCA

•  Principal component analysis (Pearson, 1901): transform multi-dimensional data into uncorrelated dimensions.
–  Eigenvalue decomposition of the covariance matrix.
–  Singular value decomposition of the data matrix.
•  Kernel PCA (Schoelkopf et al., 1998): do it in an RKHS!
–  Only inner products are needed.
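That "only inner products are needed" claim can be made concrete: given just the Gram matrix K of pairwise kernel values, one can center in feature space, eigendecompose, and project, without ever forming explicit feature vectors. A minimal sketch (assuming NumPy; regularization and out-of-sample projection are omitted):

```python
import numpy as np

# Minimal kernel PCA sketch: everything is expressed through the
# n x n Gram matrix K of RKHS inner products; feature vectors are
# never materialized.

def kernel_pca(K, k):
    """K: n x n Gram matrix; returns the n x k matrix of projections
    of the training points onto the top-k principal axes in the RKHS."""
    n = K.shape[0]
    one = np.ones((n, n)) / n
    # center the (implicit) features: K <- (I - 1/n) K (I - 1/n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)                  # ascending order
    vals, vecs = vals[::-1][:k], vecs[:, ::-1][:, :k]
    # scale eigenvectors so the axes have unit norm in the RKHS
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return Kc @ alphas
```

With a linear kernel K = XXᵀ, the projections agree (up to sign) with ordinary PCA on centered data, which is a useful sanity check.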

Kernel Dependency Estimation (Weston et al., 2003)

For now, imagine just kernels on outputs, K(y, y′).

[Figure: inputs X and outputs Y, each mapped to a feature space. A kernel PCA map gives the principal axes of the “output feature space” in the RKHS; multivariate regression maps inputs onto those coordinates; and mapping a predicted point back to an actual output is the “pre-image” problem.]

Punchline

•  You should understand that kernels are a formalization of the notion of features.
•  Abstracting features into a kernel can open up the possibility of using some cool learning algorithms.
•  But you run the risk of getting too far from the data and the application.
•  Kernels on the output side create significant computational challenges that remain to be solved for practical use.