CVPR 2020 Tutorial - thoth.inrialpes.fr/~alahari/teaching/inclearn.pdf
TRANSCRIPT
Incremental Learning
Karteek Alahari, Inria, France
CVPR 2020 Tutorial: Towards Annotation-Efficient Learning
Incremental Learning?
• Continual learning
• Lifelong learning
• Sequential learning
• Never-ending learning
An Incremental Learning Scenario
• Growing up in India
An Incremental Learning Scenario
• And then during travels
An Incremental Learning Scenario
• And then during travels
But can still remember holy cows!
Slide credit: Hung-yi Lee
Standard Machine Learning
TRAIN – VALIDATION – TEST: all sampled from the same distribution
-> benchmarks and academic datasets -> real-world systems -> embodied learning
Slide credit: T. Tuytelaars
Incremental Learning Setup
- Task-incremental learning
- Class-incremental learning
- Domain-incremental learning
Slide credit: T. Tuytelaars
Incremental Learning
• A classical problem in machine learning, e.g., [Carpenter et al. '92, Cauwenberghs and Poggio '00, Polikar et al. '01, Schlimmer and Fisher '86, Thrun '96]
• Some methods:
  - Zero-shot learning, e.g., [Lampert et al. '13]: no training step for unseen classes
  - Continuously update the training set, e.g., [Chen et al. '13]: keep data and retrain
  - Use a fixed data representation, e.g., [Mensink et al. '13]: simplify the learning problem
Brute Force Solution (non-incremental)
[Figure: original model vs. joint training, the "golden" baseline; figures from [Li and Hoiem 2016]]
Brute Force Solution (non-incremental)
Retrain the full model with both old and new data:
• Computationally expensive
• Needs access to old data
• Storage capacity limitations
• Privacy issues
• Scalability issues
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Why not brute force?
• No access to all the data
• Cannot store all the data
• Access to only a previously learned model, e.g., trained by others
Naïve Solution 1
[Figure: original model vs. feature extraction; figures from [Li and Hoiem 2016]]
Naïve Solution 1
• Fine-tune only the last layer, using new data only
  - Leads to suboptimal results
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Naïve Solution 2
[Figure: original model vs. fine-tuning; figures from [Li and Hoiem 2016]]
Naïve Solution 2
• Fine-tune the network using new data only
  - Leads to catastrophic forgetting
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Incremental Learning: A Computer Vision Task
• Baseline: training with all the classes
• A: training with the initial set of classes
• B: training with the new set of classes
How well does network B perform?
[Table of accuracies from the slide: 12.8 / 68.4, 64.5 / 71.3, 38.7 / 69.8]
No guidance for retaining the old classes.
[Catastrophic forgetting: McCloskey and Cohen 1989, Ratcliff 1990]
Incremental Learning: The Rules!
• Learn one task after the other
• Without storing (many) data from previous tasks
• Without the memory footprint growing (significantly) over time
• Without (completely) forgetting old tasks
Task 1 -> Task 2 -> Task 3 -> … -> Task n
Slide credit: T. Tuytelaars
What else will we see today?
• Flavour of different approaches:
  1. Regularization based: LwF, EBLL, EWC, SI, MAS, IMM, …
  2. Rehearsal/Replay: iCaRL, DGR, GEM, …
  3. Architecture based: PackNet, progressive nets, HAT, …
• More than classification?
• Takeaways
Regularization-based Models
• When training a new task:
  - add a regularization term to the loss
  - i.e., a term to penalize catastrophic forgetting
• R1: data-focused methods
• R2: model/prior-focused methods
Slide credit: T. Tuytelaars
Data-focused Regularization: Learning without Forgetting [Li & Hoiem 2016]
• Knowledge distillation loss
  - i.e., preservation of responses
[Figure: feature-extraction and classification layers, with the new task input, the new task annotations, and the previous model's output for the old tasks]
Slide credit: T. Tuytelaars
Data-focused Regularization: Learning without Forgetting [Li & Hoiem 2016]
+ Simple method; good results for related tasks
- Poor results for unrelated tasks
~ Need to store the old model
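The distillation idea above can be sketched numerically. This is an illustrative numpy formulation, a minimal sketch rather than the LwF code: the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between the previous model's softened outputs
    (recorded before training the new task) and the new model's
    softened outputs on the same inputs."""
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    return -np.mean(np.sum(p_old * np.log(p_new + 1e-12), axis=-1))

def lwf_loss(new_task_ce, old_logits, new_logits, lam=1.0, T=2.0):
    """New-task cross-entropy plus the response-preservation penalty."""
    return new_task_ce + lam * distillation_loss(old_logits, new_logits, T)
```

The penalty is minimized when the new model reproduces the old model's responses exactly, which is what "preservation of responses" means in practice.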
Model-focused Regularization
• Penalize changes to 'important' parameters:
  L(θ) = L_new(θ) + λ Σ_i Ω_i (θ_i − θ_i*)²
  i.e., the loss on the new task(s) plus a regularization term anchoring important weights to their old values θ*
• Different variants possible for the "importance" Ω_i and the regularization
Model-focused Regularization
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Individual penalty for each previous task
  - Fisher information matrix for the importance weights
Figure from the paper
Model-focused Regularization
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Individual penalty for each previous task
  - Fisher information matrix for the importance weights
+ Agnostic to architecture; good results empirically
- Only valid locally
~ Need to store importance weights
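The EWC penalty can be sketched with flat parameter vectors. The diagonal Fisher estimate and all names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def fisher_diag(grads):
    """Diagonal Fisher estimate: mean squared log-likelihood gradient,
    averaged over samples drawn from the old task."""
    g = np.asarray(grads)               # shape (num_samples, num_params)
    return (g ** 2).mean(axis=0)

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """lambda/2 * sum_i F_i * (theta_i - theta*_i)^2: parameters with
    large Fisher information are anchored to their old values."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

Weights with zero Fisher information are free to move, so the new task can use them; weights that mattered for the old task are held in place.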
Model-focused Regularization
• Memory aware synapses [Aljundi et al., 2018]
  - Considers only the previous task
  - Change in gradients of the output for the importance weights
Figure from the paper
Model-focused Regularization
• Memory aware synapses [Aljundi et al., 2018]
  - Considers only the previous task
  - Change in gradients of the output for the importance weights
+ Agnostic to architecture; leverages data & output
- Only valid locally
~ Need to store importance weights
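The MAS importance can be sketched the same way: a parameter is important if perturbing it changes the (unlabeled) network output a lot. The names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def mas_importance(output_grads):
    """Omega_i: mean absolute gradient of the squared output norm
    w.r.t. each parameter, averaged over inputs; no labels needed."""
    g = np.asarray(output_grads)        # shape (num_samples, num_params)
    return np.abs(g).mean(axis=0)

def mas_penalty(theta, theta_star, omega, lam=1.0):
    """Same quadratic anchor as EWC, with Omega in place of Fisher."""
    return lam * np.sum(omega * (theta - theta_star) ** 2)
```

Because the importance uses the output norm rather than a labeled loss, it can be estimated from unlabeled data, which is what "leverages data & output" refers to.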
Model-focused Regularization
• Two examples
  - Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Memory aware synapses [Aljundi et al., 2018]
• Other alternatives
  - Path Integral / Synaptic Intelligence: large changes during training [Zenke et al., 2017]
  - Moment matching [Lee et al., 2017]
  - PathNet [Fernando et al., 2017]
  - …
Rehearsal/Replay-based Methods
• Store a couple of examples from previous tasks
• Or produce samples from a generative model
• But:
  - How many?
  - How to select them?
  - How to use them?
iCaRL: Incremental Classifier and Representation Learning [Rebuffi et al. 2017]
• Selects samples that are closest to the feature mean of each class
• Knowledge distillation loss [Hinton et al. '14]
• Clever use of available memory (see the following)
iCaRL: Incremental Classifier and Representation Learning [Rebuffi et al. 2017]
• Split the problem into:
  - learning features, and then
  - using an NCM (nearest class mean) classifier
• Training loss: classification loss + distillation loss (comparing old vs. new responses)
+ Clever use of available memory
- Potential issues with storing data, e.g., privacy
- Limited by the memory capacity (the more the better)
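Two iCaRL ingredients, herding-style exemplar selection and the nearest-class-mean classifier, can be sketched on raw feature vectors. The learned CNN and iCaRL's exact memory bookkeeping are omitted, and all names are illustrative:

```python
import numpy as np

def select_exemplars(features, m):
    """Greedily pick m rows whose running mean best tracks the class
    mean, i.e. samples closest (in aggregate) to the feature mean."""
    mu = features.mean(axis=0)
    chosen, acc = [], np.zeros_like(mu)
    for k in range(1, m + 1):
        # distance of each candidate running mean to the true class mean
        d = np.linalg.norm(mu - (acc + features) / k, axis=1)
        d[chosen] = np.inf                 # never pick a sample twice
        i = int(np.argmin(d))
        chosen.append(i)
        acc = acc + features[i]
    return chosen

def ncm_predict(x, class_means):
    """Assign x to the class with the nearest mean feature."""
    d = np.linalg.norm(class_means - x, axis=1)
    return int(np.argmin(d))
```

Because the NCM classifier only needs class means, it adapts as the stored exemplars (and hence the means) are updated after each incremental step.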
Deep Generative Replay [Shin et al. 2017]
• The model, "Scholar", is composed of: a generator + a solver (classifier)
• The generator and the solver are updated in every incremental step
Figure from the paper
Deep Generative Replay [Shin et al. 2017]
Training procedure:
• At task t, we train a new Scholar
  - with data from task t, and
  - data generated by the previously trained Scholar at task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
Training procedure (Generator):
• with data from task t, and
• data generated by the previously trained Scholar for task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
Training procedure (Solver):
• with data from task t (target = label), and
• data from the generator and solver of the previously trained Scholar for task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
+ Avoids memory issues
- Accumulation of errors
- No control over the class of the generated samples
Figure from the paper. Slide courtesy: A. Massenet
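The Scholar loop can be sketched at toy scale. Here a per-class Gaussian stands in for the GAN generator and a nearest-mean rule for the solver, so every name and model choice below is an illustrative assumption rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class Scholar:
    """Toy 'generator + solver' pair: per-class Gaussian + nearest mean."""
    def __init__(self):
        self.stats = {}                       # class -> (mean, std)

    def fit(self, X, y):
        for c in np.unique(y):
            Xc = X[y == c]
            self.stats[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)

    def sample(self, n_per_class):
        """'Generator': replay synthetic data, labeled by class."""
        Xs, ys = [], []
        for c, (mu, sd) in self.stats.items():
            Xs.append(rng.normal(mu, sd, size=(n_per_class, mu.size)))
            ys.append(np.full(n_per_class, c))
        return np.vstack(Xs), np.concatenate(ys)

    def predict(self, x):
        cs = sorted(self.stats)
        d = [np.linalg.norm(x - self.stats[c][0]) for c in cs]
        return cs[int(np.argmin(d))]

def train_incrementally(old, X_new, y_new, n_replay=50):
    """The new Scholar sees new-task data plus replayed old-task data."""
    new = Scholar()
    if old is not None:
        Xr, yr = old.sample(n_replay)
        X_new, y_new = np.vstack([X_new, Xr]), np.concatenate([y_new, yr])
    new.fit(X_new, y_new)
    return new
```

The "accumulation of errors" drawback is visible in this sketch: each replay round refits the old-class statistics from synthetic samples, so generator imperfections compound over tasks.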
Architecture-based Methods
• PackNet [Mallya & Lazebnik '17] (figure from the paper)
+ Fixed memory consumption
+ Avoids forgetting
- Needs the total number of tasks
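PackNet's weight packing can be sketched on a flat weight array: after training each task, the largest-magnitude still-free weights are claimed for that task (the rest pruned) and frozen so later tasks cannot overwrite them. The function and the fraction below are illustrative assumptions:

```python
import numpy as np

def packnet_prune(weights, free_mask, keep_frac):
    """Among still-free weights, keep the top keep_frac by magnitude
    for the current task; return that task's mask."""
    free_idx = np.flatnonzero(free_mask)
    k = max(1, int(round(keep_frac * free_idx.size)))
    order = np.argsort(np.abs(weights[free_idx]))[::-1]   # largest first
    task_mask = np.zeros_like(free_mask)
    task_mask[free_idx[order[:k]]] = True
    return task_mask

# w = weights after training task 1 (toy values)
w = np.array([0.9, -0.1, 0.05, -0.8, 0.3, 0.02])
free = np.ones_like(w, dtype=bool)
m1 = packnet_prune(w, free, keep_frac=1/3)   # task 1 claims 2 weights
free &= ~m1                                   # those weights are now frozen
# later tasks may only train, and then claim, the remaining free weights
```

This is why PackNet avoids forgetting by construction (frozen weights never change) but needs the total number of tasks up front: the keep fraction per task depends on how many tasks must share the network.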
A Comparative Analysis [Lange et al., 2020]
• Tiny ImageNet: small, balanced, class-incremental
• iNaturalist: large-scale, unbalanced, task-incremental
• Fair way of setting hyperparameters (stability-plasticity tradeoff)
Comparative Evaluation (Tiny ImageNet)
[Plots comparing regularization & architecture-based methods, and rehearsal/replay-based methods; image credit: [Lange et al., 2020]]
General Trends
• Rehearsal/replay-based methods only pay off when storing a significant amount of exemplars
• PackNet results in no forgetting and produces top results
• MAS is more robust than EWC
Slide credit: T. Tuytelaars
What kind of model should I use?
• Larger models give more capacity (but: overfitting)
• Wide is better than deep
• Regularization may interfere with incremental learning
• Dropout is usually better than weight decay
Slide credit: T. Tuytelaars
Mitigate Catastrophic Forgetting
• Learning without Forgetting [Li and Hoiem 2016]
  - Tasks defined on a new dataset
  - Focus on image classification (rare co-occurrence of old and new)
• iCaRL [Rebuffi et al. 2017]
  - Decouple classifier and feature learning
  - Rely on a subset of the old data
Mitigate Catastrophic Forgetting
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Selectively slowing down learning on important weights
• Other attempts, e.g., [Aljundi et al., 2018, Jung et al., 2016, Mallya and Lazebnik, 2017, Risin et al., 2014, Rusu et al., 2016]
  - Limited to specific settings
  - Focus on image classification (rare co-occurrence of old and new)
Lack of methods for incremental learning of object detectors
An Approach
• Incremental Learning of Object Detectors without Catastrophic Forgetting [Shmelkov et al., 2017]
Our Approach [Shmelkov, Schmid, Alahari, ICCV 2017]
• Builds on knowledge distillation [Hinton et al. '14]
• Knowledge distillation was originally for:
  - learning a simpler network equivalent to a complex one, and
  - producing a differently structured network
• We distill the "old" network when learning new classes
Our Approach
• Copy of the network (A) to evaluate proposals & memorize outputs
• New component (B) to learn the new classes
• Learn with a combination of losses
[Fast RCNN: Girshick 2015]
Our Approach
• Minimize: a loss over the classes + a localization loss
• Plus distillation terms comparing the two network components (A and B): logit responses and localization outputs
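A hedged numpy sketch of such a combined objective follows; the exact weighting and proposal handling in [Shmelkov et al., 2017] are omitted, the names are illustrative, and only the smooth-L1 form follows Fast R-CNN:

```python
import numpy as np

def smooth_l1(x):
    """Fast R-CNN style localization loss, elementwise."""
    a = np.abs(x)
    return np.where(a < 1.0, 0.5 * x ** 2, a - 0.5)

def detection_distillation(old_logits, new_logits, old_boxes, new_boxes):
    """Mean squared difference of old-class logits and box outputs
    between the frozen copy (A) and the network being trained (B)."""
    return (np.mean((new_logits - old_logits) ** 2)
            + np.mean((new_boxes - old_boxes) ** 2))

def total_loss(cls_loss, loc_residuals, old_logits, new_logits,
               old_boxes, new_boxes, lam=1.0):
    """New-class detection loss plus the distillation term."""
    return (cls_loss + np.mean(smooth_l1(loc_residuals))
            + lam * detection_distillation(old_logits, new_logits,
                                           old_boxes, new_boxes))
```

The distillation term vanishes when B reproduces A's responses on the old classes, so minimizing the total loss trades off fitting the new classes against preserving the old detector's behaviour.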
Experimental Setup
• Datasets: PASCAL VOC 2007 and MS COCO
• A(n-m): initial training on classes from n to m (old)
• B(n-m): incremental training on classes n to m (new), added all at once
• B(n)(m): incremental training for class n, then m, added one after another
• Unbiased distillation: distillation proposals are selected completely randomly
Addition of 10 Classes
• VOC 2007 validation set
[Tables of mAP values from the slides: 63.2, 12.8; 63.1, 64.5; 31.6, 63.2]
Addition of 5 Classes
• VOC 2007 validation set
[Tables of mAP values from the slides: 66.0, 68.4; 66.0, 45.8]
Addition of 40 Classes
• COCO minival (first 5000 validation images)
Intermediate Summary
• No catastrophic forgetting with:
  - the distillation loss
  - in particular, biased distillation
• At ECCV 2018: balanced fine-tuning
Summary
• Flavour of different approaches:
  1. Regularization based: LwF, EBLL, EWC, SI, MAS, IMM, …
  2. Rehearsal/Replay: iCaRL, DGR, GEM, …
  3. Architecture based: PackNet, progressive nets, HAT, …
• More than classification?
• Takeaways
Looking to the Future
• Desiderata:
  - Constant memory
  - Task agnostic: some recent advances [Rao et al., NeurIPS '19]
  - Forgetting gracefully
  - Datasets
"I don't like datasets, it's more a problem than a solution" - heard at ICCV 2019
See also: Workshop on Continual Learning in Computer Vision, 14th June