CVPR 2020 Tutorial - thoth.inrialpes.fr/~alahari/teaching/inclearn.pdf
TRANSCRIPT
Incremental Learning
Karteek Alahari, Inria, France
CVPR 2020 Tutorial: Towards Annotation-Efficient Learning
Incremental Learning?
• Continual learning
• Lifelong learning
• Sequential learning
• Never-ending learning
An Incremental Learning Scenario
• Growing up in India
An Incremental Learning Scenario
• And then during travels
An Incremental Learning Scenario
• And then during travels
But can still remember holy cows!
Slide credit: Hung-yi Lee
Standard Machine Learning
TRAIN – VALIDATION – TEST: all sampled from the same distribution
-> benchmarks and academic datasets -> real-world systems -> embodied learning
Slide credit: T. Tuytelaars
Incremental Learning Setup
- Task-incremental learning
- Class-incremental learning
- Domain-incremental learning
Slide credit: T. Tuytelaars
Incremental Learning
• A classical problem in machine learning, e.g., [Carpenter et al. '92, Cauwenberghs and Poggio '00, Polikar et al. '01, Schlimmer and Fisher '86, Thrun '96]
• Some methods:
  - Zero-shot learning, e.g., [Lampert et al. '13]: no training step for unseen classes
  - Continuously update the training set, e.g., [Chen et al. '13]: keep data and retrain
  - Use a fixed data representation, e.g., [Mensink et al. '13]: simplify the learning problem
Brute Force Solution (non-incremental)
[Figure: original model vs. joint training, the "golden" baseline; figures from [Li and Hoiem 2016]]
Brute Force Solution (non-incremental)
Retrain the full model with both old and new data:
• Computationally expensive
• Needs access to old data
• Storage capacity limitations
• Privacy issues
• Scalability issues
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Why not brute force?
• No access to all the data
• Cannot store all the data
• Access to only a previously learned model, e.g., trained by others
Naïve Solution 1
[Figure: original model vs. feature extraction; figures from [Li and Hoiem 2016]]
Naïve Solution 1
• Fine-tune only the last layer, using new data only
  - Leads to suboptimal results
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Naïve Solution 2
[Figure: original model vs. fine-tuning; figures from [Li and Hoiem 2016]]
Naïve Solution 2
• Fine-tune the network using new data only
  - Leads to catastrophic forgetting
[Figure: network with feature-extraction layers followed by classification layers]
Slide credit: T. Tuytelaars
Incremental Learning: A Computer Vision Task
• Baseline: training with all the classes
• A: training with the initial set of classes
• B: training with the new set of classes
How well does network B perform?
[Table of accuracies from the slide: 12.8 / 68.4, 64.5 / 71.3, 38.7 / 69.8]
No guidance for retaining the old classes.
[Catastrophic forgetting: McCloskey and Cohen 1989, Ratcliff 1990]
Incremental Learning: The Rules!
• Learn one task after the other
• Without storing (many) data from previous tasks
• Without the memory footprint growing (significantly) over time
• Without (completely) forgetting old tasks
Task 1 -> Task 2 -> Task 3 -> … -> Task n
Slide credit: T. Tuytelaars
What else will we see today?
• Flavour of different approaches:
  1. Regularization based: LwF, EBLL, EWC, SI, MAS, IMM, …
  2. Rehearsal/Replay: iCaRL, DGR, GEM, …
  3. Architecture based: PackNet, progressive nets, HAT, …
• More than classification?
• Takeaways
Regularization-based Models
• When training a new task:
  - add a regularization term to the loss
  - i.e., a term to penalize catastrophic forgetting
• R1: data-focused methods
• R2: model/prior-focused methods
Slide credit: T. Tuytelaars
Data-focused Regularization: Learning without Forgetting [Li & Hoiem 2016]
• Knowledge distillation loss
  - i.e., preservation of responses
[Figure: feature-extraction and classification layers, with the new task input, the new task annotations, and the previous model's output for the old tasks]
Slide credit: T. Tuytelaars
Data-focused Regularization: Learning without Forgetting [Li & Hoiem 2016]
+ Simple method; good results for related tasks
- Poor results for unrelated tasks
~ Need to store the old model
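The distillation idea above can be sketched numerically. This is an illustrative numpy formulation, a minimal sketch rather than the LwF code: the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(old_logits, new_logits, T=2.0):
    """Cross-entropy between the previous model's softened outputs
    (recorded before training the new task) and the new model's
    softened outputs on the same inputs."""
    p_old = softmax(old_logits, T)
    p_new = softmax(new_logits, T)
    return -np.mean(np.sum(p_old * np.log(p_new + 1e-12), axis=-1))

def lwf_loss(new_task_ce, old_logits, new_logits, lam=1.0, T=2.0):
    """New-task cross-entropy plus the response-preservation penalty."""
    return new_task_ce + lam * distillation_loss(old_logits, new_logits, T)
```

The penalty is minimized when the new model reproduces the old model's responses exactly, which is what "preservation of responses" means in practice.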
Model-focused Regularization
• Penalize changes to 'important' parameters:
  L(θ) = L_new(θ) + λ Σ_i Ω_i (θ_i − θ_i*)²
  i.e., the loss on the new task(s) plus a regularization term anchoring important weights to their old values θ*
• Different variants possible for the "importance" Ω_i and the regularization
Model-focused Regularization
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Individual penalty for each previous task
  - Fisher information matrix for the importance weights
Figure from the paper
Model-focused Regularization
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Individual penalty for each previous task
  - Fisher information matrix for the importance weights
+ Agnostic to architecture; good results empirically
- Only valid locally
~ Need to store importance weights
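The EWC penalty can be sketched with flat parameter vectors. The diagonal Fisher estimate and all names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def fisher_diag(grads):
    """Diagonal Fisher estimate: mean squared log-likelihood gradient,
    averaged over samples drawn from the old task."""
    g = np.asarray(grads)               # shape (num_samples, num_params)
    return (g ** 2).mean(axis=0)

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """lambda/2 * sum_i F_i * (theta_i - theta*_i)^2: parameters with
    large Fisher information are anchored to their old values."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)
```

Weights with zero Fisher information are free to move, so the new task can use them; weights that mattered for the old task are held in place.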
Model-focused Regularization
• Memory aware synapses [Aljundi et al., 2018]
  - Considers only the previous task
  - Change in gradients of the output for the importance weights
Figure from the paper
Model-focused Regularization
• Memory aware synapses [Aljundi et al., 2018]
  - Considers only the previous task
  - Change in gradients of the output for the importance weights
+ Agnostic to architecture; leverages data & output
- Only valid locally
~ Need to store importance weights
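The MAS importance can be sketched the same way: a parameter is important if perturbing it changes the (unlabeled) network output a lot. The names below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def mas_importance(output_grads):
    """Omega_i: mean absolute gradient of the squared output norm
    w.r.t. each parameter, averaged over inputs; no labels needed."""
    g = np.asarray(output_grads)        # shape (num_samples, num_params)
    return np.abs(g).mean(axis=0)

def mas_penalty(theta, theta_star, omega, lam=1.0):
    """Same quadratic anchor as EWC, with Omega in place of Fisher."""
    return lam * np.sum(omega * (theta - theta_star) ** 2)
```

Because the importance uses the output norm rather than a labeled loss, it can be estimated from unlabeled data, which is what "leverages data & output" refers to.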
Model-focused Regularization
• Two examples
  - Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Memory aware synapses [Aljundi et al., 2018]
• Other alternatives
  - Path Integral / Synaptic Intelligence: large changes during training [Zenke et al., 2017]
  - Moment matching [Lee et al., 2017]
  - PathNet [Fernando et al., 2017]
  - …
Rehearsal/Replay-based Methods
• Store a couple of examples from previous tasks
• Or produce samples from a generative model
• But:
  - How many?
  - How to select them?
  - How to use them?
iCaRL: Incremental Classifier and Representation Learning [Rebuffi et al. 2017]
• Selects samples that are closest to the feature mean of each class
• Knowledge distillation loss [Hinton et al. '14]
• Clever use of available memory (see the following)
iCaRL: Incremental Classifier and Representation Learning [Rebuffi et al. 2017]
• Split the problem into:
  - learning features, and then
  - using an NCM (nearest class mean) classifier
• Training loss: classification loss + distillation loss (comparing old vs. new responses)
+ Clever use of available memory
- Potential issues with storing data, e.g., privacy
- Limited by the memory capacity (the more the better)
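Two iCaRL ingredients, herding-style exemplar selection and the nearest-class-mean classifier, can be sketched on raw feature vectors. The learned CNN and iCaRL's exact memory bookkeeping are omitted, and all names are illustrative:

```python
import numpy as np

def select_exemplars(features, m):
    """Greedily pick m rows whose running mean best tracks the class
    mean, i.e. samples closest (in aggregate) to the feature mean."""
    mu = features.mean(axis=0)
    chosen, acc = [], np.zeros_like(mu)
    for k in range(1, m + 1):
        # distance of each candidate running mean to the true class mean
        d = np.linalg.norm(mu - (acc + features) / k, axis=1)
        d[chosen] = np.inf                 # never pick a sample twice
        i = int(np.argmin(d))
        chosen.append(i)
        acc = acc + features[i]
    return chosen

def ncm_predict(x, class_means):
    """Assign x to the class with the nearest mean feature."""
    d = np.linalg.norm(class_means - x, axis=1)
    return int(np.argmin(d))
```

Because the NCM classifier only needs class means, it adapts as the stored exemplars (and hence the means) are updated after each incremental step.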
Deep Generative Replay [Shin et al. 2017]
• The model, "Scholar", is composed of: a generator + a solver (classifier)
• The generator and the solver are updated in every incremental step
Figure from the paper
Deep Generative Replay [Shin et al. 2017]
Training procedure:
• At task t, we train a new Scholar
  - with data from task t, and
  - data generated by the previously trained Scholar at task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
Training procedure (Generator):
• with data from task t, and
• data generated by the previously trained Scholar for task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
Training procedure (Solver):
• with data from task t (target = label), and
• data from the generator and solver of the previously trained Scholar for task t-1
Figure from the paper. Slide courtesy: A. Massenet
Deep Generative Replay [Shin et al. 2017]
+ Avoids memory issues
- Accumulation of errors
- No control over the class of the generated samples
Figure from the paper. Slide courtesy: A. Massenet
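The Scholar loop can be sketched at toy scale. Here a per-class Gaussian stands in for the GAN generator and a nearest-mean rule for the solver, so every name and model choice below is an illustrative assumption rather than the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class Scholar:
    """Toy 'generator + solver' pair: per-class Gaussian + nearest mean."""
    def __init__(self):
        self.stats = {}                       # class -> (mean, std)

    def fit(self, X, y):
        for c in np.unique(y):
            Xc = X[y == c]
            self.stats[int(c)] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)

    def sample(self, n_per_class):
        """'Generator': replay synthetic data, labeled by class."""
        Xs, ys = [], []
        for c, (mu, sd) in self.stats.items():
            Xs.append(rng.normal(mu, sd, size=(n_per_class, mu.size)))
            ys.append(np.full(n_per_class, c))
        return np.vstack(Xs), np.concatenate(ys)

    def predict(self, x):
        cs = sorted(self.stats)
        d = [np.linalg.norm(x - self.stats[c][0]) for c in cs]
        return cs[int(np.argmin(d))]

def train_incrementally(old, X_new, y_new, n_replay=50):
    """The new Scholar sees new-task data plus replayed old-task data."""
    new = Scholar()
    if old is not None:
        Xr, yr = old.sample(n_replay)
        X_new, y_new = np.vstack([X_new, Xr]), np.concatenate([y_new, yr])
    new.fit(X_new, y_new)
    return new
```

The "accumulation of errors" drawback is visible in this sketch: each replay round refits the old-class statistics from synthetic samples, so generator imperfections compound over tasks.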
Architecture-based Methods
• PackNet [Mallya & Lazebnik '17] (figure from the paper)
+ Fixed memory consumption
+ Avoids forgetting
- Needs the total number of tasks
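PackNet's weight packing can be sketched on a flat weight array: after training each task, the largest-magnitude still-free weights are claimed for that task (the rest pruned) and frozen so later tasks cannot overwrite them. The function and the fraction below are illustrative assumptions:

```python
import numpy as np

def packnet_prune(weights, free_mask, keep_frac):
    """Among still-free weights, keep the top keep_frac by magnitude
    for the current task; return that task's mask."""
    free_idx = np.flatnonzero(free_mask)
    k = max(1, int(round(keep_frac * free_idx.size)))
    order = np.argsort(np.abs(weights[free_idx]))[::-1]   # largest first
    task_mask = np.zeros_like(free_mask)
    task_mask[free_idx[order[:k]]] = True
    return task_mask

# w = weights after training task 1 (toy values)
w = np.array([0.9, -0.1, 0.05, -0.8, 0.3, 0.02])
free = np.ones_like(w, dtype=bool)
m1 = packnet_prune(w, free, keep_frac=1/3)   # task 1 claims 2 weights
free &= ~m1                                   # those weights are now frozen
# later tasks may only train, and then claim, the remaining free weights
```

This is why PackNet avoids forgetting by construction (frozen weights never change) but needs the total number of tasks up front: the keep fraction per task depends on how many tasks must share the network.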
A Comparative Analysis [Lange et al., 2020]
• Tiny ImageNet: small, balanced, class-incremental
• iNaturalist: large-scale, unbalanced, task-incremental
• Fair way of setting hyperparameters (stability-plasticity tradeoff)
Comparative Evaluation (Tiny ImageNet)
[Plots comparing regularization & architecture-based methods, and rehearsal/replay-based methods; image credit: [Lange et al., 2020]]
General Trends
• Rehearsal/replay-based methods only pay off when storing a significant amount of exemplars
• PackNet results in no forgetting and produces top results
• MAS is more robust than EWC
Slide credit: T. Tuytelaars
What kind of model should I use?
• Larger models give more capacity (but: overfitting)
• Wide is better than deep
• Regularization may interfere with incremental learning
• Dropout is usually better than weight decay
Slide credit: T. Tuytelaars
Mitigate Catastrophic Forgetting
• Learning without Forgetting [Li and Hoiem 2016]
  - Tasks defined on a new dataset
  - Focus on image classification (rare co-occurrence of old and new)
• iCaRL [Rebuffi et al. 2017]
  - Decouple classifier and feature learning
  - Rely on a subset of the old data
Mitigate Catastrophic Forgetting
• Elastic weight consolidation [Kirkpatrick et al., 2017]
  - Selectively slowing down learning on important weights
• Other attempts, e.g., [Aljundi et al., 2018, Jung et al., 2016, Mallya and Lazebnik, 2017, Risin et al., 2014, Rusu et al., 2016]
  - Limited to specific settings
  - Focus on image classification (rare co-occurrence of old and new)
Lack of methods for incremental learning of object detectors
An Approach
• Incremental Learning of Object Detectors without Catastrophic Forgetting [Shmelkov et al., 2017]
Our Approach [Shmelkov, Schmid, Alahari, ICCV 2017]
• Builds on knowledge distillation [Hinton et al. '14]
• Knowledge distillation was originally for:
  - learning a simpler network equivalent to a complex one, and
  - producing a differently structured network
• We distill the "old" network when learning new classes
Our Approach
• Copy of the network (A) to evaluate proposals & memorize outputs
• New component (B) to learn the new classes
• Learn with a combination of losses
[Fast RCNN: Girshick 2015]
Our Approach
• Minimize: a loss over the classes + a localization loss
• Plus distillation terms comparing the two network components (A and B): logit responses and localization outputs
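A hedged numpy sketch of such a combined objective follows; the exact weighting and proposal handling in [Shmelkov et al., 2017] are omitted, the names are illustrative, and only the smooth-L1 form follows Fast R-CNN:

```python
import numpy as np

def smooth_l1(x):
    """Fast R-CNN style localization loss, elementwise."""
    a = np.abs(x)
    return np.where(a < 1.0, 0.5 * x ** 2, a - 0.5)

def detection_distillation(old_logits, new_logits, old_boxes, new_boxes):
    """Mean squared difference of old-class logits and box outputs
    between the frozen copy (A) and the network being trained (B)."""
    return (np.mean((new_logits - old_logits) ** 2)
            + np.mean((new_boxes - old_boxes) ** 2))

def total_loss(cls_loss, loc_residuals, old_logits, new_logits,
               old_boxes, new_boxes, lam=1.0):
    """New-class detection loss plus the distillation term."""
    return (cls_loss + np.mean(smooth_l1(loc_residuals))
            + lam * detection_distillation(old_logits, new_logits,
                                           old_boxes, new_boxes))
```

The distillation term vanishes when B reproduces A's responses on the old classes, so minimizing the total loss trades off fitting the new classes against preserving the old detector's behaviour.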
Experimental Setup
• Datasets: PASCAL VOC 2007 and MS COCO
• A(n-m): initial training on classes from n to m (old)
• B(n-m): incremental training on classes n to m (new), added all at once
• B(n)(m): incremental training for class n, then m, added one after another
• Unbiased distillation: distillation proposals are selected completely randomly
Addition of 10 Classes
• VOC 2007 validation set
[Tables of mAP values from the slides: 63.2, 12.8; 63.1, 64.5; 31.6, 63.2]
Addition of 5 Classes
• VOC 2007 validation set
[Tables of mAP values from the slides: 66.0, 68.4; 66.0, 45.8]
Addition of 40 Classes
• COCO minival (first 5000 validation images)
Intermediate Summary
• No catastrophic forgetting with:
  - the distillation loss
  - in particular, biased distillation
• At ECCV 2018: balanced fine-tuning
Summary
• Flavour of different approaches:
  1. Regularization based: LwF, EBLL, EWC, SI, MAS, IMM, …
  2. Rehearsal/Replay: iCaRL, DGR, GEM, …
  3. Architecture based: PackNet, progressive nets, HAT, …
• More than classification?
• Takeaways
Looking to the Future
• Desiderata:
  - Constant memory
  - Task agnostic: some recent advances [Rao et al., NeurIPS '19]
  - Forgetting gracefully
  - Datasets
"I don't like datasets, it's more a problem than a solution" - heard at ICCV 2019
See also: Workshop on Continual Learning in Computer Vision, 14th June