Repeat before Forgetting: Spaced Repetition for Efficient and Effective Training of Neural Networks
Hadi Amiri, Timothy Miller, Guergana Savova
Computational Health Informatics Program, Harvard University
[email protected]
Motivation
• Efficiency challenge
  ▫ A large amount of data is usually required to train effective neural models
  ▫ Long turnaround time, e.g., Google's neural MT system takes three weeks to train with 100 GPUs
(Wu et al., 2016; Hirschberg and Manning, 2015)

Goal: develop training paradigms that lead to efficient and also effective training of neural models.
Motivation
• Human Memory
  ▫ Memory retention / recall probability: the probability that a human recalls a previously seen item declines over time, especially if there is no attempt to retain the information
(Ebbinghaus, 1913)
• Spaced Repetition
  ▫ Human learners can learn efficiently and effectively by increasing the intervals of time between subsequent reviews of previously learned material.
(Ebbinghaus, 1913; Dempster, 1989; Cepeda et al., 2006)
Motivation
• Recall Indicators
  ▫ Delay since last review of the item: lower recall probability for longer delays
  ▫ Difficulty of the item: lower recall probability for more difficult items
  ▫ Strength of the human memory: higher recall probability for stronger memories
(Cepeda et al., 2006; Averell and Heathcote, 2011; Reddy et al., 2016)
Research Question
• Can we develop a memory model that accurately estimates the time by which an item might be forgotten by a learner?
  ▫ Schedule a review for the learner before that time, for effective and efficient learning:
    – More effective: the learner reviews material before it is forgotten, and
    – More efficient: the learner avoids reviewing material that they already know.
Key Questions
• Q1: Is there an analogy between training neural networks and findings in psychology about human memory?
• Q2: Could we use spaced repetition to efficiently and effectively train neural networks?
Q1: Is there an analogy between training neural networks and findings in psychology about human memory?
Q1: Recall Indicators & Network Retention
• Directly evaluate the effect of the recall indicators on memory retention in neural networks.
• Data split: A (base set, 80%), B (review set, 10%), C (replacement set, 10%).
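The splitting code is not shown on the slides; a minimal Python sketch of the 80/10/10 partition, with hypothetical function and variable names, could be:

```python
import numpy as np

def split_data(examples, seed=0):
    """Partition examples into the slide's three sets: A (base, 80%),
    B (review, 10%), and C (replacement, 10%)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    a_end, b_end = int(0.8 * len(examples)), int(0.9 * len(examples))
    base = [examples[i] for i in idx[:a_end]]          # A: base set
    review = [examples[i] for i in idx[a_end:b_end]]   # B: review set
    replacement = [examples[i] for i in idx[b_end:]]   # C: replacement set
    return base, review, replacement
```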
Q1: Delay Since Last Review
• Keep the recall point fixed.
• Move the sliding window (the delay grows as the window moves away from the recall point).
Finding: inverse relation between network retention and delay since last review.
Q1: Item Difficulty
• Measured as the network's loss for the item at its last review point.
Finding: inverse relation between network retention and item difficulty.
Q1: Network Strength
• Measured as accuracy on validation data at the recall point.
• Keep the delay fixed and increase the recall point (Rec).
Finding: direct relation between network retention and network strength.
Q1: Recall Indicators & Network Retention
• Delay effect: neural networks forget training examples after a certain period of intervening training data.
• Item difficulty effect: the period of recall is shorter for more difficult examples.
• Network strength effect: recall improves as networks achieve better overall performance.
Q2: Could we use spaced repetition to efficiently and effectively train neural networks?
Q2: Spaced Repetition
• Challenge
  ▫ Estimating the time by which a training instance should be reviewed:
    – ideally before it is forgotten, and
    – ideally without reviewing it too early.
• Solution
  ▫ Use the recall indicators to lengthen or shorten review intervals with respect to individual learners and training instances:
    – Delay: epochs until the next review of an instance
    – Item difficulty: the network's loss for the instance
    – Network strength: the network's performance on validation data
Notation: d_i is the network's loss for instance i; t_i is the number of epochs until the next review of i; s_e is the network's performance on validation data at epoch e.
• Use density kernels as schedulers that favor less difficult training instances in stronger networks, i.e., assign longer delays t_i when the loss d_i is small and the network strength s_e is high (see the sketch below).
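The kernel formulas themselves appear in the slide's figure and are not reproduced in this transcript. As a hedged illustration only, a Gaussian-style kernel with the behavior just described, written in Python with made-up names and a made-up cap parameter tau, might look like:

```python
import math

def gaussian_delay(d_i, s_e, tau=5.0):
    """Illustrative Gaussian-style scheduler (not the paper's exact kernel):
    maps item difficulty d_i and network strength s_e at epoch e to a delay
    t_i, the number of epochs until instance i is reviewed again. tau caps
    the extra delay and is an assumed parameter of this sketch."""
    retention = math.exp(-d_i ** 2)               # near 1 for easy items, near 0 for hard ones
    return 1 + math.floor(tau * s_e * retention)  # at least one epoch until the next review
```

With this kernel, an easy item (d_i near 0) in a strong network (s_e near 0.9) is delayed roughly 1 + tau * s_e epochs, while a hard item collapses to t_i = 1 and stays in the current batch.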
[Diagram: at each epoch e, the training data is split into a current batch (instances with t <= 1), which is used to train the network, and a delayed batch (instances with t > 1), which is held back until it comes due at a later epoch (e+1, ...).]
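The loop in the diagram can be sketched as follows, assuming a delay kernel such as gaussian_delay above; train_fn, loss_fn, and eval_fn are hypothetical stand-ins for a framework's training step, per-instance loss, and validation accuracy, not the paper's code:

```python
def train_with_rbf(model, data, train_fn, loss_fn, eval_fn, delay_fn, epochs=20):
    """Sketch of spaced-repetition training: instances that are due form the
    current batch (t <= 1); the rest remain in the delayed batch (t > 1)."""
    next_review = {i: 0 for i in range(len(data))}   # every instance is due at epoch 0
    for e in range(epochs):
        current = [i for i, due in next_review.items() if due <= e]  # current batch
        train_fn(model, [data[i] for i in current])  # train network on the current batch
        s_e = eval_fn(model)                         # network strength at epoch e
        for i in current:
            d_i = loss_fn(model, data[i])            # item difficulty after this epoch
            next_review[i] = e + delay_fn(d_i, s_e)  # schedule the next review of i
    return model
```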
Evaluation
• We evaluate schedulers based on:
  ▫ Scheduling accuracy: accuracy in estimating network retention with respect to previously seen instances
  ▫ Effect on efficiency: time required to train the network (running time) and number of instances used for training per epoch
  ▫ Effect on effectiveness: accuracy of the downstream network on test data
Scheduling Accuracy
• The scheduler predicts a delay t for an item, i.e., t epochs until the next review of the item.
• Network retention with respect to the item is then evaluated at epoch e + t.
• A good scheduler accurately delays more items.
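One way to make this measurement concrete, under the assumption that retention is tested by thresholding the item's loss (the paper's exact criterion may differ):

```python
def scheduling_accuracy(model, data, due_items, loss_fn, threshold=0.5):
    """Fraction of items, among those whose delay t expired at this epoch,
    that the network still recalls (loss under a recall-confidence
    threshold). due_items, loss_fn, and threshold are assumptions of
    this sketch."""
    if not due_items:
        return 0.0
    recalled = [loss_fn(model, data[i]) < threshold for i in due_items]
    return sum(recalled) / len(due_items)
```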
Finding: RbF kernels accurately delay a substantial number of instances per epoch.
Efficiency & Effectiveness
• RbF kernels are 2.9-4.8 times faster than standard training.
• RbF kernels show accuracy comparable to standard training.
• Curriculum learning (CL) starts small and gradually increases the amount of training data by adding harder instances to its training set.
• RbF with a high recall-confidence threshold starts big and gradually delays reviewing the instances that the network has learned.
Conclusion
1. Memory retention in neural networks is affected by the same (known) factors that affect memory retention in humans.
2. A novel training paradigm for neural networks based on spaced repetition.
3. The paradigm can be applied without modification to any neural network.
Future Work
• Could training paradigms benefit from each other?
  ▫ Predict the easiness of novel training instances to inform CL
  ▫ Incorporate Leitner's queuing mechanism into CL or RbF
• Could the recall-confidence parameter be learned dynamically?
  ▫ E.g., with respect to network behavior
• Theoretical analysis of the schedulers' lower and upper bounds
  ▫ More flexible delay functions