Page 1:

Review Series of Recent Deep Learning Papers:

Parameter Prediction Paper: Learning to Learn by gradient descent by gradient descent
Marcin Andrychowicz, Misha Denil, Sergio Gómez Colmenarejo, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas
NIPS 2016

Reviewed by: Arshdeep Sekhon
Department of Computer Science, University of Virginia
https://qdata.github.io/deep2Read/

August 25, 2018

Page 2:

Optimizer

A standard machine learning algorithm minimizes an objective:

θ* = arg min_θ f(θ)    (1)

A standard optimizer updates the parameters by gradient descent:

θ_{t+1} = θ_t − α_t ∇f(θ_t)    (2)
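For concreteness, a minimal sketch of the hand-designed update of Eq. (2) in Python/PyTorch; the quadratic objective and the fixed step size α = 0.1 are illustrative choices only:

```python
import torch

# Hand-designed update rule of Eq. (2): theta_{t+1} = theta_t - alpha_t * grad f(theta_t).
# The toy quadratic objective and the fixed step size are illustrative, not from the paper.
def f(theta):
    return 0.5 * (theta ** 2).sum()

theta = torch.randn(10, requires_grad=True)
alpha = 0.1                                   # hand-chosen, fixed learning rate

for t in range(100):
    loss = f(theta)
    grad, = torch.autograd.grad(loss, theta)  # gradient of f at theta_t
    with torch.no_grad():
        theta -= alpha * grad                 # theta_{t+1} = theta_t - alpha * grad
```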


Page 4:

Motivation

Optimization strategies tailored to different classes of tasks:

Deep Learning: high-dimensional, non-convex optimization problems (Adagrad, RMSprop, Rprop, etc.)
Combinatorial Optimization: relaxations

In general, these are hand-designed update rules.


Page 6:

Learning to Learn

Meta-learning / Learning to Learn

Replace hand-designed update rules with a learned update rule:

θ_{t+1} = θ_t + g_t(∇f(θ_t), φ)    (3)

g_t is the output of the learned optimizer, which has its own parameters φ

g_t is produced by a recurrent neural network, parameterized by φ, that predicts the update at each timestep (see the sketch below)
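A minimal sketch of Eq. (3), assuming PyTorch: a small LSTM with parameters φ maps the optimizee gradient to an update g_t. The network sizes and the toy gradient are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Sketch of Eq. (3): theta_{t+1} = theta_t + g_t, where the update g_t is
# predicted by a recurrent network with its own parameters phi.
class LearnedOptimizer(nn.Module):
    def __init__(self, num_params, hidden_size=20):
        super().__init__()
        self.rnn = nn.LSTMCell(num_params, hidden_size)
        self.out = nn.Linear(hidden_size, num_params)

    def forward(self, grad, state=None):
        h, c = self.rnn(grad, state)          # consume the optimizee gradient
        return self.out(h), (h, c)            # g_t and the carried hidden state

opt = LearnedOptimizer(num_params=10)
theta = torch.randn(1, 10)
grad = theta.clone()                          # gradient of the toy quadratic 0.5*||theta||^2
g_t, state = opt(grad)                        # learned update
theta = theta + g_t                           # Eq. (3), replacing -alpha * grad
```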


Page 8:

Generalization for optimizers

Goal

Find an optimizer with learned updates (instead of hand-designed updates) that performs well on a class of optimization problems.

Generalization in machine learning: the capacity to make predictions about the target at novel, unseen points

Generalization in the 'Learning to Learn' context: the optimizer should perform well on unseen problems of the same 'type':

transfer learning


Page 12:

Learning to Learn with Recurrent Neural Networks

1 When can you say an optimizer is good?

2 θ* (the optimal parameters) is a function of the class of functions f being optimized and of the optimizer parameters φ:

θ*(f, φ)    (4)

3 Expected loss:

L(φ) = E_f[ f(θ*(f, φ)) ]    (5)

4 Expected loss with an RNN optimizer:

L(φ) = E_f[ Σ_{t=1}^{T} w_t f(θ_t) ]    (6)

θ_{t+1} = θ_t + g_t    (7)
[g_t; h_{t+1}] = m(∇_t, h_t, φ)

5 Generate updates at each timestep using a recurrent neural network (a sketch of the unrolled meta-loss follows).
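A minimal sketch of Eqs. (6)-(8), assuming PyTorch and the LearnedOptimizer sketch above: the optimizer is unrolled for T steps on a sampled f, and the accumulated per-step losses (here equally weighted, w_t = 1) form one sample of L(φ):

```python
import torch

# Unroll the learned optimizer for T steps on a sampled function f and sum the
# per-step losses as the meta-objective L(phi). Assumes the LearnedOptimizer
# sketch above; w_t = 1 for all t is an illustrative choice.
def meta_loss(opt, f, theta0, T=20):
    # theta0 must have requires_grad=True so the optimizee gradient can be computed.
    theta, state, loss = theta0, None, 0.0
    for t in range(T):
        f_t = f(theta)
        loss = loss + f_t                                   # accumulate w_t * f(theta_t)
        grad, = torch.autograd.grad(f_t, theta, create_graph=True)
        g_t, state = opt(grad, state)                       # (g_t, h_{t+1}) = m(grad_t, h_t, phi)
        theta = theta + g_t                                 # Eq. (7): theta_{t+1} = theta_t + g_t
    return loss                                             # one sample of L(phi)
```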


Page 16:

Optimizing the optimizer

Figure: computational graph used for computing the gradient

1 Minimize the loss L(φ) w.r.t. the parameters φ of the optimizer

2 Calculate ∂L(φ)/∂φ and backpropagate through time (a sketch of this outer loop follows)
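The outer optimization might look like the following sketch, which backpropagates through the unrolled graph to obtain ∂L/∂φ. Here sample_problem is a hypothetical helper that draws a random f from the task family, and using Adam as the outer optimizer is an illustrative choice:

```python
import torch

# Outer loop: minimize L(phi) w.r.t. the optimizer parameters phi by backpropagating
# through the unrolled computational graph (BPTT). Uses the sketches above.
learned_opt = LearnedOptimizer(num_params=10)
outer = torch.optim.Adam(learned_opt.parameters(), lr=1e-3)

for step in range(1000):
    f = sample_problem()                               # hypothetical: draw a random f
    theta0 = torch.randn(1, 10, requires_grad=True)
    L = meta_loss(learned_opt, f, theta0, T=20)        # sum_t w_t f(theta_t), a function of phi
    outer.zero_grad()
    L.backward()                                       # dL/dphi via backprop through time
    outer.step()
```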


Page 17:

Optimizing the optimizer

1 Backpropagation through the computational graph

2 Assumption: optimizee gradients do not depend on the optimizer parameters

3 ∇_t = ∇_θ f(θ_t); ∂∇_t/∂φ = 0: drop gradients along the dotted edges (see the sketch below)
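In code, this assumption amounts to detaching the optimizee gradient before it is fed to the RNN. A sketch of the corresponding gradient step inside the unrolled loop (compare with the meta_loss sketch above, which used create_graph=True):

```python
import torch

# The assumption d(grad_t)/d(phi) = 0 corresponds to treating the optimizee gradient
# as a constant w.r.t. phi (the dotted edges of the computational graph).
def optimizee_gradient(f_t, theta):
    grad, = torch.autograd.grad(f_t, theta, retain_graph=True)  # keep graph for L.backward()
    return grad.detach()      # drop gradients along the dotted edges
```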


Page 20:

Coordinate-wise LSTM optimizer

1 Issue: a single RNN would need a huge hidden state if we want an update for each parameter

θ_{t+1} = θ_t + g_t    (8)
[g_t; h_{t+1}] = m(∇_t, h_t, φ)

2 Solution: use an RNN that produces the update coordinate-wise
3 A coordinate-wise LSTM shares its weights across all parameters

Separate hidden states, but shared parameters (see the sketch below)
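A minimal sketch of the coordinate-wise design, assuming PyTorch: the LSTM weights φ are shared across coordinates by treating each coordinate as a batch element, so the optimizer's parameter count does not grow with the size of the optimizee; the hidden size of 20 is illustrative:

```python
import torch
import torch.nn as nn

# Coordinate-wise sketch: one shared LSTMCell (parameters phi) is applied to every
# coordinate of theta, while each coordinate carries its own hidden state.
class CoordinateWiseOptimizer(nn.Module):
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTMCell(1, hidden_size)   # shared across all coordinates
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, state=None):
        g = grad.view(-1, 1)                      # (num_params, 1): coordinates as batch items
        h, c = self.lstm(g, state)                # state=None gives a zero state per coordinate
        return self.out(h).view_as(grad), (h, c)  # update g_t plus per-coordinate hidden state

opt = CoordinateWiseOptimizer()
theta = torch.randn(1000)                         # optimizer size does not grow with len(theta)
g_t, state = opt(theta.clone())                   # toy gradient: grad of 0.5*||theta||^2
theta = theta + g_t
```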


Page 21:

Experiments and Results: Quadratic Functions

1 Quadratic Functions:

f(θ) = ||Wθ − y||_2^2    (9)

2 The optimizer is trained on random functions from this family
3 It is tested on random functions sampled from the same distribution (see the sketch below)

4 Learning curves (figure)
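A minimal sketch of how this family might be sampled, assuming PyTorch; Gaussian entries for W and y and the 10-dimensional size are illustrative choices:

```python
import torch

# Quadratic family of Eq. (9): f(theta) = ||W theta - y||_2^2, with W and y drawn at random.
def sample_quadratic(dim=10):
    W = torch.randn(dim, dim)
    y = torch.randn(dim)
    return lambda theta: ((W @ theta.view(-1) - y) ** 2).sum()

f_train = sample_quadratic()   # the learned optimizer is trained on random draws like this
f_test = sample_quadratic()    # and evaluated on fresh draws from the same distribution
```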


