ECE 6504: Deep Learning for Perception
Dhruv Batra
Virginia Tech

Topics:
– Recurrent Neural Networks (RNNs)
– BackProp Through Time (BPTT)
– Vanishing / Exploding Gradients
– [Abhishek:] Lua / Torch Tutorial

Post on 18-Jan-2016

Administrativia
• HW3
  – Out today
  – Due in 2 weeks
  – Please please please please please start early
  – https://computing.ece.vt.edu/~f15ece6504/homework3/

(C) Dhruv Batra 2

Plan for Today
• Model
  – Recurrent Neural Networks (RNNs)
• Learning
  – BackProp Through Time (BPTT)
  – Vanishing / Exploding Gradients
• [Abhishek:] Lua / Torch Tutorial

New Topic: RNNs

Image Credit: Andrej Karpathy

Synonyms
• Recurrent Neural Networks (RNNs)
• Recursive Neural Networks
  – General family; think graphs instead of chains
• Types:
  – Long Short Term Memory (LSTMs)
  – Gated Recurrent Units (GRUs)
  – Hopfield network
  – Elman networks
  – …
• Algorithms
  – BackProp Through Time (BPTT)
  – BackProp Through Structure (BPTS)

What’s wrong with MLPs?
• Problem 1: Can’t model sequences
  – Fixed-sized Inputs & Outputs
  – No temporal structure
• Problem 2: Pure feed-forward processing
  – No “memory”, no feedback

Image Credit: Alex Graves, book

Sequences are everywhere…

Image Credit: Alex Graves and Kevin Gimpel

Even where you might not expect a sequence…

Image Credit: Vinyals et al.

Even where you might not expect a sequence…

Image Credit: Ba et al.; Gregor et al.

• Input ordering = sequence

Image Credit: [Pinheiro and Collobert, ICML14]

Why model sequences?

Figure Credit: Carlos Guestrin

Why model sequences?

Image Credit: Alex Graves

Name that model

Hidden Markov Model (HMM)

[Figure: HMM with variables Y1, …, Y5, each taking values in {a, …, z}, and corresponding observations X1, …, X5]

Figure Credit: Carlos Guestrin

How do we model sequences?
• No input

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With inputs

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With inputs and outputs

Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
• With Neural Nets

Image Credit: Alex Graves

How do we model sequences?
• It’s a spectrum…
  – Input: No sequence. Output: No sequence. Example: “standard” classification / regression problems
  – Input: No sequence. Output: Sequence. Example: Im2Caption
  – Input: Sequence. Output: No sequence. Example: sentence classification, multiple-choice question answering
  – Input: Sequence. Output: Sequence. Example: machine translation, video captioning, open-ended question answering, video question answering

Image Credit: Andrej Karpathy

Things can get arbitrarily complex

Image Credit: Herbert Jaeger

Key Ideas
• Parameter Sharing + Unrolling
  – Keeps the number of parameters in check
  – Allows arbitrary sequence lengths!
• “Depth”
  – Measured in the usual sense of layers
  – Not unrolled timesteps
• Learning
  – Is tricky even for “shallow” models due to unrolling
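
Parameter sharing and unrolling can be sketched in a few lines. The following is a minimal illustration, not the course's implementation: it assumes a vanilla tanh RNN, and the names `rnn_forward`, `W_hh`, `W_xh`, `b` and all numeric values are hypothetical. The same weights are reused at every timestep, so the parameter count does not depend on the sequence length.

```python
import math

def rnn_forward(xs, h0, W_hh, W_xh, b):
    """Unroll a vanilla tanh RNN over a sequence, reusing the same weights at every step."""
    h = list(h0)
    states = []
    for x in xs:  # one loop iteration per timestep; the sequence length is arbitrary
        h = [
            math.tanh(
                sum(W_hh[i][j] * h[j] for j in range(len(h)))
                + sum(W_xh[i][k] * x[k] for k in range(len(x)))
                + b[i]
            )
            for i in range(len(h))
        ]
        states.append(h)
    return states

# Hypothetical toy parameters: 2 hidden units, 1 input feature.
W_hh = [[0.5, -0.3], [0.2, 0.1]]
W_xh = [[1.0], [-1.0]]
b = [0.0, 0.0]

# The same parameters handle a 2-step and a 50-step sequence.
short = rnn_forward([[1.0], [0.5]], [0.0, 0.0], W_hh, W_xh, b)
long = rnn_forward([[1.0]] * 50, [0.0, 0.0], W_hh, W_xh, b)
print(len(short), len(long))
```

Here the network is "shallow" in the sense of layers (one recurrent layer), yet the unrolled computation for the second call is 50 steps long, which is what makes learning tricky.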

Plan for Today
• Model
  – Recurrent Neural Networks (RNNs)
• Learning
  – BackProp Through Time (BPTT)
  – Vanishing / Exploding Gradients
• [Abhishek:] Lua / Torch Tutorial

BPTT

Image Credit: Richard Socher
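
The slide's figure did not survive in this transcript. As a reminder only, the standard BPTT gradient can be written as below, under assumed notation that is not taken from the slide: hidden state $h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t + b)$, total loss $E = \sum_t E_t$, and $\partial^{+}$ denoting the immediate derivative that treats $h_{k-1}$ as a constant. The matrix factors in the product are ordered from $j = t$ down to $j = k+1$.

```latex
\[
\frac{\partial E}{\partial W_{hh}}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
    \frac{\partial E_t}{\partial h_t}\,
    \frac{\partial h_t}{\partial h_k}\,
    \frac{\partial^{+} h_k}{\partial W_{hh}},
\qquad
\frac{\partial h_t}{\partial h_k}
  = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}
  = \prod_{j=k+1}^{t} \operatorname{diag}\!\left(1 - h_j \odot h_j\right) W_{hh}
\]
```

The repeated product of Jacobians is the term that can shrink or grow geometrically with the distance $t - k$.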

Illustration [Pascanu et al]
• Intuition
• Error surface of a single hidden unit RNN; high curvature walls
• Solid lines: standard gradient descent trajectories
• Dashed lines: gradient rescaled to fix problem
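
To make the intuition concrete, here is a small sketch under a simplifying assumption that is not from the slide: a single hidden unit with a linear recurrence h_t = w * h_{t-1}, so the gradient of the final state with respect to the initial state is just w multiplied by itself once per timestep. The function name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w, steps):
    """Return d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # each unrolled timestep contributes one factor of w
    return grad

# Weights slightly below, at, and slightly above 1 behave very differently.
for w in (0.9, 1.0, 1.1):
    print(w, gradient_through_time(w, steps=100))
```

After 100 steps the gradient for w = 0.9 is on the order of 1e-5 (vanishing), while for w = 1.1 it is on the order of 1e4 (exploding); only w = 1.0 leaves it unchanged.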

Fix #1
• Pseudocode

Image Credit: Richard Socher
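
The pseudocode itself is not recoverable from the transcript. Assuming Fix #1 refers to gradient norm clipping (consistent with the "gradient rescaled" trajectories in the Pascanu et al. illustration), a minimal sketch could look like this. The function name `clip_gradient` and the threshold value are hypothetical.

```python
import math

def clip_gradient(grad, threshold):
    """Rescale grad so its L2 norm is at most threshold; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm  # shrink the length, keep the direction
        return [g * scale for g in grad]
    return list(grad)

# A large gradient is rescaled; a small one passes through untouched.
print(clip_gradient([30.0, 40.0], threshold=5.0))
print(clip_gradient([0.3, 0.4], threshold=5.0))
```

Clipping only addresses exploding gradients; it does nothing for gradients that are already vanishing.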

Fix #2
• Smart Initialization and ReLUs
  – [Socher et al 2013]
  – A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, Le et al. 2015
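
As a rough sketch of the idea in Le et al. 2015: the recurrent weight matrix starts as the identity, biases start at zero, and the nonlinearity is a ReLU. The helper names below are hypothetical, and the input weights are set to zero purely to isolate the recurrent path; that choice is an assumption made for illustration, not part of the method.

```python
def identity_matrix(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def relu_rnn_step(h, x, W_hh, W_xh, b):
    """One step of a ReLU RNN: h_new = max(0, W_hh h + W_xh x + b)."""
    n = len(h)
    return [
        max(
            0.0,
            sum(W_hh[i][j] * h[j] for j in range(n))
            + sum(W_xh[i][k] * x[k] for k in range(len(x)))
            + b[i],
        )
        for i in range(n)
    ]

n_hidden = 3
W_hh = identity_matrix(n_hidden)          # recurrent weights start as the identity
b = [0.0] * n_hidden                      # biases start at zero
W_xh = [[0.0] for _ in range(n_hidden)]   # zeroed here only to isolate the recurrent path

h = [0.2, 0.0, 1.5]
for _ in range(100):
    h = relu_rnn_step(h, [1.0], W_hh, W_xh, b)
print(h)  # the non-negative state is carried forward unchanged
```

With this initialization the recurrent Jacobian is the identity wherever units are active, so at the start of training signals neither shrink nor grow as they pass through time.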
