
Recurrent Neural Networks
Lecture 11 - Part A

Yaniv Bronhaim 11/6/2018

Outline

- Feedforward networks revisited

- The structure of Recurrent Neural Networks (RNN)

- RNN Architectures

- Bidirectional RNNs and Deep RNNs

- Backpropagation through time (BPTT)

- Natural Language Processing example

- “The unreasonable effectiveness” of RNNs (Andrej Karpathy)

- RNN Interpretation - Neuroscience with RNNs

- Image captioning with ConvNets and RNNs

- Summary

Feedforward network

Y - Prediction (classification / regression)

A[x] - Network state in hidden layer x

W[x] - Network parameters for hidden layer x

b[x] - Bias for layer x

Input - a representation of the valid inputs
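A minimal numpy sketch of one such layer update, A[x] = g(W[x] · A[x-1] + b[x]), assuming ReLU as the activation g and arbitrary small shapes (the slide fixes neither):

```python
import numpy as np

def dense_layer(a_prev, W, b):
    """One hidden layer: A[x] = g(W[x] @ A[x-1] + b[x]), with g = ReLU here."""
    z = W @ a_prev + b
    return np.maximum(z, 0.0)

# Tiny illustration: a 2-dimensional input and 3 hidden units (shapes are arbitrary).
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])          # input vector
W1 = rng.normal(size=(3, 2))      # W[1]
b1 = np.zeros(3)                  # b[1]
a1 = dense_layer(x, W1, b1)       # A[1], the state of the first hidden layer
```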

Feedforward networks

What will David do tonight?

Possible activities (one-hot encoded):

- Party = [1, 0, 0]
- Sleep = [0, 1, 0]
- Training = [0, 0, 1]

Feedforward networks

Possible inputs (one-hot encoded): Sunny Day = [1, 0], Rainy Day = [0, 1]

NN: F(Sunny Day) = Party - output scores [0.8, 0.2, 0.0] against the expected score [1, 0, 0] (trained over time)

NN: F(Rainy Day) = Sleep - output scores [0.2, 0.7, 0.1] against the expected score [0, 1, 0]

Feedforward Neural Network Mission

f(x, W) = Wx

[Figure: the weight matrix W multiplies the one-hot input x (e.g. Sunny Day = [1, 0]) to produce an activity score vector such as [0.7, 0.2, 0.1]]
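A rough sketch of this scoring step; the weight matrix W below is hypothetical, chosen so the outputs match the scores shown above (numpy is used throughout these sketches):

```python
import numpy as np

# One-hot encodings from the slides.
sunny = np.array([1.0, 0.0])
rainy = np.array([0.0, 1.0])

# Hypothetical learned weights: rows score Party, Sleep, Training.
W = np.array([[0.8, 0.2],
              [0.2, 0.7],
              [0.0, 0.1]])

activities = ["Party", "Sleep", "Training"]
for name, x in [("Sunny day", sunny), ("Rainy day", rainy)]:
    scores = W @ x                 # f(x, W) = Wx
    print(name, "->", activities[int(np.argmax(scores))], scores)
```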

Let's look at sequential data

Every morning David decides what to do:

- Running in the gym
- Riding a bicycle
- Swimming

Every new day David does the next activity on his list, in order.

Let's look at sequential data

- Unless it is a rainy day; in that case David sleeps instead of doing his daily practice.
- When the sun comes out again, he does the next activity after his last training.

First solution - Using yesterday's data in a FFN

[Figure: the FFN takes "sunny day" plus yesterday's activity as input and outputs today's activity]

First solution - Problem

[Figure: the same function ("Func") applied day after day, from the 1st day through the 6th day, each time combining "sunny day" with yesterday's activity]

- This works as long as we know the activity of the last sunny day.
- But we only know yesterday's activity.
- So the "Func" output must also carry data from the past.

Sequential data

- The input is a sequence of vectors (x_t is the vector at time t) plus the output of the previous run, which carries the "history" (how: the hidden layer is looped back from the past into the future).

- The output is a softmax layer predicting the next activity.

RNN Equations

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

The same function and the same parameters are used at every timestep t.
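The equations on this slide are an image in the original deck. A minimal numpy sketch of the standard vanilla RNN step from the cited cs231n lecture (conventional weight names Wxh, Whh, Why; not taken verbatim from the slide):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why, b_h, b_y):
    """One timestep: h_t = tanh(Whh @ h_prev + Wxh @ x_t + b_h), y_t = Why @ h_t + b_y."""
    h_t = np.tanh(Whh @ h_prev + Wxh @ x_t + b_h)
    y_t = Why @ h_t + b_y
    return h_t, y_t
```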

Recurrent Neural Network Computational Graph

Reusing the same weight matrix at every time step

Recurrent Neural Network

- W is shared across time, which reduces the number of parameters
- Hidden state == memory
- The same weights handle any "temporal size" of sequence
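To make the weight sharing explicit, a sketch of unrolling the step above over a whole sequence (same assumptions as before):

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, b_h, b_y):
    """Unroll over a sequence, reusing the same weight matrices at every time step."""
    h, hs, ys = h0, [], []
    for x_t in xs:                                # xs: one input vector per timestep
        h = np.tanh(Whh @ h + Wxh @ x_t + b_h)    # same Whh and Wxh at every t
        hs.append(h)
        ys.append(Why @ h + b_y)                  # same Why at every t
    return hs, ys
```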

RNN Architectures

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Examples: image captioning, sentiment classification, sequence to sequence, POS tagging

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

RNN: many to one

RNN: one to many

RNN: many to many

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Multi Layer RNNs

More learning capacity
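A sketch of what "multi layer" means here, assuming each layer keeps its own weights and hidden state and feeds its output upward (layer count and shapes are illustrative):

```python
import numpy as np

def stacked_rnn_step(x_t, h_prev, params):
    """One timestep of a stacked RNN: each layer's hidden state is the next layer's input."""
    new_h, inp = [], x_t
    for layer, (Wxh, Whh, b) in enumerate(params):   # params: one (Wxh, Whh, b) tuple per layer
        h = np.tanh(Whh @ h_prev[layer] + Wxh @ inp + b)
        new_h.append(h)
        inp = h                                      # pass this layer's state up to the next layer
    return new_h
```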

Bidirectional RNNs

- "I want to go to school/college, I have a lesson at 8 o'clock"

- We might want to consider words that appear after the word in focus
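A minimal sketch of the bidirectional idea, assuming two independent vanilla RNNs whose per-step hidden states are concatenated (weight names and the concatenation choice are assumptions, not from the slide):

```python
import numpy as np

def bidirectional_rnn(xs, h0_f, h0_b, fwd, bwd):
    """fwd and bwd are (Wxh, Whh, b) tuples for the forward and backward passes."""
    def run(seq, h, Wxh, Whh, b):
        out = []
        for x_t in seq:
            h = np.tanh(Whh @ h + Wxh @ x_t + b)
            out.append(h)
        return out

    hs_f = run(xs, h0_f, *fwd)                  # left to right
    hs_b = run(xs[::-1], h0_b, *bwd)[::-1]      # right to left, re-aligned to input order
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]
```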

Backpropagation through time (BPTT)

Truncated Backpropagation through time (BPTT)

- Process only a chunk of the sequence at a time and backprop through it to update W.

- Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
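A minimal sketch of one truncated-BPTT update for the vanilla RNN above, assuming a squared-error readout loss for brevity (the lecture's character model uses softmax cross-entropy): gradients flow only within the chunk, while the final hidden state is carried into the next chunk.

```python
import numpy as np

def truncated_bptt_chunk(xs, ys_target, h0, Wxh, Whh, Why, b_h, b_y):
    """Forward over one chunk, then backprop only inside the chunk."""
    hs, yhats = [h0], []
    for x in xs:                                    # forward pass over the chunk
        hs.append(np.tanh(Whh @ hs[-1] + Wxh @ x + b_h))
        yhats.append(Why @ hs[-1] + b_y)

    grads = {k: np.zeros_like(v) for k, v in
             dict(Wxh=Wxh, Whh=Whh, Why=Why, b_h=b_h, b_y=b_y).items()}
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):              # backward pass, truncated at the chunk start
        dy = yhats[t] - ys_target[t]                # gradient of 0.5 * ||yhat - y||^2
        grads["Why"] += np.outer(dy, hs[t + 1]); grads["b_y"] += dy
        dh = Why.T @ dy + dh_next
        dz = (1.0 - hs[t + 1] ** 2) * dh            # back through tanh
        grads["Wxh"] += np.outer(dz, xs[t]); grads["Whh"] += np.outer(dz, hs[t]); grads["b_h"] += dz
        dh_next = Whh.T @ dz
    return grads, hs[-1]                            # hs[-1] is carried forward to the next chunk
```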

Truncated Backpropagation through time (BPTT)

Concrete example - Character-level language model

Vocabulary: "h", "e", "l", "o"

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Training on the word "hello" gives 4 separate training examples:

1. "e" should be likely given the context of "h".
2. "l" should be likely in the context of "he".
3. "l" should also be likely given the context of "hel".
4. "o" should be likely given the context of "hell".
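A small sketch of how these training pairs could be set up with one-hot characters (following the lecture's "hello" example; variable names are illustrative):

```python
import numpy as np

vocab = ["h", "e", "l", "o"]
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

# Training sequence "hello": inputs are "hell", targets are "ello" (predict the next character).
inputs = [one_hot(c) for c in "hell"]
targets = [char_to_ix[c] for c in "ello"]
```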

Generating Sequences

- Feed the sampled character back into the model to generate a sentence.

- Start from <START> and continue sampling until <END>.

- Training this on a lot of sentences would give us a language model - a way to predict text.

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
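A sketch of this sampling loop, assuming a `step(x, h) -> (scores, h)` function such as a closure over the `rnn_step` sketch above, and a designated end-of-sequence index (both are assumptions for illustration):

```python
import numpy as np

def sample_sequence(step, h, start_vec, end_ix, max_len=100):
    """Generate by feeding each sampled symbol back in as the next input, until <END>."""
    x, out = start_vec, []
    for _ in range(max_len):
        scores, h = step(x, h)
        p = np.exp(scores - scores.max())
        p /= p.sum()                             # softmax over the vocabulary
        ix = int(np.random.choice(len(p), p=p))  # sample the next symbol
        if ix == end_ix:
            break
        out.append(ix)
        x = np.zeros_like(x)
        x[ix] = 1.0                              # feed the sample back as a one-hot input
    return out
```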

Generated LaTeX notes

http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

RNN Bible

https://twitter.com/rnn_bible

Interpretation - Neuroscience with RNNs

What can we learn from the internal state of specific cells in the recurrent network?

Interpretation - Neuroscience with RNNs

Color scale: red = -1, white = 0, blue = +1

As we process the text, we pick a particular cell and visualize its activation, looking at the "firing rate" of the cell as we read the text.
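A sketch of that measurement, assuming the same `step(x, h)` interface as above and an arbitrarily chosen hidden unit; the tanh activation lies in [-1, 1], matching the red/white/blue color scale:

```python
import numpy as np

def cell_trace(text, step, h0, char_to_ix, cell=42):
    """Record one hidden unit's activation for every character read (cell=42 is arbitrary)."""
    h, trace = h0, []
    for c in text:
        x = np.zeros(len(char_to_ix))
        x[char_to_ix[c]] = 1.0
        _, h = step(x, h)
        trace.append(h[cell])     # -1 maps to red, 0 to white, +1 to blue in the slide's scale
    return trace
```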

Interpretation - Neuroscience with RNNs

Generated C code - Trained on Linux kernel code

https://github.com/karpathy/char-rnn

Interpretation - Neuroscience with RNNs

Recurrent Neural Networks for Folk Music Generation

https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

https://imgur.com/gallery/u76wY

Image captioning with ConvNets and RNNs

- Back to CNNs - how is an RNN integrated into applications related to image processing?

Image captioning with ConvNets and RNNs

- Convolutional networks express a single differentiable function from raw image pixel values to class probabilities.

"VGGNet" or "OxfordNet" (5 conv layers and 4 pooling layers)

"Very Deep Convolutional Networks for Large-Scale Image Recognition" [Simonyan and Zisserman, 2014]

- We use the FC-4096 layer as the image representation and feed it into an RNN, which generates a sentence as we saw before.

Image captioning with ConvNets and RNNs

- The first caption input is a constant token: X0 = <START>. The image representation enters the hidden state through the weight matrix Wih, producing H0.

- Generating the first word in the caption: Y0 = "Man".

- We use Y0 as the input for the next iteration: X1 -> H1 -> Y1 = "With".

- Continue until Yt = <END>: X2 -> H2 -> Y2 = "a", X3 -> H3 -> Y3 = "Dog", X4 -> H4 -> Y4 = <END>.

- The generated caption: "Man with a dog".
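A minimal sketch of this generation loop, assuming the CNN feature vector v (e.g. the FC-4096 activations) is already computed, that the image enters the recurrence through Wih as in the figure, and greedy word selection; all weight names, the word-embedding table, and the stopping logic are illustrative:

```python
import numpy as np

def generate_caption(v, embed, Wih, Wxh, Whh, Why, b_h, b_y,
                     start_ix, end_ix, max_len=20):
    """Start from <START>, feed each predicted word back in, stop at <END>."""
    h = np.zeros(Whh.shape[0])
    x, words = embed[start_ix], []                       # embed[i] is the vector for word i
    for _ in range(max_len):
        h = np.tanh(Whh @ h + Wxh @ x + Wih @ v + b_h)   # image information enters via Wih
        scores = Why @ h + b_y
        ix = int(np.argmax(scores))                      # greedy choice of the next word
        if ix == end_ix:                                 # Yt = <END> stops the loop
            break
        words.append(ix)
        x = embed[ix]                                    # the predicted word is the next input
    return words
```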

Summary

- RNN definition and architectures

- Bits about language processing and RNN effectiveness

- Applications based on RNNs

- Integrating with ConvNets

- Next: more advanced memory with Long Short-Term Memory (LSTM) and many more RNN-based applications

References

- Stanford CS231n - Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 10

- https://deeplearning4j.org/lstm.html

- Coursera, Machine Learning course by Andrew Ng.

- https://karpathy.github.io/2015/05/21/rnn-effectiveness/

- https://arxiv.org/pdf/1406.6247.pdf

- Udacity - Deep learning, by Luis Serrano

- https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129

- http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

- https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

- NLP course (IDC) - Kfir Bar - NLM lecture

- https://cs.stanford.edu/people/karpathy/deepimagesent/

- https://arxiv.org/abs/1308.0850

- https://deeplearning4j.org/lstm.html#backpropagation

- https://arxiv.org/pdf/1312.6026.pdf

- https://www.safaribooksonline.com/library/view/neural-networks-and/9781492037354/ch04.html

- https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_8
