
Recurrent Neural Networks
Lecture 11 - Part A

Yaniv Bronhaim 11/6/2018

Outline

- Feedforward networks revisited

- The structure of Recurrent Neural Networks (RNN)

- RNN Architectures

- Bidirectional RNNs and Deep RNNs

- Backpropagation through time (BPTT)

- Natural Language Processing example

- “The unreasonable effectiveness” of RNNs (Andrej Karpathy)

- RNN Interpretation - Neuroscience with RNNs

- Image captioning with ConvNets and RNNs

- Summary

Feedforward network

Y - Prediction (classification / regression)

A[x] - Network state in hidden layer x

W[x] - Network parameters for hidden layer x

b[x] - Bias for layer x

Input - a representation of the valid inputs
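A minimal numpy sketch of one such layer update, A[x] = g(W[x] · A[x-1] + b[x]), assuming ReLU as the activation g and arbitrary small shapes (the slide fixes neither):

```python
import numpy as np

def dense_layer(a_prev, W, b):
    """One hidden layer: A[x] = g(W[x] @ A[x-1] + b[x]), with g = ReLU here."""
    z = W @ a_prev + b
    return np.maximum(z, 0.0)

# Tiny illustration: a 2-dimensional input and 3 hidden units (shapes are arbitrary).
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])          # input vector
W1 = rng.normal(size=(3, 2))      # W[1]
b1 = np.zeros(3)                  # b[1]
a1 = dense_layer(x, W1, b1)       # A[1], the state of the first hidden layer
```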

Feedforward networks

What will David do tonight?

Possible activities (one-hot encoded):

- Party = [1, 0, 0]
- Sleep = [0, 1, 0]
- Training = [0, 0, 1]

Feedforward networks

Possible inputs (one-hot encoded): Sunny Day = [1, 0], Rainy Day = [0, 1]

NN: F(Sunny Day) = Party - output scores [0.8, 0.2, 0.0] against the expected score [1, 0, 0] (trained over time)

NN: F(Rainy Day) = Sleep - output scores [0.2, 0.7, 0.1] against the expected score [0, 1, 0]

Feedforward Neural Network Mission

f(x, W) = Wx

[Figure: the weight matrix W multiplies the one-hot input x (e.g. Sunny Day = [1, 0]) to produce an activity score vector such as [0.7, 0.2, 0.1]]
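A rough sketch of this scoring step; the weight matrix W below is hypothetical, chosen so the outputs match the scores shown above (numpy is used throughout these sketches):

```python
import numpy as np

# One-hot encodings from the slides.
sunny = np.array([1.0, 0.0])
rainy = np.array([0.0, 1.0])

# Hypothetical learned weights: rows score Party, Sleep, Training.
W = np.array([[0.8, 0.2],
              [0.2, 0.7],
              [0.0, 0.1]])

activities = ["Party", "Sleep", "Training"]
for name, x in [("Sunny day", sunny), ("Rainy day", rainy)]:
    scores = W @ x                 # f(x, W) = Wx
    print(name, "->", activities[int(np.argmax(scores))], scores)
```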

Let's look at sequential data

Every morning David decides what to do:

- Running in the gym
- Riding a bicycle
- Swimming

Every new day David does the next activity on his list, in order.

Let's look at sequential data

- Unless it is a rainy day; in that case David sleeps instead of doing his daily practice.
- When the sun comes out again, he does the next activity after his last training.

First solution - Using yesterday's data in a FFN

[Figure: the FFN takes "sunny day" plus yesterday's activity as input and outputs today's activity]

First solution - Problem

[Figure: the same function ("Func") applied day after day, from the 1st day through the 6th day, each time combining "sunny day" with yesterday's activity]

- This works as long as we know the activity of the last sunny day.
- But we only know yesterday's activity.
- So the "Func" output must also carry data from the past.

Sequential data

- The input is a sequence of vectors (x_t is the vector at time t) plus the output of the previous run, which carries the "history" (how: the hidden layer is looped back from the past into the future).

- The output is a softmax layer predicting the next activity.

RNN Equations

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

The same function and the same parameters are used at every timestep t.
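The equations on this slide are an image in the original deck. A minimal numpy sketch of the standard vanilla RNN step from the cited cs231n lecture (conventional weight names Wxh, Whh, Why; not taken verbatim from the slide):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, Why, b_h, b_y):
    """One timestep: h_t = tanh(Whh @ h_prev + Wxh @ x_t + b_h), y_t = Why @ h_t + b_y."""
    h_t = np.tanh(Whh @ h_prev + Wxh @ x_t + b_h)
    y_t = Why @ h_t + b_y
    return h_t, y_t
```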

Recurrent Neural Network Computational Graph

Reusing the same weight matrix at every time step

Recurrent Neural Network

- W is shared across time, which reduces the number of parameters
- Hidden state == memory
- The same weights handle any "temporal size" of sequence
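To make the weight sharing explicit, a sketch of unrolling the step above over a whole sequence (same assumptions as before):

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, b_h, b_y):
    """Unroll over a sequence, reusing the same weight matrices at every time step."""
    h, hs, ys = h0, [], []
    for x_t in xs:                                # xs: one input vector per timestep
        h = np.tanh(Whh @ h + Wxh @ x_t + b_h)    # same Whh and Wxh at every t
        hs.append(h)
        ys.append(Why @ h + b_y)                  # same Why at every t
    return hs, ys
```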

RNN Architectures

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Examples: image captioning, sentiment classification, sequence to sequence, POS tagging

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

RNN: many to one

RNN: one to many

RNN: many to many

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Multi Layer RNNs

More learning capacity
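A sketch of what "multi layer" means here, assuming each layer keeps its own weights and hidden state and feeds its output upward (layer count and shapes are illustrative):

```python
import numpy as np

def stacked_rnn_step(x_t, h_prev, params):
    """One timestep of a stacked RNN: each layer's hidden state is the next layer's input."""
    new_h, inp = [], x_t
    for layer, (Wxh, Whh, b) in enumerate(params):   # params: one (Wxh, Whh, b) tuple per layer
        h = np.tanh(Whh @ h_prev[layer] + Wxh @ inp + b)
        new_h.append(h)
        inp = h                                      # pass this layer's state up to the next layer
    return new_h
```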

Bidirectional RNNs

- "I want to go to school/college, I have a lesson at 8 o'clock"

- We might want to consider words that appear after the word in focus
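A minimal sketch of the bidirectional idea, assuming two independent vanilla RNNs whose per-step hidden states are concatenated (weight names and the concatenation choice are assumptions, not from the slide):

```python
import numpy as np

def bidirectional_rnn(xs, h0_f, h0_b, fwd, bwd):
    """fwd and bwd are (Wxh, Whh, b) tuples for the forward and backward passes."""
    def run(seq, h, Wxh, Whh, b):
        out = []
        for x_t in seq:
            h = np.tanh(Whh @ h + Wxh @ x_t + b)
            out.append(h)
        return out

    hs_f = run(xs, h0_f, *fwd)                  # left to right
    hs_b = run(xs[::-1], h0_b, *bwd)[::-1]      # right to left, re-aligned to input order
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]
```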

Backpropagation through time (BPTT)

Truncated Backpropagation through time (BPTT)

- Process only a chunk of the sequence at a time and backprop through it to update W.

- Carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
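A minimal sketch of one truncated-BPTT update for the vanilla RNN above, assuming a squared-error readout loss for brevity (the lecture's character model uses softmax cross-entropy): gradients flow only within the chunk, while the final hidden state is carried into the next chunk.

```python
import numpy as np

def truncated_bptt_chunk(xs, ys_target, h0, Wxh, Whh, Why, b_h, b_y):
    """Forward over one chunk, then backprop only inside the chunk."""
    hs, yhats = [h0], []
    for x in xs:                                    # forward pass over the chunk
        hs.append(np.tanh(Whh @ hs[-1] + Wxh @ x + b_h))
        yhats.append(Why @ hs[-1] + b_y)

    grads = {k: np.zeros_like(v) for k, v in
             dict(Wxh=Wxh, Whh=Whh, Why=Why, b_h=b_h, b_y=b_y).items()}
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):              # backward pass, truncated at the chunk start
        dy = yhats[t] - ys_target[t]                # gradient of 0.5 * ||yhat - y||^2
        grads["Why"] += np.outer(dy, hs[t + 1]); grads["b_y"] += dy
        dh = Why.T @ dy + dh_next
        dz = (1.0 - hs[t + 1] ** 2) * dh            # back through tanh
        grads["Wxh"] += np.outer(dz, xs[t]); grads["Whh"] += np.outer(dz, hs[t]); grads["b_h"] += dz
        dh_next = Whh.T @ dz
    return grads, hs[-1]                            # hs[-1] is carried forward to the next chunk
```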

Truncated Backpropagation through time (BPTT)

Concrete example - Character-level language model

Vocabulary: "h", "e", "l", "o"

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf

Training on the word "hello" gives 4 separate training examples:

1. "e" should be likely given the context of "h".
2. "l" should be likely in the context of "he".
3. "l" should also be likely given the context of "hel".
4. "o" should be likely given the context of "hell".
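A small sketch of how these training pairs could be set up with one-hot characters (following the lecture's "hello" example; variable names are illustrative):

```python
import numpy as np

vocab = ["h", "e", "l", "o"]
char_to_ix = {c: i for i, c in enumerate(vocab)}

def one_hot(c):
    v = np.zeros(len(vocab))
    v[char_to_ix[c]] = 1.0
    return v

# Training sequence "hello": inputs are "hell", targets are "ello" (predict the next character).
inputs = [one_hot(c) for c in "hell"]
targets = [char_to_ix[c] for c in "ello"]
```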

Generating Sequences

- Feed the sampled character back into the model to generate a sentence.

- Start from <START> and continue sampling until <END>.

- Training this on a lot of sentences would give us a language model - a way to predict text.

Taken from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf
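A sketch of this sampling loop, assuming a `step(x, h) -> (scores, h)` function such as a closure over the `rnn_step` sketch above, and a designated end-of-sequence index (both are assumptions for illustration):

```python
import numpy as np

def sample_sequence(step, h, start_vec, end_ix, max_len=100):
    """Generate by feeding each sampled symbol back in as the next input, until <END>."""
    x, out = start_vec, []
    for _ in range(max_len):
        scores, h = step(x, h)
        p = np.exp(scores - scores.max())
        p /= p.sum()                             # softmax over the vocabulary
        ix = int(np.random.choice(len(p), p=p))  # sample the next symbol
        if ix == end_ix:
            break
        out.append(ix)
        x = np.zeros_like(x)
        x[ix] = 1.0                              # feed the sample back as a one-hot input
    return out
```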

Generated LaTeX notes

http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

RNN Bible

https://twitter.com/rnn_bible

Interpretation - Neuroscience with RNNs

What can we learn from the internal state of specific cells in the recurrent network?

Interpretation - Neuroscience with RNNs

Color scale: red = -1, white = 0, blue = +1

As we process the text, we pick a particular cell and visualize its activation, looking at the "firing rate" of the cell as we read the text.
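A sketch of that measurement, assuming the same `step(x, h)` interface as above and an arbitrarily chosen hidden unit; the tanh activation lies in [-1, 1], matching the red/white/blue color scale:

```python
import numpy as np

def cell_trace(text, step, h0, char_to_ix, cell=42):
    """Record one hidden unit's activation for every character read (cell=42 is arbitrary)."""
    h, trace = h0, []
    for c in text:
        x = np.zeros(len(char_to_ix))
        x[char_to_ix[c]] = 1.0
        _, h = step(x, h)
        trace.append(h[cell])     # -1 maps to red, 0 to white, +1 to blue in the slide's scale
    return trace
```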

Interpretation - Neuroscience with RNNs

Generated C code - Trained on Linux kernel code

https://github.com/karpathy/char-rnn

Interpretation - Neuroscience with RNNs

Recurrent Neural Networks for Folk Music Generation

https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

https://imgur.com/gallery/u76wY

Image captioning with ConvNets and RNNs

- Back to CNNs - how is an RNN integrated into applications related to image processing?

Image captioning with ConvNets and RNNs

- Convolutional networks express a single differentiable function from raw image pixel values to class probabilities.

"VGGNet" or "OxfordNet" (5 conv layers and 4 pooling layers)

"Very Deep Convolutional Networks for Large-Scale Image Recognition" [Simonyan and Zisserman, 2014]

- We use the FC-4096 layer as the image representation and feed it into an RNN, which generates a sentence as we saw before.

Image captioning with ConvNets and RNNs

- The first caption input is a constant token: X0 = <START>. The image representation enters the hidden state through the weight matrix Wih, producing H0.

- Generating the first word in the caption: Y0 = "Man".

- We use Y0 as the input for the next iteration: X1 -> H1 -> Y1 = "With".

- Continue until Yt = <END>: X2 -> H2 -> Y2 = "a", X3 -> H3 -> Y3 = "Dog", X4 -> H4 -> Y4 = <END>.

- The generated caption: "Man with a dog".
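A minimal sketch of this generation loop, assuming the CNN feature vector v (e.g. the FC-4096 activations) is already computed, that the image enters the recurrence through Wih as in the figure, and greedy word selection; all weight names, the word-embedding table, and the stopping logic are illustrative:

```python
import numpy as np

def generate_caption(v, embed, Wih, Wxh, Whh, Why, b_h, b_y,
                     start_ix, end_ix, max_len=20):
    """Start from <START>, feed each predicted word back in, stop at <END>."""
    h = np.zeros(Whh.shape[0])
    x, words = embed[start_ix], []                       # embed[i] is the vector for word i
    for _ in range(max_len):
        h = np.tanh(Whh @ h + Wxh @ x + Wih @ v + b_h)   # image information enters via Wih
        scores = Why @ h + b_y
        ix = int(np.argmax(scores))                      # greedy choice of the next word
        if ix == end_ix:                                 # Yt = <END> stops the loop
            break
        words.append(ix)
        x = embed[ix]                                    # the predicted word is the next input
    return words
```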

Summary

- RNN definition and architectures

- Bits about language processing and RNN effectiveness

- Applications based on RNNs

- Integrating with ConvNets

- Next: more advanced memory with Long Short-Term Memory (LSTM) and many more RNN-based applications

References

- Stanford CS231n - Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 10

- https://deeplearning4j.org/lstm.html

- Coursera, Machine Learning course by Andrew Ng.

- https://karpathy.github.io/2015/05/21/rnn-effectiveness/

- https://arxiv.org/pdf/1406.6247.pdf

- Udacity - Deep learning, by Luis Serrano

- https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129

- http://vision.stanford.edu/pdf/KarpathyICLR2016.pdf

- https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/

- NLP course (IDC) - Kfir Bar - NLM lecture

- https://cs.stanford.edu/people/karpathy/deepimagesent/

- https://arxiv.org/abs/1308.0850

- https://deeplearning4j.org/lstm.html#backpropagation

- https://arxiv.org/pdf/1312.6026.pdf

- https://www.safaribooksonline.com/library/view/neural-networks-and/9781492037354/ch04.html

- https://www.di.ens.fr/~lelarge/dldiy/slides/lecture_8
