Recurrent Networks and LSTM deep dive
![Page 2: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/2.jpg)
Contents
1. Example of Vanilla RNN
2. RNN Forward pass
3. RNN Backward pass
4. RNN Training problem
5. LSTM design
![Page 3: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/3.jpg)
Feed-forward ("vanilla") network
![Page 4: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/4.jpg)
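Vanilla recurrent network
[Diagram: input X feeds the RNN cell through $W_{hx}$; the hidden state h feeds back into the cell through $W_{hh}$; the output y is read from h through $W_{hy}$.]
$$h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x + b_h) \qquad (1)$$
$$y = W_{hy} h_t + b_y \qquad (2)$$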
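Equations (1) and (2) can be sketched in NumPy for a single time step. This is a minimal illustration, not code from the slides: the function name `rnn_step`, the layer sizes, and the random weights are assumptions chosen only to show the shapes involved.

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_hx, b_h, W_hy, b_y):
    """One vanilla RNN step: returns the new hidden state and the output."""
    h = np.tanh(W_hh @ h_prev + W_hx @ x + b_h)  # equation (1)
    y = W_hy @ h + b_y                            # equation (2)
    return h, y

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_in, n_h, n_out = 4, 3, 4
W_hh = rng.normal(scale=0.1, size=(n_h, n_h))
W_hx = rng.normal(scale=0.1, size=(n_h, n_in))
W_hy = rng.normal(scale=0.1, size=(n_out, n_h))
b_h, b_y = np.zeros(n_h), np.zeros(n_out)

h = np.zeros(n_h)        # initial hidden state
for x in np.eye(n_in):   # feed four one-hot inputs, one per time step
    h, y = rnn_step(h, x, W_hh, W_hx, b_h, W_hy, b_y)
print(h.shape, y.shape)  # (3,) (4,)
```

The same weights are reused at every time step; only the hidden state h carries information from one step to the next.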
![Page 5: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/5.jpg)
Example: character-level language processing
[Diagram: X → RNN → y, with weights $W_{hx}$, $W_{hh}$, $W_{hy}$]
Training sequence: "hello"
Vocabulary: [e, h, l, o]
One-hot input vectors: "h" = 0100, "e" = 1000, "l" = 0010, "o" = 0001
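The one-hot encoding above can be reproduced in a few lines. A small sketch; the names `char_to_ix` and `one_hot` are illustrative, not from the slides.

```python
import numpy as np

vocab = ["e", "h", "l", "o"]                     # vocabulary order from the slide
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Return the one-hot vector for a single character."""
    v = np.zeros(len(vocab), dtype=int)
    v[char_to_ix[ch]] = 1
    return v

for ch in "helo":
    print(ch, "".join(str(b) for b in one_hot(ch)))
# h 0100
# e 1000
# l 0010
# o 0001
```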
![Page 6: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/6.jpg)
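Trained weights of the "hello" RNN (the hidden state $h$ is a single number, and $P = \mathrm{softmax}(y)$ holds the output probabilities):
$$W_{hx} = \begin{bmatrix} 3.6 & -4.8 & 0.35 & -0.26 \end{bmatrix}, \qquad W_{hy} = \begin{bmatrix} -12. \\ -0.67 \\ -0.85 \\ 14. \end{bmatrix}, \qquad b_y = \begin{bmatrix} -0.2 \\ -2.9 \\ 6.1 \\ -3.4 \end{bmatrix}$$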
Running "hello" through the trained RNN, starting from $h_0 = 0$. At each step $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x + b_h)$, $y = W_{hy} h_t + b_y$, and $P$ gives the probabilities of [e, h, l, o]:

| Input | One-hot $x$ | $h_{t-1}$ | $h_t$ | $y$ | $P$ | Prediction |
|---|---|---|---|---|---|---|
| "h" | 0100 | 0 | -0.99 | [11., -2.2, 6.9, -17] | [0.99, 0, 0.01, 0] | "e" |
| "e" | 1000 | -0.99 | -0.09 | [0.86, -2.8, 6.2, -4.6] | [0, 0, 0.99, 0] | "l" |
| "l" | 0010 | -0.09 | 0.38 | [-4.7, -3.2, 5.8, 1.9] | [0, 0, 0.98, 0.02] | "l" |
| "l" | 0010 | 0.38 | 0.98 | [-12., -3.6, 5.3, 10.] | [0, 0, 0.01, 0.99] | "o" |

Taking the most probable character at each step, the network outputs "ello": the next character of "hello" at every position.
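The output side of this walkthrough can be checked numerically. The slides do not list $W_{hh}$ or $b_h$, so this sketch does not recompute the hidden states; it takes the $h_t$ values from the walkthrough as given and applies equation (2) with the trained $W_{hy}$ and $b_y$. Using a softmax to turn $y$ into $P$ is an assumption, though it reproduces the slide's probabilities to about two decimals.

```python
import numpy as np

vocab = ["e", "h", "l", "o"]
W_hy = np.array([-12.0, -0.67, -0.85, 14.0])   # trained weights from the slide
b_y = np.array([-0.2, -2.9, 6.1, -3.4])

def softmax(z):
    e = np.exp(z - z.max())   # shift by the max for numerical stability
    return e / e.sum()

hidden = [-0.99, -0.09, 0.38, 0.98]   # h_t after reading "h", "e", "l", "l"
preds, probs = [], []
for h in hidden:
    y = W_hy * h + b_y        # equation (2) with a scalar hidden state
    p = softmax(y)
    probs.append(p)
    preds.append(vocab[int(p.argmax())])
print("".join(preds))  # ello
```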
![Page 33: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/33.jpg)
Training text → network output:
"hello" → "hello"
"hello ben" → "hello ben"
"hello world" → "hello world"
![Page 34: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/34.jpg)
Training text → network output:
"it was" → "it was" (50,000 epochs)
"it was the" → "it was the" (300,000 epochs, loss = 1.6066)
"it was the best" → "it was the best" (1,000,000 epochs, loss = 1.8197)
"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness…", A Tale of Two Cities, Charles Dickens:
"it was the best of" → "it wes the best of" (2,000,000 epochs, loss = 4.0844)
![Page 35: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/35.jpg)
Training log:
…
epoch 500000, loss: 6.447782290456328
…
epoch 1000000, loss: 5.290576956983398
…
epoch 1800000, loss: 4.267105168323299
epoch 1900000, loss: 4.175163586546514
epoch 2000000, loss: 4.0844739848413285
![Page 36: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/36.jpg)
![Page 37: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/37.jpg)
Input:  i t ␣ w a s ␣ t
Target: t ␣ w a s ␣ t h
At every position the target is the next character of the text (␣ marks a space).
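Building these (input, target) pairs amounts to shifting the text by one character. A minimal sketch; `make_pairs` is a hypothetical helper name.

```python
def make_pairs(text):
    """Pair each character with the character that follows it."""
    return list(zip(text[:-1], text[1:]))

pairs = make_pairs("it was th")
print("input: ", "".join(i for i, _ in pairs))   # input:  it was t
print("target:", "".join(t for _, t in pairs))   # target: t was th
```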
![Page 38: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/38.jpg)
RNNs for Different Problems
Vanilla Neural Network
![Page 39: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/39.jpg)
RNNs for Different Problems
Image Captioning: image -> sequence of words
![Page 40: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/40.jpg)
RNNs for Different Problems
Sentiment Analysis: sequence of words -> class
![Page 41: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/41.jpg)
RNNs for Different Problems
Translation: sequence of words -> sequence of words
![Page 42: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/42.jpg)
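Training is hard with vanilla RNNs
[Diagram: the network unrolled over three time steps. Inputs $x_0 = 1$, $x_1 = 1$, $x_2 = 2$ feed hidden states $h_0, h_1, h_2$ (weights $W_{hx}$ and $W_{hh} = 0.024$), the output is read from $h_2$ through $W_{hy} = 0.051$, and the target is 3. Arrows mark the forward pass and the backward pass.]
The loss is a function of the three weights, $L = f(W_{hx}, W_{hh}, W_{hy})$. The backward pass computes its gradient,
$$\nabla L = \left[\frac{\partial L}{\partial w_{hx}},\ \frac{\partial L}{\partial w_{hh}},\ \frac{\partial L}{\partial w_{hy}}\right]$$
and gradient descent with learning rate 0.01 updates each weight:
$$w_{hx} \leftarrow w_{hx} - 0.01\,\frac{\partial L}{\partial w_{hx}}$$
$$w_{hh} \leftarrow w_{hh} - 0.01\,\frac{\partial L}{\partial w_{hh}}$$
$$w_{hy} \leftarrow w_{hy} - 0.01\,\frac{\partial L}{\partial w_{hy}}$$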
![Page 43: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/43.jpg)
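[Diagram: the same unrolled network; inputs $x_0 = 1$, $x_1 = 1$, $x_2 = 2$ feed hidden states $h_0, h_1, h_2$, and the output $y$ is compared with the target 3.]
$$L = ?, \qquad \frac{\partial L}{\partial w_{hh}} = ?$$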
![Page 44: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/44.jpg)
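Compute gradient
$$L = \Big(W_{hy}\tanh\big(W_{hh}\tanh(W_{hh}\tanh(W_{hx}x_0) + W_{hx}x_1) + W_{hx}x_2\big) - 3\Big)^2, \qquad \frac{\partial L}{\partial w_{hh}} = ?$$
Recursive application of the chain rule: $L$ is a nested composition of simple functions, $L = f(g(h(\cdots(w))))$ with $f = f(g)$, $g = g(h)$, $h = h(\cdots)$, so
$$\frac{\partial L}{\partial w} = \frac{\partial f}{\partial g}\cdot\frac{\partial g}{\partial h}\cdot\frac{\partial h}{\partial \cdots}\cdots\frac{\partial \cdots}{\partial w}$$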
![Page 45: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/45.jpg)
Gradient by hand
Forward Pass

Toy values: $x_0 = 1$, $x_1 = 1$, $x_2 = 2$, target $3$, $W_{hx} = 0.078$, $W_{hh} = 0.024$, $W_{hy} = 0.051$. Write $a_t$ for the input to the $t$-th tanh, so $h_t = \tanh(a_t)$. Each row is one node of the computation graph, evaluated in order:

| Node | Computation | Value |
|---|---|---|
| $a_0 = W_{hx}x_0$ | $0.078 \times 1$ | 0.078 |
| $h_0 = \tanh(a_0)$ | $\tanh(0.078)$ | 0.0778 |
| $W_{hh}h_0$ | $0.024 \times 0.0778$ | 0.00187 |
| $W_{hx}x_1$ | $0.078 \times 1$ | 0.078 |
| $a_1 = W_{hh}h_0 + W_{hx}x_1$ | $0.00187 + 0.078$ | 0.07987 |
| $h_1 = \tanh(a_1)$ | $\tanh(0.07987)$ | 0.07970 |
| $W_{hh}h_1$ | $0.024 \times 0.07970$ | 0.0019 |
| $W_{hx}x_2$ | $0.078 \times 2$ | 0.156 |
| $a_2 = W_{hh}h_1 + W_{hx}x_2$ | $0.0019 + 0.156$ | 0.1579 |
| $h_2 = \tanh(a_2)$ | $\tanh(0.1579)$ | 0.1566 |
| $y = W_{hy}h_2$ | $0.051 \times 0.1566$ | 0.0080 |
| $y - 3$ | $0.0080 - 3$ | -2.99 |
| $L = (y - 3)^2$ | $(-2.99)^2$ | 8.95 |
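The forward pass can be reproduced in plain Python. A sketch assuming the slides' toy values ($W_{hx} = 0.078$, $W_{hh} = 0.024$, $W_{hy} = 0.051$, inputs 1, 1, 2, target 3); the function name `forward` is illustrative.

```python
import math

W_hx, W_hh, W_hy = 0.078, 0.024, 0.051   # toy weights from the slides
xs, target = [1.0, 1.0, 2.0], 3.0

def forward(W_hx, W_hh, W_hy):
    """Return the loss and the hidden states h0, h1, h2."""
    hs = []
    h = 0.0  # nothing feeds h0 from the past, so W_hh * h adds 0 at the first step
    for x in xs:
        h = math.tanh(W_hh * h + W_hx * x)
        hs.append(h)
    y = W_hy * h                  # output read from the last hidden state
    L = (y - target) ** 2         # squared error against the target 3
    return L, hs

L, hs = forward(W_hx, W_hh, W_hy)
print([round(h, 4) for h in hs], round(L, 2))  # [0.0778, 0.0797, 0.1566] 8.95
```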
Backward Pass

Starting from $\partial L/\partial L = 1$, the gradient flows from the loss back toward the weights. At every node the incoming gradient is multiplied by that node's local derivative ($a_t$ is the input to the $t$-th tanh, so $h_t = \tanh(a_t)$ and $\partial h_t/\partial a_t = 1 - h_t^2$). Along the path from $L$ back to the first input $x_0$:
$$\frac{\partial L}{\partial W_{hx}} = \frac{\partial L}{\partial y}\cdot\frac{\partial y}{\partial h_2}\cdot\frac{\partial h_2}{\partial a_2}\cdot\frac{\partial a_2}{\partial h_1}\cdot\frac{\partial h_1}{\partial a_1}\cdot\frac{\partial a_1}{\partial h_0}\cdot\frac{\partial h_0}{\partial a_0}\cdot\frac{\partial a_0}{\partial W_{hx}}$$

| Node | Local derivative | Gradient after this node |
|---|---|---|
| $L = (y - 3)^2$ | $2(y - 3) = 2(-2.99)$ | $\partial L/\partial(y - 3) = -5.98$ |
| $y - 3$ | $1$ | $\partial L/\partial y = -5.98$ |
| $y = W_{hy}h_2$ | $\partial y/\partial W_{hy} = h_2 = 0.1566$ | $\partial L/\partial W_{hy} = -0.937$ |
| | $\partial y/\partial h_2 = W_{hy} = 0.051$ | $\partial L/\partial h_2 = -0.305$ |
| $h_2 = \tanh(a_2)$ | $1 - 0.1566^2 = 0.975$ | $\partial L/\partial a_2 = -0.297$ |
| $a_2 = W_{hh}h_1 + W_{hx}x_2$ | $\partial a_2/\partial h_1 = W_{hh} = 0.024$ | $\partial L/\partial h_1 = -0.0071$ |
| $h_1 = \tanh(a_1)$ | $1 - 0.0797^2 = 0.994$ | $\partial L/\partial a_1 = -0.0071$ |
| $a_1 = W_{hh}h_0 + W_{hx}x_1$ | $\partial a_1/\partial W_{hh} = h_0 = 0.0778$ | $\approx -0.0005$ (term of $\partial L/\partial W_{hh}$) |
| | $\partial a_1/\partial h_0 = W_{hh} = 0.024$ | $\partial L/\partial h_0 = -0.00017$ |
| $h_0 = \tanh(a_0)$ | $1 - 0.0778^2 = 0.994$ | $\partial L/\partial a_0 = -0.00017$ |
| $a_0 = W_{hx}x_0$ | $\partial a_0/\partial W_{hx} = x_0 = 1$ | $-0.00017$ (term of $\partial L/\partial W_{hx}$) |

Along this path the gradient goes $1 \to -5.98 \to -0.297 \to -0.0071 \to -0.00017$: each step back through a tanh and $W_{hh}$ multiplies it by roughly $0.024$.

Gradient-descent update, $w_i \leftarrow w_i - 0.01\,\partial L/\partial w_i$, using the terms that reach the first time step:
$$w_{hx} \leftarrow 0.078 - 0.01\cdot(-0.00017) = 0.0780017$$
$$w_{hh} \leftarrow 0.024 - 0.01\cdot(-0.0005) = 0.024005$$

These first-step terms are only part of each gradient: a weight used at several time steps receives the sum of one term per step. For $W_{hx}$, the term through $x_2$ is $-0.297 \times 2 \approx -0.595$, while the term through $x_0$ is only $-0.00017$, so the earliest input contributes almost nothing to learning.
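The backward pass generalizes to backpropagation through time: walk the steps in reverse, multiply by $1 - h_t^2$ at each tanh and by $W_{hh}$ to reach the previous step, and add up the contribution of every step at which a weight is used. A plain-Python sketch under the same toy values, with a central finite-difference check; the helper names `forward` and `backward` are illustrative.

```python
import math

W = (0.078, 0.024, 0.051)           # (W_hx, W_hh, W_hy) from the slides
xs, target = [1.0, 1.0, 2.0], 3.0

def forward(W_hx, W_hh, W_hy):
    """Loss and hidden states h0, h1, h2 of the toy network."""
    hs, h = [], 0.0
    for x in xs:
        h = math.tanh(W_hh * h + W_hx * x)
        hs.append(h)
    return (W_hy * h - target) ** 2, hs

def backward(W_hx, W_hh, W_hy):
    """Gradient by backpropagation through time, plus the per-step
    terms of dL/dW_hx (ordered x0, x1, x2)."""
    _, hs = forward(W_hx, W_hh, W_hy)
    dy = 2 * (W_hy * hs[-1] - target)   # dL/dy = 2(y - 3)
    dW_hy = dy * hs[-1]                 # dy/dW_hy = h2
    dh = dy * W_hy                      # dL/dh2
    dW_hx, dW_hh, terms_hx = 0.0, 0.0, []
    for t in reversed(range(len(xs))):
        da = dh * (1 - hs[t] ** 2)      # through tanh: 1 - h_t^2
        terms_hx.insert(0, da * xs[t])  # this step's share of dL/dW_hx
        dW_hx += da * xs[t]
        dW_hh += da * (hs[t - 1] if t > 0 else 0.0)  # no previous state at t = 0
        dh = da * W_hh                  # gradient reaching h_{t-1}
    return (dW_hx, dW_hh, dW_hy), terms_hx

grads, terms_hx = backward(*W)

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for k in range(3):
    up, down = list(W), list(W)
    up[k] += eps
    down[k] -= eps
    numeric.append((forward(*up)[0] - forward(*down)[0]) / (2 * eps))

# One gradient-descent step with learning rate 0.01, using the full gradient.
W_new = tuple(w - 0.01 * g for w, g in zip(W, grads))
print([f"{g:.3g}" for g in grads], [f"{t:.2g}" for t in terms_hx])
```

With these values the full gradient for $(W_{hx}, W_{hh}, W_{hy})$ is roughly $(-0.603, -0.0243, -0.937)$, and the per-step terms of $\partial L/\partial W_{hx}$ are about $-0.00017$, $-0.0071$ and $-0.595$ for $x_0$, $x_1$, $x_2$.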
![Page 81: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/81.jpg)
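$$\frac{\partial L}{\partial x} = w_{hh}\cdots w_{hh}\cdots w_{hh}\cdots w_{hh} = w_{hh}^{\,n}\cdot C(w)$$
[Diagram: a chain of tanh units linked by $w_{hh}$; each step back through the chain contributes one factor of $w_{hh}$, and $C(w)$ collects the remaining factors.]
With $W_{hh} = 0.024$, the factor $w_{hh}^n$ vanishes quickly:

| $n$ | $w_{hh}^n$ |
|---|---|
| 1 | 0.024 |
| 2 | 0.000576 |
| 3 | 1.382e-05 |
| 4 | 3.318e-07 |
| 5 | 7.963e-09 |
| 6 | 1.911e-10 |
| 7 | 4.586e-12 |
| 8 | 1.101e-13 |
| 9 | 2.642e-15 |
| 10 | 6.340e-17 |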
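The shrinking factor can be reproduced directly. The sketch prints $w_{hh}^n$ for the slides' $W_{hh} = 0.024$ and, alongside it, the full factor $\partial h_n / \partial h_{\text{init}}$ of a scalar RNN, which also includes the tanh derivatives $1 - h_t^2$. Feeding $x = 1$ at every step with $W_{hx} = 0.078$ is an assumption made only for illustration. Because $0 < 1 - h_t^2 \le 1$, the full factor can only be smaller than $w_{hh}^n$.

```python
import math

W_hh, W_hx = 0.024, 0.078   # toy values from the slides

powers, factors = [], []
h, grad = 0.0, 1.0          # initial hidden state; derivative of h_init w.r.t. itself is 1
for n in range(1, 11):
    h = math.tanh(W_hh * h + W_hx * 1.0)   # input x = 1 at every step (assumption)
    grad *= W_hh * (1 - h ** 2)            # chain rule: dh_n/dh_{n-1} = W_hh (1 - h_n^2)
    powers.append(W_hh ** n)
    factors.append(grad)
    print(f"{n:2d}  w_hh^n = {W_hh ** n:.3e}   dh_n/dh_init = {grad:.3e}")
```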
![Page 82: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/82.jpg)
Source: https://imgur.com/gallery/vaNahKE
![Page 83: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/83.jpg)
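Long Short-Term Memory (LSTM)
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} W \begin{pmatrix} x \\ h_{t-1} \end{pmatrix}$$
$$c_t = f \odot c_{t-1} + i \odot g$$
$$h_t = o \odot \tanh(c_t)$$
[Diagram: $W$ is a $4n \times 2n$ matrix that maps the stacked input $(x, h_{t-1})$ of size $2n$ to the four gate vectors $i$, $f$, $o$, $g$ of size $n$ each; $\sigma$ is the sigmoid; the cell state is carried from step $t-1$ to step $t$.]
For comparison, the vanilla RNN in the same notation:
$$h_t = \tanh\left(W \begin{pmatrix} x \\ h_{t-1} \end{pmatrix}\right)$$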
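A NumPy sketch of one LSTM step following these equations. Assumptions: the gate order (i, f, o, g) matches the slide, $x$ and $h_{t-1}$ both have size $n$ so that $W$ is $4n \times 2n$, and a bias vector `b` is added even though the slide's formula omits it (pass zeros to match the slide exactly). The function name `lstm_step` is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4n, 2n), b: (4n,); x, h_prev, c_prev: (n,)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b  # W (x, h_{t-1}) plus bias
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    o = sigmoid(z[2 * n:3 * n])    # output gate
    g = np.tanh(z[3 * n:4 * n])    # candidate values
    c = f * c_prev + i * g         # c_t = f ⊙ c_{t-1} + i ⊙ g
    h = o * np.tanh(c)             # h_t = o ⊙ tanh(c_t)
    return h, c

n = 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * n, 2 * n))   # random weights, illustration only
b = np.zeros(4 * n)
x = rng.normal(size=n)
h, c = lstm_step(x, np.zeros(n), np.zeros(n), W, b)
print(h.shape, c.shape)   # (3,) (3,)

# Saturated gates: i ≈ 0 and f ≈ 1 carry the cell state over unchanged.
b_keep = np.concatenate([np.full(n, -20.0), np.full(n, 20.0), np.zeros(2 * n)])
c_in = np.array([0.5, -1.0, 2.0])
_, c_out = lstm_step(x, np.zeros(n), c_in, np.zeros((4 * n, 2 * n)), b_keep)
print(np.allclose(c_out, c_in))   # True
```

The second call shows the gating idea in isolation: with the forget gate pushed toward 1 and the input gate toward 0, the cell state passes through the step essentially untouched.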
![Page 84: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/84.jpg)
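RNN:
$$h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x)$$
LSTM:
$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix} W \begin{pmatrix} x \\ h_{t-1} \end{pmatrix}$$
$$c_t = f \odot c_{t-1} + i \odot g$$
$$h_t = o \odot \tanh(c_t)$$
In the cell update $c_t = f \odot c_{t-1} + i \odot g$, the forget gate $f$ and the input gate $i$ are sigmoid outputs between 0 and 1. A forget gate near 0 erases the old cell state and near 1 keeps it; an input gate near 0 ignores the candidate $g$ and near 1 adds it.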
![Page 85: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/85.jpg)
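Long Short-Term Memory (LSTM)
[Diagram: LSTM cell. The incoming cell state $c_{t-1}$ is multiplied (X) by the forget gate $f$; the product (X) of the input gate $i$ and the candidate $g$ is added (+) to give $c_t$; $c_t$ passes through tanh and is multiplied (X) by the output gate $o$ to give $h_t$.]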
![Page 86: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/86.jpg)
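Flow of gradient
RNN: going back from step $t+1$ to $t$ to $t-1$, the gradient is multiplied by $w_{hh}$ at every step:
$$\frac{\partial L}{\partial x} = w_{hh}\cdots w_{hh}\cdots w_{hh}\cdots w_{hh} = w_{hh}^{\,n}\cdot C(w)$$
LSTM: along the cell state, each step back passes through a + node and an elementwise multiplication by the forget gate $f$ (on this direct path, $\partial c_t / \partial c_{t-1} = f$), so the gradient is not forced through repeated multiplication by $w_{hh}$.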
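A rough numerical comparison of the two paths. This is a simplification, not a full analysis: the LSTM column keeps only the direct path along the cell state (one factor $f$ per step) and holds $f = 0.95$ fixed, a hypothetical value; the RNN column uses the slides' $w_{hh} = 0.024$ and ignores the tanh derivatives.

```python
w_hh = 0.024   # toy RNN recurrent weight from the slides
f = 0.95       # hypothetical forget-gate value, held fixed

for n in (1, 5, 10, 50):
    rnn = w_hh ** n   # RNN: one factor of w_hh per step
    lstm = f ** n     # LSTM cell-state path: one factor of f per step
    print(f"n={n:2d}   RNN {rnn:.2e}   LSTM {lstm:.2e}")
```

A forget gate that stays close to 1 lets a usable gradient survive many steps, whereas the RNN factor is already below $10^{-16}$ after ten steps.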
![Page 87: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/87.jpg)
Source: https://imgur.com/gallery/vaNahKE
![Page 88: Recurrent Networks and LSTM deep dive](https://reader035.vdocuments.us/reader035/viewer/2022062401/58ef94731a28ab447e8b4577/html5/thumbnails/88.jpg)
Long Short-Term Memory (LSTM)
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Alex Kalinin