Lecture 7: Recurrent Neural Networks
Alireza Akhavan Pour · SRTTU · CLASS.VISION
fall97.class.vision/slides/7.pdf · Monday, 14 Aban 1397 (2018-11-05)




Forward propagation


(Diagram: unrolled RNN cell with weight matrices w_{ax} on the input connection, w_{aa} on the recurrent connection, and w_{ya} on the output.)

a^{<t>} = g_1(w_{aa} a^{<t-1>} + w_{ax} x^{<t>} + b_a)

\hat{y}^{<t>} = g_2(w_{ya} a^{<t>} + b_y)
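As a concrete illustration of the two formulas above, here is a minimal numpy sketch of one forward step; the function name and shapes are illustrative, and the choices g1 = tanh and g2 = softmax follow the later slides rather than this one:

    import numpy as np

    def rnn_forward_step(a_prev, x_t, w_aa, w_ax, w_ya, b_a, b_y):
        # a_prev: (n_a, 1), x_t: (n_x, 1)
        a_t = np.tanh(w_aa @ a_prev + w_ax @ x_t + b_a)             # a<t> = g1(...), g1 = tanh
        z = w_ya @ a_t + b_y
        y_hat = np.exp(z - z.max()) / np.sum(np.exp(z - z.max()))  # g2 = softmax
        return a_t, y_hat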


Simplified RNN notation


a^{<t>} = g_1(w_{aa} a^{<t-1>} + w_{ax} x^{<t>} + b_a)

\hat{y}^{<t>} = g_2(w_{ya} a^{<t>} + b_y)

Rewritten in a simpler form:

a^{<t>} = g_1(w_a [a^{<t-1>}, x^{<t>}] + b_a)

w_a = [w_{aa} \; w_{ax}]: stacking w_{aa} (shape (100, 100)) and w_{ax} (shape (100, 10,000)) side by side gives w_a with shape (100, 10,100).

[a^{<t-1>}, x^{<t>}]: stacking a^{<t-1>} (100 entries) on top of x^{<t>} (10,000 entries) gives a column vector with 10,100 entries.

\hat{y}^{<t>} = g_2(w_y a^{<t>} + b_y)


Simplified RNN notation


a^{<t>} = g_1(w_a [a^{<t-1>}, x^{<t>}] + b_a)

\hat{y}^{<t>} = g_2(w_y a^{<t>} + b_y)

w_a is w_{aa} and w_{ax} stacked horizontally; [a^{<t-1>}, x^{<t>}] is a^{<t-1>} and x^{<t>} stacked vertically.
w_a shape: (NoOfHiddenNeurons, NoOfHiddenNeurons + n_x)
[a^{<t-1>}, x^{<t>}] shape: (NoOfHiddenNeurons + n_x, 1)
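A small numpy check of this equivalence, using the (100, 10,000) shapes from the previous slide (variable names are illustrative):

    import numpy as np

    n_a, n_x = 100, 10_000
    w_aa = np.random.randn(n_a, n_a)        # (100, 100)
    w_ax = np.random.randn(n_a, n_x)        # (100, 10000)
    a_prev = np.random.randn(n_a, 1)
    x_t = np.random.randn(n_x, 1)

    w_a = np.hstack([w_aa, w_ax])           # (100, 10100): horizontal stack
    ax = np.vstack([a_prev, x_t])           # (10100, 1): vertical stack

    # the stacked product equals the sum of the two separate products
    assert np.allclose(w_a @ ax, w_aa @ a_prev + w_ax @ x_t)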


RNN Unit


a^{<t>} = g_1(w_a [a^{<t-1>}, x^{<t>}] + b_a)

(Diagram: RNN unit. a^{<t-1>} and x^{<t>} feed a tanh node that produces a^{<t>}; a softmax over w_y a^{<t>} + b_y produces \hat{y}^{<t>}.)

\hat{y}^{<t>} = g_2(w_y a^{<t>} + b_y)
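In practice this unit is rarely written by hand; a minimal Keras sketch of the same architecture (a tanh recurrent layer followed by a per-timestep softmax) might look like the following. The layer sizes mirror the earlier shape example and are assumptions, not values from the slides:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 10_000)),                   # (T_x, n_x), variable length
        tf.keras.layers.SimpleRNN(100, return_sequences=True),  # a<t>, tanh by default
        tf.keras.layers.Dense(10_000, activation="softmax"),    # y_hat<t> per time step
    ])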


GRU (Simplified)

[Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches]

[Chung et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]


The cat, which already ate ..., was full.

"... I had traveled to Iran ...; by the end of the trip I had learned Persian well." (a Persian example of the same long-range dependency)

c = memory cell

c^{<t>} = a^{<t>}

\tilde{c}^{<t>} = \tanh(w_c [c^{<t-1>}, x^{<t>}] + b_c)

\Gamma_u = \sigma(w_u [c^{<t-1>}, x^{<t>}] + b_u)    (Update)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

(Diagram: simplified GRU unit. c^{<t-1>} and x^{<t>} feed a tanh node producing the candidate \tilde{c}^{<t>} and a \sigma node producing \Gamma_u; the gate mixes them into c^{<t>}, and a softmax produces \hat{y}^{<t>}.)

For the example sentence, \Gamma_u = 1, 0, 0, 0, 1: the memory cell is written at "cat", held unchanged through the relative clause, and updated again at "was".
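A minimal numpy sketch of one simplified-GRU step, matching the three equations above (names and shapes are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step_simplified(c_prev, x_t, w_c, w_u, b_c, b_u):
        cx = np.vstack([c_prev, x_t])          # [c<t-1>, x<t>]
        c_cand = np.tanh(w_c @ cx + b_c)       # candidate memory, c~<t>
        gamma_u = sigmoid(w_u @ cx + b_u)      # update gate
        # where gamma_u is ~0 the old memory passes through unchanged,
        # which is how "cat ... was" survives the long relative clause
        c_t = gamma_u * c_cand + (1.0 - gamma_u) * c_prev
        return c_t                             # a<t> = c<t>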


Full GRU


\tilde{c}^{<t>} = \tanh(w_c [c^{<t-1>}, x^{<t>}] + b_c)

\Gamma_u = \sigma(w_u [c^{<t-1>}, x^{<t>}] + b_u)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

The cat, which already ate ..., was full.

The full GRU adds a relevance gate \Gamma_r to the candidate:

\tilde{c}^{<t>} = \tanh(w_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)

\Gamma_r = \sigma(w_r [c^{<t-1>}, x^{<t>}] + b_r)
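A sketch of the same step with the relevance gate added; only the candidate computation changes relative to the simplified version, and all names remain illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gru_step_full(c_prev, x_t, w_c, w_u, w_r, b_c, b_u, b_r):
        cx = np.vstack([c_prev, x_t])
        gamma_r = sigmoid(w_r @ cx + b_r)      # relevance gate
        gamma_u = sigmoid(w_u @ cx + b_u)      # update gate
        # the candidate only sees the part of c<t-1> that gamma_r lets through
        c_cand = np.tanh(w_c @ np.vstack([gamma_r * c_prev, x_t]) + b_c)
        c_t = gamma_u * c_cand + (1.0 - gamma_u) * c_prev
        return c_t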


GRU & LSTM


GRU:

\tilde{c}^{<t>} = \tanh(w_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)

\Gamma_u = \sigma(w_u [c^{<t-1>}, x^{<t>}] + b_u)

\Gamma_r = \sigma(w_r [c^{<t-1>}, x^{<t>}] + b_r)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

a^{<t>} = c^{<t>}

LSTM [Hochreiter & Schmidhuber 1997. Long short-term memory]:

\tilde{c}^{<t>} = \tanh(w_c [a^{<t-1>}, x^{<t>}] + b_c)

\Gamma_u = \sigma(w_u [a^{<t-1>}, x^{<t>}] + b_u)    (update gate)

\Gamma_f = \sigma(w_f [a^{<t-1>}, x^{<t>}] + b_f)    (forget gate)

\Gamma_o = \sigma(w_o [a^{<t-1>}, x^{<t>}] + b_o)    (output gate)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}

a^{<t>} = \Gamma_o * c^{<t>}
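Both units are also available as ready-made Keras layers; a hedged usage sketch is given below (the layer sizes reuse the earlier illustrative numbers, and Keras' built-in GRU differs in minor details from the formulation on this slide):

    import tensorflow as tf

    # swap the commented line to switch between the two gated units
    recurrent = tf.keras.layers.GRU(100, return_sequences=True)
    # recurrent = tf.keras.layers.LSTM(100, return_sequences=True)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 10_000)),
        recurrent,
        tf.keras.layers.Dense(10_000, activation="softmax"),
    ])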


LSTM in pictures


\tilde{c}^{<t>} = \tanh(w_c [a^{<t-1>}, x^{<t>}] + b_c)

\Gamma_u = \sigma(w_u [a^{<t-1>}, x^{<t>}] + b_u)

\Gamma_f = \sigma(w_f [a^{<t-1>}, x^{<t>}] + b_f)

\Gamma_o = \sigma(w_o [a^{<t-1>}, x^{<t>}] + b_o)

c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}

a^{<t>} = \Gamma_o * \tanh(c^{<t>})
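A minimal numpy sketch of one LSTM step following the equations above (names and shapes are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(a_prev, c_prev, x_t, w_c, w_u, w_f, w_o, b_c, b_u, b_f, b_o):
        ax = np.vstack([a_prev, x_t])          # [a<t-1>, x<t>]
        c_cand = np.tanh(w_c @ ax + b_c)       # candidate memory, c~<t>
        gamma_u = sigmoid(w_u @ ax + b_u)      # update gate
        gamma_f = sigmoid(w_f @ ax + b_f)      # forget gate
        gamma_o = sigmoid(w_o @ ax + b_o)      # output gate
        c_t = gamma_u * c_cand + gamma_f * c_prev
        a_t = gamma_o * np.tanh(c_t)
        return a_t, c_t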


Bidirectional RNN


He said, “Teddy bears are on sale!”
He said, “Teddy Roosevelt was a great President!”

Getting information from the future


Bidirectional RNN (BRNN)


(Diagram: bidirectional RNN over inputs x^{<1>} ... x^{<4>}, with forward activations \vec{a}^{<1>} ... \vec{a}^{<4>}, backward activations \overleftarrow{a}^{<1>} ... \overleftarrow{a}^{<4>}, and outputs \hat{y}^{<1>} ... \hat{y}^{<4>}.)

\hat{y}^{<t>} = g(w_y [\vec{a}^{<t>}, \overleftarrow{a}^{<t>}] + b_y)


Bidirectional RNN (BRNN)


\hat{y}^{<t>} = g(w_y [\vec{a}^{<t>}, \overleftarrow{a}^{<t>}] + b_y)

For example, to compute \hat{y}^{<3>}: the forward activation \vec{a}^{<3>} carries information from x^{<1>}, x^{<2>}, x^{<3>}, and the backward activation \overleftarrow{a}^{<3>} carries information from x^{<4>} (and x^{<3>}), so the prediction at position 3 uses the whole input sequence.
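In Keras a bidirectional layer is obtained by wrapping any recurrent layer; the wrapper runs it forward and backward over the sequence and concatenates the two activations before the output layer (sizes are the same illustrative ones as before):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 10_000)),
        tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(100, return_sequences=True)),  # [a_forward<t>, a_backward<t>]
        tf.keras.layers.Dense(10_000, activation="softmax"),    # y_hat<t>
    ])

As in the diagram, the whole input sequence must be available before any \hat{y}^{<t>} can be computed.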


Deep RNNs


(Diagram: a deep RNN with several stacked recurrent layers; the top layer's activations produce the outputs \hat{y}^{<1>} ... \hat{y}^{<4>}.)
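A hedged Keras sketch of such a deep RNN: each recurrent layer returns its full sequence of activations so the layer above can consume it (three GRU layers and the layer sizes are an arbitrary illustrative choice):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 10_000)),
        tf.keras.layers.GRU(100, return_sequences=True),   # layer 1: a[1]<t>
        tf.keras.layers.GRU(100, return_sequences=True),   # layer 2: a[2]<t>
        tf.keras.layers.GRU(100, return_sequences=True),   # layer 3: a[3]<t>
        tf.keras.layers.Dense(10_000, activation="softmax"),
    ])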


References

• https://www.coursera.org/specializations/deep-learning