TRANSCRIPT
Deep Learning Srihari
Encoder-Decoder Sequence-to-Sequence Architectures
Sargur [email protected]
This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/CSE676
Topics in Sequence Modeling
• Overview
1. Unfolding Computational Graphs
2. Recurrent Neural Networks
3. Bidirectional RNNs
4. Encoder-Decoder Sequence-to-Sequence Architectures
5. Deep Recurrent Networks
6. Recursive Neural Networks
7. The Challenge of Long-Term Dependencies
8. Echo-State Networks
9. Leaky Units and Other Strategies for Multiple Time Scales
10. LSTM and Other Gated RNNs
11. Optimization for Long-Term Dependencies
12. Explicit Memory
Previously seen RNNs
1. An RNN that maps an input sequence to a fixed-size vector
2. An RNN that maps a fixed-size vector to a sequence
3. An RNN that maps an input sequence to an output sequence of the same length
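The three mapping patterns above can be sketched with a toy NumPy recurrence h(t) = tanh(W h(t-1) + U x(t)). This is a hypothetical illustration (not code from the slides); all names, dimensions, and the lack of any output layer or training are simplifying assumptions.

```python
import numpy as np

# Toy illustration (not from the slides) of the three RNN mapping patterns,
# sharing one recurrence: h(t) = tanh(W h(t-1) + U x(t)).
rng = np.random.default_rng(0)
d_h, d_x = 4, 3                              # hidden and input sizes (arbitrary)
W = rng.standard_normal((d_h, d_h)) * 0.1    # hidden-to-hidden weights
U = rng.standard_normal((d_h, d_x)) * 0.1    # input-to-hidden weights

def step(h, x):
    return np.tanh(W @ h + U @ x)

# 1. Input sequence -> fixed-size vector: keep only the final hidden state.
def seq_to_vector(xs):
    h = np.zeros(d_h)
    for x in xs:
        h = step(h, x)
    return h

# 2. Fixed-size vector -> sequence: feed the same vector as input every step.
def vector_to_seq(c, n_steps):
    h, hs = np.zeros(d_h), []
    for _ in range(n_steps):
        h = step(h, c)
        hs.append(h)
    return hs

# 3. Input sequence -> output sequence of the same length:
#    emit one hidden state per input step.
def seq_to_seq(xs):
    h, hs = np.zeros(d_h), []
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs
```

In pattern 1 the sequence length is set by the input; in pattern 2 it must be chosen by the caller; in pattern 3 input and output lengths coincide by construction, which is exactly the restriction the next slide lifts.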
RNN when input and output are not the same length
• Here we consider how an RNN can be trained to map an input sequence to an output sequence that is not necessarily the same length
• This comes up in applications such as speech recognition, machine translation, or question answering, where the input and output sequences in the training set are generally not of the same length (although their lengths might be related)
An Encoder-Decoder or Sequence-to-Sequence RNN
Learns to generate an output sequence (y(1), ..., y(n_y)) given an input sequence (x(1), ..., x(n_x))
It consists of an encoder RNN that reads an input sequence and a decoder RNN that generates the output sequence (or computes the probability of a given output sequence)
The final hidden state of the encoder RNN is used to compute a fixed-size context C, which represents a semantic summary of the input sequence and is given as input to the decoder
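The architecture described above can be sketched as a toy NumPy forward pass: the encoder's final hidden state serves directly as the context C, and the decoder receives C at every step. This is a hypothetical, untrained sketch (not code from the slides); the weight names, the choice of using the final encoder state as C without further transformation, and the absence of an output distribution or teacher forcing are all simplifying assumptions.

```python
import numpy as np

# Hypothetical encoder-decoder (seq2seq) forward pass, illustrating that the
# output length n_y need not equal the input length n_x.
rng = np.random.default_rng(1)
d_x, d_h, d_y = 3, 4, 2
W_enc = rng.standard_normal((d_h, d_h)) * 0.1  # encoder hidden-to-hidden
U_enc = rng.standard_normal((d_h, d_x)) * 0.1  # encoder input-to-hidden
W_dec = rng.standard_normal((d_h, d_h)) * 0.1  # decoder hidden-to-hidden
C_dec = rng.standard_normal((d_h, d_h)) * 0.1  # context-to-hidden
V_dec = rng.standard_normal((d_y, d_h)) * 0.1  # hidden-to-output

def encode(xs):
    """Read the whole input sequence; the final hidden state is the context C."""
    h = np.zeros(d_h)
    for x in xs:
        h = np.tanh(W_enc @ h + U_enc @ x)
    return h

def decode(context, n_steps):
    """Generate n_steps outputs, conditioning every step on the context C."""
    h, ys = np.zeros(d_h), []
    for _ in range(n_steps):
        h = np.tanh(W_dec @ h + C_dec @ context)
        ys.append(V_dec @ h)
    return ys

xs = [rng.standard_normal(d_x) for _ in range(5)]  # input length n_x = 5
ys = decode(encode(xs), 3)                         # output length n_y = 3
```

Note the asymmetry: everything the decoder knows about the 5-step input must pass through the single fixed-size vector C, which is the bottleneck that later motivates attention mechanisms.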