TRANSCRIPT
Sequence Models
Machine Learning: Jordan Boyd-Graber, University of Maryland
LSTM VARIANTS
GRU simplifies slightly
• No explicit memory cell
• Input and forget gates coupled into a single update gate
• Slightly fewer parameters
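To put a number on "slightly fewer parameters", here is a minimal sketch assuming PyTorch (the library and the sizes 128/256 are illustrative choices, not from the lecture). Per layer the LSTM learns four weight blocks (input, forget, and output gates plus the cell candidate) while the GRU learns three (reset, update, candidate), so the GRU lands at roughly three quarters of the LSTM's parameter count:

    import torch.nn as nn

    lstm = nn.LSTM(input_size=128, hidden_size=256)  # 4 weight blocks per layer
    gru = nn.GRU(input_size=128, hidden_size=256)    # 3 weight blocks per layer

    def count(m):
        return sum(p.numel() for p in m.parameters())

    print("LSTM parameters:", count(lstm))  # 395264
    print("GRU parameters: ", count(gru))   # 296448 (about 3/4 of the LSTM)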
What's the most important part of the LSTM?
Greff et al. explore:
• No Input Gate (NIG)
• No Forget Gate (NFG)
• No Output Gate (NOG)
• No Input Activation Function (NIAF)
• No Output Activation Function (NOAF)
• No Peepholes (NP)
• Coupled Input and Forget Gate (CIFG): as in the GRU, f_t = 1 - i_t
• Full Gate Recurrence (FGR): as in the original LSTM paper
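To make the CIFG row concrete, here is a sketch of one step of a coupled-gate cell, assuming PyTorch; the function and weight names are hypothetical, not Greff et al.'s code. The only change from a standard (peephole-free) LSTM step is that the forget gate is tied to the input gate, f_t = 1 - i_t, instead of being learned separately:

    import torch

    def cifg_step(x, h, c, W_i, b_i, W_o, b_o, W_g, b_g):
        # One step of an LSTM cell with Coupled Input and Forget Gate (CIFG).
        z = torch.cat([x, h], dim=-1)      # concatenated input and hidden state
        i = torch.sigmoid(z @ W_i + b_i)   # input gate
        f = 1.0 - i                        # forget gate tied to input gate: f_t = 1 - i_t
        o = torch.sigmoid(z @ W_o + b_o)   # output gate
        g = torch.tanh(z @ W_g + b_g)      # input activation (candidate update)
        c_new = f * c + i * g              # memory cell mixes old and new content
        h_new = o * torch.tanh(c_new)      # hidden state exposed to the next layer
        return h_new, c_new

Restoring a separately learned forget gate recovers the standard cell; the other ablations in the list similarly delete or pin exactly one of these gates or activations.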
Bi-directional LSTMs
Simple extension; often slightly improves performance (but doesn't always make sense for the task)
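As a sketch of the extension, assuming PyTorch (sizes illustrative): one LSTM reads the sequence left to right, a second reads it right to left, and their hidden states are concatenated, so the output feature size doubles.

    import torch
    import torch.nn as nn

    bilstm = nn.LSTM(input_size=128, hidden_size=256, bidirectional=True)
    x = torch.randn(50, 8, 128)   # (seq_len, batch, input_size)
    out, (h, c) = bilstm(x)
    print(out.shape)              # torch.Size([50, 8, 512]): forward + backward

The caveat on the slide follows from the mechanics: the right-to-left pass needs the whole sequence before it can start, so bi-directionality fits tagging or classifying complete sequences, but not online prediction or left-to-right generation.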
Comparing architectures
• GRUs seem competitive
• LSTM seems to be a good tradeoff
• Bi-directional often offers a slight improvement