powerpoint presentation€¦ · title: powerpoint presentation author: rohan gupta created date:...
TRANSCRIPT
![Page 1: PowerPoint Presentation€¦ · Title: PowerPoint Presentation Author: Rohan Gupta Created Date: 4/17/2018 3:44:45 PM](https://reader035.vdocuments.us/reader035/viewer/2022062415/6027c5378dddcb033b4c1ded/html5/thumbnails/1.jpg)
End-To-End Memory Networks
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
Dept. of Computer Science, Courant Institute, NYU &
Facebook AI Research, New York
Outline
• Motivation
• Model
• Experiments
• Results
• Conclusion
Motivation
• Build a model that can perform multiple computational steps to answer a question.
• Build a model that captures dependencies in sequential data.
  – i.e., sequential reasoning
• Lightweight and easily trainable
Motivation over MemNN
• End-to-end trainable
• Far less supervision
• More generalizable
Overview of Model
• Variables:
  – A discrete set of inputs ($x_i$)
  – A query ($q$)
  – An answer to produce ($a$)
• Static memory bank
• Multiple hops
$p_i = \mathrm{Softmax}(u^T m_i)$
$o = \sum_i p_i c_i$
$\hat{a} = \mathrm{Softmax}(W(o + u))$
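The three formulas on this slide can be traced with a small pure-Python sketch. It assumes the query embedding u, the memory vectors m_i, and the output vectors c_i have already been computed; the function name `single_hop`, the helper names, and all toy values are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def single_hop(u, memory, outputs, W):
    """One memory hop; all arguments are hypothetical plain-list stand-ins.

    u: query embedding (length d); memory: vectors m_i; outputs: vectors c_i;
    W: answer matrix with one row of length d per candidate answer.
    """
    # Attention over memory slots: p_i = Softmax(u^T m_i)
    p = softmax([dot(u, m_i) for m_i in memory])
    # Weighted read from the output vectors: o = sum_i p_i c_i
    o = [sum(p_i * c_i[k] for p_i, c_i in zip(p, outputs))
         for k in range(len(u))]
    # Answer distribution: Softmax(W(o + u))
    summed = [o_k + u_k for o_k, u_k in zip(o, u)]
    a_hat = softmax([dot(row, summed) for row in W])
    return p, o, a_hat

# Toy values, made up for illustration only.
u = [1.0, 0.0]
memory = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p, o, a_hat = single_hop(u, memory, outputs, W)
print(p, o, a_hat)
```

In this toy setup the first memory slot aligns with the query, so it receives the largest attention weight and dominates the read vector o.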
Weight Tying
• Adjacent:
  – $A^{k+1} = C^k$
  – $W^T = C^K$
  – $B = A^1$
• Layer-wise (RNN-like):
  – $A^1 = \dots = A^K$, $C^1 = \dots = C^K$
  – $u^{k+1} = H u^k + o^k$
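Both tying schemes can be sketched as ways of sharing a few embedding matrices across K hops. This is a minimal sketch under stated assumptions: embedding matrices are d x V lists of rows, sentences are lists of word ids, and the names `adjacent_tying`, `layerwise_tying`, and `run_hops` are hypothetical. When `H` is omitted the update falls back to the plain sum of u and o, which is an assumption here because the slide states the update only for the layer-wise case.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transpose(M):
    return [list(col) for col in zip(*M)]

def embed(E, bag):
    """Sum the columns of E (d x V, list of rows) picked out by the word ids in bag."""
    return [sum(row[w] for w in bag) for row in E]

def adjacent_tying(E):
    """E holds K+1 embedding matrices shared between neighbouring hops."""
    K = len(E) - 1
    A = E[:K]                 # A^1 .. A^K
    C = E[1:]                 # C^1 .. C^K, so A^{k+1} and C^k are the same matrix
    B = A[0]                  # B = A^1
    W = transpose(C[-1])      # W^T = C^K
    return A, C, B, W

def layerwise_tying(A1, C1, K):
    """Reuse one input and one output embedding at every hop."""
    return [A1] * K, [C1] * K

def run_hops(u, sentences, A, C, H=None):
    """Apply one hop per (A^k, C^k) pair and return the final controller state."""
    for A_k, C_k in zip(A, C):
        m = [embed(A_k, s) for s in sentences]
        c = [embed(C_k, s) for s in sentences]
        p = softmax([dot(u, m_i) for m_i in m])
        o = [sum(p_i * c_i[j] for p_i, c_i in zip(p, c)) for j in range(len(u))]
        Hu = u if H is None else [dot(row, u) for row in H]
        u = [a + b for a, b in zip(Hu, o)]   # u^{k+1} = H u^k + o^k
    return u

# Toy setup (made-up numbers): vocabulary of 3 words, d = 2, K = 2 hops.
E = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.3, 0.1, 0.2], [0.2, 0.4, 0.1]],
    [[0.5, 0.0, 0.1], [0.1, 0.3, 0.2]],
]
sentences = [[0, 1], [2]]
query = [1]

A, C, B, W = adjacent_tying(E)
u_final = run_hops(embed(B, query), sentences, A, C)
a_hat = softmax([dot(row, u_final) for row in W])

A_l, C_l = layerwise_tying(E[0], E[1], 3)
H = [[1.0, 0.0], [0.0, 1.0]]  # identity, used here only as a placeholder for a learned H
u_layer = run_hops(embed(E[0], query), sentences, A_l, C_l, H=H)
print(a_hat, u_layer)
```

Adjacent tying needs K+1 distinct embedding matrices, while layer-wise tying needs only two plus the linear map H, which is what gives it the RNN-like flavor.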
Sentence Representation
• Bag-of-words
  – $m_i = \sum_j A x_{ij}$
• Position Encoding (PE)
  – $m_i = \sum_j l_j \cdot A x_{ij}$
• Temporal Encoding (TE)
  – $m_i = \sum_j A x_{ij} + T_A(i)$
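The three sentence representations can be compared with a short sketch. The slide does not define the position weights l_j; the sketch assumes the form given in the original paper, l_kj = (1 - j/J) - (k/d)(1 - 2j/J) with 1-based j and k, applied element-wise. The function names, the toy embedding matrix A, and the toy temporal matrix T_A are hypothetical.

```python
def embed_word(A, w):
    """Column w of A (d x V, list of rows), i.e. A x for a one-hot x."""
    return [row[w] for row in A]

def bag_of_words(A, sentence):
    """m_i = sum_j A x_ij: word order is ignored."""
    m = [0.0] * len(A)
    for w in sentence:
        m = [acc + v for acc, v in zip(m, embed_word(A, w))]
    return m

def position_weights(J, d):
    """Assumed form from the original paper: l_kj = (1 - j/J) - (k/d)(1 - 2j/J)."""
    return [[(1 - j / J) - (k / d) * (1 - 2 * j / J) for k in range(1, d + 1)]
            for j in range(1, J + 1)]

def position_encoding(A, sentence):
    """m_i = sum_j l_j * A x_ij, with an element-wise product per word position."""
    d, J = len(A), len(sentence)
    l = position_weights(J, d)
    m = [0.0] * d
    for j, w in enumerate(sentence):
        col = embed_word(A, w)
        m = [acc + l[j][k] * col[k] for k, acc in enumerate(m)]
    return m

def temporal_encoding(A, T_A, sentence, i):
    """m_i = sum_j A x_ij + T_A(i), where T_A(i) is row i of a temporal matrix."""
    return [b + t for b, t in zip(bag_of_words(A, sentence), T_A[i])]

# Toy values, made up for illustration: d = 2, vocabulary of 3 words.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
T_A = [[0.1, 0.1], [0.2, 0.2]]

bow_fwd = bag_of_words(A, [0, 2])
bow_rev = bag_of_words(A, [2, 0])
pe_fwd = position_encoding(A, [0, 2])
pe_rev = position_encoding(A, [2, 0])
te_first = temporal_encoding(A, T_A, [0, 2], 0)
te_second = temporal_encoding(A, T_A, [0, 2], 1)
print(bow_fwd, bow_rev, pe_fwd, pe_rev, te_first, te_second)
```

Swapping the two words leaves the bag-of-words vector unchanged but changes the position-encoded one, and the temporal term separates identical sentences stored at different memory indices.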
Synthetic QA Experiments
Similarity to Attention
NOTE: This model does not use the “support” label during training
Results
Language Modeling
Conclusion
• Outperforms all baselines trained with the same level of supervision (e.g., LSTMs).
• Slightly worse than a strongly supervised Memory Network, but it is trained without supporting-fact labels, so it can be applied more easily in general settings.
• On language modeling, outperforms RNNs and LSTMs.