![Page 1: Area Attention11-16-00... · 2019-06-08 · Features of Each Area original memory area memory query 1x1 areas 1x2 areas 2x1 areas 2x2 areas Area Features Mean Sum Max Standard deviation](https://reader036.vdocuments.us/reader036/viewer/2022070915/5fb608db789888365a487d3c/html5/thumbnails/1.jpg)
Area Attention
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
Google Research
Neural Attentional Mechanisms
[Figure: a query attends over memory items (k1, v1), (k2, v2), (k3, v3), ..., (k|M|, v|M|) with attention weights a1, a2, a3, ..., a|M|.]
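The diagram can be read as the following minimal sketch of attention over a memory of key-value pairs. Dot-product scoring and the helper names (`softmax`, `dot`, `attend`) are assumptions made for illustration; the slide does not fix a scoring function.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Return (weights, output): one weight a_i per memory item (k_i, v_i)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Toy memory with |M| = 3 items (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, output = attend([0.0, 0.0], keys, values)
```

With an all-zero query every score is equal, so the weights are uniform and the output is the plain average of the values.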
Neural Machine Translation
[Figure: source tokens A B C D, EOS, target tokens X Y; attention weights a1, a2, a3, a4.]
Bahdanau, Cho & Bengio, ICLR’15
Luong, Pham & Manning, ACL’15
Image Captioning
[Figure: image grid cells as the memory; EOS, output tokens X Y; attention weights a1, a2, a3, a4.]
Xu, Ba, Kiros, Cho, Courville, Salakhutdinov, Zemel & Bengio, ICML’15
Sharma, Ding, Goodman & Soricut, ACL’18
Attention-Based Architectures
[Figure: Transformer over input tokens A B C D.]
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser & Polosukhin, NIPS’17
Limitations
The unit of attention is predetermined rather than learned.
[Figure: a query attends over memory items (k1, v1), (k2, v2), (k3, v3), ..., (k|M|, v|M|) with attention weights a1, a2, a3, ..., a|M|.]
Word: Airlines began charging for the first and second checked bags
Character: A r e  y o u  a t  h o m e ?
Image Grid Cell
Research Goal
Enable a model to attend to information at varying granularity. The unit of attention emerges from learning.
Characters to Words: A r e  y o u  a t  h o m e ?
Words to Phrases: Airlines began charging for the first and second checked bags
Grid cells to Objects
1D Area Attention
[Figure: the original memory is expanded into an area memory of 1-item areas, 2-item areas, and 3-item areas; the query attends over the area memory.]
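One way to picture the 1D area memory is to enumerate every contiguous span up to a maximum width. The sketch below is illustrative only: the function name `areas_1d`, the `(start, width)` encoding, and the memory length of 4 are assumptions, not details taken from the slides.

```python
def areas_1d(length, max_width):
    """List (start, width) for every contiguous span of at most max_width items."""
    return [
        (start, width)
        for width in range(1, max_width + 1)
        for start in range(length - width + 1)
    ]

# A memory of 4 items with areas of up to 3 items:
# 4 one-item areas, 3 two-item areas, 2 three-item areas.
areas = areas_1d(length=4, max_width=3)
```

Each span becomes one entry in the area memory, so the query can put weight on a single item or on a whole run of adjacent items.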
2D Area Attention
[Figure: the original 2D memory is expanded into an area memory of 1x1, 1x2, 2x1, and 2x2 areas; the query attends over the area memory.]
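The 2D case can be sketched the same way by enumerating every rectangle up to a maximum height and width. The name `areas_2d`, the `(top, left, height, width)` encoding, and the 3x3 grid are hypothetical choices for illustration.

```python
def areas_2d(rows, cols, max_height, max_width):
    """List (top, left, height, width) for every rectangle up to the maximum size."""
    return [
        (top, left, height, width)
        for height in range(1, max_height + 1)
        for width in range(1, max_width + 1)
        for top in range(rows - height + 1)
        for left in range(cols - width + 1)
    ]

# A 3x3 grid with areas of up to 2x2 cells.
areas = areas_2d(rows=3, cols=3, max_height=2, max_width=2)

# Count how many areas exist for each shape (height, width).
shape_counts = {}
for _, _, height, width in areas:
    shape_counts[(height, width)] = shape_counts.get((height, width), 0) + 1
```

On this toy grid the shapes are 1x1, 1x2, 2x1, and 2x2, matching the four groups named on the slide.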
Features of Each Area
[Figure, 2D case: the original memory is expanded into an area memory of 1x1, 1x2, 2x1, and 2x2 areas; the query attends over the area memory.]
Area Features
Mean
Sum
Max
Standard deviation
Area shape, e.g., 2x2
[Figure, 1D case: the original memory is expanded into an area memory of 1-item areas, 2-item areas, and 3-item areas; the query attends over the area memory.]
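The features listed above can be computed per area and then used to build the area memory that the query attends over. This is a minimal sketch under stated assumptions: it uses the mean of the item keys as the area key and the sum of the item values as the area value, scores with a dot product, and computes every feature by brute force. The helper names are hypothetical, and none of this reproduces the tensor2tensor implementation.

```python
import math

def area_features(vectors):
    """Per-dimension mean, sum, max, and (population) standard deviation of an area."""
    n = len(vectors)
    dims = range(len(vectors[0]))
    total = [sum(v[d] for v in vectors) for d in dims]
    mean = [t / n for t in total]
    peak = [max(v[d] for v in vectors) for d in dims]
    std = [math.sqrt(sum((v[d] - mean[d]) ** 2 for v in vectors) / n) for d in dims]
    return {"mean": mean, "sum": total, "max": peak, "std": std, "size": n}

def area_memory_1d(keys, values, max_width):
    """Build one (key, value) pair per contiguous span of at most max_width items."""
    area_keys, area_values = [], []
    for width in range(1, max_width + 1):
        for start in range(len(keys) - width + 1):
            span = slice(start, start + width)
            area_keys.append(area_features(keys[span])["mean"])    # assumed: key = mean
            area_values.append(area_features(values[span])["sum"])  # assumed: value = sum
    return area_keys, area_values

def attend(query, keys, values):
    """Dot-product attention: softmax over query-key scores, weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Toy memory of 3 items, areas of up to 2 items (illustrative numbers only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
area_keys, area_values = area_memory_1d(keys, values, max_width=2)
weights, output = attend([0.0, 0.0], area_keys, area_values)
feats = area_features([[1.0, 0.0], [3.0, 4.0]])
```

The attention step itself is unchanged; only the memory is replaced by its area expansion, which is why the same query can weight single items and multi-item areas side by side.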
Area Attention Consistently Improves upon Transformer & LSTM
[Result panels: Transformer Machine Translation; LSTM Machine Translation; Transformer Image Captioning]
Area Attention
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
Google Research
Poster session: Tue Jun 11th, 06:30 to 09:00 PM @ Pacific Ballroom #27
Source code: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/layers/area_attention.py