Large Pretrained Models - nlp.cs.hku.hk
TRANSCRIPT

[Slide 1]
Lingpeng Kong
Department of Computer Science, The University of Hong Kong
Many materials from Stanford CS224n with special thanks!
Large Pretrained Models — COMP3361, Week 9
[Slide 2]
Pretrained Models in the Past Four Years
Microsoft Research Blog. Oct 6, 2021.
[Slide 3]
Pretrained Models in the Past Four Years
Microsoft Research Blog. Oct 6, 2021.
[Slide 4]
Pretrained Models are Expensive

One single training run:
552 metric tons of carbon dioxide (the annual emissions of roughly 120 cars)
$12 million
[Slide 5]
Pretraining and Contextualized Word Representations

[Figure: a Transformer over "[CLS] I feel like eating [MASK] [SEP] What [MASK] you want ? [SEP]" with NSP and MLM heads]

$\mathbb{E}_{p(x_i,\hat{x}_i)}\big[\,p(x_i \mid \hat{x}_i)\,\big]$
[Slide 6]
Pretraining and Contextualized Word Representations
Jurassic Park lacks the emotional unity of Spielberg’s classics .
Neural Network Encoder (LSTMs, Transformers, etc.)
contextualized word representation
Implicit linguistic knowledge
[Slide 7]
Pretraining and Fine-tuning
Jurassic Park lacks the emotional unity of Spielberg’s classics .
Neural Network Encoder (LSTMs, Transformers, etc.)
hundreds of millions of parameters
MLP Layer
Hundreds of parameters
$7,079
[Slide 8]
Key Elements in BERT

Masked Language Modeling (MLM), Next Sentence Prediction (NSP) — pretraining objective
Transformer — neural representation learner
Bidirectional Encoder — type of architecture
[Slide 9]
Neural Representation Learners

Transformer: BERT, GPT-2, GPT-3, BART, T5, XLNet, …
LSTM: ELMo
[Slide 10]
Why Transformers?

[Figure: computing blocks 1, 2, …, i, each applying an FFN over inputs x_t, x_{t+1}, …]
[Slide 11]
Why Transformers?

[Figure: a recurrent cell A applied step by step, producing hidden states h_{t−1}, h_t, h_{t+1} from inputs x_{t−1}, x_t, x_{t+1}]
[Slide 12]
Why Transformers?
self-attention
Direct pairwise interaction between any two tokens in the sequence
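That pairwise interaction can be made concrete with scaled dot-product attention. A minimal plain-Python sketch (toy 2-d vectors, no framework; the function name and inputs are illustrative):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention: every query attends to every key,
    so any two positions interact directly in a single step."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                       # stabilized softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three toy token vectors; each output mixes information from all positions.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(x, x, x)
```

Each output row is a convex combination of all value vectors, which is exactly the "direct interaction between any two tokens" an RNN lacks.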
[Slide 13]
Pretraining Objective
training instance (MLM):
x: I feel like eating <MASK> today. What <MASK> you want to eat?
y: noodles, do

training instance (NSP):
x: I feel like eating <MASK> today. ||| What <MASK> you want to eat?
y: True
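Such MLM instances can be generated automatically from raw text — no human labeling involved. A minimal sketch (the mask rate and helper name are illustrative; BERT's actual recipe also sometimes keeps or randomly replaces the chosen tokens instead of always masking):

```python
import random

MASK = "<MASK>"

def make_mlm_instance(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with <MASK>; the originals
    become the labels y, so both x and y come from the raw text."""
    rng = rng or random.Random(0)
    x, y = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            x.append(MASK)
            y.append(tok)            # the model must recover this token
        else:
            x.append(tok)
    if not y:                        # guarantee at least one target
        i = rng.randrange(len(tokens))
        y.append(x[i])
        x[i] = MASK
    return x, y

sent = "I feel like eating noodles today".split()
x, y = make_mlm_instance(sent, rng=random.Random(3))
```

Filling the masks in `x` with the labels `y`, in order, reconstructs the original sentence.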
[Slide 14]
Pretraining Objective
What makes a good pretraining objective?
1. No human labeling should be involved.
2. Leads to good representations. (How and why?)
[Slide 15]
Mutual Information

$I(A, B) = H(A) - H(A \mid B) = H(B) - H(B \mid A).$
Goal of Training:
$I(A, B) \;\ge\; \mathbb{E}_{p(a,b)}\Big[ f_\theta(a, b) \;-\; \mathbb{E}_{q(\tilde{B})}\Big[ \log \sum_{\tilde{b} \in \tilde{B}} \exp f_\theta(a, \tilde{b}) \Big] \Big] + \log |\tilde{B}|$
$\mathbb{E}_{p(a,b)}\Big[ f_\theta(a, b) - \log \sum_{\tilde{b} \in \tilde{B}} \exp f_\theta(a, \tilde{b}) \Big].$

Cross Entropy (Softmax)
$f_\theta(a, b) = g_\psi(b)^\top g_\omega(a)$, where $\theta = \{\omega, \psi\}$
InfoNCE (Logeswaran & Lee, 2018; van den Oord et al., 2019)
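With a dot-product critic, the InfoNCE objective is just a softmax cross-entropy in which the positive pair must outscore the negative candidates. A plain-Python sketch (toy 2-d vectors standing in for encoder outputs g_ω(a), g_ψ(b); all names illustrative):

```python
import math

def info_nce_loss(a_vec, b_pos, b_candidates):
    """-[f(a,b) - log sum over candidates of exp f(a, b~)]:
    cross entropy where the positive b must beat every candidate."""
    def f(u, v):  # f_theta(a, b) = g_psi(b)^T g_omega(a); encoders already applied
        return sum(ui * vi for ui, vi in zip(u, v))
    pos = f(a_vec, b_pos)
    log_z = math.log(sum(math.exp(f(a_vec, c)) for c in b_candidates))
    return log_z - pos   # minimized when the positive dominates the softmax

a = [1.0, 0.0]
good = [1.0, 0.0]        # aligned with a  -> low loss
bad = [-1.0, 0.0]        # anti-aligned    -> high loss
cands = [good, bad]
assert info_nce_loss(a, good, cands) < info_nce_loss(a, bad, cands)
```

Minimizing this loss pushes the true pair's score up relative to the candidate set, which is what tightens the mutual-information bound above.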
[Slide 16]
Mutual Information

$\mathbb{E}_{p(a,b)}\Big[ f_\theta(a, b) - \log \sum_{\tilde{b} \in \tilde{B}} \exp f_\theta(a, \tilde{b}) \Big].$

Cross Entropy (Softmax)

[Figure: two views a and b, with candidate words "Hope", "Fear", …]
[Slide 17]
Masked Language Modeling

$g_\omega(a)$ — View a: corrupted context of word i
$g_\psi(b)$ — View b: word i

[Figure: a Transformer over "What [MASK] you want ?"; the masked word is "do"]
[Slide 18]
Next Sentence Prediction

[Figure: a Transformer over "[CLS] I feel like eating ramen [SEP] What do you want ? [SEP]" with an NSP head]

Binary Classification — "local" NCE (Gutmann and Hyvärinen, 2012)

"global" NCE — [Figure: $|\tilde{B}|$ parallel Transformers, each encoding one "[CLS] … [SEP]" candidate]
[Slide 19]
Connections with Computer Vision
Deep InfoMax (DIM; Hjelm et al., 2019)
[Slide 20]
Type of Architecture

Encoders
Encoder-Decoders
Decoders
Parameters are what we get from the pretraining process.
Pros for the “encoders” architecture:
Gets bidirectional context.
Easy to use in language understanding tasks!
Other members in the family:
[Slide 21]
BERT for Understanding
BERT
<CLS> This must be the greatest movie ever !
Positive / Negative
[Slide 22]
BERT for Generation
[Figure: BERT fed nine <MASK> tokens; it predicts the first word, "What"]
[Slide 23]
BERT for Generation
[Figure: BERT fed "What" followed by eight <MASK> tokens; it predicts the next word, "do"]
Input has been changed. The representations will need to be recomputed!
Not a very good idea…
BERT
[Slide 24]
Pretrained Models

— pretraining objective
— neural representation learner
— type of architecture

$\mathbb{E}_{p(x_i,\hat{x}_i)}\big[\,p(x_i \mid \hat{x}_i)\,\big]$
[Slide 25]
GPT (Generative Pretrained Transformer)
Radford et al., 2018
Decoders
[Slide 26]
Transformer as Decoder

Happy mid autumn festival

Need to prevent attending to the future words.

[Figure: causal attention — given "<s> Happy mid autumn", the decoder predicts "Happy mid autumn festival"; each position attends only to itself and earlier positions]
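The constraint can be implemented as a causal (lower-triangular) mask over the attention scores. A minimal sketch:

```python
def causal_mask(n):
    """mask[i][j] is True iff position i may attend to position j:
    itself and earlier positions only, never the future."""
    return [[j <= i for j in range(n)] for i in range(n)]

# For "<s> Happy mid autumn": position 0 sees only <s>,
# while position 3 sees the whole prefix.
m = causal_mask(4)
```

In practice the masked-out scores are set to −∞ before the softmax, so future positions receive zero attention weight.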
[Slide 27]
GPT (Generative Pretrained Transformer)
Radford et al., 2018
[Figure: given the Previous Context, the Transformer predicts the Next Word; tokens come from a lookup table (embedding matrix)]

GIF credit: Lena Voita
[Slide 28]
GPT for Understanding
GPT
This must be the greatest movie ever !
Positive / Negative
[Slide 29]
GPT for Generation
GPT
This must be the greatest movie
ever
[Slide 30]
GPT for Generation
GPT
This must be the greatest movie ever
!
Just “grow” the transformer!
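"Growing" the sequence is just a loop: feed the prefix, pick the most likely next word, append it, repeat. A sketch with a hypothetical `next_word_scores` function standing in for the trained Transformer (the canned score table below is a toy stand-in, not a real model):

```python
def greedy_decode(prefix, next_word_scores, max_new=5, eos="<eos>"):
    """Autoregressive generation: repeatedly append the arg-max next word."""
    tokens = list(prefix)
    for _ in range(max_new):
        scores = next_word_scores(tokens)   # dict: word -> score
        nxt = max(scores, key=scores.get)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens

# Toy stand-in for the model: finishes the slide's example, then stops.
CANNED = {("This", "must", "be", "the", "greatest", "movie"): "ever",
          ("This", "must", "be", "the", "greatest", "movie", "ever"): "!"}

def toy_scores(tokens):
    word = CANNED.get(tuple(tokens), "<eos>")
    return {word: 1.0, "<eos>": 0.5 if word != "<eos>" else 1.0}

out = greedy_decode("This must be the greatest movie".split(), toy_scores)
# out == ["This", "must", "be", "the", "greatest", "movie", "ever", "!"]
```

Real decoders also sample or use beam search instead of the arg-max, but the grow-by-one-token loop is the same.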
[Slide 31]
T5 (Text-to-Text Transfer Transformer)
Raffel et al., 2020
Encoder-Decoders
[Slide 32]
T5 (Text-to-Text Transfer Transformer)
Raffel et al., 2020

Input: Thank you <X> me to your party <Y> week.
Target: <X> for inviting <Y> last <Z>
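Span corruption can be sketched directly from this example: drop chosen spans from the input, replace each with a fresh sentinel token, and have the target spell the spans out, closed by a final sentinel. (The span positions below are hand-picked to reproduce the slide's example; T5 samples them randomly.)

```python
SENTINELS = ["<X>", "<Y>", "<Z>"]

def span_corrupt(tokens, spans):
    """spans: sorted, non-overlapping (start, end) index pairs.
    Returns (input_tokens, target_tokens) in T5's sentinel format."""
    inp, tgt, prev = [], [], 0
    for sid, (start, end) in enumerate(spans):
        inp += tokens[prev:start] + [SENTINELS[sid]]
        tgt += [SENTINELS[sid]] + tokens[start:end]
        prev = end
    inp += tokens[prev:]
    tgt += [SENTINELS[len(spans)]]   # closing sentinel
    return inp, tgt

toks = "Thank you for inviting me to your party last week .".split()
inp, tgt = span_corrupt(toks, [(2, 4), (8, 9)])
# inp: Thank you <X> me to your party <Y> week .
# tgt: <X> for inviting <Y> last <Z>
```

Because the target only spells out the dropped spans, the decoder's job stays short even when the input is long.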
[Slide 33]
T5 (Text-to-Text Transfer Transformer)
Raffel et al., 2020
[Slide 34]
T5 (Text-to-Text Transfer Transformer)
Raffel et al., 2020
[Slide 35]
ELMo (Embeddings from Language Models)
Encoders
Bidirectional Language Model
Peters et al., 2018
[Slide 36]
ELMo (Embeddings from Language Models)
[Slide 37]
BART (Denoising Sequence-to-Sequence Pre-training)
Lewis et al., 2019
Encoder-Decoders
[Slide 38]
BART (Denoising Sequence-to-Sequence Pre-training)
Lewis et al., 2019
[Slide 39]
InfoWord
Kong et al., 2019
[Figure: one Transformer encodes the Global View; several Transformers encode Local Views, scored as "Real" vs "Fake"]