Deep Learning & Natural Language Processing
Embeddings, CNNs, RNNs, etc.
Julius B. Kirkegaard, 2019
Snippets at: https://bit.ly/2VWvs3m
Deep Neural Networks & PyTorch
Deep neural networks
Neural networks
Network architecture
Parameters
Data (perhaps preprocessed)
Neural networks
Some matrix
Some non-linear function ("activation function")
Some vector
Deep neural networks
Requirements for DNN Frameworks
• Optimisation of parameters p
• Take first order derivatives
• Chain rule (backpropagation)
• Process large amounts of data fast
• Exploit GPUs
• Nice to haves:
• Standard functions and operations built-in
• Built-in optimizers
• Spread training across network
• Compile for fast inference
• …
PyTorch
• GPU acceleration
• Automatic Error-Backpropagation
(chain rule through operations)
• Tons of functionality built-in
Hard to play with; not good for new ideas and research (IMO)
Easy to play with; difficult to implement custom and dynamic architectures
Requirement 1: Calculate gradients
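PyTorch meets this requirement through autograd: tensors record the operations applied to them, and `backward()` runs the chain rule. A minimal sketch:

```python
import torch

# Tensors with requires_grad=True record every operation applied to them.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2
y.backward()        # backpropagate: dy/dx = 2x
print(x.grad)       # tensor([4., 6.])
```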
Requirement 2: GPU
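Moving work to the GPU is similarly a one-liner in PyTorch; a sketch that falls back to the CPU when no CUDA device is present:

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(1000, 1000, device=device)
b = torch.randn(1000, 1000, device=device)
c = a @ b  # the matrix multiply runs on whichever device the tensors live on
```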
Simple Neural Network: "Guess the Mean"
Neural network in three steps:
1. Design the architecture and initialise the parameters
2. Calculate the loss
3. Update the parameters based on the loss gradient
Warning: this is not the best way to implement it; a better version will follow (this version is for understanding).
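The three steps can be sketched for the "guess the mean" task with a single trainable scalar. This is my own minimal version, not necessarily the code in the linked snippets:

```python
import torch

# 1) architecture + initial parameters: one trainable scalar
p = torch.zeros(1, requires_grad=True)
data = torch.tensor([1.0, 2.0, 3.0, 4.0])

lr = 0.1
for _ in range(200):
    # 2) loss: mean squared distance from the data
    loss = ((p - data) ** 2).mean()
    # 3) update parameters from the loss gradient
    loss.backward()
    with torch.no_grad():
        p -= lr * p.grad
    p.grad.zero_()

print(p.item())  # converges to the mean of the data, 2.5
```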
Better optimiser stepping
• What if some gradients are much smaller than others?
• What happens when gradients vanish as the loss gets small?
Solution → variable learning rates and momentum
• Many algorithms exist; perhaps the most popular is "Adam"
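In PyTorch, swapping in Adam only changes the optimiser line; a sketch on the same "guess the mean" toy problem:

```python
import torch

p = torch.zeros(1, requires_grad=True)
data = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Adam keeps per-parameter adaptive learning rates and momentum.
opt = torch.optim.Adam([p], lr=0.1)

for _ in range(1000):
    opt.zero_grad()                     # clear old gradients
    loss = ((p - data) ** 2).mean()
    loss.backward()
    opt.step()                          # Adam update replaces the manual step
```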
Better optimiser stepping
SGD (Stochastic gradient descent) Adam (Adaptive Moment Estimation)
simple_nn.py module_nn.py
Representing sentences
The Trouble
“Hej med dig”
Bag of Words
“Hej med dig”
“Hej hej”
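A bag-of-words representation can be sketched in a few lines of plain Python; the vocabulary here is just the three example words:

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs; word order is discarded."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["hej", "med", "dig"]
print(bag_of_words("Hej med dig", vocab))  # [1, 1, 1]
print(bag_of_words("Hej hej", vocab))      # [2, 0, 0]
```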
Bag of Words, poor behaviour #1
“I had my car cleaned.”
“I had cleaned my car.” (order ignored)
Bag of Words, poor behaviour #2
“Hej med dig”
“Heej med dig”
“Haj medd dig” (typos)
(semantically similar)
Bag of Words, poor behaviour #3
“Hej med dig”
The idea for a solution
Idea: represent each word as a vector, with similar words having vectors that are close.
Problem: how to choose the vector representing each word?
The idea for a solution
The country was ruled by a _____
The bishop anointed the ____ with aromatic oils
The crown was put on the ____
”Context defines meaning”:
King/Queen
Continuous Bag of Words
• Input is a "one-hot" vector
• We force the network to turn each word into a ~200-length vector
• From these vectors we predict the "focus word"
• When done, keep the "embeddings"
See e.g. https://github.com/FraLotito/pytorch-continuous-bag-of-words/blob/master/cbow.py for a simple implementation
"The bishop anointed the ____ with aromatic oils" (focus word: "queen")
Context / focus word / context
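A stripped-down CBOW network along these lines (the linked repository has a fuller implementation; the sizes here are illustrative):

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    """Predict the focus word from the averaged embeddings of its context."""
    def __init__(self, vocab_size, embedding_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_dim)  # kept afterwards
        self.out = nn.Linear(embedding_dim, vocab_size)

    def forward(self, context):        # context: (batch, n_context_words)
        vectors = self.embed(context)  # (batch, n_context_words, dim)
        mean = vectors.mean(dim=1)     # combine the context vectors
        return self.out(mean)          # one score per word in the dictionary

model = CBOW(vocab_size=4)
scores = model(torch.tensor([[0, 1, 3]]))  # indices of three context words
```

After training, `model.embed.weight` holds the word embeddings and the output layer is discarded.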
Continuous Bag of Words
Sentence: "I think therefore I am"
Focus word: "therefore"; context: "I", "think" and "I", "am"
Dictionary: ["I", "think", "therefore", "am"]
Context size = 2
Continuous Bag of Words
Very simple version:
Continuous Bag of Words
The output is a probability distribution over all words in the dictionary. Dictionaries can exceed 1 million words, so smarter training techniques are typically used: "negative sampling".
Vectors
Word2Vec Vectors
Word2Vec Vectors
King – Man + Woman = Queen
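The analogy arithmetic can be illustrated with toy vectors. Note: the 3-d numbers below are invented purely for illustration; real word2vec embeddings have ~200 dimensions and learned values.

```python
import math

# Invented toy vectors standing in for real embeddings (illustration only).
vec = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman, then find the nearest word by cosine similarity
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max(vec, key=lambda word: cosine(vec[word], target))
print(best)  # "queen"
```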
Pretrained word vectors
• Glove: https://nlp.stanford.edu/projects/glove/
• FastText: https://fasttext.cc/docs/en/crawl-vectors.html
• ELMo: https://github.com/HIT-SCIR/ELMoForManyLangs
These vectors are trained on Wikipedia and "Common Crawl" and can be used as-is or trained further on a specific corpus.
Representing sentences
Using word embeddings, sentences become "pictures":
“I think therefore I am”
5 x 200 matrix
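In PyTorch, an `nn.Embedding` layer performs this lookup; the vocabulary below is a made-up toy example:

```python
import torch
import torch.nn as nn

# One row of 200 numbers per word: the sentence becomes a 5 x 200 matrix.
vocab = {"I": 0, "think": 1, "therefore": 2, "am": 3}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=200)

tokens = torch.tensor([vocab[w] for w in "I think therefore I am".split()])
sentence_matrix = embedding(tokens)
print(sentence_matrix.shape)  # torch.Size([5, 200])
```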
Convolutional Neural Networks
CNNs: Convolutional Neural Networks
The convolution kernel is trainable.
CNNs: Convolutional Neural Networks
Padded with zeros
CNNs: Convolutional Neural Networks
Padded with zeros,
Stride = 2
CNNs: Convolutional Neural Networks
Kernels = Filters = Features in CNN language
Pooling
Max-pooling 3x3
Pooling = Subsampling in CNN language
CNNs: Convolutional Neural Networks
Text Classification
Standard choices:
• Convolutional Neural Networks
• Recurrent Neural Networks (LSTMs)
Classification using CNN
See e.g. https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/4%20-%20Convolutional%20Sentiment%20Analysis.ipynb
1D convolutions with 2D filters
(embedding size x kernel size)
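A sketch of such a convolution-plus-pooling block; the dimensions are illustrative, and the linked notebook has a complete model:

```python
import torch
import torch.nn as nn

# 1D convolution over a sentence: in_channels is the embedding size,
# kernel_size is how many consecutive words each filter sees.
embedding_dim, n_filters, kernel_size = 200, 100, 3
conv = nn.Conv1d(in_channels=embedding_dim, out_channels=n_filters,
                 kernel_size=kernel_size)

sentence = torch.randn(1, embedding_dim, 5)  # (batch, channels, words)
features = conv(sentence)                    # (1, 100, 5 - 3 + 1)
pooled = features.max(dim=2).values          # max-pool over word positions
print(pooled.shape)  # torch.Size([1, 100]) -> feed to a classifier layer
```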
Recurrent Neural Networks
Language Modelling
Hi mom, I’ll be late for …
Neural networks
Network architecture
Parameters
Data (perhaps preprocessed)
Recurrent neural networks
Network architecture
Parameters
Data (perhaps preprocessed)
Hidden state
What are recurrent neural networks?
Example (a "classic" RNN):
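The classic RNN update, h_t = tanh(W_h h_{t-1} + W_x x_t + b), can be written out directly; the weights here are random and untrained:

```python
import torch

# One tanh layer mixes the previous hidden state with the current input:
# h_t = tanh(W_h h_{t-1} + W_x x_t + b)
hidden_size, input_size = 8, 200
W_h = torch.randn(hidden_size, hidden_size) * 0.1
W_x = torch.randn(hidden_size, input_size) * 0.1
b = torch.zeros(hidden_size)

h = torch.zeros(hidden_size)          # initial hidden state
for x in torch.randn(5, input_size):  # five word vectors, one at a time
    h = torch.tanh(W_h @ h + W_x @ x + b)
```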
Language Modelling with RNNs
Hi mom, I’ll be late for …
The hidden state can be used to predict the next word
Language Modelling with RNNs
RNN Design choices
“I grew up in France” “Since my mother tongue is ____”
Standard RNN:
LSTMs: Long Short-Term Memory
Standard RNN:
LSTM:
See https://colah.github.io/posts/2015-08-Understanding-LSTMs/
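In PyTorch an LSTM (including stacking) comes built in; a minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# The LSTM carries both a hidden state h and a cell state c,
# which is what lets it keep long-range context.
lstm = nn.LSTM(input_size=200, hidden_size=64, num_layers=2)  # 2 stacked LSTMs

sentence = torch.randn(5, 1, 200)  # (words, batch, embedding)
output, (h, c) = lstm(sentence)
print(output.shape)  # torch.Size([5, 1, 64]) -- one output per word
```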
LSTMs: Long Short-Term Memory
Standard RNN:
LSTM:
LSTM Language Model
“I’ll be late for….”
Sample loop: take word of highest probability and repeat
(real models tend to stack many LSTMs)
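The sample loop can be sketched as follows; `model`, `word_to_index`, and `index_to_word` are hypothetical stand-ins for a trained LSTM language model and its vocabulary:

```python
import torch

def sample(model, word_to_index, index_to_word, prompt, n_words):
    """Greedy sampling: feed the most probable word back in as the next input."""
    tokens = [word_to_index[w] for w in prompt.split()]
    for _ in range(n_words):
        logits = model(torch.tensor(tokens))   # scores for the next word
        next_token = int(logits[-1].argmax())  # take the word of highest score
        tokens.append(next_token)              # ...and feed it back in
    return " ".join(index_to_word[t] for t in tokens)
```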
Sampling
Shakespeare
PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.
Wikipedia
Naturalism and decision for the majority of Arab countries’ capitalide was grounded by the Irish language by [[John Clair]], [[An Imperial Japanese Revolt]], associated with Guangzham’s sovereignty. His generals were the powerful ruler of the Portugal in the [[Protestant Immineners]], which could be said to be directly in Cantonese Communication, which followed a ceremony and set inspired prison, training. The emperor travelled back to [[Antioch, Perth, October 25|21]] to note, the Kingdom of Costa Rica, unsuccessful fashioned the [[Thrales]], [[Cynth’s Dajoard]], known in western [[Scotland]], near Italy to the conquest of India with the conflict.
XML
<revision><id>15900676</id><timestamp>2002–08–03T18:14:12Z</timestamp><contributor><username>Paris</username><id>23</id></contributor><minor /><comment>Automated conversion</comment></revision>
Obama
Good everybody. Thank you very much. God bless the United States of America, and has already began with the world’s gathering. It’s about hard-earned for our efforts that are not continued. We are all the assumptionion to the streets of the Americas that we are still for everybody and destruction. We are doing a lot of this. I know that someone would be prefered to their children to take a million insurance company. We’re watching their people and continued to find ourselves with Repub
Thank you. God bless you. Good morning, everybody. And May God loss man. Thank you very much. Thank you very much, everybody.
OpenAI Language Model…
Prompt: “Recycling is good for the world, no, you could not be more wrong”
“Recycling is NOT good for the world. It is bad for the
environment, it is bad for our health, and it is bad for our
economy. I’m not kidding. Recycling is not good for the
environment. It is destructive to the earth and it is a major
contributor to global warming. Recycling is not good for our
health. It contributes to obesity and diseases like heart disease
and cancer. Recycling is bad for our economy. It increases the
cost of a product, and in turn, the price of everything that is made
with that product. Recycling is not good for our nation. We pay a
tremendous price for the privilege of having the world’s most
advanced and efficient recycling system. Recycling is a huge,
colossal waste of time, energy, money, and resources.”
Transfer Learning & State-of-the-Art
Example: IMDB Dataset
Task: Guess the rating from review text
Transfer learning
The Strength of Transfer learning
IMDB: what if only 1% of reviews included a rating?
Can the remaining 99% of reviews be used for anything?
→ Language model!
(and this is a very, very common situation, in academia and industry)
The Strength of Transfer learning
“… we found that training our approach with only 100 labeled examples (and giving it access to about 50,000 unlabeled examples), we were able to achieve the same performance as training a model from scratch with 10,000 labeled examples. Another important insight was that we could use any reasonably general and large …”
- Howard & Ruder (2018)
Transfer learning: Other methods
They all laughed. [NEXT] Frodo felt his spirits reviving.
They all laughed. [NEXT] Bag End seemed sad and gloomy and dishevelled.
Task: Classify if two sentences are next to each other
See e.g. https://arxiv.org/abs/1810.04805
“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
Concepts skipped
• Encoder-Decoders (sequence to sequence)
• Attention
• Transformers
See e.g. paper: “Attention Is All You Need” (2017)