
Deep Learning for Natural Language Processing

Topics

•  Word embeddings

•  Recurrent neural networks

•  Long short-term memory networks

•  Neural machine translation

•  Automatically generating image captions

Word meaning in NLP

•  How do we capture meaning and context of words?

Synonyms: “I loved the movie.” / “I adored the movie.”

Synecdoche: “Today, Washington affirmed its opposition to the trade pact.”

Homonyms: “I deposited the money in the bank.” / “I buried the money in the bank.”

Polysemy: “I read a book today.” / “I wasn’t able to book the hotel room.”

Word Embeddings

“One of the most successful ideas of modern NLP”.

One example: Google’s Word2Vec algorithm

Word2Vec algorithm

[Figure: three-layer network diagram]

Input: one-hot representation of the input word over the vocabulary (10,000 units)

Hidden layer (linear activation function) (300 units)

Output: probability (for each word w_i in the vocabulary) that w_i is nearby the input word in a sentence (10,000 units)

Weights: 10,000 × 300 (input to hidden); 300 × 10,000 (hidden to output)
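A minimal numpy sketch of this architecture, using the slide’s layer sizes and untrained random weights (an illustration of the structure, not a trained model):

```python
import numpy as np

VOCAB_SIZE = 10_000   # one input/output unit per vocabulary word
HIDDEN_SIZE = 300     # size of the hidden (embedding) layer

# Weight matrices with the shapes given on the slide (random, untrained)
W_in = np.random.randn(VOCAB_SIZE, HIDDEN_SIZE) * 0.01    # 10,000 x 300
W_out = np.random.randn(HIDDEN_SIZE, VOCAB_SIZE) * 0.01   # 300 x 10,000

def forward(word_index):
    """One-hot input -> linear hidden layer -> softmax over the vocabulary."""
    x = np.zeros(VOCAB_SIZE)
    x[word_index] = 1.0                  # one-hot representation of the input word
    h = x @ W_in                         # linear activation: selects row word_index
    scores = h @ W_out
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()                   # P(w_i is nearby the input word), for each w_i

probs = forward(42)   # hypothetical input word with index 42
print(probs.shape)    # (10000,)
```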

Word2Vec training

•  Training corpus of documents

•  Collect pairs of nearby words

•  Example “document”: Every morning she drinks Starbucks coffee.

Training pairs (window size = 3): (every, morning), (every, she), (morning, she), (morning, drinks), (she, drinks), (she, Starbucks), (drinks, Starbucks), (drinks, coffee), (Starbucks, coffee)
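One way to collect these pairs in code; window conventions vary, but this reading (pair each word with the next two words, i.e. within a window of 3 consecutive words) reproduces exactly the nine pairs above:

```python
def training_pairs(tokens, window_size=3):
    """Pair each word with the following words inside a window_size-word window."""
    pairs = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window_size, len(tokens))):
            pairs.append((tokens[i], tokens[j]))
    return pairs

sentence = "every morning she drinks Starbucks coffee".split()
print(training_pairs(sentence))   # nine pairs, matching the slide
```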

Word2Vec training via backpropagation

[Figure: the same network, with 10,000 × 300 input weights, 300 × 10,000 output weights, and a linear activation function in the hidden layer]

Training pair (drinks, Starbucks): input “drinks”; target is the probability that “Starbucks” is nearby “drinks”.

Training pair (drinks, coffee): input “drinks”; target is the probability that “coffee” is nearby “drinks”.

Learned word vectors

[Figure: after training, the 10,000 × 300 input weight matrix holds the word vectors; the 300 weights out of the input unit for “drinks” form its word vector]

Some surprising results of word2vec

http://www.aclweb.org/anthology/N13-1#page=784

http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
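The headline result in these papers is that simple vector arithmetic on the learned embeddings captures analogies, e.g. vec(“king”) − vec(“man”) + vec(“woman”) lies close to vec(“queen”). A small numpy sketch of that nearest-neighbor lookup (the `vectors` dictionary of trained 300-dimensional embeddings is assumed to exist, e.g. the rows of the trained 10,000 × 300 input weight matrix):

```python
import numpy as np

def most_similar(query, vectors, exclude=()):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    best_word, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = (vec @ query) / (np.linalg.norm(vec) * np.linalg.norm(query))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# With trained embeddings, the famous analogy looks like:
# query = vectors["king"] - vectors["man"] + vectors["woman"]
# most_similar(query, vectors, exclude={"king", "man", "woman"})  # -> "queen"
```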

Word embeddings demo

http://bionlp-www.utu.fi/wv_demo/

From http://axon.cs.byu.edu/~martinez/classes/678/Slides/Recurrent.pptx

Recurrent Neural Network (RNN)

From http://eric-yuan.me/rnn2-lstm/

Recurrent Neural Network “unfolded” in time

Training algorithm: “backpropagation through time”
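A minimal numpy sketch of the recurrence that the unfolded picture depicts (sizes and weights here are hypothetical placeholders; backpropagation through time differentiates back through this loop):

```python
import numpy as np

INPUT_SIZE, HIDDEN_SIZE = 50, 100   # hypothetical sizes

W_xh = np.random.randn(INPUT_SIZE, HIDDEN_SIZE) * 0.01    # input -> hidden
W_hh = np.random.randn(HIDDEN_SIZE, HIDDEN_SIZE) * 0.01   # hidden -> hidden (recurrent)

def rnn_forward(inputs):
    """Apply the same cell at every time step; the loop is the 'unfolding' in time."""
    h = np.zeros(HIDDEN_SIZE)
    states = []
    for x_t in inputs:                       # one step per sequence element
        h = np.tanh(x_t @ W_xh + h @ W_hh)   # new state from input and previous state
        states.append(h)
    return states

sequence = [np.random.randn(INPUT_SIZE) for _ in range(6)]
print(len(rnn_forward(sequence)))   # 6 hidden states, one per time step
```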

Encoder-decoder (or “sequence-to-sequence”) networks for translation

http://book.paddlepaddle.org/08.machine_translation/image/encoder_decoder_en.png

Problem for RNNs: learning long-term dependencies.

“The cat that my mother’s sister took to Hawaii the year before last when you were in high school is now living with my cousin.”

Backpropagation through time: problem of vanishing gradients

Long Short-Term Memory (LSTM)

•  A “neuron” with a complicated memory gating structure.

•  Replaces ordinary hidden neurons in RNNs.

•  Designed to avoid the long-term dependency problem.

Long Short-Term Memory (LSTM) unit

[Figure: simple RNN (hidden) unit vs. LSTM (hidden) unit. From https://deeplearning4j.org/lstm.html]

Comments on LSTMs

•  LSTM unit replaces simple RNN unit

•  LSTM internal weights still trained with backpropagation

•  Cell value has feedback loop: can remember value indefinitely

•  Function of gates (“input”, “forget”, “output”) is learned via minimizing loss
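To make the gating concrete, here is one common formulation of a single LSTM step (a sketch; parameter names and sizes are placeholders, and variants differ in details such as peephole connections):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: input (i), forget (f), output (o) gates and candidate (g)."""
    i = sigmoid(x @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate: admit new info
    f = sigmoid(x @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate: keep/erase cell
    o = sigmoid(x @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate: expose cell
    g = np.tanh(x @ W["g"] + h_prev @ U["g"] + b["g"])   # candidate cell value
    c = f * c_prev + i * g   # cell feedback loop: can remember a value indefinitely
    h = o * np.tanh(c)       # hidden state passed on to the next time step
    return h, c

# Placeholder parameters, just to show the shapes involved
n_in, n_hid = 50, 100
rng = np.random.default_rng(0)
W = {k: rng.normal(0, 0.01, (n_in, n_hid)) for k in "ifog"}
U = {k: rng.normal(0, 0.01, (n_hid, n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
```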

Google “Neural Machine Translation” (unfolded in time)

From https://arxiv.org/pdf/1609.08144.pdf

Neural Machine Translation:

Training: Maximum likelihood, using gradient descent on weights

Trained on very large corpus of parallel texts in source (X) and target (Y) languages.

θ* = argmax_θ ∑_(X,Y) log P(Y | X; θ)

i.e., choose the weights θ that maximize the summed log-probability of each target sentence Y given its source sentence X, over all training pairs.

How to evaluate automated translations?

Human raters’ side-by-side comparisons: Scale of 0 to 6

0: “completely nonsense translation”

2: “the sentence preserves some of the meaning of the source sentence but misses significant parts”

4: “the sentence retains most of the meaning of the source sentence, but may have some grammar mistakes”

6: “perfect translation: the meaning of the translation is completely consistent with the source, and the grammar is correct.”

Results from Human Raters

Automating Image Captioning

[Figure: CNN features feed a recurrent decoder; words in the caption enter as word embeddings; output is a softmax probability distribution over the vocabulary]

Training: large dataset of image/caption pairs from Flickr and other sources

Vinyals et al., “Show and Tell: A Neural Image Caption Generator”, CVPR 2015
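A rough sketch of the decoding loop this architecture implies, with a plain RNN step standing in for the paper’s LSTM and every size and weight a hypothetical placeholder (Show and Tell itself uses a trained LSTM and beam search rather than the greedy choice below):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMBED, HIDDEN, FEAT = 1000, 300, 512, 2048   # hypothetical sizes

E = rng.normal(0, 0.01, (VOCAB, EMBED))       # word embeddings
W_f = rng.normal(0, 0.01, (FEAT, HIDDEN))     # projects CNN features into the state
W_x = rng.normal(0, 0.01, (EMBED, HIDDEN))    # embedding -> state
W_h = rng.normal(0, 0.01, (HIDDEN, HIDDEN))   # recurrent weights
W_out = rng.normal(0, 0.01, (HIDDEN, VOCAB))  # state -> scores over the vocabulary

def caption(cnn_features, start_id=0, end_id=1, max_len=20):
    """CNN features set the initial state; each step embeds the previous word,
    updates the state, and greedily picks the most probable next word."""
    h = np.tanh(cnn_features @ W_f)
    word, words = start_id, []
    for _ in range(max_len):
        h = np.tanh(E[word] @ W_x + h @ W_h)
        word = int(np.argmax(h @ W_out))   # greedy; the paper uses beam search
        if word == end_id:
            break
        words.append(word)
    return words

print(caption(rng.normal(size=FEAT)))   # word indices for a random feature vector
```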

“NeuralTalk” sample results

From http://cs.stanford.edu/people/karpathy/deepimagesent/generationdemo/

Microsoft Captionbot

https://www.captionbot.ai/

http://karpathy.github.io/2012/10/22/state-of-computer-vision/

From Andrej Karpathy’s Blog, Oct. 22, 2012:

“The State of Computer Vision and AI: We are Really, Really Far Away.”

What knowledge do you need to understand this situation?

Microsoft CaptionBot.ai: “I can understand the content of any photograph and I’ll try to describe it as well as any human.”

Winograd Schema “Common Sense” Challenge

I poured water from the bottle into the cup until it was full. What was full?

I poured water from the bottle into the cup until it was empty. What was empty?

Winograd Schemas (Levesque et al., 2011)

Winograd Schema “Common Sense” Challenge

The steel ball hit the glass table and it shattered. What shattered?

The glass ball hit the steel table and it shattered. What shattered?

Winograd Schemas (Levesque et al., 2011)

Winograd Schema “Common Sense” Challenge

State-of-the-art AI: ~60% (vs. 50% with random guessing) Humans: 100% (if paying attention)

“When AI can’t determine what ‘it’ refers to in a sentence, it’s hard to believe that it will take over the world.”

— Oren Etzioni, Allen Institute for AI

https://allenai.org/alexandria/

https://www.seattletimes.com/business/technology/paul-allen-invests-125-million-to-teach-computers-common-sense/

Today’s machine learning systems are more advanced than ever, capable of automating increasingly complex tasks and serving as a critical tool for human operators. Despite recent advances, however, a critical component of Artificial Intelligence (AI) remains just out of reach – machine common sense. Defined as “the basic ability to perceive, understand, and judge things that are shared by nearly all people and can be reasonably expected of nearly all people without need for debate,” common sense forms a critical foundation for how humans interact with the world around them. Possessing this essential background knowledge could significantly advance the symbiotic partnership between humans and machines. But articulating and encoding this obscure-but-pervasive capability is no easy feat.

“The absence of common sense prevents an intelligent system from understanding its world, communicating naturally with people, behaving reasonably in unforeseen situations, and learning from new experiences,” said Dave Gunning, a program manager in DARPA’s Information Innovation Office (I2O). “This absence is perhaps the most significant barrier between the narrowly focused AI applications we have today and the more general AI applications we would like to create in the future.”

https://www.darpa.mil/news-events/2018-10-11

Allen AI Institute Common Sense Challenge

•  Which factor will most likely cause a person to develop a fever? (A) a leg muscle relaxing after exercise (B) a bacterial population in the bloodstream (C) several viral particles on the skin (D) carbohydrates being digested in the stomach

•  Lichens are symbiotic organisms made of green algae and fungi. What do the green algae supply to the fungi in this symbiotic relationship? (A) carbon dioxide (B) food (C) protection (D) water

•  When a switch is used in an electrical circuit, the switch can (A) cause the charge to build. (B) increase and decrease the voltage. (C) cause the current to change direction. (D) stop and start the flow of current.

•  Which of the following is an example of an assistive device? (A) contact lens (B) motorcycle (C) raincoat (D) coffee pot

•  Rocks are classified as igneous, metamorphic, or sedimentary according to (1) their color (2) their shape (3) how they formed (4) the minerals they contain

https://leaderboard.allenai.org/

https://gpt2.apps.allenai.org/