neural networks and applications to natural language...

86
Neural Networks and applications to natural language processing John Ar´ evalo MindLAB Research Group - Universidad Nacional de Colombia June 21, 2018 John Ar´ evalo Neural Networks and applications to natural language processing

Upload: others

Post on 08-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks and applications to natural languageprocessing

John Arevalo

MindLAB Research Group - Universidad Nacional de Colombia

June 21, 2018

John Arevalo Neural Networks and applications to natural language processing

Page 2: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Outline

1 Neural NetworksIntroductionInteractive demoNeural Network Training

2 Feature extraction and LearningFeature extractionFeature learningNeural Network Types

3 Convolutional neural networks

OperationsCNN For NLP

4 Language modeling with recurrentneural networks

Recurrent neural networksLong short-term memory networksVariantSome applicationsResources

John Arevalo Neural Networks and applications to natural language processing

Page 3: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks

Outline

1 Neural NetworksIntroductionInteractive demoNeural Network Training

2 Feature extraction and LearningFeature extractionFeature learningNeural Network Types

3 Convolutional neural networks

OperationsCNN For NLP

4 Language modeling with recurrentneural networks

Recurrent neural networksLong short-term memory networksVariantSome applicationsResources

John Arevalo Neural Networks and applications to natural language processing

Page 4: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Introduction

Neural Networks

Inspired by nature (the brain)

Simple processing units but many of them and highly interconnected

Distributed processing and memory

Learn from data samples

John Arevalo Neural Networks and applications to natural language processing

Page 5: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Interactive demo

Interactive demo

Quick and dirty introduction to neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 6: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Learning as optimization

Problem definition: From x , predict y . i.e. find a function f such thatf (x )→ y .

H : hypothesis space, D :training data

D = {(x1, y1), . . . , (x`, y`)}

fΘ ∈ H denotes a function f parametrized by Θ. To measure howgood fΘ (x ) estimates y , a loss L is defined.

General optimization problem:

minf ∈H

L(f ,D),

Let y = fΘ (x ) be the prediction made by fΘ given x as input.Squared loss is an example:

L(fΘ,D) = E (Θ,D) =∑i=1

‖yi − yi‖22

John Arevalo Neural Networks and applications to natural language processing

Page 7: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Other loss functions

L1 loss:

E (Θ,D) =∑i=1

‖yi − yi‖1

Cross-entropy loss:

E (Θ,D) = − ln∏i=1

p(yi |xi ,Θ)

= −∑i=1

[yi ln yi + (1− yi) ln(1− yi)]

Hinge loss:

E (Θ,D) =∑i=1

max(0, 1− yi yi)

John Arevalo Neural Networks and applications to natural language processing

Page 8: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Optimization by Gradient descent

Θt+1 = Θt − ηt∇ΘE (Θt)

∇ΘE (Θ) =∂E (Θ)

∂Θ

John Arevalo Neural Networks and applications to natural language processing

Page 9: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Backpropagation [Rumelhart, Hinton, 1986]

Efficient strategy to calculate the gradient.

Errors are back-propagated through the network to assign’responsibility’ to each neuron (δi)

Gradient is calculated based on delta values.

John Arevalo Neural Networks and applications to natural language processing

Page 10: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Backpropagation [Rumelhart, Hinton, 1986]

Inside a neuron

John Arevalo Neural Networks and applications to natural language processing

Page 11: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Neural Networks Neural Network Training

Backpropagation [Rumelhart, Hinton, 1986]

Bigger network can composed thousands of operations

John Arevalo Neural Networks and applications to natural language processing

Page 12: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning

Outline

1 Neural NetworksIntroductionInteractive demoNeural Network Training

2 Feature extraction and LearningFeature extractionFeature learningNeural Network Types

3 Convolutional neural networks

OperationsCNN For NLP

4 Language modeling with recurrentneural networks

Recurrent neural networksLong short-term memory networksVariantSome applicationsResources

John Arevalo Neural Networks and applications to natural language processing

Page 13: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature extraction

Feature extraction

John Arevalo Neural Networks and applications to natural language processing

Page 14: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature extraction

Features

Features represent our prior knowledge of the problem

Depend on the type of data

Specialized features for practically any kind of data (images, video,sound, speech, text, web pages, etc)

Medical imaging:

Standard computer vision features (color, shape, texture, edges,local-global, etc)Specialized features tailored to the problem at hand

New trend: learning features from data

John Arevalo Neural Networks and applications to natural language processing

Page 15: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Feature learning

John Arevalo Neural Networks and applications to natural language processing

Page 16: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Feature learning

John Arevalo Neural Networks and applications to natural language processing

Page 17: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Feature learning approaches

Unsupervised feature learning

Convolutional neural networks

Recurrent neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 18: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Unsupervised feature learning

John Arevalo Neural Networks and applications to natural language processing

Page 19: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Deep feed-forward neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 20: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

ImageNet 2012 [Krizhevsky, Sutskever, Hinton 2012]

(source: ICML2013 Deep Learning Tutorial, Yan LeCun et al.)John Arevalo Neural Networks and applications to natural language processing

Page 21: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

ImageNet 2012 [Krizhevsky, Sutskever, Hinton 2012]

(source: ICML2013 Deep Learning Tutorial, Yan LeCun et al.)John Arevalo Neural Networks and applications to natural language processing

Page 22: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Histopathology basal cell carcinoma

Tumor

Non-tumor

John Arevalo Neural Networks and applications to natural language processing

Page 23: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Convolutional Autoencoder for Histopathology ImageRepresentation Learning

John Arevalo Neural Networks and applications to natural language processing

Page 24: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Digital staining results

Cancer Cancer Cancer Non-cancer Non-cancer Non-cancer

Cancer Cancer Cancer Non-cancer Non-cancer Non-cancer

0.8272 0.9604 0.7944 0.2763 0.0856 0.0303

John Arevalo Neural Networks and applications to natural language processing

Page 25: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

TICA learned features

John Arevalo Neural Networks and applications to natural language processing

Page 26: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Feature learning for natural language data

But what about text?

Neural networks are a hot topic in NLP nowadays:

“NN language models and word embeddings were everywhere atNAACL2015 and ACL2015” C. Manning.Many successful applications:

Speech recognitionLanguage modelingTranslationImage captioning

John Arevalo Neural Networks and applications to natural language processing

Page 27: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Practical considerations

Traditional backpropagation does not work well with multiple layers

It gets stuck in local minima

During the last years several strategies have beendeveloped/discovered (tricks of the trade):

Stochastic gradient descent with minibatches and adaptive learning rateLogistic regression/soft max for classificationNormalization of input variables, shuffling of training samplesRegularization using L1 and L2 norms and dropout

John Arevalo Neural Networks and applications to natural language processing

Page 28: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Feature learning

Implementation

Use of GPUs is mandatory (speed-up > 100x)

Sometimes combined with distributed processing

Practically all the libraries use CUDA

Several higher-level frameworks:

NVIDIA CUDA Deep Neural Network library (cuDNN)CaffeTorchTheanoBlocksEtc.

John Arevalo Neural Networks and applications to natural language processing

Page 29: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Neural Network Types

Types

Feed-forward, multilayerperceptrons

Convolutional neuralnetworks

Recurrent neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 30: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Neural Network Types

Types

Feed-forward, multilayerperceptrons

Convolutional neuralnetworks

Recurrent neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 31: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Feature extraction and Learning Neural Network Types

Types

Feed-forward, multilayerperceptrons

Convolutional neuralnetworks

Recurrent neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 32: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks

Outline

1 Neural NetworksIntroductionInteractive demoNeural Network Training

2 Feature extraction and LearningFeature extractionFeature learningNeural Network Types

3 Convolutional neural networks

OperationsCNN For NLP

4 Language modeling with recurrentneural networks

Recurrent neural networksLong short-term memory networksVariantSome applicationsResources

John Arevalo Neural Networks and applications to natural language processing

Page 33: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolutional neural networks

John Arevalo Neural Networks and applications to natural language processing

Page 34: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 35: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 36: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 37: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 38: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 39: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 40: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 41: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 42: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolution

John Arevalo Neural Networks and applications to natural language processing

Page 43: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 44: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 45: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 46: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 47: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Convolutional neural networks

Repeat this conv->pooling operation multiple times.

John Arevalo Neural Networks and applications to natural language processing

Page 48: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks Operations

Interactive demo

CNN for text classification...

John Arevalo Neural Networks and applications to natural language processing

Page 49: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Convolution

Use word embeddings to represent words. Each row represent a word.

John Arevalo Neural Networks and applications to natural language processing

Page 50: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Convolution

Use word embeddings to represent words. Each row represent a word.

John Arevalo Neural Networks and applications to natural language processing

Page 51: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Convolution

Use word embeddings to represent words. Each row represent a word.

John Arevalo Neural Networks and applications to natural language processing

Page 52: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 53: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Pooling

John Arevalo Neural Networks and applications to natural language processing

Page 54: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

CNN For NLP

Repeat conv->pool operations mutiple times. Mutiple filters can be usedin the same layer.

John Arevalo Neural Networks and applications to natural language processing

Page 55: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Convolutional neural networks CNN For NLP

Interactive demo

CNN for text classification...

John Arevalo Neural Networks and applications to natural language processing

Page 56: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks

Outline

1 Neural NetworksIntroductionInteractive demoNeural Network Training

2 Feature extraction and LearningFeature extractionFeature learningNeural Network Types

3 Convolutional neural networks

OperationsCNN For NLP

4 Language modeling with recurrentneural networks

Recurrent neural networksLong short-term memory networksVariantSome applicationsResources

John Arevalo Neural Networks and applications to natural language processing

Page 57: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Recurrent neural network

Neural networks with memory

Feed-forward NN: output exclusivelydepends on the current input

Recurrent NN: output depends incurrent and previous states

This is accomplished throughlateral/backward connections whichcarry information while processing asequence of inputs

(source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 58: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Neural Net Language Model

Problem: predict the next wordgiven the previous 3 words(4-gram language model)

The matrix U corresponds tothe word vector representationof the words.

Bengio, Y., Ducharme, R., Vincent, P., & Janvin, C. (2003). A neural probabilistic languagemodel. The Journal of Machine Learning Research, 3, 1137-1155.

John Arevalo Neural Networks and applications to natural language processing

Page 59: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Character-level language model

(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

John Arevalo Neural Networks and applications to natural language processing

Page 60: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Sequence learning alternatives

(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

John Arevalo Neural Networks and applications to natural language processing

Page 61: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Network unrolling

(source: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)

John Arevalo Neural Networks and applications to natural language processing

Page 62: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

Backpropagation through time (BPTT)

(source: http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/)

John Arevalo Neural Networks and applications to natural language processing

Page 63: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Recurrent neural networks

BPTT is hard

The vanishing and theexploding gradientproblem

Gradients could vanish(or explode) whenpropagated several stepsback

This makes difficult tolearn long-termdependencies.

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training Recurrent Neural Networks. Proc. ofICML, abs/1211.5063.

John Arevalo Neural Networks and applications to natural language processing

Page 64: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Long term dependencies

(source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 65: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Long short-term memory (LSTM)

Hochreiter, Sepp, and Jurgen Schmidhuber. ”Long short-term memory.”Neuralcomputation 9, no. 8 (1997): 1735-1780.

LSTM networks solve the problem of long-term dependency problem.

They use gates that allow to keep memory through long sequencesand be updated only when required.

John Arevalo Neural Networks and applications to natural language processing

Page 66: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Conventional RNN vs LSTM

(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 67: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Forget gate

Controls the flow of theprevious internal state Ct−1

ft = 1⇒ keep previous state

ft = 0⇒ forget previousstate

(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 68: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Input gate

Controls the flow of inputinformation (xt)

it = 1⇒ take input intoaccount

it = 0⇒ ignore input

(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 69: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Current state calculation

(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 70: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Long short-term memory networks

Output gate

Controls the flow ofinformation from the internalstate (xt) to the outside (ht)

ot = 1⇒ allows internalstate out

ot = 0⇒ doesn’t allowinternal state out

(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 71: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Variant

Gated recurrent units

Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., &Bengio, Y. (2014). Learning phrase representations using rnn encoder-decoder for statisticalmachine translation. arXiv preprint arXiv:1406.1078.(image source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

John Arevalo Neural Networks and applications to natural language processing

Page 72: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

The Unreasonable Effectiveness of Recurrent NeuralNetworks

Famous blog entry from Andrej Karpathy (UofS)

Character-level language models based on multi-layer LSTMs.

Data:

Shakspare playsWikipediaLATEXLinux source code

John Arevalo Neural Networks and applications to natural language processing

Page 73: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Algebraic geometry book in LATEX

(source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

John Arevalo Neural Networks and applications to natural language processing

Page 74: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Linux source code

/** Increment the size file of the new incorrect UI_FILTER group information

* of the size generatively.

*/

static int indicate_policy(void)

{

int error;

if (fd == MARN_EPT) {

/*

* The kernel blank will coeld it to userspace.

*/

if (ss->segment < mem_total)

unblock_graph_and_set_blocked();

else

ret = 1;

goto bail;

}

segaddr = in_SB(in.addr);

selector = seg / 16;

setup_works = true;

for (i = 0; i < blocks; i++) {

seq = buf[i++];

bpf = bd->bd.next + i * search;

if (fd) {

current = blocked;

}

}

John Arevalo Neural Networks and applications to natural language processing

Page 75: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Interactive demo

RNN for language modeling...

John Arevalo Neural Networks and applications to natural language processing

Page 76: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Image captioning

Karpathy, Andrej, and Li Fei-Fei. ”Deep visual-semantic alignments for generatingimage descriptions.” CVPR2015. arXiv preprint arXiv:1412.2306 (2014).

John Arevalo Neural Networks and applications to natural language processing

Page 77: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Approach

John Arevalo Neural Networks and applications to natural language processing

Page 78: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Image-sentence score model

John Arevalo Neural Networks and applications to natural language processing

Page 79: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Multimodal RNN

John Arevalo Neural Networks and applications to natural language processing

Page 80: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Alignment results

John Arevalo Neural Networks and applications to natural language processing

Page 81: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Some applications

Captioning results

John Arevalo Neural Networks and applications to natural language processing

Page 82: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Resources

Papers (1)

General:

S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation,9(8):1735-1780, 1997. Based on TR FKI-207-95, TUM (1995).J. Schmidhuber. Deep Learning in Neural Networks: An Overview. NeuralNetworks, Volume 61, January 2015, Pages 85-117 (DOI:10.1016/j.neunet.2014.09.003)

Language modeling:

Mikolov, Tomas, et al. ”Recurrent neural network based language model.”INTERSPEECH 2010, 11th Annual Conference of the International SpeechCommunication Association, Makuhari, Chiba, Japan, September 26-30, 2010.2010.Mikolov, Tomas, et al. ”Extensions of recurrent neural network language model.”Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE InternationalConference on. IEEE, 2011.Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. ”Generating text withrecurrent neural networks.” Proceedings of the 28th International Conference onMachine Learning (ICML-11). 2011.

John Arevalo Neural Networks and applications to natural language processing

Page 83: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Resources

Papers (2)

Machine translation:

Liu, Shujie, et al. ”A recursive recurrent neural network for statistical machinetranslation.” Proceedings of ACL. 2014.Sutskever, Ilya, Oriol Vinyals, and Quoc VV Le. ”Sequence to sequence learningwith neural networks.” Advances in neural information processing systems. 2014.Auli, Michael, et al. ”Joint Language and Translation Modeling with RecurrentNeural Networks.” EMNLP. Vol. 3. No. 8. 2013.

Speech recognition:

Graves, Alex, and Navdeep Jaitly. ”Towards end-to-end speech recognition withrecurrent neural networks.” Proceedings of the 31st International Conference onMachine Learning (ICML-14). 2014.

John Arevalo Neural Networks and applications to natural language processing

Page 84: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Resources

Papers (3)

Image captioning:

Karpathy, Andrej, and Li Fei-Fei. ”Deep visual-semantic alignments for generatingimage descriptions.” CVPR2015. arXiv preprint arXiv:1412.2306 (2014).Vinyals, Oriol, et al. ”Show and tell: A neural image caption generator.” CVPR2015.arXiv preprint arXiv:1411.4555 (2014).Chen, Xinlei, and C. Lawrence Zitnick. ”Learning a recurrent visual representationfor image caption generation.” arXiv preprint arXiv:1411.5654 (2014).Fang, Hao, et al. ”From captions to visual concepts and back.” CVPR2015, arXivpreprint arXiv:1411.4952 (2014).

John Arevalo Neural Networks and applications to natural language processing

Page 85: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Resources

Other resources

This presentation was adapted fromhttp://fagonzalezo.github.io/nn_nlp_tutorial/

Christopher Olah, Understanding LSTM Networks,http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Denny Britz, Recurrent Neural Networks Tutorial,http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

Andrej Karpathy, The Unreasonable Effectiveness of Recurrent NeuralNetworks, http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Jurgen Schmidhuber, Recurrent Neural Networks,http://people.idsia.ch/∼juergen/rnn.html

John Arevalo Neural Networks and applications to natural language processing

Page 86: Neural Networks and applications to natural language ...ccd.cua.uam.mx/~evillatoro/TallerPLN2018/JohnArevalo/rnn_lm.pdf · Neural Networks Neural Network Training Learning as optimization

Language modeling with recurrent neural networks Resources

Thanks!

http://www.mindlaboratory.org

John Arevalo Neural Networks and applications to natural language processing