LET’S LEARN DEEP
SHUBHANSHU MISHRA
@THESHUBHANSHU
Some Interesting Results
Image Source: http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/
Distributed representations of words and phrases and their compositionality. T Mikolov, I Sutskever, K Chen, GS Corrado, J Dean. Advances in Neural Information Processing Systems, 2013
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. & Dean, J. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Information Processing Systems 26 3111–3119 (2013).
Deep learning. Y LeCun, Y Bengio, G Hinton. Nature, 2015
http://www.socher.org/uploads/Main/MultipleVectorWordEmbedding.png
Zou, Will Y., et al. "Bilingual Word Embeddings for Phrase-Based Machine Translation." EMNLP. 2013.
Paraphrase Detection
Socher, Richard, et al. "Dynamic pooling and unfolding recursive autoencoders for paraphrase detection." Advances in Neural Information Processing Systems. 2011.
Socher, Richard; Perelygin, Alex; Wu, Jean Y.; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank." EMNLP 2013.
http://cs.stanford.edu/people/karpathy/deepimagesent/
Why Neural Networks?
- The perceptron algorithm ALWAYS learns to classify linearly separable samples.
- BUT, how to tackle non-linearity?
Enter NEURAL NETWORKS
- Add a nonlinear transform to the data
- ANNs with 1 hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy [1,2]
- Can be trained through BACKPROPAGATION
http://cs231n.github.io/neural-networks-1/
[1] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314.
[2] http://neuralnetworksanddeeplearning.com/chap4.html
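To make the non-linearity point concrete, here is a minimal sketch (not from the slides) using XOR, a standard example of data that is not linearly separable. It assumes a hard step activation rather than the sigmoidal units in [1], and the hidden-layer weights are hand-picked for illustration, not learned. The grid search over perceptron weights is only a finite check, but it agrees with the known result that no single perceptron represents XOR.

```python
def step(z):
    """Hard threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def perceptron(w1, w2, b, x1, x2):
    """Single linear threshold unit (no hidden layer)."""
    return step(w1 * x1 + w2 * x2 + b)

def xor_net(x1, x2):
    """One hidden layer with two units; weights chosen by hand (illustrative)."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR and not AND == XOR

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every perceptron on a coarse weight grid; none reproduces XOR.
grid = [i / 2 for i in range(-6, 7)]
perceptron_solves_xor = any(
    all(perceptron(w1, w2, b, *x) == y for x, y in XOR.items())
    for w1 in grid for w2 in grid for b in grid
)

# The network with one hidden layer reproduces XOR exactly.
net_solves_xor = all(xor_net(*x) == y for x, y in XOR.items())
```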
A simple Neural Network
http://ufldl.stanford.edu/wiki/images/thumb/9/99/Network331.png/400px-Network331.png
$\mathrm{loss} = H(f(W, X), Y)$
$\mathrm{log\ loss} = -\sum y \log f(W, X)$
$\mathrm{hinge\ loss} = \sum \max(0,\ 1 - f(W, X) \cdot y)$
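The two losses named on this slide can be sketched in a few lines. This is an illustrative version under stated assumptions: the log loss is written in its binary form, with labels in {0, 1} and predicted probabilities in (0, 1), which adds the (1 - y) term to the per-class sum. The hinge loss assumes labels in {-1, +1} and real-valued scores. The function names and the `eps` clamp are choices made here, not taken from the slides.

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, summed over samples (labels in {0, 1})."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total

def hinge_loss(y_true, scores):
    """Hinge loss, summed over samples (labels in {-1, +1})."""
    return sum(max(0.0, 1.0 - s * y) for y, s in zip(y_true, scores))

# A confident, correct score costs nothing under hinge loss;
# a score on the wrong side of the margin is penalized linearly.
example_hinge = hinge_loss([1, -1], [2.0, 0.5])
example_log = log_loss([1], [0.5])
```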
Train it through back propagation
$W_t = W_{t-1} - l \cdot \frac{\partial\, \mathrm{loss}(W)}{\partial W}$
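The update rule can be illustrated on the smallest possible case: a single sigmoid unit trained with log loss on the AND function. This is only a sketch. Full backpropagation applies the same step to every layer through the chain rule, which is omitted here. The learning rate, step count, and toy dataset are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Output of one sigmoid unit: f(W, X)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(X, Y, lr=0.5, steps=2000):
    """Batch gradient descent: W_t = W_{t-1} - lr * d(loss)/dW."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, Y):
            err = predict(w, b, x) - y  # gradient of log loss w.r.t. the pre-activation
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]  # logical AND, which is linearly separable
w, b = train(X, Y)
predictions = [1 if predict(w, b, x) > 0.5 else 0 for x in X]
```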
Types of ANN: Vanilla Feed Forward NN
https://class.coursera.org/neuralnets-2012-001/lecture
Hinton, Geoffrey E. "Learning distributed representations of concepts." Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Vol. 1. 1986.
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." The Journal of Machine Learning Research 12 (2011): 2493-2537.
Example of multitasking with NN. Task 1 and Task 2 are two tasks trained with the window approach architecture presented in Figure 1. Lookup tables as well as the first hidden layer are shared. The last layer is task specific. The principle is the same with more than two tasks.
AI Question Answering
Counting
Compound Coreference
Factoid Q/A with supporting facts
Weston J, Bordes A, Chopra S, Mikolov T. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. arXiv. 2015.
Reasoning about agents' motivations
Bordes A, Usunier N, Chopra S, Weston J. Large-scale Simple Question Answering with Memory Networks. arXiv. 2015.
Weston J, Chopra S, Bordes A. Memory Networks. In: International Conference on Learning Representations. 2015:1-14. http://arxiv.org/abs/1410.3916.
There are 20 tasks in total. A single system should solve all of them, with no task-specific systems. Memory Networks are used to solve these tasks. An accuracy of ~42% beats the older benchmarks.
http://www.thespermwhale.com/jaseweston/babi/abordes-ICLR.pdf
Types of ANN: Recurrent NN
http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-shorttermdepdencies.png
Learn sequential structures such as sequences of characters, words, or audio signals.
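A minimal sketch of how a recurrent network carries information along a sequence, assuming a scalar Elman-style cell with a tanh activation. The weight names `w_xh`, `w_hh`, and `b_h` are illustrative, and real models use weight matrices and learned parameters.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h, h0=0.0):
    """Run h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b_h) over a sequence."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_xh * x + w_hh * h + b_h)  # same weights at every time step
        states.append(h)
    return states

# The input at step 1 still influences the state at later steps.
states = rnn_forward([1.0, 0.0, 0.0], w_xh=1.0, w_hh=0.5, b_h=0.0)

# The final state depends on the order of the inputs, not just their values.
last_a = rnn_forward([1.0, 0.0], w_xh=1.0, w_hh=0.5, b_h=0.0)[-1]
last_b = rnn_forward([0.0, 1.0], w_xh=1.0, w_hh=0.5, b_h=0.0)[-1]
```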
Types of ANN: Recursive NN
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-Atree.png
From Machine Learning to Machine Reasoning Léon Bottou
Learn arbitrary structures like parse trees.
Types of ANN: Convolutional Neural Nets
http://colah.github.io/posts/2014-07-Conv-Nets-Modular/img/Conv-9-Conv2Max2Conv2.png
Learn similar features in different parts of the input
Used heavily on image data because the same feature can appear in different parts of an image.
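The weight-sharing idea can be sketched in one dimension: a single small kernel slides over the input, so the same feature detector is applied at every position. The kernel and signal below are made-up illustrations. As in most deep learning libraries, the operation shown is technically cross-correlation (the kernel is not flipped).

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

rising_edge = [-1, 1]                 # responds where the signal steps up
signal = [0, 0, 1, 1, 0, 0, 1, 1]     # the same pattern appears twice
response = conv1d(signal, rising_edge)
# The detector fires (value 1) at both occurrences of the rising edge,
# using the same two weights each time.
```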
Types of ANN: Auto Encoders
From Machine Learning to Machine Reasoning Léon Bottou
http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/img/Bottou-unfold.png
Learn to reconstruct the input
Types of ANN: RBMs and DBNs
RBM: Restricted Boltzmann Machine
DBN: Deep Belief Network
Generative graphical models
Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. "Restricted Boltzmann machines for collaborative filtering." Proceedings of the 24th international conference on Machine learning. ACM, 2007.
What is Deep About Deep Learning?
1. Deep Belief networks
2. RBMs, Auto encoders
3. Convolutional Neural Networks
4. Stacked Auto Encoders
Deeper NNs help because they can represent some functions with a polynomial number of parameters, whereas a network with fewer layers may need exponentially many parameters for the same function.
Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2014.
What is Deep Learning?
Like a Lego building exercise.
Stacking of various models and propagating the error from the output of this architecture to each layer.
Reduces the need for manual feature selection
Captures nonlinear relationships between features
Training a model on large data is much easier than hand-crafting features.
When Deep Learning?
LARGE DATA
LARGE COMPUTATIONAL RESOURCES
USEFUL QUESTIONS
Why were Deep ANNs in the shadows?
There were major challenges in training ANNs:
◦ Need large amounts of data to train (for better function approximation)
◦ More weights to train (standard image classification models have millions or billions of weights)
◦ Vanishing and exploding gradient problem (for deeper neural networks)
What changed?
Algorithms for training ANNs:
◦ Stochastic Gradient Descent (with momentum)
◦ RMSProp
◦ Adam, AdaDelta
Fixes for vanishing and exploding gradient problems:
◦ LSTM, GRU units (for vanishing gradients)
◦ Gradient clipping (for exploding gradients)
Methods to prevent overfitting:
◦ Regularization
◦ Dropout
◦ Adversarial Networks
Computation resources:
◦ GPU computing
◦ HPC, MPI
Larger datasets:
◦ ImageNet (for image classification)
◦ Google Billion Words Corpus (for auto-generated word vectors)
Methods to gain sparsity:
◦ Dropout
◦ ReLU, MaxOut activations
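Two of the training fixes listed above, gradient descent with momentum and gradient clipping, can be sketched together. This is an illustrative version, not any particular library's implementation. The objective f(w) = w^2, the hyperparameters, and the function names are assumptions made for the example, and clipping is done by rescaling the gradient to a maximum L2 norm.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def momentum_step(w, velocity, grad, lr=0.1, mu=0.9):
    """Classical momentum: the velocity accumulates past gradients."""
    velocity = [mu * v - lr * g for v, g in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Minimize f(w) = w^2 starting far from the optimum at w = 0.
w, v = [3.0], [0.0]
for _ in range(300):
    grad = [2.0 * w[0]]                      # df/dw
    grad = clip_by_norm(grad, max_norm=1.0)  # guards against very large steps
    w, v = momentum_step(w, v, grad, lr=0.1, mu=0.9)
```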
Machine Learning to Neural Networks
MACHINE LEARNING METHODS
Discriminative Models
◦ Linear Regression
◦ Logistic Regression
◦ SVM
◦ CRF
Generative Models
◦ HMM
◦ LDA
◦ Collaborative Filtering
Unsupervised
◦ K-means
◦ Hierarchical Clustering
NEURAL NETWORK METHODS
Discriminative Models
◦ ANN with squared error loss
◦ ANN with softmax layer and log loss
◦ ANN with hinge loss
◦ RNN with prediction at end
Generative Models
◦ RNN generating sequences
◦ RBMs
Unsupervised
◦ Auto Encoders
◦ RBMs
◦ Deep Belief Networks
LITTLE MATH
OPTIONAL
Loss Functions & Optimization
RMSProp, Adagrad, and Adadelta are used in high-performance networks.
The idea is:
For some f(W, X), minimize the loss between y and f(W, X).
The mismatch is measured by a loss function.
The major one is log-loss.
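A rough sketch of what makes Adagrad and RMSProp different from plain gradient descent: each weight gets its own step size, scaled by a running measure of its past squared gradients. The update forms below are the commonly stated ones. The function names, default hyperparameters, and `eps` placement are assumptions for illustration and differ between libraries.

```python
import math

def adagrad_step(w, cache, grad, lr=0.1, eps=1e-8):
    """Adagrad: divide by the root of the SUM of all past squared gradients."""
    cache = [c + g * g for c, g in zip(cache, grad)]
    w = [wi - lr * g / (math.sqrt(c) + eps) for wi, g, c in zip(w, grad, cache)]
    return w, cache

def rmsprop_step(w, cache, grad, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: same idea, but with a decaying AVERAGE of squared gradients."""
    cache = [decay * c + (1.0 - decay) * g * g for c, g in zip(cache, grad)]
    w = [wi - lr * g / (math.sqrt(c) + eps) for wi, g, c in zip(w, grad, cache)]
    return w, cache

# On the first Adagrad step the move is about lr regardless of gradient scale.
w_small, _ = adagrad_step([1.0], [0.0], [2.0])
w_large, _ = adagrad_step([1.0], [0.0], [200.0])
w_rms, _ = rmsprop_step([1.0], [0.0], [2.0])
```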
Open Questions
Autoencoders for text data
AI Question Answering
Sarcasm Sentiment analysis
Collaborate
SemEval 2016 is coming up, with tasks such as:
◦ Sentiment analysis
◦ Question Answering
◦ http://alt.qcri.org/semeval2016/task4/
the didbend first water.bond warmerial in roid.the lagents to duttersprantessi harkian, arow ... with enkyber fanter-indoug tood cool... the summer small winding skates the moutledday markedgly searl.doupy of it your sold all ic house bat she - etther of thouder fol my old starsgream trains ond cat out the song"saurand shide of gres dewill a now centher mother of at, the creaking passs cool sunsing sapcingatale dowthing aland suncaking in.do a back-end stliagh in in ithicn like into whereso to the touther pate patin on' gal on the aloopmesaterfleoss the sound i lean
I andhe had begetter by His husband, brought unto a hundred cruelings,shrouded me, pierced Arjuna, on thy foe, proud directions and urged bySatyaki in the heart as the filled hill with his flying poison. Untothy host, called Earth, recognise him, by means of her abode, 'Thou shaltconquer thy car is in all kinds of righteousness. Whatever I is filledwith respect. In thee enjoyment will iniunto that Kshatriya enjoys verilyto that as to him that I have now take for me of Kuru's race.'"
SECTION LXXXVIII
"Drona said, 'Renounced still, thou art my great science and foreholder,thou wilt, O best of men, go now, may be said to be Pandu. Persons offooly acts also may injury With regions of entirety? Thou art the deterioryfrom this point of desire. There should be and enjoyeth rites defeatedby the world meet with without injury.
THANK YOU =)
MANY OF THE RESOURCES USED CAN BE FOUND AT: HTTP://SHUBHANSHU.COM/DEEPLEARNING.HTML