Deep Learning on Spark
Learning is about acquiring the ability to discriminate.
- Memorization
- Overfitting
- Underfitting
- Generalization
© Satyendra Rana 2
Noise-free examples and queries:

| Input 1 | Input 2 | Output |
|---|---|---|
| 2 | 2 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 6 |
| 2 | 5 | 7 |
| 2 | 3 | ? |
| 2 | 7 | ? |
| 3 | 4 | ? |

The same task with noise:

| Input 1 | Input 2 | Output |
|---|---|---|
| 2 | 2 | 4.05 |
| 2 | 3 | 4.98 |
| 4 | 2 | 5.95 |
| 2 | 5 | 7.06 |
| 2 | 3 | ? |
| 2 | 7 | ? |
| 3 | 4 | ? |
Machine Learning
Data In, Wisdom Out
- Square Boxes
- Thin Rectangular Boxes
- Round Boxes

Q1: Which type of box should we look for? (the Computational Architecture)
Q2: Having picked the box type, how do we find the right box? (the Learning Method)
Deep (Machine) Learning
Data In, Wisdom Out
Type of box? → Computational Architecture
Right box? → Learning Method
Discrimination ability? → Finer discrimination (non-linearity)
Network of Neurons (aka Neural Network or NN)
Natural Language Generation
Machine Translation
Automatic Image Captioning
Automatic Colorization of Gray-Scale Images
[Panels: Input Image, Automatically Colorized, Ground-Truth]
Source: Nvidia news
Ping Pong Playing Robot
Source: Omron Automation Lab, Kyoto, Japan
Deep learning for the sight-impaired (and also for the sight-endowed)
Neurons and Synapses
Synapse count: Adult Brain – 100 Trillion; Infant Brain – 1 Quadrillion
Model of a Neuron & Artificial Neural Networks
[Diagram: inputs entering a neuron, each connection carrying a weight w0, w1, w2, w3, w4]
Hyper-parameters: number of layers; type & number of neurons in each layer
Parameters: weights (one for each connection)
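The neuron model above can be sketched in a few lines of Scala: a weighted sum of the inputs plus a bias (w0), passed through a non-linear activation. This is an illustrative sketch; the names `NeuronSketch`, `neuron`, and `sigmoid` are not from any particular library.

```scala
// Sketch of the neuron model: weighted sum of inputs plus bias, then a
// non-linear activation (here the classic sigmoid).
object NeuronSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // x: inputs x1..xN, w: weights w1..wN, w0: bias weight
  def neuron(x: Seq[Double], w: Seq[Double], w0: Double): Double =
    sigmoid(w0 + x.zip(w).map { case (xi, wi) => xi * wi }.sum)
}
```

The sigmoid squashes the weighted sum into (0, 1); it is this non-linearity that gives networks of neurons their finer discrimination ability.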
Multi-layered Neural Network

Synapse scale: typical NNs – 1–10 million; Google Brain – 1 billion; more recent ones – 10 billion

Given a fixed number of neurons, spreading them across more layers (a deep structure) is more effective than across fewer layers (a shallow structure).
Given a fixed number of layers, more neurons per layer is better than fewer.
Deep Neural Networks are powerful, but they must also be trainable to be useful.

Different kinds of Deep Neural Networks:
- Feed-Forward NNs
- Recurrent NNs
- Recursive NNs
- Convolutional NNs
How does a Neural Network Learn? Parameters.
The learning problem is to find, among all possible combinations of parameter values, the one that gives the most accurate output (minimum error) on average over all possible inputs.
Feed Forward Neural Network (FNN)
[Diagram: a multi-layer network whose connection weights W111 … W344 feed forward into an output and a Loss Function]
Credit Assignment Problem: Which modifiable components of a learning system are responsible for its success or failure? How can the responsible components be modified to improve the system?
How do I change the weights (parameters) to make the NN exhibit the desired behavior?
Supervised Learning
Passing the Buck Example: Fine-Tuning a Sales Team's Performance
[Diagram: the same weighted network, now with error flowing backwards from the Loss Function]
Backward Propagation: propagating the error backwards from layer to layer, so that each layer can tweak its weights to account for its share of the responsibility. Each layer receives a (direction, amount) correction signal.
Stochastic Gradient Descent (SGD)

Forward pass: starting from the input X_1, each layer computes its output from the previous layer's output and its own weights, e.g. X_{n-2} = F_{n-2}(X_{n-3}, W_{n-2}), X_{n-1} = F_{n-1}(X_{n-2}, W_{n-1}), X_n = F_n(X_{n-1}, W_n). The cost function C(X_n, Y) compares the network output X_n against the target Y, giving the error E.

Backward pass: writing DF(X, F) for the derivative of F with respect to X, the correction directions propagate back through the layers:

Direction_n = DF(X_n, C(X_n, Y))
Direction_{n-1} = Direction_n * DF(X_{n-1}, F_n(X_{n-1}, W_n))
Direction_{n-2} = Direction_{n-1} * DF(X_{n-2}, F_{n-1}(X_{n-2}, W_{n-1}))
Direction_{n-3} = Direction_{n-2} * DF(X_{n-3}, F_{n-2}(X_{n-3}, W_{n-2}))
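The forward/backward recursion above can be checked numerically on a toy chain of scalar layers. In this sketch each layer is F_k(x, w_k) = tanh(w_k · x) and the cost is squared error; `forward` and `directions` are hypothetical names for illustration, and the derivatives DF are written out by hand.

```scala
object BackpropSketch {
  // Forward pass: X_k = F_k(X_{k-1}, W_k) with F_k(x, w) = tanh(w * x).
  def forward(x0: Double, w: Array[Double]): Array[Double] = {
    val xs = new Array[Double](w.length + 1)
    xs(0) = x0
    for (k <- w.indices) xs(k + 1) = math.tanh(w(k) * xs(k))
    xs
  }

  // Backward pass: Direction_k = dE/dX_k, for cost C(X_n, Y) = (X_n - Y)^2.
  def directions(xs: Array[Double], w: Array[Double], y: Double): Array[Double] = {
    val n = w.length
    val d = new Array[Double](n + 1)
    d(n) = 2.0 * (xs(n) - y)                   // Direction_n = DF(X_n, C(X_n, Y))
    for (k <- n - 1 to 0 by -1) {
      val t = math.tanh(w(k) * xs(k))          // equals xs(k + 1)
      d(k) = d(k + 1) * (1.0 - t * t) * w(k)   // chain rule, one layer at a time
    }
    d
  }
}
```

Each Direction_k is the derivative of the error with respect to X_k; comparing the innermost direction against a finite-difference estimate of the loss confirms the chain rule is applied correctly.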
Climbing down Mountains with Zero Gravity
[Diagram: a loss landscape, with "You are here" at the current position and the "Base camp" at the minimum]
Key notions: Steepest Descent, Learning rate, Epoch
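The mountain-descent picture, as a sketch in Scala: steepest descent on a one-dimensional "valley" f(w) = (w − 3)², starting from "you are here" and stepping toward the "base camp" minimum at w = 3. The step size is the learning rate, and each pass of the loop plays the role of an epoch. All names here are illustrative.

```scala
object DescentSketch {
  // Steepest descent on f(w) = (w - 3)^2. Returns the final position.
  def descend(start: Double, learningRate: Double, epochs: Int): Double = {
    var w = start
    for (_ <- 1 to epochs) {
      val gradient = 2.0 * (w - 3.0)  // df/dw: points uphill
      w -= learningRate * gradient    // step downhill, scaled by the learning rate
    }
    w
  }
}
```

With a small learning rate the walk converges to the base camp; with too large a rate each step overshoots the valley and the iterates diverge, which is why the learning rate is a critical hyper-parameter.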
What changed since the '80s?
[Timeline: Early NN Activity (roughly 1970–1990) vs. Deep NN Activity (2010 onwards)]
Then: slow computers, small data sets, training issues
Now: faster computers, big data
Big Data & Deep Learning Symbiosis
Reaching Saturation Point in Learning
I don’t want to learn anymore.
Vanishing (or Unstable) Gradient Problem
(The gradient at a layer is a product involving the gradients of the layers between it and the output, so it can shrink or blow up as depth grows.)
What is the fix?
1. Random initialization of weights
2. Pre-training of layers
3. Choice of activation function, e.g. the Rectified Linear Unit (ReLU)
4. Don't use SGD
5. LSTM
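A quick sketch of why the choice of activation function matters: the backward pass multiplies one derivative factor per layer, and the sigmoid's derivative is at most 0.25, so the product shrinks geometrically with depth, while ReLU's derivative is exactly 1 on the active side. The names below are hypothetical, for illustration only.

```scala
object VanishingSketch {
  def sigmoidGrad(z: Double): Double = {
    val s = 1.0 / (1.0 + math.exp(-z))
    s * (1.0 - s)            // at most 0.25 (attained at z = 0)
  }

  def reluGrad(z: Double): Double = if (z > 0) 1.0 else 0.0

  // Product of one derivative factor per layer, as in the backward pass.
  def gradientThrough(depth: Int, grad: Double => Double, z: Double): Double =
    (1 to depth).map(_ => grad(z)).product
}
```

Across 20 layers the sigmoid product is about 0.25^20 ≈ 10^-12, while the ReLU product stays at 1: the gradient vanishes in one case and survives in the other.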
Implementation of Deep Learning: It's All About Scaling
1. Implementing a neuron
2. Implementing a layer
3. Composing layers (building the network)
4. Implementing a training (learning) iteration, aka epoch
5. Learning hyper-parameters
Implementation of Neuron / Layer
Neuron Abstraction
Layer Abstraction
Fast Matrix / Tensor Computation Libraries
• Exploiting multi-threaded multi-core architectures
• GPU Acceleration
Activation Functions; Loss Functions
[Diagrams: a single-node shared-memory architecture across Nodes 1–3, and a GPU-accelerated single-node architecture where each GPU has its own memory]
Composing Layers / Building a Neural Network

1. Specifying the layer composition (network specification)

SparkML:

```scala
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array(784, 300, 100, 10))
  .setBlockSize(128)
```

SparkNet:

```scala
val netparams = NetParams(
  RDDLayer("data", shape = List(batchsize, 1, 28, 28)),
  RDDLayer("label", shape = List(batchsize, 1)),
  ConvLayer("conv1", List("data"), kernel = (5, 5), numFilters = 20),
  PoolLayer("pool1", List("conv1"), pool = Max, kernel = (2, 2), stride = (2, 2)),
  ConvLayer("conv2", List("pool1"), kernel = (5, 5), numFilters = 50),
  PoolLayer("pool2", List("conv2"), pool = Max, kernel = (2, 2), stride = (2, 2)),
  LinearLayer("ip1", List("pool2"), numOutputs = 500),
  ActivationLayer("relu1", List("ip1"), activation = ReLU),
  LinearLayer("ip2", List("relu1"), numOutputs = 10),
  SoftmaxWithLoss("loss", List("ip2", "label"))
)
```
2. Allocating layers to nodes
Speeding up the Training Iteration: Distributed Implementation of SGD

[Diagram: a Master and Executors 1..n, each executor backed by BLAS, exchanging parameters W_k / W_{k+1} in iteration k]

Step 1: Executors get the parameters W_k from the Master
Step 2: Each executor computes a gradient
Step 3: Executors send their gradients to the Master
Step 4: The Master computes W_{k+1} from the gradients

BLAS: Basic Linear Algebra Subprograms, used in Spark through netlib-java
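The four steps can be simulated on a single machine (the names below are illustrative, not the actual Spark internals): each "executor" computes a gradient on its data shard from the current weights W_k, and the "master" averages the gradients and steps to W_{k+1}.

```scala
object DistributedSgdSketch {
  // Step 2 on one "executor": average gradient of (w*x - y)^2 over its shard.
  def shardGradient(w: Double, shard: Seq[(Double, Double)]): Double =
    shard.map { case (x, y) => 2.0 * (w * x - y) * x }.sum / shard.size

  // One full iteration k -> k+1: steps 1-3 on executors, step 4 on the master.
  def step(w: Double, shards: Seq[Seq[(Double, Double)]], learningRate: Double): Double = {
    val gradients = shards.map(shardGradient(w, _))      // executors compute in parallel
    w - learningRate * (gradients.sum / gradients.size)  // master averages and updates
  }
}
```

In real Spark the shard gradients would be computed by executors (with BLAS doing the matrix work) and combined on the driver; the arithmetic of the update is the same.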
MultilayerPerceptronClassifier() in Spark ML

Scala code:

```scala
val digits: DataFrame = sqlContext.read.format("libsvm").load("/data/mnist")
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(Array(784, 300, 100, 10))
  .setBlockSize(128)
val model = mlp.fit(digits)
```

The layer array reads: 784 features (input), a hidden layer with 300 neurons, a hidden layer with 100 neurons, and 10 classes (output).
SparkNet: Training Deep Networks in Spark

[Diagram: a Master coordinating Executors 1–4, each with a GPU running Caffe and its own data shard (Data Shards 1–4)]

1. The Master broadcasts the model parameters
2. Each executor runs SGD on a mini-batch of its shard for a fixed time / number of iterations
3. Each executor sends its parameters to the Master
4. The Master receives the parameters from the executors
5. The Master averages them to get the new parameters
Distributed Cross Validation
[Diagram: Model #1, Model #2, and Model #3 training in parallel; the Best Model is selected]
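A miniature, single-machine version of the picture (illustrative names only; on Spark each candidate would train on its own executor): train one model per hyper-parameter setting, here the learning rate of a 1-D linear fit, score each on held-out data, and keep the best.

```scala
object ModelSelectionSketch {
  // Train w for y ≈ w * x with per-sample SGD; return (learningRate, validation MSE).
  def trainAndScore(learningRate: Double,
                    train: Seq[(Double, Double)],
                    valid: Seq[(Double, Double)]): (Double, Double) = {
    var w = 0.0
    for (_ <- 1 to 30; (x, y) <- train) w -= learningRate * 2.0 * (w * x - y) * x
    val mse = valid.map { case (x, y) => (w * x - y) * (w * x - y) }.sum / valid.size
    (learningRate, mse)
  }

  // Train one candidate model per hyper-parameter value; keep the best.
  def bestLearningRate(candidates: Seq[Double],
                       train: Seq[(Double, Double)],
                       valid: Seq[(Double, Double)]): Double =
    candidates.map(trainAndScore(_, train, valid)).minBy(_._2)._1
}
```

Because the candidate trainings are independent, they parallelize trivially, which is exactly what the distributed cross-validation picture exploits.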
Apache SINGA: A General Distributed Deep Learning Platform
Why "Deep Learning" on Spark?
- "Sorry, I don't have a GPU / GPU cluster": a 3-to-5-node Spark cluster can be as fast as a GPU
- Most of my application and data already resides on a Spark cluster: model training integrates with existing data-processing pipelines
- High-throughput loading and pre-processing of data, and the ability to keep data cached between operations
- Hyper-parameter learning
- Poor man's deep learning
- It's simply fun …