Scalable Deep Learning Platform on Spark in Baidu
Weide Zhang, Kyle Tsai, Jiang Wang
Baidu USDC
Background
• Spark
  – Batch/streaming processing, Spark SQL, MLlib
• Deep learning has many use cases in Baidu and has shown significant quality improvements in:
  – Image retrieval ranking
  – Ads CTR prediction
  – Machine translation
  – Speech recognition
• Goal: make distributed deep learning usable from Spark
Typical Training Data Sizes in Baidu
• Image recognition: 100 million samples
• OCR: 100 million samples
• Speech: 10 billion samples
• CTR: 100 billion samples
• Data volume has grown every year since 2012
Baidu Spark One
Paddle
• Parallel Asynchronous Distributed Deep Learning Library
  – Distributed parameter servers with synchronous or asynchronous parameter updates
  – Multi-GPU and multi-CPU training
  – Sparse model support
  – Easy-to-understand API for adding new layers
  – Rich feature set for deep learning use cases, especially NLP
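To make the synchronous/asynchronous distinction concrete, here is a toy parameter-server sketch in plain Python. This is an illustration of the two update modes only, not Paddle's actual API: in asynchronous mode each trainer's gradients are applied as they arrive, while in synchronous mode gradients from all trainers are averaged first.

```python
import threading

# Toy parameter server illustrating synchronous vs. asynchronous updates.
# Hypothetical sketch; not Paddle's real implementation.
class ParameterServer:
    def __init__(self, dim):
        self.params = [0.0] * dim
        self.lock = threading.Lock()

    def push_async(self, grads, lr=0.1):
        # Asynchronous: apply one trainer's gradients immediately,
        # without waiting for the other trainers.
        with self.lock:
            for i, g in enumerate(grads):
                self.params[i] -= lr * g

    def push_sync(self, grads_from_all_trainers, lr=0.1):
        # Synchronous: average gradients across all trainers first,
        # then apply a single update.
        n = len(grads_from_all_trainers)
        avg = [sum(gs) / n for gs in zip(*grads_from_all_trainers)]
        with self.lock:
            for i, g in enumerate(avg):
                self.params[i] -= lr * g

    def pull(self):
        with self.lock:
            return list(self.params)
```

Asynchronous updates avoid the straggler problem at the cost of applying slightly stale gradients; synchronous updates keep all trainers consistent.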
Deep Learning Options Comparison

|                            | Caffe  | TensorFlow             | Torch                  | Paddle        |
|----------------------------|--------|------------------------|------------------------|---------------|
| Distributed training       | Yes    | Yes                    | No                     | Yes           |
| Communication cost         | Medium | High                   | N/A                    | Medium to low |
| Easy to customize and code | Yes    | Steeper learning curve | Steeper learning curve | Yes           |
| Sparse model support       | No     | Yes                    | Yes                    | Yes           |
| Area of focus              | Vision | All                    | All                    | All           |
| Integration with Spark     | Yes    | No                     | No                     | Yes           |
High-Level Goals
• Implement Spark ML abstractions so users can train deep learning models with minimal code changes
• Leverage Paddle's native training and parameter-server mechanisms, scheduled within Spark deep learning jobs
• Handle multi-tenancy and heterogeneity
• Parallelize hyperparameter selection
• Support both batch and streaming learning
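Hyperparameter selection parallelizes naturally because each configuration is an independent training job. The sketch below shows the shape of a grid search; `evaluate()` is a hypothetical stand-in for a full train-and-validate run, and on a Spark cluster the grid would typically be distributed as an RDD (e.g. `sc.parallelize(grid).map(...)`) rather than mapped locally.

```python
from itertools import product

# Toy grid search over (learning rate, hidden size). Each pair is an
# independent job, so the grid can be farmed out to a cluster as-is.
def evaluate(lr, hidden_size):
    # Hypothetical validation loss; a real job would train a model.
    return abs(lr - 0.01) + abs(hidden_size - 256) / 1000.0

def grid_search(lrs, hidden_sizes):
    grid = list(product(lrs, hidden_sizes))
    # Local map(); on Spark this would be a distributed map over the grid.
    scores = map(lambda p: (evaluate(*p), p), grid)
    return min(scores)  # lowest validation loss wins

best_loss, best_params = grid_search([0.1, 0.01], [128, 256])
```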
Paddle on Spark
• Spark / YARN: resource management and the training environment
• Paddle / parameter server: deep learning algorithms and training communication
Training Data Flow
System Architecture
Spark ML's Abstraction
• Train
• Predict
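Spark ML splits these two operations across the Estimator/Transformer pattern: `Estimator.fit()` consumes a DataFrame and returns a Model (a Transformer), and `Model.transform()` appends predictions. The sketch below mirrors that contract in plain Python so it runs without a Spark cluster; `MeanEstimator` is a made-up stand-in, but a Paddle-backed estimator would follow the same fit/transform shape.

```python
# Minimal sketch of Spark ML's Estimator/Transformer pattern,
# using plain Python dicts in place of DataFrame rows.
class MeanEstimator:
    """Estimator: fit() learns a parameter from the training data."""
    def fit(self, rows):
        mean = sum(r["label"] for r in rows) / len(rows)
        return MeanModel(mean)

class MeanModel:
    """Transformer: transform() adds a prediction to each row."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [dict(r, prediction=self.mean) for r in rows]

train_rows = [{"label": 1.0}, {"label": 3.0}]
model = MeanEstimator().fit(train_rows)        # train
scored = model.transform([{"label": 0.0}])     # predict
```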
Simple Parameters Are Not Enough
(Figure: a CNN for image classification, with convolution, pooling, and fully connected layers feeding a cost layer, mapping images to labels such as "bird" and "cat". The "parameter" for such a model is the whole network topology, not a handful of scalars.)
Use Paddle As Estimator
Code your Configuration
Example of Caffe prototxt
Design Decisions
• Spark ML-compatible API
  – Being compatible with Spark ML matters more than being implemented inside Spark
• Code-level configuration
  – Easy and flexible
  – Manually written configuration files are error-prone
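The advantage of code-level configuration over a static prototxt file is that code can use variables, loops, and functions. The layer helpers below are hypothetical, not Paddle's real config API; the point is only that a network description built in code stays consistent by construction, where a hand-edited file would repeat itself.

```python
# Hypothetical code-level network configuration: each layer is a dict,
# and the topology is built programmatically.
def fc_layer(name, input_layer, size):
    return {"name": name, "type": "fc", "input": input_layer, "size": size}

def build_mlp(input_dim, hidden_sizes):
    layers = [{"name": "input", "type": "data", "size": input_dim}]
    prev = "input"
    # A loop over layer sizes: trivial in code, copy-paste in a prototxt.
    for i, size in enumerate(hidden_sizes):
        layer = fc_layer("fc%d" % i, prev, size)
        layers.append(layer)
        prev = layer["name"]
    return layers

net = build_mlp(784, [256, 256, 10])
```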
PADDLE: Scalable Deep Learning Platform at Baidu
Sharded Parameter Server
• One parameter server and one trainer co-locate on each machine.
• Parameters are sharded across servers, not replicated.
• All-to-all communication.
• Our environment:
  – 4 GPUs per machine
  – 4–10 machines
  – All machines on one switch
  – Reliable data center
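Sharding means each parameter block lives on exactly one server, chosen deterministically, so every trainer knows which server to contact for each block. The sketch below is a toy illustration of that partitioning (a real system would also balance shards by size):

```python
# Toy parameter sharding: assign each named parameter block to exactly
# one of num_servers shards via a deterministic hash. No replication.
def shard_for(param_name, num_servers):
    # Simple deterministic hash over the parameter name.
    return sum(ord(c) for c in param_name) % num_servers

def build_shards(param_names, num_servers):
    shards = {s: [] for s in range(num_servers)}
    for name in param_names:
        shards[shard_for(name, num_servers)].append(name)
    return shards

shards = build_shards(["conv1.w", "conv1.b", "fc1.w", "fc1.b"], 4)
```

Because the assignment is a pure function of the name, trainers need no directory service: each one recomputes the mapping locally, which is what makes the all-to-all pattern workable.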
GPU Ring Synchronization
• Each parameter only needs to traverse the slow connection twice:
  – once for the reduce pass
  – once for the scatter pass
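The two passes correspond to the classic ring all-reduce pattern: a reduce-scatter pass that accumulates partial sums around the ring, then an all-gather pass that circulates the reduced chunks back out, so each chunk crosses each link roughly twice. The sketch below simulates the rounds in plain Python (one scalar per chunk, number of chunks equal to the number of workers) rather than over real GPU interconnects:

```python
# Ring all-reduce (elementwise sum) simulated in plain Python.
# Assumes each worker's vector has exactly n elements (one chunk per worker).
def ring_allreduce(vectors):
    n = len(vectors)
    data = [list(v) for v in vectors]   # data[w][c]: worker w's copy of chunk c
    # Phase 1, reduce-scatter: after n-1 steps, worker w holds the
    # fully summed value of chunk (w + 1) % n.
    for step in range(n - 1):
        for w in range(n):
            c = (w - step) % n          # chunk worker w forwards this step
            data[(w + 1) % n][c] += data[w][c]
    # Phase 2, all-gather: circulate the reduced chunks so every worker
    # ends up with the complete summed vector.
    for step in range(n - 1):
        for w in range(n):
            c = (w + 1 - step) % n      # reduced chunk worker w forwards
            data[(w + 1) % n][c] = data[w][c]
    return data

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Each of the n chunks moves across each ring link once per phase, which is why per-parameter traffic on the slow links stays constant as the number of workers grows.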
ImageNet-Scale Experiments
(Chart: time per 100 batches, in seconds, versus number of machines from 1 to 5, comparing TCP and RDMA transports.)
• AlexNet on ImageNet
• Batch size = 64
• Each machine has 4 K10 GPUs
Sparse Training
Sparse Training Experiment
(Chart: time per 100 batches, in seconds, versus number of nodes at 1, 2, 4, 8, and 16, comparing non-sparse and sparse training.)
• 1,451,594-dimensional sparse feature
• Embedded to 128d, 96d, 96d, and 128d
• Ranking cost on top
• Batch size = 128
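The intuition behind the speedup: with a ~1.45M-dimensional sparse input, each batch touches only a handful of embedding rows, so the trainer only needs to compute, communicate, and apply gradients for those rows instead of the full table. A toy illustration (not Paddle's implementation):

```python
# Sparse update: apply gradients only to the embedding rows that
# actually appeared in the batch, leaving the rest of the table untouched.
def sparse_update(table, sparse_grads, lr=0.1):
    for row_id, grad in sparse_grads.items():
        row = table[row_id]
        table[row_id] = [w - lr * g for w, g in zip(row, grad)]
    return len(sparse_grads)  # rows touched, vs. len(table) rows total

# A 1000-row stand-in for the 1,451,594-row embedding table.
table = {i: [0.0, 0.0] for i in range(1000)}
touched = sparse_update(table, {3: [1.0, 2.0], 42: [0.5, 0.5]})
```

The same idea applies to communication: only the touched rows' gradients need to be pushed to the parameter servers, which is what keeps the sparse curve flat as nodes are added.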
Flexible and Efficient RNN Implementation
RNN Performance Comparison with TensorFlow
(Chart: time per batch, in milliseconds, versus RNN hidden size at 200, 650, and 1500, comparing PADDLE and TensorFlow.)
• Machine translation
• Batch size = 64
• Embedding size = hidden size
• Dictionary size = 10,000
Distributed Training Performance: Character-Level Neural Machine Translation
• 8 machines, each with 4 K40 GPUs
• RNN encoder layers: 9; RNN decoder layers: 7
• Word embedding size: 256; RNN size: 512
• Batch size: 25,000 characters
• Speed:
  – Attention model: 25 minutes / 100 batches
  – Encoder-decoder model: 9 minutes / 100 batches
Future Work
• Streaming training
• Dynamic trainer allocation
• Fair scheduler
• Model serving