IBM, NVIDIA, and Client under NDA
Internet Services · Medicine · Media & Entertainment · Security & Defense · Autonomous Machines
➢ Medicine: cancer cell detection, diabetic grading, drug discovery
➢ Autonomous machines: pedestrian detection, lane tracking, traffic sign recognition
➢ Security & defense: face recognition, video surveillance, cyber security
➢ Media & entertainment: video captioning, content-based search, real-time translation
➢ Internet services: image/video classification, speech recognition, natural language processing
* Bidirectional bandwidths; GPU links 40+40 GB/sec
PowerAI: World’s Fastest Platform for Enterprise
Deep Learning Frameworks & Building Blocks
Accelerated Servers and Infrastructure for Scaling
7 © 2016 IBM Corporation
8
Data centers near every major metro area enable low-latency connectivity to cloud infrastructure.
9
Recently Announced!
10
Leveraging Cloud Computing for Deep Learning
Preprocess → Train → Validate/Test
● Advantages of training models in the cloud
● Effective training on the Rescale platform
● Leveraging IBM Cloud and P100s
12
Scaling Up - Training with On Demand GPUs
Preprocess → Train → Validate/Test
● Higher-capacity GPUs
● Multi-GPU
● Multi-node
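The multi-GPU option above rests on one core idea: shard each batch across devices, compute gradients concurrently, and average them. A minimal pure-Python sketch of that synchronous data-parallel step, with threads standing in for GPUs and a toy linear model (the `gradient` function and all numbers are illustrative, not from the deck):

```python
from concurrent.futures import ThreadPoolExecutor

def gradient(w, shard):
    # Gradient of the loss 0.5*(w*x - y)^2 averaged over one data shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, n_workers=4, lr=0.01):
    # Synchronous data parallelism: shard the batch, compute shard
    # gradients concurrently, then take the size-weighted average.
    shards = [s for s in (batch[i::n_workers] for i in range(n_workers)) if s]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        grads = list(pool.map(lambda s: gradient(w, s), shards))
    avg_grad = sum(g * len(s) for g, s in zip(grads, shards)) / len(batch)
    return w - lr * avg_grad

# Fit w toward the true slope 2.0 on a toy batch of (x, 2x) pairs.
batch = [(float(x), 2.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(100):
    w = data_parallel_step(w, batch)
print(round(w, 3))  # → 2.0
```

Because the shard gradients are size-weighted, the step is identical to single-device gradient descent on the whole batch; real frameworks do the same averaging with an all-reduce across GPUs or nodes.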
13
Cloud Provides Choice in Sizing GPU Resources
GPU resources | K80 | P100 | P100s
Workflow use case | Exploratory data analysis, debugging models | Big models, big inputs, big batches | Big + data parallel
Examples | OpenCV, Numba, small batch sizes | ResNet152, batch size=64 | Google Brain Grasping Dataset
14
Scaling Out - Model Design Exploration
Preprocess → Train → Validate/Test, one cluster per configuration:
● ResNet101, batch size=128, learning rate=0.01
● ResNet101, batch size=256, learning rate=0.01
● ResNet152, batch size=256, learning rate=0.01
● ResNet152, batch size=128, learning rate=0.1
. . .
Dynamically allocate many GPU clusters for large parameter sweeps
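A sweep like this is just the Cartesian product of a template's value lists. A hedged sketch of expanding such a template into one job spec per cluster (the `template` dict mirrors the slide's grid but is otherwise hypothetical):

```python
from itertools import product

# Hypothetical sweep template mirroring the configurations on this slide.
template = {
    "model": ["ResNet101", "ResNet152"],
    "batch_size": [128, 256],
    "learning_rate": [0.01, 0.1],
}

def expand(template):
    # Cartesian product of the value lists: one job spec per GPU cluster.
    keys = list(template)
    return [dict(zip(keys, combo)) for combo in product(*template.values())]

jobs = expand(template)
print(len(jobs))  # → 8
print(jobs[0])    # → {'model': 'ResNet101', 'batch_size': 128, 'learning_rate': 0.01}
```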
15
Cloud scalability will cut result turnaround time
On-premise: a fixed GPU job queue, so jobs J1–J4 run one after another.
Potential with cloud: J1–J4 submitted concurrently on dynamically allocated GPU clusters; multi-GPU model scalability shortens each individual run further.
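The turnaround gain in the figure is simple arithmetic: a queue serializes job runtimes, while one on-demand cluster per job overlaps them. Illustrative numbers only:

```python
# Illustrative numbers only: four equal jobs on one on-premise GPU queue
# vs. one on-demand cloud cluster per job.
job_hours = [6, 6, 6, 6]

sequential_makespan = sum(job_hours)  # on-premise: jobs wait in the queue
concurrent_makespan = max(job_hours)  # cloud: all four clusters run at once

print(sequential_makespan, concurrent_makespan)  # → 24 6
```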
16
Range of Storage for Large Datasets
Train → Validate/Test
● Local storage or distributed FS for the training set
● Object storage for the different data preparations in an experiment
● Archival (cold) storage for reproducibility and compliance
Trade-off axis: I/O performance and $$$
17
Centralized location to break silos, minimize file transfer, and allow for data and tool connectivity
Data lifecycle: Generated → Stored → Managed → Processed → Shared
● Cross-organization collaboration
● Efficient cross-geo transfer
18
Challenges: Data Transfer for On-Demand Resources (On-premise ↔ Cloud)
● Bulk import, streaming updates into object storage
● Sync to cluster on training start
● Sync back on cluster teardown
19
Mitigations:
● Multithreaded transfer tools
● Streaming from object storage
● DFS
● WAN accelerators
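The idea behind multithreaded transfer tools is to overlap many object downloads so per-object round-trip latency is hidden. A minimal sketch with Python's `concurrent.futures`; `fetch` is a stand-in, not a real object-storage client, and the key names are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # Stand-in for downloading one object; a real client would issue a
    # GET against object storage here.
    return key, len(key) * 1024  # pretend payload size in bytes

keys = [f"train/shard-{i:05d}.tfrecord" for i in range(32)]

# Overlapping many transfers hides per-object round-trip latency, the
# same idea behind multithreaded transfer tools and WAN accelerators.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(fetch, keys))

print(len(results))  # → 32
```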
20
Leveraging Cloud Computing for Deep Learning
Preprocess → Train → Validate/Test
● Advantages of training models in the cloud
● Effective training on the Rescale platform
● Leveraging IBM Cloud and P100s
21
Rescale’s ScaleX on IBM Cloud: Turnkey DL Solution
Full-stack solution, seamless SaaS deployment
● ScaleX SaaS: purpose-built portals and intuitive workflows for engineers & data scientists, IT admins & managers, and partners
● ScaleX SW Library: 180+ turnkey SW solutions (open source, commercial ISV, custom in-house)
● ScaleX Platform: automated HPC IT deployment; seamless hybrid, multi-core environment
● IBM Cloud: bare metal servers and virtual servers, ± customer’s existing on-premise HW
22
Rescale’s ScaleX on IBM Cloud: Turnkey DL Solution
Increasing accessibility, usability, and utilization
Workflow: (1) SaaS access → (2) specify input file → (3) select SW → (4) select compute → (5) run job
Stack: ScaleX SaaS, ScaleX SW Library (180+ turnkey SW solutions: open source, commercial ISV, custom in-house), ScaleX Platform, IBM Cloud bare metal and virtual servers, ± customer’s existing on-premise HW
23
Training Deep Learning Models on Rescale
● Case study: ILSVRC distributed TensorFlow training

imagenet_distributed_train --batch_size=64 --data_dir=$DATADIR \
    --train_dir=out/train --job_name=worker --task_id=1 \
    --ps_hosts="node1:2220...node16:2220" \
    --worker_hosts="node1:2222,node1:2223...node16:2240"
…
imagenet_distributed_train --batch_size=64 --data_dir=$DATADIR \
    --train_dir=out/train --job_name=ps --task_id=1 \
    --ps_hosts="node1:2220...node16:2220" \
    --worker_hosts="node1:2222,node1:2223...node16:2240"

● Data prep: tfrecords synced on job start to local storage
● Host strings: synthesized from machinefiles
● Worker launch: use mpirun to launch workers and parameter servers
● Security: isolated cluster network
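The "host strings synthesized from machinefiles" step can be sketched directly. `synthesize_hosts` is a hypothetical helper: the port numbers follow the command lines on this slide, and an MPI-style machinefile (hostname first on each line) is assumed:

```python
def synthesize_hosts(machinefile_lines, ps_port=2220, worker_base=2222,
                     workers_per_node=2):
    # One parameter server per node on ps_port; workers_per_node worker
    # ports per node starting at worker_base, mirroring the slide's
    # --ps_hosts / --worker_hosts arguments.
    nodes = [line.split()[0] for line in machinefile_lines if line.strip()]
    ps_hosts = ",".join(f"{n}:{ps_port}" for n in nodes)
    worker_hosts = ",".join(
        f"{n}:{worker_base + i}" for n in nodes for i in range(workers_per_node)
    )
    return ps_hosts, worker_hosts

ps, workers = synthesize_hosts(["node1 slots=2", "node2 slots=2"])
print(ps)       # → node1:2220,node2:2220
print(workers)  # → node1:2222,node1:2223,node2:2222,node2:2223
```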
24
Scale out: Hyperparameter Search as a Service (GTC2016)
Hyperparameter template → spawn shared-nothing clusters, evaluate in parallel → summarize results
Each cluster runs Preprocess → Train → Validate/Test for one configuration:
● ResNet101, batch size=128, learning rate=0.01
● ResNet101, batch size=256, learning rate=0.01
● ResNet152, batch size=256, learning rate=0.01
● ResNet152, batch size=128, learning rate=0.1
. . .
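The service pattern above (fan out one evaluation per configuration, then summarize) can be sketched with a thread pool standing in for the shared-nothing clusters; `evaluate` and its score formula are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(cfg):
    # Stand-in for one shared-nothing cluster: a real service would
    # launch the cluster, train, validate, and report a real score.
    model, batch_size, lr = cfg
    score = (0.3 if model == "ResNet152" else 0.25) - abs(lr - 0.05)
    return cfg, round(score, 3)

grid = list(product(["ResNet101", "ResNet152"], [128, 256], [0.01, 0.1]))

# Evaluate every configuration in parallel, then summarize.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(evaluate, grid))

best = max(results, key=results.get)
print(best, results[best])  # → ('ResNet152', 128, 0.01) 0.26
```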
25
Scale out: Bayesian Optimization as a Service (GTC2016)
A hyperparameter optimizer (Spearmint, SMAC) generates candidate configurations from a template, each evaluated as a Preprocess → Train → Validate/Test run:
● Batch size=128, LR=0.01
● Batch size=128, LR=0.1
● Batch size=64, LR=0.1
● Batch size=256, LR=0.1
. . .
Completed runs report results back to the optimizer, e.g. precision=0.23, runtime=145500; precision=0.30, runtime=105500; precision=0.35, runtime=200566; precision=0.20, runtime=97000
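Spearmint and SMAC are full sequential model-based optimizers; the loop they run can be illustrated with a deliberately tiny stand-in that uses a 1-nearest-neighbor surrogate instead of their real models. Everything here (objective, distance, search space) is invented to show the propose → evaluate → update cycle, not their actual algorithms:

```python
def objective(batch_size, lr):
    # Stand-in for a full training run reporting precision; a real
    # optimizer launches one cluster per proposed point.
    return 0.35 - 0.001 * abs(batch_size - 64) - abs(lr - 0.1)

def surrogate(observed, cand):
    # 1-nearest-neighbor surrogate: predict a candidate's value from the
    # closest point evaluated so far (crude normalized distance).
    def dist(p):
        return abs(p[0] - cand[0]) / 256 + abs(p[1] - cand[1])
    return observed[min(observed, key=dist)]

space = [(bs, lr) for bs in (64, 128, 256) for lr in (0.01, 0.1, 0.5)]
observed = {}
for _ in range(6):
    # Propose untried points, rank them by surrogate prediction, run the best.
    cands = [c for c in space if c not in observed]
    pick = cands[0] if not observed else max(cands, key=lambda c: surrogate(observed, c))
    observed[pick] = objective(*pick)

best = max(observed, key=observed.get)
print(best)  # → (64, 0.1)
```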
26
Leveraging Cloud Computing for Deep Learning
Preprocess → Train → Validate/Test
● Advantages of training models in the cloud
● Effective training on the Rescale platform
● Leveraging IBM Cloud and P100s
27
Deep Learning on IBM Cloud and Rescale
Rescale:
● Dataset and code management
● Workflows and job management
● Design of experiments, optimization
● Persistent clusters
● Software libraries
● Web portal, API, CLI
● Hybrid cluster management
IBM Cloud: bare metal servers (P100s and K80s), Cloud Object Storage
28
Scale Up to P100s on IBM Cloud
● TensorFlow InceptionV3
● ~½ training time compared to previous-generation K80s
29
IBM Bare Metal Networking + P100s = Fast Multi-Node
● Multi-node TensorFlow Distributed (InceptionV3)
● ~1.3x faster vs. 4x more K80s on a competing provider
30
PyTorch Large Model Training
● P100s enable larger batch sizes for big networks
31
XGBoost - Boosted Tree Construction
● P100 >1.25x faster than CPU and K80
● K80 acceleration does not provide benefit over CPU
● P100 1.5x faster than CPU
32
Streaming Training Data from Object Storage
● File system loads images on demand from IBM Object Storage
● Large blocks (128 MB) ensure efficient download
● Local cache sized to hold the entire dataset, so subsequent epochs are local
Timeline: data sync (25 minutes, 150 GB) → training on streaming data → training on cached data
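The streaming-with-cache behavior described above is a read-through cache: the first epoch pulls large blocks on demand, later epochs hit local storage. A toy sketch (the class, block contents, and sizes are invented; real systems would also handle reads that span block boundaries):

```python
class CachingReader:
    # Toy read-through cache over an object-store-like source: first
    # access streams a large block on demand; repeated epochs hit the
    # local cache. Reads are assumed not to cross block boundaries.
    def __init__(self, fetch_block, block_size):
        self.fetch_block = fetch_block  # (offset, size) -> bytes
        self.block_size = block_size
        self.cache = {}
        self.remote_reads = 0

    def read(self, offset, size):
        block = offset // self.block_size
        if block not in self.cache:
            self.cache[block] = self.fetch_block(block * self.block_size,
                                                 self.block_size)
            self.remote_reads += 1
        start = offset - block * self.block_size
        return self.cache[block][start:start + size]

data = bytes(range(256)) * 1024  # 256 KiB pretend object
# The slides use 128 MB blocks; 64 KiB here keeps the demo small.
reader = CachingReader(lambda off, n: data[off:off + n], block_size=64 * 1024)

first = reader.read(0, 16)   # streams block 0 from "object storage"
again = reader.read(8, 8)    # same block: served from the local cache
print(reader.remote_reads)   # → 1
```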
33
Questions?
Rescale
● Mark Whitney, Head of Deep Learning, [email protected]
● Tyler Smith, Head of Partnerships, [email protected]
IBM Cloud
● Jerry Gutierrez, Global HPC Sales Lead, [email protected]
● Casey Knott, IBM Cloud Platform Specialist, [email protected]
Rescale on IBM Cloud
● http://www.rescale.com/ibm/