TRANSCRIPT
IBM Cloud / March, 2018 / © 2018 IBM Corporation
NVIDIA™ GPU Performance Testing and PowerAI on IBM Cloud
Alex Hudak, Offering Manager, IBM Cloud
Brian Wan, Software Engineer, IBM Cloud
Please note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Contents
Part One: The AI & GPU Market 04
Part Two: NVIDIA GPUs on IBM Cloud 09
Part Three: Performance Data 16
Part Four: PowerAI on IBM Cloud 25
Part One: The Market
% of organizations that report some form of AI is already in production in their organization
Source: Teradata 2017
% of these organizations that still believe they will need to invest more in AI tech over the next 36 months
30%
“Nowadays, for machine learning and particularly deep learning, it’s all about GPUs.”
Forbes, December 2017
“Humanity’s moonshots like eradicating cancer, intelligent customer experiences, and self-driving vehicles are within reach of this next era of AI.”
Financial Services
Gaming
Medical Research
Automotive
Manufacturing
GPU Use Cases
Part Two: NVIDIA GPUs on IBM Cloud
Tesla V100: Bare Metal Monthly; Virtual (Coming Soon) Monthly & Hourly
Tesla K80: Bare Metal Monthly & Hourly
Tesla P100: Bare Metal Monthly; Virtual Monthly & Hourly
Tesla M60: Bare Metal Monthly & Hourly
NVIDIA GPUs on IBM Cloud
Tesla M60: Used for enterprise virtualization as well as boosting professional graphics performance. Bare Metal Monthly & Hourly.
Tesla K80: Reliable GPU performance, ideal for introductory AI computing at an affordable price. Bare Metal Monthly & Hourly.
Tesla P100: Essential performance for standard AI and HPC capabilities. Bare Metal Monthly; Virtual Monthly & Hourly.
Tesla V100: IBM Cloud's most powerful and advanced GPU, purpose-built for demanding Deep Learning workloads, with the performance of 100 CPUs in a single GPU. Bare Metal Monthly; Virtual (Coming Soon) Monthly & Hourly.
NVIDIA GPUs on IBM Cloud
NVIDIA GPU-Enabled Data Centers by region
North America: Dallas, Houston, Mexico, Montreal, Seattle, San Jose, Toronto, Washington DC
Europe: Amsterdam, Frankfurt, London, Milan, Oslo, Paris
South America: Sao Paulo
Asia: Chennai, Hong Kong, Seoul, Singapore, Tokyo
Australia: Melbourne, Sydney
NVIDIA GPU-Enabled Data Centers by GPU
NVIDIA™ Tesla™ V100 (Bare Metal): Dallas, Washington DC
NVIDIA™ Tesla™ P100 (Bare Metal): Dallas, San Jose, Washington DC
NVIDIA™ Tesla™ P100 (Virtual): Amsterdam, Seoul, Tokyo; Dallas and Washington DC (Mar 2018); London (April 2018)
NVIDIA™ Tesla™ K80 (Bare Metal): All GPU DCs
NVIDIA™ Tesla™ M60 (Bare Metal): All GPU DCs
NVIDIA GPUs on Virtual Servers
Run HPC, AI, and simulation workloads with efficiency and scalability in a virtual environment
Source: https://console.bluemix.net/docs/vsi/vsi_public_gpu.html#gpu
Profile               ac1.8x60      acl1.8x60     ac1.16x120    acl1.16x120
GPU                   1 x P100      1 x P100      2 x P100      2 x P100
GPU RAM (GB)          16            16            32            32
vCPU                  8             8             16            16
vCPU RAM (GB)         60            60            120           120
Storage Type          Block (SAN)   Local SSD     Block (SAN)   Local SSD
Boot Disk (GB)        25 and 100    100           25 and 100    100
Secondary Disks (GB)  4 x 2000      2 x 300       4 x 2000      2 x 300
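For scripting around these offerings, the profile table can be captured as a small lookup. A minimal sketch; the dictionary values are copied from the table above, while `pick_profile` is an illustrative helper, not part of any IBM Cloud SDK:

```python
# GPU virtual-server profiles from the table above, as a lookup.
PROFILES = {
    "ac1.8x60":    {"gpus": 1, "gpu_ram_gb": 16, "vcpus": 8,  "ram_gb": 60,  "storage": "Block (SAN)"},
    "acl1.8x60":   {"gpus": 1, "gpu_ram_gb": 16, "vcpus": 8,  "ram_gb": 60,  "storage": "Local SSD"},
    "ac1.16x120":  {"gpus": 2, "gpu_ram_gb": 32, "vcpus": 16, "ram_gb": 120, "storage": "Block (SAN)"},
    "acl1.16x120": {"gpus": 2, "gpu_ram_gb": 32, "vcpus": 16, "ram_gb": 120, "storage": "Local SSD"},
}

def pick_profile(min_gpus, local_ssd=False):
    """Return the smallest profile meeting the GPU count and storage need."""
    candidates = [
        (spec["vcpus"], name) for name, spec in PROFILES.items()
        if spec["gpus"] >= min_gpus
        and (spec["storage"] == "Local SSD") == local_ssd
    ]
    return min(candidates)[1]

print(pick_profile(2, local_ssd=True))  # acl1.16x120
```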
The only bare metal GPU provider
Superior security over competitors
All resources dedicated to a single user
NVIDIA GPUs on Bare Metal
Hard Drive: 1 TB SATA – 3.8 TB SSD
Cores: 12 – 28
Network: 100 Mbps – 10 Gbps
Memory: 64 GB – 1.5 TB
Part Three: Performance Data
Notes:
• Input dataset: ImageNet (crop size = 224x224); batch size = 64 per GPU (for both InceptionV3 and ResNet50 neural net models)
• With NVIDIA V100 GPUs, independent distribution mode for model variables and gradients was used for optimal performance.
• Mixed precision (FP16/32) leverages Tensor Cores in V100 GPUs.
• The SoftLayer bare-metal server has 48 logical CPU cores, while the Power9 server and AWS P3 instance have 64 logical CPU cores.
• With the same number of V100 GPUs, Power9 servers deliver better performance (up to 1.58X in single precision) than Amazon P3 instances. This could be attributed to Power9 CPU optimizations and CPU-GPU NVLink support.
• With 4 x V100 GPUs, the Power9 server delivers higher performance than the Amazon P3 instance with 8 x V100 GPUs in single precision mode.
• The AWS P3 instance does not scale well beyond 4 x V100 GPUs in single precision mode (although it does scale well leveraging Tensor Cores in half precision (FP16) mode).
• TensorFlow 1.4 and 1.5 versions do not leverage the Tensor Cores in V100 GPUs very well, so the latest TensorFlow 1.6-dev build was used for optimal half precision (FP16) performance.
• For InceptionV3 on TensorFlow, half precision (FP16) on the V100 GPUs uses Tensor Cores to achieve ~1.8X better performance than single precision. The larger performance gain (up to 4.4X) of FP16 on AWS P3 is due to the relatively low performance of the Deep Learning AMI with TensorFlow v1.4 used for single precision compared to TensorFlow 1.6-dev used for half precision mode.
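The runs described in these notes use the public tf_cnn_benchmarks script. A sketch of how such a command line could be assembled; the flag names follow that script, while the concrete values simply restate the settings above:

```python
# Sketch: assemble a tf_cnn_benchmarks command line matching the runs
# described in the notes (InceptionV3, batch size 64 per GPU, FP16 to
# engage the V100 Tensor Cores). Paths and defaults are illustrative.
def benchmark_cmd(model="inception3", num_gpus=4, batch_size=64,
                  fp16=False, variable_update="parameter_server"):
    args = [
        "python", "tf_cnn_benchmarks.py",
        f"--model={model}",
        f"--num_gpus={num_gpus}",
        f"--batch_size={batch_size}",           # per-GPU batch size
        f"--variable_update={variable_update}",
    ]
    if fp16:
        args.append("--use_fp16=True")          # mixed precision (FP16/32)
    return " ".join(args)

print(benchmark_cmd(fp16=True, variable_update="replicated"))
```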
Higher is better
• For TensorFlow 1.6, in the default single precision mode and parameter_server method of model variable and gradient distribution, we need at least 32 logical/virtual CPU cores (or 16 physical CPU cores) for each V100 GPU for optimal performance. TensorFlow 1.5 does not leverage the V100 GPU well.
• For the current version of TensorFlow (v1.5) in half-precision (FP16) mode, we still need 32 logical/virtual CPU cores (16 physical CPU cores) to drive a V100 GPU optimally.
• For the latest TensorFlow v1.6-dev build in half-precision (FP16) mode where the Tensor Cores on the V100 GPU are fully leveraged, the default parameter_server mode for model variable and gradient distribution becomes the performance bottleneck, so having more than 8 logical/virtual CPU cores (4 physical CPU cores) would not improve performance further.
• To alleviate the parameter_server performance bottleneck, we can use the independent (replicated_distributed) method, where model variables are replicated on each GPU. In this independent mode, we only need 4 logical/virtual CPU cores (2 physical CPU cores) for each V100 GPU.
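The sizing guidance in these bullets can be condensed into one helper. A sketch that simply encodes the measurements above as rules of thumb (assuming SMT-2, i.e. two logical threads per physical core):

```python
# Minimum logical (SMT) CPU cores needed to drive one V100 optimally,
# per the measurements above: parameter_server mode needs 32 logical
# cores per GPU (FP32, or FP16 before Tensor Cores are fully used);
# once Tensor Cores are the bottleneck-free path (TF 1.6-dev FP16),
# ~8 cores suffice; independent (replicated_distributed) mode needs 4.
def min_logical_cores_per_v100(variable_update, tensor_cores_fp16=False):
    if variable_update == "parameter_server":
        return 8 if tensor_cores_fp16 else 32
    return 4  # independent (replicated_distributed) mode

def physical_cores(logical):
    return logical // 2  # assumes SMT-2 (two hardware threads per core)

print(min_logical_cores_per_v100("parameter_server"))  # 32
print(min_logical_cores_per_v100("independent"))       # 4
```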
CPU cores required for an NVIDIA GPU
Deep Learning Model Training – Impact of vCPUs on NVIDIA V100 GPUs
Deep Learning Model Training – Impact of vCPUs on NVIDIA V100 GPUs
• For each P100 GPU, we need at least 8 logical/virtual CPU cores (4 physical CPU cores) for optimal performance. For 2 x P100 GPUs, we need at least 16 logical/virtual CPU cores (8 physical CPU cores) for optimal performance.
• For 4 x K80 GPU cores (2 x K80 PCIe GPU cards), we need at least 8 logical/virtual CPU cores (4 physical CPU cores) for optimal performance.
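The same sizing idea applies to the other GPUs. A rough rule-of-thumb lookup taken from the bullets above (illustrative only, not a hard requirement):

```python
# Minimum logical CPU cores per GPU for optimal training throughput,
# per the measurements above. The K80 entry is per GPU core: 8 logical
# cores drive 4 K80 GPU cores (2 PCIe cards), i.e. 2 per GPU core.
MIN_LOGICAL_CORES_PER_GPU = {"p100": 8, "k80": 2}

def min_logical_cores(gpu, count):
    """Logical (SMT) cores needed for `count` GPUs/GPU cores of a type."""
    return MIN_LOGICAL_CORES_PER_GPU[gpu] * count

print(min_logical_cores("p100", 2))  # 16
print(min_logical_cores("k80", 4))   # 8
```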
CPU cores required for an NVIDIA GPU
Deep Learning Model Training – Impact of vCPUs on NVIDIA P100 GPUs
Deep Learning Model Training – Impact of vCPUs on NVIDIA K80 GPUs
Amazon P3 instance in single and half-precision modes
IBM Cloud (SoftLayer) server delivers better price-performance in single and half-precision modes than Amazon P3 instance
Higher is better
Multi-Node - Distributed Deep Learning
• Optimal deep-learning training across a large number of GPUs (> 16 GPUs)
• Two options:
  – IBM Research's Distributed Deep Learning (DDL): an AI-framework-independent, low-latency communication library for implementing distributed deep learning. Currently for PowerAI only. Supports multiple AI frameworks (TensorFlow, Caffe, Caffe2, Torch, Keras, PyTorch, etc.). Tested – scales very well up to 256 GPUs across multiple servers.
  – Horovod (open source, contributed by Uber): novel overlapping of compute and communication. Requires integration of communication protocols into each AI framework. Only supports TensorFlow at this time. Tested – scales very well up to (at least) 24 GPUs.
IBM DDL Scaling Up to 256 GPUs on 64 Power8 Servers (Nodes)
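Both DDL and Horovod are built around an allreduce over worker gradients. A toy single-process sketch of the arithmetic contract only; a real ring allreduce pipelines chunks of the gradient around the ring of workers instead of gathering them in one place:

```python
# Toy sketch of the allreduce averaging at the heart of data-parallel
# training: each worker holds its own gradient vector, and allreduce
# leaves every worker with the element-wise mean across workers.
def allreduce_mean(worker_grads):
    n = len(worker_grads)
    summed = [sum(g) for g in zip(*worker_grads)]  # element-wise sum
    return [s / n for s in summed]                 # element-wise mean

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 workers
print(allreduce_mean(grads))  # [4.0, 5.0]
```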
Notes:
• Input dataset: ImageNet (crop size = 224x224)
• For Caffe, the highest batch sizes were used to fully exploit GPU memory. For TensorFlow, batch size = 64 per GPU.
• Mixed precision (16-bit input matrices, 32-bit accumulator) leverages Tensor Cores in V100 GPUs
For TensorFlow, independent distribution mode (replicated_distributed) for model variables and gradient aggregation delivers much better performance for 4 GPUs (and higher) than the default parameter_server mode.
Higher is better
Deep Learning Model Training – Power9 Server w/ NVIDIA Tesla V100 GPUs
NVIDIA Caffe: Impact of Batch Size – NVIDIA Tesla V100 GPUs – Single Precision
Deep Learning Training – VGG-16 on NVIDIA Caffe
Deep Learning Training – BVLC Caffe vs. NVIDIA Caffe – NVIDIA Tesla V100 GPUs
[Chart legend: NVIDIA Caffe w/ 1xV100, NVIDIA Caffe w/ 2xV100]
Part Four: PowerAI on IBM Cloud
In IBM Cloud
Available early 2Q 2018. Delivered via IBM Cloud Catalog. Billed through IBM Cloud. Supported by IBM and Nimbix.
• PowerAI Version 5
• On-Demand Cloud Provisioning
• Superb Price Performance
• Highly Scalable Distributed Deep Learning (DDL)
• Large Model Support
• Containerized and Extensible
• Powered by Trusted Partner Nimbix
PowerAI on IBM Cloud
(1) Based on IBM internal measurements on 1/25/18 on the following configuration: InceptionV3 neural network model training with TensorFlow benchmark script (https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) on IBM Power System AC922 with 4 Nvidia V100 GPUs vs. Amazon P3 instance p3.8xlarge with 4 Nvidia V100 GPUs. Software versions on Power System AC922: CUDA 9.1, CuDNN 7.1, TensorFlow 1.5; on Amazon P3 instance: CUDA 9, CuDNN 7.0.5, TensorFlow 1.6-dev, Amazon Deep Learning AMI (Ubuntu) Version 2.0 (ami-9ba7c4e1). Hourly retail pricing for IBM PowerAI in IBM Cloud and Amazon P3 instances (https://aws.amazon.com/ec2/pricing/on-demand/).
(2) POWER8 performance data was collected on 3/7/2018 in Nimbix Cloud with IBM Power System S822LC for InceptionV3 model training with TensorFlow benchmark script (https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks), 4 Nvidia P100 GPUs, CUDA 9.0.176, CuDNN 7.0.5, and TensorFlow 1.4 in PowerAI R5. Amazon p3.8xlarge with 4 Nvidia V100 GPUs, CUDA 9, CuDNN 7.0.5, TensorFlow 1.6-dev, Amazon Deep Learning AMI (Ubuntu) Version 2.0 (ami-9ba7c4e1). Hourly retail pricing for IBM PowerAI in IBM Cloud and Amazon P3 instances (https://aws.amazon.com/ec2/pricing/on-demand/).
(3) Based on IBM internal measurements on 1/25/18 on the following configuration: InceptionV3 neural network model training with TensorFlow benchmark script (https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks) on IBM Power System AC922 with 4 Nvidia V100 GPUs vs. Amazon P3 instance p3.8xlarge with 4 Nvidia V100 GPUs. Software versions on Power System AC922: CUDA 9.1, CuDNN 7.1, TensorFlow 1.5; on Amazon P3 instance: CUDA 9, CuDNN 7.0.5, TensorFlow 1.6-dev, Amazon Deep Learning AMI (Ubuntu) Version 2.0 (ami-9ba7c4e1).
(4) Based on IBM internal measurements on 11/26/2017: https://developer.ibm.com/linuxonpower/perfcol/perfcol-mldl/.
1.6x (3): DL Training Throughput (Images/sec) with 4 NVIDIA V100 GPUs vs. Amazon AWS P3 instance
1.1x (2): Better Price Performance with POWER8 vs. Amazon AWS P3 instance
1.2x (1): Better Price Performance (DL Training Throughput per US$) with POWER9 vs. Amazon AWS P3 instance
3.7x (4): DL Training Throughput with Large Model Support (LMS) feature vs. comparable x86 server
Cloud Price-Performance Leadership
POWER9 Performance Leadership
Open Source Frameworks: Supported Distribution
Developer Ease-of-Use Tools
Faster Training Times viaHW & SW Performance Optimizations
PowerAI Service on IBM Cloud
• PowerAI with Distributed Deep Learning will be a generally available offering in the IBM Cloud Catalog
• Users will be able to provision PowerAI instances of various sizes through the web interface and CLI
• The offering will enable automated transfer and loading of data from IBM Cloud Object Storage into instances
• Authentication through IBM Identity Management
• IBM Cloud Logging and Auditing for user provisioning actions
• Data transfer between IBM Cloud and Nimbix over Direct Link Dedicated
• Data encrypted at rest and in transit between IBM and Nimbix
• Nimbix environment physical and digital security inspected
• PowerAI delivered to Nimbix by IBM via images inspected by IBM
PowerAI Service on IBM Cloud
[Diagram: IBM Cloud Catalog and IBM Cloud Object Storage connect over Direct Link Dedicated to the IBM Cloud PowerAI Service running at Nimbix]
Notices and disclaimers
© 2018 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights — use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed "as is" without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environment. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute, legal or other guidance or advice to any individual participant or their specific situation.
It is the customer's responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.
Notices and disclaimers (continued)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements, or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM's products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Thank you
Alex Hudak, Offering Manager, IBM Cloud
[email protected]
+1-469-766-8058
ibm.com
Brian Wan, Software Engineer, IBM Watson and Cloud Platform
[email protected]
+1-512-286-8711
ibm.com