Past, Present & Future - GTC On-Demand Featured Talks

Post on 15-Feb-2018


TRANSCRIPT

S7204

Past, Present & Future: AI & HPC Infrastructure in Azure

@Karan_Batta

Senior Program Manager

Microsoft Azure Compute

Our Mission

“No compromise infrastructure”

Invest in scale out; hyper-scale workloads need low latency and high bandwidth networking

Close to bare-metal performance

Invest in eco-system of partners

True “HPC in the cloud”

Recap

Compute Virtual Machines (NC)

        NC6                NC12               NC24               NC24r
Cores   6                  12                 24                 24
GPU     1 x K80            2 x K80            4 x K80            4 x K80
        (1/2 phys. card)   (1 phys. card)     (2 phys. cards)    (2 phys. cards)
Memory  56 GB              112 GB             224 GB             224 GB
Disk    ~380 GB SSD        ~680 GB SSD        ~1.5 TB SSD        ~1.5 TB SSD
Network Azure Network      Azure Network      Azure Network      InfiniBand
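The table above maps cleanly to a small selection helper. The sketch below is illustrative only (not an Azure SDK call): it picks the smallest NC-series SKU that satisfies a GPU-count, memory, and InfiniBand requirement, using exactly the sizes listed in the table.

```python
# Illustrative sketch: choose the smallest NC-series SKU from the table above.
# Not an Azure API; the data tuples mirror the slide's table.
NC_SKUS = [
    # (name, cores, k80_gpus, memory_gb, has_infiniband)
    ("NC6",    6, 1,  56, False),
    ("NC12",  12, 2, 112, False),
    ("NC24",  24, 4, 224, False),
    ("NC24r", 24, 4, 224, True),
]

def pick_nc_sku(min_gpus=1, min_memory_gb=0, need_infiniband=False):
    """Return the first (smallest) SKU meeting all requirements, or None."""
    for name, cores, gpus, mem, ib in NC_SKUS:
        if gpus >= min_gpus and mem >= min_memory_gb and (ib or not need_infiniband):
            return name
    return None

print(pick_nc_sku(min_gpus=2))                        # -> NC12
print(pick_nc_sku(min_gpus=4, need_infiniband=True))  # -> NC24r
```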

State of the Union

5000+ customer signups during preview

General Availability since December 1st

Huge demand for specialized hardware

GPU offerings at the forefront of hardware innovation

New first-party products, such as Cris.ai, built on the N-Series

100s of external customers in production

Areas such as AI & Deep Learning driving growth

Under The Covers

Applications (Client OS)

• Azure Developer & Platform Services

• Custom Images

• Azure Marketplace

• Custom apps and services

GPU Provisioning (Host OS)

• Hyper-V

• DDA

Hardware

• NVIDIA M60 GPU (Viz SKU)

• NVIDIA K80 GPU (Compute SKU)

DDA (Discrete Device Assignment): the physical GPU is assigned directly to the guest VM as a PCIe device
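Because DDA passes the GPU through as a real PCIe device, inside the guest it shows up in `lspci` output just as on bare metal. A minimal sketch of checking for it (the sample text below is illustrative, not captured from a real VM):

```python
# Sketch: with DDA, the K80/M60 appears in the guest's PCI device list,
# so a simple scan of `lspci` output finds it. Sample output is made up
# for illustration.
def find_nvidia_gpus(lspci_output):
    """Return the lspci lines describing NVIDIA 3D/VGA controllers."""
    return [
        line for line in lspci_output.splitlines()
        if "NVIDIA" in line and ("3D controller" in line or "VGA" in line)
    ]

sample = """\
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX
0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80]
"""
for gpu in find_nvidia_gpus(sample):
    print(gpu)
```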

Real World Case Studies

“By using GPU resources in Azure, we can run simulations in days that would take a month on CPU-based machines. This speeds our progress toward the development of lifesaving drugs.”

Dr. Nagarajan Vaidehi, Director, Computational Therapeutics Core, Beckman Research Institute, City of Hope

“We are not short on ideas, just computers.”

AudioBurst

Next-Gen Compute Virtual Machines (NC_v2)

        NC6s_v2            NC12s_v2           NC24s_v2           NC24rs_v2
Cores   6                  12                 24                 24
GPU     1 x P100           2 x P100           4 x P100           4 x P100
Memory  112 GB             224 GB             448 GB             448 GB
Disk    ~700 GB SSD        ~1.4 TB SSD        ~3 TB SSD          ~3 TB SSD
Network Azure Network      Azure Network      Azure Network      InfiniBand

HPC Workloads Performance Gains with P100

[Chart: HPC application speedup relative to a dual-Broadwell CPU system, ranging from 0x to 40x, for 2x K80 and 4x P100 16 GB configurations.]

CPU system: dual E5-2690v3 @ 2.6 GHz, 14 cores. GPU system: same CPU host with 2x K80 or 4x P100 PCIe (16 GB).

Artificial Intelligence

Seeing AI

Skype Translator

NOONUM

Algorithmia

Smart Refrigerator

The system's word error rate is reported to be 5.9 percent, "about equal" to that of professional transcriptionists asked to work on the same speech.

Cognitive Toolkit fastest on Azure & Pascal GPUs

Deep Learning Virtual Machines (ND)

        ND6s               ND12s              ND24s              ND24rs
Cores   6                  12                 24                 24
GPU     1 x P40            2 x P40            4 x P40            4 x P40
Memory  112 GB             224 GB             448 GB             448 GB
Disk    ~700 GB SSD        ~1.4 TB SSD        ~3 TB SSD          ~3 TB SSD
Network Azure Network      Azure Network      Azure Network      InfiniBand

Training Workloads Performance Gains with P40

[Chart: training throughput (0-5,000) for 4x K80 vs. 4x P40 on AlexnetOWT, Googlenet, InceptionV3, ResNet-50, and VGG16 (CNTK), and AlexnetOWT, ResNet-152, and ResNet-50 (Caffe).]

Speed-ups of more than 2x for training workloads.

Up to 21x Inference Throughput with P40

[Chart: ResNet-50 inference throughput (images/second) vs. batch size (1-128) for K80 and P40, showing a 21x speedup.]

GPU: Ubuntu 14.04.5, TensorRT 2.1, CUDA 8.0.42, cuDNN 6.0.5; precision FP32 (K80), INT8 (P40 GPU).

Optimize performance with TensorRT and reduced precision
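The INT8 numbers above come from running inference at reduced precision. A minimal sketch of the underlying idea, symmetric linear quantization of FP32 values into the int8 range with a per-tensor scale (illustrative only; TensorRT's actual calibration is more sophisticated):

```python
# Sketch of reduced-precision inference: symmetric INT8 quantization.
# FP32 values are mapped to [-127, 127] via a per-tensor scale factor.
def quantize_int8(values):
    """Quantize a list of floats to int8 codes plus the scale used."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [x * scale for x in q]

acts = [0.02, -1.3, 0.75, 2.6]
q, scale = quantize_int8(acts)
approx = dequantize(q, scale)
# Each dequantized value lands within one quantization step (scale) of the input.
```

The payoff on hardware like the P40 is that int8 arithmetic is much denser than FP32, which is where the throughput gain in the chart comes from.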

NVIDIA Tesla P40 Demo

Follow me @Karan_Batta

Thanks!
