TRANSCRIPT
Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 | Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD
COMMODITY DISRUPTS CUSTOM
SOURCE: Top500
“ It’s time to start planning for the end of Moore’s Law, and it’s worth pondering how it will end, not just when.”
Robert Colwell
Director, Microsystems Technology Office, DARPA
NVIDIA ACCELERATES COMPUTING
Productive Programming Model & Tools
Expert Co-Design
Accessibility
APPLICATION
MIDDLEWARE
SYS SW
LARGE SYSTEMS
PROCESSOR
[Chart: Fast GPU Engineered for High Throughput; TFLOPS, NVIDIA GPU vs. x86 CPU, 2008 to 2014; GPU data points labeled M1060, M2090, K20, K40, K80]
Fast GPU + Strong CPU
100+ accelerated systems now on Top500 list
1/3 of total FLOPS powered by accelerators
NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers
Tesla supercomputers growing at 50% CAGR over past five years
[Chart: Top500: # of Accelerated Supercomputers, 2013 to 2015]
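For scale, a compound annual growth rate compounds multiplicatively. A worked check of what 50% CAGR sustained over five years implies, where r is the annual growth rate and n the number of years (this is arithmetic on the slide's stated rate, not a count taken from the Top500 list):

```latex
G = (1 + r)^{n} = (1 + 0.50)^{5} = 1.5^{5} = 7.59375 \approx 7.6\times
```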
ACCELERATORS SURGE IN WORLD’S TOP SUPERCOMPUTERS
MACHINE LEARNING: HPC’S 1ST CONSUMER KILLER APP
“NADELLA: SMART AGENTS LIKE CORTANA WILL REPLACE THE WEB BROWSER” -BI
FACEBOOK MESSENGER ADDS FACIAL RECOGNITION
YOUTUBE: CLICK-TO-BUY ADS
GOOGLE PHOTOS: ML-POWERED FEATURES
MICROSOFT OPEN-SOURCES DMTK
GOOGLE OPEN-SOURCES TENSORFLOW
TESLA FOR MACHINE LEARNING
10M Users 40 years of video/day
270M Items sold/day 43% on mobile devices
TESLA M40  POWERFUL: Fastest Deep Learning Performance
TESLA M4  LOW POWER: Highest Hyperscale Throughput
HYPERSCALE SUITE
GPU Accelerated FFmpeg
Image Compute Engine
GPU REST Engine
MACHINE LEARNING REVOLUTIONIZING TRANSPORTATION
“Toyota Invests $1 Billion in Artificial Intelligence in U.S.”
— U.S. News & World Report
END-TO-END MACHINE LEARNING PLATFORM FOR AUTONOMOUS CARS
[Chart: NVDRIVENET on KITTI Object Detection; biweekly results from 7/8 to 10/28 rising from 39% to 83% (values shown: 39%, 45%, 55%, 62%, 66%, 72%, 75%, 79%, 83%); chart also labels BAIDU at 11/10]
DRIVE PX
DIGITS DevBox
NVIDIA
MACHINE LEARNING REVOLUTIONIZING AUTONOMOUS MACHINES
JETSON TX1: Supercomputer on a Module
GPU: 1 TFLOPS, 256-core Maxwell
CPU: 64-bit ARM A57s
Memory: 4 GB LPDDR4, 26 GB/s
Power: Under 10 W for typical use cases
[Chart: 10x Energy Efficiency on AlexNet; Images/Sec/Watt, Intel Core i7-6700K (Skylake) vs. Jetson TX1]
PC GAMING
SUPERCOMPUTING EVERYWHERE
Titan X for PC
Tesla in the Cloud
Jetson TX1 for Robots
DRIVE PX for Auto
Ian Buck, VP of Accelerated Computing, NVIDIA SC15 | Nov. 16, 2015
ACCELERATED COMPUTING: THE PATH FORWARD
TESLA ACCELERATES DISCOVERY AND INSIGHT
SIMULATION
TESLA ACCELERATED COMPUTING
VISUALIZATION MACHINE LEARNING
“Approximately a third of HPC systems operating today are equipped with accelerators and nearly half of all newly deployed systems have them.”
ACCELERATED COMPUTING: A TIPPING POINT FOR HPC, Intersect360 Nov 2015
70% OF TOP HPC APPS NOW ACCELERATED
VASP NOW ACCELERATED
Typically Consumes 10-25% of HPC System
INTERSECT360 SURVEY OF TOP APPS
Top 10 HPC Apps: 90% Accelerated
Top 50 HPC Apps: 70% Accelerated
1 Dual K80 Server 1.3x
4 Dual CPU Servers 1.0x
Intersect360, Nov 2015, “HPC Application Support for GPU Computing”
VASP benchmark: Dual-socket Xeon E5-2690 v2 3GHz, Dual Tesla K80, FDR InfiniBand; Dataset: NiAl-MD, Blocked Davidson
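The VASP comparison pits one dual-K80 server against a four-server CPU cluster. Expressed per server, under the simplifying assumption (not stated on the slide) that the four-server CPU result is exactly four times a single server's, the ratio works out as:

```latex
\frac{\text{perf}(1 \times \text{dual-K80 server})}{\text{perf}(1 \times \text{dual-CPU server})}
  = 1.3 \times 4 = 5.2
```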
370 GPU-Accelerated Applications
www.nvidia.com/appscatalog
TESLA FOR SIMULATION
LIBRARIES
TESLA ACCELERATED COMPUTING
LANGUAGES DIRECTIVES
ACCELERATED COMPUTING TOOLKIT
TESLA K80: World’s Fastest Accelerator for HPC
[Chart: # of Days, Tesla K80 Server vs. Dual CPU Server]
AMBER Benchmark: PME-JAC-NVE simulation for 1 microsecond; CPU: E5-2698v3 @ 2.3GHz, 64GB System Memory, CentOS 6.2
CUDA Cores 2496
Peak DP 1.9 TFLOPS
Peak DP w/ Boost 2.9 TFLOPS
GDDR5 Memory 24 GB
Bandwidth 480 GB/s
Power 300 W
Simulation Time from 1 Month to 1 Week
5x Faster AMBER Performance
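The "1 month to 1 week" framing follows from dividing time-to-solution by the speedup S. A worked check, taking a month as roughly 30 days (an approximation, not a figure from the benchmark):

```latex
T_{\text{accel}} = \frac{T_{\text{CPU}}}{S} \approx \frac{30\ \text{days}}{5} = 6\ \text{days} \approx 1\ \text{week}
```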
APPLICATION PERFORMANCE BOOSTS DATA CENTER THROUGHPUT
TESLA K80: 5X FASTER; 1/3 OF NODES ACCELERATED, 2X SYSTEM THROUGHPUT
CPU-only System: 100 Jobs Per Day
Accelerated System: 220 Jobs Per Day
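One way to reason about the system-level claim is an idealized throughput model. The sketch below is a hypothetical illustration, not the methodology behind the slide: it assumes each job runs on a single node, the job mix is spread evenly over all nodes, and an accelerated node finishes jobs a fixed number of times faster. The function name `accelerated_jobs_per_day` is made up for this example.

```c
#include <assert.h>

/* Hypothetical idealized cluster throughput model (not NVIDIA's method).
 * Assumptions: every job runs on one node, the job mix is spread evenly
 * over all nodes, and an accelerated node finishes jobs `speedup` times
 * faster than a CPU-only node. */
double accelerated_jobs_per_day(double cpu_only_jobs_per_day,
                                double accelerated_fraction,
                                double speedup)
{
    assert(cpu_only_jobs_per_day >= 0.0);
    assert(accelerated_fraction >= 0.0 && accelerated_fraction <= 1.0);
    assert(speedup >= 1.0);

    /* CPU-only nodes keep their proportional share of the baseline. */
    double cpu_share = (1.0 - accelerated_fraction) * cpu_only_jobs_per_day;

    /* Accelerated nodes deliver their share multiplied by the speedup. */
    double gpu_share = accelerated_fraction * cpu_only_jobs_per_day * speedup;

    return cpu_share + gpu_share;
}
```

For example, `accelerated_jobs_per_day(100.0, 1.0 / 3.0, 5.0)` returns about 233.3 jobs per day, somewhat above the slide's 220. The slide does not state its assumptions, so the gap reflects only how much simpler this model is than a real job mix, not an error in either figure.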
[Chart: Speed-up vs Dual CPU, K80 vs. CPU, for QMCPACK, LAMMPS, CHROMA, NAMD, AMBER]
CPU: Dual E5-2698 v3 @ 2.3GHz (3.6GHz Turbo), 64GB System Memory, CentOS 6.2; GPU: Single Tesla K80, Boost enabled
OPENACC DELIVERS TRUE PERF PORTABILITY
Paving the Path Forward: Single Code for All HPC Processors
[Chart: Application Performance Benchmark; Speedup vs Single CPU Core for 359.MINIGHOST (MANTEVO), NEMO (CLIMATE & OCEAN), CLOVERLEAF (PHYSICS); series CPU: MPI + OpenMP, CPU: MPI + OpenACC, CPU + GPU: MPI + OpenACC; bar values range from 4.1x to 30.3x]
359.miniGhost: CPU: Intel Xeon E5-2698 v3, 2 sockets, 32 cores total; GPU: Tesla K80, single GPU
NEMO: CPU: Intel Xeon E5-2698 v3, 16 cores per socket; GPU: NVIDIA K80, both GPUs
CLOVERLEAF: CPU: Dual-socket Intel Xeon E5-2690 v2, 20 cores total; GPU: Tesla K80, both GPUs
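The portability claim rests on the directive approach: a loop is written once and annotated, and the compiler decides how to map it to the target. A minimal sketch, assuming a C compiler with OpenACC support such as GCC with `-fopenacc`; it is a generic SAXPY kernel chosen for illustration, not code from the miniGhost, NEMO, or CloverLeaf benchmarks.

```c
#include <assert.h>
#include <stddef.h>

/* SAXPY (y = a*x + y) written once and annotated with an OpenACC directive.
 * With an OpenACC-enabled build (for example GCC with -fopenacc) the loop is
 * parallelized for the configured target, which may be a GPU or the host
 * CPU. A compiler without OpenACC support ignores the pragma (possibly with
 * an "unknown pragma" warning) and runs the same loop serially. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read on the device; copy: y is read and written back. */
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The design point is that the source carries no target-specific code: the data clauses describe what moves where, and the choice of CPU-only or CPU + GPU execution is made at build time.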
TESLA HYPERSCALE FOR MACHINE LEARNING
TESLA M40: World’s Fastest Accelerator for Deep Learning
[Chart: # of Days, Tesla M40 vs. CPU]
8x Faster Caffe Performance
Caffe Benchmark: AlexNet training throughput based on 20 iterations; CPU: E5-2697v2 @ 2.70GHz, 64GB System Memory, CentOS 6.2
CUDA Cores 3072
Peak SP 7 TFLOPS
GDDR5 Memory 12 GB
Bandwidth 288 GB/s
Power 250W
Reduce Training Time from 8 Days to 1 Day
TESLA M4: Highest Throughput Hyperscale Workload Acceleration
CUDA Cores 1024
Peak SP 2.2 TFLOPS
GDDR5 Memory 4 GB
Bandwidth 88 GB/s
Form Factor PCIe Low Profile
Power 50 – 75 W
Video Processing: 4x
Image Processing: 5x
Video Transcode: 2x
Machine Learning Inference: 2x
H.264 & H.265, SD & HD
Stabilization and Enhancements
Resize, Filter, Search, Auto-Enhance
Preliminary specifications. Subject to change.
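The single-precision peaks quoted for the M40 and M4 can be related to their core counts with the common rule of thumb that each CUDA core completes one fused multiply-add (2 FLOPs) per clock cycle. The sketch below assumes that rule; the helper names are hypothetical, and any clock it produces is back-calculated from the slides' rounded figures rather than taken from a published specification.

```c
#include <assert.h>

/* Peak single-precision TFLOPS, assuming each CUDA core retires one
 * fused multiply-add (2 FLOPs) per cycle. cores * 2 * GHz gives GFLOPS. */
double peak_sp_tflops(int cuda_cores, double clock_ghz)
{
    assert(cuda_cores > 0 && clock_ghz > 0.0);
    return cuda_cores * 2.0 * clock_ghz / 1000.0;
}

/* Inverse of the above: the clock (GHz) implied by a quoted peak. */
double implied_clock_ghz(double peak_tflops, int cuda_cores)
{
    assert(peak_tflops > 0.0 && cuda_cores > 0);
    return peak_tflops * 1000.0 / (cuda_cores * 2.0);
}
```

Applied to the slides, 7 TFLOPS across 3072 cores implies roughly 1.14 GHz for the M40, and 2.2 TFLOPS across 1024 cores roughly 1.07 GHz for the M4. The sketch covers single precision only; double-precision peaks, such as the K80 figures, use a different per-core rate.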
TESLA FOR VISUALIZATION
IRAY
TESLA ACCELERATED COMPUTING
INDEX OPTIX
VISUALIZATION TOOLS FOR HPC
GROWING ADOPTION IN CLIMATE & WEATHER
MeteoSwiss Deploys World’s First Accelerated Weather Supercomputer
2x higher resolution for daily forecasts
14x more simulation with ensemble approach for medium range forecasts
NOAA Chooses Tesla To Improve Weather Forecast Research
Develop global model with 3km resolution, five-fold increase from today’s resolution
Improved resolution requires a 40x increase in computational complexity
NEXT-GEN SUPERCOMPUTERS ARE GPU-ACCELERATED
SUMMIT
SIERRA
U.S. Dept. of Energy
Pre-Exascale Supercomputers for Science
IBM Watson
Breakthrough Natural Language Processing for Cognitive Computing
NOAA
New Supercomputer for Next-Gen Weather Forecasting
ENTERPRISE
World’s first in-situ weather simulation, running on MeteoSwiss supercomputer
Simulation of deadly tornado that hit El Reno, Oklahoma on May 24, 2011
ACCELERATED SCIENCE AND DATA ANALYTICS ON DISPLAY AT SC’15
Analyze quantum effects in nanowire using 10,800 GPUs on TITAN supercomputer
Predicting drug reactions for personalized medicine with GPU-powered IBM Spark