Source: on-demand.gputechconf.com/gtc/2017/presentation/s7223...
Post on 03-Apr-2020
BRING THE POWER OF CUDA
TO SMALL DEVICES
Daniel Lang, CTO, Toradex Inc.
www.toradex.com
CUDA in Traditional Setting
5/10/2017
• PCs
– 200 - 400W
– 11 TFLOPS (Titan X Pascal)
• NVIDIA® DIGITS™ DevBox
– 1600W
– 4x Titan X
• NVIDIA® DGX-1™
– 3200W
– 170 TFLOPS (FP16)
Picture source: https://commons.wikimedia.org/wiki/File:Avant-Tower-Gaming-PC.png
Picture source: https://www.NVIDIA.com/en-us/deep-learning-ai/solutions/
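The power and throughput figures on this slide can be turned into a rough efficiency number. The following is a minimal sketch that uses only the values listed above; the dictionary layout and the helper name `gflops_per_watt` are illustrative assumptions. The DGX-1 figure is FP16 while the slide does not state the precision of the Titan X figure, and the wattages describe whole systems, so this is an order-of-magnitude comparison rather than a like-for-like benchmark.

```python
# Figures copied from the slide; names and structure are illustrative only.
systems = {
    "PC with Titan X Pascal": {"tflops": 11.0, "watts": (200.0, 400.0)},
    "NVIDIA DGX-1 (FP16)": {"tflops": 170.0, "watts": (3200.0, 3200.0)},
}


def gflops_per_watt(tflops, watts):
    """Return (worst, best) GFLOPS per watt for a (min, max) power range."""
    low_w, high_w = watts
    # Highest power draw gives the worst efficiency, lowest gives the best.
    return (tflops * 1000.0 / high_w, tflops * 1000.0 / low_w)


efficiency = {
    name: gflops_per_watt(spec["tflops"], spec["watts"])
    for name, spec in systems.items()
}

for name, (worst, best) in efficiency.items():
    print(f"{name}: {worst:.1f} to {best:.1f} GFLOPS/W")
```

The DIGITS system is left out because the slide gives its power draw but no throughput figure.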
Small Devices CUDA-enabled via Cloud
Diagram: Lots of Data, Training, Trained Network
Picture source: https://clipartfox.com/categories/view/25ce873fc2c13fb0076162a29abb3ced4abd7d59/clipart-crab.html
Why Computing at the Edge
• Issues
– Latency
– Reliability / Safety
– Cost
– Security
– Privacy
– Bandwidth
– Practicality
Picture source: https://commons.wikimedia.org/wiki/File:Tesla_Model_S_on_CW-S5_Matte_Black_Machined_Face_cropped.jpg
Picture source: https://commons.wikimedia.org/wiki/File:US_Navy_100609-N-0000S-002_Aerographer%27s_Mate_Airman_Alex_Boston,_left,_and_Aerographer%27s_Mate_3rd_Class_Ryan_Thuecks,_right,_both_assigned_to_the_Naval_Oceanography_Mine_Warfare_Center,_and_Ana_Ziegler.jpg
Picture source: https://commons.wikimedia.org/wiki/File:FANUC_R-2000iB_series_robot_128.jpg
Picture source: https://commons.wikimedia.org/wiki/File:Vermont_Lunar_CubeSat.jpg
Picture source: https://commons.wikimedia.org/wiki/File:Amazone_BoniRob_Feldroboter-Entwicklungsprojekt.jpg
CUDA at the Edge so far
• Military Embedded Applications
– Power requirements similar to PC
– Ruggedized
– Heavy / Expensive
• Industrial PC
– PCs for factory floor
Picture source: http://www.rugged.com/a191-redibuilt-gpgpu-rugged-computer
NVIDIA Jetson™ TK1
• Low-cost Maker Board
– Tegra TK1 SoC
– 4-Plus-1™ ARM Cortex-A15, 2.2 GHz
– 2 GB RAM / 4 GB Flash
– Kepler GPU with 192 CUDA cores
– Unified Memory
– 192 USD
• Less than 15W
• Large Ecosystem
Picture source: https://www.flickr.com/photos/120586634@N05/14672953894
Picture source: http://elinux.org/Jetson_TK1
Create a Product
• Jetson Board not designed for volume production
• Too large
• Connector issues
• Long-term availability?
• Temperature range
• Same issues as Raspberry Pi / BeagleBone / …
Picture source: https://www.flickr.com/photos/120586634@N05/14672953894
Picture source: https://commons.wikimedia.org/wiki/File:CSIRO_ScienceImage_10876_Camclone_T21_Unmanned_Autonomous_Vehicle_UAV_fitted_with_CSIRO_guidance_system.jpg
Do your own Design: Example - TK1
• 580 Parts
• 3,415 Pins
• 4,785 Vias
• Fine-pitch uBGA
• 6+ different voltages
Do your own Design
• 12 Layers
• CPU >12 A peak current
• GPU >11 A peak current
• DDR3L high-speed layout
• Software Adjustments
Projects with Design-In
• Nexus 9 Tablet
• Nintendo Switch™
• Very High Volumes
Picture source: https://commons.wikimedia.org/wiki/File:Nexus_9.png
Picture source: https://commons.wikimedia.org/wiki/File:Nintendo_Switch_Portable.png
Computer on Modules (CoM) / System on Modules (SoM)
• Encapsulates complexity
• 1 component instead of >500
• Maintenance of components
• Reduced risk / time-to-market
• Lower initial costs
• Scalability
• Ready-to-use Operating System
Carrier Board
• Only application-specific interfaces
• Application-specific additional HW
• Board is a differentiator
• Typically relatively simple
Typical Applications for CoMs
• Not a new concept
• Typically up to 50k units per year per project
• From Microcontroller to High-end Computers
• Wide range of form factors
• Time-to-market and initial cost critical
Picture source: https://blogs.NVIDIA.com/blog/2016/04/19/wave-glider-robot/
Picture source: https://pixabay.com/en/drone-flight-fly-rotor-aircraft-1030650/
Example: Handheld Ultrasound
• Veterinary diagnostic
• Computer Module for HMI, visualization
• Battery-powered
• GPUs for HMI
• GPUs help with diagnostics
Picture source: https://blogs.NVIDIA.com/blog/2016/04/19/wave-glider-robot/
Picture source: https://pixabay.com/en/drone-flight-fly-rotor-aircraft-1030650/
Example: IoT Gateway
• Connect “Things” to the Internet
• Security, management, protocol translations
• Typically no GPU
• GPGPU for Edge Analytics
Picture source: https://www.flickr.com/photos/intel_de/13689005583
Example: Cover Meter
• Inspection of Concrete Cover, Rebars
• Computer Module for HMI, visualization
• Small GPUs for HMI and Visualization Processing
• Replace / Supplement DSPs / FPGAs with GPU
Picture source: https://blogs.NVIDIA.com/blog/2016/04/19/wave-glider-robot/
Picture source: https://pixabay.com/en/drone-flight-fly-rotor-aircraft-1030650/
Example: Snow Plowing
• CoM for controlling hydraulics and monitoring
• Recording for legal reasons
• GPUs for HMI
• Cameras for Monitoring equipment
• GPUs help monitoring environment, traffic
Picture source: https://pixabay.com/en/snowplow-road-truck-night-weather-1168278/
Example: Commercial Coffee Machine
• CoM for HMI and Control
• Small GPUs for HMI / videos
• Advertisement-supported free coffee
• Face recognition, GPGPU-tuned coffee?
Some Tegra-based Computer Modules
• Toradex Apalis TK1
– 192 Kepler CUDA Cores
– 8 - 15W
• NVIDIA® Jetson™ TX1
– 256 Maxwell CUDA Cores
• NVIDIA® Jetson™ TX2
– 256 Pascal CUDA Cores
– 7.5 - 15W
Picture source: http://www.NVIDIA.com/object/embedded-systems-dev-kits-modules.html
Some Tegra-based Computer Modules
Picture source: http://www.NVIDIA.com/object/embedded-systems-dev-kits-modules.html
                           | Toradex Apalis TK1       | NVIDIA Jetson TX1        | NVIDIA Jetson TX2
CPU                        | 4x A15 32bit             | 4x A57 64bit             | 2x Denver / 4x A57 64bit
GPU                        | Kepler™, 192 CUDA cores  | Maxwell™, 256 CUDA cores | Pascal™, 256 CUDA cores
RAM                        | 2 GB 64bit DDR3L         | 4 GB 64bit LPDDR4        | 8 GB 128bit LPDDR4
Flash Memory               | 16 GB eMMC               | 16 GB eMMC               | 32 GB eMMC
Video Decode               | 4Kp30, 4x 1080p30        | 4Kp60, 4x 1080p60        | 2x 4Kp60
Video Encode               | 4Kp24, 1080p60           | 4Kp30, 2x 1080p60        | 4Kp60, 8x 1080p30
CAN Bus                    | 2x                       | No                       | 2x
Size                       | 82 x 45 mm / 314 pins    | 87 x 50 mm / 400 pins    | 87 x 50 mm / 400 pins
Compatible with other SoCs | Yes                      | No                       | No
Availability               | 2025                     | (5 years)                | (5 years)
1k Price                   | 175 USD                  | 299 USD                  | 399 USD
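As a rough way to read the comparison, the 1k prices can be divided by the CUDA core counts and RAM sizes listed in the table. This is a minimal sketch using only the table's numbers; the dictionary keys and helper names are illustrative assumptions. CUDA cores of different GPU generations (Kepler, Maxwell, Pascal) do not deliver the same performance per core, so the result shows price structure, not performance per dollar.

```python
# Values copied from the comparison table; key names are illustrative only.
modules = {
    "Toradex Apalis TK1": {"cuda_cores": 192, "ram_gb": 2, "price_usd": 175},
    "NVIDIA Jetson TX1": {"cuda_cores": 256, "ram_gb": 4, "price_usd": 299},
    "NVIDIA Jetson TX2": {"cuda_cores": 256, "ram_gb": 8, "price_usd": 399},
}


def usd_per_core(spec):
    """1k-volume price divided by the number of CUDA cores."""
    return spec["price_usd"] / spec["cuda_cores"]


def usd_per_gb_ram(spec):
    """1k-volume price divided by the RAM size in GB."""
    return spec["price_usd"] / spec["ram_gb"]


cost_per_core = {name: round(usd_per_core(s), 2) for name, s in modules.items()}
cost_per_gb = {name: round(usd_per_gb_ram(s), 2) for name, s in modules.items()}

for name in modules:
    print(f"{name}: {cost_per_core[name]} USD/core, {cost_per_gb[name]} USD/GB RAM")
```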
Projects with NVIDIA Modules
• Robots Air / Water / Land
• High-precision agriculture
• Smart Cameras
• Cube Satellite in development
• TV / Cinema Special Effects
• …
Picture source: https://blogs.NVIDIA.com/blog/2016/04/19/wave-glider-robot/
Picture source: https://pixabay.com/en/drone-flight-fly-rotor-aircraft-1030650/
A closer look
• Typical Interface
– Single 3.3V power supply
– USB 3.0
– HDMI and LVDS
– Camera Serial Interfaces (CSI)
– PCIe
– CAN
– GPIO, I2C, SPI, UART, PWM
Toradex Apalis Off-the-Shelf Carrier Boards
NVIDIA Jetson Off-the-Shelf Carrier Boards
(GTC Booth #525)
Customized Carrier Boards
• Design Partners
– Jetson Ecosystem https://developer.NVIDIA.com/embedded/community/ecosystem
– Toradex Ecosystem https://www.toradex.com/support/partner-network/services/carrier-boards
DIY Carrier Board Resources
• Proven reference designs
• Open Hardware (e.g. Altium® Design Files)
• Design Guide / Layout Guide
• Pinout Tool
• Schematic reviews
DIY Carrier Board Complexity
• Minimum requirement: just a 3.3 V power supply
• 179 components (Viola Carrier)
• 4 to 6 layer board
• Lower-cost design tools
Design your own Carrier Board
• Gumstix Geppetto™ online editor
Software!
• Very Important
• Long Term Support
• Out of the Box experience, volume production
• Yocto-based Embedded Linux
– Customizable
• Ubuntu
– JetPack
• Frameworks
– cuDNN, VisionWorks, TensorRT, Caffe, …
Takeaway
• Overview of solutions for deploying CUDA in devices
• Large installed base of CoMs, most of which are new to GPGPU
Questions and Answers
Daniel Lang: daniel.lang@toradex.com
Phone: +1-206-319-5612
Developer Center: developer.toradex.com
Community Forum: community.toradex.com