September 13, 2016 | Beijing: Edge Computing Musters Its Full Might Behind the Live-Streaming Boom...
TRANSCRIPT
September 13, 2016 | Beijing
Edge Computing Musters Its Full Might Behind the Live-Streaming Boom; GPUs Bring Deep Learning to the Intelligent CDN
2
Live Streaming for Everyone
3
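Traditional live video streaming platform: workflow
• Video capture side: compress and upload
• CDN nodes: upload the data to the data center over the CDN link
• Data center side: data processing
• Processing complete: the data flows back to the CDN nodes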
4
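Traditional live-streaming platform: system functional architecture
• Video capture side: audio and video → encoding & muxing (raw video → compressed data)
• CDN nodes (CDN, CDN, CDN, …): nearest-node access and data distribution
• Data center side: data processing on a cluster with a parallel file system (transcoding, video analysis, screenshots, segmenting, distribution)
• The processed data flows back from the data center to the CDN nodes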
5
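Traditional live-streaming platform: system functional architecture, analysis
• Video capture side: audio and video → encoding & muxing (raw video → compressed data)
• CDN nodes (CDN, CDN, CDN, …): nearest-node access and data distribution
• Data center side: data is pushed here for processing on a cluster with a parallel file system (transcoding, video analysis, screenshots, segmenting, distribution)
Problems:
• Bandwidth is expensive
• Processing is centralized
• Compute capacity is limited
• CDN edge computing capability is weak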
6
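GPU-accelerated intelligent CDN platform
• Video capture side: compress and upload
• CDN nodes: data processing and distribution
• Data center side: data storage; segment data for some videos is backed up here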
7
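GPU-accelerated intelligent CDN platform
• Video capture side: audio and video → encoding & muxing (raw video → compressed data)
• CDN nodes with GPUs (nearest-node access): data processing (transcoding, video analysis) and data distribution
• Data center side: storage system for data archiving and backup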
8
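GPU-accelerated intelligent CDN platform: analysis
• Video capture side: audio and video → encoding & muxing (raw video → compressed data)
• CDN nodes with GPUs (nearest-node access): data processing (transcoding, video analysis) and data distribution
• Data center side: storage system for data archiving and backup
Benefits:
• Stronger CDN edge computing capability
• Saves bandwidth
• Better user experience
• Lower TCO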
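To make the bandwidth argument concrete, here is a back-of-envelope sketch, not a measurement. It assumes a simplified model that counts only traffic between the CDN and the data center: in the traditional design the compressed source goes up and every transcoded rendition comes back down; in the GPU-CDN design transcoding happens at the edge and only a fraction of the segments travels to the data center for backup (the platform slides say only *some* videos are backed up). The `Stream` type, the rendition ladder, and `backup_fraction` are hypothetical parameters of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One live channel in a simplified, hypothetical traffic model."""
    ingest_mbps: float  # compressed upload from the broadcaster
    rendition_mbps: list[float] = field(default_factory=list)  # transcoded outputs


def centralized_dc_traffic(s: Stream) -> float:
    """CDN <-> data-center Mbps when the data center transcodes:
    the source travels up and every rendition is pushed back to the CDN."""
    return s.ingest_mbps + sum(s.rendition_mbps)


def edge_dc_traffic(s: Stream, backup_fraction: float) -> float:
    """CDN <-> data-center Mbps when a GPU in the CDN node transcodes:
    only the backed-up share of the renditions reaches the data center."""
    if not 0.0 <= backup_fraction <= 1.0:
        raise ValueError("backup_fraction must be between 0 and 1")
    return backup_fraction * sum(s.rendition_mbps)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the slides.
    ch = Stream(ingest_mbps=4.0, rendition_mbps=[4.0, 2.0, 1.0])
    print(centralized_dc_traffic(ch))  # 11.0
    print(edge_dc_traffic(ch, 0.25))   # 1.75
```

The model deliberately ignores CDN-to-viewer traffic, which is the same in both designs; only the round trip to the data center changes.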
9
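Choosing NVIDIA GPUs for video analysis
GPUs for training:
• Compute: 7 to 12 TFLOPS (single precision), 22 TFLOPS (16-bit)
• GPU memory: 12 GB to 24 GB
• Power: 250 W to 300 W
• Form factor: full height, full length, occupies two PCIe slots
GPUs for online inference:
• Compute: 2.2 to 5.5 TFLOPS (single precision), 22 TOPS (INT8)
• GPU memory: 4 GB to 8 GB
• Power: 50 W to 75 W
• Form factor: half height, half length, occupies one PCIe slot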
10
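GPU-based video analysis platform
• Build an efficient video transcoding and analysis platform on GPUs: NVDEC → CUDA filters → inference → NVENC
• Use a GPU-powered deep learning training platform to produce high-accuracy recognition algorithms: training dataset → DIGITS → GIE optimization → GPU-enabled online platform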
11
Example: video processing on the GPU with NVDEC and NVENC
1. NVDEC decodes H.264, HEVC, or MPEG-2 into YUV frames in GDDR5 GPU memory.
2. A <<Resize Kernel>> writes resized frames to GPU memory.
3. A <<GIE Inference Kernel>> runs on the resized frames and outputs the class, bounding box, …
4. A <<Video Processing Kernel>> operates on the YUV frames in GPU memory.
5. NVENC encodes the YUV frames to H.264 or HEVC.
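Step 2's resize can be pictured with the CPU sketch below. It is a stand-in under stated assumptions, not the kernel from the slide: it uses nearest-neighbour sampling on a single 8-bit luma (Y) plane stored row-major in a Python list. The point it illustrates is that every output pixel is computed independently, which is why this step maps naturally onto one CUDA thread per pixel while the frame stays in GPU memory. A real NV12 frame would also need its interleaved chroma plane resized at half resolution.

```python
def resize_nearest(plane: list[int], src_w: int, src_h: int,
                   dst_w: int, dst_h: int) -> list[int]:
    """Nearest-neighbour resize of one row-major 8-bit plane.

    Each (x, y) output pixel depends on exactly one source pixel, so in a
    CUDA kernel the inner loop body would run as one thread per output pixel.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("dimensions must be positive")
    if len(plane) != src_w * src_h:
        raise ValueError("plane size does not match src_w * src_h")
    out = [0] * (dst_w * dst_h)
    for y in range(dst_h):
        src_row = (y * src_h // dst_h) * src_w  # offset of the nearest source row
        for x in range(dst_w):
            sx = x * src_w // dst_w             # nearest source column
            out[y * dst_w + x] = plane[src_row + sx]
    return out


if __name__ == "__main__":
    tiny = [1, 2,
            3, 4]
    print(resize_nearest(tiny, 2, 2, 4, 4))
    # [1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4]
```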
12
GPU hardware-accelerated video encoding and decoding
13
Efficient GPU encoding: number of 1080p30 streams that can be processed
Compared with Xeon E5 performance*:
• Tesla M4: 3.5x
• Tesla M40: 7x
• Tesla M60: 14x
*Xeon E5 2.4 GHz, 14 cores, x264 preset slow
[Chart: number of 1080p30 streams per device, axis 0 to 30]
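The slide's figures are relative (3.5x, 7x, and 14x over one Xeon E5 running x264 preset slow), so a capacity estimate needs a baseline you measure yourself. The sketch below assumes the speedups scale linearly to your content and encoder settings, which the slide does not promise; `baseline_streams` and the target channel count are hypothetical inputs.

```python
import math

# Encode-throughput factors relative to the Xeon E5 baseline, as stated on the slide.
SPEEDUP_VS_XEON = {"Tesla M4": 3.5, "Tesla M40": 7.0, "Tesla M60": 14.0}


def streams_per_gpu(baseline_streams: float, gpu: str) -> int:
    """Whole 1080p30 streams one GPU can encode, assuming linear scaling."""
    return math.floor(baseline_streams * SPEEDUP_VS_XEON[gpu])


def gpus_needed(target_streams: int, baseline_streams: float, gpu: str) -> int:
    """GPUs required to encode target_streams concurrently."""
    per_gpu = streams_per_gpu(baseline_streams, gpu)
    if per_gpu < 1:
        raise ValueError("baseline too small: fewer than one stream per GPU")
    return math.ceil(target_streams / per_gpu)


if __name__ == "__main__":
    measured = 2.0  # hypothetical: streams your own Xeon baseline sustained
    for name in SPEEDUP_VS_XEON:
        print(name, streams_per_gpu(measured, name), gpus_needed(100, measured, name))
```

Rounding down per GPU reflects that a fractional stream cannot be encoded in real time.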
14
Bitrate, high-quality preset (2 B-frames): NVENC H.264/AVC vs. x264. At equal quality, NVENC's bitrate is within ±2% of x264's.
The result depends strongly on the content.
[Chart: RD curve, PSNR (0 to 60) vs. actual bitrate in Mbps (0 to 5); series: x264 Av Mbps, NVENC Av Mbps]
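The y-axis of the RD curve is PSNR. For reference, a minimal PSNR computation over two equally sized 8-bit sample sequences (for example, the luma plane of a source frame and of its decoded counterpart) looks like this; the flat-list input is a simplifying assumption, and real tools typically average per-frame values over a whole clip. An RD curve is then a set of (bitrate, PSNR) pairs obtained by encoding the same clip at several target bitrates.

```python
import math


def psnr(reference: list[int], distorted: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    Returns infinity when the inputs are identical (MSE == 0).
    """
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    src = [16, 64, 128, 235]
    dec = [17, 63, 128, 233]  # hypothetical decoded samples
    print(round(psnr(src, dec), 2))
```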
15
NVIDIA VIDEO CODEC SDK 7.0
https://developer.nvidia.com/nvidia-video-codec-sdk
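
| | Fermi | Kepler | Maxwell (GM10X) | Maxwell (GM20X) | Pascal |
|---|---|---|---|---|---|
| H.264 encoding | No | Yes | Yes¹ | Yes | Yes |
| HEVC encoding | No | No | No | Yes | Yes |
| MPEG-2, MPEG-4, H.264 decoding | Yes | Yes | Yes | Yes | Yes |
| HEVC decoding | No | No | No | Yes² | Yes |
| VP9 decoding | No | No | No | Yes | Yes |

¹ Except GM108, which contains no hardware encoder or decoder.
² Only the GM206 chip supports hardware HEVC decoding.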
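When software must decide whether to use NVDEC/NVENC or fall back to CPU codecs, the support matrix above can be encoded as data. The sketch below transcribes the slide's table and its two footnotes; the architecture and feature strings and the `chip` parameter are hypothetical names chosen for this sketch, and a production system would rather query the driver's capability APIs than trust a static table.

```python
from typing import Optional

ARCHS = ("Fermi", "Kepler", "Maxwell GM10X", "Maxwell GM20X", "Pascal")

# Rows transcribed from the Video Codec SDK 7.0 slide (one flag per entry in ARCHS).
SUPPORT = {
    "H.264 encode": (False, True, True, True, True),
    "HEVC encode": (False, False, False, True, True),
    "MPEG-2/MPEG-4/H.264 decode": (True, True, True, True, True),
    "HEVC decode": (False, False, False, True, True),
    "VP9 decode": (False, False, False, True, True),
}


def supports(feature: str, arch: str, chip: Optional[str] = None) -> bool:
    """True if the slide lists hardware support for feature on arch (and chip)."""
    if chip == "GM108":
        return False  # footnote 1: GM108 has no hardware encoder or decoder
    if feature == "HEVC decode" and arch == "Maxwell GM20X":
        return chip == "GM206"  # footnote 2: only GM206 decodes HEVC in hardware
    return SUPPORT[feature][ARCHS.index(arch)]


if __name__ == "__main__":
    print(supports("HEVC encode", "Kepler"))                  # False
    print(supports("HEVC decode", "Maxwell GM20X", "GM206"))  # True
```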
16
GPU-based deep learning acceleration platform
17
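Deep learning: a new computing model
• Traditional computer vision: experts + a great deal of time
• Deep-learning-based object recognition: DNN + data + HPC
On the ImageNet benchmark, recognition results from deep learning algorithms have already surpassed human recognition ability.
[Chart: ImageNet results (0% to 100%), 2009 to 2016; series: Traditional CV, Deep Learning]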
18
The NVIDIA Deep Learning SDK supports nearly all deep learning frameworks
developer.nvidia.com/deep-learning-software
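Deep learning frameworks (e.g. Mocha.jl)
Applications: computer vision, speech recognition, natural language understanding; image classification, object detection, language translation, recommender systems, sentiment analysis
NVIDIA Deep Learning SDK: NCCL, cuDNN, cuBLAS, cuSPARSE, GIE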
19
GPU INFERENCE ENGINE (GIE): a high-performance optimization tool for deploying deep learning in online applications
developer.nvidia.com/gie
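Input: a trained deep neural network → GIE → deployment targets: embedded devices, autonomous driving platforms, CDN / data center
GIE features:
• Network layer reuse
• Custom kernels
• Optimized batch size
• FP16 support
• Easy deployment and management
• Automatic optimization for each target platform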
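The FP16 item trades precision for memory footprint and bandwidth. GIE's own FP16 path is internal to the library and is not shown here; the sketch below uses only Python's standard `struct` module (format character `e`, IEEE 754 half precision) to show what storing weights in FP16 does: half the bytes of FP32, with rounding error of roughly 1 part in 2,000 for normal values. The weight values are made up.

```python
import struct


def pack_fp32(values: list[float]) -> bytes:
    return struct.pack(f"<{len(values)}f", *values)


def pack_fp16(values: list[float]) -> bytes:
    # 'e' is IEEE 754 binary16; magnitudes above 65504 raise OverflowError.
    return struct.pack(f"<{len(values)}e", *values)


def unpack_fp16(data: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(data) // 2}e", data))


if __name__ == "__main__":
    weights = [0.1, -1.5, 3.14159, 0.0007]  # hypothetical layer weights
    print(len(pack_fp32(weights)), len(pack_fp16(weights)))  # 16 8
    for w, h in zip(weights, unpack_fp16(pack_fp16(weights))):
        print(f"{w:.6f} -> {h:.6f} (relative error {abs(h - w) / abs(w):.1e})")
```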
20
GIE performance comparison
[Chart: GIE performance at batch = 1, 2, 4, 10, 16 (y-axis 0 to 450); series: Caffe CPU, Caffe cuDNNv5, GIE Inference]
Test platform:
• GPU: M4, 2.2 TFLOPS, 4 GB memory
• CPU: Intel Xeon E5, 2.8 GHz
• CUDA: CUDA 8.0
• OS: Ubuntu 14.04, 64-bit
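The batch-size axis reflects a throughput/latency trade-off: larger batches keep the GPU busier, but each request waits longer. One hedged way to act on measurements like these is to pick the batch size with the highest throughput whose latency still fits a real-time budget (for example, about 33 ms per frame at 30 fps). The sketch assumes you have measured per-batch latency yourself for your own model and GPU; the latencies below are invented, and `pick_batch` is a hypothetical helper, not part of GIE.

```python
def throughput_per_s(batch: int, latency_ms: float) -> float:
    """Items processed per second when one batch takes latency_ms."""
    return batch * 1000.0 / latency_ms


def pick_batch(latency_ms: dict[int, float], budget_ms: float) -> int:
    """Highest-throughput batch size whose measured latency fits the budget."""
    feasible = {b: t for b, t in latency_ms.items() if t <= budget_ms}
    if not feasible:
        raise ValueError("no measured batch size meets the latency budget")
    return max(feasible, key=lambda b: throughput_per_s(b, feasible[b]))


if __name__ == "__main__":
    measured = {1: 5.0, 2: 6.0, 4: 8.0, 10: 15.0, 16: 22.0}  # invented latencies (ms)
    print(pick_batch(measured, budget_ms=33.0))  # 16
    print(pick_batch(measured, budget_ms=16.0))  # 10
```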
21
Real-time analysis of multiple video streams
22 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Thank you!