applying deep learning vision technology to low-cost/power embedded systems

© 2016 Synopsys, Inc. 1 Applying Deep Learning Vision Technology to Low-cost, Low-power Embedded Systems: An Industrial Perspective Pierre Paulin Director of R&D 16 January 2016


Page 1: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

© 2016 Synopsys, Inc. 1

Applying Deep Learning Vision Technology to

Low-cost, Low-power Embedded Systems:

An Industrial Perspective

Pierre Paulin

Director of R&D

16 January 2016

Page 2: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Agenda

• Embedded Vision application trends and challenges

• Synopsys Embedded Vision Processor Overview

• Convolutional Neural Networks

– Applications, requirements

– Dedicated CNN engine for EV

– Competitive analysis

• Summary & Final Thoughts

Page 3: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Embedded Vision is Coming Fast

• Embedded Vision is the use of computer vision in embedded systems to interpret meaning from images or video

• In cars to improve safety

• Surveillance for detection and tracking

• In industrial automation to improve quality and control

• Estimated $300B+ market in 2020, 35% CAGR

[Chart: Vision Systems Shipments, in billions of dollars, 2013 to 2020]

Sources: ABI Research, Insight Media, Transparency Market Research, Markets And Markets, Synopsys

Page 4: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Wide Variety of Vision Applications

Cameras, Drones, Home Automation, Retail, Gaming, Infotainment, Augmented Reality, Mobile, Surveillance, ADAS

Page 5: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Autonomous Driving Buzz

1/14/2016 – U.S. Proposes Spending $4 Billion to Encourage Driverless Cars: Obama administration aims to remove hurdles to making autonomous cars more widespread (Wall Street Journal)

8/17/2016 – Ford's self-driving car 'coming in 2021' (BBC News)

8/24/2016 – Self-driving taxis roll out in Singapore - beating Uber to it (The Guardian)

10/20/2016 – Elon Musk: You'll be able to summon your driverless Tesla from cross-country (CNN Money)

10/25/2016 – Uber's Self-Driving Truck Makes Its First Delivery: 50,000 Beers (Wired)

Page 6: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Largest Embedded Vision Application Segment

Advanced Driver Assistance Systems Driven By Safety Concerns

Source: IC Market Drivers, IC Insights, January 2015 & Trends and Opportunities in Driver Assistance and Automated Driving, IHS Automotive Sep 2015

Page 7: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Video Surveillance Markets Growing Rapidly

• Global IP Video Surveillance Market expected to grow at CAGR of 37.3% from 2012 to 2020

• Demand driven by

– Growing installations of IP cameras

– Need for surveillance cameras with better video quality

– Limited ability for real-time human analysis

http://www.alliedmarketresearch.com/IP-video-surveillance-VSaaS-market

3X Growth Forecast, 2013 - 2019

Security (Airports, Govt, Banks, Casinos), Home Surveillance, Retail, Healthcare

Page 8: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

EV Challenges Require Embedded Vision Processors

[Diagram axes: Performance, Power, Area]

Less Efficient EV Options

• CPUs don't have math horsepower for fast 2D vision processing

• GPUs have high performance but large areas and higher power

• DSPs are designed for low power audio and speech applications, not 2D video

• FPGAs are good for prototyping but are expensive and performance limited

Dedicated Embedded Vision Processors

• Higher performance

• Lower power

• Smaller area

• Can include a dedicated deep learning (CNN) engine

Page 9: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Embedded Vision Applications and Power, Performance and Area (PPA) Requirements

Page 10: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Vision Pipeline Example

Object detection pipeline:

Grayscale Conversion → Image Pyramid → Detecting Areas of Interest in a Frame → Non-max Suppression → Draw Box
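The pipeline stages named on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the EV processor implementation: the function names, the BT.601 luma weights, the 2x2 averaging pyramid, and the greedy non-max suppression with a 0.5 IoU threshold are all assumptions chosen for the example. The detector stage is omitted because the slide does not say which detector is used.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to grayscale (BT.601 luma weights, an assumption)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

def image_pyramid(gray, levels):
    """Build a pyramid by repeated 2x2 block averaging; each level halves both dimensions."""
    pyramid = [gray]
    for _ in range(levels - 1):
        img = pyramid[-1]
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2  # drop odd row/column
        img = img[:h, :w]
        pyramid.append(img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return pyramid

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it does
    not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections and one separate detection (made-up data).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The grayscale and pyramid steps are regular, data-parallel operations over whole images, while the suppression step works on a short list of candidate boxes with data-dependent control flow.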

Page 11: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Vision Pipeline Example

Video surveillance pipeline:

Grayscale & Image Pyramid → Face Detection → Tracking & Detection Cascade → Fusion & Learning

Page 12: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Vision Algorithm Computation

Processing stages and typical algorithms:

• Pre-processing: noise reduction, color space conversion, gamma correction, image scaling, Gaussian pyramid

• Selecting Areas of Interest: object detection, background subtraction, feature extraction, image segmentation, connected comp. labeling

• Precise Processing of Selected Areas: object recognition, tracking, feature matching, gesture recognition, motion analysis

• Decision Making: match/no match, flag events

Compute characteristics:

• Simple Data-Level Parallelism (DLP): good spatial locality, good compute intensity, small context

• More Complex DLP: complex data structures, irregular compute intensity, larger context

• Scalar Processing: general purpose compute, thread level parallelism

Processing resources shown: RISC scalar; Multi-core Gen1 EV SIMD processor; Multi-core Gen2 EV SIMD processor; CNN; Multi-core CNN Engine

Page 13: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Sample Power, Performance and Area Targets

• Intelligent video surveillance applications

– Face detection & tracking, pose detection, gaze estimation, gender recognition, age estimation

– People detection & counting for video surveillance

– Driver fatigue detection

– Advanced detection and tracking

– Performance: 10-500 GOP/s

– Implementation on GPP and GP-GPU: 1-10 W, 50-100 mm2

– Typical customer targets for HD @30 fps: <500 mW, 1-2 mm2

Based on 28 nm process node

Page 14: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Sample Power, Performance and Area Targets

• ADAS

– Pedestrian, vehicle, traffic sign, lane detections

– Scene segmentation

– Performance: 100-2000 GOP/s

– Implementation on GPP and GP-GPU: >100 W, >100 mm2

– Typical customer targets for HD @30 fps: 1-2 W, 2-5 mm2

Based on 28 nm process node

Page 15: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

DesignWare® ARC EV6 Processor and CNN

- Vision-specific wide SIMD engine

- Optimized CNN engine

- Programming tools

Page 17: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

EV Processor Solution: EV6x with CNN Engine

[Block diagram]

• Vision CPU (1 to 4 cores): each core has a 32b Scalar unit and a 512b Vector DSP, with I$, D$ and VCCM

• CNN Engine (option): Convolution (ALU, Conv. 2D, AGUs, CC MEMs) and Classification (ALU, Conv. 1D, AGUs, CC MEMs)

• Shared resources: Cluster Comm., Shared Mem., DMA, Coherency, ARConnect, Sync, Debug, Power Mgmt., AXI Interconnect

• Up to 880 MAC/cycle

• Up to 620 GOP/s at 800 MHz

Embedded Vision Programming Tools

• User kernels (Ui … Uk) written in C/C++ and OpenCL C

• Kernel Lib (K1 … Kn)

• OpenCL C compiler, with whole function vectorization

• C/C++ compiler

• Graph of user and library kernels (Ui, Uj, Uk, Um, Kn)

• CNN Graph Mapping Tools (CNN graph, CNN graph node Cn)

• HAPS® Rapid Prototyping Board

• Virtual Prototype

Page 18: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN – Convolutional Neural Networks

Deep Learning Approach to Embedded Vision

Page 19: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN for a Wide Range of Vision Applications

• Image classification, search similar images

• Object detection, classification & localization

– Any type of object(s), depending on training phase

• Face recognition

• Visual attention

• Facial expression recognition

• Gesture recognition / hand tracking

• Resolution upscaling

• Scene recognition and labelling, semantic segmentation

– Sky, mountain, road, tree, building, …

• Recent advocates

– Nvidia, Microsoft, Google, Baidu, Adobe, Qualcomm, Yahoo …

– Mobileye for autonomous driving

[Image: street scene with regions labeled car, car, sky, building, building, road]

Page 20: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Pedestrian Detection: HoG vs. CNN

Page 21: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Computation Requirements for CNN

[Chart: computational complexity vs. accuracy, with reference levels at 1 GOPs/frame and 10 GOPs/frame]

• LeNet (1994): 4 layers

• AlexNet (2012): 8 layers, 100 MByte

• VGG-19 (2014): 19 layers, 270 MByte

• GoogLeNet (2014): 22 layers, 20 MByte

• ResNet (2015): 152 layers, 10 MByte

Page 22: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Scene Segmentation

Source: Press Release by Toshiba and Denso, 17 Oct. 2016

Page 23: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Super resolution using CNN

[Images, two examples, each shown as: Source, Bicubic Interpolation, CNN, Reference]

“Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al.”

Page 24: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Super-Resolution using Convolutional Neural Networks

• CNNs deliver superior Super-Resolution for single image and video

• CNNs for Super-Resolution require a dedicated compute engine with high compute capacity

• Example: “Image Super-Resolution Using Deep Convolutional Networks (2016), C. Dong et al.” requires 600 GMAC for one 4K frame
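To put the 600 GMAC per 4K frame figure in perspective, the arithmetic below turns a per-frame MAC count into a sustained throughput requirement. It is a rough sketch: the function names are made up for this example, the 30 fps frame rate is an assumption (the slide gives no frame rate for super-resolution), and pairing "880 MAC/cycle" with "800 MHz" from the EV6x block diagram is likewise an assumption, with 100% MAC utilization taken as a best case.

```python
def required_gmacs_per_second(gmac_per_frame, frames_per_second):
    """Sustained throughput (GMAC/s) needed to process frames in real time."""
    return gmac_per_frame * frames_per_second

def peak_gmacs_per_second(macs_per_cycle, clock_hz):
    """Peak throughput (GMAC/s) of an engine issuing macs_per_cycle every clock."""
    return macs_per_cycle * clock_hz / 1e9

def seconds_per_frame(gmac_per_frame, macs_per_cycle, clock_hz):
    """Best-case time for one frame, assuming every MAC unit is busy every cycle."""
    return gmac_per_frame / peak_gmacs_per_second(macs_per_cycle, clock_hz)

# 600 GMAC per 4K frame is from the slide; 30 fps is an assumed frame rate.
print(required_gmacs_per_second(600, 30))          # 18000 GMAC/s
# 880 MAC/cycle and 800 MHz appear in the deck; combining them is an assumption.
print(peak_gmacs_per_second(880, 800e6))           # 704.0 GMAC/s
print(round(seconds_per_frame(600, 880, 800e6), 3))  # 0.852 s per frame at peak
```

Under these assumptions, real-time 4K super-resolution with this network needs well over an order of magnitude more sustained MAC throughput than a single engine of that size provides at peak.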

Page 25: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN Graph Training and Porting

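[Flow diagram with two phases]

• Training: image labeling; graph exploration and training on a GPU farm, producing the CNN graph and coefficients (coeff.)

• Porting: code vectorization, producing code for an object detection executable

• Execution targets shown: GPP, GP-GPU, CNN-optimized processor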

Page 26: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN Computation

• Convolution of multiple inputs together

– Fixed kernel size

• Optional subsampling

– 1x, 2x, 4x

• Optional max-pooling

• Very regular, repetitive computation

– Dominated by MAC

– Deterministic

• Non-linear activation function

– Rectifier, Sigmoid, Hyperbolic tangent

[Diagram: M inputs I0, I1, …, IM-1 (XI * YI); Z kernels (K * K) with associated weights; N outputs O0, …, ON-1 (XO * YO)]

Oj = act(Bj + (Iv x Kw) + …), where x denotes convolution and act is the activation (tanh, ReLU, …)
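The layer computation on this slide can be written out directly in NumPy. This is a reference sketch for clarity, not how the CNN engine computes it. It assumes every output map is connected to every input map (so Z = M * N kernels), stride 1 with no padding, and the sliding-window product without kernel flipping that CNN frameworks conventionally call convolution. The names `conv_layer`, `relu`, `max_pool_2x2` and `mac_count` are made up for the example.

```python
import numpy as np

def relu(x):
    """Rectifier activation."""
    return np.maximum(x, 0.0)

def conv_layer(inputs, kernels, biases, activation=np.tanh):
    """Compute Oj = act(Bj + sum over v of (Iv x Kjv)).

    inputs:  (M, YI, XI) input maps
    kernels: (N, M, K, K) one K x K kernel per (output, input) pair (assumption)
    biases:  (N,) one bias per output map
    returns: (N, YO, XO) with YO = YI - K + 1 and XO = XI - K + 1
    """
    m, yi, xi = inputs.shape
    n, m_k, k, _ = kernels.shape
    assert m == m_k, "kernel input depth must match the number of input maps"
    yo, xo = yi - k + 1, xi - k + 1
    out = np.empty((n, yo, xo))
    for j in range(n):
        acc = np.full((yo, xo), biases[j], dtype=np.float64)
        for v in range(m):
            for dy in range(k):
                for dx in range(k):
                    # One multiply-accumulate per output pixel for this kernel tap.
                    acc += kernels[j, v, dy, dx] * inputs[v, dy:dy + yo, dx:dx + xo]
        out[j] = activation(acc)
    return out

def max_pool_2x2(maps):
    """Optional 2x2 max-pooling over each output map."""
    n, h, w = maps.shape
    h2, w2 = (h // 2) * 2, (w // 2) * 2
    return maps[:, :h2, :w2].reshape(n, h2 // 2, 2, w2 // 2, 2).max(axis=(2, 4))

def mac_count(m, n, k, yo, xo):
    """Multiply-accumulates for one fully connected convolution layer."""
    return m * n * k * k * yo * xo

# M = 2 inputs of 4 x 4, N = 3 outputs, 3 x 3 kernels, all ones (made-up data).
out = conv_layer(np.ones((2, 4, 4)), np.ones((3, 2, 3, 3)), np.zeros(3), relu)
print(out.shape, out[0, 0, 0])    # (3, 2, 2) 18.0
print(mac_count(2, 3, 3, 2, 2))   # 216
```

The inner loops contain nothing but multiply-accumulates with fixed, data-independent bounds, which is what the slide means by a regular, deterministic, MAC-dominated computation.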

Page 27: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

EV6x Second Generation CNN Engine for Object Detection and Semantic Segmentation

- High performance, low power and area

- Fully programmable

[Image: segmented street scene with regions labeled car, car, sky, building, building]

Page 28: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

High-Performance EV6x CNN Engine

• Dedicated EV6x CNN Engine with performance equal to or better than GP-GPU

• Programmable to support the full range of fixed point CNN graphs

• State-of-the-art power-efficiency

• Real-time, high quality image classification, object recognition, semantic segmentation

• Supports resolutions up to 4K

• Operates in parallel with Vision CPUs, increasing efficiency and throughput

[Block diagram: Vision CPU Core (32 bit RISC, 512-bit Vector DSP); CNN Engine with Convolution (ALU, Conv. 2D, AGUs, CC MEMs) and Classification (ALU, Conv. 1D, AGUs, CC MEMs); Cluster Shared Memory; DMA; ARConnect; AXI Interconnect]

Preliminary – Subject to Change

Page 29: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

AlexNet on ImageNet

Quantization opportunities for recognition tasks

[Chart: recognition accuracy vs. fixed-point word length, comparing 32-bit floating point with 16-bit fixed point, with 12-bit marked] [Moons WACV2016]

• 12 bit is a good compromise between CNN recognition performance and hardware cost

– 8 bit will cause recognition rate loss on existing graphs

– A 12 bit multiplier is almost half the area of a 16 bit multiplier
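The word-length trade-off can be illustrated with a small fixed-point quantizer. This is a sketch under stated assumptions, not the EV6x number format: the signed format with a chosen number of fractional bits, round-to-nearest and saturation are all illustrative choices, and the function names are made up. The area helper uses the common first-order model that array multiplier area grows with the square of the word length; that model is not from the slide, but it gives (12/16)^2 = 0.5625, in line with "almost half".

```python
import numpy as np

def quantize_fixed_point(values, word_length, frac_bits):
    """Round to signed fixed point with word_length total bits and frac_bits
    fractional bits, saturating at the representable range."""
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (word_length - 1))
    q_max = 2 ** (word_length - 1) - 1
    codes = np.clip(np.round(values * scale), q_min, q_max)
    return codes / scale

def relative_multiplier_area(bits, reference_bits):
    """First-order estimate: multiplier area scales with word length squared."""
    return (bits / reference_bits) ** 2

weights = np.array([0.3, -0.71, 0.05])  # made-up coefficients
for word_length, frac_bits in [(8, 6), (12, 10), (16, 14)]:
    q = quantize_fixed_point(weights, word_length, frac_bits)
    print(word_length, "bit max error:", np.max(np.abs(q - weights)))

print(relative_multiplier_area(12, 16))  # 0.5625
```

Each extra fractional bit halves the worst-case rounding error for in-range values, while the estimated multiplier area grows quadratically, which is the tension behind picking an intermediate word length.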

Page 30: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN data precision – Qualcomm data

Page 31: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN Competitive Analysis

Page 32: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN Performance and Area Efficiency Comparison

Preliminary – Subject to Change

[Chart: area efficiency (GMAC/s/mm2, log scale 1 to 1000) vs. performance (GMAC/s, log scale 10 to 1000); circle area proportional to logic area]

Plotted: First gen vision processors; GP/GPU; EV6x Embedded Vision Processor w/integrated CNN

Annotated ratios: 300X, 2X, 20X, 14X

Page 33: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

CNN Performance and Power Efficiency Comparison

Preliminary – Subject to Change

[Chart: power efficiency (GMAC/s/W, log scale 10 to 10000) vs. performance (GMAC/s, log scale 10 to 1000); circle area proportional to logic area]

Plotted: First gen vision processors; GP/GPU; EV6x Embedded Vision Processor w/integrated CNN

Annotated ratios: 11X, 30X

Page 34: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

EV Challenges Require Embedded Vision Processors

[Diagram axes: Performance, Power, Area]

Less Efficient EV Options

• CPUs don't have math horsepower for fast 2D vision processing

• GPUs have high performance but large areas and higher power

• DSPs are designed for low power audio and speech applications, not 2D video

• FPGAs are good for prototyping but are expensive and performance limited

Dedicated Embedded Vision Processors

• High performance

• Lower power

• Smaller area

Dedicated deep learning (CNN) engine provides PPA numbers compatible with surveillance, ADAS and mobile targets:

• 1000 GMACs/W

• 100-1000 GOP/s

• Few mm2

Page 35: Applying Deep Learning Vision Technology to low-cost/power Embedded Systems

Thank You

Contact me at:

[email protected]