Mobile: Driving the next wave with GPU compute

Douglas Watt
21 May 2015
www.imgtec.com

© Imagination Technologies USA Summit 2

Today’s opportunity
Differentiated camera applications

[SoC block diagram: CPU (compute); GPU (graphics and compute); memory controller, caches and interconnect; RPU (Wi-Fi, Bluetooth, sensors connectivity, RF); VDE (encode, decode); ISP (camera); display subsystem]

[Example images: exposed for highlights; exposed for shadows; HDR]

• Android’s camera subsystem is now modelled after professional cameras
• In an attempt to maintain a consistent platform running on multiple SoCs, Google limits the rate of adoption of new features

Android version   Camera HAL   Addition to android.camera.hal
Lollipop          3.2          Noise reduction
KitKat            3.1          None
JB MR2            3.0          None
JB MR1            2.0          HDR
JB                1.0          Auto focus
ICS               1.0          Video stabilization
ICS               1.0          Face detection

Today’s opportunity
Differentiated camera applications

[SoC block diagram]

[Example images: head in focus; wings in focus; everything in focus]

• GPU compute delivers high performance for many image processing algorithms
• OEMs will choose SoCs that allow them to differentiate their Android products
• Features most requested are computational photography and computer vision:
  • Sensor processing: stereo, array and ToF
  • Panoramas: real-time and high-res
  • Depth of field (focus stacking)
  • Gesture recognition
  • Augmented reality: real lighting
• Bringing new features to market fast requires:
  • large amounts of processing power
  • programmability

GPU increasingly dominates SoC area
Particularly in premium SoCs

[Diagram: 65nm SoC with CPU1 (compute) and USSE1 (GPU, graphics and compute); timeline 2012 to 2015]

• SMP configurations unlikely to scale efficiently beyond four CPUs
• GPU multi-processor and multi-pipe configurability enables far more extensive processor scaling
• OpenCL unlocks the full potential of GPUs

GPU increasingly dominates SoC area
Particularly in premium SoCs

[Diagram: 40nm SoC with an SMP CPU (CPU0, CPU1, scheduler) and a GPU with 4 × USSE (4 pipes each); timeline 2012 to 2015]

GPU increasingly dominates SoC area
Particularly in premium SoCs

[Diagram: 28nm SoC with an SMP CPU (CPU0 to CPU3, scheduler) and a GPU with 8 × USC (16 pipes each); timeline 2012 to 2015]

GPU increasingly dominates SoC area

[Diagram: 28nm SoC with an SMP CPU (CPU0 to CPU3, scheduler) and a GPU with 8 × USC (16 pipes each), labelled "100s of GFLOPS"; timeline 2012 to 2015 with GPU generations SGX 544, Rogue Series 6, Rogue Series 7 and Rogue Series 8]

SoC bandwidth does not scale

• Mobile SoCs provide a single unified system memory that is shared between all the hardware blocks
• The available bandwidth is often 10x less than a desktop PCIe bus

[SoC block diagram: SMP CPU (CPU0 to CPU3, scheduler); GPU with 8 × USC (16 pipes each); memory controller, caches and interconnect; RPU; VDE; ISP; display subsystem]

Android’s problem with buffer copies

[SoC block diagram with data flow: YUV input, YUV copy, RGB copy, RGB output]

Unnecessary copying of buffers can cripple performance and power

• Android dictates the formats of camera and video data presented to app developers
• The OS APIs may copy frames from one format to another
  • Unnecessarily increases bandwidth
  • Unnecessarily reduces achievable GPU compute performance
• Performance losses can be quickly compounded, especially when processing HD video content
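As a rough illustration of how quickly copies add up, the sketch below estimates the memory traffic caused by the two copies in the data flow above. The resolution (1080p), frame rate (30 fps) and pixel formats (NV12 at 1.5 bytes per pixel, 32-bit RGBA at 4 bytes per pixel) are assumptions chosen for the example, not figures from the talk.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one tightly packed frame in bytes."""
    return int(width * height * bytes_per_pixel)


def copy_traffic_mb_per_s(width, height, fps, copies):
    """Memory traffic in MB/s (1 MB = 1e6 bytes) caused by per-frame copies.

    `copies` is a list of (source_bytes_per_pixel, dest_bytes_per_pixel)
    pairs. Each copy reads a whole source frame and writes a whole
    destination frame.
    """
    per_frame = 0
    for src_bpp, dst_bpp in copies:
        per_frame += frame_bytes(width, height, src_bpp)  # read
        per_frame += frame_bytes(width, height, dst_bpp)  # write
    return per_frame * fps / 1e6


# Assumed formats: NV12 (1.5 bytes/pixel) and 32-bit RGBA (4 bytes/pixel).
NV12_BPP, RGBA_BPP = 1.5, 4
copies = [
    (NV12_BPP, NV12_BPP),  # YUV input -> YUV copy
    (NV12_BPP, RGBA_BPP),  # YUV copy  -> RGB copy
]
traffic = copy_traffic_mb_per_s(1920, 1080, 30, copies)
print(f"{traffic:.1f} MB/s spent on copies alone")  # 528.8 MB/s
```

Under these assumptions, roughly half a gigabyte per second of shared memory bandwidth is consumed before any useful processing has taken place.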

PowerVR imaging framework for Android
The basis of OEM product differentiation

• A suite of software extensions that enables efficient interoperability of software running on PowerVR GPUs with many other SoC hardware blocks
• Interoperable with: CPU, ISP, VDE

[SoC block diagram: GPU with USC0 to USC3; CPU0 and CPU1; memory controller, caches and interconnect; RPU; VDE; ISP; display subsystem]

PowerVR imaging framework for Android
How it works

• Software extensions enable images written by the ISP to be directly read by the GPU
• Gralloc and EGL are used to maintain data interoperability between the ISP/camera HAL and the GPU
• EGL Image pointers enable separate access to the Y, U and V planes
• EGL extensions enable mapping of image data into OpenCL and OpenGL ES data types

[Diagram: a Gralloc buffer in NV12 format exposes its Y plane and its UV plane; each plane is wrapped as an EGLImageKHR and mapped to an image2d_t]

[SoC block diagram with USC0 to USC3; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, denoise filter]
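To make the NV12 layout concrete, here is a minimal CPU-side sketch of how one buffer can be viewed as a Y plane and an interleaved UV plane without copying any data. It only illustrates the memory layout; it is not the EGL or OpenCL extension API. The function names are hypothetical, and the buffer is assumed to be tightly packed (row stride equal to the width, no padding), which real Gralloc buffers do not guarantee.

```python
def nv12_planes(buffer, width, height):
    """Return zero-copy views (y_plane, uv_plane) onto an NV12 buffer.

    NV12 stores width*height luma bytes, followed by width*height/2 bytes
    of interleaved U,V samples at half resolution in both directions.
    Assumes tight packing (stride == width).
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    y_size = width * height
    uv_size = y_size // 2
    view = memoryview(buffer)
    if len(view) < y_size + uv_size:
        raise ValueError("buffer too small for the given dimensions")
    # Slicing a memoryview creates a new view, not a copy.
    return view[:y_size], view[y_size:y_size + uv_size]


def chroma_at(uv_plane, width, x, y):
    """(U, V) pair covering luma pixel (x, y)."""
    row, col = y // 2, x // 2
    i = row * width + col * 2  # each UV row holds width/2 pairs = width bytes
    return uv_plane[i], uv_plane[i + 1]


frame = bytearray(range(12))          # a tiny 4x2 frame: 8 Y bytes + 4 UV bytes
y_plane, uv_plane = nv12_planes(frame, 4, 2)
frame[8] = 200                        # write through the original buffer...
print(uv_plane[0])                    # ...and the view sees it: 200
```

Because both planes are views onto the same memory, a consumer can process luma and chroma separately while the producer's buffer is never duplicated, which is the property the framework's extensions provide across hardware blocks.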

PowerVR imaging framework for Android
How it works

• Other extensions enable interoperability with the CPU and VDE
• Many complex vision and computational software pipelines can be created, incorporating the VDE and other compatible hardware on the SoC

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, denoise filter, H264 encode]

PowerVR imaging framework examples
Denoising

• Improved picture quality for a given camera sensor
• Improves low-light performance

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, denoise filter, H264 encode]

PowerVR imaging framework examples
HDR

• Reveal the full dynamic range of video
• Compensate for excess or lack of light
• Bring out shadow and highlight details

[Image comparison: with GPU compute; without GPU compute]

[SoC block diagram; pipeline stages: H264 decode, decode additional bits, YUV-to-RGB conversion, blend, tone map]
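Two of the stages named above, YUV-to-RGB conversion and tone mapping, can be sketched per pixel as follows. The talk does not say which colour matrix or tone-mapping operator is used; the full-range BT.601 coefficients and the Reinhard operator below are assumptions picked because they are simple and widely known.

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV sample (0..255 each) to 8-bit RGB."""
    d, e = u - 128, v - 128
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d

    def clamp(c):
        return max(0, min(255, int(round(c))))

    return clamp(r), clamp(g), clamp(b)


def tone_map(luminance):
    """Reinhard operator: compresses luminance in [0, inf) into [0, 1)."""
    return luminance / (1.0 + luminance)


print(yuv_to_rgb(128, 128, 128))             # mid grey stays (128, 128, 128)
print([tone_map(l) for l in (0.0, 1.0, 9.0)])  # [0.0, 0.5, 0.9]
```

Both functions touch each pixel independently, which is why this kind of work maps well onto many GPU pipes running in parallel.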

PowerVR imaging framework examples
Image stabilization

• Reduces frame-to-frame jitter when user is walking/in motion
• Provides smooth recording and playback of user-generated content
• Improves low-light performance

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, read gyro data, construct transformation matrix, apply transformation matrix, H264 encode]
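The "construct transformation matrix" and "apply transformation matrix" stages can be sketched for the simplest case: cancelling camera roll about the optical axis. This is a deliberately reduced model and an assumption on our part; a production stabilizer would use all three gyro axes plus the camera intrinsics. The function names are hypothetical.

```python
import math


def compensation_matrix(roll_rate, dt, cx, cy):
    """3x3 homogeneous matrix that undoes a roll of roll_rate*dt radians
    about the image centre (cx, cy)."""
    phi = -roll_rate * dt                 # rotate back by the measured angle
    c, s = math.cos(phi), math.sin(phi)
    return [
        [c, -s, cx - c * cx + s * cy],
        [s,  c, cy - s * cx - c * cy],
        [0.0, 0.0, 1.0],
    ]


def apply(matrix, x, y):
    """Map pixel coordinate (x, y) through the matrix."""
    nx = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    ny = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return nx, ny


# A 1080p frame, gyro reports 0.3 rad/s of roll over one 30 fps frame period.
m = compensation_matrix(0.3, 1 / 30, 960, 540)
print(apply(m, 960, 540))   # the centre stays (approximately) at (960.0, 540.0)
```

The CPU can build the small matrix from the gyro sample, while applying it to every pixel of every frame is the part that benefits from GPU compute.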

PowerVR imaging framework examples
Face detection

• Accurate face detection enables camera auto-focus and auto-exposure
• Enables selective high-fidelity encoding of key regions of interest (and bit-rate savings on background)

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, integral image, face detect, blob detector, auto focus, auto exposure]
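The "integral image" stage in this pipeline is a standard building block for fast box-feature evaluation in face detectors. A minimal sketch of the data structure, using plain Python lists for a grayscale image (the function names are ours):

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over all rows < y and columns < x.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Once the table is built, the sum over any rectangle costs the same four reads regardless of its size, which is what makes sliding-window detection affordable.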

PowerVR imaging framework examples
Beautification

• Skin smoothing, blemish and wrinkle removal
• Popular feature in image and video sharing apps

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, integral image, face detect, blob detector, auto focus, auto exposure, bilateral filter, Poisson cloning, H264 encode]
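The bilateral filter named in this pipeline smooths flat regions such as skin while keeping strong edges. Below is a minimal, unoptimized sketch on a grayscale image held in nested lists; the default radius and sigma values are illustrative assumptions, not parameters from the talk.

```python
import math


def bilateral_filter(img, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Edge-preserving smoothing of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = img[y][x]
            total = norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        v = img[yy][xx]
                        # Weight falls off with distance in space...
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                        # ...and with difference in intensity.
                        wr = math.exp(-((v - centre) ** 2) / (2 * sigma_range ** 2))
                        total += ws * wr * v
                        norm += ws * wr
            out[y][x] = total / norm
    return out


edge = [[0, 0, 200, 200]] * 4                 # a hard vertical edge
smoothed = bilateral_filter(edge, sigma_range=10.0)
print(round(smoothed[0][1]), round(smoothed[0][2]))  # 0 200: the edge survives
```

Every output pixel is an independent weighted average over a small window, so the filter parallelizes naturally across GPU pipes.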

PowerVR imaging framework examples
Background removal

• Immersive video conferencing
• Reduced bandwidth
• Live presentations with presenter overlaid onto PowerPoint
• Gaze correction preserves eye contact

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; inputs: RGB sensor, ToF/depth sensor; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, generate depth map, HoG, integral image, person detect, background subtraction, blob detector, auto focus, auto exposure, H264 encode]
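As a heavily simplified illustration of the background subtraction stage, the sketch below keeps only pixels whose depth is within a cut-off distance. It assumes a depth map already aligned with the RGB image and uses a fixed threshold; the pipeline above also involves person detection, which this sketch ignores. The function name and parameters are hypothetical.

```python
def remove_background(rgb, depth, max_depth_m, fill=(0, 0, 0)):
    """Replace every pixel farther than max_depth_m with `fill`.

    rgb:   list of rows of (r, g, b) tuples
    depth: list of rows of distances in metres, same shape as rgb
    """
    result = []
    for rgb_row, depth_row in zip(rgb, depth):
        result.append([
            pixel if d <= max_depth_m else fill
            for pixel, d in zip(rgb_row, depth_row)
        ])
    return result


rgb = [[(10, 20, 30), (40, 50, 60)],
       [(70, 80, 90), (15, 25, 35)]]
depth = [[0.8, 3.5],
         [1.2, 4.0]]
print(remove_background(rgb, depth, max_depth_m=1.5))
# [[(10, 20, 30), (0, 0, 0)], [(70, 80, 90), (0, 0, 0)]]
```

Sending only the foreground pixels is one way the "reduced bandwidth" benefit listed above could arise, since the blanked background compresses very well.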

PowerVR imaging framework examples
Augmented reality

• Perspective-correct positioning of objects on screen
• Mimicking light sources to improve the realism of the object in the scene
• Improved online shopping experience

[Image comparison: without GPU compute; with GPU compute]

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, light source calculation, 3D object selection, apply lighting to 3D object, compose camera stream with 3D object]
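Once a light direction has been estimated from the camera stream, the "apply lighting to 3D object" stage can be as simple as diffuse shading. The sketch below uses the Lambertian model, which is our assumption for illustration; the talk does not state which lighting model is used.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, light_dir, light_rgb, albedo_rgb):
    """Diffuse colour of a surface point under one directional light.

    normal:    surface normal (any length)
    light_dir: direction from the surface towards the light (any length)
    """
    n, l = normalize(normal), normalize(light_dir)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))  # no light from behind
    return tuple(a * c * n_dot_l for a, c in zip(albedo_rgb, light_rgb))


# Surface facing the light head-on receives the full albedo.
print(lambert((0, 0, 1), (0, 0, 2), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)))
# (0.5, 0.5, 0.5)
```

Feeding the estimated real-world light direction into `light_dir` is what lets a virtual object appear lit consistently with its surroundings.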

PowerVR imaging framework examples
Security cameras

• Mass transit monitoring and security
• Identify known persons in crowds
• Identify suspicious objects
• Reduced CCTV operator costs

[SoC block diagram, with GPU compute; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, blob detector, integral image, face detect, HoG, object detect, track, face and object recognition, H264 encode]

PowerVR imaging framework examples
Retail analytics

• Providing supermarkets retail information akin to retail websites
• Track people in store, demographics (age, gender, ethnicity), dwell time
• Feedback to retailers and advertisers
• Upsell via in-store digital signage

[SoC block diagram, with GPU compute; inputs: RGB sensor, ToF/depth sensor; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, generate depth map, blob detector, integral image, HoG, person and object detect, track, face and object recognition, H264 encode]

PowerVR imaging framework examples
Automotive

Applications, with GPU compute:
• Pedestrian detection
• Lane departure warning
• Traffic sign recognition
• Night vision
• Intelligent high-beam control
• Adaptive cruise control
• Headway monitoring and collision avoidance
• Surround view

Sensor inputs: camera, radar, infrared / thermal, ultrasonic, ToF / depth

[SoC block diagram; pipeline stages: sensor acquisition, defective pixel correction, auto white balance, lens shading, fish-eye, image segmentation, HoG calculation, RANSAC detection, turn analysis, spatial post-processing, panoramic stitching, Markov random field, bilateral filter, object recognition, Kalman smoothing, headlight detection, bilateral control algorithm, heuristics]
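Of the algorithms listed for automotive, Kalman-based smoothing is compact enough to sketch. The example below is a one-dimensional forward Kalman filter with a constant-value model, not a full fixed-interval smoother, and the variances are illustrative assumptions. It could stand in for smoothing a noisy per-frame measurement such as the distance to a tracked vehicle.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=1.0, x0=0.0, p0=1.0):
    """Filter a sequence of noisy scalar measurements.

    Model: the true value is roughly constant (random walk with variance
    process_var per step); each measurement has noise variance meas_var.
    Returns the estimate after each measurement.
    """
    x, p = x0, p0                 # state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var          # predict: uncertainty grows a little
        k = p / (p + meas_var)    # Kalman gain, between 0 and 1
        x += k * (z - x)          # update: move towards the measurement
        p *= (1.0 - k)            # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


noisy = [4.0, 6.0] * 50           # alternates around a true value of 5
smooth = kalman_1d(noisy, x0=5.0)
print(min(smooth), max(smooth))   # stays much closer to 5 than the input does
```

The filter itself is cheap and sequential, so in a pipeline like the one above it would more likely run on the CPU, with the GPU handling the per-pixel stages that feed it.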

Integration models
Integrating the PowerVR imaging framework into Android

[Android software stack diagram, by layer]
• Application Software: Camera Stock App, Media Player Stock App
• Android SDK: Camera Services API, Media Player API
• Libraries: PowerVR Imaging Framework, Zero-copy, Compute, OCL Driver UM, OpenGL ES UM, OpenVX
• Hardware Abstraction Layer: Camera HAL, Video HAL
• Linux Kernel: Camera Driver, Video Driver, Graphics Driver KM
• Hardware (System-on-Chip): heterogeneous compute across CPU (compute), GPU (graphics and compute), VDE (encode, decode) and ISP (camera)

Integration models
Integrating the PowerVR imaging framework into Android

[Android software stack diagram with an added component: Differentiated Multimedia Application]

Differentiated Multimedia Application
• New features enabled by GPU compute
• Higher performance
• Lower power

Integration models
Integrating the PowerVR imaging framework into Android

[Android software stack diagram: Application Software; Android SDK; Libraries; Hardware Abstraction Layer (Camera HAL, Video HAL); Linux Kernel; Hardware (System-on-Chip) with heterogeneous compute across CPU, GPU, VDE and ISP]

Differentiated Multimedia Application: actual OEM product
PowerVR Vision Demo: fish-eye effect, beautification, HDR mode

How we can engage

Silicon Vendor
• License a PowerVR GPU for your next product
• Incorporate the PowerVR imaging framework into your Android BSP
• If you provide custom camera software to your OEMs, incorporate the framework into this software as well

OEM
• Select a PowerVR-based SoC for your next product
• Identify premium features you wish to support using GPU compute
• Develop the required software algorithms yourself, or license them from Imagination and its ecosystem partners

Ecosystem Partner
• Obtain an OEM platform that supports our PowerVR imaging framework
• Prototype your computer vision and photography algorithms using our OpenCL SDK

[Diagram labels: PowerVR based SoC; BSP; PVR imaging framework; optional camera platform; camera / MM app; OEM product; demo of new feature; SoC]

How we can engage
Development boards

• Many development boards and OEM products are now available in the market with a PowerVR Rogue GPU and OpenCL driver
• Most platforms now support PowerVR imaging framework extensions

Platform              SoC                GPU
Meizu MX4             MediaTek MT6595    Rogue G6200
Merrii OptimusBoard   AllWinner A80      Rogue G6230
Dell Venue 7/8        Intel Atom Z3480   Rogue G6430
ASUS ZenFone 2        Intel Atom Z3460   Rogue G6430

Conclusion

• Imagination’s PowerVR imaging framework for Android has been successfully deployed by OEMs in mobile and tablet products, enabling new camera and multimedia applications
• The framework’s zero-copy buffer management extensions were crucial to building apps that efficiently leverage all available hardware resources (ISP, CPU, GPU compute and VDE)
• The resulting application software minimizes SoC bandwidth while delivering high performance and low power consumption

Game-changing technology from Imagination
www.imgtec.com/gpucompute
www.imgtec.com