Image and Vision Processing on Tegra K1 | GTC 2014 | on-demand.gputechconf.com/gtc/2014/presentations/S4873-image-vision-processing-tegra-k...

IMAGE AND VISION PROCESSING ON TEGRA K1

Elif Albuz

IMAGE AND VISION USE CASES

Driven by using camera as a sensor

Augmented Reality

Face, Body and Gesture Tracking

Computational Photography and Videography

3D Scene/Object Reconstruction

MOBILE VISION COMPUTING

[Chart axes: Processing Demands vs. Time]

Photography

— Input = 2D Camera

— Processors = ISP + CPU

— Product = Static Images

Computational Photography

— Input = MEMS + 2D Camera

— Processors = ISP + CPU + GPU

— Result = Enhance Images and Videos

Mobile Vision Computing

— Input = MEMS + Depth Camera

— Processors = ISP + CPU + GPU

— Result = Data for advanced user interface and environment modeling

VISION COMPUTING APP CATEGORIES

3D Reconstruction (constructs 3D geometry)

— Object Reconstruction

— Scene Reconstruction

— Body Modeling

— Facial Modeling

Tracking (constructs positions and motions)

— Environmental Feature Tracking

— Face and gesture tracking

— Indoor/Outdoor Positional Tracking

— Body Tracking

[Image captions: 3D Grid with SFM; Ped/car detection & tracking]

AUGMENTED REALITY

High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence

P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria

HYPER-REALISM

Ray-tracing and light-field calculations running today on CUDA laptop PC – 50+ Watts

Ongoing research to use depth cameras to reconstruct global illumination model in real-time

Need on mobile devices at 100x less power = 0.5W

SIMULTANEOUS LOCALIZATION AND MAPPING

WHAT IS NEEDED?

Accelerated image & vision processing

— Tegra K1: CPU, GPU, ISP

Camera flexibility and handling of new sensor types

— Android HAL V3, V4L

Low latency routing of image streams to GPU

— EGLStreams

Integrated handling of various sensors and camera

— Global Time Stamps

Effective image & vision programming frameworks

— VisionWorks

TEGRA K1 PLATFORM

Desktop GPGPU features and tools on mobile

GPGPU Compute

CUDA Tools and Libraries

Advanced Graphics

THREE GENERATIONS OF REFINEMENT

Tesla - 2006

Fermi - 2010

Kepler - 2012

TEGRA K1: A MAJOR LEAP FORWARD FOR MOBILE & EMBEDDED APPLICATIONS

KEPLER GPU, 192 CORES

CUDA

12GB/S BANDWIDTH

VIDEO IMAGE COMPOSITOR (VIC)

[Block diagram components]

— HD Video Processor: 1080p24/30 Video Decode, 1080p24/30 Video Encode, H.264 | MPEG4 | VC1 | MPEG2 VP8

— Kepler GeForce® GPU w/CUDA: OpenGL-ES nextgen, 192 Stream Processors

— 2D Graphics/Scaling

— DAP x5 (I2S/TDM)

— HDMI, eDP/LVDS

— ARM7 Audio Processor

— Image Processor: 25MP Sensor Support, ISP, 1080p60, Enhanced JPEG Engine

— PCIe* G2 x4 + x1

— CSI x4 + x4

— SATA2 x1, USB 2.0 x3

— Security Engine

— Display x2

— NOR Flash

— UART x4, I2C x5

— DDR3 Ctlr 64b, 800+ MHz

— SPI x4, SDIO/MMC x4

— USB 3.0* x2

— Quad Cortex-A15: 4x Cores (1+ GHz), NEON SIMD, 2 MB L2 (Shared), ARM TrustZone

— Shadow LP C-A15 CPU

— Package: 28 nm HPM, 23x23mm, 0.7mm pitch HS-FCBGA

CPU

Quad-core A15, NEON

Unified memory, access from CPU and GPU

low-power Shadow core



4 + 1 cores (quad-core A15 plus low-power shadow core)

IMAGE PROCESSOR

2 ISPs, independently programmable

25Mp camera support (up to 250Mp)

1.2Gp throughput

GPGPU interoperability



VIDEO ENCODE/DECODE

2x 1920x1080@30fps

CUVID/CUVENC video encode/decode interface

Motion-estimation-only mode



HD DEC/ENC PROCESSOR

VIDEO ENCODER

Dedicated hardware accelerator

[Diagram: the Encoder takes the current frame and a reference frame, and produces compressed video (H.264, ..) and motion vectors/track info]
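To make the idea of motion vectors concrete, here is a minimal sketch of block-matching motion estimation in plain Python. The slides do not describe the algorithm used by the hardware encoder; full-search matching with a sum-of-absolute-differences (SAD) cost is a textbook stand-in, and the helper names (`sad`, `motion_vector`, `make_frame`) and the test frames are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n-by-n blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def motion_vector(cur, ref, cx, cy, n=4, search=2):
    """Full-search block matching: return the (dx, dy) with the lowest SAD.

    (dx, dy) points from the block at (cx, cy) in `cur` to its best match in `ref`.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def make_frame(w, h, x0, y0, n=4):
    """Hypothetical test frame: a textured n-by-n patch on a black background."""
    frame = [[0] * w for _ in range(h)]
    for j in range(n):
        for i in range(n):
            frame[y0 + j][x0 + i] = 10 + 4 * j + i
    return frame


ref_frame = make_frame(12, 12, 3, 4)   # patch at (3, 4) in the reference frame
cur_frame = make_frame(12, 12, 5, 5)   # same patch moved to (5, 5)
mv = motion_vector(cur_frame, ref_frame, 5, 5)
```

A vision pipeline could treat such per-block vectors as coarse tracking information, which is presumably what a motion-estimation-only mode is for.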

KEPLER ARCHITECTURE

192 CUDA cores, SM3.2

ISA-compatible with GeForce, Quadro, Tesla

64KB L1 cache and shared memory

128KB L2 cache

128KB register file







CAMERA IMAGE PROCESSING

Camera ISP (Image Signal Processor)

— Little or no programmability

— Data flows through compact hardware pipe

— Scan-line-based - no global memory

— Best perf/watt

[Figure labels: ~760 math ops; ~42K vals = 670Kb; 300MHz, ~250Gops]
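The three figures on this slide can be related with a quick back-of-the-envelope calculation. This is only a sketch under stated assumptions: that the ~760 math ops are per pixel, that the ISP handles one pixel per clock at 300MHz, and that each buffered value is 16 bits wide. None of these is stated on the slide.

```python
ops_per_pixel = 760        # "~760 math Ops" (assumed to be per pixel)
pixel_rate_hz = 300e6      # "300MHz" (assumed: one pixel per clock)
values_buffered = 42_000   # "~42K vals"
bits_per_value = 16        # assumption: 42K x 16 bits is close to the quoted 670Kb

gops = ops_per_pixel * pixel_rate_hz / 1e9
buffer_kbits = values_buffered * bits_per_value / 1e3
print(f"{gops:.0f} Gops, {buffer_kbits:.0f} Kb")
```

Under these assumptions the results land in the same range as the slide's rounded ~250Gops and 670Kb. That is consistent with a scan-line pipeline that keeps only a small window of values on chip instead of whole frames.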

VISUAL SENSOR REVOLUTION

Single RGB sensors just the start of mobile visual revolution

— IR sensors – LEAP Motion, eye-trackers

— Active illumination depth sensors – TOF and structured light

Multi-sensors: Stereo pairs -> Plenoptic array -> Depth cameras

— Stereo pair can enable object scaling and enhanced depth extraction

— Plenoptic Field processing needs FFTs and ray-casting

Hybrid visual sensing solutions

— Different sensors mixed for different distances and lighting conditions

[Image captions: Dual Camera (LG Electronics); Plenoptic Array (Pelican Imaging); Capri Structured Light 3D Camera (PrimeSense)]

ANDROID CAMERA HAL V3

Camera HAL v1 focused on simplifying basic camera apps

— Difficult or impossible to do much else

— New features require proprietary driver extensions

— Extensions not portable - restricted growth of third party app ecosystem

Camera HAL v3 is a fundamentally different API

— Flexible primitives for building sophisticated use-cases

— Interface is clean and easily extensible

— Apps can have more control, and more responsibility

Enables sophisticated camera applications

Faster time to market and higher quality

KHRONOS CAMERA API

Specification available 2015

Also FCAM-Based

— Will be available for any OS

— FCAM running on NVIDIA Linux today

No global state

— State travels with image requests

— Every stage in the pipeline may have different state

— Enables fast, deterministic state changes

Synchronize devices

— Lens, flash, sound capture, gyro…

— Devices can schedule Actions

— E.g. to be triggered on exposure change
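The "no global state" idea above can be sketched in a few lines of Python. This is not the Khronos or FCAM API. The `CaptureRequest` type, its fields and the stage functions are hypothetical, and serve only to show settings travelling with each request instead of living in a shared device state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaptureRequest:
    """Hypothetical per-frame settings; the field names are illustrative only."""
    frame_id: int
    exposure_us: int
    gain: float


def sensor_stage(req):
    # Each stage reads settings from the request it is handed, never from a global.
    return {"frame_id": req.frame_id, "exposure_us": req.exposure_us}


def isp_stage(raw, req):
    return {**raw, "gain": req.gain}


def run_pipeline(requests):
    """Process requests in order; each frame carries its own state end to end."""
    return [isp_stage(sensor_stage(r), r) for r in requests]


base = CaptureRequest(frame_id=0, exposure_us=10_000, gain=1.0)
# An alternating long/short exposure burst: settings change on every frame.
burst = [replace(base, frame_id=i, exposure_us=2_000 if i % 2 else 10_000)
         for i in range(4)]
results = run_pipeline(burst)
```

Because no stage consults shared mutable settings, a change made for one frame cannot leak into frames already in flight. That is one way to read "fast, deterministic state changes".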

KHRONOS CAMERA API REQUIREMENTS

Application control over ISP processing (including 3A)

— Including multiple, re-entrant ISPs

Control multiple sensors with synch and alignment

— E.g. Stereo pairs, Plenoptic arrays, TOF/structured light depth cameras

Enhanced per frame detailed control

— Format flexibility, Region of Interest (ROI) selection

Global timing & synchronization

— E.g. Between cameras and MEMS sensors

Flexible processing/streaming

— Multiple input and output streams

— RAW, Bayer or YUV Processing

— Streaming of rows (not just frames)

Enable new camera functionality not available on current platforms and align with future platform directions for easy adoption
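As an illustration of why global timing between cameras and MEMS sensors matters, the sketch below pairs each camera frame with the nearest gyroscope sample, assuming both carry timestamps from one shared clock. The timestamps and the `nearest_sample` helper are made up for the example and are not part of any API named in the slides.

```python
from bisect import bisect_left


def nearest_sample(timestamps, t):
    """Index of the entry in sorted `timestamps` closest to time t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


# Hypothetical timestamps in microseconds on one shared clock.
gyro_ts = [0, 5_000, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000]
frame_ts = [1_000, 17_600, 34_300]

matches = [gyro_ts[nearest_sample(gyro_ts, t)] for t in frame_ts]
```

Without a common time base the comparison inside `nearest_sample` would be meaningless, since the two devices would be counting from unrelated origins.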

TEGRA K1 CAMERA DATAFLOW

Flexible routing of sensor data to ISPs and unified memory

Enables sensor processing by any combination of GPUs, CPUs and ISPs

[Block diagram components: Camera One, Camera Two, Camera Routing, ISP-A, ISP-B, Unified Memory, Kepler GPGPU, 4x CPU]

COMPUTATIONAL PHOTOGRAPHY

Camera ISP (Image Signal Processor)

— Little or no programmability

CPU

— Single processor or NEON SIMD - running fast

— Makes heavy use of general memory

— Non-optimal performance and power

GPU

— Programmable and flexible

— Many-way parallelism - run at lower frequency

— Efficient image caching close to processors

— BUT cycles frames in and out of memory







EGL 1.5 RELEASED

EGL 1.5 brings functionality from multiple extensions into core

— Increased reliability and portability

EGLImages

— Sharing textures and renderbuffers

Context Robustness

— Defending against malicious code

EGLSync objects

— Improved OpenGL/OpenCL interop

Platform extensions

— Standardized interactions for multiple OS e.g. Android and 64-bit platforms

sRGB colorspace rendering

[Diagram labels: Applications; OS and Display Platforms]

Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization

API Interop: EGL provides efficient transfer of data and events between Khronos APIs







HOW MANY SENSORS ARE IN A SMARTPHONE?

Light

Proximity

2 cameras

3 microphones

Touch

Position

— GPS

— WiFi (fingerprint)

— Cellular (tri-lateration)

— NFC, Bluetooth (beacons)

Accelerometer

Magnetometer

Gyroscope

Pressure

Temperature

Humidity

Total: 19

STREAMINPUT SENSOR ABSTRACTION API

Apps Need Sophisticated Access to Sensor Data

— Without coding to specific sensor hardware

Apps request semantic sensor information

— StreamInput defines possible requests, e.g.

— Read Physical or Virtual Sensors e.g. “Game Quaternion”

— Context detection e.g. “Am I in an elevator?”

StreamInput processing graph provides optimized sensor data stream

— High-value, smart sensor fusion middleware can connect to apps in a portable way

— Apps can gain ‘magical’ situational awareness

Advanced Sensors Everywhere

— Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors

Sensor Discoverability

Sensor Code Portability








MOBILE & EMBEDDED DEVELOPERS NEED HELP!

Control, coordinate and synchronize a diverse array of mobile sensors

Write maintainable code for a heterogeneous mix of CPUs, GPUs and DSPs

Handle a diverse selection of emerging depth camera technologies

Create fluid 60Hz experiences on battery-powered mobile devices

Leverage dedicated vision hardware for minimized power

Write code that is deployable across multiple devices, platforms and OS

DESKTOP TO MOBILE DEVELOPMENT

Image & Vision Processing code development starts at desktop PC

Tegra K1 enables easy migration through libraries and tools available across platforms

TEGRA K1 CUDA DEVELOPMENT

CUDA-Aware Editor

Automated CPU to GPU code refactoring

Semantic highlighting of CUDA code

Integrated code samples & docs

Nsight Debugger

Simultaneous debugging of CPU and GPU

Inspect variables across CUDA threads

Use breakpoints & single-step debugging

Nsight Profiler

Quickly identifies performance issues

Integrated expert system

Source line correlation

Cross platform development

Native memcheck, GDB, nvprof

CUDA LIBRARIES

VisionWorks, OpenCV

NPP, CUFFT, CUBLAS, CUDA Math Lib

NPP LIBRARY (CUDA)

Data exchange & initialization

— Set, Convert, CopyConstBorder, Copy, Transpose, SwapChannels

Arithmetic & Logical Ops

— Add, Sub, Mul, Div, AbsDiff

Threshold & Compare

— Threshold, Compare

Color Conversion

— RGB To YCbCr (& vice versa), ColorTwist, LUT_Linear

JPEG

— DCTQuantInv/Fwd, QuantizationTable

Filter Functions

— FilterBox, Row, Column, Max, Min, Median, Dilate, Erode, SumWindowColumn/Row

Geometry Transforms

— Mirror, WarpAffine / Back / Quad, WarpPerspective / Back / Quad, Resize

Statistics

— Mean, StdDev, NormDiff, MinMax, Histogram, SqrIntegral, RectStdDev

Computer Vision

— ApplyHaarClassifier

— GraphCuts

OPENCV LIBRARY

Initially developed by Intel for single-core x86 CPUs

— Version 2.4.5 >900 functions (x the datatypes)

— OpenCV4Tegra - Accelerated CUDA+NEON+GLSL+TBB multithreading

Image processing

— General Image Processing, Segmentation, Machine Learning, Detection, Image Pyramids, Transforms, Fitting

Video, Stereo, and 3D

— Camera Calibration, Features, Depth Maps, Optical Flow, Inpainting, Tracking

OPENCV-GPU VALUE ADD ON LOGAN

[Bar chart: speedup (axis 0 to 7) for the core, filter, imgproc and objdetect function categories. Series: Jetson Kepler GPU / Quad-core A15; Public OCV for Mobile/Embedded]

Average speedup for different function categories with Logan GPU compared to public source code.

VISIONWORKS MOTIVATION

Advanced Silicon + VisionWorks = Widespread vision processing in embedded, mobile and automotive devices and applications

VisionWorks

— Simplify vision programming

— Fully optimized and accelerated

— Modular and Extensible

VISIONWORKS

ADAS – Advanced Driver Assistance Systems

Computational Photography

Augmented Reality

Robotics

Power Efficient Computer Vision

Powered with CUDA

Supported on Tegra K1 Linux and Android

ACCELERATING

• Advanced Driver Assistance

• Computational Photography

• Augmented Reality

• Robotics

• Deep Learning and more…

Version 0.10 is available for registered partners!

VISIONWORKS SOFTWARE STACK

[Stack diagram components: Application Code; Sample Pipelines (Object Detection, SfM, 3rd Party Pipelines, …); VisionWorks Primitives (Classifier, Corner Detection, 3rd Party Primitives, …); Framework; CUDA; Tegra K1]

NVIDIA supplied vision primitives using CUDA and Tegra processing resources

Customers and developers can create their own primitives e.g. using CUDA

OpenVX enables power efficient and flexible chaining of primitives

NVIDIA provides sample pipelines for common use cases

Applications use combination of direct primitives, the OpenVX framework and supplied pipelines

OPENVX – POWER EFFICIENT VISION ACCELERATION

Khronos, open, cross-vendor vision API

— Focus on mobile and embedded systems

Foundational API for vision acceleration

— Useful for middleware or by applications

— Enables diverse efficient implementations

Complementary to OpenCV

— Which is great for prototyping

[Diagram components: open source sample implementation; hardware vendor implementations; OpenCV open source library; VisionWorks; Sample Pipelines; Application]

OPENVX GRAPHS – THE KEY TO EFFICIENCY

Directed graphs for processing power and efficiency

— Each Node can be implemented in software or accelerated hardware

— Nodes may be fused to eliminate memory transfers

— Processing can be tiled to keep data entirely in local memory/cache

EGLStreams route data from camera and to application

Can extend with “VisionWorks” nodes using CUDA

[Diagram: Example OpenVX Graph, showing Native Camera Control, OpenVX Nodes, VisionWorks Nodes and the Application]
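The efficiency argument for graphs can be illustrated with a toy comparison in plain Python. This does not use the OpenVX API. The two per-pixel operations and both runner functions are hypothetical, and "fusion" here simply means applying the whole chain to each pixel in one pass, so that no full-frame intermediate is written between nodes.

```python
def brighten(p):
    return min(p + 40, 255)


def threshold(p):
    return 255 if p >= 128 else 0


def run_memory_based(image, ops):
    """Each operation reads a whole image and writes a whole new one."""
    buffers_written = 0
    for op in ops:
        image = [[op(p) for p in row] for row in image]
        buffers_written += 1      # one full-frame buffer per operation
    return image, buffers_written


def run_fused_graph(image, ops):
    """With the whole chain known up front, apply all nodes per pixel in one pass."""
    def fused(p):
        for op in ops:
            p = op(p)
        return p
    return [[fused(p) for p in row] for row in image], 1


frame = [[0, 90, 100], [127, 200, 250]]
chain = [brighten, threshold]
out_a, buffers_a = run_memory_based(frame, chain)
out_b, buffers_b = run_fused_graph(frame, chain)
```

Both runners produce the same image, but the fused version writes one output buffer where the memory-based version writes one per operation. A real implementation would apply the same idea to tiles held in local memory or cache.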

OPENVX AND OPENCV ARE COMPLEMENTARY

Governance

— OpenCV: Community driven open source with no formal specification

— OpenVX: Defined and implemented by Khronos

Portability

— OpenCV: APIs can vary depending on processor

— OpenVX: Tegra K1 mobile platforms

Scope

— OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces

— OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API

Efficiency

— OpenCV: Memory-based architecture; each operation reads and writes memory

— OpenVX: Graph-based execution; optimizable computation, data transfer

Use Case

— OpenCV: Rapid experimentation

— OpenVX: Production development & deployment

OPENVX 1.0 FUNCTION OVERVIEW

Core data structures

— Images and Image Pyramids

— Processing Graphs, Kernels, Parameters

Image Processing

— Arithmetic, Logical, and statistical operations

— Multichannel Color and BitDepth Extraction and Conversion

— 2D Filtering and Morphological operations

— Image Resizing and Warping

Core Computer Vision

— Pyramid computation

— Integral Image computation

Feature Extraction and Tracking

— Histogram Computation and Equalization

— Canny Edge Detection

— Harris and FAST Corner detection

— Sparse Optical Flow
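Integral image computation, listed above under Core Computer Vision, is small enough to sketch in plain Python. This is the standard summed-area table, not OpenVX code. The function names and the padded table layout are choices made for the example.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in [x0, x1) x [y0, y1) using four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)
```

Once the table is built, any rectangular sum costs four lookups regardless of the rectangle's size, which is why box filters and Haar-style features are commonly built on it.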

OpenVX Specification Evolution

OpenVX 1.0 defines framework for creating, managing and executing graphs

Focused set of widely used functions that are readily accelerated

Implementers can add functions as extensions

Widely used extensions adopted into future versions of the core

VISIONWORKS PRIMITIVES – JAN 2014

Sobel

Convolve

Bilateral Filter

Integral Image

Integral Histogram

Corner Harris

Corner FAST

Image Pyramid

Optical Flow PyrLK

Optical Flow Farneback

Warp Perspective

Hough Lines

Fast NLM Denoising

Stereo Block Matching

IME (Iterative Motion Estimation)

HOG (Histogram of Oriented Gradients)

Soft Cascade Detector

Object Tracker

TLD Object Tracker

SLAM

Path Estimator

MedianFlow Estimator

OPENVX AND CUDA ARE COMPLEMENTARY

Use Case

— CUDA: GPGPU Programming

— OpenVX: Domain targeted Vision processing

Architecture

— CUDA: Language-based

— OpenVX: Library-based - no separate compiler required

Target Hardware

— CUDA: ‘Exposed’ architected memory model – programmer manages memory

— OpenVX: Abstracted node and memory model - diverse implementations can be optimized for power and performance

Precision

— CUDA: Full IEEE floating point mandated

— OpenVX: Minimal floating point requirements – optimized for vision operators

Ease of Use

— CUDA: General-purpose math and other libraries

— OpenVX: Fully implemented vision operators and framework ‘out of the box’

Use CUDA to build new VisionWorks OpenVX Nodes

VISIONWORKS SAMPLE PIPELINES (V0.10)

Structure From Motion/SLAM

Pedestrian Detection

Vehicle detection

Object tracking

Dense optical flow

Active Shape Model

Denoising

VISIONWORKS – LOOKING FORWARD

Enable multi-camera applications

3D sensors

Conformance with OpenVX once specification finalized

TEGRA K1 DEVELOPMENT PLATFORMS

JETSON X3 (TK1 PRO)

GigE, USB 3.0, HDMI, CAN bus

running Vibrante Linux

AUTOMOTIVE GRADE

Coming to Android K1 & Other Linux Devices soon..

QUESTIONS?

BACKUP
