accelerating mobile augmented reality - nvidia · 2012-12-10 · accelerating mobile augmented...
TRANSCRIPT
1
© 2012 NVIDIA - Page 1
Accelerating Mobile Augmented RealityNeil TrevettVP Mobile Content, NVIDIAPresident, Khronos Group
© 2012 NVIDIA - Page 2
Topics
Why mobile devices will be ideal for Augmented Reality
The challenges for compelling mobile Augmented Reality
Trends in mobile SOCs acceleration and associated APIs
Emerging trends and directions in the future of mobile AR
How to start developing mobile AR on Tegra devices
2
© 2012 NVIDIA - Page 3
0
500
1000
1500
2000
Million Units
Computing Device TAMSmartphones
Tablets
Netbooks
Notebooks
Desktops
Personal Computing Redefined
Source: IDC, Gartner, Morgan Stanley
© 2012 NVIDIA - Page 4
Global Smartphone Market Share
Windows Mobile/Phone
3
© 2012 NVIDIA - Page 5
Vision-based Augmented Reality
Camera video stream sent to the compositor
3D AugmentationRendering
3D augmentationscomposited with video stream
CameraTracking
GPS, accelerometer,
gyro and camera images used to track the
camera’s location and orientation
Camera-to-scene transform locks the 3D rendering to the real world
© 2012 NVIDIA - Page 6
AR = Input AND Output Processing
Augmented Reality is uniquely challenging
Needs sophisticated INPUT AND OUTPUT processing
In perfect harmony
Generating 3D Graphical
Augmentations
Sensing and tracking the context and the scene around the user
Detecting user gestures and actions
Compositing real and synthetic elements to delight and inform
Augmented RealityApplication
4
© 2012 NVIDIA - Page 7
How Many Sensors are in a Smartphone?
LightProximity2 cameras3 microphones (ultrasound)TouchPosition
GPSWiFi (fingerprint)Cellular (tri-lateration)NFC, Bluetooth (beacons)
AccelerometerMagnetometerGyroscopePressureTemperatureHumidity
7
19
!
© 2012 NVIDIA - Page 8
The Vision of Wearable Computing - Tomorrow
Google Glass concept videoLooks Great! BUT…
… Google Glass hardware does not provide full FOV augmentations
… Not using vision tracking,just GPS
But wearable computers will be essential to consumer adoption of Augmented Reality..
Courtesy Googlehttp://www.youtube.com/watch?v=9c6W4CCU9M4
5
© 2012 NVIDIA - Page 9
State-of-the-art Mobile Vision-Based AR - Today
In the meantime a lot of development is happening with todays devices..
Combining GPS, accelerometer, gyro and camera tracking
Lots of current research into best feature tracking algorithms – all of them very compute intensive
Beachhead industrial applications could use todays devices…
Courtesy Metaiohttp://www.youtube.com/watch?v=xw3M-TNOo44&feature=related
© 2012 NVIDIA - Page 10
Accelerating AR to Meet User Expectations
Ingredients for great mobile AR…
Cameras with advanced controls
Lots of secondary sensors – GPS, sensor, gyro etc.
High performance image and vision processing
Fast 3D processing
Mobile is an enabling platform for Augmented Reality
Mobile SOC and sensor capabilities are expanding quickly
But we need mobile AR to be 60Hz buttery smooth AND low power
Power is now the main challenge to increasing quality of the AR experience
SOC =‘System On Chip’
Complete compute system minus memory and some peripherals
6
© 2012 NVIDIA - Page 11
The World’s First Mobile Quad Core,
with 5th Companion Core for Low PowerTegra 3
CPUCPUQuad Core, with 5Quad Core, with 5thth Companion CoreCompanion Core
—— Up to 1.4GHz Single Core, 1.3GHz Quad CoreUp to 1.4GHz Single Core, 1.3GHz Quad Core
GPUGPUUp to 3x Higher GPU PerformanceUp to 3x Higher GPU Performance
—— 12 Core12 Core GeForce GPUGeForce GPU
VIDEOVIDEOBluBlu--Ray Quality VideoRay Quality Video
—— 1080p1080p High Profile @ 40MbpsHigh Profile @ 40Mbps
AUDIOAUDIO HD AudioHD Audio, 7.1 channel surround, 7.1 channel surround
© 2012 NVIDIA - Page 12
Mobile SOC Performance Increases
1
100
CPU/G
PU A
GGREGATE P
ERFORMANCE
2012 2014
WAYNE
2013
2010
2011
TEGRA 2
Tegra 3 (KAL-EL)5x
LOGAN
10
Core 2 Duo
STARK
10x
50x
75x
Core i5
HTC One X+
Google Nexus 7
25x perf increase in next three
years
7
© 2012 NVIDIA - Page 13
Power is the New Design Limit
The Process Fairy keeps bringing more transistors
Transistors are getting cheaper
The End of Voltage Scaling
The Process Fairy isn’t helping as much on power as in the past
In the Good Old DaysLeakage was not important, and voltage
scaled with feature size
L’ = L/2V’ = V/2E’ = CV2 = E/8f’ = 2fD’ = 1/L2 = 4DP’ = P
Halve L and get 4x the transistors and 8x the capability for the same power
The New RealityLeakage has limited threshold voltage,
largely ending voltage scaling
L’ = L/2V’ = ~VE’ = CV2 = E/2f’ = ~2fD’ = 1/L2 = 4DP’ = 4P
Halve L and get 4x the transistors and 8x the capability for
4x the power!!
© 2012 NVIDIA - Page 14
Mobile Thermal Design Point
2-4W4-7W
6-10W30-90W
4-5” Screen takes 250-500mW
4-5” Screen takes 250-500mW
7” Screen takes 1W7” Screen takes 1W
10” Screen takes 1-2WResolution makes a difference! The iPad3 screen takes up to
8W
10” Screen takes 1-2WResolution makes a difference! The iPad3 screen takes up to
8W
Leakage current in latest silicon geometries means that Moore’s Law no longer delivers more performance without increasing power
Typical max system power levels before thermal failureEven as battery technology improves - these thermal limits remain
8
© 2012 NVIDIA - Page 15
How to Save Power?
Much more expensive toMOVE data than COMPUTE data
Energy efficiency must now be key metric during silicon AND software design
Awareness of where data lives, where
computation happens, how is it scheduled
Need to use hardware acceleration
Lots of processing in parallel
Efficient caching and memory usage
Reduces data movement
32-bit Integer Add1pJ
32-bit Float Operation7pJ
32-bit Register Write0.5pJ
Send 32-bits 2mm24pJ
Send 32-bits Off-chip50pJ
For 40nm,1V process
Write 32-bits to Memory600pJ
© 2012 NVIDIA - Page 16
Example - Typical Camera ISP
Camera ISP (Image Signal Processor) has little or no programmability
Scan-line-based, Data flows through compact hardware pipe
No global memory used to minimize power
BUT… computational photography apps now want to mix non-programmable ISP processing with more flexible GPU or CPU processing
ISP pipelines will provide tap/insertion points to/from CPU/GPU at critical points
~760 math Ops~42K vals = 670Kb
~250Gops @ 300MHz
~760 math Ops~42K vals = 670Kb
~250Gops @ 300MHz
9
© 2012 NVIDIA - Page 17
Engineering Software Apps for Power
Use Dark Silicon - specialized hardware only turned on when needed
SOCs lots of space for transistors – just can’t turn them all on at same time!
Dedicated units increase locality and parallelism of computation to save power compared to programmable processors
Be power smart when using programmable processors
Instrumentation will soon appear for energy-aware compilers and profilers
Dynamic and feedback-driven software power optimization
Power optimizing compiler back-end compilers / installers
Smart, holistic use of sensors and peripherals
Wireless modems and networks
Motion sensors, cameras, networking, GPS
© 2012 NVIDIA - Page 18
Programmers View of Typical SOC c. 2012
CPU Complex1-5 Cortex A9/A15 Cores with NEON
L1 Cache
L2 Cache
UnifiedMemoryController for
0.5-2GB DDR3L/LP-DDR3
2D/3D GPUComplex
GraphicsCache
PeripheralBusses
1-4 DisplayControllers
SensorArray
Image Signal
Processor
1-3 Cameras
Video EncoderDecoder
Automatic scaling of frequency/voltage and # cores to meet current software load
Automatic scaling of frequency/voltage and # cores to meet current software load
Significant programmable acceleration. Fully C-programmable soon.
10-50+ cores
Significant programmable acceleration. Fully C-programmable soon.
10-50+ cores
Typically NO or very limited programmable functionalityTypically NO or very limited programmable functionality
Software configurability
Software configurability
VisionProcessor(Future)
Programmable? Or dedicated hardware for power efficiency?Programmable? Or dedicated
hardware for power efficiency?
HD AudioEngine / IO
Software configurability
Software configurability
Unified memory is critical departure from
PC architecture
Common virtual address space for all units – but typically not coherent
Common virtual address space for all units – but typically not coherent
10
© 2012 NVIDIA - Page 19
Khronos Connects Software to Silicon
ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
Graphics, video, audio, compute, visual and sensor processing
Low level silicon to software interface needed on every platform
Defines the forward looking roadmap for the silicon community
Shipping on billions of devices across multiple operating systems
Rigorous conformance tests for cross-vendor consistency
Khronos is OPEN for any company to join and participate
Acceleration APIs BY the Industry FOR the Industry
© 2012 NVIDIA - Page 20
Mobile Platform Innovation Vectors
Console-Class 3DPerformance, Quality,
Controllers and TV connectivity
HTML5 and WebGLWeb Apps that can be discovered on the Net and run on any platform
VisionCameras as sensors,
Computational Photography,Gesture Processing
Sensor FusionDevices become ‘magically’ context aware –location, usage, position
New platform capabilities being driven by SILICON and APIs
11
© 2012 NVIDIA - Page 21
Native APIs for Augmented Reality
Advanced Camera Control and stream
generation
3D Rendering and Video Composition
On GPU
AudioRendering
Application on CPUs and GPUs
Positional and GPS Sensor Data
Computer Vision/Tracking &Computational Photography
Position and Tracking SemanticsSynchronization
and sensor fusion
Positional Sensors
CameraEGLStream
Image streams to GPU and CPU
Tracked features
Dataflow and synchronization
Proprietary Vendors APIs
© 2012 NVIDIA - Page 22
FCAM – Open Source Project
Capture of stream of camera images with precision control
A pipeline that converts requests into images
All parameters packed into the requests - no visible state
Programmer has full control over sensor settings for each stream frame
Control over focus and flash
No hidden daemon running autofocus/metering
Control ISP
Can access supplemental statistics from ISP
12
© 2012 NVIDIA - Page 23
Requested Camera Extensions for AR
Enhanced exposure parameters in FCAM single or burst mode
ISO, white balance, frame rate, focus modes, resolution
Synchronization with other system sensors
And with output displayed frame
ROI extraction
From wide angle and fish-eye lenses
Data output format control
Grayscale, RGB(A), YUV
Access to the raw data e.g. Bayer pattern
Query camera information
Focal length (fx, fy), principal point (cx, cy), skew (s), image resolution (h, w)
Spatial information of how cameras and sensors are placed on device
Calibration and lens distortion
© 2012 NVIDIA - Page 24
OpenCV – Open Source Project
Computer vision open source project
Excellent functionality - widely used in academia, fast prototyping, some products
Not an API definition and not managed by Khronos
Extensive functionality >1,000 functions
Difficult for silicon vendors to provide complete acceleration
Traditionally runs on a single CPU
Some partial acceleration projects underway: OpenCL, CUDA, Neon …
13
© 2012 NVIDIA - Page 25
OpenCV4Tegra
Tegra-specific acceleration for OpenCV Library
OpenGL ES GLSL, ARM Multithreading and NEON optimizations
Available on Tegra Android Development Platform Toolkit (TADP)
Develop native vision functionality in the NDK
Can access from Java applications using JNI
FunctionsCore
absdiff, add, bitwise_and, bitwise_not, bitwise_or, bitwise_xor, compare, countNonZero, cvRound, dot, max, mean, meanStdDev, min, minMaxLoc, norm2, norm, phase32f, reduceC, reduceR, subtract,sum
Imgproc
Blur3x3, blur5x5, canny, cvtColorGray2RGBA, cvtColorRGBA2Gray, cvtColorYCrCb, cvtColorYUV420, dilate, erode, gaussianBlur3x3, gaussianBlur5x5, integral , medianBlur3x3, pyrDown, pyrUp , resizeAreaDownBy2 , resizeAreaDownBy4, resizeDownLinear, resizeUpLinear, scharrFilter , sobelFilter , threshold
Calib3d
Features2d (FAST detector)
Stitching (warpers)
© 2012 NVIDIA - Page 26
OpenCV4Tegra – Neon Speedups
NEON Speedup on Tegra
14
© 2012 NVIDIA - Page 27
OpenCV4Tegra – GPU Speedups
Tegra GPU Speedup
© 2012 NVIDIA - Page 28
OpenVX
Vision Hardware Acceleration Layer
Enables hardware vendors to
implement accelerated imaging and
vision algorithms
For use by high-level libraries or apps
Focus on enabling real-time vision
On mobile and embedded systems
Diversity of efficient implementations
From programmable processors,
through GPUS to dedicated hardware
pipelines
Open source sample implementation?
Hardware vendor implementations
OpenCV open source library
Other higher-levelCV libraries
Application
Dedicated hardware can help make vision processing performant and low-power enough for pervasive ‘always-on’ use
OpenVX does not duplicate OpenCV functionality JUST
provides essential acceleration
OpenVX does not duplicate OpenCV functionality JUST
provides essential acceleration
Aiming for provisional specification in 1H 2013 and final specification 2H13
15
© 2012 NVIDIA - Page 29
OpenVX Execution Flow
OpenVX Graph for efficient execution
Each Node can be implemented in software or accelerated hardware
EGL provides data and event interop – with streaming
BUT use of other Khronos APIs are not mandated
VXU Utility Library provides efficient access to single nodes
Open source implementation
OpenVXNode
OpenVXNode
OpenVXNode
OpenVXNode
Camera Control & Image Processing UI and Display
Compute Processing
Other Outputs
Compute Processing
Other Inputs
© 2012 NVIDIA - Page 30
Market Demand for Sensor Fusion API
Innovative use of growing sensor diversity
PORTABLE apps need to be isolated from sensor and OS
details Application developers do not wish to be Sensor
Fusion experts
Synchronized use of multiple interoperating
sensors in one app
StreamInputA High-level Sensor Fusion API
Do NOT force the application developer to access individual sensors (unlike almost all other sensor APIs)
High-level API enables sensor vendors to drive and deliver competitive
sensor fusion innovation
16
© 2012 NVIDIA - Page 31
StreamInput - Portable Access to Sensor Fusion
Advanced Sensors EverywhereRGB and depth cameras, multi-axis motion/position, touch and gestures,
microphones, wireless controllers, haptics keyboards, mice, track pads
Apps Need SophisticatedAccess to Sensor DataWithout coding to specific
sensor hardware
Apps request semantic sensor informationStreamInput defines possible requests, e.g.
“Provide Skeleton Position” “Am I in an elevator?”
Processing graph provides sensor data streamUtilizes optimized, smart, sensor middlewareApps can gain ‘magical’ situational awareness
UniversalTimestamps
Standardized NodeIntercommunication
InputDevice
Input Device
Input Device
FilterNode
FilterNode
AppFilterNode
Spec expected in 2013
© 2012 NVIDIA - Page 32
Using Android Java for AR
CameraProcessing
3D Rendering and Video Composition
On GPU
AudioRendering
Application on CPUs
Positional and GPS Sensor Data
Computer Vision and Tracking
Position and Tracking SemanticsSynchronization
and sensor fusion
Positional Sensors
Camera
Video stream to GPU
OpenCV through JNI
SensorManager
JetPlayer
GLSurfaceViewSurfaceTexture
MediaSDK
Java CallbackPreview frames
No advanced camera controls: ROI, burst parameter control,
focus modes, format choices etc.
No advanced camera controls: ROI, burst parameter control,
focus modes, format choices etc.30FPS but 2-3 frames
latency. Framerate jitter30FPS but 2-3 frames
latency. Framerate jitter
Limited fusion. No cross sensor synch.
No virtual sensors
Limited fusion. No cross sensor synch.
No virtual sensors
Java provides access to EGL-type and OpenGL ES functionality
Java provides access to EGL-type and OpenGL ES functionality
Need synch between input and composited frames
Need synch between input and composited frames
No programmable, close to sensor, image processing
No programmable, close to sensor, image processing
RenderScript
Just CPU parallel programming – no
GPU compute
Just CPU parallel programming – no
GPU compute
17
© 2012 NVIDIA - Page 33
http://developer.nvidia.com/develop4tegra
Tegra Android Development PackTegra Android Development Pack
CPU DEBUGGING with Nsight Tegra
GPU DEBUGGING with PerfHUD ES
OPTIMIZE applications with Tegra Profiler
REFERENCE docs, samples & tutorials
CPU DEBUGGING with Nsight Tegra
GPU DEBUGGING with PerfHUD ES
OPTIMIZE applications with Tegra Profiler
REFERENCE docs, samples & tutorials
OPTIMIZED for Tegra Android development
FLASHES Tegra DevKit with OS Image
CONFIGURED for debugging and profiling
INCLUDES Kernel symbols and DS-5 support
OPTIMIZED for Tegra Android development
FLASHES Tegra DevKit with OS Image
CONFIGURED for debugging and profiling
INCLUDES Kernel symbols and DS-5 support
For Windows, OSX and LinuxFor Windows, OSX and Linux
GET STARTED in minutes NOT hours
INSTALLS all tools required for Tegra Android
GET STARTED in minutes NOT hours
INSTALLS all tools required for Tegra Android
© 2012 NVIDIA - Page 34
AR Portability with Middleware Engines
Native APIs and Java APIsOpenGL ES, EGLStreams, StreamInput
Middleware SDKs and tools for application developers
Hides OS details – provides OS portability
Porting and optimization through close cooperation with silicon vendors
NVIDIA has close relationships with Metaio and Unity
Optimizing for Tegra power/performance
Applications
Apps Engines
Partners
18
© 2012 NVIDIA - Page 35
Future AR Tracking
The industry has just started figuring out how to do highly realistic Augmented Reality
Research today using CUDA-accelerated PCs
Real-Time Surface Light-field Capture for Augmentation of Planar Specular SurfacesJan Jachnik, Richard A. Newcombe, Andrew J. Davison
Imperial College, UK
© 2012 NVIDIA - Page 36
More Hyper Realistic Augmentations
High-Quality Reflections, Refractions, and Caustics in Augmented Reality
and their Contribution to Visual Coherence
P. Kán, H. Kaufmann
Institute of Software Technology and Interactive Systems,
Vienna University of Technology, Vienna, Austria
19
© 2012 NVIDIA - Page 37
CUDA on Tegra
Tegra + discrete GPU development platforms
Available to select developers
CUDA 5.0
Kepler support
Enables early development of ARM-based applications with desktop-class graphics and compute
See me after the presentation if you are interested
Or email [email protected]
© 2012 NVIDIA - Page 38
In Summary
Augmented Reality will change the way we use computing – again – but needs significantly more improvement
Multiple APIs and standards being put into place to deliver the needed performance at low power
You can get started today with AR on NVIDIA Tegra
Today – Optimized Android Java APIs, OpenCV4Tegra, OpenGL ES, Unity
Talk to me if you have interest ARM/desktop GPU development systems
Soon – FCAM, GPU Compute, OpenVX, Metaio AR SDK