MACHINE VISION GROUP
Accelerating image recognition on mobile devices using GPGPU
Miguel Bordallo¹, Henri Nykänen², Jari Hannuksela¹, Olli Silvén¹ and Markku Vehviläinen³
¹ University of Oulu, Finland
² Visidon Ltd., Oulu, Finland
³ Nokia Research Center, Tampere, Finland
Jari Hannuksela, Olli Silvén
Machine Vision Group, Infotech Oulu
Department of Electrical and Information Engineering, University of Oulu, Finland
Contents
Introduction
Mobile Image Recognition
• Local Binary Pattern
Graphics processor as a computing engine
GPU accelerated image recognition
• LBP Fragment Shader implementation
• Image preprocessing
Experiments and results
• Speed
• Power consumption
Motivation
• Face detection and recognition is a key component of future multimodal user interfaces
• The computing power of mobile devices is still not properly harnessed for real-time computer vision
• Computationally demanding tasks compromise battery life
• Need for energy and computationally efficient solutions
Face analysis using local binary patterns
• Face analysis is one of the major challenges in computer vision
• The LBP method has already been adopted by many leading scientists
• Excellent results in face recognition and authentication, face detection, facial expression recognition and gender classification
Local Binary Pattern
GPU as a computing engine
• Newer phones include a GPU chipset
• OpenGL ES is a highly optimized and attractive accelerator interface
• Emerging platforms (OpenCL EP) will facilitate using the GPU as a computing resource
• Compatible data formats for the graphics and camera sub-systems are desirable
GPU can be treated as an independent entity
Fixed pipeline (OpenGL ES 1.1) vs. programmable pipeline (OpenGL ES 2.0)
Stream processing (OpenGL) vs. shared memory processing (CUDA)
OpenCL (Embedded Profile)
• Emerging platforms will offer the needed flexibility
• OpenCL Embedded Profile is a subset of OpenCL
• Supports data-parallel and task-parallel programming models
• Code is executed concurrently on the CPU & GPU (& DSP)
– Other current and future resources are compatible
– Easier programming in a heterogeneous processor environment
• High parallelization of image processing computations -> high efficiency
GPU assisted face analysis process
GPU-accelerated image recognition
• OpenGL ES 2.0:
– Image feature (LBP, ...) extraction
– Image preprocessing
– Image scaling
– Displaying
• C code:
– Camera control
– Classification
LBP fragment shader implementation
• Access the image via texture lookup
• Fetch the selected picture pixel
• Fetch the neighbours' values
• Compute the binary vector
• Multiply by the weighting factors
• Two versions:
– Version 1: calculates the LBP map in one grayscale channel
– Version 2: calculates 4 LBP maps in the RGBA channels
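The per-pixel steps listed above can be mirrored on the CPU as a reference. The Python sketch below implements the basic 3x3 LBP operator; it is not the authors' shader, and the neighbour ordering, the `>=` threshold and the clamp-to-edge border handling (similar in spirit to `GL_CLAMP_TO_EDGE`) are assumptions. The names `fetch`, `lbp_pixel`, `lbp_map`, `OFFSETS` and `WEIGHTS` are hypothetical: `fetch` stands in for the texture lookup, and the binary vector is combined with the weights 1, 2, 4, ..., 128.

```python
# Reference sketch of the basic 3x3 LBP operator (not the authors' GLSL shader).
# Assumptions: clockwise neighbour order from the top-left, ">=" threshold,
# and clamp-to-edge sampling at the image border.

# Neighbour offsets (dx, dy); any fixed order yields a valid LBP code.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
# Weighting factors for the 8 entries of the binary vector: 1, 2, 4, ..., 128.
WEIGHTS = [1 << i for i in range(8)]


def fetch(img, x, y):
    """Texture-lookup analogue: clamp coordinates so border pixels are reused."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return img[y][x]


def lbp_pixel(img, x, y):
    """LBP code of pixel (x, y) in a grayscale image given as a list of rows."""
    center = fetch(img, x, y)  # fetch the selected pixel
    # Fetch the neighbours and build the binary vector.
    bits = [1 if fetch(img, x + dx, y + dy) >= center else 0 for dx, dy in OFFSETS]
    # Multiply by the weighting factors and sum (a dot product).
    return sum(b * wt for b, wt in zip(bits, WEIGHTS))


def lbp_map(img):
    """Apply lbp_pixel to every pixel, as a fragment shader does per fragment."""
    return [[lbp_pixel(img, x, y) for x in range(len(img[0]))] for y in range(len(img))]
```

For example, in `[[1, 9, 1], [9, 5, 1], [1, 1, 1]]` only the top and left neighbours of the centre reach its value 5, so `lbp_pixel(..., 1, 1)` gives 2 + 128 = 130. Under this reading, version 2 would evaluate the same arithmetic for four pixels at once, one per RGBA channel.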
Preprocessing
Create quad
Divide texture & convert to grayscale
Render each piece in one channel
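The three preprocessing steps can be sketched in Python under two assumptions the slides do not spell out: grayscale conversion uses the ITU-R BT.601 luma weights, and the texture is divided into four horizontal strips, one per RGBA channel, so that each RGBA texel carries four grayscale pixels for the version 2 LBP shader. `to_gray` and `pack_rgba` are hypothetical helper names.

```python
def to_gray(rgb_img):
    """Convert rows of (r, g, b) tuples to 8-bit luma (BT.601 weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]


def pack_rgba(gray):
    """Split a grayscale image into 4 horizontal strips; strip k goes to channel k.

    The result has a quarter of the rows, and each texel is an (R, G, B, A)
    tuple holding four grayscale pixels taken from the same column.
    """
    h = len(gray)
    if h % 4:
        raise ValueError("this sketch requires the height to be a multiple of 4")
    s = h // 4  # rows per strip
    strips = [gray[k * s:(k + 1) * s] for k in range(4)]
    return [[tuple(strips[k][y][x] for k in range(4)) for x in range(len(gray[0]))]
            for y in range(s)]
```

For instance, `pack_rgba([[0, 1], [2, 3], [4, 5], [6, 7]])` yields a single row of two texels, `(0, 2, 4, 6)` and `(1, 3, 5, 7)`. Because the 3x3 LBP neighbourhood crosses strip boundaries, a real implementation would likely need to duplicate a row at each boundary; this sketch omits that.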
Experiments setup
• OMAP 3 family (OMAP3530)
– ARM Cortex-A8 CPU
– PowerVR SGX530 GPU
• 3 set-ups:
– BeagleBoard revision 3
– Zoom AM3517EVM (TI Sitara)
– Nokia N900
Processing times: LBP extraction
• Computing LBP in four channels (version 2) is faster than computing it in one
• The CPU is faster than the GPU
• Concurrent execution of the algorithms on GPU + CPU increases performance
Size        GPUv1    GPUv2    CPU      CPU & GPUv1   CPU & GPUv2
1024x1024   232 ms   180 ms   100 ms   116 ms        90 ms
512x512     76 ms    46 ms    25 ms    37 ms         23 ms
64x64       2 ms     1.5 ms   0.4 ms   1 ms          0.2 ms
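One way to read the concurrent columns is through an idealized load-balancing model. This is our assumption, not the authors' scheduling method: if the image could be split freely between the processors, with no transfer or synchronization cost, giving the CPU a share proportional to its throughput makes both finish at the same moment. The hypothetical helper `ideal_split` computes that share and the resulting lower-bound time.

```python
def ideal_split(t_cpu_ms, t_gpu_ms):
    """Return (CPU share of the image, ideal concurrent time in ms).

    Assumes work is perfectly divisible and that transfers and
    synchronization cost nothing, so the time is a lower bound.
    """
    r_cpu = 1.0 / t_cpu_ms          # images per ms on the CPU alone
    r_gpu = 1.0 / t_gpu_ms          # images per ms on the GPU alone
    cpu_share = r_cpu / (r_cpu + r_gpu)
    t_both = 1.0 / (r_cpu + r_gpu)  # both units finish together
    return cpu_share, t_both


# 1024x1024 row of the table: CPU 100 ms, GPUv2 180 ms.
share, t = ideal_split(100.0, 180.0)
print(f"CPU share {share:.2f}, ideal time {t:.1f} ms")  # CPU share 0.64, ideal time 64.3 ms
```

The measured 90 ms for CPU & GPUv2 sits above this roughly 64 ms bound, which is consistent with the transfer and synchronization overheads that the model ignores.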
Processing times: Preprocessing
• The GPU outperforms the CPU in simple pixel-wise operations (scaling + interpolation)
• Concurrent execution of the algorithms on GPU + CPU is slower than the GPU alone due to data transfers
Size        GPU      CPU      CPU & GPU
1024x1024   35 ms    100 ms   54 ms
512x512     10 ms    25 ms    15 ms
64x64       0.2 ms   0.4 ms   0.4 ms
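Scaling with interpolation suits the GPU well; one plausible reason is that texture units perform bilinear filtering in hardware when a texture is sampled with `GL_LINEAR` in OpenGL ES. The Python sketch below spells out the same arithmetic on the CPU for a grayscale image. The pixel-centre mapping and the edge clamping are assumptions, and `bilinear_resize` is a hypothetical name.

```python
import math


def bilinear_resize(img, new_w, new_h):
    """Resize a grayscale image (list of rows) with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = []
    for j in range(new_h):
        # Map the output pixel centre back into input coordinates.
        y = (j + 0.5) * h / new_h - 0.5
        y0 = min(max(math.floor(y), 0), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = min(max(y - y0, 0.0), 1.0)  # vertical blend weight, clamped at edges
        row = []
        for i in range(new_w):
            x = (i + 0.5) * w / new_w - 0.5
            x0 = min(max(math.floor(x), 0), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = min(max(x - x0, 0.0), 1.0)  # horizontal blend weight
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Downscaling `[[0, 10], [20, 30]]` to a single pixel averages all four inputs and returns `[[15.0]]`. Each output pixel is independent of the others, which is exactly the pixel-wise pattern the slide credits the GPU with.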
Speed (II): Preprocessing
Size        GPU      CPU      GPU preprocessing & CPU LBP extraction
1024x1024   215 ms   205 ms   142 ms
512x512     56 ms    50 ms    40 ms
64x64       1.8 ms   1 ms     0.8 ms
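The end-to-end figures can be cross-checked against the stage timings in the LBP and preprocessing tables. The sketch below rests on our reading rather than anything stated on the slides: each pipeline runs its two stages one after the other, and the GPU LBP stage is the faster version 2 shader. Under those assumptions, the gap between measured and summed times estimates the remaining overhead at 1024x1024. All names are hypothetical.

```python
# Stage times (ms) for 1024x1024, copied from the LBP and preprocessing tables.
PREPROCESS_MS = {"GPU": 35.0, "CPU": 100.0}
LBP_MS = {"GPU": 180.0, "CPU": 100.0}  # GPU value: version 2 shader (assumption)

# Measured end-to-end times (ms): (preprocessing unit, LBP unit) -> time.
MEASURED_MS = {("GPU", "GPU"): 215.0, ("CPU", "CPU"): 205.0, ("GPU", "CPU"): 142.0}


def overhead_ms(pre_unit, lbp_unit):
    """Measured time minus the sum of the two sequential stage times."""
    summed = PREPROCESS_MS[pre_unit] + LBP_MS[lbp_unit]
    return MEASURED_MS[(pre_unit, lbp_unit)] - summed


for units in MEASURED_MS:
    print(units, overhead_ms(*units), "ms")
```

On this reading, the all-GPU total equals the stage sum exactly, while the mixed pipeline carries about 7 ms of extra cost, plausibly the hand-over of data between GPU and CPU.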
Power and Energy consumptions
• The power consumption of the GPU and the CPU is independent
• CPU: 190 mW
• GPU: 110 to 130 mW (increases with image size)
• Energy consumption depends on processing time
• The GPU has a smaller energy per operation
Operation            GPU       CPU
Preprocessing        27 mJ     19 mJ
LBP                  5.3 mJ    10 mJ
Combined algorithm   32.3 mJ   28 mJ
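For a constant power draw, energy is power multiplied by time, E = P × t, which is why energy consumption tracks processing time. The hypothetical helper below does the unit bookkeeping: milliwatts times milliseconds gives microjoules, so dividing by 1000 yields millijoules.

```python
def energy_mj(power_mw, time_ms):
    """Energy in mJ for a constant power draw (mW) over a duration (ms)."""
    # 1 mW * 1 ms = 1 uJ, and 1000 uJ = 1 mJ.
    return power_mw * time_ms / 1000.0


# CPU at 190 mW for 100 ms (e.g. 1024x1024 preprocessing on the CPU).
print(energy_mj(190, 100))  # 19.0
```

Real measurements integrate a time-varying power trace, and the GPU draw varies between 110 and 130 mW with image size, so this is only a first-order estimate.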
Summary
• GPUs can be used as general-purpose processors
• New platforms will offer more efficiency and flexibility
• Non-optimized interfaces introduce excessive overheads
Future directions
• Implementation of the classifier
• Implementations in OpenCL
• Multi-scale LBP
• Implementation of other feature extraction methods
Thank you!
• Any questions?
Thanks to Texas Instruments for the donation of the hardware