opencl imaging on the gpu: optical flow · optical flow performance 0 10 20 30 40 50 60 32 16 8 4...
TRANSCRIPT
© Copyright Khronos Group, 2010 - Page 1
OpenCL Imaging on The GPU: Optical Flow
James FungDeveloper Technology, NVIDIA
© Copyright Khronos Group, 2010 -
Page 2
Image Sequence: Middlebury “Minicooper” sequence, frames 10
Pyramidal Lucas KanadeOptical Flow in OpenCL
Algorithm by Jean-Yves Bouguet (Intel)
Implementation by James Fung (NVIDIA)
© Copyright Khronos Group, 2010 -
Page 3
Optical Flow• Calculate movement of selected points in pairs of images
• Applications:
- Image stabilization
- Feature tracking
- Video encoding
• May be used to track
- a few select points: sparse optical flow
- All image points: dense optical flow- Computationally intensive!
© Copyright Khronos Group, 2010 -
Page 4
Image Sequence: Middlebury “Minicooper” sequence, frames 10
© Copyright Khronos Group, 2010 -
Page 5
Image Sequence: Middlebury “Minicooper” sequence, frames 10
© Copyright Khronos Group, 2010 -
Page 6
© Copyright Khronos Group, 2010 -
Page 7
Large Motion (3-7 pixels)
Small motion (< 3 pixels)
Calculate image mismatch vector bEstimate motion v = G-1 b(x’,y’)Update position (x’,y’) = ( x’+vx, y’+vy )Repeat for N iterations or until convergence
Algorithm
• For each point u on image I, find the corresponding
point v on image J
• Calculate spatial gradient G in vicinity of each point u
• Iteratively solve for flow:
© Copyright Khronos Group, 2010 -
Page 8
G2x2 Flow (v)
Downsample Scharr Edge filter
Flow
Result
Solve for flow
v = G-1b
Precompute G from I
Iterate v = G-1b
Algorithm
Image I
Source image pair
Image J
Iterate v = G-1b
Iterate v = G-1b
© Copyright Khronos Group, 2010 -
Page 9
Optical Flow in OpenCL• Embarrassingly parallel
• Computationally intensive
• Can take advantage of special-purpose GPU hardware:
- Texture cache for 2D spatial locality
- Hardware bilinear interpolation for sub-pixel lookups
- Local Memory
• Can use OpenCL-OpenGL interoperability for visualization
© Copyright Khronos Group, 2010 -
Page 10
Large Motion (3-7 pixels)
Small motion (< 3 pixels)
Texture Cache• Different areas of image have different amounts of motion,
but motion is spatially coherent
- Lookup window offset varies inside the image
- Texture cache captures 2D locality
© Copyright Khronos Group, 2010 -
Page 11
),(),(),( L
y
L
y
L
x
L
x
LL
k vgyvgxJyxIyxI
),(),(
),(),(
yxIyxI
yxIyxIb
yk
xkwp
wpy
wp
wpx
k
yy
yy
xx
xx
•Image level L
•Pixel center p
•Window w
•Guess g from previous level L+1
sampler_t bilinSample =... | CLK_FILTER_LINEAR;
…float Jsample = read_imagef( J_float,
bilinSampler, Jidx+(float2)(i,j) ).x;…
Hardware Interpolation• Sub-pixel accuracy & sampling is crucial
• Between iterations, x,y is non-integer
• Use texture hardware linear interpolation
© Copyright Khronos Group, 2010 -
Page 12
Using OpenCL Local Memory• Lookup locations for I, Ix and Iy are static per iteration
• Can be explicitly cached in Local Memory
- Local memory is on-chip
- Local memory is faster to access, useful for repeated access
© Copyright Khronos Group, 2010 -
Page 13
0
10
20
30
40
50
60
Texture Only 2 Tex, 1 Lmem 1 Tex, 2 Lmem Local Mem Only
Time(ms)
# Local/Texture Buffers
Effect of moving from Texture to Local Memory Usage
GTX 460 (7 SMs)
GTX 580 (16 SMs)
Middlebury “Minicooper” data set Frames: 10-11 Resolution: 640 x 480 GreyscaleConvergence: 0.0004pxFlow compute time for 3 pyramid levels shown
© Copyright Khronos Group, 2010 -
Page 14
Optical Flow Demo
• Pyramidal Lucas Kanade
Optical Flow
• Visualization done on GPU by
sharing data between
OpenGL & OpenCL
© Copyright Khronos Group, 2010 -
Page 15
Optical Flow Performance
0 10 20 30 40 50 60
32
16
8
4
Time (ms) (lower is better)
Ite
rati
on
sLK Pyramidal Optical Flow
GTX 460 & GTX 580
GTX 460
GTX 580
Middlebury “Minicooper” data set Frames: 10-11 Resolution: 640x480 GreyscaleConvergence: 0.0004pxPyramid 3-level flow compute time shown
© Copyright Khronos Group, 2010 -
Page 16
OpenCL Performance Analysiswith the CUDA Visual Profiler 4.0