Multi-view real-time depth estimation based on
combination of visual-hull and hybrid recursive matching
HHI
Wolfgang Waizenegger
Overview
• Field of application: 3D Presence
– 2D Videoconferencing
– 3D Videoconferencing
– 3D Presence concept and 3D displays
– The camera system
• 3D Analysis
– 3D algorithmic chain
– Hybrid recursive matching (HRM)
– Visual Hull (VH)
– HRM and VH combination
• Results
• Hardware
• Conclusion and Outlook
3D Presence Consortium
SoA of Telepresence Systems
Polycom TPX System
Telepresence System by CISCO
HP Halo Telepresence System
Drawbacks of conventional telepresence systems
• Drawbacks:
– No eye contact, e.g. it is hard to recognize who is talking to whom
– Misleading gestures and body language
• Ideal situation: Every local participant has their own view of each remote conferee
• Solution: Immersive 3D videoconferencing
Missing eye contact (CISCO system)
SoA of 3D Videoconferencing
MultiView by Univ. of California, Berkeley, 2004
Virtue/im.point by Fraunhofer HHI, 2003/2004
Real Meet Room, France Telecom R&D, 2001
The concept of 3D Presence
Three parties
Two conferees per party
• Multi-party 3D videoconferencing
• 3D multi-user auto-stereoscopic display technology
• Multi-party eye contact and gesture-based interaction
Replace remote conferees by 3D displays
Multi-View 3D Displays
Multiple 3D views from different perspectives
Advantages:
- Own view for each local conferee
- Adapted viewing perspective
- 3D impression
- Multiple views allow conferees to switch perspective by moving the head
multiple viewing cones
Multi-View 3D Display
The Multi-View Camera System
Narrow baseline system
• Robust disparity estimation
• Consistency check by trifocal matching
Wide baseline system
• Increased depth resolution
• Option to combine with Visual Hull
Figure: combined trifocal system with baselines b, b and kb. Labelled parts: horizontal narrow baseline system, vertical narrow baseline system, horizontal wide baseline system, vertical wide baseline system.
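The "increased depth resolution" of the wide baseline follows from the standard relation for a rectified, parallel stereo pair, Z = f·b/d, whose first-order depth error is about Z²/(f·b) per pixel of disparity error. The sketch below only illustrates that relation; the focal length, baseline, factor k and distance are hypothetical values, not the parameters of the 3D Presence rig.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * b / d for a rectified, parallel stereo pair."""
    return focal_px * baseline_m / disparity_px


def depth_resolution(focal_px, baseline_m, depth_m, disparity_step_px=1.0):
    """First-order depth change caused by a disparity change of disparity_step_px."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


# Hypothetical values, chosen only for illustration.
f = 1000.0   # focal length in pixels
b = 0.1      # narrow baseline in metres
k = 3        # the wide baseline is k * b
z = 2.0      # distance of the conferee in metres

narrow = depth_resolution(f, b, z)
wide = depth_resolution(f, k * b, z)
print(f"narrow baseline: {narrow * 100:.2f} cm per pixel of disparity")
print(f"wide baseline:   {wide * 100:.2f} cm per pixel of disparity")
```

Under these assumptions the depth step shrinks by the factor k when the baseline grows from b to kb, which is the trade-off against the more robust matching of the narrow baseline.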
The Mock-up for Camera Configuration Testing
3D Analysis Chain
Figure: processing chain. Input: n stereo streams. Stages: segmentation, disparity estimation, volumetric reconstruction, head tracking, hand tracking, data fusion. Outputs: depth maps, 3D modeling data, occlusion information etc., video + depth (n).
Hybrid-Recursive Matching (HRM)
Figure: HRM block diagram. Inputs: left image, right image. Blocks: block recursion (3 candidates), pixel recursion, choice of best disparity, disparity memory. Signals: start vector, update vector, disparity vector.
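The recursive structure named in the diagram can be sketched as follows. This is a simplified illustration, not the HRM implementation: the three block-recursion candidates are assumed to be the temporal, horizontal and vertical predecessors from the disparity memory, disparities are integer and horizontal only, and the gradient-based pixel recursion is replaced by a plain ±1 test around the start vector. The names `sad` and `hrm_like_pass` are hypothetical.

```python
import numpy as np


def sad(left, right, y, x, d, half=2):
    """SAD between the left block around (y, x) and the right block shifted by d."""
    h, w = left.shape
    y0, y1 = max(y - half, 0), min(y + half + 1, h)
    x0, x1 = max(x - half, 0), min(x + half + 1, w)
    if x0 - d < 0 or x1 - d > w:
        return float("inf")  # the shifted block would leave the image
    diff = left[y0:y1, x0:x1] - right[y0:y1, x0 - d:x1 - d]
    return float(np.abs(diff).sum())


def hrm_like_pass(left, right, prev_disp, half=2):
    """One raster-scan pass; prev_disp is the disparity memory of the previous frame."""
    h, w = left.shape
    disp = prev_disp.copy()
    for y in range(h):
        for x in range(w):
            cost = lambda d: sad(left, right, y, x, int(d), half)
            # Block recursion: up to three candidates from the disparity memory.
            candidates = [int(prev_disp[y, x])]         # temporal predecessor
            if x > 0:
                candidates.append(int(disp[y, x - 1]))  # horizontal predecessor
            if y > 0:
                candidates.append(int(disp[y - 1, x]))  # vertical predecessor
            start = min(candidates, key=cost)           # start vector
            # Stand-in for pixel recursion: test the start vector +/- 1.
            update = min((start - 1, start + 1), key=cost)  # update vector
            # Choice of best disparity, written back to the memory.
            disp[y, x] = min((start, update), key=cost)
    return disp


# Synthetic pair: left pixel x matches right pixel x - 3.
rng = np.random.default_rng(0)
left = rng.random((12, 24))
right = np.roll(left, -3, axis=1)
prev = np.zeros(left.shape, dtype=int)
prev[0, 5] = 3  # a single correct entry in the disparity memory
disp = hrm_like_pass(left, right, prev)
```

In this toy setting the single correct entry spreads to all pixels from column 5 onwards within one pass, because each pixel inherits the best candidate from its already processed neighbours. That propagation is what keeps the number of tested candidates per pixel small.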
Trifocal system
vertical narrow baseline
after consistency check
horizontal narrow baseline
Multi-View Video Analysis Chain
Figure: processing chain. Input: n stereo streams. Stages: segmentation, disparity estimation, volumetric reconstruction, head tracking, hand tracking, data fusion. Outputs: depth maps, 3D modeling data, occlusion information etc., video + depth (n).
Colored Visual Hull reconstruction
Visual Hull Techniques
• Polygonal
• Volume based space carving (VH)
• Image based (IBVH)
3D Presence demands real-time processing!!
Parallelization of the last two approaches on graphics hardware is straightforward!
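To make the volume based variant concrete, here is a minimal CPU sketch of space carving: a grid point survives only if it projects inside the silhouette of every camera. It is not the IBVH algorithm of this talk and not a GPU implementation. The function name `carve_visual_hull`, the 8³ grid and the two orthographic test cameras are assumptions chosen so that the result is easy to check by hand.

```python
import numpy as np


def carve_visual_hull(silhouettes, projections, grid_points):
    """Return a boolean mask of grid points that lie inside every silhouette."""
    n = len(grid_points)
    inside = np.ones(n, dtype=bool)
    homog = np.hstack([grid_points, np.ones((n, 1))])  # homogeneous coordinates
    for mask, P in zip(silhouettes, projections):
        uvw = homog @ P.T                               # project with a 3x4 matrix
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        h, w = mask.shape
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(n, dtype=bool)
        hit[valid] = mask[v[valid], u[valid]]           # points outside the image are carved
        inside &= hit
    return inside


# Toy setup: an 8x8x8 grid seen by two orthographic cameras.
axis = np.arange(8)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
grid = grid.reshape(-1, 3).astype(float)

square = np.zeros((8, 8), dtype=bool)
square[2:6, 2:6] = True                                 # same silhouette in both views

P_along_z = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)  # (u, v) = (x, y)
P_along_x = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)  # (u, v) = (z, y)

inside = carve_visual_hull([square, square], [P_along_z, P_along_x], grid)
print(inside.sum(), "grid points remain")
```

Every grid point is tested independently of all others, which is why this kind of carving maps so directly onto graphics hardware.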
IBVH Algorithm
Our implementation is based on the initial work of Matusik et al. (2000)
Advantages of our algorithm
• Improved caching strategy that allows pixel pre-selection, which significantly speeds up the computation
• GPU-only implementation using CUDA
• Establishes an interconnection to voxel-based implementations by applying cameras at infinity
IBVH interconnection to voxel based methods
VH vs. IBVH
Timings for two GPU based implementations with different resolutions. The image upload time is included.
• Volume based approach from Ladikos et al. 2008 (VH_Lad)
• Our image based approach (PPSIBVH; without pixel pre-selection: IBVH)
Input: Middlebury dinoRig dataset (48 images, 640 x 480)

Method    Hardware       128³       256³        512³
VH_Lad    4 x 8800GTX    99.89 ms   296.71 ms   -
IBVH      1 x GTX280     47.9 ms    82.5 ms     280.6 ms
PPSIBVH   1 x GTX280     41.6 ms    60.9 ms     150.6 ms
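The gain from pixel pre-selection can be read off the two GTX280 rows. The short calculation below uses only the timings quoted above; the dictionary layout is merely a convenient way to hold them.

```python
# Timings in ms from the table above (both on 1 x GTX280).
timings_ms = {
    "128^3": {"IBVH": 47.9, "PPSIBVH": 41.6},
    "256^3": {"IBVH": 82.5, "PPSIBVH": 60.9},
    "512^3": {"IBVH": 280.6, "PPSIBVH": 150.6},
}

# Speed-up of the variant with pixel pre-selection over plain IBVH.
speedup = {res: t["IBVH"] / t["PPSIBVH"] for res, t in timings_ms.items()}
for res, s in speedup.items():
    print(f"{res}: {s:.2f}x")
```

The factor grows with resolution, from about 1.15 at 128³ to about 1.86 at 512³.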
IBVH result for the dinoRig dataset
Left: voxel representation of the IBVH result (512³); right: image based depth map
IBVH result for a 3D Presence conferee
Timing for a typical 3D Presence setup with depth maps of 192x256 and 8 Visual Hull cameras: 10–20 msec on a single GTX280.
Soares et al. use an eight CPU dual Opteron 2.2GHz machine to achieve almost the same results with 5 cameras and an octree based Visual Hull algorithm
Combination HRM and VH
Result for the combination of HRM and VH
Combination HRM and VH (cont.)
Realization: Hardware Overview for the 3D Presence setup
• 5 x PCs with dual Nehalem Xeon CPUs
• 2 x GeForce GTX295 per cluster node
• Infiniband 40GB/s interconnection
3D Presence System Architecture
Nodes: Node_VH, Node_0, Node_1, Node_2, Node_3
Node_N
- Capture (4 cameras)
- Segmentation
- Lens un-distortion
- Rectification
- HRM (trifocal)
- Bilateral filtering
- Virtual view generation
- Encoding (video+depth)
- Networking
Indispensability of GPUs
• Hardware:
– CPU: Intel 3.0GHz (single core computation)
– GPU: GeForce GTX280
• Input:
– Images: 1024 x 768, RGB24
– Depth maps: 1024 x 768, float
• GPU results include upload and download times

Processing step                       GPU       CPU
Lens un-distortion + rectification    2 msec    68 msec
Bilateral filtering of depth map      11 msec   1000 msec
Virtual view synthesis (RGB)          1 msec    150 msec
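A quick calculation shows what these numbers mean for a real-time budget. It reads the timing rows in the order given (GPU value first, CPU value second) and assumes, for illustration only, that the three steps run one after another on a single frame.

```python
# (GPU ms, CPU ms) per processing step, taken from the timings above.
timings_ms = {
    "lens un-distortion + rectification": (2, 68),
    "bilateral filtering of depth map": (11, 1000),
    "virtual view synthesis (RGB)": (1, 150),
}

speedup = {step: cpu / gpu for step, (gpu, cpu) in timings_ms.items()}
gpu_total = sum(gpu for gpu, _ in timings_ms.values())
cpu_total = sum(cpu for _, cpu in timings_ms.values())

for step, s in speedup.items():
    print(f"{step}: {s:.1f}x faster on the GPU")
print(f"GPU total: {gpu_total} ms, about {1000 / gpu_total:.0f} frames per second")
print(f"CPU total: {cpu_total} ms, about {1000 / cpu_total:.1f} frames per second")
```

Under that assumption the three steps take 14 ms in total on the GPU but more than a second on a single CPU core, so these steps alone would rule out real-time operation without the GPU.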
Demo
Virtual view generation based on estimated depth maps
Conclusion and Outlook
• Three-party immersive 3D videoconferencing system
• Real-time 3D analysis for a 16 camera setup
• Fast IBVH algorithm which runs entirely on a single GPU
• Combination of trifocal HRM and VH significantly improves the results
• All processing runs in real-time on only 5 PCs
• The system allows rapid testing of various camera configurations
• First real-time demonstrator prototype available by October 2009
• Future: Full HD real-time 3D processing chain
References
Atzpadin, N., Kauff, P. and Schreer, O.: Stereo Analysis by Hybrid Recursive Matching for Real-Time Immersive Video Conferencing, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Immersive Telecommunications, vol. 14, no. 3, pp. 321-334, January 2004.
Matusik, W., Buehler, C., Raskar, R., Gortler, S. J. and McMillan, L.: Image-Based Visual Hulls, Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2000.
Ladikos, A., Benhimane, S. and Navab, N.: Efficient Visual Hull Computation for Real-Time 3D Reconstruction using CUDA, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Workshop on Visual Computer Vision on GPUs (CVGPU), Anchorage, Alaska (USA), June 2008.
Soares, L., Menier, C., Raffin, B. and Roch, J.L.: Parallel Adaptive Octree Carving for Real-Time 3D Modeling, Poster at IEEE VR 2007 - Virtual Reality, Charlotte, North Carolina, USA, March 2007.