![Page 1: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/1.jpg)
Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map
Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian Tan, Minh-Tri Pham, Liang-Tien Chia)
Center for Multimedia & Network Technology (CeMNet)
School of Computer Engineering
Nanyang Technological University, Singapore
![Page 3: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/3.jpg)
Urban Landmarks
• Those easy to recognize
• Those that aren’t
© kevincole
© qureyoon
© Anirudh Koul
![Page 4: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/4.jpg)
“Back-to-Basics” Map Reading!
• An image or images taken from a single location, at probe time
• A plan-view outline map
• Won’t consider GPS
  – GPS reception bad in high-rise urban areas
  – GPS can be jammed or spoofed
![Page 5: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/5.jpg)
Related Work and Differences
• Appearance-based matching in urban areas
  – Robertson & Cipolla BMVC04, Yeh et al. CVPR04, Zhang & Košecká 3DPVT06
• General wide-baseline stereo / multi-view (but not targeted for searching through significant-sized datasets)
  – Bay et al. CVPR05, Mičušík et al. CVPR08, Schindler et al. 3DPVT06, Schmid & Zisserman IJCV00, Werner & Zisserman ECCV02
• Key differences here:
  – No prior appearance information
    • Only a 2D plan-view geometric map available
  – No stereo / multi-view
    • Images are taken from single location
![Page 6: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/6.jpg)
A Geometric Matching Paradigm
• Assume buildings are vertical planar extrusions
• Match building corners in map to vertical corner lines in rectified image
  – Significant building corners
  – Not façade details / painted edges
Geometric Signature
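As a rough illustration of the simplest (2D basic-line) signature, the sketch below lists the bearings at which map corners would appear from a candidate camera pose. It is a toy under stated assumptions: occlusion is ignored, every corner is treated as visible, and the name `corner_bearings` is hypothetical rather than taken from the slides.

```python
import math

def corner_bearings(corners, cam_xy, heading):
    """Sorted bearings (radians, in [-pi, pi)) of map corners relative to the camera heading.

    Assumption: no occlusion handling, so every corner contributes a bearing.
    """
    cx, cy = cam_xy
    bearings = []
    for x, y in corners:
        ang = math.atan2(y - cy, x - cx) - heading
        # Wrap the relative angle into [-pi, pi).
        ang = (ang + math.pi) % (2.0 * math.pi) - math.pi
        bearings.append(ang)
    return sorted(bearings)

# A square building outline seen from its centre, camera heading along +x.
square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
sig = corner_bearings(square, cam_xy=(0.0, 0.0), heading=0.0)
```

In a rectified image, each bearing corresponds to the column at which a vertical corner line would be expected.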
![Page 7: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/7.jpg)
2D Geometric Image Features
Basic Lines (2D)
![Page 8: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/8.jpg)
2½D Geometric Image Features
• David Marr’s bottom-up visual perception framework
Image → Primal Sketch → 2½D Sketch → 3D model
Augmented Lines (2D + adjacent 3D normals)
![Page 9: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/9.jpg)
2½D Geometric Image Features
• David Marr’s bottom-up visual perception framework
Image → Primal Sketch → 2½D Sketch → 3D model
Elemental Planes (2D + fixed depth ratios of vertical boundaries)
![Page 10: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/10.jpg)
2½D Geometric Image Features
• David Marr’s bottom-up visual perception framework
Image → Primal Sketch → 2½D Sketch → 3D model
Structural Fragments (piecewise 3D structures with unknown scales)
![Page 11: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/11.jpg)
Geometric Signatures – Uniqueness Analysis Under Ideal Conditions
• Basic Lines (2D)
• Augmented Lines (2D + 3D normals)
• Elemental Planes (2D + fixed depth ratios)
• Structural Fragments (3D structure with unknown scale)
Legend: strong match / poor match
![Page 12: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/12.jpg)
Overview of Localization Method
BOTTOM-UP
• Query image
• Calibration from vanishing points
• Extract vertical corners + normals
• Recover elemental planes with 3D normals
• Link into plan-view structural fragments (modulo similarity)
TOP-DOWN
• 2D map
• Geometric hashing lookup for correspondence candidates
• Voting-based estimate of optimal camera pose
• Camera pose
![Page 13: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/13.jpg)
Estimation of Quasi-Manhattan Vanishing Points
• Use EM algorithm (Schindler et al. 3DPVT 2006)
– Details in paper
• Image rectification → 3D verticals become parallel to image y-axis
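One common way to realise this kind of rectification is a pure-rotation homography H = K R K⁻¹, where R rotates the viewing direction of the vertical vanishing point onto the camera's y-axis. The sketch below assumes a zero-skew pinhole model with known intrinsics (`fx`, `fy`, `cx`, `cy`) and an already estimated vertical vanishing point `vp`. The function names are hypothetical, and this is not necessarily the authors' procedure.

```python
import math

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    """3x3 matrix times 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def rectifying_homography(fx, fy, cx, cy, vp):
    """Homography K R K^-1 that sends the vertical vanishing point to infinity along image y."""
    K = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    # Back-project the vanishing point to a unit direction in the camera frame.
    d = matvec(K_inv, vp)
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]
    if d[1] < 0:  # the sign of a vanishing direction is arbitrary; pick the one near +y
        d = [-c for c in d]
    # Rodrigues rotation taking d to e = (0, 1, 0): v = d x e = (-d_z, 0, d_x), c = d . e
    c = d[1]
    vx = [[0.0, -d[0], 0.0], [d[0], 0.0, d[2]], [0.0, -d[2], 0.0]]  # cross-product matrix of v
    vx2 = matmul(vx, vx)
    R = [[(1.0 if i == j else 0.0) + vx[i][j] + vx2[i][j] / (1.0 + c)
          for j in range(3)] for i in range(3)]
    return matmul(matmul(K, R), K_inv)

# Synthetic check: a camera pitched by 10 degrees sees the vertical direction (0, cos t, sin t).
t = math.radians(10.0)
vp = [320.0 * math.sin(t), 800.0 * math.cos(t) + 240.0 * math.sin(t), math.sin(t)]
H = rectifying_homography(800.0, 800.0, 320.0, 240.0, vp)
w = matvec(H, vp)  # should be proportional to (0, 1, 0)
```

Because the warped vanishing point lies at infinity in the y direction, every image line that passed through it becomes parallel to the y-axis after warping.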
![Page 14: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/14.jpg)
Vertical Corner Line Hypothesis (VCLH)
• Hypotheses for corners of buildings
  – Based on heuristics
• 3 categories:
  – Basic line
  – Uni-normal augmented line
  – Bi-normal augmented line
![Page 15: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/15.jpg)
Elemental Planes
• Elemental plane:
  – 2 VCLHs connected by groups of collinear horizontal edges
    • Same plane normals on linked sides
• Invariant depth ratio, for VCLH depths $z_a$ and $z_b$:
  $\frac{z_a}{z_b} = f(\text{plane normal}, \text{VCLH positions}) = \text{constant}$
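To see why the depth ratio is fixed, consider the plan view with the camera at the origin. A VCLH at rectified image column x (measured from the principal point, focal length f) lies at (z·x/f, z), and a vertical plane with plan-view normal (n_x, n_z) satisfies n_x·X + n_z·Z = d. Hence z = d / (n_x·x/f + n_z), and the unknown offset d cancels in the ratio. The sketch below works under those assumptions; `depth_ratio` is a hypothetical name, and the slides do not spell out the authors' exact form of f.

```python
def depth_ratio(x_a, x_b, normal, f):
    """z_a / z_b for two VCLHs on one vertical plane.

    x_a, x_b: rectified image columns, in pixels from the principal point.
    normal:   plan-view plane normal (n_x, n_z); its length does not matter.
    f:        focal length in pixels.
    """
    n_x, n_z = normal
    return (n_x * x_b / f + n_z) / (n_x * x_a / f + n_z)

# Wall from A = (-1, 4) to B = (2, 6) in plan view (X, Z); a normal is (2, -3).
f = 500.0
x_a = f * (-1.0) / 4.0
x_b = f * 2.0 / 6.0
ratio = depth_ratio(x_a, x_b, (2.0, -3.0), f)  # true value is z_a / z_b = 4 / 6
```

Scaling the whole scene about the camera leaves both columns unchanged, so the ratio is the same for a wall twice as far away and twice as large.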
![Page 16: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/16.jpg)
Structural Fragments
• Structural fragment
  – Sequence of adjacent elemental planes sharing bi-normal VCLHs
  – $z = f(\text{plane normals}, \text{VCLH positions})$
  – Full 3D structure (modulo scale)
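Chaining depth ratios across shared VCLHs is enough to recover a plan-view polyline up to one global scale. The sketch below fixes the first depth to 1 and propagates along the fragment. It assumes rectified columns measured from the principal point, a known focal length, and one plan-view normal per plane; `reconstruct_fragment` is a hypothetical name.

```python
def reconstruct_fragment(columns, normals, f):
    """Plan-view corner positions (X, Z) of a structural fragment, up to scale.

    columns: rectified image columns of the l + 1 VCLHs, in order along the fragment.
    normals: plan-view normals (n_x, n_z) of the l planes between consecutive VCLHs.
    f:       focal length in pixels.
    """
    z = 1.0  # arbitrary scale: depth of the first VCLH
    points = [(z * columns[0] / f, z)]
    for x_prev, x_next, (n_x, n_z) in zip(columns, columns[1:], normals):
        # Both corners lie on the same plane, so z * (n_x * x / f + n_z) is constant.
        z = z * (n_x * x_prev / f + n_z) / (n_x * x_next / f + n_z)
        points.append((z * x_next / f, z))
    return points

# Two walls A-B-C with A = (-1, 4), B = (2, 6), C = (3, 4) in plan view.
f = 500.0
columns = [f * (-1.0) / 4.0, f * 2.0 / 6.0, f * 3.0 / 4.0]
normals = [(2.0, -3.0), (2.0, 1.0)]
fragment = reconstruct_fragment(columns, normals, f)
```

The result equals the true corners divided by the first depth (4 here), which is the "modulo scale" ambiguity.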
![Page 17: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/17.jpg)
More Examples
Elemental Planes
Structural Fragments
![Page 18: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/18.jpg)
Matching with Structural Fragments
• Exhaustive testing:
  – Correspondence
    • structural fragment of l planes ↔ l linked building edges
  – Best-fit matching with error
  – Consensus support C from other VCLHs
• Vote in pose-space accumulator array
  – Vote score: s, a function of l and C (exact expression not preserved)
• Complexity: O(n), n = # of building corners in map
  – 8 s per search in MATLAB
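The best-fit step for one candidate correspondence could be sketched as a least-squares 2D similarity fit between fragment corners (camera frame, arbitrary scale) and the matched map corners. Writing points as complex numbers, the fit is q ≈ a·p + b, where |a| is the scale, arg(a) the camera heading offset, and b the camera position in map coordinates. The slides do not say which fitting method is used, so this is a stand-in. `fit_similarity` is a hypothetical name, and reflections are not handled.

```python
import cmath

def fit_similarity(frag_pts, map_pts):
    """Least-squares similarity q ~ a * p + b between matched 2D point lists.

    Returns (a, b, rms_error) with a and b complex.
    """
    p = [complex(x, y) for x, y in frag_pts]
    q = [complex(x, y) for x, y in map_pts]
    n = len(p)
    p_mean = sum(p) / n
    q_mean = sum(q) / n
    num = sum((pi - p_mean).conjugate() * (qi - q_mean) for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    rms = (sum(abs(a * pi + b - qi) ** 2 for pi, qi in zip(p, q)) / n) ** 0.5
    return a, b, rms

# Synthetic example: place a fragment in the map with a known pose and recover it.
frag = [(-0.25, 1.0), (0.5, 1.5), (0.75, 1.0)]
a0 = cmath.rect(4.0, 0.3)   # scale 4, rotation 0.3 rad
b0 = complex(10.0, 20.0)    # camera position in the map
map_corners = [((a0 * complex(x, y) + b0).real, (a0 * complex(x, y) + b0).imag) for x, y in frag]
a, b, rms = fit_similarity(frag, map_corners)
heading = cmath.phase(a)
```

Since the camera sits at the origin of the fragment frame, b is the pose hypothesis that would receive the vote in the accumulator array.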
![Page 19: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/19.jpg)
Matching Example with Structural Fragments
Inconsistent matches Consistent matches
![Page 20: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/20.jpg)
Experiments – Dataset I
• Bronx neighborhood of Woodstock
• Google Street View images (total 212)
  – 53 unique locations, 4 images per location (shown in quads)
• Manually created building outline plan view map
  – 111 buildings with 885 corners
![Page 21: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/21.jpg)
Experiments – Dataset II
• Singapore government housing (HDB) estate
• Self-collected images (total 120)
  – 30 unique locations, 4 images per location
• Manually created building outline plan view map
  – 20 mega buildings with 659 corners
![Page 22: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/22.jpg)
Matching Results
• Compare probe signature to signatures at 3600 grid locations, and sort matching scores
  – Find rank of ground truth
[Plot: x-axis: match ranks (selectivity of 0-10%); y-axis: % of test probes where correct pose is better than this rank]
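The evaluation protocol can be sketched as follows: score every grid location for a probe, find where the ground-truth location falls in the sorted list, and report the fraction of probes whose rank lies within a given selectivity. The sketch assumes that a higher score means a better match; the function names are hypothetical.

```python
def ground_truth_rank(scores, gt_index):
    """1-based rank of the ground-truth location when sorted by descending score."""
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)

def fraction_within_selectivity(ranks, n_locations, selectivity):
    """Fraction of probes whose ground-truth rank is within the top `selectivity` share."""
    top_k = round(selectivity * n_locations)
    return sum(1 for r in ranks if r <= top_k) / len(ranks)

# One probe scored against four locations; the ground truth is location 2.
rank = ground_truth_rank([0.1, 0.9, 0.5, 0.7], gt_index=2)

# Four probes on a 3600-location grid; 1% selectivity means the top 36 ranks.
hit_rate = fraction_within_selectivity([1, 36, 37, 400], n_locations=3600, selectivity=0.01)
```

Sweeping `selectivity` from 0 to 0.10 would trace out the kind of curve described for this slide.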
![Page 23: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/23.jpg)
Dataset II Example Correct Matches
• Example results for matching
• 3D models are only used for visualizing results
![Page 24: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/24.jpg)
Observations
• This is a start to solving a challenging problem
  – Difficult even for humans
• Results are mixed:
  – Selectivity is very high
    • 57-70% of correct poses within top-1% selectivity (36 out of 3600)
  – But needs to be higher to be end-usable
  – Yet in ideal conditions signatures appear very discriminative
• Main challenges
  – False VCLH negatives (some)
    • Building corners not detected due to poor resolution, etc.
  – False VCLH positives (many)
    • Windows / other façade features often misdetected as corners
  – Architectural designs are seldom perfect extrusions
    • Overhangs, balconies, fire escapes, etc.
![Page 25: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/25.jpg)
Concluding Remarks
• Geometric features can be powerful for discriminating locations
  – Do not always have to rely on prior appearance data
  – Intelligent extension to geometric 2½D features
    • 2D → 2D+normals → 2D+depth ratios → 3D (mod scale)
  – Informal tests in ideal conditions show excellent discriminating power
• Key challenge lies in more robust image analysis
  – Needs robustness to noise and minor deviations from map
• Future Work
  – Use existing results to bootstrap more advanced (and costly) registration techniques
    • E.g. top-down bundle adjustment working directly on raw image intensities, rather than detected edgels
![Page 26: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/26.jpg)
Credits
• Joint work with
  – Arridhana Ciptadi
  – Wei-Chian Tan
  – Minh-Tri Pham
  – Clement Liang-Tien Chia
• Thanks
  – Teck-Khim Ng
  – Zahoor Zafrulla
  – Rudianto Sugiyarto
• Research Sponsor
  – Project Tacrea Grant, Defence Science & Technology Agency (DSTA), Singapore
![Page 28: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/28.jpg)
Scene Assumptions
• Quasi-Manhattan World
  – Vertical direction is orthogonal to all horizontal directions
  – Horizontal directions need not be orthogonal to each other
• Vertical Extrusion Model
  – Each building is a vertical extrusion of a ground-plane cross section
    • Implies buildings have simple vertical planar facades
![Page 29: Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian](https://reader036.vdocuments.us/reader036/viewer/2022062322/56649ebd5503460f94bc6ab8/html5/thumbnails/29.jpg)
Potential Future Directions
• Exploit localized architectural design “language”?
  – Priors to improve geometric feature detection in poor quality images
  – Predict occluded parts of higher order geometric features that form the local architectural “vocabulary”
• Investigate if reasonable to have prior distribution that buildings close by have similar geometric designs