

AeroSynth: Aerial Scene Synthesis from Images

David Nilosek, Lt. Col. Karl Walli

Rochester Institute of Technology, AFIT/CI

Carlson Center for Imaging Science

Digital Imaging and Remote Sensing Lab

Introduction

Automated synthetic terrain and architecture generation is now becoming feasible with calibrated-camera remote sensing. This poster presents computer vision techniques that have recently become popular for extracting 'structure from motion', recovering the motion of a calibrated camera with respect to a target. The process builds on Microsoft's popular "Photosynth" technique and applies it to geographic scenes imaged from an airborne platform. Images taken from an airborne platform often have wide baselines and only a sparse set of images covering the desired target. Both sparse and dense point clouds are generated to increase the fidelity of the 3D structure for realistic scene modeling.

Our Approach

Scenes are reconstructed on both a macro (sparse) and a micro (dense) scale. The macro process establishes an initial correspondence and from it derives the epipolar geometry between images and a sparse point cloud of scene coordinates. The micro process uses the recovered epipolar geometry and a region of interest over a target to generate a dense point cloud of scene coordinates.

Epipolar Geometry & RANSAC (RANdom SAmple Consensus)

A point in the left image corresponds to a line in the right image; this relationship is called the epipolar constraint [1].

The fundamental matrix F describes this relationship between the two images [1]:

$x_R^{\top} F\, x_L = 0$

Using RANSAC with the equation above, outliers in the initial correspondence from SIFT are removed [1].
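As a concrete illustration of this step (an assumed OpenCV-based sketch, not necessarily the authors' implementation), F can be estimated from the initial matches with RANSAC; the function, threshold, and array layout below are assumptions:

```python
import cv2
import numpy as np

# pts_left, pts_right: Nx2 float32 arrays of initially matched keypoint
# locations (the hypothetical output of the SIFT matching step).
def estimate_fundamental(pts_left, pts_right, ransac_thresh=1.0):
    """Estimate F with RANSAC and drop outlier correspondences."""
    F, inlier_mask = cv2.findFundamentalMat(
        pts_left, pts_right, cv2.FM_RANSAC, ransac_thresh, 0.99)
    keep = inlier_mask.ravel().astype(bool)
    # Inliers approximately satisfy the epipolar constraint x_R^T F x_L = 0.
    return F, pts_left[keep], pts_right[keep]
```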

Finding the scene coordinates

Photogrammetry is used to calculate an initial estimate of the scene coordinates [4]. Derived from Figure 3:

$H - Z_a = \frac{Bf}{x_L - x_R}, \qquad Y_a = \frac{B\,y_L}{x_L - x_R}, \qquad X_a = \frac{B\,x_L}{x_L - x_R}$

where B is the baseline, f the focal length, H the flying height, and $(x_L, y_L)$, $(x_R, y_R)$ the image coordinates of the point in the left and right images.
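A minimal sketch of this initial estimate, assuming the baseline B, focal length f, flying height H, and corresponding left/right image measurements are already known (names are illustrative):

```python
import numpy as np

def initial_scene_coordinates(xl, yl, xr, B, f, H):
    """Parallax-based first estimate of scene coordinates (X, Y, Z)
    from corresponding image points in a straight-baseline pair."""
    p = xl - xr                 # x-parallax between left and right images
    Z = H - (B * f) / p         # height of the scene point
    X = B * xl / p
    Y = B * yl / p
    return np.array([X, Y, Z])
```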

The initial estimates, along with the calibrated camera information and the correspondences, are put through a sparse bundle adjustment to refine the scene coordinate estimation [2]:

$\min_{a_j,\,b_i} \sum_{i=1}^{n} \sum_{j=1}^{m} v_{ij}\, d\!\left(P(a_j, b_i),\, x_{ij}\right)^{2}$

where $a_j$ is the parameter vector of camera j, $b_i$ is the parameter vector of scene point i, $x_{ij}$ is the measured image point of point i in image j, $P(a_j, b_i)$ is the predicted projection of point i onto image j, d is the Euclidean distance operator, and $v_{ij}$ is 1 if point i is visible in image j and 0 otherwise.
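The poster uses the sparse bundle adjustment package of [2]; the sketch below only illustrates the same cost with a generic nonlinear least-squares solver (SciPy), and its camera parameterization, array shapes, and names are assumptions rather than the authors' implementation:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def project(cam, pts):
    """Project 3D points through one camera parameterized as
    [omega, phi, kappa, X0, Y0, Z0, f] (angles in radians)."""
    R = Rotation.from_euler("xyz", cam[:3]).as_matrix()
    p = (pts - cam[3:6]) @ R.T              # points in the camera frame
    return -cam[6] * p[:, :2] / p[:, 2:3]   # collinearity-style projection


def residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed):
    """Stacked reprojection errors d(P(a_j, b_i), x_ij); cam_idx/pt_idx
    list, per observation, which camera and point it belongs to, which
    encodes the visibility mask v_ij."""
    cams = params[:n_cams * 7].reshape(n_cams, 7)
    pts = params[n_cams * 7:].reshape(n_pts, 3)
    pred = np.vstack([project(cams[j], pts[i:i + 1])
                      for j, i in zip(cam_idx, pt_idx)])
    return (pred - observed).ravel()


def refine(cams0, pts0, cam_idx, pt_idx, observed):
    """Refine initial camera and scene-point estimates."""
    x0 = np.hstack([cams0.ravel(), pts0.ravel()])
    sol = least_squares(residuals, x0, method="trf",
                        args=(len(cams0), len(pts0), cam_idx, pt_idx, observed))
    n = len(cams0) * 7
    return sol.x[:n].reshape(-1, 7), sol.x[n:].reshape(-1, 3)
```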

Initial correspondence

The initial correspondence between images is found with the scale-invariant feature transform (SIFT), a scale- and orientation-invariant feature detector [3]. The image is convolved with Gaussian kernels of different widths, and features are detected by finding local extrema in the difference between the Gaussian-blurred images. Each feature is described by the scale of the Gaussian and by the relative orientation of the area around the feature, producing scale- and orientation-invariant description vectors. Description vectors are matched using a nearest-neighbor approach.
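For illustration, SIFT detection and nearest-neighbor matching might look like the following OpenCV sketch (the distance-ratio check used to reject ambiguous matches is an added assumption, not stated on the poster):

```python
import cv2
import numpy as np

def sift_correspondences(img_left, img_right, ratio=0.75):
    """Detect SIFT features in two grayscale images and return matched
    keypoint locations as Nx2 arrays (left, right)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_left, None)
    kp2, des2 = sift.detectAndCompute(img_right, None)
    # Nearest-neighbor matching of description vectors, with a ratio test
    # to discard ambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    pts_left = np.float32([kp1[m.queryIdx].pt for m in good])
    pts_right = np.float32([kp2[m.trainIdx].pt for m in good])
    return pts_left, pts_right
```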

Once the scene coordinates have been refined, the points are put into UTM coordinates by projecting them through the collinearity equations onto the base image; these equations use the calibrated camera information:

$x_a = x_0 - f\,\frac{m_{11}(X_a - X_0) + m_{12}(Y_a - Y_0) + m_{13}(Z_a - H)}{m_{31}(X_a - X_0) + m_{32}(Y_a - Y_0) + m_{33}(Z_a - H)}$

$y_a = y_0 - f\,\frac{m_{21}(X_a - X_0) + m_{22}(Y_a - Y_0) + m_{23}(Z_a - H)}{m_{31}(X_a - X_0) + m_{32}(Y_a - Y_0) + m_{33}(Z_a - H)}$

where m is the rotation matrix formed from the camera orientation angles (ω, φ, κ):

$m = \begin{bmatrix} \cos\phi\cos\kappa & \cos\omega\sin\kappa + \sin\omega\sin\phi\cos\kappa & \sin\omega\sin\kappa - \cos\omega\sin\phi\cos\kappa \\ -\cos\phi\sin\kappa & \cos\omega\cos\kappa - \sin\omega\sin\phi\sin\kappa & \sin\omega\cos\kappa + \cos\omega\sin\phi\sin\kappa \\ \sin\phi & -\sin\omega\cos\phi & \cos\omega\cos\phi \end{bmatrix}$
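A small numeric sketch of this projection, with a hypothetical parameter tuple (not tied to the poster's code):

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """Omega-phi-kappa rotation matrix used in the collinearity equations."""
    so, co = np.sin(omega), np.cos(omega)
    sp, cp = np.sin(phi), np.cos(phi)
    sk, ck = np.sin(kappa), np.cos(kappa)
    return np.array([
        [ cp * ck,  co * sk + so * sp * ck,  so * sk - co * sp * ck],
        [-cp * sk,  co * ck - so * sp * sk,  so * ck + co * sp * sk],
        [ sp,      -so * cp,                 co * cp               ],
    ])

def collinearity_project(X, Y, Z, cam):
    """Project a scene point (X, Y, Z) into an image with the collinearity
    equations; cam holds (omega, phi, kappa, X0, Y0, H, f, x0, y0)."""
    omega, phi, kappa, X0, Y0, H, f, x0, y0 = cam
    m = rotation_matrix(omega, phi, kappa)
    dX, dY, dZ = X - X0, Y - Y0, Z - H
    denom = m[2, 0] * dX + m[2, 1] * dY + m[2, 2] * dZ
    x = x0 - f * (m[0, 0] * dX + m[0, 1] * dY + m[0, 2] * dZ) / denom
    y = y0 - f * (m[1, 0] * dX + m[1, 1] * dY + m[1, 2] * dZ) / denom
    return x, y
```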

Dense Correspondence

Using the principles of epipolar geometry, every point over the ROI is matched along its epipolar line in the other camera to generate a dense correspondence. Once the correspondence is determined, the scene coordinates are extracted using the previously described methods. The resulting model is facetized and an image is projected onto the model as a texture.
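One hedged way to realize this search (assumed OpenCV-based, not necessarily the authors' method) is to sweep a correlation window along the epipolar line of each ROI pixel:

```python
import cv2
import numpy as np

def match_along_epiline(left, right, pt, F, win=7, step=1.0):
    """Find the best match for left-image point `pt` (x, y) by searching
    along its epipolar line in `right` with normalized cross-correlation."""
    h = win // 2
    line = cv2.computeCorrespondEpilines(np.float32([[pt]]), 1, F)
    a, b, c = line.reshape(3)              # epipolar line a*x + b*y + c = 0
    if abs(b) < 1e-9:                      # skip (near-)vertical lines in this sketch
        return None
    x0, y0 = int(pt[0]), int(pt[1])
    template = left[y0 - h:y0 + h + 1, x0 - h:x0 + h + 1]
    best_score, best_pt = -1.0, None
    for x in np.arange(h, right.shape[1] - h, step):
        y = -(a * x + c) / b               # y-coordinate on the epipolar line
        if y < h or y >= right.shape[0] - h:
            continue
        patch = right[int(y) - h:int(y) + h + 1, int(x) - h:int(x) + h + 1]
        score = cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best_score, best_pt = score, (float(x), float(y))
    return best_pt
```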

AeroSynth output

Combination of macro and micro scene reconstruction with a comparison to a hand-created CAD model.

[Figure 1 workflow labels: Unordered Images (A, B, C) → Feature Extraction → Image Correspondences → Image Group Relationship → Sparse Bundle Adjustment → Sparse Point Cloud & Coordinate Transformation → 3D Models, with relevant AeroSynth data passed from the macro process to the micro process.]

[Figure 3 labels: Camera 1 (base), Camera 2, and Camera 3, each with calibrated parameters (ω, φ, κ, x, y, z, f, p, k); flying height H; region of overlap (ROO) with corners UL, UR, LL, LR; region of interest (ROI).]

References

[1] HARTLEY, R., AND ZISSERMAN, A. 2003. Multiple View Geometry in Computer Vision. Cambridge University Press.

[2] LOURAKIS, M., AND ARGYROS, A. 2004. The design and implementation of a generic sparse bundle adjustment software package based on the Levenberg-Marquardt algorithm. ICS/FORTH Technical Report TR 340.

[3] LOWE, D. 1999. Object recognition from local scale-invariant features. In International Conference on Computer Vision, vol. 2, Corfu, Greece, 1150–1157.

[4] WOLF, P., AND DEWITT, B. 1983. Elements of Photogrammetry. McGraw-Hill, Singapore.

*Environment used to display models is Google Earth

Figure 2: Basic epipolar geometry showing the epipolar constraint

Figure 1: The AeroSynth workflow


Figure 3: Simple geometry of straight baseline aerial photography

Figure 4: TOP: Five images overlapping a scene of a wastewater treatment plant. BOTTOM: The images projected onto a map to show the overlapping regions*

Figure 5: TOP: The five overlapping regions and initial scene coordinate estimate before sparse bundle adjustment. BOTTOM LEFT: The five overlapping regions after the sparse bundle adjustment. BOTTOM RIGHT: The points projected back onto the base image to calculate the UTM coordinates for each point.

Figure 6: LEFT: Target chosen in the base image with a single point selected. MIDDLE/RIGHT: The corresponding epipolar line in each of the other images.

Figure 7: LEFT: Point cloud derived from dense correspondence. RIGHT: Facetized point cloud with image texture map overlaid.

Figure 8: AeroSynth output with comparison models (comparison models provided by Pictometry International Corp).