automatic dense semantic mapping from visual street-level imagery


Upload: sunando-sengupta

Post on 12-Aug-2015


Page 1: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Automatic Dense Semantic Mapping From Visual Street-level Imagery

Sunando Sengupta [1], Paul Sturgess [1], Lubor Ladicky [2], Philip H.S. Torr [1]

[1] Oxford Brookes University
[2] Visual Geometry Group, Oxford University

http://cms.brookes.ac.uk/research/visiongroup/index.php

Page 2: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Dense Semantic Map

• Generate an overhead view of an urban region.
• Every pixel in the map view is associated with an object class label.

Object class labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Page 3: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Dense Semantic Map

• Street images are captured inexpensively from a vehicle with multiple mounted cameras [1].

[1] Yotta DCL, “Yotta DCL case studies,” Available: http://www.yottadcl.com/surveys/case-studies/

Pages 4-6: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Mapping Framework

• The semantic mapping framework comprises two stages:
– Semantic image segmentation at street level.
– Ground plane labelling at a global level.
• One of the first attempts at overhead mapping from street-level images.

[Figure: pipeline from street-level image acquisition to image segmentation to ground plane labelling]

Page 7: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation

• Label every pixel in the image with an object class.

[Figure: raw image (input) passed through the automatic labeller to give a labelled image (output)]

Object class labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Page 8: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation

• We use a Conditional Random Field (CRF) framework.
• Each pixel is a node in a grid graph G = (V, E).
• Each node is a random variable x taking a label from the label set.

[Figure: input image, CRF construction over the labelling X, final segmentation]
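As a rough illustration of the grid graph G = (V, E), the sketch below enumerates the edges for a small image. The slides do not state the neighbourhood size; 4-connectivity, the function name `grid_edges` and the row-major node indexing are assumptions made for this example.

```python
def grid_edges(height, width):
    """Return the edge list E of a 4-connected grid over height * width pixels.

    Node index i = row * width + col (an indexing convention assumed here).
    """
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:       # link to the right-hand neighbour
                edges.append((i, i + 1))
            if r + 1 < height:      # link to the neighbour below
                edges.append((i, i + width))
    return edges


edges = grid_edges(2, 3)
```

A 2 x 3 image gives six nodes; each edge appears once, as an (i, j) pair with i < j.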

Page 9: Automatic Dense Semantic Mapping From Visual Street-level Imagery

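Semantic Image Segmentation - CRF

• Total energy

$$E(\mathbf{x}) = \underbrace{\sum_{i \in V} \psi_i(x_i)}_{E_{pix}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{ij}(x_i, x_j)}_{E_{pair}} + \underbrace{\sum_{c \in C} \psi_c(\mathbf{x}_c)}_{E_{region}}$$

• Optimal labelling given as

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} E(\mathbf{x})$$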
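To make the decomposition concrete, the sketch below evaluates the pixel and pairwise terms of the energy for a given labelling on a tiny graph. The region term is left out for brevity. The function name `total_energy`, the dictionary-based cost tables and all numbers are illustrative assumptions, not the authors' implementation.

```python
def total_energy(labels, unary, edges, pair_cost):
    """Evaluate E_pix + E_pair for one labelling (E_region omitted in this sketch).

    labels    : list, labels[i] is the label of node i
    unary     : list of dicts, unary[i][l] is the cost of node i taking label l
    edges     : list of (i, j) node pairs
    pair_cost : function (i, j, li, lj) -> pairwise cost
    """
    e_pix = sum(unary[i][l] for i, l in enumerate(labels))
    e_pair = sum(pair_cost(i, j, labels[i], labels[j]) for i, j in edges)
    return e_pix + e_pair


# Toy example: 3 pixels in a row, two labels, plain Potts penalty of 0.5.
unary = [{"car": 0.2, "road": 0.3},
         {"car": 0.6, "road": 0.1},
         {"car": 0.7, "road": 0.2}]
edges = [(0, 1), (1, 2)]
potts = lambda i, j, li, lj: 0.0 if li == lj else 0.5

energy_mixed = total_energy(["car", "road", "road"], unary, edges, potts)
energy_road = total_energy(["road", "road", "road"], unary, edges, potts)
```

In this toy case the all-road labelling has the lower energy: the first pixel's slightly higher unary cost for road is outweighed by removing the label change between pixels 0 and 1.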

Page 10: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Epix models each individual pixel's cost of taking a label.
– Computed via the dense boosting approach.
– Multi-feature variant of TextonBoost [1].

Example values shown for pixel x: Car 0.2, Road 0.3.

[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.

Page 11: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

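• Epair models the interactions within each pixel's neighbourhood.
– Encourages label consistency in adjacent pixels.
– Sensitive to edges in images.
– Contrast-sensitive Potts model.

Pairwise cost for neighbouring pixels xi and xj: 0 if both take the same label (e.g. Car and Car), g(i,j) if the labels differ (e.g. Car and Road).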
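The slides do not give the form of g(i,j). A commonly used contrast-sensitive choice is g(i,j) = θp + θv exp(−θβ ‖Ii − Ij‖²), where Ii and Ij are the colours of the two pixels. The sketch below assumes that form; the function name and the parameter values are placeholders chosen for illustration.

```python
import math


def contrast_sensitive_potts(li, lj, colour_i, colour_j,
                             theta_p=1.0, theta_v=4.0, theta_beta=0.01):
    """Pairwise cost: 0 for equal labels, g(i, j) otherwise.

    g(i, j) = theta_p + theta_v * exp(-theta_beta * ||I_i - I_j||^2) is an
    assumed form; the penalty shrinks across strong image edges.
    """
    if li == lj:
        return 0.0
    sq_diff = sum((a - b) ** 2 for a, b in zip(colour_i, colour_j))
    return theta_p + theta_v * math.exp(-theta_beta * sq_diff)


same = contrast_sensitive_potts("car", "car", (10, 10, 10), (200, 200, 200))
flat = contrast_sensitive_potts("car", "road", (100, 100, 100), (100, 100, 100))
edge = contrast_sensitive_potts("car", "road", (10, 10, 10), (200, 200, 200))
```

With these placeholder parameters a label change costs more in a flat image region than across a strong colour edge, which is what makes the term sensitive to edges.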

Page 12: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Eregion models the behaviour of a group of pixels.
– Classifies a region.
– Encourages all the pixels in a region to take the same label.
– Groups of pixels are given by multiple mean-shift segmentations.

Example values shown for region c: Car 0.3, Road 0.1.

Page 13: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation

• Solved using the alpha-expansion algorithm [1].

[Figure: input image and the labelling after the road expansion move]

Object class labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

[1] Y. Boykov et al., “Fast approximate energy minimization via graph cuts,” in ICCV, 1999.

Pages 14-17: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Image Segmentation

[Figures: the input image with the labelling after the building, sky and pavement expansion moves, followed by the final solution]
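The expansion sequence shown on these slides can be imitated on a toy problem. In the sketch below each expansion move lets every node either keep its label or switch to the label alpha, and labels are cycled until no move lowers the energy. Real implementations solve each move with a graph cut; exhaustive enumeration is substituted here only because the example has four nodes. The plain Potts penalty, the function names and all costs are assumptions.

```python
from itertools import product


def energy(labels, unary, edges, penalty):
    """Unary costs plus a Potts penalty for every edge with differing labels."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    e += sum(penalty for i, j in edges if labels[i] != labels[j])
    return e


def expansion_move(labels, alpha, unary, edges, penalty):
    """Best labelling in which each node keeps its label or switches to alpha.

    Enumerates all 2^n keep/switch choices (a stand-in for the graph cut).
    """
    best, best_e = list(labels), energy(labels, unary, edges, penalty)
    for switch in product([False, True], repeat=len(labels)):
        cand = [alpha if s else l for s, l in zip(switch, labels)]
        e = energy(cand, unary, edges, penalty)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def alpha_expansion(init, label_set, unary, edges, penalty):
    """Cycle over labels, applying expansion moves until none lowers the energy."""
    labels, current = list(init), energy(init, unary, edges, penalty)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            cand, e = expansion_move(labels, alpha, unary, edges, penalty)
            if e < current:
                labels, current, improved = cand, e, True
    return labels, current


# Toy chain of 4 pixels with made-up costs, initialised to all "sky".
label_set = ["road", "building", "sky"]
unary = [{"road": 0.1, "building": 0.9, "sky": 0.9},
         {"road": 0.2, "building": 0.8, "sky": 0.9},
         {"road": 0.9, "building": 0.2, "sky": 0.8},
         {"road": 0.9, "building": 0.1, "sky": 0.7}]
edges = [(0, 1), (1, 2), (2, 3)]
result, result_energy = alpha_expansion(["sky"] * 4, label_set, unary, edges, 0.3)
```

On this chain the road expansion lowers the energy first and the building expansion then relabels the last two pixels, mirroring the move-by-move progression in the figures.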

Page 18: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Labelling

• Combine many labellings from street-level imagery.

[Figure: street-level labellings (input) passed through the automatic labeller to give a labelled ground plane (output)]

Page 19: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane CRF

• A CRF defined over the ground plane.
• Each ground plane pixel (zi) is a random variable taking a label from the label set.
• Energy for the ground plane CRF is

$$E^g(\mathbf{Z}) = E^g_{pix} + E^g_{pair}$$

Page 20: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• We assume a flat world.

[Figure: diagram labelled K, X and Z]

Page 21: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• A ground plane region is estimated.

[Figure: diagram labelled K, X and Z with the homography; legend: Road, Pavement, Post/Pole]

Page 22: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• Each point in the image projects to a unique point on the ground plane.
– Creating a homography.

[Figure: diagram labelled K, X and Z with the homography; legend: Road, Pavement, Post/Pole]
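Under the flat-world assumption, the image-to-ground mapping can be written down from the camera calibration. For a pinhole camera with intrinsics K, rotation R and translation t, points on the plane Z = 0 map to the image through H = K [r1 r2 t], and the inverse of H takes pixels back to the plane. The slides do not say how the homography is obtained, so the sketch below is one standard construction under assumed calibration values, not the authors' procedure.

```python
import numpy as np


def ground_to_image_homography(K, R, t):
    """Homography mapping ground-plane points (X, Y, 1) with Z = 0 to image pixels.

    For x ~ K [R | t] X_world, points with Z = 0 only use the first two
    columns of R, so H = K [r1 r2 t].
    """
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def image_to_ground(H, u, v):
    """Project pixel (u, v) onto the ground plane via the inverse homography."""
    p = np.linalg.solve(H, np.array([u, v, 1.0]))
    return p[:2] / p[2]


# Hypothetical calibration: optical axis perpendicular to the plane, 2 m away.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

H = ground_to_image_homography(K, R, t)
ground_xy = image_to_ground(H, 570.0, 240.0)
```

Here the pixel 250 columns to the right of the principal point lands 1 m along the X axis of the plane, since 250 px × 2 m / 500 px = 1 m.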

Page 23: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• The image labelling is mapped to the ground plane via the homography.

[Figure: diagram labelled K, X and Z with the homography and ground plane pixel histograms; legend: Road, Pavement, Post/Pole]

Page 24: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• Labels projected from many views are combined in a histogram.
• The normalised histogram gives the naïve probability of the ground plane pixel taking a label.

[Figure: diagram labelled K, X and Z with the homography and ground plane pixel histograms; legend: Road, Pavement, Post/Pole]
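The histogram step can be sketched as follows: every projected image pixel casts one vote for its label in the ground-plane cell it lands on, and each cell's histogram is then normalised. Converting the probability to a cost with a negative logarithm, the `eps` floor and the function name are assumptions; the slides only state that the histogram gives Egpix.

```python
import math
from collections import Counter, defaultdict


def ground_plane_unary(votes, label_set, eps=1e-6):
    """Turn projected label votes into per-cell probabilities and costs.

    votes : iterable of (cell, label) pairs, one per image pixel projected
            onto ground-plane cell `cell` from any view.
    Returns {cell: {label: (probability, cost)}} with cost = -log(probability);
    the -log cost and the eps floor are assumptions.
    """
    histograms = defaultdict(Counter)
    for cell, label in votes:
        histograms[cell][label] += 1

    unary = {}
    for cell, hist in histograms.items():
        total = sum(hist.values())
        unary[cell] = {}
        for label in label_set:
            prob = hist[label] / total      # normalised histogram entry
            unary[cell][label] = (prob, -math.log(max(prob, eps)))
    return unary


votes = [((0, 0), "road"), ((0, 0), "road"), ((0, 0), "road"),
         ((0, 0), "pavement"), ((0, 1), "pavement"), ((0, 1), "pavement")]
unary = ground_plane_unary(votes, ["road", "pavement", "post"])
```

Cell (0, 0) receives three road votes and one pavement vote, so road gets probability 0.75 and the lower cost.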


Page 26: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Labelling

• A histogram is built for every ground plane pixel, giving E^g_pix.
• A pairwise cost (E^g_pair) is added to induce smoothness.
– Contrast-sensitive Potts model.

Page 27: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Labelling

• Final CRF solution obtained using alpha expansion.

[Figure: ground plane map; label shown: Void]

Pages 28-32: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Labelling

• Final CRF solution obtained using alpha expansion.

[Figures: the ground plane map after the road, building, pavement and car expansion moves, followed by the final solution]

Page 33: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Dataset

• Subset of the images captured by the van.
– 14.8 km of track, 8000 images from each camera.
• Pixel-level labelled ground truth images. Dataset available [1].
• 13 object categories: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post.
• Training: 44 images; testing: 42 images.

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

Page 34: Automatic Dense Semantic Mapping From Visual Street-level Imagery

SIS Results

• Input images, output of our image-level CRF, and ground truths.
• Used the Automatic Labelling Environment [1].

[Figure: rows showing the input, the semantic segmentation and the ground truth]

Object class labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

[1] The Automatic Labelling Environment, L. Ladicky, P. H. S. Torr. Code available: http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm

Page 35: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Semantic Map Results

[Figure: semantic map of Pembroke city]

Page 36: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Map Evaluation

• We back-project the ground plane map into the image domain and evaluate the results.
• Global pixel accuracy of 86%.

[Figure: street images, back-projected map results and ground truth]
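The slides do not define the metric. Global pixel accuracy is usually taken to be the fraction of labelled ground-truth pixels whose predicted label is correct, and the sketch below follows that reading. Skipping pixels marked void, the function name and the toy label maps are assumptions.

```python
def global_pixel_accuracy(predicted, ground_truth, void_label="void"):
    """Fraction of labelled ground-truth pixels whose predicted label matches.

    Pixels whose ground truth is `void_label` are skipped (an assumption).
    """
    correct = total = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, gt in zip(pred_row, gt_row):
            if gt == void_label:
                continue
            total += 1
            correct += (pred == gt)
    return correct / total if total else 0.0


# Toy 2 x 3 label maps standing in for a back-projected map and its ground truth.
predicted = [["road", "road", "car"], ["pavement", "road", "car"]]
ground_truth = [["road", "road", "road"], ["pavement", "void", "car"]]
accuracy = global_pixel_accuracy(predicted, ground_truth)
```

Five pixels carry a ground-truth label in this toy case and four of them match.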

Page 37: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Results


Page 38: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Conclusions

• Presented a method to generate an overhead-view semantic map.
• Experiments on large tracks (~15 km), which can be scaled up to country-wide mapping.
• Dataset available [1].

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

Page 39: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Future Work

• Perform 3D street-level semantic mapping and reconstruction.
• Add detailed street-level information such as signs and information boards.

Thank you!!!

Oxford Brookes Vision Group, Oxford Brookes University
http://cms.brookes.ac.uk/research/visiongroup/index.php

Page 40: Automatic Dense Semantic Mapping From Visual Street-level Imagery
Page 41: Automatic Dense Semantic Mapping From Visual Street-level Imagery

Ground Plane Pixel Cost

• Using a single view creates a shadow effect for objects that violate the flat-world assumption, leading to wrong label estimates.

[Figure: single-view versus multi-view ground plane labelling, with diagram labelled K, X and Z and the homography; legend: Road, Pavement, Post/Pole]
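The shadow effect follows from similar triangles. A point at height h and horizontal distance d, seen from a camera at height hc, is placed by the flat-world projection at distance d·hc / (hc − h) along the viewing ray. The sketch below works through one example; the function name and the camera and post dimensions are made up for illustration.

```python
def flat_world_footprint(distance, point_height, camera_height):
    """Ground distance at which a flat-world projection places a raised point.

    Follows the ray from the camera through the point until it meets the ground.
    """
    if not 0 <= point_height < camera_height:
        raise ValueError("point must lie at or above ground and below the camera")
    return distance * camera_height / (camera_height - point_height)


# Hypothetical numbers: camera 2.5 m high, post standing 5 m away.
base = flat_world_footprint(5.0, 0.0, 2.5)   # base of the post
top = flat_world_footprint(5.0, 2.0, 2.5)    # point 2 m up the post
```

The base of the post stays at 5 m, but the point 2 m up is projected to 25 m, so a single view smears the post label along the ray behind it. Other views smear it in different directions, which is why combining votes from many views helps.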