Automatic Dense Semantic Mapping From Visual Street-level Imagery

Sunando Sengupta[1], Paul Sturgess[1], Lubor Ladicky[2], Philip H.S. Torr[1]

[1] Oxford Brookes University
[2] Visual Geometry Group, Oxford University

http://cms.brookes.ac.uk/research/visiongroup/index.php

Dense Semantic Map

• Generate an overhead view of an urban region.
• Every pixel in the map view is associated with an object class label.

Legend: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Dense Semantic Map

• Street images are captured inexpensively from a vehicle with multiple mounted cameras[1].

[1] Yotta DCL, “Yotta DCL case studies.” Available: http://www.yottadcl.com/surveys/case-studies/

Semantic Mapping Framework

• The semantic mapping framework comprises two stages:
– Semantic image segmentation at street level.
– Ground plane labelling at a global level.

• One of the first attempts at overhead mapping from street-level images.

Diagram: street-level image acquisition, image segmentation, ground plane labelling

Semantic Image Segmentation

Label every pixel in the image with an object class


Diagram: Input (raw image), Automatic Labeller, Output (labelled image)

Object class labels shown: Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Semantic Image Segmentation

• We use a Conditional Random Field (CRF) framework.

Diagram: Input Image, CRF construction, Final Segmentation

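Semantic Image Segmentation - CRF

• Each pixel is a node in a grid graph G = (V, E).
• Each node is a random variable x_i taking a label from the label set.

• Total energy:

$$E(\mathbf{x}) = \underbrace{\sum_{i \in V} \psi_i(x_i)}_{E_{pix}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{ij}(x_i, x_j)}_{E_{pair}} + \underbrace{\sum_{c \in C} \psi_c(\mathbf{x}_c)}_{E_{region}}$$

• Optimal labelling given as:

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})$$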
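As a rough illustration of how the three terms add up, the sketch below evaluates the total energy of one candidate labelling on a tiny 2x2 image. The costs, the constant pairwise weight, and the all-or-nothing region penalty are assumptions chosen for the example, not the potentials used in the talk.

```python
# Minimal sketch: evaluate E = Epix + Epair + Eregion for one labelling.
# All numbers and the simplified region model are illustrative assumptions.

def crf_energy(labels, unary, pair_weight, regions, region_penalty):
    """labels: 2D list of label names; unary: 2D list of {label: cost} dicts."""
    h, w = len(labels), len(labels[0])

    # Epix: sum of the per-pixel costs of the chosen labels.
    e_pix = sum(unary[r][c][labels[r][c]] for r in range(h) for c in range(w))

    # Epair: Potts cost on 4-connected neighbours (right and down edges).
    e_pair = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w and labels[r][c] != labels[r][c + 1]:
                e_pair += pair_weight
            if r + 1 < h and labels[r][c] != labels[r + 1][c]:
                e_pair += pair_weight

    # Eregion: fixed penalty for every region whose pixels disagree.
    e_region = 0.0
    for region in regions:
        if len({labels[r][c] for r, c in region}) > 1:
            e_region += region_penalty

    return e_pix + e_pair + e_region


# Hypothetical 2x2 example: top row prefers Car, bottom row prefers Road.
unary = [
    [{"Car": 0.2, "Road": 0.3}, {"Car": 0.2, "Road": 0.3}],
    [{"Car": 0.9, "Road": 0.1}, {"Car": 0.9, "Road": 0.1}],
]
regions = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]  # one region per row

mixed = [["Car", "Car"], ["Road", "Road"]]
all_road = [["Road", "Road"], ["Road", "Road"]]

e_mixed = crf_energy(mixed, unary, 0.5, regions, 1.0)
e_all_road = crf_energy(all_road, unary, 0.5, regions, 1.0)
print(e_mixed, e_all_road)
```

With these made-up numbers the smoothness term outweighs the weak Car preference in the top row, so the all-Road labelling has the lower energy.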

Semantic Image Segmentation - CRF

• Total energy E = Epix + Epair + Eregion

• Epix - Models each individual pixel’s cost of taking a label.

– Computed via the dense boosting approach.
– Multi-feature variant of TextonBoost[1].

Example costs for pixel x: Car 0.2, Road 0.3

[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.



• Epair - Models the interactions between neighbouring pixels.

– Encourages label consistency in adjacent pixels.

– Sensitive to edges in images.

– Contrast-sensitive Potts model.

Epair cost for neighbouring pixels xi and xj:

            xj = Car    xj = Road
xi = Car    0           g(i,j)
xi = Road   g(i,j)      0
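The slides do not spell out g(i,j). A common choice for a contrast-sensitive weight is a constant plus an exponential of the squared colour difference; the sketch below assumes that form, and the parameter names (`theta_p`, `theta_v`, `theta_beta`) and their values are hypothetical.

```python
import math


def contrast_weight(colour_i, colour_j, theta_p=1.0, theta_v=4.0, theta_beta=0.01):
    """Assumed g(i,j): high for similar colours, low across strong edges."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(colour_i, colour_j))
    return theta_p + theta_v * math.exp(-theta_beta * sq_diff)


def pairwise_cost(label_i, label_j, colour_i, colour_j):
    """Contrast-sensitive Potts cost: 0 if the labels agree, g(i,j) otherwise."""
    if label_i == label_j:
        return 0.0
    return contrast_weight(colour_i, colour_j)


# Two neighbouring pixels with identical colour versus a black/white edge.
flat_cost = pairwise_cost("Car", "Road", (120, 120, 120), (120, 120, 120))
edge_cost = pairwise_cost("Car", "Road", (0, 0, 0), (255, 255, 255))
same_label_cost = pairwise_cost("Car", "Car", (0, 0, 0), (255, 255, 255))
print(flat_cost, edge_cost, same_label_cost)
```

A label change is cheap where the image has a strong edge and expensive inside a uniform area, which is what makes the model sensitive to edges.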



• Eregion - Models the behaviour of a group of pixels.

– Classifies a region.
– Encourages all the pixels in a region to take the same label.
– Groups of pixels are given by multiple mean-shift segmentations.

Example costs for region c: Car 0.3, Road 0.1

Semantic Image Segmentation

• Solved using the alpha-expansion algorithm[1].

Animation frames (each shown beside the input image): Road expansion, Building expansion, Sky expansion, Pavement expansion, final solution.

[1] Y. Boykov et al., “Fast Approximate Energy Minimization via Graph Cuts,” ICCV 1999.
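The expansion loop shown in the animation can be sketched on a toy problem. In the sketch below each expansion move is found by exhaustive search over a five-node chain rather than by a graph cut, so it only illustrates the outer loop; the chain, the costs, and the constant Potts weight are assumptions made for the example.

```python
from itertools import product


def energy(labels, unary, pair_weight):
    """Chain energy: unary costs plus a Potts cost between neighbours."""
    e = sum(unary[i][label] for i, label in enumerate(labels))
    e += sum(pair_weight for a, b in zip(labels, labels[1:]) if a != b)
    return e


def expansion_move(labels, alpha, unary, pair_weight):
    """Best move in which every node keeps its label or switches to alpha.

    Exhaustive search stands in for the graph cut used in practice.
    """
    best, best_e = list(labels), energy(labels, unary, pair_weight)
    for switch in product([False, True], repeat=len(labels)):
        cand = [alpha if s else label for s, label in zip(switch, labels)]
        e = energy(cand, unary, pair_weight)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def alpha_expansion(unary, label_set, pair_weight, init):
    """Cycle over the labels until no expansion move lowers the energy."""
    labels = [init] * len(unary)
    current = energy(labels, unary, pair_weight)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            new_labels, new_e = expansion_move(labels, alpha, unary, pair_weight)
            if new_e < current:
                labels, current, improved = new_labels, new_e, True
    return labels, current


# Hypothetical per-node costs for a five-node chain.
unary = [
    {"Road": 0.1, "Building": 0.9, "Sky": 0.9},
    {"Road": 0.2, "Building": 0.8, "Sky": 0.9},
    {"Road": 0.6, "Building": 0.5, "Sky": 0.9},
    {"Road": 0.9, "Building": 0.1, "Sky": 0.8},
    {"Road": 0.9, "Building": 0.2, "Sky": 0.7},
]
final_labels, final_energy = alpha_expansion(
    unary, ["Road", "Building", "Sky"], pair_weight=0.3, init="Sky"
)
print(final_labels, final_energy)
```

Each pass mirrors one frame of the animation: the Road expansion lets any node switch to Road, then the Building expansion, and so on, until a full cycle brings no improvement.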




Ground Plane Labelling

• Combine many labellings from street-level imagery.

Diagram: Input (street-level labellings), Automatic Labeller, Output (labelled ground plane)

Ground Plane CRF

• A CRF defined over the ground plane.

• Each ground plane pixel (zi) is a random variable taking a label from the label set.

• Energy for the ground plane CRF is

$$E^{g}(Z) = E^{g}_{pix} + E^{g}_{pair}$$

Ground Plane Pixel Cost

• We assume a flat world.

• A ground plane region is estimated.

• Each point in the image projects to a unique point on the ground plane.
– Creating a homography.

• The image labelling is mapped to the ground plane via the homography.

• Labels projected from many views are combined in a histogram.

• The normalised histogram gives the naïve probability of the ground plane pixel taking a label.

Diagram labels: K, X, Z; Homography; ground plane pixel histograms (Road, Pavement, Post/Pole)
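Mapping an image pixel to the ground plane through a homography amounts to a 3x3 matrix product in homogeneous coordinates followed by a division. The sketch below shows that step only; the matrices are made-up examples, not calibrated values from the dataset, and `apply_homography` is a hypothetical helper name.

```python
def apply_homography(H, point):
    """Map an image pixel (u, v) to ground plane coordinates (x, y)."""
    u, v = point
    # Multiply H by the homogeneous pixel (u, v, 1).
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in H)
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to infinity on the ground plane")
    return (x / w, y / w)


# Hypothetical affine homography: scale by 2, then shift by (1, 3).
H_affine = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 3.0],
    [0.0, 0.0, 1.0],
]
# Hypothetical projective homography: the scale depends on the image row.
H_projective = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 1.0],
]

ground_a = apply_homography(H_affine, (4.0, 5.0))
ground_b = apply_homography(H_projective, (2.0, 2.0))
print(ground_a, ground_b)
```

Applying this to every labelled pixel in the estimated ground plane region of an image gives the per-view votes that are later accumulated in the histograms.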

Ground Plane Labelling

• A histogram is built for every ground plane pixel, giving Egpix.

• A pairwise cost (Egpair) is added to induce smoothness.

– Contrast-sensitive Potts model.
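One way to turn the per-pixel histogram into a cost is sketched below: the votes projected from many views are counted, normalised into a naïve probability, and converted with a negative logarithm. The negative-log conversion and the `eps` floor are assumptions for this example; the slides state only that the histogram gives Egpix.

```python
import math
from collections import Counter


def ground_pixel_costs(projected_labels, label_set, eps=1e-6):
    """Turn the labels projected onto one ground plane pixel into costs."""
    counts = Counter(projected_labels)
    total = sum(counts.values())
    costs = {}
    for label in label_set:
        # Normalised histogram; uniform if no view saw this pixel.
        prob = counts[label] / total if total else 1.0 / len(label_set)
        # Assumed conversion: cost = -log(probability), floored at eps.
        costs[label] = -math.log(max(prob, eps))
    return costs


# Four views voted for this ground plane pixel.
votes = ["Road", "Road", "Road", "Pavement"]
costs = ground_pixel_costs(votes, ["Road", "Pavement", "Post/Pole"])
best_label = min(costs, key=costs.get)
print(costs, best_label)
```

A label seen in most views gets a low cost, and a label never seen gets a large but finite cost, so the pairwise term can still override it.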

Ground Plane Labelling

• Final CRF solution obtained using alpha expansion.

Animation frames: Void, Road expansion, Building expansion, Pavement expansion, Car expansion, final solution.


Dataset

• Subset of the images captured by the van.
– 14.8 km of track, 8000 images from each camera.

• Pixel-level labelled ground truth images. Dataset available[1].

• 13 object categories.

• Training - 44 images, testing - 42 images.

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

SIS Results

• Input images, output of our image-level CRF, and ground truths.
• Used the Automatic Labelling Environment[1].

Figure rows: Input, Semantic segmentation, Ground Truth

[1] The Automatic Labelling Environment, L. Ladicky, P. H. S. Torr. Code available: http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm

Semantic Map Results

Semantic map of Pembroke city


Ground Plane Map Evaluation

Figure rows: Street Images, Back-projected Map Results, Ground Truth

• We back-project the ground plane map into the image domain and evaluate the results.

• Global pixel accuracy of 86%.
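Global pixel accuracy is the fraction of evaluated pixels whose predicted label matches the ground truth. The sketch below computes it for two small label grids; skipping pixels marked `Void` in the ground truth is an assumption about the evaluation protocol, and the grids are invented for illustration.

```python
def global_pixel_accuracy(predicted, ground_truth, void_label="Void"):
    """Fraction of non-void ground truth pixels that are labelled correctly."""
    correct = 0
    counted = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for pred, gt in zip(pred_row, gt_row):
            if gt == void_label:
                continue  # assumption: unlabelled pixels are not evaluated
            counted += 1
            correct += pred == gt
    return correct / counted if counted else 0.0


# Hypothetical back-projected map labels and ground truth for a 2x3 image.
predicted = [["Road", "Road", "Car"], ["Road", "Pavement", "Car"]]
ground_truth = [["Road", "Road", "Road"], ["Road", "Pavement", "Void"]]

accuracy = global_pixel_accuracy(predicted, ground_truth)
print(accuracy)
```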

Results


Conclusions

• Presented a method to generate an overhead-view semantic map.

• Experiments on large tracks (~15 km), which can be scaled up to country-wide mapping.

• Dataset available[1].

[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

Future Work

• Perform a 3D street-level semantic mapping and reconstruction.

• Add detailed street-level information such as signs and information boards.

Thank you!!!


• Using a single view creates a shadow effect for objects that violate the flat-world assumption, giving a wrong label estimate.

Figure panels: Single view, Multi-view
