
Tri-modal Human Body Segmentation

Master of Science Thesis

Cristina Palmero Cantariño

Advisor: Sergio Escalera Guerrero

February 6, 2014


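Outline

1 Introduction: Human body segmentation, Motivation, Proposal

2 Tri-modal dataset: Dataset, Scenes

3 Proposed baseline: Extraction of masks, Extraction of regions of interest, Feature extraction, Classifiers overview, Cell classification, Individual Prediction, Multi-modal fusion

4 Evaluation: Experimental methodology and validation measures, Experimental results

5 Conclusions and future work: Conclusions, Future work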


Human body segmentation

Segmentation: labeling problem.

Main challenges:

Different points of view.

Illumination changes.

Complex and cluttered backgrounds.

Presence of occlusions.

Articulated nature of the human body.

Diversity of poses.

Variable appearance.

[Figure: Segmentation using GrabCut]

Carsten Rother, Vladimir Kolmogorov, and Andrew Blake. “GrabCut: Interactive foreground extraction using iterated graph cuts”. In: ACM Transactions on Graphics (TOG). Vol. 23. 3. ACM. 2004, pp. 309–314.


Motivation

Applications:

Security.

Leisure.

Health.

Imaging modalities:

Mostly RGB cues from color cameras.

Recently, RGB-Depth cues (Microsoft® Kinect™).

Little attention to thermal.

Thermal Imaging:

The price of thermal sensors drops substantially every year.

Fewer intrinsic problems than RGB cues.

Lack of benchmarks comparing RGB-Depth-Thermal modalities.


Proposal

Tri-modal dataset

Novel tri-modal dataset of continuous image sequences.

RGB-Depth-Thermal modalities.

People interacting with everyday objects.

Baseline methodology

Automatic segmentation of people in video sequences in indoor scenarios with a fixed camera.

Use of state-of-the-art descriptors for feature extraction across modalities.

GMM modeling of subject/object regions.

Multi-modal fusion using several approaches.


Tri-modal dataset

Novel registered multi-modal dataset

RGB - Depth - Thermal modalities

3 different scenarios

3 continuous image sequences

More than 2,000 frames per sequence

RGB - Depth pixel-level registration

Thermal near pixel-level registration

Manually annotated ground truth

Registration algorithm provided

Tri-modal dataset

Scene 1, Scene 2, Scene 3

[Figures: one image slide per scene]


Extraction of masks

Background Subtraction

Limit the search space.

Non-adaptive background modeling using Mixture of Gaussians.

Select modality: depth.


Extraction of masks

Mask registration

[Figure: registration between Depth/RGB foreground masks and thermal foreground masks]


Extraction of regions of interest

Bounding box generation from regions of interest

People overlap:

Bimodal disparity distribution.

Otsu’s threshold to split regions.
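The splitting step above can be sketched as follows. This is a minimal pure-Python illustration, not the thesis implementation: it assumes disparities are quantized to integers in [0, 255], that a larger disparity means a closer person, and the function names `otsu_threshold`, `split_region` and `bounding_box` are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Values <= t form the lower class, values > t the upper class.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples in the lower class
    sum0 = 0  # sum of the values in the lower class
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def split_region(pixels):
    """Split (x, y, disparity) pixels of one region into near and far groups."""
    t = otsu_threshold([d for _, _, d in pixels])
    near = [(x, y) for x, y, d in pixels if d > t]   # assumed: larger disparity = closer
    far = [(x, y) for x, y, d in pixels if d <= t]
    return near, far


def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a list of (x, y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

Each of the two groups returned by `split_region` can then be given its own bounding box, which is one plausible way to separate two overlapping people whose disparity histogram is bimodal.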


Extraction of regions of interest

Bounding box transformation and correspondence to other modalities

All modalities must have the same number of bounding boxes, corresponding to the same regions of interest.

Tasks:

1 Find correspondence between RGB/depth and thermal regions of interest.

2 Compute the corresponding bounding boxes in thermal modality generated after applying Otsu’s threshold in depth modality.


1 Find correspondence between RGB/depth and thermal regions of interest.

Iterative search among depth and thermal modalities.

Takes into account deviation among them.

Best match: bounding box coordinates, amount of overlap and area similarity.

Correspondence function:

$$b^{thermal}_{iq} = \beta\left(b^{depth}_{ij}\right) \tag{1}$$

where $b_{ij}$ is the j-th bounding box in frame i.
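The slide names the matching criteria (coordinates, overlap, area similarity) but not how they are combined. The sketch below is therefore only an assumption: it scores each depth/thermal pair by intersection-over-union times an area ratio and matches greedily. Boxes are assumed to be `(x1, y1, x2, y2)` tuples already expressed in a common coordinate frame, and `match_boxes` and `min_score` are hypothetical names.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def area_similarity(a, b):
    """Ratio of the smaller area to the larger one, in [0, 1]."""
    big = max(area(a), area(b))
    return min(area(a), area(b)) / big if big else 0.0


def match_boxes(depth_boxes, thermal_boxes, min_score=0.1):
    """Greedy one-to-one matching; returns {j: q} in the role of beta in Eq. 1."""
    candidates = sorted(
        ((iou(d, t) * area_similarity(d, t), j, q)
         for j, d in enumerate(depth_boxes)
         for q, t in enumerate(thermal_boxes)),
        reverse=True,
    )
    beta, used = {}, set()
    for score, j, q in candidates:
        if score < min_score:
            break  # remaining pairs are too dissimilar to be the same region
        if j in beta or q in used:
            continue
        beta[j] = q
        used.add(q)
    return beta
```

A greedy assignment is the simplest choice here; the deviation between modalities mentioned above is not modeled in this sketch.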


2 Compute the corresponding bounding boxes in thermal modality generated after applying Otsu’s threshold in depth modality.

Assuming bounding boxes of both RGB/depth and thermal modalities are proportional, find the equivalence ratio to create the split bounding boxes in thermal.

Ratio k:

$$k_h = \frac{h_{b^{depth}_{ij}}}{h_{b^{thermal}_{iq}}}, \qquad k_w = \frac{w_{b^{depth}_{ij}}}{w_{b^{thermal}_{iq}}} \tag{2}$$

where h and w are the height and width of a given bounding box.
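As a worked illustration of Eq. 2, the sketch below computes the two ratios from a matched pair of parent boxes and uses them to place a split depth box inside the thermal box. How the ratio is applied is an assumption made here (offsets relative to the parent box are divided by k); the function name `transfer_split_box` is hypothetical.

```python
def transfer_split_box(depth_parent, thermal_parent, depth_split):
    """Map a split box from depth to thermal using the ratios of Eq. 2.

    All boxes are (x1, y1, x2, y2). The parent boxes are a matched
    depth/thermal pair; depth_split lies inside depth_parent.
    """
    dx1, dy1, dx2, dy2 = depth_parent
    tx1, ty1, tx2, ty2 = thermal_parent
    k_w = (dx2 - dx1) / (tx2 - tx1)  # width ratio depth / thermal
    k_h = (dy2 - dy1) / (ty2 - ty1)  # height ratio depth / thermal
    sx1, sy1, sx2, sy2 = depth_split
    return (tx1 + (sx1 - dx1) / k_w,
            ty1 + (sy1 - dy1) / k_h,
            tx1 + (sx2 - dx1) / k_w,
            ty1 + (sy2 - dy1) / k_h)
```

For example, with a depth parent box twice as large as its thermal counterpart (k = 2 in both directions), the right half of the depth box maps to the right half of the thermal box.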


Result:

Correspondence of regions of interest among modalities.

Grid partitioning: 2 × 2 cells per bounding box.

[Figure: corresponding regions of interest 1 and 2 in the RGB, Depth and Thermal modalities]


Extraction of regions of interest

Bounding box ground truth generation

Comparing overlap between:

Bounding boxes extracted from Ground Truth Masks

Bounding boxes extracted from Background Subtraction Masks

Label:

$$t^d_r = \begin{cases} 0 \ (\text{Object}) & \text{if } \mathrm{overlap} \le 0.1 \\ -1 \ (\text{Unknown}) & \text{if } 0.1 < \mathrm{overlap} < 0.6 \\ 1 \ (\text{Subject}) & \text{if } \mathrm{overlap} \ge 0.6 \end{cases}$$
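The labeling rule can be written directly as a small function. This is a sketch that takes the overlap value as given; how the overlap itself is computed for this step is not stated on the slide, and the name `ground_truth_label` is hypothetical.

```python
def ground_truth_label(overlap):
    """Label a region from its overlap with the ground-truth bounding box.

    Returns 0 (object), 1 (subject) or -1 (unknown), using the 0.1 and 0.6
    thresholds given above.
    """
    if overlap <= 0.1:
        return 0
    if overlap >= 0.6:
        return 1
    return -1
```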


Feature extraction: Color

Histogram of Oriented Gradients (HOG)

Unsigned gradients (0 - 180 degrees).

9-bin histogram.

Contribution to the histogram given by the vector magnitude.

No block overlap applied.

Final vector of 288 values per cell.

[Figure: HOG]

Navneet Dalal and Bill Triggs. “Histograms of oriented gradients for human detection”. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 1. IEEE. 2005, pp. 886–893.
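The core of such a descriptor is a magnitude-weighted orientation histogram. The sketch below shows only that step for one small patch of pixels, using central differences and hard bin assignment; it does not reproduce the block layout or normalization that lead to the 288-value vector, and `orientation_histogram` is a hypothetical name. The `signed` flag switches between the unsigned (0 - 180 degrees) and signed (0 - 360 degrees) conventions.

```python
import math


def orientation_histogram(patch, bins=9, signed=False):
    """Magnitude-weighted histogram of gradient orientations.

    patch is a 2-D list of intensities; border pixels are skipped because
    central differences need both neighbors.
    """
    span = 360.0 if signed else 180.0
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % span
            # "% bins" guards against angle rounding up to exactly span.
            hist[int(angle / span * bins) % bins] += magnitude
    return hist
```

Calling it with `bins=8, signed=True` gives the same kind of histogram over the full circle, which is the convention used for signed directions.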


Feature extraction: Color

Histogram of Oriented Optical Flow (HOOF)

Dense optical flow computation.

8-bin histogram.

Signed gradients (0 - 360 degrees).

Contribution to the histogram given by the vector magnitude.

Final vector of 8 values per cell.

[Figure: Optical flow]


Feature extraction: Color

Score Maps (SM)

Score map based on Gabor filters.

C = 6 component filters per body part.

M = 26 body parts.

L scales per image.

[Figure: Score maps from Ramanan et al.]

$$\mathrm{score}(p_l) = \frac{1}{C}\,\frac{1}{M} \sum_{c \in C} \sum_{m \in M} \mathrm{score}(p_l)_{mc} \tag{3}$$

$$\mathrm{score}(p) = \frac{1}{L} \sum_{l \in L} \mathrm{score}(p_l)' \tag{4}$$

Yi Yang and Deva Ramanan. “Articulated pose estimation with flexible mixtures-of-parts”. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE. 2011, pp. 1385–1392.


Feature extraction: Depth

Histogram of Oriented Depth Normals (HON)

Dense depth maps converted to 3D point cloud structures.

Surface normal computation.

Angle distribution quantized in an 8-bin histogram.

[Figure: Depth normals]


Feature extraction: Thermal

Histogram of Thermal Intensities and Oriented Gradients (HIOG)

Concatenation of 2 histograms:

1 Thermal intensities [0, 255].

2 Orientation of thermal gradients (similar to HOG).

8 bins per histogram.

[Figure: Thermal intensities and oriented gradients]


Classifiers overview

1 Statistical Learning:

Gaussian Mixture Models (GMM)

Subject and object probabilities

Individual prediction

2 Multi-modal fusion approaches:

Naive approach

Discriminative classifiers

Stacked learning fashion


Cell classification

Gaussian Mixture Models

Unsupervised learning method for fitting multiple Gaussians to a set of multi-dimensional data points to obtain a likelihood L.

Trained using the Expectation Maximization algorithm.

$$L = \prod_{x \in X} \sum_{k=1}^{K} p(x \mid k)\, P(k) \tag{5}$$

[Diagram: the cell-based descriptors (HOG, HOOF, HON, HIOG) are fed to the GMM classifier h1, which outputs y1]
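To make the likelihood concrete, the sketch below evaluates the log-likelihood of a set of points under an already-trained mixture, where each point contributes a weighted sum of component densities. It assumes diagonal covariances and hand-written parameters; the EM training step is not shown, and `gaussian_pdf` and `log_likelihood` are hypothetical names.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x (lists of floats)."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p


def log_likelihood(points, weights, means, variances):
    """Sum over points of log( sum_k P(k) * p(x | k) )."""
    total = 0.0
    for x in points:
        mixture = sum(w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances))
        total += math.log(mixture)
    return total
```

With one mixture fitted to subject cells and another to object cells, comparing the two log-likelihoods of a cell descriptor gives the subject and object scores used in the prediction step. Working in the log domain avoids numerical underflow when many points are multiplied together.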


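Individual Prediction

[Diagram: overall pipeline. Cell-based descriptors (HOG, HOOF, HON, HIOG) feed the GMM h1, producing y1; the pixel-based descriptor (SM) produces y2. Blocks shown: Individual Prediction (h2), Normalization (y1'), Pixel to cell description + Normalization, and Multi-modal Fusion (h3) over {y1' ∪ y2}, producing y3.]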


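Individual Prediction

Cell-based

Predict if a region corresponds to subject or object, for each cell-based descriptor individually. Grid cell voting v:

$$v = \sum_{i,j} \mathbf{1}\left\{ L^{d,sub}_{ij} > L^{d,obj}_{ij} \right\} \tag{6}$$

Based on a threshold $v_{thr}$ that defines the minimum number of positive votes needed to assign the subject label to the given region:

$$v_{thr} = \frac{v_{grid}\, h_{grid}}{2} \tag{7}$$

Final decision $t^d_r$:

$$t^d_r = \mathbf{1}\left\{ v > v_{thr} \right\} \lor \left\{ \mathbf{1}\left\{ v = v_{thr} \right\} \cdot \mathbf{1}\left\{ \sum_{i,j} \left( L^{d,sub}_{ij} - L^{d,obj}_{ij} \right) > 0 \right\} \right\} \tag{8}$$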
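Equations 6 to 8 amount to a majority vote over grid cells with a tie-break on the summed likelihood difference. A minimal sketch, assuming the per-cell likelihoods are given as two 2-D lists and that v_grid and h_grid are the number of grid rows and columns; `cell_vote_decision` is a hypothetical name.

```python
def cell_vote_decision(l_sub, l_obj):
    """Return 1 (subject) or 0 (object) for one region and one descriptor.

    l_sub[i][j] and l_obj[i][j] are the subject and object likelihoods of
    grid cell (i, j).
    """
    v_grid, h_grid = len(l_sub), len(l_sub[0])
    cells = [(i, j) for i in range(v_grid) for j in range(h_grid)]
    votes = sum(1 for i, j in cells if l_sub[i][j] > l_obj[i][j])   # Eq. 6
    v_thr = v_grid * h_grid / 2                                     # Eq. 7
    margin = sum(l_sub[i][j] - l_obj[i][j] for i, j in cells)
    return int(votes > v_thr or (votes == v_thr and margin > 0))   # Eq. 8
```

With a 2 × 2 grid the threshold is 2, so a 2-2 split is possible and the tie-break term decides the label.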


Individual Prediction

Pixel-based

Prediction of a given region defined by:

α: minimum score of a pixel to be considered as a person.

η: minimum percentage of pixels inside a region considered as person that are needed to label the whole region as a person.

Final decision $t^d_r$:

$$t^d_r = \mathbf{1}\left\{ \frac{1}{N_r} \sum_{i=1}^{N_r} \mathbf{1}\left\{ \mathrm{score}(p_i) > \alpha \right\} > \eta \right\} \tag{9}$$

where $N_r$ denotes the number of pixels of a region.
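Equation 9 can be sketched in a few lines. The example assumes the region's pixel scores are given as a flat list and that η is expressed as a fraction in [0, 1]; `pixel_based_decision` is a hypothetical name.

```python
def pixel_based_decision(scores, alpha, eta):
    """Return 1 if the fraction of pixels scoring above alpha exceeds eta."""
    n_r = len(scores)
    fraction = sum(1 for s in scores if s > alpha) / n_r
    return int(fraction > eta)
```

Both comparisons are strict, as in the equation, so a region whose person fraction equals η exactly is labeled as object.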


Multi-modal fusion

Naive approach

1 Voting among all descriptors using individual predictions $t^d_r$.

2 If there is strong agreement among descriptors, those that disagree are not taken into account in the third step.

3 Cell level fusion:

$$L^{sub}_{ij} = \sum_{d \in D'} L^{d,sub}_{ij}, \qquad L^{obj}_{ij} = \sum_{d \in D'} L^{d,obj}_{ij} \tag{10}$$

where D′ is the set of descriptors kept after step 2.

4 Predict $t_r$ following the same procedure as in individual prediction.
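The four steps above can be sketched as follows. The slide does not quantify "strong agreement", so the `agreement` fraction used here is purely an assumption, as are the function names and the flat per-cell list layout.

```python
def decide(l_sub, l_obj):
    """Cell voting with tie-break on flat per-cell likelihood lists."""
    votes = sum(1 for s, o in zip(l_sub, l_obj) if s > o)
    v_thr = len(l_sub) / 2
    margin = sum(s - o for s, o in zip(l_sub, l_obj))
    return int(votes > v_thr or (votes == v_thr and margin > 0))


def naive_fusion(l_sub, l_obj, predictions, agreement=0.75):
    """Fuse descriptors for one region.

    l_sub[d] and l_obj[d] are per-cell likelihood lists for descriptor d;
    predictions[d] is its individual decision (0 or 1). Returns the fused
    decision and the list of descriptors that were kept.
    """
    names = list(predictions)
    ones = sum(predictions[d] for d in names)
    majority = 1 if 2 * ones >= len(names) else 0
    share = max(ones, len(names) - ones) / len(names)
    if share >= agreement:   # assumed meaning of "strong agreement"
        kept = [d for d in names if predictions[d] == majority]
    else:
        kept = names
    n_cells = len(l_sub[kept[0]])
    fused_sub = [sum(l_sub[d][c] for d in kept) for c in range(n_cells)]
    fused_obj = [sum(l_obj[d][c] for d in kept) for c in range(n_cells)]
    return decide(fused_sub, fused_obj), kept
```

Dropping the dissenting descriptor matters when its likelihoods are extreme: in a case where three descriptors mildly favor the subject and one strongly favors the object, the fused decision flips depending on whether the outlier is summed in.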


Multi-modal fusion

SVM-based approach

Discriminative supervised binary classifier that learns a model which represents the instances as points in space, mapped in such a way that instances of different classes are separated by a hyperplane in a high dimensional space.

Approaches:

Simple: $\{L^{d,sub}_{ij}, L^{d,obj}_{ij}\}$

Stacked: $\{L^{d,sub}_{ij}, L^{d,obj}_{ij}, t^d_r\}$
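The difference between the two approaches is only in the feature vector handed to the SVM. The sketch below builds both variants for one region; the ordering of the features is an assumption, the classifier itself is not shown, and `fusion_features` is a hypothetical name.

```python
def fusion_features(l_sub, l_obj, predictions=None):
    """Build the SVM input vector for one region.

    l_sub[d] and l_obj[d] are per-cell likelihood lists for descriptor d.
    With predictions=None the simple vector is returned; otherwise the
    individual decision of each descriptor is appended (stacked variant).
    """
    features = []
    for d in l_sub:
        features.extend(l_sub[d])
        features.extend(l_obj[d])
        if predictions is not None:
            features.append(predictions[d])
    return features
```

In the stacked variant the output of the first-stage predictors becomes an input of the second-stage classifier, which is what makes it a stacked learning scheme.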


Experimental methodology and validation measures

10-fold cross validation.

Grid search to optimize all the parameters.

Training without unknown labels.

GMM with 3 Gaussian components.

SVM approaches for multi-modal fusion (simple and stacked):

Linear

RBF

Don’t care region (DCR).

Segmentation accuracy measure: Jaccard Index

$$\mathrm{overlap}(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{11}$$
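A minimal sketch of the overlap measure on sets of pixel coordinates follows. The optional `dont_care` argument excludes a set of pixels from both masks before the overlap is computed; how the don't care region is built is not specified here, so that part is an assumption, as is returning 1.0 when both masks are empty.

```python
def jaccard(a, b, dont_care=()):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of (x, y) pixels."""
    ignored = set(dont_care)
    a = set(a) - ignored
    b = set(b) - ignored
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return len(a & b) / len(union)
```

Ignoring pixels near the silhouette boundary makes the score less sensitive to small annotation or registration errors, which is why the reported overlap grows with the size of the region.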


Quantitative results

[Figure (a): Individual prediction. Overlap versus DCR (pixels) for HOG, SM, HOOF, HIOG and HON.]

[Figure (b): Naive fusion. Overlap versus DCR (pixels) for the Thermal and Color/Depth modalities.]

[Figure (c): Fusion using Simple Linear SVM. Overlap versus DCR (pixels) for the Thermal and Color/Depth modalities.]

[Figure (d): Fusion using Simple RBF SVM. Overlap versus DCR (pixels) for the Thermal and Color/Depth modalities.]

[Figure (e): Fusion using Stacked Linear SVM. Overlap versus DCR (pixels) for the Thermal and Color/Depth modalities.]

[Figure (f): Fusion using Stacked RBF SVM. Overlap versus DCR (pixels) for the Thermal and Color/Depth modalities.]


Table: Overlap results of the individual predictions for each descriptor

DCR (pixels)   HOG       SM        HOOF      HIOG      HON
0              62.10 %   63.12 %   56.97 %   46.35 %   56.76 %
1              64.71 %   65.85 %   59.41 %   47.99 %   59.09 %
3              67.59 %   69.02 %   62.13 %   50.85 %   61.70 %
5              68.65 %   70.40 %   63.20 %   53.02 %   62.77 %
7              68.65 %   70.72 %   63.28 %   54.45 %   62.94 %

Table: Overlap results of fusion using Stacked Linear SVM for each modality

DCR (pixels)   Thermal   Color/Depth
0              49.64 %   64.65 %
1              51.33 %   67.39 %
3              54.29 %   70.43 %
5              56.56 %   71.58 %
7              58.11 %   71.63 %


Qualitative results

Comparison between masks generated after background subtraction and masks generated using Stacked Linear SVM.


Conclusions

A solution for human body segmentation in multi-modal data has been proposed.

A novel tri-modal dataset has been presented, containing RGB - Depth - Thermal modalities.

Results show variable performance for the different modalities when segmenting people in multi-modal data, and improved segmentation accuracy of the multi-modal GMM-SVM stacked learning method.


Future work

Submission to an impact-factor journal in progress, in collaboration with Aalborg University and the HuPBA group.

Silhouette mask refinement using GrabCut.

If pixel-level registration is available among all modalities:

Combination of different modalities in background subtraction.

Pixel-level feature extraction.

Pixel-level description.

Extensive validation in real surveillance scenarios as a first real case study, including gesture recognition methodologies (planning just started with Aalborg University).


Thank you.
