2009.06.09 chris poppe - public phd defense


DESCRIPTION

Chris Poppe's public PhD defense entitled: "Detection and Representation of Moving Objects for Video Surveillance", 9th of June, 2009.

TRANSCRIPT

Page 1: 2009.06.09   chris poppe - public PhD defense

ELIS – Multimedia Lab

Detectie en representatie van bewegende objecten voor videobewaking (Dutch title)

Detection and Representation of Moving Objects for Video Surveillance

Chris Poppe

Multimedia Lab
Department of Electronics and Information Systems

Faculty of Engineering
Ghent University

Supervisor: prof. dr. ir. Rik Van de Walle

Page 2

Detection and Representation of Moving Objects for Video Surveillance Chris Poppe

Ghent, Belgium – June 9 2009

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Page 3

Introduction: Video Surveillance

• “Usage of a video camera to act upon crime”

• The number of cameras and surveillance systems has grown
  – 2004: 4 285 000 cameras in the United Kingdom

• Operators have trouble interpreting the increasing amount of data
  – need for intelligent video surveillance systems

Page 4

Introduction: Intelligent Video Surveillance System

[Figure: system components encoding, video analytics, storage and visualization; data exchanged: video + metadata]

Page 5

Introduction: Video Surveillance

• Automated analysis of the video to make intelligent decisions


[Figure: analytics labels the detected persons (person1, person2) and raises an “intruder alert!!!”]

Analytics:
1. detection of moving objects
2. tracking
3. classification
4. identification
5. interpretation

Page 6

Introduction: Moving Object Detection

• Detection of moving objects is the first step in video analytics
  – needs to be fast and accurate

• Classify each pixel in the image as foreground or background

• Current techniques
  – good for “simple” situations
  – problems with moving trees, changing lighting conditions, environmental conditions, …

• Goal
  – fast and robust detection of moving objects

Page 7

Introduction: Moving Object Representation

• Analytics extracts information (e.g., moving objects) from video
  – represented using standardized formats (metadata standards)

• Large video surveillance systems contain several analytics modules
  – the same information can be represented using different formats

• To retrieve relevant information (e.g., find all moving objects), a common understanding of this information is needed

• Goal
  – provide means to combine different metadata standards

[Figure: analytics information → metadata standard]

Page 8

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Page 9

Moving Object Detection in the Pixel Domain

• Background subtraction
  – create a background model for each pixel
  – compare new images with the background model
  – large differences result in foreground objects

• Different background models have been proposed in the literature
  – previous value, average value, …

[Figure: new image - background model = result]
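The background subtraction principle above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested Python lists and uses the “average value” background model from the slide, implemented here as an exponential running average; the names `subtract` and `update` and the values of `ALPHA` and `THRESHOLD` are hypothetical and not taken from the thesis.

```python
# Minimal sketch: per-pixel background subtraction with a running-average
# background model. ALPHA and THRESHOLD are hypothetical illustration values.
from typing import List

Frame = List[List[float]]
Mask = List[List[bool]]

ALPHA = 0.05      # hypothetical learning rate of the running average
THRESHOLD = 30.0  # hypothetical grey-level difference that marks foreground


def subtract(background: Frame, image: Frame, threshold: float = THRESHOLD) -> Mask:
    """True where |new image - background model| exceeds the threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


def update(background: Frame, image: Frame, alpha: float = ALPHA) -> Frame:
    """Blend the new image into the background model (exponential running average)."""
    return [
        [(1 - alpha) * b + alpha * p for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


if __name__ == "__main__":
    background = [[100.0, 100.0], [100.0, 100.0]]
    image = [[102.0, 180.0], [98.0, 101.0]]  # one bright pixel: a moving object
    print(subtract(background, image))  # [[False, True], [False, False]]
    background = update(background, image)
```

A single model like this is exactly what breaks down for moving trees, shadows or a parked car, which motivates the multimodal approach.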

Page 10

Moving Object Detection in the Pixel Domain

• Problems with background subtraction
  1. moving trees, opened or closed doors, construction works, …
     • a single static model is insufficient
  2. noise, weather conditions, shadows, …
     • the model needs to accommodate such situations
  3. parked car
     • need to gather information on both background and foreground

• Solution: multimodal background subtraction
  1. multiple models per pixel
  2. each model contains several dynamic parameters
  3. models can represent both background and foreground

[Figure: per-pixel models: two background models and one foreground model, each holding noise statistics, previous value and average value]

Page 11

Multimodal Background Subtraction

For each new image:
1. compare the pixel value with the models
   • find a match with one of the models
2. adapt the parameters of the models
3. decide based on the matched model

[Figure: three models per pixel (model 1 to model 3): two background models and one foreground model, each holding noise statistics, previous value and average value; example outcome: pixel is background]
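A simplified single-pixel version of the compare / adapt / decide loop above might look as follows. The matching rule (distance to the average value measured in noise standard deviations), the learning rate, the promotion rule that turns a long-lived foreground model into background (the parked-car case) and all names are assumptions for illustration; the thesis's actual update equations may differ.

```python
# Sketch of multimodal background subtraction for a single pixel.
# Matching rule, learning rate and promotion rule are hypothetical choices.
from dataclasses import dataclass
from typing import List

MATCH_SIGMAS = 2.5    # hypothetical: match if |value - average| < 2.5 * std
LEARNING_RATE = 0.05  # hypothetical adaptation speed of the parameters
PROMOTE_AFTER = 100   # hypothetical: a model matched this often becomes background
MAX_MODELS = 3        # three models per pixel, as in the slide's figure


@dataclass
class Model:
    average: float       # average value
    variance: float      # noise statistics
    previous: float      # previous value
    is_background: bool
    hits: int = 0        # hypothetical match counter used for promotion

    def matches(self, value: float) -> bool:
        return abs(value - self.average) < MATCH_SIGMAS * self.variance ** 0.5

    def adapt(self, value: float) -> None:
        diff = value - self.average
        self.average += LEARNING_RATE * diff
        self.variance += LEARNING_RATE * (diff * diff - self.variance)
        self.previous = value
        self.hits += 1
        if self.hits >= PROMOTE_AFTER:
            self.is_background = True  # e.g. a parked car becomes background


def classify(models: List[Model], value: float, initial_variance: float = 25.0) -> bool:
    """Return True if the pixel value is background; updates `models` in place."""
    for model in models:
        if model.matches(value):        # 1. compare and find a match
            model.adapt(value)          # 2. adapt the matched model's parameters
            return model.is_background  # 3. decide based on the matched model
    # No match: start a new foreground model, replacing the least-used one if full.
    if len(models) >= MAX_MODELS:
        models.remove(min(models, key=lambda m: m.hits))
    models.append(Model(value, initial_variance, value, is_background=False))
    return False
```

Keeping a foreground model instead of discarding unmatched values is what lets an object that stays put gradually become part of the background.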

Page 12

Multimodal Background Subtraction

• Each pixel in the image has been classified as foreground or background

• Problem of “camouflage”
  – moving objects can contain parts that resemble the environment

• Using only temporal information is not sufficient

Page 13

Spatio-Temporal Multimodal Background Subtraction

• Use spatial information to improve the temporal background subtraction
  – spatial segmentation
    • edge detection
    • fill the segments
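The spatial segmentation step (edge detection, then filling the segments) can be sketched as below. A plain forward-difference gradient and a 4-connected flood fill stand in for whatever edge detector and filling method the thesis uses; `EDGE_THRESHOLD` and the function names are hypothetical. Edge pixels are left unlabeled (-1) in this sketch.

```python
# Sketch of spatial segmentation: mark edges, then fill the regions between them.
# Gradient operator, EDGE_THRESHOLD and 4-connectivity are hypothetical choices.
from collections import deque
from typing import List

EDGE_THRESHOLD = 20.0  # hypothetical gradient magnitude that marks an edge


def edge_map(image: List[List[float]], threshold: float = EDGE_THRESHOLD) -> List[List[bool]]:
    h, w = len(image), len(image[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][x]  # forward differences
            gy = image[min(y + 1, h - 1)][x] - image[y][x]
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return edges


def fill_segments(edges: List[List[bool]]) -> List[List[int]]:
    """Label 4-connected regions of non-edge pixels; edge pixels get label -1."""
    h, w = len(edges), len(edges[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if edges[sy][sx] or labels[sy][sx] != -1:
                continue
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one segment
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and labels[ny][nx] == -1:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```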

Page 14

Spatio-Temporal Multimodal Background Subtraction

• Combine spatial segmentation with temporal detection
  – segments containing many foreground pixels are regarded entirely as foreground

[Figure: temporal, spatial and spatio-temporal detection results]
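Given segment labels (for example from a flood-fill segmentation) and a temporal foreground mask, the combination rule above can be sketched as follows. The slide does not quantify “many” foreground pixels, so `MIN_FOREGROUND_RATIO` is a hypothetical parameter; segments below the ratio, and unlabeled pixels (label -1), simply keep their temporal decision.

```python
# Sketch of the spatio-temporal combination: a segment whose share of temporally
# detected foreground pixels exceeds a ratio becomes foreground in its entirety.
from collections import defaultdict
from typing import Dict, List

MIN_FOREGROUND_RATIO = 0.5  # hypothetical meaning of "many foreground pixels"


def combine(labels: List[List[int]], temporal: List[List[bool]],
            ratio: float = MIN_FOREGROUND_RATIO) -> List[List[bool]]:
    total: Dict[int, int] = defaultdict(int)
    foreground: Dict[int, int] = defaultdict(int)
    for label_row, fg_row in zip(labels, temporal):
        for label, fg in zip(label_row, fg_row):
            if label >= 0:
                total[label] += 1
                if fg:
                    foreground[label] += 1
    mostly_fg = {label for label in total if foreground[label] / total[label] > ratio}
    # Fill camouflaged holes: every pixel of a mostly-foreground segment is foreground.
    return [[fg or label in mostly_fg for label, fg in zip(label_row, fg_row)]
            for label_row, fg_row in zip(labels, temporal)]
```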

Page 15

Evaluation: Objective Results

• Precision: what fraction of the detected foreground pixels is correct?

• Recall: what fraction of the real foreground pixels is detected?

• Apply the algorithm to a video sequence and count correct and wrong detections
  – calculate the precision and recall values

• Good systems obtain both high precision and high recall

• Different parameter settings of an algorithm give different outputs
  – vary the parameters
  – calculate the precision and recall values
  – plot them on a graph
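Precision, recall and the parameter sweep that produces a precision-recall graph can be computed as below. Masks are nested lists of booleans; returning 0.0 when a ratio is undefined is a convention chosen for this sketch, and `pr_curve` with its `detect` callback is a hypothetical helper, not the evaluation code used in the thesis.

```python
# Sketch: pixel-level precision/recall against a ground-truth mask, and a sweep
# over one detector parameter to obtain points for a precision-recall graph.
from typing import Callable, Iterable, List, Tuple

Mask = List[List[bool]]


def precision_recall(detected: Mask, truth: Mask) -> Tuple[float, float]:
    tp = fp = fn = 0
    for d_row, t_row in zip(detected, truth):
        for d, t in zip(d_row, t_row):
            if d and t:
                tp += 1  # correctly detected foreground pixel
            elif d:
                fp += 1  # detected, but background in the ground truth
            elif t:
                fn += 1  # real foreground pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when undefined (convention)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pr_curve(detect: Callable[[float], Mask], truth: Mask,
             params: Iterable[float]) -> List[Tuple[float, float, float]]:
    """(parameter, precision, recall) for every parameter value."""
    return [(p, *precision_recall(detect(p), truth)) for p in params]
```

For example, thresholding a difference image `[[5, 40, 80]]` at several values and comparing with the truth `[[False, True, True]]` yields one (precision, recall) point per threshold.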

Page 16

Evaluation: Objective Results

• Compare the proposed algorithm with similar techniques
  – Stauffer (2001), Shan (2006)

Page 17

Evaluation: Subjective Results

• Visual examples of output of different algorithms

[Figure: input image, ground truth, and the results of Stauffer ’01, Shan ’06 and the proposed algorithm]

Page 18

Evaluation: Execution Times

• The proposed system is faster than the related work
• Spatial segmentation can run in parallel with temporal detection
  – processing speed can be increased

Sequence               Stauffer’01 (fps)   proposed (fps)   temporal (fps)   spatial (fps)
PetsD2TeC2 (384x288)   8.33                10               29.4             18.2
Indoor (340x240)       9.5                 15.4             45.5             30
Ismail (320x240)       9.7                 14.9             71.4             29.4
ThirdView (720x576)    1.1                 2.3              3.6              7.7

Page 19

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Page 20

Moving Object Detection in the Compressed Domain

• Video is encoded to reduce network traffic and storage cost

• Video coding exploits redundancy in video
  – neighboring pixels often have similar values
  – successive images are closely related

• Before video analytics can be applied, a decoding step is needed

• Apply analytics directly to the compressed bit stream

[Figure: analytics applied to the encoded video]

Page 21

H.264/AVC

• Block-based video coding (standardized in 2003)
  – each frame is divided into macroblocks (MBs) of 16x16 pixels
  – MBs are predicted from previously encoded data
  – the difference between the prediction and the MB is further encoded
    • a motion vector is stored in the bit stream to point to the prediction

• Current object detection techniques are based on motion vectors
  – motion vectors are created to compress, not to represent the real motion
  – processing/filtering is needed to deal with noisy motion vectors

• Search for a new approach

[Figure: motion vectors]

Page 22

Observations

• The size of an MB (the number of bits used within the compressed bit stream) changes over consecutive images
  – MBs corresponding to background use few bits (frames 0 to 90)
  – when a moving object passes, the size of the MB rises (frames 90 to 120)

Page 23

MB-based Background Subtraction

• Background model for each MB
  – training period
  – determine the maximum size

• Threshold T

• Compare MB sizes with maximum + T
  – MBs with large sizes are considered foreground

[Figure: MB sizes compared with maximum + T]
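The MB-based background subtraction above can be sketched on plain lists of MB sizes. Extracting the per-MB bit counts from an H.264/AVC bit stream is not shown and is assumed to happen elsewhere; the function names are hypothetical. `mb_grid` only illustrates how many 16x16 MBs a frame contains, e.g. 24 x 18 for a 384x288 sequence.

```python
# Sketch of MB-based background subtraction on MB sizes (in bits).
# Assumed input: one list of MB sizes per frame, in raster order.
from typing import List, Tuple

MB_SIZE = 16  # H.264/AVC macroblocks cover 16x16 pixels


def mb_grid(width: int, height: int) -> Tuple[int, int]:
    """Number of MB columns and rows for a frame (dimensions rounded up)."""
    return (width + MB_SIZE - 1) // MB_SIZE, (height + MB_SIZE - 1) // MB_SIZE


def train(training_frames: List[List[int]]) -> List[int]:
    """Background model: the maximum size of each MB over the training period."""
    return [max(sizes) for sizes in zip(*training_frames)]


def detect(frame: List[int], model: List[int], threshold: int) -> List[bool]:
    """An MB is foreground if its size exceeds its maximum training size + T."""
    return [size > maximum + threshold for size, maximum in zip(frame, model)]
```

Because only one integer comparison per MB is needed and no decoding to pixels takes place, this kind of detector can run very fast.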

Page 24

(sub)MB-based Background Subtraction

• MBs can be coarse (16x16 pixels)
• H.264/AVC divides MBs into subMBs (4x4 pixels)
• Refine the MB output to subMB level
  – only consider foreground MBs at the boundaries of moving objects
  – analyze the size (in bits) of the subMBs in these boundary MBs
  – small subMBs are regarded as background
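The subMB refinement can be sketched as follows. The slide defines neither “boundary MB” nor “small”, so this sketch assumes that a foreground MB is a boundary MB when one of its 4-neighbors is background, that each MB comes with 16 subMB sizes (4x4 pixels each) in raster order, and that `SUBMB_MIN_BITS` is a hypothetical threshold. The result maps each foreground MB to 16 subMB foreground flags.

```python
# Sketch: refine an MB-level foreground mask to 4x4 subMB level.
# Boundary definition, subMB ordering and SUBMB_MIN_BITS are assumptions.
from typing import Dict, List, Tuple

SUBMB_MIN_BITS = 4  # hypothetical: subMBs smaller than this are background


def is_boundary(mask: List[List[bool]], r: int, c: int) -> bool:
    """A foreground MB touching a background MB (4-neighborhood)."""
    rows, cols = len(mask), len(mask[0])
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
            return True
    return False


def refine(mask: List[List[bool]],
           submb_bits: Dict[Tuple[int, int], List[int]],
           min_bits: int = SUBMB_MIN_BITS) -> Dict[Tuple[int, int], List[bool]]:
    refined: Dict[Tuple[int, int], List[bool]] = {}
    for r, row in enumerate(mask):
        for c, fg in enumerate(row):
            if not fg:
                continue
            if is_boundary(mask, r, c):
                # Only boundary MBs are analyzed: small subMBs become background.
                refined[(r, c)] = [bits >= min_bits for bits in submb_bits[(r, c)]]
            else:
                refined[(r, c)] = [True] * 16
    return refined
```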

Page 25

Evaluation: Objective comparison

• Precision: what fraction of the detected foreground pixels is correct?

• Recall: what fraction of the real foreground pixels is detected?

• Comparison with Zeng (2005) (based on motion vectors)

Page 26

Evaluation: Execution Times

• Very high execution speeds
  – more than 20x faster than the related work

Sequence               Zeng’05 (fps)   proposed (fps)
Etri od A (352x240)    28              662
PetsD2TeC2 (384x288)   22              448
Indoor (340x240)       31              751
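As a quick check of the speed-up claim, the ratios can be computed directly from the frame rates in the table (values copied from the slide). They range from about 20.4x to 24.2x.

```python
# Speed-up of the proposed compressed-domain detector over Zeng '05,
# computed from the frame rates (fps) listed on the slide.
fps = {
    "Etri od A (352x240)": (28, 662),
    "PetsD2TeC2 (384x288)": (22, 448),
    "Indoor (340x240)": (31, 751),
}

for sequence, (zeng, proposed) in fps.items():
    print(f"{sequence}: {proposed / zeng:.1f}x")
# Etri od A (352x240): 23.6x
# PetsD2TeC2 (384x288): 20.4x
# Indoor (340x240): 24.2x
```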

Page 27

Evaluation: Subjective Results

• Demonstration

Page 28

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Page 29

Metadata: Representing Moving Objects

• Metadata is “data about data”
  – data about a detected object: size, color, bounding box, …

• Metadata standard
  – common agreement on the format of the metadata

• Several metadata standards exist for video surveillance
  – modules can use different standards
  – the same information can be represented in different formats

[Figure: analytics1 produces metadata in metadata standard A; analytics2 produces metadata in metadata standard B]

Page 30

Metadata: Representing Moving Objects

• Metadata standards
  – XML (eXtensible Markup Language)
    • describes the terms and structure of the metadata
  – specification
    • textual description of the semantics of the XML elements

Metadata example 1: CVML (Computer Vision Markup Language)

<object id="0"> <box xc="77" yc="73" w="21" h="16"/></object>

Box: “Coordinates of the centre and the dimensions of the bounding box of a detected object in pixels.”

Metadata example 2: VS7 (Video Surveillance Schema)

<LLID ="LLID1"><Mask> <BB mp7:dim="4">67 65 88 91</BB></Mask> </LLID>

BB: “Coordinates of a rectangular segment.”
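The two examples encode the same kind of information, a bounding box, in different ways. The sketch below converts both into one (left, top, right, bottom) tuple, using Python's standard `xml.etree.ElementTree` for the CVML fragment. The order of the four VS7 `BB` numbers is an assumption (left, top, right, bottom), since the slide only calls them “coordinates of a rectangular segment”; only the `BB` text is parsed because the VS7 fragment's attribute and namespace declarations are incomplete on the slide.

```python
# Sketch: one bounding box, two metadata formats, one common representation.
# Assumption: the four VS7 BB numbers are left, top, right, bottom.
import xml.etree.ElementTree as ET
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def cvml_box(xml_text: str) -> Box:
    """CVML stores the centre (xc, yc) and the dimensions (w, h)."""
    box = ET.fromstring(xml_text).find("box")
    xc, yc = float(box.get("xc")), float(box.get("yc"))
    w, h = float(box.get("w")), float(box.get("h"))
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


def vs7_bb(bb_text: str) -> Box:
    """VS7 stores four coordinates of a rectangular segment as text."""
    left, top, right, bottom = (float(v) for v in bb_text.split())
    return (left, top, right, bottom)


print(cvml_box('<object id="0"><box xc="77" yc="73" w="21" h="16"/></object>'))
# (66.5, 65.0, 87.5, 81.0)
print(vs7_bb("67 65 88 91"))
# (67.0, 65.0, 88.0, 91.0)
```

Hand-written converters like these are exactly the format-specific software that the ontology-based approach on the next slides aims to avoid.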

Page 31

Metadata: Representing Moving Objects

• Proposal: use Semantic Web Technologies
  – make information on the internet accessible to machines
  – information in a domain is structured using an ontology
    • a data model that represents a set of concepts, and the relations among these concepts, within a specific domain

• OWL (Web Ontology Language)
  – W3C Recommendation (2004)
  – standardized language for describing an ontology
    • classes, properties and relations
    • individuals or instances
  – can be queried through standardized languages

Page 32

Metadata: Representing Moving Objects

• Example: ontology for the domain of science

[Figure: Class: Person, with DatatypeProperty “birth date”; Class: Scientist, subClassOf Person; Individual “Joseph Plateau” of class Scientist, with birth date “14/10/1801”]

OWL constructs
• Class
• DatatypeProperty
• subClassOf
• Individual
• …
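The science example can be mimicked with plain (subject, predicate, object) triples and a small subclass inference, which is what lets a query for `Person` find Joseph Plateau although he is only declared a `Scientist`. This is a plain-Python stand-in for an OWL toolkit and reasoner; identifiers such as `birthDate` and `JosephPlateau` are illustrative.

```python
# Sketch: the slide's science ontology as triples, with transitive
# rdfs:subClassOf inference. Plain Python stands in for an OWL reasoner.
triples = {
    ("Person", "rdf:type", "owl:Class"),
    ("Scientist", "rdf:type", "owl:Class"),
    ("Scientist", "rdfs:subClassOf", "Person"),
    ("birthDate", "rdf:type", "owl:DatatypeProperty"),
    ("JosephPlateau", "rdf:type", "Scientist"),
    ("JosephPlateau", "birthDate", "14/10/1801"),
}

SCHEMA_TYPES = {"owl:Class", "owl:DatatypeProperty"}


def superclasses(cls, kb):
    """All classes that `cls` is (transitively) a subclass of, including itself."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in kb:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def instances_of(cls, kb):
    """Individuals whose asserted class is `cls` or one of its subclasses."""
    return {s for s, p, o in kb
            if p == "rdf:type" and o not in SCHEMA_TYPES and cls in superclasses(o, kb)}


print(instances_of("Person", triples))  # {'JosephPlateau'}
```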

Page 33

Metadata: Representing Moving Objects

• Create OWL ontologies for the metadata standards used in video surveillance
  – CVML, VS7, MPEG-7, …

• Mappings link the different ontologies
  – use OWL constructs to link classes
  – denote that classes in different ontologies can be the same

• Information in different formats is linked
  – however, metadata can be very technical or very general

[Figure: OWL ontology CVML, OWL ontology VS7 and OWL ontology MPEG7]

Page 34

Metadata: Representing Moving Objects

• One global ontology with general concepts for video surveillance
• Link it with the metadata ontologies through mappings
• Layered metadata model
• Only the upper ontology needs to be known to retrieve information (e.g., retrieve all images with moving objects)

[Figure: upper layer: OWL ontology Video Surveillance; lower layer: OWL ontology CVML, OWL ontology VS7, OWL ontology MPEG7]
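The effect of the mappings in the layered model can be sketched as follows: lower-layer classes are declared equivalent to one upper-layer class, and a query that mentions only the upper class still finds objects reported in both standards. All class, property and instance names (`vso:MovingObject`, `vso:appearsIn`, `frame_12`, …) are hypothetical, and plain Python replaces a real OWL reasoner and query language.

```python
# Sketch of the layered metadata model: owl:equivalentClass mappings link the
# lower-layer classes to an upper-layer class. All names are hypothetical.
triples = {
    # mappings between lower-layer ontologies and the upper ontology
    ("cvml:Object", "owl:equivalentClass", "vso:MovingObject"),
    ("vs7:LLID", "owl:equivalentClass", "vso:MovingObject"),
    # instances produced by two different analytics modules
    ("cvml:object0", "rdf:type", "cvml:Object"),
    ("cvml:object0", "vso:appearsIn", "frame_12"),
    ("vs7:LLID1", "rdf:type", "vs7:LLID"),
    ("vs7:LLID1", "vso:appearsIn", "frame_40"),
}


def equivalents(cls, kb):
    """Classes linked to `cls` by owl:equivalentClass (symmetric, transitive)."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in kb:
            if p != "owl:equivalentClass":
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    todo.append(b)
    return found


def images_with(cls, kb):
    """Frames in which an instance of `cls`, or of an equivalent class, appears."""
    classes = equivalents(cls, kb)
    members = {s for s, p, o in kb if p == "rdf:type" and o in classes}
    return {o for s, p, o in kb if p == "vso:appearsIn" and s in members}


print(sorted(images_with("vso:MovingObject", triples)))  # ['frame_12', 'frame_40']
```

The operator's query names only `vso:MovingObject`; which standard each analytics module used is resolved by the mappings.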

Page 35

Evaluation: Practical Use Case Scenario

• Scenario
  – “operator wants to retrieve images that contain moving objects”
  – analytics module 1 detects objects in CVML (XML)
  – analytics module 2 detects objects in VS7 (XML)

• Proposed
  – XML fragments are automatically converted to OWL instances
  – through the mappings, these instances are linked to each other and to the Video Surveillance Ontology
  – the operator can use standardized languages to query the Video Surveillance Ontology

• Related work
  – specific software written to interpret CVML and VS7
  – specific software written to “translate” the operator’s request to the corresponding XML elements
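The automatic conversion of XML fragments into instances in this scenario could look roughly like the sketch below for CVML-like input. The `<frame number="...">` wrapper and all class and property names (`cvml:Object`, `cvml:xc`, `vso:appearsIn`) are assumptions made for illustration, not the thesis's actual conversion rules; the output is a set of triples that mappings to an upper ontology could then link.

```python
# Sketch: convert CVML-like XML into (subject, predicate, object) instance
# triples. The frame wrapper and all class/property names are hypothetical.
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def cvml_to_triples(xml_text: str) -> Set[Triple]:
    triples: Set[Triple] = set()
    root = ET.fromstring(xml_text)
    for frame in root.iter("frame"):
        frame_id = f"frame_{frame.get('number')}"
        for obj in frame.iter("object"):
            subject = f"cvml:object{obj.get('id')}_{frame_id}"
            triples.add((subject, "rdf:type", "cvml:Object"))
            triples.add((subject, "vso:appearsIn", frame_id))
            box = obj.find("box")
            if box is not None:  # keep the box attributes as datatype values
                for attr in ("xc", "yc", "w", "h"):
                    triples.add((subject, f"cvml:{attr}", box.get(attr)))
    return triples


example = """<cvml>
  <frame number="12">
    <object id="0"><box xc="77" yc="73" w="21" h="16"/></object>
  </frame>
</cvml>"""

for triple in sorted(cvml_to_triples(example)):
    print(triple)
```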

Page 36

Outline

• Introduction: Context and Problem Description

• Detection of Moving Objects in the Pixel Domain

• Detection of Moving Objects in the Compressed Domain

• Metadata: Representing Moving Objects

• Conclusions

Page 37

Conclusions

• Algorithm for the detection of moving objects in the pixel domain
  – multimodal background subtraction technique
  – combines spatial and temporal information
  – evaluated by comparison with related work
    • more robust detection
    • faster execution speeds

• Algorithm for the detection of moving objects in the compressed domain
  – novel approach that disregards motion vectors
  – macroblock-based background subtraction
  – evaluated by comparison with related work
    • better detection results (very high precision)
    • more than 20 times faster than the related work

Page 38

Conclusions

• Metadata for the representation of moving objects
  – discussed the problems of using different XML-based metadata standards
  – introduction of Semantic Web Technologies
  – layered metadata model
    • upper layer: Video Surveillance Ontology
    • lower layer with a pool of metadata ontologies
    • links defined using mappings
  – evaluation based on a practical use case scenario

Page 39

Publications

• First author of 3 publications recorded in SCI (A1)
  – Robust Spatio-Temporal Multimodal Background Subtraction for Video Surveillance (Optical Engineering)
  – Moving Object Detection in the H.264/AVC Compressed Domain for Video Surveillance Applications (Journal of Visual Communication & Image Representation)
  – Personal Content Management System, a Semantic Approach (Journal of Visual Communication & Image Representation)
• Co-author of 1 publication recorded in SCI (A1)
• 17 articles at international conferences
• 5 standardization contributions