www.monash.edu.au Gippsland School of Information Technology (GSIT) Behaviour Recognition Framework for Intelligent Visual Surveillance 06 April 2009

Page 1: Talk 2009-monash-seminar-intelligent-video-surveillance

www.monash.edu.au

Gippsland School of Information Technology (GSIT)

Behaviour Recognition Framework for Intelligent Visual Surveillance

06 April 2009

Page 2: Talk 2009-monash-seminar-intelligent-video-surveillance

Project Team

• Mahfuzul Haque, PhD student (2 yrs, 1 m)

• A/Prof. Manzur Murshed

• Dr. Manoranjan Paul

Page 3: Talk 2009-monash-seminar-intelligent-video-surveillance

Project Motivation

“Behaviour Recognition Framework for Intelligent Visual Surveillance”

Why “Intelligent” Surveillance?

Why “Behaviour Recognition”?

What type of “Behaviours”?

Page 4: Talk 2009-monash-seminar-intelligent-video-surveillance

Surveillance Everywhere

Are we really protected?

Page 5: Talk 2009-monash-seminar-intelligent-video-surveillance

Too Many Cameras

Deployment of a large number of surveillance cameras in recent years

London Heathrow airport has more than 5000 cameras!!

Page 6: Talk 2009-monash-seminar-intelligent-video-surveillance

Behind the Scene: Worried Human Monitor

Dependence on human monitors has increased.

Reliability of surveillance systems has decreased.

Page 7: Talk 2009-monash-seminar-intelligent-video-surveillance

Project Goals

Aiding human monitors by automatic detection of specific abnormal behaviours

Decreasing dependence on human monitors

Improving reliability of surveillance systems for ensuring human security

Page 8: Talk 2009-monash-seminar-intelligent-video-surveillance

Project Scope

Mob Violence

Crowding

Sudden Group Formation

Sudden Group Deformation

Shooting

Panic Driven Behaviours

Group Behaviours

Page 9: Talk 2009-monash-seminar-intelligent-video-surveillance

Research Question

How to recognize specific group behaviours from surveillance video streams in real time?

Research Area

- Computer Vision

- Application of Machine Learning

Page 10: Talk 2009-monash-seminar-intelligent-video-surveillance

System Architecture

Surveillance Video Stream → Behaviour Recognition Framework → Behaviour Profile

Page 12: Talk 2009-monash-seminar-intelligent-video-surveillance

Behaviour Profile

Surveillance Video Stream (System Input)

[Figure: time axis with ticks at 0, 20, 60, 140, and 320; labels along it, in order: Unknown, Group Appearing, Group Appearing, Group Merging, Group Splitting]

Behaviour Profile (System Output)

Page 14: Talk 2009-monash-seminar-intelligent-video-surveillance

Behaviour Recognition Framework

Framework Components

• Background Modelling

• Frame Level Feature Extraction

• Temporal Feature Extraction

• Behaviour Classification

Page 15: Talk 2009-monash-seminar-intelligent-video-surveillance

Behaviour Recognition Framework

Background Modelling → Frame Level Feature Extraction → Temporal Feature Extraction → Behaviour Classification

Page 17: Talk 2009-monash-seminar-intelligent-video-surveillance

Background Modelling

How to extract the active regions from a surveillance video stream?

Background Subtraction

Current frame - Background = Moving foreground
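The subtraction above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: frames are small grayscale grids, the background is a single static image, and the threshold of 25 intensity levels is a made-up value. The function name `subtract_background` is hypothetical and is not taken from the presented system.

```python
def subtract_background(frame, background, threshold=25):
    """Return a binary mask: 1 where a pixel differs from the background.

    frame and background are equally sized lists of rows of grayscale
    intensities. The threshold is an assumed value, not one from the talk.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# A flat background and a frame containing a bright 2x2 object.
background = [
    [100, 100, 100, 100],
    [100, 100, 100, 100],
    [100, 100, 100, 100],
]
frame = [
    [101, 99, 100, 100],
    [100, 180, 185, 100],
    [100, 182, 179, 102],
]

mask = subtract_background(frame, background)
for row in mask:
    print(row)
```

Small intensity changes (sensor noise) stay below the threshold, while the object pixels are marked as moving foreground. A single static background image is the simplest possible case; it does not cope with the dynamic scenes the following slides address.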

Challenges!!

Page 18: Talk 2009-monash-seminar-intelligent-video-surveillance

Background Modelling

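[Figure: three example scenes with labelled regions: sky, cloud, leaf, and moving person; road, shadow, and moving car; floor, shadow, and walking people]

[Figure: Gaussian distributions P(x) of pixel intensity x, each with mean µ and variance σ²; the final plot marks sky, cloud, person, and leaf along the x (pixel intensity) axis]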

Page 19: Talk 2009-monash-seminar-intelligent-video-surveillance

Background Modelling

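[Figure: a pixel observed from Frame 1 to Frame N shows road, shadow, car, road, shadow; the background model is applied to the current frame to obtain the moving foreground]

Background Model

Model 1 (road): weight ω1 = 65%, mean µ1, variance σ1²

Model 2 (shadow): weight ω2 = 20%, mean µ2, variance σ2²

Model 3 (car): weight ω3 = 15%, mean µ3, variance σ3²

Background Models

Models are ordered by ω/σ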
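The ordering by ω/σ can be illustrated with a short sketch. The weights (65%, 20%, 15%) come from the slide; everything else is an assumption made for illustration: the means and standard deviations are invented, and the selection rule (take the top-ranked models until their cumulative weight exceeds a threshold `T`, then match a pixel within `k` standard deviations) follows the commonly used Gaussian mixture background scheme rather than anything the slides spell out. The names `Gaussian`, `background_models`, and `is_foreground` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    label: str     # for illustration only; a real model carries no labels
    weight: float  # omega, from the slide
    mean: float    # mu, invented value
    sigma: float   # standard deviation, invented value


def background_models(models, T=0.8):
    """Rank models by weight/sigma and keep the top ones as background.

    Models are added in rank order until their cumulative weight exceeds T.
    T = 0.8 is an assumed threshold.
    """
    ranked = sorted(models, key=lambda g: g.weight / g.sigma, reverse=True)
    chosen, total = [], 0.0
    for g in ranked:
        chosen.append(g)
        total += g.weight
        if total > T:
            break
    return chosen


def is_foreground(x, models, T=0.8, k=2.5):
    """A pixel value is foreground if it matches no background model."""
    return not any(
        abs(x - g.mean) <= k * g.sigma for g in background_models(models, T)
    )


pixel_models = [
    Gaussian("car", 0.15, 200.0, 12.0),
    Gaussian("road", 0.65, 90.0, 4.0),
    Gaussian("shadow", 0.20, 60.0, 6.0),
]

bg = background_models(pixel_models)
print([g.label for g in bg])
print(is_foreground(92, pixel_models), is_foreground(205, pixel_models))
```

With these invented values, the stable, frequently observed road and shadow models rank first and form the background, while the rarely seen, high-variance car model is left out, so a car-like intensity is reported as foreground.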

Page 20: Talk 2009-monash-seminar-intelligent-video-surveillance

Background Modelling

[Figure: detection results on five test sequences, one row per sequence. Columns: First Frame, Test Frame, Ground Truth, S&G, Lee, Proposed]

Test sequences: (1) PETS2000; (2) PETS2006-B1; (3) PETS2006-B2; (4) PETS2006-B3; and (5) PETS2006-B4.

Page 22: Talk 2009-monash-seminar-intelligent-video-surveillance

Frame Level Feature Extraction

• Feature Categories:

– Count

– Area

– Density

– Bounding Box

– Filling Ratio

– Aspect Ratio

• 30 frame level features

Bounding Boxes

Page 23: Talk 2009-monash-seminar-intelligent-video-surveillance

Frame Level Feature Extraction

Foreground Count

• FC (Foreground Count)

Foreground Area

• TFA (Total Foreground Area)

• AFA (Average Foreground Area)

• VFA (Variance of Foreground Area)

• MAXFA (Maximum Foreground Area)

• MINFA (Minimum Foreground Area)

Foreground Density

• AFD (Average Foreground Density)

• VFD (Variance of Foreground Density)

Filling Ratio

• TFR (Total Filling Ratio)

• AFR (Average Filling Ratio)

• VFR (Variance of Filling Ratio)

• MAXFR (Maximum Filling Ratio)

• MINFR (Minimum Filling Ratio)

Bounding Box – Area

• TBBA (Total Bounding Box Area)

• ABBA (Average Bounding Box Area)

• VBBA (Variance of Bounding Box Area)

• MAXBBA (Maximum Bounding Box Area)

• MINBBA (Minimum Bounding Box Area)

Bounding Box – Width

• ABBW (Average Bounding Box Width)

• VBBW (Variance of Bounding Box Width)

• MAXBBW (Maximum Bounding Box Width)

• MINBBW (Minimum Bounding Box Width)

Bounding Box – Height

• ABBH (Average Bounding Box Height)

• VBBH (Variance of Bounding Box Height)

• MAXBBH (Maximum Bounding Box Height)

• MINBBH (Minimum Bounding Box Height)

Aspect Ratio

• AAR (Average Aspect Ratio)

• VAR (Variance of Aspect Ratio)

• MAXAR (Maximum Aspect Ratio)

• MINAR (Minimum Aspect Ratio)
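As a rough sketch of how most of the features listed above could be computed for one frame, the code below takes each foreground region as a (width, height, foreground pixel count) tuple. The slides do not define the features, so the definitions here are assumptions: filling ratio is taken as foreground area divided by bounding box area, aspect ratio as width divided by height, and variance as population variance. The two density features (AFD, VFD) are left out because no definition of density is given. The function names are hypothetical.

```python
from statistics import mean, pvariance


def summarise(suffix, values, with_total=True):
    """Build the T/A/V/MAX/MIN features for one quantity, e.g. suffix 'FA'."""
    feats = {}
    if with_total:
        feats["T" + suffix] = sum(values)
    feats["A" + suffix] = mean(values)
    feats["V" + suffix] = pvariance(values)
    feats["MAX" + suffix] = max(values)
    feats["MIN" + suffix] = min(values)
    return feats


def frame_features(blobs):
    """blobs: list of (bbox_width, bbox_height, foreground_pixel_count)."""
    feats = {"FC": len(blobs)}
    if not blobs:
        return feats  # an empty frame only has a foreground count
    areas = [a for _, _, a in blobs]
    box_areas = [w * h for w, h, _ in blobs]
    feats.update(summarise("FA", areas))
    feats.update(summarise("BBA", box_areas))
    feats.update(summarise("BBW", [w for w, _, _ in blobs], with_total=False))
    feats.update(summarise("BBH", [h for _, h, _ in blobs], with_total=False))
    # Assumed definitions: filling ratio = area / box area, aspect = w / h.
    feats.update(summarise("FR", [a / b for a, b in zip(areas, box_areas)]))
    feats.update(summarise("AR", [w / h for w, h, _ in blobs], with_total=False))
    return feats


# Two regions with invented sizes.
feats = frame_features([(4, 10, 30), (5, 8, 20)])
print(feats["FC"], feats["TFA"], feats["AFA"], feats["MAXBBW"])
```

This yields 28 of the 30 listed features; the feature keys match the abbreviations on the slide (for example `TFA`, `ABBW`, `MINAR`).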

Page 25: Talk 2009-monash-seminar-intelligent-video-surveillance

Temporal Feature Extraction

• Fixed length, partially overlapped sliding window

• Temporal data smoothing – polynomial curve fitting

• 9 temporal features for each frame level feature

• Output: 270 temporal features

Page 26: Talk 2009-monash-seminar-intelligent-video-surveillance

Temporal Feature Extraction

[Plot: TFA (Total Foreground Area, %) against time; window = 100 frames]

Temporal Features

• MAX

• MIN

• AVG

• VAR

• RATE

• TIME(MAX)

• TIME(MIN)

• D = TIME(MAX) - TIME(MIN)

• SLOPE ( D/2 )
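A minimal sketch of the windowing and of the temporal features whose meaning is clear from their names is given below. The window length of 100 frames is from the slide; the step of 50 frames (50% overlap) is an assumption, since the slides only say the windows partially overlap. The polynomial smoothing step is omitted, and RATE and SLOPE are not computed because the slides do not define them precisely. The function names are hypothetical.

```python
from statistics import mean, pvariance


def sliding_windows(series, length=100, step=50):
    """Yield fixed-length windows; step < length gives partial overlap."""
    for start in range(0, len(series) - length + 1, step):
        yield series[start:start + length]


def temporal_features(window):
    """Seven of the nine listed features for one frame level feature."""
    t_max = window.index(max(window))  # frame offset of the maximum
    t_min = window.index(min(window))  # frame offset of the minimum
    return {
        "MAX": max(window),
        "MIN": min(window),
        "AVG": mean(window),
        "VAR": pvariance(window),
        "TIME(MAX)": t_max,
        "TIME(MIN)": t_min,
        "D": t_max - t_min,
    }


# Synthetic TFA series: rises for 100 frames, then falls for 100 frames.
tfa = [float(min(i, 200 - i)) for i in range(200)]

windows = list(sliding_windows(tfa))
features = [temporal_features(w) for w in windows]
print(len(windows))
print(features[1])
```

Applying the full set of nine temporal features to each of the 30 frame level features would give the 270 values per window mentioned in the talk.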

Page 27: Talk 2009-monash-seminar-intelligent-video-surveillance

Temporal Feature Extraction

[Plot: TFA (Total Foreground Area, %) against time (window = 100 frames), annotated with MAX, MIN, TIME(MAX), and TIME(MIN)]

Page 29: Talk 2009-monash-seminar-intelligent-video-surveillance

Behaviour Classification

• Individual classifiers for each behaviour

• Supervised training

• Feature ranking

• Top 100 features from 270 features

• Dimension reduction (PCA)

• Max dimension 30

• SVM classifier

• Output: Behaviour Profile
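The feature ranking step can be sketched as follows. The slides do not say which ranking criterion is used, so this example swaps in a Fisher score (squared difference of class means over the sum of class variances) purely as an assumed stand-in. It covers only the ranking and top-k selection for a single behaviour classifier; the PCA and SVM stages are not shown. Feature names follow the pattern seen in the experiments, but all values and the names `fisher_score` and `top_features` are invented for illustration.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Score one feature for a binary problem (label 1 = behaviour present)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread > 0:
        return gap / spread
    # Zero variance in both classes: perfectly separating or useless.
    return float("inf") if gap > 0 else 0.0


def top_features(samples, labels, k):
    """Return the names of the k highest-scoring features."""
    names = list(samples[0])
    scores = {n: fisher_score([s[n] for s in samples], labels) for n in names}
    return sorted(names, key=scores.get, reverse=True)[:k]


# Four training windows with invented feature values.
samples = [
    {"TIME(MAX)-VFD": 80, "MIN-MINAR": 0.5, "AVG-FC": 3},
    {"TIME(MAX)-VFD": 90, "MIN-MINAR": 0.7, "AVG-FC": 3},
    {"TIME(MAX)-VFD": 10, "MIN-MINAR": 0.6, "AVG-FC": 3},
    {"TIME(MAX)-VFD": 20, "MIN-MINAR": 0.4, "AVG-FC": 3},
]
labels = [1, 1, 0, 0]

selected = top_features(samples, labels, k=2)
print(selected)
```

In the pipeline described in the talk, the same selection would keep the top 100 of 270 features before dimension reduction; with one classifier per behaviour, each classifier can end up with a different feature subset.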

Page 30: Talk 2009-monash-seminar-intelligent-video-surveillance

Behaviour Classification

Experiments

GROUP FORMING

• Accuracy: 0.9767

• Top 3 features

• TIME(MAX)-VFD

• TIME(MAX)-AFD

• TIME(MAX) - TIME(MIN)-VFD

GROUP SPLITTING AND SPREADING

• Accuracy: 0.8488

• Top 3 features

• TIME(MAX)-VFD

• TIME(MIN)-ABBA

• TIME(MIN)-AFA

BLOCKED EXIT

• Accuracy: 0.9651

• Top 3 features

• TIME(MIN)-TFA

• MIN-MINAR

• TIME(MAX)-TFA

Page 31: Talk 2009-monash-seminar-intelligent-video-surveillance

Summary: Framework Components

Background Modelling

• Multiple Background Models

• Gaussian Mixture Models (GMM)

• Unsupervised

• Output: Foreground Region/Mask

Frame Level Feature Extraction

• Feature Categories:

• Count

• Area

• Density

• Bounding Box

• Filling Ratio

• Aspect Ratio

• Output: 30 Frame Level Features

Temporal Feature Extraction

• Fixed Length, Partially Overlapped Sliding Window

• Temporal Data Smoothing – Polynomial Curve Fitting

• 9 Temporal Features for Each Frame Level Feature

• Output: 270 Temporal Features

Behaviour Classification

• Individual Classifiers for Each Behaviour

• Each Classifier is Trained Using Supervised Learning

• Feature Ranking

• Top 100 Features

• Dimension Reduction (PCA)

• Max Dimension 30

• SVM classifier

• Output: Behaviour Profile

Page 33: Talk 2009-monash-seminar-intelligent-video-surveillance

Research Challenges

• No tracking/trajectory

• Simple behaviours

– Group appear/disappear

– Group merge/split

• Panic driven behaviours

– Fire/Blocked exit

– Fighting/Shooting

• Context variation

– Speed

– Direction

– Object Resolution

Page 34: Talk 2009-monash-seminar-intelligent-video-surveillance

Implemented System: VSTK

Page 35: Talk 2009-monash-seminar-intelligent-video-surveillance

Publications

1. Mahfuzul Haque, Manzur Murshed, and Manoranjan Paul, Improved Gaussian Mixtures for Robust Object Detection by Adaptive Multi-Background Generation, International Conference on Pattern Recognition (ICPR), Tampa, Florida, USA, 2008. (CORE A)

2. Mahfuzul Haque, Manzur Murshed, and Manoranjan Paul, A Hybrid Object Detection Technique from Dynamic Background Using Gaussian Mixture Models, IEEE International Workshop on Multimedia Signal Processing (MMSP), Cairns, Australia, 2008. (CORE A)

3. Mahfuzul Haque, Manzur Murshed, and Manoranjan Paul, On Stable Dynamic Background Generation Technique using Gaussian Mixture Models for Robust Object Detection, IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Santa Fe, New Mexico, USA, 2008. (CORE A)

CORE - COmputing Research and Education Association

Page 36: Talk 2009-monash-seminar-intelligent-video-surveillance

Thank you!

Q&A

http://www.mahfuzulhaque.com

Page 37: Talk 2009-monash-seminar-intelligent-video-surveillance

Acknowledgments

• http://www.fotosearch.com/DGV464/766029/

• http://www.cyprus-trader.com/images/alert.gif

• http://security.polito.it/~lioy/img/einstein8ci.jpg

• http://www.dtsc.ca.gov/PollutionPrevention/images/question.jpg

• http://www.unmikonline.org/civpol/photos/thematic/violence/streetvio2.jpg

• http://www.airports-worldwide.com/img/uk/heathrow00.jpg

• http://www.highprogrammer.com/alan/gaming/cons/trips/genconindy2003/exhibit-hall-crowd-2.jpg

• http://www.bhopal.org/fcunited/archives/fcu-crowd.jpg

• http://img.dailymail.co.uk/i/pix/2006/08/passaPA_450x300.jpg

• http://www.defenestrator.org/drp/files/surveillance-cameras-400.jpg

• http://www.cityofsound.com/photos/centre_poin/crowd.jpg

• http://www.hindu.com/2007/08/31/images/2007083156401501.jpg

• http://paulaoffutt.com/pics/images/crowd-surfing.jpg

• http://msnbcmedia1.msn.com/j/msnbc/Components/Photos/070225/070225_surveillance_hmed.hmedium.jpg

URLs of the images used in this presentation