Crowd Scene Understanding with Coherent Recurrent Neural Networks

Hang Su, Yinpeng Dong, Jun Zhu

May 22, 2016

Hang Su, Yinpeng Dong, Jun Zhu IJCAI 2016 May 22, 2016 1 / 26

Outline

1 Introduction

2 LSTM Recap

3 Coherent LSTM

4 Experimental Results

Background

A crowd scene is a public place where a large group of people has gathered, such as a university campus or the sidewalk of a busy street.

Groups are the main entities that make up a crowd.

When pedestrians walk in a crowded space, their trajectories are influenced by other people and by obstacles.

Applications

Understanding collective behaviors in crowd scenes has a wide range of applications in:

Video Surveillance

Crowd Management

Avoiding Tragic Accidents

Problem Formulation

Obtain reliable tracklets from each scene using KLT trackers. At any time instant $t$, the $i$-th person is represented by his/her coordinates $(x_i(t), y_i(t))$.

Predict the future trajectories of pedestrians, and use the extracted hidden features for other classification tasks.

[Figure: LSTM units for motion prediction, coupled by coherent regularization]

Challenge

Crowd spatio-temporal patterns exhibit nonlinear dynamics: limit cycles, quasi-periodicity, and chaos.

Collective effect (or coherent motion): pedestrians tend to form groups, which have intra-group and inter-group properties.

Previous Work

Traditional approaches such as the Social Force model: optimize an energy function; rely on hand-crafted functions; hard to generalize.

Probabilistic forecasting such as Gaussian processes.

Recurrent neural networks: N-LSTM (CVPR 2016).

LSTM

Structure: input, output, and forget gates; memory state $c_t$.

Advantages: prevents the vanishing gradient problem; nonlinear characteristic; generalization.

LSTM

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \tag{1}$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \tag{2}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{3}$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{4}$$

$$h_t = o_t \odot \tanh(c_t) \tag{5}$$
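Equations (1) to (5) can be traced with a small Python sketch. It is an illustration under simplifying assumptions, not the authors' implementation: it models a single hidden unit, so every weight and state is a scalar and the element-wise product becomes ordinary multiplication. The function name `lstm_step` and the parameter dictionary `p` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float, p: dict) -> tuple:
    """One LSTM step for a single hidden unit (scalar simplification).

    p holds the weights and biases named as in Eqs. (1)-(5),
    e.g. p["W_xi"], p["W_ci"], p["b_i"].
    """
    # Eq. (1): input gate, with a peephole term on the previous cell state
    i_t = sigmoid(p["W_xi"] * x_t + p["W_hi"] * h_prev + p["W_ci"] * c_prev + p["b_i"])
    # Eq. (2): forget gate
    f_t = sigmoid(p["W_xf"] * x_t + p["W_hf"] * h_prev + p["W_cf"] * c_prev + p["b_f"])
    # Eq. (3): new cell state
    c_t = f_t * c_prev + i_t * math.tanh(p["W_xc"] * x_t + p["W_hc"] * h_prev + p["b_c"])
    # Eq. (4): output gate, which peeks at the new cell state c_t
    o_t = sigmoid(p["W_xo"] * x_t + p["W_ho"] * h_prev + p["W_co"] * c_t + p["b_o"])
    # Eq. (5): hidden state
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

With all parameters set to zero every gate evaluates to 0.5, so the cell state is simply halved at each step, which is a quick sanity check on the gating arithmetic.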

Why Coherent LSTM?

An LSTM can model individual behaviors but does not capture the interactions among people in a group.

Individuals are always willing to join “seed” groups and form spatially coherent structures.

When the neighboring relationships of individuals remain invariant over time and the correlation of their velocities remains high, they tend to have similar hidden states.

The trajectories of pedestrians not only follow their past trend but are also influenced by the current environment.

cLSTM Unit

$$c_t = f_t \odot c_{t-1} + \sum_{j \in N} \lambda_j(t)\, f_t^j \odot c_{t-1}^j + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{6}$$
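The extra term in Eq. (6) can be sketched in Python as follows. This is a hedged illustration for a single hidden unit with scalar states, not the authors' code. The function name `coherent_cell_update` and the tuple layout of `neighbors` are assumptions made for the example; the gate values and the candidate term `tanh(W_xc x_t + W_hc h_{t-1} + b_c)` are taken as already computed.

```python
def coherent_cell_update(f_t, c_prev, i_t, candidate, neighbors):
    """Cell update of Eq. (6) for one hidden unit (scalar simplification).

    f_t, i_t   : forget and input gate values of the current tracklet
    c_prev     : previous cell state of the current tracklet
    candidate  : precomputed tanh(W_xc x_t + W_hc h_{t-1} + b_c)
    neighbors  : iterable of (lam_j, f_j, c_prev_j), one tuple per tracklet j
                 in the same coherent group (hypothetical layout)
    """
    # Memory shared by neighbours, weighted by their dependency coefficients
    shared = sum(lam_j * f_j * c_j for lam_j, f_j, c_j in neighbors)
    return f_t * c_prev + shared + i_t * candidate
```

With an empty neighbour list the update reduces to the plain LSTM cell update of Eq. (3).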

[Figure: cLSTM unit with inputs $x_t$ and $h_{t-1}$, output $h_t$, forget, input, and output gates (σ), a cell (ϕ), and a coherent regularization block]

Coherent Motion Modeling

Use coherent filtering [Zhou et al., 2012a] [Shao et al., 2014] to discover the coherent groups.

The dependency relationship between two tracklets within the same group is measured as:

$$\tau_j(t) = \frac{v_i(t) \cdot v_j(t)}{\|v_i(t)\| \, \|v_j(t)\|} \tag{7}$$
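Eq. (7) is the cosine of the angle between two velocity vectors. The sketch below shows one way to compute it in Python. The slides do not say how velocities are obtained from tracklets, so the finite-difference helper `velocities` is an assumption, as is returning 0 when one of the points is stationary; both function names are hypothetical.

```python
import math


def velocities(track):
    """Finite-difference velocities from a list of (x, y) positions.

    Assumption: velocity at step t is the displacement to step t + 1.
    """
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def velocity_correlation(v_i, v_j):
    """Eq. (7): normalized dot product of two 2-D velocity vectors."""
    dot = v_i[0] * v_j[0] + v_i[1] * v_j[1]
    norm = math.hypot(v_i[0], v_i[1]) * math.hypot(v_j[0], v_j[1])
    if norm == 0.0:
        # Assumption: a stationary point has no direction, so report no correlation
        return 0.0
    return dot / norm
```

The value is 1 for tracklets moving in the same direction, 0 for perpendicular motion, and -1 for opposite directions.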

Dependency Coefficient

The dependency coefficient between the $i$-th and $j$-th tracklets in Eq. (6) is defined as

$$\lambda_j(t) = \frac{1}{Z_i} \exp\left(\frac{\tau_j(t) - 1}{2\sigma^2}\right) \in (0, 1] \tag{8}$$

$Z_i$: normalization constant corresponding to the $i$-th tracklet.

$\lambda_j(t) \simeq 1$ if $v_i(t) \simeq v_j(t)$, which implies that tracklets $i$ and $j$ are similar.

Coherent regularization encourages the tracklets to learn similar feature distributions by sharing information across tracklets within a coherent group.
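A short Python sketch of Eq. (8) makes the behaviour of the coefficient concrete. The slides give no values for the bandwidth or the normalization constant, so the defaults `sigma=1.0` and `z=1.0` are assumptions chosen only for illustration, and the function name is hypothetical.

```python
import math


def dependency_coefficient(tau: float, sigma: float = 1.0, z: float = 1.0) -> float:
    """Eq. (8): exp((tau - 1) / (2 * sigma**2)) / z.

    tau   : velocity correlation from Eq. (7), in [-1, 1]
    sigma : bandwidth (assumed value, not given in the slides)
    z     : normalization constant Z_i (assumed to be 1 here)
    """
    return math.exp((tau - 1.0) / (2.0 * sigma ** 2)) / z
```

Because the correlation never exceeds 1, the exponent is never positive: the coefficient equals 1 when two tracklets move in the same direction and decays toward 0 as their directions diverge.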

Framework

Unsupervised encoder-decoder cLSTM framework:

$$h_T = \mathrm{cLSTM}_e(x_T, h_{T-1}) \tag{9}$$

$$\hat{x}_t = \mathrm{cLSTM}_{dr}(h_t, \hat{x}_{t+1}), \quad t \in [1, T] \tag{10}$$

$$\hat{x}_t = \mathrm{cLSTM}_{dp}(h_t, \hat{x}_{t-1}), \quad t > T \tag{11}$$

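[Figure: the encoder (weights $W_e$) reads $x_1, x_2, x_3$ under coherent regularization and produces the learnt hidden features $h_T$; the reconstruction decoder outputs $\hat{x}_3, \hat{x}_2, \hat{x}_1$ and the prediction decoder outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$]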

Crowd Scene Profiling

Solve critical tasks in crowd scene analysis:

Estimating group state: Gas, Solid, Pure Fluid, and Impure Fluid. Softmax classification using the features learnt from the unsupervised cLSTM.

Crowd video classification
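The softmax classification step can be sketched as a linear layer followed by a softmax over the four group states. This is a minimal illustration, not the authors' classifier: the function names, the feature vector, and the weights and biases are hypothetical, and in practice the parameters would be trained on the hidden features learnt by the cLSTM.

```python
import math

GROUP_STATES = ["Gas", "Solid", "Pure Fluid", "Impure Fluid"]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_group_state(features, weights, biases):
    """Return (label, probabilities) for one feature vector.

    weights : one row of coefficients per group state (hypothetical values)
    biases  : one bias per group state (hypothetical values)
    """
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return GROUP_STATES[probs.index(max(probs))], probs
```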

Datasets and Settings

CUHK Crowd Dataset: http://www.ee.cuhk.edu.hk/~xgwang/CUHKcrowd.html

Scenes: streets, shopping malls, airports, and parks. More than 400 sequences and more than 200,000 tracklets.

Settings: 128 hidden units in the cLSTM; 2/3 of each tracklet is used as the input and the remaining 1/3 as the tracklet to be predicted when evaluating performance.
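The 2/3 and 1/3 split can be illustrated with a few lines of Python. The sketch assumes that the input is the first two thirds of each tracklet in time and that the split point is rounded down; the slides do not state either detail, and the function name is hypothetical.

```python
def split_tracklet(track):
    """Split a tracklet into an observed part and a part to be predicted.

    Assumption: the first 2/3 of the points (rounded down) are the input
    and the remaining points are held out for evaluation.
    """
    n_observed = len(track) * 2 // 3
    return track[:n_observed], track[n_observed:]
```

For a tracklet with nine points, the first six would be fed to the encoder and the last three compared against the predicted path.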

Future Path Forecasting

Table 1: Error of path prediction (pixels)

Kalman Filter    Un-coherent LSTM    Coherent LSTM
9.32 ± 1.99      6.64 ± 1.76         4.37 ± 0.93
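A pixel error of this kind could be computed as sketched below. The slides do not define the metric, so this example assumes the error is the Euclidean distance between each predicted point and the corresponding ground-truth point, summarized as a mean and a population standard deviation. The function names are hypothetical, and the sketch does not reproduce the numbers in Table 1.

```python
import math
from statistics import mean, pstdev


def path_errors(predicted, actual):
    """Euclidean distance in pixels between matching predicted and actual points."""
    return [math.hypot(p[0] - a[0], p[1] - a[1]) for p, a in zip(predicted, actual)]


def summarize_errors(errors):
    """Mean and population standard deviation (assumed meaning of the ± column)."""
    return mean(errors), pstdev(errors)
```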

Group State Estimation

Gas: Particles move in different directions without forming collective behaviors.

Solid: Particles move in the same direction with relative positions unchanged.

Pure Fluid: Particles move towards the same direction with ever-changing relative positions.

Impure Fluid: Particles move in a pure fluid style with invasion of particles from other groups.

[Figure: example groups in each state: (a) Gas, (b) Solid, (c) Pure Fluid, (d) Impure Fluid]

Group State Estimation

Confusion matrices of estimating group states using different methods: (a) collective transition [Shao et al., 2014]; (b) prediction LSTM; (c) reconstruction LSTM; (d) un-coherent LSTM; and (e) coherent LSTM.

[Figure: five confusion matrices, panels (a) to (e) in the order listed in the caption]

Crowd Video Classification

All video clips are annotated into 8 classes: 1) highly mixed pedestrian walking; 2) crowd walking following a mainstream and well organized; 3) crowd walking following a mainstream but poorly organized; 4) crowd merge; 5) crowd split; 6) crowd crossing in opposite directions; 7) intervened escalator traffic; and 8) smooth escalator traffic.

Thanks for your time!

Questions?
