

Human Detection in Videos using Spatio-Temporal Pictorial Structures
Amir Roshan Zamir, Afshin Dehghan, Ruben Villegas
University of Central Florida

2. Our Framework

2.1. Enforcing Temporal Consistency by Post-Processing

We post-process the output of the human detector of Yang and Ramanan [1], "Articulated Pose Estimation using Flexible Mixtures of Parts."

1. Problem

Human detection in videos: making frame-by-frame human detection more accurate. Per-frame detectors can produce numerous false detections.

Applications: video surveillance, human tracking, action recognition, etc.

5. Conclusion

Based on our experiments, temporal part deformation improves human detection in videos: fewer false detections, more true detections, and more precise part trajectories.

3. Learning Transition of Parts

Human body parts have a limited range of motion that can be approximated. These movements (trajectories) can be learned by training on an annotated dataset; we use the HumanEva dataset [2] for training.
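As an illustration of how such transitions might be learned from annotated trajectories, here is a minimal sketch assuming the annotations are per-frame (x, y) part locations; the array layout, the Gaussian displacement model, and all function names are our assumptions, not details from the poster:

```python
import numpy as np

def learn_transition_model(annotations):
    """Fit per-part statistics of frame-to-frame displacements.

    annotations: array of shape (frames, parts, 2) holding annotated
    (x, y) part locations, e.g. trajectories from the HumanEva dataset.
    Returns the per-part mean and standard deviation of displacement.
    """
    disp = np.diff(annotations, axis=0)   # (frames-1, parts, 2) displacements
    mean = disp.mean(axis=0)              # (parts, 2) typical motion per frame
    std = disp.std(axis=0) + 1e-6         # small floor avoids zero variance
    return mean, std

def transition_cost(prev, curr, mean, std):
    """Mahalanobis-style temporal deformation cost between two frames."""
    d = (curr - prev - mean) / std
    return float((d ** 2).sum())
```

A model like this could then score candidate transitions: small cost when a part moves the way the training trajectories did, large cost for implausible jumps.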

1.1. Our Approach

Use temporal information: the transition of human parts across frames in pictorial structures. False detections should be temporally less consistent than true detections, and part transitions convey information that is ignored in the frame-by-frame scenario.

2.2. Enforcing Temporal Consistency by Embedding It into the Detection Process

Our contribution: extending spatial pictorial structures to spatio-temporal pictorial structures. The score of a configuration of parts combines an appearance term and a spatial deformation cost within each frame with a temporal deformation cost linking consecutive frames (frame numbers 1, 2, 3, ...).

This is a more elegant approach than post-processing (2.1): the best detections are determined during the optimization process itself, and configurations of parts are limited to plausible transitions in time (temporal deformation). These transitions will be learned and embedded in our optimization process to restrict the detections.
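The panel labels for 2.2 (appearance, spatial deformation cost, temporal deformation cost over frames 1, 2, 3) suggest a score of the following form; this is our reconstruction, with symbols modeled on the pictorial-structure score of Yang and Ramanan [1] rather than an equation stated on the poster. Here $p_i^t$ is the location of part $i$ in frame $t$, $E$ is the set of spatial part connections, $\phi$ is an appearance feature, and $\psi$, $\tau$ are spatial and temporal deformation features with learned weights $\alpha_i$, $\beta_{ij}$, $\gamma_i$:

```latex
S(I, p) \;=\; \sum_{t} \Big[
    \underbrace{\sum_{i} \alpha_i \cdot \phi\!\left(I_t, p_i^t\right)}_{\text{appearance}}
  \;+\; \underbrace{\sum_{(i,j) \in E} \beta_{ij} \cdot \psi\!\left(p_i^t - p_j^t\right)}_{\text{spatial deformation cost}}
\Big]
  \;+\; \underbrace{\sum_{t} \sum_{i} \gamma_i \cdot \tau\!\left(p_i^{t+1} - p_i^t\right)}_{\text{temporal deformation cost}}
```

The first two terms are the usual per-frame pictorial-structure score; the last term is the new temporal deformation cost that ties each part's location to its location in the next frame.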

[Figures: part trajectories before and after temporal adjustment.]

Next Steps

Apply the temporal deformation cost in the optimization process. Train a model that captures typical human part transitions in time.

[Figures: head trajectories comparison — part trajectories on video, part trajectories of annotations, head trajectory before and after temporal adjustment, and annotated parts in each frame.]

Post-processing pipeline (2.1): input frame → compute human detection → pick a bundle of n frames → check part transitions within the bundle → keep frames whose parts move consistently in time → refine part locations using temporal information.

[Figures: input frame; immediate output from human detection; temporally consistent detection without part adjustment; temporally consistent detection with part adjustment.]
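The post-processing steps above can be sketched as follows. The detection array format, the bundle size, the displacement threshold, and the moving-average refinement are illustrative assumptions; a real implementation would take per-frame part locations from the detector of [1]:

```python
import numpy as np

def temporal_consistency_filter(detections, bundle=5, max_disp=20.0):
    """Keep frames whose parts move consistently within a bundle.

    detections: array (frames, parts, 2) of per-frame part locations
    from a frame-by-frame human detector. Returns a boolean mask over
    frames: True where the bundle's largest part jump is plausible.
    """
    n = len(detections)
    keep = np.zeros(n, dtype=bool)
    for start in range(0, n, bundle):            # pick a bundle of n frames
        chunk = detections[start:start + bundle]
        if len(chunk) < 2:
            continue
        disp = np.abs(np.diff(chunk, axis=0)).max()  # largest part jump
        if disp <= max_disp:                     # parts move consistently
            keep[start:start + len(chunk)] = True
    return keep

def refine_parts(detections, window=3):
    """Refine part locations with a moving average along the time axis."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(detections, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.empty_like(detections, dtype=float)
    for p in range(detections.shape[1]):         # smooth each part trajectory
        for c in range(2):                       # x and y separately
            out[:, p, c] = np.convolve(padded[:, p, c], kernel, mode="valid")
    return out
```

Frames rejected by the filter correspond to the temporally inconsistent (likely false) detections; the smoothing step stands in for the "part adjustment" that makes the surviving trajectories more precise.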

4. Results

Videos taken from the TRECVID MED11 dataset.

[Figures: for each input frame, the human detection output, the output with temporal consistency, and the output with temporal consistency and part adjustment.]

References

[1] Y. Yang and D. Ramanan. "Articulated Pose Estimation using Flexible Mixtures of Parts." Computer Vision and Pattern Recognition (CVPR), Colorado Springs, Colorado, June 2011.

[2] L. Sigal, A. O. Balan, and M. J. Black. "HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion." International Journal of Computer Vision (IJCV), 87(1-2):4-27, March 2010.