cvpr tutorial: first person vision - university of minnesotahspark... · side wall . side wall...
TRANSCRIPT
CVPR Tutorial: First Person Vision
INSIDE OUT: Riley’s First Date?, PIXAR
Third person camera
External space
What is where? - D. Marr
Person detection
Ground plane, side walls
Object detection, surface normal estimation, object affordance
Semantic segmentation, tracking
First person vision is not simply third person vision with a moving camera.
What is first person vision?
Internal space
We move in order to see and we see in order to move.
- J. J. Gibson
Vanishing line
My orientation
Interaction with me
Person detection, ground plane, side walls
My motion
“First person vision is an embedded human-system symbiosis.”
- Takeo Kanade
First person vision is all about me.
Why is first person vision ideal for human behavior understanding?
[Figure: number of pixels for head pose (HD resolution), proportional to 1/d^2, against distance from camera, d. Distance axis: 0.03m, 0.1m, 1m, 10m, 30m. Pixel axis: 10^2 to 10^6 pixels.]
Third person: target at 3-30m
Second person: target at 0.5-3m
First person: target at < 0.3m
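The 1/d^2 relation on this slide can be illustrated with a pinhole camera model. The sketch below is only an illustration: the function name `head_pixels`, the focal length of 1000 pixels, and the head size of 0.2m are assumptions chosen for the example, not values given in the tutorial.

```python
def head_pixels(distance_m, focal_px=1000.0, head_size_m=0.2):
    """Approximate pixel area covered by a head under a pinhole camera model.

    focal_px and head_size_m are illustrative assumptions, not tutorial values.
    The result is not clipped to the image size, so very small distances
    can give more pixels than an HD frame actually contains.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    # Projected side length of the head in pixels: f * size / distance.
    side_px = focal_px * head_size_m / distance_m
    # Area is the square of the side length, so it scales as 1/d^2.
    return side_px ** 2


if __name__ == "__main__":
    for d in (0.3, 1.0, 3.0, 10.0, 30.0):
        print(f"d = {d:5.1f} m -> about {head_pixels(d):.0f} pixels")
```

Under these assumptions, moving the camera ten times closer gives one hundred times as many pixels on the head, which is the gap the slide draws between third person and first person distances.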
First person, second person, and third person cameras relative to the target
Noninvasiveness
Measurement accuracy: 3D estimation error < 5cm
Prediction Learning
First person vs. Third person
1. Attention Following
Group Attention Following
2. Egocentric Spatial Organization
2.3m, 30cm
Orientation
w/ prior, w/o prior
Egocentric action-object detection
Graphical Representation via Kinematics
Vertices: V1, V2, V3, V4
Vertex attributes: position, orientation, pose, velocity, role
Edges: E12, E13, E14, E23, E24, E34
Edge attributes: distance, relative orientation, relative velocity, social relation
Coach’s note
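One way to picture the graph on this slide is as a small data structure: each vertex carries a person's kinematic state, and each edge carries quantities derived from the two vertices it connects. The sketch below is a minimal illustration under assumptions. The names `Vertex`, `edge_attributes`, and `build_graph` are hypothetical, positions are taken to be 2D, and the formulas for the edge attributes are a plausible reading of the slide labels rather than the tutorial's own method. Pose and social relation are left out because the slide does not say how they are represented.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Vertex:
    name: str           # e.g. "V1"
    position: tuple     # (x, y) in meters (2D is an assumption)
    orientation: float  # heading in radians
    velocity: tuple     # (vx, vy) in meters per second
    role: str           # free-form label


def wrap_angle(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def edge_attributes(a, b):
    """Derive edge attributes from two vertices (assumed definitions)."""
    dx = b.position[0] - a.position[0]
    dy = b.position[1] - a.position[1]
    return {
        "distance": math.hypot(dx, dy),
        "relative_orientation": wrap_angle(b.orientation - a.orientation),
        "relative_velocity": (b.velocity[0] - a.velocity[0],
                              b.velocity[1] - a.velocity[1]),
    }


def build_graph(vertices):
    """Return a fully connected graph as {(name_a, name_b): attributes}."""
    return {(a.name, b.name): edge_attributes(a, b)
            for a, b in combinations(vertices, 2)}


if __name__ == "__main__":
    people = [
        Vertex("V1", (0.0, 0.0), 0.0, (1.0, 0.0), "role A"),
        Vertex("V2", (3.0, 4.0), math.pi / 2, (0.0, 1.0), "role B"),
        Vertex("V3", (6.0, 0.0), math.pi, (-1.0, 0.0), "role C"),
        Vertex("V4", (0.0, 5.0), 0.0, (0.0, 0.0), "role D"),
    ]
    for edge, attrs in build_graph(people).items():
        print(edge, attrs)
```

With four vertices the fully connected graph has six edges, matching E12, E13, E14, E23, E24, and E34 on the slide.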
What can first person cameras tell us about me?
1. Attention
Personal attention: what am I looking at? [Li ICCV13]
Social attention: what are we looking at?
2. Kinematics
Human kinematics I: Where are my body and the object?
Human kinematics II: What does that mean to me?
3. Control (sensorimotor)
Visual Sensorimotor I: How do I control?
3D reconstruction
Visual Sensorimotor II: What do I feel?