Flow Based Action Recognition
Papers to discuss:
• The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001)
• Recognizing Action at a Distance (Efros et al. 2003)
What is an Action?
Action: Atomic motion(s) that can be unambiguously distinguished and that usually have a semantic association (e.g. sitting down, running).
Activity: Several actions performed in succession (e.g. dining, meeting a person).
Event: A combination of activities (e.g. football match, traffic accident).
Action Recognition
• Previously
o action recognition was part of the articulated tracking problem
o or of a generalized tracking problem for directly detecting activities/events
• Novelty
o direct recognition of short-time motion segments
o new feature descriptors: motion history images, motion energy images, Efros' features
Flow Based Action Recognition
Papers to discuss:
• The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001)
• Recognizing Action at a Distance (Efros et al. 2003)
Motivation
Goal
Action: Motion over time
Create a view-specific representation of action
Construct a vector-image suitable for matching against other instances of action
Motion Energy Images
D(x,y,t): Binary image sequence indicating motion locations
Motion Energy Images
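A minimal sketch of a motion energy image, assuming the usual temporal-template definition in which the MEI at time t is the union of the last τ binary motion masks D(x,y,t). The function name, the array layout (frames, height, width), and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def motion_energy_image(D, t, tau):
    """Union of the binary motion masks D[t - tau + 1 .. t].

    D is a boolean array of shape (num_frames, height, width);
    D[k] marks the pixels where motion was detected in frame k.
    """
    start = max(0, t - tau + 1)          # clip the window at the first frame
    return np.any(D[start:t + 1], axis=0)

# Toy sequence: one moving pixel per frame (hypothetical data).
D = np.zeros((4, 3, 3), dtype=bool)
D[0, 0, 0] = True
D[1, 1, 1] = True
D[3, 2, 2] = True

mei = motion_energy_image(D, t=3, tau=3)  # window covers frames 1..3
```

With τ = 3 the motion in frame 0 falls outside the window, so only the pixels from frames 1 and 3 are set in `mei`.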
Motion History Images
Descriptor: Build a 2-component vector image by combining the MEI and the MHI
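A hedged sketch of the motion history image and the 2-component template, assuming the standard recursive update: a pixel is set to τ where motion occurs and otherwise decays by 1 per frame down to 0. Under that update the MEI equals the MHI thresholded above zero. The function names and the scaling of the MHI to [0, 1] are assumptions made for illustration.

```python
import numpy as np

def motion_history_image(D, tau):
    """MHI after the last frame of D.

    D: boolean array (num_frames, height, width) of motion masks.
    Recent motion is bright (tau); older motion fades by 1 per frame.
    """
    H = np.zeros(D.shape[1:], dtype=float)
    for frame in D:
        H = np.where(frame, float(tau), np.maximum(0.0, H - 1.0))
    return H

def temporal_template(D, tau):
    """Stack MEI and MHI into a (2, height, width) vector image."""
    H = motion_history_image(D, tau)
    E = H > 0                      # MEI: any motion within the last tau frames
    return np.stack([E.astype(float), H / tau], axis=0)  # scaling is an assumption

# Toy sequence: motion sweeps left to right across a 1x3 image.
D = np.zeros((3, 1, 3), dtype=bool)
D[0, 0, 0] = D[1, 0, 1] = D[2, 0, 2] = True

H = motion_history_image(D, tau=3)
template = temporal_template(D, tau=3)
```

In the toy example the oldest motion has the lowest value and the most recent the highest, which is how the MHI encodes the direction of movement.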
Matching
• Compute the 7 Hu moments
• Model the 7 moments of each action class with a Gaussian distribution (diagonal covariance)
• Given a new action instance: measure the Mahalanobis distance to all classes. Pick the nearest one.
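The matching steps above can be sketched as follows, assuming each action instance has already been reduced to a 7-dimensional moment vector. The class labels, the synthetic training data, and the small variance floor are hypothetical choices for illustration only.

```python
import numpy as np

def fit_class_models(features_by_class):
    """Fit one diagonal-covariance Gaussian per class.

    features_by_class: {label: array of shape (n_examples, 7)}
    Returns {label: (mean, variance)}, both of shape (7,).
    """
    return {label: (X.mean(axis=0), X.var(axis=0) + 1e-12)  # floor avoids division by zero
            for label, X in features_by_class.items()}

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance when the covariance matrix is diagonal."""
    return float(np.sqrt(np.sum((x - mean) ** 2 / var)))

def classify(x, models):
    """Return the label of the nearest class model."""
    return min(models, key=lambda label: mahalanobis_diag(x, *models[label]))

# Synthetic stand-ins for Hu-moment vectors (not real data).
rng = np.random.default_rng(0)
train = {
    "sitting": rng.normal(0.0, 0.1, size=(20, 7)),
    "running": rng.normal(1.0, 0.1, size=(20, 7)),
}
models = fit_class_models(train)
predicted = classify(np.full(7, 0.9), models)
```

Dividing by the per-dimension variance matters here because the Hu moments differ by orders of magnitude, so an unweighted Euclidean distance would be dominated by the largest moment.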
Image Moments
Translation Invariant Moments
Scale Invariant Moment
7 Hu Moments
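A sketch of the moment chain, using the textbook definitions rather than anything recovered from the slide images: central moments are taken about the centroid (translation invariant), normalized moments divide by a power of the zeroth moment (scale invariant), and the 7 Hu combinations are rotation invariant. The function name and test shape are assumptions.

```python
import numpy as np

def hu_moments(img):
    """Seven Hu moments of a 2-D intensity (or binary) image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (xs * img).sum() / m00, (ys * img).sum() / m00  # centroid

    def eta(p, q):
        # central moment mu_pq, then scale normalization
        mu = (((xs - xc) ** p) * ((ys - yc) ** q) * img).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])

# L-shaped blob, a shifted copy, and a copy rotated by 90 degrees.
shape = np.zeros((12, 12))
shape[2:8, 3:5] = 1
shape[6:8, 3:9] = 1
shifted = np.roll(shape, (3, 2), axis=(0, 1))
rotated = np.rot90(shape)

hu = hu_moments(shape)
```

The shifted and rotated copies should give the same 7 values as the original up to floating-point error. Scale invariance is only approximate on a pixel grid, so it is not checked here.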
Results
Only the left (30°) camera is used as input, and it is matched against all 7 views of all 18 moves (126 total).
Metric: a pooled independent Mahalanobis distance using a diagonal covariance matrix to accommodate variations in the magnitude of the moments.
Results
Two cameras: the minimum sum of Mahalanobis distances between the two input templates and two stored views of an action that have the correct angular difference between them (in this case 90°).
The assumption: we know the approximate angular relationship between the cameras.
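One way the two-camera rule could be coded, assuming the distances from each input template to every stored view are already available as arrays. The stored view angles (0° to 180° in 30° steps), the sign convention of the angular difference, and the planted distances are all hypothetical.

```python
import numpy as np

def two_view_match(d_left, d_right, view_angles, angle_diff=90):
    """Pick the action with the smallest summed distance over valid view pairs.

    d_left[a, v], d_right[a, v]: Mahalanobis distance from the left/right
    input template to stored view v of action a.
    Only pairs (i, j) with view_angles[j] - view_angles[i] == angle_diff count.
    Returns (best_action_index, best_score).
    """
    best = (None, np.inf)
    for i, ang_i in enumerate(view_angles):
        for j, ang_j in enumerate(view_angles):
            if ang_j - ang_i != angle_diff:
                continue
            scores = d_left[:, i] + d_right[:, j]
            a = int(np.argmin(scores))
            if scores[a] < best[1]:
                best = (a, float(scores[a]))
    return best

view_angles = [0, 30, 60, 90, 120, 150, 180]   # assumed layout of the 7 views
d_left = np.full((18, 7), 10.0)
d_right = np.full((18, 7), 10.0)
d_left[5, 1], d_right[5, 4] = 1.0, 2.0         # action 5 fits views 30 and 120
d_left[2, 0], d_right[2, 0] = 0.5, 0.5         # decoy: same view, not 90 apart

best_action, best_score = two_view_match(d_left, d_right, view_angles)
```

The decoy shows why the angular constraint matters: action 2 has the smallest individual distances, but no pair of its views is 90° apart, so it does not win.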
Flow Based Action Recognition
Papers to discuss:
• The Representation and Recognition of Action Using Temporal Templates (Bobick & Davis 2001)
• Recognizing Action at a Distance (Efros et al. 2003)
The Goal
Recognize medium-field human actions
Humans a few pixels tall
Noisy video
System Flow
• Track and stabilize the human figure
o Simple normalized-correlation based tracker
• Compute pixelwise optical flow
o On the stabilized space-time volume
• Build the descriptor
o More on this later...
• Find the nearest neighbor (NN)
Descriptor
What are good features for motion?
• Pixel values
• Spatial image gradients
• Temporal gradients
Problems: appearance dependent and no directionality information on motion
• Pixel-wise optical flow
Captures motion independent of appearance
Descriptor
The key idea is that the channels must be sparse and non-negative
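A sketch of how sparse, non-negative channels can be built from a flow field, assuming the half-wave rectification described in the Efros et al. paper: the horizontal and vertical flow components are each split into a positive and a negative part, and every channel is blurred. The Gaussian width, the zero-padded blur, and the random flow field are illustrative assumptions, and the paper's normalization step is left out.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(channel, sigma):
    """Separable Gaussian blur with zero padding.

    The image must be at least as large as the kernel in both dimensions.
    """
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, channel, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def motion_channels(fx, fy, sigma=1.0):
    """Four non-negative channels (Fx+, Fx-, Fy+, Fy-), each blurred."""
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([blur(c, sigma) for c in channels])

# Random stand-in for an optical flow field on a stabilized 16x16 window.
rng = np.random.default_rng(0)
fx = rng.normal(size=(16, 16))
fy = rng.normal(size=(16, 16))
ch = motion_channels(fx, fy)
```

Rectifying before blurring keeps opposite motions from cancelling: blurring the signed flow directly would average a leftward and a rightward vector to zero, whereas the separate channels keep both.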
Similarity
T: motion length
I: frame (size)
c: # of channels
a,b: motion descriptors for two different sequences
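A hedged sketch of the similarity using the symbols above, assuming the score for frames i and j sums the products of matching channel values over all pixels in I, all c channels, and a window of T frames centered on (i, j). The array layout (frames, channels, height, width), the odd T, and the random descriptors are assumptions.

```python
import numpy as np

def frame_similarity(a, b):
    """Frame-to-frame similarity: sum over channels and pixels of a * b.

    a: (Na, C, H, W), b: (Nb, C, H, W)  ->  matrix of shape (Na, Nb)
    """
    A = a.reshape(len(a), -1)
    B = b.reshape(len(b), -1)
    return A @ B.T

def motion_similarity(a, b, T):
    """Sum frame similarities along the diagonal over a window of T frames (T odd)."""
    S_ff = frame_similarity(a, b)
    na, nb = S_ff.shape
    S = np.zeros_like(S_ff)
    half = T // 2
    for t in range(-half, half + 1):
        # rows/columns for which frames i + t and j + t both exist
        i0, i1 = max(0, -t), min(na, na - t)
        j0, j1 = max(0, -t), min(nb, nb - t)
        if i1 <= i0 or j1 <= j0:
            continue
        S[i0:i1, j0:j1] += S_ff[i0 + t:i1 + t, j0 + t:j1 + t]
    return S

# Random non-negative stand-ins for motion descriptors (5 frames, 4 channels).
rng = np.random.default_rng(0)
a = rng.random((5, 4, 3, 3))
S_ff = frame_similarity(a, a)
S = motion_similarity(a, a, T=3)
```

Summing along the diagonal is the same as convolving the frame-to-frame matrix with a T-by-T identity kernel, so two frames score highly only if their neighbors in time also match.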
Similarity
Classification
• Construct similarity matrix as outlined.
• Convolve with the temporal kernel.
• For each frame of the novel sequence, the maximum score in the corresponding row of this matrix will indicate the best match to the motion descriptor centered at this frame.
• Classify this frame using a k-nearest-neighbor classifier: find the k best matches from labeled data and take the majority label.
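The voting step can be sketched as below, assuming the temporally smoothed similarity matrix is already computed, with one row per frame of the novel sequence and one column per labeled frame. The function name, labels, and scores are hypothetical.

```python
import numpy as np
from collections import Counter

def classify_frames(S, labels, k):
    """Label each novel frame by majority vote over its k best matches.

    S: (num_novel_frames, num_labeled_frames) similarity matrix,
       higher means more similar.
    labels: one action label per labeled frame (column of S).
    """
    predictions = []
    for row in S:
        top_k = np.argsort(row)[::-1][:k]          # indices of the k highest scores
        votes = Counter(labels[j] for j in top_k)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

S = np.array([[0.90, 0.80, 0.10, 0.70],
              [0.20, 0.10, 0.95, 0.30]])
labels = ["run", "run", "walk", "walk"]
predictions = classify_frames(S, labels, k=3)
```

With k = 1 this reduces to taking the row maximum, which matches the best-match rule in the third bullet.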
Results
Ballet (16 classes):
• Clips of motions from an instructional video.
• Professional dancers, two men and two women.
• Performing mostly standard ballet moves.
Tennis (6 classes):
• Two amateur tennis players outdoors (one player for testing, one for training).
• Each player was videotaped on different days in different locations with slightly different camera positions.
• Players about 50 pixels tall.
Football (8 classes):
• Several minutes of a World Cup football game from an NTSC video tape.
• Wide angle of the playing field.
• Substantial camera motion and zoom.
• About 30-by-30 noisy pixels per human figure.
Results
Values on the diagonals:
Ballet (K=5, T=51): [.94 .97 .88 .88 .97 .91 1 .74 .92 .82 .99 .62 .71 .76 .92 .96]
Tennis (K=5, T=7): [.46 .64 .7 .76 .88 .42]
Football (K=1, T=13): [.67 .58 .68 .79 .59 .68 .58 .66]
Do As I Do Synthesis
Given a “target” actor database T and a “driver” actor sequence D, the goal is to create a synthetic sequence S that contains the actor from T performing actions described by D.
Extensions to MHI
• Alper Yilmaz, Mubarak Shah, "Actions sketch: a novel action representation," Computer Vision and Pattern Recognition, 2005.
• Yan Ke, Rahul Sukthankar, Martial Hebert, "Volumetric Features for Event Recognition in Video," in ICCV 2007.