action recognition based graph cut

Action Recognition using Graph-Cut (I)

J.Iveel

2014-9-23

Intro

• We propose recognizing human actions in video using a Graph-Cut approach.

• The algorithm has two stages:

– Pre-GraphCut: the input video segment S is converted into a graph representation Gs(V,E).

– Pro-GraphCut: given Gs(V,E) and the optimal action-category labels Lv for its nodes V (the output of Graph-Cut), select the set of sub-graphs where the action(s) of interest might have happened.

Some Notation

• “Video segment”, S, refers to a set of local feature points extracted at locations X and described by descriptors D:

• “Confidence score” refers to the likelihood of class label l given observation o:

Pre-GraphCut

• Converting video segment into graphical representation requires:

(1) Breaking the whole video segment S into spatio-temporal grids. Each grid volume is a node Vi, connected to its neighbors by edges Ei in graph Gs.

(2) Assigning a confidence score to each node Vi.
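Step (1) can be illustrated with a minimal sketch. It assumes a regular grid with 6-connectivity (each cell linked to its next neighbor along x, y, and t); the function name, the grid resolution, and the connectivity are illustrative choices, not taken from the slides.

```python
import numpy as np

def build_grid_graph(points, grid_shape, extent):
    """Assign feature points to grid cells and list 6-connected edges.

    points     : (n, 3) array of (x, y, t) feature locations
    grid_shape : (gx, gy, gt) cells per axis (hypothetical resolution)
    extent     : (width, height, n_frames) of the video segment
    """
    points = np.asarray(points, dtype=float)
    shape = np.asarray(grid_shape)
    cell_size = np.asarray(extent, dtype=float) / shape
    # Cell index along each axis, clipped so border points stay inside.
    idx = np.minimum((points // cell_size).astype(int), shape - 1)
    node_of_point = np.ravel_multi_index(tuple(idx.T), grid_shape)

    # Edges connect each cell to its next neighbor along x, y and t.
    node_ids = np.arange(np.prod(grid_shape)).reshape(grid_shape)
    edges = []
    for axis in range(3):
        moved = np.moveaxis(node_ids, axis, 0)
        edges.append(np.stack([moved[:-1].ravel(), moved[1:].ravel()], axis=1))
    return node_of_point, np.concatenate(edges)
```

For example, a 2x2x2 grid yields 8 nodes and 12 edges, and every feature point receives the index of the cell that contains it.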

Node Confidence-Score

• The most challenging problem is (2), assigning a confidence score to each node:

– A node is simply the set of feature points within a grid volume:

– Therefore, node confidence can be defined by an unknown function, g, over the feature points inside the node.

Node Confidence-Score

• The naïve approach is to find a confidence score for each feature point inside the node and accumulate these scores to get the node score:

Next, let us find the feature confidence score, i.e., the likelihood of class l given local feature fj.
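The naïve accumulation could look like the sketch below. It assumes each feature point already carries a confidence score and the index of the node it falls into; both input names are hypothetical.

```python
import numpy as np

def node_scores_by_sum(feature_scores, node_of_point, n_nodes):
    """Sum per-feature confidence scores into one score per node.

    Nodes that contain no feature points get a score of 0.
    """
    return np.bincount(node_of_point, weights=feature_scores, minlength=n_nodes)
```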

Feature Confidence-Score (1)

• The goal is to measure:


• Construct a BOV (bag-of-visual-words) histogram for each test video segment, using centroids C:

• Train a binary linear SVM to produce a weight vector (a linear combination of the support vectors) for class label l:


• Given a feature point from a test segment, its confidence score is:

(1) Hard assignment:

(2) N-soft assignment:
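The formulas for the two assignments did not survive in this transcript, so the sketch below is only one plausible reading. It assumes the linear SVM weight vector has one weight per codeword, that hard assignment takes the weight of the nearest centroid, and that N-soft assignment averages the weights of the N nearest centroids with Gaussian vote weights. The function name, the Gaussian kernel, and `sigma` are all hypothetical.

```python
import numpy as np

def feature_confidence(descriptors, centroids, w, n_soft=1, sigma=1.0):
    """Per-feature confidence from a linear SVM trained on BOV histograms.

    ASSUMPTION: w[k] is the SVM weight of codeword k, so a feature's score
    is the weight of the codeword(s) it votes for. n_soft=1 is hard assignment.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    w = np.asarray(w, dtype=float)
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_soft]        # (n, n_soft)
    near_d2 = np.take_along_axis(d2, nearest, axis=1)
    # Gaussian vote weights (hypothetical choice), normalized per feature.
    votes = np.exp(-(near_d2 - near_d2[:, :1]) / (2 * sigma ** 2))
    votes /= votes.sum(axis=1, keepdims=True)
    return (votes * w[nearest]).sum(axis=1)
```

With `n_soft=1` the vote weight is exactly 1, so the function returns the SVM weight of the nearest codeword.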

Experiment: Feature Confidence (1)

• Hard-Assignment case:

Experiment: Feature Confidence (2)

• N-Soft Assignment case:

Node Cost-Score (1)

• The Graph-Cut framework minimizes the total penalty (cost) over single nodes and pairs of neighboring nodes for a node label configuration L:

• The node cost score is inversely related to the likelihood, or confidence score:
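To make the objective concrete, the sketch below evaluates the total cost of a given labeling. It assumes a Potts pairwise term (a constant penalty whenever two neighboring nodes disagree); the slides do not state which pairwise term was used, and the function only scores a labeling, it does not perform the minimization.

```python
import numpy as np

def total_cost(unary, edges, labels, smoothness=1.0):
    """Cost of a labeling: per-node costs plus a Potts penalty on edges.

    unary  : (n_nodes, n_labels) cost of giving each label to each node
    edges  : (n_edges, 2) pairs of neighboring node indices
    labels : (n_nodes,) label configuration L
    ASSUMPTION: Potts pairwise term weighted by `smoothness`.
    """
    unary = np.asarray(unary, dtype=float)
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    data_term = unary[np.arange(len(labels)), labels].sum()
    disagreements = np.count_nonzero(labels[edges[:, 0]] != labels[edges[:, 1]])
    return data_term + smoothness * disagreements
```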

Node Cost-Score (2)

• Assume the node confidence score is the sum of its feature point scores (using hard assignment):

• The following inverse relationships were considered for deriving the node cost score:

(1) NLog (negative log-likelihood)

(2) Norm (negative normalized confidence score)

(3) Naive (negative raw confidence score)

Method 1: NLog

• Probabilistic interpretation: Platt [1] showed that an SVM confidence score can be interpreted probabilistically by fitting a parametric sigmoid to it:

• Negative log-likelihood: In an MRF (Graph-Cut), cost values are often associated with the negative log-likelihood of the measurement noise. Similarly, once confidence values are translated into probabilities, the negative-log operation is applied to derive the cost score:
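A minimal sketch of the NLog cost, using the standard form of Platt's sigmoid, P = 1 / (1 + exp(A·f + B)), where f is the confidence score. The default values A = -1 and B = 0 are placeholders chosen for illustration; the slides do not list the defaults that were used.

```python
import numpy as np

def nlog_cost(scores, A=-1.0, B=0.0):
    """Cost = -log P, with P = 1 / (1 + exp(A * score + B)).

    A and B are Platt's sigmoid parameters (placeholder defaults).
    """
    z = A * np.asarray(scores, dtype=float) + B
    # -log(1 / (1 + e^z)) = log(1 + e^z), evaluated without overflow.
    return np.logaddexp(0.0, z)
```

With a negative A, a higher confidence score gives a higher probability and therefore a lower cost.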


Method 2: Norm

• The confidence score is scaled to the range [0, 1]. The cost value is then the negative of the scaled score:
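One way to realize this is sketched below. It assumes min-max scaling over the scores being compared; the slides do not say how the scaling was done, so treat that choice as hypothetical.

```python
import numpy as np

def norm_cost(scores):
    """Min-max scale scores to [0, 1] and negate, so costs lie in [-1, 0]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        # All scores equal: no preference between them (assumption).
        return np.zeros_like(scores)
    return -(scores - scores.min()) / span
```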

Method 3: Naive

• The cost value is directly associated with the negative of the raw confidence score:

Experiment: Node Cost Score (1)

• With default parameters, the Naive approach surprisingly outperforms the other two methods. The worst performance is observed with the Norm method.

• The NLog approach performed worse than I expected. The reason may lie in the tuning parameters, A and B, of the sigmoid equation:

• In particular, parameter A controls the slope of the sigmoid. Let's inspect this parameter's effect on performance.
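The role of A can be seen in a small sketch based on Platt's sigmoid, P = 1 / (1 + exp(A·f + B)). With B = 0 the slope at f = 0 is -A/4, so a larger |A| pushes the same score toward a more extreme probability. The values of A below are hypothetical; the slides do not list the ones that were tried.

```python
import numpy as np

def platt_probability(score, A, B=0.0):
    """Platt's sigmoid: P = 1 / (1 + exp(A * score + B))."""
    return 1.0 / (1.0 + np.exp(A * score + B))

# Hypothetical sweep over the slope parameter for one fixed score.
for A in (-0.1, -1.0, -10.0):
    p = platt_probability(0.5, A)
    print(f"A={A:6.1f}  P(score=0.5)={p:.3f}  cost={-np.log(p):.3f}")
```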


• NLog approach: Sigmoid parameter A's effect on the performance


Num  Method                       Avg. Recognition
1    NLog (optimized parameter)   96.8 %
2    Norm                         95.8 %
3    Naive                        93.5 %

Conclusion

• These slides explored two main questions, both related to constructing the video graph G, proposed a few methods for each, and evaluated them on the KTH dataset.

– (i) Assigning a confidence score at the feature level

● Soft assignment

● Hard assignment

– (ii) Assigning a confidence score at the node level

● NLog (negative log-likelihood)

● Norm

● Naive

Future Work

• Future work will explore:

– Alternative construction of the video graph:

● Instead of a predefined grid, use supervoxels to choose node regions.

– Single-feature confidence score:

● Instead of BOV, use the VLAD descriptor to obtain a more discriminative feature representation.
