video pattern mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/isimp04/file/hk-isimp-2004.pdfs.-f....
TRANSCRIPT
![Page 1: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/1.jpg)
S.-F. Chang, Columbia U. 1
1S.F. Chang
Prof. Shih-Fu Chang
Digital Video and Multimedia LabColumbia University
October, 2004http://www.ee.columbia.edu/dvmm
Video Pattern Mining
ISIMP 2004
2S.F. Chang
Joint workLexing Xie, Lyndon Kennedy, Winston Hsu, Dong-Qing Zhang (Columbia U.)Ajay Divakaren and Huifan Sun (MERL)John R. Smith and Ching-Yung Lin (IBM)
Supported in part by MERLARDA VACE phase IIIBM-Columbia Semantrix Project
![Page 2: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/2.jpg)
S.-F. Chang, Columbia U. 2
3S.F. Chang
Temporal Patterns Everywhere …
[Iyengar99][www.music-scores.com][Purdue Stat490B]
Internet traffic patterns
music motifs
QESVADKMGMGQSGVGALFN
QTKTAKDLGVYQSAINKAIH
QTELATKAGVKQQSIQLIEA
QTRAALMMGINRGTLRKKLK
λ Cro
λ Rep
434 Cro
Fis Protein
protein motifs
There are interesting patterns representing useful information in different domains.
4S.F. Chang
Example Patterns in Video
time
financial news, CNN
98-06-02
98-06-07
98-05-20
anchor interview text/graphics footage …
soccer video
play start interception attemptspass attempt at the goal break
Play level break
View levelbaseball
![Page 3: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/3.jpg)
S.-F. Chang, Columbia U. 3
5S.F. Chang
Patterns Across News Sources
Story/event often re-occur within or across channelsRelated stories share intrinsic audio-visual patterns -- forming news thread Tendency how a story is repeated at different times and by different channels – forming higher-level patterns
Source 1
Source 2
Source n
Event 1 Event 2 Event nReal world ... ...
...
...
...
:
Event n
Thread 1 Thread 2 Event nThreads ... ...Thread n
Example‘Marines Extend Duty’
Chinese News
CNN News
6S.F. Chang
Types of Temporal PatternsDense and stochastic
Sparse and deterministic
Temporal association of eventsIf A occurs in channel 1, then B occurs within time T in channel 2, e.g. recurrent news topics
Others …
soccer video
play start interception attemptspass attempt at the goal breakFocus of the talk
![Page 4: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/4.jpg)
S.-F. Chang, Columbia U. 4
7S.F. Chang
(Grand) Challenges for Pattern MiningThere are many patterns in video at different levels.
How do we discover them (semi)-automatically?Do they correspond to any semantic meanings?What are the underlying features/structures of each pattern?Which patterns are more predictable and thus detectable?
Why do these matter?Explore structures and knowledge in a domain without a priori manual definition
discover interesting, meaningful concepts & eventsdefine normal states and outliers
Facilitate scalability to new content and domains
8S.F. Chang
Challenge: Unsupervised Pattern Discovery
Given a new domain/corpus, discover patterns automaticallyE.g., News, consumer, surveillance, and personal life log
Technical Issues:Find appropriate spatio-temporal statistical models to represent and explain patternsFind locations in the video stream that fit such models
… …… timeIssues
What’s the adequate class of models?How to learn suitable model structures?How to select good features?
vs.
vs.
![Page 5: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/5.jpg)
S.-F. Chang, Columbia U. 5
9S.F. Chang
Finding the right class of models …Lessons from prior worksupervised + unsupervised
Many video temporal patterns are characterized byDynamics and Features
0 2 4 6 8 10 120
0.1
0.2
0.3
0.4
20
40
60
(sec)
dominant color ratio
Motion intensity
play
ColorMotionObjects Audio…
DistinctiveFeature
Distributions
… Play BreakRecurrent Soccer patterns
Close-up
Wide Zoom-inDistinctiveTransitionDynamics
Distinctive patterns are characterized by state-dependent transitions and features.Like speech recognition, HMM and variants may serve as promising candidates.
![Page 6: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/6.jpg)
S.-F. Chang, Columbia U. 6
11S.F. Chang
A simple test: HMM Parsing of Soccer Video
LabeledSegments
(color, motion, audio, objects)
TrainHMM familiesFor soccer
plays/breaks
learn high-level transition
prob.
(training phase)‘play’
‘break’
Model library
Test Data Feature
Sequence
Likelihood Estimationfor each Model
High-Level Play-BreakOptimal sequence
(Dynamic Programming)
(testing phase)
… Play Break
12S.F. Chang
Supervised HMM models are effective for parsing structural patterns in sports video
81.7%89.6%89.6%79.9%Espana
89.6%85.3%85.3%79.9%KoreaB
79.8%84.3%84.3%78.1%KoreaA
80.6%82.5%82.5%87.2%Argentina
EspanaKoreaBKoreaAArgentina
Training SetTest set
Sports video from various countriesEvaluation by Cross Validation
• The good classification provides preliminary evidence supporting HMM model and the A-V features Demo• Avg. Play-Break Classification Rate: 83.5% vs. 60% of blind guessing• Boundary timing accuracy: 62% within 3 seconds
(Xie et al ‘02)
![Page 7: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/7.jpg)
S.-F. Chang, Columbia U. 7
13S.F. Chang
Temporal Pattern Mining with a Single Model: Hierarchical HMM
Intuitive Representation for Video PatternsPatterns occur at different levels following different transition models States in each level may correspond to different semantic concepts
time… ……
top-level states
running pitching
break
bottom-level states
bench close up
batteraudiencefield bird view
pitcher1st base
BaseballExample
time… ……
top-level states
Topic/Genre 1
Topic/Genre 2
bottom-level states
Interview,Financialdata
Reporteranchor Sports footage
NewsReporting
(Xie et al ‘02)
14S.F. Chang
[Clarkson et al ’99]Long personal videos captured by wearable
deviceColor histogram and MFCC features at 10Hzdiscovered pattern sequence has high
correlation with ground truth events
Applying HMM to unsupervised mining
A simplified structure without higher level state transition
[Naphade et al ’02]Films and talk shows Color and edge histogram, MFCC and energy features at 30Hzdiscover recurrent patterns of explosion and applause
![Page 8: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/8.jpg)
S.-F. Chang, Columbia U. 8
15S.F. Chang
Hierarchical HMM
Dynamic Bayesian Network representation
Tree-Structured representation
Ft+1 bottom-level hidden states
top-level hidden states
observations
Gt
Ht
Yt
Ft
Gt+1
Ht+1
Yt+1 level-exiting
states
g1 g3
g2
h11
h12
h21 h22
h32
h31
[Fine, Singer, Tishby ‘98][K. Murphy, ’01] [Xie et al ’02]
Flexible control structure (bottom-up control with exit state)Extensible to multiple levels and distributionsEfficient inference technique available
Complexity O(D·T·QαD), α=1.5 to 2Application in unsupervised discovery has not been explored
Questions: how to find right model structures and feature sets?
16S.F. Chang
The Need for Model Selection
Different domains have different descriptive complexities.
talk show
news
soccer
?
?
?
![Page 9: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/9.jpg)
S.-F. Chang, Columbia U. 9
17S.F. Chang
Model Selection with RJ-MCMCDefault HHMM
EM
Possible Model Operations
Split
MergeSwap
(move, state)=(split, 2-2)
new modelAccept proposal?
Prob. Thresh. for accepting change= (eBIC ratio)x(proposal ratio)xJ
next iteration
stop
[Green95][Andrieu99]
[Xie ICME03]
Optimum points:balance (data
fitness) + (model complexity)
18S.F. Chang
Which Features Shall We Use?
color histogram
edge histogram
MFCCzero-crossing
rate delta energy
nasdaqjenningstonight lawyerclinton
juriaccus
tf-idf
Spectral rolloff
pitch
keywords
Gaborwavelet
descriptors
zernikemoments
outdoors?
people?
LPC coeff.
vehicle?
motion estimates
… ……time
?
logtf-entropyface?
text
![Page 10: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/10.jpg)
S.-F. Chang, Columbia U. 10
19S.F. Chang
Issues of Feature Selection
X1
X2
X3
X4
[Zhu et.al.’97][Koller,Sahami’96][Ellis,Bilmes’00]…
Criteria: (1) find the feature-feature or feature-concept relevance(2) eliminate any redundancy
Target concept Y1
32
4
Temporal samples not independent Temporal sequence
No target concept defined a priori
But new problems for temporal sequence mining:
Unsupervised
Standard process for Supervised learning:Identify a good subset of observations to improve model generalization performance and reduce computation.
[Xing, Jordan’01]
20S.F. Chang
A Divide and Conquer Approach:Feature Selection for Temporal Pattern Mining
Feature pool
[Koller’96] [Xing’01][Xie et al. ICIP’03]
Multiple consistentfeature sets
wrapper
Mutual information
1 2
3
Ranked feature sets with redundancy eliminated
filter
Feature sequences
State sequences
q1=“abaaabbb”q2=“BABBBAAA”I(q1,q2)=1
Markov Blanket
{X?, Xb} ⇒ q1=“abaaabbb”{Xb} ⇒ q1’=“abaaabbb”= q1
Eliminate X?
![Page 11: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/11.jpg)
S.-F. Chang, Columbia U. 11
21S.F. Chang
Results: on Sports Videos
baseball
videos featuresHHMM
audio: energies, zero-crossing rate, spectral rolloff
visual: dominant color ratio, camera translation motion
soccer
patterns
HHMM top-level label sequence
vs. play/break?
22S.F. Chang
A Simple Test: Mining Baseball Videos
videosbaseball
HHMM + feature selection
dominant color ratiohorizontal motion (vertical motion)
(1)
(2)
(3) …
audio volume low-band energy
…
Ranking based on BIC
score
Semantics of patterns?
Correspond to play/break with 82.3% accuracy
(demo)
?
![Page 12: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/12.jpg)
S.-F. Chang, Columbia U. 12
23S.F. Chang
Unsupervised Mining not less Effective than Supervised Learning
Fixed features {DCR, MI}, MPEG-7 Korean Soccer video
* DCR=‘dominant-color-ratio’, MI=‘motion-intensity’, Mx=‘horizontal-camera-pan’
Automatic selection of both model and features
75.2%
74.8%
82.3%
Correspondence w. Play/Break
75.2±1.3%
75.0±1.2%
73.1±1.1%
64.0±10.%
75.5±1.8%
Correspondence w. Play/Break
Other examples of video pattern discovery (e.g., Jojic and Frey)
Graphical Model
Nebojsa Jojic, Brendan Frey, ‘A Generative Model for 2.5D Vision: Estimating appearance, transformation, illumination, transparency and occlusion’, IJCV 2002
Hidden
Observed
Similar principle in using video generative modelsAssume latent spatial-temporal models and transformationsAV features are observations of the generation processUnsupervised inferencing
![Page 13: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/13.jpg)
S.-F. Chang, Columbia U. 13
25S.F. Chang
Video mining methods seem promising in finding hidden patterns in video.But how to find the meanings of the patterns? Approach
fuse the metadata streams when available.Multi-modal fusion
26S.F. Chang
OutlineThe problemUnsupervised pattern discovery with HHMMFinding meaningful patterns
With text associationBy multi-modal fusion
Summary
politics goal!snow… …
![Page 14: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/14.jpg)
S.-F. Chang, Columbia U. 14
27S.F. Chang
Towards Meaningful PatternsWhat are the meanings of the hidden patterns?Manual association feasible only if meanings are fewand known.Metadata come to the rescue.
time… ……
? ??
???
food
ASR/VOCR
iraq nasdaqclintonpeter jennings snowtonight juri comput damagaccuslawyerwin
28S.F. Chang
Associating Patterns with TextHHMMvideos AV features
and conceptsPatterns/tokensnewslabel-word associationwords (ASR/VOCR)
![Page 15: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/15.jpg)
S.-F. Chang, Columbia U. 15
Co-Occurrence of HHMM Labels & Words
Likelihood ratio:independent
strong correlation
strong exclusion 0
1
1+
…clinton
damage
foodgrand
jury……nasdaq
reportsnow
tonightw
in…wordlabel
…
damagenasdaqdow
clintonfoodjennings compute
temparaturewin
... ...... Words
from English
HHMM labels from Language “X”
“correlation” between HHMM labels and wordsco-occurrence counts.
)(.,/),()|(,.)(/),()|(wCwqCqwCqCwqCwqC
==
Conditional prob.
Not very informative:no distinctive mapping!
Problems with the Co-occurrence StatisticsStory#News VideoHHMM labelASR token
q1 q2 q2w2 w2w1
1 2 3“true mapping
prob.”“observed
co-occurrence”
Applied to Image Annotation[Dyugulu et. al. 2002]
image ~ blobs {b1, …, bn}~ words {w1, …, wn}
Machine Translation[Brown’93]
Her dog is typing on my computer.
Son chien dactylographie sur mon ordinateur.
True mapping
dactylograph*
e=‘computer’
chien
ordinateur
sur
mon/ma
son/sa
f
Co-occurrence
q1w1
![Page 16: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/16.jpg)
S.-F. Chang, Columbia U. 16
31S.F. Chang
Inferring the translation probability between visual tokens and words
Prob. of the hidden alignmentsAuxiliary Function for E-M
( ) ( )
1 1 1( ; ) ( | , , ) log[ ( ) ( | )]
n nM LNold old
nj nj ni nj nj nin j i
Q p a i w b p a i t w w b bθ θ θ= = =
= = = = =∑∑∑Sum over the true joint prob.
*find optimal argmax log ( | , )p w bθ
θ θ=
1 1 1log ( | , ) log( ( ) ( | ))
n nM LN
nj nj nin j i
p w b p a i t w w b bθ= = =
= = = =∑∑ ∑
32S.F. Chang
The problem: Co-occurrence “un-smoothing”.
Translation between AV Tokens & Words
Solve with EM [Brown’93]
independent
strong correlation
strong exclusion 0
1
1+
L=
(assume q’s are ind.)
(assume w’s are ind.)
Mappings become much more selective
![Page 17: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/17.jpg)
S.-F. Chang, Columbia U. 17
33S.F. Chang
ExperimentsTRECVID2003 news
44 half-hour videos, ABC/CNNRaw audio-visual features: color, motion, speech12 semantic concepts for each shot [IBM-TREC’03](weather, people, sports, non-studio, nature-vegetation, outdoors, news-subject-face, female speech, airplane, vehicle, building, road)ASR transcript
HHMM on concept confidence scoresConstruct 10 HHMM models with automatically determined size and featuresMachine translation methods to find association between HHMM patterns and words
34S.F. Chang
Example Correspondences
indoor, news-subject-face,
building
people, non-studio-
setting
Visual Concept
Clinton-Jones (Recall 45%,
Precision 15%) Iraqi-weapon (Recall 25%,
Precision 15%)
murder, lewinski, congress, allege, jury, judge, clinton, preside, politics, saddam, lawyer, accuse, independent, monica, charge, …
(9,1)
El-nino Storm ’98
(recall 80%)
storm, rain, forecast, flood, coast, el, nino, administer, water, cost, weather, protect, starr, north, plane, …
(6,3)
ManualInspection
Translated WordsHHMM label
[Xie et al. ICIP’04]
Obtained with SVM classifiers
[IBM’03]
(m, q): model # m state # q
Lexicon obtained by shallow parsing of keywords from speech recognition output.
Automatic ManualInspection
![Page 18: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/18.jpg)
S.-F. Chang, Columbia U. 18
35S.F. Chang
Can’t we find such patterns using conventional clustering?(HHMM vs. K-means comparison)
independent
strong correlation
strong exclusion 0
1
1+
L=
HHMM provides unambiguous association between discovered patterns and words
36S.F. Chang
OutlineThe problemUnsupervised pattern discovery with HHMMFinding meaningful patterns
With text associationBy multi-modal fusion
Summary
politics goal!snow… …
- Versatile- Multi-modal- Meaningful- Knowledge-adaptive
![Page 19: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/19.jpg)
S.-F. Chang, Columbia U. 19
37S.F. Chang
Is MT the right paradigm?
Potential Problems:Words topical meanings
Temporal correspondence between words and pattern tokens may not exist.Audio-visual features and Text are processed separately and then fused at a late stage.
HHMMvideos features patterns
news
audio-visual label-word association
words from ASR
≠
38S.F. Chang
Change to Joint AVT Pattern Mining
Instead, both the AV pattern tokens and words should be used to jointly define hidden semantic states.A multi-modal data fusion problem [AVSR, multi-modal interaction]Aiming at recovering the semantics [TDT 1998-2004]
videos features multi-modal pattern
sequencenews
audio visualtext nasdaq
tonightclinton
juri
An early fusion engine
![Page 20: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/20.jpg)
S.-F. Chang, Columbia U. 20
39S.F. Chang
Layer Dynamic Mixture Model for Produced Videos
time……
nasdaqjenningstonight clinton
juri
text
audio
visual
observationstokenization
PLSA
HMM
HHMM
meaningful patterns?
Extracting High-level Concepts from Text – pLSA
Use data-driven analysis to find concept association and latent semantic topicsUse the syntactic story structures of video to define co-occurrenceUse graphics model statistical inferencing to discover latent semantics
Not just counting occurrences
nasdaqjennings
tonight lawyerclintonjuri
accus
nasdaqjennings
tonight lawyerclintonjuri
accus
a
newshowever
t……storiesshots …
?latent
semantics y
news stories d
words x
![Page 21: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/21.jpg)
S.-F. Chang, Columbia U. 21
Some semantic clusters from text pLSA
saddamiraqbaghdadweaponhusseinstrikesecure … …
goldolympics… …
jury lewinskistarrgrand accusation sexual independent water monicainvestigationpresident… …
temperaturerain coast snow el heavy northern stormforecasttornadopressureeastfloridaninogulfweather… …
downasdaqindustrialaveragewalljonesgaintrade… …
cancer increase secure temperaturetexasaccusation chance nasdaqpressure center … …
cancer africatemperaturemovie coast center heavy research rain strike … …
“financial”
“investigation”
“weather”“iraq”
“olympics”
“bad” clusters
42S.F. Chang
Layered Dynamic Mixture Model (LDMM)[Xie’TR04]
[Oliver’02][Stork’96][Nefian’02]
text
audio
visual
observations X
mid-level labels Y
time
Inference: EM
high-level clusters Z
What do the highest-level patterns represent?Conjecture: News Topics with salient multimedia and temporal cues!
![Page 22: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/22.jpg)
S.-F. Chang, Columbia U. 22
43S.F. Chang
Linking Multi-Source Videos, Web Pages
Source # 1
ThreadingNews Stories
News Video Sources
Source # 2
Source # 3
Question: does the discovered thread correspond to distinct semantics?• no ground truth• tentatively use NIST TDT topics
44S.F. Chang
Fusion Results Evaluated by Text TDT Topics & Metrics
http://www.nist.gov/speech/tests/tdt/
1. Story Segmentation - Detect changes between topically cohesive sections 2. Topic Tracking - Keep track of stories similar to a set of example stories 3. Topic Detection - Build clusters of stories that discuss the same topic 4. First Story Detection - Detect if a story is the first story of a new, unknown topic 5. Link Detection - Detect whether or not two stories are topically linked
•“NIST TDT research develops algorithms for discovering and threading together topically related material in streams of datasuch as newswire and broadcast news in both English and Mandarin Chinese. “
• Current TDT use text or ASR only.
TDT Tasks
![Page 23: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/23.jpg)
S.-F. Chang, Columbia U. 23
45S.F. Chang
Topic Detection and Trackinghttp://www.nist.gov/speech/tests/tdt/
Topic linking
?
The tracking of known topics -- classification;The detection of unknown topics – clustering;The detection of pairs of stories on the same topic (links).
[TDT04 evaluation plan v1.1]
46S.F. Chang
3 7 2931 37 48 56 62 66 72 76 860
1000
2000
3000
115 207
1014
0621 13
243
971 56 37 10
531 72 12 8 9 28 18
10 0 1 67 3 12
19 1 4 9 2 74 41 8 12
649 0 7 69 8
2354
179
7 1978
631
070
709
16 135
49 1 37 125
11 63 9 338
729 1 14
513 9 0 7 10 1
986
150
30 6 19 105 23
03 2 13
133 47 14 2 28 6 18 3 16 2 12
48 3 1 65 8 8 28 1 1 21 18 5 4
topics
Elections
Scandals Hearings
Legal Criminal Cases
Natural Disasters
Accidents
Acts of Violence or War
Science and Discovery
Financial
New Laws
SportsPolitical and Diplomatic Meetings
Celebrity Human Interest
Miscellaneous
0
20
40
60
80 ABCCNN
# of documents in the TDT text corpus over different topics
# of stories in trecvid2003 (151 videos)
Videos have different topical distribution from text documents
![Page 24: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/24.jpg)
S.-F. Chang, Columbia U. 24
523 video stories in 30 frequent topics
1 30 Asian Economic Crisis 2 87 Monica Lewinsky Case 8 4 Casey Martin Sues PGA 12 4 Pope visits Cuba 13 16 1998 Winter Olympics 15 67 Current Conflict with Iraq 18 12 Bombing AL Clinic 19 9 Cable Car Crash 20 4 China Airlines Crash 21 9 Tornado in Florida 31 5 John Glenn 32 13 Sgt. Gene McKinney 41 7 Grossberg baby murder 43 6 Dr. Spock Dies 44 38 National Tobacco Settlement 47 28 Viagra Approval 48 20 Jonesboro shooting 56 7 James Earl Ray's Retrial? 65 15 Rats in Space! 70 39 "India, A Nuclear Power?" 71 24 Israeli-Palestinian Talks (London) 76 20 Anti-Suharto Violence 77 7 Unabomber 83 4 World AIDS Conference 84 6 Job incentives 86 24 GM Strike 87 18 NBA finals 89 4 Afghan Earthquake 91 9 German Train derails 96 12 Clinton-Jiang Debate
Topic # #stories topic title
Out of 151 videos(V, F1, F2, Test),4241 segments,3066 stories.
48S.F. Chang
ExperimentsNews videos [TRECVID2003]
ABC, CNN : 30 min x 151 video programsTraining/testing on same channels
PLSAevery storyASR word-stems tf-idftext
every shot22 conceptsvisual
1 seccamera translation (2-d)motion
1 sechistogram (15-d)colorHHMM
.5 secpitch, pause, audio-classaudio
bottom-level model
granularityfeature elementsmodality
![Page 25: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/25.jpg)
S.-F. Chang, Columbia U. 25
Evaluate the MM patterns against TDT topics
Auto-story boundary, 32 clustersMinimum Cdet among the CNN videos for each topic
Topic (#stories)
0
0.2
0.4
0.6
0.8
1
1.2
AsianE(30)Lew
insky(87)suePGA(4)PopeCuba(4)W
O'98(16)
Iraq(67)Bom
bAC(12)
CCCrash(9)CACrash(4)TornadoFL(9)
JGlenn(5)GM
cKin(13)GBM
urder(7)dSdie(6)Tobacco(38)
Viagra(28)JShoot(20)RayTrial(7)RatSpace(15)InNuclear(39)IsPalTalk(24)ASViolence(20)
Unabomber(7)
AIDConf(4)
JobIncen(6)GM
Strike(24)N
BAFianl(18)
AfEaQuake(4)
GTraindR(9)CJDebate(12)
MM fusionText onlyvconcept1vconcept2vconcept3motioncoloraudio
better
multi-modal fusion able to detect certain topics more accurately than text aloneSuch topics tend to be rich in non-textual cues
Topic-ClusterMisalignment
Example: Cluster #28
Unique audio-visual + text terms + temporal transition• AV: graphics, music, anchor speech, subject etc• textual terms: most related to financial• temporal structure: statistically predictable transition patterns
Correspond to “Dollars & Sense, CNN” (demo)
![Page 26: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/26.jpg)
S.-F. Chang, Columbia U. 26
51S.F. Chang
Example: Sports (demo)
Consistent audio-visual (color, motion, graphics), words, and temporal transitions.
52S.F. Chang
Example: Advertisements
• Audio-visual dominate in this thread. • Text not useful
(demo)
![Page 27: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/27.jpg)
S.-F. Chang, Columbia U. 27
53S.F. Chang
Application in Multi-Source News Threading
Joint Work with IBM T.J. Watson ResearchSupported by ARDA VACE II program
Reconstructing Semantic Threads Across Multiple Video Broadcast News Sources Using Multi-Level Concept Modeling
ObjectivesExtraction of semantic threads through multi-level video content analysis (core- and semantic-level features, scene markers & similarities, commonality of content and structure)Establishment of viability of automatic concept detection as basis for video semantics understanding and higher-level analysis functions and user interactionsDevelopment of framework and technologies supporting extraction of thousands of concepts from news video
Novel ApproachHierarchical segmentation and threading at multiple granularities (key-frames, shots, scenes, episodes and stories) Dynamic probabilistic graphical modeling for detection of concepts to form “semantic basis” of objects, sites, actionsStatistical modeling & mining of spatio-temporal & semantic structures across multiple broadcast video news sources
PIs: IBM (John R. Smith); Columbia Univ. (Prof. Shih-Fu Chang)
Event 1 Event 2 Event n…
News Source 1
News Source 2
News Source m
News Source 3…
Thread 1 Thread nThread 2
Video Broadcast News – Content Acquisition & Production
Multi-level Video Analysis and Thread Extraction
Core-level
Fusion-level
Event-level
Thread-level
Visual Analysis
Video News Analysis Tools and System
Multi-level searching, filtering & interactionSemantics-based browsing Concept-based classification
Analyst
![Page 28: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/28.jpg)
S.-F. Chang, Columbia U. 28
Multi-level News Video Content Analysis
Multi-level feature extraction and systematic development of higher-level access functions
Core-level
Fusion-level
Event-level
Thread-level
VideoOCR
RegionExtraction
VideoTexture
ConceptDetection
ContextExploitation
ReducingSupervision
dim1
dim2highlow
high
low
(car)
(sky)
Objects
Actions
Sites
Concepts
Outdoors IndoorsPerson
PeopleFace
NewsSubjectAnchor
Crowd
NewsMonolog
NewsDialog Studio
StructureModeling
StoryClustering
ThreadExtraction
EventModeling
StorySegmentation
Visual modalitiesNon-visual (speech, text, closed captions, etc.)
PIs: IBM (John R. Smith); Columbia Univ. (Prof. Shih-Fu Chang)
56S.F. Chang
Detection of Image Near Duplicate (IND)
With Dongqing Zhang
![Page 29: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/29.jpg)
S.-F. Chang, Columbia U. 29
How to Link Video Threads with external Sources (e.g., web pages)?
Source # 1
ThreadingNews Stories
Search byexample
News Web Site 1
News Web Site 2
News Video Sources
Source # 2
Source # 3
58S.F. Chang
Image Near-Duplicate DetectionImage Near-Duplicate (IND): A pair of images in which one is close to the exact duplicate of the other, but differs slightly due to variations of content and camera parameters
Source # 1
News Story Thread Across Sources
Source # 2
Source # 3
Application of IND Detection:Threading news videosacross sources
![Page 30: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/30.jpg)
S.-F. Chang, Columbia U. 30
59S.F. Chang
Part-based Approach for IND Detection
Part representation: we use interest point based representationPart detection: SUSAN corner detector –detecting corners using local geometries, such as curvature. Efficient algorithm compared with others, such as HarrisFeatures
Part: Color, spatial location, Gaborwavelet coefficients
Inter-part relation: Difference of spatial locations
Visual scene is composed of objects (parts) with spatial/attribute relations
Part-based representation by Attributed Relational Graph (ARG)
ARG
==??
ARGSimilarity
IND Detection becomesARG similarity problem
Graphical Model for ARG similarityUncertainty of Vertex Correspondence(due to occlusion, content change)
Variations of Attributes(due to lighting, object move)
ARG similarity is the likelihood ratio of the stochastic transformation process
VertexCorrespondence
Attribute Transformation
1
2
3
1
2
11x
32x
GraphS
GraphtX },...,,{ 321211 xxx=
Product Graph
Graph s
Graph t
H: Hypothesis
H=1 : two graphs are similar
Graphical Model of the Stochastic Process
H
![Page 31: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/31.jpg)
S.-F. Chang, Columbia U. 31
61S.F. Chang
Inference and LearningLearning
Learning using node-level annotation
Positive Samples
Negative Samples
Learning using image-level annotation
Likelihood computation
1
2
3
1
2
11x
32x
GraphS
Grapht
Computing exact likelihood is intractable, so approximate it using Bethe approximation, leading to Loopy Brief Propagation in the product graph
Use E-M process:
Inference ParameterEstimation
Inference results(after thresholdingbeliefs)
Stochastic GraphEditing Process
IND vs. non-IND Likelihood Ratioa new similarity measure
Similarity by Likelihood Ratio
Learning
Similarity = P(Graph_t | Graph_s, Two_graph_is_NID)
P(Graph_t | Graph_s, Two_graph_is_not_NID)
• Compute Likelihood
,=Intractable ! So approximate it by using
Jensen’s lower bound
+ constant
1. Inference : Compute the approximate distribution by Loopy Belief Propagation
• Learning by node-level annotation
• Learning by image-level annotation
Positive Samples
Negative Samples
Learning is realized by Variational E-M
Learning is realized by Parameter Computation
q̂
Statistical Approach to Graph Matching
2. Learning : Estimate and by using E-M
YsH
Yt
![Page 32: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/32.jpg)
S.-F. Chang, Columbia U. 32
IND Detection Performance (demo)
Detection Results
Search Scope
Probabilityof SuccessfulRetrieval
Retrieval Results
Precision
RecallStatistics of Image Near-Duplicate (IND)in TREC News Video Database
4147 News StoriesIn TREC-VID 2003
About 747 StoriesAnnotated by TDT2
At least 78 Storieshave INDs
Dataset for IND detection• 150 IND pairs, 300 non-duplicate images
• No multiple duplicate
Training set• 30 IND pairs, 60 non-duplicate images
• Node-level learning: 5 IND pairs, 5 non-duplicate images
64S.F. Chang
Conclusions
Patterns abound in multimedia dataMining facilitates auto discovery of salient or novel concepts scalable automatic knowledge discoveryCurrent results seen in
Multi-level temporal patterns mining through HHMMVideo generation modelsFusion between pattern labels and ASR metadata
Challenging IssuesMining of patterns of different types or complex patterns at higher levelsDetection of alerts and novel eventsEvaluation and Visualization
Theme: Media Pattern Mining for Semantics Discovery
![Page 33: Video Pattern Mining - eie.polyu.edu.hkeie.polyu.edu.hk/~cmsp/ISIMP04/file/HK-ISIMP-2004.pdfS.-F. Chang, Columbia U. 3 S.F. Chang 5 Patterns Across News Sources Story/event often re-occur](https://reader036.vdocuments.us/reader036/viewer/2022071115/5ff7e4934ed2601be936edfd/html5/thumbnails/33.jpg)
S.-F. Chang, Columbia U. 33
65S.F. Chang
Conclusions (2)
Multi-source news video analysis and content exploitation offers technical challenges and application opportunities
Multimedia Video MiningCross-source linking using image near duplicateMulti-modal fusion for story segmentationCurrent work:
Large scale AV concept detectionStory topical threading
Closely related to evaluation tasks in TRECVIDA workshop for developing large-scale concept ontology (LSCOM)
Threading News Videos Across Sources Using Multi-modal Cues
66S.F. Chang
More InformationDigital Video| Multimedia Lab http://www.ee.columbia.edu/dvmmPapers
Winston Hsu, Lyndon Kennedy, Chih-Wei Huang, Shih-Fu Chang, Ching-Yung Lin, Giridharan Iyengar. News Video Story Segmentation using Fusion of Multi-Level Multi-modal Features in TRECVID 2003. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Canada, May 2004. L. Xie, L. Kennedy, S.-F. Chang, A. Divakaran, H. Sun, C.-Y. Lin (2004). "Discovering Meaningful Multimedia Patterns with Audio-visual Concepts and Associated Text." IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 2004. Dong-Qing Zhang and Shih-Fu Chang, "Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning", To appear in ACM Multimedia 2004 (ACM MM2004). Xie, S.-F. Chang, A. Divakaran and H. Sun (2003), Unsupervised Mining of Statistical Temporal Structures in Video, in Video Mining, Azreil Rosenfeld, David Doremann, Daniel Dementhon Eds, Kluwer Academic Publishers, 2003