May 17, 2002 EECS, Kansas University 1
Feature Based Video Sequence Identification
Dissertation Defense Presentation
Committee : Prof. Susan Gauch (Chair)
Prof. John Gauch
Prof. Joseph Evans
Prof. Jerry James
Prof. Tom Schreiber
by
Kok Meng Pua
Agenda
• Driving Problem
• The Goals
• Our Solution
• Related Work and Pilot Work
• The Video Identification Technique Design Approach
• Experimental Results and Discussion
• Conclusion and Future Work
Driving Problem
• Television and video streaming over the Internet enable us to see the latest events and stories
• A large portion of these news stories and video sequences is repeated
• Issues:
– inefficient use of storage media in archives
– ineffective use of a viewer’s time
• Need:
– a fully automated topic-based video tracking system
The Goals
• Design and develop a feature-based automatic video sequence identification and tracking technique
– real time video processing
– video stream domain independence
• Not a topic-based tracking system
• Goals:
– real time video sequence identification and tracking technique with high accuracy
– lossless compression - reduce storage requirement by keeping only unique video sequences
Our Solutions
• Technique:
– Combination of video hashing and frame by frame video comparison
• Two standalone systems were implemented
a) Video Processing System
• Video Feature Abstraction and video sequence segmentation
b) Video Sequence Identification System
• Video Sequence Hashing
• Video Sequence Comparison
• Video Archiving and Tracking
• Located repeated video sequences with high accuracy (>90% in recall)
• Video input domain independence
Related Work – Image Abstraction and Similarity Measure
Methods | Researchers | Description
Color histogram | Flickner ’95 (IBM); Pentland ’95 (MIT); Huang & Pass ’97 (Cornell) | Global color distribution of an image; easy to compute and insensitive to small changes in viewing position
Image Partitioning with Color Moment | Stricker ’96 (Swiss Federal Institution) | Divide an image into five overlapping regions
Color Coherent Vector | Pass & Zabih ’96 (Cornell) | Partition pixel values (histogram buckets) based upon their spatial coherence; fast to compute
Dominant Color | Swain ’91 | A few main colors; less storage size
Color Moments | Chang & Smith ’96 (Columbia) | Easy to compute; less storage size; more robust than histogram
Related Work – Video Sequence Abstraction & Similarity Measure
Methods | Researchers | Description
Key frames | Zhang ’95 (NUS) | Map the entire video sequence to some small representative images; use still images as key frames
Centroid Feature Vector | Wu ’98 (Zhejiang University) | Created from key frames and composed of 35 numbers (32 color features and 3 texture features); less storage and faster similarity computation
Rframe | Arman ’94 (Princeton) | A body (10th frame of the video sequence, values of four motion tracking regions, and size of sequence); shape (moment invariant) and color (histogram) properties of Rframes
Video Signature | Cheung ’00 (California University) | 9 signature frames per video; very expensive
Frame-by-Frame (9 Color Moments) | J. Gauch ’99 (KU) | Frame-by-frame comparison; less storage size but preserves the uniqueness of the video
Pilot Work
• The VISION Digital Library System
– test bed for evaluating automatic and comprehensive mechanisms for library creation (video segmentation) and content-based search, retrieval, filtering and browsing of video across networks.
– supports real time content-based video scene detection and segmentation using a combination of video, audio and closed captions.
• Video Seeking System (VIDSEEK)
– web-enabled digital video library browsing system.
– dynamic-on-demand clustering allows users to organize the video clips based on multiple user-specified video features (color, shape, category)
– category-based browsing allows users to interactively and dynamically filter the VISION digital video library clips based on a given set of constraints (video source, keywords and date of capture)
• VidWatch Project (Video Authentication)
– color moment based frame-by-frame video information processing technique
– detects and records video content differences between two television channels
Four Main Processes of The Video Sequence Identification and Tracking System
[Figure: flow diagram of the four main processes]
1. Video Processing and Stream Segmentation (Video Abstraction Extraction, Video Stream Segmentation): Video Input Stream → Input Video Sequences
2. Video Sequence Hashing (Video Frame Hashing, Video Sequence Filtering): Input Video Sequences → Potential Similar Video Sequences → Similar Video Sequences
3. Video Sequence Comparison (Frame-by-Frame Video Sequence Comparison): Similar Video Sequences → Labelled Repeat or New Video Sequences
4. Video Sequence Archiving & Tracking
Video Processing and Stream Segmentation
• Video Processing System (VPS)
– A standalone real time video segmentation system on the NT platform
– Uses an Osprey digitizer board for video stream digitization.
– Two main functions: video sequence creation and video abstraction extraction
Video Processing and Stream Segmentation
• Video Sequence Abstraction
– Compare the abstraction of every frame in the video sequence, not of key frames only
• The video abstraction method used should require little storage and allow easy comparison computation.
– Use the first 3 color moments of each primary color component (Red, Green, and Blue)
– The 3 moments are the mean, standard deviation, and skew of each color component of a video frame.
– 9 floating-point numbers per video frame
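The nine-number abstraction described above can be sketched in a few lines. This is a minimal illustration, not the VPS implementation: the function name `color_moments`, the pixel representation (a list of `(r, g, b)` tuples), the output ordering, and the choice of skew as the signed cube root of the third central moment are all assumptions made for the example.

```python
import math


def color_moments(pixels):
    """Return 9 floats for one frame: [mean, std, skew] for R, then G, then B.

    `pixels` is a non-empty list of (r, g, b) tuples (hypothetical input format).
    """
    n = len(pixels)
    moments = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        second = sum((v - mean) ** 2 for v in values) / n  # second central moment
        third = sum((v - mean) ** 3 for v in values) / n   # third central moment
        std = math.sqrt(second)
        # Assumption: skew is the signed cube root of the third central moment,
        # which keeps it in the same units as the pixel values.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.extend([mean, std, skew])
    return moments


# Tiny 4-pixel "frame": red varies, green and blue are constant.
frame = [(0, 10, 50), (0, 10, 50), (0, 10, 50), (200, 10, 50)]
moments = color_moments(frame)
print(len(moments))  # 9 numbers per frame
```

Because each frame reduces to nine numbers, a whole sequence can be stored as a short list of such vectors, which is what makes comparing every frame (rather than key frames only) affordable.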
Video Processing and Stream Segmentation
• Video Sequence Creation
– A video sequence is a video shot (an image sequence that represents continuous action and corresponds to a single action of the camera)
– Use the video segmentation technique developed in the VISION project
– Shot boundary detection: combination of average brightness, intensity difference and histogram difference of adjacent frames.
– Look at the differences between two frames dt distance from each other to detect shot boundaries with smoother (slow) transitions
– Meta-information captured:
• Size of video sequence
• Video Sequence identifier (date and time the sequence is captured)
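The idea of comparing frames that are dt apart can be illustrated with a simplified sketch. It uses only a histogram difference (the VISION technique also combines average brightness and intensity difference, which are omitted here), and the function names, the normalized-histogram input, and the threshold value are assumptions for illustration.

```python
def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin differences between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def find_shot_boundaries(histograms, dt=1, threshold=0.5):
    """Return frame indices where a new shot is assumed to start.

    Frame i is compared with frame i + dt; a larger dt lets slow
    transitions accumulate enough difference to cross the threshold.
    """
    boundaries = []
    in_transition = False
    for i in range(len(histograms) - dt):
        diff = histogram_difference(histograms[i], histograms[i + dt])
        if diff > threshold and not in_transition:
            boundaries.append(i + dt)   # report each transition once
            in_transition = True
        elif diff <= threshold:
            in_transition = False
    return boundaries


# A slow fade between two 2-bin histograms.
fade = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
print(find_shot_boundaries(fade, dt=1))  # adjacent frames never exceed the threshold
print(find_shot_boundaries(fade, dt=2))  # frames two apart do
```

With adjacent frames the fade goes undetected, while the wider comparison distance catches it, which is the motivation for looking at frames dt apart.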
Video Sequence Hashing Process
Purpose: Reduce the number of sequences requiring frame-by-frame comparison by selecting only video sequences that share many similar frames
[Figure: Video Sequence Hashing Process]
Input Video Sequences (from Video Processing and Stream Segmentation Process) → Video Frame Hashing → Potential Similar Video Sequences → Similar Video Sequence Filtering → Similar Video Sequences (to Video Sequence Comparison Process)
No Potential Similar Video Sequence (after Video Frame Hashing) → New Sequence Detected
No Similar Video Sequence (after Similar Video Sequence Filtering) → New Sequence Detected
Video Sequence Hashing Process
• Video Frame Hashing:
• Map video frames into similar frame buckets and hash similar frames instead of “identical” video frames
• Each video frame is represented by a color moment string
• Color Moment String – map the 9 original floating-point numbers into 9 integers (10 to 1 mapping) and concatenate them into a character string
• Using the concatenation of all digits of the 9 original floating-point numbers (moment values) as the hashing key proved ineffective due to noise introduced into the video source by both transmission and digitization.
• Identical values are unlikely for repeated video broadcasts
Example of Color Moment String Mapping
[Figure: mapping of nine color moments to a hashing key. Box labels: M(Red), S(Red), Skew(Red), M(Green), S(Green), Skew(Green), M(Blue), S(Blue), Skew(Blue)]
Original nine Color Moment Numbers: 24.56, 30.34, 70.12, 99.02, 210.20, -20.02, 230.02, -50.21, -32.23
Mapping Process → Resulting Integer Numbers: 2, 3, 7, 9, 21, -2, 23, -5, -3
Integer Number Concatenation → Color Moment String (Hashing Key): 237-2-3-521239
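The mapping in this example can be reproduced with a short sketch. It assumes the 10 to 1 mapping is a division by 10 followed by truncation toward zero, which is consistent with the integers shown on the slide (for example -32.23 becomes -3). The function name `color_moment_string` and the `ratio` parameter are hypothetical, and the input values are ordered to match the concatenation order of the slide's hashing key.

```python
def color_moment_string(moments, ratio=10):
    """Map each moment to an integer (ratio-to-1) and concatenate the digits."""
    # int() truncates toward zero, so -32.23 / 10 = -3.223 becomes -3.
    return "".join(str(int(value / ratio)) for value in moments)


# The slide's nine values, ordered to match its concatenated key.
slide_values = [24.56, 30.34, 70.12, -20.02, -32.23, -50.21, 210.20, 230.02, 99.02]
key = color_moment_string(slide_values)
print(key)  # 237-2-3-521239

# A slightly noisy rebroadcast of the same frame lands in the same bucket.
noisy_values = [24.9, 30.1, 70.5, -20.4, -32.0, -50.9, 210.9, 230.5, 99.6]
noisy_key = color_moment_string(noisy_values)
print(noisy_key == key)  # True
```

Coarsening the values before concatenation is what lets frames that differ only by transmission or digitization noise share a hashing key.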
Video Sequence Hashing Process
• Video Frame Hashing:
• Mapping Ratio Issues:
• Mapping ratio too small: truly identical frames fall into different similar buckets
• Mapping ratio too large: non-identical frames fall into the same similar bucket
• Experimental results showed a 1% mapping error (identical frames mapped into different similar buckets)
Video Sequence Hashing Process
• Video Frame Hashing Cost Estimation:
• Total video frame hashing cost for an input video sequence
$$\mathrm{Cost}(m) = \sum_{n=1}^{m} \left[ H(n) + L(n) \right]$$
m = size (total video frame count) of the video sequence
H(n) = hashing cost
L(n) = linked list traversal cost
• Independent of hash table size (video archive size)
• Experimental results showed an average hashing time of 500ms for each input video sequence
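The two cost terms correspond to one bucket lookup, H(n), and one linked-list traversal, L(n), per input frame. The sketch below shows that structure with a Python dictionary standing in for the hash table and a list standing in for each linked list. The function names and the frame letters for the archived sequences are assumptions chosen so that the match counts resemble the worked example in this presentation.

```python
from collections import Counter, defaultdict


def build_hash_table(archive):
    """archive: dict of sequence id -> list of color moment strings."""
    table = defaultdict(list)
    for seq_id, keys in archive.items():
        for frame_index, key in enumerate(keys):
            table[key].append((seq_id, frame_index))  # bucket -> index entries
    return table


def hash_sequence(table, keys):
    """Count, per archived sequence, how many input frames hit one of its buckets."""
    matches = Counter()
    for key in keys:                           # H(n): one bucket lookup per frame
        seen = set()
        for seq_id, _ in table.get(key, []):   # L(n): traverse that bucket's list
            if seq_id not in seen:             # count each sequence once per frame
                seen.add(seq_id)
                matches[seq_id] += 1
    return matches


# Hypothetical archive; each letter stands for one frame's color moment string.
archive = {"A1": list("DUHMZP"), "B1": list("ABCREFGHIT"), "C1": list("DHUMZP")}
table = build_hash_table(archive)
counts = hash_sequence(table, list("ABCDEFGHIJ"))
print(dict(counts))
```

The loop touches only the buckets addressed by the input frames, which is why the cost depends on the sequence size m and bucket list lengths rather than on the total number of buckets in the table.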
Video Sequence Hashing Process
Video Sequence Filtering:
• Second component of similar video hashing process
• Purpose:
• Identify truly similar video sequences
• Potential similar video sequences are filtered to remove sequences whose degree of similarity is below some threshold
• Two video sequences are similar if:
1. The size difference of the two video sequences is less than 10%
2. The percentage of frames in the two video sequences that have identical color moment strings is > 30% (overlap threshold)
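The two conditions translate directly into a predicate. In this sketch the function name `is_similar` and its parameters are hypothetical, and the denominators are assumptions: the size difference is taken relative to the larger sequence and the overlap relative to the input sequence, since the slide does not specify either.

```python
def is_similar(input_size, archived_size, matching_frames,
               max_size_diff=0.10, overlap_threshold=0.30):
    """Apply the two filtering rules to one candidate sequence.

    Sizes are frame counts; matching_frames is the number of input frames
    whose color moment string also occurs in the archived sequence.
    """
    # Rule 1: size difference below 10% (assumed relative to the larger sequence).
    size_diff = abs(input_size - archived_size) / max(input_size, archived_size)
    # Rule 2: overlap above 30% (assumed relative to the input sequence).
    overlap = matching_frames / input_size
    return size_diff < max_size_diff and overlap > overlap_threshold


print(is_similar(10, 10, 8))  # same size, 80% overlap
print(is_similar(10, 6, 2))   # sizes differ by 40%, 20% overlap
```

Only candidates that pass both checks go on to the more expensive frame-by-frame comparison.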
Video Sequence Comparison Process
• Video Sequence hashing results in false positive matches:
• Color moment strings used for hashing are built from approximate color moment values
• Hashing considers only the percentage of similar video frames and ignores their temporal ordering
• A more accurate frame-by-frame comparison is required
• The sum of the absolute moment differences between the two sequences is calculated
• One video sequence is detected as a repeat of the other if the sum of their absolute moment differences < moment difference threshold (10.0)
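A minimal sketch of the frame-by-frame check follows. The slide gives only the quantity (sum of absolute moment differences) and the threshold (10.0); how frames are aligned and whether the sum is normalized are not stated. This example assumes both sequences are non-empty, aligns frames from the start, and averages the per-frame sum over the compared frames. Those choices, and the function names, are assumptions.

```python
def moment_difference(seq_a, seq_b):
    """Average per-frame sum of absolute differences over the 9 color moments.

    Each sequence is a non-empty list of 9-number frames. Frames are paired
    in temporal order, so reordered content no longer matches.
    """
    frames = min(len(seq_a), len(seq_b))
    total = 0.0
    for frame_a, frame_b in zip(seq_a, seq_b):  # zip stops at the shorter sequence
        total += sum(abs(a - b) for a, b in zip(frame_a, frame_b))
    return total / frames                       # normalization is an assumption


def is_repeat(seq_a, seq_b, threshold=10.0):
    """Flag a repeat when the difference is below the moment difference threshold."""
    return moment_difference(seq_a, seq_b) < threshold


original = [[100.0] * 9 for _ in range(5)]
rebroadcast = [[100.5] * 9 for _ in range(5)]  # small digitization noise
other = [[120.0] * 9 for _ in range(5)]        # different content

print(is_repeat(original, rebroadcast))  # True
print(is_repeat(original, other))        # False
```

Unlike the hashing step, this comparison uses the original floating-point moments and respects frame order, which is what removes the false positives.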
An Example of Video Identification Process
[Figure: worked example for Input Video Sequence Q1 against archived Video Sequences A1, B1 and C1, each drawn from Starting Frame to Ending Frame. The Video Hash Table maps each Hash Bucket (Color Moment String) to a Linked list of Sequence Index Data in the Video Index Table.]
Note: Each block is a video frame with its color moment string represented by a letter
Step 1: Video Frame Hashing
Video Frame Hashing Output List (Sequence Q1):
– Sequence A1: Matching Similar Frames = 2
– Sequence B1: Matching Similar Frames = 8
– Sequence C1: Matching Similar Frames = 2
Step 2: Video Sequence Filtering
Video Sequence Filtering Output List (Sequence Q1):
– Sequence B1
Video Sequence Archiving and Tracking
• Purposes
– Record sequence identification results
• Number of video sequences processed
• Number of detected new and repeat sequences
– Control and enable video sliding window function
• Allow the total video archive to grow until it contains 24 hours’ worth of video sequences
• Oldest/expired sequences are dropped from video archive as the newer sequences are added
• Use a video index table to capture identification results and keep the
total video archive within the 24 hours window size
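The sliding window behaviour can be sketched with a double-ended queue. This is an illustration under stated assumptions, not the system's index table: the class name `SlidingArchive` is hypothetical, and the window is measured by capture timestamps in seconds rather than by accumulated video duration.

```python
from collections import deque


class SlidingArchive:
    """Keep only sequences captured within the last `window_seconds`."""

    def __init__(self, window_seconds=24 * 3600):
        self.window_seconds = window_seconds
        self.sequences = deque()  # (capture_time, sequence_id), oldest first

    def add(self, capture_time, sequence_id):
        """Add a new sequence and return the ids that expired as a result."""
        self.sequences.append((capture_time, sequence_id))
        expired = []
        # Drop the oldest entries that now fall outside the window.
        while capture_time - self.sequences[0][0] > self.window_seconds:
            expired.append(self.sequences.popleft()[1])
        return expired


archive = SlidingArchive()
first = archive.add(0, "S1")
second = archive.add(12 * 3600, "S2")
third = archive.add(25 * 3600, "S3")   # S1 is now more than 24 hours old
print(first, second, third)
```

In a full system the returned ids would also be removed from the hash table so that expired sequences stop producing matches.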
Experimental Results
Input video source:
• 32 hours of video stream as primary input source
• A second 24 hours of video from a different source, used to validate the input domain independence of the video identification technique
Measurements :
• Video hashing accuracy and efficiency measures
– Video hashing time
– Select an optimum overlap threshold value
• Video sequence comparison cost
• Overall video identification accuracy and efficiency
– Recall and precision measurement
• Achievable storage compression
• Validating video identification technique with different video source
Video Hashing Time vs. Video Archive Size
Video Hashing Time vs. Video Sequence Size
Frame-by-Frame Video Sequence Comparison Cost
Experiment Results
• Recall and Precision Calculation
• Recall
– The ratio of the number of correctly detected repeated sequences to the total number of true repeated sequences in the video archive
– Recall = true positives / (true positives + false negatives)
• Precision
– The ratio of the number of correctly detected repeated sequences to the number of detected repeated sequences in the video archive
– Precision = true positives / (true positives + false positives)
• Term definition:
– True positive: detect repeat, and it is repeat
– True negative: detect new, and it is new
– False positive: detect repeat, and it was new
– False negative: detect new, and it was repeat
Example of Recall and Precision Calculation
• Inputs: S1.1, S1.2, S1.3, S1.4, S2.1, S2.2, S2.3
• S1.2, S1.3 & S1.4 are repeats of S1.1. Also, S2.2 & S2.3 are repeats of S2.1
Outputs:
S1.1: Detected New
S1.2: Detected Repeat of S1.1
S1.3: Detected New
S1.4: Detected Repeat of S1.1, S1.2 & S1.3
S2.1: Detected New
S2.2: Detected Repeat of S2.1
S2.3: Detected Repeat of S2.1 & S1.2
Recall = true positives / (true positives + false negatives) = 6 / (6+2) = 0.75
Precision = true positives / (true positives + false positives) = 6 / (6+1) ≈ 0.86
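The arithmetic in this example can be checked with a small sketch. The function names are hypothetical; the counts (6 true positives, 2 false negatives, 1 false positive) are taken directly from the slide rather than recomputed from the output list.

```python
def recall(true_positives, false_negatives):
    """Fraction of true repeats that were detected."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Fraction of detected repeats that were true repeats."""
    return true_positives / (true_positives + false_positives)


# Counts as given on the slide.
r = recall(6, 2)
p = precision(6, 1)
print(f"recall = {r:.2f}, precision = {p:.3f}")  # recall = 0.75, precision = 0.857
```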
Choosing An Optimum Overlap Threshold Value
• Overlap threshold :
• percentage of total similar video frames (identical color moment string) over the total video frames of a video sequence
• Recall = true positives / (true positives + false negatives)
= total correctly detected repeated sequences / total true repeated sequences
• Precision = true positives / (true positives + false positives)
= total correctly detected repeated sequences / total detected similar video sequences
Overall Video Identification Performance Measurement
- Total of 32 hours of video source (2831 sequences)
- 1228 new and 1603 repeat in a 24 hour sliding window
Achievable Compression Ratio
Validating Video Identification Technique with a Different Video Source
- Total of 24 hours of video source (3394 sequences)
- 1938 new and 1456 repeat
Achievable Storage Compression Ratio
Conclusion
Summary
• This thesis reports on a video sequence identification and tracking technique that can be used to process continuous video streams, identify unique sequences and remove repeated sequences
• The algorithm described is efficient (runs in real time) and effective
• Accurately locates repeated sequences (recall >90% and precision >89%) for two different video streams
• Achieves a compression gain factor of approximately 30% for both video streams
Future Work
• Technique Improvement
– Partial Matching
• Detection of a subset of a known video sequence or a superset of a few sequences overlapping one another
– Disk based Hashing
• Handle a very large video window size to detect and track repeated video sequences that occur farther apart in time
• User Application
– Web-enabled Video Stream Browsing System
• Enable users to search and browse the video archive and view selected video sequences
– Story based Video Sequence Identification
• Group video sequences into different stories using video abstraction such as closed caption, audio and video content.