
Ardhendu Behera University of Fribourg Passive Capture and Structuring of Lectures Sugata Mukhopadhyay, Brian Smith Department of Computer Science Cornell University

Posted on 22-Dec-2015


Ardhendu Behera, University of Fribourg

Passive Capture and Structuring of Lectures

Sugata Mukhopadhyay, Brian Smith

Department of Computer Science Cornell University


Introduction

• Multimedia presentations
– Authored manually
– Labor-intensive
• Experience-on-Demand (EOD), CMU
– Captures and abstracts personal experiences (audio/video)
– Synchronizes audio, video, and position data


Introduction Contd.

• Classroom 2000 (C2K, Georgia Tech)
– Authors multimedia documents from live events
– Data from whiteboards, cameras, etc. are combined to create multimedia documents of classroom activities
• Similarities (EOD and C2K)
– Automatic capture
– Authoring of multimedia documents


Introduction Contd.

• Differences:
– C2K: invasive capture (capture must be started explicitly), structured (specific) environment
– EOD: passive capture, unstructured environment

               Unstructured    Structured
Invasive                       C2K
Passive        EOD             Lecture Browser


Motivation

• Structured multimedia documents from seminars, talks, or classes
• The speaker can walk in, press a button, and give a presentation using blackboards, whiteboards, 35mm slides, overheads, or computer projection
• One hour later, a structured presentation is on the web


Overview

• Cameras (encoded in MPEG format)
– Overview camera: captures the entire lecture
– Tracking camera (hardware tracker): tracks the speaker, capturing head and shoulders
• The speaker uploads the slides to the server


Overview

• Video region
– RealVideo
• Index
– Title and duration of the current slide
– Synchronized with the video
– Prev/Next buttons skip between slides
• Timeline
– Boxes represent the duration of each slide
[Figure: browser screenshot with Video, Index, Slides, and Timeline regions]


Problems Handled

• Synchronization
– Transitive (position of event A in a timeline)
• If A ↔ B (synchronized), then B can be added to the same timeline
• Synchronization errors E1 = (A, B) and E2 = (B, C) ⇒ error (A, C) = E1 + E2
– Collected data
• Timed (T-data, e.g., video)
• Untimed (U-data, e.g., electronic slides)


Problems Handled Contd.

• Synchronization
– Timed-timed synchronization (TTS)
• Two video streams
– Timed-untimed synchronization (TUS)
• Slides with video
– Untimed-untimed synchronization (UUS)
• Slide titles: parsing the HTML produced by PowerPoint
• Automatic editing
– Rule-based structuring of synchronized data


Timed-Timed Synchronization

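• Temporal link between streams captured by independent cameras: find the offset $\Delta$ such that $V_1(t + \Delta)$ corresponds to $V_2(t \pm \varepsilon)$, i.e., within a tolerance of $\varepsilon$
• To solve this, consider one or more synchronization points
[Figure: streams $V_1(t)$ and $V_2(t)$ with a common synchronization point located at $\Delta_1$ in $V_1$ and at $\Delta_2$ in $V_2$]
$\Delta = \Delta_1 - \Delta_2$, maximum uncertainty $= \varepsilon_1 + \varepsilon_2$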
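The offset arithmetic above can be sketched in Python. This is a minimal illustration, not the Cornell code. It assumes that the position of the synchronization point in each stream ($\Delta_1$, $\Delta_2$, in seconds) and its detection error bound ($\varepsilon_1$, $\varepsilon_2$) are already known. The names `SyncPoint`, `stream_offset`, and `v2_to_v1` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SyncPoint:
    """Where a synchronization point was found in one stream (hypothetical type)."""
    position: float  # seconds from the start of the stream (Delta_i)
    error: float     # detection error bound in seconds (epsilon_i)


def stream_offset(p1: SyncPoint, p2: SyncPoint) -> tuple[float, float]:
    """Return (delta, max_uncertainty) such that V1(t + delta) corresponds to V2(t)."""
    delta = p1.position - p2.position   # Delta = Delta1 - Delta2
    uncertainty = p1.error + p2.error   # epsilon1 + epsilon2 (worst case)
    return delta, uncertainty


def v2_to_v1(t2: float, delta: float) -> float:
    """Map a time in stream V2 to the corresponding time in stream V1."""
    return t2 + delta
```

For example, a tone found at 12.5 s in V1 and at 3.25 s in V2 gives Δ = 9.25 s. Time 100 s in V2 therefore maps to 109.25 s in V1, with an uncertainty of ε1 + ε2.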


Timed-Timed Synchronization Contd.

• Artificial creation of a synchronization point lasting 1 second
• The tone is recorded on one of the channels of the MPEG audio stream
• A sound card is used for tone generation
• Later, the positions of the tones are detected in each stream
[Figure: a sound card feeds the sync tone to one stereo channel of each camera machine's MPEG audio; the speaker audio from a wireless mic receiver feeds the other channel]


Timed-Timed Synchronization Contd.

• Detection of the synchronization tone
– Brute-force approach: fully decode the MPEG audio
– Proposed method:
• Scale factors indicate the overall volume of each packet
• Sum the scale factors to estimate the volume
• The tone is detected where the volume exceeds a threshold
• Assuming MPEG-2 audio: worst-case error 26 ms (1152 samples × 22.5 µs) and maximum error 52 ms
• Video: 30 FPS, e < 1/30 s
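The scale-factor shortcut can be sketched as follows. This is a hedged illustration based on several assumptions:

- The per-packet scale-factor indices of the tone channel have already been read from the bitstream. Parsing the bitstream is out of scope.
- The audio is sampled at 44.1 kHz.
- `detect_tone` and its threshold are hypothetical.

In MPEG-1 Layer I/II the bitstream stores a 6-bit index per subband, and the scale factor value is 2^(1 − index/3). A larger index therefore means a quieter subband.

```python
from __future__ import annotations

from typing import Sequence

SAMPLES_PER_PACKET = 1152  # samples per MPEG audio packet, as on the slide


def scalefactor_value(index: int) -> float:
    """Layer I/II scale-factor table: index 0 -> 2.0, every +3 halves the value."""
    return 2.0 ** (1 - index / 3)


def packet_volume(indices: Sequence[int]) -> float:
    """Estimate packet loudness by summing its scale factors (no full decoding)."""
    return sum(scalefactor_value(i) for i in indices)


def detect_tone(packets: Sequence[Sequence[int]], threshold: float,
                sample_rate: int = 44100) -> float | None:
    """Return the start time (s) of the first packet whose volume exceeds threshold.

    The result is only packet-accurate: the error is at most one packet,
    SAMPLES_PER_PACKET / sample_rate seconds (about 26 ms at 44.1 kHz).
    """
    for k, indices in enumerate(packets):
        if packet_volume(indices) > threshold:
            return k * SAMPLES_PER_PACKET / sample_rate
    return None  # no packet loud enough: tone not found
```

A full decoder would dequantize and synthesize every subband sample. Reading only the scale factors skips that work, at the price of packet-level resolution.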


Timed-Timed Synchronization Contd.

• Tighter bound (22.5 kHz)
– Error ≤ 1/22.5 kHz ≈ 44 µs, well below the 26 ms packet bound
– For video at 15 FPS, maximum error 66 ms
– Applied to an MPEG system stream, a tone of 70 seconds can be located in < 2 seconds
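One way to reach a sample-level bound is to decode only the packet flagged by the scale-factor search. The first loud sample inside that packet then gives the onset. The sketch below assumes the decoded PCM samples of that single packet are available as floats. `refine_tone_start` and its amplitude threshold are hypothetical, and 22.5 kHz is the rate quoted on the slide.

```python
from __future__ import annotations

from typing import Sequence

SAMPLES_PER_PACKET = 1152


def refine_tone_start(packet_index: int, pcm: Sequence[float],
                      amplitude: float, sample_rate: float = 22_500.0) -> float:
    """Return the tone onset (s) to within one sample period (1 / sample_rate).

    packet_index: packet found by the coarse scale-factor search.
    pcm: decoded samples of that packet only (hypothetical input).
    """
    for n, x in enumerate(pcm):
        if abs(x) >= amplitude:
            return (packet_index * SAMPLES_PER_PACKET + n) / sample_rate
    # No loud sample found: fall back to the packet-accurate estimate.
    return packet_index * SAMPLES_PER_PACKET / sample_rate


def frame_period(fps: float) -> float:
    """Duration of one video frame in seconds (about 66.7 ms at 15 FPS)."""
    return 1.0 / fps
```

At 22.5 kHz the audio resolution is about 44 µs, which is far smaller than the roughly 66.7 ms frame period of 15 FPS video. The video frame rate, rather than the audio, then bounds the achievable error.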


Timed-Untimed Synchronization

• Synchronization of the slides with one of the videos
$S = \{S_1, S_2, \ldots, S_n\}$, $f(t): [0, d] \to S$, where $d$ is the duration of $V(t)$ (i.e., $f$ gives the slide displayed at time $t$)
• A tolerance of 0.5 s is used for the synchronization
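One simple way to represent $f$ is by the sorted start times of the slides, with a binary search to look up the slide shown at time $t$. The sketch below is hypothetical: `make_slide_function` is not from the source. It assumes the slide start times have already been found and does not model the 0.5 s tolerance.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_slide_function(change_times: Sequence[float], slides: Sequence[str],
                        duration: float) -> Callable[[float], str]:
    """Build f: [0, d] -> S from sorted slide start times (first one must be 0)."""
    if len(change_times) != len(slides) or not change_times or change_times[0] != 0:
        raise ValueError("need one start time per slide, starting at 0")

    def f(t: float) -> str:
        if not 0 <= t <= duration:
            raise ValueError("t outside [0, d]")
        # Index of the last slide whose start time is <= t.
        return slides[bisect_right(change_times, t) - 1]

    return f
```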


Timed-Untimed Synchronization Contd.

• Segmentation of slides from the video $V(t)$: $T = \{t_0, t_1, \ldots, t_k\}$ such that the slide image in $V(t)$ does not change for $t \in (t_i, t_{i+1})$
• Color histogram
– Problems: slides share the same background; low resolution
• Feature-based algorithm
– Clip frames, apply a low-pass filter, threshold adaptively
– Let $B_1$ and $B_2$ be two consecutive processed frames: $\mathrm{Dist}(B_1, B_2) = \frac{d_1 + d_2}{b_1 + b_2}$, compared against a threshold $T$
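The feature-based pipeline above can be sketched with NumPy. This is an illustrative approximation, not the Cornell implementation, and it rests on these assumptions:

- Frames are already clipped to the slide region.
- A 3×3 mean filter stands in for the low-pass filter.
- Each frame is thresholded at its own mean intensity, a simple form of adaptive thresholding in which dark pixels become foreground.
- $b_i$ is the foreground count of $B_i$, and $d_i$ is the number of foreground pixels of $B_i$ that are absent from the other frame.

`binarize` and `frame_distance` are hypothetical names.

```python
import numpy as np


def binarize(gray: np.ndarray) -> np.ndarray:
    """Low-pass filter a grayscale frame, then threshold it adaptively.

    Returns a boolean mask where True marks dark (foreground) pixels.
    """
    g = gray.astype(float)
    h, w = g.shape
    padded = np.pad(g, 1, mode="edge")
    # 3x3 mean filter built from the nine shifted views of the padded image.
    smooth = sum(padded[dy:dy + h, dx:dx + w]
                 for dy in range(3) for dx in range(3)) / 9.0
    return smooth < smooth.mean()  # threshold adapts to each frame's brightness


def frame_distance(b1: np.ndarray, b2: np.ndarray) -> float:
    """Dist(B1, B2) = (d1 + d2) / (b1 + b2) under the assumptions above."""
    d1 = np.count_nonzero(b1 & ~b2)  # foreground in B1 only
    d2 = np.count_nonzero(b2 & ~b1)  # foreground in B2 only
    total = np.count_nonzero(b1) + np.count_nonzero(b2)
    return (d1 + d2) / total if total else 0.0
```

Under this sketch, a candidate slide change would be flagged when `frame_distance` exceeds a tuned threshold T.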


Timed-Untimed Synchronization Contd.

– Assumption: slides have a dark foreground and a light background
– Applied to the I-frames of the MPEG video at 0.5 s intervals
• Matching
– Detected changes are confirmed by matching against the original slides
– Similarity > 95%: match declared and search terminated
– Similarity > 90%: the slide with the highest similarity is returned
– Otherwise: too noisy to match
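The matching rule can be sketched as a small decision function. It assumes a similarity function that returns a score in [0, 1], such as the dilation-based match ratios described later. `identify_slide` and the `similarity` callback are hypothetical names.

```python
from typing import Any, Callable, Iterable, Optional


def identify_slide(frame: Any, slides: Iterable[Any],
                   similarity: Callable[[Any, Any], float],
                   accept: float = 0.95, fallback: float = 0.90) -> Optional[Any]:
    """Return the matching slide, or None if the frame is too noisy to match."""
    best, best_score = None, -1.0
    for slide in slides:
        score = similarity(frame, slide)
        if score > accept:
            return slide  # > 95 %: declare a match and stop searching
        if score > best_score:
            best, best_score = slide, score
    # No slide above 95 %: return the best candidate only if it clears 90 %.
    return best if best_score > fallback else None
```

Because the search stops at the first slide above 95 %, the order in which slides are tried can change the result. Trying the slide after the previously identified one first is one natural choice.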


Timed-Untimed Synchronization Contd.

• Unwrapping
– The video contains a foreshortened version of each slide
– Map the quadrilateral F to a rectangle (the size of the original slide)
– The camera and projector are fixed, so the corner points of F stay the same
– Perspective transform to the rectangle
– Bilinear interpolation within the rectangle
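The unwrapping step can be sketched with NumPy in two stages:

1. Solve for the 3×3 perspective transform (homography) from the four corner correspondences.
2. For every pixel of the output rectangle, map back into the captured frame and sample it with bilinear interpolation.

Because the camera and projector are fixed, the corners only need to be measured once. `homography`, `bilinear`, `unwrap`, and the corner order (TL, TR, BR, BL) are assumptions for illustration.

```python
import numpy as np


def homography(src, dst):
    """3x3 matrix H with dst ~ H @ src for four (x, y) point pairs (h33 fixed to 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def bilinear(img, x, y):
    """Sample img at real-valued coordinates (x, y) with bilinear interpolation."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy


def unwrap(frame, corners, width, height):
    """Map quadrilateral F (corners TL, TR, BR, BL in the frame) to a width x height rectangle."""
    rect = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    H = homography(rect, corners)  # inverse mapping: rectangle -> frame
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    mapped = H @ pts
    fx, fy = mapped[0] / mapped[2], mapped[1] / mapped[2]
    return bilinear(frame.astype(float), fx, fy).reshape(height, width)
```

Mapping from the output rectangle back into the frame (inverse warping) guarantees that every output pixel gets a value. A forward warp could leave holes.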


Timed-Untimed Synchronization Contd.


Timed-Untimed Synchronization Contd.

• Similarity
– Hausdorff distance
– Dilate (radius 3) the original binary slide image: every pixel within the dilation radius of a black pixel is set to black, giving G
– b: number of black pixels in the extracted image F
– b′: number of black pixels in both F and G
– Forward match ratio = b′ / b
– Similarly, the reverse match ratio is computed by dilating F and keeping G undilated
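The forward and reverse match ratios can be sketched with NumPy. The sketch assumes both images are already unwrapped to the same size and binarized (True = black). It treats the radius-3 dilation as a Euclidean disc. `dilate` and `match_ratios` are hypothetical names.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int = 3) -> np.ndarray:
    """Set every pixel within `radius` (Euclidean) of a black pixel to black."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                # OR in the mask shifted by (dy, dx).
                out |= padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
    return out


def match_ratios(original: np.ndarray, extracted: np.ndarray, radius: int = 3):
    """Return (forward, reverse) match ratios between two binary slide images."""
    g = dilate(original, radius)               # G: dilated original slide
    b = np.count_nonzero(extracted)            # b: black pixels in F
    b_prime = np.count_nonzero(extracted & g)  # b': black in both F and G
    forward = b_prime / b if b else 0.0
    f_dilated = dilate(extracted, radius)      # reverse: dilate F, keep G undilated
    b_rev = np.count_nonzero(original)
    reverse = np.count_nonzero(original & f_dilated) / b_rev if b_rev else 0.0
    return forward, reverse
```

Dilating one side makes the ratios tolerant to small misalignments of up to the radius. Computing both directions guards against a noisy frame with many spurious black pixels, which would otherwise match a dense slide too easily.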


Timed-Untimed Synchronization Contd.

• Evaluation
– 106 slides, 143 transitions
– Accuracy: 97.2%
– Needs to be tuned for slides with a dark background and light foreground


Automatic Editing

• Combines the captured videos into a single stream
• Constraints
– Footage from the overview camera must be shown from 3 s before to 5 s after each slide change
– 3 s < any shot < 25 s
• Heuristic algorithm producing an Edit Decision List (EDL)
– Each shot is taken from one video source
– Consecutive shots come from different video sources
– Shot: start time, duration, video source
– The final edited video concatenates the footage of the shots


Automatic Editing Contd.


Automatic Editing Contd.

Overview-camera shots separated by less than 3 s of tracking-camera footage are merged

Tracking-camera shots longer than 25 s are broken into 5 s shots
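The editing rules can be sketched as a small EDL builder. This is a hypothetical reconstruction, not the Cornell algorithm. It applies three rules:

1. Each slide change gets an overview window from 3 s before to 5 s after it.
2. Windows separated by less than 3 s of tracking footage are merged.
3. Long tracking shots are broken up by inserting evenly spaced 5 s overview cutaways, so that every tracking piece stays at or below 25 s.

Rule 3 is one reading of "broken into 5 sec shots" that keeps consecutive shots from different sources. The sketch does not split overly long overview shots. `build_edl` and its parameters are illustrative names.

```python
import math
from typing import List, Tuple

Shot = Tuple[float, float, str]  # (start, length, source)


def build_edl(slide_changes: List[float], duration: float,
              before: float = 3.0, after: float = 5.0,
              min_shot: float = 3.0, max_shot: float = 25.0,
              cutaway: float = 5.0) -> List[Shot]:
    """Hypothetical EDL builder following the constraints listed above."""
    # 1. Overview footage from `before` s before to `after` s after each change.
    windows = sorted((max(0.0, c - before), min(duration, c + after))
                     for c in slide_changes)
    # 2. Merge overview windows separated by less than `min_shot` of tracking footage.
    merged: List[List[float]] = []
    for start, end in windows:
        if merged and start - merged[-1][1] < min_shot:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Absorb too-short tracking footage at the very start and end of the lecture.
    if merged and merged[0][0] < min_shot:
        merged[0][0] = 0.0
    if merged and duration - merged[-1][1] < min_shot:
        merged[-1][1] = duration

    edl: List[Shot] = []

    def add_tracking(start: float, end: float) -> None:
        length = end - start
        if length <= 0:
            return
        # Fewest evenly spaced overview cutaways keeping each tracking piece <= max_shot.
        n = max(0, math.ceil((length - max_shot) / (max_shot + cutaway)))
        piece = (length - n * cutaway) / (n + 1)
        t = start
        for i in range(n + 1):
            edl.append((t, piece, "tracking"))
            t += piece
            if i < n:
                edl.append((t, cutaway, "overview"))
                t += cutaway

    t = 0.0
    for start, end in merged:
        add_tracking(t, start)
        edl.append((start, end - start, "overview"))
        t = end
    add_tracking(t, duration)
    return edl
```

For slide changes at 10 s, 12 s, and 60 s in a 100 s lecture, the first two overview windows merge into a single 7 to 17 s shot. The 40 s tracking stretch that follows is split by one 5 s overview cutaway.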


Conclusion

• Automatic synchronization and editing system
• Classification of the different kinds of synchronization
• Slide change detection for a dark foreground and light background (textual part)
• Slide identification confirms slide change detection
• Rotation and translation can affect the matching


Future Work

• Motion vector analysis and scene cut detection (to trigger a switch to the overview camera)
• Automatic enhancement of poor lighting
• Orientation and position of the speaker for editing
• Shots from more cameras
• Use of blackboards, whiteboards, and transparencies


Looking at Projected Documents: Event Detection & Document Identification


Introduction

• Documents play a major role in presentations, meetings, lectures, etc.
• They are captured as a video stream or as images
1. Temporal segmentation of meetings based on (projected) document events:
– Inter-document (slide change, etc.)
– Intra-document (animation, scrolling, etc.)
– Extra-document (sticks, beams, etc.)
2. Identification of the extracted low-resolution document images
• Goal: annotation and retrieval using visible documents


Motivation

• Detection and identification with low-resolution devices
• Current focus on projected documents
• Extendable to documents on a table
• Captured as a video stream (web-cam)


Slide Change Detection

• Slides in a slideshow share the same layout, background, pattern, etc.
• The web-cam auto-focuses (nearly 400 ms to obtain a stable image)
• Lighting conditions vary
• Presentation slides are captured as a video stream


Slide Change Detection (Cont’d)

[Figure panels: different slides with similar text layout; fading during auto-focusing; frames during the auto-focusing period]


Slide Change Detection (Cont’d)

• Existing methods for scene cut detection
– Histograms (color and gray)
– Cornell method (Hausdorff distance)
• Histogram methods fail due to: a) low resolution, b) low contrast, c) auto-focusing, d) fading
• Cornell: uses identification to validate the changes
• Fribourg method: slide stability
– Assumption: a slide is visible for at least 2 seconds; shorter appearances count as slide skipping


Proposed Slide Change Detection

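[Figure: samples $x_0, x_1, \ldots, x_{N-1}$ taken every 0.5 s; each is checked for stability over a 2 s window, followed by stability confirmation]

Check for stability: $\mathrm{Dist}\{S(t), S(t+2)\}$ compared against the threshold $T_1$, for $t = 0, 0.5, 1, 1.5, \ldots, D$

$X = \{x_0, x_1, \ldots, x_{N-1}\}$; the thresholds $T_1$ and $T_2$ are defined in terms of $E(X)$ and $\mathrm{Var}(X)$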
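A minimal sketch of the stability idea, under these assumptions:

- Frames are sampled every 0.5 s as binary images.
- The distance is the fraction of differing pixels.
- A single fixed threshold stands in for the thresholds $T_1$ and $T_2$, which the slide ties to $E(X)$ and $\mathrm{Var}(X)$.

A slide change is reported when a new stable run starts whose content differs from the last stable slide. Slides visible for less than 2 s never become stable and are therefore skipped. `frame_dist` and `detect_changes` are hypothetical names.

```python
import numpy as np


def frame_dist(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels that differ between two binary frames (assumed metric)."""
    return float(np.mean(a != b))


def detect_changes(frames, step=0.5, window=2.0, threshold=0.05):
    """Return the times (s) at which a new stable slide is confirmed.

    frames: binary images sampled every `step` seconds.
    A frame is stable if it matches the frame `window` seconds later.
    The first stable slide only becomes the reference and is not reported.
    """
    lag = int(round(window / step))  # 2 s / 0.5 s = 4 samples
    changes, reference = [], None
    for i in range(len(frames) - lag):
        if frame_dist(frames[i], frames[i + lag]) >= threshold:
            continue  # still changing (fading, auto-focus, skipped slide)
        if reference is None:
            reference = frames[i]
        elif frame_dist(reference, frames[i]) >= threshold:
            changes.append(i * step)  # stability confirmed for a new slide
            reference = frames[i]
    return changes
```

Comparing each frame with the frame 2 s later makes transient frames count as unstable. Frames captured during auto-focusing or fading, or a slide flashed for 1 s, never trigger a change.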


Ground-Truth Preparation

• Based on SMIL
• 300 slideshows collected from the web
• Automatic generation of SMIL files with a random duration for each slide
• Each entry contains the slide id, start time, stop time, and type (skip or normal)
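Generating such a ground truth can be sketched with Python's `random` and `xml.etree.ElementTree`. The attribute names (`id`, `imagefile`, `st`, `et`, `type`) follow the example file on the Evaluation slide. Everything else is a hypothetical choice modeled on that example, not the authors' generator:

- the `slideshow` root element;
- the duration ranges and skip probability;
- the rule that skipped slides are random other slides shown for under 2 s.

```python
import random
import xml.etree.ElementTree as ET


def make_ground_truth(num_slides, seed=0, normal_range=(3.0, 10.0),
                      skip_range=(0.5, 1.9), skip_prob=0.2, max_skips=5):
    """Return an XML tree of <slide> entries with random durations (hypothetical format)."""
    rng = random.Random(seed)
    root = ET.Element("slideshow")  # hypothetical container element
    t = 0.0
    entry_id = 1

    def add(image_no, dur, kind):
        nonlocal t, entry_id
        ET.SubElement(root, "slide", id=str(entry_id),
                      imagefile=f"Slide{image_no}.JPG",
                      st=f"{t:.6f}", et=f"{t + dur:.6f}", type=kind)
        t += dur
        entry_id += 1

    for k in range(1, num_slides + 1):
        add(k, rng.uniform(*normal_range), "normal")  # slide k shown normally
        if rng.random() < skip_prob:
            # Simulate the presenter flipping through other slides quickly.
            for _ in range(rng.randint(1, max_skips)):
                add(rng.randint(1, num_slides), rng.uniform(*skip_range), "skip")
    return root
```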


Evaluation

1. Ground truth: SMIL XML
2. Slideshow video → slide change detection → XML
3. Evaluation: compare 1 and 2
4. Metrics: recall (R), precision (P), F-measure (F)

<slide id="1" imagefile="Slide1.JPG" st="0000000" et="9.641000" type="normal" />
<slide id="2" imagefile="Slide2.JPG" st="9.641000" et="12.787199" type="normal" />
<slide id="3" imagefile="Slide15.JPG" st="12.787199" et="13.775500" type="skip" />
<slide id="4" imagefile="Slide11.JPG" st="13.775500" et="14.341699" type="skip" />
<slide id="5" imagefile="Slide25.JPG" st="14.341699" et="15.885400" type="skip" />
<slide id="6" imagefile="Slide20.JPG" st="15.885400" et="16.476199" type="skip" />
<slide id="7" imagefile="Slide9.JPG" st="16.476199" et="18.094100" type="skip" />
<slide id="8" imagefile="Slide3.JPG" st="18.094100" et="23.160102" type="normal" />
<slide id="9" imagefile="Slide4.JPG" st="23.160102" et="26.523102" type="normal" />
……..

An example of a ground-truth SMIL file
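The comparison step can be sketched as a greedy one-to-one matching of slide-change times within a tolerance. The sketch assumes both XML files have already been reduced to lists of change times in seconds. A frame tolerance is converted with a known frame rate; for example, 1 frame at an assumed 25 FPS is 0.04 s. `evaluate` is a hypothetical name.

```python
from typing import List, Tuple


def evaluate(truth: List[float], detected: List[float],
             tolerance: float) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for change times matched within +/- tolerance s."""
    unmatched = sorted(truth)
    hits = 0
    for d in sorted(detected):
        # Pick the closest still-unmatched ground-truth change within tolerance.
        best = None
        for i, g in enumerate(unmatched):
            if abs(g - d) <= tolerance and (best is None or abs(g - d) < abs(unmatched[best] - d)):
                best = i
        if best is not None:
            unmatched.pop(best)  # each ground-truth change can be matched only once
            hits += 1
    recall = hits / len(truth) if truth else 0.0
    precision = hits / len(detected) if detected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f
```

Enlarging the tolerance (e.g., from 1 to 4 frames) can only keep or increase the number of hits. This is consistent with the higher scores reported for the 4-frame tolerance.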


Results

• 1-frame tolerance: Fribourg (R: 0.84, P: 0.82, F: 0.83); Cornell (R: 0.40, P: 0.21, F: 0.23)
• 1-frame tolerance: R: 0.80, P: 0.83, F: 0.81
• 4-frame tolerance: R: 0.92, P: 0.96, F: 0.93
• 1-frame tolerance: Color Hist (R: 0.07, P: 0.04, F: 0.05); Gray Hist (R: 0.18, P: 0.12, F: 0.13)


Results (Cont’d): 4-Frame Tolerance

• Fribourg (R: 0.93, P: 0.91, F: 0.92)
• Cornell (R: 0.80, P: 0.51, F: 0.54)
• Color Hist (R: 0.13, P: 0.09, F: 0.10)
• Gray Hist (R: 0.27, P: 0.17, F: 0.19)


Low-resolution Docs Identification

• Difficulties in identification
– Existing DAS are hard to use at 50–100 dpi
– OCR performs very poorly
– Hard to extract the complete layout (physical, logical)
– Rotation, translation, and resolution affect global image matching
– Captured images vary: lighting, flash, distance, auto-focusing, motion blur, occlusion, etc.


Proposed Docs Identification

• Based on a visual signature
– Shallow layout with zone labeling
– Hierarchically structured according to feature priority
• Identification: matching of signatures
• Matching: simple heuristics following the hierarchy of the signature


Visual Signature Extraction

• Common resolution, RLSA
• Zone labeling (text, image, solid bars, etc.)
• Block separation: projection profiles
• Text blocks (one line per block)
• Bullet and vertical text line extraction
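Two of the extraction steps can be sketched with NumPy:

- Horizontal run-length smoothing (RLSA) fills short white gaps so that characters merge into words and lines.
- A horizontal projection profile then splits the image into one block per text line.

The sketch assumes a binary image with 1 for black pixels. The gap limit `c` is a hypothetical parameter, and only gaps bounded by black pixels on both sides are filled. `rlsa_horizontal` and `line_blocks` are illustrative names.

```python
import numpy as np


def rlsa_horizontal(img: np.ndarray, c: int) -> np.ndarray:
    """Fill white gaps of at most c pixels between black pixels in each row."""
    out = img.copy()
    for r in range(img.shape[0]):
        black = np.flatnonzero(img[r])
        for a, b in zip(black[:-1], black[1:]):
            if 1 < b - a <= c + 1:  # gap of (b - a - 1) white pixels
                out[r, a + 1:b] = 1
    return out


def line_blocks(img: np.ndarray):
    """Split a binary image into row ranges [start, end) using the row projection profile."""
    profile = img.sum(axis=1)
    blocks, start = [], None
    for r, v in enumerate(profile):
        if v > 0 and start is None:
            start = r  # first non-empty row of a block
        elif v == 0 and start is not None:
            blocks.append((start, r))
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks
```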


Visual Signature Extraction

• Feature vector for images, bars (horizontal and vertical), and bullets: $(Y_{\min}, X_{\min}, H_{\max}, W_{\max}, P)$

• Feature vector for each text line and bar with text (horizontal and vertical): $(Y_{\min}, X_{\min}, H_{\max}, W_{\max}, N_{word}, R_i(Y_i, X_i), P)$

[Figure: bounding boxes of the various features]


Structuring Visual Signature

– The hierarchy depends on the extraction process and on real-world slideshows
– It narrows the search path during matching

<VisualSign>
  <BoundingBox NoOfBb="10">
    <Text NoOfLine="7">
      <HasHorizontalText NoOfSentence="7">
        <S y="53" x="123" width="436" height="25" NoOfWords="4" PixelRatio="0.40" />
        …
      </HasHorizontalText>
      <HasVerticalText NoOfSentence="0" />
    </Text>
    <HasImage NoOfImage="3">
      <Image y="1" x="16" width="57" height="533" PixelRatio="0.88" />
      …
    </HasImage>
    <HasBullet NoOfBullets="2">
      <Bullet y="122" x="141" width="12" height="12" PixelRatio="1.0" />
      ..
    </HasBullet>
    <Line NoOfLine="0">
      <HasHLine NoOfLine="0" />
      <HasVLine NoOfLine="0" />
    </Line>
    <BarWithText NoOfBar="0">
      <HBarWithText NoOfBar="0" />
      <VBarWithText NoOfBar="0" />
    </BarWithText>
  </BoundingBox>
</VisualSign>


Structured Signature-based Matching

• Search technique:
– Takes advantage of the hierarchical structure of the visual signature
– Higher-level features are compared before lower-level features are matched

[Figure: tree representation of the features in the visual signature: root Bbox (F); intermediate nodes Text Line, Line, and Text Bar; leaf features H-Text, V-Text, Image, Bullets, H-Line, V-Line, HBarText, and VBarText (f1–f8)]
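The coarse-to-fine search can be sketched as follows. This is a hedged illustration of the idea, not the author's heuristics. It parses signatures in the XML format shown on the Structuring Visual Signature slide with `xml.etree.ElementTree`. A candidate is rejected as soon as a higher-level count differs (number of text lines, images, or bullets); only then are the positions of individual text lines compared. The chosen counts, the tolerance, and the scoring are assumptions, and `summary`, `text_boxes`, and `match` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def summary(sig: ET.Element) -> Dict[str, int]:
    """Higher-level features: element counts stored in the signature hierarchy."""
    box = sig.find("BoundingBox")
    return {
        "lines": int(box.find("Text").get("NoOfLine")),
        "images": int(box.find("HasImage").get("NoOfImage")),
        "bullets": int(box.find("HasBullet").get("NoOfBullets")),
    }


def text_boxes(sig: ET.Element) -> List[Tuple[int, ...]]:
    """Lower-level features: (y, x, width, height) of each horizontal text line."""
    return [tuple(int(s.get(k)) for k in ("y", "x", "width", "height"))
            for s in sig.iterfind("BoundingBox/Text/HasHorizontalText/S")]


def match(query: ET.Element, candidates: Dict[str, ET.Element],
          tol: int = 10) -> Optional[str]:
    """Return the id of the best-matching candidate signature, or None."""
    q_summary, q_boxes = summary(query), text_boxes(query)
    limit = tol * 4 * max(1, len(q_boxes))  # allowed total coordinate offset
    best, best_score = None, None
    for cid, cand in candidates.items():
        if summary(cand) != q_summary:  # higher level differs: prune early
            continue
        # Lower level: sum of absolute coordinate differences of paired text lines.
        score = sum(abs(a - b)
                    for qb, cb in zip(q_boxes, text_boxes(cand))
                    for a, b in zip(qb, cb))
        if score <= limit and (best_score is None or score < best_score):
            best, best_score = cid, score
    return best
```

Pruning on cheap counts first means the per-line comparison runs only for the few repository slides with the same overall layout. That is the benefit the hierarchical signature is meant to provide.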


Results

• Evaluation based on recall and precision
• ~200 slide images (web-cam) queried against a repository of 300 slides: R: 0.94, P: 0.80, F: 0.86

[Figure: matching performance]


Conclusion

• Proposed slide change detection
– Automatic evaluation
– Performance: best among the compared methods (Cornell, color and gray histograms)
– Lower time and computational complexity
– Overcomes the auto-focusing and fading behavior of web-cams
– Accuracy improved compared to Cornell (at low tolerance)
– Could be used for meeting indexing thanks to its high precision


Conclusion

• Proposed slide identification:
– Based on the visual signature
– No classifier needed
– Fast: only signature matching (no global image matching)
– No OCR needed
– Could be helpful for real-time applications (translation, mobile OCR, etc.)
– Applicable to digital cameras and mobile phones
• Finally: documents as a means of indexing and retrieval


Future Work

• Detection and identification of pointed-at and partially occluded documents
• Identification with a complex background structure
• Evaluation with digital cameras and mobile phones
• Adding background pattern and color information to the visual signature
• Identification of documents on a table
• Evaluation of animation


Possible Projects

• Deformation correction (Perspective, Projective, etc.)

• Automatic detection of projected documents in the captured video

• Detection of occluded objects

• Background pattern recognition


Thank You!