motivation - university of...
TRANSCRIPT
Motivation
Where is my W-2 Form?
Video-based Tracking
Camera view of the desk
Camera
Overhead video camera
Example
40 minutes, 1024x768 @ 15 fps
System Overview
UserInput Video
System Overview
User
Internal Representation
Video Analysis
Vision Engine
Desk Desk
T T+1
Input Video
Query Interface
System Overview
Query(where is my W-2 form?)
Internal Representation
Input Video User
Video Analysis
Desk Desk
T T+1
System Overview
Query(where is my W-2 form?)
Internal Representation
Input Video User
Video Analysis
Answer
Query Interface
Desk Desk
T T+1
System Overview
User
Internal Representation
Video Analysis
Input Video
Vision Engine
Desk Desk
T T+1
Vision Problem
… …
Vision ProblemEvent
… …
Vision Problem
…
Event
… …
Desk
Vision Problem
Desk
… …
Event
… …
Desk
Vision Problem
… …tut-article.pdf
sanders01.pdf
objectspaces.pdf kidd94.pdf
lowe04sift.pdf
Event
… …
Desk Desk
Vision Problem
… …
Event
Scene Graph(DAG)
… …
tut-article.pdf
sanders01.pdf
objectspaces.pdf kidd94.pdf
lowe04sift.pdf
Desk Desk
Assumptions
• Simplifying– Corresponding electronic copy exists
Assumptions
• Simplifying– Corresponding electronic copy exists– 3 event types: move/entry/exit– One document at a time– Only topmost document can move– No duplicate copies of same document
Assumptions
• Simplifying– Corresponding electronic copy exists– 3 event types: move/entry/exit– One document at a time– Only topmost document can move– No duplicate copies of same document
• Other– Desk need not be initially empty
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
Document Recognition
Scene Graph Update
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
Document Recognition
before after
Scene Graph Update
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
Scene Graph Update
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Event Detection
… …
Event Detection
time
Frame Difference
… …
Event Detection
time
Threshold
Event Frames
time
… …
Frame Difference
Event Detection
before after
Event Frames
… …
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Event Interpretation
Move
before after
Event Interpretation
Move
Entry
before after
Event Interpretation
Move
Entry
Exit
before after
Event Interpretation
Move
Entry
Exit
Motion: (x,y,θ)
before after
Event Interpretation
Move
Entry
Exit
1. Move vs. Entry/Exit
before after
Event Interpretation
Move
Entry
Exit
2. Entry vs. Exit
before after
Event Interpretation
• Use SIFT [Lowe 99]
– Scale Invariant Feature Transform– Distinctive feature descriptor– Reliable object recognition
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Move vs. Entry/Exit
before after
Motion: (x,y,θ)
Entry vs. Exit
before after
Example 1 (entry)
Entry vs. Exit
before after
Example 1 (entry)
Entry vs. Exit
before after
Example 1 (entry)
Entry vs. Exit
before after
…
File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf
…
Entry vs. Exit
before after
Example 2 (entry)
Entry vs. Exit
before after
…
File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf
…
Entry vs. Exit
before after
Entry vs. Exit
before after
?
Entry vs. Exit
before after
?
Entry vs. Exit
before after
……
Entry vs. Exit
before after
……
Amount of change
time
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Document Recognition
…
File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf
• Match against PDF image database
…
Document Recognition
…
File1.pdf File2.pdf File3.pdf File4.pdf File5.pdf File6.pdf
• Match against PDF image database
• Performance– Can differentiate between ~100 (or more) documents– ~200x300 pixels per document for reliable match
…
Algorithm OverviewInput
Frames… …
Event Detection
Event Interpretation
“A document moved from (x1,y1) to (x2,y2)”
Document Recognition
before after
File1.pdf
File2.pdf
File3.pdf
Scene Graph Update
Scene Graph Update
before after
Motion: (x,y,θ)
Desk
Scene Graph Update
before after
Motion: (x,y,θ)
Desk
Scene Graph Update
before after
Motion: (x,y,θ)
Desk Desk
Scene Graph Update
before after
Motion: (x,y,θ)
Desk
?
Scene Graph Update
before after
Motion: (x,y,θ)
Desk Desk
Scene Graph Update
Desk DeskDeskDesk
…
Scene Graph Update
Desk DeskDeskDesk
…
Photo Sorting Example
Current Directions
• Handle more realistic desktops• Speed up processing time• Other useful functionalities
– Written annotation– Version management– Bookmark– Multi-user queries
For More Information
• Our publications– Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. The Office of
the Past. IEEE Workshop on Real-time Vision for HCI, 2004.– Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. Video-based
Document Tracking: Unifying Your Physical and Electronic Desktops. To appear in Proceedings of UIST, 2004.
• Other related work– David G. Lowe. Distinctive image features from scale invariant
keypoints. International Journal of Computer Vision, 2004.– Pierre Wellner. Interacting with paper on the DigitalDesk.
Communications of the ACM, 36(7):86.97,1993.– D. Rus and P. deSantis. The self-organizing desk. In
Proceedings of International Joint Conference on Artificial Intelligence, 1997.
For More Information
• Our publications– Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. The Office of
the Past. IEEE Workshop on Real-time Vision for HCI, 2004.– Jiwon Kim, Steven M. Seitz, Maneesh Agrawala. Video-based
Document Tracking: Unifying Your Physical and Electronic Desktops. To appear in Proceedings of UIST, 2004.
• Other related work– David G. Lowe. Object recognition from local scale-invariant
features. International Conference on Computer Vision, 1999.– Pierre Wellner. Interacting with paper on the DigitalDesk.
Communications of the ACM, 36(7):86.97,1993.– D. Rus and P. deSantis. The self-organizing desk. In
Proceedings of International Joint Conference on Artificial Intelligence, 1997.