semantics and multimedia
DESCRIPTION
This is Gerald Friedland's presentation for SVST's Multi-Media and the Semantic Web.
TRANSCRIPT
Advances in Semantic Analysis of Multimedia
Dr. Gerald Friedland
International Computer Science Institute
Berkeley, [email protected]
The Internet Today
Internet Use Today
Raphaël Troncy: Linked Media: Weaving non-textual content into the Semantic Web, MozCamp, 03/2009.
Types of Videos
Addressable Market for Enterprise Video Applications
• Security: $1.2 Billion (Total Market $7.8B, 2005) (Source: JP Freeman) ($7B in 06. Source: Lehman)
• Asset Tracking: $480m by 2010 (RFID in 2006 2.4B) (Total Asset protection $14.7B) (Source: Lehman report 2006)
• QA/Operational Efficiency: $700m (Source: Envysion, Arrowsight, corporate analysis)
• Training: $600m (Source: Forrester Enterprise Software report 2005)
• Compliance: $450m (Source: JP Freeman)
• BI: $400m (Reporting and Analysis 4B) (Total BI market $13.3B) (Source: IDC BI tools 03-08)
• Intelligent Marketing: $200m (Source: T3CI corporate analysis)
• Government (Intelligence, Defense, Homeland Security)
$4.0 Billion Commercially
Multimedia Capabilities: 1985
• Record
• Store
• Play
• Random Seek
• Annotate Manually
Multimedia Capabilities: 2009
• Record
• Store
• Stream
• Play
• Random Seek
• Annotate Manually
Multimedia Capabilities: Wanted
• Semantic Navigation
• Search
• Content Compare
• Object Cut & Paste
• Annotate Automatically
• Infer over Content
=> Make multimedia “understandable” for computers.
Problems
• Multimedia data is very dense, so manual annotation is not feasible.
• Multimedia content analysis is difficult and rarely good enough to create reliable products.
My Research...
[Diagram labels: Audio, Images, Video, Text; Features, Filtering, Recognition, Understanding; Machine Learning, Context; Signal/Text Processing, Artificial Intelligence, Semantic Computing; Knowledge Network, Semantic Web]
My Research...
Hypotheses:
• Multimedia content analysis works better when every cue is taken into account (e.g., video AND audio).
• Semantics is enabled through context. This converts AI research into products.
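The first hypothesis can be illustrated with a small late-fusion sketch. This is an assumption for illustration only, not a method from the talk: the function names, the weights, and the per-speaker scores are all hypothetical. It shows how a weak cue from one modality can be overruled once a second modality is taken into account.

```python
def fuse(scores_by_modality, weights):
    """Weighted average of per-class scores across modalities (late fusion)."""
    classes = set().union(*(s.keys() for s in scores_by_modality.values()))
    total = sum(weights[m] for m in scores_by_modality)
    return {
        c: sum(weights[m] * s.get(c, 0.0) for m, s in scores_by_modality.items()) / total
        for c in classes
    }


def decide(scores_by_modality, weights):
    """Return the class with the highest fused score."""
    fused = fuse(scores_by_modality, weights)
    return max(fused, key=fused.get)


# Hypothetical scores for "who is speaking", one dict per modality.
audio = {"alice": 0.55, "bob": 0.45}   # audio alone slightly favors alice
video = {"alice": 0.20, "bob": 0.80}   # video (e.g., lip motion) favors bob
weights = {"audio": 0.5, "video": 0.5}

print(decide({"audio": audio}, weights))
print(decide({"audio": audio, "video": video}, weights))
```

With audio alone the decision follows the slightly higher audio score; adding the video cue changes the outcome, which is the effect the hypothesis describes.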
Context
Sources of Context:
• Inclusion of prior knowledge
• Combination of algorithms
• Multimodality:
– audio+video+...
– extra hardware
• Human interaction
• ...
Context as Key: Example 1
Visual Object Extraction
[Figure: Cut → Paste. Labels: Horse, Meadow]
Simple Interactive Object Extraction (SIOX)
Image → User Input → Output
Context delivered by human interaction
SIOX: Algorithm Idea
Color Signatures from image retrieval:
Y. Rubner, C. Tomasi, and L. J. Guibas: The Earth Mover’s Distance as a Metric for Image Retrieval. Int. Journal of Computer Vision, 40(2):99–121, 2000.
Idea: Instead of searching an image database, use Color Signatures to search inside an image.
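The idea of searching inside an image with color signatures can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SIOX implementation: the coarse grid clustering, the RGB color space, the plain Euclidean distance, and the names color_signature and classify are all stand-ins chosen to keep the example short.

```python
from collections import defaultdict
from math import dist


def color_signature(pixels, cell=32):
    """Cluster colors on a coarse grid; return (centroid, weight) pairs."""
    buckets = defaultdict(list)
    for p in pixels:
        buckets[tuple(c // cell for c in p)].append(p)
    total = len(pixels)
    signature = []
    for members in buckets.values():
        centroid = tuple(sum(ch) / len(members) for ch in zip(*members))
        signature.append((centroid, len(members) / total))
    return signature


def classify(pixel, fg_sig, bg_sig):
    """Assign a pixel to whichever signature has the nearest centroid."""
    d_fg = min(dist(pixel, c) for c, _ in fg_sig)
    d_bg = min(dist(pixel, c) for c, _ in bg_sig)
    return "foreground" if d_fg < d_bg else "background"


# Hypothetical samples from regions the user marked (context from interaction).
fg_samples = [(120, 72, 30), (130, 80, 40), (110, 60, 25)]   # brownish object
bg_samples = [(40, 160, 50), (50, 170, 60), (45, 150, 40)]   # greenish background

fg_sig = color_signature(fg_samples)
bg_sig = color_signature(bg_samples)
print(classify((125, 75, 35), fg_sig, bg_sig))
print(classify((48, 155, 52), fg_sig, bg_sig))
```

The user-marked regions supply the two signatures, and every remaining pixel is then looked up against them, much as a query image would be looked up against a database.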
SIOX in GIMP
[Screenshot label: SIOX Button]
G. Friedland, K. Jantz, T. Lenz, F. Wiesel, R. Rojas: “Object Cut and Paste in Images and Videos”, International Journal of Semantic Computing Vol 1, No 2, pp. 221-247, World Scientific, USA, June 2007.
SIOX in Inkscape
SIOX in Blender
Extensions
Extracting multiple similar objects at once:
Sub-Pixel Refinement
Problem: Spill colors and foreground disappearance
[Figure panels: Original, SIOX, GraphCut]
Sub-Pixel Refinement
Detail Refinement Brush: Coarse Interaction
VideoSIOX
1st Frame:
Subsequent Frames:
Shoesurfer
Context as Key: Example 2
Speaker Diarization: Who Spoke When?
Audiotrack:
Segmentation:
Clustering:
G. Friedland, O. Vinyals, Y. Huang, C. Müller: “Prosodic and other Long-Term Features for Speaker Diarization”, IEEE Transactions on Audio, Speech, and Language Processing, Vol 17, No 5, pp. 985–993, July 2009.
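The two stages named on this slide, segmentation and clustering, can be sketched as below. This is a toy illustration and not the method of the cited paper: the one-dimensional "features", the Euclidean distance threshold, and the function names are assumptions standing in for real acoustic features and statistical models.

```python
from math import dist


def segment(frames, threshold):
    """Split the frame sequence wherever consecutive frames differ strongly."""
    boundaries = [0]
    for i in range(1, len(frames)):
        if dist(frames[i - 1], frames[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries, boundaries[1:]))


def centroid(frames, cluster):
    """Mean feature vector over all frames in a cluster of (start, end) segments."""
    vectors = [f for start, end in cluster for f in frames[start:end]]
    return tuple(sum(ch) / len(vectors) for ch in zip(*vectors))


def cluster_segments(frames, segments, threshold):
    """Agglomerative clustering: merge the closest pair until none is close enough."""
    clusters = [[seg] for seg in segments]
    while len(clusters) > 1:
        d, i, j = min(
            (dist(centroid(frames, a), centroid(frames, b)), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d > threshold:
            break
        clusters[i] += clusters.pop(j)
    return clusters


def diarize(frames, threshold=1.0):
    """Answer "who spoke when" as a list of (start, end, speaker_id) tuples."""
    segments = segment(frames, threshold)
    labels = {}
    for speaker, cluster in enumerate(cluster_segments(frames, segments, threshold)):
        for seg in cluster:
            labels[seg] = speaker
    return [(start, end, labels[(start, end)]) for start, end in segments]


# Hypothetical one-dimensional features: speaker A, then speaker B, then A again.
frames = [(0.0,), (0.1,), (5.0,), (5.1,), (0.05,), (0.0,)]
print(diarize(frames))
```

Segmentation finds the change points, and clustering then decides which segments belong to the same speaker, so the first and last segments in the example receive the same label.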
Analyzing Meetings
Dominance Estimation
I Know You...
http://www.icsi.berkeley.edu/~fractor/ioda_demo.avi
Narrative Theme Navigation
G. Friedland, L. Gottlieb, A. Janin: “Joke-o-mat: Browsing Sitcoms Punchline by Punchline”, Proceedings of ACM Multimedia, Beijing, China, October 2009.
Joke-O-Mat: Demo
http://www.youtube.com/watch?v=1qfa84Ulm5s
Connecting Multimedia and Semantic Technologies
[Architecture diagram: Appscio Semantic Media Framework. Labels: File, Device, Driver, Source Recorder; GStreamer; User Component 1 ... User Component n; Custom Event Source 1 ... Custom Event Source n; C/C++/Java Interface; Pipeline Framework; Video Application Server; Scripting & Logic Engine; Events; Web Technology Interface; Services Connector; Integrated Development Environment; Code]
http://www.appscio.com
Semantic Analysis of Multimedia Data
• enables automatic logical inference on perceptually encoded data
• enables more “natural” interaction with the computer: “do what the user means”
• interfaces nicely with Semantic Web technologies
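As a rough illustration of the first and last points, analysis output can be written as subject-predicate-object statements and fed to a simple rule. Everything here is hypothetical: the predicate names, the recording identifier, and the use of plain Python tuples in place of a real Semantic Web toolkit are assumptions made to keep the sketch self-contained.

```python
def to_triples(segments):
    """Turn (start, end, speaker_id) tuples into subject-predicate-object triples."""
    triples = set()
    for i, (start, end, speaker) in enumerate(segments):
        seg = f"segment{i}"
        triples.add((seg, "spokenBy", f"speaker{speaker}"))
        triples.add((seg, "startsAt", start))
        triples.add((seg, "endsAt", end))
    return triples


def infer_participants(triples, recording="recording1"):
    """Rule: if some segment is spokenBy X, then X participatesIn the recording."""
    return {(o, "participatesIn", recording) for s, p, o in triples if p == "spokenBy"}


# Hypothetical "who spoke when" output from an audio analysis step.
segments = [(0, 2, 0), (2, 4, 1), (4, 6, 0)]
triples = to_triples(segments)
for fact in sorted(infer_participants(triples)):
    print(fact)
```

Once the perceptual data is in this form, a generic reasoner can derive new statements without knowing anything about audio or video.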
A note...
James A. Hendler
MySTT
Open-Source, open-model, state-of-the-art speech recognizer for multiparty conversations.
Release Date: February 2010
Upcoming...
4th IEEE International Conference on Semantic Computing 2010
Paper Deadline: May 3rd, 2010
Thank You!
Questions?
Contact:
Dr. Gerald Friedland
International Computer Science Institute
Berkeley, CA
http://[email protected]