semantics and multimedia

Advances in Semantic Analysis of Multimedia

Dr. Gerald Friedland
International Computer Science Institute
Berkeley, [email protected]

The Internet Today

Internet Use Today

Raphaël Troncy: Linked Media: Weaving non-textual content into the Semantic Web, MozCamp, 03/2009.

Types of Videos

Addressable Market for Enterprise Video Applications

• Security: $1.2 Billion (Total Market $7.8B, 2005) (Source: JP Freeman) ($7B in 06. Source: Lehman)
• Asset Tracking: $480m by 2010 (RFID in 2006 2.4B) (Total Asset Protection $14.7B) (Source: Lehman report 2006)
• QA/Operational Efficiency: $700m (Source: Envysion, Arrowsight, corporate analysis)
• Training: $600m (Source: Forrester Enterprise Software report 2005)
• Compliance: $450m (Source: JP Freeman)
• BI: $400m (Reporting and Analysis 4B) (Total BI market $13.3B) (Source: IDC BI tools 03-08)
• Intelligent Marketing: $200m (Source: T3CI corporate analysis)
• Government (Intelligence, Defense, Homeland Security)

$4.0 Billion Commercially
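The commercial figure can be cross-checked against the individual segments listed on this slide. The short Python check below assumes (this is an inference, not something the slide states) that the "$4.0 Billion Commercially" total is the plain sum of the seven non-government segments.

```python
# Segment sizes in millions of USD, copied from the slide.
# Assumption: the commercial total is the simple sum of these segments.
segments_musd = {
    "Security": 1200,
    "Asset Tracking": 480,
    "QA/Operational Efficiency": 700,
    "Training": 600,
    "Compliance": 450,
    "BI": 400,
    "Intelligent Marketing": 200,
}

total_musd = sum(segments_musd.values())
print(f"Commercial total: ${total_musd / 1000:.2f}B")
```

Under that assumption the segments add up to about $4.03 billion, which is consistent with the rounded figure on the slide.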

Multimedia Capabilities: 1985

• Record
• Store
• Play
• Random Seek
• Annotate Manually

Multimedia Capabilities: 2009

• Record
• Store
• Stream
• Play
• Random Seek
• Annotate Manually

Multimedia Capabilities: Wanted

• Semantic Navigation
• Search
• Content Compare
• Object Cut & Paste
• Annotate Automatically
• Infer over Content

=> Make multimedia “understandable” for computers.

Problems

• Multimedia data is very dense, so manual annotation is not feasible.
• Multimedia content analysis is difficult and rarely good enough to create reliable products.

My Research...

[Diagram labels: Features; Recognition; Understanding; Filtering; Machine Learning; Context; Audio, Images, Video, Text; Semantic Computing; Artificial Intelligence; Signal/Text Processing; Knowledge Network; Semantic Web]

My Research...

Hypotheses:
• Multimedia content analysis works better when every cue is taken into account (e.g. video AND audio).
• Semantics is enabled through context. This converts AI research into products.
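One way to picture the first hypothesis is late fusion, where each modality produces its own confidence score and the scores are combined afterwards. The sketch below is only an illustration under stated assumptions: the function name `fuse_scores`, the equal weights, and the example scores are all hypothetical and do not come from the talk.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality confidence scores in [0, 1].

    Only the modalities present in `scores` contribute, so the same
    function works for audio-only, video-only, or combined input.
    """
    total_weight = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total_weight


# Hypothetical weights and detector outputs, chosen for illustration only.
weights = {"audio": 0.5, "video": 0.5}

audio_only = fuse_scores({"audio": 0.4}, weights)
both = fuse_scores({"audio": 0.4, "video": 0.9}, weights)

# A weak audio cue alone stays below a 0.5 decision threshold,
# while adding a strong video cue lifts the fused score above it.
decision_audio_only = audio_only > 0.5
decision_both = both > 0.5
```

In this toy case the audio cue on its own is inconclusive, and the decision only becomes positive once the video cue is taken into account.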

Context

Sources of Context:
• Inclusion of prior knowledge
• Combination of algorithms
• Multimodality:
  – audio+video+...
  – extra hardware
• Human interaction
• ...

Context as Key: Example 1

Visual Object Extraction

[Figure: Cut → Paste; image labels: Horse, Meadow]

Simple Interactive Object Extraction (SIOX)

[Figure: Image → User Input → Output]

Context delivered by human interaction

SIOX: Algorithm Idea

Color Signatures from image retrieval:

Y. Rubner, C. Tomasi, and L. J. Guibas: The Earth Mover’s Distance as a Metric for Image Retrieval. Int. Journal of Computer Vision, 40(2):99–121, 2000.

Idea: Instead of searching an image database, use Color Signatures to search inside an image.
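The idea of searching inside an image with color signatures can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the SIOX implementation: the signature is built here by crude RGB quantization (an assumption made for brevity), and the names `color_signature`, `sq_dist`, and `classify` are hypothetical.

```python
from collections import defaultdict


def color_signature(pixels, bin_size=32):
    """Crude color signature: the mean color of each occupied quantization bin.

    Assumption: fixed-size RGB bins stand in for a proper clustering step.
    """
    bins = defaultdict(list)
    for p in pixels:
        bins[tuple(c // bin_size for c in p)].append(p)
    return [
        tuple(sum(channel) / len(members) for channel in zip(*members))
        for members in bins.values()
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(pixel, fg_sig, bg_sig):
    """Label a pixel by whichever signature contains the nearest color."""
    d_fg = min(sq_dist(pixel, c) for c in fg_sig)
    d_bg = min(sq_dist(pixel, c) for c in bg_sig)
    return "foreground" if d_fg < d_bg else "background"


# Toy data: pixels the user marked as foreground (brownish) and
# pixels known to be background (greenish).
fg_samples = [(150, 90, 40), (160, 100, 50), (140, 85, 35)]
bg_samples = [(40, 160, 50), (50, 170, 60), (60, 180, 70)]

fg_sig = color_signature(fg_samples)
bg_sig = color_signature(bg_samples)

# Classify two unmarked pixels from the same image.
labels = [classify(p, fg_sig, bg_sig) for p in [(155, 95, 45), (45, 165, 55)]]
```

The user's rough marking supplies the context: the signatures are built from that single image, and every remaining pixel is then assigned to the closer of the two signatures.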

SIOX in GIMP

[Screenshot; callout label: SIOX Button]

G. Friedland, K. Jantz, T. Lenz, F. Wiesel, R. Rojas: “Object Cut and Paste in Images and Videos”, International Journal of Semantic Computing Vol 1, No 2, pp. 221-247, World Scientific, USA, June 2007.

SIOX in Inkscape

SIOX in Blender

Extensions

Extracting multiple similar objects at once:

Sub-Pixel Refinement

Problem: Spill colors and foreground disappearance

[Figure panels: Original, SIOX, GraphCut]

Sub-Pixel Refinement

Detail Refinement Brush: Coarse Interaction

VideoSIOX

1st Frame:

Subsequent Frames:

More Information

http://www.siox.org

Shoesurfer

Context as Key: Example 2

Speaker Diarization: Who Spoke When?

Audiotrack:

Segmentation:

Clustering:

G. Friedland, O. Vinyals, Y. Huang, C. Müller: “Prosodic and other Long-Term Features for Speaker Diarization”, IEEE Transactions on Audio, Speech, and Language Processing, Vol 17, No 5, pp 985--993, July 2009.
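The segmentation and clustering steps named on this slide can be illustrated with a toy example. The sketch below assumes the audio track has already been cut into segments, each summarized by a small feature vector, and groups them with simple centroid-based agglomerative clustering. That clustering rule, the `threshold` parameter, and the function name `diarize` are stand-ins chosen for illustration; they are not the method of the cited paper.

```python
def diarize(segments, threshold):
    """Assign a speaker label to each segment.

    segments:  list of (start, end, feature_vector)
    threshold: stop merging once the closest clusters are farther apart
    Returns a list of (start, end, speaker_label).
    """
    clusters = [[i] for i in range(len(segments))]

    def centroid(cluster):
        vecs = [segments[i][2] for i in cluster]
        return [sum(v) / len(vecs) for v in zip(*vecs)]

    def dist(a, b):
        ca, cb = centroid(a), centroid(b)
        return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5

    # Repeatedly merge the two closest clusters until none are close enough.
    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    # Number speakers in order of their first appearance.
    label = {}
    for speaker, cluster in enumerate(sorted(clusters, key=min)):
        for idx in cluster:
            label[idx] = speaker
    return [(s, e, f"speaker_{label[k]}") for k, (s, e, _) in enumerate(segments)]


# Toy segments: (start time, end time, made-up 2-D feature vector).
segments = [
    (0.0, 2.0, [1.0, 1.1]),
    (2.0, 3.5, [5.0, 4.9]),
    (3.5, 6.0, [1.1, 0.9]),
    (6.0, 7.0, [5.1, 5.0]),
]
result = diarize(segments, threshold=1.0)
speakers = [label for _, _, label in result]
```

The output answers "who spoke when" as a list of time intervals with anonymous speaker labels; the number of speakers is not given in advance but falls out of the stopping threshold.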

Analyzing Meetings

Dominance Estimation
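The slide does not say how dominance is estimated. One simple baseline, assumed here purely for illustration, is to rank meeting participants by their total speaking time, computed from "who spoke when" turns. The function name `speaking_time` and the example turns are hypothetical.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration of all turns per speaker.

    turns: list of (start_seconds, end_seconds, speaker_label)
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Made-up meeting excerpt.
turns = [
    (0.0, 12.0, "A"),
    (12.0, 15.0, "B"),
    (15.0, 30.0, "A"),
    (30.0, 34.0, "C"),
]

totals = speaking_time(turns)
# Assumption: the participant with the most speaking time is "most dominant".
most_dominant = max(totals, key=totals.get)
```

Speaking time is only one possible cue; a fuller estimate could also consider interruptions or turn counts, which this sketch ignores.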

I Know You...

http://www.icsi.berkeley.edu/~fractor/ioda_demo.avi





Narrative Theme Navigation

G. Friedland, L. Gottlieb, A. Janin: “Joke-o-mat: Browsing Sitcoms Punchline by Punchline”, Proceedings of ACM Multimedia, Beijing, China, October 2009.

Joke-O-Mat: Demo

http://www.youtube.com/watch?v=1qfa84Ulm5s



[Diagram labels: GStreamer; Source Recorder; User Component 1, User Component 2, ..., User Component n; Appscio; File; Device; Driver]

Connecting Multimedia and Semantic Technologies

[Diagram labels: Custom Event Source 1, Custom Event Source 2, ..., Custom Event Source n; C/C++/Java Interface; Pipeline Framework; Video Application Server; Scripting & Logic Engine; Web Technology Interface; Events; Integrated Development Environment; Services Connector; Code; Semantic Media Framework]

http://www.appscio.com
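The diagram labels suggest a pipeline of user components that turn media into events for a logic layer. The Python sketch below illustrates that general pattern only. It does not use the GStreamer or Appscio APIs, and every name in it (`Pipeline`, `loudness_detector`, `motion_detector`, the frame fields) is hypothetical.

```python
class Pipeline:
    """Pass each item from a source through a chain of components.

    Each component receives the item plus an `emit` callback and may
    report any number of events before handing the item on.
    """

    def __init__(self, components, on_event):
        self.components = components
        self.on_event = on_event

    def run(self, source):
        for item in source:
            for component in self.components:
                item = component(item, self.on_event)


def loudness_detector(frame, emit):
    # Hypothetical component: flag frames with a high audio level.
    if frame["level"] > 0.8:
        emit({"type": "loud", "t": frame["t"]})
    return frame


def motion_detector(frame, emit):
    # Hypothetical component: flag frames where motion was detected.
    if frame["motion"]:
        emit({"type": "motion", "t": frame["t"]})
    return frame


# The "logic engine" here is just a list that collects events.
events = []
frames = [
    {"t": 0, "level": 0.2, "motion": False},
    {"t": 1, "level": 0.9, "motion": True},
]
Pipeline([loudness_detector, motion_detector], events.append).run(frames)
```

Keeping the components independent of what consumes the events is the point of the design: the same detectors could feed a script, a web interface, or a rule engine without modification.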

Semantic Analysis of Multimedia Data

• enables automatic logical inference on perceptually encoded data
• enables more “natural” interaction with the computer: “do what the user means”
• interfaces nicely with Semantic Web technologies

A note...

James A. Hendler

MySTT

Open-Source, open-model, state-of-the-art speech recognizer for multiparty conversations.

Release Date: February 2010

Upcoming...

4th IEEE International Conference on Semantic Computing 2010

Paper Deadline: May 3rd, 2010

Thank You!

Questions?

Contact:
Dr. Gerald Friedland
International Computer Science Institute
Berkeley, CA
http://www.gerald-friedland.org
[email protected]