Chapter 2

Media

• Media classification
• Requirements for media representations
• Coding and compression methods
• Text
• Images
• Audio
• Video
• Other media


2.1 Media Classification

2.1.1 Basic concepts

kinds of media:

• perception media
  how do humans perceive the information?
  – viewing: text, image, video
  – listening: music, sound, speech
  – touching
  – tasting
  – smelling
  – balance

• representation media
  how is the information coded?
  e.g. text in ASCII

• presentation media
  which devices are used for I/O to/from the computer?
  – input: keyboard, camera, microphone, mouse, data glove
  – output: paper, monitor, loudspeaker

Universität Dortmund, Informatik VI, N. Fuhr

• storage media
  where is information stored?
  microfilm, paper, floppy disc, hard disc, CD-ROM, DVD, tape

• communication media
  what is used for transmitting information?
  coax cable, twisted pair, FDDI, electromagnetic waves

• information exchange media
  what is used for exchanging information between different sites?
  paper, floppy disc, CD, microfilm (see also: communication media)

here: perception media

presentation space

each medium yields a presentation value in a presentation space

presentation value: representation of information in the medium
e.g. text: sequence of characters
     speech: sound waves

dimensions of presentation space

• spatial dimensions (2-3)
• temporal dimension (1)

classification of media according to temporal dimension

• discrete: time independent
  e.g. text, graphics
• continuous (temporal): time dependent
  e.g. video, audio, sensor signals

[Figure: presentation space with axes x, y, and t, showing image, audio, and video]

text is a linear medium

2.1.2 Data streams

required for continuous media

properties of data streams:

• classical:
  – asynchronous
  – synchronous: finite upper bound for end-to-end time difference
  – isochronous: finite upper bound for start and end time difference
• periodicity
  change of time interval for transmission of data packets
  – periodical
  – weakly periodical
  – aperiodical, e.g. for transmission of events
• variation of data rate for subsequent information units
  – uniform
  – weakly uniform: periodic variation of data volume per information unit,
    e.g. MPEG: ratio of I:P:B frames
  – varying
• dependence of subsequent packets
  – dependent
  – independent (data stream with "holes")
• information units
  can be defined differently
  here: information unit = logical data unit
  different granularities possible,
  e.g. video: pixel, raster, frame, clip, film

2.2 Requirements for media representations

• compression
• easy processing
• transmission (progressive mode)
• referencing/addressing
• logical structure
• layout specification
• attributes
• annotation

2.3 Coding and compression methods

goal: reduction of storage/bandwidth requirements

2.3.1 Classification of methods

• lossless
  exploitation of redundancy (entropy)
• lossy
  goal: minimum impact on presentation quality

types of lossless coding methods:

• entropy coding
  – run-length coding
  – Huffman coding
  – arithmetic coding
• source coding
  – prediction: DPCM
  – transformation: FFT, DCT
• hybrid coding: JPEG, MPEG

2.3.2 Basic methods

2.3.2.1 Lossless coding methods

run-length coding
encoding of byte streams
in case of frequent repetition of a byte: byte + # occurrences
(requires an escape byte; in the example, "!" followed by the count and the byte)
ABCAAABBBBCCCCCD → ABCAAA!4B!5CD

zero suppression
special case of run-length coding,
only the run length of one special byte is coded
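The run-length example above can be sketched in a few lines of Python. This is only an illustration under assumptions that the slide does not state: the escape character is "!", runs shorter than 4 are copied unchanged, the count is a single decimal digit (so runs are split at 9), and the input never contains the escape character. The names `rle_encode` and `rle_decode` are hypothetical.

```python
def rle_encode(data: str, escape: str = "!", min_run: int = 4) -> str:
    """Run-length encode a string; runs of min_run..9 become escape+count+char."""
    if escape in data:
        raise ValueError("input must not contain the escape character")
    out = []
    i = 0
    while i < len(data):
        # Find the end of the current run, capped at 9 (single-digit count).
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 9:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{escape}{run}{data[i]}")
        else:
            out.append(data[i] * run)  # short runs are not worth encoding
        i = j
    return "".join(out)


def rle_decode(code: str, escape: str = "!") -> str:
    """Invert rle_encode: expand escape+count+char triples."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == escape:
            out.append(code[i + 2] * int(code[i + 1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return "".join(out)


print(rle_encode("ABCAAABBBBCCCCCD"))  # ABCAAA!4B!5CD
```

The threshold of 4 is chosen so that the output reproduces the slide's example, where the run AAA is left unencoded.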


pattern substitution
replaces frequent patterns by single codes

frequently used: LZW (Lempel-Ziv-Welch)

uses adaptive table of predefined size
codes are pointers into the dictionary (typically 9-14 bits)

dictionary initialization:
character set = codes 0...255

encoding:
sequential processing of input characters

1. if string is in table, append next char
2. if string is not in table:
   a) output last known string's code
   b) add new string to table
   c) start new string with char

example:

Prefix  Suffix  New String  Output
∆       a       a           -
a       b       ab          97
b       a       ba          98
a       b       ab          -
ab      c       abc         256
c       b       cb          99
b       a       ba          -
ba      ∆       ba          257
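The encoding steps above can be traced with a short Python sketch. It is a simplified illustration, not a full LZW implementation: the table is allowed to grow without the predefined size limit mentioned earlier, codes are returned as plain integers rather than packed into 9-14 bits, and the function name `lzw_encode` is an assumption.

```python
def lzw_encode(text: str) -> list[int]:
    """Encode text with LZW, starting from a table of the 256 single characters."""
    table = {chr(i): i for i in range(256)}  # dictionary initialization
    next_code = 256
    current = ""  # the string matched so far (the "prefix" column)
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate              # step 1: extend the string
        else:
            output.append(table[current])    # step 2a: emit last known string
            table[candidate] = next_code     # step 2b: add new string
            next_code += 1
            current = ch                     # step 2c: restart with this char
    if current:
        output.append(table[current])        # flush the final string
    return output


print(lzw_encode("ababcba"))  # [97, 98, 256, 99, 257]
```

For the input "ababcba" the output matches the Output column of the example table.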


statistical coding

• characters encoded with different # bits
• frequent characters with few bits, infrequent characters with more bits

a) Huffman coding
requires probability of occurrence for each character
minimizes # bits for average message

varying # bits for different characters
→ prefix property necessary (decoding without backtracking)

Huffman code example

byte  prob.  code
A     0.40   00
B     0.20   01
C     0.20   10
D     0.10   110
E     0.10   111

[Figure: binary code tree with edges labelled 0 and 1 and leaves A, B, C, D, E]

avg. # bits/character: 2.2

theoretical optimum:

$$H = \sum_i p_i \cdot \operatorname{ld} \frac{1}{p_i} = 2.12$$

algorithm for code development:

• order characters by decreasing probabilities
• repeat
  – select 2 lines with lowest probabilities
  – assign bit for distinction
  – join lines, form new line with sum of probabilities
  until 1 line left
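A minimal Python sketch of this construction follows, using a priority queue in place of the sorted list of lines. The helper names (`huffman_code`, `average_length`, `entropy`) are assumptions, and the sketch expects at least two symbols. Because ties between equal probabilities can be broken in different ways, the individual code words may differ from a hand-built table, but the average code length is the same for every Huffman code of a given distribution.

```python
import heapq
import itertools
import math


def huffman_code(probs: dict[str, float]) -> dict[str, str]:
    """Build a prefix code by repeatedly joining the two least probable entries."""
    counter = itertools.count()  # tie-breaker so dicts are never compared
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Assign one bit for distinction and join the two entries.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]


def average_length(probs: dict[str, float], code: dict[str, str]) -> float:
    """Expected number of bits per character."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs: dict[str, float]) -> float:
    """Theoretical optimum H = sum of p * log2(1/p)."""
    return sum(p * math.log2(1 / p) for p in probs.values())


probs = {"A": 0.40, "B": 0.20, "C": 0.20, "D": 0.10, "E": 0.10}
code = huffman_code(probs)
print(code, average_length(probs, code), entropy(probs))
```

For the five-symbol example the average length evaluates to 2.2 bits per character, against an entropy of about 2.12.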


E 0.13    T 0.09    A 0.08    O 0.08    N 0.07    R 0.065   I 0.065
H 0.06    S 0.06    D 0.04    L 0.035   C 0.03    U 0.03    M 0.03
F 0.02    P 0.02    Y 0.02    B 0.015   W 0.015   G 0.015   V 0.010
J 0.005   K 0.005   X 0.005   Q 0.0025  Z 0.0025

[Figure: Huffman tree built from these letter probabilities; inner nodes carry the summed probabilities, up to 1.0 at the root]

b) arithmetic coding

• optimum coding (like Huffman),
  but assigns fractions of bits to single characters
• encodes a character by considering the preceding characters

idea:
assign each symbol a unique interval ⊂ [0, 1]
(width = character probability)

character string = nesting of intervals
resulting interval represented as floating point number

code definition:

• fix symbol order
• assign disjoint ranges [l[s], h[s]) of [0, 1] to symbols s,
  width h[s] − l[s] = character probability

encoding of string s1, ..., sn:

b = l[s1]
t = h[s1]
for i = 2 to n do
    r = t − b
    t = b + r · h[si]
    b = b + r · l[si]

• output: arbitrary floating point number ∈ [b, t)

example for arithmetic coding

byte  prob.  range
A     0.40   [0.0, 0.4)
B     0.20   [0.4, 0.6)
C     0.20   [0.6, 0.8)
D     0.10   [0.8, 0.9)
E     0.10   [0.9, 1.0)
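The interval nesting can be tried out with the ranges of this table. The sketch below is an illustration under assumptions: it uses exact fractions instead of floating point numbers to avoid rounding problems, the decoder is told the message length, and the names `RANGES`, `encode_interval`, and `decode` are hypothetical. Each step maps the symbol's range [l, h) into the current interval [b, t).

```python
from fractions import Fraction

# Symbol ranges [l, h) taken from the example table.
RANGES = {
    "A": (Fraction("0.0"), Fraction("0.4")),
    "B": (Fraction("0.4"), Fraction("0.6")),
    "C": (Fraction("0.6"), Fraction("0.8")),
    "D": (Fraction("0.8"), Fraction("0.9")),
    "E": (Fraction("0.9"), Fraction("1.0")),
}


def encode_interval(text: str) -> tuple[Fraction, Fraction]:
    """Return the final interval [b, t) for a non-empty string."""
    b, t = RANGES[text[0]]
    for sym in text[1:]:
        r = t - b                      # width of the current interval
        low, high = RANGES[sym]
        t = b + r * high               # new top, computed from the old bottom
        b = b + r * low                # new bottom
    return b, t


def decode(value: Fraction, length: int) -> str:
    """Recover `length` symbols from any number inside the final interval."""
    out = []
    for _ in range(length):
        for sym, (low, high) in RANGES.items():
            if low <= value < high:
                out.append(sym)
                value = (value - low) / (high - low)  # zoom into the range
                break
    return "".join(out)


b, t = encode_interval("BAD")
print(float(b), float(t))          # 0.464 0.472
print(decode((b + t) / 2, 3))      # BAD
```

Any number in [0.464, 0.472) identifies the string "BAD" once the length is known.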


transformation coding

• transforms values into a different mathematical space
  (which is better suited for coding)
• examples:
  discrete cosine transform (DCT)
  fast Fourier transform (FFT)

prediction / relative coding
encodes only differences between subsequent bytes/blocks
examples:

• integers
  5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60
  encode differences:
  5, 3, 4, 1, 2, 3, 5, 5, 1, 11, 20
  → smaller # bits/entry required
• images:
  homogeneous area → small differences between neighbouring pixels
  → many 0 differences → zero suppression/run-length encoding
• still video:
  small differences between subsequent images (e.g. in background)
• audio:
  differential pulse code modulation (DPCM):
  encoding of differences between subsequent PCM values
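The integer example can be reproduced with a few lines of Python. This is a minimal sketch of plain difference coding; the function names are assumptions, and a real DPCM coder would additionally quantize the differences.

```python
from itertools import accumulate


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Running sums restore the original sequence."""
    return list(accumulate(deltas))


series = [5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60]
print(delta_encode(series))  # [5, 3, 4, 1, 2, 3, 5, 5, 1, 11, 20]
```

The largest encoded value is 20 instead of 60, so fewer bits per entry suffice.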


adaptive coding

• other coding methods:
  suitable only in typical contexts
  non-typical byte sequence → no compression
• adaptive methods
  – adapt to specific context
  – but require additional transmission of coding parameters

2.3.2.2 Lossy coding methods

vector quantization
divides byte stream into blocks of n bytes
uses table with patterns,
block approximated by pattern
block encoded as index in pattern table

• linear quantization
• logarithmic quantization
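A small Python sketch may make the block-to-index mapping concrete. Everything specific here is an assumption for illustration: the pattern table is a hand-picked toy codebook, the closest pattern is chosen by squared Euclidean distance, and the input length is a multiple of the block size n.

```python
def vq_encode(data: list[int], codebook: list[tuple[int, ...]]) -> list[int]:
    """Replace each block of n values by the index of the closest pattern."""
    n = len(codebook[0])
    indices = []
    for start in range(0, len(data), n):
        block = data[start:start + n]
        best = min(
            range(len(codebook)),
            key=lambda k: sum((x - y) ** 2 for x, y in zip(block, codebook[k])),
        )
        indices.append(best)
    return indices


def vq_decode(indices: list[int], codebook: list[tuple[int, ...]]) -> list[int]:
    """Look the patterns up again; the result only approximates the input."""
    return [value for k in indices for value in codebook[k]]


codebook = [(0, 0), (10, 10), (20, 0)]   # hypothetical pattern table, n = 2
data = [1, 0, 9, 11, 19, 2]
indices = vq_encode(data, codebook)
print(indices)                       # [0, 1, 2]
print(vq_decode(indices, codebook))  # [0, 0, 10, 10, 20, 0]
```

The decoded values differ from the input, which is what makes the method lossy.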

subband coding

• transformation of certain frequencies only
• quality criterion: # bands
• used for speech, MPEG audio

wavelets

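wavelet functions:

• orthogonal basis of functions
• squared errors sum up

Haar basis:

$$e(x) = \alpha_0 + \sum_{i=1}^{k} \sum_{j=1}^{2^{i-1}} \alpha_{ij} \cdot w_{ij}(x)$$

$$w_{ij}(x) = \begin{cases} 1, & \text{if } \frac{2j-2}{2^i} \le x < \frac{2j-1}{2^i} \\ -1, & \text{if } \frac{2j-1}{2^i} \le x < \frac{2j}{2^i} \\ 0, & \text{otherwise} \end{cases}$$

derivation of e(x) for example function:

values:                 9  7  2  6
averages, differences:  8  4  1  -2
average, difference:    6  2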

e(x) = 6 + 2w11(x) + 1w21(x) − 2w22(x)
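The derivation can be reproduced by repeated pairwise averaging and differencing. The Python sketch below is an illustration with assumed function names; it expects an input whose length is a power of two and returns the overall average α0 together with the detail coefficients ordered by level (α11, α21, α22, ...).

```python
def haar_decompose(values: list[float]) -> tuple[float, list[float]]:
    """Return (alpha_0, [alpha_11, alpha_21, alpha_22, ...]) for the input."""
    details: list[float] = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        differences = [(a - b) / 2 for a, b in pairs]
        details = differences + details   # coarser levels go to the front
        current = averages
    return current[0], details


def haar_reconstruct(alpha0: float, details: list[float]) -> list[float]:
    """Invert haar_decompose level by level."""
    current = [alpha0]
    pos = 0
    while pos < len(details):
        level = details[pos:pos + len(current)]
        pos += len(level)
        # average + difference gives the left value, average - difference the right
        current = [v for a, d in zip(current, level) for v in (a + d, a - d)]
    return current


print(haar_decompose([9, 7, 2, 6]))        # (6.0, [2.0, 1.0, -2.0])
print(haar_reconstruct(6, [2, 1, -2]))     # [9, 7, 2, 6]
```

The result corresponds to the coefficients 6, 2, 1, and −2 in the expression for e(x).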

[Figure: example function and Haar basis functions w0, w11, w21, w22]

task

approximate f(x) by f′(x) such that

$$\|f(x) - f'(x)\| < \varepsilon$$

$$\sum_x \left(f(x) - f'(x)\right)^2 < \varepsilon$$

where f′(x) is a wavelet function and

$$|\{\alpha_{ij} \mid \alpha_{ij} \neq 0\}| = \min$$

solution

sort coefficients by $|\alpha_{ij}| \cdot 2^{-i/2}$
(gives order of increasing squared error)

find maximum n such that setting the first n $\alpha_{ij} = 0$ yields

$$\|f(x) - f'(x)\| < \varepsilon$$

example:

e(x) = 6 + 2w11(x) + 1w21(x) − 2w22(x)

coefficients:

(6, α0), (2, α11), (+1, α21), (−2, α22)

sorted by increasing squared error:

(1/2, α21), (2/2, α22), (2/√2, α11), (6, α0)
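The selection of coefficients to drop can be sketched as a greedy loop over this sorted order. The sketch rests on an assumption that is not spelled out on the slide: for a function sampled at n points, dropping the coefficient α_ij adds α_ij² · n · 2^(1−i) to the sum of squared errors (w_ij is ±1 on n · 2^(1−i) samples and 0 elsewhere), and these contributions add up because the basis is orthogonal. The function name and the dictionary layout are hypothetical.

```python
def drop_coefficients(
    coeffs: dict[tuple[int, int], float], n_samples: int, eps: float
) -> dict[tuple[int, int], float]:
    """Zero as many detail coefficients as possible while squared error < eps.

    coeffs maps (i, j) to alpha_ij; the returned dict holds the kept ones.
    """
    # Sort by |alpha_ij| * 2^(-i/2), i.e. by increasing squared error.
    ranked = sorted(coeffs.items(), key=lambda kv: abs(kv[1]) * 2 ** (-kv[0][0] / 2))
    kept = dict(coeffs)
    error = 0.0
    for (i, j), alpha in ranked:
        cost = alpha ** 2 * n_samples * 2 ** (1 - i)  # assumed error contribution
        if error + cost >= eps:
            break
        error += cost
        del kept[(i, j)]
    return kept


coeffs = {(1, 1): 2, (2, 1): 1, (2, 2): -2}   # detail coefficients of the example
print(drop_coefficients(coeffs, n_samples=4, eps=12))  # {(1, 1): 2}
```

With ε = 12, the coefficients α21 and α22 are dropped (squared error 2 + 8 = 10), while dropping α11 as well would exceed the bound.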


Example:

non-Haar basis - squared errors do not sum up

$$l \cdot a^2, \qquad \frac{l}{2} b^2, \qquad \frac{l}{2} a^2 + \frac{l}{2}\left(a^2 + b^2 + 2ab\right)$$

Haar basis - squared errors sum up

$$l \cdot a^2, \qquad \frac{l}{2} b^2$$

$$\frac{l}{2} a^2 + \frac{l}{4}\left((a + b)^2 + (a - b)^2\right) = \frac{l}{2} a^2 + \frac{l}{4}\left(a^2 + 2ab + b^2 + a^2 - 2ab + b^2\right) = l \cdot a^2 + \frac{l}{2} b^2$$

2-dimensional case

standard wavelet decomposition

[Figure: standard decomposition with the steps "transform rows" and "transform columns"]

nonstandard wavelet decomposition

[Figure: nonstandard decomposition with the steps "transform rows" and "transform columns"]

example wavelet compression

[Figure: four panels (a) to (d)]
a) original image
b) 19% of coefficients (5% error)
c) 3% of coefficients (10% error)
d) 1% of coefficients (15% error)

2.4 Text

2.4.1 Media type

Non-temporal: Text

• Representation
  – ASCII, ISO character sets
  – Marked-up, structured text
  – Hypertext
• Operations
  – Operations: character, string, language-specific
  – Editing, formatting
  – Pattern-matching and searching
  – Sorting
  – Compression
  – Encryption

2.4.2 SGML

markup language for text,
worldwide standard

markup approaches:

1. punctuation
2. layout (WYSIWYG)
3. procedural (Troff, TeX, LaTeX)
4. descriptive (GML, SGML)
5. referential (embed, include; SGML)
6. meta-markup

SGML standards

• SGML = ISO 8879, Standard Generalized Markup Language
• related standards:
  – ISO 10179: DSSSL, Document Style Semantics and Specification Language
    (layout specification language for SGML documents)
  – ISO 8613: ODA, Office Document Architecture
    (formatting, presentation, exchange)
    ODML: SGML-DTD for ODA documents

properties of SGML

SGML is

• markup language, database language
• extensible document description language
• meta language for the definition of document types

SGML supports

• logical structures, hierarchies
• linking and addressing of files
• multimedia and hypertext

Processing of SGML documents

[Figure: SGML parser with input documents Doc1, Doc2, Doc3 and DTD 1, DTD 2; style specifications labelled DSSL1 and DSSL2; outputs: formatted documents and displayed documents]

• syntax checking (according to a DTD)
• printing according to a DSSSL specification
• presentation on a screen (according to a DSSSL specification)
• indexing for context-oriented search
• transformation into other representations

SGML markup

SGML supports 4 types of markup:

1. descriptive: tags
2. referential: references to objects
3. meta markup: markup declarations (DTD)
4. procedural: LINK, CONCUR

Descriptive markup

• SGML document consists of elements
  <author><first>John</first><last>Smith</last></author>
• element:
  1. start tag
  2. content
  3. end tag
• content: defined by content model (grammar production)
  – text (#PCDATA) or
  – sequence of elements
  → nesting of elements
• top level element: document
• start tag may have attributes (attribute-value pairs)
• document syntax defined in DTD (document type definition)

Example DTD

<!ELEMENT article - - (title, abstract, section+)>

<!ELEMENT title - - (#PCDATA)>
<!ELEMENT abstract - o (#PCDATA)>
<!ELEMENT section - o ((title, body+) | (title, body*, subsectn+))>
<!ELEMENT subsectn - o (title, body+)>
<!ELEMENT body - o (figure | paragr)>
<!ELEMENT figure - o EMPTY>
<!ELEMENT paragr - o (#PCDATA)>

<!ATTLIST article author NAMES #REQUIRED
                  status (final | draft) draft #REQUIRED>
<!ATTLIST figure file ENTITY #IMPLIED>

<!ENTITY file SYSTEM "/tmp/picture.ps" NDATA>
<!ENTITY amp "&">

Example document

<article status = "draft"
         author = "Cluet Christophides">

<title>From Structured Documents to ...</title>
<abstract>Structured Documents (e.g. SGML) can
benefit from...

<section>
<title>Introduction</title>
<body><paragr>This Paper is organized as follows....
</body></section>

<section>
<title>SGML preliminaries</title>
<body><figure>

</article>

DTD syntax

element:
<!ELEMENT element-name omitstart omitend production>

attribute list for elements:
<!ATTLIST element-name attribute-name domain default>

entities: (macro mechanism)
<!ENTITY ename value>
referencing: &ename;

DTDs

• define a class of documents
• specialize SGML for documents of a class
• contain an attribute grammar
• contain a nesting grammar
• support hierarchies by means of nesting

<!ELEMENT HTML O O HEAD, BODY --HTML document-->
<!ELEMENT HEAD O O TITLE>
<!ELEMENT TITLE - - #PCDATA>
<!ELEMENT BODY O O %content>
<!ENTITY % content
    "(%heading | %htext | %block | HR)*">
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % htext "A | %text" --hypertext-->
<!ENTITY % text "#PCDATA | IMG | BR">
<!ELEMENT IMG - O EMPTY --Embed. image-->
<!ELEMENT BR - O EMPTY>
<!ENTITY % block "P | PRE">
<!ELEMENT P - O (%htext)+ --paragraph-->
<!ELEMENT PRE - - (%pre.content)+ --preform.-->
<!ENTITY % pre.content "#PCDATA | A">
<!ELEMENT A - - (%text)+ --anchor-->
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ATTLIST A
    NAME CDATA #IMPLIED
    HREF CDATA #IMPLIED --link-->
<!ATTLIST IMG
    SRC CDATA #REQUIRED --URL of img--
    ALT CDATA #REQUIRED
    ALIGN (top|middle|bottom) #IMPLIED
    ISMAP (ISMAP) #IMPLIED>

HTML

• is an SGML document class (DTD)
• mixture of logical and layout tags
• no fixed DSSSL style sheet
  no possibility for transmission of style sheets

consequences:

• HTML is less flexible than SGML
• only minimum logical structuring possible (makes retrieval difficult)
• layout can be controlled only partially by document provider

2.4.2.1 DSSSL

language for describing layout of SGML documents

1. expression language (subset of Scheme)
2. style language for formatting
3. query language for retrieving document parts

[Figure: DSSSL processing model. Labels: SGML Document; DSSSL Specification (STTP-SPEC, STFP-SPEC); SGML Decls & DTDs; Source Document Tree; Transformation Process (STTP); STTP Output Document Tree; SGML Document; Formatting Process (STFP); Output of Formatter; SPDL or proprietary form]

formatting

• input: SGML document + DTD + DSSSL stylesheet
• output: formatted document (format depends on processor)
  (e.g. TeX, RTF, Postscript, PDF)

formatting process:

• recursive processing of document according to DSSSL specification
• output: tree of flow objects
  flow object classes defined in DSSSL standard
  (e.g. page-sequence, paragraph, sequence)

example document

<!DOCTYPE FAQ SYSTEM "FAQ.DTD">
<FAQ>
<INFO>
  <SUBJECT> XML </SUBJECT>
  <AUTHOR> Lars Marius Garshol</AUTHOR>
  <EMAIL> [email protected] </EMAIL>
  <VERSION> 1.0 </VERSION>
  <DATE> 20.jun.97 </DATE>
</INFO>

<PART NO="1">
<Q NO="1">
  <QTEXT>What is XML?</QTEXT>
  <A>SGML light.</A>
</Q>

<Q NO="2">
  <QTEXT>What can I use it for?</QTEXT>
  <A>Anything.</A>
</Q>

</PART>
</FAQ>

DTD:

<!ELEMENT FAQ (INFO, PART+)>

<!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?,
                VERSION?, DATE?)>

<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>

<!ELEMENT PART (Q+)>
<!ELEMENT Q (QTEXT, A)>

<!ELEMENT QTEXT (#PCDATA)>
<!ELEMENT A (#PCDATA)>

<!ATTLIST PART NO CDATA #IMPLIED
               TITLE CDATA #IMPLIED>

<!ATTLIST Q NO CDATA #IMPLIED>

style sheet

<!doctype style-sheet PUBLIC "-//James Clark//">

;--- DSSSL stylesheet for FAQML

;---Constants

(define *font-size* 12pt)
(define *font* "Times New Roman")

;---Element styles

(element FAQ
  (make simple-page-sequence
    font-family-name: *font*
    input-whitespace-treatment: 'collapse
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (process-children)))

(element INFO
  (make paragraph
    quadding: 'center
    space-after: (* *font-size* 1.5)
    (process-children)))

(element SUBJECT
  (make paragraph
    font-size: (* *font-size* 2)
    line-spacing: (* *font-size* 2)
    space-after: (* *font-size* 2)
    (process-children)))

(element AUTHOR
  (make sequence
    (process-children)
    (literal ", ")))

(element VERSION
  (make paragraph
    (make sequence
      (literal "Version: "))
    (process-children)))

(element DATE
  (make paragraph
    (make sequence
      (literal "Last modified: "))
    (process-children)))

(element PART
  (make paragraph
    font-size: (* *font-size* 1.5)
    line-spacing: (* *font-size* 2)
    (make sequence
      (literal (attribute-string "NO"
                 (current-node)))
      (literal ". ")
      (literal (attribute-string "TITLE"
                 (current-node))))
    (process-children)))

(element QTEXT
  (make paragraph
    font-weight: 'bold
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (make sequence
      (literal (attribute-string "NO"
                 (parent (current-node))))
      (literal ". "))
    (process-children)))

(element A
  (make paragraph
    space-after: (* *font-size* 0.66667)
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (process-children)))

2.4.3 XML

weaknesses of HTML

• mixture of logical and layout markup:
  – logical: TITLE, H1, MENU, P
  – layout: I, B; FONT, CENTER, BGCOLOR attributes
• lack of markup facilities for specific texts (e.g. math, chemistry)
• little internal structure of elements

XML vs. SGML

• complexity of SGML implementations
  → XML is simplified version of SGML
• weak support for different character sets in SGML
  → XML is based on Unicode
• SGML document not understandable without DTD

XML Standard

• markup language: XML
• linking language: XLink, XPointer
• formatting language: XSL/XSLT

2.4.3.1 XML language

simplification of SGML:

• start tag and end tag always must be present
• special form: combined start-end tag:
  e.g. <br/>, <img src="icon.gif"/>
• DTD not always required:
  well-formed XML: syntactically correct XML
  valid XML: XML document satisfies specified DTD
• element names: case matters, underscore and colon in names allowed
• many special cases from SGML forbidden
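The difference between SGML-style tag omission and XML well-formedness can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an assumption. It tests well-formedness only, since ElementTree does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the string parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<p>text<br/><img src="icon.gif"/></p>'))  # True
print(is_well_formed("<p>text<br></p>"))   # False: <br> has no end tag
print(is_well_formed("<P>text</p>"))       # False: element names are case-sensitive
```

The second document would be acceptable under an SGML DTD that allows the end tag of BR to be omitted, but it is not well-formed XML.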


DTD

<!ENTITY % xhtml SYSTEM "xhtml-1.0-strict.dtd" >

%xhtml;

<!ELEMENT project (projecttitle,

shortdesc,

logo*,

fieldofoperation,

timeperiod?,

contactpersons,

involvedpersons?,

sponsoredby?,

participatinginstitutes?,

description,

publicationlist?,

notes?,

doccreator) >

<!ELEMENT projecttitle (langtext+) >

<!ATTLIST projecttitle state (work|closed) "closed">

<!ATTLIST projecttitle workgroup (ir|issi) #REQUIRED>

<!ELEMENT shortdesc (langtext+) >

<!ELEMENT logo (#PCDATA) > <!-- image file name -->

<!ATTLIST logo align (left|right) #IMPLIED

width %Length; #IMPLIED

height %Length; #IMPLIED >

<!ELEMENT referenceno (#PCDATA) >

<!ELEMENT fieldofoperation (langtext+) >

<!ELEMENT sponsoredby (sponsor+) >

<!ELEMENT sponsor (langtext+ | weblink) >

<!ELEMENT timeperiod (langtext+ |

(startdate, enddate)) >

<!ELEMENT startdate (day, month, year) >


<!ELEMENT enddate (day, month, year) >

<!ELEMENT day (#PCDATA) >

<!ELEMENT month (#PCDATA) >

<!ELEMENT year (#PCDATA) >

<!ELEMENT contactpersons (personnel+) >

<!ELEMENT involvedpersons (personnel+) >

<!ELEMENT personnel (langtext+) >

<!ELEMENT participatinginstitutes (institute+) >

<!ELEMENT institute (langtext+ | weblink) >

<!ELEMENT description (langflow+) >

<!ELEMENT publicationlist (publication+) >

<!ELEMENT publication (langtext+) >

<!ELEMENT notes (langflow+) >

<!ELEMENT doccreator EMPTY>

<!ELEMENT weblink (url, linkdescription, langtext*)>

<!ELEMENT url (#PCDATA) >

<!ELEMENT linkdescription (langtext+) >

<!ELEMENT langtext %Inline; >

<!ELEMENT langflow %Flow; >


Example document

<?xml version="1.0" encoding="ISO-8859-1" ?>

<!DOCTYPE project SYSTEM

"/services/www/xml/dtd/project.dtd">

<project>

<projecttitle state="work" workgroup="ir">

<langtext>

MIND

</langtext>

</projecttitle>

<shortdesc>

<langtext>

Resource Selection and Data Fusion for

Multimedia International Digital Libraries

</langtext>

</shortdesc>

<logo align="right">mast2_sm.gif</logo>

<fieldofoperation>

<langtext>Information Retrieval</langtext>

</fieldofoperation>

<timeperiod>

<startdate>

<day>01</day>

<month>02</month>

<year>2001</year>

</startdate>

<enddate>

<day>31</day>

<month>07</month>

<year>2003</year>


</enddate>

</timeperiod>

<contactpersons>

<personnel>

<langtext>

<a href="/staff/members/nottelma.html">

Dipl.-Inform. Henrik Nottelmann</a>

</langtext>

</personnel>

</contactpersons>

<sponsoredby>

<sponsor>

<langtext>EU FP5</langtext>

</sponsor>

</sponsoredby>

<participatinginstitutes>

<institute>

<weblink>

<url>http://www.strath.ac.uk/</url>

<linkdescription>

<langtext>

University of Strathclyde

</langtext>

</linkdescription>

</weblink>

</institute>

<institute>

<weblink>

<url>http://ls6-www.informatik.uni-dortmund.de

</url>

<linkdescription> <langtext>

University of Dortmund

</langtext>


</linkdescription>

</weblink>

</institute>

</participatinginstitutes>

<description>

<langflow>

<p> This research addresses problems associated

with the emergence of thousands of

heterogeneous multimedia Digital libraries...

</p>

</langflow>

</description>

<notes>

<langflow >

<ul>

<li><a href="internal/index.html">

Internal pages</a></li>

</ul>

</langflow>

</notes>

<publicationlist>

<publication>

<langtext>

<a href="overview/mind-overview.html">

MIND Overview slides

</a>

</langtext>

</publication>

</publicationlist>

<doccreator/>

</project>


XSLT

transformation of XML documents
(e.g. from XML into HTML)

similar to DSSSL, but in XML syntax

XSLT stylesheet = frame + set of transformation rules

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns="http://www.w3.org/TR/REC-html40">

<xsl:output method="html"/>

<xsl:template match="...">

...

</xsl:template>

...

</xsl:stylesheet>


Some XSLT elements

<xsl:template>
  specifies a template rule
  match attribute identifies source node(s) to which rule applies

<xsl:if>
  test attribute specifies an expression:
  if true, content template is instantiated

<xsl:choose>
  selects one among a number of possible alternative child elements
  <xsl:when> and <xsl:otherwise>

<xsl:when>
  if expression specified by test attribute is true, content template is instantiated

<xsl:text>
  contains literal data to be included in the output

A small example

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE brief SYSTEM "brief.dtd">

<brief>

<anrede geschlecht="f" sozial="du">Nora</anrede>

<text>habe gerade den Ulysses beendet. Mal sehen,

wann der in den USA gedruckt werden darf...</text>

<gruss>J</gruss>

</brief>


Stylesheet

<xsl:template match="/">

<html>

<body>

<xsl:apply-templates/>

</body>

</html>

</xsl:template>

<xsl:template match="anrede">

<p>

<xsl:choose>

<xsl:when test="@sozial='du'">

<xsl:text>Liebe</xsl:text>

<xsl:if test="@geschlecht='m'">

<xsl:text>r</xsl:text>

</xsl:if>

<xsl:text> </xsl:text>

</xsl:when>

<xsl:when test="@sozial='sie'">

<xsl:choose>

<xsl:when test="@geschlecht='m'">

<xsl:text>Sehr geehrter Herr </xsl:text>

</xsl:when>

<xsl:when test="@geschlecht='f'">

<xsl:text>Sehr geehrte Frau </xsl:text>

</xsl:when>

</xsl:choose>

</xsl:when>

</xsl:choose>


<xsl:apply-templates/>

<xsl:text>,</xsl:text>

</p>

</xsl:template>

<xsl:template match="text | gruss">

<p>

<xsl:apply-templates/>

</p>

</xsl:template>


XSL stylesheet for project page

<?xml version="1.0" encoding="ISO-8859-1" ?>

<xsl:stylesheet

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns="http://www.w3.org/TR/REC-html40"

result-ns=""

version="1.0"

default-space="strip"

indent-result="yes">

<xsl:output method="html" encoding="iso-8859-1"/>

<xsl:param name="mailto"/>

<xsl:param name="fullname"/>

<xsl:param name="date"/>

<xsl:param name="lang"/>

<xsl:param name="url"/>

<xsl:include href="ls6common.xsl"/>

<xsl:template match="/">

<html>

<head>

<xsl:apply-templates

select="/project/projecttitle" mode="head"/>

<meta name="description">

<xsl:attribute name="content">

University of Dortmund,

Department of Computer Science (Chair VI):

<xsl:value-of

select="/project/projecttitle/langtext"/>,

<xsl:value-of

select="/project/shortdesc/langtext"/>


</xsl:attribute>

</meta>

</head>

<body bgcolor="white">

<xsl:apply-templates/>

</body>

</html>

</xsl:template>

<xsl:template match="project">

<xsl:if test="//projecttitle[@workgroup='ir']">

<xsl:call-template name="navbar-top">

<xsl:with-param name="upurl">

/ir/projects.html.en</xsl:with-param>

<xsl:with-param name="upname">

IR Projects</xsl:with-param>

</xsl:call-template>

</xsl:if>

<xsl:if test="//projecttitle[@workgroup='issi']">

<xsl:call-template name="navbar-top">

<xsl:with-param name="upurl">

/issi/projects.html.en</xsl:with-param>

<xsl:with-param name="upname">

ISSI Projects</xsl:with-param>

</xsl:call-template>

</xsl:if>

<xsl:apply-templates/>

</xsl:template>

<xsl:template match="projecttitle" mode="head">

<title>

<xsl:apply-templates select="langtext" mode="head"/>

</title>

</xsl:template>

<xsl:template match="projecttitle">

<h1>

<xsl:apply-templates select="langtext"/>

</h1>

<xsl:call-template name="hrule"/>

</xsl:template>

<xsl:template match="shortdesc">

<em><xsl:apply-templates/></em>

<br/>

</xsl:template>

<xsl:template match="logo">

<img src="{.}">

<xsl:if test="@width">

<xsl:attribute name="width">

<xsl:value-of select="@width"/></xsl:attribute>

</xsl:if>

<xsl:if test="@height">

<xsl:attribute name="height">

<xsl:value-of select="@height"/></xsl:attribute>

</xsl:if>

<xsl:if test="@align">

<xsl:attribute name="align">

<xsl:value-of select="@align"/></xsl:attribute>

</xsl:if>

</img>

</xsl:template>

<xsl:template match="referenceno">

<p> <h3>Reference Number</h3>

<xsl:apply-templates/>

</p>

</xsl:template>

<xsl:template match="fieldofoperation">

<p> <h3>Field of operation</h3>

<xsl:apply-templates/>

</p>

</xsl:template>

<xsl:template match="timeperiod">

<p> <h3>Project Duration</h3>

From <xsl:apply-templates select="startdate"/>

until <xsl:apply-templates select="enddate"/>

</p>

</xsl:template>

<xsl:template match="startdate|enddate">

<xsl:apply-templates select="day"/>.

<xsl:apply-templates select="month"/>.

<xsl:apply-templates select="year"/>

</xsl:template>

<xsl:template match="day|month|year">

<xsl:apply-templates/>

</xsl:template>

<xsl:template match="contactpersons">

<p> <h3>Contact Persons</h3>

<ul>

<xsl:apply-templates/>

</ul>

</p>

</xsl:template>

<xsl:template match="involvedpersons">

<p> <h3>Involved Persons</h3>

<ul>

<xsl:apply-templates/>

</ul>

</p>

</xsl:template>

<xsl:template match="sponsoredby">

<p> <h3>Sponsored by</h3>

<ul>

<xsl:apply-templates/>

</ul>

</p>

</xsl:template>

<xsl:template match="publicationlist">

<p> <h3>Publications</h3>

<ul>

<xsl:apply-templates/>

</ul>

</p>

</xsl:template>

<xsl:template match="publication">

<li><xsl:apply-templates/></li>

</xsl:template>

<xsl:template match="sponsor">

<li><xsl:apply-templates/></li>

</xsl:template>

<xsl:template match="participatinginstitutes">

<p> <h3>Participating Institutes</h3>

<ul>

<xsl:apply-templates/>

</ul>

</p>

</xsl:template>

<xsl:template match="institute">

<li><xsl:apply-templates/></li>

</xsl:template>

<xsl:template match="description">

<h3>Description</h3>

<xsl:apply-templates/>

</xsl:template>

<xsl:template match="notes">

<h3>Notes</h3>

<xsl:apply-templates/>

</xsl:template>

<xsl:template match="linkdescription">

<xsl:apply-templates/>

</xsl:template>

<xsl:template

match="a[@href]|A[@HREF]|a[@name]|A[@NAME]|A[@href]">

<xsl:if test="@href">

<a href="{@href}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@HREF">

<a href="{@HREF}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@name">

<a name="{@name}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@NAME">

<a name="{@NAME}"><xsl:apply-templates/></a>

</xsl:if>

</xsl:template>

<xsl:template match="personnel">

<li><xsl:apply-templates/></li>

</xsl:template>

</xsl:stylesheet>

HTML output

<html xmlns="http://www.w3.org/TR/REC-html40">

<head>

<META http-equiv="Content-Type"

content="text/html; charset=iso-8859-1">

<title> MIND </title>

<meta name="description"

content="University of Dortmund,

Department of Computer Science (Chair VI): MIND,

Resource Selection and Data Fusion for Multimedia

International Digital Libraries ">

</head>

<body bgcolor="white">

<table width="100%">

<tr>

<td width="10%"></td><td width="80%" align="center">

[<a href="/ir/projects.html.en">IR Projects</a>]

[<a href="/ir/index.html.en">IR</a>]

[<a href="/issi/index.html.en">IS and Security</a>]

</td><td width="10%" align="right">

<a href="index.html.de">(deutsch)</a></td>

</tr>

</table>

<h1> MIND </h1>

<hr noshade size="2" width="100%">

<em>Resource Selection and Data Fusion for

Multimedia International Digital Libraries</em><br>

<img src="mast2_sm.gif" align="right">

<p> <h3>Field of operation</h3>

Information Retrieval </p>

<p> <h3>Project Duration</h3>

From 01. 02. 2001 until 31. 07. 2003</p>

<p> <h3>Contact Persons</h3>

<ul> <li>

<a href="/staff/members/nottelma.html">

Dipl.-Inform. Henrik Nottelmann</a> </li>

</ul> </p>

<p><h3>Sponsored by</h3><ul><li>EU FP5</li></ul></p>

<p> <h3>Participating Institutes</h3>

<ul><li><a href="http://www.strath.ac.uk/">

University of Strathclyde </a> </li>

<li> <a

href="http://ls6-www.informatik.uni-dortmund.de">

University of Dortmund </a> </li>

</ul> </p>

<h3>Description</h3>

<p xmlns="">

This research addresses problems associated with

the emergence of thousands of heterogeneous

multimedia Digital libraries ... </p>

<h3>Notes</h3>

<ul xmlns="">

<li>

<a href="internal/index.html"

xmlns="http://www.w3.org/TR/REC-html40">

Internal pages</a>

</li>

</ul>

<p>

<h3>Publications</h3>

<ul>

<li>

<a href="overview/mind-overview.html">

MIND Overview slides </a>

</li>

</ul>

</p>

<hr noshade size="2" width="100%">

<address>

<a href="mailto:[email protected]">

Henrik Nottelmann</a>

&lt;[email protected]&gt;,

20. March 2001</address>

</body>

</html>

2.4.3.2 XLink: XML linking language

linking possible in any XML DTD → no special linking elements

linking via special attribute (for arbitrary elements): xml:link

terminology:

resource: addressable service or unit of information that participates in a link

link: explicit relationship between two or more resources

locator: data, provided as part of a link, which identifies a resource (attribute HREF)

inline link: link which serves as one of its own resources, e.g. A in HTML

out-of-line link: link whose content does not serve as one of the link's resources

Simple links

• one-directional

• mostly inline

<mylink xml:link="simple" title="Citation"

href="http://www.xyz.com/xml/foo.xml"

show="new" content-role="Reference">

as discussed in Smith(1997)</mylink>

<!ELEMENT mylink (#PCDATA)>

<!ATTLIST mylink

xml:link CDATA #FIXED "simple"

href CDATA #REQUIRED

content-role CDATA #IMPLIED

>

Extended links

usually out-of-line links

capabilities:

• enable outgoing links in read-only documents

• create links to and from resources in other formats

• apply and filter sets of relevant links on demand

• enable other advanced hypermedia capabilities (e.g. via attribute ROLE)

example of an out-of-line extended link:

<commentary xml:link="extended" inline="false">

<locator href="smith2.1" role="Essay"/>

<locator href="jones1.4" role="Rebuttal"/>

<locator href="robin3.2" role="Comparison"/>

</commentary>

definitions:

<!ELEMENT extended ANY>

<!ATTLIST extended

xml:link CDATA #FIXED "extended"

%link-semantics.att;

%local-resource-semantics.att;

>

<!ELEMENT locator ANY>

<!ATTLIST locator

xml:link CDATA #FIXED "locator"

%locator.att;

%remote-resource-semantics.att;

>

<!ENTITY % locator.att

"href CDATA #REQUIRED"

>

<!ENTITY % link-semantics.att

"inline (true|false) 'true'

role CDATA #IMPLIED"

>

<!ENTITY % local-resource-semantics.att

"content-role CDATA #IMPLIED

content-title CDATA #IMPLIED"

>

<!ENTITY % remote-resource-semantics.att

"role CDATA #IMPLIED

title CDATA #IMPLIED

show (embed|replace|new) #IMPLIED

actuate (auto|user) #IMPLIED

behavior CDATA #IMPLIED"

>

Link behaviour

SHOW attribute: describes display behaviour on traversal of link

• embed: designated resource embedded in body of current resource

• replace: designated resource replaces current resource

• new: designated resource displayed in a new window

ACTUATE attribute: when should traversal of link occur?

• auto: retrieve resource when current resource is encountered

• user: present resource only upon request from user

all combinations of SHOW and ACTUATE values are possible!

2.4.3.3 XPointer: XML Pointer Language

for locators in XLink

• reference to whole document

• reference to named element in document

• reference to unnamed element in read-only document

locator syntax

Locator ::= URI

| Connector ( XPointer | Name)

| URI Connector (XPointer | Name)

Connector ::= '#' | '|'

URI ::= URIchar*

XPointer syntax

• location of individual nodes in element tree

• spanning locations across several elements

• arbitrary set of elements

syntax:

XPointer ::= AbsTerm '.' OtherTerms

| AbsTerm

| OtherTerms

OtherTerms ::= OtherTerm

| OtherTerm '.' OtherTerm

OtherTerm ::= RelTerm

| SpanTerm

| AttrTerm

| StringTerm

Absolute location terms

AbsTerm ::= 'root()' | 'origin()' | IdLoc | HTMLAddr

IdLoc ::= 'id(' Name ')'

HTMLAddr ::= 'html(' SkipLit ')'

• root: root element of containing resource

• origin: application-dependent

• id: element with named id value

• html(NAMEVALUE): A element in HTML with NAME=NAMEVALUE

Relative location terms

RelTerm ::= Keyword? Arguments

Keyword ::= 'child'

| 'descendant'

| 'ancestor'

| 'preceding'

| 'following'

| 'psibling'

| 'fsibling'

example: child(2,section).child(1,subsection)

relative location term arguments

• selection by instance number

• selection by node type

• selection by attribute
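A relative location such as child(2,section).child(1,subsection) can be traced with a small resolver. The following Python sketch is only an illustration under stated assumptions: it supports nothing but the child(n,type) form with a positive instance number, it starts at the document root, and the sample document and the function name resolve_children are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names match the example locator.
DOC = """<doc>
  <section><subsection>1.1</subsection></section>
  <section>
    <subsection>2.1</subsection>
    <subsection>2.2</subsection>
  </section>
</doc>"""

# Only the form child(n,type) is recognised in this sketch.
TERM = re.compile(r"child\((\d+),(\w+)\)")

def resolve_children(root, pointer):
    """Follow a chain of child(n,type) terms, starting at the root element."""
    node = root
    for term in pointer.split("."):
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"unsupported location term: {term}")
        n, node_type = int(match.group(1)), match.group(2)
        # selection by node type, then by instance number (counted from 1)
        candidates = [child for child in node if child.tag == node_type]
        node = candidates[n - 1]
    return node

target = resolve_children(ET.fromstring(DOC),
                          "child(2,section).child(1,subsection)")
print(target.text)  # 2.1
```

Each term narrows the current node: first the second section child of the root, then the first subsection child of that section.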

Spanning location term

data between two XPointers:

SpanTerm ::= 'span(' XPointer ',' XPointer ')'

examples:

id(a23).span(child(1),child(3))

span(id(sec2.1).child(-1,P),id(sec2.2).child(1,P))

Attribute location term

returns value of named attribute

String location term

string match

2.5 Images

2.5.1 Media type

Non-temporal: Image

• Representation

– Color model: CIE, RGB, HSV, CMYK, YUV
– Channels: alpha?, number, depth
– Interlacing
– Indexing
– Pixel aspect ratio
– Compression

• Operations

– Editing
– Point operations: thresholding, color correction
– Filtering
– Compositing
– Geometric transformations: displacing, rotating, mirroring, scaling, skewing, warping
– Conversion: color separation, resampling

2.5.2 Color

2.5.2.1 Human perception

visible light: λ ∈ [380 nm … 780 nm] (violet … red)

retina:

• rods for brightness

• cones for chromaticity (color)

three types of cones:

• yellow: λx = 600nm

• green: λy = 535nm

• blue: λz = 445nm

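ϕ(λ): wavelength distribution of source light

k: normalization factor

x(λ), y(λ), z(λ): eye response functions

X, Y, Z: perceived color

$$X = k \int \varphi(\lambda)\, x(\lambda)\, d\lambda$$

$$Y = k \int \varphi(\lambda)\, y(\lambda)\, d\lambda$$

$$Z = k \int \varphi(\lambda)\, z(\lambda)\, d\lambda$$

CIE Yxy color system

$$x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}$$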
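The chromaticity coordinates are a plain normalization of the tristimulus values. A minimal Python sketch (the function name chromaticity is chosen here for illustration) shows that x and y do not change when all three values are scaled by the same brightness factor:

```python
def chromaticity(X, Y, Z):
    """Return the CIE (x, y) chromaticity coordinates of tristimulus values."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for X = Y = Z = 0")
    return X / total, Y / total

# Equal tristimulus values map to the point (1/3, 1/3).
x, y = chromaticity(1.0, 1.0, 1.0)

# Scaling the brightness leaves the chromaticity unchanged.
x_bright, y_bright = chromaticity(2.0, 2.0, 2.0)
print(x, y, x_bright, y_bright)
```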

perceived visual distance (magnified by 10)

→ need for equidistant color spaces

2.5.2.2 Color spaces

RGB three basic colours: red, green, blue

YUV luminance (as in b/w TV) +2 chrominance channels

YIQ used for NTSC TV

YCrCb used in JPEG digital image standard

CMY(k) cyan, magenta, yellow (black) used for printers

HSV color model with (approximately) equidistant colors

mapping RGB → YUV:

Y = 0.30R + 0.59G + 0.11B
U = (B − Y) · 0.493
V = (R − Y) · 0.877

mapping RGB → YIQ:

Y = 0.30R + 0.59G + 0.11B
I = 0.60R − 0.28G − 0.32B
Q = 0.21R − 0.52G + 0.31B

mapping RGB → YCrCb:

Y = 0.30R + 0.59G + 0.11B
Cr = 0.50R − 0.42G − 0.08B
Cb = −0.17R − 0.33G + 0.50B

mapping RGB → CMY(K):

C = 1 − R
M = 1 − G
Y = 1 − B
K = min(C, M, Y)
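The mappings above translate directly into code. The Python sketch below assumes that R, G and B are given as floating-point values in [0, 1]; the function names are chosen for this example only.

```python
def luminance(r, g, b):
    """Luminance Y shared by the YUV, YIQ and YCrCb mappings."""
    return 0.30 * r + 0.59 * g + 0.11 * b

def rgb_to_yuv(r, g, b):
    y = luminance(r, g, b)
    return y, (b - y) * 0.493, (r - y) * 0.877

def rgb_to_ycrcb(r, g, b):
    cr = 0.50 * r - 0.42 * g - 0.08 * b
    cb = -0.17 * r - 0.33 * g + 0.50 * b
    return luminance(r, g, b), cr, cb

def rgb_to_cmyk(r, g, b):
    c, m, y = 1 - r, 1 - g, 1 - b
    return c, m, y, min(c, m, y)

# A mid grey has (almost exactly) zero chrominance in both models.
print(rgb_to_yuv(0.5, 0.5, 0.5))
print(rgb_to_ycrcb(0.5, 0.5, 0.5))
print(rgb_to_cmyk(1.0, 0.0, 0.0))   # pure red
```

For grey input the chrominance channels vanish because the coefficients of each chrominance row sum to zero, which is why a black-and-white receiver can use Y alone.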

mapping RGB → HSV:

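$$v = \max(r, g, b), \qquad s = \frac{v - \min(r, g, b)}{v}$$

let

$$r' = \frac{v - r}{v - \min(r, g, b)}, \qquad g' = \frac{v - g}{v - \min(r, g, b)}, \qquad b' = \frac{v - b}{v - \min(r, g, b)}$$

$$6h = \begin{cases}
5 + b' & \text{if } r = \max(r, g, b) \text{ and } g = \min(r, g, b) \\
1 - g' & \text{if } r = \max(r, g, b) \text{ and } g \neq \min(r, g, b) \\
1 + r' & \text{if } g = \max(r, g, b) \text{ and } b = \min(r, g, b) \\
3 - b' & \text{if } g = \max(r, g, b) \text{ and } b \neq \min(r, g, b) \\
3 + g' & \text{if } b = \max(r, g, b) \text{ and } r = \min(r, g, b) \\
5 - r' & \text{otherwise}
\end{cases}$$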
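The case distinction for 6h can be checked against an independent implementation. The Python sketch below follows the six cases literally, with the normalized distances (v − r)/(v − min) and so on named rn, gn and bn; the handling of black and grey input (hue set to 0) is an assumption of this sketch, since the formulas divide by zero there. The comparison uses colorsys.rgb_to_hsv from the Python standard library.

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """HSV from RGB in [0, 1], following the six-case formula for 6h."""
    v = max(r, g, b)
    low = min(r, g, b)
    if v == 0:
        return 0.0, 0.0, 0.0       # black: s and h undefined, 0 assumed
    s = (v - low) / v
    if v == low:
        return 0.0, 0.0, v         # grey: hue undefined, 0 assumed
    rn = (v - r) / (v - low)
    gn = (v - g) / (v - low)
    bn = (v - b) / (v - low)
    if r == v:
        six_h = 5 + bn if g == low else 1 - gn
    elif g == v:
        six_h = 1 + rn if b == low else 3 - bn
    else:                          # b is the maximum
        six_h = 3 + gn if r == low else 5 - rn
    return (six_h / 6) % 1.0, s, v

for colour in [(0.2, 0.6, 0.4), (0.9, 0.1, 0.3), (0.1, 0.2, 0.8)]:
    print(colour, rgb_to_hsv(*colour), colorsys.rgb_to_hsv(*colour))
```

The final modulo maps the value 6h = 6, which the first case produces for pure red, back to hue 0.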

[Figure: color vector vc = (r, g, b) in the RGB cube (axes R, G, B) and its transform wc = T · vc = (h, s, v) in HSV space (axes H, S, V)]

2.5.3 GIF format

(Graphics Interchange Format, proprietary standard of CompuServe)

• lossless compression of image data

• restricted to 256 colors

structure of a GIF file:

• GIF signature: “GIF87a” / “GIF89a”

• screen descriptor

– width
– height
– color resolution (1 … 8 bits)
– background color

• global color map: table of RGB values

• sequence of images

• GIF terminator

structure of an image:

• image descriptor (image position + size)

• local color map

• raster data: sequence of color index values, compressed by patented variation of LZW

sequence of raster data: sequential / interlaced rows

2.5.4 PNG format

(Portable Network Graphics)

non-proprietary standard proposed by W3C

GIF features retained in PNG:

• Indexed-color images of up to 256 colors.

• Streamability: files can be read and written serially (file format usable as communications protocol)

• Progressive display

• Transparency (portions of the image can be marked as transparent)

• Ancillary information: textual comments and other data can be stored within the image file.

• Complete hardware and platform independence.

• Effective, 100% lossless compression.

New features of PNG:

• Truecolor images of up to 48 bits per pixel.

• Grayscale images of up to 16 bits per pixel.

• Full alpha channel (general transparency masks).

• Image gamma information (automatic display of images with correct brightness/contrast)

• Reliable, straightforward detection of file corruption.

• Faster initial presentation in progressive display mode.

2.5.5 JPEG Formats

2.5.5.1 Requirements

• high compression rate vs. image fidelity

• applicable to any kind of continuous-tone digital source image

• tractable computational complexity

• modes of operation:

1. sequential encoding (left-to-right, top-to-bottom)

2. progressive encoding: encoding in multiple scans for low-bandwidth communication (user watches image build up in multiple coarse-to-clear passes)

3. lossless encoding: exact recovery of source image possible (although low compression compared to lossy modes)

4. hierarchical encoding: encoding at multiple resolutions

2.5.5.2 Processing steps for DCT-based coding

DCT: discrete cosine transform

here: consider single component only = greyscale image

[Figure: DCT-based encoder. Source image data (8x8 blocks) → FDCT → quantizer → entropy encoder → compressed image data; quantizer and entropy encoder each use a table specification]

[Figure: DCT-based decoder. Compressed image data → entropy decoder → dequantizer → IDCT → reconstructed image data; entropy decoder and dequantizer each use a table specification]

8*8 DCT

compression of a stream of 8*8 blocks of image samples

• group image samples into 8*8 blocks

• shift from unsigned integers to signed integers: [0, 2^p − 1] → [−2^(p−1), 2^(p−1) − 1]

• input to the forward DCT

$$F(u,v) = \frac{1}{4} C(u) C(v) \left( \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \right), \qquad u, v = 0 \ldots 7$$

$$C(u), C(v) = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$$

FDCT: 64-point discrete signal → 64 orthogonal basis signals (amplitudes of cosine functions)

F(0, 0) – DC coefficient:

$$F(0,0) = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \left( \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \right) = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)$$

other 63 coefficients – AC coefficients

little variation in 8*8 block → most spatial frequencies with zero amplitude → no encoding necessary → compression

inverse DCT: maps 64 DCT coefficients onto 8*8 image block

$$f(x,y) = \frac{1}{4} \left( \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u,v) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \right)$$
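Both transforms can be written down directly from their defining sums. The following Python sketch is a deliberately slow, literal transcription (practical codecs use fast factorizations); the sample block is invented for the demonstration and is level-shifted by 128, which corresponds to p = 8.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k) of the 8*8 DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / 16)

def fdct(f):
    """Forward DCT: f[x][y] (signed samples) -> F[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct(F):
    """Inverse DCT: F[u][v] -> f[x][y]."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

# Hypothetical smooth 8-bit block, shifted from [0, 255] to [-128, 127].
samples = [[100 + 8 * x + 2 * y for y in range(N)] for x in range(N)]
block = [[s - 128 for s in row] for row in samples]

F = fdct(block)
restored = idct(F)
error = max(abs(restored[x][y] - block[x][y])
            for x in range(N) for y in range(N))
print(F[0][0], sum(map(sum, block)) / 8, error)
```

The DC coefficient equals one eighth of the sum of all samples, and without quantization the round trip reproduces the block up to floating-point error.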

problems

theoretically: DCT is a 1:1 mapping of 64-point vectors between image and frequency domain

practically: loss through

• quantization

• computation of transcendental functions

Quantization

mapping of FDCT output (F(u, v), u, v = 0 … 7) onto integers

quantization table: Q(u, v), u, v = 0 … 7, 1 ≤ Q(u, v) ≤ 255

quantization:

• goal: achieve further compression

• represent DCT coefficients with minimum necessary precision (and minimum effect on visual image quality)

• lossy, n : 1 mapping

$$F^Q(u,v) = \mathrm{IntegerRound}\left( \frac{F(u,v)}{Q(u,v)} \right)$$

dequantization:

$$F'(u,v) = F^Q(u,v) \cdot Q(u,v)$$
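Quantization and dequantization are one line each. In the Python sketch below the uniform table with Q(u, v) = 16 and the coefficient values are invented for illustration; real tables use larger steps for high frequencies. Note that Python's round() resolves exact ties to the even neighbour, which may differ from the rounding rule of a particular codec.

```python
def quantize(F, Q):
    """F^Q(u, v) = IntegerRound(F(u, v) / Q(u, v))."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

def dequantize(FQ, Q):
    """F'(u, v) = F^Q(u, v) * Q(u, v)."""
    return [[FQ[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

# Hypothetical quantization table and DCT coefficients.
Q = [[16] * 8 for _ in range(8)]
F = [[200.0 / (1 + u + v) - 10.0 for v in range(8)] for u in range(8)]

FQ = quantize(F, Q)
F_restored = dequantize(FQ, Q)
worst = max(abs(F[u][v] - F_restored[u][v])
            for u in range(8) for v in range(8))
print(FQ[0], worst)
```

The reconstruction error of each coefficient is at most half the quantization step, which is the sense in which the mapping is n : 1 and lossy.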

DC coding and zig-zag sequence

• separate treatment of DC and AC coefficients

• DC: strong correlation between coefficients of adjacent 8*8 blocks → differential encoding

• AC: ordering in zig-zag sequence: low-frequency coefficients (mostly nonzero) before high-frequency coefficients (mostly zero) (facilitates entropy coding)

[Figure: differential DC encoding (for block l, the difference DC_l − DC_{l−1} to the preceding block is coded) and zig-zag sequence (from DC via AC_01 … AC_07 and AC_70 … to AC_77)]
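Both steps are easy to state in code. The Python sketch below generates the zig-zag visiting order for an 8*8 block as (row, column) pairs and computes the DC differences for a sequence of blocks; the DC values are made up, and taking 0 as the predecessor of the first block is an assumption of this sketch.

```python
def zigzag_order(n=8):
    """Visiting order of an n*n block as (row, column) pairs."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                         # even diagonals run upwards
            rows = reversed(rows)
        order.extend((i, s - i) for i in rows)
    return order

def dc_differences(dc_values, initial=0):
    """Differential encoding of the DC coefficients of successive blocks."""
    diffs, previous = [], initial
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

print(zigzag_order()[:6])
print(dc_differences([52, 55, 54, 54]))
```

The first entries of the order are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), so coefficients are visited by increasing spatial frequency, and slowly varying DC values turn into small differences.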

Entropy coding

lossless compression of DCT coefficients

1. convert zig-zag sequence of quantized coefficients into intermediate sequence of symbols (with zero suppression)

2. convert symbols into data stream with no externally identifiable boundaries (Huffman coding / arithmetic coding)
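Step 1 can be sketched as a run-length pass over the AC coefficients. The slides only mention zero suppression; the Python sketch below assumes the symbol layout commonly described for baseline JPEG, namely (run of zeros, size in bits, amplitude), with (15, 0) standing for a run of 16 zeros and (0, 0) marking the end of the block. The function name and the tuple representation are choices made for this example.

```python
def ac_symbols(ac):
    """Turn zig-zag ordered AC coefficients into (run, size, amplitude) symbols."""
    symbols = []
    run = 0
    last_nonzero = max((i for i, a in enumerate(ac) if a != 0), default=-1)
    for a in ac[:last_nonzero + 1]:
        if a == 0:
            run += 1
            if run == 16:
                symbols.append((15, 0, None))   # ZRL: a run of 16 zeros
                run = 0
        else:
            # size = number of bits needed for the magnitude of a
            symbols.append((run, abs(a).bit_length(), a))
            run = 0
    if last_nonzero < len(ac) - 1:
        symbols.append((0, 0, None))            # EOB: only zeros remain
    return symbols

ac = [5, 0, 0, -3] + [0] * 59                   # 63 AC coefficients
print(ac_symbols(ac))
```

Because quantization leaves mostly zeros at the end of the zig-zag sequence, the whole tail collapses into a single end-of-block symbol.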

Compression and picture quality

input: typically 8 bits/pixel per component (12 bits/pixel for special applications, e.g. medical images)

1 luminance component + 2 chrominance components, 1 chrominance sample per 4 luminance samples → in total 8 + 2 · 8/4 = 12 bits/pixel

output:

• 0.25–0.5 bits/pixel: moderate to good quality

• 0.5–0.75 bits/pixel: good to very good quality

• 0.75–1.5 bits/pixel: excellent quality

• 1.5–2.0 bits/pixel: indistinguishable from the original

[Figure: positions of luminance samples and chrominance samples relative to the block edges]

2.5.5.3 Predictive lossless coding

difficult with DCT → independent method for lossless coding

typically 2:1 compression

[Figure: lossless encoder. Source image data → predictor → entropy encoder → compressed image data; the entropy encoder uses a table specification]

prediction neighbourhood of sample X:

C B
A X

Selection value   Prediction
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B − C
5                 A + ((B − C)/2)
6                 B + ((A − C)/2)
7                 (A + B)/2
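The predictor table maps directly onto a lookup. In the Python sketch below, a, b and c stand for the already coded neighbours A (left), B (above) and C (above left) of the sample X; rounding the divisions by 2 downwards and the sample values are assumptions made for this example.

```python
def predict(selection, a, b, c):
    """Prediction of X from its neighbours for selection values 0..7."""
    predictors = {
        0: 0,                      # no prediction
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return predictors[selection]

# Hypothetical neighbourhood and actual sample value.
a, b, c, x = 100, 104, 98, 105
for selection in range(8):
    prediction = predict(selection, a, b, c)
    print(selection, prediction, x - prediction)   # residual is what gets coded
```

Only the residual x − prediction is passed to the entropy encoder, so a good predictor keeps the residuals small.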

2.5.5.4 Multiple-Component images

Source image formats

JPEG poses no restrictions on pixel aspect ratio, color space, image acquisition characteristics

JPEG source image model

[Figure: (a) source image with multiple components C_1, C_2, …, C_N and overall dimensions X, Y; (b) characteristics of an image component C_i: x_i columns and y_i rows of samples, ordered left to right and top to bottom]

• image contains 1 . . . 255 components(spectral bands, channels)

• component = rectangular area of samples

• sample = unsigned integer with p bits

• p fixed for all samples of an image

• p = 8 or p = 12 for DCT coding

• p = 2 . . . 12 for predictive coding

xi, yi sample dimensions of ith component

Hi, Vi: relative horizontal/vertical sampling factors, 1 ≤ Hi, Vi ≤ 4

X, Y: overall image dimensions, X = max_i(xi), Y = max_i(yi), X, Y ≤ 2^16

encoder stores X, Y and Hi, Vi

decoder:

$$x_i = \left\lceil X \cdot \frac{H_i}{H_{\max}} \right\rceil, \qquad y_i = \left\lceil Y \cdot \frac{V_i}{V_{\max}} \right\rceil$$
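The decoder side of this rule fits in a few lines. The Python sketch below recomputes the component dimensions from X, Y and the sampling factors using integer arithmetic for the ceiling; the image size and the factors are invented for the example.

```python
def component_sizes(X, Y, sampling):
    """Return (x_i, y_i) for each component from its factors (H_i, V_i)."""
    h_max = max(h for h, v in sampling)
    v_max = max(v for h, v in sampling)
    # -(-p // q) is the ceiling of p / q for positive integers
    return [(-(-X * h // h_max), -(-Y * v // v_max)) for h, v in sampling]

# Hypothetical 35 x 33 image: full-resolution luminance plus two
# chrominance components subsampled by 2 in both directions.
print(component_sizes(35, 33, [(2, 2), (1, 1), (1, 1)]))
```

Because of the ceiling, odd image dimensions still yield whole samples in the subsampled components (18 x 17 here).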

Entropy order and interleaving

interleaving of data from multiple components

data unit =

• sample in predictive coding

• 8*8 block in DCT coding

order of compressed data units: generalization of raster-scan order

noninterleaved data ordering:

[Figure: data units of a single component scanned left to right, top to bottom]

interleaved data ordering

• component Ci partitioned into rectangular regions of Hi * Vi data units

• regions ordered left-to-right, top-to-bottom

• data units within region ordered left-to-right, top-to-bottom

• MCU: minimum coded unit = smallest group of interleaved data units

[Figure: interleaved ordering example with four components, C1: H = 2, V = 2; C2: H = 2, V = 1; C3: H = 1, V = 2; C4: H = 1, V = 1]

restrictions:

• maximum number of components interleaved: 4

• maximum number of data units in an MCU: 10
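The ordering rules can be turned into a small generator of MCUs. The Python sketch below lists the data units of one MCU as (component, row, column) triples; the triple representation and the function name are choices made here, and the sampling factors are those of a four-component example with C1: H = 2, V = 2; C2: H = 2, V = 1; C3: H = 1, V = 2; C4: H = 1, V = 1.

```python
def mcu_data_units(sampling, mcu_row, mcu_col):
    """Data units (component, row, column) of one MCU in coding order."""
    units = []
    for component, (h, v) in enumerate(sampling, start=1):
        for dy in range(v):              # top to bottom inside the region
            for dx in range(h):          # left to right inside the region
                units.append((component, mcu_row * v + dy, mcu_col * h + dx))
    return units

sampling = [(2, 2), (2, 1), (1, 2), (1, 1)]
first = mcu_data_units(sampling, 0, 0)
second = mcu_data_units(sampling, 0, 1)
print(len(first), first)
print(second)
```

Each MCU here holds 4 + 2 + 2 + 1 = 9 data units, which respects the limit of 10 data units per MCU.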

Multiple tables

component-specific tables for quantization and entropy coding

[Figure: components A, B, C are fed into one encoding process, which uses table specification 1 and table specification 2 and produces the compressed image data]


2.5.5.5 Baseline and other DCT sequential codecs

components of sequential coding:

• FDCT

• quantization

• entropy coding

• multiple-component control

variations:

• sample precisions: 8 bit / 12 bit

• Huffman / arithmetic coding

baseline sequential coding:

• 8 bit samples

• Huffman coding

• max. two sets of Huffman tables

2.5.5.6 DCT progressive mode

uses FDCT and quantization as with sequential coding

difference: each image component encoded in multiple scans

• requires image-sized buffer memory between quantizer and entropy encoder

• stores image as quantized DCT coefficients

• buffered coefficients partially encoded in multiple scans

• two complementary methods

– spectral selection: only specific band of coefficients from zig-zag sequence encoded in a scan

– successive approximation: coefficients within current band encoded with limited accuracy in a scan

[Figure: (a) image component as quantized DCT coefficients 0 … 63 with bits 7 (MSB) … 0 (LSB); (b) sequential encoding; progressive encoding in a 1st scan, 2nd scan, …, either by bands of coefficients or by bit position starting at the MSB]

2.5.5.7 Hierarchical mode of operation

“pyramidal” encoding of an image at multiple resolutions

subsequent encoding uses double resolution (horizontal/vertical/both)

procedure:

1. filter and down-scale original image by desired power of 2 in each dimension

2. encode reduced-size image by sequential DCT / progressive DCT / lossless coding

3. decode reduced-size image, then interpolate and up-sample it by 2 (horizontally/vertically/both)

4. use up-sampled image as prediction of the original, encode difference image as above

5. repeat steps 3 and 4 until full resolution has been encoded

application of hierarchical encoding: access to high-resolution images for low-resolution devices with limited buffer capacity

2.5.5.8 Coded representation for compressed images

• interchange format syntax

• tables stored with the image / default tables / referenced tables

2.5.5.9 JPEG2000

capabilities supported:

• resolution scalability: arbitrary number of resolution levels

• region of interest coding: certain parts of image coded in better quality

• SNR (signal-to-noise ratio) scalability

• random access capability

• multi-component imagery

• arbitrary wavelet decompositions

• arbitrary wavelet kernels

• arbitrary bit-depth images

• tiling

– any number of tiles
– rate control performed jointly over all tiles

• frames

– similar to tiles
– coder operates independently in frames

2.5.6 Fractal image compression

2.5.6.1 Introduction

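[Figure: copy machine that maps an input image to an output image]

[Figure: initial image, first copy, second copy and third copy for three different initial images (a), (b), (c)]

final attractor independent of starting image, depends only on transformation

affine transformation:

$$w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_i & b_i \\ c_i & d_i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}$$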

some affine transformations

each image is a transformed copy of itself → image must have detail at every scale → images are fractals

fractal image compression: store images as collections of transformations, e.g. fern

advantage: multiresolution representation of images

fractal vs. pixel-based representation:

2.5.6.2 Iterated function systems

Contractive transformations

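transformation w is contractive iff there is an s < 1 such that for any two points P1, P2:

$$d(w(P_1), w(P_2)) < s \cdot d(P_1, P_2)$$

distance in the plane:

$$d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

example contractive transformation

$$w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$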

iterated function system: collection of contractive transformations

$$\{ w_i : \mathbb{R}^2 \to \mathbb{R}^2 \mid i = 1, \ldots, n \}$$

maps plane R^2 to itself

collection of transformations defines map

$$W(\cdot) = \bigcup_{i=1}^{n} w_i(\cdot)$$

f0 input image

f1 = W(f0)

f2 = W(W(f0)) = W^{∘2}(f0)

contractive mapping fixpoint theorem:

$$|W| \equiv f_\infty = \lim_{n \to \infty} W^{\circ n}(f_0)$$

attractor is independent of f0!
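The independence of the attractor from the starting image can be observed numerically. The Python sketch below uses a hypothetical iterated function system of three maps that each scale by 1/2 (it generates the Sierpinski triangle), represents images as finite point sets, and measures how far apart two iterations started from different sets are; the maps, the starting sets and the use of the Hausdorff distance are assumptions made for this demonstration.

```python
import math

# Each map is (a, b, c, d, e, f): x' = a*x + b*y + e, y' = c*x + d*y + f.
MAPS = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]

def W(points):
    """One pass of the copy machine: union of all w_i applied to the set."""
    return {(a * x + b * y + e, c * x + d * y + f)
            for (a, b, c, d, e, f) in MAPS for (x, y) in points}

def iterate(points, n):
    for _ in range(n):
        points = W(points)
    return points

def hausdorff(A, B):
    """Largest distance from a point of one set to the nearest point of the other."""
    def one_sided(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))

start_a = {(0.0, 0.0)}
start_b = {(1.0, 1.0), (0.3, 0.9)}
for n in (0, 1, 3, 5):
    print(n, hausdorff(iterate(start_a, n), iterate(start_b, n)))
```

The printed distance shrinks by at least the contraction factor 1/2 per iteration, so both sequences approach the same attractor.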

2.5.6.3 Self-similarity in images

grey-scale images as functions f(x, y)

metric on images

$$\delta(f, g) = \sup_{(x,y) \in I^2} |f(x, y) - g(x, y)|$$

natural images are not exactly self-similar

2.5.6.4 Partitioned iterated function systems

partitioned copying machine

specification of copying machine:

1. # copies

2. affine transformation (for each copy)

3. contrast and brightness adjustment (for each copy)

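4. mask for selecting part of the original to be transformed (for each copy, Di → Ri)

specification of transformation wi

$$w_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix}$$

si controls contrast (si < 1)

oi affects brightness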

2.5.6.5 Encoding images

ideal goal of fractal image compression: satisfy fixed point equation

$$f = W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f)$$

→ seek partition of f into pieces such that the fixed point equation is fulfilled

approximation:

$$f \approx f' = W(f') \approx W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f)$$

minimize quantities

$$\delta(f \cap (R_i \times I), w_i(f)), \qquad i = 1, \ldots, N$$

1. find good choice for Di

2. find good contrast and brightness settings si and oi

example:

• 256*256 pixels input image

• output ranges Ri: nonoverlapping 8*8 subsquares (1024)

• input ranges Di: overlapping 16*16 subsquares (241 · 241 = 58081)

• 8 ways of mapping square → square (4 rotations, flip + 4 rotations)

• estimate si and oi using least squares regression
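The least squares step in the last bullet can be sketched as an ordinary linear regression of the range pixels on the (already shrunken and rotated) domain pixels. This is a sketch under assumptions: both blocks are given as flat lists of equal length, and the function name and the handling of a completely flat domain block are choices made for this example, not part of the slides.

```python
def fit_contrast_brightness(domain, range_):
    """Return (s, o) minimizing sum((s * d + o - r) ** 2) over paired pixels."""
    n = len(domain)
    sum_d = sum(domain)
    sum_r = sum(range_)
    sum_dd = sum(d * d for d in domain)
    sum_dr = sum(d * r for d, r in zip(domain, range_))
    denominator = n * sum_dd - sum_d ** 2
    if denominator == 0:
        # Flat domain block: contrast is irrelevant, use the mean as brightness.
        return 0.0, sum_r / n
    s = (n * sum_dr - sum_d * sum_r) / denominator
    o = (sum_r - s * sum_d) / n
    return s, o


# Range pixels that are exactly 0.5 * domain + 10 are recovered exactly.
s, o = fit_contrast_brightness([10, 20, 30, 40], [15, 20, 25, 30])
print(s, o)
```

An encoder would run this fit for every candidate domain block and orientation and keep the combination with the smallest remaining error.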


compression

input image: 65536 bytes

compressed image: 3968 bytes

→ compression factor: 16.5
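The numbers of this example can be checked with a few lines of arithmetic. The sketch assumes one byte per pixel (which matches 65536 bytes for 256*256 pixels) and domain squares placed at every pixel offset; the figure of 31 bits per transformation is merely what the stated sizes imply, not a value given in the slides.

```python
WIDTH = HEIGHT = 256
RANGE_SIZE = 8
DOMAIN_SIZE = 16

ranges = (WIDTH // RANGE_SIZE) * (HEIGHT // RANGE_SIZE)
domains = (WIDTH - DOMAIN_SIZE + 1) * (HEIGHT - DOMAIN_SIZE + 1)
candidates_per_range = domains * 8      # 8 orientations per domain square

input_bytes = WIDTH * HEIGHT            # assumption: 1 byte per pixel
compressed_bytes = 3968
bits_per_transformation = compressed_bytes * 8 / ranges
compression_factor = input_bytes / compressed_bytes

print(ranges, domains, candidates_per_range)
print(bits_per_transformation, round(compression_factor, 1))
```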


2.5.6.6 Partitioning images

image areas requiring different levels of detail → vary the size of the ranges Ri

quadtree partitioning: divide square into 4 subsquares

HV-partitioning: divide rectangle either horizontally or vertically

[Figure: HV-partitioning of a rectangle R. Panels: (a) 1st partition, (b) 2nd partition, (c) 3rd and 4th partitions]

triangular partitioning: rectangle → 2 triangles, triangle → 4 triangles (connect partitioning points on each side)

2.6 Audio

2.6.1 Introduction

human perception: 20 Hz – 20 kHz

digital audio:

• sample audio input in regular, discrete intervals

• quantize sampled values

digital audio data: sequence of binary values, each representing the number of a quantizer level

pulse code modulation (PCM): represent each sample with an independent code word

[Figure: PCM processing chain: analog audio input → analog-to-digital conversion → PCM values (0011 0111 000...) → digital signal processing → PCM values (...11 0011 0010 0...) → digital-to-analog conversion → analog audio output]

Nyquist theorem: a time-sampled signal can represent signals up to half the sampling rate

typical sampling rates:

8 kHz for speech

44.1 kHz for music (audio CD)

quantizer levels: power of 2

each additional bit improves the signal-to-noise ratio by about 6 dB

typical # bits/sample:

8 (= 48 dB) speech, low-quality audio

16 (= 96 dB) high-quality audio (audio CD)

data rates for uncompressed audio: 8 . . . 176 kB/sec (176 kB/sec for audio CD, stereo)
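The figures on this slide follow directly from sampling rate, sample size and channel count. A short sketch of the arithmetic (function names are made up for this example; the 6 dB per bit rule of thumb is the one used above):

```python
def data_rate_bytes_per_second(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate."""
    return sampling_rate_hz * bits_per_sample * channels // 8


def snr_db(bits_per_sample, db_per_bit=6):
    """Rule-of-thumb signal-to-noise ratio of a linear quantizer."""
    return bits_per_sample * db_per_bit


def highest_representable_frequency_hz(sampling_rate_hz):
    """Nyquist limit: half the sampling rate."""
    return sampling_rate_hz / 2


speech_rate = data_rate_bytes_per_second(8000, 8, 1)    # 8 kB/s
cd_rate = data_rate_bytes_per_second(44100, 16, 2)      # about 176 kB/s
print(speech_rate, cd_rate, snr_db(8), snr_db(16))
print(highest_representable_frequency_hz(44100))
```

The Nyquist limit of 22.05 kHz for the audio CD lies just above the upper end of human hearing (20 kHz).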


2.6.2 Media type

Temporal: Digital audio

• Representation

– Sampling frequency
– Sample size and quantization: linear, nonlinear
– Number of channels (tracks): 2, 4, 16, 32
– Interleaving
– Negative samples: one's or two's complement
– Encoding: PCM, ADPCM

• Operations

– Storage
– Retrieval
– Editing: cross-fade, play list
– Effects and filtering: delay, equalization, normalization, noise reduction, time compression/expansion, pitch shifting, stereoization, acoustic environments
– Conversion

2.6.3 Formats

2.6.3.1 µ-Law Audio Compression

logarithmic quantization

• represents low-amplitude audio samples with greater accuracy

• → uniform signal-to-noise ratio over the range of amplitudes

• 8 bits/sample represent 14 bits in linear sampling

• used in ISDN telephony (with 8 kHz sampling)

x input signal, |x| ≤ 1

y output signal

µ = 255

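$$y = \begin{cases} 255 - \dfrac{127}{\ln(1+\mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x \ge 0 \\[2ex] 127 - \dfrac{127}{\ln(1+\mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x < 0 \end{cases}$$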
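The formula can be transcribed almost literally. The sketch below rounds the result to the nearest integer code, which is an assumption of this example (the slide does not say how y is mapped to 8-bit codes), and it is not a bit-exact G.711 encoder.

```python
import math

MU = 255


def mu_law_encode(x):
    """Map an input sample x with |x| <= 1 to a code in 0..255."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("input sample must satisfy |x| <= 1")
    compressed = 127 / math.log(1 + MU) * math.log(1 + MU * abs(x))
    base = 255 if x >= 0 else 127
    return round(base - compressed)


# Silence maps to 255, full positive scale to 128, full negative scale to 0.
print(mu_law_encode(0.0), mu_law_encode(1.0), mu_law_encode(-1.0))

# The first 1% of the amplitude range spans many codes, the last 1% almost none.
low_amplitude_codes = mu_law_encode(0.0) - mu_law_encode(0.01)
high_amplitude_codes = mu_law_encode(0.99) - mu_law_encode(1.0)
print(low_amplitude_codes, high_amplitude_codes)
```

The second print line shows the logarithmic behaviour: small amplitudes are resolved much more finely than large ones.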


2.6.3.2 ADPCM

adaptive differential pulse code modulation

adjacent samples have similar values → encode PCM value of the difference

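[Figure: (a) ADPCM encoder: the predicted value Xp[n-1] is subtracted from the input X[n]; the difference D[n] passes through the (adaptive) quantizer and yields the code C[n]; an (adaptive) dequantizer reconstructs Dq[n], which is added to Xp[n-1] and fed to the (adaptive) predictor to give Xp[n]. (b) ADPCM decoder: C[n] → (adaptive) dequantizer → Dq[n], which is added to Xp[n-1] from the (adaptive) predictor to give Xp[n]]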

ADPCM coder can adapt to characteristics of audio signal


• change step size of quantizer

• change step size of predictor

different algorithms/standards, depending on

• adaptation possibilities

• side information

– quantizer/predictor step size
– redundant contextual information (for error recovery)

algorithms:

• IMA/ADPCM: Interactive Multimedia Association

• CCITT G.721 (32 kbps compressed data)

• CCITT G.723 (24 kbps compressed data)

• compact disc interactive (CD-I) audio compression algorithm

IMA/ADPCM Algorithm

• compression rate: 4:1

• 16 bits/sample → 4 bits/sample

simple predictor: predicted value = previous sample

quantizer: 4 bits output, signed multiples of current step size/4

adaptation

• quantizer adapts step size based on

– current step size
– quantizer output of previous input

• based on table lookup

• no side information required

good error recovery
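The structure described above (previous-sample predictor, 4-bit signed code in units of step size/4, step size adapted from the last code, no side information) can be sketched as follows. This is a simplified illustration and not the IMA algorithm itself: the real codec uses fixed lookup tables for the step size, whereas the adaptation rule in `_adapt` (double the step for large codes, shrink it for small ones) and the start step of 16 are hypothetical stand-ins.

```python
def _clamp(value, low=-32768, high=32767):
    return max(low, min(high, value))


def _dequantize(code, step):
    """Reconstruct the difference as (magnitude + 1/2) * step / 4, with sign bit 8."""
    magnitude = code & 7
    diff = (2 * magnitude + 1) * step // 8
    return -diff if code & 8 else diff


def _adapt(step, magnitude):
    """Hypothetical adaptation rule (the IMA codec uses table lookup instead)."""
    step = step * 2 if magnitude >= 4 else step * 3 // 4
    return max(4, min(step, 16384))


def adpcm_encode(samples, step=16):
    predicted, codes = 0, []
    for x in samples:
        diff = x - predicted
        sign = 8 if diff < 0 else 0
        magnitude = min(abs(diff) * 4 // step, 7)
        code = sign | magnitude
        # Track the decoder's reconstruction so both sides stay in sync.
        predicted = _clamp(predicted + _dequantize(code, step))
        step = _adapt(step, magnitude)
        codes.append(code)
    return codes


def adpcm_decode(codes, step=16):
    predicted, samples = 0, []
    for code in codes:
        predicted = _clamp(predicted + _dequantize(code, step))
        step = _adapt(step, code & 7)
        samples.append(predicted)
    return samples


codes = adpcm_encode([0, 0, 0])
print(codes, adpcm_decode(codes))
```

Because the decoder derives the step size from the codes alone, no side information has to be transmitted, which is the property the slide highlights.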


2.6.3.3 MPEG Audio

• lossy, but perceptually lossless compression

• 48 kHz sampling rate, 2*16 bits/sample

• compression rate: 6:1

• exploitation of auditory masking

[Figure: auditory masking. Axes: amplitude over frequency; a strong tonal signal is surrounded by the region where weaker signals are masked]

Layer I

• filter bank divides audio signal into 32 frequency bands

• 12 samples per band

• for each nonzero sample:

– bit allocation
– scale factor

output of layer I: frame with 32 groups of 12 samples = 384 samples

Layer II

codes data in larger groups: frame with 3*12*32 samples

exploits common bit allocation and scale factors
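The frame sizes of layers I and II, and the playing time they cover, follow from the numbers above. A small sketch, assuming the 48 kHz sampling rate mentioned for MPEG audio (function names are made up for this example):

```python
BANDS = 32
SAMPLES_PER_BAND = 12


def frame_samples(groups_of_12):
    """Samples per frame for a given number of 12-sample groups per band."""
    return groups_of_12 * SAMPLES_PER_BAND * BANDS


def frame_duration_ms(samples, sampling_rate_hz=48000):
    return 1000 * samples / sampling_rate_hz


layer1 = frame_samples(1)   # 384 samples
layer2 = frame_samples(3)   # 1152 samples
print(layer1, frame_duration_ms(layer1))
print(layer2, frame_duration_ms(layer2))
```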


Layer III

• alias reduction: modified discrete cosine transformation

• logarithmic quantization

• entropy coding (Huffman)

• bit reservoir for effects due to entropy coding

• noise allocation instead of bit allocation


Stereo redundancy coding

two types of coding:

• intensity stereo coding
for high frequencies:

– encode single summed signal for both channels
– only independent scale factors

• middle/side stereo coding

– middle channel
– + 2 side channels

2.7 Video

2.7.1 Basics

2.7.1.1 B/W TV

presentation in greyscale only (luminance)

European format:

• 625 lines

• 833 columns

• ratio width/height: 4:3

• 25 frames/second

bandwidth, theoretically:

1 / (25 frames/s · 625 lines/frame) = 64 µs/line, i.e. a line frequency of 15625 Hz

b/w changes between every pair of pixels in a line → 15625 Hz * 833/2 ≈ 6.5 MHz

in practice: 5 – 5.5 MHz
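The theoretical bandwidth estimate can be reproduced step by step. The sketch only restates the arithmetic of the slide; variable names are made up for this example.

```python
LINES_PER_FRAME = 625
COLUMNS_PER_LINE = 833
FRAMES_PER_SECOND = 25

line_frequency_hz = FRAMES_PER_SECOND * LINES_PER_FRAME
line_duration_us = 1e6 / line_frequency_hz

# Worst case: black/white alternate from pixel to pixel, so one full
# signal period covers two pixels.
bandwidth_hz = line_frequency_hz * COLUMNS_PER_LINE / 2

print(line_frequency_hz, line_duration_us, bandwidth_hz / 1e6)
print(COLUMNS_PER_LINE / LINES_PER_FRAME)  # close to the 4:3 aspect ratio
```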


interlaced mode

50 half images/second


2.7.1.2 Colour TV

colour representations:

RGB three basic colours: red, green, blue

YUV luminance (as in b/w TV) + 2 chrominance channels, used in PAL

YIQ used for NTSC


2.7.1.3 TV standards

NTSC National Television Systems Committee (USA)

– 30 images/second
– 525 lines/image

PAL Phase alternating line (Germany)

– 25 images/second
– 625 lines/image

HDTV High definition Television (forthcoming)

HD-MAC Europe: 1250 lines, 50 Hz (interlaced)

MUSE Japan: 1125 lines, 60 Hz

NTSC USA: 1040 lines, 60 Hz

digital TV

component-wise coding: 4:2:2
emphasis on luminance
luminance sampling: 13.5 MHz
chrominance sampling: 6.75 MHz
→ 216 Mbps
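The 216 Mbps figure results from adding up the three component streams. The sketch assumes 8 bits per sample and two chrominance components, which is not stated on the slide but reproduces its result.

```python
def component_bit_rate_mbps(luma_mhz, chroma_mhz, bits_per_sample=8):
    """Bit rate of one luminance plus two chrominance sample streams."""
    return (luma_mhz + 2 * chroma_mhz) * bits_per_sample


print(component_bit_rate_mbps(13.5, 6.75))
```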


2.7.1.4 Computer video

• non-interlaced display

• image rate: typically 70 Hz

• colour display: in RGB mode

a) with 24 bits/pixel
b) via CLUT (colour lookup table)

8 or 16 bits/pixel → 256 or 65536 colours (out of 2^24)

2.7.2 Media types

Media type Temporal: Analog video

• Representation

– Frame rate
– Number of scan lines
– Aspect ratio, e.g., 4:3
– Interlacing, e.g., 2:1 fields per frame
– Quality, e.g., signal-to-noise ratio and image resolution
– Component versus composite

• Operations

– Storage: Tapes (Type B or C, Betacam, U-matic, Hi8, S-VHS, VHS); Videodisc
– Retrieval: based on time codes
– Synchronization: avoid timebase jitter and timebase phase shift using sync generator, genlock, timebase corrector
– Editing: cuts-only editing, A-B roll editing, edit decision list (EDL)
– Mixing: cut, fade, dissolve (cross-fade), wipe, tumble, wrapping, keying
– Conversion: scan converter, standards conversion

Media type Temporal: Digital video

• Representation

– Analog formats sampled: CCIR 601, digital composite, CIF, QCIF, digital HDTV; synthesis, sampling
– Sampling rate
– Sample size and quantization: linear, logarithmic
– Data rate
– Frame rate: 10, 15, 25, 30
– Compression
– Support for interactivity
– Scalability: transmit scalability, receive scalability

• Operations

– Storage
– Retrieval
– Synchronization
– Editing: tape based, nonlinear
– Effects
– Conversion

2.7.3 MPEG-1/2

MPEG-video requirements

Generic standard

• independence of particular application

• acceptable quality for bandwidth of 1.5 Mb/s (as with CD-ROM)

Applications

• digital storage media
low storage costs + sufficient bandwidth (MPEG-1: 600 MB/h = 1.5 Mb/s, MPEG-2: 1.8–4 GB/h = 0.5–1.1 MB/s)

– CD-ROM: 1.5 Mb/s
– DVD: 1.1 MB/s
– harddisc: ≥ 3 MB/s

• asymmetric applications
frequent decompression, compression only once

– electronic publishing

∗ education and training

∗ travel guidance

∗ videotext

∗ points of sale

– games
– entertainment

• symmetric applications
equal use of compression and decompression

– electronic publishing production
– video mail
– videotelephone
– video conferencing

Features of the compression algorithm

• random access

– access to any frame
– access time ≤ 0.5 s
– access points: information unit coded without reference to other units

• fast forward/reverse searches

– scan compressed bit stream
– display selected pictures

• reverse playback

– for specific applications only
– possible without extreme memory requirements

• audio-visual synchronization

– permanent resynchronization of audio and video
– integration of multiple audio and video signals

• robustness to errors

• coding/decoding delay (limited according to specific application)
videotelephone: 150 ms

• editability
possibility of constructing short editing units

• format flexibility

– raster size
– frame rate

• cost tradeoffs

– decoding with small chipsets
– real time encoding possible (1990)

Overview of the MPEG compression algorithm

quality requirements → high compression rate → interframe encoding

random access requirements → intraframe coding

MPEG-1:

• block-based motion compensation for temporal redundancy reduction

– causal (predictive) coding: P frames
– noncausal (interpolative) coding: B frames

• DCT-based spatial redundancy coding (as in JPEG)

Temporal redundancy reduction

frame types:

• intra-frames (I)

– access points for random access
– moderate compression

• prediction frames (P)

– coded with reference to a past (I or P) frame
– used as reference for future P frames

• interpolation (bidirectional prediction) frames (B)

– reference to a past and a future (I or P) frame
– never used as reference

reference always uses motion prediction

ratio I:P:B frames is application-specific


[Figure: frame sequence in display order; P frames use forward prediction, B frames use bidirectional prediction]

frame number: 1 2 3 4 5 6 7 8 9
frame type:   I B B B P B B B I

transmission order: I P B B B I B B B
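The relation between display order and transmission order can be expressed in a few lines: every reference frame (I or P) has to be sent before the B frames that precede it in display order, because those B frames need it for decoding. A minimal sketch (the function name is made up for this example):

```python
def transmission_order(display_types):
    """Return display indices in the order in which frames are transmitted."""
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)     # wait for the next reference frame
        else:
            order.append(index)         # send the I or P frame first ...
            order.extend(pending_b)     # ... then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)             # trailing B frames (no later reference yet)
    return order


display = "IBBBPBBBI"
order = transmission_order(display)
print([i + 1 for i in order])
print(" ".join(display[i] for i in order))
```

For the nine-frame sequence of the slide this sends frames 1, 5, 2, 3, 4, 9, 6, 7, 8.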

[Figure: MPEG encoder. I frame: colour-space converter (RGB → YUV) → FDCT → quantization → entropy encoder → compressed image data (100111001 ...). P/B frame: colour-space converter (RGB → YUV) → motion estimator (compares with reference, yields error terms) → FDCT → entropy encoder → compressed image data (100111001 ...)]

motion compensation

[Figure: block-matching technique. Block B in the current frame is predicted from its best match A in the previous frame and/or its best match C in the future frame:
1. Block B = Block A
2. Block B = Block C
3. Block B = (Block A + Block C)/2]

• prediction

– local modelling of current picture as translation of picture at some previous time
– locality: amplitude and direction of displacement may vary over the picture

• interpolation

– improves random access
– reduces effect of errors
– increases image quality

multiresolution technique:

– subsignal with low temporal resolution (1/3 . . . 1/2 frame rate)
– full-resolution signal = interpolation of low-resolution signal + correction term
– interpolation uses combination of past and future references (bidirectional)

bidirectional prediction

advantages:

• deals properly with areas not covered by prediction

• noise reduction by averaging between past and future reference frames

• allows decoupling between prediction and coding (no error propagation)

• trade-off due to frequency of B frames:
more B frames
→ lower correlation of B frames with references,
→ lower correlation between references
typically: 10 B frames per second
e.g. I B B P B B P B B . . . I B B P B B

motion representation in B frames

macroblock: 16 * 16 pixels

predictor of a macroblock depends on reference frames:

$x$: coordinate of picture element

$m_{j1}$: motion vector relative to reference frame $I_j$ (motion estimation information)

prediction modes:

macroblock type       predictor
intra                 $\hat{I}_1(x) = 128$
forward predicted     $\hat{I}_1(x) = I_0(x + m_{01})$
backward predicted    $\hat{I}_1(x) = I_2(x + m_{21})$
average               $\hat{I}_1(x) = 0.5\,[I_0(x + m_{01}) + I_2(x + m_{21})]$

prediction error in each case: $I_1(x) - \hat{I}_1(x)$
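The four prediction modes can be illustrated on a single scan line. This is a deliberately simplified sketch: frames are one-dimensional lists and motion vectors are plain integers, whereas real macroblocks are 16*16 pixel areas with two-dimensional vectors. The names `predict`, `past` and `future` are made up for this example.

```python
def predict(mode, x, past=None, future=None, m01=0, m21=0):
    """Predictor value at position x for the given macroblock type."""
    if mode == "intra":
        return 128
    if mode == "forward":
        return past[x + m01]
    if mode == "backward":
        return future[x + m21]
    if mode == "average":
        return 0.5 * (past[x + m01] + future[x + m21])
    raise ValueError("unknown prediction mode: " + mode)


# A ramp that moves one pixel to the left from frame to frame.
I0 = [10, 20, 30, 40, 50]   # past reference
I1 = [20, 30, 40, 50, 60]   # current frame
I2 = [30, 40, 50, 60, 70]   # future reference

x = 1
for mode in ("intra", "forward", "backward", "average"):
    prediction = predict(mode, x, past=I0, future=I2, m01=1, m21=-1)
    print(mode, prediction, I1[x] - prediction)   # predictor and prediction error
```

With the correct motion vectors the three motion-compensated modes leave a prediction error of zero, so only the vectors need to be coded.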


motion estimation

computation of motion vectors: not specified in the MPEG standard

typically: block-based matching technique, combined with a cost function

Ic current frame

Ir reference frame

Mi macroblock in Ic

vi displacement of Mi w.r.t. Ir

V search range of possible motion vectors

D cost function

optimal displacement:

$$v_i^* = \arg\min_{v \in V} \sum_{x \in M_i} D\big(I_c(x) - I_r(x - v)\big)$$

(V , D chosen by implementation)
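Since V and D are left to the implementation, the following is only one possible sketch: exhaustive search over a square window, with the absolute difference as cost function D (sum of absolute differences). Frames are lists of rows; all names are made up for this example.

```python
def best_displacement(current, reference, top, left, size, search):
    """Exhaustive block matching for the block at (top, left) in `current`.

    Returns ((dx, dy), cost) minimizing sum |Ic(x) - Ir(x - v)| over the block.
    """
    best_v, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ref_top, ref_left = top - dy, left - dx
            # Skip candidates whose reference block leaves the frame.
            if (ref_top < 0 or ref_left < 0
                    or ref_top + size > len(reference)
                    or ref_left + size > len(reference[0])):
                continue
            cost = sum(
                abs(current[top + y][left + x] - reference[ref_top + y][ref_left + x])
                for y in range(size)
                for x in range(size)
            )
            if best_cost is None or cost < best_cost:
                best_v, best_cost = (dx, dy), cost
    return best_v, best_cost


# Reference frame with a simple pattern; the current frame is the same
# picture moved 2 pixels right and 1 pixel down.
reference = [[x * x + 10 * y for x in range(8)] for y in range(8)]
current = [
    [reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
    for y in range(8)
]
print(best_displacement(current, reference, top=3, left=3, size=2, search=2))
```

Practical encoders usually restrict the search (e.g. hierarchical or logarithmic search) because the exhaustive variant is expensive; that choice is also outside the standard.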


Spatial redundancy reduction

fixed JPEG variant:

• 8 bits per pixel

• 1 luminance component, 2 chrominance components

• fixed DCUs: macroblock with 16*16 luminance pels, 8*8 chrominance pels

• Huffman entropy coding

• sequential encoding


Layered structure, syntax and bit stream

goals

• genericity

• flexibility
video sequence parameters:

– picture width
– picture height
– pixel aspect ratio
– frame rate
– bit rate
– buffer size

• efficiency


layered syntax

• sequence layer (random access unit: context)

• group of frames layer (random access unit: video coding)

• frame layer (primary coding unit)

• slice layer (resynchronizing unit)

• macroblock layer (motion compensation unit)

• block layer (DCT unit)

bit stream

• bit sequence consistent with syntax

• video buffer constraints

• decoding process

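[Figure: MPEG decoder: buffer → demultiplexer (MUX⁻¹) → inverse quantization (Q⁻¹) → IDCT → adder, which combines the result with predictions from the reference frames (Ref) selected by macroblock type and motion vectors]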

Standard and quality

Conformance: encoder and decoder

• bit stream and decoding process: standard defines syntax and meaning

• encoders and decoders: standard defines decoding process

Resolution, bit rates and quality

VHS-like quality at 1.2 Mb/s

constrained parameter bit streams (CPB):

• horizontal size ≤ 720 pels

• vertical size ≤ 576 pels

• max. # macroblocks/picture ≤ 396

• max. # macroblocks/second ≤ 396·25 = 330·30

• frame rate ≤ 30 frames/second

• bit rate ≤ 1.86 Mb/second

• decoder buffer ≤ 376832 bits
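The constraints above can be turned into a simple check. The limits are taken from the list as given here; the function names and the example buffer size are made up for this sketch. Macroblocks are counted as 16*16 pixel tiles, rounding partial tiles up.

```python
import math


def macroblocks_per_picture(width, height):
    return math.ceil(width / 16) * math.ceil(height / 16)


def is_constrained_parameter_stream(width, height, frames_per_second,
                                    bit_rate_mbps, buffer_bits):
    macroblocks = macroblocks_per_picture(width, height)
    return (width <= 720
            and height <= 576
            and macroblocks <= 396
            and macroblocks * frames_per_second <= 396 * 25
            and frames_per_second <= 30
            and bit_rate_mbps <= 1.86
            and buffer_bits <= 376832)


print(macroblocks_per_picture(352, 240), macroblocks_per_picture(352, 288))
print(is_constrained_parameter_stream(352, 240, 30, 1.2, 327680))
print(is_constrained_parameter_stream(720, 576, 25, 1.2, 327680))
```

The two macroblock counts explain the identity 396·25 = 330·30: a 352*288 picture at 25 Hz and a 352*240 picture at 30 Hz produce the same number of macroblocks per second.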

CIF format: 352*240, 30 Hz / 384*288, 25 Hz
yields 1.2–3 Mbps

CIF format is often mixed up with MPEG-1,
but: MPEG-1 allows frame sizes up to 4096*4096!

MPEG-2

for wider range of applications and higher bandwidth

• backward compatibility to MPEG-1

• support for interlaced video

• improvements on coding efficiency

• multiresolution video

• multichannel audio

typical frame sizes (in kbits):

              Mbps   I     P     B     Avg.
MPEG-1 SIF    1.15   150   50    20    38
MPEG-2 601    4.00   400   200   80    130
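The "Avg." column is roughly the bit rate divided by the frame rate. The sketch below assumes 30 frames/s, which is not stated in the table; under that assumption the averages come out close to the listed values.

```python
def average_frame_kbits(bit_rate_mbps, frames_per_second=30):
    """Average coded frame size implied by a bit rate (assumed frame rate)."""
    return bit_rate_mbps * 1000 / frames_per_second


print(round(average_frame_kbits(1.15), 1))  # MPEG-1 SIF, table lists 38
print(round(average_frame_kbits(4.00), 1))  # MPEG-2 601, table lists 130
```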


2.7.4 MPEG-4

Content-based interactivity

• Content-based multimedia data access tools
• Content-based manipulation and bit-stream editing
• Hybrid natural and synthetic data coding
• Improved temporal random access

Compression

• Improved coding efficiency
• Coding of multiple concurrent data streams

Universal access

• Robustness in error-prone environments
• Content-based scalability

Basic concepts

AV objects:

• video object component

• audio object component

video object plane (VOP): 2D video object
a "frame" may consist of

• only 1 VOP (2D)

• 2 or more mutually disjoint VOPs, resulting from the segmentation of a 2D scene

• 2 or more VOPs, resulting from the composition of the scene from several sources

possible object manipulations:

• change of the spatial position of an object (VOP) in the scene

• application of a spatial scaling factor to an object in the scene

• change of the 'speed' with which an object moves in the scene

• inclusion of objects (VOPs) available at the composer but not currently in display

• deletion of an object in the scene

• change of the scene area being displayed

[Figures: a scene with three AVOs; a scene before transformation; the scene after the receiver transformation]

2.8 MPEG-7

2.8.1 Introduction

content description for audio-visual data


Terminology

Data

• audio-visual information,
• described by MPEG-7,
• regardless of storage, coding, display, transmission, medium, or technology.

Feature: distinctive characteristic of the Data

Descriptor: representation of a Feature; defines the syntax and the semantics of the Feature representation

Descriptor Value: an instantiation of a Descriptor for a given data set

Description Scheme (DS): specifies structure and semantics of relationships between Descriptors and/or Description Schemes

Description: DS (structure) + Descriptor Values (instantiations) describing Data

Coded Description: Description encoded for reasons of compression efficiency, error resilience, random access, etc.

Description Definition Language (DDL): language allowing for the creation / extension / modification of Description Schemes and Descriptors

Abstract architecture of MPEG-7 applications

MPEG-7 parts

Systems: tools for

• transport,
• storage,
• synchronization between content and descriptions,
• managing and protecting intellectual property

Description Definition Language: language for defining new Description Schemes + Descriptors

Audio: Descriptors and Description Schemes dealing with (only) Audio descriptions

Visual: Descriptors and Description Schemes dealing with (only) Visual descriptions

Generic entities and Multimedia Description Schemes: Descriptors and Description Schemes dealing with generic features and multimedia descriptions

Reference Software: software implementation of relevant parts of the MPEG-7 Standard

Conformance: guidelines and procedures for testing conformance of MPEG-7 implementations

2.8.2 MPEG-7 systems

tools for

• transport,

• storage,

• synchronization between content and descriptions,

• managing and protecting intellectual property

– to be defined in the future –


2.8.3 MPEG-7 Description Definition Language (DDL)

language for defining new Description Schemes + Descriptors

requirements:

• express spatial, temporal, structural, and conceptual relationships

• rich model for links and references between descriptions and the data

• validation of descriptor data types

• platform and application independent

• human- and machine-readable

→ based on XML syntax


XML Schema Overview

• XML Schema:

– datatypes
– simple and complex types
– elements
– inheritance, abstract types

• MPEG-7 Extensions:

– array and matrix datatypes
– enumerated datatypes for MIME type, country code, region code, currency code and character set code
– typed references

2.8.4 MPEG-7 Audio

Audio description tools for

• Sound effects description
• Instrument description
• Speech Recognition description

Audio Descriptor Framework: low-level audio description

2.8.5 MPEG-7 Video

• Color

• Texture

• Shape

• Motion


Color Descriptors

Color space: RGB, YUV, HSV, HMMD

Dominant color(s)

Color Histogram

Color Quantization

GoF/GoP Color Histogram: Group of Frames/Group of Pictures color histogram (average / median / intersection)

Color-Structure Histogram: local cooccurrence of colors

Color Layout: spatial distribution of color

Haar transformed Binary Histogram: compact descriptor for color (63 bits)
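The idea behind the colour histogram and its group-of-frames variant can be sketched in a few lines. This is an illustration only and not the normative MPEG-7 extraction: frames are flat lists of 8-bit grey or colour-index values, quantization is uniform, and "intersection" is taken to mean the bin-wise minimum, which is an assumption of this example.

```python
from statistics import median


def color_histogram(pixels, bins):
    """Uniformly quantize 8-bit values into `bins` bins and count them."""
    histogram = [0] * bins
    for value in pixels:
        histogram[value * bins // 256] += 1
    return histogram


def gof_histogram(histograms, mode):
    """Aggregate per-frame histograms bin by bin."""
    columns = list(zip(*histograms))
    if mode == "average":
        return [sum(column) / len(column) for column in columns]
    if mode == "median":
        return [median(column) for column in columns]
    if mode == "intersection":
        return [min(column) for column in columns]
    raise ValueError("unknown aggregation mode: " + mode)


frames = [[0, 0, 255, 255], [0, 255, 255, 255], [0, 0, 0, 255]]
histograms = [color_histogram(frame, bins=2) for frame in frames]
print(histograms)
for mode in ("average", "median", "intersection"):
    print(mode, gof_histogram(histograms, mode))
```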


Texture Descriptors

Luminance Edge Histogram: spatial distribution of four directional edges and one non-directional edge

Homogeneous Texture Descriptors: 2 descriptors:

1. structuredness, directionality and coarseness
2. quantitative description (62 factors)

Shape Descriptors

1. Object Bounding Box

2. Region-Based Shape

3. Contour-Based Shape


Motion Descriptors

• Camera Motion

• Object Motion Trajectory


2.8.6 MPEG-7 Multimedia Description Schemes

Content Management


Navigation & Access

Summary Efficient support of browsing

• Hierarchical: Coarse to fine
• Sequential: 1D temporal structure

Variation Substitution of the original content

• Adaptation to terminal, network, or user preferences


MMDS: elements and functionality

Creation & Production: meta information describing creation and production of the content

• title,
• creator,
• classification,
• purpose of the creation,
• etc.

Usage: meta information related to the usage of the content:

• rights holders,
• access right,
• publication,
• financial information.

Media: description of storage media:

• storage format,
• encoding of the AV content,
• identification of the media

Structural aspects: description structured around segments

• physical spatial, temporal or spatio-temporal component of the AV content

• signal-based features (color, texture, shape, motion, audio features) + elementary semantic information

Conceptual aspects: description from the conceptual viewpoint (under development)

2.9 Other media

2.9.1 Music

Temporal

• Representation

– Operational versus symbolic
– MIDI
– SMDL: Standard Music Description Language (SGML)

• Operations

– Playback and synthesis
– Timing
– Editing and composition

2.9.1.1 MIDI

(Music Instrument Digital Interface)

defines interface between electronic music instruments and computers

compact representation of music data (≈ 0.3 kB/sec, vs. 176 kB/sec for CD audio)

basic idea: coding comprises

• name of instrument,

• start/end of note,

• base frequency,

• volume


MIDI: model

• 16 channels for data transmission

– each channel corresponds to a synthesizer
– several instruments can play different notes at the same time
– 3–16 simultaneous notes per channel (subject to quality of synthesizer)

• 128 instruments

– including sound effects (e.g. telephone, helicopter)
– addressed by unique number 0–127

• MIDI-clock

– allows for synchronization between sender and receiver
– 24 ticks per quarter note

• SMPTE time code as alternative to MIDI-clock

– SMPTE = Society of Motion Picture and Television Engineers
– SMPTE defines format: hours:minutes:seconds:frames (e.g. 30 frames/sec)
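Both timing schemes reduce to simple arithmetic. The sketch below assumes the tempo is given in quarter notes per minute and uses a plain (non-drop-frame) frame count for the time code; both function names are made up for this example.

```python
def midi_tick_seconds(tempo_bpm, ticks_per_quarter=24):
    """Duration of one MIDI clock tick at the given tempo."""
    return 60.0 / (tempo_bpm * ticks_per_quarter)


def smpte_time_code(total_frames, frames_per_second=30):
    """Format a frame count as hours:minutes:seconds:frames."""
    frames = total_frames % frames_per_second
    total_seconds = total_frames // frames_per_second
    return "%02d:%02d:%02d:%02d" % (
        total_seconds // 3600,
        total_seconds // 60 % 60,
        total_seconds % 60,
        frames,
    )


print(midi_tick_seconds(120))      # about 20.8 ms per tick at 120 bpm
print(smpte_time_code(111845))
```

The MIDI clock is tempo-relative (ticks follow the music), while the SMPTE time code is absolute, which is why it suits synchronization with film and video.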


MIDI infrastructure

• components

– input: typically via keyboard (like piano); different instruments can be imitated
– output: typically via synthesizer (transforms the stored digital signal into an acoustic signal via a D/A converter)
– sequencer as editor for MIDI data; user interface: notes / technical MIDI data

• multimedia applications based on MIDI data allow for instantaneous output via synthesizer

– MIDI requires precise timing of data transmission

2.9.2 Graphics

Non-temporal

• Representation

– Geometric models (used in GKS, PHIGS, PEX)
– Solid models: constructive solid geometry, surfaces of revolution, extrusion
– Physically based models (considering mass, velocity, rigidity)
– Empirical models: fractals, particle systems
– Drawing models: PostScript, LOGO graphics
– External formats for models: CGM, RenderMan Interface Bytestream (RIB)

• Operations

– Primitive editing: for objects, of vertex coordinates, surface normals
– Structural editing: creating, modifying, spatial relationships
– Shading: flat, Gouraud, Phong, ray tracing, radiosity, programmable shaders
– Mapping: texture mapping, bump mapping, displacement mapping, environment mapping, shadow mapping
– Lighting: ambient light, point lights, directional lights, spot lights
– Viewing: 2D or 3D, parallel and perspective projections
– Rendering: converts a model (shading, lighting, viewing info) into an image

2.9.3 Animation

Temporal

• Representation

– Cel models: celluloid sheets
– Scene-based models
– Event-based models
– Keyframes
– Articulated objects and hierarchical models
– Scripting and procedural models
– Physically based and empirical models

• Operations

– Graphics
– Motion and parameter control
– Animation rendering
– Animation playback

Other Media

• Media type Other: Extended Images

• Media type Other: Digital ink

• Media type Other: Speech audio

• Media type Temporal: Animation


2.10 Multimedia

• MHEG

• SMIL


2.10.1 MHEG

standard for interoperability and interchange of hypermedia (MH) objects

application areas:

• training and education

• documentation

• electronic books

• computer-supported multimedia cooperative work

• point of information

• medical applications

issues:

• association of content and presentation attributes

• synchronization in space and time

• linking between components


2.10.1.1 The MHEG Standard

object: coded representation of an independent and elementary unit of information

objects interchanged and handled by applications

types of objects:

• monomedia

– text
– graphics
– image
– audio
– video
– menu

• aggregated objects
different media, with internal synchronization and links

input/output objects

Specificity of the MHEG Standard Scope

• interactivity and multimedia synchronization

• real-time presentation

• real-time interchange

• final form presentations


2.10.1.2 MH Objects Classes

Object Orientation

advantages of object-orientation:

• data encapsulation

• inheritance

• homogeneity of MH object descriptions

• representation of behaviour (autonomous objects in highly dynamic environment)

Representation of MH Objects

content object: encoded monomedia data + decoding and presentation information

projector object: presentation attributes for content or composite object

basic object: content + projector object

composite object: set of MH objects + temporal and spatial interobject relations

conditional action set object: defines relations based on conditions

generic input object: defines selection + text input methods
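
The object kinds listed above can be sketched as plain data classes. This is only an illustrative model. The class and field names (`ContentObject`, `encoding`, `attributes`, `relations`, and so on) are assumptions made for this sketch and are not taken from the MHEG standard.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ContentObject:
    """Encoded monomedia data plus decoding/presentation information."""
    medium: str    # e.g. "text", "still picture", "audio"
    encoding: str  # hypothetical field, e.g. "JPEG"
    data: bytes


@dataclass
class ProjectorObject:
    """Presentation attributes for a content or composite object."""
    attributes: dict  # hypothetical: free-form attribute map


@dataclass
class BasicObject:
    """Basic object = content object + projector object."""
    content: ContentObject
    projector: ProjectorObject


@dataclass
class CompositeObject:
    """Set of MH objects plus temporal and spatial inter-object relations."""
    components: List[Union[BasicObject, "CompositeObject"]]
    relations: List[str] = field(default_factory=list)  # hypothetical encoding


# A still picture with an area projector, wrapped in a composite.
picture = BasicObject(
    ContentObject(medium="still picture", encoding="JPEG", data=b""),
    ProjectorObject({"position": (1, 2), "area_size": (2, 1)}),
)
scene = CompositeObject(components=[picture], relations=["parallel"])
```

Keeping content and projector separate means the same encoded data could be presented with different attributes without being duplicated.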


MH Object Classes

object hierarchy:

MH object

• all-object

• clock

• null


• all-object

– output content

∗ text content

∗ graphics content

∗ still picture content

∗ audio content

∗ audiovisual sequence content

– generic input

∗ action-button

∗ stay-on-button

∗ on-off button

∗ menu selection

∗ multiple selection

∗ etc . . .

– projector

∗ area projector

· text projector

· graphics projector

· still picture projector

· input projector

∗ audio projector

∗ audiovisual projector

– basic

– spatio-temporal composites

– conditional action set


2.10.1.3 Methodology for MH Object Classes Description

4 levels:

1. informal text description

2. object-oriented definition

• class hierarchy

• class behaviour

• structure and semantics of attributes

3. notation of structure of presentation: ASN.1 syntax (abstract syntax notation)

4. coded object representation (ASN.1)


2.10.1.4 Basic Objects Representation

basic object = content + projector object

content class:

• general attributes (inherited from superclass)

• specific attributes for encoding parameters

projector class: parameters relevant for presentation

• area projector: position + area size

• audio projector: volume, stereo/mono, balance, speed


example: still picture object class (object-oriented definition)

description

inherits from = content class

inherited by = NONE

representation (notation of structure)

• coding method

• coding parameters

• JPEG-parameters

• Huffman/arithmetic

• progressive/sequential

• color space

• source pixel density

• source data precision

• source image format


2.10.1.5 Composite Objects: Multimedia Synchronization

General Considerations

synchronization modes:

• script defined by a using application

• system synchronization already provided within the object (e.g. MPEG)

• spatiotemporal synchronization provided by composition of child objects within parent object

• conditional synchronization provided by management of events generated by

– other objects,

– user's interaction, or

– a using application


description of conditional synchronization

condition: combination of event(s) + additional conditions

event

• event type: start/end of object, elapsed time

• object id

• current state of object: running, stopped, selected

e.g. end, object n_i, state=running

additional condition: describes context in which event occurs

e.g. object n_j, state=stopped

action: to be performed when condition=true

conditional action set: set of (condition, action) pairs
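
The evaluation of a conditional action set can be sketched as follows. This is a minimal illustration, not the MHEG encoding: the names `Event`, `Condition`, `dispatch`, and the action "start n_j" are hypothetical, and object states are assumed to be available as a simple dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    event_type: str  # e.g. "start", "end", "elapsed time"
    object_id: str   # object the event refers to
    state: str       # "running", "stopped" or "selected"


@dataclass(frozen=True)
class Condition:
    """Triggering event plus additional (object id, state) checks."""
    event: Event
    additional: Tuple[Tuple[str, str], ...] = ()

    def holds(self, event: Event, states: Dict[str, str]) -> bool:
        # True only if the event matches and every context check passes.
        return event == self.event and all(
            states.get(obj) == state for obj, state in self.additional
        )


Action = Callable[[], None]
ConditionalActionSet = List[Tuple[Condition, Action]]


def dispatch(event: Event, states: Dict[str, str],
             action_set: ConditionalActionSet) -> int:
    """Run every action whose condition is true; return how many fired."""
    fired = 0
    for condition, action in action_set:
        if condition.holds(event, states):
            action()
            fired += 1
    return fired


# Example from the text: event (end, object n_i, state=running) with the
# additional condition (object n_j, state=stopped).
log: List[str] = []
action_set: ConditionalActionSet = [
    (Condition(Event("end", "n_i", "running"), (("n_j", "stopped"),)),
     lambda: log.append("start n_j")),  # hypothetical action
]
fired = dispatch(Event("end", "n_i", "running"), {"n_j": "stopped"}, action_set)
```

If object n_j were in state running instead, the same event would trigger no action, because the additional condition would fail.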


[Figure: timeline of a multimedia scenario, from the start of the scenario to END along the time axis. Labels: Picture, Sound (S1, S2), Text 1, Text 2, Graphics 1, MPEG sequence (picture no. x), text and graphics on video, User response, Fixed-delay input, Delay, Max time, T1, T2, Alternate scenario. Marked points are synchronization conditioning events (generated by the presentation process, or by the user's interaction).]


Space and Time Relations

placement of objects in space and time, based on attributes:

spatial position

• parallel relation

• serial relation

[Figure: Parallel spatial relation in the MH generic coordinate space. The areas of object A and object B are drawn on a grid starting at the MH generic space origin; object A is labelled X1=1, Y1=2 and object B is labelled X2=3, Y2=1.]


[Figure: Serial spatial relation in the MH generic coordinate space. The areas of object A and object B are drawn on a grid starting at the MH generic space origin; object A is labelled X1=1, Y1=3 and object B is labelled X2=2, Y2=-1.]
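
The difference between the two spatial relations can be sketched in a few lines. The sketch rests on an assumption that the slides do not state explicitly: in a parallel relation every offset is measured from the common origin, while in a serial relation each offset is measured from the position of the previously placed object (which would explain the negative Y2 in the serial figure). The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def place_parallel(offsets: List[Point], origin: Point = (0, 0)) -> List[Point]:
    """Assumed semantics: every offset is measured from the common origin."""
    ox, oy = origin
    return [(ox + dx, oy + dy) for dx, dy in offsets]


def place_serial(offsets: List[Point], origin: Point = (0, 0)) -> List[Point]:
    """Assumed semantics: each offset is measured from the previous object."""
    positions = []
    x, y = origin
    for dx, dy in offsets:
        x, y = x + dx, y + dy
        positions.append((x, y))
    return positions


# Offsets taken from the two figures.
parallel = place_parallel([(1, 2), (3, 1)])  # [(1, 2), (3, 1)]
serial = place_serial([(1, 3), (2, -1)])     # [(1, 3), (3, 2)]
```

Under this reading, moving object A in a serial relation also moves object B, whereas in a parallel relation the two objects are positioned independently.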


temporal position

• parallel relation

• serial relation

[Figure: Temporal parallel relation. A parent object contains child object 1 and child object 2; the offsets t1 and t2 are marked.]


[Figure: Temporal serial (or sequential) relation. A parent object contains child object 1 followed by child object 2; the offsets t1 and t2 are marked.]
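
A small scheduling sketch may help to contrast the two temporal relations. It assumes, without confirmation from the slides, that in a parallel relation each delay is measured from the start of the parent object, and that in a serial relation each delay is measured from the end of the preceding child, so child durations must be known. Function names and the numeric values are hypothetical.

```python
from typing import List, Tuple


def schedule_parallel(delays: List[float], parent_start: float = 0.0) -> List[float]:
    """Assumed semantics: each delay is relative to the parent's start."""
    return [parent_start + d for d in delays]


def schedule_serial(children: List[Tuple[float, float]],
                    parent_start: float = 0.0) -> List[float]:
    """children holds (delay, duration) pairs.

    Assumed semantics: each delay is relative to the end of the previous
    child (or to the parent's start for the first child).
    """
    starts = []
    t = parent_start
    for delay, duration in children:
        start = t + delay
        starts.append(start)
        t = start + duration  # the next delay counts from this child's end
    return starts


parallel_starts = schedule_parallel([1.0, 3.0])            # t1 = 1, t2 = 3
serial_starts = schedule_serial([(1.0, 4.0), (2.0, 5.0)])  # t1 = 1, t2 = 2
```

With these values the parallel children start at 1.0 and 3.0, while the serial children start at 1.0 and 7.0, because the second delay only begins once the first child has finished.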


General Framework for Spatiotemporal Composition Representation

representation of composite objects:

1. description of relationship in terms of position in time and space

2. list of component objects

component object:

• contained in the composite object, or

• referenced by application-provided instance number, or

• standardized reference to external object


2.10.1.6 Input Objects

buttons

• action-button: trigger, yields event

• stay-on-button: trigger + local boolean variable

• switch button: two-state input object

menu selection: yields number of selected item

multiple selection: yields indication of selected items

character string: character sequence + text attributes

location: yields horizontal + vertical coordinates

numerical value: yields integer between minimum and maximum, linearly related to cursor position
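
The linear relation for the numerical value input can be written out as a short function. This is a sketch under assumptions: the cursor is taken to move along a track of known length, positions outside the track are clamped, and the result is rounded to the nearest integer. The name `numerical_value` and its parameters are hypothetical.

```python
def numerical_value(cursor: float, track_length: float,
                    minimum: int, maximum: int) -> int:
    """Map a cursor position on a track linearly to [minimum, maximum]."""
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    # Fraction of the track covered by the cursor, clamped to [0, 1].
    fraction = min(max(cursor / track_length, 0.0), 1.0)
    return minimum + round(fraction * (maximum - minimum))


quarter = numerical_value(50, 200, 0, 100)    # a quarter of the track
clamped = numerical_value(250, 200, 0, 100)   # beyond the end of the track
```

Here a cursor at a quarter of the track yields 25 on a 0 to 100 scale, and a cursor beyond the end of the track is clamped to the maximum.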


2.11 SMIL

Synchronized Multimedia Integration Language (W3C standard)

motivation:

• spatio-temporal composition of presentations

• declarative specification

• text-based format

• specified as XML-DTD

• non-interactive presentations only! (except via linking)


SMIL concepts

• media objects referenced via URIs

• spatial and temporal addressing by means of intervals and regions

• all objects in a single root window

• Z index for layer ordering for visual display

• specification of temporal synchronization

• hard and soft synchronization:

hard: for audio-video synchronization, limited jitter

soft: for background music; only fixed starting time

• alternative content for different presentation quality/output devices

• flexible linking model

• semantic annotations


SMIL example

<smil>
  <head>
    <meta name="Title" content="Welcome to RealPlayer" />
    <meta name="Author" content="RealNetworks" />
    <meta name="Copyright" content="(c) Real" />
    <layout>
      <root-layout height="300" width="350" background-color="black" />
      <region id="full_screen" left="0" top="0"
              height="300" width="350" fit="fill" z-index="1" />
    </layout>
  </head>
  <body>
    <par>
      <audio src="firstrun.rm" />
      <animation src="firstrun.swf" region="full_screen" fill="freeze">
        <anchor href="command:openwindow(tutorial, http://ramhurl.real.com/g2install.html?file=tutorials/607/free/overview.smi)"
                coords="40,130,315,160" begin="14.9s" />
        <anchor href="command:openwindow(tutorial, http://ramhurl.real.com//take5demo.smi)"
                coords="40,170,315,200" begin="14.9s" />
        <anchor href="command:openwindow(tutorial, http://ramhurl.real.com/start.smi)"
                coords="40,205,315,235" begin="14.9s" />
      </animation>
    </par>
  </body>
</smil>
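
Because SMIL is a text-based XML format, a presentation like the one above can be inspected with any XML parser. The following sketch uses Python's standard `xml.etree.ElementTree` module on an abbreviated copy of the example (metadata and two of the three anchors are left out for brevity). It is not a SMIL player; it only reads out the layout regions, the media objects played in parallel, and the timed anchor. The variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the SMIL example (assumption: shortened for brevity).
SMIL = """<smil>
  <head>
    <layout>
      <root-layout height="300" width="350" background-color="black" />
      <region id="full_screen" left="0" top="0"
              height="300" width="350" fit="fill" z-index="1" />
    </layout>
  </head>
  <body>
    <par>
      <audio src="firstrun.rm" />
      <animation src="firstrun.swf" region="full_screen" fill="freeze">
        <anchor href="command:openwindow(tutorial, http://ramhurl.real.com/start.smi)"
                coords="40,205,315,235" begin="14.9s" />
      </animation>
    </par>
  </body>
</smil>"""

root = ET.fromstring(SMIL)

# Regions with their z-index (layer ordering for visual display).
regions = {r.get("id"): int(r.get("z-index", "0")) for r in root.iter("region")}

# Direct children of <par> are presented in parallel.
parallel_media = [(child.tag, child.get("src")) for child in root.find("body/par")]

# Anchors: clickable area and the time at which each becomes active.
anchors = [(a.get("coords"), a.get("begin")) for a in root.iter("anchor")]
```

The `<par>` element is what expresses temporal composition here: the audio and the animation start together, and the anchor becomes active 14.9 seconds into the animation.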
