
Chapter 2

Media

• Media classification
• Requirements for media representations
• Coding and compression methods
• Text
• Images
• Audio
• Video
• Other media


2.1 Media Classification

2.1.1 Basic concepts

kinds of media:

• perception media
  how do humans perceive the information?
  – viewing: text, image, video
  – listening: music, sound, speech
  – touching
  – tasting
  – smelling
  – balance

• representation media
  how is the information coded?
  e.g. text in ASCII

• presentation media
  which devices are used for I/O to/from the computer?
  – input: keyboard, camera, microphone, mouse, dataglove
  – output: paper, monitor, loudspeaker

Universitat Dortmund, Informatik VI, N. Fuhr

• storage media
  where is information stored?
  microfilm, paper, floppy disc, hard disc, CD-ROM, DVD, tape

• communication media
  what is used for transmitting information?
  coax cable, twisted pair, FDDI, electromagnetic waves

• information exchange media
  what is used for exchanging information between different sites?
  paper, floppy disc, CD, microfilm
  (see also: communication media)

here: perception media


presentation space

each medium yields presentation value in presentation space

presentation value:
representation of information in the medium
e.g. text: sequence of characters
     speech: sound waves

dimensions of presentation space

• spatial dimensions (2-3)
• temporal dimension (1)

classification of media according to temporal dimension

• discrete: time independent
  e.g. text, graphics

• continuous (temporal): time dependent
  e.g. video, audio, sensor signals


[Figure: presentation space with axes x, y and t; labels: image, audio, video]

text is a linear medium

...


2.1.2 Data streams

required for continuous media

properties of data streams:

• classical:
  – asynchronous
  – synchronous:
    finite upper bound for end-to-end time difference
  – isochronous:
    finite upper bound for start and end time difference

• periodicity
  change of time interval for transmission of data packets
  – periodical
  – weakly periodical
  – aperiodical
    e.g. for transmission of events

• variation of data rate
  for subsequent information units
  – uniform
  – weakly uniform
    periodic variation of data volume per information unit
    e.g. MPEG: ratio of I:P:B frames
  – varying

• dependence of subsequent packets
  – dependent
  – independent (data stream with “holes”)

• information units
  can be defined differently
  here: information unit = logical data unit
  different granularities possible
  e.g. video: pixel - raster - frame - clip - film


2.2 Requirements for media representations

• compression
• easy processing
• transmission (progressive mode)
• referencing/addressing
• logical structure
• layout specification
• attributes
• annotation


2.3 Coding and compression methods

goal: reduction of storage/bandwidth requirements

2.3.1 Classification of methods

• lossless
  exploitation of redundancy (entropy)

• lossy
  goal: minimum impact on presentation quality


types of coding methods:

• entropy coding
  – run-length coding
  – Huffman coding
  – arithmetic coding

• source coding
  – prediction: DPCM
  – transformation: FFT, DCT

• hybrid coding: JPEG, MPEG


2.3.2 Basic methods

2.3.2.1 Lossless coding methods

run-length coding
encoding of byte streams
in case of frequent repetition of a byte: byte + # occurrences
(requires an escape byte)
example with escape byte "!" (runs shorter than 4 stay unchanged):
ABCAAABBBBCCCCCD → ABCAAA!4B!5CD

zero suppression
special case of run-length coding,
only run length of special byte is coded
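A minimal Python sketch of run-length coding as in the example above. The escape character "!", the minimum run length of 4 and the single-digit run counter (at most 9) are assumptions read off the example, and the input is assumed not to contain the escape character itself.

```python
ESC = "!"      # escape byte, as in the example
MIN_RUN = 4    # shorter runs are copied unchanged (assumption from the example)
MAX_RUN = 9    # single decimal digit for the run length (simplifying assumption)

def rle_encode(data):
    """Replace runs of MIN_RUN..MAX_RUN equal characters by ESC + count + char."""
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == ch and run < MAX_RUN:
            run += 1
        out.append(f"{ESC}{run}{ch}" if run >= MIN_RUN else ch * run)
        i += run
    return "".join(out)

def rle_decode(code):
    """Inverse of rle_encode: expand every ESC + count + char triple."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == ESC:
            out.append(code[i + 2] * int(code[i + 1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return "".join(out)

encoded = rle_encode("ABCAAABBBBCCCCCD")
```

A real byte-stream coder would additionally have to escape occurrences of the escape byte in the input.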


pattern substitution
replaces frequent patterns by single codes

frequently used: LZW (Lempel-Ziv-Welch)

uses adaptive table of predefined size
codes are pointers into the dictionary
(typically 9-14 bits)


dictionary initialization:
character set = codes 0 ... 255

encoding:
sequential processing of input characters

1. if string is in table, append next char
2. if string is not in table:
   a) output last known string's code
   b) add new string to table
   c) start new string with char

example (input: ababcba):

Prefix  Suffix  New String  Output
∆       a       a           -
a       b       ab          97
b       a       ba          98
a       b       ab          -
ab      c       abc         256
c       b       cb          99
b       a       ba          -
ba      ∆       ba          257
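The encoding steps can be sketched in Python as follows. This is an illustration only: the dictionary is an unbounded Python dict (the fixed table size and the 9-14 bit code width are not modelled), and the function name `lzw_encode` is chosen here.

```python
def lzw_encode(text):
    """Return the list of LZW codes for text (dictionary starts with codes 0..255)."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    prefix = ""
    output = []
    for ch in text:
        candidate = prefix + ch
        if candidate in table:
            # step 1: string is in table, extend it with the next char
            prefix = candidate
        else:
            # step 2: output code of last known string, add new string, restart
            output.append(table[prefix])
            table[candidate] = next_code
            next_code += 1
            prefix = ch
    if prefix:
        output.append(table[prefix])  # flush the last known string
    return output

codes = lzw_encode("ababcba")
```

For the input of the table above this yields the output column 97, 98, 256, 99, 257.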


statistical coding

• characters encoded with different # bits
• frequent characters with few bits,
  infrequent characters with more bits


a) Huffman coding
requires probability of occurrence for each character
minimizes # bits for average message

varying # bits for different characters
→ prefix property necessary (decoding without backtracking)


Huffman code example

byte  prob.  code
A     0.40   00
B     0.20   01
C     0.20   10
D     0.10   110
E     0.10   111

[Figure: binary code tree with edges labelled 0/1 and leaves A, B, C, D, E]

avg. # bits/character: 2.2

theoretical optimum:

$H = \sum_i p_i \cdot \operatorname{ld} \frac{1}{p_i} = 2.12$


algorithm for code development:

• order characters by decreasing probabilities
• repeat
  – select 2 lines with lowest probabilities
  – assign bit for distinction
  – join lines, form new line with sum of probabilities
  until 1 line left
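The algorithm can be sketched in Python with a priority queue. The probabilities are those of the Huffman code example (A to E); the function names are chosen here. Because several lines have equal probability, the individual code words may differ from the table in the example, but the average length is the same for every Huffman code.

```python
import heapq
from math import log2

def huffman_code(probs):
    """Build a prefix code by repeatedly joining the two least probable lines."""
    # heap entries: (probability, tie-breaker, {symbol: code so far})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # assign one bit for distinction, then join both lines
        joined = {s: "0" + c for s, c in codes0.items()}
        joined.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, counter, joined))
        counter += 1
    return heap[0][2]

def average_length(probs, codes):
    return sum(p * len(codes[s]) for s, p in probs.items())

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs.values())

probs = {"A": 0.40, "B": 0.20, "C": 0.20, "D": 0.10, "E": 0.10}
codes = huffman_code(probs)
```

`average_length(probs, codes)` gives the 2.2 bits/character of the example, and `entropy(probs)` the theoretical optimum of about 2.12.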


char  prob.      char  prob.      char  prob.
E     0.13       D     0.04       W     0.015
T     0.09       L     0.035      G     0.015
A     0.08       C     0.03       V     0.010
O     0.08       U     0.03       J     0.005
N     0.07       M     0.03       K     0.005
R     0.065      F     0.02       X     0.005
I     0.065      P     0.02       Q     0.0025
H     0.06       Y     0.02       Z     0.0025
S     0.06       B     0.015


[Figure: Huffman tree construction for these probabilities; inner nodes carry the summed probabilities up to the root value 1.0]


b) arithmetic coding

• optimum coding (like Huffman),
  but assigns fractions of bits to single characters

• encodes character by considering leading characters

idea:
assign each symbol unique interval ⊂ [0, 1]
(width = character probability)

character string = nesting of intervals
resulting interval represented as floating point number

code definition:

• fix symbol order
• assign disjoint ranges [l[s], h[s]) of [0, 1] to symbols s,
  width h[s] − l[s] = character probability

encoding of string s1, ..., sn:

b = l[s1]
t = h[s1]
for i = 2 to n do
    r = t − b
    t = b + r · h[si]
    b = b + r · l[si]

• output: arbitrary floating point number ∈ [b, t)


example for arithmetic coding

byte  prob.  range
A     0.40   [0.0, 0.4)
B     0.20   [0.4, 0.6)
C     0.20   [0.6, 0.8)
D     0.10   [0.8, 0.9)
E     0.10   [0.9, 1.0)
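A small Python sketch of the interval nesting for the ranges in this table. Exact fractions are used instead of floating point numbers to avoid rounding effects; the function names and the choice of the string "ABC" are illustrative assumptions. In each step the new upper bound is computed from the old lower bound, so both bounds are updated together.

```python
from fractions import Fraction

# ranges [l[s], h[s]) from the example table
RANGES = {
    "A": (Fraction("0.0"), Fraction("0.4")),
    "B": (Fraction("0.4"), Fraction("0.6")),
    "C": (Fraction("0.6"), Fraction("0.8")),
    "D": (Fraction("0.8"), Fraction("0.9")),
    "E": (Fraction("0.9"), Fraction("1.0")),
}

def encode(text):
    """Return the final interval [b, t) for a non-empty string."""
    b, t = RANGES[text[0]]
    for s in text[1:]:
        r = t - b
        low, high = RANGES[s]
        b, t = b + r * low, b + r * high  # both use the old value of b
    return b, t

def decode(value, n):
    """Recover n symbols from any number inside the final interval."""
    out = []
    for _ in range(n):
        for s, (low, high) in RANGES.items():
            if low <= value < high:
                out.append(s)
                value = (value - low) / (high - low)  # rescale to [0, 1)
                break
    return "".join(out)

b, t = encode("ABC")  # interval [0.208, 0.224)
```

Any number in [b, t), for instance 0.216, decodes back to "ABC" when the length 3 is known.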


transformation coding

• transforms values into different mathematical space
  (which is suited better for coding)

• examples:
  discrete cosine transform (DCT)
  fast Fourier transform (FFT)


prediction / relative coding
encodes only differences between subsequent bytes/blocks
examples:

• integers
  5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60
  encode differences:
  5, 3, 4, 1, 2, 3, 5, 5, 1, 11, 20
  → smaller # bits/entry required

• images:
  homogeneous area → small differences between neighbouring pixels
  → many 0 differences → zero suppression/run-length encoding

• still video
  small differences between subsequent images
  (e.g. in background)

• audio:
  differential pulse code modulation:
  encoding of differences between subsequent PCM values
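The integer example can be reproduced with a few lines of Python. This is only a sketch of relative coding on a list of integers; the function names are chosen here, and the first value is stored unchanged as in the example.

```python
from itertools import accumulate

def delta_encode(values):
    """Keep the first value, then store only differences to the predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Running sums restore the original values."""
    return list(accumulate(deltas))

values = [5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60]
deltas = delta_encode(values)
```

The differences are smaller than the original values, so fewer bits per entry suffice; decoding is a running sum.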


adaptive coding

• other coding methods:
  suitable only in typical contexts
  non-typical byte sequence → no compression

• adaptive methods
  – adapt to specific context
  – but require additional transmission of coding parameters


2.3.2.2 Lossy coding methods

vector quantization
divides byte stream into blocks of n bytes
uses table with patterns,
block approximated by pattern
block encoded as index in pattern table

• linear quantization
• logarithmic quantization
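A minimal Python sketch of vector quantization with blocks of n = 2 values. The pattern table (codebook), the sample data and the squared-distance criterion for "approximated by pattern" are assumptions made for illustration; the data length is assumed to be a multiple of n.

```python
def nearest_pattern(block, codebook):
    """Index of the pattern with the smallest squared distance to block."""
    def distance(pattern):
        return sum((x - y) ** 2 for x, y in zip(block, pattern))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

def vq_encode(data, codebook):
    """Split data into blocks of pattern length and encode each as a table index."""
    n = len(codebook[0])
    return [nearest_pattern(data[i:i + n], codebook) for i in range(0, len(data), n)]

def vq_decode(indices, codebook):
    """Replace every index by its pattern (lossy: the original block is gone)."""
    return [v for i in indices for v in codebook[i]]

codebook = [(0, 0), (100, 100), (200, 50)]   # hypothetical pattern table
data = [3, 1, 98, 105, 190, 60, 0, 4]
indices = vq_encode(data, codebook)
approximation = vq_decode(indices, codebook)
```

Only the indices are stored or transmitted, so the decoded values approximate the input but do not reproduce it exactly.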

subband coding

• transformation of certain frequencies only
• quality criterion: # bands
• used for speech, MPEG audio


wavelets

wavelet functions:

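• orthogonal basis of functions
• squared errors sum up

Haar basis:

$e(x) = \alpha_0 + \sum_{i=1}^{k} \sum_{j=1}^{2^{i-1}} \alpha_{ij} \cdot w_{ij}(x)$

$w_{ij}(x) = \begin{cases} 1, & \text{if } \frac{2j-2}{2^i} \le x < \frac{2j-1}{2^i} \\ -1, & \text{if } \frac{2j-1}{2^i} \le x < \frac{2j}{2^i} \\ 0, & \text{otherwise} \end{cases}$

derivation of e(x) for example function:

function values:          9  7  2  6
averages | differences:   8  4 | 1  −2
average | difference:     6 | 2

$e(x) = 6 + 2 w_{11}(x) + 1 w_{21}(x) - 2 w_{22}(x)$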
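The pairwise averaging and differencing used in this derivation can be sketched in Python. The function names are chosen here, and the input length is assumed to be a power of two; the result is ordered as α0, α11, α21, α22.

```python
def haar_decompose(values):
    """Return [alpha_0, alpha_11, alpha_21, alpha_22, ...] for 2^k function values."""
    coeffs = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        details = [(a - b) / 2 for a, b in pairs]
        coeffs = details + coeffs   # coarser levels go in front
        current = averages
    return current + coeffs

def haar_reconstruct(coeffs):
    """Inverse transform: average + detail and average - detail per pair."""
    current = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current

coeffs = haar_decompose([9, 7, 2, 6])
```

For the example function the coefficients are 6, 2, 1 and −2, matching e(x) above, and `haar_reconstruct` returns the original values.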


example function and Haar basis

[Figure: the example function and the Haar basis functions w0, w11, w21, w22 on the interval from 0 to 1]


task

approximate f(x) by f′(x) such that

$\|f(x) - f'(x)\| < \varepsilon$

$\sum_x (f(x) - f'(x))^2 < \varepsilon$

where f′(x) is wavelet function and

$|\{\alpha_{ij} \mid \alpha_{ij} \neq 0\}| = \min$

solution

sort coefficients by $|\alpha_{ij}| \cdot 2^{-i/2}$

(gives order of increasing squared error)

find maximum n such that setting first n $\alpha_{ij} = 0$ yields

$\|f(x) - f'(x)\| < \varepsilon$

example:

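$e(x) = 6 + 2 w_{11}(x) + 1 w_{21}(x) - 2 w_{22}(x)$

coefficients:

$(6, \alpha_0),\ (2, \alpha_{11}),\ (+1, \alpha_{21}),\ (-2, \alpha_{22})$


sorted by increasing squared error:

$(\tfrac{1}{2}, \alpha_{21}),\ (\tfrac{2}{2}, \alpha_{22}),\ (\tfrac{2}{\sqrt{2}}, \alpha_{11}),\ (6, \alpha_0)$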
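The ordering can be checked with a short Python sketch. The tuple layout (name, level i, value) and the names are assumptions made here; α0 is treated as level 0, so its sort key is its absolute value.

```python
# (name, level i, value) for e(x) = 6 + 2*w11 + 1*w21 - 2*w22
coefficients = [("a0", 0, 6), ("a11", 1, 2), ("a21", 2, 1), ("a22", 2, -2)]

def sort_key(coeff):
    """|alpha_ij| * 2^(-i/2): grows with the squared error caused by dropping alpha_ij."""
    _, level, value = coeff
    return abs(value) * 2 ** (-level / 2)

ranked = sorted(coefficients, key=sort_key)
order = [name for name, _, _ in ranked]
keys = [sort_key(c) for c in ranked]
```

The first entries of `ranked` are the coefficients that can be set to 0 with the least damage.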


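Example:
non-Haar basis - squared errors do not sum up

single errors:   $l \cdot a^2$  and  $\frac{l}{2} b^2$

combined error:  $\frac{l}{2} a^2 + \frac{l}{2}\left(a^2 + b^2 + 2ab\right) = l \cdot a^2 + \frac{l}{2} b^2 + l \cdot ab$

Haar basis - squared errors sum up

single errors:   $l \cdot a^2$  and  $\frac{l}{2} b^2$

combined error:  $\frac{l}{2} a^2 + \frac{l}{4}\left((a + b)^2 + (a - b)^2\right)$

$= \frac{l}{2} a^2 + \frac{l}{4}\left(a^2 + 2ab + b^2 + a^2 - 2ab + b^2\right) = l \cdot a^2 + \frac{l}{2} b^2$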


2-dimensional case


standard wavelet decomposition

[Figure: image matrix; step "transform rows", then step "transform columns"]

nonstandard wavelet decomposition

[Figure: image matrix; steps "transform rows" and "transform columns"]

example wavelet compression

[Figure: (a) original image, (b) 19% of coefficients (5% error), (c) 3% of coefficients (10% error), (d) 1% of coefficients (15% error)]


2.4 Text

2.4.1 Media type

Non-temporal: Text

• Representation
  – ASCII, ISO character sets
  – Marked-up, structured text
  – Hypertext

• Operations
  – Operations: character, string, language-specific
  – Editing, formatting
  – Pattern-matching and searching
  – Sorting
  – Compression
  – Encryption



2.4.2 SGML

markup language for text,
worldwide standard

markup approaches:

1. punctuation
2. layout (WYSIWYG)
3. procedural (Troff, TeX, LaTeX)
4. descriptive (GML, SGML)
5. referential (embed, include; SGML)
6. meta-markup


SGML standards

• SGML = ISO 8879,
  Standard Generalized Markup Language

• related standards:
  – ISO 10179: DSSSL,
    Document Style Semantics and Specification Language
    (layout specification language for SGML documents)
  – ISO 8613: ODA,
    Office Document Architecture
    (formatting, presentation, exchange)
    ODML: SGML-DTD for ODA documents


properties of SGML

SGML is

• markup language, database language
• extensible document description language
• meta language for the definition of document types

SGML supports

• logical structures, hierarchies
• linking and addressing of files
• multimedia and hypertext


Processing of SGML documents

[Figure: SGML parser processing the documents Doc1, Doc2, Doc3 with DTD 1 and DTD 2; DSSSL specifications 1 and 2 lead to formatted documents and displayed documents]

• syntax checking (according to a DTD)
• printing according to a DSSSL specification
• presentation on a screen
  (according to a DSSSL specification)
• indexing for context-oriented search
• transformation into other representations


SGML markup

SGML supports 4 types of markup:

1. descriptive: tags
2. referential: references to objects
3. meta markup: markup declarations (DTD)
4. procedural: LINK, CONCUR


Descriptive markup

• SGML document consists of elements
  <author><first>John</first><last>Smith</last></author>

• element:
  1. start tag
  2. content
  3. end tag

• content: defined by content model
  (grammar production)
  – text (#PCDATA) or
  – sequence of elements
  → nesting of elements
• top level element: document
• start tag may have attributes
  (attribute-value pairs)
• document syntax defined in DTD (document type definition)


Example DTD

<!ELEMENT article  - - (title, abstract, section+)>
<!ELEMENT title    - - (#PCDATA)>
<!ELEMENT abstract - o (#PCDATA)>
<!ELEMENT section  - o ((title, body+) | (title, body*, subsectn+))>
<!ELEMENT subsectn - o (title, body+)>
<!ELEMENT body     - o (figure | paragr)>
<!ELEMENT figure   - o EMPTY>
<!ELEMENT paragr   - o (#PCDATA)>

<!ATTLIST article  author NAMES #REQUIRED
                   status (final | draft) draft>
<!ATTLIST figure   file ENTITY #IMPLIED>

<!ENTITY file SYSTEM "/tmp/picture.ps" NDATA>
<!ENTITY amp "&">


Example document

<article status="draft"
         author="Cluet Christophides">

<title>From Structured Documents to ...</title>
<abstract>Structured Documents (e.g. SGML) can
benefit from...

<section><title>Introduction</title>
<body><paragr>This Paper is organized as follows....
</body></section>

<section><title>SGML preliminaries</title>
<body><figure>

</article>


DTD syntax

element:
<!ELEMENT element-name omitstart omitend production>

attribute list for elements:
<!ATTLIST element-name attribute-name domain default>

entities: (macro mechanism)
<!ENTITY ename value>
referencing: &ename;

DTDs

• define a class of documents
• specialize SGML for documents of a class
• contain an attribute grammar
• contain a nesting grammar
• support hierarchies by means of nesting


<!ELEMENT HTML O O HEAD, BODY --HTML document-->
<!ELEMENT HEAD O O TITLE>
<!ELEMENT TITLE - - #PCDATA>
<!ELEMENT BODY O O %content>
<!ENTITY % content
    "(%heading | %htext | %block | HR)*">
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % htext "A | %text" --hypertext-->
<!ENTITY % text "#PCDATA | IMG | BR">
<!ELEMENT IMG - O EMPTY --Embed. image-->
<!ELEMENT BR - O EMPTY>
<!ENTITY % block "P | PRE">
<!ELEMENT P - O (%htext)+ --paragraph-->
<!ELEMENT PRE - - (%pre.content)+ --preform.-->
<!ENTITY % pre.content "#PCDATA | A">
<!ELEMENT A - - (%text)+ --anchor-->
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ATTLIST A
    NAME CDATA #IMPLIED
    HREF CDATA #IMPLIED --link-->
<!ATTLIST IMG
    SRC CDATA #REQUIRED --URL of img--
    ALT CDATA #REQUIRED
    ALIGN (top|middle|bottom) #IMPLIED
    ISMAP (ISMAP) #IMPLIED>


HTML

• is an SGML document class (DTD)
• mixture of logical and layout tags
• no fixed DSSSL style sheet
  no possibility for transmission of style sheets

consequences:

• HTML is less flexible than SGML
• only minimum logical structuring possible
  (makes retrieval difficult)
• layout can be controlled only partially by document provider


2.4.2.1 DSSSL

language for describing layout of SGML documents

1. expression language (subset of Scheme)
2. style language for formatting
3. query language for retrieving document parts


[Figure: DSSSL processing model. Labels: SGML document, source document tree, transformation process (STTP), STTP output document tree, SGML document, formatting process (STFP), output of formatter (SPDL or proprietary form); DSSSL specification with STTP-SPEC and STFP-SPEC; SGML declarations & DTDs]


formatting

• input: SGML document + DTD + DSSSL stylesheet
• output: formatted document (format depends on processor)
  (e.g. TeX, RTF, Postscript, PDF)

formatting process:

• recursive processing of document according to DSSSL specification

• output: tree of flow objects
  flow object classes defined in DSSSL standard
  (e.g. page-sequence, paragraph, sequence)


example document

<!DOCTYPE FAQ SYSTEM "FAQ.DTD">
<FAQ>
<INFO>
  <SUBJECT> XML </SUBJECT>
  <AUTHOR> Lars Marius Garshol</AUTHOR>
  <EMAIL> [email protected] </EMAIL>
  <VERSION> 1.0 </VERSION>
  <DATE> 20.jun.97 </DATE>
</INFO>

<PART NO="1">
<Q NO="1">
  <QTEXT>What is XML?</QTEXT>
  <A>SGML light.</A>
</Q>

<Q NO="2">
  <QTEXT>What can I use it for?</QTEXT>
  <A>Anything.</A>
</Q>

</PART>
</FAQ>


DTD:

<!ELEMENT FAQ (INFO, PART+)>

<!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?,
                VERSION?, DATE?)>

<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>

<!ELEMENT PART (Q+)>
<!ELEMENT Q (QTEXT, A)>

<!ELEMENT QTEXT (#PCDATA)>
<!ELEMENT A (#PCDATA)>

<!ATTLIST PART NO CDATA #IMPLIED
               TITLE CDATA #IMPLIED>

<!ATTLIST Q NO CDATA #IMPLIED>


style sheet

<!doctype style-sheet PUBLIC "-//James Clark//">

;--- DSSSL stylesheet for FAQML

;---Constants

(define *font-size* 12pt)
(define *font* "Times New Roman")

;---Element styles

(element FAQ
  (make simple-page-sequence
    font-family-name: *font*
    input-whitespace-treatment: 'collapse
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (process-children)))

(element INFO
  (make paragraph
    quadding: 'center
    space-after: (* *font-size* 1.5)


(element SUBJECT
  (make paragraph
    font-size: (* *font-size* 2)
    line-spacing: (* *font-size* 2)
    space-after: (* *font-size* 2)


(element AUTHOR
  (make sequence
    (process-children)
    (literal ", ")))

(element VERSION
  (make paragraph
    (make sequence
      (literal "Version: "))


(element DATE
  (make paragraph
    (make sequence
      (literal "Last modified: "))


(element PART
  (make paragraph
    font-size: (* *font-size* 1.5)
    line-spacing: (* *font-size* 2)
    (make sequence
      (literal (attribute-string "NO"
                 (current-node)))
      (literal ". ")
      (literal (attribute-string "TITLE"
                 (current-node))))


(element QTEXT
  (make paragraph
    font-weight: 'bold
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (make sequence
      (literal (attribute-string "NO"
                 (parent (current-node))))
      (literal ". "))


(element A
  (make paragraph
    space-after: (* *font-size* 0.66667)
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)



2.4.3 XML

weaknesses of HTML

• mixture of logical and layout markup:
  – logical: TITLE, H1, MENU, P
  – layout: I, B; FONT, CENTER, BGCOLOR attributes
• lack of markup facilities for specific texts
  (e.g. math, chemistry)
• little internal structure of elements


XML vs. SGML

• complexity of SGML implementations
  → XML is simplified version of SGML

• weak support for different character sets in SGML
  → XML is based on Unicode

• SGML document not understandable without DTD


XML Standard

• markup language: XML
• linking language: XLink, XPointer
• formatting language: XSL/XSLT


2.4.3.1 XML language

simplification of SGML:

• start tag and end tag always must be present
• special form: combined start-end tag:
  e.g. <br/>, <img src="icon.gif"/>
• DTD not always required:
  well-formed XML: syntactically correct XML
  valid XML: XML document satisfies specified DTD
• element names: case matters, underscore and colon in names allowed
• many special cases from SGML forbidden
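The difference between well-formed and non-well-formed documents can be tried out with Python's standard library. This sketch uses `xml.etree.ElementTree`, which only checks well-formedness and does not validate against a DTD; the helper name `is_well_formed` and the sample strings are chosen here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """True if the string parses as XML; no DTD validation is performed."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed('<p>text<br/><img src="icon.gif"/></p>')   # combined start-end tags
missing_end_tag = is_well_formed("<p>text<br></p>")            # <br> is never closed
wrong_case = is_well_formed("<P>text</p>")                     # case matters
```

Checking validity (conformance to a specified DTD) needs a validating parser, which the standard library module used here does not provide.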


DTD

<!ENTITY % xhtml SYSTEM "xhtml-1.0-strict.dtd" >

%xhtml;

<!ELEMENT project (projecttitle,

shortdesc,

logo*,

fieldofoperation,

timeperiod?,

contactpersons,

involvedpersons?,

sponsoredby?,

participatinginstitutes?,

description,

publicationlist?,

notes?,

doccreator) >

<!ELEMENT projecttitle (langtext+) >

<!ATTLIST projecttitle state (work|closed) "closed">

<!ATTLIST projecttitle workgroup (ir|issi) #REQUIRED>

<!ELEMENT shortdesc (langtext+) >

<!ELEMENT logo (#PCDATA) > 

<!ATTLIST logo align (left|right) #IMPLIED

width %Length; #IMPLIED

height %Length; #IMPLIED >

<!ELEMENT referenceno (#PCDATA) >

<!ELEMENT fieldofoperation (langtext+) >

<!ELEMENT sponsoredby (sponsor+) >

<!ELEMENT sponsor (langtext+ | weblink) >

<!ELEMENT timeperiod (langtext+ |

(startdate, enddate)) >

<!ELEMENT startdate (day, month, year) >


<!ELEMENT enddate (day, month, year) >

<!ELEMENT day (#PCDATA) >

<!ELEMENT month (#PCDATA) >

<!ELEMENT year (#PCDATA) >

<!ELEMENT contactpersons (personnel+) >

<!ELEMENT involvedpersons (personnel+) >

<!ELEMENT personnel (langtext+) >

<!ELEMENT participatinginstitutes (institute+) >

<!ELEMENT institute (langtext+ | weblink) >

<!ELEMENT description (langflow+) >

<!ELEMENT publicationlist (publication+) >

<!ELEMENT publication (langtext+) >

<!ELEMENT notes (langflow+) >

<!ELEMENT doccreator EMPTY>

<!ELEMENT weblink (url, linkdescription, langtext*)>

<!ELEMENT url (#PCDATA) >

<!ELEMENT linkdescription (langtext+) >

<!ELEMENT langtext %Inline; >

<!ELEMENT langflow %Flow; >


Example document

<?xml version="1.0" encoding="ISO-8859-1" ?>

<!DOCTYPE project SYSTEM

"/services/www/xml/dtd/project.dtd">

<project>

<projecttitle state="work" workgroup="ir">

<langtext>

MIND

</langtext>

</projecttitle>

<shortdesc>

<langtext>

Resource Selection and Data Fusion for

Multimedia International Digital Libraries

</langtext>

</shortdesc>

<logo align="right">mast2_sm.gif</logo>

<fieldofoperation>

<langtext>Information Retrieval</langtext>

</fieldofoperation>

<timeperiod>

<startdate>

<day>01</day>

<month>02</month>

<year>2001</year>

</startdate>

<enddate>

<day>31</day>

<month>07</month>

<year>2003</year>

</enddate>

</timeperiod>

<contactpersons>

<personnel>

<langtext>

<a href="/staff/members/nottelma.html">

Dipl.-Inform. Henrik Nottelmann</a>

</langtext>

</personnel>

</contactpersons>

<sponsoredby>

<sponsor>

<langtext>EU FP5</langtext>

</sponsor>

</sponsoredby>

<participatinginstitutes>

<institute>

<weblink>

<url>http://www.strath.ac.uk/</url>

<linkdescription>

<langtext>

University of Strathclyde

</langtext>

</linkdescription>

</weblink>

</institute>

<institute>

<weblink>

<url>http://ls6-www.informatik.uni-dortmund.de

</url>

<linkdescription> <langtext>

University of Dortmund

</langtext>


</linkdescription>

</weblink>

</institute>

</participatinginstitutes>

<description>

<langflow>

<p> This research addresses problems associated

with the emergence of thousands of

heterogeneous multimedia Digital libraries...

</p>

</langflow>

</description>

<notes>

<langflow >

<ul>

<li><a href="internal/index.html">

Internal pages</a></li>

</ul>

</langflow>

</notes>

<publicationlist>

<publication>

<langtext>

<a href="overview/mind-overview.html">

MIND Overview slides

</a>

</langtext>

</publication>

</publicationlist>

<doccreator/>

</project>


XSLT

transformation of XML documents
(e.g. from XML into HTML)

similar to DSSSL, but in XML syntax

XSLT stylesheet = frame + set of transformation rules

<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns="http://www.w3.org/TR/REC-html40">

<xsl:output method="html"/>

<xsl:template match="...">

...

</xsl:template>

...

</xsl:stylesheet>


Some XSLT elements

<xsl:template>

specifies a template rule
match attribute identifies source node(s) to which rule applies

<xsl:if>

test attribute specifies an expression:
if true, content template is instantiated

<xsl:choose>

selects one among a number of possible alternative child elements <xsl:when> and <xsl:otherwise>

<xsl:when>

if expression specified by test attribute is true, content template is instantiated

<xsl:text>

contains literal data to be included in the output


A small example

<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE brief SYSTEM "brief.dtd">

<brief>

<anrede geschlecht="f" sozial="du">Nora</anrede>

<text>habe gerade den Ulysses beendet. Mal sehen,

wann der in den USA gedruckt werden darf...</text>

<gruss>J</gruss>

</brief>


Stylesheet

<xsl:template match="/">

<html>

<body>

<xsl:apply-templates/>

</body>

</html>

</xsl:template>

<xsl:template match="anrede">

<p>

<xsl:choose>

<xsl:when test="@sozial='du'">

<xsl:text>Liebe</xsl:text>

<xsl:if test="@geschlecht='m'">

<xsl:text>r</xsl:text>

</xsl:if>

<xsl:text> </xsl:text>

</xsl:when>

<xsl:when test="@sozial='sie'">

<xsl:choose>

<xsl:when test="@geschlecht='m'">

<xsl:text>Sehr geehrter Herr </xsl:text>

</xsl:when>

<xsl:when test="@geschlecht='f'">

<xsl:text>Sehr geehrte Frau </xsl:text>

</xsl:when>

</xsl:choose>

</xsl:when>

</xsl:choose>



<xsl:text>,</xsl:text>

</p>

</xsl:template>

<xsl:template match="text | gruss">

<p>


</p>

</xsl:template>


XSL stylesheet for project page

<?xml version="1.0" encoding="ISO-8859-1" ?>

<xsl:stylesheet

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns="http://www.w3.org/TR/REC-html40"

result-ns=""

version="1.0"

default-space="strip"

indent-result="yes">

<xsl:output method="html" encoding="iso-8859-1"/>

<xsl:param name="mailto"/>

<xsl:param name="fullname"/>

<xsl:param name="date"/>

<xsl:param name="lang"/>

<xsl:param name="url"/>

<xsl:include href="ls6common.xsl"/>

<xsl:template match="/">

<html>

<head>

<xsl:apply-templates

select="/project/projecttitle" mode="head"/>

<meta name="description">

<xsl:attribute name="content">

University of Dortmund,

Department of Computer Science (Chair VI):

<xsl:value-of

select="/project/projecttitle/langtext"/>,

<xsl:value-of

select="/project/shortdesc/langtext"/>


</xsl:attribute>

</meta>

</head>

<body bgcolor="white">


</body>

</html>

</xsl:template>

<xsl:template match="project">

<xsl:if test="//projecttitle[@workgroup='ir']">

<xsl:call-template name="navbar-top">

<xsl:with-param name="upurl">

/ir/projects.html.en</xsl:with-param>

<xsl:with-param name="upname">

IR Projects</xsl:with-param>

</xsl:call-template>

</xsl:if>

<xsl:if test="//projecttitle[@workgroup='issi']">

<xsl:call-template name="navbar-top">

<xsl:with-param name="upurl">

/issi/projects.html.en</xsl:with-param>

<xsl:with-param name="upname">

ISSI Projects</xsl:with-param>

</xsl:call-template>

</xsl:if>


</xsl:template>

<xsl:template match="projecttitle" mode="head">

<title>

<xsl:apply-templates select="langtext" mode="head"/>

</title>


</xsl:template>

<xsl:template match="projecttitle">

<h1>

<xsl:apply-templates select="langtext"/>

</h1>

<xsl:call-template name="hrule"/>

</xsl:template>

<xsl:template match="shortdesc">

<em><xsl:apply-templates/></em>

<br/>

</xsl:template>

<xsl:template match="logo">

<img src="{.}">

<xsl:if test="@width">

<xsl:attribute name="width">

<xsl:value-of select="@width"/></xsl:attribute>

</xsl:if>

<xsl:if test="@height">

<xsl:attribute name="height">

<xsl:value-of select="@height"/></xsl:attribute>

</xsl:if>

<xsl:if test="@align">

<xsl:attribute name="align">

<xsl:value-of select="@align"/></xsl:attribute>

</xsl:if>

</img>

</xsl:template>

<xsl:template match="referenceno">

<p> <h3>Reference Number</h3>



</p>

</xsl:template>

<xsl:template match="fieldofoperation">

<p> <h3>Field of operation</h3>


</p>

</xsl:template>

<xsl:template match="timeperiod">

<p> <h3>Project Duration</h3>

From <xsl:apply-templates select="startdate"/>

until <xsl:apply-templates select="enddate"/>

</p>

</xsl:template>

<xsl:template match="startdate|enddate">

<xsl:apply-templates select="day"/>.

<xsl:apply-templates select="month"/>.

<xsl:apply-templates select="year"/>

</xsl:template>

<xsl:template match="day|month|year">


</xsl:template>

<xsl:template match="contactpersons">

<p> <h3>Contact Persons</h3>

<ul>


</ul>

</p>


</xsl:template>

<xsl:template match="involvedpersons">

<p> <h3>Involved Persons</h3>

<ul>


</ul>

</p>

</xsl:template>

<xsl:template match="sponsoredby">

<p> <h3>Sponsored by</h3>

<ul>


</ul>

</p>

</xsl:template>

<xsl:template match="publicationlist">

<p> <h3>Publications</h3>

<ul>


</ul>

</p>

</xsl:template>

<xsl:template match="publication">

<li><xsl:apply-templates/></li>

</xsl:template>

<xsl:template match="sponsor">


</xsl:template>


<xsl:template match="participatinginstitutes">

<p> <h3>Participating Institutes</h3>

<ul>


</ul>

</p>

</xsl:template>

<xsl:template match="institute">


</xsl:template>

<xsl:template match="description">

<h3>Description</h3>


</xsl:template>

<xsl:template match="notes">

<h3>Notes</h3>


</xsl:template>

<xsl:template match="linkdescription">


</xsl:template>

<xsl:template

match="a[@href]|A[@HREF]|a[@name]|A[@NAME]|A[@href]">

<xsl:if test="@href">

<a href="{@href}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@HREF">


<a href="{@HREF}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@name">

<a name="{@name}"><xsl:apply-templates/></a>

</xsl:if>

<xsl:if test="@NAME">

<a name="{@NAME}"><xsl:apply-templates/></a>

</xsl:if>

</xsl:template>

<xsl:template match="personnel">


</xsl:template>

</xsl:stylesheet>


HTML output

<html xmlns="http://www.w3.org/TR/REC-html40">

<head>

<META http-equiv="Content-Type"

content="text/html; charset=iso-8859-1">

<title> MIND </title>

<meta name="description"

content="University of Dortmund,

Department of Computer Science (Chair VI): MIND,

Resource Selection and Data Fusion for Multimedia

International Digital Libraries ">

</head>

<body bgcolor="white">

<table width="100%">

<tr>

<td width="10%"></td><td width="80%" align="center">

[<a href="/ir/projects.html.en">IR Projects</a>]

[<a href="/ir/index.html.en">IR</a>]

[<a href="/issi/index.html.en">IS and Security</a>]

</td><td width="10%" align="right">

<a href="index.html.de">(deutsch)</a></td>

</tr>

</table>

<h1> MIND </h1>

<hr noshade size="2" width="100%">

<em>Resource Selection and Data Fusion for

Multimedia International Digital Libraries</em><br>

<img src="mast2_sm.gif" align="right">

<p> <h3>Field of operation</h3>

Information Retrieval </p>

<p> <h3>Project Duration</h3>


From 01. 02. 2001 until 31. 07. 2003</p>

<p> <h3>Contact Persons</h3>

<ul> <li>

<a href="/staff/members/nottelma.html">

Dipl.-Inform. Henrik Nottelmann</a> </li>

</ul> </p>

<p><h3>Sponsored by</h3><ul><li>EU FP5</li></ul></p>

<p> <h3>Participating Institutes</h3>

<ul><li><a href="http://www.strath.ac.uk/">

University of Strathclyde </a> </li>

<li> <a

href="http://ls6-www.informatik.uni-dortmund.de">

University of Dortmund </a> </li>

</ul> </p>

<h3>Description</h3>

<p xmlns="">

This research addresses problems associated with

the emergence of thousands of heterogeneous

multimedia Digital libraries ... </p>

<h3>Notes</h3>

<ul xmlns="">

<li>

<a href="internal/index.html"

xmlns="http://www.w3.org/TR/REC-html40">

Internal pages</a>

</li>

</ul>

<p>

<h3>Publications</h3>

<ul>

<li>

<a href="overview/mind-overview.html">

MIND Overview slides </a>


</li>

</ul>

</p>

<hr noshade size="2" width="100%">

<address>

<a href="mailto:[email protected]">

Henrik Nottelmann</a>

<[email protected]>,

20. March 2001</address>

</body>

</html>


2.4.3.2 XLink: XML linking language

linking possible in any XML-DTD
→ no special linking elements

linking via special attribute (for arbitrary elements):
xml:link

terminology:

resource: addressable service or unit of information that participates in a link

link: explicit relationship between two or more resources

locator: data, provided as part of a link, which identifies a resource
(attribute HREF)

inline link: link which serves as one of its own resources
e.g. A in HTML

out-of-line link: link whose content does not serve as one of the link's resources


Simple links

• one-directional

• mostly inline

<mylink xml:link="simple" title="Citation"

href="http://www.xyz.com/xml/foo.xml"

show="new" content-role="Reference">

as discussed in Smith(1997)</mylink>

<!ELEMENT mylink (#PCDATA)>

<!ATTLIST mylink

xml:link CDATA #FIXED "simple"

href CDATA #REQUIRED

content-role CDATA #IMPLIED

>


Extended links
usually out-of-line links

capabilities:

• enable outgoing links in read-only documents

• create links to and from resources in other formats

• applying and filtering sets of relevant links on demand

• enable other advanced hypermedia capabilities
  (e.g. via attribute ROLE)

example out-of-line extended link:

<commentary xml:link="extended" inline="false">

<locator href="smith2.1" role="Essay"/>

<locator href="jones1.4" role="Rebuttal"/>

<locator href="robin3.2" role="Comparison"/>

</commentary>


definitions:

<!ELEMENT extended ANY>

<!ATTLIST extended

xml:link CDATA #FIXED "extended"

%link-semantics.att;

%local-resource-semantics.att;

>

<!ELEMENT locator ANY>

<!ATTLIST locator

xml:link CDATA #FIXED "locator"

%locator.att;

%remote-resource-semantics.att;

>


<!ENTITY % locator.att

"href CDATA #REQUIRED"

>

<!ENTITY % link-semantics.att

"inline (true|false) 'true'

role CDATA #IMPLIED"

>

<!ENTITY % local-resource-semantics.att

"content-role CDATA #IMPLIED

content-title CDATA #IMPLIED"

>

<!ENTITY % remote-resource-semantics.att

"role CDATA #IMPLIED

title CDATA #IMPLIED

show (embed|replace|new) #IMPLIED

actuate (auto|user) #IMPLIED

behavior CDATA #IMPLIED"

>


Link behaviour

SHOW attribute:
describes display behaviour on traversal of link

• embed: designated resource embedded in body of current resource

• replace: designated resource replaces current resource

• new: designated resource displayed in a new window

ACTUATE attribute:
when should traversal of link occur?

• auto: retrieve resource when current resource is encountered

• user: present resource only upon request from user

all combinations of SHOW and ACTUATE values are possible!


2.4.3.3 XPointer: XML Pointer Language

for locators in XLink

• reference to whole document

• reference to named element in document

• reference to unnamed element in read-only document

locator syntax

Locator ::= URI

| Connector (XPointer | Name)

| URI Connector (XPointer | Name)

Connector ::= '#' | '|'

URI ::= URIchar*


XPointer syntax

• location of individual nodes in element tree

• spanning locations across several elements

• arbitrary set of elements

syntax:

XPointer ::= AbsTerm '.' OtherTerms

| AbsTerm

| OtherTerms

OtherTerms ::= OtherTerm

| OtherTerm '.' OtherTerm

OtherTerm ::= RelTerm

| SpanTerm

| AttrTerm

| StringTerm


Absolute location terms

AbsTerm ::= 'root()' | 'origin()' | IdLoc | HTMLAddr

IdLoc ::= 'id(' Name ')'

HTMLAddr ::= 'html(' SkipLit ')'

• root: root element of containing resource

• origin: application-dependent

• id: element with named id value

• html(NAMEVALUE): A element in HTML with NAME=NAMEVALUE


Relative location terms

RelTerm ::= Keyword? Arguments

Keyword ::= 'child'

| 'descendant'

| 'ancestor'

| 'preceding'

| 'following'

| 'psibling'

| 'fsibling'

example: child(2,section).child(1,subsection)


relative location term arguments

• selection by instance number

• selection by node type

• selection by attribute


Spanning location term: data between two XPointers

SpanTerm ::= 'span(' XPointer ',' XPointer ')'

examples:

id(a23).span(child(1),child(3))

span(id(sec2.1).child(-1,P),id(sec2.2).child(1,P))

Attribute location term: returns value of named attribute

String location term: string match


2.5 Images

2.5.1 Media type

Non-temporal: Image

• Representation

– Color model: CIE, RGB, HSV, CMYK, YUV
– Channels: alpha?, number, depth
– Interlacing
– Indexing
– Pixel aspect ratio
– Compression

• Operations

– Editing
– Point operations: thresholding, color correction
– Filtering
– Compositing
– Geometric transformations: displacing, rotating, mirroring, scaling, skewing, warping
– Conversion: color separation, resampling


2.5.2 Color

2.5.2.1 Human perception

visible light: λ ∈ [380 nm . . . 780 nm] (violet . . . red)

retina:

• rods for brightness

• cones for chromaticity (color)

three types of cones:

• yellow: λx = 600nm

• green: λy = 535nm

• blue: λz = 445nm



ϕ(λ): wavelength distribution of source light

k: normalization factor

x(λ), y(λ), z(λ): eye response functions

X, Y , Z: perceived color

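$X = k \int \varphi(\lambda)\, x(\lambda)\, d\lambda$

$Y = k \int \varphi(\lambda)\, y(\lambda)\, d\lambda$

$Z = k \int \varphi(\lambda)\, z(\lambda)\, d\lambda$

CIE Yxy color system

$x = \frac{X}{X + Y + Z}$

$y = \frac{Y}{X + Y + Z}$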
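The projection from tristimulus values onto the chromaticity coordinates of the Yxy system can be sketched in a few lines. The function name `xy_chromaticity` is made up for this illustration; the handling of the all-zero case is an assumption, since the notes do not cover it.

```python
def xy_chromaticity(X, Y, Z):
    """Map CIE tristimulus values (X, Y, Z) to chromaticity coordinates (x, y)."""
    total = X + Y + Z
    if total == 0:
        # Assumption: chromaticity is treated as undefined for a black stimulus.
        raise ValueError("chromaticity is undefined for X + Y + Z = 0")
    return X / total, Y / total


# A stimulus with equal tristimulus values lies at x = y = 1/3.
x, y = xy_chromaticity(1.0, 1.0, 1.0)
```

Together with Y (the luminance), the pair (x, y) carries the same information as (X, Y, Z).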


perceived visual distance (magnified by 10)

→ need for equidistant color spaces


2.5.2.2 Color spaces

RGB three basic colours: red, green, blue

YUV luminance (as in b/w TV) +2 chrominance channels

YIQ used for NTSC TV

YCrCb used in JPEG digital image standard

CMY(k) cyan, magenta, yellow (black) used for printers

HSV color model with (approximately) equidistant colors

mapping RGB → YUV:
Y = 0.30R + 0.59G + 0.11B
U = (B − Y) · 0.493
V = (R − Y) · 0.877

mapping RGB → YIQ:
Y = 0.30R + 0.59G + 0.11B
I = 0.60R − 0.28G − 0.32B
Q = 0.21R − 0.52G + 0.31B

mapping RGB → YCrCb:
Y = 0.30R + 0.59G + 0.11B
Cr = 0.50R − 0.42G − 0.08B
Cb = −0.17R − 0.33G + 0.50B

mapping RGB → CMY(K):
C = 1 − R
M = 1 − G
Y = 1 − B
K = min(C, M, Y)
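Two of these mappings, written out as a minimal sketch. It assumes R, G, B are normalized to [0, 1]; the function names are illustrative and the coefficients are copied from the formulas above.

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with the coefficients given in the notes."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v


def rgb_to_cmyk(r, g, b):
    """RGB -> CMY plus the black component K = min(C, M, Y)."""
    c, m, y = 1 - r, 1 - g, 1 - b
    k = min(c, m, y)
    return c, m, y, k


# White has full luminance and (up to rounding) no chrominance.
white_yuv = rgb_to_yuv(1.0, 1.0, 1.0)
# Pure red needs magenta and yellow ink but no cyan and no black.
red_cmyk = rgb_to_cmyk(1, 0, 0)
```

Because the three luminance weights sum to 1, any grey value (R = G = B) maps to U = V = 0.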


mapping RGB → HSV:

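$v = \max(r, g, b), \quad s = \frac{v - \min(r, g, b)}{v}$

let $r' = \frac{v - r}{v - \min(r, g, b)}, \quad g' = \frac{v - g}{v - \min(r, g, b)}, \quad b' = \frac{v - b}{v - \min(r, g, b)}$

$6h = \begin{cases}
5 + b' & \text{if } r = \max(r, g, b) \text{ and } g = \min(r, g, b) \\
1 - g' & \text{if } r = \max(r, g, b) \text{ and } g \ne \min(r, g, b) \\
1 + r' & \text{if } g = \max(r, g, b) \text{ and } b = \min(r, g, b) \\
3 - b' & \text{if } g = \max(r, g, b) \text{ and } b \ne \min(r, g, b) \\
3 + g' & \text{if } b = \max(r, g, b) \text{ and } r = \min(r, g, b) \\
5 - r' & \text{otherwise}
\end{cases}$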
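The case distinction for the hue can be checked with a short sketch. Two details are assumptions that the formulas leave open: grey values (max = min) get h = s = 0, and the hue is reduced modulo 1 so that pure red yields h = 0 rather than h = 1. The names `rn`, `gn`, `bn` stand for the normalized differences defined in the "let" line above.

```python
import colorsys  # standard library, used only for the comparison below


def rgb_to_hsv(r, g, b):
    """RGB -> HSV following the six-case hue formula; inputs in [0, 1]."""
    v = max(r, g, b)
    lo = min(r, g, b)
    if v == lo:
        # Assumption: hue and saturation are set to 0 for greys.
        return 0.0, 0.0, v
    s = (v - lo) / v
    rn = (v - r) / (v - lo)
    gn = (v - g) / (v - lo)
    bn = (v - b) / (v - lo)
    if r == v and g == lo:
        h6 = 5 + bn
    elif r == v:
        h6 = 1 - gn
    elif g == v and b == lo:
        h6 = 1 + rn
    elif g == v:
        h6 = 3 - bn
    elif r == lo:          # b is the maximum here
        h6 = 3 + gn
    else:
        h6 = 5 - rn
    return (h6 / 6) % 1.0, s, v


# The result can be compared with Python's built-in conversion.
ours = rgb_to_hsv(0.2, 0.4, 0.6)
reference = colorsys.rgb_to_hsv(0.2, 0.4, 0.6)
```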


[Figure: colour vector vc = (r, g, b) in the RGB cube (axes R, G, B) and its transform wc = T · vc = (h, s, v) in HSV space (axes H, S, V)]


2.5.3 GIF format

(graphics interchange format, proprietary standard of CompuServe)

• lossless compression of image data

• restricted to 256 colors

structure of a GIF file:

• GIF signature: “GIF87a” / “GIF89a”

• screen descriptor

– width
– height
– color resolution (1. . . 8 bits)
– background color

• global color map: table of RGB values

• sequence of images

• GIF terminator


structure of an image:

• image descriptor (image position + size)

• local color map

• raster data: sequence of color index values, compressed by patented variation of LZW

sequence of raster data: sequential / interlaced rows


2.5.4 PNG format

(portable network graphics)
non-proprietary standard proposed by W3C

GIF features retained in PNG:

• Indexed-color images of up to 256 colors.

• Streamability: files can be read and written serially (file format usable as communications protocol)

• Progressive display

• Transparency (portions of the image can be marked as transparent)

• Ancillary information: textual comments and other data can be stored within the image file.

• Complete hardware and platform independence.

• Effective, 100% lossless compression.


New features of PNG:

• Truecolor images of up to 48 bits per pixel.

• Grayscale images of up to 16 bits per pixel.

• Full alpha channel (general transparency masks).

• Image gamma information (automatic display of images with correct brightness/contrast)

• Reliable, straightforward detection of file corruption.

• Faster initial presentation in progressive display mode.


2.5.5 JPEG Formats

2.5.5.1 Requirements

• high compression rate vs. image fidelity

• applicable to any kind of continuous-tone digital source image

• tractable computational complexity

• modes of operation:

1. sequential encoding (left-to-right, top-to-bottom)

2. progressive encoding: encoding in multiple scans for low-bandwidth communication (user watches image build up in multiple coarse-to-clear passes)

3. lossless encoding: exact recovery of source image possible (although low compression compared to lossy modes)

4. hierarchical encoding: encoding at multiple resolutions

2.5.5.2 Processing steps for DCT-based coding

DCT: discrete cosine transform

here: consider single component only (= greyscale image)


[Figure: DCT-based encoder. Source image data (8x8 blocks) → FDCT → quantizer → entropy encoder → compressed image data; quantizer and entropy encoder each use a table specification]


[Figure: DCT-based decoder. Compressed image data → entropy decoder → dequantizer → IDCT → reconstructed image data; entropy decoder and dequantizer each use a table specification]


8*8 DCT: compression of a stream of 8*8 blocks of image samples

• group image samples into 8*8 blocks

• shift from unsigned integers to signed integers: $[0, 2^p - 1] \to [-2^{p-1}, 2^{p-1} - 1]$

• input to the forward DCT

$F(u, v) = \frac{1}{4} C(u) C(v) \left( \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cdot \cos\frac{(2x + 1)u\pi}{16} \cos\frac{(2y + 1)v\pi}{16} \right), \quad u, v = 0 \ldots 7$

$C(u), C(v) = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}$


FDCT: 64-point discrete signal → 64 orthogonal basis signals (amplitudes of cosine functions)

F(0, 0) – DC coefficient (both cosine factors equal cos 0 = 1):

$F(0, 0) = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y)$

other 63 coefficients – AC coefficients

little variation in 8*8 block
→ most spatial frequencies with zero amplitude
→ no encoding necessary
→ compression


inverse DCT: maps 64 DCT coefficients onto 8*8 image block

$f(x, y) = \frac{1}{4} \left( \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u, v) \cdot \cos\frac{(2x + 1)u\pi}{16} \cos\frac{(2y + 1)v\pi}{16} \right)$


problems

theoretically: DCT is 1:1 mapping of 64 point vectors between image and frequency domain

practically: loss through

• quantization

• computation of transcendental functions
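A direct, unoptimized transcription of the forward and inverse 8*8 DCT illustrates both points: a flat block has only a DC coefficient, and the round trip reproduces the input only up to floating-point error in the cosine evaluations. This is a sketch for study purposes; real codecs use fast factorizations, and the names `fdct`, `idct` and `c` are chosen here for illustration.

```python
import math


def c(k):
    """Normalization factor C(k) of the 8*8 DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / 16)


def fdct(f):
    """Forward DCT of an 8*8 block f[x][y]; returns F[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * basis(x, u) * basis(y, v)
                                      for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct(F):
    """Inverse DCT of F[u][v]; returns the 8*8 block f[x][y]."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


# A flat block of value 16 has DC coefficient (1/8) * 64 * 16 = 128
# and AC coefficients that vanish up to rounding.
flat = [[16] * 8 for _ in range(8)]
F_flat = fdct(flat)

# Round trip on a level-shifted ramp: small but nonzero numerical error.
ramp = [[8 * x + y - 32 for y in range(8)] for x in range(8)]
back = idct(fdct(ramp))
max_err = max(abs(back[x][y] - ramp[x][y]) for x in range(8) for y in range(8))
```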


Quantization: mapping of FDCT output (F(u, v), u, v = 0 . . . 7) onto integers

quantization table: Q(u, v), u, v = 0 . . . 7, 1 ≤ Q(u, v) ≤ 255

quantization:

• goal: achieve further compression

• represent DCT coefficients with minimum necessary precision (and minimum effect on visual image quality)

• lossy, n : 1 mapping

$F^Q(u, v) = \mathrm{IntegerRound}\left( \frac{F(u, v)}{Q(u, v)} \right)$

dequantization:

$F'(u, v) = F^Q(u, v) \cdot Q(u, v)$
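The two formulas can be sketched as element-wise operations on nested lists. The tie-breaking rule for IntegerRound (halves are rounded up here) is an assumption, and the small 1*2 "table" is only there to keep the example readable; JPEG tables are 8*8.

```python
import math


def integer_round(x):
    # Assumption: round to the nearest integer, halves upwards.
    return math.floor(x + 0.5)


def quantize(F, Q):
    """F^Q(u, v) = IntegerRound(F(u, v) / Q(u, v)), element by element."""
    return [[integer_round(f / q) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(F, Q)]


def dequantize(FQ, Q):
    """F'(u, v) = F^Q(u, v) * Q(u, v), element by element."""
    return [[fq * q for fq, q in zip(fq_row, q_row)]
            for fq_row, q_row in zip(FQ, Q)]


coeffs = [[100.0, -37.0]]
table = [[16, 11]]
quantized = quantize(coeffs, table)       # [[6, -3]]
restored = dequantize(quantized, table)   # [[96, -33]]
```

The restored values differ from the inputs by at most half a quantization step, which is where the loss of the lossy modes originates.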


DC coding and zig-zag sequence

• separate treatment of DC and AC coefficients

• DC: strong correlation between coefficients of adjacent 8*8 blocks → differential encoding

• AC: ordering in zig-zag sequence, low-frequency coefficients (mostly nonzero) before high-frequency coefficients (mostly zero) (facilitates entropy coding)

[Figure: differential DC encoding, ΔDC_l = DC_l − DC_{l−1} for adjacent blocks l−1 and l; zig-zag sequence running from DC via AC_01 . . . AC_07 and AC_70 to AC_77]
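Both ideas are easy to sketch. The zig-zag order below walks the anti-diagonals of the block with alternating direction, which reproduces the usual JPEG scan; using 0 as the predictor for the first block is an assumption of this sketch, and both function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n*n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + column
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        if s % 2 == 0:
            diagonal.reverse()                      # even diagonals run upwards
        order.extend(diagonal)
    return order


def dc_differences(dc_values):
    """Differential DC coding: each DC value minus the previous block's DC."""
    previous = 0  # assumed predictor for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


first_six = zigzag_order()[:6]
diffs = dc_differences([50, 52, 49])
```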


Entropy coding

lossless compression of DCT coefficients

1. convert zig-zag sequence of quantized coefficients into intermediate sequence of symbols (with zero suppression)

2. convert symbols into data stream with no externally identifiable boundaries (Huffman coding / arithmetic coding)


Compression and picture quality

input: typically 8 bits/pixel per component (12 bits/pixel for special applications, e.g. medical images)

1 luminance component + 2 chrominance components, 1 chrominance sample per 4 luminance samples
→ total: 8 + 2 · 8/4 = 12 bits/pixel

output:

• 0.25–0.5 bits/pixel: moderate to good quality

• 0.5–0.75 bits/pixel: good to very good quality

• 0.75–1.5 bits/pixel: excellent quality

• 1.5–2.0 bits/pixel: indistinguishable from the original


[Figure: positions of luminance samples and chrominance samples relative to the block edge]


2.5.5.3 Predictive lossless coding

difficult with DCT
→ independent method for lossless coding

typically 2:1 compression


[Figure: lossless encoder. Source image data → predictor → entropy encoder → compressed image data; the entropy encoder uses a table specification]


neighbourhood of the sample X to be predicted:

C B
A X

Selection value   Prediction
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B - C
5                 A + ((B - C)/2)
6                 B + ((A - C)/2)
7                 (A + B)/2


2.5.5.4 Multiple-Component images

Source image formats: JPEG poses no restrictions on pixel aspect ratio, color space, image acquisition characteristics

JPEG source image model

[Figure: (a) source image with multiple components C1, C2, . . . , CN of overall size X × Y; (b) characteristics of an image component Ci: xi × yi samples, arranged left to right and top to bottom]

• image contains 1 . . . 255 components (spectral bands, channels)

• component = rectangular area of samples

• sample = unsigned integer with p bits

• p fixed for all samples of an image

• p = 8 or p = 12 for DCT coding

• p = 2 . . . 12 for predictive coding

xi, yi: sample dimensions of ith component

Hi, Vi: relative horizontal/vertical sampling factor, 1 ≤ Hi, Vi ≤ 4

X, Y: overall image dimensions, $X = \max_i(x_i)$, $Y = \max_i(y_i)$, $X, Y \le 2^{16}$

encoder stores X, Y and Hi, Vi

decoder:

$x_i = \left\lceil X \cdot \frac{H_i}{H_{max}} \right\rceil, \quad y_i = \left\lceil Y \cdot \frac{V_i}{V_{max}} \right\rceil$
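The decoder-side computation can be sketched with integer arithmetic only, which avoids rounding problems in the ceiling. The function name and the example sampling factors are illustrative assumptions, not values from the notes.

```python
def component_dimensions(X, Y, factors):
    """Per-component sizes (x_i, y_i) from image size and sampling factors (H_i, V_i)."""
    h_max = max(h for h, _ in factors)
    v_max = max(v for _, v in factors)

    def ceil_div(a, b):
        return -(-a // b)  # integer ceiling of a / b

    return [(ceil_div(X * h, h_max), ceil_div(Y * v, v_max)) for h, v in factors]


# Three components of a 512 * 512 image with different sampling factors.
dims = component_dimensions(512, 512, [(4, 1), (2, 2), (1, 1)])
# The ceiling matters for odd image sizes.
odd = component_dimensions(17, 17, [(2, 2), (1, 1)])
```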


Entropy order and interleaving: interleaving of data from multiple components

data unit =

• sample in predictive coding

• 8*8 block in DCT coding

order of compressed data units: generalization of raster-scan order

noninterleaved data ordering:

[Figure: data units of a single component scanned left to right, top to bottom]


interleaved data ordering

• component Ci partitioned into rectangular regions Hi*Vi

• regions ordered left-to-right, top-to-bottom

• data units within region ordered left-to-right, top-to-bottom

• MCU: minimum coded unit = smallest group of interleaved data units

[Figure: interleaving example with four components, C1: H = 2, V = 2; C2: H = 2, V = 1; C3: H = 1, V = 2; C4: H = 1, V = 1]

restrictions:

• maximum number of components interleaved: 4

• maximum number of data units in an MCU: 10


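Multiple tables: component-specific tables for quantization and entropy coding

[Figure: components A, B, C pass through the encoding process, which switches between table specification 1 and table specification 2 and outputs compressed image data]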


2.5.5.5 Baseline and other DCT sequential codecs

components of sequential coding:

• FDCT

• quantization

• entropy coding

• multiple-component control

variations:

• sample precisions: 8 bit / 12 bit

• Huffman / arithmetic coding

baseline sequential coding:

• 8 bit samples

• Huffman coding

• max. two sets of Huffman tables


2.5.5.6 DCT progressive mode

uses FDCT and quantization as with sequential coding

difference: each image component encoded in multiple scans

• requires image-sized buffer memory between quantizer and entropy encoder

• stores image as quantized DCT coefficients

• buffered coefficients partially encoded in multiple scans

• two complementary methods

– spectral selection: only specific band of coefficients from zig-zag sequence encoded in a scan

– successive approximation: coefficients within current band encoded with limited accuracy in a scan


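[Figure: (a) image component as quantized DCT coefficients (coefficient index 0 . . . 63, bit position 7 = MSB . . . 0 = LSB); (b) sequential encoding; further panels: progressive encoding, where successive scans send bands of coefficients (spectral selection) or bit planes starting with the MSBs (successive approximation)]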


2.5.5.7 Hierarchical mode of operation

“pyramidal” encoding of an image at multiple resolutions

subsequent encoding uses double resolution (horizontal/vertical/both)

procedure:

1. filter and down-scale original image by desired power of 2 in each dimension

2. encode reduced-size image by sequential DCT / progressive DCT / lossless coding

3. decode reduced-size image, then interpolate and up-sample it by 2 (horizontally/vertically/both)

4. use up-sampled image as prediction of the original, encode difference image as above

5. repeat steps 3 and 4 until full resolution has been encoded

application of hierarchical encoding: access to high-resolution images for low-resolution devices with limited buffer capacity


2.5.5.8 Coded representation for compressed images

• interchange format syntax

• tables stored with the image / default tables / referenced tables


2.5.5.9 JPEG2000

capabilities supported:

• resolution scalability: arbitrary number of resolution levels

• region of interest coding: certain parts of image coded in better quality

• SNR (signal-to-noise ratio) scalability

• random access capability

• multi-component imagery

• arbitrary wavelet decompositions

• arbitrary wavelet kernels

• arbitrary bit-depth images

• tiling

– any number of tiles
– rate-control performed jointly over all tiles

• frames

– similar to tiles
– coder operates independently in frames


2.5.6 Fractal image compression

2.5.6.1 Introduction

[Figure: copy machine mapping an input image to an output image; rows (a), (b), (c) each show an initial image and its first, second and third copy]

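final attractor independent of starting image, depends only on transformation

affine transformation:

$w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_i & b_i \\ c_i & d_i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}$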


some affine transformations

each image is transformed copy of itself
→ image must have detail at every scale
→ images are fractals

fractal image compression: store images as collections of transformations, e.g. fern


advantage: multiresolution representation of images

fractal vs. pixel-based representation:


2.5.6.2 Iterated function systems

Contractive transformations

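transformation w contractive iff there is an s < 1 such that for any two points P1, P2:

$d(w(P_1), w(P_2)) < s \cdot d(P_1, P_2)$

distance in the plane:

$d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

example contractive transformation

$w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$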


iterated function system: collection of contractive transformations

$\{ w_i : \mathbb{R}^2 \to \mathbb{R}^2 \mid i = 1, \ldots, n \}$, each mapping the plane $\mathbb{R}^2$ to itself

collection of transformations defines map

$W(\cdot) = \bigcup_{i=1}^{n} w_i(\cdot)$


f0 input image

f1 = W (f0)

$f_2 = W(W(f_0)) = W^{\circ 2}(f_0)$

contractive mapping fixpoint theorem:

$|W| \equiv f_\infty = \lim_{n \to \infty} W^{\circ n}(f_0)$

attractor is independent of f0 !
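The independence from the starting image can be observed numerically. The sketch below iterates W on two different one-point "images" and measures how far the results are apart; the three maps (a Sierpinski triangle) are an illustrative choice and not taken from the notes.

```python
import math

# Each map is (a, b, c, d, e, f) for w(x, y) = (a*x + b*y + e, c*x + d*y + f).
# Assumed example: three maps that each halve all distances.
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def apply_w(maps, points):
    """W(points): union of the images of the point set under all maps."""
    return [(a * x + b * y + e, c * x + d * y + f)
            for (a, b, c, d, e, f) in maps
            for (x, y) in points]


def iterate(maps, points, n):
    for _ in range(n):
        points = apply_w(maps, points)
    return points


def max_paired_distance(ps, qs):
    """Largest distance between points produced by the same sequence of maps."""
    return max(math.hypot(px - qx, py - qy) for (px, py), (qx, qy) in zip(ps, qs))


from_origin = iterate(SIERPINSKI, [(0.0, 0.0)], 8)
from_corner = iterate(SIERPINSKI, [(1.0, 1.0)], 8)
gap = max_paired_distance(from_origin, from_corner)
```

Since every map contracts by 1/2, the gap shrinks by a factor of 2 per iteration (here from √2 to about 0.0055 after 8 steps), so both sequences approach the same attractor.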


2.5.6.3 Self-similarity in images

grey-scale images as functions f(x, y)


metric on images

$\delta(f, g) = \sup_{(x, y) \in I^2} |f(x, y) - g(x, y)|$


natural images are not exactly self-similar


2.5.6.4 Partitioned iterated function systems

partitioned copying machine

specification of copying machine:

1. # copies

2. affine transformation (for each copy)

3. contrast and brightness adjustment (for each copy)

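4. mask for selecting part of the original to be transformed (for each copy, Di → Ri)

specification of transformation wi

$w_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix}$

si controls contrast (si < 1)
oi affects brightness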


2.5.6.5 Encoding images

ideal goal of fractal image compression: satisfy fixed point equation

$f = W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f)$

→ seek partition of f into pieces such that the fixed point equation is fulfilled

approximation:

$f \approx f' = W(f') \approx W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f)$

minimize quantities

$\delta(f \cap (R_i \times I), w_i(f)), \quad i = 1, \ldots, N$

1. find good choice for Di

2. find good contrast and brightness settings si and oi


example:

• 256*256 pixels input image

• output ranges Ri: nonoverlapping 8*8 subsquares (32 · 32 = 1024)

• input ranges Di: overlapping 16*16 subsquares (241 · 241 = 58 081)

• 8 ways for mapping square → square (4 rotations, flip + 4 rotations)

• estimate si and oi using least squares regression


compression

input image: 65536 bytes

compressed image: 3968 bytes

→ compression factor: 16.5


2.5.6.6 Partitioning images

image areas requiring different levels of detail
→ vary size of the ranges Ri

quadtree partitioning: divide square into 4 sub-squares


HV-partitioning: divide rectangle either horizontally or vertically

[Figure: HV-partitioning of a rectangle into R1 and R2: (a) 1st partition, (b) 2nd partition, (c) 3rd and 4th partitions]


triangular partitioning: rectangle → 2 triangles, triangle → 4 triangles (connect partitioning points on each side)


2.6 Audio

2.6.1 Introduction

human perception: 20 Hz – 20 kHz

digital audio:

• sample audio input in regular, discrete intervals

• quantize sampled values

digital audio data: sequence of binary values representing number of quantizer levels

pulse code modulation: represent each sample with an independent code word


[Figure: PCM processing chain. Analog audio input → analog-to-digital conversion → PCM values → digital signal processing → PCM values → digital-to-analog conversion → analog audio output]


Nyquist theorem: a time-sampled signal can represent signals up to half the sampling rate

typical sampling rates:

8 kHz for speech

44.1 kHz for music (audio CD)

quantizer levels: power of 2; each additional bit increases the signal-to-noise ratio by 6 dB

typical # bits/sample:

8 (= 48 dB) speech, low-quality audio

16 (= 96 dB) high-quality audio (audio CD)

data rates for uncompressed audio: 8 . . . 176 kB/sec (176 for audio CD, stereo)
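The figures above follow from simple arithmetic, sketched here with illustrative function names. The 6 dB per bit value is the rule of thumb used in the notes, not an exact formula.

```python
def pcm_data_rate(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sampling_rate_hz * bits_per_sample * channels // 8


def approx_snr_db(bits_per_sample):
    """Rule of thumb: about 6 dB signal-to-noise ratio per bit."""
    return 6 * bits_per_sample


def max_representable_frequency(sampling_rate_hz):
    """Nyquist limit: half the sampling rate."""
    return sampling_rate_hz / 2


speech = pcm_data_rate(8_000, 8, 1)       # 8000 bytes/s
audio_cd = pcm_data_rate(44_100, 16, 2)   # 176400 bytes/s
```

The CD sampling rate of 44.1 kHz gives a Nyquist limit of 22.05 kHz, just above the 20 kHz upper bound of human hearing.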


2.6.2 Media type

Temporal: Digital audio

• Representation

– Sampling frequency
– Sample size and quantization: linear, nonlinear
– Number of channels (tracks): 2, 4, 16, 32
– Interleaving
– Negative samples: one or two’s complement
– Encoding: PCM, ADPCM

• Operations

– Storage
– Retrieval
– Editing: cross-fade, play list
– Effects and filtering: delay, equalization, normalization, noise reduction, time compression/expansion, pitch shifting, stereoization, acoustic environments
– Conversion


2.6.3 Formats

2.6.3.1 µ-Law Audio Compression

logarithmic quantization

• represents low-amplitude audio samples with greater accuracy

• → uniform signal-to-noise ratio over range of amplitudes

• 8 bits/sample represent 14 bits in linear sampling

• used in ISDN telephone (with 8kHz sampling)

x input signal, |x| ≤ 1

y output signal

µ = 255

$y = \begin{cases}
255 - \frac{127}{\ln(1+\mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x \ge 0 \\
127 - \frac{127}{\ln(1+\mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x < 0
\end{cases}$
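The formula can be evaluated directly. The sketch below rounds the result to the nearest 8-bit code, which is an assumption (the notes give only the continuous mapping); the function name is illustrative.

```python
import math

MU = 255


def mu_law_code(x):
    """8-bit mu-law code for an input sample x with |x| <= 1, per the formula above."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("input must satisfy |x| <= 1")
    magnitude = 127 / math.log(1 + MU) * math.log(1 + MU * abs(x))
    base = 255 if x >= 0 else 127
    return int(round(base - magnitude))  # rounding to an integer code is assumed


silence = mu_law_code(0.0)      # 255
full_positive = mu_law_code(1.0)
full_negative = mu_law_code(-1.0)

# Logarithmic spacing: the same input step uses many more codes near zero.
step_low = abs(mu_law_code(0.02) - mu_law_code(0.01))
step_high = abs(mu_law_code(0.92) - mu_law_code(0.91))
```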


2.6.3.2 ADPCM

adaptive differential pulse code modulation

adjacent samples have similar values
→ encode PCM value of the difference

[Figure: (a) ADPCM encoder: difference D[n] = X[n] − Xp[n−1] → (adaptive) quantizer → code C[n]; C[n] → (adaptive) dequantizer → Dq[n]; Xp[n] = Xp[n−1] + Dq[n] via (adaptive) predictor. (b) ADPCM decoder: C[n] → (adaptive) dequantizer → Dq[n]; Xp[n] = Xp[n−1] + Dq[n] via (adaptive) predictor]

ADPCM coder can adapt to characteristics of audio signal


• change step size of quantizer

• change step size of predictor

different algorithms/standards, depending on

• adaptation possibilities

• side information

– quantizer/predictor step size
– redundant contextual information (for error recovery)

algorithms:

• IMA/ADPCM: Interactive Multimedia Association

• CCITT G.721 (32 kbps compressed data)

• CCITT G.723 (24 kbps compressed data)

• compact disc interactive (CD-I) audio compression algorithm


IMA/ADPCM Algorithm

• compression rate: 4:1

• 16 bits/sample → 4 bits/sample

simple predictor: predicted value = previous sample

quantizer: 4 bits output, signed multiples of current step size/4

adaptation

• quantizer adapts step size based on

– current step size
– quantizer output of previous input

• based on table lookup

• no side information required

good error recovery


2.6.3.3 MPEG Audio

• lossy, but perceptually lossless compression

• 48 kHz sampling rate, 2*16 bits/sample

• compression rate: 6:1

• exploitation of auditory masking

[Figure: auditory masking, amplitude over frequency. A strong tonal signal creates a region where weaker signals are masked]


Layer I

• filter bank divides audio signal into 32 frequency bands

• 12 samples per band

• for each nonzero sample:

– bit allocation
– scale factor

output of layer I: frame with 32 groups of 12 samples = 384 samples


Layer II

codes data in larger groups: frame with 3*12*32 samples

exploits common bit allocation and scale factors


Layer III

• alias reduction: modified discrete cosine transformation

• logarithmic quantization

• entropy coding (Huffman)

• bit reservoir for effects due to entropy coding

• noise allocation instead of bit allocation


Stereo redundancy coding

two types of coding:

• intensity stereo coding for high frequencies:

– encode single summed signal for both channels
– only independent scale factors

• middle/side stereo coding

– middle channel
– + 2 side channels


2.7 Video

2.7.1 Basics

2.7.1.1 B/W TV

presentation in greyscale only (luminance)

European format:

• 625 lines

• 833 columns

• ratio width/height: 4:3

• 25 frames/second

bandwidth

theoretically:

1 / (25 frames/s · 625 lines/frame) = 64 µs/line, i.e. line frequency 15625 Hz

b/w changes between every pair of pixels in a line
→ 15625 Hz * 833/2 ≈ 6.5 MHz

in practice: 5 – 5.5 MHz


interlaced mode

50 half images/second


2.7.1.2 Colour TV

colour representations:

RGB three basic colours: red, green, blue

YUV luminance (as in b/w TV) + 2 chrominance channels, used in PAL

YIQ used for NTSC


2.7.1.3 TV standards

NTSC National Television Systems Committee (USA)

– 30 images/second
– 525 lines/image

PAL Phase alternating line (Germany)

– 25 images/second
– 625 lines/image


HDTV High definition Television (forthcoming)

HD-MAC Europe: 1250 lines, 50 Hz (interlaced)

MUSE Japan: 1125 lines, 60 Hz

NTSC USA: 1040 lines, 60 Hz

digital TV

component-wise coding: 4:2:2
emphasis on luminance
luminance sampling: 13.5 MHz
chrominance sampling: 6.75 MHz
→ 216 Mbps
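The 216 Mbps figure can be reproduced under the assumption of 8 bits per sample and two chrominance channels; the notes state only the sampling frequencies, so the sample size is an assumption of this sketch.

```python
LUMA_RATE_HZ = 13_500_000      # luminance sampling frequency
CHROMA_RATE_HZ = 6_750_000     # sampling frequency of each chrominance channel
CHROMA_CHANNELS = 2
BITS_PER_SAMPLE = 8            # assumed sample size

samples_per_second = LUMA_RATE_HZ + CHROMA_CHANNELS * CHROMA_RATE_HZ
bit_rate = samples_per_second * BITS_PER_SAMPLE   # bits per second
bit_rate_mbps = bit_rate / 1_000_000
```

In 4:2:2 each chrominance channel is sampled at half the luminance rate, so the two chrominance channels together contribute as many samples as the luminance channel.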


2.7.1.4 Computer video

• non-interlaced display

• image rate: typically 70 Hz

• colour display: in RGB mode

a) with 24 bits/pixel
b) via CLUT (colour lookup table): 8 or 16 bits/pixel → 256 or 65536 colours (out of 2^24)


2.7.2 Media types

Media type Temporal: Analog video

• Representation

– Frame rate
– Number of scan lines
– Aspect ratio, e.g., 4:3
– Interlacing, e.g., 2:1 fields per frame
– Quality, e.g., signal-to-noise ratio and image resolution
– Component versus composite

• Operations

– Storage: Tapes – Type B or C, Betacam, U-matic, Hi8, S-VHS, VHS; Videodisc
– Retrieval: based on time codes
– Synchronization: avoid timebase jitter and timebase phase shift using sync generator, genlock, timebase corrector
– Editing: cuts-only editing, A-B roll editing, edit decision list (EDL)
– Mixing: cut, fade, dissolve (cross-fade), wipe, tumble, wrapping, keying
– Conversion: scan converter, standards conversion


Media type Temporal: Digital video

• Representation

– Analog formats sampled: CCIR 601, digital composite, CIF, QCIF, digital HDTV; synthesis, sampling
– Sampling rate
– Sample size and quantization: linear, logarithmic
– Data rate
– Frame rate: 10, 15, 25, 30
– Compression
– Support for interactivity
– Scalability: transmit scalability, receive scalability

• Operations

– Storage
– Retrieval
– Synchronization
– Editing: tape based, nonlinear
– Effects
– Conversion


2.7.3 MPEG-1/2

MPEG-video requirements

Generic standard

• independence of particular application

• acceptable quality for bandwidth of 1.5 Mb/s (as with CD-ROM)


Applications

• digital storage media: low storage costs + sufficient bandwidth (MPEG-1: 600 MB/h = 1.5 Mb/s, MPEG-2: 1.8–4 GB/h = 0.5–1.1 MB/s)

– CD-ROM: 1.5 Mb/s
– DVD: 1.1 MB/s
– harddisc: ≥ 3 MB/s

• asymmetric applications: frequent decompression, compression only once

– electronic publishing

∗ education and training

∗ travel guidance

∗ videotext

∗ points of sale

– games
– entertainment

• symmetric applications: equal use of compression and decompression

– electronic publishing production
– video mail
– videotelephone
– video conferencing


Features of the compression algorithm

• random access

– access to any frame
– access time ≤ 0.5 s
– access points: information unit coded without reference to other units

• fast forward/reverse searches

– scan compressed bit stream
– display selected pictures

• reverse playback

– for specific applications only
– possible without extreme memory requirements

• audio-visual synchronization

– permanent resynchronization of audio and video
– integration of multiple audio and video signals

• robustness to errors

• coding/decoding delay (limited according to specific application), videotelephone: 150 ms

• editability: possibility of constructing short editing units

• format flexibility

– raster size
– frame rate

• cost tradeoffs

– decoding with small chipsets
– real time encoding possible (1990)


Overview of the MPEG compression algorithm

quality requirements
→ high compression rate
→ interframe encoding

random access requirements
→ intraframe coding


MPEG-1:

• block-based motion compensation for temporal redundancy reduction

– causal (predictive) coding: P frames
– noncausal (interpolative) coding: B frames

• DCT-based spatial redundancy coding (as in JPEG)


Temporal redundancy reduction

frame types:

• intra-frames (I)

– access points for random access
– moderate compression

• prediction frames (P)

– coded with reference to a past (I or P) frame
– used as reference for future P frames

• interpolation (bidirectional prediction) frames (B)

– reference to a past and a future (I or P) frame
– never used as reference

reference always uses motion prediction

ratio I:P:B frames is application-specific


Forward prediction

[Figure: display order, frames 1 . . . 9 = I B B B P B B B I; forward prediction from the I frame to the P frame, bidirectional prediction for the B frames]

transmission order: I P B B B I B B B
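The reordering follows from the fact that a B frame can only be decoded once both of its reference frames have arrived. A minimal sketch (the function name is illustrative, and appending trailing B frames at the end is an assumption for sequences that do not close with a reference frame):

```python
def transmission_order(display_order):
    """Reorder frame types so that each reference frame precedes the B frames that depend on it."""
    out = []
    pending_b = []
    for frame in display_order:
        if frame == "B":
            pending_b.append(frame)      # wait for the next I or P frame
        else:
            out.append(frame)            # send the reference frame first
            out.extend(pending_b)        # then the B frames displayed before it
            pending_b = []
    out.extend(pending_b)                # assumption: leftover B frames go last
    return out


sent = "".join(transmission_order("IBBBPBBBI"))
```

For the display order I B B B P B B B I this yields the transmission order stated above.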

[Figure: encoding of an I frame: colour space converter (RGB → YUV) → FDCT → quantization → entropy encoder → compressed image data; encoding of a P/B frame: colour space converter (RGB → YUV) → motion estimator (using the reference frame) → error terms → FDCT → entropy encoder → compressed image data]

motion compensation

[Figure: block-matching technique. Block B in the current frame is predicted from its best match A in the previous frame and/or its best match C in the future frame: 1. Block B = Block A, 2. Block B = Block C, 3. Block B = (Block A + Block C)/2]

• prediction

– local modelling of current picture as translation of picture at some previous time
– locality: amplitude and direction of displacement may vary over the picture

• interpolation

– improves random access
– reduces effect of errors
– increases image quality


multiresolution technique:

– subsignal with low temporal resolution (1/3 . . . 1/2 frame rate)
– full-resolution signal = interpolation of low-resolution signal + correction term
– interpolation uses combination of past and future references (bidirectional)


bidirectional prediction

advantages:

• deals properly with areas not covered by prediction

• noise reduction by averaging between past and future reference frames

• allows decoupling between prediction and coding (no error propagation)

• trade-off due to frequency of B frames: more B frames
→ lower correlation of B frames with references,
→ lower correlation between references
typically: 10 B frames per second
e.g. I B B P B B P B B . . . I B B P B B


motion representation in B frames

macroblock: 16 * 16 pixels

predictor of a macroblock depends on reference frames:

x: coordinate of picture element

m_{j1}: motion vector relative to reference frame I_j (motion estimation information)

prediction modes (Î1 denotes the predictor of I1):

macroblock type       predictor
intra                 Î1(x) = 128
forward predicted     Î1(x) = I0(x + m01)
backward predicted    Î1(x) = I2(x + m21)
average               Î1(x) = 0.5[I0(x + m01) + I2(x + m21)]

prediction error in each case: I1(x) − Î1(x)


motion estimation

computation of motion vectors: not specified in MPEG standard

typically: block-based matching technique, combined with cost function

Ic current frame

Ir reference frame

Mi macroblock in Ic

vi displacement of Mi w.r.t. Ir

V search range of possible motion vectors

D cost function

optimal displacement:

$v_i^* = \arg\min_{v \in V} \sum_{x \in M_i} D\left( I_c(x) - I_r(x - v) \right)$

(V , D chosen by implementation)
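Since the standard leaves V and D open, the following sketch picks one common combination as an assumption: an exhaustive search over a square range of ±`search` pixels and the absolute difference as cost function D. Frames are plain nested lists of grey values, and all names are illustrative.

```python
def best_displacement(cur, ref, top, left, size, search):
    """Full-search block matching for the macroblock at (top, left) in the current frame.

    Returns ((vy, vx), cost) minimizing sum |Ic(x) - Ir(x - v)| over the block.
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            # Skip candidates whose reference block would leave the frame.
            if not (0 <= top - vy and top - vy + size <= height and
                    0 <= left - vx and left - vx + size <= width):
                continue
            cost = sum(abs(cur[y][x] - ref[y - vy][x - vx])
                       for y in range(top, top + size)
                       for x in range(left, left + size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (vy, vx), cost
    return best, best_cost


def frame_with_patch(top, left, size=2, value=200, dim=8):
    """Helper for the example: black frame with one bright square patch."""
    frame = [[0] * dim for _ in range(dim)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame


reference = frame_with_patch(2, 2)
current = frame_with_patch(3, 4)       # patch moved 1 row down, 2 columns right
vector, cost = best_displacement(current, reference, top=3, left=4, size=2, search=2)
```

With the sign convention of the formula, the patch that moved down by one row and right by two columns is found at displacement (1, 2) with cost 0.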


Spatial redundancy reduction

fixed JPEG variant:

• 8 bits per pixel

• 1 luminance component, 2 chrominance components

• fixed DCUs: macroblock with 16*16 luminance pels, 8*8 chrominance pels

• Huffman entropy coding

• sequential encoding


Layered structure, syntax and bit stream

goals

• genericity

• flexibility: video sequence parameters

– picture width
– picture height
– pixel aspect ratio
– frame rate
– bit rate
– buffer size

• efficiency


layered syntax

• sequence layer (random access unit: context)

• group of frames layer (random access unit: video coding)

• frame layer (primary coding unit)

• slice layer (resynchronizing unit)

• macroblock layer (motion compensation unit)

• block layer (DCT unit)


bit stream

• bit sequence consistent with syntax

• video buffer constraints

• decoding process

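[Figure: decoder. Buffer → demultiplexer (MUX^-1) → inverse quantization (Q^-1) → IDCT → adder, which combines the result with the prediction from the reference frames (Ref) selected by macroblock type and motion vectors]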


Standard and quality

Conformance: encoder and decoder

• bit stream and decoding process: standard defines syntax and meaning

• encoders and decoders: standard defines decoding process


Resolution, bit rates and quality

VHS-like quality at 1.2 Mb/s

constrained parameter bit streams (CPB):

• horizontal size ≤ 720 pels

• vertical size ≤ 576 pels

• max. # macroblocks/picture ≤ 396

• max. # macroblocks/second ≤ 396·25 = 330·30

• frame rate ≤ 30 frames/second

• bit rate ≤ 1.86 Mb/second

• decoder buffer ≤ 376832 bits

CIF format: 352*240, 30 Hz / 384*288, 25 Hz; yields 1.2–3 Mbps

CIF format often mixed up with MPEG-1, but: MPEG-1 allows frame sizes up to 4096*4096!


MPEG-2

for wider range of applications and higher bandwidth

• backward compatibility to MPEG-1

• support for interlaced video

• improvements on coding efficiency

• multiresolution video

• multichannel audio

typical frame sizes (in kbits):

                      Picture type
              Mbps    I     P     B     Avg.
MPEG-1 SIF    1.15    150   50    20    38
MPEG-2 601    4.00    400   200   80    130


2.7.4 MPEG-4

Content-based interactivity

• Content-based multimedia data access tools
• Content-based manipulation and bit-stream editing
• Hybrid natural and synthetic data coding
• Improved temporal random access

Compression

• Improved coding efficiency
• Coding of multiple concurrent data streams

Universal access

• Robustness in error-prone environments
• Content-based scalability


Basic concepts

AV objects:

• video object component

• audio object component

video object plane (VOP): 2D video object

“frame” may consist of

• Only 1 VOP (2D)

• 2 or more mutually disjoint VOPs, resulting from the segmentation of a 2D scene

• 2 or more VOPs, resulting from the composition of the scene from several sources

possible object manipulations:

• change of the spatial position of an object (VOP) in the scene

• application of a spatial scaling factor to an object in the scene

• change of the ‘speed’ with which an object moves in the scene

• inclusion of objects (VOPs) available at the composer but not currently in display

• deletion of an object in the scene

• change of the scene area being displayed


a scene with three AVOs:

a scene before transformation:

a scene after the receiver transformation:


2.8 MPEG-7

2.8.1 Introduction

content description for audio-visual data


Terminology

Data

• audio-visual information,• described by MPEG-7,• regardless of storage, coding, display, transmission,

medium, or technology.

Feature: distinctive characteristic of the Data

Descriptor: representation of a Feature; defines the syntax and the semantics of the Feature representation

Descriptor Value: an instantiation of a Descriptor for a given data set

Description Scheme (DS): specifies structure and semantics of relationships between Descriptors and/or Description Schemes

Description: DS (structure) + Descriptor Values (instantiations) describing Data

Coded Description: Description encoded for reasons of compression efficiency, error resilience, random access, etc.

Description Definition Language (DDL): language allowing for the creation / extension / modification of Description Schemes and Descriptors
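How these terms relate can be sketched with a few small classes. This is a hedged illustration only: the class layout, the field names, and the example names `DominantColor` and `StillImage` are assumptions made for the sketch, not MPEG-7 definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Descriptor:
    """Defines syntax and semantics of a Feature representation."""
    name: str
    feature: str
    value_type: type


@dataclass
class DescriptionScheme:
    """Structure relating Descriptors and nested Description Schemes."""
    name: str
    descriptors: list = field(default_factory=list)
    sub_schemes: list = field(default_factory=list)


@dataclass
class Description:
    """A Description Scheme plus the Descriptor Values for one data set."""
    scheme: DescriptionScheme
    values: dict = field(default_factory=dict)

    def set_value(self, descriptor, value):
        if descriptor not in self.scheme.descriptors:
            raise KeyError(f"{descriptor.name} is not part of {self.scheme.name}")
        if not isinstance(value, descriptor.value_type):
            raise TypeError(f"{descriptor.name} expects {descriptor.value_type.__name__}")
        self.values[descriptor.name] = value


dominant = Descriptor("DominantColor", feature="color", value_type=tuple)
scheme = DescriptionScheme("StillImage", descriptors=[dominant])
description = Description(scheme)
description.set_value(dominant, (255, 0, 0))  # a Descriptor Value
print(description.values)
```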


Abstract architecture of MPEG-7 applications


MPEG-7 parts

Systems: tools for

• transport,
• storage,
• synchronization between content and descriptions,
• managing and protecting intellectual property

Description Definition Language: language for defining new Description Schemes + Descriptors

Audio: Descriptors and Description Schemes dealing with (only) Audio descriptions

Visual: Descriptors and Description Schemes dealing with (only) Visual descriptions

Generic entities and Multimedia Description Schemes: Descriptors and Description Schemes dealing with generic features and multimedia descriptions

Reference Software: software implementation of relevant parts of the MPEG-7 Standard

Conformance: guidelines and procedures for testing conformance of MPEG-7 implementations


2.8.2 MPEG-7 systems

tools for

• transport,

• storage,

• synchronization between content and descriptions,

• managing and protecting intellectual property

– to be defined in the future –


2.8.3 MPEG-7 Description Definition Language (DDL)

language for defining new Description Schemes + Descriptors

requirements:

• express spatial, temporal, structural, and conceptual relationships

• rich model for links and references between descriptions and the data

• validation of descriptor data types

• platform and application independent

• human- and machine-readable

→ based on XML syntax


XML Schema Overview

• XML Schema:

– datatypes
– simple and complex types
– elements
– inheritance, abstract types

• MPEG-7 Extensions:

– array and matrix datatypes
– enumerated datatypes for MIME type, country code, region code, currency code and character set code
– typed references


2.8.4 MPEG-7 Audio

Audio description tools for

• Sound effects description
• Instrument description
• Speech Recognition description

Audio Descriptor Framework: low-level audio description


2.8.5 MPEG-7 Video

• Color

• Texture

• Shape

• Motion


Color Descriptors

Color space: RGB, YUV, HSV, HMMD

Dominant color(s)

Color Histogram

Color Quantization

GoF/GoP Color Histogram: Group of Frames/Group of Pictures color histogram (average / median / intersection)

Color-Structure Histogram: local co-occurrence of colors

Color Layout: spatial distribution of color

Haar transformed Binary Histogram: compact descriptor for color (63 bits)
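A color histogram and its aggregation over a group of frames can be sketched as follows. This is a simplified illustration, not the normative MPEG-7 extraction: it assumes 8-bit RGB pixels, uniform quantization, and that the "intersection" histogram takes the bin-wise minimum over the frames. The function names are hypothetical.

```python
from statistics import median


def color_histogram(pixels, bins_per_channel=2):
    """Normalized histogram of 8-bit RGB pixels with uniform quantization."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    step = 256 // bins_per_channel

    def quantize(value):
        return min(value // step, bins_per_channel - 1)

    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]


def gof_histogram(frame_histograms, mode="average"):
    """Aggregate per-frame histograms bin by bin."""
    columns = list(zip(*frame_histograms))
    if mode == "average":
        return [sum(col) / len(col) for col in columns]
    if mode == "median":
        return [median(col) for col in columns]
    if mode == "intersection":  # assumed here: bin-wise minimum
        return [min(col) for col in columns]
    raise ValueError(f"unknown mode: {mode}")


red_frame = [(255, 0, 0)] * 4
mixed_frame = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
frames = [color_histogram(red_frame), color_histogram(mixed_frame)]
gof_average = gof_histogram(frames, "average")
gof_intersection = gof_histogram(frames, "intersection")
```

The intersection variant keeps only the colors present in every frame, whereas the average is more sensitive to colors that appear briefly.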


Texture Descriptors

Luminance Edge Histogram: spatial distribution of four directional edges and one non-directional edge

Homogeneous Texture Descriptors: 2 descriptors:

1. structuredness, directionality and coarseness
2. quantitative description (62 factors)


Shape Descriptors

1. Object Bounding Box

2. Region-Based Shape

3. Contour-Based Shape


Motion Descriptors

• Camera Motion

• Object Motion Trajectory


2.8.6 MPEG-7 Multimedia Description Schemes


Content Management


Navigation & Access

Summary: efficient support of browsing

• Hierarchical: coarse to fine
• Sequential: 1D temporal structure

Variation: substitution of the original content

• Adaptation to terminal, network, or user preferences


MMDS: elements and functionality

Creation & Production: meta information describing creation and production of the content

• title,
• creator,
• classification,
• purpose of the creation,
• etc.

Usage: meta information related to the usage of the content:

• rights holders,
• access right,
• publication,
• financial information.

Media: description of storage media:

• storage format,
• encoding of the AV content,
• identification of the media


Structural aspects: description structured around segments

• physical spatial, temporal or spatio-temporal component of the AV content

• signal-based features (color, texture, shape, motion, audio features) + elementary semantic information

Conceptual aspects: description from the conceptual viewpoint (under development)


2.9 Other media

2.9.1 Music

Temporal

• Representation

– Operational versus symbolic
– MIDI
– SMDL: Standard Music Description Language (SGML)

• Operations

– Playback and synthesis
– Timing
– Editing and composition


2.9.1.1 MIDI

(Musical Instrument Digital Interface)

defines interface between electronic music instruments and computers

compact representation of music data (≈ 0.3 kB/sec, vs. 176 kB/sec for CD audio)

basic idea: coding comprises

• name of instrument,

• start/end of note,

• base frequency,

• volume
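The CD audio figure and the event-based coding idea can be checked with a few lines. The sketch below assumes standard CD parameters (44.1 kHz, 2 channels, 16 bit) and uses the common MIDI channel messages Program Change (instrument), Note On and Note Off, where the key number stands for the base frequency and the velocity for the volume. The function names are hypothetical.

```python
def cd_audio_bytes_per_second(sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Uncompressed CD audio data rate in bytes per second."""
    return sample_rate * channels * bytes_per_sample


def _check(channel, *data):
    if not 0 <= channel <= 15 or any(not 0 <= d <= 127 for d in data):
        raise ValueError("value out of MIDI range")


def program_change(channel, instrument):
    """Select one of the 128 instruments on a channel (2 bytes)."""
    _check(channel, instrument)
    return bytes([0xC0 | channel, instrument])


def note_on(channel, note, velocity):
    """Start of a note (3 bytes): key number and velocity (volume)."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """End of a note (3 bytes)."""
    _check(channel, note)
    return bytes([0x80 | channel, note, 0])


def note_frequency(note, a4_hz=440.0):
    """Equal-tempered base frequency of a key number; key 69 is A4."""
    return a4_hz * 2 ** ((note - 69) / 12)


print(cd_audio_bytes_per_second())   # bytes per second for CD audio
print(note_on(0, 60, 100).hex())     # middle C on the first channel
```

A note therefore costs a handful of bytes regardless of how long it sounds, which is where the large gap to sampled audio comes from.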


MIDI: model

• 16 channels for data transmission

– each channel corresponds to a synthesizer
– several instruments can play different notes at the same time
– 3–16 simultaneous notes per channel (subject to quality of synthesizer)

• 128 instruments

– including sound effects (e.g. telephone, helicopter)
– addressed by unique number 0–127

• MIDI-clock

– allows for synchronization between sender and receiver

– 24 ticks per quarter note

• SMPTE time code as alternative to MIDI-clock

– SMPTE = Society of Motion Picture and Television Engineers

– SMPTE defines format: hours:minutes:seconds:frames (e.g. 30 frames/sec)
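Both timing schemes reduce to simple arithmetic. A minimal sketch with hypothetical helper names, assuming a tempo given in quarter notes per minute and plain (non-drop-frame) counting for the time code:

```python
def midi_clock_interval(bpm, ticks_per_quarter=24):
    """Seconds between two MIDI clock ticks at the given tempo."""
    return 60.0 / (bpm * ticks_per_quarter)


def smpte_timecode(frame_count, fps=30):
    """Format a frame count as hours:minutes:seconds:frames."""
    seconds, frames = divmod(frame_count, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(midi_clock_interval(120))   # tick spacing in seconds at 120 bpm
print(smpte_timecode(108_015))    # one hour plus 15 frames at 30 frames/sec
```

The MIDI clock is tempo-relative (its tick spacing changes with the tempo), while the SMPTE time code is an absolute position, which is why it suits synchronization with film and video.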


MIDI infrastructure

• components

– input: typically via keyboard (like a piano); different instruments can be imitated

– output: typically via synthesizer (transforms the stored digital signal via a D/A converter into an acoustic signal)

– sequencer as editor for MIDI data; user interface: notes / technical MIDI data

• multimedia applications based on MIDI data allow for instantaneous output via synthesizer

– MIDI requires precise timing of data transmission


2.9.2 Graphics

Non-temporal

• Representation

– Geometric models (used in GKS, PHIGS, PEX)
– Solid models: constructive solid geometry, surfaces of revolution, extrusion
– Physically based models (considering mass, velocity, rigidity)
– Empirical models: fractals, particle systems
– Drawing models: PostScript, LOGO graphics
– External formats for models: CGM, RenderMan Interface Bytestream (RIB)

• Operations

– Primitive editing: for objects, of vertex coordinates, surface normals

– Structural editing: creating, modifying, spatial relationships

– Shading: flat, Gouraud, Phong, ray tracing, radiosity, programmable shaders

– Mapping: texture mapping, bump mapping, displacement mapping, environment mapping, shadow mapping

– Lighting: ambient light, point lights, directional lights, spot lights

– Viewing: 2D or 3D, parallel and perspective projections

– Rendering: converts a model (shading, lighting, viewing info) into an image
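The difference between the two projection types named under viewing can be shown on a single point. A minimal sketch, assuming a viewer at the origin looking along the positive z axis; the function names and the `focal_length` parameter are illustrative choices.

```python
def parallel_projection(point):
    """Orthographic projection onto the z = 0 plane: the depth is dropped."""
    x, y, _z = point
    return (x, y)


def perspective_projection(point, focal_length=1.0):
    """Project onto the plane z = focal_length as seen from the origin."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the viewer (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


point = (2.0, 4.0, 4.0)
print(parallel_projection(point))     # size independent of distance
print(perspective_projection(point))  # shrinks with distance
```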


2.9.3 Animation

Temporal

• Representation

– Cel models: celluloid sheets
– Scene-based models
– Event-based models
– Keyframes
– Articulated objects and hierarchical models
– Scripting and procedural models
– Physically based and empirical models

• Operations

– Graphics
– Motion and parameter control
– Animation rendering
– Animation playback
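Keyframe representation and parameter control can be illustrated by interpolating a parameter between keyframes. The sketch below assumes linear interpolation and keyframes with distinct, ascending times; real systems often use splines instead. The function name is hypothetical.

```python
from bisect import bisect_right


def interpolate(keyframes, t):
    """Linearly interpolate a parameter between (time, value) keyframes.

    keyframes must be sorted by time with distinct times; values outside
    the covered range are clamped to the first or last keyframe.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


height = [(0.0, 0.0), (2.0, 10.0), (4.0, 0.0)]  # an object rising and falling
samples = [interpolate(height, t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
print(samples)
```

Only the keyframes are stored; every in-between frame is computed at playback or rendering time.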


Other Media

• Media type Other: Extended Images

• Media type Other: Digital ink

• Media type Other: Speech audio

• Media type Temporal: Animation


2.10 Multimedia

• MHEG

• SMIL


2.10.1 MHEG

standard for interoperability and interchange of multimedia and hypermedia (MH) objects

application areas:

• training and education

• documentation

• electronic books

• computer-supported multimedia cooperative work

• point of information

• medical applications

issues:

• association of content and presentation attributes

• synchronization in space and time

• linking between components


2.10.1.1 The MHEG Standard

object: coded representation of an independent and elementary unit of information

objects interchanged and handled by applications

types of objects:

• monomedia

– text
– graphics
– image
– audio
– video
– menu

• aggregated objects: different media, with internal synchronization and links

input/output objects


Specificity of the MHEG Standard Scope

• interactivity and multimedia synchronization

• real-time presentation

• real-time interchange

• final form presentations


2.10.1.2 MH Objects Classes

Object Orientation

advantages of object-orientation:

• data encapsulation

• inheritance

• homogeneity of MH object descriptions

• representation of behaviour (autonomous objects in highly dynamic environment)


Representation of MH Objects

content object: encoded monomedia data + decoding and presentation information

projector object: presentation attributes for content or composite object

basic object: content + projector object

composite object: set of MH objects + temporal and spatial interobject relations

conditional action set object: defines relations based on conditions

generic input object: defines selection + text input methods


MH Object Classes

object hierarchy:

MH object

• all-object

• clock

• null


• all-object

– output content

∗ text content

∗ graphics content

∗ still picture content

∗ audio content

∗ audiovisual sequence content

– generic input

∗ action-button

∗ stay-on-button

∗ on-off button

∗ menu selection

∗ multiple selection

∗ etc . . .

– projector

∗ area projector

· text projector

· graphics projector

· still picture projector

· input projector

∗ audio projector

∗ audiovisual projector

– basic

– spatio-temporal composites

– conditional action set


2.10.1.3 Methodology for MH Object Classes Description

4 levels:

1. informal text description

2. object-oriented definition

• class hierarchy
• class behaviour
• structure and semantics of attributes

3. notation of structure of presentation: ASN.1 syntax (Abstract Syntax Notation One)

4. coded object representation (ASN.1)


2.10.1.4 Basic Objects Representation

basic object = content + projector object

content class:

• general attributes (inherited from superclass)

• specific attributes for encoding parameters

projector class: parameters relevant for presentation

• area projector: position + area size

• audio projector: volume, stereo/mono, balance, speed
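The split between content and projector can be sketched with plain data classes. This is an illustration only: the class and field names are hypothetical and do not reproduce the ASN.1 definitions of the standard.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Content:
    """Encoded monomedia data plus its encoding parameters."""
    media_type: str
    coding_method: str
    data: bytes


@dataclass
class AreaProjector:
    """Presentation parameters for visual content: position + area size."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class AudioProjector:
    """Presentation parameters for audio content."""
    volume: float = 1.0
    stereo: bool = True
    balance: float = 0.0
    speed: float = 1.0


@dataclass
class BasicObject:
    """basic object = content + projector object"""
    content: Content
    projector: Union[AreaProjector, AudioProjector]


picture = Content("still picture", "JPEG", b"...")  # placeholder data
thumbnail = BasicObject(picture, AreaProjector(x=0, y=0, width=80, height=60))
full_view = BasicObject(picture, AreaProjector(x=0, y=0, width=640, height=480))
print(thumbnail.content is full_view.content)
```

Keeping presentation parameters out of the content object lets the same encoded data be presented in several ways without duplicating it.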


example: still picture object class (object-oriented definition)

description:
inherits from = content class
inherited by = NONE

representation (notation of structure):

• coding method

• coding parameters

• JPEG-parameters

• Huffman/arithmetic

• progressive/sequential

• color space

• source pixel density

• source data precision

• source image format


2.10.1.5 Composite Objects: Multimedia Synchronization

General Considerations

synchronization modes:

• script defined by a using application

• system synchronization already provided within the object (e.g. MPEG)

• spatiotemporal synchronization provided by composition of child objects within parent object

• conditional synchronization provided by management of events generated by

– other objects,
– user’s interaction, or
– a using application


description of conditional synchronization

condition: combination of event(s) + additional conditions

event:

• event type: start/end of object, elapsed time
• object id
• current state of object: running, stopped, selected

e.g. end, object n_i, state=running

additional condition: describes the context in which the event occurs
e.g. object n_j, state=stopped

action: to be performed when condition = true

conditional action set: set of (condition, action) pairs
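A conditional action set can be pictured as a list of (condition, action) pairs that is checked whenever an event arrives. The following sketch is an assumption-laden illustration: the class names, the `dispatch` function and the way object states are passed in are hypothetical, not part of MHEG.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    event_type: str  # e.g. "start", "end", "elapsed"
    object_id: str
    state: str       # state of the object, e.g. "running"


@dataclass
class ConditionalAction:
    event: Event                 # triggering event
    additional: dict             # context: object id -> required state
    action: Callable[[], None]   # performed when the condition is true


def dispatch(action_set, event, states):
    """Run every action whose event matches and whose context holds."""
    fired = 0
    for entry in action_set:
        context_ok = all(states.get(obj) == st for obj, st in entry.additional.items())
        if entry.event == event and context_ok:
            entry.action()
            fired += 1
    return fired


log = []
action_set = [
    ConditionalAction(
        event=Event("end", "n_i", "running"),
        additional={"n_j": "stopped"},
        action=lambda: log.append("start n_j"),
    )
]
end_of_ni = Event("end", "n_i", "running")
fired_when_stopped = dispatch(action_set, end_of_ni, {"n_j": "stopped"})
fired_when_running = dispatch(action_set, end_of_ni, {"n_j": "running"})
print(fired_when_stopped, fired_when_running, log)
```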


[Figure: timeline of a multimedia scenario with conditional synchronization. Labels: start of the multimedia scenario, fixed-delay input, alternate scenario, synchro-conditioning events (generated by the presentation process or by the user’s interaction), Picture, Sound, User response, Text 1, Text 2, S1, S2, T1, T2, Delay, Maxtime, END, MPEG sequence, Picture no. x, Graphics 1 (text and graphics on video), time axis]


Space and Time Relations

placement of objects in space and time,based on attributes:

spatial position

• parallel relation
• serial relation

[Figure: parallel spatial relation in the MH generic coordinate space; area sizes of objects A and B, with X1=1, Y1=2 and X2=3, Y2=1, and the MH generic space origin]

[Figure: serial spatial relation in the MH generic coordinate space; area sizes of objects A and B, with X1=1, Y1=3 and X2=2, Y2=-1, and the MH generic space origin]
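One way to read the two spatial relations is sketched below, using the coordinate values from the figures. It rests on an assumption that should be checked against the standard: in a parallel relation every position is measured from the common origin, while in a serial relation each position is measured from the position of the preceding object. The function name is hypothetical.

```python
def absolute_positions(offsets, relation):
    """Resolve (x, y) offsets of child objects into generic-space coordinates.

    parallel: every offset is measured from the common origin.
    serial:   every offset is measured from the previous object's position
              (an assumed interpretation).
    """
    if relation == "parallel":
        return list(offsets)
    if relation == "serial":
        positions, x, y = [], 0, 0
        for dx, dy in offsets:
            x, y = x + dx, y + dy
            positions.append((x, y))
        return positions
    raise ValueError(f"unknown relation: {relation}")


parallel = absolute_positions([(1, 2), (3, 1)], "parallel")
serial = absolute_positions([(1, 3), (2, -1)], "serial")
print(parallel, serial)
```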


temporal position

• parallel relation
• serial relation

[Figure: temporal parallel relation; parent object with child objects 1 and 2 and times t1, t2]

[Figure: temporal serial (or sequential) relation; parent object with child objects 1 and 2 and times t1, t2]
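The temporal relations can be sketched in the same spirit. The assumption here, labeled as such, is that t1 and t2 are offsets: in a parallel relation each offset counts from the start of the parent object, and in a serial relation it counts from the end of the preceding child. The function name and the (offset, duration) representation are hypothetical.

```python
def start_times(children, relation, parent_start=0.0):
    """Absolute start times for children given as (offset, duration) pairs.

    parallel: each offset counts from the start of the parent object.
    serial:   each offset counts from the end of the preceding child
              (an assumed interpretation).
    """
    if relation == "parallel":
        return [parent_start + offset for offset, _ in children]
    if relation == "serial":
        starts, cursor = [], parent_start
        for offset, duration in children:
            start = cursor + offset
            starts.append(start)
            cursor = start + duration
        return starts
    raise ValueError(f"unknown relation: {relation}")


children = [(1.0, 4.0), (2.0, 3.0)]  # (t1, duration), (t2, duration)
parallel_starts = start_times(children, "parallel")
serial_starts = start_times(children, "serial")
print(parallel_starts, serial_starts)
```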


General Framework for Spatiotemporal Composition Representation

representation of composite objects:

1. description of relationship in terms of position in time and space

2. list of component objects

component object:

• contained in the composite object, or
• referenced by application-provided instance number, or
• standardized reference to external object


2.10.1.6 Input Objects

buttons

• action-button: trigger, yields event
• stay-on-button: trigger + local boolean variable
• switch button: two-state input object

menu selection: yields number of selected item

multiple selection: yields indication of selected items

character string: character sequence + text attributes

location: yields horizontal + vertical coordinates

numerical value: yields integer between minimum and maximum, linearly related to cursor position
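The linear relation for the numerical value input can be written down directly. A minimal sketch, assuming the cursor position is available as a fraction between 0 and 1 along the input control; the function name is hypothetical.

```python
def numerical_value(cursor_fraction, minimum, maximum):
    """Map a cursor position in [0, 1] linearly onto an integer range."""
    fraction = min(max(cursor_fraction, 0.0), 1.0)  # clamp to the control
    return minimum + round(fraction * (maximum - minimum))


print(numerical_value(0.0, 10, 20))
print(numerical_value(0.5, 0, 100))
print(numerical_value(1.0, 10, 20))
```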


2.11 SMIL

Synchronized Multimedia Integration Language(W3C standard)

motivation:

• spatio-temporal composition of presentations

• declarative specification

• text-based format

• specified as XML-DTD

• non-interactive presentations only! (except via linking)


SMIL concepts

• media objects referenced via URIs

• spatial and temporal addressing by means of intervals and regions

• all objects in a single root window

• Z index for layer ordering for visual display

• specification of temporal synchronization

• hard and soft synchronization:

hard: for audio-video synchronization, limited jitter
soft: for background music; only fixed starting time

• alternative content for different presentation quality/output devices

• flexible linking model

• semantic annotations


SMIL example

<smil>
  <head>
    <meta name="Title" content="Welcome to RealPlayer" />
    <meta name="Author" content="RealNetworks" />
    <meta name="Copyright" content="(c) Real" />
    <layout>
      <root-layout height="300" width="350"
                   background-color="black" />
      <region id="full_screen" left="0" top="0"
              height="300" width="350" fit="fill" z-index="1" />
    </layout>
  </head>
  <body>
    <par>
      <audio src="firstrun.rm" />
      <animation src="firstrun.swf" region="full_screen"
                 fill="freeze">
        <anchor href="command:openwindow(tutorial,
          http://ramhurl.real.com/g2install.html?
          file=tutorials/607/free/overview.smi)"
          coords="40,130,315,160" begin="14.9s" />
        <anchor href="command:openwindow(…,
          http://ramhurl.real.com//take5demo.smi)"
          coords="40,170,315,200" begin="14.9s" />
        <anchor href="command:openwindow(…,
          http://ramhurl.real.com/start.smi)"
          coords="40,205,315,235" begin="14.9s" />
      </animation>
    </par>
  </body>
</smil>
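Since SMIL is plain XML, the layout and the temporal structure of a presentation can be inspected with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a reduced version of the example (anchors omitted); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

SMIL = """
<smil>
  <head>
    <layout>
      <root-layout height="300" width="350" background-color="black" />
      <region id="full_screen" left="0" top="0"
              height="300" width="350" fit="fill" z-index="1" />
    </layout>
  </head>
  <body>
    <par>
      <audio src="firstrun.rm" />
      <animation src="firstrun.swf" region="full_screen" fill="freeze" />
    </par>
  </body>
</smil>
"""

root = ET.fromstring(SMIL.strip())

# Regions of the root window with their layer (z-index).
regions = {r.get("id"): int(r.get("z-index", "0")) for r in root.iter("region")}

# Media objects inside <par> are presented in parallel.
parallel_media = [(m.tag, m.get("src")) for m in root.find("body/par")]

print(regions)
print(parallel_media)
```

Here the audio and the animation share one `<par>` element, so they start together; the animation is placed in the `full_screen` region declared in the layout.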