Chapter 2
Media
• Media classification
• Requirements for media representations
• Coding and compression methods
• Text
• Images
• Audio
• Video
• Other media
2.1 Media Classification
2.1.1 Basic concepts
kinds of media:
• perception media
  how do humans perceive the information?
  – viewing: text, image, video
  – listening: music, sound, speech
  – touching
  – tasting
  – smelling
  – balance
• representation media
  how is the information coded?
  e.g. text in ASCII
• presentation media
  which devices are used for I/O to/from the computer?
  – input: keyboard, camera, microphone, mouse, dataglove
  – output: paper, monitor, loudspeaker
Universitat Dortmund, Informatik VI, N. Fuhr
• storage media
  where is information stored?
  microfilm, paper, floppy disc, harddisc, CD-ROM, DVD, tape
• communication media
  what is used for transmitting information?
  coax cable, twisted pair, FDDI, electromagnetic waves
• information exchange media
  what is used for exchanging information between different sites?
  paper, floppy disc, CD, microfilm
  (see also: communication media)
here: perception media
presentation space
each medium yields presentation value in presentation space

presentation value:
representation of information in the medium
e.g. text: sequence of characters
     speech: sound waves

dimensions of presentation space
• spatial dimensions (2-3)
• temporal dimension (1)

classification of media according to temporal dimension
• discrete: time independent
  e.g. text, graphics
• continuous (temporal): time dependent
  e.g. video, audio, sensor signals
[Figure: presentation space with spatial axes x, y and time axis t; labels: audio, image, video]
text is a linear medium
2.1.2 Data streams
required for continuous media
properties of data streams:
• classical:
  – asynchronous
  – synchronous: finite upper bound for end-to-end time difference
  – isochronous: finite upper bound for start and end time difference
• periodicity
  change of time interval for transmission of data packets
  – periodical
  – weakly periodical
  – aperiodical, e.g. for transmission of events
• variation of data rate for subsequent information units
  – uniform
  – weakly uniform: periodic variation of data volume per information unit
    e.g. MPEG: ratio of I:P:B frames
  – varying
• dependence of subsequent packets
  – dependent
  – independent (data stream with "holes")
• information units
  can be defined differently
  here: information unit = logical data unit
  different granularities possible
  e.g. video: pixel - raster - frame - clip - film
2.2 Requirements for media representations
• compression
• easy processing
• transmission (progressive mode)
• referencing/addressing
• logical structure
• layout specification
• attributes
• annotation
2.3 Coding and compression methods
goal: reduction of storage/bandwidth requirements
2.3.1 Classification of methods
• lossless
  exploitation of redundancy (entropy)
• lossy
  goal: minimum impact on presentation quality

types of coding methods:
• entropy coding
  – run-length coding
  – Huffman coding
  – arithmetic coding
• source coding
  – prediction: DPCM
  – transformation: FFT, DCT
• hybrid coding: JPEG, MPEG
2.3.2 Basic methods
2.3.2.1 Lossless coding methods
run-length coding
encoding of bytestreams
in case of frequent repetition of a byte:
run coded as escape byte + # occurrences + byte (here: escape byte "!")
runs shorter than 4 bytes stay unchanged, since a coded run already takes 3 bytes
ABCAAABBBBCCCCCD → ABCAAA!4B!5CD

zero suppression
special case of run-length coding,
only run length of special byte is coded
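
The run-length scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ESC`, `MIN_RUN`, `MAX_RUN`, `rle_encode` and `rle_decode` are hypothetical, the run length is stored as a single decimal digit (so runs are capped at 9), and the input is assumed not to contain the escape character itself.

```python
ESC = "!"      # escape symbol, as in the slide example (assumption: never occurs in the input)
MIN_RUN = 4    # a coded run costs 3 symbols, so shorter runs are copied unchanged
MAX_RUN = 9    # assumption: the count is stored as one decimal digit


def rle_encode(data: str) -> str:
    """Replace runs of MIN_RUN..MAX_RUN equal symbols by ESC + count + symbol."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < MAX_RUN:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out.append(f"{ESC}{run}{data[i]}")
        else:
            out.append(data[i] * run)
        i = j
    return "".join(out)


def rle_decode(code: str) -> str:
    """Invert rle_encode: expand every ESC + count + symbol triple."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == ESC:
            out.append(code[i + 2] * int(code[i + 1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return "".join(out)


print(rle_encode("ABCAAABBBBCCCCCD"))  # ABCAAA!4B!5CD
```

Zero suppression would be the same idea restricted to one designated symbol, which removes the need to store the symbol itself after the count.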
pattern substitution
replaces frequent patterns by single codes

frequently used: LZW (Lempel-Ziv-Welch)
uses adaptive table of predefined size
codes are pointers into the dictionary (typically 9-14 bits)

dictionary initialization:
character set = codes 0 ... 255

encoding:
sequential processing of input characters
1. if string is in table, append next char
2. if string is not in table:
   a) output last known string's code
   b) add new string to table
   c) start new string with char

example (input "ababcba"; ∆ = empty string / end of input):

Prefix  Suffix  New String  Output
∆       a       a           -
a       b       ab          97
b       a       ba          98
a       b       ab          -
ab      c       abc         256
c       b       cb          99
b       a       ba          -
ba      ∆       ba          257
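
The encoding steps can be traced with a small Python sketch. It is a simplified illustration, not a full codec: the function name `lzw_encode` is hypothetical, the table is allowed to grow without the size limit mentioned above, and codes are returned as plain integers instead of being packed into 9-14 bit words.

```python
def lzw_encode(text: str) -> list[int]:
    """LZW encoding with a dictionary initialised to the 256 single characters."""
    table = {chr(i): i for i in range(256)}   # codes 0 ... 255
    next_code = 256
    current = ""                              # the string matched so far (prefix)
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate               # step 1: string known, keep extending
        else:
            output.append(table[current])     # step 2a: output last known string's code
            table[candidate] = next_code      # step 2b: add new string to table
            next_code += 1
            current = ch                      # step 2c: start new string with char
    if current:
        output.append(table[current])         # flush the last string at end of input
    return output


print(lzw_encode("ababcba"))  # [97, 98, 256, 99, 257]
```

The printed codes are the Output column of the example table: "ab" is entered as 256 and "ba" as 257 while encoding, and both are reused later in the same input.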
statistical coding
• characters encoded with different # bits
• frequent characters with few bits,
  infrequent characters with more bits

a) Huffman coding
requires probability of occurrence for each character
minimizes # bits for average message
varying # bits for different characters
→ prefix property necessary (decoding without backtracking)
Huffman code example

byte  prob.  code
A     0.40   00
B     0.20   01
C     0.20   10
D     0.10   110
E     0.10   111

[Figure: Huffman code tree for A-E, edges labelled 0 and 1]

avg. # bits/character: 2.2
theoretical optimum:
$H = \sum_i p_i \cdot \mathrm{ld}\,\frac{1}{p_i} = 2.12$
algorithm for code development:
• order characters by decreasing probabilities
• repeat
  – select 2 lines with lowest probabilities
  – assign bit for distinction
  – join lines, form new line with sum of probabilities
  until 1 line left
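
The construction loop can be sketched in Python with a priority queue standing in for the sorted list of lines. The function name `huffman_code` and the tie-breaking counter are assumptions of this sketch; depending on how ties are broken, the individual code words may differ from the table in the example, but the average code length is the same.

```python
import heapq
from itertools import count
from math import log2


def huffman_code(probs: dict[str, float]) -> dict[str, str]:
    """Build a prefix code by repeatedly joining the two least probable lines."""
    tie = count()  # unique counter so that equal probabilities never compare the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # lowest probability
        p1, _, codes1 = heapq.heappop(heap)   # second lowest probability
        merged = {s: "0" + c for s, c in codes0.items()}        # assign bit for distinction
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))      # new line with summed probability
    return heap[0][2]


probs = {"A": 0.40, "B": 0.20, "C": 0.20, "D": 0.10, "E": 0.10}
code = huffman_code(probs)
avg_bits = sum(p * len(code[s]) for s, p in probs.items())
entropy = sum(p * log2(1 / p) for p in probs.values())
print(code)
print(round(avg_bits, 2), round(entropy, 2))  # 2.2 2.12
```

The gap between 2.2 and 2.12 bits is the price of assigning a whole number of bits to every character.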
E 0.13    T 0.09    A 0.08    O 0.08    N 0.07
R 0.065   I 0.065   H 0.06    S 0.06    D 0.04
L 0.035   C 0.03    U 0.03    M 0.03    F 0.02
P 0.02    Y 0.02    B 0.015   W 0.015   G 0.015
V 0.010   J 0.005   K 0.005   X 0.005   Q 0.0025
Z 0.0025
[Figure: Huffman tree built from the letter probabilities above; inner nodes are labelled with the summed probabilities, the root with 1.0]
b) arithmetic coding
• optimum coding (like Huffman),
  but assigns fractions of bits to single characters
• encodes character by considering preceding characters

idea:
assign each symbol unique interval ⊂ [0, 1]
(width = character probability)
character string = nesting of intervals
resulting interval represented as floating point number

code definition:
• fix symbol order
• assign disjoint ranges [l[s], h[s]) of [0, 1] to symbols s,
  width h[s] − l[s] = character probability

encoding of string s1, ..., sn:
b = l[s1]
t = h[s1]
for i = 2 to n do
    r = t − b
    t = b + r · h[si]
    b = b + r · l[si]
• output: arbitrary floating point number ∈ [b, t)
example for arithmetic coding
byte  prob.  range
A     0.40   [0.0, 0.4)
B     0.20   [0.4, 0.6)
C     0.20   [0.6, 0.8)
D     0.10   [0.8, 0.9)
E     0.10   [0.9, 1.0)
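
A minimal Python sketch of the interval nesting, using the ranges from the table. The names `RANGES`, `encode_interval` and `decode`, and the sample message "BAC", are assumptions for illustration. Exact fractions replace floating point numbers to avoid rounding, and the loop starts from [0, 1) and processes every symbol, which gives the same result as initialising with the range of the first symbol.

```python
from fractions import Fraction as F

# Symbol ranges [l[s], h[s]) taken from the example table.
RANGES = {
    "A": (F(0), F(4, 10)),
    "B": (F(4, 10), F(6, 10)),
    "C": (F(6, 10), F(8, 10)),
    "D": (F(8, 10), F(9, 10)),
    "E": (F(9, 10), F(1)),
}


def encode_interval(message: str) -> tuple[F, F]:
    """Return the final interval [b, t); any number inside it encodes the message."""
    b, t = F(0), F(1)
    for s in message:
        low, high = RANGES[s]
        r = t - b              # width of the current interval
        t = b + r * high       # new top uses the old bottom, so update t first
        b = b + r * low
    return b, t


def decode(value: F, n: int) -> str:
    """Recover n symbols from a number inside the final interval."""
    out = []
    for _ in range(n):
        for s, (low, high) in RANGES.items():
            if low <= value < high:
                out.append(s)
                value = (value - low) / (high - low)   # rescale to [0, 1)
                break
    return "".join(out)


b, t = encode_interval("BAC")
print(float(b), float(t))   # 0.448 0.464
print(decode(b, 3))         # BAC
```

The width of the final interval, 0.016, is the product of the three symbol probabilities 0.2 · 0.4 · 0.2, which is why probable messages need fewer digits.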
transformation coding
• transforms values into different mathematical space
  (which is suited better for coding)
• examples:
  discrete cosine transform (DCT)
  fast Fourier transform (FFT)
prediction / relative coding
encodes only differences between subsequent bytes/blocks
examples:
• integers
  5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60
  encode differences:
  5, 3, 4, 1, 2, 3, 5, 5, 1, 11, 20
  → smaller # bits/entry required
• images:
  homogeneous area → small differences between neighbouring pixels
  → many 0 differences → zero suppression/run-length encoding
• still video
  small differences between subsequent images
  (e.g. in background)
• audio:
  differential pulse code modulation (DPCM):
  encoding of differences between subsequent PCM values
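
The integer example can be reproduced with a short Python sketch; the function names `delta_encode` and `delta_decode` are hypothetical. The first value is kept as it is and every later entry stores only the difference to its predecessor.

```python
from itertools import accumulate


def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store differences between subsequent values."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Running sums restore the original values."""
    return list(accumulate(deltas))


values = [5, 8, 12, 13, 15, 18, 23, 28, 29, 40, 60]
print(delta_encode(values))  # [5, 3, 4, 1, 2, 3, 5, 5, 1, 11, 20]
```

The largest value drops from 60 to 20, so 5 bits per entry suffice here instead of 6.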
adaptive coding
• other coding methods:
  suitable only in typical contexts
  non-typical byte sequence → no compression
• adaptive methods
  – adapt to specific context
  – but require additional transmission of coding parameters
2.3.2.2 Lossy coding methods
vector quantization
divides bytestream into blocks of n bytes
uses table with patterns,
block approximated by pattern
block encoded as index in pattern table
• linear quantization
• logarithmic quantization

subband coding
• transformation of certain frequencies only
• quality criterion: # bands
• used for speech, MPEG audio
wavelets
wavelet functions:
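• orthogonal basis of functions
• squared errors sum up

Haar basis:

$e(x) = \alpha_0 + \sum_{i=1}^{k} \sum_{j=1}^{2^{k-1}} \alpha_{ij} \cdot w_{ij}(x)$

$w_{ij}(x) = \begin{cases} 1, & \text{if } \frac{2j-2}{2^i} \le x < \frac{2j-1}{2^i} \\ -1, & \text{if } \frac{2j-1}{2^i} \le x < \frac{2j}{2^i} \\ 0, & \text{otherwise} \end{cases}$

derivation of e(x) for example function:

function values:            9  7  2  6
averages | differences:     8  4 | 1  -2
average  | difference:      6 | 2

$e(x) = 6 + 2 w_{11}(x) + 1 w_{21}(x) - 2 w_{22}(x)$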
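
The derivation of the coefficients can be sketched in Python as repeated pairwise averaging and differencing. The function names `haar_decompose` and `haar_reconstruct` are hypothetical, and the input length is assumed to be a power of two.

```python
def haar_decompose(values: list[float]) -> list[float]:
    """Return [alpha_0, alpha_11, alpha_21, alpha_22, ...] for 2**k input values."""
    coeffs: list[float] = []
    current = list(values)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        details = [(a - b) / 2 for a, b in pairs]
        coeffs = details + coeffs     # finer levels end up further to the right
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs: list[float]) -> list[float]:
    """Invert haar_decompose level by level."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
        pos += len(details)
    return current


print(haar_decompose([9, 7, 2, 6]))      # [6.0, 2.0, 1.0, -2.0]
print(haar_reconstruct([6, 2, 1, -2]))   # [9, 7, 2, 6]
```

The first output lists the coefficients of e(x) in the order α0, α11, α21, α22.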
example function and Haar basis
[Figure: example function on [0, 1] and the Haar basis functions w0, w11, w21, w22]
task

approximate f(x) by f′(x) such that
$\|f(x) - f'(x)\| < \varepsilon$, i.e. $\sum_x (f(x) - f'(x))^2 < \varepsilon$
where f′(x) is wavelet function and
$|\{\alpha_{ij} \mid \alpha_{ij} \neq 0\}| = \min$

solution

sort coefficients by $|\alpha_{ij}| \cdot 2^{-i/2}$
(gives order of increasing squared error)
find maximum n s.th. setting first n $\alpha_{ij} = 0$ yields
$\|f(x) - f'(x)\| < \varepsilon$

example:
$e(x) = 6 + 2 w_{11}(x) + 1 w_{21}(x) - 2 w_{22}(x)$

coefficients:
$(6, \alpha_0), (2, \alpha_{11}), (+1, \alpha_{21}), (-2, \alpha_{22})$

sorted by increasing squared error:
$(\frac{1}{2}, \alpha_{21}), (\frac{2}{2}, \alpha_{22}), (\frac{2}{\sqrt{2}}, \alpha_{11}), (6, \alpha_0)$
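
The selection of coefficients can be sketched as follows. This is an illustration under assumptions: the names `level`, `drop_order`, `reconstruct` and `compress` are hypothetical, coefficients are stored in the order α0, α11, α21, α22, ..., and the error is measured directly on the reconstructed sample values instead of being derived from the sort key.

```python
def level(index: int) -> int:
    """Level i of the coefficient at a list position (alpha_0 counts as level 0)."""
    return index.bit_length()


def drop_order(coeffs: list[float]) -> list[int]:
    """Positions sorted by |alpha_ij| * 2**(-i/2), smallest contribution first."""
    return sorted(range(len(coeffs)),
                  key=lambda n: abs(coeffs[n]) * 2 ** (-level(n) / 2))


def reconstruct(coeffs: list[float]) -> list[float]:
    """Evaluate the Haar expansion at the sample points."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
        pos += len(details)
    return current


def compress(coeffs: list[float], eps: float) -> list[float]:
    """Zero as many coefficients as possible while the squared error stays below eps."""
    original = reconstruct(coeffs)
    kept = list(coeffs)
    for n in drop_order(coeffs):
        trial = list(kept)
        trial[n] = 0
        error = sum((f - g) ** 2 for f, g in zip(original, reconstruct(trial)))
        if error >= eps:
            break
        kept = trial
    return kept


coeffs = [6, 2, 1, -2]          # alpha_0, alpha_11, alpha_21, alpha_22
print(drop_order(coeffs))       # [2, 3, 1, 0]
print(compress(coeffs, eps=3))  # [6, 2, 0, -2]
```

The drop order corresponds to α21, α22, α11, α0, the same sequence as in the sorted list of the example.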
Example:
non-Haar basis - squared errors do not sum up

$l \cdot a^2$
$l_2 b^2$
$l_1 a^2 + l_2 (a^2 + b^2 + 2ab)$

Haar basis - squared errors sum up

$l \cdot a^2$
$\frac{l}{2} b^2$
$\frac{l}{2} a^2 + \frac{l}{4} ((a + b)^2 + (a - b)^2)$
$= \frac{l}{2} a^2 + \frac{l}{4} (a^2 + 2ab + b^2 + a^2 - 2ab + b^2)$
$= l \cdot a^2 + \frac{l}{2} b^2$
2-dimensional case
standard wavelet decomposition
[Figure: transform rows, then transform columns]

nonstandard wavelet decomposition
[Figure: transform rows, transform columns]
example wavelet compression
[Figure: four image panels (a)-(d)]
a) original image
b) 19% of coefficients (5% error)
c) 3% of coefficients (10% error)
d) 1% of coefficients (15% error)
2.4 Text
2.4.1 Media type
Non-temporal: Text
• Representation
  – ASCII, ISO character sets
  – Marked-up, structured text
  – Hypertext
• Operations
  – Operations: character, string, language-specific
  – Editing, formatting
  – Pattern-matching and searching
  – Sorting
  – Compression
  – Encryption
2.4.2 SGML
markup language for text,
worldwide standard

markup approaches:
1. punctuation
2. layout (WYSIWYG)
3. procedural (Troff, TeX, LaTeX)
4. descriptive (GML, SGML)
5. referential (embed, include; SGML)
6. meta-markup
SGML standards
• SGML = ISO 8879,
  Standard Generalized Markup Language
• related standards:
  – ISO 10179: DSSSL,
    Document Style Semantics and Specification Language
    (layout specification language for SGML documents)
  – ISO 8613: ODA,
    Office Document Architecture
    (formatting, presentation, exchange)
    ODML: SGML-DTD for ODA documents
properties of SGML
SGML is
• markup language, database language
• extensible document description language
• meta language for the definition of document types

SGML supports
• logical structures, hierarchies
• linking and addressing of files
• multimedia and hypertext
Processing of SGML documents
[Figure: SGML parser processing documents Doc1, Doc2, Doc3 with DTD 1 and DTD 2; DSSSL specifications 1 and 2 lead to formatted documents and displayed documents]

• syntax checking (according to a DTD)
• printing according to a DSSSL specification
• presentation on a screen
  (according to a DSSSL specification)
• indexing for context-oriented search
• transformation into other representations
SGML markup
SGML supports 4 types of markup:
1. descriptive: tags
2. referential: references to objects
3. meta markup: markup declarations (DTD)
4. procedural: LINK, CONCUR
Descriptive markup
• SGML document consists of elements
  <author><first>John</first><last>Smith</last></author>
• element:
  1. start tag
  2. content
  3. end tag
• content: defined by content model
  (grammar production)
  – text (#PCDATA) or
  – sequence of elements
  → nesting of elements
• top level element: document
• start tag may have attributes
  (attribute-value pairs)
• document syntax defined in DTD (document type definition)
Example DTD
<!ELEMENT article - - (title, abstract, section+)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT abstract - o (#PCDATA)>
<!ELEMENT section - o
    ((title, body+) | (title, body*, subsectn+))>
<!ELEMENT subsectn - o (title, body+)>
<!ELEMENT body - o (figure | paragr)>
<!ELEMENT figure - o EMPTY>
<!ELEMENT paragr - o (#PCDATA)>

<!ATTLIST article author NAMES #REQUIRED
                  status (final | draft) draft>
<!ATTLIST figure file ENTITY #IMPLIED>

<!ENTITY file SYSTEM "/tmp/picture.ps" NDATA>
<!ENTITY amp "&">
Example document
<article status = "draft"
         author = "Cluet Christophides">
<title>From Structured Documents to ...</title>
<abstract>Structured Documents (e.g SGML) can
benefit from...
<section><title>Introduction</title>
<body><paragr>This Paper is organized as follows....
</body></section>
<section><title>SGML preliminaries</title>
<body><figure>
</article>
DTD syntax
element:
<!ELEMENT element name omitstart omitend production>

attribute list for elements:
<!ATTLIST element name attribute name domain default>

entities: (macro mechanism)
<!ENTITY ename value >
referencing: &ename;

DTDs
• define a class of documents
• specialize SGML for documents of a class
• contain an attribute grammar
• contain a nesting grammar
• support hierarchies by means of nesting
<!ELEMENT HTML O O HEAD, BODY --HTML document-->
<!ELEMENT HEAD O O TITLE>
<!ELEMENT TITLE - - #PCDATA>
<!ELEMENT BODY O O %content>
<!ENTITY % content
    "(%heading | %htext | %block | HR)*">
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ENTITY % htext "A | %text" --hypertext-->
<!ENTITY % text "#PCDATA | IMG | BR">
<!ELEMENT IMG - O EMPTY --Embed. image-->
<!ELEMENT BR - O EMPTY>
<!ENTITY % block "P | PRE">
<!ELEMENT P - O (%htext)+ --paragraph-->
<!ELEMENT PRE - - (%pre.content)+ --preform.-->
<!ENTITY % pre.content "#PCDATA | A">
<!ELEMENT A - - (%text)+ --anchor-->
<!ELEMENT HR - O EMPTY -- horizontal rule -->
<!ATTLIST A
    NAME CDATA #IMPLIED
    HREF CDATA #IMPLIED --link-->
<!ATTLIST IMG
    SRC CDATA #REQUIRED --URL of img--
    ALT CDATA #REQUIRED
    ALIGN (top|middle|bottom) #IMPLIED
    ISMAP (ISMAP) #IMPLIED>
HTML
• is an SGML document class (DTD)
• mixture of logical and layout tags
• no fixed DSSSL style sheet
  no possibility for transmission of style sheets

consequences:
• HTML is less flexible than SGML
• only minimum logical structuring possible
  (makes retrieval difficult)
• layout can be controlled only partially by document provider
2.4.2.1 DSSSL
language for describing layout of SGML documents
1. expression language (subset of Scheme)
2. style language for formatting
3. query language for retrieving document parts

[Figure: DSSSL processing model. Labels: SGML document, SGML declarations & DTDs, source document tree, transformation process (STTP), output document tree, SGML document, formatting process (STFP), output of formatter (SPDL or proprietary form); the DSSSL specification consists of an STTP spec and an STFP spec]
formatting
• input: SGML document + DTD + DSSSL stylesheet
• output: formatted document (format depends on processor)
  (e.g. TeX, RTF, Postscript, PDF)

formatting process:
• recursive processing of document according to DSSSL specification
• output: tree of flow objects
  flow object classes defined in DSSSL standard
  (e.g. page-sequence, paragraph, sequence)
example document
<!DOCTYPE FAQ SYSTEM "FAQ.DTD">
<FAQ>
<INFO>
  <SUBJECT> XML </SUBJECT>
  <AUTHOR> Lars Marius Garshol</AUTHOR>
  <EMAIL> [email protected] </EMAIL>
  <VERSION> 1.0 </VERSION>
  <DATE> 20.jun.97 </DATE>
</INFO>

<PART NO="1">
<Q NO="1">
  <QTEXT>What is XML?</QTEXT>
  <A>SGML light.</A>
</Q>

<Q NO="2">
  <QTEXT>What can I use it for?</QTEXT>
  <A>Anything.</A>
</Q>

</PART>
</FAQ>
DTD:
<!ELEMENT FAQ (INFO, PART+)>
<!ELEMENT INFO (SUBJECT, AUTHOR, EMAIL?,
                VERSION?, DATE?)>
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT AUTHOR (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT VERSION (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>

<!ELEMENT PART (Q+)>
<!ELEMENT Q (QTEXT, A)>
<!ELEMENT QTEXT (#PCDATA)>
<!ELEMENT A (#PCDATA)>

<!ATTLIST PART NO CDATA #IMPLIED
               TITLE CDATA #IMPLIED>
<!ATTLIST Q NO CDATA #IMPLIED>
style sheet
<!doctype style-sheet PUBLIC "-//James Clark//">

;--- DSSSL stylesheet for FAQML

;---Constants

(define *font-size* 12pt)
(define *font* "Times New Roman")

;---Element styles

(element FAQ
  (make simple-page-sequence
    font-family-name: *font*
    input-whitespace-treatment: 'collapse
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (process-children)))

(element INFO
  (make paragraph
    quadding: 'center
    space-after: (* *font-size* 1.5)
    (process-children)))

(element SUBJECT
  (make paragraph
    font-size: (* *font-size* 2)
    line-spacing: (* *font-size* 2)
    space-after: (* *font-size* 2)
    (process-children)))

(element AUTHOR
  (make sequence
    (process-children)
    (literal ", ")))

(element VERSION
  (make paragraph
    (make sequence
      (literal "Version: "))
    (process-children)))

(element DATE
  (make paragraph
    (make sequence
      (literal "Last modified: "))
    (process-children)))

(element PART
  (make paragraph
    font-size: (* *font-size* 1.5)
    line-spacing: (* *font-size* 2)
    (make sequence
      (literal (attribute-string "NO"
                                 (current-node)))
      (literal ". ")
      (literal (attribute-string "TITLE"
                                 (current-node))))
    (process-children)))

(element QTEXT
  (make paragraph
    font-weight: 'bold
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (make sequence
      (literal (attribute-string "NO"
                                 (parent (current-node))))
      (literal ". "))
    (process-children)))

(element A
  (make paragraph
    space-after: (* *font-size* 0.66667)
    font-size: *font-size*
    line-spacing: (* *font-size* 1.2)
    (process-children)))
2.4.3 XML
weaknesses of HTML
• mixture of logical and layout markup:
  – logical: TITLE, H1, MENU, P
  – layout: I, B; FONT, CENTER, BGCOLOR attributes
• lack of markup facilities for specific texts
  (e.g. math, chemistry)
• little internal structure of elements
XML vs. SGML
• complexity of SGML implementations
  → XML is simplified version of SGML
• weak support for different character sets in SGML
  → XML is based on Unicode
• SGML document not understandable without DTD
XML Standard
• markup language: XML
• linking language: XLink, XPointer
• formatting language: XSL/XSLT
2.4.3.1 XML language
simplification of SGML:
• start tag and end tag always must be present
• special form: combined start-end tag:
  e.g. <br/>, <img src="icon.gif"/>
• DTD not always required:
  well-formed XML: syntactically correct XML
  valid XML: XML-document satisfies specified DTD
• element names: case matters, underscore and colon in names allowed
• many special cases from SGML forbidden
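
The difference between SGML-style tag omission and well-formed XML can be checked with the XML parser from the Python standard library. This is a small sketch; the helper name `is_well_formed` is hypothetical, and the parser only tests well-formedness, it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML (syntax only, no DTD validation)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>line<br/>break</p>"))  # True: combined start-end tag
print(is_well_formed("<p>line<br>break</p>"))   # False: end tag of br is missing
print(is_well_formed("<P>text</p>"))            # False: case matters in element names
```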
DTD
<!ENTITY % xhtml SYSTEM "xhtml-1.0-strict.dtd" >
%xhtml;
<!ELEMENT project (projecttitle,
shortdesc,
logo*,
fieldofoperation,
timeperiod?,
contactpersons,
involvedpersons?,
sponsoredby?,
participatinginstitutes?,
description,
publicationlist?,
notes?,
doccreator) >
<!ELEMENT projecttitle (langtext+) >
<!ATTLIST projecttitle state (work|closed) "closed">
<!ATTLIST projecttitle workgroup (ir|issi) #REQUIRED>
<!ELEMENT shortdesc (langtext+) >
<!ELEMENT logo (#PCDATA) > <!-- image file name -->
<!ATTLIST logo align (left|right) #IMPLIED
width %Length; #IMPLIED
height %Length; #IMPLIED >
<!ELEMENT referenceno (#PCDATA) >
<!ELEMENT fieldofoperation (langtext+) >
<!ELEMENT sponsoredby (sponsor+) >
<!ELEMENT sponsor (langtext+ | weblink) >
<!ELEMENT timeperiod (langtext+ |
(startdate, enddate)) >
<!ELEMENT startdate (day, month, year) >
<!ELEMENT enddate (day, month, year) >
<!ELEMENT day (#PCDATA) >
<!ELEMENT month (#PCDATA) >
<!ELEMENT year (#PCDATA) >
<!ELEMENT contactpersons (personnel+) >
<!ELEMENT involvedpersons (personnel+) >
<!ELEMENT personnel (langtext+) >
<!ELEMENT participatinginstitutes (institute+) >
<!ELEMENT institute (langtext+ | weblink) >
<!ELEMENT description (langflow+) >
<!ELEMENT publicationlist (publication+) >
<!ELEMENT publication (langtext+) >
<!ELEMENT notes (langflow+) >
<!ELEMENT doccreator EMPTY>
<!ELEMENT weblink (url, linkdescription, langtext*)>
<!ELEMENT url (#PCDATA) >
<!ELEMENT linkdescription (langtext+) >
<!ELEMENT langtext %Inline; >
<!ELEMENT langflow %Flow; >
Example document
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE project SYSTEM
"/services/www/xml/dtd/project.dtd">
<project>
<projecttitle state="work" workgroup="ir">
<langtext>
MIND
</langtext>
</projecttitle>
<shortdesc>
<langtext>
Resource Selection and Data Fusion for
Multimedia International Digital Libraries
</langtext>
</shortdesc>
<logo align="right">mast2_sm.gif</logo>
<fieldofoperation>
<langtext>Information Retrieval</langtext>
</fieldofoperation>
<timeperiod>
<startdate>
<day>01</day>
<month>02</month>
<year>2001</year>
</startdate>
<enddate>
<day>31</day>
<month>07</month>
<year>2003</year>
</enddate>
</timeperiod>
<contactpersons>
<personnel>
<langtext>
<a href="/staff/members/nottelma.html">
Dipl.-Inform. Henrik Nottelmann</a>
</langtext>
</personnel>
</contactpersons>
<sponsoredby>
<sponsor>
<langtext>EU FP5</langtext>
</sponsor>
</sponsoredby>
<participatinginstitutes>
<institute>
<weblink>
<url>http://www.strath.ac.uk/</url>
<linkdescription>
<langtext>
University of Strathclyde
</langtext>
</linkdescription>
</weblink>
</institute>
<institute>
<weblink>
<url>http://ls6-www.informatik.uni-dortmund.de
</url>
<linkdescription> <langtext>
University of Dortmund
</langtext>
</linkdescription>
</weblink>
</institute>
</participatinginstitutes>
<description>
<langflow>
<p> This research addresses problems associated
with the emergence of thousands of
heterogeneous multimedia Digital libraries...
</p>
</langflow>
</description>
<notes>
<langflow >
<ul>
<li><a href="internal/index.html">
Internal pages</a></li>
</ul>
</langflow>
</notes>
<publicationlist>
<publication>
<langtext>
<a href="overview/mind-overview.html">
MIND Overview slides
</a>
</langtext>
</publication>
</publicationlist>
<doccreator/>
</project>
XSLT
transformation of XML documents
(e.g. from XML into HTML)

similar to DSSSL, but in XML syntax

XSLT-Stylesheet = frame + set of transformation rules
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/REC-html40">
<xsl:output method="html"/>
<xsl:template match="...">
...
</xsl:template>
...
</xsl:stylesheet>
Some XSLT elements
<xsl:template>
specifies a template rule
match attribute identifies source node(s) to which rule applies

<xsl:if>
test attribute specifies an expression:
if true, content template is instantiated

<xsl:choose>
selects one among a number of possible alternative child elements <xsl:when> and <xsl:otherwise>

<xsl:when>
if expression specified by test attribute is true, content template is instantiated

<xsl:text>
contains literal data to be included in the output
A small example
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE brief SYSTEM "brief.dtd">
<brief>
<anrede geschlecht="f" sozial="du">Nora</anrede>
<text>habe gerade den Ulysses beendet. Mal sehen,
wann der in den USA gedruckt werden darf...</text>
<gruss>J</gruss>
</brief>
Stylesheet
<xsl:template match="/">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="anrede">
<p>
<xsl:choose>
<xsl:when test="@sozial='du'">
<xsl:text>Liebe</xsl:text>
<xsl:if test="@geschlecht='m'">
<xsl:text>r</xsl:text>
</xsl:if>
<xsl:text> </xsl:text>
</xsl:when>
<xsl:when test="@sozial='sie'">
<xsl:choose>
<xsl:when test="@geschlecht='m'">
<xsl:text>Sehr geehrter Herr </xsl:text>
</xsl:when>
<xsl:when test="@geschlecht='f'">
<xsl:text>Sehr geehrte Frau </xsl:text>
</xsl:when>
</xsl:choose>
</xsl:when>
</xsl:choose>
<xsl:apply-templates/>
<xsl:text>,</xsl:text>
</p>
</xsl:template>
<xsl:template match="text | gruss">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
XSL stylesheet for project page
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/REC-html40"
result-ns=""
version="1.0"
default-space="strip"
indent-result="yes">
<xsl:output method="html" encoding="iso-8859-1"/>
<xsl:param name="mailto"/>
<xsl:param name="fullname"/>
<xsl:param name="date"/>
<xsl:param name="lang"/>
<xsl:param name="url"/>
<xsl:include href="ls6common.xsl"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates
select="/project/projecttitle" mode="head"/>
<meta name="description">
<xsl:attribute name="content">
University of Dortmund,
Department of Computer Science (Chair VI):
<xsl:value-of
select="/project/projecttitle/langtext"/>,
<xsl:value-of
select="/project/shortdesc/langtext"/>
</xsl:attribute>
</meta>
</head>
<body bgcolor="white">
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="project">
<xsl:if test="//projecttitle[@workgroup='ir']">
<xsl:call-template name="navbar-top">
<xsl:with-param name="upurl">
/ir/projects.html.en</xsl:with-param>
<xsl:with-param name="upname">
IR Projects</xsl:with-param>
</xsl:call-template>
</xsl:if>
<xsl:if test="//projecttitle[@workgroup='issi']">
<xsl:call-template name="navbar-top">
<xsl:with-param name="upurl">
/issi/projects.html.en</xsl:with-param>
<xsl:with-param name="upname">
ISSI Projects</xsl:with-param>
</xsl:call-template>
</xsl:if>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="projecttitle" mode="head">
<title>
<xsl:apply-templates select="langtext" mode="head"/>
</title>
</xsl:template>
<xsl:template match="projecttitle">
<h1>
<xsl:apply-templates select="langtext"/>
</h1>
<xsl:call-template name="hrule"/>
</xsl:template>
<xsl:template match="shortdesc">
<em><xsl:apply-templates/></em>
<br/>
</xsl:template>
<xsl:template match="logo">
<img src="{.}">
<xsl:if test="@width">
<xsl:attribute name="width">
<xsl:value-of select="@width"/></xsl:attribute>
</xsl:if>
<xsl:if test="@height">
<xsl:attribute name="height">
<xsl:value-of select="@height"/></xsl:attribute>
</xsl:if>
<xsl:if test="@align">
<xsl:attribute name="align">
<xsl:value-of select="@align"/></xsl:attribute>
</xsl:if>
</img>
</xsl:template>
<xsl:template match="referenceno">
<p> <h3>Reference Number</h3>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="fieldofoperation">
<p> <h3>Field of operation</h3>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="timeperiod">
<p> <h3>Project Duration</h3>
From <xsl:apply-templates select="startdate"/>
until <xsl:apply-templates select="enddate"/>
</p>
</xsl:template>
<xsl:template match="startdate|enddate">
<xsl:apply-templates select="day"/>.
<xsl:apply-templates select="month"/>.
<xsl:apply-templates select="year"/>
</xsl:template>
<xsl:template match="day|month|year">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="contactpersons">
<p> <h3>Contact Persons</h3>
<ul>
<xsl:apply-templates/>
</ul>
</p>
</xsl:template>
<xsl:template match="involvedpersons">
<p> <h3>Involved Persons</h3>
<ul>
<xsl:apply-templates/>
</ul>
</p>
</xsl:template>
<xsl:template match="sponsoredby">
<p> <h3>Sponsored by</h3>
<ul>
<xsl:apply-templates/>
</ul>
</p>
</xsl:template>
<xsl:template match="publicationlist">
<p> <h3>Publications</h3>
<ul>
<xsl:apply-templates/>
</ul>
</p>
</xsl:template>
<xsl:template match="publication">
<li><xsl:apply-templates/></li>
</xsl:template>
<xsl:template match="sponsor">
<li><xsl:apply-templates/></li>
</xsl:template>
<xsl:template match="participatinginstitutes">
<p> <h3>Participating Institutes</h3>
<ul>
<xsl:apply-templates/>
</ul>
</p>
</xsl:template>
<xsl:template match="institute">
<li><xsl:apply-templates/></li>
</xsl:template>
<xsl:template match="description">
<h3>Description</h3>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="notes">
<h3>Notes</h3>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="linkdescription">
<xsl:apply-templates/>
</xsl:template>
<xsl:template
match="a[@href]|A[@HREF]|a[@name]|A[@NAME]|A[@href]">
<xsl:if test="@href">
<a href="{@href}"><xsl:apply-templates/></a>
</xsl:if>
<xsl:if test="@HREF">
<a href="{@HREF}"><xsl:apply-templates/></a>
</xsl:if>
<xsl:if test="@name">
<a name="{@name}"><xsl:apply-templates/></a>
</xsl:if>
<xsl:if test="@NAME">
<a name="{@NAME}"><xsl:apply-templates/></a>
</xsl:if>
</xsl:template>
<xsl:template match="personnel">
<li><xsl:apply-templates/></li>
</xsl:template>
</xsl:stylesheet>
HTML output
<html xmlns="http://www.w3.org/TR/REC-html40">
<head>
<META http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<title> MIND </title>
<meta name="description"
content="University of Dortmund,
Department of Computer Science (Chair VI): MIND,
Resource Selection and Data Fusion for Multimedia
International Digital Libraries ">
</head>
<body bgcolor="white">
<table width="100%">
<tr>
<td width="10%"></td><td width="80%" align="center">
[<a href="/ir/projects.html.en">IR Projects</a>]
[<a href="/ir/index.html.en">IR</a>]
[<a href="/issi/index.html.en">IS and Security</a>]
</td><td width="10%" align="right">
<a href="index.html.de">(deutsch)</a></td>
</tr>
</table>
<h1> MIND </h1>
<hr noshade size="2" width="100%">
<em>Resource Selection and Data Fusion for
Multimedia International Digital Libraries</em><br>
<img src="mast2_sm.gif" align="right">
<p> <h3>Field of operation</h3>
Information Retrieval </p>
<p> <h3>Project Duration</h3>
From 01. 02. 2001 until 31. 07. 2003</p>
<p> <h3>Contact Persons</h3>
<ul> <li>
<a href="/staff/members/nottelma.html">
Dipl.-Inform. Henrik Nottelmann</a> </li>
</ul> </p>
<p><h3>Sponsored by</h3><ul><li>EU FP5</li></ul></p>
<p> <h3>Participating Institutes</h3>
<ul><li><a href="http://www.strath.ac.uk/">
University of Strathclyde </a> </li>
<li> <a
href="http://ls6-www.informatik.uni-dortmund.de">
University of Dortmund </a> </li>
</ul> </p>
<h3>Description</h3>
<p xmlns="">
This research addresses problems associated with
the emergence of thousands of heterogeneous
multimedia Digital libraries ... </p>
<h3>Notes</h3>
<ul xmlns="">
<li>
<a href="internal/index.html"
xmlns="http://www.w3.org/TR/REC-html40">
Internal pages</a>
</li>
</ul>
<p>
<h3>Publications</h3>
<ul>
<li>
<a href="overview/mind-overview.html">
MIND Overview slides </a>
</li>
</ul>
</p>
<hr noshade size="2" width="100%">
<address>
<a href="mailto:[email protected]">
Henrik Nottelmann</a>
<[email protected]>,
20. March 2001</address>
</body>
</html>
2.4.3.2 XLink: XML linking language
linking possible in any XML DTD
→ no special linking elements
linking via special attribute (for arbitrary elements): xml:link
terminology:
resource: addressable service or unit of information that participates in a link
link: explicit relationship between two or more resources
locator: data, provided as part of a link, which identifies a resource (attribute HREF)
inline link: link which serves as one of its own resources, e.g. A in HTML
out-of-line link: link whose content does not serve as one of the link's resources
Simple links
• one-directional
• mostly inline
<mylink xml:link="simple" title="Citation"
href="http://www.xyz.com/xml/foo.xml"
show="new" content-role="Reference">
as discussed in Smith(1997)</mylink>
<!ELEMENT mylink (#PCDATA)>
<!ATTLIST mylink
xml:link CDATA #FIXED "simple"
href CDATA #REQUIRED
content-role CDATA #IMPLIED
>
Extended links
usually out-of-line links
capabilities:
• enable outgoing links in read-only documents
• create links to and from resources in other formats
• apply and filter sets of relevant links on demand
• enable other advanced hypermedia capabilities (e.g. via attribute ROLE)
example out-of-line extended link:
<commentary xml:link="extended" inline="false">
<locator href="smith2.1" role="Essay"/>
<locator href="jones1.4" role="Rebuttal"/>
<locator href="robin3.2" role="Comparison"/>
</commentary>
definitions:
<!ELEMENT extended ANY>
<!ATTLIST extended
xml:link CDATA #FIXED "extended"
%link-semantics.att;
%local-resource-semantics.att;
>
<!ELEMENT locator ANY>
<!ATTLIST locator
xml:link CDATA #FIXED "locator"
%locator.att;
%remote-resource-semantics.att;
>
<!ENTITY % locator.att
"href CDATA #REQUIRED"
>
<!ENTITY % link-semantics.att
"inline (true|false) 'true'
role CDATA #IMPLIED"
>
<!ENTITY % local-resource-semantics.att
"content-role CDATA #IMPLIED
content-title CDATA #IMPLIED"
>
<!ENTITY % remote-resource-semantics.att
"role CDATA #IMPLIED
title CDATA #IMPLIED
show (embed|replace|new) #IMPLIED
actuate (auto|user) #IMPLIED
behavior CDATA #IMPLIED"
>
Link behaviour
SHOW attribute: describes display behaviour on traversal of link
• embed: designated resource embedded in body of current resource
• replace: designated resource replaces current resource
• new: designated resource displayed in a new window
ACTUATE attribute: when should traversal of link occur?
• auto: retrieve resource when current resource is encountered
• user: present resource only upon request from user
all combinations of SHOW and ACTUATE values are possible!
2.4.3.3 XPointer: XML Pointer Language
for locators in XLink
• reference to whole document
• reference to named element in document
• reference to unnamed element in read-only document
locator syntax
Locator ::= URI
| Connector ( XPointer | Name)
| URI Connector (XPointer | Name)
Connector ::= ’#’ | ’|’
URI ::= URIchar*
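The locator grammar above can be illustrated with a small sketch that splits a locator at its first connector. The function name `split_locator` is hypothetical, and the sketch assumes that URIchar excludes the two connector characters; it does not parse the XPointer part itself.

```python
def split_locator(locator):
    """Split a locator into (URI, connector, XPointer-or-Name).

    Assumption: the URI part contains neither '#' nor '|',
    so the first connector character ends the URI.
    """
    hits = [i for i in (locator.find("#"), locator.find("|")) if i != -1]
    if not hits:
        # Locator ::= URI (reference to the whole document)
        return locator, None, None
    i = min(hits)
    # Locator ::= [URI] Connector (XPointer | Name)
    return locator[:i], locator[i], locator[i + 1:]


print(split_locator("http://www.xyz.com/xml/foo.xml#id(a23).child(1)"))
print(split_locator("#root()"))
```

The URL and the `id(a23)` term are reused from the examples in this section; an empty URI part means the locator refers to the current document.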
XPointer syntax
• location of individual nodes in element tree
• spanning locations across several elements
• arbitrary set of elements
syntax:
XPointer ::= AbsTerm ’.’ OtherTerms
| AbsTerm
| OtherTerms
OtherTerms ::= OtherTerm
| OtherTerm ’.’ OtherTerm
OtherTerm ::= RelTerm
| SpanTerm
| AttrTerm
| StringTerm
Absolute location terms
AbsTerm ::= ’root()’ | ’origin()’ | IdLoc | HTMLAddr
IdLoc ::= ’id(’ Name ’)’
HTMLAddr ::= ’html(’ SkipLit ’)’
• root: root element of containing resource
• origin: application-dependent
• id: element with named id value
• html(NAMEVALUE): A element in HTML with NAME=NAMEVALUE
Relative location terms
RelTerm ::= Keyword? Arguments
Keyword ::= ’child’
| ’descendant’
| ’ancestor’
| ’preceding’
| ’following’
| ’psibling’
| ’fsibling’
example:
child(2,section).child(1,subsection)
relative location term arguments
• selection by instance number
• selection by node type
• selection by attribute
Spanning location term
data between two XPointers:
SpanTerm ::= ’span(’ XPointer ’,’ XPointer ’)’
examples:
id(a23).span(child(1),child(3))
span(id(sec2.1).child(-1,P),id(sec2.2).child(1,P))
Attribute location term
returns value of named attribute
String location term
string match
2.5 Images
2.5.1 Media type
Non-temporal: Image
• Representation
– Color model: CIE, RGB, HSV, CMYK, YUV
– Channels: alpha?, number, depth
– Interlacing
– Indexing
– Pixel aspect ratio
– Compression
• Operations
– Editing
– Point operations: thresholding, color correction
– Filtering
– Compositing
– Geometric transformations: displacing, rotating, mirroring, scaling, skewing, warping
– Conversion: color separation, resampling
2.5.2 Color
2.5.2.1 Human perception
visible light: λ ∈ [380 nm . . . 780 nm] (violet . . . red)
retina:
• rods for brightness
• cones for chromaticity (color)
three types of cones:
• yellow: λx = 600nm
• green: λy = 535nm
• blue: λz = 445nm
ϕ(λ): wavelength distribution of source light
k: normalization factor
x(λ), y(λ), z(λ): eye response functions
X, Y , Z: perceived color
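\[ X = k \int \varphi(\lambda)\, x(\lambda)\, d\lambda \]
\[ Y = k \int \varphi(\lambda)\, y(\lambda)\, d\lambda \]
\[ Z = k \int \varphi(\lambda)\, z(\lambda)\, d\lambda \]
CIE Yxy color system
\[ x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z} \]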
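As a small illustration of the Yxy normalization, the following sketch computes the chromaticity coordinates from given tristimulus values. The function name `chromaticity` is an assumption for this example; the integrals for X, Y, Z are not evaluated here.

```python
def chromaticity(X, Y, Z):
    """Return (x, y) of the CIE Yxy system for tristimulus values X, Y, Z."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for X + Y + Z = 0")
    return X / total, Y / total


# Equal tristimulus values give the point x = y = 1/3.
print(chromaticity(1.0, 1.0, 1.0))
```

Together with the luminance Y, the pair (x, y) describes the colour; the third coordinate z = 1 − x − y is redundant.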
[Figure: perceived visual distance (magnified by 10)]
→ need for equidistant color spaces
2.5.2.2 Color spaces
RGB three basic colours: red, green, blue
YUV luminance (as in b/w TV) +2 chrominance channels
YIQ used for NTSC TV
YCrCb used in JPEG digital image standard
CMY(k) cyan, magenta, yellow (black) used for printers
HSV color model with (approximately) equidistant colors
mapping RGB → YUV:
Y = 0.30R + 0.59G + 0.11B
U = (B − Y) · 0.493
V = (R − Y) · 0.877
mapping RGB → YIQ:
Y = 0.30R + 0.59G + 0.11B
I = 0.60R − 0.28G − 0.32B
Q = 0.21R − 0.52G + 0.31B
mapping RGB → YCrCb:
Y = 0.30R + 0.59G + 0.11B
Cr = 0.50R − 0.42G − 0.08B
Cb = −0.17R − 0.33G + 0.50B
mapping RGB → CMY(K):
C = 1 − R
M = 1 − G
Y = 1 − B
K = min(C, M, Y)
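Two of the mappings can be sketched directly in code. The function names are assumptions for this example, and all components are assumed to lie in [0, 1].

```python
def rgb_to_yuv(r, g, b):
    """RGB -> YUV with the coefficients given above."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v


def rgb_to_cmyk(r, g, b):
    """RGB -> CMY(K): complement of each channel, K = min(C, M, Y)."""
    c, m, y = 1 - r, 1 - g, 1 - b
    return c, m, y, min(c, m, y)


print(rgb_to_yuv(1.0, 1.0, 1.0))   # white: full luminance, chrominance near zero
print(rgb_to_cmyk(1, 0, 0))        # red: no cyan, full magenta and yellow
```

For any grey value (R = G = B) both chrominance channels vanish, which is why a b/w receiver can use Y alone.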
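mapping RGB → HSV:
\[ v = \max(r, g, b), \qquad s = \frac{v - \min(r, g, b)}{v} \]
let
\[ r' = \frac{v - r}{v - \min(r, g, b)}, \qquad g' = \frac{v - g}{v - \min(r, g, b)}, \qquad b' = \frac{v - b}{v - \min(r, g, b)} \]
\[ 6h = \begin{cases}
5 + b' & \text{if } r = \max(r, g, b) \text{ and } g = \min(r, g, b) \\
1 - g' & \text{if } r = \max(r, g, b) \text{ and } g \neq \min(r, g, b) \\
1 + r' & \text{if } g = \max(r, g, b) \text{ and } b = \min(r, g, b) \\
3 - b' & \text{if } g = \max(r, g, b) \text{ and } b \neq \min(r, g, b) \\
3 + g' & \text{if } b = \max(r, g, b) \text{ and } r = \min(r, g, b) \\
5 - r' & \text{otherwise}
\end{cases} \]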
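The case distinction can be written down almost literally. This is a sketch under two assumptions that the formulas leave open: grey values (where max = min and the normalized terms are undefined) get hue 0, and the hue is wrapped into [0, 1). The function name `rgb_to_hsv` is chosen for this example.

```python
def rgb_to_hsv(r, g, b):
    """RGB -> (h, s, v) following the case distinction above; inputs in [0, 1]."""
    v = max(r, g, b)
    lo = min(r, g, b)
    if v == 0:
        return 0.0, 0.0, 0.0          # black: s and h undefined, set to 0 (assumption)
    s = (v - lo) / v
    if v == lo:
        return 0.0, 0.0, v            # grey: hue undefined, set to 0 (assumption)
    rp = (v - r) / (v - lo)           # r', g', b' of the formulas
    gp = (v - g) / (v - lo)
    bp = (v - b) / (v - lo)
    if r == v:
        h6 = 5 + bp if g == lo else 1 - gp
    elif g == v:
        h6 = 1 + rp if b == lo else 3 - bp
    else:
        h6 = 3 + gp if r == lo else 5 - rp
    return (h6 / 6) % 1.0, s, v       # wrap h = 1 back to 0


for colour in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0.2, 0.5, 0.8)]:
    print(colour, rgb_to_hsv(*colour))
```

With this convention red, green and blue land at hue 0, 1/3 and 2/3 of the colour circle.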
[Figure: colour vector v_c = (r, g, b) in the RGB model (axes R, G, B) and its image w_c = T · v_c = (h, s, v) in the HSV model (axes H, S, V)]
2.5.3 GIF format
(graphics interchange format, proprietary standard of CompuServe)
• lossless compression of image data
• restricted to 256 colors
structure of a GIF file:
• GIF signature: “GIF87a” / “GIF89a”
• screen descriptor
– width
– height
– color resolution (1. . . 8 bits)
– background color
• global color map: table of RGB values
• sequence of images
• GIF terminator
structure of an image:
• image descriptor (image position + size)
• local color map
• raster data: sequence of color index values, compressed by patented variation of LZW
sequence of raster data: sequential / interlaced rows
2.5.4 PNG format
(portable network graphics)
non-proprietary standard proposed by W3C
GIF features retained in PNG:
• Indexed-color images of up to 256 colors.
• Streamability: files can be read and written serially (file format usable as communications protocol)
• Progressive display
• Transparency (portions of the image can be marked as transparent)
• Ancillary information: textual comments and other data can be stored within the image file.
• Complete hardware and platform independence.
• Effective, 100% lossless compression.
New features of PNG:
• Truecolor images of up to 48 bits per pixel.
• Grayscale images of up to 16 bits per pixel.
• Full alpha channel (general transparency masks).
• Image gamma information (automatic display of images with correct brightness/contrast)
• Reliable, straightforward detection of file corruption.
• Faster initial presentation in progressive display mode.
2.5.5 JPEG Formats
2.5.5.1 Requirements
• high compression rate vs. image fidelity
• applicable to any kind of continuous-tone digital source image
• tractable computational complexity
• modes of operation:
1. sequential encoding (left-to-right, top-to-bottom)
2. progressive encoding: encoding in multiple scans for low-bandwidth communication (user watches image build up in multiple coarse-to-clear passes)
3. lossless encoding: exact recovery of source image possible (although low compression compared to lossy modes)
4. hierarchical encoding: encoding at multiple resolutions
2.5.5.2 Processing steps for DCT-based coding
DCT: discrete cosine transform
here: consider single component only = greyscale image
[Figure: DCT-based encoder. Source image data → 8x8 blocks → FDCT → quantizer → entropy encoder → compressed image data; quantizer and entropy encoder each use a table specification]
[Figure: DCT-based decoder. Compressed image data → entropy decoder → dequantizer → IDCT → reconstructed image data; entropy decoder and dequantizer each use a table specification]
8*8 DCT
compression of a stream of 8*8 blocks of image samples
• group image samples into 8*8 blocks
• shift from unsigned integers to signed integers: [0, 2^p − 1] → [−2^(p−1), 2^(p−1) − 1]
• input to the forward DCT
\[ F(u, v) = \frac{1}{4} C(u) C(v) \left( \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cdot \cos\frac{(2x + 1) u \pi}{16} \cos\frac{(2y + 1) v \pi}{16} \right), \qquad u, v = 0 \ldots 7 \]
\[ C(u), C(v) = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} \]
FDCT:
64-point discrete signal
→ 64 orthogonal basis signals (amplitudes of cosine functions)
F(0, 0), the DC coefficient (both cosine factors equal 1 for u = v = 0):
\[ F(0, 0) = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) = \frac{1}{8} \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \]
other 63 coefficients: AC coefficients
little variation in 8*8 block
→ most spatial frequencies with zero amplitude
→ no encoding necessary
→ compression
inverse DCT:
maps 64 DCT coefficients onto 8*8 image block
\[ f(x, y) = \frac{1}{4} \left( \sum_{u=0}^{7} \sum_{v=0}^{7} C(u) C(v) F(u, v) \cdot \cos\frac{(2x + 1) u \pi}{16} \cos\frac{(2y + 1) v \pi}{16} \right) \]
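The forward and inverse transforms can be checked against each other with a direct, unoptimized sketch. The function names (`fdct`, `idct`, `c`, `cos_term`) are assumptions for this example; real codecs use fast factorizations rather than this quadruple loop.

```python
import math


def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def cos_term(n, k):
    """cos((2n + 1) k pi / 16), the basis function of the 8-point DCT."""
    return math.cos((2 * n + 1) * k * math.pi / 16)


def fdct(f):
    """Forward DCT of an 8x8 block f[x][y], returns F[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(f[x][y] * cos_term(x, u) * cos_term(y, v)
                                      for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]


def idct(F):
    """Inverse DCT of 8x8 coefficients F[u][v], returns f[x][y]."""
    return [[0.25 * sum(c(u) * c(v) * F[u][v] * cos_term(x, u) * cos_term(y, v)
                        for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]


flat = [[10] * 8 for _ in range(8)]      # block without any variation
F = fdct(flat)
print(round(F[0][0], 6))                 # DC coefficient = (1/8) * sum of samples
print(max(abs(F[u][v]) for u in range(8) for v in range(8) if (u, v) != (0, 0)))
```

For the constant block only the DC coefficient is nonzero (up to floating-point noise), which is the situation the compression argument above relies on.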
problems
theoretically:
DCT is 1:1 mapping of 64-point vectors between image and frequency domain
practically:
loss through
• quantization
• computation of transcendental functions
Quantization
mapping of FDCT output (F(u, v), u, v = 0 . . . 7) onto integers
quantization table:
Q(u, v), u, v = 0 . . . 7, 1 ≤ Q(u, v) ≤ 255
quantization:
• goal: achieve further compression
• represent DCT coefficients with minimum necessary precision (and minimum effect on visual image quality)
• lossy, n : 1 mapping
\[ F^Q(u, v) = \mathrm{IntegerRound}\left( \frac{F(u, v)}{Q(u, v)} \right) \]
dequantization:
\[ F'(u, v) = F^Q(u, v) \cdot Q(u, v) \]
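A minimal sketch of quantization and dequantization follows. The rounding rule is an assumption: IntegerRound is taken here as rounding to the nearest integer with halves rounded away from zero. The helper names and the tiny 2x2 example tables are illustrative only; the functions accept any rectangular table, including 8x8.

```python
import math


def integer_round(x):
    """Round to nearest integer, halves away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(F, Q):
    """F^Q(u, v) = IntegerRound(F(u, v) / Q(u, v))."""
    return [[integer_round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(F, Q)]


def dequantize(FQ, Q):
    """F'(u, v) = F^Q(u, v) * Q(u, v)."""
    return [[fq * q for fq, q in zip(frow, qrow)]
            for frow, qrow in zip(FQ, Q)]


F = [[37.0, -37.0], [40.0, 3.0]]
Q = [[16, 16], [16, 16]]
FQ = quantize(F, Q)
print(FQ)
print(dequantize(FQ, Q))
```

The small coefficient 3.0 is mapped to 0 and cannot be recovered, which shows the n : 1 character of the mapping.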
DC coding and zig-zag sequence
• separate treatment of DC and AC coefficients
• DC: strong correlation between coefficients of adjacent 8*8 blocks
→ differential encoding
• AC: ordering in zig-zag sequence, low-frequency coefficients (mostly nonzero) before high-frequency coefficients (mostly zero) (facilitates entropy coding)
[Figure: differential DC encoding, where block l is coded as the difference DC_l − DC_{l−1} to the previous block l−1; zig-zag sequence through the 8*8 block from DC and AC_01 to AC_77]
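Both steps are easy to sketch. The function names are assumptions; positions are written as (row, column), and the predictor for the very first block is assumed to be 0.

```python
def zigzag_order(n=8):
    """Positions (row, column) of an n x n block in zig-zag sequence."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals r + c = 0, 1, 2, ...; odd diagonals run
    # downwards (row increasing), even diagonals run upwards.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def dc_differences(dc_values):
    """Differential DC encoding: each block stores DC_l - DC_(l-1)."""
    previous, out = 0, []          # assumed start value 0 for the first block
    for dc in dc_values:
        out.append(dc - previous)
        previous = dc
    return out


print(zigzag_order()[:6])
print(dc_differences([100, 104, 103]))
```

After quantization the tail of the zig-zag sequence is mostly zero, so long zero runs can be suppressed by the entropy coder.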
Entropy coding
lossless compression of DCT coefficients
1. convert zig-zag sequence of quantized coefficients into intermediate sequence of symbols (with zero suppression)
2. convert symbols into data stream with no externally identifiable boundaries (Huffman coding / arithmetic coding)
Compression and picture quality
input: typically 8 bits/pixel per component (12 bits/pixel for special applications, e.g. medical images)
1 chrominance sample per 4 luminance samples
1 luminance component + 2 chrominance components
→ ∑ 12 bits/pixel
output:
• 0.25–0.5 bits/pixel: moderate to good quality
• 0.5–0.75 bits/pixel: good to very good quality
• 0.75–1.5 bits/pixel: excellent quality
• 1.5–2.0 bits/pixel: indistinguishable from the original
[Figure: positions of luminance samples and chrominance samples relative to the block edges]
2.5.5.3 Predictive lossless coding
difficult with DCT
→ independent method for lossless coding
typically 2:1 compression
[Figure: lossless encoder. Source image data → predictor → entropy encoder → compressed image data; the entropy encoder uses a table specification]
neighbourhood of the sample X to be predicted:
C B
A X

Selection value   Prediction
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B - C
5                 A + ((B - C)/2)
6                 B + ((A - C)/2)
7                 (A + B)/2
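The predictor table translates into a small lookup. This sketch makes two assumptions that the table does not state: the division by 2 is an integer division that rounds towards minus infinity, and selection value 0 is modelled as the prediction 0, so the residual equals the sample itself. The function names are illustrative.

```python
def predict(selection, a, b, c):
    """Prediction for sample X from its neighbours A (left), B (above), C (above left)."""
    table = {
        0: 0,                      # no prediction (modelled as 0, an assumption)
        1: a,
        2: b,
        3: c,
        4: a + b - c,
        5: a + (b - c) // 2,       # integer division (assumption)
        6: b + (a - c) // 2,
        7: (a + b) // 2,
    }
    return table[selection]


def residual(x, selection, a, b, c):
    """Value handed to the entropy encoder: sample minus prediction."""
    return x - predict(selection, a, b, c)


print([predict(s, 10, 20, 15) for s in range(8)])
print(residual(16, 4, 10, 20, 15))
```

The decoder forms the same prediction from already decoded neighbours and adds the residual, so the source image is recovered exactly.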
2.5.5.4 Multiple-Component images
Source image formats
JPEG poses no restrictions on pixel aspect ratio, color space, image acquisition characteristics
JPEG source image model
[Figure: (a) source image with multiple components C1, C2, ..., CN and overall dimensions X, Y; (b) characteristics of an image component Ci: xi by yi samples, with top, bottom, left and right edges]
• image contains 1 . . . 255 components (spectral bands, channels)
• component = rectangular area of samples
• sample = unsigned integer with p bits
• p fixed for all samples of an image
• p = 8 or p = 12 for DCT coding
• p = 2 . . . 12 for predictive coding
xi, yi: sample dimensions of ith component
Hi, Vi: relative horizontal/vertical sampling factor, 1 ≤ Hi, Vi ≤ 4
X, Y: overall image dimensions, X = max_i(xi), Y = max_i(yi), X, Y ≤ 2^16
encoder stores X, Y and Hi, Vi
decoder:
\[ x_i = \left\lceil X \cdot \frac{H_i}{H_{max}} \right\rceil, \qquad y_i = \left\lceil Y \cdot \frac{V_i}{V_{max}} \right\rceil \]
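The decoder-side computation of the component dimensions can be sketched as follows. The function name and the example sampling factors (one full-resolution component, two components subsampled by 2 in both directions) are assumptions for illustration.

```python
def component_size(X, Y, H, V, i):
    """Dimensions (xi, yi) of component i from image size and sampling factors."""
    h_max, v_max = max(H), max(V)
    # Integer ceiling of X * Hi / Hmax without floating point.
    xi = -((-X * H[i]) // h_max)
    yi = -((-Y * V[i]) // v_max)
    return xi, yi


H = [2, 1, 1]          # hypothetical horizontal sampling factors
V = [2, 1, 1]          # hypothetical vertical sampling factors
for i in range(3):
    print(i, component_size(17, 9, H, V, i))
```

The ceiling matters for odd image dimensions: a 17 x 9 image needs 9 x 5 samples in the subsampled components, not 8 x 4.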
Entropy order and interleaving
interleaving of data from multiple components
data unit =
• sample in predictive coding
• 8*8 block in DCT coding
order of compressed data units:
generalization of raster-scan order
noninterleaved data ordering:
[Figure: data units of a component ordered left to right, top to bottom]
interleaved data ordering
• component Ci partitioned into rectangular regions Hi*Vi
• regions ordered left-to-right, top-to-bottom
• data units within region ordered left-to-right, top-to-bottom
• MCU: minimum coded unit = smallest group of interleaved data units
[Figure: interleaving example with four components C1 (H = 2, V = 2), C2 (H = 2, V = 1), C3 (H = 1, V = 2) and C4 (H = 1, V = 1)]
restrictions:
• maximum number of components interleaved: 4
• maximum number of data units in an MCU: 10
Multiple tables
component-specific tables for quantization and entropy coding
[Figure: components A, B, C enter the encoding process, which switches between table specification 1 and table specification 2 and outputs compressed image data]
2.5.5.5 Baseline and other DCT sequential codecs
components of sequential coding:
• FDCT
• quantization
• entropy coding
• multiple-component control
variations:
• sample precisions: 8 bit / 12 bit
• Huffman / arithmetic coding
baseline sequential coding:
• 8 bit samples
• Huffman coding
• max. two sets of Huffman tables
2.5.5.6 DCT progressive mode
uses FDCT and quantization as with sequential coding
difference:
each image component encoded in multiple scans
• requires image-sized buffer memory between quantizer and entropy encoder
• stores image as quantized DCT coefficients
• buffered coefficients partially encoded in multiple scans
• two complementary methods
– spectral selection: only specific band of coefficients from zig-zag sequence encoded in a scan
– successive approximation: coefficients within current band encoded with limited accuracy in a scan
[Figure: progressive encoding. (a) image component as quantized DCT coefficients (coefficients 0 . . . 63, bits 7 = MSB . . . 0 = LSB); (b) sequential encoding; spectral selection, sending coefficients 0 . . . 2 in the 1st scan and coefficients 3 . . . 5 in the following scan; successive approximation, sending bits 7 . . . 4 (MSB) in the 1st scan and bit 3 in the 2nd scan]
2.5.5.7 Hierarchical mode of operation
“pyramidal” encoding of an image at multiple resolutions
subsequent encoding uses double resolution (horizontal/vertical/both)
procedure:
1. filter and down-scale original image by desired power of 2 in each dimension
2. encode reduced-size image by sequential DCT / progressive DCT / lossless coding
3. decode reduced-size image, then interpolate and oversample it by 2 (horizontally/vertically/both)
4. use up-sampled image as prediction of the original, encode difference image as above
5. repeat steps 3. and 4. until full resolution has been encoded
application of hierarchical encoding:
access to high-resolution images for low-resolution devices with limited buffer capacity
2.5.5.8 Coded representation for compressed images
• interchange format syntax
• tables stored with the image / default tables / referenced tables
2.5.5.9 JPEG2000
capabilities supported:
• resolution scalability: arbitrary number of resolution levels
• region of interest coding: certain parts of image coded in better quality
• SNR (signal-to-noise ratio) scalability
• random access capability
• multi-component imagery
• arbitrary wavelet decompositions
• arbitrary wavelet kernels
• arbitrary bit-depth images
• tiling
– any number of tiles
– rate-control performed jointly over all tiles
• frames
– similar to tiles
– coder operates independently in frames
2.5.6 Fractal image compression
2.5.6.1 Introduction
[Figure: copy machine that turns an input image into an output image; rows (a), (b), (c) show initial image, first copy, second copy and third copy]
final attractor independent of starting image, depends only on transformation
affine transformation:
\[ w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a_i & b_i \\ c_i & d_i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix} \]
some affine transformations
each image is transformed copy of itself
→ image must have detail at every scale
→ images are fractals
fractal image compression:
store images as collections of transformations, e.g. fern
advantage: multiresolution representation of images
fractal vs. pixel-based representation:
2.5.6.2 Iterated function systems
Contractive transformations
transformation w is contractive iff there is an s < 1 such that for any two points P1, P2:
\[ d(w(P_1), w(P_2)) < s \cdot d(P_1, P_2) \]
distance in the plane:
\[ d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
example contractive transformation
\[ w_i \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]
iterated function system:
collection of contractive transformations
\[ \{ w_i : \mathbb{R}^2 \to \mathbb{R}^2 \mid i = 1, \ldots, n \} \]
each w_i maps the plane R^2 to itself
collection of transformations defines map
\[ W(\cdot) = \bigcup_{i=1}^{n} w_i(\cdot) \]
f0 input image
f1 = W (f0)
\[ f_2 = W(W(f_0)) = W^{\circ 2}(f_0) \]
contractive mapping fixpoint theorem:
\[ |W| \equiv f_\infty = \lim_{n \to \infty} W^{\circ n}(f_0) \]
attractor is independent of f0 !
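The independence from the starting image can be observed numerically. The sketch below iterates W on two different one-point "images" and measures how far corresponding points are apart. The three maps are an illustrative assumption (each scales by 1/2 and shifts), as are all function names.

```python
import math

# Three contractive affine maps (a, b, c, d, e, f): scale by 1/2, then shift.
# These particular maps are an illustrative assumption, not taken from the slides.
MAPS = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def apply_affine(t, p):
    """w(x, y) = (a x + b y + e, c x + d y + f)."""
    a, b, c, d, e, f = t
    x, y = p
    return (a * x + b * y + e, c * x + d * y + f)


def W(points):
    """Union of the images of the point list under all maps."""
    return [apply_affine(t, p) for t in MAPS for p in points]


def iterate(start, n):
    """W applied n times to the one-point image {start}."""
    points = [start]
    for _ in range(n):
        points = W(points)
    return points


def max_gap(p0, q0, n):
    """Largest distance between corresponding points of the two iterations."""
    return max(math.dist(p, q) for p, q in zip(iterate(p0, n), iterate(q0, n)))


for n in (0, 2, 4, 6):
    print(n, max_gap((0.0, 0.0), (10.0, -7.0), n))
```

The gap halves with every application of W, so both sequences approach the same attractor regardless of where they started.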
2.5.6.3 Self-similarity in images
grey-scale images as functions f(x, y)
metric on images
\[ \delta(f, g) = \sup_{(x, y) \in I^2} |f(x, y) - g(x, y)| \]
natural images are not exactly self-similar
2.5.6.4 Partitioned iterated function systems
partitioned copying machine
specification of copying machine:
1. # copies
2. affine transformation (for each copy)
3. contrast and brightness adjustment (for each copy)
4. mask for selecting part of the original to be transformed (for each copy, Di → Ri)
specification of transformation wi
\[ w_i \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} a_i & b_i & 0 \\ c_i & d_i & 0 \\ 0 & 0 & s_i \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \\ o_i \end{bmatrix} \]
si controls contrast (si < 1)
oi affects brightness
2.5.6.5 Encoding images
ideal goal of fractal image compression:
satisfy fixed point equation
\[ f = W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f) \]
→ seek partition of f into pieces such that the fixed point equation is fulfilled
approximation:
\[ f \approx f' = W(f') \approx W(f) = w_1(f) \cup w_2(f) \cup \cdots \cup w_N(f) \]
minimize quantities
\[ \delta(f \cap (R_i \times I), w_i(f)), \qquad i = 1, \ldots, N \]
1. find good choice for Di
2. find good contrast and brightness settings si and oi
example:
• 256*256 pixels input image
• output ranges Ri: consider nonoverlapping 8*8 subsquares (1024)
• input ranges Di: overlapping 16*16 subsquares (241 · 241 = 58081)
• 8 ways for mapping square → square (4 rotations, flip + 4 rotations)
• estimate si and oi using least squares regression
compression
input image: 65536 bytes
compressed image: 3968 bytes
→ compression factor: 16.5
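The numbers of this example can be re-derived in a few lines. The variable names are assumptions; the last value (bits per transformation) is only the quotient of the stated file size and the number of ranges, not a statement about how the bits are actually allocated.

```python
size, range_size, domain_size = 256, 8, 16

ranges = (size // range_size) ** 2              # nonoverlapping 8*8 subsquares
domains = (size - domain_size + 1) ** 2         # overlapping 16*16 subsquares
candidates_per_range = domains * 8              # 8 orientations per subsquare

input_bytes, compressed_bytes = 65536, 3968
factor = input_bytes / compressed_bytes
bits_per_transformation = compressed_bytes * 8 / ranges

print(ranges, domains, candidates_per_range)
print(round(factor, 1), bits_per_transformation)
```

The large number of candidates per range explains why encoding is expensive, while decoding only has to iterate the stored transformations.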
2.5.6.6 Partitioning images
image areas requiring different levels of detail
→ vary size of input ranges Ri
quadtree partitioning
divide square into 4 sub-squares
HV-partitioning
divide rectangle either horizontally or vertically
[Figure: HV-partitioning: (a) 1st partition, (b) 2nd partition, (c) 3rd and 4th partitions]
triangular partitioning
rectangle → 2 triangles, triangle → 4 triangles (connect partitioning points on each side)
2.6 Audio
2.6.1 Introduction
human perception: 20 Hz – 20 kHz
digital audio:
• sample audio input in regular, discrete intervals
• quantize sampled values
digital audio data: sequence of binary values representing number of quantizer levels
pulse code modulation:
represent each sample with an independent code word
[Figure: analog audio input → analog-to-digital conversion → PCM values → digital signal processing → PCM values → digital-to-analog conversion → analog audio output]
Nyquist theorem:
time-sampled signal can represent signals up to half the sampling rate
typical sampling rates:
8 kHz for speech
44.1 kHz for music (audio CD)
quantizer levels: power of 2
each additional bit improves signal-to-noise ratio by about 6 dB
typical # bits/sample:
8 (= 48 dB) speech, low-quality audio
16 (= 96 dB) high-quality audio (audio CD)
data rates for uncompressed audio:
8 . . . 176 kB/sec (176 for audio CD, stereo)
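The quoted data rates and signal-to-noise ratios follow from simple arithmetic, sketched here. The function names are assumptions, and the SNR is approximated as 20 · log10 of the number of quantizer levels.

```python
import math


def pcm_rate_bytes(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sampling_rate_hz * bits_per_sample * channels // 8


def snr_db(bits_per_sample):
    """Approximate signal-to-noise ratio for 2**bits quantizer levels."""
    return 20 * math.log10(2 ** bits_per_sample)


print(pcm_rate_bytes(8000, 8, 1))       # speech, mono
print(pcm_rate_bytes(44100, 16, 2))     # audio CD, stereo
print(round(snr_db(8)), round(snr_db(16)))
```

Since 20 · log10(2) ≈ 6.02, every additional bit per sample adds about 6 dB.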
2.6.2 Media type
Temporal: Digital audio
• Representation
– Sampling frequency
– Sample size and quantization: linear, nonlinear
– Number of channels (tracks): 2, 4, 16, 32
– Interleaving
– Negative samples: one's or two's complement
– Encoding: PCM, ADPCM
• Operations
– Storage
– Retrieval
– Editing: cross-fade, play list
– Effects and filtering: delay, equalization, normalization, noise reduction, time compression/expansion, pitch shifting, stereoization, acoustic environments
– Conversion
2.6.3 Formats
2.6.3.1 µ-Law Audio Compression
logarithmic quantization
• represents low-amplitude audio samples with greater accuracy
• → uniform signal-to-noise ratio over range of amplitudes
• 8 bits/sample represent 14 bits in linear sampling
• used in ISDN telephone (with 8 kHz sampling)
x input signal, |x| ≤ 1
y output signal
µ = 255
\[ y = \begin{cases}
255 - \frac{127}{\ln(1 + \mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x \geq 0 \\
127 - \frac{127}{\ln(1 + \mu)} \cdot \ln(1 + \mu \cdot |x|) & \text{for } x < 0
\end{cases} \]
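The compression curve can be evaluated directly. This sketch implements the formula as given, with rounding to the nearest integer code added as an assumption; the function name `mu_law_encode` is illustrative, and the bit-exact segment tables of the telephone standards are not modelled.

```python
import math

MU = 255


def mu_law_encode(x):
    """Map an input sample x with |x| <= 1 to an 8-bit code 0..255."""
    if abs(x) > 1:
        raise ValueError("input must satisfy |x| <= 1")
    magnitude = 127 / math.log1p(MU) * math.log1p(MU * abs(x))
    offset = 255 if x >= 0 else 127
    return round(offset - magnitude)   # rounding to an integer code is an assumption


for x in (0.0, 0.01, 0.1, 1.0, -0.01, -1.0):
    print(x, mu_law_encode(x))
```

Small amplitudes use up a disproportionately large share of the code range, which is the greater accuracy for low-amplitude samples mentioned above.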
2.6.3.2 ADPCM
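adaptive differential pulse code modulation
adjacent samples have similar values
→ encode PCM value of the difference
[Figure: (a) ADPCM encoder: the prediction Xp[n-1] is subtracted from the input X[n]; the difference D[n] is coded by the (adaptive) quantizer as C[n]; the (adaptive) dequantizer yields Dq[n], which is added to Xp[n-1] to give Xp[n] for the (adaptive) predictor. (b) ADPCM decoder: C[n] passes the (adaptive) dequantizer; Dq[n] is added to Xp[n-1] from the (adaptive) predictor to give Xp[n]]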
ADPCM coder can adapt to characteristics of audio signal
• change step size of quantizer
• change step size of predictor
different algorithms/standards, depending on
• adaptation possibilities
• side information
– quantizer/predictor step size
– redundant contextual information (for error recovery)
algorithms:
• IMA/ADPCM: Interactive Multimedia Association
• CCITT G.721 (32 kbps compressed data)
• CCITT G.723 (24 kbps compressed data)
• compact disc interactive (CD-I) audio compression algorithm
IMA/ADPCM Algorithm
• compression rate: 4:1
• 16 bits/sample → 4 bits/sample
simple predictor:
predicted value = previous sample
quantizer
4 bits output: signed multiples of current step size/4
adaptation
• quantizer adapts step size based on
– current step size
– quantizer output of previous input
• based on table lookup
• no side information required
good error recovery
2.6.3.3 MPEG Audio
• lossy, but perceptually lossless compression
• 48 kHz sampling rate, 2*16 bits/sample
• compression rate: 6:1
• exploitation of auditory masking
[Figure: auditory masking, amplitude over frequency: around a strong tonal signal there is a region where weaker signals are masked]
Layer I
• filter bank divides audio signal into 32 frequency bands
• 12 samples per band
• for each nonzero sample:
– bit allocation
– scale factor
output of layer I:
frame with 32 groups of 12 samples = 384 samples
Layer II
codes data in larger groups:
frame with 3*12*32 samples
exploits common bit allocation and scale factors
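The frame sizes and the stated compression rate translate into durations and bit rates as follows. This is plain arithmetic on the figures given above (48 kHz, 2*16 bits/sample, 6:1); the variable and function names are assumptions.

```python
SAMPLING_RATE = 48_000           # Hz, as stated above

layer1_frame = 32 * 12           # samples per frame in layer I
layer2_frame = 3 * 12 * 32       # samples per frame in layer II


def frame_ms(samples, sampling_rate=SAMPLING_RATE):
    """Duration of a frame in milliseconds."""
    return 1000 * samples / sampling_rate


pcm_kbps = SAMPLING_RATE * 2 * 16 / 1000     # uncompressed stereo input
compressed_kbps = pcm_kbps / 6               # compression rate 6:1

print(layer1_frame, frame_ms(layer1_frame))
print(layer2_frame, frame_ms(layer2_frame))
print(pcm_kbps, compressed_kbps)
```

The longer layer II frame gives the coder more samples over which bit allocation and scale factors can be shared.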
Layer III
• alias reduction: modified discrete cosine transformation
• logarithmic quantization
• entropy coding (Huffman)
• bit reservoir for effects due to entropy coding
• noise allocation instead of bit allocation
Stereo redundancy coding
two types of coding:
• intensity stereo coding
for high frequencies:
– encode single summed signal for both channels
– only independent scale factors
• middle/side stereo coding
– middle channel
– + 2 side channels
2.7 Video
2.7.1 Basics
2.7.1.1 B/W TV
presentation in greyscale only (luminance)
European format:
• 625 lines
• 833 columns
• ratio width/height: 4:3
• 25 frames/second
bandwidth
theoretically:
1 / (25 frames/s · 625 lines/frame) = 64 µs/line, i.e. a line frequency of 15625 Hz
b/w changes between every pair of pixels in a line
→ 15625 Hz * 833/2 ≈ 6.5 MHz
in practice: 5 – 5.5 MHz
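The bandwidth estimate can be recomputed step by step; the variable names are assumptions for this sketch.

```python
frames_per_second = 25
lines_per_frame = 625
columns_per_line = 833

line_frequency_hz = frames_per_second * lines_per_frame      # lines per second
line_time_us = 1_000_000 / line_frequency_hz                 # duration of one line
# Worst case: black/white alternates with every pixel, so one full
# signal period covers two pixels.
bandwidth_hz = line_frequency_hz * columns_per_line / 2

print(line_frequency_hz, line_time_us, bandwidth_hz / 1e6)
```

The 833 columns themselves follow from the 4:3 aspect ratio: 625 lines · 4/3 ≈ 833.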
interlaced mode
50 half images/second
2.7.1.2 Colour TV
colour representations:
RGB three basic colours: red, green, blue
YUV luminance (as in b/w TV) +2 chrominance channels - used in PAL
YIQ used for NTSC
2.7.1.3 TV standards
NTSC National Television Systems Committee (USA)
– 30 images/second
– 525 lines/image
PAL Phase alternating line (Germany)
– 25 images/second
– 625 lines/image
HDTV High definition Television (forthcoming)
HD-MAC Europe: 1250 lines, 50 Hz (interlaced)
MUSE Japan: 1125 lines, 60 Hz
NTSC USA: 1040 lines, 60 Hz
digital TV
component-wise coding: 4:2:2
emphasis on luminance
luminance sampling: 13.5 MHz
chrominance sampling: 6.75 MHz
→ 216 Mbps
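The 216 Mbps figure can be reproduced under the assumption of 8 bits per sample and two chrominance channels, neither of which is stated explicitly above. The function name is illustrative.

```python
def bitrate_422(luma_hz, chroma_hz, bits_per_sample=8):
    """Bit rate of one luminance and two chrominance channels.

    bits_per_sample = 8 is an assumption of this sketch.
    """
    return (luma_hz + 2 * chroma_hz) * bits_per_sample


rate = bitrate_422(13_500_000, 6_750_000)
print(rate / 1e6, "Mbps")
```

In 4:2:2 the two chrominance channels together need as many samples as the luminance channel, so the total is twice the luminance rate.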
2.7.1.4 Computer video
• non-interlaced display
• image rate: typically 70 Hz
• colour display: in RGB mode
a) with 24 bits/pixel
b) via CLUT (colour lookup table)
8 or 16 bits/pixel → 256 or 65536 colours (out of 2^24)
2.7.2 Media types
Media type Temporal: Analog video
• Representation
– Frame rate
– Number of scan lines
– Aspect ratio, e.g., 4:3
– Interlacing, e.g., 2:1 fields per frame
– Quality, e.g., signal-to-noise ratio and image resolution
– Component versus composite
• Operations
– Storage: Tapes (Type B or C, Betacam, U-matic, Hi8, S-VHS, VHS); Videodisc
– Retrieval: based on time codes
– Synchronization: avoid timebase jitter and timebase phase shift using sync generator, genlock, timebase corrector
– Editing: cuts-only editing, A-B roll editing, edit decision list (EDL)
– Mixing: cut, fade, dissolve (cross-fade), wipe, tumble, wrapping, keying
– Conversion: scan converter, standards conversion
Media type Temporal: Digital video
• Representation
– Analog formats sampled: CCIR 601, digital composite, CIF, QCIF, digital HDTV; synthesis, sampling
– Sampling rate
– Sample size and quantization: linear, logarithmic
– Data rate
– Frame rate: 10, 15, 25, 30
– Compression
– Support for interactivity
– Scalability: transmit scalability, receive scalability
• Operations
– Storage
– Retrieval
– Synchronization
– Editing: tape based, nonlinear
– Effects
– Conversion
2.7.3 MPEG-1/2
MPEG-video requirements
Generic standard
• independence of particular application
• acceptable quality for bandwidth of 1.5 Mb/s (as with CD-ROM)
Applications
• digital storage media
low storage costs + sufficient bandwidth (MPEG-1: 600 MB/h = 1.5 Mb/s, MPEG-2: 1.8–4 GB/h = 0.5–1.1 MB/s)
– CD-ROM: 1.5 Mb/s
– DVD: 1.1 MB/s
– harddisc: ≥ 3 MB/s
• asymmetric applications
frequent decompression, compression only once
– electronic publishing
∗ education and training
∗ travel guidance
∗ videotext
∗ points of sale
– games
– entertainment
• symmetric applications
equal use of compression and decompression
– electronic publishing production
– video mail
– videotelephone
– video conferencing
Features of the compression algorithm
• random access
– access to any frame
– access time ≤ 0.5 s
– access points: information unit coded without reference to other units
• fast forward/reverse searches
– scan compressed bit stream
– display selected pictures
• reverse playback
– for specific applications only
– possible without extreme memory requirements
• audio-visual synchronization
– permanent resynchronization of audio and video
– integration of multiple audio and video signals
• robustness to errors
• coding/decoding delay (limited according to specific application), videotelephone: 150 ms
• editability: possibility of constructing short editing units
• format flexibility
– raster size
– frame rate
• cost tradeoffs
– decoding with small chipsets
– real time encoding possible (1990)
Overview of the MPEG compression algorithm
quality requirements
→ high compression rate
→ interframe encoding
random access requirements
→ intraframe coding
MPEG-1:
• block-based motion compensation for temporal redundancy reduction
– causal (predictive) coding: P frames
– noncausal (interpolative) coding: B frames
• DCT-based spatial redundancy coding (as in JPEG)
Temporal redundancy reduction
frame types:
• intra-frames (I)
– access points for random access
– moderate compression
• prediction frames (P)
– coded with reference to a past (I or P) frame
– used as reference for future P frames and for B frames
• interpolation (bidirectional prediction) frames (B)
– reference to a past and a future reference frame (I or P)
– never used as reference
references always use motion prediction
the ratio of I:P:B frames is application-specific
[Figure: forward prediction and bidirectional prediction over frames 1–9]
display order: I B B B P B B B I (frames 1 2 3 4 5 6 7 8 9)
transmission order: I P B B B I B B B (frames 1 5 2 3 4 9 6 7 8)
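The reordering from display order to transmission order can be sketched as follows: every B frame is held back until the next reference frame (I or P) has been emitted, because the decoder needs that future reference first. This is an illustrative sketch; the function name and the handling of trailing B frames are assumptions, not part of the standard.

```python
def transmission_order(display):
    """Reorder frame types from display order into transmission order.

    display: sequence of frame types, e.g. "IBBBPBBBI".
    Returns a list of (display_index, frame_type), indices starting at 1.
    """
    out = []
    pending_b = []  # B frames waiting for their future reference frame
    for idx, ftype in enumerate(display, start=1):
        if ftype == "B":
            pending_b.append((idx, ftype))
        else:
            # A reference frame (I or P) is sent first, then the B frames
            # that are displayed before it.
            out.append((idx, ftype))
            out.extend(pending_b)
            pending_b = []
    # Assumption: trailing B frames without a future reference are
    # simply appended at the end in this sketch.
    out.extend(pending_b)
    return out


order = transmission_order("IBBBPBBBI")
print("".join(t for _, t in order))   # frame types in transmission order
print([i for i, _ in order])          # their display indices
```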
[Figure: MPEG encoder. I frame: color space converter (RGB → YUV) → FDCT → quantization → entropy encoder → compressed image data. P/B frame: color space converter (RGB → YUV) → motion estimator with reference → error terms → FDCT → entropy encoder → compressed image data.]
motion compensation
[Figure: block-matching technique. Block B in the current frame, its best match A in the previous frame, and its best match C in the future frame.]
1. Block B = Block A
2. Block B = Block C
3. Block B = (Block A + Block C)/2
• prediction
– local modelling of current picture as translation of picture at some previous time
– locality: amplitude and direction of displacement may vary over the picture
• interpolation
– improves random access
– reduces effect of errors
– increases image quality
multiresolution technique:
– subsignal with low temporal resolution (1/3 . . . 1/2 frame rate)
– full-resolution signal = interpolation of low-resolution signal + correction term
– interpolation uses combination of past and future references (bidirectional)
bidirectional prediction
advantages:
• deals properly with areas not covered by prediction
• noise reduction by averaging between past and future reference frames
• allows decoupling between prediction and coding (no error propagation)
• trade-off due to frequency of B frames:
more B frames
→ lower correlation of B frames with references,
→ lower correlation between references
typically: 10 B frames per second
e.g. I B B P B B P B B . . . I B B P B B
motion representation in B frames
macroblock: 16 * 16 pixels
predictor of a macroblock depends on reference frames:
$x$: coordinate of picture element
$m_{j1}$: motion vector relative to reference frame $I_j$ (motion estimation information)
prediction modes:
macroblock type      predictor
intra                $\hat{I}_1(x) = 128$
forward predicted    $\hat{I}_1(x) = I_0(x + m_{01})$
backward predicted   $\hat{I}_1(x) = I_2(x + m_{21})$
average              $\hat{I}_1(x) = 0.5\,[I_0(x + m_{01}) + I_2(x + m_{21})]$
prediction error in each case: $I_1(x) - \hat{I}_1(x)$
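The four prediction modes can be illustrated on a one-dimensional signal. This is a simplified sketch: real macroblocks are two-dimensional, and the sample values, motion vectors and function name below are made up for illustration.

```python
def predict(mode, i0, i2, x, m01=0, m21=0):
    """Predictor for picture element x of the current frame I1.

    i0: past reference frame, i2: future reference frame (lists of samples),
    m01 / m21: motion vectors relative to I0 / I2.
    """
    if mode == "intra":
        return 128
    if mode == "forward":
        return i0[x + m01]
    if mode == "backward":
        return i2[x + m21]
    if mode == "average":
        return 0.5 * (i0[x + m01] + i2[x + m21])
    raise ValueError(f"unknown prediction mode: {mode}")


past = [10, 20, 30, 40, 50]      # I0 (illustrative values)
current = [15, 25, 35, 45, 55]   # I1
future = [20, 30, 40, 50, 60]    # I2
x = 2

# Prediction error I1(x) - predictor for each mode, with zero motion vectors.
errors = {
    mode: current[x] - predict(mode, past, future, x)
    for mode in ("intra", "forward", "backward", "average")
}
print(errors)
```

In this example the averaged predictor gives the smallest error, which is the case bidirectional prediction is designed to exploit.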
motion estimation
computation of motion vectors: not specified in MPEG standard
typically: block-based matching technique, combined with cost function
$I_c$: current frame
$I_r$: reference frame
$M_i$: macroblock in $I_c$
$v_i$: displacement of $M_i$ w.r.t. $I_r$
$V$: search range of possible motion vectors
$D$: cost function
optimal displacement:
$$v_i^* = \arg\min_{v \in V} \sum_{x \in M_i} D\bigl(I_c(x) - I_r(x - v)\bigr)$$
($V$, $D$ chosen by implementation)
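Since the standard leaves motion estimation to the implementation, the following is only one possible sketch: an exhaustive search over a square range V with the absolute difference as cost function D. The block size of 2 (instead of 16), the frame contents and all names are assumptions chosen to keep the example small.

```python
def best_displacement(cur, ref, top, left, size, search, cost=abs):
    """Full search for the displacement v minimizing
    sum over x in the block of cost(cur(x) - ref(x - v)).

    cur, ref: frames as lists of rows; (top, left): block position in cur;
    search: maximum displacement per axis. Returns ((dy, dx), total_cost).
    """
    height, width = len(ref), len(ref[0])
    best_v, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top - dy, left - dx  # block position x - v in ref
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block lies outside the reference frame
            total = sum(
                cost(cur[top + y][left + x] - ref[ry + y][rx + x])
                for y in range(size)
                for x in range(size)
            )
            if best_cost is None or total < best_cost:
                best_v, best_cost = (dy, dx), total
    return best_v, best_cost


def frame_with_block(height, width, top, left, block):
    """Build a black frame containing one small textured block."""
    frame = [[0] * width for _ in range(height)]
    for y, row in enumerate(block):
        for x, value in enumerate(row):
            frame[top + y][left + x] = value
    return frame


texture = [[9, 8], [7, 6]]
ref = frame_with_block(8, 8, 2, 2, texture)   # block at row 2, column 2
cur = frame_with_block(8, 8, 3, 4, texture)   # same block moved by (1, 2)
v, c = best_displacement(cur, ref, top=3, left=4, size=2, search=2)
print(v, c)
```

Practical encoders usually replace the exhaustive search with faster strategies; the choice affects encoding cost and quality but not the bit stream syntax.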
Spatial redundancy reduction
fixed JPEG variant:
• 8 bits per pixel
• 1 luminance component, 2 chrominance components
• fixed DCUs: macroblock with 16*16 luminance pels, 8*8 chrominance pels
• Huffman entropy coding
• sequential encoding
Layered structure, syntax and bit stream
goals
• genericity
• flexibility
video sequence parameters:
– picture width
– picture height
– pixel aspect ratio
– frame rate
– bit rate
– buffer size
• efficiency
layered syntax
• sequence layer (random access unit: context)
• group of frames layer (random access unit: video coding)
• frame layer (primary coding unit)
• slice layer (resynchronizing unit)
• macroblock layer (motion compensation unit)
• block layer (DCT unit)
bit stream
• bit sequence consistent with syntax
• video buffer constraints
• decoding process
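[Figure: MPEG decoder. Buffer → demultiplexer (MUX⁻¹) → inverse quantization (Q⁻¹) → IDCT → adder; macroblock type and motion vectors select the prediction from the reference frames (Ref) that is added to the IDCT output.]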
Standard and quality
Conformance: encoder and decoder
• bit stream and decoding process: standard defines syntax and meaning
• encoders and decoders: standard defines decoding process
Resolution, bit rates and quality
VHS-like quality at 1.2 Mb/s
constrained parameter bit streams (CPB):
• horizontal size ≤ 720 pels
• vertical size ≤ 576 pels
• max. # macroblocks/picture ≤ 396
• max. # macroblocks/second ≤ 396·25 = 330·30
• frame rate ≤ 30 frames/second
• bit rate ≤ 1.86 Mb/second
• decoder buffer ≤ 376832 bits
CIF format: 352*240, 30 Hz / 384*288, 25 Hz
yields 1.2–3 Mbps
CIF format often mixed up with MPEG-1
but: MPEG-1 allows frame sizes up to 4096*4096!
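The constrained-parameter limits listed above can be turned into a simple check. This is a sketch using exactly the limits from the list; the function names are illustrative, and the macroblock count assumes 16*16 macroblocks with partial blocks rounded up.

```python
import math


def macroblocks_per_picture(width, height):
    """Number of 16*16 macroblocks needed to cover a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def satisfies_constraints(width, height, frame_rate, bit_rate_mbps, buffer_bits):
    """Check a parameter set against the constrained-parameter limits."""
    mbs = macroblocks_per_picture(width, height)
    return (
        width <= 720
        and height <= 576
        and mbs <= 396
        and mbs * frame_rate <= 396 * 25
        and frame_rate <= 30
        and bit_rate_mbps <= 1.86
        and buffer_bits <= 376832
    )


print(macroblocks_per_picture(352, 240))                   # 22 * 15
print(satisfies_constraints(352, 240, 30, 1.2, 376832))
print(satisfies_constraints(720, 576, 25, 1.2, 376832))
```

The second call shows why the size limits and the macroblock limit have to be read together: a 720*576 picture respects both size limits but needs far more than 396 macroblocks.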
MPEG-2
for wider range of applications and higher bandwidth
• backward compatibility to MPEG-1
• support for interlaced video
• improvements on coding efficiency
• multiresolution video
• multichannel audio
typical frame sizes (in kbits):
                      Picture type
             Mbps     I     P     B    Avg.
MPEG-1 SIF   1.15   150    50    20     38
MPEG-2 601   4.00   400   200    80    130
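The average column can be related to the other columns in two ways, sketched below. Both rest on assumptions that are not stated in the table: a frame rate of 30 frames/s, and a 15-frame group pattern I B B P B B P B B P B B P B B.

```python
def average_from_bit_rate(bit_rate_mbps, frame_rate):
    """Average frame size in kbits implied by bit rate and frame rate."""
    return bit_rate_mbps * 1000 / frame_rate


def average_from_pattern(pattern, sizes_kbits):
    """Average frame size in kbits over one group of frames."""
    return sum(sizes_kbits[t] for t in pattern) / len(pattern)


PATTERN = "IBBPBBPBBPBBPBB"  # assumed group structure: 1 I, 4 P, 10 B

mpeg1_rate_avg = average_from_bit_rate(1.15, 30)
mpeg1_gop_avg = average_from_pattern(PATTERN, {"I": 150, "P": 50, "B": 20})
mpeg2_rate_avg = average_from_bit_rate(4.00, 30)
mpeg2_gop_avg = average_from_pattern(PATTERN, {"I": 400, "P": 200, "B": 80})

print(round(mpeg1_rate_avg, 1), round(mpeg1_gop_avg, 1))
print(round(mpeg2_rate_avg, 1), round(mpeg2_gop_avg, 1))
```

Under these assumptions both estimates land close to the tabulated averages of 38 and 130 kbits.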
2.7.4 MPEG-4
Content-based interactivity
• Content-based multimedia data access tools
• Content-based manipulation and bit-stream editing
• Hybrid natural and synthetic data coding
• Improved temporal random access
Compression
• Improved coding efficiency
• Coding of multiple concurrent data streams
Universal access
• Robustness in error-prone environments
• Content-based scalability
Basic concepts
AV objects:
• video object component
• audio object component
video object plane (VOP): 2D video object
“frame” may consist of
• only 1 VOP (2D)
• 2 or more mutually disjoint VOPs, resulting from the segmentation of a 2D scene
• 2 or more VOPs, resulting from the composition of the scene from several sources
possible object manipulations:
• change of the spatial position of an object (VOP) in the scene
• application of a spatial scaling factor to an object in the scene
• change of the ‘speed’ with which an object moves in the scene
• inclusion of objects (VOPs) available at the composer but not currently on display
• deletion of an object in the scene
• change of the scene area being displayed
[Figure: a scene with three AVOs]
[Figure: a scene before transformation]
[Figure: a scene after the receiver transformation]
2.8 MPEG-7
2.8.1 Introduction
content description for audio-visual data
Terminology
Data: audio-visual information described by MPEG-7, regardless of storage, coding, display, transmission, medium, or technology.
Feature: distinctive characteristic of the Data
Descriptor: representation of a Feature; defines the syntax and the semantics of the Feature representation
Descriptor Value: an instantiation of a Descriptor for a given data set
Description Scheme (DS): specifies structure and semantics of relationships between Descriptors and/or Description Schemes
Description: DS (structure) + Descriptor Values (instantiations) describing Data.
Coded Description: Description encoded for reasons of compression efficiency, error resilience, random access, etc.
Description Definition Language (DDL): language allowing for the creation / extension / modification of Description Schemes and Descriptors
Abstract architecture of MPEG-7 applications
MPEG-7 parts
Systems: tools for
• transport,
• storage,
• synchronization between content and descriptions,
• managing and protecting intellectual property
Description Definition Language: language for defining new Description Schemes + Descriptors.
Audio: Descriptors and Description Schemes dealing with (only) Audio descriptions
Visual: Descriptors and Description Schemes dealing with (only) Visual descriptions
Generic entities and Multimedia Description Schemes: Descriptors and Description Schemes dealing with generic features and multimedia descriptions
Reference Software: software implementation of relevant parts of the MPEG-7 Standard
Conformance: guidelines and procedures for testing conformance of MPEG-7 implementations.
2.8.2 MPEG-7 systems
tools for
• transport,
• storage,
• synchronization between content and descriptions,
• managing and protecting intellectual property
– to be defined in the future –
2.8.3 MPEG-7 Description Definition Language (DDL)
language for defining new Description Schemes + Descriptors
requirements:
• express spatial, temporal, structural, and conceptual relationships
• rich model for links and references between descriptions and the data
• validation of descriptor data types
• platform and application independent
• human- and machine-readable
→ based on XML syntax
XML Schema Overview
• XML Schema:
– datatypes
– simple and complex types
– elements
– inheritance, abstract types
• MPEG-7 Extensions:
– array and matrix datatypes
– enumerated datatypes for MIME type, country code, region code, currency code and character set code
– typed references
2.8.4 MPEG-7 Audio
Audio description tools for
• Sound effects description
• Instrument description
• Speech Recognition description
Audio Descriptor Framework: low-level audio description
2.8.5 MPEG-7 Video
• Color
• Texture
• Shape
• Motion
Color Descriptors
Color space: RGB, YUV, HSV, HMMD
Dominant color(s)
Color Histogram
Color Quantization
GoF/GoP Color Histogram: Group of Frames/Group of Pictures color histogram (average / median / intersection)
Color-Structure Histogram: local co-occurrence of colors
Color Layout: spatial distribution of color
Haar transformed Binary Histogram: compact descriptor for color (63 bits)
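The idea behind the color histogram and the group-of-frames variants can be sketched with a uniformly quantized RGB histogram. This is an illustration of the concept only, not the normative MPEG-7 extraction; the quantization into 2 bins per channel, the tiny "frames" and all names are assumptions.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized RGB histogram with uniform quantization per channel."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1
    return [count / len(pixels) for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def gof_average(histograms):
    """Bin-wise average over the histograms of a group of frames."""
    return [sum(bins) / len(histograms) for bins in zip(*histograms)]


RED, BLUE = (255, 0, 0), (0, 0, 255)
frame_a = [RED] * 4               # an all-red frame
frame_b = [RED] * 2 + [BLUE] * 2  # half red, half blue

hist_a = color_histogram(frame_a)
hist_b = color_histogram(frame_b)
similarity = intersection(hist_a, hist_b)
group_hist = gof_average([hist_a, hist_b])
print(similarity, group_hist)
```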
Texture Descriptors
Luminance Edge Histogram: spatial distribution of four directional edges and one non-directional edge
Homogeneous Texture Descriptors: 2 descriptors:
1. structuredness, directionality and coarseness
2. quantitative description (62 factors)
Shape Descriptors
1. Object Bounding Box
2. Region-Based Shape
3. Contour-Based Shape
Motion Descriptors
• Camera Motion
• Object Motion Trajectory
2.8.6 MPEG-7 Multimedia DescriptionSchemes
Content Management
Navigation & Access
Summary: efficient support of browsing
• Hierarchical: coarse to fine
• Sequential: 1D temporal structure
Variation: substitution of the original content
• Adaptation to terminal, network, or user preferences
MMDS: elements and functionality
Creation & Production: meta information describing creation and production of the content
• title,
• creator,
• classification,
• purpose of the creation,
• etc.
Usage: meta information related to the usage of the content:
• rights holders,
• access right,
• publication,
• financial information.
Media: description of storage media:
• storage format,
• encoding of the AV content,
• identification of the media
Structural aspects: description structured around segments
• physical spatial, temporal or spatio-temporal component of the AV content
• signal-based features (color, texture, shape, motion, audio features) + elementary semantic information
Conceptual aspects: description from the conceptual viewpoint (under development)
2.9 Other media
2.9.1 Music
Temporal
• Representation
– Operational versus symbolic
– MIDI
– SMDL: Standard Music Description Language (SGML)
• Operations
– Playback and synthesis
– Timing
– Editing and composition
2.9.1.1 MIDI
(Musical Instrument Digital Interface)
defines interface between electronic music instruments and computers
compact representation of music data (≈ 0.3 kB/s, vs. 176 kB/s for CD audio)
basic idea: coding comprises
• name of instrument,
• start/end of note,
• base frequency,
• volume
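The four coded items map onto standard MIDI channel messages: the instrument is selected with a Program Change, start and end of a note are Note On and Note Off, the base frequency is given by the note number, and the volume by the velocity. The sketch below builds the raw message bytes; the helper names are illustrative, and the frequency formula assumes equal temperament with A4 (note 69) tuned to 440 Hz.

```python
def program_change(channel, program):
    """Select one of the 128 instruments (0-127) on a channel (0-15)."""
    _check(channel, program)
    return bytes([0xC0 | channel, program])


def note_on(channel, note, velocity):
    """Start of a note: status byte 0x90 | channel, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note):
    """End of a note: status byte 0x80 | channel, note number, velocity 0."""
    _check(channel, note)
    return bytes([0x80 | channel, note, 0])


def note_frequency(note):
    """Base frequency in Hz of a note number (assumes A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def _check(channel, *data):
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if any(not 0 <= value <= 127 for value in data):
        raise ValueError("data bytes must be in 0..127")


messages = program_change(0, 0) + note_on(0, 69, 100) + note_off(0, 69)
print(messages.hex(" "), note_frequency(69))
```

A complete note therefore costs only a handful of bytes, which is why the representation is so much more compact than sampled audio.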
MIDI: model
• 16 channels for data transmission
– each channel corresponds to a synthesizer
– several instruments can play different notes at the same time
– 3–16 simultaneous notes per channel (subject to quality of synthesizer)
• 128 instruments
– including sound effects (e.g. telephone, helicopter)
– addressed by unique number 0–127
• MIDI-clock
– allows for synchronization between sender and receiver
– 24 ticks per quarter note
• SMPTE time code as alternative to MIDI-clock
– SMPTE = Society of Motion Picture and Television Engineers
– SMPTE defines format: hours:minutes:seconds:frames (e.g. 30 frames/sec)
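Both timing schemes reduce to simple arithmetic. The sketch below computes the interval between MIDI-clock ticks for a given tempo and formats a frame count as hours:minutes:seconds:frames. The tempo of 120 quarter notes per minute and the plain (non-drop-frame) counting at 30 frames/sec are assumptions made for the example.

```python
def tick_seconds(quarter_notes_per_minute, ticks_per_quarter=24):
    """Time between two MIDI-clock ticks in seconds."""
    return 60.0 / (quarter_notes_per_minute * ticks_per_quarter)


def smpte_time_code(frame_count, frames_per_second=30):
    """Format a frame count as hours:minutes:seconds:frames."""
    frames = frame_count % frames_per_second
    total_seconds = frame_count // frames_per_second
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


tick = tick_seconds(120)
code = smpte_time_code(30 * 3661 + 15)
print(f"{tick * 1000:.2f} ms per tick, time code {code}")
```

The MIDI-clock is relative to the tempo of the music, whereas the SMPTE time code is an absolute position, which is what makes it suitable for synchronizing with film or video.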
MIDI infrastructure
• components
– input: typically via keyboard (like piano); different instruments can be imitated
– output: typically via synthesizer (transforms stored digital signal into an acoustic signal via a D/A converter)
– sequencer as editor for MIDI data; user interface: notes / technical MIDI data
• multimedia applications based on MIDI data allow for instantaneous output via synthesizer
– MIDI requires precise timing of data transmission
2.9.2 Graphics
Non-temporal
• Representation
– Geometric models (used in GKS, PHIGS, PEX)
– Solid models: constructive solid geometry, surfaces of revolution, extrusion
– Physically based models (considering mass, velocity, rigidity)
– Empirical models: fractals, particle systems
– Drawing models: PostScript, LOGO graphics
– External formats for models: CGM, RenderMan Interface Bytestream (RIB)
• Operations
– Primitive editing: for objects, of vertex coordinates, surface normals
– Structural editing: creating, modifying, spatial relationships
– Shading: flat, Gouraud, Phong, ray tracing, radiosity, programmable shaders
– Mapping: texture mapping, bump mapping, displacement mapping, environment mapping, shadow mapping
– Lighting: ambient light, point lights, directional lights, spot lights
– Viewing: 2D or 3D, parallel and perspective projections
– Rendering: converts a model (shading, lighting, viewing info) into an image
2.9.3 Animation
Temporal
• Representation
– Cel models: celluloid sheets
– Scene-based models
– Event-based models
– Keyframes
– Articulated objects and hierarchical models
– Scripting and procedural models
– Physically based and empirical models
• Operations
– Graphics
– Motion and parameter control
– Animation rendering
– Animation playback
Other Media
• Media type Other: Extended Images
• Media type Other: Digital ink
• Media type Other: Speech audio
• Media type Temporal: Animation
2.10 Multimedia
• MHEG
• SMIL
2.10.1 MHEG
standard for interoperability and interchange of multimedia and hypermedia (MH) objects
application areas:
• training and education
• documentation
• electronic books
• computer-supported multimedia cooperative work
• point of information
• medical applications
issues:
• association of content and presentation attributes
• synchronization in space and time
• linking between components
2.10.1.1 The MHEG Standard
object: coded representation of independent and elementary unit of information
objects interchanged and handled by applications
types of objects:
• monomedia
– text
– graphics
– image
– audio
– video
– menu
• aggregated objects: different media, with internal synchronization and links
input/output objects
Specificity of the MHEG Standard Scope
• interactivity and multimedia synchronization
• real-time presentation
• real-time interchange
• final form presentations
2.10.1.2 MH Objects Classes
Object Orientation
advantages of object-orientation:
• data encapsulation
• inheritance
• homogeneity of MH object descriptions
• representation of behaviour (autonomous objects in highly dynamic environment)
Representation of MH Objects
content object: encoded monomedia data + decoding and presentation information
projector object: presentation attributes for content or composite object
basic object: content + projector object
composite object: set of MH objects + temporal and spatial interobject relations
conditional action set object: defines relations based on conditions
generic input object: defines selection + text input methods
MH Object Classes
object hierarchy:
MH object
• all-object
• clock
• null
• all-object
– output content
∗ text content
∗ graphics content
∗ still picture content
∗ audio content
∗ audiovisual sequence content
– generic input
∗ action-button
∗ stay-on-button
∗ on-off button
∗ menu selection
∗ multiple selection
∗ etc . . .
– projector
∗ area projector
· text projector
· graphics projector
· still picture projector
· input projector
∗ audio projector
∗ audiovisual projector
– basic
– spatio-temporal composites
– conditional action set
2.10.1.3 Methodology for MH Object Classes Description
4 levels:
1. informal text description
2. object-oriented definition
• class hierarchy
• class behaviour
• structure and semantics of attributes
3. notation of structure of presentation: ASN.1 syntax (Abstract Syntax Notation)
4. coded object representation (ASN.1)
2.10.1.4 Basic Objects Representation
basic object = content + projector object
content class:
• general attributes (inherited from superclass)
• specific attributes for encoding parameters
projector class: parameters relevant for presentation
• area projector: position + area size
• audio projector: volume, stereo/mono, balance, speed
example: still picture object class (object-oriented definition)
description
inherits from = content class
inherited by = NONE
representation (notation of structure)
• coding method
• coding parameters
• JPEG-parameters
• Huffman/arithmetic
• progressive/sequential
• color space
• source pixel density
• source data precision
• source image format
2.10.1.5 Composite Objects: Multimedia Synchronization
General Considerations
synchronization modes:
• script defined by a using application
• system synchronization already provided within the object (e.g. MPEG)
• spatiotemporal synchronization provided by composition of child objects within parent object
• conditional synchronization provided by management of events generated by
– other objects,
– user’s interaction, or
– a using application
description of conditional synchronization
condition: combination of event(s) + additional conditions
event
• event type: start/end of object, elapsed time
• object id
• current state of object: running, stopped, selected
e.g. end, object ni, state=running
additional condition: describes context in which event occurs
e.g. object nj, state=stopped
action: to be performed when condition=true
conditional action set: set of (condition, action) pairs
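The evaluation of a conditional action set can be sketched as follows. This is a hypothetical in-memory model, not the MHEG coded representation: the class names, the string-valued actions and the object ids "ni", "nj" and "nk" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    event_type: str   # e.g. "start", "end", "elapsed"
    object_id: str
    state: str        # e.g. "running", "stopped", "selected"


@dataclass(frozen=True)
class Condition:
    event: Event
    # Additional conditions: (object id, required state) pairs.
    context: Tuple[Tuple[str, str], ...] = ()

    def holds(self, occurred: Event, states: Dict[str, str]) -> bool:
        """True if the event occurred and the context is satisfied."""
        return occurred == self.event and all(
            states.get(object_id) == state for object_id, state in self.context
        )


def actions_to_perform(action_set, occurred, states) -> List[str]:
    """Return the actions of all (condition, action) pairs whose condition is true."""
    return [action for condition, action in action_set if condition.holds(occurred, states)]


end_of_ni = Event("end", "ni", "running")
action_set = [
    (Condition(end_of_ni, (("nj", "stopped"),)), "start nk"),
]

print(actions_to_perform(action_set, end_of_ni, {"nj": "stopped"}))
print(actions_to_perform(action_set, end_of_ni, {"nj": "running"}))
```

The same event triggers the action only when the additional condition on object nj holds, which is the role of the context in the description above.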
[Figure: timeline of a multimedia scenario from start to END, with tracks for picture (MPEG sequence, picture no. x, text and graphics on video), sound (S1, S2), text (Text 1, Text 2) and user response. Synchronization-conditioning events (generated by the presentation process or by user's interaction) are marked along the time axis, together with fixed-delay input (T1, T2), a delay, a maximum time and an alternate scenario.]
Space and Time Relations
placement of objects in space and time, based on attributes:
spatial position
• parallel relation
• serial relation
[Figure: parallel spatial relation in the MH generic coordinate space (with MH generic space origin). Area of object A at X1=1, Y1=2; area of object B at X2=3, Y2=1.]
[Figure: serial spatial relation in the MH generic coordinate space (with MH generic space origin). Area of object A at X1=1, Y1=3; area of object B at X2=2, Y2=-1.]
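One way to read the two spatial relations is sketched below. The interpretation is an assumption, since the original figures are not available: in a parallel relation every object is positioned relative to the origin of the generic space, whereas in a serial relation each object is positioned relative to the preceding object. The function name is illustrative.

```python
def absolute_positions(offsets, relation):
    """Resolve (x, y) offsets into positions in the generic coordinate space.

    Assumed semantics: "parallel" offsets are measured from the origin,
    "serial" offsets are measured from the previous object's position.
    """
    if relation not in ("parallel", "serial"):
        raise ValueError(f"unknown relation: {relation}")
    positions = []
    for dx, dy in offsets:
        if relation == "parallel" or not positions:
            positions.append((dx, dy))
        else:
            prev_x, prev_y = positions[-1]
            positions.append((prev_x + dx, prev_y + dy))
    return positions


# Coordinates taken from the two figures (object A, object B).
parallel = absolute_positions([(1, 2), (3, 1)], "parallel")
serial = absolute_positions([(1, 3), (2, -1)], "serial")
print(parallel, serial)
```

Under this reading the negative Y2 in the serial case simply places object B below the reference point given by object A.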
temporal position
• parallel relation
• serial relation
[Figure: temporal parallel relation. Child object 1 and child object 2 within the parent object, with time offsets t1 and t2.]
[Figure: temporal serial (or sequential) relation. Child object 1 and child object 2 within the parent object, with time offsets t1 and t2.]
General Framework for Spatiotemporal Composition Representation
representation of composite objects:
1. description of relationship in terms of position in time and space
2. list of component objects
component object:
• contained in the composite object, or
• referenced by application-provided instance number, or
• standardized reference to external object
2.10.1.6 Input Objects
buttons
• action-button: trigger, yields event
• stay-on-button: trigger + local boolean variable
• switch button: two-state input object
menu selection: yields number of selected item
multiple selection: yields indication of selected items
character string: character sequence + text attributes
location: yields horizontal + vertical coordinates
numerical value: yields integer between minimum and maximum, linearly related to cursor position
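The numerical value input can be sketched as a linear mapping from the cursor position to the integer range. The parameter names, the clamping of out-of-range cursor positions and the rounding to the nearest integer are assumptions made for this example.

```python
def numerical_value(cursor, cursor_min, cursor_max, minimum, maximum):
    """Map a cursor position linearly onto an integer in [minimum, maximum]."""
    if cursor_max <= cursor_min:
        raise ValueError("cursor range must not be empty")
    # Assumption: positions outside the input area are clamped to its ends.
    cursor = min(max(cursor, cursor_min), cursor_max)
    fraction = (cursor - cursor_min) / (cursor_max - cursor_min)
    return minimum + round(fraction * (maximum - minimum))


# A slider that is 200 units wide and yields values from 0 to 100.
for position in (0, 100, 200, 250):
    print(position, numerical_value(position, 0, 200, 0, 100))
```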
2.11 SMIL
Synchronized Multimedia Integration Language (W3C standard)
motivation:
• spatio-temporal composition of presentations
• declarative specification
• text-based format
• specified as XML-DTD
• non-interactive presentations only! (except via linking)
SMIL concepts
• media objects referenced via URIs
• spatial and temporal addressing by means of intervals and regions
• all objects in a single root window
• Z index for layer ordering for visual display
• specification of temporal synchronization
• hard and soft synchronization:
hard: for audio-video synchronization, limited jitter
soft: for background music; only fixed starting time
• alternative content for different presentation quality/output devices
• flexible linking model
• semantic annotations
SMIL example
<smil> <head>
<meta name="Title" content="Welcome to RealPlayer" />
<meta name="Author" content="RealNetworks" />
<meta name="Copyright" content = "(c) Real" />
<layout>
<root-layout height="300" width="350"
background-color="black" />
<region id="full_screen" left="0" top="0"
height="300" width="350" fit="fill" z-index="1" />
</layout></head>
<body> <par>
<audio src="firstrun.rm" />
<animation src="firstrun.swf" region="full_screen"
fill="freeze">
<anchor href="command:openwindow(tutorial,
http://ramhurl.real.com/g2install.html?
file=tutorials/607/free/overview.smi)"
coords="40,130,315,160" begin="14.9s" />
<anchor href="command:openwindow(tutorial,
http://ramhurl.real.com//take5demo.smi)"
coords="40,170,315,200" begin="14.9s" />
<anchor href="command:openwindow(tutorial,
http://ramhurl.real.com/start.smi)"
coords="40,205,315,235" begin="14.9s" />
</animation>
</par> </body>
</smil>
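Because SMIL is a text-based XML format, a presentation like the one above can be inspected with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a shortened version of the example (metadata and anchors omitted) to list the layout regions and the media objects played in parallel; it only reads the structure and does not play anything.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example above.
SMIL = """<smil>
  <head>
    <layout>
      <root-layout height="300" width="350" background-color="black" />
      <region id="full_screen" left="0" top="0"
              height="300" width="350" fit="fill" z-index="1" />
    </layout>
  </head>
  <body>
    <par>
      <audio src="firstrun.rm" />
      <animation src="firstrun.swf" region="full_screen" fill="freeze" />
    </par>
  </body>
</smil>"""

root = ET.fromstring(SMIL)

# Regions with their Z index (layer ordering for visual display).
regions = {
    region.get("id"): int(region.get("z-index", "0"))
    for region in root.iter("region")
}

# Children of <par> are presented in parallel.
parallel_media = [
    (media.tag, media.get("src"), media.get("region"))
    for media in root.find("body/par")
]

print(regions)
print(parallel_media)
```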