Fundamentals of Multimedia Slides


  • Slide 1: Multimedia Systems

  • Slide 2: What is Multimedia?

    When different people mention the term multimedia, they often have quite different, or even opposing, viewpoints.

    A PC vendor: a PC that has sound capability, a DVD-ROM drive, and perhaps the superiority of multimedia-enabled microprocessors that understand additional multimedia instructions.

    A consumer entertainment vendor: interactive cable TV with hundreds of digital channels available, or a cable TV-like service delivered over a high-speed Internet connection.

    A Computer Science (CS) student: applications that use multiple modalities, including text, images, drawings (graphics), animation, video, sound (including speech), and interactivity.

  • Slide 3: What is Multimedia?

    Multimedia is multiple forms of information content and information processing (e.g. text, audio, graphics, animation, video, interactivity) used to inform or entertain.

    Computer-controlled integration of text, graphics, drawings, still and moving images (video), animation, audio, and any other media where every type of information can be represented, stored, transmitted and processed digitally.

    Characterized by the processing, storage, generation, manipulation and rendition of multimedia information.

    Multimedia may be broadly divided into linear and non-linear categories. Linear content progresses without any navigation control for the viewer, such as a cinema presentation. Non-linear content offers user interactivity to control progress, as used in a computer game or in self-paced computer-based training.

  • Slide 4: History of Multimedia and Hypermedia

    Newspaper: perhaps the first mass communication medium; uses text, graphics, and images.

    Motion pictures: conceived of in the 1830s in order to observe motion too rapid for perception by the human eye.

    Wireless radio transmission: Guglielmo Marconi, at Pontecchio, Italy, in 1895.

    Television: the new medium for the 20th century; established video as a commonly available medium and has since changed the world of mass communications.

    The connection between computers and ideas about multimedia covers only a short period.

  • Slide 5: Characteristics of a Multimedia System

    A multimedia system has four basic characteristics:

    Multimedia systems must be computer controlled.

    Multimedia systems are integrated.

    The information they handle must be represented digitally.

    The interface to the final presentation of media is usually interactive.

  • Slide 6: Challenges of a Multimedia System

    Very high processing power: needed to deal with large data processing and real-time delivery of media.

    Multimedia-capable file system: needed to deliver real-time media, e.g. video/audio streaming; special hardware/software such as RAID technology is needed.

    Data representations/file formats that support multimedia: data representations/file formats should be easy to handle yet allow for compression/decompression in real time.

    Efficient and high I/O: input and output to the file subsystem needs to be efficient and fast, and needs to allow for real-time recording as well as playback of data, e.g. direct-to-disk recording systems.

    Special operating system: to allow access to the file system and process data efficiently and quickly; needs to support direct transfers to disk, real-time scheduling, fast interrupt processing, I/O streaming, etc.

    Storage and memory: large storage units (of the order of 50-100 GB or more) and large memory (50-100 MB or more).

    Network support: client-server operation is common, as multimedia systems are commonly distributed.

    Software tools: user-friendly tools are needed to handle media, design and develop applications, and deliver media.

  • Slide 7: Applications of Multimedia

    Multimedia finds its application in various areas, including:

    Advertisements

    Art

    Education

    Entertainment

    Engineering

    Medicine

    Mathematics

    Business

    Scientific research

    Spatial and temporal applications

  • Slide 8: Applications of Multimedia

    World Wide Web

    Hypermedia courseware

    Video conferencing

    Video-on-demand

    Interactive TV

    Groupware

    Home shopping

    Games

    Virtual reality

    Digital video editing and production systems

    Multimedia database systems

  • Slide 9: Topics

    Issues in Multimedia (Authoring and Design)

    Multimedia authoring versus programming: the difference between multimedia authoring and programming

    Multimedia application design: design stages, storyboarding

    Multimedia software tools: audio sequencing, image/graphics editing, animation, multimedia authoring

    Text: fonts and faces, character sets and alphabets, font editing and design tools

    Images/graphics: digital images, image data types, colors

    Audio: sound digitization, audio file formats

  • Slide 10: Topics contd.

    Video

    Text compression

    Image compression

    Audio compression

    Video compression

    Multimedia hardware & software

    Content-based multimedia retrieval

    Multimedia network communications

    Use of previous programming skills

  • Slide 11: Multimedia Authoring and Tools

    Multimedia authoring: creation of multimedia productions, sometimes called movies or presentations.

    Why should you use an authoring system?

    An authoring system has pre-programmed elements for the development of interactive multimedia software titles.

    Authoring systems vary widely in orientation, capabilities, and learning curve.

    There is no such thing as a completely point-and-click automated authoring system; authoring is actually just a speeded-up form of programming.

    We are mostly interested in interactive applications.

    We also have a look at still-image editors such as Adobe Photoshop, and simple video editors such as Adobe Premiere, since they help to create interactive multimedia projects.

    The level of interaction goes from no interactivity to virtual reality creation: control the pace (e.g. click Next), control the sequence, control the object.

  • Slide 12: Multimedia Authoring Paradigms/Methodology

    Multimedia authoring metaphors

    Multimedia production

    Multimedia presentation

    Automatic authoring

  • Slide 13: Multimedia Authoring Metaphors

    Scripting Language Metaphor: use a special language to enable interactivity (buttons, mouse, etc.), and to allow conditionals, jumps, loops, functions/macros, etc. E.g., a small Toolbook program is as below:

  • Slide 14: Topics contd. (repeats Slide 10)

  • Slide 15: Multimedia Authoring and Tools

  • Slide 16: Applications of Multimedia (repeats Slide 8)

  • Slide 17: Multimedia Authoring

    Multimedia authoring: creation of multimedia productions, sometimes called movies or presentations.

    Authoring involves the assembly and bringing together of multimedia, possibly with high-level graphical interface design and some high-level scripting.

    Programming involves low-level assembly, construction and control of multimedia, and involves real languages like C and Java.

    An authoring system has pre-programmed elements for the development of interactive multimedia software.

    Authoring systems vary widely in orientation, capabilities, and learning curve; there is no completely point-and-click automated authoring system.

    Authoring is a speeded-up form of programming, taking roughly 1/8 of programming development time.

  • Slide 18: Multimedia Authoring

    Focus is on interactive applications. Why?

    The level of interaction goes from no interactivity to virtual reality creation: control the pace (e.g. click Next), control the sequence, control the object, control the entire simulation.

    It also includes image editors such as Adobe Photoshop, and simple video editors such as Adobe Premiere, since they help to create interactive multimedia projects.

    In this section, we take a look at:

    Multimedia authoring metaphors

    Multimedia application production

    Automatic authoring
  • Slide 19: Multimedia Authoring Metaphors

    1. Scripting Language Metaphor

    Use a special language to enable interactivity (buttons, mouse, etc.), and to allow conditionals, jumps, loops, functions/macros, etc.

    Closest to programming.

    Tends to be longer in development time.

    Runtime speed is minimal.

    2. Iconic/Flow-control Metaphor

    Graphical icons are available in a toolbox, and authoring proceeds by creating a flowchart with icons attached.

    Speediest in development time and suited for short-time projects.

    Suffers least from runtime speed problems.

    Example script (Lingo):

    global gNavSprite
    on exitFrame
      go the frame
      play sprite gNavSprite
    end

  • Slide 20: Fig. 2.1: Authorware flowchart

  • Slide 21

    4. Hierarchical Metaphor

    Represented by embedded objects and iconic properties.

    User-controllable elements are organized into a tree structure.

    The learning curve is non-trivial.

    Often used in menu-driven applications.

    5. Frames Metaphor

    Like the Iconic/Flow-control Metaphor; however, links between icons are more conceptual, rather than representing the actual flow of the program.

    A very fast development system, but requires a good auto-debugging function.

  • Slide 22: Fig. 2.2: Quest frame

  • Slide 23

    7. Cast/Score/Scripting Metaphor

    Time is shown horizontally, like a spreadsheet: rows, or tracks, represent instantiations of characters in a multimedia production.

    Multimedia elements are drawn from a cast of characters, and scripts are basically event procedures or procedures that are triggered by timer events.

    Director, by Macromedia, is the chief example of this metaphor. Director uses the Lingo scripting language, an object-oriented event-driven language.

  • Slide 24: Multimedia Application Production

    The multimedia design phase consists of:

    Storyboarding: helps to plan the general organization or content of a presentation by recording and organizing ideas on index cards, or placed on a board/wall; ensures media are collected and organized.

    Flowcharting: adds navigation information to the storyboard, the multimedia concept structure and user interaction, followed by a detailed functional requirement specification.

    Prototyping and user testing.

    Parallel media production.

    Two types of design considerations also need to be made: multimedia content design and technical design.

  • Slide 25: Multimedia Content Design

    Content design deals with what to say and what vehicle to use.

    There are five ways to format and deliver your message. You can write it, illustrate it, wiggle it, hear it, and interact with it.

    Writing (scripting):

    Understand your audience and correctly address them.

    Keep your writing as simple as possible (e.g., write out the full message(s) first, then shorten).

    Make sure the technologies used complement each other.

    Illustrating (graphics):

    Make use of pictures to effectively deliver your messages.

    Create your own (draw, (color) scanner, PhotoCD, ...), or keep "copy files" of artworks.

    Graphic styles: fonts, colors.

  • Slide 26: Multimedia Content Design

    Graphics styles: human visual dynamics impact how presentations must be constructed.

    (a) Color principles and guidelines: some color schemes and art styles are best combined with a certain theme or style. A general hint is not to use too many colors, as this can be distracting.

    (b) Fonts: for effective visual communication in a presentation, it is best to use large fonts (i.e., 18 to 36 points), and no more than 6 to 8 lines per screen (fewer than on this screen!). Fig. 2.4 shows a comparison of two screen projections.

  • Slide 27: Fig. 2.4: Colours and fonts [from Ron Vetter]

  • Slide 28

    (c) A color contrast program: if the text color is some triple (R, G, B), a legible color for the background is that color subtracted from the maximum (here assuming max = 1):

    (R, G, B) -> (1 - R, 1 - G, 1 - B)    (2.1)

    Some color combinations are more pleasing than others; e.g., a pink background and forest-green foreground, or a green background and mauve foreground. Fig. 2.5 shows a small VB program (textcolor.exe) in operation.
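    The arithmetic of Eq. (2.1) is easy to check in code. Below is a minimal Python sketch (not from the slides; the function name is mine) using 8-bit channels, so the maximum of Eq. (2.1) is 255 instead of 1:

    def contrast_background(text_rgb):
        # Complement each channel per Eq. (2.1); channels are 8-bit here,
        # so the maximum is 255 rather than the slides' max = 1.
        return tuple(255 - c for c in text_rgb)

    # Forest-green text yields a pink-ish background, matching the
    # pleasing pink/forest-green pairing mentioned above.
    print(contrast_background((34, 139, 34)))  # (221, 116, 221)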

  • Slide 29: Fig. 2.5: Program to investigate colours and readability

  • Slide 30: Fig. 2.6: Colour wheel

    Fig. 2.6 shows a colour wheel, with opposite colours equal to (1 - R, 1 - G, 1 - B).

  • Slide 31: Wiggling (Animation)

    1. Types of animation:

    Character animation: humanize an object.

    Highlights and sparkles: pop a word in/out of the screen, sparkle a logo.

    Moving text.

    Video: live video or digitized video.

    2. When to animate: only animate when it has a specific purpose, e.g. to

    enhance emotional impact,

    make a point,

    improve information delivery,

    indicate the passage of time,

    provide a transition to the next subsection.

  • Slide 32: Video Transitions

    Video transitions signal scene changes. There are many different types of transitions:

    1. Cut: an abrupt change of image contents formed by abutting two video frames consecutively. This is the simplest and most frequently used video transition.

  • Slide 33

    2. Wipe: a replacement of the pixels in a region of the viewport with those from another video. Wipes can be left-to-right, right-to-left, vertical, horizontal, like an iris opening, swept out like the hands of a clock, etc.

    3. Dissolve: replaces every pixel with a mixture over time of the two videos, gradually replacing the first by the second. Most dissolves can be classified as one of two types: cross dissolve and dither dissolve.

  • Slide 34

    Type I: Cross Dissolve

    Every pixel is affected gradually. It can be defined by:

    D = (1 - alpha(t)) A + alpha(t) B    (2.2)

    where A and B are the color 3-vectors for video A and video B. Here, alpha(t) is a transition function, which is often linear:

    alpha(t) = k t, with k t_max = 1    (2.3)
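    As a concrete reading of Eqs. (2.2) and (2.3), here is a short numpy sketch (mine, not the slides'; it assumes the two videos arrive as aligned, same-sized RGB frames):

    import numpy as np

    def cross_dissolve(frame_a, frame_b, t, t_max):
        # Linear transition of Eq. (2.3): alpha = k*t with k = 1/t_max.
        alpha = t / t_max
        # Eq. (2.2): every pixel is a gradual mix of video A and video B.
        mixed = (1.0 - alpha) * frame_a + alpha * frame_b
        return mixed.astype(np.uint8)

    # Halfway through, each pixel is the average of the two frames.
    a = np.zeros((4, 4, 3))        # a black frame from video A
    b = np.full((4, 4, 3), 255.0)  # a white frame from video B
    print(cross_dissolve(a, b, t=5, t_max=10)[0, 0])  # [127 127 127]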

  • Slide 35

    Type II: Dither Dissolve

    Determined by alpha(t), increasingly more and more pixels in video A will abruptly (instead of gradually, as in Type I) change to video B.

  • Slide 36

    Fade-in and fade-out are special types of Type I dissolve: video A or B is black (or white). Wipes are special forms of Type II dissolve in which changing pixels follow a particular geometric pattern.

    Build-your-own transition: suppose we wish to build a special type of wipe which slides one video out while another video slides in to replace it: a slide (or push).

  • Slide 37

    (a) Unlike a wipe, we want each video frame not to be held in place, but instead to move progressively farther into (out of) the viewport.

    (b) Suppose we wish to slide VideoL in from the left, and push out VideoR. Figure 2.9 shows this process.

    Fig. 2.9: (a): VideoL. (b): VideoR. (c): VideoL sliding into place and pushing out VideoR.
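    A sketch of the slide (push) just described, under the same assumptions as the dissolve sketch above (aligned numpy frames, and a linear schedule for the offset, which is my choice):

    import numpy as np

    def push_from_left(frame_l, frame_r, t, t_max):
        # Offset grows over time: how far VideoL has slid into view.
        height, width, _ = frame_r.shape
        offset = int(width * t / t_max)
        out = np.empty_like(frame_r)
        # Neither frame is held in place: the visible part of VideoL is
        # its rightmost columns, and VideoR's columns are pushed right.
        out[:, :offset] = frame_l[:, width - offset:]
        out[:, offset:] = frame_r[:, :width - offset]
        return out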

  • Slide 38: Hearing (Audio)

    Types of audio in a multimedia application:

    Music: sets the mood of the presentation, enhances the emotion, illustrates points.

    Sound effects: make specific points, e.g., squeaky doors, explosions, wind, ...

    Narration: the most direct message, often effective.

  • Slide 39: Interactivity (Interacting)

    Interactive multimedia systems: people remember 70% of what they interact with.

    Menu-driven programs/presentations: often a hierarchical structure (main menu, sub-menus, ...).

    Hypermedia: less structured; cross-links between subsections of the same subject; nonlinear, with quick access to information; easier for introducing more multimedia features.

    Simulations / performance-dependent simulations: e.g., games such as SimCity, flight simulators.

  • Slide 40: Technical Design Issues

    1. Computer platform: much software is ostensibly portable, but cross-platform software relies on run-time modules which may not work well across systems.

    2. Video format and resolution: the most popular video formats (NTSC, PAL, and SECAM) are not compatible, so a conversion is required before a video can be played on a player supporting a different format.

    3. Memory and disk space requirements: at least 128 MB of RAM and 20 GB of hard-disk space should be available for acceptable performance and storage for multimedia programs.

  • Slide 41

    4. Delivery methods: not everyone/everywhere has rewritable DVD drives, as yet.

    CD-ROMs may not offer enough storage to hold a multimedia presentation. As well, access time for CD-ROM drives is longer than for hard-disk drives.

    Electronic delivery is an option, but depends on network bandwidth at the user side (and at the server). A streaming option may be available, depending on the presentation.

  • Slide 42: Automatic Authoring

    Hypermedia documents: generally, three steps:

    1. Capture of media: from text or using an audio digitizer or video frame-grabber; highly developed and well automated.

    2. Authoring: how best to structure the data in order to support multiple views of the available data, rather than a single, static view.

    3. Publication: i.e. presentation, the objective of the multimedia tools we have been considering.

  • Slide 43: Externalization versus Linearization

    (a) Fig. 2.12(a) shows the essential problem involved in communicating ideas without using a hypermedia mechanism.

    (b) In contrast, hyperlinks allow us the freedom to partially mimic the author's thought process (i.e., externalization).

    (c) Using, e.g., Microsoft Word, one creates a hypertext version of a document by following the layout already set up in chapters, headings, and so on. But problems arise when we actually need to automatically extract semantic content and find links and anchors (even considering just text and not images etc.). Fig. 2.13 displays the problem.

  • Slide 44: Fig. 2.12: Communication using hyperlinks [from David Lowe]

  • Slide 45

    (d) Once a dataset becomes large, we should employ database methods. The issues become focused on scalability (to a large dataset), maintainability, addition of material, and reusability.

    Fig. 2.13: Complex information space [from David Lowe].

  • Slide 46: Semi-automatic Migration of Hypertext

    The structure of hyperlinks for text information is simple: nodes represent semantic information, and these are anchors for links to other pages.

    Fig. 2.14: Nodes and anchors in hypertext [from David Lowe].

  • Slide 47: Hyperimages

    We need an automated method to help us produce true hypermedia:

    Fig. 2.15: Structure of hypermedia [from David Lowe].

  • Slide 48

    One can manually delineate syntactic image elements by masking image areas. Fig. 2.16 shows a hyperimage, with image areas identified and automatically linked to other parts of a document.

    Fig. 2.16: Hyperimage [from David Lowe].

  • Slide 49: 2.2 Some Useful Editing and Authoring Tools

    One needs real vehicles for showing understanding of the principles of, and for creating, multimedia. Straight programming in C++ or Java is not always the best way of showing your knowledge and creativity.

    Some popular authoring tools include the following:

    Adobe Premiere 6

    Macromedia Director 8 and MX

    Flash 5 and MX

    Dreamweaver MX

    Assignments for this section.

  • Slide 50

    2.2.1 Adobe Premiere

    2.2.2 Macromedia Director

    2.2.3 Macromedia Flash

    2.2.4 Dreamweaver

  • Slide 51

    At the convergence of technology and creative invention in multimedia is virtual reality: placing you inside a lifelike experience.

    Take a step forward, and the view gets closer; turn your head, and the view rotates.

    Reach out and grab an object; your hand moves in front of you. Maybe the object explodes in a 90-decibel crescendo as you wrap your fingers around it. Or it slips out from your grip, falls to the floor, and hurriedly escapes through a mouse hole at the bottom of the wall.

  • Slide 52

    In VR, your cyberspace is made up of many thousands of geometric objects plotted in three-dimensional space. The more objects, and the more points that describe the objects, the higher the resolution and the more realistic your view.

    As the user moves about, each motion or action requires the computer to recalculate the position, angle, size, and shape of all the objects that make up your view, and many thousands of computations must occur as fast as 30 times per second to seem smooth.

  • Slide 53: 2.3 VRML (Virtual Reality Modelling Language)

    Overview:

    (a) VRML was conceived at the first international conference of the World Wide Web as a platform-independent language that would be viewed on the Internet.

    (b) Objective of VRML: the capability to put coloured objects into a 3D environment.

    (c) VRML is an interpreted language; however, it has been very influential, since it was the first method available for displaying a 3D world on the World Wide Web.

  • Slide 54: History of VRML

    VRML 1.0 was created in May of 1995, with a revision for clarification called VRML 1.0C in January of 1996.

    VRML is based on a subset of the Open Inventor file format created by Silicon Graphics Inc.

    VRML 1.0 allowed for the creation of many simple 3D objects, such as a cube and sphere, as well as user-defined polygons. Materials and textures can be specified for objects to make the objects more realistic.

  • Slide 55

    The last major revision of VRML was VRML 2.0, standardized by ISO as VRML97.

    This revision added the ability to create an interactive world. VRML 2.0, also called Moving Worlds, allows for animation and sound in an interactive virtual world.

    New objects were added to make the creation of virtual worlds easier.

    Java and JavaScript have been included in VRML to allow for interactive objects and user-defined actions.

    VRML 2.0 was a large change from VRML 1.0, and the two are not compatible with each other. However, conversion utilities are available to convert VRML 1.0 to VRML 2.0 automatically.

  • Slide 56: VRML Shapes

    VRML contains basic geometric shapes that can be combined to create more complex objects. Fig. 2.28 displays some of these shapes.

    Fig. 2.28: Basic VRML shapes.

    The Shape node is a generic node for all objects in VRML.

    The Material node specifies the surface properties of an object. It can control what color the object is by specifying the red, green and blue values of the object.

  • Slide 57

    There are three kinds of texture nodes that can be used to map textures onto any object:

    1. ImageTexture: the most common one; can take an external JPEG or PNG image file and map it onto the shape.

    2. MovieTexture: allows the mapping of a movie onto an object; can only use MPEG movies.

    3. PixelTexture: simply means creating an image to use with ImageTexture within VRML.

  • Slide 58: VRML World

    Fig. 2.29 displays a simple VRML scene from one viewpoint: an openable-book VRML simple world!

    The position of a viewpoint can be specified with the position field, and it can be rotated from the default view with the orientation field.

    Also, the camera's angle for its field of view can be changed from its default 0.78 radians with the fieldOfView field.

    Changing the field of view can create a telephoto effect.

  • Slide 59: Fig. 2.29: A simple VRML scene

  • Slide 60

    Three types of lighting can be used in a VRML world:

    The DirectionalLight node shines a light across the whole world in a certain direction.

    A PointLight shines a light in all directions from a certain point in space.

    A SpotLight shines a light in a certain direction from a point.

    (RenderMan: a rendering package created by Pixar.)

    The background of the VRML world can also be specified using the Background node.

    A Panorama node can map a texture to the sides of the world. A panorama is mapped onto a large cube surrounding the VRML world.

  • Slide 61: Animation and Interactions

    The only method of animation in VRML is tweening, done by slowly changing an object that is specified in an interpolator node.

    This node will modify an object over time, based on the six types of interpolators: color, coordinate, normal, orientation, position, and scalar.

    (a) All interpolators have two fields that must be specified: the key and keyValue.

    (b) The key consists of a list of two or more numbers, starting with 0 and ending with 1, and defines how far along the animation is.

    (c) Each key element must be complemented with a keyValue element, which defines what values should change.

  • Slide 62

    To time an animation, a TimeSensor node should be used:

    (a) A TimeSensor has no physical form in the VRML world and just keeps time.

    (b) To notify an interpolator of a time change, a ROUTE is needed to connect two nodes together.

    (c) Most animation can be accomplished by routing a TimeSensor to an interpolator node, and then the interpolator node to the object to be animated.

    Two categories of sensors can be used in VRML to obtain input from a user:

    (a) Environment sensors: three kinds of environmental sensor nodes: VisibilitySensor, ProximitySensor, and Collision.

    (b) Pointing-device sensors: touch sensor and drag sensors.

  • Slide 63

  • Slide 64

    (f) Nodes can be named using DEF and used again later with the keyword USE. This allows for the creation of complex objects using many simple objects.

    A simple VRML example, to create a box in VRML; one can accomplish this by typing:

    Shape {
      geometry Box {}
    }

    The Box defaults to a 2-meter-long cube in the center of the screen. Putting it into a Transform node can move this box to a different part of the scene. We can also give the box a different color, such as red.

  • Slide 65

    Transform {
      translation 0 10 0
      children [
        Shape {
          geometry Box {}
          appearance Appearance {
            material Material {
              diffuseColor 1 0 0
            }
          }
        }
      ]
    }

  • Slide 66: Text

  • Slide 67: Introduction to Text

    Words and symbols in any form, spoken or written, are the most common system of communication, and deliver the most widely understood meaning.

    A typeface usually includes many type sizes and styles.

    A font is a collection of characters of a single size and style belonging to a particular typeface family.

    Typical font styles are boldface and italic.

    Other style attributes, such as underlining and outlining of characters, may be added at the user's choice.

  • Slide 68: Uses for Text in Multimedia

    Text is used in multimedia projects in many ways:

    Web pages

    Video

    Computer-based training

    Presentations

  • Slide 69: Uses for Text in Multimedia

    Text is also used in multimedia projects in these ways:

    Games rely on text for rules, chat, character descriptions, dialog, background story, and many more elements.

    Educational games rely on text for content, directions, feedback, and information.

    Kiosks use text to display information, directions, and descriptions.

  • Slide 70: Formatting Text

    Formatting text controls the way the text looks. You can choose:

    Fonts

    Text sizes and colors

    Text alignment

    Text spacing: line spacing or spacing between individual characters

    Advanced formatting: outlining, shadow, superscript, subscript, watermarks, embossing, engraving, or animation

    Text wraps

  • Slide 71: Typefaces

    Typefaces are characterized as serif or sans serif.

    Serif: Times, Times New Roman, Bookman; used for the body of text.

    Sans serif: Arial, Optima, Verdana; used for headings.

  • Slide 72: Guidelines for Using Fonts

    Avoid using many varying font styles in the same project.

    When possible, use fonts that come with both Windows and Mac OS.

    Use bitmap fonts in critical areas such as buttons, titles, or headlines.

  • Slide 73: More Tips for Using Fonts

    Use fancy or whimsical fonts sparingly, for special effects or emphasis.

    Keep paragraphs and line lengths short.

    Use bold, italic, and underlining options sparingly, for emphasis.

  • Slide 74: More Guidelines for Using Fonts

    Avoid using text in all uppercase letters.

    Use font, style options, size, and color consistently.

    Provide adequate contrast between text and background when choosing colors.

    Always check spelling and grammar.

  • Slide 75: Formatting for Screen Display

    Apply these guidelines to multimedia applications for display, rather than to printed documents:

    Test your presentation on monitors of several sizes.

    Avoid patterned backgrounds.

    Use small amounts of text on each screen display.

    Text for a presentation that will be viewed by a large group of people must be visible from the back of the room.

    For interactive displays, use consistent placement of hypertext links.

  • Slide 76: Character Sets and Alphabets

    ASCII character set:

    Uses 7-bit characters, giving 128 characters, including both lower- and uppercase letters, punctuation marks, Arabic numerals and math symbols.

    32 of these are control characters for device control messages, such as carriage return, line feed, tab and form feed.

    The extended ASCII character set uses 8 bits.

  • Slide 77: Character Sets and Alphabets

    Unicode character set:

    Uses a 16-bit architecture for multilingual text and character encoding.

    Unicode uses about 65,000 characters from all known languages and alphabets in the world.

    Where several languages share a set of symbols that have a historically related derivation, the shared symbols of each language are unified into a single set of symbols (called scripts).

  • Slide 78: Font Technologies

    Understanding font technologies can be important when creating multimedia projects. The most popular font technologies are:

    Scalable fonts: PostScript, TrueType, and OpenType.

    Bitmap fonts, which are not scalable but provide more control over the appearance of text.

  • Slide 79: Font Editing and Design Tools

    In some multimedia projects it may be required to create special characters. Using font editing tools it is possible to create special symbols and use them throughout the text.

    Software that can be used for editing and creating fonts:

    Fontographer

    FontMonger

    Cool 3D Text

  • Slide 80: Graphics and Image Data Representations

  • Slide 81: Why Use Images?

    To show information that is visual and can't easily be communicated except as an image, for instance maps, charts or diagrams.

    To clarify interpretation of information by applying color schemes or other visuals that help make meaning more obvious.

    To create an evident context for information by using images that your audience can associate with your tone or message.

  • Slide 82: Bitmap/Vector Images

    In a bitmap or raster image, visual data is mapped as spots of color, or pixels. The more pixels in a bitmap image, the finer the detail will be.

    Because photographs have high levels of detail and a variety of tones and colors, they are best represented as bitmap images. Scanners and digital cameras produce bitmap images.

    Vector or object-oriented graphics use mathematical formulas to describe outlines and fills for image objects.

    Vector graphics can be enlarged or reduced with no loss of data and no change in image quality; e.g. CorelDraw, Illustrator, FreeHand, AutoCAD & Flash create vector images.

  • Slide 83: What Is an Image?

    To digitize an image, the image is discretized both in terms of its spatial coordinates and its amplitude values.

    Discretization of the spatial coordinates (x, y) is called image sampling.

    Discretization of the amplitude values f(x, y) is called grey-level (intensity) quantization.

    A digital image is represented by a matrix of numeric values, each representing a quantized intensity value.

    When I is a two-dimensional matrix, then I(r, c) is the intensity value at the position corresponding to row r and column c of the matrix.

  • Slide 84: Image

    Each element of the array is called a pixel.

  • Slide 85: Pixel Neighbours

    A pixel p1 is a neighbour of another pixel p2 if their spatial coordinates (x1, y1) and (x2, y2) are not more than a unit distance apart. Types of neighbours often used in image processing:

    Horizontal neighbours

    Vertical neighbours

    Diagonal neighbours

    Arithmetic and logic operations:

    Addition: p1 + p2, used in image averaging.

    Subtraction: p1 - p2, used in image motion analysis and background removal.

    Multiplication: p1 * p2, used in colour and image shading operations.

    Division: p1 / p2, used in colour processing.

  • Slide 86: Color

    Reflection of light is simply the bouncing of light waves from an object back toward the light's source or other directions.

    Energy is often absorbed from the light (and converted into heat or other forms) when the light reflects off an object, so the reflected light might have slightly different properties.

    Light is the portion of electromagnetic radiation that is visible to the human eye.

    Visible light has a wavelength of about 400 to 780 nanometers.

    The adjacent frequencies of infrared on the lower end and ultraviolet on the higher end are still called light, even though they are not visible to the human eye.

  • Slide 87: Color

    Cameras store and reproduce light as images and video.

    The device consists of a box with a hole in one side. Light from an external scene passes through the hole and strikes a surface inside, where it is reproduced, upside-down, but with both color and perspective preserved.

    At first, the image was projected onto light-sensitive chemical plates; later it became chemical film, and now it is photosensitive electronics that can record images in a digital format.

  • Slide 88: Human Color Perception

    The retina contains two types of light-sensitive photoreceptors: rods and cones.

    The rods are responsible for monochrome perception, allowing the eyes to distinguish between black and white.

    The cones are responsible for color vision.

    In humans, there are three types of cones, maximally sensitive to long-wavelength, medium-wavelength, and short-wavelength light, or red, green and blue.

    The color perceived is the combined effect of stimuli to these three types of cone cells. Overall there are more rods than cones, so color perception is less accurate than black-and-white contrast perception.

  • Slide 89: Monochrome Images

    Each pixel is stored as a single bit (0 or 1), so such an image is also referred to as a binary image.

    It is also called a 1-bit monochrome image, since it contains no color.

    A 640 x 480 monochrome image requires 37.5 KB of storage.

  • Slide 90: 8-bit Gray-level Images

  • Slide 91: 8-bit Gray-level Images

    Each pixel has a gray value between 0 and 255.

    Each pixel is represented by a single byte; e.g., a dark pixel might have a value of 10, and a bright one might be 230.

    A 640 x 480 grayscale image requires over 300 KB of storage.

    8-bit Colour Images

    One byte for each pixel.

    Supports 256 out of the millions of colours possible; acceptable colour quality.

    Requires Colour Look-Up Tables (LUTs).

    A 640 x 480 8-bit colour image requires 307.2 KB of storage (the same as 8-bit greyscale).

  • Slide 92

  • Slide 93: 24-bit Color Images

  • Slide 94: 24-bit Color Images

    Each pixel is represented by three bytes (e.g., RGB).

    Supports 256 x 256 x 256 = 16,777,216 possible combined colours.

    A 640 x 480 24-bit colour image would require 921.6 KB of storage.

    Most 24-bit images are 32-bit images; the extra byte of data for each pixel is used to store an alpha value representing special-effect information.
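    The storage figures quoted on these slides follow directly from width x height x bits-per-pixel; a quick Python check (mine, not the slides'; note that 37.5 KB reads the byte count with 1 KB = 1024 bytes, while 307.2 KB and 921.6 KB read it in decimal kilobytes):

    def image_bytes(width, height, bits_per_pixel):
        # Raw (uncompressed) image size in bytes.
        return width * height * bits_per_pixel // 8

    print(image_bytes(640, 480, 1) / 1024)   # 37.5  (1-bit monochrome)
    print(image_bytes(640, 480, 8) / 1000)   # 307.2 (8-bit grey or colour)
    print(image_bytes(640, 480, 24) / 1000)  # 921.6 (24-bit colour)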

  • Slide 95: Assignment

  • Slide 96: Assignment

    How do CRT monitors create images?

    How do flat-panel displays create images?

    How do scanners digitize images?

    How do printers create images with colors?

    How can we create 3D images?

    Briefly describe the different image formats: GIF, JPEG, PNG, TIFF.

  • Slide 97

    Image resolution refers to the number of pixels in a digital image (higher resolution always yields better quality). Fairly high resolution for such an image might be 1,600 x 1,200, whereas lower resolution might be 640 x 480.

    Frame buffer: hardware used to store the bitmap. A video card (actually a graphics card) is used for this purpose. The resolution of the video card does not have to match the desired resolution of the image, but if not enough video card memory is available, then the data has to be shifted around in RAM for display.

    An 8-bit image can be thought of as a set of 1-bit bit-planes, where each plane consists of a 1-bit representation of the image at higher and higher levels of "elevation": a bit is turned on if the image pixel has a nonzero value that is at or above that bit level.

    Fig. 3.2 displays the concept of bit-planes graphically.

  • Slide 98: Fig. 3.2: Bit-planes for an 8-bit grayscale image

  • Slide 99: 3.2 Popular File Formats

    8-bit GIF: one of the most important formats because of its historical connection to the WWW and the HTML markup language, as the first image type recognized by net browsers.

    JPEG: currently the most important common file format.

  • Slide 100: GIF

    GIF standard: we examine the GIF standard because it is so simple, yet it contains many common elements.

    Limited to 8-bit (256) color images only, which, while producing acceptable color images, is best suited for images with few distinctive colors (e.g., graphics or drawings).

    The GIF standard supports interlacing: the successive display of pixels in widely spaced rows by a 4-pass display process.

    GIF actually comes in two flavors:

    1. GIF87a: the original specification.

    2. GIF89a: the later version; supports simple animation via a Graphics Control Extension block in the data, and provides simple control over delay time, a transparency index, etc.

  • Slide 101: GIF87

    For the standard specification, the general file format of a GIF87 file is as in Fig. 3.12.

    Fig. 3.12: GIF file format.

  • Slide 102

    The Screen Descriptor comprises a set of attributes that belong to every image in the file. According to the GIF87 standard, it is defined as in Fig. 3.13.

    Fig. 3.13: GIF screen descriptor.

  • Slide 103

    The Color Map is set up in a very simple fashion, as in Fig. 3.14. However, the actual length of the table equals 2^(pixel+1), as given in the Screen Descriptor.

    Fig. 3.14: GIF color map.

  • Slide 104

    Each image in the file has its own Image Descriptor, defined as in Fig. 3.15.

    Fig. 3.15: GIF image descriptor.

  • Slide 105

    If the interlace bit is set in the local Image Descriptor, then the rows of the image are displayed in a four-pass sequence (Fig. 3.16).

    Fig. 3.16: GIF 4-pass interlace display row order.

  • Slide 106

    We can investigate how the file header works in practice by having a look at a particular GIF image. Fig. 3.7 is an 8-bit color GIF image. In UNIX, issue the command:

    od -c forestfire.gif | head -2

    and we see the first 32 bytes interpreted as characters:

    G I F 8 7 a \208 \2 \188 \1 \247 \0 \0 \6 \3 \5
    J \132 \24 | ) \7 \198 \195 \ \128 U \27 \196 \166 & T

    To decipher the remainder of the file header (after GIF87a), we use hexadecimal:

    od -x forestfire.gif | head -2

    with the result:

    4749 4638 3761 d002 bc01 f700 0006 0305
    ae84 187c 2907 c6c3 5c80 551b c4a6 2654
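    The screen width and height sit immediately after the GIF87a signature as little-endian 16-bit integers; the following Python sketch (mine, not the slides') decodes the header bytes shown above:

    import struct

    # First 13 bytes of forestfire.gif, copied from the od output above.
    header = bytes([0x47, 0x49, 0x46, 0x38, 0x37, 0x61,   # "GIF87a"
                    0xd0, 0x02, 0xbc, 0x01,               # width, height
                    0xf7, 0x00, 0x00])                    # packed, bg, aspect

    signature = header[:6].decode('ascii')
    width, height = struct.unpack('<HH', header[6:10])
    # Low three bits of the packed byte give the 2^(pixel+1) table size.
    table_size = 2 ** ((header[10] & 0x07) + 1)

    print(signature, width, height, table_size)  # GIF87a 720 444 256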

  • Slide 107: JPEG

    JPEG: the most important current standard for image compression.

    The human vision system has some specific limitations, and JPEG takes advantage of these to achieve high rates of compression.

    JPEG allows the user to set a desired level of quality, or compression ratio (input divided by output).

    As an example, Fig. 3.17 shows our forestfire image with a quality factor Q=10%. This image is a mere 1.5% of the original size. In comparison, a JPEG image with Q=75% yields an image size 5.6% of the original, whereas a GIF version of this image compresses down to 23.0% of the uncompressed image size.

  • Slide 108: Fig. 3.17: JPEG image with low quality specified by user

  • Slide 109: PNG

    PNG format: standing for Portable Network Graphics; meant to supersede the GIF standard, and extends it in important ways.

    Special features of PNG files include:

    1. Support for up to 48 bits of color information, a large increase.

    2. Files may contain gamma-correction information for correct display of color images, as well as alpha-channel information for such uses as control of transparency.

    3. The display progressively displays pixels in a 2-dimensional fashion by showing a few pixels at a time over seven passes through each 8 x 8 block of an image.

  • Slide 110: TIFF

    TIFF stands for Tagged Image File Format.

    The support for attachment of additional information (referred to as tags) provides a great deal of flexibility.

    1. The most important tag is a format signifier: what type of compression etc. is in use in the stored image.

    2. TIFF can store many different types of image: 1-bit, grayscale, 8-bit color, 24-bit RGB, etc.

    3. TIFF was originally a lossless format, but now a new JPEG tag allows one to opt for JPEG compression.

    4. The TIFF format was developed by the Aldus Corporation in the 1980s and was later supported by Microsoft.

  • Slide 111: EXIF

    EXIF (Exchange Image File) is an image format for digital cameras:

    1. Compressed EXIF files use the baseline JPEG format.

    2. A variety of tags (many more than in TIFF) are available to facilitate higher-quality printing, since information about the camera and picture-taking conditions (flash, exposure, light source, white balance, type of scene, etc.) can be stored and used by printers for possible color-correction algorithms.

    3. The EXIF standard also includes a specification of a file format for audio that accompanies digital images. As well, it supports tags for information needed for conversion to FlashPix (initially developed by Kodak).

  • Slide 112: Audio

  • Slide 113: What is Sound?

    Sound is a wave phenomenon like light, but it is macroscopic and involves molecules of air being compressed and expanded under the action of some physical device.

    (a) For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.

    (b) Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.

  • Slide 114

    (c) If we wish to use a digital version of sound waves, we must form digitized representations of audio information.

    The perception of sound in any organism is limited to a certain range of frequencies (20 Hz to 20,000 Hz for humans).

    Infrasound: elephants. Ultrasound: bats.

  • Slide 115: Digitization of Sound

    Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
  • Slide 116: Fig. 6.1: An analog signal: continuous measurement of a pressure wave

  • Slide 117

    The graph in Fig. 6.1 has to be made digital in both time and amplitude. To digitize, the signal must be sampled in each dimension: in time, and in amplitude.

    (a) Sampling means measuring the quantity we are interested in, usually at evenly spaced intervals.

    (b) The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency.

    (c) For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.

    (d) Sampling in the amplitude or voltage dimension is called quantization.
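    To make (d) concrete, here is a minimal uniform-quantization sketch (the normalization to [-1, 1) and the rounding scheme are my assumptions, not the slides'):

    def quantize(sample, n_bits):
        # Map a sample in [-1, 1) to an N-bit signed integer code in
        # [-2^(N-1), 2^(N-1) - 1], the signed mapping used for SQNR later.
        levels = 2 ** (n_bits - 1)
        code = round(sample * levels)
        return max(-levels, min(levels - 1, code))

    print(quantize(0.5, 8))    # 64: half of full scale as an 8-bit code
    print(quantize(0.999, 8))  # 127: clipped at the largest positive code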

  • Slide 118: Fig. 6.2: Sampling and quantization. (a): Sampling the analog signal in the time dimension. (b): Quantization is sampling the analog signal in the amplitude dimension

  • Slide 119: Few Terminologies

    Regardless of what vibrating object is creating the sound wave, the particles of the medium through which the sound moves vibrate in a back-and-forth motion at a given frequency.

    The frequency of a wave refers to how often the particles of the medium vibrate when a wave passes through the medium. It is measured as the number of complete back-and-forth vibrations of a particle of the medium per unit of time. If a particle of air undergoes 1000 longitudinal vibrations in 2 seconds, then the frequency of the wave is 500 vibrations per second. A commonly used unit for frequency is the Hertz (abbreviated Hz), where

    1 Hertz = 1 vibration/second

  • Slide 120: Few Terminologies Contd.

  • Slide 121

    The sensation of a frequency is commonly referred to as the pitch. A high-pitch sound corresponds to a high-frequency sound wave, and a low-pitch sound to a low-frequency sound wave.

    Musically trained people are capable of detecting a difference in frequency between two separate sounds as little as 2 Hz, while most people only detect differences of about 7 Hz.

    Any two sounds whose frequencies make a 2:1 ratio are said to be separated by an octave.

  • Slide 122: Fourier Series

    The representation of a periodic function as an infinite sum of sinusoids.

    Harmonics: any series of musical tones whose frequencies are integral multiples of the frequency of a fundamental tone.
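    A small sketch of the idea behind Fig. 6.3: superposing a fundamental and its odd harmonics builds up a complex periodic signal (the 1/k weights are my choice; they make the sum approach a square wave):

    import math

    def superpose(t, fundamental_hz, num_harmonics):
        # Sum the fundamental plus odd harmonics, each weighted 1/k.
        return sum(math.sin(2 * math.pi * k * fundamental_hz * t) / k
                   for k in range(1, 2 * num_harmonics, 2))

    # With one term this is a pure sinusoid; more harmonics reshape it.
    print(round(superpose(0.25, 1.0, 1), 3))  # 1.0
    print(round(superpose(0.25, 1.0, 3), 3))  # 0.867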

  • Slide 123: Fig. 6.3: Building up a complex signal by superposing sinusoids

  • Slide 124: Digitization

  • Slide 125

    Thus, to decide how to digitize audio data, we need to answer the following questions:

    1. What is the sampling rate?

    2. How finely is the data to be quantized, and is the quantization uniform?

  • Slide 126: Nyquist Theorem

    The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound.

    (a) Fig. 6.4(a) shows a single sinusoid: it is a single, pure frequency (only electronic instruments can create such sounds).

    (b) If the sampling rate just equals the actual frequency, Fig. 6.4(b) shows that a false signal is detected: it is simply a constant, with zero frequency.

    (c) If we sample at 1.5 times the actual frequency, Fig. 6.4(c) shows that we obtain an incorrect (alias) frequency that is lower than the correct one; it is half the correct one (the wavelength, from peak to peak, is double that of the actual signal).

    (d) Thus, for correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.

  • Slide 127: Fig. 6.4: Aliasing

    (a): A single frequency.

    (b): Sampling at exactly the frequency produces a constant.

    (c): Sampling at 1.5 times per cycle produces an alias perceived frequency.

  • Slide 128

    Nyquist Theorem: if a signal is band-limited, i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 - f1).

    Nyquist frequency: half of the Nyquist rate.

    Since it would be impossible to recover frequencies higher than the Nyquist frequency in any event, most systems have an antialiasing filter that restricts the frequency content of the input to the sampler to a range at or below the Nyquist frequency.

    The relationship among the sampling frequency, true frequency, and alias frequency is as follows:

    f_alias = f_sampling - f_true,  for f_true < f_sampling < 2 f_true    (6.1)
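    A numeric check of Eq. (6.1), and of the folding behaviour shown in Fig. 6.5 (the closed-form folding formula below is a standard identity, not stated on the slides):

    def apparent_frequency(f_true, f_sampling):
        # Lowest frequency whose samples match f_true exactly; for
        # f_true < f_sampling < 2*f_true this reduces to Eq. (6.1).
        return abs(f_true - f_sampling * round(f_true / f_sampling))

    print(apparent_frequency(6000, 8000))  # 2000: Eq. (6.1), 8000 - 6000
    print(apparent_frequency(3000, 8000))  # 3000: below the 4000 Hz fold
    print(apparent_frequency(2000, 1500))  # 500: the exercise on Slide 137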

  • Slide 129

    In general, the apparent frequency of a sinusoid is the lowest frequency of a sinusoid that has exactly the same samples as the input sinusoid. Fig. 6.5 shows the relationship of the apparent frequency to the input frequency.

    Fig. 6.5: Folding of a sinusoid frequency sampled at 8,000 Hz. The folding frequency, shown dashed, is 4,000 Hz.

  • Slides 130-135: figures showing a 1 Hz wave sampled at 2 Hz, 3 Hz, and 1.5 Hz, illustrating aliasing.


    Exercise


If the sampling rate is 4000 Hz, what is the highest sine-wave frequency that can be correctly recovered?

If the highest frequency is 4000 Hz, what is the minimum sampling rate?

What is the alias of a 2000 Hz wave frequency sampled at 1500 Hz?

    Signal to Noise Ratio (SNR)


The ratio of the power of the correct signal to that of the noise is called the signal-to-noise ratio (SNR), a measure of the quality of the signal.

The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:

    SNR = 10 log10 (V_signal^2 / V_noise^2) = 20 log10 (V_signal / V_noise)    (6.2)
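Eq. (6.2) is straightforward to compute. A small Python sketch (illustrative only; the 440 Hz tone and the noise level are assumptions for the demo):

    import numpy as np

    def snr_db(v_signal, v_noise):
        # Eq. (6.2): SNR in dB from (RMS) signal and noise voltages.
        return 20 * np.log10(v_signal / v_noise)

    print(snr_db(10.0, 1.0))   # 20.0 dB: signal voltage 10 times the noise

    # The same formula applied to sampled waveforms via their RMS values:
    rng = np.random.default_rng(0)
    signal = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
    noise = 0.01 * rng.standard_normal(8000)
    print(snr_db(np.sqrt(np.mean(signal ** 2)), np.sqrt(np.mean(noise ** 2))))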


a) The power in a signal is proportional to the square of the voltage. For example, if the signal voltage V_signal is 10 times the noise voltage, then the SNR is 20 log10(10) = 20 dB.

b) In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.

c) To remember: for power ratios the multiplier is 10; for signal voltage ratios it is 20.


The usual levels of sound we hear around us are described in terms of decibels, as a ratio to the quietest sound we are capable of hearing. Table 6.1 shows approximate levels for these sounds.

    Table 6.1: Magnitude levels of common sounds, in decibels

    Threshold of hearing 0

    Rustle of leaves 10

    Very quiet room 20

    Average room 40

    Conversation 60

    Busy street 70

    Loud radio 80

    Train through station 90

    Riveter 100

    Threshold of discomfort 120

    Threshold of pain 140

    Damage to ear drum 160

Signal to Quantization Noise Ratio (SQNR)


Aside from any noise that may have been present in the original analog signal, there is also an additional error that results from quantization.

(a) If voltages are actually in the range 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values.

(b) This introduces a roundoff error. It is not really noise; nevertheless it is called quantization noise (or quantization error).


The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).

(a) Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.

(b) At most, this error can be as much as half of the interval.


(c) For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:

    SQNR = 20 log10 (V_signal / V_quan_noise) = 20 log10 (2^(N-1) / (1/2))
         = 20 N log10 2 = 6.02 N (dB)    (6.3)

Notes:

(a) We map the maximum signal to 2^(N-1) - 1 (approximately 2^(N-1)) and the most negative signal to -2^(N-1).

(b) Eq. (6.3) is the Peak signal-to-quantization-noise ratio, PSQNR: peak signal and peak noise.


(c) The dynamic range is the ratio of maximum to minimum absolute values of the signal: Vmax/Vmin. The max abs. value Vmax gets mapped to 2^(N-1) - 1; the min abs. value Vmin gets mapped to 1. Vmin is the smallest positive voltage that is not masked by noise. The most negative signal, -Vmax, is mapped to -2^(N-1).

(d) The quantization interval is ΔV = (2 Vmax)/2^N, since there are 2^N intervals. The whole range Vmax down to (Vmax - ΔV/2) is mapped to 2^(N-1) - 1.

(e) The maximum noise, in terms of actual voltages, is half the quantization interval: ΔV/2 = Vmax/2^N.


6.02N is the worst case. If the input signal is sinusoidal, the quantization error is statistically independent, and its magnitude is uniformly distributed between 0 and half of the interval, then it can be shown that the expression for the SQNR becomes:

    SQNR = 6.02 N + 1.76 (dB)    (6.4)
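A quick numeric check of Eqs. (6.3) and (6.4) in Python (a sketch added for illustration, not part of the original slides):

    import numpy as np

    def sqnr_db(n_bits, sinusoidal=False):
        # Eq. (6.3): peak SQNR = 20 * N * log10(2) = 6.02 N dB.
        # Eq. (6.4): add 1.76 dB for a sinusoidal input with uniform error.
        base = 20 * n_bits * np.log10(2)
        return base + 1.76 if sinusoidal else base

    for n in (8, 16):
        print(n, round(sqnr_db(n), 2), round(sqnr_db(n, sinusoidal=True), 2))
    # 8  48.16 49.92
    # 16 96.33 98.09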

Linear and Non-linear Quantization

Linear format: samples are typically stored as uniformly quantized values.

Non-uniform quantization: set up more finely-spaced levels where humans hear with the most acuity.

Weber's Law, stated formally, says that equally perceived differences have values proportional to absolute levels:

    ΔResponse ∝ ΔStimulus / Stimulus    (6.5)

Inserting a constant of proportionality k, we have a differential equation that states:

    dr = k (1/s) ds    (6.6)

with response r and stimulus s.


Integrating, we arrive at a solution

    r = k ln s + C    (6.7)

with constant of integration C. Stated differently, the solution is

    r = k ln(s/s0)    (6.8)

where s0 is the lowest level of stimulus that causes a response (r = 0 when s = s0).

Nonlinear quantization works by first transforming an analog signal from the raw s space into the theoretical r space, and then uniformly quantizing the resulting values.

Such a law for audio is called μ-law encoding (or u-law). A very similar rule, called A-law, is used in telephony in Europe.

The equations for these very similar encodings are as follows:


μ-law:

    r = [sgn(s) / ln(1 + μ)] ln(1 + μ |s/sp|),   |s/sp| ≤ 1    (6.9)

A-law:

    r = [A / (1 + ln A)] (s/sp),                      |s/sp| ≤ 1/A
    r = [sgn(s) / (1 + ln A)] [1 + ln(A |s/sp|)],     1/A ≤ |s/sp| ≤ 1    (6.10)

    where sgn(s) = 1 if s > 0, and -1 otherwise,

and sp is the peak signal value. Fig. 6.6 shows these curves. The parameter μ is set to μ = 100 or μ = 255; the parameter A for the A-law encoder is usually set to A = 87.6.


Fig. 6.6: Nonlinear transform for audio signals.

The μ-law in audio is used to develop a nonuniform quantization rule for sound: uniform quantization of r gives finer resolution in s at the quiet end.
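As a rough illustration of Eq. (6.9), the following Python sketch compands and expands a signal with μ-law (not part of the original slides; the input is assumed already normalized to [-1, 1], i.e., s/sp):

    import numpy as np

    def mu_law_encode(s, mu=255.0):
        # Eq. (6.9), with s already normalized so that |s| <= 1.
        return np.sign(s) * np.log1p(mu * np.abs(s)) / np.log1p(mu)

    def mu_law_decode(r, mu=255.0):
        # Inverse transform: back from the r space to the raw s space.
        return np.sign(r) * ((1 + mu) ** np.abs(r) - 1) / mu

    s = np.linspace(-1, 1, 9)
    r = mu_law_encode(s)
    print(np.round(r, 3))                    # companded values: the quiet end is stretched
    print(np.allclose(mu_law_decode(r), s))  # True: the round trip recovers s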

    Audio Filtering


Prior to sampling and AD conversion, the audio signal is also usually filtered to remove unwanted frequencies. The frequencies kept depend on the application:

(a) For speech, typically from 50 Hz to 10 kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.

(b) An audio music signal will typically contain from about 20 Hz up to 20 kHz.

(c) At the DA converter end, high frequencies may reappear in the output, because sampling and then quantization replace the smooth input signal with a series of step functions containing all possible frequencies.

(d) So at the decoder side, a low-pass filter is used after the DA circuit.

    Audio Quality vs. Data Rate


The uncompressed data rate increases as more bits are used for quantization. Stereo doubles the bandwidth needed to transmit a digital audio signal.

Table 6.2: Data rate and bandwidth in sample audio applications

    Quality     Sample Rate (kHz)  Bits per Sample  Mono/Stereo  Data Rate (uncompressed, kB/s)  Frequency Band (kHz)
    Telephone   8                  8                Mono         8                               0.200-3.4
    AM Radio    11.025             8                Mono         11.0                            0.1-5.5
    FM Radio    22.05              16               Stereo       88.2                            0.02-11
    CD          44.1               16               Stereo       176.4                           0.005-20
    DAT         48                 16               Stereo       192.0                           0.005-20
    DVD Audio   192 (max)          24 (max)         6 channels   1,200 (max)                     0-96 (max)
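The data-rate column follows directly from sample rate x bytes per sample x number of channels. A quick sketch (illustrative, not from the slides):

    def data_rate_kBps(sample_rate_khz, bits_per_sample, channels):
        # Uncompressed rate in kB/s: samples/s x bytes per sample x channels.
        return sample_rate_khz * 1000 * (bits_per_sample / 8) * channels / 1000

    print(data_rate_kBps(8, 8, 1))       # 8.0    -- Telephone row
    print(data_rate_kBps(22.05, 16, 2))  # 88.2   -- FM Radio row
    print(data_rate_kBps(44.1, 16, 2))   # 176.4  -- CD row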

    Synthetic Sounds


1. FM (Frequency Modulation): one approach to generating synthetic sound:

    x(t) = A(t) cos[ωc t + I(t) cos(ωm t + φm) + φc]    (6.11)
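A minimal Python sketch of Eq. (6.11), with a constant envelope A(t) = 1 and zero phase offsets (the sample rate, carrier, modulator, and modulation index are arbitrary demo values):

    import numpy as np

    sr = 8000                          # sample rate in Hz (assumed)
    t = np.arange(sr) / sr             # one second of sample times
    fc, fm, I = 440.0, 110.0, 2.0      # carrier, modulator, modulation index

    # Eq. (6.11) with A(t) = 1: a sinusoid argument inside a sinusoid.
    x = np.cos(2 * np.pi * fc * t + I * np.cos(2 * np.pi * fm * t))

    print(x[:5])                       # first few samples of the FM tone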


Fig. 6.7: Frequency Modulation. (a): A single frequency. (b): Twice the frequency. (c): Usually, FM is carried out using a sinusoid argument to a sinusoid. (d): A more complex form arises from a carrier frequency 2πt and a modulating frequency 4πt cosine inside the sinusoid.

    2. Wave Table synthesis:


A more accurate way of generating sounds from digital signals. Also known, simply, as sampling.

In this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced.

    Quantization and Transmission of Audio


Coding of Audio: quantization and transformation of data are collectively known as coding of the data.

a) For audio, the μ-law technique for companding audio signals is usually combined with an algorithm that exploits the temporal redundancy present in audio signals.

b) Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (differences, now) into a much smaller range.


c) The result of reducing the variance of values is that lossless compression methods produce a bitstream with shorter bit lengths for more likely values.

In general, producing quantized sampled output for audio is called PCM (Pulse Code Modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.

    Pulse Code Modulation


The basic techniques for creating digital signals from analog signals are sampling and quantization.

Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels.
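A minimal sketch of such a uniform quantizer in Python (illustrative only; it assumes input samples lie in [-v_max, v_max] and uses mid-interval reconstruction levels):

    import numpy as np

    def uniform_quantize(s, n_bits, v_max=1.0):
        # Map samples in [-v_max, v_max] into 2^N intervals, then output the
        # reconstruction level at the center of each interval.
        levels = 2 ** n_bits
        delta = 2 * v_max / levels                   # quantization interval
        idx = np.clip(np.floor(s / delta), -levels // 2, levels // 2 - 1)
        return (idx + 0.5) * delta

    s = np.array([-0.99, -0.2, 0.0, 0.31, 0.98])
    print(uniform_quantize(s, 3))   # 8 levels; error is at most delta/2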


    Fig. 6.2: Sampling and Quantization.



a) The set of interval boundaries are called decision boundaries, and the representative values are called reconstruction levels.

b) The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping.

c) The representative values that are the output values from a quantizer are a decoder mapping.

d) Finally, we may wish to compress the data, by assigning a bit stream that uses fewer bits for the most prevalent signal values (Chap. 7).


Every compression scheme has three stages:

A. The input data is transformed to a new representation that is easier or more efficient to compress.

B. We may introduce loss of information. Quantization is the main lossy step: we use a limited number of reconstruction levels, fewer than in the original signal.

C. Coding. Assign a codeword (thus forming a binary bitstream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding.


For audio signals, we first consider PCM for digitization. This leads to Lossless Predictive Coding as well as the DPCM scheme; both methods use differential coding. As well, we look at the adaptive version, ADPCM, which can provide better compression.

    PCM in Speech Compression


Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist rate would dictate a sampling rate of 20 kHz.

(a) Using uniform quantization without companding, the minimum sample size we could get away with would likely be about 12 bits. Hence for mono speech transmission the bit-rate would be 240 kbps.

(b) With companding, we can reduce the sample size down to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps.

(c) However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and the companded bit-rate thus reduces to 64 kbps.


However, there are two small wrinkles we must also address:

1. Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise. Therefore, we should remove this high-frequency content from the analog input signal. This is done using a band-limiting filter that blocks out high, as well as very low, frequencies.

Also, once we arrive at a pulse signal, such as that in Fig. 6.13(a) below, we must still perform DA conversion and then construct a final output analog signal. But, effectively, the signal we arrive at is the staircase shown in Fig. 6.13(b).


Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering.


2. A discontinuous signal contains not just frequency components due to the original signal, but also a theoretically infinite set of higher-frequency components:

(a) This result is from the theory of Fourier analysis, in signal processing.

(b) These higher frequencies are extraneous.

(c) Therefore the output of the digital-to-analog converter goes to a low-pass filter that allows only frequencies up to the original maximum to be retained.


The complete scheme for encoding and decoding telephony signals is shown as a schematic in Fig. 6.14. As a result of the low-pass filtering, the output becomes smoothed; Fig. 6.13(c) above showed this effect.

Fig. 6.14: PCM signal encoding and decoding.

    Differential Coding of Audio


Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers, so they offer the possibility of using fewer bits to store.

(a) If a time-dependent signal has some consistency over time (temporal redundancy), the difference signal, subtracting the current sample from the previous one, will have a more peaked histogram, with a maximum around zero.
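A small Python sketch of this effect (an added illustration; the slowly varying sinusoid is an assumption for the demo):

    import numpy as np

    # A slowly varying signal: consecutive samples are highly correlated.
    t = np.arange(1000)
    signal = np.round(100 * np.sin(2 * np.pi * t / 200)).astype(int)

    # Difference signal d[n] = s[n] - s[n-1]; d[0] = 0 since s[0] is prepended.
    diff = np.diff(signal, prepend=signal[0])

    print(signal.min(), signal.max())  # -100 100: the full, wide range
    print(diff.min(), diff.max())      # roughly -4 .. 4: narrow, peaked at 0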


Fundamental Concepts in Video

Digital Video

One may be excused for thinking that the capture and playback of digital video is simply a matter of capturing each frame, or image, and playing them back in a sequence at 25 frames per second.

A single image or frame with a window size or screen resolution of 640 x 480 pixels and 24-bit colour (16.8 million colours) occupies approximately 1 MB of disc space.

Roughly 25 MB of disc space are needed for every second of video, and 1.5 GB for every minute.
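The arithmetic behind these (rounded) figures, as a quick Python sketch:

    width, height, bytes_per_pixel, fps = 640, 480, 3, 25

    frame_bytes = width * height * bytes_per_pixel
    print(frame_bytes / 2 ** 20)             # ~0.88 MB per frame
    print(frame_bytes * fps / 2 ** 20)       # ~22 MB per second of video
    print(frame_bytes * fps * 60 / 2 ** 30)  # ~1.3 GB per minute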

    The three basic problems of digital video

There are three basic problems with digital video: the size of the video window, the frame rate, and the quality of the image.

Size of video window

Digital video stores a lot of information about each pixel in each frame.

It takes time to display those pixels on your computer screen.

If the window size is small, then the time taken to draw the pixels is less. If the window size is large, there may not be enough time to display the image or single frame before it's time to start the next one.

Choosing an appropriate window size may not always produce a desirable result.

Frame Rates

Too many pixels and not enough time: depending on the size of the video window chosen, you may also be able to reduce file size by reducing the number of frames per second.

    5.1 Types of Video Signals


    Component video

Component video: Higher-end video systems make use of three separate video signals for the red, green, and blue image planes. Each color channel is sent as a separate video signal.

(a) Most computer systems use Component Video, with separate signals for the R, G, and B signals.

(b) For any color separation scheme, Component Video gives the best color reproduction since there is no crosstalk between the three channels.

(c) This is not the case for S-Video or Composite Video, discussed next. Component video, however, requires more bandwidth and good synchronization of the three components.

Composite Video (1 Signal)


Composite video: color (chrominance) and intensity (luminance) signals are mixed into a single carrier wave.

a) Chrominance is a composition of two color components (I and Q, or U and V).

b) In NTSC TV, e.g., I and Q are combined into a chroma signal, and a color subcarrier is then employed to put the chroma signal at the high-frequency end of the signal shared with the luminance signal.

c) The chrominance and luminance components can be separated at the receiver end, and then the two color components can be further recovered.

d) When connecting to TVs or VCRs, Composite Video uses only one wire, and video color signals are mixed, not sent separately. The audio and sync signals are additions to this one signal.

Since color and intensity are wrapped into the same signal, some interference between the luminance and chrominance signals is inevitable.

S-Video (2 Signals)


S-Video: as a compromise, S-Video (separated video, or Super-video, e.g., in S-VHS) uses two wires, one for luminance and another for a composite chrominance signal.

As a result, there is less crosstalk between the color information and the crucial gray-scale information.

The reason for placing luminance into its own part of the signal is that black-and-white information is most crucial for visual perception.

In fact, humans are able to differentiate spatial resolution in grayscale images with a much higher acuity than for the color part of color images.

As a result, we can send less accurate color information than must be sent for intensity information: we can only see fairly large blobs of color, so it makes sense to send less color detail.

    5.2 Analog Video


An analog signal f(t) samples a time-varying image. So-called progressive scanning traces through a complete picture (a frame) row-wise for each time interval.

In TV, and in some monitors and multimedia standards as well, another system, called interlaced scanning, is used:

a) The odd-numbered lines are traced first, and then the even-numbered lines are traced. This results in odd and even fields; two fields make up one frame.

b) In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd field, and the even scan starts at a half-way point.


    Fig. 5.1: Interlaced raster scan

c) Figure 5.1 shows the scheme used. First the solid (odd) lines are traced, P to Q, then R to S, etc., ending at T; then the even field starts at U and ends at V.

d) The jump from Q to R, etc. in Figure 5.1 is called the horizontal retrace, during which the electronic beam in the CRT is blanked. The jump from T to U or V to P is called the vertical retrace.

Because of interlacing, the odd and even lines are displaced in time from each other. This is generally not noticeable except when very fast action is taking place on screen, when blurring may occur.

For example, in the video in Fig. 5.2, the moving helicopter is blurred more than is the still background.


Fig. 5.2: Interlaced scan produces two fields for each frame. (a) The video frame, (b) Field 1, (c) Field 2, (d) Difference of Fields.

Since it is sometimes necessary to change the frame rate, resize, or even produce stills from an interlaced source video, various schemes are used to de-interlace it.

a) The simplest de-interlacing method consists of discarding one field and duplicating the scan lines of the other field. The information in one field is lost completely using this simple technique.

b) Other, more complicated methods that retain information from both fields are also possible.

Analog video uses a small voltage offset from zero to indicate black, and another value, such as zero, to indicate the start of a line. For example, we could use a blacker-than-black zero signal to indicate the beginning of a line.


Fig. 5.3: Electronic signal for one NTSC scan line.

    Digital Video


The advantages of digital representation for video are many. For example:

(a) Video can be stored on digital devices or in memory, ready to be processed (noise removal, cut and paste, etc.), and integrated into various multimedia applications;

(b) Direct access is possible, which makes nonlinear video editing achievable as a simple, rather than a complex, task;

(c) Repeated recording does not degrade image quality;

(d) Ease of encryption and better tolerance to channel noise.

    Chroma Subsampling


Since humans see color with much less spatial resolution than they see black and white, it makes sense to decimate the chrominance signal.

Interesting (but not necessarily informative!) names have arisen to label the different schemes used.

To begin with, numbers are given stating how many pixel values, per four original pixels, are actually sent:

(a) The chroma subsampling scheme 4:4:4 indicates that no chroma subsampling is used: each pixel's Y, Cb and Cr values are transmitted, 4 for each of Y, Cb, Cr.

(b) The scheme 4:2:2 indicates horizontal subsampling of the Cb, Cr signals by a factor of 2. That is, of four pixels horizontally labelled as 0 to 3, all four Ys are sent, and every two Cbs and two Crs are sent, as (Cb0, Y0)(Cr0, Y1)(Cb2, Y2)(Cr2, Y3)(Cb4, Y4), and so on (or averaging is used).

(c) The scheme 4:1:1 subsamples horizontally by a factor of 4.

(d) The scheme 4:2:0 subsamples in both the horizontal and vertical dimensions by a factor of 2. Theoretically, an average chroma pixel is positioned between the rows and columns as shown in Fig. 5.6.

Scheme 4:2:0, along with other schemes, is commonly used in JPEG and MPEG (see later chapters in Part 2).


    Fig. 5.6: Chroma subsampling
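A minimal sketch of 4:2:0-style chroma averaging in Python (an added illustration; it assumes a chroma plane with even dimensions):

    import numpy as np

    def subsample_420(chroma):
        # Average each 2x2 block: one chroma sample replaces four,
        # halving both the horizontal and vertical dimensions.
        h, w = chroma.shape
        blocks = chroma.reshape(h // 2, 2, w // 2, 2)
        return blocks.mean(axis=(1, 3))

    cb = np.arange(16, dtype=float).reshape(4, 4)  # a toy 4x4 chroma plane
    print(subsample_420(cb))                       # the 2x2 averaged plane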


Lossless Compression Algorithms

    Introduction


Compression: the process of coding that will effectively reduce the total number of bits needed to represent certain information.

    Fig. 7.1: A General Data Compression Scheme.

Introduction (cont'd)


If the compression and decompression processes induce no information loss, then the compression scheme is lossless; otherwise, it is lossy.

Compression ratio:

    compression ratio = B0 / B1    (7.1)

where B0 is the number of bits before compression and B1 is the number of bits after compression.

Compression basically employs redundancy in the data:


Temporal -- in 1D data, 1D signals, audio, etc.

Spatial -- correlation between neighbouring pixels or data items.

Spectral -- correlation between colour or luminescence components. This uses the frequency domain to exploit relationships between frequency of change in data.

Psycho-visual -- exploit perceptual properties of the human visual system.

    Basics of Information Theory


The entropy of an information source with alphabet S = {s1, s2, . . . , sn} is:

    H(S) = Σ (i = 1..n) pi log2 (1/pi)    (7.2)

         = - Σ (i = 1..n) pi log2 pi      (7.3)

where pi is the probability that symbol si will occur in S.

The term log2 (1/pi) indicates the amount of information (the self-information as defined by Shannon) contained in si, which corresponds to the number of bits needed to encode si.

Distribution of Gray-Level Intensities


Fig. 7.2: Histograms for Two Gray-level Images.

Fig. 7.2(a) shows the histogram of an image with a uniform distribution of gray-level intensities, i.e., pi = 1/256 for all i. Hence, the entropy of this image is:

    log2 256 = 8    (7.4)

Fig. 7.2(b) shows the histogram of an image with two possible values (occurring with probabilities 1/3 and 2/3); its entropy is 0.92.
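Both figures are easy to verify with a few lines of Python (an added sketch of Eq. (7.3), not part of the original slides):

    import numpy as np

    def entropy(probs):
        # Eq. (7.3): H = -sum(p * log2 p); zero-probability symbols contribute 0.
        p = np.asarray(probs, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    print(entropy([1 / 256] * 256))  # 8.0   -- the uniform image, Eq. (7.4)
    print(entropy([1 / 3, 2 / 3]))   # 0.918 -- the two-valued image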

    Entropy and Code Length

As can be seen in Eq. (7.3), the entropy is a weighted sum of the terms log2 (1/pi); hence it represents the average amount of information contained per symbol in the source S.

The entropy specifies the lower bound for the average number of bits needed to code each symbol in S, i.e.,

    H(S) ≤ l̄    (7.5)

where l̄ is the average length (measured in bits) of the codewords produced by the encoder.

Simple Repetition Suppression

For example, 894 followed by 32 zeros can be encoded compactly as 894f32, where f is a flag marking the start of a run and 32 is the repetition count.

Suppression of zeros in a file (Zero Length Suppression) is useful for:

Silence in audio data, pauses in conversation

Bitmaps

Blanks in text or program source files

Backgrounds in images

    Run-Length Coding


This encoding method is frequently applied to images (or pixels in a scan line).

For example:

111122233333311112222 can be encoded as: (1,4),(2,3),(3,6),(1,4),(2,4)
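A minimal run-length encoder in Python (an added sketch reproducing the example above):

    def run_length_encode(s):
        # Collapse consecutive repeats into (symbol, run_length) pairs.
        runs = []
        for ch in s:
            if runs and runs[-1][0] == ch:
                runs[-1][1] += 1
            else:
                runs.append([ch, 1])
        return [(sym, n) for sym, n in runs]

    print(run_length_encode("111122233333311112222"))
    # [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)]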

    Variable-Length Coding (VLC)

Representing symbols with variable-length bit strings: more frequent symbols get shorter codes.

Shannon-Fano Algorithm: a top-down approach

1. Sort the symbols according to the frequency count of their occurrences.

2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.
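A compact Python sketch of this recursion (an added illustration, not from the slides; ties in the split are broken toward the first balanced point):

    from collections import Counter

    def shannon_fano(counts):
        # counts: list of (symbol, count) pairs, sorted by decreasing count.
        if len(counts) == 1:
            return {counts[0][0]: ""}
        total = sum(c for _, c in counts)
        best_diff, split = None, 1
        # Choose the split point that best balances the two halves' counts.
        for i in range(1, len(counts)):
            left = sum(c for _, c in counts[:i])
            diff = abs(2 * left - total)
            if best_diff is None or diff < best_diff:
                best_diff, split = diff, i
        codes = {}
        for prefix, part in (("0", counts[:split]), ("1", counts[split:])):
            for sym, code in shannon_fano(part).items():
                codes[sym] = prefix + code
        return codes

    freq = sorted(Counter("HELLO").items(), key=lambda kv: -kv[1])
    print(shannon_fano(freq))
    # {'L': '0', 'H': '10', 'E': '110', 'O': '111'} -- matches Table 7.1 below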

    An Example: coding of HELLO

    Frequency count of the symbols in HELLO.

    Symbol H E L O

    Count 1 1 2 1


    Fig. 7.3: Coding Tree for HELLO by Shannon-Fano.

Table 7.1: Result of Performing Shannon-Fano on HELLO


Symbol   Count   log2(1/pi)   Code   # of bits used

L        2       1.32         0      2

H        1       2.32         10     2

E        1       2.32         110    3

O        1       2.32         111    3

TOTAL # of bits: 10


    Fig. 7.4 Another coding tree for HELLO by Shannon-Fano.

Table 7.2: Another Result of Performing Shannon-Fano on HELLO (see Fig. 7.4)


Symbol   Count   log2(1/pi)   Code   # of bits used

L        2       1.32         00     4

    H