sychoacoustics decs mpeg-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) •...

25
Fundamentals of Multimedia, Chapter 14 Chapter 14 MPEG Audio Compression 14.1 Psychoacoustics 14.2 MPEG Audio 14.3 Other Commercial Audio Codecs 14.4 The Future: MPEG-7 and MPEG-21 14.5 Further Exploration 1 Li & Drew c Prentice Hall 2003

Upload: others

Post on 23-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Chapter 14MPEG Audio Compression

14.1 Psychoacoustics

14.2 MPEG Audio

14.3 Other Commercial Audio Codecs

14.4 The Future: MPEG-7 and MPEG-21

14.5 Further Exploration

1 Li & Drew c!Prentice Hall 2003

Page 2: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

14.1 Psychoacoustics

• The range of human hearing is about 20 Hz to about 20 kHz

• The frequency range of the voice is typically only from about

500 Hz to 4 kHz

• The dynamic range, the ratio of the maximum sound ampli-

tude to the quietest sound that humans can hear, is on theorder of about 120 dB

2 Li & Drew c!Prentice Hall 2003

Page 3: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Frequency Masking

• Lossy audio data compression methods, such as MPEG/Audioencoding, remove some sounds which are masked anyway

• The general situation in regard to masking is as follows:

1. A lower tone can e!ectively mask (make us unable to

hear) a higher tone

2. The reverse is not true – a higher tone does not mask alower tone well

3. The greater the power in the masking tone, the wider isits influence – the broader the range of frequencies it can

mask.

4. As a consequence, if two tones are widely separated infrequency then little masking occurs

5 Li & Drew c!Prentice Hall 2003

Page 4: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Threshold of Hearing

• A plot of the threshold of human hearing for a pure tone

102 103 104−10

0

10

20

30

40

50

60

Hz

dB

Fig. 14.2: Threshold of human hearing, for pure tones

6 Li & Drew c!Prentice Hall 2003

Page 5: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Threshold of Hearing (cont’d)

• The threshold of hearing curve: if a sound is above the dB

level shown then the sound is audible

• Turning up a tone so that it equals or surpasses the curve

means that we can then distinguish the sound

• An approximate formula exists for this curve:

Threshold(f) = 3.64(f/1000)!0.8 ! 6.5 e!0.6(f/1000!3.3)2

+ 10!3(f/1000)4

(14.1)

– The threshold units are dB; the frequency for the origin(0,0) in formula (14.1) is 2,000 Hz: Threshold(f) = 0 at

f =2 kHz

7 Li & Drew c!Prentice Hall 2003

Page 6: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Frequency Masking Curves

• Frequency masking is studied by playing a particular puretone, say 1 kHz again, at a loud volume, and determining how

this tone a!ects our ability to hear tones nearby in frequency

– one would generate a 1 kHz masking tone, at a fixedsound level of 60 dB, and then raise the level of a nearby

tone, e.g., 1.1 kHz, until it is just audible

• The threshold in Fig. 14.3 plots the audible level for a singlemasking tone (1 kHz)

• Fig. 14.4 shows how the plot changes if other masking tones

are used

8 Li & Drew c!Prentice Hall 2003

Page 7: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15−10

0

10

20

30

40

50

60

70

Frequency (kHz)

dB

Audible tone

Inaudible tone

Fig. 14.3: E!ect on threshold for 1 kHz masking tone

9 Li & Drew c!Prentice Hall 2003

Page 8: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15−10

0

10

20

30

40

50

60

70

Frequency (kHz)

dB1 4 8

Fig. 14.4: E!ect of masking tone at three di!erent frequencies

10 Li & Drew c!Prentice Hall 2003

Page 9: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Temporal Masking

• Phenomenon: any loud tone will cause the hearing receptorsin the inner ear to become saturated and require time to

recover

• The following figures show the results of Masking experi-ments:

16 Li & Drew c!Prentice Hall 2003

Page 10: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

100Delay time (ms)

dB

Test tone

Mask tone

60

40

20

1000100−5

Fig. 14.6: The louder is the test tone, the shorter it takes for

our hearing to get over hearing the masking.

17 Li & Drew c!Prentice Hall 2003

Page 11: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

0

0.01

0.02

0.03 0

4

6

8−10

0

10

20

30

40

50

60

FrequencyTime

Leve

l (dB

)

Tones below surfaceare inaudible

Fig. 14.7: E!ect of temporal and frequency maskings dependingon both time and closeness in frequency.

18 Li & Drew c!Prentice Hall 2003

Page 12: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

14.2 MPEG Audio

• MPEG audio compression takes advantage of psychoa-

coustic models, constructing a large multi-dimensional lookuptable to transmit masked frequency components using fewer

bits

• MPEG Audio Overview

1. Applies a filter bank to the input to break it into its fre-quency components

2. In parallel, a psychoacoustic model is applied to the data

for bit allocation block

3. The number of bits allocated are used to quantize theinfo from the filter bank – providing the compression

20 Li & Drew c!Prentice Hall 2003

Page 13: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

MPEG Layers

• MPEG audio o!ers three compatible layers :

– Each succeeding layer able to understand the lower layers

– Each succeeding layer o!ering more complexity in the psy-choacoustic model and better compression for a given

level of audio quality

– each succeeding layer, with increased compression e!ec-

tiveness, accompanied by extra delay

• The objective of MPEG layers: a good tradeo! betweenquality and bit-rate

21 Li & Drew c!Prentice Hall 2003

Page 14: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

MPEG Layers (cont’d)

• Layer 1 quality can be quite good provided a comparativelyhigh bit-rate is available

– Digital Audio Tape typically uses Layer 1 at around 192 kbps

• Layer 2 has more complexity; was proposed for use in DigitalAudio Broadcasting

• Layer 3 (MP3) is most complex, and was originally aimed ataudio transmission over ISDN lines

• Most of the complexity increase is at the encoder, not thedecoder – accounting for the popularity of MP3 players

22 Li & Drew c!Prentice Hall 2003

Page 15: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

MPEG Audio Strategy

• MPEG approach to compression relies on:

– Quantization

– Human auditory system is not accurate within the widthof a critical band (perceived loudness and audibility of a

frequency)

• MPEG encoder employs a bank of filters to:

– Analyze the frequency (“spectral”) components of the au-

dio signal by calculating a frequency transform of a win-dow of signal values

– Decompose the signal into subbands by using a bank of

filters (Layer 1 & 2: “quadrature-mirror”; Layer 3: addsa DCT; psychoacoustic model: Fourier transform)

23 Li & Drew c!Prentice Hall 2003

Page 16: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

MPEG Audio Strategy (cont’d)

• Frequency masking: by using a psychoacoustic model toestimate the just noticeable noise level:

– Encoder balances the masking behavior and the availablenumber of bits by discarding inaudible frequencies

– Scaling quantization according to the sound level that is

left over, above masking levels

• May take into account the actual width of the critical bands:

– For practical purposes, audible frequencies are divided into25 main critical bands (Table 14.1)

– To keep simplicity, adopts a uniform width for all fre-quency analysis filters, using 32 overlapping subbands

24 Li & Drew c!Prentice Hall 2003

Page 17: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

MPEG Audio Compression Algorithm

What to dropAudio(PCM)input

Psychoacousticmodeling

Bit allocation,quantizing and

coding

Bitstreamformatting

Time tofrequency

transformation

Encodedbitstream

Frequencyto time

transformation

Bitstreamunpacking

Frequencysample

reconstruction

DecodedPCM audio

Encodedbitstream

Fig. 14.9: Basic MPEG Audio encoder and decoder.

25 Li & Drew c!Prentice Hall 2003

Page 18: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Basic Algorithm (cont’d)

• The algorithm proceeds by dividing the input into 32 fre-quency subbands, via a filter bank

– A linear operation taking 32 PCM samples, sampled intime; output is 32 frequency coe"cients

• In the Layer 1 encoder, the sets of 32 PCM values are firstassembled into a set of 12 groups of 32s

– an inherent time lag in the coder, equal to the time toaccumulate 384 (i.e., 12"32) samples

• Fig.14.11 shows how samples are organized

– A Layer 2 or Layer 3, frame actually accumulates morethan 12 samples for each subband: a frame includes 1,152samples

26 Li & Drew c!Prentice Hall 2003

Page 19: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

12samples

Each subband filter produces 1 sample outfor every 32 samples in

Audio (PCM)samples In

Subband filter 0

Subband filter 1

Subband filter 2

Subband filter 31Layer 1Frame

Layer 2 and Layer 3Frame

12samples

12samples

12samples

12samples

12samples

12samples

12samples

12samples

12samples

12samples

12samples

. . . . .

.

. . .

. . .

Fig. 14.11: MPEG Audio Frame Sizes

27 Li & Drew c!Prentice Hall 2003

Page 20: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

• Mask calculations are performed in parallel with subband fil-

tering, as in Fig. 4.13:

PCMaudio signal

Linearquantizer

Bitstreamformatting

Filter bank:32 subbands

1,024-pointFFT

Psychoacousticmodel

Coded audiosignal

Side-informationcoding

Fig. 14.13: MPEG-1 Audio Layers 1 and 2.

30 Li & Drew c!Prentice Hall 2003

Page 21: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Layer 2 of MPEG-1 Audio

• Main di!erence:

– Three groups of 12 samples are encoded in each frame andtemporal masking is brought into play, as well as frequencymasking

– Bit allocation is applied to window lengths of 36 samplesinstead of 12

– The resolution of the quantizers is increased from 15 bitsto 16

• Advantage:

– a single scaling factor can be used for all three groups

31 Li & Drew c!Prentice Hall 2003

Page 22: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

Layer 3 of MPEG-1 Audio

• Main di!erence:

– Employs a similar filter bank to that used in Layer 2,except using a set of filters with non-equal frequencies

– Takes into account stereo redundancy

– Uses Modified Discrete Cosine Transform (MDCT) — ad-dresses problems that the DCT has at boundaries of thewindow used by overlapping frames by 50%:

F (u) = 2N!1!

i=0

f(i) cos

"

2!

N

#

i +N/2 + 1

2

$

(u + 1/2)

%

, u = 0, .., N/2! 1

(14.7)

32 Li & Drew c!Prentice Hall 2003

Page 23: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

PCMaudio signal

Filter bank:32 subbands

1,024-pointFFT

Psychoacousticmodel

M-DCT Nonuniformquantization

Bitstreamformatting

Huffmancoding

Side-informationcoding

Coded audiosignal

Fig 14.14: MPEG-Audio Layer 3 Coding.

33 Li & Drew c!Prentice Hall 2003

Page 24: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

• Table 14.2 shows various achievable MP3 compression ratios:

Table 14.2: MP3 compression performance

Sound Quality Bandwidth Mode CompressionRatio

Telephony 3.0 kHz Mono 96:1Better than 4.5 kHz Mono 48:1Short-waveBetter than 7.5 kHz Mono 24:1AM radioSimilar to 11 kHz Stereo 26 - 24:1FM radioNear-CD 15 kHz Stereo 16:1

CD > 15 kHz Stereo 14 - 12:1

34 Li & Drew c!Prentice Hall 2003

Page 25: sychoacoustics decs MPEG-21 rationelgammal/classes/cs334/slide14_short.pdf · 4 ring (cont’d) • dB audible • curve sound •: (f 3. 64(f/1000)−0.8 − 6. 5 e −0.6(f/1000−3.3)2

Fundamentals of Multimedia, Chapter 14

14.5 Further Exploration!# Link to Further Exploration for Chapter 14.

In Chapter 14 the “Further Exploration” section of the text web-site, a number of useful links are given:

• Excellent collections of MPEG Audio and MP3 links.

• The “o"cial” MPEG Audio FAQ

• MPEG-4 Audio implements “Tools for Large Step Scala-bility”, An excellent reference is given by the Fraunhofer-Gesellschaft research institute, “MPEG 4 Audio Scalable Pro-

file”.

42 Li & Drew c!Prentice Hall 2003