


© 2007-2017 V. Välimäki & A. Haghparast 1

Audio Coding

Vesa Välimäki

Department of Signal Processing and Acoustics

Aalto University

March 31, 2017

Sound check

Course Schedule in 2017

0. General issues (Vesa & Fabian) 13.1.2017
1. History and future of audio DSP (Vesa) 20.1.2017
2. Digital filters in audio (Vesa) 27.1.2017
3. Audio filter design (Vesa) 3.2.2017
4. Analysis of audio signals (Vesa) 10.2.2017
No lecture (Evaluation week for Period III) 17.2.2017
5. Audio effects processing (Fabian) 24.2.2017
6. Synthesis of audio signals (Fabian) 3.3.2017
7. 3-D sound (Prof. Ville Pulkki) 10.3.2017
8. Physics-based sound synthesis (Vesa) 17.3.2017
9. Sampling rate conversion (Vesa) 24.3.2017
10. Audio coding (Vesa) 31.3.2017

©2003-2017 Vesa Välimäki


Outline

• Introduction
• Lossless Audio Coding
• Perceptual (Lossy) Audio Coding
– Subband coding, time-to-frequency mapping, psychoacoustic models, parametric coding
• MPEG standards and some new codecs
– MP3, AAC, USAC, Opus
• Applications
• Demo on flutter echo

Part of this lecture material was produced by Ms. Azadeh Haghparast (TKK Dept. of Signal Processing and Acoustics, 2007)


Flutter Echo

DEMO

Joni


Bit-rate of Audio Signals

• Bit-rate of one audio channel without compression (at 44.1 kHz):
16 bit × 44100 samples/s ≈ 706 kbit/s
• Stereo signal (44.1 kHz):
2 × 16 bit × 44100 samples/s ≈ 1.41 Mbit/s
• Additionally, bits are needed for error correction and synchronization
– On a CD, 33 extra bits are needed for each 16 bits of audio data, so the total bit-rate is about 4.3 Mbit/s
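The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the helper name `pcm_bit_rate` is illustrative, and the CD overhead is modelled simply as 33 extra bits per 16 audio bits, as stated above.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels=1):
    """Raw PCM bit-rate in bit/s (illustrative helper, not from the slides)."""
    return bits_per_sample * sample_rate_hz * channels

mono = pcm_bit_rate(16, 44100)                 # one channel
stereo = pcm_bit_rate(16, 44100, channels=2)   # two channels
cd_total = stereo * (16 + 33) / 16             # audio bits plus 33 extra bits per 16

print(f"mono:     {mono / 1e3:.1f} kbit/s")
print(f"stereo:   {stereo / 1e6:.2f} Mbit/s")
print(f"CD total: {cd_total / 1e6:.2f} Mbit/s")
```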

Bit-Rate for Various Digital Audio Schemes

Application                      Format                 Sample rate  Audio bit-rate  Overhead bit-rate  Total bit-rate
Compact Disc (CD)                PCM                    44.1 kHz     1.41 Mb/s       2.91 Mb/s          4.32 Mb/s
Digital Audio Tape (DAT)         PCM                    44.1 kHz     1.41 Mb/s       1.67 Mb/s          3.08 Mb/s
Digital Compact Cassette (DCC)   MPEG-1 Layer I         48 kHz       384 kb/s        384 kb/s           768 kb/s
MiniDisc (MD)                    ATRAC                  44.1 kHz     292 kb/s        718 kb/s           1.01 Mb/s
Digital Audio Broadcast (DAB)    MPEG-1 Layer II, III   48 kHz       256 kb/s        256 kb/s           512 kb/s


Applications of Audio Coding

• Storage
– Music archives
– Movie soundtracks
– Music for electronic games
• Communication
– Mobile multimedia
– Internet streaming
• Broadcasting
– Digital radio and TV
• Wireless audio
– Hands-free headphones and headsets
– Wireless speakers

Classification of Audio Coding Techniques

• Lossless Audio Coding
– Reduces the size of an audio signal by redundancy reduction, e.g., by exploiting the sample value distribution
– The original signal values can be recovered exactly by decoding
– Shorten, FLAC, Monkey’s Audio, MPEG-4 ALS, Windows Media Audio 9 Lossless, RealAudio Lossless, APT-X…
• Lossy Audio Coding
– Reduces the size of an audio signal by irrelevancy reduction
– Exploits the limitations of human hearing, e.g., auditory masking
– MPEG Audio (Layer 1, 2, 3), Dolby Digital, Ogg Vorbis, MPEG-AAC, HILN (MPEG-4 Parametric Audio Coding), WMA (Windows Media Audio), …


Applications of Lossless Audio Coding

• Archiving of original recordings

• Studio operations, such as mixing

• Digital music distribution over the Internet

• Portable music players/recorders

• Multi-channel audio (e.g., DVD-Audio)

• Bluetooth audio (headsets, speakers)

Principles of Lossless Audio Coding

• A lossless audio coder comprises three main blocks:
– Framing: divides the audio signal into frames, e.g., 100 ms
– Decorrelation: removes redundancy (spectral whitening)
– Entropy encoding: statistically efficient code book
• The histogram of audio signals is often close to a Laplace distribution: there are more small sample values than large ones
– Short code for common sample values
– Long code for rare sample values


Principles of Lossless Audio Coding

• Two approaches to decorrelate the audio signal

1. Linear predictive model
– Lossless representation: predictor coefficients + error signal

2. Linear transform model
– Lossless representation: transform coefficients + error signal

Linear Prediction

• Predictor coefficients are determined
– Usually using the autocorrelation or covariance method
• Each sample is estimated from its previous samples using the predictor coefficients

$$\hat{x}(n) = Q\left\{ \sum_{k=1}^{M} a_k \, x(n-k) \right\}$$

where M is the predictor order, a_k are the predictor coefficients, and Q{·} denotes quantization of the prediction.


Decorrelation With a Polynomial Predictor

• Try several simple polynomial predictors and select the best one
– The best predictor is the one that produces the output signal with the smallest amplitude
– The spectrum is whitened
– Integer coefficients avoid rounding errors
– For example, the following polynomial predictors:
• xp0(n) = 0
• xp1(n) = x(n – 1)
• xp2(n) = 2x(n – 1) – x(n – 2)
• xp3(n) = 3x(n – 1) – 3x(n – 2) + x(n – 3)

(Ref. Hans and Schafer 2001)
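The selection among the four polynomial predictors can be sketched in Python as follows. The function names are illustrative, samples before n = 0 are assumed to be zero, and "smallest amplitude" is interpreted here as the smallest sum of absolute residual values, which is an assumption rather than something the slide specifies.

```python
import numpy as np

# Integer coefficients applied to x(n-1), x(n-2), x(n-3) for orders 0..3.
PREDICTORS = {
    0: [],
    1: [1],
    2: [2, -1],
    3: [3, -3, 1],
}

def residual(x, order):
    """Prediction error e(n) = x(n) - xp(n); samples before n = 0 are taken as zero."""
    x = np.asarray(x, dtype=np.int64)
    pred = np.zeros_like(x)
    for k, a in enumerate(PREDICTORS[order], start=1):
        pred[k:] += a * x[:-k]
    return x - pred

def best_order(x):
    """Return the order with the smallest sum of absolute residuals, plus all costs."""
    costs = {p: int(np.abs(residual(x, p)).sum()) for p in PREDICTORS}
    return min(costs, key=costs.get), costs

order, costs = best_order(np.arange(100))   # a linear ramp as a toy input
print(order, costs)
```

Because all coefficients are integers, the residual is an exact integer signal, so the decoder can rebuild x(n) sample by sample without rounding errors.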


Principles of Lossless Audio Coding

• Decorrelation by Linear Transform Model


LTAC

• LTAC: Lossless Transform Audio Coding

• Fixed or variable frame length

• Orthonormal Discrete Cosine Transform (DCT)

• Groups of 32 adjoining transform coefficients

• Rice coding for transform coefficients

• Arithmetic coding for the error signal


MPEG-4 ALS


MPEG-4 ALS

• Based on Linear Prediction
– Optimal predictor coefficients are calculated with an iterative procedure
• Optimal order of predictor → optimal predictor coefficients → the smallest bit-rate
• Coefficients are converted to arcsine values
• Linear 8-bit quantization of the arcsine coefficients
• Rice entropy coding
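To illustrate why Rice coding suits a Laplace-like residual (short codes for small values, long codes for rare large ones), here is a minimal Rice coder sketch. It is not the MPEG-4 ALS bitstream format: the zigzag mapping of signed values, the use of text strings for bits, and all function names are assumptions made for illustration.

```python
def zigzag(e):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * e if e >= 0 else -2 * e - 1

def rice_encode(e, k):
    """Rice code with parameter k >= 0: unary quotient, a '0' stop bit, k-bit remainder."""
    u = zigzag(e)
    q, r = u >> k, u & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k > 0 else ""
    return "1" * q + "0" + remainder

def rice_decode(bits, k):
    """Inverse of rice_encode for a single codeword."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    u = (q << k) | r
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

for e in (0, -3, 9):
    print(e, rice_encode(e, 2))
```

With k = 2, the value 0 costs three bits while 9 costs seven, so a good choice of k per frame keeps the average code length close to the entropy of the residual.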

Comparison of Lossless Audio Codecs

≈ 50%

Ref: Coalson, 2005. http://flac.sourceforge.net/comparison.html


Lossy Audio Coding

• High compression ratios can be achieved when the signal is allowed to change
– The goal is minimal disturbance for human listeners
• Technology for end users
– Listen to the coded material as is (no further processing, EQ, etc.)
– Unsuitable for high-quality recordings or archiving
• Subband audio coding
– MP3, Dolby AC-3, Vorbis, WMA (Windows Media Audio)
• Parametric audio coding
– HILN (MPEG-4)

Applications of Lossy Audio Coding

• Portable music players and mobile phones
– Also MiniDisc players

• Internet audio

• Digital TV

• Digital radio

• Movie soundtracks


Subband Audio Coding

• Perceptual audio coding

• Frequency-domain representation of audio signal

• Psychoacoustic model

• Threshold of hearing

• Shape the quantization noise so that it stays below the threshold of hearing


Subband Audio Coding

• General block diagram of subband coder


Time to Frequency Mapping

• Time to frequency mapping techniques

– The simplest technique: Fourier transform (FFT)

– Filter bank technique

– Pseudo-Quadrature Mirror Filter bank (PQMF)

– Modified Discrete Cosine Transform (MDCT)

Filter Bank

• N-channel filter bank: N parallel bandpass filters

• Uniform or non-uniform bandwidth

• Magnitude response of a uniform bandwidth N-channel filter bank


Filter Bank

• Analysis-synthesis filter bank → perfect reconstruction filter bank

Filter Bank

• Down-sampling
– Preserves data rate
– Problem: limiting the spectral bandwidth → aliasing (folding)
• Up-sampling
– Restores data rate
– Problem: expanding the spectral bandwidth → imaging distortion


Pseudo-Quadrature Mirror Filter Bank

• Pseudo-Quadrature Mirror Filter (PQMF) Bank

• Design a narrow lowpass filter → prototype filter
• Other filters are obtained by cosine modulation of the prototype filter
• MPEG-1 and MPEG-2
– 32 channels
– Prototype filter of order 512

2-Channel PQMF

• Design of a 2-channel analysis-synthesis filter bank
• Challenge:
– Define the filters $H_0(z)$, $H_1(z)$, $G_0(z)$, $G_1(z)$
– For Perfect Reconstruction:

$$G_0(z) = H_1(-z), \qquad G_1(z) = -H_0(-z)$$

– and also

$$H_1(z) = H_0(-z)$$
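A small numerical check of the 2-channel idea, as a sketch only. It assumes the 2-tap Haar filter as $H_0(z)$ (not from the slides) together with the common alias-cancelling choice $H_1(z) = H_0(-z)$, $G_0(z) = H_1(-z)$, $G_1(z) = -H_0(-z)$. The function name `qmf_roundtrip` is illustrative, and down-sampling by 2 followed by up-sampling by 2 is modelled by zeroing every second sample.

```python
import numpy as np

def qmf_roundtrip(x):
    """Analysis, decimation by 2, expansion by 2, and synthesis with Haar filters."""
    s = 1.0 / np.sqrt(2.0)
    h0 = np.array([s, s])     # lowpass analysis filter H0(z) (assumed Haar)
    h1 = np.array([s, -s])    # highpass analysis filter H1(z) = H0(-z)
    g0 = h0.copy()            # synthesis filter G0(z) = H1(-z)
    g1 = -h1                  # synthesis filter G1(z) = -H0(-z)
    y = np.zeros(len(x) + 2)
    for h, g in ((h0, g0), (h1, g1)):
        v = np.convolve(x, h)  # subband signal at the full rate
        v[1::2] = 0.0          # down-sample by 2, then up-sample by 2
        y += np.convolve(v, g)
    return y

x = np.arange(1.0, 9.0)
y = qmf_roundtrip(x)
print(np.allclose(y[1:1 + len(x)], x))   # output equals the input delayed by one sample
```

Each branch on its own contains aliasing after the rate change; the aliasing components cancel only when the two branches are summed.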


Modified DCT Filter Bank

• Modified Discrete Cosine Transform Filter Bank
• Also called Time-Domain Aliasing Cancellation (TDAC)
• Special case of PQMF
– Length of the prototype filter is twice that of the PQMF
– 50% overlap with the previous frame
• Prototype filter → sine function
• Choice of window length
– Long window length is good for stationary signals
– Short window length is suitable for transients
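The time-domain aliasing cancellation can be demonstrated with a direct (matrix-based, not fast) MDCT. This is a sketch under stated assumptions: frames of 2N samples with 50% overlap, a sine window used for both analysis and synthesis, an inverse scaled by 2/N, and N zeros padded at both ends; the function names are illustrative.

```python
import numpy as np

def mdct_basis(N):
    """N x 2N matrix of MDCT cosines: cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def tdac_roundtrip(x, N):
    """MDCT analysis and overlap-add synthesis; len(x) must be a multiple of N."""
    if len(x) % N != 0:
        raise ValueError("len(x) must be a multiple of N")
    C = mdct_basis(N)
    w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))    # sine window
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    y = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):        # hop size N = 50% overlap
        frame = padded[start:start + 2 * N]
        coeffs = C @ (w * frame)                               # N coefficients per frame
        y[start:start + 2 * N] += w * (2.0 / N) * (C.T @ coeffs)
    return y[N:N + len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
print(np.allclose(tdac_roundtrip(x, 8), x))
```

Each frame produces only N coefficients from 2N samples, so a single inverse transform is aliased; the aliasing cancels when overlapping frames are added because the sine window satisfies w²(n) + w²(n + N) = 1.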


Subband Coding

• General block diagram of subband coder


Psychoacoustics

• Absolute threshold of hearing

• Masking phenomenon
– Simultaneous masking, also called frequency masking
– Non-simultaneous masking, also called temporal masking

• Critical bandwidth

• Spread of masking

Sound Pressure Level

• Quantity for measuring the sound pressure

$$\mathrm{SPL}\,(\mathrm{dB}) = 10 \log_{10}\left(\frac{P}{P_0}\right)^{2}$$

• P: pressure of sound
• P0: standard reference pressure = 20 μPa
– Sound pressure level at the hearing threshold, when f = 2 kHz
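As a quick numerical illustration of the sound pressure level definition, SPL = 10 log10((P/P0)²) with P0 = 20 μPa, here is a minimal sketch; the function name `spl_db` is illustrative.

```python
import math

P0 = 20e-6  # reference pressure in pascals (20 micropascals)

def spl_db(p_pa):
    """Sound pressure level in dB for a sound pressure given in pascals."""
    return 10.0 * math.log10((p_pa / P0) ** 2)

print(round(spl_db(20e-6), 2))   # the reference pressure itself
print(round(spl_db(1.0), 2))     # a pressure of 1 Pa
```

Doubling the pressure raises the level by about 6 dB, since the pressure ratio enters squared.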


Limits of Human Hearing

• Frequency range: 20 Hz … 20 kHz

• Most sensitive range: 1 kHz … 5 kHz

• Dynamic range: 20 dB … 95 dB (for safe listening)

• Threshold of pain: About 120 dB

Absolute Threshold of Hearing

• Also called Threshold in Quiet

• Minimum level of a pure tone that an average listener can perceive in noiseless conditions
– About 0 dB at mid frequencies

– Frequency components below this curve are inaudible
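The slides show the threshold only as a curve. A widely used closed-form approximation, attributed to Terhardt and quoted in the perceptual coding literature such as Painter and Spanias, can be sketched as below. The formula is brought in here as an assumption and is not taken from the slides; the function name is illustrative.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold in quiet in dB SPL (Terhardt-type formula, an assumption here)."""
    f = f_hz / 1000.0  # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

for f_hz in (100, 1000, 3300, 15000):
    print(f_hz, round(threshold_in_quiet_db(f_hz), 1))
```

The approximation is lowest around 3 to 4 kHz and rises steeply toward both ends of the audible range, which matches the most sensitive range listed earlier.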


Threshold of Hearing

Masking Phenomenon

• The most important psychoacoustic concept for transparent audio coding
• Frequency masking
– Concurrent masker and maskee sounds
• Temporal masking
– Extends beyond the time during which the masker occurs


Frequency Masking

• The threshold of audibility of one sound is raised in the presence of sound energy at neighboring frequencies

Frequency Masking Examples

• Examples of the raising of the hearing threshold

Sine

White noise

Narrow-band noise


Masking Curves

• Four masking curves for measurements

– Narrow-band noise masking narrow-band noise (NMN)

– Narrow-band noise masking tone (NMT)

– Tone masking narrow-band noise (TMN)

– Tone masking tone (TMT)


Narrow-Band Noise Masking Tones


Narrow-Band Noise Masking Tones


Tone Masking Tone


Temporal Masking

[Figure annotations: a few ms (pre-masking), ~ 100 ms (post-masking)]

• Difficult to employ in audio coding

Critical Bandwidth

• The frequency range around a masker frequency in which the masking curve remains flat → critical bandwidth
• Each critical bandwidth corresponds to a constant distance on the basilar membrane
• Unit of critical band → Bark

$$z/\mathrm{Bark} = 13 \arctan\left(0.76\, f/\mathrm{kHz}\right) + 3.5 \arctan\left[\left(\frac{f/\mathrm{kHz}}{7.5}\right)^{2}\right]$$
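The Bark formula, z/Bark = 13 arctan(0.76 f/kHz) + 3.5 arctan((f/7.5 kHz)²), is easy to evaluate numerically. The following sketch uses an illustrative function name and takes the frequency in hertz.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark for a frequency in Hz."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

for f_hz in (100, 500, 1000, 5000, 20000):
    print(f_hz, round(hz_to_bark(f_hz), 2))
```

The mapping is nearly linear at low frequencies and compresses the high frequencies, so the whole audible range spans roughly 25 critical bands.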


Critical Bands on a Linear Frequency Scale

[Figure: critical bands; x-axis: Frequency (kHz), linear scale, 0 to 20 kHz; y-axis: 0 to 1]

Critical Bands on a Log Frequency Scale

[Figure: critical bands; x-axis: Frequency (kHz), logarithmic scale, 0.1 to 10 kHz; y-axis: 0 to 1]

• Constant bandwidth up to 500 Hz; then 1/3 octave


Spread of Masking

• The effect of frequency masking is not limited to within one critical bandwidth
• Various analytical models of the masking spread function
– Triangle function

– Schroeder function

– …

Perceptual Entropy & Bit Allocation

• Perceptual entropy → lower bound on the number of bits needed for transparent quality
• Bit allocation algorithms
– Allocate bits according to the signal-to-mask ratio (SMR)
– Goal: the quantization noise remains below the masking threshold, i.e., the noise-to-mask ratio (NMR) stays negative
• It is well known that SNR (signal-to-noise ratio) is not a meaningful measure in perceptual coding
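One simple way to turn signal-to-mask ratios into a bit allocation is a greedy loop, sketched below. It is not the allocation procedure of any particular standard: it assumes that each added bit improves the SNR of a band by about 6.02 dB, so that NMR = SMR minus SNR, and the function and variable names are illustrative.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one bit to the band with the highest NMR."""
    bits = [0] * len(smr_db)
    nmr = list(smr_db)                  # with zero bits, NMR equals SMR
    for _ in range(total_bits):
        worst = max(range(len(nmr)), key=lambda b: nmr[b])
        if nmr[worst] < 0:              # all noise is already below the mask
            break
        bits[worst] += 1
        nmr[worst] -= db_per_bit
    return bits, nmr

bits, nmr = allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=10)
print(bits, [round(v, 2) for v in nmr])
```

In this toy example the band whose signal is already below the mask (SMR of -3 dB) receives no bits at all, and the loop stops early once every band has a negative NMR.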


Parametric Audio Coding

• Based on parametric modeling of audio signals
– Cf. parametric sound synthesis
• Very low bit-rate applications
– Mobile communications, Internet streaming
– 40 kbit/s and lower
• HILN (Harmonic and Individual Lines plus Noise)
– Sinusoids plus noise modeling

– Parametric audio coding within the MPEG-4 standard

– Minimum bit rate: 4 kbit/s


HILN Encoder


Parametric Model of Audio Source in HILN

• Decomposition of the audio signal into its components
– Individual sinusoid: frequency and amplitude
– Harmonic tone: fundamental frequency, amplitude, spectral envelope of partials
– Noise: amplitude and spectral envelope
– Transients: optional parameters, such as temporal envelope

HILN Perception Model

• Different from the perception model used in subband coding
• Effect of parameter deviation on signal quality
– Bit allocation for quantization
• Influence of different parameters on the quality of the decoded signal
– Choice of model parameters for transmission


HILN Decoder

• Harmonics + sines + noise synthesis

MPEG Standards 1988-

• The MPEG working group (Moving Picture Experts Group) focuses on international standardisation of video and audio technology
– Official name: ISO/IEC JTC1 SC29 WG11
• The most well-known standards are MPEG-1 and MPEG-2
– MPEG-1 Layer 3 = MP3
– MPEG-1 Layer 2 is used in the European digital radio standard (DAB)
• Also MPEG-4, MPEG-7, and MPEG-21
– New multimedia standards, not only coding
• MPEG-D (2007): MPEG audio technologies
– Includes MPEG Surround, Spatial Audio Object Coding (SAOC), and Unified Speech and Audio Coding (USAC)


MP3

• MPEG-1 Layer 3 is MP3
– Old technology; standard from 1991
• The most common codec for “almost” CD-quality audio


MP3

Ref. K. Brandenburg, 1999


Examples of MP3 Audio at Various Bit-rates

• One of the standard MPEG test signals: Suzanne Vega, “Tom’s Diner” (1987), duration 2 min 11 s
• Signal-to-noise ratio does not describe the sound quality of lossy audio coding well, since the errors are not only noise

Bit-rate      Compression  Bit/sample  File size  Quality
1411 kbit/s   1:1          16          22565 kB   Original CD
128 kbit/s    1:11         1.5         2048 kB    MP3, CD quality
96 kbit/s     1:15         1.1         1536 kB    Almost CD quality
64 kbit/s     1:22         0.73        1026 kB    FM radio quality
32 kbit/s     1:44         0.36        514 kB     Very low
8 kbit/s      1:176        0.09        130 kB     Not recommended
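The compression ratio, bits per sample, and file size for each bit-rate follow from the stereo CD rate (2 × 16 bit × 44100 samples/s) and the 2 min 11 s duration. The sketch below recomputes them; it assumes that "kB" means 1024 bytes and that bits per sample are counted per channel sample, which reproduces the listed values only approximately. The function name is illustrative.

```python
CD_RATE = 2 * 16 * 44100          # stereo CD bit-rate in bit/s
DURATION_S = 2 * 60 + 11          # duration of the test signal in seconds

def describe(bit_rate_kbps):
    """Compression ratio, bits per (channel) sample, and file size for a bit-rate."""
    rate = bit_rate_kbps * 1000
    return {
        "compression": CD_RATE / rate,
        "bits_per_sample": rate / (2 * 44100),
        "size_kib": rate * DURATION_S / 8 / 1024,
    }

for kbps in (128, 96, 64, 32, 8):
    d = describe(kbps)
    print(kbps, f"1:{d['compression']:.0f}", f"{d['bits_per_sample']:.2f}", f"{d['size_kib']:.0f} kB")
```

Every bit-rate from 64 kbit/s downward corresponds to less than one bit per sample.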

Example Test Image in Image Processing

• Lena
– Very easy to see the consequences of processing
– Goal: everybody should use the same test data set


Typical Problems in MP3 Encoded Signals

[Figure: two spectrograms (time 0 to 1 s, frequency 0 to 20 kHz): "Original" and "MP3 128kbps pre-echo"]

• Pre-echo, limited frequency range, additional noise
– Castanets are a well-known problematic signal


What's wrong with MP3?

DEMO

Lauri


MPEG-2 AAC

• AAC = Advanced Audio Coding
– One of the most popular audio codecs (phones, YouTube, Apple, PlayStation…)
• Second-generation MPEG audio coding standard from 1997
– In half-rate mode, the bit-rate is 50% of that of MPEG-1 Layer 1 at the same quality
– MP3 quality at 30% lower bit-rate (96 kbit/s)
• Multi-channel sound
– Mono/stereo/multi-channel
– 1-48 channels and 0-16 effects channels (e.g. 5.1 movie sound)
• Many sample rates between 8 and 96 kHz
– 8 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2 kHz, 96 kHz
• Various bit-rates and variable bit-rate
– For stereo signals: 16 ... 192 kbit/s

New Features in MPEG-2 AAC

• Improved solutions over MP3
– Better frequency resolution: max 1024 bins (in MP3 only 576)
– Multi-symbol Huffman coding, in which four frequency points are combined
• Cleaner transition of frames
– Modified Discrete Cosine Transform (MDCT) filter bank in which the impulse response length is only 5.3 ms (in MP3 18.6 ms)
– Reduces pre-echo
• Temporal noise shaping (TNS)
– Lets the noise level rise in those parts of the frame where it is not heard

• Original vs. AAC (mono, 32 kb/s)


Other Lossy Coding Methods

• Dolby AC-3 or “Dolby Digital”
– 5.1 movie sound: left, center, right, left-surround, right-surround, and LFE (Low-Frequency Effects, low frequencies < 120 Hz)
• Dolby E
– A codec for professional systems, such as TV production
• DTS (Digital Theatre Systems)
– Movie and home theatre sound
• Sony ATRAC (Adaptive Transform Acoustic Coding) and SDDS
– ATRAC 4 is the coding method for MiniDisc (compression ratio only 1:5)
• Microsoft Windows Media Audio
• Lucent PAC (versions 1-4) and EPAC
– PAC = Perceptual Audio Codec; EPAC = Enhanced PAC

– Used in the US for commercial satellite radio broadcasting (XM, Sirius)

5.1 Home Theatre Sound

• Home theatre imitates the sound system of movie theatres

[Figure: 5.1 loudspeaker layout with Left, Center, Right, Left surround, Right surround, and LFE (subwoofer)]


Ogg Vorbis

• Open source, IPR free (no patents)
• Almost the same quality as MP3, but depends on audio material
– Pre-echoes are audible in short transients (e.g. castanets)
• Variable bit-rate, but can be chosen to be practically constant
– 45…500 kbit/s (stereo)
• Wikipedia and Spotify use Vorbis
• Used also in many computer games (Epic Games)
• Hardware support in some portable music players
– Rio Karma 20GB, Cowon iAudio X5

Source: http://en.wikipedia.org/wiki/Vorbis

Subjective Comparison

• The quality differences between audio coding methods can best be evaluated with listening tests

• Top 5 audio codecs according to a listening test (in comparison against CD) (Soulodre et al., Journal of AES, 1998)

1. MPEG-2 AAC

2. Lucent PAC

3. MP3 (MPEG-1 Layer 3)

4. AC-3

5. MPEG-1 Layer 2

• Other factors must also be accounted for
– Computational cost, sensitivity to bit errors, compatibility with other systems


Audio Codec Comparison from Year 1998

• It is not easy to evaluate the results
– Quality depends on bit-rate

• Coders are designed for different purposes (bit-rates)

Source: http://www.aac-audio.com/technology/aac.rp.0002.xprtLsnr.html

New Now and in Near Future

• Low-delay coding
– Usually coders have too much delay for two-way communication
– MPEG-4 AAC-LD
• Flexible multi-channel audio coding
– Binaural Cue Coding (BCC): transmit only one channel and low bit-rate side information (level/time difference/correlation)
– Directional Audio Coding (DirAC) by Pulkki et al. (Aalto Univ.)
– Number and position of loudspeakers can be chosen freely
• Bandwidth extension
– The highest octave is not coded but only some features are transmitted
– Reproduced by allowing lower frequencies to image (spectral band replication)
– mp3PRO, AAC+, AMR-WB+ (Adaptive Multi-Rate Wideband by Nokia)


Bandwidth Expansion in mp3PRO

Source: Ziegler et al., 2002

Bluetooth Audio

• The bit-rate in Bluetooth transmission (about 500 kbit/s) is insufficient for full-blown hi-fi audio etc. => Must compress!

[Figure: Bluetooth audio use cases: voice, music, voice-based navigation, ring tones; hands-free headsets, wireless headphones and speakers]


Bluetooth Audio

• The SBC (Sub-Band Codec) is supported for bit-rates 132…345 kbit/s (de Bont et al., 1995)
– Much simpler than MP3 and other CD-quality coders
– 4 or 8 subbands
– Various sample rates 16…48 kHz, mono or stereo
– Low latency: even 16 samples
• The APT-X lossless audio codec can be used for high-quality applications, such as movie soundtracks

Unified Speech and Audio Coding (USAC)

• New codec (April 2012) for both music and speech using low bit rates 12 … 64 kbit/s
• Part of the MPEG-D Part 3 standard (2007)
• Uses LPC and residual coding for speech
• Uses MDCT-based tools for audio
• Includes MPEG-4 Spectral Band Replication, MPEG Surround (for multi-channel audio coding), and Parametric Stereo
• Performs as well as or better than the previous best speech or audio codecs


Opus

• A new open-source IETF audio coding standard for interactive real-time applications over the Internet (Internet Engineering Task Force, Sept. 2012)
• Based on Skype’s speech codec SILK (linear prediction) and CELT (a low-latency MDCT codec)
– Either or both can be used
– When both are used, SILK codes the lower frequency band (< 8 kHz) and CELT codes the higher frequency band (8-20 kHz)

• Very low latency: 22.5 ms by default

Source: Antti Pakarinen, MSc thesis, 2012

Sound Example: USAC and Opus

• Band

– Original

– USAC 64 kbps

– Opus 64 kbps

– Opus 14 kbps

• Bells

– Original

– USAC 64 kbps

– Opus 64 kbps

– Opus 14 kbps

Source: Antti Pakarinen, MSc thesis, 2012


Use of Lossy Audio Coding

• Only suitable for end users
– For audio that is listened to without further processing
– Real-time streaming over a slow network
– Do not equalize!
– Errors may become audible with low-quality loudspeakers
– Do not re-compress (tandem coding)
• Very useful DSP technology when either the bit-rate or the memory capacity is limited

Conclusion

• The best compression ratio is obtained with lossy coding
– Part of the signal is discarded, accounting for the limitations of hearing
• Lossless coding is an alternative
– 50% reduction is possible based on statistical properties of audio signals
• The most common method has been MPEG-1 Layer 3 (MP3)
– Later perhaps mp3PRO or AAC+ or something else
• Audio coding is useful in many applications
– Phones, music players, Internet, PC, digital TV and radio, wireless audio
• CD quality can be obtained with less than 1 bit per sample
– Almost at 48 kbit/s (stereo)


Literature (1)

• Karlheinz Brandenburg, “MP3 and AAC explained,” in Proc. AES 17th International Conference on High Quality Audio Coding, 1999.

• Josh Coalson, “FLAC - Free Lossless Audio Codec”, 2005. http://flac.sourceforge.net/index.html.

• M. Hans and R. Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Processing Magazine, pp. 21–32, July 2001.

• J. D. Johnston, “Perceptual coding of audio signals—a tutorial,” presented at the AES Convention, New York, Sept. 1997.

• P. Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59–81, Sept. 1997.

• T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proc. IEEE, vol. 88, no. 4, pp. 451–513, April 2000.

• K. C. Pohlmann, Principles of Digital Audio. Fourth Edition, McGraw-Hill, 2000.

• T. Ziegler, A. Ehret, P. Ekstrand, and M. Lutzky, “Enhancing mp3 with SBR: Features and capabilities of the new mp3PRO Algorithm,” Audio Engineering Society Convention Paper 5560, Munich, May 2002.

Literature (2)

• F. de Bont, M. Groenewegen, and W. Oomen, “A High Quality Audio-Coding System at 128 kb/s,” presented at the 98th AES Convention, Paris, France, Feb. 25-28, 1995.

• Udo Zölzer, Digital Audio Signal Processing, Wiley, 1997. Chapter 9, “Data Compression”, pp. 249–265.