

© 2007-2017 V. Välimäki & A. Haghparast

Audio Coding

Vesa Välimäki

Department of Signal Processing and Acoustics

Aalto University

March 31, 2017

Sound check

Course Schedule in 2017

0. General issues (Vesa & Fabian) 13.1.2017
1. History and future of audio DSP (Vesa) 20.1.2017
2. Digital filters in audio (Vesa) 27.1.2017
3. Audio filter design (Vesa) 3.2.2017
4. Analysis of audio signals (Vesa) 10.2.2017
No lecture (Evaluation week for Period III) 17.2.2017
5. Audio effects processing (Fabian) 24.2.2017
6. Synthesis of audio signals (Fabian) 3.3.2017
7. 3-D sound (Prof. Ville Pulkki) 10.3.2017
8. Physics-based sound synthesis (Vesa) 17.3.2017
9. Sampling rate conversion (Vesa) 24.3.2017
10. Audio coding (Vesa) 31.3.2017


©2003-2017 Vesa Välimäki



Outline
• Introduction
• Lossless Audio Coding
• Perceptual (Lossy) Audio Coding
– Subband coding, time-to-frequency mapping, psychoacoustic models, parametric coding
• MPEG standards and some new codecs
– MP3, AAC, USAC, Opus
• Applications
• Demo on flutter echo

Part of this lecture material was produced by Ms. Azadeh Haghparast (TKK Dept. of Signal Processing and Acoustics, 2007).


Flutter Echo

DEMO

Joni



Bit-rate of Audio Signals
• Bit-rate of one audio channel without compression (at 44.1 kHz):

16 bit × 44100 samples/s ≈ 706 kbit/s (about 700 kbit/s)

• Stereo signal (44.1 kHz):

2 × 16 bit × 44100 samples/s ≈ 1.4 Mbit/s

• Additionally, bits are needed for error correction and synchronization
– On a CD, 33 extra bits are needed for each 16 bits of audio data, so the total bit-rate is about 4.3 Mbit/s
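The arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function name `pcm_bitrate` is illustrative, and the CD overhead is modelled only as the 33 extra bits per 16 audio bits stated on the slide.

```python
def pcm_bitrate(bits_per_sample, sample_rate_hz, channels=1):
    """Uncompressed PCM bit-rate in bit/s."""
    return bits_per_sample * sample_rate_hz * channels

mono = pcm_bitrate(16, 44100)          # one channel at 44.1 kHz
stereo = pcm_bitrate(16, 44100, 2)     # two channels
# CD channel coding as stated on the slide: 33 extra bits per 16 audio bits
cd_total = stereo * (16 + 33) / 16

print(f"{mono / 1e3:.1f} kbit/s, {stereo / 1e6:.4f} Mbit/s, {cd_total / 1e6:.4f} Mbit/s")
```

It prints 705.6 kbit/s, 1.4112 Mbit/s and 4.3218 Mbit/s, matching the rounded figures on the slide.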


Bit-Rate for Various Digital Audio Schemes

Application | Format | Sample rate | Audio bit-rate | Overhead bit-rate | Total bit-rate
Compact Disc (CD) | PCM | 44.1 kHz | 1.41 Mb/s | 2.91 Mb/s | 4.32 Mb/s
Digital Audio Tape (DAT) | PCM | 44.1 kHz | 1.41 Mb/s | 1.67 Mb/s | 3.08 Mb/s
Digital Compact Cassette (DCC) | MPEG-1 Layer I | 48 kHz | 384 kb/s | 384 kb/s | 768 kb/s
MiniDisc (MD) | ATRAC | 44.1 kHz | 292 kb/s | 718 kb/s | 1.01 Mb/s
Digital Audio Broadcast (DAB) | MPEG-1 Layer II, III | 48 kHz | 256 kb/s | 256 kb/s | 512 kb/s



Applications of Audio Coding
• Storage
– Music archives
– Movie soundtracks
– Music for electronic games
• Communication
– Mobile multimedia
– Internet streaming
• Broadcasting
– Digital radio and TV
• Wireless audio
– Hands-free headphones and headsets
– Wireless speakers


Classification of Audio Coding Techniques

• Lossless Audio Coding
– Reduces the size of the audio signal using redundancy reduction, exploiting e.g. the sample value distribution
– The original signal values can be recovered exactly by decoding
– Shorten, FLAC, Monkey’s Audio, MPEG-4 ALS, Windows Media Audio 9 Lossless, RealAudio Lossless, APT-X…

• Lossy Audio Coding
– Reduces the size of the audio signal using irrelevancy reduction
– Exploits limitations of human hearing, e.g., auditory masking
– MPEG Audio (Layer 1, 2, 3), Dolby Digital, Ogg Vorbis, MPEG-AAC, HILN (MPEG-4 Parametric Audio Coding), WMA (Windows Media Audio), …



Applications of Lossless Audio Coding
• Archiving of original recordings

• Studio operations, such as mixing

• Digital music distribution over the Internet

• Portable music players/recorders

• Multi-channel audio (e.g., DVD-Audio)

• Bluetooth audio (headsets, speakers)


Principles of Lossless Audio Coding
• A lossless audio coder comprises three main blocks:
– Framing: divides the audio signal into frames, e.g., 100 ms
– Decorrelation: removes redundancy (spectral whitening)
– Entropy coding: statistically efficient code book

The histogram of audio signals is often close to a Laplace distribution: there are more small sample values than large ones.
→ Short codes for common sample values, long codes for rare sample values
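Rice coding, named later in these slides as the entropy coder of LTAC and MPEG-4 ALS, is one simple code with this property. Below is a minimal sketch, assuming a zigzag mapping of signed residuals to non-negative integers and a fixed parameter k (both illustrative choices; real codecs define their own bitstream and choose k adaptively). Bits are kept as a '0'/'1' string for readability.

```python
def zigzag(v):
    # map signed residuals to non-negative integers: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    return 2 * v if v >= 0 else -2 * v - 1

def rice_encode(values, k):
    """Rice code with parameter k: unary-coded quotient + k-bit remainder."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")              # quotient in unary, terminated by '0'
        if k > 0:
            bits.append(format(r, f"0{k}b"))   # remainder in k binary digits
    return "".join(bits)

def rice_decode(bitstring, k, count):
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bitstring[pos] == "1":
            q += 1
            pos += 1
        pos += 1                                # skip the terminating '0'
        r = int(bitstring[pos:pos + k], 2) if k > 0 else 0
        pos += k
        u = (q << k) | r
        values.append(u // 2 if u % 2 == 0 else -(u + 1) // 2)   # undo zigzag
    return values

residual = [0, -1, 2, 1, -3, 0, 5, -2]
code = rice_encode(residual, k=1)
assert rice_decode(code, k=1, count=len(residual)) == residual
print(len(code), "bits vs", 16 * len(residual), "bits as 16-bit PCM")
```

For these small residuals the code uses 27 bits instead of 128. Large values get long unary prefixes, which is acceptable because they are rare in a Laplacian-like distribution.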



Principles of Lossless Audio Coding

• Two approaches to decorrelate the audio signal

1. Linear predictive model → lossless representation: predictor coefficients + error signal

2. Linear transform model → lossless representation: transform coefficients + error signal


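Linear Prediction

• Predictor coefficients are determined
– Usually using the autocorrelation or covariance method

• Each sample is estimated from its M previous samples using the predictor coefficients a_k, where Q{·} denotes quantization (rounding to an integer):

$\hat{x}(n) = Q\left\{ \sum_{k=1}^{M} a_k \, x(n-k) \right\}$

• The error signal e(n) = x(n) − x̂(n) is transmitted together with the coefficients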
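A minimal sketch of lossless linear prediction, assuming the autocorrelation method solved with the Levinson-Durbin recursion and rounding as Q{·}. Quantization and transmission of the coefficients themselves is omitted: encoder and decoder simply share the same floating-point coefficients, so they compute identical rounded predictions. The test signal is a hypothetical 440 Hz tone sampled at 8 kHz.

```python
import math

def autocorr(x, max_lag):
    # r[lag] = sum_n x(n) x(n - lag), the autocorrelation method
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a_1..a_M (requires r[0] > 0)."""
    a = [0.0] * (order + 1)                 # a[0] is unused
    err = float(r[0])
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err   # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def predict(history, coeffs):
    # x_hat(n) = Q{ sum_k a_k x(n-k) }; history ends with x(n-1), Q = round to integer
    return round(sum(a_k * history[-k] for k, a_k in enumerate(coeffs, start=1)))

def lpc_encode(x, coeffs):
    M = len(coeffs)
    # the first M samples are sent verbatim (warm-up), the rest as the error signal e(n)
    return x[:M] + [x[n] - predict(x[n - M:n], coeffs) for n in range(M, len(x))]

def lpc_decode(e, coeffs):
    M = len(coeffs)
    x = e[:M]
    for n in range(M, len(e)):
        x.append(e[n] + predict(x[n - M:n], coeffs))   # same prediction as the encoder
    return x

signal = [round(1000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(400)]
a = levinson_durbin(autocorr(signal, 2), 2)
residual = lpc_encode(signal, a)
assert lpc_decode(residual, a) == signal
```

The residual of this tone stays far below the signal amplitude of 1000, which is what makes the subsequent entropy coding efficient.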



Decorrelation With a Polynomial Predictor

(Ref. Hans and Schafer 2001)

• Try several simple polynomial predictors and select the best one
– The best predictor is the one that produces the prediction error (residual) signal with the smallest amplitude
– The spectrum is whitened
– Integer coefficients avoid rounding errors
– For example, the following polynomial predictors:
• xp0(n) = 0
• xp1(n) = x(n – 1)
• xp2(n) = 2x(n – 1) – x(n – 2)
• xp3(n) = 3x(n – 1) – 3x(n – 2) + x(n – 3)
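The selection among the four polynomial predictors can be sketched as follows. "Smallest amplitude" is measured here as the sum of absolute residual values, which is an assumption; other criteria such as the estimated code length are also possible.

```python
# integer coefficients of xp0..xp3 applied to x(n-1), x(n-2), x(n-3)
POLY_COEFFS = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1]}

def poly_residual(x, order):
    """Residual e(n) = x(n) - xp_order(n); starts at n = 3 so all orders cover the same range."""
    coeffs = POLY_COEFFS[order]
    return [x[n] - sum(c * x[n - k] for k, c in enumerate(coeffs, start=1))
            for n in range(3, len(x))]

def best_polynomial_order(x):
    costs = {p: sum(abs(v) for v in poly_residual(x, p)) for p in POLY_COEFFS}
    return min(costs, key=costs.get), costs

smooth = [n * n for n in range(20)]              # quadratic: the 3rd-order residual is zero
alternating = [(-1) ** n for n in range(20)]     # highpass-like: differencing makes it worse
print(best_polynomial_order(smooth)[0], best_polynomial_order(alternating)[0])
```

Smooth, lowpass signals favour higher orders, while noise-like or highpass frames are best left undifferenced (order 0); everything stays in integer arithmetic, so decoding is exact.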


Principles of Lossless Audio Coding

• Decorrelation by Linear Transform Model



LTAC

• LTAC: Lossless Transform Audio Coding

• Fixed or variable frame length

• Orthonormal Discrete Cosine Transform (DCT)

• Groups of 32 adjoining transform coefficients

• Rice coding for transform coefficients

• Arithmetic coding for the error signal
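The transform model can be sketched as follows, assuming an orthonormal DCT-II and plain rounding of the coefficients; LTAC's grouping of coefficients and its Rice and arithmetic coding stages are omitted. The rounded coefficients give an integer approximation of the frame, and the small error signal makes the representation lossless. The decoder must compute the inverse DCT exactly as the encoder does; here both run the same code, which sidesteps the cross-platform floating-point issue a real codec has to address.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II (direct O(N^2) sum, for illustration)."""
    N = len(x)
    return [math.sqrt((1 if k == 0 else 2) / N) *
            sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    N = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / N) * c[k] * math.cos(math.pi * (n + 0.5) * k / N)
                for k in range(N))
            for n in range(N)]

def transform_encode(frame):
    c_q = [round(v) for v in dct_ortho(frame)]       # integer transform coefficients
    approx = [round(v) for v in idct_ortho(c_q)]     # integer approximation of the frame
    error = [a - b for a, b in zip(frame, approx)]   # small error signal -> lossless
    return c_q, error

def transform_decode(c_q, error):
    approx = [round(v) for v in idct_ortho(c_q)]
    return [a + e for a, e in zip(approx, error)]

frame = [round(100 * math.sin(0.2 * n)) + n for n in range(32)]
coeffs, err = transform_encode(frame)
assert transform_decode(coeffs, err) == frame
```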


MPEG-4 ALS



MPEG-4 ALS

• Based on linear prediction
– Optimal predictor coefficients are calculated using an iterative procedure

• Optimal predictor order and optimal predictor coefficients → the smallest bit-rate

• Coefficients converted to arcsine

• Linear 8-bit quantization of arcsine coefficients

• Rice entropy coding


Comparison of Lossless Audio Codecs

Typical size reduction ≈ 50% (Ref: Coalson, 2005. http://flac.sourceforge.net/comparison.html)



Lossy Audio Coding
• High compression ratios can be achieved when the signal is allowed to change
– The goal is minimal disturbance for human listeners

• Technology for end users
– Listen to the coded material as is (no further processing, EQ, etc.)
– Unsuitable for high-quality recordings or archiving

• Subband audio coding
– MP3, Dolby AC-3, Vorbis, WMA (Windows Media Audio)

• Parametric audio coding
– HILN (MPEG-4)


Applications of Lossy Audio Coding

• Portable music players and mobile phones
– Also MiniDisc players

• Internet audio

• Digital TV

• Digital radio

• Movie soundtracks



Subband Audio Coding

• Perceptual audio coding

• Frequency-domain representation of audio signal

• Psychoacoustic model

• Threshold of hearing

• Shape the quantization noise so that it stays below the (masked) threshold of hearing


Subband Audio Coding

• General block diagram of subband coder



Time to Frequency Mapping

• Time to frequency mapping techniques

– The simplest technique: Fourier transform (FFT)

– Filter bank technique

– Pseudo-Quadrature Mirror Filter bank (PQMF)

– Modified Discrete Cosine Transform (MDCT)


Filter Bank

• N-channel filter bank → N parallel bandpass filters

• Uniform or non-uniform bandwidth

• Magnitude response of a uniform bandwidth N-channel filter bank



Filter Bank

• Analysis-synthesis filter bank → designed as a perfect reconstruction (PR) filter bank


Filter Bank

• Down-sampling
– Keeps the total data rate of the filter bank equal to that of the input signal
– Problem: the spectral bandwidth must be limited; otherwise → aliasing (folding)

• Up-sampling
– Restores the original data rate
– Problem: expands the spectral bandwidth → imaging distortion
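Aliasing from down-sampling without band-limiting can be demonstrated numerically. In this sketch, a 3 kHz tone sampled at 8 kHz is down-sampled by 2 (the new Nyquist frequency is 2 kHz), and the strongest component of the result appears folded to 4 − 3 = 1 kHz. The brute-force DFT search, `dominant_frequency`, is a hypothetical helper written for clarity, not efficiency.

```python
import math

def dominant_frequency(x, fs, step=10.0):
    """Return the frequency (Hz) on a grid 0..fs/2 with the largest DFT magnitude."""
    best_f, best_mag = 0.0, -1.0
    for i in range(int(fs / 2 / step) + 1):
        f = i * step
        re = sum(v * math.cos(2 * math.pi * f * n / fs) for n, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * f * n / fs) for n, v in enumerate(x))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

fs = 8000
tone = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(800)]
decimated = tone[::2]        # keep every 2nd sample, no lowpass filter: new fs = 4000 Hz
print(dominant_frequency(tone, fs, step=20.0), dominant_frequency(decimated, fs // 2))
```

A lowpass filter with a cutoff at 2 kHz before decimation would remove the tone instead of folding it; in a critically sampled filter bank, the synthesis side is designed so that such aliasing terms cancel.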



Pseudo-Quadrature Mirror Filter Bank

• Pseudo-Quadrature Mirror Filter (PQMF) Bank

• Design a narrow lowpass filter → prototype filter

• Other filters obtained by cosine modulation of the prototype filter

• MPEG-1 and MPEG-2 – 32 channels

– Prototype filter with 512 coefficients
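Cosine modulation of a prototype can be sketched as below. The prototype here is a hypothetical Hann-windowed sinc with cutoff π/(2M), not the optimized 512-coefficient MPEG prototype, and only M = 4 channels are used. The modulation h_k(n) = 2 p(n) cos((π/M)(k + 1/2)(n − (L − 1)/2) + (−1)^k π/4) is one commonly used PQMF form. Each modulated filter then passes the band centred at (k + 1/2)π/M.

```python
import cmath
import math

def prototype_lowpass(M, L):
    """Hypothetical prototype: Hann-windowed sinc with cutoff pi/(2M), unit DC gain."""
    mid = (L - 1) / 2
    p = []
    for n in range(L):
        t = n - mid
        ideal = 1 / (2 * M) if t == 0 else math.sin(math.pi * t / (2 * M)) / (math.pi * t)
        p.append(ideal * (0.5 - 0.5 * math.cos(2 * math.pi * n / (L - 1))))
    dc = sum(p)
    return [v / dc for v in p]

def cosine_modulate(p, M):
    """h_k(n) = 2 p(n) cos((pi/M)(k + 1/2)(n - (L-1)/2) + (-1)^k pi/4), k = 0..M-1."""
    L = len(p)
    return [[2 * p[n] * math.cos(math.pi / M * (k + 0.5) * (n - (L - 1) / 2) + (-1) ** k * math.pi / 4)
             for n in range(L)]
            for k in range(M)]

def magnitude(h, omega):
    # |H(e^{j omega})| evaluated directly from the impulse response
    return abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(h)))

M, L = 4, 64
bank = cosine_modulate(prototype_lowpass(M, L), M)
centers = [(k + 0.5) * math.pi / M for k in range(M)]
for k in range(M):
    print(k, [round(magnitude(bank[k], w), 3) for w in centers])
```

Only one prototype has to be designed and stored; all M analysis filters follow from it, which is what makes the PQMF structure efficient for 32 bands.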


2-Channel PQMF

• Design of a 2-channel analysis-synthesis filter bank

• Challenge:
– Define the filters $H_0(z)$, $H_1(z)$, $G_0(z)$, $G_1(z)$
– For Perfect Reconstruction: $G_0(z) = H_1(-z)$ and $G_1(z) = -H_0(-z)$
– and also $H_1(z) = H_0(-z)$
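These conditions can be checked with the simplest possible example, the 2-tap Haar filters, assuming H0(z) = (1 + z^-1)/√2 (an illustrative choice, not a filter used in any MPEG coder). With H1(z) = H0(−z), G0(z) = H1(−z) and G1(z) = −H0(−z), the aliasing terms cancel, and for this particular pair the output is exactly the input delayed by one sample.

```python
import math

S = 1 / math.sqrt(2)
H0 = [S, S]     # H0(z) = (1 + z^-1)/sqrt(2), lowpass
H1 = [S, -S]    # H1(z) = H0(-z), mirror-image highpass
G0 = [S, S]     # G0(z) = H1(-z)
G1 = [-S, S]    # G1(z) = -H0(-z)

def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[n + k] += xv * hv
    return y

def analysis(x):
    # filter, then keep every second sample (down-sampling by 2)
    return convolve(x, H0)[::2], convolve(x, H1)[::2]

def upsample(v):
    u = [0.0] * (2 * len(v))
    u[::2] = v      # insert zeros between samples
    return u

def synthesis(v0, v1):
    y0 = convolve(upsample(v0), G0)
    y1 = convolve(upsample(v1), G1)
    return [a + b for a, b in zip(y0, y1)]

x = [3, 1, 4, 1, 5, 9, 2, 6]
y = synthesis(*analysis(x))
print([round(v, 12) for v in y[:len(x) + 1]])   # 0 followed by x: one-sample delay
```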



Modified DCT Filter Bank

• Modified Discrete Cosine Transform filter bank
• Also called Time-Domain Aliasing Cancellation (TDAC) filter bank
• Special case of PQMF
– Length of the prototype filter is twice the number of channels (2N for N channels)
– 50% overlap with the previous frame

• Prototype filter: sine window
• Choice of window length
– Long window: good for stationary signals
– Short window: suitable for transients
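A minimal sketch of the MDCT with a sine window, assuming the common definition X(k) = Σ x(n) cos[(π/N)(n + 1/2 + N/2)(k + 1/2)] and an inverse scaled by 2/N (scaling conventions vary). Each frame of 2N samples yields only N coefficients, yet overlap-adding consecutive frames (50% overlap) cancels the time-domain aliasing. That cancellation is the TDAC property, and it relies on the window satisfying w(n)² + w(n + N)² = 1. Direct O(N²) sums are used for clarity; real coders use FFT-based algorithms.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, N):
    """2N input samples -> N MDCT coefficients."""
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    """N coefficients -> 2N time-aliased samples (scaled by 2/N)."""
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def mdct_round_trip(x, N):
    """Windowed analysis/synthesis with hop size N; aliasing cancels in the overlap-add."""
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        frame = [w[n] * x[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out

N = 8
x = [math.sin(0.37 * n) + 0.1 * n for n in range(6 * N)]
y = mdct_round_trip(x, N)
# samples covered by two overlapping frames (N .. 5N-1) are reconstructed exactly
print(max(abs(a - b) for a, b in zip(x[N:5 * N], y[N:5 * N])))
```

The first and last N samples are covered by only one frame, so in a real coder they are completed by the neighbouring frames of the signal.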


Subband Coding

• General block diagram of subband coder



Psychoacoustics

• Absolute threshold of hearing

• Masking phenomenon
– Simultaneous masking, also called frequency masking
– Non-simultaneous masking, also called temporal masking

• Critical bandwidth

• Spread of masking


Sound Pressure Level

• Quantity for measuring sound pressure on a logarithmic (decibel) scale

• P: pressure of sound

• P0: reference pressure = 20 μPa
– Approximately the sound pressure at the threshold of hearing, when f = 2 kHz

$\mathrm{SPL}(\mathrm{dB}) = 10 \log_{10}\left(\frac{P}{P_0}\right)^{2}$
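The SPL formula translates directly into code; this sketch only restates the formula above (the function name is illustrative). A pressure of 20 μPa gives 0 dB, 1 Pa gives about 94 dB, and doubling the pressure adds about 6 dB.

```python
import math

P0 = 20e-6  # reference pressure, 20 uPa

def spl_db(p_pa):
    """Sound pressure level 10*log10((P/P0)^2) in dB."""
    return 10 * math.log10((p_pa / P0) ** 2)

print(round(spl_db(20e-6), 2), round(spl_db(1.0), 2), round(spl_db(2.0) - spl_db(1.0), 2))
```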



Limits of Human Hearing

• Frequency range: 20 Hz … 20 kHz

• Most sensitive range: 1 kHz … 5 kHz

• Dynamic range: 20 dB … 95 dB

(for safe listening)

• Threshold of pain: About 120 dB


Absolute Threshold of Hearing

• Also called Threshold in Quiet

• Minimum level of a pure tone that an average human listener can perceive in noiseless conditions
– About 0 dB in mid frequencies

– Frequency components below this curve are inaudible
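The threshold-in-quiet curve is often approximated analytically. This sketch uses Terhardt's approximation, which is not given on the slide (it appears, e.g., in the Painter and Spanias tutorial listed under Literature):

T_q(f) = 3.64 (f/kHz)^(−0.8) − 6.5 exp(−0.6 (f/kHz − 3.3)²) + 10^(−3) (f/kHz)^4 dB SPL

It gives about 3.4 dB at 1 kHz, roughly −5 dB around 3 to 4 kHz, and about 23 dB at 100 Hz.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = f_hz / 1000.0  # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def is_audible(level_db_spl, f_hz):
    # a pure tone below the curve is inaudible in noiseless conditions
    return level_db_spl > threshold_in_quiet_db(f_hz)

print(is_audible(10, 50), is_audible(10, 1000))   # False True
```

In a perceptual coder, spectral components below this curve can be discarded, and quantization noise may rise up to it.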



Threshold of Hearing


Masking Phenomenon

• The most important psychoacoustic concept for transparent audio coding

• Frequency masking
– Masker and maskee sounds occur concurrently

• Temporal masking
– Extends beyond the time interval in which the masker occurs



Frequency Masking
• The threshold of audibility of one sound is raised in the presence of sound energy at neighboring frequencies


Frequency Masking Examples
• Examples of the raising of the hearing threshold by different maskers: sine, white noise, narrow-band noise



Masking Curves

• Four masking curves for measurements
– Narrow-band noise masking narrow-band noise (NMN)

– Narrow-band noise masking tone (NMT)

– Tone masking narrow-band noise (TMN)

– Tone masking tone (TMT)


Narrow-Band Noise Masking Tones



Narrow-Band Noise Masking Tones


Tone Masking Tone



Temporal Masking

(Figure: temporal masking; pre-masking lasts a few ms, post-masking about 100 ms)

• Difficult to employ in audio coding


Critical Bandwidth

• The frequency range around a masker frequency in which the masking curve remains flat → critical bandwidth

• Each critical bandwidth corresponds to a constant distance on the basilar membrane

• Unit of the critical-band scale → Bark

$z/\mathrm{Bark} = 13 \arctan\left(0.76\, f/\mathrm{kHz}\right) + 3.5 \arctan\left[\left(f / 7.5\,\mathrm{kHz}\right)^{2}\right]$
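The Bark formula can be evaluated directly. In this minimal sketch (function names are illustrative), 1 kHz lies at roughly 8.5 Bark, and the audio range up to 20 kHz spans about 24.6 Bark.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark from frequency in Hz (formula above)."""
    f_khz = f_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

def bark_distance(f1_hz, f2_hz):
    # distance along the basilar membrane, measured in critical bands
    return hz_to_bark(f2_hz) - hz_to_bark(f1_hz)

print(round(hz_to_bark(1000), 2), round(hz_to_bark(20000), 2))
```

Masking models are usually evaluated on this scale, because masking curves have a roughly frequency-independent shape when plotted in Bark.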



Critical Bands on a Linear Frequency Scale

(Figure: critical bands plotted on a linear frequency axis, 0–20 kHz; x-axis: frequency (kHz))


Critical Bands on a Log Frequency Scale

(Figure: critical bands plotted on a logarithmic frequency axis, 0.1–10 kHz; x-axis: frequency (kHz))

• Constant bandwidth up to 500 Hz; then 1/3 octave



Spread of Masking
• The effect of frequency masking is not limited to within one critical bandwidth

• Various analytical models of the masking spread function
– Triangle function
– Schroeder function
– …
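As one example, the Schroeder spreading function can be written in dB as SF(Δz) = 15.81 + 7.5(Δz + 0.474) − 17.5 √(1 + (Δz + 0.474)²), where Δz is the distance from the masker in Bark. This form is commonly quoted in the perceptual-coding literature (e.g. the Painter and Spanias tutorial under Literature); it is not given on the slide. The sketch shows its asymmetry: masking spreads further towards higher frequencies.

```python
import math

def schroeder_spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee Bark - masker Bark."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1 + (dz + 0.474) ** 2)

# attenuation relative to the masker peak at a few Bark distances
table = {dz: round(schroeder_spreading_db(dz), 1) for dz in (-3, -1, 1, 3)}
print(table)
```

The lower slope is about +25 dB/Bark and the upper slope about −10 dB/Bark, which is why a masker hides sounds above its frequency more easily than below it.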


Perceptual Entropy & Bit Allocation

• Perceptual entropy → lower bound on the number of bits needed for transparent quality

• Bit allocation algorithms
– Allocate bits according to the signal-to-mask ratio (SMR)
– Goal: the noise-to-mask ratio (NMR) stays negative, i.e., the quantization noise remains below the masking threshold

• It is well known that SNR (signal-to-noise ratio) is not a meaningful measure in perceptual coding
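A simplified greedy bit allocation can be sketched as follows. It assumes that each extra bit per sample improves a band's SNR by about 6.02 dB, so that NMR = SMR − SNR, and that a fixed bit budget is given. Real coders such as MPEG-1 Layer I/II use quantizer-specific SNR tables and count bits per block of samples, so this is only an illustration of the principle.

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Repeatedly give one bit to the band whose noise-to-mask ratio is highest."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        nmr = [s - db_per_bit * b if b < max_bits else float("-inf")
               for s, b in zip(smr_db, bits)]
        band = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[band] <= 0:      # noise is already below the mask in every band
            break
        bits[band] += 1
    return bits

smr = [20.0, 5.0, -3.0, 12.0]   # hypothetical SMRs of four subbands (dB)
print(allocate_bits(smr, 10))   # [4, 1, 0, 2]: 7 bits suffice, band 2 needs none
```

Bands whose signal lies below the masking threshold (negative SMR) receive no bits at all, which is where perceptual coding saves most of its bit-rate.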



Parametric Audio Coding

• Based on parametric modeling of audio signals
– Cf. parametric sound synthesis

• Very low bit-rate applications
– Mobile communications, Internet streaming
– 40 kbit/s and lower

• HILN (Harmonic and Individual Lines plus Noise)
– Sinusoids-plus-noise modeling
– Parametric audio coding within the MPEG-4 standard
– Minimum bit-rate: 4 kbit/s


HILN Encoder



Parametric Model of Audio Source in HILN

• Decomposition of the audio signal into its components
– Individual sinusoid: frequency and amplitude
– Harmonic tone: fundamental frequency, amplitude, spectral envelope of partials
– Noise: amplitude and spectral envelope
– Transients: optional parameters, such as the temporal envelope


HILN Perception Model

• Different from the perception model used in subband coding

• Effect of parameter deviation on signal quality
– Bit allocation for quantization

• Influence of different parameters on the quality of the decoded signal
– Choice of model parameters for transmission



HILN Decoder

• Harmonics + sines + noise synthesis
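The decoder principle can be sketched as a sum of a harmonic tone, individual sinusoids and noise. This is a hypothetical single-frame illustration, not the normative HILN decoder. Parameters are held constant within the frame (a real decoder interpolates them between frames), and the noise is white rather than shaped by a transmitted spectral envelope.

```python
import math
import random

def synthesize_frame(fs, n_samples, f0, harmonic_amps, sines, noise_amp, seed=0):
    """Harmonics + sines + noise synthesis for one frame with constant parameters.

    f0, harmonic_amps: harmonic tone (fundamental frequency, partial amplitudes)
    sines: list of (frequency_hz, amplitude) individual lines
    noise_amp: amplitude of white noise (a real decoder would shape its spectrum)
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        t = n / fs
        s = sum(a * math.sin(2 * math.pi * (h + 1) * f0 * t)
                for h, a in enumerate(harmonic_amps) if (h + 1) * f0 < fs / 2)
        s += sum(a * math.sin(2 * math.pi * f * t) for f, a in sines)
        s += noise_amp * rng.uniform(-1.0, 1.0)
        out.append(s)
    return out

frame = synthesize_frame(16000, 320, 200.0, [0.5, 0.25, 0.125], [(1234.0, 0.1)], 0.01)
```

Only a handful of parameters per frame are transmitted, which is how parametric coding reaches bit-rates of a few kbit/s.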


MPEG Standards (1988 onwards)
• The MPEG working group (Moving Picture Experts Group) focuses on international standardisation of video and audio technology
– Official name: ISO/IEC JTC1 SC29 WG11

• The best-known standards are MPEG-1 and MPEG-2
– MPEG-1 Layer 3 = MP3
– MPEG-1 Layer 2 is used in the European digital radio standard (DAB)

• Also MPEG-4, MPEG-7, and MPEG-21
– New multimedia standards, not only coding

• MPEG-D (2007): MPEG audio technologies
– Includes MPEG Surround, Spatial Audio Object Coding (SAOC), and Unified Speech and Audio Coding (USAC)



MP3
• MPEG-1 Layer 3 is MP3
– Old technology; standard from 1991

• The most common codec for "almost" CD-quality audio


MP3

Ref. K. Brandenburg, 1999



Examples of MP3 Audio at Various Bit-rates

• One of the standard MPEG test signals: Suzanne Vega, “Tom’s Diner” (1987), duration 2 min 11 s

• Signal-to-noise ratio does not describe the sound quality of lossy audio coding well, since the errors are not only noise

Bit-rate | Compression | Bit/sample | File size | Quality
1411 kbit/s | 1:1 | 16 | 22565 kB | Original CD
128 kbit/s | 1:11 | 1.5 | 2048 kB | MP3 CD quality
96 kbit/s | 1:15 | 1.1 | 1536 kB | Almost CD quality
64 kbit/s | 1:22 | 0.73 | 1026 kB | FM radio quality
32 kbit/s | 1:44 | 0.36 | 514 kB | Very low
8 kbit/s | 1:176 | 0.09 | 130 kB | Not recommended
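The compression ratio, bits per sample and file size columns follow from the bit-rate and the 2 min 11 s duration. This is a sketch, assuming stereo 44.1 kHz audio and 1 kB = 1024 bytes. Bits per sample are counted per channel. The computed file sizes differ from the table by a few kB, which is expected from rounding.

```python
DURATION_S = 2 * 60 + 11        # "Tom's Diner": 2 min 11 s
CD_RATE = 16 * 44100 * 2        # bit/s, stereo 16-bit PCM

def coding_stats(bitrate_kbps):
    rate = bitrate_kbps * 1000
    ratio = CD_RATE / rate                    # compression ratio 1:ratio
    bits_per_sample = rate / (44100 * 2)      # per sample and channel
    size_kb = rate * DURATION_S / 8 / 1024    # file size in kB (1024 bytes)
    return round(ratio), round(bits_per_sample, 2), round(size_kb)

for kbps in (128, 96, 64, 32, 8):
    print(kbps, coding_stats(kbps))
```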


Example Test Image in Image Processing

• Lena
– The consequences of processing are very easy to see
– Goal: everybody should use the same test data set



Typical Problems in MP3 Encoded Signals

(Figure: spectrograms of the original signal ("Original") and of the same signal coded with MP3 at 128 kbit/s ("MP3 128kbps pre-echo"); axes: time / s, frequency from 0 to 20 kHz)

• Pre-echo, limited frequency range, additional noise
– Castanets are a well-known problematic signal


What's wrong with MP3?

DEMO

Lauri



MPEG-2 AAC
• AAC = Advanced Audio Coding
– One of the most popular audio codecs (phones, YouTube, Apple, PlayStation…)

• Second-generation MPEG audio coding standard from 1997
– In half-rate mode, the bit-rate is 50% of that of MPEG-1 Layer 1 at the same quality
– MP3 quality at a 30% lower bit-rate (96 kbit/s)

• Multi-channel sound
– Mono/stereo/multi-channel
– 1-48 channels and 0-16 effects channels (e.g. 5.1 movie sound)

• Many sample rates between 8 and 96 kHz
– 8 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2 kHz, 96 kHz

• Various bit-rates and variable bit-rate
– For stereo signals: 16 ... 192 kbit/s


New Features in MPEG-2 AAC
• Improved solutions over MP3
– Better frequency resolution: max. 1024 bins (only 576 in MP3)
– Multi-symbol Huffman coding, in which four frequency points are combined

• Cleaner transitions between frames
– Modified Discrete Cosine Transform (MDCT) filter bank, in which the impulse response length is only 5.3 ms (18.6 ms in MP3)
– Reduces pre-echo

• Temporal noise shaping (TNS)
– Lets the noise level rise in those parts of the frame where it is not heard

• Original vs. AAC (mono, 32 kb/s)



Other Lossy Coding Methods
• Dolby AC-3 or "Dolby Digital"
– 5.1 movie sound: left, center, right, left surround, right surround, and LFE (Low-Frequency Effects, frequencies < 120 Hz)

• Dolby E
– A codec for professional systems, such as TV production

• DTS (Digital Theater Systems)
– Movie and home theatre sound

• Sony ATRAC (Adaptive Transform Acoustic Coding) and SDDS
– ATRAC 4 is the coding method for MiniDisc (compression ratio only 1:5)

• Microsoft Windows Media Audio

• Lucent PAC (versions 1-4) and EPAC
– PAC = Perceptual Audio Codec; EPAC = Enhanced PAC
– Used in the US for commercial satellite radio broadcasting (XM, Sirius)


5.1 Home Theatre Sound
• Home theatre imitates the sound system of movie theatres

(Figure: 5.1 loudspeaker layout: left, center, right, left surround, right surround, LFE (subwoofer))



Ogg Vorbis
• Open source, IPR free (no patents)

• Almost the same quality as MP3, depending on the audio material
– Pre-echoes are audible in short transients (e.g. castanets)

• Variable bit-rate, but can be set to be practically constant
– 45…500 kbit/s (stereo)

• Wikipedia and Spotify use Vorbis

• Also used in many computer games (Epic Games)

• Hardware support in some portable music players
– Rio Karma 20GB, Cowon iAudio X5

Source: http://en.wikipedia.org/wiki/Vorbis


Subjective Comparison
• The quality differences between audio coding methods are best evaluated with listening tests

• Top 5 audio codecs according to a listening test (in comparison against CD) (Soulodre et al., Journal of the AES, 1998)
1. MPEG-2 AAC
2. Lucent PAC
3. MP3 (MPEG-1 Layer 3)
4. AC-3
5. MPEG-1 Layer 2

• Other factors must also be taken into account
– Computational cost, sensitivity to bit errors, compatibility with other systems



Audio Codec Comparison from Year 1998

• It is not easy to evaluate the results
– Quality depends on bit-rate

• Coders are designed for different purposes (bit-rates)

Source: http://www.aac-audio.com/technology/aac.rp.0002.xprtLsnr.html


New Now and in the Near Future
• Low-delay coding
– Usual coders have too much delay for two-way communication
– MPEG-4 AAC-LD (Low Delay)

• Flexible multi-channel audio coding
– Binaural Cue Coding (BCC): transmit only one channel and low bit-rate side information (level difference, time difference, correlation)
– Directional Audio Coding (DirAC) by Pulkki et al. (Aalto Univ.)
– Number and position of loudspeakers can be chosen freely

• Bandwidth extension
– The highest octave is not coded; only some of its features are transmitted
– Reproduced by letting lower frequencies image into the upper band (spectral band replication, SBR)
– mp3PRO, AAC+, AMR-WB+ (Adaptive Multi-Rate Wideband by Nokia)



Bandwidth Expansion in mp3PRO

Source: Ziegler et al., 2002


Bluetooth Audio
• The bit-rate of Bluetooth transmission (about 500 kbit/s) is insufficient for full-blown hi-fi audio etc. => Must compress!

(Figure: Bluetooth audio applications: voice, music, voice-based navigation, ring tones; hands-free headsets, wireless headphones and speakers)



Bluetooth Audio
• The SBC (Sub-Band Codec) is supported for bit-rates of 132…345 kbit/s (de Bont et al., 1995)
– Much simpler than MP3 and other CD-quality coders
– 4 or 8 subbands
– Various sample rates 16…48 kHz, mono or stereo
– Low latency: even 16 samples

• The APT-X lossless audio codec can be used for high-quality applications, such as movie soundtracks


Unified Speech and Audio Coding (USAC)
• New codec (April 2012) for both music and speech at low bit-rates, 12 … 64 kbit/s

• Specified in MPEG-D (2007), Part 3

• Uses LPC and residual coding for speech

• Uses MDCT-based tools for audio

• Includes MPEG-4 Spectral Band Replication, MPEG Surround (for multi-channel audio coding), and Parametric Stereo

• Performs as well as or better than the previous best speech or audio codecs



Opus
• A new open-source IETF audio coding standard for interactive real-time applications over the Internet (Internet Engineering Task Force, Sept. 2012)

• Based on Skype’s speech codec SILK (linear prediction) and CELT (a low-latency MDCT codec)
– Either or both can be used
– When both are used, SILK codes the lower frequency band (< 8 kHz) and CELT codes the higher frequency band (8-20 kHz)

• Very low latency: 22.5 ms by default

Source: Antti Pakarinen, MSc thesis, 2012


Sound Example: USAC and Opus
• Band
– Original
– USAC 64 kbps
– Opus 64 kbps
– Opus 14 kbps

• Bells
– Original
– USAC 64 kbps
– Opus 64 kbps
– Opus 14 kbps

Source: Antti Pakarinen, MSc thesis, 2012



Use of Lossy Audio Coding

• Only suitable for end users
– For audio that is listened to without further processing
– Real-time streaming over a slow network
– Do not equalize!
– Errors may become audible with low-quality loudspeakers
– Re-compression (tandem coding) must be avoided

• Very useful DSP technology when either the bit-rate or the memory capacity is limited


Conclusion
• The best compression ratio is obtained with lossy coding
– Part of the signal is discarded, accounting for the limitations of hearing

• Lossless coding is an alternative
– A 50% reduction is possible based on the statistical properties of audio signals

• The most common method has been MPEG-1 Layer 3 (MP3)
– Later perhaps mp3PRO or AAC+ or something else

• Audio coding is useful in many applications
– Phones, music players, Internet, PC, digital TV and radio, wireless audio

• CD quality can be obtained with less than 1 bit per sample
– Almost at 48 kbit/s (stereo)



Literature (1)
• Karlheinz Brandenburg, “MP3 and AAC explained,” in Proc. AES 17th International Conference on High Quality Audio Coding, 1999.

• Josh Coalson, “FLAC - Free Lossless Audio Codec”, 2005. http://flac.sourceforge.net/index.html.

• M. Hans and R. Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Processing Magazine, pp. 21–32, July 2001.

• J. D. Johnston, “Perceptual coding of audio signals—a tutorial,” presented at the AES Convention, New York, Sept. 1997.

• P. Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59–81, July 1997.

• T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proc. IEEE, vol. 88, no. 4, pp. 451–513, April 2000.

• K. C. Pohlmann, Principles of Digital Audio. Fourth Edition, McGraw-Hill, 2000.

• T. Ziegler, A. Ehret, P. Ekstrand, and M. Lutzky, “Enhancing mp3 with SBR: Features and capabilities of the new mp3PRO Algorithm,” Audio Engineering Society Convention Paper 5560, Munich, May 2002.


Literature (2)
• F. de Bont, M. Groenewegen and W. Oomen, “A High Quality Audio-Coding System at 128 kb/s,” presented at the 98th AES Convention, Paris, France, Feb. 25-28, 1995.

• Udo Zölzer, Digital Audio Signal Processing, Wiley, 1997. Chapter 9, “Data Compression”, pp. 249–265.

