© 2013 Dolby Laboratories, Inc.
Introducing Audio Signal Processing & Audio Coding
Dr Michael Mason
Snr Staff Eng., Team Lead (Applied Research)
Dolby Australia Pty Ltd
Overview
• Audio Signal Processing Applications
• Audio Signal Processing Basics
• Sampling
• What is an audio signal?
• Signal Processing Domains
• Case Study 1 – Headphone Virtualisation
• Frequency Response
• FIR filtering
• Computational Complexity
• Case Study 2 – Perceptual Audio Coding
• Psychoacoustics
Audio Signal Processing Applications
• Cinema
• Delivering channel based audio - 5.1 – 7.1
• Distribute movies to multiple screens in a multiplex
• Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single channel feeds
• Rendering object based audio – Dolby Atmos
• The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema’s speaker locations
• Speaker equalisation & protection
• Process the audio sent to each speaker to compensate for the frequency response of the speaker cones.
• Ensure that audio sent to the speakers doesn’t overdrive the speaker cones, which would damage them.
Audio Signal Processing Applications
• Broadcast / Home Theatre
• Compression of Audio for DVD / Blu-ray Disc
• Perceptual audio coding (case study later)
• Multi-channel audio coding
• Multiple languages
• Multiple playback formats (stereo / 5.1 / etc)
• Broadcast end-to-end
• Capture, coding, transmission, playback
• AV Receivers (AVRs), Set Top Boxes (STBs)
• Games consoles
Audio Signal Processing Applications
• Personal Audio
• Devices
• Mobile phones (feature phones & smart phones)
• Tablets
• Music players
• PCs
• Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)
• Headphone playback is a big use case (case study later)
Audio Signal Processing Applications
• Voice Processing
• Many of the ‘same’ basic challenges – but because speech has some specifically different characteristics from general audio, different solutions exist
• Speech coders use different approaches than audio codecs
• What makes a good codec is measured differently
• The transmission bandwidth available for the data is much more limited
• Conferencing & Telephony
Audio Signal Processing Basics
• Sampling
• Digital signals have samples which are discrete in time and magnitude
• The process of converting a continuous signal to the digital domain is called sampling
• Two key questions when sampling are: How often to sample & how precisely?
[Figure: signal chain: Analogue to Digital Converter (ADC) → Digital Signal Processing → Digital to Analogue Converter (DAC)]
Audio Signal Processing Basics
• Sampling Frequency – Fs (how often?)
• Number of samples per second
• Nyquist rate:
• Fs must be greater than twice the highest frequency present in the signal
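What happens when the Nyquist condition is violated can be checked numerically. The sketch below is illustrative only: the helper name `apparent_frequency` and the test tones are assumptions, not taken from the slides. It computes where a pure tone ends up after sampling, and confirms that a 30 kHz tone sampled at 48 kHz produces exactly the same samples as an 18 kHz tone.

```python
import math

def apparent_frequency(f_hz, fs_hz):
    """Frequency (Hz) at which a pure tone of f_hz appears after sampling at fs_hz."""
    f_folded = f_hz % fs_hz                   # sampling cannot tell f from f + n*Fs
    return min(f_folded, fs_hz - f_folded)    # fold into the range 0 .. Fs/2

fs = 48_000
for tone in (1_000, 20_000, 30_000):
    print(tone, "Hz appears at", apparent_frequency(tone, fs), "Hz")

# Sample a 30 kHz cosine and an 18 kHz cosine at 48 kHz and compare the samples.
a = [math.cos(2 * math.pi * 30_000 * n / fs) for n in range(16)]
b = [math.cos(2 * math.pi * 18_000 * n / fs) for n in range(16)]
identical = all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print("30 kHz and 18 kHz give the same samples:", identical)
```

Once the samples are identical, no later processing can separate the two tones, which is why the signal is band-limited before the ADC.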
Audio Signal Processing Basics
• Resolution (how precisely?)
• Each sample is represented by a number: how many bits should we use?
• Converting a continuous value to a discrete value requires quantisation.
• Quantisation Error
• ‘1’ → 0.5
• ‘0’ → -0.5
[Figure: 1-bit quantiser; codes 0 and 1 marked on a scale from -1.0 to +1.0]
Audio Signal Processing Basics
• Resolution (how precisely?)
• By using more bits, we reduce the error
… skipping all the math …
• Each additional bit improves SNR (signal to noise ratio) by about 6.02 dB
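The "dB per bit" rule can be measured rather than derived. The following is a minimal sketch, assuming a uniform quantiser on the range -1.0 to +1.0 and a full-scale sine test tone; the function names and the tone frequency are arbitrary choices for illustration. With 1 bit it reproduces the two output values shown above (-0.5 and +0.5).

```python
import math

def quantise(x, bits):
    """Uniform quantiser on [-1, +1) with 2**bits levels; returns the reconstructed value."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(x / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clamp to the valid codes
    return (index + 0.5) * step                             # centre of the chosen cell

def snr_db(bits, n_samples=20_000):
    """Measured signal-to-quantisation-noise ratio for a full-scale sine."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(0.1234567 * n)          # arbitrary, non-repeating test tone
        err = quantise(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

print("1-bit outputs:", quantise(-0.3, 1), quantise(0.3, 1))
for bits in (8, 12, 16):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

For a sine wave the measured values should sit close to the textbook estimate of 6.02 dB per bit plus a constant of about 1.76 dB.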
[Figure: 3-bit quantiser; codes 000 and 101 marked on a scale from -1.0 to +1.0]
Audio Signal Processing Basics
• Audio Signal
• Sampling Frequency
• Human perception – 20 Hz – 20,000 Hz
• Nyquist says Fs > 40 kHz
• CD Audio: 44.1 kHz
• Blu-ray (and before that DAT): 48 kHz
• Bit depth
• Range of loudness relative to human hearing…
• Threshold of hearing – 0 dB
• Jet Engines – 110-140 dB
• Busy Road (standing at the curb) – 100 dB
• Sustained exposure will cause damage – 85 dB
• 16 bits per sample gives ~ 96 dB of dynamic range
• 24 bits per sample = 144 dB
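The dynamic range figures follow from the ratio between the largest and smallest representable amplitudes, 20·log10(2^bits). A short sketch (the function name is an illustrative assumption):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)

for bits in (16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```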
Audio Signal Processing Basics
• Audio Signal
• Raw data rate
• 48 kHz, 16 bits per sample = 768 kbps / ch
• ~3.86 GiB (about 4.15 GB) for a 2 hr movie (5.1 channels, i.e. 6 channels of PCM)
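The raw data rate arithmetic, written out. This sketch assumes that "5.1" is stored as six full PCM channels (five main channels plus the LFE channel) and that the movie is exactly two hours long.

```python
FS_HZ = 48_000
BITS_PER_SAMPLE = 16
CHANNELS = 6                 # assumption: 5.1 = 5 main channels + 1 LFE channel
SECONDS = 2 * 60 * 60        # assumption: exactly 2 hours

rate_per_channel_bps = FS_HZ * BITS_PER_SAMPLE            # bits per second, one channel
total_bytes = rate_per_channel_bps * CHANNELS * SECONDS // 8
total_gib = total_bytes / 2 ** 30                         # binary gigabytes

print(rate_per_channel_bps / 1000, "kbps per channel")
print(total_bytes, "bytes =", round(total_gib, 2), "GiB")
```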
Audio Signal Processing Basics
• Processing domains
• Sampled audio, i.e. Pulse Code Modulated (PCM) data, is in the time domain
• Not everything we want to do with audio is formulated as a time domain operation
• E.g.: Flattening the frequency response of a speaker
• The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain
• Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient.
• Signal processing also has other useful transform domains which may offer advantages for specific types of processing (e.g. image coding often uses the discrete cosine transform – DCT)
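To make "expressing a signal in terms of its frequency components" concrete, here is a minimal sketch using a naive discrete Fourier transform. The block length, sample rate and test tone are illustrative assumptions, and real systems would use an FFT rather than this O(N²) loop.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64                      # block length (assumed for the example)
FS_HZ = 48_000              # sample rate (assumed for the example)
TONE_BIN = 4                # test tone placed exactly on bin 4, i.e. 4 * 48000 / 64 Hz

x = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]   # time domain
magnitudes = [abs(v) for v in dft(x)[:N // 2]]                     # frequency domain
peak_bin = max(range(N // 2), key=lambda k: magnitudes[k])
peak_hz = peak_bin * FS_HZ / N

print("strongest component at bin", peak_bin, "=", peak_hz, "Hz")
```

In the time domain the tone is spread across all 64 samples; in the frequency domain it is a single large value, which is what makes operations such as equalisation easy to formulate there.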
HEADPHONE VIRTUALISATION
Case Study 1
Headphone Virtualisation
• How do you get surround sound out of a pair of headphones?
Headphone Virtualisation
• Two things we need to achieve:
1. Make it sound like the audio is coming from different directions
2. Make it sound like the listener is in a room.
• Both can be achieved by filtering the signal: with head-related transfer functions (HRTFs) for direction, and with the impulse response of the room for the room effect.
Headphone Virtualisation
• Room impulse response
• By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response.
https://www.youtube.com/watch?v=PkZjIHTJ4jc
• The impulse response can in turn be used to filter any signal, to make it sound like it was in the room.
• The process of filtering a signal using an impulse response is convolution:
• $y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n-k]$
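For finite-length signals the convolution sum can be written directly as two nested loops. This is a minimal sketch; the toy impulse response (direct sound plus one half-strength echo three samples later) is made up for illustration and is not a measured room.

```python
def convolve(x, h):
    """Direct-form convolution: y[n] = sum over k of h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):          # x is zero outside its stored range
                y[n] += h[k] * x[n - k]
    return y

h = [1.0, 0.0, 0.0, 0.5]    # hypothetical impulse response: direct path + one echo
x = [1.0, 2.0]              # input signal
print(convolve(x, h))       # the input, followed by a delayed half-amplitude copy
```

Each output sample costs one multiply-accumulate per impulse response point, which is where the cost of long room responses comes from.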
Headphone Virtualisation
• Room impulse response
• How many points would be required to capture a room? (i.e. how long is the impulse response?)
• Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz)
• Computational cost: 1440 multiply-accumulates per sample * 48k samples/s -> ~69 MFLOPS
Headphone Virtualisation
• Computational load
• On a DSP chip with a single cycle MAC -> 69 MIPS
• On an ARM, a MAC takes ~3.5 cycles -> ~240 MIPS
• 5.1 channels -> 10 filters = 2,400 MIPS
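The load figures can be reproduced from the 1440-point impulse response. The sketch below only restates the slide arithmetic; the variable names are illustrative, and the cycles-per-MAC value is the rough figure quoted above, not a measured one.

```python
FS_HZ = 48_000
TAPS = 1440                   # impulse response length used in the estimate
ARM_CYCLES_PER_MAC = 3.5      # rough figure quoted above
FILTERS = 10                  # filter count quoted above for 5.1 content

duration_ms = 1000 * TAPS / FS_HZ          # length of the impulse response in ms
macs_per_second = TAPS * FS_HZ             # one MAC per tap per output sample
dsp_mips = macs_per_second / 1e6           # single-cycle MAC: 1 instruction per MAC
arm_mips = dsp_mips * ARM_CYCLES_PER_MAC
total_arm_mips = arm_mips * FILTERS

print(duration_ms, "ms impulse response")
print(round(dsp_mips, 1), "MIPS on a DSP,", round(arm_mips, 1), "MIPS on an ARM")
print(round(total_arm_mips), "MIPS for", FILTERS, "filters")
```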
Headphone Virtualisation
• The solution?
• Convolution in Time domain <-> Multiplication in Frequency Domain
• Fourier Transform the impulse response & the signal
• Block based, e.g., blocks of 2048
• O[N.log2(N)] -> k*22528 ~ 78,848
• Operate in the Frequency domain
• Complex multiplies -> 4 * 2048 -> 8,192
• Transform the result back to the time domain
• Same as forward transform
• Blocks per second?
• 23 blocks/sec … ~4 MFLOPS / filter
• What about the HRTFs ?
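The frequency-domain route can be sketched in a few lines. This is an illustrative single-block version using a textbook recursive radix-2 FFT, not a streaming overlap-add implementation and not the code used in any product. The operation count at the end reproduces the slide estimate; the constant `K = 3.5` is inferred from 78,848 / 22,528 and is an assumption about how the slide figure was obtained.

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fast_convolve(x, h):
    """Convolve x with h by multiplying their spectra."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:                       # zero-pad to a power of two
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]      # multiplication in the frequency domain
    y = fft(Y, inverse=True)
    return [v.real / n for v in y[:n_out]] # normalise the inverse transform

result = fast_convolve([1, 2, 3], [1, 1])
print([round(v, 6) for v in result])

# Operation count per filter, following the slide estimate.
N = 2048
K = 3.5                                    # assumed constant: 78,848 / (2048 * 11)
fft_ops = K * N * math.log2(N)             # one forward (or inverse) transform
multiply_ops = 4 * N                       # complex multiplies, 4 real multiplies each
ops_per_block = 2 * fft_ops + multiply_ops # forward + multiply + inverse
blocks_per_second = 48_000 / N
mflops = ops_per_block * blocks_per_second / 1e6
print(round(mflops, 2), "MFLOPS per filter")
```

The transform of the impulse response is not counted because it can be computed once in advance. A practical streaming scheme also has to handle the overlap between blocks, which this sketch ignores.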
Headphone Virtualisation
• Head-related Transfer Function
• Measured on a dummy
• Applied as filters
• Same computational arguments lead us to the need to apply these in the frequency domain.
• NB: we don’t need to go back to the time domain between the two sets of filters
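The reason no return trip to the time domain is needed is that cascaded filters combine: filtering with the room response and then the HRTF is the same as filtering once with the two responses convolved together (equivalently, with their spectra multiplied). A small sketch with made-up integer responses, which are purely hypothetical values chosen so the comparison is exact:

```python
def convolve(x, h):
    """Direct-form convolution of two finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

x = [1, 2, 1]          # input signal
room = [1, 0, 2]       # hypothetical room impulse response
hrtf = [3, 1]          # hypothetical head-related impulse response

one_after_the_other = convolve(convolve(x, room), hrtf)
combined_filter = convolve(x, convolve(room, hrtf))

print(one_after_the_other)
print(combined_filter)
```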
PERCEPTUAL AUDIO CODING
Case study 2
Perceptual Audio Coding
• How do you reduce the storage and transmission bandwidth requirements of Audio signals?
• Bitrates:
• Uncompressed : 768 kbps / ch
• DVD (AC3) : 448 kbps (5.1 channels)
Perceptual Audio Coding
• Audio Coding is Lossy
• Lossless compression: must perfectly reconstruct its source (e.g. zip files).
• Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’
• Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.
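The difference between the two kinds of compression can be seen on a short PCM buffer. This is only a sketch: `zlib` stands in for "zip-style" lossless compression, and dropping the low 8 bits of each sample is a deliberately crude lossy step chosen for illustration, not how a perceptual codec decides what to discard.

```python
import math
import struct
import zlib

# 0.1 s of a 440 Hz tone as 16-bit PCM at 48 kHz (test signal assumed for the example)
samples = [round(20_000 * math.sin(2 * math.pi * 440 * n / 48_000)) for n in range(4800)]
pcm = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: the decompressed bytes are identical to the original.
restored = zlib.decompress(zlib.compress(pcm))
lossless_ok = restored == pcm

# Crude lossy step: keep only the top 8 bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, coarse))

print("lossless round trip exact:", lossless_ok)
print("largest sample error after the lossy step:", max_error)
```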
Perceptual Audio Coding
[Figure: encoder block diagram with stages: Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation, Entropy coding]
Perceptual Audio Coding
• Psychoacoustics
• Study of sound Perception
• Perception implies the human experience, which includes physiological and psychological factors.
• Is at the heart of the question of which parts of an audio signal are important, or unimportant.
Perceptual Audio Coding
• Psychoacoustics
• Most perceptual quantities are non-linear and subjective
• Loudness
• Non-linearly related to sound pressure
• Scales include: sone, phon
• Pitch
• Non-linearly related to frequency
• Scales include: Bark, Mel, ERB
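The non-linear scales can be illustrated with commonly used textbook approximations. These formulas are not given in the slides: the sone/phon relation below applies above roughly 40 phon, the Mel formula is the widely used 2595·log10(1 + f/700) form, and the Bark formula is the Zwicker and Terhardt approximation. Other variants exist.

```python
import math

def phon_to_sone(phon):
    """Loudness in sones; doubles for every 10 phon (approximation for >= 40 phon)."""
    return 2.0 ** ((phon - 40.0) / 10.0)

def hz_to_mel(f_hz):
    """Mel scale, common 2595 * log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    """Bark scale, Zwicker & Terhardt approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

for f in (100, 1_000, 10_000):
    print(f, "Hz ->", round(hz_to_mel(f)), "mel,", round(hz_to_bark(f), 1), "Bark")
print("50 phon ->", phon_to_sone(50), "sone")
```

Going from 100 Hz to 1 kHz and from 1 kHz to 10 kHz are both factor-of-ten steps in frequency, but they are not equal steps on either perceptual scale.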
Perceptual Audio Coding
• Frequency Masking
Perceptual Audio Coding
• Temporal Masking
Perceptual Audio Coding
• Time/Frequency analysis
• Break the incoming signal into time blocks and transform into the frequency domain
• Coding is always block based
• The frequency representation is analysed in bins of equal perceptual bandwidth (Bark)
• Psychoacoustic analysis
• Use the frequency representation of the current block to calculate the masking curve
• Use the frequency masking curves from previous frames to account for temporal masking
[Figure: encoder block diagram with stages: Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation]
Perceptual Audio Coding
• Masking Curve
• Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’
• If we can’t hear them, we shouldn’t spend bits encoding them
Perceptual Audio Coding
• Bit allocation
• Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands
• Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to the bands until we either run out of bits or meet the SNR requirement in every band
• (any left over bits can be used to code the next frame)
• The bit distribution must be sent to the decoder
• Quantiser
• Quantise the frequency domain representation to send to the decoder.
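The iterative allocation described above can be sketched as a greedy loop. This is a simplified illustration, not the allocation procedure of AC-3 or any other specific codec: the band energies, masking levels and pool size are invented, and each bit is assumed to buy a flat 6.02 dB of SNR in the band it is given to.

```python
DB_PER_BIT = 6.02   # SNR gained per bit given to a band's quantiser

def allocate_bits(signal_db, mask_db, bit_pool):
    """Greedy allocation: give one bit at a time to the band furthest from its target."""
    # Required SNR per band: how far the signal sits above the masking curve.
    # Bands where the mask is above the signal need no bits at all.
    needed = [max(0.0, s - m) for s, m in zip(signal_db, mask_db)]
    bits = [0] * len(needed)
    while bit_pool > 0:
        shortfall = [n - b * DB_PER_BIT for n, b in zip(needed, bits)]
        worst = max(range(len(bits)), key=lambda i: shortfall[i])
        if shortfall[worst] <= 0:       # every band already meets its requirement
            break
        bits[worst] += 1
        bit_pool -= 1
    return bits, bit_pool               # leftover bits can go to the next frame

signal_db = [60, 50, 30, 20]            # hypothetical band energies
mask_db = [40, 45, 35, 10]              # hypothetical masking curve
bits, leftover = allocate_bits(signal_db, mask_db, bit_pool=10)
print("bits per band:", bits, "leftover:", leftover)
```

The third band receives no bits because its masking level is above its signal level, which is the "if we can't hear it, don't spend bits on it" rule in action.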
[Figure: encoder block diagram with stages: Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation]
Perceptual Audio Coding
• Decoding is ‘simple’
• Recreate the frequency representation of each frame
• Transform back to the time domain
• Additional processing can be used to enhance the reconstructed signal