Download - Introducing Audio Signal Processing & Audio Codingcs3121/Lectures/UNSW_2014_Q3.pdf · Introducing Audio Signal Processing & Audio Coding Dr Michael Mason ... Introducing Audio Signal

© 2013 Dolby Laboratories, Inc.

Introducing Audio Signal Processing & Audio Coding

Dr Michael Mason

Snr Staff Eng., Team Lead (Applied Research)

Dolby Australia Pty Ltd



Overview

• Audio Signal Processing Applications

• Audio Signal Processing Basics

• Sampling

• What is an audio signal?

• Signal Processing Domains

• Case Study 1 – Headphone Virtualisation

• Frequency Response

• FIR filtering

• Computational Complexity

• Case Study 2 – Perceptual Audio Coding

• Psychoacoustics



Audio Signal Processing Applications

• Cinema

• Delivering channel based audio - 5.1 – 7.1

• Distribute movies to multiple screens in a multiplex

• Cinemas use speaker arrays – rather than single speakers – so processing is required to fill the arrays from single channel feeds

• Rendering object based audio – Dolby Atmos

• Cinema soundtrack is expressed as individual objects and locations - in every cinema the movie is rendered for that specific cinema’s speaker locations

• Speaker equalisation & protection

• Process the audio sent to each speaker to compensate for the frequency response of the speaker cones.

• Ensure that audio sent to the speakers doesn’t overdrive the speaker cones, which would damage them.




• Broadcast / Home Theatre

• Compression of Audio for DVD / Blu-ray Disc

• Perceptual audio coding (case study later)

• Multi-channel audio coding

• Multiple languages

• Multiple playback formats (stereo / 5.1 / etc)

• Broadcast end-to-end

• Capture, coding, transmission, playback

• AV Receivers (AVRs), Set Top Boxes (STBs)

• Games consoles




• Personal Audio

• Devices

• Mobile phones (feature phones & smart phones)

• Tablets

• Music players

• PCs

• Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)

• Headphone playback is a big use case (case study later)




• Voice Processing

• Many of the ‘same’ basic challenges – but because speech has some specifically different characteristics from general audio, different solutions exist

• Speech coders use different approaches than audio codecs

• What makes a good codec is measured differently

• The transmission bandwidths used for the data are much more limited

• Conferencing & Telephony



Audio Signal Processing Basics

• Sampling

• Digital signals have samples which are discrete in time and magnitude

• Process of converting a continuous signal to the digital domain is Sampling

• Two key questions when sampling are: How often to sample & how precisely?


[Figure: block diagram – Analogue to Digital Converter (ADC) -> Digital Signal Processing -> Digital to Analogue Converter (DAC)]



• Sampling Frequency – Fs (how often?)

• Number of samples per second

• Nyquist rate:

• Sample at greater than twice the highest frequency in the signal
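To make the Nyquist condition concrete, the sketch below computes the frequency a pure tone appears to have after sampling. The function name `alias_frequency` is illustrative and not from the slides, and it assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency (Hz) of a pure tone at f_hz after ideal sampling at fs_hz."""
    # Sampling folds every input frequency into the band 0 .. fs/2.
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


# A 10 kHz tone is below fs/2 = 24 kHz, so it is preserved.
print(alias_frequency(10_000, 48_000))  # 10000
# A 30 kHz tone is above fs/2, so it folds down to 18 kHz.
print(alias_frequency(30_000, 48_000))  # 18000
```

Anything above half the sampling frequency is indistinguishable from a lower frequency once sampled, which is why the sampling rate has to exceed twice the highest frequency of interest.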




• Resolution (how precisely?)

• Each sample is represented by a number, how many bits should we use?

• Converting a continuous value to a discrete value requires quantisation.

• Quantisation Error

• ‘1’ → 0.5

• ‘0’ → -0.5


[Figure: 1-bit quantiser – the range -1.0 to +1.0 divided into two cells labelled ‘0’ and ‘1’]



• Resolution (how precisely?)

• By using more bits, we reduce the error

… skipping all the math …

• Each additional bit improves SNR (signal to noise ratio) by 6.02 dB


[Figure: 3-bit quantiser over the range -1.0 to +1.0, with example codes 000 and 101]
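The "roughly 6 dB per bit" rule can be checked numerically. The sketch below is an illustration rather than anything from the slides: it assumes a uniform quantiser over -1.0 to +1.0 that reconstructs at the centre of each cell (so the 1-bit case gives ±0.5, as above) and a full-scale sine wave as the test signal. The names `quantise` and `snr_db` are made up for this example.

```python
import math


def quantise(x: float, bits: int) -> float:
    """Uniform quantiser over -1.0 .. +1.0 with 2**bits cells, reconstructing at cell centres."""
    levels = 2 ** bits
    step = 2.0 / levels
    # Index of the cell containing x, clamped so that x = +1.0 stays in the top cell.
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step


def snr_db(bits: int, n: int = 10_000) -> float:
    """Measured SNR of a full-scale sine wave after quantisation to the given bit depth."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        # Irrational frequency ratio so the samples cover many different phases.
        x = math.sin(2.0 * math.pi * i * (math.sqrt(2.0) / 20.0))
        err = quantise(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (1, 8, 12, 16):
    print(bits, round(snr_db(bits), 1))
```

Beyond the first few bits, the printed SNR should rise by close to 6 dB for each extra bit, since every added bit halves the quantiser step size.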



• Audio Signal

• Sampling Frequency

• Human perception – 20 Hz – 20,000 Hz

• Nyquist says Fs > 40 kHz

• CD Audio: 44.1 kHz

• Blu-ray (and before that DAT): 48 kHz

• Bit depth

• Range of loudness relative to human hearing…

• Threshold of hearing – 0 dB

• Jet Engines – 110-140 dB

• Busy Road (standing at the curb) – 100 dB

• Sustained exposure will cause damage – 85 dB

• 16 bits per sample gives ~ 96 dB of dynamic range

• 24 bits per sample = 144 dB
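The dynamic range figures follow from the ratio between full scale and one quantiser step. A minimal sketch of that arithmetic, with `dynamic_range_db` as an illustrative name:

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantiser step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)


print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```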




• Audio Signal

• Raw data rate

• 48 kHz, 16 bits per sample = 768 kbps / ch

• ~4.15 GB (3.86 GiB) for a 2 hr movie (5.1 channels, counted as 6 full-rate channels)
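The raw data rate figures can be reproduced with a few lines of arithmetic. This sketch assumes the 5.1 layout is stored as six full-rate channels, which is the assumption that reproduces the 3.86 figure; the function names are illustrative.

```python
def raw_bitrate_bps(fs_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return fs_hz * bits_per_sample * channels


def storage_bytes(bitrate_bps: int, seconds: int) -> int:
    """Bytes needed to store a stream of the given bitrate and duration."""
    return bitrate_bps * seconds // 8


per_channel = raw_bitrate_bps(48_000, 16)
movie = storage_bytes(raw_bitrate_bps(48_000, 16, channels=6), 2 * 60 * 60)
print(per_channel)              # 768000
print(movie)                    # 4147200000
print(round(movie / 2**30, 2))  # 3.86
```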




• Processing domains

• Sampled audio i.e., Pulse Code Modulated (PCM) data is in the time domain

• Not everything we want to do with audio is formulated as a time domain operation

• E.g.: Flattening the frequency response of a speaker

• The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it we can formulate processing in the frequency domain

• Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient.

• Signal processing also has other useful transform domains which may offer advantages for specific types of processing (e.g. image coding often uses the discrete cosine transform – DCT)



HEADPHONE VIRTUALISATION

Case Study 1



Headphone Virtualisation

• How do you get surround sound out of a pair of headphones?




• Two things we need to achieve:

1. Make it sound like the audio is coming from different directions

2. Make it sound like the listener is in a room.

• Both can be achieved by filtering the signal: the head-related transfer functions (HRTFs) give the sense of direction, and the impulse response of the room gives the sense of being in a room.




• Room impulse response

• By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to create an impulse response.

https://www.youtube.com/watch?v=PkZjIHTJ4jc

• The impulse response can in turn be used to filter any signal, to make it sound like it was in the room.

• The process of filtering a signal using an impulse response is convolution:

• $y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n-k]$
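For finite-length signals the convolution sum can be written directly as two nested loops. The sketch below is a plain, unoptimised illustration; the three-tap "room" is a made-up toy response (direct sound plus a single echo), not a measured one.

```python
from typing import List, Sequence


def convolve(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct-form convolution y[n] = sum_k h[k] * x[n - k] for finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            # Samples of x outside 0 .. len(x)-1 are treated as zero.
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y


# Toy "room": the direct sound plus one echo, two samples later at half amplitude.
h = [1.0, 0.0, 0.5]
x = [1.0, 2.0, 3.0]
print(convolve(h, x))  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

Each output sample needs one multiply-accumulate per tap of the impulse response, which is what drives the computational cost of long filters.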






• Room impulse response

• How many points would be required to capture a room? (i.e. how long is the impulse response?)

• Limiting the impulse response to 30 ms gives us 1440 points (@48 kHz)

• Considering the computational cost: 1440 × 48k -> ~69 million multiply-accumulates per second (~69 MFLOPS)




• Computational load

• On a DSP chip with a single cycle MAC -> 69 MIPS

• On an ARM, ‘MAC’s ~ 3.5 cycles each -> ~240 MIPS

• 5.1 channels -> 10 filters = 2,400 MIPS
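The load figures above come from one multiply-accumulate (MAC) per filter tap per output sample. A small sketch of that arithmetic, where `direct_filter_cost` is an illustrative name and the cycles-per-MAC values are the ones quoted on the slide:

```python
def direct_filter_cost(taps: int, fs_hz: int, cycles_per_mac: float = 1.0) -> float:
    """Cycles per second needed to run one FIR filter by direct convolution."""
    # One multiply-accumulate per tap for every output sample.
    return taps * fs_hz * cycles_per_mac


taps, fs = 1440, 48_000
print(direct_filter_cost(taps, fs) / 1e6)            # 69.12  (single-cycle MAC)
print(direct_filter_cost(taps, fs, 3.5) / 1e6)       # 241.92 (3.5 cycles per MAC)
print(10 * direct_filter_cost(taps, fs, 3.5) / 1e6)  # 2419.2 (10 filters)
```

The slide rounds these to 69, ~240 and 2,400 MIPS.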




• The solution?

• Convolution in Time domain <-> Multiplication in Frequency Domain

• Fourier Transform the impulse response & the signal

• Block based, e.g., blocks of 2048

• O[N.log2(N)] -> k*22528 ~ 78,848

• Operate in the Frequency domain

• Complex multiplies -> 4 * 2048 -> 8,192

• Transform the result back to the time domain

• Same as forward transform

• Blocks per second?

• 23 blocks/sec … ~4 MFLOPS / filter

• What about the HRTFs?
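The frequency-domain approach above can be sketched in a few lines: transform both sequences, multiply the spectra bin by bin, and transform back. This is a minimal illustration using a textbook recursive radix-2 FFT in pure Python, not an optimised or production implementation. It processes a whole signal at once rather than streaming blocks of 2048, and the names `fft` and `fast_convolve` are chosen for this example.

```python
import cmath
from typing import List, Sequence


def fft(a: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (unnormalised); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fast_convolve(h: Sequence[float], x: Sequence[float]) -> List[float]:
    """Linear convolution of h and x via multiplication in the frequency domain."""
    out_len = len(h) + len(x) - 1
    # Zero-pad to a power of two so circular convolution equals linear convolution.
    n = 1
    while n < out_len:
        n *= 2
    H = fft(list(h) + [0.0] * (n - len(h)))
    X = fft(list(x) + [0.0] * (n - len(x)))
    Y = [hk * xk for hk, xk in zip(H, X)]
    y = fft(Y, inverse=True)
    # The inverse transform above is unnormalised, so divide by n here.
    return [v.real / n for v in y[:out_len]]


print([round(v, 6) for v in fast_convolve([1.0, 0.0, 0.5], [1.0, 2.0, 3.0])])
```

The result should agree with direct time-domain convolution of the same two sequences up to floating-point rounding. The saving comes from the transform costing on the order of N·log2(N) operations per block instead of one multiply-accumulate per tap per sample.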




• Head-related Transfer Function

• Measured on a dummy head

• Applied as filters

• Same computational arguments lead us to the need to apply these in the frequency domain.

• NB: we don’t need to go back to the time domain between the two sets of filters



PERCEPTUAL AUDIO CODING

Case study 2



Perceptual Audio Coding

• How do you reduce the storage and transmission bandwidth requirements of Audio signals?

• Bitrates:

• Uncompressed : 768 kbps / ch

• DVD (AC3) : 448 kbps (5.1 channels)




• Audio Coding is Lossy

• Lossless compression: must perfectly reconstruct its source (e.g. zip files).

• Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’

• Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.





[Figure: encoder block diagram with stages Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation, Entropy coding]



• Psychoacoustics

• Study of sound perception

• Perception implies the human experience – which includes physiological and psychological factors.

• Is at the heart of the question of which parts of an audio signal are important, or unimportant.




• Psychoacoustics

• Most perceptual quantities are non-linear and subjective

• Loudness

• Non-linearly related to sound pressure

• Scales include: sone, phon

• Pitch

• Non-linearly related to frequency

• Scales include: Bark, Mel, ERB
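As an illustration of how non-linear these pitch scales are, the sketch below converts frequencies in Hz to mel and Bark. The formulas are widely used published approximations (the 2595·log10 form for mel, and the Zwicker and Terhardt arctangent form for Bark); the slides do not say which variants the author has in mind, so treat the choice as an assumption.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, using the common 2595 * log10(1 + f / 700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, using the Zwicker and Terhardt arctangent approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


for f in (100, 1_000, 10_000):
    print(f, round(hz_to_mel(f)), round(hz_to_bark(f), 1))
```

On both scales a 1 kHz step at low frequencies covers far more perceptual distance than the same step near 10 kHz.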




• Frequency Masking




• Temporal Masking




• Time/Frequency analysis

• Break the incoming signal into time blocks and transform into the frequency domain

• Coding is always block based

• The frequency representation is analysed in bins of equal perceptual bandwidth (Bark)

• Psychoacoustic analysis

• Use the frequency representation of the current block to calculate the masking curve

• Use the frequency masking curves from previous frames to account for temporal masking


[Figure: encoder block diagram with stages Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation]



• Masking Curve

• Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’

• If we can’t hear them, we shouldn’t spend bits encoding them
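The comparison described above amounts to a per-band test of signal level against the masking curve. A minimal sketch, with hypothetical band levels chosen purely for illustration and `audible_bands` as a made-up name:

```python
from typing import List, Sequence


def audible_bands(signal_db: Sequence[float], mask_db: Sequence[float]) -> List[bool]:
    """True for each band whose signal level is above the masking curve."""
    return [s > m for s, m in zip(signal_db, mask_db)]


# Hypothetical per-band levels in dB, purely for illustration.
signal_db = [60.0, 42.0, 30.0, 55.0]
mask_db = [35.0, 45.0, 38.0, 20.0]
print(audible_bands(signal_db, mask_db))  # [True, False, False, True]
```

Bands marked False lie below the masking curve, so an encoder can spend no bits on them.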



Perceptual Audio Coding

• Bit allocation

• Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands

• Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to the bands until we either run out of bits or meet the SNR requirements in all bands

• (any left over bits can be used to code the next frame)

• The bit distribution must be sent to the decoder

• Quantiser

• Quantise the frequency domain representation to send to the decoder.
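The iterative bit allocation described above can be sketched as a greedy loop: give one bit at a time to whichever band is furthest from its required SNR. This is one simple way to realise the idea, not the allocation scheme of any particular codec; the required SNR values and the name `allocate_bits` are hypothetical.

```python
from typing import List, Sequence

DB_PER_BIT = 6.0  # approximate SNR gain from one extra quantiser bit


def allocate_bits(required_snr_db: Sequence[float], bit_pool: int) -> List[int]:
    """Greedy allocation: repeatedly give one bit to the band furthest from its target."""
    bits = [0] * len(required_snr_db)
    while bit_pool > 0 and bits:
        # Remaining SNR shortfall in each band given the bits allocated so far.
        shortfall = [req - DB_PER_BIT * b for req, b in zip(required_snr_db, bits)]
        worst = max(range(len(bits)), key=lambda i: shortfall[i])
        if shortfall[worst] <= 0:
            break  # every band already meets its SNR requirement
        bits[worst] += 1
        bit_pool -= 1
    return bits


# Hypothetical required SNRs (signal level minus masking threshold) per band, in dB.
required = [25.0, -3.0, 0.0, 35.0]
print(allocate_bits(required, bit_pool=20))  # [5, 0, 0, 6]
print(allocate_bits(required, bit_pool=4))   # [1, 0, 0, 3]
```

With a generous pool the loop stops early and leaves bits over for the next frame; with a small pool it runs out before every band is satisfied, so some quantisation noise rises above the masking curve.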


[Figure: encoder block diagram with stages Time/Frequency analysis, Psychoacoustic analysis, Quantisation, Bit allocation]



• Decoding is ‘simple’

• Recreate the frequency representation of each frame

• Transform back to the time domain

• Additional processing can be used to enhance the reconstructed signal

