Page 1: Speech Compression

Audio Compression Techniques

MUMT 611, January 2005
Assignment 2
Paul Kolesnik

Page 2: Speech Compression

Introduction

Digital audio compression: removal of redundant or otherwise irrelevant information from an audio signal. Audio compression algorithms are often referred to as "audio encoders".

Applications:
- Reduces required storage space
- Reduces required transmission bandwidth

Page 3: Speech Compression

Audio Compression

Audio signal overview:
- Sampling rate (number of samples per second)
- Bit rate (number of bits per second); a typical uncompressed 16-bit, 44.1 kHz stereo signal has a bit rate of about 1.4 Mbit/s (see the calculation below)
- Number of channels (mono / stereo / multichannel)

Reduction is achieved by lowering those values or by data compression / encoding.
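As a check on that figure, the uncompressed bit rate is simply the product of the sampling rate, bit depth, and channel count. A minimal sketch in plain Python:

    # Uncompressed PCM bit rate = sampling rate x bit depth x channel count.
    sample_rate = 44_100   # samples per second
    bit_depth = 16         # bits per sample
    channels = 2           # stereo

    bit_rate = sample_rate * bit_depth * channels
    print(f"{bit_rate} bits/s = {bit_rate / 1e6:.2f} Mbit/s")
    # -> 1411200 bits/s = 1.41 Mbit/s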

Page 4: Speech Compression

Audio Data Compression

Redundant information:
- Implicit in the remaining information
- Example: an oversampled audio signal

Irrelevant information:
- Perceptually insignificant
- Cannot be recovered from the remaining information

Page 5: Speech Compression

Audio Data Compression

Lossless audio compression:
- Removes redundant data
- The resulting signal is the same as the original (perfect reconstruction)

Lossy audio encoding:
- Removes irrelevant data
- The resulting signal is similar to the original

Page 6: Speech Compression

Audio Data Compression

Audio vs. speech compression techniques:
- Speech compression uses a human vocal tract model to compress signals
- Audio compression does not use this technique, due to the larger variety of possible signal variations

Page 7: Speech Compression

Generic Audio Encoder

[Figure: block diagram of a generic audio encoder]

Page 8: Speech Compression

Generic Audio Encoder

Psychoacoustic model:
- Psychoacoustics: the study of how sounds are perceived by humans
- Uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear
- Detects the conditions under which different audio signal components mask each other

Page 9: Speech Compression

Psychoacoustic Model

Signal masking:
- Threshold cut-off
- Spectral (frequency / simultaneous) masking
- Temporal masking

Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain.

Page 10: Speech Compression

Signal Masking

Threshold cut-off:
- The hearing threshold level is a function of frequency
- Any frequency component below the threshold will not be perceived by the human ear (see the sketch below)

[Figure: hearing threshold level as a function of frequency]
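The slides do not give a formula for the threshold curve; a common approximation in the perceptual-coding literature is Terhardt's threshold-in-quiet. A minimal sketch, assuming that approximation and made-up per-component levels:

    import numpy as np

    def threshold_in_quiet_db(f_hz):
        """Terhardt's approximation of the absolute hearing threshold
        (dB SPL), valid roughly over 20 Hz - 20 kHz."""
        f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
        return (3.64 * f ** -0.8
                - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f ** 4)

    # Hypothetical spectral components: keep only those above the threshold.
    freqs = np.array([100.0, 1000.0, 4000.0, 16000.0])   # Hz
    levels_db = np.array([20.0, 5.0, -10.0, 30.0])       # levels, dB SPL
    print(levels_db > threshold_in_quiet_db(freqs))      # [False True False False]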

Page 11: Speech Compression

Signal Masking

Spectral masking:
- A frequency component can be partly or fully masked by another component that is close to it in frequency
- This shifts the hearing threshold

[Figure: spectral masking shifting the hearing threshold near a strong component]

Page 12: Speech Compression

Signal Masking

Temporal masking:
- A quieter sound can be masked by a louder sound if the two are temporally close
- Sounds that occur both (shortly) before and after the volume increase can be masked

[Figure: temporal masking before and after a loud sound]

Page 13: Speech Compression

Spectral Analysis

Tasks of spectral analysis:
- To derive the masking thresholds that determine which signal components can be eliminated
- To generate a representation of the signal to which the masking thresholds can be applied

Spectral analysis is done through transforms or filter banks.

Page 14: Speech Compression

Spectral Analysis

Transforms:
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT): similar to the FFT, but uses cosine values only
- Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped and windowed version of the DCT (sketched below)
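To make the "overlapped and windowed" point concrete, here is a direct (O(N^2)) sketch of the MDCT definition applied to 50%-overlapped, sine-windowed blocks; the block length and the random test signal are arbitrary choices, not values from the slides:

    import numpy as np

    def mdct(block):
        """Direct MDCT: 2N windowed time samples -> N frequency coefficients."""
        N = len(block) // 2
        n = np.arange(2 * N)
        k = np.arange(N)
        # X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ block

    # 50%-overlapped, sine-windowed analysis, hopping by N samples.
    N = 256
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
    signal = np.random.randn(4 * N)
    coeffs = [mdct(window * signal[start:start + 2 * N])
              for start in range(0, len(signal) - 2 * N + 1, N)]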

Page 15: Speech Compression

Spectral Analysis

Filter banks:
- Blocks of time samples are passed through a set of bandpass filters
- Masking thresholds are applied to the resulting frequency subband signals
- Polyphase and wavelet banks are the most popular filter structures

Page 16: Speech Compression

Filter Bank Structures

Polyphase filter bank [used in all of the MPEG-1 encoders]:
- The signal is separated into subbands whose widths are equal over the entire frequency range
- The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process); a simplified sketch follows
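A minimal sketch of the signal flow: a cosine-modulated bank of equal-width bandpass filters followed by critical downsampling. This is a simplified stand-in, not MPEG's actual prototype filter; real polyphase implementations factor the computation efficiently and are designed so that aliasing from the downsampling cancels on reconstruction. The tap and band counts are arbitrary:

    import numpy as np

    def uniform_filter_bank(x, num_bands=8, taps=64):
        """Split x into num_bands equal-width subbands, each downsampled
        by num_bands (critical sampling)."""
        n = np.arange(taps) - (taps - 1) / 2
        cutoff = 0.25 / num_bands              # half of one band's width, cycles/sample
        proto = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
        subbands = []
        for b in range(num_bands):
            center = (b + 0.5) / (2 * num_bands)   # band center, cycles/sample
            h = 2 * proto * np.cos(2 * np.pi * center * np.arange(taps))
            y = np.convolve(x, h, mode="same")     # bandpass-filter the input
            subbands.append(y[::num_bands])        # keep every num_bands-th sample
        return subbands

    subbands = uniform_filter_bank(np.random.randn(4096))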

Page 17: Speech Compression

Filter Bank Structures

Wavelet filter bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]:
- Unlike the polyphase bank, the subband widths are not equal (they are wider at higher frequencies)
- This allows for better time resolution (e.g., for short attacks), but at the expense of frequency resolution

Page 18: Speech Compression

Noise Allocation

System task: derive the shifted hearing threshold and apply it to the input signal.
- Anything below the threshold does not need to be transmitted
- Any noise below the threshold is irrelevant

Frequency component quantization:
- Trade-off between space and noise
- The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation (see the sketch below)
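A minimal sketch of that trade-off, using the rule of thumb that each additional bit of quantizer resolution lowers the noise floor by about 6 dB; the per-band levels and thresholds are made-up illustrative numbers:

    import math

    def bits_for_band(signal_db, mask_db):
        """Allocate just enough bits to push quantization noise under the
        masking threshold; bands already below the mask get zero bits."""
        smr = signal_db - mask_db          # signal-to-mask ratio, dB
        return 0 if smr <= 0 else math.ceil(smr / 6.02)

    signals = [60.0, 42.0, 15.0, 30.0]     # per-band signal levels, dB
    masks = [35.0, 40.0, 20.0, 29.0]       # per-band masking thresholds, dB
    print([bits_for_band(s, m) for s, m in zip(signals, masks)])   # [5, 1, 0, 1]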

Page 19: Speech Compression

Noise Allocation

Pre-echo:
- If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding, there will be audible noise in the silent part of the block
- This is avoided by pre-monitoring the audio data at the encoding stage and splitting the audio into shorter blocks where a pre-echo is likely (a detector is sketched below)
- This does not completely eliminate the pre-echo, but can make it short enough to be masked by the attack (temporal masking)
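A hypothetical attack detector of the kind that could drive such block switching; real encoders use tuned criteria (e.g., perceptual entropy in MPEG AAC), so treat the segment count and energy ratio here as illustrative assumptions only:

    import numpy as np

    def needs_short_blocks(block, num_segments=8, energy_ratio=10.0):
        """Flag a block for short-window coding when the energy jumps
        sharply between consecutive segments (silence -> loud attack)."""
        segments = np.array_split(np.asarray(block, dtype=float), num_segments)
        energies = np.array([np.sum(s ** 2) + 1e-12 for s in segments])
        return bool(np.any(energies[1:] / energies[:-1] > energy_ratio))

    # Silence followed by a loud attack triggers short-block coding.
    block = np.concatenate([np.zeros(896), np.random.randn(128)])
    print(needs_short_blocks(block))   # True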

Page 20: Speech Compression

Pre-echo Effect

[Figure: the pre-echo effect]

Page 21: Speech Compression

Additional Encoding Techniques

Other encoding techniques are available (as alternatives or in combination):
- Predictive coding
- Coupling / delta encoding
- Huffman encoding

Page 22: Speech Compression

Additional Encoding Techniques

Predictive coding:
- Often used in speech and image compression
- Estimates the expected value of each sample based on previous sample values
- Transmits/stores the difference between the expected and received values
- The decoder generates an estimate for the next sample and then adjusts it by the difference stored for the current sample (see the sketch below)
- Used for additional compression in MPEG-2 AAC
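A minimal first-order sketch of the scheme (predict each sample as the previous one, code only the error); MPEG-2 AAC's actual predictor is higher-order and adaptive:

    def dpcm_encode(samples):
        """First-order DPCM: predict each sample as the previous one,
        transmit only the prediction error."""
        prediction, errors = 0, []
        for s in samples:
            errors.append(s - prediction)
            prediction = s
        return errors

    def dpcm_decode(errors):
        """Rebuild the signal by adjusting each prediction by the stored error."""
        prediction, samples = 0, []
        for e in errors:
            prediction += e
            samples.append(prediction)
        return samples

    signal = [10, 12, 13, 13, 11, 8]
    assert dpcm_decode(dpcm_encode(signal)) == signal   # lossless round trip
    print(dpcm_encode(signal))   # [10, 2, 1, 0, -2, -3] -- small, cheap to code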

Page 23: Speech Compression

Additional Encoding Techniques

Coupling / delta encoding:
- Used where the audio signal consists of two or more channels (stereo or surround sound)
- Similarities between the channels are exploited for compression
- A sum of and a difference between the two channels are derived; the difference is usually close to zero and therefore requires less space to encode
- This is a lossless encoding process (see the sketch below)
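A minimal sketch of the sum/difference (mid/side) idea on integer samples; the sample values are made up for illustration:

    def ms_encode(left, right):
        """The sum carries the shared content; the difference is near zero
        for correlated channels and codes cheaply."""
        mid = [l + r for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return mid, side

    def ms_decode(mid, side):
        """Exact inverse: in integer arithmetic the sum and difference share
        parity, so the halves are exact and the process is lossless."""
        left = [(m + s) // 2 for m, s in zip(mid, side)]
        right = [(m - s) // 2 for m, s in zip(mid, side)]
        return left, right

    left = [100, 102, 99, 101]
    right = [101, 101, 98, 100]
    mid, side = ms_encode(left, right)
    print(side)                                   # [-1, 1, 1, 1] -- near zero
    assert ms_decode(mid, side) == (left, right)  # perfect reconstruction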

Page 24: Speech Compression

Additional Encoding Techniques

Huffman coding:
- An information-theory-based technique
- A signal element that reoccurs often is represented by a shorter symbol, and the mapping is stored in a look-up table
- Implemented using look-up tables in both the encoder and the decoder
- Provides substantial lossless compression, but requires high computational power and therefore is not always used
- Used by MPEG-1 and MPEG-2 AAC (a table-building sketch follows)
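A minimal sketch of building such a look-up table with the classic heap-based algorithm; the input string is an arbitrary example:

    import heapq
    from collections import Counter

    def huffman_table(symbols):
        """Build a Huffman look-up table: frequent symbols get shorter codes."""
        heap = [[count, i, {sym: ""}]
                for i, (sym, count) in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)               # two cheapest subtrees
            hi = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in lo[2].items()}
            merged.update({s: "1" + c for s, c in hi[2].items()})
            heapq.heappush(heap, [lo[0] + hi[0], lo[1], merged])
        return heap[0][2]   # the table shared by encoder and decoder

    data = "aaaaabbbc"
    table = huffman_table(data)
    print(table)                               # {'c': '00', 'b': '01', 'a': '1'}
    bits = "".join(table[s] for s in data)     # 13 bits vs. 72 for raw 8-bit chars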

Page 25: Speech Compression

Encoding - Final Stages

- Audio data is packed into frames
- Frames are stored or transmitted

Page 26: Speech Compression

Conclusion

Bibliography (HTML): http://www.music.mcgill.ca/~pkoles

Questions