Page 1: Speech Compression

Audio Compression Techniques

MUMT 611, January 2005
Assignment 2
Paul Kolesnik

Page 2: Speech Compression

Introduction

Digital audio compression: removal of redundant or otherwise irrelevant information from an audio signal. Audio compression algorithms are often referred to as "audio encoders".

Applications:
- Reduces required storage space
- Reduces required transmission bandwidth

Page 3: Speech Compression

Audio Compression

Audio signal overview:
- Sampling rate (number of samples per second)
- Bit rate (number of bits per second); a typical uncompressed 16-bit, 44.1 kHz stereo signal has a bit rate of about 1.4 Mbit/s (see the calculation below)
- Number of channels (mono / stereo / multichannel)

Reduction is achieved by lowering those values or by data compression / encoding.
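As a check on that figure, the uncompressed bit rate is simply the product of the sampling rate, bit depth, and channel count. A minimal sketch in plain Python:

    # Uncompressed PCM bit rate = sampling rate x bit depth x channel count.
    sample_rate = 44_100   # samples per second
    bit_depth = 16         # bits per sample
    channels = 2           # stereo

    bit_rate = sample_rate * bit_depth * channels
    print(f"{bit_rate} bits/s = {bit_rate / 1e6:.2f} Mbit/s")
    # -> 1411200 bits/s = 1.41 Mbit/s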

Page 4: Speech Compression

Audio Data Compression

Redundant information:
- Implicit in the remaining information
- Example: an oversampled audio signal

Irrelevant information:
- Perceptually insignificant
- Cannot be recovered from the remaining information

Page 5: Speech Compression

Audio Data Compression

Lossless audio compression:
- Removes redundant data
- The resulting signal is the same as the original (perfect reconstruction)

Lossy audio encoding:
- Removes irrelevant data
- The resulting signal is similar to the original

Page 6: Speech Compression

Audio Data Compression

Audio vs. speech compression techniques:
- Speech compression uses a human vocal tract model to compress signals
- Audio compression does not use this technique, due to the larger variety of possible signal variations

Page 7: Speech Compression

Generic Audio Encoder

[Figure: block diagram of a generic audio encoder]

Page 8: Speech Compression

Generic Audio Encoder

Psychoacoustic model:
- Psychoacoustics: the study of how sounds are perceived by humans
- Uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear
- Detects the conditions under which different audio signal components mask each other

Page 9: Speech Compression

Psychoacoustic Model

Signal masking:
- Threshold cut-off
- Spectral (frequency / simultaneous) masking
- Temporal masking

Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain.

Page 10: Speech Compression

Signal Masking

Threshold cut-off:
- The hearing threshold level is a function of frequency
- Any frequency component below the threshold will not be perceived by the human ear (see the sketch below)

[Figure: hearing threshold level as a function of frequency]
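The slides do not give a formula for the threshold curve; a common approximation in the perceptual-coding literature is Terhardt's threshold-in-quiet. A minimal sketch, assuming that approximation and made-up per-component levels:

    import numpy as np

    def threshold_in_quiet_db(f_hz):
        """Terhardt's approximation of the absolute hearing threshold
        (dB SPL), valid roughly over 20 Hz - 20 kHz."""
        f = np.asarray(f_hz, dtype=float) / 1000.0   # frequency in kHz
        return (3.64 * f ** -0.8
                - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
                + 1e-3 * f ** 4)

    # Hypothetical spectral components: keep only those above the threshold.
    freqs = np.array([100.0, 1000.0, 4000.0, 16000.0])   # Hz
    levels_db = np.array([20.0, 5.0, -10.0, 30.0])       # levels, dB SPL
    print(levels_db > threshold_in_quiet_db(freqs))      # [False True False False]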

Page 11: Speech Compression

Signal Masking

Spectral masking:
- A frequency component can be partly or fully masked by another component that is close to it in frequency
- This shifts the hearing threshold

[Figure: spectral masking shifting the hearing threshold near a strong component]

Page 12: Speech Compression

Signal Masking

Temporal masking:
- A quieter sound can be masked by a louder sound if the two are temporally close
- Sounds that occur both (shortly) before and after the volume increase can be masked

[Figure: temporal masking before and after a loud sound]

Page 13: Speech Compression

Spectral Analysis

Tasks of spectral analysis:
- To derive the masking thresholds that determine which signal components can be eliminated
- To generate a representation of the signal to which the masking thresholds can be applied

Spectral analysis is done through transforms or filter banks.

Page 14: Speech Compression

Spectral Analysis

Transforms:
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT): similar to the FFT, but uses cosine values only
- Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped and windowed version of the DCT (sketched below)
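To make the "overlapped and windowed" point concrete, here is a direct (O(N^2)) sketch of the MDCT definition applied to 50%-overlapped, sine-windowed blocks; the block length and the random test signal are arbitrary choices, not values from the slides:

    import numpy as np

    def mdct(block):
        """Direct MDCT: 2N windowed time samples -> N frequency coefficients."""
        N = len(block) // 2
        n = np.arange(2 * N)
        k = np.arange(N)
        # X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ block

    # 50%-overlapped, sine-windowed analysis, hopping by N samples.
    N = 256
    window = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))
    signal = np.random.randn(4 * N)
    coeffs = [mdct(window * signal[start:start + 2 * N])
              for start in range(0, len(signal) - 2 * N + 1, N)]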

Page 15: Speech Compression

Spectral Analysis

Filter banks:
- Blocks of time samples are passed through a set of bandpass filters
- Masking thresholds are applied to the resulting frequency subband signals
- Polyphase and wavelet banks are the most popular filter structures

Page 16: Speech Compression

Filter Bank Structures

Polyphase filter bank [used in all of the MPEG-1 encoders]:
- The signal is separated into subbands whose widths are equal over the entire frequency range
- The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process); a simplified sketch follows
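A minimal sketch of the signal flow: a cosine-modulated bank of equal-width bandpass filters followed by critical downsampling. This is a simplified stand-in, not MPEG's actual prototype filter; real polyphase implementations factor the computation efficiently and are designed so that aliasing from the downsampling cancels on reconstruction. The tap and band counts are arbitrary:

    import numpy as np

    def uniform_filter_bank(x, num_bands=8, taps=64):
        """Split x into num_bands equal-width subbands, each downsampled
        by num_bands (critical sampling)."""
        n = np.arange(taps) - (taps - 1) / 2
        cutoff = 0.25 / num_bands              # half of one band's width, cycles/sample
        proto = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
        subbands = []
        for b in range(num_bands):
            center = (b + 0.5) / (2 * num_bands)   # band center, cycles/sample
            h = 2 * proto * np.cos(2 * np.pi * center * np.arange(taps))
            y = np.convolve(x, h, mode="same")     # bandpass-filter the input
            subbands.append(y[::num_bands])        # keep every num_bands-th sample
        return subbands

    subbands = uniform_filter_bank(np.random.randn(4096))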

Page 17: Speech Compression

Filter Bank Structures

Wavelet filter bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]:
- Unlike the polyphase bank, the subband widths are not equal (they are wider at higher frequencies)
- This allows for better time resolution (e.g., for short attacks), but at the expense of frequency resolution

Page 18: Speech Compression

Noise Allocation

System task: derive the shifted hearing threshold and apply it to the input signal.
- Anything below the threshold does not need to be transmitted
- Any noise below the threshold is irrelevant

Frequency component quantization:
- Trade-off between space and noise
- The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation (see the sketch below)
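A minimal sketch of that trade-off, using the rule of thumb that each additional bit of quantizer resolution lowers the noise floor by about 6 dB; the per-band levels and thresholds are made-up illustrative numbers:

    import math

    def bits_for_band(signal_db, mask_db):
        """Allocate just enough bits to push quantization noise under the
        masking threshold; bands already below the mask get zero bits."""
        smr = signal_db - mask_db          # signal-to-mask ratio, dB
        return 0 if smr <= 0 else math.ceil(smr / 6.02)

    signals = [60.0, 42.0, 15.0, 30.0]     # per-band signal levels, dB
    masks = [35.0, 40.0, 20.0, 29.0]       # per-band masking thresholds, dB
    print([bits_for_band(s, m) for s, m in zip(signals, masks)])   # [5, 1, 0, 1]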

Page 19: Speech Compression

Noise Allocation

Pre-echo:
- If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: after decoding, there will be audible noise in the silent part of the block
- This is avoided by pre-monitoring the audio data at the encoding stage and splitting the audio into shorter blocks where a pre-echo is likely (a detector is sketched below)
- This does not completely eliminate the pre-echo, but can make it short enough to be masked by the attack (temporal masking)
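A hypothetical attack detector of the kind that could drive such block switching; real encoders use tuned criteria (e.g., perceptual entropy in MPEG AAC), so treat the segment count and energy ratio here as illustrative assumptions only:

    import numpy as np

    def needs_short_blocks(block, num_segments=8, energy_ratio=10.0):
        """Flag a block for short-window coding when the energy jumps
        sharply between consecutive segments (silence -> loud attack)."""
        segments = np.array_split(np.asarray(block, dtype=float), num_segments)
        energies = np.array([np.sum(s ** 2) + 1e-12 for s in segments])
        return bool(np.any(energies[1:] / energies[:-1] > energy_ratio))

    # Silence followed by a loud attack triggers short-block coding.
    block = np.concatenate([np.zeros(896), np.random.randn(128)])
    print(needs_short_blocks(block))   # True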

Page 20: Speech Compression

Pre-echo Effect

[Figure: the pre-echo effect]

Page 21: Speech Compression

Additional Encoding Techniques

Other encoding techniques are available (as alternatives or in combination):
- Predictive coding
- Coupling / delta encoding
- Huffman encoding

Page 22: Speech Compression

Additional Encoding Techniques

Predictive coding:
- Often used in speech and image compression
- Estimates the expected value of each sample based on previous sample values
- Transmits/stores the difference between the expected and received values
- The decoder generates an estimate for the next sample and then adjusts it by the difference stored for the current sample (see the sketch below)
- Used for additional compression in MPEG-2 AAC
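A minimal first-order sketch of the scheme (predict each sample as the previous one, code only the error); MPEG-2 AAC's actual predictor is higher-order and adaptive:

    def dpcm_encode(samples):
        """First-order DPCM: predict each sample as the previous one,
        transmit only the prediction error."""
        prediction, errors = 0, []
        for s in samples:
            errors.append(s - prediction)
            prediction = s
        return errors

    def dpcm_decode(errors):
        """Rebuild the signal by adjusting each prediction by the stored error."""
        prediction, samples = 0, []
        for e in errors:
            prediction += e
            samples.append(prediction)
        return samples

    signal = [10, 12, 13, 13, 11, 8]
    assert dpcm_decode(dpcm_encode(signal)) == signal   # lossless round trip
    print(dpcm_encode(signal))   # [10, 2, 1, 0, -2, -3] -- small, cheap to code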

Page 23: Speech Compression

Additional Encoding Techniques

Coupling / delta encoding:
- Used where the audio signal consists of two or more channels (stereo or surround sound)
- Similarities between the channels are exploited for compression
- A sum of and a difference between the two channels are derived; the difference is usually close to zero and therefore requires less space to encode
- This is a lossless encoding process (see the sketch below)
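A minimal sketch of the sum/difference (mid/side) idea on integer samples; the sample values are made up for illustration:

    def ms_encode(left, right):
        """The sum carries the shared content; the difference is near zero
        for correlated channels and codes cheaply."""
        mid = [l + r for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return mid, side

    def ms_decode(mid, side):
        """Exact inverse: in integer arithmetic the sum and difference share
        parity, so the halves are exact and the process is lossless."""
        left = [(m + s) // 2 for m, s in zip(mid, side)]
        right = [(m - s) // 2 for m, s in zip(mid, side)]
        return left, right

    left = [100, 102, 99, 101]
    right = [101, 101, 98, 100]
    mid, side = ms_encode(left, right)
    print(side)                                   # [-1, 1, 1, 1] -- near zero
    assert ms_decode(mid, side) == (left, right)  # perfect reconstruction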

Page 24: Speech Compression

Additional Encoding Techniques

Huffman coding:
- An information-theory-based technique
- A signal element that reoccurs often is represented by a shorter symbol, and the mapping is stored in a look-up table
- Implemented using look-up tables in both the encoder and the decoder
- Provides substantial lossless compression, but requires high computational power and therefore is not always used
- Used by MPEG-1 and MPEG-2 AAC (a table-building sketch follows)
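A minimal sketch of building such a look-up table with the classic heap-based algorithm; the input string is an arbitrary example:

    import heapq
    from collections import Counter

    def huffman_table(symbols):
        """Build a Huffman look-up table: frequent symbols get shorter codes."""
        heap = [[count, i, {sym: ""}]
                for i, (sym, count) in enumerate(Counter(symbols).items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            lo = heapq.heappop(heap)               # two cheapest subtrees
            hi = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in lo[2].items()}
            merged.update({s: "1" + c for s, c in hi[2].items()})
            heapq.heappush(heap, [lo[0] + hi[0], lo[1], merged])
        return heap[0][2]   # the table shared by encoder and decoder

    data = "aaaaabbbc"
    table = huffman_table(data)
    print(table)                               # {'c': '00', 'b': '01', 'a': '1'}
    bits = "".join(table[s] for s in data)     # 13 bits vs. 72 for raw 8-bit chars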

Page 25: Speech Compression

Encoding - Final Stages

- Audio data is packed into frames
- Frames are stored or transmitted

Page 26: Speech Compression

Conclusion

Bibliography (HTML): http://www.music.mcgill.ca/~pkoles

Questions