Tools for Signal Compression

Nicolas Moreau

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc. Adapted and updated from Outils pour la compression des signaux : applications aux signaux audio, published 2009 in France by Hermes Science/Lavoisier. © Institut Télécom et LAVOISIER, 2009.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George's Road
London SW19 4EU
UK

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.iste.co.uk
www.wiley.com

© ISTE Ltd 2011

The rights of Nicolas Moreau to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Cataloging-in-Publication Data

Moreau, Nicolas, 1945-
[Outils pour la compression des signaux. English]
Tools for signal compression / Nicolas Moreau.
p. cm.
Adapted and updated from: Outils pour la compression des signaux : applications aux signaux audio.
Includes bibliographical references and index.
ISBN 978-1-84821-255-8
1. Sound--Recording and reproducing--Digital techniques. 2. Data compression (Telecommunication) 3. Speech processing systems. I. Title.
TK7881.4.M6413 2011
621.389'3--dc22

2011003206

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library.
ISBN 978-1-84821-255-8

Printed and bound in Great Britain by CPI Antony Rowe, Chippenham and Eastbourne.

Table of Contents

Introduction xi

PART 1. TOOLS FOR SIGNAL COMPRESSION 1

Chapter 1. Scalar Quantization 3
1.1. Introduction 3
1.2. Optimum scalar quantization 4
1.2.1. Necessary conditions for optimization 5
1.2.2. Quantization error power 7
1.2.3. Further information 10
1.2.3.1. Lloyd–Max algorithm 10
1.2.3.2. Non-linear transformation 10
1.2.3.3. Scale factor 10
1.3. Predictive scalar quantization 10
1.3.1. Principle 10
1.3.2. Reminders on the theory of linear prediction 12
1.3.2.1. Introduction: least squares minimization 12
1.3.2.2. Theoretical approach 13
1.3.2.3. Comparing the two approaches 14
1.3.2.4. Whitening filter 15
1.3.2.5. Levinson algorithm 16
1.3.3. Prediction gain 17
1.3.3.1. Definition 17
1.3.4. Asymptotic value of the prediction gain 17
1.3.5. Closed-loop predictive scalar quantization 20

Chapter 2. Vector Quantization 23
2.1. Introduction 23
2.2. Rationale 23
2.3. Optimum codebook generation 26
2.4. Optimum quantizer performance 28
2.5. Using the quantizer 30
2.5.1. Tree-structured vector quantization 31
2.5.2. Cartesian product vector quantization 31
2.5.3. Gain-shape vector quantization 31
2.5.4. Multistage vector quantization 31
2.5.5. Vector quantization by transform 31
2.5.6. Algebraic vector quantization 32
2.6. Gain-shape vector quantization 32
2.6.1. Nearest neighbor rule 33
2.6.2. Lloyd–Max algorithm 34

Chapter 3. Sub-band Transform Coding 37
3.1. Introduction 37
3.2. Equivalence of filter banks and transforms 38
3.3. Bit allocation 40
3.3.1. Defining the problem 40
3.3.2. Optimum bit allocation 41
3.3.3. Practical algorithm 43
3.3.4. Further information 43
3.4. Optimum transform 46
3.5. Performance 48
3.5.1. Transform gain 48
3.5.2. Simulation results 51

Chapter 4. Entropy Coding 53
4.1. Introduction 53
4.2. Noiseless coding of discrete memoryless sources 54
4.2.1. Entropy of a source 54
4.2.2. Coding a source 56
4.2.2.1. Definitions 56
4.2.2.2. Uniquely decodable instantaneous code 57
4.2.2.3. Kraft inequality 58
4.2.2.4. Optimal code 58
4.2.3. Theorem of noiseless coding of a memoryless discrete source 60
4.2.3.1. Proposition 1 60
4.2.3.2. Proposition 2 61
4.2.3.3. Proposition 3 61
4.2.3.4. Theorem 62
4.2.4. Constructing a code 62
4.2.4.1. Shannon code 62
4.2.4.2. Huffman algorithm 63
4.2.4.3. Example 1 63
4.2.5. Generalization 64
4.2.5.1. Theorem 64
4.2.5.2. Example 2 65
4.2.6. Arithmetic coding 65
4.3. Noiseless coding of a discrete source with memory 66
4.3.1. New definitions 67
4.3.2. Theorem of noiseless coding of a discrete source with memory 68
4.3.3. Example of a Markov source 69
4.3.3.1. General details 69
4.3.3.2. Example of transmitting documents by fax 70
4.4. Scalar quantizer with entropy constraint 73
4.4.1. Introduction 73
4.4.2. Lloyd–Max quantizer 74
4.4.3. Quantizer with entropy constraint 75
4.4.3.1. Expression for the entropy 76
4.4.3.2. Jensen inequality 77
4.4.3.3. Optimum quantizer 78
4.4.3.4. Gaussian source 78
4.5. Capacity of a discrete memoryless channel 79
4.5.1. Introduction 79
4.5.2. Mutual information 80
4.5.3. Noisy-channel coding theorem 82
4.5.4. Example: symmetrical binary channel 82
4.6. Coding a discrete source with a fidelity criterion 83
4.6.1. Problem 83
4.6.2. Rate–distortion function 84
4.6.3. Theorems 85
4.6.3.1. Source coding theorem 85
4.6.3.2. Combined source-channel coding 85
4.6.4. Special case: quadratic distortion measure 85
4.6.4.1. Shannon's lower bound for a memoryless source 85
4.6.4.2. Source with memory 86
4.6.5. Generalization 87

PART 2. AUDIO SIGNAL APPLICATIONS 89

Chapter 5. Introduction to Audio Signals 91
5.1. Speech signal characteristics 91
5.2. Characteristics of music signals 92
5.3. Standards and recommendations 93
5.3.1. Telephone-band speech signals 93
5.3.1.1. Public telephone network 93
5.3.1.2. Mobile communication 94
5.3.1.3. Other applications 95
5.3.2. Wideband speech signals 95
5.3.3. High-fidelity audio signals 95
5.3.3.1. MPEG-1 96
5.3.3.2. MPEG-2 96
5.3.3.3. MPEG-4 96
5.3.3.4. MPEG-7 and MPEG-21 99
5.3.4. Evaluating the quality 99

Chapter 6. Speech Coding 101
6.1. PCM and ADPCM coders 101
6.2. The 2.4 kbit/s LPC-10 coder 102
6.2.1. Determining the filter coefficients 102
6.2.2. Unvoiced sounds 103
6.2.3. Voiced sounds 104
6.2.4. Determining voiced and unvoiced sounds 106
6.2.5. Bit rate constraint 107
6.3. The CELP coder 107
6.3.1. Introduction 107
6.3.2. Determining the synthesis filter coefficients 109
6.3.3. Modeling the excitation 111
6.3.3.1. Introducing a perceptual factor 111
6.3.3.2. Selecting the excitation model 113
6.3.3.3. Filtered codebook 113
6.3.3.4. Least squares minimization 115
6.3.3.5. Standard iterative algorithm 116
6.3.3.6. Choosing the excitation codebook 117
6.3.3.7. Introducing an adaptive codebook 118
6.3.4. Conclusion 121

Chapter 7. Audio Coding 123
7.1. Principles of "perceptual coders" 123
7.2. MPEG-1 layer 1 coder 126
7.2.1. Time/frequency transform 127
7.2.2. Psychoacoustic modeling and bit allocation 128
7.2.3. Quantization 128
7.3. MPEG-2 AAC coder 130
7.4. Dolby AC-3 coder 134
7.5. Psychoacoustic model: calculating a masking threshold 135
7.5.1. Introduction 135
7.5.2. The ear 135
7.5.3. Critical bands 136
7.5.4. Masking curves 137
7.5.5. Masking threshold 139

Chapter 8. Audio Coding: Additional Information 141
8.1. Low bit rate/acceptable quality coders 141
8.1.1. Tool one: SBR 142
8.1.2. Tool two: PS 143
8.1.2.1. Historical overview 143
8.1.2.2. Principle of PS audio coding 143
8.1.2.3. Results 144
8.1.3. Sound space perception 145
8.2. High bit rate lossless or almost lossless coders 146
8.2.1. Introduction 146
8.2.2. ISO/IEC MPEG-4 standardization 147
8.2.2.1. Principle 147
8.2.2.2. Some details 147

Chapter 9. Stereo Coding: A Synthetic Presentation 149
9.1. Basic hypothesis and notation 149
9.2. Determining the inter-channel indices 151
9.2.1. Estimating the power and the intercovariance 151
9.2.2. Calculating the inter-channel indices 152
9.2.3. Conclusion 154
9.3. Downmixing procedure 154
9.3.1. Development in the time domain 155
9.3.2. In the frequency domain 157
9.4. At the receiver 158
9.4.1. Stereo signal reconstruction 158
9.4.2. Power adjustment 159
9.4.3. Phase alignment 160
9.4.4. Information transmitted via the channel 161
9.5. Draft International Standard 161

PART 3. MATLAB PROGRAMS 163

Chapter 10. A Speech Coder 165
10.1. Introduction 165
10.2. Script for the calling function 165
10.3. Script for called functions 170

Chapter 11. A Music Coder 173
11.1. Introduction 173
11.2. Script for the calling function 173
11.3. Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life, we often come into contact with compressed signals: when using mobile telephones, mp3 players, digital cameras, or DVD players. The signals in each of these applications (telephone-band speech, high fidelity audio, and still or video images) are not only sampled and quantized, to put them into a form suitable for saving in mass storage devices or for sending across networks, but also compressed. The first operation is very basic and is presented in all courses and introductory books on signal processing. The second operation is more specific and is the subject of this book: the standard tools for signal compression are presented first, followed by examples of how these tools are applied in compressing speech and musical audio signals. The first part of this book focuses on a problem that is theoretical in nature: minimizing the mean squared error. The second part is more concrete: it qualifies the preceding developments by seeking to minimize the bit rate while respecting psychoacoustic constraints. We will see that signal compression consists of seeking not only to eliminate all redundant parts of the original signal but also to eliminate the inaudible parts of the signal.

The compression techniques presented in this book are not new. They are explained in the theoretical framework of information theory and source coding, which aims to formalize the first (and the last) element in a digital communication channel: the encoding of an analog signal (with continuous time and continuous values) into a digital signal (with discrete time and discrete values). The techniques come from the work by C. Shannon published at the beginning of the 1950s. However, except for the development of speech coders in the 1970s to promote an entirely digitally switched telephone network, these techniques really came into use only toward the end of the 1980s, under the influence of working groups such as the "Group Special Mobile (GSM)", the "Joint Photographic Experts Group (JPEG)", and the "Moving Picture Experts Group (MPEG)".

The results of these techniques are quite impressive and have allowed the development of the applications referred to earlier. Let us consider the example of a music signal. We know that a music signal can be reconstructed with quasi-perfect quality (CD quality) if it is sampled at a frequency of 44.1 kHz and quantized at a resolution of 16 bits. When transferred across a network, the required bit rate for a mono channel is 705 kb/s. The most successful audio coder, MPEG-4 AAC, ensures "transparency" at a bit rate of the order of 64 kb/s, giving a compression rate greater than 10, and the completely new coder MPEG-4 HE-AACv2, standardized in 2004, provides a very acceptable quality (for video on mobile phones) at 24 kb/s for 2 stereo channels. The compression rate is then better than 50.
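To make the arithmetic behind these figures explicit, here is a quick check in a few lines of MATLAB (the language of the programs in Part 3; this small computation is illustrative and is not one of the book's programs):

% CD-quality bit rate and the compression ratios quoted above
fs = 44100;                    % sampling frequency (Hz)
b = 16;                        % resolution (bits per sample)
R_mono = fs * b                % 705600 bit/s, i.e. about 705 kb/s per mono channel
ratio_aac = R_mono / 64e3      % MPEG-4 AAC at 64 kb/s (mono): ratio of about 11
R_stereo = 2 * R_mono;         % 1411200 bit/s for the two stereo channels
ratio_heaac = R_stereo / 24e3  % MPEG-4 HE-AACv2 at 24 kb/s (stereo): ratio of about 59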

In Part 1 of this book, the standard tools (scalar quantization, predictive quantization, vector quantization, transform and sub-band coding, and entropy coding) are presented. To compare the performance of these tools, we use an academic example: the quantization of a realization x(n) of a one-dimensional random process X(n). Although this is a theoretical approach, it not only allows an objective assessment of performance but also shows the coherence among all the available tools. In Part 2, we concentrate on the compression of audio signals (telephone-band speech, wideband speech, and high fidelity audio signals).

Throughout this book, we discuss the basic ideas of signal processing using the following language and notation. We consider a one-dimensional, stationary, zero-mean random process X(n) with power σ²_X and power spectral density S_X(f). We also assume that it is Gaussian, primarily because the Gaussian distribution is preserved under all linear transformations, especially filtering, which greatly simplifies the notation, and also because a Gaussian signal is the most difficult signal to encode: it carries the greatest quantization error for any bit rate. A column vector of N dimensions is denoted by X(m) and constructed from X(mN) · · · X(mN + N − 1). These N random variables are completely defined statistically by their probability density function

\[ p_X(x) = \frac{1}{(2\pi)^{N/2}\sqrt{\det R_X}} \exp\left( -\frac{1}{2}\, x^t R_X^{-1} x \right) \]

where R_X is the autocovariance matrix

\[ R_X = E\{X(m)X^t(m)\} = \begin{bmatrix} r_X(0) & r_X(1) & \cdots & r_X(N-1) \\ r_X(1) & r_X(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_X(1) \\ r_X(N-1) & \cdots & r_X(1) & r_X(0) \end{bmatrix} \]

a Toeplitz matrix with N × N dimensions. Moreover, we assume an auto-regressive process X(n) of order P, obtained by filtering a white noise W(n) of variance σ²_W through a filter of order P with transfer function 1/A(z), for A(z) of the form

\[ A(z) = 1 + a_1 z^{-1} + \cdots + a_P z^{-P} \]


The purpose of considering the quantization of an auto-regressive waveform as our example is that it allows a simple explanation of all the statistical characteristics of the source waveform as a function of the parameters of the filter, for example, the power spectral density

\[ S_X(f) = \frac{\sigma_W^2}{|A(f)|^2} \]

where the notation A(f) is inaccurate and should more properly be written A(exp(j2πf)). It also allows us to give analytical expressions for the quantization error power of the different quantization methods when the quadratic error is chosen as the measure of distortion. Comparison of the performance of the different methods is thereby possible. From a practical point of view, this example is not useless because it is a reasonable model for a number of signals, for example, for speech signals (which are only locally stationary) when the order P selected is high enough (e.g. 8 or 10).
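As an illustration, here is a minimal MATLAB sketch of this source model; the filter coefficients are arbitrary choices for the sketch, not values from the book, and freqz and xcorr belong to the Signal Processing Toolbox:

% An AR(2) realization x(n): white noise W(n) filtered by 1/A(z)
a = [1 -1.2 0.8];                 % A(z) = 1 - 1.2 z^-1 + 0.8 z^-2 (P = 2, stable)
sigma2_W = 1;                     % variance of the white noise W(n)
w = sqrt(sigma2_W) * randn(1e5, 1);
x = filter(1, a, w);              % realization of the AR process X(n)
f = linspace(0, 0.5, 512);        % normalized frequency
Sx = sigma2_W ./ abs(freqz(a, 1, 2*pi*f)).^2;  % model PSD sigma2_W / |A(f)|^2
% the autocovariance matrix R_X is Toeplitz, as stated above (here N = 5):
r = xcorr(x, 4, 'biased');        % estimates of r_X(-4) ... r_X(4)
R_X = toeplitz(r(5:9));           % 5 x 5 Toeplitz matrix built from r_X(0) ... r_X(4)

Comparing a periodogram of x with Sx shows the agreement between the realization and the model spectrum.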

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

1.1. Introduction

Let us consider a discrete-time signal x(n) with values in the range [−A, +A]. Defining a scalar quantization with a resolution of b bits per sample requires three operations:

– partitioning the range [−A, +A] into L = 2^b non-overlapping intervals Θ_1 · · · Θ_L of lengths Δ_1 · · · Δ_L;

– numbering the partitioned intervals i_1 · · · i_L;

– selecting the reproduction value for each interval; the set of these reproduction values forms a dictionary (codebook)¹ C = {x̂_1, · · · , x̂_L}.

Encoding (in the transmitter) consists of deciding to which interval x(n) belongs and then associating it with the corresponding number i(n) ∈ {1 · · · L = 2^b}. It is the number of the chosen interval, the symbol, which is transmitted or stored. The decoding procedure (at the receiver) involves associating with the number i(n) the corresponding reproduction value x̂(n) = x̂_{i(n)} from the set {x̂_1 · · · x̂_L}. More formally, we observe that quantization is a non-bijective mapping of [−A, +A] into a finite set C, with the assignment rule

\[ \hat{x}(n) = \hat{x}_{i(n)} \in \{\hat{x}_1, \cdots, \hat{x}_L\} \quad \text{iff} \quad x(n) \in \Theta_i \]

The process is irreversible and involves a loss of information: a quantization error, which is defined as q(n) = x(n) − x̂(n). The definition of a distortion measure d[x(n), x̂(n)] is required. We use the simplest distortion measure, the quadratic error:

\[ d[x(n), \hat{x}(n)] = |x(n) - \hat{x}(n)|^2 \]

This measures the error in each sample. For a more global distortion measure, we use the mean squared error (MSE):

\[ D = E\{|X(n) - \hat{X}(n)|^2\} \]

This error is simply called the quantization error power. We use the notation σ²_Q for the MSE.

1. In scalar quantization, we usually speak about quantization levels, quantization steps, and decision thresholds. This language is also adopted for vector quantization.

Figure 1.1(a) shows the signal before quantization and the partition of the range [−A, +A] for b = 3, and Figure 1.1(b) shows the reproduction values, the reconstructed signal, and the quantization error. The bitstream between the transmitter and the receiver is not shown.


Figure 1.1. (a) The signal before quantization and the partition of the range [−A, +A]; (b) the set of reproduction values, the reconstructed signal, and the quantization error
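To fix the encoding/decoding mechanics before seeking the optimal quantizer, here is a minimal MATLAB sketch of the uniform quantizer pictured in Figure 1.1 (illustrative code, not one of the Part 3 programs):

% Uniform scalar quantizer on [-A, +A] with b = 3 (L = 8), as in Figure 1.1
A = 8; b = 3; L = 2^b;
Delta = 2*A / L;                           % common interval length
codebook = (-A + Delta/2 : Delta : A)';    % reproduction values xhat_1 ... xhat_L
x = A * (2*rand(50, 1) - 1);               % a test signal with values in [-A, +A]
idx = min(floor((x + A) / Delta) + 1, L);  % encoder: interval number i(n)
xr = codebook(idx);                        % decoder: table lookup, xhat(n)
q = x - xr;                                % quantization error q(n)
sigma2_Q = mean(q.^2)                      % empirical quantization error power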

The problem now consists of defining the optimal quantization, that is, of defining the intervals Θ_1 · · · Θ_L and the set of reproduction values {x̂_1 · · · x̂_L} that minimize σ²_Q.

1.2. Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random process X(n). In scalar quantization, what matters is the distribution of values that the random process X(n) takes at time n; no direct use can be made of the correlation that exists between the values of the process at different times. It is enough to know the marginal probability density function of X(n), which is written p_X(·).

1.2.1. Necessary conditions for optimization

To characterize the optimum scalar quantization, the range partition and reproduction values must be found which minimize

\[ \sigma_Q^2 = E\{[X(n) - \hat{X}(n)]^2\} = \sum_{i=1}^{L} \int_{u \in \Theta_i} (u - \hat{x}_i)^2\, p_X(u)\, du \]  [1.1]

This joint minimization is not simple to solve. However, the two necessary conditions for optimization are straightforward to find: if the reproduction values x̂_1 · · · x̂_L are known, the best partition Θ_1 · · · Θ_L can be calculated; once the partition is found, the best reproduction values can be deduced. The encoding part of the quantizer must be optimal for the given decoding part, and vice versa. These two necessary conditions are simple to find when the squared error is chosen as the measure of distortion.

– Condition 1: Given a codebook {x̂_1 · · · x̂_L}, the best partition satisfies

\[ \Theta_i = \{ x : (x - \hat{x}_i)^2 \le (x - \hat{x}_j)^2 \ \ \forall j \in \{1, \cdots, L\} \} \]

This is the nearest neighbor rule.

If t_i denotes the boundary between the intervals Θ_i and Θ_{i+1}, the value of t_i minimizing the MSE σ²_Q is found by setting

\[ \frac{\partial}{\partial t_i} \left[ \int_{t_{i-1}}^{t_i} (u - \hat{x}_i)^2 p_X(u)\, du + \int_{t_i}^{t_{i+1}} (u - \hat{x}_{i+1})^2 p_X(u)\, du \right] = 0 \]

\[ (t_i - \hat{x}_i)^2 p_X(t_i) - (t_i - \hat{x}_{i+1})^2 p_X(t_i) = 0 \]

such that

\[ t_i = \frac{\hat{x}_i + \hat{x}_{i+1}}{2} \]

– Condition 2: Given a partition Θ_1 · · · Θ_L, the optimum reproduction values are given by the centroid (or center of gravity) of the probability density function over Θ_i:

\[ \hat{x}_i = \frac{\int_{u \in \Theta_i} u\, p_X(u)\, du}{\int_{u \in \Theta_i} p_X(u)\, du} = E\{X \mid X \in \Theta_i\} \]  [1.2]


First, note that minimizing σ²_Q relative to x̂_i involves only one element of the sum given in [1.1]. From

\[ \frac{\partial}{\partial \hat{x}_i} \int_{u \in \Theta_i} (u - \hat{x}_i)^2 p_X(u)\, du = 0 \]

\[ -2 \int_{u \in \Theta_i} u\, p_X(u)\, du + 2 \hat{x}_i \int_{u \in \Theta_i} p_X(u)\, du = 0 \]

we find the first identity of equation [1.2].

Since

\[ \int_{u \in \Theta_i} u\, p_X(u)\, du = \int_{u \in \Theta_i} p_X(u)\, du \int_{-\infty}^{\infty} u\, p_{X|\Theta_i}(u)\, du \]

where p_{X|Θ_i} is the conditional probability density function of X given X ∈ Θ_i, we find

\[ \hat{x}_i = \int_{-\infty}^{\infty} u\, p_{X|\Theta_i}(u)\, du = E\{X \mid X \in \Theta_i\} \]

The required value is the mean value of X in the interval under consideration.²
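These two conditions suggest the alternating procedure developed later as the Lloyd–Max algorithm (section 1.2.3.1): apply Condition 1 and Condition 2 in turn until the codebook stabilizes. A minimal MATLAB sketch, assuming a zero-mean, unit-variance Gaussian source approximated by a large sample (the initial codebook and iteration count are arbitrary choices):

% Lloyd-Max iteration built from Conditions 1 and 2 (b = 3, so L = 8)
L = 8;
u = randn(1e5, 1);                  % samples standing in for p_X(u)
xhat = linspace(-2, 2, L)';         % initial reproduction values
for it = 1:50
    t = (xhat(1:end-1) + xhat(2:end)) / 2;  % Condition 1: t_i = (xhat_i + xhat_i+1)/2
    idx = discretize(u, [-inf; t; inf]);    % partition Theta_1 ... Theta_L
    for k = 1:L                             % Condition 2: centroid of each interval
        xhat(k) = mean(u(idx == k));
    end
end
sigma2_Q = mean((u - xhat(idx)).^2)  % quantization error power of the final codebook

Each pass of the two rules can only decrease σ²_Q, so the iteration converges to a codebook satisfying both necessary conditions (about 0.0345 here, per Max's classical tables for the 8-level Gaussian quantizer).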

It can be demonstrated that these two optimization conditions are not sufficient to guarantee an optimum quantizer, except in the case of a Gaussian distribution.

Note that detailed knowledge of the partition is not necessary: the partition is entirely determined by the distortion measure, the nearest neighbor rule, and the set of reproduction values. Figure 1.2 shows a diagram of the encoder and decoder.

Figure 1.2. Encoder and decoder: the encoder maps x(n) to the index i(n) by the nearest neighbor rule against {x̂_1, · · · , x̂_L}; the decoder recovers x̂(n) by looking up i(n) in a table

2. This result can be interpreted in a mechanical system: the moment of inertia of an object with respect to a point is at a minimum when that point is the center of gravity.

Page 2: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

Tools for Signal Compression

Tools for Signal Compression

Nicolas Moreau

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley amp Sons Inc Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage drsquoeacutenergie published 2009 in France by Hermes ScienceLavoisier copy Institut Teacuteleacutecom et LAVOISIER 2009

Apart from any fair dealing for the purposes of research or private study or criticism or review as permitted under the Copyright Designs and Patents Act 1988 this publication may only be reproduced stored or transmitted in any form or by any means with the prior permission in writing of the publishers or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address

ISTE Ltd John Wiley amp Sons Inc 27-37 St Georgersquos Road 111 River Street London SW19 4EU Hoboken NJ 07030 UK USA

wwwistecouk wwwwileycom

copy ISTE Ltd 2011 The rights of Nicolas Moreau to be identified as the author of this work have been asserted by him in accordance with the Copyright Designs and Patents Act 1988 ____________________________________________________________________________________

Library of Congress Cataloging-in-Publication Data Moreau Nicolas 1945- [Outils pour la compression des signaux English] Tools for signal compression Nicolas Moreau p cm Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage denergie Includes bibliographical references and index ISBN 978-1-84821-255-8 1 Sound--Recording and reproducing--Digital techniques 2 Data compression (Telecommunication) 3 Speech processing systems I Title TK78814M6413 2011 6213893--dc22

2011003206

British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 978-1-84821-255-8 Printed and bound in Great Britain by CPI Antony Rowe Chippenham and Eastbourne

Table of Contents

Introduction xi

PART 1 TOOLS FOR SIGNAL COMPRESSION 1

Chapter 1 Scalar Quantization 3

11 Introduction 312 Optimum scalar quantization 4

121 Necessary conditions for optimization 5122 Quantization error power 7123 Further information 10

1231 LloydndashMax algorithm 101232 Non-linear transformation 101233 Scale factor 10

13 Predictive scalar quantization 10131 Principle 10132 Reminders on the theory of linear prediction 12

1321 Introduction least squares minimization 121322 Theoretical approach 131323 Comparing the two approaches 141324 Whitening filter 151325 Levinson algorithm 16

133 Prediction gain 171331 Definition 17

134 Asymptotic value of the prediction gain 17135 Closed-loop predictive scalar quantization 20

Chapter 2 Vector Quantization 23

21 Introduction 2322 Rationale 23

vi Tools for Signal Compression

23 Optimum codebook generation 2624 Optimum quantizer performance 2825 Using the quantizer 30

251 Tree-structured vector quantization 31252 Cartesian product vector quantization 31253 Gain-shape vector quantization 31254 Multistage vector quantization 31255 Vector quantization by transform 31256 Algebraic vector quantization 32

26 Gain-shape vector quantization 32261 Nearest neighbor rule 33262 LloydndashMax algorithm 34

Chapter 3 Sub-band Transform Coding 37

31 Introduction 3732 Equivalence of filter banks and transforms 3833 Bit allocation 40

331 Defining the problem 40332 Optimum bit allocation 41333 Practical algorithm 43334 Further information 43

34 Optimum transform 4635 Performance 48

351 Transform gain 48352 Simulation results 51

Chapter 4 Entropy Coding 53

41 Introduction 5342 Noiseless coding of discrete memoryless sources 54

421 Entropy of a source 54422 Coding a source 56

4221 Definitions 564222 Uniquely decodable instantaneous code 574223 Kraft inequality 584224 Optimal code 58

423 Theorem of noiseless coding of a memoryless discretesource 60

4231 Proposition 1 604232 Proposition 2 614233 Proposition 3 614234 Theorem 62

424 Constructing a code 624241 Shannon code 62

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 3: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

Tools for Signal Compression

Nicolas Moreau

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley amp Sons Inc Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage drsquoeacutenergie published 2009 in France by Hermes ScienceLavoisier copy Institut Teacuteleacutecom et LAVOISIER 2009

Apart from any fair dealing for the purposes of research or private study or criticism or review as permitted under the Copyright Designs and Patents Act 1988 this publication may only be reproduced stored or transmitted in any form or by any means with the prior permission in writing of the publishers or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address

ISTE Ltd John Wiley amp Sons Inc 27-37 St Georgersquos Road 111 River Street London SW19 4EU Hoboken NJ 07030 UK USA

wwwistecouk wwwwileycom

copy ISTE Ltd 2011 The rights of Nicolas Moreau to be identified as the author of this work have been asserted by him in accordance with the Copyright Designs and Patents Act 1988 ____________________________________________________________________________________

Library of Congress Cataloging-in-Publication Data Moreau Nicolas 1945- [Outils pour la compression des signaux English] Tools for signal compression Nicolas Moreau p cm Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage denergie Includes bibliographical references and index ISBN 978-1-84821-255-8 1 Sound--Recording and reproducing--Digital techniques 2 Data compression (Telecommunication) 3 Speech processing systems I Title TK78814M6413 2011 6213893--dc22

2011003206

British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 978-1-84821-255-8 Printed and bound in Great Britain by CPI Antony Rowe Chippenham and Eastbourne

Table of Contents

Introduction xi

PART 1 TOOLS FOR SIGNAL COMPRESSION 1

Chapter 1 Scalar Quantization 3

11 Introduction 312 Optimum scalar quantization 4

121 Necessary conditions for optimization 5122 Quantization error power 7123 Further information 10

1231 LloydndashMax algorithm 101232 Non-linear transformation 101233 Scale factor 10

13 Predictive scalar quantization 10131 Principle 10132 Reminders on the theory of linear prediction 12

1321 Introduction least squares minimization 121322 Theoretical approach 131323 Comparing the two approaches 141324 Whitening filter 151325 Levinson algorithm 16

133 Prediction gain 171331 Definition 17

134 Asymptotic value of the prediction gain 17135 Closed-loop predictive scalar quantization 20

Chapter 2 Vector Quantization 23

21 Introduction 2322 Rationale 23

vi Tools for Signal Compression

23 Optimum codebook generation 2624 Optimum quantizer performance 2825 Using the quantizer 30

251 Tree-structured vector quantization 31252 Cartesian product vector quantization 31253 Gain-shape vector quantization 31254 Multistage vector quantization 31255 Vector quantization by transform 31256 Algebraic vector quantization 32

26 Gain-shape vector quantization 32261 Nearest neighbor rule 33262 LloydndashMax algorithm 34

Chapter 3 Sub-band Transform Coding 37

31 Introduction 3732 Equivalence of filter banks and transforms 3833 Bit allocation 40

331 Defining the problem 40332 Optimum bit allocation 41333 Practical algorithm 43334 Further information 43

34 Optimum transform 4635 Performance 48

351 Transform gain 48352 Simulation results 51

Chapter 4 Entropy Coding 53

41 Introduction 5342 Noiseless coding of discrete memoryless sources 54

421 Entropy of a source 54422 Coding a source 56

4221 Definitions 564222 Uniquely decodable instantaneous code 574223 Kraft inequality 584224 Optimal code 58

423 Theorem of noiseless coding of a memoryless discretesource 60

4231 Proposition 1 604232 Proposition 2 614233 Proposition 3 614234 Theorem 62

424 Constructing a code 624241 Shannon code 62

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

Figure 1.1. (a) The signal before quantization and the partition of the range [−A, +A]; (b) the set of reproduction values, the reconstructed signal, and the quantization error

The problem now consists of defining the optimal quantizer, that is, of defining the intervals Θ_1, ..., Θ_L and the set of reproduction values x̂_1, ..., x̂_L that minimize σ²_Q.

1.2. Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random process X(n). In scalar quantization, what matters is the distribution of the values that the process X(n) takes at a given time n; no other direct use can be made of the correlation between values of the process at different times. It is therefore enough to know the marginal probability density function of X(n), written p_X(·).

1.2.1. Necessary conditions for optimization

To characterize the optimum scalar quantizer, we must find the partition of the range and the reproduction values that minimize

$$\sigma_Q^2 = E\{[X(n) - \hat{X}(n)]^2\} = \sum_{i=1}^{L} \int_{u \in \Theta_i} (u - \hat{x}_i)^2 \, p_X(u) \, du \qquad [1.1]$$

This joint minimization is not simple to solve. However, two necessary conditions for optimality are straightforward to find: if the reproduction values x̂_1, ..., x̂_L are known, the best partition Θ_1, ..., Θ_L can be calculated, and once the partition is known, the best reproduction values can be deduced. In other words, the encoding part of the quantizer must be optimal given the decoding part, and vice versa. These two conditions take a simple form when the squared error is chosen as the distortion measure.

– Condition 1: Given a codebook {x̂_1, ..., x̂_L}, the best partition satisfies

$$\Theta_i = \{x : (x - \hat{x}_i)^2 \le (x - \hat{x}_j)^2 \quad \forall j \in \{1, \dots, L\}\}$$

This is the nearest neighbor rule.

Let t_i denote the boundary between the intervals Θ_i and Θ_{i+1}. The value of t_i that minimizes the MSE σ²_Q is found by setting

$$\frac{\partial}{\partial t_i}\left[\int_{t_{i-1}}^{t_i} (u - \hat{x}_i)^2 \, p_X(u)\, du + \int_{t_i}^{t_{i+1}} (u - \hat{x}_{i+1})^2 \, p_X(u)\, du\right] = 0$$

$$(t_i - \hat{x}_i)^2 \, p_X(t_i) - (t_i - \hat{x}_{i+1})^2 \, p_X(t_i) = 0$$

such that

$$t_i = \frac{\hat{x}_i + \hat{x}_{i+1}}{2}$$

– Condition 2: Given a partition {Θ_1, ..., Θ_L}, the optimum reproduction values are the centroids (centers of gravity) of the probability density function over each region Θ_i:

$$\hat{x}_i = \frac{\int_{u \in \Theta_i} u \, p_X(u)\, du}{\int_{u \in \Theta_i} p_X(u)\, du} = E\{X \mid X \in \Theta_i\} \qquad [1.2]$$


First, note that minimizing σ²_Q with respect to x̂_i involves only one term of the sum in [1.1]. From

$$\frac{\partial}{\partial \hat{x}_i} \int_{u \in \Theta_i} (u - \hat{x}_i)^2 \, p_X(u)\, du = 0$$

$$-2\int_{u \in \Theta_i} u \, p_X(u)\, du + 2\,\hat{x}_i \int_{u \in \Theta_i} p_X(u)\, du = 0$$

we find the first identity of equation [1.2].

Since

$$\int_{u \in \Theta_i} u \, p_X(u)\, du = \int_{u \in \Theta_i} p_X(u)\, du \int_{-\infty}^{\infty} u \, p_{X|\Theta_i}(u)\, du$$

where p_{X|Θ_i} is the conditional probability density function of X given X ∈ Θ_i, we find

$$\hat{x}_i = \int_{-\infty}^{\infty} u \, p_{X|\Theta_i}(u)\, du = E\{X \mid X \in \Theta_i\}$$

The required value is thus the mean value of X in the interval under consideration.²

It can be shown that these two optimization conditions are not sufficient to guarantee an optimal quantizer, except in the case of a Gaussian distribution.
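They nevertheless suggest the iterative design procedure studied in section 1.2.3.1 (the Lloyd–Max algorithm): alternately apply Condition 1 and Condition 2 until σ²_Q stops decreasing, which yields a locally optimal quantizer. Below is a minimal MATLAB sketch on a training set, with hypothetical parameters; empirical means stand in for the centroids of [1.2]:

L = 8;                                  % L = 2^b levels, here b = 3
x = randn(2e4, 1);                      % training samples of a Gaussian source
xhat = linspace(-2, 2, L)';             % arbitrary initial codebook
for it = 1:100
    % Condition 1: with the squared error, the optimal boundaries are
    % the midpoints t_i = (xhat_i + xhat_(i+1))/2.
    t = [-inf; (xhat(1:end-1) + xhat(2:end))/2; inf];
    % Condition 2: each reproduction value becomes the centroid
    % (here, the empirical mean) of its interval.
    for k = 1:L
        sel = x > t(k) & x <= t(k+1);
        if any(sel), xhat(k) = mean(x(sel)); end
    end
end
[~, idx] = min(abs(x - xhat'), [], 2);  % nearest neighbor rule (implicit expansion)
fprintf('sigma_Q^2 = %.4f for L = %d levels\n', mean((x - xhat(idx)).^2), L);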

Note that detailed knowledge of the partition is not necessary: the partition is entirely determined by the distortion measure, the nearest neighbor rule, and the set of reproduction values. Figure 1.2 shows a diagram of the encoder and decoder.

Figure 1.2. Encoder and decoder: the encoder holds the codebook {x̂_1, ..., x̂_L} and applies the nearest neighbor rule to x(n), transmitting the index i(n); the decoder recovers x̂(n) by looking up i(n) in the same table

2. This result has a mechanical interpretation: the moment of inertia of an object with respect to a point is minimal when that point is the center of gravity.

Page 4: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley amp Sons Inc Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage drsquoeacutenergie published 2009 in France by Hermes ScienceLavoisier copy Institut Teacuteleacutecom et LAVOISIER 2009

Apart from any fair dealing for the purposes of research or private study or criticism or review as permitted under the Copyright Designs and Patents Act 1988 this publication may only be reproduced stored or transmitted in any form or by any means with the prior permission in writing of the publishers or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address

ISTE Ltd John Wiley amp Sons Inc 27-37 St Georgersquos Road 111 River Street London SW19 4EU Hoboken NJ 07030 UK USA

wwwistecouk wwwwileycom

copy ISTE Ltd 2011 The rights of Nicolas Moreau to be identified as the author of this work have been asserted by him in accordance with the Copyright Designs and Patents Act 1988 ____________________________________________________________________________________

Library of Congress Cataloging-in-Publication Data Moreau Nicolas 1945- [Outils pour la compression des signaux English] Tools for signal compression Nicolas Moreau p cm Adapted and updated from Outils pour la compression des signaux applications aux signaux audioechnologies du stockage denergie Includes bibliographical references and index ISBN 978-1-84821-255-8 1 Sound--Recording and reproducing--Digital techniques 2 Data compression (Telecommunication) 3 Speech processing systems I Title TK78814M6413 2011 6213893--dc22

2011003206

British Library Cataloguing-in-Publication Data A CIP record for this book is available from the British Library ISBN 978-1-84821-255-8 Printed and bound in Great Britain by CPI Antony Rowe Chippenham and Eastbourne

Table of Contents

Introduction xi

PART 1 TOOLS FOR SIGNAL COMPRESSION 1

Chapter 1 Scalar Quantization 3

11 Introduction 312 Optimum scalar quantization 4

121 Necessary conditions for optimization 5122 Quantization error power 7123 Further information 10

1231 LloydndashMax algorithm 101232 Non-linear transformation 101233 Scale factor 10

13 Predictive scalar quantization 10131 Principle 10132 Reminders on the theory of linear prediction 12

1321 Introduction least squares minimization 121322 Theoretical approach 131323 Comparing the two approaches 141324 Whitening filter 151325 Levinson algorithm 16

133 Prediction gain 171331 Definition 17

134 Asymptotic value of the prediction gain 17135 Closed-loop predictive scalar quantization 20

Chapter 2 Vector Quantization 23

21 Introduction 2322 Rationale 23

vi Tools for Signal Compression

23 Optimum codebook generation 2624 Optimum quantizer performance 2825 Using the quantizer 30

251 Tree-structured vector quantization 31252 Cartesian product vector quantization 31253 Gain-shape vector quantization 31254 Multistage vector quantization 31255 Vector quantization by transform 31256 Algebraic vector quantization 32

26 Gain-shape vector quantization 32261 Nearest neighbor rule 33262 LloydndashMax algorithm 34

Chapter 3 Sub-band Transform Coding 37

31 Introduction 3732 Equivalence of filter banks and transforms 3833 Bit allocation 40

331 Defining the problem 40332 Optimum bit allocation 41333 Practical algorithm 43334 Further information 43

34 Optimum transform 4635 Performance 48

351 Transform gain 48352 Simulation results 51

Chapter 4 Entropy Coding 53

41 Introduction 5342 Noiseless coding of discrete memoryless sources 54

421 Entropy of a source 54422 Coding a source 56

4221 Definitions 564222 Uniquely decodable instantaneous code 574223 Kraft inequality 584224 Optimal code 58

423 Theorem of noiseless coding of a memoryless discretesource 60

4231 Proposition 1 604232 Proposition 2 614233 Proposition 3 614234 Theorem 62

424 Constructing a code 624241 Shannon code 62

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 5: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

Table of Contents

Introduction xi

PART 1 TOOLS FOR SIGNAL COMPRESSION 1

Chapter 1 Scalar Quantization 3

11 Introduction 312 Optimum scalar quantization 4

121 Necessary conditions for optimization 5122 Quantization error power 7123 Further information 10

1231 LloydndashMax algorithm 101232 Non-linear transformation 101233 Scale factor 10

13 Predictive scalar quantization 10131 Principle 10132 Reminders on the theory of linear prediction 12

1321 Introduction least squares minimization 121322 Theoretical approach 131323 Comparing the two approaches 141324 Whitening filter 151325 Levinson algorithm 16

133 Prediction gain 171331 Definition 17

134 Asymptotic value of the prediction gain 17135 Closed-loop predictive scalar quantization 20

Chapter 2 Vector Quantization 23

21 Introduction 2322 Rationale 23

vi Tools for Signal Compression

23 Optimum codebook generation 2624 Optimum quantizer performance 2825 Using the quantizer 30

251 Tree-structured vector quantization 31252 Cartesian product vector quantization 31253 Gain-shape vector quantization 31254 Multistage vector quantization 31255 Vector quantization by transform 31256 Algebraic vector quantization 32

26 Gain-shape vector quantization 32261 Nearest neighbor rule 33262 LloydndashMax algorithm 34

Chapter 3 Sub-band Transform Coding 37

31 Introduction 3732 Equivalence of filter banks and transforms 3833 Bit allocation 40

331 Defining the problem 40332 Optimum bit allocation 41333 Practical algorithm 43334 Further information 43

34 Optimum transform 4635 Performance 48

351 Transform gain 48352 Simulation results 51

Chapter 4 Entropy Coding 53

41 Introduction 5342 Noiseless coding of discrete memoryless sources 54

421 Entropy of a source 54422 Coding a source 56

4221 Definitions 564222 Uniquely decodable instantaneous code 574223 Kraft inequality 584224 Optimal code 58

423 Theorem of noiseless coding of a memoryless discretesource 60

4231 Proposition 1 604232 Proposition 2 614233 Proposition 3 614234 Theorem 62

424 Constructing a code 624241 Shannon code 62

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 6: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

vi Tools for Signal Compression

23 Optimum codebook generation 2624 Optimum quantizer performance 2825 Using the quantizer 30

251 Tree-structured vector quantization 31252 Cartesian product vector quantization 31253 Gain-shape vector quantization 31254 Multistage vector quantization 31255 Vector quantization by transform 31256 Algebraic vector quantization 32

26 Gain-shape vector quantization 32261 Nearest neighbor rule 33262 LloydndashMax algorithm 34

Chapter 3 Sub-band Transform Coding 37

31 Introduction 3732 Equivalence of filter banks and transforms 3833 Bit allocation 40

331 Defining the problem 40332 Optimum bit allocation 41333 Practical algorithm 43334 Further information 43

34 Optimum transform 4635 Performance 48

351 Transform gain 48352 Simulation results 51

Chapter 4 Entropy Coding 53

41 Introduction 5342 Noiseless coding of discrete memoryless sources 54

421 Entropy of a source 54422 Coding a source 56

4221 Definitions 564222 Uniquely decodable instantaneous code 574223 Kraft inequality 584224 Optimal code 58

423 Theorem of noiseless coding of a memoryless discretesource 60

4231 Proposition 1 604232 Proposition 2 614233 Proposition 3 614234 Theorem 62

424 Constructing a code 624241 Shannon code 62

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantizer, the range partition and the reproduction values must be found which minimize

σ²_Q = E{[X(n) − X̂(n)]²} = ∑_{i=1}^{L} ∫_{u∈Θi} (u − x̂i)² pX(u) du        [1.1]

This joint minimization is not simple to solve. However, the two necessary conditions for optimization are straightforward to find. If the reproduction values x̂1, ···, x̂L are known, the best partition Θ1, ···, ΘL can be calculated. Once the partition is found, the best reproduction values can be deduced. The encoding part of quantization must be optimal if the decoding part is given, and vice versa. These two necessary conditions for optimization are simple to find when the squared error is chosen as the measure of distortion.

– Condition 1: Given a codebook {x̂1, ···, x̂L}, the best partition will satisfy

Θi = {x : (x − x̂i)² ≤ (x − x̂j)², ∀ j ∈ {1, ···, L}}

This is the nearest neighbor rule.

If ti denotes the boundary between the intervals Θi and Θ_{i+1}, the value of ti that minimizes the MSE σ²_Q is found by setting

∂/∂ti [ ∫_{t_{i−1}}^{ti} (u − x̂i)² pX(u) du + ∫_{ti}^{t_{i+1}} (u − x̂_{i+1})² pX(u) du ] = 0

(ti − x̂i)² pX(ti) − (ti − x̂_{i+1})² pX(ti) = 0

such that

ti = (x̂i + x̂_{i+1}) / 2

– Condition 2: Given a partition Θ1, ···, ΘL, the optimum reproduction values are given by the centroid (or center of gravity) of the portion of the probability density function over the region Θi:

x̂i = ( ∫_{u∈Θi} u pX(u) du ) / ( ∫_{u∈Θi} pX(u) du ) = E{X | X ∈ Θi}        [1.2]

First, note that minimizing σ²_Q with respect to x̂i involves only one term of the sum given in [1.1]. From

∂/∂x̂i ∫_{u∈Θi} (u − x̂i)² pX(u) du = 0

−2 ∫_{u∈Θi} u pX(u) du + 2 x̂i ∫_{u∈Θi} pX(u) du = 0

we find the first identity of equation [1.2].

Since

∫_{u∈Θi} u pX(u) du = ( ∫_{u∈Θi} pX(u) du ) ∫_{−∞}^{+∞} u pX|Θi(u) du

where pX|Θi is the conditional probability density function of X given that X ∈ Θi, we find

x̂i = ∫_{−∞}^{+∞} u pX|Θi(u) du

that is, x̂i = E{X | X ∈ Θi}. The required value is the mean value of X in the interval under consideration.²

It can be demonstrated that these two optimization conditions are not sufficient to guarantee an optimum quantizer, except in the case of a Gaussian distribution.
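Alternating these two conditions yields the iterative design procedure developed in section 1.2.3.1 (the Lloyd–Max algorithm). As a sketch, under the assumption that the expectations are replaced by averages over a Gaussian training sequence:

% Empirical Lloyd-Max sketch: alternate condition 1 (midpoint
% thresholds) and condition 2 (centroids) on a training set
xtrain = randn(100000, 1);       % training realizations of X(n) (assumed Gaussian)
L = 8;                           % codebook size (b = 3)
xhat = linspace(-2, 2, L).';     % initial reproduction values (assumed)
for iter = 1:100
    t = (xhat(1:L-1) + xhat(2:L)) / 2;         % condition 1: t_i = (xhat_i + xhat_{i+1})/2
    edges = [-inf; t; +inf];
    for i = 1:L                                % condition 2: centroid of each cell
        in_cell = xtrain > edges(i) & xtrain <= edges(i+1);
        if any(in_cell)
            xhat(i) = mean(xtrain(in_cell));   % empirical E{X | X in Theta_i}
        end
    end
end

Each pass can only decrease the empirical σ²_Q, so the procedure converges to a quantizer satisfying both conditions, a local optimum in general.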

Note that detailed knowledge of the partition is not necessary: the partition is determined entirely by the distortion measure, the nearest neighbor rule, and the set of reproduction values. Figure 1.2 shows a diagram of the encoder and decoder.

Figure 1.2. Encoder and decoder: the encoder maps x(n) to the index i(n) by the nearest neighbor rule against the codebook {x̂1, ···, x̂L}; the decoder recovers x̂(n) by looking up i(n) in a table

2. This result can be interpreted in a mechanical system: the moment of inertia of an object with respect to a point is at a minimum when that point is the center of gravity.

Page 7: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

Table of Contents vii

4242 Huffman algorithm 634243 Example 1 63

425 Generalization 644251 Theorem 644252 Example 2 65

426 Arithmetic coding 6543 Noiseless coding of a discrete source with memory 66

431 New definitions 67432 Theorem of noiseless coding of a discrete source with

memory 68433 Example of a Markov source 69

4331 General details 694332 Example of transmitting documents by fax 70

44 Scalar quantizer with entropy constraint 73441 Introduction 73442 LloydndashMax quantizer 74443 Quantizer with entropy constraint 75

4431 Expression for the entropy 764432 Jensen inequality 774433 Optimum quantizer 784434 Gaussian source 78

45 Capacity of a discrete memoryless channel 79451 Introduction 79452 Mutual information 80453 Noisy-channel coding theorem 82454 Example symmetrical binary channel 82

46 Coding a discrete source with a fidelity criterion 83461 Problem 83462 Ratendashdistortion function 84463 Theorems 85

4631 Source coding theorem 854632 Combined source-channel coding 85

464 Special case quadratic distortion measure 854641 Shannonrsquos lower bound for a memoryless source 854642 Source with memory 86

465 Generalization 87

PART 2 AUDIO SIGNAL APPLICATIONS 89

Chapter 5 Introduction to Audio Signals 91

51 Speech signal characteristics 9152 Characteristics of music signals 9253 Standards and recommendations 93

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 8: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

viii Tools for Signal Compression

531 Telephone-band speech signals 935311 Public telephone network 935312 Mobile communication 945313 Other applications 95

532 Wideband speech signals 95533 High-fidelity audio signals 95

5331 MPEG-1 965332 MPEG-2 965333 MPEG-4 965334 MPEG-7 and MPEG-21 99

534 Evaluating the quality 99

Chapter 6 Speech Coding 101

61 PCM and ADPCM coders 10162 The 24 bits LPC-10 coder 102

621 Determining the filter coefficients 102622 Unvoiced sounds 103623 Voiced sounds 104624 Determining voiced and unvoiced sounds 106625 Bit rate constraint 107

63 The CELP coder 107631 Introduction 107632 Determining the synthesis filter coefficients 109633 Modeling the excitation 111

6331 Introducing a perceptual factor 1116332 Selecting the excitation model 1136333 Filtered codebook 1136334 Least squares minimization 1156335 Standard iterative algorithm 1166336 Choosing the excitation codebook 1176337 Introducing an adaptive codebook 118

634 Conclusion 121

Chapter 7 Audio Coding 123

71 Principles of ldquoperceptual codersrdquo 12372 MPEG-1 layer 1 coder 126

721 Timefrequency transform 127722 Psychoacoustic modeling and bit allocation 128723 Quantization 128

73 MPEG-2 AAC coder 13074 Dolby AC-3 coder 13475 Psychoacoustic model calculating a masking threshold 135

751 Introduction 135

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 9: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

Table of Contents ix

752 The ear 135753 Critical bands 136754 Masking curves 137755 Masking threshold 139

Chapter 8 Audio Coding Additional Information 141

81 Low bit rateacceptable quality coders 141811 Tool one SBR 142812 Tool two PS 143

8121 Historical overview 1438122 Principle of PS audio coding 1438123 Results 144

813 Sound space perception 14582 High bit rate lossless or almost lossless coders 146

821 Introduction 146822 ISOIEC MPEG-4 standardization 147

8221 Principle 1478222 Some details 147

Chapter 9 Stereo Coding A Synthetic Presentation 149

91 Basic hypothesis and notation 14992 Determining the inter-channel indices 151

921 Estimating the power and the intercovariance 151922 Calculating the inter-channel indices 152923 Conclusion 154

93 Downmixing procedure 154931 Development in the time domain 155932 In the frequency domain 157

94 At the receiver 158941 Stereo signal reconstruction 158942 Power adjustment 159943 Phase alignment 160944 Information transmitted via the channel 161

95 Draft International Standard 161

PART 3 MATLAB PROGRAMS 163

Chapter 10 A Speech Coder 165

101 Introduction 165102 Script for the calling function 165103 Script for called functions 170

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL

ndash selecting the reproduction value for each interval the set of these reproductionvalues forms a dictionary (codebook) 1 C = x1 middot middot middot xL

Encoding (in the transmitter) consists of deciding which interval x(n) belongsto and then associating it with the corresponding number i(n) isin 1 middot middot middotL = 2bIt is the number of the chosen interval the symbol which is transmitted or storedThe decoding procedure (at the receiver) involves associating the correspondingreproduction value x(n) = xi(n) from the set of reproduction values x1 middot middot middot xLwith the number i(n) More formally we observe that quantization is a non-bijectivemapping to [minusA+A] in a finite set C with an assignment rule

x(n) = xi(n) isin x1 middot middot middot xL iff x(n) isin Θi

The process is irreversible and involves loss of information a quantization errorwhich is defined as q(n) = x(n) minus x(n) The definition of a distortion measure

1 In scalar quantization we usually speak about quantization levels quantization steps anddecision thresholds This language is also adopted for vector quantization

4 Tools for Signal Compression

d[x(n) x(n)] is required We use the simplest distortion measure quadratic error

d[x(n) x(n)] = |x(n) minus x(n)|2

This measures the error in each sample For a more global distortion measure weuse the mean squared error (MSE)

D = E|X(n)minus x(n)|2

This error is simply denoted as the quantization error power We use the notationσ2Q for the MSE

Figure 11(a) shows on the left the signal before quantization and the partition ofthe range [minusA+A] where b = 3 and Figure 11(b) shows the reproduction values thereconstructed signal and the quantization error The bitstream between the transmitterand the receiver is not shown

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

5 10 15 20 25 30 35 40 45 50ndash8

ndash6

ndash4

ndash2

0

2

4

6

8

(a) (b)

Figure 11 (a) The signal before quantization and the partition of the range[minusA+A] and (b) the set of reproduction values reconstructed signal and

quantization error

The problem now consists of defining the optimal quantization that is indefining the intervals Θ1 middot middot middotΘL and the set of reproduction values x1 middot middot middot xL tominimize σ2

Q

12 Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random processX(n) In scalar quantization what matters is the distribution of values that the random

Scalar Quantization 5

processX(n) takes at time n No other direct use of the correlation that exists betweenthe values of the process at different times is possible It is enough to know themarginal probability density function of X(n) which is written as pX()

121 Necessary conditions for optimization

To characterize the optimum scalar quantization the range partition andreproduction values must be found which minimize

σ2Q = E[X(n)minus x(n)]2 =

Lsumi=1

intuisinΘi

(uminus xi)2pX(u)du [11]

This joint minimization is not simple to solve However the two necessaryconditions for optimization are straightforward to find If the reproduction valuesx1 middot middot middot xL are known the best partition Θ1 middot middot middotΘL can be calculated Once thepartition is found the best reproduction values can be deduced The encoding partof quantization must be optimal if the decoding part is given and vice versa Thesetwo necessary conditions for optimization are simple to find when the squared error ischosen as the measure of distortion

ndash Condition 1 Given a codebook x1 middot middot middot xL the best partition will satisfy

Θi = x (xminus xi)2 le (xminus xj)2 forallj isin 1 middot middot middotL

This is the nearest neighbor rule

If we define ti such that it defines the boundary between the intervals Θi and Θi+1minimizing the MSE σ2

Q relative to ti is found by noting

part

partti

[int ti

timinus1

(uminus xi)2pX(u)du+

int ti+1

ti(u minus xi+1)2pX(u)du

]= 0

(ti minus xi)2pX(ti)minus (ti minus xi+1)2pX(ti) = 0

such that

ti =xi + xi+1

2

ndash Condition 2 Given a partition Θ1 middot middot middotΘL the optimum reproduction valuesare found from the centroid (or center of gravity) of the section of the probabilitydensity function in the region of Θi

xi =

intuisinΘi upX(u)duintuisinΘi pX(u)du

= EX |X isin Θi [12]

6 Tools for Signal Compression

First note that minimizing σ2Q relative to xi involves only an element from the sum

given in [11] From the following

part

partxi

intuisinΘi

(uminus xi)2pX(u)du = 0

minus2

intuisinΘi

upX(u)du+ 2xiintuisinΘi

pX(u)du = 0

we find the first identity of equation [12]

SinceintuisinΘi

upX(u)du =

intuisinΘi

pX(u)du

int infin

minusinfinupX|Θi(u)du

where pX|Θi is the conditional probability density function of X where X isin Θi wefind

xi =

int infin

minusinfinupX|Θi(u)du

xi = EX |X isin ΘiThe required value is the mean value of X in the interval under consideration 2

It can be demonstrated that these two optimization conditions are not sufficient toguarantee optimized quantization except in the case of a Gaussian distribution

Note that detailed knowledge of the partition is not necessary The partition isdetermined entirely by knowing the distortion measure applying the nearest neighborrule and from the set of reproduction values Figure 12 shows a diagram of theencoder and decoder

x1 xL x1 xL

i(n)x(n) x(n)Look upin

a table

Nearestneighbor

rule

Figure 12 Encoder and decoder

2 This result can be interpreted in a mechanical system the moment of inertia of an objectwith respect to a point is at a minimum when the point is the center of gravity

Page 10: Tools for Signal Compression€¦ · Table of Contents Introduction..... xi PART1.TOOLS FORSIGNALCOMPRESSION..... 1 Chapter 1. Scalar Quantization..... 3 1.1. Introduction

x Tools for Signal Compression

Chapter 11 A Music Coder 173

111 Introduction 173112 Script for the calling function 173113 Script for called functions 176

Bibliography 195

Index 199

Introduction

In everyday life we often come in contact with compressed signals when usingmobile telephones mp3 players digital cameras or DVD players The signals in eachof these applications telephone-band speech high fidelity audio signal and still orvideo images are not only sampled and quantized to put them into a form suitable forsaving in mass storage devices or to send them across networks but also compressedThe first operation is very basic and is presented in all courses and introductory bookson signal processing The second operation is more specific and is the subject ofthis book here the standard tools for signal compression are presented followedby examples of how these tools are applied in compressing speech and musical audiosignals In the first part of this book we focus on a problem which is theoretical innature minimizing the mean squared error The second part is more concrete andqualifies the previous steps in seeking to minimize the bit rate while respecting thepsychoacoustic constraints We will see that signal compression consists of seekingnot only to eliminate all redundant parts of the original signal but also to attempt theelimination of inaudible parts of the signal

The compression techniques presented in this book are not new They are explainedin theoretical framework information theory and source coding aiming to formalizethe first (and the last) element in a digital communication channel the encodingof an analog signal (with continuous times and continuous values) to a digitalsignal (at discrete times and discrete values) The techniques come from the workby C Shannon published at the beginning of the 1950s However except for thedevelopment of speech encodings in the 1970s to promote an entirely digitallyswitched telephone network these techniques really came into use toward the end ofthe 1980s under the influence of working groups for example ldquoGroup Special Mobile(GSM)rdquo ldquoJoint Photographic Experts Group (JPEG)rdquo and ldquoMoving Picture ExpertsGroup (MPEG)rdquo

The results of these techniques are quite impressive and have allowed thedevelopment of the applications referred to earlier Let us consider the example of

xii Tools for Signal Compression

a music signal We know that a music signal can be reconstructed with quasi-perfectquality (CD quality) if it was sampled at a frequency of 441 kHz and quantized ata resolution of 16 bits When transferred across a network the required bit rate fora mono channel is 705 kbs The most successful audio encoding MPEG-4 AACensures ldquotransparencyrdquo at a bit rate of the order of 64 kbs giving a compression rategreater than 10 and the completely new encoding MPEG-4 HE-AACv2 standardizedin 2004 provides a very acceptable quality (for video on mobile phones) at 24 kbsfor 2 stereo channels The compression rate is better than 50

In the Part 1 of this book the standard tools (scalar quantization predictivequantization vector quantization transform and sub-band coding and entropy coding)are presented To compare the performance of these tools we use an academicexample of the quantization of the realization x(n) of a one-dimensional randomprocess X(n) Although this is a theoretical approach it not only allows objectiveassessment of performance but also shows the coherence between all the availabletools In the Part 2 we concentrate on the compression of audio signals (telephone-band speech wideband speech and high fidelity audio signals)

Throughout this book we discuss the basic ideas of signal processing using thefollowing language and notation We consider a one-dimensional stationary zero-mean random process X(n) with power σ2

X and power spectral density SX(f)We also assume that it is Gaussian primarily because the Gaussian distribution ispreserved in all linear transformations especially in a filter which greatly simplifiesthe notation and also because a Gaussian signal is the most difficult signal to encodebecause it carries the greatest quantization error for any bit rate A column vector ofNdimensions is denoted by X(m) and constructed with X(mN) middot middot middotX(mN +N minus 1)These N random variables are completely defined statistically by their probabilitydensity function

pX(x) =1

(2π)N2radicdetRX

exp(minus1

2xtRminus1

X x)

where RX is the autocovariance matrix

RX = EX(m)Xt(m) =

⎡⎢⎢⎢⎢⎣

rX(0) rX(1) middot middot middot rX(N minus 1)

rX(1)

rX(1)rX(N minus 1) middot middot middot rX(1) rX(0)

⎤⎥⎥⎥⎥⎦

Toeplitz matrix with N times N dimensions Moreover we assume an auto-regressiveprocess X(n) of order P obtained through filtering with white noise W (n) withvariance σ2

W via a filter of order P with a transfer function 1A(z) for A(z) inthe form

A(z) = 1 + a1zminus1 + middot middot middot+ aP z

minusP

Introduction xiii

The purpose of considering the quantization of an auto-regressive waveform as ourexample is that it allows the simple explanation of all the statistical characteristics ofthe source waveform as a function of the parameters of the filter such as for examplethe power spectral density

SX(f) =σ2W

|A(f)|2

where the notation A(f) is inaccurate and should be more properly written asA(exp(j2πf)) It also allows us to give analytical expressions for the quantizationerror power for different quantization methods when quadratic error is chosen as themeasure of distortion Comparison of the performance of the different methods isthereby possible From a practical point of view this example is not useless because itis a reasonable model for a number of signals for example for speech signals (whichare only locally stationary) when the order P selected is high enough (eg 8 or 10)

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

11 Introduction

Let us consider a discrete-time signal x(n) with values in the range [minusA+A]Defining a scalar quantization with a resolution of b bits per sample requires threeoperations

ndash partitioning the range [minusA+A] into L = 2b non-overlapping intervalsΘ1 middot middot middotΘL of length Δ1 middot middot middotΔL

ndash numbering the partitioned intervals i1 middot middot middot iL


Introduction

In everyday life we often come in contact with compressed signals: when using mobile telephones, mp3 players, digital cameras, or DVD players. The signals in each of these applications (telephone-band speech, high fidelity audio signals, and still or video images) are not only sampled and quantized to put them into a form suitable for saving in mass storage devices or for sending across networks, but also compressed. The first operation is very basic and is presented in all courses and introductory books on signal processing. The second operation is more specific and is the subject of this book: here the standard tools for signal compression are presented, followed by examples of how these tools are applied in compressing speech and musical audio signals. In the first part of this book we focus on a problem which is theoretical in nature: minimizing the mean squared error. The second part is more concrete and qualifies the previous steps in seeking to minimize the bit rate while respecting the psychoacoustic constraints. We will see that signal compression consists of seeking not only to eliminate all redundant parts of the original signal but also to attempt the elimination of inaudible parts of the signal.

The compression techniques presented in this book are not new. They are explained in a theoretical framework, information theory and source coding, which aims to formalize the first (and the last) element in a digital communication channel: the encoding of an analog signal (with continuous time and continuous values) into a digital signal (at discrete times and with discrete values). The techniques come from the work by C. Shannon published at the beginning of the 1950s. However, except for the development of speech coding in the 1970s to promote an entirely digitally switched telephone network, these techniques really came into use only toward the end of the 1980s, under the influence of working groups such as the "Group Special Mobile (GSM)", the "Joint Photographic Experts Group (JPEG)", and the "Moving Picture Experts Group (MPEG)".

The results of these techniques are quite impressive and have allowed the development of the applications referred to earlier. Let us consider the example of a music signal. We know that a music signal can be reconstructed with quasi-perfect quality (CD quality) if it is sampled at a frequency of 44.1 kHz and quantized at a resolution of 16 bits. When transferred across a network, the required bit rate for a mono channel is 705 kb/s. The most successful audio coder, MPEG-4 AAC, ensures "transparency" at a bit rate of the order of 64 kb/s, giving a compression ratio greater than 10, and the completely new MPEG-4 HE-AAC v2 coder, standardized in 2004, provides very acceptable quality (for video on mobile phones) at 24 kb/s for 2 stereo channels: the compression ratio is better than 50.
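As a quick check, these figures follow directly from the sampling parameters:

\begin{align*}
R_{\mathrm{mono}} &= 44\,100\ \mathrm{samples/s} \times 16\ \mathrm{bits/sample} = 705.6\ \mathrm{kb/s},\\
R_{\mathrm{stereo}} &= 2 \times 705.6 \approx 1\,411\ \mathrm{kb/s}, \qquad \frac{1\,411}{24} \approx 59 > 50.
\end{align*}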

In Part 1 of this book, the standard tools (scalar quantization, predictive quantization, vector quantization, transform and sub-band coding, and entropy coding) are presented. To compare the performance of these tools, we use an academic example: the quantization of the realization x(n) of a one-dimensional random process X(n). Although this is a theoretical approach, it not only allows an objective assessment of performance but also shows the coherence between all the available tools. In Part 2 we concentrate on the compression of audio signals (telephone-band speech, wideband speech, and high fidelity audio signals).

Throughout this book we discuss the basic ideas of signal processing using the following language and notation. We consider a one-dimensional, stationary, zero-mean random process X(n) with power σ_X² and power spectral density S_X(f). We also assume that it is Gaussian, primarily because the Gaussian distribution is preserved under all linear transformations, especially filtering, which greatly simplifies the notation, and also because a Gaussian signal is the most difficult signal to encode: it carries the greatest quantization error for any bit rate. A column vector of N dimensions is denoted by X(m) and constructed with X(mN), ..., X(mN + N − 1). These N random variables are completely defined statistically by their probability density function

p_X(x) = \frac{1}{(2\pi)^{N/2} \sqrt{\det R_X}} \exp\left( -\frac{1}{2} x^t R_X^{-1} x \right)

where R_X is the autocovariance matrix

R_X = E\{X(m) X^t(m)\} =
\begin{bmatrix}
r_X(0) & r_X(1) & \cdots & r_X(N-1) \\
r_X(1) & r_X(0) & \cdots & r_X(N-2) \\
\vdots & \vdots & \ddots & \vdots \\
r_X(N-1) & \cdots & r_X(1) & r_X(0)
\end{bmatrix}

a Toeplitz matrix of dimensions N × N. Moreover, we assume an auto-regressive process X(n) of order P, obtained by filtering white noise W(n) of variance σ_W² through a filter of order P with transfer function 1/A(z), for A(z) of the form

A(z) = 1 + a_1 z^{-1} + \cdots + a_P z^{-P}

The purpose of considering the quantization of an auto-regressive waveform as our example is that it allows a simple expression of all the statistical characteristics of the source waveform as a function of the parameters of the filter, such as, for example, the power spectral density

S_X(f) = \frac{\sigma_W^2}{|A(f)|^2}

where the notation A(f) is inaccurate and should more properly be written A(exp(j2πf)). It also allows us to give analytical expressions for the quantization error power for the different quantization methods when the quadratic error is chosen as the measure of distortion. Comparison of the performance of the different methods is thereby possible. From a practical point of view, this example is not useless, because it is a reasonable model for a number of signals, for example, for speech signals (which are only locally stationary) when the order P selected is high enough (e.g., 8 or 10).
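As an illustration of this model, here is a minimal numerical sketch (an illustrative example, not taken from the text; the order P = 2 and the coefficients a_1, a_2 are arbitrary stable values): it generates an AR(2) realization by filtering white noise through 1/A(z), evaluates the theoretical power spectral density, and builds the Toeplitz autocovariance matrix R_X from empirical autocovariances.

# Minimal sketch: simulate an AR(P) process X(n) by filtering white noise
# W(n) of variance sigma_W^2 through 1/A(z); coefficients are illustrative.
import numpy as np
from scipy.signal import lfilter, freqz
from scipy.linalg import toeplitz

a = np.array([-0.9, 0.5])               # hypothetical a_1, a_2 (P = 2)
A_poly = np.concatenate(([1.0], a))     # A(z) = 1 + a_1 z^{-1} + a_2 z^{-2}
sigma_W = 1.0

rng = np.random.default_rng(0)
w = sigma_W * rng.standard_normal(100_000)   # white Gaussian noise W(n)
x = lfilter([1.0], A_poly, w)                # realization of the AR process X(n)

f, A_f = freqz(A_poly, worN=512, fs=1.0)     # A(f) = A(exp(j 2 pi f))
S_X = sigma_W**2 / np.abs(A_f)**2            # theoretical PSD S_X(f)

N = 4                                        # vector dimension
r_X = np.array([x[:x.size - k] @ x[k:] / x.size for k in range(N)])
R_X = toeplitz(r_X)                          # empirical N x N Toeplitz matrix R_X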

PART 1

Tools for Signal Compression

Chapter 1

Scalar Quantization

1.1. Introduction

Let us consider a discrete-time signal x(n) with values in the range [−A, +A]. Defining a scalar quantization with a resolution of b bits per sample requires three operations:

– partitioning the range [−A, +A] into L = 2^b non-overlapping intervals Θ_1, ..., Θ_L of lengths Δ_1, ..., Δ_L;

– numbering the partitioned intervals i_1, ..., i_L;

– selecting the reproduction value for each interval; the set of these reproduction values forms a dictionary (codebook)¹ C = {x̂_1, ..., x̂_L}.

1. In scalar quantization, we usually speak about quantization levels, quantization steps, and decision thresholds. This language is also adopted for vector quantization.

Encoding (in the transmitter) consists of deciding which interval x(n) belongs to and then associating it with the corresponding number i(n) ∈ {1, ..., L = 2^b}. It is the number of the chosen interval, the symbol, which is transmitted or stored. The decoding procedure (at the receiver) involves associating the corresponding reproduction value x̂(n) = x̂_{i(n)}, from the set of reproduction values {x̂_1, ..., x̂_L}, with the number i(n). More formally, we observe that quantization is a non-bijective mapping from [−A, +A] onto a finite set C, with the assignment rule

\hat{x}(n) = \hat{x}_{i(n)} \in \{\hat{x}_1, \cdots, \hat{x}_L\} \quad \text{iff} \quad x(n) \in \Theta_i

The process is irreversible and involves loss of information: a quantization error, which is defined as q(n) = x(n) − x̂(n). The definition of a distortion measure d[x(n), x̂(n)] is required. We use the simplest distortion measure, the quadratic error:

d[x(n), \hat{x}(n)] = |x(n) - \hat{x}(n)|^2

This measures the error in each sample. For a more global distortion measure, we use the mean squared error (MSE):

D = E\{|X(n) - \hat{X}(n)|^2\}

This error is simply called the quantization error power. We use the notation σ_Q² for the MSE.
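To make these definitions concrete, the following minimal sketch (not the book's code; the uniform codebook and the Gaussian test signal are assumptions of this illustration) implements the encoder, the decoder, and an empirical estimate of σ_Q²:

# Minimal sketch of a b-bit scalar quantizer on [-A, +A] with a uniform
# codebook (the midpoints of the L intervals); encoding uses the nearest
# neighbor rule, decoding is a table look-up.
import numpy as np

A, b = 8.0, 3
L = 2**b
delta = 2 * A / L
codebook = -A + delta * (np.arange(L) + 0.5)   # reproduction values

rng = np.random.default_rng(0)
x = np.clip(rng.normal(0.0, 2.0, size=5000), -A, A)   # test signal in [-A, +A]

i = np.argmin((x[:, None] - codebook[None, :])**2, axis=1)  # encoder: i(n)
x_hat = codebook[i]                                         # decoder: x_hat(n)

q = x - x_hat                 # quantization error q(n)
sigma2_Q = np.mean(q**2)      # empirical quantization error power (MSE)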

Figure 1.1(a) shows, on the left, the signal before quantization and the partition of the range [−A, +A], where b = 3, and Figure 1.1(b) shows the reproduction values, the reconstructed signal, and the quantization error. The bitstream between the transmitter and the receiver is not shown.

Figure 1.1. (a) The signal before quantization and the partition of the range [−A, +A], and (b) the set of reproduction values, reconstructed signal, and quantization error (both panels: samples 5 to 50, amplitudes −8 to +8)

The problem now consists of defining the optimal quantization, that is, in defining the intervals Θ_1, ..., Θ_L and the set of reproduction values {x̂_1, ..., x̂_L} to minimize σ_Q².

1.2. Optimum scalar quantization

Assume that x(n) is the realization of a real-valued stationary random process X(n). In scalar quantization, what matters is the distribution of the values that the random process X(n) takes at time n. No other direct use of the correlation that exists between the values of the process at different times is possible. It is enough to know the marginal probability density function of X(n), which is written p_X(·).

1.2.1. Necessary conditions for optimization

To characterize the optimum scalar quantization, the range partition and the reproduction values must be found which minimize

\sigma_Q^2 = E\{[X(n) - \hat{X}(n)]^2\} = \sum_{i=1}^{L} \int_{u \in \Theta_i} (u - \hat{x}_i)^2 \, p_X(u) \, du \qquad [1.1]

This joint minimization is not simple to solve. However, the two necessary conditions for optimization are straightforward to find. If the reproduction values x̂_1, ..., x̂_L are known, the best partition Θ_1, ..., Θ_L can be calculated. Once the partition is found, the best reproduction values can be deduced. The encoding part of quantization must be optimal if the decoding part is given, and vice versa. These two necessary conditions for optimization are simple to find when the squared error is chosen as the measure of distortion.

– Condition 1: Given a codebook {x̂_1, ..., x̂_L}, the best partition satisfies

\Theta_i = \{x : (x - \hat{x}_i)^2 \le (x - \hat{x}_j)^2 \quad \forall j \in \{1, \cdots, L\}\}

This is the nearest neighbor rule.

If we define t_i as the boundary between the intervals Θ_i and Θ_{i+1}, minimizing the MSE σ_Q² relative to t_i is achieved by noting that

\frac{\partial}{\partial t_i} \left[ \int_{t_{i-1}}^{t_i} (u - \hat{x}_i)^2 p_X(u) \, du + \int_{t_i}^{t_{i+1}} (u - \hat{x}_{i+1})^2 p_X(u) \, du \right] = 0

(t_i - \hat{x}_i)^2 p_X(t_i) - (t_i - \hat{x}_{i+1})^2 p_X(t_i) = 0

such that

t_i = \frac{\hat{x}_i + \hat{x}_{i+1}}{2}

– Condition 2: Given a partition Θ_1, ..., Θ_L, the optimum reproduction values are given by the centroid (or center of gravity) of the probability density function over the region Θ_i:

\hat{x}_i = \frac{\int_{u \in \Theta_i} u \, p_X(u) \, du}{\int_{u \in \Theta_i} p_X(u) \, du} = E\{X \mid X \in \Theta_i\} \qquad [1.2]


First note that minimizing σ_Q² relative to x̂_i involves only one element of the sum given in [1.1]. From the following:

\frac{\partial}{\partial \hat{x}_i} \int_{u \in \Theta_i} (u - \hat{x}_i)^2 p_X(u) \, du = 0

-2 \int_{u \in \Theta_i} u \, p_X(u) \, du + 2 \hat{x}_i \int_{u \in \Theta_i} p_X(u) \, du = 0

we find the first identity of equation [1.2]. Since

\int_{u \in \Theta_i} u \, p_X(u) \, du = \int_{u \in \Theta_i} p_X(u) \, du \int_{-\infty}^{\infty} u \, p_{X|\Theta_i}(u) \, du

where p_{X|\Theta_i} is the conditional probability density function of X given X ∈ Θ_i, we find

\hat{x}_i = \int_{-\infty}^{\infty} u \, p_{X|\Theta_i}(u) \, du = E\{X \mid X \in \Theta_i\}

The required value is the mean value of X in the interval under consideration.²

It can be demonstrated that these two optimization conditions are not sufficient to guarantee an optimum quantizer, except in the case of a Gaussian distribution.
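Together, the two conditions suggest an iterative design procedure, known as the Lloyd-Max algorithm: alternate the nearest neighbor rule and the centroid rule until the codebook stabilizes. The sample-based sketch below (an illustration; the training data and iteration count are arbitrary assumptions) replaces the expectations with averages over a training sequence:

# Minimal sample-based sketch of the Lloyd-Max iteration: Condition 1
# (nearest neighbor partition) alternates with Condition 2 (centroid update).
import numpy as np

def lloyd_max(samples, L, n_iter=50):
    rng = np.random.default_rng(0)
    codebook = np.sort(rng.choice(samples, size=L, replace=False))
    for _ in range(n_iter):
        # Condition 1: assign each sample to its nearest reproduction value
        idx = np.argmin((samples[:, None] - codebook[None, :])**2, axis=1)
        # Condition 2: move each reproduction value to the centroid of its cell
        for j in range(L):
            cell = samples[idx == j]
            if cell.size > 0:          # keep the old value if a cell is empty
                codebook[j] = cell.mean()
    return np.sort(codebook)

samples = np.random.default_rng(1).standard_normal(100_000)
codebook = lloyd_max(samples, L=8)     # L = 2^b reproduction values, b = 3

Each alternation cannot increase the empirical MSE, since each step is optimal given the other; the procedure therefore converges to a (possibly only local) minimum.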

Note that detailed knowledge of the partition is not necessary. The partition is entirely determined by the distortion measure, the nearest neighbor rule, and the set of reproduction values. Figure 1.2 shows a diagram of the encoder and decoder.

Figure 1.2. Encoder and decoder: the encoder applies the nearest neighbor rule over {x̂_1, ..., x̂_L} to map x(n) to the index i(n); the decoder looks up x̂(n) in a table

2. This result can be interpreted in terms of a mechanical system: the moment of inertia of an object with respect to a point is at a minimum when that point is the center of gravity.

