speech and voice compression

Voice and Audio Compression for Wireless Communications

Second Edition

Lajos Hanzo
University of Southampton, UK

F. Clare Somerville
picoChip Designs Ltd, UK

Jason Woodard
CSR plc, UK

IEEE Communications Society, Sponsor

John Wiley & Sons, Ltd

Copyright © 2007 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book. All trademarks referred to in the text of this publication are the property of their respective owners.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

IEEE Communications Society, Sponsor
COMMS-S Liaison to IEEE Press, Mostafa Hashem Sherif

Library of Congress Cataloging-in-Publication Data
Hanzo, Lajos, 1952-
Voice and Audio Compression for Wireless Communications / L. Hanzo, F.C.A. Somerville and J.P. Woodard - 2nd ed.
p. cm.
Rev. ed. of: Voice and Audio Compression for Wireless Communications. c2001
Includes bibliographical references and index.
ISBN 978-0-470-51581-5 (cloth : alk. paper)
1. Compressed speech. 2. Speech processing systems. 3. Telecommunication systems.
I. Somerville, F. Clare A. II. Woodard, Jason P. III. Hanzo, Lajos, 1952- Voice compression and communications. IV. Title.
TK7882.S65H35 2007
621.384-dc22
2007011025

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 978-0-470-51581-5 (HB)

Typeset by the authors using LaTeX software.
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, England.
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

ContentsAbout the Authors xxiOther Wiley and IEEE Press Books on Related Topics xxiiiPreface and Motivation xxvAcknowledgements xxxvI Speech Signals and Waveform Coding 11 Speech Signals and an Introduction to Speech Coding 31.1 Motivation of Speech Compression . . . . . . . . . . . . . . . . . . . . . . 31.2 Basic Characterisation of Speech Signals . . . . . . . . . . . . . . . . . . . 41.3 Classication of Speech Codecs . . . . . . . . . . . . . . . . . . . . . . . . 81.3.1 Waveform Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 91.3.1.1 Time-domain Waveform Coding . . . . . . . . . . . . . . 91.3.1.2 Frequency-domain Waveform Coding . . . . . . . . . . . . 101.3.2 Vocoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.3.3 Hybrid Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.4 Waveform Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.4.1 Digitisation of Speech . . . . . . . . . . . . . . . . . . . . . . . . 111.4.2 Quantisation Characteristics . . . . . . . . . . . . . . . . . . . . . 131.4.3 Quantisation Noise and Rate-distortion Theory . . . . . . . . . . . 141.4.4 Non-uniform Quantisation for a known PDF: Companding . . . . . 161.4.5 PDF-independent Quantisation using Logarithmic Compression . . 181.4.5.1 The -law Compander . . . . . . . . . . . . . . . . . . . . 201.4.5.2 The A-law Compander . . . . . . . . . . . . . . . . . . . . 211.4.6 Optimum Non-uniform Quantisation . . . . . . . . . . . . . . . . . 231.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28v 5. vi CONTENTS2 Predictive Coding 292.1 Forward-Predictive Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 292.2 DPCM Codec Schematic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.3 Predictor Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312.3.1 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . 
312.3.2 Covariance Coefcient Computation . . . . . . . . . . . . . . . . . 332.3.3 Predictor Coefcient Computation . . . . . . . . . . . . . . . . . . 342.4 Adaptive One-word-memory Quantisation . . . . . . . . . . . . . . . . . . 392.5 DPCM Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402.6 Backward-adaptive Prediction . . . . . . . . . . . . . . . . . . . . . . . . . 422.6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422.6.2 Stochastic Model Processes . . . . . . . . . . . . . . . . . . . . . 442.7 The 32 kbps G.721 ADPCM Codec . . . . . . . . . . . . . . . . . . . . . . 472.7.1 Functional Description of the G.721 Codec . . . . . . . . . . . . . 472.7.2 Adaptive Quantiser . . . . . . . . . . . . . . . . . . . . . . . . . . 472.7.3 G.721 Quantiser Scale Factor Adaptation . . . . . . . . . . . . . . 482.7.4 G.721 Adaptation Speed Control . . . . . . . . . . . . . . . . . . . 502.7.5 G.721 Adaptive Prediction and Signal Reconstruction . . . . . . . . 512.8 Subjective and Objective Speech Quality . . . . . . . . . . . . . . . . . . . 532.9 Variable-rate G.726 and Embedded G.727 ADPCM . . . . . . . . . . . . . . 542.9.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542.9.2 Embedded G.727 ADPCM Coding . . . . . . . . . . . . . . . . . . 552.9.3 Performance of the Embedded G.727 ADPCM Codec . . . . . . . . 562.10 Rate-distortion in Predictive Coding . . . . . . . . . . . . . . . . . . . . . . 622.11 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67II Analysis-by-Synthesis Coding 693 Analysis-by-Synthesis Principles 713.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713.2 Analysis-by-Synthesis Codec Structure . . . . . . . . . . . . . . . . . . . . 723.3 The Short-term Synthesis Filter . . . . . . . . . . . . . . . . . . . . . . . . 733.4 Long-term Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
763.4.1 Open-loop Optimisation of LTP Parameters . . . . . . . . . . . . . 763.4.2 Closed-loop Optimisation of LTP Parameters . . . . . . . . . . . . 803.5 Excitation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853.6 Adaptive Short-term and Long-term Post-Filtering . . . . . . . . . . . . . . 883.7 Lattice-based Linear Prediction . . . . . . . . . . . . . . . . . . . . . . . . 903.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Speech Spectral Quantisation 994.1 Log-area Ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 994.2 Line Spectral Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . 1034.2.1 Derivation of the Line Spectral Frequencies . . . . . . . . . . . . . 1034.2.2 Computation of the Line Spectral Frequencies . . . . . . . . . . . . 107 6. CONTENTS vii4.2.3 Chebyshev Description of Line Spectral Frequencies . . . . . . . . 1094.3 Vector Quantisation of Spectral Parameters . . . . . . . . . . . . . . . . . . 1154.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1154.3.2 Speaker-adaptive Vector Quantisation of LSFs . . . . . . . . . . . . 1154.3.3 Stochastic VQ of LPC Parameters . . . . . . . . . . . . . . . . . . 1174.3.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . 1174.3.3.2 The Stochastic VQ Algorithm . . . . . . . . . . . . . . . . 1184.3.4 Robust Vector Quantisation Schemes for LSFs . . . . . . . . . . . 1214.3.5 LSF VQs in Standard Codecs . . . . . . . . . . . . . . . . . . . . 1224.4 Spectral Quantisers for Wideband Speech Coding . . . . . . . . . . . . . . . 1234.4.1 Introduction to Wideband Spectral Quantisation . . . . . . . . . . . 1234.4.1.1 Statistical Properties of Wideband LSFs . . . . . . . . . . 1254.4.1.2 Speech Codec Specications . . . . . . . . . . . . . . . . 1274.4.2 Wideband LSF VQs . . . . . . . . . . . . . . . . . . . . . . . . . 1284.4.2.1 Memoryless Vector Quantisation . . . . . . . . . . . 
. . . 1284.4.2.2 Predictive Vector Quantisation . . . . . . . . . . . . . . . 1324.4.2.3 Multimode Vector Quantisation . . . . . . . . . . . . . . . 1334.4.3 Simulation Results and Subjective Evaluations . . . . . . . . . . . 1364.4.4 Conclusions on Wideband Spectral Quantisation . . . . . . . . . . 1374.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1385 Regular Pulse Excited Coding 1395.1 Theoretical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1395.2 The 13 kbps RPE-LTP GSM Speech Encoder . . . . . . . . . . . . . . . . . 1465.2.1 Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1465.2.2 STP Analysis Filtering . . . . . . . . . . . . . . . . . . . . . . . . 1485.2.3 LTP Analysis Filtering . . . . . . . . . . . . . . . . . . . . . . . . 1485.2.4 Regular Excitation Pulse Computation . . . . . . . . . . . . . . . . 1495.3 The 13 kbps RPE-LTP GSM Speech Decoder . . . . . . . . . . . . . . . . . 1515.4 Bit-sensitivity of the 13 kbps GSM RPE-LTP Codec . . . . . . . . . . . . . 1535.5 Application Example: A Tool-box Based Speech Transceiver . . . . . . . . . 1545.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1576 Forward-Adaptive Code Excited Linear Prediction 1596.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1596.2 The Original CELP Approach . . . . . . . . . . . . . . . . . . . . . . . . . 1606.3 Fixed Codebook Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1636.4 CELP Excitation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1656.4.1 Binary-pulse Excitation . . . . . . . . . . . . . . . . . . . . . . . . 1656.4.2 Transformed Binary-pulse Excitation . . . . . . . . . . . . . . . . 1666.4.2.1 Excitation Generation . . . . . . . . . . . . . . . . . . . . 1666.4.2.2 Bit-sensitivity Analysis of the 4.8 Kbps TBPE SpeechCodec . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
1686.4.3 Dual-rate Algebraic CELP Coding . . . . . . . . . . . . . . . . . . 1706.4.3.1 ACELP Codebook Structure . . . . . . . . . . . . . . . . 1706.4.3.2 Dual-rate ACELP Bit Allocation . . . . . . . . . . . . . . 172 7. viii CONTENTS6.4.3.3 Dual-rate ACELP Codec Performance . . . . . . . . . . . 1736.5 Optimisation of the CELP Codec Parameters . . . . . . . . . . . . . . . . . 1746.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1746.5.2 Calculation of the Excitation Parameters . . . . . . . . . . . . . . . 1756.5.2.1 Full Codebook Search Theory . . . . . . . . . . . . . . . . 1756.5.2.2 Sequential Search Procedure . . . . . . . . . . . . . . . . 1776.5.2.3 Full Search Procedure . . . . . . . . . . . . . . . . . . . . 1786.5.2.4 Sub-optimal Search Procedures . . . . . . . . . . . . . . . 1806.5.2.5 Quantisation of the Codebook Gains . . . . . . . . . . . . 1816.5.3 Calculation of the Synthesis Filter Parameters . . . . . . . . . . . . 1836.5.3.1 Bandwidth Expansion . . . . . . . . . . . . . . . . . . . . 1846.5.3.2 Least Squares Techniques . . . . . . . . . . . . . . . . . . 1846.5.3.3 Optimisation via Powells Method . . . . . . . . . . . . . 1876.5.3.4 Simulated Annealing and the Effects of Quantisation . . . . 1886.6 The Error Sensitivity of CELP Codecs . . . . . . . . . . . . . . . . . . . . . 1926.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1926.6.2 Improving the Spectral Information Error Sensitivity . . . . . . . . 1926.6.2.1 LSF Ordering Policies . . . . . . . . . . . . . . . . . . . . 1926.6.2.2 The Effect of FEC on the Spectral Parameters . . . . . . . 1956.6.2.3 The Effect of Interpolation . . . . . . . . . . . . . . . . . 1956.6.3 Improving the Error Sensitivity of the Excitation Parameters . . . . 1966.6.3.1 The Fixed Codebook Index . . . . . . . . . . . . . . . . . 1976.6.3.2 The Fixed Codebook Gain . . . . . . . . . . . . . . . . . . 1976.6.3.3 Adaptive Codebook Delay . . . . . . . . . . 
. . . . . . . . 1986.6.3.4 Adaptive Codebook Gain . . . . . . . . . . . . . . . . . . 1996.6.4 Matching Channel Codecs to the Speech Codec . . . . . . . . . . . 1996.6.5 Error Resilience Conclusions . . . . . . . . . . . . . . . . . . . . . 2036.7 Application Example: A Dual-mode 3.1 kBd Speech Transceiver . . . . . . . 2046.7.1 The Transceiver Scheme . . . . . . . . . . . . . . . . . . . . . . . 2046.7.2 Re-congurable Modulation . . . . . . . . . . . . . . . . . . . . . 2056.7.3 Source-matched Error Protection . . . . . . . . . . . . . . . . . . . 2066.7.3.1 Low-quality 3.1 kBd Mode . . . . . . . . . . . . . . . . . 2066.7.3.2 High-quality 3.1 kBd Mode . . . . . . . . . . . . . . . . . 2106.7.4 Voice Activity Detection and Packet Reservation Multiple Access . 2116.7.5 3.1 kBd System Performance . . . . . . . . . . . . . . . . . . . . . 2146.7.6 3.1 kBd System Summary . . . . . . . . . . . . . . . . . . . . . . 2176.8 Multi-slot PRMA Transceiver . . . . . . . . . . . . . . . . . . . . . . . . . 2186.8.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 2186.8.2 PRMA-assisted Multi-slot Adaptive Modulation . . . . . . . . . . 2196.8.3 Adaptive GSM-like Schemes . . . . . . . . . . . . . . . . . . . . . 2206.8.4 Adaptive DECT-like Schemes . . . . . . . . . . . . . . . . . . . . 2226.8.5 Summary of Adaptive Multi-slot PRMA . . . . . . . . . . . . . . . 2236.9 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 8. CONTENTS ix7 Standard Speech Codecs 2257.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2257.2 The US DoD FS-1016 4.8 kbps CELP Codec . . . . . . . . . . . . . . . . . 2257.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2257.2.2 LPC Analysis and Quantisation . . . . . . . . . . . . . . . . . . . 2277.2.3 The Adaptive Codebook . . . . . . . . . . . . . . . . . . . . . . . 2287.2.4 The Fixed Codebook . . . . . . . . . . . . . . . . . . . . . . . . . 
2297.2.5 Error Concealment Techniques . . . . . . . . . . . . . . . . . . . . 2307.2.6 Decoder Post-ltering . . . . . . . . . . . . . . . . . . . . . . . . 2317.2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2317.3 The 7.95 kbps Pan-American Speech Codec Known as IS-54 DAMPSCodec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2317.4 The 6.7 kbps Japanese Digital Cellular Systems Speech Codec . . . . . . . 2357.5 The Qualcomm Variable Rate CELP Codec . . . . . . . . . . . . . . . . . . 2377.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2377.5.2 Codec Schematic and Bit Allocation . . . . . . . . . . . . . . . . . 2387.5.3 Codec Rate Selection . . . . . . . . . . . . . . . . . . . . . . . . . 2397.5.4 LPC Analysis and Quantisation . . . . . . . . . . . . . . . . . . . 2407.5.5 The Pitch Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2417.5.6 The Fixed Codebook . . . . . . . . . . . . . . . . . . . . . . . . . 2427.5.7 Rate 1/8 Filter Excitation . . . . . . . . . . . . . . . . . . . . . . . 2437.5.8 Decoder Post-ltering . . . . . . . . . . . . . . . . . . . . . . . . 2437.5.9 Error Protection and Concealment Techniques . . . . . . . . . . . . 2447.5.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2447.6 Japanese Half-rate Speech Codec . . . . . . . . . . . . . . . . . . . . . . . 2457.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2457.6.2 Codec Schematic and Bit Allocation . . . . . . . . . . . . . . . . . 2457.6.3 Encoder Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . 2477.6.4 LPC Analysis and Quantisation . . . . . . . . . . . . . . . . . . . 2487.6.5 The Weighting Filter . . . . . . . . . . . . . . . . . . . . . . . . . 2487.6.6 Excitation Vector 1 . . . . . . . . . . . . . . . . . . . . . . . . . . 2497.6.7 Excitation Vector 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 
2507.6.8 Channel Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2517.6.9 Decoder Post-processing . . . . . . . . . . . . . . . . . . . . . . . 2527.7 The Half-rate GSM Speech Codec . . . . . . . . . . . . . . . . . . . . . . . 2537.7.1 Half-rate GSM Codec Outline and Bit Allocation . . . . . . . . . . 2537.7.2 Spectral Quantisation in the Half-rate GSM Codec . . . . . . . . . 2557.7.3 Error Protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2567.8 The 8 kbps G.729 Codec . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2577.8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2577.8.2 Codec Schematic and Bit Allocation . . . . . . . . . . . . . . . . . 2577.8.3 Encoder Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . 2587.8.4 LPC Analysis and Quantisation . . . . . . . . . . . . . . . . . . . 2597.8.5 The Weighting Filter . . . . . . . . . . . . . . . . . . . . . . . . . 2627.8.6 The Adaptive Codebook . . . . . . . . . . . . . . . . . . . . . . . 2627.8.7 The Fixed Algebraic Codebook . . . . . . . . . . . . . . . . . . . 263 9. x CONTENTS7.8.8 Quantisation of the Gains . . . . . . . . . . . . . . . . . . . . . . . 2667.8.9 Decoder Post-processing . . . . . . . . . . . . . . . . . . . . . . . 2677.8.10 G.729 Error-concealment Techniques . . . . . . . . . . . . . . . . 2697.8.11 G.729 Bit-sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . 2707.8.12 Turbo-coded Orthogonal Frequency Division MultiplexTransmission of G.729 Encoded Speech . . . . . . . . . . . . . . . 2717.8.12.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . 2717.8.12.2 System Overview . . . . . . . . . . . . . . . . . . . . . . 2727.8.12.3 Turbo Channel Encoding . . . . . . . . . . . . . . . . . . 2737.8.12.4 OFDM in the FRAMES Speech/Data Sub-burst . . . . . . 2747.8.12.5 Channel Model . . . . . . . . . . . . . . . . . . . . . . . 2757.8.12.6 Turbo-coded G.729 OFDM Parameters . . . . . . . . . . . 
2757.8.12.7 Turbo-coded G.729 OFDM Performance . . . . . . . . . . 2767.8.12.8 Turbo-coded G.729 OFDM Summary . . . . . . . . . . . . 2777.8.13 G.729 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 2787.9 The Reduced Complexity G.729 Annex A Codec . . . . . . . . . . . . . . . 2787.9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2787.9.2 The Perceptual Weighting Filter . . . . . . . . . . . . . . . . . . . 2797.9.3 The Open-loop Pitch Search . . . . . . . . . . . . . . . . . . . . . 2807.9.4 The Closed-loop Pitch Search . . . . . . . . . . . . . . . . . . . . 2807.9.5 The Algebraic Codebook Search . . . . . . . . . . . . . . . . . . . 2807.9.6 The Decoder Post-processing . . . . . . . . . . . . . . . . . . . . . 2817.9.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2817.10 The 12.2 kbps Enhanced Full-rate GSM Speech Codec . . . . . . . . . . . . 2827.10.1 Enhanced Full-rate GSM Codec Outline . . . . . . . . . . . . . . . 2827.10.2 Enhanced Full-rate GSM Encoder . . . . . . . . . . . . . . . . . . 2847.10.2.1 Spectral Quantisation and Windowing in the EnhancedFull-rate GSM Codec . . . . . . . . . . . . . . . . . . . . 2847.10.2.2 Adaptive Codebook Search . . . . . . . . . . . . . . . . . 2867.10.2.3 Fixed Codebook Search . . . . . . . . . . . . . . . . . . . 2867.11 The Enhanced Full-rate 7.4 kbps IS-136 Speech Codec . . . . . . . . . . . . 2877.11.1 IS-136 Codec Outline . . . . . . . . . . . . . . . . . . . . . . . . . 2877.11.2 IS-136 Bit-allocation Scheme . . . . . . . . . . . . . . . . . . . . 2897.11.3 Fixed Codebook Search . . . . . . . . . . . . . . . . . . . . . . . 2907.11.4 IS-136 Channel Coding . . . . . . . . . . . . . . . . . . . . . . . . 2917.12 The ITU G.723.1 Dual-rate Codec . . . . . . . . . . . . . . . . . . . . . . . 2927.12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2927.12.2 G.723.1 Encoding Principle . . . . . . . . . . . . . . . . . . . . . 
2927.12.3 Vector-quantisation of the LSPs . . . . . . . . . . . . . . . . . . . 2947.12.4 Formant-based Weighting Filter . . . . . . . . . . . . . . . . . . . 2957.12.5 The 6.3 kbps High-rate G.723.1 Excitation . . . . . . . . . . . . . . 2967.12.6 The 5.3 kbps Low-rate G.723.1 Excitation . . . . . . . . . . . . . . 2977.12.7 G.723.1 Bit Allocation . . . . . . . . . . . . . . . . . . . . . . . . 2987.12.8 G.723.1 Error Sensitivity . . . . . . . . . . . . . . . . . . . . . . . 3007.13 Advanced Multirate JD-CDMA Transceiver . . . . . . . . . . . . . . . . . . 3027.13.1 Multirate Codecs and Systems . . . . . . . . . . . . . . . . . . . . 302 10. CONTENTS xi7.13.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3057.13.3 The Adaptive Multirate Speech Codec . . . . . . . . . . . . . . . . 3067.13.3.1 AMR Codec Overview . . . . . . . . . . . . . . . . . . . 3067.13.3.2 Linear Prediction Analysis . . . . . . . . . . . . . . . . . 3077.13.3.3 LSF Quantisation . . . . . . . . . . . . . . . . . . . . . . 3087.13.3.4 Pitch Analysis . . . . . . . . . . . . . . . . . . . . . . . . 3087.13.3.5 Fixed Codebook with Algebraic Structure . . . . . . . . . 3087.13.3.6 Post-processing . . . . . . . . . . . . . . . . . . . . . . . 3107.13.3.7 The AMR Codecs Bit Allocation . . . . . . . . . . . . . . 3117.13.3.8 Codec Mode Switching Philosophy . . . . . . . . . . . . . 3117.13.4 The AMR Speech Codecs Error Sensitivity . . . . . . . . . . . . . 3127.13.5 RRNS-based Channel Coding . . . . . . . . . . . . . . . . . . . . 3157.13.5.1 RRNS Overview . . . . . . . . . . . . . . . . . . . . . . . 3157.13.5.2 Source-matched Error Protection . . . . . . . . . . . . . . 3167.13.6 Joint Detection Code Division Multiple Access . . . . . . . . . . . 3187.13.6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . 3187.13.6.2 Joint Detection Based Adaptive Code Division MultipleAccess . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3197.13.7 System Performance . . 
. . . . . . . . . . . . . . . . . . . . . . . 3197.13.7.1 Subjective Testing . . . . . . . . . . . . . . . . . . . . . . 3267.13.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3277.14 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3278 Backward-adaptive Code Excited Linear Prediction 3318.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3318.2 Motivation and Background . . . . . . . . . . . . . . . . . . . . . . . . . . 3318.3 Backward-adaptive G728 Codec Schematic . . . . . . . . . . . . . . . . . . 3348.4 Backward-adaptive G728 Coding Algorithm . . . . . . . . . . . . . . . . . 3368.4.1 G728 Error Weighting . . . . . . . . . . . . . . . . . . . . . . . . 3368.4.2 G728 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . . 3378.4.3 Codebook Gain Adaption . . . . . . . . . . . . . . . . . . . . . . . 3418.4.4 G728 Codebook Search . . . . . . . . . . . . . . . . . . . . . . . 3438.4.5 G728 Excitation Vector Quantisation . . . . . . . . . . . . . . . . 3458.4.6 G728 Adaptive Post-ltering . . . . . . . . . . . . . . . . . . . . . 3478.4.6.1 Adaptive Long-term Post-ltering . . . . . . . . . . . . . . 3488.4.6.2 G.728 Adaptive Short-term Post-ltering . . . . . . . . . . 3508.4.7 Complexity and Performance of the G728 Codec . . . . . . . . . . 3518.5 Reduced-rate G728-like Codec: Variable-length Excitation Vector . . . . . . 3518.6 The Effects of Long-term Prediction . . . . . . . . . . . . . . . . . . . . . . 3548.7 Closed-loop Codebook Training . . . . . . . . . . . . . . . . . . . . . . . . 3598.8 Reduced-rate G728-like Codec: Constant-length Excitation Vector . . . . . . 3648.9 Programmable-rate 84 kbps Low-delay CELP Codecs . . . . . . . . . . . . 3658.9.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3658.9.2 84 kbps Codec Improvements Due to Increasing Codebook Sizes . 
3668.9.3 84 kbps Codecs Forward Adaption of the Short-term SynthesisFilter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 11. xii CONTENTS8.9.4 Forward Adaption of the Long-term Predictor . . . . . . . . . . . . 3688.9.4.1 Initial Experiments . . . . . . . . . . . . . . . . . . . . . 3688.9.4.2 Quantisation of Jointly Optimized Gains . . . . . . . . . . 3708.9.4.3 84 kbps Codecs Voiced/Unvoiced Codebooks . . . . . . 3738.9.5 Low-delay Codecs at 48 kbps . . . . . . . . . . . . . . . . . . . . 3758.9.6 Low-delay ACELP Codec . . . . . . . . . . . . . . . . . . . . . . 3788.10 Backward-adaptive Error Sensitivity Issues . . . . . . . . . . . . . . . . . . 3818.10.1 The Error Sensitivity of the G728 Codec . . . . . . . . . . . . . . . 3818.10.2 The Error Sensitivity of our 48 kbps Low-delay Codecs . . . . . . 3828.10.3 The Error Sensitivity of our Low-delay ACELP Codec . . . . . . . 3878.11 A Low-delay Multimode Speech Transceiver . . . . . . . . . . . . . . . . . 3888.11.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3888.11.2 816 kbps Codec Performance . . . . . . . . . . . . . . . . . . . . 3888.11.3 Transmission Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 3898.11.3.1 Higher-quality Mode . . . . . . . . . . . . . . . . . . . . 3898.11.3.2 Lower-quality Mode . . . . . . . . . . . . . . . . . . . . . 3918.11.4 Speech Transceiver Performance . . . . . . . . . . . . . . . . . . . 3918.12 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392III Wideband Speech, MPEG-4 Audio and Their Transmission 3939 Wideband Speech Coding 3959.1 Sub-band-ADPCM Wideband Coding at 64 kbps . . . . . . . . . . . . . . . 3959.1.1 Introduction and Specications . . . . . . . . . . . . . . . . . . . . 3959.1.2 G722 Codec Outline . . . . . . . . . . . . . . . . . . . . . . . . . 3969.1.3 Principles of Sub-band Coding . . . . . . . . . . . . . . . . . . . . 3999.1.4 Quadrature Mirror Filtering . . . . . . . 
. . . . . . . . . . . . . . 4009.1.4.1 Analysis Filtering . . . . . . . . . . . . . . . . . . . . . . 4009.1.4.2 Synthesis Filtering . . . . . . . . . . . . . . . . . . . . . . 4039.1.4.3 Practical QMF Design Constraints . . . . . . . . . . . . . 4059.1.5 G722 Adaptive Quantisation and Prediction . . . . . . . . . . . . . 4109.1.6 G722 Coding Performance . . . . . . . . . . . . . . . . . . . . . . 4129.2 Wideband Transform-coding at 32 kbps . . . . . . . . . . . . . . . . . . . . 4139.2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4139.2.2 Transform-coding Algorithm . . . . . . . . . . . . . . . . . . . . . 4139.3 Sub-band-split Wideband CELP Codecs . . . . . . . . . . . . . . . . . . . . 4169.3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4169.3.2 Sub-band-based Wideband CELP Coding . . . . . . . . . . . . . . 4179.3.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . 4179.3.2.2 Low-band Coding . . . . . . . . . . . . . . . . . . . . . . 4179.3.2.3 High-band Coding . . . . . . . . . . . . . . . . . . . . . . 4189.3.2.4 Bit-allocation Scheme . . . . . . . . . . . . . . . . . . . . 4199.4 Fullband Wideband ACELP Coding . . . . . . . . . . . . . . . . . . . . . . 4209.4.1 Wideband ACELP Excitation . . . . . . . . . . . . . . . . . . . . 420 12. CONTENTS xiii9.4.2 Backward-adaptive 32 kbps Wideband ACELP . . . . . . . . . . . 4229.4.3 Forward-adaptive 9.6 kbps Wideband ACELP . . . . . . . . . . . . 4239.5 A Turbo-coded Burst-by-burst Adaptive Wideband Speech Transceiver . . . 4259.5.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 4259.5.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 4289.5.3 System Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 4289.5.4 Constant Throughput Adaptive Modulation . . . . . . . . . . . . . 4299.5.5 Adaptive Wideband Transceiver Performance . . . . . . . . . . . . 4319.5.6 Multi-mode Transceiver Adaptation . 
. . . . . . . . . . . . . . . . 4329.5.7 Transceiver Mode Switching . . . . . . . . . . . . . . . . . . . . . 4339.5.8 The Wideband G.722.1 Codec . . . . . . . . . . . . . . . . . . . . 4359.5.8.1 Audio Codec Overview . . . . . . . . . . . . . . . . . . . 4359.5.9 Detailed Description of the Audio Codec . . . . . . . . . . . . . . 4379.5.10 Wideband Adaptive System Performance . . . . . . . . . . . . . . 4399.5.11 Audio Frame Error Results . . . . . . . . . . . . . . . . . . . . . . 4409.5.12 Audio SEGSNR Performance and Discussions . . . . . . . . . . . 4419.5.13 G.722.1 Audio Transceiver Summary and Conclusions . . . . . . . 4429.6 Turbo-detected Unequal Error Protection Irregular ConvolutionalCoded AMR-WB Transceivers . . . . . . . . . . . . . . . . . . . . . . . . . 4429.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4429.6.2 The AMR-WB Codecs Error Sensitivity . . . . . . . . . . . . . . 4459.6.3 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4459.6.4 Design of Irregular Convolutional Codes . . . . . . . . . . . . . . 4469.6.5 An Irregular Convolutional Code Example . . . . . . . . . . . . . . 4499.6.6 UEP AMR IRCC Performance Results . . . . . . . . . . . . . . . 4509.6.7 UEP AMR Conclusions . . . . . . . . . . . . . . . . . . . . . . . 4529.7 The AMR-WB+ Audio Codec . . . . . . . . . . . . . . . . . . . . . . . . . 4549.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4549.7.2 Audio Requirements in Mobile Multimedia Applications . . . . . . 4569.7.2.1 Summary of Audiovisual Services . . . . . . . . . . . . . 4579.7.2.2 Bit Rates Supported by the Radio Network . . . . . . . . . 4579.7.3 Overview of the AMR-WB+ Codec . . . . . . . . . . . . . . . . . 4599.7.3.1 Encoding the High Frequencies . . . . . . . . . . . . . . . 4629.7.3.2 Stereo Encoding . . . . . . . . . . . . . . . . . . . . . . . 4629.7.3.3 Complexity of AMR-WB+ . . . . . . . . . . . . . . . . . 
4639.7.3.4 Transport and File Format of AMR-WB+ . . . . . . . . . . 4639.7.4 Performance of AMR-WB+ . . . . . . . . . . . . . . . . . . . . . 4639.7.5 Summary of the AMR-WB+ Codec . . . . . . . . . . . . . . . . . 4659.8 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46610 MPEG-4 Audio Compression and Transmission 46910.1 Overview of MPEG-4 Audio . . . . . . . . . . . . . . . . . . . . . . . . . . 46910.2 General Audio Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47110.2.1 Advanced Audio Coding . . . . . . . . . . . . . . . . . . . . . . . 47910.2.2 Gain Control Tool . . . . . . . . . . . . . . . . . . . . . . . . . . 48210.2.3 Psycho-acoustic Model . . . . . . . . . . . . . . . . . . . . . . . . 482 13. xiv CONTENTS10.2.4 Temporal Noise Shaping . . . . . . . . . . . . . . . . . . . . . . . 48410.2.5 Stereophonic Coding . . . . . . . . . . . . . . . . . . . . . . . . . 48610.2.6 AAC Quantisation and Coding . . . . . . . . . . . . . . . . . . . . 48710.2.7 Noiseless Huffman Coding . . . . . . . . . . . . . . . . . . . . . . 48910.2.8 Bit-sliced Arithmetic Coding . . . . . . . . . . . . . . . . . . . . . 49010.2.9 Transform-domain Weighted Interleaved Vector Quantisation . . . . 49210.2.10 Parametric Audio Coding . . . . . . . . . . . . . . . . . . . . . . . 49510.3 Speech Coding in MPEG-4 Audio . . . . . . . . . . . . . . . . . . . . . . . 49510.3.1 Harmonic Vector Excitation Coding . . . . . . . . . . . . . . . . . 49610.3.2 CELP Coding in MPEG-4 . . . . . . . . . . . . . . . . . . . . . . 49810.3.3 LPC Analysis and Quantisation . . . . . . . . . . . . . . . . . . . 50010.3.4 Multi Pulse and Regular Pulse Excitation . . . . . . . . . . . . . . 50210.4 MPEG-4 Codec Performance . . . . . . . . . . . . . . . . . . . . . . . . . 50310.5 MPEG-4 Spacetime Block Coded OFDM Audio Transceiver . . . . . . . . 50510.5.1 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 50610.5.2 System Parameters . . . . . 
10.5.3 Frame Dropping Procedure
10.5.4 Space-time Coding
10.5.5 Adaptive Modulation
10.5.6 System Performance
10.6 Turbo-detected Space-time Trellis Coded MPEG-4 Audio Transceivers
10.6.1 Motivation and Background
10.6.2 Audio Turbo Transceiver Overview
10.6.3 The Turbo Transceiver
10.6.4 Turbo Transceiver Performance Results
10.6.5 MPEG-4 Turbo Transceiver Summary
10.7 Turbo-detected Space-time Trellis Coded MPEG-4 Versus AMR-WB Speech Transceivers
10.7.1 Motivation and Background
10.7.2 The AMR-WB Codec's Error Sensitivity
10.7.3 The MPEG-4 TWINVQ Codec's Error Sensitivity
10.7.4 The Turbo Transceiver
10.7.5 Performance Results
10.7.6 AMR-WB and MPEG-4 TWINVQ Turbo Transceiver Summary
10.8 Chapter Summary

IV Very Low-rate Coding and Transmission

11 Overview of Low-rate Speech Coding
11.1 Low-bitrate Speech Coding
11.1.1 AbS Coding
11.1.2 Speech Coding at 2.4 kbps
11.1.2.1 Background to 2.4 kbps Speech Coding
11.1.2.2 Frequency Selective Harmonic Coder
11.1.2.3 Sinusoidal Transform Coder
11.1.2.4 Multiband Excitation Coders
11.1.2.5 Sub-band Linear Prediction Coder
11.1.2.6 Mixed Excitation Linear Prediction Coder
11.1.2.7 Waveform Interpolation Coder
11.1.3 Speech Coding Below 2.4 kbps
11.2 Linear Predictive Coding Model
11.2.1 Short-term Prediction
11.2.2 Long-term Prediction
11.2.3 Final Analysis-by-Synthesis Model
11.3 Speech Quality Measurements
11.3.1 Objective Speech Quality Measures
11.3.2 Subjective Speech Quality Measures
11.3.3 2.4 kbps Selection Process
11.4 Speech Database
11.5 Chapter Summary

12 Linear Predictive Vocoder
12.1 Overview of a Linear Predictive Vocoder
12.2 Line Spectrum Frequencies Quantisation
12.2.1 Line Spectrum Frequencies Scalar Quantisation
12.2.2 Line Spectrum Frequencies Vector Quantisation
12.3 Pitch Detection
12.3.1 Voiced-Unvoiced Decision
12.3.2 Oversampled Pitch Detector
12.3.3 Pitch Tracking
12.3.3.1 Computational Complexity
12.3.4 Integer Pitch Detector
12.4 Unvoiced Frames
12.5 Voiced Frames
12.5.1 Placement of Excitation Pulses
12.5.2 Pulse Energy
12.6 Adaptive Postfilter
12.7 Pulse Dispersion Filter
12.7.1 Pulse Dispersion Principles
12.7.2 Pitch Independent Glottal Pulse Shaping Filter
12.7.3 Pitch-dependent Glottal Pulse Shaping Filter
12.8 Results for Linear Predictive Vocoder
12.9 Chapter Summary

13 Wavelets and Pitch Detection
13.1 Conceptual Introduction to Wavelets
13.1.1 Fourier Theory
13.1.2 Wavelet Theory
13.1.3 Detecting Discontinuities with Wavelets
13.2 Introduction to Wavelet Mathematics
13.2.1 Multiresolution Analysis
13.2.2 Polynomial Spline Wavelets
13.2.3 Pyramidal Algorithm
13.2.4 Boundary Effects
13.3 Preprocessing the Wavelet Transform Signal
13.3.1 Spurious Pulses
13.3.2 Normalisation
13.3.3 Candidate Glottal Pulses
13.4 Voiced-unvoiced Decision
13.5 Wavelet-based Pitch Detector
13.5.1 Dynamic Programming
13.5.2 Autocorrelation Simplification
13.6 Chapter Summary

14 Zinc Function Excitation
14.1 Introduction
14.2 Overview of Prototype Waveform Interpolation Zinc Function Excitation
14.2.1 Coding Scenarios
14.2.1.1 UUU Encoder Scenario
14.2.1.2 UUV Encoder Scenario
14.2.1.3 VUU Encoder Scenario
14.2.1.4 UVU Encoder Scenario
14.2.1.5 VVV Encoder Scenario
14.2.1.6 VUV Encoder Scenario
14.2.1.7 UVV Encoder Scenario
14.2.1.8 VVU Encoder Scenario
14.2.1.9 UV Decoder Scenario
14.2.1.10 UU Decoder Scenario
14.2.1.11 VU Decoder Scenario
14.2.1.12 VV Decoder Scenario
14.3 Zinc Function Modelling
14.3.1 Error Minimisation
14.3.2 Computational Complexity
14.3.3 Reducing the Complexity of Zinc Function Excitation Optimisation
14.3.4 Phases of the Zinc Functions
14.4 Pitch Detection
14.4.1 Voiced-unvoiced Boundaries
14.4.2 Pitch Prototype Selection
14.5 Voiced Speech
14.5.1 Energy Scaling
14.5.2 Quantisation
14.6 Excitation Interpolation Between Prototype Segments
14.6.1 ZFE Interpolation Regions
14.6.2 ZFE Amplitude Parameter Interpolation
14.6.3 ZFE Position Parameter Interpolation
14.6.4 Implicit Signalling of Prototype Zero Crossing
14.6.5 Removal of ZFE Pulse Position Signalling and Interpolation
14.6.6 Pitch Synchronous Interpolation of Line Spectrum Frequencies
14.6.7 ZFE Interpolation Example
14.7 Unvoiced Speech
14.8 Adaptive Postfilter
14.9 Results for Single Zinc Function Excitation
14.10 Error Sensitivity of the 1.9 kbps PWI-ZFE Coder
14.10.1 Parameter Sensitivity of the 1.9 kbps PWI-ZFE Coder
14.10.1.1 Line Spectrum Frequencies
14.10.1.2 Voiced-unvoiced Flag
14.10.1.3 Pitch Period
14.10.1.4 Excitation Amplitude Parameters
14.10.1.5 Root Mean Square Energy Parameter
14.10.1.6 Boundary Shift Parameter
14.10.2 Degradation from Bit Corruption
14.10.2.1 Error Sensitivity Classes
14.11 Multiple Zinc Function Excitation
14.11.1 Encoding Algorithm
14.11.2 Performance of Multiple Zinc Function Excitation
14.12 A Sixth-rate, 3.8 kbps GSM-like Speech Transceiver
14.12.1 Motivation
14.12.2 The Turbo-coded Sixth-rate 3.8 kbps GSM-like System
14.12.3 Turbo Channel Coding
14.12.4 The Turbo-coded GMSK Transceiver
14.12.5 System Performance Results
14.13 Chapter Summary

15 Mixed-multiband Excitation
15.1 Introduction
15.2 Overview of Mixed-multiband Excitation
15.3 Finite Impulse Response Filter
15.4 Mixed-multiband Excitation Encoder
15.4.1 Voicing Strengths
15.5 Mixed-multiband Excitation Decoder
15.5.1 Adaptive Postfilter
15.5.2 Computational Complexity
15.6 Performance of the Mixed-multiband Excitation Coder
15.6.1 Performance of a Mixed-multiband Excitation Linear Predictive Coder
15.6.2 Performance of a Mixed-multiband Excitation and Zinc Function Prototype Excitation Coder
15.7 A Higher Rate 3.85 kbps Mixed-multiband Excitation Scheme
15.8 A 2.35 kbps Joint-detection-based CDMA Speech Transceiver
15.8.1 Background
15.8.2 The Speech Codec's Bit Allocation
15.8.3 The Speech Codec's Error Sensitivity
15.8.4 Channel Coding
15.8.5 The JD-CDMA Speech System
15.8.6 System Performance
15.8.7 Conclusions on the JD-CDMA Speech Transceiver
15.9 Chapter Summary

16 Sinusoidal Transform Coding Below 4 kbps
16.1 Introduction
16.2 Sinusoidal Analysis of Speech Signals
16.2.1 Sinusoidal Analysis with Peak-picking
16.2.2 Sinusoidal Analysis using Analysis-by-synthesis
16.3 Sinusoidal Synthesis of Speech Signals
16.3.1 Frequency, Amplitude and Phase Interpolation
16.3.2 Overlap-add Interpolation
16.4 Low-bitrate Sinusoidal Coders
16.4.1 Increased Frame Length
16.4.2 Incorporating Linear Prediction Analysis
16.5 Incorporating Prototype Waveform Interpolation
16.6 Encoding the Sinusoidal Frequency Component
16.7 Determining the Excitation Components
16.7.1 Peak-picking of the Residual Spectra
16.7.2 Analysis-by-synthesis of the Residual Spectrum
16.7.3 Computational Complexity
16.7.4 Reducing the Computational Complexity
16.8 Quantising the Excitation Parameters
16.8.1 Encoding the Sinusoidal Amplitudes
16.8.1.1 Vector Quantisation of the Amplitudes
16.8.1.2 Interpolation and Decimation
16.8.1.3 Vector Quantisation
16.8.1.4 Vector Quantisation Performance
16.8.1.5 Scalar Quantisation of the Amplitudes
16.8.2 Encoding the Sinusoidal Phases
16.8.2.1 Vector Quantisation of the Phases
16.8.2.2 Encoding the Phases with a Voiced-unvoiced Switch
16.8.3 Encoding the Sinusoidal Fourier Coefficients
16.8.3.1 Equivalent Rectangular Bandwidth Scale
16.8.4 Voiced-unvoiced Flag
16.9 Sinusoidal Transform Decoder
16.9.1 Pitch Synchronous Interpolation
16.9.1.1 Fourier Coefficient Interpolation
16.9.2 Frequency Interpolation
16.9.3 Computational Complexity
16.10 Speech Coder Performance
16.11 Chapter Summary

17 Conclusions on Low-rate Coding
17.1 Summary
17.2 Listening Tests
17.3 Summary of Very-low-rate Coding
17.4 Further Research

18 Comparison of Speech Codecs and Transceivers
18.1 Background to Speech Quality Evaluation
18.2 Objective Speech Quality Measures
18.2.1 Introduction
18.2.2 Signal-to-noise Ratios
18.2.3 Articulation Index
18.2.4 Cepstral Distance
18.2.5 Example: Computation of Cepstral Coefficients
18.2.6 Logarithmic Likelihood Ratio
18.2.7 Euclidean Distance
18.3 Subjective Measures
18.3.1 Quality Tests
18.4 Comparison of Subjective and Objective Measures
18.4.1 Background
18.4.2 Intelligibility Tests
18.5 Subjective Speech Quality of Various Codecs
18.6 Error Sensitivity Comparison of Various Codecs
18.7 Objective Speech Performance of Various Transceivers
18.8 Chapter Summary

19 The Voice over Internet Protocol
19.1 Introduction
19.2 Session Initiation Protocol
19.2.1 Introduction
19.2.2 SIP Signalling
19.2.2.1 Registration
19.2.2.2 Call Setup
19.2.2.3 Terminate a Call
19.2.2.4 Cancel a Call
19.2.3 Session Description Protocol
19.3 H.323 Standards
19.3.1 Introduction
19.3.2 H.323 Signalling
19.3.2.1 Registration
19.3.2.2 Call Establishment
19.3.2.3 Capability Exchange
19.3.2.4 Establishment of Media Communication
19.3.2.5 Call Termination
19.4 Real-time Transport Protocol
19.4.1 RTP Header Format
19.4.2 RTP Profiles and Payloads
19.4.2.1 RTP Payload for G.711
19.4.2.2 RTP Payload for G.729
19.5 Conclusion

A Constructing the Quadratic Spline Wavelets
B Zinc Function Excitation
C Probability Density Function for Amplitudes
Bibliography
Index
Author Index

About the Authors

Lajos Hanzo FREng, FIEEE, FIET, DSc received his degree in electronics in 1976 and his doctorate in 1983. During his 30-year career in telecommunications he has held various research and academic posts in Hungary, Germany and the UK. Since 1986 he has been with the School of Electronics and Computer Science, University of Southampton, UK, where he holds the chair in telecommunications. He has co-authored 14 books on mobile radio communications totalling in excess of 10 000 pages, published about 700 research papers, acted as TPC Chair of IEEE conferences, presented keynote lectures and been awarded a number of distinctions. Currently he is directing an academic research team, working on a range of research projects in the field of wireless multimedia communications sponsored by industry, the Engineering and Physical Sciences Research Council (EPSRC) UK, the European IST Programme and the Mobile Virtual Centre of Excellence (VCE), UK. He is an enthusiastic supporter of industrial and academic liaison and he offers a range of industrial courses. He is also an IEEE Distinguished Lecturer of both the Communications Society and the Vehicular Technology Society (VTS). Since 2005 he has been a Governor of the VTS. For further information on research in progress and associated publications please refer to http://www-mobile.ecs.soton.ac.uk.

Clare Somerville (née Brooks) received the M.Eng in Information Engineering, in 1995, from the University of Southampton, UK. From 1995 to 1998 she performed research into low-bitrate speech coders for wireless communications leading to a PhD in 1999, also from the University of Southampton. From 1998 to 2001 she was with the Global Wireless Systems Research department, Bell Laboratories, Swindon, UK, where she undertook research into real-time services over GPRS networks.
Since 2001 she has been a Principal Systems Engineer at picoChip Designs Ltd, Bath, UK, working on protocol layer aspects in both UMTS and WiMAX wireless systems. Her current interests lie within the picoChip WiMAX product range where she is lead architect for the MAC. She is a member of the 802.16 standards forum and is a registered mentor for Women in SET (Science, Engineering and Technology).

Jason Woodard was born in Northern Ireland in 1969. He received a BA degree in Physics from Oxford University in 1991, and an MSc with distinction in Electronics from the University of Southampton in 1992. In 1995 he completed a PhD in speech coding, also at the University of Southampton, and then held a three-year postdoctoral fellowship, researching turbo-coding techniques for the FIRST project within the European ACTS programme. In 1998 he joined the PA Consulting Group in Cambridge, UK and in 1999 he was a founding member of UbiNetics, a supplier of mobile communications test and IP solutions. Currently he is working on the research and development of advanced wireless technologies for CSR, a leading supplier of single-chip wireless devices. Dr Woodard has published widely in wireless communications, including co-authoring two books.

Other Wiley and IEEE Press Books on Related Topics

For detailed contents and sample chapters please refer to http://www-mobile.ecs.soton.ac.uk.

R. Steele, L. Hanzo (Ed): Mobile Radio Communications: Second and Third Generation Cellular and WATM Systems, John Wiley & Sons, Ltd and IEEE Press, 2nd edition, 1999, ISBN 07 273-1406-8, 1064 pages
L. Hanzo, F.C.A. Somerville, J.P. Woodard: Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels, IEEE Press and John Wiley & Sons, Ltd, 2001, 642 pages
L. Hanzo, P. Cherriman, J. Streit: Wireless Video Communications: Second to Third Generation and Beyond, IEEE Press and John Wiley & Sons, Ltd, 2001, 1093 pages
L. Hanzo, T.H. Liew, B.L. Yeap: Turbo Coding, Turbo Equalisation and Space-time Coding, John Wiley & Sons, Ltd and IEEE Press, 2002, 751 pages
J.S. Blogh, L. Hanzo: Third-Generation Systems and Intelligent Wireless Networking: Smart Antennas and Adaptive Modulation, John Wiley & Sons, Ltd and IEEE Press, 2002, 408 pages
L. Hanzo, C.H. Wong, M.S. Yee: Adaptive Wireless Transceivers: Turbo-Coded, Turbo-Equalised and Space-time Coded TDMA, CDMA and OFDM Systems, John Wiley & Sons, Ltd and IEEE Press, 2002, 737 pages
L. Hanzo, L.-L. Yang, E.-L. Kuan, K. Yen: Single- and Multi-Carrier CDMA: Multi-User Detection, Space-time Spreading, Synchronisation, Networking and Standards, John Wiley & Sons, Ltd and IEEE Press, June 2003, 1060 pages
L. Hanzo, M. Münster, T. Keller, B.-J. Choi: OFDM and MC-CDMA for Broadband Multi-User Communications, WLANs and Broadcasting, John Wiley & Sons, Ltd and IEEE Press, 2003, 978 pages
L. Hanzo, S.-X. Ng, T. Keller and W.T. Webb: Quadrature Amplitude Modulation: From Basics to Adaptive Trellis-Coded, Turbo-Equalised and Space-time Coded OFDM, CDMA and MC-CDMA Systems, John Wiley & Sons, Ltd and IEEE Press, 2004, 1105 pages
L. Hanzo, T. Keller: An OFDM and MC-CDMA Primer, John Wiley & Sons, Ltd and IEEE Press, 2006, 430 pages
L. Hanzo, F.C.A. Somerville, J.P. Woodard: Voice and Audio Compression for Wireless Communications, John Wiley & Sons, Ltd and IEEE Press, 2007, 858 pages
L. Hanzo, P.J. Cherriman, J. Streit: Video Compression and Communications: H.261, H.263, H.264, MPEG4 and HSDPA-Style Adaptive Turbo-Transceivers, John Wiley & Sons, Ltd and IEEE Press, 2007, 680 pages
L. Hanzo, J.S. Blogh, S. Ni: 3G Systems and HSDPA-Style FDD Versus TDD Networking: Smart Antennas and Adaptive Modulation, John Wiley & Sons, Ltd and IEEE Press, 2007

Preface and Motivation

The Speech Coding Scene

Despite the emergence of sophisticated high-rate multimedia services, voice communications remain the predominant means of human communications, although the compressed voice signals may be delivered via the Internet. The large-scale, pervasive introduction of wireless Internet services is likely to promote the unified transmission of both voice and data signals using the Voice over Internet Protocol (VoIP) even in the third-generation (3G) wireless systems, despite wasting much of the valuable frequency resources on the transmission of packet headers. Even when the predicted surge of wireless data and Internet services becomes a reality, voice will remain the most natural means of human communications, although it may be delivered via the Internet.

This book is dedicated to audio and voice compression issues, although the aspects of error resilience, coding delay, implementational complexity and bitrate are also at the centre of our discussions, characterising many different speech codecs incorporated in source-sensitivity matched wireless transceivers. A unique feature of this book is that it also provides cutting-edge turbo-transceiver-aided research-oriented design examples and a chapter on VoIP.

Here we attempt a rudimentary comparison of some of the codec schemes treated in the book in terms of their speech quality and bitrate, in order to provide a road map for the reader with reference to Cox's work [1,2].
The formally evaluated mean opinion score (MOS) values of the various codecs portrayed in this book are shown in Figure 1.

Observe in the figure that over the years a range of speech codecs have emerged, which attained the quality of the 64 kbps G.711 pulse-code modulation (PCM) speech codec, although at the cost of significantly increased coding delay and implementational complexity. The 8 kbps G.729 codec is the most recent addition to this range of the International Telecommunication Union's (ITU) standard schemes, which significantly outperforms all previous standard ITU codecs in robustness terms. The performance target of the 4 kbps ITU codec (ITU4) is also to maintain this impressive set of specifications. The family of codecs designed for various mobile radio systems, such as the 13 kbps regular pulse excited (RPE) scheme of the Global System for Mobile communications known as GSM, the 7.95 kbps IS-54 and the IS-95 Pan-American schemes, the 6.7 kbps Japanese digital cellular (JDC) and the 3.45 kbps half-rate JDC arrangement (JDC/2), exhibits slightly lower MOS values than the ITU codecs. Let us now consider the subjective quality of these schemes in a little more depth.

The 2.4 kbps US Department of Defence Federal Standard codec known as FS-1015 is the only vocoder in this group and it has a rather synthetic speech quality, associated with the lowest subjective assessment in the figure. The 64 kbps G.711 PCM codec and the G.726/G.727 adaptive differential PCM (ADPCM) schemes are waveform codecs. They exhibit a low implementational complexity associated with a modest bitrate economy.
The remaining codecs belong to the so-called hybrid coding family and achieve significant bitrate economies at the cost of increased complexity and delay.

Figure 1: Subjective speech quality of various codecs [1] © IEEE, 1996. (Plot of MOS, from Poor to Excellent, versus bit rate in kb/s over the range 2 to 128.)

Specifically, the 16 kbps G.728 backward-adaptive scheme maintains a similar speech quality to the 32 and 64 kbps waveform codecs, while also maintaining an impressively low, 2 ms delay. This scheme was standardised during the early 1990s. The similar-quality, but significantly more robust 8 kbps G.729 codec was approved in March 1996 by the ITU. Its standardisation overlapped with the G.723.1 codec developments. The G.723.1 codec's 6.3 kbps mode maintains a speech quality similar to the G.711, G.726, G.727, G.728 and G.729 codecs, while its 5.3 kbps mode exhibits a speech quality similar to the cellular speech codecs of the late 1980s. The standardisation of a 4 kbps ITU scheme, which we refer to here as ITU4, is also a desirable design goal at the time of writing.

In parallel to the ITU's standardisation activities a range of speech coding standards have been proposed for regional cellular mobile systems. The standardisation of the 13 kbps RPE-long-term prediction (LTP) full-rate GSM (GSM-FR) codec dates back to the second half of the 1980s, representing the first standard hybrid codec. Its complexity is significantly lower than that of the more recent code excited linear predictive (CELP) based codecs. Observe in the figure that there is also a similar-rate enhanced full-rate GSM codec (GSM-EFR), which matches the speech quality of the G.729 and G.728 schemes. The original GSM-FR codec's development was followed a little later by the release of the 7.95 kbps vector sum excited linear predictive (VSELP) IS-54 American cellular standard.
Due to advances in the field the 7.95 kbps IS-54 codec achieved a similar subjective speech quality to the 13 kbps GSM-FR scheme. The definition of the 6.7 kbps Japanese JDC VSELP codec was almost coincident with that of the IS-54 arrangement. This codec development was also followed by a half-rate standardisation process, leading to the 3.45 kbps pitch-synchronous innovation CELP (PSI-CELP) scheme.

The IS-95 Pan-American code division multiple access (CDMA) system also has its own standardised CELP-based speech codec, which is a variable-rate scheme, supporting bitrates between 1.2 and 14.4 kbps, depending on the prevalent voice activity. The perceived speech quality of these cellular speech codecs, contrived mainly during the late 1980s, was found to be subjectively similar under the perfect channel conditions of Figure 1. Lastly, the 5.6 kbps half-rate GSM codec (GSM-HR) also met its specification in terms of achieving a similar speech quality to the original 13 kbps GSM-FR arrangement, although at the cost of quadruple complexity and higher latency.

Recently, the advantages of intelligent multimode speech terminals (IMT), which can reconfigure themselves in a number of different bitrate, quality and robustness modes, attracted substantial research attention in the community, which led to the standardisation of the high-speed downlink packet access (HSDPA) mode of the 3G wireless systems. The HSDPA-style transceivers employ both adaptive modulation and adaptive channel coding, which result in a channel-quality dependent bitrate fluctuation, hence requiring reconfigurable multimode voice and audio codecs, such as the adaptive multirate codec, referred to as the AMR scheme. Following the standardisation of the narrowband AMR codec, the wideband AMR scheme, referred to as the AMR-WB arrangement and encoding the 0-7 kHz band, was also developed, which will also be characterised in this book.
Finally, the most recent AMR codec, namely the so-called AMR-WB+ scheme, will also be the subject of our discussions.

Recent research on sub-2.4 kbps speech codecs is also covered extensively in this book, where the aspects of auditory masking become more dominant. Furthermore, since the classic G.722 sub-band adaptive differential pulse code modulation (ADPCM) based wideband codec has become obsolete in the light of exciting new developments in compression, the most recent trend is to consider wideband speech and audio codecs, providing substantially enhanced speech quality. Motivated by early seminal work on transform-domain or frequency-domain based compression by Noll and his colleagues, in this field the wideband G.722.1 codec, which can be programmed to operate between 10 kbps and 32 kbps and hence lends itself to employment in HSDPA-style near-instantaneously adaptive wireless communicators, is the most attractive candidate. This codec is portrayed in the context of a sophisticated burst-by-burst adaptive wideband turbo-coded orthogonal frequency division multiplex (OFDM) IMT in this book. This scheme is also capable of transmitting high-quality audio signals, behaving essentially as a high-quality waveform codec.

Milestones in Speech Coding History

Over the years a range of excellent monographs and textbooks have been published, characterising the state-of-the-art at its various stages of development and constituting significant milestones. The first major development in the history of speech compression can be considered to be the invention of the vocoder, dating back to as early as 1939. Delta modulation was contrived in 1952 and later it became well established following Steele's monograph on the topic in 1975 [3]. PCM was first documented in detail in Cattermole's classic contribution in 1969 [4]. However, it was realised in 1967 that predictive coding provides advantages over memoryless coding techniques, such as PCM.
Predictive techniques were analysed in depth by Markel and Gray in their 1976 classic treatise [5]. This was shortly followed by the often cited reference [6] by Rabiner and Schafer. Also, Lindblom and Ohman contributed a book in 1979 on speech communication research [7].

The foundations of auditory theory were laid down as early as 1970 by Tobias [8], but these principles were not exploited to their full potential until the invention of the analysis-by-synthesis (AbS) codecs, which were heralded by Atal's multi-pulse excited codec in the early 1980s [9]. The waveform coding of speech and video signals has been comprehensively documented by Jayant and Noll in their 1984 monograph [10]. During the 1980s the speech codec developments were fuelled by the emergence of mobile radio systems, where spectrum was a scarce resource: halving the bitrate could potentially double the number of subscribers and hence the revenue.

The RPE principle as a relatively low-complexity AbS technique was proposed by Kroon, Deprettere and Sluyter in 1986 [11], which was followed by further research conducted by Vary [12,13] and his colleagues at PKI in Germany and IBM in France, leading to the 13 kbps Pan-European GSM codec. This was the first standardised AbS speech codec, which also employed LTP, recognising the important role that pitch determination plays in efficient speech compression [14,15]. It was in this era that Atal and Schroeder invented the code excited linear predictive (CELP) principle [16], leading to perhaps the most productive period in the history of speech coding during the 1980s. Some of these developments were also summarised, for example, by O'Shaughnessy [17], Papamichalis [18] and Deller, Proakis and Hansen [19].

It was during this era that the importance of speech perception and acoustic phonetics was duly recognised, for example, in the monograph by Lieberman and Blumstein [20].
A range of associated speech quality measures were summarised by Quackenbush, Barnwell III and Clements [21]. Nearly concomitantly Furui also published a book related to speech processing [22]. This period witnessed the appearance of many of the speech codecs seen in Figure 1, which found applications in the emerging global mobile radio systems, such as IS-54, JDC, etc. These codecs were typically associated with source-sensitivity matched error protection, where, for example, Steele, Sundberg and Wong [23–26] have provided early insights on the topic. Further sophisticated solutions were suggested, for example, by Hagenauer [27].

Both the narrowband and wideband AMR, as well as the AMR-WB+ codecs [28, 29] are capable of adaptively adjusting their bitrate. This also allows the user to adjust the ratio between the speech bitrate and the channel coding bitrate, the latter constituting the error protection oriented redundancy, according to the prevalent near-instantaneous channel conditions in HSDPA-style transceivers. When the channel quality is inferior, the speech encoder operates at low bitrates, thus accommodating more powerful forward error control within the total bitrate budget. By contrast, under high-quality channel conditions the speech encoder may benefit from using almost the total bitrate budget, yielding high speech quality, since in this high-rate case low-redundancy error protection is sufficient. Thus, the AMR concept allows the system to operate in an error-resilient mode under poor channel conditions, while benefitting from a better speech quality under good channel conditions. Hence, the source coding scheme must be designed for seamless switching between the available rates without annoying artifacts.

Overview of MPEG-4 Audio

The definition of the MPEG-4 audio standard was the culmination of the 60-year research conducted by the global research community, as portrayed in Figure 3, which will be detailed throughout our discussions in the book.
The Moving Picture Experts Group (MPEG) was first established by the International Organization for Standardization (ISO) in 1988 with the aim of developing a full audio-visual coding standard referred to as MPEG-1 [30–32]. The audio-related section of MPEG-1 was designed to encode digital stereo sound, reducing its total bitrate of 1.4 to 1.5 Mbps (depending on the sampling frequency, which was 44.1 kHz or 48 kHz) down to a few hundred kilobits per second [33]. The MPEG-1 standard is structured in layers, from Layer I to III. The higher layers achieve a higher compression ratio, albeit at an increased complexity. Layer I achieves perceptual transparency, i.e. subjective equivalence with the uncompressed original audio signal, at 384 kbps, while Layers II and III achieve a similar subjective quality at 256 kbps and 192 kbps, respectively [34–38].

MPEG-1 was approved in November 1992 and its Layer I and II versions were immediately employed in practical systems. However, the MPEG Audio Layer III, MP3 for short, only became a practical reality a few years later, when multimedia PCs having improved processing capabilities were introduced and the emerging Internet sparked off a proliferation of MP3 compressed teletraffic. This changed the face of the music world and the distribution of music. The MPEG-2 backward compatible audio standard was approved in 1994 [39], providing an improved technology that would allow those who had already launched MPEG-1 stereo audio services to upgrade their system to multichannel mode, optionally also supporting a higher number of channels at a higher compression ratio. Potential applications of the multichannel mode are in the field of quadraphonic music distribution or cinemas. Furthermore, lower sampling frequencies were also incorporated, which include 16, 22.05, 24, 32, 44.1 and 48 kHz [39].
Concurrently, MPEG commenced research into even higher-compression schemes, relinquishing the backward compatibility requirement, which resulted in the MPEG-2 advanced audio coding (AAC) standard in 1997 [40]. This allows those who are not constrained by legacy systems to benefit from an improved multichannel coding scheme. In conjunction with AAC, it is possible to achieve perceptually transparent stereo quality at 128 kbps and transparent multichannel quality at 320 kbps, for example in cinema-type applications.

The MPEG-4 audio recommendation is the latest standard, completed in 1999 [41–45], which offers, in addition to compression, further unique features that will allow users to interact with the information content at a significantly higher level of sophistication than is possible today. In terms of compression, MPEG-4 supports the encoding of speech signals at bitrates from 2 kbps up to 24 kbps. For the coding of general audio, a wide range of bitrates and bandwidths is supported, ranging from a bitrate of 8 kbps and a bandwidth below 4 kHz to broadcast quality audio, including monaural representations up to multichannel configurations.

The MPEG-4 audio codec includes coding tools from several different encoding families, covering parametric speech coding, CELP-based speech coding and time/frequency (T/F) audio coding, which are characterised in Figure 2.

[Figure 2: MPEG-4 framework [41]. Horizontal axis: bit rate (kbps), 2 to 64; vertical axis: typical audio bandwidth (4 kHz, 8 kHz, 20 kHz). Regions shown: parametric codec (HVXC), parametric (HILN), CELP codec, ITU-T codec, T/F codec, scalable codec. Application labels: satellite, secure com, UMTS, cellular, ISDN, Internet.]

It can be observed that a parametric coding scheme, namely Harmonic Vector eXcitation Coding (HVXC), was selected for covering the bitrate range from 2 to 4 kbps. For bitrates between 4 and 24 kbps, a CELP-coding scheme was chosen for encoding narrowband and wideband speech signals.
For encoding general audio signals at bitrates between 8 and 64 kbps, a T/F coding scheme based on the MPEG-2 AAC standard [40] endowed with additional tools is used. Here, a combination of different techniques was established, because it was found that maintaining the required performance for representing speech and music signals at all desired bitrates cannot be achieved by selecting a single coding architecture. A major objective of the MPEG-4 audio encoder is to reduce the bitrate, while maintaining a sufficiently high flexibility in terms of bitrate selection. The MPEG-4 codec also offers other new functionalities, which include bitrate scalability, object-based coding of a specific audio passage (where, for example, a distinct object may be defined as a passage played by a certain instrument), as well as an increased robustness against transmission errors and support for special audio effects.

MPEG-4 consists of Versions 1 and 2. Version 1 [41] contains the main body of the standard, while Version 2 [46] provides further enhancement tools and functionalities, including increased robustness against transmission errors and error protection, low-delay audio coding, finely grained bitrate scalability using the Bit-Sliced Arithmetic Coding (BSAC) tool, the employment of parametric audio coding, the CELP-based silence compression tool and the 4 kbps extended variable bitrate mode of the HVXC tool. Due to the vast amount of information contained in the MPEG-4 standard, we will only consider some of its audio compression components, which include the coding of natural speech and audio signals. Readers who are specifically interested in text-to-speech synthesis or synthetic audio issues are referred to the MPEG-4 standard [41] and to the contributions by Scheirer et al. [47, 48] for further information. Most of the material in Chapter 10 will be based on an amalgam of [34–38,40,41,43,44,46,49].
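
The bitrate ranges quoted above for the MPEG-4 speech and audio coding tools overlap, which is easiest to see in a small lookup. The following Python fragment is only an illustrative sketch: the names `CodingTool`, `MPEG4_TOOLS` and `candidate_tools` are hypothetical, the range boundaries are treated as inclusive by assumption, and the actual tool selection in the standard also depends on the signal type and bandwidth, which is not modelled here.

```python
from typing import List, NamedTuple


class CodingTool(NamedTuple):
    """A coding tool and the bitrate range (kbps) quoted for it in the text."""
    name: str
    min_kbps: float
    max_kbps: float


# Ranges as quoted in the overview; inclusive boundaries are an assumption.
MPEG4_TOOLS = (
    CodingTool("HVXC", 2.0, 4.0),   # parametric speech coding
    CodingTool("CELP", 4.0, 24.0),  # narrowband and wideband speech
    CodingTool("T/F", 8.0, 64.0),   # general audio, based on MPEG-2 AAC
)


def candidate_tools(bitrate_kbps: float) -> List[str]:
    """Return the names of all tools whose quoted range covers the bitrate."""
    return [
        tool.name
        for tool in MPEG4_TOOLS
        if tool.min_kbps <= bitrate_kbps <= tool.max_kbps
    ]


if __name__ == "__main__":
    for rate in (2, 4, 16, 48):
        print(f"{rate} kbps: {candidate_tools(rate)}")
```

Between 8 and 24 kbps both the CELP and the T/F tools are returned, which reflects the statement that no single coding architecture serves speech and music at all desired bitrates.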
In Chapter 10, the operations of each component of the MPEG-4 audio codec will be highlighted in greater detail. As an application example, we will employ the transform-domain weighted interleaved vector quantisation (TWINVQ) coding tool, which is one of the MPEG-4 audio codecs, in the context of a wireless audio transceiver in conjunction with space-time coding [50] and various quadrature amplitude modulation (QAM) schemes [51]. The audio transceiver is introduced in Section 10.5 and its performance is discussed in Section 10.5.6.

Motivation and Outline of this Book

During the early 1990s, Atal, Cuperman and Gersho [52] edited prestigious contributions on speech compression. Also, Ince [53] contributed a book in 1992 related to the topic. Anderson and Mohan co-authored a monograph on source and channel coding in 1993 [54]. Research-oriented developments were then consolidated in Kondoz's excellent monograph in 1994 [55] and in the multi-authored contribution edited by Kleijn and Paliwal [56] in 1995. The most recent addition to the above range of contributions is the second edition of O'Shaughnessy's well-referenced book cited above. However, at the time of writing no book spans the entire history of speech and audio compression, which is the goal of this volume.

Against this backcloth, this book endeavours to review the recent history of speech compression and communications in the era of wireless turbo-transceivers and joint source/channel coding. We attempt to provide the reader with a historical perspective, commencing with a rudimentary introduction to communications aspects, since throughout this book we illustrate the expected performance of the various speech codecs studied also in the context of jointly optimised wireless transceivers.

This book contains four parts. Parts I and II cover classic background material on speech signals, predictive waveform codecs and analysis-by-synthesis codecs as well as the entire speech and audio coding standardisation scene.
The bulk of the book is contained in the research-oriented Parts III and IV, covering both standardised and proprietary speech codecs, including the most recent AMR-WB+ and the MPEG-4 audio codecs, as well as cutting-edge wireless turbo transceivers.

Specifically, Chapters 1 and 2 of Part I provide a rudimentary introduction to speech signals and classic waveform coding as well as to predictive coding, respectively, quantifying the overall performance of the various speech codecs, in order to render our treatment of the topics as self-contained and all-encompassing as possible.

Part II of this book is centred around AbS based coding, reviewing the classic principles in Chapter 3 as well as both narrowband and wideband spectral envelope quantisation in Chapter 4. RPE and CELP coding are the topics of Chapters 5 and 6, which are followed by a detailed chapter on the entire plethora of existing forward-adaptive standardised CELP codecs in Chapter 7 and on their associated source-sensitivity matched channel coding schemes. Chapter 8 covers both proprietary and standard backward-adaptive CELP codecs and is concluded with a system design example based on a low-delay, multimode wireless transceiver.

[Figure 3: Important milestones in the development of perceptual audio coding. Columns: Algorithms/Techniques, Timeline (1940 to 1999), Standards/Commercial Codecs.
Algorithms/Techniques: Fletcher: Auditory patterns [81]; Zwicker, Greenwood: Critical bands [82,83]; Scharf, Hellman: Masking effects [84,85]; Schroeder: Spread of masking [86]; Nussbaumer: Pseudo-Quadrature Mirror Filter [87]; Rothweiler: Polyphase Quadrature Filter [88]; Princen: Time Domain Aliasing Cancellation [89]; Johnston: Perceptual Transform Coding [90]; Mahieux: backward adaptive prediction [91]; Edler: Window switching strategy [92]; Johnston: M/S stereo coding [93]; Malvar: Modified Discrete Cosine Transform [94]; Herre: Intensity Stereo Coding [95]; Iwakami: TWINVQ [96]; Herre & Johnston: Temporal Noise Shaping [97]; Park: Bit-Sliced Arithmetic Coding (BSAC) [98]; Purnhagen: Parametric Audio Coding [99]; Levine & Smith, Verma & Ming: Sinusoidal+Transients+Noise coding [100,101].
Standards/Commercial Codecs: CNET codec [91]; AT&T: Perceptual Audio Coder (PAC) [102]; Dolby AC-2 [103]; Dolby AC-3 [103]; MPEG-1 Audio finalized [104]; Sony: MiniDisc: Adaptive Transform Acoustic Coding (ATRAC) [105]; Philips: Digital Compact Cassette (DCC) [106]; MPEG-2 backward compatible [107]; NTT: Transform-domain Weighted Interleaved Vector Quantization (TWINVQ) [96,108]; MPEG-2 Advanced Audio Coding (AAC) [109]; MPEG-4 Version 1 & 2 finalized [110,111].]

The research-oriented Part III of this book is dedicated to a range of standard and proprietary wideband coding techniques and wireless systems. As an introduction to the wideband coding scene, in Chapter 9 the classic sub-band-based G.722 wideband codec is reviewed first, leading to the discussion of numerous low-rate wideband voice and audio codecs.
Chapter 9 also contains diverse sophisticated wireless voice- and audio-system design examples, including a turbo-coded OFDM wideband audio system design study. This is followed by a wideband voice transceiver application example using the AMR-WB codec, a source-sensitivity matched Irregular Convolutional Code (IRCC) and extrinsic information transfer (EXIT) charts for achieving a near-capacity system performance. Chapter 9 is concluded with the portrayal of the AMR-WB+ codec. In Chapter 10 of Part III we detail the principles behind the MPEG-4 codec and comparatively study the performance of the MPEG-4 and AMR-WB audio/speech codecs combined with various sophisticated wireless transceivers. Amongst others, a jointly optimised source-coding, outer unequal protection non-systematic convolutional (NSC) channel-coding, inner trellis coded modulation (TCM) and spatial diversity aided space-time trellis coded (STTC) turbo transceiver is investigated. The employment of TCM provides further error protection without expanding the bandwidth of the system, while the STTC scheme attains spatial diversity, which renders the error statistics experienced pseudo-random, as required by the TCM scheme, since it was designed for Gaussian channels inflicting randomly dispersed channel errors. Finally, the performance of the STTC-TCM-2NSC scheme is enhanced with the aid of an efficient iterative joint decoding structure.

Chapters 11–17 of Part IV are all dedicated to sub-4 kbps codecs and their wireless transceivers, while Chapter 18 is devoted to speech quality evaluation techniques as well as to a rudimentary comparison of various speech codecs and transceivers. The last chapter of the book is on VoIP.

This book is naturally limited in terms of its coverage of these aspects, simply owing to space limitations.
We endeavoured, however, to provide the reader with a broad range of application examples, which are pertinent to a range of typical wireless transmission scenarios.

Our hope is that this book offers you, the reader, a range of interesting topics, portraying the current state-of-the-art in the associated enabling technologies. In simple terms, finding a specific solution to a voice communications problem has to be based on a compromise in terms of the inherently contradictory constraints of speech quality, bitrate, delay, robustness against channel errors, and the associated implementational complexity. Analysing these trade-offs and proposing a range of attractive solutions to various voice communications problems is the basic aim of this book.

Again, it is our hope that this book underlines the range of contradictory system design trade-offs in an unbiased fashion and that you will be able to glean information from it, in order to solve your own particular wireless voice communications problem, but most of all that you will find it an enjoyable and relatively effortless read, providing you, the reader, with intellectual stimulation.

Lajos Hanzo
Clare Somerville
Jason Woodard

Acknowledgements

The book has been conceived in the Electronics and Computer Science Department at the University of Southampton, although Dr Somerville and Dr Woodard have moved on in the meantime. We are indebted to our many colleagues who have enhanced our understanding of the subject, in particular to Professor Emeritus Raymond Steele. These colleagues and valued friends, too numerous all to be mentioned, have influenced our views concerning various aspects of wireless multimedia communications and we thank them for the enlightenment gained from our collaborations on various projects, papers and books.
We are grateful to Jan Brecht, Jon Blogh, Marco Breiling, Marco del Buono, Sheng Chen, Stanley Chia, Byoung Jo Choi, Joseph Cheung, Peter Fortune, Sheyam Domeya, Lim Dongmin, Dirk Didascalou, Stephan Ernst, Eddie Green, David Greenwood, Hee Thong How, Thomas Keller, Ee-Lin Kuan, Joerg Kliewer, W.H. Lam, C.C. Lee, M.A. Nofal, Xiao Lin, Chee Siong Lee, Tong-Hooi Liew, Soon-Xin Ng, Matthias Muenster, Noor Othman, Vincent Roger-Marchart, Redwan Salami, David Stewart, Jeff Torrance, Spiros Vlahoyiannatos, Jin Wang, William Webb, John Williams, Jason Woodard, Choong Hin Wong, Henry Wong, James Wong, Lie-Liang Yang, Bee-Leong Yeap, Mong-Suan Yee, Kai Yen, Andy Yuen and many others with whom we enjoyed an association.

We also acknowledge our valuable associations with the Virtual Centre of Excellence in Mobile Communications, in particular with its Chief Executives, Dr Tony Warwick and Dr Walter Tuttlebee, Dr Keith Baughan and other members of its Executive Committee, Professors Hamid Aghvami, Mark Beach, John Dunlop, Barry Evans, Joe McGeehan, Steve MacLaughlin and Rahim Tafazolli. Our sincere thanks are also due to John Hand and Nafeesa Simjee, the EPSRC, UK; Dr Joao Da Silva, Dr Jorge Pereira, Bartholome Arroyo, Bernard Barani, Demosthenes Ikonomou and other colleagues from the Commission of the European Communities, Brussels, Belgium; Andy Wilton, Luis Lopes and Paul Crichton from Motorola ECID, Swindon, UK for sponsoring some of our recent research.

We feel particularly indebted to Hee Thong How for his invaluable contributions to the book by co-authoring some of the chapters and to Rita Hanzo as well as Denise Harvey for their skillful assistance in typesetting the manuscript in LaTeX. Similarly, our sincere thanks are due to Mark Hammond, Jennifer Beal, Sarah Hinton and a number of other staff from John Wiley & Sons for their kind assistance throughout the preparation of the camera-ready manuscript.
Finally, our sincere gratitude is due to the numerous authors listed in the Author Index, as well as to those whose work was not cited due to space limitations, for their contributions to the state-of-the-art, without whom this book would not have materialised.

Lajos Hanzo
Clare Somerville
Jason Woodard

Part I
Speech Signals and Waveform Coding

Chapter 1
Speech Signals and an Introduction to Speech Coding

1.1 Motivation of Speech Compression

According to the lessons of information theory, the minimum bitrate at which the condition of distortionless transmission of any source signal is possible is determined by the entropy of the speech source message. Note, however, that in practical terms the source rate corresponding to the entropy is only asymptotically achievable as the encoding memory length or delay tends to infinity. Any further compression is associated with information loss or coding distortion. Many practical source compression techniques employ so-called lossy coding, which typically guarantees further bitrate economy at the cost of nearly imperceptible speech, audio, video, etc., source representation degradation.

Note that the optimum Shannonian source encoder generates a perfectly uncorrelated source coded stream, where all the source redundancy has been removed. Therefore the encoded source symbols, which are in most practical cases constituted by binary bits, are independent and each one has the same significance. Having the same significance implies that the corruption of any of the source encoded symbols results in identical source signal distortion over imperfect channels.

Under these conditions, according to Shannon's fundamental work [57–59], best pro-t
