Digital Speech Coding for Low Bit Rate Communication Systems Second Edition A. M. Kondoz University of Surrey, UK.

Upload: truonghuong

Post on 20-Sep-2018

  • Digital Speech Coding for Low Bit Rate Communication Systems

    Second Edition

    A. M. Kondoz, University of Surrey, UK.

  • Copyright 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

    Telephone (+44) 1243 779777

    Email (for orders and customer service enquiries): [email protected]

    Visit our Home Page on www.wileyeurope.com or www.wiley.com

    All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

    This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    Other Wiley Editorial Offices

    John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

    Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

    Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

    John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

    John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

    John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

    Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    ISBN 0-470-87008-7 (PB)

    Typeset in 11/13pt Palatino by Laserwords Private Limited, Chennai, India

    Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire

    This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

    http://www.wileyeurope.com

    http://www.wiley.com

  • To my mother Fatma, my wife Munise, and our children Mustafa and Fatma

  • Contents

    Preface

    Acknowledgements

    1 Introduction

    2 Coding Strategies and Standards
        2.1 Introduction
        2.2 Speech Coding Techniques
            2.2.1 Parametric Coders
            2.2.2 Waveform-approximating Coders
            2.2.3 Hybrid Coding of Speech
        2.3 Algorithm Objectives and Requirements
            2.3.1 Quality and Capacity
            2.3.2 Coding Delay
            2.3.3 Channel and Background Noise Robustness
            2.3.4 Complexity and Cost
            2.3.5 Tandem Connection and Transcoding
            2.3.6 Voiceband Data Handling
        2.4 Standard Speech Coders
            2.4.1 ITU-T Speech Coding Standard
            2.4.2 European Digital Cellular Telephony Standards
            2.4.3 North American Digital Cellular Telephony Standards
            2.4.4 Secure Communication Telephony
            2.4.5 Satellite Telephony
            2.4.6 Selection of a Speech Coder
        2.5 Summary
        Bibliography

    3 Sampling and Quantization
        3.1 Introduction
        3.2 Sampling
        3.3 Scalar Quantization
            3.3.1 Quantization Error
            3.3.2 Uniform Quantizer
            3.3.3 Optimum Quantizer
            3.3.4 Logarithmic Quantizer
            3.3.5 Adaptive Quantizer
            3.3.6 Differential Quantizer
        3.4 Vector Quantization
            3.4.1 Distortion Measures
            3.4.2 Codebook Design
            3.4.3 Codebook Types
            3.4.4 Training, Testing and Codebook Robustness
        3.5 Summary
        Bibliography

    4 Speech Signal Analysis and Modelling
        4.1 Introduction
        4.2 Short-Time Spectral Analysis
            4.2.1 Role of Windows
        4.3 Linear Predictive Modelling of Speech Signals
            4.3.1 Source Filter Model of Speech Production
            4.3.2 Solutions to LPC Analysis
            4.3.3 Practical Implementation of the LPC Analysis
        4.4 Pitch Prediction
            4.4.1 Periodicity in Speech Signals
            4.4.2 Pitch Predictor (Filter) Formulation
        4.5 Summary
        Bibliography

    5 Efficient LPC Quantization Methods
        5.1 Introduction
        5.2 Alternative Representation of LPC
        5.3 LPC to LSF Transformation
            5.3.1 Complex Root Method
            5.3.2 Real Root Method
            5.3.3 Ratio Filter Method
            5.3.4 Chebyshev Series Method
            5.3.5 Adaptive Sequential LMS Method
        5.4 LSF to LPC Transformation
            5.4.1 Direct Expansion Method
            5.4.2 LPC Synthesis Filter Method
        5.5 Properties of LSFs
        5.6 LSF Quantization
            5.6.1 Distortion Measures
            5.6.2 Spectral Distortion
            5.6.3 Average Spectral Distortion and Outliers
            5.6.4 MSE Weighting Techniques
        5.7 Codebook Structures
            5.7.1 Split Vector Quantization
            5.7.2 Multi-Stage Vector Quantization
            5.7.3 Search Strategies for MSVQ
            5.7.4 MSVQ Codebook Training
        5.8 MSVQ Performance Analysis
            5.8.1 Codebook Structures
            5.8.2 Search Techniques
            5.8.3 Perceptual Weighting Techniques
        5.9 Inter-frame Correlation
            5.9.1 LSF Prediction
            5.9.2 Prediction Order
            5.9.3 Prediction Factor Estimation
            5.9.4 Performance Evaluation of MA Prediction
            5.9.5 Joint Quantization of LSFs
            5.9.6 Use of MA Prediction in Joint Quantization
        5.10 Improved LSF Estimation Through Anti-Aliasing Filtering
            5.10.1 LSF Extraction
            5.10.2 Advantages of Low-pass Filtering in Moving Average Prediction
        5.11 Summary
        Bibliography

    6 Pitch Estimation and Voiced-Unvoiced Classification of Speech
        6.1 Introduction
        6.2 Pitch Estimation Methods
            6.2.1 Time-Domain PDAs
            6.2.2 Frequency-Domain PDAs
            6.2.3 Time- and Frequency-Domain PDAs
            6.2.4 Pre- and Post-processing Techniques
        6.3 Voiced-Unvoiced Classification
            6.3.1 Hard-Decision Voicing
            6.3.2 Soft-Decision Voicing
        6.4 Summary
        Bibliography

    7 Analysis by Synthesis LPC Coding
        7.1 Introduction
        7.2 Generalized AbS Coding
            7.2.1 Time-Varying Filters
            7.2.2 Perceptually-based Minimization Procedure
            7.2.3 Excitation Signal
            7.2.4 Determination of Optimum Excitation Sequence
            7.2.5 Characteristics of AbS-LPC Schemes
        7.3 Code-Excited Linear Predictive Coding
            7.3.1 LPC Prediction
            7.3.2 Pitch Prediction
            7.3.3 Multi-Pulse Excitation
            7.3.4 Codebook Excitation
            7.3.5 Joint LTP and Codebook Excitation Computation
            7.3.6 CELP with Post-Filtering
        7.4 Summary
        Bibliography

    8 Harmonic Speech Coding
        8.1 Introduction
        8.2 Sinusoidal Analysis and Synthesis
        8.3 Parameter Estimation
            8.3.1 Voicing Determination
            8.3.2 Harmonic Amplitude Estimation
        8.4 Common Harmonic Coders
            8.4.1 Sinusoidal Transform Coding
            8.4.2 Improved Multi-Band Excitation, INMARSAT-M Version
            8.4.3 Split-Band Linear Predictive Coding
        8.5 Summary
        Bibliography

    9 Multimode Speech Coding
        9.1 Introduction
        9.2 Design Challenges of a Hybrid Coder
            9.2.1 Reliable Speech Classification
            9.2.2 Phase Synchronization
        9.3 Summary of Hybrid Coders
            9.3.1 Prototype Waveform Interpolation Coder
            9.3.2 Combined Harmonic and Waveform Coding at Low Bit-Rates
            9.3.3 A 4 kb/s Hybrid MELP/CELP Coder
            9.3.4 Limitations of Existing Hybrid Coders
        9.4 Synchronized Waveform-Matched Phase Model
            9.4.1 Extraction of the Pitch Pulse Location
            9.4.2 Estimation of the Pitch Pulse Shape
            9.4.3 Synthesis using Generalized Cubic Phase Interpolation
        9.5 Hybrid Encoder
            9.5.1 Synchronized Harmonic Excitation
            9.5.2 Advantages and Disadvantages of SWPM
            9.5.3 Offset Target Modification
            9.5.4 Onset Harmonic Memory Initialization
            9.5.5 White Noise Excitation
        9.6 Speech Classification
            9.6.1 Open-Loop Initial Classification
            9.6.2 Closed-Loop Transition Detection
            9.6.3 Plosive Detection
        9.7 Hybrid Decoder
        9.8 Performance Evaluation
        9.9 Quantization Issues of Hybrid Coder Parameters
            9.9.1 Introduction
            9.9.2 Unvoiced Excitation Quantization
            9.9.3 Harmonic Excitation Quantization
            9.9.4 Quantization of ACELP Excitation at Transitions
        9.10 Variable Bit Rate Coding
            9.10.1 Transition Quantization with 4 kb/s ACELP
            9.10.2 Transition Quantization with 6 kb/s ACELP
            9.10.3 Transition Quantization with 8 kb/s ACELP
            9.10.4 Comparison
        9.11 Acoustic Noise and Channel Error Performance
            9.11.1 Performance under Acoustic Noise
            9.11.2 Performance under Channel Errors
            9.11.3 Performance Improvement under Channel Errors
        9.12 Summary
        Bibliography

    10 Voice Activity Detection
        10.1 Introduction
        10.2 Standard VAD Methods
            10.2.1 ITU-T G.729B/G.723.1A VAD
            10.2.2 ETSI GSM-FR/HR/EFR VAD
            10.2.3 ETSI AMR VAD
            10.2.4 TIA/EIA IS-127/733 VAD
            10.2.5 Performance Comparison of VADs
        10.3 Likelihood-Ratio-Based VAD
            10.3.1 Analysis and Improvement of the Likelihood Ratio Method
            10.3.2 Noise Estimation Based on SLR
            10.3.3 Comparison
        10.4 Summary
        Bibliography

    11 Speech Enhancement
        11.1 Introduction
        11.2 Review of STSA-based Speech Enhancement
            11.2.1 Spectral Subtraction
            11.2.2 Maximum-likelihood Spectral Amplitude Estimation
            11.2.3 Wiener Filtering
            11.2.4 MMSE Spectral Amplitude Estimation
            11.2.5 Spectral Estimation Based on the Uncertainty of Speech Presence
            11.2.6 Comparisons
            11.2.7 Discussion
        11.3 Noise Adaptation
            11.3.1 Hard Decision-based Noise Adaptation
            11.3.2 Soft Decision-based Noise Adaptation
            11.3.3 Mixed Decision-based Noise Adaptation
            11.3.4 Comparisons
        11.4 Echo Cancellation
            11.4.1 Digital Echo Canceller Set-up
            11.4.2 Echo Cancellation Formulation
            11.4.3 Improved Performance Echo Cancellation
        11.5 Summary
        Bibliography

    Index

  • Preface

    Speech has remained the most desirable medium of communication between humans. Nevertheless, analogue telecommunication of speech is a cumbersome and inflexible process when transmission power and spectral utilization, the foremost resources in any communication system, are considered. Digital transmission of speech is more versatile, providing the opportunity of achieving lower costs, consistent quality, security and spectral efficiency in the systems that exploit it. The first stage in the digitization of speech involves sampling and quantization. While the minimum sampling frequency is limited by the Nyquist criterion, the number of quantizer levels is generally determined by the degree of faithful reconstruction (quality) of the signal required at the receiver. For speech transmission systems, these two limitations lead to an initial bit rate of 64 kb/s, the PCM system. Such a high bit rate restricts the much desired spectral efficiency.
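
    As a rough worked example of where the 64 kb/s figure comes from, the sketch below assumes the conventional telephony values of a 4 kHz band limit (hence 8 kHz sampling under the Nyquist criterion) and 8 bits per sample. These numbers are not stated in the preface itself, and the function name is purely illustrative.

```python
def pcm_bit_rate(bandwidth_hz: int, bits_per_sample: int) -> int:
    """Return the PCM bit rate in bit/s when sampling at the Nyquist rate."""
    # Nyquist criterion: sample at no less than twice the signal bandwidth.
    sampling_rate_hz = 2 * bandwidth_hz
    # Each sample is quantized to a fixed number of bits.
    return sampling_rate_hz * bits_per_sample


# Assumed telephony values (not from the text): 4 kHz bandwidth, 8 bits/sample.
rate = pcm_bit_rate(4000, 8)
print(f"{rate / 1000:.0f} kb/s")  # 64 kb/s
```

    Halving the number of bits per sample, or narrowing the band, lowers the rate proportionally, which is why coders that aim well below 64 kb/s have to move away from sample-by-sample quantization.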

    The last decade has witnessed the emergence of new fixed and mobile telecommunication systems for which spectral efficiency is a prime mover. This has fuelled the need to reduce the PCM bit rate of speech signals. Digital coding of speech and the bit rate reduction process have thus emerged as an important area of research. This research largely addresses the following problems:

    Although it is very attractive to reduce the PCM bit rate as much as possible, it becomes increasingly difficult to maintain acceptable speech quality as the bit rate falls.

    As the bit rate falls, acceptable speech quality can only be maintained by employing very complex algorithms, which are difficult to implement in real-time even with new fast processors with their associated high cost and power consumption, or by incurring excessive delay, which may create echo control problems elsewhere in the system.

    In order to achieve low bit rates, parameters of a speech production and/or perception model are encoded and transmitted. These parameters are however extremely sensitive to channel corruption. On the other hand, the systems in which these speech coders are needed typically operate