
    POLITECNICO DI TORINO

    ANALOG AND TELECOMMUNICATION ELECTRONICS

    MINIPROJECT:

     

SPEECH CODING: TECHNIQUES, STANDARDS AND APPLICATIONS

    PROFESSOR: DANTE DEL CORSO

    STUDENT: NADIA PERRECA, ID: 211012

ACADEMIC YEAR: 2013-2014


    INDEX

INTRODUCTION

CHAPTER I: SPEECH CODING

I.1 Speech signal

I.2 Speech processing

I.3 Speech coding

I.4 Speech coding standards

I.5 Parametric representations

I.6 Waveform representations

I.7 Methods of comparison of speech coding techniques

CHAPTER II: PULSE CODE MODULATION TECHNIQUES

II.1 Pulse Code Modulation

II.2 Linear PCM

II.3 Logarithmic PCM

II.3.1 A and μ conversion laws

II.4 Differential PCM

II.5 Adaptive Differential PCM

II.6 Time division multiplexing

APPENDIX

BIBLIOGRAPHY


    INTRODUCTION

Speech signals are perhaps the most natural and common signals we can imagine dealing with; that's why, in the sphere of Information Technologies, voice communication has always played a very important role. Speech signals are unpredictable: their values and characteristics vary greatly with the speaker and the message to be transmitted, so specific techniques are needed to simplify their complicated processing. Among these techniques, speech coding is the process of obtaining a compact representation of voice signals for efficient transmission over band-limited wired and wireless channels, for storage, and for many other applications. "Compact" is not a simple adjective but a key word: the goal of speech coding is to represent speech in digital form with as few bits per second as possible without losing the intelligibility and "pleasantness" of speech, which include speaker identity, emotions, intonation, timbre and so on. The need for compactness is due to the technological transition from analog to digital electronics.

In the past, speech coding techniques were implemented and optimized for networks "dedicated" to telephone traffic; today, speech coders have become essential components in both telecommunications and multimedia infrastructures. Commercial systems that rely on efficient speech coding include cellular communication, voice over internet protocol (VoIP), videoconferencing, electronic toys and so on.

The aim of this project is to analyze the basic characteristics of speech signals and the most common speech signal processing techniques, with particular attention to speech coding as a data compression technique. We'll give some information about the different types of speech signal representations and the methods we can use to compare them. Then, we'll focus on the different Pulse Code Modulation techniques and their application to speech coding, in order to point out the benefits and drawbacks of each technique and the differences among them. In the end, we'll analyze the Time Division Multiplexing technique, which is one of the most important current applications of Pulse Code Modulation.


    CHAPTER I: SPEECH CODING

This chapter is an introduction to speech coding techniques, standards and applications. We'll analyze the basic characteristics of speech signals and the most common speech signal processing techniques, with particular attention to speech coding as a data compression technique. We'll give some information about the different types of speech signal representations and the methods we can use to compare them.

    I.1 SPEECH SIGNAL

A speech signal is created by the vocal cords, travels through the vocal tract, leaves the speaker's mouth and reaches the listener's ear as a pressure wave.

From an engineering point of view, we can model speech production with a source-filter model: we can see the vocal cords as a source and the vocal tract as a resonant cavity. If you placed a microphone right above someone's glottis during voicing, you would hear the glottal source by itself as a buzzing sound. The vocal tract filters the sound energy by suppressing some components of the glottal wave and amplifying the ones that are close to the resonance frequencies of the vocal tract, which depend on its shape and length. In this way, it changes the sound quality of the complex wave produced by the sound source. That means that when we talk about the speech signal, we mean a sort of filtered version of the really emitted sound.

In Figure 1, we can see a representation of a speech signal.

    Figure 1: Speech signal.


We can divide speech sounds into voiced and unvoiced: voiced signals are produced when the vocal cords vibrate during the pronunciation of a phoneme, while unvoiced signals do not entail the use of the vocal cords. Voiced signals tend to be louder, like the vowels; on the other hand, unvoiced signals tend to be more abrupt, like the stop consonants. The production of voiced and unvoiced speech is separated by silence regions: during a silence region, no excitation is supplied to the vocal tract and hence there is no speech output. However, silence is an integral part of the speech signal: even if from an energy point of view it's unimportant, its duration is essential for intelligible speech and it helps to recognize certain categories of sounds. Without silence regions between voiced and unvoiced speech, the speech would not be intelligible.

A first distinction among voiced, unvoiced and silence regions can be made by looking at the signal amplitude: if it's low or negligible, the frame can be marked as silence; if it's higher but below a threshold level, usually chosen by the user according to the previously observed characteristics of the sound under study, it is marked as unvoiced; if it exceeds that threshold, it is declared to be voiced. In Figure 2, this classification is illustrated.

Figure 2: Speech signal: voiced, unvoiced and silence regions.
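As a concrete illustration of this amplitude-based classification, here is a minimal Python sketch (not part of the original discussion) that labels each frame of a signal by comparing its short-time energy against two thresholds; the frame length and the threshold values are arbitrary illustrative assumptions that should be tuned to the recording at hand.

    import numpy as np

    def classify_frames(signal, frame_len=200, t_silence=1e-4, t_voiced=1e-2):
        """Label each frame as silence, unvoiced or voiced by comparing
        its average power with two illustrative thresholds."""
        labels = []
        for start in range(0, len(signal) - frame_len + 1, frame_len):
            frame = signal[start:start + frame_len]
            energy = np.mean(frame ** 2)      # short-time average power
            if energy < t_silence:
                labels.append("silence")
            elif energy < t_voiced:
                labels.append("unvoiced")
            else:
                labels.append("voiced")
        return labels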

As we can see from the previous figures, speech is not a predictable signal; from an analytic point of view, this means that it's a non-stationary signal with a non-uniform probability distribution, even if sometimes it's approximated with a Gaussian distribution. Its characteristics vary quickly and depend on the emitted sound; this makes the speech signal hard to analyze and model.

A quite common practical solution consists in modeling the speech signal as a slowly varying function of time: during intervals of 5 to 25 ms, the speech characteristics hopefully don't change too much and we can consider them to be almost constant. That means that, over small time intervals, the speech signal can be considered stationary with good approximation. Over these windows we can analyze the signal spectrum and the power density distribution, and we can distinguish voiced and unvoiced sounds.


    A general block diagram is shown in Figure 3.

    Figure 3: Block diagram of the analysis of speech signal by using frames.

The energy of speech signals is concentrated in the band 300 Hz - 3.4 kHz; they have a low-pass trend.

    Even if it’s difficult to recognize this fact in the time domain, voiced signal waveforms are

    periodic signals, so they have a line spectrum. On the contrary, unvoiced signal spectrum is

    continuous. There may be regions where the speech can be mixed version of voiced and unvoiced

    speech. In mixed speech, the speech signal will look like unvoiced speech, but you will also observe

    some periodic structures. We can see the voiced speech as the useful signal and the unvoiced speech

    as a sort of noise; usually, it is modeled as a White Gaussian Random Variable.

    In Figure 4 we can see the signal for the speech word “six”; if we consider a frame during the

    pronunciation of the consonant “s” (unvoiced signal), the signal appears continuous, as a sort of

    noise, while if we analyze a frame during the pronunciation of the vocal “i” (voiced signal), we can

    see a more regular signal.

Figure 4: Spoken word "six".


    I.2 SPEECH PROCESSING

Speech processing is the study of speech signals and of the methods used to process them. It involves the study of the techniques we use to deal with speech signals and of all the applications they are suitable for. There are analog and digital speech signal processing techniques. At the very beginning, speech signal processing was developed using analog electronics; in fact, all 1G systems have analog speech transmission. Since the 1970s, signal processing has increasingly been implemented on computers in the digital domain; in fact, all 2G and 3G systems have digital speech transmission.

Digital speech signal processing techniques are easier, less expensive, more sophisticated and faster than the analog ones. They allow an improvement in speech quality, are reliable and very compact, and can be implemented in Integrated Circuits. Moreover, with regard to the transmission of voice signals requiring security, the digital representation has a distinct advantage over analog systems: the information bits can be scrambled in a manner which can ultimately be unscrambled at the receiver.

    Digital signal processing techniques can be applied in many speech communication areas:

    - Transmission and storage;

    - Speaker verification and identification;

    - Speech recognition (conversion from speech to written text);

    - Aids to the handicapped;

    - Enhancement of signal quality.

A general scheme which includes the fundamental signal processing is shown in Figure 5: a speech coder converts the analog speech signal into a coded digital representation, which is usually transmitted in frames over a non-distorting channel. A speech decoder receives the coded frames and synthesizes the reconstructed speech.

    Figure 5: Block diagram of a general speech signal processing.


    I.3 SPEECH CODING

As we can see in Figure 5, speech signal coding is a very important step in speech signal processing. It involves the study of all the techniques used to represent and treat speech signals in a more convenient form that allows us to use them for several applications such as acquisition, manipulation, storage, transfer and so on.

In general, a speech signal coder has two essential characteristics:

- Integrity of the speech: the information contained in the speech signal must be kept intact, without distortions.

- Quality: the speech signal must be intelligible and pleasant, which means that things such as speaker identity, emotions, intonation, timbre and so on must be recognizable.

In addition, it can have several desirable properties, such as:

    - Low bit-rate;

    - Low memory requirements;

    - Low transmission power required;

    - Fast transmission speed;

    - Low computational complexity;

    - Low coding delay;

    - Robustness.

We can easily understand why all these properties are desirable. If we have a low bit-rate coder, less bandwidth is required for the transmission. Moreover, we reduce the amount of transmitted data, saving memory and transmission power and increasing the transmission speed. If the coder has a low computational complexity, the required power decreases further. The coding delay is the time that elapses from the moment a speech sample arrives at the encoder input to the moment it appears at the decoder output, so it's clear that we want a coding delay that is as low as possible, in order to minimize interferences and interruptions during the communication. A speech coder is robust if it's suitable for any type of speaker (male, female, children) and for many different languages. That's a very difficult property to satisfy, because in general we need different circuits and devices to deal with different types of speech sound, according to their complexity.

However, we have to keep in mind that there is always a tradeoff between one property and another, in particular between low bit-rate and speech quality. In general, we have to design the system according to the given specifications.


There are essentially two types of digital speech signal coding: waveform representations and parametric representations. As we can see in Figure 6, both of them involve a series of techniques that we'll analyze in the next sections.

    Figure 6: Speech signal coding techniques

     

Waveform representations, as the name implies, are concerned with preserving the wave shape of the analog speech signal through a sampling and quantization process.

Parametric representations, on the other hand, are concerned with representing the speech signal by using some of its characteristic parameters.

There are also hybrid representations, which are a fusion of the two illustrated coding techniques, but we won't analyze them.

In the study of speech signal processing, speech coding is a very important matter, especially in the telecommunications area.

Speech coding is the process of obtaining a compact representation of the speech signal that can be efficiently transmitted over band-limited wired and wireless channels or stored in digital media. "Compact" is not a simple adjective but a key word: the goal of speech coding is to represent speech in digital form with as few bits as possible without losing the intelligibility and "pleasantness" of speech, which include speaker identity, emotions, intonation, timbre and so on.


Other requirements, such as low coding delay, good performance, low complexity and low losses, depend on the particular application we're dealing with.

In general, we can recognize parametric representations as a form of speech coding, but we'll refer more in detail to the ways we have to reduce the bit rate of waveform representations, because they preserve the quality of the signal. We'll see that the standard bit-rate for a waveform representation is fixed at 64 kb/s: any bit-rate below 64 kb/s is treated as compression, and the output of the source encoder is an encoded speech signal having a bit-rate lower than 64 kb/s.

If we compress the speech signal by reducing the number of bits per sample, we obtain a lot of benefits:

- Reduction of the bandwidth;

- Reduction of the transmitted data (memory occupation);

- Reduction of the required transmission power;

- Increase of the transmission speed;

- Increase of the immunity to noise (some of the saved bits per sample can be used as error control bits protecting the speech parameters).

Usually we can distinguish four levels of quality, according to the bit rate:

- TOLL: perfect quality;

- NEAR TOLL: almost perfect;

- DIGITAL CELLULAR: some background noise is introduced, but the speech is still very well reconstructed;

- LOW BIT RATE: speech sounds noisy, artificial and unnatural, but is still understandable.

In Table 1, we can see some examples of the cited speech coding techniques with their bit rates and quality.

Table 1: Speech coding bit rate and quality.

Speech coding | Bit rate (kb/s) | Quality
PCM           | 64/32           | TOLL
DPCM          | 32/16           | NEAR TOLL
ADPCM         | 4               | DIGITAL CELLULAR
Vocoder       | 4.2             | LOW BIT RATE
LPC-10        | 2.4             | LOW BIT RATE


In the past, speech coding techniques were implemented and optimized for networks "dedicated" to telephone traffic; the growing need for integration between telephony and data, however, drives the study of new standards, such as voice over IP (Internet Protocol), able to ensure quality levels comparable to those offered by the old telephone network. Satellite communication systems, where the cost of the channel is very high; mobile systems, where the number of users grows exponentially; and multimedia systems, whose information content requires considerable mass memory: all of these are applications for which it is necessary to introduce voice encoding processes.

We can summarize all the areas of application in a single graph, shown in Figure 7.


    I.4 SPEECH CODING STANDARDS

Standards for the landline Public Switched Telephone Network (PSTN) are established by the International Telecommunication Union (ITU), a United Nations agency; video and audio coding standards such as those of the Moving Picture Experts Group (MPEG) are instead developed by the International Organization for Standardization (ISO). The ITU has promulgated a number of important speech and waveform coding standards at high bit rates and with very low delay, including:

- G.711: it standardizes PCM at 64 kb/s, in which a logarithmic (A-law or μ-law) quantization is used for the discretization of the amplitudes with 8 bits per sample;

- G.721: it standardizes ADPCM, halving the bit-rate to 32 kb/s while maintaining the same encoding quality;

- G.722: it standardizes ADPCM at 64 kb/s, using two ADPCM coders at 32 kb/s, one in the 0-4 kHz band and the other in the 4-7 kHz band;

- G.723.1: it provides two operating speeds: one at 6.3 kb/s and the other at 5.3 kb/s.

Standards for cellular telephony in Europe are established by the European Telecommunications Standards Institute (ETSI). The ETSI has standardized speech coding algorithms for digital mobile communication, published by the Global System for Mobile Communications (GSM) subcommittee. All speech coding standards for digital cellular telephony are based on LPC-AS algorithms.

The first GSM standard coder was based on a precursor of CELP called regular-pulse excitation with long-term prediction. In 1999 a new mobile network with global coverage, called UMTS (Universal Mobile Telecommunication System), was designed; for it, the ETSI proposed a new coding standard called AMR (Adaptive Multi-Rate), which uses an encoder that generates adaptive traffic flows at eight different rates (from 12.2 kb/s down to 4.75 kb/s), depending on the operating conditions.

In Table 2 we can see some applications of the illustrated standards.

    In Table 2 we can see some applications of illustrated standards.


Table 2: Speech coding for several applications.

Application        | Bandwidth (kHz) | Bit rate (kb/s) | Standards organization | Standard number | Algorithm | Year
Landline telephone | 3.4             | 64              | ITU                    | G.711           | PCM       | 1988
Video conferencing | 7               | 64 (32+32)      | ITU                    | G.722           | ADPCM     | 1988
Digital cellular   | 3.4             | 8               | ITU                    | G.729           | ACELP     | 1996
Digital cellular   | 3.4             | 12.2            | ETSI                   | EFR             | ACELP     | 1997
VoIP               | 3.4             | 5.3-6.3         | ITU                    | G.723.1         | CELP      | 1996

    I.5 PARAMETRIC REPRESENTATIONS

Parametric representations are concerned with representing the speech signal by using some parameters which are obtained by analyzing the speech signal spectrum.

The idea is that a sampled speech signal contains a great deal of information that is either redundant (nonzero mutual information between successive samples) or perceptually irrelevant (information that is not perceived by human listeners): if we succeed in describing the signal by using some of its characteristic parameters, we can transmit these parameters instead of the signal itself. The parameters of a signal change more slowly than the signal they describe and they are few in number, so we can save bandwidth and increase the speed of transmission. In this way we obtain a lot of benefits in terms of transmitted data, memory saving, transmission power, speed and so on.

Clearly, there are drawbacks in terms of quality: the output signal is not a reconstruction of the input signal based on its samples, but on some parameters which describe it indirectly; we're not able to recreate the original speech, but only a dehumanized version of it.

In order to obtain these parameters, we need to describe speech production with a mathematical model, like the one introduced in Section I.1. This model is called the source-filter model because we model the vocal cords as a source and the vocal tract as a resonant cavity. The vocal tract filters the sound energy by suppressing some components of the glottal wave and amplifying others, the ones that are close to the resonance frequencies of the vocal tract, which depend on its shape and length. In this way, it changes the sound quality of the complex wave produced by the sound source.

    Figure 8: Source – Filter model.

The next step consists in representing voiced and unvoiced signals.

If we segment the speech signal into frames of small time duration, in which it can be considered a stationary signal (as seen in Section I.1), we can examine one frame at a time. The duration of a single frame must be short enough that the properties of the sound do not change significantly within it, and long enough to be able to calculate the parameters that we want to estimate (which is also useful for reducing the effect of any noise affecting the signal). Moreover, the series of windows should cover the entire signal, as shown in Figure 9.

    Figure 9: Framed speech signal.

We can use different types of window: this choice, clearly, influences the quality of the analysis. The simplest one is the rectangular window, but this choice can produce large fluctuations of the parameters we are interested in. For example, if we're measuring the energy of the signal and the frame shifts, the part of the signal that is contained in the new frame can assume higher values than it did in the previous frame, causing a big difference in the signal energy. So, an alternative to the rectangular window is the Hanning window: by tapering the ends of the window, we avoid large effects on the parameters even if the signal suddenly changes. Both windows are shown in Figure 10, and a sketch of the framing procedure follows the figure.

    Figure 10: Different types of windows.
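To make the framing procedure concrete, the following Python sketch splits a signal into overlapping frames and applies a Hanning window to each one; the 25 ms frame length and 50% overlap are typical but arbitrary choices, assumed here only for illustration.

    import numpy as np

    def frame_signal(x, fs=8000, frame_ms=25, overlap=0.5):
        """Split x into overlapping frames and apply a Hanning window."""
        frame_len = int(fs * frame_ms / 1000)   # samples per frame
        hop = int(frame_len * (1 - overlap))    # frame advance in samples
        window = np.hanning(frame_len)          # tapered (Hanning) window
        frames = []
        for start in range(0, len(x) - frame_len + 1, hop):
            frames.append(x[start:start + frame_len] * window)
        return np.array(frames)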

If we look at a single frame, we know that we can model the voiced signal as a periodic signal and the unvoiced one as a white Gaussian random process. In this way, we can model the signal source (the vocal cords) as two distinct signal sources.

Looking at Figures 9 and 11, we can see that there's an overlap between different frames: this allows us to predict the trend of the signal in the next frame by studying its trend in the current frame.


Figure 11: Overlapping windows.

One of the most powerful speech analysis techniques, and one of the most useful methods for encoding good-quality speech at a low bit rate (only 2 kb/s!), is Linear Predictive Coding (LPC). It's defined as a method for encoding an analog signal in which the value of the current speech sample is estimated by using its past few speech sample values. That's possible precisely because frames are usually overlapped.

    In Figure 12 we can see a schematization of an LPC model.

    Figure 12: Linear Prediction Coding model.


    First of all, we’re interested in understanding if a voiced or an unvoiced signal has been transmitted:

    according to the analysis of the frame, a switch selects the right source. We can’t neglect the effect

    of area which acts like a multiplicative factor on the signal, so we find a multiplier. Then, we have

    to characterize the filter, which models the effect of the vocal tract; the characteristic parameters of

    this filter, such as the gain, the cut off frequency, depends on the specific frame and changes when

    we consider different frames and can be estimated using different methods, such as the one of

    interest, the LPC.

    The parameters which describe the model, and so the speech signal, are transmitted to the

    receiver. The receiver unit needs to be set up in the same channel configuration to re-synthesize a

    version of the original signal spectrum in order to recreate speech; it will carry out LPC synthesis

    using the received parameters and builds a source filter model, that when provided a correct input,

    will accurately reconstruct the original speech signal.
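As a rough sketch of how the filter parameters can be estimated from a frame, the fragment below computes LPC coefficients by solving a least-squares problem on past samples; production coders normally use the autocorrelation method with the Levinson-Durbin recursion, so this is only a didactic approximation, and the prediction order p = 10 is an assumed value.

    import numpy as np

    def lpc_coefficients(frame, p=10):
        """Estimate p LPC coefficients by least squares:
        x[n] ~ a1*x[n-1] + ... + ap*x[n-p]."""
        rows = [frame[n - p:n][::-1] for n in range(p, len(frame))]
        X = np.array(rows)          # each row holds the p previous samples
        y = frame[p:]               # samples to be predicted
        a, *_ = np.linalg.lstsq(X, y, rcond=None)
        return a

    def prediction_error(frame, a):
        """Residual that an LPC-based coder would quantize and transmit."""
        p = len(a)
        pred = np.array([frame[n - p:n][::-1] @ a for n in range(p, len(frame))])
        return frame[p:] - pred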

    LPC is generally used for speech analysis and resynthesis. Some applications of this technique are:

    - Phone companies (GSM standard);

    - Vocoders;

    - Secure Wireless;

    - Audio codecs.

    I.6 WAVEFORM REPRESENTATIONS

    Waveform representation (also called standard representations) are concerned with preserving the

    wave shape of the analog speech signal in order to transmit a loyal representation of the speech

    signal: they are characterized by a great quality. That implies many data to transmit, a lot of power

    required, but also a simple structure based on the well noted sampling and quantization techniques:

    that’s the reason why they are not specific to speech signals and can be used for any type of signals.

    As we can say in Figure 6, there are two different types of waveform coders: Time domain

    Waveform Coders and Frequency domain Waveform Coders. They are based essentially on the

    same idea to represent the speech signal using the set of its samples, but they differ themselves

    about the techniques used to implement it. We’ll focus on the first class of techniques and in

    particular on the Pulse Code Modulation and its variants (Linear PCM (LPCM), Logarithmic PCM

    (LPCM), Differential PCM (DPCM), Adaptive Differential PCM (ADPCM)).


In Figure 13, the general scheme of a waveform representation technique is shown. As anticipated, this block diagram is quite general and is suitable to schematize many types of applications; however, regarding speech processing, we have a series of standards which govern every single step according to the application of interest.

    Figure 13: Block diagram of a generic waveform representation technique.

    Let’s analyze the sampler and the A/D conversion blocks more in detail, as shown in Figure 14.

    Figure 14: Detailed block diagram of sampling and A/D conversion.

The sampler turns a continuous-time signal into a discrete-time signal. In order to keep the value of the signal constant for the time required by the following circuits to convert it, a Sample and Hold technique is used.

The most used sampling technique is Pulse Amplitude Modulation (PAM). It is an analog, impulsive modulation technique: the modulating signal is an analog signal and the carrier is a train of pulses whose rate depends on the Nyquist criterion and whose duration depends on the time required for the A/D conversion. In Figure 15 the PAM modulation is shown.


Figure 15: Pulse Amplitude Modulation; A) Analog signal (modulating signal), B) Train of pulses (carrier), C) Sampled signal (modulated signal).

     

Since the full audible bandwidth goes from 20 Hz up to 20 kHz, we should sample at a rate of at least 40 kHz (according to the Nyquist criterion). Anyway, we said that the energy of a speech signal is concentrated in the first 4 kHz, so we can choose the sampling rate according, for example, to the bandwidth of telephone channel lines, which goes from 300 Hz to 3.4 kHz. As we well know, the output signal of a telephone line is a clear and pleasant sound!

In this way we could sample at a frequency of at least 6.8 kHz; international standards fixed the telephone sampling rate at 8 kHz, so the sampling period has a duration of 125 μs. It means that the speech signal can be perfectly reconstructed if we have at least 8000 samples per second:

fs = 8 kHz → Ts = 1/fs = 125 μs

The sampled signal is a discrete-time signal with continuous amplitude. The A/D converter provides the quantization and the coding of the amplitude of each modulated signal pulse. The number of bits assigned for the encoding has been fixed by international standards at 8 bits, so, since we have to transmit 8000 samples per second, we work at 64 kb/s. The type of code used depends on the nature of the communication channel and on the transmission speed. As anticipated, any bit-rate below 64 kb/s is treated as compression, and the output of the source encoder is an encoded speech signal having a bit-rate lower than 64 kb/s. A small numerical sketch of this sampling and quantization chain is shown below.
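A minimal numerical sketch of the chain just described, assuming a uniform 8-bit quantizer for simplicity (the actual telephone standard uses the logarithmic companding treated in Chapter II):

    import numpy as np

    fs = 8000                        # telephone sampling rate (Hz), Ts = 125 us
    bits = 8                         # bits per sample fixed by the standard
    t = np.arange(fs) / fs           # one second of sampling instants
    x = 0.5 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone in [-1, 1]

    levels = 2 ** bits               # 256 quantization levels
    step = 2.0 / levels              # uniform step over the [-1, 1) range
    codes = np.clip(np.round(x / step), -levels // 2, levels // 2 - 1).astype(int)

    print(fs * bits)                 # bit rate: 8000 * 8 = 64000 b/s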


In the end, we apply source and channel encoding, a set of operations that simplify the transmission of data over the channel. For example, source encoding allows several traffic flows to converge on a single physical medium at the same time.

The decoding step performs more or less the same operations in reverse, in order to obtain a faithful version of the analog input speech signal at the output.

Waveform coders are most useful in applications that require the successful coding of both voiced and unvoiced signals. In the Public Switched Telephone Network (PSTN), for example, the successful transmission of modem and fax signaling tones and of switching signals is nearly as important as the successful transmission of speech.

    I.7 METHODS OF COMPARISON OF SPEECH CODING TECHNIQUES

If we want to compare a parametric technique with a waveform technique (or with a hybrid technique), we need some indicators of the intelligibility and quality of the speech produced by each coder. There are two methods for evaluating the quality of a speech signal that has undergone a compression process:

1) Subjective methods: they are the most significant and reliable methods of comparison, but they are also very expensive and require long test development times. The most commonly used parameter is the MOS (Mean Opinion Score), which represents the average opinion rating of a group of listeners. To establish the MOS of a coder, listeners are asked to classify the quality of the encoded speech into one of five categories, each characterized by a numerical value:

    1- Bad

    2- Poor

    3- Fair

    4- Good

    5- Excellent

We can consider a speech coder a good one if it's characterized by a MOS greater than 3.5-4. In Figure 16 we can see how the MOS of different types of speech coders changes as the bit rate increases.


    Figure 16: Comparison of different types of speech coding techniques.

2) Objective methods: these methods are used in the initial phase of a codec project. They provide several analytical measurements; the most important one is the Signal-to-Noise Ratio (SNR), the ratio between the power of the input signal and the power of the coding error. Objective measures have the advantage of being computable automatically (and therefore over very large databases); moreover, they don't depend on the tastes of the listeners. The main problem with objective measures is that, especially for coders operating at low bit rates, they are poorly correlated with the perceived quality of the speech signal.


    CHAPTER II: PULSE CODE MODULATION TECHNIQUES

    In this chapter we’ll analyze the different Pulse Code Modulation techniques and their application

    to the speech coding, in order to point out benefits and drawbacks of each technique and differences

    among all of them. Then, we’ll analyze the Time Division Multiplexing technique, that’s a very

    important application based on the Pulse Code Modulation.

    II.1 PULSE CODE MODULATION

Pulse Code Modulation (PCM) is a method used to digitally represent sampled analog signals; in other words, it's a quantization technique that is usually applied to PAM signals, as anticipated in the previous chapter, Section I.6.

Quantization is an operation which, given a continuous-amplitude signal, returns a discrete-amplitude signal. The set of discrete amplitudes depends on the range of values assumed by the input signal and on the number of bits, N, of the quantizer. In the case of interest, in which the input signal is a PAM signal, quantization affects the amplitude of each sample, comparing it with the different quantization levels and rounding it to the closest one. Quantization introduces an error, called quantization error, due to the rounding or truncation of the signal amplitude: it's the difference between the real analog value (A) and the quantized digital value of the same (A').

The quantization error can be quantified by evaluating the Signal to Noise Ratio (SNR):

SNR = Ps / Pe

where Ps is the power of the signal and Pe the power of the quantization error. This quantity is usually expressed in dB, so:

SNR|dB = 10 · log10(Ps / Pe)


In order to compute the power of a signal s(t), we need to know its probability density function ρ(s), because:

Ps = ∫ s² · ρ(s) ds = σ²

where σ² is the variance of the (zero-mean) signal. As explained in the previous chapter, Section I.1, the speech signal probability density function can be approximated with a Gaussian; if the quantizer range S is chosen to cover ±3σ (that is, S = 6σ, so that almost all samples fall inside the range), we have:

Ps = σ² = S²/36

And then:

SNR|dB = 10 · log10( (S²/36) · (1/Pe) )

Since this parameter depends on the signal power and, therefore, on the signal amplitude, and since speech signals are usually low-level signals, we can expect the SNR to be very low. As we can see in Figure 17, the SNR related to speech signals is lower than the one computed for sine or square input signals. We will see that the SNR improves as the number of bits increases; in particular, for each bit we add, we improve the SNR by 6 dB.


    Figure 17: SNR for different types of waveforms.

    II.2 LINEAR PCM

Linear PCM, or uniform PCM, is the name given to the quantization algorithm in which the reconstruction levels are uniformly distributed over the PAM range of values [0; S] (we consider a unipolar signal, but nothing changes if we consider a bipolar signal whose dynamic belongs to a range [-V; V]). It means that we divide the signal dynamic range into a number M = 2^N of intervals having the same amplitude; the interval amplitude is also called the quantization step and is equal to:

Δ = S / 2^N

    As we can see in Figure 18, the ideal quantization characteristic is a step function.


    Figure 18: Ideal transfer function of a linear quantizer.

As anticipated in the previous section, quantization introduces an error that's intrinsic to the process: it can't be eliminated, but only reduced. This error is due to the rounding or truncation of the signal amplitude and can be expressed as the difference between the real analog value and the quantized digital value of the same.

In the case of LPCM, the error has a sawtooth trend (as shown in Figure 19) and its maximum value is:

e_max = Δ / 2

    Figure 19: Quantization error.

The advantage of LPCM is that the quantization error affects the whole signal dynamic in the same way; this property is desirable in many digital audio applications. However, we can easily understand that the quantization error is much more relevant when the signal has a low amplitude than when it is high: if we deal with a low-power signal, the quantization error affects its value more than it does for a high-power signal. It means that high-level signals are quantized with good precision, while low-level signals with bad precision: in general, we want to quantize all signal levels with the same precision.

Let's evaluate the SNR.

Since the quantization error has a sawtooth trend, we can approximate it with a triangular waveform, whose probability density function is uniform over the interval (-Δ/2; Δ/2], and easily calculate:

Pe = ∫ from -Δ/2 to +Δ/2 of e² · (1/Δ) de = Δ²/12 = S² / (12 · 2^(2N))

So, we obtain:

SNR|dB = 10 · log10( (S²/36) · (12 · 2^(2N) / S²) ) = 10 · log10( 2^(2N) / 3 ) ≈ 6.02 · N - 4.77 dB

As anticipated in Section II.1, we have a very low SNR (a negative constant term is present, while for other input signals, such as a full-scale sine wave, the constant term is positive), because of the low level of speech signals.

The SNR grows linearly until the input signal reaches an amplitude that is higher than the quantizer dynamic: this condition is called overload and is shown in Figure 20. The 6 dB-per-bit rule can be verified numerically, as in the sketch after the figure.

    Figure 20: SNR for Linear PCM.
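The sketch below quantizes a zero-mean Gaussian signal loaded at ±3σ with a uniform N-bit quantizer and measures the resulting SNR, to be compared with the 6.02·N - 4.77 dB formula; the rare samples clipped beyond ±3σ make the measured value differ slightly from the theoretical one.

    import numpy as np

    def measured_snr_db(n_bits, n_samples=200_000):
        """Uniformly quantize a Gaussian signal with sigma = S/6
        and return the measured SNR in dB."""
        S = 1.0                              # full range [-S/2, S/2]
        sigma = S / 6                        # 3-sigma loading per side
        x = np.random.randn(n_samples) * sigma
        step = S / 2 ** n_bits               # quantization step
        xq = np.clip(np.round(x / step) * step, -S / 2, S / 2 - step)
        noise = x - xq
        return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

    for n in (6, 7, 8):
        print(n, round(measured_snr_db(n), 2), round(6.02 * n - 4.77, 2))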


Moreover, we can see that if we increase the number of bits, we improve the SNR by 6 dB per bit: the drawback of this choice is clearly the complexity of a system characterized by a high number of bits.

    II.3 LOGARITHMIC PCM

A clever way to improve the accuracy of the quantization technique consists in realizing a non-linear (or non-uniform) quantization; it means that we consider a quantization step size which is not constant over the entire dynamic range of the signal but changes according to the level of the input signal. In this way, we can quantize a low-level signal using a very small quantization step size and a high-level signal using a wider quantization step size, obtaining an acceptable precision over all PAM signal levels.

Since, in general, we don't know the signal distribution, a good criterion to follow in order to realize a non-linear PCM is to make the SNR constant over all PAM signal levels. In this way we realize a general technique which doesn't depend on the specific signal amplitude.

    We can obtain a non-linear quantization by using an analog or a digital process.

In the analog process, the continuous-amplitude PAM signal passes through an analog compressor before being converted into a digital signal. The compressor is essentially a logarithmic amplifier that has the task of amplifying the lowest levels of the PAM signal and compressing the highest ones.

In the digital process, the continuous-amplitude PAM signal is converted into a digital signal using a linear quantization; subsequently, it passes through a compressor which modifies the digital representation using a different number of bits.

In the latest generation of systems, the digital non-linear quantization method is the most adopted, for obvious reasons of cost, performance, simplicity of construction and integration with all the other digital equipment. Anyway, both solutions require the receiving PCM apparatus to contain a unit complementary to the compressor, known as the expander, which restores the original analog levels of the information.

For voice signals, whose values are usually very low, we want narrower intervals close to zero, so the best type of non-linear PCM to adopt is, intuitively, the logarithmic one.


    Figure 21: Ideal transfer function of a logarithmic quantizer.

    Actually, we can prove this fact in a more rigorous way.

    If we look at the block diagram in Figure 22, we can see that the quantization error is generally

    seen as an additive error.

    Figure 22: Block diagram of a logarithmic quantizer.

At the output of our scheme we will have the digital signal D, which is:

D = S + e

where S is the input sample and e the quantization error. For low-level input signals, whose amplitude is close to zero, we want a quantization error which is also close to zero, so we can express it as:

e = (K - 1) · S

where K is close to 1. In this way, we can see the additive quantization error as a multiplicative error:

D = S + (K - 1) · S = K · S


This is a very good thing, because the quantization error is now relative to the value of the signal and, since we can see it as a multiplicative error, it scales the output of the system linearly.

As we can see in Figure 23, we have a constant SNR until the overload condition.

    Figure 23: SNR for Logarithmic PCM.

    II.3.1 A AND μ CONVERSION LAWS

The logarithmic characteristic does not pass through the origin: this can be a problem when we process signals whose level is very close to the origin, and that's quite common when we deal with speech signals.

    Two laws have been enacted to standardize the ways to solve this problem:

- A – Law: provides the linearization of the logarithmic characteristic for values close to the origin, in a region whose width is fixed by a parameter A, as shown in Figure 24.


    Figure 24: A – Law.

Mathematically speaking, we have:

F(x) = sgn(x) · A·|x| / (1 + ln A)             for |x| < 1/A

F(x) = sgn(x) · (1 + ln(A·|x|)) / (1 + ln A)   for 1/A ≤ |x| ≤ 1

This law is the standard in Europe (with A = 87.6).

- μ – Law: provides a translation of the logarithmic characteristic in order to obtain a passage through the origin, as shown in Figure 25. The translation must be equal to the intercept of the curve; the transfer function will be:

F(x) = sgn(x) · ln(1 + μ·|x|) / ln(1 + μ)

    Figure 25: μ – Law.

This law is the standard in the USA and Japan (with μ = 255).
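A minimal sketch of both compression characteristics, using the standard parameter values A = 87.6 and μ = 255 (the expander at the receiver applies the inverse functions):

    import numpy as np

    def a_law_compress(x, A=87.6):
        """A-law: linear below |x| = 1/A, logarithmic above (x in [-1, 1])."""
        ax = np.abs(x)
        y = np.where(ax < 1 / A,
                     A * ax / (1 + np.log(A)),
                     (1 + np.log(A * np.maximum(ax, 1 / A))) / (1 + np.log(A)))
        return np.sign(x) * y

    def mu_law_compress(x, mu=255):
        """mu-law: logarithmic characteristic translated through the origin."""
        return np.sign(x) * np.log(1 + mu * np.abs(x)) / np.log(1 + mu)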


These laws introduce an approximation of the logarithmic characteristic: in both cases, in fact, we don't have a logarithmic trend close to the origin, and that causes a decrease of the SNR, as shown in Figure 26.

    Figure 26: SNR of an approximated logarithmic PCM.

In Figure 27, the SNR trend for both linear and logarithmic PCM is shown.

    Figure 27: Comparison of the SNR for different PCM.

In order to simplify the analysis of a logarithmic characteristic, we can introduce a piecewise approximation: the shape of the function is approximated with a set of straight lines, one after the other. We divide the function into segments, which are in turn divided into levels, as shown in Figure 28.


    Figure 28: Piecewise approximation.

We can notice the presence of three numbers: the first one is a sign parameter, a bit which identifies the polarity of the signal; the second one is a segment parameter, which identifies which line we are considering; and the last one is a level parameter, which identifies the point within the considered segment. A sketch of this encoding is shown below.
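The following sketch illustrates this sign/segment/level decomposition in the spirit of the G.711 A-law 8-bit code word (1 sign bit, 3 segment bits, 4 level bits); the bias and bit-inversion details of the real standard are deliberately omitted.

    def segment_level_encode(sample):
        """Encode a 12-bit magnitude (0..4095) into (sign, segment, level):
        1 sign bit, 3 segment bits, 4 level bits."""
        sign = 0 if sample >= 0 else 1
        mag = min(abs(sample), 4095)
        # Segment thresholds double each time: 32, 64, 128, ..., 2048.
        segment, threshold = 0, 32
        while segment < 7 and mag >= threshold:
            segment += 1
            threshold *= 2
        if segment == 0:
            level = mag // 2                  # step = 2 in the first segment
        else:
            seg_start = 16 << segment         # 32, 64, 128, ..., 2048
            step = seg_start // 16            # step doubles with the segment
            level = (mag - seg_start) // step
        return sign, segment, level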

With logarithmic functions, at every point the same distance represents the same ratio; in the piecewise approximation, we can imagine that this behavior is satisfied globally, but not locally. This approximation introduces an unpleasant issue: on each segment, we have a linear approximation of the logarithm, which means that the quantization error is constant only within each segment. This is a bad thing because if we compare points near the boundary between two segments, which have almost the same amplitude but lie on opposite sides of the boundary, we will have very different quantization errors, even though the points are close.

The actual behavior of the signal-to-noise ratio is shown in Figure 29: there are ripples where we expected a flat behavior; these ripples are 6 dB wide, because of the considerations made above.


    Figure 29: SNR of a piecewise approximated Log-PCM.

    II.4 DIFFERENTIAL PCM

The basic problem with the previous types of PCM is that the quantizer works on a fixed dynamic range, while the speech signal, by its nature, is usually very low. It means that we effectively work with a number of bits that's smaller than the number of bits of the quantizer: we don't use the bits which are associated with the higher values of the dynamic range. In the following section we'll see how to solve this problem in a very clever way that allows us to drastically reduce the bit rate. Clearly, it also means a drastic reduction of the speech signal quality.

These methods are based on the observation that consecutive samples are often correlated. This allows two considerations:

1) if samples are correlated, we can predict, in a more or less precise way, the value of a sample from an estimation of the previous samples;

2) correlated samples contain redundant information that is not useful, so we can remove it and obtain a faster transmission.

Differential pulse-code modulation (DPCM) is a signal encoder that uses the baseline of pulse-code modulation (PCM) but adds some functionalities based on the ideas just exposed. DPCM was invented by C. Chapin Cutler at Bell Labs in 1950.

This method provides the coding of the difference between an input signal and its predicted value, estimated by an evaluation of previous input samples; in other words, we code the prediction error. If the difference (that is, the error) is small, it means that the two samples are strongly correlated


and we can remove redundant information; moreover, the number of bits required for transmission is reduced. In this way, we can obtain compression ratios on the order of 2 to 4 and we drastically reduce the bit rate: clearly, as we know well, this brings a drawback in terms of the quality of the obtained signal, which will still be clear and understandable, but no longer as pleasant.

It's a predictive form of coding, because we have to predict the current sample value based upon previous samples.

    The general block diagram of a transmitter and receiver system based on DPCM is shown in

    Figure 30.

    Figure 30: Transmitter and Receiver for a DPCM.

    Let’s analyze them.

The Predictor is a unit whose task is to predict the quantized value of the current input sample, through an estimation which depends on the previously quantized sample (or samples) and on a prediction factor. We can intuitively understand that we can make a better estimation if we consider many samples instead of only one. In order to consider more samples, we need to consider a framed input signal, so that all the samples which occur in a frame are used to make a new estimation. Let's assume that K is the number of samples in a frame; the choice of the value of K is critical because:

- If K is low, we have fewer samples and they are more correlated: they contain a lot of redundant information that we can remove. We have few parameters to deal with, so the complexity of the circuit is reduced, but the precision of the estimation decreases.

- If K is high, we have more samples and they may have very different values, so they are less correlated. We can make a good estimation, but the complexity of the circuit increases while


its efficiency decreases, because if we have uncorrelated samples, there is no redundant information to remove.

We have to choose K in such a way that a possible error weighs little on the estimate and, at the same time, that we work with samples which are correlated. A clever idea that's usually implemented is to multiply the frame by an exponential function, which gives more weight to the close samples and reduces the weight of the more distant ones.

The predicted value is a function of those K previous samples:

x̂[n] = f( x[n-1]; x[n-2]; …; x[n-K] )

The difference between the input signal and its predicted value is called the prediction error and it's the input of the quantizer.

If the prediction was right, so that x[n] = x̂[n], this signal is zero: redundant information is removed. Otherwise, it's a very low-level signal and we need few bits to quantize it. In this way, the bit rate is strongly reduced: if we use, for example, 4 bits, we work at 32 kb/s.

In the receiver the process is reversed: we need to decode the input signal and then add the predicted signal.

    Now we have to understand how the Predictor works.

Usually, the prediction is linear: it means that the predicted value is a weighted linear combination of previous quantized samples:

x̂[n] = A·xq[n-1] + B·xq[n-2] + C·xq[n-3] + …

where A, B, C, … are the prediction coefficients.

    A block diagram for a linear Predictor is shown in Figure 31.


    Figure 31: Block diagram of a Linear Predictor.

The values of the coefficients depend on the autocorrelation function of the signal in the frame we're analyzing, so they are not constant. Their optimum values are the ones which minimize the prediction error power:

(A, B, C, …) = argmin Pe

We can obtain them by using variable-gain amplifiers. The D units are delay units.

If the prediction error power is low, we can reduce the number of bits we need to quantize the prediction error. A minimal sketch of the whole scheme follows Figure 32.

    The complete block diagram for a linear Prediction DPCM coder is shown in Figure 32.

    Figure 32: Transmitter with Linear Predictor for a DPCM.
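As a complement to the block diagrams, here is a minimal first-order DPCM encoder/decoder sketch in Python, assuming a single prediction coefficient equal to 1 (the predictor simply holds the last reconstructed sample) and an arbitrary quantization step; real coders use higher orders and optimized coefficients.

    import numpy as np

    def dpcm_encode(x, step=0.05):
        """First-order DPCM: quantize the difference between each sample
        and the previous reconstructed sample (predictor = last value)."""
        codes, pred = [], 0.0
        for sample in x:
            err = sample - pred               # prediction error
            code = int(round(err / step))     # quantized error
            codes.append(code)
            pred = pred + code * step         # track the decoder's output
        return codes

    def dpcm_decode(codes, step=0.05):
        """Invert the process: accumulate the quantized differences."""
        out, pred = [], 0.0
        for code in codes:
            pred = pred + code * step
            out.append(pred)
        return np.array(out)

Note that the encoder predicts from the reconstructed (quantized) samples, not from the original ones, so that encoder and decoder stay aligned and the quantization error does not accumulate.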


II.5 ADAPTIVE DIFFERENTIAL PCM

Adaptive differential pulse-code modulation (ADPCM) is a variant of DPCM that introduces some improvements in order to obtain a further reduction of the bit rate. Actually, this technique can also be applied to improve standard PCM, LPCM, Log-PCM and so on, not only DPCM. It was developed in the early 1970s at Bell Labs for voice coding, by P. Cummiskey, N. S. Jayant and James L. Flanagan.

The basic idea of ADPCM is to adapt the quantization step to the effective dynamic of the signal we want to deal with. If we consider a DPCM, we can say that if the difference signal at the input is low, ADPCM decreases the quantization step, so that it can quantize this small value with better precision. Otherwise, if the difference signal is high, ADPCM increases the quantization step, in order to cover the entire dynamic.

With this technique, we need just a few bits per sample and we succeed in working at bit rates lower than 8 kb/s! However, ADPCM cannot produce satisfactory quality when the bit rate is lower than 16 kb/s. We can work down to 4 kb/s, but we lose the quality of the signal and, in particular, the possibility to recognize the speaker.

    To further reduce the bit rate, we need to use speech signal parametric representations.

The basic block diagram for an ADPCM transmitter is shown in Figure 33: it's very similar to the block diagram of a DPCM shown in Figure 30, except for the presence of the interconnection between the Predictor and the Quantizer, due to the adaptation of the quantization step.

    Figure 33: Transmitter for an ADPCM.


In order to adapt the quantization step to the dynamic of the signal, we can use a multiplier that changes it according to the Predictor's output. There are two basic and conflicting requirements in the design of the step-size multiplier: the need for a fast response and the prevention of excessive step-size alterations in a stationary or steady-state situation (in which no step change is required).

There are two types of ADPCM configuration:

- Adaptive Quantization Forward: the prediction is estimated on samples which haven't been quantized yet;

- Adaptive Quantization Backward: the prediction is estimated on samples which have already been quantized.

In the adaptive quantization forward technique, input samples are stored in a buffer and sent to the Prediction unit; then, the quantization step is changed. This technique can be implemented with a very simple structure but has two limits:

1) it introduces a delay, related to the storage of samples in the buffer, and an additional amount of information to send, which makes the data rate higher;

2) the analog signal cannot be recovered directly, so the realization of the receiver is problematic.

    Figure 34: Adaptive Quantization Forward.

These problems are solved by the adaptive quantization backward technique, because it is implemented using a feedback configuration. The buffer is placed outside the quantizer, so it doesn't introduce any delay; moreover, the predictor and quantizer information does not need to be transmitted: the "side information" data rate is lower! The receiver can be realized just by inverting the process. However, this technique is less precise, because we adapt the quantization step according to past frames. A minimal sketch of backward adaptation follows Figure 35.


    Figure 35: Adaptive Quantization Backward.
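In the sketch below, both encoder and decoder update the step with the same rule, driven only by the transmitted codes, so no side information is needed; the 2-bit quantizer and the multiplier values (1.6 and 0.9) are illustrative assumptions, not values from any standard.

    def adpcm_encode(x, step0=0.02):
        """DPCM with backward step adaptation and a 2-bit quantizer.
        Codes lie in {-2, -1, 0, 1}; the step update depends only on the
        previous code, so the decoder can reproduce the adaptation."""
        codes, pred, step = [], 0.0, step0
        for sample in x:
            err = sample - pred
            code = max(-2, min(1, int(round(err / step))))
            codes.append(code)
            pred += code * step                   # reconstructed value
            # Grow the step when the quantizer saturates (outer codes),
            # shrink it otherwise; keep a floor to avoid underflow.
            step = max(step * (1.6 if code in (-2, 1) else 0.9), 1e-6)
        return codes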

     

    II.6 TIME DIVISION MULTIPLEXING

Nowadays, PCM is the standard form of digital audio in computers, CDs, digital telephony and other digital audio applications. The technique was proposed around the 1930s-1940s, when there was the need to increase the number of long-distance telephone connections. This requirement, however, conflicted with the difficulties and costs associated with large conductor bundles, which were very bulky and difficult to connect. So the idea arose to multiplex a large number of telephone connections on a single coaxial cable. This gave rise to the Time Division Multiplexing (TDM) technique, a very modern and efficient digital technique based on PCM.

    A general scheme of TDM technique is shown in Figure 36.

    Figure 36: Time Division Multiplexing.

We have n channels, a switch that selects one of them, and a single coaxial cable through which the information is transmitted to the receiver. The demultiplexing unit delivers each sample to the channel that corresponds to its source channel, and the transmission is complete.

The idea of TDM is based on the ability to sample a speech signal and, at the same time, during the sampling period, to transmit another speech signal on another channel. Consider the same sampling period (T = 125 μs) for each channel: it is divided into n time intervals called time-slots; in each time-slot, the system transmits the sample generated by one of the n channels. All channels are served cyclically, which means that each channel transmits its samples with period T = 125 μs. In each sampling period, a word of n samples is sent to the receiver.

We can better understand this process by looking at Figure 37.

    Figure 37: Time Division Multiplexing; transmission.
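The interleaving itself is easy to express in code. The following sketch (hypothetical helper names; 8-bit samples assumed) builds one frame per sampling period by taking one sample from each channel in turn, and splits the frames back into channels on the receiving side:

```python
def tdm_mux(channels):
    """channels: list of n equal-length lists of 8-bit samples.
    Returns one frame (a bytes object of n samples) per sampling period."""
    return [bytes(ch[t] for ch in channels) for t in range(len(channels[0]))]

def tdm_demux(frames, n):
    """Reassign each time-slot of every frame to its source channel."""
    return [[frame[slot] for frame in frames] for slot in range(n)]

# Example: 3 channels, 2 sampling periods
frames = tdm_mux([[10, 11], [20, 21], [30, 31]])
assert tdm_demux(frames, 3) == [[10, 11], [20, 21], [30, 31]]
```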

The receiver receives a continuous flow of information: in order to allow correct decoding and the association of each sample with its channel, we need to send the receiver information about the time duration of the sampling period (the length of the word associated with each period) and about the channel associated with each received sample. That is why the information transmitted over the coaxial cable also contains, besides the serial data lines (DATA), a reference frequency signal (CLOCK) and a frame-alignment signal (FRAME).

Actually, the clock information may or may not be present, according to the type of TDM; there are, in fact, two different types of TDM: synchronous and asynchronous. In synchronous TDM, the clock provides bit synchronization, while the frame signal provides a time reference to identify the slot in which each device is enabled to transmit or receive information.

    The number of channels and the rates are established by international standards.

The European TDM allows 32 channels to pass simultaneously on a single coaxial cable without, of course, interference between them. Of the 32 multiplexed channels, 30 are voice channels (calls) and 2 are service channels: channel no. 0 is used to send the clock information to the receiver, and channel no. 16 carries the phase (signaling) information. It applies the G.711 standard: logarithmic (A-law) PCM at 64 kb/s (sampling rate of 8 kHz and 8 bits per code). That means that we obtain a rate of

$$R = \frac{32 \cdot 8\ \text{bit}}{125\ \mu\text{s}} = 2.048\ \text{Mb/s}$$

The American TDM allows 24 channels plus a single service bit to pass: all channels are dedicated to calls. It applies logarithmic (μ-law) PCM at 64 kb/s (sampling rate of 8 kHz and 8 bits per code), so we obtain a rate of

$$R = \frac{24 \cdot 8\ \text{bit} + 1\ \text{bit}}{125\ \mu\text{s}} = 1.544\ \text{Mb/s}$$
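As a quick numerical check, the two rates can be reproduced with a few lines of Python:

```python
T = 125e-6             # sampling period in seconds
e1 = 32 * 8 / T        # 32 time-slots of 8 bits            -> 2,048,000 b/s
t1 = (24 * 8 + 1) / T  # 24 slots of 8 bits + 1 framing bit -> 1,544,000 b/s
print(e1, t1)          # 2048000.0 1544000.0
```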

    TDM can also be used within Time Division Multiple Access (TDMA), where stations sharing the

    same frequency channel can communicate with one another.

    An example of application that utilizes both TDM and TDMA is GSM.


    APPENDIX

In the following, you can see some real speech signals. The images were obtained in the LED 2 laboratory, second floor of the "Cittadella", Politecnico di Torino. Instruments: Analog Oscilloscope Hameg 1004-3, Microphone, Power Supply.

In Figure 38, two signals are shown, in order to point out the difference between a voiced and an unvoiced sound.

    Figure 38: Difference between voiced signals and unvoiced signals.

In Figure 39, an amplitude-modulated signal is shown: it is obtained by varying the tone of the vowel "a".

    Figure 39: Amplitude modulation of the tone “a”.


In Figure 40, we can see four different signals; they all correspond to the pronunciation of the word "Hello", spoken by four different people.

Figure 40: The word "Hello" spoken by four different people.


    BIBLIOGRAPHY

    Texts:

L. R. Rabiner & R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, 1978. ISBN 0-13-213603-1.

    D. Del Corso, Elettronica per Telecomunicazioni, McGraw-Hill, 2002. ISBN 88-386-0832-6.

Jerry D. Gibson, Digital Compression for Multimedia: Principles and Standards, Elsevier Science (USA), 1998. ISBN 1-55860-369-7. (Partial version available at http://books.google.it/books?hl=it&lr=&id=aqQ2Ry6spu0C&oi=fnd&pg=PR13&dq=Speech+Coding+Methods,+Standards,+and+Applications+Jerry+D.+Gibson&ots=vJ8yfLOEV3&sig=ovrzOwYvkCLDU7kgBgusxljWeP0#v=onepage&q=Speech%20Coding%20Methods%2C%20Standards%2C%20and%20Applications%20Jerry%20D.%20Gibson&f=false)

    Wiley Encyclopedia of Telecommunications, John Wiley & Sons, 2003. ISBN 978-0-471-36972-1.

ITU-T Recommendations (extract from the Blue Book), ITU, 1988, 1993.

Articles and slides downloaded from the web (May 2014):

M. Hasegawa-Johnson & A. Alwan, Speech Coding: Fundamentals and Applications, University of Illinois at Urbana-Champaign. (http://www.seas.ucla.edu/spapl/paper/mark_eot156.pdf)

J. D. Gibson, Speech Coding Methods, Standards, Applications, University of California at Santa Barbara. (http://vivonets.ece.ucsb.edu/casmagarticlefinal.pdf)

  • Nadia Perreca, ID 211012 Speech coding: techniques, standards and applications

    45  

D. Tipper, Digital Speech Processing, University of Pittsburgh. (www.pitt.edu/~dtipper/2720/2720_Slides7.pdf)

D. P. W. Ellis, An introduction to signal processing for speech, Columbia University, 2008. It is a chapter of The Handbook of Phonetic Sciences, edited by William J. Hardcastle, published by Wiley-Blackwell. (http://academiccommons.columbia.edu/catalog/ac%3A144483)

P. Cummiskey, Adaptive Quantization in DPCM Coding of Speech, The Bell System Technical Journal, vol. 52, issue 7, pp. 1105-1118, 1973. (http://www.alcatel.hu/bstj/vol52-1973/articles/bstj52-7-1105.pdf)

    Websites:

    http://en.wikipedia.org/wiki/Speech_processing

    http://en.wikipedia.org/wiki/Pcm

    http://www.itu.int/rec/T-REC-G.711/_page.print