
Eindhoven University of Technology

MASTER

Speech production by means of a hydrodynamic model and a discrete-time description

Bogaert, I.J.M.

Award date: 1994

Disclaimer

This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 28. Aug. 2018


Institute for Perception Research Group Hearing and Speech Postbus 513 5600 MB Eindhoven

Speech production by means of a hydrodynamic model

and a discrete-time description

Igor Bogaert

period: October 1993 - August 1994

supervisors: Dr. Ir. R. Veldhuis Dr. Ir. A. Hirschberg

professor: Prof. Dr. A. J. M. Houtsma

This research is a part of a cooperation between the group Hearing and Speech of the Institute of Perception Research and the group Gasdynamics and Aero-acoustics of the Eindhoven University of Technology.

Abstract

Two-mass models have been used for some years to account for the motion of the vocal cords. In this research we have developed a simple two point mass model for the vocal cords. The model has independent descriptions for the geometric, the mechanical and the hydrodynamic part. Its simple geometry, using point masses instead of masses with finite dimensions, made it easy to calculate the governing hydrodynamic and mechanical equations. We used a moving separation point for the air flow leaving the surface of the glottis. Combined with viscous effects in the air flow, this gave good results for the calculated glottal pulses. Among other things, we showed the effect of different collision models for the contact between the vocal cords while the glottis is closed. Using a state-space approach to calculate the dynamic behaviour of the glottis resulted in a large reduction of calculation time. This short calculation time was needed to produce vowels with our two point mass model. We used the two point mass model as the source in a source-filter model: a speech synthesizer with which we produced vowels. The vocal tract (the resonance cavities above the glottis, e.g. the throat and the mouth) was considered as the filter for the signal coming from the source (the vocal cords). Further research on the speech synthesizer is recommended. Both the description and the parameters used for the two point mass model need a closer look. A further modification of the vocal tract model should make it possible to produce consonants, and thereby synthetic speech, with the speech synthesizer. The speech we produced should now be perceptually evaluated in order to assess the usefulness of the proposed model.

Contents

1 Introduction

2 The human voice

2.1 The physics of speech

2.1.1 Source of the speech signal: the glottis

2.1.2 Filter for the speech signal: the vocal tract

2.2 The phonetics of speech

2.3 The perception of speech

3 A simple two point mass model

3.1 Geometrical aspects

3.2 Mechanical aspects

3.3 Hydrodynamical aspects

3.3.1 Basic hydrodynamic equations

3.3.2 The quasi-steady approximation

3.3.3 Separation of the flow

3.3.4 Formation of the jet and turbulent dissipation

3.3.5 The hydrodynamic force

3.4 The description of the collision

3.5 The geometrical, mechanical and hydrodynamic aspects of the two point mass model combined

3.6 Results of the simple two point mass model

3.7 Conclusions about the simple two point mass model

4 A more advanced two point mass model

4.1 The separation criterion

4.2 Pressure loss due to viscous effects

4.3 Calculation of the flux in the advanced model

4.4 Calculation of the hydrodynamic force in the advanced model

4.5 Results of the advanced two point mass model

4.6 The effect of the collision model on the flow

4.7 The effect of the viscous pressure on the flow

4.8 Conclusions about the advanced two point mass model

5 The two point mass model as a source for sound production

5.1 Continuous time behaviour of the two point mass model

5.2 A discrete time model for the vocal cords: the state space approach

5.3 A discrete time model for the vocal tract

5.4 The production of vowels with the speech synthesizer

5.5 Coupling of the source model and the filter model

5.6 Conclusions about the discrete time speech synthesizer

6 Discussion, conclusion and suggestions for future research

A Derivation of the hydrodynamical force working on a plane without viscosity

B Derivation of the hydrodynamical force working on a plane with viscosity

Flügelnacht

Flügelnacht, weither gekommen und nun für immer gespannt über Kreide und Kalk. Kiesel, abgrundhin rollend. Schnee. Und mehr noch des Weissen.

Unsichtbar, was braun schien, gedankfarben und wild überwuchert von Worten.

Kalk ist und Kreide. Und Kiesel. Schnee. Und mehr noch des Weissen.

Du, du selbst: in das fremde Auge gebettet, das dies überblickt.

Paul Celan, 1955

Chapter 1

Introduction

"... Und wild überwuchert von Worten ..." ("... and wildly overgrown with words ...")

Today, speech is so commonly used that it is hard to imagine how people can live without it. Both language and speech are essential for communication: transfer of ideas, exchange of information, expression of feelings and so on. When we are confronted with people who cannot use speech in the way we do, because they are deaf or cannot speak properly, we realize how essential speech is and how complex the process of communication can be. A lot of research on the human speech organs has been done. However, it is difficult to do direct experiments on these organs, and it is hard to make models of human flesh to experiment with. What can be done is to develop a scale model of these organs with characteristics, such as dimensions and weight, close to those of the human organs. Alternatively, we can use computers to simulate the processes taking place in the speech apparatus. At the physics department of the Eindhoven University of Technology, research on a model for the vocal cords has been done, with special attention to the fluid mechanical aspects of this model. Introducing new fluid mechanical aspects into existing models for the vocal cords yielded some improvements of these models. The Institute of Perception Research (IPO) is interested in models for speech production that can be used for speech synthesis. In this respect, a model for speech production can only be successful if it requires little calculation time and if the produced sounds resemble the human voice. The purpose of this research can therefore be formulated as: develop a simple model for speech production, realized with DSP ("Digital Signal Processing") techniques, that produces sound of good quality.


Chapter 2

The human voice

In the physical sense, sound is a variation of the air pressure which propagates as a wave. In the case of speech, this pressure variation is produced by the speech organs. On its way from the lungs, the air flow passes several cavities and constrictions, e.g. the glottis between the vocal cords. The sound generated at the vocal cords is filtered by the acoustical properties of the speech canal (the throat, the mouth and the nose). This filtering enables us to distinguish between different speech sounds. In this chapter the speech organs are divided into two parts. The first part is the main source for the production of speech: the glottis, which is the channel between the vocal folds. The second part consists of the resonant cavities, which change the spectrum of the sound generated at the source. Changing the shape of the resonant cavities makes it possible to produce different vowels.

2.1 The physics of speech

2.1.1 Source of the speech signal: the glottis

In Figure 2.1 we give a schematic representation of the human speech organs. We can distinguish the lungs, the trachea, the glottis and the cavities of the mouth and the nose. All speech organs also have a function that has little to do with speech: the lungs are mainly developed for breathing, the mouth for eating and the nose for smelling. Their function in producing speech came later in the evolution of the organs. Let us first have a look at the lungs. There are essential differences between the normal functioning of the lungs and their functioning during phonation. During normal breathing, the ratio of the time of inhalation to the time of exhalation is about 0.4 to 0.45; during phonation this ratio is radically changed to 0.16. Moreover, during normal breathing one inhales a much smaller amount of air than during phonation. The function of the lungs in the phonation process is as follows: the muscles of the diaphragm make sure that a well-controlled air flow leaves the lungs and goes to the larynx. The average subglottal pressure (upstream of the glottis) is held more or less constant during this process.

Figure 2.1: A schematic representation of the human speech organs (nose, mouth, glottis, trachea and lungs).

From the lungs, the air goes through the trachea towards the larynx, where phonation takes place. The larynx consists of two pieces of cartilage: the thyroid and the cricoid. In the thyroid we find the glottis. The glottis is the aperture between two muscles: the vocal cords (also called vocal folds). By successive opening and closing of the glottis, a pulsating air flow is created. We shall call the amount of air that passes the glottis between one opening and the subsequent closing of the vocal cords an air pulse. The vocal cords are connected with the arytenoids, which can put the vocal cords in different positions. In normal breathing, the glottis is opened widely. During speaking, the vocal cords are put towards each other, so that on average there is only a very small aperture. Two forces act on the vocal cords: the hydrodynamic force and the elastic force. Depending instantaneously on the shape of the glottis, the hydrodynamic force either pushes the vocal cords apart or pulls them towards each other. When the glottis is opened, the elastic restoring force pulls the vocal cords together.

The air pulses are created in the following way. Suppose the vocal cords are closed. Due to the subglottal pressure, the vocal cords are opened. In the opening phase, the hydrodynamic force pushes the vocal cords apart. At a certain point, the elastic restoring force becomes so large that the vocal cords are pulled back. Due to the resulting deformation of the cords, the hydrodynamic force then helps to close the vocal cords. When the cords are closed, the subglottal pressure opens them again and the process is repeated. In this way, air pulses are created with a frequency of the order of 100 Hz (for men). The fundamental frequency of the air pulses is denoted F0. This frequency can be changed by changing the tension of the vocal cords or by changing the subglottal pressure. The spectral representation of these pulses is illustrated by Figure 2.2. It consists of components at the fundamental frequency F0 (e.g. 100 Hz) and at every integer multiple of it (200 Hz, 300 Hz, 400 Hz, etc.). The amplitudes of these harmonics show a typical decay of -12 dB/oct.

Figure 2.2: The spectral representation of the signal from the glottis (amplitude of the higher harmonics in dB versus frequency F in kHz).
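The -12 dB/oct decay can be made concrete with a small calculation. The sketch below assumes an idealized source whose harmonic levels fall off at exactly -12 dB per octave; harmonic k lies log2(k) octaves above F0, so its amplitude is then roughly proportional to 1/k² (since 20·log10(1/4) ≈ -12 dB). Real glottal spectra deviate from such a straight line, and the function name `harmonic_levels_db` is hypothetical.

```python
import math


def harmonic_levels_db(f0_hz, n_harmonics, slope_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB relative to the fundamental) per harmonic.

    Assumes an idealized glottal source whose harmonic levels fall off at a
    constant slope per octave; harmonic k lies log2(k) octaves above F0.
    """
    levels = []
    for k in range(1, n_harmonics + 1):
        # "+ 0.0" turns the -0.0 obtained for k = 1 into a plain 0.0.
        level = slope_db_per_octave * math.log2(k) + 0.0
        levels.append((k * f0_hz, level))
    return levels


# Example: F0 = 100 Hz, first eight harmonics.
for freq, level in harmonic_levels_db(100.0, 8):
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```

Each doubling of the harmonic number (100, 200, 400, 800 Hz) lowers the level by another 12 dB.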

2.1.2 Filter for the speech signal: the vocal tract

The air pulses that leave the glottis can be led into three resonant cavities: the nose, the mouth and the throat. The cavity of mouth and throat (with or without the nose) is referred to as the "vocal tract". The shape of these cavities can be changed. By lowering the velum, the nasal cavity is coupled to the rest of the vocal tract; by moving the tongue, the volume of the mouth cavity is changed. Changing the cavities changes the resonance frequencies of the vocal tract, which leads to different spectral envelopes of the speech signal. We can therefore look at the vocal tract as a filter for the sound from the glottis and characterize it with a transfer function. This transfer function shows certain resonance frequencies, called "formants". The formants are denoted F1, F2, F3, etc., as shown in Figure 2.3. The resulting (i.e. audible) spectrum of the speech signal is obtained by multiplying the source spectrum of Figure 2.2 by the transfer function of Figure 2.3, which amounts to adding the two curves when both are expressed in dB. In the time domain, this corresponds to a convolution of the glottal signal with the impulse response of the vocal tract. The spectral representation of the resulting signal is illustrated in Figure 2.4.
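The relation between convolution in time and multiplication of spectra can be checked numerically. The sketch below uses a made-up pulse sequence and a made-up three-tap impulse response (both hypothetical, not vocal-tract data) and a naive DFT, so that only the Python standard library is needed. Zero padding both signals to the full output length makes the product of their DFTs equal to the DFT of the linear convolution; on a dB scale the product becomes a sum, 20·log10|XH| = 20·log10|X| + 20·log10|H|.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform, O(N^2); fine for short sequences."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def convolve(x, h):
    """Linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def zero_pad(x, n):
    """Append zeros so that the result has length n."""
    return list(x) + [0.0] * (n - len(x))


# Hypothetical signals: a short periodic "source" and a 3-tap "filter".
source = [0.0, 1.0, 0.5, 0.0, 0.0, 1.0, 0.5, 0.0]
tract = [1.0, 0.6, 0.2]

output = convolve(source, tract)   # time domain: convolution
n = len(output)                    # 8 + 3 - 1 = 10 samples

# Frequency domain: product of the zero-padded spectra.
product = [a * b for a, b in zip(dft(zero_pad(source, n)), dft(zero_pad(tract, n)))]
direct = dft(output)

max_error = max(abs(a - b) for a, b in zip(product, direct))
print(f"largest difference between the two spectra: {max_error:.1e}")
```

The printed difference is at the level of floating-point rounding, confirming that both routes give the same output spectrum.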


Figure 2.3: The transfer function of an idealized vocal tract (in dB, versus frequency in kHz). Three formants F1, F2, F3 are indicated.

Figure 2.4: The spectral representation of the signal after passing the vocal tract (resulting amplitude of the higher harmonics in dB versus frequency in kHz).

Figure 2.5: The Cardinal Vowels. The horizontal axis shows whether the tongue is positioned more towards the front or towards the back of the mouth. The vertical axis shows whether there is a lot of room in the mouth ("open") or less room ("close").

2.2 The phonetics of speech

The phonetics of speech concerns the way in which the speech organs in the vocal tract (the supra-glottal speech organs) produce different sounds. In discussing this, we distinguish vowels and consonants. Vowels are produced by an oscillation of the vocal cords, while the air flow is relatively little obstructed in the vocal tract. The sound produced for a vowel is nearly periodic. The main articulator during the production of a vowel is the tongue. The English phonetician Daniel Jones classified the vowels according to the position of the tongue in the mouth and introduced the Cardinal Vowels, which are shown in Figure 2.5. These Cardinal Vowels are artificial reference vowels. They resemble the vowels in the following words: [i] the vowel in the French word si; [e] the vowel in the French word thé; [ɛ] the vowel in the French word même; [a] the vowel in the French word la; [ɑ] the vowel in the French word pas; [ɔ] the vowel in the German word Sonne; [o] the vowel in the French word rose; and [u] the vowel in the German word gut.

According to Figure 2.5, the tongue is in the front of the mouth and in a close position when producing an [i], whereas it is low and towards the back when producing an [ɑ], etc. The vowels of, e.g., the English language can be described in terms of the Cardinal Vowels. For example, the vowel in the English word bed lies between the Cardinal Vowels [e] and [ɛ]. In Figure 2.6 we present the positions of some English vowels with respect to the Cardinal Vowels; a colon indicates that the preceding vowel is long. A definition of all vowels in this figure is given in [1]. Besides these vowels, there are nasal vowels, as in the French sentence "un bon vin blanc". These vowels are produced by positioning the velum in such a way that air is released through the mouth and the nose at the same time. Another category of vowels is the diphthong. In a diphthong, the final sound differs clearly from the initial sound; however, we cannot speak of two separate vowels. During the production of a diphthong, a smooth transition between the vowels takes place, for example in the French word lui.

Figure 2.6: The English vowels in relation to the Cardinal Vowels (axes: front/back and close/open).

When a consonant is produced, the air flow is obstructed much more strongly in the vocal tract: the speech canal is constricted or totally closed. A first property of consonants is that they can be voiced or unvoiced. Examples of voiced consonants are the initial sounds of the words bed, desk and voice; during the production of these sounds, the vocal cords are oscillating. Examples of unvoiced consonants are the initial sounds of peak, tall and sound. A further classification of the consonants is made according to the manner and the place of articulation. The manners of articulation of consonants can be classified as follows:

• occlusive: occlusive consonants occur when the air stream is interrupted for a short time. After this interruption, the air is released with an explosion-like sound. Examples of voiced occlusives are [b, d]; unvoiced occlusives are [p, t, k].

• nasal: nasal consonants occur when the air stream cannot pass through the mouth. With the velum lowered, the air is released through the nose. Examples are [n, m].

• lateral: laterals occur when the air passes along the sides of an obstruction in the mouth. Example: [l].

• trill: a trill is generated when one articulator (e.g. the tip of the tongue) vibrates against another (e.g. the alveolar ridge) due to the streaming air. An example is the "rolling r".

• fricative: a fricative is generated when two articulators are brought so close together that the passing air becomes a turbulent stream. This gives a "hissing" sound. Examples of unvoiced fricatives are [f, s, x]; voiced fricatives are [v, z].


When denoting the place of articulation, we name the two articulators that are brought together to constrict the passage of air. The following places of articulation are distinguished:

• bilabial: bilabial consonants are made by constriction of the upper and lower lip. Bilabial occlusives are [b, p].

• labiodental: labiodentals are made by contact between the lower lip and the upper teeth, for example [f, v].

• dental: when a dental consonant is produced, the tip of the tongue contacts the upper teeth. This happens at the beginning of the English words think and this.

• alveolar: the tip of the tongue contacts the alveolar ridge. Examples of alveolar occlusives are [t, d].

• retroflex: when the tip of the tongue is curled back towards the hard palate, a retroflex consonant is formed. Examples can be found at the end of the Swedish words kart and bard.

• palatal: a palatal consonant occurs when the tongue contacts the hard part of the palate, as at the beginning of the French words cher and jour.

• velar: velar articulation means that the back of the tongue contacts the velum, like the [g] of the French word garçon.

• uvular: when the back of the tongue contacts the uvula, a uvular consonant occurs, like the Dutch (uvular) [R].

• pharyngeal: when a pharyngeal consonant is produced, the root of the tongue is drawn back towards the back wall of the throat. These consonants are used in Arabic.

• glottal: glottal articulation is a special case because the vocal cords themselves act as articulators. When the vocal cords are firmly closed and then suddenly forced open, a plosive sound is heard. This articulation occurs, for example, when a phrase begins with a word that starts with a vowel.

2.3 The perception of speech

The pitch of a sound can be defined as the perceived impression of the repetition frequency of the wave. This means that pitch is closely related to the fundamental frequency of the oscillation of the glottis. The normal range of fundamental frequencies of the male voice extends roughly from 80 to 300 Hz, that of the female voice from 200 to 450 Hz. Figure 2.7 shows the range of hearing as a function of frequency and sound level. We see that speech occupies only a narrow part of this range. The upper frequency limit for speech is 12 kHz.

Figure 2.7: The area of speech and music perception inside the limits of overall hearing. The horizontal axis gives the frequency of the sound (20 Hz to 20 kHz); the vertical axis gives the sound level in dB.

Figure 2.8: F1-F2 diagram of vowels (F1 in Hz versus F2 in Hz).

The formants in the speech signal determine the sound we hear. The higher formants make it possible to distinguish different vowels and to change the timbre of the sound. Vowels can be characterized by F1 and F2, see Figure 2.8. The first three formants determine the identity of a vowel; the formants F4 and F5 determine how natural it sounds. When we want to produce synthetic speech, we have to keep in mind that vowels are crucial for the intelligibility of the human voice. Both vowels and voiced consonants have a pitch. However, we cannot ascribe a pitch to unvoiced consonants, because they are not periodic or contain a large noise component (hissing sounds). Although we can hear whether a large amount of high- or low-frequency energy is present in the signal, we cannot ascribe a pitch to these sounds.
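The statement that vowels can be characterized by F1 and F2 suggests a simple classifier: assign a measured (F1, F2) pair to the nearest vowel centroid. The sketch below is illustrative only. The centroid values are rough typical figures for an adult male voice chosen for this example (they are not read from Figure 2.8), the distance is measured on a logarithmic frequency scale as a crude stand-in for perceptual spacing, and all names are hypothetical.

```python
import math

# Rough, illustrative (F1, F2) centroids in Hz for an adult male voice.
# These numbers are assumptions for the example, not data from this thesis.
VOWEL_CENTROIDS = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),  # open back vowel
    "u": (300.0, 870.0),
}


def classify_vowel(f1_hz, f2_hz, centroids=VOWEL_CENTROIDS):
    """Return the vowel whose (F1, F2) centroid is nearest on a log-frequency scale."""
    def distance(vowel):
        c1, c2 = centroids[vowel]
        return math.hypot(math.log(f1_hz / c1), math.log(f2_hz / c2))
    return min(centroids, key=distance)


print(classify_vowel(300.0, 2200.0))
```

A low F1 combined with a high F2 lands on [i], a high F1 on the open vowel, and a low F1 with a low F2 on [u], which mirrors the layout of an F1-F2 diagram.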


Chapter 3

A simple two point mass model

In the preceding chapter we saw that the basis for voiced speech production lies in the glottis. In the glottis, the vocal-fold oscillation generates a time-varying air flux, which drives acoustical oscillations in the vocal tract that can be heard as sound. In this chapter we propose a simplified physical model of the vocal cords and give the equations which describe the processes in the glottis during the production of voiced sounds. In our model, the mechanical oscillation of each vocal fold is described by two coupled point masses. The motion is driven by the hydrodynamic force due to the flow through the glottis. For the sake of simplicity, we assume that this hydrodynamic force acts on the first mass, whereas the movement of the second mass is determined by its coupling with the first mass. The treatment of the model in this chapter is divided into three parts: the geometry of the model, the mechanics of the model and the hydrodynamic processes that occur in the model.

3.1 Geometrical aspects

The geometric aspects of the simple two point mass model can be seen in Figure 3.1. This model is quasi two-dimensional: we assume that the flow through the glottis does not vary in the z direction. This means that all quantities depend only on the x and y coordinates. In subsequent figures, the z direction will be omitted. The vocal cords are two symmetric bodies in the air pipe. Each vocal cord consists of three plates, all having length Lg in the z direction. The first plate makes an angle α with the x axis, the second makes an angle β with the x axis and the third plate makes an angle of 90° with the x axis. During phonation, the angles α and β vary and the plates are stretched accordingly. The angles α and β are defined by:

$$\alpha = \arctan\left(\frac{h_0 - h_1}{2(x_1 - x_0)}\right), \qquad \beta = \arctan\left(\frac{h_2 - h_1}{2(x_2 - x_1)}\right) \qquad (3.1)$$

where h0, h1 and h2 are the apertures of the glottis at x0, x1 and x2, as defined in Figure 3.1.

Figure 3.1: Three-dimensional representation of the quasi two-dimensional model of the vocal cords, with definitions of the x, y and z axes and of the sub-glottal and supra-glottal regions.
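Equation (3.1) can be evaluated directly. The sketch below is a minimal illustration; the function name and the numerical geometry (in metres) are hypothetical and only serve to show the sign convention: α > 0 for an inlet that narrows from h0 to h1, and β > 0 when the glottis diverges from h1 to h2. The factor 2 reflects that each vocal cord borders only half of the aperture.

```python
import math


def plate_angles(h0, h1, h2, x0, x1, x2):
    """Angles alpha and beta (radians) of the first two plates, Equation (3.1).

    h0, h1, h2 are the glottal apertures at x0, x1, x2. Because of the
    symmetry, each plate rises or falls by half the change in aperture.
    """
    alpha = math.atan((h0 - h1) / (2.0 * (x1 - x0)))
    beta = math.atan((h2 - h1) / (2.0 * (x2 - x1)))
    return alpha, beta


# Hypothetical geometry in metres: a glottis that diverges after mass 1.
alpha, beta = plate_angles(h0=2e-2, h1=1e-3, h2=2e-3, x0=0.0, x1=3e-3, x2=5e-3)
print(f"alpha = {math.degrees(alpha):.1f} deg, beta = {math.degrees(beta):.1f} deg")
```

A negative β would indicate a convergent glottis (h2 < h1).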

3.2 Mechanical aspects

Figure 3.2 shows the mechanical features inside the vocal cords. An important feature of the model is that each vocal fold contains two point masses (represented as black circles and denoted by m1 and m2); if the third dimension were taken into account, they would be "line masses". The masses are positioned at the edges of the plates, see Figure 3.2. The mechanical part is covered by three plates which separate it from the flow (see Section 3.1). It is important to note that the motion of the masses is restricted to the vertical y direction. When the point masses move, the shape of the vocal fold changes: the plates are stretched and the angles α and β change. The positions of m1 and m2 in the vertical y direction are denoted by y1 and y2, whereas the openings of the glottis at m1 and m2 are denoted by h1 and h2. From the symmetry of the model it follows that h1 = 2y1 and h2 = 2y2 when the glottis is open.

Figure 3.2: Model for the vocal cords with two point masses; the air flows from the trachea towards the vocal tract, and the plane of symmetry is indicated. The positions of mass 1 and mass 2 are denoted by y1 and y2. The rest positions of the masses are denoted by y0,1 and y0,2. In the figure, the masses are in their rest positions: y1 = y0,1, y2 = y0,2.

In the mechanical part, three main springs are connected to the two masses: k1, k2 and k12. These springs have two effects: k1 and k2 force the masses towards their rest positions (y0,1, y0,2) and represent an elastic restoring force, while k12 couples the motions of m1 and m2. From Figure 3.2 it might look as if the k12 spring exerts both a horizontal and a vertical force component. Because the masses m1 and m2 can only move in the y direction, we have modelled the k12 spring in such a way that it only exerts a force component in the y direction. The rest length of the k12 spring is given by y0,12, which is taken positive if the rest position of the k12 spring implies a diverging glottis. Besides the three springs, two dampers r1 and r2 have been introduced in the model. Owing to r1 and r2, the free oscillations of this system are damped, so that a steady oscillation can only be a forced motion driven by an external force. Newton's second law yields the following:

$$\sum_i F_i = m\,\ddot{y} \qquad (3.2)$$

Newton's law states that the mass m times its acceleration ÿ equals the sum Σi Fi of the forces that act on that mass. Knowing the mechanical description of this model, we know the elastic and the damping forces on each mass. When we take into account an external force Fext,i for both masses, Newton's law, applied to mass 1, yields:

For mass 2, we can write nearly the same equation, with exchanged indices 1 and 2:
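Once the force law is written out, the equations of motion for the two masses can be evaluated numerically. The sketch below is not a transcription of the equations for mass 1 and mass 2 in this thesis. It assumes a plausible force law built from the elements named above: restoring springs k1 and k2 towards y0,1 and y0,2, dampers r1 and r2, and a coupling spring k12 acting on the deviation of y2 - y1 from its rest length y0,12 (positive for a diverging glottis). The class `TwoMassParams`, the functions and the parameter values are hypothetical, and the semi-implicit Euler step is chosen only for brevity.

```python
from dataclasses import dataclass


@dataclass
class TwoMassParams:
    """Hypothetical container for the parameters of Section 3.2 (SI units)."""
    m1: float
    m2: float
    k1: float
    k2: float
    k12: float
    r1: float
    r2: float
    y01: float   # rest position y0,1 of mass 1
    y02: float   # rest position y0,2 of mass 2
    y012: float  # rest length y0,12 of the coupling spring (positive: diverging)


def accelerations(y1, y2, v1, v2, f_ext1, f_ext2, p):
    """Return (a1, a2) = (sum of forces) / mass for each point mass.

    ASSUMED force law, for illustration only: restoring springs towards the
    rest positions, viscous dampers, and a coupling spring k12 acting on the
    deviation of (y2 - y1) from its rest length y012, in the y direction only.
    """
    stretch = (y2 - y1) - p.y012
    coupling = p.k12 * stretch  # stretch > 0 drives y1 up and y2 down
    f1 = f_ext1 - p.k1 * (y1 - p.y01) - p.r1 * v1 + coupling
    f2 = f_ext2 - p.k2 * (y2 - p.y02) - p.r2 * v2 - coupling
    return f1 / p.m1, f2 / p.m2


def step(state, f_ext, p, dt):
    """One semi-implicit Euler step; state = (y1, y2, v1, v2)."""
    y1, y2, v1, v2 = state
    a1, a2 = accelerations(y1, y2, v1, v2, f_ext[0], f_ext[1], p)
    v1, v2 = v1 + dt * a1, v2 + dt * a2        # update velocities first ...
    return (y1 + dt * v1, y2 + dt * v2, v1, v2)  # ... then positions


# Usage with made-up parameter values (not taken from the thesis):
params = TwoMassParams(m1=1e-4, m2=1e-4, k1=80.0, k2=8.0, k12=25.0,
                       r1=0.02, r2=0.02, y01=1e-4, y02=2e-4, y012=1e-4)
state = (params.y01, params.y02, 0.0, 0.0)     # start at rest
state = step(state, (1e-3, 0.0), params, dt=1e-5)
```

The external force is applied to mass 1 only, in line with the assumption that the hydrodynamic force acts on the first mass.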

3.3 Hydrodynamical aspects

When the air flow from the lungs and trachea (the sub-glottal region) enters the glottis, it has to pass through the small opening of the glottis. Because the flux Ug is continuous, the speed of the flow is much higher in the glottis than in the trachea. Using Bernoulli's law, we can calculate the pressure at different places in the glottis, as long as viscous dissipation remains negligible. Finally, we can describe the motion of the glottis owing to the hydrodynamic force, which is the integral of the pressure over the surface of the vocal folds. In this section we give the basic hydrodynamic equations from which we derive equations for the flux and the hydrodynamic force.

3.3.1 Basic hydrodynamic equations

Two basic equations in hydrodynamics are the mass-conservation law or continuity equation (3.5) and the momentum conservation law (3.7). The continuity equation states the following:

$$\frac{d}{dt}\int_V \rho\,dV + \int_S \rho\,(\mathbf{v}\cdot\mathbf{n})\,dS = 0 \qquad (3.5)$$

Imagine we have a fixed volume V, bounded by the surface S with outer normal n; ρ is the gas density and v is the velocity of the gas particles. The continuity equation tells us that when the mass of gas in volume V (the integrated density ∫V ρ dV) changes, a corresponding net flux ∫S ρ(v·n) dS must pass through the surface S, transporting gas particles into or out of volume V. We will see in Section 3.3.2 that the first term in this equation can be neglected.

Figure 3.3: The continuity equation (a fixed volume V bounded by the surface S with outer normal n).

What remains is a relation between the speed of the flow and the cross-sectional area. Suppose the flow speed vx(x) is uniform over each cross-section, has an x component only and depends on the x coordinate. Integration over a surface A perpendicular to the flow then gives:

$$\rho\, v_x(x)\, A(x) = \text{constant} \qquad (3.6)$$
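With a constant density, Equation (3.6) says that the volume flux U = vx(x)A(x) is the same at every cross-section, so the flow speed scales inversely with the area. The sketch below uses hypothetical round numbers (a volume flux, a glottal slit of length Lg and aperture h, and a tracheal cross-section) only to show how strongly the flow accelerates in the glottis; the function name is also hypothetical.

```python
def flow_speed(volume_flux, area):
    """Mean flow speed v = U / A from Equation (3.6) with constant density."""
    return volume_flux / area


# Hypothetical values, for illustration only:
U = 2.4e-4                    # volume flux through the glottis, m^3/s
glottal_area = 1.5e-2 * 4e-4  # slit of length Lg = 15 mm and aperture h = 0.4 mm, m^2
tracheal_area = 2e-4          # cross-section of the trachea, m^2

v_glottis = flow_speed(U, glottal_area)
v_trachea = flow_speed(U, tracheal_area)
print(f"glottis: {v_glottis:.1f} m/s, trachea: {v_trachea:.2f} m/s")
```

With these assumed numbers the flow in the glottis is more than thirty times faster than in the trachea.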

Besides the continuity equation, we use the momentum equation:

$$\frac{d}{dt}\int_V \rho\,\mathbf{v}\,dV = \int_V \mathbf{F}_0\,dV + \int_S \mathbf{n}\cdot\mathbf{T}\,dS \qquad (3.7)$$

This momentum equation tells us that the rate of change of the momentum ∫V ρv dV of a certain volume V (first term) equals the sum of the forces that act on the volume: the integrated volume forces ∫V F0 dV (e.g. gravity) and the surface forces ∫S n·T dS. Here T is the stress tensor; if we neglect the shear stress, it reduces to -pI, where p is the pressure and I is the identity tensor. As the vocal folds are small compared with the acoustical wavelength, and the pressure differences are small compared with the atmospheric pressure, we can assume the density ρ to be constant. We use Gauss' theorem to rewrite the surface integral of Equation 3.7 as a volume integral. Having done this, we obtain the differential form of Equation 3.7 by looking at the integrand only. After that, the total derivative in Equation 3.7 is split into a partial time derivative and a convective term. We suppose that there are no external forces F0, that the flow is quasi-parallel (i.e. v = (v(x), 0, 0)), and that there is no shear stress. This last assumption is reasonable because the typical Reynolds number Re = hv/ν, with h the glottal aperture and ν the kinematic viscosity of air, is of the order 10³. Thus, Equation 3.7 reduces to:

$$\rho\frac{\partial v}{\partial t} + \rho v\frac{\partial v}{\partial x} = -\frac{\partial p}{\partial x} \qquad (3.8)$$

So, if we assume that the flow is locally incompressible, integration over x yields the equation of Bernoulli:

$$\rho\frac{\partial \phi}{\partial t} + p + \frac{1}{2}\rho v^2 = f(t) \qquad (3.9)$$

where φ is the flow potential, defined by φ = ∫v dx, ρ is again the density of the flow, p is the pressure and v is the flow velocity. We can look at this equation as a law of mechanical energy conservation. We can arbitrarily set f(t) = 0 (or incorporate it into φ), as we are only interested in the velocity v = ∂φ/∂x. In (3.9), the first term represents the energy owing to the inertia of the air, the second term is a contribution from the potential energy and the third term is a contribution from the kinetic energy.

3.3.2 The quasi-steady approximation

When we look at the different forces that act on the air flow through the glottis, we can ask ourselves which forces dominate the movement. When we know the order of magnitude of these forces, we can make a crude estimate of whether certain forces can be neglected. Let us compare the unsteady inertial forces Finert with the convective inertial forces Fconv due to the non-uniformity of the flow. We compare these forces by means of the Strouhal number Sr: when Sr is small, Finert can be neglected with respect to Fconv. The Strouhal number is defined by

$$Sr = \frac{f\,(x_2 - x_1)}{v}$$

where v is the mean velocity of the particles in the glottis (v ≈ 40 m/s), (x2 - x1) is the length of the glottis ((x2 - x1) ≈ 2·10⁻³ m) and f is the fundamental frequency of the vibration (f ≈ 100 Hz). With these numbers we find Sr ≈ 5·10⁻³, which is of the order 10⁻². So we can neglect the unsteady inertial forces in a first-order approximation. We call this approximation the quasi-steady approximation.
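The two dimensionless numbers used in this chapter can be checked with a few lines of arithmetic. The Strouhal estimate below uses the values quoted above. For the Reynolds number of Section 3.3.1, the sketch assumes an aperture h = 1 mm and a kinematic viscosity of air of about 1.5·10⁻⁵ m²/s; neither value is stated in the text. The function names are hypothetical.

```python
def strouhal(f_hz, glottal_length_m, velocity_m_s):
    """Sr = f (x2 - x1) / v: ratio of unsteady to convective inertia."""
    return f_hz * glottal_length_m / velocity_m_s


def reynolds(aperture_m, velocity_m_s, nu_m2_s=1.5e-5):
    """Re = h v / nu; nu defaults to an approximate kinematic viscosity of air."""
    return aperture_m * velocity_m_s / nu_m2_s


sr = strouhal(100.0, 2e-3, 40.0)  # values quoted in Section 3.3.2
re = reynolds(1e-3, 40.0)         # h = 1 mm is an assumed aperture
print(f"Sr = {sr:.1e}, Re = {re:.1e}")
```

A small Sr supports the quasi-steady approximation, and an Re of a few thousand supports neglecting the shear stress away from the walls.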

In the quasi-steady approximation, the time-dependent terms in Equation 3.9 and Equation 3.5 both vanish. When we suppose that the flow speed vx(x) has an x component only, and that both the pressure p(x) and the flow speed depend only on the x position, the remaining equations are:

$$\rho\, v_x(x) A(x) = \text{constant} \qquad (3.10)$$

$$p(x) + \frac{1}{2}\rho\, v_x(x)^2 = \text{constant} \qquad (3.11)$$

3.3.3 Separation of the flow

Imagine a flow streaming along a wall, as shown in Figure 3.5. Flow separation occurs at the point where the velocity component in the direction of the wall vanishes close to the wall. There, only a velocity component normal to the wall remains: $\partial v_x/\partial y = 0$ at the wall. We denote the x position of this point with $x_s$, and the aperture of the glottis at this point with $h_s$, as can be seen in Figure 3.4.

Figure 3.4: Definition of the separation position (sub-glottal and supra-glottal regions; the positions $x = x_0$ and $x = x_s$ are marked).

Figure 3.5: Flow lines at a separation point $x_s$ of a flow along a wall.

Because there is no simple, accurate theoretical prediction of flow separation, we have to choose a criterion for the occurrence of separation. We assume that there is a maximum ratio between the aperture of the glottis at the separation point, $h_s$, and the aperture $h_1$ at mass 1, given by:

$$\frac{h_s}{h_1} \le 1.1 \qquad (3.12)$$

We derived this criterion from experiments reported in [2]. It implies that in a convergent glottis the separation point is always at the end of the glottis ($x_s = x_2$). In a divergent glottis, the separation point can lie closer to mass 1.

The separation point depends on $h_1$ and $h_2$ and is given by Equation 3.13. We calculate the x position $x_s$ by substituting the condition $h_s = 1.1h_1$ into the equation for the line $h(x) = h_1 + \frac{h_2 - h_1}{x_2 - x_1}x$. Assuming that $x_1$ is at the origin ($x_1 = 0$), we get:

$$\frac{h_2}{h_1} \le 1.1:\quad x_s = x_2,\quad h_s = h_2$$

$$\frac{h_2}{h_1} > 1.1:\quad x_s = \frac{x_2 h_1}{10(h_2 - h_1)},\quad h_s = 1.1h_1 \qquad (3.13)$$
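The piecewise rule (3.13) translates directly into a small function. This is a sketch that assumes $x_1 = 0$, an open glottis and apertures in cm; the function name is ours.

```python
def separation_point(h1: float, h2: float, x2: float) -> tuple[float, float]:
    """Return (x_s, h_s) according to Eq. 3.13, with x1 = 0.

    h1, h2 : apertures at mass 1 and mass 2 (both > 0, open glottis)
    x2     : position of mass 2 (equal to the glottal length x2 - x1)
    """
    if h2 / h1 <= 1.1:
        # Convergent or mildly divergent glottis: separation at the outlet.
        return x2, h2
    # Strongly divergent glottis: separation where h(x) = 1.1 h1.
    xs = x2 * h1 / (10.0 * (h2 - h1))
    return xs, 1.1 * h1

# Divergent example with the glottal length of Table 3.1 (x2 - x1 = 0.2 cm).
print(separation_point(0.04, 0.06, 0.2))
```

Here $h_2/h_1 = 1.5$, so the separation point lies inside the glottis at $x_s = 0.04$ cm, where $h_s = 0.044$ cm.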

3.3.4 Formation of the jet and turbulent dissipation

After separation, the air flow forms a free jet in which the pressure is uniform and equal to the pressure at the separation point. The pressure at the entrance of the glottis is called $p_0$. The pressure at the separation point, which equals the pressure in the free jet, is called $p_s$. We apply Bernoulli's law (3.11) between these points:

$$p_0 + \frac{1}{2}\rho v_0^2 = p_s + \frac{1}{2}\rho v_s^2 \qquad (3.14)$$

Because $h_1, h_2 \ll h_0$, we can neglect the velocity $v_0$ at the entrance of the glottis (see Equation 3.10). Equation 3.14 then gives for the velocity $v_s$:

$$v_s = \sqrt{\frac{2(p_0 - p_s)}{\rho}} \qquad (3.15)$$

Figure 3.6 shows an arbitrary smooth model of the vocal cords. Just like our simple two point mass model, it has a third dimension with length $L_g$. The aperture of the vocal cords where separation occurs is denoted with $h_s$. The flux of air $U_g$ that passes the glottis is defined by:

$$U_g = L_g v_s h_s \qquad (3.16)$$

After the flow has passed the glottis, the free jet can become turbulent. If the Reynolds number $Re = h_s v_s/\nu$ (with $\nu$ the kinematic viscosity) is $Re \approx 100$, the jet will be laminar. When the Reynolds number is larger than a critical value $Re_c \approx 3000$, however, the jet becomes turbulent immediately after it has passed the glottis. An example of a turbulent jet flow is shown in Figure 3.7. The turbulence implies a strong dissipation of kinetic energy.

Figure 3.6: The flux that passes the glottis.

Let us return to the definition of the flux and the separation criterion.

The flux is defined by $U_g = L_g v_s h_s$, and the separation point of the flow is given by $h_s/h_1 \le 1.1$. When the flow is stationary, the flow velocity $v_s$ is constant. Then $h_s$ and $U_g$ are given by:

- $h_s = h_2$ and $U_g = L_g v_s h_2$ if $h_2/h_1 < 1.1$;
- $h_s = 1.1h_1$ and $U_g = 1.1L_g v_s h_1$ if $h_2/h_1 > 1.1$.

From these expressions we can see that the magnitude of the flux is directly proportional to the position of mass 2 (if $h_2/h_1 < 1.1$) or of mass 1 (if $h_2/h_1 > 1.1$).
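Combining (3.15), (3.16) and the separation criterion gives the glottal flux as an instantaneous function of $h_1$ and $h_2$. The sketch below makes several assumptions:

- CGS units throughout;
- an air density of $\rho = 1.2\cdot10^{-3}$ g/cm³ (not given in this section);
- a hypothetical glottal width $L_g = 1.4$ cm (also not given here);
- a non-positive aperture is treated as a closed glottis.

```python
import math

RHO_AIR = 1.2e-3  # g/cm^3, assumed density of air
P0 = 7840.0       # dyne/cm^2, maximum subglottal pressure (Table 3.1)

def jet_velocity(p0: float, ps: float, rho: float = RHO_AIR) -> float:
    """Eq. 3.15: v_s = sqrt(2 (p0 - ps) / rho), in cm/s."""
    return math.sqrt(2.0 * (p0 - ps) / rho)

def glottal_flux(h1: float, h2: float, lg: float, p0: float = P0,
                 ps: float = 0.0, rho: float = RHO_AIR) -> float:
    """Eq. 3.16, U_g = L_g v_s h_s, with h_s from the 1.1 h1 criterion (cm^3/s)."""
    if h1 <= 0.0 or h2 <= 0.0:
        return 0.0                      # closed glottis: no flow
    hs = min(h2, 1.1 * h1)              # h_s = h2 or 1.1 h1 (Eq. 3.13)
    return lg * jet_velocity(p0, ps, rho) * hs

vs = jet_velocity(P0, 0.0)
print(f"v_s = {vs / 100:.1f} m/s")      # v_s = 36.1 m/s
print(f"U_g = {glottal_flux(0.04, 0.06, lg=1.4):.0f} cm^3/s")  # lg is hypothetical
```

The resulting $v_s \approx 36$ m/s is consistent with the value $v \approx 40$ m/s used for the Strouhal estimate in Section 3.3.2.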

3.3.5 The hydrodynamic force

The vertical hydrodynamic force $F_h$ on the vocal fold is the integral of the pressure over the contact surface between the flow and the wall. If we neglect friction, this force is determined by the pressure $p$ alone:

$$F_h = \int_S p\,dS \qquad (3.17)$$

Applying Bernoulli's law (3.11) between the entrance of the glottis and a point $x$ gives an expression for the pressure as a function of the x position:

$$p_0 + \frac{1}{2}\rho v_0^2 = p(x) + \frac{1}{2}\rho v(x)^2, \qquad x_0 < x < x_s \qquad (3.18)$$

where $v_0$ and $v(x)$ are the flow speeds at positions $x_0$ and $x$, respectively. In terms of the flux $U_g$ this equation becomes:

$$p_0 + \frac{1}{2}\rho\left(\frac{U_g}{L_g h_0}\right)^2 = p(x) + \frac{1}{2}\rho\left(\frac{U_g}{L_g h(x)}\right)^2, \qquad x_0 < x < x_s \qquad (3.19)$$

This gives the following expression for the pressure as a function of x:

$$p(x) = p_0 - \frac{1}{2}\rho\left(\frac{U_g}{L_g h(x)}\right)^2 + \frac{1}{2}\rho\left(\frac{U_g}{L_g h_0}\right)^2, \qquad x_0 < x < x_s \qquad (3.20)$$

Figure 3.7: A visualisation of the turbulent area of the flow that has passed the glottis (labels: vocal cords, turbulent area).

$F_h$ depends strongly on the shape of the glottis because we integrate over a surface $S$. In calculating $F_h$ we distinguish two cases: the glottis can be open or closed. For the open glottis, $F_h$ can be written as follows (for the derivation see Appendix A):

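$$F_h = L_g\int_{x_0}^{x_s} p(x)\,dx = \left[-L_g x_0\left\{p_0 - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{1}{h_0 h_1}\right\}\right] + \left[L_g x_s\left\{p_s - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{h_s - h_1}{h_1 h_s^2}\right\}\right] \qquad (3.21)$$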
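The closed form (3.21) can be checked by integrating (3.20) numerically. The sketch below makes these assumptions:

- the aperture is piecewise-linear, as implied by the $1/(h_0h_1)$ and $1/(h_1h_s)$ structure of the derivation;
- $h$ runs linearly from $h_0$ at $x_0$ to $h_1$ at $x_1 = 0$, and from $h_1$ to $h_s$ at $x_s$;
- the $1/h_0^2$ term is dropped and $p_0 = p_s + \tfrac12\rho(U_g/L_g h_s)^2$, as in the derivation of (3.15).

Function names and the example apertures are illustrative, and the force is computed per unit width ($L_g = 1$ cm).

```python
import math

def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0

def force_closed_form(x0, h0, h1, xs, hs, p0, ps, ug, lg, rho):
    """Eq. 3.21 for the open glottis (x1 = 0, x0 < 0)."""
    q = 0.5 * rho * (ug / lg) ** 2
    inlet = -lg * x0 * (p0 - q / (h0 * h1))
    glottis = lg * xs * (ps - q * (hs - h1) / (h1 * hs ** 2))
    return inlet + glottis

def force_numerical(x0, h0, h1, xs, hs, p0, ug, lg, rho):
    """L_g times the integral of p(x) = p0 - rho/2 (U_g / (L_g h))^2 over [x0, xs]."""
    def h(x):
        # Assumed piecewise-linear aperture with a kink at x1 = 0.
        if x <= 0.0:
            return h0 + (h1 - h0) * (x - x0) / (-x0)
        return h1 + (hs - h1) * x / xs
    def p(x):
        return p0 - 0.5 * rho * (ug / (lg * h(x))) ** 2
    # Integrate each linear segment separately to avoid the kink at x1 = 0.
    return lg * (simpson(p, x0, 0.0) + simpson(p, 0.0, xs))

rho, lg, p0, ps = 1.2e-3, 1.0, 7840.0, 0.0    # rho assumed; lg = 1 cm (per unit width)
x0, h0 = -0.025, 2.0                          # (x1 - x0) and h0 from Table 3.1
h1, xs, hs = 0.04, 0.04, 0.044                # divergent case with h2 = 0.06 cm
ug = lg * hs * math.sqrt(2.0 * (p0 - ps) / rho)   # Eqs. 3.15-3.16

fc = force_closed_form(x0, h0, h1, xs, hs, p0, ps, ug, lg, rho)
fn = force_numerical(x0, h0, h1, xs, hs, p0, ug, lg, rho)
print(f"closed form {fc:.4f} dyne, quadrature {fn:.4f} dyne")
```

Under these assumptions the two values agree to quadrature accuracy. This also illustrates that the second bracket of (3.21) is simply $L_g\int_0^{x_s}p\,dx$ rewritten with $p_0$ eliminated in favour of $p_s$.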

When the glottis is closed, the masses are in physical contact and no air can enter the glottis. We assume that there is a constant pressure force in this case:

(3.22)

We see that, if the glottis is closed, the pressure pushes the two masses apart. The hydrodynamic force turns out to be an instantaneous function of the positions of the masses. In our numerical calculations, the pressure $p_0$ is described as a smooth, increasing function of time that reaches its maximum value of 784 Pa (7840 dyne/cm²) in 10 milliseconds.
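The text does not specify the ramp function itself. One possible choice is a raised-cosine ramp. It is smooth, with a continuous first derivative that is zero at both ends, and it reaches $p_{0,max}$ after 10 ms. This particular shape is our assumption, not necessarily the one used in the calculations.

```python
import math

P0_MAX = 7840.0   # dyne/cm^2 (784 Pa)
T_RISE = 0.010    # s, time to reach the maximum

def subglottal_pressure(t: float) -> float:
    """Hypothetical raised-cosine ramp: 0 for t <= 0, P0_MAX for t >= T_RISE."""
    if t <= 0.0:
        return 0.0
    if t >= T_RISE:
        return P0_MAX
    return 0.5 * P0_MAX * (1.0 - math.cos(math.pi * t / T_RISE))

for t in (0.0, 0.0025, 0.005, 0.010, 0.020):
    print(f"t = {t * 1000:4.1f} ms  p0 = {subglottal_pressure(t):7.1f} dyne/cm^2")
```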


3.4 The description of the collision

The solution of the two coupled Equations 3.3 and 3.4, combined with the expressions for the hydrodynamic force (3.21) and (3.22), describes the motion of the vocal cords. This solution gives the positions $y_1$ and $y_2$ as functions of time. It may occur that $y_1$ or $y_2$ becomes negative, yielding a mathematically correct but physically unrealistic solution. A negative $y_1$ or $y_2$ means that the opposite masses have moved through each other. To avoid negative y positions, we can construct a model for the processes that occur when a collision takes place. The first model is the inelastic collision model, formulated by:

$$\text{if } y_1 < 0 \Rightarrow y_1 = 0,\ \dot{y}_1 = 0; \qquad \text{if } y_2 < 0 \Rightarrow y_2 = 0,\ \dot{y}_2 = 0 \qquad (3.23)$$

This model tests whether, e.g., $y_1$ is negative; if it is, $y_1$ is set to zero and the velocity of $m_1$ is also set to zero. However, a problem occurs when we simply implement this collision model: the glottis will never be in a closed phase. When the glottis is closed, it will immediately be forced open by the pressure force at the entrance of the glottis. High-speed motion picture photography [3] has shown that there is a finite ratio between the time the glottis is open (open glottis phase) and the time it is closed (closed glottis phase). From experimental data [4] it was derived that this ratio is in the range 0.4 to 0.8. Taking this into account, it is clear that we have to introduce a mechanism that keeps the glottis closed for a while. We can model this with a negative rest position of the masses $m_1$ and $m_2$. When the rest position of the masses is negative, the springs $k_1$ and $k_2$ balance the pressure force at the entrance of the glottis for some time when the glottis is closed.

The second collision model is the elastic model. To prevent negative y positions, we introduce nonlinearities in the springs and the dampers of the mechanical part of the model. We alter the constants of the springs and dampers when the positions $y_1$ and $y_2$ of the masses $m_1$ and $m_2$ have passed a critical distance $y_c$, defined in Figure 3.2. In this way, the original motion of the masses is changed such that $y_1$ and $y_2$ (nearly) no longer become negative. As an example, Figure 3.8 shows the spring constant $k_1$ as a function of $h_1$. We denote the spring constants in the open glottis phase by $k_{1,open}$ and $k_{2,open}$, and those in the closed glottis phase by $k_{1,clo}$ and $k_{2,clo}$:

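$$k_1 = \begin{cases} k_{1,clo} & \text{if } y_1 < y_c \\ k_{1,open} & \text{if } y_1 > y_c \end{cases} \qquad k_2 = \begin{cases} k_{2,clo} & \text{if } y_2 < y_c \\ k_{2,open} & \text{if } y_2 > y_c \end{cases} \qquad (3.24)$$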


Figure 3.8: When $h_1$ is smaller than a critical value $h_c$, the stiffness $k_1$ reaches the value $k_{1,clo}$ (vertical axis: stiffness, with the level $k_{1,open}$ marked).

In the following, we have adopted the ratio $k_{1,clo}/k_{1,open} = k_{2,clo}/k_{2,open} = 4$, which is similar to that in [2]. For the damping constants we use:

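$$r_1 = \begin{cases} 2\zeta_{1,clo}\sqrt{m_1 k_1} & \text{if } y_1 < y_c \\ 2\zeta_{1,open}\sqrt{m_1 k_1} & \text{if } y_1 > y_c \end{cases} \qquad r_2 = \begin{cases} 2\zeta_{2,clo}\sqrt{m_2 k_2} & \text{if } y_2 < y_c \\ 2\zeta_{2,open}\sqrt{m_2 k_2} & \text{if } y_2 > y_c \end{cases} \qquad (3.25)$$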

The damping ratios $\zeta$ are similar to those in [5] and [2]. They are considered typical for the vocal cord model:

$$\zeta_{1,open} = 0.1,\quad \zeta_{1,clo} = 1.1,\quad \zeta_{2,open} = 0.6,\quad \zeta_{2,clo} = 1.6 \qquad (3.26)$$
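With the masses and spring constants of Table 3.1 (converted from kdyne/cm to dyne/cm), the damping ratios (3.26) can be turned into damping constants. Evaluating $r = 2\zeta\sqrt{mk}$ with the open-phase stiffness $k_{i,open}$ in both phases reproduces the damping constants $r_{1,open}$, $r_{1,closed}$, $r_{2,open}$ and $r_{2,closed}$ listed in Table 3.1. From this arithmetic we infer that $k$ in (3.25) refers to the open-phase stiffness. The dictionary layout and function names are illustrative. The threshold $Y_C$ reuses the value of $h_c$ from Table 3.1 purely for illustration, since the relation between $y_c$ and $h_c$ is not stated in this section.

```python
import math

# CGS values from Table 3.1 and Eq. 3.26.
MASS = {1: 0.17, 2: 0.03}                         # g
K_OPEN = {1: 45e3, 2: 8e3}                        # dyne/cm
K_CLO = {i: 4.0 * k for i, k in K_OPEN.items()}   # ratio k_clo / k_open = 4
ZETA_OPEN = {1: 0.1, 2: 0.6}
ZETA_CLO = {1: 1.1, 2: 1.6}
Y_C = 0.001   # cm, illustrative threshold (taken from h_c in Table 3.1)

def stiffness(i: int, y: float) -> float:
    """Eq. 3.24: stiffer spring once mass i has passed the critical distance."""
    return K_CLO[i] if y < Y_C else K_OPEN[i]

def damping(i: int, y: float) -> float:
    """Eq. 3.25, with k taken as the open-phase stiffness (inferred from Table 3.1)."""
    zeta = ZETA_CLO[i] if y < Y_C else ZETA_OPEN[i]
    return 2.0 * zeta * math.sqrt(MASS[i] * K_OPEN[i])

for i in (1, 2):
    print(f"r{i}: open {damping(i, 0.01):6.1f}, closed {damping(i, 0.0):6.1f} dyne s/cm")
```

The printed values are 17.5 and 192.4 dyne s/cm for mass 1, and 18.6 and 49.6 dyne s/cm for mass 2.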

3.5 The geometrical, mechanical and hydrodynamic aspects of the two point mass model combined

Equation 3.3 and Equation 3.4 gave us Newton's second law applied to the two point mass model. Knowing the explicit form of the hydrodynamic force (Equations 3.21 and 3.22), we can express the accelerations of $m_1$ and $m_2$ in terms of the positions and speeds of those masses:

(3.27)


For mass 2:

(3.28)

For the sake of simplicity we will further assume that $F_{h,2} = 0$, and we will set $F_{h,1} = F_h$.

Equations (3.13), (3.15), (3.21), (3.22), (3.27) and (3.28) together describe the motion of the masses of the two point mass model. When we know the movement of the masses, it is possible to calculate the flux of air that passes the glottis. A schematic view of this motion is presented in Figure 3.9. We assume that the oscillation starts with the two masses at their rest positions. Initially, the hydrodynamic force opens the glottis by pushing the vocal cords apart from their rest position. At a certain point, while the hydrodynamic force is decreasing, the restoring force of the first spring (with stiffness $k_1$) becomes strong enough to pull mass 1 back. At that time, mass 2 has not yet been pulled back: it still has a positive velocity, so it keeps moving in its original direction. Through these two opposite effects, the positions of the two masses relative to each other change. This new relative position induces a pressure minimum, and the resulting hydrodynamic force together with the elastic force pulls the vocal cords towards each other. This helps to close the glottis. In the following, we will first describe the mechanical equations and then the governing hydrodynamic equations. The constants and parameters used in the calculations, shown in Table 3.1, are derived from [6].

3.6 Results of the simple two point mass model

With the equations and constants presented in the previous section, we can calculate the motion of the masses and thereby the flux that passes the glottis. Here we only adopted the elastic collision model; the following chapter also shows the effect of the inelastic collision model. We calculate the two point mass model without a vocal tract, which implies $p_s = 0$. Figure 3.10 shows the opening of the glottis at the position of mass 1 ($h_1$) and of mass 2 ($h_2$). This motion is determined by the hydrodynamic force, the elastic force and the damping in the model. The external hydrodynamic force $F_h$ acts only on $m_1$, and $m_2$ is coupled to $m_1$ by the spring $k_{12}$. It is therefore clear that $m_2$ follows the motion of $m_1$ with some time delay, as can be seen in Figure 3.10.

We pointed out in Section 3.4 that the solution of the differential equations can generate negative positions for the masses. During those intervals we have set $h_1$ or $h_2$ to zero, respectively: when either $h_1$ or $h_2$ is negative, we assume that the glottis is closed. A possible justification for this is given in [5], where a deformation of the vocal cords is assumed during the closed glottis phase. One should consider $m_1$ and $m_2$ as mass midpoints rather than as geometrical objects. This interpretation is also used by others [2], [7].

In Section 3.3.4 we showed that the flux depends entirely on the variables $h_1$ and $h_2$. So, when we know the motions of mass 1 and mass 2, it is possible to construct the flux. We draw $h_2$ and $1.1h_1$ in the same graph and test whether $h_2/h_1$ is smaller or larger than 1.1. This gives, up to the constant factor $L_g v_s$, the air flux that passes the glottis.

Figure 3.9: The different stages (1 to 8) during the movement of the glottis. The vectors beside the figures denote the hydrodynamic force $F_h$ and the elastic restoring force from the springs $F_s$, both acting on the upper part of the glottis. The vectors in the drawing denote the resulting speed.

| Symbol | Numerical value |
|---|---|
| $m_1$ | 0.17 g |
| $m_2$ | 0.03 g |
| $x_2 - x_1$ | 0.2 cm |
| $x_1 - x_0$ | 0.025 cm |
| $k_{1,open}$ | 45 kdyne/cm |
| $k_{1,closed}$ | 180 kdyne/cm |
| $y_{0,1}$ | 0.0 cm |
| $k_{2,open}$ | 8 kdyne/cm |
| $k_{2,closed}$ | 32 kdyne/cm |
| $y_{0,2}$ | 0.0 cm |
| $k_{12}$ | 25 kdyne/cm |
| $y_{0,12}$ | 0.0 cm |
| $h_c$ | 0.001 cm |
| $h_0$ | 2 cm |
| $r_{1,open}$ | 17.5 dyne s cm⁻¹ |
| $r_{1,closed}$ | 192.4 dyne s cm⁻¹ |
| $r_{2,open}$ | 18.6 dyne s cm⁻¹ |
| $r_{2,closed}$ | 49.6 dyne s cm⁻¹ |
| $p_0$ | 7840 dyne cm⁻² |

Table 3.1: The constants used in the calculations of the simple two point mass model. All constants are given in CGS units.

Figure 3.10: The position of mass 1 (thin line) and the position of mass 2 (bold line) as a function of time, calculated with the simple two point mass model (vertical axis: $h_1, h_2$ in cm; horizontal axis: $t$ in s).

Figure 3.11 shows the flux as a function of time, and a discontinuity can be observed. When the flux has reached its maximum value, an abrupt transition takes place. In the part of the curve with positive slope, the flux is determined by the motion of mass 2 (the separation point is at the end of the glottis). At the point where the condition $h_s = 1.1h_1$ is satisfied, the flux becomes determined by the motion of mass 1. The separation point has then begun to move, and $h_s$ is, up to a constant factor, equal to the position of mass 1. This transition introduces a discontinuity in the first derivative of the flux, as can be seen in Figure 3.12. From Figure 3.10 it is clear that, with this separation criterion, there will always be such a peak unless the time derivatives of $1.1h_1$ and $h_2$ are equal at the transition point. This condition can be satisfied, but only for a very specific set of parameters. During normal phonation, where the parameters can change continually, this condition is unlikely to be met.

Furthermore, Figure 3.11 shows that the first pulse has a different shape from the other pulses. This is mainly due to the building up of the pressure $p_0$. It leads to a lower flow speed of the air inside the glottis, and to a smaller pressure force, than in a stationary movement of the vocal cords. The pressure reaches its maximum value at $t = 0.01$ s; from this moment on, a steady pulse remains.

Figure 3.11: The time-dependent flux, calculated with the simple two point mass model (horizontal axis: $t$ in s).

There are three discontinuities in the first derivative of the flux in Figure 3.12. The first arises when the glottis opens, the second when the separation point begins to move upstream, and the third when the glottis closes. The discontinuities at the opening and at the closing of the glottis give rise to an abrupt generation and termination of the flow. Figure 3.13 shows the hydrodynamic force as a function of time. When the glottis is closed, a constant pressure force remains as part of the hydrodynamic force. When the glottis opens, the hydrodynamic force suddenly increases and then decreases. The minimum of $F_h$ is reached when the separation point starts to move upstream.

3.7 Conclusions about the simple two point mass model

Our model of the vocal cords has a very simple geometry, and the fluid-dynamical and mechanical description we gave contains few parameters that have to be adjusted. The description implies simple calculations (surface integrations over a simple geometry), which can be computed very efficiently. It also allows a good understanding and interpretation of various aspects of the two point mass model, e.g. the correlation between the motion of the masses, the separation point and the flux. However, the separation criterion introduces a discontinuity in the derivative of the flux, which seems far from realistic. The abrupt closure of the cords also gives rise to discontinuities in the flux derivative. This makes it necessary to look for another description of the hydrodynamic behaviour. In the following chapter we will introduce another separation criterion, together with viscous effects, to obtain a smoother shape of the glottal pulse.

Figure 3.12: The derivative of the flux, calculated with the simple two point mass model (vertical axis: $dU_g/dt$; horizontal axis: $t$ in s).

Figure 3.13: The hydrodynamic force as a function of time, calculated with the simple two point mass model (vertical axis: $F_h$ in dyne; horizontal axis: $t$ in s).

Chapter 4

A more advanced two point mass model

We noticed in Chapter 3 that our model had some shortcomings, including discontinuities in derivatives whenever a change of state occurred. In this chapter we introduce some new elements in the two point mass model. Their purpose is to avoid sharp peaks in the glottal pulse and discontinuities in the derivative of the flux during the opening and the closing phase. To this end, we introduce viscosity as a correction term in Bernoulli's equation. This correction makes it necessary to calculate the flux and the hydrodynamic force in another way.

4.1 The separation criterion

First of all, we will use a new separation criterion, derived from [8]. The expression we use is a fit to experimental data and has the form:

$$x_s = \frac{2h_1(x_2 - x_1)}{\beta}, \qquad h_s = 2\tan(\beta)\,x_s + h_1, \qquad (4.1)$$

where the symbols are defined in Figure 3.1 and Figure 3.2. The constant 2 in the first expression of (4.1) has the dimension rad cm⁻¹. The second expression in (4.1) simply follows from the fact that $h_s$ is a point on the line $h(x) = 2\tan(\beta)x + h_1$. Like the former separation criterion (3.13), this criterion depends only on the geometry of the glottis. Given the positions of $m_1$ and $m_2$, we can determine the position of the separation point without any knowledge of the history of any variables.
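The following sketch of (4.1) rests on three assumptions:

- $x_1 = 0$, and $\beta$ (in rad) is the half-angle of the straight wall between the masses, so that $\tan\beta = (h_2 - h_1)/(2(x_2 - x_1))$. This is our reading of the line $h(x) = 2\tan(\beta)x + h_1$ passing through $h_2$ at $x_2$.
- A criterion value outside $[x_1, x_2]$ is clamped to the nearest end, as described in Section 4.8.
- For a convergent or parallel glottis ($\beta \le 0$), the separation point is placed at $x_2$.

```python
import math

def separation_point_fit(h1: float, h2: float, x2: float) -> tuple[float, float]:
    """Return (x_s, h_s) from Eq. 4.1 with x1 = 0; lengths in cm, beta in rad."""
    beta = math.atan((h2 - h1) / (2.0 * x2))   # assumed half divergence angle
    if beta <= 0.0:
        return x2, h2                          # convergent/parallel: at the outlet
    xs = 2.0 * h1 * x2 / beta                  # the constant 2 has units rad/cm
    xs = min(max(xs, 0.0), x2)                 # clamp to [x1, x2]
    hs = 2.0 * math.tan(beta) * xs + h1        # point on the wall line
    return xs, hs

for h2 in (0.03, 0.06, 0.20):
    xs, hs = separation_point_fit(0.04, h2, 0.2)
    print(f"h2 = {h2:.2f} cm -> x_s = {xs:.3f} cm, h_s = {hs:.4f} cm")
```

With $h_1 = 0.04$ cm, the moderately divergent case $h_2 = 0.06$ cm still separates at the outlet under these assumptions, whereas $h_2 = 0.2$ cm gives $x_s \approx 0.042$ cm.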

4.2 Pressure loss due to viscous effects

Viscosity is a phenomenon that results from internal friction forces in a fluid or gas. When a fluid is viscous, shear stress can become an important factor in describing the fluid flow. This results in a different velocity distribution of the flow near a wall, as can be seen in Figure 4.1.

Figure 4.1: A schematic representation of the speed distribution of a viscous fluid near the wall.

In this section, we will adopt a Poiseuille term to account for the viscous effects in the glottis. This corresponds to a lower-bound estimate of these viscous losses. Consider a flow with a fully developed velocity profile in a channel of width $L_g$ (in the z-direction) and uniform height $h$. The pressure gradient between the two plates due to viscous effects is then given by a Poiseuille term:

$$\frac{dp}{dx} = -\frac{12\mu U_g}{L_g h^3} \qquad (4.2)$$

A derivation of this equation is given in [6]. The constant $\mu$ is the dynamic viscosity of air. The viscous losses are largest at the minimum aperture of the glottis. The pressure loss due to viscous effects, $\Delta p_{visc}$ (which is positive), follows from integrating Equation 4.2:

$$\Delta p_{visc} = \frac{12\mu U_g}{L_g}\int_{x_0}^{x_s}\frac{1}{h^3(x)}\,dx, \qquad (4.3)$$

with $x_0$ and $x_s$ the x positions of the entrance of the glottis and of the separation point of the flow, respectively. The parameter $\mu$ is the dynamic viscosity of air, $U_g$ the volume velocity or flux, $L_g$ the dimension of the glottis in the z-direction, and $h(x)$ the opening of the glottis as a function of the x position. Figure 4.1 makes clear that viscous effects influence the flux that passes the glottis. Equation 4.3 indicates that the hydrodynamic force exerted on the masses $m_1$ and $m_2$ will also change under viscous effects. In the following we adopt this Poiseuille term for viscous losses to explain the behaviour of the flow during the opening and the closing of the glottis. The following sections calculate how the modified pressure distribution affects the flux (3.15) and the hydrodynamic force (3.17).
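Assume a piecewise-linear aperture: $h_0$ at $x_0$, $h_1$ at $x_1 = 0$ and $h_s$ at $x_s$. The integral in (4.3) then has a closed form. Over a linear segment of length $\ell$ from $h_a$ to $h_b$, $\int dx/h^3 = \ell(h_a + h_b)/(2h_a^2 h_b^2)$. The sketch compares this with an independent numerical quadrature and evaluates $\Delta p_{visc}$. The viscosity $\mu = 1.8\cdot10^{-4}$ g cm⁻¹ s⁻¹, the width $L_g = 1.4$ cm and the example flux are assumed, illustrative values.

```python
MU_AIR = 1.8e-4  # g/(cm s), assumed dynamic viscosity of air

def inv_h3_segment(length: float, ha: float, hb: float) -> float:
    """Exact integral of 1/h^3 over a segment where h varies linearly from ha to hb."""
    return length * (ha + hb) / (2.0 * ha ** 2 * hb ** 2)

def inv_h3_glottis(x0: float, h0: float, h1: float, xs: float, hs: float) -> float:
    """Integral of 1/h^3 from x0 (< 0) to xs, with the kink at x1 = 0."""
    return inv_h3_segment(-x0, h0, h1) + inv_h3_segment(xs, h1, hs)

def viscous_loss(ug: float, lg: float, integral: float, mu: float = MU_AIR) -> float:
    """Eq. 4.3: Delta p_visc = 12 mu U_g / L_g * integral, in dyne/cm^2."""
    return 12.0 * mu * ug / lg * integral

def midpoint(f, a: float, b: float, n: int = 100000) -> float:
    """Composite midpoint rule, used only as an independent check."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

x0, h0, h1, xs, hs = -0.025, 2.0, 0.04, 0.04, 0.044

def h_inlet(x):    # linear from h0 at x0 to h1 at 0
    return h0 + (h1 - h0) * (x - x0) / (-x0)

def h_glottis(x):  # linear from h1 at 0 to hs at xs
    return h1 + (hs - h1) * x / xs

exact = inv_h3_glottis(x0, h0, h1, xs, hs)
check = midpoint(lambda x: h_inlet(x) ** -3, x0, 0.0) + midpoint(lambda x: h_glottis(x) ** -3, 0.0, xs)
print(f"integral: exact {exact:.6g} cm^-2, quadrature {check:.6g} cm^-2")
print(f"Delta p_visc = {viscous_loss(200.0, 1.4, exact):.1f} dyne/cm^2")
```

In this example almost all of the integral comes from the glottal segment, where $h$ is smallest. This is in line with the remark that the viscous losses are largest at the minimum aperture.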


4.3 Calculation of the flux in the advanced model

We apply Bernoulli's equation between the entrance of the glottis and the separation point of the flow:

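$$p_0 + \frac{1}{2}\rho\left(\frac{U_g}{L_g h_0}\right)^2 = p_s + \frac{1}{2}\rho\left(\frac{U_g}{L_g h_s}\right)^2 + \frac{12\mu U_g}{L_g}\int_{x_0}^{x_s}\frac{dx}{h^3(x)} \qquad (4.4)$$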

where $\frac{12\mu U_g}{L_g}\int_{x_0}^{x_s} h^{-3}(x)\,dx$ represents the viscous losses that occur in the glottis. We can simplify Equation 4.4 by neglecting the second term, because $h_s \ll h_0$. Furthermore, we integrate the expression for the viscous losses, and Equation 4.4 becomes:

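$$\frac{1}{2}\rho\left(\frac{U_g}{L_g h_s}\right)^2 + \frac{12\mu}{L_g}\left[\frac{-x_0(h_0 + h_1)}{2h_0^2h_1^2} + \frac{x_s(h_1 + h_s)}{2h_1^2h_s^2}\right]U_g + (p_s - p_0) = 0 \qquad (4.5)$$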

Equation 4.5 is a quadratic equation in the variable $U_g$. Knowing the positions of the masses (and thereby $h_1$, $h_s$, $x_s$), we can easily solve Equation 4.5 with the formula for the roots of a quadratic equation: $ax^2 + bx + c = 0 \Rightarrow x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. Here $a > 0$ and $c = (p_s - p_0) < 0$, so the product of the roots $c/a$ is negative. There is therefore only one positive root, $x = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$.

Let us now look at Equation 4.5 with $\mu = 0$. Setting $\mu$ to zero means that there are no viscous effects in our model. With $\mu = 0$, the second term, which depends linearly on $U_g$, vanishes:

$$\frac{1}{2}\rho\left(\frac{U_g}{L_g h_s}\right)^2 + (p_s - p_0) = 0$$

This gives the following solution for the flux:

$$U_g = L_g h_s\sqrt{\frac{2(p_0 - p_s)}{\rho}} \qquad (4.6)$$

In Chapter 3, where viscous effects were not taken into account, the flux was defined as $U_g = L_g v_s h_s$, with $v_s = \sqrt{2(p_0 - p_s)/\rho}$. As expected, this yields the same result as Equation 4.6.
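Solving (4.5) for $U_g$ takes only a few lines. The sketch assumes the piecewise-linear aperture behind the integrated viscous term in (4.5), CGS units, and illustrative values for $\rho$, $\mu$ and the hypothetical width $L_g$. It also checks that $\mu = 0$ recovers (4.6).

To avoid cancellation when $b^2 \gg |4ac|$, the positive root is computed as $-2c/(b + \sqrt{b^2 - 4ac})$. This is algebraically identical to $(-b + \sqrt{b^2 - 4ac})/(2a)$ and well defined here because $b \ge 0$ and $c < 0$.

```python
import math

RHO_AIR = 1.2e-3   # g/cm^3, assumed
MU_AIR = 1.8e-4    # g/(cm s), assumed

def inv_h3_glottis(x0, h0, h1, xs, hs):
    """Integral of 1/h^3 over the assumed piecewise-linear aperture (x1 = 0)."""
    def seg(length, ha, hb):
        return length * (ha + hb) / (2.0 * ha ** 2 * hb ** 2)
    return seg(-x0, h0, h1) + seg(xs, h1, hs)

def flux_viscous(p0, ps, lg, x0, h0, h1, xs, hs, rho=RHO_AIR, mu=MU_AIR):
    """Positive root U_g of Eq. 4.5, written as a U^2 + b U + c = 0."""
    a = 0.5 * rho / (lg * hs) ** 2
    b = 12.0 * mu / lg * inv_h3_glottis(x0, h0, h1, xs, hs)
    c = ps - p0                                  # negative while p0 > ps
    return -2.0 * c / (b + math.sqrt(b * b - 4.0 * a * c))

geom = dict(x0=-0.025, h0=2.0, h1=0.04, xs=0.04, hs=0.044)
lg = 1.4                                         # cm, hypothetical width
u_visc = flux_viscous(7840.0, 0.0, lg, **geom)
u_ideal = flux_viscous(7840.0, 0.0, lg, mu=0.0, **geom)
print(f"U_g with viscosity {u_visc:.1f} cm^3/s, without {u_ideal:.1f} cm^3/s")
```

With $\mu = 0$ the function reduces to $\sqrt{-c/a} = L_g h_s\sqrt{2(p_0 - p_s)/\rho}$, which is Equation 4.6.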

4.4 Calculation of the hydrodynamic force in the advanced model

We defined the hydrodynamic force as the integrated pressure over the area of the glottis (3.17). Because of the viscous pressure loss, the pressure distribution along the glottis now differs from (3.20):

$$p(x) = p_0 - \frac{1}{2}\rho\left(\frac{U_g}{L_g h(x)}\right)^2 - \frac{12\mu U_g}{L_g}\int_{x_0}^{x}\frac{dx'}{h^3(x')}, \qquad x_0 < x < x_s \qquad (4.7)$$

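This expression is valid when the glottis is open. When the glottis is closed, we use the same expression as (3.22). Expression 4.7 has to be integrated over the area of the glottis to get an expression for $F_h$ (for the derivation, see Appendix B):

$$F_h = \left[-L_g x_0\left\{p_0 - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{1}{h_0 h_1}\right\} - 6\mu\frac{x_0^2}{h_0^2 h_1}U_g\right] + \left[L_g x_s\left\{p_s - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{h_s - h_1}{h_1 h_s^2}\right\} + 6\mu\frac{x_s^2}{h_1 h_s^2}U_g\right] \qquad (4.8)$$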

Again we can set $\mu$ to zero. Doing this gives the corresponding formulas for the hydrodynamic force found in Chapter 3.
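As with (3.21), the closed form (4.8) can be checked against a direct quadrature of (4.7). It also splits naturally into a Bernoulli part $F_{bern}$ (the terms in braces) and a viscous part $F_{visc}$ (the terms proportional to $\mu$), as used in the discussion of the results.

The sketch uses the same assumptions as before: a piecewise-linear aperture, CGS units, and illustrative $\rho$, $\mu$ and hypothetical $L_g$. $U_g$ is taken as the positive root of (4.5). With that choice, the quadrature of (4.7) and the closed form should agree.

```python
import math

RHO, MU, LG = 1.2e-3, 1.8e-4, 1.4    # assumed CGS values; LG is hypothetical

def seg(length, ha, hb):
    """Exact integral of 1/h^3 over a linear segment from ha to hb."""
    return length * (ha + hb) / (2.0 * ha ** 2 * hb ** 2)

def force_terms(x0, h0, h1, xs, hs, p0, ps, ug):
    """Eq. 4.8 split into (F_bern, F_visc)."""
    q = 0.5 * RHO * (ug / LG) ** 2
    f_bern = (-LG * x0 * (p0 - q / (h0 * h1))
              + LG * xs * (ps - q * (hs - h1) / (h1 * hs ** 2)))
    f_visc = (-6.0 * MU * x0 ** 2 * ug / (h0 ** 2 * h1)
              + 6.0 * MU * xs ** 2 * ug / (h1 * hs ** 2))
    return f_bern, f_visc

def force_quadrature(x0, h0, h1, xs, hs, p0, ug, n=20000):
    """L_g times the Simpson integral of the pressure (4.7) over [x0, xs]."""
    def h(x):
        return h0 + (h1 - h0) * (x - x0) / (-x0) if x <= 0.0 else h1 + (hs - h1) * x / xs
    def cum(x):  # integral of 1/h^3 from x0 to x
        if x <= 0.0:
            return seg(x - x0, h0, h(x))
        return seg(-x0, h0, h1) + seg(x, h1, h(x))
    def p(x):
        return p0 - 0.5 * RHO * (ug / (LG * h(x))) ** 2 - 12.0 * MU * ug / LG * cum(x)
    def simpson(a, b):
        step = (b - a) / n
        s = p(a) + p(b) + sum((4.0 if i % 2 else 2.0) * p(a + i * step) for i in range(1, n))
        return s * step / 3.0
    return LG * (simpson(x0, 0.0) + simpson(0.0, xs))

x0, h0, h1, xs, hs, p0, ps = -0.025, 2.0, 0.04, 0.04, 0.044, 7840.0, 0.0
# U_g: positive root of Eq. 4.5 (a U^2 + b U + c = 0).
a = 0.5 * RHO / (LG * hs) ** 2
b = 12.0 * MU / LG * (seg(-x0, h0, h1) + seg(xs, h1, hs))
ug = -2.0 * (ps - p0) / (b + math.sqrt(b * b - 4.0 * a * (ps - p0)))

f_bern, f_visc = force_terms(x0, h0, h1, xs, hs, p0, ps, ug)
f_quad = force_quadrature(x0, h0, h1, xs, hs, p0, ug)
print(f"F_bern = {f_bern:.2f} dyne, F_visc = {f_visc:.3f} dyne")
print(f"Eq. 4.8: {f_bern + f_visc:.4f} dyne, quadrature of 4.7: {f_quad:.4f} dyne")
```

In this divergent example $F_{visc}$ comes out positive and is dominated by its second term. This matches the qualitative behaviour of $F_{visc}$ described in the discussion of the results.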

4.5 Results of the advanced two point mass model

Figure 4.2: The aperture of the glottis at the position of mass 1 (thin line) and at the position of mass 2 (bold line) as a function of time, calculated with the advanced two point mass model (vertical axis: $h_1, h_2$ in cm; horizontal axis: $t$ in s).

In this section we present the results of the calculations made with the advanced two point mass model. These calculations include viscous losses and use the elastic collision model explained in Chapter 3.

Figure 4.2 shows the opening of the glottis at $m_1$ and $m_2$. Because the pressure builds up gradually, the maximum value of $h_1$ is only established for $t > 0.01$ s. We see that $h_2$ follows the motion of $h_1$. Both $h_1$ and $h_2$ become negative, which we could interpret as deformation of the vocal cords during the closed glottis phase. Both the maximum and the minimum values of $h_1$ and $h_2$ are smaller in magnitude than in our simple two point mass model. This means that the flux will also be reduced compared to the flux predicted by the simple two point mass model.

Figure 4.3 shows the flux calculated with the advanced model. The curve is smoother than the one presented in Figure 3.11. In particular, at the opening and closing of the glottis the flux changes smoothly, whereas in the simple model this transition was very abrupt. The smooth transition can also be observed in Figure 4.4.

Figure 4.3: The time-dependent flux, calculated with the advanced two point mass model (vertical axis: $U_g$ in cm³/s; horizontal axis: $t$ in s).

In the simple model, an abrupt rise and an abrupt fall of the derivative occurred at the opening and the closing of the glottis, respectively. Figure 4.4 clearly shows that $dU_g/dt$ is continuous at the opening of the glottis. Furthermore, $dU_g/dt$ suddenly increases after reaching its minimum. However, Figure 4.9 will show that the derivative takes a finite time to go to zero, implying a rather smooth decrease of the flux during the closure of the glottis.

Figure 4.4: The derivative of the flux, calculated with the advanced two point mass model (vertical axis: $dU_g/dt$; horizontal axis: $t$ in s).

Figure 4.5 shows the hydrodynamic force as a function of time. The curve has some characteristic features, and these help us understand the processes responsible for the motion of $m_1$ and $m_2$. We distinguish a viscous term $F_{visc}$ and a Bernoulli term $F_{bern}$. Recall the expression for the hydrodynamic force $F_h$:

$$F_h = \left[-L_g x_0\left\{p_0 - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{1}{h_0 h_1}\right\} - 6\mu\frac{x_0^2}{h_0^2 h_1}U_g\right] + \left[L_g x_s\left\{p_s - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{h_s - h_1}{h_1 h_s^2}\right\} + 6\mu\frac{x_s^2}{h_1 h_s^2}U_g\right]$$

We define $F_{bern}$ and $F_{visc}$ as:

$$F_{bern} = -L_g x_0\left\{p_0 - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{1}{h_0 h_1}\right\} + L_g x_s\left\{p_s - \frac{\rho}{2}\frac{U_g^2}{L_g^2}\frac{h_s - h_1}{h_1 h_s^2}\right\}$$

$$F_{visc} = -6\mu\frac{x_0^2}{h_0^2 h_1}U_g + 6\mu\frac{x_s^2}{h_1 h_s^2}U_g$$

The denominator of the first term of $F_{bern}$ contains $h_0$, and the denominator of the first term of $F_{visc}$ contains $h_0^2$. $h_0$ is of the order of a few cm, while both $h_1$ and $h_2$ reach a maximum of only about 0.05 cm. The flow-dependent contributions to $F_{bern}$ and $F_{visc}$ are therefore determined by the second term.

Figure 4.5: The hydrodynamic force as a function of time, calculated with the advanced two point mass model (vertical axis: $F_h$ in dyne; horizontal axis: $t$ in s).

Figure 4.6: Two constituents of the hydrodynamic force: the term due to viscous effects $F_{visc}$ (thin line) and the term due to the Bernoulli force $F_{bern}$ (bold line) (vertical axis: force in dyne; horizontal axis: $t$ in s).

Figure 4.6 shows how $F_{bern}$ and $F_{visc}$ change during the open glottis phase. $F_{bern}$ increases smoothly during the opening of the glottis (e.g. at $t = 0.02$ s), whereas Figure 3.13 showed an abrupt rise of $F_h$ when the glottis opened. This smooth increase is due to the smooth increase of the flux seen in Figure 4.3. $F_{bern}$ is positive when the glottis is convergent (e.g. at the opening of the glottis) and negative when the glottis has a divergent shape. The negative peak of $F_{bern}$ is reached when the separation point starts to move upstream from $x_2$. This negative peak is partially compensated by $F_{visc}$, as a comparison of Figure 4.5 and Figure 4.6 shows. $F_{visc}$ is always positive and is smallest when the glottis is wide open. As expected, $F_{visc}$ is larger near the opening and the closure of the glottis: the aperture is small at these moments, so viscous effects become more important.


4.6 The effect of the collision model on the flow

Figure 4.7: The flux through the glottis, calculated with the advanced two point mass model and an inelastic collision model (horizontal axis: $t$ in s).

Figure 4.7 shows the flux $U_g$ obtained with the inelastic collision model. The model takes slightly more time to generate a pulse than with the elastic collision model adopted in the preceding sections. This is because, in this collision model, the rest position of the masses has to be negative ($y_{0,1} = y_{0,2} = -0.004$ cm), as explained in Section 3.4. This negative rest position introduces an extra force that keeps the glottis closed for a while. When the glottis opens ($t \approx 0.012$ s), the pressure has already reached its maximum value. The maximum flux of the first pulse therefore equals that of the following pulses. Figures of the derivative of the flux (not presented here) showed that the shape of the flux curve changes slightly when an inelastic collision model is adopted instead of an elastic one. The maximum value of the flux $U_g$ is about 25 % smaller, and the opening phase takes longer: it takes 25 % more time to establish a flux of 50 cm³/s.

These differences can be explained as follows. When the vocal cords collide, we change the positions and speeds of the two point masses, and thereby impose new initial conditions for the generation of a new pulse. This leads to the following interpretation. The glottal pulses generated with the inelastic collision model consist of a repetition of one pulse (the initial pulse). The glottal pulses generated with the elastic collision model, on the other hand, result from a continuous, stationary movement of the vocal cords.

4.7 The effect of the viscous pressure on the flow

The effects of viscosity can be seen in the flux $U_g$ and its derivative $dU_g/dt$. We look at one glottal pulse only and focus on the moments when the glottis opens and when it closes. Figure 4.8 shows the flux of the model without viscous effects (thin line) and with viscous effects (bold line).

The maximum value of the flux is larger when no viscous effects are present. We can explain this by assuming that the viscous effects reduce the velocity of the masses to some extent. Even if this reduction in speed is only present at the opening of the glottis, it reduces the kinetic energy of the mechanical mass-spring system. This could lead to smaller values of $h_1$ and $h_2$. Furthermore, we can distinguish a discontinuity in the slope of the curve; we will discuss this phenomenon by looking at the derivative of the flux.

Figure 4.8: The termination of the flux with (bold line) and without (thin line) viscous effects (horizontal axis: $t$ in s, from 0.03 to 0.044).

Figure 4.9: The derivative of the flux with (bold line) and without (thin line) viscous effects (horizontal axis: $t$ in s, from 0.03 to 0.044).

Figure 4.9 shows $dU_g/dt$ as a function of time. We first look at $dU_g/dt$ without the viscous effects. We saw in Chapter 3 that this curve contains three discontinuities: the first arises when the glottis opens, the second when the separation point begins to move upstream and the third when the glottis closes. We concluded that the discontinuities at the opening and at the closing of the glottis give rise to an abrupt generation and termination of the flow.

With viscous effects taken into account, two of these discontinuities have vanished. There are no discontinuities during the opening and the closing of the glottis, a feature of the flow that could not be distinguished in Figure 4.4. This means that the flow is generated and terminated smoothly. We can explain this smooth character of the curve by the new velocity distribution in the flow, of which Figure 4.1 gives an example. One discontinuity remains, namely the one when the separation point starts to move. However, this discontinuity is much smaller with viscosity than without. Because of the new separation criterion and a different motion of the masses $m_1$ and $m_2$, the transition point where the separation point starts to move upstream has shifted. This explains why we cannot see a discontinuity in the curve of $U_g$ in Figure 4.8.

The advanced two point mass model has 5 independent geometric parameters, 4 independent mechanical parameters and one independent hydrodynamic parameter. The other parameters are dependent or have been incorporated in the description of the model. Further calculations showed that the advanced two point mass model is sensitive to changes in some parameters. If, for example, $x_2$ or $L_g$ is halved, the glottis no longer closes. The glottis also fails to close if the spring constants $k_1$ or $k_2$ are increased by a factor of 1.5. Further research has to show how critical all parameters are and to what extent they influence the motion of the vocal cords.

4.8 Conclusions about the advanced two point mass model

There are several similarities between the advanced and the simple two point mass model. In both models pressure builds up, which causes the shape of the first glottal pulse to differ from the shape of the stationary pulses. In the advanced model, too, we made sure that the forces acting on the masses were instantaneous functions of the geometry and the hydrodynamics, and again $m_2$ follows $m_1$.

The differences between the two models can clearly be seen in the flux. The new separation criterion still divides the curve of the flux into two regions, but this time the associated discontinuity is smaller. This discontinuity can be caused by the combination of the separation criterion and the geometry of the model. The separation criterion tells us where the separation point has to be. When the criterion gives a point that is not between $x_1$ and $x_2$, we fix the separation position at either $x_1$ or $x_2$. The transition between a fixed separation point and a separation point given by the criterion does not have to be smooth and can therefore cause a discontinuity in the first derivative of the flux. Though the glottal pulses and their derivatives show great similarity with glottal pulses obtained from inverse filtering [4] and from other models [15], more study of flow separation is needed to implement an accurate description of the phenomenon in the model.

We saw that the viscous effects introduced a smooth generation and termination of the flow. This effect is partially due to a reduction of the flux and a reduction of the hydrodynamic force. These two effects result in a lower speed of the masses $m_1$ and $m_2$ and a smaller flux when the glottis is nearly closed.

Both the elastic and the inelastic collision models were tested and gave good results, though they showed some differences in the shape of the glottal pulse. On the one hand, the pulses obtained with the inelastic collision model were built up from a repetition of pulses. On the other hand, the pulses obtained with the elastic collision model are the result of a stationary movement. Both collision models have some shortcomings. The elastic collision model was justified by the argument that the vocal cords deform. Though deformation does not seem unlikely, this argument is a justification after the fact, whereas deformation should be modelled explicitly in the two mass model. On the other hand, there are no physiological reasons for the assumption that an inelastic collision takes place during phonation. Neither of the two collision models has a physiological basis; the modelling of the collision and its interpretation deserve more attention than they receive in this research.

The advanced model gives us the opportunity to experiment with different parameters, which can be a subject for future research. Some parameters we used appeared to have a rather critical value: increasing some of them by a factor of 1.5 leads to a stationary motion of the vocal cords during which the glottis does not close. Further research has to show how critical all parameters are and to what extent they influence the motion of the vocal cords.

Though we can experiment with geometrical parameters, we have not changed the geometry itself in this model, since a new geometry introduces a whole new set of governing equations. That is why we cannot draw conclusions about the relation between the geometry and the obtained glottal pulses. Other geometries can be studied in future research.

In the following chapter, the model we have developed will be used to produce some vowels, acting as the source in a source-filter model.


Chapter 5

The two point mass model as a source for sound production

With the help of discrete-time signal processing, we can use our model of the vocal cords to generate sound. First we look in more detail at the continuous-time behaviour of our model and at what it means for our signal analysis. After that we map the results to the discrete-time situation. The discrete-time description of the source model, combined with the discrete-time description of the filter model, makes it possible to design a speech synthesizer.

5.1 Continuous time behaviour of the two point mass model

The motion of the vocal cords is driven by the hydrodynamic forces $F_{h,1}$ and $F_{h,2}$, which depend on the positions $y_1$ and $y_2$ defined in Figure 3.2. A schematic drawing is presented in Figure 5.1. As we saw in the previous chapter, Newton's law for our model can be written as follows. The first equation describes the forces on mass 1, the second the forces on mass 2:

(5.1)

As in Chapter 3, we introduced two hydrodynamic forces $F_{h,1}$ and $F_{h,2}$ to obtain a more general, symmetric pair of equations; for simplicity we will omit $F_{h,2}$ later on. Furthermore, we neglected the rest positions of the masses (e.g. $y_{0,1}$, $y_{0,2}$, $y_{0,12}$). Equilibrium positions only introduce a constant term in the particular solution of the differential equations, which can be added after the equations have been solved.

Figure 5.1: A representation of the two point mass model as a loop of three blocks: Mechanics, Geometry and Fluid Dynamics. The forces $F_{h,1}$, $F_{h,2}$ determine the positions $y_1$, $y_2$. These new positions determine the geometry of the glottis ($h_1$, $h_2$), and the geometry parameters determine the magnitude of the new forces.

5.2 A discrete time model for the vocal cords: the state space approach

For the implementation of our model in a speech synthesizer we need a discrete-time approach. In this section we show how the model can be translated into discrete time. The key to this translation is the requirement that the impulse responses of the discrete-time model are sampled versions of the impulse responses of the continuous-time model. We now give a state-space description of the mechanics of the two-mass model. We start with the equations:

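$$\dot{\mathbf{x}}(t) = A\,\mathbf{x}(t) + B\,\mathbf{u}(t), \qquad \mathbf{y}(t) = C\,\mathbf{x}(t) + D\,\mathbf{u}(t), \tag{5.2}$$

with $A$ a 4×4 matrix, $B$ a 4×2 matrix, $C$ a 2×4 matrix and $D$ a 2×2 matrix. $\mathbf{x}(t)$, $\mathbf{y}(t)$ and $\mathbf{u}(t)$ are defined as:

$$\mathbf{x}(t) = \begin{pmatrix} y_1 \\ \dot{y}_1 \\ y_2 \\ \dot{y}_2 \end{pmatrix}, \qquad \mathbf{y}(t) = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad \mathbf{u}(t) = \begin{pmatrix} F_{h,1} \\ F_{h,2} \end{pmatrix}. \tag{5.3}$$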

So $\mathbf{x}(t)$ contains the positions and speeds of the two masses, and $\mathbf{y}(t)$ their positions. The matrices $A$, $B$, $C$, $D$ are determined by the differential equations (5.1), which we can rewrite in matrix form as:

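$$\begin{pmatrix} \dot{y}_1 \\ \ddot{y}_1 \\ \dot{y}_2 \\ \ddot{y}_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -\frac{k_1 + k_{12}}{m_1} & -\frac{r_1}{m_1} & \frac{k_{12}}{m_1} & 0 \\ 0 & 0 & 0 & 1 \\ \frac{k_{12}}{m_2} & 0 & -\frac{k_2 + k_{12}}{m_2} & -\frac{r_2}{m_2} \end{pmatrix} \begin{pmatrix} y_1 \\ \dot{y}_1 \\ y_2 \\ \dot{y}_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ \frac{1}{m_1} & 0 \\ 0 & 0 \\ 0 & \frac{1}{m_2} \end{pmatrix} \begin{pmatrix} F_{h,1} \\ F_{h,2} \end{pmatrix} \tag{5.4}$$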

Figure 5.2: Visualization of the continuous-time matrix representation (block diagram with blocks $A$, $B$, $C$, $D$ acting on $\mathbf{u}(t)$, $\mathbf{x}(t)$ and $\mathbf{y}(t)$).

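$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ \dot{y}_1 \\ y_2 \\ \dot{y}_2 \end{pmatrix} \tag{5.5}$$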

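We see that $D = 0$. Figure 5.2 gives a schematic representation of this process. With $\mathbf{b}_i$, $i = 1, 2$, the columns of $B$ and $u_i$ the components of $\mathbf{u}$, we rewrite (5.2) as:

$$\dot{\mathbf{x}}(t) = A\,\mathbf{x}(t) + \sum_{i=1}^{2} \mathbf{b}_i\,u_i(t), \qquad \mathbf{y}(t) = C\,\mathbf{x}(t). \tag{5.6}$$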

There are four impulse responses in the continuous-time case: one from each input $u_i$, $i = 1, 2$, to each of the two outputs. To obtain them, we diagonalize the matrix $A$ according to $M_A^{-1} A M_A = \Lambda_A$, where $M_A$ is a transformation matrix (its columns are eigenvectors of $A$) and $\Lambda_A$ is the diagonal matrix of eigenvalues of $A$. With this diagonalization we find the impulse responses $\mathbf{h}_i(t)$ for the continuous-time case. Each $\mathbf{h}_i(t)$ is a vector of dimension 2, so $\mathbf{h}_1(t)$ and $\mathbf{h}_2(t)$ each contain two impulse responses:

$$\mathbf{h}_i(t) = C\,M_A \exp(\Lambda_A t)\,M_A^{-1}\,\mathbf{b}_i, \qquad t \ge 0,\; i = 1, 2. \tag{5.7}$$
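As a minimal sketch of Equation 5.7, the code below builds the state-space matrices and evaluates $\mathbf{h}_i(t)$ through the eigendecomposition of $A$. It assumes linear mechanics with springs $k_1$, $k_2$, a coupling spring $k_{12}$ and dampers $r_1$, $r_2$ (our reading of (5.4)). The function names and all parameter values are hypothetical, chosen only for illustration, and are not the values used in this research.

```python
import numpy as np


def mechanics_matrices(m1, m2, k1, k2, k12, r1, r2):
    """State-space matrices A, B, C for the state x = (y1, dy1/dt, y2, dy2/dt).

    Assumed mechanics: linear springs k1, k2, a coupling spring k12 and
    dampers r1, r2 (a hypothetical reading of Equation 5.4).
    """
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [-(k1 + k12) / m1, -r1 / m1, k12 / m1, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [k12 / m2, 0.0, -(k2 + k12) / m2, -r2 / m2],
    ])
    B = np.array([[0.0, 0.0], [1.0 / m1, 0.0], [0.0, 0.0], [0.0, 1.0 / m2]])
    C = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
    return A, B, C


def impulse_response(A, B, C, t):
    """2x2 matrix whose column i is h_i(t) = C M_A exp(Lambda_A t) M_A^-1 b_i."""
    lam, M = np.linalg.eig(A)                      # A = M diag(lam) M^-1
    exp_At = M @ np.diag(np.exp(lam * t)) @ np.linalg.inv(M)
    return np.real(C @ exp_At @ B)                 # imaginary parts are round-off only


# Hypothetical cgs values (g, dyne/cm, dyne*s/cm), for illustration only.
A, B, C = mechanics_matrices(m1=0.125, m2=0.025, k1=8.0e4, k2=8.0e3,
                             k12=2.5e4, r1=20.0, r2=17.0)
h_start = impulse_response(A, B, C, 0.0)           # C b_i = 0: responses start at zero
h_1ms = impulse_response(A, B, C, 1.0e-3)
```

Because the forces act on the accelerations only, $C\mathbf{b}_i = 0$ and each impulse response starts at zero with initial slope $1/m_i$ on the diagonal.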

The representation in Figure 5.2 is a first-order system in terms of the vectors $\mathbf{x}(t)$, $\mathbf{y}(t)$ and $\mathbf{u}(t)$. A first-order system for the discrete-time case is given in Figure 5.3, with equations analogous to (5.2):

$$\mathbf{x}_n = E\,\mathbf{x}_{n-1} + F\,\mathbf{u}_n = E\,\mathbf{x}_{n-1} + \sum_{i=1}^{2} \mathbf{f}_i\,u_{i,n}, \qquad \mathbf{y}_n = G\,\mathbf{x}_{n-1} + H\,\mathbf{u}_n. \tag{5.8}$$

Here the vectors again contain the positions and speeds of the masses, and the matrices follow from the description of the model.

Figure 5.3: Visualization of the discrete-time matrix representation (block diagram with blocks $E$, $F$, $G$, $H$ and a unit delay $z^{-1}$ that maps $\mathbf{x}_n$ to $\mathbf{x}_{n-1}$).

At this point we do not know what the matrices $E$, $F$, $G$ and $H$ look like. In the following we demand that the impulse responses of the discrete-time model and of the continuous-time model are equal. This demand makes it possible to obtain explicit expressions for the matrices. From Equation 5.8 we can conclude that:

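$$\mathbf{h}_{i,0} = H\,\mathbf{e}_i, \qquad \mathbf{h}_{i,n} = G\,E^{\,n-1}\,\mathbf{f}_i, \quad n \ge 1, \tag{5.9}$$

where $\mathbf{h}_{i,n}$ denotes the impulse responses of $\mathbf{y}_n$ in the discrete-time model, that is, the response to a unit pulse on input $i$ ($\mathbf{e}_i$ is the $i$-th unit vector). We now require that the impulse responses (5.9) of the discrete-time model are a sampled version, scaled by the sampling period $T$, of the impulse responses (5.7) of the continuous-time model:

$$T\,\mathbf{h}_i(nT) = \mathbf{h}_{i,n}, \qquad n = 0, 1, \ldots,\; i = 1, 2.$$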

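Substituting the equations for the impulse responses yields:

$$H\,\mathbf{e}_i = T\,C\,\mathbf{b}_i, \qquad G\,E^{\,n-1}\,\mathbf{f}_i = T\,C\,M_A \exp(\Lambda_A nT)\,M_A^{-1}\,\mathbf{b}_i, \quad n \ge 1. \tag{5.10}$$

This equation is satisfied when:

$$G = C\,M_A \exp(\Lambda_A T)\,M_A^{-1}, \qquad E = M_A \exp(\Lambda_A T)\,M_A^{-1}, \qquad \mathbf{f}_i = \mathbf{b}_i T, \quad i = 1, 2. \tag{5.11}$$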
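The matching rule (5.11) can be checked numerically. The sketch below assumes the same kind of linear two-mass mechanics (springs $k_1$, $k_2$, $k_{12}$ and dampers $r_1$, $r_2$) with hypothetical parameter values and a hypothetical sampling period $T = 10^{-5}$ s. It computes $E$, $F$, $G$ and $H$, runs the recursion (5.8) for a unit pulse, and compares the result with $T\,\mathbf{h}_1(nT)$. The helper names are our own.

```python
import numpy as np


def mechanics_matrices(m1, m2, k1, k2, k12, r1, r2):
    """Hypothetical linear two-mass mechanics, state x = (y1, dy1/dt, y2, dy2/dt)."""
    A = np.array([
        [0.0, 1.0, 0.0, 0.0],
        [-(k1 + k12) / m1, -r1 / m1, k12 / m1, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [k12 / m2, 0.0, -(k2 + k12) / m2, -r2 / m2],
    ])
    B = np.array([[0.0, 0.0], [1.0 / m1, 0.0], [0.0, 0.0], [0.0, 1.0 / m2]])
    C = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
    return A, B, C


def expm_eig(A, t):
    """exp(A t) = M_A exp(Lambda_A t) M_A^-1 via the eigendecomposition of A."""
    lam, M = np.linalg.eig(A)
    return np.real(M @ np.diag(np.exp(lam * t)) @ np.linalg.inv(M))


def discretize(A, B, C, T):
    """E, F, G, H of Equation 5.8, chosen according to Equation 5.11."""
    E = expm_eig(A, T)
    F = B * T                                  # f_i = b_i T
    G = C @ E                                  # G = C M_A exp(Lambda_A T) M_A^-1
    H = np.zeros((C.shape[0], B.shape[1]))     # H = D = 0
    return E, F, G, H


def discrete_impulse_response(E, F, G, H, i, n_samples):
    """h_{i,n}: output y_n for a unit pulse on input i at n = 0."""
    x = np.zeros(E.shape[0])                   # x_{-1} = 0
    out = []
    for n in range(n_samples):
        u = np.zeros(F.shape[1])
        u[i] = 1.0 if n == 0 else 0.0
        out.append(G @ x + H @ u)              # y_n = G x_{n-1} + H u_n
        x = E @ x + F @ u                      # x_n = E x_{n-1} + F u_n
    return np.array(out)


A, B, C = mechanics_matrices(m1=0.125, m2=0.025, k1=8.0e4, k2=8.0e3,
                             k12=2.5e4, r1=20.0, r2=17.0)
T = 1.0e-5                                     # hypothetical sampling period (s)
E, F, G, H = discretize(A, B, C, T)
n_samples = 200
h_disc = discrete_impulse_response(E, F, G, H, i=0, n_samples=n_samples)
h_cont = np.array([T * (C @ expm_eig(A, n * T) @ B[:, 0]) for n in range(n_samples)])
```

The two arrays agree to round-off, which illustrates that the rule is exact for a linear system rather than an approximation. Note that the expensive step, the eigendecomposition, is done only once.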

Since $C\mathbf{b}_i = 0$, this also means that $H = D = 0$. Thus we have shown that the continuous-time two point mass model can be translated into a discrete-time model. Using the state-space approach, the calculation time was found to be reduced by a factor of 300. We now compare the results of the state-space approach with the numerical solution of the differential equations. For this purpose we introduce the relative mean-square approximation error:

$$\rho = \frac{\sum_n (\hat{x}_n - x_n)^2}{\sum_n x_n^2} \tag{5.12}$$

Here $\hat{x}_n$ are the values calculated with the state-space approach and $x_n$ are the values calculated by solving the differential equations numerically. The relative mean-square approximation error turned out to be very small: for the hydrodynamic force, for example, $\rho = 4.79 \cdot 10^{-4}$, which in decibels is $10\log_{10}\rho \approx -33.2$ dB.

The state-space approach makes it possible to study more-mass models, like the 16-mass model presented in [10], [11], with relatively little calculation time. The most time-consuming operation in the state-space approach is the initialisation of some matrices (like $M_A \exp(\Lambda_A T) M_A^{-1}$). The remaining calculations in each time step consist only of multiplications and additions (matrix-vector products), so they can be done very fast. This can be used in a speech synthesizer in which we want to calculate the behaviour of the model for real-time speech production.
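Equation 5.12 and the conversion to decibels are easy to reproduce. The sketch below assumes the form of (5.12) given above: the sum of squared differences divided by the sum of squared reference values. The sample arrays are made up for illustration; only the value $\rho = 4.79\cdot 10^{-4}$ comes from the text.

```python
import numpy as np


def relative_mse(x_state_space, x_numerical):
    """Relative mean-square approximation error rho of Equation 5.12."""
    x_hat = np.asarray(x_state_space, dtype=float)
    x_ref = np.asarray(x_numerical, dtype=float)
    return np.sum((x_hat - x_ref) ** 2) / np.sum(x_ref ** 2)


def to_db(rho):
    """Express a power ratio in decibels: 10 log10(rho)."""
    return 10.0 * np.log10(rho)


# Made-up samples: only the last value differs (by 0.3).
rho_example = relative_mse([1.0, 2.0, 3.3], [1.0, 2.0, 3.0])  # 0.09 / 14
rho_reported_db = to_db(4.79e-4)  # value reported above for the hydrodynamic force
print(f"{rho_example:.5f}  {rho_reported_db:.2f} dB")  # prints: 0.00643  -33.20 dB
```

The printed $-33.20$ dB agrees with the value quoted in the text to within rounding.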

5.3 A discrete time model for the vocal tract

The air that has passed the glottis enters the vocal tract. The vocal tract has a tubular shape, and its walls are composed of muscular and bony tissue. Owing to the muscles, its shape, and thereby its resonance frequencies, can be changed. In this way, different sounds can be produced. Besides the vocal tract, there are also resonance cavities in the nose, which together constitute the nasal tract. In the following model we neglect the nasal tract; it is described in various works on speech production, e.g. [12], [13].

The model for the vocal tract is based directly on its tubular shape and uses a time-domain description. We consider a model of $N$ lossless cylindrical tubes, with $N = 48$ to $72$. Since the vocal tract of an adult male is about 17 cm long, the tubes then have a length of 0.35 to 0.23 cm. The length of a tube depends on the sampling frequency in a way that will become clear later. As an example, four sections of a tubular vocal tract are shown in Figure 5.4. We denote the length of tube $i$ by $l_i$ and its cross-sectional area by $A_i$ (which can be up to 20 cm²), where $i$ runs from 1 at the glottal end to $N$ at the lips of the speaker. All these tubes are concatenated, $i = 1, \ldots, N$. Every wave that passes a transition between two sections is partially reflected and partially transmitted to the next section. In the following we first discuss what happens in one single section, and then what happens at the boundary between two sections. We complete this model of the vocal tract by describing what happens at the boundary between the glottis and the vocal tract and at the boundary between the lips and the vocal tract.

Figure 5.4: A part of the tubular (cylindrical) vocal tract model, showing sections $i$ and $i+1$ with lengths $l_i$, $l_{i+1}$, areas $A_i$, $A_{i+1}$ and travelling waves $u_i^+$, $u_i^-$, $u_{i+1}^+$, $u_{i+1}^-$. Reflection of the wave occurs at every boundary between two tube sections, e.g. at the dotted line between $i$ and $i + 1$.

In a single section of the vocal tract model, the motion of the air can be described by a second order differential equation [12]:

$$\frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\,\frac{\partial^2 u}{\partial t^2}, \tag{5.13}$$

where $x$ is the position along the vocal tract, $c$ is the wave propagation velocity (the speed of sound) in the vocal tract and $t$ is time. Unlike in the preceding chapters, we denote the volume velocity in the vocal tract with a lowercase $u$. This equation has travelling waves as solutions, which can be written as:

$$u_i(x, t) = u_i^+(t - x/c) - u_i^-(t + x/c) \tag{5.14}$$

The travelling waves can go in the direction of the lips ($u_i^+$) or in the opposite direction, towards the glottis ($u_i^-$). For the pressure wave associated with the volume velocity $u$, we can derive a similar expression:

$$p_i(x, t) = \frac{\rho c}{A_i}\left[u_i^+(t - x/c) + u_i^-(t + x/c)\right] \tag{5.15}$$

Now that we have an expression for the waves in every section $i$, we look at what happens at the boundary between two sections. When the areas $A_i$ and $A_{i+1}$ of two neighbouring sections are unequal ($A_i \neq A_{i+1}$), part of the incoming wave is transmitted and part is reflected. Equations 5.14 and 5.15, combined with boundary conditions, give relations for the reflected and the transmitted waves. Continuity of the flux between section $i$ and section $i + 1$ gives the relation:

$$u_i^+(t - \tau_i) - u_i^-(t + \tau_i) = u_{i+1}^+(t) - u_{i+1}^-(t) \tag{5.16}$$


Here $\tau_i = l_i/c$ is the delay time, that is, the time it takes a sound wave to propagate through the $i$th section. The left side of Equation 5.16 is the net volume velocity $u_i(l_i, t)$ on the left side of the boundary, while the right side corresponds to $u_{i+1}(0, t)$ on the right side. Continuity of the pressure (5.15) gives the relation:

$$\frac{\rho c}{A_i}\left[u_i^+(t - \tau_i) + u_i^-(t + \tau_i)\right] = \frac{\rho c}{A_{i+1}}\left[u_{i+1}^+(t) + u_{i+1}^-(t)\right] \tag{5.17}$$

As mentioned earlier, when the wave meets a transition between tubes, it is partially reflected and partially transmitted. The amount by which the wave is reflected depends on the areas $A_i$ and $A_{i+1}$. Solving Equations 5.16 and 5.17 gives the outgoing waves in terms of the incoming waves:

$$\begin{aligned} u_i^-(t + \tau_i) &= -r_i\,u_i^+(t - \tau_i) + (1 - r_i)\,u_{i+1}^-(t), \\ u_{i+1}^+(t) &= (1 + r_i)\,u_i^+(t - \tau_i) + r_i\,u_{i+1}^-(t), \end{aligned} \tag{5.18}$$

where

$$r_i = \frac{A_{i+1} - A_i}{A_{i+1} + A_i}. \tag{5.19}$$
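A single junction of (5.18) and (5.19) can be written as a small function. The sketch below checks that the computed outgoing waves satisfy flux continuity (5.16) and pressure continuity (5.17). The common factor $\rho c$ cancels in (5.17), so it is omitted. The function names and the numbers are hypothetical.

```python
def reflection_coefficient(area_i, area_next):
    """r_i = (A_{i+1} - A_i) / (A_{i+1} + A_i), Equation 5.19."""
    return (area_next - area_i) / (area_next + area_i)


def scatter(u_plus_in, u_minus_in, r):
    """One junction of Equation 5.18.

    u_plus_in:  u_i^+(t - tau_i), forward wave arriving from section i
    u_minus_in: u_{i+1}^-(t), backward wave arriving from section i + 1
    Returns (u_i^-(t + tau_i), u_{i+1}^+(t)).
    """
    u_minus_out = -r * u_plus_in + (1.0 - r) * u_minus_in
    u_plus_out = (1.0 + r) * u_plus_in + r * u_minus_in
    return u_minus_out, u_plus_out


area_i, area_next = 2.0, 5.0            # hypothetical areas (cm^2)
r = reflection_coefficient(area_i, area_next)
up_in, um_in = 1.0, 0.3                 # hypothetical incoming waves (cm^3/s)
um_out, up_out = scatter(up_in, um_in, r)

flux_left, flux_right = up_in - um_out, up_out - um_in          # Eq. 5.16
pressure_left = (up_in + um_out) / area_i                       # Eq. 5.17 divided by rho c
pressure_right = (up_out + um_in) / area_next
```

Each junction costs only a few multiplications and additions per time step, which is what makes this tube model attractive for a real-time synthesizer.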

The term $r_i$ is called the reflection coefficient between sections $i$ and $i + 1$. It is clear from Equation 5.19 that $-1 \le r_i \le 1$. The bounds are reached when one of the areas at the boundary is zero or infinite. Furthermore, when both areas are equal, $r_i = 0$ and the transmission is 100 % (as if there were no transition between the two sections at all). A signal-flow diagram of the reflection and transmission of the waves is given in Figure 5.5; it is the signal-flow representation of Figure 5.4. Once the flow has passed the lips, it meets no further obstacles. The complex reflection coefficient for the final lip juncture is

$$R_L = \frac{(\rho c/A_N) - Z_L}{(\rho c/A_N) + Z_L}, \tag{5.20}$$

where the impedance $Z_L$ of the lips is given by:

$$Z_L = \frac{j\omega\tau_L R}{1 + j\omega\tau_L}. \tag{5.21}$$

In this equation $\omega$ is the angular frequency of the signal, $\tau_L$ is a time constant and $R$ is a real number. In the following, we use capital letters to denote Fourier-transformed variables (e.g. the Fourier transform of $u_1^+(t)$ is denoted by $U_1^+(j\omega)$). At the beginning of the vocal tract, the glottal termination is modelled in the Fourier domain as a volume velocity source $U_g(j\omega)$ in parallel with an impedance $Z_G(j\omega)$. Writing this out, we obtain the net volume velocity into the first section:

$$U_1^+(j\omega) - U_1^-(j\omega) = U_g(j\omega) - \frac{P_1(j\omega)}{Z_G(j\omega)} = U_g(j\omega) - \frac{\rho c}{A_1}\,\frac{U_1^+(j\omega) + U_1^-(j\omega)}{Z_G(j\omega)} \tag{5.22}$$

Figure 5.5: A signal-flow representation of the reflection and transmission of waves at the transition between the $i$th and the $(i+1)$st tube, with delays $\tau_i$, $\tau_{i+1}$ and junction coefficients $1 + r_i$, $-r_i$, $r_i$ and $1 - r_i$.

Here $P_1(j\omega)$ is the pressure at the left end of the first tube, and $Z_G$ is a complex function of the frequency $\omega$, just like $Z_L$ in (5.21). This gives for the flow $U_1^+(j\omega)$ into the first section:

$$U_1^+(j\omega) = \frac{1 + R_G}{2}\,U_g(j\omega) + R_G\,U_1^-(j\omega), \qquad R_G = \frac{Z_G(j\omega) - \rho c/A_1}{Z_G(j\omega) + \rho c/A_1}. \tag{5.23}$$
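The boundary coefficients (5.20), (5.21) and (5.23) are complex functions of $\omega$. The sketch below evaluates them. The air constants, the lip parameters $\tau_L$ and $R$, and the areas are hypothetical values, and $Z_G$ is simply passed in as a number because its frequency dependence is not specified here. The example illustrates two limits. At $\omega = 0$ the lip impedance vanishes, so $R_L = 1$. A very large glottal impedance gives $R_G \to 1$, so that $U_1^+ \approx U_g + U_1^-$, an ideal volume velocity source.

```python
import numpy as np

RHO = 1.14e-3    # hypothetical air density (g/cm^3)
C_SOUND = 3.5e4  # hypothetical speed of sound (cm/s)


def lip_impedance(omega, tau_l, r_real):
    """Z_L = j w tau_L R / (1 + j w tau_L), Equation 5.21."""
    jw = 1j * np.asarray(omega, dtype=float)
    return jw * tau_l * r_real / (1.0 + jw * tau_l)


def lip_reflection(omega, area_n, tau_l, r_real):
    """R_L = (rho c / A_N - Z_L) / (rho c / A_N + Z_L), Equation 5.20."""
    z0 = RHO * C_SOUND / area_n
    z_l = lip_impedance(omega, tau_l, r_real)
    return (z0 - z_l) / (z0 + z_l)


def glottal_reflection(z_g, area_1):
    """R_G = (Z_G - rho c / A_1) / (Z_G + rho c / A_1), Equation 5.23."""
    z0 = RHO * C_SOUND / area_1
    return (z_g - z0) / (z_g + z0)


omega = 2.0 * np.pi * np.array([0.0, 500.0, 5000.0])   # rad/s
r_lips = lip_reflection(omega, area_n=5.0, tau_l=1.0e-4, r_real=20.0)
r_glottis = glottal_reflection(z_g=1.0e9, area_1=2.0)
```

Because $Z_L$ has a non-negative real part for $R > 0$, $|R_L| \le 1$ at every frequency. In other words, the lips can only absorb energy, never add it.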

At this point we know what happens at the beginning of the vocal tract model (5.23), at its end (5.20) and in the coupled sections (5.18), so we have a complete description of the vocal tract model. A signal-flow representation of this description for $N = 3$ tubes is given in Figure 5.6. Because we have chosen sections of equal length $l$, the delay time $\tau_i$ is the same for every section, which yields a standard delay time $\tau$. In our discrete-time system $\tau$ is chosen such that $\tau = T_s/2$, with $T_s$ the sampling period.
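The complete description can be simulated in the time domain. What follows is a minimal sketch, not the synthesizer used in this research, and it rests on two assumptions. First, $R_G$ and $R_L$ are taken as real constants instead of the frequency-dependent expressions (5.20) to (5.23). Second, the simulation advances in steps of $\tau = T_s/2$, so every second output sample corresponds to the sampling period $T_s$. The helper `sampling_rate` shows how the number of sections $N$ fixes the sampling frequency for a 17 cm tract; the value $c = 35\,000$ cm/s is an assumed speed of sound, and all function names are hypothetical.

```python
import numpy as np


def sampling_rate(n_sections, tract_length_cm=17.0, c=3.5e4):
    """1/T_s for sections of delay tau = l/c = T_s/2 (c in cm/s is an assumption)."""
    tau = tract_length_cm / n_sections / c
    return 1.0 / (2.0 * tau)


def simulate_tract(areas, u_g, r_glottis, r_lips):
    """Volume velocity at the lips, one output per step of tau.

    f[i]: forward wave u_i^+ at the glottal end of section i
    g[i]: backward wave u_i^- at the glottal end of section i
    """
    areas = np.asarray(areas, dtype=float)
    r = (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])   # Eq. 5.19
    n = len(areas)
    f_prev = np.zeros(n)   # forward waves one step ago
    g_cur = np.zeros(n)    # backward waves now
    u_lips = np.zeros(len(u_g))
    for k, source in enumerate(u_g):
        f_new = np.empty(n)
        g_next = np.empty(n)
        # Glottal end: time-domain form of Eq. 5.23 with a real R_G.
        f_new[0] = 0.5 * (1.0 + r_glottis) * source + r_glottis * g_cur[0]
        # Junctions between sections i and i + 1, Eq. 5.18.
        f_new[1:] = (1.0 + r) * f_prev[:-1] + r * g_cur[1:]
        g_next[:-1] = -r * f_prev[:-1] + (1.0 - r) * g_cur[1:]
        # Lip end: reflected wave -R_L u_N^+ and net outflow (1 + R_L) u_N^+.
        g_next[-1] = -r_lips * f_prev[-1]
        u_lips[k] = (1.0 + r_lips) * f_prev[-1]
        f_prev, g_cur = f_new, g_next
    return u_lips


# Uniform tube, reflection-free lips: an impulse arrives after N steps of tau.
impulse = np.zeros(40)
impulse[0] = 1.0
u_uniform = simulate_tract(np.full(10, 3.0), impulse, r_glottis=1.0, r_lips=0.0)

# Non-uniform tube driven by a constant flow: the lips settle to the same flow.
areas = [2.0, 3.0, 5.0, 4.0, 1.5, 1.0, 2.0, 3.0]   # hypothetical areas (cm^2)
u_dc = simulate_tract(areas, np.ones(8000), r_glottis=1.0, r_lips=0.5)
```

For $N = 48$ this gives a sampling frequency of about 49.4 kHz. Taking `u_lips[::2]` yields the output at the sampling period $T_s$. The two usage cases illustrate a pure delay of $N\tau$ and conservation of the steady flux through the tube.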

5.4 The production of vowels with the speech synthesizer

The apparatus we use to produce speech, our speech synthesizer, contains a source model (here our two point mass model) and the vocal tract model discussed above. The data describing the diameters of the tubes in the vocal tract have been taken from [13].

Figure 5.6: A signal-flow representation of the vocal tract with three tubular sections. All sections have the same length, which gives rise to a standard delay time $\tau$. The coefficients $R_G$ and $R_L$ account for the initiation of the flow at the glottis and the termination of the flow at the lips, respectively.

For the moment we are only interested in producing isolated vowels; no transitions between vowels or between vowels and consonants take place. In this section we discuss speech produced with a non-coupled model of source (vocal cords) and filter (vocal tract). The vocal tract is driven by the flux $U_g$, but $U_g$ is not affected by the wave that is reflected at the glottis. Since the model is uncoupled, neither $h_1$, $h_2$, $U_g$ nor $F_h$ is affected by the vocal tract model.

Figure 5.7 shows the pressure $P_L$ at the lips as a function of time. We used the advanced two point mass model explained in Chapter 4 as the source model, and the tubular vocal tract model explained in the preceding sections as the vocal tract model, in a configuration for the vowel [a]. Figure 5.7 shows that the pressure starts to vary when the glottis opens. While the glottis is closed (0.015 s < t < 0.02 s), the waves resonate in the vocal tract. The amplitude of this oscillation decays because of losses at the lips and at the glottis, modelled by the reflection coefficients $R_L$ and $R_G$, respectively. While the glottis is open (0.02 s < t < 0.03 s), the vocal tract acts as a kind of differentiator on the flux $U_g$. This is best seen at t = 0.03 s, where a drop in the curve occurs both in Figure 4.4 and in Figure 5.7.

Figure 5.7: The pressure $P_L$ (relative units) at the lips as a function of time $t$ (s). We used the advanced two point mass model for the source and a tubular vocal tract model in a configuration for the production of an [a] as the filter. Source and filter model were not coupled.

Since few measurements are available of the glottis alone or of the vocal tract alone, it is very hard to judge the quality of the source model or of the filter model separately. It is therefore possible that we have compensated shortcomings of one model (the vocal cord model) with shortcomings of the other (the vocal tract model). Though the figure presented here shows similarity with calculations by others [14], [5], it is hard to say whether the sound produced with this speech synthesizer is of good quality. This speech synthesizer can only produce long sustained vowels and no consonants. Further (perceptual) research on the speech synthesizer is needed to get an indication of the quality of the produced sound. Another reason to recommend perceptual tests is that it is not generally known which aspects are relevant for the naturalness of sound: some hold that voiced sounds are responsible for naturalness [12], others that transitions between sounds are [15].

5.5 Coupling of the source model and the filter model

In the preceding sections we assumed that there was no coupling between the source model and the filter model. For our calculations this meant that the pressure $P_s$ at the separation point was set to zero. In reality this assumption does not hold. There is an acoustic coupling between the source and the filter, because the resonating waves in the vocal tract influence the pressure at the end of the glottis. In this section we look at the effects of coupling the two models.

Let us return to the expression for the flux Ug which we calculated in Chapter 4.

$$\frac{\rho}{2}\left(\frac{U_g}{L_g h_s}\right)^2 - 6\mu\,\frac{U_g}{L_g}\left\{\frac{x_0(h_0 + h_1)}{h_0^2 h_1^2} - \frac{x_s(h_1 + h_s)}{h_1^2 h_s^2}\right\} + (P_s - P_0) = 0, \tag{5.24}$$

where $P_s$ is the pressure at the separation point of the flow. In our model for the vocal tract we described the flux in terms of two waves $u^+$ and $u^-$ travelling in opposite directions (5.14):

$$U_g = u^+ - u^- \tag{5.25}$$

Assuming that the pressure at the beginning of the vocal tract equals the pressure at the separation point gives $P_s$ in terms of $u^+$ and $u^-$ (5.15):

$$P_s = \frac{\rho c}{A_g}\left(u^+ + u^-\right) \tag{5.26}$$

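Substituting (5.25) and (5.26) into Equation 5.24 gives a second order relation in terms of $u^+$ and $u^-$:

$$\frac{\rho}{2}\left(\frac{u^+ - u^-}{L_g h_s}\right)^2 - 6\mu\,\frac{u^+ - u^-}{L_g}\left\{\frac{x_0(h_0 + h_1)}{h_0^2 h_1^2} - \frac{x_s(h_1 + h_s)}{h_1^2 h_s^2}\right\} + \left(\frac{\rho c}{A_g}\left(u^+ + u^-\right) - P_0\right) = 0 \tag{5.27}$$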

We know that $u^-$ is fully determined by the history of the system (5.18), so the only unknown variable in Equation 5.27 is $u^+$. Equation 5.27 is quadratic in $u^+$, can be solved analytically and yields only one positive root.
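As a sketch of this step, the code below writes (5.27) as a quadratic in the unknown flux $U_g = u^+ - u^-$, takes the positive root and returns $u^+ = U_g + u^-$. This is our reading of the procedure, not code from this research. It assumes that $x_1 = 0$ is the origin of the $x$-axis, so that $x_0 < 0$ and the viscous coefficient is positive. It also assumes $2(\rho c/A_g)\,u^- < P_0$, which guarantees exactly one positive root for $U_g$. The helper names and all numerical values (cgs units) are hypothetical.

```python
import math


def solve_u_plus(u_minus, p0, geom, lg, a_g, rho=1.14e-3, mu=1.86e-4, c=3.5e4):
    """Solve Equation 5.27 for u^+ given u^- (hypothetical helper).

    geom = (x0, xs, h0, h1, hs) with x1 = 0 taken as the origin (assumption).
    Writing w = u^+ - u^- = U_g, Eq. 5.27 becomes a*w**2 + b*w + c0 = 0.
    """
    x0, xs, h0, h1, hs = geom
    z = rho * c / a_g                                    # rho c / A_g
    a = rho / (2.0 * lg**2 * hs**2)                      # Bernoulli term
    brace = x0 * (h0 + h1) / (h0**2 * h1**2) - xs * (h1 + hs) / (h1**2 * hs**2)
    b = -6.0 * mu / lg * brace + z                       # viscous term plus z
    c0 = 2.0 * z * u_minus - p0                          # from z*(w + 2u^-) - P0
    w = (-b + math.sqrt(b * b - 4.0 * a * c0)) / (2.0 * a)   # positive root
    return w + u_minus, w


def residual_5_27(u_plus, u_minus, p0, geom, lg, a_g, rho=1.14e-3, mu=1.86e-4, c=3.5e4):
    """Left-hand side of Equation 5.27, written out term by term."""
    x0, xs, h0, h1, hs = geom
    w = u_plus - u_minus
    brace = x0 * (h0 + h1) / (h0**2 * h1**2) - xs * (h1 + hs) / (h1**2 * hs**2)
    return (0.5 * rho * (w / (lg * hs))**2
            - 6.0 * mu * w / lg * brace
            + (rho * c / a_g * (u_plus + u_minus) - p0))


geom = (-0.15, 0.10, 0.10, 0.05, 0.06)   # x0, xs, h0, h1, hs in cm (hypothetical)
u_plus, u_g = solve_u_plus(u_minus=10.0, p0=8000.0, geom=geom, lg=1.4, a_g=2.0)
res = residual_5_27(u_plus, 10.0, 8000.0, geom, 1.4, 2.0)
```

Since only one square root is needed per sample, the coupling adds little to the cost of the discrete-time synthesizer.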

The flux of the coupled source-filter model is given in Figure 5.8. The flux is slightly affected by the coupling, but the shape of the glottal pulse is the same as in Figure 4.4. Figure 5.10 shows that the hydrodynamic force is also affected by the coupling, since $F_h$ depends on $U_g$ and $P_s$ (4.8). Though the hydrodynamic force changed because of the coupling, the motion of $m_1$ and $m_2$ (Figure 5.9) is nearly the same as in the non-coupled model. Figure 5.11, which gives the pressure at the lips as a function of time, has the same features as Figure 5.7.

Figure 5.8: The flux $U_g$ (cm³/s) passing the glottis as a function of time $t$ (s), in a coupled source-filter model.

Figure 5.9: The aperture of the glottis (cm) at the position of mass 1 ($h_1$, thin line) and at the position of mass 2 ($h_2$, bold line) as a function of time $t$ (s), calculated with the advanced two point mass model in a coupled source-filter model.

Figure 5.10: The hydrodynamic force $F_h$ (dyne) as a function of time $t$ (s) in a coupled source-filter model.

Figure 5.11: The pressure $P_L$ (relative units) at the lips as a function of time $t$ (s). We used the advanced two point mass model for the source and a tubular vocal tract model in a configuration for the production of an [a] as the filter. A coupled source-filter model was used.

5.6 Conclusions about the discrete time speech synthesizer

Since we wanted to use the two point mass model as a source for the production of speech, we made a distinction between the source model (the glottis) and the filter model (the vocal tract). Because we cannot verify the two models separately by experiment, failures in one model may be compensated by failures in the other. In this chapter we presented a discrete-time description of the vocal cords and the vocal tract. The discrete-time description of the vocal cords uses the state-space approach, which writes the system of coupled differential equations in matrix form. This approach reduced the calculation time of the dynamic behaviour considerably, while the results were in good agreement with those obtained by solving the differential equations numerically. The discrete-time version of the vocal tract consists of coupled tubular sections, and we described the air flow by taking reflection and transmission at the boundary between two sections into account. We produced some vowels with this source-filter model, and they showed similarities with vowels produced by other models. The coupling we observed noticeably changed only the flux; the motion of the vocal folds was hardly affected by it. A problem arises when we want to evaluate the quality of the speech produced with the two point mass model and the vocal tract. It is not generally known which aspects are relevant for the naturalness of sound: some hold that voiced sounds are responsible for naturalness [12], others that transitions between sounds are [15]. We recommend perceptual experiments to find out which parameters in the two point mass model influence the naturalness of the synthetic speech.


Chapter 6

Discussion, conclusion and suggestions for future research

The two-mass model of Ishizaka [5] was a first step towards a description of the motion of the vocal cords. Ishizaka used a model of rectangular masses and assumed five different terms contributing to the hydrodynamic force. This model has been used to study different aspects of the motion of the vocal cords, such as the behaviour of the model for large-amplitude oscillations [7]. Apart from [16], little research has been done on the experimental verification of the model, so it is difficult to judge the quality of the glottal pulses it produces. In this research, two point mass models have been studied because of the simplicity of their governing differential equations. The model we used has a very simple geometry, which makes it easy to calculate the solution of the governing differential equations. The introduction of some fluid dynamical aspects resulted in a smooth shape of the glottal pulse, and the separation criterion based on a fit to experimental data gave the best result. The introduction of viscous effects in the model resulted in a smooth initiation and termination of the flow during opening and closing. We calculated the behaviour of two types of collision in our model, elastic and inelastic, and both gave acceptable results. The pulses we derived are in good agreement with those calculated by others [5], those measured [16] and those obtained by inverse filtering [4]. In studying this two-mass model we used one set of parameters for the spring and damping constants, geometry, etc. These parameters appeared to have a rather critical value: increasing some of them by a factor of 1.5 leads to a different behaviour of the model, namely a stationary motion of the vocal cords during which the glottis does not close. Further research has to show how critical all parameters are and to what extent they influence the motion of the vocal cords.
The state-space approach we used to calculate the behaviour of two point mass models makes it possible to operate the model in real time, and to study more-mass models like the 16-mass model [10] with relatively little calculation time. The most time-consuming operation in the state-space approach is the initialisation of some matrices. The remaining calculations involve only multiplications and additions, which makes calculations in real time possible if the computer used is powerful enough. This can be used in a speech synthesizer in which we want to calculate the flux $U_g$ in real time.

When we want to use the two-mass model as a source for the production of speech, we have to make a distinction between the source model (the glottis) and the filter model (the vocal tract). The vocal tract model was based on the tubular shape of the vocal tract, and together with the two-mass model it made it possible to produce vowels. A problem arose when we wanted to evaluate the quality of the speech produced with the two point mass model and the vocal tract. It is not generally known which aspects are relevant for the naturalness of sound: some hold that voiced sounds are responsible for naturalness [12], others that transitions between sounds are [15]. We recommend perceptual experiments to find out which parameters in the two point mass model influence the naturalness of the synthetic speech.


Appendix A

Derivation of the hydrodynamical force working on a plane without viscosity

The hydrodynamical force Fh is defined by:

$$F_h = \int_0^{L_g}\!\int_{x_0}^{x_s} p(x)\,dx\,dz = L_g \int_{x_0}^{x_s} p(x)\,dx \tag{A.1}$$

We calculate the hydrodynamical force as the pressure integrated over the glottal area. In this equation, $p(x)$ is the pressure at position $x$ in the glottis, and we integrate over the $x$- and $z$-directions. To evaluate the hydrodynamic force, we need an expression for the pressure as a function of $x$. For this purpose we use the Bernoulli equation, applied to the entrance of the glottis and a point $x$:

$$\begin{aligned} P_0 + \tfrac{1}{2}\rho v_0^2 &= p(x) + \tfrac{1}{2}\rho v^2(x), && x_0 < x < x_1, \\ p(x) + \tfrac{1}{2}\rho v^2(x) &= P_s + \tfrac{1}{2}\rho v_s^2, && x_1 < x < x_s. \end{aligned} \tag{A.2}$$

with h(x) given by:

$$h(x) = \begin{cases} \dfrac{h_0 - h_1}{x_0 - x_1}\,(x - x_1) + h_1, & x_0 < x < x_1, \\[1ex] \dfrac{h_s - h_1}{x_s - x_1}\,(x - x_1) + h_1, & x_1 < x < x_s. \end{cases} \tag{A.3}$$

The flux $U_g$ is given by $U_g = L_g v(x) h(x)$, so in the Bernoulli equation we can substitute the velocities by $v(x) = U_g/(L_g h(x))$. Furthermore, we neglect the velocity $v_0$ at the entrance of the glottis, since it is very small compared to the velocity at the separation point ($v_0 \ll v_s$). Therefore, the contribution of $\tfrac{1}{2}\rho v_0^2$ to the hydrodynamical force is negligible. This gives the following expression for the pressure:

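$$p(x) = \begin{cases} P_0 - \dfrac{1}{2}\rho\,\dfrac{U_g^2}{L_g^2 h^2(x)}, & x_0 < x < x_1, \\[1ex] P_s + \dfrac{1}{2}\rho\,\dfrac{U_g^2}{L_g^2 h_s^2} - \dfrac{1}{2}\rho\,\dfrac{U_g^2}{L_g^2 h^2(x)}, & x_1 < x < x_s. \end{cases} \tag{A.4}$$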

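Substituting Equation A.4 into Equation A.1 gives, with $\int_{x_0}^{x_1} h^{-2}(x)\,dx = (x_1 - x_0)/(h_0 h_1)$ and $\int_{x_1}^{x_s} h^{-2}(x)\,dx = (x_s - x_1)/(h_1 h_s)$ for the linear profile (A.3):

$$F_h = L_g \int_{x_0}^{x_s} p(x)\,dx = -L_g (x_0 - x_1)\left\{P_0 - \frac{\rho U_g^2}{2L_g^2}\,\frac{1}{h_0 h_1}\right\} + L_g (x_s - x_1)\left\{P_s + \frac{\rho U_g^2}{2L_g^2}\left(\frac{1}{h_s^2} - \frac{1}{h_1 h_s}\right)\right\}$$

So we find for the hydrodynamical force $F_h$:

$$F_h = L_g (x_1 - x_0)\left\{P_0 - \frac{\rho U_g^2}{2L_g^2 h_0 h_1}\right\} + L_g (x_s - x_1)\left\{P_s + \frac{\rho U_g^2 (h_1 - h_s)}{2L_g^2 h_1 h_s^2}\right\} \tag{A.5}$$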
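The closed-form result for $F_h$ can be checked against direct numerical integration of (A.1) with the pressure (A.4). The sketch below uses hypothetical geometry and flow values (cgs units). It chooses $P_s$ from the Bernoulli equation (A.2) with $v_0$ neglected, so that the pressure is continuous at $x_1$. The trapezoidal rule is used only for this check, and the function names are our own.

```python
import numpy as np


def force_closed_form(geom, p0, ps, ug, lg, rho):
    """Inviscid hydrodynamical force obtained by integrating Eq. A.4 analytically."""
    x0, x1, xs, h0, h1, hs = geom
    k = 0.5 * rho * ug**2 / lg**2
    converging = lg * (x1 - x0) * (p0 - k / (h0 * h1))
    diverging = lg * (xs - x1) * (ps + k * (h1 - hs) / (h1 * hs**2))
    return converging + diverging


def _trapezoid(y, x):
    """Composite trapezoidal rule on the grid x."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def force_numeric(geom, p0, ps, ug, lg, rho, n=20001):
    """L_g times the numerical integral of p(x) from Eq. A.4."""
    x0, x1, xs, h0, h1, hs = geom
    k = 0.5 * rho * ug**2 / lg**2
    x_a = np.linspace(x0, x1, n)                       # x0 < x < x1
    h_a = (h0 - h1) * (x_a - x1) / (x0 - x1) + h1      # Eq. A.3
    p_a = p0 - k / h_a**2
    x_b = np.linspace(x1, xs, n)                       # x1 < x < xs
    h_b = (hs - h1) * (x_b - x1) / (xs - x1) + h1
    p_b = ps + k / hs**2 - k / h_b**2
    return lg * (_trapezoid(p_a, x_a) + _trapezoid(p_b, x_b))


geom = (-0.15, 0.0, 0.10, 0.10, 0.05, 0.06)  # x0, x1, xs, h0, h1, hs (cm), hypothetical
rho, lg, ug, p0 = 1.14e-3, 1.4, 200.0, 8000.0
ps = p0 - 0.5 * rho * ug**2 / (lg**2 * geom[5]**2)   # Bernoulli, v0 neglected
f_exact = force_closed_form(geom, p0, ps, ug, lg, rho)
f_numeric = force_numeric(geom, p0, ps, ug, lg, rho)
```

The two results agree to better than one part in a million, which supports the term-by-term form of the integrated expression.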


When the glottis is closed, there is no flow and the pressure over the glottis is constant. In that case we assume that there is a constant pressure force at the entrance of the glottis. The hydrodynamical force becomes:

(A.6)

(A.7)


Appendix B

Derivation of the hydrodynamical force working on a plane with viscosity

As we saw in Appendix A, the hydrodynamical force Fh is defined by:

$$F_h = \int_0^{L_g}\!\int_{x_0}^{x_s} p(x)\,dx\,dz = L_g \int_{x_0}^{x_s} p(x)\,dx \tag{B.1}$$

We calculate the hydrodynamical force as the pressure integrated over the glottal area. In this equation, $p(x)$ is the pressure at position $x$ in the glottis, and we integrate over the $x$- and $z$-directions. To evaluate the hydrodynamic force, we need an expression for the pressure as a function of $x$. For this purpose we use the Bernoulli equation with a viscous term, applied to the entrance of the glottis and a point $x$:

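$$
\begin{cases}
p_0 + \dfrac{1}{2}\rho v_0^2 = p(x) + \dfrac{1}{2}\rho v^2(x) + \displaystyle\int_{x_0}^{x} \frac{12\mu U_g}{L_g h^3(x')}\,dx', & x_0 < x < x_1 \\[2ex]
p(x) + \dfrac{1}{2}\rho v^2(x) + \displaystyle\int_{x_1}^{x} \frac{12\mu U_g}{L_g h^3(x')}\,dx' = p_s + \dfrac{1}{2}\rho v_s^2 + \displaystyle\int_{x_1}^{x_s} \frac{12\mu U_g}{L_g h^3(x')}\,dx', & x_1 < x < x_s
\end{cases}
\qquad \text{(B.2)}
$$

Here $\mu$ is the dynamic viscosity of air. The integrals represent the viscous pressure loss of a Poiseuille flow between the vocal folds.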

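with h(x) given by:

$$
h(x) =
\begin{cases}
\dfrac{h_0 - h_1}{x_0 - x_1}\,(x - x_1) + h_1, & x_0 < x < x_1 \\[1ex]
\dfrac{h_s - h_1}{x_s - x_1}\,(x - x_1) + h_1, & x_1 < x < x_s
\end{cases}
\qquad \text{(B.3)}
$$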

The flux $U_g$ is given by $U_g = L_g v(x) h(x)$. In the Bernoulli equation we can therefore substitute the velocities by $v(x) = U_g / (L_g h(x))$. Furthermore, we neglect the velocity $v_0$ at the entrance of the glottis, since it is very small compared to the velocity at the separation point ($v_0 \ll v_s$). Therefore, the contribution of $\frac{1}{2}\rho v_0^2$ to the hydrodynamical force is negligible. This gives the following expression for the pressure:

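$$
p(x) =
\begin{cases}
p_0 - \dfrac{1}{2}\rho\dfrac{U_g^2}{L_g^2 h^2(x)} - \displaystyle\int_{x_0}^{x} \frac{12\mu U_g}{L_g h^3(x')}\,dx', & x_0 < x < x_1 \\[2ex]
p_s + \dfrac{1}{2}\rho\dfrac{U_g^2}{L_g^2 h_s^2} - \dfrac{1}{2}\rho\dfrac{U_g^2}{L_g^2 h^2(x)} + \displaystyle\int_{x}^{x_s} \frac{12\mu U_g}{L_g h^3(x')}\,dx', & x_1 < x < x_s
\end{cases}
\qquad \text{(B.4)}
$$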
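For the linear profiles of equation A.3, the viscous integral in B.4 has a closed form. On an interval $[a, b]$ with end heights $h_a$ and $h_b$, $\int_a^b dx'/h^3(x') = (b - a)(h_a + h_b)/(2 h_a^2 h_b^2)$. A minimal Python sketch of B.4 using this result follows. The values for the viscosity (about 1.8·10⁻⁵ Pa s) and density of air are approximate values we assume. The geometry and flow are hypothetical.

```python
RHO = 1.2    # density of air, kg/m^3 (approximate, assumed)
MU = 1.8e-5  # dynamic viscosity of air, Pa s (approximate, assumed)


def inv_h3_integral(a, b, ha, hb):
    """Exact integral of 1/h^3 over [a, b] when h varies linearly from ha to hb."""
    return (b - a) * (ha + hb) / (2.0 * ha**2 * hb**2)


def glottal_height(x, x0, x1, xs, h0, h1, hs):
    """Piecewise-linear glottal height h(x) (same geometry as equation A.3)."""
    if x <= x1:
        return h0 + (h1 - h0) * (x - x0) / (x1 - x0)
    return h1 + (hs - h1) * (x - x1) / (xs - x1)


def pressure_viscous(x, ug, lg, p0, ps, geom, rho=RHO, mu=MU):
    """Pressure p(x) of equation B.4, including the viscous pressure loss."""
    h = glottal_height(x, **geom)
    k = 0.5 * rho * ug**2 / lg**2
    c = 12.0 * mu * ug / lg  # prefactor of the viscous integral
    if x <= geom["x1"]:
        # loss accumulated between the entrance x0 and x
        loss = c * inv_h3_integral(geom["x0"], x, geom["h0"], h)
        return p0 - k / h**2 - loss
    # loss still to occur between x and the separation point xs
    remaining_loss = c * inv_h3_integral(x, geom["xs"], h, geom["hs"])
    return ps + k / geom["hs"] ** 2 - k / h**2 + remaining_loss


geom = dict(x0=0.0, x1=1.5e-3, xs=3.0e-3, h0=2.0e-3, h1=0.5e-3, hs=0.4e-3)
ug, lg, p0, ps = 2.0e-4, 1.4e-2, 800.0, 0.0  # hypothetical values (SI units)
for x in (0.0, 0.75e-3, 1.5e-3, 2.25e-3, 3.0e-3):
    print(f"x = {x:.2e} m  ->  p = {pressure_viscous(x, ug, lg, p0, ps, geom):8.2f} Pa")
```

Setting `mu=0.0` recovers the inviscid pressure of equation A.4.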

Substituting equation B.4 into equation B.1 gives:


So we find for the hydrodynamical force $F_h$:

(B.5)
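The non-viscous terms of B.4 are identical to those of A.4, so they reproduce the force of equation A.5. One way to evaluate the remaining viscous contribution is to substitute the viscous terms of B.4 into B.1 and exchange the order of integration. This is our own step and is not taken from the original derivation. It gives

$$
\Delta F_{\mathrm{visc}} = 12\mu U_g\left[\int_{x_1}^{x_s}\frac{x' - x_1}{h^3(x')}\,dx' - \int_{x_0}^{x_1}\frac{x_1 - x'}{h^3(x')}\,dx'\right],
$$

in which $L_g$ cancels. A minimal Python sketch of this correction using Simpson's rule follows. The function names, the viscosity value and the geometry are hypothetical or assumed.

```python
MU = 1.8e-5  # dynamic viscosity of air, Pa s (approximate, assumed)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    if n % 2:
        n += 1
    dx = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * dx)
    return total * dx / 3


def viscous_force_correction(ug, x0, x1, xs, h0, h1, hs, mu=MU):
    """Viscous part of F_h, obtained from B.4 by exchanging the integration order.

    The glottal length cancels: L_g * 12 mu U_g / L_g = 12 mu U_g.
    """
    def h(x):  # piecewise-linear height, same geometry as equation A.3
        if x <= x1:
            return h0 + (h1 - h0) * (x - x0) / (x1 - x0)
        return h1 + (hs - h1) * (x - x1) / (xs - x1)

    upstream = simpson(lambda x: (x1 - x) / h(x) ** 3, x0, x1)    # lowers F_h
    downstream = simpson(lambda x: (x - x1) / h(x) ** 3, x1, xs)  # raises F_h
    return 12.0 * mu * ug * (downstream - upstream)


# Hypothetical values (SI units), for illustration only.
dF = viscous_force_correction(ug=2.0e-4, x0=0.0, x1=1.5e-3, xs=3.0e-3,
                              h0=2.0e-3, h1=0.5e-3, hs=0.4e-3)
print(f"viscous correction to F_h: {dF:.3e} N")
```

For a uniform channel of height $h$, the correction reduces to $6\mu U_g\left[(x_s - x_1)^2 - (x_1 - x_0)^2\right]/h^3$. It vanishes when both segments have the same length.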

When the glottis is closed, there is no flow and the pressure in the glottis is constant. The hydrodynamical force becomes:

$$
F_h = \int_0^{L_g}\!\int_{x_0}^{x_s} p(x)\,dx\,dz \qquad \text{(B.6)}
$$

(B.7)


Bibliography

[1] R. Collier, F.G. Droste, Fonetiek en Fonologie, 1975, Leuven.

[2] X. Pelorson, A. Wijnands, A. Hirschberg, "Theoretical and experimental study of quasi-steady flow separation within the glottis during phonation. Application of a modified two-mass model", accepted for publication in Journal of the Acoustical Society of America, 1994.

[3] B. Malmberg, Manual of Phonetics, 1968, Amsterdam.

[4] J. Lindqvist, "The voice source studied by means of inverse filtering", Quarterly Progress and Status Report, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, 1970.

[5] K. Ishizaka, J.L. Flanagan, "Synthesis of voiced sounds from a two-mass model of the vocal cords", The Bell System Technical Journal, Vol. 51, No. 6, July-August 1972.

[6] X. Pelorson, "A study of the two-mass model of the vocal cords from a fluid dynamical point of view", IPO report No. 873.

[7] J.C. Lucero, "Dynamics of the two-mass model", Journal of the Acoustical Society of America, Vol. 94, No. 6, December 1993.

[8] X. Pelorson, unpublished work.

[9] G. Fant, "Some problems in voice source analysis", Speech Communication, Vol. 13, pp. 7-22, 1993.

[10] I.R. Titze, "The human vocal cords: A mathematical model, part I", Phonetica, Vol. 28, pp. 129-170.

[11] I.R. Titze, "The human vocal cords: A mathematical model, part II", Phonetica, Vol. 29, pp. 1-21.

[12] D. O'Shaughnessy, Speech Communication: Human and Machine, 1987, Addison-Wesley.

[13] L.F. Brosnahan, B. Malmberg, Introduction to Phonetics, 1975, Cambridge.

[14] T. Koizumi, S. Taniguchi, S. Hiromitsu, "Two-mass models of the vocal cords for natural sounding voice synthesis", Journal of the Acoustical Society of America, Vol. 82, No. 4, 1987.

[15] G. Fant, "Some problems in voice source analysis", Speech Communication, Vol. 13, 1993, North-Holland.

[16] B. Cranen, The acoustic impedance of the glottis, measurements and modeling, Ph.D. thesis, 1987, Enschede.

[17] S.G. Nooteboom, A. Cohen, Spreken en verstaan, 1984, Assen.

[18] J.L. Flanagan, Speech Analysis, Synthesis and Perception, 1972, Berlin.

[19] G. Fant, Acoustic Theory of Speech Production, 1970, The Hague.

[20] G. Fant, J. Liljencrants, Q. Lin, "Speech production", STL Quarterly Progress and Status Report, No. 4, 1985.

[21] I.R. Titze, "The physics of small-amplitude oscillation of the vocal folds", Journal of the Acoustical Society of America, Vol. 83, No. 4, 1988.

