SPEECH INTELLIGIBILITY IN COCKPIT VOICE RECORDERS
by
Jane Foster
A thesis submitted to Johns Hopkins University in conformity with the requirements
for the degree of Master of Electrical Engineering
Baltimore, Maryland
May 2015
Abstract
This paper focuses on correcting failures in Cockpit Voice Recorders that degrade speech intelligibility. The common Cockpit Voice Recorder problem addressed here is the failure of the erase function, which causes new speech to be recorded on top of previous recordings. The computational speech intelligibility metrics considered are the Articulation Index, the Speech Intelligibility Index, and the Speech Transmission Index, all of which use the Signal-to-Noise Ratio to help measure intelligibility. This paper determines a threshold relating the Signal-to-Noise Ratio to intelligibility for layered continuous human speech.
Preface
During my time as an undergraduate student at Johns Hopkins University, I
focused on two seemingly opposite things, Electrical Engineering and French. After
completing a thesis for my Bachelor of Arts in French Language and Literature, it
only seemed appropriate to write a thesis during my Electrical Engineering Master’s
program. Driven by my love for language, I searched for a problem to solve that
dealt with speech. After spending a summer working on “Black Boxes” (also known
as the Cockpit Voice Recorder) for the National Transportation Safety Board in
Washington, D.C., I was troubled by a problem I saw. Cockpit Voice Recorders
were found to be broken, even though they are made to be indestructible. The Cockpit Voice Recorder contains data vital to many investigations, and when it fails, that data is lost. With this in mind, I set out with the goal of creating a more robust test to improve these devices.
I want to express my appreciation and gratitude to Dr. Mounya Elhilali – my
thesis advisor, Dr. Danielle Tarraf – my academic advisor, Sarah McComb,
Alexander Simonelli, Johns Hopkins University, and my parents, for always
believing in me and pushing me to learn more. Thank you.
Table of Contents
Abstract ........................................................................................................................ ii
Preface......................................................................................................................... iii
List of Tables ............................................................................................................... v
List of Figures ............................................................................................................. vi
Introduction .................................................................................................................. 1
Articulation Index .................................................................................................... 1
Speech Intelligibility Index ...................................................................................... 7
Speech Transmission Index ..................................................................................... 9
Signal-to-Noise Ratio............................................................................................. 10
Cockpit Voice Recorders ....................................................................................... 11
Purpose ................................................................................................................... 13
Methods...................................................................................................................... 15
Results ........................................................................................................................ 18
Signal-to-Noise Ratio............................................................................................. 18
Speech Intelligibility Index .................................................................................... 20
Speech Transmission Index ................................................................................... 21
Discussion .................................................................................................................. 23
Conclusion ................................................................................................................. 29
Appendix .................................................................................................................... 30
List of Acronyms ................................................................................................... 30
MATLAB Code ..................................................................................................... 30
Bibliography .............................................................................................................. 42
Curriculum Vitae ...................................................................................................... 44
List of Tables
Table 1 : This table shows the 18 different frequency bands for the one-third octave
band procedure for calculating the SII ............................................................... 16
Table 2 : This table shows the SNR in dB for one, two, and five layers of speech.
These values were calculated using MATLAB and the code is shown in the
appendix. ............................................................................................................ 19
Table 3 :This table shows the peak SNR in dB for one, two, and five layers of
speech. These values were calculated using MATLAB and the code is shown in
the appendix. ...................................................................................................... 20
Table 4 : This table shows the SII for one, two, and five layers of speech. These
values were calculated using MATLAB and the code is shown in the
appendix. ...................................................................................................... 21
Table 5 : This table shows the STI for two and five layers of speech. These values
were calculated using equation 2. ...................................................................... 22
List of Figures
Figure 1 : Relationship between the Calculated AI and the Effective AI with Visual
Cues when the listener can see the lips and face of the talker (ANSI S3.5-1969
22). This plot is based on the study “Visual contribution to speech intelligibility
in noise” by Sumby and Pollack. ......................................................................... 4
Figure 2 : Relation between AI and various measures of speech intelligibility (ANSI
S3.5-1969 23). This plot is compiled by ANSI to include both data from French
and Steinberg in their paper "Factors governing the intelligibility of speech
sounds" as well as data from Kryter in his paper “Some comparisons between
rhyme and PB-word intelligibility tests”. The acronym PB stands for words that
are phonetically balanced. .................................................................................... 6
Figure 3: Relationship between SII and speech intelligibility. SII is shown on the
bottom axis in percentages, SNR is shown on the top axis in dB, and speech
intelligibility is shown on the left axis in percentages (Killion and Mueller 14). 8
Figure 4: This graph shows the relation between the STI and the articulation loss of
consonants (ALcons) using a logarithmic scale for plotting ALcons which is an
intelligibility score found using talkers and listeners. This intelligibility score is
found using the loss of consonants in Consonant-Vowel-Consonant (CVC) type
nonsense words for a set of 57 auditorium-like conditions (noise, reverberation,
or echo). ............................................................................................................. 10
Figure 5: This image shows the layout of a solid-state “black box” recorder. The
casing is orange to attract attention to be easily found in aircraft wreckage. An
underwater locator beacon is used to find the device when the plane crashes
over water. The special casing allows the device to withstand 3400 G for 6.5
ms, 1100 degrees Celsius for 60 minutes, and water pressure resistance up to
6000 meters. The memory is all located within the insulated, armored casing for
better protection (Edgar). ................................................................................... 12
Figure 6: This image shows how the Cockpit Voice Recorder (CVR) appears to a
pilot of a Boeing 737. The recording begins with the first rise in engine oil
pressure and ends 5 minutes after the last engine shuts down. The rest of the
Cockpit Voice Recorder resides in the cargo hold in the back of the aircraft
(Brady). .............................................................................................................. 13
Figure 7 : This plot shows the SNR for 20 samples of two and five layers of
continuous speech. For two layers, the numbers are both positive and negative.
For five layers, the numbers are consistently negative because the noise power
is greater than the signal power. ........................................................................ 23
Figure 8: This plot shows the STI for 19 samples of two and five layers of
continuous speech. ............................................................................................. 25
Figure 9 : This plot shows the SII for 20 samples of one layer of continuous speech
with added interior airplane noise in blue. The average of all of the samples is
shown in red. ...................................................................................................... 26
Figure 10: This plot shows the relationship between SNR and SII for one, two, and
five layers of continuous speech with added interior airplane noise. One layer is
shown in green, two layers are shown in red, and five layers are shown in blue.
............................................................................................................................ 27
Introduction
Speech intelligibility is a metric that measures the ability of humans to
understand a given sound. Often used in room acoustics, speech intelligibility helps
define the likelihood that human speech can be understood correctly in a given situation.
Speech intelligibility is related to the Signal-to-Noise ratio, as less speech can be
understood when noise is increased. Speech intelligibility is a subjective test; however,
computational models have been formed to improve the regularity and standardization
associated with the measurement of speech intelligibility. Over the years, engineers and
scientists have developed many procedures and mathematical models to encompass and
define the field of speech intelligibility. These processes include the Articulation Index
(AI), the Speech Transmission Index (STI), and the Speech Intelligibility Index (SII).
These methods have been adapted for particular situations, in turn creating new models.
However, the three base standards of speech intelligibility (the AI, the STI, and the SII)
will be used in this research.
Articulation Index
The Articulation Index (AI) was originally used to determine how the effects of
noise on a telephone line affect speech intelligibility (Lindgreen, Garcia, and Georganti
4). French and Steinberg developed the AI in 1947. They point out that “echoes, phase
distortion, and reverberation may affect intelligibility” (French and Steinberg 91). These
two scientists attempted to focus on the hearing of young men and women with good
hearing in order to prevent variations in hearing level sensitivity from developing any
inconsistencies in their model (French and Steinberg 92). The human ear has lower
hearing sensitivity when frequencies are masked (French and Steinberg 96). This can be
the reason why a human listener cannot recognize certain frequencies
in a noisy situation. Noise can mask certain frequencies, causing problems in speech
reception and intelligibility. The Articulation Index is based on the idea that “any narrow
band of speech frequencies of a given intensity carries a contribution to the total index
which is independent of the other bands with which it is associated and that the total
contribution of all bands is the sum of the contributions of the separate bands” (French
and Steinberg 101). In short, this metric is defined by the sum of different frequency
bands each weighted separately. The weighting value of each band depends on the
effective sensation level of the signal within a given band in the ear of the listener
(French and Steinberg 92). The different level assigned to each band is necessary to
determine the AI.
The AI is standardized by the American National Standards Institute in ANSI
S3.5-1969 (Lindgreen, Garcia, and Georganti 5). The American National Standards
Institute (ANSI) explains that the AI was developed primarily for adult male speakers, so
this method cannot be assumed to apply in situations involving females or children
(ANSI S3.5-1969 6). This is because of the dominant use of male voices when the AI was
being tested and developed. There are two methods for calculating the AI: The 20 Band
Method and The One-Third Octave Band and Octave Band Method (ANSI S3.5-1969 7-
8). The 20 Band Method divides the frequency spectrum into 20 unequal bands. When
above the audibility threshold (exceeding 30 dB), each of these bands contributes equally
to the speech intelligibility metric. The One-Third Octave Band and Octave Band Method
requires knowledge of speech and noise present in certain one-third octave or octave
bands. Either of these two methods can be used to calculate the AI according to ANSI.
Some factors can be easily assessed and evaluated by the AI. These factors
include masking by steady-state noise, masking by non-steady-state noise, frequency
distortion of the speech signal, amplitude distortion of the speech signal, reverberation,
vocal effort, and visual cues (ANSI S3.5-1969 15-21). For example, visual cues (such as
movements of the mouth and lips that aid in a listener's comprehension) would have an
effect on speech intelligibility. The AI can be modified to include or ignore the effect of
visual cues. Figure 1 shows the relationship between the AI with visual cues and the AI
without visual cues and is based on research by Sumby and Pollack (ANSI S3.5-1969
22). The relationship is relatively linear, with an increase in the calculated AI leading to
an increase in Effective AI with Visual Cues.
Figure 1 : Relationship between the Calculated AI and the Effective AI with Visual Cues when the
listener can see the lips and face of the talker (ANSI S3.5-1969 22). This plot is based on the study
“Visual contribution to speech intelligibility in noise” by Sumby and Pollack.
There are also factors that cannot be evaluated by the AI. These factors include sex of
the speaker, multiple transmission paths, the combination of factors, monaural versus
binaural presentation, and asymmetrical clipping, frequency shifting, and fading (ANSI
S3.5-1969 21). These aspects cannot be assessed by the AI metric.
The relationship of the AI to speech intelligibility is also defined in the ANSI
standard. The AI is on a scale from zero to one and speech intelligibility is on a scale
from zero to one hundred percent. When more constraints are imposed on the system, a
higher percent intelligibility score is achieved for a given AI (ANSI S3.5-1969 21). This
can be shown in Figure 2 where, for example, when the vocabulary was limited to just 32
words, a higher speech intelligibility score was achieved for a given AI. Figure 2 shows a
compilation of different data relating the AI and speech intelligibility (ANSI S3.5-1969
23). Different curves represent different data sets that were tested using the AI. The AI
predicts the relative performance of a communication system operating under given
conditions (ANSI S3.5-1969 23). In cases of poor reception, French and Steinberg assert
that sentence tests are more useful (French and Steinberg 115). The conditions of a
communication system must be known for the AI to predict its performance relative to
other communication systems.
Figure 2 : Relation between AI and various measures of speech intelligibility (ANSI S3.5-1969 23).
This plot is compiled by ANSI to include both data from French and Steinberg in their paper
"Factors governing the intelligibility of speech sounds" as well as data from Kryter in his paper
“Some comparisons between rhyme and PB-word intelligibility tests”. The acronym PB stands for
words that are phonetically balanced.
The overall intelligibility performance of a system can be evaluated using the zero
to one scale of the AI. For commercial systems, an AI greater than 0.5 is preferable
(ANSI S3.5-1969 23). With communication systems used in stress conditions with a
variety of speakers and listeners with varying degrees of skill, an AI exceeding 0.7 is
appropriate (ANSI S3.5-1969 23). These numbers are used to implement the AI in real
world applications.
Speech Intelligibility Index
The Speech Intelligibility Index (SII) is similar to the AI. The SII also uses the
signal-to-noise ratio (SNR) to represent contributions within each frequency band. The
intelligibility score is determined through a weighted average across the bands, which in
turn relates to the subjective intelligibility of speech (Schwerin and Paliwal 10). One
difference between the AI and the SII is the correction for change in speech spectrum
according to vocal effort (Lindgreen, Garcia, and Georganti 5). The American National
Standards Institute (ANSI) developed a method for calculating the SII in 1997. The ANSI
defines the SII as the product of the band importance function and the band audibility
function summed over the total number of frequency bands in the computational method
(ANSI S3.5-1997 2-3). Below is an equation that can be used to describe the SII:
S = Σ_{i=1..n} I_i A_i    [1]
The value S refers to the SII, a number from zero to one. The number n is the total
number of bands in the computation. The value I_i refers to the band importance function
and the value A_i refers to the band audibility function. There are four different methods
for calculating the SII (ANSI S3.5-1997 9). These methods include Critical Frequency
Band, One-Third Octave Frequency Band, Equally-Contributing Critical Band, and
Octave Frequency Band. Critical Frequency Band has the most bands with 21 different
bands; Octave Frequency Band has the least bands with only six bands. One-Third
Octave Frequency Band and Equally-Contributing Critical Band have 18 and 17 bands
respectively (ANSI S3.5-1997 9). The weighting value of each band is called the band
importance (the value I_i described above). The band importance assigned to each band
differs and must be computed across bands. However, the Equally-Contributing Critical
Band is an exception as the band importance is a constant 0.0588 for all bands (ANSI
S3.5-1997 11). This value is determined by dividing one by the total number of bands in
this method, 17. These four methods primarily differ and can be distinguished by the
weighting values of each band and the number of bands.
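The summation in Equation 1 is simple to implement once the band values are known. Below is a minimal sketch, given here in Python for illustration (the thesis computations use MATLAB; see the appendix), for the Equally-Contributing Critical Band case, where every band importance is the constant 1/17 ≈ 0.0588. The audibility values are illustrative placeholders, not the output of any ANSI procedure.

```python
def sii(importance, audibility):
    # Equation 1: S = sum over bands of I_i * A_i.
    assert len(importance) == len(audibility)
    return sum(I * A for I, A in zip(importance, audibility))

# Equally-Contributing Critical Band case: 17 bands, constant importance 1/17.
n_bands = 17
importance = [1.0 / n_bands] * n_bands

# Placeholder audibility values: fully audible (A_i = 1) in every band
# gives S near 1; completely masked (A_i = 0) gives S = 0.
fully_audible = sii(importance, [1.0] * n_bands)
fully_masked = sii(importance, [0.0] * n_bands)
```

A partially audible signal lands between the two extremes; the four ANSI methods differ only in how many bands are used and how I_i and A_i are obtained.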
The relationship between the SII and speech intelligibility is shown below in
Figure 3 (Killion and Mueller 14). As one can see, there exists a similarity between the
SII and the AI in that the more constraints placed on the system, the higher an
intelligibility score will be for a given SII value.
Figure 3: Relationship between SII and speech intelligibility. SII is shown on the bottom axis in
percentages, SNR is shown on the top axis in dB, and speech intelligibility is shown on the left axis in
percentages (Killion and Mueller 14).
The performance of the SII is defined in the ANSI standard. Good communication
systems have an SII greater than approximately 0.75 and poor communication systems
have an SII less than approximately 0.45 (ANSI S3.5-1997 16). This range is used to
determine the overall performance of a given communication system.
Speech Transmission Index
The Speech Transmission Index (STI) is based on the Modulation Transfer
Function (MTF). The sound transmission system consists of both a source of sound and
an environment where the sound is transmitted. This system is characterized by MTFs,
with each MTF describing a different frequency region of the noise carrier (Houtgast and
Steeneken 1070). The MTF is used to define the reduction in the modulation index of the
intensity envelope as a function of modulation frequency (Houtgast and Steeneken 1071).
Because the envelope spectra of speech are similar across conditions (with a
maximum at about 3 Hz), the MTF can be used to measure speech intelligibility and, in
turn, to develop the STI (Houtgast and Steeneken 1071). Figure 4 shows the linear
relationship between the STI and speech intelligibility when using a logarithmic scale for
plotting the articulation loss of consonants (Houtgast and Steeneken 1073). Houtgast and
Steeneken reported this data in 1985 in a paper in The Journal of the Acoustical Society
of America.
Figure 4: This graph shows the relation between the STI and the articulation loss of consonants
(ALcons) using a logarithmic scale for plotting ALcons which is an intelligibility score found using
talkers and listeners. This intelligibility score is found using the loss of consonants in Consonant-
Vowel-Consonant (CVC) type nonsense words for a set of 57 auditorium-like conditions (noise,
reverberation, or echo).
The STI is calculated when the overall mean signal-to-noise ratio falls between
+15 and -15 dB by using the equation below.
STI = [ (S/N)'_avg + 15 ] / 30    [2]

The value (S/N)'_avg refers to the weighted average of (S/N)', which equals
10 log10[m/(1-m)] in decibels (dB) (Houtgast and Steeneken 1072). The value of m is the
modulation reduction factor and is given by m = (1 + 10^(-SNR/10))^(-1), where S/N
refers to the Signal-to-Noise Ratio (SNR).
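The definitions above collapse into a direct conversion from SNR to STI, since substituting m back into 10 log10[m/(1-m)] recovers the SNR itself. A short sketch in Python (for illustration; the thesis computations were performed in MATLAB):

```python
import math

def sti_from_snr(snr_db):
    # Equation 2 is only defined for SNR between -15 and +15 dB.
    if not -15.0 <= snr_db <= 15.0:
        raise ValueError("STI defined only for SNR between -15 and +15 dB")
    # Modulation reduction factor: m = (1 + 10^(-SNR/10))^-1.
    m = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    # Apparent SNR: (S/N)' = 10 log10(m / (1 - m)); algebraically this equals
    # snr_db, but it is computed explicitly to mirror the derivation.
    apparent = 10.0 * math.log10(m / (1.0 - m))
    return (apparent + 15.0) / 30.0

# At 0 dB SNR the modulation reduction factor is 0.5 and the STI sits mid-scale.
midpoint = sti_from_snr(0.0)
```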
Signal-to-Noise Ratio
Each of the above metrics, the AI, the SII, and the STI, uses the Signal-to-Noise
Ratio (SNR) to determine speech intelligibility. The calculation of the SNR of a given
speech signal is necessary to determine whether the speech can be understood by the
listener.
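As a baseline, the SNR is a logarithmic power ratio. A minimal sketch in Python (the thesis calculations used MATLAB), taking power as the mean square of the samples:

```python
import math

def snr_db(signal, noise):
    # SNR = 10 log10(P_signal / P_noise), with power as the mean square.
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# Equal-power signal and noise give 0 dB; a hundredfold power ratio gives +20 dB.
```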
Cockpit Voice Recorders
The calculation of the Signal-to-Noise Ratio will be conducted in the environment
of a failed Cockpit Voice Recorder (CVR). A CVR is a device that records the pilots’
speech in a cockpit to improve the safety of aviation and is a part of the “black box” in
airplanes that have seating for more than ten passengers. The basic black box can be
found in Figure 5 (Edgar). Figure 5 shows a newer solid-state recorder; older recorders
used magnetic tape.
Figure 5: This image shows the layout of a solid-state “black box” recorder. The casing is orange to
attract attention to be easily found in aircraft wreckage. An underwater locator beacon is used to
find the device when the plane crashes over water. The special casing allows the device to withstand
3400 G for 6.5 ms, 1100 degrees Celsius for 60 minutes, and water pressure resistance up to 6000
meters. The memory is all located within the insulated, armored casing for better protection (Edgar).
As required by the Federal Aviation Administration (FAA), the CVR is placed in
all aircraft with seating for ten or more passengers. Per Recommendation Number A-
96-171 by the National Transportation Safety Board (NTSB), these devices record the
most recent two hours of audio in the cockpit (Hall 6). This recommendation was due to
the investigation of a hard touchdown of a domestic passenger flight from Atlanta,
Georgia to Nashville, Tennessee, in which four passengers and one crewmember were
injured (Hall 1). The length of the recording was improved from the 30-minute
CVR to a two-hour CVR because a longer audio recording was necessary to
determine the cause and the possible safety solutions for the accident.
In commercial and larger aircraft, the CVR has four channels: one
microphone for each person in the cockpit, and one Cockpit Area Microphone (CAM)
that records anything happening in the cockpit. The channels are tested using a single-
tone test. This simple test catches some problems with a CVR, but if the erase head is
broken, the single-tone test is not robust enough to detect this failure. Figure
6 shows the CVR as seen by the pilot in a Boeing-737 jet airplane. Pressing the button
marked “TEST” will emit the single tone test. The status is shown on the light marked
“STATUS” (Brady).
Figure 6: This image shows how the Cockpit Voice Recorder (CVR) appears to a pilot of a Boeing
737. The recording begins with the first rise in engine oil pressure and ends 5 minutes after the last
engine shuts down. The rest of the Cockpit Voice Recorder resides in the cargo hold in the back of
the aircraft (Brady).
Purpose
Speech intelligibility tests would add to the single-tone test the robustness needed
to prevent broken CVRs from flying. With no erase function, recordings layer on top of
one another and render the CVR useless. Therefore, without a more robust test, the
failure of the CVR is not noticed until the plane crashes, at which point the data on the
CVR is critical to determining the cause of the accident.
Determining a Signal-to-Noise Ratio threshold that interacts well with one of the three
speech intelligibility tests will help keep the CVRs that are on airplanes functioning
properly.
Methods
The dataset was developed to mimic an environment with a failed CVR. Due to
the sensitive nature of CVR recordings, they are not readily available. When a CVR fails,
one problem that can be encountered is a rerecording over a previous recording. This is
the problem that will be addressed with the dataset. This environment was replicated
using the TIMIT (Texas Instruments – Massachusetts Institute of Technology) Acoustic-
Phonetic Continuous Speech Corpus to create the dataset (Garafolo, John, et al.). This
corpus of read speech contains broadband recordings of 630 speakers in eight major
dialects of American English, each reading 10 phonetically rich sentences (Garafolo,
John, et al.). An additional noise (with a Signal-to-Noise ratio of 24 dB) was added to the
dataset. The added noise was a recording of the interior of a Boeing 747 airliner during
quiet constant flight (British Broadcasting Corporation). This noise was picked to best
match the cockpit environment of the plane during flight. The Signal-to-Noise ratio was
set at 24 dB because this is the minimum Signal-to-Noise Ratio for the audio channel in a
Honeywell Solid State Cockpit Voice Recorder ED-56a voice recording system
(Honeywell 5). In addition to the airplane noise, multiple phrases from the TIMIT dataset
were combined to replicate failure of the CVR. One phrase was treated as a control case,
where only the airplane noise was present. Additional tests of two phrases and five
phrases were used to test for Signal-to-Noise Ratio, Peak Signal-to-Noise Ratio, Speech
Intelligibility Index, and Speech Transmission Index using MATLAB. All of the
MATLAB code can be found in the appendix.
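The dataset construction described above can be sketched as follows. This Python fragment (the actual work was done in MATLAB, and the sample arrays here are hypothetical toy values, not TIMIT data) scales the cabin-noise recording so that the clean phrase sits at the 24 dB target, then sums layers to imitate a failed erase head:

```python
import math

def scale_noise_to_snr(speech, noise, target_snr_db):
    # Choose a gain so that 10 log10(P_speech / P_scaled_noise) = target.
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    p_target = p_speech / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(p_target / p_noise)
    return [gain * n for n in noise]

def layer(recordings):
    # A broken erase head records on top of old audio: sample-wise sum.
    return [sum(samples) for samples in zip(*recordings)]

# Hypothetical stand-ins for one TIMIT phrase and the 747 interior noise.
phrase = [0.2, -0.1, 0.3, 0.05]
cabin = scale_noise_to_snr(phrase, [0.05, -0.04, 0.06, -0.05], 24.0)
corrupted = layer([phrase, cabin])
```

Additional phrases would be passed to `layer` alongside the scaled noise to produce the two-layer and five-layer conditions.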
The SNR was calculated using the uncorrupted single layer of speech as the
signal, and the additional airplane noise and the layers of speech as the noise. The peak
Signal-to-Noise Ratio was calculated using Equation 3.
PSNR = 10 log10( peakval^2 / MSE )    [3]
The notation peakval refers to the highest value of the signal and MSE refers to the Mean
Square Error.
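Equation 3 can be sketched directly. In this Python illustration (the thesis performs the computation in MATLAB), peakval is taken as the largest absolute sample of the reference signal and the MSE is computed against the corrupted version:

```python
import math

def psnr_db(reference, corrupted):
    # Equation 3: PSNR = 10 log10(peakval^2 / MSE).
    peakval = max(abs(x) for x in reference)
    mse = sum((r - c) ** 2 for r, c in zip(reference, corrupted)) / len(reference)
    return 10.0 * math.log10(peakval ** 2 / mse)
```

Negative values indicate that the mean square error exceeds the squared peak, as happens when layered speech and noise overwhelm the reference signal.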
The Speech Intelligibility Index (SII) was calculated using code developed by the
Acoustical Society of America Working Group S3-79, 2005 (Musch, Hannes, and Pat
Zurek). The one third-octave band procedure was used with 18 different frequency bands.
The frequency bands are listed in Table 1.
Band Number   Frequency Range (Hz)
1             140-180
2             180-225
3             225-282.5
4             282.5-357.5
5             357.5-450
6             450-565
7             565-715
8             715-900
9             900-1125
10            1125-1425
11            1425-1800
12            1800-2250
13            2250-2825
14            2825-3575
15            3575-4500
16            4500-5650
17            5650-7150
18            7150-9000

Table 1 : This table shows the 18 different frequency bands for the one-third octave band procedure
for calculating the SII
Each of these frequency bands had a different band importance function and band
audibility function. These were multiplied together and then summed to calculate the SII
value as seen in Equation 1.
The Speech Transmission Index (STI) was calculated using the Signal-to-Noise
Ratio (SNR). SNR values between -15 and +15 dB allowed the STI to be estimated using
Equation 2.
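Because the apparent SNR in Equation 2 reduces algebraically to the measured SNR, this estimation step amounts to a linear rescaling plus the ±15 dB validity screen. A Python sketch of the step (MATLAB in the thesis); the sample inputs are SNR values of the kind reported in the Results:

```python
def estimate_sti(snr_values_db):
    # Equation 2, applied only where -15 <= SNR <= +15 dB holds.
    results = []
    for snr in snr_values_db:
        if -15.0 <= snr <= 15.0:
            results.append((snr + 15.0) / 30.0)
        else:
            results.append(None)  # e.g. the 24 dB single-layer control
    return results

# 24 dB is out of range; -0.578 dB maps to roughly 0.481.
estimates = estimate_sti([24.0, -0.578, -14.441])
```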
Results
The results for the Signal-to-Noise Ratio (SNR), Speech Intelligibility Index (SII),
and Speech Transmission Index (STI) are explained below. Each had 20 trials for three
conditions: one layer with just the added Boeing 747 interior noise, two layers of speech
with the added Boeing 747 noise, and five layers of speech with the added Boeing 747
noise.
Signal-to-Noise Ratio
Below are two tables detailing the results of the SNR study. There were 20 speech
waves tested. Table 2 shows the SNR of the dataset in decibels (dB). The first column
shows the SNR of the control dataset with just one layer of continuous speech. The
second column shows the SNR of two layers of continuous speech. The third column
shows the SNR of five layers of continuous speech.
One layer (dB) Two Layers (dB) Five layers (dB)
24.000 -0.578 -3.978
24.000 5.405 -14.040
24.000 5.578 -4.257
24.000 6.929 -13.872
24.000 2.418 -7.025
24.000 6.874 -4.984
24.000 -5.541 -17.857
24.000 -0.932 -7.590
24.000 -1.338 -1.546
24.000 -3.099 -14.496
24.000 -1.697 -6.304
24.000 -1.913 -4.415
24.000 -5.014 -6.514
24.000 -4.599 -6.489
24.000 0.613 -11.743
24.000 -14.441 -5.544
24.000 -7.181 -5.829
24.000 2.908 -5.048
24.000 7.208 -10.059
24.000 -0.251 -8.709

Table 2 : This table shows the SNR in dB for one, two, and five layers of speech. These values were
calculated using MATLAB and the code is shown in the appendix.
The average SNR for one layer of speech is 24 dB, for two layers of speech is -0.433 dB,
and for five layers of speech is -8.015 dB.
Table 3 shows the peak SNR of the dataset. The first column shows the peak SNR
of the control dataset with just one layer of continuous speech. The second column shows
the peak SNR of two layers of continuous speech. The third column shows the peak SNR
of five layers of continuous speech.
One layer (dB) Two Layers (dB) Five layers (dB)
-33.390 -57.968 -61.369
-33.275 -51.869 -60.071
-27.650 -46.073 -59.018
-33.401 -50.472 -61.300
-29.829 -51.411 -59.445
-30.761 -47.888 -60.074
-27.859 -57.401 -61.598
-22.497 -47.429 -57.550
-30.645 -55.983 -58.603
-29.410 -56.509 -58.043
-28.420 -54.116 -58.205
-29.069 -54.982 -56.720
-26.061 -55.075 -63.016
-28.790 -57.390 -62.385
-27.865 -51.253 -59.813
-19.741 -58.181 -58.158
-25.958 -57.139 -60.822
-28.882 -49.974 -59.746
-31.019 -47.811 -61.590
-25.447 -49.698 -61.703

Table 3 : This table shows the peak SNR in dB for one, two, and five layers of speech. These values
were calculated using MATLAB and the code is shown in the appendix.
The average peak SNR for one layer of speech is -28.499 dB, for two layers of speech is
-52.931 dB, and for five layers of speech is -59.961 dB.
Speech Intelligibility Index
Table 4 shows the results of the SII study. The first column shows the SII of the
control dataset with just one layer of continuous speech with added interior airplane
noise. The second column shows the SII of two layers of continuous speech with added
interior airplane noise. The third column shows the SII of five layers of continuous
speech with added interior airplane noise.
One layer Two Layers Five layers
0.793 0.398 0.310
0.784 0.487 0.114
0.756 0.590 0.305
0.783 0.737 0.148
0.764 0.653 0.247
0.778 0.687 0.354
0.752 0.208 0.011
0.716 0.332 0.238
0.768 0.350 0.464
0.760 0.237 0.041
0.755 0.385 0.266
0.753 0.402 0.307
0.752 0.348 0.213
0.761 0.314 0.240
0.749 0.512 0.106
0.702 0.033 0.264
0.750 0.173 0.347
0.761 0.541 0.258
0.776 0.699 0.268
0.732    0.606    0.270

Table 4: This table shows the SII for one, two, and five layers of speech. These values were calculated using MATLAB and the code is shown in the appendix.
The average SII for one layer is 0.757, for two layers is 0.434, and for five layers is
0.238. The SII is a metric for speech intelligibility that ranges from zero to one, with one
being completely intelligible and zero being completely unintelligible.
Speech Transmission Index
The STI was estimated from the calculated SNR values using equation 2. Table 5
shows the results of the STI study for two layers of speech and five layers of speech. The
STI could not be estimated for one layer of speech because its average SNR of 24 dB
lies outside the requisite -15 dB to +15 dB range.
Two Layers   Five Layers
0.481 0.367
0.680 0.032
0.686 0.358
0.731 0.038
0.581 0.266
0.729 0.334
0.469 0.247
0.455 0.448
0.397 0.017
0.443 0.290
0.436 0.353
0.333 0.283
0.347 0.284
0.520 0.109
0.019 0.315
0.261 0.306
0.597 0.332
0.740 0.165
0.492    0.210

Table 5: This table shows the STI for two and five layers of speech. These values were calculated using equation 2.
Only 19 trials are reported for two layers and five layers of speech because one trial
could not be evaluated with equation 2: its SNR fell below -15 dB. That trial was,
however, still taken into account when computing the averages. The average modulation
factor, m, was 0.475 for two layers of speech and 0.136 for five layers of speech.
Using this value, the average STI was approximated to be 0.486 for two layers of speech
and 0.233 for five layers of speech. The STI is a metric for speech intelligibility that
ranges from zero to one with one being completely intelligible and zero being completely
unintelligible.
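Equation 2 itself is not reproduced in this section, but the reported averages are consistent with the standard SNR-to-STI mapping, in which the apparent modulation factor is m = 1/(1 + 10^(-SNR/10)) and STI = (SNR + 15)/30, valid for SNR between -15 and +15 dB. Assuming that form, a short Python sketch reproduces the quoted numbers from the average SNR values alone:

```python
def modulation_factor(snr_db):
    """Apparent modulation factor m for a given SNR in dB."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def sti_from_snr(snr_db):
    """STI estimate; only defined for SNR between -15 and +15 dB."""
    if not -15.0 <= snr_db <= 15.0:
        raise ValueError("SNR outside the -15 to +15 dB range of the STI model")
    return (snr_db + 15.0) / 30.0

print(round(modulation_factor(-0.433), 3))  # two layers → 0.475
print(round(sti_from_snr(-0.433), 3))       # two layers → 0.486
print(round(sti_from_snr(-8.015), 3))       # five layers → 0.233
```

This also makes clear why no STI exists for one layer: an average SNR of 24 dB raises the ValueError above.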
Discussion
A positive Signal-to-Noise Ratio means that the signal power is greater than the
noise power. For two layers of continuous speech, the SNR is close to zero and often
negative; negative values mean that the noise power is greater than the signal power.
Below, in Figure 7, is a plot showing that the SNR for two layers of speech fluctuates
between positive and negative values.
Figure 7 : This plot shows the SNR for 20 samples of two and five layers of continuous speech. For
two layers, the numbers are both positive and negative. For five layers, the numbers are consistently
negative because the noise power is greater than the signal power.
The values for the SNR for two layers of continuous speech range from -14.441 dB to
7.208 dB with an average of -0.433 dB. This wide range makes it difficult to pinpoint
a problematic SNR. However, when there are five layers of continuous speech, the SNR
is consistently negative. In Figure 7, one can see that the SNR is never above zero. The
values for the SNR for five layers of continuous speech range from -17.857 dB to -1.546
dB with an average of -8.015 dB. This average is decidedly negative, showing that five
layers of continuous speech have greater noise power than signal power. There is a clear
separation between the 24 dB SNR for one layer of speech, the maximum value of
7.208 dB for two layers of speech, and the maximum value of -1.546 dB for five layers
of speech. Once a second layer of speech is recorded over the first, the CVR should be
able to detect that the signal is corrupted by measuring the SNR, thus alerting the user
that the CVR should be repaired or replaced.
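One hypothetical way a CVR self-test could act on this separation is a simple SNR threshold. The 24 dB one-layer level and the 7.208 dB two-layer maximum are the values measured above; the midpoint cutoff below is purely illustrative, not a value from the study:

```python
# Illustrative threshold halfway between the intact one-layer SNR (24 dB) and
# the highest two-layer SNR observed (7.208 dB); a real device would need a
# calibrated margin.
SNR_THRESHOLD_DB = (24.0 + 7.208) / 2.0  # 15.604 dB

def recording_corrupted(measured_snr_db):
    """Flag the recording as layered/corrupted when SNR drops below threshold."""
    return measured_snr_db < SNR_THRESHOLD_DB

print(recording_corrupted(24.0))    # intact single layer → False
print(recording_corrupted(-0.433))  # average two-layer SNR → True
```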
Next we will look at different speech intelligibility metrics, the Speech
Transmission Index and the Speech Intelligibility Index, and how they interpret the
intelligibility of two layers versus five layers.
Using the Signal-to-Noise Ratio, the Speech Transmission Index (STI) can be
estimated with equation 2. Below, in Figure 8, is a plot showing the STI for two layers
of continuous speech. The STI for two layers of continuous speech ranges from 0.019 to
0.740 with an average of 0.486. The two layers cannot be compared to one layer, for
which no STI could be computed, but the two-layer and five-layer cases can be compared.
Figure 8: This plot shows the STI for 19 samples of two and five layers of continuous speech.
Figure 8 also shows the STI for five layers of continuous speech. The average STI
is lower than the average STI for two layers of continuous speech. The STI for five
layers of continuous speech ranges from 0.017 to 0.448. While the average STI is lower
for five layers than for two layers, the values are not consistent: the range of the STI for
two layers of continuous speech overlaps the range for five layers, making the STI an
ineffective metric for differentiating between two layers of continuous speech and five
layers of continuous speech.
To find a more robust test, the Speech Intelligibility Index was applied to the
dataset. This test showed a greater difference between one layer and two layers.
Below, in Figure 9, the plot shows the SII for the control dataset. This data only has the
added airplane noise and is compared to the original signal. The SII for one layer
ranges from 0.702 to 0.793, a narrow range.
Figure 9: This plot shows the SII for 20 samples of one layer of continuous speech with added
interior airplane noise in blue. The average of all of the samples is shown in red.
Next we can look at the SII for two layers. Figure 9 shows the SII for two layers
of continuous speech. The SII ranges from 0.173 to 0.737. Only one sample out of twenty
for the two layers fell within the range of the SII for one layer of continuous speech;
that is, the SII for two layers lies outside the one-layer range 95% of the time. A system
that tests for CVR failure based on the SII would therefore be accurate 95% of the time.
Once a CVR fails in this way, it will continue to fail, so we can look at the SII of five
layers of continuous speech to determine whether the SII remains an accurate metric for
intelligibility in CVRs and continues to flag failures.
Figure 9 shows a plot of the SII for five layers of continuous speech. The SII
ranges from 0.011 to 0.464 for five layers of continuous speech, a range entirely
beneath the range of the SII for one layer of continuous speech. As one can see, five
layers of speech render the signal unintelligible according to the SII. Failure of the CVR
can therefore be recognized using the SII metric: the 5% of cases that are not detected on
the first additional layer of speech will be caught as further layers accumulate. The SII
can be used to measure speech intelligibility in Cockpit Voice Recorders.
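As a sketch, a hypothetical SII-based failure check could flag any measurement that falls below the bottom of the observed one-layer range (0.702, from Table 4). The helper name and the threshold policy here are illustrative, not part of the thesis method:

```python
ONE_LAYER_SII_MIN = 0.702  # lowest SII observed for an intact single layer

def cvr_failed(sii):
    """Flag a CVR failure when the measured SII falls below the one-layer range."""
    return sii < ONE_LAYER_SII_MIN

# First five two-layer trials from Table 4; only 0.737 escapes detection.
two_layer = [0.398, 0.487, 0.590, 0.737, 0.653]
print([cvr_failed(s) for s in two_layer])  # → [True, True, True, False, True]
```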
Finally, let’s correlate the SNR with the SII. Below in Figure 10, one can see the
relationship between SNR and SII.
Figure 10: This plot shows the relationship between SNR and SII for one, two, and five layers of
continuous speech with added interior airplane noise. One layer is shown in green, two layers are
shown in red, and five layers are shown in blue.
With a defined Signal-to-Noise Ratio of 24 dB, the data for one layer of continuous
speech remains separated from the two-layer and five-layer data. The two-layer and
five-layer data overlap somewhat in both the SNR and the SII metric. The two layers
retain higher SII values than the five layers of speech; however, both mostly sit below
the one layer of speech. The SNR and SII are correlated with a Pearson correlation
coefficient of 0.946. This correlation coefficient is statistically significant with a
p-value less than 0.05. This correlation helps one understand the threshold of SNR
related to speech intelligibility.
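For reference, the Pearson coefficient reported here can be computed directly from paired SNR/SII values; a minimal Python sketch (the data below is a toy example, not the thesis dataset):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Perfectly linear toy data gives r = 1; the thesis data gave r = 0.946.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # → 1.0
```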
Conclusion
The discussion above supports measuring the Speech Intelligibility Index to
improve Cockpit Voice Recorder functionality, since the SII provides the means to
determine the speech intelligibility of the recorded signal.
By focusing on the specific case where the erase function of the CVR is broken,
one problem has been addressed to aid in the improvement of CVRs. This improvement
could help address other troubling problems with CVRs in the air travel industry;
connectivity issues or broken microphones, for example, often go unnoticed with the
simple tone test. These devices are useless unless they are powered on and recording
accurate information. They exist for the sole purpose of improving aviation safety, so it
is important that the CVR continue to function so that it can aid investigations and the
progress of the aviation industry.
Appendix
List of Acronyms
AI Articulation Index
ANSI American National Standards Institute
CVC Consonant-Vowel-Consonant
CVR Cockpit Voice Recorder
dB Decibels
FAA Federal Aviation Administration
MTF Modulation Transfer Function
NTSB National Transportation Safety Board
PB Phonetically Balanced
TIMIT Texas Instruments – Massachusetts Institute of Technology
SII Speech Intelligibility Index
SNR Signal-to-Noise Ratio
STI Speech Transmission Index
MATLAB Code

%% Double Dataset
load('TIMITsizesorted.mat');
n = 100;           % total number of data pieces, cannot exceed number of files
B = cell(1, n);    % storage of double data
for x = 2:2:n
    B{x/2} = padarray(S{x}, (size(S{x-1}) - size(S{x}))/2) + S{x-1};
end
%% Data x5
load('TIMITsizesorted.mat');
n = 100;           % cannot exceed number of files
C = cell(1, n);    % storage of datax5
for x = 5:5:n
    C{x/5} = padarray(S{x},   (size(S{x-4}) - size(S{x}))/2)   + ...
             padarray(S{x-1}, (size(S{x-4}) - size(S{x-1}))/2) + ...
             padarray(S{x-2}, (size(S{x-4}) - size(S{x-2}))/2) + ...
             padarray(S{x-3}, (size(S{x-4}) - size(S{x-3}))/2) + S{x-4};
end
function [ noisy, noise ] = addnoise( signal, noise, snr )
% ADDNOISE Add noise to signal at a prescribed SNR level.
%
%   [NOISY,NOISE]=ADDNOISE(SIGNAL,NOISE,SNR) adds NOISE to SIGNAL
%   at a prescribed SNR level. Returns the mixture signal as well
%   as scaled noise such that NOISY=SIGNAL+NOISE.
%
%   Inputs
%       SIGNAL is a target signal as vector.
%
%       NOISE is a masker signal as vector, such that
%       length(NOISE)>=length(SIGNAL). Note that in the case that
%       length(NOISE)>length(SIGNAL), a vector of length
%       length(SIGNAL) is selected from NOISE starting at a random
%       sample number.
%
%       SNR is the desired signal-to-noise ratio level (dB).
%
%   Outputs
%       NOISY is a mixture signal of SIGNAL and NOISE at given SNR.
%
%       NOISE is a scaled masker signal, such that the mixture
%       NOISY=SIGNAL+NOISE has the desired SNR.
%
%   Example
%       % inline function for SNR calculation
%       SNR = @(signal,noisy)( 20*log10(norm(signal)/norm(signal-noisy)) );
%
%       fs = 16000;                      % sampling frequency (Hz)
%       freq = 1000;                     % sinusoid frequency (Hz)
%       time = [ 0:1/fs:2 ];             % time vector (s)
%       signal = sin( 2*pi*freq*time );  % signal vector (s)
%       noise = randn( size(signal) );   % noise vector (s)
%       snr = -5;                        % desired SNR level (dB)
%
%       % generate mixture signal: noisy = signal + noise
%       [ noisy, noise ] = addnoise( signal, noise, snr );
%
%       % check the resulting signal-to-noise ratio
%       fprintf( 'SNR: %0.2f dB\n', SNR(signal,noisy) );
%
%   See also TEST_ADDNOISE_SINUSOID, TEST_ADDNOISE_SPEECH.

%   Author: Kamil Wojcicki, UTD, July 2011

    % inline function for SNR calculation
    SNR = @(signal,noisy)( 20*log10(norm(signal)/norm(signal-noisy)) );

    % needed for older releases of MATLAB
    randi = @(n)( round(1+(n-1)*rand) );

    % ensure masker is at least as long as the target
    S = length( signal );
    N = length( noise );
    if( S>N ), error( 'Error: length(signal)>length(noise)' ); end

    % generate a random start location in the masker signal
    R = randi(1+N-S);

    % extract random section of the masker signal
    noise = noise(R:R+S-1);

    % scale the masker w.r.t. to target at a desired SNR level
    noise = noise / norm(noise) * norm(signal) / 10.0^(0.05*snr);

    % generate the mixture signal
    noisy = signal + noise;

    % sanity check
    assert( abs(SNR(signal,noisy)-snr) < 1E10*eps(snr) );

%%% EOF
%% Add Plane Noise
SP2 = cell(1,20); SP5 = cell(1,20); BP = cell(1,20); CP = cell(1,20);
for n = 1:1:20
    [SP2{n}, ~] = addnoise(S2{n}, plane_noise, 24);
    [SP5{n}, ~] = addnoise(S5{n}, plane_noise, 24);
    [BP{n},  ~] = addnoise(B{n},  plane_noise, 24);
    [CP{n},  ~] = addnoise(C{n},  plane_noise, 24);
end
%% Calculating Peak SNR and SNR
psnr2 = zeros(1, 50);   % peak Signal-to-Noise Ratio for x2
snr2  = zeros(1, 50);   % Signal-to-Noise Ratio for x2
psnr5 = zeros(1, 50);   % peak Signal-to-Noise Ratio for x5
snr5  = zeros(1, 50);   % Signal-to-Noise Ratio for x5
psnrc = zeros(1, 50);   % peak Signal-to-Noise Ratio for control
snrc  = zeros(1, 50);   % Signal-to-Noise Ratio for control
for x = 1:1:20
    [psnr1, snr1] = psnr(BP{x}, S2{x});
    psnr2(x) = psnr1; snr2(x) = snr1;
    clearvars psnr1 snr1;
end
for x = 1:1:20
    [psnr1, snr1] = psnr(CP{x}, S5{x});
    psnr5(x) = psnr1; snr5(x) = snr1;
    clearvars psnr1 snr1;
end
for x = 1:1:20
    [psnr1, snr1] = psnr(SP2{x}, S2{x});
    psnrc(x) = psnr1; snrc(x) = snr1;
    clearvars psnr1 snr1;
end
% Speech Intelligibility Index
function S = SII_test(varargin)
% "Methods for Calculation of the Speech Intelligibility Index" (ANSI S3.5-1997)
%
% MATLAB implementation of Section 4.
%
% Note: The remaining sections of the standard, which provide means to calculate
% input parameters required by the "core" SII procedure of Section 4, are
% implemented in separate scripts:
%   Section 5.1 in script Input_5p1.m "method based on the direct measurement/
%       estimation of noise and speech spectrum levels at the listener's position"
%   Section 5.2 in script Input_5p2.m "method based on MTFI/CSNSL measurements
%       at the listener's position"
%   Section 5.3 in script Input_5p3.m "method based on MTFI/CSNSL measurements
%       at the eardrum of the listener"
%
% Parameters are passed to the procedure through pairs of "identifier" and
% corresponding "argument". Identifiers are always strings. Possible identifiers:
%
%   'E'  Equivalent Speech Spectrum Level (Section 3.6 in the standard)
%   'N'  Equivalent Noise Spectrum Level (Section 3.15 in the standard)
%   'T'  Equivalent Hearing Threshold Level [dBHL] (Section 3.23 in the standard)
%   'I'  Band Importance function (Section 3.1 in the standard)
%
% Except for 'E', which must be specified, all parameters are optional. If an
% identifier is not specified a default value will be used. Pairs of identifier
% and argument can occur in any order. However, if an identifier is listed, it
% must be followed immediately by its argument.
%
% Possible arguments for the identifiers are:
%
%   Arguments for 'E':
%       A row or column vector with 18 numbers stating the Equivalent Speech
%       Spectrum Levels in dB in bands 1 through 18.
%
%   Arguments for 'N':
%       A row or column vector with 18 numbers stating the Equivalent Noise
%       Spectrum Levels in dB in bands 1 through 18. If this identifier is
%       omitted, a default Equivalent Noise Spectrum Level of -50 dB is assumed
%       in all 18 bands (see note in Section 4.2).
%
%   Arguments for 'T':
%       A row or column vector with 18 numbers stating the Equivalent Hearing
%       Threshold Levels in dBHL in bands 1 through 18. If this identifier is
%       omitted, a default Equivalent Hearing Threshold Level of 0 dBHL is
%       assumed in all 18 bands.
%
%   Arguments for 'I':
%       A scalar having a value of either 1, 2, 3, 4, 5, 6, or 7. The
%       Band-importance functions associated with each scalar are
%           1: Average speech as specified in Table 3 (DEFAULT)
%           2: various nonsense syllable tests where most English phonemes
%              occur equally often (as specified in Table B.2)
%           3: CID-22 (as specified in Table B.2)
%           4: NU6 (as specified in Table B.2)
%           5: Diagnostic Rhyme test (as specified in Table B.2)
%           6: short passages of easy reading material (as specified in Table B.2)
%           7: SPIN (as specified in Table B.2)
%
% The function returns the SII of the specified listening condition, which is a
% value in the interval [0, 1].
%
% REMINDER OF DEFINITIONS & MEANINGS:
%
% Equivalent Speech Spectrum Level, E-prime
%   The SII calculation is based on free-field levels, even though the quantity
%   relevant for perception and intelligibility is the level at the listener's
%   eardrum. The Equivalent Speech Spectrum Level is the speech spectrum level
%   at the center of the listener's head (when the listener is temporarily
%   absent) that produces in an average human with unoccluded ears an eardrum
%   speech level equal to the eardrum speech level actually present in the
%   listening situation to be analyzed. Before the SII can be applied to a
%   given listening situation, the corresponding Equivalent Speech Spectrum
%   Level must be derived. For example, when speech is presented over insert
%   earphones (earphones inside the ear canal), only the speech spectrum level
%   at the eardrum is known. Using the inverse of the freefield-to-eardrum
%   transfer function (Table 3 of the standard) this eardrum level must be
%   "projected" into the freefield, yielding the Equivalent Speech Spectrum
%   Level. Sections 5.1, 5.2, and 5.3 of the standard give three examples of
%   how to derive the Equivalent Speech Spectrum Level from physical
%   measurements. The standard allows the use of alternative transforms, such
%   as the one illustrated above, where appropriate.
%
% Equivalent Noise Spectrum Level, N-prime
%   Similar to the Equivalent Speech Spectrum Level, the Equivalent Noise
%   Spectrum Level is the noise spectrum level at the center of the listener's
%   head (when the listener is temporarily absent) that produces an eardrum
%   noise level equal to the eardrum noise level actually present in the
%   listening situation to be analyzed. Sections 5.1, 5.2, and 5.3 give three
%   examples of how to derive the Equivalent Speech Spectrum Level from
%   physical measurements.
%
% Hannes Muesch, 2003 - 2005

%%%%% VERIFY INTEGRITY OF INPUT VARIABLES %%%%%
[x, Nvar] = size(varargin);
CharCount = 0;
Ident = [];
for k = 1:Nvar
    if ischar(varargin{k}) & (length(varargin{k}) == 1)
        CharCount = CharCount + 1;
        Ident = [Ident; k];
    end
end
if Nvar/CharCount ~= 2
    error('Every input must be preceded by an identifying string')
else
    for n = 1:length(Ident)
        if upper(varargin{Ident(n)}) == 'N'      % Equivalent Noise Spectrum Level (3.15)
            N = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'E'  % Equivalent Speech Spectrum Level (3.6)
            E = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'T'  % Equivalent Hearing Threshold Level [dBHL] (3.23)
            T = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'I'  % Band Importance function (3.1)
            I = varargin{Ident(n)+1};
        else
            error('Only ''E'', ''I'', ''N'', and ''T'' are valid identifiers');
        end;
    end;
end;
if isempty(who('E')), error('The Equivalent Speech Spectrum Level, ''E'', must be specified'); end
if isempty(who('N')), N = -50*ones(1,18); end;
if isempty(who('T')), T = zeros(1,18);    end;
if isempty(who('I')), I = 1;              end;
N = N(:)'; T = T(:)'; E = E(:)';
if length(N) ~= 18, error('Equivalent Noise Spectrum Level: Vector size incorrect'); end;
if length(T) ~= 18, error('Equivalent Hearing Threshold Level: Vector size incorrect'); end;
if length(E) ~= 18, error('Equivalent Speech Spectrum Level: Vector size incorrect'); end;

%%%%% IMPLEMENTATION OF SPEECH INTELLIGIBILITY INDEX %%%%%
% The numbers in parentheses refer to the sections in the ANSI standard.

% Define band center frequencies for 1/3rd octave procedure (Table 3)
f = [160 200 250 315 400 500 630 800 1000 1250 1600 2000, ...
     2500 3150 4000 5000 6300 8000];

% Define Internal Noise Spectrum Level (Table 3)
X = [0.6 -1.7 -3.9 -6.1 -8.2 -9.7 -10.8 -11.9 -12.5 -13.5 -15.4 -17.7, ...
     -21.2 -24.2 -25.9 -23.6 -15.8 -7.1];

% Self-Speech Masking Spectrum (4.3.2.1 Eq. 5)
V = E - 24;

% 4.3.2.2
B = max(V,N);

% Calculate slope parameter Ci (4.3.2.3 Eq. 7)
C = 0.6.*(B+10*log10(f)-6.353) - 80;

% Initialize Equivalent Masking Spectrum Level (4.3.2.4)
Z = [];
Z(1) = B(1);

% Calculate Equivalent Masking Spectrum Level (4.3.2.5 Eq. 9)
for i = 2:18
    Z(i) = 10*log10(10.^(0.1*N(i)) + ...
        sum(10.^(0.1*(B(1:(i-1))+3.32.*C(1:(i-1)).*log10(0.89*f(i)./f(1:(i-1)))))));
end;

% Equivalent Internal Noise Spectrum Level (4.4 Eq. 10)
X = X + T;

% Disturbance Spectrum Level (4.5)
D = max(Z,X);

% Level Distortion Factor (4.6 Eq. 11)
L = 1 - (E - SpeechSptr('normal') - 10)./160;
L = min(1,L);

% 4.7.1 Eq. 12
K = (E-D+15)/30;
K = min(1,max(0,K));

% Band Audibility Function (4.7.2 Eq. 13)
A = L.*K;

% Speech Intelligibility Index (4.8 Eq. 14)
S = sum(BndImp(I).*A);

%%%%% PRIVATE FUNCTIONS %%%%%

function I = BndImp(tst);
% Band importance functions:
%   tst = 1: Average speech as specified in Table 3
%         2: various nonsense syllable tests where most English
%            phonemes occur equally often
%         3: CID-22
%         4: NU6
%         5: Diagnostic Rhyme test
%         6: short passages of easy reading material
%         7: SPIN
if (nargin ~= 1), error('Incorrect # of input args to BndImp'); end;
if ~((tst==1)|(tst==2)|(tst==3)|(tst==4)|(tst==5)|(tst==6)|(tst==7)),
    error('Band Importance function must be integer between 1 and 7');
end;
BIArr = [0.0083  0       0.0365  0.0168  0       0.0114  0
         0.0095  0       0.0279  0.013   0.024   0.0153  0.0255
         0.015   0.0153  0.0405  0.0211  0.033   0.0179  0.0256
         0.0289  0.0284  0.05    0.0344  0.039   0.0558  0.036
         0.044   0.0363  0.053   0.0517  0.0571  0.0898  0.0362
         0.0578  0.0422  0.0518  0.0737  0.0691  0.0944  0.0514
         0.0653  0.0509  0.0514  0.0658  0.0781  0.0709  0.0616
         0.0711  0.0584  0.0575  0.0644  0.0751  0.066   0.077
         0.0818  0.0667  0.0717  0.0664  0.0781  0.0628  0.0718
         0.0844  0.0774  0.0873  0.0802  0.0811  0.0672  0.0718
         0.0882  0.0893  0.0902  0.0987  0.0961  0.0747  0.1075
         0.0898  0.1104  0.0938  0.1171  0.0901  0.0755  0.0921
         0.0868  0.112   0.0928  0.0932  0.0781  0.082   0.1026
         0.0844  0.0981  0.0678  0.0783  0.0691  0.0808  0.0922
         0.0771  0.0867  0.0498  0.0562  0.048   0.0483  0.0719
         0.0527  0.0728  0.0312  0.0337  0.033   0.0453  0.0461
         0.0364  0.0551  0.0215  0.0177  0.027   0.0274  0.0306
         0.0185  0       0.0253  0.0176  0.024   0.0145  0];
I = BIArr(:,tst)';

function E = SpeechSptr(VclEfrt);
% This function returns the standard speech spectrum level from Table 3
Ei = [32.41 33.81 35.29 30.77;
      34.48 33.92 37.76 36.65;
      34.75 38.98 41.55 42.5;
      33.98 38.57 43.78 46.51;
      34.59 39.11 43.3  47.4;
      34.27 40.15 44.85 49.24;
      32.06 38.78 45.55 51.21;
      28.3  36.37 44.05 51.44;
      25.01 33.86 42.16 51.31;
      23    31.89 40.53 49.63;
      20.15 28.58 37.7  47.65;
      17.32 25.32 34.39 44.32;
      13.18 22.35 30.98 40.8;
      11.55 20.15 28.21 38.13;
      9.33  16.78 25.41 34.41;
      5.31  11.47 18.35 28.24;
      2.59  7.67  13.87 23.45;
      1.13  5.07  11.39 20.72];
switch lower(VclEfrt)
    case 'normal', E = Ei(:,1)';
    case 'raised', E = Ei(:,2)';
    case 'loud',   E = Ei(:,3)';
    case 'shout',  E = Ei(:,4)';
    otherwise, error('Identifier string to ''E'' not recognized')
end;
% EOF
%% SII Calculations
%% divide signal and noise into different bands (18)
% band pass filter: bandpass digital filter design
Fs = 16000; Ast1 = 60; Ast2 = 60; Ap = 1;

% Band edges [Fst1 Fp1 Fp2 Fst2] for the 18 analysis bands
edges = [  40    140    180    280;
           80    180    225    325;
          125    225    282.5  382.5;
          182.5  282.5  357.5  457.5;
          257.5  357.5  450    550;
          350    450    565    665;
          465    565    715    815;
          615    715    900   1000;
          800    900   1125   1225;
         1025   1125   1425   1525;
         1325   1425   1800   1900;
         1700   1800   2250   2350;
         2150   2250   2825   2925;
         2725   2825   3575   3675;
         3475   3575   4500   4600;
         4400   4500   5650   5750;
         5550   5650   7150   7250;
         7050   7150   7900   8000];

Hds = cell(1,18);
for b = 1:18
    ds = fdesign.bandpass(edges(b,1), edges(b,2), edges(b,3), edges(b,4), ...
                          Ast1, Ap, Ast2, Fs);
    Hds{b} = design(ds, 'equiripple');
end

%% Filter signal and noise into bands
Sband2s  = cell(18,20);  % Signal 2x
Sband5s  = cell(18,20);  % Signal 5x
Sbandp2s = cell(18,20);  % Control: just plane noise
Nband2s  = cell(18,20);  % Noise 2x
Nband5s  = cell(18,20);  % Noise 5x
for n = 1:1:20
    for b = 1:18
        Sband2s{b,n}  = filter(Hds{b}, S2{n});
        Sband5s{b,n}  = filter(Hds{b}, S5{n});
        Sbandp2s{b,n} = filter(Hds{b}, SP2{n} - S2{n});
        Nband2s{b,n}  = filter(Hds{b}, BP{n} - S2{n});
        Nband5s{b,n}  = filter(Hds{b}, CP{n} - S5{n});
    end
end

%% find the max value within each band
max_valueS2s  = zeros(18,20); max_valueN2s = zeros(18,20);
max_valueS5s  = zeros(18,20); max_valueN5s = zeros(18,20);
max_valueSP2s = zeros(18,20);
for n = 1:18
    for m = 1:20
        max_valueS2s(n,m)  = max(Sband2s{n,m});
        max_valueN2s(n,m)  = max(Nband2s{n,m});
        max_valueS5s(n,m)  = max(Sband5s{n,m});
        max_valueN5s(n,m)  = max(Nband5s{n,m});
        max_valueSP2s(n,m) = max(Sbandp2s{n,m});
    end
end

%% Calculate Power
% Signal Level and Noise level
E2  = 20.*log10(max_valueS2s./32767);
N2  = 20.*log10(max_valueN2s./32767);
E5  = 20.*log10(max_valueS5s./32767);
N5  = 20.*log10(max_valueN5s./32767);
NP2 = 20.*log10(max_valueSP2s./32767);

%% Calculate SII
SII_val2 = zeros(20,1); SII_val5 = zeros(20,1); SII_valP2 = zeros(20,1);
for n = 1:1:20
    SII_val2(n)  = SII_test('E', abs(N2(:,n)),  'N', abs(E2(:,n)), 'I', 1);
    SII_val5(n)  = SII_test('E', abs(N5(:,n)),  'N', abs(E5(:,n)), 'I', 1);
    SII_valP2(n) = SII_test('E', abs(NP2(:,n)), 'N', abs(E2(:,n)), 'I', 1);
end
%% Make Graphs
%% BOX PLOTS
figure(1);
boxplot([SNR2 SNR5], 'labels', {'Two Layers', 'Five Layers'}); hold on;
ylabel('SNR (dB)'); title('Signal-to-Noise Ratio'); axis([0 3 -15 15]);

figure(2);
boxplot([STI2 STI5], 'labels', {'Two Layers', 'Five Layers'}); hold on;
ylabel('STI'); title('Speech Transmission Index'); axis([0 3 0 1]);

figure(3);
boxplot([Plane SII2 SII5], 'labels', {'One Layer', 'Two Layers', 'Five Layers'}); hold on;
ylabel('SII'); title('Speech Intelligibility Index'); axis([0 4 0 1]);

%% Correlation Plot
figure(4);
scatter(SNRC, Plane, 'g'); hold on;
scatter(SNR2, SII2, 'r'); hold on;
scatter(SNR5, SII5, 'b'); hold on;
xlabel('Signal-to-Noise Ratio (dB)');
ylabel('Speech Intelligibility Index');
title('SNR vs. SII');
legend('One Layer', 'Two Layers', 'Five Layers');
axis([-25 25 0 1]);

%% Calculate Correlation
[R, P] = corrcoef([SNRC SNR2 SNR5], [Plane SII2 SII5]);
Bibliography
ANSI. "Methods for Calculation of the Articulation Index." ANSI Report No. S3.5-1969
(1969). Print.
ANSI. "Methods for Calculation of the Speech Intelligibility Index." ANSI Report No.
S3.5-1997 (1997). Print.
Brady, Chris. “Communications”. The 737 Technical Site. 1999 : Web. 10 Feb. 2015.
http://www.b737.org.uk/communications.htm.
British Broadcasting Corporation. “Boeing 747 Constant Quiet Flight”. Sound Effects.
Web. 14 April 2015. http://www.sounddogs.com/sound-
effects/64/mp3/864366_SOUNDDOGS__tr.mp3.
Edgar, Julian. "Inside the Black Box". Auto Speed. 11 Dec 2001: Issue 160. Web. 10 Feb.
2015. http://www.autospeed.com/cms/article.html?&title=Inside-the-Black-
Bo&xA=1227.
French, N. R., and J. C. Steinberg. "Factors governing the intelligibility of speech
sounds." The Journal of the Acoustical Society of America 19.1 (1947): 90-119.
Garofolo, John, et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1.
Web Download. Philadelphia: Linguistic Data Consortium, 1993.
Hall, Jim. “Safety Recommendation A-96-166 through -171”. National Transportation
Safety Board. 20 Dec. 1996: 1-6. Print.
Honeywell. "SSCVR: Solid State Cockpit Voice Recorder ED056a Voice Recording
System." Product Description, Mar. 2000.
Houtgast, Tammo, and Herman JM Steeneken. "A review of the MTF concept in room
acoustics and its use for estimating speech intelligibility in auditoria." The
Journal of the Acoustical Society of America 77.3 (1985): 1069-1077.
Killion, Mead C., and H. Gustav Mueller. "Twenty years later: a NEW count-the-dots
method." The Hearing Journal 63.1 (2010): 10-17.
Kryter, K. D. "Some comparisons between rhyme and PB-word intelligibility tests." The
Journal of the Acoustical Society of America 37 (1965): 1146.
Lindgreen, Troels Schmidt, David Pelegrin Garcia, and Eleftheria Georganti.
"Intelligibility of Speech." DTU Technical University of Denmark, 2008. Print.
Musch, Hannes, and Pat Zurek. "Programs." SII: Speech Intelligibility Index. ASA
Working Group S3-79, 2005. Web. 22 Apr. 2015.
http://www.sii.to/html/programs.html.
Schwerin, Belinda, and Kuldip Paliwal. "An improved speech transmission index for
intelligibility prediction." Speech Communication 65 (2014): 9-19. Web. 10 Feb.
2015. http://www.sciencedirect.com/science/article/pii/S0167639314000429.
Sumby, William H., and Irwin Pollack. "Visual contribution to speech intelligibility in
noise." The Journal of the Acoustical Society of America 26.2 (1954): 212-215.
Wojcicki, Kamil. "Add Noise." MATLAB Central File Exchange. 14 July 2011. Web. 22
Apr. 2015. http://www.mathworks.com/matlabcentral/fileexchange/32136-add-
noise/content/addnoise/addnoise.m.
Curriculum Vitae
Jane Foster
703 362 1001
EDUCATION
Johns Hopkins University Baltimore, MD
Master of Science in Electrical Engineering Expected May 2015
Bachelor of Science in Electrical Engineering, Departmental Honors 2014
Bachelor of Arts in French, Departmental Honors 2014
Dean’s List, University Honors
Related Courses: Signals and Systems, Circuits, Digital Signal Processing, Image
Processing and Analysis, Computer Architecture, Electronics Design Lab, Projects in the
Design of a Chemical Car, Mechatronics, Audio Signal Processing, Robot
Sensors/Actuators
EXPERIENCE
Recorders Division Student Trainee Washington, DC
National Transportation Safety Board Department of Research and Engineering 2014
- Process and analyze parametric data stored in vehicle recording systems, writing
technical reports on the devices
- Learn about accident investigation processes, surface vehicle navigation, and aircraft
operations
Undergraduate Researcher Paris, France
École Normale Supérieure 2013
- Focus on understanding brain processes in perceiving sounds in complex acoustic
environments
- Developed a computational model for the statistics of hearing perception using
MATLAB and ran tests on the fundamentals of vowel perception
Electrical Engineering Intern Chantilly, VA
Scitor Corporation 2012
- Researched RADAR and link budgets, created images to aid in optimization of office
space and organization
ACTIVITIES
Graduate Advisor, President, Vice President of Chapter Affairs, Treasurer 2010-2015
Johns Hopkins University IEEE
Company Dancer 2010-2015
Johns Hopkins University Ballet Company
Team Leader 2012-2013
Power Beaming Design Team
AWARDS
Electrical and Computer Engineering Student Leadership Award May 2014