SPEECH INTELLIGIBILITY IN COCKPIT VOICE RECORDERS
by
Jane Foster
A thesis submitted to Johns Hopkins University in conformity with the requirements
for the degree of Master of Electrical Engineering
Baltimore, Maryland
May 2015
Abstract
This paper focuses on correcting failures in Cockpit Voice Recorders that degrade speech intelligibility. The common Cockpit Voice Recorder problem addressed here is the failure of the erase function, which causes new speech to be recorded on top of previous recordings. The computational speech intelligibility metrics considered are the Articulation Index, the Speech Intelligibility Index, and the Speech Transmission Index, all of which use the Signal-to-Noise Ratio to help measure intelligibility. This paper determines a threshold relating the Signal-to-Noise Ratio to intelligibility for layered continuous human speech.
Preface
During my time as an undergraduate student at Johns Hopkins University, I
focused on two seemingly opposite things, Electrical Engineering and French. After
completing a thesis for my Bachelor of Arts in French Language and Literature, it
only seemed appropriate to write a thesis during my Electrical Engineering Master’s
program. Driven by my love for language, I searched for a problem to solve that
dealt with speech. After spending a summer working on “Black Boxes” (also known
as the Cockpit Voice Recorder) for the National Transportation Safety Board in
Washington, D.C., I was troubled by a problem I saw. Cockpit Voice Recorders
were found to be broken, even though they are made to be indestructible. The Cockpit Voice Recorder contains data vital to many investigations, and when it fails, that data is lost. With this in mind, I set out with the goal of creating a more robust test to improve these devices.
I want to express my appreciation and gratitude to Dr. Mounya Elhilali – my
thesis advisor, Dr. Danielle Tarraf – my academic advisor, Sarah McComb,
Alexander Simonelli, Johns Hopkins University, and my parents, for always
believing in me and pushing me to learn more. Thank you.
Table of Contents
Abstract ........................................................................................................................ ii
Preface......................................................................................................................... iii
List of Tables ............................................................................................................... v
List of Figures ............................................................................................................. vi
Introduction .................................................................................................................. 1
Articulation Index .................................................................................................... 1
Speech Intelligibility Index ...................................................................................... 7
Speech Transmission Index ..................................................................................... 9
Signal-to-Noise Ratio............................................................................................. 10
Cockpit Voice Recorders ....................................................................................... 11
Purpose ................................................................................................................... 13
Methods...................................................................................................................... 15
Results ........................................................................................................................ 18
Signal-to-Noise Ratio............................................................................................. 18
Speech Intelligibility Index .................................................................................... 20
Speech Transmission Index ................................................................................... 21
Discussion .................................................................................................................. 23
Conclusion ................................................................................................................. 29
Appendix .................................................................................................................... 30
List of Acronyms ................................................................................................... 30
MATLAB Code ..................................................................................................... 30
Bibliography .............................................................................................................. 42
Curriculum Vitae ...................................................................................................... 44
List of Tables
Table 1 : This table shows the 18 different frequency bands for the one-third octave
band procedure for calculating the SII ............................................................... 16
Table 2 : This table shows the SNR in dB for one, two, and five layers of speech.
These values were calculated using MATLAB and the code is shown in the
appendix. ............................................................................................................ 19
Table 3 :This table shows the peak SNR in dB for one, two, and five layers of
speech. These values were calculated using MATLAB and the code is shown in
the appendix. ...................................................................................................... 20
Table 4 : This table shows the SII for one, two, and five layers of speech. These
values were calculated using MATLAB and the code is shown in the
appendix. ...................................................................................................... 21
Table 5 : This table shows the STI for two and five layers of speech. These values
were calculated using equation 2. ...................................................................... 22
List of Figures
Figure 1 : Relationship between the Calculated AI and the Effective AI with Visual
Cues when the listener can see the lips and face of the talker (ANSI S3.5-1969
22). This plot is based on the study “Visual contribution to speech intelligibility
in noise” by Sumby and Pollack. ......................................................................... 4
Figure 2 : Relation between AI and various measures of speech intelligibility (ANSI
S3.5-1969 23). This plot is compiled by ANSI to include both data from French
and Steinberg in their paper "Factors governing the intelligibility of speech
sounds" as well as data from Kryter in his paper “Some comparisons between
rhyme and PB-word intelligibility tests”. The acronym PB stands for words that
are phonetically balanced. .................................................................................... 6
Figure 3: Relationship between SII and speech intelligibility. SII is shown on the
bottom axis in percentages, SNR is shown on the top axis in dB, and speech
intelligibility is shown on the left axis in percentages (Killion and Mueller 14). 8
Figure 4: This graph shows the relation between the STI and the articulation loss of
consonants (ALcons) using a logarithmic scale for plotting ALcons which is an
intelligibility score found using talkers and listeners. This intelligibility score is
found using the loss of consonants in Consonant-Vowel-Consonant (CVC) type
nonsense words for a set of 57 auditorium-like conditions (noise, reverberation,
or echo). ............................................................................................................. 10
Figure 5: This image shows the layout of a solid-state “black box” recorder. The
casing is orange to attract attention to be easily found in aircraft wreckage. An
underwater locator beacon is used to find the device when the plane crashes
over water. The special casing allows the device to withstand 3400 G for 6.5
ms, 1100 degrees Celsius for 60 minutes, and water pressure resistance up to
6000 meters. The memory is all located within the insulated, armored casing for
better protection (Edgar). ................................................................................... 12
Figure 6: This image shows how the Cockpit Voice Recorder (CVR) appears to a
pilot of a Boeing 737. The recording begins with the first rise in engine oil
pressure and ends 5 minutes after the last engine shuts down. The rest of the
Cockpit Voice Recorder resides in the cargo hold in the back of the aircraft
(Brady). .............................................................................................................. 13
Figure 7 : This plot shows the SNR for 20 samples of two and five layers of
continuous speech. For two layers, the numbers are both positive and negative.
For five layers, the numbers are consistently negative because the noise power
is greater than the signal power. ........................................................................ 23
Figure 8: This plot shows the STI for 19 samples of two and five layers of
continuous speech. ............................................................................................. 25
Figure 9 : This plot shows the SII for 20 samples of one layer of continuous speech
with added interior airplane noise in blue. The average of all of the samples is
shown in red. ...................................................................................................... 26
Figure 10: This plot shows the relationship between SNR and SII for one, two, and
five layers of continuous speech with added interior airplane noise. One layer is
shown in green, two layers are shown in red, and five layers are shown in blue.
............................................................................................................................ 27
Introduction
Speech intelligibility is a metric that measures the ability of humans to
understand a given sound. Often used in room acoustics, speech intelligibility helps
define the likelihood that human speech can be understood correctly in a given situation.
Speech intelligibility is related to the Signal-to-Noise ratio, as less speech can be
understood when noise is increased. Speech intelligibility is a subjective test; however,
computational models have been formed to improve the regularity and standardization
associated with the measurement of speech intelligibility. Over the years, engineers and
scientists have developed many procedures and mathematical models to encompass and
define the field of speech intelligibility. These processes include the Articulation Index
(AI), the Speech Transmission Index (STI), and the Speech Intelligibility Index (SII).
These methods have been adapted for particular situations, in turn creating new models.
However, the three base standards of speech intelligibility (the AI, the STI, and the SII)
will be used in this research.
Articulation Index
The Articulation Index (AI) was originally used to determine how the effects of
noise on a telephone line affect speech intelligibility (Lindgreen, Garcia, and Georganti
4). French and Steinberg developed the AI in 1947. They point out that “echoes, phase
distortion, and reverberation may affect intelligibility” (French and Steinberg 91). These
two scientists attempted to focus on the hearing of young men and women with good
hearing in order to prevent variations in hearing level sensitivity from developing any
inconsistencies in their model (French and Steinberg 92). The human ear has lower
hearing sensitivity when frequencies are masked (French and Steinberg 96). This can be
the reason why a human listener cannot recognize certain frequencies
in a noisy situation. Noise can mask certain frequencies, causing problems in speech
reception and intelligibility. The Articulation Index is based on the idea that “any narrow
band of speech frequencies of a given intensity carries a contribution to the total index
which is independent of the other bands with which it is associated and that the total
contribution of all bands is the sum of the contributions of the separate bands” (French
and Steinberg 101). In short, this metric is defined by the sum of different frequency
bands each weighted separately. The weighting value of each band depends on the
effective sensation level of the signal within a given band in the ear of the listener
(French and Steinberg 92). The different level assigned to each band is necessary to
determine the AI.
The AI is standardized by the American National Standards Institute in ANSI
S3.5-1969 (Lindgreen, Garcia, and Georganti 5). The American National Standards
Institute (ANSI) explains that the AI was developed primarily for adult male speakers, so
this method cannot be assumed to apply in situations involving females or children
(ANSI S3.5-1969 6). This is because of the dominant use of male voices when the AI was
being tested and developed. There are two methods for calculating the AI: The 20 Band
Method and The One-Third Octave Band and Octave Band Method (ANSI S3.5-1969 7-
8). The 20 Band Method divides the frequency spectrum into 20 unequal bands. When
above the audibility threshold (exceeding 30 dB), each of these bands contributes equally
to the speech intelligibility metric. The One-Third Octave Band and Octave Band Method
requires knowledge of speech and noise present in certain one-third octave or octave
bands. Either of these two methods can be used to calculate the AI according to ANSI.
Some factors can be easily assessed and evaluated by the AI. These factors
include masking by steady-state noise, masking by non-steady-state noise, frequency
distortion of the speech signal, amplitude distortion of the speech signal, reverberation,
vocal effort, and visual cues (ANSI S3.5-1969 15-21). For example, visual cues (such as
movements of the mouth and lips that aid in a listener's comprehension) would have an
effect on speech intelligibility. The AI can be modified to include or ignore the effect of
visual cues. Figure 1 shows the relationship between the AI with visual cues and the AI
without visual cues and is based on research by Sumby and Pollack (ANSI S3.5-1969
22). The relationship is relatively linear, with an increase in the calculated AI leading to
an increase in Effective AI with Visual Cues.
Figure 1 : Relationship between the Calculated AI and the Effective AI with Visual Cues when the
listener can see the lips and face of the talker (ANSI S3.5-1969 22). This plot is based on the study
“Visual contribution to speech intelligibility in noise” by Sumby and Pollack.
There are also factors that cannot be evaluated by the AI. These factors include sex of
the speaker, multiple transmission paths, the combination of factors, monaural versus
binaural presentation, and asymmetrical clipping, frequency shifting, and fading (ANSI
S3.5-1969 21). These aspects cannot be assessed by the AI metric.
The relationship of the AI to speech intelligibility is also defined in the ANSI
standard. The AI is on a scale from zero to one and speech intelligibility is on a scale
from zero to one hundred percent. When more constraints are imposed on the system, a
higher percent intelligibility score is achieved for a given AI (ANSI S3.5-1969 21). This
can be shown in Figure 2 where, for example, when the vocabulary was limited to just 32
words, a higher speech intelligibility score was achieved for a given AI. Figure 2 shows a
compilation of different data relating the AI and speech intelligibility (ANSI S3.5-1969
23). Different curves represent different data sets that were tested using the AI. The AI
predicts the relative performance of a communication system operating under given
conditions (ANSI S3.5-1969 23). In cases of poor reception, French and Steinberg assert
that sentence tests are more useful (French and Steinberg 115). The conditions of a
communication system must be known for the AI to predict its performance relative to
other communication systems.
Figure 2 : Relation between AI and various measures of speech intelligibility (ANSI S3.5-1969 23).
This plot is compiled by ANSI to include both data from French and Steinberg in their paper
"Factors governing the intelligibility of speech sounds" as well as data from Kryter in his paper
“Some comparisons between rhyme and PB-word intelligibility tests”. The acronym PB stands for
words that are phonetically balanced.
The overall intelligibility performance of a system can be evaluated using the zero
to one scale of the AI. For commercial systems, an AI greater than 0.5 is preferable
(ANSI S3.5-1969 23). With communication systems used in stress conditions with a
variety of speakers and listeners with varying degrees of skill, an AI exceeding 0.7 is
appropriate (ANSI S3.5-1969 23). These numbers are used to implement the AI in real
world applications.
Speech Intelligibility Index
The Speech Intelligibility Index (SII) is similar to the AI. The SII also uses the
signal-to-noise ratio (SNR) to represent contributions within each frequency band. The
intelligibility score is determined through a weighted average across the bands, which in
turn relates to the subjective intelligibility of speech (Schwerin and Paliwal 10). One
difference between the AI and the SII is the correction for change in speech spectrum
according to vocal effort (Lindgreen, Garcia, and Georganti 5). The American National
Standards Institute (ANSI) developed a method for calculating the SII in 1997. The ANSI
defines the SII as the product of the band importance function and the band audibility
function summed over the total number of frequency bands in the computational method
(ANSI S3.5-1997 2-3). Below is an equation that can be used to describe the SII:
S = Σ_{i=1..n} I_i A_i    [1]
The value S refers to the SII, a number from zero to one. The number n is the total
number of bands in the computation. The value I_i refers to the band importance function
and the value A_i refers to the band audibility function. There are four different methods
for calculating the SII (ANSI S3.5-1997 9). These methods include Critical Frequency
Band, One-Third Octave Frequency Band, Equally-Contributing Critical Band, and
Octave Frequency Band. Critical Frequency Band has the most bands with 21 different
bands; Octave Frequency Band has the least bands with only six bands. One-Third
Octave Frequency Band and Equally-Contributing Critical Band have 18 and 17 bands
respectively (ANSI S3.5-1997 9). The weighting value of each band is called the band
importance (the value I_i described above). The band importance assigned to each band
differs and must be computed across bands. However, the Equally-Contributing Critical
Band is an exception as the band importance is a constant 0.0588 for all bands (ANSI
S3.5-1997 11). This value is determined by dividing one by the total number of bands in
this method, 17. These four methods primarily differ and can be distinguished by the
weighting values of each band and the number of bands.
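The summation in Equation 1 is simple to implement once the band values are known. Below is a minimal sketch, given here in Python for illustration (the thesis computations use MATLAB; see the appendix), for the Equally-Contributing Critical Band case, where every band importance is the constant 1/17 ≈ 0.0588. The audibility values are illustrative placeholders, not the output of any ANSI procedure.

```python
def sii(importance, audibility):
    # Equation 1: S = sum over bands of I_i * A_i.
    assert len(importance) == len(audibility)
    return sum(I * A for I, A in zip(importance, audibility))

# Equally-Contributing Critical Band case: 17 bands, constant importance 1/17.
n_bands = 17
importance = [1.0 / n_bands] * n_bands

# Placeholder audibility values: fully audible (A_i = 1) in every band
# gives S near 1; completely masked (A_i = 0) gives S = 0.
fully_audible = sii(importance, [1.0] * n_bands)
fully_masked = sii(importance, [0.0] * n_bands)
```

A partially audible signal lands between the two extremes; the four ANSI methods differ only in how many bands are used and how I_i and A_i are obtained.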
The relationship between the SII and speech intelligibility is shown below in
Figure 3 (Killion and Mueller 14). As one can see, there exists a similarity between the
SII and the AI in that the more constraints placed on the system, the higher an
intelligibility score will be for a given SII value.
Figure 3: Relationship between SII and speech intelligibility. SII is shown on the bottom axis in
percentages, SNR is shown on the top axis in dB, and speech intelligibility is shown on the left axis in
percentages (Killion and Mueller 14).
The performance of the SII is defined in the ANSI standard. Good communication
systems have an SII greater than approximately 0.75 and poor communication systems
have an SII less than approximately 0.45 (ANSI S3.5-1997 16). This range is used to
determine the overall performance of a given communication system.
Speech Transmission Index
The Speech Transmission Index (STI) is based on the Modulation Transfer
Function (MTF). The sound transmission system consists of both a source of sound and
an environment where the sound is transmitted. This system is characterized by MTFs,
with each MTF describing a different frequency region of the noise carrier (Houtgast and
Steeneken 1070). The MTF is used to define the reduction in the modulation index of the
intensity envelope as a function of modulation frequency (Houtgast and Steeneken 1071).
Because the envelope spectra of speech are similar across conditions (with a
maximum at about 3 Hz), the MTF can be used to measure speech intelligibility and, in
turn, to develop the STI (Houtgast and Steeneken 1071). Figure 4 shows the linear
relationship between the STI and speech intelligibility when using a logarithmic scale for
plotting the articulation loss of consonants (Houtgast and Steeneken 1073). Houtgast and
Steeneken reported this data in 1985 in a paper in The Journal of the Acoustical Society
of America.
Figure 4: This graph shows the relation between the STI and the articulation loss of consonants
(ALcons) using a logarithmic scale for plotting ALcons which is an intelligibility score found using
talkers and listeners. This intelligibility score is found using the loss of consonants in Consonant-
Vowel-Consonant (CVC) type nonsense words for a set of 57 auditorium-like conditions (noise,
reverberation, or echo).
The STI is calculated when the overall mean signal-to-noise ratio falls between
+15 and -15 dB by using the equation below.
STI = [ (S/N)'_avg + 15 ] / 30    [2]

The value (S/N)'_avg refers to the weighted average of (S/N)', which equals
10 log10[m/(1-m)] in decibels (dB) (Houtgast and Steeneken 1072). The value of m is the
modulation reduction factor and is given by m = (1 + 10^(-SNR/10))^(-1), where S/N
refers to the Signal-to-Noise Ratio (SNR).
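The definitions above collapse into a direct conversion from SNR to STI, since substituting m back into 10 log10[m/(1-m)] recovers the SNR itself. A short sketch in Python (for illustration; the thesis computations were performed in MATLAB):

```python
import math

def sti_from_snr(snr_db):
    # Equation 2 is only defined for SNR between -15 and +15 dB.
    if not -15.0 <= snr_db <= 15.0:
        raise ValueError("STI defined only for SNR between -15 and +15 dB")
    # Modulation reduction factor: m = (1 + 10^(-SNR/10))^-1.
    m = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    # Apparent SNR: (S/N)' = 10 log10(m / (1 - m)); algebraically this equals
    # snr_db, but it is computed explicitly to mirror the derivation.
    apparent = 10.0 * math.log10(m / (1.0 - m))
    return (apparent + 15.0) / 30.0

# At 0 dB SNR the modulation reduction factor is 0.5 and the STI sits mid-scale.
midpoint = sti_from_snr(0.0)
```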
Signal-to-Noise Ratio
Each of the above metrics, the AI, the SII, and the STI, uses the Signal-to-Noise
Ratio (SNR) to determine speech intelligibility. The calculation of the SNR of a given
speech signal is necessary to determine whether the speech can be understood by the
listener.
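As a baseline, the SNR is a logarithmic power ratio. A minimal sketch in Python (the thesis calculations used MATLAB), taking power as the mean square of the samples:

```python
import math

def snr_db(signal, noise):
    # SNR = 10 log10(P_signal / P_noise), with power as the mean square.
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# Equal-power signal and noise give 0 dB; a hundredfold power ratio gives +20 dB.
```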
Cockpit Voice Recorders
The calculation of the Signal-to-Noise Ratio will be conducted in the environment
of a failed Cockpit Voice Recorder (CVR). A CVR is a device that records the pilots’
speech in a cockpit to improve the safety of aviation and is a part of the “black box” in
airplanes that have seating for more than ten passengers. The basic black box can be
found in Figure 5 (Edgar). Figure 5 shows a newer solid-state recorder; older recorders
used magnetic tape.
Figure 5: This image shows the layout of a solid-state “black box” recorder. The casing is orange to
attract attention to be easily found in aircraft wreckage. An underwater locator beacon is used to
find the device when the plane crashes over water. The special casing allows the device to withstand
3400 G for 6.5 ms, 1100 degrees Celsius for 60 minutes, and water pressure resistance up to 6000
meters. The memory is all located within the insulated, armored casing for better protection (Edgar).
As required by the Federal Aviation Administration (FAA), the CVR is placed in
all aircraft with seating for ten or more passengers. Per Recommendation Number A-
96-171 by the National Transportation Safety Board (NTSB), these devices record the
most recent two hours of audio in the cockpit (Hall 6). This recommendation was due to
the investigation of a hard touchdown of a domestic passenger flight from Atlanta,
Georgia to Nashville, Tennessee, in which four passengers and one crewmember were
injured (Hall 1). The length of the recording was improved from the 30-minute
CVR to a two-hour CVR because a longer audio recording was necessary to
determine the cause and the possible safety solutions for the accident.
In commercial and larger aircraft, the CVR has four channels: one
microphone for each person in the cockpit, and one Cockpit Area Microphone (CAM)
that records anything happening in the cockpit. The channels are tested using a single-
tone test. This simple test catches some problems with a CVR, but if the erase head is
broken, the single-tone test is not robust enough to detect this failure. Figure
6 shows the CVR as seen by the pilot in a Boeing-737 jet airplane. Pressing the button
marked “TEST” will emit the single tone test. The status is shown on the light marked
“STATUS” (Brady).
Figure 6: This image shows how the Cockpit Voice Recorder (CVR) appears to a pilot of a Boeing
737. The recording begins with the first rise in engine oil pressure and ends 5 minutes after the last
engine shuts down. The rest of the Cockpit Voice Recorder resides in the cargo hold in the back of
the aircraft (Brady).
Purpose
Speech intelligibility tests would add to the single-tone test the robustness needed
to prevent broken CVRs from flying. With no erase function, recordings layer on top of
one another and render the CVR useless. Therefore, without a more robust test, the
failure of the CVR is not noticed until the plane crashes, at which point the data on the
CVR is critical to determining the cause of the accident.
Determining a Signal-to-Noise Ratio threshold that interacts well with one of the three
speech intelligibility tests will help keep the CVRs that are on airplanes functioning
properly.
Methods
The dataset was developed to mimic an environment with a failed CVR. Due to
the sensitive nature of CVR recordings, they are not readily available. When a CVR fails,
one problem that can be encountered is a rerecording over a previous recording. This is
the problem that will be addressed with the dataset. This environment was replicated
using the TIMIT (Texas Instruments – Massachusetts Institute of Technology) Acoustic-
Phonetic Continuous Speech Corpus to create the dataset (Garafolo, John, et al.). This
corpus of read speech contains broadband recordings of 630 speakers in eight major
dialects of American English, each reading 10 phonetically rich sentences (Garafolo,
John, et al.). An additional noise (with a Signal-to-Noise ratio of 24 dB) was added to the
dataset. The added noise was a recording of the interior of a Boeing 747 airliner during
quiet constant flight (British Broadcasting Corporation). This noise was picked to best
match the cockpit environment of the plane during flight. The Signal-to-Noise ratio was
set at 24 dB because this is the minimum Signal-to-Noise Ratio for the audio channel in a
Honeywell Solid State Cockpit Voice Recorder ED-56a voice recording system
(Honeywell 5). In addition to the airplane noise, multiple phrases from the TIMIT dataset
were combined to replicate failure of the CVR. One phrase was treated as a control case,
where only the airplane noise was present. Additional tests of two phrases and five
phrases were used to test for Signal-to-Noise Ratio, Peak Signal-to-Noise Ratio, Speech
Intelligibility Index, and Speech Transmission Index using MATLAB. All of the
MATLAB code can be found in the appendix.
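The dataset construction described above can be sketched as follows. This Python fragment (the actual work was done in MATLAB, and the sample arrays here are hypothetical toy values, not TIMIT data) scales the cabin-noise recording so that the clean phrase sits at the 24 dB target, then sums layers to imitate a failed erase head:

```python
import math

def scale_noise_to_snr(speech, noise, target_snr_db):
    # Choose a gain so that 10 log10(P_speech / P_scaled_noise) = target.
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    p_target = p_speech / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(p_target / p_noise)
    return [gain * n for n in noise]

def layer(recordings):
    # A broken erase head records on top of old audio: sample-wise sum.
    return [sum(samples) for samples in zip(*recordings)]

# Hypothetical stand-ins for one TIMIT phrase and the 747 interior noise.
phrase = [0.2, -0.1, 0.3, 0.05]
cabin = scale_noise_to_snr(phrase, [0.05, -0.04, 0.06, -0.05], 24.0)
corrupted = layer([phrase, cabin])
```

Additional phrases would be passed to `layer` alongside the scaled noise to produce the two-layer and five-layer conditions.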
The SNR was calculated using the uncorrupted single layer of speech as the
signal, and the additional airplane noise and the layers of speech as the noise. The peak
Signal-to-Noise Ratio was calculated using Equation 3.
PSNR = 10 log10( peakval^2 / MSE )    [3]
The notation peakval refers to the highest value of the signal and MSE refers to the Mean
Square Error.
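Equation 3 can be sketched directly. In this Python illustration (the thesis performs the computation in MATLAB), peakval is taken as the largest absolute sample of the reference signal and the MSE is computed against the corrupted version:

```python
import math

def psnr_db(reference, corrupted):
    # Equation 3: PSNR = 10 log10(peakval^2 / MSE).
    peakval = max(abs(x) for x in reference)
    mse = sum((r - c) ** 2 for r, c in zip(reference, corrupted)) / len(reference)
    return 10.0 * math.log10(peakval ** 2 / mse)
```

Negative values indicate that the mean square error exceeds the squared peak, as happens when layered speech and noise overwhelm the reference signal.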
The Speech Intelligibility Index (SII) was calculated using code developed by the
Acoustical Society of America Working Group S3-79, 2005 (Musch, Hannes, and Pat
Zurek). The one third-octave band procedure was used with 18 different frequency bands.
The frequency bands are listed in Table 1.
Band Number   Frequency Range (Hz)
1             140-180
2             180-225
3             225-282.5
4             282.5-357.5
5             357.5-450
6             450-565
7             565-715
8             715-900
9             900-1125
10            1125-1425
11            1425-1800
12            1800-2250
13            2250-2825
14            2825-3575
15            3575-4500
16            4500-5650
17            5650-7150
18            7150-9000

Table 1 : This table shows the 18 different frequency bands for the one-third octave band procedure
for calculating the SII
Each of these frequency bands had a different band importance function and band
audibility function. These were multiplied together and then summed to calculate the SII
value as seen in Equation 1.
The Speech Transmission Index (STI) was calculated using the Signal-to-Noise
Ratio (SNR). SNR values between -15 and +15 dB allowed the STI to be estimated using
Equation 2.
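Because the apparent SNR in Equation 2 reduces algebraically to the measured SNR, this estimation step amounts to a linear rescaling plus the ±15 dB validity screen. A Python sketch of the step (MATLAB in the thesis); the sample inputs are SNR values of the kind reported in the Results:

```python
def estimate_sti(snr_values_db):
    # Equation 2, applied only where -15 <= SNR <= +15 dB holds.
    results = []
    for snr in snr_values_db:
        if -15.0 <= snr <= 15.0:
            results.append((snr + 15.0) / 30.0)
        else:
            results.append(None)  # e.g. the 24 dB single-layer control
    return results

# 24 dB is out of range; -0.578 dB maps to roughly 0.481.
estimates = estimate_sti([24.0, -0.578, -14.441])
```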
Results
The results for the Signal-to-Noise Ratio (SNR), Speech Intelligibility Index (SII),
and Speech Transmission Index (STI) are explained below. Each had 20 trials for three
conditions: one layer with just the added Boeing 747 interior noise, two layers of speech
with the added Boeing 747 noise, and five layers of speech with the added Boeing 747
noise.
Signal-to-Noise Ratio
Below are two tables detailing the results of the SNR study. There were 20 speech
waves tested. Table 2 shows the SNR of the dataset in decibels (dB). The first column
shows the SNR of the control dataset with just one layer of continuous speech. The
second column shows the SNR of two layers of continuous speech. The third column
shows the SNR of five layers of continuous speech.
One layer (dB) Two Layers (dB) Five layers (dB)
24.000 -0.578 -3.978
24.000 5.405 -14.040
24.000 5.578 -4.257
24.000 6.929 -13.872
24.000 2.418 -7.025
24.000 6.874 -4.984
24.000 -5.541 -17.857
24.000 -0.932 -7.590
24.000 -1.338 -1.546
24.000 -3.099 -14.496
24.000 -1.697 -6.304
24.000 -1.913 -4.415
24.000 -5.014 -6.514
24.000 -4.599 -6.489
24.000 0.613 -11.743
24.000 -14.441 -5.544
24.000 -7.181 -5.829
24.000 2.908 -5.048
24.000 7.208 -10.059
24.000 -0.251 -8.709

Table 2 : This table shows the SNR in dB for one, two, and five layers of speech. These values were
calculated using MATLAB and the code is shown in the appendix.
The average SNR for one layer of speech is 24 dB, for two layers of speech is -0.433 dB,
and for five layers of speech is -8.015 dB.
Table 3 shows the peak SNR of the dataset. The first column shows the peak SNR
of the control dataset with just one layer of continuous speech. The second column shows
the peak SNR of two layers of continuous speech. The third column shows the peak SNR
of five layers of continuous speech.
One layer (dB) Two Layers (dB) Five layers (dB)
-33.390 -57.968 -61.369
-33.275 -51.869 -60.071
-27.650 -46.073 -59.018
-33.401 -50.472 -61.300
-29.829 -51.411 -59.445
-30.761 -47.888 -60.074
-27.859 -57.401 -61.598
-22.497 -47.429 -57.550
-30.645 -55.983 -58.603
-29.410 -56.509 -58.043
-28.420 -54.116 -58.205
-29.069 -54.982 -56.720
-26.061 -55.075 -63.016
-28.790 -57.390 -62.385
-27.865 -51.253 -59.813
-19.741 -58.181 -58.158
-25.958 -57.139 -60.822
-28.882 -49.974 -59.746
-31.019 -47.811 -61.590
-25.447 -49.698 -61.703

Table 3 : This table shows the peak SNR in dB for one, two, and five layers of speech. These values
were calculated using MATLAB and the code is shown in the appendix.
The average peak SNR for one layer of speech is -28.499 dB, for two layers of speech is
-52.931 dB, and for five layers of speech is -59.961 dB.
Speech Intelligibility Index
Table 4 shows the results of the SII study. The first column shows the SII of the
control dataset with just one layer of continuous speech with added interior airplane
noise. The second column shows the SII of two layers of continuous speech with added
interior airplane noise. The third column shows the SII of five layers of continuous
speech with added interior airplane noise.
One layer Two Layers Five layers
0.793 0.398 0.310
0.784 0.487 0.114
0.756 0.590 0.305
0.783 0.737 0.148
0.764 0.653 0.247
0.778 0.687 0.354
0.752 0.208 0.011
0.716 0.332 0.238
0.768 0.350 0.464
0.760 0.237 0.041
0.755 0.385 0.266
0.753 0.402 0.307
0.752 0.348 0.213
0.761 0.314 0.240
0.749 0.512 0.106
0.702 0.033 0.264
0.750 0.173 0.347
0.761 0.541 0.258
0.776 0.699 0.268
0.732    0.606    0.270

Table 4: This table shows the SII for one, two, and five layers of speech. These values were calculated using MATLAB and the code is shown in the appendix.
The average SII for one layer is 0.757, for two layers is 0.434, and for five layers is
0.238. The SII is a metric for speech intelligibility that ranges from zero to one, with one
being completely intelligible and zero being completely unintelligible.
Speech Transmission Index
The STI was estimated from the calculated SNR values using equation 2. Table 5
shows the results of the STI study for two layers of speech and five layers of speech. The
STI could not be estimated for one layer of speech because its average SNR of 24 dB
lies outside the requisite -15 dB to +15 dB range.
Two Layers   Five Layers
0.481 0.367
0.680 0.032
0.686 0.358
0.731 0.038
0.581 0.266
0.729 0.334
0.469 0.247
0.455 0.448
0.397 0.017
0.443 0.290
0.436 0.353
0.333 0.283
0.347 0.284
0.520 0.109
0.019 0.315
0.261 0.306
0.597 0.332
0.740 0.165
0.492    0.210

Table 5: This table shows the STI for two and five layers of speech. These values were calculated using equation 2.
Only 19 trials are reported for two layers and five layers of speech because one trial
could not be evaluated with equation 2: its SNR fell below -15 dB. That trial was,
however, still taken into account when computing the averages. The average modulation
factor, m, was 0.475 for two layers of speech and 0.136 for five layers of speech.
Using this value, the average STI was approximated to be 0.486 for two layers of speech
and 0.233 for five layers of speech. The STI is a metric for speech intelligibility that
ranges from zero to one with one being completely intelligible and zero being completely
unintelligible.
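Equation 2 itself is not reproduced in this section, but the reported averages are consistent with the standard SNR-to-STI mapping, in which the apparent modulation factor is m = 1/(1 + 10^(-SNR/10)) and STI = (SNR + 15)/30, valid for SNR between -15 and +15 dB. Assuming that form, a short Python sketch reproduces the quoted numbers from the average SNR values alone:

```python
def modulation_factor(snr_db):
    """Apparent modulation factor m for a given SNR in dB."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def sti_from_snr(snr_db):
    """STI estimate; only defined for SNR between -15 and +15 dB."""
    if not -15.0 <= snr_db <= 15.0:
        raise ValueError("SNR outside the -15 to +15 dB range of the STI model")
    return (snr_db + 15.0) / 30.0

print(round(modulation_factor(-0.433), 3))  # two layers → 0.475
print(round(sti_from_snr(-0.433), 3))       # two layers → 0.486
print(round(sti_from_snr(-8.015), 3))       # five layers → 0.233
```

This also makes clear why no STI exists for one layer: an average SNR of 24 dB raises the ValueError above.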
Discussion
A positive Signal-to-Noise Ratio means that the signal power is greater than the
noise power. For two layers of continuous speech, the SNR is close to zero and often
negative; negative values mean that the noise power is greater than the signal power.
Below, in Figure 7, is a plot showing that the SNR for two layers of speech fluctuates
between positive and negative values.
Figure 7 : This plot shows the SNR for 20 samples of two and five layers of continuous speech. For
two layers, the numbers are both positive and negative. For five layers, the numbers are consistently
negative because the noise power is greater than the signal power.
The values for the SNR for two layers of continuous speech range from -14.441 dB to
7.208 dB with an average of -0.433 dB. This wide range makes it difficult to pinpoint
a problematic SNR. However, when there are five layers of continuous speech, the SNR
is consistently negative. In Figure 7, one can see that the SNR is never above zero. The
values for the SNR for five layers of continuous speech range from -17.857 dB to -1.546
dB with an average of -8.015 dB. This average is decidedly negative, showing that five
layers of continuous speech have greater noise power than signal power. There is a clear
separation between the 24 dB SNR for one layer of speech, the maximum value of
7.208 dB for two layers of speech, and the maximum value of -1.546 dB for five layers
of speech. Once a second layer of speech is recorded over the first, the CVR should be
able to detect that the signal is corrupted by measuring the SNR, thus alerting the user
that the CVR should be repaired or replaced.
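One hypothetical way a CVR self-test could act on this separation is a simple SNR threshold. The 24 dB one-layer level and the 7.208 dB two-layer maximum are the values measured above; the midpoint cutoff below is purely illustrative, not a value from the study:

```python
# Illustrative threshold halfway between the intact one-layer SNR (24 dB) and
# the highest two-layer SNR observed (7.208 dB); a real device would need a
# calibrated margin.
SNR_THRESHOLD_DB = (24.0 + 7.208) / 2.0  # 15.604 dB

def recording_corrupted(measured_snr_db):
    """Flag the recording as layered/corrupted when SNR drops below threshold."""
    return measured_snr_db < SNR_THRESHOLD_DB

print(recording_corrupted(24.0))    # intact single layer → False
print(recording_corrupted(-0.433))  # average two-layer SNR → True
```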
Next we will look at different speech intelligibility metrics, the Speech
Transmission Index and the Speech Intelligibility Index, and how they interpret the
intelligibility of two layers versus five layers.
Using the Signal-to-Noise Ratio, the Speech Transmission Index (STI) can be
estimated with equation 2. Below, in Figure 8, is a plot showing the STI for two layers
of continuous speech. The STI for two layers of continuous speech ranges from 0.019 to
0.740 with an average of 0.486. The two layers cannot be compared to one layer, for
which no STI could be computed, but the two-layer and five-layer cases can be compared.
Figure 8: This plot shows the STI for 19 samples of two and five layers of continuous speech.
Figure 8 also shows the STI for five layers of continuous speech. The average STI
is lower than the average STI for two layers of continuous speech. The STI for five
layers of continuous speech ranges from 0.017 to 0.448. While the average STI is lower
for five layers than for two layers, the values are not consistent: the range of the STI for
two layers of continuous speech overlaps the range for five layers, making the STI an
ineffective metric for differentiating between two layers of continuous speech and five
layers of continuous speech.
To find a more robust test, the Speech Intelligibility Index was applied to the
dataset. This test showed a greater difference between one layer and two layers.
Below, in Figure 9, the plot shows the SII for the control dataset. This data only has the
added airplane noise and is compared to the original signal. The SII for one layer
ranges from 0.702 to 0.793, a narrow range.
Figure 9: This plot shows the SII for 20 samples of one layer of continuous speech with added
interior airplane noise in blue. The average of all of the samples is shown in red.
Next we can look at the SII for two layers. Figure 9 shows the SII for two layers
of continuous speech. The SII ranges from 0.173 to 0.737. Only one sample out of twenty
for the two layers fell within the range of the SII for one layer of continuous speech;
that is, the SII for two layers lies outside the one-layer range 95% of the time. A system
that tests for CVR failure based on the SII would therefore be accurate 95% of the time.
Once a CVR fails in this way, it will continue to fail, so we can look at the SII of five
layers of continuous speech to determine whether the SII remains an accurate metric for
intelligibility in CVRs and continues to flag failures.
Figure 9 shows a plot of the SII for five layers of continuous speech. The SII
ranges from 0.011 to 0.464 for five layers of continuous speech, a range entirely
beneath the range of the SII for one layer of continuous speech. As one can see, five
layers of speech render the signal unintelligible according to the SII. Failure of the CVR
can therefore be recognized using the SII metric: the 5% of cases that are not detected on
the first additional layer of speech will be caught as further layers accumulate. The SII
can be used to measure speech intelligibility in Cockpit Voice Recorders.
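As a sketch, a hypothetical SII-based failure check could flag any measurement that falls below the bottom of the observed one-layer range (0.702, from Table 4). The helper name and the threshold policy here are illustrative, not part of the thesis method:

```python
ONE_LAYER_SII_MIN = 0.702  # lowest SII observed for an intact single layer

def cvr_failed(sii):
    """Flag a CVR failure when the measured SII falls below the one-layer range."""
    return sii < ONE_LAYER_SII_MIN

# First five two-layer trials from Table 4; only 0.737 escapes detection.
two_layer = [0.398, 0.487, 0.590, 0.737, 0.653]
print([cvr_failed(s) for s in two_layer])  # → [True, True, True, False, True]
```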
Finally, let’s correlate the SNR with the SII. Below in Figure 10, one can see the
relationship between SNR and SII.
Figure 10: This plot shows the relationship between SNR and SII for one, two, and five layers of
continuous speech with added interior airplane noise. One layer is shown in green, two layers are
shown in red, and five layers are shown in blue.
With a defined Signal-to-Noise Ratio of 24 dB, the data for one layer of continuous
speech remains separated from the two-layer and five-layer data. The two-layer and
five-layer data overlap somewhat in both the SNR and the SII metric. The two layers
retain higher SII values than the five layers of speech; however, both mostly sit below
the one layer of speech. The SNR and SII are correlated with a Pearson correlation
coefficient of 0.946. This correlation coefficient is statistically significant with a
p-value less than 0.05. This correlation helps one understand the threshold of SNR
related to speech intelligibility.
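For reference, the Pearson coefficient reported here can be computed directly from paired SNR/SII values; a minimal Python sketch (the data below is a toy example, not the thesis dataset):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Perfectly linear toy data gives r = 1; the thesis data gave r = 0.946.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # → 1.0
```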
Conclusion
The discussion above supports measuring the Speech Intelligibility Index to
improve Cockpit Voice Recorder functionality, since the SII provides the means to
determine the speech intelligibility of the recorded signal.
By focusing on the specific case where the erase function of the CVR is broken,
one problem has been addressed to aid in the improvement of CVRs. This improvement
could help address other troubling problems with CVRs in the air travel industry;
connectivity issues or broken microphones, for example, often go unnoticed with the
simple tone test. These devices are useless unless they are powered on and recording
accurate information. They exist for the sole purpose of improving aviation safety, so it
is important that the CVR continue to function so that it can aid investigations and the
progress of the aviation industry.
Appendix
List of Acronyms
AI Articulation Index
ANSI American National Standards Institute
CVC Consonant-Vowel-Consonant
CVR Cockpit Voice Recorder
dB Decibels
FAA Federal Aviation Administration
MTF Modulation Transfer Function
NTSB National Transportation Safety Board
PB Phonetically Balanced
TIMIT Texas Instruments – Massachusetts Institute of Technology
SII Speech Intelligibility Index
SNR Signal-to-Noise Ratio
STI Speech Transmission Index
MATLAB Code

%% Double Dataset
load('TIMITsizesorted.mat');
n = 100;           % total number of data pieces, cannot exceed number of files
B = cell(1, n);    % storage of double data
for x = 2:2:n
    B{x/2} = padarray(S{x}, (size(S{x-1}) - size(S{x}))/2) + S{x-1};
end
%% Data x5
load('TIMITsizesorted.mat');
n = 100;           % cannot exceed number of files
C = cell(1, n);    % storage of datax5
for x = 5:5:n
    C{x/5} = padarray(S{x},   (size(S{x-4}) - size(S{x}))/2)   + ...
             padarray(S{x-1}, (size(S{x-4}) - size(S{x-1}))/2) + ...
             padarray(S{x-2}, (size(S{x-4}) - size(S{x-2}))/2) + ...
             padarray(S{x-3}, (size(S{x-4}) - size(S{x-3}))/2) + S{x-4};
end
function [ noisy, noise ] = addnoise( signal, noise, snr )
% ADDNOISE Add noise to signal at a prescribed SNR level.
%
%   [NOISY,NOISE]=ADDNOISE(SIGNAL,NOISE,SNR) adds NOISE to SIGNAL
%   at a prescribed SNR level. Returns the mixture signal as well
%   as scaled noise such that NOISY=SIGNAL+NOISE.
%
%   Inputs
%       SIGNAL is a target signal as vector.
%
%       NOISE is a masker signal as vector, such that
%       length(NOISE)>=length(SIGNAL). Note that in the case that
%       length(NOISE)>length(SIGNAL), a vector of length
%       length(SIGNAL) is selected from NOISE starting at a random
%       sample number.
%
%       SNR is the desired signal-to-noise ratio level (dB).
%
%   Outputs
%       NOISY is a mixture signal of SIGNAL and NOISE at given SNR.
%
%       NOISE is a scaled masker signal, such that the mixture
%       NOISY=SIGNAL+NOISE has the desired SNR.
%
%   Example
%       % inline function for SNR calculation
%       SNR = @(signal,noisy)( 20*log10(norm(signal)/norm(signal-noisy)) );
%
%       fs = 16000;                      % sampling frequency (Hz)
%       freq = 1000;                     % sinusoid frequency (Hz)
%       time = [ 0:1/fs:2 ];             % time vector (s)
%       signal = sin( 2*pi*freq*time );  % signal vector (s)
%       noise = randn( size(signal) );   % noise vector (s)
%       snr = -5;                        % desired SNR level (dB)
%
%       % generate mixture signal: noisy = signal + noise
%       [ noisy, noise ] = addnoise( signal, noise, snr );
%
%       % check the resulting signal-to-noise ratio
%       fprintf( 'SNR: %0.2f dB\n', SNR(signal,noisy) );
%
%   See also TEST_ADDNOISE_SINUSOID, TEST_ADDNOISE_SPEECH.

%   Author: Kamil Wojcicki, UTD, July 2011

    % inline function for SNR calculation
    SNR = @(signal,noisy)( 20*log10(norm(signal)/norm(signal-noisy)) );

    % needed for older releases of MATLAB
    randi = @(n)( round(1+(n-1)*rand) );

    % ensure masker is at least as long as the target
    S = length( signal );
    N = length( noise );
    if( S>N ), error( 'Error: length(signal)>length(noise)' ); end

    % generate a random start location in the masker signal
    R = randi(1+N-S);

    % extract random section of the masker signal
    noise = noise(R:R+S-1);

    % scale the masker w.r.t. to target at a desired SNR level
    noise = noise / norm(noise) * norm(signal) / 10.0^(0.05*snr);

    % generate the mixture signal
    noisy = signal + noise;

    % sanity check
    assert( abs(SNR(signal,noisy)-snr) < 1E10*eps(snr) );

%%% EOF
%% Add Plane Noise
SP2 = cell(1,20); SP5 = cell(1,20); BP = cell(1,20); CP = cell(1,20);
for n = 1:1:20
    [SP2{n}, ~] = addnoise(S2{n}, plane_noise, 24);
    [SP5{n}, ~] = addnoise(S5{n}, plane_noise, 24);
    [BP{n},  ~] = addnoise(B{n},  plane_noise, 24);
    [CP{n},  ~] = addnoise(C{n},  plane_noise, 24);
end
%% Calculating Peak SNR and SNR
psnr2 = zeros(1, 50);   % peak Signal-to-Noise Ratio for x2
snr2  = zeros(1, 50);   % Signal-to-Noise Ratio for x2
psnr5 = zeros(1, 50);   % peak Signal-to-Noise Ratio for x5
snr5  = zeros(1, 50);   % Signal-to-Noise Ratio for x5
psnrc = zeros(1, 50);   % peak Signal-to-Noise Ratio for control
snrc  = zeros(1, 50);   % Signal-to-Noise Ratio for control
for x = 1:1:20
    [psnr1, snr1] = psnr(BP{x}, S2{x});
    psnr2(x) = psnr1; snr2(x) = snr1;
    clearvars psnr1 snr1;
end
for x = 1:1:20
    [psnr1, snr1] = psnr(CP{x}, S5{x});
    psnr5(x) = psnr1; snr5(x) = snr1;
    clearvars psnr1 snr1;
end
for x = 1:1:20
    [psnr1, snr1] = psnr(SP2{x}, S2{x});
    psnrc(x) = psnr1; snrc(x) = snr1;
    clearvars psnr1 snr1;
end
% Speech Intelligibility Index
function S = SII_test(varargin)
% "Methods for Calculation of the Speech Intelligibility Index" (ANSI S3.5-1997)
%
% MATLAB implementation of Section 4.
%
% Note: The remaining sections of the standard, which provide means to calculate
% input parameters required by the "core" SII procedure of Section 4, are
% implemented in separate scripts:
%   Section 5.1 in script Input_5p1.m "method based on the direct measurement/
%       estimation of noise and speech spectrum levels at the listener's position"
%   Section 5.2 in script Input_5p2.m "method based on MTFI/CSNSL measurements
%       at the listener's position"
%   Section 5.3 in script Input_5p3.m "method based on MTFI/CSNSL measurements
%       at the eardrum of the listener"
%
% Parameters are passed to the procedure through pairs of "identifier" and
% corresponding "argument". Identifiers are always strings. Possible identifiers:
%
%   'E'  Equivalent Speech Spectrum Level (Section 3.6 in the standard)
%   'N'  Equivalent Noise Spectrum Level (Section 3.15 in the standard)
%   'T'  Equivalent Hearing Threshold Level [dBHL] (Section 3.23 in the standard)
%   'I'  Band Importance function (Section 3.1 in the standard)
%
% Except for 'E', which must be specified, all parameters are optional. If an
% identifier is not specified a default value will be used. Pairs of identifier
% and argument can occur in any order. However, if an identifier is listed, it
% must be followed immediately by its argument.
%
% Possible arguments for the identifiers are:
%
%   Arguments for 'E':
%       A row or column vector with 18 numbers stating the Equivalent Speech
%       Spectrum Levels in dB in bands 1 through 18.
%
%   Arguments for 'N':
%       A row or column vector with 18 numbers stating the Equivalent Noise
%       Spectrum Levels in dB in bands 1 through 18. If this identifier is
%       omitted, a default Equivalent Noise Spectrum Level of -50 dB is assumed
%       in all 18 bands (see note in Section 4.2).
%
%   Arguments for 'T':
%       A row or column vector with 18 numbers stating the Equivalent Hearing
%       Threshold Levels in dBHL in bands 1 through 18. If this identifier is
%       omitted, a default Equivalent Hearing Threshold Level of 0 dBHL is
%       assumed in all 18 bands.
%
%   Arguments for 'I':
%       A scalar having a value of either 1, 2, 3, 4, 5, 6, or 7. The
%       Band-importance functions associated with each scalar are
%           1: Average speech as specified in Table 3 (DEFAULT)
%           2: various nonsense syllable tests where most English phonemes
%              occur equally often (as specified in Table B.2)
%           3: CID-22 (as specified in Table B.2)
%           4: NU6 (as specified in Table B.2)
%           5: Diagnostic Rhyme test (as specified in Table B.2)
%           6: short passages of easy reading material (as specified in Table B.2)
%           7: SPIN (as specified in Table B.2)
%
% The function returns the SII of the specified listening condition, which is a
% value in the interval [0, 1].
%
% REMINDER OF DEFINITIONS & MEANINGS:
%
% Equivalent Speech Spectrum Level, E-prime
%   The SII calculation is based on free-field levels, even though the quantity
%   relevant for perception and intelligibility is the level at the listener's
%   eardrum. The Equivalent Speech Spectrum Level is the speech spectrum level
%   at the center of the listener's head (when the listener is temporarily
%   absent) that produces in an average human with unoccluded ears an eardrum
%   speech level equal to the eardrum speech level actually present in the
%   listening situation to be analyzed. Before the SII can be applied to a
%   given listening situation, the corresponding Equivalent Speech Spectrum
%   Level must be derived. For example, when speech is presented over insert
%   earphones (earphones inside the ear canal), only the speech spectrum level
%   at the eardrum is known. Using the inverse of the freefield-to-eardrum
%   transfer function (Table 3 of the standard) this eardrum level must be
%   "projected" into the freefield, yielding the Equivalent Speech Spectrum
%   Level. Sections 5.1, 5.2, and 5.3 of the standard give three examples of
%   how to derive the Equivalent Speech Spectrum Level from physical
%   measurements. The standard allows the use of alternative transforms, such
%   as the one illustrated above, where appropriate.
%
% Equivalent Noise Spectrum Level, N-prime
%   Similar to the Equivalent Speech Spectrum Level, the Equivalent Noise
%   Spectrum Level is the noise spectrum level at the center of the listener's
%   head (when the listener is temporarily absent) that produces an eardrum
%   noise level equal to the eardrum noise level actually present in the
%   listening situation to be analyzed. Sections 5.1, 5.2, and 5.3 give three
%   examples of how to derive the Equivalent Speech Spectrum Level from
%   physical measurements.
%
% Hannes Muesch, 2003 - 2005

%%%%% VERIFY INTEGRITY OF INPUT VARIABLES %%%%%
[x, Nvar] = size(varargin);
CharCount = 0;
Ident = [];
for k = 1:Nvar
    if ischar(varargin{k}) & (length(varargin{k}) == 1)
        CharCount = CharCount + 1;
        Ident = [Ident; k];
    end
end
if Nvar/CharCount ~= 2
    error('Every input must be preceded by an identifying string')
else
    for n = 1:length(Ident)
        if upper(varargin{Ident(n)}) == 'N'      % Equivalent Noise Spectrum Level (3.15)
            N = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'E'  % Equivalent Speech Spectrum Level (3.6)
            E = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'T'  % Equivalent Hearing Threshold Level [dBHL] (3.23)
            T = varargin{Ident(n)+1};
        elseif upper(varargin{Ident(n)}) == 'I'  % Band Importance function (3.1)
            I = varargin{Ident(n)+1};
        else
            error('Only ''E'', ''I'', ''N'', and ''T'' are valid identifiers');
        end;
    end;
end;
if isempty(who('E')), error('The Equivalent Speech Spectrum Level, ''E'', must be specified'); end
if isempty(who('N')), N = -50*ones(1,18); end;
if isempty(who('T')), T = zeros(1,18);    end;
if isempty(who('I')), I = 1;              end;
N = N(:)'; T = T(:)'; E = E(:)';
if length(N) ~= 18, error('Equivalent Noise Spectrum Level: Vector size incorrect'); end;
if length(T) ~= 18, error('Equivalent Hearing Threshold Level: Vector size incorrect'); end;
if length(E) ~= 18, error('Equivalent Speech Spectrum Level: Vector size incorrect'); end;

%%%%% IMPLEMENTATION OF SPEECH INTELLIGIBILITY INDEX %%%%%
% The numbers in parentheses refer to the sections in the ANSI standard.

% Define band center frequencies for 1/3rd octave procedure (Table 3)
f = [160 200 250 315 400 500 630 800 1000 1250 1600 2000, ...
     2500 3150 4000 5000 6300 8000];

% Define Internal Noise Spectrum Level (Table 3)
X = [0.6 -1.7 -3.9 -6.1 -8.2 -9.7 -10.8 -11.9 -12.5 -13.5 -15.4 -17.7, ...
     -21.2 -24.2 -25.9 -23.6 -15.8 -7.1];

% Self-Speech Masking Spectrum (4.3.2.1 Eq. 5)
V = E - 24;

% 4.3.2.2
B = max(V,N);

% Calculate slope parameter Ci (4.3.2.3 Eq. 7)
C = 0.6.*(B+10*log10(f)-6.353) - 80;

% Initialize Equivalent Masking Spectrum Level (4.3.2.4)
Z = [];
Z(1) = B(1);

% Calculate Equivalent Masking Spectrum Level (4.3.2.5 Eq. 9)
for i = 2:18
    Z(i) = 10*log10(10.^(0.1*N(i)) + ...
        sum(10.^(0.1*(B(1:(i-1))+3.32.*C(1:(i-1)).*log10(0.89*f(i)./f(1:(i-1)))))));
end;

% Equivalent Internal Noise Spectrum Level (4.4 Eq. 10)
X = X + T;

% Disturbance Spectrum Level (4.5)
D = max(Z,X);

% Level Distortion Factor (4.6 Eq. 11)
L = 1 - (E - SpeechSptr('normal') - 10)./160;
L = min(1,L);

% 4.7.1 Eq. 12
K = (E-D+15)/30;
K = min(1,max(0,K));

% Band Audibility Function (4.7.2 Eq. 13)
A = L.*K;

% Speech Intelligibility Index (4.8 Eq. 14)
S = sum(BndImp(I).*A);

%%%%% PRIVATE FUNCTIONS %%%%%

function I = BndImp(tst);
% Band importance functions:
%   tst = 1: Average speech as specified in Table 3
%         2: various nonsense syllable tests where most English
%            phonemes occur equally often
%         3: CID-22
%         4: NU6
%         5: Diagnostic Rhyme test
%         6: short passages of easy reading material
%         7: SPIN
if (nargin ~= 1), error('Incorrect # of input args to BndImp'); end;
if ~((tst==1)|(tst==2)|(tst==3)|(tst==4)|(tst==5)|(tst==6)|(tst==7)),
    error('Band Importance function must be integer between 1 and 7');
end;
BIArr = [0.0083  0       0.0365  0.0168  0       0.0114  0
         0.0095  0       0.0279  0.013   0.024   0.0153  0.0255
         0.015   0.0153  0.0405  0.0211  0.033   0.0179  0.0256
         0.0289  0.0284  0.05    0.0344  0.039   0.0558  0.036
         0.044   0.0363  0.053   0.0517  0.0571  0.0898  0.0362
         0.0578  0.0422  0.0518  0.0737  0.0691  0.0944  0.0514
         0.0653  0.0509  0.0514  0.0658  0.0781  0.0709  0.0616
         0.0711  0.0584  0.0575  0.0644  0.0751  0.066   0.077
         0.0818  0.0667  0.0717  0.0664  0.0781  0.0628  0.0718
         0.0844  0.0774  0.0873  0.0802  0.0811  0.0672  0.0718
         0.0882  0.0893  0.0902  0.0987  0.0961  0.0747  0.1075
         0.0898  0.1104  0.0938  0.1171  0.0901  0.0755  0.0921
         0.0868  0.112   0.0928  0.0932  0.0781  0.082   0.1026
         0.0844  0.0981  0.0678  0.0783  0.0691  0.0808  0.0922
         0.0771  0.0867  0.0498  0.0562  0.048   0.0483  0.0719
         0.0527  0.0728  0.0312  0.0337  0.033   0.0453  0.0461
         0.0364  0.0551  0.0215  0.0177  0.027   0.0274  0.0306
         0.0185  0       0.0253  0.0176  0.024   0.0145  0];
I = BIArr(:,tst)';

function E = SpeechSptr(VclEfrt);
% This function returns the standard speech spectrum level from Table 3
Ei = [32.41 33.81 35.29 30.77;
      34.48 33.92 37.76 36.65;
      34.75 38.98 41.55 42.5;
      33.98 38.57 43.78 46.51;
      34.59 39.11 43.3  47.4;
      34.27 40.15 44.85 49.24;
      32.06 38.78 45.55 51.21;
      28.3  36.37 44.05 51.44;
      25.01 33.86 42.16 51.31;
      23    31.89 40.53 49.63;
      20.15 28.58 37.7  47.65;
      17.32 25.32 34.39 44.32;
      13.18 22.35 30.98 40.8;
      11.55 20.15 28.21 38.13;
      9.33  16.78 25.41 34.41;
      5.31  11.47 18.35 28.24;
      2.59  7.67  13.87 23.45;
      1.13  5.07  11.39 20.72];
switch lower(VclEfrt)
    case 'normal', E = Ei(:,1)';
    case 'raised', E = Ei(:,2)';
    case 'loud',   E = Ei(:,3)';
    case 'shout',  E = Ei(:,4)';
    otherwise, error('Identifier string to ''E'' not recognized')
end;
% EOF
%% SII Calculations
%% divide signal and noise into different bands (18)
% band pass filter: bandpass digital filter design
Fs = 16000; Ast1 = 60; Ast2 = 60; Ap = 1;

% Band edges [Fst1 Fp1 Fp2 Fst2] for the 18 analysis bands
edges = [  40    140    180    280;
           80    180    225    325;
          125    225    282.5  382.5;
          182.5  282.5  357.5  457.5;
          257.5  357.5  450    550;
          350    450    565    665;
          465    565    715    815;
          615    715    900   1000;
          800    900   1125   1225;
         1025   1125   1425   1525;
         1325   1425   1800   1900;
         1700   1800   2250   2350;
         2150   2250   2825   2925;
         2725   2825   3575   3675;
         3475   3575   4500   4600;
         4400   4500   5650   5750;
         5550   5650   7150   7250;
         7050   7150   7900   8000];

Hds = cell(1,18);
for b = 1:18
    ds = fdesign.bandpass(edges(b,1), edges(b,2), edges(b,3), edges(b,4), ...
                          Ast1, Ap, Ast2, Fs);
    Hds{b} = design(ds, 'equiripple');
end

%% Filter signal and noise into bands
Sband2s  = cell(18,20);  % Signal 2x
Sband5s  = cell(18,20);  % Signal 5x
Sbandp2s = cell(18,20);  % Control: just plane noise
Nband2s  = cell(18,20);  % Noise 2x
Nband5s  = cell(18,20);  % Noise 5x
for n = 1:1:20
    for b = 1:18
        Sband2s{b,n}  = filter(Hds{b}, S2{n});
        Sband5s{b,n}  = filter(Hds{b}, S5{n});
        Sbandp2s{b,n} = filter(Hds{b}, SP2{n} - S2{n});
        Nband2s{b,n}  = filter(Hds{b}, BP{n} - S2{n});
        Nband5s{b,n}  = filter(Hds{b}, CP{n} - S5{n});
    end
end

%% find the max value within each band
max_valueS2s  = zeros(18,20); max_valueN2s = zeros(18,20);
max_valueS5s  = zeros(18,20); max_valueN5s = zeros(18,20);
max_valueSP2s = zeros(18,20);
for n = 1:18
    for m = 1:20
        max_valueS2s(n,m)  = max(Sband2s{n,m});
        max_valueN2s(n,m)  = max(Nband2s{n,m});
        max_valueS5s(n,m)  = max(Sband5s{n,m});
        max_valueN5s(n,m)  = max(Nband5s{n,m});
        max_valueSP2s(n,m) = max(Sbandp2s{n,m});
    end
end

%% Calculate Power
% Signal Level and Noise level
E2  = 20.*log10(max_valueS2s./32767);
N2  = 20.*log10(max_valueN2s./32767);
E5  = 20.*log10(max_valueS5s./32767);
N5  = 20.*log10(max_valueN5s./32767);
NP2 = 20.*log10(max_valueSP2s./32767);

%% Calculate SII
SII_val2 = zeros(20,1); SII_val5 = zeros(20,1); SII_valP2 = zeros(20,1);
for n = 1:1:20
    SII_val2(n)  = SII_test('E', abs(N2(:,n)),  'N', abs(E2(:,n)), 'I', 1);
    SII_val5(n)  = SII_test('E', abs(N5(:,n)),  'N', abs(E5(:,n)), 'I', 1);
    SII_valP2(n) = SII_test('E', abs(NP2(:,n)), 'N', abs(E2(:,n)), 'I', 1);
end
%% Make Graphs
%% BOX PLOTS
figure(1);
boxplot([SNR2 SNR5], 'labels', {'Two Layers', 'Five Layers'}); hold on;
ylabel('SNR (dB)'); title('Signal-to-Noise Ratio'); axis([0 3 -15 15]);

figure(2);
boxplot([STI2 STI5], 'labels', {'Two Layers', 'Five Layers'}); hold on;
ylabel('STI'); title('Speech Transmission Index'); axis([0 3 0 1]);

figure(3);
boxplot([Plane SII2 SII5], 'labels', {'One Layer', 'Two Layers', 'Five Layers'}); hold on;
ylabel('SII'); title('Speech Intelligibility Index'); axis([0 4 0 1]);

%% Correlation Plot
figure(4);
scatter(SNRC, Plane, 'g'); hold on;
scatter(SNR2, SII2, 'r'); hold on;
scatter(SNR5, SII5, 'b'); hold on;
xlabel('Signal-to-Noise Ratio (dB)');
ylabel('Speech Intelligibility Index');
title('SNR vs. SII');
legend('One Layer', 'Two Layers', 'Five Layers');
axis([-25 25 0 1]);

%% Calculate Correlation
[R, P] = corrcoef([SNRC SNR2 SNR5], [Plane SII2 SII5]);
Bibliography
ANSI. "Methods for Calculation of the Articulation Index." ANSI Report No. S3.5-1969
(1969). Print.
ANSI. "Methods for Calculation of the Speech Intelligibility Index." ANSI Report No.
S3.5-1997 (1997). Print.
Brady, Chris. “Communications”. The 737 Technical Site. 1999 : Web. 10 Feb. 2015.
http://www.b737.org.uk/communications.htm.
British Broadcasting Corporation. “Boeing 747 Constant Quiet Flight”. Sound Effects.
Web. 14 April 2015. http://www.sounddogs.com/sound-
effects/64/mp3/864366_SOUNDDOGS__tr.mp3.
Edgar, Julian. "Inside the Black Box". Auto Speed. 11 Dec 2001: Issue 160. Web. 10 Feb.
2015. http://www.autospeed.com/cms/article.html?&title=Inside-the-Black-
Bo&xA=1227.
French, N. R., and J. C. Steinberg. "Factors governing the intelligibility of speech
sounds." The Journal of the Acoustical Society of America 19.1 (1947): 90-119.
Garofolo, John, et al. TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1.
Web Download. Philadelphia: Linguistic Data Consortium, 1993.
Hall, Jim. “Safety Recommendation A-96-166 through -171”. National Transportation
Safety Board. 20 Dec. 1996: 1-6. Print.
Honeywell. "SSCVR: Solid State Cockpit Voice Recorder ED056a Voice Recording
System." Product Description, Mar. 2000.
Houtgast, Tammo, and Herman JM Steeneken. "A review of the MTF concept in room
acoustics and its use for estimating speech intelligibility in auditoria." The
Journal of the Acoustical Society of America 77.3 (1985): 1069-1077.
Killion, Mead C., and H. Gustav Mueller. "Twenty years later: a NEW count-the-dots
method." The Hearing Journal 63.1 (2010): 10-17.
Kryter, K. D. "Some comparisons between rhyme and PB-word intelligibility tests." The
Journal of the Acoustical Society of America 37 (1965): 1146.
Lindgreen, Troels Schmidt, David Pelegrin Garcia, and Eleftheria Georganti.
"Intelligibility of Speech." DTU Technical University of Denmark, 2008. Print.
Musch, Hannes, and Pat Zurek. "Programs." SII: Speech Intelligibility Index. ASA
Working Group S3-79, 2005. Web. 22 Apr. 2015.
http://www.sii.to/html/programs.html.
Schwerin, Belinda, and Kuldip Paliwal. "An improved speech transmission index for
intelligibility prediction." Speech Communication 65 (2014): 9-19. Web. 10 Feb.
2015. http://www.sciencedirect.com/science/article/pii/S0167639314000429.
Sumby, William H., and Irwin Pollack. "Visual contribution to speech intelligibility in
noise." The Journal of the Acoustical Society of America 26.2 (1954): 212-215.
Wojcicki, Kamil. "Add Noise." MATLAB Central File Exchange. 14 July 2011. Web. 22
Apr. 2015. http://www.mathworks.com/matlabcentral/fileexchange/32136-add-
noise/content/addnoise/addnoise.m.
Curriculum Vitae
Jane Foster
703 362 1001
EDUCATION
Johns Hopkins University Baltimore, MD
Master of Science in Electrical Engineering Expected May 2015
Bachelor of Science in Electrical Engineering, Departmental Honors 2014
Bachelor of Arts in French, Departmental Honors 2014
Dean’s List, University Honors
Related Courses: Signals and Systems, Circuits, Digital Signal Processing, Image
Processing and Analysis, Computer Architecture, Electronics Design Lab, Projects in the
Design of a Chemical Car, Mechatronics, Audio Signal Processing, Robot
Sensors/Actuators
EXPERIENCE
Recorders Division Student Trainee Washington, DC
National Transportation Safety Board Department of Research and Engineering 2014
- Process and analyze parametric data stored in vehicle recording systems, writing
technical reports on the devices
- Learn about accident investigation processes, surface vehicle navigation, and aircraft
operations
Undergraduate Researcher Paris, France
École Normale Supérieure 2013
- Focus on understanding brain processes in perceiving sounds in complex acoustic
environments
- Developed a computational model for the statistics of hearing perception using
MATLAB and ran tests on the fundamentals of vowel perception
Electrical Engineering Intern Chantilly, VA
Scitor Corporation 2012
- Researched RADAR and link budgets, created images to aid in optimization of office
space and organization
ACTIVITIES
Graduate Advisor, President, Vice President of Chapter Affairs, Treasurer 2010-2015
Johns Hopkins University IEEE
Company Dancer 2010-2015
Johns Hopkins University Ballet Company
Team Leader 2012-2013
Power Beaming Design Team
AWARDS
Electrical and Computer Engineering Student Leadership Award May 2014