
RADC-TR-86-135 In-House Report August 1986

COMPARISON OF THE EFFECTS OF BROAD-BAND NOISE ON SPEECH INTELLIGIBILITY AND VOICE QUALITY RATINGS

Caldwell P. Smith

APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED

ROME AIR DEVELOPMENT CENTER Air Force Systems Command

Griffiss Air Force Base, NY 13441-5700

AD/AI76TOO

This report has been reviewed by the RADC Public Affairs Office (PA) and is releasable to the National Technical Information Service (NTIS). At NTIS it will be releasable to the general public, including foreign nations.

RADC-TR-86-135 has been reviewed and is approved for publication.

APPROVED:

J.P. VETRANO Chief, COMSEC Engineering Office Electromagnetic Sciences Division

APPROVED:

ALLAN C. SCHELL Chief, Electromagnetic Sciences Division

FOR THE COMMANDER

JOHN A. RITZ Plans and Programs Division

DESTRUCTION NOTICE - For classified documents, follow the procedures in DOD 5200.22-M, Industrial Security Manual, Section 11-19 or DOD 5200.1-R, Information Security Program Regulation, Chapter IX. For unclassified, limited documents, destroy by any method that will prevent disclosure of contents or reconstruction of the document.

If your address has changed or if you wish to be removed from the RADC mailing list, or if the addressee is no longer employed by your organization, please notify RADC (EEV) Hanscom AFB MA 01731-5000. This will assist us in maintaining a current mailing list.

Do not return copies of this report unless contractual obligations or notices on a specific document requires that it be returned.

Unclassified
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION / DOWNGRADING SCHEDULE:
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited.
4. PERFORMING ORGANIZATION REPORT NUMBER(S): RADC-TR-86-135
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: Rome Air Development Center
6b. OFFICE SYMBOL (if applicable): EEV
6c. ADDRESS (City, State, and ZIP Code): Hanscom AFB, Massachusetts 01731-5000
7a. NAME OF MONITORING ORGANIZATION: Rome Air Development Center (EEV)
7b. ADDRESS (City, State, and ZIP Code): Hanscom AFB, Massachusetts 01731
8a. NAME OF FUNDING/SPONSORING ORGANIZATION: Rome Air Development Center
8b. OFFICE SYMBOL (if applicable): EEV
8c. ADDRESS (City, State, and ZIP Code): Hanscom AFB, Massachusetts 01731-5000
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:

10. SOURCE OF FUNDING NUMBERS:
    PROGRAM ELEMENT NO: 33401F
    PROJECT NO: 7820
    TASK NO: 03
    WORK UNIT ACCESSION NO: 01

11. TITLE (Include Security Classification):

Comparison of the Effects of Broad-Band Noise on Speech Intelligibility and Voice Quality Ratings

12. PERSONAL AUTHOR(S): Caldwell P. Smith

13a. TYPE OF REPORT: In-House
13b. TIME COVERED: FROM 85-8-1 TO 86-3-31
14. DATE OF REPORT (Year, Month, Day): 1986 August
15. PAGE COUNT: 40
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES: FIELD 17, GROUP 02
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number): Voice communication; Intelligibility; Articulation index; Speech perception

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

Results of a study of effects of broad-band noise on speech intelligibility and voice quality are presented, with a comparison of three methods for evaluating speech signals: the Diagnostic Rhyme Test (DRT) for speech intelligibility, the Diagnostic Acceptability Measure (DAM) test for voice quality and acceptability, and the Mean Opinion Score (MOS), also for evaluating speech quality and acceptability.

Speech samples were combined with broad-band noise, with accurate calibration of speech-to-noise energy ratios by using a new measurement algorithm developed under RADC sponsorship. Four scramblings (randomizations) of the Diagnostic Rhyme Test were prepared with each of three male speakers, for assessing speech intelligibility. Connected speech samples from the three speakers were prepared for assessing voice quality and acceptability. The processed speech samples were digitally recorded for subsequent presentation to listener crews.

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: Caldwell P. Smith
22b. TELEPHONE (Include Area Code): (617) 377-3281
22c. OFFICE SYMBOL: RADC/EEV

DD FORM 1473, 84 MAR: 83 APR edition may be used until exhausted; all other editions are obsolete.
SECURITY CLASSIFICATION OF THIS PAGE

Unclassified

19. (Contd)

The resulting intelligibility scores were used in constructing the relationship between Diagnostic Rhyme Test scores and values of the articulation index. From this relationship it was possible to estimate, from previous standardized intelligibility data, the percent accuracy with which listeners would be expected to receive stereotyped operational messages ("sentences known to listeners") corresponding to various DRT intelligibility scores. For example, it was estimated that a DRT intelligibility score of 70, which has been estimated to be the threshold value below which intelligibility would be unacceptable, corresponds approximately to the value of the articulation index for which listeners would be expected to get 92 percent correct "sentences known to listeners."

Intelligibility scores and voice quality ratings were also used in constructing regression models relating scores with speech-to-noise energy ratios. Category scales for intelligibility scores and voice quality ratings were compared on the basis of their common relation to the scale of S/N ratios. It was determined that categories (for example, "very good", "fair", and so on) for intelligibility scores and for DAM voice quality ratings were in rough agreement at the top and bottom of the range studied here (30 dB S/N ratio to 6 dB S/N ratio) but agreed poorly in the middle of the range. Categories for mean opinion scores were in poor agreement with the other two category scales.

Unclassified

Preface

The method for calibrating speech-to-noise energy ratios and the software

for using that algorithm were devised by J.T. Sims. The voice recordings of

speech with noise were prepared by Paul Gatewood. The assistance of

Marianne Bahia, John Fisher, John Olender, Fred Reinhard, Denis Robitaille

and Luigi Spagnuolo in making this study possible is gratefully acknowledged.


Contents

1. INTRODUCTION 1

2. SPEECH MATERIALS FOR ASSESSING EFFECTS OF BROAD-BAND NOISE 2

3. LISTENER TESTS 3

4. INTELLIGIBILITY TESTS RESULTS 3

5. COMPARISONS WITH THE ARTICULATION INDEX 8

6. SPEECH PERFORMANCE WITH "OPERATIONAL MESSAGES" 10

7. VOICE QUALITY AND ACCEPTABILITY TESTS 11

8. MEAN OPINION SCORES 15

9. COMPARISON OF CATEGORY SCALES 17

10. DISCUSSION 18

11. CONCLUSIONS 18

REFERENCES 21

APPENDIX A: Details of Variation in Diagnostic Rhyme Test Feature Scores With Speech-to-Noise Energy Ratios 23

Illustrations

1. Scatter Diagram of Overall Intelligibility Scores by Speakers for Speech in Broad-band Noise, With a Regression Model Based on the Reciprocal of S/N Ratio 4

2. Category Scale for Diagnostic Rhyme Test Intelligibility Scores, With Examples of Voice Processor Categories 5

3. Intelligibility Scores and Regression Model for Speech in Broad-band Noise, in Relation to the Category Scale for Diagnostic Rhyme Test Intelligibility Scores 6

4. Intelligibility Scores vs S/N Ratio, With Multiple Regression Lines Calculated for Individual Male Speakers 7

5. Regression Lines vs S/N Ratio for the Individual Phonetic Features That Contribute to Intelligibility 8

6. Intelligibility Contours for Different Speech Materials Plotted vs the Articulation Index, With Contour for Diagnostic Rhyme Test Scores Obtained From These Studies 9

7. Scatter Diagram of Scores for Signal Quality vs S/N Ratio With Regression Model and 95 Percent Confidence Limits for the Ensemble of Scores 12

8. Scatter Diagram of Scores for Background Quality vs S/N Ratio With 2nd Order Regression Model Based on the Reciprocal of S/N Ratio 12

9. Scatter Diagram of Scores for Overall Voice Quality vs S/N Ratio With 2nd Order Regression Model Based on the Reciprocal of S/N Ratio 13

10. Category Scale for Diagnostic Acceptability Measure Voice Quality Scores With Examples of Voice Processor Categories 14

11. Scores for Overall Voice Quality vs S/N Ratio and Regression Model, in Relation to the Category Scale for Diagnostic Acceptability Measure Voice Quality Scores 15

12. Scatter Diagram of Mean Opinion Scores vs S/N Ratio With Linear Regression Model 16

13. Mean Opinion Scores and Linear Regression Model in Relation to the Categories Used by Listeners 16

14. Comparison of the Category Scales for Intelligibility Scores, Voice Quality Scores, and Mean Opinion Scores in Relation to the Scale for S/N Ratio and the Estimates of Values of the Articulation Index 17

Al. Regression Models for the Scores for Voicing-present, Voicing-absent, and Voicing (total) vs the Reciprocal of S/N Ratio 24

A2. Regression Models for Voicing-frictional, Voicing-non-frictional, and Voicing (total) vs the Reciprocal of S/N Ratio 24

A3. Regression Models for Voicing-frictional vs the Reciprocal of S/N Ratio, Indicating the Differences Among the Three Male Speakers 25


A4. Regression Models for Voicing (total) vs the Reciprocal of S/N Ratio, Indicating Differences Among the Three Male Speakers 25

A5. Regression Models for Nasality Scores vs S/N Ratio Indicating Virtually No Differences Between the Total Score, and the Present and Absent State 26

A6. Regression Models for the Nasality Scores vs S/N Ratio, Indicating Only Slight Differences Between Total Scores and the Grave and Acute States 26

A7. Regression Models for Sustention (total) vs Reciprocal of S/N Ratio Showing Differences Among the Three Male Speakers 27

A8. Regression Models for Sustention-voiced vs the Reciprocal of S/N Ratio Showing Differences Among the Three Male Speakers 27

A9. Regression Models for Sustention-unvoiced vs Reciprocal of S/N Ratio Showing Differences Among the Three Male Speakers 28

A10. Regression Models for Sibilation vs the Reciprocal of S/N Ratio, Showing Differences Between the Present and Absent State of Sibilation 29

A11. Regression Models for Sibilation vs the Reciprocal of S/N Ratio, Showing Differences Between the Voiced and Unvoiced State of Sibilation 29

A12. Regression Models for Graveness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Present and Absent State of the Graveness Feature 30

A13. Regression Models for Graveness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Voiced and Unvoiced State of the Graveness Feature 30

A14. Regression Models for Compactness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Present and Absent State of the Compactness Feature 31

A15. Regression Models for Compactness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Voiced and Unvoiced State of the Compactness Feature 31

Tables

1. Estimate of Speech Performance With "Operational Messages"


Comparison of the Effects of Broad-Band Noise on Speech Intelligibility and Voice Quality Ratings

1. INTRODUCTION

In the past few years, increased attention has been focused on effects of

acoustic background noise in degrading performance of digital voice communications

processors. In order to expand the information on this problem, a number of noise

environments of special interest including jet and prop aircraft cabin noise, typical

office noise, noise in shipboard environments, and the background noise in certain

vehicles were measured and recorded, and subsequently simulated in sound rooms

for the purpose of preparing standardized speech recordings representative of the

effects of those noise environments, for assessing speech intelligibility and voice

quality of various digital voice communications processors.

Those studies did not attempt to address the effects of noise environments on

listeners, for several reasons. Designers of digital voice processors and pre-

processors have virtually no options available for modifying their algorithms to

remedy that problem. In many cases, appropriate headphones and ear protectors

provide adequate solutions to the problem of noisy listener environments. Of

particular importance, speech testing to assess voice quality and naturalness is

most critically conducted when listeners are in a quiet environment, since when

listeners are placed in a noisy acoustic environment for the purpose of conducting

(Received for publication 7 August 1986)

speech tests, the noise can tend to mask distortions and make systems having

degraded voice quality sound more acceptable.

Using recorded speech test materials representing the various noise environ-

ments that were simulated, it has been possible to conduct intelligibility and voice

quality tests of voice processors with a high degree of reliability and repeatability

of test results. However, it was found that there were wide variations, as much

as 10 to 12 dB, in the speech-to-noise energy ratios of different talkers used in

these simulations, even under conditions of close control of sound pressure levels,

careful phasing of transducers to obtain uniform noise fields, and close attention

to details of microphone placement, instructions to talkers, and so on.

Facing a need to accurately calibrate dozens of recordings of speech by

multiple talkers in various noise environments, a study was funded under which a

contractor worked in the Rome Air Development Center speech test and evaluation

facility to develop a measurement algorithm that might be used to facilitate accurate,

efficient measurement and calibration of speech-to-noise energy ratios. The

successful result of that study has now been published in the literature, and the

measurement algorithm is now being used to accurately calibrate each speaker's

speech recordings in the speech test and evaluation library of recordings used by

the Department of Defense Digital Voice Processor Consortium. The algorithm

was also used to calibrate speech-to-noise energy ratios in this pilot study.

2. SPEECH MATERIALS FOR ASSESSING EFFECTS OF BROAD-BAND NOISE

This study utilized existing recordings of three male speakers in a quiet non-

reverberant acoustic environment, using a high-quality (Altec 659A) dynamic

microphone in a close-talking position approximately 6 cm from the lips. The

reproduced speech signals were electrically mixed with white noise from a broad-

band noise generator and both were low-pass filtered at 4 kHz. Speech levels were

standardized using the measurement algorithm of Brady, and using the calibration

method of Sims, the speech-to-noise energy ratios were successively set at 6 dB,

12 dB, 18 dB, 24 dB, and 30 dB. At each S/N ratio high quality digital audio

recordings were prepared with a Sony PCM Fl digitizer, using sentence lists for

the purpose of assessing voice quality and acceptability and four scramblings

1. Sims, J.T. (1985) A speech-to-noise ratio measurement algorithm, J. Acoust. Soc. Am., 78(No. 5):1671-1674.

2. Brady, P.T. (1968) Equivalent peak level: a threshold-independent speech-level measure, J. Acoust. Soc. Am., 44:695-699.

(randomizations) of the Diagnostic Rhyme Test (DRT) for assessing speech intel-

ligibility, for each of the three male speakers.
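The mixing step described above can be sketched in code. This is a minimal illustration under a simplifying assumption: it uses a total-energy definition of the speech-to-noise ratio, whereas the report relied on Brady's equivalent-peak-level speech measure and Sims' calibration algorithm. The function name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, snr_db, rng=None):
    """Mix a speech waveform with white noise scaled to a target
    speech-to-noise energy ratio in dB (total-energy definition)."""
    if rng is None:
        rng = np.random.default_rng(0)
    speech = np.asarray(speech, dtype=float)
    noise = rng.standard_normal(len(speech))
    speech_energy = np.sum(speech ** 2)
    noise_energy = np.sum(noise ** 2)
    # Scale the noise so 10*log10(speech_energy / noise_energy) == snr_db.
    target_ratio = 10.0 ** (snr_db / 10.0)
    noise *= np.sqrt(speech_energy / (noise_energy * target_ratio))
    return speech + noise
```

Running this once per target ratio (6, 12, 18, 24, and 30 dB) would produce the five conditions used in the listening tests.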

3. LISTENER TESTS

Evaluations of the speech test materials were performed "blind" by a contractor, Dynastat, Inc.: ten-member listener crews were presented the

reproduced digital recordings at an optimum listening level over headphones in a

sound room. Listener judgements of voice quality and acceptability were assessed

independently by two methods: the Diagnostic Acceptability Measure (DAM) test of Voiers, and by mean opinion scores (MOS).

4. INTELLIGIBILITY TESTS RESULTS

Results of these intelligibility tests are summarized in Figure 1. Using three

talkers and four replications of the test at each of the five S/N ratios provided 12

speaker scores at each S/N ratio, or 60 scores in all. The scatter diagram of

scores shows the variation in scores that occurred at each S/N ratio, and an

exaggeration of a typical tendency for dispersion of scores to vary inversely with

the average score. Several regression models were calculated for the relationship between

intelligibility scores and S/N ratio, which led to a choice of the equation and

regression line shown plotted in Figure 1, expressing intelligibility in relation

to the reciprocal of the S/N ratio in dB. That particular regression model resulted in a value of r² = 0.91 and a standard error of estimate of 2.48. The

regression equation is obviously not useful for extrapolating outside the range from

6 to 30 dB S/N (on this scale, 0 dB S/N is at infinity). However, the regression

model is suited for the purpose intended here, of relating this data to a scale of

categories of speech intelligibility scores.

3. Voiers, W.D. (1977) Diagnostic acceptability measure for speech communications systems, IEEE Proc. ICASSP, 77CH1197-3 ASSP, pp. 204-207.

4. CCITT (1981) Telephone Transmission Quality: Recommendations of the P Series, Yellow Book Vol. V, ITU, Geneva.
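The chosen regression model can be evaluated directly. The sketch below encodes the Figure 1 equation, DRT score = 101.5 - 158.6/(S/N in dB), and enforces the stated validity range; the function name is illustrative, not from the report.

```python
def predicted_drt(snr_db):
    """Figure 1 regression: intelligibility vs the reciprocal of S/N.
    Fitted with r^2 = 0.91 over 6-30 dB; not valid outside that range
    (on this reciprocal scale, 0 dB S/N lies at infinity)."""
    if not 6 <= snr_db <= 30:
        raise ValueError("model fitted only for 6 to 30 dB S/N")
    return 101.5 - 158.6 / snr_db
```

At the endpoints this reproduces the average scores later tabulated in Table 1: about 96 at 30 dB and about 75 at 6 dB.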

Figure 1. Scatter Diagram of Overall Intelligibility Scores by Speakers, for Speech in Broad-band Noise, With a Regression Model Based on the Reciprocal of S/N Ratio. [Plot: DRT score = 101.5 - 158.6/(S/N in dB), r² = 0.91, for 6 < S/N < 30 dB; abscissa runs from 30 to 6 dB]

The category scale for DRT intelligibility scores was established to assist in

the interpretation of intelligibility scores by users and planners of digital voice

communications systems. Intelligibility scores are classified in terms of eight

categories, ranging from "excellent" to "unacceptable", based on the ranges

illustrated in Figure 2. This category scale, which has been published previously, rates intelligibility scores below 70 as "unacceptable". However, there has been some evidence that highly stereotyped messages well-known to talkers and listeners can be successfully exchanged over a telephony channel even under conditions such that the average intelligibility of the channel (assessed with the Diagnostic Rhyme Test) is below 70.

5. Smith, C. P. (1983) Narrowband (LPC-10) Vocoder Performance Under Combined Effects of Random Bit Errors and Jet Aircraft Cabin Noise, RADC-TR-83-293, ADA141333. Rome Air Development Center, Griffiss AFB, N. Y.

6. Smith, C. P. (1983) Relating the performance of speech processor to the bit error rate. Speech Technology 2(No. 1 ):41-53.

Figure 2. Category Scale for Diagnostic Rhyme Test Intelligibility Scores, With Examples of Voice Processor Categories. [Scale residue: eight categories from "excellent" (e.g., high-fidelity speech, CVSD-32) through "very good" and "good" (e.g., CVSD-16, CONUS voice-grade telephone service, LPC-10 under "ideal" conditions), down to "very poor" (e.g., an experimental 800-bps voice processor) and "unacceptable" (e.g., LPC-10 in a helicopter); intermediate examples include LPC-10 with and without error protection at various bit error rates]

When the category scale is combined with the intelligibility scores and regression

model obtained in this study, the result presented in Figure 3 is obtained. A

30 dB S/N ratio resulted in scores distributed about equally in the "excellent" and

the "very good" categories, with the average value approximately at the boundary

between these categories.

The average score at 24 dB S/N ratio was in the "very good" category, with a few speaker scores in the "excellent" category.

All of the scores clustered in the "very good" category for the 18 dB S/N

condition.

Dispersion of the individual scores was noticeably increased at 12 dB S/N ratio,

with individual scores ranging from "very good" to "fair", with the average value

falling in the "good" category. Still greater dispersion was evident at 6 dB S/N,

individual scores ranging from "fair" (two), to "poor" (four), to "very poor" (five)

to "unacceptable" (one). The average intelligibility obtained at this S/N ratio was

approximately at the boundary between the "poor" and "very poor" categories.

Figure 3. Intelligibility Scores and Regression Model for Speech in Broad-band Noise, in Relation to the Category Scale for Diagnostic Rhyme Test Intelligibility Scores. [Plot: intelligibility scores vs S/N ratio (30 to 6 dB), with horizontal category bands from "excellent" through "very good", "good", "moderate", "fair", "poor", and "very poor" to "unacceptable"]

Individual talkers have been consistently found in hundreds of tests to exhibit

significant differences in intelligibility scores (any intelligibility score based on

a single speaker should be viewed with suspicion). Significant differences were

found in the scores for these three speakers, though the background noise added

to the dispersion of the scores and made the speaker differences less conspicuous.

The relative ranking of the speakers' scores tended to be maintained at each S/N ratio; consequently an alternative regression model with separate regression lines calculated for each speaker and having a common slope resulted in an increase of r² to 0.94 and a reduction of the mean square residual. The alternative regression

model is illustrated in relation to the scatter diagram of scores in Figure 4.
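A common-slope, separate-intercept model of this kind can be fitted by ordinary least squares with a design matrix holding one dummy (intercept) column per speaker plus a shared 1/(S/N) column. A sketch using NumPy; the function name and interface are illustrative, not from the report.

```python
import numpy as np

def fit_common_slope(snr_db, scores, speakers):
    """Least-squares fit of score = origin[speaker] + slope / snr_db,
    with one intercept (origin) per speaker and a single shared slope.
    The slope coefficient comes out negative for these data."""
    snr = np.asarray(snr_db, dtype=float)
    y = np.asarray(scores, dtype=float)
    labels = sorted(set(speakers))
    # Design matrix: one dummy column per speaker plus a 1/(S/N) column.
    X = np.zeros((len(y), len(labels) + 1))
    for i, spk in enumerate(speakers):
        X[i, labels.index(spk)] = 1.0
    X[:, -1] = 1.0 / snr
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(labels, coef[:-1])), coef[-1]
```

Applied to noise-free synthetic data built from the Figure 4 parameters (origins 103.2 and 99.6 for speakers RH and JE, shared slope -158.6), the fit recovers those values exactly.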

A "fringe benefit" of intelligibility testing with the Diagnostic Rhyme Test is

that it provides separate, independent scores for various phonetic features that

contribute to intelligibility and permits an evaluation of the effects of noise on

7. Voiers, W.D. (1977) Diagnostic evaluation of speech intelligibility, in Speech Intelligibility and Speaker Recognition, M. Hawley, Ed., Dowden, Hutchinson & Ross, Stroudsburg, PA, pp. 374-387.

8. Voiers, W.D. (1983) Evaluating processed speech using the Diagnostic Rhyme Test, Speech Technology, 1(No. 3):30-39.

various components of intelligibility. Those findings are summarized in Figure 5.

Nasality was the least impaired by noise interference, followed in order of in-

creasing susceptibility to noise by voicing, compactness, and sibilation; graveness

and sustention (the feature that distinguishes between sustained and abrupt con-

sonants) were the features most vulnerable to noise interference. The detailed

effects of noise interference on various combinations of feature states, for

example, the present and absent states, and the voiced and unvoiced states of the

various features, are detailed in Appendix A.

Figure 4. Intelligibility Scores vs S/N Ratio, With Multiple Regression Lines Calculated for Individual Male Speakers. [Plot: common-slope model with r² = 0.94, for 6 < S/N < 30 dB; DRT score = origin - 158.6/(S/N in dB), with per-speaker origins including RH 103.2 and JE 99.6]

Figure 5. Regression Lines vs S/N Ratio for the Individual Phonetic Features That Contribute to Intelligibility. [Plot legend: feature origin and slope (vs the reciprocal of S/N); some digits are uncertain in the source: Nasality 99.2, -18.6; Voicing 97.2, -53.6; Compactness 104.9, -169.8; Sibilation 108.1, -218.4; Graveness 96.8, -227.4; Sustention 111.8, -272.5]
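The feature regression lines of Figure 5 can be evaluated numerically. The origins and slopes below are read from the scanned figure and some digits are uncertain; the ranking they produce, however, matches the order stated in the text (nasality least vulnerable to noise, sustention most vulnerable).

```python
# Origin and slope (vs the reciprocal of S/N in dB) for each phonetic
# feature, as read from Figure 5; some digits are uncertain in the scan.
FEATURES = {
    "nasality":    (99.2,  -18.6),
    "voicing":     (97.2,  -53.6),
    "compactness": (104.9, -169.8),
    "sibilation":  (108.1, -218.4),
    "graveness":   (96.8,  -227.4),
    "sustention":  (111.8, -272.5),
}

def feature_score(feature, snr_db):
    """Predicted DRT feature score at a given S/N ratio (6-30 dB)."""
    origin, slope = FEATURES[feature]
    return origin + slope / snr_db

def by_noise_susceptibility():
    """Features ordered least to most vulnerable to noise (by |slope|)."""
    return sorted(FEATURES, key=lambda f: abs(FEATURES[f][1]))
```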

5. COMPARISONS WITH THE ARTICULATION INDEX

The articulation index provides a means of predicting speech intelligibility of

different types of speech materials (nonsense syllables, phonetically balanced word

lists, sentences) in relation to the speech signal level and the interfering noise level

and their energy spectra, in combination with different listening conditions. Those

relationships have been summarized in an American National Standard that pro-

vided the basis for the summary shown in Figure 6. The curve labeled "Rhyme

Tests" in the figure is based on earlier versions of rhyme tests that, unlike the

Diagnostic Rhyme Test, did not provide an adjustment of intelligibility scores for

chance effects.

9. Beranek, L.L. (1947) The design of speech communications systems, IRE Proc., 35(No. 9):880-890.

10. ANSI S3.5-1969 (1969) American National Standard Methods for the Calculation of the Articulation Index, American National Standards Institute, N.Y.

Figure 6. Intelligibility Contours for Different Speech Materials Plotted vs the Articulation Index, With Contour for Diagnostic Rhyme Test Scores Obtained From These Studies. This figure derives from ANSI S3.5-1969, American National Standard Methods for the Calculation of the Articulation Index. This version is non-standard, in that the Diagnostic Rhyme Test curve has been added. Also, the ordinate scale, which is usually labeled "Percent of syllables, words, or sentences understood correctly", is relabeled "Speech Intelligibility Score", as DRT scores are not "percent correct" but include a correction for a priori probabilities. [Plot: intelligibility score (0-100) vs articulation index (0.1-0.9)]

This study made it possible to estimate values of the articulation index at each

of the S/N ratios of these tests and construct a new curve that has been added to

the figure, estimating the variation in Diagnostic Rhyme Test scores with values

of the articulation index over the range studied here.

The ordinate scale in Figure 6 has customarily been labeled "Percent of

syllables, words, or sentences understood correctly. " However, Diagnostic

Rhyme Test scores are corrected for a priori probabilities with the calculation

DRT score = 100 × (Number of items right - Number of items wrong) / (Total number of items)

Accordingly, the ordinate scale in Figure 6 was re-labeled "Speech Intelligibility

Score." The dashed contour labeled "Diagnostic Rhyme Test" (three male speakers)

represents the relationship calculated in this study. If these DRT scores were

modified to remove the correction for chance and thus express "Percent correct"

a curve would be obtained that approximates the older curve labeled "Rhyme Tests"

for the range investigated here. However, the asymptote of the curve would not

approach 100 for "perfect conditions" (A.I. = 1.0) as there are typically about

2 percent listener errors for Diagnostic Rhyme Tests conducted with high fidelity

speech signals.
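The chance-corrected score defined above can be stated as a one-line function. In a two-alternative test such as the DRT, pure guessing drives the right and wrong counts toward equality, so the corrected score tends toward 0 rather than 50.

```python
def drt_score(n_right, n_wrong, n_total):
    """Chance-corrected DRT score: 100 * (right - wrong) / total.
    With two-alternative items, pure guessing (right ~ wrong) scores
    near 0, while an error-free run scores 100."""
    return 100.0 * (n_right - n_wrong) / n_total
```

For example, answering 75 percent of items correctly on a two-alternative test yields a corrected score of 50, not 75, which is why the DRT contour in Figure 6 sits below a raw "percent correct" curve.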

6. SPEECH PERFORMANCE WITH "OPERATIONAL MESSAGES"

The relationships between speech intelligibility and the articulation index

summarized in Figure 6 permit estimation of speech performance with stereotyped

voice messages well-known to listeners (typical "operational messages" that have

been advocated for use in speech test and evaluation), shown in Table 1.

Extrapolation of the curve for DRT scores vs the AI gives an estimate of an

articulation index of about 0. 30 for a DRT score of 70, the score that has been

postulated as the threshold score representing the boundary between

"unacceptable" and "very poor" intelligibility performance. While the ANSI

relationships shown in Figure 6 suggest that well-known stereotyped voice messages

might be received with better than 90 percent accuracy under such conditions, the

relationships also suggest that were any emergency to occur in which communi-

cators were required to depart from their usual stereotyped communications and

need to use unfamilar words and phrases, the intelligibility performance of that

channel would present serious difficulties. Figure 6 also tends to explain why

some speech tests using "operational messages" resulted in subjects producing

judgements that a voice processor had acceptable performance, even though that

processor had scored below 70 in formal Diagnostic Rhyme Tests.


Table 1. Estimate of Speech Performance With "Operational Messages" Based on the Articulation Index

S/N Ratio   Avg. DRT Score   Estimated A.I.   Est. of avg. percent correct:
                                              sentences known to listeners
                                              ("operational messages")
30 dB            96               .85                 99%
24 dB            95               .76                 99%
18 dB            93               .64                 98%
12 dB            87               .50                 97%
 6 dB            75               .34                 94%
(by extrapolation:)
                (70)             (.30)               (92%)
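Within the measured range, the Table 1 pairing of average DRT scores with estimated articulation-index values can be interpolated. The piecewise-linear sketch below (the `estimate_ai` name and the interpolation method are assumptions, not the report's procedure) stops at the measured endpoints; the report's DRT-70 figure of A.I. ≈ 0.30 comes from extrapolation, handled separately.

```python
# (Avg. DRT score, estimated articulation index) pairs from Table 1.
TABLE1 = [(75, 0.34), (87, 0.50), (93, 0.64), (95, 0.76), (96, 0.85)]

def estimate_ai(drt):
    """Piecewise-linear interpolation of the articulation index for a
    given DRT score, restricted to the measured range of Table 1."""
    pts = sorted(TABLE1)
    if not pts[0][0] <= drt <= pts[-1][0]:
        raise ValueError("outside measured range; the report extrapolates "
                         "DRT 70 -> A.I. ~0.30 separately")
    for (d0, a0), (d1, a1) in zip(pts, pts[1:]):
        if d0 <= drt <= d1:
            return a0 + (a1 - a0) * (drt - d0) / (d1 - d0)
```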

7. VOICE QUALITY AND ACCEPTABILITY TESTS

As the voice quality and acceptability tests were not replicated there were far

fewer listener scores than for intelligibility. Separate scores for signal quality

and for background quality are obtained with the Diagnostic Acceptability Measure

(DAM) test. A weighted combination of background and signal quality scores pro-

duces scores for overall quality, called the Composite Acceptability Estimate

(DAM/CAE). With additive background and linear processing it might be anticipated

that background quality scores would vary with the S/N ratio and signal quality

scores would remain relatively constant. This was not the case. While background

quality scores varied more widely, there was also significant variation in the

judgements of signal quality at the various background noise levels. Figure 7 shows

the scatter diagram of signal quality scores (DAM/CSA) in relation to S/N ratio

with the regression line and 95 percent confidence limits for the ensemble of

scores, modelled in relation to the reciprocal of S/N ratio.

Scores for background quality, representing the Composite Background Accept-

ability (DAM/CBA) are shown in relation to S/N ratio in Figure 8. A 2nd order

regression model based on the reciprocal of S/N ratio resulted in a value of r² = 0.98 and a standard error of estimate of 1.38, with a range of approximately 25 points compared with a range of approximately 10 points for signal quality.


Figure 7. Scatter Diagram of Scores for Signal Quality With Regression Line and 95 Percent Confidence Limits for the Data Points, Based on the Reciprocal of S/N Ratio. [Plot: DAM(CSA) = +79.2 - 66.4/(S/N in dB), r² = 0.62, standard error of estimate 2.699, for 6 < S/N < 30 dB; male speakers RH, JE, CH]

Figure 8. Scatter Diagram of Scores for Background Quality vs S/N Ratio With 2nd Order Regression Model Based on the Reciprocal of S/N Ratio. [Plot: DAM(CBA) = +88.3 - 468.7/S + 1328.9/S², where S = S/N in dB; r² = 0.98, standard error of estimate 1.38; male speakers RH, JE, CH]


Scores for overall quality are not the average of the signal and background

quality scores, but a function of the product of the two. A scatter diagram of those

scores representing the Composite Acceptability Estimate (DAM/CAE) is presented in Figure 9 with a 2nd order regression curve based on the reciprocal of S/N ratio. This regression model resulted in a value of r² = 0.96 and a standard error of estimate of 1.94.
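The three DAM regression models (Figures 7 through 9) can be collected as simple functions of S = S/N in dB; the coefficients below are as read from the figures. Evaluating them over 6 to 30 dB reproduces the ranges discussed in the text: roughly 10 points for signal quality versus roughly 25 points for background quality.

```python
def dam_csa(snr_db):
    """Composite signal quality: first-order reciprocal model (Figure 7)."""
    return 79.2 - 66.4 / snr_db

def dam_cba(snr_db):
    """Composite background quality: 2nd-order reciprocal model (Figure 8)."""
    return 88.3 - 468.7 / snr_db + 1328.9 / snr_db ** 2

def dam_cae(snr_db):
    """Composite acceptability estimate: 2nd-order model (Figure 9).
    Overall quality is a function of the product of the signal and
    background scores, not their average, so it has its own fitted curve."""
    return 77.6 - 474.9 / snr_db + 1510.9 / snr_db ** 2
```

As with the intelligibility model, these curves should not be extrapolated outside 6 to 30 dB, a caveat that applies even more strongly to the 2nd-order terms.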

Figure 9. Scatter Diagram of Scores for Overall Quality vs S/N Ratio, With a 2nd Order Regression Model Based on the Reciprocal of S/N Ratio

[Plot annotations: Expectation (6 < S/N < 30): DAM(CAE) = +77.6 - 474.9/S + 1510.9/S², where S = S/N in dB; standard error of Est. = 1.94; male speakers (RH, JE, CH).]

The same caveat regarding extrapolation to estimate scores outside the

measured range of S/N ratios (6 dB to 30 dB) expressed for speech intelligibility

data applies here, and even more strongly in the case of the 2nd order regression

models. Again, however, the regression model serves a useful purpose in con-

junction with a scale of categories that has been established to assist in the inter-

pretation of DAM voice quality ratings. The category scale, shown in Figure 10,

utilizes the same eight labels for categories as used for intelligibility scores,

ranging from "excellent" to "unacceptable". The category scale refers to the

scores for overall voice quality (DAM/CAE): no separate category scales for signal and background quality have been attempted.


Figure 10. Category Scale for Diagnostic Acceptability Measure Voice Quality Scores With Examples of Voice Processor Categories

[Table, partly illegible in this copy: DAM score boundaries (including 64, 58, and 53) separate the eight categories from "excellent" down to "unacceptable", with descriptive examples such as high-fidelity speech for "excellent", CVSD at zero BER for "good", speech in office noise for "moderate", and LPC-10 with error protection at 1%, 2%, and 5% BER for the lower categories.]

As with the category scale for intelligibility scores, these labels do not repre-

sent judgements of listeners but judgements of a committee of experts that has

been involved with extensive tests of voice processors and has had opportunities

to obtain informal judgements of voice processor quality from users and correlate

those opinions with results of formal DAM tests. This category scale should be

considered tentative and subject to revision as further knowledge is gained (as is

the case with the category scale for intelligibility scores).

Combining the category scale for voice quality scores with the data obtained in these studies produces the result shown in Figure 11.

The 30 dB S/N ratio resulted in voice quality ratings clustered around the

boundary between the "excellent" and "very good" categories, a result that by

coincidence was similar to that obtained with speech intelligibility scores. The

24 dB S/N ratio resulted in voice quality ratings in the "very good" and "good"

categories, while the 18 dB S/N ratio resulted in scores bracketing the "good"

category. At 12 dB S/N ratio the voice quality scores clustered around the

boundary between "moderate" and "fair". The 6 dB S/N ratio produced scores in

the "poor" category.
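The clustering described above can be reproduced from the overall-quality regression model of Figure 9. The short Python sketch below is an illustration only, using the reported coefficients; the predicted score at 30 dB falls in the mid-60s, near the boundary between "excellent" and "very good" shown in Figure 10.

```python
def dam_cae(snr_db):
    """Composite Acceptability Estimate: 2nd order model in 1/(S/N), Figure 9."""
    return 77.6 - 474.9 / snr_db + 1510.9 / snr_db**2

# Predicted overall-quality scores at the tested speech/noise ratios.
predictions = {snr: round(dam_cae(snr), 1) for snr in (30, 24, 18, 12, 6)}
print(predictions)
```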



Figure 11. Diagnostic Acceptability Measure Scores for Overall Quality vs S/N Ratio and Regression Model, in Relation to the Category Scale for Diagnostic Acceptability Measure Voice Quality Scores

8. MEAN OPINION SCORES

Mean opinion scores result from tests in which listeners make direct judge-

ments of telephony channels. There are differing versions of the test procedure.

In one version, subjects conduct conversations over the telephony channel under

test and then make their judgements of the channel. Other versions involve only

listening to speech samples and then rating the speech sample. In this instance

the ratings were obtained by the latter method, using a five-point scale represent-

ing "excellent", "good", "fair", "poor", and "bad", with results shown in

Figure 12 together with a regression model.

The speech samples on which listeners made their judgements were the same

sentence recordings used for the Diagnostic Acceptability Measure voice quality

tests. The regression model best fitting these points was based on the S/N ratios

rather than the reciprocals of those values, and resulted in a value of r =0. 93 and

a standard error of estimate of 0. 17. The ratings and the regression curve are

shown in relation to the category scale for the mean opinion scores in Figure 13.
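The fitted line can likewise be evaluated directly. The sketch below is an illustrative Python fragment, not the report's procedure; it uses the coefficients reported in Figure 12 to predict the mean opinion score at each tested S/N ratio on the five-point scale.

```python
def mos(snr_db):
    """Linear regression model for Mean Opinion Score vs S/N in dB (Figure 12)."""
    return 1.4 + 0.068 * snr_db

for snr in (6, 12, 18, 24, 30):
    print(f"S/N = {snr:2d} dB: predicted MOS = {mos(snr):.2f}")
```

On this model the predicted MOS ranges from about 1.8 at 6 dB to about 3.4 at the best tested ratio of 30 dB.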


Figure 12. Scatter Diagram of Mean Opinion Scores vs S/N Ratio With Linear Regression Model

[Plot annotations: r² = .93; Expectation (6 < S/N < 30): MOS = +1.4 + 0.068 × (S/N in dB); standard error of Est. = 0.17.]


Figure 13. Mean Opinion Scores and Linear Regression Model in Relation to the Categories Used by Listeners


9. COMPARISON OF CATEGORY SCALES

In Figure 14, the category scales for intelligibility and quality ratings are

compared, based on their common relationship to S/N ratio; the values calculated

for the articulation index at the five S/N ratios tested are also shown.

Figure 14. Comparison of the Category Scales for the Intelligibility Scores, Voice Quality Scores, and Mean Opinion Scores in Relation to the Scale for S/N Ratio and the Estimates of Values of the Articulation Index

[Chart: the category scales for the DRT Intelligibility Score, the DAM(CAE) Voice Quality Score, and the Mean Opinion Score are aligned against the S/N ratio scale (30, 24, 18, 12, and 6 dB), with corresponding articulation index estimates of (.85), (.76), (.64), (.5), and (.34).]

Diagnostic Rhyme Test intelligibility and Diagnostic Acceptability Measure

voice quality scales are in fair agreement at the top and bottom of the range but

show little agreement in the middle of the range. The mean opinion score ratings

gave fair agreement only at the bottom of this scale.


10. DISCUSSION

The discrepancies between the category scales for Diagnostic Rhyme Test

intelligibility scores, Diagnostic Acceptability Measurement voice quality scores,

and Mean Opinion Scores emphasize the different origins of these scales. While

the categories representing the numerical ratings in obtaining Mean Opinion Scores

represent direct judgements by listener crews, the other two category scales were

created by a committee with members long experienced in test and evaluation of

digital voice communications processors and the interpretation of Diagnostic Rhyme

Test Scores and Diagnostic Acceptability Measurement scores. Repeatedly faced with the problem of interpreting to others the significance of scores obtained for voice processors under different test conditions, the committee decided to construct rating scales based on descriptive labels that might be used by anyone wishing to estimate the significance of a particular Diagnostic Rhyme Test intelligibility score or Diagnostic Acceptability Measurement voice quality rating. The category scales

for these scores were constructed by the committee after many discussions of this

issue and extensive reviews of performance data covering a wide variety of pro-

cessors and test conditions.

It is therefore not surprising, considering the ad hoc nature of the Diagnostic

Rhyme Test and Diagnostic Acceptability Measurement category scales, that the

categorizations do not agree well in their relationship to the effects of broad-band

noise on speech. These findings may provide the basis and incentive for further

studies of the discrepancies between the category scales and the issue of whether

the scales might or should be brought into closer agreement.

11. CONCLUSIONS

Tests of speech with additive broad-band noise resulted in the following findings:

• Estimates were made of the relationship between

Diagnostic Rhyme Test intelligibility scores, and

values of the articulation index.

• A comparison of category scales for Diagnostic Rhyme

Test intelligibility scores, Diagnostic Acceptability

Measurement voice quality ratings, and Mean Opinion

Scores was established. Values of the articulation

index in relation to those category scales were

established.

18

• From the relation between Diagnostic Rhyme Test

scores and the articulation index it was possible to

make estimates of the percent correct of stereotyped

messages known to listeners, that is, "operational messages", in relation to Diagnostic Rhyme Test scores.

• Test results highlighted the importance of conducting

speech test and evaluation with multiple speakers, and

of replicating tests whenever practicable.

• The study confirmed the utility of the new algorithm

developed in the Rome Air Development Center speech

test and evaluation facility for measuring and calibrating

speech-to-noise energy ratios.



Appendix A Details of Variation in Diagnostic Rhyme Test Feature

Scores With Speech-to-Noise Energy Ratios

Figures A1 through A15, which follow, present the results of analyzing separately the effects of broad-band noise on each of the intelligibility feature states.

Nasality was little affected by noise over the range tested here; this was true

not only for the overall scores for this feature, but also for the contrasts between

nasality-present and nasality-absent, and for the grave and acute states of this

intelligibility feature.

The remaining feature scores tended to show varying degrees of susceptibility

to noise interference. The voiced state of sibilation was degraded by noise to a

greater degree than the unvoiced state; however, the opposite was true for the features graveness and compactness.

The present states of sibilation and graveness were more susceptible to noise than the absent states; however, the opposite was true of the features voicing and compactness. The feature scores that exhibited significant differences among speakers included voicing (frictional) and voicing (total); sustention (voiced), sustention (unvoiced), and sustention (total).


Figure A1. Regression Models for the Scores for Voicing-present, Voicing-absent, and Voicing (total) vs the Reciprocal of S/N Ratio

[Plot annotations: Voicing, Present: origin 99.4, slope -51.7, r² = .14; Absent: origin 95.1, slope -55.8, r² = .21. Voicing (total): r² = .24; Expectation (6 < S/N < 30): Score = 97.24 - 53.6/(S/N in dB); standard error of Est. = 4.67.]

Figure A2. Regression Models for Voicing-frictional, Voicing-non-frictional, and Voicing (total) vs the Reciprocal of S/N Ratio

[Plot annotations, partly illegible: Voicing, Frictional: origin 83.9; Non-frictional: slope -56.6, r² = .41. Voicing (total): r² = .24; Expectation (6 < S/N < 30): Score = 97.24 - 53.6/(S/N in dB); standard error of Est. = 4.67.]

24

1001 0) L 90-

X70-

^ 60-

^ 50- •H 40.. (7) - 30-

H 20

0

Spaokar

- - RH

JE

— CH

Origin

98.9 84.8 97.9

Voicing: Frictional

r2- .63

Exp.ctation: (6<S/N<30)

Scor. - (origin) -50. 7/(S/N in db)

Standard «rror of Eat.t 5. 6

H r- 3024 18 12 Speech/Noise Ratio (db)

Figure A3. Regression Models for Voicing-frictional vs the Reciprocal of S/N Ratio, Indicating the Differences Among the Three Male Speakers

Figure A4. Regression Models for Voicing (total) vs the Reciprocal of S/N Ratio, Indicating Differences Among the Three Male Speakers

[Plot annotations: speaker origins CH 98.3, RH 100.7, JE 92.8. Voicing (total): r² = .64; Expectation (6 < S/N < 30): Score = (origin) - 53.6/(S/N in dB); standard error of Est. = 3.31.]


Figure A5. Regression Models for Nasality Scores vs S/N Ratio, Indicating Virtually No Differences Between the Total Score and the Present and Absent States

[Plot annotations: Nasality, Present: origin 98.9, slope -13.7; Absent: origin 99.5, slope -23.4, r² = .19. Nasality (total): r² = .18; Expectation (6 < S/N < 30): Score = 99.2 - 18.6/(S/N in dB); standard error of Est. = 1.9.]

Figure A6. Regression Models for Nasality Scores vs S/N Ratio, Indicating Only Slight Differences Between Total Scores and the Grave and Acute States

[Plot annotations: Nasality, Grave: origin 99.6, slope -27.9, r² = .17; Acute: origin 96.8, slope -9.3, r² = .05. Nasality (total): r² = .18; Expectation (6 < S/N < 30): Score = 99.2 - 18.6/(S/N in dB); standard error of Est. = 1.9.]


Figure A7. Regression Models for Sustention (total) vs the Reciprocal of S/N Ratio, Showing Differences Among the Three Male Speakers

[Plot annotations: Sustention (total): r² = .90; Expectation (6 < S/N < 30): Score = (origin) - 272.5/(S/N in dB); standard error of Est. = 4.91.]

Figure A8. Regression Models for Sustention-voiced vs the Reciprocal of S/N Ratio, Showing Differences Among the Three Male Speakers

[Plot annotations: speaker origins RH 97.9, JE 84.6, CH 104.7. Sustention (voiced): r² = .73; Expectation (6 < S/N < 30): Score = (origin) - 174.9/(S/N in dB); standard error of Est. = 7.5.]


Figure A9. Regression Models for Sustention-unvoiced vs the Reciprocal of S/N Ratio, Showing Differences Among the Three Male Speakers

[Plot annotations: speaker origins RH 115.9, JE 105.0, CH 102.7. Sustention (unvoiced): r² = .87; Expectation (6 < S/N < 30): Score = (origin) - 370.1/(S/N in dB); standard error of Est. = 7.8.]


Figure A10. Regression Models for Sibilation vs the Reciprocal of S/N Ratio, Showing the Differences Between the Present and Absent State of Sibilation

[Plot annotations: Sibilation, Present: origin 112.1, slope -316.5, r² = .77; Absent: origin 103.2, slope -103.3, r² = .65. Sibilation (total): r² = .81; Expectation (6 < S/N < 30): Score = 108.1 - 218.4/(S/N in dB); standard error of Est. = 5.3.]

Figure A11. Regression Models for Sibilation vs the Reciprocal of S/N Ratio, Showing the Differences Between the Voiced and Unvoiced State of Sibilation

[Plot annotations: Sibilation, Voiced: origin 109.7, slope -275.4, r² = .79; Unvoiced: origin 106.4, slope -161.5, r² = .68. Sibilation (total): r² = .81; Expectation (6 < S/N < 30): Score = 108.1 - 218.4/(S/N in dB); standard error of Est. = 5.3.]


Figure A12. Regression Models for Graveness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Present and Absent State of the Graveness Feature

[Plot annotations: Graveness, Present: origin 94.2, slope -241.8, r² = .72; Absent: origin 102.3, slope -216.0, r² = .73. Graveness (total): r² = .84; Expectation (6 < S/N < 30): Score = 98.0 - 227.4/(S/N in dB); standard error of Est. = 4.9.]

Figure A13. Regression Models for Graveness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Voiced and Unvoiced State of the Graveness Feature

[Plot annotations: Graveness, Voiced: origin 104.4, slope -154.5, r² = .83; Unvoiced: origin 91.6, slope -300.6, r² = .71. Graveness (total): r² = .84; Expectation (6 < S/N < 30): Score = 98.0 - 227.4/(S/N in dB); standard error of Est. = 4.9.]


Figure A14. Regression Models for Compactness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Present and Absent State of the Compactness Feature

[Plot annotations: Compactness, Present: origin 103.8, slope -116.6, r² = .67; Absent: origin 105.9, slope -223.0, r² = .76. Compactness (total): r² = .84; Expectation (6 < S/N < 30): Score = 104.9 - 169.8/(S/N in dB); standard error of Est. = 3.6.]

Figure A15. Regression Models for Compactness vs the Reciprocal of S/N Ratio, Showing the Differences Between the Voiced and Unvoiced State of the Compactness Feature

[Plot annotations: Compactness, Voiced: origin 102.2, slope -72.9, r² = .54; Unvoiced: origin 107.6, slope -266.7, r² = .77. Compactness (total): r² = .84; Expectation (6 < S/N < 30): Score = 104.9 - 169.8/(S/N in dB); standard error of Est. = 3.6.]
