
J Am Acad Audiol 4:98-108 (1993)

Tactile-Auditory Speech Perception by Unimodally and Bimodally Trained Normal-Hearing Subjects

Joseph I. Alcantara*† Peter J. Blamey* Graeme M. Clark*

Abstract

The following study compared the effectiveness of unimodal and bimodal training strategies at improving the perception of speech information under a variety of conditions. Normal-hearing subjects were trained in the perception of vowel and consonant stimuli. Speech information was provided via a multiple-channel electrotactile speech processing aid (the Tickle Talker), and/or by a 200-Hz low-pass filtered auditory signal. Two subjects were trained only in the combined tactile-plus-auditory (TA) condition; the remaining two were trained in both the tactile-alone (T) and auditory-alone (A) conditions, although only one condition was used at any single time. All subjects were evaluated in the TA, T, and A conditions, both at the beginning of the study, prior to training, and at the completion of training, on closed-set vowel and consonant confusion tests, and on an open-set word test. Results indicated that whilst statistically significant improvements occurred from one evaluation period to the next in both groups of subjects, the improvements per condition were not dependent on the type of training received. The results provide a preliminary indication that the provision of unimodal training does not impair the perception of speech information under bimodal perception conditions.

Key Words: Bimodal, multichannel electrotactile aid, speech perception, training, unimodal

An important concern regarding the speech perception training provided to hearing-impaired subjects to be fitted with tactile communication aids is the method and timing of introduction of the tactile supplement with respect to other modalities that also may be used in the training program. For example, if a hearing-impaired adult or child is undergoing instruction in lipreading and/or audition, the question arises of whether they should be given tactile training simultaneously with visual and/or auditory training. Alternatively, should the introduction of the tactile aid be delayed until the subject has attained an adequate proficiency in the use of the visual and/or auditory modalities? In other words, should the tactile modality be included in a multimodal training procedure, or should each modality be trained on its own?

*Department of Otolaryngology, University of Melbourne, East Melbourne, Australia; †currently Department of Experimental Psychology, University of Cambridge, Cambridge, England

Reprint requests: Joseph I. Alcantara, Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, England

A similar question applies to the inclusion of lipreading in auditory training programs. That is, should the auditory modality be trained in the absence of lipreading, as for the Auditory Verbal training procedure (Simser, 1988), or should it be included in a bimodal (auditory-plus-lipreading) training program (Clark, 1989)? The question is also relevant to tactile speech perception training, as both single and multiple channel tactile aids have been commonly viewed as acting as supplementary sources of speech information rather than as sensory substitutes for audition. For example, a large number of studies have demonstrated the use of tactile aids as supplements to lipreading for hearing-impaired individuals who derive no benefit from their aided residual hearing or for artificially deafened normal-hearing subjects (Pickett and Pickett, 1963; Engelmann and Rosov, 1975; Sparks et al, 1978; Saunders, 1983; Brooks et al, 1986a, b; Plant, 1986). More recently, tactile aids have also been fitted to individuals with severe to profound hearing losses who are able to derive some speech perception benefit from their aided hearing (Kozma-Spytek and Weisenberger, 1987; Eilers et al, 1988; Lynch et al, 1988, 1989a, b). In many of these studies, most if not all of the training provided in the use of the tactile aid was conducted in the tactile-alone condition. Although it was possible to demonstrate that speech perception performance in the combined condition (i.e., tactile-plus-lipreading or tactile-plus-aided residual hearing) following unimodal tactile-alone training was greater than in the unimodal conditions, the studies were not able to demonstrate whether the performance level measured in the combined condition would have been greater if the tactile training had been provided simultaneously with other modalities.

The current study tested the hypotheses that unimodal training is more effective in improving unimodal speech perception skills than bimodal training, and that bimodal training is more effective in improving bimodal speech perception skills than unimodal training. To this end, a group of normal-hearing subjects was provided with speech perception training using the information provided by a multiple-channel electrotactile aid, and a degraded source of auditory information. Half of the subjects were provided with training only in the bimodal tactile-plus-auditory (TA) condition; the remaining subjects were provided with training only in the unimodal tactile-alone (T) and auditory-alone (A) conditions.

The following questions were examined: (1) Would the speech perception scores in the combined TA condition be greater for the bimodally trained group than for the unimodally trained group? and (2) Would the speech perception scores in the unimodal T and A conditions be greater for the unimodally trained group than for the bimodally trained group?

At first, it may seem paradoxical to hypothesize that one would obtain better TA scores after bimodal training if the unimodal T and A scores are both better after unimodal training. This apparent paradox may be resolved if one considers the effectiveness of the combination process to be a variable dependent upon the type of training. That is, speech perception performance in a bimodal condition may be dependent not only on the information available from the constituent unimodal conditions, but also on the effectiveness of the combination of these conditions. If the effectiveness of combination depends on whether training is provided in unimodal or bimodal conditions, then it is possible to envisage a situation in which there is an improvement in unimodal perception that is not reflected in the level of bimodal perception because the combination process is not well developed. Data from Blamey et al (1989) using normal-hearing subjects indicate that tactile information may not combine with either auditory or visual information as effectively as the auditory and visual modalities combine with each other. This may be due to the relative lack of experience in using the tactile modality for speech perception purposes, compared to the greater experience in the use of the auditory and visual modalities. Alternatively, integration of tactile information with visual information might not be as good as integration of auditory and visual information.

In order to measure the effectiveness of combination, therefore, one requires a quantitative model of the combination process. The model used in the current study for this purpose is that of Blamey et al (1989) and Blamey (1990). This model was chosen because it had previously been shown to be useful in the interpretation of clinical data (Blamey et al, 1989; Blamey, 1990). Briefly, the model applies probability theory to speech perception, under the assumption of statistical independence. The model is used in this study to describe bimodal perception as the result of the combination of two statistically independent unimodal sources of information. In practice, predicted TA scores are calculated from observed T and A scores using Equation 1:

$$(1 - f_{TA}) = \frac{(1 - f_A)(1 - f_T)}{(1 - f_c)} \qquad [1]$$

where $f_{TA}$, $f_T$, and $f_A$ are the probabilities of correctly identifying a speech feature in the TA, T, and A conditions respectively, and $f_c$ is the probability of correctly identifying a feature by chance, which is the same in each condition. The $f_c$ values are calculated from a confusion matrix in which the responses are evenly distributed (i.e., all the elements of the matrix are equal). The proportion of information incorrectly perceived is therefore $(1 - f_{TA})/(1 - f_c)$, $(1 - f_T)/(1 - f_c)$, and $(1 - f_A)/(1 - f_c)$, for the TA, T, and A conditions, respectively, after correction for chance scores. Since the information provided by each unimodal condition is assumed to be statistically independent, the probability of an error in the bimodal condition is the product $[(1 - f_T)/(1 - f_c)][(1 - f_A)/(1 - f_c)]$, which is equated to $(1 - f_{TA})/(1 - f_c)$.

Journal of the American Academy of Audiology/Volume 4, Number 2, March 1993

It should be noted that the term "feature" is used here to refer to the physical characteristics arising from the acoustic or articulatory sources of a speech signal. These are perhaps more commonly known as "acoustic cues." The feature level has been chosen, in this and similar models of multimodal perception, as a reasonable level for the integration of speech events to occur, because of the large number of acoustic cues present that function to distinguish different speech signals arising from articulatory influences (e.g., Peterson and Lehiste, 1960; Ainsworth, 1972). In addition, this model has been demonstrated to provide a more accurate description of multimodal perception when feature-level information is used, compared to phoneme-level and word-level information (Blamey, 1990).

Accordingly, an additional aim of this study was to test the hypothesis that TAobserved/TApredicted scores for the bimodally trained group were greater than TAobserved/TApredicted scores for the unimodally trained group. The model of speech perception is used to determine whether the combination of information from the unimodal conditions is optimal. If this is the case, the ratio of observed over predicted TA scores will tend towards one, and if not, values less than one will be recorded. The advantage of this method over a simple comparison of observed TA scores of the two training groups is that it is not influenced by the relative magnitude of the observed TA scores.
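As a rough illustration of how Equation 1 and the observed/predicted ratio could be computed, the following Python sketch may help. The function names and the input values are hypothetical and chosen only for illustration; they are not data from the study, and the chance level must be supplied by the user from the relevant confusion matrix.

```python
def predicted_ta(f_t: float, f_a: float, f_c: float) -> float:
    """Predicted bimodal (TA) probability correct, following Equation 1.

    f_t, f_a: observed feature probabilities in the T and A conditions.
    f_c: chance probability of identifying the feature (same in each condition).
    """
    if not 0.0 <= f_c < 1.0:
        raise ValueError("chance level f_c must lie in [0, 1)")
    # Equation 1: (1 - f_TA) = (1 - f_A)(1 - f_T) / (1 - f_c)
    error_ta = (1.0 - f_a) * (1.0 - f_t) / (1.0 - f_c)
    return 1.0 - error_ta


def combination_ratio(f_ta_observed: float, f_t: float, f_a: float, f_c: float) -> float:
    """Observed TA score divided by the TA score predicted from T and A."""
    return f_ta_observed / predicted_ta(f_t, f_a, f_c)


# Hypothetical inputs (not taken from the paper's tables).
example_prediction = predicted_ta(f_t=0.40, f_a=0.30, f_c=0.10)
example_ratio = combination_ratio(0.50, f_t=0.40, f_a=0.30, f_c=0.10)
print(f"predicted TA = {example_prediction:.3f}, ratio = {example_ratio:.3f}")
```

A useful sanity check on the formula: when one modality is at chance (for example, f_t equal to f_c), the predicted TA score reduces to the score of the other modality.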

METHOD

Subjects

Four university graduate students between the ages of 21 and 26 years were recruited for the study. The criteria for selection were that they have normal hearing thresholds in the audiometric range (ANSI, 1969), be native speakers of English, and not have received any serious injuries to the nerves of the fingers or hand to be fitted with the tactile aid. All of the subjects had been involved in a tactile psychophysical study (Blamey et al, 1990) directly before taking part. All subjects were paid for their involvement in the study.

Electrotactile Speech Processor

The multiple-channel electrotactile speech processor (Tickle Talker) consisted of four separate components: an ear-level or lapel omnidirectional microphone, the speech processor, the stimulator unit, and the electrode handset. An early prototype of the electrotactile aid, in which the stimulator and speech processing components of the aid were contained in two separate housings, was used in this study. The speech processor measured 11.5 x 7.5 x 2 cm and was powered by three 1.5-volt AA battery cells. The speech processor used was the same as the WSP model speech processor manufactured by the Nucleus Company for use in the 22-channel intracochlear prosthesis (Blamey et al, 1987a). This processor produced a coded radio frequency output, which was hard-wired into the separate stimulator circuitry, where it was decoded and subsequently used to control the electrical parameters for presentation through the electrode handset.

The stimulator unit measured 11 x 6 x 3 cm and was powered separately by a rechargeable 9-volt NiCd PP3 battery. The combined weight of the speech processor and stimulator with batteries was 435 g. The electrode handset consisted of eight finger electrodes, each made from a flat square of fine gauge stainless steel mesh, and positioned against the side of the proximal phalanx of each finger. The electrodes were held in position by springy plastic clips or velcro bands around the fingers. The plastic clips and velcro bands completely covered the outer surface of the electrodes, electrically isolating them from one another. Each electrode was approximately 0.5 cm². The larger electrode worn on the underside of the wrist, acting as the common ground, was approximately 7 to 10 cm².

Speech signals received by the microphone were passed to the speech processor. In brief, the speech processing strategy presented the amplitude of the speech signal by varying the stimulus pulse width; the fundamental frequency of the voice by varying the pulse rate; and the second formant frequency by varying the electrode to be stimulated. A summary of the speech features encoded, together with their corresponding electrical parameters and subjective sensations, is given in Table 1.

The estimate of the amplitude envelope was derived from a peak detector operating on the speech waveform. The pulse width varied between "threshold" (i.e., minimum pulse width required for 100% detection), and "maximum comfortable levels" (i.e., maximum pulse width that could be tolerated without discomfort for a period of 10 minutes). The threshold and maximum comfortable levels were individually selected for each of the electrodes by the subjects prior to and during the course of the study. The estimate of the fundamental voice frequency was derived from zero crossings of a low-pass filtered envelope of the speech waveform. The output from the zero-crossing detector was scaled and used to trigger a pulse generator to initiate a stimulus. The pulse rates used ranged from approximately 40 to 300 pulses per second. The estimate of the second formant frequency (EF2) was derived from zero crossings of the speech waveform filtered in the region of 800 to 4000 Hz. The EF2 was encoded as one of eight electrode positions.

Table 1 Speech Processing Strategy

| Estimated Speech Feature | Electrical Parameter | Subjective Sensation |
|---|---|---|
| Amplitude | Pulse width | Intensity of stimulation |
| Fundamental frequency | Pulse rate | Quality of stimulation (roughness) |
| Second formant frequency | Electrode position | Place of stimulation |

The encoded speech stimulus was presented to the wearer of the aid via the electrode handset. The electrode handset comprised eight finger electrodes, positioned directly over the digital nerve bundles of the fingers of the nondominant hand, and a larger common electrode positioned on the underside of the wrist. The electrical current was concentrated in a small area around the sides of the fingers, producing a localized sensation. No sensation was felt near the wrist electrode unless the electrode was not in good contact with the skin over its whole area. The finger electrodes were numbered one to eight in order from the lateral side of the index finger to the medial side of the little finger. The site of stimulation was considered important for two reasons: firstly, stimulation of nerve bundles was found to produce more pleasant sensations than stimulation at nerve endings, thereby allowing greater dynamic ranges of stimulation; and secondly, it provided a convenient spatially ordered array, which could be used for the encoding of the second formant feature. The frequency boundaries between the eight electrodes were 900, 1125, 1350, 1575, 1800, 2400, and 3300 Hz. Thus, speech sounds with different EF2 values were felt on different electrodes. For example, the vowel sound /o/ with a second formant value of approximately 850 Hz would be felt on the lateral side of the index finger (i.e., electrode one), while the vowel sound /i/ with a second formant value of about 2100 Hz would be felt on the medial side of the ring finger (i.e., electrode six).
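The electrode selection described above amounts to a lookup against the seven frequency boundaries. A minimal Python sketch is given below; the function and constant names are hypothetical, and the treatment of an EF2 value that falls exactly on a boundary (assigned here to the higher-numbered electrode) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Frequency boundaries (Hz) between the eight finger electrodes, as listed in the text.
F2_BOUNDARIES_HZ = [900, 1125, 1350, 1575, 1800, 2400, 3300]


def electrode_for_ef2(ef2_hz: float) -> int:
    """Return the electrode number (1 to 8) for an estimated second formant frequency.

    Assumption: a value exactly equal to a boundary maps to the higher electrode.
    """
    return bisect_right(F2_BOUNDARIES_HZ, ef2_hz) + 1


# The two examples given in the text.
print(electrode_for_ef2(850))   # 1: lateral side of the index finger
print(electrode_for_ef2(2100))  # 6: medial side of the ring finger
```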

The stimulus waveform was a series of biphasic constant current pulses. The width of each phase was varied to alter the intensity of the sensation. The current flowing between the wrist electrode and a selected finger electrode during each phase was 1.5 mA. The net charge transfer for each biphasic pulse was zero. The charge per phase varied from 15 to 1350 nC as the pulse width varied from 10 to 900 µsec. There was a fixed 100-µsec gap between the phases in which no current flowed. These stimuli used charge densities below the maximum levels recommended for subcutaneous stimulation by Mortimer et al (1980) and Brummer et al (1984).
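The quoted charge range follows directly from charge = current x time for a constant-current phase. The short Python sketch below reproduces that arithmetic; the function name is hypothetical.

```python
def charge_per_phase_nc(current_ma: float, pulse_width_us: float) -> float:
    """Charge delivered in one constant-current phase, in nanocoulombs.

    Units: 1 mA x 1 µs = 1e-3 A x 1e-6 s = 1e-9 C = 1 nC.
    """
    return current_ma * pulse_width_us


# Values stated in the text: 1.5 mA per phase, pulse widths from 10 to 900 µs.
print(charge_per_phase_nc(1.5, 10))   # 15.0 nC
print(charge_per_phase_nc(1.5, 900))  # 1350.0 nC
```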

In summary, the electrotactile speech processor used in the following studies differed from other tactile aids reported in the literature in two important ways: the site of stimulation and the speech processing strategy employed. In contrast to other devices that present stimulation patterns representing the whole waveform or spectrum (as in a vocoder), or only present a single parameter such as the fundamental frequency or amplitude envelope, this device employed a scheme designed to extract three speech parameters: the amplitude envelope, the fundamental voice frequency, and the second formant frequency.

Experimental Set-up

The subjects were isolated in a sound-attenuating chamber during training and evaluation periods. The trainer was situated outside the chamber. Communication was achieved via audiolink. A Marantz Superscope EC-7 electret condenser cardioid microphone, placed 10 cm from the trainer, was used to take the raw speech signal to one of two paths: directly to the microphone input of the electrotactile speech processor, or to a low-pass elliptical filter (with a 72 dB/octave skirt) set to a cut-off of 200 Hz, and then to a TEAC 3 Tascam Series mixer, where it was mixed with white noise at a level sufficient to produce a +10 dB signal-to-noise ratio. The speech-in-noise signal was then pre-amplified via the mixer and sent to a pair of Pioneer SE205 circumaural headphones, which were worn by the subject. During evaluation periods, the trainer's microphone was replaced by a Yamaha KX-230 cassette deck, and the audio output was treated in the same way as described above.
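To make the +10 dB signal-to-noise ratio concrete, the Python sketch below scales a noise sequence so that the ratio of mean signal power to mean noise power matches a target value in decibels. It is only an illustration of the arithmetic, assuming digital sample sequences; the study used an analogue mixer, and the function names, sample rate, and 150-Hz test tone are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def noise_gain_for_snr(signal, noise, snr_db):
    """Gain to apply to `noise` so that signal power / noise power equals snr_db."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / mean_power(noise))


def mix_at_snr(signal, noise, snr_db=10.0):
    """Add scaled noise to the signal at the requested signal-to-noise ratio."""
    gain = noise_gain_for_snr(signal, noise, snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]


def achieved_snr_db(signal, scaled_noise):
    """Signal-to-noise ratio in dB, computed from mean powers."""
    return 10 * math.log10(mean_power(signal) / mean_power(scaled_noise))


# Hypothetical stand-ins: a 150-Hz tone for filtered speech, Gaussian white noise.
rng = random.Random(0)
sample_rate = 8000
signal = [math.sin(2 * math.pi * 150 * n / sample_rate) for n in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

gain = noise_gain_for_snr(signal, noise, 10.0)
mixed = mix_at_snr(signal, noise, 10.0)
print(f"achieved SNR: {achieved_snr_db(signal, [gain * n for n in noise]):.2f} dB")
```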

The cut-off frequency of 200 Hz was chosen for two reasons. Firstly, it was appropriate to choose a cut-off frequency that simulated a profound hearing loss limited to the lower frequencies of hearing, as this was typical of most of the hearing-impaired people being seen for rehabilitation with the present tactile aid. Secondly, a pilot study conducted before the main study indicated that scores in the combined condition were in the 20 to 40 percent correct range for vowels and consonants, thereby leaving room for improvement during training.

Training

Each subject attended two to three 1-hour training sessions per week in the perception of vowel and consonant speech contrasts, and had received approximately 20 hours of training in total by the end of the study.

The four subjects were split into two groups on the basis of the form of training to be received. The first group received training only in the bimodal tactile-auditory (TA) condition. The second group received the same form of training; however, only one condition, tactile (T) or auditory (A), was used for the duration of any one session, and the condition used was alternated every session. Therefore, by the end of the study, the first group had received 20 hours of training in the TA condition, and the second group had received 10 hours in the T condition and 10 hours in the A condition.

The training was analytic in format and was provided live-voice. The procedure outlined below was used in exactly the same manner for both groups of subjects. The only procedural difference was the number of modalities used during training. The training comprised closed-set forced-choice exercises with variable numbers of response options. The exercises focused on the following individual speech features: vowel duration, vowel place, consonant manner, voicing, and place of articulation. The training exercises used could be described as making a transition from simpler discrimination tasks to more difficult recognition tasks. Features were introduced over one to two sessions, and then reviewed regularly during the course of the training. The salient cues of a particular feature were introduced using two items that differed maximally with respect to that feature. When the subject was able to identify the correct item reliably, the number of alternatives provided in the recognition task was increased. By the end of training, all the subjects were able to accurately perceive the features trained from response option sets with the maximum number of alternatives. Feedback was provided at all times during training.

An analytic approach was chosen in the current study in order to minimize time spent on data collection. That is, previous studies had indicated that accurate perception of tactile speech features was possible in a relatively short space of time following analytic-level training, whereas the perception of synthetic-level information (e.g., sentences, connected discourse) generally required longer periods of training (Alcantara et al, 1990a, b). Synthetic training was also not included because the subjects were not long-term users of the tactile device.

Eleven vowels, /i, ɪ, a, u, ɜ, æ, ɛ, ɔ, ɒ, ʌ, ʊ/, presented in /h/vowel/d/ context, were used to train vowel feature information. Vowel duration (i.e., short vs long) was introduced first, followed by vowel place (i.e., low F2 [850-1300 Hz] vs middle F2 [1300-1800 Hz] vs high F2 [1800-2300 Hz]). Twelve consonants, /p, b, m, t, d, n, f, v, s, z, k, g/, presented in /a/consonant/a/ context, were used to train consonant feature information. Manner of articulation was introduced first, followed by voicing and place of articulation.

Evaluation

Each subject was evaluated on two occasions, once at the beginning of the study, prior to training, and again at the end of the study, after the completion of training. Three tests were used to give a measure of the subjects' perceptual abilities. The tests were (1) a closed-set vowel test, (2) a closed-set consonant test, and (3) the open-set CNC word test scored on a phoneme-correct basis (Peterson and Lehiste, 1962). The items in the vowel and consonant tests were the same as those used in the training period, and were presented in the same context, that is, /h/vowel/d/ and /a/consonant/a/. Randomized lists containing four presentations of each stimulus item were used, thereby producing confusion matrices of 44 vowels and 48 consonants. Each CNC list contained 50 words. No feedback was given during evaluation periods.
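As an illustration of how a percentage-correct score can be read off a confusion matrix of this kind, the Python sketch below sums the diagonal (correct responses) and divides by the total number of presentations. The function name and the small 3 x 3 matrix are hypothetical and are not data from the study; with the full vowel list (11 items x 4 presentations = 44), each correct response would be worth 100/44, or about 2.3 percentage points.

```python
def percent_correct(confusions):
    """Percentage of presentations identified correctly.

    confusions[i][j] is the number of times stimulus i drew response j,
    so correct responses lie on the diagonal.
    """
    total = sum(sum(row) for row in confusions)
    correct = sum(confusions[i][i] for i in range(len(confusions)))
    return 100.0 * correct / total


# Hypothetical 3-item example with four presentations of each stimulus.
example_matrix = [
    [3, 1, 0],
    [0, 2, 2],
    [1, 0, 3],
]
print(f"{percent_correct(example_matrix):.1f}% correct")
```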

Evaluations were conducted in the tactile-plus-auditory (TA), tactile-alone (T), and auditory-alone (A) conditions for all three tests. Two presentations of the vowel and consonant tests were provided per condition, in the pre- and post-training evaluation phases. One presentation of the CNC word test was used per condition in the pre- and post-training evaluation phases. Different word lists were used in each condition and on each evaluation. All evaluation materials were recorded on metal audiotape, using a speaker unknown to the subjects. The order of testing the conditions was balanced across the four subjects.

Table 2 Vowel Test Percentage Correct Scores

| Phase | Condition | S1 | S2 | S3 | S4 |
|---|---|---|---|---|---|
| Pre-training | TA | 20.5, 13.6 | 13.6, 27.3 | 13.6, 27.3 | 27.3, 18.2 |
| Pre-training | T | 4.6, 20.5 | 18.2, 20.5 | 15.9, 11.4 | 6.8, 27.3 |
| Pre-training | A | 11.4, 20.5 | 27.3, 20.5 | 25.0, 27.3 | 15.9, 27.3 |
| Post-training | TA | 47.7, 50.0 | 65.9, 56.8 | 45.5, 45.5 | 50.0, 50.0 |
| Post-training | T | 43.2, 45.5 | 59.1, 50.0 | 36.4, 38.6 | 43.2, 36.4 |
| Post-training | A | 25.0, 25.0 | 34.1, 22.7 | 34.1, 34.1 | 29.5, 31.8 |

Scores are for the unimodally trained subjects (S1 and S2) and bimodally trained subjects (S3 and S4), in the tactile-plus-auditory (TA), tactile (T), and auditory (A) conditions, for the pre-training and post-training phases.

RESULTS

Tables 2, 3, and 4 show individual subject scores for the vowel, consonant, and CNC word tests respectively, in each of the test conditions (i.e., TA, T, and A), for the pre-training and post-training evaluation phases.

A three-way analysis of variance (ANOVA) was carried out on the data, in which training group (unimodal, bimodal) was a between-groups factor and test condition (TA, T, A) and evaluation phase (pre-training, post-training) were repeated measures factors. Results of the separate tests carried out for the vowel, consonant, and CNC word tests are summarized in Table 5. The strong main effect of evaluation phase for the vowel and consonant tests confirmed that post-training scores (collapsed across training group and test condition) were significantly higher than pre-training scores. This suggested that improvements in speech perception followed training, and is consistent with the findings of an earlier study (Alcantara et al, 1990b). There was also a significant effect of evaluation condition for the vowel and consonant tests. Post-hoc comparison of test means using the Tukey test showed that scores were significantly greater (p < .05) for the TA condition than the T and A conditions for the vowel test, and that TA was greater than A, which was greater than T, for the consonant test. The main effect of training group was not significant for any of the tests.

The training group x test condition term was used to test the main hypothesis of the experiment, which was whether it is more effective to train auditory and tactile speech perception separately in unimodal training sessions or to train them together for an equal amount of time in a bimodal fashion. The nonsignificant result for all of the tests suggested that speech perception ability in the TA, T, and A conditions did not differ as a function of the type of training provided. In other words, unimodal training did not improve unimodal speech perception ability in preference to bimodal speech perception ability. Similarly, bimodal training did not improve bimodal speech perception ability in preference to unimodal speech perception ability.

Table 3 Consonant Test Percentage Correct Scores

| Phase | Condition | S1 | S2 | S3 | S4 |
|---|---|---|---|---|---|
| Pre-training | TA | 31.3, 25.0 | 35.4, 41.7 | 52.1, 37.5 | 35.4, 37.5 |
| Pre-training | T | 14.6, 14.6 | 8.3, 16.7 | 14.6, 10.4 | 22.9, 10.4 |
| Pre-training | A | 22.9, 29.2 | 20.8, 35.4 | 33.3, 41.7 | 29.2, 41.7 |
| Post-training | TA | 47.9, 50.0 | 60.4, 58.3 | 54.2, 56.3 | 54.2, 50.0 |
| Post-training | T | 31.3, 27.1 | 35.4, 31.3 | 25.0, 25.0 | 31.3, 27.1 |
| Post-training | A | 33.3, 37.5 | 37.5, 41.7 | 41.7, 39.6 | 33.3, 35.4 |

Scores are for the unimodally trained subjects (S1 and S2) and bimodally trained subjects (S3 and S4), in the tactile-plus-auditory (TA), tactile (T), and auditory (A) conditions, for the pre-training and post-training phases.

Table 4 CNC Word Test Phoneme Percentage Correct Scores

| Phase | Condition | S1 | S2 | S3 | S4 |
|---|---|---|---|---|---|
| Pre-training | TA | 7 | 14 | 17 | 8 |
| Pre-training | T | 11 | 9 | 13 | 9 |
| Pre-training | A | 9 | 9 | 15 | 13 |
| Post-training | TA | 9 | 16 | 16 | 14 |
| Post-training | T | 12 | 14 | 12 | 15 |
| Post-training | A | 8 | 13 | 16 | 14 |

Scores are for the unimodally trained subjects (S1 and S2) and bimodally trained subjects (S3 and S4), in the tactile-plus-auditory (TA), tactile (T), and auditory (A) conditions, for the pre-training and post-training phases.

The interaction terms involving evaluation phase proved to be more interesting, since they relate to the improvements that occurred between the pre- and post-training evaluations. The significant training group x evaluation phase interaction terms for both the vowel and consonant tests indicated that the unimodally trained group improved more than the bimodally trained group overall (i.e., in both unimodal and bimodal test conditions). The significant condition x phase interactions indicated that the improvements in the TA and T conditions were greater than the improvements in the A condition. The remaining interaction terms in the ANOVA tests were not significant at the 5% significance level.
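The direction of these interactions can be illustrated with the consonant scores in Table 3. The following Python sketch is not part of the original analysis, and its names (`SCORES`, `GROUPS`, `group_mean`, `improvement`) are hypothetical. It transcribes the Table 3 scores and computes the mean pre- to post-training gain for each training group and test condition. It describes the means only and does not replace the ANOVA.

```python
# Consonant test percentage correct scores transcribed from Table 3.
# Each subject contributed two scores per phase and condition.
SCORES = {
    ("pre", "TA"): {"S1": (31.3, 25.0), "S2": (35.4, 41.7), "S3": (52.1, 37.5), "S4": (35.4, 37.5)},
    ("pre", "T"): {"S1": (14.6, 14.6), "S2": (8.3, 16.7), "S3": (14.6, 10.4), "S4": (22.9, 10.4)},
    ("pre", "A"): {"S1": (22.9, 29.2), "S2": (20.8, 35.4), "S3": (33.3, 41.7), "S4": (29.2, 41.7)},
    ("post", "TA"): {"S1": (47.9, 50.0), "S2": (60.4, 58.3), "S3": (54.2, 56.3), "S4": (54.2, 50.0)},
    ("post", "T"): {"S1": (31.3, 27.1), "S2": (35.4, 31.3), "S3": (25.0, 25.0), "S4": (31.3, 27.1)},
    ("post", "A"): {"S1": (33.3, 37.5), "S2": (37.5, 41.7), "S3": (41.7, 39.6), "S4": (33.3, 35.4)},
}

# Training group membership as stated in the Table 3 caption.
GROUPS = {"unimodal": ("S1", "S2"), "bimodal": ("S3", "S4")}


def group_mean(phase, condition, group):
    """Mean score over all subjects and repeat scores in one group."""
    values = [
        score
        for subject in GROUPS[group]
        for score in SCORES[(phase, condition)][subject]
    ]
    return sum(values) / len(values)


def improvement(condition, group):
    """Mean post-training score minus mean pre-training score."""
    return group_mean("post", condition, group) - group_mean("pre", condition, group)


if __name__ == "__main__":
    for group in GROUPS:
        for condition in ("TA", "T", "A"):
            print(f"{group:9s} {condition:2s} {improvement(condition, group):6.2f}")
```

On these data the unimodally trained pair shows the larger mean gain in all three conditions, and the gains in the TA and T conditions exceed the gain in the A condition for both groups.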

Table 5 Three-Way ANOVA Test Results

Analysis Term   df    Vowels F   Vowels p   Consonants F   Consonants p   CNC Words F   CNC Words p
G               1,2   0.03       > .05      0.83           > .05          1.54          > .05
P               1,2   8929.6     < .001     309.3          < .005         2.24          > .05
C               2,4   17.8       < .025     44.2           < .005         0.062         > .05
G x C           2,4   5.49       > .05      0.80           > .05          0.88          > .05
G x P           1,2   153.2      < .01      25.3           < .05          0.006         > .05
C x P           2,4   52.1       < .005     17.7           < .01          0.40          > .05

The three factors are training group (G), evaluation phase (P), and test condition (C).

In order to test the hypothesis that the combination of unimodal conditions was not as effective following unimodal training as it was following bimodal training, a series of two-tailed independent t-tests was conducted to compare the ratio of TAobserved/TApredicted scores for the unimodally and bimodally trained groups from the post-training evaluation phase alone. The model of bimodal perception used to predict TA performance described the combination of unimodal condition information in terms of features. Accordingly, TA, T, and A scores for the vowel and consonant stimulus items were analyzed first in terms of features according to the classifications shown in Tables 6 and 7. The vowels were classified on the basis of data for average male speakers (Bernard, 1970). Classifications used for the consonants were based on the results of Miller and Nicely (1955) and Blamey et al (1987b). Tables 8 and 9 show the subject mean percentage feature correct scores for the vowel and consonant tests, respectively, for the unimodally and bimodally trained groups in the three test conditions. It is apparent that T and A condition ceiling effects occurred for the duration feature of vowels, and the nasality and voicing features of consonants, but not for any of the other features. In the presence of such effects, it becomes impossible to demonstrate differences between the unimodally and bimodally trained groups; however, there were sufficient features below the 90 to 100 percent level for an adequate comparison to be carried out. The overall percentage correct scores for vowels and consonants were significantly above chance and showed no sign of ceiling effects.

The individual subject scores for the unimodal T and A conditions were used to calculate the predicted TA scores for each of the subjects in each of the training groups using Blamey et al's (1989) and Blamey's (1990) model of bimodal perception. The prediction was made assuming that each unimodal condition contributed a portion of the total information, and that an error occurred in the bimodal condition only if the speech feature was incorrectly perceived in both of the unimodal conditions. Equation 1, described earlier, was used for the calculations, and included the term f, which was the probability of correctly recognizing a feature by chance. The appropriate f values are included in Tables 8 and 9. This formula has been shown to provide a good description of combined auditory-visual perception of nonsense syllables by cochlear implant and tactile aid wearers (Blamey et al, 1989; Blamey, 1990).

Table 6 Groups of Vowels Used in the Feature Correct Analysis of Vowel Confusions

Vowel   Duration*   F1 Frequency†   F2 Frequency‡
heed    1           1               1
heard   1           2               2
hard    1           3               2
who'd   1           1               2
hoard   1           2               3
hid     2           1               1
head    2           2               1
had     2           3               1
hud     2           3               2
hod     2           3               3
hood    2           2               3

In each column, vowels with the same number are grouped together for the analysis of the feature indicated.

*1 = long, 2 = short; †1 = low, 2 = mid, 3 = high; and ‡1 = high, 2 = mid, 3 = low.
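The feature scoring and the chance term can be sketched from the vowel groupings in Table 6. The following Python example is illustrative only: the names `VOWEL_FEATURES`, `feature_percent_correct`, and `chance_level` are hypothetical, and the chance calculation assumes that stimuli are equally frequent and that a guessing subject picks uniformly among the 11 vowels. Under that assumption the chance values come out close to the f values quoted in Table 8 (50, 34, and 34 percent), but the paper does not state how f was obtained.

```python
from collections import Counter

# (duration, F1, F2) class for each vowel, transcribed from Table 6.
VOWEL_FEATURES = {
    "heed": (1, 1, 1), "heard": (1, 2, 2), "hard": (1, 3, 2),
    "who'd": (1, 1, 2), "hoard": (1, 2, 3), "hid": (2, 1, 1),
    "head": (2, 2, 1), "had": (2, 3, 1), "hud": (2, 3, 2),
    "hod": (2, 3, 3), "hood": (2, 2, 3),
}
FEATURE_INDEX = {"duration": 0, "F1": 1, "F2": 2}


def feature_percent_correct(trials, feature):
    """Percentage of (stimulus, response) pairs whose feature class matches."""
    idx = FEATURE_INDEX[feature]
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(
        VOWEL_FEATURES[stimulus][idx] == VOWEL_FEATURES[response][idx]
        for stimulus, response in trials
    )
    return 100.0 * hits / len(trials)


def chance_level(feature):
    """Probability of a feature match under uniform guessing (assumption)."""
    idx = FEATURE_INDEX[feature]
    counts = Counter(classes[idx] for classes in VOWEL_FEATURES.values())
    total = len(VOWEL_FEATURES)
    # Sum over classes of P(stimulus in class) * P(response in class).
    return sum((count / total) ** 2 for count in counts.values())


if __name__ == "__main__":
    for name in FEATURE_INDEX:
        print(name, round(100 * chance_level(name)))
```

A response of "hid" to the stimulus "heed", for example, counts as an error for duration but as correct for both formant features, because the two vowels share the same F1 and F2 classes.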

Accordingly, TApredicted scores were calculated for each of the vowel and consonant features, from each pair of post-training T and A (observed) scores collected for each subject. The ratios of the TAobserved/TApredicted scores were then calculated, and it was these values, partitioned according to training group, that were included in the independent t-test analyses comparing training groups. Separate tests were performed for each of the vowel and consonant features.
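The calculation of a TAobserved/TApredicted ratio can be sketched as follows. Equation 1 is not reproduced in this part of the paper, so the Python example below assumes one formalization of the verbal description: the chance-corrected error in the TA condition is the product of the chance-corrected errors in the T and A conditions, which gives 1 - pTA = (1 - pT)(1 - pA)/(1 - f). The function names are hypothetical. The inputs are the group means for the vowel F1 feature from Table 8, whereas the paper computed ratios from individual subject scores, so the results only approximate the entries in Table 10.

```python
def predicted_ta(p_t, p_a, f):
    """Predicted bimodal proportion correct for one feature.

    Assumed form (not quoted from the paper):
        1 - p_ta = (1 - p_t) * (1 - p_a) / (1 - f)
    where f is the proportion correct expected by chance.
    """
    if not 0.0 <= f < 1.0:
        raise ValueError("chance level f must lie in [0, 1)")
    return 1.0 - (1.0 - p_t) * (1.0 - p_a) / (1.0 - f)


def observed_to_predicted_ratio(p_ta, p_t, p_a, f):
    """Ratio of the observed TA score to the score predicted from T and A."""
    return p_ta / predicted_ta(p_t, p_a, f)


# Vowel F1 feature, group means from Table 8 expressed as proportions.
ratio_unimodal = observed_to_predicted_ratio(p_ta=0.61, p_t=0.59, p_a=0.46, f=0.34)
ratio_bimodal = observed_to_predicted_ratio(p_ta=0.53, p_t=0.47, p_a=0.55, f=0.34)

if __name__ == "__main__":
    print(f"unimodal group: {ratio_unimodal:.3f}")
    print(f"bimodal group:  {ratio_bimodal:.3f}")
```

A ratio near 1 means that the observed bimodal score is about what the model predicts from the two unimodal scores; a ratio below 1 means that the unimodal information was combined less effectively than predicted.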

Table 10 shows the statistical results of the training group comparison. The results of these tests indicated that the effectiveness of combination of the unimodal T and A conditions was similar for both the unimodally and bimodally trained groups, for every vowel and consonant feature. In other words, bimodal training did not result in a more effective combination of information from the unimodal conditions than did unimodal training.

DISCUSSION

This experiment was designed as a preliminary study and the conclusions must be treated with some caution because the small number of subjects makes it inappropriate for us to make any recommendations regarding clinical practice at this time. Despite their tentative nature, however, the conclusions are interesting and would be worth verifying in a more extensive study. Firstly, it is clear that both auditory and tactile information contributed to the perception of the vowels and consonants, as scores in the TA condition were significantly greater than those in both the T and A conditions. This is consistent with previous results (Blamey et al, 1989; Cowan et al, 1989), and suggests that the tactile aid is able to provide useful speech feature information to supplement the information available through audition. Secondly, the comparison of TA, T, and A performance scores (collapsed across evaluation phase), and that of the TAobserved/TApredicted ratios, for the two training groups supports the hypothesis that the modality (or modalities) chosen for training does not have the effect of preferentially improving perception ability in some modalities compared to others. Thirdly, the unimodal training produced a larger improvement than the bimodal training in both unimodal and bimodal test conditions in every case except the A condition for vowels. This suggests that improvements in bimodal perception flow naturally from improvements in the constituent unimodal conditions without requiring specific bimodal training, and that it may be more effective to focus attention on a single modality during analytic training. It remains to be tested whether these same conclusions apply to synthetic training and open-set speech recognition. This third finding is consistent with the philosophy of the Auditory Verbal auditory training strategy (Simser, 1988), which holds that the development of auditory perception skills can only reach its full potential if audition is trained in the absence of lipreading. Although it was not possible to keep a detailed record of the rate of progress of training, as a result of periods of irregular attendance by some of the subjects, it was observed that the unimodally trained subjects tended to learn the training tasks more quickly than the bimodally trained subjects, especially in the tactile-alone condition. Fourthly, the improvements in the TA and T conditions were greater than those in the A condition. This may have arisen from the previous experience of the subjects and the nature of the two signals. That is, since all the subjects were normal-hearing adults, they were all highly experienced users of auditory information. In addition, the auditory signal provided was severely restricted in the information it carried. As a result, the subjects may have been close to the top of the learning curve for the A condition, with little scope for further improvement. In contrast, none of the subjects had any previous speech perception experience with the tactile device, and the tactile signal was rich in potential speech perception cues. This would result in the subjects starting much lower on the tactile speech perception learning curve, with the potential for a greater level of improvement.

Table 7 Groups of Consonants Used in the Feature Correct Analysis of Consonant Confusions

Feature              b   p   m   v   f   d   t   n   z   s   g   k
Voicing              1   2   1   1   2   1   2   1   1   2   1
Nasality             1   1   2   1   1   1   1   2   1   1   1
Affrication          1   1   1   2   2   1   1   1   2   2   1
Duration             1   1   1   1   1   1   1   1   2   2   1
Place                1   1   1   1   1   2   2   2   2   2   3
Amplitude envelope   1   2   3   1   4   1   2   3   1   4   1
High F2              1   1   1   1   1   1   2   1   2   2   1

In each row, consonants with the same number are grouped together for the analysis of the feature indicated.

Table 8 Mean Percentage Feature Correct Scores for Vowels

Group      Condition   Duration (f = 50)*   F1 Frequency (f = 34)*   F2 Frequency (f = 34)*
Unimodal   A           100                  46                       44
Unimodal   T            96                  59                       74
Unimodal   TA          100                  61                       74
Bimodal    A            99                  55                       43
Bimodal    T            96                  47                       65
Bimodal    TA          100                  53                       71

Scores are for the vowels in the tactile-plus-auditory (TA), tactile (T), and auditory (A) conditions, for the unimodally and bimodally trained groups.

*Percent correct score attributable to chance.

The lack of a significant improvement in the perception of word-level information (i.e., the CNC word test) was most likely due to two reasons: firstly, the provision of analytic (feature-level) training, which has previously been shown to be more effective at improving the perception of isolated vowels and consonants than that of words or sentences (Alcantara et al, 1990a); and secondly, an inadequate length of training, as the subjects had not previously been exposed to either degraded or tactile transforms of speech. The result is consistent with the findings of previous studies (e.g., Cowan et al, 1988) indicating that although accurate perception of tactile vowel and consonant feature information was possible after short periods of training (10-20 hours), the perception of word- and sentence-level information required longer periods of training. Indeed, adults and children currently fitted with the tactile aid continue to improve in their perception of feature, word, and sentence information after 2 to 3 years of daily use (Cowan et al, 1989, 1990). Such an extended period of training was not possible in this study. The tactile-alone (T) performance of the subjects used in this study is poorer than that described in Alcantara et al (1990a). This is most likely due to two factors: the previous study provided a longer period of training and used a familiar speaker for the evaluations of tactile speech perception, unlike the present study, which had a shorter training period and used a speaker unknown to the subjects.

Table 9 Mean Percentage Feature Correct Scores for Consonants

Group      Condition   Voicing (f = 51)*   Nasality (f = 72)*   Affrication (f = 56)*   Duration (f = 72)*   Place (f = 38)*   Amplitude envelope (f = 29)*   High F2 (f = 63)*
Unimodal   A            95                 100                  76                      80                   44                81                             75
Unimodal   T            71                  82                  70                      97                   53                49                             87
Unimodal   TA           96                  99                  77                      96                   60                84                             90
Bimodal    A           100                 100                  79                      75                   40                84                             67
Bimodal    T            70                  90                  72                      88                   51                51                             80
Bimodal    TA           95                  99                  86                      96                   51                87                             85

Scores are for the consonants in the tactile-plus-auditory (TA), tactile (T), and auditory (A) conditions, for the unimodally and bimodally trained groups.

*Percent correct score attributable to chance.

Table 10 Independent t-test Results Comparing TAobserved/TApredicted Ratios for the Unimodally and Bimodally Trained Groups

Test         Feature              Unimodal Group Mean (TAobs/TApred)   Bimodal Group Mean (TAobs/TApred)   t(3)     p
Vowels       Duration             1.00                                 1.00                                 0.000   > .05
Vowels       F1                   0.91                                 0.82                                -0.849   > .05
Vowels       F2                   0.94                                 1.01                                 1.851   > .05
Consonants   Nasality             0.99                                 0.99                                 0.000   > .05
Consonants   Duration             0.98                                 1.03                                 2.245   > .05
Consonants   Voicing              0.99                                 0.95                                -1.648   > .05
Consonants   Amplitude envelope   0.97                                 0.96                                -0.027   > .05
Consonants   Frication            0.91                                 0.98                                 1.245   > .05
Consonants   Place                1.04                                 0.97                                -0.865   > .05
Consonants   High F2              0.99                                 1.02                                 0.676   > .05

CONCLUSIONS

The results of this study provide a preliminary indication that the provision of unimodal training, as a means of improving the perception of tactile and auditory speech perception cues, does not impair the perception of this information under the more realistic bimodal perception conditions to which long-term wearers of the tactile aid will probably be subjected. Indeed, there is some indication that unimodal training may be the more appropriate means of maximizing bimodal speech perception ability. Additional data are required, however, to further establish the validity of these findings and consequently make recommendations regarding clinical practice. Specifically, it would be desirable to observe the results using a larger group of subjects, preferably hearing-impaired subjects with a range of hearing losses, and also to verify the conclusions with synthetic-level training programs over longer training periods.

Acknowledgment. The authors acknowledge the financial support of Cochlear Pty. Ltd.; the Department of Industry, Technology and Commerce, and Department of Employment, Education and Training of the Australian Commonwealth Government; the National Health and Medical Research Council (Aust); and the technical support provided by Dr. Peter Seligman. The first author was supported by a National Health and Medical Research Council of Australia Ph.D. scholarship.

REFERENCES

Ainsworth WA. (1972). Duration as a cue in the recognition of synthetic vowels. J Acoust Soc Am 51:648-651.

Alcantara JI, Cowan RSC, Blamey PJ, Clark GM. (1990a). A comparison of two training strategies for speech recognition with an electrotactile speech processor. J Speech Hear Res 33:195-204.

Alcantara JI, Whitford LA, Blamey PJ, Cowan RSC, Clark GM. (1990b). Speech feature recognition by profoundly hearing-impaired children using a multiple channel electrotactile speech processor and aided residual hearing. J Acoust Soc Am 88:1260-1273.

American National Standards Institute. (1969). American National Standards Specification for Audiometers. (ANSI 3.6-1969.) New York: ANSI.

Bernard JRL. (1970). Toward the acoustic specification of Australian English. Zeit Phonetik 23:113-128.

Blamey PJ, Dowell RC, Brown AM, Clark GM, Seligman PM. (1987a). A formant-estimating speech processor for cochlear implant patients. Speech Commun 6:293-298.

Blamey PJ, Dowell RC, Brown AM, Clark GM, Seligman PM. (1987b). Vowel and consonant recognition of cochlear implant patients using formant estimating speech processors. J Acoust Soc Am 82:48-57.

Blamey PJ, Cowan RSC, Alcantara JI, Whitford LA, Clark GM. (1989). Speech perception using combinations of auditory, visual, and tactile information. J Rehabil Res Dev 26:15-24.

Blamey PJ. (1990). Multimodal stimulation for speech perception. In: Rowe MJ, Aitkin LM, eds. Information Processing in Mammalian Auditory and Tactile System. New York: Wiley-Liss, 267-280.

Blamey PJ, Alcantara JI, Cowan RSC, Galvin KL, Sarant JZ, Clark GM. (1990). Perception of amplitude envelope variations of pulsatile electrotactile stimuli. J Acoust Soc Am 88:1765-1772.

Brooks PL, Frost BJ, Mason JL, Gibson DM. (1986a). Continuing evaluation of the Queen's University Tactile Vocoder. I. Identification of open-set words. J Rehabil Res Dev 23:119-128.

Brooks PL, Frost BJ, Mason JL, Gibson DM. (1986b). Continuing evaluation of the Queen's University Tactile Vocoder. II. Identification of open-set sentences and tracking narrative. J Rehabil Res Dev 23:129-138.

Brummer SB, Roblee LS, Hambrecht FT. (1984). Criteria for selecting electrodes for electrical stimulation: theoretical and practical considerations. Ann N Y Acad Sci 405:159-171.

Clark M. (1989). Language through Living for Hearing Impaired Children. London: Hodder and Stoughton.

Cowan RSC, Alcantara JI, Blamey PJ, Clark GM. (1988). Preliminary evaluation of a multichannel electrotactile speech processor. J Acoust Soc Am 83:2328-2338.

Cowan RSC, Alcantara JI, Whitford LA, Blamey PJ, Clark GM. (1989). Speech perception studies using a multichannel tactile speech processor, residual hearing, and lipreading. J Acoust Soc Am 85:2593-2607.

Cowan RSC, Blamey PJ, Galvin KL, Sarant JZ, Alcantara JI, Clark GM. (1990). Perception of sentences, words, and speech features by profoundly hearing-impaired children using a multichannel electrotactile speech processor. J Acoust Soc Am 88:1374-1384.

Eilers RE, Widen JE, Oller DK. (1988). Assessment techniques to evaluate tactual aids for hearing-impaired subjects. J Rehabil Res Dev 25:33-46.

Engelmann S, Rosov R. (1975). Tactual hearing experiment with deaf and hearing subjects. Except Child 41:243-253.

Kozma-Spytek L, Weisenberger JM. (1987). Evaluation of a multichannel electrotactile device for the hearing impaired: a case study. J Acoust Soc Am 82(Suppl 1):S23-S24.

Lynch MP, Eilers RE, Oller DK, Lavoie L. (1988). Speech perception by congenitally deaf subjects using an electrocutaneous vocoder. J Rehabil Res Dev 25:41-50.

Lynch MP, Eilers RE, Oller DK, Cobo-Lewis AB. (1989a). Multisensory speech perception by profoundly deaf children. J Speech Hear Res 54:57-67.

Lynch MP, Eilers RE, Oller DK, Urbano RC, Pero PJ. (1989b). Multisensory narrative tracking by a profoundly deaf subject using an electrocutaneous vocoder and vibrotactile aid. J Speech Hear Res 32:331-338.

Miller GA, Nicely PE. (1955). An analysis of perceptual confusions among some English consonants. J Acoust Soc Am 27:338-352.

Mortimer JT, Kaufman D, Roessmann U. (1980). Intramuscular electrical stimulation: tissue damage. Ann Biomed Eng 8:235-244.

Peterson GE, Lehiste I. (1960). Duration of syllabic nuclei in English. J Acoust Soc Am 32:175-184.

Peterson GE, Lehiste I. (1962). Revised CNC lists for auditory tests. J Speech Hear Disord 27:62-70.

Pickett JM, Pickett BH. (1963). Communication of speech sounds by a tactual vocoder. J Speech Hear Res 6:207-222.

Plant G. (1986). A single transducer vibrotactile aid to lipreading. Speech Trans Lab Quart Prog Stat Rep STL-QPSR 1:41-63.

Saunders FA. (1983). Information transmission across the skin: high resolution tactile sensory aids for the deaf and the blind. Int J Neurosci 19:21-28.

Simser J. (1988). The Possible Dream. Guest speaker at The Federation for Junior Deaf Education Conference 88, Sydney.

Sparks DW, Kuhl PK, Edmonds AE, Gray GP. (1978). Investigating the MESA (Multipoint Electrotactile Speech Aid): The transmission of segmental features of speech. J Acoust Soc Am 63:246-257.