limits of visual communication: the effect of signal-to ...whipl/staff/sperling/pdfs/... ·...

11
Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2355 Limits of visual communication: the effect of signal-to-noise ratio on the intelligibility of American Sign Language M. Pavel Department of Psychology,Stanford University, Stanford, California 94305 George Sperling, Thomas RiedI, and August Vanderbeek Human Information Processing Laboratory, Department of Psychology, New York University, New York, New York 10003 Received April 23, 1987; accepted August 19, 1987 To determine the limits of human observers' ability to identify visually presented American Sign Language (ASL), the contrast s and the amount of additive noise n in dynamic ASL images were varied independently. Contrast was tested over a 4:1 range; the rms signal-to-noise ratios (s/n) investigated were s/n = 1/4, 1/2, 1,and - (whichis used to designate the original, uncontaminated images). Fourteen deaf subjects were tested with an intelligibility test composed of 85 isolated ASL signs, each 2-3 sec in length. For these ASL signs (64 X 96 pixels, 30 frames/sec), subjects' performance asymptotes between s/n = 0.5 and 1.0;further increases in s/n do not improve intelligibility. Intelligibility wasfound to depend only on s/n and not on contrast. A formulation in terms of logistic functions was proposed to derive intelligibility of ASL signs from s/n, sign familiarity, and sign difficulty. Familiarity (ignorance) is represented by additive signal-correlated noise; it represents the likelihood of a subject's knowing a particular ASL sign, and it adds to s/n. Difficulty is represented by a multiplicative difficulty coefficient; it represents the perceptual vulnerability of an ASL sign to noise and it adds to log(s/n). INTRODUCTION How good must the quality of visual information be in order for images to be useful? Visual scientists would like to know the answer to this question in order to describe the human visual system better. Engineers and image-processing-sys- tems designers would like to know so that they may build the most effective systems. Clearly, this question has both practical and theoretical importance. From the practical point of view,the popularity of movies, television, and video games attests to the fact that people like to watch dynamic, electronically or optically trans- formed two-dimensional images, sometimes even in prefer- ence to natural scenes! Currently, most of the effort devot- ed to the technology of video and image processing is direct- ed at producing images of the highest possible quality, requiring high information transmission rates. The objec- tive is high-fidelity reproduction of original images. Aside from aesthetic considerations, high image quality may not be necessary for successful performance on many tasks such as visual inspection, motor control, and commu- nication. An indication of how low the requirements may be for a successful execution of many tasks is suggested by a number of recent studies, including those of the capabilities of visually impaired patients. 1 4 These findings are sup- ported by several studies of the abilities of normals to inter- pret static low-bandwidth images 2 , 3 , 5 -8 and by research on American Sign Language (ASL) communication with low bandwidth.9-l 6 From a theoretical point of view, researchers who intend to characterize and model the visual perceptual system are interested in what information in images is used by the visual system for different tasks. The simplest theory of perceptual communication is based on the assumption that the information is extracted from images by an optimal receiver using feature detectors similar to templates (matched filters). The performance ought to be character- izable within the classical information-theoretic framework (e.g., see Refs. 17 and 18). In particular, the amount of information should be a function of the signal-to-noise ratio. In the present study we examine the amount of informa- tion necessary for visual communication with American Sign Language (ASL) used by the deaf community. A significant proportion of the deaf community uses manual communica- tion forms such as ASL. ASL is a complex language in which meaning is conveyed by gestures and motions that are concatenated under syntactic constraints. For the purpose of this work it is important to note that the potential of communicating with ASL is similar to that of speech. For example, as a language in use for everyday communication, ASL is used by the deaf community at the same rate of communication as a spoken language.' 9 We assume that the amount of useful information can be inferred from the level of performance on ASL communica- tion tasks. Given this assumption, there are at least two ways to determine the limits of necessary image quality. One way involvesreducing the amount of information in the original image by filtering or removing information. The studies referred to above all use this method. The second approach involves adding noise (i.e., random signals) that mask image information. Although this approach has been widely used in audition, 20 noise has been used with dynamic ASL images only in a study of location discrimination. 2 ' In the present study, we use additive noise to reduce the amount of information in images of ASL signs in order to 0740-3232/87/122355-11$02.00 © 1987 Optical Society of America Pavel et al.

Upload: others

Post on 18-Apr-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2355

Limits of visual communication: the effect of signal-to-noiseratio on the intelligibility of American Sign Language

M. Pavel

Department of Psychology, Stanford University, Stanford, California 94305

George Sperling, Thomas RiedI, and August Vanderbeek

Human Information Processing Laboratory, Department of Psychology, New York University, New York,New York 10003

Received April 23, 1987; accepted August 19, 1987

To determine the limits of human observers' ability to identify visually presented American Sign Language (ASL),the contrast s and the amount of additive noise n in dynamic ASL images were varied independently. Contrast wastested over a 4:1 range; the rms signal-to-noise ratios (s/n) investigated were s/n = 1/4, 1/2, 1, and - (which is used todesignate the original, uncontaminated images). Fourteen deaf subjects were tested with an intelligibility testcomposed of 85 isolated ASL signs, each 2-3 sec in length. For these ASL signs (64 X 96 pixels, 30 frames/sec),subjects' performance asymptotes between s/n = 0.5 and 1.0; further increases in s/n do not improve intelligibility.Intelligibility was found to depend only on s/n and not on contrast. A formulation in terms of logistic functions wasproposed to derive intelligibility of ASL signs from s/n, sign familiarity, and sign difficulty. Familiarity (ignorance)is represented by additive signal-correlated noise; it represents the likelihood of a subject's knowing a particularASL sign, and it adds to s/n. Difficulty is represented by a multiplicative difficulty coefficient; it represents theperceptual vulnerability of an ASL sign to noise and it adds to log(s/n).

INTRODUCTION

How good must the quality of visual information be in orderfor images to be useful? Visual scientists would like to knowthe answer to this question in order to describe the humanvisual system better. Engineers and image-processing-sys-tems designers would like to know so that they may build themost effective systems. Clearly, this question has bothpractical and theoretical importance.

From the practical point of view, the popularity of movies,television, and video games attests to the fact that peoplelike to watch dynamic, electronically or optically trans-formed two-dimensional images, sometimes even in prefer-ence to natural scenes! Currently, most of the effort devot-ed to the technology of video and image processing is direct-ed at producing images of the highest possible quality,requiring high information transmission rates. The objec-tive is high-fidelity reproduction of original images.

Aside from aesthetic considerations, high image qualitymay not be necessary for successful performance on manytasks such as visual inspection, motor control, and commu-nication. An indication of how low the requirements may befor a successful execution of many tasks is suggested by anumber of recent studies, including those of the capabilitiesof visually impaired patients.1 4 These findings are sup-ported by several studies of the abilities of normals to inter-pret static low-bandwidth images2 ,

3,5-8 and by research on

American Sign Language (ASL) communication with lowbandwidth.9-l 6

From a theoretical point of view, researchers who intendto characterize and model the visual perceptual system areinterested in what information in images is used by thevisual system for different tasks. The simplest theory of

perceptual communication is based on the assumption thatthe information is extracted from images by an optimalreceiver using feature detectors similar to templates(matched filters). The performance ought to be character-izable within the classical information-theoretic framework(e.g., see Refs. 17 and 18). In particular, the amount ofinformation should be a function of the signal-to-noise ratio.

In the present study we examine the amount of informa-tion necessary for visual communication with American SignLanguage (ASL) used by the deaf community. A significantproportion of the deaf community uses manual communica-tion forms such as ASL. ASL is a complex language inwhich meaning is conveyed by gestures and motions that areconcatenated under syntactic constraints. For the purposeof this work it is important to note that the potential ofcommunicating with ASL is similar to that of speech. Forexample, as a language in use for everyday communication,ASL is used by the deaf community at the same rate ofcommunication as a spoken language.' 9

We assume that the amount of useful information can beinferred from the level of performance on ASL communica-tion tasks. Given this assumption, there are at least twoways to determine the limits of necessary image quality.One way involves reducing the amount of information in theoriginal image by filtering or removing information. Thestudies referred to above all use this method. The secondapproach involves adding noise (i.e., random signals) thatmask image information. Although this approach has beenwidely used in audition,2 0 noise has been used with dynamicASL images only in a study of location discrimination. 2 ' Inthe present study, we use additive noise to reduce theamount of information in images of ASL signs in order to

0740-3232/87/122355-11$02.00 © 1987 Optical Society of America

Pavel et al.

Page 2: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

2356 J. Opt. Soc. Am. A/Vol. 4, No. 12/December 1987

assess whether the signal-to-noise ratio describes the perfor-mance of ASL signers (the signal-to-noise hypothesis) orwhether the noise and the signal combine in some morecomplicated way to determine ASL intelligibility.

First, we show how the signal-to-noise ratio (s/n) appliesto theoretical descriptions of receiver performance and per-ception. The performance of a communication system ischaracterized by the probability of correctly identifyingmessages as a function p of the signal rms power s and thenoise rms power n. For a large class of realizable optimalreceivers (e.g., matched filters), the degrading effect of addi-tive noise on receiver performance can be expressed in termsof a monotonic function f of the ratio of the signal power tothe noise power:

Probicorrect detection} = p(s, n) = f(s/n) (1)

for noise with a constant spectrum. The consequence of Eq.(1) is that increasing both the noise and the signal power bythe same factor will not change the probability of correctdetection. We note in passing that the signal-to-noise in-variance is also consistent with the classical information-theoretic analysis of transmission systems in which the ca-pacity of a channel is a function of the signal-to-noise ratio.22

Perceptual processes may also be viewed and studied ascommunication systems. Empirical findings in psychologi-cal research indicate that the addition of noise to stimulifrequently results in signal-to-noise invariance. Such in-variances enable us to model perceptual processes as optimalreceivers.23 In human signal detection, performance also ischaracterized by the probability of obtaining a correct re-sponse. The signal-to-noise invariance is expressed as aconstant ratio

s/n = kP,

derived, based on the empirical results, to predict intelligi-bility from the signal-to-noise ratio, the difficulty of individ-ual ASL signs, and the subject's skill.

Speech intelligibility refers to the probability of correctlyidentifying spoken messages, usually isolated words. Thestimuli used to measure speech intelligibility have beencarefully chosen and standardized so that this measure willpredict the performance of audio communication systems.Similarly, it would be useful to develop an intelligibilitymeasure for ASL that is defined as the probability of correct-ly identifying isolated ASL signs. This intelligibility mea-sure could then be used to predict the performance of ASLcommunication systems.

Speech intelligibility has been found to depend on thesignal-to-noise ratio. For example, Hawkins and Stevens2 0

found that an increase in the power of background noisemust be accompanied by a proportional increase in speechpower in order to maintain constant intelligibility. Conse-quently, the effectiveness of audio communication systemscan be described by their signal-to-noise ratios.

To determine whether the signal-to-noise invarianceholds for the ASL intelligibility, we must first define signaland noise power for images and video signals. We must thendevelop an intelligibility measure for ASL. With thesetools, we can test the hypothesis that the intelligibility ofASL degraded by additive noise is a function of the signal-to-noise ratio. In practice, signal-to-noise invariance en-ables us to predict intelligibility of ASL when it is degradedby noise. Theoretically, the relationship between ASL in-telligibility and noise presents a useful way of describing theinformation in the images and consequently of characteriz-ing, in informational terms, the process of perceiving ASL.

(2)

where kP is a constant that depends on p but is independentof s and n. When Eq. (2) holds for all values of p, then theprobability of detection is a function of the ratio s/n. Withrespect to the signal-to-noise invariance expressed in Eq.(1), taking the inverse of both sides of Eq. (1) gives kP =

f '(p), where f is the monotonic function from Eq. (1).Similar invariance is observed in experiments in which

subjects are asked to detect incremental intensity signals AIon backgrounds I. In these experiments, the probability ofdetection p is said to follow Weber's law24' 25 whenever thefollowing holds: l/I = kp. Alternatively, Weber's law maybe regarded as a case in which the additive background noiseis simply proportional to the background intensity.

In psychoacoustics, the signal-to-noise ratio is a good de-scription of the human ability to detect auditory signals(e.g., pure tones) in noise. Signal-to-noise invariance holdsalso in visual masking tasks in which human observers de-tect the presence of simple visual patterns on a noisy back-ground.26 One goal of this work is to determine whethersignal-to-noise invariance holds for complex visual taskswith dynamic stimuli.

We proceed as follows. First, we describe the develop-ment of an intelligibility test and its use to measure intelligi-bility of ASL for various combinations of signal and noiselevels. The results of this experiment are used to test thesignal-to-noise hypothesis and to describe the empirical re-lationship between the signal-to-noise ratio and intelligibil-ity by an explicit function. In the final section a model is

METHOD

In this section we describe the construction of the intelligi-bility test, digital processing and stimulus generation, andthe procedure for the experimental testing of ASL intelligi-bility.

ApparatusThe recording and testing apparatus is outlined in Fig. 1. Avideo camera (JVC s100u) was used to record sign sequencesmonochromatically on a video cassette recorder (Sony BetaMax SLO 323, Beta I format). Images were digitized bytransferring them first to a video motion analyzer (SonySVM 1010), which permitted individual frames to be ac-cessed. An image processor (Grinnell GMR 27) then digi-tized the image frames to a spatial resolution of 512 X 512with 8 bits of (nominal) luminance resolution. Frames wereuploaded to a digital computer (VAX 11/750) and subse-quently processed by using the HIPS system27'28 runningunder the UNIX 4.1bsd operating system. The individualimages (frames) were reduced in x and y pixel dimensions bya factor of 4 to produce the source frames from which allstimuli were derived.

After processing, the images were downloaded from thecomputer to the video cassette recorder (VCR). The VCRwas used to drive the monitor (Koyo 9 in. TMC-9M withstandard NTSC B&W raster) that displayed the signs to thesubjects (see Fig. 2). Analog contrast and brightness wereset to minimize nonlinearity, saturation, and cutoff; settings

Pavel et al.

Page 3: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2357

II ~ IID+.I'A h i~121-

CPU

'VCR

S IP

Fig. 1. The production of test stimuli. The signer is illuminated by optimally located lamps and photographed through a 12 in. X 18 in. (30.5cm X 45.7 cm) aperture (A) by a video camera (C) (located at D = 4.44 m) and recorded on a VCR. The output of the VCR is converted into amachine-readable format by a Grinnell image processor (IP). A DEC VAX 11/750 computer is used to crop, subsample, and add noise to theimage, which then is reconverted into video format by the IP, rerecorded on the VCR, and viewed by subjects on location on a monitor (M)through a viewing hood (V).

Fig. 2. Single frames taken from the ASL sequence "animal" withadded noise. (a) s/n = a; (b) s/n = 1; (c) s/n = 0 (pure noise).Because the signal is correlated between frames and the noise is not,the single-frame static illustration (b) appears to be of a lower imagequality than the dynamic images viewed by the subjects.

were checked daily and maintained with a photometer. Aviewing hood was attached to the monitor to minimizescreen reflection and ambient light and to control the view-ing distance. Subjects viewed the images at a distance of 56cm from the video screen, resulting in a visual angle of thestimulus field of approximately 4.4 deg horizontally and 6.6deg vertically.

The processing of the stimuli consisted of digital prepro-cessing followed by the generation of sequences of test stim-uli with specified contrasts and added noise. The original(512 X 512 pixel) images were cropped and reduced by blockaveraging to produce images measuring 96 X 64 pixels.

These new individual frames were then concatenated to bepresented as 30-frame/sec sequences 2, 2.5, and 3 sec longbecause the particular signs have different durations. Theactual frame was reenlarged by a factor of 2 and consisted of192 scan lines produced by two interlaced fields.

The first step in generating the stimulus sequences was todetermine the actual rms contrast of each sequence. Theactual rms contrast Ck of the kth frame is defined as the rmsdeviation of the pixel luminance from the mean pixel lumi-nance of that frame, divided by the frame's mean luminance:

m 1/2

E (L k(l ) - j '

ek~~~m-1)uk2 , ~~~~(3)

where m is the total number of pixels in each frame (m =rows X columns), Mk(i) is the luminance value of pixel i inframe k (k = 1,.. ., K), and Vk = I Vkl(i)/m is the averagepixel intensity in the kth frame. The average actual rmscontrast C of a sequence of K frames is the average of 0

k overthat sequence:

1Ko = K Eok-

k=1

addnoise

Pavel et al.

To"

Page 4: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

2358 J. Opt. Soc. Am. A/Vol. 4, No. 12/December 1987

Because the actual contrast of each ASL sequence is slightlydifferent, we will use the term contrast to designate thenormalized contrast C, where C is defined as unity for theoriginal sequence. Thus the luminance of pixels in an ASLsequence with contrast C is determined from the originalASL sequence by

Vk(L) = Vk + C[Vk(i) - Uk]-

The additive noise used in this experiment was distribut-ed normally (Gaussian noise) with a uniform power spec-trum over the entire range of spatial and temporal frequen-cies, limited only by the corresponding sampling rates; 14cycles/deg of visual angle and 15 Hz. The noise rms valuesN were determined from the actual rms contrast C of thesequence and the desired signal-to-noise ratio s/n:

.c_ (4)s/n

The noise was generated by sampling from a normal distri-bution with a mean of zero and a standard deviation RI.This noise was then algebraically added, pixel by pixel, tothe intensity values of the pixels of the original image.

Our image-processing system has an internal intensityrange of 0-255. The ranges of contrast and s/n used werechosen to maximize the probability that the signal with noiseremained within the allowable range 0-255 of pixel valueswhile utilizing the largest possible dynamic range. This wasof concern because saturation and cutoff (values outside therange 0-255) produce distortion that is difficult to charac-terize in terms of additive noise. None of the sign sequencescontained more than 3% saturated pixels.

Intelligibility Test DevelopmentThe intelligibility test was designed to measure the ability ofASL signers to identify individual signs isolated from con-text. Identifying isolated signs is a stringent test of perfor-mance because in natural ASL conversation, signers dependgreatly on context. What follows is a brief description of theintelligibility test development process.

1. Stimulus SelectionInitially, 112 candidate signs were selected from a previouslyconstructed list of 300 ASL signs that were designed torepresent various classes of words as well as a range of handshapes, locations, and movements. An attempt was made toavoid unusual or ambiguous signs.

2. Preselection of 88 Test SignsIn a small preselection experiment, two expert observersviewed each sign, without noise and with normal contrast.They were asked to indicate in writing what sign had beenpresented. Signs to which both subjects responded incor-rectly were eliminated from the test.

3. Difficulty RatingIn order to produce groups of signs of approximately equaldifficulty and noise susceptibility, a preliminary intelligibil-ity test was run. The stimuli in this preliminary test werethe images of the preselected ASL signs. Digitally generat-ed Gaussian noise was added to the signs (on a per-pixelbasis) such that the s/n was 1/2, and the original contrast wasunchanged (C = 1). Two subjects were asked to identifyeach of the signs. Each stimulus was then assigned an ordi-nal difficulty score according to the performance of the twosubjects as follows: (1) both subjects correct (easy), (2) onesubject correct (difficult), and (3) both subjects incorrect(most difficult). The difficulty scores were used to dividethe stimuli into 11 blocks of eight signs in order to equalizethe difficulty of each block. Three other factors were bal-anced (as much as possible) between blocks: symmetry (i.e.,are the left- and right-hand shapes and movements thesame?), location (where in space is the sign produced?), andhand shape.

4. Final Sign SetThe resulting test stimuli consisted of 88 common ASL signs(one sign was repeated). There were 43 nouns, 30 verbs, 12adjectives, 1 adverb and 1 conjunction. The complete set isincluded in Table 1. Typical of the nouns, verbs, and adjec-

Table 1. Representation of the Difficulty of ASL Signs

Stimulus (ASL) Preliminary Difficulty Word Frequency Probability Correct No. of Trials Difficulty (loge)

AccidentAnimalAppleBearBecauseBehindBelieveBoringBossBreadBugChallengeCheeseColorCopCountryCriticizeDailyDeafDisbelieveDon't-want

123211111121311312121

33689

57883258200

55

414

369

14115

3244

122123

489

92.8682.1478.57

100.00100.00

7.14100.00

0.0085.7123.0850.0085.7192.86

100.00100.0092.8650.0042.86

100.0092.3178.57

142814131414141414131414141314141414141314

0.552-0.380

0.141<-0.435

<0.0980.311

<-0.397>0.765

0.0310.0071.1250.724

-0.142<-0.435

<0.0980.552

-0.2620.496

<-0.5950.571

-0.552

Pavel et al.

Page 5: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2359

Table 1. Continued

Stimulus (ASL) Preliminary Difficulty Word Frequency Probability Correct No. of Trials Difficulty (loge)EmphasizeEyeFinishFlagFlowerFollowFootballFridayGirlGoodGuiltyHomeHospitalImproveJumpKillLeaveLetterLousyLoveMachineMemberMotherNoonOurOwePaperPast/agoPayPennyPlanPourPreachPregnantPunishReadRedRelaxSchoolScrewdriverSergeantSecretSheepShortSitSorryStartSufferSummerTalkTelegraphTemptThinkTobaccoTomatoTrainTreeUglyUncleUnderstandWeekWifeWolfWorldWrestlingYesterday

111212212312212331311211111223121321111133222112212132211111133112

20122391623973660

22080729

547110392463

20514512

23210313721625

125210

15727117225

205'9

883

17319719

49250

7823

2126748

15433

134154212

433199

83592157

137275228

6787

183

0.00100.00100.0085.7135.7171.4378.5714.2978.570.000.00

100.0078.57

100.0085.717.14

35.7192.86

100.0092.8671.43

100.00100.0021.43

100.0085.71

100.0085.717.14

92.86100.0064.297.14

92.3192.867.14

61.5457.1478.5714.2971.4371.4314.29

100.0028.57

100.00100.00100.0078.577.14

42.8678.5792.8678.5735.7185.7192.8628.57

100.0092.86

100.00100.00

0.0092.3150.00

100.00

141414141414141414141414141414141414131414131414141414141414141414131414131414141414141314131414141414141414141414141414141414131414

>0.765<-0.595

<0.0980.724

-0.1300.9200.1410.1390.834

>0.765>0.765

<-0.5950.141

<0.0980.0310.311

-0.130-0.516<0.099

0.552-0.466<0.099<0.098

0.029<0.098

0.724<-0.595

0.0310.311

-0.142<0.098

0.3450.3110.8280.5520.3111.0210.3670.1410.1390.9200.9200.832

<-0.435-0.057

<-0.595<-0.595

<0.0980.8340.3110.4960.375

-0.1420.1410.5630.7240.552

-0.057<0.098-0.142<0.098<0.098>0.765-0.124-0.262<0.098

Pavel et al.

Page 6: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

2360 J. Opt. Soc. Am. A/Vol. 4, No. 12/December 1987

tives used were, respectively, girl, cheese, and tomato; sit,talk, and tempt; and good, lousy, and ugly. A native ASLsigner was videotaped signing the isolated ASL signs. Thesigns were performed without lip movement or other facialexpression in order to provide an estimate of pure ASLcommunication (even though facial gestures usually accom-pany ASL). The same starting and final positions (armsfolded) were used as context for each isolated sign.

Stimulus Parameter DeterminationIn order to determine the range of contrasts to be used forthe study, a preliminary contrast-sensitivity test was run.The test signs were informally presented at a variety ofcontrasts C without added noise. Subjects performed athigh levels even with C equal to only 1/16.

Stimulus Block OrderingSubjects viewed a stimulus tape on which all the experimen-tal stimuli were displayed in a predetermined order. Eachstimulus tape consisted of nine blocks of the main 3 X 3experimental conditions and two blocks of additional condi-tions (zero-noise control and an extra contrast condition).Each block consisted of eight signs. Each main block wasrecorded at one of three contrast levels and one of threesignal-to-noise ratios, thus forming a 3 X 3 design. Themain s/n conditions were produced by the following combi-nations of signal and noise (s, n) contrasts:

s/n = 1.00: (0.5/0.5), (0.25/0.25), (0.125/0.125);

s/n = 0.50: (0.5/1.0), (0.25/0.50), (0.125/0.250);

s/n = 0.25: (0.5/2.0), (0.25/1.00), (0.125/0.500).

The signal and noise levels were chosen to avoid both largevalues of s + n, where saturation would occur, and low valuesof s, where intensity quantization and other extraneoussources of system noise would become significant. Of thetwo additional blocks, one displayed a pure signal at a signalcontrast of 1 (original contrast). When no noise was addedthe actual s/n - 64, obtained by careful measurement,2 9 wasdesignated as s/n = a. This block provided a performancebaseline. The other block was a fourth contrast conditionfor s/n = 1 with C = 1.

In order to counterbalance any possible systematic effectof block difficulty on performance, three different VCRtapes were created and used in the study. Each block ofsigns was produced with a different contrast level on eachtape. Thus, at each s/n level, all comparisons of differentcontrast levels were completely balanced; that is, each con-trast level was tested with precisely the same ASL signs aswere the other two contrast levels, and every sign was seen atall three contrast levels (by different subjects).

At all contrast levels and s/n ratios, including s/n = a, thesignal was quantized to 16 intensity levels so that therewould be no confounding of the number of intensity levelswith the range of signal contrast. The overall stimulus s + nwas quantized at 256 levels.

SubjectsThe subjects were 14 deaf, fluent-signing adults who usedASL as their primary mode of communication. Four sub-jects were native signers who learned ASL as their firstlanguage, seven subjects first learned sign language before

the age of 7 years, and three subjects learned sign languagebefore the age of 20 years. Of the five males and ninefemales involved, seven were born deaf. The other sevensubjects lost their hearing no later than the age of 3 years.The average age of the subjects was 60 years, with the agesranging from 40 to 76 years. The subjects' occupationsincluded those of housewife, architect, printer, proofreader,and ASL teacher.

ProcedureSubjects first answered a written questionnaire that in-quired about their sign-language background. The subjectsthen viewed an instructional video tape. On this video tape,a native deaf signer explained the nature of the test materi-als and response requirements. The video tape also showedan example of visual noise. The brightness and contrastcontrols were set identically for all subjects. Subjects wereinstructed first to imitate the sign presented and then towrite an English gloss for the perceived sign on the scoresheet provided. Any questions the subjects had were an-swered by the experimenter, who was fluent in ASL. Ses-sions lasted approximately 40 min.

Scoring ResponsesThe subjects' written responses were scored for accuracy.The nature of ASL requires careful interpretation of thesubjects' responses. In particular, certain signs can elicitdifferent English word responses, analogous to synonyms inEnglish. For example, when shown the sign "relax," a sub-ject can correctly respond with any of various different writ-ten answers: "relax," "content," "satisfy," and "relieve."Each of these responses is equally acceptable. In otherinstances, one articulated sign can have two totally differentmeanings. As with homonyms in English, the context inwhich the sign is used clarifies its meaning. For example,one articulated sign can elicit the written responses "pep-per" or "preach." Either of these answers would be scoredas correct for that context-dependent sign. All synonymand homonym responses were scored as correct. Addition-ally, over the course of time, some signs have changed theirmeaning, while retaining their same form. For example, thesign articulation that presently means "bread" formerlymeant "how." Both English responses, "bread" and "how,"were scored as correct.

RESULTS AND DISCUSSION

Overall PerformanceThe subjects' performance is summarized in terms of theproportion of correct responses for each stimulus item andeach contrast level and noise level. The overall proportionof incorrect identifications was 32% in the total of 1181 trials.A few of the trials were excluded from the analysis eitherbecause the observer did not see the stimulus altogether orbecause the response was uninterpretable. Of the incorrecttrials, 58% were incorrect words (false alarms), and the re-maining 42% were omissions. Thus our subjects were some-what conservative; had they guessed more often, they mighthave slightly increased their scores.

The control block of trials with items at the original con-trast with no noise added was run to ensure that the experi-

Pavel et al.

Page 7: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A - 2361

Table 2. Summary of Intelligibility Results (AllStimuli)

% Correct Response for the FollowingSignal-to-Noise Ratioa

Contrast 0.25 0.50 1.00 Infiniteb

1 95.83 (72) 92.31 (104)0.5 17.86 (112) 79.46 (112) 91.96 (112)0.25 27.68 (112) 80.18 (111) 86.61 (112)0.125 24.32 (111) 78.38 (111) 87.50 (112)

AverageProbab. 0.23 0.79 0.90

a The number of trials is given in parentheses next to each value.b The actual maximum s/n was -64. The number of trials is given in

parentheses next to each value.

a

81CSH

0

U

0.0 0.2 0.4 0.6 0.8 1.0 1.2

SIGNAL-TO-NOISE RATIORMS

Fig. 3. Probability of correct response (averaged over subjects) as afunction of the signal-to-noise ratio. Within each vertical group ofpoints, the normalized contrast C of the images varies. The smoothcurve is derived from Eq. (13).

mental procedure did not impair the intelligibility of thestimulus materials. The results (Table 2) indicate that sub-jects were able to perform at 92% correctness, which is slight-ly better than is usually observed with such tests.1 3"14 In theexperimental conditions, group performance ranged from alow of 18% to a high of 92%, which indicates that the experi-ment successfully spanned a wide range of stimulus condi-tions.

The Effect of the Signal-to-Noise RatioTable 2 and Fig. 3 indicate that for each contrast level, theprobability of correct response is a monotonically increasingfunction of the signal-to-noise ratio. The contrast alonedoes not seem to have any systematic effect. The largestdifference (10%) that was due to overall contrast (with s/nheld constant) occurred at the lowest s/n levels, but theeffects of contrast on performance were not systematic.

We tested the signal-to-noise hypothesis by using a x2test. The test was based on the consequence of the signal-to-noise hypothesis that the probability of correct identifi-cation should be the same for the same signal-to-noise ratio.In the main 3 X 3 study, there were three different levels of s/n, each composed of three different combinations of signaland noise. For each s/n, we computed the average intelligi-

bility and used that as the expected value for computing x2.The x2 for each s/n has two degrees of freedom, and the sumof all x2 then has six degrees of freedom. The resulting,statistically insignificant, x2 = 5.08 is expected under thesignal-to-noise hypothesis. For comparison, we computedx2 on the basis of other hypotheses but could not find analternative that we would not reject. Over the range of s andn values studied, only s/n (and not s or n individually)determined performance.

Limits of Visual Communication: A FunctionalFormulationHaving confirmed that the signal-to-noise ratio is the mostimportant predictor of performance, we propose an addi-tional analysis to demonstrate how this performance is de-rived from the interplay of three factors: (1) s/n, (2) thesubjects' capabilities, and (3) the difficulty of the ASL signs.

To begin, consider the fact that observers' performance islimited at both low and high signal levels. The low-signallimitations, which are primarily due to the observers' inabil-ity to see the images clearly, have been traditionally charac-terized by a constant internal noise that is added to thesignal (e.g., Ref. 26). In what follows we will ignore thisnoise, since we have not measured intelligibility at low signallevels. The high-level limitations may be caused by a num-ber of factors, including the observers' expertise, their cogni-tive abilities, the implementation of the signs, and the quan-tizing noise. For the sake of uniformity we developed amodel in which the the high-signal limits are also represent-ed by additive noise.

Thus the following development of a model is based ontwo simplifying assumptions: (1) the variances of all noises,internal and external, simply add together to produce theeffective noise variance and (2) the probability of a correctidentification is directly proportional to the effective signal-to-noise ratio, that is, the ratio of the signal to the effectivenoise. The second assumption can be elaborated with twoadditional parameters in order to fit the data. We assume adirect proportionality because it considerably simplifies theresulting functional form. Other assumptions could equallywell have been made.

Under the first assumption (that an observer's limitationscan be represented by additive sources of internal noise ni),using the terminology of Pelli,26 we define the effective noisene as the sum of the internal and external noises ne2 = n2 +ni2. The effective signal-to-noise ratio s/ne that determinesperformance is then the ratio of the signal to the effectivenoise: s/ne = s/(n 2 + ni 2)1/2.

The assumption that the performance is proportional tos/ne, together with the empirically observed upper bound onperformance as s increases, determines an asymptotic limitof s/ne. Let the upper bound on identification performancebe 1/k, k > 1. The limit

lim 5 s-e (n2 + n 2)1/2 - '

indicates that, for large signal levels, the magnitude of theinternal noise is proportional to the signal strength, ni = ks.The effective noise then becomes

ne2 = n 2 + (ks) 2 . (5)

-o[

Pavel et al.

Po

1o

To

1!

Page 8: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

2362 J. Opt. Soc. Am. A/Vol. 4, No. 12/December 1987

The second assumption, that the performance p is propor-tional to the effective signal-to-noise ratio, can be written as

p(s, n) = (-2\ne

s2 1

n 2 + (kS)2 k2 + (nA2\s/

Although p could, in principle, be a function of both s and n,in this formulation it is a function of only one variable, theexternal signal-to-noise ratio. When s/n is expressed interms of its logarithm,

"kn} k + EC21Og(S/n)

where k > 1 is a real constant, then the performance functionin this equation has the basic form of the logistic function.(Note that the logistic distribution function is quite similarto the Gaussian distribution function but has a more conve-nient and more appropriate functional form for the presentanalysis.) To characterize actual data, two additional pa-rameters are required: a parameter ae for scaling and aparameter a for shifting. Introduction of these parametersinto Eq. (7) yields

= ~~~~~~~~~(8)kni k + jalog(s/n)-l(

where ae > 0, 1, and k > 1 are real constants and P is the theprobability of correct identification of an ASL sign. [With-in a probabilistic interpretation of P (i.e., logistic distribu-tion of a random variable), : and a will be closely related tothe mean (expected value) ,u and the variance a-, respective-ly.]

The origin of internal noise and its dependence on signallevel deserves a discussion. We consider two factors (quan-tization error and ignorance) that are modeled well by sig-nal-correlated noise.

Quantization NoiseThe images in our experiment were quantized to a fixednumber of levels (in particular, 16), and then the signalamplitude was amplified. This procedure was used in orderto ensure that the luminance signal was identical at all signallevels. It is customary and convenient to describe such errorintroduced by a quantizing process as additive, quantizingnoise. For a given quantized signal, the absolute amplitudeof the quantizing noise, i.e., the difference between the origi-nal signal and the quantized signal, will increase with theamplification factor but will always be proportional to thesignal level.

By using standard statistical assumptions, we estimatedthe average signal-to-noise ratio in our stimuli that was dueto quantizing noise. The resulting lower bound on the rmss/n ratio is approximately 6.9 (46.9 in power units), which iswell above the highest signal-to-noise ratio used in our ex-periment. The average effect of the quantizing noise wouldbe negligible in the experiments.

Representing Ignorance as NoiseA second component of internal noise arises from a combina-tion of an observer's unfamiliarity with particular ASL signs,with their implementation in our test, or with their Englishglosses. To represent the observer's ignorance as additive

noise, its amplitude must be made proportional to the ampli-tude of the signal so that ignorance remains independent ofthe signal level.

In both the quantization and ignorance components of theinternal noise, it is the combination of the assumption ofnoise additivity with the assumption of the invariance of theperformance with the signal level that requires proportional-ity between the noise amplitude and the signal level. Thisproportionality is embodied in the constant k. A multipli-cative combination rule would not require dependency ofthe noise on the signal level, but the resulting proportion ofsignal variance that is due to noise would be the same. Thereason for choosing an additive representation is that, in thepresent experiments, we investigated the effects of additiveexternal noise and compared the additive effects of thisexternal noise with the internal noise.

Detection performance has been characterized tradition-ally by the slope and the location of the corresponding psy-chometric functions. The psychometric functions are, inturn, assumed to describe probability distributions of un-derlying random variables. To interpret P [Eq. (11)] interms of an underlying probability distribution, we considerthe conditional probability that the subject will make a cor-rect response, given that he or she is familiar with the partic-ular ASL stimulus; i.e., we consider k. This may be inter-preted as a cumulative probability-distribution function of arandom variable that determines the intelligibility of ASLstimuli. With this interpretation of P, the parameters a anda represent the standard deviation a = 1/1.814a and themean At = p/1a, respectively, of the underlying random vari-able. The actual values of ji and a are expressed in units oflog(s/n), and they do not depend on k.

The constants 1, a, and k were computed by using thefunctional form of Eq. (8) in conjunction with a maximum-likelihood estimation technique. Note that these estimatesare based on the rms values of the ratio s/n. The resultingparameter values are

a = 4.476,

l3 = 5.033,

corresponding to the following distributional parameters:

= -1.124,

= 0.123,

k = 1.125.

The psychometric function that is based on these parametervalues was plotted in Fig. 3. It shows that = -1.124 is thevalue of log(s/n) at which 0.5 of the asymptotic performanceis reached, that the steepest slope is a, and that the asymp-totic performance level is k.

The convenience of this a, a, k representation justifies thealgebraic development; our data are not reduced by thelogistic representation. However, the logistic representa-tion has another important advantage. Because the maingoal of this experiment was to determine the invariance ofperformance with a constant signal-to-noise ratio even asthe stimulus contrast varied, it was not practical to testempirically each ASL sign in each condition. Thereforedetermining the difficulty of individual ASL signs requires afunctional model to relate signs that were never tested in the

Pavel et al.

Page 9: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2363

same condition. For example, 29 of the 87 distinct ASLsigns (one sign was repeated) were either always or neverrecognized. Did recognition failures occur because ASLsigns were difficult or because they occurred only in difficultconditions? Because conditions differ only in objectivelymeasurable additive noise, the parameters of the logisticrepresentation can be estimated and used to estimate thedifficulty of individual signs and conditions (noise) evenwhen no signs occur in more than one condition; that is, thelogistic representation permits the segregation of the itemdifficulty and the condition difficulty in a seemingly intrac-table case. (See the Discussion section for full details.)

Effects of Item and Subject Differences: SubjectDifferencesA number of post hoc analyses were performed to examineother possible sources of variability. These analyses soughtto determine the source of differences in difficulty betweenstimuli and to estimate the effects of differences in skillsbetween the individual observers.

Individual DifferencesAn analysis was performed to determine whether there wasevidence in our performance data for an interaction betweensubjects and the effects of the signal-to-noise ratio. Eachsubject's performance was analyzed separately, and the re-sults were tested for interaction by using a two-way analysisof variance. There was no significant interaction, and wecould not find any systematic effect for the individual sub-jects. All our subjects were affected quantitatively in thesame way by stimulus degradation.

Word FrequencyThe performance in auditory intelligibility experimentswith unspecified word sets depends on the prior frequency ofthe words in natural language,3 0 high-frequency words beingidentified at lower s/n levels. By analogy, we would expectthat a sign with a high frequency of usage in ASL may beidentified at lower s/n than an uncommon sign. To examinethis sign-frequency hypothesis, it would be necessary to havenorms for the frequency of usage of signs. Such informationis at present unavailable. Therefore we assumed that thefrequency of signs follows patterns similar to that of English.We compared the performance on each sign with the fre-quency of the nearest corresponding English word. Thescatter diagram of this comparison is shown in Fig. 4. Thecorrelation between the frequency and overall performanceis small (0.153) and not statistically significant. It may bethat the correlations are small in part because the signs inthe study all were relatively high-frequency signs. Confir-mation of this result awaits the availability of frequencynorms for ASL signs.

Removing Perfect StimuliA closer examination of the data revealed that 24 of the 88stimuli were correctly identified on every presentation. Toensure that the signal-to-noise-ratio results were not ob-tained by the particular mixture of difficult and easy stimuli,we repeated all the data analyses with those perfect stimuliexcluded. The results and the conclusion remained un-changed: the intelligibility score was determined by thesignal-to-noise ratio.

Effects of Item and Subject Differences: Item Differences

Components of Stimulus DifficultyUp to the preceding paragraph, we had assumed, in ouranalyses, that all signs were about equally easy or difficult toidentify. Even a superficial examination of the data, how-ever, suggests considerable differences among the signs.Some signs were recognized quite accurately, whereas otherswere not. The determination of the actual difficulty of eachindividual sign was not possible using the raw data becausethe different ASL signs were presented with different s/nratios. Here, we take advantage of the general form of thefunctional relationship between the probability of correctidentification and s/n to evaluate the difficulties of individ-ual stimuli.

There are at least two potential components of difficultyin identifying individual signs. One is observers' possiblelack of familiarity with some of the ASL signs. This effectcannot be eliminated by increasing the signal level or thesignal-to-noise ratio. This component was discussed earlierand was summarized by the constant k in Eq. (8). Thesecond component, a perceptual one, arises from the limitsof visual capabilities and information content of the images.The purpose of the following analysis is to describe thedifficulty of individual signs in terms of signal related pa-rameters.

Representation of the Difficulty of ASL SignsSince the signal-to-noise ratio was the predominant deter-minant of intelligibility, our hypothesis was that the percep-tual component of difficulty of an ASL sign could be ex-pressed as a parameter modifying the value of the physicals/n. There are several ways to define such a parameter (e.g.,additive to s/n, additive to s or n), and we explored severalreasonable formulations of item difficulty. The best-fittingrepresentation of sign difficulty consisted of a constant d(x)representing the sign difficulty that was subtracted from thelogarithm of s/n; alternatively, d(x) may be viewed as anefficiency factor that multiplies s, with the most difficultsigns being inefficient and requiring more s.

The reader may recall that the constant k in Eq. (8) repre-sented both the sign familiarity and the effects of quantizingnoise over the ensemble of signs. In order to characterize

8

81

0]

2

0o , 200 400 600 800 1000 1200 1400

WORD FREQUENCY

Fig. 4. Probability of a correct response (intelligibility) as a func-tion of the English word frequency for each of the ASL signs tested.For these data, the correlation between intelligibility and wordfrequency is 0.15, which is not statistically different from zero.

on0 0 0 0 0M0 0

ND 00 0 0

~at

-o o0

1O00

0

: o co II

Pavel et al.

Po

1o

1!

1�

Page 10: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

2364 J. Opt. Soc. Am. A/Vol. 4, No. 12/December 1987

signs whose probability of correct identification exceeded1/k, we computed the individual sign difficulty using Eq. (8)with k = 1. The intelligibility of an ASL sign x then dependson s/n and on the difficulty parameter d(x) as follows:

P x, s) = 1 (9)P(x, n) 1 + EC-alog(s/n)-d(x)]-fl

The resulting values d(x) (i.e., logarithm of the losses) areshown in Table 1. The fit of this model was assessed onthose signs whose probability of identification was less thanone and greater than zero. Each value of d(x) was used topredict the difficulty of the same sign in up to three differents/n conditions. The model was tested by using a x2 test; theresulting x2 = 24.2 had 21 degrees of freedom. Note that forthe stimuli that were either always or never correctly identi-fied (100% or 0% correct), we can determine only the upperand lower bounds of d(x), respectively. These bounds areidentified by the corresponding < and > symbols in Table 1.The reason that these bounds may differ for two stimuli withthe same probability of correct response is that they wereused with different s/n levels. For example, consider twosigns that were always correctly recognized; the sign shownin the low s/n condition will have a lower difficulty boundthan the one that was shown with a high s/n. Similarly, thedifficulty of the signs does not necessarily increase with theprobability of error. There are signs (e.g., "accident") thatwere identified with high probability (93%) whose difficulty(0.552) is higher than other signs (e.g., "behind") that werealmost never identified (7%) but whose difficulty is estimat-ed to be lower (0.311). The reason for this nonmonotonicityis that the former sign was presented in conditions with ahigher s/n ratio than the latter. It is worth noting that thistype of inference would not be possible without the model.

The important result of this analysis is that the effect ofthe perceptual component of difficulty can be viewed as areduction of the signal-to-noise ratio.

The Relation between s/n and Information CapacityThe most important result of this study is the confirmationof the signal-to-noise hypothesis. However, in the process,we measured the quantitative relationship between ASLintelligibility and the signal-to-noise ratio. Shannon2 2

proved that, under certain assumptions, s/n and the infor-mation capacity c of a channel are related by c = TWlog2[(s2 + n2 )/n2 ], where W is the spatial bandwidth of thesignal and the noise and where T is time. It is tempting tocombine the measured performance at various values of s/nwith Shannon's theorem to infer the noise of the humanperceptual channel and ultimately, thereby, its informationcapacity.

In the light of the apparent simplicity and directness ofShannon's theorem, it is instructive to confront the manyobstacles between the data and estimates of perceptual in-formation capacity. The critical value of s/n, at which per-formance begins to suffer as external noise is added, is oftenused to estimate the internal noise (e.g., Ref. 26). However,our data show that internal noise is negligible under theconditions of the experiment; otherwise, the ratio s/n wouldnot have been adequate to describe the data; for constants/n, performance would have suffered as s and n decreased.What we did observe is that, as s/n is increased, a criticalpoint is reached approximately when rms s/n = 0.5. At thispoint the internal perceptual noise equals the external noise.

However, this internal noise is the signal-correlated noisethat is described by the parameter k of Eqs. (5) and (6) andnot the signal-independent noise implied by Shannon'stheorem.

Quite apart from its relation to information theory, thenumerical value s/n = 0.5, as an estimate of equivalent inter-nal and external noise, must be interpreted with great cau-tion because the signal and the noise have different spectralpower densities. In our study, s and n represented the totalpower in the signal and the noise, respectively. When themasking effect of noise is band specific, that is, when noisemasks signals best if both s and n are in the same spatial-frequency band, then s/n has meaning only if both s and nhave the same spectrum. For example, at the given level ofs, we might have achieved the same performance reductionwith a much less powerful n if it had been spectrally betterplaced. The spectrally optimized noise would have yieldeda lower estimate of internal noise (higher critical s/n) andhence a correspondingly higher estimate of human perceptu-al capacity. To obtain a meaningful numerical value for thecritical s/n, it is minimally necessary to ensure that s and nhave the same spatiotemporal-frequency spectrum. Ulti-mately, it is better to divide the spatial-temporal frequencyspectrum into relatively narrow bands and to investigateperformance band by band rather than to coalesce the fre-quency bands in both the signal and the noise. Indeed, thisis the approach taken by Pelli3 l and Riedl32 and is the ardu-ous but only available path toward theoretically relating s/nto perceptual information-processing capacity.

SUMMARY AND CONCLUSIONS

(1) An intelligibility test was developed for video com-munication of ASL. The test consists of carefully selectedASL signs and a procedure for determining observers' re-sponses.

(2) When noise was added to ASL images, the intelligi-bility of ASL was predominantly determined by the signal-to-noise ratio, independently of the signal and the noiselevels within the range tested.

(3) A logisticlike function was proposed to characterizethe relationship between intelligibility and the signal-to-noise ratio. The logistic function was chosen on the basis ofalgebraic simplicity; other similar distributions might haveserved equally well.

(4) The difficulty of individual ASL signs can be repre-sented as consisting of two additive noise components: one,which is due to sign familiarity and quantizing noise, be-haves as signal-correlated noise (added noise that is propor-tional to the signal) and implies an upper limit in intelligibil-ity; the other, a perceptual-difficulty component, behaves asnoise added to log(s/n); it acts as an attenuation of thephysical signal and hence of s/n.

ACKNOWLEDGMENTS

The experimental work for this project was carried out at theHuman Information Processing Laboratory in the Depart-ment of Psychology of New York University and supportedby the National Science Foundation, Science and Technol-ogy to Aid the Handicapped grant PFR-80171189. Subse-quent analyses of the data and the preparation of the final

Pavel et al.

Page 11: Limits of visual communication: the effect of signal-to ...whipl/staff/sperling/PDFs/... · quently, the effectiveness of audio communication systems can be described by their signal-to-noise

Vol. 4, No. 12/December 1987/J. Opt. Soc. Am. A 2365

report were carried out jointly at New York University andat Stanford University, supported in part by U.S. Air ForceLife Sciences Directorate grants AFOSR 80-0279 and 85-0364. We appreciate and gratefully acknowledge the assis-tance that we received from many persons in the deaf com-munity and schools for the deaf, including Jerome Schein,formerly director of the Deafness Research and TrainingCenter at New York University; the staff of Public School 47;the New York Society for the Deaf; and the Hebrew Associa-tion of the Deaf. Nancy Frishberg provided essential guid-ance in the construction of the intelligibility test materials.We thank our patient deaf signer, Ellen Roth, and acknowl-edge the many useful comments of Michael S. Landy and theskillful technical assistance of Robert Picardi.

REFERENCES

1. G. E. Legge, G. S. Rubin, D. G. Pelli, and M. M. Schleske,"Psychophysics of reading. II. Low vision," Vision Res. 25,253-266 (1985).

2. D. G. Pelli, Institute for Sensory Research, Syracuse University,Syracuse, New York 13244-5290 (personal communication,1983).

3. D. G. Pelli and L. C. Applegate, "The visual requirements ofmobility," Invest. Ophthal. Vis. Sci. Suppl. 26, 57 (1985).

4. D. Regan and D. Neima, "Low-contrast letter charts as a test ofvisual function," Ophthalmology 90, 1192-1200 (1983).

5. A. P. Ginsburg, "Visual information processing based on spatialfilters constrained by biological data," Vols. 1 and 2 of doctoraldissertation (Cambridge University, Cambridge, 1978); also inLab. Rep. AMRL-TR-78-129 (Aerospace Medical ResearchLaboratory, Wright-Patterson Air Force Base, Ohio, 1978).

6. A. P. Ginsburg, "Specifying relevant spatial information forimage evaluation and display designs: an explanation of howwe see certain objects," Proc. Soc. Inf. Disp. 21, 219-227 (1980).

7. G. E. Legge, G. S. Rubin, D. G. Pelli, and M. M. Schleske,"Psychophysics of reading. I. Normal vision," Vision Res. 25,239-252 (1985).

8. G. S. Rubin and K. Siegel, "Recognition of low-pass filteredfaces and letters," Invest. Ophthalmol. Vis. Sci. Suppl. 25, 71(1984).

9. J. F. Abramatic, P. Letellier, and M. Nadler, "A narrow-bandvideo communication system for the transmission of sign lan-guage over ordinary telephone lines," in Image Sequence Pro-cessing and Dynamic Scene Analysis, T. S. Huang, ed. (Spring-er-Verlag, Berlin, 1983), pp. 314-336.

10. D. E. Pearson, "Visual communication systems for the deaf,"IEEE Trans. Commun. C-29, 1986-1992 (1981).

11. D. E. Pearson, "Evaluation of feature-extracted images for deafcommunication," Electron. Lett. 19, 629-631 (1983).

12. D. E. Pearson and H. Six, "Low data-rate moving-image trans-mission for deaf communication, in Proceedings of the Interna-tional Conference on Electronic Image Processing (Institutionof Electrical Engineers, London, 1982), pp. 204-208.

13. G. Sperling, "Bandwidth requirements for video transmission of

American sign language and finger spelling," Science 210, 797-799 (1980).

14. G. Sperling, "Video transmission of American sign language andfinger spelling: present and projected bandwidth require-ments," IEEE Trans. Commun. C-29, 1993-2002 (1981).

15. G. Sperling, M. Pavel, Y. Cohen, M. S. Landy, and B. J.Schwartz, "Image processing in perception and cognition," inPhysical and Biological Processing of Images. Rank PrizeFunds International Symposium at The Royal Society of Lon-don, 0. J. Braddick and A. C. Sleigh, eds., Vol. 11 of SpringerSeries in Information Sciences (Springer-Verlag, Berlin, 1983),pp. 359-378.

16. G. Sperling, M. S. Landy, Y. Cohen, and M. Pavel, "Intelligibleencoding of ASL image sequences at extremely low informationrates," Comput. Vision Graphics Image Process. 31, 335-391(1985).

17. A. E. Burgess, "Visual signal detection. III. On Bayesian useof prior knowledge and cross correlation," J. Opt. Soc. Am. A 2,1498-1507 (1985).

18. A. E. Burgess, R. F. Wagner, R. J. Jennings, and H. B. Barlow,"Efficiency of human visual detection," Science 214, 93-94(1981).

19. U. Bellugi and S. Fischer, "A comparison of sign language andspoken language," Cognition 1, 173-200 (1972).

20. J. E. Hawkins and S. S. Stevens, "The masking of pure tonesand of speech by white noise," J. Acoust. Soc. Am. 22, 6-13(1950).

21. H. Poizner and H. Lane, "Discrimination of location in Ameri-can sign language," in Understanding Language through SignLanguage, P. Siple, ed. (Academic, New York, 1978), pp. 271-287.

22. C. E. Shannon and W. Weaver, The Mathematical Theory ofCommunication (U. of Illinois Press, Urbana, Ill., 1949).

23. D. M. Green and J. A. Swets, Signal Detection Theory andPsychophysics (Wiley, New York, 1966).

24. E. H. Weber, De Pulsu, Resorptione, Auditu, et Tactu(Koehler, Leipzig, 1834).

25. R. S. Woodworth and H. Schlosberg, Experimental Psychology(Holt, New York, 1954).

26. D. G. Pelli, "Effects of visual noise," doctoral dissertation(Cambridge University, Cambridge, 1980).

27. M. S. Landy, Y. Cohen, and G. Sperling, "HIPS: a UNIX-based image processing system," Comput. Vision Graphics Im-age Process. 25, 331-347 (1984). (HIPS is the Human Informa-tion Processing Laboratory's image-processing system.)

28. M. S. Landy, Y. Cohen, and G. Sperling, "HIPS: Image pro-cessing under UNIX software and applications," Behav. Res.Methods Instrum. 16, 199-216 (1984).

29. Y. Cohen, "Measurement of video noise," Tech. Rep. (HumanInformation Processing Laboratory, New York University, NewYork, 1982).

30. I. Pollack, "Message uncertainty and message reception," J.Acoust. Soc. Am. 31, 1500-1508 (1959).

31. D. G. Pelli, "The spatiotemporal spectrum of the equivalentnoise of human vision," Invest. Ophthalmol. Vis. Sci. Suppl. 24,46 (1983).

32. T. R. Riedl, "Spatial frequency selectivity and higher level hu-man information processing," doctoral dissertation (Depart-ment of Psychology, New York University, New York, 1985).

Pavel et al.