lexical access and lexical decision: mechanisms of frequency sensitivity



JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 22, 24-44 (1983)

Lexical Access and Lexical Decision: Mechanisms of Frequency Sensitivity

BARRY GORDON

Baltimore City Hospitals and The Johns Hopkins University and School of Medicine

Three models of lexical access and lexical decision--the serial search model, the two-dictionary model, and a parallel-access, criterion-bias model--were tested in a large experiment (148 subjects, 458 words) comparing the effects of mixed- and blocked-frequency presentation on correct lexical decision times. Reaction times were faster for high-frequency words in the blocked, pure-frequency condition than in the mixed-frequency one; medium-frequency words showed less of a difference; and low-frequency words showed no appreciable difference at all. These results corroborate and extend Glanzer and Ehrenreich's (Journal of Verbal Learning and Verbal Behavior, 1979, 18, 381-398) empirical results with this paradigm. They strongly imply that changes in decision criteria underlie the reaction time differences that Glanzer and Ehrenreich and we found. This evidence places further constraints on theories of lexical access, constraints which may be more easily accommodated by parallel models than by serial ones. Glanzer and Ehrenreich's two-dictionary model is not supported by these data.

Experimental investigations and theoretical explanations of lexical access have usually addressed two classes of issues together. One class of concerns has been with the nature of the lexical informational codes, whether extracted from a presented word or inherent in the lexicon. The other class of questions has had to do with the mechanisms by which this information (irrespective of its nature) is used (cf. Coltheart, Davelaar, Jonasson, & Besner, 1977). Yet it is not only logically possible to separate these two types of questions, but it may be theoretically and experimentally revealing to do so. This article will consider just the possible mechanisms of lexical access, particularly those mediating lexical decision.

This paper is based in part on a doctoral dissertation submitted to The Johns Hopkins University. I thank my advisor, Alfonso Caramazza, and Rita Berndt, Renee Gordon, Bert Green, Howard Egeth, Steven LaPointe, Michael McCloskey, and Warren Torgerson for their guidance with the experiment and/or the manuscript. Two anonymous reviewers had insightful comments. I also thank Kevin Gallagher and John Lewis for programming assistance, and Clara Marin, Andrew Mead, and Jane Sellman for help with stimulus selection and/or subject testing. This work was supported in part by NIH (NINCDS) Grants NS16155 and NS14099 to The Johns Hopkins University. Address reprint requests to Barry Gordon, Department of Neurology, Johns Hopkins Hospital, 600 N. Wolfe St., Baltimore, Maryland 21205.

Theories of lexical access have generally drafted one of two mechanisms for service. In serial search models, the information extracted from the stimulus is sequentially compared with the lexicon. This serial search mechanism has been employed in various forms by many theories of lexical representation (e.g., Becker, 1976; Becker & Killion, 1977; Forbach, Stanners, & Hochhaus, 1974; Forster, 1976, 1978; Forster & Bednall, 1976; Landauer, 1975; Rubenstein, Garfield, & Rubenstein, 1970; Rubenstein, Lewis, & Rubenstein, 1971; Stanners & Forbach, 1973). When lexical access is viewed as a memory task, the serial search proposals share many of the assumptions of serial scan memory models (e.g., Sternberg, 1975). The contrasting parallel- or direct-access theories postulate that incoming information can be simultaneously compared with all relevant information in the lexicon. Morton's (1970, 1980) logogen model is one example; Coltheart et al. (1977) have proposed a direct-access model of lexical decision based on the logogen concept. Again, considering the broader memory aspects of lexical retrieval, Ratcliff (1978) and Murdock and his colleagues (Murdock, 1979; Metcalfe & Murdock, 1981) have discussed parallel-access models of memory which could conceivably also serve as the basis for lexical access.

0022-5371/83/010024-21 $03.00/0 Copyright © 1983 by Academic Press, Inc. All rights of reproduction in any form reserved.

Among the best-established and most critical experimental findings which these various theories have had to confront are the pervasive effects of word frequency. In the lexical decision task, word frequency has been the single strongest predictor of differences in reaction times among words, accounting for about 50% of the variance in reaction times in Whaley's (1978) study.

The serial search hypothesis accounts for the frequency effect in lexical decision by postulating that the scanning sequence is frequency-ordered (from highest to lowest) and that a correct word match terminates the search.
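The frequency-ordered, self-terminating search just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `serial_search_steps`, the toy lexicon, and its frequency counts are hypothetical and are not taken from any of the models cited; the number of comparisons stands in for search time.

```python
def serial_search_steps(lexicon_by_freq, target):
    """Count comparisons made by a frequency-ordered, self-terminating search.

    lexicon_by_freq maps each known word to a (made-up) frequency count.
    Returns the number of comparisons needed to find `target`, or None if
    the whole lexicon is scanned without a match (a nonword).
    """
    # Scan from the highest-frequency entry to the lowest.
    ordered = sorted(lexicon_by_freq, key=lexicon_by_freq.get, reverse=True)
    for step, word in enumerate(ordered, start=1):
        if word == target:
            return step  # a correct match terminates the search
    return None


# Hypothetical toy lexicon; the counts are illustrative only.
toy_lexicon = {"water": 1000, "house": 200, "garden": 50, "gazebo": 2}

high = serial_search_steps(toy_lexicon, "house")
low = serial_search_steps(toy_lexicon, "gazebo")
nonword = serial_search_steps(toy_lexicon, "blick")
```

In this sketch a higher-frequency word is always reached in fewer comparisons than a lower-frequency one, which is the sense in which the serial search hypothesis produces a frequency effect.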

Parallel-access models can incorporate frequency sensitivity in lexical decision in several ways. Coltheart et al. (1977) suggest that the threshold for a word's logogen response varies inversely with frequency; lexical decisions are then faster for high-frequency words than for low-frequency ones because the high-frequency word logogens reach threshold sooner. In the parallel-access model we will discuss later, the growth of representation strength is related to word frequency (familiarity); higher-frequency word stimuli will more quickly give the subject more information on which to base the lexical decision.

Glanzer and Ehrenreich (1979) have proposed yet another type of model to contrast with these two. Theirs differs from both of the others in that it says nothing about the way the decision itself is made. Instead, they postulate that subjects have two internal "dictionaries" of word-related knowledge at their disposal: one of these dictionaries contains only common, high-frequency words; the other contains all of the subject's known words, including the high-frequency ones. Within each dictionary, search time does not vary with frequency. Instead, the word frequency effect emerges in their model from differences in the probability of choosing the correct dictionary to begin the search, and in the chances of successfully finding the word in the dictionary searched first.

As Glanzer and Ehrenreich (1979) have noted, essentially all of the evidence for the frequency effect in lexical decision has been obtained in situations where subjects are tested with items from a range of frequencies (mixed-frequency lists). The theoretical accounts have therefore also been adapted to only this experimental situation. Glanzer and Ehrenreich (1979) have argued that presenting only a restricted range of frequencies (a pure-frequency list) allows a direct test of the assumptions of the serial search hypothesis. They interpreted their experiment with pure-frequency lists as strong evidence against this class of models, and in its place proposed the two-dictionary model we noted above. Forster (1981) in turn has challenged the two-dictionary hypothesis on the basis of several naming tasks, including one purportedly homologous to Glanzer and Ehrenreich's (1979) lexical decision experiment. Forster (1981) therefore felt that the serial search interpretation of the frequency effect was "adequately defended" (p. 202).

Yet, as Forster (1981) and O'Connor and Forster (1981) have also noted, most of the evidence that can be marshaled for the serial search hypothesis--including their own data--can also be explained by parallel-access models. Truly decisive evidence for either position is lacking. Forster (1981) has suggested that the finding that stimulus quality does not interact with the frequency effect in lexical decision (Becker & Killion, 1977; Stanners, Jastrzembski, & Westbrook, 1975) may be the best evidence against Morton's (1970) model. Yet even this observation is not very damaging, since this could simply mean that stimulus quality affects an encoding stage prior to the point of lexical memory access (as both original groups of investigators suggested). Even data that do challenge Morton's (1970) model more directly (O'Connor & Forster, 1981) do not necessarily indict the entire class of parallel-access models, as O'Connor and Forster (1981) themselves admit.

We suggest that Glanzer and Ehrenreich's (1979) paradigm can provide some of the evidence needed to discriminate between the serial search and the parallel-access models. Forster (1981) did not seem to challenge Glanzer and Ehrenreich's (1979) basic experimental findings, but he did suggest that they might be the result of "alterations in decision making, not in lexical access" (p. 194). We will argue that this interpretation is correct. However, we will also argue that this interpretation does not diminish the theoretical usefulness of their findings. Rather, the modulation of decision processes brought about by frequency blocking adds to the empirical constraints on serial search models, while at the same time it augments the convergent evidence for a parallel model of lexical access which easily supports these decision effects.

We will first summarize Glanzer and Ehrenreich's experimental design and findings, and then discuss the predictions of each model for the mixed- versus pure-frequency situation. We will focus on the issues surrounding correct word access, for the practical reason that our experiment did not measure reaction times for "nonword" responses (see Procedure under METHOD for a justification of this).

Glanzer and Ehrenreich's (1979) frequency blocking was chosen on the basis of numerical distribution. Their high-frequency range was drawn from words with 148/million or higher Kucera-Francis (1967) frequency (summed over derivational forms); medium frequency, from 6-14/million; low frequency, from 1-3/million summed frequency. The three pure-frequency lists contained words of only a single frequency range; each of the three mixed-frequency lists incorporated one-third of the words from each pure-frequency list. So the same word appeared in one pure- and in one mixed-frequency list. The same set of legal nonwords was used in each list.

Each group of subjects was tested on only a single list. The subjects were not instructed about the frequency composition of the lists, but only results obtained after 102 practice trials (51 words, with at least 7-8 from the same frequency class) were analyzed. Presumably, then, subjects were both practiced and familiar with the frequency distributions by the time they reached the critical items.

We can summarize their results as follows:

(1) Word class frequency was strongly associated with differences in reaction times for the mixed-frequency lists, as expected. However, for the pure-frequency condition, this expected relationship appeared to break down: low- and medium-frequency words both showed essentially the same reaction times, although these were longer than the reaction time for high-frequency words.

(2) Reaction time for high-frequency words was significantly faster when they were in the pure-frequency list than when they were in the mixed-frequency lists.

(3) For medium-frequency words, there was no appreciable difference in reaction time between the two types of lists.

(4) Low-frequency words showed a nonsignificant trend for faster reaction times in the pure-frequency condition compared to the mixed-frequency one.

Serial Search Predictions

Glanzer and Ehrenreich (1979) discussed the possibilities for pure-frequency list performance allowed by a general serial search model. We will summarize and augment their relevant arguments:

For the usual case of an item presented in a mixed-frequency list, the serial hypothesis assumes that search for a match begins with the most-frequent word in the lexicon, and continues to progressively less-frequent words. For the pure-frequency lists, there are several potential options for where the search can originate. These options lead to different patterns of medium- and low-frequency pure list results, but Glanzer and Ehrenreich (1979) felt that all of these had the same consequence for the high-frequency words: mixed- and pure-list reaction times will not differ, since the origin of the lexical search cannot be altered. Since Glanzer and Ehrenreich (1979) in fact found a significant difference for the high-frequency words, they counted this as strong evidence against the entire class of serial search models.

But this finding by itself is not necessarily as damaging as they maintain. The assumption underlying this argument is that pure high-frequency word searches must originate at the top of the lexical stack, like mixed-frequency searches. This presupposes that the highest-frequency words in the lexicon are included in the stimulus set, so that no further optimization is possible. Yet this was not true in Glanzer and Ehrenreich's (1979) experiment, and is not true in most lexical decision studies. The highest-frequency words in the lexicon are the grammatical function words, which are generally excluded from lexical access studies (but see Gordon & Caramazza, 1982). If the start of the search could be adjusted between the mixed- and pure-frequency lists to include or to bypass these words, respectively, then even the serial search proposal can predict some difference between conditions for Glanzer and Ehrenreich's (1979) high-frequency words. Their high-frequency result is therefore not so challenging for the serial search models, by itself. But if this adjustment were feasible, there would also be no reason to deny a similar option for searches for pure medium- and low-frequency words. The times for these decisions should then approach those of the pure-condition high-frequency words. Glanzer and Ehrenreich's (1979) experiment was equivocal on this point, and did not have enough power to test it experimentally anyway.

More importantly for the problems of experimental investigation, the serial search mechanisms which have actually been used--for example, by Rubenstein and his colleagues (Rubenstein et al., 1970; Rubenstein et al., 1971) or by Forster (1976, 1978)--are far more elaborate than the prototype Glanzer and Ehrenreich (1979) discussed. These implementations typically include subfile searches and iterative processing. Forster's (1976, 1978) model is the most extensively developed, and will serve as the basis for our predictions; our discussion should nevertheless also be relevant to the models proposed by Rubenstein et al. (1970, 1971).

In Forster's (1976, 1978) model, a Master Lexicon is the repository of the complete phonologic, orthographic, and semantic-syntactic specification of each word in the subject's vocabulary. This Master Lexicon is never queried directly; instead, it is accessed by pointers from peripheral access files. Each of these peripheral access files contains only one type of information: phonologic, orthographic, semantic, and so on. A letter string presented visually would be handled by the orthographic access file. Within an access file, entries are further subgrouped into "bins" (Forster's term) on the basis of similarity. In the case of a letter string, similarity is based on the first few letters (Forster, 1978, p. 151). In turn, within each of these bins the entries are filed in order of frequency within that modality. A lexical decision about a visually presented string proceeds in the following way:

The initial letters of the string direct the search process to the appropriate bin. There, a frequency-ordered serial search is carried out. When a "sufficiently accurate" (Forster, 1978) match is found at this peripheral level, a pointer sent to the Master Lexicon retrieves the complete orthographic entry. This is then compared with the properties of the stimulus item. If these match, it is certain that the presented letter string is a word. If the match is not perfect, however, the search process starts again in the peripheral access file, retrieving the next best candidate. This process can continue until the whole set of available word entries is exhausted.
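The sequence just described (bin selection, frequency-ordered search within the bin, a rough peripheral match, then an exact check against the Master Lexicon) can be sketched as follows. This is a minimal illustration, not Forster's implementation: the two-letter bin key, the positional-overlap similarity measure, the 0.75 criterion, and the toy words and frequencies are all assumptions made for the example, and the bin is simply scanned in frequency order rather than by "next best" candidate.

```python
from collections import defaultdict


def build_access_file(freqs, prefix_len=2):
    """Group words into bins by their first letters, frequency-ordered within each bin."""
    bins = defaultdict(list)
    for word in sorted(freqs, key=freqs.get, reverse=True):
        bins[word[:prefix_len]].append(word)
    return dict(bins)


def rough_match(stimulus, entry):
    """Hypothetical similarity: proportion of positions with identical letters."""
    if len(stimulus) != len(entry):
        return 0.0
    return sum(a == b for a, b in zip(stimulus, entry)) / len(entry)


def lexical_decision(stimulus, access_file, master_lexicon,
                     criterion=0.75, prefix_len=2):
    """Return (is_word, number of peripheral entries examined)."""
    examined = 0
    for entry in access_file.get(stimulus[:prefix_len], []):
        examined += 1
        if rough_match(stimulus, entry) >= criterion:
            # Pointer to the Master Lexicon: retrieve the full spelling
            # and compare it exactly with the stimulus.
            if master_lexicon[entry] == stimulus:
                return True, examined
            # Imperfect match: resume the search with the next entry in the bin.
    return False, examined


# Hypothetical toy vocabulary; counts are illustrative only.
freqs = {"table": 200, "cable": 30, "taboo": 10, "tabby": 5}
access_file = build_access_file(freqs)
master = {w: w for w in freqs}  # full orthographic entry is just the spelling here

frequent = lexical_decision("table", access_file, master)
rare = lexical_decision("tabby", access_file, master)
nonword = lexical_decision("tabla", access_file, master)
```

In this sketch the nonword "tabla" passes the rough peripheral match against "table", fails the exact check, and then exhausts its bin, which is the kind of repeated searching the model attributes to inexact specification.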

There are at least three aspects of the process envisioned by Forster (1976, 1978) which might be adjusted in the course of pure-frequency list testing: the initial parsing (by letter or group of letters), the bin composition, and the criterion used by the initial match process in the peripheral access bin. (In the case where no changes occur, then Forster's model predicts no difference between mixed- and pure-frequency reaction times, unless semantic facilitation is unequally balanced across the conditions.)

If pure-frequency presentation allows just a nonspecific optimization of any or all of these processes, then we would expect that all of the frequency classes would show improvement in their pure-frequency condition performances relative to their mixed-frequency ones. In fact, it might be reasonable to anticipate that low-frequency words might show more improvement in pure-frequency lists than would high-frequency words, since the low-frequency words suffer the brunt of the delay Forster attributes to inexact specification and repeated bin searches.

More specific improvements might be possible. As Landauer and Streeter (1973) have shown, it seems likely that rare words are more orthographically distinct than frequent ones. This distinctiveness might permit a subject given just a pure low-frequency list to improve his parsing strategy, narrow his bin compositions, and sharpen his acceptance criterion more than he could for the pure high-frequency list. Glanzer and Ehrenreich's (1979) experiment and the one we will describe both provide an easy, artificial opportunity for this asymmetric selectivity: since only a limited range of word lengths was tested, the experienced subject could prune many more low-frequency words than high-frequency ones from his list of possible lexical candidates. This and other types of frequency-specific differences are further reasons why pure-list low-frequency words could show an improvement over their mixed-frequency condition which would be more marked than that of the pure-list high-frequency words.

None of these predictions are unchallengeable; not enough is known about the differences between high- and low-frequency words even if this model were able to incorporate them. However, given some reasonable extrapolations, it does seem that this type of frequency-ordered serial search model predicts either (1) no difference between pure- and mixed-frequency conditions, (2) a uniform improvement for the pure-frequency conditions, or (3) more improvement for low-frequency words than for high-frequency words in the pure-frequency condition. Glanzer and Ehrenreich's (1979) data again provide only partial refutation of these predictions.

Two-Dictionary Model

Glanzer and Ehrenreich (1979) hypothesized that subjects have a choice of two internal dictionaries for lexical look-up: one contains only the common, high-frequency words known by the subject; the other contains his complete set of words, high-frequency as well as medium- and low-frequency. They further assumed that subjects can choose which dictionary they will consult first when presented with a letter string. If the subject expects only high-frequency words (either because of experience with a pure high-frequency list, as in their experiment, or because of warning instructions), then he will consult only the high-frequency dictionary to make his decision. If he expects only medium- or low-frequency words, then he will consult his complete dictionary from the start. Glanzer and Ehrenreich (1979) do not propose any mechanism for search within a dictionary; they only postulate that all words in a dictionary take the same average time to find. Their experimental results for pure-frequency lists imply that search of the complete dictionary must take considerably longer than search of the high-frequency one.

For both the pure- and mixed-frequency cases, this two-dictionary model permits a number of options. Its predictions depend upon the mixture of search strategies subjects are using (what proportion of their searches begin with the complete dictionary), and upon the likelihood that the experimenter's and the subjects' categorization of words by frequency will not be identical (e.g., some words the experimenter regards as medium frequency may be in a subject's high-frequency dictionary). Fortunately, for the case where no low-frequency stimuli are in a subject's high-frequency dictionary, and where all the presumptive high-frequency words are, we can use the equations Glanzer and Ehrenreich (1979, p. 395) have provided to make some testable predictions. One of these--which is also apparent qualitatively--is that there should be a difference between the low-frequency word conditions. This difference must arise because for low-frequency words, pure-list searches will always start with the complete dictionary, while mixed-list ones will be slowed on the average by fruitless initial searches of the high-frequency dictionary. As we will show, these and other quantitative constraints do allow the two-dictionary model to be empirically falsifiable.
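The qualitative prediction for low-frequency words can be illustrated with a small expected-time calculation. This sketch is not a reproduction of Glanzer and Ehrenreich's (1979, p. 395) equations; it is a simplified reading of the verbal description above, and the function name, the parameter `p_start_high`, and the millisecond values are hypothetical. It assumes that a failed search of the high-frequency dictionary is followed by a search of the complete dictionary, and that each dictionary has a single fixed search time.

```python
def expected_search_time(word_in_high_dict, p_start_high, t_high, t_complete):
    """Expected search time under a simplified two-dictionary scheme.

    p_start_high: probability that the search begins in the high-frequency
                  dictionary (1.0 for a pure high-frequency list, 0.0 for a
                  pure medium- or low-frequency list, in between for mixed).
    t_high, t_complete: fixed search times for the two dictionaries.
    """
    if word_in_high_dict:
        # Found in whichever dictionary is searched first.
        return p_start_high * t_high + (1 - p_start_high) * t_complete
    # Not in the high-frequency dictionary: a search that starts there is
    # fruitless and must be followed by a search of the complete dictionary.
    return p_start_high * (t_high + t_complete) + (1 - p_start_high) * t_complete


# Hypothetical times in milliseconds, chosen only for illustration.
T_HIGH, T_COMPLETE = 100, 250

low_pure = expected_search_time(False, 0.0, T_HIGH, T_COMPLETE)
low_mixed = expected_search_time(False, 0.5, T_HIGH, T_COMPLETE)
high_pure = expected_search_time(True, 1.0, T_HIGH, T_COMPLETE)
high_mixed = expected_search_time(True, 0.5, T_HIGH, T_COMPLETE)
```

Under these assumptions, any nonzero probability of starting in the high-frequency dictionary makes mixed-list low-frequency words slower than pure-list ones, which is the difference the model is committed to.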

Resonance Model

With the exception of Coltheart et al. (1977), the parallel-access model has not been extensively developed for application to the lexical decision task. The model we propose is similar in many respects to Coltheart et al.'s (1977) parallel-access, logogen-derived (Morton, 1970, 1980) model of the lexical decision task. However, instead of using Morton's logogens, our model adopts the parallel-access resonance metaphor that Ratcliff (1978), Larochelle, McClelland, and Rodriguez (1980), and others have used, along with the hypothesis that the subject sets a decision criterion (cf. Ratcliff, 1978).

We postulate that the internal representations of words are exposed simultaneously, in parallel, to the stimulus information. By "representation," we mean whatever information is used in making the lexical decision; this process model can be neutral about what specific types of codes are used, so long as their separate effects can be lumped together for the situation we are considering now.

A representation will "resonate" more or less strongly depending upon how similar the stimulus information is to its optimal pattern and how intrinsically strong a resonator it is. We will assume that the actual word stimuli are the best matches for their internal representations, and that all of these word matches are equally good. So differences in intrinsic representation strength are the principal determinants of activation differences between words. We will identify these intrinsic resonance strengths as the internal correlate of word frequency (familiarity).

In lexical decision, upon presentation of a word in the subject's vocabulary, its representation's activation will increase over time. The rate of this increase is approximately proportional to the representation's intrinsic resonance strength, being more rapid for more frequent (more familiar) words. Since the internal resonance is a function of similarity, nonwords may elicit some degree of resonance in the array of internal representations. But, since nonwords do not match the representations as closely as words do, the nonword-induced activations will increase more slowly to a lower peak level (Figure 1a).

The above descriptions all apply to average values. The actual moment-to-moment values of both signal (activation of a real word representation) and noise (adventitious nonword-induced activations) have some random component. Early after stimulus presentation, the population distributions of word and nonword-induced activations will overlap to a significant extent. They become more separated over time.

Thresholds

In the lexical decision task, the subject is examining the amplitudes of these resonances after presentation of the letter string. To make an accurate decision, she or he sets a threshold for determining whether any activation is strong enough to indicate that a real word has been presented. If the resonance of one of the representations reaches a sufficiently high amplitude, the subject can conclude that a real word was presented. If the resonance amplitudes never reach this threshold, the subject can assume that what was presented was not a real word. The word frequency effect then becomes a consequence of the rapidity of growth in resonance strength: the resonances of higher-frequency, more familiar words will grow more briskly to a higher level, so they will reach threshold sooner than will those of lower-frequency representations. This model is illustrated in Figures 1 to 3.
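The relation between resonance strength, threshold, and decision time can be made concrete with a small deterministic sketch of the average activation curve. The text specifies only that stronger representations rise faster and to a higher level; the saturating exponential form used here, the equating of peak level with intrinsic strength, and all numerical values are assumptions made for illustration, and the random component of activation is ignored.

```python
import math


def time_to_threshold(strength, threshold, rate_per_strength=1.0):
    """Time at which the mean activation first reaches the decision threshold.

    Assumed growth curve (illustrative only):
        activation(t) = peak * (1 - exp(-rate * t))
    with peak = strength and rate = rate_per_strength * strength, so that
    stronger representations rise faster and to a higher level.
    Returns None if the activation never reaches the threshold.
    """
    peak = strength
    if threshold >= peak:
        return None  # never crosses: treated as "not a word"
    rate = rate_per_strength * strength
    return -math.log(1.0 - threshold / peak) / rate


# Hypothetical strengths for a high- and a low-frequency word.
t_high_freq = time_to_threshold(strength=2.0, threshold=0.5)
t_low_freq = time_to_threshold(strength=1.0, threshold=0.5)

# Raising the threshold slows the response to the same word.
t_high_freq_strict = time_to_threshold(strength=2.0, threshold=1.0)

# A weak, nonword-like activation never reaches the threshold.
t_nonword = time_to_threshold(strength=0.4, threshold=0.5)
```

With a common threshold, the stronger representation crosses first, and any increase in the threshold lengthens the crossing time for a given word; these are the two properties the criterion-setting argument relies on.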

[Figure 1 panels: "Activations for word trials" and "Activations for nonword trials", each plotted against time; "Expected word distribution" and "Expected nonword distribution", plotted against time.]

FIG. 1. (a) Growth of activation strengths over time of word representations of a single familiarity (frequency) range and of nonword-induced representations. (Only the time period spanned by the word-nonword decision process is shown; activations do not necessarily increase much after this time period, and may even decay in strength at later intervals.) On the average, words lead to a more rapid rise in activation strength to higher levels than do nonwords. (b) The word vs nonword decision problem. At any given point in time for a given frequency class, the decision can be modeled by classical Signal Detection Theory: selecting a cutoff along the activation axis which optimally discriminates presumed word activations (the "signal") from presumed nonword-induced activations ("noise"). The criterion illustrated is the optimal one for equally likely word and nonword trials when all decisions are given equal weight.
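The criterion choice described in panel (b) can be illustrated numerically. The sketch below assumes, purely for the example, that word and nonword activations at a given moment are normally distributed with equal variance; the means, the standard deviation, and the function names are hypothetical. Under that assumption, with equally likely word and nonword trials and equal weight on both kinds of error, the error-minimizing cutoff lies midway between the two means.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def optimal_criterion(word_mean, nonword_mean):
    """Equal-variance, equal-prior, equal-weight case: midpoint of the means."""
    return (word_mean + nonword_mean) / 2.0


def error_rate(criterion, word_mean, nonword_mean, sd):
    """Overall error rate when words and nonwords are equally likely."""
    miss = normal_cdf(criterion, word_mean, sd)                # word below cutoff
    false_alarm = 1.0 - normal_cdf(criterion, nonword_mean, sd)  # nonword above cutoff
    return 0.5 * (miss + false_alarm)


# Hypothetical activation distributions at one moment in time.
WORD_MEAN, NONWORD_MEAN, SD = 2.0, 0.0, 1.0

best = optimal_criterion(WORD_MEAN, NONWORD_MEAN)
err_best = error_rate(best, WORD_MEAN, NONWORD_MEAN, SD)
err_lax = error_rate(best - 0.5, WORD_MEAN, NONWORD_MEAN, SD)
err_strict = error_rate(best + 0.5, WORD_MEAN, NONWORD_MEAN, SD)
```

Moving the cutoff in either direction from the midpoint trades one kind of error for a larger amount of the other, so the overall error rate rises.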


[Figure 2 panel label: "Distribution of word activations"; a second "Distribution of ..." label is truncated in the source.]

FIG. 2. Resonance model: speed-accuracy tradeoff in criterion setting (for words of a single familiarity/frequency class). Because of the increasing separation of word-induced activations from nonword-induced activations over the time frame of interest, responses can be more accurate with a higher decision threshold, but they will also be slower.

In order to meet the combined experimental demands of maximal accuracy and maximal speed of decision, the decision threshold will be set as high as possible above the expected noise distribution, and as low as possible with respect to the expected signal (word) distribution. (For now, we will make the usual assumption that this criterion setting is constant over the critical portion of the experiment.)

Pure Lists. After experience with pure high-frequency words, the subject will take advantage of their rapid rise in excitation strength to set a relatively low decision threshold, guaranteeing rapid yet accurate responses (see Figure 3, left side). For maximal speed with similar accuracy with low-frequency items, the threshold will have to be set higher, because their activations grow slowly, only gradually becoming more distinguishable from nonword noise activations. Therefore, pure high-frequency words will have faster decision latencies than will low-frequency words, and so on. These predictions (see Figure 3, left side) are compatible with Glanzer and Ehrenreich's (1979) data, with the qualifications we discussed earlier.

Mixed Lists. To predict performance for the mixed-frequency lists, we will assume subjects set some compromise threshold value for response. The consequences of four different threshold settings will be discussed (the third is the one illustrated in Figure 3, right side).

(1) The mixed-list threshold could be set low, at the level it would have for pure-condition high-frequency words. This setting seems implausible, since the result (equivalent high-frequency pure- and mixed-list reaction times) is strongly contradicted by Glanzer and Ehrenreich's (1979) data.

(2) The mixed-list threshold could be set at a medium value, where it would be for a pure-condition medium-frequency word list. Reaction times for medium-frequency words in the pure- and mixed-frequency lists would then be equal (in agreement with Glanzer and Ehrenreich's data). However, this also means that decisions about low-frequency words would be faster in the mixed-frequency lists than in the pure-frequency one, which contradicts the trend in Glanzer and Ehrenreich's (1979) data.

(3) The mixed-list threshold could be set high, where it would be for the pure-condition low-frequency word list alone. Low-frequency mixed-list decisions would therefore be as fast as pure-list decisions. However, mixed-list high-frequency decisions will now be very much slower than for the pure list. Mixed-list medium-frequency decisions will be slower than pure-list ones; this difference could be less than that between the high-frequency conditions.¹

(4) The mixed-list threshold could be set high, but not quite as high as it would be for the pure low-frequency words. In other words, subjects find the list with mixed frequencies slightly less difficult overall than they do the list with only low-frequency words. The predictions in this case would differ from those in (3) only for the low-frequency words: they would show slightly faster reaction times in the mixed-list than in the pure-list case.

Note that altering the threshold in the mixed list will not change the ordering of reaction times by frequency (LF > MF > HF), but will affect their absolute values and the error rates.
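The qualitative predictions for option (3) can be sketched numerically. This is a toy illustration, not the quantitative resonance model: it assumes that activation grows linearly with time, so that decision time is simply threshold divided by growth rate, and every rate and threshold value below is invented for the example.

```python
# Toy model: activation(t) = rate * t, so a response occurs at t = threshold / rate.
# All numbers are hypothetical and in arbitrary units.
GROWTH_RATE = {"HF": 3.0, "MF": 2.0, "LF": 1.0}      # faster rise for more frequent words
PURE_THRESHOLD = {"HF": 1.0, "MF": 1.6, "LF": 2.0}   # assumed optimal setting per pure list

def decision_time(freq_class, threshold):
    """Time at which linearly growing activation reaches the threshold."""
    return threshold / GROWTH_RATE[freq_class]

# Option (3): the mixed-list threshold equals the pure low-frequency threshold.
mixed_threshold = PURE_THRESHOLD["LF"]

pure_rt = {c: decision_time(c, PURE_THRESHOLD[c]) for c in GROWTH_RATE}
mixed_rt = {c: decision_time(c, mixed_threshold) for c in GROWTH_RATE}
slowing = {c: mixed_rt[c] - pure_rt[c] for c in GROWTH_RATE}

for c in ("HF", "MF", "LF"):
    print(f"{c}: pure={pure_rt[c]:.2f} mixed={mixed_rt[c]:.2f} slowing={slowing[c]:.2f}")
```

Under these made-up values the ordering LF > MF > HF holds in both conditions, the high-frequency class slows most, and the low-frequency class does not slow at all. Other assumed values could change the size of the medium-frequency effect, which is the uncertainty footnote 1 describes.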

This resonance model allows speed-accuracy tradeoffs to occur. In this respect, we are sympathetic to Kiger and Glass's (1981) interpretation of the difference Glanzer and Ehrenreich (1979) found between high-frequency word conditions as a "context effect" of list difficulty, probably mediated by speed-accuracy tradeoffs. However, here it is not considered a trivial epiphenomenon; it is an intimate part of the model. Finding a speed-accuracy tradeoff for the lexical access mechanism is theoretically important; neither the serial search nor the two-dictionary models can allow their search speeds to be altered without significantly changing the models themselves.

¹ We can make reasonably secure qualitative predictions for the behavior of the high- and low-frequency words because they represent contrasts of both activation rise times and of threshold settings. However, we cannot make such definite predictions about the mixed-list medium-frequency words, since we would have to know more exactly how their activation rate interacts with their decision criteria. Because of this limitation, we can only discuss how they might possibly behave within the resonance model. This same caveat applies to our later discussion of very-high-frequency words; their very rapid activation increases might nullify any effect of different threshold settings (see Gordon, 1981). Ultimately, of course, these experimental situations will become important tests of a quantitative resonance model (Gordon, 1981).

[Figure 3. Two columns of panels (pure-frequency lists, mixed-frequency lists) by three rows (high-, medium-, and low-frequency words). Each panel marks the pure and mixed decision thresholds on the activation curve and the resulting pure and mixed RTs.]

FIG. 3. Resonance model: predicted reaction times for words of each frequency range. The growth of activation strength is illustrated for each of the word frequency classes; the nonword-elicited distribution is not illustrated, but it also grows over time as shown in Figures 1 and 2. The same nonword-induced distribution is assumed for each condition, since the same set of nonwords was used throughout our experiment. The left side illustrates the optimal criterion setting possible for words presented in pure-frequency lists, with their resulting decision times; the right side shows one possible compromise setting for the mixed-frequency condition (discussed as option 3 in the text), with its resultant decision times. RT = reaction time.

Therefore, the three classes of lexical-access models can lead to different predictions about the results of a pure- versus mixed-frequency experiment. Because of this discriminatory power, our experiment used Glanzer and Ehrenreich's basic paradigm to try to differentiate the models. To make the best possible use of their data, our actual implementation was also very similar to theirs. However, as we will note, we modified their design and procedure in order to meet our sharpened set of theoretical needs, and to avoid, if possible, some potential problems with their experiment.


METHOD

Materials

Words were selected exhaustively on the basis of the following joint criteria:

High-frequency words had Kucera-Francis (1967) root frequencies between 100 and 385, and total frequencies summed over all regular derivational forms between 100 and 512. This upper limit was chosen to avoid a possible reaction time floor for the very highest-frequency words (Gordon & Caramazza, 1982).

Low-frequency words had Kucera-Francis (1967) root frequencies of 0-2, and summed frequencies of 0-3. Words with a Kucera-Francis frequency of 0 (those not indexed) were obtained as necessary (see below) from the Carroll, Davies, and Richman (1971) word count, beginning with the rarest items.

Medium-frequency words were selected to fit the frequency range of 10-33 (root forms) and 10-37 (summed forms), which is essentially the middle point of the log frequency range spanned by the high- and low-frequency lists. Since reaction time is approximately proportional to log frequency (cf. Gordon & Caramazza, 1982), we anticipated that reaction times for the medium-frequency words should also be halfway between the other two classes, for maximum discriminability.
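The "middle point of the log frequency range" can be checked with a quick calculation: the midpoint of two frequencies on a log scale is their geometric mean. The choice of endpoints below (the upper root-frequency bound of the low list, paired with each bound of the high list) is an assumption made for illustration, since a frequency of 0 has no logarithm.

```python
import math

def log_midpoint(f_low, f_high):
    """Frequency halfway between f_low and f_high on a logarithmic scale."""
    return math.exp((math.log(f_low) + math.log(f_high)) / 2.0)  # geometric mean

LOW_ROOT_MAX = 2                        # upper root-frequency bound of the low list
HIGH_ROOT_MIN, HIGH_ROOT_MAX = 100, 385
MEDIUM_ROOT_RANGE = (10, 33)

# Log-scale midpoints between the low bound and each bound of the high list.
midpoints = [log_midpoint(LOW_ROOT_MAX, f) for f in (HIGH_ROOT_MIN, HIGH_ROOT_MAX)]
inside = [MEDIUM_ROOT_RANGE[0] <= m <= MEDIUM_ROOT_RANGE[1] for m in midpoints]
```

With these assumed endpoints, both midpoints fall inside the 10-33 root-frequency band chosen for the medium list.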

The midpoints of each list were separated from one another in terms of frequency by a factor of five or more; even the closest items between lists were still separated by a factor of three or more. For example, the highest-frequency items in the low-frequency list had a frequency of 3, compared to a frequency of 10 for the lowest-frequency items of the medium-frequency list.

To minimize heterogeneity, words were between 4 and 7 letters in length. They were either mono- or bisyllabic.² All were monomorphemic, either singular nouns, infinitive forms of verbs, or adjectives.

The following types of words were excluded from the lists: nonroot forms; pseudoderivational forms (e.g., suitor); compound words; proper nouns; homophones (e.g., raze); slang words; words with strong emotional overtones or import; "foreign" words (chic); archaic forms (fealty); color and direction terms (white, east); some brand names (dole); highly technical terms (e.g., compile); very visually similar words (tine/time); and words with local significance or confusability.

In addition, both within and between lists, words which were felt to be related associatively, semantically, or antonymically (e.g., farm, barn, dairy, cow, and field) by any of three evaluators (B.G., C.M., or J.S.) were eliminated except for one representative (e.g., of army and attack only army was used). This was done to eliminate any complications induced by semantic relatedness.

Whenever a free choice in word selection seemed to exist, we applied the following additional criteria (in order): (1) the word had to be closest to the center frequency of the category; (2) the mean lengths and distribution of lengths for each list had to be approximately equal; (3) there had to be an equal number of words in each list; and (4) the number of words in each list had to be a multiple of twelve (for sublist generation).

Two hundred four words of each type met all of the selection criteria.² The original high- and medium-frequency word lists had to be severely pruned to fit, while the original low-frequency word list from the Kucera-Francis (1967) norms had to be augmented by words from the Carroll et al. (1971) count for balance (as detailed earlier).

Nonwords were created to match the overall word set as closely as possible in length, number of syllables, and initial letter or letter pair. They were all orthographically and phonologically legal. The same 204 nonwords were used in each of the experimental lists.

² Two errors were discovered after a significant number of subjects had been tested: (1) one low-frequency word (venial) was trisyllabic, and (2) because of final adjustments, torpid and torpor were both included in the pure low-frequency list and in one of the mixed-frequency lists, separated by 80 items. Responses to the second item were ignored in the data analysis.

From these three lists of words and one list of nonwords, six experimental lists were created (three pure-frequency lists, three mixed-frequency lists) using almost exactly the ordering Glanzer and Ehrenreich (1979) and, to a lesser extent, Berry (1971) employed.

One quarter of each of the four source lists was randomly selected to be used as practice items; the remainder served as critical items. Within each source list, the practice and critical items were then randomly assigned to one of three practice or critical item groups. These sublists of the words and nonwords were then combined into mixed-frequency experimental lists: each mixed-frequency list contained three practice-item sublists (one from each of the high-, medium-, and low-frequency lists) and three critical-item sublists (one from each of the high-, medium-, and low-frequency lists). Across the test lists, the subcomponent ordering obeyed an orthogonal Latin square design (Winer, 1971, p. 687). The pure-frequency lists were formed from the three sublists of their respective frequency type. Each word therefore occurred once in a pure-frequency list, and once in one of the mixed-frequency lists. Each critical-item sublist contained 51 words (50 unique words in the case of one of the low-frequency sublists; see footnote 2).
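The partitioning arithmetic can be sketched as follows. The item labels, the random seed, and the helper name are placeholders; only the counts (204 items per source list, one quarter used as practice, three groups) come from the text.

```python
import random

def partition_source_list(items, rng):
    """Split one source list into three practice groups and three critical groups."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_practice = len(shuffled) // 4                 # one quarter serve as practice items
    practice, critical = shuffled[:n_practice], shuffled[n_practice:]
    practice_groups = [practice[i::3] for i in range(3)]
    critical_groups = [critical[i::3] for i in range(3)]
    return practice_groups, critical_groups

rng = random.Random(0)                               # arbitrary seed for repeatability
source = [f"item{i:03d}" for i in range(204)]        # placeholder labels, not real stimuli
practice_groups, critical_groups = partition_source_list(source, rng)

practice_sizes = [len(g) for g in practice_groups]   # 17 per group
critical_sizes = [len(g) for g in critical_groups]   # 51 per group
```

The 51 critical items per group match the sublist size stated above. The Latin square ordering of sublists across test lists is not modeled in this sketch.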

Words from the sublists were intermixed. Each word occurred in the same ordinal position in its pure- and mixed-frequency list. The same set of nonwords was used in each of the lists, and each likewise occupied the same position in each list. Using almost the same constraints as Glanzer and Ehrenreich (1979), no more than two successive words could be of the same frequency class, no more than three words or nonwords could be in immediate succession, and no successive items began or ended with the same letter. Sample segments of the critical-item portions of these lists are shown in Table 1.
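One way to check a candidate mixed-frequency list against these ordering constraints is sketched below. The tuple representation, the function name, and the reading of "successive words" as words counted with intervening nonwords skipped are assumptions. The sample items are the first mixed-frequency column of Table 1, with frequency classes inferred from the pure-list columns.

```python
def constraint_violations(items):
    """Return a list of violations for a sequence of (string, kind) pairs.

    kind is "HF", "MF", or "LF" for words and "NW" for nonwords.
    """
    problems = []
    # No successive items may begin or end with the same letter.
    for (a, _), (b, _) in zip(items, items[1:]):
        if a[0] == b[0] or a[-1] == b[-1]:
            problems.append(f"shared first/last letter: {a}, {b}")
    # No more than three words, or three nonwords, in immediate succession.
    run = 0
    prev_is_word = None
    for s, kind in items:
        is_word = kind != "NW"
        run = run + 1 if is_word == prev_is_word else 1
        prev_is_word = is_word
        if run > 3:
            problems.append(f"run of more than three at: {s}")
    # No more than two successive words of the same frequency class
    # (assumption: nonwords between words are skipped when counting).
    words = [(s, k) for s, k in items if k != "NW"]
    for i in range(2, len(words)):
        if words[i][1] == words[i - 1][1] == words[i - 2][1]:
            problems.append(f"three same-class words ending at: {words[i][0]}")
    return problems

sample = [("atone", "LF"), ("hold", "HF"), ("sneap", "NW"), ("baven", "NW"),
          ("level", "HF"), ("diet", "MF"), ("guile", "LF"), ("leab", "NW"),
          ("abame", "NW"), ("common", "HF"), ("flavor", "MF"), ("sham", "LF")]
problems = constraint_violations(sample)   # empty list: no violations found
```

The frequency-class check only makes sense for mixed lists, since every word in a pure list belongs to the same class.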

Procedure. Consent and instructions were explained orally. Subjects were told that they would be shown letter strings, which they had to identify as either words or nonwords. Subjects were forewarned about the nature of the word items (very common, familiar, rare, or a mixture of these) for each list. They were also warned that the nonwords might look and sound like actual words.

TABLE 1
SAMPLE LIST COMPOSITIONS

List type

Pure low-    Pure medium-  Pure high-   Mixed-       Mixed-       Mixed-
frequency    frequency     frequency    frequency    frequency    frequency
list         list          list         list         list         list

atone        trick         list         atone        trick        list
whiff        quarrel       hold         hold         quarrel      whiff
sneap        sneap         sneap        sneap        sneap        sneap
baven        baven         baven        baven        baven        baven
tithe        rally         level        level        rally        tithe
convex       diet          game         diet         convex       game
guile        sustain       strong       guile        strong       sustain
leab         leab          leab         leab         leab         leab
abame        abame         abame        abame        abame        abame
taint        doll          common       common       doll         taint
imbue        flavor        fine         flavor       imbue        fine
sham         rust          special      sham         special      rust

The subjects were instructed to respond as quickly as possible if the string was a word, without making more than rare mistakes. They were not to respond to nonwords. This single response technique (Pachella, 1974) was partly motivated by the need for higher experimental resolution; even with 96 subjects, Glanzer and Ehrenreich's (1979) experiment was unable to verify several important contrasts, as we have discussed. The single response technique has been useful for this purpose in other experiments (Gordon & Caramazza, 1982). In these, we directly compared the dual- and the single-response methods between subjects, in lexical decisions for identical sets of words which spanned a fairly wide frequency range. Compared to results with the conventional dual-response technique, single-response reaction times were about 90 milliseconds faster, with only 58% of the variance (paired comparison t(273) = -12, p < .00001), with far lower error rates of any kind (0.4% vs 6.4%, t(276) = -20, p < .00001).

While it is possible that the process responsible for a single-response "yes" decision is different than the one for the dual technique (Pachella, 1974), this seems unlikely on a priori grounds. Furthermore, in our experiments the influence of frequency was the same in both cases; the two methods produced remarkably similar reaction time versus frequency functions (Gordon & Caramazza, 1982). And even if the techniques do involve different decision mechanisms, then any discrepancy from Glanzer and Ehrenreich's findings using the method would be an important check on the origin of the frequency context effects. So we felt justified in sacrificing the dual-response technique in this experiment in order to gain less noisy data about correct word decisions, from a different methodological perspective.

Stimulus presentation and response recording were done by a microcomputer modified for laboratory use. Each trial was self-paced by foot pedal. Stimuli appeared as upper-case letters subtending no more than 3.0° horizontally and 0.5° vertically at a comfortable reading distance. They were each displayed for a constant period of 2.5 seconds to guarantee equal exposure to all of the classes of stimuli; in Glanzer and Ehrenreich's (1979) experiment, exposure duration was dependent upon response time. Positive responses were made by microswitch held in the dominant hand. The subject was given only general feedback, and only during the practice items; subjects responding slowly were encouraged to go faster. We chose self-pacing and a relative lack of feedback compared to Glanzer and Ehrenreich's (1979) experiment in order to minimize any dependency of reaction time on the prior response per se. Subjects knew that some initial items were practice, but not explicitly how many were. After every 85 trials, they were given additional break time to allow for computer processing.

After the main body of the experiment, all nonpractice strings to which erroneous responses had been given were displayed. The subject was not explicitly told that these had been errors. Rather, the subject was told that the experimenters wanted to check word/nonword decisions on a few items without time pressure. The subject was encouraged to take his or her time deciding if the items were words or not; she or he could also respond "not sure." These responses were recorded along with the data. If a subject did not know what an item really was, or was not sure, that item was not counted in the analysis of his or her data.

Subjects. Subjects were all native speakers of English, between 17 and 36 years of age. All were either undergraduates or had a college education. They were either paid or received research course credit for participation. One hundred and fifty-one subjects successfully completed the testing.

DATA ANALYSIS AND RESULTS

As in Glanzer and Ehrenreich's (1979) design, each subject's first 102 trials were considered familiarization and were not analyzed.

Log transformation. In our data, as expected, sublist standard deviations were highly correlated with mean reaction times (r = .89, p = .00001). Sublist variances had slightly less correlation with the means (r = .86, p = .00001). Therefore, the individual reaction times were logarithmically transformed prior to averaging (Winer, 1971), and all subsequent analyses are with the transformed data. For presentation in the tables, the averaged data was converted back to milliseconds.
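The effect of averaging in log units and converting back can be sketched as below; it amounts to taking a geometric mean. The reaction times are invented, and the use of natural logarithms is an assumption (the base does not affect the back-transformed mean).

```python
import math

def log_mean_rt(rts_ms):
    """Average reaction times in log units, then convert back to milliseconds."""
    mean_log = sum(math.log(rt) for rt in rts_ms) / len(rts_ms)
    return math.exp(mean_log)

rts = [400.0, 900.0]                        # hypothetical reaction times in msec
arithmetic = sum(rts) / len(rts)            # 650.0
back_transformed = log_mean_rt(rts)         # about 600, the geometric mean
```

Because long reaction times are compressed by the logarithm, the back-transformed mean sits below the arithmetic mean whenever the values differ.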

Post-error skipping. Subject responses can be slowed for 3-5 trials following an error (Burns, 1971; Laming, 1979; Rabbitt & Rogers, 1977); this might have contributed to some of the mixed- versus pure-condition reaction time differences Glanzer and Ehrenreich (1979) found. Our data were examined for any evidence of post-error slowing both by examining the pattern of reaction times for each word with and without preceding errors (cf. Kiger & Glass, 1981), and by simply examining whether ignoring the five trials following an error made a difference in overall results. As can be seen in Table 2, if error-induced slowing makes any contribution at all to the reaction time differences we find, its effect is rather small (possibly because of the self-pacing in our experiment). (Also, some of the differences in Table 2 may be just sampling differences; with a 6% error rate, skipping five trials after an error eliminates about 15% of the critical items from analysis.) Nevertheless, to be extremely conservative, all of our subsequent analyses are with five of the trials after any error (true errors or unknown items) excluded. (True errors were counted when error rates were calculated, however.)
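A minimal sketch of the exclusion rule follows. The boolean-list representation and the function name are assumptions; the text does not say whether nonword trials count toward the five skipped trials, so this version simply counts every trial.

```python
def trials_to_keep(is_error, skip=5):
    """Indices of non-error trials that are not among the `skip` trials after an error."""
    keep = []
    excluded_until = -1                     # last index still excluded by an earlier error
    for i, err in enumerate(is_error):
        if err:
            excluded_until = max(excluded_until, i + skip)
        elif i > excluded_until:
            keep.append(i)
    return keep

# Hypothetical trial sequence: an error on trial 2 (0-based) removes trials 3-7.
errors = [False, False, True] + [False] * 8
kept = trials_to_keep(errors)               # [0, 1, 8, 9, 10]
```

An error that falls inside an existing exclusion window extends the window, so closely spaced errors remove fewer additional trials than widely spaced ones.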

Subject trimming. We compared each subject's sublist reaction times to their group's performance on each of the three sublists. If any of the subject's sublist reaction times differed by 2.5 standard deviations or more from the group's, all of that subject's data were excluded from analysis. Three of the 151 subjects were rejected by this criterion (one from the pure high-frequency condition, one from the pure medium-frequency condition, and one from a mixed-frequency condition).
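The trimming rule might be sketched as follows. The data layout, the function name, and the decision to include each subject in the group mean and standard deviation are assumptions; the reaction times are invented.

```python
from statistics import mean, stdev

def rejected_subjects(sublist_rts, cutoff=2.5):
    """Subjects whose RT on any sublist lies `cutoff` SDs or more from the group mean.

    sublist_rts maps subject id -> list of that subject's mean RTs, one per sublist.
    """
    n_sublists = len(next(iter(sublist_rts.values())))
    rejected = set()
    for j in range(n_sublists):
        values = [rts[j] for rts in sublist_rts.values()]
        m, sd = mean(values), stdev(values)
        for subject, rts in sublist_rts.items():
            if abs(rts[j] - m) >= cutoff * sd:
                rejected.add(subject)
    return rejected

# Hypothetical group of 12 subjects; s11 is very slow on the first sublist only.
group = {f"s{i}": [500.0 + 2 * i] * 3 for i in range(11)}
group["s11"] = [900.0, 510.0, 510.0]
rejected = rejected_subjects(group)         # {"s11"}
```

A single deviant sublist is enough to reject all of a subject's data, which is the rule the text describes.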

Our results are presented in Table 3. Since our word selection was exhaustively defined by the criteria we have enumerated, our statistical treatment automatically generalizes over the word population we have defined (Clark, 1973; Clark, Cohen, Smith, & Keppel, 1976). For comparison, however, Table 3 also shows the results of the Treatments-by-words subdesign for each condition, analyzed by paired-samples t-tests.

Error comparisons. False negative error rates (reported in Table 3) for the high- and medium-frequency words are quite low. False negative rates for the low-frequency conditions are much higher (8.9% for the mixed, 8.1% for the pure). Kolmogorov-Smirnov two-sample tests (of the arcsine transformed percentage errors) for any similarity of the error means or distributions (Siegel, 1956) were done. The small differences between the high-frequency and between the low-frequency conditions are far from significant (two-tailed p > .99). For the medium-frequency words, there is a more notable difference between the 0.33% error rate for the mixed list and the 0.48% error rate for the pure list, although this difference does not reach significance (z = 1.2, p = .11, two-tailed). While this further weakens the reliability of the medium-frequency reaction time difference, this contrast is not a critical one for the distinctions which will be of major importance (see footnote 1).

TABLE 2
REACTION TIMES (msec) WITH AND WITHOUT POST-ERROR SKIPPING

                                     Word frequency class
List type/post-error skipping        Low       Medium    High

(By subjects)
Mixed-frequency lists
  Total list error rates             3.2%      3.2%      3.2%
  Skip 0                             715       568       523
  Skip 5                             710       566       520
Pure-frequency lists
  Total list error rates             6.2%      1.6%      1.4%
  Skip 0                             712       549       481
  Skip 5                             710       547       480

TABLE 3
CORRECT REACTION TIMES IN MILLISECONDS

                              Word frequency class
List type                     Low           Medium        High

(By subjects)
Mixed-frequency list          710           566           520
                              (8.9%)        (0.33%)       (0.06%)
Pure-frequency list           710           547           480
                              (8.1%)        (0.48%)       (0.20%)
Difference                    0             18            41
Significance
  Overall                     p > .97       p = .15       p = .008
  Sublists                    p = .67(-)    p = .54(+)    p = .08(+)
                              p = .91(-)    p = .25(+)    p = .06(+)
                              p = .55(+)    p = .49(+)    p = .05(+)

(By words)
Mixed frequency               719           566           521
Pure frequency                707           550           480
Difference                    12            16            41
Significance
  Paired comp.                p = .07       p < .00001    p << .00001

Note. The direction of the reaction time difference for a sublist is indicated after its significance value (all as Mixed minus Pure condition). False-negative error rates for words are in parentheses.

Frequency Sensitivities

Our results once again confirm an association of reaction time with word frequency for the mixed-frequency condition. Of the 74 subjects in the mixed-list condition, 73 show the expected ordering of reaction times (HF < MF < LF), and for all, low-frequency reaction times are longer than those of the other two categories. It is notable that this ordering of HF < MF < LF is also true of the pure-frequency condition. For both the mixed- and pure-frequency conditions, these frequency effects are highly significant: for the mixed lists, repeated measures F(2,146) = 707, p < 8 x 10^-76; for the pure lists, between-subjects F(2,71) = 33, p < 1 x 10^-10. The frequency effect is more pronounced for the pure-frequency condition (see below).

Condition Effects

Within each frequency category, sublist condition comparisons are strictly between subjects. These were compared with pseudo-T-tests, using separate variance estimates, since our data generally showed significant nonhomogeneity of variance (Nie, Hull, Jenkins, Steinbrenner, & Bent, 1975). These results are reported in the tables under the heading "Sublists." These sublist comparisons are combined into a significance measure for each frequency category (the "overall" measure in Table 3). This is done by transforming the sublist probabilities to an equivalent z-score, summing these and renormalizing the sum, and then using the total z-score derived from this process as the test statistic (Mosteller & Bush, 1954, pp. 329-330).
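The combination procedure described here (convert each probability to a z-score, sum, renormalize) can be sketched as below. Treating the sublist probabilities as two-tailed and signing each z-score by the direction of the difference are assumptions about details the text does not spell out. The inputs are the medium-frequency sublist probabilities from Table 3.

```python
from math import sqrt
from statistics import NormalDist

def combined_z(p_values, signs):
    """Combine two-tailed p-values into one z-score by summing and renormalizing."""
    std_normal = NormalDist()
    z_scores = [sign * std_normal.inv_cdf(1.0 - p / 2.0)
                for p, sign in zip(p_values, signs)]
    return sum(z_scores) / sqrt(len(z_scores))

# Medium-frequency sublist probabilities from Table 3, all in the (+) direction.
z = combined_z([0.54, 0.25, 0.49], [+1, +1, +1])
p_overall = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-tailed overall probability
```

With these rounded inputs the sketch yields a combined z of roughly 1.4. Exact agreement with the reported statistics is not expected, given the rounding of the tabled probabilities and the assumptions above.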

High-frequency word decisions are 41 milliseconds faster in the pure-frequency condition than in the mixed (z = 2.65, p = .008, two-tailed).

For medium-frequency words, there is also a trend for faster reaction times in pure- compared to mixed-frequency lists (in each of the sublist comparisons), although this 18-millisecond difference does not reach significance (z = 1.44, p = .15, two-tailed), and is also less impressive when the differing error rates are taken into account.

For the low-frequency words, there is no overall difference in pure- versus mixed-frequency conditions (0 millisecond), nor is there a consistent sublist pattern (z = -.028, p > .97, two-tailed) in the analysis by subjects. The two conditions for the low-frequency words may not be completely identical (on the basis of the analysis by words and their slight inequality in error rates), but the low-frequency words clearly behave very similarly in the two frequency contexts.

For high- and low-frequency words, our list condition effects are strikingly consistent with Glanzer and Ehrenreich's (1979). We also confirm a difference between high-frequency words in the pure- and mixed-conditions, and we also find no significant difference between the behaviors of low-frequency words in these conditions, with much less experimental uncertainty in our findings than Glanzer and Ehrenreich (1979). This consistency of experimental results between Glanzer and Ehrenreich's experiment and ours is perhaps even more striking when the differences in word selection and in response methodology are taken into account.

Distribution Comparisons

In addition to the mean reaction times, we also compared the shapes of the group reaction time distributions of each frequency class for the pure- and mixed-list conditions (see Figures 4, 5, and 6). These were derived using the procedure described by Ratcliff (1979):

Each individual's set of (untransformed) reaction times for each frequency category was sorted into ascending order. This set was then divided into 15 quantiles (this number being a compromise between resolution and quantization noise). Each quantile was averaged with corresponding quantiles from other subjects in that category to give group quantiles. From the group quantiles, a representative probability function of the group's distribution of reaction times was calculated. This procedure carries the distributional characteristics of the individual subject's data over into the summary data (Ratcliff, 1979).
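The quantile-averaging step can be sketched as follows. Reading "quantiles" as equal-count bins of the sorted reaction times, each summarized by its mean, is an assumption; Ratcliff's (1979) procedure may differ in detail. The function names and the data are invented.

```python
def subject_quantile_means(rts, n_quantiles=15):
    """Sort one subject's RTs and return the mean of each of n_quantiles ordered bins.

    Assumes the subject has at least n_quantiles reaction times.
    """
    ordered = sorted(rts)
    n = len(ordered)
    means = []
    for q in range(n_quantiles):
        chunk = ordered[q * n // n_quantiles:(q + 1) * n // n_quantiles]
        means.append(sum(chunk) / len(chunk))
    return means

def group_quantiles(all_subject_rts, n_quantiles=15):
    """Average corresponding quantiles across subjects."""
    per_subject = [subject_quantile_means(r, n_quantiles) for r in all_subject_rts]
    return [sum(col) / len(col) for col in zip(*per_subject)]

# Two hypothetical subjects; the second is uniformly 100 msec slower than the first.
subject_a = [float(rt) for rt in range(400, 700, 10)]
subject_b = [rt + 100.0 for rt in subject_a]
group = group_quantiles([subject_a, subject_b])
```

In this example the group quantiles keep the shape shared by the two subjects and sit halfway between them, which is the property that lets the summary preserve individual distributional characteristics.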

The distributional plots (see Figures 4, 5, and 6) lend further support to our statements about the mean data: the high-frequency pure condition distribution is clearly faster (and its peak is slightly higher) than that of the mixed-frequency condition; medium-frequency words in the pure condition are somewhat faster than when in the mixed condition; for the low-frequency words, the distributions are nearly overlapping. It is also important to note that (1) the mixed-list distributions resemble those of their corresponding pure-frequency lists, and (2) all of the distributions appear to be unimodal.

DISCUSSION

Two-Dictionary Model

In Glanzer and Ehrenreich's (1979) model, response processes are strictly independent of the frequency-sensitive mechanisms. Therefore, the differences in response methodology between their experiment and ours should be even less of an issue than they are with the serial search model. Granting the admissibility of our present experimental evidence, we do not confirm the two-dictionary model's prediction of a difference between low-frequency condition effects.

[Figure 4. Panel title: "High frequency distributions". Horizontal axis: msecs. Legend: pure list; mixed list (dashed line).]

FIG. 4. Group reaction time distributions for high-frequency words.

Glanzer and Ehrenreich (1979) do not themselves make this qualitative prediction; their actual predictions are characterized by a set of model equations. Since our high- and low-frequency word choices almost certainly do not violate the assumptions of their modeling, we can use their equations to predict what quantitative constraints of the two-dictionary model should apply to our data.

[Figure 5. Panel title: "Medium frequency distributions". Horizontal axis: msecs. Legend: pure list; mixed list (dashed line).]

FIG. 5. Group reaction time distributions for medium-frequency words.

[Figure 6. Panel title: "Low frequency distributions". Horizontal axis: msecs. Legend: pure list; mixed list (dashed line).]

FIG. 6. Group reaction time distributions for low-frequency words.

Glanzer and Ehrenreich (1979) used the following terminology:

h = high-frequency dictionary search time. They estimated h = 72.8 milliseconds (Glanzer & Ehrenreich, 1979, p. 394).

c = complete-dictionary search time. This was estimated to be 237.5 milliseconds (Glanzer & Ehrenreich, 1979, p. 394).

The estimated values of these two parameters should also be good approximations for our experiment, if their two-dictionary model is correct.

r = probability that the subject will choose his internal high-frequency dictionary for initial search, when given a mixed-frequency list.

So for high-frequency words, they predict that mixed-condition reaction times will be (r - 1)(h - c) slower than pure-condition reaction times, on the average. For low-frequency words, this difference is predicted to be rh.

If the difference we observed between conditions for the high-frequency words is substituted into these equations, they predict that a 55-millisecond difference will be found for the low-frequency words. This difference is approximately 3 standard deviations beyond what we observe. If, instead, our low-frequency data is the initial estimator, then the predicted difference in latencies for the high-frequency conditions approaches (c - h) = 165 milliseconds. This is over 12 standard deviations greater than the difference observed.
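The substitution can be followed step by step in the sketch below. The function names are illustrative; the two equations and the parameter estimates h and c are those quoted above, and the 41-millisecond and 0-millisecond inputs are the observed high- and low-frequency differences.

```python
H = 72.8     # high-frequency dictionary search time (msec), Glanzer & Ehrenreich's estimate
C = 237.5    # complete-dictionary search time (msec), Glanzer & Ehrenreich's estimate

def high_freq_slowing(r):
    """Predicted mixed-minus-pure difference for high-frequency words: (r - 1)(h - c)."""
    return (r - 1.0) * (H - C)

def low_freq_slowing(r):
    """Predicted mixed-minus-pure difference for low-frequency words: r * h."""
    return r * H

def r_from_high_freq(diff_ms):
    """Solve (r - 1)(h - c) = diff_ms for r."""
    return 1.0 + diff_ms / (H - C)

r_hat = r_from_high_freq(41.0)                     # r implied by the 41-msec difference
predicted_low = low_freq_slowing(r_hat)            # about 55 msec
# A low-frequency difference of 0 implies r = 0, so the high-frequency
# prediction becomes c - h, about 165 msec.
predicted_high_if_no_low = high_freq_slowing(0.0)
```

The two observed differences therefore imply very different values of r (about 0.75 versus 0), which is the inconsistency the paragraph above describes.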

It should be emphasized that Glanzer and Ehrenreich did not find a difference between the low-frequency word conditions either; rather, they found a nonsignificant trend which was respected because it fit into the pattern predicted by their model. Given the present experimental evidence, it seems unlikely that for low-frequency words there is any truly significant, observable difference between conditions of the magnitude that Glanzer and Ehrenreich (1979) must predict.

Glanzer and Ehrenreich's (1979) model does not explicitly predict reaction time distributions for comparison with our data, but these can be inferred. Pure high-frequency word decisions and pure low-frequency word decisions should depend on searching only a single dictionary (the high-frequency and the complete one, respectively). Reaction times for other frequency conditions reflect the admixture of these two types of searches: Mixed-list low-frequency latencies are the result of either complete-dictionary searches alone, or of initial high-frequency dictionary searches followed by successful complete-dictionary ones. Mixed-condition, high-frequency word searches are a mixture of fast high-frequency dictionary retrievals and slower complete-dictionary ones. Therefore, we might expect these mixed-list distributions to be some linear combination of the search distributions for the two separate dictionaries. If these pure dictionary distributions are distinct enough (and, since their means should differ by 165 milliseconds by Glanzer and Ehrenreich's (1979) estimation, they might be), mixed-list reaction times should show a multimodal distribution. They do not (see Figures 4 and 6). Furthermore, the shapes of the mixed-list distributions are nearly identical to those of the corresponding pure lists, for all of the frequency classes. So there is little evidence to even suggest that a mixture of search processes is occurring.
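Whether a mixture of two search distributions would in fact look multimodal depends on how far apart the component means are relative to their spread. The sketch below illustrates that dependence only; it assumes normal component distributions with equal spread and equal mixing weights, none of which comes from the paper. The 165-millisecond separation is the estimate quoted above; the component mean and the two standard deviations are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def count_modes(weight_fast, mean_fast, mean_slow, sd, step=1.0):
    """Count the peaks of a two-component normal mixture evaluated on a grid (ms)."""
    start = mean_fast - 4.0 * sd
    n_points = int((mean_slow + 4.0 * sd - start) / step) + 1
    density = []
    for i in range(n_points):
        x = start + i * step
        density.append(weight_fast * normal_pdf(x, mean_fast, sd)
                       + (1.0 - weight_fast) * normal_pdf(x, mean_slow, sd))
    modes, rising = 0, False
    for previous, current in zip(density, density[1:]):
        if current > previous:
            rising = True
        elif current < previous and rising:
            modes += 1          # the curve turned from rising to falling: one peak
            rising = False
    return modes


MEAN_FAST = 500.0                 # hypothetical mean of the faster search (ms)
MEAN_SLOW = MEAN_FAST + 165.0     # separation taken from the (c - h) estimate in the text

modes_narrow = count_modes(0.5, MEAN_FAST, MEAN_SLOW, sd=50.0)    # hypothetical small spread
modes_wide = count_modes(0.5, MEAN_FAST, MEAN_SLOW, sd=120.0)     # hypothetical large spread
print(modes_narrow, modes_wide)
```

With the smaller spread the mixture has two peaks; with the larger one it has a single peak. This is why the argument in the text is conditional on the pure-dictionary distributions being "distinct enough," and why the similarity in shape between mixed-list and pure-list distributions is a useful second line of evidence.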

Serial Search Models

We have discussed reasons for believing that our results reflect alterations in the lexical decision process, and not trivial response stage effects. The pattern of results we find is nearly the opposite of any of the ones apparently permitted by the serial search model (as discussed earlier): there are clear differences between the mixed- and the pure-frequency conditions; these differences are most striking for the high-frequency words, and nonexistent or nearly so for the low-frequency ones. So this data both corroborates and extends Glanzer and Ehrenreich's arguments against these models.

Resonance Model

In the "constant" criterion model, we assumed that for pure-frequency lists, decision thresholds are set relatively low for the high-frequency words, highest for the low-frequency words, and at an intermediate level for the medium-frequency words. These settings represent an optimal tradeoff with the experimenter's demands of speed and accuracy. With the mixed lists, however, if the thresholds are set too low (at the pure high-frequency or pure medium-frequency levels), the subject's accuracy for the low-frequency words would be seriously compromised. Therefore, one possibility we considered was that the mixed-list threshold was set high, at the value appropriate for purely low-frequency words. In this case, reaction times for low-frequency words in the mixed condition would equal those in the pure-frequency condition; this is what our experiment finds. High-frequency words will have slower mixed-condition reaction times, since the decision threshold is set higher; our data shows a 41-millisecond difference. The model also predicts that this mixed-list criterion will make medium-frequency word decisions somewhat slower than they would be in the pure condition, and that the magnitude of this difference could be between those for the high- and low-frequency words (but see footnote 1). The 18-millisecond difference we find, although not significant, might therefore be considered some additional support for the parallel-access/criterion shift model.
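The qualitative pattern described above can be reproduced with a toy calculation. This is a minimal sketch, not the model as specified in this paper: it assumes activation rises linearly toward the decision threshold, with a steeper rise for higher-frequency words, and every number in it (slopes, thresholds, base time) is hypothetical and chosen only for illustration.

```python
# Hypothetical rate of activation growth (activation units per ms); steeper for frequent words.
SLOPE = {"high": 1.0, "medium": 0.8, "low": 0.5}

# Hypothetical pure-list decision thresholds: lowest for high-frequency lists.
PURE_THRESHOLD = {"high": 100.0, "medium": 110.0, "low": 120.0}

BASE_MS = 300.0  # hypothetical time taken by stages other than the decision itself


def reaction_time(freq_class, threshold):
    """Time for a linearly rising activation to reach the threshold, plus a fixed base."""
    return BASE_MS + threshold / SLOPE[freq_class]


# The possibility considered in the text: the mixed-list threshold sits at the
# level appropriate for a purely low-frequency list.
mixed_threshold = PURE_THRESHOLD["low"]

mixed_minus_pure = {
    cls: reaction_time(cls, mixed_threshold) - reaction_time(cls, PURE_THRESHOLD[cls])
    for cls in SLOPE
}
print(mixed_minus_pure)
```

With these particular values the mixed-versus-pure difference is largest for high-frequency words, smaller for medium-frequency words, and zero for low-frequency words. The ordering of the high and medium differences depends on the slopes and thresholds chosen, so it should not be read as a necessary consequence of the model.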

The possible slight difference between the low-frequency conditions, with pure low-frequency words having a tendency to longer reaction times (on a by-items measure), is compatible with the model we have developed; it implies that a list composed entirely of low-frequency words is treated as more "difficult," with a slightly higher criterion setting, than one containing high- and medium-frequency ones too. This would correspond to the fourth possibility for the mixed-list threshold discussed in the Thresholds subsection. If this difference can be confirmed, it would complicate testing the model against other accounts of lexical access, pending quantitative versions of the different explanations.

The model we propose can explain some of the differences between Glanzer and Ehrenreich's (1979) data and our own. They found a greater difference between high-frequency words in mixed versus pure conditions (70 milliseconds) than we do (41 milliseconds). The resonance model gives some reason to expect a greater difference in their experiment than in ours (but see footnote 1): their high-frequency words probably had a somewhat higher average frequency than did ours (their minimum frequency was greater and they did not impose an upper limit, as we did). Therefore, their subjects would have been able to set even lower pure-condition thresholds for their high-frequency words than ours could have, contributing to a greater pure versus mixed condition contrast in their experiment.

Conversely, because Glanzer and Ehrenreich's (1979) "medium" frequency category was lower in frequency than ours (6-14/million versus 10-37/million, respectively), their "medium" frequency class would be expected to show less difference between conditions than ours would have. Since our experiment lacked the power to validate the 18-millisecond difference we found, their experiment would have been even more handicapped. So it is not necessarily surprising that they failed to find a difference between their "medium" frequency conditions.

Intralist criterion variability. We have been assuming a fixed criterion setting for all the critical items of each list. However, it is conceivable that the criterion adjustments could be more flexible. In the extreme, the criterion setting might depend on only the single preceding item. Word reaction times would then depend on the frequency of the preceding item (if a word) as well as on the word's own frequency. High-frequency words preceded by high-frequency words should have the same reaction times they would have in the pure-frequency list. High-frequency words preceded by low-frequency words should have much slower reaction times. On the average, then, mixed-condition high-frequency word reaction times would be longer than in the pure condition. Low-frequency words preceded by low-frequency words would behave as in the pure-frequency condition, but would have faster reaction times when preceded by high- and medium-frequency words; the average condition effect would be faster than for the pure-frequency case. We can make the additional prediction that these conditional effects should vary with critical-item frequency, becoming more pronounced with lower-frequency words. This is because the shallower slope of lower-frequency activations magnifies reaction time changes produced by shifts in the decision criteria.

TABLE 4

CONDITIONAL REACTION TIMES

                              Word frequency class
Precursor frequency       Low         Medium      High

Mixed lists
  (Nonword)               713 (93)    567 (93)    523 (93)
  High                    685 (20)    556 (23)    513 (17)
  Medium                  750 (22)    556 (17)    528 (20)
  Low                     765 (17)    572 (20)    528 (23)
  Maximum difference      80          16          15

Pure lists
  (Nonword)               706 (93)    550 (93)    473 (93)
  High                    --          --          488 (60)
  Medium                  --          550 (60)    --
  Low                     706 (59)    --          --

Note. Analysis by words; number of words averaged in parentheses.
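The additional prediction of the intralist account, that precursor effects should grow as critical-item frequency falls, follows from simple arithmetic once a form for the activation function is assumed. The sketch below assumes a linear rise in activation and a criterion set entirely by the class of the preceding item; both are simplifying assumptions for illustration, and all slopes and thresholds are hypothetical.

```python
# Hypothetical activation slopes (units per ms): shallower for lower-frequency targets.
SLOPE = {"high": 1.0, "medium": 0.8, "low": 0.5}

# Hypothetical criterion in force after each class of preceding word.
THRESHOLD_AFTER = {"high": 100.0, "medium": 110.0, "low": 120.0}


def decision_time(target, precursor):
    """Time (ms) for the target's activation to reach the criterion left by the precursor."""
    return THRESHOLD_AFTER[precursor] / SLOPE[target]


# For each target class, how much does decision time vary across precursor classes?
precursor_range = {}
for target in SLOPE:
    times = [decision_time(target, precursor) for precursor in THRESHOLD_AFTER]
    precursor_range[target] = max(times) - min(times)

print(precursor_range)
```

The same criterion shift is divided by a smaller slope for low-frequency targets, so the spread across precursor classes is largest for them and smallest for high-frequency targets under these assumptions.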

Our data does show some evidence of such criterion variability. In Table 4, it is broken down within each frequency level by the frequency class of the immediately preceding item (for correct decisions only). High-frequency words in the mixed condition show relatively little sensitivity to the frequency of the precursor word, while low-frequency words show a great influence. There is also an overall trend towards longer reaction times for lower-frequency precursors. Of course, we cannot make too much of this data since our experiment was not explicitly designed to test for conditional effects. However, these results are just what we would expect if there is some criterion adjustment based on immediate experience in this task.
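As an arithmetic check on the "Maximum difference" row of Table 4, the mixed-list means can be re-tabulated directly. The sketch below copies the values from the table; the dictionary layout and function name are illustrative. The nonword-precursor row is left out, which does not change the result because those means fall inside the range of the word-precursor means.

```python
# Mean correct reaction times (ms) from the mixed-list part of Table 4,
# indexed as MIXED_RT[precursor class][target-word class].
MIXED_RT = {
    "high":   {"low": 685, "medium": 556, "high": 513},
    "medium": {"low": 750, "medium": 556, "high": 528},
    "low":    {"low": 765, "medium": 572, "high": 528},
}


def max_precursor_difference(table, target):
    """Largest minus smallest mean for one target class across precursor classes."""
    times = [row[target] for row in table.values()]
    return max(times) - min(times)


max_difference = {
    target: max_precursor_difference(MIXED_RT, target)
    for target in ("low", "medium", "high")
}
print(max_difference)  # {'low': 80, 'medium': 16, 'high': 15}
```

The spread across precursor classes is about five times larger for low-frequency targets than for medium- or high-frequency ones, which is the asymmetry the text describes.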

Errors. It is not clear how either the serial search model or the two-dictionary one accounts for errors. The resonance model should not try to predict erroneous word responses (false negatives) without some theory of nonword decisions, which we have not tried to develop here given our experimental focus on "yes" responses. But the resonance model can generate predictions about false-positive errors for nonwords: since these responses are made when nonword-elicited activations exceed the decision threshold, (a) false-positive responses should be more frequent for lower-threshold conditions (pure HF > pure MF > pure LF ≈ mixed frequency), (b) mean false-positive reaction times should be highly correlated with the mean true-positive times, and (c) the mean false-positive reaction times should be slightly greater than the mean true-positive times.

The data goes against predictions (a) and (c), although not conclusively. On the average, subjects made 3.6 errors each on the mixed-frequency lists. In comparison, subjects made an average of 5.4 errors each on the pure low-frequency list (t(97) = 1.7, p = .10, two-tailed, using a pooled-variance approximation to the t test (Nie et al., 1975, pp. 269-270)); 3.3 errors on the pure medium-frequency list, t(97) = -.33, p = .75; and 3.0 errors on the pure high-frequency list, t(96) = -.68, p = .50. And of the comparisons of these errors' reaction times against the average for each list (for those subjects who made errors), only the pure low-frequency list shows a difference in the predicted direction of +134 milliseconds (paired-comparison t(20) = +2.9, p < .01). The others are in the opposite direction: for the mixed-frequency lists, -63 milliseconds, t(57) = -1.8, p = .08; for the pure medium-frequency condition, -72 milliseconds, t(17) = -1.3, p = .22; for the pure high-frequency condition, -43 milliseconds, t(20) = -2.6, p = .02. (In each case, the actual statistical testing was with the log-transformed reaction times, as before.) This data and these trends are certainly disquieting for the resonance model as it stands. They may be an equally difficult test for the serial search and the two-dictionary models as well, when these explanations are developed enough to make error predictions.

To summarize our perspective: This data shows that a parallel, resonance-like access mechanism, coupled with a variable decision criterion, should be considered a viable and testable explanation of lexical access and lexical decision. This same evidence reveals some major limitations of the current serial search models and of Glanzer and Ehrenreich's (1979) two-dictionary model. In addition, the possibility of sequential item-frequency effects and the necessity of accounting for erroneous responses pose further experimental opportunities for comparing the different models, and for developing new ones.

REFERENCES

BECKER, C. A. Allocation of attention during visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 1976, 2, 556-566.

BECKER, C. A., & KILLION, T. H. Interaction of visual and cognitive effects in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 1977, 3, 389-401.

BERRY, C. Advanced frequency information and verbal response times. Psychonomic Science, 1971, 21, 151-152.

BURNS, J. T. Error-induced inhibition in a serial reaction time task. Journal of Experimental Psychology, 1971, 90, 141-148.

CARROLL, J. B., DAVIES, P., & RICHMAN, B. The American Heritage word frequency book. Boston: Houghton Mifflin, 1971.

CLARK, H. H. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 335-359.

CLARK, H. H., COHEN, J., SMITH, J. E. K., & KEPPEL, G. Discussion of Wike and Church's comments. Journal of Verbal Learning and Verbal Behavior, 1976, 15, 257-266.

COLTHEART, M., DAVELAAR, E., JONASSON, J. T., & BESNER, D. Access to the internal lexicon. In S. Dornic (Ed.), Attention and performance VI. New York: Academic Press, 1977.

FORSTER, K. I. Accessing the mental lexicon. In R. J. Wales & E. Walker (Eds.), New approaches to language mechanisms. Amsterdam: North-Holland, 1976.

FORSTER, K. I. Accessing the mental lexicon. In E. W. Walker (Ed.), Explorations in the biology of language. Montgomery, Vermont: Bradford Books, 1978.

FORSTER, K. I. Frequency blocking and lexical access: One mental lexicon or two? Journal of Verbal Learning and Verbal Behavior, 1981, 20, 190-203.

FORSTER, K. I., & BEDNALL, E. S. Terminating and exhaustive search in lexical access. Memory and Cognition, 1976, 4, 53-61.

GLANZER, M., & EHRENREICH, S. L. Structure and search of the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 1979, 18, 381-398.

GORDON, B. Lexical access and lexical decision: Mechanisms of frequency sensitivity. Unpublished doctoral dissertation, Johns Hopkins University, 1981.

GORDON, B., & CARAMAZZA, A. Lexical decision for open- and closed-class words: Failure to replicate differential frequency sensitivity. Brain and Language, 1982, 15, 143-160.

KIGER, J. I., & GLASS, A. L. Context effects in sentence verification. Journal of Experimental Psychology: Human Perception and Performance, 1981, 7, 688-700.

KUCERA, H., & FRANCIS, W. N. Computational analysis of present-day American English. Providence, R.I.: Brown Univ. Press, 1967.

LAMING, D. Choice reaction performance following an error. Acta Psychologica, 1979, 43, 199-224.

LANDAUER, T. K. Memory without organization: Properties of a model with random storage and undirected retrieval. Cognitive Psychology, 1975, 7, 495-531.

LANDAUER, T. K., & STREETER, L. A. Structural differences between common and rare words: Failure of equivalence assumptions for theories of word recognition. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 119-131.

LAROCHELLE, S., MCCLELLAND, J. L., & RODRIGUEZ, E. Context and the allocation of resources in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 686-694.

METCALFE, J., & MURDOCK, B. B. An encoding and retrieval model of single-trial free recall. Journal of Verbal Learning and Verbal Behavior, 1981, 20, 161-189.

MORTON, J. A functional model for memory. In D. A. Norman (Ed.), Models of human memory. New York: Academic Press, 1970.

MORTON, J. Word recognition. In J. Morton & J. C. Marshall (Eds.), Psycholinguistics 2: Structures and processes. Cambridge, Mass.: MIT Press, 1980.

MOSTELLER, F., & BUSH, R. R. Selected quantitative techniques. In G. Lindzey (Ed.), Handbook of social psychology. Reading, Mass.: Addison-Wesley, 1954. Vol. I.

NIE, N. H., HULL, C. H., JENKINS, J. G., STEINBRENNER, K., & BENT, D. H. SPSS: Statistical Package for the Social Sciences. New York: McGraw-Hill, 1975. 2nd ed.

O'CONNOR, R. E., & FORSTER, K. I. Criterion bias and search sequence bias in word recognition. Memory and Cognition, 1981, 9, 78-92.

PACHELLA, R. G. The interpretation of reaction time in information processing research. In B. Kantowitz (Ed.), Human information processing: Tutorials in performance and cognition. Potomac, Maryland: Erlbaum, 1974.

RABBITT, P. M. A., & RODGERS, B. What does a man do after he makes an error? An analysis of response programming. Quarterly Journal of Experimental Psychology, 1977, 29, 727-743.

RATCLIFF, R. A theory of memory retrieval. Psychological Review, 1978, 85, 59-108.

RATCLIFF, R. Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 1979, 86, 446-461.

RUBENSTEIN, H., GARFIELD, L., & MILLIKAN, J. A. Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 1970, 9, 487-494.

RUBENSTEIN, H., LEWIS, S. S., & RUBENSTEIN, M. A. Evidence for phonemic recoding in visual word identification. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 645-657.

SIEGEL, S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.

STANNERS, R. F., & FORBACH, G. B. Analysis of letter strings in word recognition. Journal of Experimental Psychology, 1973, 98, 31-35.

STANNERS, R. F., JASTRZEMBSKI, J. E., & WESTBROOK, A. Frequency and visual quality in a word-nonword classification task. Journal of Verbal Learning and Verbal Behavior, 1975, 14, 259-264.

STERNBERG, S. Memory scanning: New findings and current controversies. Quarterly Journal of Experimental Psychology, 1975, 27, 1-32.

WHALEY, C. P. Word-nonword classification time. Journal of Verbal Learning and Verbal Behavior, 1978, 17, 143-154.

WINER, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1971. 2nd ed.

(Received August 3, 1981)