

Department of English

Bachelor Degree Project

English Linguistics

Spring 2012

Supervisor: Alan McMillion

Productivity Measurements Applied to Ten English Prefixes

A comparison of different measures of morphological productivity based on ten prefixes in English

Linnéa Joandi


Abstract

Morphological productivity is difficult to define and describe. Nevertheless, several

measures have been proposed by scholars in order to quantify this notion. This paper

investigates ten common English prefixes with meanings related to degree or size. The

aims of the study are (1) to review several measures of morphological productivity, (2)

via a sample of corpus occurrences of ten prefixes, to calculate productivity figures

using five different measures of productivity, and (3), perhaps most importantly, to

discuss the differences and similarities of the five measures. The results suggest that

while several of the measures are quite similar (e.g. type frequency and hapax legomena

frequency), other measures are different (e.g. 'Productivity in the narrow sense'). While

three of the measures could be said to provide information concerning past or 'factual'

productivity, two of the measures seem instead to indicate an aspect of productivity that

is referred to as potential productivity.

Keywords

Morphological productivity, word-formation, prefixation, type frequency, token

frequency, hapax legomena, 'Productivity in the narrow sense'

Contents

1. Introduction

2. Background

2.1 Theoretical issues

2.1.1 Definitions of morphological productivity

2.1.2 Other issues concerning the notion of productivity

2.1.3 Approaches to the issues concerning the notion of productivity

2.2 Methodological issues

2.2.1 The corpus

2.2.2 The selection criteria

2.2.3 The data

2.3 Notions

2.3.1 Type frequency

2.3.2 Token frequency

2.3.3 Hapax legomena

2.3.4 Neologisms

2.3.5 'Productivity in the narrow sense'

2.3.6 Method Q – type frequency related to token frequency

2.4 Research questions

3. Methodology

3.1 The data and the BNC

3.1.1 The BNC

3.1.2 The data

3.2 The prefixes

3.3 Selection criteria

4. Results

4.1 Results of type frequency

4.2 Results of token frequency

4.3 Results of hapax frequency

4.4 Results of 'Productivity in the narrow sense'

4.5 Method Q – Type frequency related to token frequency

4.6 Correlations

4.7 Summary of results

5. Discussion

5.1 Sizing up the productivity measures

5.2 Measurement correlations

5.3 Hapaxes' importance concerning productivity

5.4 Conclusion

References

Appendix A

Stockholms universitet

106 91 Stockholm

Telefon: 08–16 20 00

www.su.se


1. Introduction

Word-formation, also referred to as word-coinage or just coinage, has gone from being a

neglected area of linguistic research to becoming a field that has received considerable attention

in recent decades (Bauer, 1983; Bauer, 2005; Fernández-Domínguez et al., 2007).

Word-formation is the process by which new words are constructed from smaller elements

(affixes, other words, or morphemes) (Plag, 2003).

Many scholars have recently realized the value of morphological and word-formation

studies because of the general relevance to the broad notion of linguistic productivity

(Bauer, 1983). This is especially evident from the numerous studies that have been

published within this area of linguistic research in recent decades (see for example the

list of recent work mentioned in Fernández-Domínguez et al., 2007). This

study focuses on prefixes rather than suffixes because prefixes have received less

attention in morphological studies so far (Lehrer, 1995).

Many word-formation studies deal with the concept of morphological productivity,

often referred to simply as productivity. Productivity can be viewed as the probability that

morphological rules or affixes are used in the production or comprehension of new

word-forms. It refers to “the property of an affix or a morphological process [word-formation

rule,] to give rise to new [word] formations” (McMahon, 2006: 122; Bauer, 1983: 18; Plag,

2003: 44). A word-formation rule or affix is considered productive if it has the ability to

coin new words by other word-formation processes. In contrast, if it is unproductive, new

coinages will not (in general) take place (Plag, 2006). As will be seen below, the concept of

morphological productivity is not unproblematic and there are several issues, both

theoretical and methodological, that remain unresolved.

Despite these problems, morphological productivity remains an important notion

because it is widely used in linguistics and because it concerns not only

morphology but also syntax, lexicology and phonology. While there seems to be a

consensus concerning the importance of productivity studies (Bauer, 1983), the current

research situation has been described as being “in a rather poor state” (Bauer, 2001: 25).

The study described below investigates several productivity measures that have been

encountered in the literature and applies them to ten English

degree/size prefixes. The British National Corpus (BNC) has been used to

collect data for these prefixes. This paper also considers several research articles

within the area of morphological productivity as well as other relevant literature (see for

example Bauer, 2001; Bauer, 1983; Fernández-Domínguez et al., 2007; McMahon, 2006

and Plag, 2003).


The broad aims of this paper are the following:

1. Analyse the notion of morphological productivity.

2. Investigate the morphological productivity of ten English prefixes by using several

measures that are frequently encountered in the relevant literature.

3. Compare and evaluate these morphological productivity measures based on the results and

methodology of this study.

This study is qualitative in nature. The general aim is to explore notions and measurements

of morphological productivity. Although a sample of ten prefixes is selected from searches

in the British National Corpus for estimating quantities, the counts are meant to be

indicative of the usefulness of various measures, not to quantitatively study the occurrence

of types and tokens of the prefixed items.

Section 2 of this paper will (1) present some background information concerning

morphological productivity, (2) discuss different definitions of relevant morphological

concepts as well as (3) discuss various issues that affect both the theoretical and

methodological study of productivity. Section 3 reviews the methodology applied in this

study while section 4 presents the results. Section 5 will discuss the results and offer some

conclusions.

2. Background

Bauer points out that “there is, at the moment, no single 'theory of word-formation', nor

even agreement on the kind of data that is relevant for the construction of such a theory”

(Bauer, 1983: 1), and this, apparently, continues to be the case (Bauer, 2001; Bauer, 2003;

van Marle, 1992).

Van Marle expresses a ‘common sense’ view of morphological productivity where it is

simply a property of morphological patterns that gives rise to new words, although he

recognizes that this description is in need of further definition (van Marle, 1992).

Consequently, it seems reasonable to address some of these definitional problems before

investigating specific measures of morphological productivity.

2.1 Theoretical issues

2.1.1 Definitions of morphological productivity

The term morphological productivity, often referred to as productivity in morphological

research, was defined by Hockett (1958) as a “property of language which allows us to say

things which have never been said before” (cited in Bauer, 2001, p. 1). Chomsky (1965)

later refers to productivity as the creativity of a language (cited in Bauer, 2001, p. 1).

Schultink (1961) views productivity as the possibility for users of a language to

unintentionally produce a (more or less) uncountable number of formations (cited in Bauer,

2001, p. 1). His definition is thus somewhat similar to Hockett's since they both stress the


potential of a language to coin new words. Bauer (1983) claims that a productive process is

one that can be used synchronically in the production of new word-forms. Bauer (2001)

goes on to say that while there may be features of morphological processes that allow for

new coinages, to be productive, these features must give rise to some degree of repetition in

the speech community. Plag (2006), in contrast, defines productivity as a feature of an affix

(rather than one of a language, as Hockett proposes) or one of a morphological process

(Bauer, 2001). He claims that productivity is a property of the affix or a morphological

process that is used in order to produce “[word-] formations on a systemic basis” (Plag,

2003: 44; Plag, 2006: 122). There seems to be considerable agreement on this general

definition (Plag, 2003; Plag, 2004; Bauer, 1983; Plag, 2006). Yet another definition of

morphological productivity is that of Baayen which says that “the term 'morphological

productivity' is generally used informally to refer to the number of words [the type

frequency of an affix] in use in a language community that a rule describes” (Baayen,

2012). The basic intuition underlying the term is perhaps reflected in Plag et al. (1999)

where it is claimed that "productivity is generally loosely defined as the possibility to coin

new complex words according to the word formation rules of a given language" (p. 10).

Definitions of morphological productivity are thus based on several different constructs:

1. A definition based on a language's potential to coin new words (see Hockett's definition as well as

Chomsky's term creativity cited in Bauer, 2001, p. 1).

2. A definition based on the potential of language users to unintentionally coin new words (Schultink

(1961) cited in Bauer, 2001, p. 1).

3. A definition based on the availability of processes at the time of coinage, i.e. processes that can

produce new words when necessary (Bauer, 1983).

4. A definition based on the assumption that it is a feature of morphological processes (and not a whole

language nor an affix) that allows coinage (Bauer, 2001).

5. A definition based on the assumption that it is a feature of affixes, and not of morphological processes,

that enables morphological productivity (Plag, 2002).

6. A definition based on the assumption that it is the features of affixes or morphological rules that

enable morphological productivity (Plag, 2006).

7. A type-frequency based definition that presupposes that the words are in use in the language

community at the time productivity is estimated in order to be considered productive (Baayen,

2012).

2.1.2 Other issues concerning the notion of productivity

While scholars continue to struggle with the notion of productivity itself (Aronoff, 1976; Bauer, 2001; Bauer,

2005; van Marle, 1991), there are three additional issues on which there is no consensus

and which need to be addressed, namely:

1. What it is that is productive or unproductive: particular affixes, morphological

processes or words themselves (Bauer, 2001).

2. Whether productivity (a) is an all-or-nothing property, (b) can be analysed as one of three

degrees (non-productive, intermediate and fully productive; Bauer, 2001;

Ljung, 2003), or (c) ranges along a scale (Bauer, 2001).

3. When an affix, word or morphological rule is to be considered productive. An affix could be said to


have been productive in the past, to be currently productive, or to be potentially productive (see

Bauer, 2005 or section 2.1.3 step 3 below where the different cases are explained and exemplified).

According to Plag (2006), the second issue stated above is one of the main issues

concerning the nature of productivity: whether productivity is a

qualitative or a quantitative notion. The qualitative approach assumes that affixes or

processes either have the feature of productivity or do not. The quantitative

approach, on the other hand, treats productivity as a gradient on which

morphological processes (or affixes) can be more or less productive than others, and on which

those that are not productive at all and those that are very productive merely mark the

two ends of a productivity scale.

2.1.3 Approaches to the issues concerning the notion of productivity

The above-mentioned issues make it necessary to adopt certain assumptions in order to

enable a corpus-based study of the productivity of affixes. For the study described below,

the following assumptions are made:

1. What it is that is productive: That morphological productivity is a property of morphological

processes or affixes, which can give rise to new words (Plag, 2003; Plag, 2006; Bauer, 1983; Adams

(1973) cited in Plag, 2006, p. 122; and Spencer (1991) cited in Plag, 2006, p. 122). It is thus not

words or languages as a whole that have the property of being productive or unproductive.

2. Whether it is an all-or-nothing feature, a three-step scale, or a continuous scale: In agreement with

Bauer (2005), it will be assumed that morphological productivity ranges from unproductive to

productive on a continuous scale.

3. When an affix or morphological process is considered to be productive: It is assumed below that

whether (and when) an affix or process is productive will depend on the measurements used and the

kinds of productivity they consider (past, current or potential productivity). This is further

discussed in section 5. The different time aspects of productivity can be exemplified by the

following affixes:

The suffixes -ter and -th (laughter, length) were productive in the past, but are no longer considered

to be productive as they are not used in coinages of new words (Plag, 2006).

The suffix -ness (indecisiveness) is currently productive because it is used in the production of new

word-forms (although it has not been as productive in the past) (Plag, 2006).

The prefix over- has been productive (over-empty) and is currently productive (over-administer,

over-charged). Moreover, it is also potentially productive according to measures such as

'Productivity in the narrow sense' (see section 2.3.5).

2.2 Methodological issues

2.2.1 The corpus

The current study makes use of the British National Corpus (BNC), which is a general

corpus containing a representative language sample of British-English from the late 1980s

and early 1990s. Clearly, this corpus cannot be the basis of a diachronic study of

productivity, but should be adequate for synchronic and potential productivity studies.

Historical corpora (e.g. the Oxford English Dictionary (OED)) or genre-specific corpora

would of course be more useful for measures of diachronic productivity (Plag, 2006; Plag

et al., 1999).


2.2.2 The selection criteria

An important issue in word formation studies is exactly what is to count as an example of

the relevant category. In some previous studies, criteria for inclusion/exclusion of types or

tokens have not been stated explicitly. Bauer (2001), for example, writes about calculating

the productivity of the suffix -ment, and says that "when irrelevant words have been

deleted, this leaves 1,110 words containing the affix -ment" without giving any further

specification of what would be considered irrelevant (p. 8). Such lack of explicitness

makes replication and other studies difficult. In the methods section below, this issue is

further discussed and the selection criteria for this study are stated explicitly.

2.2.3 The data

It is generally accepted that different genres show different degrees of productivity of

different affixes (Plag et al., 1999; Plag, 2002, 2006; Baayen & Renouf, 1996). Plag et al.

(1999), among others, stress the importance of different genres in morphological

productivity studies and claim that some texts have a higher frequency of certain affixes

than others. For example, derivational affixes have been shown to be more productive in

written than in spoken language (Plag, 2006).

The data used for the current study will be assumed to apply quite generally to current

British English since the BNC is considered a balanced and general corpus (Plag, 2002).

2.3 Notions

2.3.1 Type frequency

Type frequency is considered to be the frequency of different words, where 'words' refers

to what is often called lexemes (i.e. abstract words disregarding inflectional variation). Plag

(2006) writes that word types are simply different words (presumably, he means lexemes).

This rough definition of word-type is not without problems. For affixes such as super-,

arch- and re-, one would like to know whether items such as names (e.g. Supermec), loan

words (e.g. architecto), hyphenated items (e.g. superstars-in-the-making), and conversion

forms (e.g. rebound as noun or verb) should be included as word types in type-frequency

counts or not. Such items are given as lemmas in the Lancaster interface of the BNC.

Additionally, more precise definitions or a set of selection criteria would be helpful for

anyone carrying out studies on such lemmas.

Another issue concerning types is that of combined and lexicalized forms. While a

particular word may historically have been a combined form (e.g. respond), it may no

longer be processed as one, but might rather be processed as a single, 'lexicalized', item.

Thus, historically, we would have a prefix + stem word, but psycholinguistically we

might have a single item. Even though there are often pronunciation changes that

accompany lexicalization (compare rebound with respond for example), it might be

difficult to determine at what point a word form becomes lexicalized (see e.g. Hay &

Baayen, 2001). A pronunciation-change check has nevertheless been applied to the lemmas

in this study in order to spot such changes (see sections 2.3 and 3.3 for explanation).


Another issue concerning type frequency (as well as token frequency) is that some

researchers claim that it is only indirectly related to productivity. The ground for this claim

is that some affixes, for example the suffix -ment, have a “high” type frequency without

being used in current word-coinage (Lehnert (1971) cited in Bauer, 2001, p. 48). In

The Barnhart Dictionary of New English, a record of present-day English, only one new -ment word is

listed (Englishment) (Bauer, 2001). Additionally, Bauer (2001) and Plag (2002, 2006) state

that type frequency is a result of past rather than present productivity. Certainly, if

productivity is only looked at from a synchronic point of view and a suffix like -ment is not

used in the production of new words, then the underlying word-formation processes that

handle the suffixation should be considered unproductive. As Bauer and Plag claim that the

type frequency is the result of past productivity, one way of approaching this difficulty is

simply to view productivity from a more diachronic point of view (Bauer, 2001; Plag,

2002).

2.3.2 Token frequency

Token frequency refers to the number of times a word form (or lexeme) occurs in a text or

corpus. In general, a high token frequency indicates that a word is more

commonly used than a word with a low token frequency, and such a word is therefore considered to be

more productive (Fernández-Domínguez et al., 2007; Hay & Baayen, 2001). The problems

mentioned above concerning word types apply equally to the selection of tokens,

i.e. what is to be included as a token of a particular type (names, loans and/or

expressions). Token frequency, like type frequency, measures past

productivity.

2.3.3 Hapax legomena

Hapax legomena or hapaxes, as they are often referred to, are forms that only occur once in

the corpus (Bauer, 2005; Plag, 2006). They have a type frequency, as well as a token

frequency, of one and are considered to estimate a prefix's current productivity (Plag, 2006).

The number of hapaxes for a particular affix is an important measurement of its

productivity on the grounds that new words will be rare, or newly coined, and consequently

will only occur with a very low frequency, often once (Plag et al., 1999). Another reason is

that many hapaxes of a given affix may indicate many neologisms, and many neologisms in

turn indicate high productivity of the affix or its morphological process. The proportion of

neologisms would therefore be an indication of the likelihood of meeting a newly coined

word (Fernández-Domínguez et al., 2007; Plag, 2006; Lehrer, 1995; Bauer, 1983; Plag et

al., 1999). The number of hapaxes is thus considered to be an important measurement for

estimating morphological productivity (Plag, 2006; Plag et al., 1999; Plag, 2003; Baayen

and Renouf (1997) cited in Plag et al., 1999, p. 12).

A large number of hapaxes implies what Plag (1999) and Bauer (2001) refer to as

availability, i.e. that a process can be used in order to produce new words. A word-

formation rule is considered to be productive (available) if the morphological processes,

concerning a given affix, result in many low-frequency words (such as hapaxes) and a low


number of high-frequency words (Fernández-Domínguez et al., 2007; Baayen and Renouf

(1996) cited in Plag et al., 1999, p. 12; Plag, 2003; Plag, 2006). We can consequently

reason that the larger the number of hapaxes is in relation to the token frequency, the

greater the productivity of that affix (Plag, 2003; Plag, 2006).

One problem with hapax legomena is that there are several definitions of the notion in the

literature (Fernández-Domínguez et al., 2007). An additional problem is that hapaxes vary

with the corpus (whether it is genre-specific or general) and with corpus size (Plag et al., 1999;

Plag, 2003; Plag, 2006). As the corpus size increases, words that were hapaxes in a small

corpus become words with a higher token frequency in a larger corpus, i.e. the set of

hapaxes changes (Plag, 2006; Plag, 2003). Despite this, hapaxes in this study are looked at

in a general sense, referring to lemmas with a token frequency of one that may or may not

appear in dictionaries.
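The three direct counts discussed so far, and the dependence of the hapax set on corpus size, can be illustrated with a minimal Python sketch. The lemma lists are invented for illustration and are not BNC data; the function name `frequency_profile` is likewise hypothetical, and a hapax is taken here, as in this study, to be a lemma with a token frequency of one.

```python
from collections import Counter

def frequency_profile(lemma_tokens):
    """Return (type frequency, token frequency, sorted hapaxes) for one prefix."""
    counts = Counter(lemma_tokens)
    type_frequency = len(counts)             # number of distinct lemmas
    token_frequency = sum(counts.values())   # number of occurrences overall
    hapaxes = sorted(lemma for lemma, n in counts.items() if n == 1)
    return type_frequency, token_frequency, hapaxes

# Hypothetical occurrences of over- lemmas (illustrative, not BNC counts).
small_corpus = ["overcharge", "overcharge", "overeat", "overadminister"]
# A larger corpus that contains the smaller one plus two more occurrences.
larger_corpus = small_corpus + ["overeat", "overempty"]

small_profile = frequency_profile(small_corpus)
larger_profile = frequency_profile(larger_corpus)
print(small_profile)
print(larger_profile)
```

In this toy data, overeat is a hapax in the smaller sample but not in the larger one, while overempty enters as a new hapax, which is the kind of shift in the hapax set described above.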

2.3.4 Neologisms

Another frequency model for determining the current or contemporary productivity of an

affix is the number of newly coined words in a given period of time, the so-called

neologisms (Plag, 2006; Plag, 2003; Lehrer, 1995; Bauer, 1983). A hapax legomenon may

be an occurrence of a low-frequency item that happens to appear only once in a corpus, a

coinage, or a neologism. Tests for contemporary neologisms would normally exclude items

that occur in dictionaries or occur regularly in certain genres.

With a sufficiently large corpus, the proportion of neologisms among the hapax items

should increase so that the number of hapaxes can be used as an estimation of the number

of neologisms, and subsequently of the productivity of an affix. The hapax legomena are, in

the study below, not scrutinized for whether they are neologisms or not.

2.3.5 'Productivity in the narrow sense'

As mentioned above, the number of hapaxes is considered to be a measure of productivity:

the more hapaxes of a specific affix there are, the more productive the affix is considered

to be (Fernández-Domínguez et al., 2007; Plag, 2006, 2002; Plag et al., 1999; Baayen &

Renouf, 1996). If the number of hapaxes is divided by the overall token frequency of all the

types with a particular affix, the resulting quotient will vary positively with the number of

hapaxes and negatively with the token frequency (van Marle, 1992). This quotient (P) can

be viewed as a measure of the affix's likelihood to create new words and thus to be

productive.¹ Plag (2006) credits Baayen with this measure, which is referred to as

'Productivity in the narrow sense':

P = hapax frequency / token frequency

¹ In the limiting case where the hapax frequency equals the token frequency, the quotient of 1 should not be

interpreted as 'certainty' in any sense.


Since the hapax frequency depends on the corpus, its size and genres (see section 2.3.3

above), the P measurement will be highly corpus specific. Consequently, this measure is

difficult to compare across corpora. Nonetheless, this measure will be considered in the

study described below where (only) the BNC has been used.
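As a minimal sketch of how P behaves, the quotient can be computed for two invented affixes that have the same number of hapaxes but different token frequencies. The function name and the counts are hypothetical and are not taken from the BNC data of this study.

```python
def productivity_in_narrow_sense(hapax_frequency, token_frequency):
    """P = hapax frequency / token frequency, for one affix in one corpus."""
    if token_frequency <= 0:
        raise ValueError("token frequency must be positive")
    if not 0 <= hapax_frequency <= token_frequency:
        raise ValueError("hapax frequency must lie between 0 and token frequency")
    return hapax_frequency / token_frequency

# Hypothetical counts: same number of hapaxes, different token frequencies.
p_rare = productivity_in_narrow_sense(20, 1_000)
p_frequent = productivity_in_narrow_sense(20, 10_000)
print(p_rare, p_frequent)
```

With the hapax count held constant, the affix with ten times as many tokens receives a P value ten times smaller, which illustrates why P varies negatively with token frequency.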

2.3.6 Method Q - Type frequency related to token frequency

Chitashvili & Baayen (1993) (cited in Plag et al., 1999, p. 11) are credited with describing

the situation in which the number of hapaxes will approximate half of the observed

vocabulary size in a “sufficiently large” corpus. This distribution of hapaxes is referred to

as a "Large Number of Rare Events" (LNRE).

As will be seen below (see tables 4.2 and 5.1), the hapax frequency (n1) for individual

affixes seems to be roughly half the total type frequency for that affix (Vtot, type frequency

including hapax frequency). If n1 is roughly half of Vtot, then the number of non-hapax types

(V = Vtot minus n1) is roughly equal to n1, so using non-hapax type frequency instead of hapax

frequency in the calculation of P will produce a quotient very similar to that of

'Productivity in the narrow sense'. One might therefore ask whether hapax frequency

provides much more information than simple type frequency. Consequently,

in the study described below, a calculation using type frequency excluding hapax

frequency (V), instead of hapax frequency, will be compared to the P values (which are based

on hapax frequency). This quotient (Q) is simply total type frequency minus hapax

frequency, divided by the token frequency of a particular affix:

Q = type frequency (excluding hapax frequency) / token frequency
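The relation between Q and P can be sketched in Python under the definitions given above. The counts are invented for illustration, and the function names are hypothetical.

```python
def method_q(total_type_frequency, hapax_frequency, token_frequency):
    """Q = (Vtot - n1) / token frequency, i.e. non-hapax types per token."""
    non_hapax_types = total_type_frequency - hapax_frequency  # V in the text
    return non_hapax_types / token_frequency

def narrow_p(hapax_frequency, token_frequency):
    """P = n1 / token frequency ('Productivity in the narrow sense')."""
    return hapax_frequency / token_frequency

# Hypothetical affix where hapaxes are exactly half of all types.
q_half = method_q(40, 20, 1_000)
p_half = narrow_p(20, 1_000)

# Hypothetical affix where hapaxes are slightly fewer than half of all types.
q_skew = method_q(40, 18, 1_000)
p_skew = narrow_p(18, 1_000)
print(q_half, p_half, q_skew, p_skew)
```

When hapaxes make up exactly half of the types, Q and P coincide; the further the hapax share departs from one half, the further the two quotients drift apart.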

2.4 Research questions

Based on the discussion above, several research questions can now be formulated.

(1) How do the prefixes compare using the different measures of productivity?

(2) To what extent do the measures correlate with each other?

(3) To what extent does the occurrence of hapax legomena provide useful information

concerning productivity?

3. Methodology

This study comprises the methods described in the preceding section. The direct

measurements include type frequency, token frequency, and hapax legomena, with

'Productivity in the narrow sense' (P) and Method Q (Q) calculations based on these

measurements.


3.1 The data and the BNC

3.1.1 The BNC

The data used in this study are derived from the British National Corpus (BNC). The BNC is

a collection of 100 million words of written (90%) and spoken (10%) British English,

sampled from a broad range of contexts ranging from radio shows to

governmental meetings and collected between the 1970s and 1993. It is a

monolingual, synchronic, very general corpus and is considered by Plag et al. (1999) and

Plag (2002), among others, to be a balanced corpus for use in corpus-based morphological

studies.

3.1.2 The data

The data for this study have been taken from the entire corpus, without regard to genre, text

type, or any sociolinguistic variables.

3.2 The prefixes

The prefixes used in this study were selected on the basis of meaning, viz. degree and/or

size (Ljung 2003: 70) and comprise the following: arch-, hyper-, mega-, mini-, over-, out-,

semi-, super-, ultra-, under-. Because a rigorous and minute study of each type and token

for each of the prefixes was considered beyond the scope of this project, a random sample

of 50 word types was drawn for each prefix (see sampling procedure below). The

central aim of the sampling was to estimate the number of types, tokens and hapax

legomena that fulfilled the selection criteria (see section 3.3). The sampling procedure was

the following:

1. For each prefix, the initial list of types, based on a lemma search (using the

Lancaster interface of the BNC), was scanned for obvious typographical errors and

nonsense strings (e.g. lemmas including numbers and/or signs, such as out12, minister.he and

hyper.0/1). These were removed.

2. In order to get a random sample of 50 types for each prefix (see Appendix A), the number of

remaining types (i.e. after step 1, including both hapaxes and non-hapaxes) was

divided by 50, giving a quotient q.

3. A random number between 1 and q was then generated by using a random number table.

4. Every q-th type was included as part of the random sample. These 50 types (for each

prefix) were then checked against the selection criteria (see section

3.3). If an item did not meet the selection criteria, it was discounted. It should be kept in

mind that these types included both hapaxes and non-hapaxes.

5. Among the non-hapax types in the random sample (see step 4 above), the proportion of

discounted items that did not meet the selection criteria was applied to the entire set of

non-hapax types for that prefix (i.e. to what was left after typographical errors and nonsense strings

were removed).

6. Among the hapaxes in the random sample (see step 4 above), the proportion of discounted

hapaxes that did not meet the selection criteria was applied to the total number of hapaxes

in the list resulting from step 1 (for that particular prefix), and that share of hapaxes was then removed.

7. The proportion of discounted sample non-hapax types was also applied to the total number

10

of tokens for a given prefix, giving an estimate of the acceptable number of tokens for that

prefix.

If, for example, prefix X had an initial type list of 110 items, of which 10 were typographically
faulty, then these 10 would be removed, leaving 100 items. For these 100 types, q would be equal
to 2, in order to get a sample of 50 items (lemmas). Every second type, including hapaxes,
would then be selected. Among these 50 sample items, some would be hapax legomena and
others would have frequencies greater than 1 (so-called non-hapax types). These 50
items would then be scrutinized to see whether they fulfilled the selection criteria. If,
among the 50 sample items, 30 were non-hapax types and 20 were hapaxes, and 5 of the
non-hapaxes did not meet the criteria for selection, then the total number of non-hapax types
would be reduced by 5/30. The number of tokens for the prefix would then be reduced by the
same factor. The proportion of acceptable hapaxes was calculated similarly, so that if, say,
5 of the 20 hapaxes did not fulfill the selection criteria, then the total number of hapaxes
was reduced by 5/20. In spite of the very large number of tokens, types, and hapaxes, this
sampling method allowed for reasonable estimates of the three statistics.
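The sampling and scaling steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure as actually carried out: the function names `systematic_sample` and `estimate_accepted` are hypothetical, Python's `random` module stands in for the random number table, and the total of 60 non-hapax types in the usage lines is an invented figure.

```python
import random


def systematic_sample(types, sample_size, rng=None):
    """Draw a systematic sample: random start in 1..q, then every q-th type."""
    rng = rng or random.Random()
    q = max(1, len(types) // sample_size)      # step 2: quotient q
    start = rng.randint(1, q)                  # step 3: stand-in for the random number table
    return types[start - 1::q][:sample_size]   # step 4: every q-th type


def estimate_accepted(total, n_sampled, n_rejected):
    """Reduce a full count by the proportion of rejected items in the sample."""
    if n_sampled == 0:
        return total
    return round(total * (1 - n_rejected / n_sampled))


# Hypothetical prefix X: 100 types left after step 1, so q = 2.
types_x = [f"type{i}" for i in range(1, 101)]
sample = systematic_sample(types_x, 50, random.Random(1))
print(len(sample))

# 5 of the 30 sampled non-hapax types rejected; 60 is an invented total.
print(estimate_accepted(60, 30, 5))
```

The same scaling function would be applied separately to the non-hapax types, the hapaxes and the tokens, each with the rejection proportion that steps 5 to 7 assign to it.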

For the prefix arch-, a rather large proportion of types and hapaxes did not meet the
selection criteria. As many as 58% of the hapaxes (n1) were discounted (compared to an
average of 10% discounted hapaxes for the other prefixes), and 63% of the non-hapax types
(V) were discounted (compared to an average of 12% in the other cases). Since the total
number of hapaxes was 154, this number was reduced by 89, giving 65 as the estimate of
acceptable hapaxes. The procedure was the same for the non-hapax types, leaving 55
acceptable non-hapax types. Thus, the total number of types including hapaxes (Vtot)
came to 120. The token frequency was calculated by excluding the same proportion of
tokens as was excluded for the non-hapax types, giving a rough estimate of the number of
tokens for each prefix. This was 63% in the case of arch-, leaving 5464 tokens for the prefix.
The proportion of discounted items for the other nine prefixes was considerably smaller, as
can be seen in Table 3.1 below.

The large differences between the original number of tokens (row 12) and the revised
number of tokens (row 13) for out-, over-, and under- are due to prepositions and adverbs
with the same form as the prefix being included in the list of types for these prefixes. The
number of tokens for out as a preposition, for example, is about 200,000.

Table 3.1. Results of the sampling procedure applied to types, tokens and hapaxes for the ten investigated prefixes.

                                                     Arch-  Hyper-   Mega-   Mini-    Out-   Over-   Semi-  Super-  Ultra-  Under-
1. Original number of types                            331     367     251     668     898    2122     814    1006     315    1036
2. Revised number of types                             303     358     241     622     797    2013     814     965     297    1036
3. Revised types after hapaxes removed                 149     162      88     237     456    1054     294     433     123     519
4. Per cent of types excluded based on sample           63       5      17      28      25       0       0      27      10       0
5. Number of lemmas excluded                            94       8      15      66     114       0       0     116      12       0
6. Remaining number of lemmas                           55     154      73     171     342    1054     294     317     111     519
7. Original number of hapaxes                          181     204     165     429     434    1052     520     572     191     472
8. Revised number of hapaxes                           154     196      88     385     341     959     487     265     174     404
9. Per cent of hapaxes excluded based on sample         58       4      11      10      27       0       6      18       3       9
10. Number of hapaxes excluded                          89       8      15      39      92       0      29      48       5      36
11. Remaining number of hapaxes                         65     188     136     346     249     959     458     217     169     368
12. Original number of tokens                        18446    2889    1381   51167  264577  181924    7425   23738    2000  128024
13. Revised number of tokens                         16557    2736    1216   49885   67077   50015    6557   21495    1336   66337
14. Per cent tokens excluded based on type sample       63       5      17      28      25       0       0      27      10       0
15. Number of tokens excluded                        11093     136     206   13967   16769       0       0    5804     134       0
16. Remaining number of tokens                        5464    2600    1010   35918   50308   50015    6557   15691    1202   66337

3.3 Selection criteria

A type or hapax had to meet the criteria listed below in order to be included in the counts of

the sample:

1. Names of any kind were excluded (e.g. SuperSparc-II and SuperMax).

2. Items that were not prefix+stem words were excluded (e.g. superb, overt, minister, etc.).

3. Items where a pronunciation change was evident (indicating lexicalization) were excluded
(e.g. minister).

4. Multi-hyphenated items, i.e. lemmas with more than 2 hyphens (e.g. superstar-in-the-making),
were excluded.

5. Loan words (e.g. architecto) were excluded.

6. Misspellings were included if the intended word was clear (e.g. super-heroe for super-hero).

While other criteria could have been applied, this set seems to conform to the practices

indicated (however vaguely) in the relevant literature. The results of the sampling

procedure and the relevant calculations (P and Q) are provided in the results section below.

4. Results

Having discussed the theoretical and methodological groundwork, the results of this study

can now be presented. As previously mentioned, all the counts are based on the BNC data

and selected according to the selection criteria presented in the preceding section.

4.1 Results of type frequency

The first measurement concerns type frequency, which is calculated by counting all the
unique derivatives of a given affix (Bauer, 2001; Plag, 2003; Plag, 2006). Each lemma is
counted once, and the count includes hapaxes, low-frequency words and high-frequency
items.

In general, the larger the number of types of a prefix, the more productive the prefix is
considered to be (Plag, 2003; Fernandez-Dominguez et al., 2007; Hay & Baayen 2001). A
low number of types thus indicates an unproductive affix, while a high number of types
indicates the opposite. For example, the suffix -ter has a type frequency of two, as it does
not occur in any English words other than laughter and slaughter (Bauer, 2001). It would
therefore be considered a less productive affix than the suffix -ness, which has
2466 types listed in the BNC (Plag, 2006).

As seen in Figure 4.1 below, where hapaxes as well as non-hapax types of a particular affix
are included in the counts (V + n1, also referred to as Vtot), arch- has the lowest number of
different derivatives listed in the BNC. It should thus be considered to have the
lowest productivity of the investigated affixes according to this productivity measure.
Over-, in contrast, should be considered the most productive one. Under-,
with less than half the type frequency of over- (887), is the second most productive affix;
the other seven prefixes range between 209 and 752 types.

Figure 4.1. Type frequency for ten prefixes (hapaxes included).

If the hapaxes (n1) are excluded from the type counts (Vtot – n1, leaving V), the overall

order of the prefixes is nearly the same as in the case of the total number of types (Vtot);

i.e. arch- is still the least productive, and over- the most productive (see Figure 4.2). In the

case of V, the remaining prefixes range from 73 to 519.

Figure 4.2. Type-frequencies of ten prefixes.

[Figure 4.1: bar chart titled "Total type frequency". x-axis: Prefix; y-axis: Total number of types (V + n1). Values: arch- 120, hyper- 342, mega- 209, mini- 517, out- 591, over- 2013, semi- 752, super- 534, ultra- 280, under- 887.]

[Figure 4.2: bar chart titled "Type frequency". x-axis: Prefix; y-axis: Number of types (V). Values: arch- 55, hyper- 154, mega- 73, mini- 171, out- 342, over- 1054, semi- 294, super- 317, ultra- 111, under- 519.]

4.2 Results of token frequency

The token frequency value (N) for a particular affix has been calculated by summing the
token frequencies listed for the prefix's different derivatives. Token frequency is also
considered to be an indicator of productivity, i.e. the higher the token frequency, the more
productive the prefix. For example, Plag (2004: 11) gives the token frequency of the suffix
-wise as 2091, and that of -ness as 106957. He then states that -ness is considered to be the
more productive of the two (from a token frequency point of view).

The prefix under- has the highest token frequency (66337) of the investigated prefixes and
is, therefore, considered to be the most productive prefix based on this count. Out-, over-
and mini- also have high token frequencies, while mega-, ultra- and hyper- have
comparatively few tokens. The prefix with the smallest token frequency is mega-, with
only about 1.5% as many occurrences as under- (1010 versus 66337 occurrences). Based on this
count, one can claim that under- is roughly 66 times more productive than mega-.

As seen in Figure 4.3, the prefixes seem to fall into two somewhat distinct groups:
those with relatively high token frequencies (mini-, out-, over- and under-) and those with
relatively low token frequencies (arch-, hyper-, mega-, semi-, super- and ultra-).

Figure 4.3. The token frequencies of the investigated prefixes.

[Figure 4.3: bar chart titled "Token frequency". x-axis: Prefix; y-axis: Number of tokens (N). Values: arch- 5464, hyper- 2600, mega- 1010, mini- 35918, out- 50308, over- 50015, semi- 6557, super- 15691, ultra- 1202, under- 66337.]

4.3 Results of hapax frequency

Hapaxes (n1) are word-forms that only occur once in the corpus. This means that they have

a token frequency of one, and that there is no variation of inflectional forms. For each

prefix, the number of hapaxes is shown in Figure 4.4.
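To make the relation between the counts in sections 4.1 to 4.3 concrete, the following Python sketch derives V, n1, Vtot and N from a mapping of lemmas to token frequencies. It is a minimal illustration under stated assumptions: the function name `frequency_counts` is hypothetical, N is assumed to sum over all types (hapaxes included), and the five lemmas are a handful of accepted arch- items from Appendix A, chosen only to keep the example small.

```python
def frequency_counts(lemma_freqs):
    """Return the basic counts for one prefix from a {lemma: token frequency} mapping."""
    n1 = sum(1 for f in lemma_freqs.values() if f == 1)  # hapax legomena (n1)
    v = sum(1 for f in lemma_freqs.values() if f > 1)    # non-hapax types (V)
    n_tokens = sum(lemma_freqs.values())                 # token frequency (N), assumed to include hapaxes
    return {"V": v, "n1": n1, "Vtot": v + n1, "N": n_tokens}


# Five accepted arch- items from Appendix A (illustrative subset only).
arch_subset = {
    "arch-appeaser": 1,
    "arch-boss": 1,
    "arch-opponent": 2,
    "archdruid": 3,
    "archangel": 56,
}
print(frequency_counts(arch_subset))
# {'V': 3, 'n1': 2, 'Vtot': 5, 'N': 63}
```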

Over- is the prefix with the highest number of hapaxes (959) and thus the most productive
one from a hapax perspective. It is roughly twice as productive as semi-, which has 458
hapaxes and is the second most productive affix, while arch- is the least productive one.

Figures 4.1, 4.2, and 4.4 show quite similar patterning. As will be shown below and
discussed in the following section, the correlation value for type and hapax frequency is
high.

Figure 4.4. Hapax frequencies (n1).

4.4 Results of 'Productivity in the narrow sense'

By dividing the hapax frequency (n1) by the token frequency (N), we get a
statistic which could be thought of as indicating the probability of encountering new
word-formations with a specific prefix (Plag, 2006). This probability is called 'Productivity
in the narrow sense' (P). According to Plag (2006), the higher the value of P, the higher the
potential of the prefix to produce new word formations.

A high hapax frequency combined with a low token frequency results in a high value of P,
indicating a high likelihood of coinability for that specific prefix. When the opposite
relation holds (low hapax frequency and high token frequency), it is an
indication of a low probability of encountering new word-formations with that particular
affix.

As can be seen in Figure 4.5, mega- and ultra- have the highest P values while out- has the

lowest.

[Figure 4.4: bar chart titled "Hapax frequency". x-axis: Prefix; y-axis: Number of hapaxes (n1). Values: arch- 65, hyper- 188, mega- 136, mini- 346, out- 249, over- 959, semi- 458, super- 217, ultra- 169, under- 368.]

Figure 4.5 The P quotient: hapax frequency divided by token frequency.

4.5 Method Q – Type frequency related to token frequency

Method Q has been calculated by dividing the type frequency of a prefix (V) by its token
frequency (N). The quotient Q is hypothesized to be similar to P in the sense of being an
indicator of potential productivity. As in the case of P, a high quotient indicates a high
potential productivity of the prefix.

As seen in Figure 4.6, the most productive prefix, based on the Q measurement, is ultra-,
which is roughly nineteen times more productive than the least productive prefix,
mini- (about 18 times on the rounded values 0.092 and 0.005). It is noticeable that Figure 4.6
is quite similar to Figure 4.5 (P). This will be discussed in the subsequent section.
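The two quotients described in sections 4.4 and 4.5 can be illustrated with a short Python sketch. The counts are the remaining numbers of lemmas, hapaxes and tokens from Table 3.1 (rows 6, 11 and 16); the function names `p_measure` and `q_measure` are hypothetical and serve only this illustration.

```python
# Remaining counts per prefix from Table 3.1 (rows 6, 11 and 16): (V, n1, N)
counts = {
    "arch-": (55, 65, 5464),
    "hyper-": (154, 188, 2600),
    "mega-": (73, 136, 1010),
    "mini-": (171, 346, 35918),
    "out-": (342, 249, 50308),
    "over-": (1054, 959, 50015),
    "semi-": (294, 458, 6557),
    "super-": (317, 217, 15691),
    "ultra-": (111, 169, 1202),
    "under-": (519, 368, 66337),
}


def p_measure(n1, n_tokens):
    """'Productivity in the narrow sense' (P): hapaxes divided by tokens."""
    return n1 / n_tokens


def q_measure(v, n_tokens):
    """Method Q: non-hapax types divided by tokens."""
    return v / n_tokens


for prefix, (v, n1, n_tokens) in counts.items():
    print(f"{prefix:7s} P = {p_measure(n1, n_tokens):.3f}  Q = {q_measure(v, n_tokens):.3f}")
```

Because both quotients share the denominator N, any prefix with a very large token count will score low on both measures, whatever its type or hapax count.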

Figure 4.6. The Q-values of ten affixes.

[Figure 4.5: bar chart titled "'Productivity in the narrow sense' (P)". x-axis: Prefix (arch-, hyper-, mega-, mini-, out-, over-, semi-, super-, ultra-, under-); y-axis: P (n1/N), scaled from 0.000 to 0.160.]

[Figure 4.6: bar chart titled "Method Q (Q)". x-axis: Prefix (arch-, hyper-, mega-, mini-, out-, over-, semi-, super-, ultra-, under-); y-axis: Q (V/N), scaled from 0.000 to 0.100.]

4.6 Correlations

The results of the five methods for calculating productivity have been presented above. In

order to get a picture of the extent to which these different methods could be measuring

similar aspects of productivity, correlation figures were calculated for the five methods (see

Table 4.1 below).

In Tables 4.1, 4.3 and 5.1, V refers to type frequency (hapaxes excluded), N to token
frequency, n1 to hapax frequency, P to 'Productivity in the narrow sense' (n1/N), and Q to
Method Q (V/N). All tables, except 4.1, also include Vtot, which refers to type frequency
including hapax legomena.

The table shows that type frequency (V) and hapax frequency (n1) correlate to a very high
degree (0.93). However, the highest correlation value is between 'Productivity in the
narrow sense' (P) and Method Q (Q) (0.98). A high correlation indicates that there may be
common underlying processes for both calculations.

Correlation values       N      n1   P (n1/N)   Q (V/N)
V                     0.69    0.93      -0.42     -0.51
N                             0.54      -0.67     -0.70
n1                                      -0.30     -0.18
P (n1/N)                                           0.98

Table 4.1. Correlation values for the five measures.
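As a rough check on how such correlation figures can be obtained, the following Python sketch computes pairwise correlations from the remaining counts in Table 3.1. It assumes that the coefficient in question is Pearson's r, which the text does not state explicitly; the helper `pearson_r` is written out by hand for illustration.

```python
from math import sqrt

# Values in the order arch-, hyper-, mega-, mini-, out-, over-, semi-, super-,
# ultra-, under- (remaining counts from Table 3.1, rows 6, 16 and 11).
V = [55, 154, 73, 171, 342, 1054, 294, 317, 111, 519]
N = [5464, 2600, 1010, 35918, 50308, 50015, 6557, 15691, 1202, 66337]
n1 = [65, 188, 136, 346, 249, 959, 458, 217, 169, 368]
P = [h / n for h, n in zip(n1, N)]
Q = [v / n for v, n in zip(V, N)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation (an assumed choice of coefficient)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


measures = {"V": V, "N": N, "n1": n1, "P": P, "Q": Q}
names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"r({a}, {b}) = {pearson_r(measures[a], measures[b]):.2f}")
```

With only ten data points, a single extreme prefix can dominate a coefficient, so the values are best read as rough indications rather than precise estimates.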

4.7 Summary of results

Table 4.2 below presents the values for each prefix and each measure. The prefixes are

sorted in alphabetical order.

Affix     Vtot (n1+V)       V        N     n1   P (n1/N)   Q (V/N)
Arch-             120      55     5464     65      0.012     0.010
Hyper-            342     154     2600    188      0.072     0.059
Mega-             209      73     1010    136      0.135     0.072
Mini-             517     171    35918    346      0.010     0.005
Out-              591     342    50308    249      0.005     0.007
Over-            2013    1054    50015    959      0.019     0.021
Semi-             752     294     6557    458      0.070     0.045
Super-            534     317    15691    217      0.014     0.020
Ultra-            280     111     1202    169      0.141     0.092
Under-            887     519    66337    368      0.006     0.008

Table 4.2. The results for ten prefixes, calculated with five different productivity measures.

Table 4.3 ranks the prefixes for each productivity measure; 1 signifies the
highest productivity rank, and 10 the lowest (the least productive prefix).

Productivity ranking   Vtot (n1+V)   V        N        n1       P (n1/N)   Q (V/N)
1.                     over-         over-    under-   over-    ultra-     ultra-
2.                     under-        under-   out-     semi-    mega-      mega-
3.                     semi-         out-     over-    under-   hyper-     hyper-
4.                     out-          super-   mini-    mini-    semi-      semi-
5.                     super-        semi-    super-   out-     over-      over-
6.                     mini-         mini-    semi-    super-   super-     super-
7.                     hyper-        hyper-   arch-    hyper-   arch-      arch-
8.                     ultra-        ultra-   hyper-   ultra-   mini-      under-
9.                     mega-         mega-    ultra-   mega-    under-     out-
10.                    arch-         arch-    mega-    arch-    out-       mini-

Table 4.3. Ranking of prefixes for each measure.
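A ranking of this kind can be derived mechanically by sorting the prefixes on each measure. The Python sketch below does this for the remaining counts in Table 3.1; the helper name `rank_prefixes` is hypothetical, and the ranking is computed on unrounded P and Q values, which is an assumption about how close values were ordered.

```python
prefixes = ["arch-", "hyper-", "mega-", "mini-", "out-",
            "over-", "semi-", "super-", "ultra-", "under-"]
V = [55, 154, 73, 171, 342, 1054, 294, 317, 111, 519]
N = [5464, 2600, 1010, 35918, 50308, 50015, 6557, 15691, 1202, 66337]
n1 = [65, 188, 136, 346, 249, 959, 458, 217, 169, 368]

measures = {
    "Vtot": [v + h for v, h in zip(V, n1)],
    "V": V,
    "N": N,
    "n1": n1,
    "P": [h / n for h, n in zip(n1, N)],  # unrounded quotients
    "Q": [v / n for v, n in zip(V, N)],
}


def rank_prefixes(values):
    """Return the prefixes ordered from the highest to the lowest value."""
    order = sorted(range(len(prefixes)), key=lambda i: values[i], reverse=True)
    return [prefixes[i] for i in order]


for name, values in measures.items():
    print(f"{name:5s}", " > ".join(rank_prefixes(values)))
```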

5. Discussion

5.1 Sizing up the productivity measures

To address the first research question, we can state quite clearly that the different
productivity measurements give different results and that the ten prefixes used in this study
are often ranked quite differently by the five measures. Although the prefixes are ranked
differently by the different measurements, some patterns can nonetheless be detected in
the tables above (these are explained in the next paragraph). The prefix super-, unlike
most of the other prefixes, does show consistency across the measurements, getting a
mid-range ranking for all five measures.

An interesting case is mega-, whose P and Q values indicate high productivity, while
its token frequency (N), type frequency (V) and hapax frequency (n1) indicate low
productivity (that is, compared to the other prefixes). For under-, the opposite seems
to be the case, i.e. high values for V, N and n1 but low values for P and Q. Such
differences suggest that P and Q, on the one hand, and the type, token, and hapax counts,
on the other hand, reflect different aspects of productivity. By themselves, type, token and
hapax measures are often regarded as reflecting factual (past or contemporary)
productivity, while P is considered to reflect what Plag calls an affix's potential, or
probability, of occurring in new word-formations (Plag, 2006). Consequently, we might
conclude that while under- has been productive in the past (with high V and N values), the
P and Q values for mega- imply that it is potentially productive, meaning
(presumably) that it is in some sense likely to be used to coin new words.

Plag et al. (1999) claim that for a large corpus, the total number of hapaxes will come to
about half of the number of types for the total corpus vocabulary (the LNRE distribution).
As seen in Table 5.1 below, this estimate also seems to apply to individual prefixes. The
hapax/type quotient for the investigated prefixes in this study ranges from 0.41 to 0.67
(with a mean of 0.534 and a standard deviation of 0.099). Since this
relationship seems to be fairly consistent across the ten prefixes, it is not surprising that the
P and Q measurements are highly correlated (r = 0.98) and seem to provide the same kind
of information. Considering this similarity of information in the type and hapax counts, it
could be considered surprising that several researchers seem to argue that
hapax counts give so much more information than simple type counts (Plag, 2003; Baayen
& Renouf, 1996; Fernandez-Dominguez et al., 2007; Plag et al., 1999; Plag, 2006).

Prefix         n1    Vtot   n1/Vtot
1. arch-       65     120      0.54
2. hyper-     188     342      0.55
3. mega-      136     209      0.65
4. mini-      346     517      0.67
5. out-       249     591      0.42
6. over-      959    2013      0.48
7. semi-      458     752      0.61
8. super-     217     534      0.41
9. ultra-     169     280      0.60
10. under-    368     887      0.41

Table 5.1. Hapax and total type counts and their quotients.
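The summary figures for the hapax/type quotient can be recomputed with Python's standard `statistics` module, as in the sketch below. Two assumptions are made: the semi- hapax count is taken as 458, the value given in Tables 3.1 and 4.2, and the standard deviation is the sample standard deviation (n - 1 in the denominator), since the text does not say which variant was used.

```python
from statistics import mean, stdev

# Remaining hapaxes (n1) and total types (Vtot = V + n1) per prefix.
n1 = {"arch-": 65, "hyper-": 188, "mega-": 136, "mini-": 346, "out-": 249,
      "over-": 959, "semi-": 458, "super-": 217, "ultra-": 169, "under-": 368}
vtot = {"arch-": 120, "hyper-": 342, "mega-": 209, "mini-": 517, "out-": 591,
        "over-": 2013, "semi-": 752, "super-": 534, "ultra-": 280, "under-": 887}

ratios = {prefix: n1[prefix] / vtot[prefix] for prefix in n1}
values = list(ratios.values())

for prefix, ratio in ratios.items():
    print(f"{prefix:7s} {ratio:.2f}")
print(f"min = {min(values):.2f}, max = {max(values):.2f}")
print(f"mean = {mean(values):.3f}, sample sd = {stdev(values):.3f}")
```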

The high correlation values between hapaxes and types, on the one hand, and between P
and Q, on the other (see Table 4.1 above), together with the weaker correlations across
these two groups, would seem to support Plag's claim that different
measurements are to be viewed as reflecting different aspects of morphological
productivity (Plag, 2006).

One aspect of hapax legomena that has not been considered in this study is the proportion
of neologisms. It is assumed that the larger the corpus, the larger the proportion of
neologisms (Plag, 2006). Given a set of hapaxes, we can assume that many, but probably
not all, of the items are neologisms in some sense of the term. In order to get a better
understanding of the value of hapaxes, a careful study of neologisms would seem to be
needed.

Concerning methodological problems, it seems that most of them concern the selection

criteria that are to be used for selecting types and tokens for inclusion in the productivity

counts. The relevant literature is often vague about specifying selection criteria, and the

criteria that have been applied in past studies are very seldom explicitly stated (see section

2.2.2 above).

5.2 Measurement correlations

To address the second research question, concerning the extent to which the measures
correlate with each other, we can note, as mentioned above, that type frequency (V) and
hapax frequency (n1) correlate very highly (r = 0.93). A similarly high correlation value
applies to P and Q (r = 0.98). The other correlations are weak to moderate, with absolute
values ranging from 0.18 to 0.70.

The high correlation between hapaxes and types could be regarded as surprising given that
some types have a quite high token frequency and could thus be considered to be
lexicalized to a greater or lesser extent. However, the values seem to suggest that types and
hapaxes on the one hand, and the P and Q calculations on the other, are closely related
kinds of information.

5.3 Hapaxes' importance concerning productivity

Plag (2006) considers P (the 'Productivity in the narrow sense') to calculate potential

productivity, or, the probability of meeting a newly produced word type with a specific

affix. It is difficult, however, to see how the notion of probability should be interpreted

here. P is calculated as the number of hapaxes divided by the total token frequency for a

particular affix. In the case of a process (or affix) in a given corpus that has only a few

types, perhaps with most of them having high token frequencies, and only a few hapaxes, it

does seem that such a case would be of low productivity (the high token frequency might

indicate some degree of lexicalization, leaving few items on which to base an analogy for

production). On the other hand, if the token frequency equals the hapax count for some
affix (i.e. all occurrences of the affix are hapaxes, but not necessarily neologisms), then P
necessarily equals 1 (certainty), which seems rather counter-intuitive but should perhaps be
considered simply a minor glitch in the calculation.
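The two boundary cases discussed above can be made concrete with a small Python sketch. The function name `narrow_productivity` and both word lists are invented purely for illustration; they are not corpus data.

```python
def narrow_productivity(lemma_freqs):
    """P = n1 / N for a {lemma: token frequency} mapping."""
    n1 = sum(1 for f in lemma_freqs.values() if f == 1)
    return n1 / sum(lemma_freqs.values())


# Invented affix whose three types each occur exactly once: n1 = N = 3.
only_hapaxes = {"word-a": 1, "word-b": 1, "word-c": 1}
print(narrow_productivity(only_hapaxes))  # 1.0

# Invented affix with a few frequent types and a single hapax: P = 1 / 151.
few_frequent = {"word-d": 100, "word-e": 50, "word-f": 1}
print(round(narrow_productivity(few_frequent), 4))  # 0.0066
```

The first case shows why P reaches 1 regardless of how few items are involved: the measure depends only on the ratio of hapaxes to tokens, not on the absolute size of either count.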

In addressing the third research question, concerning the extent to which hapax legomena
provide useful productivity information, we can now argue that, since the P and Q
calculations correlate so highly (r = 0.98) and the hapax counts for the different prefixes
approximate half the type counts for any particular prefix, the hapax counts are not
actually providing much more information than the type counts.

5.4 Conclusion

Clearly, there is no single method for measuring the productivity of affixes, and no single
method is acknowledged by the majority of word-formation scholars as being superior to
all the others (Bauer, 2005). Rather, the different models should be viewed as reflecting
different aspects of morphological productivity (Plag, 2006): type and hapax
frequency (and to some extent even token frequency) measure similar aspects of
productivity ('factual' productivity), while P and Q, on the other hand, capture another
kind of information and thus another view of productivity (potential productivity). By
investigating several of the methods, and comparing and analyzing the different results
and the information they give, one can hopefully come to a more nuanced view of just what
the general notion of productivity is supposed to mean and measure.

This study of productivity measurements of ten prefixes in the BNC was based on random

samples of types of each of the ten prefixes. In order to get a more reliable view of the

measurements, a study that scrutinizes each token of each type for each prefix would need

to be done. While such a study was outside the scope of this project, it could be considered

for future work. In addition to more reliable numbers, such a study would probably also

provide data for more rigorous selection criteria. Such a study could also calculate variation

of results based on varying sets of selection criteria. Such a study would also contribute

towards lifting the general study of morphological productivity up from what Bauer calls "a

rather poor state" (Bauer 2001: 25).

References

Aronoff, M. (1976). Word Formation in Generative Grammar. Cambridge, USA: MIT

Press.

Baayen, H. (2012). homepage of R. Harald Baayen. Retrieved from

http://www.ualberta.ca/~baayen/.

Baayen, R. H. & Renouf, A. (1996) Chronicling the times: Productive lexical innovations

in an English newspaper. Language, 72(1), 69-96.

Bauer, L. (1983). English word-formation. Cambridge: Cambridge Univ. Press.

Bauer, L. (2001). Morphological productivity. Cambridge: Cambridge Univ. Press.

Bauer, L. (2005). Productivity: Theories. In P. Štekauer & R. Lieber (Eds.), Handbook of
word-formation (pp. 315-34). Dordrecht, The Netherlands: Springer.

Fernández-Domínguez, J., Díaz-Negrillo, A., Štekauer, P. (2007). How is low
morphological productivity measured? ATLANTIS, 29(1), 29-54.

Hay, J. & Baayen, H. (2001). Parsing and Productivity. In G. E. Booij & J. van Marle
(Eds.), Yearbook of Morphology 2001 (pp. 203-235). Dordrecht, The Netherlands: Kluwer
Academic Publishers.

Lehrer, A. (1995). Prefixes in English word formation. Folia Linguistica 29(1), 133-148.

Ljung, M. (2003). Making words in English. Lund: Studentlitteratur.

Plag, I., Dalton-Puffer, C., Baayen, H. (1999). Morphological productivity across speech

and writing. English Language and Linguistics 3(2), 209-228.

Plag, I. (2003). Word-formation in English. Cambridge: Cambridge University Press.

Plag, I. (2006). Productivity. In B. Aarts & A. McMahon (Eds.), The Handbook of English
Linguistics (pp. 121-128). Malden, USA: Blackwell Publishing Ltd.

Plag, I. (2007). Introduction to English Linguistics. Berlin: Mouton de Gruyter.

van Marle, J. (1992). The relationship between morphological productivity and frequency: a
comment on Baayen's performance-oriented conception of morphological productivity. In
G. E. Booij & J. van Marle (Eds.), Yearbook of Morphology 1991 (pp. 151-163).
Dordrecht, The Netherlands: Kluwer Academic Publishers.

The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Oxford

University Computing Services on behalf of the BNC Consortium. URL:

http://www.natcorp.ox.ac.uk/

Appendix A

"A" means accepted: the lemma under consideration has been approved. "X" means
rejected: the lemma was not considered valid.

Random sample of the prefix arch-

Headword   Frequency   A = accepted, X = rejected

1 arch-appeaser 1 A

2 arch-boss 1 A

3 arch-cynic 1 A

4 arch-gallic 1 A

5 arch-introspective 1 A

6 arch-priestess 1 A

7 arch-thatcherite 1 A

8 archadian 1 X

9 archaeoentomology 1 X

10 archaeopterix 1 X

11 archambaud 1 X

12 archangetica 1 X

13 archdeadon 1 X

14 archenfield 1 X

15 archetypology 1 A

16 archias 1 X

17 archidonate 1 A

18 archilocho 1 X

19 archipiélago 1 X

20 architect-owner 1 X

21 architecto 1 X

22 architecture-independent 1 X

23 architype 1 A

24 archi 1 X

25 archon-list 1 X

26 archterrorist 1 A

27 archimede 2 X

28 arch-opponent 2 A

29 archaeogastropod 2 X

30 archd 2 X

31 architectura 2 X

32 archmage 2 A

33 arch-priest 3 A

34 archaistic 3 X

35 archdruid 3 A

36 archipelagoe 3 X

37 arch-conservative 4 A

38 archilochean 4 X

39 archelaus 5 X

40 archao 6 X

41 architectonic 8 X

42 archy 9 X

43 archived 11 X

44 archimandrite 14 A

45 archeologist 24 X

46 archdiocese 39 A

47 archangel 56 A

48 archivist 149 X

49 archway 261 A

50 archaeologist 537 X

Random sample of the prefix hyper-

Headword   Frequency   A = accepted, X = rejected

1 hyper-awareness 1 A

2 hyper-edge 1 A

3 hyper-flex 1 A

4 hyper-knee 1 A

5 hyper-passionate 1 A

6 hyper-smart 1 A

7 hyperaccumulation 1 A

8 hyperaesthetic 1 A

9 hyperbolae 1 X

10 hypercarbon 1 A

11 hypercompetitive 1 A

12 hyperdensity 1 A

13 hyperechoic 1 A

14 hypergames 1 A

15 hypergastinaemia 1 A

16 hypergraphics 1 A

17 hyperinsulinemia 1 A

18 hyperlipaemic 1 A

19 hypermetabolic 1 A

20 hypernormal 1 A

21 hyperparasitic 1 A

22 hyperpigment 1 A

23 hypersaline 1 A

24 hypersnow 1 A

25 hyperstriatum 1 A

26 hypertext-style 1 A

27 hypertonicity 1 A

28 hyperventilate 1 A

29 hyper-media 2 A

30 hyperarc 2 A

31 hyperconsciousness 2 A

32 hyperintelligent 2 A

33 hypermanic 2 A

34 hyperpolarisation 2 A

35 hypersexuality 2 A

36 hypertext-to-text 2 A

37 hyper-real 3 A

38 hyperextension 3 A

39 hypersomnia 3 A

40 hyperchannel 4 A

41 hyperpnoea 4 A

42 hyperparasite 5 A

43 hyperfine 6 A

44 hyperinsulinaemia 7 A

45 hyper-base 9 A

46 hyperglycaemic 15 A

47 hypertensive 21 A

48 hyperdesk 26 A

49 hypermedia 38 A

50 hyperion 62 X

Random sample of the prefix mega-

Headword   Frequency   A = accepted, X = rejected

1 mega-cash 1 A

2 mega-city 1 A

3 mega-death 1 A

4 mega-diverse 1 A

5 mega-firm 1 A

6 mega-gloss 1 A

7 mega-high 1 A

8 mega-interesting 1 A

9 mega-massive 1 A

10 mega-microscope 1 A

11 mega-moan 1 A

12 mega-profitable 1 A

13 mega-skrag 1 A

14 mega-stonking 1 A

15 mega-ton 1 A

16 mega-volume 1 A

17 mega-whopper 1 A

18 megabecquerel 1 A

19 megabuck 1 A

20 megaceros 1 A

21 megacounty 1 A

22 megafabulously 1 A

23 megaglob 1 A

24 megaira 1 X

25 megalencephaly 1 A

26 megali 1 X

27 megalomedia 1 A

28 megamillion 1 A

29 megamuseum 1 A

30 megaphone-wielding 1 A

31 megapixel 1 A

32 megaprogram 1 A

33 megaship 1 A

34 megasthenes 1 X

35 megaterium 1 X

36 megatooth 1 A

37 megavessel 1 A

38 megazone 1 A

39 megaproject 2 A

40 mega-expensive 2 A

41 mega-store 2 A

42 megadrives 2 A

43 megamerger 2 A

44 megate 2 X

45 mega-buck 3 A

46 megabase 3 A

47 megagame 3 A

48 megatek 3 X

49 megalosaurus 4 A

50 megastardom 4 A

Random sample of the prefix mini-

Headword   Frequency   A = accepted, X = rejected

1 mini-aileron 1 A

2 mini-bead 1 A

3 mini-captain 1 A

4 mini-classic 1 A

5 mini-conference 1 A

6 mini-debate 1 A

7 mini-earthquake 1 A

8 mini-fad 1 A

9 mini-forest 1 A

10 mini-grinder 1 A

11 mini-hypermarket 1 A

12 mini-launch 1 A

13 mini-micro 1 A

14 mini-parade 1 A

15 mini-proof 1 A

16 mini-retrieval 1 A

17 mini-section 1 A

18 mini-slab 1 A

19 mini-studio 1 A

20 mini-thesis 1 A

21 mini-twelfth 1 A

22 mini-wurlitzer 1 A

23 minichrom 1 A

24 minidisk 1 A

25 minilink 1 A

26 minimissile 1 A

27 mining-to-manufacturing 1 A

28 minipuls 1 A

29 miniskirted 1 A

30 minister-it 1 X

31 ministers 1 X

32 minitanker 1 A

33 mini-tunnel 2 A

34 mini-cinema 2 A

35 mini-jmc 2 A

36 mini-season 2 A

37 mini-uzi 2 A

38 minifundium 2 A

39 minis-type 2 X

40 minit 2 X

41 miniato 3 X

42 minitex 3 X

43 miniclub 4 A

44 mini-version 5 A

45 mini-golf 6 A

46 mini-van 7 A

47 mini-cab 9 A

48 mini-golf 13 A

49 minimization 18 X

50 mini-enterprise 28 A

Random sample of the prefix out-

Headword   Frequency   A = accepted, X = rejected

1 out-act 1 A

2 out-cross 1 A

3 out-fish 1 A

4 out-i 1 X

5 out-nursing 1 A

6 out-of-keeping 1 X

7 out-of-stock 1 X

8 out-patient 1 A

9 out-quarterback 1 A

10 out-spoke 1 A

11 out-turn 1 A

12 outbluffed 1 A

13 outcross 1 A

14 outer-ring 1 X

15 outflood 1 A

16 outlandos 1 X

17 outof-the-way 1 X

18 output-setting 1 A

19 outsang 1 A

20 outside-the-scope 1 A

21 outswel 1 A

22 outwardness 1 A

23 outreaching 2 A

24 out-hit 2 A

25 out-of-time 2 X

26 out-standing 2 A

27 outerhead 2 A

28 outlawing 2 A

29 outshining 2 A

30 out-and-back 3 X

31 out-placement 3 A

32 outflanking 3 A

33 outr 3 X

34 out-of-competition 4 X

35 outbr 4 X

36 outward-pointing 4 A

37 outbound 5 A

38 outsourcing 5 A

39 out-of-contract 7 X

40 outsized 8 A

41 outflung 12 A

42 out-manoeuvr 15 A

43 outrigger 19 A

44 outsource 25 A

45 out-of-work 39 X

46 outplay 58 A

47 outbound 100 A

48 outpost 164 A

49 outpatient 368 A

50 outbreak 1173 A

Random sample of the prefix over-

Headword   Frequency   A = accepted, X = rejected

1 over-abundant 1 A

2 over-bleach 1 A

3 over-chill 1 A

4 over-declamatory 1 A

5 over-embellishment 1 A

6 over-florid 1 A

7 over-ideologisation 1 A

8 over-lean 1 A

9 over-modulate 1 A

10 over-possessive 1 A

11 over-rarefy 1 A

12 over-sample 1 A

13 over-solicitousness 1 A

14 over-sweet-anemone 1 A

15 over-upholstered 1 A

16 overattentive 1 A

17 overcoat-clad 1 A

18 overdiscuss 1 A

19 overfrow 1 A

20 overinsterpretation 1 A

21 overmonolithic 1 A

22 overproducing 1 A

23 oversight 1 A

24 overtrading 1 A

25 over-spiritualizing 2 A

26 over-control 2 A

27 over-furnished 2 A

28 over-police 2 A

29 over-strong 2 A

30 overcoated 2 A

31 overjacket 2 A

32 overstrained 2 A

33 over-doing 3 A

34 over-solicitous 3 A

35 overman 3 A

36 over-breeding 4 A

37 over-sentimental 4 A

38 overshirt 4 A

39 overmighty 5 A

40 over-spending 6 A

41 over-stretched 7 A

42 overindulgence 8 A

43 over-state 10 A

44 overeating 12 A

45 oversize 15 A

46 overwritten 19 A

47 overreach 27 A

48 overrated 40 A

49 oversimplification 66 A

50 overload 212 A

Random sample of the prefix semi-

No. Headword Frequency Status (A = accepted, X = rejected)

1 semi-accidentally 1 A

2 semi-architectural 1 A

3 semi-bachelor 1 A

4 semi-bureaucrat 1 A

5 semi-collapse 1 A

6 semi-consciously 1 A

7 semi-customer 1 A

8 semi-destroy 1 A

9 semi-dwarf 1 A

10 semi-existence 1 A

11 semi-flat 1 A

12 semi-gothic 1 A

13 semi-illiterate 1 A

14 semi-intensive 1 A

15 semi-liberated 1 A

16 semi-market 1 A

17 semi-national 1 A

18 semi-palmated 1 A

19 semi-pluralism 1 A

20 semi-pureed 1 A

21 semi-refined 1 A

22 semi-rs 1 A

23 semi-seriously 1 A

24 semi-slavery 1 A

25 semi-stable 1 A

26 semi-strangled 1 A

27 semi-train 1 A

28 semi-wet 1 A

29 semiconductor-based 1 A

30 semilla 1 X

31 seminis 1 X

32 semirecumbent 1 A

33 semivoluminous 1 A

34 semi-audible 2 A

35 semi-democratic 2 A

36 semi-government 2 A

37 semi-peripheral 2 A

38 semi-ruin 2 A

39 semiautomated 2 A

40 semiprofessional 2 A

41 semi-flexible 3 A

42 semi-proletariat 3 A

43 semi-annually 4 A

44 semi-proletarianisation 4 A

45 semi-aquatic 5 A

46 semitonal 5 A

47 semi-abstract 7 A

48 semi-nude 8 A

49 semi-autobiographical 12 A

50 semi-rural 19 A

Random sample of the prefix super-

No. Headword Frequency Status (A = accepted, X = rejected)

1 super-absorbency A

2 super-bush 1 A

3 super-deluxe 1 A

4 super-fitness 1 A

5 super-heroe 1 A

6 super-liner 1 A

7 super-mini 1 A

8 super-pit 1 A

9 super-rocket 1 A

10 super-sexy 1 A

11 super-spy 1 A

12 super-treble 1 A

13 superactivate 1 A

14 superbly-located 1 X

15 supercalendered 1 A

16 superchrome 1 A

17 supercut 1 A

18 superfirm 1 A

19 supergrass 1 A

20 superimposer 1 A

21 superlativeness 1 X

22 supermax 1 X

23 supernaturally-flavoured 1 A

24 superpatriot 1 A

25 superquinn 1 X

26 supersmart 1 A

27 superstress 1 A

28 supertinta 1 X

29 super-loud 2 A

30 super-hyped 2 A

31 super-stadium 2 A

32 superbly-judged 2 X

33 superfecundity 2 A

34 superlunary 2 A

35 superpruf 2 X

36 supertintas 2 X

37 super-saver 3 A

38 supernaturalism 3 A

39 super-delegate 4 A

40 supermarioland 4 X

41 superbike 5 A

42 superstation 5 A

43 superphosphate 6 A

44 superspec 7 X

45 supermac 9 X

46 super-fit 12 A

47 supermini 15 A

48 superscript 22 A

49 super-power 41 A

50 superposition 70 A

Random sample of the prefix ultra-

No. Headword Frequency Status (A = accepted, X = rejected)

1 ultra-absorbent 1 A

2 ultra-avantgarde 1 A

3 ultra-centralised 1 A

4 ultra-current 1 A

5 ultra-defensive 1 A

6 ultra-drawing 1 A

7 ultra-fast 1 A

8 ultra-free 1 A

9 ultra-happy 1 A

10 ultra-hip 1 A

11 ultra-keen 1 A

12 ultra-low 1 A

13 ultra-model 1 A

14 ultra-naughty 1 A

15 ultra-portability 1 A

16 ultra-quiet 1 A

17 ultra-reformist 1 A

18 ultra-right-wing 1 A

19 ultra-slow 1 A

20 ultra-sophisticate 1 A

21 ultra-structure 1 A

22 ultra-tog 1 X

23 ultra-yah 1 A

24 ultracentrifuge 1 A

25 ultradextral 1 A

26 ultramagnetic 1 A

27 ultrarunner 1 A

28 ultrasonographically 1 A

29 ultratherm 1 A

30 ultra-lightweight 2 A

31 ultra-compact 2 A

32 ultra-fit 2 A

33 ultra-powerful 2 A

34 ultra-rightwing 2 A

35 ultra-vivid 2 A

36 ultramafic 2 A

37 ultraviolet-coloured 2 A

38 ultra-leftism 3 A

39 ultra-tight 3 A

40 ultrafiltration 3 A

41 ultrasonically 3 A

42 ultra-low 4 A

43 ultralight 4 A

44 ultra-leftist 5 A

45 ultrasparc-iii 6 X

46 ultrasparc-i 8 X

47 ultra-nationalist 11 A

48 ultramarine 45 A

49 ultra-modern 30 A

50 ultrasound 164 A

Random sample of the prefix under-

No. Headword Frequency Status (A = accepted, X = rejected)

1 under-active 1 A

2 under-challenge 1 A

3 under-demanding 1 A

4 under-fda 1 X

5 under-indeed 1 A

6 under-make-up 1 A

7 under-plan 1 A

8 under-report 1 A

9 under-sixteens 1 A

10 under-take 1 A

11 under-treatment 1 A

12 underarmed 1 A

13 underclubbing 1 A

14 underdog 1 A

15 undergoe 1 A

16 underhoof 1 A

17 undermarket 1 A

18 underpraise 1 A

19 underrepresent 1 A

20 underskirt 1 A

21 understa 1 X

22 undertaste 1 A

23 underwear-as-outerwear 1 A

24 under-capitalised 2 A

25 under-inform 2 A

26 under-running 2 A

27 under-water 2 A

28 underemphasise 2 A

29 undermaintained 2 A

30 undersown 2 A

31 under-arm 3 A

32 under-write 3 A

33 underinsured 3 A

34 undervoice 3 A

35 under-valued 4 A

36 underscore 4 A

37 under-par 5 A

38 undersaturated 5 A

39 under-insurance 6 A

40 understrength 6 A

41 undersaddle 7 A

42 under-employ 9 A

43 understudy 11 A

44 underpainting 14 A

45 under-secretary-general 17 A

46 underpowered 23 A

47 underwrit 31 A

48 underrated 45 A

Stockholms universitet

106 91 Stockholm

Telephone: 08–16 20 00

www.su.se