032_2004_v1_lars trap_jensen_spoken language in dictionaries_does it really matter



    THE DICTIONARY-MAKING PROCESS

    view (cf. Rundell & Stock p. 49). However interesting this discussion is from a theoretical point of view, lexicographers are also practical workers with a budget and a deadline and have to take these more mundane matters into account.

    The overall guiding principle for corpus composition has been the broadest possible coverage (for a more detailed exposition, see e.g. Asmussen & Norling-Christensen 1998). This also holds for its spoken part, but an important additional factor has been, for obvious economic reasons, availability. The editorial team have carried out some of the transcription work themselves, but the majority of the material was generously donated in electronic format from various institutions and individuals. We have received radio and television programmes from the Danish Broadcasting Corporation, transcribed sociolinguistic and sociological interviews from university colleagues, unedited reports from political debates from the Danish parliament and from the city council of Copenhagen, speeches, lectures, church sermons, various telephone answer services, and even loudspeaker announcements from train stations. So, although the sub-corpus of spoken language is not optimally balanced in every respect, it does include a variety of both uses and users, covering private as well as public language, and the language of experts as well as that of laymen.

    With nearly 8 million words of semiscripted and unscripted speech, the corpus was at the time of its compilation the largest spoken corpus in Northern Europe, and even today it is still comparable to, for instance, the 10 million words of the British National Corpus.

    3. Lexical items

    At the lexical level, a substantial number of words can be found that are certainly characteristic of spoken language. However, it does not follow that words of this kind will occur in a spoken language corpus only. On the contrary, many of them are familiar words that are frequent in written texts as well, and the reason is, of course, that it is common to reproduce spoken language in the written medium, in fiction as well as in journalism. An interesting question is therefore: are there any words (or linguistic sound units) that are genuinely oral in the sense that they occur only in the spoken medium? In our experience, the answer is tentatively affirmative. We have indeed included a number of words and phrases that we have found almost exclusively in the spoken corpus. These have been marked in the dictionary with the comment "particularly in spoken language", and in the following, the most conspicuous groups of lexical items are examined in more detail.

    Most obvious are interjections and onomatopoeic words, which are almost by nature confined to spoken language. With the above-mentioned reservation in mind, we have recorded some interjections not found in any other dictionary, e.g. ad signalling disgust ('yuck'), arh or ahr signalling hesitation or doubt, and the positive response particle jaha ('yeah').

    One highly significant characteristic of spoken language is its volatility. Once a word or a chain of words has been uttered, it is normally gone and cannot be retrieved; it was never meant for storage. Consequently, an important keyword in this connection is conventionalization. As far as sound-words are concerned, it is required for a sound or an exclamation to become part of the lexicon that there is agreement in the language community on the expression side (the signifiant in Saussurean terms) of the utterance, i.e. how it is


    EURALEX 2004 PROCEEDINGS

    rendered as a linguistic item. It is important to realize that the items contained in a corpus of spoken language are not spoken words proper, but rather transcriptions of spoken words. Therefore, an affirmative answer to the question above is in the strict sense a contradiction in terms: once the utterance is put on paper and occurs in, say, a language corpus or in a dictionary (i.e. a written medium), it is obviously no longer 'genuinely oral'. So we need to recognize a conventional way of writing an utterance before we can consider it a candidate for inclusion in the dictionary. Such conventionalizations are moreover often language specific, which is probably the reason why words for the same sound may vary from one language to the other. For instance, the English particle of assent uh-huh is written aha in German, Spanish and other languages, even though the physical sound is probably very much the same. However, once the convention has been established, there is nothing to prevent an impact from the convention on the spoken form such that, for example, the English language community agrees that the sound uttered by the male hen is spelled, by convention, cock-a-doodle-doo and that by a subsequent convention the lexeme is pronounced correspondingly, whereas the Germans agree on kikeriki, the French on cocorico, the Russians on kookarekoo, and so on. Similarly, languages have different words for yawning, sneezing, gasping, farting etc., even though they are motivated by the same bodily sounds.

    Because of the requirement of a certain degree of conventionalization, the number of words found exclusively in spoken language is rather limited. Good writers have a subtle feeling for language and are often keen observers of linguistic behaviour. And they make use of these observations when they let their characters engage in dialogues in fictive texts. Indeed, it is likely that authors play an active and important role in the conventionalization process itself. A very good place to look for evidence is in comic strips. This genre often depicts informal conversations resembling everyday real-life situations, and the skilled cartoonist may succeed in creating new conventions lending a highly personal flavour to the strip, or as the ultimate success even in creating new words. I personally associate words like aughh and bleagh with Charles M. Schulz's Peanuts, and words like hrmpf and wak (or uak) are Barksisms, I believe. The Danish word bvadr (an interjection indicating disgust), which has now become an established word found in several Danish dictionaries, was in fact created in 1960 by the translator of Peanuts as a representation of bleagh.

    If we accept a less strict criterion and define spoken language as remarks being produced by one or more speakers as monologue or dialogue (including written versions of them), the number of lexical items of lexicographic relevance increases considerably.

    Another group of items are discourse markers, i.e. words and expressions that bracket units of talk occurring outside the propositional content of a sentence (cf. Schiffrin 1987). Their primary function is to indicate the relationship between speaker and hearer, or between speaker and text. In English they include items such as well, y'know, oh, now, then, so and I mean, and arguably also text coherence markers like as I just said, on one hand .. on the other (these and many of the following examples are described at the semantic level. As they are often multi-functional or polysemous, they may well occur in more than one category, and they may have additional general senses not accounted for here). Related to this group are fillers, words that speakers use to keep the floor while planning the utterance ahead (in English words like er, mmh, well, in Danish words like æh, mm, jah), and tag-questions (in Danish ikk' or ikk' også (as in hun kommer, ikk'? 'she will come, won't she?') and vel (as in hun kommer ikke, vel? 'she won't come, will she?')). It should be stressed that the categories as such are by no means confined to spoken language: textual coherence markers, for instance, are clearly common in written texts. However, the point is rather that they are not the same; the inventory seems to have a clearly delimited subset characteristic of spoken language. In The Danish Dictionary we have included from this group items like altså ('then'), du ved ('you know'), for øvrigt ('by the way'), hva' ('huh?'), hvordan det ('how's that'), jeg mener ('I mean'), lissom ('like'), om jeg så må sige ('so to speak'), se nu .. ('take ..'), se så ('there now'), så ('then, so'), and så'n ('like, kind of').

    Pragmatic phrases are much more frequent in conversation than in written texts. Conversations are social acts where one must give feedback constantly and assure the interlocutor of one's interest, sympathy and agreement. Some of this takes linguistic form, often as formulaic phrases which are exchanged according to conventional rules of appropriateness. Even if they are often semantically transparent, their pragmatic linkage to particular conversational situations has made us include a number, for example: det kan du bande på ('you bet'), det er jo det ('that's it'), en gang til (lit. 'once more', that is 'beg your pardon, sorry'), klart (lit. 'clearly', that is 'sure', cf. German klar), det skal jeg love for ('I'll say so, you bet'), jeg kan godt sige dig ('I tell you'), det må du nok sige ('you can say that again'), det må jeg sige ('what do you know, well how about that'), det siger du ikke ('you don't say'), nej, ved du nu hvad ('come on'), helt ærligt ('honestly').

    As conversation is an activity that takes place in time and space and involves two or more speakers, it is no surprise that deictic pronouns and adverbs occur frequently in this text type when speakers refer and orientate themselves in such a setting. Again, we find a subset which is confined to the spoken medium: hersens and dersens, den her and den der (both pairs meaning 'this' and 'that', with additional connotations of informality and reservation, cf. the English use in narrative: I looked up and saw this huge bloke coming towards me), and the adverbs henne (indicating location away from the speaker) and her ('here' used in a special time sense as in her til foråret 'this (not so distant) spring').

    For the groups mentioned so far, and for the following groups in particular, one could argue that they might as well have been labelled 'informal'. And it is true that informality is a recurrent feature of most, if not all, of the lexical items under discussion. One might therefore speculate whether informality is an intrinsic feature of spoken language, but clearly this is not the case: the spoken medium is represented in the corpus by a variety of genres, some of which certainly are rather formal, like sermons, parliamentary debates, speeches and railway announcements. So, instead one must content oneself with noting that the relevant lexical items seem to originate in informal conversation like the personal interviews and radio and television programmes.

    Another recognizable group are swearwords. Again, these words are not confined to the spoken medium, but they are, like interjections, much more frequent here and display a wider and more varied range of elements than is commonly found in written texts. We include among others allerhelvedes, seleme, eddermame, ammer- nageme, kraftedeme, kraftstejleme, pokkerme, saftsuseme, sateme, gisme, sørenjenseme (not translatable one by one, but they serve of course the same function as bleeding, God damn it, etc. in English). The same is true of slang and colloquialisms, and a possible reason is the


    corpus, not least because we have adopted a policy of bringing authentic quotations supplied with the source of origin, i.e. the editors should only change a quotation if they explicitly indicate that an alteration has taken place: two dots indicate that something has been omitted, whereas text in square brackets signals that the content has been changed, added or reformulated in relation to the original text. In addition to these general principles, the editors have been allowed to change a quotation from a spoken source if the change has to do with transcription irregularities (unauthorized spelling, punctuation, hyphenation etc.) that cannot be attributed to the original utterance, and also where the transcriber has added metacomments to the text. An example may illustrate the case: in the article afslutning, sense 3 (here 'end-of-term') we find the following quotation:

    den 10. juni har vi afslutning .. hvor vi får vores uddannelsesbevis overrakt talespKbh87
    (June 10 we celebrate end-of-term .. where we receive our certificate of education)

    The passage from which the quotation is taken reads in full:

    ja den tiende juni har vi afslutning simpelthen hvor vi får {tøven} vores {pause} uddannelsesbevis overrakt med {pause} karakterer og alt det fine der som man skal have {pause} og så kan man så g- {pause} gå ud og håbe på man kan få et job {pause}

    In the quotation a word has been omitted (the filler simpelthen) and replaced by two dots, and two metacomments (on hesitation and pause) have simply been left out.
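The two editing steps described here (dropping transcriber metacomments and marking an omission with two dots) can be illustrated with a small sketch. It is not the editors' actual tooling: the function names `strip_metacomments` and `omit` are hypothetical, and it assumes that metacomments are always written in curly braces, as in the passage above.

```python
import re

# Assumption: transcriber metacomments appear in curly braces, e.g. {pause}.
METACOMMENT = re.compile(r"\{[^{}]*\}")


def strip_metacomments(text: str) -> str:
    """Remove {...} metacomments and collapse the whitespace left behind."""
    without = METACOMMENT.sub(" ", text)
    return " ".join(without.split())


def omit(text: str, phrase: str) -> str:
    """Replace the first occurrence of `phrase` with the two-dot omission mark."""
    before, found, after = text.partition(phrase)
    if not found:
        raise ValueError(f"phrase not found: {phrase!r}")
    return f"{before.rstrip()} .. {after.lstrip()}".strip()


# A fragment of the transcribed passage quoted above.
transcript = (
    "har vi afslutning simpelthen hvor vi får {tøven} vores {pause} "
    "uddannelsesbevis overrakt"
)

quotation = omit(strip_metacomments(transcript), "simpelthen")
print(quotation)
```

Applied to this fragment, the sketch yields the same wording as the corresponding stretch of the dictionary quotation: the filler is replaced by two dots, while the metacomments disappear without leaving a mark.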

    Obviously, too many breaks with dots or brackets disturb the reading and should preferably be avoided. As a general editorial principle, no more than a single break, or in rare cases two, has been allowed. Given this, it is hardly surprising that the proportion of spoken quotations does not reach the 7-20 per cent which the statistics would suggest. Nevertheless, there are altogether 7,148 quotations taken from spoken language out of a total of c. 100,000 in the dictionary, or about 7 per cent. That is not at all a negligible figure and certainly adds to the flavour of the dictionary: on average, almost two quotations on each page come from the spoken language corpus.

    6. Conclusion

    No doubt, it is tedious and time-consuming to put together a corpus of spoken language. It is also true that spoken passages are not always well suited as authentic examples in transcribed, written form. Having said that, however, there can be no doubt that the inclusion of spoken Danish has contributed significantly to The Danish Dictionary. 15 explicit notes referring to spoken language tell us that a dictionary is not complete without accounting for it. And with more than 7,000 authentic quotations from spoken sources the user will certainly notice and, we hope, also appreciate this often-neglected, but by no means rare variety of language.


    References

    Asmussen, J. and Norling-Christensen, O. 1998. 'The Corpus of the Danish Dictionary', in Lexikos 8 (AFRILEX-reeks/series 8). (pp. 223-242) Stellenbosch.
    Landau, S. I. 2001. Dictionaries: the Art and Craft of Lexicography, (second edition) Cambridge: Cambridge University Press.
    Moon, R. 1998. 'On Using Spoken Data in Corpus Lexicography', in T. Fontenelle et al. (eds.), EURALEX '98 Proceedings. (Volume pp. 347-357) Liège.
    Rundell, M. & Stock, P. 1992 (October). 'The Corpus Revolution', in English Today 32. (pp. 45-51)
    Schiffrin, D. 1987. Discourse Markers. Cambridge: Cambridge University Press.
