ISSN 1337-7892 Volume 4 • 2013 • Number 1 GLOTTOTHEORY International Journal of Theoretical Linguistics


Page 1: International Journal of Theoretical Linguisticshomepage.univie.ac.at/emmerich.kelih/wp-content/uploads/... · 2014-02-15 · International Journal of Theoretical Linguistics Aaee


© Akademie Verlag. This document is protected by German copyright law. You may copy and distribute this document for your personal use only. Other use is only allowed with written permission by the copyright holder.

Notes for the Authors

Please send text file, figures, tables and required fonts via email attachment to Thorsten Roelcke, E-Mail: [email protected] (hard copies to: Prof. Dr. Thorsten Roelcke, Rappenstockweg 2, D–79872 Bernau im Schwarzwald). All standard/current word processing programs, preferably Word for Windows, are allowed. Please do not use any write protection. Languages of publication are English and German. The contributions will be reviewed; the editors have a right to return a contribution reviewed by the members of the editorial board to the author for revision.

Divide your manuscript as follows:
• Your name and affiliation (e. g.: Thorsten Roelcke (Freiburg))
• Title (and subtitle) of the article
• Abstract
• Body of the text (including footnotes)
• Appendix
• Abbreviations
• References
• Correspondence address
• Tables and figures as a separate file (min. 600 dpi)

Form of the manuscript:
Please prefer plain formatting and avoid aesthetic formatting.
Use italics for cited linguistic forms (in the running text, in examples as well as in figures and tables), for non-English resp. non-German words, for titles of books and journals in the running text.
Use small capitals for abbreviated category labels and for surnames of authors and editors (regardless of whether they occur in the running text, in examples or in the references).
Use single quotation marks for the translation of linguistic forms (e. g. mit ‘with’) and double quotation marks for short quotations in the running text and “qualified” words or phrases. Longer quotations (more than three lines) have to be presented as block quotation (small type size).
Please use scientific transcription or transliteration for examples from languages with non-Latin, non-Cyrillic or non-Greek script.

Notes:
Please print notes as footnotes (no endnotes!) and number them consecutively. Do not burden them with too much text, illustrations and tables.

Bibliographical citations and references:
Within the text please refer by means of author’s or editor’s surname, year of publication and page number (Altmann & Lehfeldt 1973: 132).

In the list of references, please do not abbreviate journal titles and first names of authors and editors. Follow the examples below:

For books:
Köhler, Reinhard (2012): Quantitative Syntax Analysis. Berlin: De Gruyter Mouton.
König, Ekkehard & Gast, Volker (2009): Understanding English-German Contrasts. 2nd ed. Berlin: Schmidt.

For edited books:
Roelcke, Thorsten (Hg./ed.) (2003): Variationstypologie. Ein sprachtypologisches Handbuch der europäischen Sprachen in Geschichte und Gegenwart/Variation Typology. A Typological Handbook of European Languages Past and Present. Berlin: de Gruyter.
Mayer, Felix (ed.) (2001): Language for Special Purposes: Perspectives for the New Millennium. Vol. 2: LSP in Academic Discourse and in the Fields of Law, Business and Medicine. Tübingen: Narr.

For contributions to edited books:
Riecke, Jörg (2012): Die „Idee von Sprachgeschichte“ in Egon Friedells Kulturgeschichte, in: Bär, Jochen A. & Müller, Marcus (Hg.): Geschichte der Sprache – Sprache der Geschichte. Probleme und Perspektiven der historischen Sprachwissenschaft des Deutschen. Oskar Reichmann zum 75. Geburtstag. Berlin: Akademie, 585–607.

For contributions to journals:
Popescu, Ioan-Iovitz; Čech, Radek & Altmann, Gabriel (2010): Structural Conservatism and Innovation in Texts, in: Glottotheory 3, 43–63.

For reviews:
Busch-Lauer, Ines-A. (2011): Review of: Kvam, Sigmund; Knutsen, Karen Patrick & Langemeyer, Peter (Hg./ed.): Textsorten und kulturelle Kompetenz. Interdisziplinäre Beiträge zur Textwissenschaft / Genre and Cultural Competence. An Interdisciplinary Approach to the Study of Text, Münster [et al.]: Waxmann, in: Fachsprache. International Journal of Specialized Communication 34, 223–226.

Proof-reading:
Proof corrections have to be entered into the print-out version by hand. The galley-proofs have to be returned within 14 days.

ATTENTION! Authors are strongly advised to restrict corrections of galley-proofs to the absolute minimum. Authors will be charged for any changes and/or additions exceeding 5 % of the original text.

Contents / Inhalt

Editorial note / Editorische Notiz (p. 3)

Thorsten Roelcke
Der Beitrag von graphischen Abbildungen zur Konstituierung von Fachwortschatz in der terminologischen Grundsatznorm DIN 2330 des Deutschen Instituts für Normung (p. 5)

Karl-Heinz Best
Silbenlängen im Deutschen (p. 36)

Ramon Ferrer-i-Cancho, Antoni Hernández-Fernández
The Failure of the Law of Brevity in Two New World Primates. Statistical Caveats. (p. 45)

Emmerich Kelih
Grapheme Inventory Size and Repeat Rate in Slavic Languages (p. 56)

Viktor V. Levickij †
Phonetic Symbolism in Natural Languages (p. 72)

Ioan-Iovitz Popescu, Radek Čech, Gabriel Altmann
Descriptivity in Slovak Lyrics (p. 92)

Maria Rukk
Wording of the PISA Reading Sample Tasks and the Reading Outcome Results (p. 105)

Irma Sorvali
Texts and Words in Translation (p. 111)

Zhao Xiaodong
A Corpus-Based Study on English and Chinese inter-textual Vocabulary Growth (p. 119)

REVIEW
Emília Nemcová
Köhler, Reinhard (2012): Quantitative Syntax Analysis. Berlin, New York: De Gruyter Mouton. (Quantitative Linguistics, 65) (p. 132)


Grapheme Inventory Size and Repeat Rate in Slavic Languages

Emmerich Kelih

Abstract: In the present study, the interrelation between grapheme inventory size and repeat rate, based on grapheme frequencies from parallel texts in Slavic languages, is discussed in detail. The law-like interrelation obtained between inventory size and repeat rate requires an in-depth linguistic discussion of how the inventory size is determined in the analysed Slavic languages, particularly in Slovene and Macedonian.

Key words: grapheme frequencies, repeat rate, grapheme inventory

Introduction

A quantitative-statistical analysis of grapheme frequencies requires the definition of grapheme inventories and an exact determination of the inventory size of the languages under examination. As is known, graphemes can be defined as linguistic units in different ways, for example by the letter approach or as units corresponding to phonemes. Nevertheless, the definition of linguistic units (graphemes, grapheme inventory) has to be understood simply as a necessary but intermediate step of a quantitative analysis: it must be followed by the integration of the analysed unit into a synergetic framework, with a particular focus on the interrelation of the grapheme inventory and related properties with other units and linguistic levels.

It is well known from synergetic and quantitative linguistics that the inventory size (determined by the number of graphemes or phonemes) is systematically related to the entropy and repeat rate of the corresponding grapheme/phoneme rank frequencies (cf. Altmann/Lehfeldt 1980: 151ff.; Grzybek/Kelih 2005). Generally speaking, the grapheme inventory has a direct impact on hierarchically higher linguistic levels, for instance on the structure of syllables, morphemes, word forms, etc., and on the length of word forms/lexemes (cf. Köhler 1986: 23f., Nettle 1995, Nettle 1998 and Kelih 2008 with some critical remarks regarding this relation for Slavic languages).

This paper tackles the relation between grapheme inventory size and repeat rate in twelve Slavic languages, based on data from a parallel text corpus (Kelih 2009a, 2009b, 2009c). As already mentioned above, special attention is paid to the question of an appropriate definition of the grapheme and the grapheme inventory. Moreover, it will be shown that in quantitative and synergetic linguistics no single definition of the grapheme is appropriate; rather, the definition must be chosen depending on the language and on system-inherent factors. Another important issue is the compatibility and agreement of the definitions used with linguistic laws and hypotheses.

DOI 10.524/glot.2013.0005


1 Determination of the inventory size

According to Kempgen (1995: 201), the determination of the grapheme/phoneme inventory size has to be motivated mainly by theoretical reflections. Furthermore, the determination of the grapheme/phoneme inventory has a direct influence on the shape of the corresponding rank frequencies and on related properties such as the repeat rate and the entropy. Without giving an extensive overview of the different definitions of the grapheme as the basic unit of writing systems (cf. Kohrt 1985), at least two main approaches to a grapheme definition seem to be important: (1) in relation to the writing system (letters, alphabet), and (2) in relation to the phonological/allophonic system of a language (cf. Amirova 1977, 1985 and Dürscheid 2006³). Hence, a grapheme definition is automatically connected with further problems, such as the definition of the phoneme, the complex dichotomy of written vs. oral language, general problems of the grapheme-phoneme relation, general orthographical problems and the question of the standardisation (including language policy) of a language.

Generally, a grapheme is a discrete unit of the writing system, or a sign of the writing system, which represents a sound, a sequence of sounds, an allophone or a phoneme. Or, to be more precise, according to Altmann (2008: 149): “A grapheme is a letter, or a combination of letters, or a letter with additional diacritical marks (such as those in Slavic, French, Spanish, German etc.) used as a whole in a language and attributable to a phoneme”. This basic definition of the grapheme can already be used for our cross-linguistic analysis of Slavic languages. A single letter, a character, a double letter (ligature), digraphs¹, etc. can be treated as graphemes insofar as they represent phonemes of the language under investigation. In addition to this “strong” linguistic definition of a grapheme, a second, more practically motivated approach can be taken into consideration: textbooks, orthographical dictionaries, reference grammars and monolingual dictionaries usually give some basic information about the grapheme inventory of a language. Of course, such lists of “graphemes” (e. g. in monolingual dictionaries) are in fact not the result of systematic linguistic considerations, but for the most part rather the result of an evolution of the writing system that is not linguistically reflected. Moreover, from a historical and cultural point of view, these grapheme inventories are the result of many different factors influencing the diachronic evolution of the writing system, such as the whole process of standardisation, the choice of a particular dialect for the standard language, the conscious choice of a specific cultural model (e. g. the adoption of Czech diacritic signs in South Slavic languages, the introduction of Latin letters into the Serbian Cyrillic writing system in the 19th century, the extensive use of digraphs in some West Slavic languages) and, in particular cases, a politically motivated delimitation from neighbouring writing systems. Moreover, grapheme systems tend to be “conservative”, that is, they are to some degree resistant to change, and changes in the phonological system are usually not immediately reflected in the writing system. In some cases, letters or graphemes lose their functionality because of changes in the underlying phoneme system.² In other words, grapheme inventories can be understood as a quasi naturally occurring system which is the result of self-regulation, functionality and culturally determined historical standardisation processes.

1 For the sake of completeness, the definition of letters is as follows (Altmann 2008: 149): “A letter is a single sign adopted from Latin, Greek or other alphabetic scripts not necessarily attributable to a single phoneme”.

A systematic analysis of orthographic dictionaries, reference grammars, grammars and textbooks is the basis for the overview of the grapheme systems of twelve Slavic standard languages in Table 1. The languages are listed in ascending order of grapheme inventory size (K); “alternative” counts of the grapheme inventories are given in brackets. Below, a short description of the analysed Slavic writing systems is given, with particular focus on certain “specific” properties (use of digraphs, inventory of “foreign” graphemes, etc.).

Table 1: Graphemes and grapheme inventory size of Slavic Standard languages

Language Graphemes K

Slovene <a, b, c, č, d, e, f, g, h, i, j, k, l, m, n, o, p, r, s, š, t, u, v, z, ž> 25

Serbian <а, б, в, г, д, ђ, е, ж, з, и, ј, к, л, љ, м, н, њ, о, п, р, с, т, ћ, у, ф, х, ц, ч, џ, ш> 30

Croatian <a, b, c, č, ć, d, dž, đ, e, f, g, h, i, j, k, l, lj, m, n, nj, o, p, r, s, š, t, u, v, z, ž> 30

Bulgarian <а, б, в, г, д, е, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ь, ю, я> 30

Macedonian <а, б, в, г, д, ѓ, е, ж, з, ѕ, и, ј, к, л, љ, м, н, њ, о, п, р, с, т, ќ, у, ф, х, ц, ч, џ, ш> 31

Russian <а, б, в, г, д, е, ё, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ы, ь, э, ю, я> 33

Ukrainian <а, б, в, г, ґ, д, е, є, ж, з, и, і, ї, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ю, я, ь> + <’> 33 (34)

Belorussian <а, б, в, г, д, е, ё, ж, з, і, й, к, л, м, н, о, п, р, с, т, у, ў, ф, х, ц, ч, ш, ы, ь, э, ю, я, ъ> 33

Upper Sorbian <a, b, c, č, d, e, ě, dź, f, g, h, ch, i, j, k, ł, l, m, n, ń, o, ó, p, q, r, ř, s, š, t, ć, u, v, w, x, y, z, ž> 37

Polish <a, ą, b, c, ć, d, e, ę, f, g, h, i, j, k, l, ł, m, n, ń, o, ó, p, r, s, ś, t, u, w, y, z, ź, ż, ch, cz, dz, dź, dż, rz, sz> + <q, v, x> 39 (42)

Czech <a, á, b, c, č, d, ď, e, é, ě, f, g, h, ch, i, í, j, k, l, m, n, ň, o, ó, p, r, ř, s, š, t, ť, u, ú, ů, v, y, ý, z, ž> + <w, x, q> 39 (42)

Slovak <a, á, ä, b, c, č, d, ď, dz, dž, e, é, f, g, h, ch, i, í, j, k, l, ľ, ĺ, m, n, ň, o, ó, ô, p, r, ŕ, s, š, t, ť, u, ú, v, y, ý, z, ž> + <q, w, x> 43 (46)

Slovene, which uses Latin script, has an inventory of K = 25 (cf. Toporišič 2000⁴: 72ff. and particularly the Slovenski Pravopis (2001⁶: 3) with an overview of the grapheme-phoneme relation). Among the Slavic writing systems, Slovene has the smallest inventory size. Since the middle of the 19th century, it has used diacritical elements; suprasegmental features are not marked separately. Neither digraphs nor “foreign” letters are part of the Slovene writing system.

2 For instance, in Old Church Slavonic the graphemes ь and ъ represented vowel sounds, whereas in modern Russian they mark only phonetic adaptations of the preceding or following consonants.

Bulgarian, which uses Cyrillic script, has an inventory of 30 graphemes (cf. Pravopisen rečnik […] (1983: 15) and Tilkov (1982: 287)). Since 1824, the Bulgarian writing system and orthography have been reformed nine times, always accompanied by changes in the grapheme inventory size. As in other Slavic Cyrillic scripts (Russian, Ukrainian, Belorussian), the soft sign <ь> (in front of <o>) and the yotated vowel letters <ю, я> signal the palatalisation of the preceding consonant; only in word-initial position and after a vowel do <ю, я> represent the combinations /j/ + /u/ and /j/ + /a/ respectively. Due to this economical representation of palatalisation, Bulgarian has a relatively efficient writing system in relation to the phoneme system (30 graphemes vs. over 40 phonemes). The two digraphs <дж> and <дз>, which represent the phonemes /dž/ and /dz/, are not treated as “independent” graphemes of Bulgarian, whereas <щ> (expressing the phoneme sequence /š/ + /t/) is considered an autonomous Bulgarian grapheme.³

There are no substantial structural differences between the Serbian (Cyrillic script) and the Croatian (Latin script) writing systems. Both writing systems are the result of standardisation processes at the beginning of the 19th century. The Latin script uses diacritic marks and the digraphs <lj>, <nj> and <dž>; cf. Anić/Silić (2001), Badurina/Marković/Mićanović (2008²: 3) and Babić/Finka/Moguš (2000⁵: 4) for a detailed analysis of the Croatian writing system. The Cyrillic script makes use of the ligatures <њ, љ> and <џ>; cf. the overview of the Serbian writing system in Rečnik srpskoga jezika (2007: 6) and Pešikan/Jerković/Pižurica (2006⁸: 12f.). Both languages have an inventory size of 30 graphemes, and prosodic features are not marked in the standard orthography.

The Macedonian writing system, a relatively young one (official codification 7 June 1945), has absorbed various basic features from the Serbian orthographic system (e. g. the use of the ligatures <љ> and <њ>); furthermore, it makes use of the newly invented signs <ѓ> and <ќ>, and <ѕ> was borrowed from the Latin script. The Macedonian writing system has 31 graphemes (cf. Pravopis na makedonskiot […] 1970: 1), with an almost perfect 1:1 relation of graphemes and phonemes, except for the syllabic /r/, which is not expressed by a separate grapheme (cf. Koneski 1952: 75).

The Russian alphabet has 33 graphemes (cf. Pravila russkoj […] 2009: 12). This grapheme inventory includes <ё>, which is part of the “official” Russian writing system, but its use is obligatory only for particular texts in textbooks and linguistic reference books, whereas in other texts the use of <ё> depends on the writer’s preference. According to Jakovlev (1928), one of the first system-oriented analyses of the Russian writing system, the Russian alphabet can be treated as a relatively economical one due to the efficient expression of palatalised consonants with the help of the soft sign <ь> and yotated vowels (cf. Reformatskij 1933 for a further discussion).

3 For further details about the Bulgarian writing system and the orthographic reform in 1945, see Back (1982).

Similar techniques for marking palatalised consonants are applied in the Ukrainian writing system, which contains 33 graphemes. Special attention has to be paid to the grapheme <ґ>, which has a relatively high symbolic importance for Ukrainian culture: after its abolition in 1928 (cf. Hornjatkevyč 1993: 300), it was for a long time used only by Ukrainian emigrants, and it was re-introduced in 1990 (cf. Koc’-Hryhorčuk 1997). However, it has to be noted that the use of this grapheme is still controversial today (cf. Danylenko 2005) and not yet regulated by “official” orthographical rules. The situation is similar regarding the apostrophe <’>, which according to the official Ukrainian orthography is not part of the Ukrainian writing system.⁴

Belorussian, like Ukrainian and Russian, has 33 graphemes (cf. Burak 1974: 34), with the (in regard to other Slavic Cyrillic scripts) unique grapheme <ў>, representing the bilabial semi-vowel /w/. The recent Belorussian script system is a result of codification processes in 1926 and 1933 (cf. May 1977 and 1978 for further details).

The Latin writing system of Upper Sorbian consists of 37 graphemes (cf. Stone 2006: 178); sometimes a smaller inventory of 34 graphemes is stated, for instance in Stone (1993: 601). The Upper Sorbian writing system is the result of standardisation processes since the middle of the 19th century, which were accompanied by a certain amount of clerical resistance against a unification of the Upper Sorbian writing system. The last orthographical reform was carried out in 1948 (cf. Fasske 1984, Pohončowa 2000: 6ff) with some minor changes, such as the replacement of <kh> by <ch>. Generally speaking, the Upper Sorbian writing system also makes use of diacritical signs as well as digraphs (<ch, dź>). The different inventory sizes (37 versus 34 graphemes) are mainly caused by the diverging treatment of the graphemes <q, v, x>, which can only occur in foreign and loan words. Furthermore, these graphemes are treated in official orthographies as “additional” graphemes (cf. Völkel 1980: 13).

According to the Nowy słownik […] (2000: CXXXVII) and Birnbaum/Molas (2006), the Polish alphabet (Latin script) has 32 graphemes, although this grapheme inventory contains only simple, non-compound graphemes as units. Additionally, in Nowy słownik […] (2000: CXXXVII) <q, v, x> are mentioned, but they can occur only in foreign words, and are therefore given in the official orthography in brackets, which emphasises their peripheral status within the Polish writing system. Furthermore (and this decision has a large influence on the number of graphemes in the inventory), Polish has seven “separate” digraphs: <ch, cz, dz, dź, dż, rz, sz>. Consequently, including these digraphs, Polish has an inventory of 39 graphemes, and including the foreign letters even an inventory of 42 graphemes. Generally, Polish, like Upper Sorbian, uses diacritical signs and digraphs.
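The effect of the digraph decision on the Polish counts can be illustrated with a minimal sketch. The inventories are those listed for Polish in Table 1; the function `tokenize_graphemes` and the greedy longest-match rule are assumptions made here for illustration only and are not described as the procedure used in this study. Greedy matching treats every letter sequence that coincides with a digraph as that digraph, which a linguistically informed segmentation might not always do.

```python
import unicodedata


def nfc(s):
    """Normalise to NFC so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", s)


# Inventories as listed for Polish in Table 1.
POLISH_SIMPLE = {nfc(g) for g in
                 "a ą b c ć d e ę f g h i j k l ł m n ń o ó p r s ś t u w y z ź ż".split()}
POLISH_DIGRAPHS = {nfc(g) for g in "ch cz dz dź dż rz sz".split()}
POLISH_FOREIGN = {"q", "v", "x"}


def tokenize_graphemes(text, inventory):
    """Split text into graphemes by greedy longest match (illustrative assumption)."""
    text = nfc(text).lower()
    max_len = max(len(g) for g in inventory)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that <sz> wins over <s> + <z>.
        for size in range(max_len, 0, -1):
            candidate = text[i:i + size]
            if candidate in inventory:
                tokens.append(candidate)
                i += size
                break
        else:
            # Not a grapheme of this inventory (space, punctuation, digit): skip.
            i += 1
    return tokens


with_digraphs = POLISH_SIMPLE | POLISH_DIGRAPHS
print(len(POLISH_SIMPLE), len(with_digraphs), len(with_digraphs | POLISH_FOREIGN))
print(tokenize_graphemes("szczęście", with_digraphs))
print(tokenize_graphemes("szczęście", POLISH_SIMPLE))
```

The same word yields seven graphemes under the 39-unit inventory and nine under the 32-unit inventory, so the choice of inventory changes the observed frequencies as well as K.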

Czech (Latin script) has an inventory of 42 graphemes (cf. Pravidla 1993: 11). This relatively large number is due to the fact that vowel quantity (length) is marked separately, and thus all long-vowel graphemes (such as <á>, <é>, <í>) are understood as independent graphemes of the Czech writing system; it also makes use of a digraph (<ch>). The graphemes <q, w, x> are treated as part of the Czech writing system, even though they occur only in foreign and loan words (cf. Vintr 2006: 195). The grapheme <g>, representing the phoneme /g/, which has not yet been fully integrated into the Czech phoneme system, has a similar position. The phoneme /ou/ is not represented by a separate grapheme; thus the Czech writing system has a total inventory of 42 units.

4 In Grzybek/Kelih (2005), an alternative interpretation of the Ukrainian alphabet with K = 34 (including the apostrophe) is offered.

Among the Slavic languages, Slovak (Latin script) has the largest alphabet, with K = 46 graphemes (cf. Pravidlá slovenského […] 2000: 25). It makes use of diacritical elements and of the digraphs <dz, dž, ch>. As in Czech, vowel length is marked by separate graphemes; the letters <q, w, x> occur only in foreign words, but they are treated as part of the Slovak writing system. The marginal position of <ä> is determined by the peripheral phonological status of /ae/. In the academic grammar of Slovak (cf. Pauliny/Ružička/Štolc 1968⁵: 89), the graphemes <x, w, q, ö, ü>, which occur in foreign words only, are not treated as units of the Slovak writing system. The diphthongs /ia, ie, iu/ are represented by the letter combinations <ia, ie, iu>, which again are not treated as separate graphemes of Slovak.

As a first intermediate result, it can be stated that all Slavic writing systems are characterised by a systematic utilisation of different techniques for representing phonemes: the most typical feature of the Latin scripts is the use of diacritical signs and digraphs. Furthermore, all graphemes can be quite consistently attributed to their corresponding phonemes. A noticeable characteristic of the Slavic writing systems is their differing treatment of so-called “foreign” graphemes, i.e. graphemes occurring only in foreign and loan words. These are partly understood as an integral part of the writing system, for instance in Czech, Slovak and Upper Sorbian, whereas in the South Slavic languages the graphemes/letters <x, y, w> can occur in texts⁵, but are nevertheless not treated as part of the writing systems of these languages.

One of the most important characteristics of the Cyrillic scripts is the economical marking of palatalisation (except for Serbian and Macedonian), and thus the East Slavic languages in particular have relatively small grapheme inventories in comparison to their phoneme systems.

However, due to the different traditions of the analysed writing systems, the determination of the number of graphemes in our cross-linguistic examination leads to partly inconsistent results: whereas in Slovak, Croatian, Polish and other languages digraphs are explicitly treated as separate units (graphemes), in other languages digraphs (for instance the Bulgarian <дж> and <дз>) are not interpreted as graphemes. Similar problems, as already discussed above, arise in regard to “foreign” letters, or letters which are assumed not to be fully “integrated” into a writing system.

5 An alternative approach is that these “foreign” letters are substituted by letters already occurring in the system like <ks> for <x>, etc.


As already mentioned in the introduction, the determination of the grapheme inventories cannot be the result, but only a necessary intermediate step, of a linguistic investigation of the systemic value of the inventory size. Our approach does not involve the determination of a single adequate definition of the term grapheme and of the inventory size, but rather an inductive empirical test of the adequacy of the proposed grapheme inventories for the exploration of the interrelation between inventory size and repeat rate. Thus, the adequacy of the grapheme definitions used results not solely from purely linguistic considerations, but rather from their empirical relevance.

2 Used data

For the empirical analysis, grapheme frequencies were determined in the Russian novel “How the steel was tempered” and its translations into eleven Slavic languages. The parallel text corpus used is described in detail in Kelih (2009a, 2009b). The basic motivation for the use of parallel texts is to achieve a high level of homogeneity in cross-linguistic investigations and comparisons. The grapheme frequencies obtained are published in Kelih (2009d); thus, there is no need to republish them in this paper.

One basic problem of the study of grapheme frequencies at text level is that some graphemes do not occur in the text, especially certain “peripheral” graphemes. Hence, two different grapheme inventories have to be distinguished: (1) K as the systemic grapheme inventory, which includes all graphemes as listed in Table 1, regardless of their occurrence in texts, and (2) K_emp as the empirical grapheme inventory, which includes only the graphemes occurring in the analysed texts.
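The distinction between the systemic and the empirical inventory can be sketched as a simple set comparison. The function name `empirical_inventory` and the toy data below are hypothetical and serve only as an illustration; they are not taken from the corpus, and the sketch assumes that the text has already been segmented into graphemes.

```python
from collections import Counter


def empirical_inventory(tokens, systemic_inventory):
    """Return (frequencies, occurring graphemes, systemic graphemes that never occur)."""
    # Count only tokens that belong to the systemic inventory.
    frequencies = Counter(t for t in tokens if t in systemic_inventory)
    occurring = set(frequencies)
    missing = set(systemic_inventory) - occurring
    return frequencies, occurring, missing


# Toy data (hypothetical): a four-grapheme system in which <q> is peripheral.
systemic = {"a", "b", "c", "q"}
tokens = list("abcabca ba")  # the space is not a grapheme and is ignored

frequencies, occurring, missing = empirical_inventory(tokens, systemic)
print("K =", len(systemic))
print("K_emp =", len(occurring))
print("not occurring:", sorted(missing))
print("N =", sum(frequencies.values()))
```

In this toy case K = 4 and K_emp = 3, with <q> as the only non-occurring grapheme, which mirrors the situation reported for some languages in Table 2.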

Table 2: Inventory and sample size

Language⁶ K K_emp Non-occurring graphemes Sample size N

1 Slo. 25 25 – 288871

2 Serb. 30 30 – 265344

3 Cro. 30 30 – 269384

4 Bulg. 30 30 – 276131

5 Mac. 31 31 – 283510

6 Rus. 33 32 ё 266055

7 Ukr. 34 34 – 264283

8 Cz. 42 40 q 255880

9 Sk. 46 45 q 257795

10 Pol. 39 39 – 276623

11 Sorb. 37 34 q, v, x 297996

12 Belor. 33 33 – 266237

6 For the languages in the tables and figures the following abbreviations are used: Bulg. = Bulgarian, Cro. = Croatian, Mac. = Macedonian, Sorb. = Upper Sorbian, Pol. = Polish, Rus. = Russian, Serb. = Serbian, Sk. = Slovak, Slo. = Slovene, Cz. = Czech, Ukr. = Ukrainian, Belor. = Belorussian.


There are some notable differences between K and Kemp (cf. Table 2), especially in the West Slavic languages, where graphemes such as <q, x> are not realised; in Russian, we are confronted with the <ё> problem, i.e. in the novel used this letter occurs only as <e>. In all other languages there are no particular differences between the systemic and the empirical grapheme inventories.

2.1 Interrelation between inventory size and repeat rate

As known from quantitative phonology (cf. Altmann/Lehfeldt 1980: 151f.), there is an interrelation between the inventory size and the repeat rate of grapheme rank frequencies. The repeat rate can be computed as $RR = \sum_{r=1}^{n} p_r^2$, i.e. the sum of the squared relative grapheme frequencies $p_r$.

The repeat rate is generally an indicator of the degree of uniformity of the rank frequency distribution, and gives some information about the symmetrical/asymmetrical distribution of graphemes. For further details about the repeat rate, cf. Grzybek/Kelih/Altmann (2005: 122f.).
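As a minimal sketch of how the repeat rate is obtained from raw counts, the following Python fragment may be helpful. The helper names `repeat_rate` and `grapheme_counts` as well as the toy string are assumptions made for illustration only. For a perfectly uniform distribution over K graphemes the repeat rate reaches its minimum 1/K, and it approaches 1 when a single grapheme dominates.

```python
from collections import Counter


def repeat_rate(counts):
    """Return RR, the sum of the squared relative frequencies."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("empty sample")
    return sum((c / total) ** 2 for c in counts)


def grapheme_counts(text, inventory):
    """Count only those characters that belong to the given inventory."""
    return Counter(ch for ch in text.lower() if ch in inventory)


# Toy example with a hypothetical five-grapheme inventory.
counts = grapheme_counts("abracadabra", set("abcdr"))
rr = repeat_rate(counts.values())
print(rr)

# A uniform distribution over 25 graphemes yields the lower bound 1/25.
print(repeat_rate([1] * 25))
```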

It is well known, both from the theoretical and the empirical viewpoint, that there is a systematic regulation of the repeat rate of graphemes/phonemes depending on the corresponding inventory size. Altmann/Lehfeldt (1980: 158f.) show, based on 63 languages, a successive decline of the repeat rate with increasing inventory size. In other words, languages with a small inventory tend to overuse particular graphemes/phonemes, whereas languages with larger inventories tend to use their graphemes/phonemes much more uniformly. This characteristic seems to be a basic mechanism of grapheme and phoneme systems, functionally motivated by the self-regulation of redundancy in texts.

Table 3: Inventory size and repeat rate

Language  K   Kemp  RR
Slo.      25  25    0.0624
Serb.     30  30    0.0621
Cro.      30  30    0.0616
Bulg.     30  30    0.0627
Mac.      31  31    0.0697
Rus.      33  32    0.0543
Ukr.      34  34    0.0495
Belor.    33  33    0.0556
Cz.       42  41    0.0455
Sk.       46  45    0.0511
Pol.      39  39    0.0489
Sorb.     37  34    0.0494

The interrelation between inventory size (K) and repeat rate (RR) is usually captured by the simple power model $RR = aK^{-b}$, as shown in Altmann/Lehfeldt (1980: 171f.), where a and b are iteratively determined parameters.⁷ Based on these findings, the following hypothesis will now be tested on the Slavic parallel-text corpus: the higher the inventory size, the lower the repeat rate, expressed by the corresponding formula $RR = aK^{-b}$.

The proposed interrelation has to be tested in two different ways: firstly, the systemic inventory size (K), and secondly, the empirical inventory size (Kemp) have to be related to the repeat rate. The results are as follows: for K, an R² = 0.617 (a = 0.49; b = 0.62) is obtained, and for Kemp (a = 0.49; b = 0.62) a slightly worse R² = 0.5681.
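The fitting step can be sketched in Python using the values of Table 3. Note the assumptions: the parameters in the text were determined iteratively, whereas the sketch below swaps in an ordinary least-squares regression on the log-log scale, and R² is computed on the original RR scale. The function names are hypothetical, and the resulting values need not coincide exactly with those reported above.

```python
import math

# (language, K, K_emp, RR) as listed in Table 3
TABLE_3 = [
    ("Slo.", 25, 25, 0.0624), ("Serb.", 30, 30, 0.0621),
    ("Cro.", 30, 30, 0.0616), ("Bulg.", 30, 30, 0.0627),
    ("Mac.", 31, 31, 0.0697), ("Rus.", 33, 32, 0.0543),
    ("Ukr.", 34, 34, 0.0495), ("Belor.", 33, 33, 0.0556),
    ("Cz.", 42, 41, 0.0455), ("Sk.", 46, 45, 0.0511),
    ("Pol.", 39, 39, 0.0489), ("Sorb.", 37, 34, 0.0494),
]


def fit_power_model(ks, rrs):
    """Fit ln RR = ln a - b ln K by least squares; return (a, b)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(r) for r in rrs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope


def r_squared(ks, rrs, a, b):
    """Coefficient of determination of RR = a * K**(-b) on the RR scale."""
    mean_rr = sum(rrs) / len(rrs)
    ss_res = sum((r - a * k ** (-b)) ** 2 for k, r in zip(ks, rrs))
    ss_tot = sum((r - mean_rr) ** 2 for r in rrs)
    return 1 - ss_res / ss_tot


results = {}
for label, column in (("K", 1), ("K_emp", 2)):
    ks = [row[column] for row in TABLE_3]
    rrs = [row[3] for row in TABLE_3]
    a, b = fit_power_model(ks, rrs)
    results[label] = (a, b, r_squared(ks, rrs, a, b))
    print(label, results[label])
```

A nonlinear least-squares routine operating directly on RR would be closer to an iterative procedure; the log-log regression is used here only because it needs no external library.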

Figure 1: Inventory size vs. repeat rate

As can be seen, in both cases we are dealing with the same parameters; the only notable difference is that in the case of the systemic inventory, the fitting result is slightly better. Nevertheless, it must be emphasised that in both cases the results are not satisfying: usually an R² > 0.80 is considered to be an indicator of a satisfying fit⁸. Hence it must be concluded that the interrelation between inventory size and repeat rate cannot be corroborated empirically in the material used; however, there are no theoretical objections to the proposed hypothesis. It seems that the linguistic boundary conditions of the analysis have to be discussed once again, with particular focus on the determination of the inventory sizes used, which will be carried out in the next section.

2.2 Discussion: Linguistic boundary conditions

Several reasons can be proposed for the unsatisfying modelling result. The two most likely ones are as follows: (1) the parallel texts used, i.e. the Russian novel and in particular its translations into Slavic languages, are not appropriate for the empirical analysis of writing systems, or (2) the grapheme inventories of some languages have not been determined adequately⁹. Due to the lack of systematic investigation of the repeat rate and the inventory size in Slavic languages generally¹⁰, the focus below is on some alternative definitions of the graphemes and alternative grapheme inventories.

7 In the re-analysis of the Altmann/Lehfeldt (1980) data by Grzybek/Kelih (2005), a satisfying R² = 0.7928 was obtained.

8 The use of a linear model does not produce any better results.

9 An additional influencing factor, not yet considered, is the degree of distinctness of a writing system.

[Figure 1, panel titles: K vs. RR (R² = 0.617); Kemp vs. RR (R² = 0.5681)]

First of all, Fig. 2 gives a graphic representation of the relation between empirical inventory size and repeat rate in Slavic languages.

Figure 2: Kemp vs. repeat rate

Clearly, a general trend of a decrease in the repeat rate with increasing inventory size can be seen. However, particular languages deviate notably: Slovak and Macedonian in particular seem to have too high a repeat rate in comparison to other Slavic languages and to their own inventory size, whereas Slovene, again when compared with other Slavic languages, seems to have too low a repeat rate. Generally, it appears that in particular the languages with the lowest inventory size (Slovene) and the highest inventory size (Slovak) are in disequilibrium regarding their inventory size and repeat rate. The same holds true for Macedonian, which is the youngest Slavic standard language, with an almost synthetically constructed writing system. Clearly this language (and this has to be analysed in detail in the future) is located at the periphery in terms of its attractors and has not yet reached a balanced state of inventory size and repeat rate. A further, more general reason for the deviant behaviour of some languages can be a weak correspondence between the writing system and the sound/phoneme system, as for instance in Slovak, where <y, ý> are “separate” graphemes, but without corresponding phonemes /y, ý/. Furthermore, there is no phonetic or phonological difference between /i/ and /y/ in general. Thus, in the next step, the possibilities of an “alternative” determination of the grapheme inventory for Slovene, Slovak and Macedonian¹¹ have to be discussed.

10 There are several analyses of grapheme frequencies of Slavic languages (cf. Grzybek/Kelih/Altmann (2004, 2005) for Russian, Grzybek/Kelih (2005a), Grzybek/Kelih (2005b) for Slovak and Grzybek/Kelih/Stadlober (2006) for Slovene), where different theoretical discrete probability distributions have been tested in order to model grapheme frequencies. A preliminary, but as yet unpublished, analysis of the relation between repeat rate and grapheme inventory (the same grapheme inventories as in section 2.1 are used) showed a similarly unsatisfying result, i.e. it can be concluded that not the analysed material, but rather some deeper structural inadequacies cause the unsatisfying modelling.
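One simple way to inspect such deviations numerically is to compare each observed repeat rate with the value expected under the power model. The sketch below assumes the parameters reported in section 2.1 (a = 0.49, b = 0.62) and the Kemp and RR values of Table 3; the function name `relative_deviation` is hypothetical. A positive value means that the observed repeat rate lies above the model curve. The ranking by itself does not decide which deviations are linguistically meaningful.

```python
# Parameters reported in section 2.1 for the power model RR = a * K**(-b).
A, B = 0.49, 0.62

# language: (K_emp, RR), taken from Table 3
TABLE_3 = {
    "Slo.": (25, 0.0624), "Serb.": (30, 0.0621), "Cro.": (30, 0.0616),
    "Bulg.": (30, 0.0627), "Mac.": (31, 0.0697), "Rus.": (32, 0.0543),
    "Ukr.": (34, 0.0495), "Belor.": (33, 0.0556), "Cz.": (41, 0.0455),
    "Sk.": (45, 0.0511), "Pol.": (39, 0.0489), "Sorb.": (34, 0.0494),
}


def relative_deviation(k, rr, a=A, b=B):
    """Return (observed - expected) / expected under RR = a * K**(-b)."""
    expected = a * k ** (-b)
    return (rr - expected) / expected


deviations = {
    lang: relative_deviation(k, rr) for lang, (k, rr) in TABLE_3.items()
}

# List the languages from the largest positive to the largest negative deviation.
for lang, dev in sorted(deviations.items(), key=lambda item: item[1], reverse=True):
    print(f"{lang:7s} {dev:+.1%}")
```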

As already mentioned above, Slovak has too high a repeat rate and too large an inventory. Up to now, the three digraphs <dz, dž, ch> have been treated as separate graphemes of the Slovak language. An interpretation of the constituent letters of these graphemes as single units (<d, z, ž, c, h> are already part of the Slovak writing system), which can be done without notable loss of information (cf. Grzybek/Kelih/Altmann 2005a, 2005b, 2005c), leads to a grapheme inventory of 43 units. A further reduction of the grapheme inventory can be reached by eliminating the so-called “foreign” graphemes. In the parallel text used, for instance, <q> does not occur at all; <w> occurs six times and <x> ten times in the Slovak translation, which in total consists of 260690 graphemes. The elimination of the three graphemes <w, x, q> yields a new, “alternative” grapheme inventory of Slovak with 40 units.
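The effect of the two interpretations of the Slovak digraphs on the counting procedure can be illustrated as follows. This is a sketch under stated assumptions: the function `tokenize` and the sample phrase are hypothetical, and the greedy matching used here treats every sequence <ch>, <dz>, <dž> as a digraph, which may differ in detail from the procedure applied to the parallel texts.

```python
import unicodedata

SLOVAK_DIGRAPHS = ("dz", "dž", "ch")


def tokenize(text, digraphs=()):
    """Split text into graphemes; listed digraphs count as single units."""
    # NFC ensures that e.g. "ž" is one code point, not "z" plus a caron.
    text = unicodedata.normalize("NFC", text).lower()
    digraphs = tuple(unicodedata.normalize("NFC", d) for d in digraphs)
    units = []
    i = 0
    while i < len(text):
        if text[i:i + 2] in digraphs:
            units.append(text[i:i + 2])
            i += 2
        elif text[i].isalpha():
            units.append(text[i])
            i += 1
        else:
            i += 1  # skip spaces, digits and punctuation
    return units


# Illustrative sample only.
sample = "chlieb a džem"

with_digraphs = tokenize(sample, SLOVAK_DIGRAPHS)
as_letters = tokenize(sample)

print(len(with_digraphs), with_digraphs)
print(len(as_letters), as_letters)
```

Splitting the digraphs lowers the inventory size but raises the number of counted units, which is in line with the different totals given for the Slovak text in Table 2 and in the paragraph above.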

The Slovenian language has the writing system with the smallest inventory size, and interestingly enough, Slovene is one of the Slavic languages which has fewer graphemes than phonemes (cf. Kelih 2008 for a detailed analysis of the grapheme-phoneme relation of Slovene). A lower number of graphemes than phonemes is in fact also typical for East Slavic languages, but it is clearly only in the case of Slovene that the lower number of graphemes causes an over-exploitation of particular graphemes, such as <o>, which represents the phonemes /o/, /o:/ and /ɔ:/, and <e>, which represents the phonemes /e/, /e:/ and /ε:/. Furthermore, there is no separate grapheme for the representation of the phonemes /ə/ (cf. Toporišič 2000: 72) and /dž/. Thus, taking into account the phonological system of Slovene, one can also operate with an “alternative” grapheme system of 29 units, which captures the relationship between inventory size and repeat rate more “optimally”. However, in view of an observable conservatism in performing orthographical reforms, an enlargement of the Slovene grapheme system is improbable, although, as will be shown below, it is at least required from the perspective of systemic needs.

Finally, the status of the Macedonian writing system has to be discussed. The repeat rate of this language is too high compared with other Slavic languages, and thus a linguistically motivated reduction of the grapheme inventory has to be performed. One possibility is an alternative interpretation of the ligatures <љ> and <њ>, which need not be counted as two separate graphemes: only their components are taken into consideration. This leads to a reduction of the Macedonian writing system from 31 to 29 units.

11 The correctness of the identification of outliers can be checked empirically: excluding Slovak, Macedonian and Slovene from the modelling, one immediately obtains a satisfying R² = 0.84 (a = 1.99 and b = 1.027). In addition to this empirically motivated argument, one has to offer a linguistic explanation.


Due to the high symbolic value of these two graphemes of the Macedonian alphabet, which were consciously taken from the Serbian writing system, an introduction of new digraphs like <нј> and <лј> is not realistic. However, such a decision can be motivated at least by considerations from the point of view of system linguistics.

It has to be noted, however, that during the discussion concerning an appropriate writing system for Macedonian in 1944–1945, for a short time even a grapheme system with 25 graphemes was discussed (for details cf. Friedman 1993 and Ilievski 2005). The suggestion was not to use the letters <њ, љ, ќ, ѓ>, but only the letters <н, л, к, г> with an additional apostrophe to mark the corresponding sounds. On the other hand, an enlargement of the grapheme system was discussed at the same time. In particular, the introduction of a special sign for the semi-vowel (“schwa”) and of a new letter for the syllabic phoneme /r/ was suggested, resulting in a grapheme inventory of 33 units. Generally, this discussion about the Macedonian writing system is a representative example of the quite broad range of possibilities for determining a grapheme inventory. As can be seen, not only linguistic considerations play a role, but also cultural and language policy. The current Macedonian writing system and its units are clearly motivated by a conscious closeness (use of <њ, љ>) to the Serbian writing system, and simultaneously by a conscious delimitation from the Serbian and Bulgarian writing systems, with the introduction of the new signs <ќ> and <ѓ>. In this respect, writing systems are mostly the result of a balance/imbalance of language-internal needs (degree of the repeat rate, control of redundancy) and language-external needs (closeness to or delimitation from other cultures, impact of cultural and language policy, etc.).

After the introduction of “alternative” grapheme inventories, our proposed hypothesis can finally be tested again. Based on the new conditions of the analysis, i.e. the change of particular grapheme inventories (Slovene Knew = 29, Slovak Knew = 40 and Macedonian Knew = 29), the following convincing result is obtained: using the power model $RR = 1.88 \cdot K_{new}^{-1.0065}$, an R² = 0.8085 indicates a well-fitting result, which is much better than the previous analyses (R² = 0.61 and R² = 0.56). A graphical representation of the new model¹² is given in Figure 3.

A further improvement of the result can be achieved by generalizing the power function. Using a model with three parameters, $RR = 0.0000008 \cdot K_{new}^{4.76} \cdot e^{-195.56/K_{new}}$, an R² = 0.8950 is obtained, i.e. an increase in the number of parameters leads to a significant improvement of the fitting result. This indicates the existence of a further “disturbing” factor in the inventory-repeat rate relation¹³.

12 One must note a marginal methodological flaw in the analysis performed: for Slovak, the corresponding graphemes <x, w> have been replaced by Slovak letters (<x> expressed by <ks>, <w> expressed by <vv>); a similar procedure has been used for Macedonian. For Slovene, since the alternatively introduced “graphemes” (which are in fact phonemes) cannot be detected automatically, an analysis of the grapheme frequencies based on the new inventory has not been performed.

13 The use of a linear model is not appropriate.


A second possibility is a further discussion of “appropriate” and “adequate” grapheme inventories for other Slavic languages. However, the focus of the presented analysis must be on the “positive” proof of the regulation of the repeat rate depending on the inventory size in Slavic parallel texts. Furthermore, to draw a more general conclusion: in quantitative cross-linguistic investigations of writing systems, no generally valid grapheme definition can be given; rather, a language-specific decision has to be made. Thus it can be concluded that the “improvement” of the graphemic system should be made in agreement with the theoretical model that takes into account the boundary conditions.

Figure 3: Inventory size vs. repeat rate in Slavic languages

3 References

Altmann, G. & Lehfeldt, W. (1980): Einführung in die Quantitative Phonologie. Bochum: Brockmeyer (Quantitative Linguistics, 7).

Anić, V. & Silić, J. (2001): Pravopis hrvatskoga jezika. Zagreb: Novi Liber.

Amirova, T. A. (1977): K istorii i teorii grafemiki. Moskva: Nauka.

Amirova, T. A. (1985): Funkcional'naja vzaimosvjaz' pis'mennogo i zvukovogo jazyka. Moskva: Nauka.

Back, O. (1982): Bemerkungen zur bulgarischen Orthographie, in: Die slawischen Sprachen, 1, 5–12.

Babić, St.; Finka, M. & Moguš, M. (2000)⁵: Hrvatski pravopis. Zagreb.

Badurina, L.; Marković, I. & Mićanović, K. (2008)²: Hrvatski pravopis. Zagreb: Matica Hrvatska.

Birnbaum, H. & Molas, J. (2006): Das Polnische, in: Rehder, P. (ed.) (2006), 145–164.

Burak, L. V. (1974): Sučasnaja belaruskaja mova. Minsk: Vyš. Škola.

Comrie, B. & Corbett, G. G. (eds.) (1993): The Slavonic languages. London/New York: Routledge.

Danylenko, A. (2005): From G to H and again to G in Ukrainian. Between the West European and Byzantine Tradition?, in: Die Welt der Slaven 50, 33–56.

Dürscheid, Christa (2006): Einführung in die Schriftlinguistik. 3. überarbeitete und ergänzte Auflage. Wiesbaden: Westdeutscher Verlag (Studienbücher zur Linguistik 8).


Fasske, H. (1984): Zur Herausbildung einer einheitlichen Graphik und Orthographie des Obersorbischen im 19. Jahrhundert, in: Zeitschrift für Slawistik, 29, 6, 872–878.

Friedman, Victor A. (1993): The First Philological Conference for the Establishment of the Macedonian Alphabet and the Macedonian Literary language: Its Precedents and Consequences, in: Fishman, Joshua A. (ed.) (1993): The Earliest Stage of Language Planning: The ‘First Congress’ Phenomenon. Berlin: Mouton de Gruyter, 159–180.

Grzybek, P. & Kelih, E. (2005a): Graphemhäufigkeiten im Ukrainischen. Teil I: Ohne Apostroph. In: Altmann, G., Levickij, V., Perebyjnis, V. (eds.), Problemy kvantytatyvnoї lingvistyky – Problems of Quantitative Linguistics: 159–179. Černivci: Ruta.

Grzybek, P. & Kelih, E. (2005b): Towards a general model of grapheme frequencies in Slavic languages. In: Garabík, R. (ed.), Computer Treatment of Slavic and East European Languages: 73–7. Bratislava: Veda.

Grzybek, P. & Kelih, E. (2005c): Häufigkeiten von Buchstaben/Graphemen/Phonemen: Konvergenzen des Rangierungsverhaltens, in: Glottometrics 9, 62–73.

Grzybek, P.; Kelih, E. & Altmann, G. (2005a): Graphemhäufigkeiten im Slowakischen (Teil I: Ohne Digraphen), in: Nemcová, E. (ed.): Philologia actualis slovaca. UCM, Trnava. [in press].

Grzybek, P.; Kelih, E. & Altmann, G. (2005b): Graphemhäufigkeiten im Slowakischen (Teil II: Mit Digraphen), in: Sprache und Sprachen in Mitteleuropa. GeSuS, Trnava (2005), 661–684.

Grzybek, P.; Kelih, E. & Altmann, G. (2005c): Graphemhäufigkeiten (am Beispiel des Russischen). Teil III: Die Bedeutung des Inventarumfangs – eine Nebenbemerkung zur Diskussion um das ‘ё’, in: Anzeiger für Slavische Philologie 33, 117–140.

Grzybek, P.; Kelih, E. & Altmann, G. (2004): Häufigkeiten russischer Grapheme. Teil II: Modelle der Häufigkeitsverteilung, in: Anzeiger für Slavische Philologie 32, 25–54.

Grzybek, P.; Kelih, E. & Stadlober, E. (2006): Graphemhäufigkeiten des Slowenischen (und anderer slawischer Sprachen). Ein Beitrag zur theoretischen Begründung der sog. Schriftlinguistik, in: Anzeiger für Slavische Philologie, 34; 41–74.

Hornjatkevyč, A. (1993): The 1928 Ukrainian orthography, in: Fishman, Joshua A. (ed.) (1993): The Earliest Stage of Language Planning: The ‘First Congress’ Phenomenon. Berlin: Mouton de Gruyter, 293–303.

Ilievski, P. (2005): Makedonskiot pravopis: Svečen sobir po povod 60-godišninata od donesuvanjeto na makedonskata azbuka i pravopis i nivnata primena vo Bukvarot. Skopje (Svečeni sobiri, sveska 50).

Jakovlev, N. F. (1928): Matematičeskaja formula postroenija alfavita, in: Kul’tura i pis’mennost’ Vostoka, kn. I. Moskva, 41–64.

Kelih, E. (2009a): Slawisches Parallel-Textkorpus: Projektvorstellung von “Kak zakaljalas’ stal’ (KZS)”, in: Kelih, E.; Levickij, V. & Altmann, G. (eds.) (2009), 106–124.

Kelih, E. (2009b): Preliminary analysis of a Slavic parallel corpus, in: Levická, J. & Garabík, R. (eds.): NLP, Corpus Linguistics, Corpus Based Grammar Research. Fifth International Conference Smolenice, Slovakia, 25–27 November 2009. Proceedings. Bratislava: Tribun, 175–183.


Kelih, E. (2009c): Zur Homogenität von Graphemhäufigkeiten in Texten: Evidenz aus dem Russischen, in: Kelih, E.; Levickij, V. & Altmann, G. (eds.) (2009), 85–105.

Kelih, E. (2009d): Graphemhäufigkeiten in slawischen Sprachen: Stetige Modelle, in: Glottometrics, 18, 53–69.

Kelih, E. (2008): Phoneminventar – Wortlänge: Einige grundsätzliche Überlegungen, in: Aktual’ni problemy hermans’koї filolohiї. Materialy III mižnarodnoї naukovoї konferenciї prisvjačenoї 70-riččju vid dnja narodžennja profesora, doktora filologičnych nauk V.V. Levic’kogo. Černivci: Knihi XXI, 25–29.

Kelih, E.; Levickij, V. & Altmann, G. (eds.) (2009): Metody analizu teksta/Methods of Text Analysis. Černivci: ČNU.

Kempgen, S. (1995): Phonemcluster und Phonemdistanzen (im Russischen), in: Weiss, D. (ed.): Slavistische Linguistik 1994. Referate des XX. Konstanzer Slavistischen Arbeitstreffens Zürich 20.–22.9.1994. München: Sagner (Slavistische Beiträge, 332), 197–221.

Koc’-Hryhorčuk, L. (1997): Zadlja jedinoho ukraїn’skoho pravopysu, in: Onyškevyč, L. et al. (1997): Pro ukraïns’kyj pravopys i problemy movy: zbirnyk dopovidej Movnoï Sekciï 16-oï Ričnoï Konferenciï Problematyky Urbana-Šampejn, 20–25 červnja 1997. Essays on Ukrainian orthography and language: proceedings of the Sixteenth Annual Conference on Ukrainian Subjects at the University of Illinois at Urbana-Champaign, June 20–25, 1997. N’ju Jork et al.: Naukove Tovarystvo Im. Ševčenka, 87–95.

Kohrt, M. (1985): Problemgeschichte des Graphembegriffs und des frühen Phonembegriffs. Tübingen: Niemeyer (Reihe Germanistische Linguistik, 61).

Köhler, R. (1986): Zur linguistischen Synergetik: Struktur und Dynamik der Lexik. Bochum: Brockmeyer (Quantitative Linguistics, 31).

Mayo, P. J. (1977): The alphabet and orthography of Byelorussian in the 20th century, in: The Journal of Byelorussian Studies 4, 28–48.

Mayo, P. J. (1978): Byelorussian orthography: From the 1933 reform to the present day, in: The Journal of Byelorussian Studies, 4, 25–47.

Nettle, D. (1995): “Segmental inventory size, word length, and communicative efficiency”, in: Linguistics 33, 2; 359–367.

Nettle, D. (1998): “Coevolution of Phonology and the Lexicon in Twelve Languages of West Africa”, in: Journal of Quantitative Linguistics, 5; 240–245.

Nowy słownik […] (2000): Nowy słownik ortograficzny PWN. Warszawa.

Petr, J. et al. (eds.) (1986): Mluvnice češtiny. Fonetika, Fonologie, Morfonologie a morfemika. Tvoření slov. Praha: Academia.

Pešikan, M.; Jerković, J. & Pižurica, M. (2006)⁸: Pravopis srpskog jezika. I. Pravila. II. Rečnik. Novi Sad et al.: Matica Srpska.

Pohončowa, Anja (2000): Procowanja wo pśibliženje gorno- a dolnoserbskogo pšawopisa po lěśe 1945, in: Lětopis. Časopis za rěč, stawizny a kulturu Łužiskich Serbow, 47, 1, 3–21.

Pravopis na makedonskiot […] (1970): Pravopis na makedonskiot literaturen jazik so pravopisen rečnik. Skopje: Prosvetno delo.

Pravidla českého […] (1993): Pravidla českého pravopisu. Praha: Academia.


Pravidlá slovenského […] (2000): Pravidlá slovenského pravopisu. Jazykovedný Ústav L'udovíta Štúra. 3. verbesserte und erweiterte Auflage. Bratislava: Veda.

Pravila russkoj […] (2008): Pravila russkoj orfografii i punktuacii. Polnyj akademičeskij spravočnik. Moskva: ĖKSMO.

Pravopisen rečnik […] (1983): Pravopisen rečnik na săvremennija bălgarski knižoven ezik. Sofija: Akademija na naukite.

Reformatskij, A. A. (1933): Lingvistika i poligrafija, in: Pis’mennost’ i Revoljucija, 17, 42–58.

Rehder, Peter (ed.) (2006): Einführung in die slawischen Sprachen. 5. Auflage. Darmstadt: Wissenschaftliche Buchgesellschaft.

Ružička, J. & Štolc, J. (1968)⁵: Slovenská gramatika. Bratislava.

Slovenski Pravopis (2001)⁶: Slovenski Pravopis. Ljubljana: SAZU.

Stone, Gerald (2003): Sorbian (Upper and Lower), in: Comrie, B. & Corbett, G. G. (eds.) (1993): The Slavonic languages. London/New York: Routledge, 593–685.

Stone, G. (2006): Das Obersorbische, in: Rehder, P. (ed.) (2006), 178–187.

Toporišič, Jože (2000): Slovenska Slovnica. Četrta, prenovljena in razširjena izdaja. Maribor: Založba Obzorja.

Tilkov, D. [et al.] (eds.) (1982): Gramatika na săvremennija bălgarski knižoven ezik. Tom 1. Fonetika. Sofija: Izdatelstvo na Bălgarskata Akademija na Naukite.

Ukraïns’kyj pravopys (1996)⁵: Nacional’na Akademija Nauk Ukraïny. Instytut Movoznavstva im. O. O. Potebni. Instytut ukraïns’koï movy. Kyïv: Naukova Dumka.

Völkel, P. (1980)³: Prawopisny słownik hornjoserbskeje rěče. Hornjoserbsko-němski słownik. Obersorbisch-deutsches Wörterbuch. Budyšin: Domowina.

Vintr, J. (2006): Das Tschechische, in: Rehder, P. (ed.) (2006), 194–213.

Dr. Emmerich Kelih Institut für Slawistik Universität Wien [email protected]