New approaches to language and prehistory from typology, genetics, and quantitative linguistics

Søren Wichmann

MPI-EVA & Leiden University


Post on 01-Apr-2015




Lecture I


Language and prehistory

• Linguistic reconstruction is one of our primary tools for learning about the prehistoric past. In many ways it is our best, and this is especially true at time depths where archaeology has trouble identifying the ethnicity of its subject matters.

Robert L. Rankin

• Too many comparative historical linguists want to dig up Troy, linguistically speaking. They consider it more important that comparative linguistics shed light on prehistoric migrations than that it shed light on the nature of language change. I can only say that I do not share those views on the focus of comparative linguistics. I do not consider comparative linguistics a branch of prehistory. . .

S. P. Harrison


Traditional methods

• Battery of tools in traditional historical linguistics:
  – Wörter und Sachen
  – Loanwords

• Aspects of prehistory within reach of these methods:
  – Reconstruction of cultural and environmental inventories, homelands
  – Language contact situations


Wörter und Sachen

Reconstruct lexical items for a given proto-language and infer from them something about the culture and environment of its speakers.

A time-honored method that has been applied to many of the world’s language families.


Indo-European

Reconstructible items include:

AGRICULTURE: ‚grain‘, ‚fruit‘, ‚barley‘, ‚wheat‘, ‚rye‘, ‚chaff‘, ‚field‘, ‚garden‘, ‚to plough‘, ‚plough‘, ‚ploughshare‘, ‚furrow‘, ‚harrow‘, ‚hoe‘, ‚sow‘, ‚harvest‘, ‚mow‘, ‚sickle‘, ‚thresh‘, ‚winnow‘, ‚grind‘, ‚quern‘

DOMESTIC ANIMALS: ‚livestock‘, ‚herdsman‘, ‚graze‘, ‚guard, protect‘, ‚larger domestic animal‘, ‚pig‘, ‚boar‘, ‚piglet‘, ‚sheep‘, ‚ram‘, ‚ewe‘, ‚lamb‘, ‚goat‘, ‚bovine‘, ‚bull‘, ‚to milk‘, ‚milk‘, ‚curds‘, ‚whey‘, ‚buttermilk‘, ‚butter‘, ‚dog‘, ‚horse‘

FOODS: ‚salt‘, ‚honey‘, ‚mead‘, ‚beer‘, ‚wine‘, ‚apple‘, ‚cherry‘, ‚berry‘, ‚blackberry, mulberry‘, ‚bean‘, ‚porridge‘, ‚broth‘.


Indo-European (cont.)

ECONOMY: ‚exchange‘, ‚to sell‘, ‚to buy‘, ‚purchase‘, ‚payment, prize‘, ‚gift‘, ‚wealth‘.

LEGAL TERMS: ‚law‘, ‚plead a case‘, ‚guilty‘, ‚penalty‘, ‚make whole‘.

TRANSPORT: ‚yoke‘, ‚wagon‘, ‚wheel‘, ‚axle‘, ‚shaft (of a cart or wagon)‘, ‚pole/peg‘, ‚reins‘, ‚boat‘, ‚row‘.

TECHNOLOGY: ‚craftsman‘, ‚craft‘, ‚metal‘, ‚gold‘, ‚silver‘, ‚axe‘, ‚spit‘, ‚auger‘, ‚awl‘, ‚whetstone‘, ‚net‘.

HOUSE AND BUILDINGS: ‚to build‘, ‚carpenter‘, ‚house‘, ‚hearth‘, ‚door‘, ‚doorjamb‘, ‚roof‘, ‚room‘, ‚beam/plank‘, ‚dwelling‘, ‚cauldron‘, ‚dish‘, ‚plate‘, ‚cup‘, ‚bed‘.

CLOTHING AND TEXTILES: ‚wool‘, ‚comb‘, ‚spin‘, ‚braid‘, ‚plait‘, ‚twist‘, ‚weave‘, ‚sew‘, ‚fasten‘, ‚thread‘, ‚sinew‘, ‚wear‘, ‚skin bag‘.


Indo-European (cont.)

WARFARE AND FORTIFICATION: ‚war-band‘, ‚hold/conquer‘, ‚citadel‘, ‚hillfort‘, ‚fort‘, ‚booty‘, ‚sword‘, ‚spear‘, ‚spear-point‘.

SOCIAL STRUCTURE AND SOCIAL INTERACTION: ‚master‘, ‚housemaster‘, ‚household/village‘, ‚member of a household‘, ‚group‘, ‚groupmaster‘, ‚family‘, ‚people‘, ‚member of one’s groups‘, ‚dear‘, ‚king‘, ‚rule‘, ‚free‘, ‚stranger, guest/host‘, ‚servant‘, ‚dowry‘, ‚one’s own custom‘, ‚fame‘, ‚poet/seer‘.

RELIGION AND BELIEFS: ‚holy‘, ‚god‘, ‚sky-father‘, ‚pray‘, ‚speak solemnly‘, ‚call/invoke‘, ‚priest, seer/poet‘, ‚worship‘, ‚consecrate‘, ‚handle reverently‘, ‚libate‘, ‚sacrificial metal‘, ‚meal‘, ‚sacred grove‘, ‚sacred enclosure‘, ‚magical glory‘, ‚sorcery‘.


Uralic

HUNTING, FISHING: ‚bow‘, ‚arrow‘, ‚bowstring‘, ‚knife‘, ‚to hunt‘, ‚track‘

FOOD: ‚egg‘, ‚berry‘, ‚bird-cherry‘, ‚hare‘

TOOLS & TECHNOLOGY: ‚needle‘, ‚belt‘, ‚glue birch-bark‘, ‚drill‘, ‚cord/rope‘, ‚handle‘, ‚(lodge)pole‘, ‚bark/leather‘, ‚enclosure/fence‘, ‚metal‘, ‚to braid‘, ‚shaft‘, ‚to cook‘.

TRAVEL & TRANSPORT: ‚ski‘, ‚to row‘, ‚fathom‘, ‚cross-rail (in boat)‘.

CLIMATE & ENVIRONMENT: ‚snow‘, ‚lake‘, ‚river‘, ‚wave‘, ‚summer/thaw‘, ‚water‘.

COMMERCE: ‚to give/sell‘.


Problems with Wörter und Sachen

The method assumes that a proto-language corresponds to a single point in time, but proto-languages can be long-lived and widespread, so words for different things may belong to different temporal strata.

Words for culturally important items are precisely the ones likely to diffuse quickly, and early borrowing among related languages can be impossible to detect: such words spread among already separate daughter languages, yet look like inherited proto-forms. For example, ‚whisky‘ can be reconstructed for Proto-Algonkian, which should date to thousands of years before the European conquest. Native words for ‚glasses‘ and ‚church‘ can be reconstructed for Proto-Oaxaca Mixean (a subgroup of the Mixe-Zoquean languages of Mexico), which should date to at least a century before the European conquest.


Problems with homelands

The signature of a homeland in the lexicon may date only to the latest expansion of the group.

Inferring homelands from lexical reconstructions requires a large proto-vocabulary. A large proto-vocabulary can only be reconstructed for a family with a relatively shallow time depth and many languages. Precisely this type of language family is likely to be one which has expanded quickly and recently. So the kind of language family that provides the best materials for inferring homelands is also the kind of language family whose homeland is most difficult to narrow down!


Problem with any approach within the traditional framework of comparative linguistics

TIME!

Beyond some 10,000 years (nobody knows exactly how many) the lexicon has changed so much that the comparative method has nothing left to compare.


So. . .

We need to go beyond the comparative method. The time depth it reaches is limited, and even within its ‚field of operation‘ there are problems when it comes to reconstructing aspects of culture and environment or determining homelands.


It turns out that the key to going beyond the traditional method is quantification. Comparative linguistics is an extremely qualitatively oriented field, and most historical linguists have been afraid of numbers.


Let’s make a fresh start and go as far back in time as we can imagine. . .


Can we say anything about the linguistic situation in pre-Neolithic times, i.e. some 10,000-20,000 years ago?


Yes, I believe we can say something about language family sizes, for instance.


But it will require an imaginative use of quantitative data and some detours before we get to say anything about this. So hang on. . .


The problem: were there many little language families of roughly the same size or perhaps a few big ones, some middle-size ones, and a lot of small ones?

Dixon’s punctuated equilibrium model assumes the first scenario.

Linguistic equilibrium:

„Each political group would have a population comparable to those of other groups in the area. That is, one group could be, say, four times as big as another, but not a hundred times as big“ (Dixon 1997: 69)


(1489) Niger-Congo; (1262) Austronesian; (552) Trans-New Guinea; (443) Indo-European; (372) Afro-Asiatic; (365) Sino-Tibetan; (258) Australian; (199) Nilo-Saharan; (172) Oto-Manguean; (168) Austro-Asiatic; (104) Sepik-Ramu; (75) Dravidian; (70) Tupi; (70) Tai-Kadai; (69) Mayan; (65) Altaic; (62) Uto-Aztecan; (60) Arawakan; (48) Torricelli; (47) Na-Dene; (46) Quechuan; (40) Algic; (38) Uralic; (36) East Papuan; (34) North Caucasian;

(33) Geelvink Bay; (33) Penutian; (32) Macro-Ge; (32) Hmong-Mien; (30) Panoan; (29) Carib; (29) Khoisan; (28) Hokan; (27) Salishan; (26) West Papuan; (25) Tucanoan; (22) Chibchan; (17) Siouan; (16) Mixe-Zoque; (13) Andamanese; (12) Japanese; (11) Totonacan; (11) Mataco-Guaicuru; (11) Eskimo-Aleut; (10) Choco; (10) Iroquoian; (8) Arauan; (7) Chumash; (7) Sko; (7) Zaparoan;

(7) Barbacoan; (7) Left May; (6) Maku; (6) Muskogean; (6) Kwomtari-Baibai; (6) Kiowa-Tanoan; (6) Tacanan; (6) Witotoan; (5) Caddoan; (5) Chukotko-Kamchatkan; (5) Mascoian; (5) Guahiban; (5) South Caucasian; (5) Wakashan; (5) Nambiquaran; (5) Chapacura-Wanham; (4) Huavean; (4) Gulf; (4) Misumalpan; (4) Subtiaba-Tlapanec; (4) Jivaroan; (4) Yanomam; (3) Katukinan; (3) Basque;

(3) Aymaran; (3) East Bird's Head; (2) Lower Mamberamo; (2) Harakmbet; (2) Peba-Yaguan; (2) Yenisei Ostyak; (2) Arutani-Sape; (2) Amto-Musan; (2) Zamucoan; (2) Alacalufan; (2) Araucanian; (2) Yukaghir; (2) Yuki; (2) Uru-Chipaya; (2) Keres; (2) Cahuapanan; (2) Salivan; (2) Chon; (2) Bayono-Awbono; (1) Coahuiltecan; (1) Paezan; (1) Lule-Vilela; (1) Chimakuan; (1) Mura; (1) Mosetenan; (1) Cant; (30) other language isolates

A ranking of the world’s language families in terms of number of languages, according to data from Ethnologue


Language family sizes in Ethnologue

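[Figure: number of languages per family (y axis, 0 to 2,000) plotted against the family's rank (x axis, 0 to 150), with the fitted power-law trendline $y = 11202\,x^{-1.9016}$]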


Language family sizes in Ethnologue (log-log scale)

[Figure: family size against rank on log-log axes (y from 1 to 10,000, x from 1 to 1,000), with the fitted trendline $y = 11202\,x^{-1.9016}$]
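The trendline on these charts has the form $y = c\,x^{-a}$. A minimal Python sketch of how such a line can be fitted: take logarithms of rank and size and run ordinary least squares, which is how spreadsheet "power" trendlines are commonly computed (we assume, but have not verified, that the slide's line was produced this way). The function name `fit_power_law` is hypothetical, and the transcription assumes that "(30) other language isolates" in the ranking above stands for 30 one-language families, which would match the x axis running to about 150; the sketch is not guaranteed to reproduce the slide's coefficients exactly.

```python
import math

def fit_power_law(ranks, sizes):
    """Fit sizes = c * ranks**(-a) by ordinary least squares on ln-ln axes.

    Returns (c, a): a straight line ln(size) = ln(c) - a * ln(rank).
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the line on log-log axes
    intercept = mean_y - slope * mean_x  # equals ln(c)
    return math.exp(intercept), -slope

# Family sizes (number of languages) in rank order, transcribed from the
# Ethnologue ranking. ASSUMPTION: "(30) other language isolates" is read
# as 30 separate families of one language each.
SIZES = (
    [1489, 1262, 552, 443, 372, 365, 258, 199, 172, 168, 104, 75, 70, 70,
     69, 65, 62, 60, 48, 47, 46, 40, 38, 36, 34,
     33, 33, 32, 32, 30, 29, 29, 28, 27, 26, 25, 22, 17, 16, 13, 12,
     11, 11, 11, 10, 10, 8, 7, 7, 7, 7, 7]
    + [6] * 6 + [5] * 8 + [4] * 6 + [3] * 4 + [2] * 17 + [1] * 7
    + [1] * 30
)

if __name__ == "__main__":
    c, a = fit_power_law(range(1, len(SIZES) + 1), SIZES)
    print(f"fitted trendline: y = {c:.0f} * x^-{a:.4f}")
```

Fitting on log-log axes treats every rank equally, so the long tail of one- and two-language families carries as much weight as the few giant families; that is one reason the fitted intercept can sit well above the largest observed family.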


Internet sites ranked by the number of unique AOL visitors they received on Dec. 1, 1997 (after Adamic and Huberman 2002: Fig. 2).


Language family sizes in Ethnologue (log-log scale)

[Figure: the log-log chart of Ethnologue family sizes again, with the fitted trendline $y = 11202\,x^{-1.9016}$]


Different phenomena exhibiting power-law distributions

• Urban conglomerations (Auerbach 1913)
• Abundance of biological taxa (Yule 1924)
• Word frequencies (Zipf 1949)
• Distribution of personal income (Champernowne 1953)
• Earthquake sizes (Kanamori and Anderson 1975)
• Popularity of internet sites (Glassman 1994)
• Gene activity (Ueda et al. 2004)


‘Zipf’s law’ applies when:

A set of quantities is distributed such that each quantity $Q$ is inversely proportional to its rank $R$.

Zipf’s law is a special instance ($a = 1$) of a power-law distribution:

$Q = c\,R^{-a}$
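Taking logarithms of the power-law relation shows why such distributions appear as straight lines on log-log charts like the Ethnologue one; a short worked version, with the Ethnologue trendline plugged in purely for illustration:

$$
Q = c\,R^{-a} \;\Longrightarrow\; \log_{10} Q = \log_{10} c - a \log_{10} R,
\qquad
\frac{Q(2R)}{Q(R)} = \frac{c\,(2R)^{-a}}{c\,R^{-a}} = 2^{-a}
$$

For Zipf's law proper ($a = 1$), doubling the rank halves the quantity. For the Ethnologue trendline ($c = 11202$, $a = 1.9016$) the log-log line has intercept $\log_{10} 11202 \approx 4.05$ and slope $-1.9016$, and each doubling of rank divides the predicted family size by $2^{1.9016} \approx 3.74$.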


[Figure: a Poisson degree distribution (exponential network) contrasted with a power-law degree distribution (scale-free network)]

Albert László Barabási et al.: The Architecture of Complexity: From the Diameter of the WWW to the Structure of the Cell (PowerPoint presentation) (http://www.nd.edu/~networks/papers.htm)
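To make the contrast on this slide concrete, a small Python sketch comparing the two kinds of degree distribution. The mean degree of 3 for the Poisson case and the exponent γ = 3 (the value quoted for the Science Citation Index) are illustrative assumptions, and the power law is truncated at a hypothetical maximum degree so that it can be normalised by a finite sum. The function names are our own.

```python
import math

def poisson_pmf(k, lam):
    """P(k) for a Poisson distribution with mean lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def power_law_pmf(k, gamma, k_max):
    """P(k) proportional to k**(-gamma), normalised over k = 1..k_max."""
    norm = sum(j ** -gamma for j in range(1, k_max + 1))
    return k ** -gamma / norm

if __name__ == "__main__":
    # Illustrative parameters only: mean degree 3 for the Poisson case,
    # exponent gamma = 3 for the power law, truncated at k = 10,000.
    for k in (1, 5, 10, 50, 100):
        print(f"k={k:>3}  Poisson={poisson_pmf(k, 3.0):.3e}  "
              f"power law={power_law_pmf(k, 3.0, 10_000):.3e}")
```

The Poisson probabilities collapse towards zero a few steps past the mean, whereas the power law keeps a long tail: nodes (or families) many times larger than the typical one remain quite probable.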


$P(k) \sim k^{-\gamma}$ ($\gamma = 3$)

SCIENCE CITATION INDEX

Albert László Barabási et al.: The Architecture of Complexity: From the Diameter of the WWW to the Structure of the Cell (PowerPoint presentation) (http://www.nd.edu/~networks/papers.htm)


4,781 Swedes aged 18-74; 59% response rate.

Liljeros et al. Nature 2001

Nodes: people (females; males). Links: sexual relationships.

Albert László Barabási et al.: The Architecture of Complexity: From the Diameter of the WWW to the Structure of the Cell (PowerPoint presentation) (http://www.nd.edu/~networks/papers.htm)


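Power-law distributions can be generated in networks by preferential attachment: new nodes are more likely to link to nodes that already have many links, so well-connected nodes keep gaining connections. . .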
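A minimal sketch of preferential attachment in Python, assuming the simplest Barabási-Albert-style variant in which every new node brings exactly one link; `grow_network` is a hypothetical name. Picking uniformly from the list of link endpoints is a common trick for choosing a node with probability proportional to its degree. For this model the degree distribution is known to approach $P(k) \sim k^{-3}$, so the printed counts should fall off steeply but without the abrupt cut-off of a Poisson distribution.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Preferential attachment with one link per new node.

    Returns the degree of every node, indexed by node number.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    degrees = [1, 1]    # start with nodes 0 and 1 joined by a single link
    endpoints = [0, 1]  # each node is listed once per link it has
    for new_node in range(2, n_nodes):
        # Choosing uniformly among link endpoints picks a node with
        # probability proportional to its current degree.
        target = rng.choice(endpoints)
        degrees[target] += 1
        degrees.append(1)
        endpoints.extend([new_node, target])
    return degrees

if __name__ == "__main__":
    counts = Counter(grow_network(100_000))
    for k in (1, 2, 4, 8, 16, 32, 64):
        print(f"degree {k:>2}: {counts.get(k, 0)} nodes")
```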



$m = \sum_{i=0}^{\infty} i\,P_i$

At any given taxonomic level an entity has the probability $P_0$ of producing no offspring, the probability $P_1$ of producing one offspring, the probability $P_2$ of producing two offspring, and so on. The mean number of offspring, $m$, is the sum over all $i$ of the number of offspring $i$ times its probability $P_i$.

Example of a probability set:

i      0     1     2     3     4     5
P_i    0.1   0.5   0.2   0.1   0.05  0.05

Here $m = 0(0.1) + 1(0.5) + 2(0.2) + 3(0.1) + 4(0.05) + 5(0.05) = 1.65$.

If $m > 1$ (as in the example), the family will likely grow, although it can still die out by chance; if $m = 1$, it will converge towards extinction over infinite time; and if $m < 1$, the family is certain to die out eventually.

...But how can power-laws be explained in a branching model?
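The branching model above corresponds to what probability theory calls a Galton-Watson process. A minimal Python sketch using the example probabilities; the function names are hypothetical. Besides $m$, it computes the extinction probability $q$ as the smallest solution of $q = \sum_i P_i\,q^i$ (a standard result for such processes), which for this example comes out at roughly 0.22: even with $m = 1.65$, more than a fifth of lineages die out. The simulation parameters (30 generations, 500 runs, a population cap) are arbitrary choices.

```python
import random

# Offspring probabilities from the example: P[i] is the probability P_i
# that an entity produces i offspring.
P = [0.1, 0.5, 0.2, 0.1, 0.05, 0.05]

def mean_offspring(probs):
    """m = sum over i of i * P_i."""
    return sum(i * p for i, p in enumerate(probs))

def extinction_probability(probs, iterations=500):
    """Smallest q in [0, 1] with q = sum_i P_i * q**i, found by
    fixed-point iteration starting from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q ** i for i, p in enumerate(probs))
    return q

def simulate_lineage(probs, generations, rng, cap=1_000):
    """Number of descendants in each generation, starting from one entity.
    Stops early once the lineage dies out or exceeds `cap`."""
    sizes = [1]
    for _ in range(generations):
        n = sizes[-1]
        if n == 0 or n > cap:
            break
        offspring = rng.choices(range(len(probs)), weights=probs, k=n)
        sizes.append(sum(offspring))
    return sizes

if __name__ == "__main__":
    rng = random.Random(42)
    runs = [simulate_lineage(P, 30, rng) for _ in range(500)]
    extinct = sum(1 for r in runs if r[-1] == 0) / len(runs)
    print(f"m = {mean_offspring(P):.2f}")
    print(f"theoretical extinction probability q = {extinction_probability(P):.3f}")
    print(f"share of simulated lineages that died out = {extinct:.3f}")
```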


Efforts to simulate power-law distributions with the branching model did not succeed (straight lines emerged on a log-normal chart, not on a log-log chart). However, success was achieved when a parameter was added: the probability that any given language starts a new language family (lineage).
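The lecture does not spell out that simulation, so the following is only an illustration with a swapped-in technique: a Simon-style (Yule-Simon) growth model in which each new language either founds a new family with a fixed probability or joins an existing family in proportion to that family's size. It is a hypothetical sketch, not the author's code; `simulate_families` and its parameter values are our own assumptions. With a small founding probability it tends to produce a few very large families and many small ones.

```python
import random
from collections import Counter

def simulate_families(n_languages, p_new_family, seed=0):
    """Hypothetical Simon-style model of language family growth.

    Languages are added one at a time. With probability p_new_family a new
    language founds a new family (a new lineage); otherwise it splits off
    from a randomly chosen existing language and joins that language's
    family, so larger families are more likely to grow.
    Returns family sizes (number of languages), largest first.
    """
    rng = random.Random(seed)
    family_of = [0]  # family id of each language; start with one language
    n_families = 1
    for _ in range(n_languages - 1):
        if rng.random() < p_new_family:
            family_of.append(n_families)  # found a new family
            n_families += 1
        else:
            # A uniformly chosen language belongs to a family with
            # probability proportional to that family's size.
            family_of.append(rng.choice(family_of))
    return sorted(Counter(family_of).values(), reverse=True)

if __name__ == "__main__":
    sizes = simulate_families(7_000, p_new_family=0.02, seed=1)
    print(f"{len(sizes)} families; ten largest: {sizes[:10]}")
```

Setting `p_new_family` to 1 gives only one-language families and setting it to 0 gives a single family, so the parameter plays the role of the added "new lineage" probability described on the slide.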


Language family sizes in Ethnologue (log-log scale)

[Figure: the log-log chart of Ethnologue family sizes again, with the fitted trendline $y = 11202\,x^{-1.9016}$]


Returning to the question of language family sizes: it is unlikely that language families some 10,000-20,000 years ago had roughly equal sizes, following a Poisson distribution. Given the universality of power laws, it is better to assume that language family sizes had this kind of distribution early on as well.


The large language families before Neolithic times were likely not the same as today’s large families. The following six language families are the largest in terms of number of languages, and all are spoken in areas where agriculture was practiced very early on:

Niger-Congo, Austronesian, Trans-New Guinea, Indo-European, Afro-Asiatic, Sino-Tibetan


Prior to the spread of agriculture there would have been other large language families. These may have included some families that are still relatively large, yet seem to have been reduced by the more recent expansion of families of agriculturalists, for instance Eastern Sudanic (reduced by Afro-Asiatic and Niger-Congo) or West Papuan (reduced by Austronesian).


-Fin-

More case stories tomorrow