Post on 18-Dec-2015

Origin of Man, Language and Languages

Central Asia: a common inquiry in genetics, linguistics and anthropology

Supported by grants from the European Science Foundation and CNRS

[Map: Eurasia and Central Asia, showing the Turkic-speaking peoples (Kazakhs, Karakalpaks, Turkmen, Uzbeks, Kirghizes) and the Iranian-speaking people (Tajiks)]

Expeditions

• 2001 Karakalpakstan (Karakalpak, Uzbek, Kazakh)

• 2002 Karakalpakstan (On Tort Uruw, Turkmen)

• 2003 Kyrgyzstan (North and South)

• 2004 Tajik from Uzbekistan (Ferghana and Samarkand areas)

• 2005 Bukhara area: Kazakh, Uzbek and Tajik

• 2005 Tajik and Uzbek from Tajikistan (Gharm and Penjikent areas)

Ethnological questionnaire

• Location, language spoken and tribe (if applicable) of each individual and of their 4 grandparents

• Information on married children

Questionnaire form:

FULL NAME: / N°: / Location: / Age: / Sex:

Localization
  Individual's location; birth place
  Father's location (father of father, mother of father)
  Mother's location (father of mother, mother of mother)

Language
  Individual's language
  Father's language (father of father, mother of father)
  Mother's language (father of mother, mother of mother)

Tribe (or group)
  Individual's GF (Group of Filiation: tribe, clan, lineage)
  Father's GF (father of father, mother of father)
  Mother's GF (father of mother, mother of mother)

Individual's spouse
  Alive? Y/N
  Second marriage: Y/N (if yes, add a questionnaire)
  Why this partner? (cousin, same clan, same village...)
  Spouse's location (origin), language and GF, each recorded for the spouse's father (F), mother (M) and four grandparents

Linguistic data

Blood, DNA

• 5 ml for each individual

• Informed consent

• In the field: blood is processed to isolate the white cells

• In the laboratory : DNA extraction

Main goals

• Trace back population history

• Describe genetic diversity in Central Asia

• Compare genetic and linguistic distances

• History of Eurasia : Past demographic expansion

– By Raphaelle Chaix (Former PhD Student)

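« Mismatch distribution »

All pairwise ("2 by 2") comparisons of the sequences within a population (Pop 1, Pop 2), for example:

seq1: A T A A T C
seq2: A C A T T C

[Figure: for each population, histogram of frequency (Fq) against the number of differences between sequences (0 to 8)]

Estimation of τ = 2Tu

Age of expansion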
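As an illustration of how a mismatch distribution is built, here is a minimal Python sketch. It counts the differences between every pair of aligned sequences in one population and converts an estimate of τ into an age, assuming the relation τ = 2Tu with u the mutation rate per generation for the whole sequence and T the age in generations. The function names and the example values are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mismatch_distribution(sequences):
    """Relative frequency of each number of pairwise differences."""
    counts = Counter(
        count_differences(a, b) for a, b in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def expansion_age(tau, u):
    """Age in generations, assuming tau = 2 * T * u."""
    return tau / (2 * u)


# The two example sequences from the slide differ at two positions.
print(count_differences("ATAATC", "ACATTC"))
# Toy population of three sequences (illustrative only).
print(mismatch_distribution(["ATAATC", "ACATTC", "ATAATC"]))
```

In practice τ is estimated by fitting an expansion model to the observed histogram; the sketch only shows the counting step and the final conversion.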

[Figure: age of expansion (τ) against longitude. mtDNA: N=133, r=0.7, p=0. Y chromosome: N=77, r=0.3, p=0.01]

[Figure: tau against longitude (0 to 100) and against latitude (25 to 50); labels: tau, ss, mig, r]

[Map: age of expansion (T) in KY, mtDNA / Y chromosome. Values shown: 31 / 6.6; 17 / 5.6; 27 / 7.4; 25 / 6; 30 / 7.2; 24 / 7]

mtDNA dating depends on the mutation rate:

• 30,000 yrs BP in China to 17,000 yrs BP in Europe (27,000 in Central Asia)

• 62,000 yrs BP in China to 35,000 yrs BP in Europe (54,000 in Central Asia)

« Intermatch distribution » (Harpending et al. 2000; Excoffier et al. 2004)

All pairwise comparisons between the sequences of Pop 1 and those of Pop 2

[Figure: histograms of frequency (Fq) against the number of differences between sequences (0 to 8)]

Center of expansion? Same center of expansion for the 2 populations

[Figure: distributions (Fq against number of differences, 0 to 8) for the Far East, Central Asia and Europe under three scenarios: cultural expansion; demic expansion; cultural expansion with high migration rate = "demic expansion"]

Past expansion in Eurasia

• Mitochondrial DNA: east-to-west expansion in the Paleolithic (from China to Central Asia and then to Europe; from the Middle East to Europe); no cultural expansion

• Y chromosome: expansion during the Neolithic. Two centers of expansion (China and Middle East, Pakistan, Central Asia), perhaps three (Europe). For Central Asia: same timing of expansion as in the Middle East, slightly earlier but the difference is not statistically significant. Differences from mtDNA are explained by a lower Ne for males

Central Asia diversity

• 463 individuals typed for two uniparental markers:

– Y chromosome: microsatellites + SNP

– Mitochondrial DNA: HVS-I + RFLP

• 400 to be typed

Y chromosome diversity

code   N    diversity   description
kk1    54   0.97        Karakalpaks (Qongirat)
kz1    49   0.84        Kazakhs
otu1   54   0.89        Karakalpaks (On Tort Uruw)
tkm1   51   0.84        Turkmen
uz1    40   0.97        Uzbeks
kir2   37   0.91        Kirghizes of central Kyrgyzstan (mixed)
kz2    14   0.86        Kazakhs: Almaty, Katon-Karagay, Karatutuk, Rachmanovsky Kluchi
tkm2   21   0.94        Turkmen of Ashgabat
td2    22   0.87        Tajiks of Penjikent
uz2    28   1           Uzbeks of Kashkadarya
ui2    33   0.98        Uighurs of Alma-Aty, Lavar
KRA    46   0.82        Kirghizes, Andijan
KRG    20   0.78        Kirghizes, North, Jankatalab
KRM    22   0.7         Kirghizes, North, Doboloo
TJK    30   0.98        Tajiks, Ferghana, Kamangaron
TJR    29   0.98        Tajiks, Ferghana, Richtan
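The slide does not say how the diversity column was computed. One common choice for haplotype data is Nei's unbiased haplotype diversity, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n individuals. The Python sketch below assumes that definition; the function name and the toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity (assumed definition).

    `haplotypes` is a list of hashable haplotypes, for example tuples of
    microsatellite repeat counts, one per individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    frequencies = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in frequencies))


# Toy data: every individual carries a different haplotype, so diversity is 1.
all_different = [(10, 12), (10, 13), (11, 12), (11, 14)]
print(haplotype_diversity(all_different))

# Toy data: every individual carries the same haplotype, so diversity is 0.
all_identical = [(10, 12)] * 4
print(haplotype_diversity(all_identical))
```

Under this definition a value of 1, as reported for one of the Uzbek samples, means that no two sampled men share a haplotype.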

Genetic distances among populations : chromosome Y

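[Figure: plot of Y-chromosome genetic distances among populations KZ, KK, UZ, OTU, TK, KRG, KRM, KRA, TJK and TJR]

Correlation between genetic and geographic distances: r gen-geo = 0.85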

• Comparison of genetic and ethnological data

From oral tradition:

Common ancestor of the tribe

Common ancestor of the clan

Common ancestor of the lineage

If there is a recent common ancestor: strong genetic kinship

If there is no recent common ancestor: low genetic kinship

Patrilineal filiation: Y chromosome study

250 men: genealogical information

1 – Ethnological questionnaire

2 – Patrilineal genetic kinship: 12 microsatellites of the Y chromosome

[Figure: kinship coefficient (y-axis, from -0.2 to 0.9) for four ethnological classes (x-axis, 1 to 4; class labels include same lineage, same clan and same tribe), plotted for KZ, TK, UZ, QN and OTU]

Mean genetic kinship coefficient for each ethnological class of the five populations examined in this study.

KZ Kazakhs; TK Turkmen; UZ Uzbeks; QN Qongrat; OTU On Tort Uruw.

Dating

ASD = mean square number of differences between the ancestor and the individuals

T: age of the ancestor of the clan or of the lineage

T = ASD / μ

μ = mutation rate: 2.1 × 10^-3 per generation
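The dating rule T = ASD / mutation rate can be sketched in a few lines of Python. The sketch assumes that haplotypes are tuples of microsatellite repeat counts, that the ASD is averaged over individuals and loci, and that the ancestral haplotype is approximated by the most frequent allele at each locus; the slide does not state how the ancestor was inferred. The function names and the toy data are hypothetical.

```python
from collections import Counter

MUTATION_RATE = 2.1e-3  # per generation, value given on the slide


def modal_haplotype(haplotypes):
    """Most frequent repeat count at each locus (assumed proxy for the ancestor)."""
    return tuple(
        Counter(locus).most_common(1)[0][0] for locus in zip(*haplotypes)
    )


def average_squared_distance(haplotypes, ancestor):
    """Mean squared repeat-count difference to the ancestor, over individuals and loci."""
    total = sum(
        (allele - ancestral) ** 2
        for haplotype in haplotypes
        for allele, ancestral in zip(haplotype, ancestor)
    )
    return total / (len(haplotypes) * len(ancestor))


def age_in_generations(asd, mutation_rate=MUTATION_RATE):
    """T = ASD / mutation rate."""
    return asd / mutation_rate


# Toy lineage: four men typed at two loci (illustrative values only).
lineage = [(10, 12), (10, 13), (11, 12), (10, 12)]
ancestor = modal_haplotype(lineage)
asd = average_squared_distance(lineage, ancestor)
print(ancestor, asd, age_in_generations(asd))
```

Converting generations to years needs a generation time, which is a further assumption.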

Common ancestor of the tribe

Common ancestor of the clan

Common ancestor of the lineage

Ages shown on the diagram: 50 generations (approx. 1,500 years) and 15 generations (approx. 450 years)

Tribe: mythical ancestor

Conclusion

Genetic data can help decipher social organisation

Lineages and clans : people share a recent common ancestor

Tribes : a conglomerate of clans who subsequently invented a mythical ancestor to strengthen group unity

Y chromosome

• Low diversity of some populations is explained by social organisation

• Distances among populations related to geographical distances

Mitochondrial DNA

r gen-geo = 0

[Figure: MDS based on mtDNA, 12 populations (Kimura 2P, α = 0.26): KZ, KK, UZ, OTU, TK, KRG, KRM, KRA, TJK, TJR, Tja, Tju]

Central Asian populations

code   N    name of population             reference       location
kk1    55   Karakalpaks (Qongirat)         Present study   Karakalpakstan
kz1    50   Kazakhs                        Present study   Karakalpakstan
otu1   53   Karakalpaks (On Tort Uruw)     Present study   Karakalpakstan
tk1    51   Turkmen                        Present study   Karakalpakstan
uz1    40   Uzbeks                         Present study   Karakalpakstan
kz3    55   Kazakhs                        Comas 1998      Alma Ati
kit3   48   Kirghizes of Talas             Comas 1998      Talas
kir3   47   Kirghizes of Sary-Tash         Comas 1998      Sary-Tash
kr4    20   Karakalpaks                    Comas 2004      Nukus
kz4    20   Kazakhs                        Comas 2004      Gasli
kuz4   20   Uzbeks of Khorezm              Comas 2004      Urgench
kg4    20   Kirghizes                      Comas 2004      Osh
tu4    20   Turkmen                        Comas 2004      Urgench
uz4    20   Uzbeks                         Comas 2004      Samarkand
td4    20   Tajiks                         Comas 2004      Samarkand
uz2    42   Uzbeks                         Quintana 2004   Samarkand
tk2    41   Turkmen                        Quintana 2004
krt2   32   Kurds of Turkmenistan          Quintana 2004
KRA    48   Kirghizes, Andijan             Present study   Andijan
KRG    20   Kirghizes, North, Jankatalab   Present study   Nukus
KRM    26   Kirghizes, North, Doboloo      Present study   Nukus
TJK    30   Tajiks, Ferghana, Kamangaron   Present study   Ferghana Valley
TJR    29   Tajiks, Ferghana, Richtan      Present study   Ferghana Valley
Tja    33   Tajiks, Samarkand, Agalic      Present study   Samarkand area
Tju    29   Tajiks, Samarkand, Urgut       Present study   Samarkand area
ui3    55   Uighurs                        Comas 1998      Alma Ati
ui4    16   Uighurs                        Comas 2004      Tashkent
shu    44   Shugnan (Pamir, Tajikistan)    Quintana 2004

MDS of mitochondrial DNA

[Figure: MDS based on mtDNA, 28 populations (Kimura 2P, α = 0.26). Point labels: Kazakhs, Karakalpaks, Karakalpaks (OTU), Karakalpaks (Qongirat), Uighurs, Shugnan (Pamir, Tajikistan), Kurds of Turkmenistan, Turkmen, Kirghizes (Comas), Kirghizes A, Kirghizes G & M, Tajiks, Tajiks R, Tajiks A & U, Uzbeks, Uzbeks (Khorezm)]

No gen-geo correlation

Mitochondrial DNA

Mean distance (Fst) between populations. The diagonal shows intra-group distances. N = number of population samples in each group.

              Karakalpaks   Uzbeks    Kazakhs    Turkmen    Kirghizes   Tajiks
              N=3           N=4       N=3        N=3        N=6         N=5
Karakalpaks   -0.00013
Uzbeks         0.00951      0.00404
Kazakhs       -0.00205      0.00478   -0.00182
Turkmen        0.00889      0.01291    0.00835   -0.00079
Kirghizes      0.00626      0.0203     0.01182    0.0218     0.0084
Tajiks         0.02246      0.01517    0.02256    0.02408    0.03533     0.02497

Annotations on the table: exogamous populations; endogamous populations

Mitochondrial DNA

• Distances among populations are not related to linguistic or geographical distances

• Exchange among populations differs between Turko-Mongol (exogamous) populations and Indo-Iranian (endogamous) populations

Conclusion

• Past history: clear movement from east to west in the Paleolithic; strong population growth in the Neolithic

• Exchange between populations is clearly different for males and females

• Linguistic distances ?

Computational linguistics

• Background

Design of the sampling

Swadesh list

2/3 speakers for each sampling location (interspeaker variation)

Analyses

We are not interested in historical linguistics.

Central Asia about 1000 CE : language groups

[Map: Indo-Iranian languages (Sogdian, Khorasmian, Persian-Tajik, Ossetic, Pamirian, Dardic) and Turkic languages (Oguz, Kipchak, Karluk); some placements are marked "?"]

We want to statistically compare genetic and linguistic data

More linguistic differences among Iranian populations than among Turkic populations?

We have two major linguistic groups: Indo-Iranian and Turkic

We will focus on them separately, since each constitutes a DIALECT CHAIN

Borrowing, if it exists, is less of a problem since it reflects CONTACT (migrations), a kind of information that is also embedded in genetic data. Rather than historical linguistics, we aim for a POPULATION LINGUISTICS

… we selected distance-based approaches

Phonetic alignment:

• An alignment algorithm (string mapping)

•A metric for measuring distances between phonetic segments

Distance Matrices:

•Correlate linguistic and geographic distances

•Correlate linguistic and genetic distances (mt DNA)
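Correlating two distance matrices is often done with a Mantel permutation test; the slides do not say which test was used, so the following Python sketch is only one plausible reading. It computes the Pearson correlation between the upper triangles of two symmetric matrices and estimates a one-sided p-value by permuting the populations of one matrix. All names and the toy matrices are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix, order=None):
    """Entries above the diagonal, optionally after reordering the populations."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Observed correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    order = list(range(len(dist_b)))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Toy example: four populations on a line; the second matrix is twice the first.
positions = [0, 1, 3, 7]
geo = [[abs(p - q) for q in positions] for p in positions]
ling = [[2 * d for d in row] for row in geo]
print(mantel(geo, ling))
```

Permuting whole populations, rather than individual matrix cells, keeps the dependence between distances that share a population.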

Dialectometrical computation of distances (Kondrak 2004, Heeringa 2004)

From Ph Mennecier

What remains to be done in genetic analysis

• Phylogeography of Y and mtDNA: geographic patterns of genetic variation may reveal migrations synchronous with linguistic phenomena (replacement, borrowing, ...)

• Autosomal markers

• Samples from Tajikistan

Thanks to all the people who participated in this study

In France:

Dr. François Jacquesson, linguist, CNRS
Pr. Evelyne Heyer, geneticist, MNHN, CNRS
Dr. Lluis Quintana, geneticist, CNRS, Inst. Pasteur
Dr. Philippe Mennecier, linguist, MNHN
Dr. Frederic Austerlitz, geneticist, CNRS
Dr. Svetlana Jacquesson, anthropologist, IFEAC
Dr. Franz Manni, geneticist, MNHN
Dr. R Chaix (former PhD student, in Oxford)
Dr. P Balaresque (former PhD student, in Leicester)

In Central Asia:

Dr. Tatiana Hegai, geneticist, Tashkent
Pr. Ruslan Ruzibakiev, geneticist, Tashkent
Dr. Aldashev, geneticist, Bishkek
Pr. Vadim Yagodin, archaeologist, Nukus
Dr. Bakyt Amanbaeva, archaeologist, Bishkek
Pr. Firuza Nasyrova, geneticist, Dushanbe

Alignment

[Alignment matrix: INDUSTRY (columns) against INTEREST (rows); the cumulative cost along the alignment path rises from 0 to 8]

string        operation    cost
industry      Subst. i/i   0
industry      Subst. n/n   0
intdustry     Insert t     1
intedustry    Insert e     1
interdustry   Insert r     1
interustry    Delete d     1
interstry     Delete u     1
interestry    Insert e     1
interestry    Subst. s/s   0
interestry    Subst. t/t   0
interesty     Delete r     1
interest      Delete y     1

Total cost 8

The indels are weighted as 1 instead of 2 in a newer version of the algorithm
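The industry/interest alignment can be reproduced with a standard dynamic-programming edit distance. The Python sketch below assumes a cost of 1 for each insertion or deletion and a cost of 2 for replacing one segment by a different one (the same as one deletion plus one insertion); identical segments cost 0. The function name and parameters are hypothetical and are not the authors' implementation, which also uses a phonetic metric between segments.

```python
def edit_distance(source, target, indel_cost=1, subst_cost=2):
    """Minimum total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # table[i][j] = cost of aligning source[:i] with target[:j]
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * indel_cost
    for j in range(1, cols):
        table[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + indel_cost,  # delete from source
                table[i][j - 1] + indel_cost,  # insert into source
                table[i - 1][j - 1] + (0 if same else subst_cost),
            )
    return table[-1][-1]


print(edit_distance("industry", "interest"))  # 8, the total cost on the slide
```

Setting `subst_cost=1` gives the classic Levenshtein distance, which is 6 for this pair because mismatched letters can then be replaced directly.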
