Origin of Man, Language and Languages
Central Asia
A common inquiry in genetics, linguistics and anthropology
Funded by the European Science Foundation
CNRS
[Maps: Eurasia; Central Asia, showing Turkic-speaking peoples (Kazakhs, Karakalpaks, Turkmens, Uzbeks, Kirghizes) and Iranian-speaking peoples (Tajiks)]
Expeditions
• 2001 Karakalpakstan (Karakalpak, Uzbek, Kazakh)
• 2002 Karakalpakstan (On Tort Uruw, Turkmen)
• 2003 Kyrgyzstan (North and South)
• 2004 Tajiks from Uzbekistan (Ferghana and Samarkand areas)
• 2005 Bukhara area: Kazakh, Uzbek and Tajik
• 2005 Tajiks and Uzbeks from Tajikistan (Gharm and Penjikent areas)
Ethnological questionnaire
• Location, language spoken and tribe (if applicable) of each individual and of the 4 grandparents
• Information on married children
FULL NAME:    N°:    Location:    Age:    Sex:

Localization
  Individual's location; birth place
  Father's location: father of father, mother of father
  Mother's location: father of mother, mother of mother
Language
  Individual's language
  Father's language: father of father, mother of father
  Mother's language: father of mother, mother of mother
Tribe (or group)
  Individual's GF (Group of Filiation: tribe, clan, lineage)
  Father's GF: father of father, mother of father
  Mother's GF: father of mother, mother of mother
Individual's spouse
  Alive? Y/N
  Second marriage: Y/N (if yes, add a questionnaire)
  Why this partner? (cousin, same clan, same village?…)
Spouse's location (origin)
  F: father of father, mother of father
  M: father of mother, mother of mother
Spouse's language
  F: father of father, mother of father
  M: father of mother, mother of mother
Spouse's GF
  F: father of father, mother of father
  M: father of mother, mother of mother
Linguistic data
Blood, DNA
• 5 ml for each individual
• Informed consent
• In the field: blood is processed to recover white cells
• In the laboratory : DNA extraction
Main goals
• Trace back population history
• Describe genetic diversity in Central Asia
• Compare genetic and linguistic distances
• History of Eurasia : Past demographic expansion
– By Raphaelle Chaix (Former PhD Student)
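« Mismatch distribution »
All 2 by 2 comparisons between the sequences of a population (Pop 1, Pop 2), for example:
seq1  A T A A T C
seq2  A C A T T C
[Histograms, one per population: frequency (Fq) against the number of differences between sequences, 0 to 8]
Estimation of τ = 2Tu
T: age of expansion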
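The 2 by 2 comparison can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the third sequence is hypothetical, the function names are made up for the example, and the slide's relation is read as τ = 2Tu, with u taken (as in the usual formulation) as the mutation rate of the whole sequence per unit of time. Estimating τ from the histogram is a separate fitting step that is not shown.

```python
from collections import Counter
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mismatch_distribution(sequences):
    """Frequency (Fq) of each number of differences over all 2 by 2 comparisons."""
    counts = Counter(count_differences(a, b) for a, b in combinations(sequences, 2))
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def expansion_age(tau, u):
    """Solve tau = 2 * T * u for T (expressed in the time unit of u)."""
    return tau / (2.0 * u)


# seq1 and seq2 are the sequences shown on the slide; the third one is hypothetical.
population = ["ATAATC", "ACATTC", "ATAATT"]
print(mismatch_distribution(population))
```

With real data the input would be the aligned HVS-I sequences of one population, and the resulting histogram is what the slide plots as Fq against the number of differences.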
[Figure: age of expansion (τ) against longitude. mtDNA: N = 133, r = 0.7, p = 0. Y chromosome: N = 77, r = 0.3, p = 0.01]
[Figure: tau, ss, mig and r plotted against longitude and against latitude]
Age of expansion (T), mtDNA / Y chromosome, in thousands of years (KY)
[Map with six pairs of values: 31 / 6.6, 17 / 5.6, 27 / 7.4, 25 / 6, 30 / 7.2, 24 / 7]
mtDNA dating depends on the mutation rate: from 30,000 years BP in China to 17,000 years BP in Europe (27,000 in Central Asia) with one rate, or from 62,000 years BP in China to 35,000 years BP in Europe (54,000 in Central Asia) with another.
« Intermatch distribution » (Harpending et al 2000 – Excoffier et al 2004)
All 2 by 2 comparisons between the sequences of Pop 1 and the sequences of Pop 2
[Histograms: frequency (Fq) against the number of differences between sequences, 0 to 8]
Center of expansion?
Same center of expansion for the 2 populations
[Three schematics over the Far East, Central Asia and Europe, each with its intermatch histogram:]
Cultural expansion
Demic expansion
Cultural expansion with high migration rate = "Demic expansion"
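The intermatch distribution differs from the mismatch distribution only in which pairs are compared: each sequence of one population against each sequence of the other. A minimal sketch, with hypothetical sequences and function names chosen for the example:

```python
from collections import Counter
from itertools import product


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def intermatch_distribution(pop1, pop2):
    """Frequency (Fq) of each number of differences between pop1 and pop2 sequences."""
    counts = Counter(count_differences(a, b) for a, b in product(pop1, pop2))
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical aligned sequences for two populations.
pop1 = ["ATAATC", "ATAATT"]
pop2 = ["ACATTC", "ACATTT"]
print(intermatch_distribution(pop1, pop2))
```

The slides then compare the shape of this between-population histogram with the within-population ones to discuss whether two populations share a center of expansion; that interpretation step is not reproduced here.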
Past expansion in Eurasia
• Mitochondrial DNA: East to West expansion in the Paleolithic (from China to Central Asia and then to Europe; from the Middle East to Europe); no cultural expansion
• Y chromosome: expansion during the Neolithic. Two centers of expansion (China; and the Middle East, Pakistan, Central Asia), perhaps three (Europe). For Central Asia: same timing of expansion as the Middle East, a little earlier but not significantly so.
– Differences explained by a lower Ne (effective population size) for males
Central Asia diversity
• 463 individuals typed for two uniparental markers:
– Y chromosome: microsatellites + SNP
– Mitochondrial DNA: HVS-I + RFLP
• 400 to be typed
Y chromosome diversity

code   N    diversity   description
kk1    54   0.97        Karakalpaks (Qongirat)
kz1    49   0.84        Kazakhs
otu1   54   0.89        Karakalpaks (On Tort Uruw)
tkm1   51   0.84        Turkmens
uz1    40   0.97        Uzbeks
kir2   37   0.91        Kirghizes from central Kyrgyzstan (mixed)
kz2    14   0.86        Kazakhs: Almaty, Katon-Karagay, Karatutuk, Rachmanovsky Kluchi
tkm2   21   0.94        Turkmens from Ashgabat
td2    22   0.87        Tajiks from Penjikent
uz2    28   1           Uzbeks from Kashkadarya
ui2    33   0.98        Uighurs from Alma-Aty, Lavar
KRA    46   0.82        Kirghizes, Andijan
KRG    20   0.78        Kirghizes, North, Jankatalab
KRM    22   0.7         Kirghizes, North, Doboloo
TJK    30   0.98        Tajiks, Ferghana, Kamangaron
TJR    29   0.98        Tajiks, Ferghana, Richtan
Genetic distances among populations: Y chromosome
r gen-geo = 0.85 (correlation between genetic and geographic distances)
[Plot of the populations: KZ, KK, UZ, OTU, TK, KRG, KRM, KRA, TJK, TJR]
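A correlation such as r gen-geo is computed between two distance matrices over the same populations. The slides do not say which test was used; the sketch below assumes a Mantel-type permutation test, which is a common choice for this comparison, and uses hypothetical distances for four populations.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling the populations."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mantel(gen, geo, permutations=999, seed=0):
    """Return (r, one-sided p-value) by permuting the population labels of gen."""
    rng = random.Random(seed)
    geo_vec = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(len(gen)))
        rng.shuffle(order)
        if pearson(upper_triangle(gen, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical data: geographic distances (km) and genetic distances.
geo = [[0, 100, 300, 600], [100, 0, 200, 500], [300, 200, 0, 300], [600, 500, 300, 0]]
gen = [[0, 0.012, 0.028, 0.061], [0.012, 0, 0.019, 0.052],
       [0.028, 0.019, 0, 0.031], [0.061, 0.052, 0.031, 0]]
r, p = mantel(gen, geo)
print(round(r, 3), p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, so an ordinary significance test on the flattened values would overstate the evidence.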
• Comparison of genetic and ethnological data
From oral tradition:
Common tribe's ancestor
Common clan's ancestor
Common lineage's ancestor
Patrilineal filiation → Y chromosome study
If there is a recent common ancestor: strong genetic kinship
If there is no recent common ancestor: low genetic kinship
250 men : genealogical information
1 – Ethnological questionnaire
2 – Patrilinear genetic kinship
12 microsatellites of Y chromosome
[Figure: mean kinship coefficient (axis from -0.2 to 0.9) for ethnological classes 1 to 4 (labels: … tribe, same tribe, same lineage, same clan), one curve per population: KZ, TK, UZ, QN, OTU]
Mean genetic kinship coefficient for each ethnological class of the five populations examined in this study.
KZ Kazakhs; TK Turkmen; UZ Uzbeks; QN Qongrat; OTU On Tort Uruw.
Dating
ASD = mean square number of differences between the ancestor and the individuals
T: age of the ancestor of the clan or of the lineage
T = ASD / μ
μ = mutation rate, 2.1 × 10^-3 per generation
Common tribe's ancestor: mythical ancestor
Common clan's ancestor: 50 generations, approx. 1500 years
Common lineage's ancestor: 15 generations, approx. 450 years
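The dating rule can be illustrated with a short sketch, reading the slide's formula as T = ASD / μ. Several things here are assumptions made for the example: the haplotypes are hypothetical repeat counts at 3 loci (the study typed 12 microsatellites), the ancestral haplotype is taken as given, the squared differences are averaged over loci as well as individuals, and the 30 years per generation is only inferred from the slide's own conversion (50 generations for about 1500 years).

```python
MUTATION_RATE = 2.1e-3      # per generation, value given on the slide
YEARS_PER_GENERATION = 30   # assumption, inferred from 50 generations ~ 1500 years


def average_squared_distance(ancestor, haplotypes):
    """Mean squared repeat-count difference to the ancestor, over individuals and loci."""
    total = 0
    count = 0
    for haplotype in haplotypes:
        if len(haplotype) != len(ancestor):
            raise ValueError("haplotype and ancestor must cover the same loci")
        for ancestral, observed in zip(ancestor, haplotype):
            total += (observed - ancestral) ** 2
            count += 1
    return total / count


def age_in_generations(asd, mu=MUTATION_RATE):
    """T = ASD / mu, in generations."""
    return asd / mu


# Hypothetical microsatellite repeat counts at 3 loci for 4 men of one lineage.
ancestor = (14, 22, 10)
lineage = [(14, 22, 10), (15, 22, 10), (14, 22, 10), (14, 21, 10)]

asd = average_squared_distance(ancestor, lineage)
generations = age_in_generations(asd)
print(asd, generations, generations * YEARS_PER_GENERATION)
```

An ASD of 0.105 would correspond to the 50 generations quoted for clans, and 0.0315 to the 15 generations quoted for lineages.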
Conclusion
Genetic data can help decipher social organisation
Lineages and clans : people share a recent common ancestor
Tribes : a conglomerate of clans who subsequently invented a mythical ancestor to strengthen group unity
Y chromosome
• Low diversity of some populations is explained by social organisation
• Distances among populations related to geographical distances
Mitochondrial DNA
r gen-geo = 0
[MDS plot of the populations: KZ, KK, UZ, OTU, TK, KRG, KRM, KRA, TJK, TJR, Tja, Tju]
MDS based on mtDNA – 12 populations – (Kimura 2-parameter distance – α = 0.26)
Central Asian populations

code   N    name of population             reference       location
kk1    55   Karakalpaks (Qongirat)         Present study   Karakalpakstan
kz1    50   Kazakhs                        Present study   Karakalpakstan
otu1   53   Karakalpaks (On Tort Uruw)     Present study   Karakalpakstan
tk1    51   Turkmens                       Present study   Karakalpakstan
uz1    40   Uzbeks                         Present study   Karakalpakstan
kz3    55   Kazakhs                        Comas 1998      Alma Ati
kit3   48   Kirghizes from Talas           Comas 1998      Talas
kir3   47   Kirghizes from Sary-Tash       Comas 1998      Sary-Tash
kr4    20   Karakalpaks                    Comas 2004      Nukus
kz4    20   Kazakhs                        Comas 2004      Gasli
kuz4   20   Uzbeks from Khorezm            Comas 2004      Urgench
kg4    20   Kirghizes                      Comas 2004      Osh
tu4    20   Turkmens                       Comas 2004      Urgench
uz4    20   Uzbeks                         Comas 2004      Samarkand
td4    20   Tajiks                         Comas 2004      Samarkand
uz2    42   Uzbeks                         Quintana 2004   Samarkand
tk2    41   Turkmens                       Quintana 2004
krt2   32   Kurds from Turkmenistan        Quintana 2004
KRA    48   Kirghizes, Andijan             Present study   Andijan
KRG    20   Kirghizes, North, Jankatalab   Present study   Nukus
KRM    26   Kirghizes, North, Doboloo      Present study   Nukus
TJK    30   Tajiks, Ferghana, Kamangaron   Present study   Ferghana Valley
TJR    29   Tajiks, Ferghana, Richtan      Present study   Ferghana Valley
Tja    33   Tajiks, Samarkand, Agalic      Present study   Samarkand area
Tju    29   Tajiks, Samarkand, Urgut       Present study   Samarkand area
ui3    55   Uighurs                        Comas 1998      Alma Ati
ui4    16   Uighurs                        Comas 2004      Tashkent
shu    44   Shugnan (Pamir, Tajikistan)    Quintana 2004
MDS of mitochondrial DNA
No gen-geo correlation
MDS based on mtDNA – 28 populations – (Kimura 2-parameter distance – α = 0.26)
[MDS plot. Point labels: Kazakhs, Karakalpaks, Karakalpaks (OTU), Karakalpaks (Qongirat), Uighurs, Shugnan (Pamir, Tajikistan), Kurds from Turkmenistan, Turkmens, Kirghizes (Comas), Kirghizes A, Kirghizes G & M, Tajiks, Tajiks R, Tajiks A & U, Uzbeks, Uzbeks (Khorezm)]
Mitochondrial DNA
Mean distance (Fst) between populations; the diagonal shows intra-group distances.
Groups (N = number of populations): Karakalpaks N=3, Uzbeks N=4, Kazakhs N=3, Turkmen N=3, Kirghizes N=6, Tajiks N=5

              Karakalpaks   Uzbeks    Kazakhs    Turkmen    Kirghizes   Tajiks
Karakalpaks   -0.00013
Uzbeks         0.00951      0.00404
Kazakhs       -0.00205      0.00478   -0.00182
Turkmen        0.00889      0.01291    0.00835   -0.00079
Kirghizes      0.00626      0.0203     0.01182    0.0218     0.0084
Tajiks         0.02246      0.01517    0.02256    0.02408    0.03533     0.02497

[Annotations on the table: exogamous populations; endogamous populations]
Mitochondrial DNA
• Distances among populations are not related to linguistic or geographical distances
• Exchanges among populations differ between Turko-Mongol (exogamous) populations and Indo-Iranian (endogamous) populations
Conclusion
• Past history: clear movement from east to west in the Paleolithic, and strong population growth in the Neolithic.
• Exchanges between populations are clearly different for males and females
• Linguistic distances ?
Computational linguistics
• Background
Design of the sampling
Swadesh list
2 to 3 speakers for each sampling location (interspeaker variation)
Analyses
We are not interested in historical linguistics.
Central Asia about 1000 CE: language groups
Indo-Iranian: Sogdian (?), Khorasmian, Persian-Tajik, Ossetic (?), Pamirian, Dardic
Turkic: Oguz, Kipchak, Karluk
We want to statistically compare genetic and linguistic data
More linguistic differences among Iranian populations than among Turkic populations?
We have two major linguistic groups, Indo-Iranian and Turkic
We will focus on them separately, since each constitutes a DIALECT CHAIN
Borrowing, if it exists, is less of a problem since it reflects CONTACT (migrations), a kind of information that is also embedded in genetic data. More than historical linguistics, we are looking for a POPULATION LINGUISTICS
… so we selected distance-based approaches
Phonetic alignment:
• An alignment algorithm (string mapping)
•A metric for measuring distances between phonetic segments
Distance Matrices:
•Correlate linguistic and geographic distances
•Correlate linguistic and genetic distances (mt DNA)
Dialectometrical computation of distances (Kondrak 2004, Heeringa 2004)
From Ph Mennecier
What remains to be done in genetic analysis
• Phylogeography of Y chromosome and mtDNA: geographic patterns of genetic variation may reveal migrations contemporaneous with linguistic phenomena (replacement, borrowing, …)
• Autosomal markers
• Samples from Tajikistan
Thanks to all the people who participated in this study
In France:
Dr. François Jacquesson, linguist, CNRS
Pr. Evelyne Heyer, geneticist, MNHN, CNRS
Dr. Lluis Quintana, geneticist, CNRS, Inst. Pasteur
Dr. Philippe Mennecier, linguist, MNHN
Dr. Frederic Austerlitz, geneticist, CNRS
Dr. Svetlana Jacquesson, anthropologist, IFEAC
Dr. Franz Manni, geneticist, MNHN
Dr. R Chaix (former PhD student, in Oxford)
Dr. P Balaresque (former PhD student, in Leicester)
In Central Asia:
Dr. Tatiana Hegai, geneticist, Tashkent
Pr. Ruslan Ruzibakiev, geneticist, Tashkent
Dr. Aldashev, geneticist, Bishkek
Pr. Vadim Yagodin, archaeologist, Nukus
Dr. Bakyt Amanbaeva, archaeologist, Bishkek
Pr. Firuza Nasyrova, geneticist, Dushanbe
Alignment
Path through the alignment matrix (columns: INDUSTRY, rows: INTEREST), with the cumulative cost at each step:

        I  N  D  U  S  T  R  Y
     0
I       0
N          0
T          1
E          2
R          3  4  5
E                6
S                   6
T                      6  7  8

String        Operation     Cost
industry      Subst. i/i    0
industry      Subst. n/n    0
intdustry     Insert t      1
intedustry    Insert e      1
interdustry   Insert r      1
interustry    Delete d      1
interstry     Delete u      1
interestry    Insert e      1
interestry    Subst. s/s    0
interestry    Subst. t/t    0
interesty     Delete r      1
interest      Delete y      1
Total cost                  8

In a newer version of the algorithm, indels are weighted 1 instead of 2.
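The total cost of 8 for industry → interest can be reproduced with a standard dynamic-programming edit distance. In this sketch insertions and deletions cost 1 and identical symbols cost 0, as on the slide; the cost of 2 for substituting two different symbols is an assumption (a substitution counted as one deletion plus one insertion), since the slide only shows substitutions between identical letters. The function name is illustrative.

```python
def edit_distance(source, target, indel_cost=1, subst_cost=2):
    """Minimum total cost of turning source into target.

    indel_cost: cost of one insertion or one deletion.
    subst_cost: cost of replacing a symbol by a different one (assumed value).
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost          # delete every symbol of source
    for j in range(1, cols):
        dist[0][j] = j * indel_cost          # insert every symbol of target
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                      # deletion
                dist[i][j - 1] + indel_cost,                      # insertion
                dist[i - 1][j - 1] + (0 if same else subst_cost), # match or substitution
            )
    return dist[-1][-1]


print(edit_distance("industry", "interest"))  # 8, the total cost on the slide
```

For phonetic alignment the all-or-nothing substitution cost would be replaced by a graded distance between phonetic segments, which is the second ingredient listed under "Phonetic alignment".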