Language and Geography: Intertwining Elements in the Picture of Human Genetic Diversity
By Jillian Claire
Biol 303
November 4, 2011
Looking around any college campus in America today, one can immediately
see diversity not only in human ethnicity and genetics, but in language as
well. This diversity has existed ever since humans first
evolved in Africa (Chiaroni et al. 2009), but diversity concentrated into an area as
small as a college campus is an extremely new phenomenon. For most of human
history languages and haplogroups, or populations of people that share a genetic
marker due to common ancestry, existed only in very specific geographical areas
and spread through gradual migration (Chiaroni et al. 2009). This gradual
process led to the formation of many different populations of humans in the
Americas, each one speaking a unique language. Flora Jay, Olivier Francois, and
Michael G.B. Blum studied the relationship of Native American population structure
and languages in the paper “Predictions of Native American Population Structure
Using Linguistic Covariates in a Hidden Regression Framework,” published in the
January 2011 volume of the journal PLoS ONE.
Studying the relationship between genetics and languages is not a new
concept, but no previous studies of Native American populations had shown any
significant relationship. So these authors approached the question in a new way.
Instead of using a tree-based test, which compares genetic distances to a language
tree, they used measures of linguistic distance derived from structural features of
the language (Jay et al. 2011). A language tree is a hierarchical classification of related
languages, but in the case of Native American languages a consensus on a language
tree among linguists has not been reached (Jay et al. 2011). The authors used three
levels of linguistic differentiation: the stock level of 8 groups, the group level of 14,
and the family level of 16 (Jay et al. 2011).
The aim of their study was to determine to what extent geographic and
linguistic origin can explain an individual’s membership in a genetic cluster and to
find out if languages contribute to a better prediction of cluster membership than
geography alone (Jay et al. 2011). From the start of the study it was evident that
geography alone provides a very good prediction of cluster membership. Figure 1
below shows a map of the Americas and the genetic clusters studied. The colored
areas show where the predicted membership coefficient is greater than 0.5, or
where a correct placement was not due to chance. It is clear that most of the
populations fall within the area of prediction (Jay et al. 2011).
Figure 1. (Jay et al. 2011)
How are these genetic clusters defined, and why does geography have so
much to do with genetic differentiation? Many studies have been conducted on this
topic. Andrea Manica, Franck Prugnolle, and Francois Balloux (2005) studied the
relationship of geography and ethnicity to human genetic diversity. They state that
humans cluster into five or six broad ethnic groups that generally correspond to
continents. However, this is not a fine enough classification to predict genetic
diversity (Manica et al. 2005). In their study they found that genetic differentiation
is essentially dependent on geographic isolation, and geography is always a far
better predictor for the proportion of shared variants between two populations than
ethnicity. The proportion of shared variants is simply the number of alleles that are
shared between two populations divided by the number of loci typed (Manica et al.
2005). Most significantly, they discovered a correlation of 93% between genetic
diversity and distance from East Africa along landmasses. This strongly supports
the idea that geography can indeed predict genetic makeup (Manica et al. 2005).
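The proportion of shared variants described above can be made concrete with a short sketch. It follows the definition as worded in this paper (shared alleles summed over loci, divided by the number of loci typed), so it is an illustration rather than the exact procedure of Manica et al. The function name, the data layout, and the allele labels are all hypothetical.

```python
def proportion_shared_variants(pop_a, pop_b):
    """Shared alleles between two populations per locus typed.

    pop_a and pop_b map a locus name to the set of alleles observed
    at that locus (a made-up layout chosen for this sketch).
    """
    # Only loci typed in both populations can be compared.
    loci = pop_a.keys() & pop_b.keys()
    if not loci:
        raise ValueError("no loci were typed in both populations")
    # Count the alleles the two populations have in common at each locus.
    shared = sum(len(pop_a[locus] & pop_b[locus]) for locus in loci)
    return shared / len(loci)


# Invented allele labels, for illustration only.
population_1 = {"locus_1": {"a", "b", "c"}, "locus_2": {"d", "e"}}
population_2 = {"locus_1": {"b", "c"}, "locus_2": {"e", "f"}}

# 3 shared alleles over 2 loci typed gives 1.5.
print(proportion_shared_variants(population_1, population_2))
```

Read literally, this definition yields an average number of shared alleles per locus, so the value can exceed 1. Populations that are geographically closer would be expected to score higher.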
Another study of genetic diversity analyzing Y-chromosome diversity and
human expansion with relation to cultural evolution, conducted by Jacques Chiaroni,
Peter Underhill, and Luca Cavalli-Sforza (2009), corroborates the claim of Manica et
al. that there is a strong relationship between genetic diversity and distance from
East Africa. The observed “reasonable” slope of decay of genetic diversity with
distance from East Africa is hypothesized to be a consequence of the “Out of Africa
expansion”, the result of a serial founder effect (Chiaroni et al. 2009). Interestingly, a
study by Quentin Atkinson found that phonemic diversity also experiences a linear
fall with distance from Africa (Atkinson 2011). A phoneme is the smallest segmental
sound used to form words. Atkinson’s study illustrated this decline in phonemic
diversity very clearly. Khoisan languages, which are found in Africa,
contain about 100 phonemes, including clicks. However, Polynesian languages, the
farthest from Africa, contain only 13. By comparison, English contains about 45
(Atkinson 2011).
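The serial founder effect invoked above can be sketched numerically. The example below assumes, purely for illustration, that each new settlement along a migration route is founded by the same small number of diploid individuals. It applies the standard population-genetics expectation that one such bottleneck reduces heterozygosity by a factor of 1 - 1/(2N). The starting diversity, founder number, and number of steps are invented values, not figures from Chiaroni et al. or Atkinson.

```python
def heterozygosity_along_route(h0, founders, n_steps):
    """Expected heterozygosity at the origin and after each founder event.

    h0       -- heterozygosity of the source population (assumed value)
    founders -- diploid individuals founding each new settlement (assumed)
    n_steps  -- number of successive founder events along the route
    """
    if founders <= 0 or n_steps < 0:
        raise ValueError("founders must be positive and n_steps non-negative")
    # Each bottleneck of N diploid founders keeps a fraction 1 - 1/(2N).
    factor = 1.0 - 1.0 / (2.0 * founders)
    return [h0 * factor ** step for step in range(n_steps + 1)]


# Hypothetical route: five settlements, 50 founders each.
values = heterozygosity_along_route(h0=0.8, founders=50, n_steps=5)
for step, h in enumerate(values):
    print(f"settlement {step}: expected heterozygosity {h:.4f}")
```

Under these assumptions diversity falls at every step away from the origin. The decay is geometric, which is close to linear when the loss per step is small.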
Below is a phylogenetic tree that describes the relationships of human Y
chromosome haplogroups. The tree provides further evidence for the Out of Africa
expansion: haplogroups A and B are shown to be the oldest groups, and they are
confined to the African continent (Chiaroni et al. 2009).
Figure 2. (Chiaroni et al. 2009)
Chiaroni et al. propose that if human migrations were random, the geographic
distribution of people with a specific haplogroup would follow a normal distribution
around the point of origin of the mutation that defines the haplogroup, with various
irregularities due to geographic obstacles (Chiaroni et al. 2009). The following
figure is a series of maps showing the distribution of the haplogroups from Figure 2.
The concentration of color, which illustrates spatial distribution, shows that the
populations carrying each haplogroup did most likely migrate slowly and
homogeneously from their place of origin (Chiaroni et al. 2009).
Figure 3. (Chiaroni et al. 2009)
How does this expansion relate to cultural evolution, which includes
language? Chiaroni et al. propose that as humans migrated, culture developed and
became extremely localized, and eventually cultural evolution became so effective at
meeting biological needs that natural selection essentially ceased to have an effect.
Therefore, when humans migrated, their culture had to adapt to the new
environment, but the elimination of natural selection meant that a haplogroup
would not die out in any area (Chiaroni et al. 2009).
It is clear that geography has a very significant effect on human genetics, but
what about language? What did Jay et al. (2011) find? They set out to
determine whether geographic and linguistic variables improve the estimation of Native
American genetic cluster membership. They found that geography alone is a very
good predictor of genetic cluster membership, with a correlation coefficient of 0.81.
This result is consistent with the two studies just discussed. However,
when the linguistic variable was added the correlation improved to 0.98 for the
finest linguistic classification, the family level (Jay et al. 2011).
Figure 4. (Jay et al. 2011)
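The correlation coefficients reported above compare estimated membership coefficients with predicted ones. The following is a minimal sketch of that kind of comparison. It assumes a plain Pearson correlation, which may differ from the exact statistic used by Jay et al., and every membership value in it is invented for illustration rather than taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical membership coefficients for five individuals in one cluster.
estimated = [0.90, 0.10, 0.80, 0.20, 0.70]           # from genetic data
geography_only = [0.70, 0.30, 0.50, 0.40, 0.60]      # predicted, geography
geo_plus_language = [0.85, 0.15, 0.75, 0.20, 0.70]   # predicted, both

print("geography only:      ", round(pearson(estimated, geography_only), 2))
print("geography + language:", round(pearson(estimated, geo_plus_language), 2))
```

In this made-up example the prediction that also uses language tracks the estimated coefficients more closely. That is the kind of improvement the reported rise from 0.81 to 0.98 describes.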
Figure 4 shows the improvement in correlation when both geographical and
linguistic variables are considered. The correlation between language and
prediction of genetic cluster is very good, but it is not perfect. Figure 5A below shows
a comparison of genetic cluster membership (estimation) and the prediction of
membership based on geography and on geography plus language. Panels 5B-D show the
breakdown of language classification used in the study.
Figure 5. (Jay et al. 2011)
The red boxes show where there is a significant difference between the
prediction based on geography and the prediction based on geography plus
language (Jay et al. 2011). However, the authors state that even the prediction that
includes language is not perfect. When the historical expansion of a population
involved language replacement, as with the Tupi expansion, the lines between
genetics and language become blurred. Today the Tupi family contains
approximately 41 languages but a far greater number of very small populations
(Jay et al. 2011).
The study of Native American genetics was conducted using autosomal
microsatellite loci. The authors compiled their data set from the DNA of 512
individuals from 28 different populations, obtained from the Human Genome
Diversity Panel. The individuals were genotyped at 678 microsatellite loci. The large
sample size and the large number of loci typed support the reliability of the
results (Jay et al. 2011). The authors first checked this using simulated data: with
only 100 loci and both geographical and linguistic variables considered, the method
misclassified 0% of individuals, as seen in Figure 6 below.
Figure 6. (Jay et al. 2011)
The results of Jay et al. do not hold true only for Native American
populations. Similar conclusions have been reached for populations elsewhere in the
world. One such study, “Parallel Evolution of Genes and Languages in the Caucasus
Region,” found the same results among populations in the Caucasus region between
the Black Sea and the Caspian Sea. While this area covers much less land than the
Americas, it experienced similar diversification because of the mountainous terrain
(Balanovsky et al. 2011). This study was conducted using Y-chromosomal variation,
rather than autosomal microsatellite loci. Y-chromosomal variation is ideal for
population and evolution studies because it passes directly from father to son
without recombination with the mother’s DNA.
Unlike the Native American study, Balanovsky et al. (2011) did use tree-based
tests of haplogroup frequency and linguistic variation. Even a brief
examination of the trees in Figure 7 shows that genetic clusters and language
groups mirror each other.
Figure 7. (Balanovsky et al. 2011)
This study found that the Caucasus region contains four major haplogroups,
separated by distinct boundaries as shown in Figure 8. These boundaries also
coincide with four major linguistic groups of the Caucasus (Balanovsky et al. 2011).
Figure 8. (Balanovsky et al. 2011)
This study of linguistics and genetics in the Caucasus region reaches the
same conclusion as the study of Native American languages: linguistic diversity
is as important as geography in shaping genetic diversity. The authors believe that
the genetic structure of the Caucasus may have evolved in a parallel process with
the diversification of Caucasus languages (Balanovsky et al. 2011). The authors
postulate that language and geography are probably so closely linked because of the
mountainous nature of the Caucasus region, and that language likely had a larger
influence on genetic drift because marriage and individual migrations linked
similar populations (Balanovsky et al. 2011).
These two studies were conducted very differently and on completely
different populations, and yet they yielded the same results. Language, when
coupled with geography, is an extremely accurate predictor of genetic cluster
membership. This is because the genetic patterns in human populations reflect very
ancient demographic events (Jay et al. 2011). Both papers suggest that cultural
traits contribute to gene flow between groups, because individuals are more likely
to move between groups if they share aspects of culture, like language. Individuals
would also generally prefer to mate with someone of the same language group (Jay
et al. 2011; Balanovsky et al. 2011).
These are very conclusive findings, but is there a practical application? It is
well known that there is variation between populations with respect to
susceptibility to certain diseases (Manica et al. 2005). There has been much focus
on ethnic-specific drug processing, but these papers suggest that perhaps
geographic origin or linguistic group of the individual would be a better basis for
drug tailoring (Jay et al. 2011). The depth of relationship between genetics and
medicine is only just beginning to be understood, but these studies show that there
is a broader picture to be considered, one that could potentially lead to great strides
in personalized medicine.
Works Cited
Atkinson, Q. Phonemic diversity supports a serial founder effect model of language
expansion from Africa. Science. 332, 346-349 (2011).
Balanovsky, O., Dibirova, K., Dybo, A. et al. Parallel evolution of genes and languages
in the Caucasus region. Molecular Biology and Evolution. 28, 2905-2918
(2011).
Chiaroni, J., Underhill, P., and Cavalli-Sforza, L. Y chromosome diversity, human
expansion, drift, and cultural evolution. PNAS. 106, 20174-20179 (2009).
Jay, F., Francois, O., and Blum, M. Predictions of Native American population
structure using linguistic covariates in a hidden regression framework. PLoS
ONE. 6, 1-11 (2011).
Manica, A., Prugnolle, F., and Balloux, F. Geography is a better determinant of human
genetic differentiation than ethnicity. Human Genetics. 118, 366-371 (2005).