
RESEARCH ARTICLE

The genetic assimilation in language borrowing inferred from Jing People

Xiufeng Huang 1* | Qinghui Zhou 1* | Xiaoyun Bin 1 | Shu Lai 1 | Chaowen Lin 1 | Rong Hu 2,3 | Jiashun Xiao 4 | Dajun Luo 4 | Yingxiang Li 4 | Lan-Hai Wei 5 | Hui-Yuan Yeh 6 | Gang Chen 4 | Chuan-Chao Wang 2,3

1 College of Basic Medical Sciences, Youjiang Medical University for Nationalities, Baise, Guangxi 533000, China
2 Department of Anthropology and Ethnology, Xiamen University, Xiamen 361005, China
3 International Medical Anthropology Team, Xiamen University, Xiamen 361005, China
4 WeGene, Shenzhen 518040, China
5 Institut National des Langues et Civilisations Orientales, Paris 75214, France
6 School of Humanities, Nanyang Technological University, Nanyang 639798, Singapore

Correspondence
Chuan-Chao Wang, No. 422, Siming South Road, Xiamen, Fujian, China. Email: [email protected]
and
Xiufeng Huang, No. 98, Chengxiang Road, Baise, Guangxi, China. Email: [email protected]

Funding information
The Construction Project for Promoting Technological Innovation of Guangxi Universities for the Laboratory of Physical Characteristics of Guangxi Minorities, Grant Number: Gui[2015]5; Nanqiang Outstanding Young Talents Program of Xiamen University

Abstract

Objectives: The Jing people are a recognized ethnic group in Guangxi, southwest China, who immigrated from Vietnam during the 16th century. They speak Vietnamese but with many borrowings from Cantonese, Zhuang, and Mandarin. However, it is unclear whether there was large-scale gene flow from surrounding populations into the Jing people during their language change, because very little genetic information is available for this population.

Materials and Methods: We collected blood samples from 37 Jing and 3 Han Chinese individuals from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA), ADMIXTURE analysis, f-statistics, qpWave, and qpAdm to infer the population genetic structure and admixture.

Results: Our data revealed that the Jing people are genetically similar to the populations in southwest China and mainland Southeast Asia. Compared with Vietnamese, however, they show significant evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be around 35–42% in different Jing groups using southern Han Chinese as a proxy. The majority of the paternal lineages of the Jing people are most likely from surrounding East Asians.

Discussion: We conclude that the formation and language change of present-day Jing people have involved genetic assimilation of surrounding East Asian populations. The language borrowing, in this case, is not only a cultural phenomenon but has involved demic diffusion.

KEYWORDS

gene flow, Jing people, language borrowing, population admixture

1 | INTRODUCTION

The Jing people form a relatively small population that lives mostly on the three islands of Wutou, Wanwei, and Shanxin at the southern tip of Guangxi, off the southwestern coast of mainland China. They are officially recognized as one of the 56 ethnic groups in China. The historical literature records that the ancestors of the Jing people migrated from northern Vietnam to southwest China at the beginning of the 16th century (Olson, 1998). They now live among the Han Chinese and Tai-Kadai speaking Zhuang people in nearby counties and towns. The language that the Jing people speak is similar to Vietnamese but differs from it in many respects, not only in pronunciation and vocabulary but also in grammar. For example, the Jing language has adopted a tonal system from Tai-Kadai languages (Wei, 2006).

*Xiufeng Huang and Qinghui Zhou contributed equally to this work.

© 2018 Wiley Periodicals, Inc. wileyonlinelibrary.com/journal/ajpa Am J Phys Anthropol. 2018;166:638–648.

Received: 9 November 2017 | Revised: 10 January 2018 | Accepted: 15 February 2018

DOI: 10.1002/ajpa.23449

The special population history makes the Jing people a very good example for investigating the relationship between language borrowing and genetic influence from surrounding populations. Forster and Renfrew proposed that language change in an already-populated region may require a minimum proportion of immigrant males, as reflected in the strong association of languages with paternal Y chromosomes but not with maternal mitochondrial DNA (mtDNA) (Forster and Renfrew, 2011). Therefore, it is expected that both Vietnamese and local populations have contributed to the gene pool of modern Jing people, especially from the perspective of the Y chromosome. A striking feature of the Y chromosomal profile of Vietnamese is the high frequency of haplogroup O2b-M176 (updated name O1b2-M176), ranging from 8.3 to 14%, which is mainly found among Japanese and Korean people (Jin et al., 2003; Kim et al., 2011). The Y chromosomal haplogroup frequencies of Vietnamese (Kim et al., 2011) and Kinh Vietnamese (Poznik et al., 2016; https://www.yfull.com/tree/) are summarized in Table 1. Zhong et al. genotyped the Y chromosomal SNPs of 45 Jing male individuals and found 91% belonging to the East and Southeast Asian specific haplogroup O-M175, 4.4% belonging to C3-M217, 2.2% belonging to the South Asian specific lineage H1-M370, and 2.2% belonging to Q1a1a-M120 (Zhong et al., 2011). This Y chromosomal profile suggests that the Jing people may have some South Asian admixture, but it cannot tease apart Vietnamese from surrounding East Asian ancestry without genotyping downstream markers of the basal East and Southeast Asian clade O-M175.
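The percentages quoted from Zhong et al. can be checked against the sample size of 45 males. The short Python sketch below is only an illustration: the per-haplogroup counts are back-calculated assumptions (the text reports percentages, not counts), and the helper name `to_percentages` is hypothetical.

```python
# Counts are hypothetical back-calculations; the text reports only percentages.
N_MALES = 45
ASSUMED_COUNTS = {
    "O-M175": 41,
    "C3-M217": 2,
    "H1-M370": 1,
    "Q1a1a-M120": 1,
}


def to_percentages(counts, total):
    """Convert haplogroup counts to percentages of `total`, rounded to 0.1."""
    if sum(counts.values()) != total:
        raise ValueError("counts must add up to the sample size")
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


percentages = to_percentages(ASSUMED_COUNTS, N_MALES)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under these assumed counts the rounded values agree with the reported 91%, 4.4%, 2.2%, and 2.2%, which shows how little resolution a single basal marker such as O-M175 provides in a sample of this size.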

On the maternal side, Li et al. reported that the Kinh Vietnamese have high frequencies of mtDNA haplogroups B (B*, 4.2%; B4, 10.5%; B5, 6.3%), F (F*, 10.4%; F1a, 18.8%), M7b (M7b*, 8.3%; M7b1, 8.3%), and R (R*, 2.1%; R9b, 6.3%) (Li et al., 2007). Pischedda et al. reported that typical Southeast Asian mtDNA lineages predominate in the Vietnamese: for example, haplogroup M7 comprises 20%, M(×D,C) comprises 29%, R9’F reaches 27%, and haplogroup B accounts for 25% (Pischedda et al., 2017). Using genomic data, Pischedda et al. also proposed that the Vietnamese are an admixture, dated to about 800 years ago, of an East Asian component from south China and a southern Asian ancestral composite represented by the Malay (Pischedda et al., 2017). Although a few genome-wide studies of Vietnamese have been conducted so far, genomic and mtDNA data for the Jing people have not been reported before.

In this study, we report genome-wide data of the Jing people for the first time. We analyze about 600,000 genome-wide SNPs, including 18,963 Y-chromosome and 4,448 mtDNA phylogenetically relevant SNPs, from 40 samples collected from three Jing groups and one Han Chinese group on the three islands of Wutou, Wanwei, and Shanxin in southern Guangxi. We aim to explore the genetic structure and admixture of the Jing populations and to shed light on language change from a genetic perspective.

2 | MATERIALS AND METHODS

2.1 | Sampling and genotyping

We collected blood samples from 24 unrelated Jing and 3 Han Chinese individuals from Wanwei, 8 Jing individuals from Wutou, and 5 Jing individuals from Shanxin. The sampling criteria were that the participants had lived there, with no recorded intermarriage with surrounding populations, for at least three generations. Our study was approved by the Ethical Committee of Youjiang Medical University for Nationalities. The study was conducted in accordance with the human and ethical research principles of Youjiang Medical University for Nationalities. Informed consent was obtained from all individual participants included in the study. Genomic DNA was extracted using the DP-318 Kit (Tiangen Biotechnology, Beijing). DNA quality control was carried out at the experimental centre of BGI-Shenzhen. Genotyping was performed on Affymetrix WeGene V1 arrays covering 596,744 SNPs at the WeGene genotyping centre, Shenzhen. The WeGene V1 arrays were designed to identify all known paternal Y-chromosome and maternal mtDNA lineages by adding 18,963 Y-chromosome and 4,448 mtDNA phylogenetically relevant SNPs to the Infinium Global Screening Array (GSA) (Yao et al., 2017a,b). The dataset generated during the current study can be downloaded from the following link when the paper is published: http://pan.xmu.edu.cn/s/f4THZOEvSGs.

TABLE 1 The Y chromosomal haplogroup frequency of Vietnamese and Kinh

Vietnamese                          Kinh
Haplogroup  SNP     Frequency       Haplogroup      SNP             Frequency
O2*         M122    2.08%           O1b1a1a1a1a     M88             26.10%
O2a*        M324    29.17%          O2a2a1a2        M7              13.00%
O2a2        P201    27.08%          O2a1c1a         F11             8.70%
C2          M217    12.50%          O1b1a1a1b       M1283           6.50%
O1a         M119    4.17%           O2a2b1a1a3a     F2188           4.30%
O1b         P31     10.42%          O2a2b1a1a4a     CTS5063         2.20%
O1b2a1a1    47z     4.17%           O2a2b1a1a6      CTS1642         2.20%
D1          M15     2.08%           O1a1a1          F446 (xF140)    4.30%
K*          M9      2.08%           O2a1            L127.1          4.30%
N           M231    2.08%           O2a2b1a2a1a2    F634            4.30%
                                    Q1a1a           M120            4.30%
                                    O2a2b2a         F871            2.20%
                                    O2a2b1a2b       F1725           2.20%
                                    O2a2b1a2a1a1    F48             2.20%
                                    N1b2a           M1811           2.20%
                                    N1a2a           M128            2.20%
                                    C2c1b           F845            2.20%
                                    F               Y27277          2.20%

2.2 | Data merging

We merged our 40 samples with previously published populations from the International HapMap Project Phase 3 (International HapMap Consortium, 2003), the Human Genome Diversity Project (HGDP) (Li et al., 2008), the Simons Genome Diversity Project (SGDP) (Mallick et al., 2016), Vietnamese and Thai samples of the Asian Diversity Project (ADP) (Liu et al., 2017), Tibetan samples from Lhasa and Yunnan province (Beall et al., 2010; Wang et al., 2011), the archaic Altai Neanderthal (Prüfer et al., 2014) and Denisovan genomes (Meyer et al., 2012), the 40,000-year-old Tianyuan sample (Yang et al., 2017), and ancient West Eurasians (Jones et al., 2015; Lazaridis et al., 2014; Mathieson et al., 2015). The resulting combined dataset covered 280,950 SNPs, which were used in subsequent analyses.

2.3 | Principal component analysis

We used smartpca (version: 13050), part of the EIGENSOFT package (Patterson, Price, & Reich, 2006), to carry out Principal Component Analysis (PCA). We did not perform any outlier removal iterations (numoutlieriter: 0). We set all other options to the default. We assessed statistical significance with a Tracy-Widom test using the twstats program of EIGENSOFT. All of the first six principal components that we discuss and plot in what follows were highly statistically significant (P < 10⁻¹²).

2.3.1 | f3-statistics

We computed statistics of the form f3 (Mbuti; X, Y) using the qp3Pop program of ADMIXTOOLS (Patterson et al., 2012; Reich, Thangaraj, Patterson, Price, & Singh, 2009), which measures the shared genetic drift between populations X and Y since their separation from an African outgroup, Mbuti.
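As an illustration of what the outgroup f3-statistic measures, the following Python sketch averages the product of allele-frequency differences over SNPs. It is a simplified sketch, not the qp3Pop implementation: it assumes population allele frequencies are already known, omits the finite-sample corrections that ADMIXTOOLS applies, and uses made-up toy frequencies and a hypothetical function name.

```python
def outgroup_f3(p_out, p_x, p_y):
    """Naive f3(Outgroup; X, Y): mean over SNPs of (out - x) * (out - y).

    Each argument is a list of allele frequencies, one entry per SNP.
    Larger values mean X and Y share more drift away from the outgroup.
    """
    if not p_out or not (len(p_out) == len(p_x) == len(p_y)):
        raise ValueError("need three non-empty lists of equal length")
    products = [(o - x) * (o - y) for o, x, y in zip(p_out, p_x, p_y)]
    return sum(products) / len(products)


# Toy allele frequencies at three SNPs (illustrative values only).
toy_out = [0.1, 0.5, 0.9]
toy_x = [0.3, 0.5, 0.6]
toy_y = [0.4, 0.4, 0.5]
toy_f3 = outgroup_f3(toy_out, toy_x, toy_y)
```

In this toy case X and Y have moved away from the outgroup in the same direction at the first and third SNPs, so the statistic is positive.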

2.3.2 | f4-statistics

We computed f4-statistics of the form f4 (X, Y; Test, Outgroup) using the qpDstat program of ADMIXTOOLS (Patterson et al., 2012; Reich et al., 2009) with default parameters to show whether population Test is symmetrically related to X and Y or shares an excess of alleles with either of the two, with standard errors computed with a block jackknife.
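The logic of an f4 Z-score can be sketched in a few lines of Python. This is a hedged illustration rather than the qpDstat algorithm: it assumes known allele frequencies, splits SNPs into equal-sized contiguous blocks instead of genomic blocks, and uses an unweighted delete-one-block jackknife. The function names and the simulated toy data are hypothetical.

```python
import math
import random


def f4_per_snp(p_x, p_y, p_test, p_out):
    """Per-SNP terms (x - y) * (test - out); their mean is f4(X, Y; Test, Outgroup)."""
    return [(x - y) * (t - o) for x, y, t, o in zip(p_x, p_y, p_test, p_out)]


def block_jackknife(values, n_blocks=10):
    """Return (estimate, standard error, Z) using a delete-one-block jackknife."""
    n = len(values)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least two blocks and one value per block")
    total = sum(values)
    estimate = total / n
    size = n // n_blocks
    leave_one_out = []
    for b in range(n_blocks):
        start = b * size
        end = n if b == n_blocks - 1 else start + size  # last block takes the remainder
        block = values[start:end]
        leave_one_out.append((total - sum(block)) / (n - len(block)))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy data: X and Y drift independently from the outgroup; Test is a copy of X,
# so Test should share an excess of alleles with X (positive f4 and Z).
rng = random.Random(0)
p_out = [rng.uniform(0.2, 0.8) for _ in range(2000)]
p_x = [o + rng.gauss(0.0, 0.05) for o in p_out]
p_y = [o + rng.gauss(0.0, 0.05) for o in p_out]
p_test = list(p_x)
estimate, se, z = block_jackknife(f4_per_snp(p_x, p_y, p_test, p_out))
```

A Z-score near zero would indicate that Test is symmetrically related to X and Y; swapping X and Y flips the sign of the statistic.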

2.3.3 | qpWave analysis

We used the qpWave program of ADMIXTOOLS (Patterson et al., 2012; Reich et al., 2009) with default parameters to test the number of sources of ancestry that are needed to explain the variation of Jing and Vietnamese populations. We used Mbuti, Druze, Bedouin, Kalash, Papuan, Sardinian, Karitiana, Onge, Ulchi, and Eskimo_Sireniki as outgroups because those groups are unlikely to have been affected by recent gene flow with Jing and Vietnamese and might be differentially related to the ancestral sources of Jing.

2.3.4 | qpAdm estimation

We used the qpAdm program of ADMIXTOOLS (Patterson et al., 2012; Reich et al., 2009) with default parameters to estimate the admixture proportions of tested populations with the proposed sources. We used Vietnamese and Han as two sources and the same ten populations as outgroups as in the above qpWave analysis.

2.4 | ADMIXTURE analysis

We carried out model-based clustering analysis using ADMIXTURE 1.30 (Alexander, Novembre, & Lange, 2009) by combining the present-day worldwide populations with our 40 newly-genotyped individuals. We used PLINK v1.90 (Chang et al., 2015) to thin the dataset of 280,950 autosomal SNPs to remove SNPs in strong linkage disequilibrium, employing a window of 200 SNPs advanced by 25 SNPs and an r² threshold of 0.4 (with the flag: --indep-pairwise 200 25 0.4). A total of 157,880 SNPs remained for analysis after this procedure. We ran ADMIXTURE with default 5-fold cross-validation (--cv=5), varying the number of ancestral populations between K = 2 and K = 16 in 100 bootstraps with different random seeds. We used the unsupervised ADMIXTURE approach, in which allele frequencies for non-admixed ancestral populations are unknown and are computed during the analysis. We used point estimation and terminated the block relaxation algorithm when the objective function delta was <0.0001. We chose the best run according to the highest log likelihood. We used cross-validation to identify an “optimal” number of clusters. We observed the lowest CV errors for K = 14.
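Choosing K by cross-validation amounts to picking the value with the smallest reported CV error. The Python sketch below assumes that the ADMIXTURE logs have been concatenated into one string and that each K contributes a line of the form `CV error (K=14): 0.4310`. The error values shown are invented for illustration, and `best_k` is a hypothetical helper, not part of ADMIXTURE.

```python
import re

# Pattern for lines such as "CV error (K=14): 0.4310" (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) parsed from log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors


# Invented example values; real logs would come from runs with --cv=5.
example_log = """\
CV error (K=13): 0.4321
CV error (K=14): 0.4310
CV error (K=15): 0.4318
"""
chosen_k, cv_errors = best_k(example_log)
```

If several runs per K are present in the same text, this sketch keeps only the last value parsed for each K, so selecting the best run per K would have to happen beforehand.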

2.5 | Y chromosomal and mtDNA haplogroup assignment

We assigned the Y chromosomal and mtDNA haplogroups using in-house tools following the International Society of Genetic Genealogy Y-DNA Haplogroup Tree 2016 (Version: 1.87, Date: March 29, 2016, http://www.isogg.org/tree/, March 30, 2016) and mtDNA tree Build 16 (van Oven and Kayser, 2009; http://www.phylotree.org/).


3 | RESULTS

We first carried out the model-based ADMIXTURE clustering analysis to get a broad overview of the worldwide population genetic structure. We used cross-validation to identify an “optimal” number of clusters and observed the lowest CV errors for K = 14 (Figure 1, Supporting Information Figure S1). At K = 14, we observed five main components in East Eurasia. The first component is enriched in Melanesian and Papuan but seldom seen in any other population. The second component is enriched in Yakut and is also present in Tungusic and Mongolic speaking groups in northern China. The third component is enriched in Tibetans and is also prevalent in Han Chinese and Tibeto-Burman speaking groups. The fourth component is found at its highest proportions in the populations living in south China and Southeast Asia, such as Tai-Kadai speaking Dai and Thai, Austroasiatic speaking Vietnamese, and Austronesian speaking Ami and Atayal. Our newly genotyped Jing samples are genetically similar to Dai and Vietnamese, with high proportions of this fourth component. The Han Chinese samples collected from Wanwei fall outside the general clustering pattern of Han Chinese but show great similarity to the Jing people. The fifth component is enriched in Japanese and is also present in various groups in East Asia. The Thai and Cambodian also carry some of the component that is enriched in Gujarati Indians (GIH), but the Jing samples do not seem to have this ancestry. The outgroup f3-statistics of the form f3 (Mbuti; X, Y) are consistent with the patterns observed in the ADMIXTURE plot, suggesting a close genetic proximity between the different Jing groups and other southern Chinese and Southeast Asian populations, especially Ami, Atayal, Dai, and Vietnamese (Figure 2).

FIGURE 1 The ADMIXTURE analysis of newly generated Jing and Han Chinese samples from Guangxi with other worldwide populations. We here only show the East Eurasian part of the plot at K = 14 with the lowest CV errors

FIGURE 2 Shared genetic drift among populations, measured by Outgroup f3 statistics (Mbuti; X, Y). Lighter colors indicate more shared drift

FIGURE 3 Top two principal components of newly generated Jing and Han Chinese samples from Guangxi with other East Asian populations. CHB: Han Chinese in Beijing, China; CHD: Chinese in metropolitan Denver, CO, United States; JPT: Japanese in Tokyo, Japan; Han-NChina: Han Chinese in northern China

We next performed PCA using East and Southeast Asian populations. We observed the following five genetic clusters: Japanese; Tungusic and Mongolic-speaking groups; Tibetans; Han Chinese; and Southeast Asians including Cambodian and Thai. The Vietnamese are plotted between the Han Chinese and Southeast Asian clusters. Our newly reported Jing samples lie right between Vietnamese and Han Chinese in PC1 and PC2 (Figure 3). In PC3 and PC4, most of the Jing samples overlap with Han Chinese (Figure 4). The PCA plots suggest that Jing have an excess affinity to Han Chinese compared with Vietnamese.

FIGURE 4 The third and fourth principal components of newly generated Jing and Han Chinese samples from Guangxi with other East Asian populations. CHB: Han Chinese in Beijing, China; CHD: Chinese in metropolitan Denver, CO, United States; JPT: Japanese in Tokyo, Japan; Han-NChina: Han Chinese in northern China

We then used qpWave to test whether Vietnamese and Jing are homogeneous by determining the number of sources of ancestry that are needed to explain the variation of the Vietnamese and Jing populations. When the Vietnamese and the three Jing populations are analyzed together as the tested groups, a minimum of two streams of ancestry are needed to relate them to the outgroups: P = 1.342e-22 for rank 0, which amounts to a test for a single source of ancestry; P = 0.610 for rank 1, which amounts to a test for two streams of ancestry. We next dropped each outgroup in turn to determine whether this pattern is driven by a certain outgroup with extra affinity to the tested populations. The single source of ancestry is rejected in all of the outgroup-dropping tests (Table 2). Thus, the qpWave analysis strongly suggests that Vietnamese and Jing are not derived from a single homogeneous population.

Motivated by the PCA and qpWave analyses, we computed f4-statistics of the form f4 (Vietnamese, Mbuti; Jing, Test), f4 (Test, Mbuti; Jing, Vietnamese), and f4 (Jing, Mbuti; Vietnamese, Test) to formally determine which populations attract the Jing people compared with Vietnamese (Table 3). We found that Vietnamese share significantly more alleles with Jing than with many northern East Asian groups but seem to be equally related to Jing and southern groups such as Dai, She, and Atayal. Most East Asian groups share significantly more alleles with Jing than with Vietnamese, suggesting that there might have been gene flow from other East Asians into the Jing people after their separation from Vietnamese about 500 years ago. This signal is not driven by excess affinity between Vietnamese and archaic humans, Upper Paleolithic East Asians, West Eurasians, South Asians, or Oceanians, since we did not find any significant negative Z-scores in f4 (Test, Mbuti; Jing, Vietnamese) when using Altai Neandertal, Denisovan, Tianyuan, Loschbour, Anatolia_Neolithic, Kotias, French, Balochi, Sindhi, Burusho, GIH, Onge, and Papuan in the position of “Test” (Table 3). The Jing people share significantly more alleles with Han (southern Han Chinese from HGDP), CHD, She, Dai, and Ami than with Vietnamese, which implies that the East Asian related gene flow into the Jing people probably came from southern Han Chinese and other southern indigenous populations.

TABLE 2 The P values of the qpWave analysis suggest Vietnamese and Jing are not derived from a single homogeneous population

                    P value
Outgroup dropped    1 source (rank 0)    2 sources (rank 1)
No drop             1.342E-22            0.610
Mbuti               6.222E-22            0.854
Druze               7.560E-23            0.468
Bedouin             1.539E-23            0.543
Kalash              2.024E-22            0.734
Papuan              8.499E-24            0.508
Sardinian           3.060E-23            0.618
Karitiana           2.078E-22            0.485
Onge                1.665E-21            0.768
Ulchi               2.063E-15            0.480
Eskimo_Sireniki     4.625E-19            0.498

TABLE 3 The Z scores of f4 (Test, Mbuti; Jing, Vietnamese), f4 (Jing, Mbuti; Vietnamese, Test) and f4 (Vietnamese, Mbuti; Jing, Test)

                    f4 (Test, Mbuti; Jing, Vietnamese)  f4 (Jing, Mbuti; Vietnamese, Test)  f4 (Vietnamese, Mbuti; Jing, Test)
Test                Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou
CHB                 9.445       3.787       6.476       2.488       4.072       4.198       9.694       5.670       8.463
CHD                 9.128       4.059       6.478       -5.614      -3.925      -3.471      4.330       2.176       4.347
Han-NChina          8.740       4.209       6.112       8.135       7.643       8.670       12.901      9.254       12.138
Tujia               8.702       3.818       6.191       -0.712      -0.084      0.177       4.915       3.237       5.015
Han                 8.664       4.145       6.173       -6.608      -5.696      -4.634      2.431       1.145       2.923
JPT                 8.656       3.518       6.265       12.120      12.500      12.729      16.022      10.661      14.044
Mongola             8.439       3.191       5.947       21.577      20.165      21.801      24.548      18.945      22.454
Hezhen              8.357       3.204       6.178       14.031      14.072      13.969      17.953      13.931      16.259
Japanese            8.348       3.323       6.103       9.699       9.832       10.056      13.628      9.759       12.614
Xibo                8.280       2.997       5.000       18.346      18.800      19.108      22.609      16.980      20.256
Daur                8.176       3.277       5.591       16.234      15.890      16.554      19.748      15.796      18.827
Tu                  8.172       4.197       5.279       21.747      19.555      22.160      24.924      18.680      22.799
She                 8.139       3.854       5.725       -4.488      -3.734      -3.266      1.304       0.594       2.024
Tibetan_Yunnan      8.107       3.357       5.717       21.609      19.989      21.171      24.511      17.095      20.482
Oroqen              8.077       3.785       5.899       16.867      15.611      16.566      20.013      16.224      18.417
Tibetan_Lhasa       7.763       3.590       5.921       27.183      24.430      25.809      29.926      21.856      25.502
Miao                7.683       3.144       5.818       -2.168      -1.258      -1.545      2.983       1.817       3.318

(Continues)

We estimated the proportion of ancestry derived from surrounding East Asian populations in the Jing people. We used Vietnamese and the southern Han Chinese from HGDP as two proxies of the possible ancestral sources. The Jing people are estimated to have derived 35.5% to 43.0% of their ancestry from southern Han related populations (Table 4). The Jing people on Shanxin island have the lowest Han related admixture, while the people in Wanwei and Wutou have higher proportions of the Han related component. This observation is consistent with the geographic information that Shanxin is an isolated island with only Jing people living there, whereas Wanwei and Wutou have been connected to the mainland since the 1960s, with Jing people living together with Han and Zhuang people. The Han Chinese samples collected from Wanwei are estimated to have the highest Han related ancestry, at about 53.7%, with the remaining 46.3% being Vietnamese related ancestry (Table 4).
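To make the uncertainty around these point estimates explicit, the sketch below turns the Han-related proportions and standard errors reported in Table 4 into approximate intervals. The normal approximation (estimate ± 1.96 × standard error) is an assumption added here for illustration and is not part of the qpAdm analysis described in the text; the function name is hypothetical.

```python
# Han-related proportion and block-jackknife standard error from Table 4.
TABLE4_HAN = {
    "Jing_Wanwei": (0.430, 0.046),
    "Jing_Shanxin": (0.355, 0.080),
    "Jing_Wutou": (0.368, 0.061),
    "Han_Wanwei": (0.537, 0.095),
}


def normal_interval(estimate, std_err, z=1.96):
    """Approximate interval estimate +/- z * std_err, clipped to [0, 1]."""
    low = max(0.0, estimate - z * std_err)
    high = min(1.0, estimate + z * std_err)
    return low, high


intervals = {pop: normal_interval(est, se) for pop, (est, se) in TABLE4_HAN.items()}
for pop, (low, high) in intervals.items():
    han = TABLE4_HAN[pop][0]
    # With two sources, the Vietnamese-related share is 1 minus the Han share.
    print(f"{pop}: Han {han:.3f} ({low:.3f} to {high:.3f}), Vietnamese {1 - han:.3f}")
```

Because the model has only two sources, each Vietnamese-related proportion is simply one minus the Han-related proportion and carries the same standard error.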

4 | DISCUSSION

The Jing people are a relatively small population that has lived mostly on the three islands of Wutou, Wanwei, and Shanxin in southern Guangxi since separating from Vietnamese and migrating to China about 500 years ago. Their language has changed considerably compared with Vietnamese by borrowing many words, constructions, and other features from the surrounding Cantonese, Zhuang, and Mandarin languages. However, it is unclear whether their language change was accompanied by gene flow

TABLE 3 (Continued)

                    f4 (Test, Mbuti; Jing, Vietnamese)  f4 (Jing, Mbuti; Vietnamese, Test)  f4 (Vietnamese, Mbuti; Jing, Test)
Test                Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou       Wanwei      Shanxin     Wutou
Naxi                7.620       3.743       5.199       10.861      10.274      11.483      14.224      11.018      13.547
Yakut               7.567       2.488       4.611       38.188      35.768      36.803      39.817      31.699      35.645
Yi                  7.551       3.081       5.895       11.292      10.972      11.213      15.095      10.929      13.662
Dai                 7.073       3.441       6.166       -4.353      -4.064      -4.348      0.619       0.104       1.438
Atayal.SGDP         5.806       1.941       4.507       -1.591      -0.845      -1.503      0.557       0.346       0.892
Thai                4.980       2.642       4.964       44.025      40.492      40.981      29.330      17.251      22.902
Lahu                4.879       2.716       4.825       6.111       5.180       5.496       8.413       6.378       8.230
Ami.SGDP            4.875       2.709       3.754       -5.112      -4.802      -4.516      -2.981      -2.773      -2.383
Cambodian           2.936       1.185       3.072       22.964      21.075      22.058      21.380      15.741      19.675
Altai Neandertal    0.653       -0.644      -0.698      100.000     100.000     100.000     100.000     100.000     100.000
Denisovan           0.640       0.098       -0.239      100.000     100.000     100.000     100.000     100.000     100.000
Loschbour           -1.117      -1.110      1.151       43.816      41.703      42.169      44.221      42.809      44.169
Anatolia_Neolithic  -1.408      -1.648      0.858       66.542      63.916      64.900      65.829      63.464      65.420
Kotias              -1.550      -1.566      -0.438      45.420      43.674      44.484      45.729      44.132      45.884
Tianyuan            1.238       -0.096      1.244       27.729      26.605      27.772      28.567      27.468      29.021
Onge.DG             -0.315      -1.937      1.586       39.445      38.259      38.064      40.340      37.736      39.588
Papuan              1.026       -0.372      0.980       41.744      40.688      41.403      42.628      41.003      42.788
Balochi             -0.787      -1.632      0.780       71.888      71.181      70.318      70.833      67.538      69.835
Sindhi              -0.200      -1.770      1.242       71.990      70.875      70.546      70.945      66.791      69.608
Burusho             0.760       -0.707      1.616       69.201      67.451      67.383      68.683      63.496      66.557
GIH                 -0.140      -1.530      1.205       68.723      67.702      67.254      67.100      62.974      65.846
French              -0.448      -1.222      1.248       66.323      63.970      64.861      66.009      62.988      65.077

We highlight the significant positive (Z > 3) and negative (Z < -3) values.

TABLE 4 Mixture proportions estimated in qpAdm using Vietnamese as one source population and southern Han Chinese as the other

                            Proportion
Test population    P        Vietnamese    Han      Std. err
Jing_Wanwei        0.151    0.570         0.430    0.046
Jing_Shanxin       0.140    0.645         0.355    0.080
Jing_Wutou         0.307    0.632         0.368    0.061
Han_Wanwei         0.399    0.463         0.537    0.095

“P” is the P value for rank 1; “proportion” refers to the proportion of gene flow from the two sources. The “std. err” is the standard error estimated using a block jackknife.

TABLE 5 Y chromosomal and mtDNA haplogroup assignments

ID Sex Population mtDNA Y chromosome

Han01 Female Han_Wanwei C7a1 –

Han02 Female Han_Wanwei Y1 –

Han03 Female Han_Wanwei F1g –

Jing01 Female Jing_Shanxin F2a –

Jing02 Female Jing_Shanxin R9b2 –

Jing03 Female Jing_Shanxin B5a –

Jing04 Female Jing_Shanxin M7b1a1b –

Jing05 Male Jing_Shanxin B5b1 O1b1a1a1a1a-M88

Jing06 Male Jing_Wanwei M7b1a1 O1a1a1a-F140

Jing07 Female Jing_Wanwei M8a2b –

Jing08 Male Jing_Wanwei M12a2 O2a2b1a1a6-CTS1642

Jing09 Male Jing_Wanwei F2b O2a2b1a1a6-CTS1642

Jing10 Female Jing_Wanwei F2b –

Jing11 Male Jing_Wanwei B5a1 O2a2b1a1a6-CTS1642

Jing12 Male Jing_Wanwei F2b O2a2b1a1a6-CTS1642

Jing13 Female Jing_Wanwei N9a6 –

Jing14 Male Jing_Wanwei F1a1a O2a2b1a1a6-CTS1642

Jing15 Female Jing_Wanwei F2b –

Jing16 Male Jing_Wanwei N9a6 O1a1a1a1-F78

Jing17 Male Jing_Wanwei F1a O2a2b1a1a6-CTS1642

Jing18 Female Jing_Wanwei M8a2b –

Jing19 Female Jing_Wanwei B4a1e –

Jing20 Female Jing_Wanwei M7c2 –

Jing21 Male Jing_Wanwei B4a1e O1b1a1a1-M1348, M1310

Jing22 Female Jing_Wanwei M7c2 –

Jing23 Male Jing_Wanwei B4 O2a1a-Page127, F964, F3143

Jing24 Male Jing_Wanwei D5a2a1 O2a1a-Page127, F964, F3143

Jing25 Male Jing_Wanwei M7b1a1a3 D1a1-M15

Jing26 Male Jing_Wanwei F3a1 N1c1a-M178

Jing27 Male Jing_Wanwei D5b O2a2b1a1a-F8, F42

Jing28 Male Jing_Wanwei M7c2 O1a1a1a1-F78

Jing29 Male Jing_Wanwei F2b O1a1a1a-F140

Jing30 Female Jing_Wutou F1g –

Jing31 Male Jing_Wutou M7b1a1 O2a2a1a2-M7

Jing32 Female Jing_Wutou R9c1b1 –

Jing33 Female Jing_Wutou M71a1a –

Jing34 Female Jing_Wutou B5a1 –

Jing35 Male Jing_Wutou M7b1a1 O2a1c1a-F11

Jing36 Female Jing_Wutou B4 –

Jing37 Female Jing_Wutou B5a1a –


from surrounding populations due to the very limited available genetic information. This comprehensive genome-wide study of Jing people sheds light on language change from a genetic perspective. Our results show that Jing people are genetically close to Vietnamese but also show significant evidence of additional ancestry derived from southern Han Chinese and other southern indigenous populations. The language borrowing of Jing people is therefore not only a cultural phenomenon but has also involved gene flow. We note that the Han Chinese in Wanwei look genetically more like Jing than Han, which we suspect is because there are only a few Han individuals in the Jing villages and they intermarry with Jing people.

On the paternal side, one interesting question is whether the language transition in Jing people follows the male-induced hypothesis proposed by Forster and Renfrew (Forster and Renfrew, 2011). If there were male-dominated admixture from surrounding populations into Jing, we would expect to find lineages that are common in surrounding populations but rare in Vietnamese. We found a high frequency of haplogroup O2a2b1a1a-F8, F42 among the 19 Jing males, comprising 5.3% O2a2b1a1a*-F8, F42 and 31.6% O2a2b1a1a6-CTS1642 (Table 5). The haplogroup O2a2b1a1a-F8, F42 has been suggested to be one of the three super-grandfather lineages of present-day Chinese that experienced star-like expansions in the Neolithic Era, about 5.4 thousand years ago (Yan et al., 2014). The sublineage O2a2b1a1a6-CTS1642 reaches a high frequency in Dai people in Xishuangbanna (23.1%) but a very low frequency in Kinh Vietnamese (2.2%) (Poznik et al., 2016). This sublineage is therefore more likely to have been introduced into Jing people from surrounding Tai-Kadai speaking populations than to have come from Vietnamese. However, without large-scale high-resolution genotyping, we cannot rule out the possibility that some populations of Vietnam also have high frequencies of the O2a2b1a1a6-CTS1642 lineage. The second most frequent haplogroup is O1a1a1a-F140, which together with its sublineage O1a1a1a1-F78 accounts for 21% of Jing males (Table 5). The haplogroup O1a1a1a is a sublineage of O1a-M119, which is prevalent along the southeast coast of China and occurs at high frequencies in Tai-Kadai speaking people and Taiwan aborigines (Wang & Li, 2013). The haplogroup O1a1a1a-F140 is also not observed in Kinh Vietnamese. We also found the Hmong-Mien enriched lineage O2a2a1a2-M7 (Cai et al., 2011), the central-eastern Chinese enriched lineage O2a1c1a-F11 (Wang et al., 2013; Yao et al., 2017b), and the Tai-Kadai frequent lineage O1b1a1a1a1a-M88 in Jing people; the last of these has also been detected in Kinh Vietnamese (Poznik et al., 2016). The maternal mtDNA lineages of Jing people are consistent with the general profile of southern China and Southeast Asia, with high frequencies of haplogroups B, F, and M7 (Li et al., 2007).
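The paternal frequencies quoted above can be re-derived from the male rows of Table 5. The sketch below is illustrative only: the helper names `clade_frequency` and `exact_frequency` are hypothetical, and matching clades by name prefix is an assumption that works here only because the haplogroup names in Table 5 are nested (a sublineage name extends its parent's name).

```python
# Y-chromosome labels of the 19 Jing males, transcribed from Table 5
# (females carry no Y chromosome and are omitted).
JING_MALE_Y = [
    "O1b1a1a1a1a-M88",             # Jing05
    "O1a1a1a-F140",                # Jing06
    "O2a2b1a1a6-CTS1642",          # Jing08
    "O2a2b1a1a6-CTS1642",          # Jing09
    "O2a2b1a1a6-CTS1642",          # Jing11
    "O2a2b1a1a6-CTS1642",          # Jing12
    "O2a2b1a1a6-CTS1642",          # Jing14
    "O1a1a1a1-F78",                # Jing16
    "O2a2b1a1a6-CTS1642",          # Jing17
    "O1b1a1a1-M1348, M1310",       # Jing21
    "O2a1a-Page127, F964, F3143",  # Jing23
    "O2a1a-Page127, F964, F3143",  # Jing24
    "D1a1-M15",                    # Jing25
    "N1c1a-M178",                  # Jing26
    "O2a2b1a1a-F8, F42",           # Jing27
    "O1a1a1a1-F78",                # Jing28
    "O1a1a1a-F140",                # Jing29
    "O2a2a1a2-M7",                 # Jing31
    "O2a1c1a-F11",                 # Jing35
]


def _names(labels):
    """Strip the marker suffix, keeping only the haplogroup name."""
    return [label.split("-")[0] for label in labels]


def clade_frequency(labels, prefix):
    """Fraction of labels in the clade `prefix`, sublineages included.

    Assumption: nested naming, so every sublineage name starts with `prefix`.
    """
    names = _names(labels)
    return sum(name.startswith(prefix) for name in names) / len(names)


def exact_frequency(labels, name):
    """Fraction of labels assigned exactly to `name`, sublineages excluded."""
    names = _names(labels)
    return names.count(name) / len(names)


if __name__ == "__main__":
    print(f"O2a2b1a1a6: {100 * clade_frequency(JING_MALE_Y, 'O2a2b1a1a6'):.1f}%")
    print(f"O2a2b1a1a*: {100 * exact_frequency(JING_MALE_Y, 'O2a2b1a1a'):.1f}%")
    print(f"O1a1a1a (incl. F78): {100 * clade_frequency(JING_MALE_Y, 'O1a1a1a'):.1f}%")
```

On these labels the counts are 6/19 (31.6%) for O2a2b1a1a6-CTS1642, 1/19 (5.3%) for O2a2b1a1a*-F8, F42, and 4/19 (21.1%) for O1a1a1a-F140 together with O1a1a1a1-F78, which agrees with the percentages given in the text. With only 19 males, each individual shifts a frequency by about 5 percentage points.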

Language change is an interesting cultural practice that might be influenced by, and reflected in, population genetic admixture. Forster and Renfrew suggested that language change in an already-populated region may require immigrant males, as reflected in the strong association between languages and paternal Y chromosomes (Forster and Renfrew, 2011). Previous studies could not identify and distinguish the downstream lineages in closely related populations because of the very limited number of phylogenetically relevant SNPs. Recent next-generation sequencing of worldwide samples has yielded a variety of novel SNPs, which have revolutionized the Y chromosomal tree. We took advantage of microarray SNP genotyping technology and classified the Jing samples into very detailed and informative lineages. We found that the majority of paternal lineages in Jing people, especially haplogroups O2a2b1a1a6-CTS1642 and O1a1a1a-F140, were most likely introduced from surrounding southern Han Chinese or Tai-Kadai speaking populations rather than being a genetic legacy from Vietnamese 500 years ago. The genetic evidence in this study supports the male-associated language change hypothesis regarding the formation of present-day Jing people and their language.

The data presented in this study constitute the first genome-wide dataset of Jing people generated to date, which is valuable not only in anthropological studies but also in applied fields such as forensic identification, paternity testing, and medical research.

ORCID

Rong Hu http://orcid.org/0000-0002-3115-784X

Chuan-Chao Wang http://orcid.org/0000-0001-9628-0307

REFERENCES

Alexander, D. H., Novembre, J., & Lange, K. (2009). Fast model-based

estimation of ancestry in unrelated individuals. Genome Research, 19,

1655–1664.

Beall, C. M., Cavalleri, G. L., Deng, L., Elston, R. C., Gao, Y., Knight, J., . . . Zheng, Y. T. (2010). Natural selection on EPAS1 (HIF2α) associated with low hemoglobin concentration in Tibetan highlanders. Proceedings of the National Academy of Sciences of the United States of America, 107, 11459–11464.

Cai, X., Qin, Z., Wen, B., Xu, S., Wang, Y., Lu, Y., . . . Li, H. (2011). Human

migration through bottlenecks from Southeast Asia into East Asia

during Last Glacial Maximum revealed by Y chromosomes. PLoS One,

6, e24282.

Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience, 4, 7. Available at: https://www.cog-genomics.org/plink2.

Forster, P., & Renfrew, C. (2011). Mother tongue and Y chromosomes.

Science (New York, N.Y.), 333, 1390–1391.

International HapMap Consortium. (2003). The International HapMap

Project. Nature, 426, 789–796.

Jin, H.-J., Kwak, K.-D., Hammer, M. F., Nakahori, Y., Shinka, T., Lee, J.-W., . . .

Kim, W. (2003). Y-chromosomal DNA haplogroups and their implications

for the dual origins of the Koreans. Human Genetics, 114, 27–35.

Jones, E. R., Gonzalez-Fortes, G., Connell, S., Siska, V., Eriksson, A., Martiniano, R., . . . Bradley, D. G. (2015). Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nature Communications, 6, 8912.

Kim, S.-H., Kim, K.-C., Shin, D.-J., Jin, H.-J., Kwak, K.-D., Han, M.-S., . . .

Kim, W. (2011). High frequencies of Y-chromosome haplogroup O2b-

SRY465 lineages in Korea: A genetic perspective on the peopling of

Korea. Investigative Genetics, 2, 10.

Lazaridis, I., Patterson, N., Mittnik, A., Renaud, G., Mallick, S., Kirsanow, K.,

. . . Krause, J. (2014). Ancient human genomes suggest three ancestral

populations for present-day Europeans. Nature, 513, 409–413.

Li, H., Cai, X., Winograd-Cort, E. R., Wen, B., Cheng, X., Qin, Z., . . . Jin, L. (2007). Mitochondrial DNA diversity and population differentiation in southern East Asia. American Journal of Physical Anthropology, 134, 481–488.

Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S., . . . Myers, R. M. (2008). Worldwide human relationships inferred from genome-wide patterns of variation. Science, 319, 1100–1104.

Liu, X., Lu, D., Saw, W.-Y., Shaw, P. J., Wangkumhang, P., Ngamphiw, C.,

. . . Teo, Y.-Y. (2017). Characterising private and shared signatures of

positive selection in 37 Asian populations. European Journal of Human

Genetics, 25, 499–508.

Mallick, S., Li, H., Lipson, M., Mathieson, I., Gymrek, M., Racimo, F., . . . Reich, D. (2016). The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature, 538, 201–206.

Mathieson, I., Lazaridis, I., Rohland, N., Mallick, S., Patterson, N., Roodenberg, S. A., . . . Reich, D. (2015). Genome-wide patterns of selection in 230 ancient Eurasians. Nature, 528, 499–503.

Meyer, M., Kircher, M., Gansauge, M. T., Li, H., Racimo, F., Mallick, S., . . .

Sudmant, P. H. (2012). A high-coverage genome sequence from an

archaic Denisovan individual. Science, 338, 222–226.

Olson, J. S. (1998). An ethnohistorical dictionary of China. Westport:

Greenwood Press.

Patterson, N., Moorjani, P., Luo, Y., Mallick, S., Rohland, N., Zhan, Y., . . . Reich,

D. (2012). Ancient admixture in human history. Genetics, 192, 1065–1093.

Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and

eigenanalysis. PLoS Genetics, 2, e190.

Pischedda, S., Barral-Arca, R., Gómez-Carballa, A., Pardo-Seco, J., Catelli, M. L., Álvarez-Iglesias, V., . . . Salas, A. (2017). Phylogeographic and genome-wide investigations of Vietnam ethnic groups reveal signatures of complex historical demographic movements. Scientific Reports, 7, 12630.

Poznik, G. D., Xue, Y., Mendez, F. L., Willems, T. F., Massaia, A., Wilson

Sayres, M. A., . . . Tyler-Smith, C. (2016). Punctuated bursts in human

male demography inferred from 1,244 worldwide Y-chromosome

sequences. Nature Genetics, 48, 593–599.

Prüfer, K., Racimo, F., Patterson, N., Jay, F., Sankararaman, S., Sawyer, S., . . . Pääbo, S. (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505, 43–49.

Reich, D., Thangaraj, K., Patterson, N., Price, A. L., & Singh, L. (2009).

Reconstructing Indian population history. Nature, 461, 489–494.

van Oven, M., & Kayser, M. (2009). Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Human Mutation, 30, E386–E394.

Wang, B., Zhang, Y. B., Zhang, F., Lin, H., Wang, X., Wan, N., . . . Yu, J.

(2011). On the origin of Tibetans and their genetic basis in adapting

high-altitude environments. PLoS One, 6, e17002.

Wang, C. C., & Li, H. (2013). Inferring human history in East Asia from Y

chromosomes. Investigative Genetics, 4, 11.

Wang, C.-C., Yan, S., Qin, Z.-D., Lu, Y., Ding, Q.-L., Wei, L.-H., . . . Li, H.

(2013). Late Neolithic expansion of ancient Chinese revealed by Y

chromosome haplogroup O3a1c-002611. Journal of Systematics and

Evolution, 51, 280–286.

Wei, S. G. (2006). The variations of Chinese Jing Dialect. Journal of

Guangxi University for Nationalities, 28, 13–18.

Yan, S., Wang, C.-C., Zheng, H.-X., Wang, W., Qin, Z.-D., Wei, L.-H., . . .

Jin, L. (2014). Y chromosomes of 40% Chinese descend from three

Neolithic super-grandfathers. PLoS One, 9, e105691.

Yang, M. A., Gao, X., Theunert, C., Tong, H., Aximu-Petri, A., Nickel, B., . . . Fu, Q. (2017). 40,000-year-old individual from Asia provides insight into early population structure in Eurasia. Current Biology, 27, 3202–3208.e9.

Yao, H.-B., Tang, S., Yao, X., Yeh, H.-Y., Zhang, W., Xie, Z., . . . Wang, C.-

C. (2017a). The genetic admixture in Tibetan-Yi Corridor. American

Journal of Physical Anthropology, 164, 522–532.

Yao, X., Tang, S., Bian, B., Wu, X., Chen, G., & Wang, C. C. (2017b).

Improved phylogenetic resolution for Y-chromosome Haplogroup

O2a1c-002611. Scientific Reports, 7, 1146.

Zhong, H., Shi, H., Qi, X.-B., Duan, Z.-Y., Tan, P.-P., Jin, L., . . . Ma, R. Z.

(2011). Extended Y chromosome investigation suggests postglacial

migrations of modern humans into East Asia via the northern route.

Molecular Biology and Evolution, 28, 717–727.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the supporting information tab for this article.

How to cite this article: Huang X, Zhou Q, Bin X, et al. The genetic assimilation in language borrowing inferred from Jing People. Am J Phys Anthropol. 2018;166:638–648. https://doi.org/10.1002/ajpa.23449
