supporting information text (si text) - nature · web viewthe y-chromosome landscape of the...
TRANSCRIPT
The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying
genetic affinities of Negrito and non-Negrito groups
Frederick Delfin1,2, Jazelyn M. Salvador1, Gayvelline C. Calacal1, Henry B. Perdigon1,
Kristina A. Tabbada1, Lilian P. Villamor1, Saturnina C. Halos1, Ellen Gunnarsdóttir2,
Sean Myles1,6, David A. Hughes2, Shuhua Xu3, Li Jin3, Oscar Lao4, Manfred Kayser4,
Matthew E. Hurles5, Mark Stoneking2 and Maria Corazon A. De Ungria1*
1DNA Analysis Laboratory, Natural Sciences Research Institute, University of the
Philippines, Diliman, 1101, Quezon City, Philippines;
2Department of Evolutionary Genetics, Max Planck Institute for Evolutionary
Anthropology, Deutscher Platz, D04103, Leipzig, Germany;
3Chinese Academy of Sciences and Max Planck Society Partner Institute for
Computational Biology, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai,
200031 China;
4Department of Forensic Molecular Biology, Erasmus University Medical Center
Rotterdam, Dr. Molewaterplein 50, 3000 CA Rotterdam, The Netherlands;
5The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton,
Cambridge, CB10 1SA, United Kingdom;
6Current affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, NY
14853-2703 USA.
*Corresponding author: Maria Corazon A. De Ungria, PhD
DNA Analysis Laboratory, Natural Sciences Research Institute, University of the
Philippines, Diliman, 1101, Quezon City, Philippines. Telefax: (63-2) 925-2965, E-
mail: [email protected]
1
SUPPLEMENTARY TEXT: MATERIALS AND METHODS
POPULATION SAMPLE HISTORY
The 390 samples included in this study comprise two sets of population sample
collections. One set, initially composed of 1032 samples (607 males, 425 females) rep-
resenting 15 Filipino language groups (excluding Surigaonon) (Figure 1) was the result
of population sampling efforts by the University of the Philippines, Natural Sciences
Research Institute DNA Analysis Laboratory (UP-NSRI-DAL) from 1997–2004. Biolo-
gical samples in this collection consisted of whole blood blotted on FTA™ cards (FTA™
Gene Guard system, Whatman Inc., Springfield Mill, Maidstone, Kent, UK), buccal
cells collected by swab and transferred on FTA™ cards and plain buccal swabs. The
FTA™ Gene Guard system (Whatman Inc., Springfield Mill, Maidstone, Kent, UK) was
used to extract DNA from whole blood blotted on FTA™ paper and from saliva-buccal
cells swabbed on FTA™ paper.1 A phenol-chloroform method2 and/or the QIAamp®
DNA Blood Mini kit (QIAGEN Inc., Valencia, CA, USA) were used to extract DNA
from buccal swab samples. DNA samples were stored in the UP-NSRI-DAL sample re-
pository. Given the length of time in storage, DNA samples from some groups (Ivatan,
Bugkalot, Kalangoya, CAR and Maranao) required enrichment and were therefore sub-
jected to whole genome amplification (WGA) using the GenomiPhi™ DNA amplifica-
tion kit (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) kit following manufac-
turer’s instructions. Male DNA samples in this collection were typed for Y-chromosome
binary markers (Y-SNPs) at the Wellcome Trust Sanger Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridge, U.K. following a protocol adopted from the
work of Paracchini, Hurles and colleagues3,4 with modifications (Supplementary infor-
2
mation: Figure 1, Table 1, Text – DNA typing) using the ABI Prism® SNaPshot™ multi-
plex kit (Applied Biosystems, Foster, CA, USA) with amplicons detected using capil-
lary electrophoresis (CE) on an ABI Prism® 3100 Genetic Analyzer (Applied Biosys-
tems, Foster, CA, USA) following manufacturer's instructions.
Another set of 142 samples (86 males, 56 females) consisted of saliva samples
from the Mamanwa, Manobo and Surigaonon groups (Figure 1) collected by the Max
Planck Institute for Evolutionary Anthropology (MPI-EVA), Leipzig, Germany in 2005.
Saliva samples were processed at MPI-EVA, using a high-salt, DNA extraction
method.5 Male DNA samples in this collection were typed for Y-SNPs at the MPI-EVA,
using an in-house validated, single-base extension (SBE) assay (Supplementary
information: Figure 1, Table 1, Text – DNA typing) with amplicons detected using
Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) Mass
Spectrometry (MS) as described elsewhere.6
Male DNA samples of both the UP-NSRI-DAL and the MPI-EVA were typed
for Y-chromosome microsatellite (Y-STR) markers at the UP-NSRI-DAL, Philippines
using the PowerPlex® Y system (Promega Corporation, Madison, WI, USA) with
amplicons detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems,
Foster, CA, USA), all following the manufacturers’ instructions.
Given that the Mamanwa and Manobo groups were independently sampled by
the UP-NSRI-DAL and the MPI-EVA from the same Philippine region (Figure 1),
sample overlap and/or relatedness were checked using genealogical information. All
samples were also typed for autosomal microsatellite (A-STR) markers using the
PowerPlex® 16 system (Promega Corporation, Madison, WI, USA) with amplicons
detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems, Foster, CA,
3
USA), all following manufacturers’ instructions; and as such, Mamanwa and Manobo
A-STR genotypes were also used to check for sample overlap and/or relatedness.
After excluding related and/or overlapping Mamanwa and Manobo samples; as
well as DNA samples that did not produce reliable results after Y-SNP and Y-STR
typing, the UP-NSRI-DAL male sample set (n=607) and the MPI-EVA male sample set
(n=86) were reduced to 317 and 73, respectively; resulting in 390 samples that were
included in this study (Figure 1).
POPULATION SAMPLING AND SAMPLE DETAILS
Population field sampling was conducted within enclaves (sitios) and/or districts
(barangays) found within municipalities or cities across the Philippines. Before the
collection of samples, interviews were conducted to compile the family history of each
potential participant. Samples were generally collected from unrelated volunteers.
However in some populations with small population sizes, most individuals were
related; hence volunteers not related within the second degree of consanguinity were
considered. A total of 1174 (1032: UP-NSRI-DAL; 142: MPI-EVA) population samples
were collected. The following details of population sample collection enumerate the
number of collected samples, sample type, collection sites and various personalities,
institutions and organizations who assisted in sample collection from 1997 to 2005.
UP-NSRI-DAL collection
The CAR (Cordillera Administrative Region) group in this study is composed of
three language groups namely: the Ibaloi (buccal swabs from 19 males and 32 females)
from Benguet and Nueva Vizcaya provinces; the Ifugao (buccal swabs from 28 males
4
and 26 females) from Banaue municipality in Ifugao province, Nagtipunan municipality
in Quirino province and Kayapa municipality of the Nueva Vizcaya province; and
Kankanaey (buccal swabs from 13 males and 50 females) from Mankayan and La
Trinidad, Benguet province. Samples were collected during several trips to the CAR
region in 1997 and 2002. After DNA typing, only two Ibaloi, one Ifugao and six
Kankanaey samples gave reliable results. Due to close geographical proximity and
cultural relatedness of the Ibaloi, Ifugao and Kankanaey language groups, nine samples
were pooled as one sample set and labeled as the CAR group. Dr. Michael Paul Tan and
Maricar Posa were involved in the collection of samples in 1997. Armando Bogite and
members of the New Tribes Mission facilitated the collection of samples in 2002.
Aeta of Zambales samples (19 male and 19 female whole blood samples) were
collected in January 1998 from volunteers who reside in the municipality of Castillejos,
located north of the Subic Bay Metropolitan Authority (SBMA), in the province of
Zambales. The collection was facilitated by Dr. Edith Tria of the Department of Health
and Dr. Leo Uy.
Bugkalot samples (31 male and 21 female saliva samples) were collected from
volunteers residing in different municipalities of Nueva Vizcaya and Quirino provinces
in two separate trips in January and February, 2002, respectively. The collection of
samples was facilitated by Rey and Lea de la Rosa, Pastor Angel de la Cruz and Jessie
Magallanes, all from the New Tribes Mission and Armando Argayosa. Ramon
Pagsiguian accompanied the sampling team to the community while Liza Faustino and
Laarni Carpina assisted in the collection of samples.
Ivatan samples (162 male and 104 female buccal swabs) were collected from
volunteers from the main islands of Batanes province namely Batan, Itbayat and
5
Sabtang in March 2002 through the assistance of Leticia Bayaras (Provincial Health
Office), Hilda Dacles (Municipal Health Office) and Patricia Gallo of the National
Commission on Indigenous Peoples (NCIP) with endorsement from Governor Vicente
Gato and facilitated by Dr. Victor Paz of the UP Archaeological Studies Program.
Laarni Carpina assisted in the collection and handling of samples.
Kalangoya samples (45 male and 15 female buccal swabs) were collected from
volunteers from Bambang and Kayapa municipalities of Nueva Vizcaya province in
March 2002 with the assistance of Satur Banih and with support from Mayor Tony
Dupiano, Jimmy Tindaan and Steve Baccar.
Maranao samples (28 male and 21 female buccal swabs) were collected in April
2002 from volunteers in two campuses of Mindanao State University (MSU), namely
MSU-Iligan Institute of Technology (MSU-IIT) in Lanao del Norte province and MSU-
Marawi, Marawi City, Lanao del Sur province. Permission to collect on campus was
obtained from Dr. Olga Nuneza of MSU-IIT. Irene Estrada Macarambon- Benito of
MSU-IIT facilitated volunteer participation in the study.
Aeta of Bataan samples (27 male and 14 female samples of saliva blotted on
FTA™ paper) were collected in April 2003 from volunteers residing in the municipalities
of Morong, Hermosa and Orion in the province of Bataan and in the area governed by
the SBMA in the Zambales province through the assistance of Edmond de Jesus and
with the endorsement of Amethya Concepcion of the SBMA, Dr. Roberto Pagulayan of
the UP Institute of Biology, Tribal Chief Bonifacio Florentino and Tribal Chief Josefina
Alejo. Joseph Salonga accompanied the sampling team to the communities. The Aeta of
Bataan is differentiated from the Aeta of Zambales because these groups use different
languages.7
6
Filipino groups residing on the island of Mindoro are collectively called
Mangyan. Samples of saliva blotted on FTA™ paper were collected from four Mangyan
groups namely the Hanunuo (15 male and 3 female samples), Iraya (16 male and 11
female samples), Tadyawan (14 male and 1 female samples) and Tawbuid (14 male and
1 female samples) in April 2003 during a gathering organized by the Mangyan Church
Tribal Association (MCTA) in Baco municipality and Calapan city of Oriental Mindoro
province. Pastor Andino Layda, Efren Aceveda, Peter Mayot and Pastor Diokno Onday
of the MCTA and Tribal leader Fundador Fuentes assisted in obtaining the free and
prior informed consent (FPIC) from volunteers. John Richards, Lore Jean de Guito and
Dothy Smith of the Overseas Mission Fellowship assisted in introducing the sampling
team to the community.
Ati samples (36 male and 26 female samples of saliva blotted on FTA™ paper)
were collected from volunteers from different districts of the provinces of Antique,
Aklan, Capiz and Iloilo, all on the island of Panay as well as from the adjacent islands
of Boracay and Guimaras in January 2004. Sampling was coordinated with Pastor
Emeterio Allianza, Pastor Jessie Elosendo, Pastor Enoc Valencia, Chief Salvador
Escuña, Chief Gregorio Elosendo, Chief Elias Valencia and Chief Enrique Martinez.
Alejandro Condez accompanied the sampling team around Panay and Guimaras islands
whereas Joana Guarin of the Aklan State University helped in the collection of samples
from residents on the island of Boracay.
Agta samples (44 male and 23 female samples of saliva blotted on FTA™ paper)
were collected from volunteers residing in Iriga City and Buhi municipality, both within
the province of Camarines Sur in March 2004. Contact with the communities was made
with the assistance of Julio Versoza of the Mt Isarog Protected Area Office, Dennis
7
Barroga and Belen Jacob of the National Commission on Indigenous People (NCIP).
Our request to go to the communities was approved by Director Lee Arroyo and Atty.
Corazon Crescini of the NCIP.
Mamanwa samples (26 male and 21 female samples of saliva blotted on FTA™
paper) were collected from volunteers from different municipalities of the province of
Surigao del Norte in April 2004, through the assistance of Leonita Gorgolon, Audie
Reliquette and Angelita Bullo, from the Provincial Health Office and Pastor Bernard
Yap of the Christ Faith Fellowship Church.
Manobo samples (70 male and 37 female samples of saliva blotted on FTA™
paper) were collected from volunteers from different municipalities of the province of
Agusan del Sur with the assistance of Mrs. Lily Labadan and Maria Marley Daday in
April 2004.
MPI-EVA collection
In August 2005, samples of saliva in lysis buffer were collected from Mamanwa
(38 male and 20 female samples); Manobo (11 male and 36 female samples) and
Surigaonon (37 male samples) groups. Mamanwa and Surigaonon volunteers were from
different municipalities of Surigao del Norte province and Manobo volunteers were
from different municipalities of Agusan del Norte province. Population sampling was
facilitated by Mr. Fernando A. Almeda Junior, Dr. Irinetta C. Montinola, Dr. Wilfredo
Sinco (all from the Surigaonon Heritage Center); Ms. Girlie Patagan (NCIP); Mrs.
Elizabeth S. Larase and Ms.Juliet P. Erazo (Office of Non-Formal Education) and the
Rotary Club of Surigao.
8
DNA TYPING
Multiplexes I.1 to III.19 (Supplementary Information: Figure 1 and Table 1)
Specific-PCR Multiplex reactions were performed in 10 microliter (µl) volumes
each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster, CA, USA),
4 millimolar (mM) Magnesium Chloride (MgCl2), 0.4 mM deoxyribonucleotide
triphosphate mix (dNTP mix), 0.08–0.24 µM primer mix (Supplementary Table 1), 0.5
Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 2 µl or 1
FTA™ punch for DNA template; with the following thermocycling parameters: 94oC for
9 minutes (min); 15 cycles of 94oC for 30 seconds (sec), 59oC for 30 sec, 72oC for 60
sec and a final 72oC for 3 min.
With the incorporation of universal tags or “ZIP code” sequences
(Supplementary Table 1)3 in the specific-PCR products, even concentration of specific-
PCR products in the multiplex was obtained through a second amplification using high
concentrations of “ZIP code” primers. These reactions were performed in 20 µl volumes
each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster, CA, USA),
4 mM MgCl2, 1 µM each of ZIP code primer A and primer B (Supplementary Table 1),
0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 10 µl
specific PCR product; with the following thermocycling parameters: 94oC for 9 min;
30–34 cycles of 94oC for 30 sec, 55oC for 30 sec, 72oC for 60 sec and a final 72oC for 3
min.
PCR products (specific PCR-ZIP reaction product) were cleaned in 12 µl-
volume reactions each containing 2 Units of shrimp alkaline phosphatase (SAP)
enzyme, 1.5 Units of Exonuclease I (Exo I) enzyme and 10 µl PCR product with the
following incubation parameters: 37oC for 1 hour (hr) and 80oC for 20 min.
9
Single-base extension (SBE) reactions were performed in 5 µl volumes using the
ABI Prism® SNaPshot™ multiplex kit (Applied Biosystems, Foster, CA, USA) following
manufacturer's instructions. Each SBE reaction contained 1X SNaPshot™ reaction mix,
0.2–0.6 µM extension primer mix (Supplementary Table 1) and 0.5 µl of cleaned PCR
product with the following thermocycling parameters: 25 cycles of 96oC for 10 sec,
50oC for 5 sec and 60oC for 30 sec. SBE products were detected by CE on an ABI
Prism® 3100 Genetic Analyzer (Applied Biosystems, Foster, CA, USA) following
manufacturer's instructions.
Cluster I and II Multiplexes (Supplementary Information: Figure 1 and Table 1)
Cluster I specific-PCR multiplex reactions were performed in 25 µl volumes
each containing 1X PCR Buffer, 4 mM MgCl2, 0.5 mM dNTP mix, 0.015–0.23 µM
each of forward primer mix and reverse primer mix (Table S1), 0.5 Unit AmpliTaq
Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 4 µl DNA template; with
the following thermocycling parameters: 95oC for 15 min; 36 cycles of 95oC for 20 sec,
59oC for 30 sec, 72oC for 60 sec and a final 72oC for 3 min.
Specific-PCR conditions for Cluster II were similar to Cluster I except that 0.12
µM of forward primer mix and reverse primer mix were used with the following
thermocycling parameters: 95oC for 15 min; 38 cycles of 95oC for 20 sec, 64oC for 30
sec, 72oC for 60 sec and a final 72oC for 3 min.
PCR products were cleaned in 10 µl-volume reactions each containing 0.32 Unit
of SAP enzyme, 0.2 Unit of Exo I enzyme and 8 µl PCR product with the following
incubation parameters: 37oC for 1 hr and 80oC for 20 min. PCR products were subjected
10
to a SBE assay followed by SBE product detection by MALDI-TOF MS as described
previously.6
DATA ANALYSES
The Reference data set
The reference data set was assembled from previously published works8-12,
composed of 1,756 males from 60 groups representing five Asia-Pacific population
groups (Figure 1). These population groups, the populations comprising each group and
their respective population codes are the following: East Asia composed of Korea
(KOR, n = 21), China (CHI, n = 36), Han Chinese from Taiwan (TCH, n
= 19), Ami (AMI, n = 9), Ata (ATA, n = 10), Bunum (BUN, n = 10),
Paiwan (PAI, n = 12) and Vietnam (VTN, n = 6); Southeast Asia
composed of Hiri (MO1, n = 20), Ternate (MO2, n = 13), South Borneo
(SBO, n = 40), Sumatra (SUM, n = 55), Malaysia (MAL, n = 17) Java
(JAV, n = 53), Flores (FLO, n = 73), Adonara (ADR, n = 96), Alor (ALR,
n = 34), East Timor (ETR, n = 48), Roti (ROT, n = 11), Lembata (LMB,
n = 31), Pantar (PNT, n = 38) and Solor (SLR, n = 43); Melanesia
composed of Ketengban (KET, n = 19), Una (UNA, n = 46), Yali (YAL,
n = 5), Dani (DAN, n = 12), Lani (LAN, n = 12), Citak (CIT, n = 28),
Kombai (KOM, n = 2), Awyu (AWY, n = 10), Asmat (ASM, n = 20),
Mappi (MAP, n = 10), Korowai (KRW, n = 11), Muyu (MYU, n = 8),
Papua NewGuinea (PNG) highlands (PHL, n = 31), PNG north coast
(NCo, n = 16), PNG south coast (SCo, n = 17), Trobriand (TRO, n =
52), Bereina (BRN, n = 35), Kapuna (KAP, n = 46), Tolai New Britain
11
(TOL, n = 19), Seimat–Wuvulu (S/W, n = 11), Andra–Hus (A/H, n =
20), Kurti (Kur, n = 18), Lele (Lel, n = 24), Mokerang (Mok, n = 5),
Nyindrou (Nyi, n = 17), Ere–Kele (E/K, n = 14), Nali (Nal, n = 18) and
Titan (Tit, n = 21); Fiji (FIJ, n = 101); Polynesia composed of Cook
Islands (COK, n = 66), Futuna (FUT, n = 50), Niue (NIE, n = 8),
Tokelau (TOK, n = 6), Tonga (TON, n = 28), Tuvalu (TUV, n = 100),
West Samoa (WES, n = 60); and Australia composed of Arnhem Land
(AS1, n = 60) and Great Sandy Desert (AS2, n = 35). Further grouping
of Melanesian populations into the Admiralty Island (ADM: S/W, A/H,
Kur, Lel, Mok, Nyi, E/K and Tit); East New Guinea (ENG: PHL, NCo, SCo,
TRO, BRN, and TOL) and West New Guinea (WNG: KET, DAN, CIT,
KOM, ASM, MAP and KRW) was done to have suitable sample sizes
(samples that share haplogroups with FE groups) for Correspondence
analysis (CA, Figure 4).
Reconciling the Filipino data with the reference data set
The reference data set used the haplogroup name C-RPS4Y, while the Filipino
data used the haplogroup name C-M130 (Supplementary Information: Figure 1, Tables
1 and 2); however, both these names refer to the same NRY binary marker.13,14 The
reference data set was typed for haplogroup O-M11011,12 and the Filipino data set for O-
M50 (Supplementary Information: Figure 1, Tables 1 and 2), which define the same
haplogroup lineage.14 Hence to avoid confusion in haplogroup names and to facilitate
comparison of haplogroups between Filipino and reference data set, names consistent
with the reference data (C-RPS4Y and O-M110) were used. The reference data set was
12
typed for O-M324, a subhaplogroup of O-M122, while the Filipino data set was not. For
compatibility in the analyses, O-M324 data was pooled with O-M122 in the reference
data set.
For compatibility in the analysis of Y-STR haplotypes between Filipino data and
the reference data set, seven overlapping Y-STR loci (DYS19, DYS389I, DYS389II,
DYS390, DYS391, DYS392 and DYS393) were used in all analyses. The DYS389
locus produces two fragments that overlap due to a duplicated priming site for the
forward primer.15 The DYS389 fragments are distinguishable by size; DYS389I being
shorter and DYS389II being the longer fragment which includes the smaller DYS389I
fragment. In practice, allele calls for DYS389II have been done in two ways; one in
which the allele size of DYS389I is included (DYS389II+I) and one in which the allele
size of DYS389I is subtracted (DYS389II-I). This practice differs across laboratory
typing systems (in-house developed or commercially purchased kits). The Filipino
DYS389 data was generated using the PowerPlex® Y system (Promega Corporation,
Madison, WI, USA) which yields DYS389II+I data, while the reference data set
considers DYS389II-I. Hence for compatibility, the Filipino DYS389I allele size was
subtracted from DYS389II to give the allele size for DYS389II, after which three
repeats were subtracted from DYS389I (this is particular to the in-house typing system
used for the reference data set) to give the compatible DYS389I allele size. For
example, original Filipino data: DYS389I-13, DYS389II-27; reference data set-
comparable data: DYS389I-10, DYS389II-14.
Network Analyses
13
Network analyses16 were performed using Network version 4.510 and Network
Publisher version 1.1.0.6 (http://fluxus-engineering.com). A network weighting
scheme17 based on Y-STR locus-specific mutation rates was used. In the following order
of Y-STR loci, DYS19:DYS389I:DYS389II-I:DYS390:DYS391:DYS392:DYS393, the
locus specific weights used were 4:4:3:3:2:12:10.17 As applied to the reference data set10,
initial analyses was performed using the Median-Joining (MJ) algorithm.16 However this
generated complex networks that were difficult to visualize and interpret, hence network
reduction schemes were applied. To reduce network complexity, Y-STR data was put
thru the Reduced-Median (RM) algorithm.18 The RM output was then put thru the MJ
algorithm. The RM-MJ output was then subjected to post processing using the Maxi-
mum Parsimony (MP) algorithm19, selecting the option “single, arbitrary near-optimal
tree within network”. The RM and MJ algorithms generate various non-parsimonious
links. The MP algorithm eliminates these links, simplifying the network; however, it
should be noted that MP calculations generate a single network that is just one of many
equally parsimonious networks. As performed in this study, the RM-MJ-MP network
was compared to the intial MJ network to ensure that the simplified network is represen-
tative of the complex network.
Estimation of haplogroup coalescent times (Time since the Most Recent Common An-
cestor-TMRCA)
The BATWING program20 (http://www.mas.ncl.ac.uk/~nijw/) was used to esti-
mate TMRCA for each Y-SNP haplogroup using both Y-SNP and Y-STR data. Y-SNP
data were used as Unique Event Polymorphisms (UEP sites) to constrain the gene ge-
nealogy model and Y-STR data were used under a Step-wise Mutation Model (SMM).
14
Y-STR mutation rates were modeled using a gamma distribution with parameters alpha
and beta [gamma (α, β)]. These parameters for DYS19 (5,2763), DYS389I (5,2192),
DYS389II-I (6,2192), DYS390 (12,2233), DYS391 (10,2182), DYS392 (1,2182) and
DYS393 (1,2182) were previously used for the reference data set.10 Population structure
was modeled to be a substructured population (14 Filipino groups, with sample sizes >
10). Population size was modeled to be of an initial constant size followed by exponen-
tial growth. Three independent Markov Chain Monte Carlo (MCMC) runs were per-
formed. Each MCMC run had a different random number seed, a total of 108 MCMC
chains and a 10% burn-in period. Built-in BATWING functions were used to evaluate
the MCMC run. The 95% posterior density was computed and TMRCA point estimates
(number of generations) were converted to time in years using a generation time of 30
years per generation. A generation time of 30 years per generation was previously used
for the reference data set10, but apart from ensuring comparable TMRCA estimates, a
30-year generation time for males (father-son intervals) was found to be appropriate for
recent agricultural societies across the world.21
Estimation of divergence time and migration rates
As there seem to be signals of genetic links between several FEN groups (Aeta-
Bataan, Aeta-Zambales and Agta) and indigenous Australians (Arnhem Land and Great
Sandy Desert) (Figure 3: C-RPS4Y and K-M9; Figure 4); divergence times and migra-
tion rates between these groups were estimated using haplotype data (7 Y-STR loci)
through pairwise, simulation-based analyses using the IM program22 (http://genfacul-
ty.rutgers.edu/hey/software).
15
All IM program information was available in the program documentation (Intro-
duction to the IM and IMa computer programs - March 5, 2007 and IM Documentation
- March 5, 2007) available at the program website. IM program author Jody Hey and the
IM Google group (http://groups.google.com/group/Isolation-with-Migration) provided
valuable support in that most of the questions and concerns that an IM program user
would have, are or may have already been addressed and posted at the group site.
One pair of populations (Australia, FEN group) was used for initial program
testing and to search for the appropriate IM run parameters. To test whether the IM pro-
gram ran properly, initial parameter settings were adopted from the IM Documentation
(http://genfaculty.rutgers.edu/hey/software#IMa2): Step-wise Mutation Model (SMM)
for STR data; population mutation parameter (q1 and q2) = 10; migration rates (m1 and
m2) = 10; divergence time (t) = 10. Retaining the initial parameter settings, the number
of MCMC iterations required to produce reasonable results was evaluated by using
varying number of MCMC iterations (105 – 107) with 10% burn-in periods. Reasonable
results refer to unimodal posterior distributions for the parameters being estimated,
trend plots (no trend), autocorrelation values (low values) and effective sample size
(ESS >50 for the t parameter23) values that indicate MCMC convergence. Given the t
parameter shows the slowest rate of mixing, a minimum ESS value >50 for this parame-
ter is acceptable.23 Using STR data requires the incorporation of multiple MCMC cou-
pled chains (Metropolis coupling) which subsequently require chain heating schemes
(IM Documentation). The following IM run parameters, previously addressed and
posted at the IM Google group site were therefore incorporated into the analyses: 30
coupled chains (Metropolis coupling) with geometric heating scheme with 0.95 and 0.9
for the first (-g1) and second (-g2) heating parameters; respectively. Using the parame-
16
ters already enumerated, credible intervals for the t parameter were very wide (wider
than the reported values in Table 4). To further refine analyses, the t parameter was re-
duced to t = 5 and the population split parameter (s) which accounts for changes in pop-
ulation size, was incorporated. A final test for the appropriate length of MCMC run was
performed using 3x107, 6x107 and 9x107 iterations, each with 10% burn-in periods. No
change in results was observed beyond 6x107 MCMC iterations.
Identifying an appropriate set of IM run parameters, population pairwise com-
parisons were then performed with the following parameters: mutation model = SMM ;
q1 and q2 = 10; m1 and m2 = 10; t = 5; s = 0.2; 30 coupled chains with geometric heat-
ing scheme (g1 = 0.95; g2 = 0.9); generation time = 30 years; burn-in = 6x106; total run
= 6x107 iterations. The use of a 30-year generation period has been discussed in Net-
work analyses. For each pairwise population comparison, three independent IM runs
with the same parameter settings, but different random number seeds, were performed.
Convergence on the stationary distribution was considered to be reached when each run
had a minimum ESS of >50 for the t parameter and when the independent runs gener-
ated similar distributions, as recommended previously.23 The peak of the distribution
(mode) of the estimated parameters (i.e. t, m1 and m2) has the highest probability, simi-
lar to a maximum likelihood estimate, and was therefore considered the actual estimate
of the parameter.
FUNDING
17
This work was supported by: the Natural Sciences Research Institute, University of the
Philippines (NSR-97-2-04, NSR-00-1-03 and NSR-03-1-01); the Philippine Council for
Advanced Science and Technology Research and Development, Department of Science
and Technology; the Academy of Science in the Developing World, Trieste, Italy (RGA:
02-117 RG/BIO/AS); the Wellcome Trust (077014/Z/05/Z), Cambridge, UK; the
Chinese Academy of Sciences and Max Planck Society Partner Institute for
Computational Biology, Shanghai China; the National Outstanding Youth Science
Foundation of China (30625016); National Science Foundation of China (30890034 and
30971577), and 863 Program (2007AA02Z312); the Max Planck Society, Germany; and
the European Commission (B7-7070/T-2000/005). Statements made herein do not
reflect the views of the European Commission. M.C.A. De Ungria was a recipient of a
Royal Society of London visiting fellowship. L. Jin was also supported by the Shanghai
Leading Academic Discipline Project (B111) and the Center for Evolutionary Biology.
S. Xu was also supported by the Science and Technology Commission of Shanghai
Municipality (09ZR1436400), the Knowledge Innovation Program of Shanghai
Institutes for Biological Sciences, Chinese Academy of Sciences (2008KIP311), SA-
SIBS Scholarship Program and the K.C.Wong Education Foundation, Hong Kong. The
funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.
REFERENCES
18
1. Salvador JM, De Ungria MCA: Isolation of DNA from saliva of betel quid chewers
using treated cards. J Forensic Sci 2003; 48: 794-797.
2. FBI: PCR-based typing protocols: Laboratory Manual. VA, USA: Federal Bureau of
Investigation, 1994.
3. Paracchini S, Arredi B, Chalk R, Tyler-Smith C: Hierarchical high-throughput SNP
genotyping of the human Y-chromosome using MALDI-TOF mass spectrometry.
Nucleic Acids Res 2002; 30: e27 21-26.
4. Hurles ME, Sykes BC, Jobling MA, Forster P: The dual origin of the Malagasy in
island Southeast Asia and East Africa: Evidence from maternal and paternal
lineages. Am J Hum Genet 2005; 76: 894-901.
5. Quinque D, Kittler R, Kayser M, Stoneking M, Nasidze I: Evaluation of saliva as a
source of human DNA for population and association studies. Anal Biochem 2006;
353: 272-277.
6. Hughes DA, Tang K, Strotmann R et al: Parallel Selection on TRPV6 in Human
Populations. . PLoS ONE 2008; 3: e1686 1681-1613.
7. Gordon RG, Jr. (ed): Ethnologue: Languages of the World, Fifteenth edition. Dallas,
Tex.: SIL International. Online version: http://www.ethnologue.com/. 2005.
8. Kayser M, Brauer S, Weiss G, Schiefenhövel W, Underhill PA, Stoneking M:
Independent Histories of Human Y Chromosomes from Melanesia and Australia.
Am J Hum Genet 2001; 68: 173-190.
9. Kayser M, Brauer S, Weiss G et al: Reduced Y-chromosome, but not mitochondrial
DNA, diversity in human populations from West New Guinea. Am J Hum Genet
2003; 72: 281–302.
19
10. Kayser M, Brauer S, Cordaux R et al: Melanesian and Asian Origins of Polynesians:
mtDNA and Y Chromosome Gradients Across the Pacific. Mol Biol Evol 2006; 23:
2234-2244.
11. Kayser M, Choi Y, van Oven et al: The Impact of the Austronesian Expansion:
Evidence from mtDNA and Y Chromosome Diversity in the Admiralty Islands of
Melanesia. Mol Biol Evol 2008; 25: 1362-1374.
12. Mona S, Grunz KE, Brauer S et al: Genetic Admixture History of Eastern Indonesia
as Revealed by Y-Chromosome and Mitochondrial DNA Analysis. Mol Biol Evol
2009; 26: 1865-1877.
13. Underhill PA, Shen P, Lin AA et al: Y chromosome sequence variation and the
history of human populations. Nat Genet 2000; 26: 358-361.
14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF:
New binary polymorphisms reshape and increase resolution of the human Y
chromosomal haplogroup tree. Genome Res 2008; 18: 830-838.
15. Kayser M, Caglià A, Corach D et al: Evaluation of Y-chromosomal STRs: a
multicenter study. Int J Legal Med 1997; 110: 125–133.
16. Bandelt H-J, Forster P, Röhl A: Median-joining networks for inferring intraspecific
phylogenies. Mol Biol Evol 1999; 16: 37-48.
17. Mona S, Tommaseo-Ponzetta M, Brauer S, Sudoyo H, Marzuki S, Kayser M:
Patterns of Y-Chromosome Diversity Intersect with the Trans-New Guinea
Hypothesis. Mol Biol Evol 2007; 24: 2546-2555.
18. Bandelt H-J, Forster P, Sykes BC, Richards: MB: Mitochondrial portraits of human
populations using Median Networks. Genetics 1995; 141: 743-753.
20
19. Polzina T, Daneshmand SV: On Steiner trees and minimum spanning trees in
hypergraphs. Oper Res Lett 2003; 31: 12 – 20.
20. Wilson IJ, Weale ME, Balding DJ: Inferences from DNA data: population histories,
evolutionary processes and forensic match probabilities. J R Stat Soc Ser A 2003;
166: 155-201.
21. Matsumura S, Forster P: Generation time and effective population size in Polar
Eskimos. Proc R Soc B 2008; 275: 1501–1508.
22. Hey J, Nielsen R: Multilocus methods for estimating population sizes, migration
rates and divergence time, with applications to the divergence of Drosophila
pseudoobscura and D. perimilis. Genetics 2004; 167: 747-760.
23. Hey J: On the Number of New World Founders: A Population Genetic Portrait of the
Peopling of the Americas. PLoS Biol 2005; 3: e193 0965-0975.
21