
The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic affinities of Negrito and non-Negrito groups

Frederick Delfin¹,², Jazelyn M. Salvador¹, Gayvelline C. Calacal¹, Henry B. Perdigon¹, Kristina A. Tabbada¹, Lilian P. Villamor¹, Saturnina C. Halos¹, Ellen Gunnarsdóttir², Sean Myles¹,⁶, David A. Hughes², Shuhua Xu³, Li Jin³, Oscar Lao⁴, Manfred Kayser⁴, Matthew E. Hurles⁵, Mark Stoneking² and Maria Corazon A. De Ungria¹*

¹DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, 1101, Quezon City, Philippines;

²Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz, D04103, Leipzig, Germany;

³Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031 China;

⁴Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, Dr. Molewaterplein 50, 3000 CA Rotterdam, The Netherlands;

⁵The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom;

⁶Current affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853-2703 USA.

*Corresponding author: Maria Corazon A. De Ungria, PhD

DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, 1101, Quezon City, Philippines. Telefax: (63-2) 925-2965, E-mail: [email protected]


SUPPLEMENTARY TEXT: MATERIALS AND METHODS

POPULATION SAMPLE HISTORY

The 390 samples included in this study comprise two sets of population sample collections. One set, initially composed of 1032 samples (607 males, 425 females) representing 15 Filipino language groups (excluding Surigaonon) (Figure 1), was the result of population sampling efforts by the University of the Philippines, Natural Sciences Research Institute DNA Analysis Laboratory (UP-NSRI-DAL) from 1997 to 2004. Biological samples in this collection consisted of whole blood blotted on FTA™ cards (FTA™ Gene Guard system, Whatman Inc., Springfield Mill, Maidstone, Kent, UK), buccal cells collected by swab and transferred onto FTA™ cards, and plain buccal swabs. The FTA™ Gene Guard system was used to extract DNA from whole blood blotted on FTA™ paper and from saliva-buccal cells swabbed on FTA™ paper.¹ A phenol-chloroform method² and/or the QIAamp® DNA Blood Mini kit (QIAGEN Inc., Valencia, CA, USA) were used to extract DNA from buccal swab samples. DNA samples were stored in the UP-NSRI-DAL sample repository. Given the length of time in storage, DNA samples from some groups (Ivatan, Bugkalot, Kalangoya, CAR and Maranao) required enrichment and were therefore subjected to whole genome amplification (WGA) using the GenomiPhi™ DNA amplification kit (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) following the manufacturer's instructions. Male DNA samples in this collection were typed for Y-chromosome binary markers (Y-SNPs) at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, following a protocol adapted, with modifications, from the work of Paracchini, Hurles and colleagues³,⁴ (Supplementary Information: Figure 1, Table 1, Text – DNA typing). Typing used the ABI Prism® SNaPshot™ multiplex kit (Applied Biosystems, Foster City, CA, USA), with extension products detected by capillary electrophoresis (CE) on an ABI Prism® 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) following the manufacturer's instructions.

Another set of 142 samples (86 males, 56 females) consisted of saliva samples from the Mamanwa, Manobo and Surigaonon groups (Figure 1), collected by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA), Leipzig, Germany, in 2005. Saliva samples were processed at MPI-EVA using a high-salt DNA extraction method.⁵ Male DNA samples in this collection were typed for Y-SNPs at MPI-EVA using an in-house validated single-base extension (SBE) assay (Supplementary Information: Figure 1, Table 1, Text – DNA typing), with extension products detected by Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) Mass Spectrometry (MS) as described elsewhere.⁶

Male DNA samples from both the UP-NSRI-DAL and the MPI-EVA collections were typed for Y-chromosome microsatellite (Y-STR) markers at the UP-NSRI-DAL, Philippines, using the PowerPlex® Y system (Promega Corporation, Madison, WI, USA), with amplicons detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA), all following the manufacturers' instructions.

Because the Mamanwa and Manobo groups were independently sampled by the UP-NSRI-DAL and the MPI-EVA from the same Philippine region (Figure 1), sample overlap and/or relatedness was checked using genealogical information. All samples were also typed for autosomal microsatellite (A-STR) markers using the PowerPlex® 16 system (Promega Corporation, Madison, WI, USA), with amplicons detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA), all following the manufacturers' instructions; the Mamanwa and Manobo A-STR genotypes were therefore also used to check for sample overlap and/or relatedness.

After excluding related and/or overlapping Mamanwa and Manobo samples, as well as DNA samples that did not produce reliable results after Y-SNP and Y-STR typing, the UP-NSRI-DAL male sample set (n = 607) and the MPI-EVA male sample set (n = 86) were reduced to 317 and 73 samples, respectively, resulting in the 390 samples included in this study (Figure 1).

POPULATION SAMPLING AND SAMPLE DETAILS

Population field sampling was conducted within enclaves (sitios) and/or districts (barangays) found within municipalities or cities across the Philippines. Before sample collection, interviews were conducted to compile the family history of each potential participant. Samples were generally collected from unrelated volunteers. However, in some populations with small population sizes, most individuals were related; hence, volunteers not related within the second degree of consanguinity were included. A total of 1174 population samples (1032 UP-NSRI-DAL; 142 MPI-EVA) were collected. The following details of population sample collection list the number of collected samples, sample type, collection sites, and the individuals, institutions and organizations who assisted in sample collection from 1997 to 2005.

UP-NSRI-DAL collection

The CAR (Cordillera Administrative Region) group in this study is composed of three language groups, namely: the Ibaloi (buccal swabs from 19 males and 32 females) from Benguet and Nueva Vizcaya provinces; the Ifugao (buccal swabs from 28 males and 26 females) from Banaue municipality in Ifugao province, Nagtipunan municipality in Quirino province and Kayapa municipality in Nueva Vizcaya province; and the Kankanaey (buccal swabs from 13 males and 50 females) from Mankayan and La Trinidad, Benguet province. Samples were collected during several trips to the CAR in 1997 and 2002. After DNA typing, only two Ibaloi, one Ifugao and six Kankanaey samples gave reliable results. Given the close geographical proximity and cultural relatedness of the Ibaloi, Ifugao and Kankanaey language groups, these nine samples were pooled as one sample set and labeled as the CAR group. Dr. Michael Paul Tan and Maricar Posa were involved in the collection of samples in 1997. Armando Bogite and members of the New Tribes Mission facilitated the collection of samples in 2002.

Aeta of Zambales samples (19 male and 19 female whole blood samples) were collected in January 1998 from volunteers residing in the municipality of Castillejos, located north of the Subic Bay Metropolitan Authority (SBMA), in the province of Zambales. The collection was facilitated by Dr. Edith Tria of the Department of Health and Dr. Leo Uy.

Bugkalot samples (31 male and 21 female saliva samples) were collected from volunteers residing in different municipalities of Nueva Vizcaya and Quirino provinces during two separate trips, in January and February 2002, respectively. The collection of samples was facilitated by Rey and Lea de la Rosa, Pastor Angel de la Cruz and Jessie Magallanes, all from the New Tribes Mission, and by Armando Argayosa. Ramon Pagsiguian accompanied the sampling team to the community, while Liza Faustino and Laarni Carpina assisted in the collection of samples.

Ivatan samples (162 male and 104 female buccal swabs) were collected in March 2002 from volunteers on the main islands of Batanes province, namely Batan, Itbayat and Sabtang, through the assistance of Leticia Bayaras (Provincial Health Office), Hilda Dacles (Municipal Health Office) and Patricia Gallo of the National Commission on Indigenous Peoples (NCIP), with endorsement from Governor Vicente Gato, and facilitated by Dr. Victor Paz of the UP Archaeological Studies Program. Laarni Carpina assisted in the collection and handling of samples.

Kalangoya samples (45 male and 15 female buccal swabs) were collected in March 2002 from volunteers in the Bambang and Kayapa municipalities of Nueva Vizcaya province, with the assistance of Satur Banih and with support from Mayor Tony Dupiano, Jimmy Tindaan and Steve Baccar.

Maranao samples (28 male and 21 female buccal swabs) were collected in April 2002 from volunteers on two campuses of Mindanao State University (MSU), namely MSU-Iligan Institute of Technology (MSU-IIT) in Lanao del Norte province and MSU-Marawi, Marawi City, Lanao del Sur province. Permission to collect on campus was obtained from Dr. Olga Nuneza of MSU-IIT. Irene Estrada Macarambon-Benito of MSU-IIT facilitated volunteer participation in the study.

Aeta of Bataan samples (27 male and 14 female samples of saliva blotted on FTA™ paper) were collected in April 2003 from volunteers residing in the municipalities of Morong, Hermosa and Orion in the province of Bataan and in the area governed by the SBMA in Zambales province, through the assistance of Edmond de Jesus and with the endorsement of Amethya Concepcion of the SBMA, Dr. Roberto Pagulayan of the UP Institute of Biology, Tribal Chief Bonifacio Florentino and Tribal Chief Josefina Alejo. Joseph Salonga accompanied the sampling team to the communities. The Aeta of Bataan are distinguished from the Aeta of Zambales because the two groups speak different languages.⁷


Filipino groups residing on the island of Mindoro are collectively called Mangyan. Samples of saliva blotted on FTA™ paper were collected in April 2003 from four Mangyan groups, namely the Hanunuo (15 male and 3 female samples), Iraya (16 male and 11 female samples), Tadyawan (14 male samples and 1 female sample) and Tawbuid (14 male samples and 1 female sample), during a gathering organized by the Mangyan Church Tribal Association (MCTA) in Baco municipality and Calapan City, Oriental Mindoro province. Pastor Andino Layda, Efren Aceveda, Peter Mayot and Pastor Diokno Onday of the MCTA and tribal leader Fundador Fuentes assisted in obtaining free and prior informed consent (FPIC) from volunteers. John Richards, Lore Jean de Guito and Dothy Smith of the Overseas Mission Fellowship assisted in introducing the sampling team to the community.

Ati samples (36 male and 26 female samples of saliva blotted on FTA™ paper) were collected in January 2004 from volunteers in different districts of the provinces of Antique, Aklan, Capiz and Iloilo, all on the island of Panay, as well as from the adjacent islands of Boracay and Guimaras. Sampling was coordinated with Pastor Emeterio Allianza, Pastor Jessie Elosendo, Pastor Enoc Valencia, Chief Salvador Escuña, Chief Gregorio Elosendo, Chief Elias Valencia and Chief Enrique Martinez. Alejandro Condez accompanied the sampling team around Panay and Guimaras islands, whereas Joana Guarin of the Aklan State University helped collect samples from residents of Boracay Island.

Agta samples (44 male and 23 female samples of saliva blotted on FTA™ paper) were collected in March 2004 from volunteers residing in Iriga City and Buhi municipality, both within the province of Camarines Sur. Contact with the communities was made with the assistance of Julio Versoza of the Mt Isarog Protected Area Office, and Dennis Barroga and Belen Jacob of the National Commission on Indigenous Peoples (NCIP). Our request to visit the communities was approved by Director Lee Arroyo and Atty. Corazon Crescini of the NCIP.

Mamanwa samples (26 male and 21 female samples of saliva blotted on FTA™ paper) were collected in April 2004 from volunteers from different municipalities of the province of Surigao del Norte, through the assistance of Leonita Gorgolon, Audie Reliquette and Angelita Bullo from the Provincial Health Office and Pastor Bernard Yap of the Christ Faith Fellowship Church.

Manobo samples (70 male and 37 female samples of saliva blotted on FTA™ paper) were collected in April 2004 from volunteers from different municipalities of the province of Agusan del Sur, with the assistance of Mrs. Lily Labadan and Maria Marley Daday.

MPI-EVA collection

In August 2005, samples of saliva in lysis buffer were collected from the Mamanwa (38 male and 20 female samples), Manobo (11 male and 36 female samples) and Surigaonon (37 male samples) groups. Mamanwa and Surigaonon volunteers were from different municipalities of Surigao del Norte province, and Manobo volunteers were from different municipalities of Agusan del Norte province. Population sampling was facilitated by Mr. Fernando A. Almeda Junior, Dr. Irinetta C. Montinola and Dr. Wilfredo Sinco (all from the Surigaonon Heritage Center); Ms. Girlie Patagan (NCIP); Mrs. Elizabeth S. Larase and Ms. Juliet P. Erazo (Office of Non-Formal Education); and the Rotary Club of Surigao.
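As an arithmetic check on the counts above, the per-group numbers can be tallied by collection. The Python sketch below simply re-enters the male and female counts reported in this section (the Surigaonon female count of 0 reflects that only male samples were listed); the dictionary and function names are illustrative, not part of the study's workflow.

```python
# Male and female sample counts per group, as listed in the collection details above.
UP_NSRI_DAL = {
    "Ibaloi": (19, 32), "Ifugao": (28, 26), "Kankanaey": (13, 50),
    "Aeta of Zambales": (19, 19), "Bugkalot": (31, 21), "Ivatan": (162, 104),
    "Kalangoya": (45, 15), "Maranao": (28, 21), "Aeta of Bataan": (27, 14),
    "Hanunuo": (15, 3), "Iraya": (16, 11), "Tadyawan": (14, 1), "Tawbuid": (14, 1),
    "Ati": (36, 26), "Agta": (44, 23), "Mamanwa": (26, 21), "Manobo": (70, 37),
}
MPI_EVA = {
    "Mamanwa": (38, 20), "Manobo": (11, 36), "Surigaonon": (37, 0),
}
# The Ibaloi, Ifugao and Kankanaey were pooled into a single CAR group.
CAR_MEMBERS = {"Ibaloi", "Ifugao", "Kankanaey"}


def tally(collection):
    """Return (males, females, total) summed over all groups in a collection."""
    males = sum(m for m, _ in collection.values())
    females = sum(f for _, f in collection.values())
    return males, females, males + females


def n_analysis_groups(collection):
    """Count groups after pooling the three Cordillera language groups into CAR."""
    others = [g for g in collection if g not in CAR_MEMBERS]
    has_car = any(g in collection for g in CAR_MEMBERS)
    return len(others) + (1 if has_car else 0)


if __name__ == "__main__":
    print("UP-NSRI-DAL:", tally(UP_NSRI_DAL))
    print("MPI-EVA:", tally(MPI_EVA))
    print("Groups in UP-NSRI-DAL set:", n_analysis_groups(UP_NSRI_DAL))
```

Run directly, it reports (607, 425, 1032) for the UP-NSRI-DAL collection and (86, 56, 142) for the MPI-EVA collection, matching the totals given above (1174 overall), and 15 groups once the three Cordillera language groups are pooled as CAR.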


DNA TYPING

Multiplexes I.1 to III.19 (Supplementary Information: Figure 1 and Table 1)

Specific-PCR multiplex reactions were performed in 10 microliter (µl) volumes, each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster City, CA, USA), 4 millimolar (mM) magnesium chloride (MgCl₂), 0.4 mM deoxyribonucleotide triphosphate mix (dNTP mix), 0.08–0.24 µM primer mix (Supplementary Table 1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster City, CA, USA) and either 2 µl of DNA template or one FTA™ punch, with the following thermocycling parameters: 94 °C for 9 minutes (min); 15 cycles of 94 °C for 30 seconds (sec), 59 °C for 30 sec and 72 °C for 60 sec; and a final 72 °C for 3 min.
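To make the reaction composition concrete, the sketch below computes per-reaction and master-mix pipetting volumes for the 10 µl specific-PCR from the final concentrations listed above, using C1V1 = C2V2. The stock concentrations (10X buffer, 25 mM MgCl₂, 10 mM dNTP mix on the same basis as the 0.4 mM final, a 10X primer mix and 5 U/µl enzyme) are assumptions for illustration, not values stated in this protocol, and the 10% pipetting overage is likewise hypothetical.

```python
REACTION_UL = 10.0   # final reaction volume (from the protocol)
TEMPLATE_UL = 2.0    # 2 ul DNA template (from the protocol)

# component: (ASSUMED stock concentration, final concentration from the protocol)
COMPONENTS = {
    "PCR buffer (X)": (10.0, 1.0),   # assumed 10X stock
    "MgCl2 (mM)": (25.0, 4.0),       # assumed 25 mM stock
    "dNTP mix (mM)": (10.0, 0.4),    # assumed 10 mM stock
    "primer mix (X)": (10.0, 1.0),   # hypothetical 10X primer mix at the listed final concentrations
}
TAQ_UNITS = 0.5            # units per reaction (from the protocol)
TAQ_STOCK_U_PER_UL = 5.0   # assumed enzyme stock


def per_reaction_volumes():
    """Volumes (ul) per reaction from C1*V1 = C2*V2, with water to bring the mix to volume."""
    vols = {name: final * REACTION_UL / stock for name, (stock, final) in COMPONENTS.items()}
    vols["AmpliTaq Gold"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["water"] = REACTION_UL - TEMPLATE_UL - sum(vols.values())
    if vols["water"] < 0:
        raise ValueError("components exceed the reaction volume")
    return vols


def master_mix(n_reactions, overage=0.1):
    """Scale per-reaction volumes (template excluded) for n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(v * factor, 2) for name, v in per_reaction_volumes().items()}


if __name__ == "__main__":
    for name, ul in per_reaction_volumes().items():
        print(f"{name:>16}: {ul:.2f} ul")
```

With these assumed stocks, each reaction takes 1.0 µl buffer, 1.6 µl MgCl₂, 0.4 µl dNTP mix, 1.0 µl primer mix and 0.1 µl enzyme, leaving 3.9 µl of water alongside the 2 µl of template; when an FTA™ punch is used instead, the 2 µl template volume would be made up with water.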

Because universal tags, or "ZIP code" sequences³ (Supplementary Table 1), were incorporated into the specific-PCR products, uniform concentrations of specific-PCR products in the multiplex were obtained through a second amplification using high concentrations of "ZIP code" primers. These reactions were performed in 20 µl volumes, each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster City, CA, USA), 4 mM MgCl₂, 1 µM each of ZIP code primer A and primer B (Supplementary Table 1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster City, CA, USA) and 10 µl specific-PCR product, with the following thermocycling parameters: 94 °C for 9 min; 30–34 cycles of 94 °C for 30 sec, 55 °C for 30 sec and 72 °C for 60 sec; and a final 72 °C for 3 min.

PCR products (specific-PCR/ZIP reaction products) were cleaned up in 12 µl reactions, each containing 2 Units of shrimp alkaline phosphatase (SAP) enzyme, 1.5 Units of Exonuclease I (Exo I) enzyme and 10 µl PCR product, with the following incubation parameters: 37 °C for 1 hour (hr) and 80 °C for 20 min.


Single-base extension (SBE) reactions were performed in 5 µl volumes using the ABI Prism® SNaPshot™ multiplex kit (Applied Biosystems, Foster City, CA, USA) following the manufacturer's instructions. Each SBE reaction contained 1X SNaPshot™ reaction mix, 0.2–0.6 µM extension primer mix (Supplementary Table 1) and 0.5 µl of cleaned PCR product, with the following thermocycling parameters: 25 cycles of 96 °C for 10 sec, 50 °C for 5 sec and 60 °C for 30 sec. SBE products were detected by CE on an ABI Prism® 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) following the manufacturer's instructions.

Cluster I and II Multiplexes (Supplementary Information: Figure 1 and Table 1)

Cluster I specific-PCR multiplex reactions were performed in 25 µl volumes, each containing 1X PCR Buffer, 4 mM MgCl₂, 0.5 mM dNTP mix, 0.015–0.23 µM each of forward primer mix and reverse primer mix (Supplementary Table 1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster City, CA, USA) and 4 µl DNA template, with the following thermocycling parameters: 95 °C for 15 min; 36 cycles of 95 °C for 20 sec, 59 °C for 30 sec and 72 °C for 60 sec; and a final 72 °C for 3 min.

Specific-PCR conditions for Cluster II were the same as for Cluster I, except that 0.12 µM each of forward primer mix and reverse primer mix was used, with the following thermocycling parameters: 95 °C for 15 min; 38 cycles of 95 °C for 20 sec, 64 °C for 30 sec and 72 °C for 60 sec; and a final 72 °C for 3 min.
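Because Cluster II is specified only by its differences from Cluster I, it can help to write the programs as data. This Python sketch encodes the thermocycling steps described above and derives Cluster II from Cluster I by changing only the cycle number and annealing temperature. The CyclingProgram class is a hypothetical helper, and its time estimate ignores ramping between temperatures.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Step = Tuple[float, int]  # (temperature in degrees C, hold time in seconds)


@dataclass(frozen=True)
class CyclingProgram:
    initial: Step
    cycles: int
    denature: Step
    anneal: Step
    extend: Step
    final: Step

    def hold_seconds(self) -> int:
        """Total time held at temperature; ramping between steps is ignored."""
        per_cycle = self.denature[1] + self.anneal[1] + self.extend[1]
        return self.initial[1] + self.cycles * per_cycle + self.final[1]


# Cluster I specific PCR, as described above.
CLUSTER_I = CyclingProgram(
    initial=(95, 15 * 60), cycles=36,
    denature=(95, 20), anneal=(59, 30), extend=(72, 60),
    final=(72, 3 * 60),
)
# Cluster II differs only in cycle number and annealing temperature.
CLUSTER_II = replace(CLUSTER_I, cycles=38, anneal=(64, 30))
# Specific PCR for Multiplexes I.1 to III.19, for comparison.
MULTIPLEX_SPECIFIC = CyclingProgram(
    initial=(94, 9 * 60), cycles=15,
    denature=(94, 30), anneal=(59, 30), extend=(72, 60),
    final=(72, 3 * 60),
)

if __name__ == "__main__":
    for name, prog in [("Cluster I", CLUSTER_I), ("Cluster II", CLUSTER_II),
                       ("Multiplex specific PCR", MULTIPLEX_SPECIFIC)]:
        print(f"{name}: {prog.hold_seconds() / 60:.1f} min at temperature")
```

Total hold times come to 84.0 min for Cluster I, about 87.7 min for Cluster II and 42.0 min for the Multiplex I.1–III.19 specific PCR.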

PCR products were cleaned up in 10 µl reactions, each containing 0.32 Unit of SAP enzyme, 0.2 Unit of Exo I enzyme and 8 µl PCR product, with the following incubation parameters: 37 °C for 1 hr and 80 °C for 20 min. PCR products were then subjected to an SBE assay, followed by detection of SBE products by MALDI-TOF MS as described previously.⁶

DATA ANALYSES

The reference data set

The reference data set was assembled from previously published works⁸⁻¹² and is composed of 1,756 males from 60 groups representing five Asia-Pacific population groups (Figure 1). These population groups, the populations comprising each group and their respective population codes are as follows: East Asia, composed of Korea (KOR, n = 21), China (CHI, n = 36), Han Chinese from Taiwan (TCH, n = 19), Ami (AMI, n = 9), Ata (ATA, n = 10), Bunun (BUN, n = 10), Paiwan (PAI, n = 12) and Vietnam (VTN, n = 6); Southeast Asia, composed of Hiri (MO1, n = 20), Ternate (MO2, n = 13), South Borneo (SBO, n = 40), Sumatra (SUM, n = 55), Malaysia (MAL, n = 17), Java (JAV, n = 53), Flores (FLO, n = 73), Adonara (ADR, n = 96), Alor (ALR, n = 34), East Timor (ETR, n = 48), Roti (ROT, n = 11), Lembata (LMB, n = 31), Pantar (PNT, n = 38) and Solor (SLR, n = 43); Melanesia, composed of Ketengban (KET, n = 19), Una (UNA, n = 46), Yali (YAL, n = 5), Dani (DAN, n = 12), Lani (LAN, n = 12), Citak (CIT, n = 28), Kombai (KOM, n = 2), Awyu (AWY, n = 10), Asmat (ASM, n = 20), Mappi (MAP, n = 10), Korowai (KRW, n = 11), Muyu (MYU, n = 8), Papua New Guinea (PNG) highlands (PHL, n = 31), PNG north coast (NCo, n = 16), PNG south coast (SCo, n = 17), Trobriand (TRO, n = 52), Bereina (BRN, n = 35), Kapuna (KAP, n = 46), Tolai New Britain (TOL, n = 19), Seimat–Wuvulu (S/W, n = 11), Andra–Hus (A/H, n = 20), Kurti (Kur, n = 18), Lele (Lel, n = 24), Mokerang (Mok, n = 5), Nyindrou (Nyi, n = 17), Ere–Kele (E/K, n = 14), Nali (Nal, n = 18) and Titan (Tit, n = 21); Fiji (FIJ, n = 101); Polynesia, composed of Cook Islands (COK, n = 66), Futuna (FUT, n = 50), Niue (NIE, n = 8), Tokelau (TOK, n = 6), Tonga (TON, n = 28), Tuvalu (TUV, n = 100) and West Samoa (WES, n = 60); and Australia, composed of Arnhem Land (AS1, n = 60) and Great Sandy Desert (AS2, n = 35). To obtain suitable sample sizes (of samples that share haplogroups with FE groups) for correspondence analysis (CA, Figure 4), Melanesian populations were further grouped into the Admiralty Islands (ADM: S/W, A/H, Kur, Lel, Mok, Nyi, E/K and Tit), East New Guinea (ENG: PHL, NCo, SCo, TRO, BRN and TOL) and West New Guinea (WNG: KET, DAN, CIT, KOM, ASM, MAP and KRW).
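The sample sizes listed above can be checked, and the Melanesian regroupings summarized, with a short Python sketch. It re-enters the reference populations and their n values from this paragraph and pools ADM, ENG and WNG exactly as listed. Note that these pooled totals count all males in the member populations, whereas the CA used only samples sharing haplogroups with FE groups, so the numbers actually entering Figure 4 would be smaller.

```python
REFERENCE_N = {
    # East Asia
    "KOR": 21, "CHI": 36, "TCH": 19, "AMI": 9, "ATA": 10, "BUN": 10, "PAI": 12, "VTN": 6,
    # Southeast Asia
    "MO1": 20, "MO2": 13, "SBO": 40, "SUM": 55, "MAL": 17, "JAV": 53, "FLO": 73,
    "ADR": 96, "ALR": 34, "ETR": 48, "ROT": 11, "LMB": 31, "PNT": 38, "SLR": 43,
    # Melanesia
    "KET": 19, "UNA": 46, "YAL": 5, "DAN": 12, "LAN": 12, "CIT": 28, "KOM": 2,
    "AWY": 10, "ASM": 20, "MAP": 10, "KRW": 11, "MYU": 8,
    "PHL": 31, "NCo": 16, "SCo": 17, "TRO": 52, "BRN": 35, "KAP": 46, "TOL": 19,
    "S/W": 11, "A/H": 20, "Kur": 18, "Lel": 24, "Mok": 5, "Nyi": 17, "E/K": 14,
    "Nal": 18, "Tit": 21,
    # Fiji
    "FIJ": 101,
    # Polynesia
    "COK": 66, "FUT": 50, "NIE": 8, "TOK": 6, "TON": 28, "TUV": 100, "WES": 60,
    # Australia
    "AS1": 60, "AS2": 35,
}

# Pooled Melanesian groups used for the correspondence analysis, as listed in the text.
SUBREGIONS = {
    "ADM": ["S/W", "A/H", "Kur", "Lel", "Mok", "Nyi", "E/K", "Tit"],
    "ENG": ["PHL", "NCo", "SCo", "TRO", "BRN", "TOL"],
    "WNG": ["KET", "DAN", "CIT", "KOM", "ASM", "MAP", "KRW"],
}


def pooled_sizes(sizes, groups):
    """Sum the sample sizes of the member populations of each pooled group."""
    missing = [p for members in groups.values() for p in members if p not in sizes]
    if missing:
        raise KeyError(f"unknown population codes: {missing}")
    return {name: sum(sizes[p] for p in members) for name, members in groups.items()}


if __name__ == "__main__":
    print(len(REFERENCE_N), "populations,", sum(REFERENCE_N.values()), "males")
    print(pooled_sizes(REFERENCE_N, SUBREGIONS))
```

This reproduces 60 populations and 1,756 males, and gives pooled totals of 130 (ADM), 170 (ENG) and 102 (WNG).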

Reconciling the Filipino data with the reference data set

The reference data set used the haplogroup name C-RPS4Y, while the Filipino data used the haplogroup name C-M130 (Supplementary Information: Figure 1, Tables 1 and 2); however, both names refer to the same NRY binary marker.¹³,¹⁴ Similarly, the reference data set was typed for haplogroup O-M110¹¹,¹² and the Filipino data set for O-M50 (Supplementary Information: Figure 1, Tables 1 and 2); these markers define the same haplogroup lineage.¹⁴ Hence, to avoid confusion in haplogroup names and to facilitate comparison of haplogroups between the Filipino and reference data sets, the names used in the reference data (C-RPS4Y and O-M110) were adopted. The reference data set was also typed for O-M324, a subhaplogroup of O-M122, whereas the Filipino data set was not. For compatibility in the analyses, O-M324 data were therefore pooled with O-M122 in the reference data set.
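The renaming and pooling steps amount to a relabel-then-merge operation on haplogroup counts. The Python sketch below is illustrative only: the two mappings come from this paragraph, while the function name and the example counts are hypothetical.

```python
from collections import Counter

# Filipino haplogroup names -> names used in the reference data set.
RENAME_FILIPINO = {"C-M130": "C-RPS4Y", "O-M50": "O-M110"}
# Reference subhaplogroup pooled into its parent, because the Filipino set was not typed for it.
POOL_REFERENCE = {"O-M324": "O-M122"}


def harmonize(counts, mapping):
    """Relabel haplogroups via `mapping` and merge counts that end up under the same label."""
    merged = Counter()
    for haplogroup, n in counts.items():
        merged[mapping.get(haplogroup, haplogroup)] += n
    return dict(merged)


if __name__ == "__main__":
    filipino = {"C-M130": 3, "O-M50": 5, "K-M9": 2}   # hypothetical counts
    reference = {"O-M324": 4, "O-M122": 6}            # hypothetical counts
    print(harmonize(filipino, RENAME_FILIPINO))  # {'C-RPS4Y': 3, 'O-M110': 5, 'K-M9': 2}
    print(harmonize(reference, POOL_REFERENCE))  # {'O-M122': 10}
```

Haplogroups absent from a mapping pass through unchanged, so the same function serves both the renaming of the Filipino data and the pooling of the reference data.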

For compatibility in the analysis of Y-STR haplotypes between the Filipino data and the reference data set, seven overlapping Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) were used in all analyses. The DYS389 locus produces two fragments that overlap due to a duplicated priming site for the forward primer.¹⁵ The DYS389 fragments are distinguishable by size: DYS389I is the shorter fragment, and DYS389II is the longer fragment, which includes the DYS389I sequence. In practice, DYS389II alleles have been called in two ways: with the DYS389I allele included (DYS389II+I) or with the DYS389I allele subtracted (DYS389II-I). Which convention is used differs across laboratory typing systems (in-house developed or commercial kits). The Filipino DYS389 data were generated using the PowerPlex® Y system (Promega Corporation, Madison, WI, USA), which yields DYS389II+I data, whereas the reference data set uses DYS389II-I. Hence, for compatibility, the Filipino DYS389I allele was first subtracted from DYS389II to give DYS389II-I, after which three repeats were subtracted from DYS389I (a correction particular to the in-house typing system used for the reference data set) to give a compatible DYS389I allele. For example, the original Filipino calls DYS389I = 13 and DYS389II = 27 become DYS389I = 10 and DYS389II = 14 in reference-compatible form.
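The DYS389 conversion can be written as a small function. In this Python sketch (the function and constant names are hypothetical), the order of operations follows the text: DYS389II-I is computed from the original DYS389I call before the three-repeat shift is applied.

```python
# Repeat offset specific to the in-house typing system of the reference data set.
REFERENCE_DYS389I_OFFSET = 3


def to_reference_dys389(dys389i, dys389ii_plus_i, offset=REFERENCE_DYS389I_OFFSET):
    """Convert PowerPlex Y DYS389 calls to reference-compatible calls.

    Returns (DYS389I, DYS389II-I):
      1. DYS389II-I = DYS389II (which includes DYS389I) minus the original DYS389I call.
      2. DYS389I is then shifted down by `offset` repeats.
    """
    dys389ii_minus_i = dys389ii_plus_i - dys389i  # uses the unshifted DYS389I
    return dys389i - offset, dys389ii_minus_i


if __name__ == "__main__":
    print(to_reference_dys389(13, 27))  # (10, 14), the worked example in the text
```

Applying the shift first would give 27 - 10 = 17 rather than 14 for DYS389II-I, which is why the subtraction uses the unshifted DYS389I value.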

Network Analyses


Network analyses¹⁶ were performed using Network version 4.510 and Network Publisher version 1.1.0.6 (http://fluxus-engineering.com). A network weighting scheme¹⁷ based on Y-STR locus-specific mutation rates was used: for the Y-STR loci in the order DYS19:DYS389I:DYS389II-I:DYS390:DYS391:DYS392:DYS393, the locus-specific weights were 4:4:3:3:2:12:10.¹⁷ As for the reference data set,¹⁰ initial analyses were performed using the Median-Joining (MJ) algorithm.¹⁶ However, this generated complex networks that were difficult to visualize and interpret, so network reduction schemes were applied. To reduce network complexity, the Y-STR data were first processed with the Reduced-Median (RM) algorithm,¹⁸ and the RM output was then processed with the MJ algorithm. The RM-MJ output was then post-processed using the Maximum Parsimony (MP) algorithm,¹⁹ selecting the option "single, arbitrary near-optimal tree within network". The RM and MJ algorithms generate various non-parsimonious links; the MP algorithm eliminates these links, simplifying the network. Note, however, that the MP calculation yields a single network that is only one of many equally parsimonious networks. In this study, the RM-MJ-MP network was therefore compared with the initial MJ network to ensure that the simplified network was representative of the complex network.
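To illustrate what the locus weights mean, the Python sketch below computes a weighted stepwise distance between two 7-locus haplotypes, charging each one-repeat difference at a locus that locus's weight. This is an assumption about how weights scale mutational steps, offered for intuition only; it is not the Network software's internal code, and the example haplotypes are hypothetical.

```python
LOCI = ("DYS19", "DYS389I", "DYS389II-I", "DYS390", "DYS391", "DYS392", "DYS393")
WEIGHTS = (4, 4, 3, 3, 2, 12, 10)  # locus-specific weights, in the order of LOCI


def weighted_distance(h1, h2, weights=WEIGHTS):
    """Weighted stepwise distance: sum over loci of weight x |repeat difference|."""
    if not (len(h1) == len(h2) == len(weights)):
        raise ValueError("haplotypes and weights must cover the same loci")
    return sum(w * abs(a - b) for w, a, b in zip(weights, h1, h2))


if __name__ == "__main__":
    # Hypothetical haplotypes (repeat counts in LOCI order), not data from this study.
    hap_a = (15, 10, 14, 23, 10, 13, 13)
    hap_b = (15, 10, 14, 24, 10, 13, 12)
    print(weighted_distance(hap_a, hap_b))  # 3*1 (DYS390) + 10*1 (DYS393) = 13
```

Under this scoring a one-repeat difference at DYS392 (weight 12) costs six times as much as one at DYS391 (weight 2), so a reconstruction minimizing weighted length would favor explaining haplotype differences through changes at the lower-weighted loci, which the scheme treats as faster mutating.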

Estimation of haplogroup coalescent times (Time since the Most Recent Common Ancestor, TMRCA)

The BATWING program²⁰ (http://www.mas.ncl.ac.uk/~nijw/) was used to estimate the TMRCA of each Y-SNP haplogroup using both Y-SNP and Y-STR data. Y-SNP data were used as Unique Event Polymorphisms (UEP sites) to constrain the gene genealogy model, and Y-STR data were analyzed under a Step-wise Mutation Model (SMM). Y-STR mutation rates were modeled using a gamma distribution with parameters alpha and beta [gamma (α, β)]. The parameters for DYS19 (5, 2763), DYS389I (5, 2192), DYS389II-I (6, 2192), DYS390 (12, 2233), DYS391 (10, 2182), DYS392 (1, 2182) and DYS393 (1, 2182) were those previously used for the reference data set.¹⁰ Population structure was modeled as a substructured population (14 Filipino groups with sample sizes > 10). Population size was modeled as initially constant, followed by exponential growth. Three independent Markov Chain Monte Carlo (MCMC) runs were performed, each with a different random number seed, a total of 10⁸ MCMC chains and a 10% burn-in period. Built-in BATWING functions were used to evaluate the MCMC runs. The 95% posterior density intervals were computed, and TMRCA point estimates (in generations) were converted to years using a generation time of 30 years. A generation time of 30 years was previously used for the reference data set;¹⁰ apart from ensuring comparable TMRCA estimates, a 30-year generation time for males (father-son intervals) has been found appropriate for recent agricultural societies across the world.²¹
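The gamma priors and the generation-time conversion can be summarized numerically. The Python sketch below assumes a shape-rate parameterization, so that the prior mean is α/β; that reading is an assumption, not stated in the text. Under it, the listed priors correspond to mean mutation rates of roughly 4.6 × 10⁻⁴ (DYS392, DYS393) to 5.4 × 10⁻³ (DYS390) per generation.

```python
GAMMA_PRIORS = {  # locus: (alpha, beta), as listed above
    "DYS19": (5, 2763),
    "DYS389I": (5, 2192),
    "DYS389II-I": (6, 2192),
    "DYS390": (12, 2233),
    "DYS391": (10, 2182),
    "DYS392": (1, 2182),
    "DYS393": (1, 2182),
}
GENERATION_TIME_YEARS = 30


def prior_mean_rate(alpha, beta):
    """Mean of gamma(alpha, beta), ASSUMING a shape-rate parameterization (mean = alpha / beta)."""
    return alpha / beta


def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a TMRCA (or an interval bound) from generations to years."""
    return generations * generation_time


if __name__ == "__main__":
    for locus, (a, b) in GAMMA_PRIORS.items():
        print(f"{locus:>11}: prior mean {prior_mean_rate(a, b):.2e} per generation")
    # A hypothetical TMRCA of 1000 generations:
    print(generations_to_years(1000), "years")
```

The hypothetical TMRCA of 1000 generations converts to 30,000 years; the bounds of a 95% posterior interval are converted in the same way.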

Estimation of divergence time and migration rates

Because there appear to be signals of genetic links between several FEN groups (Aeta-Bataan, Aeta-Zambales and Agta) and indigenous Australians (Arnhem Land and Great Sandy Desert) (Figure 3: C-RPS4Y and K-M9; Figure 4), divergence times and migration rates between these groups were estimated from haplotype data (7 Y-STR loci) through pairwise, simulation-based analyses using the IM program²² (http://genfaculty.rutgers.edu/hey/software).


All information on the IM program was obtained from the program documentation (Introduction to the IM and IMa computer programs, March 5, 2007; and IM Documentation, March 5, 2007) available at the program website. The IM program author, Jody Hey, and the IM Google group (http://groups.google.com/group/Isolation-with-Migration) provided valuable support: most of the questions and concerns that an IM program user would have may already have been addressed and posted at the group site.

One pair of populations (Australia, FEN group) was used for initial program testing and to search for appropriate IM run parameters. To test whether the IM program ran properly, initial parameter settings were adopted from the IM Documentation (http://genfaculty.rutgers.edu/hey/software#IMa2): Step-wise Mutation Model (SMM) for STR data; population mutation parameters (q1 and q2) = 10; migration rates (m1 and m2) = 10; divergence time (t) = 10. Retaining these initial settings, the number of MCMC iterations required to produce reasonable results was evaluated by varying the number of MCMC iterations (10⁵–10⁷), with 10% burn-in periods. Reasonable results were those indicating MCMC convergence: unimodal posterior distributions for the parameters being estimated, trend plots showing no trend, low autocorrelation values, and effective sample size (ESS) values > 50 for the t parameter.²³ Because the t parameter shows the slowest rate of mixing, a minimum ESS > 50 for this parameter is acceptable.²³ Using STR data requires multiple coupled MCMC chains (Metropolis coupling), which in turn require chain heating schemes (IM Documentation). The following IM run parameters, previously discussed and posted at the IM Google group site, were therefore incorporated into the analyses: 30 coupled chains (Metropolis coupling) with a geometric heating scheme, using 0.95 and 0.9 for the first (-g1) and second (-g2) heating parameters, respectively. With the parameters enumerated so far, credible intervals for the t parameter were very wide (wider than the values reported in Table 4). To refine the analyses further, the t parameter was reduced to t = 5, and the population split parameter (s), which accounts for changes in population size, was incorporated. A final test for the appropriate MCMC run length was performed using 3 × 10⁷, 6 × 10⁷ and 9 × 10⁷ iterations, each with 10% burn-in periods. No change in results was observed beyond 6 × 10⁷ MCMC iterations.
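The ESS criterion can be illustrated with a generic estimator. The Python sketch below computes ESS as N / (1 + 2 Σ ρₖ), summing sample autocorrelations ρₖ until the first non-positive lag. IM reports its own ESS values; this estimator is a common textbook form, not IM's implementation. The AR(1) trace generator exists only to produce synthetic, autocorrelated demonstration traces.

```python
import random


def effective_sample_size(trace, max_lag=None):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    centered = [v - mean for v in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace is constant")
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # stop once autocorrelation has decayed into noise
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def ar1_trace(n, phi, seed):
    """Synthetic autocorrelated trace (AR(1) process) for demonstration only."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    for phi in (0.0, 0.9):
        ess = effective_sample_size(ar1_trace(2000, phi, seed=1))
        print(f"phi={phi}: ESS={ess:.0f}, passes ESS > 50: {ess > 50}")
```

A strongly autocorrelated trace (φ = 0.9) of 2000 samples has an ESS far below its length (about 105 in theory), which is why a slowly mixing parameter such as t needs long runs to pass the ESS > 50 threshold.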

Once an appropriate set of IM run parameters had been identified, pairwise population comparisons were performed with the following parameters: mutation model = SMM; q1 and q2 = 10; m1 and m2 = 10; t = 5; s = 0.2; 30 coupled chains with a geometric heating scheme (g1 = 0.95; g2 = 0.9); generation time = 30 years; burn-in = 6 × 10⁶; total run = 6 × 10⁷ iterations. The use of a 30-year generation time is discussed above under Estimation of haplogroup coalescent times. For each pairwise population comparison, three independent IM runs with the same parameter settings but different random number seeds were performed. Convergence on the stationary distribution was considered to have been reached when each run had a minimum ESS > 50 for the t parameter and the independent runs generated similar distributions, as recommended previously.²³ The peak (mode) of the posterior distribution of each estimated parameter (i.e. t, m1 and m2) has the highest probability, analogous to a maximum likelihood estimate, and was therefore taken as the point estimate of the parameter.
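The acceptance rule and point estimate in this paragraph can be sketched as follows. In this Python sketch, each run is summarized by its ESS for t and a histogram of the t posterior, and the mode is taken as the histogram peak. The phrase "similar distributions" is approximated, as a hypothetical criterion not stated in the text, by requiring the posterior modes of the independent runs to agree within 10%.

```python
def posterior_mode(bin_centers, densities):
    """Return the bin center with the highest posterior density (the histogram peak)."""
    if not bin_centers or len(bin_centers) != len(densities):
        raise ValueError("need matching, non-empty bins and densities")
    peak = max(range(len(densities)), key=densities.__getitem__)
    return bin_centers[peak]


def runs_converged(runs, min_ess=50.0, rel_tol=0.1):
    """Accept independent runs if every run has ESS(t) > min_ess and their t modes
    agree within rel_tol (rel_tol is a HYPOTHETICAL stand-in for 'similar distributions')."""
    if any(run["ess_t"] <= min_ess for run in runs):
        return False
    modes = [posterior_mode(run["t_bins"], run["t_density"]) for run in runs]
    return max(modes) - min(modes) <= rel_tol * max(modes)


if __name__ == "__main__":
    bins = [0.25, 0.75, 1.25, 1.75]
    runs = [  # hypothetical summaries of three runs with different seeds
        {"ess_t": 120, "t_bins": bins, "t_density": [0.1, 0.5, 0.3, 0.1]},
        {"ess_t": 95, "t_bins": bins, "t_density": [0.1, 0.6, 0.2, 0.1]},
        {"ess_t": 210, "t_bins": bins, "t_density": [0.2, 0.4, 0.3, 0.1]},
    ]
    print(runs_converged(runs))  # True: all ESS > 50 and every mode is at t = 0.75
```

In practice the full posterior curves of the runs would be compared, not just their peaks; the mode check is only the simplest version of that comparison.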

FUNDING


This work was supported by: the Natural Sciences Research Institute, University of the Philippines (NSR-97-2-04, NSR-00-1-03 and NSR-03-1-01); the Philippine Council for Advanced Science and Technology Research and Development, Department of Science and Technology; the Academy of Science in the Developing World, Trieste, Italy (RGA: 02-117 RG/BIO/AS); the Wellcome Trust (077014/Z/05/Z), Cambridge, UK; the Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai, China; the National Outstanding Youth Science Foundation of China (30625016); the National Science Foundation of China (30890034 and 30971577) and the 863 Program (2007AA02Z312); the Max Planck Society, Germany; and the European Commission (B7-7070/T-2000/005). Statements made herein do not reflect the views of the European Commission. M.C.A. De Ungria was a recipient of a Royal Society of London visiting fellowship. L. Jin was also supported by the Shanghai Leading Academic Discipline Project (B111) and the Center for Evolutionary Biology. S. Xu was also supported by the Science and Technology Commission of Shanghai Municipality (09ZR1436400), the Knowledge Innovation Program of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (2008KIP311), the SA-SIBS Scholarship Program and the K.C. Wong Education Foundation, Hong Kong. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

REFERENCES


1. Salvador JM, De Ungria MCA: Isolation of DNA from saliva of betel quid chewers using treated cards. J Forensic Sci 2003; 48: 794-797.

2. FBI: PCR-based typing protocols: Laboratory Manual. VA, USA: Federal Bureau of Investigation, 1994.

3. Paracchini S, Arredi B, Chalk R, Tyler-Smith C: Hierarchical high-throughput SNP genotyping of the human Y-chromosome using MALDI-TOF mass spectrometry. Nucleic Acids Res 2002; 30: e27 21-26.

4. Hurles ME, Sykes BC, Jobling MA, Forster P: The dual origin of the Malagasy in island Southeast Asia and East Africa: Evidence from maternal and paternal lineages. Am J Hum Genet 2005; 76: 894-901.

5. Quinque D, Kittler R, Kayser M, Stoneking M, Nasidze I: Evaluation of saliva as a source of human DNA for population and association studies. Anal Biochem 2006; 353: 272-277.

6. Hughes DA, Tang K, Strotmann R et al: Parallel Selection on TRPV6 in Human Populations. PLoS ONE 2008; 3: e1686 1681-1613.

7. Gordon RG, Jr. (ed): Ethnologue: Languages of the World, Fifteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/. 2005.

8. Kayser M, Brauer S, Weiss G, Schiefenhövel W, Underhill PA, Stoneking M: Independent Histories of Human Y Chromosomes from Melanesia and Australia. Am J Hum Genet 2001; 68: 173-190.

9. Kayser M, Brauer S, Weiss G et al: Reduced Y-chromosome, but not mitochondrial DNA, diversity in human populations from West New Guinea. Am J Hum Genet 2003; 72: 281-302.

10. Kayser M, Brauer S, Cordaux R et al: Melanesian and Asian Origins of Polynesians: mtDNA and Y Chromosome Gradients Across the Pacific. Mol Biol Evol 2006; 23: 2234-2244.

11. Kayser M, Choi Y, van Oven M et al: The Impact of the Austronesian Expansion: Evidence from mtDNA and Y Chromosome Diversity in the Admiralty Islands of Melanesia. Mol Biol Evol 2008; 25: 1362-1374.

12. Mona S, Grunz KE, Brauer S et al: Genetic Admixture History of Eastern Indonesia as Revealed by Y-Chromosome and Mitochondrial DNA Analysis. Mol Biol Evol 2009; 26: 1865-1877.

13. Underhill PA, Shen P, Lin AA et al: Y chromosome sequence variation and the history of human populations. Nat Genet 2000; 26: 358-361.

14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF: New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 2008; 18: 830-838.

15. Kayser M, Caglià A, Corach D et al: Evaluation of Y-chromosomal STRs: a multicenter study. Int J Legal Med 1997; 110: 125-133.

16. Bandelt H-J, Forster P, Röhl A: Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 1999; 16: 37-48.

17. Mona S, Tommaseo-Ponzetta M, Brauer S, Sudoyo H, Marzuki S, Kayser M: Patterns of Y-Chromosome Diversity Intersect with the Trans-New Guinea Hypothesis. Mol Biol Evol 2007; 24: 2546-2555.

18. Bandelt H-J, Forster P, Sykes BC, Richards MB: Mitochondrial portraits of human populations using Median Networks. Genetics 1995; 141: 743-753.

19. Polzin T, Daneshmand SV: On Steiner trees and minimum spanning trees in hypergraphs. Oper Res Lett 2003; 31: 12-20.

20. Wilson IJ, Weale ME, Balding DJ: Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. J R Stat Soc Ser A 2003; 166: 155-201.

21. Matsumura S, Forster P: Generation time and effective population size in Polar Eskimos. Proc R Soc B 2008; 275: 1501-1508.

22. Hey J, Nielsen R: Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics 2004; 167: 747-760.

23. Hey J: On the Number of New World Founders: A Population Genetic Portrait of the Peopling of the Americas. PLoS Biol 2005; 3: e193 0965-0975.
