assortative mating on ancestry-variant traits in admixed...

14
ORIGINAL RESEARCH published: 24 April 2019 doi: 10.3389/fgene.2019.00359 Edited by: Michelle Luciano, University of Edinburgh, United Kingdom Reviewed by: Yong-Kyu Kim, Howard Hughes Medical Institute (HHMI), United States Xiaoyin Li, Case Western Reserve University, United States *Correspondence: I. King Jordan [email protected] Specialty section: This article was submitted to Behavioral and Psychiatric Genetics, a section of the journal Frontiers in Genetics Received: 25 September 2018 Accepted: 04 April 2019 Published: 24 April 2019 Citation: Norris ET, Rishishwar L, Wang L, Conley AB, Chande AT, Dabrowski AM, Valderrama-Aguirre A and Jordan IK (2019) Assortative Mating on Ancestry-Variant Traits in Admixed Latin American Populations. Front. Genet. 10:359. doi: 10.3389/fgene.2019.00359 Assortative Mating on Ancestry-Variant Traits in Admixed Latin American Populations Emily T. Norris 1,2,3 , Lavanya Rishishwar 1,2,3 , Lu Wang 1,3 , Andrew B. Conley 2 , Aroon T. Chande 1,2,3 , Adam M. Dabrowski 1 , Augusto Valderrama-Aguirre 3,4 and I. King Jordan 1,2,3 * 1 School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States, 2 IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA, United States, 3 PanAmerican Bioinformatics Institute, Cali, Colombia, 4 Biomedical Research Institute, Cali, Colombia Assortative mating is a universal feature of human societies, and individuals from ethnically diverse populations are known to mate assortatively based on similarities in genetic ancestry. However, little is currently known regarding the exact phenotypic cues, or their underlying genetic architecture, which inform ancestry-based assortative mating. We developed a novel approach, using genome-wide analysis of ancestry-specific haplotypes, to evaluate ancestry-based assortative mating on traits whose expression varies among the three continental population groups – African, European, and Native American – that admixed to form modern Latin American populations. Application of this method to genome sequences sampled from Colombia, Mexico, Peru, and Puerto Rico revealed widespread ancestry-based assortative mating. We discovered a number of anthropometric traits (body mass, height, and facial development) and neurological attributes (educational attainment and schizophrenia) that serve as phenotypic cues for ancestry-based assortative mating. Major histocompatibility complex (MHC) loci show population-specific patterns of both assortative and disassortative mating in Latin America. Ancestry-based assortative mating in the populations analyzed here appears to be driven primarily by African ancestry. This study serves as an example of how population genomic analyses can yield novel insights into human behavior. Keywords: assortative mating, mate choice, genetic ancestry, admixture, population genomics, polygenic phenotypes INTRODUCTION Mate choice is a fundamental dimension of human behavior with important implications for population genetic structure and evolution (Vandenburg, 1972; Buss, 1985; Robinson et al., 2017). It is widely known that humans choose to mate assortatively rather than randomly. That is to say that humans, for the most part, tend to choose mates that are more similar to themselves than can be expected by chance. Historically, assortative mating was based largely on geography, whereby partners were chosen from a limited set of physically proximal individuals (Cavalli- Sforza et al., 1994). Over millennia, assortative mating within groups of geographically confined individuals contributed to genetic divergence between groups, and the establishment of distinct Frontiers in Genetics | www.frontiersin.org 1 April 2019 | Volume 10 | Article 359

Upload: others

Post on 02-Sep-2019

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 1

ORIGINAL RESEARCHpublished: 24 April 2019

doi: 10.3389/fgene.2019.00359

Edited by:Michelle Luciano,

University of Edinburgh,United Kingdom

Reviewed by:Yong-Kyu Kim,

Howard Hughes MedicalInstitute (HHMI), United States

Xiaoyin Li,Case Western Reserve University,

United States

*Correspondence:I. King Jordan

[email protected]

Specialty section:This article was submitted to

Behavioral and Psychiatric Genetics,a section of the journal

Frontiers in Genetics

Received: 25 September 2018Accepted: 04 April 2019Published: 24 April 2019

Citation:Norris ET, Rishishwar L, Wang L,

Conley AB, Chande AT,Dabrowski AM, Valderrama-Aguirre A

and Jordan IK (2019) AssortativeMating on Ancestry-Variant Traits

in Admixed Latin AmericanPopulations. Front. Genet. 10:359.

doi: 10.3389/fgene.2019.00359

Assortative Mating onAncestry-Variant Traits in AdmixedLatin American PopulationsEmily T. Norris1,2,3, Lavanya Rishishwar1,2,3, Lu Wang1,3, Andrew B. Conley2,Aroon T. Chande1,2,3, Adam M. Dabrowski1, Augusto Valderrama-Aguirre3,4 andI. King Jordan1,2,3*

1 School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, United States, 2 IHRC-Georgia Tech AppliedBioinformatics Laboratory, Atlanta, GA, United States, 3 PanAmerican Bioinformatics Institute, Cali, Colombia, 4 BiomedicalResearch Institute, Cali, Colombia

Assortative mating is a universal feature of human societies, and individuals fromethnically diverse populations are known to mate assortatively based on similarities ingenetic ancestry. However, little is currently known regarding the exact phenotypic cues,or their underlying genetic architecture, which inform ancestry-based assortative mating.We developed a novel approach, using genome-wide analysis of ancestry-specifichaplotypes, to evaluate ancestry-based assortative mating on traits whose expressionvaries among the three continental population groups – African, European, and NativeAmerican – that admixed to form modern Latin American populations. Application ofthis method to genome sequences sampled from Colombia, Mexico, Peru, and PuertoRico revealed widespread ancestry-based assortative mating. We discovered a numberof anthropometric traits (body mass, height, and facial development) and neurologicalattributes (educational attainment and schizophrenia) that serve as phenotypic cuesfor ancestry-based assortative mating. Major histocompatibility complex (MHC) locishow population-specific patterns of both assortative and disassortative mating in LatinAmerica. Ancestry-based assortative mating in the populations analyzed here appearsto be driven primarily by African ancestry. This study serves as an example of howpopulation genomic analyses can yield novel insights into human behavior.

Keywords: assortative mating, mate choice, genetic ancestry, admixture, population genomics,polygenic phenotypes

INTRODUCTION

Mate choice is a fundamental dimension of human behavior with important implications forpopulation genetic structure and evolution (Vandenburg, 1972; Buss, 1985; Robinson et al., 2017).It is widely known that humans choose to mate assortatively rather than randomly. That is tosay that humans, for the most part, tend to choose mates that are more similar to themselvesthan can be expected by chance. Historically, assortative mating was based largely on geography,whereby partners were chosen from a limited set of physically proximal individuals (Cavalli-Sforza et al., 1994). Over millennia, assortative mating within groups of geographically confinedindividuals contributed to genetic divergence between groups, and the establishment of distinct

Frontiers in Genetics | www.frontiersin.org 1 April 2019 | Volume 10 | Article 359

Page 2: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 2

Norris et al. Ancestry Assortative Mating in Latinos

human populations, such as the major continental populationgroups recognized today (Rosenberg et al., 2002; Li et al., 2008;Genomes Project et al., 2015).

However, the process of geographic isolation followed bypopulation divergence that characterized human evolution hasnot been strictly linear. Ongoing human migrations havecontinuously brought previously isolated populations intocontact; when this occurs, the potential exists for once isolatedpopulations to admix, thereby forming novel population groups(Hellenthal et al., 2014). Perhaps the most precipitous exampleof this process occurred in the Americas, starting just over500 years ago with the arrival of Columbus in the NewWorld (Mann, 2011). This major historical event quickly led tothe co-localization of African, European and Native Americanpopulations that had been (mostly) physically isolated for tensof thousands of years (Jordan, 2016). As can be expected, thegeographic reunification of these populations was accompanied,to some extent, by genetic admixture and the resulting formationof novel populations. This is particularly true for populations inLatin America, which often show high levels of three-way geneticadmixture between continental population groups (Wang et al.,2008; Bryc et al., 2010; Ruiz-Linares et al., 2014; Montinaro et al.,2015; Rishishwar et al., 2015).

Nevertheless, modern admixed populations are still very muchcharacterized by non-random assortative mating. Assortativemating in modern populations has been shown to rest on avariety of traits, including physical (stature and pigmentation)and neurological (cognition and personality) attributes. Forexample, numerous studies have demonstrated an influenceof similarities in height and body mass on mate choice(Allison et al., 1996; Silventoinen et al., 2003; Robinsonet al., 2017; Stulp et al., 2017). In addition, assortativemating has been observed for diverse neurological traits, suchas educational attainment, introversion/extroversion and evenneurotic tendencies (Merikangas, 1982; Mare, 1991; Hur, 2003;Salces et al., 2004; Domingue et al., 2014; Zou et al., 2015). Harderto classify traits related to personal achievement (income andoccupational status) and culture (values and political leanings)also impact patterns of assortative mating (Merikangas, 1982;Kalmijn, 1994; Kandler et al., 2012). Odor is one of themore interesting traits implicated in mate choice, and it hasbeen linked to so-called disassortative (or negative assortative)mating, whereby less similar mates are preferred. Odor-baseddisassortative mating has been attributed to differences in genesof the major histocompatibility (MHC) locus, which functionsin the immune system, based on the idea that combinationsof divergent human leukocyte antigen (HLA) alleles providea selective advantage via elevated host resistance to pathogens(Wedekind et al., 1995; Chaix et al., 2008).

The traits that influence human mate choice are shapedby multiple factors with contributions from genes (G)and the environment (E) along with gene-by-environment(GxE) interactions. Studies that consider both genes andenvironment have shown different contributions of thesefactors to assortative mating patterns in human populations.Twin studies were initially used in an effort to tease apartthe genetic and environmental contributions to mate choice

(Lykken and Tellegen, 1993). Comparison of monozygotic anddizygotic twin pairs provided the first evidence for geneticinfluences on human mate choice, with 10–30% of the variancein mate choice explained by genetics compared to 10% sharedenvironment and 60% unique environmental variance (Rushtonand Bons, 2005). Subsequent twin design studies either did notfind any strong genetic effects on patterns of assortative mating(Zietsch et al., 2011) or found genetic effects on assortativemating with very different relative contributions of genes versusenvironmental effects depending on the trait under consideration(Verweij et al., 2012). A more recent study leveraged genome-wide association study (GWAS) variants that influence height toshow even more clear evidence for genetic effects on assortativemating (Li et al., 2017).

Ancestry is a particularly important determinant of assortativemating in modern admixed populations (Sebro et al., 2010;Zaitlen et al., 2017). Studies have shown that individuals inadmixed Latin American populations tend to mate with partnersthat have similar ancestry profiles. For example, partners fromboth Mexican and Puerto Rican populations have significantlyhigher ancestry similarities than expected by chance (Rischet al., 2009; Zou et al., 2015). In addition, a number of traitsthat have been independently linked to assortative mating showancestry-specific differences in their expression (Hancock et al.,2011). Accordingly, ancestry-based mate choice has recently beenrelated to a limited number of physical (facial development) andimmune-related (MHC loci) traits (Zou et al., 2015).

The studies that have uncovered the role of genetic ancestryin assortative mating among Latinos have relied on estimatesof global ancestry fractions between mate pairs (Risch et al.,2009; Zou et al., 2015). Given the recent accumulationof numerous whole genome sequences from admixed LatinAmerican populations – along with genome sequences fromglobal reference populations (Genomes Project et al., 2015) – it isnow possible to characterize local genetic ancestry for individualsfrom admixed American populations (Moreno-Estrada et al.,2013; Homburger et al., 2015; Rishishwar et al., 2015). In otherwords, the ancestral origins for specific chromosomal regions(haplotypes) can be assigned with high confidence for admixedindividuals (Maples et al., 2013). For the first time here, wesought to evaluate the impact of local ancestry on assortativemating in admixed Latin American populations. Since the geneticvariants that influence numerous phenotypes have been mappedto specific genomic regions, we reasoned that a focus on localancestry could help to reveal the specific phenotypic drivers ofancestry-based assortative mating.

Our approach to this question entailed an integrated analysisof local genetic ancestry and the genetic architecture of a varietyof human traits thought to be related to assortative mating.Assortative mating results in an excess of homozygosity, whereasdisassortative mating yields excess heterozygosity. It follows thatassortative (or disassortative) mating based on local ancestrywould yield an excess (or deficit) of ancestry homozygosity atspecific genetic loci. In other words, for a given population, alocus implicated in ancestry-based assortative mating would bemore likely to have the same ancestry at both pairs of haploidchromosomes within individuals than expected by chance. We

Frontiers in Genetics | www.frontiersin.org 2 April 2019 | Volume 10 | Article 359

Page 3: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 3

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 1 | Genetic ancestry proportions for the admixed Latin Americanpopulations studied here. For each population, distributions and averagevalues are shown for African (blue), European (orange), and Native American(red) ancestry.

developed a test statistic – the assortative mating index (AMI) –that evaluates this prediction for individual gene loci, and weapplied it to sets of genes that function together to encodepolygenic phenotypes. We find evidence of substantial localancestry-based assortative mating, and far less disassortativemating, for four admixed Latin American populations acrossa variety of anthropometric, neurological and immune-relatedphenotypes. Our approach also allowed us to assess the specificancestry components that drive patterns of assortative anddisassortative mating in these populations.

RESULTS

Global and Local Genetic Ancestry inLatin AmericaWe compared whole genome sequences from four admixedLatin American populations, characterized as part of the 1000Genomes Project (1KGP) (Genomes Project et al., 2015) togenome sequences and whole genome genotypes from a panelof 34 global reference populations from Africa, Europe andthe Americas (Table 1 and Supplementary Figure S1). Theprogram ADMIXTURE (Alexander et al., 2009) was usedto infer the continental genetic ancestry fractions – African,European and Native American – for individuals from thefour Latin American populations (Supplementary Figure S2).Distributions of individuals’ continental ancestry fractionsillustrate the distinct ancestry profiles of the four populations(Figure 1). Puerto Rico and Colombia show the highest Europeanancestry fractions along with the highest levels of three-wayadmixture. These two populations also have the highest Africanancestry fractions, although all four populations have relativelysmall fractions of African ancestry. Peru and Mexico show moreexclusively Native American and European admixture, with Peruhaving by far the largest Native American ancestry fraction.

The program RFMix (Maples et al., 2013) was used to inferlocal African, European and Native American genetic ancestry forindividuals from the four admixed Latin American populationsanalyzed here. RFMix uses global reference populations toperform chromosome painting, whereby the ancestral origins ofspecific haplotypes are characterized across the entire genomefor admixed individuals. Only haplotypes with high confidenceancestry assignments (≥99%) were taken for subsequent analysis.Examples of local ancestry assignment chromosome paintingsfor representative admixed individuals from each population areshown in Supplementary Figure S3. The overall continentalancestry fractions for admixed genomes calculated by globaland local ancestry analysis are highly correlated, and in factvirtually identical, across all individuals analyzed here, in supportof the reliability of these approaches to ancestry assignment(Supplementary Figure S4).

Assortative Mating and Local Ancestry inLatin AmericaWe analyzed genome-wide patterns of local ancestry assignmentto assess the evidence for assortative mating based on localancestry in Latin America (Figure 2A). For each individual,the ancestry assignments for pairs of haplotypes at any givengene were evaluated for homozygosity (i.e., the same ancestryon both haplotypes) or heterozygosity (i.e., different ancestryon both haplotypes) (Figure 2B). For each gene, across allfour populations, the observed values of ancestry homozygosityand heterozygosity were compared to expected values in orderto compute gene- and population-specific AMI values. AMI iscomputed as a log odds ratio as described in the Materials andMethods. The expected values of local ancestry homozygosityand heterozygosity used for the AMI calculations are based ona Hardy-Weinberg (HW) triallelic model with the three allelefrequencies computed as the locus-specific ancestry fractions.High positive AMI values result from an excess of observedlocal ancestry homozygosity and are thereby taken to indicateassortative mating based on shared local genetic ancestry.Conversely, low negative AMI values indicate excess localancestry heterozygosity and disassortative mating. We performeda series of controls to validate the performance of the AMItest statistic and the justification of the HW model for locus-specific ancestry. These controls are described in detail in theSupplementary Methods, and results of the controls can be seenin Supplementary Figures S5–S7.

While we were interested in exploring the relationshipbetween local genetic ancestry and assortative mating, werecognized that mate choice is based on phenotypes rather thangenotypes per se. Since phenotypes are typically encoded bymultiple genes, expressed in the context of their environment,we used data from genome-wide association studies (GWAS) toidentify sets of genes that function together to encode polygenicphenotypes (Figure 2C). We combined data from several GWASdatabase sources in order to curate a collection of 105 genesets that have been linked to the polygenic genetic architectureof a variety of human traits. These gene sets range in sizefrom 2 to 212 genes and include a total of 923 unique genes

Frontiers in Genetics | www.frontiersin.org 3 April 2019 | Volume 10 | Article 359

Page 4: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 4

Norris et al. Ancestry Assortative Mating in Latinos

TABLE 1 | Human populations analyzed in this study.

Dataseta Geographical source Short n Dataseta Geographical source n

Afri

ca(n

=54

7)

1KGP Esan in Nigeria ESN 99

Nat

ive

Am

eric

an(n

=28

0)

HGDP Pima in Mexico 14

1KGP Gambian in Western Division, The Gambia GWD 113 HGDP Maya in Mexico 21

1KGP Luhya in Webuye, Kenya LWK 99 Reich et al. Tepehuano in Mexico 25

1KGP Mende in Sierra Leone MSL 85 Reich et al. Mixtec in Mexico 5

1KGP Yoruba in Ibadan, Nigeria YRI 108 Reich et al. Mixe in Mexico 17

HGDP Mandenka 22 Reich et al. Zapotec in Mexico 43

HGDP Yoruba 21 Reich et al. Kaqchikel in Guatemala 13

Eur

ope

(n=

471)

1KGP Finnish in Finland FIN 99 Reich et al. Kogi in Colombia 4

1KGP British in England and Scotland GBR 90 Reich et al. Waunana in Colombia 3

1KGP Iberian populations in Spain IBS 107 Reich et al. Embera in Colombia 5

1KGP Toscani in Italy TSI 107 Reich et al. Guahibo in Colombia 6

HGDP Russian 25 Reich et al. Piapoco in Colombia 7

HGDP Orcadian 15 Reich et al. Inga in Colombia 9

HGDP French 28 Reich et al. Wayuu in Colombia 11

Adm

ixed

(n=

347) 1KGP Colombian in Medellin, Colombia CLM 94 HGDP Karitiana in Brazil 14

1KGP Peruvian in Lima, Peru PEL 85 HGDP Suruí in Brazil 8

1KGP Mexican Ancestry in LA, California MXL 64 Reich et al. Ticuna in Brazil 6

1KGP Puerto Rican in Puerto Rico PUR 104 Reich et al. Quechua in Peru 40

Reich et al. Aymara in Bolivia 23

Reich et al. Guarani in Paraguay 6

Populations are organized into continental groups, for both ancestral and admixed Latin American populations, and the number of individuals in each population andgroup is shown. a1KGP, 1000 Genomes Project; HGDP, Human Genome Diversity Panel; (Reich et al., 2012).

(Supplementary Figure S8). We focused on phenotypes thatare known or expected to influence mate choice and therebyimpact assortative mating patterns. These phenotypes fall intothree broad categories: anthropometric traits (e.g., body shape,stature, and pigmentation), neurological traits (e.g., cognition,personality, and addiction), and immune response (HLA genes).Finally, we used a meta-analysis of the AMI values for the sets ofgenes that underlie each polygenic phenotype in order to evaluatethe impact of local ancestry on assortative mating (Figure 2D).

We compared the distributions of observed AMI valuesversus those expected under random mating to assess the overallevidence for local ancestry-based assortative mating in LatinAmerica. Expected AMI values were computed via permutationanalysis by randomly combining pairs of haplotypes into 10,000diploid individuals for each population to approximate randommating. The distribution of the expected AMI values underrandom mating is narrow and centered around 0, whereasthe observed AMI values have a far broader distribution andtend to be positive (expected AMI µ = −0.01, σ = 0.03,observed AMI µ = 0.11, σ = 0.14; Figure 3A). When all fouradmixed Latin American populations are considered together,the mean observed AMI value is significantly greater thanthe expected mean AMI under random mating (t = 18.14,P = 8.12e-56). The same trend can be seen when all four

populations are considered separately (Supplementary FigureS9). Mean observed AMI values vary substantially acrosspopulations, with Mexico showing the highest levels of localancestry-based assortative mating and Puerto Rico showing thelowest (Figure 3B). There is also substantial variation seenfor the extent of assortative mating among the three broadfunctional categories of phenotypes (Figure 3C). Local ancestry-based assortative mating is particularly variable for HLA genes,with high levels of assortative mating seen for Mexico andevidence for disassortative mating seen for Colombia and PuertoRico. Anthropometric traits tend to show higher levels of localancestry-based assortative mating across all four populationscompared to neurological traits.

Local Ancestry-Based AssortativeMating for Polygenic PhenotypesWhen considered together, observed AMI levels are enrichedfor positive values compared to the expected values based onrandomly paired haplotypes, indicative of an overall trend ofassortative mating based on local ancestry in admixed LatinAmerican populations (Figure 3A and Supplementary FigureS10). We evaluated polygenic phenotypes individually to lookfor the strongest examples of traits linked to local ancestry-based

Frontiers in Genetics | www.frontiersin.org 4 April 2019 | Volume 10 | Article 359

Page 5: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 5

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 2 | Approach used to measure assortative mating on local ancestry.(A) Local ancestry is assigned for specific haplotypes across the genome:African (blue), European (orange), and Native American (red). (B) Withinindividual genomes, genes are characterized as homozygous or heterozygousfor local ancestry. For any given population, at each gene locus, the

(Continued)

FIGURE 2 | Continuedassortative mating index (AMI) is computed from the observed and expectedcounts of homozygous and heterozygous gene pairs. (C) Data fromgenome-wide association studies (GWAS) are used to evaluate polygenicphenotypes. (D) Meta-analysis of AMI values is used to evaluate thesignificance of ancestry-based assortative mating for polygenic phenotypes.

assortative mating and to evaluate traits that show either similaror variable assortative mating trends across populations. Wecomputed AMI values for 105 polygenic phenotypes across thefour populations; the expected and observed AMI values forall traits are shown in Supplementary Figure S10. As can beseen for the overall patterns of assortative mating, individualpolygenic phenotypes show more extreme positive (for mostcases) and negative (in a few cases) AMI values in the fouradmixed Latin American populations than can be expected forrandomly mating populations.

However, as discussed previously, it is known that admixedLatin American populations mate assortatively based on geneticancestry (Risch et al., 2009; Zou et al., 2015). This can beexpected to lead to an overall excess of ancestry homozygositygenome-wide compared to expectations based on truly randommating, and we wanted to control for this as well whenanalyzing the assortative mating signals for specific traits. Todo so, we performed an additional permutation analysis bychoosing random sets of genes, of the same size as the observedtrait-specific gene sets, and then computing AMI values andtheir significance levels for the randomly permuted gene sets.This allowed us to ask whether the polygenic phenotypes thathave statistically significant AMI values show more extremedeviations than can be expected based on genome-wide signals ofancestry-based assortative mating. The results of this additionalpermutation control are described below in the context ofthe specific polygenic phenotypes that were found to havesignificant AMI values.

There are 11 polygenic phenotypes that yield statisticallysignificant AMI values with the HW-based test, after correctionfor multiple tests, indicative of local ancestry-based assortativemating for specific traits (q < 0.05; Figure 4A). A number ofother traits shown are marginally significant after correction formultiple testing. We compared the observed AMI values for thegene sets studied here to population-specific null distributionsof expected AMI values generated using randomly permutedgene sets as described in the preceding paragraph (Figure 4B).This permutation test accounts for the genome-wide deviationsfrom HW that occur due to ancestry-based assortative mating.The comparisons of the observed versus expected AMI valuesfor this control indicate that the AMI signals that we observehere are trait-specific and cannot be attributed to the elevatedlevels of ancestry similarity seen for couples in admixed LatinAmerican populations.

The majority of the statistically significant cases of assortativemating are seen in the Mexican and Peruvian populations(10 out of 11), and the anthropometric functional categoryis most commonly seen among the significant phenotypes(8 out of 11). Height and body mass index are the mostcommonly observed phenotypes among the significant cases,

Frontiers in Genetics | www.frontiersin.org 5 April 2019 | Volume 10 | Article 359

Page 6: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 6

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 3 | Overview of ancestry-based assortative mating in the fouradmixed Latin American populations analyzed here. (A) Distributions ofobserved and expected AMI values for all four populations. Inset: Meanobserved and expected AMI values (±se) for all four populations. Significancebetween mean observed and expected AMI values (P = 8.12e-56) is indicatedby two asterisks. (B) Observed and expected average AMI values (±se)across all polygenic phenotype gene sets are shown for each population.(C) Average AMI values (±se) for each population are shown for the threemain phenotype functional categories characterized here: anthropometric,neurological, and human leukocyte antigen (HLA) genes.

each appearing in two out of the four populations analyzed here(Colombia and Peru for height and Mexico and Peru for bodymass index). The only neurological traits that show significantevidence of assortative mating are schizophrenia (Mexico and

Peru) and educational attainment (Mexico). A number of otherneurological and anthropometric traits in Mexico are marginallysignificant. Puerto Rico was the only population that did notshow any individual phenotypes with significant evidence ofassortative mating, consistent with its low overall AMI values(Figure 3B and Supplementary Figure S8). A list of thesesignificant traits, including references to the literature where thetrait single nucleotide polymorphism (SNP)-associations wereoriginally reported, is provided in Supplementary Table S1.In addition to evaluating individual phenotypes for statisticallysignificant AMI values, we also looked for polygenic phenotypesthat showed the most similar or dissimilar patterns of assortativemating across the four admixed Latin American populations.The top 20 phenotypes with the highest and lowest populationvariance are shown in Figure 4C (all are statistically significant atq < 0.05). The polygenic phenotypes with the most variance inpopulation-specific AMI values show more functional diversitycompared to the phenotypes with the strongest signals forassortative mating. All three functional categories are representedamong the highly population variant phenotypes, and the highlyvariant phenotypes consist of both assortative and disassortativemating cases (specifically the HLA genes that are described inmore detail below). Neurological phenotypes are enriched amongthe variant cases, including temperament and several addiction-related phenotypes: opioid sensitivity and drinking behavior.A list of the population (in)variant traits, including referencesto the literature where the trait SNP-associations were originallyreported, is provided in Supplementary Table S1.

Given the evidence of significant local ancestry-basedassortative mating that we observed for a number of traits, weevaluated whether there were particular ancestry componentsthat were most relevant to mate choice. In other words, we askedwhether the excess counts of observed ancestry homozygosity orheterozygosity are linked to specific local ancestry assignments:African, European and/or Native American. For significantpolygenic phenotype gene sets of interest, we computed theobserved versus expected ancestry homozygosity for eachancestry separately across all genes in the set (Figure 5). Heightis an anthropometric trait for which Colombia, Mexico, and Perushow significant evidence of assortative mating after correctionfor multiple tests (q < 0.05; Figure 5A), and Puerto Ricoshows nominally significant assortative mating for this same trait(P < 0.05). In Colombia, Peru, and Puerto Rico, assortativemating for this polygenic phenotype is driven by an excess ofAfrican homozygosity, whereas in Mexico there is a lack ofAfrican homozygosity. The neurological disease schizophreniashows statistically significant assortative mating in Mexico andPeru (q < 0.05), with marginally significant values in Colombia(Figure 5B). Patterns of assortative mating for this trait in Mexicoand Peru are driven mainly by European ancestry, whereasColombia and Puerto Rico show an excess of African ancestryhomozygosity for this same trait.

Both Colombia and Puerto Rico show disassortative matingpatterns for all HLA loci (both class I and II genes) (Figure 5C).The combined AMI values for the HLA loci are only marginallysignificant but they are among the lowest AMI values seenfor any trait evaluated here (Supplementary Figure S10), and

Frontiers in Genetics | www.frontiersin.org 6 April 2019 | Volume 10 | Article 359

Page 7: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 7

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 4 | Phenotypes with statistically significant patterns of assortative mating within and among populations. (A) The top 20 phenotypes with the highest, andmost statistically significant, assortative mating values (AMI) seen within any individual population. All AMI values shown are significant at P < 0.05, and the dashedline corresponds to a false discovery rate q-value cutoff of 0.05. (B) The observed AMI values for all trait-specific gene sets in each population (red lines) arecompared to distributions of expected AMI values (black lines) based on random permutations of 10,000 gene sets. The top panels show the overall distributions ofobserved and permuted AMI values for each population, with steep peaks around the mean values for expected AMI. The bottom panels show a more narrow rangeof observed and expected AMI values for each population, which are centered around the population-specific mean expected AMI values. (C) The top 20phenotypes with the highest or lowest, and most statistically significant, AMI variance levels across populations. Across population variance levels are normalizedusing the average AMI population variance level for all phenotypes. All AMI variance levels shown are significant at q < 0.05. The highest variance (most dissimilarpatterns) of the AMI are at the top, while the lowest variance (most similar patterns) of AMI are at the bottom.

they are also highly variable among populations (Figure 4C).HLA loci in Colombia and Puerto Rico show a distinct lackof ancestry homozygosity for almost all ancestry components(Figure 5C). Mexico and Peru, on the other hand, have someevidence for assortative mating for the HLA loci; Mexico hasthe highest estimates of ancestry homozygosity at HLA locifor any of the four populations, and Peru has an excess ofEuropean and Native American ancestry homozygosity anda deficit of African homozygosity for these genes. Similar

results for two additional anthropometric phenotypes are shownin Supplementary Figure S11: body mass index and facialdevelopment. These phenotypes show assortative mating inall four populations, with varying components of ancestralhomozygosity driving the relationships. When these resultsare considered together, African ancestry consistently showsthe strongest effect on driving assortative and disassortativemating in admixed Latin American populations (Figure 5 andSupplementary Figure S11).

Frontiers in Genetics | www.frontiersin.org 7 April 2019 | Volume 10 | Article 359

Page 8: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 8

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 5 | Individual examples of ancestry-based assortative and disassortative mating. Results of meta-analysis of (dis)assortative mating on polygenicphenotypes along with their ancestry drivers are shown for (A) an anthropometric trait: Height, (B) a neurological trait: Schizophrenia, and (C) the immune-relatedHLA class I and II genes. The meta-analysis plots show pooled AMI odds ratio values along with their 95% CIs and P-values. Stars indicate false discovery rateq-values < 0.05. The ancestry driver plots show the extent to which individual ancestry components – African (blue), European (orange), and Native American (red) –have an excess (>0) or a deficit (<0) of homozygosity.

We further evaluated the extent to which specific ancestrycomponents may drive assortative mating patterns amongadmixed individuals by evaluating the variance of the threecontinental ancestry components among individuals within eachLatin American population. Assortative mating is known toincrease population variance for traits that are involved in matechoice; thus, the ancestry components that drive assortativemating in a given population are expected to show higheroverall variance among individual genomes. African ancestryfractions show the highest variation among individuals for allfour populations (Figure 6), consistent with the results seen forthe five specific cases of assortative mating evaluated in Figure 5and Supplementary Figure S11.

DISCUSSION

Assortative mating is a nearly universal human behavior, andscientists have long been fascinated by the subject (Vandenburg,

1972; Buss, 1985). Studies of assortative mating in humanshave most often entailed direct measurements of traits – suchas physical stature, education, and ethnicity – followed bycorrelation of trait values between partners. Decades of suchstudies have revealed numerous, widely varying traits that areimplicated in mate choice and assortative mating. Studies ofthis kind typically make no assumptions regarding, nor haveany knowledge of, the genetic heredity of the traits underconsideration. Moreover, the extent to which the expressionof these traits varies among human population groups haslargely been ignored.

The first attempts to evaluate the genetic contributions toassortative mating entailed twin studies, whereby the similarityof mate choice for dizygotic versus monozygotic twins werecompared (Lykken and Tellegen, 1993). While twin studies diduncover a genetic contribution to variance in human mate choice,they often yielded widely inconsistent results. This was true forboth the overall extent of heritability in mate choice, whichranged from 0 to 30% (Rushton and Bons, 2005; Zietsch et al.,

Frontiers in Genetics | www.frontiersin.org 8 April 2019 | Volume 10 | Article 359

Page 9: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 9

Norris et al. Ancestry Assortative Mating in Latinos

FIGURE 6 | Inter-individual ancestry variance for the four admixed LatinAmerican populations analyzed here. Variance among individuals for theAfrican (blue), European (orange), and Native American (red) ancestry fractionswithin each population are shown.

2011), and the relative amounts of genetic versus environmentalcontributions across the different traits implicated in mate choice(Verweij et al., 2012).

More recent studies of assortative mating, powered byadvances in human genomics, have begun to explore thegenetic architecture underlying the human traits that formthe basis of mate choice in more detail (Domingue et al.,2014; Robinson et al., 2017). In addition, recent genomicanalyses have underscored the extent to which human geneticancestry influences assortative mating (Risch et al., 2009;Zou et al., 2015; Zaitlen et al., 2017). However, until thistime, these two strands of inquiry have not been broughttogether. The approach that we developed for this study allowedus to directly assess the connection between local geneticancestry – i.e., ancestry assignments for specific genome regionsor haplotypes – and the human traits that serve as cues forassortative mating.

Previous studies on human mate choice have demonstratedpronounced sex differences in mate preference; for example,females value earning capacity more in potential mates, whereasmales value reproductive capacity, as inferred from youth andphysical attractiveness (Buss, 1989; Geary, 2000). It should benoted that the genome-based approach that we employed heredoes not allow us to consider sex differences in mate preferencesince we are essentially observing the effects of assortativemating on the offspring of mate pairs, by comparing ancestryhomozygosity levels in the genomes of all individuals, rather thandirectly observing mate choice in couples.

Our approach relies on the well-established principle thatassortative mating results in an excess of genetic homozygosity(Sebro et al., 2010). However, we do not analyze homozygosityof specific genetic variants per se, as is normally done; rather weevaluate excess homozygosity, or the lack thereof, for ancestry-specific haplotypes (Figure 2B). By merging this approach with

data on the genetic architecture of polygenic human phenotypes,we were able to uncover specific traits that inform ancestry-basedassortative mating. This is because, when individuals exercisemate choice decisions based on ancestry, they must do so usingphenotypic cues that are ancestry-associated. In other words,ancestry-based assortative mating is, by definition, predicatedupon traits that vary in expression among human populationgroups (Supplementary Figure S12). An obvious example of thisis skin color (Hancock et al., 2011), and studies have indeedshown skin color to be an important feature of assortativemating (Frisancho et al., 1981; Malina et al., 1983; Trachtenberget al., 1985; Procidano and Rogler, 1989). It follows that theassortative mating traits that our study uncovered in admixedLatin American populations must be both genetically heritableand variable among African, European and Native Americanpopulation groups.

The traits we found to influence ancestry-based assortativemating vary among the continental population groupsthat admixed to form modern Latin American populations(Supplementary Table S2). For example, the anthropometrictraits found in our study – body mass, height and facialdevelopment – are both heritable and known to vary amongancestry groups. This implies that the genetic variants thatinfluence these traits should also vary among these populations.Accordingly, it is readily apparent that mate choice decisionsbased on these physical features could track local geneticancestry. Interpretation of the neurological traits thatshow evidence of local ancestry-based assortative mating –schizophrenia and educational attainment – is not quite asstraightforward. For schizophrenia, it is far more likely thatwe are analyzing genetic loci associated with a spectrumof personality traits that influence assortative mating, asopposed to mate choice based on full-blown schizophrenia,and indeed personality traits are widely known to impactmate choice decisions (Merikangas, 1982; Hur, 2003; Kandleret al., 2012). In addition, since schizophrenia prevalence doesnot vary greatly world-wide (Saha et al., 2005), it is morelikely that ancestry-based assortative mating for this trait istracking an underlying endophenotype rather than the diseaseitself. While educational attainment outcomes are largelyenvironmentally determined, recent large-scale GWAS studieshave uncovered a substantial genetic component to this trait,which is distributed among scores of loci across the genome(Rietveld et al., 2013, 2014; Davies et al., 2016; Okbay et al., 2016).The population distribution of education-associated variantsis currently unknown, but our results suggest the possibilityof ancestry-variation for some of them. Indeed, the averageallele frequencies for the variants that influence our top fourtraits of interest – height, BMI, schizophrenia, and educationalattainment – show significant variation among ancestry groups(Supplementary Figure S12).

Mate choice based on divergent MHC loci, apparently drivenby body odor preferences, is the best known example ofhuman disassortative mating (Wedekind et al., 1995). However,studies of this phenomenon have largely relied on ethnicallyhomogenous cohorts. In one case where females were askedto select preferred MHC-mediated odors from males of a

Frontiers in Genetics | www.frontiersin.org 9 April 2019 | Volume 10 | Article 359

Page 10: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 10

Norris et al. Ancestry Assortative Mating in Latinos

different ethnic group, they actually preferred odors of maleswith more similar MHC alleles (Jacob et al., 2002). Anotherstudy showed differences in MHC-dependent mate choice forhuman populations with distinct ancestry profiles (Chaix et al.,2008). Ours is the first study that addresses the role of ancestryin MHC-dependent mate choice in ethnically diverse admixedpopulations. Unexpectedly, we found very different results forMHC-dependent mate choice among the four Latin Americanpopulations that we studied. In fact, AMI values for the HLA lociare among the most population variable for any trait analyzedhere (Figure 4C). Mexico and Peru show evidence of assortativemating at HLA loci, whereas Colombia and Puerto Rico showevidence for disassortative mating (Figure 5C). Interestingly,disassortative mating for HLA loci in Colombia and PuertoRico is largely driven by African ancestry, and these twopopulations have substantially higher levels of African ancestrycompared to Mexico and Peru. The population- and ancestry-specific dynamics of MHC-dependent mate choice revealed hereunderscore the complexity of this issue. Given the complexity ofthe results reported here, particularly as they relate to differencesamong populations, it should also be noted that anomalouspatterns of linkage disequilibrium at MHC loci could confoundthe analysis at this region.

Assortative mating alone is not expected to change thefrequencies of alleles, or ancestry fractions in the case ofour study, within a population. Assortative mating does,however, change genotype frequencies, resulting in an excess ofhomozygous genotypes. Accordingly, ancestry-based assortativemating is expected to yield an excess of homozygosity forlocal ancestry assignments (i.e., ancestry-specific haplotypes)(Figure 2B). By increasing homozygosity in this way, assortativemating also increases the population genetic variance for thetraits that influence mate choice. In other words, assortativemating will lead to more extreme, and less intermediate,phenotypes than expected by chance. This population geneticconsequence of assortative mating allowed us to evaluate theextent to which specific continental ancestries drive mate choicedecisions in admixed populations, since specific ancestry driversof assortative mating are expected to have increased variance.We found that the fractions of African ancestry have the highestvariance among individuals for all four populations, consistentwith the idea that traits that are associated with African ancestrydrive most of the local ancestry-based assortative mating seen inthis study (Figure 6).

It is important to reiterate that previous studies haveshown evidence for assortative mating on both genetic ancestryand specific traits; accordingly, ancestry-based populationstratification could lead to the appearance of trait-basedassortative mating (Sebro et al., 2017). For example, assortativemating occurs among European-Americans along a North-SouthEuropean ancestry cline, which happens to mirror the cline inheight along this same axis (Sebro et al., 2010). This begs thequestion as to whether similarities in height among European-American couples is due to ancestry or due to the trait itself.Studies have shown conflicting results regarding this question.On the one hand, genetic ancestry (population stratification)alone was posited to account for observed patterns of assortative

mating in the US (Abdellaoui et al., 2014), whereas assortativemating for height was observed within distinct US populationgroups, independent of their ancestry (Li et al., 2017). Here,we have tried to tease apart the overall effects of ancestry-basedassortative mating versus trait-specific mate choice by permutingrandom sets of genes and re-computing our AMI test statisticfor each of the four Latin American populations analyzed here.This procedure allowed us to control for the background levelsof local ancestry homozygosity in these populations, whichcould have been generated by ancestry-based assortative matingalone. We used this control to parameterize the significancelevels for the test statistic that we used to discover trait-specificancestry-based assortative mating (Figure 4B). In other words,we only find a specific trait to be implicated in ancestry-basedassortative mating if the levels of ancestry homozygosity for thegenes associated with that trait are significantly higher than thegenome-wide background levels. In this sense, we have shownhow these specific traits may serve as cues that underlie, tosome extent, ancestry influenced mate choice in Latin Americanpopulations. A corollary to this conclusion is the fact thatancestry- and trait-based assortative mating cannot be completelydisentangled for modern admixed populations. Rather, thevariance in the expression of these traits across ancestrygroups may in fact be an informative marker for individuals’ancestral origins.

The confluence of African, European and Native Americanpopulations that marked the conquest and colonization of theNew World yielded modern Latin American populations thatare characterized by three-way genetic admixture (Wang et al.,2008; Bryc et al., 2010; Ruiz-Linares et al., 2014; Montinaroet al., 2015; Rishishwar et al., 2015). Nevertheless, mate choicein Latin America is far from random (Risch et al., 2009; Zouet al., 2015). Indeed, our results underscore the prevalence ofancestry-based assortative mating in modern Latin Americansocieties. The local ancestry approach that we developed providednew insight into this process by allowing us to hone in on thephenotypic cues that underlie ancestry-based assortative mating.Our method also illuminates the specific ancestry componentsthat drive assortative mating for different traits and makespredictions regarding traits that should vary among continentalpopulation groups.

MATERIALS AND METHODS

Whole Genome Sequences andGenotypesWhole genome sequence data for the four admixed LatinAmerican populations studied here were taken from the Phase3 data release of the 1000 Genomes Project (1KGP) (GenomesProject et al., 2015). Whole genome sequence data and genotypesfor the putative ancestral populations (Africa, Europe, and theAmericas) were taken from the 1KGP, the Human GenomeDiversity Project (Li et al., 2008) (HGDP), and a previous studyon Native American genetic ancestry (Reich et al., 2012).

Whole genome sequence data and genotypes were merged,sites common to all datasets were kept, and SNP strand

Frontiers in Genetics | www.frontiersin.org 10 April 2019 | Volume 10 | Article 359

Page 11: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 11

Norris et al. Ancestry Assortative Mating in Latinos

orientation was corrected as needed, using PLINK version 1.9(Chang et al., 2015). The resulting dataset consisted of 1,645individuals from 38 populations with variants characterized for239,989 SNPs. The set of merged SNP genotypes was phased,using the program SHAPEIT version 2.r837 (Delaneau et al.,2013), with the 1KGP haplotype reference panel. This phasedset of SNP genotypes was used for local ancestry analysis.PLINK was used to further prune the phased SNPs for linkage,yielding a pruned dataset containing 58,898 linkage-independentSNPs. This pruned set of SNP genotypes was used for globalancestry analysis.

Global and Local Ancestry AnalysisTo infer continental (global) ancestry of the four admixed LatinAmerican populations, ADMIXTURE (Alexander et al., 2009)was run on the pruned SNP genotype dataset (n = 58,898).ADMIXTURE was run using K = 4, yielding African, European,Asian and Native American ancestry fractions of each admixedindividual; the final Asian and Native American fractions weresummed to determine the Native American fraction of eachindividual. For local ancestry analysis of the admixed LatinAmerican populations, the program RFMix (Maples et al.,2013) version 1.5.4 was run in the PopPhased mode with aminimum node size of 5 and the “usereference-panels-in-EM”option with 2 EM iterations for each individual in the datasetusing the phased SNP genotypes (n = 239,989). ContinentalAfrican, European, and Native American populations wereused as reference populations, and contiguous regions with thesame ancestry assignment, i.e., ancestry-specific haplotypes, weredelineated where the RFMix ancestry assignment certainty wasat least 99%. The timing of admixture events was analyzed usingthe program TRACTS (Gravel, 2012; Gravel et al., 2013) withthe local ancestry haplotype assignments from RFMix. TRACTSwas used to evaluate possible admixture timings across 1,000bootstrap attempts, with the most likely series of admixtureevents chosen to represent each population.

Autosomal NCBI RefSeq coding genes were accessed fromthe UCSC Genome Browser and mapped to the ancestry-specific haplotypes characterized for each admixed LatinAmerican individual. For each diploid genome analyzed here,individual genes can have 0, 1, or 2 ancestry assignmentsdepending on the number of high confidence ancestry-specifichaplotypes at that locus. Our assortative mating index (AMI,see below) can only be computed for genes that have 2ancestry assignments in any given individual, i.e., cases wherethe ancestry is assigned for both copies of the gene. Thus,for each Latin American population p, the mean (xp), andstandard deviation (sdp) of the number of genes with 2 ancestryassignments were calculated and used to compute an ancestrygenotype threshold for the inclusion of genes in subsequentanalyses. Genes were used in subsequent assortative matinganalyses only if they were present above the ancestry genotypethreshold of xp − sdp.

Gene Sets for Polygenic PhenotypesThe polygenic genetic architectures of phenotypes that couldbe affected by assortative mating were characterized using

a variety of studies taken from the NHGRI-EBI GWASCatalog (MacArthur et al., 2017), the Genetic Investigation ofANthropometric Traits (GIANT) consortium1, and PubMedliterature sources.

For each polygenic phenotype, all SNPs previously implicatedat genome-wide significance levels of P ≤ 5 × 10−8 werecollected as the phenotype SNP set. The gene sets for thepolygenic phenotypes were collected by directly mapping trait-associated SNPs to genes. SNPs were used to create a geneset only if the SNP fell directly within a gene and thus nointergenic SNPs were used in creating gene sets. Gene sets fromthe GWAS Catalog were mapped from SNPs using EBI’s in-house pipeline. Sets from GIANT were mapped according tospecifications of each individual paper. Gene sets from literaturesearching were mapped using NCBI’s dbSNP. For each LatinAmerican population, phenotype gene sets were filtered to onlyinclude genes that passed the ancestry genotype threshold, asdescribed previously. Finally, the polygenic phenotype gene setswere filtered based on size, so that all polygenic phenotypesincluded two or more genes.

Linkage disequilibrium (LD) pruning was performed using theprogram PLINK to ensure that gene sets consisted of independentgenes. LD pruning was done using pairwise r2-values betweengenic SNPs for all pairs of genes in any given set. For any pairof genes with SNP r2 > 0.1, only one member of the pair wasretained for further analysis. The final data set contains gene setsfor 105 polygenic phenotypes, hierarchically organized into threefunctional categories, including 923 unique genes (haplotypes)(Supplementary Figure S7).

Assortative Mating Index (AMI)To assess local ancestry-based assortative mating, we developedthe AMI, a log odds ratio test statistic that computes the relativelocal ancestry homozygosity compared to heterozygosity for anygiven gene. Ancestry homozygosity occurs when both genesin a genome have the same local ancestry, whereas ancestryheterozygosity refers to a pair of genes in a genome with differentlocal ancestry assignments. The AMI is calculated as:

AMI = ln

(obs

(hom

)/exp

(hom

)obs

(het)/exp

(het) )

where obs(hom

)/exp

(hom

)is the ratio of the observed

and expected local ancestry homozygous gene pairs andobs

(het)/exp

(het)

is the ratio of the observed and expected localancestry heterozygous gene pairs.

The observed values of local ancestry homozygous andheterozygous gene pairs are taken from the gene-to-ancestrymapping data for each gene in each population. The expectedvalues of local ancestry homozygous and heterozygous genepairs are calculated for each gene in a population using atriallelic Hardy-Weinberg (HW) model, in which the gene-specific local ancestry assignment fractions are taken as the threeallele frequencies. For the African (a), European (e), and Native

1http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium

Frontiers in Genetics | www.frontiersin.org 11 April 2019 | Volume 10 | Article 359

Page 12: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 12

Norris et al. Ancestry Assortative Mating in Latinos

American (n) gene-specific local ancestry assignment fractionsin a population, the HW expected genotype frequencies are:(a+ e+ n)2 or a2

+ 2ae+ e2+ 2an+ 2en+ n2. Accordingly,

the expected frequency of homozygous pairs is a2+ e2+ n2 and

the expected frequency of heterozygous pairs is 2ae+ 2an+ 2en.For each gene, in each population, the expected homozygousand heterozygous frequencies are multiplied by the number ofindividuals with two ancestry assignments for that gene to yieldthe expected counts of gene pairs in each class.

For each polygenic phenotype, a meta-analysis of gene-specificAMI values was conducted to evaluate the effect of all of thegenes involved in the phenotype on assortative mating, usingthe metafor (Viechtbauer, 2010) package in R. 95% confidenceintervals for each gene, meta-gene AMI values, significanceP-values, and false discovery rate q-values, were computed usingthe Mantel-Haenszel method under a fixed-effects model.

Controls for Evaluating the AssortativeMating Index (AMI)Four different controls were used to evaluate the design andperformance of the AMI test statistic: (1) a control for the useof HW as a null model in the AMI test statistic, (2) a permutationanalysis to evaluate expected AMI values under random mating,(3) a population genetic simulation to evaluate the power of theAMI test statistic to detect ancestry-based assortative mating andits dependence on the different ancestry combinations of thepopulations we analyzed, and (4) a permutation of random genesets to generate null distributions of AMI values expected giventhe observed genome-wide signals of ancestry-based assortativemating. Each of these control analyses is described in detail in theSupplementary Materials and Methods.

Ancestry-Specific Drivers of AssortativeMatingFor each significant polygenic phenotype of interest, we identifiedthe ancestry component related to mate choice by calculatingthe ancestry homozygosity (AHanc

phenotype) for all genes for eachancestry at the given phenotype. The ancestry homozygosity wascalculated as:

AHancphenotype =

∑g ∈ genes in the phenotype

(obsanc

g − expancg

expancg

)

where anc is one of the three ancestries – African, Europeanor Native American, g ∈ genes in the phenotype are all of thegenes involved in the polygenic phenotype, obsanc

g is the numberof observed homozygous genes for gene g coming from anc,and expanc

g is the number of expected homozygous genes forgene g coming from anc (as calculated using a triallelic Hardy–Weinberg model).

Statistical Significance TestingSignificance testing for the difference between the observedand expected AMI distributions (for both random mating and

assortative mating) was completed using the t-test packagein R. The metafor package, used for calculating the meta-analysis AMI values, also calculates a P-value and a falsediscovery rate q-value to correct for multiple statistical tests,which were used for identifying polygenic phenotypes thatare significantly influenced by local ancestry-based assortativemating in each Latin American population. The variance ofAMI values across the four populations for each phenotype wascalculated as it is implemented in R and used for identifyingphenotypes that had highly similar (minimal variance) or highlydissimilar (maximal variance) local ancestry-based assortativemating patterns. The coefficient of variation was used to measurethe inter-individual variance for each of the three continentalancestry components within the four admixed Latin Americanpopulations analyzed here.

ETHICS STATEMENT

All data were sourced from publicly-available databases and donot require additional ethics approval.

As per the Data Availability section of the manuscript, datacan be access from: 1000 Genomes Project data are available fromhttp://www.internationalgenome.org/data/

Human Genome Diversity Project data are available from http://www.hagsc.org/hgdp/

Previously published Native American genotype data can beaccessed from a data use agreement governed by the Universityof Antioquia as previously described (Reich et al., 2012).

AUTHOR CONTRIBUTIONS

EN conducted all of the ancestry-based assortative matinganalyses. LR performed the permutation and simulation analyses.LW participated in the assortative mating analysis for individualphenotypes. ABC performed the genetic ancestry analyses. EN,ATC, AD, and AV-A curated the GWAS SNP associations andpolygenic phenotype gene sets. EN, LR, LW, and ABC generatedthe manuscript figures. IJ conceived of, designed and supervisedthe project. EN, LR, and IJ wrote the manuscript. All authors readand approved the final manuscript.

FUNDING

EN, LR, ABC, ATC, and IJ were supported by the IHRC-GeorgiaTech Applied Bioinformatics Laboratory (ABiL). LW and ADwere supported by the Georgia Tech Bioinformatics GraduateProgram. AV was supported by Fulbright, Colombia.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be foundonline at: https://www.frontiersin.org/articles/10.3389/fgene.2019.00359/full#supplementary-material

Frontiers in Genetics | www.frontiersin.org 12 April 2019 | Volume 10 | Article 359

Page 13: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 13

Norris et al. Ancestry Assortative Mating in Latinos

REFERENCESAbdellaoui, A., Verweij, K. J., and Zietsch, B. P. (2014). No evidence for genetic

assortative mating beyond that due to population stratification. Proc. Natl.Acad. Sci. U.S.A. 111:E4137. doi: 10.1073/pnas.1410781111

Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast model-based estimationof ancestry in unrelated individuals. Genome Res. 19, 1655–1664. doi: 10.1101/gr.094052.109

Allison, D. B., Neale, M. C., Kezis, M. I., Alfonso, V. C., Heshka, S., and Heymsfield,S. B. (1996). Assortative mating for relative weight: genetic implications. Behav.Genet. 26, 103–111. doi: 10.1007/bf02359888

Bryc, K., Velez, C., Karafet, T., Moreno-Estrada, A., Reynolds, A., Auton, A., et al.(2010). Colloquium paper: genome-wide patterns of population structure andadmixture among Hispanic/Latino populations. Proc. Natl. Acad. Sci. U.S.A.107(Suppl. 2), 8954–8961. doi: 10.1073/pnas.0914618107

Buss, D. M. (1985). Human mate selection: opposites are sometimes said to attract,but in fact we are likely to marry someone who is similar to us in almost everyvariable. Am. Sci. 73, 47–51.

Buss, D. M. (1989). Conflict between the sexes: strategic interference and theevocation of anger and upset. J. Pers. Soc. Psychol. 56, 735–747. doi: 10.1037//0022-3514.56.5.735

Cavalli-Sforza, L. L., Menozzi, P., and Piazza, A. (1994). The History and Geographyof Human Genes. Princeton: Princeton University Press.

Chaix, R., Cao, C., and Donnelly, P. (2008). Is mate choice in humans MHC-dependent? PLoS Genet. 4:e1000184. doi: 10.1371/journal.pgen.1000184

Chang, C. C., Chow, C. C., Tellier, L. C., Vattikuti, S., Purcell, S. M., and Lee, J. J.(2015). Second-generation PLINK: rising to the challenge of larger and richerdatasets. Gigascience 4:7. doi: 10.1186/s13742-015-0047-8

Davies, G., Marioni, R. E., Liewald, D. C., Hill, W. D., Hagenaars, S. P., Harris,S. E., et al. (2016). Genome-wide association study of cognitive functions andeducational attainment in UK Biobank (N = 112 151). Mol. Psychiatry 21,758–767. doi: 10.1038/mp.2016.45

Delaneau, O., Zagury, J. F., and Marchini, J. (2013). Improved whole-chromosomephasing for disease and population genetic studies. Nat. Methods 10, 5–6.doi: 10.1038/nmeth.2307

Domingue, B. W., Fletcher, J., Conley, D., and Boardman, J. D. (2014). Genetic andeducational assortative mating among US adults. Proc. Natl. Acad. Sci. U.S.A.111, 7996–8000. doi: 10.1073/pnas.1321426111

Frisancho, A. R., Wainwright, R., and Way, A. (1981). Heritability and componentsof phenotypic expression in skin reflectance of Mestizos from the Peruvianlowlands. Am. J. Phys. Anthropol. 55, 203–208. doi: 10.1002/ajpa.1330550207

Geary, D. C. (2000). Evolution and proximate expression of human paternalinvestment. Psychol. Bull. 126, 55–77. doi: 10.1037//0033-2909.126.1.55

Genomes Project, C., Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang,H. M., et al. (2015). A global reference for human genetic variation. Nature 526,68–74. doi: 10.1038/nature15393

Gravel, S. (2012). Population genetics models of local ancestry. Genetics 191,607–619. doi: 10.1534/genetics.112.139808

Gravel, S., Zakharia, F., Moreno-Estrada, A., Byrnes, J. K., Muzzio, M., Rodriguez-Flores, J. L., et al. (2013). Reconstructing Native American migrations fromwhole-genome and whole-exome data. PLoS Genet. 9:e1004023. doi: 10.1371/journal.pgen.1004023

Hancock, A. M., Witonsky, D. B., Alkorta-Aranburu, G., Beall, C. M.,Gebremedhin, A., Sukernik, R., et al. (2011). Adaptations to climate-mediatedselective pressures in humans. PLoS Genet. 7:e1001375. doi: 10.1371/journal.pgen.1001375

Hellenthal, G., Busby, G. B., Band, G., Wilson, J. F., Capelli, C., Falush, D., et al.(2014). A genetic atlas of human admixture history. Science 343, 747–751.doi: 10.1126/science.1243518

Homburger, J. R., Moreno-Estrada, A., Gignoux, C. R., Nelson, D., Sanchez,E., Ortiz-Tello, P., et al. (2015). Genomic insights into the ancestry anddemographic history of South America. PLoS Genet. 11:e1005602. doi: 10.1371/journal.pgen.1005602

Hur, Y. M. (2003). Assortative mating for personality traits, educational level,religious affiliation, height, weight, and body mass index in parents of Koreantwin sample. Twin Res. 6, 467–470. doi: 10.1375/136905203322686446

Jacob, S., McClintock, M. K., Zelano, B., and Ober, C. (2002). Paternally inheritedHLA alleles are associated with women’s choice of male odor. Nat. Genet. 30,175–179. doi: 10.1038/ng830

Jordan, I. K. (2016). The Columbian Exchange as a source of adaptive introgressionin human populations. Biol. Direct. 11:17. doi: 10.1186/s13062-016-0121-x

Kalmijn, M. (1994). Assortative mating by cultural and economic occupationalstatus. Am. J. Sociol. 100, 422–452. doi: 10.1086/230542

Kandler, C., Bleidorn, W., and Riemann, R. (2012). Left or right? Sources of politicalorientation: the roles of genetic factors, cultural transmission, assortativemating, and personality. J. Pers. Soc. Psychol. 102, 633–645. doi: 10.1037/a0025560

Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran,S., et al. (2008). Worldwide human relationships inferred from genome-widepatterns of variation. Science 319, 1100–1104. doi: 10.1126/science.1153717

Li, X., Redline, S., Zhang, X., Williams, S., and Zhu, X. (2017). Height associatedvariants demonstrate assortative mating in human populations. Sci. Rep.7:15689. doi: 10.1038/s41598-017-15864-x

Lykken, D. T., and Tellegen, A. (1993). Is human mating adventitious or the resultof lawful choice? A twin study of mate selection. J. Pers. Soc. Psychol. 65, 56–68.doi: 10.1037//0022-3514.65.1.56

MacArthur, J., Bowler, E., Cerezo, M., Gil, L., Hall, P., Hastings, E., et al. (2017).The new NHGRI-EBI Catalog of published genome-wide association studies(GWAS Catalog).Nucleic Acids Res. 45, D896–D901. doi: 10.1093/nar/gkw1133

Malina, R. M., Selby, H. A., Buschang, P. H., Aronson, W. L., and Little, B. B. (1983).Assortative mating for phenotypic characteristics in a Zapotec community inOaxaca, Mexico. J. Biosoc. Sci. 15, 273–280.

Mann, C. C. (2011). 1493: Uncovering the New World Columbus Created.New York, NY: Vintage.

Maples, B. K., Gravel, S., Kenny, E. E., and Bustamante, C. D. (2013). RFMix:a discriminative modeling approach for rapid and robust local-ancestryinference. Am. J. Hum. Genet. 93, 278–288. doi: 10.1016/j.ajhg.2013.06.020

Mare, R. D. (1991). Five decades of educational assortative mating. Am. Sociol. Rev.56, 15–32.

Merikangas, K. R. (1982). Assortative mating for psychiatric disorders andpsychological traits. Arch. Gen. Psychiatry 39, 1173–1180.

Montinaro, F., Busby, G. B., Pascali, V. L., Myers, S., Hellenthal, G., and Capelli,C. (2015). Unravelling the hidden ancestry of American admixed populations.Nat. Commun. 6:6596. doi: 10.1038/ncomms7596

Moreno-Estrada, A., Gravel, S., Zakharia, F., McCauley, J. L., Byrnes, J. K.,Gignoux, C. R., et al. (2013). Reconstructing the population genetic historyof the Caribbean. PLoS Genet. 9:e1003925. doi: 10.1371/journal.pgen.1003925

Okbay, A., Beauchamp, J. P., Fontana, M. A., Lee, J. J., Pers, T. H., Rietveld, C. A.,et al. (2016). Genome-wide association study identifies 74 loci associated witheducational attainment. Nature 533, 539–542. doi: 10.1038/nature17671

Procidano, M. E., and Rogler, L. H. (1989). Homogamous assortative matingamong Puerto Rican families: intergenerational processes and the migrationexperience. Behav. Genet. 19, 343–354. doi: 10.1007/bf01066163

Reich, D., Patterson, N., Campbell, D., Tandon, A., Mazieres, S., Ray, N., et al.(2012). Reconstructing native American population history. Nature 488, 370–374. doi: 10.1038/nature11258

Rietveld, C. A., Esko, T., Davies, G., Pers, T. H., Turley, P., Benyamin, B.,et al. (2014). Common genetic variants associated with cognitive performanceidentified using the proxy-phenotype method. Proc. Natl. Acad. Sci. U.S.A. 111,13790–13794. doi: 10.1073/pnas.1404623111

Rietveld, C. A., Medland, S. E., Derringer, J., Yang, J., Esko, T., Martin, N. W., et al.(2013). GWAS of 126,559 individuals identifies genetic variants associated witheducational attainment. Science 340, 1467–1471. doi: 10.1126/science.1235488

Risch, N., Choudhry, S., Via, M., Basu, A., Sebro, R., Eng, C., et al. (2009). Ancestry-related assortative mating in Latino populations. Genome Biol. 10:R132. doi:10.1186/gb-2009-10-11-r132

Rishishwar, L., Conley, A. B., Wigington, C. H., Wang, L., Valderrama-Aguirre,A., and Jordan, I. K. (2015). Ancestry, admixture and fitness in Colombiangenomes. Sci. Rep. 5:12376. doi: 10.1038/srep12376

Robinson, M. R., Kleinman, A., Graff, M., Vinkhuyzen, A. A., Couper, D., Miller,M. B., et al. (2017). Genetic evidence of assortative mating in humans. Nat.Hum. Behav. 1:0016.

Frontiers in Genetics | www.frontiersin.org 13 April 2019 | Volume 10 | Article 359

Page 14: Assortative Mating on Ancestry-Variant Traits in Admixed ...jordan.biology.gatech.edu/pubs/Norris-Frontiers-2019.pdf · fgene-10-00359 April 23, 2019 Time: 17:24 # 2 Norris et al

fgene-10-00359 April 23, 2019 Time: 17:24 # 14

Norris et al. Ancestry Assortative Mating in Latinos

Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd,K. K., Zhivotovsky, L. A., et al. (2002). Genetic structure ofhuman populations. Science 298, 2381–2385. doi: 10.1126/science.1078311

Ruiz-Linares, A., Adhikari, K., Acuna-Alonzo, V., Quinto-Sanchez, M., Jaramillo,C., Arias, W., et al. (2014). Admixture in Latin America: geographicstructure, phenotypic diversity and self-perception of ancestry based on7,342 individuals. PLoS Genet. 10:e1004572. doi: 10.1371/journal.pgen.1004572

Rushton, J. P., and Bons, T. A. (2005). Mate choice and friendship in twins:evidence for genetic similarity. Psychol. Sci. 16, 555–559. doi: 10.1111/j.0956-7976.2005.01574.x

Saha, S., Chant, D., Welham, J., and McGrath, J. (2005). A systematic review ofthe prevalence of schizophrenia. PLoS Med. 2:e141. doi: 10.1371/journal.pmed.0020141

Salces, I., Rebato, E., and Susanne, C. (2004). Evidence of phenotypic and socialassortative mating for anthropometric and physiological traits in couplesfrom the Basque country (Spain). J. Biosoc. Sci. 36, 235–250. doi: 10.1017/s0021932003006187

Sebro, R., Hoffman, T. J., Lange, C., Rogus, J. J., and Risch, N. J. (2010). Testingfor non-random mating: evidence for ancestry-related assortative mating inthe Framingham heart study. Genet. Epidemiol. 34, 674–679. doi: 10.1002/gepi.20528

Sebro, R., Peloso, G. M., Dupuis, J., and Risch, N. J. (2017). Structured mating:patterns and implications. PLoS Genet. 13:e1006655. doi: 10.1371/journal.pgen.1006655

Silventoinen, K., Kaprio, J., Lahelma, E., Viken, R. J., and Rose, R. J.(2003). Assortative mating by body height and BMI: finnish twinsand their spouses. Am. J. Hum. Biol. 15, 620–627. doi: 10.1002/ajhb.10183

Stulp, G., Simons, M. J., Grasman, S., and Pollet, T. V. (2017). ). Assortativemating for human height: a meta-analysis. Am. J. Hum. Biol. 29:e22917.doi: 10.1002/ajhb.22917

Trachtenberg, A., Stark, A. E., Salzano, F. M., and Da Rocha, F. J. (1985). Canonicalcorrelation analysis of assortative mating in two groups of Brazilians. J. Biosoc.Sci. 17, 389–403.

Vandenburg, S. G. (1972). Assortative mating, or who marries whom? Behav. Genet.2, 127–157. doi: 10.1007/bf01065686

Verweij, K. J., Burri, A. V., and Zietsch, B. P. (2012). Evidence for genetic variationin human mate preferences for sexually dimorphic physical traits. PLoS One7:e49294. doi: 10.1371/journal.pone.0049294

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package.J. Stat. Softw. 36, 1–48.

Wang, S., Ray, N., Rojas, W., Parra, M. V., Bedoya, G., Gallo, C., et al. (2008).Geographic patterns of genome admixture in Latin American Mestizos. PLoSGenet. 4:e1000037. doi: 10.1371/journal.pgen.1000037

Wedekind, C., Seebeck, T., Bettens, F., and Paepke, A. J. (1995). MHC-dependentmate preferences in humans. Proc. Biol. Sci. 260, 245–249. doi: 10.1098/rspb.1995.0087

Zaitlen, N., Huntsman, S., Hu, D., Spear, M., Eng, C., Oh, S. S., et al.(2017). The effects of migration and assortative mating on admixture linkagedisequilibrium. Genetics 205, 375–383. doi: 10.1534/genetics.116.192138

Zietsch, B. P., Verweij, K. J., Heath, A. C., and Martin, N. G. (2011). Variationin human mate choice: simultaneously investigating heritability, parentalinfluence, sexual imprinting, and assortative mating. Am. Nat. 177, 605–616.doi: 10.1086/659629

Zou, J. Y., Park, D. S., Burchard, E. G., Torgerson, D. G., Pino-Yanes, M., Song, Y. S.,et al. (2015). Genetic and socioeconomic study of mate choice in Latinos revealsnovel assortment patterns. Proc. Natl. Acad. Sci. U.S.A. 112, 13621–13626.doi: 10.1073/pnas.1501741112

Conflict of Interest Statement: The authors declare that the research wasconducted in the absence of any commercial or financial relationships that couldbe construed as a potential conflict of interest.

Copyright © 2019 Norris, Rishishwar, Wang, Conley, Chande, Dabrowski,Valderrama-Aguirre and Jordan. This is an open-access article distributed under theterms of the Creative Commons Attribution License (CC BY). The use, distributionor reproduction in other forums is permitted, provided the original author(s) andthe copyright owner(s) are credited and that the original publication in this journalis cited, in accordance with accepted academic practice. No use, distribution orreproduction is permitted which does not comply with these terms.

Frontiers in Genetics | www.frontiersin.org 14 April 2019 | Volume 10 | Article 359