Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity

Georgios Athanasiadis,*,†,1 Jade Y. Cheng,* Bjarni J. Vilhjálmsson,*,‡ Frank G.

Jørgensen,§ Thomas D. Als,**,††,‡‡ Stephanie Le Hellard,§§,*** Thomas Espeseth,†††,‡‡‡

Patrick F. Sullivan,§§§,****,†††† Christina M. Hultman,§§§ Peter C. Kjærgaard,†,‡‡‡‡,2

Mikkel H. Schierup*,†,§§§§ and Thomas Mailund,*,†

*Bioinformatics Research Centre, Aarhus University, Aarhus, 8000, Denmark

†Centre for Biocultural History, Aarhus University, Aarhus, 8000, Denmark

‡Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston,

MA 02115, USA

§Tørring Gymnasium, Tørring, 7160, Denmark

**Department of Biomedicine, Aarhus University, Aarhus, 8000, Denmark

††The Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University,

Aarhus, 8000, Denmark

‡‡Center for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, 8000,

Denmark

§§Dr E. Martens Research Group of Biological Psychiatry, Center for Medical

Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, 5021,

Norway

Genetics: Early Online, published on August 17, 2016 as 10.1534/genetics.116.189241

Copyright 2016.

***NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical

Science, University of Bergen, Bergen, 5021, Norway

†††NORMENT, KG Jebsen Centre for Psychosis Research, Department of

Psychology, University of Oslo, Oslo, 0424, Norway

‡‡‡NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health

and Addiction, Oslo University Hospital, Oslo, 0424, Norway

§§§Department of Medical Epidemiology and Biostatistics, Karolinska Institutet,

Stockholm, 17177, Sweden

****Department of Genetics, University of North Carolina, Chapel Hill, NC 27599-

7264, USA

††††Department of Psychiatry, University of North Carolina, Chapel Hill, NC 27599-

7264, USA

‡‡‡‡Department of Culture and Society, Aarhus University, Aarhus, 8000, Denmark

§§§§Department of Bioscience, Aarhus University, Aarhus, 8000, Denmark

1Corresponding author: Bioinformatics Research Centre, Aarhus University, Aarhus,

8000, Denmark. E-mail: [email protected].

2Present address: The Natural History Museum of Denmark, University of

Copenhagen, Copenhagen, 1471, Denmark.

Running title

The population structure of Denmark

Key words

Human population genetics, population structure, admixture, polygenic risk score,

genetic prediction

Corresponding author

Name: Dr. Georgios Athanasiadis

Mailing address: Bioinformatics Research Centre, C.F. Møllers Allé, Building 1110,

Aarhus University, Aarhus, 8000, Denmark

Phone number: (+45) 87 15 55 69

E-mail: [email protected]

Abstract

Denmark has played a substantial role in the history of Northern Europe. Through a

nationwide scientific outreach initiative, we collected genetic and anthropometrical

data from ~800 high school students and used them to elucidate the genetic makeup

of the Danish population, as well as to assess polygenic predictions of phenotypic

traits in adolescents. We observed remarkable homogeneity across different

geographic regions, although we could still detect weak signals of genetic structure

reflecting the history of the country. Denmark showed genomic affinity primarily with

neighboring countries, with overall resemblance decreasing in the order

Britain, Sweden, Norway, Germany and France. A Polish admixture signal was

detected in Zealand and Funen and our date estimates coincided with historical

evidence of Wend settlements in the south of Denmark. We also observed

considerably diverse demographic histories among Scandinavian countries, with

Denmark having the smallest current effective population size compared to Norway

and Sweden. Finally, we found that polygenic prediction of self-reported adolescent

height in the population was remarkably accurate (R² = 0.639 ± 0.015). The high

homogeneity of the Danish population could render population structure a lesser

concern for the upcoming large-scale gene-mapping studies in the country.

Introduction

Denmark has played a substantial role in the history of Northern Europe. Like Swedes

and Norwegians, Danes are historically linked to the Vikings – the Germanic Norse

seafarers whose commercial and military operations marked the Viking Age in

European history (793-1066). Through a series of invasions, conquests and alliances,

the Danish Vikings established settlements as far as in Britain, Estonia, the Faroe

Islands, Iceland, Greenland and even Canada. Denmark’s long-lasting historical bond

with Sweden and Norway is nicely exemplified in the formation of the Kalmar Union,

a state that brought together the three Scandinavian nations from 1397 to 1523 as a

response to Germany’s expansionist tendency to the North (Derry 2000).

Denmark’s historical complexity is contrasted by the country’s small current size –

both in terms of geographic area (~43,000 km²) and population (5.614 million

inhabitants) – a fact even further contrasted by the notable compartmentalization of

the Danish linguistic map into about 30 dialectal areas (http://dialekt.ku.dk/). These

observations, together with the evident lack of strong geographic barriers in the

country, make present-day Denmark an interesting setting for studying the genetic

history of small populations with a glorious past.

In recent years, there has been an explosion of human genetic studies that contributed

substantially to the characterization of worldwide variation patterns (Li et al. 2008;

Lao et al. 2008; Novembre et al. 2008; Reich et al. 2009; Busby et al. 2015); the

reconstruction of population history in regions with poor/nonexistent historical

records (Moreno-Estrada et al. 2014; Moreno-Mayar et al. 2014); the study of local

and global patterns of admixture in multiethnic societies (Bryc et al. 2015); and the

study of admixture with other hominin species as well as its use in elucidating human

dispersals (Green et al. 2010; Reich et al. 2011). The increased power of

high-throughput genotype data and computational methods has also boosted the emergence

of many single-country genomic projects in Europe (Jakkula et al. 2008; Price et al.

2009; Genome of the Netherlands Consortium 2014; Leslie et al. 2015; Karakachoff

et al. 2015) and the release of useful data to the public (Welter et al. 2014).

In this work, we extend the collection of single-country genetic studies by introducing

a new project from Denmark. Unlike previous genomic projects involving Denmark

(Bae et al. 2013; Larsen et al. 2013), ours was conceived from the beginning as a

scientific outreach initiative with benefits for both the general public and our research

objectives. We invited high school students from across Denmark to participate in

activities whose primary goal was to promote genomic literacy in secondary

education (Athanasiadis et al. 2016). Most participants donated a DNA sample, which

we used to explore the extent to which recent and more distant historical events left

their mark on the genetic makeup of the Danish population. Apart from the population

structure and demographic history in Denmark, an additional focus of this work has

been the quantitative study of basic anthropometrical traits.

With this work, we ultimately report back to our DNA donors the biological and

historical insights we gained from analyzing their genetic data. Our results showed

remarkable homogeneity across different geographic regions in Denmark, though we

were still able to detect weak signals of genetic structure. Denmark received

substantial admixture contributions primarily from neighboring countries in the past

ten centuries with overall influence of decreasing weight from Britain, Sweden and

Norway. We also observed considerably diverse demographic histories among the

Scandinavian countries, as reflected in their historical effective population size.

Finally, we found that human height can be predicted with remarkable accuracy in

Danish adolescents. Overall, this work showcases how far we can go with the analysis

of genetic data from small, fairly homogeneous populations.

Materials and Methods

Sample description

New data:- We recruited ~800 students from 36 high schools from across Denmark

under the Where Are You From? project (Athanasiadis et al. 2016). These numbers

correspond to the sampling of approximately one in 9,350 Danish inhabitants (Figure

1). We asked each participant to provide a saliva sample for DNA analysis and to

answer an online questionnaire about family origin (their own, their parents’ and their

grandparents’ place of birth) and basic anthropometrical data (e.g. self-reported height

and weight). The institutional review board of Aarhus University approved the study.

Because no health-related questions were asked, there was no requirement for

additional approval by the University’s medical ethics committee. Informed consent

was obtained either from the participants (age > 18 years) or their parents (age < 18

years). We used the 23andMe (Mountain View, CA, USA) DNA analysis service for

the genotyping of 723 participants. 23andMe uses a custom HumanOmniExpress-24

BeadChip from Illumina (San Diego, CA, USA). Genetic and phenotypic data have

been deposited at the European Genome-Phenome Archive (EGA;

https://ega-archive.org) under accession number EGAS00001001868. After excluding duplicate

single nucleotide polymorphisms (SNPs), applying a per-locus missingness threshold

of 2% with PLINK v1.9 (Chang et al. 2015) and removing SNPs ambiguously

mapped to the forward DNA strand, 517,403 unique autosomal SNPs were available

for further analysis.
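To illustrate the per-locus missingness filter, the following minimal Python sketch drops SNPs whose fraction of missing genotype calls exceeds 2%. It is not the PLINK implementation: the toy genotype matrix, the use of `None` for a missing call, the SNP identifiers and the function name `filter_by_missingness` are all assumptions made for this example.

```python
def filter_by_missingness(genotypes, snp_ids, max_missing=0.02):
    """Keep SNPs whose fraction of missing calls is <= max_missing.

    genotypes: list of per-individual lists, one entry per SNP;
               None marks a missing call (hypothetical encoding).
    """
    n_ind = len(genotypes)
    kept = []
    for j, snp in enumerate(snp_ids):
        n_missing = sum(1 for row in genotypes if row[j] is None)
        if n_missing / n_ind <= max_missing:
            kept.append(snp)
    return kept


# Toy data: 100 individuals, 3 SNPs (identifiers are made up).
toy = [[0, 1, 2] for _ in range(100)]
toy[0][1] = None   # second SNP: 1 of 100 calls missing (1%), kept
toy[0][2] = None
toy[1][2] = None
toy[2][2] = None   # third SNP: 3 of 100 calls missing (3%), removed
kept_snps = filter_by_missingness(toy, ["rs_a", "rs_b", "rs_c"])
print(kept_snps)
```

The threshold is applied per SNP across individuals, which is why a SNP with few missing calls survives even when a particular individual has several missing genotypes.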

Figure 1. Location of the 36 Danish high schools participating in the Where Are You From? project colored by

geographic origin. Number of samples with at least three grandparents born in each of the six regions is shown in

brackets: Capital Region (14); Zealand (36); Funen (13); South Jutland (37); Central Jutland (89); North Jutland (48);

Total (237).

Additional data:- To put our study in a broader European context, we included four

additional datasets: (i) 2,833 individuals × 227,899 SNPs from the POpulation

REference Sample (POPRES) (Nelson et al. 2008); (ii) 494 individuals × 314,526

SNPs from Amyotrophic Lateral Sclerosis (ALS) Finland (Laaksovirta et al. 2010);

(iii) 381 individuals × 577,252 SNPs from the Swedish Schizophrenia Study (Ripke et

al. 2013); and (iv) 300 individuals × 537,306 SNPs from the Norwegian Cognitive

NeuroGenetics (NCNG) sample (Espeseth et al. 2012). We applied separate quality

controls to these datasets according to their particularities (protocols available upon

request).

Imputation

We carried out genotype imputation within each dataset separately before combining

them. We first changed DNA strand orientation of several SNPs in all five datasets to

create a uniform forward orientation using SNPFLIP

(https://github.com/endrebak/snp-flip) and PLINK. Table S1 shows the number of

SNPs that were flipped/removed from each dataset. We then used liftOver from the

UCSC Genome Browser to “lift” genome coordinates from the NCBI36/hg18 (March

2006) to the GRCh37/hg19 (February 2009) assembly in the POPRES, ALS Finland

and NCNG datasets. We produced “prephased” haplotypes for each dataset with

SHAPEIT v2.720 (Delaneau et al. 2012) and used the haplotypes together with the

latest 1000 Genomes Phase 3 reference panel (b37, October 2014) for the separate

imputation of the five datasets with IMPUTE2 v.2.3.1 (Howie et al. 2012). Finally,

we concatenated the imputed data into separate chromosomes and filtered them for

“info” ≥ 0.975 with QCTOOL (http://www.well.ox.ac.uk/~gav/qctool/#overview).
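Strand harmonization of the kind described above amounts to replacing each allele with its complement when a SNP is reported on the reverse strand. The sketch below is a minimal illustration under that assumption and is not the snpflip or PLINK code; the `strand` label and the function names are hypothetical. It also shows why A/T and C/G SNPs are strand-ambiguous: flipping them returns the same pair of alleles.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(alleles):
    """Return the complementary-strand alleles of a biallelic SNP."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs show the same allele pair on both strands."""
    return set(flip(alleles)) == set(alleles)


def to_forward(alleles, strand):
    """Flip alleles reported on the reverse strand (hypothetical label)."""
    return flip(alleles) if strand == "reverse" else tuple(alleles)


print(to_forward(("A", "G"), "reverse"))   # ('T', 'C')
print(is_strand_ambiguous(("A", "T")))     # True
print(is_strand_ambiguous(("A", "G")))     # False
```

Because ambiguous SNPs cannot be oriented from the alleles alone, removing them before merging datasets is a common conservative choice.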

Principal component analysis

We ran principal component analysis (PCA) with PLINK to explore population

structure in our Danish sample. PCA was run on two different data combinations: (i)

POPRES, Where Are You From?, NCNG and the Swedish Schizophrenia Study; and

(ii) Where Are You From?, NCNG, the Swedish Schizophrenia Study and Germany

from POPRES. To avoid undesirable clustering due to extensive linkage

disequilibrium (LD), we thinned the genotypes with PLINK (using a window and step

size of 1,500 and 150 SNPs, respectively, and r² threshold = 0.80) and removed SNPs

from known high-LD genomic regions (e.g. MHC on chromosome 6).
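The idea behind LD thinning can be sketched as a greedy pass over SNPs in genomic order: a SNP is dropped if its squared correlation with an already retained SNP inside the window exceeds the threshold. This is a simplified illustration only. PLINK's windowed procedure (window and step sizes) differs in its details, and the function names and toy genotypes below are assumptions.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy ** 2 / (sxx * syy)


def thin_by_ld(snps, window=1500, r2_max=0.80):
    """Greedy thinning: drop a SNP if it is in high LD with any
    retained SNP among the preceding `window` positions."""
    kept = []
    for j, g in enumerate(snps):
        recent = [i for i in kept if j - i < window]
        if all(r_squared(snps[i], g) <= r2_max for i in recent):
            kept.append(j)
    return kept


# Toy genotypes (0/1/2 allele counts) for four SNPs in six individuals.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],   # identical to SNP 0, so r2 = 1 and it is dropped
    [2, 0, 1, 1, 0, 2],
    [0, 0, 1, 2, 2, 1],
]
kept_idx = thin_by_ld(snps)
print(kept_idx)
```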

Chromosome painting, population clustering and admixture

We used a set of LD-based methods – CHROMOPAINTER (Lawson et al. 2012),

fineSTRUCTURE (Lawson et al. 2012) and GLOBETROTTER (Hellenthal et al.

2014) – to explore fine-grain population structure and admixture in Denmark. These

methods require a set of phased SNP data from “donor” (i.e. the available European

samples) and “recipient” populations (i.e. the Danish sample). In brief, the methods

detect extended multi-marker haplotypes across the genome, which are organized in

pairwise vectors of similarity counts (CHROMOPAINTER). These vectors are then

used by an MCMC algorithm (fineSTRUCTURE) to hierarchically cluster individuals

into groups that are often geographically, linguistically and/or historically meaningful

(Leslie et al. 2015). Admixture proportions are estimated through multiple linear

regression on the average proportion of DNA that each recipient copies from each of

the donor groups (GLOBETROTTER). Nonparametric standard errors of admixture

proportions were calculated with a jackknife procedure described elsewhere

(Montinaro et al. 2015). Finally, GLOBETROTTER also measures the decay of

association vs. genetic distance between the “chromosome chunks” copied from a

given pair of donor groups. This decay is exponentially distributed with rate equal to

the time that the admixture occurred (Hellenthal et al. 2014; Busby et al. 2015).
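The dating principle can be illustrated with a toy calculation. If the association signal decays as A·exp(−λd) with genetic distance d measured in Morgans, the rate λ can be read as the number of generations since admixture. The sketch below fits λ by least squares on the log scale using a synthetic, noise-free curve. It is not GLOBETROTTER's fitting procedure, which is considerably more elaborate, and the function name and data are assumptions. The conversion to years uses the 30 years per generation adopted in this study.

```python
import math


def fit_decay_rate(distances_morgan, signal):
    """Least-squares fit of log(signal) = log(A) - rate * distance.

    Returns the decay rate, read here as generations since admixture
    when distances are in Morgans. Requires strictly positive signal.
    """
    logs = [math.log(s) for s in signal]
    n = len(distances_morgan)
    mx = sum(distances_morgan) / n
    my = sum(logs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(distances_morgan, logs))
    sxx = sum((x - mx) ** 2 for x in distances_morgan)
    return -sxy / sxx


# Synthetic curve generated with a rate of 20 generations.
true_generations = 20
dist = [0.01 * k for k in range(1, 31)]   # 1 to 30 cM, expressed in Morgans
curve = [0.5 * math.exp(-true_generations * d) for d in dist]

generations = fit_decay_rate(dist, curve)
years = generations * 30                  # 30 years per generation
print(round(generations, 6), round(years, 3))
```

With real co-ancestry curves the signal is noisy and can cross zero, so a log-linear fit like this one would not be appropriate without modification.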

After jointly phasing 489,209 imputed autosomal SNPs across all five datasets with

IMPUTE2, we ran CHROMOPAINTER using default options (Leslie et al. 2015) on

three different datasets: (i) Denmark alone; (ii) Europe without Denmark; and (iii)

Europe and Denmark together. Beforehand, we ran each analysis ten times on a sample

subset (~10% of the total number of samples and only for chromosomes 4, 10, 15, and

22) to estimate the switch and global emission rates used by CHROMOPAINTER’s

Hidden Markov Model. Once similarity vectors were defined in the three datasets, we

used fineSTRUCTURE to explore clustering in recipient (dataset i) and donor (dataset

ii) populations. For the GLOBETROTTER analysis, we merged the similarity

matrices from datasets ii and iii and ran the program with default options described

elsewhere (Leslie et al. 2015; Busby et al. 2015).

Ancestry component analysis

We used Ohana (https://github.com/jade-cheng/ohana) to estimate individual

admixture proportions in 13 European countries – mostly Western and Northern –

including Denmark. In brief, Ohana is a new model-based method that uses the same

likelihood admixture model as STRUCTURE (Pritchard et al. 2000), FRAPPE

(Tang et al. 2005) and ADMIXTURE (Alexander et al. 2009), together with Newton’s method

for optimization. Compared to established methods, Ohana achieves better likelihood

estimates (R. Nielsen, personal communication). After running the algorithm, we

reported per-country admixture proportions by averaging individual proportions

within each country.
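The per-country summary described above is a simple group-wise mean of the individual ancestry vectors. A minimal sketch follows, assuming each sample is given as a (country, proportions) pair. The data layout, function name and toy values are illustrative and are not Ohana output.

```python
def mean_admixture_by_country(samples):
    """Average individual ancestry proportions within each country.

    samples: list of (country, [q_1, ..., q_K]) tuples, where each
             proportion vector sums to 1 (hypothetical layout).
    """
    grouped = {}
    for country, q in samples:
        grouped.setdefault(country, []).append(q)
    return {
        country: [sum(col) / len(rows) for col in zip(*rows)]
        for country, rows in grouped.items()
    }


# Toy individual proportions for K = 4 components.
toy = [
    ("DK", [0.10, 0.20, 0.50, 0.20]),
    ("DK", [0.10, 0.10, 0.60, 0.20]),
    ("PL", [0.05, 0.70, 0.15, 0.10]),
]
per_country = mean_admixture_by_country(toy)
print(per_country)
```

Because each individual vector sums to one, the averaged per-country vector also sums to one.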

Relatedness and identity by descent

To explore relatedness in our Danish sample, we used two different methods: KING

(Manichaikul et al. 2010) for a formal calculation of pairwise kinship coefficients and

BEAGLE Refined IBD (Browning and Browning 2013a) for the inference of DNA

segments that were identical by descent (IBD) between pairs of individuals. We

analyzed 406 individuals for whom we had complete information that all four of their

grandparents were born in Denmark. We excluded from the analysis one individual

from pairs of twins and siblings (preferentially the one with highest per-locus

genotype missingness).

Historical effective population size

We also estimated the historical effective population size (Ne) of the Danish, Swedish

and Norwegian populations through the combination of two IBD-based methods. We

first applied IBDseq (Browning and Browning 2013b) to the three Scandinavian

datasets separately to produce sets of pairwise IBD tracts. We then used IBDNe

(Browning and Browning 2015) to estimate Ne from the distribution of the inferred

IBD tracts over the past 150 generations for each of the three Scandinavian

populations. To maximize power, we used the original SNP data for this analysis.

Note that, when SNP data are used, IBDNe estimates are most reliable within the past

50 generations.

Polygenic prediction of height and BMI

We used self-reported height and weight from ~600 students of diverse ethnic

backgrounds to perform polygenic risk prediction of height and body mass index

(BMI) with LDpred (Vilhjálmsson et al. 2015), a summary statistic-based algorithm

that models LD to improve the prediction. As training data for the model, we used

public summary statistics from large genome-wide association studies of adult height

(Wood et al. 2014) and BMI (Locke et al. 2015). We first removed from our data

SNPs with minor allele frequency (MAF) < 0.01, as well as SNPs with a MAF

different from the one reported in the training data by a factor of 0.15. We then

assessed SNP effect under different fractions of causal variants: p = {1, 0.5, 0.2, 0.1,

0.05}, whereby p = 1 corresponds to the infinitesimal model (Visscher et al. 2008).

As an LD reference, we used a subset of 407 unrelated individuals (kinship

coefficient < 0.05) who had all four of their grandparents born in Denmark. Finally,

we validated LDpred’s prediction of height in 578 individuals, as well as the

prediction of BMI in 572 individuals. We calculated R² – the proportion of

phenotypic variance explained by the predictor – by (i) adjusting for age, sex and the

first ten PCs, and (ii) actually including age, sex and the first ten PCs in the model.

PC adjustment ensures that genetic predictions are not influenced by ancestry.
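The two ways of reporting R² can be contrasted on toy data. The sketch below shows one possible reading of the two procedures, not the exact computation used in this study. In (i) the phenotype is first residualized on the covariates and the residuals are then regressed on the polygenic score. In (ii) the score and the covariates enter a single model. For brevity a single covariate (sex) stands in for age, sex and the ten PCs; all names and values are assumptions.

```python
def ols_fitted(columns, y):
    """Fitted values from regressing y on an intercept plus `columns`."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(row[r] * row[c] for row in X) for c in range(p)]
         + [sum(row[r] * yi for row, yi in zip(X, y))]
         for r in range(p)]
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        pv = A[k][k]
        A[k] = [v / pv for v in A[k]]
        for r in range(p):
            if r != k:
                f = A[r][k]
                A[r] = [vr - f * vk for vr, vk in zip(A[r], A[k])]
    beta = [A[r][p] for r in range(p)]
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def r_squared(y, fitted):
    """Proportion of variance in y explained by the fitted values."""
    my = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy data: sex, polygenic score and noise are mutually uncorrelated.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
score = [-1, 1, -1, 1, -1, 1, -1, 1]
noise = [1, 1, -1, -1, -1, -1, 1, 1]
height = [170 + 12 * s + 4 * g + 2 * e for s, g, e in zip(sex, score, noise)]

# (i) adjust for the covariate, then evaluate the score on the residuals
residual = [h - f for h, f in zip(height, ols_fitted([sex], height))]
r2_adjusted = r_squared(residual, ols_fitted([score], residual))

# (ii) include the covariate in the prediction model
r2_full = r_squared(height, ols_fitted([score, sex], height))
print(round(r2_adjusted, 4), round(r2_full, 4))
```

In this toy example the full-model R² is larger because sex itself explains much of the variance in height, which is why the two figures should not be compared directly.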

Results

Principal component analysis

To put Danes in a European genetic context, we first ran PCA on 3,858 samples from

across Europe. In particular, we extended a previously published PCA within Europe

(Novembre et al. 2008) to include sizeable samples from Denmark, Norway and

Sweden, as these were underrepresented in the original study. In our PCA, Denmark

was represented by 407 students who had all four of their grandparents born in the

country. Our Danish samples clustered in a geographically meaningful manner along

the first two principal axes, partially overlapping with Norwegians and Swedes, and

showing close genetic proximity to samples from Great Britain, the Netherlands,

Germany and Poland (Figure 2A).

Figure 2. (A) PCA of 105,672 imputed SNPs after merging four datasets: Where Are You From?, POPRES,

NCNG and the Swedish Schizophrenia study without outliers clustering with Finland (total N = 3,858). Per-

country box plots (median and interquartile range) of PC values were added to facilitate interpretation. Whiskers

represent data within 1.5 times the interquartile range. IE: Ireland; ES: Spain; PT: Portugal; GB: Great Britain; FR:

France; BE: Belgium; CH: Switzerland; NL: the Netherlands; DK: Denmark; DE: Germany; NO: Norway; SE:

Sweden; AT: Austria; IT: Italy; PL: Poland; HU: Hungary; CZ: Czech Republic; HR: Croatia; RO: Romania; YU:

Yugoslavia; GR: Greece. (B) PCA of 105,672 imputed SNPs from Denmark, Sweden, Norway and Germany with

emphasis on the six geographic regions of Denmark (total N = 1,168). No clear genetic-geographic relationship

was observed. Ca: Capital; Ze: Zealand; Fu: Funen; SJ: South Jutland; CJ: Central Jutland; NJ: North Jutland. (C)

Correlation of PC1 and PC2 with average grandparent place-of-birth latitude along a 360° clockwise rotation.

Maximum correlation was observed for PC1 at 32° (r ≈ 0.24; p < 10⁻⁵).

[Figure 2 axes: panels A and B plot PC1 against PC2; panel C plots the correlation coefficient of PC1 and PC2 against rotation angle (°).]

We then looked for fine-scale genetic structure within Denmark by focusing the

analysis on 237 students who had at least three of their grandparents born in just one

of the following six regions: Capital Region; Zealand; Funen; South, Central and

North Jutland (Figure 2B). These six groups correspond to the five administrative

regions of Denmark (we further split the administrative region of South Denmark into

South Jutland and Funen). After rerunning PCA for Denmark, Sweden, Norway and

Germany alone (N = 1,168), we observed no geographically meaningful clustering of

the 237 Danish samples (Figure 2B). This lack of strong genetic structure was also

supported by the low average FST value between the six regions (FST = 0.0002). To

check whether structure was simply too weak to be visually detected, we calculated

for each Danish sample the average geographic coordinates of their grandparents’

place of birth and regressed the resulting values on PC1 and PC2 eigenvectors. We

repeated the procedure by gradually rotating the map clockwise and found a weak yet

significant correlation between PC1 and latitude along a northwest-southeast axis at

32° (r ≈ 0.24; p < 10⁻⁵; Figure 2B-C).
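The rotation scan can be sketched as follows: for each angle, the map is rotated clockwise, the rotated north-south coordinate is computed for every sample, and its correlation with the principal component is recorded. This is a minimal illustration under stated assumptions and not the code used in this study. It treats longitude and latitude as planar coordinates, and the toy coordinates and function names are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rotated_latitude(lon, lat, angle_deg):
    """North-south coordinate after rotating the map clockwise."""
    t = math.radians(angle_deg)
    return -lon * math.sin(t) + lat * math.cos(t)


def best_rotation(pc, lons, lats, step=1):
    """Angle (degrees) maximizing the correlation of pc with rotated latitude."""
    scores = {}
    for angle in range(0, 360, step):
        rot = [rotated_latitude(x, y, angle) for x, y in zip(lons, lats)]
        scores[angle] = pearson(pc, rot)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy coordinates; PC values are built to align exactly with a 30-degree axis.
lons = [8.5, 9.2, 10.1, 10.4, 11.8, 12.5]
lats = [57.0, 55.1, 56.2, 55.4, 55.3, 55.7]
pc1 = [rotated_latitude(x, y, 30) for x, y in zip(lons, lats)]
angle, r = best_rotation(pc1, lons, lats)
print(angle, round(r, 6))
```

In real data the maximum is far from perfect (r ≈ 0.24 here), and a degree of longitude is shorter than a degree of latitude at Danish latitudes, so projected coordinates would be preferable to raw degrees.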

Chromosome painting, population clustering and admixture

To further explore historical genetic interactions between Denmark and neighboring

countries, we ran CHROMOPAINTER, fineSTRUCTURE and GLOBETROTTER

on a subset of the studied European populations. We used 2,745 individuals from 13

European countries (Norway, Sweden, Finland, the Netherlands, Germany, Poland,

Austria, Hungary, France, Belgium, Great Britain, Spain and Portugal) and the 237

individuals from the six geographic regions in Denmark (Figure 1) to quantify and

date European admixture in each of the six Danish groups. We used these six groups

for our analysis because we were unable to observe an alternative clustering:

fineSTRUCTURE clustered all Danish samples in a single large group (data not

shown), reflecting again the weak genetic structure in the Danish population.

After running CHROMOPAINTER and fineSTRUCTURE on the European donor

samples, these were organized in the eight major clusters seen in Figure 3A:

Norwegian (NOR), Swedish (SWE), Finnish (FIN), British (BRI), French (FRA),

German (GER), Polish (POL) and Iberian (IBE). There was always one predominant

country in the makeup of each of those clusters, with the exception of IBE, in which

Spain and Portugal were present in almost equal proportions, and GER, in which

samples from Austria, Hungary and the Netherlands were also present in large

numbers. For convenience, we named the eight clusters after the predominant

country/region in each of them. We treated these clusters as ancestry components

(“surrogate populations” in GLOBETROTTER jargon) and used GLOBETROTTER

to define their contribution to each of the six Danish regions and to date the

corresponding admixture events.

Figure 3. (A) fineSTRUCTURE grouping of the 2,745 European donor samples into eight clusters roughly

corresponding to well-defined geographic locations. The tree on the left illustrates cluster topology. Pie charts on

the European map show the relative contribution from each country to each of the eight inferred clusters. FIN:

Finnish; NOR: Norwegian; SWE: Swedish; POL: Polish; GER: German; BRI: British; FRA: French; IBE: Iberian.

Color legend at the bottom of the map shows different donor countries. FI: Finland. (B) GLOBETROTTER

admixture proportions of each of the eight European clusters in the six geographic regions of Denmark. Neither

FIN nor IBE made substantial contributions (< 2.5%) to the mixture profiles of Denmark.

Figure 3B and Table S2 show that the Swedish, Norwegian and British clusters made

the most substantial contribution to the ancestry profiles of all six Danish groups,

jointly accounting for 82.05-97.12% of the total admixture. The Scandinavian

component (Swedish and Norwegian clusters together) surprisingly accounted for less

than half of the total admixture (range: 41.69-48.96%), on a par with the British

component (40.37-48.16%), which peaked in South Jutland. Interestingly, the

contribution of the Swedish cluster alone (27.45-30.20%) was almost twice as large as

that of the Norwegian cluster (14.24-18.76%). This difference could be explained by

the reduced landscape connectivity between Norway and Denmark, affecting gene

flow patterns. It is also striking that the German cluster had little genetic influence on

Denmark (3.04-6.78%), despite the proximity and historically fluid borders between

the two countries. However, the latter observation could be due to the marginally

higher genetic affinity of Denmark with Britain, which could result in prioritizing

British over German samples in the initial chromosome-painting step of our analysis.

Similarly, the French component was present in small yet considerable proportions

(4.16-5.42%) in all Danish regions except for Funen and South Jutland. Finally, it is worth

noting that there was a small Polish contribution to Zealand (6.28%) and Funen

(5.03%).

We further used GLOBETROTTER in a preliminary bootstrap re-sampling step

within each of the six Danish regions to produce admixture dates. A proportion of

nonsensical dates (i.e. ≤ 1 generation or ≥ 400 generations) above the empirical

p-value threshold (0.05) suggests that GLOBETROTTER cannot reliably date the

corresponding admixture event. Following this procedure, we found that we did not

have enough power to date admixture in the Capital region or in Funen (data not

shown). For the remaining four regions, we used GLOBETROTTER’s pairwise

co-ancestry curves (Figures S1-4) to define in more detail the type (i.e. one date vs. one

date multi-way vs. two dates) and actual date of admixture, as well as a tentative

composition of the admixing source populations. In all regions, GLOBETROTTER’s

best fitting admixture model (R² = 0.136-0.354) involved one admixture event with

more than two sources admixing simultaneously (Table 1). Assuming 30 years per

generation, the date of admixture ranged from ~434 to ~943 years before present.

Notably, the algorithm identified two clusters as the most recurring admixing sources:

one showing highest affinity with GER and one with POL.

Ancestry component analysis

We also used Ohana to provide an independent estimate of individual admixture

proportions in the same set of 12 European countries (without Finland) and six Danish

regions. For this purpose, we assumed that each of the European samples was the

result of admixture between K ancestral populations (i.e. ancestry components). Even

though we ran the analysis for K = {2, …, 9} (Figure S5), we report the results for K

= 4, because this value corresponds to four distinct geographic origins of our samples:

the Iberian Peninsula; Eastern, Central and Nordic Europe. The analysis returned an

admixture pattern that was consistent with geography (i.e. North-South and East-West

clines; Table 2; Figure 4). In particular, we observed (i) a Southern European

component (blue), which was predominant in the Iberian peninsula but was also found

at large proportions in France; (ii) an Eastern European component (yellow), which

was predominant in Poland but was also found at large proportions in neighboring

countries (and notably in South and East Denmark, Sweden, Norway and Great

Britain); (iii) a Nordic component (green), which was predominant in Scandinavia

and to a lesser extent in Germany; and (iv) a Central European component (red), at higher

proportions in Scandinavia and the Netherlands but also present in most European

countries (except for Iberia). Within Denmark, we observed that Zealand and South

Denmark had larger membership to the Eastern European component (18.90-20.36%)

compared to Central and North Jutland (~13%), resembling GLOBETROTTER’s

signal of Polish admixture observed in Zealand and Funen (Figure 3B; Table S2). The

Southern and Eastern European components were robust to different choices of K,

whereas the Central and Nordic European components were gradually divided into

smaller fractions as K went up, showing no clear geographic patterns (Figure S5).

Figure 4. Ancestral component analysis of 13 European countries (including six well-defined geographic regions in

Denmark shown in the inset) assuming K = 4 ancestral populations. Bar plot at the bottom shows per-individual

membership to each of the four ancestral components, whereas pie charts on the map summarize per-country (or

per-region for Denmark) admixture proportions. Based on their preponderance in different parts of Europe, we

interpret the four components as (i) Southern European (blue); (ii) Eastern European (yellow); (iii) Nordic (green);

and (iv) Central European (red). Regions from South and East Denmark show a higher proportion of Eastern

European ancestry in accord with Figure 3B and Table S2.

[Figure 4 labels: PT, ES, FR, BE, GB (UK), NL, DE, AT, HU, PL, DK (Ca, Ze, Fu, SJ, CJ, NJ), NO, SE.]

Relatedness and identity by descent

We followed up the evidence for weak genetic structure in Denmark by exploring the

degree and geographic distribution of relatedness among our Danish samples. Using

KING, we found that the vast majority of the Danish students were distantly related

(4th degree or more distant), with only four pairs of individuals showing 2nd or 3rd

degree relationships (apart from the twins and siblings initially removed; Figure S6).

Because of the inherent uncertainty in distinguishing between different degrees of

distantly related individuals, we did not attempt to further stratify the samples by

kinship coefficient.

After establishing that most participants in our sample were either unrelated or

distantly related (Figure S6), we examined DNA segments that were IBD between

pairs of individuals with BEAGLE Refined IBD. Using total genomic length of IBD

tracts as a proxy for relatedness, we traced each individual’s closest “genomic

relative” within Denmark without explicitly quantifying the degree of relationship.

Figure 5A shows the distribution of the geographic distance of each of 399

individuals from their closest genomic relative (pink) and from a randomly chosen

sample (green). The distribution of the distance from the closest relative (pink)

showed an enrichment for very short distances, i.e. less than ~50 km, as well as a

median value of 99.3 km – significantly closer than expected by chance at 131.4 km

(Mann-Whitney U = 60,997; p < 10^-8). This points to a weak yet significant tendency

of participants to live close to genetically similar individuals.
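
The closest-relative comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the sample names, coordinates and IBD totals are invented, `haversine_km` and `closest_relative` are hypothetical helper names, and great-circle distance is assumed as the distance measure.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def closest_relative(total_ibd):
    """For each individual, return the partner sharing the largest total IBD length."""
    best = {}
    for (i, j), length in total_ibd.items():
        # Each pair is a candidate "closest relative" for both of its members.
        for person, partner in ((i, j), (j, i)):
            if person not in best or length > best[person][1]:
                best[person] = (partner, length)
    return {person: partner for person, (partner, _) in best.items()}


# Invented toy data: birthplaces (lat, lon) and total pairwise IBD lengths.
birthplace = {
    "s1": (56.16, 10.20),  # near Aarhus
    "s2": (56.15, 10.21),  # near Aarhus
    "s3": (55.68, 12.57),  # near Copenhagen
    "s4": (55.40, 10.39),  # near Odense
}
total_ibd = {
    ("s1", "s2"): 120.0, ("s1", "s3"): 35.0, ("s1", "s4"): 10.0,
    ("s2", "s3"): 20.0, ("s2", "s4"): 40.0, ("s3", "s4"): 60.0,
}

closest = closest_relative(total_ibd)
distances = [haversine_km(birthplace[p], birthplace[q]) for p, q in closest.items()]
median_to_closest = statistics.median(distances)
```

In the study, the same distances were also computed against a randomly chosen sample and the two distributions were compared with a Mann-Whitney U test; that step is left out of the sketch.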

Figure 5. (A) Distribution of geographic distance of each participant’s place of birth (N = 399) to that of their

closest “genomic relative” (pink), and to that of a randomly chosen sample (green). Genomic relatedness was

defined on the grounds of total genomic IBD. Arrows point at median values of the two distributions (median_rltv =

99.3 km; median_rand = 131.4 km). Seven individuals with unknown geographic coordinates were excluded from

this analysis. (B) Plot of rank of genomic relatedness vs. median geographic distance of each participant to their

closest genomic relative. We created 57 equally sized bins of individuals increasingly related to their closest

genomic relative (seven individuals per bin). Alternative bin sizes also produced significantly negative correlations

(data not shown).

To gain more insight into the latter observation, we grouped our Danish samples into

ranked bins by the amount of total genomic IBD shared with their closest genomic

relative (the higher the rank, the closer the relationship). We then calculated the

median geographic distance of each participant to their closest relative within each

bin and regressed this value against bin rank (Figure 5B) to observe a weak yet

significant negative correlation (r ≈ -0.35; p < 0.01). This observation indicates that

geographic distance tends to be significantly shorter between Danes who share more

genomic IBD. Interestingly, this signal was not picked up by kinship coefficient (data

not shown).
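
The binning-and-regression step above can be illustrated as follows. This is a rough sketch on synthetic data, not the authors' code: the records are generated by a made-up deterministic formula, and `binned_medians` and `pearson_r` are hypothetical helper names. The only figures taken from the text are the 399 individuals and the bin size of seven, which give 57 bins.

```python
import math
import statistics


def binned_medians(records, bin_size):
    """Sort (total_ibd, distance_km) records by IBD; return (rank, median distance) per bin."""
    ordered = sorted(records, key=lambda rec: rec[0])
    bins = [ordered[k:k + bin_size] for k in range(0, len(ordered), bin_size)]
    # Rank 1 holds the individuals least related to their closest genomic relative.
    return [(rank, statistics.median([dist for _, dist in chunk]))
            for rank, chunk in enumerate(bins, start=1)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic records: distance falls slowly with IBD, plus deterministic pseudo-noise.
records = [(float(i), 200.0 - 0.3 * i + 6.0 * ((4 * i) % 11 - 5)) for i in range(399)]

ranked = binned_medians(records, bin_size=7)
ranks = [rank for rank, _ in ranked]
medians = [median for _, median in ranked]
r = pearson_r(ranks, medians)
```

Using the median within each bin keeps a few unusually distant pairs from dominating the bin summary, which is one plausible reason for preferring it to the mean here.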

[Figure 5. Panel A: x-axis Distance (km), y-axis Density; arrows mark median_rltv and median_rand; legend: closest relative, random sample. Panel B: x-axis Rank of genomic relatedness, y-axis Median geographic distance (km).]

Historical effective population size

Historical Ne showed notable disparity between the three Scandinavian countries

(Figure 6). In particular, point estimates of Ne in Denmark, Norway and Sweden

showed a dramatic ~273-fold, ~262-fold and ~995-fold increase, respectively, over the past 150

generations (~4,500±300 years), following the general upward European trend

(Browning and Browning 2015). What is more, Sweden’s and Denmark’s Ne

estimates were significantly different from each other (no overlap between the

corresponding 95% confidence intervals). During the High and Late Middle Ages, Ne

in Denmark remained stable, suffering an almost imperceptible decline as a

consequence of the otherwise devastating Black Death; similar patterns were

observed for Norway and Sweden. From the 15th century on, Ne in Denmark started

to rise again at a moderate rate, whereas Ne in Norway and Sweden rose at an even

higher rate, resulting in Norway’s Ne eventually surpassing that of Denmark.

Interestingly, even though Denmark and Norway currently have very similar census

population sizes (5.614 and 5.084 million, respectively), Norway’s current Ne is 1.76

times larger than Denmark’s (although the difference is not significant). Finally, the

upward trend of Denmark’s Ne did not seem to be held back by other important

epidemics in the history of the country, such as cholera, the Spanish flu or, more

recently, polio (Figure 6).
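
The conversion from generations to calendar time used above is simple arithmetic, sketched here for concreteness. The generation length of 30±2 years is the one stated in the Figure 6 caption; the function name `generations_to_years` is hypothetical, and the census figures are those quoted in the text.

```python
def generations_to_years(generations, years_per_generation=30.0, uncertainty=2.0):
    """Return (midpoint, half_width) in years for a span of generations."""
    return generations * years_per_generation, generations * uncertainty


# 150 generations at 30±2 years per generation.
midpoint, half_width = generations_to_years(150)

# Denmark-to-Norway census ratio from the population sizes quoted in the text (millions).
census_ratio = 5.614 / 5.084
```

With these inputs the span is 4,500±300 years, matching the figure in the text, and the census ratio is about 1.10, which shows how little the census sizes differ compared with the 1.76-fold difference in current Ne.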

Figure 6. Change in effective population size (Ne) of the Danish, Swedish and Norwegian populations over the

past 150 generations (log-scale). Shaded areas represent the upper and lower bounds of the 95% confidence

intervals after bootstrapping. Uncertainty in generation length is represented by year intervals on the x-axis,

assuming that each generation lasts 30±2 years. Black segments represent major epidemics from the recent history

of the Danish population and are plotted taking into account generation length uncertainty. For more clarity, the

inset shows the same graph with both axes in log-scale.

Polygenic prediction of height and BMI

Polygenic risk prediction in our Danish sample was overall far more accurate for

height than for BMI (Figure S7). In the case of height, we observed maximum

accuracy when assuming infinitesimal genetic architecture and adjusting for age, sex

and ten ancestry PCs – R^2 = 0.251±0.031 (Figure S7A) vs. ~0.17 elsewhere (Wood et

al. 2014). When age, sex and the ten PCs were also included in the model, the

prediction accuracy rose substantially to 0.639±0.015 (p = 6.57×10^-71), a fact reflected in the

strong and significant correlation between real and predicted height (Figure S7C). In

contrast, although the maximum accuracy of BMI prediction was also observed when

we adjusted for age, sex and ten PCs, this was overall much poorer than for height –

R^2 = 0.106±0.037 (Figure S7B) vs. ~0.065 elsewhere (Locke et al. 2015). In this case,

when age, sex and ten PCs were included in the model, the improvement of accuracy

was not as notable as for height (R^2 = 0.195±0.034, p = 5.49×10^-18), implying that

BMI is only marginally affected by age and sex (Figure S7D).

[Figure 6 plot. x-axis: Year (Present; 1894 - 1906; 1787 - 1813; 1467 - 1533; 934 - 1066; 133 BC - 133 AD; 2800 - 2200 BC); y-axis: Ne ×1,000; series: Sweden, Norway, Denmark; epidemics marked: Black Death (1350), Cholera (1853), Spanish Flu (1918), Polio (1952).]

Discussion

The most notable observation of our study is the high genetic homogeneity of the

Danish population, possibly reflecting geographically uninterrupted gene flow, further

facilitated by the extended network of sea-based commerce and travelling (Derry

2000). This observation should be appreciated in a historical context, as our analysis

did not account for genetic contributions from recent immigration. Indeed, modern

Danish society accommodates different ethnic and cultural groups (Jensen and

Pedersen 2007), and this was also reflected in our sample, where ~4% of the

participants were born in Afghanistan, China, Ethiopia, Finland, Germany, Greenland,

Iraq, Jordan, Korea, Kosovo, the Netherlands or Zambia. This number went up when

we looked at grandparental origin, where 14.6% of the participants had at least one

grandparent born outside Denmark.

The historical homogeneity of the Danish population is reflected in the lack of PCA

clustering (Figure 2B), as well as in the notably low average FST between the six

geographic regions. To put this observation in context, a previous study was able to

detect subtle population structure along a north-south axis in the Netherlands

(Genome of the Netherlands Consortium 2014), a country of almost identical area to

Denmark’s. Even though we were able to detect significant correlation between PC1

and latitude (Figure 2C), the signal was less compelling than that from the Dutch

study. Similarly, a recent study of population structure in Great Britain (Leslie et al.

2015) found that the average pairwise FST estimate between 30 geographic regions

was 0.0007 – 3.5 times higher than the value we report here (0.0002).

To better understand how our FST estimates compare to those from Great Britain, we

calculated average FST values in two British population sets using FST values reported

elsewhere (Leslie et al. 2015). The two British sets included regions with geographic

distances comparable to those in Denmark. The first subset comprised 13

populations from Central/Southern England with a resulting average FST < 0.0002.

The second subset comprised nine populations from Northern England with a

resulting average pairwise FST = 0.0003.

In general, the study of admixture within the European continent is confounded by a

well-grounded isolation-by-distance mechanism (Lao et al. 2008; Novembre et al.

2008), as well as an increased historical complexity that renders most admixture

models unrealistically simple (Busby et al. 2015). Denmark is no exception to these

caveats. Even though it is tempting to explain the admixture proportions in Figures 3

and 4 as the result of historical admixing events, an alternative approach is to interpret

such proportions as “mixture profiles”. Bearing this in mind, we see that the mixture

profiles of all six Danish groups comprise two major ancestry components, one

predominant in Scandinavia and the other predominant in Central Europe (Figure 4).

In the GLOBETROTTER analysis, these two components were identified as

admixture contributions from the Swedish/Norwegian and the British clusters (Figure

3B; Table S2). In regard to the British contribution, however, this is actually more

likely to reflect admixture in the opposite direction, i.e. from Denmark to Britain

(Leslie et al. 2015).

Even though the mixture profiles of the six regions in Denmark were overall quite

similar, they differed in their membership to the Eastern European cluster. This

particular cluster was more prevalent in South and East Denmark (Figure 4) and was

independently observed in the GLOBETROTTER analysis as a small Polish

contribution to Zealand and Funen (Figure 3B; Table S2). This signal could be due to

historical Wend settlements in the broader area around Lolland in the southernmost

part of Denmark (Derry 2000), an observation that is also supported by our admixture

date estimates (Table 1). However, because similar mixture profiles were also

observed in Sweden and Norway (Figure 4), it is difficult to choose between isolation

by distance and actual admixture.

Given the high variance in the estimated admixture dates, the recurrence of the same

combinations of admixing sources, and the largely unstructured nature of our Danish

sample, we could interpret several of the admixture patterns in Table 1 as different

instances of the same event. For instance, focusing on the two surrogates that

GLOBETROTTER found best tagged by POL and GER, we see that the admixture

between them first occurred in Zealand in the 11th century, spreading to the rest of

peninsular Denmark afterwards. This observation fits historical knowledge about the

march of the Wends towards Denmark, as mentioned above.

A weak signal of population structure was also observed when we studied the

geographic distribution of IBD-based relatedness. The median distance to the closest

genomic relative was significantly smaller than to a randomly chosen sample, but it

represents a minor effect (median difference ≈ 30 km; Figure 5A). In our regression

analysis, we observed that this distance tended to be significantly smaller for pairs of

individuals that shared more genomic IBD, yet the correlation was overall modest (r ≈ -0.35;

Figure 5B). These observations indicate that genetic structure indeed exists in

Denmark, even though it is very weak.

It is striking that Denmark, Sweden and Norway – three closely related Scandinavian

countries – seem to differ considerably in their demographic history, as reflected in

their historical Ne trajectories (Figure 6). Indeed, the current Sweden-to-Norway census

population size ratio is 1.89 – fairly close to their current Ne ratio (2.24) –

implying that the two populations have had similar reproductive variance and/or

population structure. On the contrary, Denmark’s Ne seems to have increased at a

consistently lower rate after the Middle Ages. This could reflect weaker

reproductive dynamics in the Danish population or weaker population structure

compared to Sweden and Norway. Larger sample sizes are warranted to shed more

light on the historical Ne trajectories in Scandinavian populations.

Finally, we found that self-reported adolescent height could be predicted with

remarkable accuracy using essentially nothing but information derived directly

(genotypes) or indirectly (sex and ancestry) from DNA available at birth. When we

combined SNP data with age, sex and PCA information, our predictor could explain

more than half of the total phenotypic variance (63.9%). The remaining unexplained

variance corresponded to a standard deviation of 5.43 cm. This means that, with 95%

confidence, we are at most ~10.65 cm off in our prediction of adolescent height

(Figure S7C). It is worth noting that adding age to the model did not yield significant

improvements to the prediction accuracy of height (data not shown), implying that

adolescent height is a reliable measurement for the validation of the prediction. In

addition, even though height was a self-reported measurement, its strong correlation

with predicted adult height suggests that the students provided reliable personal

information (Figure S7C). As for the poorer prediction of BMI, it is important to

remember that data training was carried out in adults, whereas prediction was

validated in adolescents. Therefore, further improvement of the BMI prediction as

subjects advance in age is a possibility.
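
The height figures quoted above can be checked with a short calculation. This is a sketch that assumes the prediction residuals are approximately normally distributed (an assumption of this example, not something stated in the text); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

residual_sd_cm = 5.43   # standard deviation of the unexplained variance, from the text
r_squared = 0.639       # variance explained by SNPs, age, sex and PCs, from the text

# Two-sided 95% bound on the prediction error under a normal approximation.
z = NormalDist().inv_cdf(0.975)
half_width_cm = z * residual_sd_cm

# Total phenotypic SD implied by the residual SD and R^2:
# residual variance = (1 - R^2) * total variance.
implied_total_sd_cm = residual_sd_cm / math.sqrt(1.0 - r_squared)
```

With the rounded residual SD, the half-width comes to roughly 10.64 cm, in line with the ~10.65 cm quoted in the text, and the implied total standard deviation of height in the sample is about 9 cm.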

In conclusion, our analysis showed that, by applying the simple criterion of

participants having all four of their grandparents born in Denmark, we obtained a

largely unstructured sample with very low FST values among different geographic

regions. This high homogeneity has the potential of rendering population structure in

large-scale Danish gene-mapping studies such as iPSYCH (http://ipsych.au.dk/) a

lesser concern, regardless of adjustments such as genomic control, PCA (Price et al.

2006) or mixed models (Yu et al. 2006). We also observed a notably disparate

demographic history in Denmark compared to other Scandinavian countries – a fact

that can also be ascribed to the high homogeneity of the Danish population – and that

height in adolescents could be predicted with considerable accuracy. Last but not

least, this study stands as an example of how large-scale public

engagement projects can provide mutual benefits for both the general public and the

scientific community through the promotion of scientific knowledge.

Acknowledgements

The authors would like to thank the high school students and their teachers for

participating in the Where Are You From? project, as well as Dagmar Hedvig Fog

Bjerre and Pia Johansson Nielsen for logistic support with organizing and shipping

the DNA kits to the participating schools. Special thanks to Prof. Jes Fabricius Møller

for helping us gain useful insights into the history and demography of the Danish

population. G.A. would like to thank Garrett Hellenthal, Francesco Montinaro,

George Busby, Sharon Browning and Christopher Gignoux for their help and useful

discussions. This project and related research were supported by the National

Lottery Funds (Danske Spil), The Danish Ministry of Education and the Centre for

Biocultural History, Aarhus University. B.J.V. was supported by a grant from the

Danish Council for Independent Research (DFF-1325-0014). K.M.H. was supported

by grants from the Swedish Research Council (K2014-62X-21445-05-3 and K2012-

63X-21445-03-2).

Literature Cited

Alexander D. H., Novembre J., Lange K., 2009 Fast model-based estimation of ancestry in unrelated individuals.

Genome Res. 19: 1655–1664.

Athanasiadis G., Jørgensen F. G., Cheng J. Y., Kjærgaard P. C., Schierup M. H., Mailund T., 2016 Spitting for

science: Danish high school students commit to a large-scale self-reported genetic study. PloS One.

Bae H. T., Sebastiani P., Sun J. X., Andersen S. L., Daw E. W., Terracciano A., Ferrucci L., Perls T. T., 2013

Genome-wide association study of personality traits in the long life family study. Front. Genet. 4: 65.

Browning B. L., Browning S. R., 2013a Improving the accuracy and efficiency of identity-by-descent detection in

population data. Genetics 194: 459–471.

Browning B. L., Browning S. R., 2013b Detecting identity by descent and estimating genotype error rates in

sequence data. Am. J. Hum. Genet. 93: 840–851.

Browning S. R., Browning B. L., 2015 Accurate Non-parametric Estimation of Recent Effective Population Size

from Segments of Identity by Descent. Am. J. Hum. Genet. 97: 404–418.

Bryc K., Durand E. Y., Macpherson J. M., Reich D., Mountain J. L., 2015 The genetic ancestry of African

Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 96: 37–53.

Busby G. B. J., Hellenthal G., Montinaro F., Tofanelli S., Bulayeva K., Rudan I., Zemunik T., Hayward C.,

Toncheva D., Karachanak-Yankova S., Nesheva D., Anagnostou P., Cali F., Brisighelli F., Romano V.,

Lefranc G., Buresi C., Ben Chibani J., Haj-Khelil A., Denden S., Ploski R., Krajewski P., Hervig T.,

Moen T., Herrera R. J., Wilson J. F., Myers S., Capelli C., 2015 The Role of Recent Admixture in

Forming the Contemporary West Eurasian Genomic Landscape. Curr. Biol. 25: 2518–2526.

Chang C. C., Chow C. C., Tellier L. C., Vattikuti S., Purcell S. M., Lee J. J., 2015 Second-generation PLINK:

rising to the challenge of larger and richer datasets. GigaScience 4: 7.

Delaneau O., Marchini J., Zagury J.-F., 2012 A linear complexity phasing method for thousands of genomes. Nat.

Methods 9: 179–181.

Derry T. K., 2000 History of Scandinavia: Norway, Sweden, Denmark, Finland, and Iceland. University of

Minnesota Press.

Devlin B., Roeder K., 1999 Genomic control for association studies. Biometrics 55: 997–1004.

Espeseth T., Christoforou A., Lundervold A. J., Steen V. M., Le Hellard S., Reinvang I., 2012 Imaging and

cognitive genetics: the Norwegian Cognitive NeuroGenetics sample. Twin Res. Hum. Genet. 15: 442–452.

Genome of the Netherlands Consortium, 2014 Whole-genome sequence variation, population structure and

demographic history of the Dutch population. Nat. Genet. 46: 818–825.

Green R. E., Krause J., Briggs A. W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.

H.-Y., Hansen N. F., Durand E. Y., Malaspinas A.-S., Jensen J. D., Marques-Bonet T., Alkan C., Prüfer

K., Meyer M., Burbano H. A., Good J. M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner

B., Siegemund M., Weihmann A., Nusbaum C., Lander E. S., Russ C., Novod N., Affourtit J., Egholm

M., Verna C., Rudan P., Brajkovic D., Kucan Z., Gusic I., Doronichev V. B., Golovanova L. V.,

Lalueza-Fox C., Rasilla M. de la, Fortea J., Rosas A., Schmitz R. W., Johnson P. L. F., Eichler E. E.,

Falush D., Birney E., Mullikin J. C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., Pääbo S.,

2010 A draft sequence of the Neandertal genome. Science 328: 710–722.

Hellenthal G., Busby G. B. J., Band G., Wilson J. F., Capelli C., Falush D., Myers S., 2014 A genetic atlas of

human admixture history. Science 343: 747–751.

Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G. R., 2012 Fast and accurate genotype imputation

in genome-wide association studies through pre-phasing. Nat. Genet. 44: 955–959.

Jakkula E., Rehnström K., Varilo T., Pietiläinen O. P. H., Paunio T., Pedersen N. L., deFaire U., Järvelin M.-R.,

Saharinen J., Freimer N., Ripatti S., Purcell S., Collins A., Daly M. J., Palotie A., Peltonen L., 2008 The

genome-wide patterns of variation expose significant substructure in a founder population. Am. J. Hum.

Genet. 83: 787–794.

Jensen P., Pedersen P. J., 2007 To Stay or Not to Stay? Out-Migration of Immigrants from Denmark. Int. Migr. 45:

87–113.

Karakachoff M., Duforet-Frebourg N., Simonet F., Le Scouarnec S., Pellen N., Lecointe S., Charpentier E., Gros

F., Cauchi S., Froguel P., Copin N., D.E.S.I.R. Study Group, Le Tourneau T., Probst V., Le Marec H.,

Molinaro S., Balkau B., Redon R., Schott J.-J., Blum M. G., Dina C., 2015 Fine-scale

human genetic structure in Western France. Eur. J. Hum. Genet. 23: 831–836.

Laaksovirta H., Peuralinna T., Schymick J. C., Scholz S. W., Lai S.-L., Myllykangas L., Sulkava R., Jansson L.,

Hernandez D. G., Gibbs J. R., Nalls M. A., Heckerman D., Tienari P. J., Traynor B. J., 2010

Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet

Neurol. 9: 978–985.

Lao O., Lu T. T., Nothnagel M., Junge O., Freitag-Wolf S., Caliebe A., Balascakova M., Bertranpetit J., Bindoff L.

A., Comas D., Holmlund G., Kouvatsi A., Macek M., Mollet I., Parson W., Palo J., Ploski R., Sajantila

A., Tagliabracci A., Gether U., Werge T., Rivadeneira F., Hofman A., Uitterlinden A. G., Gieger C.,

Wichmann H.-E., Rüther A., Schreiber S., Becker C., Nürnberg P., Nelson M. R., Krawczak M., Kayser

M., 2008 Correlation between genetic and geographic structure in Europe. Curr. Biol. 18: 1241–1248.

Larsen M. H., Albrechtsen A., Thørner L. W., Werge T., Hansen T., Gether U., Haastrup E., Ullum H., 2013

Genome-Wide Association Study of Genetic Variants in LPS-Stimulated IL-6, IL-8, IL-10, IL-1ra and

TNF-α Cytokine Response in a Danish Cohort. PloS One 8: e66262.

Lawson D. J., Hellenthal G., Myers S., Falush D., 2012 Inference of population structure using dense haplotype

data. PLoS Genet. 8: e1002453.

Leslie S., Winney B., Hellenthal G., Davison D., Boumertit A., Day T., Hutnik K., Royrvik E. C., Cunliffe B.,

Wellcome Trust Case Control Consortium 2, International Multiple Sclerosis Genetics Consortium,

Lawson D. J., Falush D., Freeman C., Pirinen M., Myers S., Robinson M., Donnelly P., Bodmer W.,

2015 The fine-scale genetic structure of the British population. Nature 519: 309–314.

Li J. Z., Absher D. M., Tang H., Southwick A. M., Casto A. M., Ramachandran S., Cann H. M., Barsh G. S.,

Feldman M., Cavalli-Sforza L. L., Myers R. M., 2008 Worldwide human relationships inferred from

genome-wide patterns of variation. Science 319: 1100–1104.

Locke A. E., Kahali B., Berndt S. I., Justice A. E., Pers T. H., Day F. R., Powell C., Vedantam S., Buchkovich M.

L., Yang J., Croteau-Chonka D. C., Esko T., Fall T., Ferreira T., Gustafsson S., Kutalik Z., Luan J. ’an,

Mägi R., Randall J. C., Winkler T. W., Wood A. R., Workalemahu T., Faul J. D., Smith J. A., Hua Zhao

J., Zhao W., Chen J., Fehrmann R., Hedman Å. K., Karjalainen J., Schmidt E. M., Absher D., Amin N.,

Anderson D., Beekman M., Bolton J. L., Bragg-Gresham J. L., Buyske S., Demirkan A., Deng G., Ehret

G. B., Feenstra B., Feitosa M. F., Fischer K., Goel A., Gong J., Jackson A. U., Kanoni S., Kleber M. E.,

Kristiansson K., Lim U., Lotay V., Mangino M., Mateo Leach I., Medina-Gomez C., Medland S. E.,

Nalls M. A., Palmer C. D., Pasko D., Pechlivanis S., Peters M. J., Prokopenko I., Shungin D.,

Stančáková A., Strawbridge R. J., Ju Sung Y., Tanaka T., Teumer A., Trompet S., Laan S. W. van der,

Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Isaacs A., Albrecht E.,

Ärnlöv J., Arscott G. M., Attwood A. P., Bandinelli S., Barrett A., Bas I. N., Bellis C., Bennett A. J.,

Berne C., Blagieva R., Blüher M., Böhringer S., Bonnycastle L. L., Böttcher Y., Boyd H. A.,

Bruinenberg M., Caspersen I. H., Ida Chen Y.-D., Clarke R., Daw E. W., Craen A. J. M. de, Delgado G.,

Dimitriou M., Doney A. S. F., Eklund N., Estrada K., Eury E., Folkersen L., Fraser R. M., Garcia M. E.,

Geller F., Giedraitis V., Gigante B., Go A. S., Golay A., Goodall A. H., Gordon S. D., Gorski M., Grabe

H.-J., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groves C. J., Gusto G., Haessler J., Hall P.,

Haller T., Hallmans G., Hartman C. A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q.,

Hengstenberg C., Holmen O., Hottenga J.-J., James A. L., Jeff J. M., Johansson Å., Jolley J., Juliusdottir

T., Kinnunen L., Koenig W., Koskenvuo M., Kratzer W., Laitinen J., Lamina C., Leander K., Lee N. R.,

Lichtner P., Lind L., Lindström J., Sin Lo K., Lobbens S., Lorbeer R., Lu Y., Mach F., Magnusson P. K.

E., Mahajan A., McArdle W. L., McLachlan S., Menni C., Merger S., Mihailov E., Milani L., Moayyeri

A., Monda K. L., Morken M. A., Mulas A., Müller G., Müller-Nurasyid M., Musk A. W., Nagaraja R.,

Nöthen M. M., Nolte I. M., Pilz S., Rayner N. W., Renstrom F., Rettig R., Ried J. S., Ripke S.,

Robertson N. R., Rose L. M., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Scott W. R.,

Seufferlein T., Shi J., Vernon Smith A., Smolonska J., Stanton A. V., Steinthorsdottir V., Stirrups K.,

Stringham H. M., Sundström J., Swertz M. A., Swift A. J., Syvänen A.-C., Tan S.-T., Tayo B. O.,

Thorand B., Thorleifsson G., Tyrer J. P., Uh H.-W., Vandenput L., Verhulst F. C., Vermeulen S. H.,

Verweij N., Vonk J. M., Waite L. L., Warren H. R., Waterworth D., Weedon M. N., Wilkens L. R.,

Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., LifeLines Cohort

Study, Brennan E. P., Choi M., Dastani Z., Drong A. W., Eriksson P., Franco-Cereceda A., Gådin J. R.,

Gharavi A. G., Goddard M. E., Handsaker R. E., Huang J., Karpe F., Kathiresan S., Keildson S., Kiryluk

K., Kubo M., Lee J.-Y., Liang L., Lifton R. P., Ma B., McCarroll S. A., McKnight A. J., Min J. L.,

Moffatt M. F., Montgomery G. W., Murabito J. M., Nicholson G., Nyholt D. R., Okada Y., Perry J. R.

B., Dorajoo R., Reinmaa E., Salem R. M., Sandholm N., Scott R. A., Stolk L., Takahashi A., Tanaka T.,

Van’t Hooft F. M., Vinkhuyzen A. A. E., Westra H.-J., Zheng W., Zondervan K. T., ADIPOGen

Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC4D Consortium, CKDGen

Consortium, GLGC, ICBP, MAGIC Investigators, MuTHER Consortium, MIGen Consortium, PAGE

Consortium, ReproGen Consortium, GENIE Consortium, International Endogene Consortium, Heath A.

C., Arveiler D., Bakker S. J. L., Beilby J., Bergman R. N., Blangero J., Bovet P., Campbell H., Caulfield

M. J., Cesana G., Chakravarti A., Chasman D. I., Chines P. S., Collins F. S., Crawford D. C., Cupples L.

A., Cusi D., Danesh J., Faire U. de, Ruijter H. M. den, Dominiczak A. F., Erbel R., Erdmann J., Eriksson

J. G., Farrall M., Felix S. B., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Franco O.

H., Gansevoort R. T., Gejman P. V., Gieger C., Gottesman O., Gudnason V., Gyllensten U., Hall A. S.,

Harris T. B., Hattersley A. T., Hicks A. A., Hindorff L. A., Hingorani A. D., Hofman A., Homuth G.,

Hovingh G. K., Humphries S. E., Hunt S. C., Hyppönen E., Illig T., Jacobs K. B., Jarvelin M.-R., Jöckel

K.-H., Johansen B., Jousilahti P., Jukema J. W., Jula A. M., Kaprio J., Kastelein J. J. P., Keinanen-

Kiukaanniemi S. M., Kiemeney L. A., Knekt P., Kooner J. S., Kooperberg C., Kovacs P., Kraja A. T.,

Kumari M., Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lyssenko V.,

Männistö S., Marette A., Matise T. C., McKenzie C. A., McKnight B., Moll F. L., Morris A. D., Morris

A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K., Madden P. A. F., Pasterkamp G.,

Peden J. F., Peters A., Postma D. S., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T., Rankinen T.,

Rao D. C., Rice T. K., Ridker P. M., Rioux J. D., Ritchie M. D., Rudan I., Salomaa V., Samani N. J.,

Saramies J., Sarzynski M. A., Schunkert H., Schwarz P. E. H., Sever P., Shuldiner A. R., Sinisalo J.,

Stolk R. P., Strauch K., Tönjes A., Trégouët D.-A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C.,

Völker U., Waeber G., Willemsen G., Witteman J. C., Zillikens M. C., Adair L. S., Amouyel P.,

Asselbergs F. W., Assimes T. L., Bochud M., Boehm B. O., Boerwinkle E., Bornstein S. R., Bottinger E.

P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R. S., Bakker P. I. W. de, Dedoussis

G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A., Hamsten A., Hui J., Hunter D. J.,

Hveem K., Kaplan R. C., Kivimaki M., Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M.,

Metspalu A., Moebus S., Munroe P. B., Njølstad I., Oostra B. A., Palmer C. N. A., Pedersen N. L.,

Perola M., Pérusse L., Peters U., Power C., Quertermous T., Rauramaa R., Rivadeneira F., Saaristo T.

E., Saleheen D., Sattar N., Schadt E. E., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D.,

Thorsteinsdottir U., Stumvoll M., Tuomilehto J., Uitterlinden A. G., Uusitupa M., Harst P. van der,

Walker M., Wallaschofski H., Wareham N. J., Watkins H., Weir D. R., Wichmann H.-E., Wilson J. F.,

Zanen P., Borecki I. B., Deloukas P., Fox C. S., Heid I. M., O’Connell J. R., Strachan D. P., Stefansson

K., Duijn C. M. van, Abecasis G. R., Franke L., Frayling T. M., McCarthy M. I., Visscher P. M.,

Scherag A., Willer C. J., Boehnke M., Mohlke K. L., Lindgren C. M., Beckmann J. S., Barroso I., North

K. E., Ingelsson E., Hirschhorn J. N., Loos R. J. F., Speliotes E. K., 2015 Genetic studies of body mass

index yield new insights for obesity biology. Nature 518: 197–206.

Manichaikul A., Mychaleckyj J. C., Rich S. S., Daly K., Sale M., Chen W.-M., 2010 Robust relationship inference

in genome-wide association studies. Bioinformatics 26: 2867–2873.

Montinaro F., Busby G. B. J., Pascali V. L., Myers S., Hellenthal G., Capelli C., 2015 Unravelling the hidden

ancestry of American admixed populations. Nat. Commun. 6: 6596.

Moreno-Estrada A., Gignoux C. R., Fernández-López J. C., Zakharia F., Sikora M., Contreras A. V., Acuña-

Alonzo V., Sandoval K., Eng C., Romero-Hidalgo S., Ortiz-Tello P., Robles V., Kenny E. E., Nuño-

Arana I., Barquera-Lozano R., Macín-Pérez G., Granados-Arriola J., Huntsman S., Galanter J. M., Via

M., Ford J. G., Chapela R., Rodriguez-Cintron W., Rodríguez-Santana J. R., Romieu I., Sienra-Monge J.

J., Rio Navarro B. del, London S. J., Ruiz-Linares A., Garcia-Herrera R., Estrada K., Hidalgo-Miranda

A., Jimenez-Sanchez G., Carnevale A., Soberón X., Canizales-Quinteros S., Rangel-Villalobos H.,

Silva-Zolezzi I., Burchard E. G., Bustamante C. D., 2014 Human genetics. The genetics of Mexico

recapitulates Native American substructure and affects biomedical traits. Science 344: 1280–1285.

Moreno-Mayar J. V., Rasmussen S., Seguin-Orlando A., Rasmussen M., Liang M., Flåm S. T., Lie B. A., Gilfillan

G. D., Nielsen R., Thorsby E., Willerslev E., Malaspinas A.-S., 2014 Genome-wide ancestry patterns in

Rapanui suggest pre-European admixture with Native Americans. Curr. Biol. 24: 2518–2525.

Nelson M. R., Bryc K., King K. S., Indap A., Boyko A. R., Novembre J., Briley L. P., Maruyama Y., Waterworth

D. M., Waeber G., Vollenweider P., Oksenberg J. R., Hauser S. L., Stirnadel H. A., Kooner J. S.,

Chambers J. C., Jones B., Mooser V., Bustamante C. D., Roses A. D., Burns D. K., Ehm M. G., Lai E.

H., 2008 The Population Reference Sample, POPRES: a resource for population, disease, and

pharmacological genetics research. Am. J. Hum. Genet. 83: 347–358.

Novembre J., Johnson T., Bryc K., Kutalik Z., Boyko A. R., Auton A., Indap A., King K. S., Bergmann S., Nelson

M. R., Stephens M., Bustamante C. D., 2008 Genes mirror geography within Europe. Nature 456: 98–101.

Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., Reich D., 2006 Principal components

analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904–909.

Price A. L., Helgason A., Palsson S., Stefansson H., St Clair D., Andreassen O. A., Reich D., Kong A., Stefansson

K., 2009 The impact of divergence time on the nature of population structure: an example from Iceland.

PLoS Genet. 5: e1000505.

Pritchard J. K., Stephens M., Donnelly P., 2000 Inference of population structure using multilocus genotype data.

Genetics 155: 945–959.

Reich D., Thangaraj K., Patterson N., Price A. L., Singh L., 2009 Reconstructing Indian population history. Nature

461: 489–494.

Reich D., Patterson N., Kircher M., Delfin F., Nandineni M. R., Pugach I., Ko A. M.-S., Ko Y.-C., Jinam T. A.,

Phipps M. E., Saitou N., Wollstein A., Kayser M., Pääbo S., Stoneking M., 2011 Denisova admixture

and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 89: 516–528.

Ripke S., O’Dushlaine C., Chambert K., Moran J. L., Kähler A. K., Akterin S., Bergen S. E., Collins A. L.,

Crowley J. J., Fromer M., Kim Y., Lee S. H., Magnusson P. K. E., Sanchez N., Stahl E. A., Williams S.,

Wray N. R., Xia K., Bettella F., Borglum A. D., Bulik-Sullivan B. K., Cormican P., Craddock N., Leeuw

C. de, Durmishi N., Gill M., Golimbet V., Hamshere M. L., Holmans P., Hougaard D. M., Kendler K. S.,

Lin K., Morris D. W., Mors O., Mortensen P. B., Neale B. M., O’Neill F. A., Owen M. J., Milovancevic

M. P., Posthuma D., Powell J., Richards A. L., Riley B. P., Ruderfer D., Rujescu D., Sigurdsson E.,

Silagadze T., Smit A. B., Stefansson H., Steinberg S., Suvisaari J., Tosato S., Verhage M., Walters J. T.,

Multicenter Genetic Studies of Schizophrenia Consortium, Levinson D. F., Gejman P. V., Kendler K. S.,

Laurent C., Mowry B. J., O’Donovan M. C., Owen M. J., Pulver A. E., Riley B. P., Schwab S. G.,

Wildenauer D. B., Dudbridge F., Holmans P., Shi J., Albus M., Alexander M., Campion D., Cohen D.,

Dikeos D., Duan J., Eichhammer P., Godard S., Hansen M., Lerer F. B., Liang K.-Y., Maier W., Mallet

J., Nertney D. A., Nestadt G., Norton N., O’Neill F. A., Papadimitriou G. N., Ribble R., Sanders A. R.,

Silverman J. M., Walsh D., Williams N. M., Wormley B., Psychosis Endophenotypes International

Consortium, Arranz M. J., Bakker S., Bender S., Bramon E., Collier D., Crespo-Facorro B., Hall J.,

Iyegbe C., Jablensky A., Kahn R. S., Kalaydjieva L., Lawrie S., Lewis C. M., Lin K., Linszen D. H.,

Mata I., McIntosh A., Murray R. M., Ophoff R. A., Powell J., Rujescu D., Van Os J., Walshe M.,

Weisbrod M., Wiersma D., Wellcome Trust Case Control Consortium 2, Donnelly P., Barroso I.,

Blackwell J. M., Bramon E., Brown M. A., Casas J. P., Corvin A. P., Deloukas P., Duncanson A.,

Jankowski J., Markus H. S., Mathew C. G., Palmer C. N. A., Plomin R., Rautanen A., Sawcer S. J.,

Trembath R. C., Viswanathan A. C., Wood N. W., Spencer C. C. A., Band G., Bellenguez C., Freeman

C., Hellenthal G., Giannoulatou E., Pirinen M., Pearson R. D., Strange A., Su Z., Vukcevic D., Donnelly

P., Langford C., Hunt S. E., Edkins S., Gwilliam R., Blackburn H., Bumpstead S. J., Dronov S., Gillman

M., Gray E., Hammond N., Jayakumar A., McCann O. T., Liddle J., Potter S. C., Ravindrarajah R.,

Ricketts M., Tashakkori-Ghanbaria A., Waller M. J., Weston P., Widaa S., Whittaker P., Barroso I.,

Deloukas P., Mathew C. G., Blackwell J. M., Brown M. A., Corvin A. P., McCarthy M. I., Spencer C. C.

A., Bramon E., Corvin A. P., O’Donovan M. C., Stefansson K., Scolnick E., Purcell S., McCarroll S. A.,

Sklar P., Hultman C. M., Sullivan P. F., 2013 Genome-wide association analysis identifies 13 new risk

loci for schizophrenia. Nat. Genet. 45: 1150–1159.

Tang H., Peng J., Wang P., Risch N. J., 2005 Estimation of individual admixture: analytical and study design

considerations. Genet. Epidemiol. 28: 289–301.

Vilhjálmsson B. J., Yang J., Finucane H. K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia

G., Do R., Hayeck T., Won H.-H., Schizophrenia Working Group of the Psychiatric Genomics

Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study,

Kathiresan S., Pato M., Pato C., Tamimi R., Stahl E., Zaitlen N., Pasaniuc B., Belbin G., Kenny E. E.,

Schierup M. H., De Jager P., Patsopoulos N. A., McCarroll S., Daly M., Purcell S., Chasman D., Neale

B., Goddard M., Visscher P. M., Kraft P., Patterson N., Price A. L., Discovery Biology and Risk of

Inherited Variants in Breast Cancer DRIVE study, 2015 Modeling Linkage Disequilibrium Increases

Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet. 97: 576–592.

Visscher P. M., Hill W. G., Wray N. R., 2008 Heritability in the genomics era--concepts and misconceptions. Nat.

Rev. Genet. 9: 255–266.

Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff

L., Parkinson H., 2014 The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Nucleic Acids Res. 42: D1001–1006.

Wood A. R., Esko T., Yang J., Vedantam S., Pers T. H., Gustafsson S., Chu A. Y., Estrada K., Luan J. ’an, Kutalik

Z., Amin N., Buchkovich M. L., Croteau-Chonka D. C., Day F. R., Duan Y., Fall T., Fehrmann R.,

Ferreira T., Jackson A. U., Karjalainen J., Lo K. S., Locke A. E., Mägi R., Mihailov E., Porcu E.,

Randall J. C., Scherag A., Vinkhuyzen A. A. E., Westra H.-J., Winkler T. W., Workalemahu T., Zhao J.

H., Absher D., Albrecht E., Anderson D., Baron J., Beekman M., Demirkan A., Ehret G. B., Feenstra B.,

Feitosa M. F., Fischer K., Fraser R. M., Goel A., Gong J., Justice A. E., Kanoni S., Kleber M. E.,

Kristiansson K., Lim U., Lotay V., Lui J. C., Mangino M., Mateo Leach I., Medina-Gomez C., Nalls M.

A., Nyholt D. R., Palmer C. D., Pasko D., Pechlivanis S., Prokopenko I., Ried J. S., Ripke S., Shungin

D., Stancáková A., Strawbridge R. J., Sung Y. J., Tanaka T., Teumer A., Trompet S., Laan S. W. van

der, Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Afzal U., Arnlöv J.,

Arscott G. M., Bandinelli S., Barrett A., Bellis C., Bennett A. J., Berne C., Blüher M., Bolton J. L.,

Böttcher Y., Boyd H. A., Bruinenberg M., Buckley B. M., Buyske S., Caspersen I. H., Chines P. S.,

Clarke R., Claudi-Boehm S., Cooper M., Daw E. W., De Jong P. A., Deelen J., Delgado G., Denny J. C.,

Dhonukshe-Rutten R., Dimitriou M., Doney A. S. F., Dörr M., Eklund N., Eury E., Folkersen L., Garcia

M. E., Geller F., Giedraitis V., Go A. S., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groot L.

C. P. G. M. de, Groves C. J., Haessler J., Hall P., Haller T., Hallmans G., Hannemann A., Hartman C.

A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q., Hemani G., Henders A. K., Hillege H. L.,

Hlatky M. A., Hoffmann W., Hoffmann P., Holmen O., Houwing-Duistermaat J. J., Illig T., Isaacs A.,

James A. L., Jeff J., Johansen B., Johansson Å., Jolley J., Juliusdottir T., Junttila J., Kho A. N.,

Kinnunen L., Klopp N., Kocher T., Kratzer W., Lichtner P., Lind L., Lindström J., Lobbens S.,

Lorentzon M., Lu Y., Lyssenko V., Magnusson P. K. E., Mahajan A., Maillard M., McArdle W. L.,

McKenzie C. A., McLachlan S., McLaren P. J., Menni C., Merger S., Milani L., Moayyeri A., Monda K.

L., Morken M. A., Müller G., Müller-Nurasyid M., Musk A. W., Narisu N., Nauck M., Nolte I. M.,

Nöthen M. M., Oozageer L., Pilz S., Rayner N. W., Renstrom F., Robertson N. R., Rose L. M., Roussel

R., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Schunkert H., Scott R. A., Sehmi J.,

Seufferlein T., Shi J., Silventoinen K., Smit J. H., Smith A. V., Smolonska J., Stanton A. V., Stirrups K.,

Stott D. J., Stringham H. M., Sundström J., Swertz M. A., Syvänen A.-C., Tayo B. O., Thorleifsson G.,

Tyrer J. P., Dijk S. van, Schoor N. M. van, Velde N. van der, Heemst D. van, Oort F. V. A. van,

Vermeulen S. H., Verweij N., Vonk J. M., Waite L. L., Waldenberger M., Wennauer R., Wilkens L. R.,

Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., Arveiler D., Bakker

S. J. L., Beilby J., Bergman R. N., Bergmann S., Biffar R., Blangero J., Boomsma D. I., Bornstein S. R.,

Bovet P., Brambilla P., Brown M. J., Campbell H., Caulfield M. J., Chakravarti A., Collins R., Collins F.

S., Crawford D. C., Cupples L. A., Danesh J., Faire U. de, Ruijter H. M. den, Erbel R., Erdmann J.,

Eriksson J. G., Farrall M., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Gansevoort R.

T., Gejman P. V., Gieger C., Golay A., Gottesman O., Gudnason V., Gyllensten U., Haas D. W., Hall A.

S., Harris T. B., Hattersley A. T., Heath A. C., Hengstenberg C., Hicks A. A., Hindorff L. A., Hingorani

A. D., Hofman A., Hovingh G. K., Humphries S. E., Hunt S. C., Hypponen E., Jacobs K. B., Jarvelin

M.-R., Jousilahti P., Jula A. M., Kaprio J., Kastelein J. J. P., Kayser M., Kee F., Keinanen-Kiukaanniemi

S. M., Kiemeney L. A., Kooner J. S., Kooperberg C., Koskinen S., Kovacs P., Kraja A. T., Kumari M.,

Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lupoli S., Madden P. A. F.,

Männistö S., Manunta P., Marette A., Matise T. C., McKnight B., Meitinger T., Moll F. L., Montgomery

G. W., Morris A. D., Morris A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K.,

Ouwehand W. H., Pasterkamp G., Peters A., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T.,

Rankinen T., Rao D. C., Rice T. K., Ritchie M., Rudan I., Salomaa V., Samani N. J., Saramies J.,

Sarzynski M. A., Schwarz P. E. H., Sebert S., Sever P., Shuldiner A. R., Sinisalo J., Steinthorsdottir V.,

Stolk R. P., Tardif J.-C., Tönjes A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C., Electronic

Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE

Consortium, LifeLines Cohort Study, Amouyel P., Asselbergs F. W., Assimes T. L., Bochud M., Boehm

B. O., Boerwinkle E., Bottinger E. P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R.

S., Bakker P. I. W. de, Dedoussis G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A.,

Hamsten A., Hayes M. G., Hui J., Hunter D. J., Hveem K., Jukema J. W., Kaplan R. C., Kivimaki M.,

Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M., Moebus S., Munroe P. B., Njølstad I.,

Oostra B. A., Palmer C. N. A., Pedersen N. L., Perola M., Pérusse L., Peters U., Powell J. E., Power C.,

Quertermous T., Rauramaa R., Reinmaa E., Ridker P. M., Rivadeneira F., Rotter J. I., Saaristo T. E.,

Saleheen D., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D., Strauch K., Stumvoll M.,

Tuomilehto J., Uusitupa M., Harst P. van der, Völzke H., Walker M., Wareham N. J., Watkins H.,

Wichmann H.-E., Wilson J. F., Zanen P., Deloukas P., Heid I. M., Lindgren C. M., Mohlke K. L.,

Speliotes E. K., Thorsteinsdottir U., Barroso I., Fox C. S., North K. E., Strachan D. P., Beckmann J. S.,

Berndt S. I., Boehnke M., Borecki I. B., McCarthy M. I., Metspalu A., Stefansson K., Uitterlinden A. G.,

Duijn C. M. van, Franke L., Willer C. J., Price A. L., Lettre G., Loos R. J. F., Weedon M. N., Ingelsson

E., O’Connell J. R., Abecasis G. R., Chasman D. I., Goddard M. E., Visscher P. M., Hirschhorn J. N.,

Frayling T. M., 2014 Defining the role of common variation in the genomic and biological architecture

of adult human height. Nat. Genet. 46: 1173–1186.

Yu J., Pressoir G., Briggs W. H., Vroh Bi I., Yamasaki M., Doebley J. F., McMullen M. D., Gaut B. S., Nielsen D.

M., Holland J. B., Kresovich S., Buckler E. S., 2006 A unified mixed-model method for association

mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203–208.

Supplementary Figure Legends

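Figure S1. Co-ancestry curves for all pairs of surrogate populations identified in Zealand (Figure 3B; Table S2).

Figure S2. Co-ancestry curves for all pairs of surrogate populations identified in South Jutland (Figure 3B; Table S2).

Figure S3. Co-ancestry curves for all pairs of surrogate populations identified in Central Jutland (Figure 3B; Table S2).

Figure S4. Co-ancestry curves for all pairs of surrogate populations identified in North Jutland (Figure 3B; Table S2).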

Figure S5. Ancestral component analysis of 13 European countries (including six

well-defined geographic regions in Denmark shown in the inset) assuming K = {2, …,

9} ancestral populations.

Figure S6. Proportion of zero IBS sharing vs. kinship coefficient for 82,215 pairs of

individuals in a sample of 406 high school students who had all four of their

grandparents born in Denmark. Following KING’s guidelines, an estimated kinship

coefficient range (i) > 0.354, (ii) [0.177, 0.354], (iii) [0.0884, 0.177], (iv) [0.0442,

0.0884] and (v) [0, 0.0221] was considered to correspond to duplicate/MZ twin, 1st-

degree, 2nd-degree, 3rd-degree and 4th-degree (or further) relationships respectively.

The vast majority of pairs were either unrelated (negative kinship coefficient in the

grey-shaded part of the plot) or distantly related (kinship ∈ (0, 0.0221) in the red-

shaded part of the plot).
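
The kinship ranges listed in the Figure S6 legend can be sketched as a simple lookup. This is only an illustration, not KING's own code: the function name `classify_kinship` is hypothetical, shared interval endpoints are assigned to the closer relationship as an assumption, and values between 0.0221 and 0.0442 are reported as unassigned because the legend does not place them in any category. The pair count follows from choosing 2 individuals out of 406.

```python
from math import comb

def classify_kinship(phi: float) -> str:
    """Map an estimated kinship coefficient to the categories of Figure S6.

    Assumption: where two closed ranges share an endpoint, the value is
    assigned to the closer relationship.
    """
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi >= 0.177:
        return "1st-degree"
    if phi >= 0.0884:
        return "2nd-degree"
    if phi >= 0.0442:
        return "3rd-degree"
    if phi > 0.0221:
        # The legend leaves (0.0221, 0.0442) without a category.
        return "unassigned by the legend"
    if phi >= 0:
        return "4th-degree or further"
    # Negative estimates are treated as unrelated in the legend.
    return "unrelated"

# Number of distinct pairs among 406 individuals.
n_pairs = comb(406, 2)
print(n_pairs)  # 82215

for phi in (0.40, 0.25, 0.10, 0.05, 0.01, -0.02):
    print(phi, classify_kinship(phi))
```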

Figure S7. Prediction accuracy of LDpred for height and BMI when age, sex and

PCA information were (i) adjusted for (A and C), and (ii) included (B and D) in the

model.

Table 1. Summary of GLOBETROTTER’s inference of the date of admixture and composition of admixing sources for each of four geographic regions of Denmark

| Region | Best fitting type of admixture | Goodness of fit (R2) | Years before present [95% CI] | Year | Event #1 minor component | Event #1 major component | Event #1 ratio | Event #2 minor component | Event #2 major component | Event #2 ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| Zealand (N = 36) | 1 date multi-way | 0.208 | 943 [579, 1488] | 1052 | BRI | SWE | 0.47:0.53 | GER | POL | 0.47:0.53 |
| S.Jutland (N = 37) | 1 date multi-way | 0.136 | 756 [232, 1548] | 1239 | FRA | POL | 0.21:0.79 | GER | POL | 0.43:0.57 |
| C.Jutland (N = 89) | 1 date multi-way | 0.354 | 611 [418, 823] | 1384 | BRI | POL | 0.31:0.69 | POL | GER | 0.45:0.55 |
| N.Jutland (N = 48) | 1 date multi-way | 0.239 | 434 [118, 741] | 1561 | BRI | POL | 0.33:0.67 | GER | POL | 0.42:0.58 |

Type of admixture was chosen among three alternatives: (i) one date; (ii) one date

multi-way; (iii) two or more dates. Multi-way admixture involves two admixture

events. If a source population appears in both admixture events #1 and #2, then the

interpretation is “three-way admixture” (for more details see Busby et al. 2015). R2:

goodness of fit for a single date of admixture, taking the maximum value across all

inferred co-ancestry curves (R2 ≥ 0.2 corresponds to good fit). GLOBETROTTER

returns date estimates in generations. We multiplied these estimates by 30

years/generation. We retrieved historical dates by subtracting year estimates from

1995 (the median year of birth for our high school students). CI: confidence interval.
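
The date conversion described in the note above can be checked with a short sketch. The constants come from the note (30 years per generation, reference year 1995); the function names and the dictionary are hypothetical, and the years-before-present values are copied from Table 1 rather than recomputed from GLOBETROTTER output, which is not given here.

```python
YEARS_PER_GENERATION = 30   # stated in the table note
REFERENCE_YEAR = 1995       # median year of birth of the students

def generations_to_years(generations: float) -> float:
    """Convert a date estimate in generations to years before present."""
    return generations * YEARS_PER_GENERATION

def to_calendar_year(years_before_present: float) -> float:
    """Convert years before present to a historical (calendar) year."""
    return REFERENCE_YEAR - years_before_present

# Point estimates in years before present, as reported in Table 1.
years_bp = {
    "Zealand": 943,
    "S.Jutland": 756,
    "C.Jutland": 611,
    "N.Jutland": 434,
}

for region, ybp in years_bp.items():
    print(region, to_calendar_year(ybp))
```

Running this reproduces the "Year" column of Table 1 (1052, 1239, 1384 and 1561).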

Table 2. Percentage of membership of 13 European countries (including six well-defined geographic regions in Denmark shown in the inset) to each of K = 4 ancestral populations

| Country/region | % Southern (blue*) | % Eastern (yellow*) | % Central (red*) | % Nordic (green*) |
|---|---|---|---|---|
| PT | 72.43 | 8.59 | 12.15 | 6.83 |
| ES | 78.32 | 3.57 | 9.25 | 8.86 |
| FR | 45.70 | 15.19 | 19.23 | 19.88 |
| BE | 35.64 | 14.06 | 20.84 | 29.46 |
| GB | 30.08 | 32.28 | 15.20 | 22.45 |
| NL | 24.88 | 20.06 | 23.62 | 31.44 |
| DE | 23.10 | 17.22 | 22.40 | 37.28 |
| AT | 20.28 | 17.44 | 19.88 | 42.39 |
| HU | 29.41 | 49.76 | 11.83 | 9.00 |
| PL | 3.59 | 76.19 | 9.83 | 10.38 |
| Ca | 5.39 | 21.27 | 24.94 | 48.40 |
| Ze | 7.73 | 18.90 | 25.67 | 47.71 |
| Fu | 6.51 | 20.36 | 28.28 | 44.85 |
| SJ | 5.94 | 19.49 | 28.94 | 45.64 |
| CJ | 7.34 | 13.35 | 30.30 | 49.01 |
| NJ | 5.82 | 13.23 | 32.35 | 48.60 |
| NO | 8.56 | 13.24 | 30.91 | 47.29 |
| SE | 4.98 | 17.39 | 27.27 | 50.35 |

*This table complements Figure 4; country/region abbreviations are explained in

Figure 2.
