Nationwide Genomic Study in Denmark Reveals Remarkable
Population Homogeneity
Georgios Athanasiadis,*,†,1 Jade Y. Cheng,* Bjarni J. Vilhjálmsson,*,‡ Frank G.
Jørgensen,§ Thomas D. Als,**,††,‡‡ Stephanie Le Hellard,§§,*** Thomas Espeseth,†††,‡‡‡
Patrick F. Sullivan,§§§,****,†††† Christina M. Hultman,§§§ Peter C. Kjærgaard,†,‡‡‡‡,2
Mikkel H. Schierup*,†,§§§§ and Thomas Mailund,*,†
*Bioinformatics Research Centre, Aarhus University, Aarhus, 8000, Denmark
†Centre for Biocultural History, Aarhus University, Aarhus, 8000, Denmark
‡Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston,
MA 02115, USA
§Tørring Gymnasium, Tørring, 7160, Denmark
**Department of Biomedicine, Aarhus University, Aarhus, 8000, Denmark
††The Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University,
Aarhus, 8000, Denmark
‡‡Center for Integrative Sequencing, iSEQ, Aarhus University, Aarhus, 8000,
Denmark
§§Dr E. Martens Research Group of Biological Psychiatry, Center for Medical
Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, 5021,
Norway
Genetics: Early Online, published on August 17, 2016 as 10.1534/genetics.116.189241
Copyright 2016.
***NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical
Science, University of Bergen, Bergen, 5021, Norway
†††NORMENT, KG Jebsen Centre for Psychosis Research, Department of
Psychology, University of Oslo, Oslo, 0424, Norway
‡‡‡NORMENT, KG Jebsen Centre for Psychosis Research, Division of Mental Health
and Addiction, Oslo University Hospital, Oslo, 0424, Norway
§§§Department of Medical Epidemiology and Biostatistics, Karolinska Institutet,
Stockholm, 17177, Sweden
****Department of Genetics, University of North Carolina, Chapel Hill, NC 27599-7264, USA
††††Department of Psychiatry, University of North Carolina, Chapel Hill, NC 27599-7264, USA
‡‡‡‡Department of Culture and Society, Aarhus University, Aarhus, 8000, Denmark
§§§§Department of Bioscience, Aarhus University, Aarhus, 8000, Denmark
1Corresponding author: Bioinformatics Research Centre, Aarhus University, Aarhus,
8000, Denmark. E-mail: [email protected].
2Present address: The Natural History Museum of Denmark, University of
Copenhagen, Copenhagen, 1471, Denmark.
Running title
The population structure of Denmark
Key words
Human population genetics, population structure, admixture, polygenic risk score,
genetic prediction
Corresponding author
Name: Dr. Georgios Athanasiadis
Mailing address: Bioinformatics Research Centre, C.F. Møllers Allé, Building 1110,
Aarhus University, Aarhus, 8000, Denmark
Phone number: (+45) 87 15 55 69
E-mail: [email protected]
Abstract
Denmark has played a substantial role in the history of Northern Europe. Through a
nationwide scientific outreach initiative, we collected genetic and anthropometrical
data from ~800 high school students and used them to elucidate the genetic makeup
of the Danish population, as well as to assess polygenic predictions of phenotypic
traits in adolescents. We observed remarkable homogeneity across different
geographic regions, although we could still detect weak signals of genetic structure
reflecting the history of the country. Denmark showed genomic affinity primarily
with neighboring countries, with resemblance decreasing in the order Britain,
Sweden, Norway, Germany and France. A Polish admixture signal was
detected in Zealand and Funen and our date estimates coincided with historical
evidence of Wend settlements in the south of Denmark. We also observed
considerably diverse demographic histories among Scandinavian countries, with
Denmark having the smallest current effective population size compared to Norway
and Sweden. Finally, we found that polygenic prediction of self-reported adolescent
height in the population was remarkably accurate (R² = 0.639±0.015). The high
homogeneity of the Danish population could render population structure a lesser
concern for the upcoming large-scale gene-mapping studies in the country.
Introduction
Denmark has played a substantial role in the history of Northern Europe. Like Swedes
and Norwegians, Danes are historically linked to the Vikings – the Germanic Norse
seafarers whose commercial and military operations marked the Viking Age in
European history (793-1066). Through a series of invasions, conquests and alliances,
the Danish Vikings established settlements as far as in Britain, Estonia, the Faroe
Islands, Iceland, Greenland and even Canada. Denmark’s long-lasting historical bond
with Sweden and Norway is nicely exemplified in the formation of the Kalmar Union,
a state that brought together the three Scandinavian nations from 1397 to 1523 as a
response to Germany’s expansionist tendency to the North (Derry 2000).
Denmark’s historical complexity is contrasted by the country’s small current size –
both in terms of geographic area (~43,000 km²) and population (5.614 million
inhabitants) – a fact even further contrasted by the notable compartmentalization of
the Danish linguistic map into about 30 dialectal areas (http://dialekt.ku.dk/). These
observations, together with the evident lack of strong geographic barriers in the
country, make present-day Denmark an interesting setting for studying the genetic
history of small populations with a glorious past.
In recent years, there has been an explosion of human genetic studies that contributed
substantially to the characterization of worldwide variation patterns (Li et al. 2008;
Lao et al. 2008; Novembre et al. 2008; Reich et al. 2009; Busby et al. 2015); the
reconstruction of population history in regions with poor/nonexistent historical
records (Moreno-Estrada et al. 2014; Moreno-Mayar et al. 2014); the study of local
and global patterns of admixture in multiethnic societies (Bryc et al. 2015); and the
study of admixture with other hominin species as well as its use in elucidating human
dispersals (Green et al. 2010; Reich et al. 2011). The increased power of
high-throughput genotype data and computational methods has also boosted the emergence
of many single-country genomic projects in Europe (Jakkula et al. 2008; Price et al.
2009; Genome of the Netherlands Consortium 2014; Leslie et al. 2015; Karakachoff
et al. 2015) and the release of useful data to the public (Welter et al. 2014).
In this work, we extend the collection of single-country genetic studies by introducing
a new project from Denmark. Unlike previous genomic projects involving Denmark
(Bae et al. 2013; Larsen et al. 2013), ours was conceived from the beginning as a
scientific outreach initiative with benefits for both the general public and our research
objectives. We invited high school students from across Denmark to participate in
activities whose primary goal was to promote genomic literacy in secondary
education (Athanasiadis et al. 2016). Most participants donated a DNA sample, which
we used to explore the extent to which recent and more distant historical events left
their mark on the genetic makeup of the Danish population. Apart from the population
structure and demographic history in Denmark, an additional focus of this work has
been the quantitative study of basic anthropometrical traits.
With this work, we ultimately report back to our DNA donors the biological and
historical insights we gained from analyzing their genetic data. Our results showed
remarkable homogeneity across different geographic regions in Denmark, though we
were still able to detect weak signals of genetic structure. Denmark received
substantial admixture contributions primarily from neighboring countries in the past
ten centuries with overall influence of decreasing weight from Britain, Sweden and
Norway. We also observed considerably diverse demographic histories among the
Scandinavian countries, as reflected in their historical effective population size.
Finally, we found that human height can be predicted with remarkable accuracy in
Danish adolescents. Overall, this work showcases how far we can go with the analysis
of genetic data from small, fairly homogeneous populations.
Materials and Methods
Sample description
New data:- We recruited ~800 students from 36 high schools from across Denmark
under the Where Are You From? project (Athanasiadis et al. 2016). These numbers
correspond to the sampling of approximately one in 9,350 Danish inhabitants (Figure
1). We asked each participant to provide a saliva sample for DNA analysis and to
answer an online questionnaire about family origin (their own, their parents’ and their
grandparents’ place of birth) and basic anthropometrical data (e.g. self-reported height
and weight). The institutional review board of Aarhus University approved the study.
Because no health-related questions were asked, there was no requirement for
additional approval by the University’s medical ethics committee. Informed consent
was obtained either from the participants (age > 18 years) or their parents (age < 18
years). We used the 23andMe (Mountain View, CA, USA) DNA analysis service for
the genotyping of 723 participants. 23andMe uses a custom HumanOmniExpress-24
BeadChip from Illumina (San Diego, CA, USA). Genetic and phenotypic data have
been deposited at the European Genome-Phenome Archive (EGA;
https://ega-archive.org) under accession number EGAS00001001868. After excluding duplicate
single nucleotide polymorphisms (SNPs), applying a per-locus missingness threshold
of 2% with PLINK v1.9 (Chang et al. 2015) and removing SNPs ambiguously
mapped to the forward DNA strand, 517,403 unique autosomal SNPs were available
for further analysis.
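The per-locus missingness filter described above can be illustrated with a minimal sketch. It assumes genotypes are stored as a plain list of rows (individuals) by columns (SNPs), with `None` marking a missing call; the function names and toy data are hypothetical, and this is not the PLINK implementation, only the same idea of dropping SNPs whose missing-call rate exceeds 2%.

```python
# Genotypes are coded as 0/1/2 allele copies; None marks a missing call.
# Rows are individuals, columns are SNPs (toy data, not from the study).

def snp_missingness(genotypes):
    """Return the fraction of missing calls for each SNP (column)."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(1 for row in genotypes if row[j] is None) / n_individuals
        for j in range(n_snps)
    ]

def filter_snps(genotypes, max_missing=0.02):
    """Return the column indices whose missingness does not exceed max_missing."""
    rates = snp_missingness(genotypes)
    return [j for j, rate in enumerate(rates) if rate <= max_missing]

# 100 individuals, 3 SNPs: SNP 0 is complete, SNP 1 has 2% missing calls,
# SNP 2 has 5% missing calls.
toy = [[0, 1, 2] for _ in range(100)]
for i in range(2):
    toy[i][1] = None
for i in range(5):
    toy[i][2] = None

kept = filter_snps(toy)
print(kept)
```

In this toy example only the third SNP exceeds the threshold and is dropped.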
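Figure 1. Location of the 36 Danish high schools participating in the Where Are You From? project colored by
geographic origin. Number of samples with at least three grandparents born in each of the six regions is shown in
brackets: Capital Region (14), Zealand (36), Funen (13), South Jutland (37), Central Jutland (89), North Jutland (48);
total (237).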
Additional data:- To put our study in a broader European context, we included four
additional datasets: (i) 2,833 individuals × 227,899 SNPs from the POpulation
REference Sample (POPRES) (Nelson et al. 2008); (ii) 494 individuals × 314,526
SNPs from Amyotrophic Lateral Sclerosis (ALS) Finland (Laaksovirta et al. 2010);
(iii) 381 individuals × 577,252 SNPs from the Swedish Schizophrenia Study (Ripke et
al. 2013); and (iv) 300 individuals × 537,306 SNPs from the Norwegian Cognitive
NeuroGenetics (NCNG) sample (Espeseth et al. 2012). We applied separate quality
controls to these datasets according to their particularities (protocols available upon
request).
Imputation
We carried out genotype imputation within each dataset separately before combining
them. We first changed DNA strand orientation of several SNPs in all five datasets to
create a uniform forward orientation using SNPFLIP
(https://github.com/endrebak/snp-flip) and PLINK. Table S1 shows the number of
SNPs that were flipped/removed from each dataset. We then used liftOver from the
UCSC Genome Browser to “lift” genome coordinates from the NCBI36/hg18 (March
2006) to the GRCh37/hg19 (February 2009) assembly in the POPRES, ALS Finland
and NCNG datasets. We produced “prephased” haplotypes for each dataset with
SHAPEIT v2.720 (Delaneau et al. 2012) and used the haplotypes together with the
latest 1000 Genomes Phase 3 reference panel (b37, October 2014) for the separate
imputation of the five datasets with IMPUTE2 v.2.3.1 (Howie et al. 2012). Finally,
we concatenated the imputed data into separate chromosomes and filtered them for
“info” ≥ 0.975 with QCTOOL (http://www.well.ox.ac.uk/~gav/qctool/#overview).
Principal component analysis
We ran principal component analysis (PCA) with PLINK to explore population
structure in our Danish sample. PCA was run on two different data combinations: (i)
POPRES, Where Are You From?, NCNG and the Swedish Schizophrenia Study; and
(ii) Where Are You From?, NCNG, the Swedish Schizophrenia Study and Germany
from POPRES. To avoid undesirable clustering due to extensive linkage
disequilibrium (LD), we thinned the genotypes with PLINK (using a window and step
size of 1,500 and 150 SNPs, respectively, and r² threshold = 0.80) and removed SNPs
from known high-LD genomic regions (e.g. MHC on chromosome 6).
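To make the idea of LD-based thinning concrete, the following is a simplified greedy sketch, not PLINK's actual sliding-window procedure (it ignores the step size and uses a tiny window). It assumes each SNP is a vector of 0/1/2 genotypes in genomic order; `ld_thin` and the toy vectors are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy / (sxx * syy) ** 0.5

def ld_thin(snps, window=3, r2_max=0.80):
    """Greedy thinning: keep a SNP unless its r^2 with an already kept SNP
    located fewer than `window` positions upstream exceeds r2_max."""
    kept = []
    for j, genotypes in enumerate(snps):
        recent = [k for k in kept if j - k < window]
        if all(pearson_r(genotypes, snps[k]) ** 2 <= r2_max for k in recent):
            kept.append(j)
    return kept

# Toy data: SNP 1 duplicates SNP 0 (r^2 = 1); SNP 2 is nearly independent.
toy_snps = [
    [0, 0, 1, 1, 2, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
kept_snps = ld_thin(toy_snps, window=3, r2_max=0.80)
print(kept_snps)
```

The duplicated SNP is removed, while the weakly correlated one is retained.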
Chromosome painting, population clustering and admixture
We used a set of LD-based methods – CHROMOPAINTER (Lawson et al. 2012),
fineSTRUCTURE (Lawson et al. 2012) and GLOBETROTTER (Hellenthal et al.
2014) – to explore fine-grain population structure and admixture in Denmark. These
methods require a set of phased SNP data from “donor” (i.e. the available European
samples) and “recipient” populations (i.e. the Danish sample). In brief, the methods
detect extended multi-marker haplotypes across the genome, which are organized in
pairwise vectors of similarity counts (CHROMOPAINTER). These vectors are then
used by an MCMC algorithm (fineSTRUCTURE) to hierarchically cluster individuals
into groups that are often geographically, linguistically and/or historically meaningful
(Leslie et al. 2015). Admixture proportions are estimated through multiple linear
regression on the average proportion of DNA that each recipient copies from each of
the donor groups (GLOBETROTTER). Nonparametric standard errors of admixture
proportions were calculated with a jackknife procedure described elsewhere
(Montinaro et al. 2015). Finally, GLOBETROTTER also measures the decay of
association vs. genetic distance between the “chromosome chunks” copied from a
given pair of donor groups. This decay is exponential, with rate equal to
the time (in generations) since the admixture occurred (Hellenthal et al. 2014; Busby et al. 2015).
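The link between the decay rate and the admixture date can be sketched as follows. This is an illustration only, assuming a noise-free curve of the form A·exp(−λd) with genetic distance d in Morgans and no baseline term; GLOBETROTTER's actual curve fitting is more elaborate, and `fit_decay_rate` is a hypothetical helper.

```python
import math

def fit_decay_rate(distances_morgans, signal):
    """Estimate the exponential decay rate by least squares on log(signal).

    Under signal = A * exp(-rate * d), log(signal) is linear in d with
    slope -rate, so the fitted rate is the negated regression slope."""
    logs = [math.log(s) for s in signal]
    n = len(distances_morgans)
    mean_d = sum(distances_morgans) / n
    mean_l = sum(logs) / n
    sxy = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -sxy / sxx

# Toy curve simulated for an admixture event 20 generations ago.
true_generations = 20.0
distances = [0.01 * k for k in range(1, 51)]  # 1 to 50 cM, in Morgans
signal = [0.3 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_decay_rate(distances, signal)
print(round(estimated_generations, 3))
```

With real data the curve is noisy, so the estimate would carry uncertainty that this sketch does not model.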
After jointly phasing 489,209 imputed autosomal SNPs across all five datasets with
IMPUTE2, we ran CHROMOPAINTER using default options (Leslie et al. 2015) on
three different datasets: (i) Denmark alone; (ii) Europe without Denmark; and (iii)
Europe and Denmark together. Beforehand, we ran each analysis ten times on a sample
subset (~10% of the total number of samples and only for chromosomes 4, 10, 15, and
22) to estimate the switch and global emission rates used by CHROMOPAINTER’s
Hidden Markov Model. Once similarity vectors were defined in the three datasets, we
used fineSTRUCTURE to explore clustering in recipient (dataset i) and donor (dataset
ii) populations. For the GLOBETROTTER analysis, we merged the similarity
matrices from datasets ii and iii and ran the program with default options described
elsewhere (Leslie et al. 2015; Busby et al. 2015).
Ancestry component analysis
We used Ohana (https://github.com/jade-cheng/ohana) to estimate individual
admixture proportions in 13 European countries – mostly Western and Northern –
including Denmark. In brief, Ohana is a new model-based method that uses the same
likelihood admixture model as do STRUCTURE (Pritchard et al. 2000), FRAPPE
(Tang et al. 2005) and ADMIXTURE (Alexander et al. 2009), and Newton’s method
for optimization. Compared to established methods, Ohana achieves better likelihood
estimates (R. Nielsen, personal communication). After running the algorithm, we
reported per-country admixture proportions by averaging individual proportions
within each country.
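The per-country averaging step can be sketched in a few lines. The example assumes that individual admixture proportions are available as one row per individual (each row summing to 1) alongside a country label; the function name and toy values are hypothetical and unrelated to Ohana's own output format.

```python
from collections import defaultdict

def mean_proportions_by_group(labels, q_rows):
    """Average individual admixture proportions within each group label."""
    sums = {}
    counts = defaultdict(int)
    for label, row in zip(labels, q_rows):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in component_sums]
        for label, component_sums in sums.items()
    }

# Toy data: three individuals, K = 2 ancestry components.
labels = ["DK", "DK", "SE"]
q_rows = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]

per_country = mean_proportions_by_group(labels, q_rows)
print(per_country)
```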
Relatedness and identity by descent
To explore relatedness in our Danish sample, we used two different methods: KING
(Manichaikul et al. 2010) for a formal calculation of pairwise kinship coefficients and
BEAGLE Refined IBD (Browning and Browning 2013a) for the inference of DNA
segments that were identical by descent (IBD) between pairs of individuals. We
analyzed 406 individuals for whom we had complete information that all four of their
grandparents were born in Denmark. We excluded from the analysis one individual
from pairs of twins and siblings (preferentially the one with highest per-locus
genotype missingness).
Historical effective population size
We also estimated the historical effective population size (Ne) of the Danish, Swedish
and Norwegian populations through the combination of two IBD-based methods. We
first applied IBDseq (Browning and Browning 2013b) to the three Scandinavian
datasets separately to produce sets of pairwise IBD tracts. We then used IBDNe
(Browning and Browning 2015) to estimate Ne from the distribution of the inferred
IBD tracts over the past 150 generations for each of the three Scandinavian
populations. To maximize power, we used the original SNP data for this analysis.
Note that, when SNP data are used, IBDNe estimates are most reliable within the past
50 generations.
Polygenic prediction of height and BMI
We used self-reported height and weight from ~600 students of diverse ethnic
backgrounds to perform polygenic risk prediction of height and body mass index
(BMI) with LDpred (Vilhjálmsson et al. 2015), a summary statistic-based algorithm
that models LD to improve the prediction. As training data for the model, we used
public summary statistics from large genome-wide association studies of adult height
(Wood et al. 2014) and BMI (Locke et al. 2015). We first removed from our data
SNPs with minor allele frequency (MAF) < 0.01, as well as SNPs with a MAF
different from the one reported in the training data by a factor of 0.15. We then
assessed SNP effect under different fractions of causal variants: p = {1, 0.5, 0.2, 0.1,
0.05}, whereby p = 1 corresponds to the infinitesimal model (Visscher et al. 2008).
As an LD reference, we used a subset of 407 unrelated individuals (kinship
coefficient < 0.05) who had all four of their grandparents born in Denmark. Finally,
we validated LDpred’s prediction of height in 578 individuals, as well as the
prediction of BMI in 572 individuals. We calculated R² – the proportion of
phenotypic variance explained by the predictor – by (i) adjusting for age, sex and the
first ten PCs, and (ii) actually including age, sex and the first ten PCs in the model.
PC adjustment ensures that genetic predictions are not influenced by ancestry.
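One possible reading of approach (i) is to regress the phenotype on the covariates, take the residuals, and report the squared correlation between those residuals and the polygenic score. The sketch below illustrates that reading with a single covariate (sex) instead of age, sex and ten PCs; all names and values are hypothetical toy data, not the study's.

```python
def mean(values):
    return sum(values) / len(values)

def residualize(y, x):
    """Residuals of the simple linear regression of y on one covariate x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Toy data: height depends on sex, on the polygenic score and on noise.
sex = [0, 0, 0, 0, 1, 1, 1, 1]
score = [-1.5, -0.5, 0.5, 1.5, -1.5, -0.5, 0.5, 1.5]
noise = [1, -1, -1, 1, -1, 1, 1, -1]
height = [165 + 13 * s + 5 * p + e for s, p, e in zip(sex, score, noise)]

r2_raw = r_squared(score, height)                         # no adjustment
r2_adjusted = r_squared(score, residualize(height, sex))  # sex regressed out
print(round(r2_raw, 3), round(r2_adjusted, 3))
```

Removing the variance explained by the covariate raises the share attributable to the score in this toy case. Approach (ii) would instead report the R² of a model containing both the score and the covariates.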
Results
Principal component analysis
To put Danes in a European genetic context, we first ran PCA on 3,858 samples from
across Europe. In particular, we extended a previously published PCA within Europe
(Novembre et al. 2008) to include sizeable samples from Denmark, Norway and
Sweden, as these were underrepresented in the original study. In our PCA, Denmark
was represented by 407 students who had all four of their grandparents born in the
country. Our Danish samples clustered in a geographically meaningful manner along
the first two principal axes, partially overlapping with Norwegians and Swedes, and
showing close genetic proximity to samples from Great Britain, the Netherlands,
Germany and Poland (Figure 2A).
Figure 2. (A) PCA of 105,672 imputed SNPs after merging four datasets: Where Are You From?, POPRES,
NCNG and the Swedish Schizophrenia study without outliers clustering with Finland (total N = 3,858).
Per-country box plots (median and interquartile range) of PC values were added to facilitate interpretation. Whiskers
represent data within 1.5 times the interquartile range. IE: Ireland; ES: Spain; PT: Portugal; GB: Great Britain; FR:
France; BE: Belgium; CH: Switzerland; NL: the Netherlands; DK: Denmark; DE: Germany; NO: Norway; SE:
Sweden; AT: Austria; IT: Italy; PL: Poland; HU: Hungary; CZ: Czech Republic; HR: Croatia; RO: Romania; YU:
Yugoslavia; GR: Greece. (B) PCA of 105,672 imputed SNPs from Denmark, Sweden, Norway and Germany with
emphasis on the six geographic regions of Denmark (total N = 1,168). No clear genetic-geographic relationship
was observed. Ca: Capital; Ze: Zealand; Fu: Funen; SJ: South Jutland; CJ: Central Jutland; NJ: North Jutland. (C)
Correlation of PC1 and PC2 with average grandparent place-of-birth latitude along a 360° clockwise rotation.
Maximum correlation was observed for PC1 at 32° (r ≈ 0.24; p < 10⁻⁵).
[Figure 2 plot omitted: panels A and B show PC1 against PC2; panel C shows the correlation coefficient of PC1 and PC2 against rotation angle (°).]
We then looked for fine-scale genetic structure within Denmark by focusing the
analysis on 237 students who had at least three of their grandparents born in just one
of the following six regions: Capital Region; Zealand; Funen; South, Central and
North Jutland (Figure 2B). These six groups correspond to the five administrative
regions of Denmark (we further split the administrative region of South Denmark into
South Jutland and Funen). After rerunning PCA for Denmark, Sweden, Norway and
Germany alone (N = 1,168), we observed no geographically meaningful clustering of
the 237 Danish samples (Figure 2B). This lack of strong genetic structure was also
supported by the low average FST value between the six regions (FST = 0.0002). To
check whether structure was simply too weak to be visually detected, we calculated
for each Danish sample the average geographic coordinates of their grandparents’
place of birth and regressed the resulting values on PC1 and PC2 eigenvectors. We
repeated the procedure by gradually rotating the map clockwise and found a weak yet
significant correlation between PC1 and latitude along a northwest-southeast axis at
32° (r ≈ 0.24; p < 10⁻⁵; Figure 2B-C).
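The rotation scan can be sketched as below. The example treats coordinates as planar (a simplification for illustration), rotates the map clockwise in 1° steps, and correlates the rotated "latitude" with a principal component. The helper names are hypothetical and the toy PC1 is built to align with the 32° axis by construction, so the scan should simply recover that angle.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def rotated_latitude(x, y, degrees):
    """North-south coordinate of (x, y) after a clockwise map rotation."""
    t = math.radians(degrees)
    return -x * math.sin(t) + y * math.cos(t)

def best_rotation(xs, ys, pc):
    """Scan 0..359 degrees; return the angle maximizing corr(latitude, pc)."""
    best_angle, best_r = None, -2.0
    for angle in range(360):
        lat = [rotated_latitude(x, y, angle) for x, y in zip(xs, ys)]
        r = pearson_r(lat, pc)
        if r > best_r:
            best_angle, best_r = angle, r
    return best_angle, best_r

# Toy 5 x 5 grid of birthplaces; PC1 is constructed along the 32-degree axis.
xs = [float(i) for i in range(-2, 3) for _ in range(5)]
ys = [float(j) for _ in range(5) for j in range(-2, 3)]
pc1 = [rotated_latitude(x, y, 32) for x, y in zip(xs, ys)]

angle, r = best_rotation(xs, ys, pc1)
print(angle, round(r, 3))
```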
Chromosome painting, population clustering and admixture
To further explore historical genetic interactions between Denmark and neighboring
countries, we ran CHROMOPAINTER, fineSTRUCTURE and GLOBETROTTER
on a subset of the studied European populations. We used 2,745 individuals from 13
European countries (Norway, Sweden, Finland, the Netherlands, Germany, Poland,
Austria, Hungary, France, Belgium, Great Britain, Spain and Portugal) and the 237
individuals from the six geographic regions in Denmark (Figure 1) to quantify and
date European admixture in each of the six Danish groups. We used these six groups
for our analysis because we were unable to observe an alternative clustering:
fineSTRUCTURE clustered all Danish samples in a single large group (data not
shown), reflecting again the weak genetic structure in the Danish population.
After running CHROMOPAINTER and fineSTRUCTURE on the European donor
samples, these were organized in the eight major clusters seen in Figure 3A:
Norwegian (NOR), Swedish (SWE), Finnish (FIN), British (BRI), French (FRA),
German (GER), Polish (POL) and Iberian (IBE). There was always one predominant
country in the makeup of each of those clusters, with the exception of IBE, in which
Spain and Portugal were present in almost equal proportions, and GER, in which
samples from Austria, Hungary and the Netherlands were also present in large
numbers. For convenience, we named the eight clusters after the predominant
country/region in each of them. We treated these clusters as ancestry components
(“surrogate populations” in GLOBETROTTER jargon) and used GLOBETROTTER
to define their contribution to each of the six Danish regions and to date the
corresponding admixture events.
Figure 3. (A) fineSTRUCTURE grouping of the 2,745 European donor samples into eight clusters roughly
corresponding to well-defined geographic locations. The tree on the left illustrates cluster topology. Pie charts on
the European map show the relative contribution from each country to each of the eight inferred clusters. FIN:
Finnish; NOR: Norwegian; SWE: Swedish; POL: Polish; GER: German; BRI: British; FRA: French; IBE: Iberian.
Color legend at the bottom of the map shows different donor countries. FI: Finland. (B) GLOBETROTTER
admixture proportions of each of the eight European clusters in the six geographic regions of Denmark. Neither
FIN nor IBE made substantial contributions (< 2.5%) to the mixture profiles of Denmark.
Figure 3B and Table S2 show that the Swedish, Norwegian and British clusters made
the most substantial contribution to the ancestry profiles of all six Danish groups,
jointly accounting for 82.05-97.12% of the total admixture. The Scandinavian
component (Swedish and Norwegian clusters together) surprisingly accounted for less
than half of the total admixture (range: 41.69-48.96%), on a par with the British
component (40.37-48.16%), which peaked in South Jutland. Interestingly, the
contribution of the Swedish cluster alone (27.45-30.20%) was almost twice as large as
that of the Norwegian cluster (14.24-18.76%). This difference could be explained by
the reduced landscape connectivity between Norway and Denmark, affecting gene
flow patterns. It is also striking that the German cluster had little genetic influence on
Denmark (3.04-6.78%), despite the proximity and historically fluid borders between
the two countries. However, the latter observation could be due to the marginally
higher genetic affinity of Denmark with Britain, which could result in prioritizing
British over German samples in the initial chromosome-painting step of our analysis.
Similarly, the French component was present at small yet considerable portions (4.16-
5.42%) in all Danish regions except for Funen and South Jutland. Finally, it is worth
noting that there was a small Polish contribution to Zealand (6.28%) and Funen
(5.03%).
We further used GLOBETROTTER in a preliminary bootstrap re-sampling step
within each of the six Danish regions to produce admixture dates. A proportion of
nonsensical dates (i.e. ≤ 1 generation or ≥ 400 generations) above the empirical
p-value threshold (0.05) suggests that GLOBETROTTER cannot reliably date the
corresponding admixture event. Following this procedure, we found that we did not
have enough power to date admixture in the Capital region or in Funen (data not
shown). For the remaining four regions, we used GLOBETROTTER’s pairwise
co-ancestry curves (Figures S1-4) to define in more detail the type (i.e. one date vs. one
date multi-way vs. two dates) and actual date of admixture, as well as a tentative
composition of the admixing source populations. In all regions, GLOBETROTTER’s
best fitting admixture model (R² = 0.136-0.354) involved one admixture event with
more than two sources admixing simultaneously (Table 1). Assuming 30 years per
generation, the date of admixture ranged from ~434 to ~943 years before present.
Notably, the algorithm identified two clusters as the most recurring admixing sources:
one showing highest affinity with GER and one with POL.
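As a small worked conversion under the 30-years-per-generation assumption stated above, the reported range in years can be mapped back to generations. The function name is hypothetical and the rounding is for display only.

```python
YEARS_PER_GENERATION = 30  # generation time assumed in the text

def years_to_generations(years):
    """Convert a date in years before present to generations."""
    return years / YEARS_PER_GENERATION

# Reported range of admixture dates, in years before present.
for years in (434, 943):
    print(years, "years ~", round(years_to_generations(years), 1), "generations")
```

Under this assumption the reported dates correspond to roughly 14.5 to 31.4 generations.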
Ancestry component analysis
We also used Ohana to provide an independent estimate of individual admixture
proportions in the same set of 12 European countries (without Finland) and six Danish
regions. For this purpose, we assumed that each of the European samples was the
result of admixture between K ancestral populations (i.e. ancestry components). Even
though we ran the analysis for K = {2, …, 9} (Figure S5), we report the results for K
= 4, because this value corresponds to four distinct geographic origins of our samples:
the Iberian Peninsula; Eastern, Central and Nordic Europe. The analysis returned an
admixture pattern that was consistent with geography (i.e. North-South and East-West
clines; Table 2; Figure 4). In particular, we observed (i) a Southern European
component (blue), which was predominant in the Iberian peninsula but was also found
at large proportions in France; (ii) an Eastern European component (yellow), which
was predominant in Poland but was also found at large proportions in neighboring
countries (and notably in South and East Denmark, Sweden, Norway and Great
Britain); (iii) a Nordic component (green), which was predominant in Scandinavia
and to a lesser extent in Germany; and (iv) a Central European component (red), at higher
proportions in Scandinavia and the Netherlands but also present in most European
countries (except for Iberia). Within Denmark, we observed that Zealand and South
Denmark had larger membership to the Eastern European component (18.90-20.36%)
compared to Central and North Jutland (~13%), resembling GLOBETROTTER’s
signal of Polish admixture observed in Zealand and Funen (Figure 3B; Table S2). The
Southern and Eastern European components were robust to different choices of K,
whereas the Central and Nordic European components were gradually divided into
smaller fractions as K went up, showing no clear geographic patterns (Figure S5).
Figure 4. Ancestral component analysis of 13 European countries (including six well-defined geographic regions in
Denmark shown in the inset) assuming K = 4 ancestral populations. Bar plot at the bottom shows per-individual
membership to each of the four ancestral components, whereas pie charts on the map summarize per-country (or
per-region for Denmark) admixture proportions. Based on their preponderance in different parts of Europe, we
interpret the four components as (i) Southern European (blue); (ii) Eastern European (yellow); (iii) Nordic (green);
and (iv) Central European (red). Regions from South and East Denmark show a higher proportion of Eastern European
ancestry in accord with Figure 3B and Table S2.
Relatedness and identity by descent
We followed up the evidence for weak genetic structure in Denmark by exploring the
degree and geographic distribution of relatedness among our Danish samples. Using
KING, we found that the vast majority of the Danish students were distantly related
(4th degree or more distant), with only four pairs of individuals showing 2nd or 3rd
degree relationships (apart from the twins and siblings initially removed; Figure S6).
Because of the inherent uncertainty in distinguishing between different degrees of
distantly related individuals, we did not attempt to further stratify the samples by
kinship coefficient.
After establishing that most participants in our sample were either unrelated or
distantly related (Figure S6), we examined DNA segments that were IBD between
pairs of individuals with BEAGLE Refined IBD. Using total genomic length of IBD
tracts as a proxy for relatedness, we traced each individual’s closest “genomic
relative” within Denmark without explicitly quantifying the degree of relationship.
Figure 5A shows the distribution of the geographic distance of each of 399
individuals from their closest genomic relative (pink) and from a randomly chosen
sample (green). The distribution of the distance from the closest relative (pink)
showed an enrichment for very short distances, i.e. less than ~50 km, as well as a
median value of 99.3 km – significantly closer than expected by chance at 131.4 km
(Mann-Whitney U = 60,997; p < 10⁻⁸). This points to a weak yet significant tendency
of participants to live close to genetically similar individuals.
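The "closest genomic relative" idea can be sketched as follows: for each individual, pick the partner with the largest total shared IBD length and compute the great-circle distance between their birthplaces. The sample IDs, IBD totals and helper names are hypothetical, and the coordinates are approximate values for four Danish cities used purely as toy input. The sketch omits the comparison with a random partner and the Mann-Whitney test.

```python
import math
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def closest_relatives(ibd_totals):
    """Map each individual to the partner sharing the most total IBD.

    ibd_totals maps (a, b) pairs to total shared IBD length (e.g. in cM)."""
    best = {}
    for (a, b), length in ibd_totals.items():
        for who, other in ((a, b), (b, a)):
            if who not in best or length > best[who][1]:
                best[who] = (other, length)
    return {who: other for who, (other, _) in best.items()}

# Approximate birthplace coordinates (toy input).
coords = {
    "s1": (56.16, 10.20),  # Aarhus
    "s2": (57.05, 9.92),   # Aalborg
    "s3": (55.40, 10.39),  # Odense
    "s4": (55.68, 12.57),  # Copenhagen
}
ibd_totals = {
    ("s1", "s2"): 40.0, ("s1", "s3"): 25.0, ("s1", "s4"): 10.0,
    ("s2", "s3"): 12.0, ("s2", "s4"): 8.0, ("s3", "s4"): 30.0,
}

relatives = closest_relatives(ibd_totals)
distances = {
    who: haversine_km(*coords[who], *coords[other])
    for who, other in relatives.items()
}
print(relatives)
print(round(median(distances.values()), 1))
```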
Figure 5. (A) Distribution of geographic distance of each participant’s place of birth (N = 399) to that of their
closest “genomic relative” (pink), and to that of a randomly chosen sample (green). Genomic relatedness was
defined on the grounds of total genomic IBD. Arrows point at median values of the two distributions (median_rltv =
99.3 km; median_rand = 131.4 km). Seven individuals with unknown geographic coordinates were excluded from
this analysis. (B) Plot of rank of genomic relatedness vs. median geographic distance of each participant to their
closest genomic relative. We created 57 equally sized bins of individuals increasingly related to their closest
genomic relative (seven individuals per bin). Alternative bin sizes also produced significantly negative correlations
(data not shown).
To gain more insight into the latter observation, we grouped our Danish samples into
ranked bins by the amount of total genomic IBD shared with their closest genomic
relative (the higher the rank, the closer the relationship). We then calculated the
median geographic distance of each participant to their closest relative within each
bin and regressed this value against bin rank (Figure 5B), observing a weak yet
significant negative correlation (r ≈ -0.35; p < 0.01). This observation indicates that
geographic distance tends to be significantly shorter between Danes who share more
genomic IBD. Interestingly, this signal was not picked up by the kinship coefficient (data
not shown).
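The binning procedure can be sketched as follows. This is a minimal illustration that assumes each participant is summarised by the total IBD shared with the closest genomic relative and the distance to that relative; the synthetic records, function names and the choice to drop an incomplete trailing bin are assumptions, not the authors' implementation.

```python
import math
from statistics import median


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def binned_medians(records, bin_size=7):
    """records: (total IBD with closest relative, distance in km) pairs.

    Returns (bin ranks, median distance per bin); a higher rank means more
    shared IBD. An incomplete trailing bin, if any, is dropped.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    n_bins = len(ordered) // bin_size
    bins = [ordered[i * bin_size:(i + 1) * bin_size] for i in range(n_bins)]
    medians = [median(dist for _, dist in b) for b in bins]
    return list(range(1, n_bins + 1)), medians


def correlation_t(r, n):
    """t statistic for a Pearson correlation over n points (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Synthetic, deterministic records (not study data): distance falls with IBD.
records = [(float(i), 300.0 - 0.5 * i) for i in range(399)]
ranks, medians = binned_medians(records)
r = pearson_r(ranks, medians)
print(len(ranks), round(r, 3), round(correlation_t(-0.35, 57), 2))
```

With 399 participants and seven per bin this yields 57 bins. For r = -0.35 over 57 bins, the t statistic is about -2.77 on 55 degrees of freedom, whose magnitude exceeds the two-sided 1% critical value of roughly 2.67, in line with the reported p < 0.01.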
[Figure 5 panels. (A) x-axis: Distance (km); y-axis: Density; series: closest relative, random sample; arrows mark median_rltv and median_rand. (B) x-axis: Rank of genomic relatedness; y-axis: Median geographic distance (km).]
Historical effective population size
Historical Ne differed notably between the three Scandinavian countries
(Figure 6). In particular, point estimates of Ne in Denmark, Norway and Sweden
showed a dramatic ~273-fold, ~262-fold and ~995-fold increase, respectively, over the
past 150 generations (~4,500±300 years), following the general upward European trend
(Browning and Browning 2015). What is more, Sweden’s and Denmark’s Ne
estimates were significantly different from each other (no overlap between the
corresponding 95% confidence intervals). During the High and Late Middle Ages, Ne
in Denmark remained stable, suffering an almost imperceptible decline as a
consequence of the otherwise devastating Black Death; similar patterns were
observed for Norway and Sweden. From the 15th century on, Ne in Denmark started
to rise again at a moderate rate, whereas Ne in Norway and Sweden rose at an even
higher rate, resulting in Norway’s Ne eventually surpassing that of Denmark.
Interestingly, even though Denmark and Norway currently have very similar census
population sizes (5.614 and 5.084 million, respectively), Norway’s current Ne is 1.76
times larger than Denmark’s (although the difference is not significant). Finally, the
upward trend of Denmark’s Ne did not seem to be held back by other important
epidemics in the history of the country, such as cholera, the Spanish flu or, more
recently, polio (Figure 6).
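The conversion from generations to years used above (150 generations ≈ 4,500±300 years, with a generation length of 30±2 years) amounts to simple interval arithmetic, sketched below. The reference year of 2000 and the function names are assumptions made for illustration; negative years denote years BC, ignoring the absence of a year zero.

```python
def years_before_present(generations_ago, gen_years=30, gen_tol=2):
    """Return (central estimate, half-width) in years."""
    return generations_ago * gen_years, generations_ago * gen_tol


def calendar_interval(generations_ago, reference_year=2000,
                      gen_years=30, gen_tol=2):
    """Return (oldest, youngest) calendar year for a generation count.

    reference_year is an assumed anchor for "the present".
    """
    oldest = reference_year - generations_ago * (gen_years + gen_tol)
    youngest = reference_year - generations_ago * (gen_years - gen_tol)
    return oldest, youngest


print(years_before_present(150))   # (4500, 300)
print(calendar_interval(150))      # (-2800, -2200)
```

The uncertainty grows linearly with the number of generations, which is why older time points span wider year intervals than recent ones.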
Figure 6. Change in effective population size (Ne) of the Danish, Swedish and Norwegian populations over the
past 150 generations (log-scale). Shaded areas represent the upper and lower bounds of the 95% confidence
intervals after bootstrapping. Uncertainty in generation length is represented by year intervals on the x-axis,
assuming that each generation lasts 30±2 years. Black segments represent major epidemics from the recent history
of the Danish population and are plotted taking into account generation length uncertainty. For more clarity, the
inset shows the same graph with both axes in log-scale.
Polygenic prediction of height and BMI
Polygenic risk prediction in our Danish sample was overall far more accurate for
height than for BMI (Figure S7). In the case of height, we observed maximum
accuracy when assuming infinitesimal genetic architecture and adjusting for age, sex
and ten ancestry PCs – R² = 0.251±0.031 (Figure S7A) vs. ~0.17 elsewhere (Wood et
al. 2014). When age, sex and the ten PCs were also included in the model, the
prediction accuracy rose substantially to R² = 0.639±0.015 (p = 6.57×10⁻⁷¹), a fact
reflected in the strong and significant correlation between real and predicted height
(Figure S7C). In contrast, although the maximum accuracy of BMI prediction was also
observed when
we adjusted for age, sex and ten PCs, this was overall much poorer than for height –
R² = 0.106±0.037 (Figure S7B) vs. ~0.065 elsewhere (Locke et al. 2015). In this case,
when age, sex and ten PCs were included in the model, the improvement of accuracy
was not as notable as for height (R² = 0.195±0.034, p = 5.49×10⁻¹⁸), implying that
BMI is only marginally affected by age and sex (Figure S7D).
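For readers unfamiliar with the accuracy measure, the sketch below shows one common definition of prediction R²: the squared Pearson correlation between observed and predicted phenotypes, with the prediction built as a weighted sum of allele counts. It does not reproduce the infinitesimal-model weighting or the covariate adjustment used in the study; the genotypes, effect sizes and function names are hypothetical.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of allele counts (0, 1 or 2 per SNP)."""
    return sum(g * b for g, b in zip(allele_counts, effect_sizes))


def prediction_r2(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# Hypothetical toy data: four individuals, three SNPs.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
effects = [0.5, -0.2, 0.1]
heights_cm = [170.0, 172.0, 180.0, 169.0]

scores = [polygenic_score(g, effects) for g in genotypes]
print(round(prediction_r2(heights_cm, scores), 3))
```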
Discussion
The most notable observation of our study is the high genetic homogeneity of the
Danish population, possibly reflecting geographically uninterrupted gene flow, further
facilitated by the extended network of sea-based commerce and travelling (Derry
2000). This observation should be appreciated in a historical context, as our analysis
did not account for genetic contributions from recent immigration. Indeed, modern
Danish society accommodates different ethnic and cultural groups (Jensen and
Pedersen 2007), and this was also reflected in our sample, where ~4% of the
participants were born in Afghanistan, China, Ethiopia, Finland, Germany, Greenland,
Iraq, Jordan, Korea, Kosovo, the Netherlands or Zambia. This number went up when
we looked at grandparental origin, where 14.6% of the participants had at least one
grandparent born outside Denmark.
The historical homogeneity of the Danish population is reflected in the lack of PCA
clustering (Figure 2B), as well as in the notably low average FST between the six
geographic regions. To put this observation in context, a previous study was able to
detect subtle population structure along a north-south axis in the Netherlands
(Genome of the Netherlands Consortium 2014), a country of almost identical area to
Denmark’s. Even though we were able to detect significant correlation between PC1
and latitude (Figure 2C), the signal was less compelling than that from the Dutch
study. Similarly, a recent study of population structure in Great Britain (Leslie et al.
2015) found that the average pairwise FST estimate between 30 geographic regions
was 0.0007 – 3.5 times higher than the value we report here (0.0002).
To better understand how our FST estimates compare to those from Great Britain, we
calculated average FST values in two British population sets using FST values reported
elsewhere (Leslie et al. 2015). The two British sets included regions with geographic
distances comparable to those in Denmark. The first subset comprised 13
populations from Central/Southern England with a resulting average FST < 0.0002.
The second subset comprised nine populations from Northern England with a
resulting average pairwise FST = 0.0003.
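Averaging published pairwise FST values over a subset of regions, as described above, can be sketched as follows. The region labels and FST values in the example are hypothetical placeholders rather than values from Leslie et al. (2015), and the function name is illustrative.

```python
from itertools import combinations
from math import comb


def average_pairwise_fst(fst, regions):
    """Mean FST over all unordered pairs of the given regions.

    fst maps frozenset({region_a, region_b}) to a pairwise FST value.
    """
    pairs = list(combinations(sorted(regions), 2))
    return sum(fst[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical values for three regions, for illustration only.
fst = {
    frozenset(("A", "B")): 0.0002,
    frozenset(("A", "C")): 0.0004,
    frozenset(("B", "C")): 0.0003,
}
print(average_pairwise_fst(fst, ["A", "B", "C"]))

# Number of region pairs behind each average quoted in the text.
print(comb(30, 2), comb(13, 2), comb(9, 2))  # 435 78 36
```

Restricting the average to a subset changes both the number of pairs and the geographic span they cover, which is why the subset averages are the fairer comparison with Denmark.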
In general, the study of admixture within the European continent is confounded by a
well-grounded isolation-by-distance mechanism (Lao et al. 2008; Novembre et al.
2008), as well as an increased historical complexity that renders most admixture
models unrealistically simple (Busby et al. 2015). Denmark is no exception to these
caveats. Even though it is tempting to explain the admixture proportions in Figures 3
and 4 as the result of historical admixing events, an alternative approach is to interpret
such proportions as “mixture profiles”. Bearing this in mind, we see that the mixture
profiles of all six Danish groups comprise two major ancestry components, one
predominant in Scandinavia and the other predominant in Central Europe (Figure 4).
In the GLOBETROTTER analysis, these two components were identified as
admixture contributions from the Swedish/Norwegian and the British clusters (Figure
3B; Table S2). In regard to the British contribution, however, this is actually more
likely to reflect admixture in the opposite direction, i.e. from Denmark to Britain
(Leslie et al. 2015).
Even though the mixture profiles of the six regions in Denmark were overall quite
similar, they differed in their membership to the Eastern European cluster. This
particular cluster was more prevalent in South and East Denmark (Figure 4) and was
independently observed in the GLOBETROTTER analysis as a small Polish
contribution to Zealand and Funen (Figure 3B; Table S2). This signal could be due to
historical Wend settlements in the broader area around Lolland in the southernmost
part of Denmark (Derry 2000), an observation that is also supported by our admixture
date estimates (Table 1). However, because similar mixture profiles were also
observed in Sweden and Norway (Figure 4), it is difficult to choose between isolation
by distance and actual admixture.
Given the high variance in the estimated admixture dates, the recurrence of the same
combinations of admixing sources, and the largely unstructured nature of our Danish
sample, we could interpret several of the admixture patterns in Table 1 as different
instances of the same event. For instance, focusing on the two surrogates that
GLOBETROTTER found best tagged by POL and GER, we see that the admixture
between them first occurred in Zealand in the 11th century, spreading to the rest of
peninsular Denmark afterwards. This observation fits historical knowledge about the
march of the Wends towards Denmark, as mentioned above.
A weak signal of population structure was also observed when we studied the
geographic distribution of IBD-based relatedness. The median distance to the closest
genomic relative was significantly smaller than to a randomly chosen sample, but the
effect was minor (median difference ≈ 30 km; Figure 5A). In our regression
analysis, we observed that this distance tended to be significantly smaller for pairs of
individuals that shared more IBD, yet the correlation was overall modest (r ≈ -0.35;
Figure 5B). These observations indicate that genetic structure does exist in
Denmark, even though it is very weak.
It is striking that Denmark, Sweden and Norway – three closely related Scandinavian
countries – seem to differ considerably in their demographic history, as reflected in
their historical Ne trajectories (Figure 6). Indeed, the current Sweden-to-Norway census
population size ratio is 1.89 – fairly close to their current Ne ratio (2.24) –
implying that the two populations have had similar reproductive variance and/or
population structure. In contrast, Denmark’s Ne seems to have increased at a
consistently lower rate after the Middle Ages. This could reflect weaker
reproductive dynamics in the Danish population or weaker population structure
compared to Sweden and Norway. Larger sample sizes are warranted to shed more
light on the historical Ne trajectories in Scandinavian populations.
Finally, we found that self-reported adolescent height could be predicted with
remarkable accuracy using essentially nothing but information derived directly
(genotypes) or indirectly (sex and ancestry) from DNA available at birth. When we
combined SNP data with age, sex and PCA information, our predictor could explain
more than half of the total phenotypic variance (63.9%). The remaining unexplained
variance corresponded to a standard deviation of 5.43 cm. This means that, with 95%
confidence, we are at most ~10.65 cm off in our prediction of adolescent height
(Figure S7C). It is worth noting that adding age to the model did not yield significant
improvements to the prediction accuracy of height (data not shown), implying that
adolescent height is a reliable measurement for the validation of the prediction. In
addition, even though height was a self-reported measurement, its strong correlation
with predicted adult height suggests that the students provided reliable personal
information (Figure S7C). As for the poorer prediction of BMI, it is important to
remember that data training was carried out in adults, whereas prediction was
validated in adolescents. Therefore, further improvement of the BMI prediction as
subjects advance in age is a possibility.
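The step from a residual standard deviation of 5.43 cm to a 95% error bound for height can be reproduced with a normal quantile, as in the minimal sketch below. It assumes normally distributed prediction residuals; the function names are illustrative, and the implied total standard deviation is a derived quantity that is not reported in the text.

```python
import math
from statistics import NormalDist


def prediction_error_bound(residual_sd, confidence=0.95):
    """Half-width of a symmetric interval around the prediction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return z * residual_sd


def implied_total_sd(residual_sd, r2):
    """Phenotypic SD implied by the residual SD and the explained variance."""
    return residual_sd / math.sqrt(1.0 - r2)


bound = prediction_error_bound(5.43)
total_sd = implied_total_sd(5.43, 0.639)
print(round(bound, 2), round(total_sd, 2))
```

With the rounded inputs the bound comes out at about 10.64 cm, marginally below the ~10.65 cm quoted above, presumably because the published figure was computed from unrounded values.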
In conclusion, our analysis showed that, by applying the simple criterion of
participants having all four of their grandparents born in Denmark, we obtained a
largely unstructured sample with very low FST values among different geographic
regions. This high homogeneity has the potential of rendering population structure in
large-scale Danish gene-mapping studies such as iPSYCH (http://ipsych.au.dk/) a
lesser concern, regardless of adjustments such as genomic control, PCA (Price et al.
2006) or mixed models (Yu et al. 2006). We also observed a notably disparate
demographic history in Denmark compared to other Scandinavian countries – a fact
that can also be ascribed to the high homogeneity of the Danish population – and that
height in adolescents could be predicted with considerable accuracy. Last but not
least, this study stands as an example of how large-scale public
engagement projects can benefit both the general public and the
scientific community through the promotion of scientific knowledge.
Acknowledgements
The authors would like to thank the high school students and their teachers for
participating in the Where Are You From? project, as well as Dagmar Hedvig Fog
Bjerre and Pia Johansson Nielsen for logistic support with organizing and shipping
the DNA kits to the participating schools. Special thanks to Prof. Jes Fabricius Møller
for helping us gain useful insights into the history and demography of the Danish
population. G.A. would like to thank Garrett Hellenthal, Francesco Montinaro,
George Busby, Sharon Browning and Christopher Gignoux for their help and useful
discussions. This project and related research were supported by the National
Lottery Funds (Danske Spil), the Danish Ministry of Education and the Centre for
Biocultural History, Aarhus University. B.J.V. was supported by a grant from the
Danish Council for Independent Research (DFF-1325-0014). K.M.H. was supported
by grants from the Swedish Research Council (K2014-62X-21445-05-3 and K2012-
63X-21445-03-2).
Literature Cited
Alexander D. H., Novembre J., Lange K., 2009 Fast model-based estimation of ancestry in unrelated individuals.
Genome Res. 19: 1655–1664.
Athanasiadis G., Jørgensen F. G., Cheng J. Y., Kjærgaard P. C., Schierup M. H., Mailund T., 2016 Spitting for
science: Danish high school students commit to a large-scale self-reported genetic study. PloS One.
Bae H. T., Sebastiani P., Sun J. X., Andersen S. L., Daw E. W., Terracciano A., Ferrucci L., Perls T. T., 2013
Genome-wide association study of personality traits in the long life family study. Front. Genet. 4: 65.
Browning B. L., Browning S. R., 2013a Improving the accuracy and efficiency of identity-by-descent detection in
population data. Genetics 194: 459–471.
Browning B. L., Browning S. R., 2013b Detecting identity by descent and estimating genotype error rates in
sequence data. Am. J. Hum. Genet. 93: 840–851.
Browning S. R., Browning B. L., 2015 Accurate Non-parametric Estimation of Recent Effective Population Size
from Segments of Identity by Descent. Am. J. Hum. Genet. 97: 404–418.
Bryc K., Durand E. Y., Macpherson J. M., Reich D., Mountain J. L., 2015 The genetic ancestry of African
Americans, Latinos, and European Americans across the United States. Am. J. Hum. Genet. 96: 37–53.
Busby G. B. J., Hellenthal G., Montinaro F., Tofanelli S., Bulayeva K., Rudan I., Zemunik T., Hayward C.,
Toncheva D., Karachanak-Yankova S., Nesheva D., Anagnostou P., Cali F., Brisighelli F., Romano V.,
Lefranc G., Buresi C., Ben Chibani J., Haj-Khelil A., Denden S., Ploski R., Krajewski P., Hervig T.,
Moen T., Herrera R. J., Wilson J. F., Myers S., Capelli C., 2015 The Role of Recent Admixture in
Forming the Contemporary West Eurasian Genomic Landscape. Curr. Biol. 25: 2518–2526.
Chang C. C., Chow C. C., Tellier L. C., Vattikuti S., Purcell S. M., Lee J. J., 2015 Second-generation PLINK:
rising to the challenge of larger and richer datasets. GigaScience 4: 7.
Delaneau O., Marchini J., Zagury J.-F., 2012 A linear complexity phasing method for thousands of genomes. Nat.
Methods 9: 179–181.
Derry T. K., 2000 History of Scandinavia: Norway, Sweden, Denmark, Finland, and Iceland. University of
Minnesota Press.
Devlin B., Roeder K., 1999 Genomic control for association studies. Biometrics 55: 997–1004.
Espeseth T., Christoforou A., Lundervold A. J., Steen V. M., Le Hellard S., Reinvang I., 2012 Imaging and
cognitive genetics: the Norwegian Cognitive NeuroGenetics sample. Twin Res. Hum. Genet. 15: 442–452.
Genome of the Netherlands Consortium, 2014 Whole-genome sequence variation, population structure and
demographic history of the Dutch population. Nat. Genet. 46: 818–825.
Green R. E., Krause J., Briggs A. W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.
H.-Y., Hansen N. F., Durand E. Y., Malaspinas A.-S., Jensen J. D., Marques-Bonet T., Alkan C., Prüfer
K., Meyer M., Burbano H. A., Good J. M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner
B., Siegemund M., Weihmann A., Nusbaum C., Lander E. S., Russ C., Novod N., Affourtit J., Egholm
M., Verna C., Rudan P., Brajkovic D., Kucan Z., Gusic I., Doronichev V. B., Golovanova L. V.,
Lalueza-Fox C., Rasilla M. de la, Fortea J., Rosas A., Schmitz R. W., Johnson P. L. F., Eichler E. E.,
Falush D., Birney E., Mullikin J. C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., Pääbo S.,
2010 A draft sequence of the Neandertal genome. Science 328: 710–722.
Hellenthal G., Busby G. B. J., Band G., Wilson J. F., Capelli C., Falush D., Myers S., 2014 A genetic atlas of
human admixture history. Science 343: 747–751.
Howie B., Fuchsberger C., Stephens M., Marchini J., Abecasis G. R., 2012 Fast and accurate genotype imputation
in genome-wide association studies through pre-phasing. Nat. Genet. 44: 955–959.
Jakkula E., Rehnström K., Varilo T., Pietiläinen O. P. H., Paunio T., Pedersen N. L., deFaire U., Järvelin M.-R.,
Saharinen J., Freimer N., Ripatti S., Purcell S., Collins A., Daly M. J., Palotie A., Peltonen L., 2008 The
genome-wide patterns of variation expose significant substructure in a founder population. Am. J. Hum.
Genet. 83: 787–794.
Jensen P., Pedersen P. J., 2007 To Stay or Not to Stay? Out-Migration of Immigrants from Denmark. Int. Migr. 45:
87–113.
Karakachoff M., Duforet-Frebourg N., Simonet F., Le Scouarnec S., Pellen N., Lecointe S., Charpentier E., Gros
F., Cauchi S., Froguel P., Copin N., D.E.S.I.R. Study Group, Le Tourneau T., Probst V., Le Marec H.,
Molinaro S., Balkau B., Redon R., Schott J.-J., Blum M. G., Dina C., 2015 Fine-scale
human genetic structure in Western France. Eur. J. Hum. Genet. 23: 831–836.
Laaksovirta H., Peuralinna T., Schymick J. C., Scholz S. W., Lai S.-L., Myllykangas L., Sulkava R., Jansson L.,
Hernandez D. G., Gibbs J. R., Nalls M. A., Heckerman D., Tienari P. J., Traynor B. J., 2010
Chromosome 9p21 in amyotrophic lateral sclerosis in Finland: a genome-wide association study. Lancet
Neurol. 9: 978–985.
Lao O., Lu T. T., Nothnagel M., Junge O., Freitag-Wolf S., Caliebe A., Balascakova M., Bertranpetit J., Bindoff L.
A., Comas D., Holmlund G., Kouvatsi A., Macek M., Mollet I., Parson W., Palo J., Ploski R., Sajantila
A., Tagliabracci A., Gether U., Werge T., Rivadeneira F., Hofman A., Uitterlinden A. G., Gieger C.,
Wichmann H.-E., Rüther A., Schreiber S., Becker C., Nürnberg P., Nelson M. R., Krawczak M., Kayser
M., 2008 Correlation between genetic and geographic structure in Europe. Curr. Biol. 18: 1241–1248.
Larsen M. H., Albrechtsen A., Thørner L. W., Werge T., Hansen T., Gether U., Haastrup E., Ullum H., 2013
Genome-Wide Association Study of Genetic Variants in LPS-Stimulated IL-6, IL-8, IL-10, IL-1ra and
TNF-α Cytokine Response in a Danish Cohort. PloS One 8: e66262.
Lawson D. J., Hellenthal G., Myers S., Falush D., 2012 Inference of population structure using dense haplotype
data. PLoS Genet. 8: e1002453.
Leslie S., Winney B., Hellenthal G., Davison D., Boumertit A., Day T., Hutnik K., Royrvik E. C., Cunliffe B.,
Wellcome Trust Case Control Consortium 2, International Multiple Sclerosis Genetics Consortium,
Lawson D. J., Falush D., Freeman C., Pirinen M., Myers S., Robinson M., Donnelly P., Bodmer W.,
2015 The fine-scale genetic structure of the British population. Nature 519: 309–314.
Li J. Z., Absher D. M., Tang H., Southwick A. M., Casto A. M., Ramachandran S., Cann H. M., Barsh G. S.,
Feldman M., Cavalli-Sforza L. L., Myers R. M., 2008 Worldwide human relationships inferred from
genome-wide patterns of variation. Science 319: 1100–1104.
Locke A. E., Kahali B., Berndt S. I., Justice A. E., Pers T. H., Day F. R., Powell C., Vedantam S., Buchkovich M.
L., Yang J., Croteau-Chonka D. C., Esko T., Fall T., Ferreira T., Gustafsson S., Kutalik Z., Luan J. ’an,
Mägi R., Randall J. C., Winkler T. W., Wood A. R., Workalemahu T., Faul J. D., Smith J. A., Hua Zhao
J., Zhao W., Chen J., Fehrmann R., Hedman Å. K., Karjalainen J., Schmidt E. M., Absher D., Amin N.,
Anderson D., Beekman M., Bolton J. L., Bragg-Gresham J. L., Buyske S., Demirkan A., Deng G., Ehret
G. B., Feenstra B., Feitosa M. F., Fischer K., Goel A., Gong J., Jackson A. U., Kanoni S., Kleber M. E.,
Kristiansson K., Lim U., Lotay V., Mangino M., Mateo Leach I., Medina-Gomez C., Medland S. E.,
Nalls M. A., Palmer C. D., Pasko D., Pechlivanis S., Peters M. J., Prokopenko I., Shungin D.,
Stančáková A., Strawbridge R. J., Ju Sung Y., Tanaka T., Teumer A., Trompet S., Laan S. W. van der,
Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Isaacs A., Albrecht E.,
Ärnlöv J., Arscott G. M., Attwood A. P., Bandinelli S., Barrett A., Bas I. N., Bellis C., Bennett A. J.,
Berne C., Blagieva R., Blüher M., Böhringer S., Bonnycastle L. L., Böttcher Y., Boyd H. A.,
Bruinenberg M., Caspersen I. H., Ida Chen Y.-D., Clarke R., Daw E. W., Craen A. J. M. de, Delgado G.,
Dimitriou M., Doney A. S. F., Eklund N., Estrada K., Eury E., Folkersen L., Fraser R. M., Garcia M. E.,
Geller F., Giedraitis V., Gigante B., Go A. S., Golay A., Goodall A. H., Gordon S. D., Gorski M., Grabe
H.-J., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groves C. J., Gusto G., Haessler J., Hall P.,
Haller T., Hallmans G., Hartman C. A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q.,
Hengstenberg C., Holmen O., Hottenga J.-J., James A. L., Jeff J. M., Johansson Å., Jolley J., Juliusdottir
T., Kinnunen L., Koenig W., Koskenvuo M., Kratzer W., Laitinen J., Lamina C., Leander K., Lee N. R.,
Lichtner P., Lind L., Lindström J., Sin Lo K., Lobbens S., Lorbeer R., Lu Y., Mach F., Magnusson P. K.
E., Mahajan A., McArdle W. L., McLachlan S., Menni C., Merger S., Mihailov E., Milani L., Moayyeri
A., Monda K. L., Morken M. A., Mulas A., Müller G., Müller-Nurasyid M., Musk A. W., Nagaraja R.,
Nöthen M. M., Nolte I. M., Pilz S., Rayner N. W., Renstrom F., Rettig R., Ried J. S., Ripke S.,
Robertson N. R., Rose L. M., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Scott W. R.,
Seufferlein T., Shi J., Vernon Smith A., Smolonska J., Stanton A. V., Steinthorsdottir V., Stirrups K.,
Stringham H. M., Sundström J., Swertz M. A., Swift A. J., Syvänen A.-C., Tan S.-T., Tayo B. O.,
Thorand B., Thorleifsson G., Tyrer J. P., Uh H.-W., Vandenput L., Verhulst F. C., Vermeulen S. H.,
Verweij N., Vonk J. M., Waite L. L., Warren H. R., Waterworth D., Weedon M. N., Wilkens L. R.,
Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., LifeLines Cohort
Study, Brennan E. P., Choi M., Dastani Z., Drong A. W., Eriksson P., Franco-Cereceda A., Gådin J. R.,
Gharavi A. G., Goddard M. E., Handsaker R. E., Huang J., Karpe F., Kathiresan S., Keildson S., Kiryluk
K., Kubo M., Lee J.-Y., Liang L., Lifton R. P., Ma B., McCarroll S. A., McKnight A. J., Min J. L.,
Moffatt M. F., Montgomery G. W., Murabito J. M., Nicholson G., Nyholt D. R., Okada Y., Perry J. R.
B., Dorajoo R., Reinmaa E., Salem R. M., Sandholm N., Scott R. A., Stolk L., Takahashi A., Tanaka T.,
Van’t Hooft F. M., Vinkhuyzen A. A. E., Westra H.-J., Zheng W., Zondervan K. T., ADIPOGen
Consortium, AGEN-BMI Working Group, CARDIOGRAMplusC4D Consortium, CKDGen
Consortium, GLGC, ICBP, MAGIC Investigators, MuTHER Consortium, MIGen Consortium, PAGE
Consortium, ReproGen Consortium, GENIE Consortium, International Endogene Consortium, Heath A.
C., Arveiler D., Bakker S. J. L., Beilby J., Bergman R. N., Blangero J., Bovet P., Campbell H., Caulfield
M. J., Cesana G., Chakravarti A., Chasman D. I., Chines P. S., Collins F. S., Crawford D. C., Cupples L.
A., Cusi D., Danesh J., Faire U. de, Ruijter H. M. den, Dominiczak A. F., Erbel R., Erdmann J., Eriksson
J. G., Farrall M., Felix S. B., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Franco O.
H., Gansevoort R. T., Gejman P. V., Gieger C., Gottesman O., Gudnason V., Gyllensten U., Hall A. S.,
Harris T. B., Hattersley A. T., Hicks A. A., Hindorff L. A., Hingorani A. D., Hofman A., Homuth G.,
Hovingh G. K., Humphries S. E., Hunt S. C., Hyppönen E., Illig T., Jacobs K. B., Jarvelin M.-R., Jöckel
K.-H., Johansen B., Jousilahti P., Jukema J. W., Jula A. M., Kaprio J., Kastelein J. J. P., Keinanen-
Kiukaanniemi S. M., Kiemeney L. A., Knekt P., Kooner J. S., Kooperberg C., Kovacs P., Kraja A. T.,
Kumari M., Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lyssenko V.,
Männistö S., Marette A., Matise T. C., McKenzie C. A., McKnight B., Moll F. L., Morris A. D., Morris
A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K., Madden P. A. F., Pasterkamp G.,
Peden J. F., Peters A., Postma D. S., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T., Rankinen T.,
Rao D. C., Rice T. K., Ridker P. M., Rioux J. D., Ritchie M. D., Rudan I., Salomaa V., Samani N. J.,
Saramies J., Sarzynski M. A., Schunkert H., Schwarz P. E. H., Sever P., Shuldiner A. R., Sinisalo J.,
Stolk R. P., Strauch K., Tönjes A., Trégouët D.-A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C.,
Völker U., Waeber G., Willemsen G., Witteman J. C., Zillikens M. C., Adair L. S., Amouyel P.,
Asselbergs F. W., Assimes T. L., Bochud M., Boehm B. O., Boerwinkle E., Bornstein S. R., Bottinger E.
P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R. S., Bakker P. I. W. de, Dedoussis
G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A., Hamsten A., Hui J., Hunter D. J.,
Hveem K., Kaplan R. C., Kivimaki M., Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M.,
Metspalu A., Moebus S., Munroe P. B., Njølstad I., Oostra B. A., Palmer C. N. A., Pedersen N. L.,
Perola M., Pérusse L., Peters U., Power C., Quertermous T., Rauramaa R., Rivadeneira F., Saaristo T.
E., Saleheen D., Sattar N., Schadt E. E., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D.,
Thorsteinsdottir U., Stumvoll M., Tuomilehto J., Uitterlinden A. G., Uusitupa M., Harst P. van der,
Walker M., Wallaschofski H., Wareham N. J., Watkins H., Weir D. R., Wichmann H.-E., Wilson J. F.,
Zanen P., Borecki I. B., Deloukas P., Fox C. S., Heid I. M., O’Connell J. R., Strachan D. P., Stefansson
K., Duijn C. M. van, Abecasis G. R., Franke L., Frayling T. M., McCarthy M. I., Visscher P. M.,
Scherag A., Willer C. J., Boehnke M., Mohlke K. L., Lindgren C. M., Beckmann J. S., Barroso I., North
K. E., Ingelsson E., Hirschhorn J. N., Loos R. J. F., Speliotes E. K., 2015 Genetic studies of body mass
index yield new insights for obesity biology. Nature 518: 197–206.
Manichaikul A., Mychaleckyj J. C., Rich S. S., Daly K., Sale M., Chen W.-M., 2010 Robust relationship inference
in genome-wide association studies. Bioinformatics 26: 2867–2873.
Montinaro F., Busby G. B. J., Pascali V. L., Myers S., Hellenthal G., Capelli C., 2015 Unravelling the hidden
ancestry of American admixed populations. Nat. Commun. 6: 6596.
Moreno-Estrada A., Gignoux C. R., Fernández-López J. C., Zakharia F., Sikora M., Contreras A. V., Acuña-
Alonzo V., Sandoval K., Eng C., Romero-Hidalgo S., Ortiz-Tello P., Robles V., Kenny E. E., Nuño-
Arana I., Barquera-Lozano R., Macín-Pérez G., Granados-Arriola J., Huntsman S., Galanter J. M., Via
M., Ford J. G., Chapela R., Rodriguez-Cintron W., Rodríguez-Santana J. R., Romieu I., Sienra-Monge J.
J., Rio Navarro B. del, London S. J., Ruiz-Linares A., Garcia-Herrera R., Estrada K., Hidalgo-Miranda
A., Jimenez-Sanchez G., Carnevale A., Soberón X., Canizales-Quinteros S., Rangel-Villalobos H.,
Silva-Zolezzi I., Burchard E. G., Bustamante C. D., 2014 Human genetics. The genetics of Mexico
recapitulates Native American substructure and affects biomedical traits. Science 344: 1280–1285.
Moreno-Mayar J. V., Rasmussen S., Seguin-Orlando A., Rasmussen M., Liang M., Flåm S. T., Lie B. A., Gilfillan
G. D., Nielsen R., Thorsby E., Willerslev E., Malaspinas A.-S., 2014 Genome-wide ancestry patterns in
Rapanui suggest pre-European admixture with Native Americans. Curr. Biol. 24: 2518–2525.
Nelson M. R., Bryc K., King K. S., Indap A., Boyko A. R., Novembre J., Briley L. P., Maruyama Y., Waterworth
D. M., Waeber G., Vollenweider P., Oksenberg J. R., Hauser S. L., Stirnadel H. A., Kooner J. S.,
Chambers J. C., Jones B., Mooser V., Bustamante C. D., Roses A. D., Burns D. K., Ehm M. G., Lai E.
H., 2008 The Population Reference Sample, POPRES: a resource for population, disease, and
pharmacological genetics research. Am. J. Hum. Genet. 83: 347–358.
Novembre J., Johnson T., Bryc K., Kutalik Z., Boyko A. R., Auton A., Indap A., King K. S., Bergmann S., Nelson
M. R., Stephens M., Bustamante C. D., 2008 Genes mirror geography within Europe. Nature 456: 98–
101.
Price A. L., Patterson N. J., Plenge R. M., Weinblatt M. E., Shadick N. A., Reich D., 2006 Principal components
analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904–909.
Price A. L., Helgason A., Palsson S., Stefansson H., St Clair D., Andreassen O. A., Reich D., Kong A., Stefansson
K., 2009 The impact of divergence time on the nature of population structure: an example from Iceland.
PLoS Genet. 5: e1000505.
Pritchard J. K., Stephens M., Donnelly P., 2000 Inference of population structure using multilocus genotype data.
Genetics 155: 945–959.
Reich D., Thangaraj K., Patterson N., Price A. L., Singh L., 2009 Reconstructing Indian population history. Nature
461: 489–494.
Reich D., Patterson N., Kircher M., Delfin F., Nandineni M. R., Pugach I., Ko A. M.-S., Ko Y.-C., Jinam T. A.,
Phipps M. E., Saitou N., Wollstein A., Kayser M., Pääbo S., Stoneking M., 2011 Denisova admixture
and the first modern human dispersals into Southeast Asia and Oceania. Am. J. Hum. Genet. 89: 516–
528.
Ripke S., O’Dushlaine C., Chambert K., Moran J. L., Kähler A. K., Akterin S., Bergen S. E., Collins A. L.,
Crowley J. J., Fromer M., Kim Y., Lee S. H., Magnusson P. K. E., Sanchez N., Stahl E. A., Williams S.,
Wray N. R., Xia K., Bettella F., Borglum A. D., Bulik-Sullivan B. K., Cormican P., Craddock N., Leeuw
C. de, Durmishi N., Gill M., Golimbet V., Hamshere M. L., Holmans P., Hougaard D. M., Kendler K. S.,
Lin K., Morris D. W., Mors O., Mortensen P. B., Neale B. M., O’Neill F. A., Owen M. J., Milovancevic
M. P., Posthuma D., Powell J., Richards A. L., Riley B. P., Ruderfer D., Rujescu D., Sigurdsson E.,
Silagadze T., Smit A. B., Stefansson H., Steinberg S., Suvisaari J., Tosato S., Verhage M., Walters J. T.,
Multicenter Genetic Studies of Schizophrenia Consortium, Levinson D. F., Gejman P. V., Kendler K. S.,
Laurent C., Mowry B. J., O’Donovan M. C., Owen M. J., Pulver A. E., Riley B. P., Schwab S. G.,
Wildenauer D. B., Dudbridge F., Holmans P., Shi J., Albus M., Alexander M., Campion D., Cohen D.,
Dikeos D., Duan J., Eichhammer P., Godard S., Hansen M., Lerer F. B., Liang K.-Y., Maier W., Mallet
J., Nertney D. A., Nestadt G., Norton N., O’Neill F. A., Papadimitriou G. N., Ribble R., Sanders A. R.,
Silverman J. M., Walsh D., Williams N. M., Wormley B., Psychosis Endophenotypes International
Consortium, Arranz M. J., Bakker S., Bender S., Bramon E., Collier D., Crespo-Facorro B., Hall J.,
Iyegbe C., Jablensky A., Kahn R. S., Kalaydjieva L., Lawrie S., Lewis C. M., Lin K., Linszen D. H.,
Mata I., McIntosh A., Murray R. M., Ophoff R. A., Powell J., Rujescu D., Van Os J., Walshe M.,
Weisbrod M., Wiersma D., Wellcome Trust Case Control Consortium 2, Donnelly P., Barroso I.,
Blackwell J. M., Bramon E., Brown M. A., Casas J. P., Corvin A. P., Deloukas P., Duncanson A.,
Jankowski J., Markus H. S., Mathew C. G., Palmer C. N. A., Plomin R., Rautanen A., Sawcer S. J.,
Trembath R. C., Viswanathan A. C., Wood N. W., Spencer C. C. A., Band G., Bellenguez C., Freeman
C., Hellenthal G., Giannoulatou E., Pirinen M., Pearson R. D., Strange A., Su Z., Vukcevic D., Donnelly
P., Langford C., Hunt S. E., Edkins S., Gwilliam R., Blackburn H., Bumpstead S. J., Dronov S., Gillman
M., Gray E., Hammond N., Jayakumar A., McCann O. T., Liddle J., Potter S. C., Ravindrarajah R.,
Ricketts M., Tashakkori-Ghanbaria A., Waller M. J., Weston P., Widaa S., Whittaker P., Barroso I.,
Deloukas P., Mathew C. G., Blackwell J. M., Brown M. A., Corvin A. P., McCarthy M. I., Spencer C. C.
A., Bramon E., Corvin A. P., O’Donovan M. C., Stefansson K., Scolnick E., Purcell S., McCarroll S. A.,
Sklar P., Hultman C. M., Sullivan P. F., 2013 Genome-wide association analysis identifies 13 new risk
loci for schizophrenia. Nat. Genet. 45: 1150–1159.
Tang H., Peng J., Wang P., Risch N. J., 2005 Estimation of individual admixture: analytical and study design
considerations. Genet. Epidemiol. 28: 289–301.
Vilhjálmsson B. J., Yang J., Finucane H. K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia
G., Do R., Hayeck T., Won H.-H., Schizophrenia Working Group of the Psychiatric Genomics
Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study,
Kathiresan S., Pato M., Pato C., Tamimi R., Stahl E., Zaitlen N., Pasaniuc B., Belbin G., Kenny E. E.,
Schierup M. H., De Jager P., Patsopoulos N. A., McCarroll S., Daly M., Purcell S., Chasman D., Neale
B., Goddard M., Visscher P. M., Kraft P., Patterson N., Price A. L., Discovery Biology and Risk of
Inherited Variants in Breast Cancer DRIVE study, 2015 Modeling Linkage Disequilibrium Increases
Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet. 97: 576–592.
Visscher P. M., Hill W. G., Wray N. R., 2008 Heritability in the genomics era--concepts and misconceptions. Nat.
Rev. Genet. 9: 255–266.
Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff
L., Parkinson H., 2014 The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.
Nucleic Acids Res. 42: D1001–1006.
Wood A. R., Esko T., Yang J., Vedantam S., Pers T. H., Gustafsson S., Chu A. Y., Estrada K., Luan J. ’an, Kutalik
Z., Amin N., Buchkovich M. L., Croteau-Chonka D. C., Day F. R., Duan Y., Fall T., Fehrmann R.,
Ferreira T., Jackson A. U., Karjalainen J., Lo K. S., Locke A. E., Mägi R., Mihailov E., Porcu E.,
Randall J. C., Scherag A., Vinkhuyzen A. A. E., Westra H.-J., Winkler T. W., Workalemahu T., Zhao J.
H., Absher D., Albrecht E., Anderson D., Baron J., Beekman M., Demirkan A., Ehret G. B., Feenstra B.,
Feitosa M. F., Fischer K., Fraser R. M., Goel A., Gong J., Justice A. E., Kanoni S., Kleber M. E.,
Kristiansson K., Lim U., Lotay V., Lui J. C., Mangino M., Mateo Leach I., Medina-Gomez C., Nalls M.
A., Nyholt D. R., Palmer C. D., Pasko D., Pechlivanis S., Prokopenko I., Ried J. S., Ripke S., Shungin
D., Stancáková A., Strawbridge R. J., Sung Y. J., Tanaka T., Teumer A., Trompet S., Laan S. W. van
der, Setten J. van, Van Vliet-Ostaptchouk J. V., Wang Z., Yengo L., Zhang W., Afzal U., Arnlöv J.,
Arscott G. M., Bandinelli S., Barrett A., Bellis C., Bennett A. J., Berne C., Blüher M., Bolton J. L.,
Böttcher Y., Boyd H. A., Bruinenberg M., Buckley B. M., Buyske S., Caspersen I. H., Chines P. S.,
Clarke R., Claudi-Boehm S., Cooper M., Daw E. W., De Jong P. A., Deelen J., Delgado G., Denny J. C.,
Dhonukshe-Rutten R., Dimitriou M., Doney A. S. F., Dörr M., Eklund N., Eury E., Folkersen L., Garcia
M. E., Geller F., Giedraitis V., Go A. S., Grallert H., Grammer T. B., Gräßler J., Grönberg H., Groot L.
C. P. G. M. de, Groves C. J., Haessler J., Hall P., Haller T., Hallmans G., Hannemann A., Hartman C.
A., Hassinen M., Hayward C., Heard-Costa N. L., Helmer Q., Hemani G., Henders A. K., Hillege H. L.,
Hlatky M. A., Hoffmann W., Hoffmann P., Holmen O., Houwing-Duistermaat J. J., Illig T., Isaacs A.,
James A. L., Jeff J., Johansen B., Johansson Å., Jolley J., Juliusdottir T., Junttila J., Kho A. N.,
Kinnunen L., Klopp N., Kocher T., Kratzer W., Lichtner P., Lind L., Lindström J., Lobbens S.,
Lorentzon M., Lu Y., Lyssenko V., Magnusson P. K. E., Mahajan A., Maillard M., McArdle W. L.,
McKenzie C. A., McLachlan S., McLaren P. J., Menni C., Merger S., Milani L., Moayyeri A., Monda K.
L., Morken M. A., Müller G., Müller-Nurasyid M., Musk A. W., Narisu N., Nauck M., Nolte I. M.,
Nöthen M. M., Oozageer L., Pilz S., Rayner N. W., Renstrom F., Robertson N. R., Rose L. M., Roussel
R., Sanna S., Scharnagl H., Scholtens S., Schumacher F. R., Schunkert H., Scott R. A., Sehmi J.,
Seufferlein T., Shi J., Silventoinen K., Smit J. H., Smith A. V., Smolonska J., Stanton A. V., Stirrups K.,
Stott D. J., Stringham H. M., Sundström J., Swertz M. A., Syvänen A.-C., Tayo B. O., Thorleifsson G.,
Tyrer J. P., Dijk S. van, Schoor N. M. van, Velde N. van der, Heemst D. van, Oort F. V. A. van,
Vermeulen S. H., Verweij N., Vonk J. M., Waite L. L., Waldenberger M., Wennauer R., Wilkens L. R.,
Willenborg C., Wilsgaard T., Wojczynski M. K., Wong A., Wright A. F., Zhang Q., Arveiler D., Bakker
S. J. L., Beilby J., Bergman R. N., Bergmann S., Biffar R., Blangero J., Boomsma D. I., Bornstein S. R.,
Bovet P., Brambilla P., Brown M. J., Campbell H., Caulfield M. J., Chakravarti A., Collins R., Collins F.
S., Crawford D. C., Cupples L. A., Danesh J., Faire U. de, Ruijter H. M. den, Erbel R., Erdmann J.,
Eriksson J. G., Farrall M., Ferrannini E., Ferrières J., Ford I., Forouhi N. G., Forrester T., Gansevoort R.
T., Gejman P. V., Gieger C., Golay A., Gottesman O., Gudnason V., Gyllensten U., Haas D. W., Hall A.
S., Harris T. B., Hattersley A. T., Heath A. C., Hengstenberg C., Hicks A. A., Hindorff L. A., Hingorani
A. D., Hofman A., Hovingh G. K., Humphries S. E., Hunt S. C., Hypponen E., Jacobs K. B., Jarvelin
M.-R., Jousilahti P., Jula A. M., Kaprio J., Kastelein J. J. P., Kayser M., Kee F., Keinanen-Kiukaanniemi
S. M., Kiemeney L. A., Kooner J. S., Kooperberg C., Koskinen S., Kovacs P., Kraja A. T., Kumari M.,
Kuusisto J., Lakka T. A., Langenberg C., Le Marchand L., Lehtimäki T., Lupoli S., Madden P. A. F.,
Männistö S., Manunta P., Marette A., Matise T. C., McKnight B., Meitinger T., Moll F. L., Montgomery
G. W., Morris A. D., Morris A. P., Murray J. C., Nelis M., Ohlsson C., Oldehinkel A. J., Ong K. K.,
Ouwehand W. H., Pasterkamp G., Peters A., Pramstaller P. P., Price J. F., Qi L., Raitakari O. T.,
Rankinen T., Rao D. C., Rice T. K., Ritchie M., Rudan I., Salomaa V., Samani N. J., Saramies J.,
Sarzynski M. A., Schwarz P. E. H., Sebert S., Sever P., Shuldiner A. R., Sinisalo J., Steinthorsdottir V.,
Stolk R. P., Tardif J.-C., Tönjes A., Tremblay A., Tremoli E., Virtamo J., Vohl M.-C., Electronic
Medical Records and Genomics (eMERGE) Consortium, MIGen Consortium, PAGE
Consortium, LifeLines Cohort Study, Amouyel P., Asselbergs F. W., Assimes T. L., Bochud M., Boehm
B. O., Boerwinkle E., Bottinger E. P., Bouchard C., Cauchi S., Chambers J. C., Chanock S. J., Cooper R.
S., Bakker P. I. W. de, Dedoussis G., Ferrucci L., Franks P. W., Froguel P., Groop L. C., Haiman C. A.,
Hamsten A., Hayes M. G., Hui J., Hunter D. J., Hveem K., Jukema J. W., Kaplan R. C., Kivimaki M.,
Kuh D., Laakso M., Liu Y., Martin N. G., März W., Melbye M., Moebus S., Munroe P. B., Njølstad I.,
Oostra B. A., Palmer C. N. A., Pedersen N. L., Perola M., Pérusse L., Peters U., Powell J. E., Power C.,
Quertermous T., Rauramaa R., Reinmaa E., Ridker P. M., Rivadeneira F., Rotter J. I., Saaristo T. E.,
Saleheen D., Schlessinger D., Slagboom P. E., Snieder H., Spector T. D., Strauch K., Stumvoll M.,
Tuomilehto J., Uusitupa M., Harst P. van der, Völzke H., Walker M., Wareham N. J., Watkins H.,
Wichmann H.-E., Wilson J. F., Zanen P., Deloukas P., Heid I. M., Lindgren C. M., Mohlke K. L.,
Speliotes E. K., Thorsteinsdottir U., Barroso I., Fox C. S., North K. E., Strachan D. P., Beckmann J. S.,
Berndt S. I., Boehnke M., Borecki I. B., McCarthy M. I., Metspalu A., Stefansson K., Uitterlinden A. G.,
Duijn C. M. van, Franke L., Willer C. J., Price A. L., Lettre G., Loos R. J. F., Weedon M. N., Ingelsson
E., O’Connell J. R., Abecasis G. R., Chasman D. I., Goddard M. E., Visscher P. M., Hirschhorn J. N.,
Frayling T. M., 2014 Defining the role of common variation in the genomic and biological architecture
of adult human height. Nat. Genet. 46: 1173–1186.
Yu J., Pressoir G., Briggs W. H., Vroh Bi I., Yamasaki M., Doebley J. F., McMullen M. D., Gaut B. S., Nielsen D.
M., Holland J. B., Kresovich S., Buckler E. S., 2006 A unified mixed-model method for association
mapping that accounts for multiple levels of relatedness. Nat. Genet. 38: 203–208.
Supplementary Figure Legends
Figure S1. Co-ancestry curves for all pairs of surrogate populations identified in
Zealand (Figure 3B; Table S2).
Figure S2. Co-ancestry curves for all pairs of surrogate populations identified in
South Jutland (Figure 3B; Table S2).
Figure S3. Co-ancestry curves for all pairs of surrogate populations identified in
Central Jutland (Figure 3B; Table S2).
Figure S4. Co-ancestry curves for all pairs of surrogate populations identified in
North Jutland (Figure 3B; Table S2).
Figure S5. Ancestral component analysis of 13 European countries (including six
well-defined geographic regions in Denmark shown in the inset) assuming K = {2, …,
9} ancestral populations.
Figure S6. Proportion of zero IBS sharing vs. kinship coefficient for 82,215 pairs of
individuals in a sample of 406 high school students who had all four of their
grandparents born in Denmark. Following KING’s guidelines, estimated kinship
coefficients in the ranges (i) > 0.354, (ii) [0.177, 0.354], (iii) [0.0884, 0.177], (iv) [0.0442,
0.0884] and (v) [0, 0.0221] were considered to correspond to duplicate/MZ twin,
1st-degree, 2nd-degree, 3rd-degree and 4th-degree (or further) relationships, respectively.
The vast majority of pairs were either unrelated (negative kinship coefficient in the
grey-shaded part of the plot) or distantly related (kinship ∈ (0, 0.0221) in the
red-shaded part of the plot).
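The classification in the Figure S6 legend can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the legend, not the code used in the study: the function names are hypothetical, shared interval endpoints (e.g. 0.177) are assumed to go to the closer relationship class, and values between 0.0221 and 0.0442, for which the legend lists no class, are reported as unassigned.

```python
from math import comb

# Thresholds copied from the Figure S6 legend (KING-style kinship cut-offs).
DUPLICATE_MIN = 0.354
FIRST_DEGREE_MIN = 0.177
SECOND_DEGREE_MIN = 0.0884
THIRD_DEGREE_MIN = 0.0442
FOURTH_DEGREE_MAX = 0.0221


def classify_kinship(kinship: float) -> str:
    """Map an estimated kinship coefficient to the legend's relationship class.

    Assumption: a value equal to a shared endpoint is assigned to the
    closer relationship class.
    """
    if kinship > DUPLICATE_MIN:
        return "duplicate/MZ twin"
    if kinship >= FIRST_DEGREE_MIN:
        return "1st-degree"
    if kinship >= SECOND_DEGREE_MIN:
        return "2nd-degree"
    if kinship >= THIRD_DEGREE_MIN:
        return "3rd-degree"
    if kinship > FOURTH_DEGREE_MAX:
        # The legend lists no class between 0.0221 and 0.0442.
        return "not assigned in legend"
    if kinship >= 0:
        return "4th-degree or further"
    # Negative estimates fall in the grey-shaded (unrelated) region.
    return "unrelated"


def n_pairs(n_individuals: int) -> int:
    """Number of unordered pairs among n individuals."""
    return comb(n_individuals, 2)
```

For 406 individuals, `n_pairs(406)` gives the 82,215 pairs reported in the legend.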
Figure S7. Prediction accuracy of LDpred for height and BMI when age, sex and
PCA information were (i) adjusted for (A and C), and (ii) included (B and D) in the
model.
Table 1. Summary of GLOBETROTTER’s inference of the date of admixture and
composition of admixing sources for each of four geographic regions of Denmark

| Region | Best fitting type of admixture | Goodness of fit (R²) | Years before present [95% CI] | Year | Event #1: minor component | Event #1: major component | Event #1: ratio | Event #2: minor component | Event #2: major component | Event #2: ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| Zealand (N = 36) | 1 date multi-way | 0.208 | 943 [579, 1488] | 1052 | BRI | SWE | 0.47:0.53 | GER | POL | 0.47:0.53 |
| S.Jutland (N = 37) | 1 date multi-way | 0.136 | 756 [232, 1548] | 1239 | FRA | POL | 0.21:0.79 | GER | POL | 0.43:0.57 |
| C.Jutland (N = 89) | 1 date multi-way | 0.354 | 611 [418, 823] | 1384 | BRI | POL | 0.31:0.69 | POL | GER | 0.45:0.55 |
| N.Jutland (N = 48) | 1 date multi-way | 0.239 | 434 [118, 741] | 1561 | BRI | POL | 0.33:0.67 | GER | POL | 0.42:0.58 |

Type of admixture was chosen among three alternatives: (i) one date; (ii) one date
multi-way; (iii) two or more dates. Multi-way admixture involves two admixture
events. If a source population appears in both admixture events #1 and #2, then the
interpretation is “three-way admixture” (for more details see Busby et al. 2015). R²:
goodness of fit for a single date of admixture, taking the maximum value across all
inferred co-ancestry curves (R² ≥ 0.2 corresponds to good fit). GLOBETROTTER
returns date estimates in generations. We multiplied these estimates by 30
years/generation. We retrieved historical dates by subtracting year estimates from
1995 (the median year of birth for our high school students). CI: confidence interval.
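The date conversion described in the note above can be written out as a short Python sketch. It assumes only what the note states (30 years per generation, reference year 1995); the function names are hypothetical, and because Table 1 reports years before present rather than the generation counts returned by GLOBETROTTER, the check below starts from the tabulated years.

```python
YEARS_PER_GENERATION = 30   # generation time stated in the Table 1 note
REFERENCE_YEAR = 1995       # median year of birth of the high school students


def generations_to_years(generations: float) -> float:
    """Convert a date estimate in generations to years before present."""
    return generations * YEARS_PER_GENERATION


def historical_year(years_before_present: float) -> float:
    """Convert years before present to a calendar year."""
    return REFERENCE_YEAR - years_before_present


# Point estimates (years before present) as listed in Table 1.
TABLE_1_YEARS_BP = {
    "Zealand": 943,
    "S.Jutland": 756,
    "C.Jutland": 611,
    "N.Jutland": 434,
}

# Calendar years implied by the tabulated point estimates.
historical = {
    region: historical_year(years)
    for region, years in TABLE_1_YEARS_BP.items()
}
```

Applied to the point estimates in Table 1, this reproduces the Year column (1052, 1239, 1384 and 1561).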
Table 2: Percentage of membership of 13 European countries (including six well-
defined geographic regions in Denmark shown in the inset) to each of K = 4 ancestral
populations

| Country/region | % Southern (blue*) | % Eastern (yellow*) | % Central (red*) | % Nordic (green*) |
|---|---|---|---|---|
| PT | 72.43 | 8.59 | 12.15 | 6.83 |
| ES | 78.32 | 3.57 | 9.25 | 8.86 |
| FR | 45.70 | 15.19 | 19.23 | 19.88 |
| BE | 35.64 | 14.06 | 20.84 | 29.46 |
| GB | 30.08 | 32.28 | 15.20 | 22.45 |
| NL | 24.88 | 20.06 | 23.62 | 31.44 |
| DE | 23.10 | 17.22 | 22.40 | 37.28 |
| AT | 20.28 | 17.44 | 19.88 | 42.39 |
| HU | 29.41 | 49.76 | 11.83 | 9.00 |
| PL | 3.59 | 76.19 | 9.83 | 10.38 |
| Ca | 5.39 | 21.27 | 24.94 | 48.40 |
| Ze | 7.73 | 18.90 | 25.67 | 47.71 |
| Fu | 6.51 | 20.36 | 28.28 | 44.85 |
| SJ | 5.94 | 19.49 | 28.94 | 45.64 |
| CJ | 7.34 | 13.35 | 30.30 | 49.01 |
| NJ | 5.82 | 13.23 | 32.35 | 48.60 |
| NO | 8.56 | 13.24 | 30.91 | 47.29 |
| SE | 4.98 | 17.39 | 27.27 | 50.35 |

*This table complements Figure 4; country/region abbreviations are explained in
Figure 2.