Supplementary Information for:
Ancestry, admixture and selection in Colombian genomes Lavanya Rishishwar1,2,3,†, Andrew B. Conley1,†, Charles H. Wigington1, Lu Wang1, Augusto Valderrama-Aguirre2,4,5 and I. King Jordan1,2,3,* 1School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA 2PanAmerican Bioinformatics Institute, Cali, Valle del Cauca, Colombia 3BIOS Center de Bioinformática y Biología Computacional, Manizales, Caldas, Colombia 4Biomedical Research Institute, Universidad Libre, Cali, Valle del Cauca, Colombia 5Regenerar - Center of Excellence for Regenerative and Personalized Medicine, Valle del Cauca, Colombia
Ancestral Populations from Around the WorldName Color Ancestral Origin
CEU Northern & Western Europe 85 23128226 Whole genome sequencing
CHB Beijing, China 92 23128226 Whole genome sequencing
Totonac & Bolivian Mexico & Bolivia 45 22606979 SNP microarray
YRI Nigeria 87 23128226 Whole genome sequencing
Admixed Populations from the Americas
CLM Medellin, Colombia 60 23128226 Whole genome sequencing
Colombian Colombia 26 20445096 Illumina 610K array
Dominican Republican Dominican Republic 27 20445096 Illumina 610K array
Ecuador Ecuador 20 20445096 Illumina 610K array
Mexican Mexico 112 20445096 Affymetrix GeneChip 500K Array Set
Puerto Rican Puerto Rico 27 20445096 Illumina 610K array
n Source (PMID) Platform
Name Color Country n Source (PMID) Platform
S1 Figure. World-wide source of genome sequence and genotype data analyzed in this study. The global locations of putative ancestral populations are shown – African (blue), Asian (red), European (orange) and Native American (purple) – along with the locations of admixed American populations (green). Information on the source populations, number of samples, supporting publication and genome characterization platform is provided. Note: The global map shown here was modified from http://commons.wikimedia.org/wiki/File:World_Map_Blank_-_with_blue_sea.svg, which has been released into the public domain and thus free to be used by anyone for any purpose.
PC1 (81.07%)
PC2
(10.
54%
)
−15 −10 −5 0 5 10 15 20 25 30
−10
−50
510
European
East Asian
African
Medellin
Totonac &Bolivian
PC3 (3.79%)
PC4
(0.3
0%)
−15 −10 −5 0 5 10 15 20 25 30
−10
−50
510
European
East Asian
African
Medellin
Totonac &Bolivian
PC5 (0.11%)
PC6
(0.0
9%)
−15 −10 −5 0 5 10 15 20 25 30
−10
−50
510
European
African
Medellin
Totonac &Bolivian
PC7 (0.06%)
PC8
(0.0
6%)
−15 −10 −5 0 5 10 15 20 25 30
−10
−50
510
European
African
Medellin
Totonac &Bolivian
(A) (B)
(C) (D)
S2 Figure. PCA of pairwise allele sharing distances among admixed Colombian genomes from Medellin compared to putative ancestral populations from Africa, the Americas, Asia and Europe. The panels have comparisons of the successive principal components (PCs): (A) PC1-PC2, (B) PC3-PC4, (C) PC5-PC6 and (D) PC7-PC8.
Ancestry-specific Segment Size
Fre
quen
cy
0 5000 10000 15000
0
5000
10000
15000
20000
S3 Figure. Size distribution of the ancestry-specific chromosomal segments used for the ancestry enrichment analysis.
Scalechr2:
1 Mb hg19108,500,000 109,000,000 109,500,000 110,000,000 110,500,000
RefSeq Genes
Num
ber o
fC
hrom
osom
esLo
g Fo
ldEn
richm
ent
0.85
-1.57
0
86
7
Scalechr2:
2 Mb hg19107,000,000 108,000,000 109,000,000 110,000,000 111,000,000 112,000,000 113,000,000
RefSeq Genes
95
71.48
-1.87
0
Num
ber o
fC
hrom
osom
esLo
g Fo
ldEn
richm
ent
EuropeanAfrican Native American
EDAR
S4 Figure. Example of an Asian (American)-depleted region found at the EDAR locus. Population-wide chromosome counts from the three ancestry-components are shown above the genomic axis and log-fold enrichment values (observed counts/genomic average counts) for the ancestry components are shown below the axis. The upper panel shows a ~7Mb region surrounding the locus on chromosome 2, and the lower panel shows a zoomed in view centered on the Asian (American) ancestry-depleted gene EDAR.