Page 1: Recombination based population genomics Jaume Bertranpetit Marta Melé Francesc Calafell Asif Javed Laxmi Parida

Recombination based population genomics

Jaume Bertranpetit

Marta Melé

Francesc Calafell

Asif Javed

Laxmi Parida

Page 2:

Recall: IRiS

Identification of Recombinations in Sequences

IRiS is a computational method, developed with biological insight, that

• detects evidence of historical recombinations

• minimizes the number of recombinations in the Ancestral Recombination Graph (ARG)

Page 3:

Recotypes

Figure legend: recombination edge, mutation edge, extant sequence.

Two chromosomes share a recombination if the junction is co-inherited.

Page 6:

Recotypes

Two chromosomes share a recombination if the junction is co-inherited.

Figure: recombinations r1 and r2, extant sequences a, b, c.

     r1  r2  …
a     1   0
b     1   0
c     0   1
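The table above can be reproduced with a minimal sketch. The function name `build_recotype_matrix` and the input layout (a mapping from each recombination to the set of chromosomes that co-inherit its junction) are assumptions made for illustration; they are not part of IRiS.

```python
def build_recotype_matrix(carriers, chromosomes):
    """Return the column order and {chromosome: [0/1 per recombination]}."""
    recombinations = sorted(carriers)
    matrix = {
        chrom: [1 if chrom in carriers[rec] else 0 for rec in recombinations]
        for chrom in chromosomes
    }
    return recombinations, matrix


# Hypothetical input mirroring the slide: a and b share r1, c carries r2.
carriers = {"r1": {"a", "b"}, "r2": {"c"}}
columns, recotypes = build_recotype_matrix(carriers, ["a", "b", "c"])
for chrom, row in recotypes.items():
    print(chrom, row)
```

Each row of the resulting matrix is the recotype of one chromosome.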

Page 7:

Validity of inferred recombinations

Comparison with sperm typing

Computer simulated recombinations

Page 8:

in vitro

Chr 1 near the MS32 minisatellite (Jeffreys et al. 2005)

80 UK semen donors of North European origin

• Sperm typing

• LDhat and Phase (200 SNPs)

HapMap 2 CEU population, similar SNP density

Figure panels: IRiS, LDhat, Phase, sperm typing.

Page 15:

in silico

HapMap 3 X chromosome data

• Select 2 chromosomes at random.

• Pick a random breakpoint.

• Create a new chromosome.

• Check that it is unique; if so, add it to the dataset.

• Run IRiS on the dataset to see if the breakpoint is detected.

Figure: chromosomes → IRiS → recombination detected?

Results:

• 69% of recombinations detected

• All detected recombinations identify the correct sequence

• No false positives
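The first four steps of the simulation can be sketched as follows. This is an illustration under stated assumptions: haplotypes are equal-length 0/1 strings, `simulate_recombinant` is a hypothetical helper, and the toy data are not HapMap data. The last step, running IRiS on the augmented dataset, is omitted because the slides do not describe its interface.

```python
import random


def simulate_recombinant(dataset, rng):
    """Try to add one artificial recombinant; return (parents, breakpoint) or None."""
    i, j = rng.sample(range(len(dataset)), 2)   # two distinct chromosomes
    left, right = dataset[i], dataset[j]
    breakpoint = rng.randrange(1, len(left))    # junction strictly inside the sequence
    child = left[:breakpoint] + right[breakpoint:]
    if child in dataset:                        # not unique: skip this attempt
        return None
    dataset.append(child)
    return (i, j), breakpoint


rng = random.Random(0)
# Toy haplotypes (hypothetical data).
haplotypes = ["00000000", "11111111", "00110011", "01010101"]
result = None
while result is None:                           # retry until a unique recombinant appears
    result = simulate_recombinant(haplotypes, rng)
parents, bp = result
print(parents, bp, haplotypes[-1])
```

Recording the parents and the breakpoint is what makes it possible to check afterwards whether a detected recombination points to the correct sequence.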

Page 16:

Recombinomics

• Strong population structure

• Agreement with traditional methods: FST vs. recombinational distance

• More informative than SNPs: STRUCTURE, PCA

Page 17:

Regions

18 regions selected from the HapMap 3 X chromosome in males (to avoid phasing errors)

• 50 kb away from known CNV and SD (to avoid genotyping errors)

• 50 kb away from genes (to avoid selection)

• at least 80 SNPs

Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)

Page 19:

Analysis

For each region, IRiS inferred recotypes for each chromosome.

• 5166 recombinations were inferred

• 3459 co-occurred in at least two chromosomes

Rows are chromosomes, columns are recombinations; each row is a recotype.

        r1  r2  r3  r4  r5  r6  …  r3459
LK1      0   1   1   0   0   0  …      0
LK2      1   0   1   1   0   0  …      0
:
LK43     1   0   1   0   0   0  …
MK1      0   1   0   0   1   1  …      1
:
TI40     0   0   0   0   0   1  …      0
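Keeping only recombinations that co-occur in at least two chromosomes amounts to a column filter on the recotype matrix. The sketch below assumes the matrix is stored as a dictionary of 0/1 rows; `keep_shared` and the toy values are hypothetical.

```python
def keep_shared(matrix, min_carriers=2):
    """Drop recombination columns carried by fewer than min_carriers chromosomes."""
    n_cols = len(next(iter(matrix.values())))
    counts = [sum(row[k] for row in matrix.values()) for k in range(n_cols)]
    kept = [k for k, count in enumerate(counts) if count >= min_carriers]
    filtered = {name: [row[k] for k in kept] for name, row in matrix.items()}
    return kept, filtered


# Toy recotypes (hypothetical values): the last column is private to chr3.
toy = {
    "chr1": [1, 0, 0],
    "chr2": [1, 1, 0],
    "chr3": [0, 1, 1],
}
kept, shared = keep_shared(toy)
print(kept, shared)
```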

Page 21:

Agreement with LDhat

Figure: scatter plot. x-axis: number of recombinations inferred by IRiS; y-axis: recombination rate inferred by LDhat. Each point represents a short haplotype segment in the HapMap CEU population.

Spearman correlation = 0.711, p-value < 10^-30

Correlation in hotspots: χ² = 38.39, p-value < 6×10^-10
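For readers unfamiliar with the statistic, Spearman's correlation is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python illustration with made-up numbers; it does not reproduce the 0.711 reported on the slide, and the variable names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up values: recombination counts per segment and rate estimates per segment.
iris_counts = [0, 2, 5, 1, 8]
ldhat_rates = [0.1, 0.4, 0.9, 0.3, 2.5]
print(spearman(iris_counts, ldhat_rates))
```

Because only ranks are compared, the statistic suits a comparison between a count (IRiS) and a rate (LDhat) that are on different scales.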

Page 22:

Recombinational distance between populations

Two genetically closer populations will share a higher number of recombinations.

Recombinational distance:

$$D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$$

Correlation between FST distance and recombinational distance for the 18 regions: [0.35 – 0.75] with p-values < 0.025

MDS, all regions combined: stress = 6.1%
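A small sketch of the distance, reading the slide's formula as D_AB = 1 - R_AB / (R_A + R_B - R_AB). The slide does not define the terms; the code assumes R_A and R_B are the numbers of distinct recombinations observed in populations A and B, and R_AB the number observed in both. The function name and the toy sets are hypothetical.

```python
def recombinational_distance(recs_a, recs_b):
    """1 - shared / (R_A + R_B - shared), for two sets of recombination ids."""
    r_a, r_b = len(recs_a), len(recs_b)
    r_ab = len(recs_a & recs_b)          # recombinations seen in both populations
    union = r_a + r_b - r_ab
    if union == 0:
        raise ValueError("no recombinations observed in either population")
    return 1 - r_ab / union


pop_a = {"r1", "r2", "r3"}
pop_b = {"r2", "r3", "r4", "r5"}
print(recombinational_distance(pop_a, pop_b))  # 1 - 2/5
```

Under this reading the distance is 0 when two populations carry exactly the same recombinations and 1 when they share none, which matches the statement that closer populations share more recombinations.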

Page 24:

PCA of population data

Recall recotypes (one row per chromosome):

        r1  r2  r3  r4  r5  r6  …  r3459
LK1      0   1   1   0   0   0  …      0
LK2      1   0   1   1   0   0  …      0
:
LK43     1   0   1   0   0   0  …
MK1      0   1   0   0   1   1  …      1
:
TI40     0   0   0   0   0   1  …      0

Counts per population (one row per population):

        r1  r2  r3  r4  r5  r6  …  r3459
LK      14   7   4   9   0   1  …      0
MK       1   4   7   0   5   7  …     24
:
TI       0   1   7   1   0   0  …      1
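One way to go from per-chromosome recotypes to the per-population table is to add up the rows of all chromosomes from the same population. This is an assumption about how the counts were obtained; the slide shows only the two tables. `pool_by_population` and the toy values are hypothetical.

```python
def pool_by_population(recotypes, population_of):
    """Sum the binary recotype rows of all chromosomes in the same population."""
    pooled = {}
    for chrom, row in recotypes.items():
        pop = population_of[chrom]
        if pop not in pooled:
            pooled[pop] = [0] * len(row)
        pooled[pop] = [total + x for total, x in zip(pooled[pop], row)]
    return pooled


# Toy recotypes (hypothetical values) and their population labels.
toy_recotypes = {
    "LK1": [0, 1, 1],
    "LK2": [1, 0, 1],
    "MK1": [0, 1, 0],
}
population_of = {"LK1": "LK", "LK2": "LK", "MK1": "MK"}
pooled = pool_by_population(toy_recotypes, population_of)
print(pooled)
```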

Page 25:

PCA of population data

Input: per-population recombination counts (rows LK, MK, …, TI; columns r1 … r3459).

The first two PCs capture 66.4% of the variance.

Page 26:

PCA of recotypes

more on this later

Page 27:

Recotypes vs. SNPs

Due to ascertainment bias, gene diversity does not reflect population structure.

Percentage of variance

                      SNPs   Recotypes
Across groups           9%          6%
Within groups           4%          1%
Within populations     87%         93%

Normalized comparison, linearly scaled to [0,1], using 21 samples per population.

In agreement with Lewontin 72; results similar to Conrad 07.

Page 28:

from SNPs to haplotypes to recotypes (a STRUCTURE comparison), K=2

Figure panels: SNPs, haplotypes, recotypes.

Page 29:

from SNPs to haplotypes to recotypes (a STRUCTURE comparison), K=3

Figure panels: SNPs, haplotypes, recotypes.

Page 30:

from SNPs to haplotypes to recotypes (a STRUCTURE comparison), K=4

Figure panels: SNPs, haplotypes, recotypes.

Page 31:

from SNPs to haplotypes to recotypes (a STRUCTURE comparison), K=5

Figure panels: SNPs, haplotypes, recotypes.

Page 32:

Africa within global genetic variation

Figure: average number of recombinations in 21 random chromosomes.

Figure: STRUCTURE, K=4.

• Out of Africa hypothesis

• Founder effect

• Minority African-specific component

Page 33:

Genetic variation within Africa

Figure: STRUCTURE, K=5 (Maasai-specific minor component).

• Sub-Saharan Maasai are distinct among Africans.

• African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).

Page 34:

Genetic variation outside Africa

Figure: STRUCTURE, K=5.

Figure: average number of recombinations in 21 random chromosomes.

• Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.

• Gujarati Indians occupy an intermediate position between Europeans and East Asians.

Page 35:

Venturing outside the X-chromosome

Benefits

• The bigger picture

• More regions and hence more information

Challenges

• A higher number of recombinations makes the picture murkier

• Phasing errors

Page 36:

Regions

81 regions selected from HapMap 3

• 50 kb away from known CNV and SD (to avoid genotyping errors)

• 50 kb away from genes (to avoid selection)

• at least 200 SNPs

• 25 samples per population (each sample has two chromosomes)

Page 37:

Analysis

• For each region, IRiS inferred recotypes for each chromosome.

• 34140 recombinations were inferred.

• For each sample, the two recotypes were merged.

Figure: PCA plots (SNPs vs. recotypes).

Page 38:

Quantifying population structure

PCA combined with k nearest neighbors is used to predict the population of every sample.

Figure: population tree.

Africans: MKK, LWK, YRI, ASW

Non-Africans: GIH, E. Asian (CHB+CHD, JPT), MEX, European (CEU, TSI)

Misclassification by (recotypes, SNPs): (4,3), (0,7), (3,13), (8,13)

Legend: perfectly classified; classified with errors.
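The classification step can be illustrated with a minimal k-nearest-neighbors sketch on PCA coordinates. The slide does not state the value of k, the number of PCs used, or the evaluation scheme; the leave-one-out loop, the function names, and the toy coordinates below are all assumptions.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (coords, label). Majority vote among the k nearest points."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_errors(samples, k=3):
    """Count samples whose population is mispredicted from all other samples."""
    errors = 0
    for i, (coords, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_predict(rest, coords, k) != label:
            errors += 1
    return errors


# Hypothetical 2-D PCA coordinates with population labels.
samples = [
    ((0.0, 0.1), "CEU"), ((0.1, 0.0), "CEU"), ((0.2, 0.1), "CEU"),
    ((5.0, 5.1), "JPT"), ((5.1, 5.0), "JPT"), ((5.2, 5.1), "JPT"),
]
print(leave_one_out_errors(samples, k=1))
```

Running the same procedure once on recotype-based coordinates and once on SNP-based coordinates would give a pair of error counts comparable in form to the (recotypes, SNPs) pairs on the slide.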

Page 39:

East Asian population

Recotypes are more informative of underlying population structure.

Figure: PCA plots (SNPs vs. recotypes).

Page 40:

in conclusion …

Recotypes

• show strong agreement with in silico and in vitro recombination rate estimates

• are highly informative of the underlying population structure

• provide a novel approach to study recombinational dynamics