polymorphism structure of the human genome gabor t. marth department of biology boston college...

Post on 20-Dec-2015


Polymorphism Structure of the Human Genome

Gabor T. Marth

Department of Biology, Boston College, Chestnut Hill, MA 02467

Human variation structure is heterogeneous

chromosomal averages

polymorphism density along chromosomes

Heterogeneity at the level of distributions

[Figure: distributions of marker density (“sparse” to “dense”) and of allele frequency (“rare” to “common”)]

What explains nucleotide diversity?

[Figure: SNP rate (per 10,000 bp) plotted against G+C content (%), CpG content (%), and recombination rate (per Mb)]

G+C nucleotide content

CpG di-nucleotide content

recombination rate

functional constraints

3’ UTR           5.00 x 10^-4
5’ UTR           4.95 x 10^-4
Exon, overall    4.20 x 10^-4
Exon, coding     3.77 x 10^-4

synonymous       366 / 653
non-synonymous   287 / 653

Variance is so high that these quantities are poor predictors of nucleotide diversity in local regions. Hence random processes are likely to govern the basic shape of the genome variation landscape: (random) genetic drift.

Components of drift: Genealogy

present generation

randomly mating population, genealogy evolves in a non-deterministic fashion

Components of drift: Mutation

mutations randomly “drift”: they die out, rise to higher frequency, or become fixed
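The drift of a single mutation can be illustrated with a small forward simulation. This is a minimal sketch, assuming a haploid Wright-Fisher population of constant size with no selection; the function names (`wright_fisher`, `fate`) and all parameter values are illustrative assumptions, not taken from the slides.

```python
import random


def wright_fisher(pop_size, start_count, max_generations, rng):
    """Follow the copy number of one mutation through successive generations."""
    count = start_count
    trajectory = [count]
    for _ in range(max_generations):
        if count == 0 or count == pop_size:
            break  # the mutation has died out or become fixed
        freq = count / pop_size
        # Each of the pop_size offspring copies carries the mutation
        # with probability equal to its current frequency.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(count)
    return trajectory


def fate(trajectory, pop_size):
    """Classify the end state of a trajectory."""
    last = trajectory[-1]
    if last == 0:
        return "lost"
    if last == pop_size:
        return "fixed"
    return "segregating"


rng = random.Random(1)  # fixed seed so the run is repeatable
fates = [fate(wright_fisher(100, 1, 10_000, rng), 100) for _ in range(200)]
print({outcome: fates.count(outcome) for outcome in set(fates)})
```

Most new mutations are expected to be lost within a few generations; under this model a single new copy fixes with probability 1 / pop_size.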

Modulators: Changing population size

mutations randomly “drift”: they die out, rise to higher frequency, or become fixed

genetic bottleneck

Modulators: Population subdivision

subdivision

subdivision promotes private polymorphisms, and skews allele frequency

Modulators: Recombination

[Figure: recombination between the haplotypes accgttatgcaga and acagttatgtaga yields the recombinant haplotypes acagttatgcaga and accgttatgtaga]

recombination

different nucleotide sites within the same DNA segment no longer share the same genealogy

Modulators: Natural selection

negative (purifying) selection

positive selection

the genealogy is no longer independent of (and hence cannot be decoupled from) the mutation process

Modeling ancestral processes

“forward simulations”

the “Coalescent” process

By focusing on a small sample, the complexity of the relevant part of the ancestral process is greatly reduced. There are, however, limitations.
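The reduction in complexity can be seen in how little is needed to simulate the genealogy of a sample backward in time. The sketch below assumes the standard neutral coalescent for a constant-size, randomly mating population with no recombination, with time measured in units of 2N generations; the function names are illustrative assumptions.

```python
import random


def coalescent_times(sample_size, rng):
    """Waiting times between coalescence events, from sample_size lineages down to 1."""
    times = []
    k = sample_size
    while k > 1:
        # With k lineages, any of the k*(k-1)/2 pairs can coalesce next.
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
        k -= 1
    return times


def tmrca(times):
    """Time to the most recent common ancestor of the whole sample."""
    return sum(times)


def total_branch_length(sample_size, times):
    """Total tree length: the interval with k lineages contributes k * T_k."""
    return sum(k * t for k, t in zip(range(sample_size, 1, -1), times))


rng = random.Random(7)  # fixed seed so the run is repeatable
n = 10
reps = 20_000
mean_tmrca = sum(tmrca(coalescent_times(n, rng)) for _ in range(reps)) / reps
# Under these assumptions the theoretical expectation is 2 * (1 - 1/n).
print(mean_tmrca, 2 * (1 - 1 / n))
```

Only n - 1 random draws are needed per genealogy, regardless of the population size. Mutations can then be placed on the branches in proportion to their length.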

Inferences from variation data

larger population size (N) -> more mutations -> higher diversity (θ)

larger mutation rate (μ) -> more mutations -> higher diversity (θ)

higher diversity -> larger population size OR higher mutation rate (θ = 4Nμ)
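The ambiguity in the last inference follows directly from the formula: N and μ enter θ only as a product. A small worked example, in which the population sizes and mutation rates are hypothetical values chosen purely for illustration:

```python
import math


def theta(pop_size, mutation_rate):
    """Diversity parameter θ = 4Nμ."""
    return 4 * pop_size * mutation_rate


# Hypothetical scenarios (not from the slides): halving the mutation rate
# while doubling the population size leaves θ unchanged.
scenario_a = theta(pop_size=10_000, mutation_rate=2e-8)
scenario_b = theta(pop_size=20_000, mutation_rate=1e-8)

print(scenario_a, scenario_b)
print(math.isclose(scenario_a, scenario_b))
```

Diversity data alone therefore cannot separate the two explanations; an independent estimate of either N or μ is needed.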

Ancestral inference: modeling

[Figure: population histories from past to present (stationary, expansion, collapse, bottleneck), each shown with its marker density distribution (MD, by simulation) and allele frequency spectrum (AFS, direct form)]
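For the stationary history, the allele frequency spectrum has a simple closed form under the standard neutral model: the expected number of sites at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i. The sketch below covers only this stationary case and folds the spectrum to minor allele counts. It is an assumption that this is what "direct form" refers to, and the function name and the sample size of 20 are illustrative.

```python
def expected_folded_afs(sample_size):
    """Expected fraction of SNPs at each minor allele count (stationary, neutral)."""
    n = sample_size
    weights = []
    for i in range(1, n // 2 + 1):
        if i == n - i:
            # Both alleles have the same count, so there is only one class to fold.
            weights.append(1 / i)
        else:
            # Derived counts i and n - i share the same minor allele count.
            weights.append(1 / i + 1 / (n - i))
    total = sum(weights)
    return [w / total for w in weights]


# A sample of 20 chromosomes gives minor allele counts 1 through 10.
for count, fraction in enumerate(expected_folded_afs(20), start=1):
    print(count, round(fraction, 3))
```

Rare alleles dominate the stationary spectrum. Departures from this shape are what make the spectrum informative about population history.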

Ancestral inference: model fitting

[Figure: allele frequency spectrum by minor allele count (1 to 10), with fitted models: bottleneck; modest but uninterrupted expansion]

Allelic association

accgttatgcaga

acagttatgtaga

acagttatgcaga

accgttatgtaga

possible allele combinations (2-marker haplotypes)

higher recombination rate (r)

Allelic association: LD

[Figure: r² plotted against distance (kb) and recombination fraction, for European, Asian, and African American samples]

measure of allelic association: “linkage disequilibrium (LD)”
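One common LD statistic is r², the squared correlation between the alleles at two markers. The sketch below computes it from a list of 2-marker haplotypes. The function name and the example haplotype lists are illustrative assumptions, using the c/a and c/t alleles of the two variable sites in the sequences shown earlier.

```python
def r_squared(haplotypes, allele_1, allele_2):
    """r² between two markers, given (marker-1 allele, marker-2 allele) pairs."""
    n = len(haplotypes)
    p1 = sum(1 for a, _ in haplotypes if a == allele_1) / n
    p2 = sum(1 for _, b in haplotypes if b == allele_2) / n
    p12 = sum(1 for a, b in haplotypes if a == allele_1 and b == allele_2) / n
    denominator = p1 * (1 - p1) * p2 * (1 - p2)
    if denominator == 0:
        raise ValueError("both markers must be polymorphic in the sample")
    # D measures how far the haplotype frequency departs from independence.
    d = p12 - p1 * p2
    return d * d / denominator


# Only the two original haplotypes are present: complete association.
no_recombinants = [("c", "c")] * 5 + [("a", "t")] * 5
# All four allele combinations are equally frequent: no association.
all_combinations = [("c", "c"), ("a", "t"), ("a", "c"), ("c", "t")] * 5

print(r_squared(no_recombinants, "c", "c"))
print(r_squared(all_combinations, "c", "c"))
```

As recombination generates the missing allele combinations, r² falls from 1 toward 0.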

Haplotype structure

“haplotype block”
