Association Studies, Haplotype Blocks and Tagging SNPs Prof. Sorin Istrail

Post on 14-Dec-2015


Association studies

[Figure: individuals grouped as Disease / Responder versus Control / Non-responder, each marked by its allele (Allele 0 or Allele 1) at Marker A. Marker A is associated with the phenotype.]

• Evaluate whether nucleotide polymorphisms associate with phenotype

T A G A A
C G G A A
C G T A A
T A T C G
T G T A G
T G G A G
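As a rough illustration of what "associate with phenotype" can mean in practice, the sketch below computes a Pearson chi-square statistic on a 2x2 table of allele counts (cases versus controls, Allele 0 versus Allele 1). The counts are hypothetical and chosen only for illustration, and the function name `chi_square_2x2` is an assumption, not something taken from the slides.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows are phenotype groups (cases, controls); columns are alleles (0, 1).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical allele counts at Marker A (not real data).
cases_allele0, cases_allele1 = 30, 70
controls_allele0, controls_allele1 = 60, 40

stat = chi_square_2x2(cases_allele0, cases_allele1,
                      controls_allele0, controls_allele1)
print(round(stat, 2))  # 18.18

# With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05.
print(stat > 3.84)  # True
```

A table with identical allele proportions in both groups gives a statistic of 0, which is the no-association case.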


Hypothesis – Haplotype Blocks?

The genome consists largely of blocks of common SNPs with relatively little recombination within the blocks (Patil et al., Science, 2001; Jeffreys et al., Nature Genetics, 2001; Daly et al., Nature Genetics, 2001).

[Figure: a 200 kb region showing sense genes, antisense genes, DNA, SNPs, and haplotype blocks, with labels 1 to 4.]

Haplotype Block Structure: LD-Blocks and 4-Gamete Test Blocks

One definition of block

• Based on the Four Gamete test.

• Intuition: when all four gametes are observed between two SNPs, there is a recombination point somewhere between the two sites.

Four Gamete Block Test

• Hudson and Kaplan 1985

A segment of SNPs is a block if, between every pair of SNPs, at most 3 of the 4 gametes (00, 01, 10, 11) are observed.

BLOCK:

0 0 1
0 1 1
1 1 0
1 1 1

VIOLATES THE BLOCK DEFINITION:

0 0 1
0 1 1
1 1 0
1 0 1
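The block definition above can be checked mechanically. The following is a minimal sketch, assuming biallelic SNPs and haplotypes written as strings of 0s and 1s (one string per haplotype, one character per SNP); the function names are illustrative.

```python
from itertools import combinations


def gametes(haplotypes, i, j):
    """Set of distinct allele pairs observed at sites i and j."""
    return {(h[i], h[j]) for h in haplotypes}


def is_four_gamete_block(haplotypes):
    """True if every pair of sites shows at most 3 of the 4 gametes."""
    n_sites = len(haplotypes[0])
    return all(
        len(gametes(haplotypes, i, j)) <= 3
        for i, j in combinations(range(n_sites), 2)
    )


block = ["001", "011", "110", "111"]
not_block = ["001", "011", "110", "101"]

print(is_four_gamete_block(block))      # True
print(is_four_gamete_block(not_block))  # False
print(sorted(gametes(not_block, 0, 1)))
# [('0', '0'), ('0', '1'), ('1', '0'), ('1', '1')]
```

In the second matrix the first two sites already show all four gametes, which is why it fails the test.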

Finding Recombination Hotspots: Many Possible Partitions into Blocks

A C T A G A T A G C C T
G T T C G A C A A C A T
A C T C T A T G A T C G
G T T A T A C G A C A T
A C T C T A T A G T A T
A C T A G C T G G C A T

All four gametes are present for some pairs of sites. Each such pair is a constraint: a recombination site must fall between its two sites.

Find the left-most right endpoint of any constraint and mark the site before it as a recombination site.

Eliminate any constraints crossing that site.

Repeat until all constraints are gone.

The final result is a minimum-size set of sites crossing all constraints.
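The greedy procedure above can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: SNPs are biallelic, a constraint is a pair of sites (i, j) with i < j showing all four gametes, and a recombination site is represented as a boundary b, meaning a break between site b - 1 and site b (0-based). The function names are hypothetical.

```python
from itertools import combinations


def four_gamete_constraints(haplotypes):
    """Pairs of sites (i, j), i < j, at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if len({(h[i], h[j]) for h in haplotypes}) == 4
    ]


def greedy_boundaries(constraints):
    """Pick boundaries so that every constraint is crossed by at least one.

    A boundary b is a break between site b - 1 and site b.
    """
    boundaries = []
    remaining = sorted(constraints, key=lambda c: c[1])
    while remaining:
        # Left-most right endpoint; place the break just before that site.
        b = remaining[0][1]
        boundaries.append(b)
        # Drop every constraint that this boundary crosses.
        remaining = [(i, j) for (i, j) in remaining if not (i < b <= j)]
    return boundaries


haplotypes = ["001", "011", "110", "101"]
constraints = four_gamete_constraints(haplotypes)
print(constraints)                     # [(0, 1)]
print(greedy_boundaries(constraints))  # [1]

# Hand-written constraints, to show several being removed by one boundary.
print(greedy_boundaries([(0, 3), (1, 2), (4, 6), (5, 6)]))  # [2, 6]
```

Taking the left-most right endpoint first is the usual greedy argument for stabbing intervals with the fewest points: no boundary further right could cross that constraint, and any boundary further left crosses no constraint that this one misses.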

Tagging SNPs

ACGATCGATCATGAT

GGTGATTGCATCGAT

ACGATCGGGCTTCCG

ACGATCGGCATCCCG

GGTGATTATCATGAT

A------A---TG--

G------G---CG--

A------G---TC--

A------G---CC--

G------A---TG--
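The masked rows above keep only four columns of each haplotype. A short sketch, assuming 0-based column indices 0, 7, 11 and 12 for those four positions, reproduces the masked view and checks that the kept columns still tell all five haplotypes apart. The helper names `mask` and `is_tagging_set` are illustrative.

```python
haplotypes = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]

# 0-based positions of the four columns left visible in the masked rows.
tag_sites = [0, 7, 11, 12]


def mask(haplotype, sites):
    """Replace every character outside `sites` with '-'."""
    keep = set(sites)
    return "".join(c if i in keep else "-" for i, c in enumerate(haplotype))


def is_tagging_set(haps, sites):
    """True if the alleles at `sites` alone distinguish all distinct haplotypes."""
    projected = {tuple(h[i] for i in sites) for h in haps}
    return len(projected) == len(set(haps))


for h in haplotypes:
    print(mask(h, tag_sites))

print(is_tagging_set(haplotypes, tag_sites))  # True
print(is_tagging_set(haplotypes, [0]))        # False
```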

An example of a real data set and its haplotype block structure. Colors refer to the founding population, one color for each founding haplotype.

Only 4 SNPs are needed to tag all the different haplotypes.

Informativeness

A measure of the “information” a SNP contains about another SNP. Useful for designing SNP arrays and for selecting tagging SNPs.

[Figure: two haplotypes, h1 and h2 (0 1 0 0 1 and 0 1 1 0 0), and a SNP s.]

1 0 0 0 0
0 1 0 0 1
0 1 1 0 0
1 0 1 1 1
s1 s2 s3 s4 s5

I(s1,s2) = 2/4 = 1/2

I({s1,s2}, s4) = 3/4

I({s3,s4},{s1,s2,s5}) = 3

S = {s3,s4} is a Minimal Informative Subset
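One way to make these numbers concrete is sketched below. It rests on two assumptions that the slides do not spell out: a SNP "distinguishes" a pair of haplotypes when they carry different alleles at it, and I(S, t) is the fraction of haplotype pairs distinguished by t that are also distinguished by at least one SNP in S, with I(S, T) summing this over the targets in T. The 4 x 5 matrix in the code is hypothetical, chosen so that the three values come out as 1/2, 3/4 and 3; it is not a transcription of the slide's matrix.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical haplotype matrix: 4 haplotypes (rows) x 5 SNPs (columns s1..s5).
H = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]


def distinguished_pairs(H, s):
    """Pairs of haplotype indices that differ at SNP column s."""
    return {
        (a, b)
        for a, b in combinations(range(len(H)), 2)
        if H[a][s] != H[b][s]
    }


def informativeness(H, S, t):
    """Fraction of pairs distinguished by t that some SNP in S also distinguishes.

    Assumes t is polymorphic, so that it distinguishes at least one pair.
    """
    target = distinguished_pairs(H, t)
    covered = set().union(*(distinguished_pairs(H, s) for s in S)) & target
    return Fraction(len(covered), len(target))


def informativeness_of_set(H, S, T):
    """Sum of informativeness(H, S, t) over every target SNP t in T."""
    return sum(informativeness(H, S, t) for t in T)


s1, s2, s3, s4, s5 = range(5)
print(informativeness(H, [s1], s2))                       # 1/2
print(informativeness(H, [s1, s2], s4))                   # 3/4
print(informativeness_of_set(H, [s3, s4], [s1, s2, s5]))  # 3
```

A value of 3 over three targets means that the pair {s3, s4} distinguishes every haplotype pair that s1, s2 or s5 distinguishes.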

Graph theory insight

Minimum Set Cover = Minimum Informative Subset

[Figure: the haplotype matrix over SNPs s1 to s5, and a bipartite graph connecting the SNPs s1 to s5 with the edges e1 to e6.]

Minimum Set Cover {s3, s4} = Minimum Informative Subset
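The set-cover view can be sketched with a brute-force search. The assumptions are labeled: each "edge" is taken to be a pair of haplotypes, a SNP covers the edges whose two haplotypes differ at that SNP, and the 4 x 5 matrix is a hypothetical example rather than the slide's own. Minimum set cover is NP-hard in general, so exhaustive search like this is only reasonable for tiny inputs.

```python
from itertools import combinations

# Hypothetical haplotype matrix: 4 haplotypes (rows) x 5 SNPs (columns s1..s5).
H = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]


def edges_covered(H, s):
    """Edges (haplotype pairs) whose two haplotypes differ at SNP column s."""
    return {
        (a, b)
        for a, b in combinations(range(len(H)), 2)
        if H[a][s] != H[b][s]
    }


def covers_all(H, S):
    """True if the SNPs in S cover every edge that any SNP covers."""
    every_edge = set().union(*(edges_covered(H, s) for s in range(len(H[0]))))
    covered = set().union(*(edges_covered(H, s) for s in S))
    return covered == every_edge


def minimum_informative_subset(H):
    """Smallest set of SNP columns covering all edges, by exhaustive search."""
    n_snps = len(H[0])
    for size in range(n_snps + 1):
        for S in combinations(range(n_snps), size):
            if covers_all(H, S):
                return S
    return tuple(range(n_snps))


best = minimum_informative_subset(H)
print(len(best))               # 2
print(covers_all(H, (2, 3)))   # True: columns 2 and 3 are s3 and s4
print(covers_all(H, (0, 1)))   # False: s1 and s2 leave an edge uncovered
```

In this example more than one subset of size 2 covers every edge, so the search may return a pair other than {s3, s4}; the slide's phrase "a Minimal Informative Subset" allows for that.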