
Genome-wide association studies BNFO 602 Roshan

Uploaded by: juliet-mclaughlin

Posted on 17-Dec-2015


Page 1: Genome-wide association studies BNFO 602 Roshan. Application of SNPs: association with disease Experimental design to detect cancer associated SNPs: –Pick

Genome-wide association studies

BNFO 602

Roshan


Application of SNPs: association with disease

• Experimental design to detect cancer-associated SNPs:
  – Pick random humans with and without cancer (say, breast cancer)
  – Perform SNP genotyping
  – Look for SNPs associated with disease status
• This design is also called a genome-wide association study (GWAS)


Case-control example

• Study of 100 people:
  – Case: 50 subjects with cancer
  – Control: 50 subjects without cancer

• Count the copies of each allele in each group and form a contingency table (each subject carries two alleles, so each row sums to 100)

           #Allele1   #Allele2
Case             10         90
Control           2         98
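A minimal sketch of the counting step, assuming each subject's genotype at one SNP is coded as the number of copies of allele 1 it carries (0, 1 or 2). The helper name `allele_counts` and the toy genotype lists are hypothetical, not data from this study.

```python
def allele_counts(genotypes):
    """Return (#allele1, #allele2) for one group of diploid subjects.

    Assumes each genotype is coded as the number of copies of allele 1
    the subject carries: 0, 1 or 2.
    """
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0, 1 or 2")
    n_allele1 = sum(genotypes)
    n_allele2 = 2 * len(genotypes) - n_allele1  # two alleles per subject
    return n_allele1, n_allele2


# Hypothetical toy genotypes, not the study data from the slides
cases = [0, 1, 2, 0, 1]
controls = [0, 0, 1, 0, 0]

print("Case   ", allele_counts(cases))     # (4, 6)
print("Control", allele_counts(controls))  # (1, 9)
```

With 50 subjects per group, each row of counts produced this way sums to 100.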


Odds ratio

• Odds of allele 1 in cancer = a/b = e

• Odds of allele 1 in healthy = c/d = f

• Odds ratio of allele 1 in cancer vs healthy = e/f = (a/b)/(c/d) = ad/(bc)

           #Allele1   #Allele2
Cancer          a          b
Healthy         c          d
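The definition translates directly into code. This is a minimal sketch: `odds_ratio` is a hypothetical helper name, and it simply refuses tables with a zero in b, c or d rather than applying any correction.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of allele 1 in cancer vs healthy for the 2x2 table

                #Allele1  #Allele2
        Cancer      a         b
        Healthy     c         d
    """
    if b == 0 or c == 0 or d == 0:
        raise ValueError("odds ratio is undefined when b, c or d is zero")
    e = a / b  # odds of allele 1 in cancer
    f = c / d  # odds of allele 1 in healthy
    return e / f  # equivalent to (a * d) / (b * c)


print(round(odds_ratio(15, 35, 2, 48), 1))  # 10.3
```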


Example

• Odds of allele 1 in case = 15/35

• Odds of allele 1 in control = 2/48

• Odds ratio of allele 1 in case vs control = (15/35)/(2/48) ≈ 10.3

           #Allele1   #Allele2
Case             15         35
Control           2         48
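Clearing the nested fractions step by step shows where the value comes from:

$$\text{OR} = \frac{15/35}{2/48} = \frac{15 \times 48}{35 \times 2} = \frac{720}{70} \approx 10.29$$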


Statistical test of association (P-values)

• P-value = probability of observing data at least as extreme as the observed data, assuming the null hypothesis is true

• Example:
  – Suppose we are given a series of coin tosses
  – We suspect that a biased coin produced the tosses
  – We can ask: if the coin were fair, what is the probability of getting tosses at least as extreme as the ones observed?
  – If this probability is very small, the observed tosses would be unusual for a fair coin, which is evidence against the fair-coin hypothesis
  – In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin
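One way to make the definition concrete is to simulate the null hypothesis directly. The sketch below is a Monte Carlo approximation, not the exact binomial calculation used later in these slides: it draws many 20-toss sequences from a fair coin and reports the fraction with at least as many heads as observed. The function name, the 15-heads-in-20 observation and the number of simulations are illustrative choices.

```python
import random


def simulated_p_value(n_tosses, observed_heads, n_sims=50_000, seed=0):
    """Estimate P(#heads >= observed_heads) under a fair coin by simulation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    extreme = 0
    for _ in range(n_sims):
        # n_tosses uniform random bits = one sequence of fair tosses; 1 = heads
        heads = bin(rng.getrandbits(n_tosses)).count("1")
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_sims


print(simulated_p_value(20, 15))
```

The estimate should land near the exact value of about 0.021 for this data; more simulations shrink the Monte Carlo error.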


Binomial distribution

• Bernoulli random variable:
  – Two outcomes: success or failure
  – Example: coin toss

• Binomial random variable:
  – Number of successes in a series of independent Bernoulli trials

• Example:
  – Probability of heads = 0.5
  – Given four coin tosses, what is the probability of exactly three heads?
  – Possible outcomes: HHHT, HHTH, HTHH, THHH
  – Each outcome has probability 0.5^4
  – Total probability = 4 × 0.5^4 = 0.25
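The four-toss example can be checked by brute force: enumerate every sequence, keep those with exactly three heads, and multiply by the probability of a single sequence. A small sketch using only the standard library:

```python
from itertools import product

# All 2^4 = 16 equally likely sequences of four fair coin tosses
sequences = ["".join(s) for s in product("HT", repeat=4)]

# Keep only the sequences with exactly three heads
three_heads = [s for s in sequences if s.count("H") == 3]

p_each = 0.5 ** 4  # probability of any single sequence
print(three_heads)                # ['HHHT', 'HHTH', 'HTHH', 'THHH']
print(len(three_heads) * p_each)  # 0.25
```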


Binomial distribution

• Bernoulli trial: probability of success = p, probability of failure = 1 − p

• Given n independent Bernoulli trials, what is the probability of k successes?

$$P(k \text{ successes}) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$

• Binomial applet: http://www.stat.tamu.edu/~west/applets/binomialdemo.html
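The formula maps one-to-one onto code. A minimal sketch (the name `binomial_pmf` is our own; `math.comb` needs Python 3.8 or later):

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_pmf(3, 4, 0.5))  # 0.25, the four-toss example
# Sanity check: the probabilities over k = 0..n sum to (approximately) 1
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```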


Hypothesis testing under Binomial hypothesis

• Null hypothesis: fair coin (probability of heads = probability of tails = 0.5)

• Data: HHHHTHTHHHHHHHTHTHTH (20 tosses, 15 heads)

• P-value under the null hypothesis = probability that #heads >= 15 in 20 fair tosses

• This probability is 0.021

• Since it is below 0.05, we can reject the null hypothesis at the 0.05 significance level
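The 0.021 figure can be reproduced exactly by summing the binomial upper tail. This sketch uses `fractions.Fraction` to keep the arithmetic exact; the function name is our own, and the test is one-sided (it only counts "too many heads" as extreme).

```python
from fractions import Fraction
from math import comb


def upper_tail_p_value(n, observed, p=Fraction(1, 2)):
    """One-sided P(#successes >= observed) for a Binomial(n, p) variable."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(observed, n + 1))


data = "HHHHTHTHHHHHHHTHTHTH"
heads = data.count("H")
p_value = upper_tail_p_value(len(data), heads)
print(heads, p_value, round(float(p_value), 3))  # 15 5425/262144 0.021
```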


Null hypothesis for case control contingency table

• We have two random variables:
  – X: disease status
  – A: allele type

• Null hypothesis: the two variables are independent of each other (unrelated)

• Under independence, P(X=case and A=1) = P(X=case)P(A=1)

• Expected number of cases with allele 1 is P(X=case)P(A=1)N, where N is the total number of observations

• P(X=case) = (a+b)/N
• P(A=1) = (a+c)/N
• What is the expected number of controls with allele 2?
• Do the probabilities sum to 1?

           #allele1   #allele2
case            a          b
control         c          d
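The independence recipe can be applied mechanically to all four cells, which also answers the two questions above numerically. A sketch using the case/control counts from the example slide as illustrative input; the helper name and the dictionary layout are our own choices.

```python
def independence_expectations(a, b, c, d):
    """Joint cell probabilities and expected counts under independence.

    Table layout (rows: disease status X, columns: allele type A):
                  #allele1  #allele2
        case          a         b
        control       c         d
    """
    n = a + b + c + d
    p_case, p_control = (a + b) / n, (c + d) / n      # marginals of X
    p_allele1, p_allele2 = (a + c) / n, (b + d) / n   # marginals of A
    probs = {
        ("case", 1): p_case * p_allele1,
        ("case", 2): p_case * p_allele2,
        ("control", 1): p_control * p_allele1,
        ("control", 2): p_control * p_allele2,
    }
    expected = {cell: prob * n for cell, prob in probs.items()}
    return probs, expected


probs, expected = independence_expectations(15, 35, 2, 48)
print(sum(probs.values()))       # 1, up to floating-point rounding
print(expected[("control", 2)])  # expected controls with allele 2
```

The four probabilities sum to 1 because they factor as (P(case) + P(control)) × (P(A=1) + P(A=2)).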


Chi-square statistic

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

O_i = observed frequency (count) of the ith outcome
E_i = expected frequency (count) of the ith outcome
n = number of possible outcomes (categories)

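For large expected counts, this statistic approximately follows a chi-square distribution with n − 1 degrees of freedom when the expected frequencies are fully specified in advance. Each parameter estimated from the data removes one further degree of freedom; for an r × c contingency table tested for independence this leaves (r − 1)(c − 1) degrees of freedom, i.e. 1 for a 2 × 2 table. A proof can be found at http://ocw.mit.edu/NR/rdonlyres/Mathematics/18-443Fall2003/4226DF27-A1D0-4BB8-939A-B2A4167B5480/0/lec23.pdf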
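The statistic itself is a one-line sum. As an illustration (our own, not from the slides), it is applied here to the 20 coin tosses: 15 heads and 5 tails observed, against the 10 and 10 a fair coin predicts, so n = 2 outcomes and 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum over outcomes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 15 heads and 5 tails observed; a fair coin predicts 10 and 10
print(chi_square_statistic([15, 5], [10, 10]))  # 5.0
```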


Chi-square

• Using chi-square, we can test how well the observed values fit the expected values computed under the independence hypothesis

• We could also test the data directly under a multinomial or multivariate normal distribution with probabilities given by the independence assumption. This would require the cumulative distribution functions of the multinomial and multivariate normal, which are hard to compute.

• Chi-square p-values are easier to compute


Case control

           #allele1   #allele2
case            a          b
control         c          d

N = a + b + c + d
E1: expected cases with allele 1
E2: expected cases with allele 2
E3: expected controls with allele 1
E4: expected controls with allele 2

E1 = ((a+b)/N)((a+c)/N)N = (a+b)(a+c)/N
E2 = (a+b)(b+d)/N
E3 = (c+d)(a+c)/N
E4 = (c+d)(b+d)/N

Now compute chi-square statistic
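For a 2 × 2 table the four terms collapse into a single closed form, which is handy for checking hand calculations. It is the same Pearson statistic, not a different test. Every cell deviates from its expectation by the same amount in absolute value, and the reciprocals of E_1 to E_4 factor into row and column terms:

$$a - E_1 = \frac{aN - (a+b)(a+c)}{N} = \frac{ad - bc}{N}, \qquad |O_i - E_i| = \frac{|ad - bc|}{N} \text{ for every cell}$$

$$\sum_{i=1}^{4} \frac{1}{E_i} = N\left(\frac{1}{a+b} + \frac{1}{c+d}\right)\left(\frac{1}{a+c} + \frac{1}{b+d}\right) = \frac{N^3}{(a+b)(c+d)(a+c)(b+d)}$$

$$\chi^2 = \frac{(ad - bc)^2}{N^2} \sum_{i=1}^{4} \frac{1}{E_i} = \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$$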


Chi-square statistic

           #Allele1   #Allele2
Case             15         35
Control           2         48

• Compute the expected values and the chi-square statistic

• Compute the chi-square p-value by referring to the chi-square distribution (1 degree of freedom for a 2 × 2 table)
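Putting the steps together for this table: expected counts from the E1 to E4 formulas, the Pearson statistic, and a p-value from the chi-square distribution with 1 degree of freedom. The sketch relies on the identity P(χ² > x) = erfc(√(x/2)) for 1 degree of freedom, so it needs only the standard library. The function name is our own, and no continuity (Yates) correction is applied. If SciPy is available, `scipy.stats.chi2_contingency` with `correction=False` should give the same statistic.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for a 2x2 allele contingency table.

    Returns (expected counts, chi-square statistic, p-value). The p-value uses
    the chi-square distribution with 1 degree of freedom, whose upper tail is
    P(chi2 > x) = erfc(sqrt(x / 2)). No continuity correction is applied.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,  # E1: cases with allele 1
        (a + b) * (b + d) / n,  # E2: cases with allele 2
        (c + d) * (a + c) / n,  # E3: controls with allele 1
        (c + d) * (b + d) / n,  # E4: controls with allele 2
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(stat / 2))
    return expected, stat, p_value


expected, stat, p = chi_square_2x2(15, 35, 2, 48)
print(expected)        # [8.5, 41.5, 8.5, 41.5]
print(round(stat, 2))  # 11.98
print(p)               # well below 0.05
```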