TRANSCRIPT
Lecture 16: Intro to GWAS - population genetics
Jason [email protected]
March 24, 2020 (T) 8:40-9:55
Quantitative Genomics and Genetics
BTRY 4830/6830; PBSB.5201.01
Announcements I
• We are continuing with "self-study" class content this week:
• Lecture 16 (today) - recorded and will be posted
• NO lecture Thurs. (March 26)
• NO lectures next week (March 31 & April 2 = Spring Break)
• Computer lab this week (March 26/27)
• NO Computer lab next week (April 2/3)
• For Computer Lab this week (!!) stay tuned (= Piazza message) for information from your TAs (availability / help sessions, etc.)
Announcements II
• Cornell has extended the Spring 2020 academic calendar by 1 week (last day of classes May 12), so we have adjusted the calendar as follows:
• Midterm will now be Tues. (April 14) - Fri. (April 17)
• I will make the key to (Optional! But suggested!!) Homework #4 available April 10
• Homeworks #2 & #3 will be graded before Spring Break
• Back to normal class lectures / computer labs for the week of April 6 (!!)
• IF YOU DO THE WORK OF THE CLASS (homeworks = done!, Midterm, Project, Final) YOU DO NOT NEED TO WORRY ABOUT YOUR GRADE = YOU WILL GET A GOOD GRADE (!!)
Summary of lecture 16
• Last lecture, we began our introduction to Genome-wide Association Study (GWAS) analysis (!!)
• Today, we will continue this introduction and also discuss important concepts in population genetics
Conceptual Overview
[Diagram: the genetic system question "Does A1 → A2 affect Y?" is addressed by taking a sample or experimental population of measured individuals (genotype, phenotype), assuming a probability model Pr(Y|X) with model parameters (a regression model), and performing an F-test to reject / not reject (DNR) the null hypothesis.]
Review: genetic system
• Our goal in quantitative genetics / genomics is to identify loci (positions in the genome) that contain causal mutations / polymorphisms / alleles
• causal mutation or polymorphism - a position in the genome where an experimental manipulation of the DNA produces an effect on the phenotype on average or under specified conditions
• Formally, we may represent this as follows:
• Our experiment will be a statistical experiment (sample and inference!)
Pr(X = x) = Pr(X1 = x1, X2 = x2, ..., Xn = xn) = PX(x) or fX(x)

MLE(p̂) = (1/n) Σᵢ₌₁ⁿ xᵢ   (8)

MLE(µ̂) = x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ   (9)

A1 → A2 ⇒ ΔY | Z   (10)
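Equations (8) and (9) make the same point: both MLEs reduce to a simple sample average. A minimal sketch (the function name is my own, not from the lecture):

```python
def mle_mean(xs):
    """MLE of a mean mu (or of an allele frequency p, when the x_i
    are 0/1 indicators): the sample average (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)
```

For example, with 0/1 allele indicators `mle_mean([0, 1, 1, 0])` gives the allele-frequency estimate 0.5.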
Review: genetic inference
• For our model focusing on one locus:
• We have four possible parameters we could estimate:
• However, for our purposes, we are only interested in the genetic parameters and testing the following null hypothesis:
think of there being a plane that we would draw through these points, where the slope of the plane in the z-axis along the x-axis would be βa and the slope of the plane along the y-axis would be βd, i.e. we are projecting the values of the phenotypes into three dimensions and the multiple regression defines a plane through the points in these three dimensions.

For this regression model (where we are assuming a probability model of the form Pr(Y|X)) we have four parameters θ = [βµ, βa, βd, σ²ε]. We are interested in a case where in the true probability model Cov(X, Y) ≠ 0, which corresponds to any case where βa ≠ 0 or βd ≠ 0 (βµ and σ²ε may be any value). As we will discuss, the way we are going to assess whether a genotype is a causal polymorphism is by performing a hypothesis test with the following null and alternative hypotheses:

H0 : βa = 0 ∩ βd = 0   (24)

HA : βa ≠ 0 ∪ βd ≠ 0   (25)

Note that intuitively, if we reject this null hypothesis, there is a relationship between the phenotype Y and the genotype possessed by an individual, i.e. the definition of a causal polymorphism. Also, note that cases where βa ≠ 0 and βd = 0, and where βa = 0 and βd ≠ 0, are such cases (where the first defines a straight line through the mean of the phenotypes associated with each genotype and the latter defines a case where the mean of the heterozygotes A1A2 is greater than the homozygotes). In genetic terms, a case where βd = 0 is a (purely) additive case and any case where βd ≠ 0 is a case of "dominance", i.e. a case where the mean phenotype associated with the heterozygote genotype is not mid-way between the means of the phenotypes associated with the homozygote genotypes (a case where βa = 0 and βd ≠ 0 is an example of "overdominance" or "underdominance").

Note section 4 on "quantitative genetic notation" has been moved to the notes for next lecture (lecture 10).
and we can write the "predicted" value ŷi of an individual as:

ŷi = β̂0 + xiβ̂1   (14)

which is the value we would expect yi to take if there is no error. Note that by convention we write the predicted value of y with a "hat", which is the same terminology that we use for parameter estimates. I consider this a bit confusing, since we only estimate parameters, but you can see where it comes from, i.e. the predicted value of yi is a function of parameter estimates.

As an example, let's consider the values all of the linear regression components would take for a specific value yi. Let's consider a system where:

Y = β0 + Xβ1 + ε = 0.5 + X(1) + ε   (15)

ε ~ N(0, σ²ε) = N(0, 1)   (16)

If we take a sample and obtain the value y1 = 3.8 for an individual in our sample, with x1 = 3, the true values of the equation for this individual are:

3.8 = 0.5 + 3(1) + 0.3   (17)

Let's say we had estimated the parameters β0 and β1 from the sample to be β̂0 = 0.6 and β̂1 = 2.9. The predicted value of y1 in this case would be:

ŷ1 = 9.3 = 0.6 + 2.9(3)   (18)

Note that we have not yet discussed how we estimate the β parameters but we will get to this next lecture.

To produce a linear regression model useful in quantitative genomics, we will define a multiple linear regression, which simply means that we have more than one independent (fixed random) variable X, each with their own associated β. Specifically, we will define the two following independent (random) variables:

Xa(A1A1) = −1, Xa(A1A2) = 0, Xa(A2A2) = 1   (19)

Xd(A1A1) = −1, Xd(A1A2) = 1, Xd(A2A2) = −1   (20)

and the following regression equation:

Y = βµ + Xaβa + Xdβd + ε   (21)

ε ~ N(0, σ²ε)   (22)
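The codings in equations (19) and (20), and the genotypic means implied by equation (21), are easy to sketch in code. This is a minimal illustration; the function names and the dictionary encoding of genotypes are my own choices, not from the lecture:

```python
# Dummy-variable codings from equations (19) and (20).
XA = {"A1A1": -1, "A1A2": 0, "A2A2": 1}
XD = {"A1A1": -1, "A1A2": 1, "A2A2": -1}

def genotype_codes(genotype):
    """Translate a one-locus genotype into its (x_a, x_d) coding."""
    return XA[genotype], XD[genotype]

def genotypic_mean(beta_mu, beta_a, beta_d, genotype):
    """Expected phenotype of a genotype under equation (21), ignoring
    the error term: beta_mu + x_a * beta_a + x_d * beta_d."""
    xa, xd = genotype_codes(genotype)
    return beta_mu + xa * beta_a + xd * beta_d
```

Note the heterozygote A1A2 is the only genotype with Xd = 1, which is what lets βd capture a deviation of the heterozygote mean from the midpoint of the homozygote means.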
H0 : Cov(Xa, Y) = 0 ∩ Cov(Xd, Y) = 0   (35)

HA : Cov(Xa, Y) ≠ 0 ∪ Cov(Xd, Y) ≠ 0   (36)

OR

H0 : βa = 0 ∩ βd = 0   (37)

HA : βa ≠ 0 ∪ βd ≠ 0   (38)
Review: genetic inference II
• Recall that inference (whether estimation or hypothesis testing) starts by collecting a sample and defining a statistic on that sample
• In this case, we are going to collect a sample of n individuals where for each we will measure their phenotype and their genotype (i.e. at the locus we are focusing on)
• That is, an individual i will have phenotype yi and genotype gi = AjAk (where we translate these into xa and xd)
• Using the phenotype and genotype we will construct both an estimator (a statistic!) and we will additionally construct a test statistic
• Remember that our regression probability model defines a sampling distribution on our sample and therefore on our estimator and test statistic (!!)
• Let's look at the structure of this estimator:
Review: genetic estimation
Using matrix notation, matrix multiplication, and matrix addition, we can re-write this as:

[ y1 ]   [ 1  x1,a  x1,d ]   [ βµ ]   [ ε1 ]
[ y2 ] = [ 1  x2,a  x2,d ] × [ βa ] + [ ε2 ]
[ ⋮  ]   [ ⋮    ⋮     ⋮  ]   [ βd ]   [ ⋮  ]
[ yn ]   [ 1  xn,a  xn,d ]           [ εn ]

which we can write using the following compact matrix notation:

y = xβ + ε   (14)

for a specific sample and

Y = Xβ + ε   (15)

for an arbitrary sample, where the β and ε here are vectors.

Recall that there are true values of β = [βµ, βa, βd] that describe the true relationship between genotype and phenotype (specifically the true genotypic values), which in turn describe the variation in Y in a given sample of size n, given genotype states X. Just as with our general estimation framework, we are interested in defining a statistic (a function on a sample) that takes a sample as input and returns a vector, where the elements of the vector provide an estimate of β, i.e. we will define a statistic T(y, xa, xd) = β̂ = [β̂µ, β̂a, β̂d]. More specifically, we will define a maximum likelihood estimate (MLE) of these parameters (again, recall that for all the complexity of how MLEs are calculated, they are simply statistics that take a sample as an input and provide an estimate as an output). We will not discuss the derivation of the MLE for the β parameters of a multiple regression model (although it is not that difficult to derive), but will rather just provide the form of the MLE. Note that this MLE has a simple form, such that we do not have to go through the process of maximizing a likelihood; rather, we can write down a simple formula that provides an expression that we know is the (single) maximum of the likelihood of the regression model.

With the vector and matrix notation introduced above, we can write the MLE as follows:

MLE(β̂) = [β̂µ, β̂a, β̂d]ᵀ

where the formula is as follows:

MLE(β̂) = (XᵀX)⁻¹XᵀY   (16)
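Equation (16) is straightforward to compute numerically. A minimal sketch using numpy; the function name, and solving the normal equations rather than forming the explicit inverse, are my own choices:

```python
import numpy as np

def beta_mle(y, xa, xd):
    """MLE(beta-hat) = (X^T X)^-1 X^T y for the regression
    Y = beta_mu + Xa*beta_a + Xd*beta_d + epsilon."""
    # Design matrix X: a column of 1s (for beta_mu), then x_a and x_d.
    X = np.column_stack([np.ones(len(y)), xa, xd])
    # Solve the normal equations (X^T X) beta = X^T y; equivalent to
    # the textbook formula but numerically more stable.
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, dtype=float))
```

For example, `beta_mle([-1.5, 1.5, 2.5], [-1, 0, 1], [-1, 1, -1])` recovers β̂ = [1.0, 2.0, 0.5] when the phenotypes fit the model exactly.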
As a side-note, this is also the "least-squares" estimate of the regression parameters and the "Best Linear Unbiased Estimate" (BLUE) of these parameters, i.e. several statistics
2 Hypothesis testing with the regression model

As a reminder, our inference goal in quantitative genomics is to test the following null hypothesis for a multiple regression model: Y = βµ + Xaβa + Xdβd + ε with ε ~ N(0, σ²ε), which we use to assess whether there is an effect of a polymorphism on a phenotype:

H0 : βa = 0 ∩ βd = 0   (1)

HA : βa ≠ 0 ∪ βd ≠ 0   (2)

To do this, we will construct a likelihood ratio test (LRT) with an exact distribution (in this case, an F-test). We will not go into the details of how this test is derived, but remember that this has the same form as any LRT that we discussed in a previous lecture (and remember that a LRT works like any other statistic, i.e. it is a function on a sample that produces a value that we then use to determine a p-value!!). We will however consider the components of an F-statistic so we know how to calculate it and perform our hypothesis test.

To construct this LRT, we need the maximum likelihood estimates of the regression parameters:

MLE(β̂) = [β̂µ, β̂a, β̂d]ᵀ

where recall from last lecture, this has the following form:

MLE(β̂) = (XᵀX)⁻¹XᵀY   (3)

MLE(β̂) = (xᵀx)⁻¹xᵀy   (4)

With these estimates, we can construct the predicted phenotypic value ŷi for an individual i in a sample:

ŷi = β̂µ + xi,aβ̂a + xi,dβ̂d   (5)

where the parameter estimates are the MLE. We will next define two functions of the predicted values. The first is the sum of squares of the model (SSM):

SSM = Σᵢ₌₁ⁿ (ŷi − ȳ)²   (6)

where ȳ = (1/n) Σᵢ₌₁ⁿ yi is the mean of the sample. The second is the sum of squares of the error (SSE):

SSE = Σᵢ₌₁ⁿ (yi − ŷi)²   (7)
• We now have everything we need to construct a hypothesis test for:
• This is equivalent to testing the following:
• For a linear regression, we use the F-statistic for our sample:
• We then determine a p-value using the distribution of the F-statistic under the null:
We will next use these two expressions to define two corresponding functions: the mean square model (MSM) and the mean square error (MSE) terms. These latter functions depend on the concept of degrees of freedom (df). Degrees of freedom have a rigorous justification that you will encounter in an advanced statistics course. In this course, we will not consider this justification or a deep intuition as to what df represent. For our purposes, it is enough to be able to calculate the df for our model and for our error. For our model, we determine df as the total number of β parameters in our model (three in this case: βµ, βa, and βd) minus one for the estimate of ȳ, such that df(M) = 3 − 1 = 2. For our error, the df is the total sample n minus one for each of the three β parameters estimated in the regression model, such that df(E) = n − 3. Note that this approach for determining df works for any model. For example, if we were to consider a regression model with just βµ and βa (and no βd), we would have df(M) = 2 − 1 and df(E) = n − 2.

With these terms for df, we can now define MSM and MSE:

MSM = SSM / df(M) = SSM / 2   (8)

MSE = SSE / df(E) = SSE / (n − 3)   (9)

and with these definitions, we can finally calculate our F-statistic:

F[2,n−3] = MSM / MSE   (10)

F[2,n−3](y, xa, xd) = MSM / MSE   (11)

Pr(F[2,n−3] | H0)   (12)

pval(F[2,n−3](y, xa, xd))   (13)

where the distribution of an F-statistic depends on two numbers [2, n − 3]. Now, while it seems like very complex work to calculate an F-statistic, again, remember that this operates like any other statistic in a hypothesis testing framework. The F-statistic is simply a function on a sample, where we know the distribution of the F-statistic under the null hypothesis H0 : βa = 0 ∩ βd = 0 (which we can look up in a table). If our sample produces a value for the F-statistic that is greater than some critical threshold cα corresponding to a type I error α, we reject the null hypothesis. Again, note that F-tests (i.e. tests using F-statistics) are LRTs where we know the exact distribution under the null hypothesis.
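The arithmetic in equations (6)-(10) is mechanical once the predicted values ŷ are in hand. A minimal sketch (the function name is mine, and it assumes the three-parameter model so df(M) = 2 and df(E) = n − 3):

```python
import numpy as np

def f_statistic(y, y_hat):
    """F[2, n-3] = MSM / MSE for the three-parameter regression model."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    ssm = np.sum((y_hat - y.mean()) ** 2)  # sum of squares of the model
    sse = np.sum((y - y_hat) ** 2)         # sum of squares of the error
    msm = ssm / 2.0                        # df(M) = 3 - 1 = 2
    mse = sse / (n - 3.0)                  # df(E) = n - 3
    return msm / mse
```

The p-value is then the upper tail of the F[2, n−3] distribution at this value, which in practice one would look up numerically (e.g. with scipy.stats.f.sf(F, 2, n - 3)) rather than in a table.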
where for an individual i in a sample we may write:

yi = βµ + xi,aβa + xi,dβd + εi   (23)

H0 : Cov(X, Y) = 0   (24)

An intuitive way to consider this model is to plot the phenotype Y on the Y-axis against the genotypes A1A1, A1A2, A2A2 on the X-axis for a sample (see class). We can represent all the individuals in our sample as points that are grouped in the three categories A1A1, A1A2, A2A2, and note that the true model would include points distributed in three normal distributions, with the means defined by the three classes A1A1, A1A2, A2A2. If we were to then re-plot these points in two plots, Y versus Xa and Y versus Xd, the first would look like the original plot, and the second would put the points in two groups (see class). The multiple linear regression equation (20, 21) defines "two" regression lines (or more accurately a plane) for these latter two plots, where the slopes of the lines are βa and βd (see class). Note that βµ is where these two plots (the plane) intersect the Y-axis, but with the way we have coded Xa and Xd, this is actually an estimate of the overall mean of the population (hence the notation βµ).

βµ = 2, βa = 5, βd = 0, σ²ε = 1
βµ = 0, βa = 4, βd = −2, σ²ε = 1
βµ = 0, βa = 2, βd = 3, σ²ε = 1
βµ = 0, βa = 2, βd = 3, σ²ε = 1
βµ = 2, βa = 0, βd = 0, σ²ε = 1

To consider a "plane" interpretation of the multiple regression model, let's consider three axes, where on the x-axis we will plot Xa, on the y-axis we will plot Xd, and on the z-axis (which we will plot coming out towards you from the page) we will plot the phenotype Y. We can draw the x-axis and y-axis as follows:

[Axis diagram: Xa on the x-axis with A1A1 at −1, A1A2 at 0, and A2A2 at 1; Xd on the y-axis with A1A2 at 1 and A1A1, A2A2 at −1.]

where the genotypes are placed where they would map on the x- and y-axis. Now the phenotypes would be plotted above each of these three genotypes in the z-plane and we could
Review: genetic hypothesis test I
• To construct our LRT for our null, we will need several components, first the predicted value of the phenotype for each individual:
• Second, we need the "Sum of Squares of the Model" (SSM) and the "Sum of Squares of the Error" (SSE):
• Third, we need the "Mean Squared Model" (MSM) and the "Mean Square Error" (MSE) with their degrees of freedom (df):
• Finally, we calculate our (LRT!) statistic, the F-statistic with degrees of freedom [2, n-3]:
that use different approaches for defining an estimator have the same answer for a multiple regression model. With these estimates, we can construct the predicted phenotypic value ŷi for an individual i in a sample:

ŷi = β̂µ + xi,aβ̂a + xi,dβ̂d   (17)

where the parameter estimates are MLE. Note that the "hat" notation for the predicted value is a little odd, since yi is not a parameter we estimate, but given that the predicted value is a function of the estimates of parameters, you can see the origin of this notation. We will use predicted values next lecture when we construct a statistic for performing a hypothesis test using the multiple regression model.
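Applied over a whole sample, equation (17) can be sketched as follows (the function name is mine; it assumes the β̂ estimates and the x codings are already available):

```python
def predicted_values(beta_hat, xa, xd):
    """y-hat_i = beta_mu_hat + x_{i,a}*beta_a_hat + x_{i,d}*beta_d_hat
    (equation 17), computed for every individual i in the sample."""
    b_mu, b_a, b_d = beta_hat
    return [b_mu + a * b_a + d * b_d for a, d in zip(xa, xd)]
```

These predicted values are exactly the inputs needed for the SSM and SSE sums used in the hypothesis test.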
3 Alternative linear models for testing the same null hypothesis

So far we have used a multiple regression formulation to test the null hypothesis that a given polymorphism is not a causal polymorphism:

Y = βµ + Xaβa + Xdβd + ε   (11)
We will next use these two expressions to define two corresponding functions: the meansquare model (MSM) and the mean square error (MSE) terms. These later functionsdepend on the concept of degrees of freedom (df). Degrees of freedom have a rigorous jus-tification that you will encounter in an advanced statistics course. In this course, we willnot consider this justification or a deep intuition as to what df represent. For our purposes,it is enough to be able to calculate the df for our model and for our error. For our model,we determine df as the total number of ๏ฟฝ parameters in our model (three in this case: ๏ฟฝยต,๏ฟฝa, and ๏ฟฝd) minus one for the estimate of y such that df(M) = 3 ๏ฟฝ 1 = 2. For our error,the df is the total sample n minus the one for each of the three ๏ฟฝ parameters estimated inthe regression model such that df(E) = n๏ฟฝ 3. Note that this approach for determining dfworks for any model. For example, if we were to consider a regression model with just ๏ฟฝยตand ๏ฟฝa (and no ๏ฟฝd), we would have df(M) = 2๏ฟฝ 1 and df(E) = n๏ฟฝ 2.
With these terms for df, we can now define MSM and MSE:
MSM =SSM
df(M)=
SSM
2(8)
MSE =SSE
df(E)=
SSE
n๏ฟฝ 3(9)
and with these definitions, we can finally calculate our F-statistic:
F[2,n๏ฟฝ3] =MSM
MSE(10)
where the distribution of an F-statistic depends on two numbers [2, n๏ฟฝ 3]. Now, while itseems very complex work to calculate an F-statistic, again, remember that this operateslike any other statistic in a hypothesis testing framework. The F-statistic is simply afunction on a sample, where we know the distribution of the F-statistic under the nullhypothesis H0 : ๏ฟฝa = 0\ ๏ฟฝd = 0 (which we can look up in a table). If our sample producesa value for the F-statistic that is greater than some critical threshold cโต corresponding toa type I error โต, we reject the null hypothesis. Again, note that F-tests (i.e. tests usingF-statistics) are LRT where we know the exact distribution under the null hypothesis.
3 Alternative linear models for testing the same null hypothesis
So far we have used a multiple regression formulation to test the null hypothesis that a given polymorphism is not a causal polymorphism:
Y = βμ + Xaβa + Xdβd + ε    (11)
Pr(X = x) = Pr(X1 = x1, X2 = x2, ..., Xn = xn) = PX(x) or fX(x)

MLE(p̂) = (1/n) Σ_{i=1}^{n} xi    (8)

MLE(μ̂) = x̄ = (1/n) Σ_{i=1}^{n} xi    (9)

A1 → A2 ⇒ ΔY|Z    (10)

gi = AjAk    (11)

2.1 = 0.3 + (0)(-0.2) + (1)(1.1) + 0.7    (12)

SSE = Σ_{i=1}^{n} (yi - ŷi)²    (13)
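A small sketch (mine, not from the slides) mirroring the review formulas above; the regression numbers reproduce the worked example in equation (12), and the small data vectors are made up for illustration:

```python
# Sketch (mine, not from the slides) mirroring the review formulas above.
# The data vectors are made up; the regression numbers reproduce
# equation (12): beta_mu = 0.3, beta_a = -0.2, beta_d = 1.1, epsilon = 0.7.
import numpy as np

# A heterozygote A1A2 under this course's coding: Xa = 0, Xd = 1.
xa, xd = 0.0, 1.0
beta_mu, beta_a, beta_d, eps = 0.3, -0.2, 1.1, 0.7
y = beta_mu + xa * beta_a + xd * beta_d + eps
print(round(y, 10))  # 2.1, matching equation (12)

# Equation (9): the MLE of the mean is the sample mean.
x = np.array([1.8, 2.1, 2.4])
mu_hat = x.mean()

# Equation (13): SSE sums the squared deviations of observed phenotypes
# from the values predicted by the fitted model.
y_obs = np.array([2.0, 2.3, 1.9])
y_pred = np.array([2.1, 2.1, 2.1])
sse = np.sum((y_obs - y_pred) ** 2)
```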
Review: genetic hypothesis test II
• In general, the F-distribution (continuous random variable!) under the H0 has variable forms that depend on d.f.:
• Note when calculating a p-value for the genetic model, we consider the value of the F-statistic we observe or more extreme towards positive infinity (!!) using the F-distribution with [2, n-3] d.f.
• However, note this is actually a two-tailed test (what is going on here!?)
Review: genetic hypothesis test III
Review: quantitative genomic analysis I
• We now know how to assess the null hypothesis as to whether a polymorphism has a causal effect on our phenotype
• Occasionally we will assess this hypothesis for a single genotype
• In quantitative genomics, we generally do not know the location of causal polymorphisms in the genome
• We therefore perform a hypothesis test of many genotypes throughout the genome
• This is a genome-wide association study (GWAS)
• Analysis in a GWAS raises (at least) two issues we have not yet encountered:
• An analysis will consist of many hypothesis tests (not just one)
• We usually do not test the causal polymorphism itself
• Note that this latter issue is a bit strange (!?) - how do we assess causal polymorphisms if we have not measured the causal polymorphism?
• Also note that causal genotypes will begin to be measured in our GWAS with next-generation sequencing data (but the issue will still be present!)
Review: quantitative genomic analysis II
Review: correlation among genotypes
Copyright: Journal of Diabetes and its Complications; Science Direct; Vendramini et al
• If we test a (non-causal) genotype that is correlated with the causal genotype AND if correlated genotypes are in the same position in the genome THEN we can identify the genomic position of the causal genotype (!!)
• This is the case in genetic systems (why!?)
• Do we know which genotype is causal in this scenario?
• Mapping the position of a causal polymorphism in a GWAS requires there to be LD for genotypes that are both physically linked and close to each other AND markers that are either far apart or on different chromosomes to be in equilibrium
• Note that disequilibrium includes both linkage disequilibrium AND other types of disequilibrium (!!), e.g. gametic phase disequilibrium
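As a sketch of what being "in LD" means quantitatively, the standard population-genetic measures D and r² can be computed from haplotype frequencies. These measures are not defined in the slides above, and the frequencies here are made up:

```python
# Sketch: the standard D and r^2 measures of LD between two biallelic loci,
# computed from haplotype frequencies (made-up numbers, not real data).
p_A1B1, p_A1B2, p_A2B1, p_A2B2 = 0.5, 0.1, 0.1, 0.3

p_A1 = p_A1B1 + p_A1B2  # marginal frequency of A1 at locus 1
p_B1 = p_A1B1 + p_A2B1  # marginal frequency of B1 at locus 2

# D = 0 exactly when the alleles at the two loci are independent (equilibrium).
D = p_A1B1 - p_A1 * p_B1

# r^2 is the squared correlation between allele indicators across haplotypes;
# r^2 near 1 means testing one locus is nearly as informative as the other.
r2 = D ** 2 / (p_A1 * (1 - p_A1) * p_B1 * (1 - p_B1))
print(D, r2)
```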
[Diagram: loci A, B, C on Chr. 1 and locus D on Chr. 2, with pairs labeled LD; equilibrium, linkage; and equilibrium, no linkage]
Review: linkage disequilibrium
Genome-Wide Association Study (GWAS)
• For a typical GWAS, we have a phenotype of interest and we do not know any causal polymorphisms (loci) that affect this phenotype (but we would like to find them!)
• In an "ideal" GWAS experiment, we measure the phenotype and N genotypes THROUGHOUT the genome for n independent individuals
• To analyze a GWAS, we perform N independent hypothesis tests
• When we reject the null hypothesis, we assume that we have located a position in the genome that contains a causal polymorphism (not the causal polymorphism!), hence a GWAS is a mapping experiment
• This is as far as we can go with a GWAS (!!) such that (often) identifying the causal polymorphism requires additional data and/or follow-up experiments, i.e. GWAS is a starting point
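The N-tests analysis described above can be sketched as a loop of F-tests, one per genotype. This is my own minimal sketch using the course's Xa/Xd coding; the data are simulated under the global null, and a real GWAS would read genotypes from file and correct for multiple tests:

```python
# Sketch of a GWAS analysis: one F-test per genotype across the genome.
# Simulated data; all names and numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, N = 300, 1000  # individuals, genotypes tested

G = rng.binomial(2, 0.3, size=(n, N))  # number of A2 alleles at each locus
y = rng.normal(size=n)                 # phenotype (no causal loci simulated)

p_values = np.empty(N)
for j in range(N):
    xa = G[:, j] - 1.0
    xd = np.where(G[:, j] == 1, 1.0, -1.0)
    X = np.column_stack([np.ones(n), xa, xd])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    ssm = np.sum((y_hat - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    f = (ssm / 2) / (sse / (n - 3))
    p_values[j] = stats.f.sf(f, 2, n - 3)

# Under the null, the N p-values are approximately Uniform(0, 1).
```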
The Manhattan plot: examples
[Figures: QQ plots of Observed -Log-P vs. Expected -Log-P and Manhattan plots of -Log-P for associations near GTF2H1 and MTRR]
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโ โโ
โโโโโโโโโโโโโโโโ
โโโโโโโโ โโโโ
โโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโ
โโโโโ โโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโ
โโโโโ โโโโโโโโโโโ โโโโโ
โโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโ
โโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโ โโ
โโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโ โโโโโโโโโโโโโโโโ
โโโโโโโโโโโโ
โโโโโโโโโโโโโโ
โโโโโโโโโโโโ
โโโโโโโโ โ
โโโโโโโโโโโ โโ
โโโโโโโโ
โโโโ
โโโโโโโโโ โโโ
โ
โโโโโโโโโโโโโโ โ
โโโ
โโโโโโโโโโโโโโโโโโโ
โ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโ
โโ
โโโโโโโโโโ โโโโโโโโ
โโโโโโโโโโโโโโโโ
โโโโโโโโโโโโ
โโโโโ โโโโโโโโโโ
โโโโโโโ
โโโโโโ โ
โโโโโ
โ
โ
โโโโ โโโ
โ
MTRR
Chromosome
โLogโP
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโ โโ
โโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโ โโโโโ
โโโโโโโโโโโโ
โโโ โโโ
โโโโโโโโโโโโ โโโโโโโโโโโโ
โโโโโโโโโ
โโโโโโ โ
โโโโ
โโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโ
โโโโโโโโโโ โโ
โโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโ
โโ โโโโโ
โโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโ
โโโโโโโโโโ
โโโโโโโโโโโโ
โโโโโโโโ โโโโโโ
โโโโโโโโโโโโโโโโโโโโ โโโโโโ
โโโโโโโโโโโ โโโโ
โโโโโโโโโโโโ
โโโโโโโโโ โโโโโโ
โโโโโโโโโโ โ
โโโ
โโโโโโ
โโโโโโโโโโโ โโโโโโโโโ โโโโ
โโโโโโโโโโโโโโโโโโโโโ
โโ
โ
โ
โโโโโโ
โ
โโ
โ
โโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโ โโโโโโโโ
โโโโโโโโ
โโ โโโโโโ
โโโโโโ
โ
โโโโโ
โโ
โ โโโโโโ
โโโโโโโโโโ โ
โโโโ
โโโโโโ
โ โโโโโโโ
04
8
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
• Mapping the position of a causal polymorphism in a GWAS requires that genotypes which are physically linked and close to each other be in LD, AND that markers which are either far apart or on different chromosomes be in equilibrium
• Note that disequilibrium includes both linkage disequilibrium AND other types of disequilibrium (!!), e.g. gametic phase disequilibrium
Linkage Disequilibrium (LD)
[Diagram: markers A, B, C on Chr. 1 and marker D on Chr. 2; A and B are in LD; A and C are linked but in equilibrium ("equilibrium, linkage"); A and D are in equilibrium with no linkage ("equilibrium, no linkage")]
Different chromosomes I
• Polymorphisms on different chromosomes tend to be in equilibrium because of independent assortment and random mating, i.e. random matching of gametes to form zygotes
Copyright: http://geneticssuite.net/node/21
Different chromosomes II
• Polymorphisms on different chromosomes tend to be in equilibrium because of independent assortment and random mating, i.e. random matching of gametes to form zygotes
Different chromosomes III
• More formally, we represent independent assortment as:
• For random pairing of gametes to produce zygotes:
• Putting this together for random pairing of gametes to produce zygotes, we get the conditions for equilibrium:
and 13) a gamete will have the same probability of having one of the "four" chromosomes from chromosome 11 and one of the "four" chromosomes from chromosome 13. Now if we assume all gametes in the population are produced this way, the frequency of a particular gamete in the population with alleles Ai and Bk will be

Pr(AiBk) = Pr(Ai)Pr(Bk)

i.e. the frequency of a gamete is the product of the frequency of each allele, and the same for all other allele combinations in gametes. If we now assume random mating then the probability of any two specific gametes in the population fusing to produce an offspring is the same and the frequency of a genotype produced by two gametes mating is:

fr(AiBk)fr(AjBl) = fr(Ai)fr(Bk)fr(Aj)fr(Bl) = fr(AiAj)fr(BkBl) = fr(AiAjBkBl)   (9)

Pr(AiBkAjBl) = Pr(AiBk)Pr(AjBl)   (10)

Pr(AiBkAjBl) = Pr(AiBk)Pr(AjBl) = Pr(Ai)Pr(Bk)Pr(Aj)Pr(Bl) = Pr(Ai, Aj)Pr(Bk, Bl)   (11)

where again this is for all combinations of i, j, k, l = 1 or 2. Thus, if we considered two markers, one on each chromosome, the states of the genotypes at these markers will also be independent. Since independence implies a correlation of zero, we expect markers on different chromosomes to be uncorrelated (see class notes for a diagram).

Next let's consider markers on the same chromosome. For this part, we will take the opposite strategy and show how markers that are close together are highly correlated (non-independent) and the further markers are located from each other physically, the greater the probability of a recombination event, which increases their independence. In sexual, diploid organisms where there is recombination, sections of a chromosome are swapped between two chromosomes that end up in gametes (see class notes for a diagram). Since the more recombination, the more genotypes of two markers are "mixed up", more recombination tends to lower the correlation between markers. As an illustrative example, consider an extreme case where A1 and B1 always occur together on a chromosome and where A2 and B2 always occur together, i.e. fr(A1B1) ≠ 0, fr(A2B2) ≠ 0, and the frequency of all other genotypes is zero. This is a case of a perfect correlation between XA and XB genotypes such that |corr(XA, XB)| = 1; intuitively, if A1 occurs, this means B1 occurs, and vice versa with A2 and B2. However, if there is a recombination event between these markers, then as well as A1-B1 and A2-B2 chromosomes in the population, we will also have A1-B2 and A2-B1. Now, A1 does not always occur with B1 and as a consequence, the correlation between markers A and B is now less than one. As more recombination events occur, the (absolute) correlation continues to decrease until |corr(XA, XB)| = 0. Now, in genetic systems, more recombination events happen between markers that are further apart on a chromosome (a consequence of the biological process of recombination). As a consequence, there is more recombination and therefore lower correlation between markers
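The decay of correlation with recombination described in this excerpt can be simulated directly. Below is a minimal Python sketch (not from the course materials; the population size, recombination probability, and the 0/1 coding of the alleles are illustrative assumptions):

```python
import random
import statistics

def corr(haps):
    """Pearson correlation between the A-locus and B-locus alleles of haplotypes."""
    xs = [a for a, b in haps]
    ys = [b for a, b in haps]
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in haps) / len(haps)
    return cov / (sx * sy)

random.seed(1)
# Perfect LD to start: only A1-B1 (coded 1,1) and A2-B2 (coded 0,0) haplotypes.
pop = [(1, 1)] * 500 + [(0, 0)] * 500
r0 = corr(pop)  # exactly 1.0

# Each generation, pair haplotypes at random and recombine with probability c.
c = 0.1
for _ in range(50):
    random.shuffle(pop)
    pop = [h for i in range(0, len(pop), 2)
           for h in (((pop[i][0], pop[i + 1][1]), (pop[i + 1][0], pop[i][1]))
                     if random.random() < c else (pop[i], pop[i + 1]))]

rT = corr(pop)
print(r0, round(abs(rT), 3))  # 1.0 and then a value near 0
```

Repeated recombination "mixes up" the two loci, so the correlation that started at exactly 1 decays toward 0, just as the notes describe.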
H0: Cov(Xa, Y) = 0 ∧ Cov(Xd, Y) = 0   (35)

HA: Cov(Xa, Y) ≠ 0 ∨ Cov(Xd, Y) ≠ 0   (36)

H0: βa = 0 ∧ βd = 0   (37)

HA: βa ≠ 0 ∨ βd ≠ 0   (38)

F-statistic = f(β̂)   (39)

βµ = 0, βa = 4, βd = −1, σε^2 = 1   (40)

β̂a = 0, β̂d = 0   (41)

β̂a = βa, β̂d = βd   (42)

Pr(A1, A1) = Pr(A1)Pr(A1) = p^2   (43)

Pr(A1, A2) = Pr(A1)Pr(A2) = 2pq   (44)

Pr(A2, A2) = Pr(A2)Pr(A2) = q^2   (45)

⇒ (Corr(Xa,A, Xa,B) = 0) ∧ (Corr(Xa,A, Xd,B) = 0)   (46)

∧ (Corr(Xd,A, Xa,B) = 0) ∧ (Corr(Xd,A, Xd,B) = 0)   (47)
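The Hardy-Weinberg genotype frequencies in equations (43)-(45) can be checked by simulation: under random mating, a genotype is just a random union of two independent alleles. A minimal Python sketch (the allele frequency p = 0.7 and the sample size are illustrative assumptions, not values from the course):

```python
import random

random.seed(0)
p, q = 0.7, 0.3          # allele frequencies of A1 and A2 (illustrative)
N = 100_000              # number of simulated individuals

# Under random mating, a genotype is a random union of two independent alleles.
counts = {"A1A1": 0, "A1A2": 0, "A2A2": 0}
for _ in range(N):
    genotype = "".join(sorted(random.choices(["A1", "A2"], weights=[p, q], k=2)))
    counts[genotype] += 1

freqs = {g: c / N for g, c in counts.items()}
# Compare to the Hardy-Weinberg expectations p^2, 2pq, q^2:
print(round(freqs["A1A1"], 2), round(p * p, 2))      # both near 0.49
print(round(freqs["A1A2"], 2), round(2 * p * q, 2))  # both near 0.42
print(round(freqs["A2A2"], 2), round(q * q, 2))      # both near 0.09
```

The simulated genotype frequencies match p^2, 2pq, q^2 up to sampling noise, which is the content of equations (43)-(45).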
⇒ (Corr(Xa,A, Xa,B) ≠ 0) ∨ (Corr(Xa,A, Xd,B) ≠ 0)   (48)

∨ (Corr(Xd,A, Xa,B) ≠ 0) ∨ (Corr(Xd,A, Xd,B) ≠ 0)   (49)

Pr(AiBk, AjBl) = Pr(AiAj)Pr(BkBl)   (50)

Pr(AiBk, AjBl) = Pr(AiBk)Pr(AjBl)   (51)

= Pr(Ai)Pr(Aj)Pr(Bk)Pr(Bl) = Pr(AiAj)Pr(BkBl)   (52)
Same chromosome I
• Polymorphisms on the same chromosome are linked, so if they are in disequilibrium, they are in LD
• In general, polymorphisms that are closer together on a chromosome are in greater LD than polymorphisms that are further apart (exactly what we need for GWAS!)
• This is because of recombination, the biological process by which chromosomes exchange sections during meiosis
• Since recombination events occur at random throughout a chromosome (approximately!), the further apart two polymorphisms are, the greater the probability of a recombination event between them
• Since the more recombination events that occur between polymorphisms, the closer they get to equilibrium, markers closer together tend to be in greater LD
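The bullets above correspond to the classic deterministic decay of LD, D_t = (1 − c)^t D_0, where c is the recombination fraction between two loci (larger for loci further apart, up to c = 0.5 for unlinked loci). A short Python sketch of this relationship (the values of c and D_0 are illustrative):

```python
# Deterministic decay of linkage disequilibrium across generations:
#   D_t = (1 - c)^t * D_0
# where c is the recombination fraction between the two loci.
D0 = 0.25  # starting D (the maximum possible when p = q = 0.5)

for c in (0.001, 0.01, 0.1, 0.5):
    D100 = (1 - c) ** 100 * D0
    print(f"c = {c:<5}: D after 100 generations = {D100:.2e}")
```

Tightly linked loci (small c) keep most of their LD after 100 generations, while loosely linked or unlinked loci (c near 0.5) are essentially at equilibrium, which is exactly the distance-dependence a GWAS relies on.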
Same chromosome II
• In diploids, recombination occurs between pairs of chromosomes during meiosis (the formation of gametes)
• Note that this results in taking alleles that were physically linked on different chromosomes and physically linking them on the same chromosome
• To see how recombination events tend to increase equilibrium, consider an extreme example where alleles A1 and B1 always occur together on a chromosome and A2 and B2 always occur together on a chromosome:
• If there is a recombination event, most chromosomes are A1-B1 and A2-B2 but now there is an A1-B2 and an A2-B1 chromosome such that:
• Note that recombination events disproportionately lower the probabilities of the more frequent pairs!
• This means that over time, the polymorphisms will tend to increase equilibrium (decrease LD)
• Since the more recombination events, the greater the equilibrium, polymorphisms that are further apart will tend to be in greater equilibrium, and those closer together in greater LD
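The effect described in these bullets (recombination lowering the frequent haplotypes and raising the rare ones) follows the standard one-generation recursion f′(A1B1) = f(A1B1) − cD, and so on, with D = f(A1B1)f(A2B2) − f(A1B2)f(A2B1). A Python sketch of this recursion starting from the slide's extreme example (the recombination fraction c = 0.1 and generation count are illustrative):

```python
# One generation of random mating with recombination fraction c changes the
# haplotype frequencies by +/- c*D, where
#   D = f(A1B1) f(A2B2) - f(A1B2) f(A2B1)
def next_gen(f, c):
    D = f["A1B1"] * f["A2B2"] - f["A1B2"] * f["A2B1"]
    return {"A1B1": f["A1B1"] - c * D, "A2B2": f["A2B2"] - c * D,
            "A1B2": f["A1B2"] + c * D, "A2B1": f["A2B1"] + c * D}

# Extreme starting point from the slide: A1 always with B1, A2 always with B2.
f = {"A1B1": 0.5, "A2B2": 0.5, "A1B2": 0.0, "A2B1": 0.0}
f = next_gen(f, c=0.1)
# Frequent pairs drop (0.5 -> 0.475) while rare pairs rise (0.0 -> 0.025):
print({k: round(v, 3) for k, v in f.items()})

# Iterating many generations drives all four haplotypes toward the
# equilibrium frequencies Pr(Ai)Pr(Bk) = 0.25.
for _ in range(199):
    f = next_gen(f, c=0.1)
print({k: round(v, 3) for k, v in f.items()})
```

After many generations every haplotype frequency approaches 0.25, the product of its allele frequencies, i.e. the loci have reached equilibrium.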
Same chromosome III
Consider an extreme case where A1 and B1 always occur together on a chromosome and where A2 and B2 always occur together, i.e.

Pr(A1B2) = 0, Pr(A2B1) = 0

Pr(A1A2B1B1) = 0, Pr(A1A1B1B2) = 0

Corr(X(A1A2, ·), X(·, B1B1)) = 0

Corr(X(AiAj, ·), X(·, BkBl)) = 0
Consider again the extreme case where A1 and B1 always occur together on a chromosome and where A2 and B2 always occur together, i.e.

Pr(A1B2) = 0, Pr(A2B1) = 0

Pr(A1B2) ≠ 0, Pr(A2B1) ≠ 0 (after a recombination event)

Pr(A1A2B1B1) = 0, Pr(A1A1B1B2) = 0

Corr(X(A1A1, ·), X(·, B1B1)) = 1
D′ = D / min(Pr(A1)Pr(B1), Pr(A2)Pr(B2))   if D < 0

D′ = D / min(Pr(A1)Pr(B2), Pr(A2)Pr(B1))   if D > 0

r = Cov(XAi(Ai), XBj(Bj)) / ( √Var(XAi(Ai)) √Var(XBj(Bj)) )

r = D / ( √Var(XAi(Ai)) √Var(XBj(Bj)) )

Pr(A1, A1) = Pr(A1)Pr(A1) = Pr(XA1(A1))Pr(XA1(A1)) = p^2
Pr(A1, A2) = Pr(A1)Pr(A2) = Pr(XA1(A1))Pr(XA2(A2)) = 2pq
Pr(A2, A2) = Pr(A2)Pr(A2) = Pr(XA2(A2))Pr(XA2(A2)) = q^2

2. If two markers X and X′ in a population are in H-W equilibrium, they are uncorrelated, i.e. Corr(X, X′) = 0.

Pr(AiBj, AkBl) = Pr(AiBj)Pr(AkBl) = Pr(AiAk)Pr(BjBl)

⇒ Corr(Xa,A, Xa,B) = 0 OR Corr(Xd,A, Xd,B) = 0

Pr(XA1B1(A1B1), XA1B1(A1B1)) ≠ Pr(XA1B1(A1B1))Pr(XA1B1(A1B1))

Pr(AiBj, AkBl) ≠ Pr(AiBj)Pr(AkBl) ⇒ Corr(Xa,A, Xa,B) ≠ 0

Pr(AiBj, AkBl) = Pr(AiBj)Pr(AkBl) ⇒ Corr(Xa,A, Xa,B) = 0

Corr(Xa,A, Xa,B) ≠ 0

Corr(Xa,A, Xa,B) ≠ 1

Corr(Xa,A, Xa,B) = 1 AND Corr(Xd,A, Xd,B) = 1

where the former property is a consequence of the independent segregation of chromosomes into gametes and the latter is a property of genotypes that are on distinct chromosomes or genotypes that are "far apart" on a chromosome, such that the probability of a recombination event between them each generation is one-half. So, from our discussion above, independent assortment of chromosomes, random mating, and recombination tends
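The D′ and r definitions above can be computed from the four haplotype frequencies. A minimal Python sketch using the standard Lewontin normalization for D′ (the example frequencies are illustrative and `ld_stats` is a hypothetical helper name, not course code):

```python
import math

def ld_stats(f11, f12, f21, f22):
    """D, D', and r from haplotype frequencies f_ij = Pr(AiBj) (must sum to 1)."""
    pA1, pA2 = f11 + f12, f21 + f22      # allele frequencies at marker A
    pB1, pB2 = f11 + f21, f12 + f22      # allele frequencies at marker B
    D = f11 * f22 - f12 * f21            # equivalently Pr(A1B1) - Pr(A1)Pr(B1)
    if D >= 0:
        Dmax = min(pA1 * pB2, pA2 * pB1)
    else:
        Dmax = min(pA1 * pB1, pA2 * pB2)
    Dprime = D / Dmax if Dmax > 0 else 0.0
    r = D / math.sqrt(pA1 * pA2 * pB1 * pB2)
    return D, Dprime, r

# Moderate LD: A1B1 and A2B2 haplotypes over-represented.
D, Dp, r = ld_stats(0.4, 0.1, 0.1, 0.4)
print(round(D, 3), round(Dp, 3), round(r, 3))  # 0.15 0.6 0.6
```

In the perfect-LD case from the slides (only A1B1 and A2B2 present), the same function returns D′ = 1 and r = 1, matching the |corr| = 1 statement in the notes.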
Pr(AiBj, AkBl) = Pr(AiAk)Pr(BjBl)

Pr(AiBj, AkBl) ≠ Pr(AiAk)Pr(BjBl)

⇒ Corr(Xa,A, Xa,B) ≠ 0 OR Corr(Xd,A, Xd,B) ≠ 0

Corr(Xa,A, Xa,B) ≠ 1 AND Corr(Xd,A, Xd,B) ≠ 1
Side topic: connecting coin flip models to alleles / genotypes
• Recall the one coin flip example (how does the parameter of the Bernoulli relate to the MAF?):
• The following model for two coin flips maps perfectly onto the model of genotypes (e.g., represented as the number of A1 alleles) under Hardy-Weinberg equilibrium (e.g., for MAF = 0.5):
• Note that the model need not conform to H-W; consider the following alternative model (we could use a multinomial probability distribution):
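The mapping from two Bernoulli "allele flips" to Hardy-Weinberg genotype frequencies can be made explicit in a few lines of Python (a sketch; the choice p = 0.5 matches the slide's MAF = 0.5 example):

```python
from itertools import product

# Each allele copy is a Bernoulli draw (1 = A1 with probability p). A genotype
# under random mating is the sum of two independent draws, i.e. Binomial(2, p),
# which gives the Hardy-Weinberg frequencies q^2, 2pq, p^2 for 0, 1, 2 copies.
p = 0.5  # allele frequency from the slide's MAF = 0.5 example
probs = {0: 0.0, 1: 0.0, 2: 0.0}
for a1, a2 in product([0, 1], repeat=2):
    pr = (p if a1 else 1 - p) * (p if a2 else 1 - p)
    probs[a1 + a2] += pr

print(probs)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

This is exactly the "two coin flips" distribution from earlier lectures, re-read as genotype frequencies: the heterozygote gets probability 2pq because two orderings (A1A2 and A2A1) produce it.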
Pr(X = x) = Pr(X1 = x1, X2 = x2, ..., Xn = xn) = PX(x) or fX(x)

MLE(p̂) = (1/n) Σ_{i=1}^n xi   (8)

MLE(µ̂) = x̄ = (1/n) Σ_{i=1}^n xi   (9)

A1 → A2 ⇒ ΔY|Z   (10)

gi = AjAk   (11)

2.1 = 0.3 + (0)(−0.2) + (1)(1.1) + 0.7   (12)

SSE = Σ_{i=1}^n (yi − ŷi)^2   (13)

HA: µAjAk ≠ µAlAm   (14)

Y = β′0 + X′a β′a + X′d β′d + ε   (15)

σ^2 = 1   (16)

ε   (17)

Ω = {H, T}   (18)

X(H) = 0, X(T) = 1   (19)
The differences among different models in a particular family therefore simply depend on the specific values of the parameters.

To make this concept more concrete, let's first consider the probability model for a discrete random variable that can take only one of two values 0 or 1 (which could represent "Heads" or "Tails" for a coin sample space of "one flip"). In this case, our specific probability model is the Bernoulli distribution, which is a function of a single parameter p:

Pr(X = x|p) = PX(x|p) = p^x (1 − p)^(1−x)   (1)

Note that we use a conditional notation, since the specific probability model depends on the value of the constant, e.g. a "fair coin" probability model is a case where p = 0.5. The parameter p can take values from [0, 1], so in our parameter notation, we have θ = p and Θ = [0, 1]. We will often use the following shorthand X ~ Bern(p) to indicate a random variable that has a Bernoulli distribution.

Let's now introduce a second probability model that we could use to model our random variable describing the "number of Tails" for our sample space of "two coin flips" S = {HH, HT, TH, TT}. Recall that this random variable had the following structure: X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2. We can simply represent this random variable as a function of two random variables X1 ~ Bern(p) and X2 ~ Bern(p) if we set X = X1 + X2. More generally, we could do this for a sample space for n flips of a coin if we set X = Σ_{i=1}^n Xi. In this case, the probability model for X is a binomial distribution:

Pr(X = x|n, p) = PX(x|n, p) = (n choose x) p^x (1 − p)^(n−x)   (2)

which technically has two parameters (n, p) but we often consider sets of probability models indexed by p for a specific n, i.e. we only consider the parameter p. For example, in our two flip case, we have n = 2 and for these two flips, we can define a number of models including the "fair coin" model p = 0.5. Note that if you are unfamiliar with "choose" notation, it is defined as follows:

(n choose x) = n! / (x!(n − x)!)   (3)

n! = n × (n − 1) × (n − 2) × ... × 1   (4)

which intuitively accounts for the different orderings that lead to the same number of "Tails", e.g. in the two flip case, the orderings HT and TH produce the same number of Tails. We use the following shorthand for the Binomial distribution: X ~ Bin(n, p).

Other important discrete distributions include the Hypergeometric, Geometric, and Poisson. We will discuss the former when we consider Fisher's exact test. While we will not
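The binomial pmf of equations (2)-(4) can be written directly (a Python sketch; `binom_pmf` is an illustrative helper name):

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr(X = x | n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Fair-coin model for "number of Tails" in two flips (n = 2, p = 0.5):
print([binom_pmf(x, 2, 0.5) for x in range(3)])  # [0.25, 0.5, 0.25]
```

Note that `math.comb` implements exactly the "choose" notation of equation (3), so the x = 1 term gets the factor of 2 that accounts for the HT and TH orderings.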
3 Discrete random variables

To make the concept of a random variable more clear, let's begin by considering discrete random variables, where just as with discrete sample spaces, we assume that we can enumerate the values that the random variable can take, i.e. they take specific values we can count such as 0, 1, 2, etc. and cannot take any value within an interval (although note they can potentially take an infinite number of discrete states!). For example, for our sample space of two coin flips S = {HH, HT, TH, TT}, we can define a random variable X representing "number of Tails":

X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2   (3)

This is something useful we might want to know about our sample outcomes and now we can work with numbers as opposed to concepts like "HT".

Since we have defined a probability function and a random variable on the same sample space S, we can think of the probability function as inducing a probability distribution on the random variable. We will often represent probability distributions using PX(x) or Pr(X = x), where the lower case "x" indicates the specific value taken by the random variable X. For example, if we define a "fair coin" probability model for our two flip sample space:

Pr(HH) = Pr(HT) = Pr(TH) = Pr(TT) = 0.25   (4)

given this probability model and the random variable defined in equation (3), we now have the following probability distribution for X:

PX(x) = Pr(X = x) = { Pr(X = 0) = 0.25; Pr(X = 1) = 0.5; Pr(X = 2) = 0.25 }   (5)

where, again, we use lower case x to indicate a specific realization of the random variable X. Note that it is implicit that a probability of zero is assigned to every other value of X. Here, we have also introduced the notation PX(x) to indicate that this probability distribution is a probability mass function or "pmf", i.e. a probability distribution for a discrete random variable. This is to distinguish it from a probability distribution defined on a continuous random variable, which we will see is slightly different conceptually. Intuitively, the "mass" part of this description can be seen when plotting this probability distribution with the value taken by X on the X-axis and the probability on the Y-axis (see plot from class). In this case the "mass" of the probability is located at three points: 0, 1, and 2.

Now that we have introduced a pmf, let's consider a related concept: the cumulative mass function or "cmf". When first introduced, it is not clear why we need to define cmf's. However, it turns out that cmf's play an important role in probability theory and statistics,
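The cmf introduced at the end of this excerpt is just the running sum of the pmf in equation (5). A minimal Python sketch:

```python
from itertools import accumulate

# pmf of "number of Tails" in two fair flips, and its cumulative mass function
# F(x) = Pr(X <= x), built by accumulating the pmf values.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
cmf = dict(zip(pmf, accumulate(pmf.values())))
print(cmf)  # {0: 0.25, 1: 0.75, 2: 1.0}
```

The cmf is non-decreasing and ends at 1.0, which is what makes it useful for computing tail probabilities such as p-values later in the course.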
For the specific probability function and random variables we have defined, this producesthe following PX1,X2(x1, x2):
Pr(X1 = 0, X2 = 0) = 0.0, P r(X1 = 0, X2 = 1) = 0.25Pr(X1 = 1, X2 = 0) = 0.25, P r(X1 = 1, X2 = 1) = 0.25Pr(X1 = 2, X2 = 0) = 0.25, P r(X1 = 2, X2 = 1) = 0.0
(18)
where Pr(X1 = x1, X2 = x2) = Pr(X1 โคX2), etc. We can also write this using our tablenotation:
X2 = 0 X2 = 1X1 = 0 0.0 0.25 0.25X1 = 1 0.25 0.25 0.5X1 = 2 0.25 0.0 0.25
0.5 0.5
Note that with this table we have also written out the marginal pdfโs of X1 and X2,which are just the pdfโs of X1 and X2: PX1(x1) = {Pr(X1 = 0) = 0.25, P r(X1 = 1) =0.5, P r(X1 = 2) = 0.25} and PX2(x2) = {Pr(X2 = 0) = 0.5, P r(X2 = 1) = 0.5}.
Just as we defined conditional probabilities for subsets of a sample space S for which we have defined a probability function Pr(S), we can similarly define the conditional probabilities of random variables:

Pr(X1|X2) = Pr(X1 ∩ X2) / Pr(X2)    (19)
such that we have, for example:

Pr(X1 = 0|X2 = 1) = Pr(X1 = 0 ∩ X2 = 1) / Pr(X2 = 1) = 0.25 / 0.5 = 0.5    (20)
Note that we can in fact use random variables as a means to define sample space subsets, so the concepts of conditional probability defined for sample spaces and for joint random variables are interchangeable.
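The calculation in equation (20) can be sketched directly from the joint pmf table; this is an illustrative snippet (the function name `pr_x1_given_x2` is ours, not from the notes).

```python
# Joint pmf from the table above.
joint = {
    (0, 0): 0.0,  (0, 1): 0.25,
    (1, 0): 0.25, (1, 1): 0.25,
    (2, 0): 0.25, (2, 1): 0.0,
}

def pr_x1_given_x2(x1, x2):
    """Conditional probability Pr(X1 = x1 | X2 = x2) = Pr(X1 ∩ X2) / Pr(X2)."""
    pr_x2 = sum(p for (a, b), p in joint.items() if b == x2)  # marginal of X2
    return joint[(x1, x2)] / pr_x2

print(pr_x1_given_x2(0, 1))  # 0.5, matching equation (20)
```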
We can similarly define an (interchangeable) concept of independent random variables. Note that our current X1 and X2 are not independent, since:

Pr(X1 = 0 ∩ X2 = 1) = 0.25 ≠ Pr(X1 = 0)Pr(X2 = 1) = 0.25 × 0.5 = 0.125    (21)
and for random variables to be independent, all possible combinations of outcomes must adhere to the definition of independence. To provide an example of random variables that
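The exhaustive independence check just described (every combination of outcomes must satisfy the product rule) can be sketched as follows; this is an illustrative snippet, with names of our choosing.

```python
# Joint pmf from the table above.
joint = {
    (0, 0): 0.0,  (0, 1): 0.25,
    (1, 0): 0.25, (1, 1): 0.25,
    (2, 0): 0.25, (2, 1): 0.0,
}
# Marginal pmf's obtained by summing over the other variable.
marg1 = {a: sum(p for (x, y), p in joint.items() if x == a) for a in (0, 1, 2)}
marg2 = {b: sum(p for (x, y), p in joint.items() if y == b) for b in (0, 1)}

# Independence requires Pr(X1 = a ∩ X2 = b) == Pr(X1 = a) * Pr(X2 = b)
# for EVERY combination of outcomes, not just one.
independent = all(
    abs(joint[(a, b)] - marg1[a] * marg2[b]) < 1e-12
    for a in marg1 for b in marg2
)
print(independent)  # False: e.g. 0.25 != 0.25 * 0.5 at (x1 = 0, x2 = 1)
```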
The differences among different models in a particular family therefore simply depend on the specific values of the parameters.
To make this concept more concrete, let's first consider the probability model for a discrete random variable that can take only one of two values, 0 or 1 (which could represent "Heads" or "Tails" for a coin sample space of "one flip"). In this case, our specific probability model is the Bernoulli distribution, which is a function of a single parameter p:
Pr(X = x|p) = P_X(x|p) = p^x (1 − p)^(1−x)    (1)
Note that we use a conditional notation, since the specific probability model depends on the value of the constant, e.g. a "fair coin" probability model is the case where p = 0.5. The parameter p can take values in [0, 1], so in our parameter notation, we have θ = p and Θ = [0, 1]. We will often use the shorthand X ∼ Bern(p) to indicate a random variable that has a Bernoulli distribution.
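Equation (1) is compact enough to sketch directly; the function name `bern_pmf` below is illustrative, not from the notes.

```python
def bern_pmf(x, p):
    """Bernoulli pmf of equation (1): Pr(X = x | p) = p^x * (1 - p)^(1 - x)."""
    assert x in (0, 1) and 0.0 <= p <= 1.0
    return p**x * (1 - p)**(1 - x)

# The "fair coin" model, p = 0.5, assigns equal mass to both outcomes:
print(bern_pmf(0, 0.5))  # 0.5
print(bern_pmf(1, 0.5))  # 0.5
# A biased coin, e.g. p = 0.7:
print(bern_pmf(1, 0.7))  # 0.7
```

Note how the exponents simply select p when x = 1 and (1 − p) when x = 0.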
Let's now introduce a second probability model that we could use to model our random variable describing the "number of Tails" for our sample space of "two coin flips" S = {HH, HT, TH, TT}. Recall that this random variable had the following structure: X(HH) = 0, X(HT) = 1, X(TH) = 1, X(TT) = 2. We can simply represent this random variable as a function of two random variables X1 ∼ Bern(p) and X2 ∼ Bern(p) if we set X = X1 + X2. More generally, we could do this for a sample space of n flips of a coin if we set X = Σ_{i=1}^n X_i. In this case, the probability model for X is a binomial distribution:
Pr(X = x|n, p) = P_X(x|n, p) = (n choose x) p^x (1 − p)^(n−x)    (2)
which technically has two parameters (n, p), but we often consider sets of probability models indexed by p for a specific n, i.e. we only consider the parameter p. For example, in our two flip case, we have n = 2 and, for these two flips, we can define a number of models including the "fair coin" model p = 0.5. Note that if you are unfamiliar with "choose" notation, it is defined as follows:

(n choose x) = n! / (x! (n − x)!)    (3)

n! = n × (n − 1) × (n − 2) × ... × 1    (4)
which intuitively accounts for the different orderings that lead to the same number of "Tails", e.g. in the two flip case, the orderings HT and TH produce the same number of Tails. We use the following shorthand for the Binomial distribution: X ∼ Bin(n, p).
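Equations (2)-(4) can be sketched together; Python's standard library `math.comb` computes the "choose" coefficient of equation (3), and the name `binom_pmf` is ours, not from the notes.

```python
from math import comb  # comb(n, x) = n! / (x! * (n - x)!), equations (3)-(4)

def binom_pmf(x, n, p):
    """Binomial pmf of equation (2): Pr(X = x | n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Two flips of a fair coin (n = 2, p = 0.5). Pr(X = 1) is twice the Bernoulli
# product because the orderings HT and TH both give one Tail:
print(binom_pmf(0, 2, 0.5))  # 0.25
print(binom_pmf(1, 2, 0.5))  # 0.5
print(binom_pmf(2, 2, 0.5))  # 0.25
```

These three values reproduce the pmf of the "number of Tails" random variable defined on S = {HH, HT, TH, TT} above.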
Other important discrete distributions include the Hypergeometric, Geometric, and Poisson. We will discuss the first of these when we consider Fisher's exact test. While we will not
Thatโs it for today
โข See you Tues.!