A SIMULATION STUDY FOR BAYESIAN HIERARCHICAL MODEL SELECTION METHODS
Fang Fang
A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment
of the Requirements for the Degree of Master of Science
Department of Mathematics and Statistics
University of North Carolina Wilmington
2009
Approved by
Advisory Committee
Chair
Accepted by
Dean, Graduate School
TABLE OF CONTENTS
ABSTRACT
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
1 INTRODUCTION
2 BACKGROUND
  2.1 BAYESIAN STATISTICS
  2.2 HIERARCHICAL BAYESIAN MODEL
  2.3 GIBBS SAMPLER
  2.4 PRIOR DISTRIBUTION
  2.5 POSTERIOR DISTRIBUTION
3 TWO MODEL SEARCH METHODS
  3.1 ACTIVATION PROBABILITY
  3.2 MODEL SEARCH BY SYSTEMATIC PROCESS
  3.3 MODEL SEARCH BY STOCHASTIC PROCESS
4 SIMULATION
  4.1 DATA SET
  4.2 RESULTS
5 CONCLUSION
REFERENCES
APPENDIX
  A. EXAMPLE OF USING R CODE TO GENERATE SIMULATED DATA
  B. SYSTEMATIC SEARCH METHOD
  C. STOCHASTIC SEARCH METHOD
ABSTRACT
Model selection is a useful method for determining the important features that control a
response. This thesis explores two model selection strategies in a hierarchical setting.
The first method is the systematic search proposed in Haikun Bao's thesis (2006),
and the second is a stochastic search proposed in Yi Chen's thesis (2007) and
further developed by UNCW student Qijun Fang (2009). An intensive simulation
study investigates the usefulness of these two methodologies under a number of
situations. Both methods appear to identify important features in the study, but the
stochastic search produces slightly better results.
DEDICATION
I would like to dedicate this thesis to my family, especially my husband Qijun
Fang, whose love and support have enabled me to do more than I thought possible.
ACKNOWLEDGMENTS
I would like to thank my advisor, Dr. Susan Simmons, from the bottom of my
heart. Without her great insight, guidance, patience, and encouragement all the way
through my study and research over the past two and a half years, this would never
have been possible.
I would also like to thank all the faculty members, staff, and fellow students in
the Department of Mathematics and Statistics for all kinds of help during my studies
here.
LIST OF TABLES
1 Marker Information (X Matrix)
2 Quantitative Trait Information (Y Matrix)
3 Result of one QTL detection
4 Result of two QTLs detection (same chromosome)
5 Result of two QTLs detection (different chromosomes)
6 Result of three QTLs detection
7 Result of four, five, and six QTLs detection
LIST OF FIGURES
1 Structure of hierarchical model
2 Detect QTL by systematic method
3 Model vectors inside a Markov chain
4 Genetic map for Bay×Sha
1 INTRODUCTION
In the published literature, there exist many methods to identify potentially useful
models. For example, in many regression books, forward selection, stepwise forward
selection, backward elimination, and best subsets are just a few of the methods
discussed (Kutner et al.).
However, the choice of the best model for the data is still a question of interest.
Model selection is the task of choosing the best statistical model for a given data
set from a group of potential models. By choosing the best model for the data,
the most important features that control the response are identified. One example
is the quantitative trait loci (QTL) experiment, in which researchers often need to
identify the locations, or loci, on a genetic map responsible for controlling a
quantitative trait from many possible potential markers. The markers responsible for
controlling the response or quantitative trait are referred to as QTL. QTL greatly help
researchers understand the biochemical basis of these traits and their evolution in
populations over time (Bao, 2006). However, if a bad model is selected, then inference
based on the data and the model will usually lead to erroneous conclusions and can
potentially lead researchers in the wrong direction.
Therefore, one must be careful when attempting to discover the "best" model for
the data. But the word "best" can be somewhat controversial. Usually, the greater
the number of features the model has, the better that model fits the data. However,
the complexity of the model increases as well. The more potential features there are
for a data set, the more possible models there are. For example, when there are P
features, there are a total of 2^P possible models. For large P, the above mentioned
methodologies may lead to numerous different "best" models.
There are a number of frequentist and Bayesian model selection procedures
available to identify the "best" model (Ibrahim et al.). Several criterion-based procedures
have been proposed and are frequently used. Among the most widely used criteria
are Akaike's Information Criterion (Akaike, 1973), Schwarz's Bayesian Information
Criterion (Schwarz, 1978), Bedrick and Tsai's (1994) modification of AIC, and the
simultaneous test procedures for model selection in the multivariate linear model
discussed by Gabriel (1968) and McKay (1977). This thesis investigates two different
model search strategies. Both of these strategies are useful in hierarchical structures.
Hierarchical models are useful in modeling complex data structures. For example,
Dominici, Samet, and Zeger (2000) used a hierarchical approach to combine dose-response
estimates characterizing the health effects of air pollution in a series of U.S.
cities, drawing on results from Daniel and Kass (1988) to argue that the approach
gives similar results to a full analysis based on raw data when the study-specific
sample sizes are moderate to large. Simmons, Piegorsch, Nitcheva, and Zeiger (2003)
used Bayesian hierarchical models to synthesize information from multiple studies
in an environmental mutagenesis context. Coull, Mezzetti, and Ryan used a Bayesian
hierarchical model to quantify the adverse health effects associated with in-utero
exposure to methylmercury. Bayesian hierarchical models are flexible and are able
to handle many features.
This thesis compares two model search strategies developed by graduate students
at UNCW for hierarchical models. One method involves a systematic
search (Boone et al.) and the other involves a stochastic search (Simmons et al.).
Section 2 discusses the background information for the hierarchical model. Section
3 introduces the two model search strategies, and Section 4 outlines the simulation
study. Section 5 is the conclusion of this study.
2 BACKGROUND
2.1 BAYESIAN STATISTICS
Unlike frequentists, who take parameters as fixed but unknown quantities, Bayesian
statisticians regard parameters as random variables with a probability distribution.
The process of Bayesian data analysis can be illustrated by the following three steps:
1. Setting up a full probability model for all observable and unobservable data
which is consistent with knowledge about the problem and the data collection
processes.
2. Conditioning on observed data, calculating and interpreting the appropriate
posterior distribution: the conditional probability distribution of the unobserved
quantities of ultimate interest after the data are observed.
3. Evaluating the fit of the model and the implications of the resulting posterior
distribution. Test whether the model fits the data well, check whether the
substantive conclusions are reasonable, and see how sensitive the results are to
the modeling assumptions. If necessary, one can alter or expand the model and
repeat the three steps.
Bayesians first set up a prior distribution for the parameters, based on prior
knowledge and/or expert advice. This information is then combined with the
observed data to obtain the posterior distribution by Bayes' Theorem. Assume θ is
an unknown parameter and p(θ) is its prior probability distribution. The distribution
of the data is represented as p(y|θ). Using the prior probability distribution and the
information from the data, the posterior probability can be found by Bayes' Theorem:
\[
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta)\, p(\theta)\, d\theta} \tag{1}
\]
The Bayesian analysis is then based on inferences from the posterior distribution [1][2][3].
2.2 HIERARCHICAL BAYESIAN MODEL
The hierarchical Bayesian model is flexible enough to deal with situations in which
the observed data have multiple levels, such as QTL detection in a plant experiment.
Suppose the quantitative trait in the plant QTL experiment is represented as y_ij,
with i = 1, ..., L (L = number of plant lines or genotypes) and j = 1, ..., n_i
(n_i = number of replicates). In the first level, the observed data from the ith
genotype are assumed to be independently distributed with mean θ_i and variance σ_i².
The probability density function of the data given θ_i and σ_i² is written as
p(y_ij | θ_i, σ_i²), which is called the likelihood function. In the second level, the
mean θ_i is conditional on further parameters, which are known as hyper-parameters.
Therefore, the Bayesian hierarchical model constructs a relationship between multiple
parameters in a layered data structure [4].
Figure 1 illustrates the structure of the data in a plant QTL experiment. The
data y_ij are obtained from a distribution with mean θ_i, which depends on the
hyper-parameters β and τ².
Figure 1: Structure of hierarchical model
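To make the layered structure concrete, here is a minimal R sketch (not the thesis code; the design matrix, effect sizes, and variances below are illustrative assumptions) that generates data with exactly the two-level structure of Figure 1:

set.seed(42)
L <- 8; n_i <- rep(4, L)                       # 8 lines, 4 replicates each
X <- matrix(sample(c(-0.5, 0.5), L * 2, replace = TRUE), L, 2)  # toy marker matrix
beta <- c(1, 3); tau2 <- 0.25                  # assumed hyper-parameters
# first level of Figure 1: line means theta_i ~ N(X_i beta, tau^2)
theta <- rnorm(L, mean = drop(X %*% beta), sd = sqrt(tau2))
sigma2 <- rep(1, L)                            # within-line variances
# second level: replicates y_ij ~ N(theta_i, sigma_i^2)
y <- lapply(1:L, function(i) rnorm(n_i[i], theta[i], sqrt(sigma2[i])))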
2.3 GIBBS SAMPLER
The Gibbs sampler is a particular Markov chain Monte Carlo algorithm which has
been found useful in many multidimensional problems. In this thesis, we use the
Gibbs sampler to obtain samples from the joint posterior distribution. In each
iteration of the Gibbs sampler, we draw each parameter conditional on the
values of all the other parameters. At each iteration t, the parameters θ, σ², β, and τ² are
sampled and updated conditional on the latest values of the other parameters:
\[ \beta^{(t+1)} \sim P(\beta \mid \theta^{(t)}, \sigma^{2(t)}, \tau^{2(t)}, y) \tag{2} \]
\[ \tau^{2(t+1)} \sim P(\tau^2 \mid \theta^{(t)}, \sigma^{2(t)}, \beta^{(t+1)}, y) \tag{3} \]
\[ \theta^{(t+1)} \sim P(\theta \mid \sigma^{2(t)}, \tau^{2(t+1)}, \beta^{(t+1)}, y) \tag{4} \]
\[ \sigma^{2(t+1)} \sim P(\sigma^2 \mid \theta^{(t+1)}, \tau^{2(t+1)}, \beta^{(t+1)}, y) \tag{5} \]
In addition, in order to generate a Gibbs sequence containing samples from
the stationary joint posterior distribution, we use the following initial values:
1. θ^(0): the sample average of the observed data in the ith line
2. σ^2(0): the sample variance of the observed data in the ith line
3. β^(0): estimates from a regression model based on the marker origin information
matrix
4. τ^2(0): the variance between the sample means
Now we are able to start the Gibbs sampler from these starting points to obtain
samples for θ, σ², β, and τ² by updating these parameters sequentially from their full
conditional distributions. The four full conditional distributions are:
1. To obtain samples of θ from its distribution conditional on the other
parameters, p(θ | σ², β, τ², y), we have
\[
\begin{aligned}
p(\theta \mid \sigma^2, \beta, \tau^2, y)
&= \frac{p(\theta, \sigma^2, \beta, \tau^2 \mid y)}{p(\sigma^2, \beta, \tau^2 \mid y)}
 = \frac{p(\theta \mid X\beta, \tau^2)\, p(y \mid \theta, \sigma^2)}{\int p(\theta \mid X\beta, \tau^2)\, p(y \mid \theta, \sigma^2)\, d\theta} \\
&\propto \exp\!\Big[-\frac{1}{2\tau^2}(\theta - X\beta)'(\theta - X\beta) - \sum_{i=1}^{L}\sum_{j=1}^{n_i}\frac{1}{2\sigma_i^2}(y_{ij} - \theta_i)^2\Big] \\
&\propto \exp\!\Big\{\sum_{i=1}^{L}\Big[-\frac{1}{2}\Big(\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}\Big)\theta_i^2 + \Big(\frac{X_i\beta}{\tau^2} + \frac{\sum_{j=1}^{n_i} y_{ij}}{\sigma_i^2}\Big)\theta_i\Big]\Big\} \\
&\propto \exp\!\Big\{\sum_{i=1}^{L} -\frac{1}{2}\Big(\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}\Big)\Big(\theta_i - \frac{X_i\beta/\tau^2 + \sum_{j=1}^{n_i} y_{ij}/\sigma_i^2}{1/\tau^2 + n_i/\sigma_i^2}\Big)^{\!2}\Big\}
\end{aligned}
\]
where X_i is the ith line of genotypes in the experiment. Therefore,
\[
\theta_i \mid \sigma^2, \tau^2, \beta, y \sim N\!\left[\frac{X_i\beta/\tau^2 + \sum_{j=1}^{n_i} y_{ij}/\sigma_i^2}{1/\tau^2 + n_i/\sigma_i^2},\ \frac{1}{1/\tau^2 + n_i/\sigma_i^2}\right] \tag{6}
\]
2. To obtain samples of σ² from its distribution conditional on the other
parameters, p(σ² | θ, β, τ², y), we have
\[
\begin{aligned}
P(\sigma^2 \mid \theta, \tau^2, \beta, y)
&= \frac{P(y \mid \theta, \sigma^2)\, P(\sigma^2)}{\int P(y \mid \theta, \sigma^2)\, P(\sigma^2)\, d\sigma^2} \\
&\propto \prod_{i=1}^{L} (\sigma_i^2)^{-\left(\frac{\sigma_0^2 + n_i}{2} + 1\right)} \exp\!\Big\{-\sum_{i=1}^{L}\frac{1}{2\sigma_i^2}\Big[\sum_{j=1}^{n_i}(y_{ij} - \theta_i)^2 + 1\Big]\Big\}
\end{aligned}
\]
so that
\[
\sigma_i^2 \mid \theta, \tau^2, \beta, y \sim \text{Inv-Gamma}\!\left[\frac{\sigma_0^2 + n_i}{2},\ \frac{\sum_{j=1}^{n_i}(y_{ij} - \theta_i)^2 + 1}{2}\right] \tag{7}
\]
3. To obtain samples of β from its distribution conditional on the other
parameters, p(β | θ, σ², τ², y), we have (using the prior β ∼ N(0, 100I))
\[
\begin{aligned}
P(\beta \mid \theta, \sigma^2, \tau^2, y)
&= \frac{P(\theta \mid \tau^2, \beta)\, P(\beta)}{\int P(\theta \mid \tau^2, \beta)\, P(\beta)\, d\beta} \\
&\propto \exp\!\Big\{-\frac{\beta'\beta}{200} - \frac{1}{2\tau^2}(\theta - X\beta)'(\theta - X\beta)\Big\} \\
&\propto \exp\!\Big\{-\frac{1}{2}\Big[\beta'\Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)\beta - \frac{2}{\tau^2}\theta'X\beta\Big]\Big\} \\
&\propto \exp\!\Big\{-\frac{1}{2}\Big[\beta - \Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)^{-1}\frac{X'\theta}{\tau^2}\Big]'\Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)\Big[\beta - \Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)^{-1}\frac{X'\theta}{\tau^2}\Big]\Big\}
\end{aligned}
\]
where I is the identity matrix of the same dimension as β. Hence
\[
\beta \mid \theta, \sigma^2, \tau^2, y \sim N\!\left[\Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)^{-1}\frac{X'\theta}{\tau^2},\ \Big(\frac{I}{100} + \frac{X'X}{\tau^2}\Big)^{-1}\right] \tag{8}
\]
4. To obtain samples of τ² from its distribution conditional on the other
parameters, p(τ² | θ, σ², β, y), we have
\[
\begin{aligned}
P(\tau^2 \mid \theta, \sigma^2, \beta, y)
&= \frac{p(\tau^2, \theta, \sigma^2, \beta \mid y)}{p(\theta, \sigma^2, \beta \mid y)}
 = \frac{P(\theta \mid \tau^2, \beta)\, P(\tau^2)}{\int P(\theta \mid \beta, \tau^2)\, P(\tau^2)\, d\tau^2} \\
&\propto (\tau^2)^{-\left(\frac{L + \tau_0^2}{2} + 1\right)} \exp\!\Big\{-\frac{[(\theta - X\beta)'(\theta - X\beta) + 1]/2}{\tau^2}\Big\}
\end{aligned}
\]
where τ_0² = 1 and L is the number of genotypes in the experiment, so that
\[
\tau^2 \mid \theta, \sigma^2, \beta, y \sim \text{Inv-Gamma}\!\left(\frac{L + \tau_0^2}{2},\ \frac{(\theta - X\beta)'(\theta - X\beta) + 1}{2}\right) \tag{9}
\]
In the QTL plant experiment, there are 38 candidate markers in the Bay-0×Shahdara
recombinant inbred lines from Arabidopsis thaliana, which means there are 2^38
possible regression models in the experiment. In this thesis, we consider a
much smaller subset of the model space, and for each model we run only 52,000
iterations of the Gibbs sampler due to the computational intensity. In order
to diminish the effect of a possibly bad starting distribution, the first 2,000
of the 52,000 iterations for each parameter are discarded. We assume that
the distribution of the simulated parameter values, for large enough iteration t,
converges to the stationary joint posterior distribution p(θ, σ², β, τ²|y), which is the
target distribution we are trying to simulate [5][6].
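As an illustration, the following is a minimal R sketch of the four conditional draws (6)-(9) on a small simulated data set; it is not the thesis's Fortran implementation (Appendices B and C), and the design, the prior constants, and the sweep order (here (6)-(9); any fixed order gives a valid Gibbs sweep) are illustrative assumptions:

set.seed(1)
L <- 20; M <- 3; n_i <- rep(5, L)                    # toy dimensions
X <- matrix(sample(c(-0.5, 0.5), L * M, replace = TRUE), L, M)
theta_true <- drop(X %*% c(2, 0, -1))
y <- lapply(1:L, function(i) rnorm(n_i[i], theta_true[i], 1))
sigma0sq <- 0.5; tau0sq <- 1                         # assumed prior constants
theta  <- sapply(y, mean)                            # initial values (Section 2.3)
sigma2 <- sapply(y, var)
tau2   <- var(theta)
beta   <- coef(lm(theta ~ X - 1))                    # no intercept, for simplicity
draws <- matrix(NA_real_, 5000, M)
for (t in 1:5000) {
  Xb <- drop(X %*% beta); ybar <- sapply(y, mean)
  prec <- 1 / tau2 + n_i / sigma2                    # (6): theta_i | . ~ Normal
  theta <- rnorm(L, (Xb / tau2 + n_i * ybar / sigma2) / prec, sqrt(1 / prec))
  ss <- mapply(function(yi, th) sum((yi - th)^2), y, theta)
  sigma2 <- 1 / rgamma(L, shape = (sigma0sq + n_i) / 2, rate = (ss + 1) / 2)  # (7)
  A <- diag(M) / 100 + crossprod(X) / tau2           # (8): beta | . ~ Normal
  Ainv <- solve(A)
  beta <- drop(Ainv %*% crossprod(X, theta) / tau2 + t(chol(Ainv)) %*% rnorm(M))
  r <- theta - drop(X %*% beta)                      # (9): tau^2 | . ~ Inv-Gamma
  tau2 <- 1 / rgamma(1, shape = (L + tau0sq) / 2, rate = (sum(r^2) + 1) / 2)
  draws[t, ] <- beta
}
colMeans(draws[-(1:1000), ])                         # posterior means after burn-in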
2.4 PRIOR DISTRIBUTION
A reasonable prior distribution can be set up from the given information
and knowledge, and in order to simplify the results we try to use a conjugate prior
distribution. Conjugacy means the property that the posterior distribution follows
the same parametric form as the prior distribution [6]. In many cases, a conjugate
prior distribution puts the posterior distribution in analytic form. Here is
an example to illustrate this:
Suppose the data are obtained from a Poisson distribution, so the likelihood
function is of the form p(y|θ) = θ^y e^{−θ}/y!. The conjugate prior for this distribution
is the Gamma distribution, p(θ) = Gamma(α, β), and the posterior distribution is of
the form:
\[
\begin{aligned}
p(\theta \mid y) &= \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta)\, p(\theta)\, d\theta}
 = \frac{\dfrac{\theta^{y} e^{-\theta}}{y!}\cdot\dfrac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}}{\displaystyle\int_0^{\infty}\dfrac{\theta^{y} e^{-\theta}}{y!}\cdot\dfrac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}\, d\theta} \\
&= \frac{\theta^{y+\alpha-1} e^{-\theta(1+1/\beta)}}{\int_0^{\infty}\theta^{y+\alpha-1} e^{-\theta(1+1/\beta)}\, d\theta}
 = \frac{\theta^{y+\alpha-1} e^{-\theta(1+1/\beta)}}{\Gamma(y+\alpha)\left(\frac{\beta}{\beta+1}\right)^{y+\alpha}}
\end{aligned}
\]
We can see the posterior distribution is Gamma: p(θ|y) ∼ Gamma(y + α, β/(β + 1)).
2.5 POSTERIOR DISTRIBUTION
The posterior distribution summarizes the current state of knowledge about all
the uncertain quantities (including unobservable parameters and also missing, latent,
and unobserved potential data) in a Bayesian analysis. Analytically, the posterior
density is proportional to the product of the prior density and the likelihood function.
Now, let us derive the posterior distribution of the parameters, p(β, θ, σ², τ²|y):
\[
p(\beta, \theta, \sigma^2, \tau^2 \mid y) = \frac{p(y \mid \beta, \theta, \sigma^2, \tau^2)\, p(\beta, \theta, \sigma^2, \tau^2)}{\int\!\!\int\!\cdots\!\int p(y \mid \beta, \theta, \sigma^2, \tau^2)\, p(\beta, \theta, \sigma^2, \tau^2)\, d\theta\, d\sigma^2\, d\beta\, d\tau^2} \tag{10}
\]
where
\[
\int\!\!\int\!\cdots\!\int p(y \mid \beta, \theta, \sigma^2, \tau^2)\, p(\beta, \theta, \sigma^2, \tau^2)\, d\theta\, d\sigma^2\, d\beta\, d\tau^2 = p(y) = p(D \mid M) \tag{11}
\]
This is the probability of the data given the model, which is of great interest, and we
use the Monte Carlo method to estimate the integral. The quantity p(D|M) can be
approximated by averaging the product of p(y|β, θ, σ², τ²) and p(β, θ, σ², τ²) after
substituting samples obtained from the Gibbs sampler [6]:
\[
\int\!\!\int\!\cdots\!\int p(y \mid \beta, \theta, \sigma^2, \tau^2)\, p(\beta, \theta, \sigma^2, \tau^2)\, d\theta\, d\sigma^2\, d\beta\, d\tau^2 \approx \frac{1}{N} \sum p(y \mid \beta, \theta, \sigma^2, \tau^2)\, p(\beta, \theta, \sigma^2, \tau^2) \tag{12}
\]
where N is the number of iterations in the Gibbs sampler; N = 50,000 in the QTL
experiment, after the first 2,000 iterations have been discarded.
Our ultimate goal is to obtain posterior probabilities for the models:
\[
p(M_i \mid D) = \frac{p(D \mid M_i)\, p(M_i)}{\sum_{i=1}^{K} p(D \mid M_i)\, p(M_i)} \tag{13}
\]
Here p(M_i) is the prior probability of the ith model, and since we have no prior
knowledge of which model is best, we assign equal prior probability to each model.
After simplification we obtain the posterior probability of each model as
\[
p(M_i \mid D) = \frac{p(D \mid M_i)}{\sum_{i=1}^{K} p(D \mid M_i)} \tag{14}
\]
where K is the total number of models considered.
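In practice the estimates of p(D|M_i) can be extremely small, so it is safer to work on the log scale before normalizing (the thesis's Fortran code applies a similar adjustment constant before exponentiating). A small sketch of (14) with hypothetical log-marginal-likelihood values:

loglik <- c(-102.3, -98.7, -99.1)       # hypothetical log p(D|M_i) values
w <- exp(loglik - max(loglik))          # rescale before exponentiating
post_model <- w / sum(w)                # equation (14)
round(post_model, 3)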
3 TWO MODEL SEARCH METHODS
The observed quantitative traits in the plant QTL experiment are represented
as y_ij, with i = 1, ..., L (L = number of plant lines or genotypes) and j = 1, ..., n_i
(n_i = number of replicates). The true mean of y_ij for genotype i is represented as θ_i,
and we assume y_ij ∼ N(θ_i, σ_i²). Each θ_i is assumed to depend linearly on the
genetic composition of the plant, which can be expressed as
\[
\theta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_M x_{iM} + \varepsilon_i \tag{15}
\]
where x_{im} = 0.5 if the mth marker is from parent A, x_{im} = -0.5 if the marker is from
parent B, and x_{im} = 0 if the information for this marker was lost.
3.1 ACTIVATION PROBABILITY
After finding the posterior probabilities for each model, we need to determine
which marker or markers are most important. We answer this by finding activation
probabilities for each marker. The activation probability for the jth marker is defined
as
\[
p(\beta_j \neq 0 \mid D) = \sum_{i=1}^{K} p(\beta_j \neq 0 \mid M_i, D)\, p(M_i \mid D) \tag{16}
\]
where K is the total number of models and M_i is the ith model. By Bayesian
model averaging [8], β_j ≠ 0 depends on whether the jth marker is included in
the model. That is,
P(β_j ≠ 0 | M_i, D) = 1 if the jth marker is in the ith model
P(β_j ≠ 0 | M_i, D) = 0 if the jth marker is not in the ith model
Using the activation probabilities, we can detect which markers have a significant
effect on the plant QTL. In our experiment, we use 0.5 as the threshold for a
marker's activation probability, as sketched below.
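A minimal R sketch of (16), assuming hypothetical model vectors and posterior model probabilities:

models <- rbind(c(1, 1, 0, 0, 1),     # hypothetical model vectors M_i
                c(0, 1, 0, 0, 1),
                c(1, 1, 1, 0, 1))
post <- c(0.5, 0.3, 0.2)              # hypothetical p(M_i | D)
activation <- colSums(models * post)  # p(beta_j != 0 | D) for each marker
which(activation > 0.5)               # markers above the 0.5 threshold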
Since genetic data sets usually have many markers (sometimes more than 100),
with M markers the number of potential models is 2^M, which may be an extremely
large number. The computation can become intensive in these situations, so we need
a method to simplify the procedure.
3.2 MODEL SEARCH BY SYSTEMATIC PROCESS
The systematic process breaks the genome down into smaller regions by
conditioning on the regions of importance. In the systematic search method, we first
break the genome into N chromosomes, yielding 2^N models to be evaluated. We then
obtain the activation probability for each segment using
\[
p(S_j \neq 0 \mid D) = \sum_{i=1}^{K} p(S_j \neq 0 \mid M_i, D)\, p(M_i \mid D) \tag{17}
\]
where S_j denotes the jth segment (here, a chromosome).
First, we identify which chromosome(s) are important: if a chromosome has
an activation probability greater than 0.5, then in the next step we divide
it into halves. We keep repeating this procedure until the important
marker(s) with QTL(s) are identified. For instance, in our plant experiment there
are 5 chromosomes, which contain 9, 7, 6, 8, and 8 markers, respectively. The search
algorithm first finds which chromosomes make a significant contribution to the QTL
by searching through all potential models and calculating the activation probability
for each chromosome (we denote them as S1, S2, S3, S4, S5). Suppose we
obtain p(S1 ≠ 0|D) = 0.6, p(S2 ≠ 0|D) = 0.4, p(S3 ≠ 0|D) = 0.7, p(S4 ≠ 0|D) = 0.3,
and p(S5 ≠ 0|D) = 0.2. Then chromosomes 1 (S1) and 3 (S3) have activation
probabilities greater than 0.5, so we do further analysis on chromosomes
1 and 3. Chromosome 1 has 9 markers and chromosome 3 has 6 markers,
so we divide each of these two chromosomes into two parts, as 5+4 and 3+3 markers;
we now have 4 segments, denoted S11, S12, S31, S32 (2^4 models).
Suppose, after calculating the new activation probabilities for the segments, we have
p(S11 ≠ 0|D) = 0.5, p(S12 ≠ 0|D) = 0.4, p(S31 ≠ 0|D) = 0.9, and p(S32 ≠ 0|D) = 0.3.
The algorithm is rerun and continues to divide S11 and S31. Finally it picks out the
markers with activation probability higher than 0.5, and these are of great interest
(see the sketch after Figure 2).
[Figure 2 shows the successive splits with segment activation probabilities:
Chr 1 (M1-M9): 0.99; Chr 2 (M10-M16): 0.01; Chr 3 (M17-M22): 0.98; Chr 4 (M23-M30): 0.32; Chr 5 (M31-M38): 0.01
M1-M5: 0.99; M6-M9: 0.11; M17-M19: 0.98; M20-M22: 0.22
M1-M3: 0.02; M4-M5: 0.99; M17-M18: 0.12; M19: 0.99
M4: 0.99; M5: 0.30; M19: 0.99
Final: M4: 0.99; M19: 0.99]
Figure 2: Detect QTL by systematic method
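The divide-and-halve logic of Figure 2 can be sketched in R as follows; "activation" here is a hypothetical stand-in for the activation probabilities that the real method computes from posterior model probabilities via (17):

systematic_search <- function(segs, activation, thresh = 0.5) {
  found <- integer(0)
  while (length(segs) > 0) {
    probs <- sapply(segs, activation)
    keep <- segs[probs > thresh]                 # segments flagged as important
    segs <- list()
    for (s in keep) {
      if (length(s) == 1) {
        found <- c(found, s)                     # narrowed down to one marker
      } else {
        half <- ceiling(length(s) / 2)           # split the segment into halves
        segs <- c(segs, list(s[1:half], s[(half + 1):length(s)]))
      }
    }
  }
  sort(found)
}
# toy oracle: a segment is "active" only if it contains marker 4 or 19
activation <- function(s) if (any(s %in% c(4, 19))) 0.99 else 0.05
chromosomes <- list(1:9, 10:16, 17:22, 23:30, 31:38)   # 9,7,6,8,8 markers
systematic_search(chromosomes, activation)             # returns 4 19, as in Figure 2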
3.3 MODEL SEARCH BY STOCHASTIC PROCESS
In the stochastic search method, we use Markov chain Monte Carlo Model
Composition (MC³), a widely used stochastic search algorithm. We define the model
selection vector for the ith model as M_i. The length of M_i is M, the total number of
candidate markers in the experiment. Each mth (m ≤ M) element in M_i corresponds to its
mth marker counterpart. The value of the mth element is either 1 or 0, determined
by whether the mth marker is in the model or not. For example, suppose we have 5
markers in our experiment; if M_i is [1,1,0,0,1], the model includes markers 1, 2, and 5,
that is,
\[
\theta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_5 x_{i5} + \varepsilon_i
\]
where i = 1, ..., L.
The stochastic search begins by randomly choosing a starting model and
calculating its posterior model probability p(M_i|D). It then randomly flips one
location value in the model vector (to 1 if it is originally 0; to 0 if it is
originally 1). Referring to the example above, if the vector M_i is [1,1,0,0,1],
the candidate M_{i+1} for the (i + 1)th model could be [0,1,0,0,1], [1,0,0,0,1],
[1,1,1,0,1], [1,1,0,1,1], or [1,1,0,0,0]. Suppose the first location was chosen, giving
[0,1,0,0,1]; then the markers included in the model change to markers 2 and 5, and
the posterior model probability p(M_{i+1}|D) is calculated for this model.
With the posterior model probability p(M_i|D) for the ith model, we can identify
which model or models better fit the given quantitative trait data, and locate
potential markers associated with the quantitative trait. The posterior model
probabilities can also be used to guide the search through the model space.
Based on two consecutive posterior model probabilities, we use an acceptance
probability to compare the two models. Similar to the Metropolis-Hastings algorithm,
the acceptance probability α_{i+1,i} is defined as the minimum of 1 and the ratio of the
posterior model probability of the (i + 1)th model to that of the ith model, the
current best-fit model given the data [5][7]; that is,
\[
\alpha_{i+1,i} = \min\!\left(1,\ \frac{p(M_{i+1} \mid D)}{p(M_i \mid D)}\right) \tag{18}
\]
Using the acceptance probability α_{i+1,i} as the success probability, we randomly
generate a Bernoulli random variable. If the generated variable is 1, then the chain
moves to the new model; that is, the new model (the (i+1)th model) replaces the old
one (the ith model) as the best-fit model given the data so far. Also, using the
acceptance probability, a tally for each best model is kept. This tally records the
frequency with which each model is selected as the best-fit model [3].
In our plant QTL experiment, we run 20 chains with 2,000 steps within each
chain. Therefore, we have 40,000 models to consider in total. We then find the
activation probabilities for each marker based on these models and identify the
significant markers from the result table.
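A minimal R sketch of one such accept/reject step, assuming a hypothetical function post_prob(m) that returns something proportional to p(M|D) for a model vector m:

mc3_step <- function(m, post_prob) {
  prop <- m
  j <- sample(length(m), 1)             # flip one randomly chosen location
  prop[j] <- 1 - prop[j]
  alpha <- min(1, post_prob(prop) / post_prob(m))   # equation (18)
  if (rbinom(1, 1, alpha) == 1) prop else m         # Bernoulli(alpha) move
}
# toy stand-in posterior that favours models containing markers 2 and 5
post_prob <- function(m) exp(2 * m[2] + 2 * m[5] - sum(m))
set.seed(1)
m <- rbinom(5, 1, 0.5)                  # random starting model vector
for (s in 1:2000) m <- mc3_step(m, post_prob)
m                                       # typically ends near c(0,1,0,0,1)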
01101100111111001000101 Step 1
11101100111111001000101 Step 2
11111100111111001000101 Step 3
10111100111111001000101 Step 4
00110101101001111011010 Step 2000
Figure 3: Model vectors inside a Markov Chain
4 SIMULATION
4.1 DATA SET
In our plant experiment we use the marker information of the Bay-0×Shahdara
(Bay×Sha) population, which has a 165 lines × 38 markers structure (the X matrix).
The Bay×Sha population was created by Olivier Loudet and Sylvain Chaillou [9].
Figure 4 illustrates the genetic map of the Bay×Sha population, including the
locations of the genetic markers within the five chromosomes and their relative distances.
Figure 4: Genetic map for Bay×Sha
The Bay×Sha population includes 5 chromosomes, which contain 6 to 9 markers
each. Marker values are set to -0.5, 0, or 0.5. If the marker value is -0.5,
the marker is from parent A; if the marker value is 0.5, the marker is from
parent B; if the marker value is 0, the marker information is missing. The simulated
response matrix y is an L × J matrix, where L is the number of lines and J is the
number of replicates of the observations within each line.
In this simulation, L = 165 and J = 10. Simulations are made to identify 1 QTL
up to 6 QTLs; different levels of effect sizes are assigned, and different levels of
gamma noise are added. The simulated responses are created by
\[
y_{ij} = \mu + \sum_{m} a_m x_{im} + R_{\text{gamma}} \tag{19}
\]
where i = 1, 2, ..., 165, j = 1, 2, ..., 10, μ is the true mean, a_m is the effect size of the
mth marker, x_{im} is the mth marker value in line i, and R_gamma is random noise from
the gamma distribution. For example, suppose markers 2 and 9 have significant effects.
We assign the mean as 40 and effect sizes of 1 and 3 for the quantitative trait, and we
also add random noise following a gamma distribution (e.g., Gamma(0.5, 1)). The
simulated response y_ij can then be created by
\[
y_{ij} = 40 + 1 \cdot x_{i2} + 3 \cdot x_{i9} + R_{\text{gamma}}(0.5, 1) \tag{20}
\]
where x_{i2} and x_{i9} are the values of markers 2 and 9 in the ith line of the Bay×Sha
matrix, the effect size of marker 2 is 1, and the effect size of marker 9 is 3. For each
ith line in the simulated data matrix, we have different y_ij values due to the different
marker information of the line and the type of gamma noise added. In this thesis,
60 simulated response y matrices are created. Tables 3-7 show how these simulated
response y matrices were created.
Line    M1    M2    M3    M4   · · ·   M35   M36   M37   M38
1      0.5   0.5  -0.5  -0.5   · · ·   0.5   0.5  -0.5  -0.5
2      0.5   0.5  -0.5  -0.5   · · ·   0.5   0    -0.5  -0.5
3      0.5  -0.5  -0.5  -0.5   · · ·   0.5   0.5   0.5   0.5
4     -0.5  -0.5  -0.5  -0.5   · · ·  -0.5  -0.5  -0.5  -0.5
5     -0.5  -0.5  -0.5  -0.5   · · ·   0.5   0.5   0.5   0.5
6     -0.5  -0.5  -0.5   0.5   · · ·   0.5   0.5   0.5   0.5
7     -0.5  -0.5  -0.5  -0.5   · · ·   0.5   0.5   0.5   0.5
8      0.5   0.5   0.5  -0.5   · · ·  -0.5   0.5   0.5   0.5
9      0.5   0.5   0.5   0.5   · · ·  -0.5  -0.5   0.5   0.5
10    -0.5  -0.5  -0.5   0.5   · · ·  -0.5   0.5  -0.5  -0.5
...
156    0.5  -0.5  -0.5  -0.5   · · ·   0.5   0.5   0.5   0.5
157    0.5   0.5   0.5   0.5   · · ·   0    -0.5  -0.5  -0.5
158   -0.5  -0.5  -0.5   0.5   · · ·   0.5  -0.5  -0.5   0.5
159   -0.5  -0.5  -0.5   0.5   · · ·   0.5  -0.5   0.5   0.5
160    0.5   0.5   0.5   0.5   · · ·   0    -0.5   0     0.5
161    0.5  -0.5  -0.5   0.5   · · ·   0.5   0.5   0.5   0.5
162    0.5   0.5   0.5   0.5   · · ·   0    -0.5  -0.5  -0.5
163   -0.5  -0.5   0.5   0     · · ·   0.5   0.5  -0.5  -0.5
164   -0.5  -0.5  -0.5  -0.5   · · ·   0.5   0.5  -0.5  -0.5
165    0.5   0.5   0.5   0.5   · · ·  -0.5  -0.5   0.5   0.5

Table 1: Marker Information (X Matrix)
Line  Replicate 1  Replicate 2  Replicate 3  · · ·  Replicate 8  Replicate 9  Replicate 10
1     41.18938131  41.56474455  41.23084718  · · ·  40.63483704  43.00570651  40.56871827
2     40.55281128  40.54592232  40.50762652  · · ·  40.91177787  41.19623356  40.56581552
3     39.50224105  40.82294396  41.39494765  · · ·  39.54800084  39.52676063  39.50163355
4     40.08673491  40.20206574  39.50099809  · · ·  41.98796159  39.93326206  41.67587663
5     39.67481233  39.99182259  39.56960166  · · ·  40.06223603  40.62162165  40.19104787
6     39.650172    39.65022736  39.76031288  · · ·  39.72163767  40.15230126  39.57595733
7     39.5000614   40.83352131  40.21347511  · · ·  39.94490492  39.63723997  41.26369055
8     40.8803921   40.5132346   41.54494996  · · ·  40.57912699  40.85283013  42.07666229
9     42.05764982  41.12081364  41.07372557  · · ·  40.50223531  42.12374071  41.05077239
10    40.82399762  39.71636811  39.7291572   · · ·  40.39473932  39.56300054  39.54394469
...
156   39.51070138  39.759929    40.44457607  · · ·  39.61957826  39.79321536  39.78441496
157   40.84933151  41.69942735  41.09785163  · · ·  40.67930273  40.67975313  40.511839
158   39.6581928   39.59629037  39.9879948   · · ·  41.56171914  39.90985647  39.86658909
159   40.29898451  39.58609607  39.50691827  · · ·  40.9229873   39.60580775  39.59049143
160   40.71519641  40.66436995  41.28936524  · · ·  41.16070788  40.65967771  41.26533944
161   39.50003279  40.00588332  39.92925531  · · ·  39.50237111  41.25149955  39.76797481
162   40.74623311  41.7616776   40.52894885  · · ·  41.3252194   40.80819499  40.99630735
163   39.66519567  40.77677182  39.6134509   · · ·  41.46895354  39.59331165  40.67266395
164   39.66885692  39.63394557  39.57497036  · · ·  39.98043048  39.5397068   39.76860469
165   40.68137354  40.67683189  41.01929184  · · ·  40.60035641  40.65613246  40.51191945

Table 2: Quantitative Trait Information (Y Matrix)
Effect    Objective   Gamma noise        Result of            Result of
size(s)   marker(s)   parameters         systematic search    stochastic search
1         C1M2        α = 0.5, β = 1     C1M2, C1M5           C1M2
                      α = 1,   β = 3     C1M2                 C1M2
3         C1M2        α = 0.5, β = 1     C1M2                 C1M2
                      α = 1,   β = 3     C1M2                 C1M2
5         C1M2        α = 0.5, β = 1     C1M1, C1M2           C1M2
                      α = 1,   β = 3     C1M5                 C1M2
7         C1M2        α = 0.5, β = 1     C1M2                 C1M2
                      α = 1,   β = 3     C1M2                 C1M2
9         C1M2        α = 0.5, β = 1     C1M1, C1M2           C1M2
                      α = 1,   β = 3     C1M1, C1M2           C1M2

Table 3: Result of one QTL detection
Effect    Objective     Gamma noise        Result of                    Result of
size(s)   marker(s)     parameters         systematic search            stochastic search
1,3       C1M2, C1M5    α = 0.5, β = 1     C1M2, C1M5                   C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M5             C1M2, C1M5
1,5       C1M2, C1M5    α = 0.5, β = 1     C1M1, C1M2, C1M5             C1M2, C1M5
                        α = 1,   β = 3     C1M2, C1M5                   C1M2, C1M5
1,7       C1M2, C1M5    α = 0.5, β = 1     C1M2, C1M5                   C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M5, C1M9       C1M2, C1M5
1,9       C1M2, C1M5    α = 0.5, β = 1     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5
                        α = 1,   β = 3     C1M2, C1M5                   C1M2, C1M5
3,5       C1M2, C1M5    α = 0.5, β = 1     C1M2, C1M4, C1M5             C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5
3,7       C1M2, C1M5    α = 0.5, β = 1     C1M1, C1M2, C1M3, C1M5       C1M2, C1M5
                        α = 1,   β = 3     C1M2, C1M5                   C1M2, C1M5
3,9       C1M2, C1M5    α = 0.5, β = 1     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5
                        α = 1,   β = 3     C1M2, C1M5                   C1M2, C1M5
5,7       C1M2, C1M5    α = 0.5, β = 1     C1M2, C1M3                   C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5
5,9       C1M2, C1M5    α = 0.5, β = 1     C1M2, C1M5                   C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M3, C1M5       C1M2, C1M5
7,9       C1M2, C1M5    α = 0.5, β = 1     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5
                        α = 1,   β = 3     C1M1, C1M2, C1M4, C1M5       C1M2, C1M5

Table 4: Result of two QTLs detection (same chromosome)
Effect    Objective      Gamma noise        Result of                          Result of
size(s)   marker(s)      parameters         systematic search                  stochastic search
2,4       C1M5, C2M15    α = 0.5, β = 1     C1M4, C1M5, C2M14, C2M15           C1M5, C2M15
                         α = 1,   β = 3     C1M4, C1M5, C2M14, C2M15           C1M5, C2M15
2,6       C1M5, C2M15    α = 0.5, β = 1     C1M2, C1M5, C2M14, C2M15           C1M5, C2M15
                         α = 1,   β = 3     C1M5, C2M12, C2M13, C2M14, C2M15   C1M5, C2M15
2,8       C1M5, C2M15    α = 0.5, β = 1     C1M5, C2M15                        C1M5, C2M15
                         α = 1,   β = 3     C1M4, C1M5, C2M14, C2M15           C1M5, C2M15
4,6       C1M5, C2M15    α = 0.5, β = 1     C1M5, C1M8, C2M14, C2M15           C1M5, C2M15
                         α = 1,   β = 3     C1M5, C2M15                        C1M5, C2M15
4,8       C1M5, C2M15    α = 0.5, β = 1     C1M5, C2M15                        C1M5, C2M15
                         α = 1,   β = 3     N/A                                C1M5, C2M15
6,8       C1M5, C2M15    α = 0.5, β = 1     C1M4, C1M5, C2M14, C2M15           C1M5, C2M15
                         α = 1,   β = 3     C1M5, C2M15                        C1M5, C2M15

Table 5: Result of two QTLs detection (different chromosomes)
Effect    Objective             Gamma noise        Result of                      Result of
size(s)   marker(s)             parameters         systematic search              stochastic search
2,4,6     C1M6, C2M15, C3M21    α = 0.5, β = 1     N/A                            C1M6, C2M15, C3M21
                                α = 1,   β = 3     C1M6, C2M15, C3M20, C3M21      C1M6, C2M15, C3M21
2,4,8     C1M6, C2M15, C3M21    α = 0.5, β = 1     N/A                            C1M6, C2M15, C3M21
                                α = 1,   β = 3     N/A                            C1M6, C2M15, C3M21
2,6,8     C1M6, C2M15, C3M21    α = 0.5, β = 1     N/A                            C1M6, C2M15, C3M21
                                α = 1,   β = 3     N/A                            C1M6, C2M15, C3M21
4,6,8     C1M6, C2M15, C3M21    α = 0.5, β = 1     N/A                            C1M6, C2M15, C3M21
                                α = 1,   β = 3     N/A                            C1M6, C2M15, C3M21
1,5,9     C1M2, C1M9, C2M15     α = 0.5, β = 1     C1M2, C1M9, C2M15, C2M16       C1M2, C1M9, C2M15
                                α = 1,   β = 3     N/A                            C1M2, C1M9, C2M15
3,6,9     C1M2, C3M19, C4M26    α = 0.5, β = 1     C1M1, C1M2, C3M19, C4M26       C1M2, C3M19, C4M26
                                α = 1,   β = 3     C1M1, C1M2, C3M19, C4M26       C1M2, C3M19, C4M26

Table 6: Result of three QTLs detection
Effect        Objective                          Gamma noise        Result of            Result of
size(s)       marker(s)                          parameters         systematic search    stochastic search
1,3,5,9       C1M2, C1M9, C2M15, C5M31           α = 0.5, β = 1     N/A                  C1M2, C1M9, C2M15, C5M31
                                                 α = 1,   β = 3     N/A                  C1M2, C1M9, C2M15, C5M31
1,3,5,7,9     C1M2, C1M9, C4M23, C5M31, C5M35    α = 0.5, β = 1     N/A                  C1M2, C1M9, C4M23, C5M31, C5M35
                                                 α = 1,   β = 3     N/A                  C1M2, C1M9, C4M23, C5M31, C5M35
1,2,5,7,8,9   C1M2, C1M5, C1M9, C2M15,           α = 0.5, β = 1     N/A                  C1M2, C1M5, C1M9, C2M15, C4M27, C5M35
              C4M27, C5M35                       α = 1,   β = 3     N/A                  C1M2, C1M5, C1M9, C2M15, C4M27, C5M35

Table 7: Result of four, five, and six QTLs detection
4.2 RESULTS
The systematic search method and the stochastic search method are compared in
this simulation study. We use 0.5 as the threshold value for the marker activation
probability. We run 52,000 iterations in the Gibbs sampler, and the first 2,000 are
cut off. There are 20 Markov chains in the stochastic search method, and each chain
runs for 2,000 steps.
The program is written in Fortran 77, and we use the Fortran PowerStation 4.0
development environment to compile and build the code. We ran the executable file on a
DELL OPTIPLEX GX745 PC running Microsoft Windows XP Professional SP2,
with an Intel(R) Core(TM)2 CPU 6400 at 2.13 GHz and 512 MB RAM.
It takes the program one week to obtain the result using the stochastic search
method, while the time spent by the systematic search method varies from several
minutes to half an hour, determined by the number of segments in each step.
Tables 3-7 summarize the results from the systematic and stochastic search
methods with the activation probabilities of all important QTLs. For each method, we
test the effect of multiple QTLs, effect size, and the influence of different levels of noise.
Generally, both search methods are able to identify the correct QTLs successfully
in the 1 QTL and 2 QTL settings. However, when dealing with more than 2 QTLs,
the systematic search method often identifies some extra, wrong QTLs while
identifying the correct QTLs.
The greatest advantage of the systematic search method is its high speed in
identifying the significant markers. However, this method may perform poorly under
smaller effect sizes and stronger noise. From Tables 3-7 we can see that, except for
some special cases, the systematic search method may identify more QTLs than
intended, due to the influence of the effect sizes and the level of gamma noise. When
dealing with more than 2 QTLs, the systematic search method fails to obtain
the correct activation probability information if 8 or more segments appear in a
searching step (which is why there are many N/A's in the tables).
Compared with the systematic search method, the main shortcoming of the
stochastic search method is that it takes a relatively much longer time to identify
the correct QTLs. Its strength, however, is its robustness in various simulation
environments. From the result tables, we can see that the stochastic search method
always identifies the desired QTLs regardless of the number of QTLs considered, the
levels of effect sizes assigned, or the level of gamma noise added. The results from
the stochastic search method are always quite sound: the activation probabilities of
the significant QTLs are well above the threshold value, while those of most
insignificant markers fall below it.
5 CONCLUSION
The Bayesian hierarchical regression model is an effective method to detect QTL
because complex data structures can be incorporated under this model. In this
thesis, we compare two search methods under this model (the stochastic search
method and the systematic search method). Since fitting every possible model would
be computationally challenging, the stochastic search method randomly chooses 20
chains of models with which to calculate the activation probabilities, while the
systematic search method divides the genome into smaller and smaller segments until
QTLs are identified. Comparing these two methods, the stochastic search method
has the better performance because of its successful identification of the correct
QTLs without error.
We found that both methods have advantages: the stochastic search method can
accurately detect the QTLs we are interested in, while the systematic search method
saves time and is very efficient. In future research, we may study a way to combine
these two methods in order to identify QTLs both efficiently and correctly.
REFERENCES
[1] Gelman A., Carlin J.B., Stern H.S. and Rubin D.B., Bayesian Data Analysis,
2nd Edition, Chapman & Hall/CRC, Boca Raton, London, New York, Washington,
D.C., 2004.
[2] Michael Lavine, What is Bayesian statistics and why everything else is wrong,
ISDS, Duke University, Durham, North Carolina.
[3] Chen Y., QTL detection from stochastic process by Bayesian hierarchical
regression model, UNCW, 2007.
[4] David B. Dunson, "Practical Advantages of Bayesian Analysis of Epidemiologic
Data", American Journal of Epidemiology, Vol. 153, No. 12: 1222-6, 2001.
[5] Walsh B., Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for
EEB 581, version 26 April 2004.
[6] Bao H., Bayesian hierarchical regression model to detect quantitative trait loci,
UNCW, 2006.
[7] Boone E.L., Simmons S.J., Ye K., Stapleton A.E., "Analyzing Quantitative Trait
Loci for Arabidopsis thaliana using Markov Chain Monte Carlo Model Composition
with restricted and unrestricted model spaces", Statistical Methodology,
(3) 69-78, 2006.
[8] Congdon P., Bayesian Statistical Modelling, 2nd Edition, John Wiley and Sons,
Ltd.
[9] Loudet O., Chaillou S., Daniel-Vedele F., "Bay-0×Shahdara recombinant inbred
line population: a powerful tool for the genetic dissection of complex traits in
Arabidopsis", Theoretical and Applied Genetics, Vol. 104, 1173-1184, 2002.
APPENDIX
A. EXAMPLE OF USING R CODE TO GENERATE SIMULATED DATA
# Read the Bay x Sha marker matrix (165 lines x 38 markers)
X<-read.table('bayxsha2.csv',sep=',')
# Simulated responses: 165 lines, 10 replicates each
yinew<-matrix(nrow=165,ncol=10)
for (i in 1:165)
{for (j in 1:10)
# mean 40; QTLs at markers 2, 19, 26 with effect sizes 3, 6, 9;
# rgamma(1,1,3) adds one Gamma(shape = 1, rate = 3) noise draw
{yinew[i,j]<-40+3*X[i,2]+6*X[i,19]+9*X[i,26]+rgamma(1,1,3)}}
# Write the simulated Y matrix without row/column names
write.table(yinew,'i369b.csv',sep=',',row.names=FALSE,col.names=FALSE)
B. SYSTEMATIC SEARCH METHOD
program test
USE MSIMSL
PARAMETER (M=38,L=165,taunot=0.5,sigmanot=0.5,KK=52000,
& kutoff=2000,sigbeta=100.d0,numseg=5)
! M is number of Markers (column) and L is number of lines
DOUBLE PRECISION taua,sigmaa(L),
& Xinit(L,M),Yinit(L,20),sumy(L),sumy2(L),ybar(L),
& thetas(L),sigma2(L),ybar2(L),tau2(1),Xuse(L,M),
& XB(L),XTXsend(M+1,M+1),adjust,
& temp,tau2init,thetasinit(L),Marksegprob(numseg),
& likesum , segPostProb(numseg) ,
& sigma2init(L), postmodelprob(L)
INTEGER ni(L),NOBS,Modelvec(M),Muse,Mtemp,numval, digit,
& ni_seg(numseg),segvec(numseg),Modelmatrix(L,numseg),
& temp11,temp12
!Setting parameters
taua = 1 + taunot + (L/2)
NOBS = 0
sumtheta = 0.0d0
sumtheta2 = 0.0d0
open(10, file='ni.csv',status='old')
do i=1,L
read(10,*) ni(i)
NOBS =NOBS + ni(i)
enddo
close(10)
open(55, file='ni_seg.csv',status='old')
do i=1,numseg
read(55,*) ni_seg(i)
enddo
close(55)
do i = 1,L
sigmaa(i)=(ni(i)/2) + 1 + sigmanot
enddo
!Read data
open(16, file='bayxsha2.csv', status='old')
do i=1,L
read(16,*) (Xinit(i,j),j=1,M)
enddo
close(16)
open(19, file='a1a.csv', status='old')
do i=1,L
mtemp=ni(i)
read(19,*) (Yinit(i,j), j=1,mtemp)
enddo
close(19)
! Get initial estimates
do i=1,L
sumy(i) = 0.d0
sumy2(i) = 0.d0
do j=1,ni(i)
sumy(i) =sumy(i) + Yinit(i,j) !Create ybar
sumy2(i) = sumy2(i) + Yinit(i,j)*Yinit(i,j)
enddo
ybar(i) = sumy(i)/ni(i)
thetas(i) = ybar(i)
sigma2(i) = (sumy2(i) - ni(i)*(ybar(i)**2))/(ni(i) - 1)
ybar2(i) = sumy2(i)/ni(i)
thetasinit(i) = thetas(i)
sigma2init(i) = sigma2(i)
enddo
do i = 1,L
sumtheta = sumtheta + thetas(i)
sumtheta2 = sumtheta2 + (thetas(i)**2)
enddo
thetabar = sumtheta/L
tau2(1) = (sumtheta2 - L*(thetabar**2))/(L - 1)
tau2init = tau2(1)
do i=1,numseg
segvec(i) = 1
Marksegprob(i) =0.d0
segPostProb(i) = 0 !Set all the posterior probabilities of the segments to 0
enddo
knum=0
do i =1, numseg
if (segvec(i).eq.1) then
do j = 1, ni_seg(i)
Modelvec(j+knum) = 1
enddo
endif
knum=knum + ni_seg(i)
enddo
Muse=M
Mtemp=Muse+1
numval=1
adjust=0.d0
CALL GETX (M,Xinit,Xuse,Modelvec,L)
CALL REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
write (*,*) 'Program initializing...Please Stand by'
CALL Gibbs(Xuse,Yinit,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& temp)
write (*,*) 'Completed adjustment'
write (*,*) 'Adjustment = ', adjust
nummodel=(2**numseg)-1
do i=1, nummodel
knum = 0
write(*,*) 'i = ',i
temp11=i
temp12=temp11/2
Muse = 0 !initiate the Muse and Modelvec
do j=1, M
Modelvec(j) = 0
enddo
do j=1, numseg
digit=temp11-temp12*2
write(*,*) 'digit for ', j, 'is', digit
temp11=temp12
temp12=temp11/2
do jj=1,ni_seg(j)
Modelvec(jj+knum)=1*digit
enddo
if (digit.eq.1) then
Muse=Muse+ni_seg(j)
endif
Modelmatrix(i,j) = digit
knum = knum + ni_seg(j)
!write(*,*) ’modelvector’, Modelvec
enddo
Mtemp=Muse+1
numval = 2 !numval=2 indicate the Gibbs will find the likelihood
CALL GETX (M,Xinit,Xuse,Modelvec,L)
write(*,*) 'Muse=', Muse
CALL REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
tau2(1)=tau2init
do ii = 1, L
thetas(ii) = thetasinit(ii)
sigma2(ii) = sigma2init(ii)
enddo
! write(*,*) ’modelvector’, Modelvec
CALL Gibbs(Xuse,Yinit,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& temp)
postmodelprob(i)=temp !temp is the likelihood value
write(*,*) 'likelihood value is ', postmodelprob(i)
enddo
!***************************Part Four********************************
likesum = 0.d0
do i=1,nummodel !Find the posterior probability for each model
likesum=likesum+postmodelprob(i)
enddo
do i = 1,nummodel
postmodelprob(i) = postmodelprob(i)/likesum
enddo
do i=1,numseg !Find the posterior probability for each segment
do j = 1,nummodel
MarksegProb(i)=Marksegprob(i)+
& Modelmatrix(j,i)* postmodelprob(j)
enddo
enddo
open(3,file="MarkerProbability.txt",status="new")
do i=1,numseg
write (3,*) 'Probability of segment',i
write(3,*) MarksegProb(i)
enddo
close(3)
!write (*,*) ’Posterier Probability of Markers are’, MarkPostProb
end !Main program ends here
!***********************************************************************
!******************Subroutine Part**************************************
!***********************************************************************
! Get the correct X columns in the beginning of matrix
SUBROUTINE GETX (M,Xinit,Xuse,Modelvec,L)
doubleprecision Xinit(L,M),Xuse(L,M)
integer M, Modelvec(M),L
j=1
do i=1,M
if (Modelvec(i).eq.1) then
do s=1,L
Xuse(s,j) = Xinit(s,i)
enddo
j = j + 1
endif
enddo
return
end
SUBROUTINE REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
doubleprecision Xuse(L,M), X(L,Muse),Yinit(L,20),yregress(NOBS),
& xregress(NOBS,Muse),betastemp(Muse+1),
& SST, SSE,Xbetas(L,Muse+1),XB(L),XTXsend(M+1,M+1),
& XTX(Muse+1,Muse+1)
integer Muse,M,L,ni(L),num,num2
! This part of the routine subsets the full X matrix to
! get the correct X
do j=1,Muse
do i=1,L
X(i,j) = Xuse(i,j)
enddo
enddo
! Rearrange the Y matrix to a vector for regression
num = 1
do i = 1,L
do j = 1,ni(i)
yregress(num) = Yinit(i,j)
num = num + 1
enddo
enddo
! Expand the correct X matrix for regression
num2 = 1
do i = 1,L
do s = 1,ni(i)
do j = 1,Muse
xregress(num2,j) = X(i,j)
enddo
num2 = num2 + 1
enddo
enddo
! This does the regression
CALL DRLSE (NOBS, yregress, Muse, xregress, NOBS, 1, betastemp,
& SST, SSE)
! From the regression routine
!write(*,*) betastemp
! Create appropriate X matrix by adding a column of 1’s for the
! intercept term
do i = 1,L
Xbetas(i,1) = 1.d0
XB(i) = 0.d0
do j = 1, Muse
Xbetas(i,j+1) = Xuse(i,j)
enddo
enddo
! Calculate XB
do i = 1,L
do j = 1,Muse+1
XB(i) = XB(i) + Xbetas(i,j)*betastemp(j)
enddo
enddo
! Calculates XTX
CALL DMXTXF (L, Muse+1, Xbetas, L, Muse+1, XTX, Muse+1)
! Need to send XTX out of this function. In order to do so
! must save this to an M by M matrix
do i = 1,Muse+1
do j=1,Muse+1
XTXsend(i,j) = XTX(i,j)
enddo
enddo
return
end
!Subroutine for Gibbs sampler
SUBROUTINE Gibbs (Xuse,Y,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& bayesfac)
DOUBLE PRECISION Xuse(L,M),XB(L),tau2(1),
& taua,taub(1),sigmab(L),Y(L,20),betamu(Muse+1),
& covarbeta(Muse+1,Muse+1),sigma2(L),thetamu(L),
& thetas(L),thetasig(L),ybar(L),
& stdtau2(1),betasst(Muse+1),stdsig(L),ybar2(L),
& stdtheta(L),sigmaa(L),minloglik,
& liktemp(KK),temp5,maxloglik,
& sumtemp4,bayesfac,RSIG(Muse+1,Muse+1),TOL,
& X(L,Muse),DMACH,betasst2(Muse+1),
& XTXsend(M+1,M+1),Xbetas(L,Muse+1),betas(Muse+1),
& XTX(Muse+1,Muse+1),adjust
INTEGER ni(L),IRANK,KK,kutoff,Mtemp,L,Muse,icount,numsim,numval
!Set up a groups of new parameters
Mtemp=Muse+1
TOL = 100.0*DMACH(4)
minloglik = 1.d8
maxloglik = -1.d8
sumtemp4 = 0.d0
icount = 0
!sigbeta = 100.d0
! Get the correct X
do j=1,Muse
do i=1,L
X(i,j) = Xuse(i,j)
enddo
enddo
! Get the correct XTX
do j=1,Muse+1
do i=1,Muse+1
XTX(i,j) = XTXsend(i,j)
enddo
enddo
!Gibbs Sampler
do numsim=1,KK
!write (*,*) numsim
!***** THETAS ***************************
CALL thetapar (tau2,sigma2,XB,L,ybar,ni,thetamu,thetasig) !parameter
CALL DRNNOR (L,stdtheta)
do i=1,L
thetas(i) = stdtheta(i)*thetasig(i) + thetamu(i)
enddo
!***** TAU ***************************
CALL tauparm (thetas,XB,L,taub)
CALL drngam(1,taua,stdtau2)
tau2(1) = taub(1)/stdtau2(1)
!***** BETA ***************************
CALL betapar (XTX,Muse,tau2,L,thetas,X,betamu,covarbeta,sigbeta)
CALL DCHFAC (Mtemp, covarbeta, Mtemp, TOL, IRANK, RSIG, Mtemp)
! Cholesky factor
CALL DRNNOR(Mtemp,betasst)
do i=1,Mtemp
betasst2(i) = 0.d0
do j=1,Mtemp
betasst2(i) = betasst2(i) + RSIG(i,j)*betasst(j)
enddo
betas(i) = betasst2(i) +betamu(i)
enddo
do i = 1,L
Xbetas(i,1) = 1.d0
XB(i) = 0.d0
do j = 1, Muse
Xbetas(i,j+1) = Xuse(i,j)
enddo
enddo
do i = 1,L
do j = 1,Muse+1
XB(i) = XB(i) + Xbetas(i,j)*betas(j)
enddo
enddo
! ***** SIGMA ***************************
CALL sigmaparm (ybar,ybar2,ni,thetas,L,sigmab)
CALL drngam(L,sigmaa(1),stdsig)
do i = 1,L
sigma2(i) = sigmab(i)/stdsig(i)
enddo
!write (*,*) ’taub=’, taub
!write (*,*) ’tau2=’, tau2
CALL llike (betas,XB,tau2,Y,sigma2,thetas,
& L,Muse,sigmaa,taua,temp5,sigbeta,icountup,ni,
& adjust)
if (temp5.le.10) liktemp(numsim)=dexp(temp5)
if (temp5.gt.10) liktemp(numsim)=0
if (numval.eq.1) then
if ((temp5.ge.maxloglik) .and. (numsim.ge.kutoff))
& maxloglik = temp5
!write(*,*) "temp5 = ", temp5
!write(*,*) "numsim = ", numsim
if ((temp5.le.minloglik) .and. (numsim.ge.kutoff))
& minloglik = temp5
if (numsim.ge.kutoff) icount = icount + icountup
endif
enddo ! Here ends the simulation for the Gibbs Sampler
if (numval.eq.1) adjust = maxloglik
if (numval.eq.2) then
do s=(kutoff+1),KK
sumtemp4 = sumtemp4 + liktemp(s)
enddo
denom = (KK-(kutoff+1.0)+0.d0)
bayesfac = sumtemp4/denom
!write(*,*) ’bayesfac = ’, bayesfac ,’icount = ’, icount
endif
return
end
!Subroutine for updating the Tau parameter
SUBROUTINE tauparm (thetas,XB,L,taub)
DOUBLE PRECISION sumTXB,taub(1),thetas(L),XB(L)
INTEGER L
sumTXB=0.d0
do i=1,L
sumTXB=sumTXB + (thetas(i) - XB(i))*(thetas(i) - XB(i))
& +1.d0
enddo
taub(1)=0.5*sumTXB
return
end
!Subroutine for updating the Sigma parameter
SUBROUTINE sigmaparm (ybar,ybar2,ni,thetas,L,sigmab)
DOUBLEPRECISION ybar(L),thetas(L),sumythetas,sigmab(L),ybar2(L),
& dni(L)
INTEGER ni(L)
sumythetas=0.d0
do i=1,L
dni(i) = ni(i) + 0.0
sigmab(i) = 0.5*(1+(dni(i)*ybar2(i) - 2*thetas(i)*dni(i)*
& ybar(i) + dni(i)*thetas(i)*thetas(i)))
enddo
return
end
!Subroutine for updating the Beta parameter
SUBROUTINE betapar (XTX,Muse,tau2,L,thetas,X,betamu,covarbeta,
& sigbeta)
DOUBLE PRECISION XTX(Muse+1,Muse+1),step1(Muse+1,Muse+1),
& covarbeta(Muse+1,Muse+1),mupart2(Muse+1),thetas(L),
& tau2(1) , Xbetas(L,Muse+1),X(L,Muse), betamu(Muse+1)
INTEGER Muse,L
do i=1,Muse+1
do j=1,Muse+1
if (i.eq.j) then
step1(i,j)=(1/sigbeta)+((1/tau2(1))*XTX(i,j))
else
step1(i,j) = ((1/tau2(1))*XTX(i,j))
endif
enddo
enddo
do i = 1,L
Xbetas(i,1) = 1.d0
do j = 1, Muse
Xbetas(i,j+1) = X(i,j)
enddo
enddo
CALL DLINDS (Muse+1, step1, Muse+1, covarbeta, Muse+1)
! CALL DMURRV (L, Muse+1, Xbetas, L, Muse+1, thetas, 1, L,
!& mupart2)
do j = 1,Muse+1
mupart2(j)=0.d0
do i = 1,L
mupart2(j)=mupart2(j)+Xbetas(i,j)*thetas(i)
enddo
mupart2(j) = mupart2(j)/tau2(1) ! I am the one
enddo
! CALL DMURRV (Mtemp, Mtemp, covarbeta, Mtemp, Mtemp, mupart2,
! & 1,Mtemp, betamu)
do i= 1, Muse+1
betamu(i)=0.d0
do j =1, Muse+1
betamu(i)=betamu(i)+covarbeta(i,j)*mupart2(j)
enddo
enddo
return
end
!Subroutine for updating the Theta parameter
SUBROUTINE thetapar (tau2,sigma2,XB,L,ybar,ni,thetamu,thetasig)
DOUBLE PRECISION tau2(1),sigma2(L),XB(L),ybar(L),thetamu(L),
& thetasig(L),dni(L)
INTEGER L ,ni(L)
do i=1,L
dni(i)=ni(i) + 0.0
thetamu(i) = (1/tau2(1))*(tau2(1)*sigma2(i)/(dni(i)*tau2(1)
& +sigma2(i)))*XB(i) +(1/sigma2(i))
& *(tau2(1)*sigma2(i)/(dni(i)*tau2(1)+sigma2(i)))*
& dni(i)*ybar(i)
enddo
do i=1,L
thetasig(i) = sqrt(tau2(1)*sigma2(i)/(dni(i)*tau2(1)
& +sigma2(i)))
enddo
return
end
!Subroutine for likelihood function
SUBROUTINE llike (betas,XB,tau2,Y,sigma2,thetas,
& L,M1,sigmaa,taua,likehood2,sigbeta,icountup,ni,
& adjust)
DOUBLE PRECISION betas(M1+1),XB(L),tau2(1),
& taua,Y(L,20),btb,thetas(L),
& sigma2(L),sigmaa(L),lik1,lik2,likehood,
& likehood2 ,adjust
INTEGER M1,L ,ni(L)
lik1=0.d0
lik2=0.d0
btb=0.d0
icountup = 0
Mtemp=M1+1
do i=1,L
lik1= lik1 - (sigmaa(i))*dlog(sigma2(i)) -
& (1/(2.d0*sigma2(i))) -
& (1/(2.d0*tau2(1)))*
& (thetas(i) - XB(i))*
& (thetas(i) - XB(i))
end do
do i=1,L
do j=1,ni(i)
lik2 = lik2 -(1/(2.d0*sigma2(i)))*(Y(i,j)-thetas(i))*
& (Y(i,j)-thetas(i))
end do
end do
do i = 1,M1+1
btb=btb + betas(i)*betas(i)
end do
likehood = lik1 + lik2 - (taua)*dlog(tau2(1))
& - (1/(2.d0*tau2(1))) - (1/(2.d0*sigbeta)) * btb
likehood2=likehood - adjust !Adjusting likelihood
!write(*,*) "likelihood =", likehood2
return
end
C. STOCHASTIC SEARCH METHOD
program test
USE MSIMSL
PARAMETER (M=38,L=165,taunot=0.5,sigmanot=0.5,KK=52000,
& kutoff=2000,sigbeta=100.d0,knum=2000,nn=10)
! M is number of Markers (column) and L is number of lines
DOUBLE PRECISION taua,sigmaa(L),
& Xinit(L,M),Yinit(L,20),sumy(L),sumy2(L),ybar(L),
& thetas(L),sigma2(L),ybar2(L),tau2(1),Xuse(L,M),
& tempModel(M),XB(L),XTXsend(M+1,M+1),adjust,
& temp,curlikhood,oldlikhood,
& likesum , MarkPostProb(M) ,tau2init, thetasinit(L),
& sigma2init(L), DummyVal1, DummyVal2,
& ModelTable(knum*20,42)
REAL probsucc
INTEGER ni(L),NOBS,Modelvec(M),Muse ,Mtemp,numval,newvector(1),LL,
& Newmodel, Modelvecbuf(M),lbinval(1),Locator,Previous,ii,tt
!Setting parameters
taua = 1 + taunot + (L/2)
NOBS = 0
sumtheta = 0.0d0
sumtheta2 = 0.0d0
open(10, file='ni.csv',status='old')
do i=1,L
read(10,*) ni(i)
NOBS =NOBS + ni(i)
enddo
close(10)
do i = 1,L
sigmaa(i)=(ni(i)/2) + 1 + sigmanot
enddo
!Read data
open(16, file='bayxsha2.csv', status='old')
do i=1,L
read(16,*) (Xinit(i,j),j=1,M)
enddo
close(16)
open(19, file='newy.csv', status='old')
do i=1,L
mtemp=ni(i)
read(19,*) (Yinit(i,j), j=1,mtemp)
enddo
close(19)
! Get initial estimates
do i=1,L
sumy(i) = 0.d0
sumy2(i) = 0.d0
do j=1,ni(i)
sumy(i) =sumy(i) + Yinit(i,j) !Create ybar
sumy2(i) = sumy2(i) + Yinit(i,j)*Yinit(i,j)
enddo
ybar(i) = sumy(i)/ni(i)
thetas(i) = ybar(i)
sigma2(i) = (sumy2(i) - ni(i)*(ybar(i)**2))/(ni(i) - 1)
ybar2(i) = sumy2(i)/ni(i)
thetasinit(i) = thetas(i)
sigma2init(i) = sigma2(i)
enddo
do i = 1,L
sumtheta = sumtheta + thetas(i)
sumtheta2 = sumtheta2 + (thetas(i)**2)
enddo
thetabar = sumtheta/L
tau2(1) = (sumtheta2 - L*(thetabar**2))/(L - 1)
tau2init = tau2(1)
do i=1,M
Modelvec(i) = 1
MarkPostProb(i) = 0 !Set all the posterior probabilities of the markers to 0
enddo
Muse=M
Mtemp=Muse+1
numval=1
adjust=0.d0
CALL GETX (M,Xinit,Xuse,Modelvec,L)
CALL REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
write (*,*) 'Program initializing...Please Stand by'
CALL Gibbs(Xuse,Yinit,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& temp)
write (*,*) 'Finish the first Gibbs...'
!***********************************************************************
!*******************Start the Stochastic search*************************
!***********************************************************************
do ii = 1, nn !Over all loop start here
write (*,*) 'ii=',ii
CALL DRNUN (M, tempModel)
!initiallize the beginning model vector
do i=1,M
if (tempModel(i).ge.0.5) Modelvec(i) = 1
if (tempModel(i).lt.0.5) Modelvec(i) = 0
enddo
Newmodel=1
oldlikhood=0
curlikhood=0
likesum=0
probsucc=0
Locator=1
previous=0
do k = 1,knum !Each iteration loop start here
tt = k+(ii-1)*knum !Model indicator
write (*,*) 'k=',k
!*****************************Part One**********************************
if (k.eq.1) then
Newmodel = 1
endif
select case (Newmodel)
case (0) !When Newmodel is 0, reserve the modelvectors
do i=1,38
Modelvec(i) = Modelvecbuf(i)
enddo
CALL RNUND(1, M, newvector) !Change the model vectors
newtemp=newvector(1)
if (Modelvec(newtemp).eq.1) then
Modelvec(newtemp) = 0
else
Modelvec(newtemp) = 1
endif
case (1)
CALL RNUND(1, M, newvector) !Change the model vectors
newtemp=newvector(1)
if (Modelvec(newtemp).eq.1) then
Modelvec(newtemp) = 0
else
Modelvec(newtemp) = 1
endif
endselect
!write(*,*) ’modelvectors are’,Modelvec
!Initialize the current row of the table
ModelTable(tt,1) = 0 !First column is the Model Value
ModelTable(tt,2) = 0 !Second column is the Model likelihood
ModelTable(tt,3) = 0 !Third column is the visiting times
ModelTable(tt,4) = 0 !Fourth the posterior probability
do j=1,38
ModelTable(tt,(j+4))=Modelvec(j)
enddo !Find the new model vector
DummyVal1 = 0
DummyVal2 = 0
do i = 1,19
if (Modelvec(i).eq.1) then
DummyVal1 = DummyVal1 + 2**(i-1)
endif
enddo
do i = 1,19
if (Modelvec(i+19).eq.1) then
DummyVal2 = DummyVal2 + 2**(i-1)
endif
enddo
ModelTable(tt,1) = DummyVal1*10000000+DummyVal2
!Find the Model values
!write (*,*) ’Model value initially=’,Modelinfo(k,1)
!write (*,*) ’Model vectors=’,Modelvec
!*****************************Part two**********************************
if (tt.ne.1) then
!Search if the model has been done before (for K>1)
LL = 1
do while ((ModelTable(tt,1).ne.
& ModelTable(LL,1)).AND.(LL.ne.tt))
LL = LL + 1
enddo
!write (*,*) ’LL=’,LL
!write (*,*) ’k(>1)=’,k
if (LL.lt.tt) then
!write (*,*) ’Modelinfo(LL,2)=’,Modelinfo(LL,2)
curlikhood = ModelTable(LL,2)
probsucc = curlikhood/oldlikhood
!write (*,*) ’curlikhood=’,curlikhood
!write (*,*) ’oldlikhood=’,oldlikhood
if ((probsucc.lt.1).AND.(probsucc.gt.0)) then
CALL RNBIN(1,1,probsucc,lbinval)
Newmodel=lbinval(1)
elseif (probsucc.eq.0) then
Newmodel=0
elseif (probsucc.ge.1) then
Newmodel=1
endif
!write (*,*) ’Newmodel=’, Newmodel
if (Newmodel.eq.1) then
ModelTable(LL,3) = ModelTable(LL,3) + 1
!write (*,*) ’The deleted k is (1)’,k
do j=1,42
ModelTable(tt,j)=0
enddo !Delete the information of this row
Locator = LL
else
!write (*,*) ’The delete k is (0)’,k
ModelTable(Locator,3) = ModelTable(Locator,3) + 1
do j=1,42
ModelTable(tt,j)=0
enddo
endif
else
Muse = 0
do i = 1,M
Muse = Muse + Modelvec(i)
enddo
Mtemp = Muse+1
numval = 2 !numval=2 indicate the Gibbs will find the likelihood
CALL GETX (M,Xinit,Xuse,Modelvec,L)
CALL REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
tau2(1)=tau2init
do i = 1, L
thetas(i) = thetasinit(i)
sigma2(i) = sigma2init(i)
enddo
CALL Gibbs(Xuse,Yinit,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& temp)
curlikhood=temp !temp is the likelihood value
!write (*,*) ’Curentlikelihood of more than 2=’,curlikhood
if (curlikhood.ne.0) then
if ((oldlikhood.eq.0).or.
& (curlikhood.gt.oldlikhood)) then
probsucc = 1 !Forces the probsucc.ge.1 branch below
else
probsucc = curlikhood/oldlikhood
!write (*,*) 'oldlikhood of more than 2=',oldlikhood
!write (*,*) 'probsucc=',probsucc
endif !Get the right value for Probsucc
if ((probsucc.lt.1).and.(probsucc.gt.0)) then
CALL RNBIN(1,1,probsucc,lbinval)
Newmodel=lbinval(1)
elseif (probsucc.ge.1) then
Newmodel=1
elseif (probsucc.eq.0) then
Newmodel=0
endif
if (Newmodel.eq.1) then
ModelTable(tt,3) = ModelTable(tt,3) + 1
ModelTable(tt,2) = curlikhood
Locator = tt
elseif (Newmodel.eq.0) then
ModelTable(Locator,3) = ModelTable(Locator,3) + 1
ModelTable(tt,2) = curlikhood
endif
else
do j = 1, 42
ModelTable(tt,j) = 0
enddo !Clear this row; the likelihood was 0
Newmodel = 0
endif
endif
elseif (tt.eq.1) then
!write (*,*) 'k1=',k
Muse = 0
do i = 1,M
Muse = Muse + Modelvec(i)
enddo
Mtemp = Muse+1
numval = 2 !numval=2 indicates the Gibbs sampler will return the likelihood
CALL GETX (M,Xinit,Xuse,Modelvec,L)
CALL REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
tau2(1) = tau2init
do i = 1, L
sigma2(i) = sigma2init(i)
thetas(i) = thetasinit(i)
enddo
CALL Gibbs(Xuse,Yinit,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& temp)
curlikhood=temp !temp is the likelihood value
if (curlikhood.ne.0) then
ModelTable(1,3) = ModelTable(1,3) + 1
ModelTable(1,2) = curlikhood
else
do j = 1, 42
ModelTable(1,j) = 0
enddo
Newmodel = 0
endif
!write (*,*) 'Currentlikelihood=',curlikhood
endif !If the model doesn't move, visiting times will be 0
!At k=1, or when the proposal was accepted, store the current
!likelihood and model vector for the next comparison
if ((k.eq.1).or.(Newmodel.eq.1)) then
if (Newmodel.eq.1) then
oldlikhood=curlikhood !Store the likelihood
do i=1,38 !Store the model vectors
Modelvecbuf(i) = Modelvec(i)
enddo
else
Newmodel = 1
endif
endif
!write (*,*) 'Newmodel=',Newmodel
!write (*,*) 'modelinfo(k,2)=',modelinfo(k,2)
!write (*,*) 'Model value finally is=', Modelinfo(k,1)
enddo !Each iteration loop ends here
enddo !Overall loop ends here
!***************************Part Four********************************
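!The posterior probability of model i is estimated as
!  lik(i)*visits(i) / sum_j lik(j)*visits(j),
!and the posterior probability of a marker is the sum of the
!posterior probabilities of all visited models that include it.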
do i=1,(knum*nn) !Find the Posterior Probability for each Model
likesum=likesum+ModelTable(i,2)*ModelTable(i,3)
enddo
!write (*,*) 'likesum=',likesum
do i=1,(knum*nn)
ModelTable(i,4)=(ModelTable(i,2)/likesum)*ModelTable(i,3)
!write (*,*) 'Modelinfo(i,1)=',Modelinfo(i,1)
!write (*,*) 'Modelinfo(i,2)=',Modelinfo(i,2)
!write (*,*) 'Modelinfo(i,3)=',Modelinfo(i,3)
!write (*,*) 'Modelinfo(i,4)=',Modelinfo(i,4)
enddo
do i=1,38 !Find the Posterior Probability for each Marker
do k=1,(knum*nn)
MarkPostProb(i)=MarkPostProb(i)+
&ModelTable(k,(i+4))*ModelTable(k,4)
enddo
enddo
open(3,file="MarkerProbability.txt",status="new")
do i=1,38
write (3,*) 'Probability of Marker',i
write(3,*) MarkPostProb(i)
enddo
close(3)
!write (*,*) 'Posterior Probability of Markers are', MarkPostProb
end !Main program ends here
!***********************************************************************
!******************Subroutine Part**************************************
!***********************************************************************
! Get the correct X columns in the beginning of matrix
SUBROUTINE GETX (M,Xinit,Xuse,Modelvec,L)
double precision Xinit(L,M),Xuse(L,M)
integer M, Modelvec(M),L
j=1
do i=1,M
if (Modelvec(i).eq.1) then
do s=1,L
Xuse(s,j) = Xinit(s,i)
enddo
j = j + 1
endif
enddo
return
end
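!Ordinary least squares on the expanded data: returns the fitted
!values XB and the cross-product matrix X'X needed by the Gibbs
!sampler (DRLSE and DMXTXF are IMSL routines)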
SUBROUTINE REGRESSION (Xuse,Muse,M,L,Yinit,ni,NOBS,XTXsend,XB)
double precision Xuse(L,M), X(L,Muse),Yinit(L,20),yregress(NOBS),
& xregress(NOBS,Muse),betastemp(Muse+1),
& SST, SSE,Xbetas(L,Muse+1),XB(L),XTXsend(M+1,M+1),
& XTX(Muse+1,Muse+1)
integer Muse,M,L,ni(L),num,num2
! This part of the routine subsets the full X matrix to
! get the correct X
do j=1,Muse
do i=1,L
X(i,j) = Xuse(i,j)
enddo
enddo
! Rearrange the Y matrix to a vector for regression
num = 1
do i = 1,L
do j = 1,ni(i)
yregress(num) = Yinit(i,j)
num = num + 1
enddo
enddo
! Expand the correct X matrix for regression
num2 = 1
do i = 1,L
do s = 1,ni(i)
do j = 1,Muse
xregress(num2,j) = X(i,j)
enddo
num2 = num2 + 1
enddo
enddo
! This does the regression
CALL DRLSE (NOBS, yregress, Muse, xregress, NOBS, 1, betastemp,
& SST, SSE)
! From the regression routine
!write(*,*) betastemp
! Create appropriate X matrix by adding a column of 1's for the
! intercept term
do i = 1,L
Xbetas(i,1) = 1.d0
XB(i) = 0.d0
do j = 1, Muse
Xbetas(i,j+1) = Xuse(i,j)
enddo
enddo
! Calculate XB
do i = 1,L
do j = 1,Muse+1
XB(i) = XB(i) + Xbetas(i,j)*betastemp(j)
enddo
enddo
! Calculates XTX
CALL DMXTXF (L, Muse+1, Xbetas, L, Muse+1, XTX, Muse+1)
! Need to send XTX out of this function. In order to do so
! must save this to an M by M matrix
do i = 1,Muse+1
do j=1,Muse+1
XTXsend(i,j) = XTX(i,j)
enddo
enddo
return
end
!Subroutine for Gibbs sampler
SUBROUTINE Gibbs (Xuse,Y,ni,L,M,Muse,taua,sigmaa,
& ybar,thetas,ybar2,tau2,KK,kutoff,
& sigma2,sigbeta,XB,XTXsend,Mtemp,numval,adjust,
& bayesfac)
DOUBLE PRECISION Xuse(L,M),XB(L),tau2(1),
& taua,taub(1),sigmab(L),Y(L,20),betamu(Muse+1),
& covarbeta(Muse+1,Muse+1),sigma2(L),thetamu(L),
& thetas(L),thetasig(L),ybar(L),
& stdtau2(1),betasst(Muse+1),stdsig(L),ybar2(L),
& stdtheta(L),sigmaa(L),minloglik,
& liktemp(KK),temp5,maxloglik,
& sumtemp4,bayesfac,RSIG(Muse+1,Muse+1),TOL,
& X(L,Muse),DMACH,betasst2(Muse+1),
& XTXsend(M+1,M+1),Xbetas(L,Muse+1),betas(Muse+1),
& XTX(Muse+1,Muse+1),adjust,sigbeta,denom
INTEGER ni(L),IRANK,KK,kutoff,Mtemp,L,Muse,icount,numsim,numval
!Set up a group of working parameters
Mtemp=Muse+1
TOL = 100.0*DMACH(4)
minloglik = 1.d8
maxloglik = -1.d8
sumtemp4 = 0.d0
icount = 0
!sigbeta = 100.d0
! Get the correct X
do j=1,Muse
do i=1,L
X(i,j) = Xuse(i,j)
enddo
enddo
! Get the correct XTX
do j=1,Muse+1
do i=1,Muse+1
XTX(i,j) = XTXsend(i,j)
enddo
enddo
!Gibbs Sampler
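!Each sweep draws every block from its full conditional in turn:
!thetas (normal), tau2 (inverse gamma via a gamma draw), betas
!(multivariate normal via a Cholesky factor of the conditional
!covariance), and sigma2 (inverse gamma), then evaluates the
!log-likelihood of the draw with llike.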
do numsim=1,KK
!write (*,*) numsim
!***** THETAS ***************************
CALL thetapar (tau2,sigma2,XB,L,ybar,ni,thetamu,thetasig) !parameter
CALL DRNNOR (L,stdtheta)
do i=1,L
thetas(i) = stdtheta(i)*thetasig(i) + thetamu(i)
enddo
!***** TAU ***************************
CALL tauparm (thetas,XB,L,taub)
CALL drngam(1,taua,stdtau2)
tau2(1) = taub(1)/stdtau2(1)
!***** BETA ***************************
CALL betapar (XTX,Muse,tau2,L,thetas,X,betamu,covarbeta,sigbeta)
CALL DCHFAC (Mtemp, covarbeta, Mtemp, TOL, IRANK, RSIG, Mtemp)
! Cholesky factor
CALL DRNNOR(Mtemp,betasst)
do i=1,Mtemp
betasst2(i) = 0.d0
do j=1,Mtemp
betasst2(i) = betasst2(i) + RSIG(i,j)*betasst(j)
enddo
betas(i) = betasst2(i) +betamu(i)
enddo
do i = 1,L
Xbetas(i,1) = 1.d0
XB(i) = 0.d0
do j = 1, Muse
Xbetas(i,j+1) = Xuse(i,j)
enddo
enddo
do i = 1,L
do j = 1,Muse+1
XB(i) = XB(i) + Xbetas(i,j)*betas(j)
enddo
enddo
! ***** SIGMA ***************************
CALL sigmaparm (ybar,ybar2,ni,thetas,L,sigmab)
CALL drngam(L,sigmaa(1),stdsig)
do i = 1,L
sigma2(i) = sigmab(i)/stdsig(i)
enddo
!write (*,*) 'taub=', taub
!write (*,*) 'tau2=', tau2
CALL llike (betas,XB,tau2,Y,sigma2,thetas,
& L,Muse,sigmaa,taua,temp5,sigbeta,icountup,ni,
& adjust)
if (temp5.le.10) liktemp(numsim)=dexp(temp5)
if (temp5.gt.10) liktemp(numsim)=0
if (numval.eq.1) then
if ((temp5.ge.maxloglik) .and. (numsim.ge.kutoff))
& maxloglik = temp5
!write(*,*) "temp5 = ", temp5
!write(*,*) "numsim = ", numsim
if ((temp5.le.minloglik) .and. (numsim.ge.kutoff))
& minloglik = temp5
if (numsim.ge.kutoff) icount = icount + icountup
endif
enddo ! Here ends the simulation for the Gibbs Sampler
if (numval.eq.1) adjust = maxloglik
if (numval.eq.2) then
do s=(kutoff+1),KK
sumtemp4 = sumtemp4 + liktemp(s)
enddo
denom = (KK - kutoff) + 0.d0 !Number of retained Gibbs draws
bayesfac = sumtemp4/denom
!write(*,*) 'bayesfac = ', bayesfac ,'icount = ', icount
endif
return
end
!Subroutine for updating the Tau parameter
SUBROUTINE tauparm (thetas,XB,L,taub)
DOUBLE PRECISION sumTXB,taub(1),thetas(L),XB(L)
INTEGER L
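!taub(1) is the inverse-gamma rate parameter for tau2:
!  0.5*sum_i [(thetas(i) - XB(i))**2 + 1]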
sumTXB=0.d0
do i=1,L
sumTXB=sumTXB + (thetas(i) - XB(i))*(thetas(i) - XB(i))
& +1.d0
enddo
taub(1)=0.5*sumTXB
return
end
!Subroutine for updating the Sigma parameter
SUBROUTINE sigmaparm (ybar,ybar2,ni,thetas,L,sigmab)
DOUBLE PRECISION ybar(L),thetas(L),sumythetas,sigmab(L),ybar2(L),
& dni(L)
INTEGER ni(L)
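!sigmab(i) is the inverse-gamma rate parameter for sigma2(i):
!  0.5*(1 + sum_j (Y(i,j) - thetas(i))**2),
!expanded below in terms of ybar(i) and ybar2(i), the within-line
!means of Y and Y**2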
sumythetas=0.d0
do i=1,L
dni(i) = ni(i) + 0.0
sigmab(i) = 0.5*(1+(dni(i)*ybar2(i) - 2*thetas(i)*dni(i)*
& ybar(i) + dni(i)*thetas(i)*thetas(i)))
enddo
return
end
!Subroutine for updating the Beta parameter
SUBROUTINE betapar (XTX,Muse,tau2,L,thetas,X,betamu,covarbeta,
& sigbeta)
DOUBLE PRECISION XTX(Muse+1,Muse+1),step1(Muse+1,Muse+1),
& covarbeta(Muse+1,Muse+1),mupart2(Muse+1),thetas(L),
& tau2(1),Xbetas(L,Muse+1),X(L,Muse),betamu(Muse+1),sigbeta
INTEGER Muse,L
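!Full conditional of beta: multivariate normal with covariance
!  inverse(I/sigbeta + X'X/tau2) (step1 is inverted by DLINDS below,
!with X including the intercept column) and mean
!  covarbeta * X'theta/tau2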
do i=1,Muse+1
do j=1,Muse+1
if (i.eq.j) then
step1(i,j)=(1/sigbeta)+((1/tau2(1))*XTX(i,j))
else
step1(i,j) = ((1/tau2(1))*XTX(i,j))
endif
enddo
enddo
do i = 1,L
Xbetas(i,1) = 1.d0
do j = 1, Muse
Xbetas(i,j+1) = X(i,j)
enddo
enddo
CALL DLINDS (Muse+1, step1, Muse+1, covarbeta, Muse+1)
! CALL DMURRV (L, Muse+1, Xbetas, L, Muse+1, thetas, 1, L,
!& mupart2)
do j = 1,Muse+1
mupart2(j)=0.d0
do i = 1,L
mupart2(j)=mupart2(j)+Xbetas(i,j)*thetas(i)
enddo
mupart2(j) = mupart2(j)/tau2(1) !Scale X'theta by 1/tau2
enddo
! CALL DMURRV (Mtemp, Mtemp, covarbeta, Mtemp, Mtemp, mupart2,
! & 1,Mtemp, betamu)
do i= 1, Muse+1
betamu(i)=0.d0
do j =1, Muse+1
betamu(i)=betamu(i)+covarbeta(i,j)*mupart2(j)
enddo
enddo
return
end
!Subroutine for updating the Theta parameter
SUBROUTINE thetapar (tau2,sigma2,XB,L,ybar,ni,thetamu,thetasig)
DOUBLE PRECISION tau2(1),sigma2(L),XB(L),ybar(L),thetamu(L),
& thetasig(L),dni(L)
INTEGER L ,ni(L)
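!Full conditional of theta(i): normal with mean the precision-
!weighted average
!  (sigma2(i)*XB(i) + ni(i)*tau2*ybar(i))/(ni(i)*tau2 + sigma2(i))
!and variance tau2*sigma2(i)/(ni(i)*tau2 + sigma2(i))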
do i=1,L
dni(i)=ni(i) + 0.0
thetamu(i) = (1/tau2(1))*(tau2(1)*sigma2(i)/(dni(i)*tau2(1)
& +sigma2(i)))*XB(i) +(1/sigma2(i))
& *(tau2(1)*sigma2(i)/(dni(i)*tau2(1)+sigma2(i)))*
& dni(i)*ybar(i)
enddo
do i=1,L
thetasig(i) = sqrt(tau2(1)*sigma2(i)/(dni(i)*tau2(1)
& +sigma2(i)))
enddo
return
end
!Subroutine for the likelihood function
SUBROUTINE llike (betas,XB,tau2,Y,sigma2,thetas,
& L,M1,sigmaa,taua,likehood2,sigbeta,icountup,ni,
& adjust)
DOUBLE PRECISION betas(M1+1),XB(L),tau2(1),
& taua,Y(L,20),btb,thetas(L),
& sigma2(L),sigmaa(L),lik1,lik2,likehood,
& likehood2,adjust,sigbeta
INTEGER M1,L ,ni(L)
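!Returns the log posterior kernel: lik1 collects the sigma2 and
!theta terms, lik2 the data term, and the tau2 and beta prior terms
!are added at the end; the constant adjust is subtracted so that
!dexp() in the caller stays in range.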
lik1=0.d0
lik2=0.d0
btb=0.d0
icountup = 0
Mtemp=M1+1
do i=1,L
lik1= lik1 - (sigmaa(i))*dlog(sigma2(i)) -
& (1/(2.d0*sigma2(i))) -
& (1/(2.d0*tau2(1)))*
& (thetas(i) - XB(i))*
& (thetas(i) - XB(i))
end do
do i=1,L
do j=1,ni(i)
lik2 = lik2 -(1/(2.d0*sigma2(i)))*(Y(i,j)-thetas(i))*
& (Y(i,j)-thetas(i))
end do
end do
do i = 1,M1+1
btb=btb + betas(i)*betas(i)
end do
likehood = lik1 + lik2 - (taua)*dlog(tau2(1))
& - (1/(2.d0*tau2(1))) - (1/(2.d0*sigbeta)) * btb
likehood2=likehood - adjust !Adjusting likelihood
!write(*,*) "likelihood =", likehood2
return
end