Post on 04-Jan-2016
Sequential Multiple Decision Procedures (SMDP) for Genome Scans
Q.Y. Zhang and M.A. Province
Division of Statistical Genomics, Washington University School of Medicine
Statistical Genetics Forum, April 2006
References
R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential identification and ranking procedures. The University of Chicago Press, Chicago.
M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.
Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of the American Statistical Association, Biometrics Section: 463-468.
SMDP
Sequential Multiple Decision Procedures
Sequential test
Multiple hypothesis test
Idea 1: Sequential
Start from a small sample size n0
Increase sample size, sequential test at each stage (SPRT)
Stop when stopping rule is satisfied
Stages: n0, n0+1, n0+2, …, n0+i, …
Experiment in next stage; extra data for validation
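The sequential idea can be illustrated with Wald's SPRT for a single binary test. This is only a minimal sketch, not the SMDP itself: it assumes normally distributed observations with known standard deviation, and the function name `sprt_normal_mean` and the parameters `mu0`, `mu1`, `alpha`, `beta` are hypothetical names chosen for illustration.

```python
import math

def sprt_normal_mean(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 vs. H1: mean = mu1 (known sigma).

    Returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0                              # cumulative log likelihood ratio
    n = 0
    for n, x in enumerate(samples, start=1):
        # log-likelihood-ratio contribution of one normal observation
        llr += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    # neither boundary reached: the next stage would add more data
    return "continue sampling", n
```

Sampling stops as soon as one boundary is crossed, so any observations not yet used remain available, which is the point made on the slide about keeping extra data for validation.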
Idea 2: Multiple Decision
Simultaneous test (multiple hypothesis test): SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn are divided into a signal group and a noise group
Independent test (binary hypothesis test): test 1, test 2, test 3, test 4, test 5, test 6, …, test n, one for each of SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn
Test-wise error and experiment-wise error
p value correction
Binary Hypothesis Test
SNP1: test 1 H0: Eff.(SNP1)=0 vs. H1: Eff.(SNP1)≠0
SNP2: test 2 H0: Eff.(SNP2)=0 vs. H1: Eff.(SNP2)≠0
SNP3: test 3 ……
SNP4: test 4 ……
SNP5: test 5 ……
SNP6: test 6 ……
……
SNPn: test n H0: Eff.(SNPn)=0 vs. H1: Eff.(SNPn)≠0
Multiple Hypothesis Test
SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, ……, SNPn
H1: SNP1,2,3 are truly different from the others
H2: SNP1,2,4 are truly different from the others
H3 ……
H4 ……
H5: SNP4,5,6 are truly different from the others
H6 ……
……
Hu: SNPn,n-1,n-2 are truly different from the others
H: any t SNPs are truly different from the other n-t SNPs
u = number of all possible combinations of t out of n
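The hypothesis list above can be enumerated directly for a toy case. A minimal sketch, assuming n = 6 SNPs and t = 3 as in the slide's first hypotheses; `candidate_hypotheses` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb

def candidate_hypotheses(snps, t):
    """Each hypothesis names one set of t SNPs that differ from the others."""
    return list(combinations(snps, t))

snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
hypotheses = candidate_hypotheses(snps, 3)

# u is the number of ways to choose t out of n
u = comb(len(snps), 3)
print(u, hypotheses[0], hypotheses[-1])
```

Exactly one of the u hypotheses is selected, which is why the procedure is a single multiple-decision problem rather than n separate binary tests.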
SMDP
Sequential test, multiple hypothesis test
Sequential Multiple Decision Procedure
Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)
The frequency/density function of a K-D population can be written in the form:
$$f(x) = \exp\{P(x)\,Q(\theta) + R(x) + S(\theta)\}$$
A. The normal density function with unknown mean and known variance;
B. The normal density function with unknown variance and known mean;
C. The exponential density function with unknown scale parameter and known location parameter;
D. The Bernoulli distribution with unknown probability of “success” on a single trial;
E. The Poisson distribution with unknown mean;
……
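The distance between two K-D populations is defined as:
$$\Delta_{i,j} = \left|Q(\theta_i) - Q(\theta_j)\right|$$
Case B:
$$\Delta_{i,j} = \left|\frac{1}{2\sigma_i^2} - \frac{1}{2\sigma_j^2}\right|$$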
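As a worked check of the K-D form for case B (normal density, known mean, unknown variance), the density can be rearranged as below. The assignment of P, Q, R, and S is one conventional choice made here for illustration, with θ = σ²; it is not taken from the slides.

```latex
\begin{align*}
f(x) &= \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} \\
     &= \exp\left\{(x-\mu)^2\cdot\left(-\frac{1}{2\sigma^2}\right)
        + 0 - \tfrac{1}{2}\ln\left(2\pi\sigma^2\right)\right\} \\
% Reading off the K-D components (assumed labelling):
P(x) &= (x-\mu)^2, \qquad Q(\theta) = -\frac{1}{2\sigma^2}, \qquad
R(x) = 0, \qquad S(\theta) = -\tfrac{1}{2}\ln\left(2\pi\sigma^2\right)
\end{align*}
```

With this choice, a population with smaller variance has a larger Q(θ), and the distance between two populations depends only on their variances.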
SMDP (Bechhofer et al., 1968)
Selecting the t best of M K-D populations
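Sequential sampling, stages 1, 2, …, h, h+1, …
Populations: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t | Pop. t+1, Pop. t+2, …, Pop. M
D: distance between the t best populations and the other M-t populations
Statistics at stage h: $Y_{1,h}, Y_{2,h}, \dots, Y_{i,h}, \dots, Y_{M,h}$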
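U possible combinations of t out of M:
$$U = \frac{M!}{t!\,(M-t)!}$$
For each combination u, with members $i_1, \dots, i_t$:
$$Y^{(t)}_{u,h} = \sum_{k=1}^{t} Y_{i_k,h}$$
Ordered sums:
$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$
$$W_{[U],h} = \frac{\exp\left(D^{*}\,Y^{(t)}_{[U],h}\right)}{\sum_{j=1}^{U}\exp\left(D^{*}\,Y^{(t)}_{[j],h}\right)}$$
Stopping rule: $W_{[U],h} \ge P^{*}$
Prob. of correct selection (PCS) > P*, whenever D > D*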
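The stopping statistic can be computed by brute force when M is small. A minimal sketch, assuming the stage-h values Y_{i,h} are already available and that the largest combination sum is the one labelled [U]; `smdp_statistic` and `should_stop` are hypothetical names, not functions from the cited papers.

```python
import math
from itertools import combinations

def smdp_statistic(y, t, d_star):
    """Return (best combination, W) for stage-h values y[0..M-1].

    W = exp(D* Y_[U]) / sum_j exp(D* Y_[j]) over all U combinations of size t.
    """
    sums = {c: sum(y[i] for i in c) for c in combinations(range(len(y)), t)}
    best = max(sums, key=sums.get)
    top = sums[best]
    # Divide numerator and denominator by exp(D* Y_[U]) to avoid overflow.
    denom = sum(math.exp(d_star * (s - top)) for s in sums.values())
    return best, 1.0 / denom

def should_stop(y, t, d_star, p_star):
    """Stop sampling once W reaches P*."""
    best, w = smdp_statistic(y, t, d_star)
    return w >= p_star, best, w

stop, best, w = should_stop([5.0, 4.0, 0.0, 0.0], t=2, d_star=1.0, p_star=0.95)
print(stop, best, round(w, 4))
```

Enumerating every combination is only feasible for toy sizes, which is the computational problem raised later in the slides.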
SMDP: P*, t, D*
P*: arbitrary, 0.95
t: fixed or varied
D*: indifference zone
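Populations: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t | Pop. t+1, Pop. t+2, …, Pop. M
D: distance between the t best populations and the other M-t populations
SMDP stopping rule:
$$W_{[U],h} = \frac{\exp\left(D^{*}\,Y^{(t)}_{[U],h}\right)}{\sum_{j=1}^{U}\exp\left(D^{*}\,Y^{(t)}_{[j],h}\right)} \ge P^{*}$$
Prob. of correct selection (PCS) > P* whenever D > D*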
Correct selection: populations with Q(θ) > Q(θt) + D* are selected
[Figure: levels Q(θt), Q(θt)+D*, and Q(θt)+D, with the indifference zone D* marked]
SMDP: Computational Problem
$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$
$$W_{[U],h} = \frac{\exp\left(D^{*}\,Y^{(t)}_{[U],h}\right)}{\sum_{j=1}^{U}\exp\left(D^{*}\,Y^{(t)}_{[j],h}\right)} \ge P^{*}$$
Sequential stage: 1, 2, 3, …, h, h+1, …, N
Statistics at stage h: $Y_{1,h}, Y_{2,h}, \dots, Y_{t,h}, Y_{t+1,h}, Y_{t+2,h}, \dots, Y_{M,h}$
U sums of U possible combinations of t out of M. Each sum contains t members of $Y_{i,h}$.
$$U = \frac{M!}{t!\,(M-t)!}$$
Computer time?
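To see why computer time is the concern, the size of U can be evaluated on a log scale. A minimal sketch, using M = 5841 (the number of SNPs in the data example of this talk) and a few values of t that are chosen here purely for illustration; `log10_num_combinations` is a hypothetical helper name.

```python
import math

def log10_num_combinations(m, t):
    """log10 of U = M! / (t! (M-t)!), computed via lgamma to avoid huge integers."""
    # math.lgamma(n + 1) equals ln(n!)
    ln_u = math.lgamma(m + 1) - math.lgamma(t + 1) - math.lgamma(m - t + 1)
    return ln_u / math.log(10)

for t in (1, 2, 5, 10, 50):
    # the printed value is the approximate number of decimal digits of U
    print(t, round(log10_num_combinations(5841, t), 1))
```

Even for small t the number of combination sums grows far beyond what can be enumerated at every sequential stage, which motivates the simplified stopping rule.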
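Simplified Stopping Rule (Bechhofer et al., 1968)
$$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[S],h} \le Y^{(t)}_{[S+1],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$$
$$W^{[U-S]}_{[U],h} = \frac{\exp\left(D^{*}\,Y^{(t)}_{[U],h}\right)}{(S-1)\exp\left(D^{*}\,Y^{(t)}_{[S],h}\right) + \sum_{j=S}^{U}\exp\left(D^{*}\,Y^{(t)}_{[j],h}\right)} \ge P^{*}$$
$$W^{[1]}_{[U],h} \le W^{[2]}_{[U],h} \le \dots \le W^{[U-2]}_{[U],h} \le W^{[U-1]}_{[U],h} = W_{[U],h}$$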
U-S+1= Top Combination Number (TCN)
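TCN=2 (i.e. S=U-1, U-S=1) => the simplest stopping rule, where $Y_{[1],h} \le \dots \le Y_{[M],h}$ are the ordered single-population statistics:
$$Y_{[M-t+1],h} - Y_{[M-t],h} \ge \frac{1}{D^{*}}\ln\left\{\frac{(U-1)\,P^{*}}{1-P^{*}}\right\}$$
When TCN=U (i.e. S=1, U-S=U-1) => the original stopping rule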
How to choose TCN? Balance between computational accuracy and computational time
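The trade-off can be made concrete by evaluating the simplified statistic for different TCN values on a toy example. A minimal sketch, assuming the combination sums are already sorted in ascending order and that S = U - TCN + 1; `simplified_w` and `simplest_threshold` are hypothetical names, and the toy sums come from an illustrative set of four values with t = 2.

```python
import math
from itertools import combinations

def simplified_w(sorted_sums, d_star, tcn):
    """Lower bound on W that uses only the top `tcn` combination sums.

    The S-1 smallest sums are each replaced by the S-th sum (S = U - tcn + 1).
    """
    u = len(sorted_sums)
    s = u - tcn + 1                      # 1-based index S
    top = sorted_sums[-1]
    y_s = sorted_sums[s - 1]
    tail = sum(math.exp(d_star * (y - top)) for y in sorted_sums[s - 1:])
    denom = (s - 1) * math.exp(d_star * (y_s - top)) + tail
    return 1.0 / denom

def simplest_threshold(u, d_star, p_star):
    """Gap required by the TCN=2 rule between the two largest combination sums."""
    return math.log((u - 1) * p_star / (1 - p_star)) / d_star

y = [5.0, 4.0, 0.0, 0.0]                 # illustrative stage-h values
sums = sorted(sum(c) for c in combinations(y, 2))
ws = [simplified_w(sums, 1.0, tcn) for tcn in range(2, len(sums) + 1)]
print([round(w, 4) for w in ws])         # non-decreasing in TCN
```

A small TCN is cheap but conservative, so it tends to stop later; TCN = U reproduces the original statistic at full cost.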
SMDP Combined With Regression Model (M.A. Province, 2000, page 320-321)
Data pairs for a marker: (Z1, X1), (Z2, X2), (Z3, X3), …, (Zh, Xh), (Zh+1, Xh+1), …, (ZN, XN)
Sequential sum of squares of regression residuals
$Y_{i,h}$ denotes Y for marker i at stage h
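$$Z = \alpha + \beta X + \varepsilon$$
$$r_{h+1} = Z_{h+1} - \left(\hat{\alpha}^{(h)} + \hat{\beta}^{(h)} X_{h+1}\right)$$
$$V_{h+1} = r_{h+1}\sqrt{\frac{h\sum_{j=1}^{h}\left(X_j - \bar{X}^{(h)}\right)^2}{h\sum_{j=1}^{h}\left(X_j - \bar{X}^{(h)}\right)^2 + \sum_{j=1}^{h}\left(X_j - X_{h+1}\right)^2}}, \qquad V_{h+1} \sim N(0, \sigma^2)$$
$$Y_{h+1} = \sum_{j=1}^{h+1} V_j^2$$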
Combine SMDP With Regression Model (M.A. Province, 2000, page 319)
$$Z = \alpha + \beta X + \varepsilon$$
$$r_{h+1} = Z_{h+1} - \left(\hat{\alpha}^{(h)} + \hat{\beta}^{(h)} X_{h+1}\right)$$
$V_{h+1}$ is the standardized $r_{h+1}$, with $V_{h+1} \sim N(0, \sigma^2)$
Case B: the normal density function with unknown variance and known mean
$$Y_{i,h} = \sum_{j=1}^{h} V_{i,j}^2$$
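The sequential sum of squares for one marker can be sketched as follows. This is a minimal illustration under stated assumptions: a simple linear regression of Z on X refitted at every stage, a starting size `n0` of 2, and the standard scaling of one-step-ahead residuals to constant variance; `sequential_ss` is a hypothetical name and the code is not taken from Province (2000).

```python
import math

def sequential_ss(x, z, n0=2):
    """Running sums of squared standardized one-step-ahead residuals.

    For h = n0, ..., N-1: fit Z = a + b X on the first h pairs, predict pair
    h+1, scale the residual to variance sigma^2, and accumulate its square.
    """
    totals, y = [], 0.0
    for h in range(n0, len(x)):
        xs, zs = x[:h], z[:h]
        x_bar, z_bar = sum(xs) / h, sum(zs) / h
        sxx = sum((xj - x_bar) ** 2 for xj in xs)
        if sxx == 0:
            raise ValueError("X must vary within the first h observations")
        b = sum((xj - x_bar) * (zj - z_bar) for xj, zj in zip(xs, zs)) / sxx
        a = z_bar - b * x_bar
        r = z[h] - (a + b * x[h])          # x[h], z[h] hold pair h+1 (0-based list)
        spread = sum((xj - x[h]) ** 2 for xj in xs)
        v = r * math.sqrt(h * sxx / (h * sxx + spread))
        y += v * v
        totals.append(y)
    return totals

x = [0, 1, 2, 1, 0, 2]                     # illustrative genotype codes
z = [1.0, 2.1, 2.9, 2.2, 0.8, 3.3]         # illustrative phenotype values
print(sequential_ss(x, z))
```

Because each new pair only adds one term, the statistic for every marker can be updated stage by stage without revisiting the decision rule for earlier stages.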
Simplified Stopping Rule (M.A. Province, 2000, page 321-322)
A Real Data Example (M.A. Province, 2000, page 310)
A Real Data Example (M.A. Province, 2000, page 308)
Simulation Results (1) (M.A. Province, 2000, page 312)
Simulation Results (2) (M.A. Province, 2000, page 313)
Simplified SMDP (Bechhofer et al., 1968)
$$W^{[U-S]}_{[U],h} = \frac{\exp\left(D^{*}\,Y^{(t)}_{[U],h}\right)}{(S-1)\exp\left(D^{*}\,Y^{(t)}_{[S],h}\right) + \sum_{j=S}^{U}\exp\left(D^{*}\,Y^{(t)}_{[j],h}\right)} \ge P^{*}$$
U-S+1= Top Combination Number (TCN)
How to choose TCN?
Balance between computational accuracy and computational time
Data
Sample size: 85 cell lines
Genotype: 5841 SNPs (category: 0, 1, 2)
Phenotype: ViabFu7 (continuous)
Relation of W and t (h=50, D*=10)
Effective Top Combination Number (ETCN)
Zhang & Province, 2005, page 465
ETCN Curve
Zhang & Province, 2005, page 466
t = ?
Zhang & Province, 2005, page 466
Zhang & Province, 2005, page 467
P*=0.95, D*=10, TCN=10000
72 SNPs, P<0.01
SMDP Summary
Advantages:
Test, identify all signals simultaneously, no multiple comparisons
Use "minimal" N to find significant signals, efficient
Tight control of statistical errors (Type I, II), powerful
Save rest of N for validation, reliable
Further studies:
Computer time
Extension to more methods/models
Extension to non-K-D distributions
Thanks!