
Sequential Multiple Decision Procedures (SMDP) for Genome Scans

Q.Y. Zhang and M.A. Province

Division of Statistical Genomics, Washington University School of Medicine

Statistical Genetics Forum, April 2006

References

R.E. Bechhofer, J. Kiefer, M. Sobel. 1968. Sequential identification and ranking procedures. The University of Chicago Press, Chicago.

M.A. Province. 2000. A single, sequential, genome-wide test to identify simultaneously all promising areas in a linkage scan. Genetic Epidemiology, 19:301-332.

Q.Y. Zhang, M.A. Province. 2005. Simplified sequential multiple decision procedures for genome scans. 2005 Proceedings of American Statistical Association. Biometrics section: 463-468.

SMDP

Sequential Multiple Decision Procedures

Sequential test

Multiple hypothesis test

Idea 1: Sequential

Start from a small sample size n0

Increase sample size, sequential test at each stage (SPRT): n0, n0+1, n0+2, …, n0+i

Stop when stopping rule is satisfied

Experiment in next stage

Extra data for validation
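As an illustration of the sequential idea, here is a minimal sketch of Wald's SPRT for a normal mean with known variance. The function name, the hypotheses (mu0, mu1), and the error rates alpha and beta are assumptions chosen for the example; the slides do not specify them.

```python
import math
import random

def sprt_normal_mean(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald SPRT for H0: mean = mu0 vs. H1: mean = mu1 (known sigma).

    Returns (decision, n_used); decision is "H0", "H1", or "continue"
    when the data run out before a boundary is crossed.
    """
    upper = math.log((1 - beta) / alpha)   # crossing above: accept H1
    lower = math.log(beta / (1 - alpha))   # crossing below: accept H0
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # Log-likelihood ratio contribution of one normal observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "continue", n

# Hypothetical data: 200 draws from N(1, 1), fed in one at a time.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(200)]
print(sprt_normal_mean(data, mu0=0.0, mu1=1.0, sigma=1.0))
```

Observations not consumed before the test stops remain available, which is the sense in which a sequential design leaves extra data for validation.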

Idea 2: Multiple Decision

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn

Independent test (binary hypothesis test): test 1, test 2, test 3, test 4, test 5, test 6, …, test n, one test per SNP

test-wise error and experiment-wise error

p value correction

Simultaneous test (multiple hypothesis test): SNP1, …, SNPn are divided into a signal group and a noise group

Binary Hypothesis Test

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn

test 1 H0: Eff.(SNP1)=0 vs. H1: Eff.(SNP1)≠0

test 2 H0: Eff.(SNP2)=0 vs. H1: Eff.(SNP2)≠0

test 3 ……

test 4 ……

test 5 ……

test 6 ……

test n H0: Eff.(SNPn)=0 vs. H1: Eff.(SNPn)≠0
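To make the gap between test-wise and experiment-wise error concrete, the sketch below computes the chance of at least one false positive across n tests and the Bonferroni-corrected per-test level. It assumes the n tests are independent, which is a simplification for SNPs in linkage disequilibrium; the function names are illustrative.

```python
def experimentwise_error(alpha, n_tests):
    """P(at least one false positive) over n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the experiment-wise error at or below alpha."""
    return alpha / n_tests

n_snps = 5841  # number of SNPs in the data set used later in this talk
per_test = bonferroni_alpha(0.05, n_snps)
print(experimentwise_error(0.05, n_snps))      # uncorrected: essentially 1
print(per_test)                                # corrected per-test level
print(experimentwise_error(per_test, n_snps))  # corrected: at most 0.05
```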

Multiple Hypothesis Test

SNP1, SNP2, SNP3, SNP4, SNP5, SNP6, …, SNPn

H1: SNP1,2,3 are truly different from the others

H3 ……

H4 ……

H6 ……

……

Hu: SNPn,n-1,n-2 are truly different from the others

H: any t SNPs are truly different from the other n-t SNPs

u = number of all possible combinations of t out of n, i.e. $u = \frac{n!}{t!\,(n-t)!}$
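The hypothesis set can be enumerated directly for a toy case. The sketch below uses six SNP labels and t = 3 (both chosen only for illustration) and then shows how quickly the count grows for a scan with 5841 SNPs.

```python
from itertools import combinations
from math import comb

snps = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]  # toy example
t = 3

# Each hypothesis says: "these t SNPs are truly different from the others".
hypotheses = list(combinations(snps, t))
print(len(hypotheses), comb(len(snps), t))  # both counts agree
print(hypotheses[0])    # first hypothesis: SNP1, SNP2, SNP3
print(hypotheses[-1])   # last hypothesis: SNP4, SNP5, SNP6

# Number of hypotheses for a genome scan with 5841 SNPs and t = 3.
print(comb(5841, 3))
```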

SMDP

Sequential test

Multiple hypothesis test

Sequential Multiple Decision Procedure

Koopman-Darmois (K-D) Populations (Bechhofer et al., 1968)

The freq/density function of a K-D population can be written in the form:

f(x)=exp{P(x)Q(θ)+R(x)+S(θ)}

A. The normal density function with unknown mean and known variance;

B. The normal density function with unknown variance and known mean;

C. The exponential density function with unknown scale parameter and known location parameter;

D. The Bernoulli distribution with unknown probability of “success” on a single trial;

E. The Poisson distribution with unknown mean;

……

The distance of two K-D populations is defined as:

$D_{i,j} = Q(\theta_i) - Q(\theta_j)$
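As a check on the K-D form, case B (normal density with known mean and unknown variance) can be written with P(x) = (x - mu)^2, Q(theta) = -1/(2 sigma^2), R(x) = 0 and S(theta) = -(1/2) ln(2 pi sigma^2). The sketch below verifies numerically that this factorization reproduces the usual normal density; the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density written in the usual way."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def kd_normal_known_mean(x, mu, sigma2):
    """Same density in K-D form exp{P(x)Q(theta) + R(x) + S(theta)}, theta = sigma2."""
    P = (x - mu) ** 2
    Q = -1.0 / (2.0 * sigma2)
    R = 0.0
    S = -0.5 * math.log(2 * math.pi * sigma2)
    return math.exp(P * Q + R + S)

for x in (-1.5, 0.0, 0.3, 2.0):
    # The two columns printed here should agree up to rounding.
    print(x, normal_pdf(x, 0.0, 2.0), kd_normal_known_mean(x, 0.0, 2.0))
```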

SMDP (Bechhofer et al., 1968)

Selecting the t best of M K-D populations

Sequential sampling: stages 1, 2, …, h, h+1, … for Pop. 1, Pop. 2, …, Pop. t-1, Pop. t, Pop. t+1, Pop. t+2, …, Pop. M

U possible combinations of t out of M

For each combination u, sum the t members: $Y^{(t)}_{[u],h} = \sum_{i \in u} Y_{i,h}$

Stopping rule: $W_{[U],h} \ge P^*$

Prob. of correct selection (PCS) > P*, whenever D>D*

SMDP: P*, t, D*

P*: arbitrary, 0.95

t: fixed or varied

D*: indifference zone

[Figure: Pop. 1, Pop. 2, …, Pop. t-1, Pop. t, Pop. t+1, Pop. t+2, …, Pop. M, with the levels Q(θt), Q(θt)+D* and Q(θt)+D marked]

SMDP stopping rule: $W_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=1}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$

Prob. of correct selection (PCS) > P* whenever D>D*

Correct selection: populations with Q(θ) > Q(θt)+D* are selected

SMDP: Computational Problem

Sequential stage h: …, Yt+1,h, Yt+2,h, …

U sums of U possible combinations of t out of M. Each sum contains t members of Yi,h.

$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$

$W_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=1}^{U} \exp(D^* Y^{(t)}_{[j],h})} \ge P^*$

$U = \frac{M!}{t!\,(M-t)!}$

Computer time
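A minimal sketch of the full stopping computation at one stage, assuming the rule has the form W = exp(D* Y_[U]) / sum_j exp(D* Y_[j]) >= P*, with larger Y meaning a better population. Both the exact form and the sign convention are assumptions made for this example, and the function name is hypothetical. Every one of the U combination sums is enumerated, which is what makes the original rule expensive.

```python
import math
from itertools import combinations

def smdp_original_rule(y, t, d_star, p_star):
    """Evaluate the stopping rule at one stage.

    y      : list of Y_{i,h}, one value per population (assumed: larger is better)
    returns: (stop, W, indices of the top combination)
    """
    # All U = C(M, t) combination sums, sorted in ascending order.
    sums = sorted(
        (sum(y[i] for i in combo), combo)
        for combo in combinations(range(len(y)), t)
    )
    top_sum, top_combo = sums[-1]
    # W is written with differences from the top sum so the
    # exponentials cannot overflow.
    denom = sum(math.exp(-d_star * (top_sum - s)) for s, _ in sums)
    w = 1.0 / denom
    return w >= p_star, w, top_combo

# Toy stage with M = 4 populations, selecting the t = 2 best.
print(smdp_original_rule([0.0, 0.0, 5.0, 6.0], t=2, d_star=1.0, p_star=0.95))
```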

Simplified Stopping Rule (Bechhofer et al., 1968)

$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[S],h} \le Y^{(t)}_{[S+1],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$

$W^{[S]}_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=S}^{U} \exp(D^* Y^{(t)}_{[j],h}) + (S-1)\exp(D^* Y^{(t)}_{[S],h})} \ge P^*$

$W_{[U],h} = W^{[1]}_{[U],h} \ge W^{[2]}_{[U],h} \ge \dots \ge W^{[U-2]}_{[U],h} \ge W^{[U-1]}_{[U],h}$

U-S+1 = Top Combination Number (TCN)

When TCN=2 (i.e. S=U-1, U-S=1) => the simplest stopping rule:

$Y_{[M-t+1],h} - Y_{[M-t],h} \ge \frac{1}{D^*} \ln\left\{ (U-1) \frac{P^*}{1-P^*} \right\}$

When TCN=U (i.e. S=1, U-S=U-1) => the original stopping rule

How to choose TCN? Balance between computational accuracy and computational time
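The TCN trade-off can be sketched as follows, assuming the simplified rule keeps the TCN largest combination sums and bounds each of the remaining U - TCN sums by the smallest retained one, and assuming larger Y means better. Function names are hypothetical. For brevity this sketch still enumerates every sum before keeping the top ones, so it illustrates the arithmetic rather than the time saving; only `simplest_rule` (TCN = 2) avoids the enumeration by working from the sorted individual Y values.

```python
import heapq
import math
from itertools import combinations

def simplified_rule(y, t, d_star, p_star, tcn):
    """Stopping rule using only the `tcn` largest combination sums."""
    u = math.comb(len(y), t)
    tcn = max(1, min(tcn, u))
    top = heapq.nlargest(tcn, (sum(c) for c in combinations(y, t)))
    y_top, y_s = top[0], top[-1]   # Y_[U] and Y_[S]
    s = u - tcn + 1                # TCN = U - S + 1
    denom = sum(math.exp(-d_star * (y_top - v)) for v in top)
    # The S - 1 smallest sums are each replaced by the bound Y_[S].
    denom += (s - 1) * math.exp(-d_star * (y_top - y_s))
    w = 1.0 / denom
    return w >= p_star, w

def simplest_rule(y, t, d_star, p_star):
    """TCN = 2: compare the gap between the t-th and (t+1)-th best Y (needs t < M)."""
    u = math.comb(len(y), t)
    ys = sorted(y)
    gap = ys[len(ys) - t] - ys[len(ys) - t - 1]
    return gap >= math.log((u - 1) * p_star / (1.0 - p_star)) / d_star

y_stage = [0.0, 0.0, 5.0, 6.0]   # toy values, M = 4, t = 2, so U = 6
for tcn in (2, 6):
    print(tcn, simplified_rule(y_stage, 2, 1.0, 0.95, tcn))
print(simplest_rule(y_stage, 2, 1.0, 0.95))
```

A smaller TCN gives a smaller (more conservative) W, so the procedure may stop later than the original rule would, in exchange for less computation per stage.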

SMDP Combined With Regression Model (M.A. Province, 2000, page 320-321)

Data pairs for a marker: (Z1, X1), (Z2, X2), (Z3, X3), …, (Zh, Xh), (Zh+1, Xh+1), …, (ZN, XN)

$r_{h+1} = Z_{h+1} - (\hat\alpha_{(h)} + \hat\beta_{(h)} X_{h+1})$

$V_{h+1} = r_{h+1} \Big/ \sqrt{1 + \frac{1}{h} + \frac{(X_{h+1} - \bar X_h)^2}{\sum_{j=1}^{h} (X_j - \bar X_h)^2}}, \qquad V_{h+1} \sim N(0, \sigma^2)$

Sequential sum of squares of regression residuals. Yi,h denotes Y for marker i at stage h.
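A minimal sketch of the sequential residuals for one marker, assuming they are the standard recursive residuals of simple linear regression: the pair at stage h+1 is predicted from a least-squares fit to the first h pairs and scaled by its prediction standard error factor. The starting size `n0` and the function name are assumptions for illustration.

```python
import math

def recursive_residuals(z, x, n0=3):
    """Standardized one-step-ahead residuals V_{h+1} for h = n0, ..., N-1."""
    out = []
    for h in range(n0, len(z)):
        xs, zs = x[:h], z[:h]
        x_bar = sum(xs) / h
        z_bar = sum(zs) / h
        sxx = sum((xi - x_bar) ** 2 for xi in xs)
        if sxx == 0.0:
            # All genotypes identical so far: the slope is not estimable yet.
            continue
        sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(xs, zs))
        beta = sxz / sxx
        alpha = z_bar - beta * x_bar
        r = z[h] - (alpha + beta * x[h])   # raw prediction residual
        scale = math.sqrt(1.0 + 1.0 / h + (x[h] - x_bar) ** 2 / sxx)
        out.append(r / scale)
    return out

# Toy marker: genotypes coded 0/1/2 and a continuous phenotype.
x = [0, 1, 2, 1, 0, 2, 1]
z = [0.1, 1.2, 1.9, 1.0, 0.3, 2.2, 0.8]
v = recursive_residuals(z, x)
y_h = sum(vi ** 2 for vi in v)   # running sum of squares for this marker
print(v, y_h)
```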

Combine SMDP With Regression Model (M.A. Province, 2000, page 319)

Regression estimates: $\hat\alpha$, $\hat\beta$

Case B: the normal density function with unknown variance and known mean;

$Y_{i,h} = \sum_{j \le h} V_{i,j}^2$

Simplified Stopping Rule (M.A. Province, 2000, page 321-322)

A Real Data Example (M.A. Province, 2000, page 310)

A Real Data Example (M.A. Province, 2000, page 308)

Simulation Results (1) (M.A. Province, 2000, page 312)

Simulation Results (2) (M.A. Province, 2000, page 313)

Simplified SMDP (Bechhofer et al., 1968)

$Y^{(t)}_{[1],h} \le Y^{(t)}_{[2],h} \le \dots \le Y^{(t)}_{[S],h} \le Y^{(t)}_{[S+1],h} \le \dots \le Y^{(t)}_{[U-1],h} \le Y^{(t)}_{[U],h}$

$W^{[S]}_{[U],h} = \frac{\exp(D^* Y^{(t)}_{[U],h})}{\sum_{j=S}^{U} \exp(D^* Y^{(t)}_{[j],h}) + (S-1)\exp(D^* Y^{(t)}_{[S],h})} \ge P^*$

$W_{[U],h} = W^{[1]}_{[U],h} \ge W^{[2]}_{[U],h} \ge \dots \ge W^{[U-2]}_{[U],h} \ge W^{[U-1]}_{[U],h}$

U-S+1 = Top Combination Number (TCN)

How to choose TCN?

Balance between computational accuracy and computational time

Data

Sample size: Cell lines

Genotype: 5841 SNPs (category: 0, 1, 2)

Phenotype: ViabFu7 (continuous)

Relation of W and t (h=50, D*=10)

Effective Top Combination Number (Zhang & Province, 2005, page 465)

ETCN Curve

t = ?

P*=0.95, D*=10, TCN=10000

72 SNPs, P<0.01

SMDP Summary

Advantages:

Test and identify all signals simultaneously, no multiple comparisons

Use "minimal" N to find significant signals, efficient

Tight control of statistical errors (Type I, II), powerful

Save the rest of N for validation, reliable

Further studies:

Computer time

Extension to more methods/models

Extension to non-K-D distributions

Thanks!
