Session III. Introduction to Probability and Testing for Goodness of Fit: Or An IDEA about Probability and Testing! (Zar: Chapters 5, 22)


Page 1: Session III. Introduction to Probability and  Testing for Goodness of Fit:

Session III. Introduction to Probability and Testing for Goodness of Fit:

Or An IDEA about Probability and Testing!

(Zar: Chapters 5, 22)

Page 2:

What is Probability?

(1) Not formally defined, much as a "point" is not defined in geometry; but -------

(2) Probability is a measure of the “chance” that an “event” will happen.

(a) "What's the probability it will rain?" (Subjective)

(b) "What's the probability that the coin will be 'heads' when I flip it?" (Objective)

(3) Measured between 0 - 100% or between 0-1.

Page 3:

If “A” is the event, then P(A) is the probability of A.

How do you get a Value for P(A)?

Estimate of P(A):

P̂(A) = (# of times "A" happens) / (# of total tries)

Score each try as x_i:

x_i = 0 if miss (not A)
x_i = 1 if hit (A)

so x_1 = 0 or 1, x_2 = 0 or 1, …, x_n = 0 or 1, and

P̂(A) = (x_1 + x_2 + … + x_n) / n  →  P(A) as n grows.
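The relative-frequency estimate above is easy to simulate. A minimal Python sketch, not part of the slides; the fair-coin trial and the names used are illustrative assumptions:

```python
import random

def estimate_probability(trial, n=100_000, seed=42):
    """Estimate P(A) as (# of times A happens) / (# of total tries)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if trial(rng))  # each x_i is 1 on hit, 0 on miss
    return hits / n

# A fair coin: the estimate of P(heads) should land very close to 0.5.
p_heads = estimate_probability(lambda rng: rng.random() < 0.5)
```

With n = 100,000 tries the estimate typically sits within about ±0.005 of the true 0.5.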

Page 4:

What are “Odds”?

O(A) = (# of times "A" happens) / (# of times "A" doesn't happen)

     = (x_1 + x_2 + … + x_n) / (n - (x_1 + x_2 + … + x_n))

     = Σ x_i / (n - Σ x_i)

     = P̂(A) / P̂(~A)
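As a quick illustrative check (Python, not in the original slides), odds can be computed either from a probability or from raw counts:

```python
def odds(p):
    """Convert a probability P(A) into odds O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)

def odds_from_counts(hits, n):
    """O(A) = (# times A happens) / (# times A doesn't) = sum(x_i) / (n - sum(x_i))."""
    return hits / (n - hits)
```

For example, a probability of .75 (75 hits in 100 tries) corresponds to odds of 3 to 1.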

Page 5:

Events

Disjoint or Mutually Exclusive: events A and B that share no outcomes in the universe of outcomes.

Independent: When one event does not affect another.

Ex: Coin flips

Random selection from an infinite set or selection with replacement from a finite set.

Page 6:

Dependent: When one event affects another.

Ex: Checker flips. Random selection (without replacement) from a finite set.

Ex: Colored balls in a bag

Joint outcomes. Independent: Multiply!

P(H, H) = P(H) P(H) = 1/2 × 1/2 = 1/4

                  Second Toss
                  H        T
First Toss   H    (H,H)    (H,T)
             T    (T,H)    (T,T)

Page 7:

What is the chance of each outcome?

If Mutually Exclusive? Add!

P((H,H) + (T,T)) = 1/4 + 1/4 = 1/2

P(1 head) = P((H,T) + (T,H)) = 1/2

P(at least 1 head) = P(1 head) + P(2 heads) = 3/4

The 1st & 2nd tosses are independent! The outcomes (H,H), (H,T), (T,H), and (T,T) are mutually exclusive!
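The two-toss reasoning above (multiply for independent tosses, add for mutually exclusive outcomes) can be sketched in Python; the enumeration is illustrative, not from the slides:

```python
from itertools import product

# Enumerate the four equally likely outcomes of two independent fair tosses.
outcomes = list(product("HT", repeat=2))      # ('H','H'), ('H','T'), ('T','H'), ('T','T')
p = {o: 0.5 * 0.5 for o in outcomes}          # independence: multiply

p_two_heads   = p[("H", "H")]
p_one_head    = p[("H", "T")] + p[("T", "H")]  # mutually exclusive: add
p_atleast_one = p_one_head + p_two_heads
```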

Page 8:

A Simple Hypothesis

The "Binomial": any experiment with just two outcomes.

EX: Flower color. Suppose only yellow or green flowers.

(Y Y) × (g g)

1st cross: (Y g) × (Y g)

2nd cross: (Y Y) (Y g) (g Y) (g g)

Page 9:

Probability: 1/4 1/4 1/4 1/4

Hypothesis: Yellow is Dominant

Result:       Y     g
Probability:  3/4   1/4

Page 10:

THE EXPERIMENT:

Select 100 flowers at random:

                      Yellow   Green
Result:               84       16
Expected (under H0):  75       25     (3/4 × 100)  (1/4 × 100)

The Problem:

Is 84, 16 consistent with the hypothesis? Does 84, 16 support a probability of 75%, 25%?

Page 11:

Answer: In the form of a question:

What’s the probability that 84,16 could come from a true population of 75%, 25%?

H0: p = 75%

The Binomial Distribution:

Page 12:

The Binomial Distribution: Take it Slow!

n = 1:  P(Y) = .75,  P(g) = .25

n = 2:
P(YY) = P(Y) P(Y) = .75 × .75 = .5625
P(Yg) = P(gY) = P(Y) P(g) = .75 × .25 = .1875
P(gg) = P(g) P(g) = .25 × .25 = .0625

BUT…..

P(YY) + P(Yg) + P(gg) = .8125 ≠1

Page 13:

What's wrong? We have the possibility of both Yg and gY!

Should be P(Y Y) + P(Y g) + P(g Y) + P(g g) = 1

Which is:

# ways   Probability
(1)      P(YY) = .75²
(2)      P(Yg) = P(gY) = .75 × .25
(1)      P(gg) = .25²

Ex: P(at least one Y) = P(YY) + P(gY) + P(Yg) = .9375

Page 14:

n = 3

(1) P(Y Y Y) = .75³

(3) P(Y Y g) = P(Y g Y) = P(g Y Y) = .75² × .25

(3) P(Y g g) = P(g Y g) = P(g g Y) = .75 × .25²

(1) P(g g g) = .25³

The Binomial distribution (cont.)

Page 15:

In general, for x = 0, 1, …, n:

P(x Y's, (n-x) g's) = [# ways to arrange x Y's and (n-x) g's] × P(Y Y … Y g g … g)
                    = C(n, x) p^x (1-p)^(n-x)

where

C(n, x) = n! / (x! (n-x)!)   and   n! = n (n-1) (n-2) … (2)(1)
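The general formula can be written directly with Python's math.comb; a small illustrative sketch, not part of the slides:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x Y's and (n-x) g's) = C(n, x) * p**x * (1-p)**(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)
```

For n = 2 and p = .75 this reproduces the slide's .5625, .375 (= .1875 + .1875), and .0625, and the probabilities over x always sum to 1.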

Page 16:

The coefficients C(n, x), by Pascal's Triangle:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
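Pascal's triangle rows can be generated by the usual neighbor-sum rule; an illustrative Python sketch (not from the slides):

```python
def pascal_rows(n):
    """Return the first n rows of Pascal's triangle; entry x of row n is C(n, x)."""
    rows = [[1]]
    for _ in range(n - 1):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries above it.
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows
```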

Page 17:

Calculate the Expectation (mean) of a binomial (with q = 1 - p):

E[Bin(n, p)] = Σ (x = 0 to n) x [n! / (x! (n-x)!)] p^x q^(n-x)

             = Σ (x = 1 to n) [n! / ((x-1)! (n-x)!)] p^x q^(n-x)

             = np Σ (x = 1 to n) [(n-1)! / ((x-1)! ((n-1)-(x-1))!)] p^(x-1) q^(n-x)

             = np (p + q)^(n-1)

             = np (1)^(n-1) = np

Page 18:

For n = 100:

x    P(x g's)     P(≤ x g's)
0    0.00000000   0.00000000
1    0.00000000   0.00000000
2    0.00000000   0.00000000
3    0.00000000   0.00000000
4    0.00000002   0.00000002
5    0.00000010   0.00000012
6    0.00000052   0.00000064
7    0.00000235   0.00000299
8    0.00000910   0.00001209
9    0.00003100   0.00004308
10   0.00009402   0.00013710
11   0.00025642   0.00039352
12   0.00063392   0.00102744
13   0.00143038   0.00245782
14   0.00296294   0.00542076
15   0.00566251   0.01108327
16   0.01002735   0.02111062
17   0.01651564   0.03762627
18   0.02538515   0.06301142
19   0.03651899   0.09953041
20   0.04930064   0.14883105
21   0.06260399   0.21143505
22   0.07493508   0.28637013
23   0.08470922   0.37107936
24   0.09059180   0.46167117
25   0.09179969   0.55347085
26   0.08826894   0.64173979
27   0.08064076   0.72238052
28   0.07008065   0.79246116
29   0.05799779   0.85045892
30   0.04575381   0.89621270
31   0.03443835   0.93065107
32   0.02475256   0.95540363
33   0.01700176   0.97240537
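The cumulative column can be reproduced from the binomial formula; an illustrative Python sketch for the tail probability used on the next slide:

```python
from math import comb

n, p_green = 100, 0.25
pmf = [comb(n, x) * p_green**x * (1 - p_green)**(n - x) for x in range(n + 1)]

# P(<= 16 greens), i.e. the chance of a result as extreme as (84 yellow, 16 green) or rarer.
p_at_most_16_greens = sum(pmf[:17])
```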

Page 19:
Page 20:

Conclusions:

Chance of (84,16) = chance of getting (84,16) or anything “rarer” than (84,16)

= P(84,16) + P(85,15) + P(86,14) + … + P(100,0) = .0211

What is rare enough?

Biomedical Convention: .05 or 5%

RULE: If the experiment is rarer than the cutoff level, say that the experiment is not consistent with the hypothesis!

If less rare, say it is consistent!

Page 21:

Other Cutoffs:

Lower: .01 or 1% (1/100), or .001 or .1% (1/1000), for situations needing a lower error rate.
Example: the Bruston Explosive Bolt.

Or higher:
Example: Physiological Studies
Example: Secondary mets. in pediatric Leukemia study

Conclusion:

No one cutoff works in every situation. The cutoff should be set beforehand to avoid bias.

Page 22:

What is the cutoff? What is the p-value?

Chance[experiment statistic or rarer | H0] ≤ cutoff, or

Pr[X ≤ x | H0] ≤ cutoff probability

where X is a value in the universe, x is the statistic from the experiment, and the probability is computed under the H0.

If the probability statement is true, then decide that the experiment is not consistent with the hypothesis.

But there’s still a chance the experiment came from the Ho!

Page 23:

                       Decide from Experiment
                       H0             HA
Actual      H0         No Error       Type I (α)
Truth       HA         Type II (β)    No Error

Type I error = α = Pr[decide ~H0 | H0 is true]

Type II error = β = Pr[decide H0 | H0 is not true]

Three numbers: α, β, and the cutoff; given n, if you have one you have all.

Page 24:

Summary of the Binomial

• Density function:      b(i; n, p) = C(n, i) p^i q^(n-i),  where q = 1 - p
• Distribution function: B(x; n, p) = Σ (i = 0 to x) C(n, i) p^i q^(n-i)

Mean: np

Variance: npq

Sample estimates: x̄ = Σ x_i / n,  and s² = Σ (x_i - x̄)² / (n - 1), or s = its 1/2 power.

Page 25:

Another Way to look at flowers:

H0: Yellow is dominant
HA: Yellow is not dominant

                    Y      g      Total
Observed:           84     16     100
Expected percent:   75%    25%
Expected number:    75     25     100    (n × proportion)

chi-square: (84-75)²/75 + (16-25)²/25

Page 26:

χ² = Σ (i = 1 to #terms) (observed_i - expected_i)² / expected_i

Degrees of freedom = #terms - 1

84, 16 Example:

χ² = (84-75)²/75 + (16-25)²/25 = 1.08 + 3.24 = 4.32

Degrees of freedom = 2 - 1 = 1

So how extreme is 4.32?
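The 4.32 above is straightforward to compute; a small illustrative Python sketch (not part of the slides):

```python
def chi_square(observed, expected):
    """Goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Flower experiment: observed (84, 16) against expected (75, 25); df = 2 - 1 = 1.
stat = chi_square([84, 16], [75, 25])
```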

Page 27:

[Figure: the chi-square density (vertical axis 0.0 to .8) and cumulative distribution (vertical axis 0.0 to 1.0) for 1 DF, plotted for X from 0 to 10. A vertical line marks the Cutoff: values of the statistic below it Support H0; values above it Reject H0.]

Page 28:

[Figure: close-up of the chi-square density and cumulative distribution for 1 DF, X from 3.0 to 8.0, with the observed value 4.32 marked.]

Table B.1
p-value   x
0.01      6.635
0.025     5.024
0.05      3.841

The observed 4.32 exceeds 3.841 (the 0.05 critical value) but not 5.024.

Page 29:

Table B.1
DF   p-value   x
 1   0.05       3.841
 4   0.05       9.488
10   0.05      18.307
20   0.05      31.410
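For 1 DF the Table B.1 entries can be checked with only the standard library, using the fact that a chi-square with 1 DF is a squared standard normal (an illustrative sketch, not part of the slides):

```python
from math import erf, sqrt

def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable X with 1 DF.
    Since X = Z**2 for standard normal Z, P(X <= x) = erf(sqrt(x / 2))."""
    return 1 - erf(sqrt(x / 2))

# Exact p-value for the flower experiment's observed statistic of 4.32.
p_observed = chi2_upper_tail_df1(4.32)
```

This reproduces the table (e.g., 3.841 gives a tail of 0.05) and puts the p-value for 4.32 near 0.038, consistent with falling between the 0.05 and 0.025 critical values.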

Page 30:

Another Example: More than 2 groups.

Color: Green & Yellow
Texture: Smooth & Wrinkled

Hypothesis: Y is dominant; S is dominant; Color and Texture are independent.

Pr(any cell) = 1/16

                        Color
Texture    (Y Y)      (g Y)      (Y g)      (g g)
(S S)      (YY,SS)    (gY,SS)    (Yg,SS)    (gg,SS)
(S w)      (YY,Sw)    (gY,Sw)    (Yg,Sw)    (gg,Sw)
(w S)      (YY,wS)    (gY,wS)    (Yg,wS)    (gg,wS)
(w w)      (YY,ww)    (gY,ww)    (Yg,ww)    (gg,ww)

Page 31:

H0: 9 : 3 : 3 : 1 (out of 16)

            YS         Yw         gS         gw         Total
Obs:        152        39         53         6          250
Pr(H0):     9/16       3/16       3/16       1/16
            (0.5625)   (0.1875)   (0.1875)   (0.0625)
Expected:   140.625    46.875     46.875     15.625

χ² = (152-140.625)²/140.625 + (39-46.875)²/46.875 + (53-46.875)²/46.875 + (6-15.625)²/15.625
   = 0.9201 + 1.3230 + 0.8003 + 5.929 = 8.97

DF: 1 + 1 + 1 + 1 - 1 = 3

χ²_.05(3) = 7.815 < 8.97; (p = .0293 < 0.05) reject H0
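The 9:3:3:1 computation can be scripted; an illustrative Python sketch (not part of the slides):

```python
observed = [152, 39, 53, 6]        # YS, Yw, gS, gw
ratios   = [9, 3, 3, 1]            # H0 proportions out of 16

n = sum(observed)
expected = [n * r / 16 for r in ratios]
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # 4 - 1 = 3
```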

Page 32:

So, where's the Difference? (Subdividing the H0)

(1) too few gw
(2) about the right # of the others

Combine YS + Yw + gS and compare to gw. But first, test (2):

H0: YS, Yw, gS in 9 : 3 : 3

            YS           Yw           gS           Total
Obs:        152          39           53           244
Pr(H0):     9/15 = .6    3/15 = .2    3/15 = .2
Exp:        146.4        48.8         48.8

χ² = .2142 + 1.968 + .3615 = 2.544
DF = 1 + 1 + 1 - 1 = 2

χ²_.05(2) = 5.991 > 2.544: Accept H0

Page 33:

H0: Others vs gw in 15 : 1

            Others           gw              Total
Obs:        244              6               250
Pr(H0):     15/16 = .9375    1/16 = .0625
Expected:   234.375          15.625

χ² = 0.3953 + 5.929 = 6.324
DF = 1 + 1 - 1 = 1

χ²_.05(1) = 3.841 < 6.324: Reject H0 and accept "Too few gw"

Page 34:

Summary:

χ² = Σ (i = 1 to k) (f_i - f̂_i)² / f̂_i

where f_i = the i-th observation and f̂_i = the i-th expected value (under H0)

DF = k - 1

Page 35:

Others:

(1) Continuity Correction (Yates correction):

χ²_c = Σ (i = 1 to k) (|f_i - f̂_i| - 0.5)² / f̂_i

(2) Rule of Thumb for using χ² instead of the Binomial:
If no more than 25% of the f̂_i are < 5, and none is ≤ 1, then use χ².

(3) Log-Likelihood Ratio:

G = 2 Σ (i = 1 to k) f_i ln(f_i / f̂_i) = 2 [Σ f_i ln f_i - Σ f_i ln f̂_i]

(related to Entropy and Information)
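The G statistic from (3) can be computed directly; an illustrative Python sketch (not part of the slides):

```python
from math import log

def g_statistic(observed, expected):
    """Log-likelihood ratio statistic: G = 2 * sum of f_i * ln(f_i / fhat_i)."""
    return 2 * sum(f * log(f / e) for f, e in zip(observed, expected))

# For the flower data (84, 16) vs (75, 25), G comes out near the chi-square value.
g_flowers = g_statistic([84, 16], [75, 25])
```

For the 84, 16 example G is about 4.76, close to the χ² of 4.32; both are referred to the same chi-square table.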

Page 36:

(4) Heterogeneity Chi-Square. There is often a need to combine chi-square analyses: the common cause is a batch effect, where subjects are only available in batches (e.g., cages, school classes, laboratories, or clinics), but there is a common hypothesis over all batches (e.g., gender, ethnicity, presence/absence of a marker).

(a) Perform chi-square on each “batch”.

(b) Pool all batches and do a “pooled chi-square”.

(c) Sum the individual chi-squares (d.f. = sum of the individual batch d.f. = k batches times the d.f. for each batch).

(d) Subtract the pooled chi-square from the sum and test with (k-1) × (individual batch d.f.). This is the heterogeneity chi-square.
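Steps (a)-(d) can be sketched for the two-category case; an illustrative Python sketch (not part of the slides), run here on the ten Mendel experiments tabulated on the next slide:

```python
def heterogeneity_chi_square(batches, probs):
    """Steps (a)-(d): per-batch chi-squares, pooled chi-square, and their difference."""
    def chi2(obs):
        n = sum(obs)
        return sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs, probs))
    total = sum(chi2(b) for b in batches)               # (a) per batch, then (c) summed
    pooled = chi2([sum(col) for col in zip(*batches)])  # (b) pool all batches
    return total, pooled, total - pooled                # (d) heterogeneity chi-square

# (yellow, green) counts from the ten experiments in Ex 22.5, under H0: 3/4 vs 1/4.
batches = [(25, 11), (32, 7), (14, 5), (70, 27), (24, 13),
           (20, 6), (32, 13), (44, 9), (50, 14), (44, 18)]
total, pooled, het = heterogeneity_chi_square(batches, (0.75, 0.25))
```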

Page 37:

Ex 22.5: heterogeneity chi-square analysis (G. Mendel's data). Expected frequencies are in parentheses.

Experiment   Yellow seeds    Green seeds     Total seeds (n)   Chi-square   DF
 1           25 (27.0000)    11 (9.0000)      36               0.5926        1
 2           32 (29.2500)     7 (9.7500)      39               1.0342        1
 3           14 (14.2500)     5 (4.7500)      19               0.0175        1
 4           70 (72.7500)    27 (24.2500)     97               0.4158        1
 5           24 (27.7500)    13 (9.2500)      37               2.0270        1
 6           20 (19.5000)     6 (6.5000)      26               0.0513        1
 7           32 (33.7500)    13 (11.2500)     45               0.3630        1
 8           44 (39.7500)     9 (13.2500)     53               1.8176        1
 9           50 (48.0000)    14 (16.0000)     64               0.3333        1
10           44 (46.5000)    18 (15.5000)     62               0.5376        1

Total of chi-squares                                           7.1899       10

Chi-square of totals
(i.e., pooled chi-square):
             355 (358.5000)  123 (119.5000)  478               0.1367        1

Difference, total - pooled
(heterogeneity chi-square):                                    7.0532        9   (0.50 < P < 0.75)

Page 38:

Ex 22.6: Heterogeneity Chi-Square. Expected frequencies are in parentheses.

Sample   Right-handed   Left-handed    N     Chi-square   DF
1         3 (7.0000)    11 (7.0000)    14    4.5714*       1
2         4 (8.0000)    12 (8.0000)    16    4.0000*       1
3         5 (10.0000)   15 (10.0000)   20    5.0000*       1
4        14 (9.0000)     4 (9.0000)    18    5.5556*       1
5        13 (8.5000)     4 (8.5000)    17    4.7647*       1
6        17 (11.0000)    5 (11.0000)   22    6.5455*       1

* Statistically significant.

Total of chi-squares                         30.4372        6

Chi-square of totals
(i.e., pooled chi-square):
         56 (53.5000)   51 (53.5000)   107   0.2336         1

Difference, total - pooled
(heterogeneity chi-square):                  30.2036*       5   (* P < 0.001)

Page 39:

II. SC Exchanges in Lymphocytes

Table 4. Distribution of exchanges between chromosomes

Chromosome   Total Length   Relative (Proportional) Length   Observed Exchanges
1             18.16         .08712                            266
2             16.90         .08107                            296
3             14.20         .06812                            186
4-5           25.36         .12165                            442
6-12-X        78.06         .37446                            888
13-15         21.00         .10074                            255
16-18         18.28         .08769                            125
19-20          8.50         .04078                             26
21-22-Y        8.00         .03838                             23
Total        208.46        1.00001                           2507

TEST: H0: Exchanges are proportional to the length of the chromosome.
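The proportional-to-length H0 can be tested with the goodness-of-fit formula from the Summary slide; an illustrative Python sketch using the Table 4 data (not a worked example from the slides):

```python
lengths = {"1": 18.16, "2": 16.90, "3": 14.20, "4-5": 25.36, "6-12-X": 78.06,
           "13-15": 21.00, "16-18": 18.28, "19-20": 8.50, "21-22-Y": 8.00}
observed = {"1": 266, "2": 296, "3": 186, "4-5": 442, "6-12-X": 888,
            "13-15": 255, "16-18": 125, "19-20": 26, "21-22-Y": 23}

total_len = sum(lengths.values())   # 208.46
n = sum(observed.values())          # 2507 exchanges in all

# Under H0, each group's expected count is n times its share of total length.
chi2_stat = sum((observed[c] - n * lengths[c] / total_len) ** 2
                / (n * lengths[c] / total_len) for c in lengths)
df = len(lengths) - 1               # 9 groups - 1 = 8
```

The statistic far exceeds the 0.05 critical value for 8 DF (15.507), so proportionality to length would be rejected.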

Problem Set 1: