stat2802-3902_chapter_6


Upload: kenneth-wong

Post on 12-Dec-2015


Chapter 6

Supplemental Materials on Critical Regions and p-values with Skew Distributions

6.1 Tests on Normal Variances

6.1.1 One-sample $\chi^2$ test

The issue: Suppose that we want to test the null hypothesis $H_0\colon \sigma^2 = \sigma_0^2$ against one of the alternatives $H_1\colon \sigma^2 \neq \sigma_0^2$, $H_2\colon \sigma^2 > \sigma_0^2$, or $H_3\colon \sigma^2 < \sigma_0^2$ on the basis of a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are unknown.

(a) The critical region approach

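Step 1: To Find a Test Statistic. Note that the distribution of

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

is $\chi^2(n-1)$, which does not depend on the mean $\mu$ or the variance $\sigma^2$. In other words, $\chi^2$ is a pivotal quantity. The test statistic is

$$\chi_1^2 \,\hat{=}\, \frac{(n-1)S^2}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma^2} \cdot \frac{\sigma^2}{\sigma_0^2} = \chi^2 \cdot \frac{\sigma^2}{\sigma_0^2}.$$

When $H_0$ is true, i.e., $\sigma^2 = \sigma_0^2$, the test statistic is

$$\chi_1^2 = \chi^2 \sim \chi^2(n-1). \tag{6.1}$$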


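Step 2: To Determine a Critical Region of Size $\alpha$. Here $\chi^2(\alpha, \nu)$ denotes the upper $\alpha$ point of $\chi^2(\nu)$, i.e., $\Pr\{\chi^2(\nu) > \chi^2(\alpha, \nu)\} = \alpha$. Since

$$\alpha = \alpha_1 + \alpha_2 = \Pr\{\chi_1^2 \le k_1\} + \Pr\{\chi_1^2 > k_2\}, \quad \text{for } H_1, \tag{6.2}$$

$$\alpha = \Pr\{\chi_1^2 > \chi^2(\alpha, n-1)\}, \quad \text{for } H_2,$$

$$\alpha = \Pr\{\chi_1^2 \le \chi^2(1-\alpha, n-1)\}, \quad \text{for } H_3,$$

where $k_1 = \chi^2(1-\alpha_1, n-1)$ and $k_2 = \chi^2(\alpha_2, n-1)$, the critical regions for the three alternatives $H_1$, $H_2$ and $H_3$ are given by

$$C_1 = \{x\colon \chi^2_{1,\text{obs}} \le k_1 \ \text{or} \ \chi^2_{1,\text{obs}} > k_2\}, \tag{6.3}$$

$$C_2 = \{x\colon \chi^2_{1,\text{obs}} > \chi^2(\alpha, n-1)\},$$

$$C_3 = \{x\colon \chi^2_{1,\text{obs}} \le \chi^2(1-\alpha, n-1)\},$$

where

$$\chi^2_{1,\text{obs}} = \frac{(n-1)s^2}{\sigma_0^2} \tag{6.4}$$

denotes the observed value of the test statistic $\chi_1^2$.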
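As a minimal sketch of Step 2 in R, the one-sided cut-offs can be obtained from `qchisq`. The helper name `chisq.var.stat` and the inputs (n = 25, s² = 1600, σ0² = 900, α = 0.05) are illustrative assumptions. Note that `qchisq(p, df)` returns a lower-tail quantile, so the upper α point χ²(α, n − 1) used in the text corresponds to `qchisq(1 - alpha, n - 1)`.

```r
# Hypothetical helper: observed value of the test statistic, as in (6.4)
chisq.var.stat <- function(s2, n, sigma0.sq) (n - 1) * s2 / sigma0.sq

n <- 25; s2 <- 1600; sigma0.sq <- 900; alpha <- 0.05   # illustrative inputs
chi1.obs <- chisq.var.stat(s2, n, sigma0.sq)

# Upper alpha point chi^2(alpha, n - 1): boundary of C2 (alternative H2)
crit.H2 <- qchisq(1 - alpha, df = n - 1)
# Upper (1 - alpha) point chi^2(1 - alpha, n - 1): boundary of C3 (alternative H3)
crit.H3 <- qchisq(alpha, df = n - 1)

reject.H2 <- chi1.obs > crit.H2     # right-tailed test
reject.H3 <- chi1.obs <= crit.H3    # left-tailed test
```

The two-sided region C1 needs the pair (k1, k2), which depends on how α is split between the tails.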

Step 3: To Determine $k_1$ and $k_2$. Almost all textbooks simply use an equal-tail approach by letting $\alpha_1 = \alpha_2 = \alpha/2$, so that

$$k_1 = \chi^2(1-\alpha/2, n-1) \quad \text{and} \quad k_2 = \chi^2(\alpha/2, n-1). \tag{6.5}$$

However, for an asymmetric density such as this one, the equal-tail choice leading to (6.5) does not give the likelihood ratio test.
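For comparison, the equal-tail cut-offs in (6.5) can be computed directly in R. This is only a sketch; the sample size n = 25 is an assumed illustrative value. It also evaluates the density at both cut-offs to show that the two heights differ when the density is asymmetric.

```r
alpha <- 0.05
n <- 25                                      # illustrative sample size
v <- n - 1                                   # degrees of freedom

k1.equal.tail <- qchisq(alpha / 2, v)        # chi^2(1 - alpha/2, n - 1) in the text's notation
k2.equal.tail <- qchisq(1 - alpha / 2, v)    # chi^2(alpha/2, n - 1)

# Density heights at the two cut-offs; unequal for an asymmetric density
heights <- dchisq(c(k1.equal.tail, k2.equal.tail), v)
```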

In what follows, we present a simple method for determining $k_1$ and $k_2$ (see Figure 6.1). We denote the density function of $\chi^2(n-1)$ by

$$g_{n-1}(x) = \frac{2^{-(n-1)/2}}{\Gamma((n-1)/2)}\, x^{(n-1)/2-1} e^{-x/2}, \quad x > 0,$$

whose mode is $n-3$ (for $n > 3$). The equal-height approach requires finding $k_1$ and $k_2$ that satisfy (see Figure 6.1)

$$g_{n-1}(k_1) = g_{n-1}(k_2),$$

subject to $k_1 < n-3 < k_2$. Note that

$$k_1 = \chi^2(1-\alpha_1, n-1) \quad \text{and} \quad k_2 = \chi^2(\alpha_2, n-1).$$


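[Figure 6.1: density of $\chi^2(n-1)$ with left-tail area $\alpha_1$ below $k_1$, central area $1-\alpha$, right-tail area $\alpha_2$ above $k_2$, and the equal heights $g_{n-1}(k_1)$ and $g_{n-1}(k_2)$ marked.]

Figure 6.1 The critical region $C_1$ defined by (6.3) for a two-tailed $\chi^2$ test, where $g_{n-1}(\cdot)$ denotes the density function of $\chi^2(n-1)$, and $\alpha_1 + \alpha_2 = \alpha$.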

That is, finding $k_1$ and $k_2$ is equivalent to finding $\alpha_1$ and $\alpha_2$. Since $\alpha_2 = \alpha - \alpha_1$, we only need to find $\alpha_1$, which satisfies the following equation:

$$g_{n-1}\big(\chi^2(1-\alpha_1, n-1)\big) = g_{n-1}\big(\chi^2(\alpha-\alpha_1, n-1)\big). \tag{6.6}$$

In particular, when $g_{n-1}(\cdot)$ has its peak toward the left and a long right tail, as it does here, we have $0 < \alpha_1 < \alpha/2$. Thus, we can use the grid method or the bisection method to find $\alpha_1$.

Compute.alpha.one.chisq.test <- function(n, alone, al = 0.05)
{
  # n     : the sample size
  # alone : a vector of grid points for alpha_1 over the interval
  #         [0.001, 0.025], e.g., alone = seq(0.025, 0.001, -0.0001)
  # al    : the significance level alpha (fixed at 0.05 in the original listing)
  v   <- n - 1                        # degrees of freedom
  L   <- length(alone)
  k1  <- qchisq(alone, v)             # lower cut-off chi^2(1 - alpha_1, n - 1)
  k2  <- qchisq(1 - al + alone, v)    # upper cut-off chi^2(alpha - alpha_1, n - 1)
  gk1 <- dchisq(k1, v)                # density height at k1
  gk2 <- dchisq(k2, v)                # density height at k2
  error <- abs(gk1 - gk2)             # |g(k1) - g(k2)|; zero at the solution of (6.6)
  # Columns: alpha_1, k1, k2, g(k1), g(k2), error
  result <- matrix(c(alone, k1, k2, gk1, gk2, error), L, 6)
  return(result)
}
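The text also mentions bisection as an alternative to the grid. A minimal sketch of that idea, using R's built-in root finder `uniroot` in place of a hand-written bisection, is given below. The helper name `equal.height.chisq` is hypothetical, and the bracket assumes n > 3, so that the height difference is negative near 0 and positive at α/2.

```r
# Hypothetical helper (not from the source): solve (6.6) for alpha_1 by root finding
equal.height.chisq <- function(n, al = 0.05) {
  v <- n - 1
  # Signed difference of the density heights at the two candidate cut-offs
  height.gap <- function(a1) {
    dchisq(qchisq(a1, v), v) - dchisq(qchisq(1 - al + a1, v), v)
  }
  # Assumed bracket: gap < 0 near 0 and gap > 0 at al / 2
  a1 <- uniroot(height.gap, lower = 1e-8, upper = al / 2, tol = 1e-10)$root
  c(alpha1 = a1, k1 = qchisq(a1, v), k2 = qchisq(1 - al + a1, v))
}

res <- equal.height.chisq(25)   # n = 25 is an illustrative sample size
```

Unlike the grid, the root finder does not require choosing a step size in advance, at the cost of needing a bracket on which the height difference changes sign.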

Example 6.1 Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, and suppose we observe $n = 25$ and $s^2 = 1600$. Test $H_0\colon \sigma^2 = \sigma_0^2 = 900$ against $H_1\colon \sigma^2 \neq \sigma_0^2$ at the 0.05 level of significance.

Solution.

1° Since $\alpha = 0.05$, from Table 6.1 we have $\alpha_1 = 0.0138$, $k_1 = 11.361$ and $k_2 = 37.817$. In this example, the mode of the density of $\chi^2(n-1)$ is $n - 3 = 22$.

Table 6.1 Calculation of $\alpha_1$ from (6.6)

  alpha_1   k_1       k_2      g_{n-1}(k_1)   g_{n-1}(k_2)   Error
  0.0250    12.4012   39.364   0.01323        0.00609        0.0071416
  0.0230    12.2454   39.047   0.01244        0.00653        0.0059176
  0.0210    12.0793   38.751   0.01164        0.00696        0.0046744
  0.0190    11.9010   38.472   0.01080        0.00739        0.0034095
  0.0170    11.7082   38.209   0.00994        0.00782        0.0021199
  0.0150    11.4974   37.960   0.00904        0.00824        0.0008017
  0.0140    11.3840   37.840   0.00858        0.00845        0.0001305
  0.0138    11.3610   37.817   0.00849        0.00849        4.7961 × 10^{-6}

NOTE: Error = $|g_{n-1}(k_1) - g_{n-1}(k_2)|$.

2° Reject $H_0$ if $\chi^2_{1,\text{obs}} \le 11.361$ or $\chi^2_{1,\text{obs}} > 37.817$, where $\chi^2_{1,\text{obs}}$ is defined by (6.4).

3° Calculate $\chi^2_{1,\text{obs}} = 42.667$.

4° Since $\chi^2_{1,\text{obs}} = 42.667 > 37.817$, the null hypothesis is rejected. ‖
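The arithmetic of this example can be checked with a few lines of R. This is a sketch only: the cut-offs are typed in from Table 6.1 rather than recomputed, and the variable names are illustrative.

```r
n <- 25; s2 <- 1600; sigma0.sq <- 900
k1 <- 11.361; k2 <- 37.817              # equal-height cut-offs quoted from Table 6.1

# Observed test statistic, as in (6.4): 24 * 1600 / 900
chi1.obs <- (n - 1) * s2 / sigma0.sq

# Decision rule of the two-sided critical region C1
reject <- (chi1.obs <= k1) || (chi1.obs > k2)

# Size of the region: the two tail areas should add up to about 0.05
size <- pchisq(k1, n - 1) + pchisq(k2, n - 1, lower.tail = FALSE)
```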

(b) The p-value approach


The corresponding p-values can be calculated by

$$\text{p-value} = p_1 + p_2 = \Pr\{\chi_1^2 \le b_1\} + \Pr\{\chi_1^2 > b_2\}, \quad \text{for } H_1, \tag{6.7}$$

$$\text{p-value} = \Pr\{\chi_1^2 > \chi^2_{1,\text{obs}}\}, \quad \text{for } H_2,$$

$$\text{p-value} = \Pr\{\chi_1^2 \le \chi^2_{1,\text{obs}}\}, \quad \text{for } H_3,$$

where $p_1$, $p_2$, $b_1$ and $b_2$ are shown in Figure 6.2, $\chi_1^2$ is specified by (6.1), and $\chi^2_{1,\text{obs}}$, given by (6.4), denotes the observed value of the test statistic $\chi_1^2$. To calculate the two-tailed p-value defined by (6.7), we consider two cases.

Case I: $\chi^2_{1,\text{obs}} < n - 3$. In this case, we define $b_1 = \chi^2_{1,\text{obs}}$, as shown in Figure 6.2(i). The value of $b_2$ can be obtained by solving

$$g_{n-1}(b_2) = g_{n-1}(b_1) \quad \text{subject to } b_2 > n - 3. \tag{6.8}$$

Case II: $\chi^2_{1,\text{obs}} > n - 3$. In this case, we define $b_2 = \chi^2_{1,\text{obs}}$, as shown in Figure 6.2(ii). The value of $b_1$ can be determined by

$$g_{n-1}(b_1) = g_{n-1}(b_2) \quad \text{subject to } 0 < b_1 < n - 3. \tag{6.9}$$

Example 6.1 (Revisited). Note that $n = 25$ and $\chi^2_{1,\text{obs}} = 42.667 > 22 = n - 3$, so we are in Case II: $b_2 = \chi^2_{1,\text{obs}} = 42.667$ and $g_{n-1}(b_2) = g_{24}(42.667) = 0.0028345$. Now (6.9) becomes $g_{24}(b_1) = 0.0028345$.

Table 6.2 Calculation of $b_1$ from $g_{24}(b_1) = 0.0028345$

  b_1    |g_24(b_1) − 0.0028345|     b_1      |g_24(b_1) − 0.0028345|
  11     4.2971 × 10^{-3}            9.6      3.7808 × 10^{-4}
  10     1.2866 × 10^{-3}            9.5      1.7535 × 10^{-4}
  9.9    1.0444 × 10^{-3}            9.4      1.8020 × 10^{-5}
  9.8    8.1244 × 10^{-4}            9.41     9.0208 × 10^{-7}
  9.7    5.9038 × 10^{-4}            9.4095   4.6199 × 10^{-8}

From Table 6.2, we obtain $b_1 = 9.4095 < 22$. From (6.7),

$$\begin{aligned}
\text{p-value} &= p_1 + p_2 \\
&= \Pr\{\chi^2(n-1) \le b_1\} + \Pr\{\chi^2(n-1) > b_2\} \\
&= \Pr\{\chi^2(24) \le 9.4095\} + \Pr\{\chi^2(24) > 42.667\} \\
&= 0.0034162 + 0.010854 = 0.01427 < 0.05,
\end{aligned}$$

so $H_0$ is rejected at the 0.05 level. ‖
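The two cases (6.8) and (6.9) can be combined in one small R function. The sketch below is an assumption-laden illustration rather than the chapter's own code: the name `equal.height.pvalue.chisq` is hypothetical, `uniroot` stands in for the grid search of Table 6.2, and n > 3 is assumed so that the density has an interior mode.

```r
# Hypothetical helper: two-tailed equal-height p-value of (6.7)
equal.height.pvalue.chisq <- function(chi1.obs, n) {
  v <- n - 1
  mode <- v - 2                          # mode of chi^2(n - 1), i.e. n - 3
  target <- dchisq(chi1.obs, v)          # common density height
  gap <- function(x) dchisq(x, v) - target
  if (chi1.obs > mode) {                 # Case II: observed value in the right tail
    b2 <- chi1.obs
    b1 <- uniroot(gap, lower = 1e-10, upper = mode, tol = 1e-10)$root
  } else {                               # Case I: observed value in the left tail
    b1 <- chi1.obs
    upper <- mode + 1
    while (gap(upper) > 0) upper <- 2 * upper   # widen until the height drops below target
    b2 <- uniroot(gap, lower = mode, upper = upper, tol = 1e-10)$root
  }
  c(b1 = b1, b2 = b2,
    p.value = pchisq(b1, v) + pchisq(b2, v, lower.tail = FALSE))
}

out <- equal.height.pvalue.chisq(24 * 1600 / 900, n = 25)   # data of Example 6.1
```

With the data of Example 6.1, the result can be compared against $b_1 = 9.4095$ and the p-value 0.01427 reported above.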


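[Figure 6.2: two panels of the $\chi^2(n-1)$ density. Panel (i), $\chi^2_{1,\text{obs}} < n - 3$: left-tail area $p_1$ below $b_1 = \chi^2_{1,\text{obs}}$ and right-tail area $p_2$ above $b_2$. Panel (ii), $\chi^2_{1,\text{obs}} > n - 3$: left-tail area $p_1$ below $b_1$ and right-tail area $p_2$ above $b_2 = \chi^2_{1,\text{obs}}$. In both panels the equal heights $g_{n-1}(b_1)$ and $g_{n-1}(b_2)$ are marked.]

Figure 6.2 The p-value defined by (6.7) for a two-tailed $\chi^2$ test, where $g_{n-1}(\cdot)$ denotes the density of $\chi^2(n-1)$ with mode $n - 3$. (i) $\chi^2_{1,\text{obs}}$, given by (6.4), is in the left tail; (ii) $\chi^2_{1,\text{obs}}$ is in the right tail.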


6.1.2 Two-sample F test

The issue: Suppose that we want to test the null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2$ against one of the alternatives $H_1\colon \sigma_1^2 \neq \sigma_2^2$, $H_2\colon \sigma_1^2 > \sigma_2^2$, or $H_3\colon \sigma_1^2 < \sigma_2^2$ on the basis of two independent random samples $X_{i1}, \ldots, X_{in_i} \overset{\text{iid}}{\sim} N(\mu_i, \sigma_i^2)$, where $\mu_i$ and $\sigma_i^2$ are unknown, $i = 1, 2$.

(a) The critical region approach

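Step 1: To Find a Test Statistic. Define $\nu_i = n_i - 1$, $i = 1, 2$. Note that the distribution of

$$F = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

is $F(\nu_1, \nu_2)$, which does not depend on the means $\mu_i$ or the variances $\sigma_i^2$. In other words, $F$ is a pivotal quantity. The test statistic is

$$F_0 \,\hat{=}\, \frac{S_1^2}{S_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \cdot \frac{\sigma_1^2}{\sigma_2^2} = F \cdot \frac{\sigma_1^2}{\sigma_2^2}.$$

When $H_0$ is true, i.e., $\sigma_1^2 = \sigma_2^2$, the test statistic is

$$F_0 = F \sim F(\nu_1, \nu_2). \tag{6.10}$$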

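Step 2: To Determine a Critical Region of Size $\alpha$. Here $f(\alpha, \nu_1, \nu_2)$ denotes the upper $\alpha$ point of $F(\nu_1, \nu_2)$. Since

$$\alpha = \alpha_1 + \alpha_2 = \Pr\{F_0 \le k_1\} + \Pr\{F_0 > k_2\}, \quad \text{for } H_1, \tag{6.11}$$

$$\alpha = \Pr\{F_0 > f(\alpha, \nu_1, \nu_2)\}, \quad \text{for } H_2,$$

$$\alpha = \Pr\{F_0 \le f(1-\alpha, \nu_1, \nu_2)\}, \quad \text{for } H_3,$$

where $k_1 = f(1-\alpha_1, \nu_1, \nu_2)$ and $k_2 = f(\alpha_2, \nu_1, \nu_2)$, the critical regions for the three alternatives $H_1$, $H_2$ and $H_3$ are given by

$$C_1 = \{x\colon f_0 \le k_1 \ \text{or} \ f_0 > k_2\}, \tag{6.12}$$

$$C_2 = \{x\colon f_0 > f(\alpha, \nu_1, \nu_2)\},$$

$$C_3 = \{x\colon f_0 \le f(1-\alpha, \nu_1, \nu_2)\},$$

where the observed value of the test statistic $F_0$ is denoted by

$$f_0 = s_1^2 / s_2^2. \tag{6.13}$$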
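A short R sketch of the one-sided cut-offs for the F test follows. The sample sizes n1 = 13 and n2 = 7 are assumed illustrative values. As with `qchisq`, `qf(p, df1, df2)` is a lower-tail quantile, so the upper α point $f(\alpha, \nu_1, \nu_2)$ corresponds to `qf(1 - alpha, v1, v2)`.

```r
n1 <- 13; n2 <- 7; alpha <- 0.05       # illustrative inputs
v1 <- n1 - 1; v2 <- n2 - 1             # degrees of freedom nu_1, nu_2

crit.H2 <- qf(1 - alpha, v1, v2)       # f(alpha, nu_1, nu_2): boundary of C2
crit.H3 <- qf(alpha, v1, v2)           # f(1 - alpha, nu_1, nu_2): boundary of C3

# Reciprocal identity f(1 - alpha, nu_1, nu_2) = 1 / f(alpha, nu_2, nu_1),
# handy when a printed table lists upper points only
crit.H3.alt <- 1 / qf(1 - alpha, v2, v1)
```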


Step 3: To Determine $k_1$ and $k_2$. Almost all textbooks simply use an equal-tail approach by letting $\alpha_1 = \alpha_2 = \alpha/2$, so that

$$k_1 = f(1-\alpha/2, \nu_1, \nu_2) \quad \text{and} \quad k_2 = f(\alpha/2, \nu_1, \nu_2). \tag{6.14}$$

However, for an asymmetric density such as this one, the equal-tail choice leading to (6.14) does not give the likelihood ratio test.
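For comparison, the equal-tail cut-offs in (6.14) can be computed as follows. This is a sketch with assumed degrees of freedom (12, 6); it also evaluates the F density at both cut-offs, where the heights turn out to be far from equal.

```r
alpha <- 0.05
v1 <- 12; v2 <- 6                            # illustrative degrees of freedom

k1.equal.tail <- qf(alpha / 2, v1, v2)       # f(1 - alpha/2, nu_1, nu_2)
k2.equal.tail <- qf(1 - alpha / 2, v1, v2)   # f(alpha/2, nu_1, nu_2)

# Density heights at the two cut-offs
heights <- df(c(k1.equal.tail, k2.equal.tail), v1, v2)
```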

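In what follows, we present a simple method for determining $k_1$ and $k_2$ (see Figure 6.3). We denote the density function of $F(\nu_1, \nu_2)$ by

$$h_{\nu_1,\nu_2}(x) = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2}-1} \left(1 + \frac{\nu_1 x}{\nu_2}\right)^{-\frac{\nu_1+\nu_2}{2}}, \quad x > 0,$$

whose mode is $\frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)}$ (for $\nu_1 > 2$). The equal-height approach requires finding $k_1$ and $k_2$ that satisfy (see Figure 6.3)

$$h_{\nu_1,\nu_2}(k_1) = h_{\nu_1,\nu_2}(k_2)$$

subject to $k_1 < \frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)} < k_2$. Note that

$$k_1 = f(1-\alpha_1, \nu_1, \nu_2) \quad \text{and} \quad k_2 = f(\alpha_2, \nu_1, \nu_2).$$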

That is, finding $k_1$ and $k_2$ is equivalent to finding $\alpha_1$ and $\alpha_2$. Since $\alpha_2 = \alpha - \alpha_1$, we only need to find $\alpha_1$, which satisfies the following equation:

$$h_{\nu_1,\nu_2}\big(f(1-\alpha_1, \nu_1, \nu_2)\big) = h_{\nu_1,\nu_2}\big(f(\alpha-\alpha_1, \nu_1, \nu_2)\big). \tag{6.15}$$

In particular, when $h_{\nu_1,\nu_2}(\cdot)$ has its peak toward the left and a long right tail, we have $0 < \alpha_1 < \alpha/2$. Thus, we can use the grid method or the bisection method to find $\alpha_1$.

Compute.alpha.one.F.test <- function(n1, n2, alone, al = 0.05)
{
  # n1, n2 : the sample sizes of the two samples
  # alone  : a vector of grid points for alpha_1 over the interval
  #          [0.0001, 0.025], e.g., seq(0.025, 0.0001, -0.0001)
  # al     : the significance level alpha (fixed at 0.05 in the original listing)
  v1  <- n1 - 1                       # numerator degrees of freedom
  v2  <- n2 - 1                       # denominator degrees of freedom
  L   <- length(alone)
  k1  <- qf(alone, v1, v2)            # lower cut-off f(1 - alpha_1, v1, v2)
  k2  <- qf(1 - al + alone, v1, v2)   # upper cut-off f(alpha - alpha_1, v1, v2)
  hk1 <- df(k1, v1, v2)               # density height at k1
  hk2 <- df(k2, v1, v2)               # density height at k2
  error <- abs(hk1 - hk2)             # |h(k1) - h(k2)|; zero at the solution of (6.15)
  # Columns: alpha_1, k1, k2, h(k1), h(k2), error
  result <- matrix(c(alone, k1, k2, hk1, hk2, error), L, 6)
  return(result)
}
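As with the χ² case, a root finder can replace the grid. The sketch below uses `uniroot` as a stand-in for a hand-written bisection; the helper name `equal.height.f` is hypothetical, and the bracket assumes ν1 > 2 so that the height difference is negative near 0 and positive at α/2. A tight tolerance is set explicitly because the solution can be very small compared with `uniroot`'s default tolerance.

```r
# Hypothetical helper (not from the source): solve (6.15) for alpha_1 by root finding
equal.height.f <- function(n1, n2, al = 0.05) {
  v1 <- n1 - 1
  v2 <- n2 - 1
  # Signed difference of the density heights at the two candidate cut-offs
  height.gap <- function(a1) {
    df(qf(a1, v1, v2), v1, v2) - df(qf(1 - al + a1, v1, v2), v1, v2)
  }
  # Assumed bracket: gap < 0 near 0 and gap > 0 at al / 2
  a1 <- uniroot(height.gap, lower = 1e-10, upper = al / 2, tol = 1e-12)$root
  c(alpha1 = a1, k1 = qf(a1, v1, v2), k2 = qf(1 - al + a1, v1, v2))
}

res.f <- equal.height.f(13, 7)   # illustrative sample sizes n1 = 13, n2 = 7
```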

Example 6.2 Let $X_{i1}, \ldots, X_{in_i}$, $i = 1, 2$, be two independent random samples from $N(\mu_i, \sigma_i^2)$, and suppose we observe $n_1 = 13$, $s_1^2 = 37853.17$, $n_2 = 7$, $s_2^2 = 15037.00$. Test $H_0\colon \sigma_1^2 = \sigma_2^2$ against $H_1\colon \sigma_1^2 \neq \sigma_2^2$ at $\alpha = 0.05$.

Solution.

1° Since $\alpha = 0.05$, from Table 6.3 we have $\alpha_1 = 0.000657$, $k_1 = 0.10889$ and $k_2 = 4.0233$. In this example, the mode of the density of $F(12, 6)$ is $\frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)} = \frac{10 \times 6}{12 \times 8} = 0.625$.

Table 6.3 Calculation of $\alpha_1$ from (6.15)

  alpha_1    k_1       k_2      h(k_1)    h(k_2)     Error
  0.025      0.26822   5.3662   0.31283   0.011359   0.30147
  0.015      0.23158   4.6648   0.23300   0.017738   0.21526
  0.005      0.17370   4.1885   0.11615   0.024730   0.09142
  0.0005     0.10269   4.0176   0.02286   0.028046   0.00518
  0.0006     0.10678   4.0212   0.02614   0.027971   0.00182
  0.00065    0.10864   4.0230   0.02772   0.027934   0.00021
  0.000657   0.10889   4.0233   0.02794   0.027928   1.4163 × 10^{-5}

NOTE: $h(\cdot) = h_{\nu_1,\nu_2}(\cdot)$ and Error = $|h_{\nu_1,\nu_2}(k_1) - h_{\nu_1,\nu_2}(k_2)|$.

2° Reject $H_0$ if $f_0 \le 0.10889$ or $f_0 > 4.0233$, where $f_0$ is defined by (6.13).

3° Calculate $f_0 = 37853.17 / 15037.00 = 2.5173$.

4° Since $f_0 = 2.5173 \in (0.10889, 4.0233)$, we do not reject $H_0$. ‖
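The steps of this example can be reproduced with a few lines of R. This is a sketch only: the cut-offs are typed in from Table 6.3 rather than recomputed, and the variable names are illustrative.

```r
n1 <- 13; s1.sq <- 37853.17
n2 <- 7;  s2.sq <- 15037.00
k1 <- 0.10889; k2 <- 4.0233            # equal-height cut-offs quoted from Table 6.3

f0 <- s1.sq / s2.sq                    # observed test statistic, as in (6.13)

# Decision rule of the two-sided critical region C1
reject <- (f0 <= k1) || (f0 > k2)

# Size of the region: the two tail areas should add up to about 0.05
size <- pf(k1, n1 - 1, n2 - 1) + pf(k2, n1 - 1, n2 - 1, lower.tail = FALSE)
```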


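[Figure 6.3: density of $F(\nu_1, \nu_2)$ with left-tail area $\alpha_1$ below $k_1$, central area $1-\alpha$, right-tail area $\alpha_2$ above $k_2$, and the equal heights $h_{\nu_1,\nu_2}(k_1)$ and $h_{\nu_1,\nu_2}(k_2)$ marked.]

Figure 6.3 The critical region $C_1$ defined by (6.12) for a two-tailed F test, where $h_{\nu_1,\nu_2}(\cdot)$ denotes the density function of $F(\nu_1, \nu_2)$, and $\alpha_1 + \alpha_2 = \alpha$.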

(b) The p-value approach

The corresponding p-values can be calculated by

$$\text{p-value} = p_1 + p_2 = \Pr\{F_0 \le b_1\} + \Pr\{F_0 > b_2\}, \quad \text{for } H_1, \tag{6.16}$$

$$\text{p-value} = \Pr\{F_0 > f_0\}, \quad \text{for } H_2,$$

$$\text{p-value} = \Pr\{F_0 \le f_0\}, \quad \text{for } H_3,$$

where $p_1$, $p_2$, $b_1$ and $b_2$ are shown in Figure 6.4, $F_0$ is specified by (6.10), and $f_0$, given by (6.13), denotes the observed value of the test statistic $F_0$. To calculate the two-tailed p-value defined by (6.16), we consider two cases.

Case I: $f_0$ is in the left tail, so $b_1 = f_0$. The value of $b_2$ in the right tail can be obtained by solving (see Figure 6.4(i))

$$h_{\nu_1,\nu_2}(b_2) = h_{\nu_1,\nu_2}(f_0) \quad \text{subject to } b_2 > \frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)}. \tag{6.17}$$

Case II: $f_0$ is in the right tail, so $b_2 = f_0$. The value of $b_1$ in the left tail can be determined by (see Figure 6.4(ii))

$$h_{\nu_1,\nu_2}(b_1) = h_{\nu_1,\nu_2}(f_0) \quad \text{subject to } 0 < b_1 < \frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)}. \tag{6.18}$$


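[Figure 6.4: two panels of the $F(\nu_1, \nu_2)$ density. Panel (i), $f_0$ in the left tail: left-tail area $p_1$ below $f_0$ and right-tail area $p_2$ above $b_2$, with equal heights $h_{\nu_1,\nu_2}(f_0)$ and $h_{\nu_1,\nu_2}(b_2)$. Panel (ii), $f_0$ in the right tail: left-tail area $p_1$ below $b_1$ and right-tail area $p_2$ above $f_0$, with equal heights $h_{\nu_1,\nu_2}(b_1)$ and $h_{\nu_1,\nu_2}(f_0)$.]

Figure 6.4 The p-value defined by (6.16) for a two-tailed F test, where $h_{\nu_1,\nu_2}(\cdot)$ denotes the density of $F(\nu_1, \nu_2)$ with mode $\frac{(\nu_1-2)\nu_2}{\nu_1(\nu_2+2)}$. (i) $f_0$, given by (6.13), is in the left tail; (ii) $f_0$ is in the right tail.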


Example 6.2 (Revisited). Note that $n_1 = 13$, $n_2 = 7$ and $f_0 = 2.5173 > 0.625 = \text{mode}$, so we are in Case II: $b_2 = f_0$ and $h_{\nu_1,\nu_2}(f_0) = h_{12,6}(2.5173) = 0.10241$. Now (6.18) becomes $h_{\nu_1,\nu_2}(b_1) = 0.10241$.

Table 6.4 Calculation of $b_1$ from $h_{\nu_1,\nu_2}(b_1) = 0.10241$

  b_1     |h(b_1) − 0.10241|     b_1      |h(b_1) − 0.10241|
  0.625   0.59141                0.170    0.007185
  0.325   0.32767                0.168    0.003705
  0.200   0.06411                0.166    0.000271
  0.190   0.04426                0.1659   1.0091 × 10^{-4}
  0.180   0.02523                0.1658   6.9435 × 10^{-5}

NOTE: $h(\cdot) = h_{\nu_1,\nu_2}(\cdot)$ with $(\nu_1, \nu_2) = (12, 6)$.

From Table 6.4, we obtain $b_1 = 0.1658 < 0.625$. From (6.16),

$$\begin{aligned}
\text{p-value} &= p_1 + p_2 \\
&= \Pr\{F_0 \le b_1\} + \Pr\{F_0 > b_2\} \\
&= \Pr\{F_0 \le b_1\} + \Pr\{F_0 > f_0\} \\
&= \Pr\{F(12, 6) \le 0.1658\} + \Pr\{F(12, 6) > 2.5173\} \\
&= 0.0041373 + (1 - 0.86694) = 0.1372 > 0.05,
\end{aligned}$$

so $H_0$ cannot be rejected at the 0.05 level. ‖
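The two cases (6.17) and (6.18) can also be wrapped in one R function. This is a hedged sketch, not the chapter's own code: the name `equal.height.pvalue.f` is hypothetical, `uniroot` replaces the grid search of Table 6.4, and ν1 > 2 is assumed so that the F density has an interior mode.

```r
# Hypothetical helper: two-tailed equal-height p-value of (6.16)
equal.height.pvalue.f <- function(f0, n1, n2) {
  v1 <- n1 - 1
  v2 <- n2 - 1
  mode <- (v1 - 2) * v2 / (v1 * (v2 + 2))   # mode of F(v1, v2), assuming v1 > 2
  target <- df(f0, v1, v2)                  # common density height
  gap <- function(x) df(x, v1, v2) - target
  if (f0 > mode) {                          # Case II: f0 in the right tail
    b2 <- f0
    b1 <- uniroot(gap, lower = 1e-10, upper = mode, tol = 1e-10)$root
  } else {                                  # Case I: f0 in the left tail
    b1 <- f0
    upper <- mode + 1
    while (gap(upper) > 0) upper <- 2 * upper   # widen until the height drops below target
    b2 <- uniroot(gap, lower = mode, upper = upper, tol = 1e-10)$root
  }
  c(b1 = b1, b2 = b2,
    p.value = pf(b1, v1, v2) + pf(b2, v1, v2, lower.tail = FALSE))
}

out.f <- equal.height.pvalue.f(37853.17 / 15037.00, n1 = 13, n2 = 7)   # data of Example 6.2
```

With the data of Example 6.2, the result can be compared against $b_1 = 0.1658$ and the p-value 0.1372 obtained above.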