Evaluating the Efficiency of the Δ-Distribution Mean Estimator

Evaluating the Efficiency of the Δ-Distribution Mean Estimator
Author(s): Stephen J. Smith
Source: Biometrics, Vol. 44, No. 2 (Jun., 1988), pp. 485-493
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2531861
Accessed: 28/06/2014 13:22
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org
This content downloaded from 185.31.194.167 on Sat, 28 Jun 2014 13:22:54 PM
All use subject to JSTOR Terms and Conditions



BIOMETRICS 44, 485-493 June 1988

Evaluating the Efficiency of the Δ-Distribution Mean Estimator

Stephen J. Smith

Marine Fish Division, Bedford Institute of Oceanography, P.O. Box 1006, Dartmouth, Nova Scotia B2Y 4A2, Canada

SUMMARY

The Δ-distribution has been suggested as a useful model for the distribution of catch-per-tow data from trawl surveys for fish and plankton (Pennington, 1983, Biometrics 39, 281-286). Previously, the efficiency of the Δ-distribution mean estimator κ̂ with respect to the sample mean had been evaluated by using a large-sample approximation for the variance of κ̂. The exact form of the variance of κ̂ is derived here and used to study the efficiency for the small sample sizes typical of trawl surveys. Actual data from a recent trawl survey are used to assess the expected gains in precision from using κ̂ in practice.

Key words: Δ-distribution; Efficiency; Trawl surveys.

1. Introduction

Stratified mean catch per tow from trawl surveys has been used as an index of relative abundance of commercial fish species on the Atlantic coast of both Canada and the United States (Azarovitz, 1981; Clark, 1981; Halliday and Koeller, 1981; Pitt, Wells, and McKone, 1981). Typically, catch per tow (in numbers) exhibits a skewed distribution with many zero catches and some very large catches. A number of statistical distributions, such as the negative binomial (Taylor, 1953; Houser and Dunn, 1967), the gamma (Smith, 1981), and the log-normal and Poisson (Brodie and Wells, Northwest Atlantic Fisheries Organization Scientific Research Document 85/106, 1985), have been suggested as possible models for this kind of data. However, the gamma and the log-normal are not defined for observations of zero catches, and the discrete distributions given above do poorly in accommodating the large catches when many zero catches are also observed.

Recently, Pennington (1983, 1986) has suggested using the Δ-distribution discussed in Aitchison and Brown (1957) for modelling the distribution of catches from fish and plankton surveys. This distribution is simply a log-normal distribution modified to include observations of zeroes. Pennington's rationale for using this distribution is based on the assumption that the zeroes represent unoccupied areas, perhaps indicating unsuitable habitat, while nonzero catches of fish can be modelled as a positive random variate. In a sense, the occurrence of zero catches is decoupled from the distribution of the number of fish caught.

The statistical advantage of using the Δ-distribution is that the estimator of the mean defined for this distribution is more efficient than the ordinary sample mean currently used to estimate abundance within the strata of the study areas. This was proved by Finney (1941) for the log-normal and extended to the Δ-distribution by Aitchison and Brown (1957). Pennington (1986) has extended the graph of the efficiency of the sample mean and variance, originally given in Aitchison and Brown for σ² ∈ (0, 2) (σ² being the variance of the underlying normal distribution), to σ² ∈ (0, 8). Unfortunately, these efficiency calculations are based on large-sample approximations of the variance of the Δ-distribution estimator of the mean and may therefore present too optimistic a picture for the small-sample situations that are more the rule for trawl surveys.

In this paper the exact form of the variance of the estimator of the mean for the Δ-distribution is derived. This form is then used to calculate the efficiency of the mean estimator, which is compared with the large-sample approximation for sample sizes in the range commonly encountered within strata in trawl surveys. Data from trawl surveys carried out in the Scotian Shelf area are examined to assess the efficiency of the estimator when used in practice.

2. The Δ-Distribution

Let X be a random variate from a population such that the proportion of zero values is represented by δ and the distribution of the nonzero values is log-normal, Λ(μ, σ²). That is,

$$\Pr\{X < 0\} = 0, \qquad \Pr\{X = 0\} = \delta, \qquad \Pr\{X \le x\} = \delta + (1-\delta)\,\Lambda(x \mid \mu, \sigma^2), \quad x > 0.$$

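The distribution of X is denoted as Δ(δ, μ, σ²), with mean and variance given as

$$E[X] = \kappa = (1-\delta)\exp\!\left(\mu + \frac{\sigma^2}{2}\right),$$

$$\operatorname{var}[X] = \nu^2 = (1-\delta)\exp(2\mu + \sigma^2)\left[\exp(\sigma^2) - (1-\delta)\right].$$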
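The two moment formulas translate directly into code. The following Python sketch (standard library only; the names `delta_moments` and `sample_delta` are illustrative, not from the paper) evaluates κ and ν² and draws Δ(δ, μ, σ²) samples by returning zero with probability δ and a log-normal value otherwise, so that the formulas can be checked against simulation.

```python
import math
import random


def delta_moments(delta, mu, sigma2):
    """Return (kappa, nu2): mean and variance of the Delta(delta, mu, sigma^2) distribution."""
    kappa = (1.0 - delta) * math.exp(mu + sigma2 / 2.0)
    nu2 = (1.0 - delta) * math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - (1.0 - delta))
    return kappa, nu2


def sample_delta(n, delta, mu, sigma2, rng):
    """Draw n values: 0 with probability delta, otherwise log-normal with log-mean mu, log-variance sigma2."""
    sigma = math.sqrt(sigma2)
    return [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1988)  # arbitrary seed
    xs = sample_delta(100_000, delta=0.3, mu=0.0, sigma2=0.5, rng=rng)
    kappa, nu2 = delta_moments(0.3, 0.0, 0.5)
    print(f"kappa = {kappa:.4f}, simulated mean = {sum(xs) / len(xs):.4f}")
```

With σ² = 0 the nonzero values are all equal to exp(μ), and the formulas reduce to those of a scaled Bernoulli variable, which gives a quick sanity check.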

For a sample of n observations, let n₀ denote the number of zero values observed and n₁ the number of nonzero values observed, so that n = n₀ + n₁. Further, define yᵢ = ln xᵢ for the n₁ nonzero observations, and let ȳ and s² be, respectively, the sample mean and variance of the y values. The estimators κ̂ and ν̂² of the mean and variance are, respectively (Aitchison and Brown, 1957),

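$$\hat\kappa = \begin{cases} \dfrac{n_1}{n}\exp(\bar y)\, g_{n_1}\!\left(\tfrac{1}{2}s^2\right), & n_1 > 1,\\[6pt] \dfrac{x_1}{n}, & n_1 = 1,\\[6pt] 0, & n_1 = 0,\end{cases}$$

and

$$\hat\nu^2 = \begin{cases} \dfrac{n_1}{n}\exp(2\bar y)\left[g_{n_1}(2s^2) - \dfrac{n_1-1}{n-1}\, g_{n_1}\!\left(\dfrac{n_1-2}{n_1-1}\,s^2\right)\right], & n_1 > 1,\\[6pt] \dfrac{x_1^2}{n}, & n_1 = 1,\\[6pt] 0, & n_1 = 0,\end{cases}$$

where

$$g_{n_1}(t) = 1 + \frac{n_1-1}{n_1}\,t + \sum_{j=2}^{\infty} \frac{(n_1-1)^{2j-1}}{n_1^{\,j}\,(n_1+1)(n_1+3)\cdots(n_1+2j-3)}\,\frac{t^j}{j!}. \qquad (1)$$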


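The function g_{n₁}(t) is defined such that

$$E\left[g_{n_1}(s^2)\right] = \exp(a\sigma^2) \qquad\text{and}\qquad \lim_{n_1 \to \infty} g_{n_1}(s^2) = \exp(\sigma^2),$$

where a = (n₁ − 1)/n₁. The function in (1) belongs to the class of generalized hypergeometric functions and is well known in the log-normal literature (e.g., Bradu and Mundlak, 1970; Ebbeler, 1973; Hoyle, 1968; Mehran, 1973). This function can also be written as

$$\psi_m(t) = \sum_{j=0}^{\infty} \frac{t^{j}}{2^{j}\, j!\; m(m+2)\cdots(m+2j-2)}, \qquad g_{n_1}\!\left(\tfrac{1}{2}s^2\right) = \psi_m(a m s^2), \qquad (2)$$

with m = n₁ − 1, a defined as above, the empty product for j = 0 taken as 1, and E[ψ_m(a m s²)] = exp(aσ²/2).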

This form has certain advantages over that in (1), which will be discussed later.
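Here is a minimal Python sketch of the estimators κ̂ and ν̂². It assumes the series (1) and (2) are summed term by term (each term obtained from the previous one by its ratio) and truncated once the next term falls below a relative tolerance. The truncation rule and all function names are choices made here, not taken from the paper. The sketch also checks numerically that g_{n₁}(s²/2) and ψ_m(a m s²) agree.

```python
import math


def g(n1, t, rel_tol=1e-15):
    """Series (1) for g_{n1}(t) with t >= 0, truncated once a term is negligible."""
    total, term, j = 1.0, (n1 - 1) * t / n1, 1  # total holds the j = 0 term; term is the j = 1 term
    while term > rel_tol * total:
        total += term
        j += 1
        # ratio of term j to term j - 1 is (n1 - 1)^2 t / (n1 (n1 + 2j - 3) j)
        term *= (n1 - 1) ** 2 * t / (n1 * (n1 + 2 * j - 3) * j)
    return total


def psi(m, t, rel_tol=1e-15):
    """Series (2): psi_m(t) = sum_j t^j / (2^j j! m(m+2)...(m+2j-2)) with t >= 0."""
    if t == 0.0:
        return 1.0  # only the j = 0 term survives
    total, term, j = 0.0, 1.0, 0
    while term > rel_tol * total:
        total += term
        j += 1
        term *= t / (2 * j * (m + 2 * j - 2))  # ratio of term j to term j - 1
    return total


def log_summary(x):
    """n1, y-bar and s^2 (divisor n1 - 1) of log(x) over the nonzero values of x."""
    logs = [math.log(v) for v in x if v > 0]
    n1 = len(logs)
    ybar = sum(logs) / n1 if n1 else 0.0
    s2 = sum((y - ybar) ** 2 for y in logs) / (n1 - 1) if n1 > 1 else 0.0
    return n1, ybar, s2


def kappa_hat(x):
    """Estimator of the Delta-distribution mean."""
    n = len(x)
    n1, ybar, s2 = log_summary(x)
    if n1 == 0:
        return 0.0
    if n1 == 1:
        return math.exp(ybar) / n  # x_1 / n
    return n1 / n * math.exp(ybar) * g(n1, s2 / 2.0)


def nu2_hat(x):
    """Estimator of the Delta-distribution variance."""
    n = len(x)
    n1, ybar, s2 = log_summary(x)
    if n1 == 0:
        return 0.0
    if n1 == 1:
        return math.exp(2.0 * ybar) / n  # x_1^2 / n
    return n1 / n * math.exp(2.0 * ybar) * (
        g(n1, 2.0 * s2) - (n1 - 1) / (n - 1) * g(n1, (n1 - 2) / (n1 - 1) * s2))


if __name__ == "__main__":
    n1, s2 = 6, 2.5  # g_{n1}(s^2/2) should equal psi_m(a m s^2) with a m = (n1 - 1)^2 / n1
    print(g(n1, s2 / 2.0), psi(n1 - 1, (n1 - 1) ** 2 / n1 * s2))
    sample = [0.0, 3.1, 0.0, 12.4, 0.7, 5.0]  # made-up catches
    print(kappa_hat(sample), nu2_hat(sample))
```

The two series are term-by-term identical after substitution, so the identity check should agree to rounding error. The sample catches are invented for illustration only.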

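Pennington (1983) has derived an unbiased estimator for var(κ̂), which is given as

$$\widehat{\operatorname{var}}(\hat\kappa) = \begin{cases} \dfrac{n_1}{n}\exp(2\bar y)\left[\dfrac{n_1}{n}\, g_{n_1}^2\!\left(\tfrac{1}{2}s^2\right) - \dfrac{n_1-1}{n-1}\, g_{n_1}\!\left(\dfrac{n_1-2}{n_1-1}\,s^2\right)\right], & n_1 > 1,\\[6pt] \left(\dfrac{x_1}{n}\right)^2, & n_1 = 1,\\[6pt] 0, & n_1 = 0.\end{cases} \qquad (3)$$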
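Estimator (3) can be sketched in the same way. The Python below carries its own copy of the series (1) so that it runs on its own. `var_kappa_hat` is a hypothetical name, and the handling of the n₁ = 0 and n₁ = 1 cases simply follows the three branches of (3).

```python
import math


def g(n1, t, rel_tol=1e-15):
    """Series (1) for g_{n1}(t) with t >= 0, truncated once a term is negligible."""
    total, term, j = 1.0, (n1 - 1) * t / n1, 1
    while term > rel_tol * total:
        total += term
        j += 1
        term *= (n1 - 1) ** 2 * t / (n1 * (n1 + 2 * j - 3) * j)
    return total


def var_kappa_hat(x):
    """Estimator (3) of var(kappa-hat) from one sample x (zeros allowed)."""
    n = len(x)
    logs = [math.log(v) for v in x if v > 0]
    n1 = len(logs)
    if n1 == 0:
        return 0.0
    if n1 == 1:
        return (math.exp(logs[0]) / n) ** 2  # (x_1 / n)^2
    ybar = sum(logs) / n1
    s2 = sum((y - ybar) ** 2 for y in logs) / (n1 - 1)
    return n1 / n * math.exp(2.0 * ybar) * (
        n1 / n * g(n1, s2 / 2.0) ** 2
        - (n1 - 1) / (n - 1) * g(n1, (n1 - 2) / (n1 - 1) * s2))


if __name__ == "__main__":
    print(var_kappa_hat([0.0, 3.1, 0.0, 12.4, 0.7, 5.0]))  # made-up catches
```

When every observation is the same positive value, there is no variability to estimate, and the sketch returns (numerically) zero.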

3. The Exact Variance of κ̂

Since κ̂ is a function of the jointly sufficient statistics ȳ and s², it has been shown that it is also the minimum variance unbiased estimator of its expectation (Aitchison and Brown, 1957). Therefore, var(κ̂) ≤ var(x̄), and this is usually expressed in terms of "efficiency," which is defined as

$$\operatorname{eff}(\bar x) = \frac{\operatorname{var}(\hat\kappa)}{\operatorname{var}(\bar x)} \times 100\%. \qquad (4)$$

The variance of the sample mean is calculated as

$$\operatorname{var}(\bar x) = \frac{\nu^2}{n} = \frac{1-\delta}{n}\exp(2\mu + \sigma^2)\left[\exp(\sigma^2) - (1-\delta)\right].$$

Aitchison and Brown (1957) assumed that for large n and δ appreciably less than 1 the variance of κ̂ could be approximated by

$$\operatorname{var}(\hat\kappa) \approx \frac{\exp(2\mu + \sigma^2)}{n}\left[(1-\delta)\delta + \tfrac{1}{2}(1-\delta)(2\sigma^2 + \sigma^4)\right], \qquad (5)$$

which is itself an approximation of the large-sample form given by Finney (1941) for the variance of the log-normal mean estimator α̂. Owen and DeRouen (1980) based their estimator of the variance of κ̂ on the form in (5) by substituting sample estimates for the parameters. On the basis of simulation studies, they reported that this estimator performed well as an estimator of (5) for sample sizes of 15 or greater with respect to a mean-squared-error criterion.
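For comparison purposes, definition (4), var(x̄) and the large-sample approximation (5) can be coded as below (a sketch; all names are illustrative). Both variances carry the factor exp(2μ + σ²)/n. As a result, the approximate efficiency depends only on δ and σ², and the helper `eff_approx` makes this explicit.

```python
import math


def var_xbar(n, delta, mu, sigma2):
    """var(x-bar) = nu^2 / n for a sample of size n from Delta(delta, mu, sigma^2)."""
    return (1.0 - delta) / n * math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - (1.0 - delta))


def var_kappa_approx(n, delta, mu, sigma2):
    """Large-sample approximation (5) to var(kappa-hat)."""
    return math.exp(2.0 * mu + sigma2) / n * (
        (1.0 - delta) * delta + 0.5 * (1.0 - delta) * (2.0 * sigma2 + sigma2 ** 2))


def efficiency(var_kappa, var_mean):
    """Definition (4): var(kappa-hat) / var(x-bar), in percent."""
    return 100.0 * var_kappa / var_mean


def eff_approx(delta, sigma2, n=10, mu=0.0):
    """Approximate efficiency; n and mu cancel in the ratio."""
    return efficiency(var_kappa_approx(n, delta, mu, sigma2), var_xbar(n, delta, mu, sigma2))


if __name__ == "__main__":
    for sigma2 in (0.5, 1.0, 2.0, 3.0, 4.0):
        print(sigma2, round(eff_approx(0.1, sigma2), 2), round(eff_approx(0.5, sigma2), 2))
```

At σ² = 0 the two variances coincide (efficiency 100%), and the approximate efficiency drops quickly as σ² grows, whatever the sample size.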


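The exact variance of κ̂ is derived from results given in Mehran (1973) for the log-normal mean, α = exp(μ + σ²/2). Extending these results to the Δ-distribution (see the Appendix) gives

$$\operatorname{var}(\hat\kappa) = \alpha^2\left\{\sum_{i=1}^{n}\Pr(n_1 = i)\left(\frac{i}{n}\right)^2\exp\!\left(\frac{\sigma^2}{i}\right)\psi_{i-1}\!\left[\left(\frac{i-1}{i}\right)^2\sigma^4\right] - (1-\delta)^2\right\}, \qquad (6)$$

where Pr(n₁ = i) is the binomial probability

$$\Pr(n_1 = i) = \binom{n}{i}(1-\delta)^{i}\,\delta^{\,n-i},$$

and for i = 1 the argument of ψ is zero, so that only the j = 0 term of (2) remains and ψ₀(0) = 1.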

The form in (2) is preferred to that in (1) here because Hoyle (1968) and Mehran (1973) have shown that the expectation of the square of (2) required in the derivation of the above can be expressed in the same form, whereas the expectation of the square of (1) is usually written as a separate function.
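The exact variance (6) needs only binomial probabilities and the series ψ. The following Python sketch computes var(κ̂) and the exact efficiency (4). The names are hypothetical, ψ is summed until the next term is negligible (a choice made here), and ψ₀(0) = 1 is used for the i = 1 term.

```python
import math


def psi(m, t, rel_tol=1e-15):
    """Series (2): psi_m(t) = sum_j t^j / (2^j j! m(m+2)...(m+2j-2)) with t >= 0."""
    if t == 0.0:
        return 1.0  # covers psi_0(0) = 1 for the i = 1 term of (6)
    total, term, j = 0.0, 1.0, 0
    while term > rel_tol * total:
        total += term
        j += 1
        term *= t / (2 * j * (m + 2 * j - 2))
    return total


def var_kappa_exact(n, delta, mu, sigma2):
    """Exact variance (6) of kappa-hat for a sample of size n from Delta(delta, mu, sigma^2)."""
    p = 1.0 - delta
    total = 0.0
    for i in range(1, n + 1):
        pr = math.comb(n, i) * p ** i * delta ** (n - i)  # binomial Pr(n1 = i)
        total += pr * (i / n) ** 2 * math.exp(sigma2 / i) * psi(i - 1, ((i - 1) / i) ** 2 * sigma2 ** 2)
    return math.exp(2.0 * mu + sigma2) * (total - p ** 2)


def var_xbar(n, delta, mu, sigma2):
    """var(x-bar) = nu^2 / n."""
    return (1.0 - delta) / n * math.exp(2.0 * mu + sigma2) * (math.exp(sigma2) - (1.0 - delta))


def eff_exact(n, delta, sigma2):
    """Exact efficiency (4) in percent; mu cancels, so mu = 0 is used."""
    return 100.0 * var_kappa_exact(n, delta, 0.0, sigma2) / var_xbar(n, delta, 0.0, sigma2)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, round(eff_exact(n, 0.1, 3.0), 1))
```

For n = 2 the computed efficiency is 100% for any δ and σ², which matches the argument that follows.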

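Var(κ̂) and var(x̄) are equivalent when n = 2. To see this, note that the generalized hypergeometric functions in (1) for n₁ = 2 and in (2) for m = 1 reduce to the simple form (Ebbeler, 1973)

$$g_2(t) = \psi_1(t) = \cosh\!\left(\sqrt{t}\right).$$

Therefore, we have from (6) for n₁ = 2,

$$\exp\!\left(\frac{\sigma^2}{2}\right)\psi_1\!\left(\frac{\sigma^4}{4}\right) = \exp\!\left(\frac{\sigma^2}{2}\right)\cosh\!\left(\frac{\sigma^2}{2}\right) = \frac{\exp(\sigma^2) + 1}{2}.$$

Expanding the expression in (6) for n = 2 and substituting the above for n₁ = 2, we obtain

$$\begin{aligned}\operatorname{var}(\hat\kappa) &= \alpha^2\left[\Pr(n_1 = 1)\,\frac{\exp(\sigma^2)}{4} + \Pr(n_1 = 2)\,\frac{\exp(\sigma^2) + 1}{2} - (1-\delta)^2\right]\\ &= \frac{(1-\delta)\alpha^2}{2}\left[\delta\exp(\sigma^2) + (1-\delta)\exp(\sigma^2) + (1-\delta) - 2(1-\delta)\right]\\ &= \frac{(1-\delta)\alpha^2}{2}\left[\exp(\sigma^2) - (1-\delta)\right],\end{aligned}$$

which is equivalent to var(x̄) when n = 2. Therefore, the efficiency in this case is always 100%.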
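Because ψ₁ reduces to a hyperbolic cosine, the n = 2 argument can also be checked numerically without any series. This Python sketch (names are illustrative) evaluates both sides in units of α² = exp(2μ + σ²) over a small grid of δ and σ² values.

```python
import math


def var_kappa_n2(delta, sigma2):
    """(6) expanded for n = 2, in units of alpha^2 = exp(2 mu + sigma^2)."""
    p = 1.0 - delta
    pr1, pr2 = 2.0 * delta * p, p * p  # Pr(n1 = 1), Pr(n1 = 2)
    # n1 = 1 term: (1/2)^2 exp(sigma^2); n1 = 2 term uses psi_1(sigma^4 / 4) = cosh(sigma^2 / 2)
    return (pr1 * math.exp(sigma2) / 4.0
            + pr2 * math.exp(sigma2 / 2.0) * math.cosh(sigma2 / 2.0)
            - p * p)


def var_xbar_n2(delta, sigma2):
    """var(x-bar) for n = 2, in units of alpha^2."""
    p = 1.0 - delta
    return p * (math.exp(sigma2) - p) / 2.0


if __name__ == "__main__":
    worst = max(abs(var_kappa_n2(d, s) / var_xbar_n2(d, s) - 1.0)
                for d in (0.0, 0.1, 0.5, 0.9) for s in (0.1, 1.0, 3.0, 8.0))
    print(f"largest relative difference: {worst:.1e}")
```

Any remaining difference between the two sides comes from floating-point rounding alone.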

4. Efficiency in Small Samples

The efficiency of x̄ with respect to κ̂ was calculated for n = 3, 4, 8, and 15 from (4), using both the approximation (5) and the exact form (6) for var(κ̂). Sample sizes for trawl surveys on the Scotian Shelf typically range from 2 to 8 within strata. The results of these calculations for σ² between 0 and 4 and δ = .1 and .5 are presented in Figures 1-4.

For sample sizes in the range typical of trawl surveys, the approximate efficiency presents a far more optimistic picture than the exact form does. For example, when n = 3 and 4, δ = .1, and σ² = 3, the approximation indicates that var(κ̂) will be on average 58.3% smaller than var(x̄), whereas the exact form shows that it will be only 14% and 24% smaller, respectively (Figs 1 and 2). At σ² = 3, increasing the sample size to 8 results in var(κ̂) being 40.8% smaller (Fig. 3). At n = 15 (Fig. 4) the variance of κ̂ is close to 49% of the variance of the sample mean. Gains in efficiency appear to be negligible in all cases for σ² < 1, and minimal for σ² < 3 for the smaller sample sizes (n = 3 and 4). Increasing δ also reduces the gains in efficiency considerably for the smaller sample sizes and for the larger values of σ².
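A short script can approximate the comparison behind Figures 1-4. It tabulates the exact efficiency from (6) and the large-sample efficiency from (5) for n = 3, 4, 8, 15 and δ = .1, .5. This is a sketch, not the author's program. The σ² grid, the series truncation and the function names are assumptions, and values read off the figures may differ slightly from the computed ones.

```python
import math


def psi(m, t, rel_tol=1e-15):
    """Series (2) for psi_m(t), t >= 0; psi_m(0) = 1."""
    if t == 0.0:
        return 1.0
    total, term, j = 0.0, 1.0, 0
    while term > rel_tol * total:
        total += term
        j += 1
        term *= t / (2 * j * (m + 2 * j - 2))
    return total


def var_ratio_exact(n, delta, sigma2):
    """var(kappa-hat) / alpha^2 from (6)."""
    p = 1.0 - delta
    total = sum(math.comb(n, i) * p ** i * delta ** (n - i) * (i / n) ** 2
                * math.exp(sigma2 / i) * psi(i - 1, ((i - 1) / i) ** 2 * sigma2 ** 2)
                for i in range(1, n + 1))
    return total - p ** 2


def eff_exact(n, delta, sigma2):
    """Exact efficiency (4) in percent; var(x-bar) / alpha^2 = p (exp(sigma^2) - p) / n."""
    p = 1.0 - delta
    return 100.0 * var_ratio_exact(n, delta, sigma2) / (p / n * (math.exp(sigma2) - p))


def eff_approx(delta, sigma2):
    """(5) divided by var(x-bar); the common factor (1 - delta) exp(2 mu + sigma^2) / n cancels."""
    return 100.0 * (delta + sigma2 + sigma2 ** 2 / 2.0) / (math.exp(sigma2) - 1.0 + delta)


if __name__ == "__main__":
    for delta in (0.1, 0.5):
        print(f"delta = {delta}")
        print("sigma2   n=3    n=4    n=8   n=15  approx")
        for sigma2 in (0.5, 1.0, 2.0, 3.0, 4.0):
            row = [eff_exact(n, delta, sigma2) for n in (3, 4, 8, 15)]
            print(f"{sigma2:5.1f}  " + "  ".join(f"{e:5.1f}" for e in row)
                  + f"  {eff_approx(delta, sigma2):5.1f}")
```

In this sketch the exact efficiency falls toward the large-sample value as n grows. This is consistent with the text's point that the approximation is optimistic for small strata.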


[Plot: efficiency against variance in log scale (σ² from 0 to 4), with curves for the exact and approximate forms.]

Figure 1. Comparison of the exact and the approximate forms for the efficiency of κ̂ with respect to x̄ [n = 3; δ = .1 (solid line), .5 (dashed line)].

[Plot: efficiency against variance in log scale (σ² from 0 to 4), with curves labelled Exact and Approximation.]

Figure 2. Comparison of the exact and the approximate forms for the efficiency of κ̂ with respect to x̄ [n = 4; δ = .1 (solid line), .5 (dashed line)].


[Plot: efficiency against variance in log scale (σ² from 0 to 4), with curves labelled Exact and Approximation.]

Figure 3. Comparison of the exact and the approximate forms for the efficiency of κ̂ with respect to x̄ [n = 8; δ = .1 (solid line), .5 (dashed line)].

[Plot: efficiency against variance in log scale (σ² from 0 to 4), with curves for the exact and approximate forms.]

Figure 4. Comparison of the exact and the approximate forms for the efficiency of κ̂ with respect to x̄ [n = 15; δ = .1 (solid line), .5 (dashed line)].


5. Application to Trawl Survey Data

The results in Figures 1-4 illustrate that substantive gains in precision from using κ̂ instead of x̄ depend on n, σ², and δ. Trawl surveys based on a stratified random design have been carried out on the Scotian Shelf since 1970. Small sample sizes within strata are usual, given the large area that is to be covered over a short period of time. The northern part of the surveys includes three strata in the Cape Breton area, which cover the range of what is believed to be a distinct stock of cod (Gadus morhua). Stratified mean catch per tow from these strata is used as an annual index of abundance for this cod stock (Smith and Sinclair, Canadian Atlantic Fisheries Scientific Advisory Committee Research Document 86/39, 1986). Of the 51 strata available (17 years × 3 strata per year), 28 had two or fewer nonzero observations. Of the remaining 23, which had between three and five nonzero observations, only 11 had σ² greater than 1.0 and only five had σ² greater than 2.0. Application of κ̂ to these data would therefore result in minimal gains in precision compared with using the sample mean.

Small sample sizes have also minimized the possible gains in precision from using a stratified random design (Gavaris and Smith, 1987). In March 1986 a new stratified design, with fewer strata and more samples within strata than the previous design, was implemented for part of the Scotian Shelf. Preliminary results from this survey for catches of cod and haddock (Melanogrammus aeglefinus) are presented in Table 1. The efficiency calculations were made by using s² in place of σ². For cases with more than two nonzero observations, it appears that using κ̂ can offer substantial gains in precision over x̄ for this study.

6. Discussion and Conclusions

The results presented here show that although κ̂ is a more efficient estimator than x̄, potential gains in precision must be evaluated with respect to the values of n, σ², and δ. Use of the large-sample approximation to the variance of κ̂ implicit in the efficiency results in Figure 1 of Pennington (1986) will be misleading when evaluating the estimator for the small sample sizes typical of field surveys. This will be especially true when only two nonzero observations are available within a stratum or study area. Increasing the number of samples within strata appears to allow for substantial gains in precision for κ̂ for the examples presented in Table 1.

Table 1
Expected efficiency for Δ-distribution mean estimator for data from 1986 March cruise on Northern Scotian Shelf

                      Cod                               Haddock
Stratum   δ̂      n₁   s²      eff(x̄)     δ̂      n₁   s²      eff(x̄)
401       .43    4    .13     99.99      1.00   0
402       .38    8    4.53    46.99      .54    6    1.80    91.68
403       .13    7    3.90    47.57      .75    2    .46     100.00
404       .00    1                       .00    1
405       .00    2    .00     100.00     .00    2    5.74    100.00
406       .27    8    .94     94.51      .27    8    3.65    52.46
407       .14    6    4.99    40.83      .71    2    .00     100.00
408       .29    10   3.53    49.50      .36    9    5.21    35.13
409       .38    5    15.06   35.64      .88    1    .00     100.00
410       .71    2    .00     100.00     .00    7    2.40    68.19
411       1.00   0                       1.00   0

ACKNOWLEDGEMENTS

The author wishes to thank Mr L. P. Fanning (Marine Fish Division) and Dr C. Field (Dalhousie University) for comments on this manuscript. Mr S. Gavaris (Marine Fish Division) provided useful comments on an earlier draft. Two anonymous referees provided valuable comments on the final draft.

RÉSUMÉ

The delta distribution has been proposed for modelling the distribution of catch-per-tow data from trawl surveys of fish and plankton (Pennington, 1983, Biometrics 39, 281-286). The efficiency of κ̂ (the estimator of the mean defined for this delta distribution) relative to the sample mean was previously evaluated using an approximation to the variance of κ̂ that corresponds to the large-sample case. Here, the exact form of the variance of κ̂ is derived and used to study the efficiency for the small samples typical of trawl surveys. Real data from a recent survey are used to estimate the gain in precision expected when κ̂ is used in practice.

REFERENCES

Aitchison, J. and Brown, J. A. C. (1957). The Log-normal Distribution. Cambridge: Cambridge University Press.

Azarovitz, T. R. (1981). A brief historical review of the Woods Hole Laboratory trawl survey time series. In Bottom Trawl Surveys, W. G. Doubleday and D. Rivard (eds). Canadian Special Publication of Fisheries and Aquatic Sciences 58, 62-67.

Bradu, D. and Mundlak, Y. (1970). Estimation in log-normal linear models. Journal of the American Statistical Association 65, 198-211.

Clark, S. H. (1981). Use of trawl survey data in assessments. In Bottom Trawl Surveys, W. G. Doubleday and D. Rivard (eds). Canadian Special Publication of Fisheries and Aquatic Sciences 58, 82-92.

Ebbeler, D. H. (1973). A note on estimation in log-normal linear models. Journal of Statistical Computation and Simulation 2, 225-231.

Finney, D. (1941). On the distribution of a variate whose logarithm is normally distributed. Journal of the Royal Statistical Society, Supplement 7, 151-161.

Gavaris, S. and Smith, S. J. (1987). Effect of allocation and stratification strategies on the precision of survey abundance estimates for Atlantic cod (Gadus morhua) on the eastern Scotian shelf. Journal of Northwest Atlantic Fisheries Science 7, 137-144.

Halliday, R. G. and Koeller, P. A. (1981). A history of Canadian groundfish trawling surveys and data usage in ICNAF Divisions 4TVWX. In Bottom Trawl Surveys, W. G. Doubleday and D. Rivard (eds). Canadian Special Publication of Fisheries and Aquatic Sciences 58, 27-41.

Houser, A. and Dunn, J. E. (1967). Estimating the size of the thread-fin shad population in Bull Shoal reservoir from midwater trawl catches. Transactions of the American Fisheries Society 96, 176-184.

Hoyle, M. H. (1968). The estimation of variances after using a Gaussianating transformation. The Annals of Mathematical Statistics 39, 1125-1143.

Mehran, F. (1973). Variance of the MVUE for the log-normal mean. Journal of the American Statistical Association 68, 726-727.

Owen, W. J. and DeRouen, T. A. (1980). Estimation of the mean for log-normal data containing zeroes and left-censored values, with application to the measurement of worker exposure to air contaminants. Biometrics 36, 707-719.

Pennington, M. (1983). Efficient estimators of abundance, for fish and plankton surveys. Biometrics 39, 281-286.

Pennington, M. (1986). Some statistical techniques for estimating abundance indices from trawl surveys. Fishery Bulletin 84, 519-525.

Pitt, T. K., Wells, R., and McKone, W. D. (1981). A critique of research vessel otter trawl surveys by the St. John's Research and Resource Services. In Bottom Trawl Surveys, W. G. Doubleday and D. Rivard (eds). Canadian Special Publication of Fisheries and Aquatic Sciences 58, 42-61.


Smith, S. J. (1981). A comparison of estimators of location for skewed populations, with applications to groundfish trawl surveys. In Bottom Trawl Surveys, W. G. Doubleday and D. Rivard (eds). Canadian Special Publication of Fisheries and Aquatic Sciences 58, 154-163.

Taylor, C. C. (1953). Nature of variability in trawl catches. Fishery Bulletin 54, 145-166.

Received December 1986; revised August 1987.

APPENDIX

Since the Δ-distribution is based on the log-normal, we need to obtain the exact form of the variance of the log-normal mean estimator α̂ in order to derive the variance of κ̂. The exact form for var(α̂) has been derived by Mehran (1973) and, for a more general regression case, by Bradu and Mundlak (1970). In the former case the variance is derived as

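$$\operatorname{var}(\hat\alpha) = E[(\hat\alpha - \alpha)^2] = E(\hat\alpha^2) - \alpha^2,$$

where, for a log-normal sample of size n, α̂ = exp(ȳ) g_n(s²/2) = exp(ȳ) ψ_{n−1}[((n − 1)²/n) s²] by (2).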

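The first term on the right-hand side is evaluated, using the independence of ȳ and s², as

$$E(\hat\alpha^2) = E[\exp(2\bar y)]\; E\left\{\psi_{n-1}^2\!\left[\frac{(n-1)^2}{n}\,s^2\right]\right\}.$$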

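It is more straightforward to deal with the series in (2) than with that in (1) because Mehran (1973) has shown that

$$E\left\{\psi_{n-1}^2\!\left[\frac{(n-1)^2}{n}\,s^2\right]\right\} = \exp\!\left(\frac{n-1}{n}\,\sigma^2\right)\psi_{n-1}\!\left[\frac{(n-1)^2}{n^2}\,\sigma^4\right].$$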
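The identity attributed to Mehran (1973) can be spot-checked numerically. The sketch below is an illustration, not Mehran's derivation. It integrates ψ²_{n−1}[((n − 1)²/n)s²] against the chi-square distribution of (n − 1)s²/σ² with Simpson's rule and compares the result with the closed form. The function names, the integration range and the step count are arbitrary choices, and the simple quadrature assumes n ≥ 4 so that the density is continuous at zero.

```python
import math


def psi(m, t, rel_tol=1e-15):
    """Series (2): psi_m(t) = sum_j t^j / (2^j j! m(m+2)...(m+2j-2)) with t >= 0."""
    if t == 0.0:
        return 1.0
    total, term, j = 0.0, 1.0, 0
    while term > rel_tol * total:
        total += term
        j += 1
        term *= t / (2 * j * (m + 2 * j - 2))
    return total


def chi2_pdf(x, df):
    """Chi-square density; returning 0 at x = 0 is correct for df >= 3."""
    if x <= 0.0:
        return 0.0
    return math.exp((df / 2.0 - 1.0) * math.log(x) - x / 2.0
                    - (df / 2.0) * math.log(2.0) - math.lgamma(df / 2.0))


def expected_psi_squared(n, sigma2, upper=200.0, steps=20_000):
    """E{psi_{n-1}^2[((n-1)^2/n) s^2]} by Simpson's rule, with (n-1) s^2 / sigma^2 ~ chi-square(n-1)."""
    m = n - 1
    if m < 3 or steps % 2:
        raise ValueError("needs n >= 4 and an even number of steps")
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h                      # value of the chi-square variable
        s2 = sigma2 * x / m            # corresponding sample variance
        weight = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        total += weight * psi(m, m * m / n * s2) ** 2 * chi2_pdf(x, m)
    return total * h / 3.0


def mehran_closed_form(n, sigma2):
    """exp(((n-1)/n) sigma^2) * psi_{n-1}[((n-1)/n)^2 sigma^4]."""
    m = n - 1
    return math.exp(m / n * sigma2) * psi(m, (m / n) ** 2 * sigma2 ** 2)


if __name__ == "__main__":
    print(expected_psi_squared(5, 1.0), mehran_closed_form(5, 1.0))
```

The upper limit of 200 is generous for moderate σ², because the chi-square density decays much faster than ψ² grows.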

The approach of Bradu and Mundlak requires a separate function to give the expected value of the square of the expression in (1). Therefore, since

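$$E[\exp(2\bar y)] = \exp\!\left(2\mu + \frac{2\sigma^2}{n}\right) = \alpha^2\exp\!\left(\frac{2-n}{n}\,\sigma^2\right),$$

we have

$$\operatorname{var}(\hat\alpha) = \alpha^2\left\{\exp\!\left(\frac{\sigma^2}{n}\right)\psi_{n-1}\!\left[\frac{(n-1)^2}{n^2}\,\sigma^4\right] - 1\right\}.$$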

The variance of κ̂ is just an extension of the above to the Δ-distribution case, such that

$$\operatorname{var}(\hat\kappa) = E[(\hat\kappa - \kappa)^2] = E(\hat\kappa^2) - \kappa^2.$$

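The expected value of κ̂² is evaluated as

$$\begin{aligned} E(\hat\kappa^2) &= \sum_{i=0}^{n}\Pr(n_1 = i)\,E(\hat\kappa^2 \mid n_1 = i)\\ &= 0 + \Pr(n_1 = 1)\,\frac{\exp(2\mu + 2\sigma^2)}{n^2} + \sum_{i=2}^{n}\Pr(n_1 = i)\,E(\hat\kappa^2 \mid n_1 = i)\\ &= \alpha^2\left\{\Pr(n_1 = 1)\left(\frac{1}{n}\right)^2\exp(\sigma^2) + \sum_{i=2}^{n}\Pr(n_1 = i)\left(\frac{i}{n}\right)^2\exp\!\left(\frac{\sigma^2}{i}\right)\psi_{i-1}\!\left[\left(\frac{i-1}{i}\right)^2\sigma^4\right]\right\}. \end{aligned}$$

Therefore, since κ² = (1 − δ)²α², the variance of κ̂ is given by

$$\operatorname{var}(\hat\kappa) = \alpha^2\left\{\sum_{i=1}^{n}\Pr(n_1 = i)\left(\frac{i}{n}\right)^2\exp\!\left(\frac{\sigma^2}{i}\right)\psi_{i-1}\!\left[\left(\frac{i-1}{i}\right)^2\sigma^4\right] - (1-\delta)^2\right\},$$

where the i = 1 term reduces to (1/n)² exp(σ²) because ψ₀(0) = 1.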

It can be easily shown that the estimator in (3) is an unbiased estimator of the above.
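As a rough cross-check of (6) and of the unbiasedness of (3), one can simulate Δ-distributed samples and compare Monte Carlo moments with the formulas. The sketch below is not part of the paper. The parameter values, seed, replicate count and function names are hypothetical choices, and Monte Carlo agreement is only approximate.

```python
import math
import random


def g(n1, t, rel_tol=1e-15):
    """Series (1) for g_{n1}(t), t >= 0."""
    total, term, j = 1.0, (n1 - 1) * t / n1, 1
    while term > rel_tol * total:
        total += term
        j += 1
        term *= (n1 - 1) ** 2 * t / (n1 * (n1 + 2 * j - 3) * j)
    return total


def psi(m, t, rel_tol=1e-15):
    """Series (2) for psi_m(t), t >= 0; psi_m(0) = 1."""
    if t == 0.0:
        return 1.0
    total, term, j = 0.0, 1.0, 0
    while term > rel_tol * total:
        total += term
        j += 1
        term *= t / (2 * j * (m + 2 * j - 2))
    return total


def estimates(x):
    """Return (kappa-hat, estimator (3) of its variance) for one sample."""
    n = len(x)
    logs = [math.log(v) for v in x if v > 0]
    n1 = len(logs)
    if n1 == 0:
        return 0.0, 0.0
    if n1 == 1:
        k = math.exp(logs[0]) / n
        return k, k * k
    ybar = sum(logs) / n1
    s2 = sum((y - ybar) ** 2 for y in logs) / (n1 - 1)
    gk = g(n1, s2 / 2.0)
    k = n1 / n * math.exp(ybar) * gk
    v = n1 / n * math.exp(2.0 * ybar) * (
        n1 / n * gk ** 2 - (n1 - 1) / (n - 1) * g(n1, (n1 - 2) / (n1 - 1) * s2))
    return k, v


def var_kappa_exact(n, delta, mu, sigma2):
    """Exact variance (6)."""
    p = 1.0 - delta
    total = 0.0
    for i in range(1, n + 1):
        pr = math.comb(n, i) * p ** i * delta ** (n - i)
        total += pr * (i / n) ** 2 * math.exp(sigma2 / i) * psi(i - 1, ((i - 1) / i) ** 2 * sigma2 ** 2)
    return math.exp(2.0 * mu + sigma2) * (total - p ** 2)


def simulate(n, delta, mu, sigma2, reps, seed):
    """Monte Carlo mean and variance of kappa-hat, and the mean of estimator (3)."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    ks, vs = [], []
    for _ in range(reps):
        x = [0.0 if rng.random() < delta else rng.lognormvariate(mu, sigma) for _ in range(n)]
        k, v = estimates(x)
        ks.append(k)
        vs.append(v)
    mean_k = sum(ks) / reps
    var_k = sum((k - mean_k) ** 2 for k in ks) / (reps - 1)
    return mean_k, var_k, sum(vs) / reps


if __name__ == "__main__":
    n, delta, mu, sigma2 = 5, 0.3, 0.0, 0.5  # hypothetical setting
    mean_k, var_k, mean_v = simulate(n, delta, mu, sigma2, reps=20_000, seed=1)
    print("kappa:", (1 - delta) * math.exp(mu + sigma2 / 2), "simulated mean:", mean_k)
    print("var (6):", var_kappa_exact(n, delta, mu, sigma2), "simulated:", var_k, "mean of (3):", mean_v)
```

With a moderate σ² such as 0.5 the log-normal tail is mild, so a few tens of thousands of replicates are usually enough for the three quantities to agree within a few percent.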
