a remark on the difference between sampling with and without replacement

A Remark on the Difference Between Sampling With and Without Replacement
Author(s): David Freedman
Source: Journal of the American Statistical Association, Vol. 72, No. 359 (Sep., 1977), p. 681
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2286241



A Remark on the Difference Between Sampling With and Without Replacement

DAVID FREEDMAN*

The variation norm distance between sampling with and without replacement is calculated.

KEY WORDS: Sampling with replacement; Sampling without replacement; Variation norm.

Let $N$ be a positive integer, and let $F = \{1, \ldots, N\}$. Let $F^k$ consist of the $k$-tuples of elements of $F$, to be thought of as all samples of size $k$ from $F$. Sampling with replacement induces a probability $P$ on $F^k$, with

$$P\{(f_1, \ldots, f_k)\} = 1/N^k.$$

Let $G$ consist of the vectors $(f_1, \ldots, f_k) \in F^k$ for which all components are unequal. Then sampling without replacement induces a probability $Q$ on $F^k$, with

$$Q\{(f_1, \ldots, f_k)\} = \begin{cases} 1/N_k & \text{for } (f_1, \ldots, f_k) \in G, \\ 0 & \text{elsewhere,} \end{cases}$$

where $N_k = N(N-1) \cdots (N-k+1)$.
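The two sampling laws can be written out explicitly for a tiny case. The following is a minimal sketch, assuming $k \le N$ and values small enough to enumerate; the function name `sampling_laws` and the choice $N = 4$, $k = 2$ are illustrative and not part of the original note.

```python
from fractions import Fraction
from itertools import product
from math import perm


def sampling_laws(N, k):
    """Return (P, Q) as dicts over all k-tuples from F = {1, ..., N}.

    Assumes 1 <= k <= N, so that the set G of tuples with distinct
    components is non-empty.
    """
    n_pow_k = N ** k        # N^k, the number of k-tuples in F^k
    n_fall_k = perm(N, k)   # N_k = N (N-1) ... (N-k+1), the size of G
    P, Q = {}, {}
    for f in product(range(1, N + 1), repeat=k):
        in_G = len(set(f)) == k  # all components unequal
        P[f] = Fraction(1, n_pow_k)
        Q[f] = Fraction(1, n_fall_k) if in_G else Fraction(0)
    return P, Q


P, Q = sampling_laws(N=4, k=2)
print(len(P), sum(1 for q in Q.values() if q > 0))  # sizes of F^k and G
print(sum(P.values()), sum(Q.values()))             # total mass of each law
```

Exact rational arithmetic (`Fraction`) is used so that both laws sum to exactly 1, with no floating-point rounding.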

By definition, $\|P - Q\| = \sup_A |P(A) - Q(A)|$. Since $P(f)$ is constant for $f \in F^k$, and $Q(g)$ is constant (and larger) for $g \in G$, and $Q(g) = 0$ for $g \notin G$, it follows that

$$\begin{aligned} \|P - Q\| &= Q(G) - P(G) \\ &= 1 - P(G) \\ &= 1 - (N_k/N^k). \end{aligned}$$

This proves the following proposition.

Proposition: $\|P - Q\| = 1 - (N_k/N^k)$. As usual,

$$1 - \sum_j x_j \le \prod_j (1 - x_j) \le \exp\Bigl[-\sum_j x_j\Bigr] \quad \text{for } 0 \le x_j \le 1.$$

So

$$\frac{N_k}{N^k} = \prod_{j=1}^{k-1} \Bigl(1 - \frac{j}{N}\Bigr)$$

is bounded below by

$$1 - \sum_{j=1}^{k-1} \frac{j}{N} = 1 - \frac{1}{2}\,\frac{k(k-1)}{N}$$

and above by

$$\exp\Bigl[-\sum_{j=1}^{k-1} \frac{j}{N}\Bigr] = \exp\Bigl[-\frac{1}{2}\,\frac{k(k-1)}{N}\Bigr].$$

* David Freedman is Professor of Statistics, University of California at Berkeley. This research was supported in part by National Science Foundation Grant GP43085.

The corollary follows.

Corollary:

$$1 - \exp\Bigl[-\frac{1}{2}\,\frac{k(k-1)}{N}\Bigr] \le \|P - Q\| \le \frac{1}{2}\,\frac{k(k-1)}{N}.$$

The results of the Proposition and the Corollary are elementary, but I do not know any references to them.
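For small $N$ and $k$, both results can be checked by direct enumeration. The sketch below is illustrative only: it computes the supremum over events as the total mass of $Q - P$ on the set where $Q$ exceeds $P$ (which is where the supremum is attained), then compares this with the closed form and with the two bounds. The function names and the test pairs $(N, k)$ are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import exp, perm


def variation_distance(N, k):
    """sup_A |P(A) - Q(A)| by enumerating F^k (feasible only for tiny N, k)."""
    p = Fraction(1, N ** k)      # P(f) for every f in F^k
    q = Fraction(1, perm(N, k))  # Q(g) for g in G; Q is 0 off G
    total = Fraction(0)
    for f in product(range(1, N + 1), repeat=k):
        q_f = q if len(set(f)) == k else Fraction(0)
        if q_f > p:
            # Accumulate Q - P over the event {Q > P}.
            total += q_f - p
    return total


def closed_form(N, k):
    """1 - N_k / N^k, as in the Proposition."""
    return 1 - Fraction(perm(N, k), N ** k)


for N, k in [(3, 2), (5, 3), (6, 4)]:
    d = variation_distance(N, k)
    half = Fraction(k * (k - 1), 2 * N)  # (1/2) k (k-1) / N
    assert d == closed_form(N, k)
    assert 1 - exp(-float(half)) <= d <= half
    print(N, k, d)
```

The upper bound is compared in exact arithmetic because it holds with equality when $k = 2$, where a floating-point comparison could fail by rounding.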

A numerical example may be of interest. Suppose, e.g., that $k = 1{,}000$ and $N = 100{,}000{,}000$. Then

$$.00498 < \|P - Q\| < .00500.$$

If $k = 5{,}000$ and $N = 100{,}000{,}000$, then

$$.117 < \|P - Q\| < .125.$$
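These figures can be reproduced numerically. The sketch below is one possible way to do so, not the author's computation: it evaluates the exact distance $1 - N_k/N^k$ through a sum of logarithms, which avoids forming the very large integers $N_k$ and $N^k$, and evaluates the two bounds of the Corollary directly. The function names are illustrative.

```python
from math import expm1, log1p


def exact_distance(N, k):
    """1 - N_k / N^k, using log(N_k / N^k) = sum_{j=1}^{k-1} log(1 - j/N)."""
    log_ratio = sum(log1p(-j / N) for j in range(1, k))
    return -expm1(log_ratio)  # 1 - exp(log_ratio), accurate near 0


def corollary_bounds(N, k):
    """Lower and upper bounds from the Corollary."""
    half = k * (k - 1) / (2 * N)  # (1/2) k (k-1) / N
    return -expm1(-half), half    # 1 - exp(-half), half


N = 100_000_000
for k in (1_000, 5_000):
    lower, upper = corollary_bounds(N, k)
    print(k, lower, exact_distance(N, k), upper)
```

For these values of $k$ and $N$ the exact distance lies just above the lower bound, so the lower bound is by far the tighter of the two.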

When drawing a sample of 1,000 from a population of 100,000,000, there is almost no difference between drawing with or without replacement. When drawing a sample of 5,000, there is a substantial difference in variation norm. Of course, the distributions of statistics like sums may not change very much.
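The closing remark about sums can be illustrated, though not proved, by looking at the variance of the sample sum alone. This uses the finite-population correction $(N-k)/(N-1)$, a standard survey-sampling identity that does not appear in the note. The sketch checks the identity by enumeration on a small hypothetical population, then evaluates the factor for $k = 5{,}000$ and $N = 100{,}000{,}000$.

```python
from fractions import Fraction
from itertools import permutations, product


def sum_variance(samples):
    """Variance of the sample sum when every listed sample is equally likely."""
    sums = [sum(s) for s in samples]
    n = len(sums)
    mean = Fraction(sum(sums), n)
    return sum((s - mean) ** 2 for s in sums) / n


values = [2, 3, 5, 7, 11]  # hypothetical population, chosen only for illustration
N, k = len(values), 3

# With replacement: all k-tuples. Without replacement: ordered tuples of distinct units.
var_with = sum_variance(list(product(values, repeat=k)))
var_without = sum_variance(list(permutations(values, k)))
assert var_without == var_with * Fraction(N - k, N - 1)

# The same factor for the second numerical example in the text.
ratio = Fraction(10**8 - 5_000, 10**8 - 1)
print(float(ratio))
```

For the larger example the factor is within about $5 \times 10^{-5}$ of 1, so the variance of the sum is nearly the same under both schemes even though the variation norm distance exceeds .117. A variance comparison says nothing about the rest of the distribution, so this is only suggestive.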

[Received September 1976. Revised January 1977.]

© Journal of the American Statistical Association, September 1977, Volume 72, Number 359, Theory and Methods Section
