A Remark on the Difference Between Sampling With and Without Replacement. Author(s): David Freedman. Source: Journal of the American Statistical Association, Vol. 72, No. 359 (Sep., 1977), p. 681. Published by: American Statistical Association. Stable URL: http://www.jstor.org/stable/2286241
A Remark on the Difference Between Sampling With and Without Replacement
DAVID FREEDMAN*
The variation norm distance between sampling with and without replacement is calculated.
KEY WORDS: Sampling with replacement; Sampling without replacement; Variation norm.
Let $N$ be a positive integer, and let $F = \{1, \ldots, N\}$. Let $F^k$ consist of the $k$-tuples of elements of $F$, to be thought of as all samples of size $k$ from $F$. Sampling with replacement induces a probability $P$ on $F^k$, with
$$P\{(f_1, \ldots, f_k)\} = 1/N^k .$$
Let $G$ consist of the vectors $(f_1, \ldots, f_k) \in F^k$ for which all components are unequal. Then sampling without replacement induces a probability $Q$ on $F^k$, with
$$Q\{(f_1, \ldots, f_k)\} = \begin{cases} 1/N_k & \text{for } (f_1, \ldots, f_k) \in G \\ 0 & \text{elsewhere,} \end{cases}$$
where $N_k = N(N-1) \cdots (N-k+1)$.
By definition, $\|P - Q\| = \sup_A |P(A) - Q(A)|$. Since $P(f)$ is constant for $f \in F^k$, and $Q(g)$ is constant (and larger) for $g \in G$, and $Q(g) = 0$ for $g \notin G$, the supremum is attained at $A = G$, and it follows that
$$\|P - Q\| = Q(G) - P(G) = 1 - P(G) = 1 - (N_k/N^k) .$$
This proves the following proposition.
Proposition: $\|P - Q\| = 1 - (N_k/N^k)$. As usual,
$$1 - \sum_j x_j \le \prod_j (1 - x_j) \le \exp\Bigl[-\sum_j x_j\Bigr]$$
for $0 \le x_j \le 1$. So
$$\frac{N_k}{N^k} = \prod_{j=1}^{k-1} \Bigl(1 - \frac{j}{N}\Bigr)$$
is bounded below by
$$1 - \sum_{j=1}^{k-1} \frac{j}{N} = 1 - \frac{1}{2}\,\frac{k(k-1)}{N}$$
and above by
$$\exp\Bigl[-\sum_{j=1}^{k-1} \frac{j}{N}\Bigr] = \exp\Bigl[-\frac{1}{2}\,\frac{k(k-1)}{N}\Bigr] .$$
The corollary follows.
* David Freedman is Professor of Statistics, University of California at Berkeley. This research was supported in part by National Science Foundation Grant GP43085.
Corollary:
$$1 - \exp\Bigl[-\frac{1}{2}\,\frac{k(k-1)}{N}\Bigr] \le \|P - Q\| \le \frac{1}{2}\,\frac{k(k-1)}{N} .$$
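For small $N$ and $k$ the Proposition and the Corollary can be checked by direct enumeration. The sketch below is illustrative only and is not part of the original note: the function names are hypothetical, it assumes $k \le N$, and it uses the standard fact that on a finite set $\sup_A |P(A) - Q(A)|$ equals the sum of the positive parts of $Q - P$.

```python
from fractions import Fraction
from itertools import product
from math import exp, perm


def tv_distance_brute_force(N, k):
    """sup_A |P(A) - Q(A)| over F^k, by enumerating every k-tuple (assumes k <= N)."""
    p = Fraction(1, N**k)          # P of any tuple: sampling with replacement
    q = Fraction(1, perm(N, k))    # Q of a tuple with distinct components; perm(N, k) = N_k
    total = Fraction(0)
    for f in product(range(1, N + 1), repeat=k):
        q_f = q if len(set(f)) == k else Fraction(0)   # Q vanishes off G
        # The supremum is attained on {Q > P}, so add up the positive parts of Q - P.
        total += max(q_f - p, Fraction(0))
    return total


def closed_form(N, k):
    """The Proposition: 1 - N_k / N^k, as an exact fraction."""
    return 1 - Fraction(perm(N, k), N**k)


def corollary_bounds(N, k):
    """Lower and upper bounds from the Corollary, as floats."""
    x = k * (k - 1) / (2 * N)
    return 1 - exp(-x), x


if __name__ == "__main__":
    N, k = 6, 3
    lower, upper = corollary_bounds(N, k)
    print(tv_distance_brute_force(N, k), closed_form(N, k), lower, upper)
```

Exact rational arithmetic is used so that the enumerated value and the closed form can be compared for equality rather than up to rounding. The enumeration visits $N^k$ tuples, so it is only practical for very small cases.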
The results of the Proposition and the Corollary are elementary, but I do not know any references to them.
A numerical example may be of interest. Suppose, e.g., that k = 1,000 and N = 100,000,000. Then
$$.00498 < \|P - Q\| < .00500 .$$
If k = 5,000 and N = 100,000,000, then
$$.117 < \|P - Q\| < .125 .$$
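The two numerical bounds can be reproduced directly from the Corollary, and the exact value $1 - N_k/N^k$ from the Proposition can be computed alongside them. The following is a minimal sketch with hypothetical function names; summing logarithms with `log1p` is an implementation choice made here to avoid rounding loss when $j/N$ is tiny, not something taken from the note.

```python
from math import expm1, log1p


def exact_tv(N, k):
    """1 - prod_{j=1}^{k-1} (1 - j/N), accumulated in log space for accuracy."""
    log_ratio = sum(log1p(-j / N) for j in range(1, k))   # log(N_k / N^k)
    return -expm1(log_ratio)                               # 1 - exp(log_ratio)


def corollary_bounds(N, k):
    """(lower, upper) = (1 - exp(-x), x) with x = k(k-1) / (2N)."""
    x = k * (k - 1) / (2 * N)
    return -expm1(-x), x


if __name__ == "__main__":
    N = 100_000_000
    for k in (1_000, 5_000):
        lower, upper = corollary_bounds(N, k)
        print(k, lower, exact_tv(N, k), upper)
```

For both sample sizes the exact value lies between the two bounds, close to the lower one.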
When drawing a sample of 1,000 from a population of 100,000,000, there is almost no difference between drawing with or without replacement. When drawing a sample of 5,000, there is a substantial difference in variation norm. Of course, the distributions of statistics like sums may not change very much.
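The closing remark about sums can be illustrated on a toy case. The sketch below is an assumption-laden illustration, not a result from the note: it enumerates all samples for a very small $N$ and $k$, computes the exact distribution of the sample sum under each scheme, and compares the variation distance between the two sum distributions with the distance between the sample distributions. Function names are hypothetical. The distance between sums can never exceed the distance between samples, since the sum is a function of the sample; how much smaller it is in large populations is not addressed here.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def sum_distribution(samples):
    """Exact law of the sample sum when every tuple in `samples` is equally likely."""
    counts = Counter(sum(s) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in counts.items()}


def tv(dist_a, dist_b):
    """Variation distance sup_A |a(A) - b(A)|, i.e. half the L1 distance."""
    support = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(v, 0) - dist_b.get(v, 0)) for v in support) / 2


def compare(N, k):
    """Return (distance between sample laws, distance between sum laws)."""
    F = range(1, N + 1)
    with_repl = list(product(F, repeat=k))          # all of F^k
    without_repl = list(permutations(F, k))         # the set G
    tv_samples = 1 - Fraction(len(without_repl), len(with_repl))   # the Proposition
    tv_sums = tv(sum_distribution(with_repl), sum_distribution(without_repl))
    return tv_samples, tv_sums


if __name__ == "__main__":
    print(compare(6, 3))
```

In this toy case $k/N$ is large, so both distances are far from zero; the example only shows the direction of the comparison.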
[Received September 1976. Revised January 1977.]
© Journal of the American Statistical Association, September 1977, Volume 72, Number 359
Theory and Methods Section