SJS SDI_15 1
Design of Statistical Investigations
Stephen Senn
Introduction to Sampling
SJS SDI_15 2
Representative Inference
• So far in the course we have been interested in comparisons
– with some sort of causal investigation
• We now look at the case where we are interested in collecting representative material
– samples to describe populations
• First we consider some possible applications
SJS SDI_15 3
Applications of Sampling Methods
• Quality control of manufacturing processes
• Financial audit
• Opinion polls
• Clinical audit
• Anthropology
• Social surveys
• Ecological surveys
– capture/recapture
SJS SDI_15 4
An Important Practical Distinction
All of these application areas require sampling theory and careful consideration as to how samples are drawn.
However some of them have a further difficulty, which is that the opinions of human beings have to be ascertained.
In what follows we shall often take opinion polls/social surveys as typical examples of sampling problems. (But our first example is not of this sort.) This will enable us to discuss also the further problems that arise in these contexts.
However, first we shall review some very elementary statistical concepts.
SJS SDI_15 5
Standard Deviation/Standard Error
• There is common confusion between standard deviation and standard error
• The standard deviation describes the spread of original values
• The standard error is a measure of reliability of some statistic based on the original values
SJS SDI_15 6
An Illustration of This Difference
• This will now be illustrated using a simple example
• This example is again a medical one
– My apologies!
– I need a large data set
– This one will have to do
SJS SDI_15 7
Example Surv_2
• Cross-over trial in asthma
• 790 baseline FEV1 readings
– Since baselines are unaffected by treatment
– Regard as homogeneous sample
– Ignore fact that they are repeated measures
• The following slide shows distribution of readings
SJS SDI_15 8
[Figure: Histogram of 772 Baseline FEV1 Readings from a Clinical Trial. Horizontal axis: FEV1, 0.5 to 5.0. Source: Senn, Lilenthal, Patalano and Till]
SJS SDI_15 9
Distribution
• Curve skewed to the right
– Clearly not Normal
• Statistics
– Mean 1.965
– Median 1.820
– Variance 0.462
SJS SDI_15 10
Sampling
• Suppose that we take simple random samples of size 10
– Take these at random from original distribution
• With replacement
• Calculate the mean of each sample
• Study distribution of these means
– This is what is called a sampling distribution
• Illustrated on next slide
SJS SDI_15 11
[Figure: Histogram of 500 means of samples of size 10. Horizontal axis: means of samples of size 10, 0.5 to 5.0]
SJS SDI_15 12
Distribution
• Curve less obviously skewed to the right
– Approximation to Normal is closer
• Distribution is narrower
• Statistics
– Mean 1.961 (very similar to previously)
– Median 1.948 (now much closer to mean)
– Variance 0.043 (approximately 1/10 of previous value)
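The resampling experiment described on these slides can be imitated in a few lines. The sketch below does not use the trial data, which are not reproduced in this transcript. It assumes a synthetic right-skewed (log-Normal) population as a stand-in, and the names `population` and `sample_means`, the log-Normal parameters and the seed are all illustrative choices rather than anything taken from the lecture.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so that the run is repeatable

# Assumption: a synthetic right-skewed population standing in for the FEV1
# readings; the size 790 merely echoes the count quoted for the example.
population = [random.lognormvariate(0.6, 0.35) for _ in range(790)]

def sample_means(values, n, reps):
    """Means of `reps` simple random samples of size n, drawn with replacement."""
    return [statistics.fmean(random.choices(values, k=n)) for _ in range(reps)]

means = sample_means(population, n=10, reps=500)

var_original = statistics.pvariance(population)  # variance of original values
var_means = statistics.variance(means)           # variance of the sample means

print("mean of original values:", round(statistics.fmean(population), 3))
print("mean of sample means:   ", round(statistics.fmean(means), 3))
print("variance ratio (means / original):", round(var_means / var_original, 3))
```

With samples of size 10 the printed variance ratio should come out somewhere near 1/10, and the two means should be close to each other, which is the pattern reported in the statistics above.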
SJS SDI_15 13
[Figure: Histograms of original data and means of samples of size 10. Horizontal axis: 0.5 to 5.0. Legend: Original values; Sample means]
SJS SDI_15 14
The Different Variances
• Case 1
– Variance of original values
– The square root of this is the standard deviation
• Case 2
– Variance of means
– Square root of this is standard error of the mean (SEM)
• In general
– Square root of the variance of a statistic (e.g. a mean) is a standard error
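As a quick arithmetic check of the two cases, the summary statistics quoted on the earlier slides (variance 0.462 for the original values, 0.043 for the means of samples of size 10) can be converted to a standard deviation and a standard error. The sketch also compares the observed standard error with the value obtained by dividing the standard deviation by the square root of the sample size; the variable names are illustrative.

```python
import math

# Summary statistics quoted on the earlier slides.
var_original = 0.462  # variance of the original FEV1 readings
var_means = 0.043     # variance of the 500 means of samples of size 10
n = 10                # size of each sample

sd = math.sqrt(var_original)         # Case 1: standard deviation
sem_observed = math.sqrt(var_means)  # Case 2: standard error seen in the simulation
sem_theory = sd / math.sqrt(n)       # standard deviation divided by sqrt(n)

print(round(sd, 3), round(sem_observed, 3), round(sem_theory, 3))
```

The observed and theoretical standard errors agree to about two decimal places, which is as much as one would expect from 500 simulated means.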
SJS SDI_15 15
Standard Deviation v Standard Error
• Standard deviation
– Used to describe variation of original values
• Can be population
• Can be sample
• Standard error
– Used to describe reliability of a statistic. For example
• SE of mean
• SE of treatment differences
SJS SDI_15 16
Estimating the Standard Error
The standard error of the mean of a simple random sample of size n drawn from a population with variance $\sigma^2$ is $\sigma/\sqrt{n}$.
In practice $\sigma^2$, being a population parameter, is unknown so we estimate it using the sample variance, $s^2$.
Hence we estimate the standard error of the mean by
$s/\sqrt{n}$
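The estimate just described, the sample standard deviation divided by the square root of the sample size, can be sketched as a small function. The function name `estimated_sem` and the readings are invented for illustration; they are not data from the trial.

```python
import math
import statistics

def estimated_sem(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    return s / math.sqrt(n)

# Hypothetical FEV1-like readings in litres, invented for illustration.
readings = [1.2, 1.5, 1.8, 1.9, 2.0, 2.1, 2.3, 2.6, 3.0, 3.6]
print(round(estimated_sem(readings), 3))
```

Note that `statistics.stdev` uses the divisor n - 1, which is the usual choice when the sample variance is used to estimate a population variance.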
SJS SDI_15 17
Transformations
• Can be very valuable
– Improve accuracy of analysis
• Under-utilised
• Previous FEV1 example follows
– log-transformation
– data more nearly Normal
• But will not deal with all problems
– Outliers (in particular “bad” values)
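The effect of a log-transformation on skewness can be illustrated numerically. The sketch below again assumes synthetic log-Normal values in place of the FEV1 readings, so the transformed data are Normal by construction; real data will rarely behave so neatly. The helper `skewness` is a hypothetical name for the ordinary moment coefficient of skewness.

```python
import math
import random
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation over SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

random.seed(2)  # arbitrary seed so that the run is repeatable

# Assumption: synthetic log-Normal values standing in for right-skewed data.
raw = [random.lognormvariate(0.6, 0.35) for _ in range(790)]
logged = [math.log(v) for v in raw]

print("skewness, original scale:", round(skewness(raw), 2))
print("skewness, log scale:     ", round(skewness(logged), 2))
```

A skewness near zero on the log scale is consistent with a more nearly Normal shape. A single wild value would still distort both figures, which is the point made about outliers above.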
SJS SDI_15 18
[Figure: log-Transformed FEV1 data with superimposed Normal distribution. Horizontal axis: -0.5 to 2.2]
SJS SDI_15 19
Normal Distribution
• Ideal mathematical representation
• Rarely applies in practice to original data
• However, many sampling distributions have approximately Normal form
• This increases its utility considerably
• A combination of transformation of original data plus averaging can frequently make it applicable
SJS SDI_15 20
Technical Terms (Scheaffer, Mendenhall and Ott)
• Element
– Object on which a measurement is taken
• Population
– A collection of elements about which we wish to make an inference
• Sampling units
– Nonoverlapping collections of elements from the population that cover the entire population
• Sampling frame
– A list of sampling units
• Sample
– Collection of sampling units drawn from a frame
SJS SDI_15 21
Probability Sampling
• Well-defined sampling frame
• Probabilistic rule for drawing sample
• Knowledge of rule and sampling frame enables probabilistic statements about the population
• There are various types of such sample
– simple, cluster, stratified
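To make the three named types concrete, here is a minimal sketch of how each might draw from a sampling frame. Everything in it is an assumption made for illustration: the twelve-unit frame, the stratum and cluster labels, the function names, and the choice to draw without replacement (a common convention in survey work, not something stated on the slide).

```python
import random

random.seed(3)  # arbitrary seed so that the run is repeatable

# Hypothetical sampling frame: 12 units, each tagged with a stratum and a cluster.
frame = [{"id": i, "stratum": "A" if i < 6 else "B", "cluster": i // 3}
         for i in range(12)]

def simple_random(frame, n):
    """Draw n units so that every subset of size n is equally likely."""
    return random.sample(frame, n)

def stratified(frame, n_per_stratum):
    """Draw a simple random sample separately within each stratum."""
    chosen = []
    for label in sorted({u["stratum"] for u in frame}):
        members = [u for u in frame if u["stratum"] == label]
        chosen.extend(random.sample(members, n_per_stratum))
    return chosen

def cluster(frame, n_clusters):
    """Draw whole clusters at random and keep every unit inside them."""
    labels = sorted({u["cluster"] for u in frame})
    picked = set(random.sample(labels, n_clusters))
    return [u for u in frame if u["cluster"] in picked]

print([u["id"] for u in simple_random(frame, 4)])
print([u["id"] for u in stratified(frame, 2)])
print([u["id"] for u in cluster(frame, 2)])
```

In each case the rule for drawing is known and operates on a defined frame, which is what permits probabilistic statements about the population.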
SJS SDI_15 22
Simple Random Sample
We shall encounter this in more detail in the next lecture.
For the moment we note a definition
“Sampling in which every member of the population has an equal chance of being chosen and successive drawings are independent” Marriott, A Dictionary of Statistical Terms
Only for simple random sampling is the standard error of the mean equal to $\sigma/\sqrt{n}$.
SJS SDI_15 23
Quota Sampling
• Sampling frame not used
• May have rough idea of population composition
• Sampling carries on until various quotas are fulfilled
– e.g. 100 males, 100 females
• Difficult to make probabilistic statements about population
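The mechanics of quota sampling can be sketched as a loop that accepts whoever turns up until each quota is met. The function name `quota_sample`, the categories and the stream of respondents are hypothetical, and the quotas are kept small so that the run is easy to follow.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until every quota is filled.

    respondents: iterable of (identifier, category) pairs, in the order met.
    quotas: dict mapping category to the number required.
    """
    remaining = dict(quotas)
    accepted = []
    for ident, category in respondents:
        if remaining.get(category, 0) > 0:
            accepted.append((ident, category))
            remaining[category] -= 1
        if all(count == 0 for count in remaining.values()):
            break  # all quotas fulfilled, so sampling stops
    return accepted

# Hypothetical stream of passers-by; no sampling frame is involved.
stream = [(1, "male"), (2, "male"), (3, "female"), (4, "male"),
          (5, "female"), (6, "female"), (7, "male")]
print(quota_sample(stream, {"male": 2, "female": 2}))
```

Who ends up in the sample depends on the order in which people happen to be met rather than on a known probabilistic rule, so no selection probability can be attached to any individual.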