
SJS SDI_15 1

Design of Statistical Investigations

Stephen Senn

Introduction to Sampling


Representative Inference

• So far in the course we have been interested in comparisons
– with some sort of causal investigation

• We now look at the case where we are interested in collecting representative material
– samples to describe populations

• First we consider some possible applications


Applications of Sampling Methods

• Quality control of manufacturing processes

• Financial audit

• Opinion polls

• Clinical audit

• Anthropology

• Social surveys

• Ecological surveys
– capture/recapture


An Important Practical Distinction

All of these application areas require sampling theory and careful consideration as to how samples are drawn.

However, some of them have a further difficulty: the opinions of human beings have to be ascertained.

In what follows we shall often take opinion polls and social surveys as typical examples of sampling problems (although our first example is not of this sort). This will also enable us to discuss the further problems that arise in these contexts.

First, however, we shall review some very elementary statistical concepts.


Standard Deviation/Standard Error

• There is common confusion between standard deviation and standard error

• The standard deviation describes the spread of original values

• The standard error is a measure of reliability of some statistic based on the original values


An Illustration of This Difference

• This will now be illustrated using a simple example

• This example is again a medical one
– My apologies!
– I need a large data set
– This one will have to do


Example Surv_2

• Cross-over trial in asthma

• 790 baseline FEV1 readings

– Since baselines unaffected by treatment
– Regard as homogeneous sample
– Ignore fact that they are repeated measures

• The following slide shows distribution of readings


[Figure: Histogram of 772 Baseline FEV1 Readings from a Clinical Trial. Horizontal axis: FEV1, ticks from 0.5 to 5.0. Source: Senn, Lilenthal, Patalano and Till]


Distribution

• Curve skewed to the right
– Clearly not Normal

• Statistics
– Mean 1.965
– Median 1.820
– Variance 0.462


Sampling

• Suppose that we take simple random samples of size 10
– Take these at random from original distribution
  • With replacement

• Calculate the mean of each sample

• Study distribution of these means
– This is what is called a sampling distribution

• Illustrated on next slide


[Figure: Histogram of 500 means of samples of size 10. Horizontal axis: means of samples of size 10, ticks from 0.5 to 5.0]


Distribution

• Curve less obviously skewed to the right
– Approximation to Normal is closer

• Distribution is narrower

• Statistics
– Mean 1.961 (very similar to previously)
– Median 1.948 (now much closer to mean)
– Variance 0.043 (approximately 1/10 of previous value)
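The resampling exercise described on these slides can be sketched in a few lines of Python. The trial's FEV1 readings are not reproduced here, so the sketch assumes a stand-in population: 790 right-skewed values drawn from a lognormal distribution whose parameters (0.6 and 0.35) were chosen only to give roughly similar summary statistics. The function name `sampling_distribution_of_mean` is likewise hypothetical.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Return the means of n_samples random samples drawn with replacement."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Stand-in population (an assumption): 790 right-skewed values,
# NOT the FEV1 readings from the trial.
pop_rng = random.Random(1)
population = [pop_rng.lognormvariate(0.6, 0.35) for _ in range(790)]

# 500 means of samples of size 10, as on the slides.
means = sampling_distribution_of_mean(population, sample_size=10, n_samples=500)

pop_var = statistics.pvariance(population)
var_of_means = statistics.variance(means)

print(f"original values: mean={statistics.fmean(population):.3f} "
      f"median={statistics.median(population):.3f} variance={pop_var:.3f}")
print(f"sample means:    mean={statistics.fmean(means):.3f} "
      f"median={statistics.median(means):.3f} variance={var_of_means:.3f}")
print(f"ratio of variances: {var_of_means / pop_var:.3f} (theory: 1/10)")
```

With any reasonably large number of samples, the mean of the sample means should stay close to the population mean, while the variance of the sample means should come out near one tenth of the population variance, mirroring the pattern reported above.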


[Figure: Histograms of original data and means of samples of size 10. Horizontal axis ticks from 0.5 to 5.0. Legend: Original values, Sample means]


The Different Variances

• Case 1
– Variance of original values
– The square root of this is the standard deviation

• Case 2
– Variance of means
– The square root of this is the standard error of the mean (SEM)

• In general
– The square root of the variance of a statistic (e.g. a mean) is a standard error
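As a worked illustration of the two cases, the arithmetic can be carried out on the variances quoted earlier for the FEV1 example (0.462 for the original values, 0.043 for the 500 means of samples of size 10). The variable names are illustrative only. The last line uses the usual simple-random-sampling relationship, SD divided by the square root of the sample size, as a cross-check.

```python
import math

var_original = 0.462  # variance of the original FEV1 readings (quoted on the slides)
var_means = 0.043     # variance of the 500 means of samples of size 10 (quoted on the slides)
n = 10                # size of each sample

sd = math.sqrt(var_original)         # Case 1: standard deviation of original values
sem_observed = math.sqrt(var_means)  # Case 2: standard error of the mean, from the means
sem_from_sd = sd / math.sqrt(n)      # cross-check: SD / sqrt(n)

print(f"SD               = {sd:.3f}")
print(f"SEM (from means) = {sem_observed:.3f}")
print(f"SEM (SD/sqrt(n)) = {sem_from_sd:.3f}")
```

With these inputs the standard deviation is about 0.680, the SEM taken from the simulated means is about 0.207, and SD/sqrt(n) is about 0.215, so the two routes to the SEM agree closely.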


Standard Deviation v Standard Error

• Standard deviation
– Used to describe variation of original values
  • Can be population
  • Can be sample

• Standard error
– Used to describe reliability of a statistic. For example
  • SE of mean
  • SE of treatment differences


Estimating the Standard Error

The standard error of the mean of a simple random sample of size $n$ drawn from a population with variance $\sigma^2$ is $\sigma/\sqrt{n}$.

In practice $\sigma^2$, being a population parameter, is unknown, so we estimate it using the sample variance, $s^2$.

Hence we estimate the standard error of the mean by

$s/\sqrt{n}$
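A minimal sketch of this estimate in Python follows. The helper name `estimated_sem` and the ten readings are invented for illustration and are not taken from the trial. `statistics.stdev` uses the $n - 1$ divisor, which is the usual definition of the sample standard deviation $s$.

```python
import math
import statistics


def estimated_sem(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate s")
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return s / math.sqrt(n)


# Hypothetical sample of 10 readings, invented for illustration only.
sample = [1.2, 1.5, 1.7, 1.8, 1.9, 2.0, 2.1, 2.4, 2.8, 3.3]

print(f"mean = {statistics.fmean(sample):.3f}")
print(f"estimated SEM = {estimated_sem(sample):.3f}")
```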


Transformations

• Can be very valuable
– Improve accuracy of analysis

• Under-utilised

• Previous FEV1 example follows
– log-transformation
– data more nearly Normal

• But will not deal with all problems
– Outliers (in particular “bad” values)
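The effect of a log-transformation on a right-skewed variable can be sketched as follows. Since the FEV1 data are not available here, the example assumes a stand-in lognormal population (parameters 0.6 and 0.35, chosen for illustration), for which the log is Normal by construction; real data will usually be only approximately so. The `skewness` helper is a plain moment-based coefficient written for this sketch.

```python
import math
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Stand-in right-skewed data (an assumption), NOT the trial's FEV1 readings.
rng = random.Random(2)
raw = [rng.lognormvariate(0.6, 0.35) for _ in range(790)]
logged = [math.log(v) for v in raw]

print(f"skewness of raw values:    {skewness(raw):.2f}")
print(f"skewness of logged values: {skewness(logged):.2f}")
```

The raw values should show clearly positive skewness and the logged values a skewness near zero. A single grossly wrong value would still distort both versions, which is the sense in which the transformation does not deal with outliers.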


[Figure: log-Transformed FEV1 data with superimposed Normal distribution. Horizontal axis ticks from -0.5 to 2.2]


Normal Distribution

• Ideal mathematical representation

• Rarely applies in practice to original data

• However, many sampling distributions have approximately Normal form

• This increases its utility considerably

• A combination of transformation of original data plus averaging can frequently make it applicable


Technical Terms (Scheaffer, Mendenhall and Ott)

• Element
– Object on which a measurement is taken

• Population
– A collection of elements about which we wish to make an inference

• Sampling units
– Nonoverlapping collections of elements from the population that cover the entire population

• Sampling frame
– A list of sampling units

• Sample
– Collection of sampling units drawn from a frame


Probability Sampling

• Well-defined sampling frame

• Probabilistic rule for drawing sample

• Knowledge of rule and sampling frame enables probabilistic statements about the population

• There are various types of such sample
– simple, cluster, stratified
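To make the three types concrete, here is a rough sketch of each probabilistic rule applied to a small sampling frame. The frame (12 households in 3 regions), the field names and the function names are all hypothetical. For brevity, the same `region` field serves as the stratum in one function and as the cluster in another; in practice these would usually be different groupings.

```python
import random
from collections import defaultdict

# Hypothetical sampling frame: 12 households (sampling units) in 3 regions.
frame = [
    {"id": i, "region": region}
    for i, region in enumerate(["north"] * 4 + ["south"] * 4 + ["east"] * 4)
]


def simple_random_sample(frame, n, rng):
    """Every unit has the same chance of selection.

    Drawn without replacement here (an assumption, common in surveys);
    rng.choices(frame, k=n) would give independent draws with replacement.
    """
    return rng.sample(frame, n)


def stratified_sample(frame, key, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[key]].append(unit)
    return [u for units in strata.values() for u in rng.sample(units, n_per_stratum)]


def cluster_sample(frame, key, n_clusters, rng):
    """Draw whole clusters at random and keep every unit in the chosen clusters."""
    clusters = sorted({unit[key] for unit in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in frame if u[key] in chosen]


rng = random.Random(3)
srs = simple_random_sample(frame, 6, rng)
strat = stratified_sample(frame, "region", 2, rng)
clus = cluster_sample(frame, "region", 1, rng)

print("simple:    ", sorted(u["id"] for u in srs))
print("stratified:", sorted(u["id"] for u in strat))
print("cluster:   ", sorted(u["id"] for u in clus))
```

In each case the selection rule and the frame are known, so the probability that any given unit enters the sample can be worked out, which is what permits probabilistic statements about the population.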


Simple Random Sample

We shall encounter this in more detail in the next lecture.

For the moment we note a definition:

“Sampling in which every member of the population has an equal chance of being chosen and successive drawings are independent” (Marriott, A Dictionary of Statistical Terms)

Only for simple random sampling is the standard error of the mean equal to $\sigma/\sqrt{n}$.
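A short derivation shows where the independence in the definition is used. It is the standard textbook argument rather than anything specific to these slides. Here $X_1, \dots, X_n$ denote the $n$ drawings, each with variance $\sigma^2$, and $\bar{X}$ their mean.

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i)
      + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j)\right]
  = \frac{1}{n^2}\, n\sigma^2
  = \frac{\sigma^2}{n},
\qquad
\operatorname{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} .
\]
```

The covariance terms vanish only because successive drawings are independent. Under other sampling schemes they need not vanish, and the standard error of the mean then differs from $\sigma/\sqrt{n}$.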


Quota Sampling

• Sampling frame not used

• May have rough idea of population composition

• Sampling carries on until various quotas are fulfilled
– e.g. 100 males, 100 females

• Difficult to make probabilistic statements about population
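The quota procedure can be sketched as a loop that accepts whoever turns up until each quota is filled. The stream of passers-by, the field names and the function `quota_sample` are hypothetical and serve only to illustrate the mechanism.

```python
import random


def quota_sample(stream, quotas, key):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person in stream:
        group = person[key]
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas fulfilled
    return accepted


rng = random.Random(4)
# Hypothetical stream of passers-by; no sampling frame is involved.
passers_by = [{"id": i, "sex": rng.choice(["male", "female"])} for i in range(2000)]

sample = quota_sample(passers_by, {"male": 100, "female": 100}, key="sex")
n_males = sum(1 for p in sample if p["sex"] == "male")
print(f"accepted {len(sample)} respondents, of whom {n_males} are male")
```

Who ends up in the sample depends on who happens to arrive and in what order, not on a known probabilistic rule applied to a frame. That is why inclusion probabilities are unknown and probabilistic statements about the population are hard to justify.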
