Chapter 5 Sampling Distributions

Upload: randell-mason

Post on 18-Jan-2018


TRANSCRIPT

Page 1: Chapter 5 Sampling Distributions. Introduction Distribution of a Sample Statistic: The probability distribution of a sample statistic obtained from a

Chapter 5

Sampling Distributions


Introduction

• Distribution of a Sample Statistic: The probability distribution of a sample statistic obtained from a random sample or a randomized experiment

– What values can a sample mean (or proportion) take on, and how likely are ranges of values?

• Population Distribution: Set of values of a variable for a population of individuals. Conceptually equivalent to a probability distribution, in the sense of selecting an individual at random and observing their value of the variable of interest


Sampling Distributions for Counts and Proportions

• Binary outcomes: Each individual or realization can be classified as a “Success” or “Failure” (Presence/Absence of Characteristic of interest)

• Random Variable X is the count of the number of successes in n “trials”

• Sample proportion: Proportion of successes in the sample

• Population proportion: Proportion of successes in the population

Sample proportion: $\hat{p} = \dfrac{X}{n}$

Population proportion: $p$


Binomial Distribution for Sample Counts

• Binomial “Experiment”

– Consists of n trials or observations

– Trials/observations are independent of one another

– Each trial/observation can end in one of two possible outcomes, often labelled “Success” and “Failure”

– The probability of success, p, is constant across trials/observations

– Random variable, X, is the number of successes observed in the n trials/observations

• Binomial Distributions: Family of distributions for X, indexed by success probability (p) and number of trials/observations (n). Notation: X~B(n,p)


Binomial Distributions and Sampling

• Problem when sampling (without replacement) from a finite population: the probability of Success on later draws is altered by which individuals were observed earlier.

• When the population is much larger than the sample (say at least 20 times as large), the effect is minimal and we say X is approximately binomial

• Obtaining probabilities:

$$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n$$

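Table C gives probabilities for various n and p. Note that for p > 0.5, look up the table under 1-p and read the entry for n-k, since P(X=k) with success probability p equals P(X=n-k) with success probability 1-p.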
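The points on this slide can be checked numerically. The sketch below is illustrative only: the function names are made up for this example, and the population of 200 with 180 successes is a hypothetical choice that makes the population exactly 20 times the sample of 10. It compares the exact without-replacement (hypergeometric) probabilities with the binomial formula, and checks the table-lookup identity for p > 0.5.

```python
from math import comb, isclose


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), straight from the formula on the slide."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, n, successes, population):
    """P(k successes) when n individuals are drawn without replacement."""
    failures = population - successes
    return comb(successes, k) * comb(failures, n - k) / comb(population, n)


# Hypothetical setup: sample of 10 from a population of 200 (20x the sample),
# of which 180 are successes, so p = 0.9.
n, population, successes = 10, 200, 180
p = successes / population

binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
hyper = [hypergeometric_pmf(k, n, successes, population) for k in range(n + 1)]

# Largest disagreement between the exact and the approximate probabilities.
max_gap = max(abs(b - h) for b, h in zip(binom, hyper))

# Table-lookup identity: P(X = k | p) equals P(X = n - k | 1 - p).
symmetric = all(
    isclose(binomial_pmf(k, n, p), binomial_pmf(n - k, n, 1 - p))
    for k in range(n + 1)
)

print(f"largest |binomial - hypergeometric| = {max_gap:.4f}")
print(f"lookup identity holds for every k: {symmetric}")
```

Shrinking `population` toward the sample size makes `max_gap` grow, which is the effect the first bullet describes.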


Example - Diagnostic Test

• Test claims to have a sensitivity of 90% (Among people with condition, probability of testing positive is .90)

• 10 people who are known to have condition are identified, X is the number that correctly test positive

$$P(X=k) = \frac{10!}{k!\,(10-k)!}\,(.9)^k (.1)^{10-k}, \qquad k = 0, 1, \ldots, 10$$

k     P(k)
0     1E-10
1     9E-09
2     3.64E-07
3     8.75E-06
4     0.000138
5     0.001488
6     0.01116
7     0.057396
8     0.19371
9     0.38742
10    0.348678

• Compare with Table C, n=10, p=.10

• Table obtained in EXCEL with function: BINOMDIST(k,n,p,FALSE)

(TRUE option gives cumulative distribution function: P(X≤k))
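For readers without a spreadsheet, the same table can be rebuilt in a few lines of Python. This is a sketch rather than the method used for the slide (which used EXCEL): `pmf` plays the role of the FALSE option and `cdf` the role of the TRUE option, and both variable names are chosen here for illustration.

```python
from math import comb

n, p = 10, 0.9  # diagnostic test: 10 people, sensitivity 0.90

# Point probabilities P(X = k), the analogue of BINOMDIST(k, n, p, FALSE).
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Running totals P(X <= k), the analogue of BINOMDIST(k, n, p, TRUE).
cdf = []
running = 0.0
for prob in pmf:
    running += prob
    cdf.append(running)

print(" k      P(X=k)     P(X<=k)")
for k in range(n + 1):
    print(f"{k:2d}  {pmf[k]:10.3g}  {cdf[k]:10.3g}")
```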


Binomial Mean & Standard Deviation

• Let $S_i = 1$ if the ith individual was a success, 0 otherwise

• Then $P(S_i = 1) = p$ and $P(S_i = 0) = 1-p$

• Then $E(S_i) = \mu_S = 1(p) + 0(1-p) = p$

• Note that $X = S_1 + \cdots + S_n$ and that trials are independent

• Then $E(X) = \mu_X = n\mu_S = np$

• $V(S_i) = E(S_i^2) - \mu_S^2 = p - p^2 = p(1-p)$

• Then $V(X) = \sigma_X^2 = np(1-p)$

$$X \sim B(n,p) \;\Rightarrow\; \mu_X = E(X) = np, \qquad \sigma_X = \sqrt{np(1-p)}$$

For the diagnostic test:

$$\mu_X = 10(0.9) = 9.0, \qquad \sigma_X = \sqrt{10(0.9)(0.1)} = 0.95$$
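One way to convince yourself of the np and np(1-p) results is to compute the mean and variance directly from the probability table and compare them with the formulas. The sketch below does that for the diagnostic test; the helper name `binomial_mean_sd` is invented for this example.

```python
from math import comb, sqrt


def binomial_mean_sd(n, p):
    """Mean and SD of B(n, p) by summing over the whole distribution."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, sqrt(variance)


n, p = 10, 0.9  # diagnostic test from the earlier example

mean, sd = binomial_mean_sd(n, p)
formula_mean = n * p
formula_sd = sqrt(n * p * (1 - p))

print(f"by enumeration: mean = {mean:.4f}, sd = {sd:.4f}")
print(f"by formula:     mean = {formula_mean:.4f}, sd = {formula_sd:.4f}")
```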


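Sample Proportions

• Counts of Successes (X) are rarely reported due to their dependency on sample size (n)

• More common is to report the sample proportion of successes:

$$\hat{p} = \frac{X}{n} = \frac{\text{\# of successes in sample}}{\text{sample size}}$$

$$\mu_{\hat{p}} = E(\hat{p}) = p, \qquad \sigma^2_{\hat{p}} = V(\hat{p}) = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$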
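The formula for the standard deviation of the sample proportion implies that quadrupling the sample size halves the spread. A small sketch, with an illustrative helper name and arbitrarily chosen sample sizes, makes that visible for p = 0.9:

```python
from math import sqrt


def proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


p = 0.9  # same success probability as the diagnostic test

# Each sample size is four times the previous one (arbitrary choices).
for n in (10, 40, 160, 640):
    print(f"n = {n:3d}: mean of p-hat = {p}, sd of p-hat = {proportion_sd(p, n):.4f}")
```

The mean of the sample proportion stays at p for every n; only the spread changes.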


Sampling Distributions for Counts & Proportions

• For samples of size n, counts (and thus proportions) can take on only n+1 distinct possible values (0, 1, …, n)

• As the sample size n gets large, so does the number of possible values, and the sampling distribution begins to approximate a normal distribution. Common rule of thumb: np ≥ 10 and n(1-p) ≥ 10 to use the normal approximation

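$$X \sim N\!\left(np,\ \sqrt{np(1-p)}\right) \quad \text{(approximately)}$$

$$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right) \quad \text{(approximately)}$$

(Here N lists the mean first and the standard deviation second.)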
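To see how good the approximation is in practice, one can compare an exact binomial probability with its normal counterpart. The sketch below is one possible check, not part of the original slides: the helper names are invented, the cutoff of 210 successes is an arbitrary illustration, and n = 1000, p = 0.2 is used because it satisfies the rule of thumb.

```python
from math import comb, erf, sqrt


def rule_of_thumb_ok(n, p):
    """Common rule of thumb: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x, mean, sd):
    """P(Y <= x) for a normal Y with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


n, p = 1000, 0.2
mean, sd = n * p, sqrt(n * p * (1 - p))

cutoff = 210  # arbitrary illustrative cutoff
exact = binomial_cdf(cutoff, n, p)
approx = normal_cdf(cutoff, mean, sd)

print(f"rule of thumb satisfied: {rule_of_thumb_ok(n, p)}")
print(f"exact  P(X <= {cutoff}) = {exact:.4f}")
print(f"normal P(X <= {cutoff}) = {approx:.4f}")
```

For the diagnostic test (n = 10, p = 0.9) `rule_of_thumb_ok` returns False, which matches the strongly skewed table seen earlier.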


Sampling Distribution for X~B(n=1000,p=0.2)

[Figure: Sampling Distribution of X (n=1000, p=0.2). Horizontal axis: # Successes; vertical axis: Probability.]

$$\mu_X = np = 1000(.20) = 200, \qquad \sigma_X = \sqrt{np(1-p)} = \sqrt{1000(.2)(.8)} = 12.65$$


Using Z-Table for Approximate Probabilities

• To find probabilities of certain ranges of counts or proportions, can make use of fact that the sample counts and proportions are approximately normally distributed for large sample sizes.

– Define range of interest

– Obtain mean of the sampling distribution

– Obtain standard deviation of sampling distribution

– Transform range of interest to range of Z-values

– Obtain (approximate) Probabilities from Z-table

Coin Tossing (Heads): $P(\hat{p} \ge 0.51 \mid n = 1000 \text{ tosses})$

Range: $\hat{p} \ge 0.51$

Mean: $\mu_{\hat{p}} = 0.50$

SD: $\sigma_{\hat{p}} = \sqrt{\dfrac{(0.5)(0.5)}{1000}} = .0158$

$$z = \frac{0.51 - 0.50}{.0158} = 0.63$$

$$P(\hat{p} \ge 0.51) \approx P(Z \ge 0.63) = 1 - P(Z \le 0.63) = 1 - .7357 = .2643$$
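The five steps above can be sketched in code for the coin-tossing example. This is an illustration under stated assumptions: the normal CDF is computed from `math.erf` instead of being read from a Z-table, and the exact binomial tail is included only as a cross-check that the slide does not perform. Because z is not rounded to two decimals here, the approximate answer differs slightly from the table-based .2643.

```python
from math import comb, erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, in place of a Z-table lookup."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p = 1000, 0.5

# Step 1: range of interest, p-hat >= 0.51 (that is, at least 510 heads).
cutoff, min_heads = 0.51, 510

# Steps 2 and 3: mean and standard deviation of the sampling distribution.
mean = p
sd = sqrt(p * (1 - p) / n)

# Step 4: transform the cutoff to a Z-value.
z = (cutoff - mean) / sd

# Step 5: approximate probability from the standard normal distribution.
approx = 1 - standard_normal_cdf(z)

# Cross-check: exact P(X >= 510) for X ~ B(1000, 0.5), using integer arithmetic.
exact = sum(comb(n, k) for k in range(min_heads, n + 1)) / 2**n

print(f"sd = {sd:.4f}, z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial tail:  {exact:.4f}")
```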


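Sampling Distribution of a Sample Mean

• Obtain a sample of n independent measurements of a quantitative variable: $X_1, \ldots, X_n$ from a population with mean $\mu$ and standard deviation $\sigma$

– Averages will be less variable than the individual measurements

– Sampling distributions of averages will become more like a normal distribution as n increases (regardless of the shape of the population of individual measurements)

$$\mu_{\bar{X}} = E(\bar{X}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\,(n\mu) = \mu$$

$$\sigma^2_{\bar{X}} = V(\bar{X}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n} \;\Rightarrow\; \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$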


Central Limit Theorem

• When random samples of size n are selected from any population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will be approximately normally distributed for large n:

$$\bar{X} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right) \quad \text{approximately, for large } n$$

Z-table can be used to approximate probabilities of ranges of values for sample means, as well as percentiles of their sampling distribution


Exponential Distribution

• Often used to model times: survival of components, time to complete tasks, time between customer arrivals at a checkout line, etc. Density is highly skewed:

[Figure, two panels, each plotting density y against x from 0 to 5. Left panel: Individual Measurements (μ=1, σ=1). Right panel: Sample means of size 10 (μ=1, σ=1/10^0.5=0.32)]
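The two panels can be imitated with a short simulation. The sketch below is not how the original figure was produced; the seed, the number of replications, and the variable names are arbitrary choices for illustration. It draws individual exponential measurements with mean 1 and, separately, means of samples of size 10, then compares their centres and spreads.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # arbitrary seed so the run is repeatable

reps = 20_000       # number of simulated values per panel (arbitrary)
sample_size = 10    # matches the right-hand panel

# Left panel: individual measurements from an exponential with mean 1.
individuals = [random.expovariate(1.0) for _ in range(reps)]

# Right panel: means of independent samples of size 10.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(reps)
]

print(f"individuals:  mean = {fmean(individuals):.3f}, sd = {stdev(individuals):.3f}")
print(f"sample means: mean = {fmean(sample_means):.3f}, sd = {stdev(sample_means):.3f}")
print(f"theory for sample means: sd = {1 / sample_size ** 0.5:.3f}")
```

A histogram of `sample_means` would look far more symmetric than one of `individuals`, although with n = 10 some right skew remains.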


Miscellaneous Topics

• Normal Approximation for sample counts and proportions is example of CLT (X=S1+…+Sn)

• Any linear function of independent normal random variables is normal (use rules on means and variances to get parameters of distribution)

• Generalizations of CLT apply to cases where random variables are correlated (to an extent) and have different distributions (within reason)

– Variables made up of many small random influences will tend to be approximately normal
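The second bullet, on linear functions of independent normal variables, can be illustrated with Python's `statistics.NormalDist`, which supports adding and subtracting independent normal variables and scaling or shifting them by constants. The distributions and coefficients below are hypothetical numbers chosen so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical independent normal variables: X1 ~ N(10, 2) and X2 ~ N(4, 1),
# written as (mean, standard deviation).
x1 = NormalDist(mu=10, sigma=2)
x2 = NormalDist(mu=4, sigma=1)

# Linear function Y = 2*X1 - 3*X2 + 5, built with NormalDist arithmetic.
y = x1 * 2 - x2 * 3 + 5

# Same parameters from the rules on means and variances:
# mean: 2(10) - 3(4) + 5; variance: 2^2 (2^2) + 3^2 (1^2).
rule_mean = 2 * 10 - 3 * 4 + 5
rule_sd = sqrt(2**2 * 2**2 + 3**2 * 1**2)

print(f"NormalDist result: mean = {y.mean}, sd = {y.stdev}")
print(f"rules on means and variances: mean = {rule_mean}, sd = {rule_sd}")
```

Note that the variances add even though X2 is subtracted, because the coefficient is squared.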