Lecture 10. Random Sampling and Sampling Distributions David R. Merrell 90-786 Intermediate Empirical Methods for Public Policy and Management

Post on 19-Dec-2015

Agenda

Normal approximation to the binomial
Poisson process
Random sampling
Sampling statistics and sampling distributions
Expected values and standard errors of sample sums and sample means

Binomial Random Variable

Binomial random variable X is the number of “successes” in n trials, where

Probability of success remains the same from trial to trial

Trials are independent

Binomial Probability Distribution

Discrete distribution with:

$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$

n is the number of trials
x is the number of successes in n trials (x = 0, 1, 2, ..., n)
p is the probability of success on a single trial
q is the probability of failure on a single trial (q = 1 - p)

Properties of the Binomial RV

Mean: $\mu = np$

Variance: $\sigma^2 = npq$

Standard Deviation: $\sigma = \sqrt{npq}$

Binomial(n = 10, p = .4)

x    P(X=x)
0    0.006047
1    0.040311
2    0.120932
3    0.214991
4    0.250823
5    0.200658
6    0.111477
7    0.042467
8    0.010617
9    0.001573
10   0.000105
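The table above can be reproduced from the binomial formula. The following is a minimal Python sketch, not part of the original lecture (which uses Minitab); the function name `binomial_pmf` is chosen here for illustration. It also checks numerically that the mean and variance of the distribution agree with np and npq.

```python
from math import comb, sqrt


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial count with n trials and success probability p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q ** (n - x)


n, p = 10, 0.4
table = [(x, binomial_pmf(x, n, p)) for x in range(n + 1)]

# Print the distribution in the same layout as the slide.
for x, prob in table:
    print(f"{x:2d}  {prob:.6f}")

# Mean and variance computed directly from the probabilities.
mean = sum(x * prob for x, prob in table)
variance = sum((x - mean) ** 2 * prob for x, prob in table)
print(f"mean = {mean:.4f} (np = {n * p:.4f})")
print(f"variance = {variance:.4f} (npq = {n * p * (1 - p):.4f})")
print(f"standard deviation = {sqrt(variance):.4f}")
```

For n = 10 and p = .4 the mean works out to 4 and the variance to 2.4.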

Approximation to Binomial Distribution

Use normal distribution when:
n is large
np > 10
n(1 - p) > 10

Parameters of the approximating normal distribution are the mean and standard deviation from the binomial distribution
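The rule of thumb above is easy to mechanize. This is an illustrative Python sketch only; the helper name `normal_approximation` and its `threshold` argument are assumptions made here, and the vaguer "n is large" condition is not checked.

```python
from math import sqrt


def normal_approximation(n: int, p: float, threshold: float = 10.0):
    """Return (ok, mean, sd): whether np and n(1 - p) both exceed the
    threshold, plus the binomial mean and standard deviation that the
    approximating normal distribution would use."""
    q = 1.0 - p
    ok = n * p > threshold and n * q > threshold
    return ok, n * p, sqrt(n * p * q)


ok, mean, sd = normal_approximation(80, 0.4)
print(ok, mean, round(sd, 2))

# With only 10 trials np = 4, so the rule of thumb is not met.
ok_small, _, _ = normal_approximation(10, 0.4)
print(ok_small)
```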

Approximation of Binomial Distribution

n = 80, p = .4

[Figure: plot of C2 against C1; horizontal axis from 10 to 60, vertical axis from 0.00 to 0.09]

How Good is the Approximation?

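Target probability: P(X < 29), which for a whole-number count equals P(X <= 28)

Normal with mean = 32.0000 and standard deviation = 4.38000

x        P( X <= x)
28.0000  0.1806

x        P( X <= x)
28.5000  0.2121

Binomial with n = 80 and p = 0.400000

x      P( X <= x)
28.00  0.2131

Evaluating the normal curve at 28.5 rather than 28 (the continuity correction) brings the approximation to within about 0.001 of the exact binomial value.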
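The comparison can be repeated outside Minitab. The sketch below is an illustration in Python, using `statistics.NormalDist` from the standard library and the slide's rounded parameters (mean 32, standard deviation 4.38); the helper `binomial_cdf` is a name introduced here.

```python
from math import comb
from statistics import NormalDist


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for a binomial count, summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


n, p = 80, 0.4
approx = NormalDist(mu=32.0, sigma=4.38)  # parameters as given on the slide

plain = approx.cdf(28.0)      # normal curve evaluated at 28
corrected = approx.cdf(28.5)  # continuity correction: evaluate at 28.5
exact = binomial_cdf(28, n, p)

print(f"normal, x = 28.0: {plain:.4f}")
print(f"normal, x = 28.5: {corrected:.4f}")
print(f"exact binomial:   {exact:.4f}")
```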

Application 1

The Chicago Equal Employment Commission believes that the Chicago Transit Authority (CTA) discriminates against Republicans. The records show that 37.5% of the individuals listed as passing the CTA exam were Republicans; the remainder were Democrats (no one registers as an independent in Illinois). CTA hired 30 people last year; 25 of them were Democrats. What is the probability that this situation could exist if CTA did not discriminate?

Application 1 (cont.)

Success: a Republican is hired
The probability of success, p = 0.375
The number of trials, n = 30
The number of successes, x = 5
P(X <= 5) = ???

Application 1 (cont.)

Mean: $\mu = np = 30 \times 0.375 = 11.25$

Variance: $\sigma^2 = npq = 30 \times 0.375 \times 0.625 = 7.03$

Standard Deviation: $\sigma = \sqrt{7.03} = 2.65$

Normal with mean = 11.25 and standard deviation = 2.65

x       P( X <= x)
5.5000  0.0150
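As a cross-check on the normal approximation, the exact binomial tail can be summed directly. This is a sketch in Python rather than the lecture's Minitab, and it keeps the slide's modeling assumption that the 30 hires behave like independent trials with p = 0.375.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.375
q = 1 - p

mean = n * p            # 11.25
sd = sqrt(n * p * q)    # about 2.65

# Normal approximation with continuity correction: P(X <= 5.5)
normal_tail = NormalDist(mean, sd).cdf(5.5)

# Exact binomial tail: P(X <= 5) = sum of P(X = k) for k = 0..5
exact_tail = sum(comb(n, k) * p**k * q ** (n - k) for k in range(6))

print(f"normal approximation: {normal_tail:.4f}")
print(f"exact binomial:       {exact_tail:.4f}")
```

Both figures are small, so five or fewer Republican hires out of 30 would be unusual under the no-discrimination model.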

Poisson Process

Assumptions
time homogeneity
independence
no clumping

[Figure: events marked x along a time axis starting at 0, occurring at rate $\lambda$]

Poisson Process

Earthquakes strike randomly over time with a rate of $\lambda = 4$ per year.

Model time of earthquake strike as a Poisson process

Count: How many earthquakes will strike in the next six months?

Duration: How long will it take before the next earthquake hits?

Count: Poisson Distribution

What is the probability that 3 earthquakes will strike during the next six months?

Poisson Distribution

Count in time period t

$P(Y = y) = \frac{e^{-\lambda t} (\lambda t)^y}{y!}, \quad y = 0, 1, 2, \ldots$

Minitab Probability Calculation

Click: Calc > Probability Distributions > Poisson
Enter: For mean 2, input constant 3

Output:
Probability Density Function
Poisson with mu = 2.00000

x     P( X = x)
3.00  0.1804
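The same probability follows directly from the Poisson formula with a rate of 4 per year and t = 0.5 year, so that the mean count is 2. A minimal Python sketch, with `poisson_pmf` as an illustrative name:

```python
from math import exp, factorial


def poisson_pmf(y: int, rate: float, t: float) -> float:
    """P(Y = y) for the count of events in a period of length t,
    when events arrive at the given rate per unit time."""
    mu = rate * t  # expected count in the period
    return exp(-mu) * mu**y / factorial(y)


# Rate of 4 earthquakes per year, six months = 0.5 year, so mu = 2.
prob = poisson_pmf(3, rate=4.0, t=0.5)
print(f"{prob:.4f}")
```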

Duration: Exponential Distribution

Time between occurrences in a Poisson process

Continuous probability distribution
Mean = $1/(\lambda t)$, where $\lambda t$ is the rate per time unit of length t

Exponential Probability Problem

What is the probability that 9 months will pass with no earthquake?

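Measure time in months: t = 1/12 year, so $\lambda t = 4 \times 1/12 = 1/3$ earthquake per month, and the mean time between earthquakes is $1/(\lambda t) = 3$ months.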

Minitab Probability Calculation

Click: Calc > Probability Distributions > Exponential
Enter: For mean 3, input constant 9

Output:
Cumulative Distribution Function
Exponential with mean = 3.00000

x       P( X <= x)
9.0000  0.9502
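The Minitab output gives P(X <= 9), the probability that the next earthquake arrives within 9 months. The question asks for the complement, 1 - 0.9502 = 0.0498. The Python sketch below (an illustration; `exponential_cdf` is a name introduced here) computes both and cross-checks the result against the Poisson probability of zero earthquakes in 0.75 year.

```python
from math import exp


def exponential_cdf(x: float, mean: float) -> float:
    """P(X <= x) for an exponential waiting time with the given mean."""
    return 1.0 - exp(-x / mean) if x >= 0 else 0.0


mean_months = 3.0  # mean time between earthquakes, in months

p_within = exponential_cdf(9.0, mean_months)  # an earthquake within 9 months
p_none = 1.0 - p_within                       # 9 months pass with no earthquake

# Cross-check: Poisson probability of zero events in 9 months (0.75 year)
# at 4 per year is exp(-4 * 0.75).
p_zero_count = exp(-4.0 * 0.75)

print(f"P(X <= 9) = {p_within:.4f}")
print(f"P(X > 9)  = {p_none:.4f}")
print(f"Poisson P(Y = 0) = {p_zero_count:.4f}")
```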

Exponential Probability Density Function

MTB > set c1
DATA > 0:12000
DATA > end
MTB > let c1 = c1/1000
Click: Calc > Probability distributions > Exponential > Probability density > Input column
Enter: Input column c1 > Optional storage c2
Click: OK > Graph > Plot
Enter: Y c2 > X c1
Click: Display > Connect > OK
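The Minitab steps above build a grid of x values from 0 to 12 in steps of 0.001 and store the exponential density at each point. A rough Python equivalent is sketched below. The slide does not state which mean was entered in the dialog, so the mean of 3 used here is an assumption carried over from the earthquake example; plotting is left out.

```python
from math import exp

MEAN = 3.0  # assumed: mean of the exponential distribution (not stated on the slide)

# c1: 0, 0.001, 0.002, ..., 12.000 (mirrors "set c1 / 0:12000" then c1/1000)
xs = [i / 1000 for i in range(0, 12001)]

# c2: exponential probability density f(x) = (1/mean) * exp(-x/mean)
density = [exp(-x / MEAN) / MEAN for x in xs]

# Show a few points along the curve.
for x in (0, 3, 6, 9, 12):
    print(f"x = {x:2d}  f(x) = {density[x * 1000]:.4f}")
```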

Exponential Probability Density Function

[Figure: plot of C2 (density, vertical axis from 0.0 to 0.3) against C1 (horizontal axis with ticks at 0, 5, and 10)]

Sampling

Population - entire set of objects that we are interested in studying

Sample - a chosen subset of a population

Some Samples Are ...

random -- each item in the population has an equal chance of being selected to be part of the sample

representative -- has the same characteristics as the population under study, a microcosm of the population

Population Parameters and Sample Statistics

Population Parameter
Numerical descriptor of a population
Values usually uncertain
e.g., population mean ($\mu$), population standard deviation ($\sigma$)

Sample Statistics
Numerical descriptor of a sample
Calculated from observations in the sample
e.g., sample mean $\bar{X}$, sample standard deviation $S_X$

What is a sampling distribution?

Sample statistics are random variables

Sample statistics have probability distributions

“Sampling distribution” is the probability distribution of a sample statistic
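A small simulation can make the idea concrete. The sketch below is illustrative only: the population is synthetic (10,000 draws from an exponential distribution, chosen here because it is skewed), and the seed, sample size, and number of repetitions are arbitrary choices. It draws many random samples, computes the sample mean of each, and summarizes how those means vary.

```python
import random
from statistics import mean, stdev

rng = random.Random(10)  # fixed seed so the run is repeatable

# Hypothetical skewed population; not data from the lecture.
population = [rng.expovariate(1 / 3) for _ in range(10_000)]


def sample_means(population, sample_size, repetitions, rng):
    """Draw repeated simple random samples (without replacement)
    and return the mean of each one."""
    return [
        mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


means = sample_means(population, sample_size=50, repetitions=1000, rng=rng)

print(f"population mean:        {mean(population):.3f}")
print(f"average of sample means: {mean(means):.3f}")
print(f"population sd:          {stdev(population):.3f}")
print(f"sd of sample means:     {stdev(means):.3f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of the individual observations. That collection of means is an empirical picture of the sampling distribution of the sample mean.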

MTB > Retrieve 'C:\MTBWIN\DATA\RESTRNT.MTW'.
Retrieving worksheet from file: C:\MTBWIN\DATA\RESTRNT.MTW
Worksheet was saved on 5/31/1994
MTB > info

Information on the Worksheet

Column  Name      Count  Missing
C1      ID        279    0
C2      OUTLOOK   279    1
C3      SALES     279    25
C4      NEWCAP    279    55
C5      VALUE     279    39
C6      COSTGOOD  279    42
C7      WAGES     279    44
C8      ADS       279    44
C9      TYPEFOOD  279    12
C10     SEATS     279    11
C11     OWNER     279    10
C12     FT.EMPL   279    14
C13     PT.EMPL   279    13
C14     SIZE      279    16

MTB > desc 'sales'

Descriptive Statistics

Variable  N    N*  Mean   Median  TrMean  StDev  SEMean
SALES     254  25  332.6  200.0   248.9   650.5  40.8

Variable  Min  Max     Q1    Q3
SALES     0.0  8064.0  83.7  382.7

MTB > boxp 'sales'
* NOTE * N missing = 25

[Figure: boxplot of SALES, axis from 0 to 8000]

MTB > hist 'sales'
* NOTE * N missing = 25

[Figure: histogram of SALES; horizontal axis SALES from 0 to 8000, vertical axis Frequency from 0 to 200]

MTB > let c15 = loge('sales')
MTB > let c15 = loge('sales') J
*** Values out of bounds during operation at J
    Missing returned 1 times

MTB > let c15 = loge('sales' + 1)
MTB > name c15 'logsales'
MTB > desc 'logsales'

Descriptive Statistics

Variable  N    N*  Mean    Median  TrMean  StDev   SEMean
logsales  254  25  5.1830  5.3033  5.2134  1.1387  0.0715

Variable  Min     Max     Q1      Q3
logsales  0.0000  8.9953  4.4394  5.9500
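The first attempt fails because the smallest SALES value is 0 and the natural log of 0 is undefined; adding 1 before taking logs avoids that. The Python sketch below illustrates the same transform on the five summary values reported for SALES (minimum, Q1, median, Q3, maximum), not on the full worksheet, which is not reproduced here.

```python
from math import log

# Summary values of SALES taken from the descriptive statistics above:
# Min, Q1, Median, Q3, Max.
sales = [0.0, 83.7, 200.0, 382.7, 8064.0]

# Taking the log of a zero value fails, as in the first Minitab attempt.
try:
    log(sales[0])
except ValueError as err:
    print(f"log(0) is undefined: {err}")

# log(x + 1) is defined for every non-negative x and maps 0 to 0.
logsales = [log(x + 1) for x in sales]
for x, y in zip(sales, logsales):
    print(f"sales = {x:7.1f}  log(sales + 1) = {y:.4f}")
```

The transformed minimum, median, and maximum line up with the logsales output, since the log transform preserves order.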

MTB > boxp 'logsales'
* NOTE * N missing = 25

[Figure: boxplot of logsales, axis from 0 to 9]

[Figure: histogram of logsales; horizontal axis logsales from 0 to 9, vertical axis Frequency from 0 to 90]

Four Samples of Size 50 From Restaurant “Logsales” Data--Histograms

[Figure: four frequency histograms, one for each sample column C16, C17, C18, and C19]

MTB > Desc c16-c19

Descriptive Statistics

Variable  N   N*  Mean   Median  TrMean  StDev  SEMean
C16       43  7   5.246  5.375   5.280   0.867  0.132
C17       43  7   5.351  5.352   5.383   1.223  0.186
C18       48  2   5.366  5.461   5.388   0.888  0.128
C19       43  7   5.244  5.198   5.253   0.937  0.143

Variable  Min    Max    Q1     Q3
C16       2.773  6.621  4.625  5.787
C17       1.099  8.456  4.710  6.176
C18       2.485  7.091  4.961  5.994
C19       3.434  6.868  4.595  6.089

Random Samples from Restaurant “Logsales” Data--Summary
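The SEMean column is the standard error of the sample mean, StDev divided by the square root of N, where N counts the non-missing values in each sample of 50. The Python sketch below recomputes it from the rounded values printed in the summary; small differences in the last digit are expected because Minitab works from unrounded standard deviations.

```python
from math import sqrt

# (N, StDev, reported SEMean) for each sample, copied from the summary.
samples = {
    "C16": (43, 0.867, 0.132),
    "C17": (43, 1.223, 0.186),
    "C18": (48, 0.888, 0.128),
    "C19": (43, 0.937, 0.143),
}


def standard_error(stdev: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    return stdev / sqrt(n)


computed = {}
for name, (n, sd, reported) in samples.items():
    computed[name] = standard_error(sd, n)
    print(f"{name}: s/sqrt(n) = {computed[name]:.4f}  (Minitab SEMean = {reported})")
```

The four sample means differ from one another by roughly the size of these standard errors, which is what treating the sample mean as a random variable leads one to expect.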

Next Time ...

Central Limit Theorem--“Sample averages are approximately normally distributed”