Statistics for Social and Behavioral Sciences Session #15: Interval Estimation, Confidence Interval (Agresti and Finlay, Chapter 5) Prof. Amine Ouazad

Upload: edwina-horn

Post on 24-Dec-2015


Statistics for Social and Behavioral Sciences

Session #15: Interval Estimation, Confidence Interval

(Agresti and Finlay, Chapter 5)

Prof. Amine Ouazad

Statistics Course Outline

PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)

PART II. DESCRIBING DATA (Weeks 2-4)

PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)

PART IV. CORRELATION AND CAUSATION: REGRESSION ANALYSIS (Weeks 10-14)

This is where we talk about Zmapp and Ebola!

Firenze or Lebanese Express’s ratings are within a MoE of each other!

Last Session: Inference

• A conservative Margin of Error (= 2 standard errors) for Cafe Firenze’s restaurant rating is 1.1 with 14 votes.

• For any rating from 1 to 5, the largest possible Margin of Error is 4/√N, where N is the number of ratings.

• With TripAdvisor, we see the rating of each individual customer, and so we can calculate sX!
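The 4/√N bound can be checked numerically. A rating confined to the range 1 to 5 has a standard deviation of at most 2 (the extreme case is half the ratings at 1 and half at 5), so two standard errors are at most 2 × 2/√N. The sketch below is only an illustration; the function name `conservative_moe` is hypothetical and not part of the course material.

```python
import math

def conservative_moe(n_ratings, low=1.0, high=5.0):
    """Largest possible margin of error (2 standard errors) for a mean rating."""
    # A variable bounded in [low, high] has standard deviation at most (high - low) / 2.
    max_sd = (high - low) / 2
    return 2 * max_sd / math.sqrt(n_ratings)

# Cafe Firenze: 14 votes
print(round(conservative_moe(14), 2))  # 1.07, i.e. about 1.1
```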

Central Limit Theorem:

• with a large sample size N, the sampling distribution of the sample mean is approximately normal.

• The mean of the sampling distribution is the population mean.

• The standard deviation of the sampling distribution is sX/√N, where sX is the standard deviation of X.

Today

• Use this margin of error to provide interval estimates:

– A 95% confidence interval for Café Firenze is [2.3, 4.5].

– “The true rating of Café Firenze is between 2.3 and 4.5 with probability 95%”.

– Note: average was 3.4 and MoE was 1.1.

– A 95% confidence interval for Cory Gardner’s vote share in Colorado is [48 – 3.6, 48 + 3.6] = [44.4, 51.6].

– “The true vote share for Cory Gardner is between 44.4% of the vote and 51.6% of the vote with 95% probability”.

– Note: MoE was 3.6.
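Both intervals are simply the point estimate plus or minus the margin of error. A minimal sketch of that arithmetic, using the numbers from the slide (the helper name `interval` is hypothetical):

```python
def interval(point_estimate, moe):
    """Return (lower, upper) = point estimate minus/plus the margin of error."""
    return (point_estimate - moe, point_estimate + moe)

firenze = interval(3.4, 1.1)    # average rating 3.4, MoE 1.1
gardner = interval(48.0, 3.6)   # poll share 48%, MoE 3.6 points

print([round(x, 1) for x in firenze])  # [2.3, 4.5]
print([round(x, 1) for x in gardner])  # [44.4, 51.6]
```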

News: Last Tuesday

• We learnt the population proportion p !!!

– Proportion of voters for Cory Gardner.

• The latest poll was giving us a sample proportion of the vote p̂ (N around 1000).

Outline

1. Interval Estimation, Confidence Interval

2. Choosing between 90, 95, 99% confidence

3. When distributions are normal: t-distribution

Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F

Parameters and Interval Estimate

• An interval estimate is an interval of numbers around the point estimate, which includes the parameter with probability either 90%, 95%, or 99%.

• Example: “the interval estimate [156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] includes the population average height with probability 95%.”

• Sample mean: 156.2cm, MoE = 0.49 cm.

Parameters and Interval Estimate

• An interval estimate that includes the parameter with probability 95% is called a 95% confidence interval.

• The expression “95% confidence interval” is widely used.

• Example: “[156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] is a 95% confidence interval for the population average height.”

• Sample mean: 156.2cm, MoE = 0.49 cm.

How do we build a 95% confidence interval?

Goal: estimate the population average m.

From previous sessions: [m – MoE ; m + MoE] includes the sample mean with probability 95%.

The sample mean is within one MoE of m exactly when m is within one MoE of the sample mean.

We conclude: the interval [Sample Mean – MoE ; Sample Mean + MoE] includes the population mean m with probability 95%.

[Sample Mean – MoE ; Sample Mean + MoE] is a 95% confidence interval for m.

MoE = 1.96 x Standard Error

Standard Error = sX/√N

We use 1.96 instead of 2 from now on.
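The recipe above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's own code: the function name `confidence_interval_95` is hypothetical, the 200 ratings are made-up data, and the sample standard deviation uses the usual N-1 denominator.

```python
import math
import statistics

def confidence_interval_95(values):
    """95% confidence interval for the population mean (large-N formula)."""
    n = len(values)
    sample_mean = statistics.mean(values)
    s_x = statistics.stdev(values)          # sample standard deviation (N-1 denominator)
    standard_error = s_x / math.sqrt(n)     # Standard Error = sX / sqrt(N)
    moe = 1.96 * standard_error             # MoE = 1.96 x Standard Error
    return sample_mean - moe, sample_mean + moe

# Made-up data: 200 ratings spread evenly over 1..5 (hypothetical, for illustration only).
ratings = [1, 2, 3, 4, 5] * 40
low, high = confidence_interval_95(ratings)
print(round(low, 2), round(high, 2))  # 2.8 3.2
```

The formula relies on N being large; with only a handful of observations the 1.96 multiplier is too small.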


Choosing between 90%, 95%, 99%

• The interval estimate [Sample Mean – MoE, Sample Mean + MoE] includes the population mean (the parameter) with probability:

• 99% if MoE = 2.58 * Standard Error

• 95% if MoE = 1.96 * Standard Error

• 90% if MoE = 1.65 * Standard Error

• The width of a confidence interval:

1. Increases as the confidence level increases.

2. Decreases as the sample size increases.
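The multipliers 1.65, 1.96 and 2.58 are rounded quantiles of the standard normal distribution: for a confidence level c, the multiplier leaves (1 - c)/2 of the probability in each tail. A short sketch using Python's standard library (the function name `z_multiplier` is hypothetical):

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Number of standard errors needed for the given confidence level."""
    tail = (1 - confidence_level) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)      # standard normal quantile

for level in (0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

A higher confidence level gives a larger multiplier, which is why the interval widens as the confidence level increases.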

Building 90%, 95%, 99% confidence intervals

Exercise:

• The sample mean weight (a sample of individuals in the US) is 60.0 kg, and the sample standard deviation is 29.9 kg.

• Find a 90% (resp., 95%, 99%) confidence interval for the population mean weight.
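One way to set up the computation is sketched below. The sample size is not stated in the exercise as transcribed here, so `N = 1000` is purely an assumption made to keep the sketch runnable; the function name `mean_ci` is also hypothetical. Substitute the actual sample size to obtain the exercise's answer.

```python
import math

def mean_ci(sample_mean, sample_sd, n, z):
    """Confidence interval: sample mean -/+ z standard errors."""
    standard_error = sample_sd / math.sqrt(n)
    moe = z * standard_error
    return sample_mean - moe, sample_mean + moe

N = 1000  # ASSUMPTION: the sample size is not given in the text; replace with the real value.

for label, z in (("90%", 1.65), ("95%", 1.96), ("99%", 2.58)):
    low, high = mean_ci(60.0, 29.9, N, z)
    print(label, round(low, 2), round(high, 2))
```

Whatever the value of N, the three intervals share the same center (60.0 kg) and get wider as the confidence level rises.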

Why 90%, 95%, 99%?

• Confidence intervals were invented by Jerzy Neyman in the 1930s.

• R.A. Fisher developed the theory of statistical testing.

• Sample sizes were small at the time (a few hundred), and 95% seemed a reasonable confidence level.

• The medical sciences adopted confidence intervals soon after their discovery.

• 95% became the standard.

[Photo: R.A. Fisher]


Central Limit Theorem

• Requires a large sample size N.

• This is because it applies to any distribution of X.

• Example #1:

– We had a sample of N songs and, for each song i, the number of times Xi it had been played.

– The number of times Xi a song is played on Spotify does not have a normal distribution.

– But we can build a confidence interval for the average number of times a song is played (m), provided we have a large enough number N of songs.

– MoE = 1.96 * sX/√N for a 95% confidence interval.

We can use our formulas to find a 95% confidence interval for m = 360.63 as:

• N is large.

Even though X does not have a normal distribution.
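A small simulation can illustrate this claim. The sketch below does not use the Spotify data; as a stand-in it draws X from an exponential distribution, which is strongly skewed and clearly not normal (this choice, the function name `coverage`, and all parameter values are assumptions made for illustration). It counts how often the interval Sample Mean ± 1.96 × sX/√N contains the true mean.

```python
import math
import random

def coverage(n, repetitions, rate=1.0, seed=0):
    """Share of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    true_mean = 1 / rate                      # mean of an exponential distribution
    hits = 0
    for _ in range(repetitions):
        sample = [rng.expovariate(rate) for _ in range(n)]
        mean = sum(sample) / n
        # Sample standard deviation with the N-1 denominator.
        s_x = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        moe = 1.96 * s_x / math.sqrt(n)
        if mean - moe <= true_mean <= mean + moe:
            hits += 1
    return hits / repetitions

# With N = 500 the share should be close to 0.95 despite the skewed X.
print(coverage(n=500, repetitions=2000))
```

Rerunning with a very small `n` (for example 5) typically gives a share noticeably below 0.95, which is the situation the next slides address.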

What if N is small?

• If N is “small”, the Central Limit Theorem does not apply…

– We cannot use our formulas.

• “Small” ? Less than a few hundred (from experience).

• If N is very small, the sampling distributions are not normal.

[Figure: sampling distributions for N=2 and N=5]

If N is small

• The sample standard deviation sX is potentially very far from the population standard deviation of X.

• But… we can still find confidence intervals if X is normal.

• The sampling distribution of the sample mean is Student’s t distribution, with degrees of freedom (df) equal to N-1, and with standard deviation sX/√N.

If N is small

A 95% confidence interval for the population mean is:

[Sample Mean – MoE , Sample Mean + MoE]

With MoE = z * Standard Error.

• z = 1.96 when the df = ∞

• z > 1.96 when the df are small.

• See next table for the exact value of z.

t Table
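Multipliers of the kind listed in a t table can also be computed numerically. The sketch below is one possible approach using only Python's standard library: it integrates the density of Student's t distribution with Simpson's rule and finds the multiplier by bisection. The function names, the step count, and the search bound of 100 are all choices made for this illustration; the bound is adequate for confidence levels up to 99% with df ≥ 1.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(confidence_level, df):
    """Multiplier (called z on the slides) for the given confidence level and df."""
    target = 1 - (1 - confidence_level) / 2
    low, high = 0.0, 100.0
    for _ in range(60):                       # bisection on the CDF
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for df in (1, 3, 6):                          # i.e. N = 2, 4, 7
    print(df, round(t_multiplier(0.95, df), 3))
# 1 12.706
# 3 3.182
# 6 2.447
```

As df grows, the 95% multiplier shrinks toward 1.96, which matches the limiting case df = ∞.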

Why is it called Student’s t distribution?

• The t distribution was allegedly invented by a person called Student.

• That “Student” was an engineer at Guinness’s Factories in Ireland: William Sealy Gosset.

• He was producing small samples of a drink, seeking guidance for industrial quality control:

– He was trying a small number of samples (N=2, 4, perhaps 7).

– And from these samples was trying to infer the quality of all containers of the product (the population).

Source: W.S. Gosset and Some Neglected Concepts in Experimental Statistics: Guinnessometrics II, Stephen T. Ziliak, 2011.

Wrap up

• Interval estimates for a population mean (a parameter) when N is large, for any distribution of X.

• Build a confidence interval for a parameter: the interval [Sample Mean – MoE ; Sample Mean + MoE] includes the parameter with probability:

99% if MoE = 2.58 * Standard Error

95% if MoE = 1.96 * Standard Error

90% if MoE = 1.65 * Standard Error

• The t-distribution gives confidence intervals when the sample size N is small… and when the distribution of X is normal.

• Use z given by Table 5.1 of Agresti and Finlay for degrees of freedom N-1.

Coming up:

Readings:

• This week and next week:

– Chapter 5 entirely – estimation, confidence intervals.

• Online quiz deadline Tuesday 9am.

• Deadlines are sharp and attendance is followed.

For help:

• Amine Ouazad

Office 1135, Social Science [email protected] hour: Tuesday from 5 to 6.30pm.

• GAF: Irene [email protected] recitations. At the Academic Resource Center, Monday from 2 to 4pm.