proportions


Upload: fallon

Post on 24-Feb-2016


Page 1: Proportions


Proportions

Estimating population proportions
Difference of proportions

Page 2: Proportions

Review: Setting up a c.i. around a proportion

1. Estimate the proportion
2. Take the SD with this formula: s = sqrt(p * (1-p))
3. Find the s.e. with this formula: s / sqrt(n)
4. Set up the confidence interval with this formula: proportion plus or minus t * s.e.

Page 3: Proportions

Example

A teacher offers extra tutoring in math. He takes a sample of 100 students who went through his tutoring sessions and finds that 73 started to get higher marks. He wants to figure out the 95% confidence limits for his survey before he goes to the principal and tells her the program was successful.

Page 4: Proportions

Step 1-2

Step 1: estimate the population proportion
     = 0.73 start to perform better in math

Step 2: get the sample standard deviation using this formula: s = sqrt(p * (1-p))
     = sqrt(0.73 * 0.27)
     = 0.444

Page 5: Proportions

Step 3

Step 3: Use this in order to find the standard error:

= s / sqrt(n)
= 0.444 / sqrt(100) = 0.0444

Page 6: Proportions

Build confidence interval

Step 4: What are the 95% confidence limits of the proportion? Since n is bigger than 30, the normal curve can be used. Set up a confidence interval using this formula: proportion plus or minus t * s.e.

= 0.73 + or - 1.96 * 0.0444
= 0.73 + or - 0.087
= 0.64 to 0.82
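The four steps can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `proportion_ci` and the default critical value of 1.96 (the normal-curve value for 95%) are assumptions made for illustration.

```python
import math

def proportion_ci(successes, n, critical=1.96):
    """Confidence interval around a sample proportion, following the four steps."""
    p = successes / n                 # Step 1: estimate the proportion
    s = math.sqrt(p * (1 - p))        # Step 2: standard deviation
    se = s / math.sqrt(n)             # Step 3: standard error
    margin = critical * se            # Step 4: plus or minus critical value * s.e.
    return p, se, (p - margin, p + margin)

# Tutoring example: 73 of 100 students improved
p, se, (low, high) = proportion_ci(73, 100)
print(f"p = {p:.2f}, s.e. = {se:.4f}, 95% c.i. = {low:.2f} to {high:.2f}")
```

For a different confidence level, pass another critical value (for example 2.576 for 99%).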

Page 7: Proportions

Testing the difference between 2 groups
Chapter 14

Page 8: Proportions

Difference of means test

Used if someone wants to know if two sample means or proportions are different (statistically)
     - could both sample means have been drawn from the same population (and the difference is attributed to chance alone)
     - or are they so different that there is no way they could have been drawn from the same population

Page 9: Proportions

Vs. what we've already done

- So far, we have used single samples, meaning we wanted to see if a single sample with a particular mean could be drawn from a population with a known or hypothesized mean
- Now we use 2 samples

Page 10: Proportions

A sample problem to walk through the logic of these types of problems

A veterans support agency offers continuing education seminars for veterans. They want to evaluate the effect it has on job placement. The agency has half the veterans take the seminars, and the other half does not. They randomly select 50 who have done the seminars and 50 who have not. They want to evaluate whether the job placement rates are different.

Page 11: Proportions

Formulate a research and a null hypothesis

The research hypothesis: tests whether one of the sample means is larger or smaller than the other sample mean (difference of means)

The null hypothesis: when you fail to reject it, you're saying that the population means in question are not different (e.g. an after-school reading program didn't raise mean scores)

Page 12: Proportions

For this example:

H_A: Veterans who attend the seminar will have higher job placement rates

H_0: Veterans who attend the seminar and those who do not attend the seminar will show no difference in job placement rates

Page 13: Proportions

What does it mean to reject the null?

- If we fail to reject the null, this is like saying the job placement rates of the two populations show no difference
     - i.e. the population rates are the same and the seminars don't lead to higher job placement

- If we reject the null, the conclusion is that the job placement rates are in fact different

Page 14: Proportions

Practice Problem

The career development center wants to see if its new ad campaign is working, so they can decide whether or not to fire their intern and use the money elsewhere. They randomly sample 9 MIT courses and send them weekly reminders about the career center. They randomly sample another 9 courses and send no emails. They then record the mean student visits per sample.

The average visits in courses with no ads is 135 (SD=110) per term

The average visits in courses with ads is 405 (SD=135) per term

Page 15: Proportions

Thinking through the problem

The first step is to state the null and alternative hypotheses. Even if you aren't asked for them, it helps you think.
     Ho: the ads have had no effect on visitations to the career center
     Ha: the ads have increased visitations to the career center

What you're conceptualizing here is the difference between the average visitations: Mu(experiment) - Mu(control) = d (difference)

Page 16: Proportions

Perform calculations (that we already know how to do)

The mean and SD were already given:

The average visits in courses with no ads is 135 (SD=110) per term

The average visits in courses with ads is 405 (SD=135) per term

Get the standard error: SD / sqrt(n)
     No ads: 110 / sqrt(9) = 36.667
     Ads: 135 / sqrt(9) = 45

Page 17: Proportions

New step for difference of means

Calculate the pooled standard error with the following formula:

s.e._d = sqrt(s.e._1^2 + s.e._2^2)

= sqrt(36.667^2 + 45^2)
= 58.05

Page 18: Proportions

New step 2 for difference of means

Get the t score using the pooled standard error s.e._d.

Why? Because the point of the difference of means test is to see the probability that the groups could have been drawn from the same population and the difference is just by chance

The formula for the t score is: t = (x bar 1 - x bar 2) / s.e._d

(135-405) / 58.05 = -4.65

Page 19: Proportions

Look this t score up in the t table

Find the degrees of freedom: (n1+n2-2) = 18-2 = 16
P = .0001
Reject null
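The career center calculation can be reproduced in a few lines. This is a sketch under one stated assumption: the pooled standard error is taken as the square root of the sum of the two squared standard errors, which is the form written out for the proportions example at the end of the deck. The function names are made up for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def difference_t(mean1, se1, mean2, se2):
    """t score for the difference between two sample means."""
    # Assumed pooled form: sqrt(se1^2 + se2^2)
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)
    return (mean1 - mean2) / se_d, se_d

se_no_ads = standard_error(110, 9)   # courses with no ads
se_ads = standard_error(135, 9)      # courses with ads
t, se_d = difference_t(135, se_no_ads, 405, se_ads)
df = 9 + 9 - 2                       # n1 + n2 - 2

print(f"pooled s.e. = {se_d:.2f}, t = {t:.2f}, df = {df}")
```

The sign of t only reflects which mean is subtracted from which; the size of the score is what gets looked up in the table.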

Page 20: Proportions

Can also use a calculator

http://stattrek.com/online-calculator/t-distribution.aspx

Page 21: Proportions


Types of difference tests

Page 22: Proportions

Before we start

We see the word "variance" a lot in this chapter. The variance is just the standard deviation squared.

Page 23: Proportions

General types of difference of means tests

Independent samples: you have two samples that are not paired or matched in any way
     - These samples were obtained using random sampling methods
     - Example: someone at the IRS picks 2 samples from a database of 250 tax returns
     - These samples could have equal or unequal variances (more on that later)

Dependent samples: "before and after test" where each item in one sample is paired with an item in the second sample
     - Example: an agency selects 20 people with low performance scores and has them do a workshop for a month; the same 20 employees are then tested again after the workshop to see if they improve

Page 24: Proportions

Difference of means: Independent samples, unequal variances

If you don't know what type of difference test you're doing, assume it is this one
     - This is the most conservative test and the one you see the most in real life studies

Conservative means it is hard to reject the null hypothesis

Why?
     - The standard error calculations take large differences in sample variances (s^2) into account
     - Sampling error is to blame for unequal variances

Page 25: Proportions

Calculating the degrees of freedom

Use this formula (it produces a smaller df, which makes the test more conservative versus the n1+n2-2 formula):

df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

Remember, the lower the df, the bigger the test statistic needs to be when deciding to reject the null or not

Page 26: Proportions

Calculating degrees of freedom

The formula is useful when the number of cases in each sample is different
     - Or if the number of cases in each sample is small (less than 30)
     - Example: if sample one has 150 cases and sample 2 only has 20

The variances will be different

More conservative tests make it harder to commit a type I error

Page 27: Proportions

Practice Problem: difference of means, independent samples, unequal variances

The president of MIT wants to know if a new technology program for professors has made them use interactive visual aids more in the classroom. He randomly selects 10 courses whose professors received the training, and 8 courses whose professors have not yet taken it.

               Mean use of visual aids per term     s
No course      32.7                                 6.4
Tech course    37.6                                 6.3

Page 28: Proportions

State the null and alternative hypotheses

Ho: the use of visual aids with course = use without course
Ha: the use of visual aids with course > use without course

Page 29: Proportions

Calculate the standard error

               Mean use of visual aids per term     s
No course      32.7                                 6.4
Tech course    37.6                                 6.3

• Use s / sqrt(n)
• No course: 6.4 / sqrt(8) = 2.263
• Course: 6.3 / sqrt(10) = 1.992

Page 30: Proportions

Get the pooled standard error

Use this formula: s.e._d = sqrt(s.e._1^2 + s.e._2^2)

= sqrt(2.263^2 + 1.992^2)
= 3.015

Page 31: Proportions

Get the t score

Use this formula: t = (x bar 1 - x bar 2) / s.e._d

t = (32.7-37.6) / 3.015 = -1.625

Page 32: Proportions

Calculate df

Use this formula: df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

Numerator: [(6.4^2 / 8) + (6.3^2 / 10)]^2 = 82.61
Denominator: 3.745 + 1.750 = 5.495
82.61 / 5.495 = 15.03
Round to 15 (compare to 10+8-2 = 16)

Page 33: Proportions

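Look up in t table

df = 15
t = -1.625
In the t table this falls between the .10 and .05 columns (one-tailed), so p is roughly .06
If the two samples came from the same population, a difference this large would turn up about 6% of the time
At the .05 level we fail to reject the null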
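The whole unequal-variances calculation (standard errors, pooled standard error, t score, and the adjusted df) can be sketched as one function. The df expression is reconstructed from the numerator and denominator shown on the "Calculate df" slide, so treat it as an assumption about the formula the slide displayed; the name `unequal_variance_t` is made up for illustration.

```python
import math

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Difference of means for independent samples with unequal variances."""
    v1 = s1 ** 2 / n1                 # squared standard error, sample 1
    v2 = s2 ** 2 / n2                 # squared standard error, sample 2
    se_d = math.sqrt(v1 + v2)         # pooled standard error
    t = (mean1 - mean2) / se_d
    # Adjusted degrees of freedom (smaller than n1 + n2 - 2 in general)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se_d, df

# Visual aids example: no course (n=8) vs. tech course (n=10)
t, se_d, df = unequal_variance_t(32.7, 6.4, 8, 37.6, 6.3, 10)
print(f"pooled s.e. = {se_d:.3f}, t = {t:.3f}, df = {df:.2f}")
```

Working from unrounded intermediate values, the df comes out just above 15, so rounding to 15 for the table lookup is the same either way.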

Page 34: Proportions

Visualized

Page 35: Proportions

Difference of means, independent samples, equal variances

Less conservative than the test for unequal variances
     - Because it uses the larger n1+n2-2 degrees of freedom, so a smaller t score is enough to reject the null

You determine if two sample variances are equal by using the Levene test
     - We will go over this in our next Stata lab
     - This is a super common task for students to use Stata for

The Levene test gets interpreted as follows:
     - Null: two sample variances are equal
     - Research: two sample variances are unequal
     - The test statistic here is the F statistic

Example: F=87.4 at significance .00 -> reject the null, since a statistic this large would be very unlikely if the two variances were equal

Page 36: Proportions

Steps to solve these types of problems

Once you have the means and s1 and s2, calculate a new "pooled" standard deviation using this formula:

s_d = sqrt( ((n1-1) * s1^2 + (n2-1) * s2^2) / (n1+n2-2) )

This is nothing but a weighted average of the two sample variances, with the square root taken at the end

Page 37: Proportions

Calculate standard error

Convert the standard deviation to standard error using this formula:

s.e. = s_d * sqrt(1/n1 + 1/n2)

Then get the t statistic using (x bar 1 - x bar 2) / s.e.
Then look it up in the t chart

Page 38: Proportions

Example

NOAA scientists sampling fish larvae in New England fisheries have received a bigger budget to buy finer mesh nets for their sampling trips in the spring and in the fall. They believe this net will help them to better survey for fish larvae. Better data means less angry stakeholder assessments. They randomly select 10 boats with old nets and 10 boats with new nets and collect the following information. They want to show that this was money well spent.

            Mean number of larvae     s
Old net     326                       64
New net     526                       64

Page 39: Proportions

State the null and the research hypothesis

Ho: the new net's catch yield = the old net's
Ha: the new net's catch > old net's catch

Page 40: Proportions

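Get the pooled standard deviation

Use this formula: s_d = sqrt( ((n1-1) * s1^2 + (n2-1) * s2^2) / (n1+n2-2) )

Numerator = 9 * 64^2 + 9 * 64^2 = 73728
Denominator = 10+10-2 = 18
73728 / 18 = 4096
S_d = sqrt(4096) = 64

            Mean number of larvae     s
Old net     326                       64
New net     526                       64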

Page 41: Proportions

Get the standard error & t statistic

Use this formula: s.e. = s_d * sqrt(1/n1 + 1/n2)

= 64 * sqrt(1/10 + 1/10) = 28.622
(326-526) / 28.622 = -6.99

            Mean number of larvae     s
Old net     326                       64
New net     526                       64

Page 42: Proportions

Look it up in the chart at 18 df

p < .0005
This means we can reject the null and say with high certainty that the new nets are working.
Let's extrapolate these results to make claims on government spending in general. Just kidding.
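The equal-variances steps for the net example can be sketched as follows. The pooled standard deviation and standard error expressions here are the standard pooled-variance forms; they reproduce the 73728, 64, and 28.622 figures on the slides, but the slide images themselves are not in the transcript, so treat the exact formulas as an assumption. The function name is hypothetical.

```python
import math

def equal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Difference of means for independent samples with equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    s_d = math.sqrt(pooled_var)                 # pooled standard deviation
    se = s_d * math.sqrt(1 / n1 + 1 / n2)       # standard error of the difference
    t = (mean1 - mean2) / se
    return s_d, se, t, df

# Old nets vs. new nets, 10 boats each
s_d, se, t, df = equal_variance_t(326, 64, 10, 526, 64, 10)
print(f"pooled s = {s_d:.1f}, s.e. = {se:.3f}, t = {t:.2f}, df = {df}")
```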

Page 43: Proportions

Difference of means tests: Dependent Samples

This is the "before and after" test where the befores are paired with the afters

Example: The IRS implemented a training program to reduce the time it takes to process an organization's tax exempt status. They took the following data from 10 different regional offices before and after the program:

Page 44: Proportions

The remaining steps to solve ARE ALL PERFORMED ON THE D COLUMN

Your results and the statistical inference you make are on the difference

Page 45: Proportions

Step 1: get the standard error

Mean = 4.07
S = 4.56
s.e. = 4.56 / sqrt(10) = 1.44

Page 46: Proportions

Get the t score

(4.07-0) / 1.44 = 2.83

Use 0 because you are seeing if there is a difference between the difference you found (d) and no difference at all (0)

Look this up in the t table at df=9, or use a stat calculator
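The dependent-samples steps can be sketched in code. The office-level data table is not in the transcript, so the example call uses only the summary figures from the slides (mean 4.07, s 4.56, n 10). The helper `summarize_differences` is hypothetical and assumes D is computed as before minus after; it shows how the D column would be summarized if the raw pairs were available.

```python
import math
import statistics

def summarize_differences(before, after):
    """Build the D column (assumed: before minus after); return its mean, s, and n."""
    d = [b - a for b, a in zip(before, after)]
    return statistics.mean(d), statistics.stdev(d), len(d)

def paired_t(mean_d, s_d, n):
    """t score for a mean difference tested against 0 (no difference)."""
    se = s_d / math.sqrt(n)           # Step 1: standard error of D
    t = (mean_d - 0) / se             # compare the mean difference with 0
    return se, t, n - 1               # df for paired samples is n - 1

# Summary figures from the slides
se, t, df = paired_t(4.07, 4.56, 10)
print(f"s.e. = {se:.2f}, t = {t:.2f}, df = {df}")
```

Because the script keeps the unrounded standard error, its t score can differ in the second decimal from a hand calculation that rounds the standard error to 1.44 first.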

Page 47: Proportions

Difference of proportions

• The t test can be used for the difference of 2 sample proportions in the same way that it can be used for differences between sample means

Page 48: Proportions

Practice Problem

DUSP wants to know if math camp is working for its new MCP admits. 65% of the incoming class was put through the program. 80 students are sampled from the math camp group; 65 of them passed quant. 40 MCPs were sampled from the group that was exempted from math camp. Of those, 29 passed quant. Does math camp work?

Calculate the proportions:
     Math camp: 65/80 = 81.25% pass
     No math camp: 29/40 = 72.5% pass

Page 49: Proportions

Calculate the s

s = sqrt(p * (1-p))
Those who did math camp: sqrt(.8125 * .1875) = .390
Those who did not do math camp: sqrt(.725 * .275) = .447

Page 50: Proportions

Get the s.e.

s / sqrt(n)
Those who did math camp: .390 / sqrt(80) = .0436
Those who did not do math camp: .447 / sqrt(40) = .0707

Page 51: Proportions

Get the pooled s.e.

Use this formula: pooled s.e. = sqrt(s.e._1^2 + s.e._2^2)

= sqrt(.0436^2 + .0707^2)
= 0.083

Page 52: Proportions

Get the t score

Use this formula: (proportion 1 - proportion 2) / pooled s.e.

(0.8125-0.725) / 0.083 = 1.05

Look it up in the infinity df row of the t table, because the sample sizes are both over 30: p is about .15

There is roughly a .15 chance of seeing a difference this large if the two samples were drawn from the same population
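The math camp problem can be worked end to end in code. This sketch carries the unrounded proportions (65/80 and 29/40) through every step, so its figures may differ slightly from hand calculations that round along the way. The one-tailed probability comes from the normal curve (the "infinity df" row of the t table), computed here with `math.erfc`; the function names are made up for illustration.

```python
import math

def proportion_se(successes, n):
    """Return the sample proportion and its standard error."""
    p = successes / n
    s = math.sqrt(p * (1 - p))        # s = sqrt(p * (1 - p))
    return p, s / math.sqrt(n)        # s.e. = s / sqrt(n)

def difference_of_proportions(x1, n1, x2, n2):
    """t score and one-tailed normal-curve probability for p1 - p2."""
    p1, se1 = proportion_se(x1, n1)
    p2, se2 = proportion_se(x2, n2)
    se_d = math.sqrt(se1 ** 2 + se2 ** 2)       # pooled s.e.
    t = (p1 - p2) / se_d
    # Upper-tail area of the standard normal curve beyond t
    p_value = 0.5 * math.erfc(t / math.sqrt(2))
    return p1, p2, se_d, t, p_value

# Math camp: 65 of 80 passed; no math camp: 29 of 40 passed
p1, p2, se_d, t, p_value = difference_of_proportions(65, 80, 29, 40)
print(f"p1 = {p1:.4f}, p2 = {p2:.3f}, pooled s.e. = {se_d:.3f}, "
      f"t = {t:.2f}, one-tailed p = {p_value:.2f}")
```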