proportions

+

Proportions

Estimating population proportions
Difference of proportions

+Review: Setting up a c.i. around a proportion
1. Estimate the proportion
2. Take the SD with this formula: s = sqrt(p * (1-p))
3. Find the s.e. with this formula: s / sqrt(n)
4. Set up the confidence interval with this formula: proportion plus or minus t * s.e.

+Example
A teacher offers extra tutoring in math. He takes a sample of 100 students who went through his tutoring sessions and finds that 73 started to get higher marks. He wants to figure out the 95% confidence limits for his survey before he goes to the principal and tells her the program was successful.

+Step 1-2
Step 1: Estimate the population proportion: p = 0.73 start to perform better in math
Step 2: Get the sample standard deviation using this formula: s = sqrt(p * (1-p)) = sqrt(0.73 * 0.27) = 0.444

+Step 3
Step 3: Use this to find the standard error:

s.e. = s / sqrt(n) = 0.444 / sqrt(100) = 0.044

+Build confidence interval

Step 4: What are the 95% confidence limits of the proportion? Since n is bigger than 30, the normal curve can be used (t = 1.96). Set up a confidence interval using this formula: proportion plus or minus t * s.e.

= 0.73 + or - 1.96 * 0.0444
= 0.73 + or - 0.087
= 0.64 to 0.82
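The four steps above can be sketched in Python. This is a minimal sketch, not part of the original slides: the function name `proportion_ci` is an illustrative assumption, and the default multiplier of 1.96 assumes a large sample (n bigger than 30) so the normal curve applies, as in the teacher example.

```python
import math

def proportion_ci(successes, n, t=1.96):
    """Confidence interval for a proportion, following steps 1-4.

    t defaults to 1.96 (95% limits on the normal curve); assumes n > 30.
    """
    p = successes / n              # step 1: estimate the proportion
    s = math.sqrt(p * (1 - p))     # step 2: standard deviation
    se = s / math.sqrt(n)          # step 3: standard error
    margin = t * se                # step 4: plus or minus t * s.e.
    return p - margin, p + margin

# Teacher example: 73 of 100 tutored students improved
low, high = proportion_ci(73, 100)
print(round(low, 2), round(high, 2))
```

For other confidence levels, pass a different `t` (for example, the value read from a t table).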

+

Testing the difference between 2 groups
Chapter 14

+Difference of means test
Used if someone wants to know if two sample means or proportions are statistically different
- Could both sample means have been drawn from the same population (with the difference attributed to chance alone)?
- Or are they so different that there is no way they could have been drawn from the same population?

+Vs. what we’ve already done

- So far, we have used single samples: we wanted to see if a single sample with a particular mean could have been drawn from a population with a known or hypothesized mean
- Now we use 2 samples

+A sample problem to walk through the logic of these types of problems
A veterans support agency offers continuing education seminars for veterans. They want to evaluate the effect the seminars have on job placement. The agency has half the veterans take the seminars, and the other half does not. They randomly select 50 who have done the seminars and 50 who have not. They want to evaluate whether the job placement rates are different.

+Formulate a research and a null hypothesis

The research hypothesis: tests whether one of the sample means is larger or smaller than the other sample mean (difference of means)

The null hypothesis: when you fail to reject it, you’re saying that the population means in question are not different (e.g. an after-school reading program didn’t raise mean scores)

+For this example:
H_A: Veterans who attend the seminar will have higher job placement rates

H_0: Veterans who attend the seminar and those who do not attend the seminar will show no difference in job placement rates

+What does it mean to reject the null?

- If we fail to reject the null, this is like saying the means of the two populations show no difference
  - i.e. the population mean is the same and the seminars don’t lead to higher job placement
- If we reject the null, the conclusion is that the means were different and job placement rates are in fact different

+Practice Problem
The career development center wants to see if its new ad campaign is working, so it can decide whether or not to fire its intern and use the money elsewhere. They randomly sample 9 MIT courses and send them weekly reminders about the career center. They randomly sample another 9 courses and send no emails. They then record the mean student visits per sample.

The average visits in courses with no ads is 135 (SD=110) per term

The average visits in courses with ads is 405 (SD=135) per term

+Thinking through the problem

The first step is to state the null and alternative hypotheses. Even if you aren’t asked for them, it helps you think.
Ho: the ads have had no effect on visitations to the career center
Ha: the ads have increased visitations to the career center

What you’re conceptualizing here is the difference between the average visitations: Mu(experiment) - Mu(control) = d (difference)

+Perform calculations (that we already know how to do)
The mean and SD were already given:

The average visits in courses with no ads is 135 (SD=110) per term

The average visits in courses with ads is 405 (SD=135) per term

Get the standard error: SD / sqrt(n)
Courses with no ads: 110 / sqrt(9) = 36.667
Courses with ads: 135 / sqrt(9) = 45

+New step for difference of means
Calculate the pooled standard error with the following formula:

s.e._d = sqrt(s.e._1^2 + s.e._2^2) = sqrt(36.667^2 + 45^2) = 58.04

+New step 2 for difference of means
Get the t score using the pooled standard error s.e._d.

Why? Because the point of the difference of means test is to find the probability that the groups could have been drawn from the same population and the difference is just by chance

The formula for the t score is:

t = (x bar 1 - x bar 2) / s.e._d = (135-405)/58.04 = -4.65

+Look this t score up in the t table
Find the degrees of freedom: (n1+n2-2) = 18-2 = 16
P=.0001
Reject null
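The career center calculation can be checked with a short Python sketch. The function name `difference_of_means_t` is an illustrative assumption, not something from the slides; it combines the two standard errors as the square root of the sum of their squares and uses n1+n2-2 degrees of freedom. Because it carries full precision, the standard error comes out at roughly 58.05 rather than the hand-rounded 58.04.

```python
import math

def difference_of_means_t(mean1, sd1, n1, mean2, sd2, n2):
    """t score for two independent samples from their means, SDs, and sizes."""
    se1 = sd1 / math.sqrt(n1)            # standard error of sample 1
    se2 = sd2 / math.sqrt(n2)            # standard error of sample 2
    se_d = math.sqrt(se1**2 + se2**2)    # standard error of the difference
    t = (mean1 - mean2) / se_d           # t score for the difference
    df = n1 + n2 - 2                     # degrees of freedom
    return se_d, t, df

# No ads: mean 135, SD 110, n 9; ads: mean 405, SD 135, n 9
se_d, t, df = difference_of_means_t(135, 110, 9, 405, 135, 9)
print(round(se_d, 2), round(t, 2), df)
```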

+Can also use a calculator

http://stattrek.com/online-calculator/t-distribution.aspx



+

Types of difference tests

+Before we start

We see the word “variance” a lot in this chapter
The variance is just the standard deviation squared

+General types of difference of means tests
Independent samples: you have two samples that are not paired or matched in any way
- These samples were obtained using random sampling methods
- Example: someone at the IRS picks 2 samples from a database of 250 tax returns
- These samples could have equal or unequal variances (more on that later)

Dependent samples: “before and after test” where each item in one sample is paired with an item in the second sample
- Example: an agency selects 20 people with low performance scores and has them do a workshop for a month; the same 20 employees are then tested again after the workshop to see if they improve

+Difference of means: Independent samples, unequal variances

If you don’t know what type of difference test you’re doing, assume it is this one
- This is the most conservative test and the one you see the most in real life studies
- Conservative means it is hard to reject the null hypothesis

Why? The standard error calculations take large differences in sample variances (s^2) into account
- Sampling error is to blame for unequal variances

+Calculating the degrees of freedom
Use this formula (it produces a smaller df, which makes the test more conservative than the n1+n2-2 formula):

df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

Remember, the lower the df, the bigger the test statistic needs to be to reject the null

+Calculating degrees of freedom

The formula is useful when the number of cases in each sample is different
- Or if the number of cases in each sample is small (less than 30)
- Example: if sample one has 150 cases and sample 2 only has 20

The variances will be different

More conservative tests make it harder to commit a type I error

+Practice Problem: difference of means, independent samples, unequal variances

The president of MIT wants to know if a new technology program for professors has made them use interactive visual aids more in the classroom. He randomly selects 10 courses where the professors received the training, and 8 courses where the professors have not yet taken the course.

              Mean use of visual aids per term    s
No course     32.7                                6.4
Tech course   37.6                                6.3

+State the null and alternative hypotheses
Ho: the use of visual aids with course = use without course
Ha: the use of visual aids with course > use without course

+Calculate the standard error

              Mean use of visual aids per term    s
No course     32.7                                6.4
Tech course   37.6                                6.3

• Use s/sqrt(n)
• No course: 6.4 / sqrt(8) = 2.262
• Course: 6.3 / sqrt(10) = 1.99

+Get the pooled standard error

Use this formula:

s.e._d = sqrt(s.e._1^2 + s.e._2^2) = sqrt(5.12 + 3.969) = 3.015

+Get the t score

Use this formula: t = (x bar 1 - x bar 2) / s.e._d

t = (32.7-37.6) / 3.015 = -1.625

+Calculate df

Use this formula:

df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) ]

Numerator: [(6.4^2 / 8) + (6.3^2 / 10)]^2 = 82
Denominator: 3.74 + 1.75
82 / (3.74 + 1.75) = 14.93
Round to 15 (compare to 10+8-2 = 16)
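The standard error, t score, and degrees of freedom for the unequal variances case can be computed together. The sketch below is an illustration, not the slides' own code: the function name `welch_t_and_df` is an assumption, and the df expression is the one implied by the numerator and denominator worked out above (each squared term divided by n-1). With no intermediate rounding the df comes out near 15.03 rather than 14.93, which still rounds to 15.

```python
import math

def welch_t_and_df(mean1, s1, n1, mean2, s2, n2):
    """t score and degrees of freedom for independent samples, unequal variances."""
    v1 = s1**2 / n1                      # squared standard error, sample 1
    v2 = s2**2 / n2                      # squared standard error, sample 2
    se_d = math.sqrt(v1 + v2)            # standard error of the difference
    t = (mean1 - mean2) / se_d
    # Degrees of freedom: smaller than n1 + n2 - 2 when variances or sizes differ
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se_d, t, df

# No course: mean 32.7, s 6.4, n 8; tech course: mean 37.6, s 6.3, n 10
se_d, t, df = welch_t_and_df(32.7, 6.4, 8, 37.6, 6.3, 10)
print(round(se_d, 3), round(t, 3), round(df, 2))
```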

+Look up in t table

df = 15
t = -1.625
In the t table this falls between the .10 and .05 columns (one-tailed), so p is roughly .06
~6% chance these samples were taken from the same population

+Visualized

+Difference of means, independent samples, equal variances
Less conservative than the test for unequal variances

Because this test uses the full n1+n2-2 degrees of freedom, a smaller t score is enough to reject the null

You determine if two sample variances are equal by using the Levene test
- We will go over this in our next Stata lab
- This is a super common task for students to use Stata for

The Levene test gets interpreted as follows:
- Null: two sample variances are equal
- Research: two sample variances are unequal
- The test statistic here is the F statistic

Example: F=87.4 at significance .00 -> reject the null since the probability the two variances are equal is quite small

+Steps to solve these types of problems
Once you have the means and s1 and s2, calculate a new “pooled” standard deviation using this formula:

s_d = sqrt( [(n1-1)*s1^2 + (n2-1)*s2^2] / (n1+n2-2) )

This is nothing but the square root of a weighted average of the two sample variances

+Calculate standard error

Convert the standard deviation to standard error using this formula:

s.e. = s_d * sqrt(1/n1 + 1/n2)

Then get the t statistic using (x bar 1 - x bar 2)/s.e.
Then look it up in the t chart

+Example
NOAA scientists sampling fish larvae in New England fisheries have received a bigger budget to buy finer mesh nets for their sampling trips in the spring and in the fall. They believe this net will help them to better survey for fish larvae. Better data means less angry stakeholder assessments. They randomly select 10 boats with old nets and 10 boats with new nets and collect the following information. They want to show that this was money well spent.

           Mean number of larvae    s
Old net    326                      64
New net    526                      64

+State the null and the research hypothesis
Ho: the new net’s catch yield = the old net’s
Ha: the new net’s catch > old net’s catch

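+Get the pooled standard deviation
Use this formula:

s_d = sqrt( [(n1-1)*s1^2 + (n2-1)*s2^2] / (n1+n2-2) )

Numerator = 9*64^2 + 9*64^2 = 73728
Denominator = 10+10-2 = 18
73728 / 18 = 4096
S_d = sqrt(4096) = 64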




+Get the standard error & t statistic
Use this formula:

s.e. = s_d * sqrt(1/n1 + 1/n2) = 64 * sqrt(1/10 + 1/10) = 28.622

t = (326-526)/28.622 = -6.99
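The pooled standard deviation, standard error, and t statistic for the net example can be reproduced with a few lines of Python. This is a sketch under the equal variances assumption; the function name `pooled_t` is illustrative and does not come from the slides.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic for independent samples with equal variances."""
    df = n1 + n2 - 2
    # Pooled SD: square root of the weighted average of the sample variances
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    # Convert the pooled SD to the standard error of the difference
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se
    return s_pooled, se, t, df

# Old net: mean 326, s 64, n 10; new net: mean 526, s 64, n 10
s_pooled, se, t, df = pooled_t(326, 64, 10, 526, 64, 10)
print(round(s_pooled, 3), round(se, 3), round(t, 2), df)
```

When both samples have the same size and the same s, as here, the pooled standard deviation is simply that shared s.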




+Look it up in the chart at 18 df
p<.0005
This means we can reject the null and say with high certainty that the new nets are working.
Let’s extrapolate these results to make claims on government spending in general. Just kidding.

+Difference of means tests: Dependent Samples
This is the “before and after” test where the befores are paired with the afters

Example: The IRS implemented a training program to reduce the time it takes to process an organization’s tax exempt status. They took the following data from 10 different regional offices before and after the program:

+The remaining steps to solve ARE ALL PERFORMED ON THE D COLUMN

Your results and the statistical inference you make are on the difference

+Step 1: get the standard error

Mean=4.07
S=4.56
s.e. = 4.56 / sqrt(10) = 1.44

+Get the t score

t = (4.07-0) / 1.44 = 2.83
Use 0 because you are seeing if there is a difference between the difference you found (d) and no difference at all (0)

Look this up in the t table at df=9, or use a stat calculator
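The dependent samples steps can be sketched in Python. Both function names are illustrative assumptions. `paired_t_from_summary` works from the mean and standard deviation of the D column, which is all the slides report; `paired_t` builds the D column from raw before and after lists, and it assumes the difference is taken as before minus after. With the unrounded standard error the t score is about 2.82, slightly below the 2.83 obtained from the rounded 1.44.

```python
import math
import statistics

def paired_t_from_summary(mean_d, s_d, n):
    """t score for paired samples from the mean and SD of the differences."""
    se = s_d / math.sqrt(n)        # standard error of the mean difference
    t = (mean_d - 0) / se          # compare the mean difference to 0
    return se, t, n - 1            # df = number of pairs - 1

def paired_t(before, after):
    """Same test from raw paired observations (assumes d = before - after)."""
    d = [b - a for b, a in zip(before, after)]
    return paired_t_from_summary(statistics.mean(d), statistics.stdev(d), len(d))

# Summary values for the D column in the IRS example
se, t, df = paired_t_from_summary(4.07, 4.56, 10)
print(round(se, 2), round(t, 2), df)
```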

+Difference of proportions

• The t test can be used for the difference of 2 sample proportions in the same way that it can be used for differences between sample means

+Practice Problem

DUSP wants to know if math camp is working for its new MCP admits. 65% of the incoming class was put through the program. 80 students are sampled from the math camp group; 65 passed quant. 40 MCPs were sampled from the group that was exempted from math camp. From those, 29 passed quant. Does math camp work?

Calculate the proportions:
Math camp: 81.2% pass
No math camp: 72.5% pass

+Calculate the s

S=sqrt(p*(1-p))
Those who did math camp: sqrt(.81*.19) = .39

Those who did not do math camp: sqrt(.72*.28) = .45

+Get the s.e.

s / sqrt(n)
Those who did math camp: .39 / sqrt(80) = .044
Those who did not do math camp: .45 / sqrt(40) = .071

+Get the pooled s.e.

Use this formula:

s.e._d = sqrt(s.e._1^2 + s.e._2^2) = sqrt(.044^2 + .071^2) = 0.08

+Get the t score

Use this formula: (proportion 1 - proportion 2) / pooled s.e.

(0.81-0.72) / 0.08 = 1.125
Look it up in the infinity df row of the t table because the sample sizes are both over 30
p = ~.13
There is a .13 chance that these two samples were drawn from the same population
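The math camp problem can be worked from the raw counts in Python. This is a sketch, not the slides' own method: the function name `difference_of_proportions` is an assumption, and it reads the probability from the normal curve (`statistics.NormalDist`), which corresponds to the infinity df row of the t table and assumes both samples are larger than 30. Because it uses unrounded proportions (0.8125 and 0.725), the t score comes out near 1.05 and the one-tailed probability a little above .14, close to but not identical to the rounded hand calculation.

```python
import math
from statistics import NormalDist

def difference_of_proportions(x1, n1, x2, n2):
    """t score and one-tailed p for the difference of two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se1 = math.sqrt(p1 * (1 - p1)) / math.sqrt(n1)   # s.e. of proportion 1
    se2 = math.sqrt(p2 * (1 - p2)) / math.sqrt(n2)   # s.e. of proportion 2
    se_d = math.sqrt(se1**2 + se2**2)                # s.e. of the difference
    t = (p1 - p2) / se_d
    # Infinity df: use the normal curve for the one-tailed probability
    p_one_tailed = 1 - NormalDist().cdf(t)
    return se_d, t, p_one_tailed

# Math camp: 65 of 80 passed; no math camp: 29 of 40 passed
se_d, t, p = difference_of_proportions(65, 80, 29, 40)
print(round(se_d, 3), round(t, 2), round(p, 2))
```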
