math 533 part b project


Upload: tasha-mandley

Post on 07-Nov-2014


DESCRIPTION

Statistic paper to be used as a guide. No information should be copied.

TRANSCRIPT

Page 1: Math 533 Part b Project

Brief Introduction: AJ Davis is a department store chain that has many credit customers and wants to find out more information about these customers. AJ Davis has compiled a sample of 50 credit customers with data collected on the following variables: Location, Income (in $1,000’s), Size (number of people living in the household), Years (number of years the customer has lived in the current location), and Credit Balance (the customer’s current credit card balance on the store’s credit card, in $).

The manager at AJ Davis has speculated the following:

a. The average (mean) annual income was less than $50,000.

b. The true population proportion of customers who live in an urban area exceeds 40%.

c. The average (mean) number of years lived in the current home is less than 13 years.

d. The average (mean) credit balance for suburban customers is more than $4300.

I will analyze each of the claims listed above by performing a hypothesis test (using the Seven elements of a Test of Hypothesis with α = 0.05) to see whether there is evidence to support my manager’s beliefs in each case (a-d). For each test I will explain my conclusion in simple terms and compute and interpret the p-value. I will then compute and interpret 95% confidence intervals for each of the variables described in a. to d. This paper also includes an Appendix with all the steps in hypothesis testing, as well as the confidence intervals and Minitab output.

In order to understand how hypothesis testing is done it is important that you know the elements of the Test of Hypothesis, and what each step means.

The Seven elements of a Test of Hypothesis are:

1. Null hypothesis (H0) - A theory about the specific values of one or more population parameters. The theory generally represents the status quo, and we accept it until proven false.

2. Alternative (research) hypothesis (Ha) - A theory that contradicts the null hypothesis. The theory generally represents what we will accept only when sufficient evidence exists to establish its truth.

3. Test statistic - A sample statistic used to decide whether to reject the null hypothesis.

4. Rejection Region - The numerical values of the test statistic for which the null hypothesis will be rejected.

5. Assumptions- Clear statements of any assumptions made about the populations being sampled.


6. Experiment and calculation of test statistics- Performance of the sampling experiment and determination of the numerical value of the test statistic.

7. Conclusion-

a. If the numerical value of the test statistic falls in the rejection region, then we reject the null hypothesis and conclude that the alternative hypothesis is true.

b. If the test statistic does not fall in the rejection region, then we do not reject H0, as we have insufficient evidence to do so.
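As a rough illustration of element 4, the rejection region for a z test can be looked up from the standard normal distribution. The sketch below is not part of the Minitab work in this paper; the function name `rejection_region` and its `tail` argument are hypothetical names chosen for this example, and it uses only Python's standard library.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Return the critical z value(s) for a z test at significance level alpha.

    tail is "lower" (Ha uses <), "upper" (Ha uses >) or "two-sided" (Ha uses !=).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two-sided":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two-sided'")


lower_crit = rejection_region(0.05, "lower")      # reject H0 when z is below this value
upper_crit = rejection_region(0.05, "upper")      # reject H0 when z is above this value
two_sided_crit = rejection_region(0.05, "two-sided")
print(lower_crit, upper_crit, two_sided_crit)
```

For α = 0.05 the one-tailed critical values round to -1.645 and 1.645, which are the cutoffs used in the tests that follow.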


a. The average (mean) annual income was less than $50,000

I found the average annual income to be 43.74, or $43,740, and the standard deviation to be 14.64, or $14,640.

Set up Hypothesis Test

o Ho: µ = 50

o H1: µ < 50

For α = 0.05 and “<” in the Ha, I found that z = -1.645, so the “Rejection Region” would be z < -1.645.

Next I calculated the test statistic z using the formula below.

\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \]

where μ0 is the mean in the null hypothesis and σ_x̄ ≈ s/√n.

\[ z = \frac{43.74 - 50}{2.0704} = -3.02, \quad \text{because } \sigma_{\bar{x}} \approx \frac{14.64}{\sqrt{50}} = \frac{14.64}{7.0711} = 2.0704 \]

The p-value = 0.001. Comparing the p-value to alpha is a complementary and equally valid way to evaluate the null and alternative hypotheses: if the p-value is less than alpha, we reject the null hypothesis and accept the alternative hypothesis at the given alpha. The test statistic method and the p-value method always give the same reject or do-not-reject result.

Because the p-value = 0.001 is less than alpha = 0.05: we reject the null hypothesis H0: μ=50 and we accept the alternative hypothesis Ha: μ<50, at α=0.05.

My calculated test statistic of -3.02 falls in the rejection region of z < -1.645; therefore, I would reject the null hypothesis and say there is sufficient evidence to indicate µ < 50, or $50,000.
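The arithmetic for this lower-tailed test can be double-checked with a few lines of Python. This is only a sketch using the standard library, not the Minitab procedure used in the Appendix; the function name `z_test_mean_lower` is made up for this example, and the sample standard deviation s is used in place of σ, as in the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean_lower(sample_mean, mu0, s, n):
    """Lower-tailed one-sample z test (Ha: mu < mu0).

    Returns the test statistic z and its p-value P(Z <= z).
    """
    se = s / sqrt(n)                 # estimated standard error of the mean
    z = (sample_mean - mu0) / se     # test statistic
    p_value = NormalDist().cdf(z)    # lower-tail area under the standard normal curve
    return z, p_value


# Income data from the sample: mean 43.74, standard deviation 14.64, n = 50
z_income, p_income = z_test_mean_lower(43.74, 50, 14.64, 50)
print(round(z_income, 2), round(p_income, 3))  # -3.02 0.001
```

Both the test statistic (below -1.645) and the p-value (below 0.05) lead to rejecting H0.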


b. The true population proportion of customers who live in an urban area exceeds 40%

22 of the 50 surveyed live in the Urban area, which is 44% or 0.44, this is the point estimate for p.

Therefore my hypotheses would be:

o Ho: p = 0.40 vs. Ha: p > 0.40

In order to conduct the large sample z-test, we first need to verify that the sample size is large enough.

o np0 = 50(0.40) = 20 and n(1 - p0) = 50(1 - 0.40) = 30. Both are larger than 15, so we can conclude that the sample size is large enough to apply the large sample z test.

z = (0.44 – 0.40)/0.069282 = 0.58, where σ_p̂ = sqrt((0.40)(0.60)/50) = 0.069282

This is a one-tailed test (upper or right tail, since Ha has “>”). Our rejection region would be z > 1.645.

0.58 is not greater than 1.645 (and is not in the rejection region), so we would not reject the Ho.

The p-value = 0.282. Comparing the p-value to alpha is a complementary and equally valid way to evaluate the null and alternative hypotheses: if the p-value is less than alpha, we reject the null hypothesis and accept the alternative hypothesis at the given alpha. The test statistic method and the p-value method always give the same reject or do-not-reject result.

Because the p-value = 0.282 is more than alpha = 0.05, we do not reject the null hypothesis H0: p = 0.40 and we do not accept the alternative hypothesis Ha: p > 0.40, at α = 0.05.

Since we are not rejecting the Ho, we are saying there is insufficient evidence to conclude that the true population proportion of customers who live in an urban area is greater than 40%.
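The large-sample check and the test for the proportion can be reproduced in a short sketch. The function name `z_test_proportion_upper` and the `min_count` parameter are hypothetical names for this example; the cutoff of 15 is the rule used in the text above, and the standard error is computed under H0 (using p0 rather than the sample proportion).

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion_upper(successes, n, p0, min_count=15):
    """Upper-tailed large-sample z test for a proportion (Ha: p > p0).

    Returns the sample proportion, the test statistic z and the p-value P(Z >= z).
    """
    # Large-sample condition from the text: n*p0 and n*(1 - p0) must both be at least 15
    if n * p0 < min_count or n * (1 - p0) < min_count:
        raise ValueError("sample too small for the large-sample z test")
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)        # standard error assuming H0 is true
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)    # upper-tail area
    return p_hat, z, p_value


# 22 of the 50 sampled customers live in an urban area; H0: p = 0.40
p_hat, z_urban, p_urban = z_test_proportion_upper(22, 50, 0.40)
print(p_hat, round(z_urban, 2), round(p_urban, 3))  # 0.44 0.58 0.282
```

Since z is below 1.645 and the p-value is above 0.05, H0 is not rejected.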


c. The average (mean) number of years lived in the current home is less than 13 years.

o From the survey data, I found the average number of years in the current home to be 12.260, and the standard deviation to be 5.086.

o Set up Hypothesis Test

Ho: µ = 13

H1: µ < 13

For α = 0.05 and “<” in the Ha, I found that z = -1.645, so the “Rejection Region” would be z < -1.645.

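Now I calculate the test statistic

\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \]

where μ0 is the mean in the null hypothesis and σ_x̄ ≈ s/√n.

\[ z = \frac{12.26 - 13}{0.7193} = -1.03, \quad \text{because } \sigma_{\bar{x}} \approx \frac{5.086}{\sqrt{50}} = 0.7193 \]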

Because the p-value = 0.152 is more than alpha = 0.05: we do not reject the null hypothesis H0: μ=13 and we do not accept the alternative hypothesis Ha: μ<13, at α=.05.

My calculated test statistic of -1.03 does not fall in the rejection region of z < -1.645; therefore, we would not reject the null hypothesis and say there is insufficient evidence to indicate µ < 13.


d. The average (mean) credit balance for suburban customers is more than $4300.

o I found the average credit balance for those surveyed is $3970, and the standard deviation is $932.

o Set up Hypothesis Test

Ho: µ = 4300

H1: µ > 4300

For α = 0.05 and “>” in the Ha, I found z = 1.645, so the Rejection Region would be z > 1.645.

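Now I calculate the test statistic

\[ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} \]

where μ0 is the mean in the null hypothesis and σ_x̄ ≈ s/√n.

\[ z = \frac{3970 - 4300}{131.8} = -2.50, \quad \text{because } \sigma_{\bar{x}} \approx \frac{932}{\sqrt{50}} = 131.8 \]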

The p-value = 0.994. Comparing the p-value to alpha is a complementary and equally valid way to evaluate the null and alternative hypotheses: if the p-value is less than alpha, we reject the null hypothesis and accept the alternative hypothesis at the given alpha. The test statistic method and the p-value method always give the same reject or do-not-reject result.

Because the p-value = 0.994 is not less than alpha = .05: we do not reject the null hypothesis H0: μ=4300 and we do not accept the alternative hypothesis Ha: μ>4300 at α=.05.

My calculated test statistic of -2.50 does not fall in the rejection region of z > 1.645; therefore, I would NOT reject the null hypothesis and say there is insufficient evidence to indicate µ > 4300.
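It may help to see why the p-value here is so close to 1. For an upper-tailed test the p-value is the area to the right of the test statistic, and Φ below denotes the standard normal cumulative distribution function (a symbol introduced for this note, not used elsewhere in the paper):

```latex
\[
p\text{-value} = P(Z > -2.50) = 1 - \Phi(-2.50) = \Phi(2.50) \approx 0.994
\]
```

Because the sample mean of $3970 is below the hypothesized $4300, almost the entire curve lies to the right of z = -2.50, so the data give no support to the claim that the mean is more than $4300.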


Appendix

2) Follow this up with computing 95% confidence intervals for each of the variables described in a. - d., and again interpreting these intervals.

a. The average (mean) annual income was less than $50,000

One-Sample Z: Income ($1000)

The assumed standard deviation = 14.64

Variable         N   Mean   StDev  SE Mean  95% CI
Income ($1000)   50  43.74  14.64  2.07     (39.68, 47.80)

Conclusion: According to the confidence interval, we are 95% confident that the true mean income lies between $39,680 and $47,800.
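The interval in the Minitab output can be checked by hand as mean ± z* × (s/√n). The following sketch does this with Python's standard library; the function name `z_interval_mean` is hypothetical and chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(sample_mean, s, n, confidence=0.95):
    """Two-sided z confidence interval for a mean: sample_mean +/- z* * s / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Income data (in $1000): mean 43.74, standard deviation 14.64, n = 50
income_low, income_high = z_interval_mean(43.74, 14.64, 50)
print(f"({income_low:.2f}, {income_high:.2f})")  # (39.68, 47.80)
```

The whole interval lies below 50, which is consistent with rejecting H0: µ = 50 in part a.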

b. The true population proportion of customers who live in an urban area exceeds 40%

Sample  X   N   Sample p  95% CI                Z-Value  P-Value
1       22  50  0.440000  (0.302411, 0.577589)  0.58     0.564

Conclusion: According to the confidence interval, we are 95% confident that the true population proportion of customers who live in an urban area lies between 0.302 and 0.578.
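The proportion interval can be checked the same way. Unlike the hypothesis test, which uses p0 in the standard error, the interval uses the sample proportion. This is a sketch of the usual large-sample (normal approximation) interval, which matches the numbers Minitab reported here; the function name `z_interval_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_proportion(successes, n, confidence=0.95):
    """Large-sample confidence interval: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)          # uses p_hat, not p0
    return p_hat - margin, p_hat + margin


# 22 of the 50 sampled customers live in an urban area
urban_low, urban_high = z_interval_proportion(22, 50)
print(f"({urban_low:.6f}, {urban_high:.6f})")  # (0.302411, 0.577589)
```

The interval contains 0.40, which is consistent with not rejecting H0: p = 0.40 in part b.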

c. The average (mean) number of years lived in the current home is less than 13 years

One-Sample Z: Years

The assumed standard deviation = 5.086

Variable  N   Mean    StDev  SE Mean  95% CI
Years     50  12.260  5.086  0.719    (10.850, 13.670)

Conclusion: According to the confidence interval, we are 95% confident that the true mean number of years customers have lived in their current homes lies between 10.85 and 13.67 years.

d. The average (mean) credit balance for suburban customers is more than $4300

One-Sample Z: Credit Balance($)

The assumed standard deviation = 932

Variable           N   Mean  StDev  SE Mean  95% CI
Credit Balance($)  50  3970  932    132      (3712, 4229)


Conclusion: We are 95% confident that the true mean credit balance lies between $3,712 and $4,229.

Minitab calculations for first part of Part B Project

a. The average (mean) annual income was less than $50,000

Descriptive Statistics: Income ($1000)

Variable         Mean   StDev  Minimum  Maximum
Income ($1000)   43.74  14.64  21.00    67.00

One-Sample Z

Test of mu = 50 vs < 50
The assumed standard deviation = 14.64

N   Mean   SE Mean  95% Upper Bound  Z      P
50  43.74  2.07     47.15            -3.02  0.001

[Distribution Plot: Normal, Mean=0, StDev=1. Density versus X, with a tail area of 0.05 marked at -1.645.]

b. The true population proportion of customers who live in an urban area exceeds 40%.

Location  Count  Percent
Rural     13     26.00
Suburban  15     30.00
Urban     22     44.00
N = 50


Test and CI for One Proportion

Test of p = 0.4 vs p > 0.4

Sample  X   N   Sample p  95% Lower Bound  Z-Value  P-Value
1       22  50  0.440000  0.324532         0.58     0.282

Test and CI for One Proportion

Sample  X   N   Sample p  95% CI
1       22  50  0.440000  (0.302411, 0.577589)

c. The average (mean) number of years lived in the current home is less than 13 years

Descriptive Statistics: Years

Variable  Mean    StDev  Minimum  Maximum
Years     12.260  5.086  1.000    20.000

One-Sample Z: Years

Test of mu = 13 vs < 13
The assumed standard deviation = 5.086

Variable  N   Mean    StDev  SE Mean  95% Upper Bound  Z      P
Years     50  12.260  5.086  0.719    13.443           -1.03  0.152

[Distribution Plot: Normal, Mean=0, StDev=1. Density versus X, with a tail area of 0.05 marked at -1.645.]


d. The average (mean) credit balance for suburban customers is more than $4300

Descriptive Statistics: Credit Balance($)

Variable           Mean  StDev  Minimum  Maximum
Credit Balance($)  3970  932    1864     5678

One-Sample Z: Credit Balance($)

Test of mu = 4300 vs > 4300
The assumed standard deviation = 932

Variable           N   Mean  StDev  SE Mean  95% Lower Bound  Z      P
Credit Balance($)  50  3970  932    132      3754             -2.50  0.994

[Distribution Plot: Normal, Mean=0, StDev=1. Density versus X, with a tail area of 0.05 marked at 1.645.]
