stat101 assignment final actual

8/8/2019 STAT101 Assignment Final Actual


Peter Greer 46046307pjg102

STAT101 Assignment #3

Question 1

As each sample provided in the PULSE.xls file contained more than 30 observations (n = 40), the sample means were assumed to be normally distributed in accordance with the Central Limit Theorem.

Microsoft Excel was used to find the mean pulse rate and the standard deviation for males and females, using the SUM and STDEV functions as follows (where n = 40):

EXCEL FORMULAS:
MEAN: = SUM(number_1:number_n) / n
STANDARD DEVIATION: = STDEV(number_1:number_n)

The values returned were as follows:

MALES:
MEAN: 69.4
STANDARD DEVIATION: 11.30 (2dp)

FEMALES:
MEAN: 76.3
STANDARD DEVIATION: 12.50 (2dp)

a. To construct a 95% confidence interval for the population mean pulse rate for males using the sample provided, a critical t-value must first be found using the degrees of freedom (df = n - 1) and the alpha value (α = 1 - (confidence / 100)). In this case these are:

df = 40 - 1 = 39

α = 1 - (95 / 100) = 1 - 0.95 = 0.05

By consulting the t-distribution tables found on page 276 of the STAT101 Course Reader, these values can be found to correspond to a critical t-value of 2.023.

The formula below can then be used to construct the interval (where x̄ refers to the sample mean, s refers to the sample standard deviation and n refers to the sample size):

95% CI = x̄ ± t(n-1, α/2) * (s / √n)
= 69.4 ± 2.023 * (11.30 / √40)
= 69.4 ± 2.023 * (11.30 / 6.32)
= 69.4 ± 2.023 * 1.79
= 69.4 ± 3.61 (2dp)
= (65.79, 73.01)

69.4 ± 3.61 therefore represents a 95% confidence interval for the population mean pulse rate for males. The lower limit is 65.79 (2dp) and the upper limit is 73.01 (2dp).
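The interval arithmetic above can be checked with a short Python sketch. It is only an illustration: the function name t_confidence_interval is hypothetical and not part of the assignment, and the critical value 2.023 is taken from the tables as stated above rather than computed.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Male sample statistics from above; 2.023 is the tabulated critical t (df = 39).
lower, upper = t_confidence_interval(69.4, 11.30, 40, 2.023)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The same function can be reused for the female sample by passing its mean and standard deviation.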



b. In order to construct a 95% confidence interval for the population mean pulse rate for females using the sample provided, the critical t-value from 1a) was used again, as both the df (degrees of freedom) and α (alpha) values remain unchanged.

95% CI = x̄ ± t(n-1, α/2) * (s / √n)
= 76.3 ± 2.023 * (12.50 / √40)
= 76.3 ± 2.023 * (12.50 / 6.32)
= 76.3 ± 2.023 * 1.98
= 76.3 ± 4.00 (2dp)
= (72.30, 80.30) (2dp)

76.3 ± 4.00 therefore represents a 95% confidence interval for the population mean pulse rate for females. The lower limit is 72.30 (2dp) and the upper limit is 80.30 (2dp).

c. The following is a print-out of the Excel descriptive statistics for each sample. The limits at the bottom of each table were not automatically calculated by Microsoft Excel, and were added manually using basic arithmetic in the MS Excel formula bar:

                           Male Pulse    Female Pulse
Mean                       69.4          76.3
Standard Error             1.7862724     1.9762046
Median                     66            74
Mode                       64            72
Standard Deviation         11.297379     12.498615
Sample Variance            127.63077     156.21538
Kurtosis                   -0.639518     4.525991
Skewness                   0.6800237     1.6838961
Range                      40            64
Minimum                    56            60
Maximum                    96            124
Sum                        2776          3052
Count                      40            40
Confidence Level(95.0%)    3.613077      3.9972511
Upper limit of the 95% CI  73.013077     80.297251
Lower limit of the 95% CI  65.786923     72.302749

d. Because the 95% confidence intervals for the two population means overlap (the male upper limit of 73.01 exceeds the female lower limit of 72.30), we cannot conclude without further tests that the two population means are different.
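The overlap check can be expressed as a tiny Python sketch. The helper name intervals_overlap is hypothetical, and the limits are the 2dp values reported in parts a and b.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

male_ci = (65.79, 73.01)    # 95% CI for males (2dp)
female_ci = (72.30, 80.30)  # 95% CI for females (2dp)
overlap = intervals_overlap(male_ci, female_ci)
print(overlap)
```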





Question 2

a. In order to test the researcher's theory, we perform a one-tailed z-test on our hypothesis. If the test statistic falls beyond the critical z-value, which can be found in z-tables using the significance level (in this case given as 0.01), the null hypothesis can be rejected.

In order to validly use this test we must first verify that the sample size is large enough. To do this we multiply the sample size (n = 415) first by the null-hypothesis population proportion (p0 = 0.79), and then by q (1 - p0 = 0.21). If both of these values are equal to or greater than 5, we can validly use the test. So:

n * p0 = 415 * 0.79 = 327.85

n * q = 415 * (1 - 0.79) = 415 * 0.21 = 87.15
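The large-sample condition above can be sketched in Python as follows. The function name large_sample_check and its threshold parameter are hypothetical names chosen for this illustration; the threshold of 5 is the one stated above.

```python
def large_sample_check(n, p0, threshold=5):
    """Return (n*p0, n*q, ok), where ok is True if both counts reach the threshold."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, ok

successes, failures, ok = large_sample_check(415, 0.79)
print(f"n*p0 = {successes:.2f}, n*q = {failures:.2f}, valid = {ok}")
```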

These values are both well in excess of 5, so we may proceed with the test. The researcher's theory is that the proportion of accounting firms that offer flexible working hours (p) is lower than the proportion of all companies that offer the same (p0 = 0.79). This is the alternate hypothesis (HA). The null hypothesis (H0) is that the proportion of accounting firms that offer flexible working hours is the same as the proportion of all companies offering flexible working hours. These hypotheses are laid out below, along with the calculation of the sample proportion (p̂):

H0: p = 0.79
HA: p < 0.79

p̂ = number of successes / sample size
= 303 / 415
= 0.73 (2dp)

Now that we have formed our hypotheses, we use the formula below to calculate our test statistic z:

z = (p̂ - p0) / √((p0 * (1 - p0)) / n)
= (0.73 - 0.79) / √((0.79 * (1 - 0.79)) / 415)
= -0.06 / √((0.79 * 0.21) / 415)
= -0.06 / √(0.166 / 415)
= -0.06 / √0.0004
= -0.06 / 0.020
= -3.00 (2dp)
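As a cross-check, the test statistic can be computed in Python. This is a minimal sketch with a hypothetical function name (one_proportion_z); it uses the unrounded sample proportion 303 / 415, so the result is roughly -2.99 rather than the -3.00 obtained from the rounded hand calculation.

```python
import math

def one_proportion_z(successes, n, p0):
    """z statistic for a one-sample proportion test against p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z = one_proportion_z(303, 415, 0.79)
print(f"z = {z:.2f}")
```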





By using the Microsoft Excel NORMSINV function, the critical z-score with a significance level of 0.01 for a one-tailed test was found to be -2.33 (2dp). To sum this information up:

z0.01: -2.33
z: -3.00

We can see from this information that the test statistic z is in the rejection region (-3.00 < -2.33). The null hypothesis (p = 0.79) can therefore be rejected in favour of the alternative hypothesis (p < 0.79). In the context of this problem this means we can conclude, at the 0.01 level of significance, that the proportion of accounting firms offering flexible working hours is significantly lower than the stated figure of 79% for all companies.

b. The P-value of this test, P(Z < -3.00), is 0.0013 (2sf).
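The critical value and P-value can also be approximated outside Excel. The sketch below uses NormalDist from Python's standard library in place of NORMSINV; this substitution is an assumption of the example, not the method used in the assignment.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

alpha = 0.01
z_crit = std_normal.inv_cdf(alpha)  # lower-tail critical value for a one-tailed test
p_value = std_normal.cdf(-3.00)     # P(Z < -3.00) for the test statistic above

print(f"critical z = {z_crit:.3f}, P-value = {p_value:.5f}")
```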

c. To calculate the 90% confidence interval for the true proportion of accounting firms that offer flexible working hours, the formula below is used:

CI = p̂ ± z(α/2) * √((p̂ * (1 - p̂)) / n)
= 0.73 ± 1.645 * √((0.73 * 0.27) / 415)
= 0.73 ± 1.645 * √(0.197 / 415)
= 0.73 ± 1.645 * √0.000475
= 0.73 ± 1.645 * 0.0218
= 0.73 ± 0.036

The confidence interval is 0.73 ± 0.036. The lower limit is 0.694 (3dp) and the upper limit is 0.766 (3dp).

This interval suggests that we can estimate with 90% confidence that the true proportion of accounting firms that offer flexible working hours is between 0.694 and 0.766 (69.4% and 76.6%).
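The interval for the proportion can be reproduced with a short Python sketch. The function name proportion_ci is hypothetical, and the z multiplier is obtained from the standard library's NormalDist rather than from tables, which is an assumption of this example.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(0.73, 415, 0.90)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```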





Question 3

a. A Type I error is committed by rejecting a true null hypothesis. In the context of the problem, a Type I error would mean the manager, by chance, selects an extreme sample and based on that sample rejects the null hypothesis, concluding that the average time taken to supply customers with the basic order is greater than 80 seconds when in fact it is not. The business manager could then take unfair disciplinary action, or change policy unnecessarily.

A Type II error is committed by failing to reject a false null hypothesis. In the context of this problem, this would imply the manager reaches the conclusion that the average time to supply customers with a basic order is equal to or less than 80 seconds, when in fact it is greater. This could result in decreased efficiency, due to the manager's unfounded satisfaction with the performance of his workers/equipment.

b. In the context of this problem, it is likely the Type II error would be considered more important. A Type I error would result in increased efforts to improve efficiency, which is not usually a problem from the business manager's point of view. However, being wrongly satisfied with an under-performing business is likely to be expensive in the long run.

c. In order to test whether or not the manager has cause for concern regarding the efficiency of his workers at the 0.05 level, a t-test is used to test the hypotheses below:

H0: μ = 80
HA: μ > 80

A t-test must be used as we do not have the population standard deviation to hand. In order to use this type of hypothesis test, the population must be normally distributed and/or the sample size must be greater than or equal to 30, in accordance with the Central Limit Theorem. In this case the sample size is stated as 36, so we can safely use the test.

Firstly we calculate our test statistic using the formula:

t = (x̄ - μ0) / (s / √n)
= (89.0 - 80.0) / (19.46 / √36)
= 9.0 / (19.46 / 6)
= 9.0 / 3.24
= 2.77 (2dp)

Using the TINV function of Excel, the critical value t is found to be 1.69 (2dp) for a one-tailed t-test where α is equal to 0.05 and there are 35 (n - 1) degrees of freedom. The test statistic of 2.77 is well beyond the critical value of 1.69, and the manager can therefore reject the null hypothesis and conclude that he has cause for concern regarding the efficiency of his staff.
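The test statistic and decision can be sketched in Python as below. The function name one_sample_t is hypothetical, and the critical value 1.69 is simply the figure reported above from Excel; it is not recomputed here.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for a one-sample test of the mean against mu0."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

t_stat = one_sample_t(89.0, 80.0, 19.46, 36)
t_crit = 1.69  # one-tailed critical value for df = 35, alpha = 0.05 (from Excel, as stated above)
reject_null = t_stat > t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```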





The P-value can be obtained from the TDIST function in Excel by entering the test statistic, degrees of freedom and number of tails. The P-value of this test statistic is 0.0044 (2sf).

d. The P-value provides us with an actual probability, namely the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, as opposed to a position on a distribution curve. It requires no pre-chosen alpha value, and can be readily compared to any significance level the statistician wishes.

Question 4

a. A two-sample t-test can be conducted provided the samples are approximately normally distributed, or each sample size is greater than or equal to 30, and provided the samples are independent. Both samples are larger than 30 (n > 30), and they are indeed independent, and therefore a t-test can be carried out. For the purposes of this test an alpha value of 0.05 will be used, as stipulated in the question.

Firstly, the mean and standard deviation of each sample must be calculated. This was done in Excel using the same method as in Question 1.

MALES:
MEAN: 2.71 (2dp)
STANDARD DEVIATION: 0.37 (2dp)

FEMALES:
MEAN: 1.99 (2dp)
STANDARD DEVIATION: 0.38 (2dp)

In order to continue, we have to establish a null hypothesis (H0), an alternate hypothesis (HA), and a test statistic. The hypotheses are laid out below:

H0: μW = μM
HA: μW ≠ μM

The null hypothesis implies that the population mean for women is the same as that for men. If the test statistic (calculated using the formula provided in the Course Reader) lies beyond the critical t-value in either direction (calculated using Excel TINV, where α = 0.05 and df = 35 (the smaller sample size - 1), as 2.03), we reject the null hypothesis, and can conclude there appears to be a difference in the population means.

Using the formula we can proceed to calculate our test statistic as follows:

t = ((x̄W - x̄M) - (μW - μM)) / √(sW²/nW + sM²/nM)

In this case we know that given our null hypothesis, μW = μM, and therefore that μW - μM = 0. The other variables are already available to us, and when used in the formula give us the following results:
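The formula can be sketched as a small Python function. This is an illustration only: the function name two_sample_t is hypothetical, the inputs are the rounded (2dp) sample means and standard deviations reported earlier in this question, and the sample sizes 36 (men) and 40 (women) are those used in the calculation. The difference is taken as men minus women, which affects only the sign of t.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, hypothesised_diff=0.0):
    """Unpooled two-sample t statistic: standard error uses s1^2/n1 + s2^2/n2."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesised_diff) / standard_error

# Men first, then women, using the rounded sample statistics.
t_stat = two_sample_t(2.71, 0.37, 36, 1.99, 0.38, 40)
print(f"t = {t_stat:.2f}")
```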

t = ((2.71 - 1.99) - 0) / √(0.37²/36 + 0.38²/40)
= 0.72 / √0.00741
= 0.72 / 0.0861
= 8.36 (2dp)

Here the difference is taken as men minus women; for a two-tailed test the order affects only the sign of t.

So, our test statistic has been calculated as 8.36, and we know from previously that where df = 35 and α = 0.05 our critical t-value is 2.030. 8.36 is far greater than 2.030, and the test statistic therefore falls into the rejection region. From this we can reject the hypothesis that μW = μM, and therefore at the 0.05 significance level we can conclude that there appears to be a difference in the population means of the extent of approval for unsporting play between men and women.

b. For this test statistic and 35 degrees of freedom, the two-tailed P-value given by the TDIST function is less than 0.0001. This value is the probability of observing a test statistic at least as extreme as ours by chance, assuming the null hypothesis (that the two means are the same) is true. Because this probability is far below 0.05, the data provide strong evidence of an actual difference in population means.

c. The Two Sample t-Test with Unequal Variance output from Microsoft Excel is below:

                               Men           Women
Mean                           2.706666667   1.9875
Variance                       0.140074286   0.142501282
Observations                   36            40
Hypothesized Mean Difference   0
df                             73
t Stat                         8.330093485

P(T
