stat101 assignment final actual

Upload: peter-greer

Post on 10-Apr-2018


  • 8/8/2019 STAT101 Assignment Final Actual


    Peter Greer 46046307pjg102

    STAT101 Assignment #3

    Question 1

    As each sample in the PULSE.xls file contained more than 30 observations, the sample means were assumed to be normally distributed in accordance with the Central Limit Theorem.

    Microsoft Excel was used to find the mean pulse rate and the standard deviation for males and females, using the SUM and STDEV functions as follows (where n = 40):

    EXCEL FORMULAS:
    MEAN: = SUM(number_1:number_n) / n
    STANDARD DEVIATION: = STDEV(number_1:number_n)

    The values returned were as follows:

    MALES:
    MEAN: 69.4
    STANDARD DEVIATION: 11.30 (2dp)

    FEMALES:
    MEAN: 76.3
    STANDARD DEVIATION: 12.50 (2dp)
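For readers without Excel, the two formulas above can be mirrored in Python. This is a minimal sketch: the three readings are made-up illustrative values (the PULSE.xls data are not reproduced here), and `statistics.stdev` is used because, like Excel's STDEV, it divides by n - 1.

```python
import statistics

# Hypothetical pulse readings for illustration only; NOT taken from PULSE.xls.
sample_pulses = [60, 70, 80]

# Equivalent of =SUM(range) / n
sample_mean = sum(sample_pulses) / len(sample_pulses)

# Equivalent of =STDEV(range): sample standard deviation (n - 1 denominator)
sample_sd = statistics.stdev(sample_pulses)

print(sample_mean, sample_sd)
```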

    a. To construct a 95% confidence interval for the population mean pulse rate for males using the sample provided, a critical t-value must first be found using the degrees of freedom ($df = n - 1$) and the alpha value ($\alpha = 1 - (\text{confidence} / 100)$). In this case these are:

    $$df = 40 - 1 = 39$$

    $$\alpha = 1 - (95 / 100) = 1 - 0.95 = 0.05$$

    By consulting the t-distribution tables found on page 276 of the STAT101 Course Reader, these values can be found to correspond to a critical t-value of 2.023.

    The formula below can then be used to construct the interval (where $\bar{x}$ refers to the sample mean, $s$ refers to the sample standard deviation and $n$ refers to the sample size):

    $$
    \begin{aligned}
    95\%\ \text{CI} &= \bar{x} \pm t_{n-1,\,\alpha/2} \cdot (s / \sqrt{n}) \\
    &= 69.4 \pm 2.023 \cdot (11.30 / \sqrt{40}) \\
    &= 69.4 \pm 2.023 \cdot (11.30 / 6.32) \\
    &= 69.4 \pm 2.023 \cdot 1.79 \\
    &= 69.4 \pm 3.61 \quad \text{(2dp)} \\
    &= (65.79,\ 73.01)
    \end{aligned}
    $$

    $69.4 \pm 3.61$ therefore represents a 95% confidence interval for the population mean pulse rate for males. The lower limit is 65.79 (2dp) and the upper limit is 73.01 (2dp).
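The interval arithmetic can be checked with a short Python sketch. The helper name `t_confidence_interval` is an illustrative assumption; the critical value 2.023 is simply the table value quoted above rather than something computed here, and the summary statistics are the rounded means and standard deviations reported earlier.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

T_CRIT = 2.023  # critical t for df = 39, alpha = 0.05 (read from tables)

males = t_confidence_interval(69.4, 11.30, 40, T_CRIT)
females = t_confidence_interval(76.3, 12.50, 40, T_CRIT)

print([round(x, 2) for x in males])
print([round(x, 2) for x in females])
```

Because the critical value is passed in as a parameter, the same helper works for any confidence level once the matching table value is known.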



    b. In order to construct a 95% confidence interval for the population mean pulse rate for females using the sample provided, the critical t-value from 1a) was used again, as both the $df$ (degrees of freedom) and $\alpha$ (alpha) values remain unchanged.

    $$
    \begin{aligned}
    95\%\ \text{CI} &= \bar{x} \pm t_{n-1,\,\alpha/2} \cdot (s / \sqrt{n}) \\
    &= 76.3 \pm 2.023 \cdot (12.50 / \sqrt{40}) \\
    &= 76.3 \pm 2.023 \cdot (12.50 / 6.32) \\
    &= 76.3 \pm 2.023 \cdot 1.98 \\
    &= 76.3 \pm 4.00 \quad \text{(2dp)} \\
    &= (72.30,\ 80.30) \quad \text{(2dp)}
    \end{aligned}
    $$

    $76.3 \pm 4.00$ therefore represents a 95% confidence interval for the population mean pulse rate for females. The lower limit is 72.30 (2dp) and the upper limit is 80.30 (2dp).

    c. The following is a print-out of the Excel descriptive statistics for each sample. The limits at the bottom of each table were not automatically calculated by Microsoft Excel, and were added manually using basic arithmetic in the MS Excel formula bar:

                                  Male Pulse    Female Pulse
    Mean                          69.4          76.3
    Standard Error                1.7862724     1.9762046
    Median                        66            74
    Mode                          64            72
    Standard Deviation            11.297379     12.498615
    Sample Variance               127.63077     156.21538
    Kurtosis                      -0.639518     4.525991
    Skewness                      0.6800237     1.6838961
    Range                         40            64
    Minimum                       56            60
    Maximum                       96            124
    Sum                           2776          3052
    Count                         40            40
    Confidence Level (95.0%)      3.613077      3.9972511
    Upper limit of the 95% CI     73.013077     80.297251
    Lower limit of the 95% CI     65.786923     72.302749

    d. Because the 95% confidence intervals for the population means overlap, we cannot conclude without further tests that the two population means are different.
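The overlap can be confirmed mechanically. The sketch below uses a hypothetical helper, `intervals_overlap`, applied to the two rounded intervals from parts a and b; it only tests whether the intervals share any point and is not itself a significance test.

```python
def intervals_overlap(a, b):
    """True when closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

male_ci = (65.79, 73.01)    # 95% CI for males, from part a
female_ci = (72.30, 80.30)  # 95% CI for females, from part b

overlap = intervals_overlap(male_ci, female_ci)
print(overlap)
```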


    Question 2

    a. In order to test the researcher's theory, we perform a one-tailed Z-test on our hypothesis. If the test statistic falls beyond the critical z-value, which can be found in z-tables using the significance level $\alpha$ (in this case given as 0.01), the null hypothesis can be rejected.

    In order to validly use this test we must first verify that the sample size is large enough. To do this we multiply the sample size ($n = 415$) first by the null-hypothesis population proportion ($p_0 = 0.79$), and then by $q$ ($1 - p_0 = 0.21$). If both these values are equal to or greater than 5, we can validly use the test. So:

    $$n \cdot p_0 = 415 \cdot 0.79 = 327.85$$

    $$n \cdot q = 415 \cdot (1 - 0.79) = 415 \cdot 0.21 = 87.15$$
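The same sample-size condition can be expressed as a small Python check. The function name `large_enough` and its `threshold` parameter are illustrative assumptions; the rule of 5 is the one stated above.

```python
def large_enough(n, p0, threshold=5):
    """Check that both n * p0 and n * (1 - p0) reach the threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

# Values from the problem: n = 415, p0 = 0.79
ok = large_enough(415, 0.79)
print(ok)
```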

    These values are both well in excess of 5, so we may proceed with the test. The researcher's theory is that the proportion of accounting firms that offer flexible working hours ($p$) is lower than the proportion of all companies that offer the same ($p_0 = 0.79$). This is the alternate hypothesis ($H_A$). The null hypothesis ($H_0$) is that the proportion of accounting firms that offer flexible working hours is the same as the proportion of all companies offering flexible working hours. These hypotheses are laid out below, along with the calculation of the sample proportion ($\hat{p}$):

    $$H_0: p = 0.79$$
    $$H_A: p < 0.79$$

    $$\hat{p} = \frac{\text{number of successes}}{\text{sample size}} = \frac{303}{415} = 0.73 \quad \text{(2dp)}$$

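    Now that we have formed our hypotheses, we use the formula below to calculate our test statistic $z$:

    $$
    \begin{aligned}
    z &= \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} \\
    &= \frac{0.73 - 0.79}{\sqrt{0.79 \cdot (1 - 0.79) / 415}} \\
    &= \frac{-0.06}{\sqrt{(0.79 \cdot 0.21) / 415}} \\
    &= \frac{-0.06}{\sqrt{0.166 / 415}} \\
    &= \frac{-0.06}{\sqrt{0.0004}} \\
    &= \frac{-0.06}{0.020} \\
    &= -3.00 \quad \text{(2dp)}
    \end{aligned}
    $$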


    By using the Microsoft Excel NORMSINV function, the critical z-score with a significance level of 0.01 for a one-tailed test was found to be -2.33 (2dp). To sum this information up:

    $$z_{0.01} = -2.33$$
    $$z = -3.00$$

    We can see from this information that the test statistic $z$ is in the rejection region (-3.00 < -2.33). The null hypothesis ($p = 0.79$) can therefore be rejected, and the alternative hypothesis ($p < 0.79$) can be accepted. In the context of this problem this means we can conclude, at the 0.01 level of significance, that the proportion of accounting firms offering flexible working hours is significantly lower than the 79% claimed for all companies.

    b. The P-value of this test is 0.0013 (2sf).
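The test in parts a and b can be reproduced with Python's standard library. This is a sketch under stated assumptions: `one_proportion_z` is a hypothetical helper name, `NormalDist().inv_cdf` stands in for Excel's NORMSINV, and the unrounded sample proportion 303/415 is used, so the statistic comes out slightly different from the hand-rounded -3.00.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using the null standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z = one_proportion_z(303, 415, 0.79)

# Lower-tail critical value at alpha = 0.01 (counterpart of NORMSINV(0.01))
z_crit = NormalDist().inv_cdf(0.01)

# One-tailed P-value: probability of a z at least this far below zero
p_value = NormalDist().cdf(z)

reject_null = z < z_crit
print(round(z, 2), round(z_crit, 2), reject_null)
```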

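    c. To calculate the 90% confidence interval for the true proportion of accounting firms that offer flexible working hours, the formula below is used:

    $$
    \begin{aligned}
    \text{CI} &= \hat{p} \pm z_{\alpha/2} \cdot \sqrt{\hat{p} (1 - \hat{p}) / n} \\
    &= 0.73 \pm 1.645 \cdot \sqrt{(0.73 \cdot 0.27) / 415} \\
    &= 0.73 \pm 1.645 \cdot \sqrt{0.197 / 415} \\
    &= 0.73 \pm 1.645 \cdot \sqrt{0.000475} \\
    &= 0.73 \pm 1.645 \cdot 0.0218 \\
    &= 0.73 \pm 0.036
    \end{aligned}
    $$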

    The confidence interval is $0.73 \pm 0.036$. The lower limit is 0.694 (3dp) and the upper limit is 0.766 (3dp).

    This interval suggests that we can estimate with 90% confidence that the true proportion of accounting firms that offer flexible working hours is between 0.694 and 0.766 (69.4% and 76.6%).
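A short sketch of the same interval in Python follows. The helper name `proportion_ci` is an assumption for illustration; the z multiplier is obtained from `NormalDist().inv_cdf` rather than typed in as 1.645, and the rounded sample proportion 0.73 is used to match the hand calculation.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_ci(0.73, 415, 0.90)
print(round(lower, 3), round(upper, 3))
```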


    Question 3

    a. A Type I error is committed by rejecting a true null hypothesis. In the context of the problem, a Type I error would mean the manager, by chance, selects an extreme sample and based on that sample rejects the null hypothesis, concluding that the average time taken to supply customers with the basic order is greater than 80 seconds. The business manager could then take unfair disciplinary action, or change policy unnecessarily.

    A Type II error is committed by failing to reject a false null hypothesis. In the context of this problem, this would mean the manager reaches the conclusion that the average time to supply customers with a basic order is equal to or less than 80 seconds, when in fact it is not. This could result in decreased efficiency, due to the manager's unfounded satisfaction with the performance of his workers/equipment.

    b. In the context of this problem, it is likely the Type II error would be considered more important. A Type I error would result in increased efforts to improve efficiency, which is not usually a problem from the business manager's point of view. However, being wrongly satisfied with an under-performing business is likely to be expensive in the long run.

    c. In order to test whether or not the manager has cause for concern regarding the efficiency of his workers at the 0.05 level, a t-test is used to test the hypotheses below:

    $$H_0: \mu = 80$$
    $$H_A: \mu > 80$$

    A t-test must be used as we do not have the population standard deviation to hand. In order to use this type of hypothesis test, the population must be normally distributed and/or the sample size must be greater than or equal to 30, in accordance with the Central Limit Theorem. In this case the sample size is stated as 36, so we can safely use the test.

    Firstly we calculate our test statistic using the formula:

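    $$
    \begin{aligned}
    t &= \frac{\bar{x} - \mu}{s / \sqrt{n}} \\
    &= \frac{89.0 - 80.0}{19.46 / \sqrt{36}} \\
    &= \frac{9.0}{19.46 / 6} \\
    &= \frac{9.0}{3.24} \\
    &= 2.77 \quad \text{(2dp)}
    \end{aligned}
    $$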

    Using the TINV function of Excel, the critical value of $t$ is found to be 1.69 (2dp) for a one-tailed t-test where $\alpha$ is equal to 0.05 and there are 35 ($n - 1$) degrees of freedom. 2.77 falls well beyond the critical value of 1.69, and the manager can therefore reject the null hypothesis and conclude that he has cause for concern regarding the efficiency of his staff.
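The test statistic and decision can be sketched in Python as follows. The helper name `one_sample_t` is illustrative, and the critical value 1.69 is taken as given from the TINV result above rather than computed, since the standard library has no t-distribution inverse.

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for H0: mu = mu0 with sample standard deviation sd."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))

t_stat = one_sample_t(89.0, 80.0, 19.46, 36)

T_CRIT = 1.69  # one-tailed critical t, df = 35, alpha = 0.05 (from Excel TINV)
reject_null = t_stat > T_CRIT

print(round(t_stat, 3), reject_null)
```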


    The P-value can be obtained from the TDIST function in Excel by entering the test statistic, degrees of freedom and number of tails. The P-value of this test statistic is 0.0044 (2sf).
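Python's standard library has no counterpart to TDIST, so the sketch below approximates the one-tailed P-value by numerically integrating the Student t density with Simpson's rule. This is a swapped-in technique for illustration, not how Excel computes the value, and the function names are assumptions. With t = 2.77 and 35 degrees of freedom the result should land close to the figure reported above.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=2000):
    """Approximate P(T > t_stat) for t_stat >= 0.

    Integrates the density over [0, t_stat] with composite Simpson's rule
    (steps must be even) and subtracts the area from 0.5.
    """
    h = t_stat / steps
    total = t_pdf(0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = upper_tail_p(2.77, 35)
print(round(p_value, 4))
```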

    d. The P-value gives the actual probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, as opposed to merely a position on a distribution curve relative to a critical value. It requires no alpha value to calculate, and can be readily compared with any significance level the statistician wishes.

    Question 4

    a. A two-sample t-test can be conducted provided the samples are approximately normally distributed, or the sample sizes are greater than or equal to 30, and the samples are independent. Both samples are larger than 30 (n > 30), and they are indeed independent, and therefore a t-test can be carried out. For the purposes of this test an alpha value of 0.05 will be used, as stipulated in the question.

    Firstly, the mean and standard deviation of each sample must be calculated. This was done in Excel using the same method as in Question 1.

    MALES:
    MEAN: 2.71 (2dp)
    STANDARD DEVIATION: 0.37 (2dp)

    FEMALES:
    MEAN: 1.99 (2dp)
    STANDARD DEVIATION: 0.38 (2dp)

    In order to continue, we have to establish a null hypothesis ($H_0$), an alternate hypothesis ($H_A$), and a test statistic. The hypotheses are laid out below:

    $$H_0: \mu_W = \mu_M$$
    $$H_A: \mu_W \neq \mu_M$$

    The null hypothesis implies that the population mean for women is the same as that for men. If the test statistic (calculated using the formula provided in the Course Reader) lies beyond the critical t-value (calculated using Excel TINV, where $\alpha = 0.05$ and $df = 35$ (the smaller sample size $- 1$), as 2.03), we reject the null hypothesis, and can conclude there appears to be a difference in the population means.

    Using the formula we can proceed to calculate our test statistic as follows:

    $$t = \frac{(\bar{x}_M - \bar{x}_W) - (\mu_M - \mu_W)}{\sqrt{s_M^2 / n_M + s_W^2 / n_W}}$$

    In this case we know that, given our null hypothesis, $\mu_W = \mu_M$, and therefore that $\mu_M - \mu_W = 0$. The other variables are already available to us. The denominator uses the sample standard deviations (0.37 and 0.38), not the sample means. When used in the formula, these values give us the following results:

    $$
    \begin{aligned}
    t &= \frac{(2.71 - 1.99) - 0}{\sqrt{0.37^2 / 36 + 0.38^2 / 40}} \\
    &= \frac{0.72}{\sqrt{0.00741}} \\
    &= \frac{0.72}{0.0861} \\
    &= 8.36 \quad \text{(2dp)}
    \end{aligned}
    $$

    So, our test statistic has been calculated as 8.36, and we know from previously that where $df = 35$ and $\alpha = 0.05$ our critical t-value is 2.03. 8.36 is far greater than 2.03, and the test statistic therefore falls into the rejection region. From this we can reject the hypothesis that $\mu_W = \mu_M$, and therefore at the 0.05 significance level we can conclude that there appears to be a difference in the population means of the extent of approval for unsporting play between men and women.

    b. Using the TDIST function, the P-value calculated from our test statistic and degrees of freedom is less than 0.0001. This value tells us that the probability of observing a test statistic at least as extreme as ours by chance, assuming the null hypothesis (that the two means are the same) is true, is less than 0.01%. In other words, a difference this large would be very unlikely to arise if the population means were equal.
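As a cross-check on the hand calculation, the unequal-variance (Welch) statistic can be computed from the rounded summary statistics given earlier (means 2.71 and 1.99, standard deviations 0.37 and 0.38, sample sizes 36 and 40). This is a sketch, and `welch_t` is a hypothetical helper name. Note that the denominator is built from the standard deviations, not the means. The function also returns the Welch-Satterthwaite degrees of freedom, an alternative to the conservative df = 35 used above. Excel's unequal-variance output for the unrounded data reports a t Stat of about 8.33 with df 73, so the rounded inputs here give a slightly different figure.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t statistic, Welch-Satterthwaite df) for two independent samples."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first sample mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second sample mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Men first, then women, using the rounded summary statistics
t_stat, df = welch_t(2.71, 0.37, 36, 1.99, 0.38, 40)
print(round(t_stat, 2), round(df, 1))
```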

    c. The Two Sample t-Test with Unequal Variance output from Microsoft Excel is below:

                                    Men            Women
    Mean                            2.706666667    1.9875
    Variance                        0.140074286    0.142501282
    Observations                    36             40
    Hypothesized Mean Difference    0
    df                              73
    t Stat                          8.330093485

    P(T