statrep2
TRANSCRIPT
-
7/30/2019 statrep2
1/8
Fig. 2: Box plot for non-survivors and survivors
The random variable of interest is age. Let X1 denote the age of survivors and X2 the age of
non-survivors.
To answer
α-value of 0.05).
In addition to the above assumptions, we assume that the CDFs of X1 and X2 have the same shape.
This allows us to apply the Wilcoxon rank-sum test.
From the calculated p-value for the Wilcoxon rank-sum test (0.19), there is not enough evidence
against Ho (Ho: X1 is stochastically equal to X2).
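The rank-sum test used above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, so it will not reproduce the exact p-value reported by a statistics package, and the two age lists are hypothetical placeholders rather than the report's data.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return u, z, p_value

# Hypothetical ages, for illustration only (not the Titanic data).
survivor_ages = [22, 35, 4, 28, 41, 19]
non_survivor_ages = [30, 45, 52, 27, 61, 33]
u, z, p_value = rank_sum_test(survivor_ages, non_survivor_ages)
print(f"U = {u}, z = {z:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen α (0.05 in this report) means Ho is not rejected.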
Fig. 3: ECDF for non-survivors and survivors
But from the histograms and ECDF
5+ to 15, 15+ to 30, 30+ to 45, 45+ to 60, and 60+. We then do a chi-square test to see whether
survivors and non-survivors have a homogeneous distribution across these age categories. We
get a p-value of 5.477e-6, which supports our belief that there is a difference in the age
distributions of survivors and non-survivors. Since the sample size of survivors is 313
and that of non-survivors is 443, both large, we can do a Z-test for proportions on each age
category separately, the null hypotheses being
0-5; Survivors=
We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected.
On performing the Z-tests, we get the following p-values and the adjoining conclusions:
Age Category P value Conclusion
0 to 5 0-5; Survivors > 0-5; Non-Survivors
5+ to 15
15+ to 30
30+ to 45
45+ to 60 45-60; Survivors = 45-60; Non-Survivors
60+ 60+; Survivors < 60+; Non-Survivors
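The age categories in the table above can be assigned with a small helper. This is a sketch that assumes "5+ to 15" means ages strictly greater than 5 and up to 15 inclusive (and likewise for the other boundaries); the source does not state how boundary ages were handled. The names `age_category` and `category_counts` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first five categories (assumed inclusive).
UPPER_BOUNDS = [5, 15, 30, 45, 60]
LABELS = ["0 to 5", "5+ to 15", "15+ to 30", "30+ to 45", "45+ to 60", "60+"]

def age_category(age):
    """Return the label of the age category that `age` falls into."""
    return LABELS[bisect_left(UPPER_BOUNDS, age)]

def category_counts(ages):
    """Count how many ages fall into each category, in table order."""
    counts = Counter(age_category(a) for a in ages)
    return [counts[label] for label in LABELS]

# Hypothetical ages, for illustration only.
print(category_counts([2, 10, 10, 29, 70]))
```

Applying `category_counts` to the survivor and non-survivor ages separately gives the two rows of counts that the chi-square and Z-tests operate on.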
I. (a) Is there a significant difference in Age distribution between male survivors
and male non survivors?
We use the same approach as in part (1), dividing the population into age categories to find out
whether survival probability depends on age category; the only difference is that here both
samples come from male passengers. The chi-square p-value of 1.47e-11 implies that the
population of male survivors and non-survivors is not homogeneous with respect to age
categories. Thus, we go ahead with 6 separate Z-tests, one for each age category, with the null
hypotheses being as follows:
0-5; Male_Survivors= 0-5; Male_Non-Survivors
5-15; Male_Survivors =
60+; Male_Survivors = 60+; Male_Non-Survivors
We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected.
On performing the Z-tests, we get the following p-values and the adjoining conclusions:
Age
Normality test (p-values) Survivors Non-Survivors
Lilliefors (Kolmogorov-Smirnov) normality test 0.0001661670 0.12109238
Shapiro-Francia normality test 0.0077707718 0.11744076
The p-values for all the tests for the two samples suggest that the sample of survivors is not
normal, whereas that of non-survivors is consistent with a normal distribution (at an assumed
α-value of 0.05). This suggests that the distributions are not the same. However, to reinforce
this, we do a two-sample Kolmogorov-Smirnov test. This also suggests that the two samples come
from different distributions (p-value = 0.01326), implying that there is a significant difference
in the age distributions of female survivors and female non-survivors.
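As a rough illustration of the two-sample Kolmogorov-Smirnov test mentioned above, the sketch below computes the largest gap between the two empirical CDFs and converts it to a p-value with the asymptotic Kolmogorov series. That asymptotic formula is an assumption of this sketch (a statistics package may use an exact method for small samples), and the age lists are hypothetical.

```python
import math
from bisect import bisect_right

def ks_two_sample(x, y, terms=100):
    """Two-sample KS statistic D and its asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # The supremum of |F1 - F2| is attained at one of the data points.
    d = max(
        abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2)
        for v in xs + ys
    )
    lam = d * math.sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.2:
        # The series converges poorly here and the true value is essentially 1.
        return d, 1.0
    series = sum(
        (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    p_value = min(1.0, max(0.0, 2 * series))
    return d, p_value

# Hypothetical ages, for illustration only (not the Titanic data).
female_survivor_ages = [18, 24, 29, 35, 48, 52, 58]
female_non_survivor_ages = [2, 9, 18, 21, 25, 30, 39]
d, p_value = ks_two_sample(female_survivor_ages, female_non_survivor_ages)
print(f"D = {d:.3f}, p = {p_value:.3f}")
```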
We use the same approach of dividing the population into age categories to find out if there is
a dependence of
30-45; Female_Survivors = 30-45; Female_Non-Survivors
45-60; Female_Survivors = 45-60; Female_Non-Survivors
60+; Female_Survivors = 60+; Female_Non-Survivors
We began with two-tailed tests, and single-tailed tests were done wherever the null was rejected.
On performing the Z-tests, we get the following p-values and the adjoining conclusions:
Age Category P value Conclusion
0 to 5
5+ to 15
15+ to 30
30+ to 45
45+ to 60
60+ 0.666 60+; Female_Survivors = 60+; Female_Non-Survivors
The above analysis suggests that there is a significant difference in age distribution between
female survivors and female non-survivors.
II. Remark on how Age affected the Survival Probability of a passenger on board
the Titanic, based on consolidations of your findings in 1 and 2 above.
The findings in 1 and 2 above suggest that females had a higher survival probability than
males. Among male passengers, infants and teenagers had a higher survival probability,
whereas the age groups of 15 to 30 and above 60 years had a lower survival probability.
Among female passengers, the age group of 45 to 60 had a higher survival probability.
Possible reasons are that women and children were given preference in boarding the
lifeboats, and that older passengers may have chosen to sacrifice their lives for the young.
IV. Is there a significant difference in Survival Probability between the two genders?
Ho: No difference in the survival probability of the two genders, viz. male and female
Ha: Significant difference in the survival probability of the two genders, viz. male and
female (two-sided)
Data:
The table below displays the data for this problem:
Survivor Non-Survivor Total
Males 142 709 851
Females 308 154 462
Total 450 863 1313
Test adopted for testing the hypothesis:
Since this is a problem of proportions and we would like to compare the survival probabilities
of males and females, we can use the following tests:
1. Fisher's exact test
2. Z-test
Fisher's exact test is the more powerful test in this case, but we can also do a Z-test as the
sample size is large.
Conclusion: On the basis of Z-test we conclude that there is a significant difference in the
survival probability of the two genders.
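The Z-test for the gender table can be reproduced approximately as follows. This is a minimal sketch that assumes the usual pooled-proportion form of the two-sample Z-test; the source does not say which variant was used, and the function name is hypothetical. The counts come from the table above.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample Z-test for equality of proportions (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Males: 142 survivors out of 851; females: 308 survivors out of 462.
z, p_value = two_proportion_z_test(142, 851, 308, 462)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

A large negative z here indicates that the male survival proportion is well below the female one, in line with the conclusion above.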
We have the following data:
Survivors Non-Survivors
Passenger Class I 193 129
Passenger Class II 119 161
Passenger Class III 138 573
The p-value of 2.2e-16 suggests that there is enough evidence to reject the null hypothesis
(at an α-value of 0.05). It can be said that the distributions of survivors and non-survivors
differ significantly across passenger classes.
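One way to check a result like this is a chi-square test of homogeneity on the 3×2 table. The source does not name the test behind the reported p-value in this section, so treating it as a chi-square test is an assumption of this sketch. With 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), which keeps the example free of external libraries.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi_square_sf_df2(stat):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-stat / 2)

# Rows: Passenger Class I, II, III; columns: survivors, non-survivors.
class_table = [[193, 129], [119, 161], [138, 573]]
stat, df = chi_square_statistic(class_table)
p_value = chi_square_sf_df2(stat)  # valid here because df == 2
print(f"chi-square = {stat:.1f}, df = {df}, p = {p_value:.3g}")
```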
We further break down the data to compare the classes pairwise. We did single-tailed Fisher's
tests, taking two classes at a time. This helped us find which passenger class had the better
chance of survival. The survival probability was highest for Class I, followed by Class II,
with Class III having the lowest survival probability.
The above conclusion agrees with the common knowledge that passengers in first class had
the first option to board the lifeboats, while passengers in third class were the last to
board them.
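The pairwise single-tailed comparisons described above can be sketched with an exact hypergeometric tail sum. This is an illustration rather than the authors' code: the function name `fisher_greater` is hypothetical, and the direction of each alternative (the first-listed class survives with higher probability) is an assumption consistent with the stated ordering.

```python
from fractions import Fraction
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher p-value for the table [[a, b], [c, d]].

    Alternative: row 1 has the higher probability of landing in column 1.
    The p-value is P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, col2 = a + b, a + c, b + d
    total = col1 + col2
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(col2, row1 - k) for k in range(a, upper + 1)
    )
    return float(Fraction(numerator, comb(total, row1)))

# (survivors, non-survivors) per passenger class, from the table above.
classes = {"I": (193, 129), "II": (119, 161), "III": (138, 573)}
p_values = {}
for first, second in [("I", "II"), ("I", "III"), ("II", "III")]:
    a, b = classes[first]
    c, d = classes[second]
    p_values[(first, second)] = fisher_greater(a, b, c, d)
    print(f"Class {first} > Class {second}: p = {p_values[(first, second)]:.3g}")
```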
VI. Is there a significant difference in Survival Probability between the two genders
even after taking the effect of Passenger Class into Account?
We make three 2×2 contingency tables, one for each class, and do Fisher's test on each as
follows:
Class I Survivors Non Survivors
Male 59 120
Female 134 9
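For the Class I table above, a two-sided and a one-sided Fisher's exact test can be sketched as follows. The two-sided p-value here sums the probabilities of all tables no more likely than the observed one, which is a common convention but an assumption about how the reported value was obtained; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (two-sided p, one-sided 'less' p) for the table [[a, b], [c, d]]."""
    row1, col1, col2 = a + b, a + c, b + d
    total = comb(col1 + col2, row1)
    low, high = max(0, row1 - col2), min(row1, col1)
    # Unnormalised hypergeometric weights for every possible value of cell a.
    weights = {
        k: comb(col1, k) * comb(col2, row1 - k) for k in range(low, high + 1)
    }
    two_sided = sum(w for w in weights.values() if w <= weights[a])
    less = sum(w for k, w in weights.items() if k <= a)
    return float(Fraction(two_sided, total)), float(Fraction(less, total))

# Class I: males 59 survivors / 120 non-survivors, females 134 / 9.
p_two_sided, p_less = fisher_exact_2x2(59, 120, 134, 9)
print(f"two-sided p = {p_two_sided:.3g}, one-sided p = {p_less:.3g}")
```

The one-sided value tests the alternative that the male survival probability is lower than the female one.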
We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a
significant difference in survival probability between the two genders for Class I. So, we did a
one-sided Fisher's test
We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a
significant difference in survival probability between the two genders for Class II. So, we did
a one-sided
Class III Survivors Non Survivors
Male 58 441
Female 80 132
We did a two-sided Fisher's test, which yielded a p-value of less than 2.2e-16, i.e., there is a
significant difference in survival probability between the two genders for Class II. So, we did
a one-sided Fisher's test with the alternative hypothesis being that the males' survival
probability is less than that of