Inferential Statistics: Non-parametric Tests, by Dr. Harshal P. Bhumbar
DEFINITION
A non-parametric test is a statistical method that does not require the data to follow a normal distribution. Non-parametric statistics often work with ordinal data, that is, data analysed through their rank or order rather than their numerical magnitude.
S. No. | Study design | Parametric test | Non-parametric test
1 | Two independent samples | Student t test | Wilcoxon-Mann-Whitney test
2 | Two matched samples | Paired t test | Wilcoxon signed rank test
3 | Two or more independent samples | One way ANOVA | Kruskal-Wallis test
4 | Two or more matched samples | Two way ANOVA | Friedman test
Difference between Parametric & Non-parametric tests
No. | Parametric test | Non-parametric test
1 | Used for ratio or interval data | Used for ordinal or nominal data
2 | Used for a normal distribution | Used for any distribution
3 | Mean is the usual central measure | Median is the usual central measure
4 | Information about the population is completely known | No information about the population is available
5 | Specific assumptions are made regarding the population | Assumption-free test
6 | Null hypothesis is based on parameters of the population | Null hypothesis is free of parameters
7 | Applicable only to variables | Applicable to both variables and attributes
8 | More efficient | Less efficient
9 | More powerful, when applicable | Less powerful
Wilcoxon-Mann-Whitney test
Example -
The effectiveness of advertising for two rival products (Brand X and Brand Y) was compared. Market research at a local shopping centre was carried out, with the participants being shown adverts for two rival brands of coffee, which they then rated on the overall likelihood of them buying the product (out of 10, with 10 being "definitely going to buy the product"). Half of the participants gave ratings for one of the products, the other half gave ratings for the other product.
We have two conditions, with each participant taking part in only one of the conditions. The data are ratings (ordinal data), and hence a non-parametric test is appropriate: the Mann-Whitney U test (the non-parametric counterpart of an independent measures t-test).
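STEP ONE: Rank all 12 ratings together, ignoring brand. The lowest rating gets rank 1, and tied ratings share the mean of the ranks they occupy.
Brand X participant | Rating | Rank | Brand Y participant | Rating | Rank
1 | 3 | 3 | 1 | 9 | 11
2 | 4 | 4 | 2 | 7 | 9
3 | 2 | 1.5 | 3 | 5 | 5.5
4 | 6 | 7.5 | 4 | 10 | 12
5 | 2 | 1.5 | 5 | 6 | 7.5
6 | 5 | 5.5 | 6 | 8 | 10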
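The ranks in the table can be reproduced with a short sketch. It pools both brands' ratings and gives tied ratings the mean of the positions they occupy. The helper name `midranks` is an illustrative choice, not something from the slides.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


brand_x = [3, 4, 2, 6, 2, 5]   # ratings from the table
brand_y = [9, 7, 5, 10, 6, 8]

pooled_ranks = midranks(brand_x + brand_y)   # rank all 12 ratings together
ranks_x = pooled_ranks[:len(brand_x)]
ranks_y = pooled_ranks[len(brand_x):]

print(ranks_x)  # [3.0, 4.0, 1.5, 7.5, 1.5, 5.5]
print(ranks_y)  # [11.0, 9.0, 5.5, 12.0, 7.5, 10.0]
```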
STEP TWO: Add up the ranks for Brand X to get T1.
T1 = 3 + 4 + 1.5 + 7.5 + 1.5 + 5.5 = 23
STEP THREE: Add up the ranks for Brand Y to get T2.
T2 = 11 + 9 + 5.5 + 12 + 7.5 + 10 = 55
STEP FOUR: Select the larger rank total. In this case it is T2.
STEP FIVE: Calculate n1, n2 and nx. n1 and n2 are the numbers of participants in each group, and nx is the number of participants in the group that gave the larger rank total.
• Therefore n1 = 6, n2 = 6, nx = 6
STEP SIX: Find U (note: Tx is the larger rank total).
• U = n1*n2 + nx*(nx+1)/2 − Tx
• U = 6*6 + 6*(6+1)/2 − 55
• U = 36 + 21 − 55 = 2
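Steps two to six can be sketched as one small function. It takes the two lists of ranks from the example, picks the larger rank total, and applies the U formula quoted above. The function name `mann_whitney_u` is an illustrative choice.

```python
def mann_whitney_u(ranks_1, ranks_2):
    """U computed from the group with the larger rank total (Tx)."""
    n1, n2 = len(ranks_1), len(ranks_2)
    t1, t2 = sum(ranks_1), sum(ranks_2)
    # Tx is the larger rank total; nx is the size of that group.
    if t1 >= t2:
        tx, nx = t1, n1
    else:
        tx, nx = t2, n2
    return n1 * n2 + nx * (nx + 1) / 2 - tx


ranks_x = [3, 4, 1.5, 7.5, 1.5, 5.5]   # Brand X ranks, T1 = 23
ranks_y = [11, 9, 5.5, 12, 7.5, 10]    # Brand Y ranks, T2 = 55

u = mann_whitney_u(ranks_x, ranks_y)
print(u)  # 2.0
```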
STEP SEVEN: Use a table of critical U values for the Mann-Whitney U Test
• For n1 = 6 and n2=6, the critical value of U is 5 at the 0.05 significance level.
• For n1 = 6 and n2=6, the critical value of U is 2 at the 0.01 significance level.
STEP EIGHT: To be significant, our obtained U has to be equal to or less than the critical value.
• Our obtained U = 2.
• The critical value for a two-tailed test at the .05 significance level is 5.
• The critical value for a two-tailed test at the .01 significance level is 2.
• So our obtained U is less than the critical value of U at the 0.05 significance level, and equal to the critical value of U at the 0.01 significance level.
But what does this mean?
• We can say that there is a highly significant difference (p < .01) between the ratings given to each brand in terms of the likelihood of buying the product.
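The decision rule in steps seven and eight amounts to a lookup. A minimal sketch, using only the two critical values quoted in this example for n1 = n2 = 6 (for other sample sizes the values must come from a table of critical U values); the names `CRITICAL_U` and `significant_at` are illustrative.

```python
# Two-tailed critical values of U for n1 = n2 = 6, as quoted in the example.
CRITICAL_U = {0.05: 5, 0.01: 2}


def significant_at(u, critical_values):
    """Return the significance levels at which U is significant (U <= critical value)."""
    return sorted(alpha for alpha, crit in critical_values.items() if u <= crit)


levels = significant_at(2, CRITICAL_U)
print(levels)  # [0.01, 0.05]
```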
Wilcoxon signed rank test
Example -
A study examines the effectiveness of a new drug designed to reduce repetitive behaviours in children with autism. A total of 8 children with autism enrol in the study. The amount of time each child is engaged in repetitive behaviour during a three-hour observation period is measured before treatment and again after taking the new medication for 1 week. The data are shown below.
Child | Before treatment | After 1 week of treatment
1 | 85 | 75
2 | 70 | 50
3 | 40 | 50
4 | 65 | 40
5 | 80 | 20
6 | 75 | 65
7 | 55 | 40
8 | 20 | 25
First we compute the difference score for each child.
Child | Before treatment | After 1 week of treatment | Difference (before − after)
1 | 85 | 75 | 10
2 | 70 | 50 | 20
3 | 40 | 50 | -10
4 | 65 | 40 | 25
5 | 80 | 20 | 60
6 | 75 | 65 | 10
7 | 55 | 40 | 15
8 | 20 | 25 | -5
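The difference column is simple arithmetic and can be checked with a few lines (variable names are illustrative):

```python
before = [85, 70, 40, 65, 80, 75, 55, 20]
after = [75, 50, 50, 40, 20, 65, 40, 25]

# Difference score for each child: before minus after.
differences = [b - a for b, a in zip(before, after)]
print(differences)  # [10, 20, -10, 25, 60, 10, 15, -5]
```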
The next step is to rank the difference scores. First order the difference scores by absolute value, then assign ranks from 1 (smallest) to n (largest). When absolute values are tied, each tied score receives the mean of the ranks they occupy.
Observed difference | Difference ordered by absolute value | Rank
10 | -5 | 1
20 | 10 | 3
-10 | -10 | 3
25 | 10 | 3
60 | 15 | 5
10 | 20 | 6
15 | 25 | 7
-5 | 60 | 8
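The ranking of absolute differences can be sketched as follows. The three differences with absolute value 10 occupy positions 2, 3 and 4, so each gets the mean rank 3. The helper name `midranks` is an illustrative choice.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


differences = [10, 20, -10, 25, 60, 10, 15, -5]   # children 1 to 8
abs_ranks = midranks([abs(d) for d in differences])
print(abs_ranks)  # [3.0, 6.0, 3.0, 7.0, 8.0, 3.0, 5.0, 1.0]
```

The ranks are printed in child order (1 to 8), whereas the table lists them in order of absolute difference.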
The final step is to attach the signs (+, −) of the observed differences to each rank, as shown below.
Rank | Signed rank
1 | -1
3 | 3
3 | -3
3 | 3
5 | 5
6 | 6
7 | 7
8 | 8
The test statistic for the Wilcoxon signed rank test is W, based on:
W+ (sum of the positive ranks)
W− (sum of the negative ranks, taken in absolute value)
If H0 is true, then W+ and W− should be about equal. If the research hypothesis is true, then W+ > W−.
In our example, W+ = 32 and W− = 4. Recall that the sum of the ranks always equals n(n+1)/2. In our example, (8*9)/2 = 36, which equals 32 + 4. The test statistic is the smaller of the two sums, W = 4.
If the observed value of W is less than or equal to the critical value, we reject the null hypothesis. If the observed value of W exceeds the critical value, we do not reject the null hypothesis.
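The sums W+ and W− can be sketched directly from the differences and their ranks. This sketch takes the test statistic W as the smaller of the two sums, which matches W = 4 in the example. No critical value is coded here because it has to be read from a Wilcoxon table. Variable names are illustrative.

```python
differences = [10, 20, -10, 25, 60, 10, 15, -5]   # before - after, children 1 to 8
abs_ranks = [3, 6, 3, 7, 8, 3, 5, 1]              # ranks of |difference|, same order

# Sum the ranks separately for positive and negative differences.
w_plus = sum(r for d, r in zip(differences, abs_ranks) if d > 0)
w_minus = sum(r for d, r in zip(differences, abs_ranks) if d < 0)

n = len(differences)
# Sanity check: the two sums together must equal n(n+1)/2.
assert w_plus + w_minus == n * (n + 1) // 2

w = min(w_plus, w_minus)
print(w_plus, w_minus, w)  # 32 4 4
```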
Friedman test
Example-
Hall et al. compared three methods of determining serum amylase values in patients with pancreatitis. The results are shown in the following table. We wish to know whether these data indicate a difference among the three methods (given α = 0.05).
Specimen | Method A | Method B | Method C
1 | 4000 | 3210 | 6120
2 | 1600 | 1040 | 2410
3 | 1600 | 647 | 2210
4 | 1200 | 570 | 2060
5 | 840 | 445 | 1400
6 | 352 | 156 | 249
7 | 224 | 155 | 224
8 | 200 | 99 | 208
9 | 184 | 70 | 227
The table shows serum amylase values (enzyme units per 100 ml of serum) in patients with pancreatitis, by method of determination.
Hypotheses:
H0: MA = MB = MC
H1: at least one equality is violated.
Test statistic: b = 9 (specimens) and k = 3 (methods). After converting the original observations to ranks within each specimen, we have
Specimen | A | B | C
1 | 2 | 1 | 3
2 | 2 | 1 | 3
3 | 2 | 1 | 3
4 | 2 | 1 | 3
5 | 2 | 1 | 3
6 | 3 | 1 | 2
7 | 2.5 | 1 | 2.5
8 | 2 | 1 | 3
9 | 2 | 1 | 3
So the rank sums are RA = 19.5, RB = 9, RC = 25.5.
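The within-specimen ranking and the rank sums can be reproduced with a short sketch. Each row of three values is ranked on its own, with ties (specimen 7) sharing the mean rank. The helper name `midranks` is an illustrative choice.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Serum amylase values for methods A, B, C; one row per specimen.
amylase = [
    (4000, 3210, 6120), (1600, 1040, 2410), (1600, 647, 2210),
    (1200, 570, 2060), (840, 445, 1400), (352, 156, 249),
    (224, 155, 224), (200, 99, 208), (184, 70, 227),
]

rank_rows = [midranks(list(row)) for row in amylase]   # rank within each specimen
rank_sums = [sum(col) for col in zip(*rank_rows)]      # column totals RA, RB, RC
print(rank_sums)  # [19.5, 9.0, 25.5]
```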
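With k = 3 and b = 9, the Friedman test statistic is
$\chi_r^2 = \frac{12}{bk(k+1)} \sum_{j=1}^{k} R_j^2 - 3b(k+1)$
$\chi_r^2 = \frac{12}{9 \cdot 3 \cdot 4}\left(19.5^2 + 9^2 + 25.5^2\right) - 3 \cdot 9 \cdot 4 = 15.5$
From the chi-square table, $\chi^2_{(1-\alpha,\,k-1)}$ with α = 0.05 and k = 3 gives $\chi^2_{(0.95,\,2)} = 5.991$.
Since 15.5 > 5.991, we reject the null hypothesis.
Conclusion: there is enough evidence to support the claim that the three methods do not yield identical results.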
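The Friedman statistic and the comparison with the tabulated chi-square value can be checked with a minimal sketch. The critical value 5.991 is the one quoted in the example for α = 0.05 and 2 degrees of freedom. The function name `friedman_statistic` is an illustrative choice.

```python
def friedman_statistic(rank_sums, b):
    """12 / (b k (k+1)) * sum(Rj^2) - 3 b (k+1), with k = number of treatments."""
    k = len(rank_sums)
    sum_sq = sum(r * r for r in rank_sums)
    return 12 * sum_sq / (b * k * (k + 1)) - 3 * b * (k + 1)


rank_sums = [19.5, 9, 25.5]   # RA, RB, RC from the example
chi_r2 = friedman_statistic(rank_sums, b=9)

CRITICAL_CHI2 = 5.991         # chi-square(0.95, 2), as quoted in the example
reject_h0 = chi_r2 > CRITICAL_CHI2
print(chi_r2, reject_h0)      # 15.5 True
```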
Kruskal-Wallis test
Example -
Does it make any difference to students' comprehension of statistics whether the lectures are given in English, Serbo-Croat or Cantonese?
Group A: lectures in English
Group B: lectures in Serbo-Croat
Group C: lectures in Cantonese
DV (dependent variable): students' rating of lecture intelligibility on a 100-point scale
English (raw score) | English (rank) | Serbo-Croat (raw score) | Serbo-Croat (rank) | Cantonese (raw score) | Cantonese (rank)
20 | 3.5 | 25 | 7.5 | 19 | 1.5
27 | 9 | 33 | 10 | 20 | 3.5
19 | 1.5 | 35 | 11 | 25 | 7.5
23 | 6 | 36 | 12 | 22 | 5
Step 1 - Rank the scores, ignoring which group they belong to.
• The lowest score gets the lowest rank.
• Tied scores get the average rank.
Step 2 - Find Tc, the total of the ranks for each group.
• Tc1 = 20
• Tc2 = 40.5
• Tc3 = 17.5
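Steps 1 and 2 can be sketched as follows. All 12 scores are ranked together and the ranks are then totalled per group. The helper name `midranks` and the dictionary layout are illustrative choices.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


groups = {
    "English": [20, 27, 19, 23],
    "Serbo-Croat": [25, 33, 35, 36],
    "Cantonese": [19, 20, 25, 22],
}

# Step 1: rank all scores together, ignoring group membership.
pooled = [score for scores in groups.values() for score in scores]
ranks = midranks(pooled)

# Step 2: total the ranks belonging to each group.
rank_totals = {}
start = 0
for name, scores in groups.items():
    rank_totals[name] = sum(ranks[start:start + len(scores)])
    start += len(scores)

print(rank_totals)  # {'English': 20.0, 'Serbo-Croat': 40.5, 'Cantonese': 17.5}
```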
Step 3 - Find H:
$H = \frac{12}{N(N+1)} \sum \frac{T_c^2}{n_c} - 3(N+1)$
where N is the total number of subjects, Tc is the rank total for each group, and nc is the number of subjects in each group.
Hypotheses:
• H0: MA = MB = MC
• H1: at least one equality is violated.
• Here N = 12 subjects in 3 groups, with nc = 4 in each group.
• $\sum T_c^2/n_c = 20^2/4 + 40.5^2/4 + 17.5^2/4 = 586.625$
• $H = \frac{12}{12 \cdot 13} \cdot 586.625 - 3 \cdot 13 = 6.125$, i.e. H ≈ 6.12
Step 4 - The degrees of freedom (df) are the number of groups minus one: 3 − 1 = 2.
Step 5 - For 2 df, a chi-square of 5.99 has a probability p = 0.05 of occurring by chance.
• Our H is greater than 5.99, so it is even less likely to occur by chance.
• H = 6.12, p < 0.05
Conclusion - The three groups differ significantly. The language in which statistics is taught does make a difference to students' ratings of lecture intelligibility.
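The value of H can be checked with a minimal sketch. It assumes the standard Kruskal-Wallis formula without a correction for ties, which reproduces the H = 6.12 reported in the example. The function name `kruskal_wallis_h` is an illustrative choice.

```python
def kruskal_wallis_h(rank_totals, group_sizes):
    """H = 12 / (N (N+1)) * sum(Tc^2 / nc) - 3 (N+1), with no tie correction."""
    n_total = sum(group_sizes)
    weighted = sum(t * t / n for t, n in zip(rank_totals, group_sizes))
    return 12 * weighted / (n_total * (n_total + 1)) - 3 * (n_total + 1)


rank_totals = [20, 40.5, 17.5]   # Tc1, Tc2, Tc3 from the example
group_sizes = [4, 4, 4]          # four students per language group

h = kruskal_wallis_h(rank_totals, group_sizes)
df = len(rank_totals) - 1        # number of groups minus one

CRITICAL_CHI2 = 5.99             # chi-square for 2 df at p = 0.05, as quoted
print(h, df, h > CRITICAL_CHI2)  # 6.125 2 True
```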
Advantages
• Simple and easy to understand.
• Do not involve complicated sampling theory.
• No assumption is made regarding the parent population.
Disadvantages
• Applied only to nominal or ordinal scales.
• They use less information than parametric tests.
• They are not as efficient as parametric tests.