SADC Course in Statistics
Introduction to Non-Parametric Methods
(Session 19)
Learning Objectives
At the end of this session, you will be able to
• Understand the general meaning of non-parametric methods and when they might be used
• Implement and interpret a simple non-parametric test, the sign test, and understand its advantages and limitations
• Appreciate some practical problems associated with non-parametric methods
An illustrative example
A random sample of 12 small businesses was asked: “What percentage of last year’s profit was reinvested?”
Data: 5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
5.9, 5.5, 11.6, 17.5, 8.2, 7.7
A government official claims the real “average” is 10%.
How can this claim be tested?
Start by plotting
- A very skewed distribution
[Figure: Boxplot of % Reinvested; horizontal axis "reinvest", scale 5 to 25]
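As a numerical companion to the boxplot, the sketch below computes a five-number summary of the 12 reported percentages using only the Python standard library. The variable name `reinvest` simply mirrors the axis label, and the quartile convention (`method="inclusive"`) is an assumption of this sketch; other conventions give slightly different quartiles.

```python
from statistics import mean, median, quantiles

# Reinvestment percentages from the illustrative example
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
            5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

# Quartiles: the "inclusive" convention is an assumption of this sketch
q1, q2, q3 = quantiles(reinvest, n=4, method="inclusive")

five_number = (min(reinvest), q1, q2, q3, max(reinvest))
print("min, Q1, median, Q3, max:", five_number)

# A long upper tail shows up as mean > median, and as
# (max - median) being much larger than (median - min)
print("mean   =", round(mean(reinvest), 2))
print("median =", round(median(reinvest), 2))
```

The mean exceeding the median, together with the long stretch between the median and the maximum, is the numerical counterpart of the right skew seen in the plot.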
Addressing the question …
• A one-sample t-test is often employed in such cases, but the procedure assumes normally distributed data
• This is clearly NOT the case here, and hence the validity of the t-test procedure is questionable
Robustness of the t-test
• Recall that the t-test is robust to departures from normality, owing to the Central Limit Theorem
• We only need to worry if the sample size is quite small and/or the underlying distribution is very non-normal
• Hence, we might be concerned about applying a t-test in our example
Two alternative approaches
• Transformations
– Are the measurements approximately normally distributed on a different measurement scale, e.g. a logarithmic scale? If so, analyse the data on the transformed scale
• Non-Parametric methods
– Utilise a technique that does not assume a normal distribution. Such methods are often collectively referred to as non-parametric methods …
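To make the transformation idea concrete, here is a minimal sketch that compares a moment-based skewness coefficient for the example data on the original and on the natural-log scale. The helper `skewness` and the choice of this particular coefficient are assumptions for illustration; the slides do not prescribe a specific diagnostic.

```python
from math import log

# Reinvestment percentages from the illustrative example
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
            5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

def skewness(values):
    """Moment-based sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical transformed variable: natural log of each percentage
log_reinvest = [log(v) for v in reinvest]

skew_raw = skewness(reinvest)
skew_log = skewness(log_reinvest)
print("skewness, original scale:", round(skew_raw, 2))
print("skewness, log scale:     ", round(skew_log, 2))
```

For these data the log scale reduces the skewness but does not remove it, which is one reason to consider a method that avoids the normality assumption altogether.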
Non-Parametric methods
• Non-parametric methods (or tests) derive their name from the fact that no explicit distribution (e.g. normal, gamma, …) is associated with the data
• Occasionally the techniques are called distribution-free methods, but assumptions may be made, e.g. a symmetrical distribution. Hence, the name is potentially misleading
• To illustrate the above we shall now apply a simple sign test to the example
Back to the example
• Let us make no assumption about the distribution of reinvestment percentages
• Having said this, the distribution is clearly very skewed. When attempting to summarise the “average” of such a distribution, the median is a natural choice
– Sample median = 7.4%
• The median is a flexible summary and so hypotheses of interest are generally phrased in terms of a population median
The sign test
Hypotheses:
H0: Population median = 10% vs.
H1: Population median ≠ 10%
Assumptions: Data values are independent. No distributional assumption is necessary
Logic: If H0 is true, then we would expect half of the observed values to fall below 10 and half above 10. How inconsistent is our data with this expectation?
Applying the sign test
• List the data in ascending order: 4.7, 5.1, …, 8.2, 11.6, …, 23.6
• If a value is < 10 assign a negative sign; if a value is > 10 assign a positive sign
• Under H0, we have a random sample of n=12 binary outcomes (– or +):
– – – – – – – – + + + +
• This gives 8 –ve and 4 +ve signs compared to the expected 6 and 6 respectively
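The sign-assignment steps above can be sketched in a few lines of Python. The variable names (`reinvest`, `hypothesised_median`, `n_negative`, `n_positive`) are illustrative choices, not part of the course material.

```python
# Reinvestment percentages from the illustrative example
reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
            5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

hypothesised_median = 10.0  # value of the median under H0

# A value below 10 gets a negative sign, a value above 10 a positive sign
n_negative = sum(1 for v in reinvest if v < hypothesised_median)
n_positive = sum(1 for v in reinvest if v > hypothesised_median)

# Only values that carry a sign count towards the sample size
n = n_negative + n_positive

print("negative signs:", n_negative)
print("positive signs:", n_positive)
print("expected under H0:", n / 2, "of each")
```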
Applying the sign test
• How unusual is this result under H0?
• A natural test statistic is literally the number of +ve signs [the choice –ve vs. +ve is arbitrary]
• A sufficiently small or large value is evidence to reject H0
• Under H0, R = number of +ve signs follows a binomial distribution with n=12 and p=0.5
– This is a symmetric distribution
• A two-sided p-value is then Prob(R ≤ 4) + Prob(R ≥ 8) = 2 Prob(R ≤ 4)
The p-value
• Using statistical software, e.g. Stata:
Two-sided test:
Ho: median of reinvest - 10 = 0 vs.
Ha: median of reinvest - 10 != 0
Pr(#positive >= 8 or #negative >= 8) =
min(1, 2*Binomial(n = 12, x >= 8, p = 0.5)) = 0.3877
• P-value = 0.39
• This may be calculated by using the Excel BINOMDIST worksheet function
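For readers without Stata or Excel, the same two-sided p-value can be reproduced from the binomial distribution directly. This is a minimal sketch using the Python standard library; the helper name `binom_cdf` is an illustrative choice.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(R <= k) for R ~ Binomial(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

n = 12  # number of signed observations
r = 4   # observed number of +ve signs

# Binomial(n, 0.5) is symmetric, so doubling the smaller tail gives the
# two-sided p-value; min(1, ...) guards against values above 1
p_value = min(1.0, 2 * binom_cdf(min(r, n - r), n))
print(round(p_value, 4))  # 0.3877
```

The result agrees with the Stata output quoted above.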
Conclusions
• The p-value is very large. Hence, there is no evidence to reject H0
• The estimated median reinvestment, 7.4%, is not significantly different from 10%
• There is no evidence based on this survey against the government official’s claim
Further notes
• P-value calculation
– The p-value may be approximated using the normal approximation to the binomial distribution
– Under H0, $Z = \dfrac{R - n/2}{\sqrt{n}/2} \sim N(0,1)$ approximately
– Compare Z with the tails of a N(0,1) distribution
– n > 20 will usually give a reasonable approximation
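As a rough illustration of the normal approximation, the sketch below standardises the observed number of +ve signs using mean n/2 and standard deviation √n/2 (the mean and standard deviation of a Binomial(n, 0.5) count) and looks up the tail area with `statistics.NormalDist`. It is applied here to the example with n = 12 purely for comparison, which is below the n > 20 guideline.

```python
from math import sqrt
from statistics import NormalDist

n = 12  # number of signed observations
r = 4   # observed number of +ve signs

# Standardise R using its mean n/2 and standard deviation sqrt(n)/2 under H0
z = (r - n / 2) / (sqrt(n) / 2)

# Two-sided p-value from the tails of a standard normal distribution
p_approx = 2 * NormalDist().cdf(-abs(z))

print("Z =", round(z, 3))
print("approximate two-sided p-value =", round(p_approx, 3))
```

With only 12 observations the approximate p-value is noticeably smaller than the exact binomial value of 0.39, which is consistent with the advice to rely on the approximation only for larger samples.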
Further notes
• No signs
– If any value equals the hypothesised median of 10 then it is ignored and the sample size is reduced accordingly
• One-sided tests
– Although a two-sided example was discussed, one-sided tests are also possible
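Both notes can be folded into one small function. The sketch below is a hypothetical helper, `sign_test`, written for illustration only: it drops values equal to the hypothesised median and accepts an `alternative` argument whose labels ("two-sided", "less", "greater") are an assumed naming convention.

```python
from math import comb

def sign_test(data, median0, alternative="two-sided"):
    """Sign test of H0: population median = median0.

    Returns (number of +ve signs, effective sample size, p-value).
    """
    n_pos = sum(1 for v in data if v > median0)
    n_neg = sum(1 for v in data if v < median0)
    n = n_pos + n_neg  # values equal to median0 carry no sign and are dropped

    def cdf(k):
        # P(R <= k) for R ~ Binomial(n, 0.5)
        return sum(comb(n, r) for r in range(k + 1)) / 2 ** n

    if alternative == "less":       # H1: median < median0 (few +ve signs)
        p = cdf(n_pos)
    elif alternative == "greater":  # H1: median > median0 (many +ve signs)
        p = cdf(n_neg)              # equals P(R >= n_pos) by symmetry
    elif alternative == "two-sided":
        p = min(1.0, 2 * cdf(min(n_pos, n_neg)))
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return n_pos, n, p

reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
            5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

# An extra, made-up observation exactly equal to 10 is ignored by the test
print(sign_test(reinvest + [10.0], 10.0))
print(sign_test(reinvest, 10.0, alternative="less"))
```

Adding a made-up observation equal to 10 leaves the effective sample size at 12, so the two-sided p-value is unchanged from the worked example.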
Pros and cons of the sign test
Advantages
• Simple and logical
• Widely applicable
– Few assumptions
• Robust to outliers
– Recorded values are not used, only signs
Pros and cons of the sign test
Major Disadvantages
• Severe loss of information
– Recorded values not used, only signs
– Makes the sign test inefficient
• Confidence intervals (CIs)
– A CI for the true median can be constructed, but it is cumbersome
– Software packages tend not to present a CI for the median, instead concentrating on the p-value
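To give a feel for what such a construction involves, here is a minimal sketch of one standard approach: an interval between two order statistics, with the ranks chosen from the Binomial(n, 0.5) distribution so that coverage is at least the nominal level. The function name `median_ci` is hypothetical, and this is not necessarily the construction the course has in mind.

```python
from math import comb

def median_ci(data, conf=0.95):
    """Order-statistic confidence interval for the population median.

    Returns (lower, upper, achieved coverage). Coverage is at least `conf`
    but usually not exactly equal to it, because R is discrete.
    """
    x = sorted(data)
    n = len(x)
    alpha = 1 - conf

    # Find the largest k with P(R <= k - 1) <= alpha / 2, R ~ Binomial(n, 0.5)
    k = 0
    tail = 0.0
    while k < n // 2 and tail + comb(n, k) / 2 ** n <= alpha / 2:
        tail += comb(n, k) / 2 ** n
        k += 1
    if k == 0:
        raise ValueError("sample too small for this confidence level")

    # Interval runs from the k-th smallest to the k-th largest observation
    return x[k - 1], x[n - k], 1 - 2 * tail

reinvest = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3,
            5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

lower, upper, coverage = median_ci(reinvest)
print(lower, upper, round(coverage, 4))
```

For the example data the interval runs from the 3rd smallest to the 3rd largest observation and contains the hypothesised value of 10, in line with the non-significant sign test.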
Concluding remarks
• Non-parametric methods generally concentrate on hypothesis testing, and hence the p-value
• The lack of confidence intervals is a major disadvantage
• We shall return to these issues in Session 20
References
The two references below apply to both Sessions 19 and 20, and also to non-parametric methods in general.
• Conover, W.J. (1999) Practical Nonparametric Statistics, 3rd edn. Wiley, pp. 584.
• Sprent, P. (1993) Applied Nonparametric Statistical Methods, 2nd edn. Chapman and Hall, London.
Practical work follows …