Comparing Two Samples
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland
Resources
• Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
• Gentle, JE (2002) Elements of Computational Statistics. Springer.
• Gonick, L., and Woollcott Smith (1993) A Cartoon Guide to Statistics. HarperResource (for fun).
• Freund and Wilson (1998) Regression Analysis. Academic Press.
Why Test?
• Statistics is an experimental science, not really a branch of mathematics.
• It’s a tool that can tell you whether an apparent similarity or difference between data sets is real or could have arisen by chance.
• It does not give you certainty.
• This lecture discusses comparison of samples.
Don't Complicate Things
Use the classical tests:
• var.test to compare two variances (Fisher's F)
• t.test to compare two means (Student's t)
• wilcox.test to compare two means with non-normal errors (Wilcoxon's rank test)
• prop.test (binomial test) to compare two proportions
• cor.test (Pearson's or Spearman's rank correlation) to correlate two variables
• chisq.test (chi-square test) or fisher.test (Fisher's exact test) to test for independence in contingency tables
Comparing Two Variances
• Before comparing means, verify that the variances are not significantly different.
– var.test(set1, set2)
• This performs Fisher's F test.
• If the variances are significantly different, you can transform the output (y) variable to equalise variances, or you can still use the t.test (Welch's modified test).
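As a minimal sketch of the variance check described above, the following R code applies var.test to two simulated samples. The data, the seed, and the object names f_result and f_ratio are assumptions made for illustration only.

```r
# Hypothetical data: two samples with the same mean but different spread
set.seed(1)
set1 <- rnorm(20, mean = 10, sd = 1)
set2 <- rnorm(20, mean = 10, sd = 3)

# Fisher's F test: the null hypothesis is that the ratio of variances is 1
f_result <- var.test(set1, set2)
print(f_result)

# The F statistic is simply the ratio of the two sample variances
f_ratio <- var(set1) / var(set2)
```

A small p-value in f_result suggests the variances differ, in which case the Welch form of the t-test or a transformation of y is the safer route.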
Comparing Two Means
• Student's t-test (t.test) assumes the samples are independent, the variances constant, and the errors normally distributed. In R, t.test uses the Welch-Satterthwaite approximation by default (var.equal = FALSE), which allows the variances to differ at the cost of slightly less power; set var.equal = TRUE for the classical pooled-variance test. This test can also be used for paired data.
• Wilcoxon rank sum test (wilcox.test) is used for independent samples when the errors are not normally distributed. If you do a transform to get constant variance, you will probably have to use this test.
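The three ways of comparing two independent samples mentioned above can be sketched as follows. The simulated vectors a and b and the result names are hypothetical; only t.test and wilcox.test come from the slides.

```r
# Hypothetical data: two independent samples with different means
set.seed(2)
a <- rnorm(15, mean = 5, sd = 1)
b <- rnorm(15, mean = 6, sd = 1)

welch   <- t.test(a, b)                    # default: Welch, var.equal = FALSE
student <- t.test(a, b, var.equal = TRUE)  # classical pooled-variance test
ranks   <- wilcox.test(a, b)               # rank sum test, no normality assumption

# Collect the three p-values side by side
c(welch = welch$p.value, student = student$p.value, wilcoxon = ranks$p.value)
```

The pooled test has n1 + n2 - 2 degrees of freedom, while the Welch test reports a smaller, usually fractional, value.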
Paired Observations
• The measurements will not be independent.
• Use the t.test with paired=T. Now you’re doing a single sample test of the differences against 0.
• When you can do a paired t.test, you should always do the paired test. It’s more powerful.
• Deals with blocking, spatial correlation, and temporal correlation.
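To illustrate the point that a paired test is a single-sample test of the differences against 0, here is a short sketch. The before/after measurements are simulated and purely hypothetical.

```r
# Hypothetical paired data: the same 12 units measured twice
set.seed(3)
before <- rnorm(12, mean = 50, sd = 5)
after  <- before + rnorm(12, mean = 1, sd = 1)

# Paired t-test on the two columns
paired_test <- t.test(after, before, paired = TRUE)

# Equivalent one-sample test of the differences against 0
one_sample <- t.test(after - before, mu = 0)

c(paired = paired_test$p.value, one_sample = one_sample$p.value)
```

Both calls give the same t statistic and p-value, because the pairing removes the unit-to-unit variation before the test is applied.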
Sign Test
• Used when you can't measure a difference but can see it.
• Use the binomial test (binom.test) for this.
• Binomial tests can also be used to compare proportions (prop.test).
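A sign test only needs the number of pairs in which one condition looked better. The sketch below uses made-up counts (9 of 10 untied pairs, and 15/50 versus 25/50 successes); these numbers and the object names are assumptions for illustration.

```r
# Hypothetical outcome: in 10 untied pairs, 9 looked better after treatment
n_pairs  <- 10
n_better <- 9

# Sign test: under the null hypothesis "better" and "worse" are equally likely
sign_test <- binom.test(n_better, n_pairs, p = 0.5)
sign_test$p.value

# Comparing two proportions: hypothetical successes out of trials in two groups
prop_result <- prop.test(x = c(15, 25), n = c(50, 50))
prop_result$p.value
```

Ties (pairs with no visible difference) are normally dropped before counting, which is why n_pairs refers to untied pairs.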
Chi-square Contingency Tables
• Deals with count data.
• Suppose there are two characteristics (hair colour and eye colour). The null hypothesis is that they are uncorrelated.
• Create a matrix that contains the data and apply chisq.test(matrix).
• This will give you a p-value for matrix values given the assumption of independence.
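The steps above can be sketched with a 2 x 2 table of counts. The counts and the category labels are hypothetical and chosen only so that all expected frequencies are comfortably above 5.

```r
# Hypothetical counts of hair colour by eye colour (filled column by column)
counts <- matrix(c(38, 14, 11, 51), nrow = 2,
                 dimnames = list(hair = c("fair", "dark"),
                                 eyes = c("blue", "brown")))

# Chi-square test of independence (Yates' correction is applied to 2 x 2 tables)
chi_result <- chisq.test(counts)

chi_result$expected  # frequencies expected if hair and eye colour were independent
chi_result$p.value
```

Each expected frequency is the row total times the column total divided by the grand total, which is what the assumption of independence implies.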
Fisher's Exact Test
• Used for analysis of contingency tables when one or more of the expected frequencies is less than 5.
• Use fisher.test(x)
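A small sketch of the situation described above, using a hypothetical 2 x 2 table whose expected frequencies fall below 5. The counts and object names are assumptions.

```r
# Hypothetical small table of counts
x <- matrix(c(6, 4, 2, 8), nrow = 2)

# Expected frequencies under independence, computed by hand
expected <- outer(rowSums(x), colSums(x)) / sum(x)
any(expected < 5)  # TRUE here, so the chi-square approximation is doubtful

# Fisher's exact test works directly from the hypergeometric distribution
fisher_result <- fisher.test(x)
fisher_result$p.value
```

Because the test is exact, it does not rely on the large-sample approximation behind chisq.test.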
Correlation and Covariance
• Are two parameters correlated significantly?
• Create and attach the data.frame
• Apply cor(data.frame)
• To determine the significance of a correlation, apply cor.test(x, y) to the two variables (cor.test takes a pair of vectors, not a whole data.frame)
• You have three options: Kendall's tau (method = "k"), Spearman's rank (method = "s"), or (default) Pearson's product-moment correlation (method = "p")
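The correlation workflow can be sketched as follows. The data frame d and its columns x and y are simulated and hypothetical; cor and cor.test are the functions named on the slide.

```r
# Hypothetical data frame with two related variables
set.seed(4)
d <- data.frame(x = rnorm(30))
d$y <- 0.6 * d$x + rnorm(30, sd = 0.8)

# Correlation matrix for every pair of columns
cor(d)

# Significance tests for one pair of variables
pearson  <- cor.test(d$x, d$y)                       # default: method = "pearson"
spearman <- cor.test(d$x, d$y, method = "spearman")
kendall  <- cor.test(d$x, d$y, method = "kendall")

c(pearson = pearson$p.value, spearman = spearman$p.value, kendall = kendall$p.value)
```

The rank-based methods are the usual choice when the relationship is monotonic but not linear, or when the data contain outliers.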
Kolmogorov-Smirnov Test
• Are two sample distributions significantly different?
or
• Does a sample distribution arise from a specific distribution?
• ks.test(A,B)
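Both uses of the Kolmogorov-Smirnov test can be sketched in a few lines. The samples A and B are simulated, and the reference distribution (a standard normal) is an assumption chosen for illustration.

```r
# Hypothetical samples drawn from different distributions
set.seed(5)
A <- rnorm(40)
B <- runif(40, min = -2, max = 2)

# Two-sample form: are the two sample distributions different?
two_sample <- ks.test(A, B)

# One-sample form: could A have come from a standard normal distribution?
one_sample <- ks.test(A, "pnorm", mean = 0, sd = 1)

c(two_sample = two_sample$p.value, one_sample = one_sample$p.value)
```

The test statistic is the largest vertical distance between the two cumulative distribution functions, so it always lies between 0 and 1.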
Statistical Problems
• Outliers
• Unequal variances
• Correlated errors
Outliers and Influential Observations
• Extreme responses are called outliers and extreme inputs are called leverage points.
• An observation that has great influence on the estimates is usually an outlier and a leverage point.
• Use the residual plot to detect them—discussed in the modelling presentation.
• Fix by verifying the correctness of the observation. If it happens to be correct, it may reflect a factor not present in any of the other observations.
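One possible way to flag suspicious observations numerically is sketched below, assuming a simple linear model fitted with lm. The data, the injected outlier at observation 10, and the object names are hypothetical.

```r
# Hypothetical data with one deliberately corrupted response value
set.seed(6)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.5)
y[10] <- y[10] + 8   # inject an outlier in the response

fit <- lm(y ~ x)

std_resid <- rstandard(fit)   # standardised residuals: large values suggest outliers
leverage  <- hatvalues(fit)   # leverage: large values suggest extreme inputs

worst <- unname(which.max(abs(std_resid)))  # index of the most extreme residual
worst
```

Calling plot(fit, which = 1) draws the residuals against the fitted values, which shows the same observation standing apart from the rest.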
Unequal variances
• Mentioned earlier.
• Use
– non-parametric statistics (usually not effective for regression)
– robust methods
– rescaling
– live with it
Correlated errors
• The measurements in the data were not independent—usually because selection of the sample units was not strictly random. Frequent problem with time series data but can also reflect spatial correlation or simply sloppy data collection.
• Try special models, such as an autoregressive model.
• Avoid with a careful experimental design.
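As a rough sketch of how correlated errors might be detected and modelled, the code below simulates a series with first-order autocorrelation, inspects its autocorrelation function, and fits an autoregressive model. The simulated series, the coefficient 0.7, and the choice of order 1 are all assumptions for illustration.

```r
# Hypothetical time series with first-order autocorrelation
set.seed(7)
series <- arima.sim(model = list(ar = 0.7), n = 200)

# Sample autocorrelation function; element 2 of $acf is the lag-1 value
acf_values <- acf(series, plot = FALSE)
lag1 <- acf_values$acf[2]

# Fit an AR(1) model (order fixed rather than chosen by AIC)
ar_fit <- ar(series, order.max = 1, aic = FALSE)
ar_coef <- as.numeric(ar_fit$ar)

c(lag1 = lag1, ar_coef = ar_coef)
```

A lag-1 autocorrelation well away from 0 is a sign that the observations are not independent, so tests that assume independence will tend to overstate significance.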