© 2006 1 inferential statistics testing hypotheses
Inferential statistics
Testing hypotheses
Evidence-based Chiropractic © 2006
In inferential statistics
• Data from samples are used to make inferences about populations
• Researchers can make generalizations about an entire population based on a smaller number of observations
• However, the sample means will not all be the same when repeated random samples are taken from a population
Sampling distributions
• If many different samples were taken from a population, their means would form a distribution of sample means
• If repeated enough times, the distribution would take on a normal shape
  – Even if the underlying population is not normal
• If repeated an infinite number of times, the result would be called a sampling distribution
Sampling distributions (cont.)
• Which of the sample means is truly the population mean?
  – It would be useful to know, but an exact figure is not possible
• The population mean can be inferred from the sample
  – The sample mean is an estimate
  – Referred to as the point estimate
Sampling distributions (cont.)
• Because sampling distributions are normal, the properties of the normal distribution can be used
  – e.g., about 68.3%, 95.4%, and 99.7% of the area under the curve lies within 1, 2, and 3 standard deviations of the mean
Standard error of the mean (SEm)
• The spread of means around the mean of a sampling distribution
• Can be estimated from the sample
  – SEm is calculated by dividing the SD of the sample by the square root of the number of units (n) in the sample
  – SEm = SD / √n
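The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name and the NDI scores are invented for the example.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate SEm as the sample SD divided by the square root of n."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the SD")
    return statistics.stdev(sample) / math.sqrt(n)

# Hypothetical NDI scores, invented purely for illustration.
scores = [12, 15, 9, 14, 10, 13, 11, 16]
print(f"SEm = {standard_error_of_mean(scores):.3f}")  # SEm = 0.866
```

Doubling the list (the same spread with twice as many observations) gives a smaller SEm, which is the sample-size effect described on the next slide.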
SEm (cont.)
• SEm is higher when
  – The sample’s SD is large or
  – The sample size is small
• Lower when
  – The SD is small or
  – The sample size is large
• A small SEm is preferable because generalizations are more precise
Confidence Intervals (CIs)
• A CI is a range of values that is likely to contain the population parameter that is being estimated (e.g., the mean)
• The probability that this range of values contains the population parameter is typically 95%
  – Thus, the 95% confidence interval
Confidence Intervals (CIs)
[Figure: distribution with the horizontal axis running from -3 to +3]
CIs (cont.)
• One can have 95% confidence that the value of the true mean lies within the calculated interval (i.e., 95% CI)
Calculating a 95% CI
1. Find the z-score (using a z-table) that corresponds to the area under the distribution that includes 95% of all values (e.g., z = ±1.96 for a 95% CI)
2. Multiply the z-scores by the SEm
3. Add the product to the sample mean to find the upper limit of the CI and subtract to find the lower limit
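The three steps above can be sketched as follows. This is an illustrative sketch only: the function name and data are hypothetical, and it uses the z-score as the slide does (for small samples a t-value is usually substituted for z).

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(sample, level=0.95):
    """Return (lower, upper) limits of a z-based CI for the mean."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    # Step 1: z-score that bounds the central `level` of the distribution
    # (about 1.96 when level is 0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Step 2: multiply the z-score by the SEm.
    margin = z * sem
    # Step 3: subtract from and add to the sample mean.
    return mean - margin, mean + margin

# Hypothetical NDI scores, invented purely for illustration.
scores = [12, 15, 9, 14, 10, 13, 11, 16]
lower, upper = confidence_interval(scores)
print(f"95% CI: {lower:.2f} to {upper:.2f}")  # 95% CI: 10.80 to 14.20
```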
Size (width) of CIs
• The size of the CI is related to the size of the sample and the amount of variation in the data
  – Small samples and large variation = wider CIs
  – Large samples and small variation = narrower CIs
Hypothesis testing
• A hypothesis is an assumption that appears to explain certain events, which must be tested to see whether it is true
• Research hypothesis
  – a.k.a., alternative hypothesis
  – Denoted H1
  – The research hypothesis is not tested directly
• Instead the null hypothesis (H0) is tested
Hypothesis tests
• Depending on the outcome of the test of H0, the evidence either supports or fails to support the research hypothesis
• Hypothesis testing often involves comparing the means of groups in an experiment
  – The objective is to find out whether they are significantly different from each other
Hypothesis tests (cont.)
• When comparing the means of an active treatment group and a control group, one looks for a difference
  – The treatment may produce a better outcome, leading to a higher mean than the control group
  – The difference may appear real, but it may be due to chance
  – Statistical tests assess whether it is likely to be real
The null hypothesis
• H0 states that there is no difference between the group means
• H1 is accepted only if the null hypothesis proves to be unlikely
  – Typically the observed results must have a probability of 5% or less if H0 were true
  – If H0 is unlikely, it is rejected
• Not unlike the innocent until proven guilty concept in our legal system
A hypothetical neck pain study
• Patients are treated with chiropractic vs. usual medical care
  – Outcome measure is the Neck Disability Index (NDI)
  – H1
    • Chiropractic patients will have lower mean NDI scores after treatment
  – H0
    • There is no difference between mean NDI scores
Hypothetical study (cont.)
• Results
  – Mean NDI scores of chiropractic patients
    • 28 before, 10 after treatment
  – Mean NDI scores of medical patients
    • 29 before, 15 after treatment
• Chiropractic care appears to be better
  – But is there enough difference to rule out chance?
  – Must perform statistical tests to find out
Hypothetical study (cont.)
[Figure: mean NDI score (axis 0 to 30) at baseline and outcome for the chiropractic and medical groups]
Is this difference enough to be meaningful?
Statistical significance
• The results of a study (i.e., the difference between groups) are unlikely to be due to chance
  – At a specified probability level, referred to as alpha (α)
  – α is the probability of incorrectly rejecting a null hypothesis
• If the results are unlikely to be due to chance, H0 is rejected and H1 is accepted
Statistical significance (cont.)
• The results must have a probability of 5% or less under H0 before H0 can be rejected
• There is still a 5% chance that H0 would be rejected when it was actually true
• Accordingly, P values must be equal to or less than 5% (0.05) for the results of a study to reach statistical significance
Statistical significance (cont.)
• The level of significance (alpha level) is not the same as the P value
  – The alpha level must be set before the study begins
  – The P value is calculated at the completion of the study and must be ≤ the alpha level in order to reach statistical significance
Statistical significance (cont.)
• Even when there is no real difference, about 1 in 20 studies will reach statistical significance by chance alone at the 0.05 level
• Fishing
  – When researchers perform a lot of statistical tests on their data
  – Increases the chance that at least one of the tests will wrongly reach statistical significance
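To see how quickly fishing inflates the risk, the chance of at least one false positive can be worked out directly. The sketch below assumes the tests are independent and that every null hypothesis is true; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance that at least one of n independent tests of true null
    hypotheses reaches significance at the given alpha level."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

With 20 tests at the 0.05 level, the chance of at least one spurious "significant" finding is roughly 64% under these assumptions.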
Type I & II errors
• Type I error (a.k.a., alpha error)
  – Rejecting a true null hypothesis
  – The probability of making a Type I error is equal to the value of α
• Type II error (a.k.a., beta error)
  – Failure to reject a false null hypothesis
  – The probability of making a Type II error is equal to the value of beta (β)
Type I & II errors (cont.)
Consequences of accepting or rejecting true and false null hypotheses
Type I & II errors (cont.)
• There is a trade-off between the likelihood of a study resulting in a Type I error versus a Type II error
• As alpha becomes smaller, the chance of making a Type I error decreases
• Whereas the chance of making a Type II error increases
  – Because it is more likely that a false H0 will not be rejected
Type I & II errors (cont.)
The 0.05 alpha level is a compromise between Type I and Type II errors
Power
• The probability of correctly rejecting a false H0
  – Related to β error
  – Power is equal to 1 - β
• Power depends on sample size, the magnitude of the difference between group means, and the value of α
Power (cont.)
• Power increases as
  – Sample size increases
    • Only to a certain extent, then it becomes a waste of resources
  – The difference between group means increases
  – α increases
• A power value of 0.80 is often sought by researchers
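The three influences on power can be explored with a rough calculation. The sketch below is an approximation chosen for illustration, not a method from the slides: it assumes two equal-sized groups, a common known SD, a two-sided test, and a normal (z) approximation that ignores the negligible opposite tail. All names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def approximate_power(mean_difference, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    # Standard error of the difference between two group means.
    se_diff = sd * math.sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(mean_difference) / se_diff - z_crit)

# Hypothetical scenario: a 5-point difference with an SD of 10.
print(round(approximate_power(5, 10, 20), 2))              # small sample
print(round(approximate_power(5, 10, 64), 2))              # larger sample, about 0.8
print(round(approximate_power(5, 10, 64, alpha=0.01), 2))  # stricter alpha
```

Under these assumptions, about 64 subjects per group are needed to reach the commonly sought power of 0.80 for this difference, and tightening α to 0.01 lowers power again.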
Power (cont.)
• Power may be calculated after a study has been completed (post hoc)
  – If low power is detected during post hoc power analysis and H0 was not rejected, it may be grounds to repeat the study using a larger sample
Confidence intervals and hypothesis testing
• If the value specified as the difference between group means in the null hypothesis is included in the 95% CI, then H0 should not be rejected
– The test is not statistically significant
• H0 states there is no difference between group means, so the specified no difference value is always zero
CIs and hypothesis testing (cont.)
• If zero is not included in the 95% CI, the null hypothesis should be rejected
  – The test is statistically significant
• CIs are becoming more prevalent in the health care literature because they convey more information than P values alone
CIs and hypothesis testing (cont.)
• Example study
  – Brinkhaus et al.
  – Acupuncture was more effective in improving pain on VAS* than no acupuncture in chronic low back pain patients
    • Difference, 21.7 mm (95% CI 13.9 to 30.0)
  – But no statistical difference between acupuncture and minimal acupuncture
    • Difference, 5.1 mm (95% CI -3.7 to 13.9)
* Visual analog scale
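The rule for reading a CI as a hypothesis test is simple enough to express as a one-line check. The helper below is a hypothetical illustration applied to the two intervals quoted above.

```python
def ci_excludes_zero(lower, upper):
    """True when a CI for a difference between means does not contain
    zero, i.e. when H0 (no difference) would be rejected."""
    return lower > 0 or upper < 0

# Acupuncture vs. no acupuncture: 95% CI 13.9 to 30.0
print(ci_excludes_zero(13.9, 30.0))   # True  -> statistically significant
# Acupuncture vs. minimal acupuncture: 95% CI -3.7 to 13.9
print(ci_excludes_zero(-3.7, 13.9))   # False -> not statistically significant
```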
Clinical significance (a.k.a., practical significance)
• Do the findings of a study really matter in clinical situations?
• Sometimes a study is statistically significant, but the findings are not important in clinical terms
  – Large studies with small differences between groups can generate statistically significant findings that are not meaningful to practitioners
Clinical significance (cont.)
• For example
  – A study found a statistically significant difference between mean Headache Disability Inventory (HDI) scores of only 10 points
  – Yet at least a 29-point change must occur from test to retest before the changes can be attributed to a patient’s treatment
    • The HDI is not very responsive to change
Commonly encountered statistical tests
• Statistical tests determine the probabilities associated with relationships in studies
  – Are the results real or merely due to chance?
• t-test, ANOVA, and chi-square are common in journal articles
  – Familiarity with these tests is helpful in the appraisal of articles
t-test
• Used to find out whether the means of two groups are statistically different
• Results are not entirely black-and-white
  – Only indicates that the means are probably different
  – Or, if the study fails to find a difference, that there is not enough evidence to say they differ
• The t-test can be used for a single group by comparing the mean with known values
t-test (cont.)
• The actual difference between the means is considered
• Also the amount of variability of the scores
  – A high degree of variability of group scores can obscure the differences between means
t-test (cont.)
• The differences between means are the same in both examples, but the variability of group scores differs
• The lower example would be much more likely to reach statistical significance because of the narrow spread
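The effect of variability can be shown numerically. The sketch below computes the equal-variance (pooled) two-sample t statistic for two invented pairs of groups that share the same 5-point difference in means but differ in spread; the data and function name are hypothetical.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff

# Same difference between means (15 vs. 10), wide spread of scores.
wide_a, wide_b = [5, 10, 15, 20, 25], [0, 5, 10, 15, 20]
# Same difference between means, narrow spread of scores.
narrow_a, narrow_b = [14, 14.5, 15, 15.5, 16], [9, 9.5, 10, 10.5, 11]

print(round(pooled_t_statistic(wide_a, wide_b), 2))      # 1.0
print(round(pooled_t_statistic(narrow_a, narrow_b), 2))  # 10.0
```

With 8 degrees of freedom the two-tailed critical value at the 0.05 level is about 2.31, so only the narrow-spread comparison would reach statistical significance.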
Assumptions of the t-test
• The data should be normal and involve interval or ratio measurement
• Groups should be independent
• The variances of groups should be equal
• When the sample size is large enough (about 30 subjects) violations of these assumptions are less important
Alternatives to the t-test
• The t-test for unequal variances
• Non-parametric tests for use with skewed data
  – Mann-Whitney U test
  – Wilcoxon test
Paired t-test
• Groups are dependent
  – The same subjects are in each of the groups
    • e.g., repeated measures studies
  – Or subjects are matched
    • e.g., twins or when subjects are very much alike
Analysis of variance (ANOVA)
• Used to compare means when more than two groups are involved
• Repeating t-tests increases the probability of producing a Type I error
• ANOVA can only compare one outcome variable
  – Univariate
• MANOVA (multivariate ANOVA) overcomes this limitation
ANOVA (cont.)
• ANOVA provides information about
  – Whether there are any significant differences among the group means
  – Whether any of the particular groups differ from each other
  – Whether the differences are relatively big or small
Assumptions of ANOVA test
• Normally distributed data
• Groups should be independent
• Variances of groups should be equal
• If not, a nonparametric test should be used
  – Kruskal-Wallis test
  – Friedman test
Between and within-group variance
The means of 3 groups are compared
Comparison tests
• Compare the group pairs (pairwise)
• Common comparison tests include
  – Tukey
    • Used if the groups are of equal size
  – Bonferroni
    • For both equal and unequal group sizes
  – Scheffé
    • Is very conservative to minimize the risk of Type I error
Comparison test results
Tukey HSD
(I) Type of care | (J) Type of care | Difference (I-J) | Std. Error | P value | 95% Confidence Interval
Chiro | MD | 6.87500* | 1.51677 | .001 | 3.0519 to 10.6981
Chiro | PT | 7.25000* | 1.51677 | .000 | 3.4269 to 11.0731
MD | Chiro | -6.87500* | 1.51677 | .001 | -10.6981 to -3.0519
MD | PT | .37500 | 1.51677 | .967 | -3.4481 to 4.1981
PT | Chiro | -7.25000* | 1.51677 | .000 | -11.0731 to -3.4269
PT | MD | -.37500 | 1.51677 | .967 | -4.1981 to 3.4481
* The mean difference is significant at the .05 level.
Chi-square test
• Used to test hypotheses involving categorical data
• There are 2 versions
  – Chi-square goodness of fit
    • Determines if observed frequencies of occurrence differ from what would be expected by chance
  – Chi-square test of independence
    • Tests to see if frequencies for one category differ significantly from those of another category
Chi-square goodness of fit
• Called the goodness of fit test because it tests whether observed frequencies “fit” the expected frequencies
• For example
  – If a sample of Americans found 60 males and 40 females, would that be statistically significantly different from what would normally be expected (50/50)?
Goodness of fit example (cont.)
• A chi-square table is used to see if the results are statistically significant
  – Only if the critical value is exceeded (3.84 in this case)
• df is the number of categories minus 1
• The calculated Χ2 is 4
  – So, the sample is different from what was expected
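The value of 4 can be reproduced from the 60/40 example. The sketch below sums (observed - expected)² / expected over the categories; the P value helper is an added illustration that is valid only for one degree of freedom, and both function names are hypothetical.

```python
import math

def chi_square_goodness_of_fit(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_one_df(chi_square):
    """Upper-tail P value for a chi-square statistic with exactly 1 df
    (a 1-df chi-square is the square of a standard normal variable)."""
    return math.erfc(math.sqrt(chi_square / 2))

stat = chi_square_goodness_of_fit([60, 40], [50, 50])
print(stat)                            # 4.0
print(round(p_value_one_df(stat), 3))  # 0.046
print(stat > 3.84)                     # True -> exceeds the critical value
```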
Chi-square test of independence
• Frequencies of one variable are compared with another to see if they differ significantly
• A 2 X 2 contingency table (a.k.a., cross-tabulation table) is used
A 2 X 2 contingency table
Variable 1 (rows) by Variable 2 (columns)
             | Yes | No  | Row Total
Yes          | a   | b   | a+b
No           | c   | d   | c+d
Column Total | a+c | b+d | a+b+c+d (Grand Total)
Example hypothetical study
• Two groups of patients are treated using different spinal manipulation techniques
  – Gonstead vs. Diversified
• The presence or absence of pain after treatment is the outcome measure
• Two categories
  – Technique used
  – Pain after treatment
Gonstead vs. Diversified example - Results
Technique (rows) by Pain after treatment (columns)
             | Yes | No | Row Total
Gonstead     | 9   | 21 | 30
Diversified  | 11  | 29 | 40
Column Total | 20  | 50 | 70 (Grand Total)
9 out of 30 (30%) still had pain after Gonstead treatment and 11 out of 40 (27.5%) still had pain after Diversified, but is this difference statistically significant?
Gonstead vs. Diversified example (cont.)
• Find df and then consult a Χ2 table to see if statistically significant
  – df = (number of categories for variable 1 - 1) × (number of categories for variable 2 - 1)
• There are two categories for each variable in this case, so df = 1
• Critical value at the 0.05 level and one df is 3.84
  – The calculated Χ2 is about 0.05, which is well below the critical value
  – Therefore, Χ2 is not statistically significant
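The Gonstead vs. Diversified result can be checked by computing the statistic from the contingency table. The sketch below derives each expected frequency as (row total × column total) / grand total and applies no continuity correction; the function name is hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Gonstead, Diversified. Columns: pain yes, pain no.
stat, df = chi_square_independence([[9, 21], [11, 29]])
print(round(stat, 4), df)  # 0.0525 1
print(stat > 3.84)         # False -> not statistically significant
```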
Χ2 required conditions
• Observations must be independent
  – The total number of observed frequencies should not be higher than the number of subjects in the study
• No small expected frequencies
  – Expected frequencies less than one, or less than five in more than 20 percent of cells, are too small
Χ2 requirements (cont.)
  – Fisher's exact test
    • An alternative to the chi-square test that is used when expected frequencies are too small
    • All that is needed is at least one data value in each row and one data value in each column
• No extremely small or extremely large samples
  – Extremely small samples may overlook obvious false null hypotheses and extremely large samples may identify trivial differences
Correlation
• A measure of mathematical relationships that may exist between two or more variables
  – i.e., if one variable increases or decreases, the other one will also increase or decrease by a specific amount
• Pearson’s correlation coefficient (r) is commonly used
Correlation (cont.)
• Correlation coefficient values range from -1 to +1
  – +1 = perfect positive correlation
  – -1 = perfect negative correlation
• The closer r is to +1 or -1, the more closely variables are related
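For readers who want to see where r comes from, a minimal sketch of Pearson's coefficient follows. The data are invented for illustration and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]
print(round(pearson_r(xs, [2, 4, 6, 8]), 6))  # 1.0  (perfect positive)
print(round(pearson_r(xs, [8, 6, 4, 2]), 6))  # -1.0 (perfect negative)
print(round(pearson_r(xs, [2, 1, 4, 3]), 6))  # 0.6  (weaker positive)
```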
No cause-and-effect
• A strong relationship between two variables does not mean that one caused the other to change
• For instance, there is a strong relationship between coffee drinking and developing lung cancer
  – Actually, heavy coffee drinkers tend to be heavy smokers
  – Smoking is the actual cause
Scatterplots
• An X-Y graph with symbols that represent the values of two variables
[Figure: scatterplot with a regression line]
Examples
[Figures: a positive correlation slopes upward; a negative correlation slopes downward]
Examples (cont.)
[Figure: no correlation]
Scatterplots (cont.)
• Show the form, direction, and strength of the relationship between variables
• Its form may be linear, but can also be curvilinear or nonlinear
• A correlation weakens after a certain point when the data are curvilinear
Curvilinear example
• As people age they get stronger to a certain point, but as they continue to age, they eventually begin to weaken
Outliers
• Extreme values that are located far away from the group of data on a scatterplot
• Outliers can strongly influence the slope of the regression line
  – And the value of the correlation coefficient
• Authors should adequately discuss outliers
  – Why they occurred
  – How they were dealt with
Outliers (cont.)
• Outliers are obvious on a scatterplot
[Figure: scatterplot with an outlier labeled]
Coefficient of determination
• Is the correlation coefficient squared
  – Symbolized as r2
• Negative values are not possible (because it is squared)
  – Ranging from 0 to 1
• Denotes how much of the variation in one variable can be explained by the other variable
Coefficient of determination
• Example
  – If a study on the relationship between the amount lifted at work and the incidence of low-back pain reported r2 as 0.65
  – One could say that 65% of the variability in the incidence of low-back pain was explained by the amount workers lifted
  – Other factors are responsible for the remaining 35% variability
Regression
• Regression analysis
  – Calculation of the line of best fit passing through a set of data
  – An equation is generated that describes the line of best fit (a.k.a., least squares line)
• Using the equation, predictions can be made about the direction and amount by which variables change
Regression (cont.)
• A regression line is fitted by minimizing the sum of squared deviations of the data points from the line
• The regression equation is Y = a + bX, where
  – a is the Y intercept
  – b is the slope of the line
  – X is the value of the (predictor) variable
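A least squares fit for Y = a + bX can be sketched with the standard closed-form formulas: b is the sum of cross-deviations divided by the sum of squared X deviations, and a is chosen so the line passes through the point of means. The data and function name below are invented for illustration.

```python
def fit_least_squares_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # Y intercept
    return a, b

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(a + b * 6, 2))       # 5.8 -> predicted Y when X = 6
```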
Regression (cont.)
[Figure: regression line on X-Y axes showing the intercept a and the slope b]
The value of Y can be calculated from a given value of X
The line is positioned so that the distances of all deviations are as short as possible
The regression line
Multiple regression
• Frequently outcomes are affected by more than one predictor variable
• The multiple regression equation is similar to simple regression, but with more than one value for b. Thus, the equation is Y = a + b1X1 + b2X2 + ... + bkXk, where
  – X1 is the first predictor variable, X2 is the second, and so on up to Xk for as many predictor variables as are involved
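Once the coefficients are known, evaluating the multiple regression equation is a matter of multiplying each predictor by its b and adding the intercept. The sketch below only evaluates the equation; it does not estimate the coefficients, and the coefficient values and function name are invented for illustration.

```python
def predict_outcome(intercept, slopes, predictors):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... + bk*Xk."""
    if len(slopes) != len(predictors):
        raise ValueError("need exactly one slope per predictor variable")
    return intercept + sum(b * x for b, x in zip(slopes, predictors))

# Hypothetical model with two predictors: a = 1.0, b1 = 2.0, b2 = 0.5.
print(predict_outcome(1.0, [2.0, 0.5], [3.0, 4.0]))  # 9.0
```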