Statistics for the Terrified
Paul F. Cook, PhD, Center for Nursing Research

Page 1:

Statistics for the Terrified

Paul F. Cook, PhD
Center for Nursing Research

Page 2:

What Good are Statistics?

• “How big?” (“how much?”, “how many?”)
  – Descriptive statistics, including effect sizes
  – Describe a population based on a sample
  – Help you make predictions

• “How likely?”
  – Inferential statistics
  – Tell you whether a finding is reliable or probably just due to chance (sampling error)

Page 3:

Answering the 2 Questions

• Inferential statistics tell you “how likely”
  – Can’t tell you how big
  – Can’t tell you how important
  – “Success” is based on a double negative

• Descriptive statistics tell you “how big”
  – Cohen’s d = (x̄1 – x̄2) / SDpooled (sketched in code below)
  – Pearson r (or other correlation coefficient)
  – Odds ratio
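As a concrete illustration, here is a minimal Python sketch of the Cohen’s d formula above; the two samples are made-up numbers, purely hypothetical:

```python
import numpy as np

def cohens_d(x1, x2):
    """Cohen's d: difference in means divided by the pooled SD."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    # Pooled SD weights each group's variance by its degrees of freedom
    sd_pooled = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sd_pooled

# Hypothetical example: intervention vs. control scores
print(cohens_d([5.1, 6.2, 5.8, 6.5], [4.2, 4.9, 5.0, 4.4]))
```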

Page 4:

How Big is “Big”?

• Correlations
  – 0 = no relationship, 1 = upper limit
  – + = positive effect, – = negative effect
  – .3 for small, .5 for medium, .7 for large
  – r² = “percent of variability accounted for”

• Cohen’s d
  – Means are how many SDs apart? 0 = no effect
  – .5 for small, .75 for medium, 1.0 for large

• Odds Ratio
  – 1 = no relationship, <1 = negative effect, >1 = positive effect

• All effect size statistics are interchangeable!
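One way to see that interchangeability is through the standard conversion formulas linking r and d. These formulas assume two equal-sized groups and are not from the slides:

```python
import math

def r_to_d(r):
    """Convert a correlation to Cohen's d (assumes two equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert Cohen's d back to a correlation."""
    return d / math.sqrt(d ** 2 + 4)

print(r_to_d(0.5))   # a "medium" r maps onto d ≈ 1.15
print(d_to_r(0.75))  # a "medium" d maps onto r ≈ 0.35
```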

Page 5:

How Likely is “Likely”? - Test Statistics

A ratio of “signal” vs. “noise”

z = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)

“signal”: AKA “between-groups variability” or “model”
“noise”: AKA “within-groups variability” or “error”
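A minimal Python sketch of this signal-to-noise ratio (sample data hypothetical; scipy is assumed for the p-value):

```python
import numpy as np
from scipy import stats

def two_sample_z(x1, x2):
    """Signal (mean difference) over noise (standard error of the difference)."""
    x1, x2 = np.asarray(x1), np.asarray(x2)
    se = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
    z = (x1.mean() - x2.mean()) / se
    p = 2 * stats.norm.sf(abs(z))  # two-tailed p-value
    return z, p

print(two_sample_z([5.1, 6.2, 5.8, 6.5], [4.2, 4.9, 5.0, 4.4]))
```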

Page 6:

How do We Get the p-value?

The standard normal distribution:

[Figure: normal curve with 2.5% of the area in each tail, beyond z = –1.96 and z = +1.96]

|z| > 1.96 is the critical value for p < .05 (the 5% splits half above, half below: always use a 2-tailed test unless you have a reason not to)
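A quick sketch of where 1.96 comes from, using scipy’s normal distribution (the observed z here is hypothetical):

```python
from scipy import stats

alpha = 0.05
# Two-tailed critical value: put alpha/2 in each tail
print(stats.norm.ppf(1 - alpha / 2))   # ≈ 1.96

# Two-tailed p-value for a hypothetical observed z
z_obs = 2.3
print(2 * stats.norm.sf(abs(z_obs)))   # ≈ .021, below .05
```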

Page 7:

Hypothesis Testing – 5 Steps

1. State null and alternative hypotheses
2. Calculate a test statistic
3. Find the corresponding p-value
4. “Reject” or “fail to reject” the null hypothesis (your only 2 choices)
5. Draw substantive conclusions

Red = statistics, blue = logic, black = theory
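A minimal sketch walking through the five steps with scipy’s independent-samples t-test (sample scores hypothetical):

```python
from scipy import stats

treatment = [6.1, 5.8, 7.0, 6.4, 6.9]
control = [5.2, 5.6, 5.1, 5.9, 5.4]

# 1. H0: the group means are equal; H1: they differ
# 2. Calculate a test statistic
t, p = stats.ttest_ind(treatment, control)
# 3. Find the corresponding p-value (returned above)
# 4. "Reject" or "fail to reject" the null hypothesis
decision = "reject H0" if p < 0.05 else "fail to reject H0"
# 5. Substantive conclusions are a matter of theory, not statistics
print(t, p, decision)
```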

Page 8:

How Are the Questions Related?

• “Large” z = a large effect (d) and a low p

• But z depends on sample size; d does not
  – Every test statistic is the product of an effect size and the sample size
  – Example: φ² = χ² / N, so χ² = φ² × N

• A significant result (power) depends on three things (sketched in code below):
  – What alpha level (α) you choose
  – How large an effect (d) there is to find
  – What sample size (n) is available
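statsmodels’ power calculator (an assumed dependency, not mentioned on the slides) shows how alpha, d, and n trade off:

```python
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
# n per group needed to detect d = 0.5 at alpha = .05 with 80% power
print(power.solve_power(effect_size=0.5, alpha=0.05, power=0.80))  # ≈ 64
# The same effect is easier to find with a looser alpha
print(power.solve_power(effect_size=0.5, alpha=0.10, power=0.80))  # ≈ 50
```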

Page 9:

What Type of Test?

• N-level (nominal) predictor, 2 groups: t-test or z-test
• N-level predictor, 3+ groups: ANOVA (F-test)
• I/R-level (interval/ratio) predictor: correlation/regression
• N-level dependent variable: χ² or logistic regression

Correlation answers the “how big” question, but can convert to a t-test value to also answer the “how likely” question
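That conversion is the standard r-to-t formula, t = r·√(n – 2) / √(1 – r²); a minimal sketch (the r and n values are hypothetical):

```python
import math
from scipy import stats

def r_to_t(r, n):
    """Turn a correlation ("how big") into a t statistic ("how likely")."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)  # two-tailed
    return t, p

print(r_to_t(0.5, n=30))  # r = .5 with 30 cases: t ≈ 3.06, p ≈ .005
```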

Page 10:

The F-test

• ANOVA = “analysis of variance”
• Compares variability between groups to variability within groups
• Signal vs. noise

F = MSb / MSw = (avg. difference among means) / (avg. variability within each group)
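scipy implements this directly as f_oneway; a minimal sketch with hypothetical scores for three groups:

```python
from scipy import stats

g1 = [4.1, 5.0, 4.6, 4.8]
g2 = [5.9, 6.3, 5.5, 6.0]
g3 = [4.9, 5.2, 5.1, 4.7]

# Between-groups variability over within-groups variability
f, p = stats.f_oneway(g1, g2, g3)
print(f, p)
```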

Page 11:

Omnibus and Post Hoc Tests

• The F-test compares 3+ groups at once
• Benefit: avoids “capitalizing on chance”
• Drawback: can’t see individual differences
• Solution: post hoc tests
  – Bonferroni correction for 1-3 comparisons (uses an “adjusted alpha” of .05 divided by the number of comparisons, e.g., .025 for 2 comparisons; demonstrated below)
  – Tukey test for 4+ comparisons
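A minimal sketch of the Bonferroni idea, reusing the three hypothetical groups from above:

```python
from itertools import combinations
from scipy import stats

groups = {"g1": [4.1, 5.0, 4.6, 4.8],
          "g2": [5.9, 6.3, 5.5, 6.0],
          "g3": [4.9, 5.2, 5.1, 4.7]}

pairs = list(combinations(groups, 2))
adjusted_alpha = 0.05 / len(pairs)  # Bonferroni: split alpha across comparisons

for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(a, b, round(p, 4), verdict)
```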

Page 12:

F and Correlation (eta-squared)

eta² = SSb / SStotal = % of total variability that is due to the IV (i.e., R-squared)

The F-Table:

Source    SS                df               MS          F           p
Between   SSb               dfb              SSb / dfb   MSb / MSw   .05
Within    SSw               dfw              SSw / dfw
Total     SSb + SSw         dfb + dfw
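Computing the sums of squares and eta² by hand makes the table concrete; a minimal numpy sketch with the same hypothetical groups:

```python
import numpy as np

groups = [np.array([4.1, 5.0, 4.6, 4.8]),
          np.array([5.9, 6.3, 5.5, 6.0]),
          np.array([4.9, 5.2, 5.1, 4.7])]

grand_mean = np.concatenate(groups).mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

eta_squared = ss_between / (ss_between + ss_within)
print(eta_squared)  # share of the total variability that is due to the IV
```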

Page 13:

Correlation Seen on a Graph

[Figure: three scatterplots, titled “Moderate Correlation”, “Same Direction, Weak Correlation”, and “Same Direction, Strong Correlation”]

Page 14:

Regression and the F-test

[Figure: scatterplot with the line of best fit, which minimizes the sum of squared residuals; the gap between each actual value and its predicted value is error variance (residual), and the spread of predicted values around the mean is model variance (predicted)]

F = (avg. SS for model variance) / (avg. SS for error variance)
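A minimal sketch computing that regression F from the model and error sums of squares (data hypothetical; numpy’s polyfit stands in for the line of best fit):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.7])

slope, intercept = np.polyfit(x, y, 1)   # line of best fit
predicted = slope * x + intercept

ss_model = ((predicted - y.mean()) ** 2).sum()  # model variance
ss_error = ((y - predicted) ** 2).sum()         # error variance (residuals)

df_model, df_error = 1, len(y) - 2
f = (ss_model / df_model) / (ss_error / df_error)
print(f, stats.f.sf(f, df_model, df_error))
```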

Page 15:

Parametric Test Assumptions

• Tests have restrictive assumptions:
  – Normality
  – Independence
  – Homogeneity of variance
  – Linear relationship between IV and DV

• If assumptions are violated, use a nonparametric alternative test (see the sketch below):
  – Mann-Whitney U instead of t
  – Kruskal-Wallis H instead of F
  – Chi-square for categorical data
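A minimal scipy sketch of the first two alternatives (the outlier in g1 is a hypothetical reason to doubt normality):

```python
from scipy import stats

g1 = [4.1, 5.0, 4.6, 4.8, 9.9]  # the outlier makes normality doubtful
g2 = [5.9, 6.3, 5.5, 6.0, 5.8]
g3 = [5.1, 4.9, 5.3, 5.0, 5.2]

print(stats.mannwhitneyu(g1, g2))   # instead of the t-test
print(stats.kruskal(g1, g2, g3))    # instead of the F-test
```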

Page 16:

Chi-Square

• The basic nonparametric test
• Also used in logistic regression, SEM
• Compares observed values (model) to observed minus predicted values (error)
• Signal vs. noise again
• Easily converts to phi coefficient: φ = √(χ² / N)

χ² = Σ (Fo – Fe)² / Fe

Page 17:

2-by-2 Contingency Tables
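A minimal sketch for a 2-by-2 table, using scipy’s chi2_contingency (the counts are hypothetical):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2-by-2 table: rows = treatment/control, columns = improved/not
table = np.array([[30, 10],
                  [18, 22]])

chi2, p, dof, expected = chi2_contingency(table, correction=False)
phi = np.sqrt(chi2 / table.sum())  # effect size: phi = sqrt(chi2 / N)
print(chi2, p, phi)
```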

Page 18:

Dependent Observations

• Independence is a major assumption of parametric tests (but not nonparametrics)

• Address non-independence by collapsing scores to a single observation per participant:
  – Change score = posttest score – pretest score
  – Can calculate SD (variability) of change scores

• Determine if the average change is significantly different from zero (i.e., “no change”; sketched below):
  – t = (average change – zero) / (SDchange / √n)
  – Nonparametric version: Wilcoxon signed-rank test
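A minimal sketch of the change-score approach (pre/post values hypothetical):

```python
import numpy as np
from scipy import stats

pre = np.array([5.0, 6.1, 4.8, 5.5, 6.0])
post = np.array([5.9, 6.8, 5.1, 6.2, 6.4])

change = post - pre                    # one observation per participant
print(stats.ttest_1samp(change, 0.0))  # is the average change nonzero?
print(stats.wilcoxon(change))          # nonparametric version
```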

Page 19:

ANCOVA / Multiple Regression

• Statistical “control” for confounding variables: rules out competing explanations

• Method adds a “covariate” to the model:
  – That variable’s effects are “partialed out”
  – Remaining effect is “independent” of the confound

• One important application: ANCOVA (sketched below)
  – Test for post-test differences between groups
  – Control for pre-test differences

• Multiple regression: same idea, I/R-level DV
  – Stepwise regression finds the “best” predictors
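A minimal ANCOVA-as-regression sketch with statsmodels (an assumed dependency; the trial data are hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical trial: group is 0 = control, 1 = intervention
df = pd.DataFrame({
    "group": [0, 0, 0, 0, 1, 1, 1, 1],
    "pre":   [5.0, 6.1, 4.8, 5.5, 5.2, 6.0, 4.9, 5.6],
    "post":  [5.1, 6.0, 5.0, 5.4, 6.0, 6.9, 5.8, 6.3],
})

# Post-test differences between groups, controlling for pre-test scores
model = smf.ols("post ~ group + pre", data=df).fit()
print(model.params["group"], model.pvalues["group"])
```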

Page 20:

[Figure: Venn diagram with a large circle for the dependent variable, overlapped by circles for Independent Variable #1 and Independent Variable #2. The regions are labeled “unique variability” for IV1, “unique variability” for IV2, “shared variability” for IV1 & IV2, and unexplained variability remaining for the dependent variable.]

The large circle represents all of the variability that exists in the dependent variable. The overlap with IV1 is the amount of variability in the DV that can be accounted for by its association with IV1, and the overlap with IV2 is the amount that can be accounted for by its association with IV2.

The “unique variability” is the part of the variability in the DV that can be accounted for only by this IV (and not by any other IV). The “shared variability” is the part of the variability in the DV that can be accounted for by more than one IV.

When two IVs account for the same variability in the DV (i.e., when there is shared variability), they are “multicollinear” with each other. What’s left over (variability in the DV not accounted for by any predictor) is considered “error”: random (i.e., unexplained) variability.

Page 21:

[Figure: the same Venn diagram of the DV, IV1, and IV2]

The percentage of variability in the DV that can be accounted for by an IV is the definition of R², the coefficient of determination. This graph can also be used to show the percentage of the variability in the DV that can be accounted for by each IV.

“Total” R² for IV1 & IV2 together = (all unique variability for the IVs, not including any shared) / (total SS for the DV)

If the numerator is 30% of the denominator, then the total R² is .30.

Page 22:

[Figure: the same Venn diagram, highlighting the part of IV1’s overlap with the DV that is not shared with IV2]

A related concept is the idea of a “semipartial R²”, which tells you what % of the variability in the DV can be accounted for by each IV on its own, not counting any shared variability. In other words, the semipartial R² is the percentage of variability in the DV that can be accounted for by one individual predictor, independent of the effects of all of the other predictors.

Semipartial R² for IV1 = (Type III SS for IV1) / (total SS for the DV)

If the Type III SS for IV1 is 20% of the total SS for the DV, then the semipartial R² for IV1 is .20.
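One way to see a semipartial R² in practice is to compare the full regression model against the model with that predictor dropped; the drop in R² is the predictor’s unique share. A minimal statsmodels sketch (data hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "dv":  [3.1, 4.0, 5.2, 4.8, 6.1, 5.5, 6.8, 7.2],
    "iv1": [1.0, 1.5, 2.2, 2.0, 3.1, 2.8, 3.5, 3.9],
    "iv2": [2.0, 2.1, 2.9, 3.2, 3.0, 3.6, 4.1, 4.0],
})

full = smf.ols("dv ~ iv1 + iv2", data=df).fit()
reduced = smf.ols("dv ~ iv2", data=df).fit()

# iv1's unique share of the DV's variability, shared part excluded
print(full.rsquared - reduced.rsquared)
```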

Page 23:

Lying with Statistics

• Does the sample reflect your population?
• Is the IV clinically reasonable?
• Was the right set of controls in place?
• Is the DV the right way to measure the outcome?
• Significant p-value = probably replicable
  – APA task force: always report the exact p-value
• Large effect size = potentially important
  – APA, CONSORT guidelines: always report effect size
• Clinical significance still needs evaluation

Page 24:

What We Cover in Quant II

• Basic issues
  – Missing data, data screening and cleaning
  – Meta-analysis
  – Factorial ANOVA and interaction effects
• Multivariate analyses
  – MANOVA
  – Repeated-Measures ANOVA
• Survival analysis
• Classification
  – Logistic regression
  – Discriminant Function Analysis
• Data simplification and modeling
  – Factor Analysis
  – Structural Equation Modeling
• Intensive longitudinal data (hierarchical linear models)
• Exploratory data analysis (cluster analysis, CART)