TRANSCRIPT
Karen Grace-Martin
Choose the Right Statistical Test with Answers to Four Key Questions
What You’ll Learn Today:
• The Four Questions you must answer to choose an appropriate statistical method
• How they come together to help you narrow it way down
• Our focus is on choosing the right type of model or test, not the details of that model or test
• Where to get help with the process
Before we begin…
There will be an offer at the end.
http://www.theanalysisfactor.com/membership/
$47/month or $497/year
We have a special bonus today
©2018 Karen Grace-Martin | https://TheAnalysisFactor.com
Don’t Worry!
You will learn a lot regardless of whether or not you buy anything.
How This Will Work
• Webinar is ~ 1.5 hr
• Questions will be answered as we go
• There will be a recording and handout (more instructions at the end)
About Karen Grace-Martin
• Started out as a psychology researcher
• Worked with thousands of researchers
• Built The Analysis Factor to give researchers an opportunity to get statistical support that’s accessible
The Problem
What Makes a Statistical Method Right?
1. The statistical test or model answers the research question or sets you up to do so in another analysis
2. The data meet all statistical assumptions of the test
Four Questions to Answer for Each Analysis
1. What is your research question?
2. What is the design?
3. Which variables will you use to answer the research question and what is the scale of measurement of each?
4. Are there any data issues?
The 14 Steps
Part 1: Plan
1. Write out research questions in theoretical and operational terms
2. Design the study or define the design
3. Choose the variables and determine their level of measurement
4. Write an analysis plan
5. Calculate sample size estimates
Part 2: Prepare and explore
6. Collect, code, enter, and clean data
7. Create new variables
8. Run univariate and bivariate statistics
9. Run an initial model
Part 3: Refine the model
10. Refine predictors and check model fit
11. Check assumptions
12. Check for and resolve data issues
13. Interpret results
14. Write up results
An Example
I want to do the tests on a measure of Gestational Diabetes in conjunction with Iron and/or Vitamin C supplementation, so:
Dependent Variable:
• Gestational Diabetes (0 = no; 1 = yes)
Independent Variables:
• Iron Supplementation (0 = no; 1 = yes)
• Vitamin C Supplementation (0 = no; 1 = yes)
So from what I understand, I should be able to do a chi-squared test, since they're all categorical variables? Is that correct, or did I miss something big?
Also, I need to validate those against a few confounding variables, namely:
• Age (continuous)
• Body Mass Index (continuous)
• Parity (continuous)
Question 1. What is your Research Question?
Step 1: Write out the Research Questions in Theoretical and Operational Terms
Theoretical Question:
Do levels of Iron and Vitamin C affect the likelihood of gestational diabetes?
Operational Question:
Do women who have received Iron supplements, Vitamin C supplements, or both, during the first trimester of pregnancy have a different likelihood of developing gestational diabetes at any point during the pregnancy, controlling for age, BMI, and parity, compared to those who received a standard prenatal vitamin?
What We Get from the Operational Research Question
Key Info:
1. Comparing four groups on a dependent variable
2. Controlling for covariates
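Where do the four groups come from? Crossing the two yes/no supplementation variables gives every combination. A minimal sketch; the group labels are our own, and we assume the "neither supplement" group is the standard-prenatal-vitamin comparison group from the operational question:

```python
from itertools import product

# Coding from the example: 0 = no, 1 = yes for each supplement.
IRON = (0, 1)
VITAMIN_C = (0, 1)

# Hypothetical labels; "neither supplement" is assumed to be the standard-prenatal-vitamin group.
LABELS = {
    (0, 0): "standard prenatal vitamin only",
    (1, 0): "iron only",
    (0, 1): "vitamin C only",
    (1, 1): "iron and vitamin C",
}

def crossed_groups(iron_levels, vitamin_c_levels):
    """List every combination of the two factors, as in a fully crossed design."""
    return [(i, c, LABELS[(i, c)]) for i, c in product(iron_levels, vitamin_c_levels)]

for iron, vit_c, label in crossed_groups(IRON, VITAMIN_C):
    print(f"iron={iron} vitamin_c={vit_c}: {label}")
```

The two 0/1 variables plus their product (iron × vitamin C) identify these same four groups, which is why a model for this question needs an interaction term rather than two separate main effects.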
If the research question contains… → the statistical method needs to be able to include:
• “predicts,” “relationship between,” “affects” → Regression modeling, usually
• “controlling for,” “confounding variables,” “above and beyond” → Control variables
• “when…,” “in the presence of,” “moderate” → Interactions
• Group comparisons → Categorical predictors
• “mediates the relationship between,” “affects M, which in turn affects Y” → Mediation or path analysis
• “change over time” → Repeated measures
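The mapping above works like a phrase lookup. A toy sketch of that idea; the cue phrases are our own paraphrase of the slide (bare "when" is left out because it is too common to match usefully), and a keyword match is only a prompt for judgment, never a substitute for it:

```python
# Hypothetical cue phrases paraphrasing the slide; real questions need human judgment.
CUES = {
    "Regression modeling": ("predicts", "relationship between", "affects"),
    "Control variables": ("controlling for", "confounding", "above and beyond"),
    "Interactions": ("in the presence of", "moderate"),  # bare "when" is too common to match
    "Categorical predictors": ("compared to", "compared with", "groups"),
    "Mediation or path analysis": ("mediates", "which in turn"),
    "Repeated measures": ("change over time",),
}

def required_features(question: str) -> list:
    """Return the method features whose cue phrases appear in the question."""
    text = question.lower()
    return [feature for feature, phrases in CUES.items()
            if any(phrase in text for phrase in phrases)]

OPERATIONAL_QUESTION = (
    "Do women who have received Iron supplements, Vitamin C supplements, or both, "
    "during the first trimester of pregnancy have a different likelihood of developing "
    "gestational diabetes at any point during the pregnancy, controlling for age, BMI, "
    "and parity, compared to those who received a standard prenatal vitamin?"
)

print(required_features(OPERATIONAL_QUESTION))
# ['Control variables', 'Categorical predictors']
```

Applied to the operational question, the two features it flags match the Key Info: a group comparison and control for covariates.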
What Makes Question 1 Important to Answer
• Translating from theory to operation
• Knowing what tests are available helps
• Knowing what variables you can actually collect will help narrow it down
What Makes Question 1 Difficult to Answer
• Not all research questions are testable
• This will directly affect the design, the variables, and the analysis
Question 2. What is the design?
Step 2. Design the study or define the design
Step 2: Design Elements
• Sampling: simple, stratified, convenience, matching, clustering
• Assignment of subjects to conditions/predictors: random assignment, observed
• Restrictions on randomization: blocking, order effects
• Co-occurrence of conditions: nested and crossed effects
• Independence: matching, clustering, repeated measures, longitudinal
Step 2: Design Names are Not Helpful
The detail we need:
• Sampling
• Assignment of subjects to conditions
• Restrictions on randomization
• Co-occurrence of conditions
• Independence
• Selection of factors
Design names:
• Stratified Survey
• Randomized Control Trial
• Case-Control Study
• Split Plot Design
• Observational Design
• Latin Square Design
• Repeated Measures Design
• You need to consider logistical constraints now
• Different research questions can have different designs in the same study
• Names of designs aren’t helpful
• Design issues can get easily complicated
What Makes Question 2 Important to Answer
What Makes Question 2 Difficult to Answer
• Some design decisions are very logical but make the analysis much more difficult
• Failing to account for design issues in the analysis will lead to inaccurate results
• The design affects which research questions you can test
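To see why ignoring clustering (for example, patients nested within doctors) leads to inaccurate results, one common rough guide is Kish's design effect, DEFF = 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. A minimal sketch assuming equal-sized clusters; the 400 patients, 20 doctors, and ICC of 0.05 are made-up numbers:

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_sample_size(n: int, avg_cluster_size: float, icc: float) -> float:
    """Roughly how many independent observations a clustered sample is worth."""
    return n / design_effect(avg_cluster_size, icc)

# Made-up numbers: 400 patients, 20 per doctor (so 20 doctors), ICC = 0.05.
n, m, icc = 400, 20, 0.05
print(round(design_effect(m, icc), 2))             # 1.95
print(round(effective_sample_size(n, m, icc), 1))  # 205.1
```

Even a small ICC nearly halves the effective sample size here, so an analysis that treated all 400 women as independent would report standard errors that are too small.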
Step 2: Define The Design
Missing Info:
1. Are the iron and vitamin C conditions crossed?
   Assume yes.
2. Are patients nested within doctors, or randomly sampled?
   Assume nested within doctors.
Question 3. Which variables will you use to answer the research question and what is the scale of measurement of each?
Step 3. Choose the variables and determine their level of measurement
Dependent Variable Types
Continuous, unbounded, interval: Linear Model
Binary: Logistic or Probit Regression
Multinomial: Multinomial Logistic
Ordinal: Ordinal Logistic
Discrete counts: Poisson Family – Poisson, Negative Binomial
Proportion: Logistic, Tobit, Beta
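The list above is essentially a lookup from the dependent variable's type to candidate model families. A minimal sketch; the type labels are our own shorthand, not a standard vocabulary:

```python
# Lookup paraphrasing the slide; keys are our own shorthand for DV types.
MODEL_FOR_DV = {
    "continuous": ["linear model"],
    "binary": ["logistic regression", "probit regression"],
    "multinomial": ["multinomial logistic regression"],
    "ordinal": ["ordinal logistic regression"],
    "count": ["Poisson regression", "negative binomial regression"],
    "proportion": ["logistic regression", "Tobit regression", "beta regression"],
}

def candidate_models(dv_type: str) -> list:
    """Return candidate model families for a DV type, or raise if the type is unknown."""
    try:
        return MODEL_FOR_DV[dv_type.lower()]
    except KeyError:
        raise ValueError(f"unknown DV type: {dv_type!r}") from None

# The example's outcome (gestational diabetes, 0/1) is binary.
print(candidate_models("binary"))
```

The lookup only narrows the family; the design and data issues still decide the final model.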
Independent Variable Types
1. Numerical
2. Categorical
Other Types of Predictors
1. Covariates
2. Interactions
3. Polynomials
4. Mediators
• Data sets often contain (or could contain) multiple versions of the same variable
• Part of the analysis may be about creating variables
• The same variable can be considered different levels of measurement in different contexts
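Parity from the example shows how one variable can take several levels of measurement. It is listed as continuous, but it is a count of previous births, and depending on the question it could also be an ordinal category or a yes/no indicator. A minimal sketch; the 0 / 1 / 2+ cutpoints and function names are hypothetical:

```python
def parity_as_count(parity: int) -> int:
    """Keep parity as the raw number of previous births (a discrete count)."""
    if parity < 0:
        raise ValueError("parity cannot be negative")
    return parity

def parity_as_category(parity: int) -> str:
    """Hypothetical ordinal grouping: '0', '1', or '2+' previous births."""
    parity = parity_as_count(parity)
    return "2+" if parity >= 2 else str(parity)

def parity_as_binary(parity: int) -> int:
    """Hypothetical indicator: 1 if the woman has given birth before, else 0."""
    return int(parity_as_count(parity) > 0)

for p in (0, 1, 3):
    print(p, parity_as_category(p), parity_as_binary(p))
```

Each version answers a slightly different question, so the choice belongs in the analysis plan rather than being made silently during data cleaning.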
What Makes Question 3 Important to Answer
What Makes Question 3 Difficult to Answer
• It has a direct impact on assumptions being met
• It has a huge impact on the difficulty of the statistical method chosen
Step 3: The Variables
Key Info:
1. DV is binary
2. Both IVs are categorical
3. All covariates are continuous
Now, pulling these together and anticipating later steps…
Step 4. Write an analysis plan
The Data Analysis Plan Will Usually Change
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.
- Ronald Fisher
The Analysis Plan
1. Research questions
• Statistical method needs to be able to include: comparing groups on the DV; controlling for covariates
• Indicates need for: some kind of ANCOVA or regression
2. Design
• Statistical method needs to be able to include: crossed factors; nesting of individuals within doctors
• Indicates need for: an interaction; a mixed model
3. Variables
• Statistical method needs to be able to include: a binary outcome; two categorical predictors; three continuous covariates
• Indicates need for: logistic regression
Conclusion: Generalized Linear Mixed Model
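Written as an equation, one way to express this model is a logistic mixed model with a random intercept for each doctor. The notation is ours, not from the slides: i indexes women, j indexes doctors, GDM is the 0/1 outcome, and the covariates are assumed to enter linearly:

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}

\operatorname{logit}\Pr(\mathrm{GDM}_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{Iron}_{ij}
  + \beta_2\,\mathrm{VitC}_{ij}
  + \beta_3\,(\mathrm{Iron}_{ij} \times \mathrm{VitC}_{ij})
  + \beta_4\,\mathrm{Age}_{ij}
  + \beta_5\,\mathrm{BMI}_{ij}
  + \beta_6\,\mathrm{Parity}_{ij}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Each piece maps to one row of the plan: the logit link handles the binary outcome, β3 is the interaction for the crossed factors, β4 through β6 are the control variables, and u_j absorbs the nesting of women within doctors. Dropping u_j leaves an ordinary logistic regression.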
Question 4. Now that you’ve collected the data, are there any data issues?
Step 8. Run univariate and bivariate statistics
Step 12. Check for and resolve data issues
Data Issues
1. Outliers and Influential Points
2. Missing Data
3. Multicollinearity
4. Small Sample Sizes
5. Lack of Variation
6. Censoring and Truncation
7. Zero Inflation
Step 4: Data Issues
Potential Issues:
1. Outliers on covariates
2. Missing data
3. Multicollinearity among covariates
4. Sample size
5. Lack of variation
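Several of these issues can be screened for before any modeling. A minimal sketch using only the Python standard library (3.8+); the rows are invented for illustration, the function names are hypothetical, Tukey's 1.5 × IQR fences are one common outlier rule among several, and pairwise correlations are only a first look at multicollinearity:

```python
import statistics
from itertools import combinations

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen(rows, covariates, outcome):
    """Quick pre-modeling checks; a starting point, not a full diagnosis."""
    variables = covariates + [outcome]
    report = {"missing": {v: sum(r.get(v) is None for r in rows) for v in variables}}
    # Keep complete cases only, to keep the sketch simple.
    complete = [r for r in rows if all(r.get(v) is not None for v in variables)]
    report["n_complete"] = len(complete)
    # Outliers on each covariate.
    report["outliers"] = {c: iqr_outliers([r[c] for r in complete]) for c in covariates}
    # Pairwise correlations among covariates as a first multicollinearity screen.
    report["correlations"] = {
        (a, b): pearson([r[a] for r in complete], [r[b] for r in complete])
        for a, b in combinations(covariates, 2)
    }
    # Sample size and lack of variation: how many events does the binary outcome have?
    events = sum(r[outcome] for r in complete)
    report["events"], report["non_events"] = events, len(complete) - events
    return report

# Invented rows shaped like the example; gdm is the 0/1 outcome.
ROWS = [
    {"age": 24, "bmi": 22.1, "parity": 0, "gdm": 0},
    {"age": 31, "bmi": 27.4, "parity": 1, "gdm": 0},
    {"age": 29, "bmi": None, "parity": 2, "gdm": 1},
    {"age": 35, "bmi": 30.2, "parity": 1, "gdm": 0},
    {"age": 27, "bmi": 24.8, "parity": 0, "gdm": 0},
    {"age": 33, "bmi": 26.0, "parity": 2, "gdm": 0},
    {"age": 30, "bmi": 58.0, "parity": 1, "gdm": 1},
    {"age": 28, "bmi": 23.5, "parity": 0, "gdm": 0},
]

report = screen(ROWS, ["age", "bmi", "parity"], "gdm")
for key, value in report.items():
    print(key, value)
```

With these invented rows the screen flags one missing BMI, one extreme BMI, and only one event among seven complete cases, so the outcome has almost no variation to model.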
To Review:
Steps:
1. Write out research questions in theoretical and operational terms
2. Design the study or define the design
3. Choose the variables and determine their level of measurement
4. Write an analysis plan
8. Run univariate and bivariate statistics
12. Check for and resolve data issues
Strategies to Make this Easier
1. Do all 14 Steps in this general order and expect to loop back
2. Write out each step
3. Keep learning and practicing
4. Talk it out with someone knowledgeable
Want Some Help?
http://theanalysisfactor.com/membership
• Resources
• Training
• Access to statistical mentors
Monthly member webinars on new topics:
• Question 1: Research Questions
• Question 2: Design Issues
• Question 3: Variable Types
• Question 4: Data Issues
Your Statistical Support Team
• Kim Love, PhD
• Jeff Meyer, MPA, MBA
• Karen Grace-Martin, MA, MA
• Audrey Schnell, PhD
“On more than one occasion I have nearly cried with relief at having clear answers, and given in such a supportive and encouraging way!”
“After spending so many hours searching for answers with no success and deadlines looming, this feels like the best £20 I ever spent!”
Helen, Graduate Student
http://theanalysisfactor.com/membership
• Monthly: $47
• Annual: $497 ($41.42/mo)
• Students: $29
Bonus (This Week Only!)
• Types of Regression Models (guide): an overview of 18 types of models and when to use them
• Get Clear on Your Study Design (worksheet): 20 questions to walk you through the design issues that will affect your analysis
• Know Your Variables (worksheet): 13 questions to give you clarity about your variables
100% risk free
If you’re not fully satisfied, just notify us within 30 days and we will issue you a full refund.
http://theanalysisfactor.com/membership