Post on 14-Dec-2015
Meta-analysis and the Synthetic Approach
Luke Plonsky
Current Developments in Quantitative Research Methods
Day 2
Traditional Literature Reviews
What do they look like?
Think of a recent one you wrote: What was your process like?
What are their strengths? Weaknesses?
(As we discuss the meta-analytic process, keep a topic or domain of yours in mind.)
Meta-analysis as “the way forward”? (Rousseau, 2008, p. 9)
Systematic, transparent, & quantitative means to
Summarize (all) previous studies (A B; M x N)
Provide a quantitative indication of a relationship
Prevent over/under-interpreting results (Norris & Ortega, 2006; Rousseau, 2008)
Increase statistical power and generalizability across learners, contexts, L2 features, outcomes, etc. (Plonsky, 2012)
Examine relationships not visible in primary research (A on B when C vs. D)
Identify substantive and methodological trends, weaknesses, and gaps (Plonsky & Gass, 2011)
Meta-analysis is here!
(See Norris & Ortega, 2010; Oswald & Plonsky, 2010)
[Bar chart, counts by period: Pre-2000 = 4; 2000-2003 = 6; 2004-2007 = 19; 2008-in press = 48]
+visibility, +impact, +citation (Cooper & Hedges, 2009)
Understand/evaluate choices
Advance theory, research, and practice
Judgment and Decision-Making
Art and Science
Oswald & McCloy (2003)
Norris & Ortega (2007)
“There doesn’t seem to be a big role in this kind of work for much intelligent statistics, opposed to much wise thought” (Wachter, 1990, p. 182).
vs.
Four major stages (parallel to primary research)
1. Defining the domain / locating primary studies
2. Developing and implementing a coding scheme
3. (Meta-)Analysis
4. Interpreting meta-analytic results
1. DEFINING THE DOMAIN / LOCATING PRIMARY STUDIES
“Best evidence synthesis” (Eysenck, 1995)
Truscott (2007) – strict criteria (e.g., only “long-term” treatments)
Vs. Inclusiveness (preferred) (Norris & Ortega, 2006; Plonsky & Oswald, 2012)
Weaknesses mitigated by volume and assessed empirically (e.g., Russell & Spada, 2006)
Reliability reported? Yes, d = 0.65; No, d = 0.42 (Plonsky, 2011)
Control for bias? Tight, d = 0.51; Loose, d = 0.38 (Adesope et al., 2010)
(Are there studies with certain methodological features that you would exclude?)
1. Defining the domain / locating primary studies: Methodological considerations
1. Defining the domain / locating primary studies: Publication status (& bias)
Exclude unpublished studies (e.g., Keck et al., 2006; Lyster & Saito, 2010; Mackey & Goo, 2007)
failsafe n (Abraham, 2008; Ross, 1998): lacking precision (e.g., Becker, 2005)
funnel plot (Li, 2010; Norris & Ortega, 2000; Plonsky, 2011)
Include unpublished studies (e.g., Li, 2010; Masgoret & Gardner, 2003, Won, 2008)
Compare Published (g = 0.43) vs. unpublished (g = 0.56) (Taylor et al., 2006)
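The slide names the failsafe n without giving a formula. As a rough sketch, one common variant (Orwin's fail-safe N, chosen here as an assumption, not necessarily the one used in the cited studies) asks how many unretrieved studies with a given average effect would be needed to pull the mean d down to a criterion value. The function name and the effect sizes below are hypothetical.

```python
def orwin_failsafe_n(effect_sizes, criterion_d, missing_d=0.0):
    """Orwin's fail-safe N: number of missing studies averaging `missing_d`
    needed to reduce the unweighted mean effect to `criterion_d`."""
    if criterion_d <= missing_d:
        raise ValueError("criterion_d must be larger than missing_d")
    k = len(effect_sizes)
    mean_d = sum(effect_sizes) / k
    return k * (mean_d - criterion_d) / (criterion_d - missing_d)


# Hypothetical d values from five retrieved studies (illustration only).
ds = [0.30, 0.55, 0.62, 0.48, 0.80]
n_fs = orwin_failsafe_n(ds, criterion_d=0.20)
print(f"Fail-safe N: {n_fs:.2f}")
```

A single number like this says nothing about how the missing studies are distributed, which is one reason to pair it with a funnel plot or a published vs. unpublished comparison.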
1. Defining the domain / locating primary studies: Substantive considerations
Broad
Strategy instruction (all skills; Plonsky, 2011)
Multi-word instruction (all types) (Han, in preparation)
Narrow (local)
Strategy instruction (reading only; Taylor et al., 2006)
Collocation instruction + tech. (Nurmukhamedov, in preparation)
(Would you describe your domain as relatively broad or more narrow? If narrow, what broader domain does yours belong to?)
Strict / convenient? quality criteria
The Effectiveness of Bilingual Education
Willig (1985): K = 23; d = .63
Rossell & Baker (1996): K = 72 (the “naysayers”; 228 unacceptable); vote count: % of studies helpful (22%), no diff (45%), harmful (33%)
Greene (1998): K = 11; g = .18 (quasi-exp) / .26 (experiments); no Canada
Slavin & Cheung (2003): K = 42; “best-evidence synthesis”; no overall d; many subgroups
Roessingh (2004): K = 12; qual. synthesis; HS learners only; Canadian focus
Rolstad, Mahoney, & Glass (2005): K = 17 (all post-Willig, 1985); dL2 = .23 (usually English); dL1 = .86
Reljić (2011): K = 7; European studies only; d = ?
(See also Rossell & Kuder’s [2005] meticulous critique and re-analysis of these studies.)
[Figure: effect sizes (d, axis from -0.4 to 1.4) from N&O ’00; Miller ’03; R&S ’06; Keck et al. ’06; M&G ’07; Truscott ’07; P&J ’09; Li ’10; L&S ’10; Biber et al. ’11; K&W ’11; Chen & Li ’12. Legend as extracted: Writt, Imp, Exp, ML, Pmpt, Re, CF]
How effective is feedback?
(Well, it depends…)
Corrective Feedback?
[Figure repeated with annotations: “?” (Effects of CF not calculated); d = -.15; d = 1.16]
How effective is feedback?
(Well, it depends…)
Corrective Feedback
1. Defining the domain / locating primary studies: Search Strategies
a. Database searches (e.g., LLBA, ERIC, PsycInfo) (see In’nami & Koizumi, 2010; Plonsky & Brown, under review)
b. Forward citations (Google/Scholar, Web of Science) (Plonsky, 2011)
c. Manual journal searches (Keck et al., 2006; Plonsky & Gass, 2011)
d. Textbooks and edited volumes
e. Conference proceedings (15 in Lee et al., in press)
f. Reference digging (‘ancestry’)
g. Dissertations/theses (10 in Li, 2010; 19 in Lee et al., in press)
h. Previous reviews (e.g., ARAL)
i. Researchers’ websites, online bibliographies, listservs
j. Contacting authors
k. others?
l. All of the above
1. Defining the domain / locating primary studies: Search Strategies
(in Plonsky & Brown, under review)
Narrow range of search techniques
completeness+redundancy > incompleteness
2. CODING
2. Developing and implementing a coding scheme (the data collection instrument)
Knowledge of…
Substantive issues, relevant models, variables
e.g., Taxonomies of instruction, CF → moderators
e.g., What constitutes a multi-word unit? Collocation? (Han, in prep; Nurmukhamedov, in prep.) → moderators
Research design(s) used
Pre-post? Control-experimental only?
Classroom/lab, FL/SL, correlational/experimental, length of treatment, researcher- or teacher-led, outcome measures… → more moderators
Methodological features (for analysis of study quality)
2. Developing and implementing a coding scheme
Typically 5 different types of data are coded
1. Identification (year, author)
2. Sample and context (age, L1, L2, proficiency)
3. Design (pre-post/control-experimental, treatment features)
4. Outcome features (free response, constrained response)
5. Outcomes / effect sizes (r, d)
Coding scheme example: Lee, Jang, & Plonsky (in press)
Recommendations:
code variables numerically/categorically whenever possible
revise and add new variables as they emerge from coding
(What types of substantive and methodological features would you code for?)
(Which type of index would be most appropriate for your research/domain?)
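As an illustration of the five types of coded data listed above, a coding sheet row could be represented as a small record. This is only a sketch: every field name and category label is a hypothetical choice, not taken from Lee, Jang, & Plonsky (in press), and the study shown is invented.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CodedStudy:
    # 1. Identification
    author: str
    year: int
    # 2. Sample and context
    n_total: int
    age_group: str               # e.g., "child", "adolescent", "adult"
    l1: str
    l2: str
    proficiency: Optional[str]   # None when not reported
    # 3. Design
    design: str                  # "pre-post" or "control-experimental"
    treatment_hours: Optional[float]
    # 4. Outcome features
    outcome_type: str            # "free" or "constrained"
    # 5. Outcomes / effect sizes
    d: Optional[float]           # None when it cannot be computed (e.g., missing SDs)


# Invented example row (illustration only).
record = CodedStudy(
    author="Author A", year=2009, n_total=48, age_group="adult",
    l1="Spanish", l2="English", proficiency=None,
    design="control-experimental", treatment_hours=6.0,
    outcome_type="constrained", d=None,
)
print(asdict(record))
```

Keeping most fields categorical or numeric follows the recommendation on the slide, and using None for unreported values keeps missing data visible instead of silently dropping it.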
2. Developing and implementing a coding scheme (cont’d)
Decisions about…
Interrater reliability
Especially for high-inference items (e.g., L2 proficiency; task-essentialness)
Percentage agreement; Cohen’s kappa
Missing data (e.g., SDs; VERY common: 31% in Plonsky & Gass, 2011)
1. Ignore/exclude (most common)
2. Impute (i.e., estimate)
3. Request (5/15 and 5/16 sent data in Plonsky, 2011, and Lee et al., in press, respectively)
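The two interrater reliability indices named on this slide can be sketched for two coders rating the same set of studies on one categorical variable. The function names and the ratings are hypothetical; kappa is undefined when expected agreement equals 1 (both coders always use the same single category), which this sketch does not handle.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of items on which the two coders gave the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions per category.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Invented codes for ten studies on a "setting" variable (illustration only).
coder_a = ["lab", "lab", "class", "class", "lab", "class", "lab", "class", "lab", "lab"]
coder_b = ["lab", "lab", "class", "lab", "lab", "class", "lab", "class", "class", "lab"]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa comes out lower than raw agreement because part of the agreement would be expected by chance alone.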
3. (META-)ANALYSIS
3. (Meta-)Analysis
Potentially very simple: Overall d = M(study1, study2, …)
Level of analysis (e.g., study?, sample?, within vs. between groups?)
Pre-post ESs generally larger than control-experimental ones
Weighting/adjusting ESs for quality, statistical artifacts
N (Norris & Ortega, 2000; Plonsky, 2011), inverse variance (Won, 2008)
“Schmidt & Hunter” corrections (Jeon & Yamashita, under review; Masgoret & Gardner, 2003)
Quality/control (e.g., random assignment, pretesting)
Example/template for ES weighting (N; inverse variance)
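A minimal sketch of the two weighting schemes named above, next to the simple unweighted mean. It assumes two independent groups per study and uses the usual large-sample approximation for the sampling variance of d (as given in, e.g., Borenstein et al., 2009). The study values and function names are hypothetical.

```python
def d_variance(d, n1, n2):
    """Approximate sampling variance of d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented studies: (d, n_experimental, n_control), illustration only.
studies = [
    (0.20, 100, 100),
    (0.60, 20, 20),
    (1.00, 10, 10),
]
ds = [d for d, _, _ in studies]

unweighted = sum(ds) / len(ds)
n_weighted = weighted_mean(ds, [n1 + n2 for _, n1, n2 in studies])
iv_weighted = weighted_mean(ds, [1 / d_variance(d, n1, n2) for d, n1, n2 in studies])

print(f"unweighted M = {unweighted:.3f}")
print(f"N-weighted M = {n_weighted:.3f}")
print(f"inverse-variance M = {iv_weighted:.3f}")
```

In this invented data set the largest study has the smallest effect, so both weighted means fall well below the unweighted mean; with real data the direction of the shift depends on how effect size and sample size covary.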
3. (Meta-)Analysis
“adds as well as summarizes knowledge” (Hall et al., 1994, p. 24)
Moderator analyses (explain variance across studies):
- Ross, 1998: listening; reading
- Norris & Ortega, 2000: +explicitness; +constrained measures
- Mackey & Goo, 2007: vocab > grammar
- Li, 2010: labs > classrooms
- Plonsky, 2011: longer treatments; fewer strategies; R & S
- Lee et al., in press: instruction + feedback; longer treatments
Overall / mean (d, r)
(Example of moderator analyses using SPSS)
Totally essential! (and awesome)
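The slide points to an SPSS example; the same idea can be sketched in Python as a subgroup analysis that computes a mean d separately for each level of a coded moderator. N-weighting is an assumption made here for simplicity, and the records and field names are invented.

```python
from collections import defaultdict


def subgroup_means(records, moderator):
    """N-weighted mean d and number of studies (k) per level of a moderator."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[moderator]].append(rec)
    summary = {}
    for level, recs in groups.items():
        total_n = sum(r["n"] for r in recs)
        mean_d = sum(r["d"] * r["n"] for r in recs) / total_n
        summary[level] = {"k": len(recs), "mean_d": mean_d}
    return summary


# Invented coded studies (illustration only).
records = [
    {"study": "A", "d": 0.9, "n": 40, "setting": "lab"},
    {"study": "B", "d": 1.1, "n": 60, "setting": "lab"},
    {"study": "C", "d": 0.5, "n": 50, "setting": "classroom"},
    {"study": "D", "d": 0.3, "n": 150, "setting": "classroom"},
]
by_setting = subgroup_means(records, "setting")
for level, stats in by_setting.items():
    print(f"{level}: k = {stats['k']}, mean d = {stats['mean_d']:.2f}")
```

Reporting k alongside each subgroup mean matters because moderator levels backed by only a few studies give unstable estimates.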
3. (Meta-)Analysis: Treatment types as moderators
Plonsky, 2011
3. (Meta-)Analysis: Outcome measures as moderators
Norris & Ortega, 2000
3. (Meta-)Analysis: Multiple Moderators
Spada & Tomita, 2010
3. (Meta-)Analysis: Treatment length as a moderator
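[Bar chart: mean d by treatment length for Pragmatics Instruction (Jeon & Kaya, 2006), L2 Instruction (Norris & Ortega, 2000), and Classroom CF (Lyster & Saito, 2010). Values as extracted: 0.42, 1.06, 0.72, 0.82, 1.08, 0.57, 0.79, 1.13, 0.79. Length labels as extracted: S, L, L, S, B, M, B, S-M, L]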
More advanced (meta-)analytic techniques
Fixed vs. random effects modeling
Bayesian meta-analysis (see Ross, 2013)
Meta-regression
Meta-SEM
(See Borenstein et al., 2009; Cooper, Hedges, & Valentine, 2009)
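To make the fixed vs. random effects contrast concrete, here is a rough sketch. The fixed-effect mean uses inverse-variance weights; the random-effects mean adds a between-study variance (tau squared) to each study's variance. The DerSimonian-Laird estimator of tau squared is used here as an assumption; it is one common choice among several. All numbers are invented.

```python
def fixed_effect(ds, variances):
    """Inverse-variance weighted mean and its variance."""
    weights = [1 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    return mean, 1 / sum(weights)


def random_effects_dl(ds, variances):
    """Random-effects mean, its variance, and the DerSimonian-Laird tau^2."""
    weights = [1 / v for v in variances]
    fe_mean, _ = fixed_effect(ds, variances)
    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (d - fe_mean) ** 2 for w, d in zip(weights, ds))
    df = len(ds) - 1
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    mean = sum(w * d for w, d in zip(re_weights, ds)) / sum(re_weights)
    return mean, 1 / sum(re_weights), tau2


# Invented effect sizes and sampling variances (illustration only).
ds = [0.2, 0.6, 1.0]
variances = [0.02, 0.10, 0.20]
fe_mean, fe_var = fixed_effect(ds, variances)
re_mean, re_var, tau2 = random_effects_dl(ds, variances)
print(f"fixed: {fe_mean:.3f}; random: {re_mean:.3f}; tau^2 = {tau2:.3f}")
```

When tau squared is above zero the random-effects weights are more even across studies, so small studies count for more and the variance of the mean is larger than under the fixed-effect model.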
3. (Meta-)Analysis
4. INTERPRETING RESULTS
SMALL BIG
What do they mean anyway?
What implications do these effects have for future research, theory, and practice?
What does d = 0.50 (or 0.10, or 1.00…) mean?
How big is ‘big’? And how small is ‘small’?
4. Interpreting findings (Plonsky & Oswald, under review)
General and field-specific benchmarks (Cohen, 1988; Plonsky & Oswald, under review)
Previous/similar meta-analyses in AL (e.g., Abraham, 2008; Lee et al., this colloquium; Mackey & Goo, 2007)
meta-analyses in other fields (Plonsky, 2011)
SD units (Taylor et al., 2006)
Setting (e.g., Li, 2010; Mackey & Goo, 2007)
Length/intensity, practicality (Lee & Huang, 2008; Lee et al., in press;
Lyster & Saito, 2010; Norris & Ortega, 2000)
Study quality (Plonsky, 2011, 2013, in press; Plonsky & Gass, 2011)
[Chart: Lab vs. Classroom effect sizes for L2 Interaction and Strategy Instruction; values as extracted: 1.0, 0.8, 0.6, 0.4]
Cohen’s (1988) “t-shirt” effect sizes
ESs are best understood in relation to a particular discipline and, ideally, within a particular sub-domain of that discipline (e.g., Cohen, 1988; Valentine & Cooper, 2003)
d = 0.20 (small), d = 0.50 (medium), d = 0.80 (large)
d(linguistics) = d(economics) = d(social work) = …?
d values across 77 L2 meta-analyses (1,733 studies, N = 452,000+; Plonsky & Oswald, under review)
[Plot of d values; axis from -0.5 to 2]
0.40 ≈ Small(ish)
0.70 ≈ Medium(ish)
1.00 ≈ Large(ish)
M = 0.63
d values across 236 primary L2 studies
[Plot of d values; axis from 0 to 5]
25th percentile: d = 0.45 (small-ish)
50th percentile: d = 0.71 (medium-ish)
75th percentile: d = 1.07 (large-ish)
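The practical consequence of field-specific benchmarks can be shown by labeling the same d against Cohen's (1988) general values (0.20, 0.50, 0.80) and against the L2 values from these slides (0.40, 0.70, 1.00). Treating each benchmark as a lower cutoff is a simplifying assumption made for this sketch, and the function and constant names are hypothetical.

```python
# Benchmarks as (lower cutoff, label), largest first.
COHEN_1988 = [(0.80, "large"), (0.50, "medium"), (0.20, "small")]
L2_FIELD = [(1.00, "large"), (0.70, "medium"), (0.40, "small")]


def describe_d(d, benchmarks):
    """Label the magnitude of d using the first cutoff it reaches."""
    size = abs(d)
    for cutoff, label in benchmarks:
        if size >= cutoff:
            return label
    return "below small"


for d in (0.22, 0.50, 0.73, 1.16):
    print(f"d = {d:.2f}: general = {describe_d(d, COHEN_1988)}, "
          f"L2 = {describe_d(d, L2_FIELD)}")
```

A d of 0.50 reads as "medium" on the general scale but only "small" on the L2 scale, which is the point of interpreting effects within a discipline. Labels like these are a starting point and do not replace the other considerations (setting, treatment length, study quality).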
Additional Considerations: Theoretical Maturity
[Plot: ES (d), 0 to 1, by Year; labeled from "-fine-grained analyses" to "+fine-grained analyses"]
Example: d = 0.42, SD = 0.24, k = 46
Additional Considerations: Methodological Maturity
[Plot: ES (d), 0 to 1, by Year; labeled from "-refined methods and instruments" to "+refined methods and instruments"]
Example: d = 0.42, SD = 0.24, k = 46
Additional Considerations: Theoretical & Methodological Maturity
[Plot: ES (d), 0 to 1, by Year; labeled from "-refined methods and instruments" to "+refined methods and instruments" and from "-fine-grained analyses" to "+fine-grained analyses"]
Example: d = 0.42, SD = 0.24, K = 92
Where is your study?
ESs Over Time
Plonsky & Gass (2011)
[Bar chart: Average Effect Sizes across Three Decades; Effect Size (d) by Decade, in extracted order: 2000s = 0.52; 1990s = 0.82; 1980s = 1.62]
(Literal/Mathematical) SD Units
Example: d = 0.73; the average EG participant outscored the average CG participant by about 3/4 of an SD
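One way to make the SD-unit reading more tangible is to convert d into the share of the control group that falls below the average experimental-group participant (often called Cohen's U3). This sketch assumes normally distributed scores with equal SDs in both groups, which the slide does not state; the function name is hypothetical.

```python
from statistics import NormalDist


def u3(d):
    """Share of the control group scoring below the mean of the experimental
    group, assuming normal distributions with equal SDs."""
    return NormalDist().cdf(d)


share = u3(0.73)
print(f"d = 0.73: the average EG participant scores above {share:.0%} of the CG")
```

Under these assumptions, d = 0.73 puts the average EG participant above roughly 77% of the control group, compared with 50% when d = 0.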
Additional Considerations: Research Setting
Lab vs. Classroom FL vs. SL
*Setting may change over time: L2 interaction (Plonsky & Gass, 2011)
- 1980s ≈ 80% lab-based
- 1990s-2000s ≈ 50/50% lab/classroom
(Mackey & Goo, 2007) (Plonsky, 2011) (Li, 2010) (Taylor et al., 2006)
Additional Considerations: Length/Intensity of Treatment
(Practicality?)
[Chart: d by treatment length in (Jeon & Kaya, 2006), (Norris & Ortega, 2000), and (Lyster & Saito, 2010); length labels as extracted: S, L, L, S, B, M, B, S-M, L]
Additional Considerations: Manipulation of IVs (Practicality?)
Lee & Huang (2008)
The effect of input enhancement on L2 grammar learning: d = 0.22
Numerically small, but practically large/significant?
Additional Considerations: Publication Bias, Sample Sizes, & Sampling Error
Pub. bias: The tendency only to publish studies with statistically significant (or theoretically appealing) findings (Rothstein, Sutton, & Borenstein, 2005; see Plonsky, 2013; Lee, Jang, & Plonsky, in press, for evidence of publication bias in L2 research.)
[Funnel plot: Sample size (0 to 200) by Effect size (d) (-2 to 3)]
Two related statistical artifacts:
1. Smaller Ns → +sampling error → +variance/distance from population mean
2. Low instrument reliability → smaller effects
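Both artifacts can be put into rough numbers. The sketch below assumes two independent groups and the usual large-sample standard error of d, and it uses one common form of the correction for unreliability in the outcome measure (observed d equals true d times the square root of the reliability), in the tradition of the "Schmidt & Hunter" corrections. Function names and values are hypothetical.

```python
import math


def d_standard_error(d, n1, n2):
    """Approximate standard error of d for two independent groups."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))


def attenuated_d(true_d, reliability):
    """Expected observed d when the outcome measure has the given reliability."""
    return true_d * math.sqrt(reliability)


def disattenuated_d(observed_d, reliability):
    """Observed d corrected for unreliability in the outcome measure."""
    return observed_d / math.sqrt(reliability)


# 1. Smaller Ns: same d, very different precision.
se_small = d_standard_error(0.5, 10, 10)
se_large = d_standard_error(0.5, 100, 100)
print(f"SE with n = 10 per group: {se_small:.2f}; with n = 100 per group: {se_large:.2f}")

# 2. Low reliability: a true d of 0.50 measured with reliability .64.
observed = attenuated_d(0.5, 0.64)
print(f"observed d = {observed:.2f}; corrected back = {disattenuated_d(observed, 0.64):.2f}")
```

The correction requires a reliability estimate, which is a problem when primary studies leave instrument reliability unreported.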
[Histogram: effect sizes from -0.5 to 4.5 (horizontal axis), counts from 0 to 60 (vertical axis); d = 0.83]
vs.
Challenges to meta-analysis
1) Domain maturity
age, breadth and depth of research
danger of premature closure
2) Poor reporting practices (SDs, ESs)
Missing data (K = 19 in Nekrasova & Becker, 2009; 22 in Plonsky, 2011)
3) Instrument reliability low or unreported
Reported in 6% of studies (Nekrasova & Becker, 2009)
4) Idiosyncratic/inconsistent research activity
5) Very few replications (see Polio & Gass, 1997; Porte, 2002, 2012)
What challenges might one encounter in conducting a meta-analysis in your target domain and/or generally?
Challenges to meta-analysis (cont.)
6) Disagreement over definitions and operationalizations
E.g., noticing
Perhaps more “adversarial collaboration” is needed (see Tetlock & Mitchell, 2009)
7) Overreliance on individual studies (see Norris & Ortega, 2007)
8) Bias of primary (and secondary) researchers toward particular types of findings (e.g., in favor/against theory X; p < .05)
9) Tradition of overreliance on NHST (see Schmidt & Hunter, 2002)
Crude
Uninformative
Unreliable
A synthetic approach to primary research?
What might this look like generally and in terms of…
Research agendas?
Reporting practices and interpretations of findings?
Researcher training?
Journal calls and acceptance policies?
Conclusion: Judgment and decision-making play a major role in all meta-analyses
Understanding the choices
More appropriate execution and interpretation of meta-analytic findings
More precise advances in theory, more efficient L2 research, and more accurately informed practice
Further Reading
Synthesizing research on language learning and teaching (Norris & Ortega, 2006)
Research synthesis and meta-analysis: A step-by-step approach (Cooper, 2010)
Practical meta-analysis (Lipsey & Wilson, 2001)
Connections to Other Topics to be Discussed this Week
NHST, effect sizes (MONDAY)
Study Quality (WEDNESDAY)
Replication (THURSDAY)
Reporting practices (FRIDAY)
Tomorrow: Study Quality
What does this mean?
How can we operationalize study quality?
What findings exist for studies of study quality in AL?
Where and how can the findings of quality analyses be implemented?