
An illustrated guide to the methods of meta-analysis

1 Introduction

Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine, to evaluate a wide spectrum of health-care interventions and policies. The primary aim of many meta-analyses is to produce a more accurate estimate of the effect of a particular intervention, or group of interventions, than is possible using only a single study.

Since different studies are carried out using different populations, different designs and a whole range of other study-specific factors, it has been suggested that combining them will produce an estimate that has broader generalizability than any single study. Additionally, it may be possible to explain the differences between results from individual studies by carrying out a meta-analysis. Such an assessment may even provide further insight into the intervention, and develop our understanding of how it works.

Journal of Evaluation in Clinical Practice, 7, 2, 135–148

© 2001 Blackwell Science 135

Correspondence
Mr Alex J Sutton
Department of Epidemiology and Public Health
University of Leicester
22-28 Princess Road West
Leicester LE1 6TP
UK

Keywords: Bayesian methods, hospital discharge, meta-analysis, methods, re-admission, review

Accepted for publication: 22 July 2000

Abstract
Meta-analysis is now accepted as a necessary tool for the evaluation of health care. Such analyses have been carried out in virtually every area of medicine to evaluate a wide spectrum of health care interventions and policies. This paper has three broad aims: (1) to describe the basic principles of meta-analysis, using a meta-analysis of interventions intended to reduce hospital re-admission rates for illustration; (2) to consider threats to the internal validity of meta-analysis, and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods for synthesizing data, with the intention of outlining the directions meta-analysis may take in the future. The methods used to synthesize studies, which take 'weighted averages' of effect sizes, have been refined to a high degree, while the methods for dealing with threats to the validity of meta-analyses, such as publication bias and variations in quality of the primary studies, are at a less advanced stage. However, many consider this standard 'weighted average' approach to meta-analysis not to be 'state of the art' in at least some situations, where the use of more sophisticated methods, generally to explain variation in estimates from different studies and synthesize a broader base of evidence, would be advantageous. Currently, approaches which attempt to do this are mainly still in the experimental stage and, unfortunately, ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made.

An illustrated guide to the methods of meta-analysis

Alexander J. Sutton BSc MSc1 Keith R. Abrams BSc MSc PhD2 and David R. Jones BA MSc PhD CStat CMath DipTCDHE3

1 Lecturer in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
2 Reader in Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK
3 Professor of Medical Statistics, Department of Epidemiology and Public Health, University of Leicester, UK



Concurrent with the explosion in the use of meta-analysis is the continued development and refinement of the methods used to carry out such analyses. This is an important endeavour, because the science of meta-analysis is still in its infancy, and in the past over-simplistic methods have led to misleading conclusions (Hunt 1997). A systematic review of methodology for meta-analysis carried out by the authors (Sutton et al. 1998) informed the writing of this paper, and is recommended further reading for more technical details on the material presented here. The reader should note, however, that several important developments which are noted here have been published in the short time since the review was written, confirming the speed with which this field continues to develop.

This paper has three broad aims: (1) to describe the basic principles of meta-analysis using a worked example; (2) to consider the threats to the validity of meta-analysis and the measures which can be taken to minimize their impact; and (3) to present an overview of more specialist and developing methods, with the intention of outlining the directions meta-analysis may take in the future. The term 'meta-analysis' is used to describe different aspects of research synthesis by different people. In some contexts it is used to indicate the whole review process, including aspects such as literature searching and data extraction, as well as the statistical combination of quantitative results. We prefer to use the term 'systematic review' to indicate the whole review process, restricting the term 'meta-analysis' to describe the synthesis of quantitative data from multiple studies. Although many recent advances in pre-synthesis review methods have been made, such as the development of sophisticated searching methods (Sutton et al. 1998; Dickersin et al. 1994), this paper focuses solely on aspects of quantitative data synthesis, or meta-analysis. [Note: very often a systematic review will include a meta-analysis; however, if no quantitative data are available from the primary reports, or that which is available is deemed too heterogeneous to be meaningfully combined, then only a narrative description of the studies may be carried out (Sutton et al. 1998).] Guidelines for good practice for the pre-synthesis aspects of systematic reviews have been described comprehensively elsewhere (Deeks et al. 1996; Oxman 1996). Very importantly, a recent initiative has produced a checklist addressing the quality of reporting of meta-analyses (QUOROM) (Moher et al. 1999b). This statement is in the same spirit as the CONSORT statement for reporting randomized clinical trials (RCTs) (Begg et al. 1996) and is recommended reading for those preparing reports of meta-analyses of RCTs.

2 The synthesis of estimates of effectiveness from multiple primary studies

This section focuses on pooling results from a number of studies investigating the relative effectiveness of an intervention. Often, meta-analyses of this sort include only RCTs, typically with two arms – one arm receiving the experimental treatment and the other control, placebo or standard treatment. (The issues of variable quality of studies, and the synthesis of studies with different designs, are considered in sections 3 and 4, respectively.) Data from a meta-analysis of interventions intended to improve the process of hospital discharge of older people, published elsewhere (Parker et al. 2001), are used to illustrate the methods. Thirty two-arm RCTs are included in the meta-analysis, and the outcome focused on here is the re-admission rate to hospital following discharge. In the remainder of this section the principal ideas involved in performing a meta-analysis are explained and, where possible, the calculations required are reproduced to aid understanding. In practice, the use of computer software greatly facilitates the analyses required. The meta-analysis capabilities of many common statistical analysis packages are limited; however, much specialist software has been developed recently (Sutton et al. 2000b; Sterne et al. 2001).

Calculation of an effect size for each study

Broadly speaking, quantitative outcomes from any study can be classified as belonging to one of three data types: (i) binary, e.g. often indicating the presence or absence of the event of interest in each patient; (ii) continuous, where outcome is measured on a continuous scale, e.g. change in blood pressure; or (iii) ordinal, where outcome is measured on an ordered categorical scale, e.g. a disease severity scale, where a patient can be classified as belonging to one of several distinct categories.





The approaches used to combine either binary or continuous outcomes are often similar, while ordinal data are somewhat more complex and require specialist methods, discussed elsewhere (Whitehead & Jones 1994).

Table 1 provides a sample of the data extracted from reports of 30 RCTs to be included in the meta-analysis (for a list of references for these RCTs see the original report (Parker et al. 2001) – numbers used to identify these RCTs in this report are provided here in the final column of Table 1). Columns three and six provide the number of patients randomized to the experimental and control arms of each study, respectively. [Note: analysis should usually be calculated on the basis of intention to treat (Hollis & Campbell 1999) – if the analysis in the original study report was not performed using this method it may still be possible to extract the correct figures for the purposes of the meta-analysis.] Columns four and seven indicate the number of re-admission episodes. Note that an individual can have multiple re-admissions; for example, the new intervention arm of study 8 included 142 patients, while 554 events were reported. [Note: the fact that more than one re-admission is permitted for each patient means that an individual's outcome is not binary.] Column two indicates the length of follow-up of the studies, which ranges from 1 to 12 months; it is necessary to account for follow-up when calculating effect sizes, since the number of re-admissions may be critically dependent on the length of the observation period of the trial.

An outcome measure which takes into account length of follow-up is the re-admission rate ratio (RRR). As the name suggests, this is the ratio of the re-admission rates (per month) in the two arms. The re-admission rate (RR) in each arm is calculated by:

RR = number of re-admissions / (number of patients × length of follow-up)

For example, there are two re-admissions in 37 patients over 1.5 months in trial 1, so the RR is 2/(37 × 1.5) = 0.036. [Note: more decimal places are used in the working of the calculations in this paper than are printed.] Similarly, the RR in the control group is 0.162. The outcome of interest can now be calculated by dividing the RRs in the treatment and control arms, 0.036/0.162, which produces an RRR of 0.222. This RRR is less than one, which indicates the re-admission rate is lower in the treatment arm, suggesting that the intervention is beneficial. In this instance the estimated effect is large (a long way from 1). The RRs for each arm are provided in columns 5 and 8, and the RRR in column 9.

Although the RRR is the measure of interest, due to theoretical statistical considerations (including improved approximate normality), a natural logarithm transformation is used (ln(RRR)) for the purpose of combining studies via a meta-analysis (Fleiss 1994). The pooled result can be back-transformed by taking the exponential of the pooled ln(RRR) afterwards, to convert the answer back to the RRR scale, allowing easier interpretation. The ln(RRR) estimates for each study are given in column 10 of Table 1.
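The calculations described above are easily mechanized. The following short sketch (written for this exposition, not taken from the original paper; variable names are my own) reproduces the trial 1 working quoted in the text:

```python
import math

# Trial 1 data from Table 1
exp_patients, exp_readmissions = 37, 2
ctrl_patients, ctrl_readmissions = 37, 9
follow_up_months = 1.5

# Re-admission rate (per month) in each arm:
# number of re-admissions / (number of patients x length of follow-up)
rr_exp = exp_readmissions / (exp_patients * follow_up_months)
rr_ctrl = ctrl_readmissions / (ctrl_patients * follow_up_months)

# Re-admission rate ratio, and its natural-log transform used for pooling
rrr = rr_exp / rr_ctrl
ln_rrr = math.log(rrr)

print(round(rrr, 3), round(ln_rrr, 3))  # 0.222 -1.504, as in Table 1
```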

A further value, the standard error (SE) of the ln(RRR), is required for the meta-analysis calculation. The SE gives an indication of the degree of precision to which each study estimates the effect size; a small SE indicates a precise estimate, usually from a large study. The SE for the ln(RRR) is calculated by:

SE(ln(RRR)) = √(1/(number of re-admissions in the experimental group) + 1/(number of re-admissions in the control group))

Hence, for study 1 the SE(ln(RRR)) is √(1/2 + 1/9) = 0.782. Standard errors for the remaining studies are provided in column 11 of Table 1. It is common practice to calculate 95% confidence intervals for each study – these indicate the interval in which the estimate of effect size would be expected to fall 95 times out of every 100 replications of the trial. Hence, a 95% confidence interval provides a range in which one can be reasonably sure the true effect size lies. The formula for calculating a 95% confidence interval for a ln(RRR) is:

ln(RRR) ± 1.96 × SE(ln(RRR))

For study 1 the ln(RRR) 95% confidence interval is given by -1.504 ± 1.96(0.782) = (-3.04, 0.03).
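A brief sketch of the standard error and confidence interval working for study 1 (again illustrative code written here, not from the original paper):

```python
import math

# Study 1 (Table 1): re-admission counts in each arm
readmissions_exp, readmissions_ctrl = 2, 9
ln_rrr = math.log((2 / (37 * 1.5)) / (9 / (37 * 1.5)))  # -1.504

# SE(ln(RRR)) = sqrt(1/re-admissions_exp + 1/re-admissions_ctrl)
se = math.sqrt(1 / readmissions_exp + 1 / readmissions_ctrl)

# 95% confidence interval on the ln(RRR) scale, then back-transformed
lo, hi = ln_rrr - 1.96 * se, ln_rrr + 1.96 * se
rrr_lo, rrr_hi = math.exp(lo), math.exp(hi)
```

The back-transformed interval (0.05–1.03) includes 1, reproducing the inconclusive result discussed in the text.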



Table 1 Data and calculations for the hospital re-admissions meta-analysis

Columns: (1) study ID; (2) length of follow-up (months); (3) experimental patients (n); (4) experimental re-admissions (n); (5) experimental re-admission rate; (6) control patients (n); (7) control re-admissions (n); (8) control re-admission rate; (9) re-admission rate ratio (RRR); (10) ln(RRR); (11) SE(ln(RRR)); (12) 95% CI for ln(RRR); (13) 95% CI for RRR; (14) weight; (15) intervention administration; (16) EPOC quality measure; (17) number used in original report*

1   1.5  37   2    0.036  37   9    0.162  0.222  -1.504  0.782  (-3.04, 0.03)   (0.05, 1.03)  1.64    Single  5    53
2   3    464  102  0.073  439  102  0.077  0.946  -0.055  0.140  (-0.33, 0.22)   (0.72, 1.24)  51.00   Single  3    59
3   6    499  347  0.116  502  340  0.113  1.027  0.026   0.076  (-0.12, 0.18)   (0.88, 1.19)  171.73  Single  6    60
4   6    86   36   0.070  87   26   0.050  1.401  0.337   0.257  (-0.17, 0.84)   (0.85, 2.32)  15.10   Single  4    69
5   12   57   9    0.013  56   6    0.009  1.474  0.388   0.527  (-0.65, 1.42)   (0.52, 4.14)  3.60    Team    3    82
6   2    39   29   0.372  41   35   0.427  0.871  -0.138  0.251  (-0.63, 0.35)   (0.53, 1.42)  15.86   Single  3    88
7   3    20   3    0.050  20   13   0.217  0.231  -1.466  0.641  (-2.72, -0.21)  (0.07, 0.81)  2.44    Single  6    177
8   3    142  554  1.300  140  868  2.067  0.629  -0.463  0.054  (-0.57, -0.36)  (0.57, 0.70)  338.16  Team    4    187
9   6    695  343  0.082  701  310  0.074  1.116  0.110   0.078  (-0.04, 0.26)   (0.96, 1.30)  162.83  Team    6    222
10  2    178  43   0.121  176  37   0.105  1.149  0.139   0.224  (-0.30, 0.58)   (0.74, 1.78)  19.89   Single  2    228
11  6    30   9    0.050  30   6    0.033  1.500  0.405   0.527  (-0.63, 1.44)   (0.53, 4.21)  3.60    Team    5    231
12  6    96   42   0.073  97   62   0.107  0.684  -0.379  0.200  (-0.77, 0.01)   (0.46, 1.01)  25.04   Team    3    236
13  3    303  104  0.114  300  109  0.121  0.945  -0.057  0.137  (-0.33, 0.21)   (0.72, 1.24)  53.22   Team    4    275
14  6    150  51   0.057  99   32   0.054  1.052  0.051   0.226  (-0.39, 0.49)   (0.68, 1.64)  19.66   Team    4    283
15  1    20   4    0.200  20   6    0.300  0.667  -0.405  0.645  (-1.67, 0.86)   (0.19, 2.36)  2.40    Team    1    312
16  1.5  29   4    0.092  25   9    0.240  0.383  -0.959  0.601  (-2.14, 0.22)   (0.12, 1.24)  2.77    Single  4    334
17  12   333  396  0.099  335  410  0.102  0.972  -0.029  0.070  (-0.17, 0.11)   (0.85, 1.12)  201.44  Single  3    339
18  3    140  18   0.043  136  16   0.039  1.093  0.089   0.344  (-0.58, 0.76)   (0.56, 2.14)  8.47    Single  4    351
19  9    418  495  0.132  417  549  0.146  0.899  -0.106  0.062  (-0.23, 0.02)   (0.80, 1.02)  260.31  Single  3    397
20  6    62   21   0.056  58   35   0.101  0.561  -0.578  0.276  (-1.12, -0.04)  (0.33, 0.96)  13.13   Team    4    403
21  12   199  107  0.045  205  111  0.045  0.993  -0.007  0.135  (-0.27, 0.26)   (0.76, 1.30)  54.48   Team    3    416
22  12   63   22   0.029  60   30   0.042  0.698  -0.359  0.281  (-0.91, 0.19)   (0.40, 1.21)  12.69   Team    4    691
23  6    35   10   0.048  40   51   0.213  0.224  -1.496  0.346  (-2.17, -0.82)  (0.11, 0.44)  8.36    Single  4    1793
24  6    102  49   0.080  102  51   0.083  0.961  -0.040  0.200  (-0.43, 0.35)   (0.65, 1.42)  24.99   Single  7    1796
25  6    140  24   0.029  97   29   0.050  0.573  -0.556  0.276  (-1.10, -0.02)  (0.33, 0.98)  13.13   Team    3.5  2211
26  3    45   5    0.037  46   5    0.036  1.022  0.022   0.632  (-1.22, 1.26)   (0.30, 3.53)  2.50    Team    6    2229
27  4    49   11   0.056  51   7    0.034  1.636  0.492   0.483  (-0.46, 1.44)   (0.63, 4.22)  4.28    Single  3.5  2657
28  6    177  49   0.046  186  107  0.096  0.481  -0.731  0.172  (-1.07, -0.39)  (0.34, 0.67)  33.61   Single  3    3632
29  3    381  154  0.135  381  197  0.172  0.782  -0.246  0.108  (-0.46, -0.04)  (0.63, 0.97)  86.43   Team    4    3636
30  2    96   22   0.019  110  43   0.033  0.586  -0.534  0.262  (-1.05, -0.02)  (0.35, 0.98)  14.55   Single  6    4460

*Parker et al. 2000. n = number.


Confidence intervals for RRR are obtained by taking the exponential of this ln(RRR) interval; hence, the RRR 95% confidence interval for study 1 is (0.05–1.03). This interval includes 1, which indicates that on its own the trial is inconclusive, because both beneficial and harmful effect size estimates are included in the interval and are in some sense plausible. This highlights the need to consider the precision of the estimate; the study estimated a very large treatment effect, but did so very imprecisely; the true effect could be much smaller (or larger) than the point estimate. The 95% confidence intervals for ln(RRR) and RRR for the remaining studies are provided in columns 12 and 13, respectively. To aid examination of the results of the individual studies, these intervals can be plotted on the same axis, as in Fig. 1. The RRR estimate for each study is plotted, with the size of the plotting symbol proportional to the precision of the estimate. The 95% confidence interval for each RRR estimate is also plotted (the more precise estimates having the smaller confidence intervals) (other features of this figure will be explained in due course). This plot highlights the variability in the estimates and in the precisions between studies. The issue of variability between estimates from individual studies is considered further in later sections.

Combining effect sizes – calculating weighted averages

The previous section illustrated how an RRR estimate and corresponding standard error could be calculated from summary data extracted from individual study reports. In other instances different effect measures may be more appropriate, but the general principle that an estimate and SE are required from each study remains. When outcomes are reported on a binary scale, the odds ratio, risk ratio or risk difference measures are commonly used, while outcomes measured on a continuous scale can be combined directly, or standardized – if different scales of measurement have been used in the individual studies. Descriptions and formulae for each of these outcome measures and others are available elsewhere (Fleiss 1993; Sutton et al. 2000c).
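For binary outcomes the same estimate-plus-SE pattern applies. As a brief illustration (the 2 × 2 counts below are invented for this sketch, not taken from the re-admissions dataset; the SE formula is the standard one given by, for example, Fleiss 1993):

```python
import math

# Hypothetical 2x2 table: events and non-events in each arm (made-up counts)
a, b = 15, 85   # experimental arm: events, non-events
c, d = 25, 75   # control arm: events, non-events

# Log odds ratio and its standard error (sqrt of the sum of reciprocal counts)
ln_or = math.log((a * d) / (b * c))
se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Back-transformed 95% confidence interval for the odds ratio
ci = (math.exp(ln_or - 1.96 * se_ln_or), math.exp(ln_or + 1.96 * se_ln_or))
```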

The simplest way to combine estimates is to average them. Since different studies estimate the true effect size with varying degrees of precision, a weighted average is used. The weight given to each study in the re-admissions meta-analysis is calculated by:

weight = 1 / SE(ln(RRR))²


Figure 1 Forest plot of 30 RCTs examining the effect on re-admission rates of interventions aimed at modifying the hospital discharge process for elderly people.



The square of the standard error is often known as the variance, so combining studies using this weighting is often called the inverse variance-weighted method (Fleiss 1993). The weightings for each study are provided in Table 1, column 14. If an effect measure other than the RRR is being used, then the weightings are calculated by the same principle, using the inverse of the variance of that effect measure.

Once the weight for each study has been calculated, a pooled estimate of ln(RRR) is calculated by multiplying each study's weight by its ln(RRR) and summing the resulting values, and then dividing this value by the sum of the weights. Using figures from Table 1, the outline calculation for the re-admissions data is:

ln(pooled RRR) = [1.64 × (-1.504) + … + 14.55 × (-0.534)] / (1.64 + … + 14.55) = -0.164

The variance for ln(pooled RRR) (or any other effect measure used) is then calculated by taking the reciprocal of the sum of the weights (1/sum of weights):

var(ln(pooled RRR)) = 1 / (1.64 + … + 14.55) = 0.0006

Using these figures, an approximate 95% confidence interval for the pooled estimate can be calculated in the same manner as confidence intervals were produced for the individual study estimates above. The pooled estimate of RRR for the re-admissions dataset is 0.85 with 95% CI (0.81–0.89), indicating a modest, statistically significant treatment benefit at the 5% level. This estimate is plotted using a diamond shape in Fig. 1 directly below the 30 individual studies. Figure 1 is often called a forest plot and is commonly used to display the results of a meta-analysis.

This approach is often known as a fixed-effect approach, to distinguish it from the random-effect models described below. It can be used to combine outcomes on any scale; however, other related fixed-effect methods specifically for combining odds ratios also exist (Fleiss 1993; Sutton et al. 2000c). These fixed-effect methods all make the strong assumption that each study is estimating the same underlying treatment effect. Many people feel that in medical and related research such an assumption is unrealistic (Thompson 1993) because studies are never identical replications of one another, and study design and conduct differences will inevitably have some degree of influence on study outcome. Models which account for underlying variability in the treatment effect estimates are considered in the next section.
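The inverse variance-weighted pooling just described can be reproduced directly from columns 10 and 11 of Table 1. The sketch below (illustrative code written for this guide, not from the original analysis) recovers the published fixed-effect result:

```python
import math

# ln(RRR) and SE(ln(RRR)) for the 30 trials (Table 1, columns 10 and 11)
ln_rrr = [-1.504, -0.055, 0.026, 0.337, 0.388, -0.138, -1.466, -0.463,
          0.110, 0.139, 0.405, -0.379, -0.057, 0.051, -0.405, -0.959,
          -0.029, 0.089, -0.106, -0.578, -0.007, -0.359, -1.496, -0.040,
          -0.556, 0.022, 0.492, -0.731, -0.246, -0.534]
se = [0.782, 0.140, 0.076, 0.257, 0.527, 0.251, 0.641, 0.054,
      0.078, 0.224, 0.527, 0.200, 0.137, 0.226, 0.645, 0.601,
      0.070, 0.344, 0.062, 0.276, 0.135, 0.281, 0.346, 0.200,
      0.276, 0.632, 0.483, 0.172, 0.108, 0.262]

weights = [1 / s ** 2 for s in se]          # inverse-variance weights
pooled_ln = sum(w * y for w, y in zip(weights, ln_rrr)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))     # SE of the pooled ln(RRR)

# Back-transform the pooled estimate and its 95% confidence interval
pooled_rrr = math.exp(pooled_ln)
ci = (math.exp(pooled_ln - 1.96 * pooled_se),
      math.exp(pooled_ln + 1.96 * pooled_se))
```

Run on the rounded table values, this gives a pooled RRR of 0.85 with 95% CI (0.81–0.89), matching the figures in the text.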

Heterogeneity and random effect models

When performing a meta-analysis, although the overall aim may be to produce an overall pooled estimate of treatment effect, it is crucial to assess the variation between results of the primary studies and, if possible, to investigate why they differ. Clearly, it would be remarkable if all studies being meta-analysed produced exactly the same treatment effect estimate. Some variation in results is expected, due simply to the play of chance; this is often called random variation. However, if effect size estimates vary between studies to a greater extent than expected on the basis of chance alone the studies are considered to be heterogeneous, and it is necessary to account for the extra variation, above that expected by chance, in the meta-analysis model. The way this is usually performed is through the use of a random-effect model. Essentially, this relaxes the assumption that each study is estimating exactly the same underlying treatment effect, and instead assumes that the underlying effect sizes are drawn from a distribution of effect sizes. This distribution is usually assumed to be Normal, with a variance determined by the data. In practical terms, accounting for between-study heterogeneity in this way produces a pooled point estimate which is often (but not always) similar to the one produced by fixed-effect methods. However, taking into account between-study heterogeneity produces a wider 95% confidence interval, so the estimate is more conservative.

The whole issue of the appropriateness and suitability of fixed- and random-effect models for meta-analysis has been much discussed (Thompson 1993; Peto 1987). A test for heterogeneity exists (Fleiss 1993), and the result of this test can then be used to inform model choice: if it is non-significant a fixed-effect model is used, and if it is significant a random-effect model should be used. This seemingly sensible



approach has a flaw because the test has low power. This implies that heterogeneity may exist even when the test produces a non-significant result (Boissel et al. 1989). An alternative approach is to always use a random-effect model. The inflation of the confidence interval is dictated by the degree of variation between studies, so when between-study variation is small the inflation will be negligible, producing a result which would be very similar to the fixed-effect approach.

A detailed description of the random-effect meta-analysis model is beyond the scope of this paper, but clear accounts are given elsewhere (DerSimonian & Laird 1986; Shadish & Haddock 1994). Combining the 30 studies evaluating interventions to prevent re-admission using a random-effect model produces an RRR of 0.83 (0.73–0.93). This estimate is plotted below the fixed-effect one in Fig. 1. The estimate of the between-study variance is 0.057, which is quite small but non-negligible (the test for between-study heterogeneity is highly significant (P < 0.001)). Accounting for this heterogeneity has produced a wider confidence interval compared to the fixed-effect approach, which is a typical finding. Modifications to the way the parameters in a random-effect meta-analysis model are calculated have been developed (Hardy & Thompson 1996; Biggerstaff & Tweedie 1997). One of these should be used if the number of studies in the meta-analysis is small (approximately fewer than 10), as they overcome problems with a previous simplification in the model calculations, which can be important in meta-analyses of small numbers of studies.
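The classical random-effect calculation can be sketched using the DerSimonian & Laird (1986) moment estimator. Note this is an illustration assembled for this guide from the rounded values in Table 1: the published analysis may have used unrounded data and a refined estimator, so exact figures can differ slightly.

```python
import math

# ln(RRR) and SE(ln(RRR)) for the 30 trials (Table 1, columns 10 and 11)
ln_rrr = [-1.504, -0.055, 0.026, 0.337, 0.388, -0.138, -1.466, -0.463,
          0.110, 0.139, 0.405, -0.379, -0.057, 0.051, -0.405, -0.959,
          -0.029, 0.089, -0.106, -0.578, -0.007, -0.359, -1.496, -0.040,
          -0.556, 0.022, 0.492, -0.731, -0.246, -0.534]
se = [0.782, 0.140, 0.076, 0.257, 0.527, 0.251, 0.641, 0.054,
      0.078, 0.224, 0.527, 0.200, 0.137, 0.226, 0.645, 0.601,
      0.070, 0.344, 0.062, 0.276, 0.135, 0.281, 0.346, 0.200,
      0.276, 0.632, 0.483, 0.172, 0.108, 0.262]

w = [1 / s ** 2 for s in se]                 # fixed-effect weights
fe_ln = sum(wi * yi for wi, yi in zip(w, ln_rrr)) / sum(w)
fe_se = math.sqrt(1 / sum(w))

# Cochran's Q statistic: the test for between-study heterogeneity
q = sum(wi * (yi - fe_ln) ** 2 for wi, yi in zip(w, ln_rrr))
k = len(ln_rrr)

# DerSimonian-Laird moment estimate of the between-study variance (tau^2)
tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))

# Random-effect weights add tau^2 to each study's within-study variance
w_re = [1 / (s ** 2 + tau2) for s in se]
re_ln = sum(wi * yi for wi, yi in zip(w_re, ln_rrr)) / sum(w_re)
re_se = math.sqrt(1 / sum(w_re))
re_rrr = math.exp(re_ln)
re_ci = (math.exp(re_ln - 1.96 * re_se), math.exp(re_ln + 1.96 * re_se))
```

Because tau² is positive here, the random-effect confidence interval is wider than the fixed-effect one, as the text describes.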

A final point concerning between-study heterogeneity is that there is little explicit guidance to offer regarding the point at which study estimates should not be pooled at all because heterogeneity is deemed too great, but alternative approaches are discussed below.

Exploring and explaining heterogeneity

Until now, the impression has been given that heterogeneity is a nuisance factor which needs accounting for when performing a meta-analysis. However, investigating why between-study variation exists offers the meta-analyst unique opportunities. More desirable than using random-effect models to allow for heterogeneity is to try to explain the heterogeneity. This may lead to the identification of associations between study or patient characteristics and the outcome measure, which would not have been possible in single studies. This may lead in turn to clinically important findings and may eventually assist in individualizing treatment regimes (Lau et al. 1998). Both subgroup analyses and regression methods can be used to do this.

Potential study-level factors, pertaining to either study design or patient characteristics, which could affect study results should ideally be identified before a meta-analysis is conducted. If this is carried out, data on these factors can then be obtained at the data extraction stage of a review, and such explicit a priori specification also reduces the temptation of 'data dredging'.

Returning to the re-admission dataset, one potential factor which could affect results is whether the intervention was administered by a team or an individual. This information is given for each study in column 15 of Table 1. In 16 of the studies the intervention was administered by an individual and in 14 it was administered by a team. Separate meta-analyses can be performed for these two subgroups in an attempt to see if the effectiveness of the intervention depends on whether an individual or team implements it, and whether between-study heterogeneity is reduced in the subgroups. Pooled estimates for these subgroups turn out to be almost identical. The subgroup in which the intervention was administered by an individual has an RRR of 0.83 (0.70–0.97) and an estimate of between-study heterogeneity of 0.056 (test for heterogeneity highly significant at P < 0.001). For the studies where the intervention was administered by a team the RRR was 0.83 (0.69–0.99) and the estimate of between-study heterogeneity 0.062 (test for heterogeneity highly significant at P < 0.001). Hence, it would appear that whether the intervention is administered by an individual or a team makes very little difference to the effectiveness of the intervention and, hence, does not explain any of the variation between study results.

If the factor of interest is measured on a continuous scale, or dummy indicator variables are created for the levels of categorical factors, then meta-regression can be used to explore their impact. Meta-regression models are very similar in principle to ordinary simple linear regression models, the main differences being that individual observations (the primary studies), unlike individual patients, are not given equal weight in the analysis (i.e. each study should be weighted according to its precision). Additionally, it may be desirable to include a random-effect term to account for residual heterogeneity not explained by the covariate(s); such a model can be thought of as an extension to the random-effect model described above (Berkey et al. 1995). An example of a meta-regression analysis is given in section 3.

Meta-regression techniques are currently used relatively rarely, and, the authors believe, not to their full potential, but examples are emerging (Freemantle et al. 1999; von Dadelszen et al. 2000). Although a powerful tool, they have their limitations. Regression analyses of this type are susceptible to aggregation bias, which occurs if the relation between study-level means of patient characteristics and outcomes does not directly reflect the relation between individuals' values and individuals' outcomes (Greenland 1987). Additionally, meta-regression analyses are often limited by the number of studies included in the meta-analysis. Special regression models have also been developed to explore the effect of patients' underlying risk on intervention effect (Senn et al. 1996; Walter 1997); these are necessary to avoid producing incorrect results when exploring the effect of such a factor (Senn et al. 1996; Schmid et al. 1998).

3 Threats to the validity of a meta-analysis

Although meta-analyses are often considered to provide the highest grade of evidence available regarding the effectiveness of an intervention, higher than an individual trial, it should not be forgotten that they are a type of observational study, and as such are open to biases which may threaten their validity. Perhaps the two most serious problems which can potentially lead to biased estimates are publication bias and variable quality of the primary studies. These two issues are considered further below.

Publication and related biases

Publication bias exists because research with statistically significant or interesting results is potentially more likely to be submitted, published, or published more rapidly, than work with null or non-significant results (Song et al. 2000). When only the published literature is included in a meta-analysis, this can potentially lead to biased, over-optimistic conclusions. Related biases which can also bias the results of a meta-analysis include (i) pipeline bias, when significant results are published more quickly than non-significant ones; and (ii) language bias, when researchers whose native tongue is not English are more likely to publish their non-significant results in non-English-language journals, but are more likely to publish their significant results in English. If this happens, a meta-analysis including only study reports in English may be based on a biased collection of studies. Perhaps an appropriate term which encompasses all these sources of bias is 'dissemination bias' (Song et al. 2000).

Long-term initiatives to alleviate the problem of publication bias have commenced, including trial amnesties (Horton 1997) to encourage publication of previously unpublished trials, and the creation of registries for prospective registration of trials (Horton & Smith 1999). However, the issue currently remains a serious concern for researchers carrying out meta-analyses. There are certain measures which can be taken to assess the presence, and minimize the impact, of publication bias in a meta-analysis dataset. Currently, however, there is much debate, and some dispute, as to the approach researchers should take to deal with publication bias in meta-analyses.

The presence of publication bias in a meta-analysis dataset can be assessed informally by inspection of a funnel plot (Light & Pillemar 1984). This plots the effect size for each study against some measure of its precision, e.g. the reciprocal of the standard error of the effect size. The resulting plot should be funnel-shaped if no publication bias is present. This shape is expected because trials of decreasing size have increasingly large variation in their effect size estimates, as random variation becomes increasingly influential. However, if the chance of publication is greater for larger trials, or for trials with statistically significant results, some small non-significant studies may not appear in the literature.


This leads to omission of trials in one corner of the plot – the bottom right-hand corner when an 'undesirable' outcome such as the re-admission rate is being considered – and hence to a degree of asymmetry in the funnel. A funnel plot for the 30 RCTs in the re-admissions dataset is provided in Fig. 2. Visual inspection would suggest that there is little evidence of publication bias in this dataset; however, there are a few small studies with extremely beneficial RRRs at the bottom left-hand corner of the plot, for which there are no symmetric counterparts with extreme positive RRRs in the bottom right-hand corner.

Publication bias can be tested for more formally using statistical tests which are based on the same symmetry assumptions as a funnel plot assessment (Begg & Mazumdar 1994; Egger et al. 1997; Duval & Tweedie 1998). One formal test (Egger et al. 1997) produces a non-significant P-value of 0.57 for the re-admissions dataset, which is consistent with the inconclusive visual assessment.
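The intuition behind the Egger et al. (1997) test can be sketched with ordinary least squares: the standardized effect (effect/SE) is regressed on precision (1/SE), and an intercept far from zero relative to its standard error suggests funnel asymmetry. The function and the perfectly symmetric dataset below are hypothetical illustrations, not the paper's calculation:

```python
import math

def egger_test(effects, ses):
    """Egger's regression asymmetry test: regress the standardized effect
    (effect/SE) on precision (1/SE). An intercept far from zero, relative
    to its standard error, indicates funnel-plot asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)            # residual variance
    se_int = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))   # SE of the intercept
    return intercept, se_int

# Hypothetical, perfectly symmetric dataset: no asymmetry expected
effects = [0.0, 0.0, 0.0, 0.0]
ses = [0.1, 0.2, 0.3, 0.4]
intercept, se_int = egger_test(effects, ses)
```

In practice the intercept would be compared with a t distribution on n − 2 degrees of freedom to obtain the kind of P-value quoted above.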

Disagreement exists about how to proceed if publication bias is suspected, after an assessment for its presence has been made. Methods to assess the likely impact of publication bias on the pooled outcome estimate have been developed (Duval & Tweedie 1998; Givens et al. 1997; Copas 1999; Song et al. 2000), but they are not widely used, due partly to the fact that many are complex and hence difficult to implement, and partly to concerns about their applicability. We believe that the use of such methods as part of a sensitivity analysis is sensible (Sutton et al. 2000a), but more research is needed in this area.

Study quality

It is rare that all the studies available for a meta-analysis are of uniformly high quality. More likely, there will be a range in the quality of the research pertaining to the intervention of interest. Restricting a meta-analysis to include only RCTs is a safeguard taken by groups such as the Cochrane Collaboration, in an attempt to include only evidence which potentially produces the least biased results. Restricting analyses to RCTs does not guarantee that the meta-analysis will produce an unbiased result, however, as there can still be methodological flaws in the design, conduct and analysis of a trial. Clearly, the inclusion of poor or flawed studies in a meta-analysis may be problematic because their influence may bias the pooled result, and may even mean that the meta-analysis comes to the wrong qualitative conclusions. Unfortunately, most studies are flawed to some degree, and including all but 'perfect' studies (which may not be possible to conduct, due to ethical or practical constraints in some fields) may leave the meta-analyst with few if any data. The problem of dealing with study quality in a meta-analysis is similar to that for publication bias, in the sense that there is agreement that some assessment of quality should always be made, but little consensus on how to make such an assessment, or on how to incorporate the results into the meta-analysis.

There have been many scales and checklists developed to aid the assessment of study quality (Moher et al. 1995), but many of them have come under heavy criticism for not being constructed scientifically (Moher et al. 1999a). Further, recent work has demonstrated that different results can be obtained in a meta-analysis depending on the checklist used (Juni et al. 1999). A further problem is that it is often difficult to ascertain all the required details of the trial from a study report (Begg et al. 1996). Often, this means that an assessment of the trial report, and not of the trial itself, is in effect being made. The underlying problem with the use of a scale or checklist is that it is impossible to predict which design aspects cause the most bias and, more fundamentally, it is often impossible to predict even the direction in which any bias will be acting (Schulz et al. 1995). This makes direct adjustment of study estimates for study quality impossible.

Figure 2 Funnel plot of studies included in the hospital discharge meta-analysis, examining the effect of interventions on re-admission rates (vertical axis: reciprocal of the standard error of ln(RRR); horizontal axis: re-admission rate ratio, log scale).

Several ways in which study quality can be incorporated into a meta-analysis have been suggested. Perhaps the simplest is to use a quality threshold to include or exclude studies. This could be defined using a cut-off value on a particular quality scale, or as a requirement that several design aspects be present. A further possibility is to use a quality score to weight study results, or to incorporate such a score into the standard precision weightings (Berard & Bravo 1998). Finally, an approach which appears to be gaining support is the exploration of quality via meta-regression. In such an approach a quality score, or individual markers of study quality, such as the degree of blinding or method of treatment allocation, are included in a regression model as explanatory variables. Examining individual markers of quality separately eliminates the problems with the somewhat arbitrary construction of quality scale scoring systems (Detsky et al. 1992).
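One of the options just mentioned, folding a quality score into the standard precision weights (in the spirit of Berard & Bravo 1998), can be sketched very simply. The function and data below are hypothetical illustrations, not a prescribed method:

```python
def quality_weighted_pool(effects, variances, scores, max_score):
    """Pool study effects with weight = (quality/max_score) / variance,
    so that lower-quality studies contribute less to the pooled result.
    Illustrative sketch only; not a validated weighting scheme."""
    w = [(q / max_score) / v for q, v in zip(scores, variances)]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Two hypothetical studies with equal precision but quality scores 7 and 1
# (out of a maximum of 7): the pooled estimate is pulled towards the
# high-quality study's effect of -0.4.
pooled = quality_weighted_pool([-0.4, 0.2], [0.04, 0.04], [7, 1], 7)
```

With equal variances the unweighted mean would be −0.1; quality weighting moves the estimate towards the higher-quality study. As the text notes, such schemes inherit the arbitrariness of the underlying quality scale.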

Returning to the re-admissions meta-analysis, study quality was rated crudely using a count of the Effective Practice and Organisation of Care (EPOC) quality criteria (Cochrane Effective Practice & Organisation of Care Review Group 1998) that were satisfied by each study. The scores obtained by each trial using this method are given in the penultimate column of Table 1. When these scores are included in a random-effects regression model, the equation ln(RRR) = -0.22 + 0.007 × quality score is obtained. This regression line, together with the primary studies (the size of each plotting symbol is proportional to the precision of the effect size estimate), is plotted in Fig. 3. The quality score coefficient is small and not statistically significant (P = 0.88). This means that study quality, at least as measured in this way, would not appear to affect the study results systematically, or to explain the between-study heterogeneity.

4 Further developments in methods of meta-analysis

Specialist meta-analysis methods

While section 2 provided a summary of the most commonly used methods in meta-analysis, many further developments have been made. A proportion of these focus on the synthesis of less standard data types. For example, specialist methods are required to pool the results of diagnostic tests because two outcomes, sensitivity and specificity, require simultaneous consideration (Irwig et al. 1995). Another area which requires special methods is the analysis of survival data, because account has to be taken of censored observations (Dear 1994). Other data types for which specialist methods have been developed are dose–response data (Tweedie & Mengersen 1995) and economic data (Jefferson et al. 1996). Individual patient data meta-analysis (Stewart & Clarke 1995), in which the original study datasets are pooled rather than relying on published summary data, has been described as the gold standard; it is considered by some to be the only way to carry out a meta-analysis of survival data, but it is much more time-consuming and costly than meta-analysis of summary data. It is currently unclear whether the extra effort required is worthwhile. For an overview of these and further meta-analytical developments see Sutton et al. (2000c).

New directions for meta-analysis using Bayesian statistics

Figure 3 Regression line examining the impact of quality score, using a random-effects meta-regression model for re-admission rate in the hospital discharge meta-analysis (vertical axis: re-admission rate ratio, log scale; horizontal axis: EPOC quality score).

In addition to the above developments, more advanced methods for the synthesis of information have been developed. Although not currently used routinely, these provide potentially more powerful and flexible tools for synthesizing evidence. Many of these methods use Bayesian statistics, in contrast to the more commonly employed classical approach.

A full description of Bayesian methods is not possible here, but for a recent review of their use in assessing health technologies see Spiegelhalter et al. (2000a). The key element of the Bayesian approach is that it introduces the idea of subjective probability (O'Hagan 1988), in contrast to the objective probabilities traditionally attached to specific, often repeatable, events. Before carrying out a piece of research, an investigator will have formed some prior beliefs regarding its outcome, possibly derived from the results of previous research in the same field. These a priori beliefs are combined with the data from the current investigation to produce results which reflect the researcher's beliefs having conducted the research. These posterior beliefs are calculated by combining the prior beliefs with the new data using Bayes' theorem, which forms the backbone of all Bayesian analysis.

The advantages of using such an approach are often subtle, but important. Perhaps most notable in a health-care context is the ability to make direct probability statements regarding quantities of interest, for example, the probability that patients receiving drug A have better survival than those who receive drug B. There are good reasons, however, why the Bayesian approach has largely been neglected in routine use. The most serious is that, generally, the computations required in Bayesian models are very complex. Additionally, expressing prior beliefs in a form which can be included in an analysis is a non-trivial task. Encouragingly, many of the computational difficulties have been addressed recently with the development of specialist software, most notably WinBUGS (Spiegelhalter et al. 2000b). The problem of expressing prior beliefs remains; however, there are practical 'solutions', including the use of 'off-the-shelf' priors, which can express a range of degrees of prior knowledge and can be used in a sensitivity analysis. Use of 'vague' priors, which essentially means that prior information is ignored, is also possible.
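The prior-to-posterior calculation at the heart of Bayes' theorem is simple in the conjugate normal case, where a normal prior combined with normally distributed data yields a normal posterior. A minimal sketch, with hypothetical numbers (the function name and inputs are illustrative, not from the paper):

```python
def normal_posterior(prior_mean, prior_var, data_mean, data_var):
    """Conjugate normal-normal update via Bayes' theorem: the posterior
    mean is a precision-weighted average of the prior belief and the
    new data, and the posterior is more precise than either alone."""
    w_prior, w_data = 1.0 / prior_var, 1.0 / data_var
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var

# Sceptical prior centred on 'no effect' (mean 0, variance 0.25), combined
# with a hypothetical study estimate of ln(RRR) = -0.19 (variance 0.01)
post_mean, post_var = normal_posterior(0.0, 0.25, -0.19, 0.01)
```

Because the data here are much more precise than the prior, the posterior mean lies close to the study estimate; a stronger prior would pull it further towards zero, which is exactly how 'off-the-shelf' sceptical priors operate in a sensitivity analysis.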

The new WinBUGS software is able to perform the calculations required for a wide range of Bayesian analyses. The user has freedom to implement many existing meta-analysis methods developed classically and, more importantly, to develop models not possible using more traditional classical software. This has potentially huge benefits for synthesizing information, and builds on earlier pioneering work by Eddy et al. (1992), whose 'new' graphical approach to meta-analysis can now be implemented using WinBUGS (Spiegelhalter et al. 2000b). Issues being addressed by these methods are outlined below:

1 Data from an RCT may be of direct interest, but not of a form which can simply be included in a meta-analysis. For example, data may be available from an RCT which uses the intervention of interest in the treatment arm, but a different intervention from the other studies in the control arm. Methods to include such data have been developed (Higgins & Whitehead 1996).

2 In some assessments, considering only randomized evidence may not be the optimal approach. Observational studies, which could potentially be very large, providing valuable data on thousands of patients, may be available. It may sometimes seem unjust to exclude these from a meta-analysis, particularly if they are of high quality, as they may have particular strengths and weaknesses different from those of randomized studies (Droitcour et al. 1993). Special methods have been developed to account for different study designs in a meta-analysis (Prevost et al. 2000; Larose & Dey 1997). In other instances, data on the effect of a drug of interest in animals may be available and provide valuable information which can be incorporated (DuMouchel & Harris 1983).

3 There may be benefits to including information from previous trials or meta-analyses on similar topics using similar interventions and outcome measures (Higgins & Whitehead 1996).

4 A study may provide no quantitative data at all, being qualitative in design, yet this qualitative evidence may be of direct relevance to the topic under assessment (Roberts et al. 1998).

Bayesian modelling gives us the potential to include all these types of data in a variety of ways, either through direct input into the model, or through the specification of prior beliefs.

Other new approaches to meta-analysis have been suggested, but the corresponding methodology is at the conceptual rather than the practical stage of development. The extension of meta-regression to the simultaneous modelling of multiple scientific factors, with the intention of producing a response surface of treatment effects rather than a single pooled result, has been advocated (Rubin 1992). This may allow a more detailed examination of the science underlying the results synthesized (Rubin 1992; Lau et al. 1998). Further, it may be possible to model different aspects of the processes under study separately. For example, if one were interested in the effect of lowering cholesterol on clinical outcomes, a first stage of the analysis could synthesize data on the degree to which different interventions lower cholesterol levels. Then, in a second stage, the relationship between cholesterol level and various clinical outcomes could be modelled (Katerndahl & Lawler 1999). A further use of Bayesian modelling could allow meta-analysis to be placed within a decision-theoretic framework (Berger 1980), which can also take into account utilities when making health-care or policy decisions (Midgette et al. 1994).

However, there is no magic wand to make all this happen. While Bayesian modelling provides flexibility and a framework, it does not dictate how models should be specified, how data should be incorporated, or how priors should be elicited. Much methodological work is required to develop further the ideas outlined above.

5 Conclusion

Much has been written on meta-analysis and the synthesis of evidence within the medical literature over the past two decades. During this time, the basic synthesis of effect measures using weighted averages has been refined to a high degree, and much of the methodology required to do so is in place for most situations encountered. Threats to the validity of meta-analysis exist, and the methods for dealing with problems such as publication bias and variations in the quality of the primary studies are at a less refined stage. Additionally, many consider the standard 'weighted average' approach to meta-analysis not to be 'state of the art' in at least some situations, where the use of more sophisticated methods, generally to synthesize a broader base of evidence, would be advantageous. Currently, such approaches are still firmly in the experimental stage and, unfortunately, ideas which sound natural and appealing are often difficult to implement in practice. Clearly, it will be some time before they are used routinely, but significant steps have been made. Moving the synthesis of evidence beyond calculating simple averages is timely, feasible and, indeed, essential.

Acknowledgements

The research on which this paper is based was funded, in part, by the NHS Research and Development Health Technology Assessment Programme (Methodology Project Numbers 93/52/3 & 95/09/03).

References

Begg C., Cho M., Eastwood S., Horton R., Moher D. & Olkin I. (1996) Improving the quality of reporting of randomised controlled trials: the CONSORT statement. Journal of the American Medical Association 276, 637–639.

Begg C.B. & Mazumdar M. (1994) Operating characteristics of a rank correlation test for publication bias. Biometrics 50, 1088–1101.

Berard A. & Bravo G. (1998) Combining studies using effect sizes and quality scores: application to bone loss in postmenopausal women. Journal of Clinical Epidemiology 51, 801–807.

Berger J.O. (1980) Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer-Verlag, New York.

Berkey C.S., Hoaglin D.C., Mosteller F. & Colditz G.A. (1995) A random-effects regression model for meta-analysis. Statistics in Medicine 14, 395–411.

Biggerstaff B.J. & Tweedie R.L. (1997) Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16, 753–768.

Boissel J.P., Blanchard J., Panak E., Peyrieux J.C. & Sacks H. (1989) Considerations for the meta-analysis of randomized clinical trials: summary of a panel discussion. Controlled Clinical Trials 10, 254–281.

Cochrane Effective Practice and Organisation of Care Review Group (1998) The Data Collection Checklist. University of Aberdeen, HSRU, Aberdeen.

Copas J. (1999) What works?: selectivity models and meta-analysis. Journal of the Royal Statistical Society, Series A 161, 95–105.

von Dadelszen P., Ornstein M.P., Bull S.B., Logan A.G., Koren G. & Magee L.A. (2000) Fall in mean arterial pressure and fetal growth restriction in pregnancy hypertension: a meta-analysis. Lancet 355, 87–92.

Dear K.B.G. (1994) Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics 50, 989–1002.

Deeks J., Glanville J. & Sheldon T. (1996) Undertaking systematic reviews of research on effectiveness: CRD guidelines for those carrying out or commissioning reviews. Report no. 4, Centre for Reviews and Dissemination. York Publishing Services Ltd, York.

DerSimonian R. & Laird N. (1986) Meta-analysis in clinical trials. Controlled Clinical Trials 7, 177–188.

Detsky A.S., Naylor C.D., O'Rourke K., McGeer A.J. & L'Abbe K.A. (1992) Incorporating variations in the quality of individual randomized trials into meta-analysis. Journal of Clinical Epidemiology 45, 255–265.

Dickersin K., Scherer R. & Lefebvre C. (1994) Systematic reviews – identifying relevant studies for systematic reviews. British Medical Journal 309, 1286–1291.

Droitcour J., Silberman G. & Chelimsky E. (1993) Cross-design synthesis: a new form of meta-analysis for combining results from randomized clinical trials and medical-practice databases. International Journal of Technology Assessment in Health Care 9, 440–449.

DuMouchel W.H. & Harris J.E. (1983) Bayes methods for combining the results of cancer studies in humans and other species (with comment). Journal of the American Statistical Association 78, 293–308.

Duval S. & Tweedie R. (1998) Practical estimates of the effect of publication bias in meta-analysis. Australasian Epidemiologist 5, 14–17.

Eddy D.M., Hasselblad V. & Shachter R. (1992) Meta-Analysis by the Confidence Profile Method. Academic Press, San Diego.

Egger M., Smith G.D., Schneider M. & Minder C. (1997) Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 629–634.

Fleiss J.L. (1993) The statistical basis of meta-analysis. Statistical Methods in Medical Research 2, 121–145.

Fleiss J.L. (1994) Measures of effect size for categorical data. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 245–260. Russell Sage Foundation, New York.

Freemantle N., Cleland J., Young P., Mason J. & Harrison J. (1999) β-Blockade after myocardial infarction: systematic review and meta-regression analysis. British Medical Journal 318, 1730–1737.

Givens G.H., Smith D.D. & Tweedie R.L. (1997) Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statistical Science 12, 221–250.

Greenland S. (1987) Quantitative methods in the review of epidemiological literature. Epidemiological Review 9, 1–30.

Hardy R.J. & Thompson S.G. (1996) A likelihood approach to meta-analysis with random effects. Statistics in Medicine 15, 619–629.

Higgins J.P.T. & Whitehead A. (1996) Borrowing strength from external trials in a meta-analysis. Statistics in Medicine 15, 2733–2749.

Hollis S. & Campbell F. (1999) What is meant by intention to treat analysis? Survey of published randomised controlled trials. British Medical Journal 319, 670–674.

Horton R. (1997) Medical editors trial amnesty. Lancet 350, 756.

Horton R. & Smith R. (1999) Time to register randomised trials – the case is now unanswerable. British Medical Journal 319, 865–866.

Hunt M. (1997) How Science Takes Stock: the story of meta-analysis. Russell Sage Foundation, New York.

Irwig L., Macaskill P., Glasziou P. & Fahey M. (1995) Meta-analytic methods for diagnostic test accuracy. Journal of Clinical Epidemiology 48, 119–130.

Jefferson T., Mugford M., Gray A. & DeMicheli V. (1996) An exercise in the feasibility of carrying out secondary economic analysis. Health Economics 5, 155–165.

Juni P., Witschi A., Bloch R. & Egger M. (1999) The hazards of scoring the quality of clinical trials for meta-analysis. Journal of the American Medical Association 282, 1054–1060.

Katerndahl D.A. & Lawler W.R. (1999) Variability in meta-analytic results concerning the value of cholesterol reduction in coronary heart disease: a meta-meta-analysis. American Journal of Epidemiology 149, 429–441.

Larose D.T. & Dey D.K. (1997) Grouped random effects models for Bayesian meta-analysis. Statistics in Medicine 16, 1817–1829.

Lau J., Ioannidis J.P. & Schmid C.H. (1998) Summing up evidence: one answer is not always enough. Lancet 351, 123–127.

Light R.J. & Pillemar D.B. (1984) Summing Up: the science of reviewing research. Harvard University Press, Cambridge, MA.

Midgette A.S., Wong J.B., Beshansky J.R., Porath A., Fleming C. & Pauker S.G. (1994) Cost-effectiveness of streptokinase for acute myocardial infarction – a combined meta-analysis and decision analysis of the effects of infarct location and of likelihood of infarction. Medical Decision Making 14, 108–117.

Moher D., Cook D.J., Eastwood S., Olkin I., Rennie D. & Stroup D. for the QUOROM Group (1999b) Improving the quality of reporting of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet 354, 1896–1900.

Moher D., Jadad A.R., Nichol G., Penman M., Tugwell P. & Walsh S. (1995) Assessing the quality of randomized controlled trials – an annotated bibliography of scales and checklists. Controlled Clinical Trials 12, 62–73.

Moher D., Klassen T.P., Jadad A.R., Tugwell P., Moher M. & Jones A.L. (1999a) Assessing the quality of randomised controlled trials: implications for the conduct of meta-analyses. Health Technology Assessment 3(12), 1–98.

O'Hagan A. (1988) Probability: Methods and Measurement. Chapman & Hall, London.

Oxman A.D. (1996) The Cochrane Collaboration Handbook: preparing and maintaining systematic reviews, 2nd edn. Cochrane Collaboration, Oxford.

Parker S.G., Peet S.M., McPherson A., Cannaby A.M., Baker R., Wilson A., Lindesay J., Parker G., Abrams K.R. & Jones D.R. (2001) A systematic review of discharge arrangements for older people. Health Technology Assessment (in press).

Peto R. (1987) Why do we need systematic overviews of randomised trials? Statistics in Medicine 6, 233–240.

Prevost T.C., Abrams K.R. & Jones D.R. (2000) Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Statistics in Medicine 19, 3359–3376.

Roberts K.A., Jones D.R., Abrams K.R., Dixon-Woods M. & Fitzpatrick R. (1998) Meta-analysis of qualitative and quantitative evidence: an example based on studies of patient satisfaction. Technical Report 98–01, Department of Epidemiology and Public Health, University of Leicester, Leicester.

Rubin D. (1992) A new perspective. In The Future of Meta-Analysis (eds K.W. Wachter & M.L. Straf), pp. 155–165. Russell Sage Foundation, New York.

Schmid C.H., Lau J., McIntosh M.W. & Cappelleri J.C. (1998) An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Statistics in Medicine 17, 1923–1942.

Schulz K.F., Chalmers I., Hayes R.J. & Altman D.G. (1995) Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. Journal of the American Medical Association 273, 408–412.

Senn S., Sharp S., Thompson S. & Altman D. (1996) Relation between treatment benefit and underlying risk in meta-analysis. British Medical Journal 313, 1550–1551.

Shadish W.R. & Haddock C.K. (1994) Combining estimates of effect size. In The Handbook of Research Synthesis (eds H. Cooper & L.V. Hedges), pp. 261–284. Russell Sage Foundation, New York.

Song F., Eastwood A.J., Gilbody S., Duley L. & Sutton A.J. (2000) Publication and other selection biases in systematic reviews. Health Technology Assessment 4(10), 1–115.

Spiegelhalter D.J., Myles J.P., Jones D.R. & Abrams K.R. (2000a) Bayesian methods in health technology assessment. Health Technology Assessment 4(38), 1–142.

Spiegelhalter D.J., Thomas A. & Best N.G. (2000b) WinBUGS Version 1.2 User Manual. MRC Biostatistics Unit, Cambridge.

Sterne J.A.C., Egger M. & Sutton A.J. (2001) Meta-analysis software. In Systematic Reviews in Health Care: meta-analysis in context, 2nd edn (eds M. Egger, G. Davey Smith & D.G. Altman), pp. 336–346. BMJ Books, London.

Stewart L.A. & Clarke M.J., on behalf of the Cochrane Working Group (1995) Practical methodology of meta-analyses (overviews) using updated individual patient data. Statistics in Medicine 14, 2057–2079.

Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (1998) Systematic reviews of trials and other studies. Health Technology Assessment 2(19), 1–310.

Sutton A.J., Abrams K.R., Jones D.R., Sheldon T.A. & Song F. (2000c) Methods for Meta-Analysis in Medical Research. John Wiley, London.

Sutton A.J., Duval S.J., Tweedie R.L., Abrams K.R. & Jones D.R. (2000a) Empirical assessment of effect of publication bias on meta-analyses. British Medical Journal 320, 1574–1577.

Sutton A.J., Lambert P.C., Hellmich M., Abrams K.R. & Jones D.R. (2000b) Meta-analysis in practice: a critical review of available software. In Meta-Analysis in Medicine and Health Policy (eds D.A. Berry & D.K. Stangl). Marcel Dekker, New York.

Thompson S.G. (1993) Controversies in meta-analysis: the case of the trials of serum cholesterol reduction. Statistical Methods in Medical Research 2, 173–192.

Tweedie R.L. & Mengersen K.L. (1995) Meta-analytic approaches to dose–response relationships, with application in studies of lung cancer and exposure to environmental tobacco smoke. Statistics in Medicine 14, 545–569.

Walter S.D. (1997) Variation in baseline risk as an explanation of heterogeneity in meta-analysis. Statistics in Medicine 16, 2883–2900.

Whitehead A. & Jones N.M.B. (1994) A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 13, 2503–2515.