Multivariate Meta-analysis
An overview of multivariate meta-analysis in Stata
Pantelis G. Bagos, PhD, Assistant Professor
Niki L. Dimou, MSc, PhD Candidate
Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, Greece
Lamia 2012
Meta-analysis
■ Combining the estimates of several studies
■ The methodology dates back to Fisher
■ The term appeared for the first time in Psychology (Glass, 1976)
■ In its simplest form, it is a weighted average of the study estimates
■ Improves the statistical power to detect weak effects
Glass GV. Primary, secondary, and meta-analysis of research. Educational Researcher, 1976; 5: 3-8
Nikolopoulos GK, Tsantes AE, Bagos PG, Travlou A, Vaiopoulos G. Integrin, alpha 2 gene C807T Polymorphism and Risk of Ischemic Stroke: a Meta-Analysis. 2007, Thrombosis Research; 119(4): 501-510
Tsantes AE, Nikolopoulos GK, Bagos PG, Rapti E, Mantzios G, Kapsimali V, Travlou A. Association between the Plasminogen Activator Inhibitor-1 4G/5G Polymorphism and Venous Thrombosis: a Meta-Analysis. 2007, Thrombosis and Haemostasis; 97(6): 907-13
Statistical models
Fixed effects models
Random effects models
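The two univariate models can be sketched in a few lines. The following is a minimal illustration, assuming inverse-variance weights for the fixed effects model and the DerSimonian and Laird method-of-moments estimate of the between-study variance for the random effects model; the function names and the example data are hypothetical and are not taken from the do-files.

```python
def fixed_effect(estimates, variances):
    """Inverse-variance weighted average and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def dersimonian_laird(estimates, variances):
    """Random-effects pooling with a method-of-moments tau^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    k = len(estimates)
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    # Re-pool with the between-study variance added to each study variance
    pooled, var = fixed_effect(estimates, [v + tau2 for v in variances])
    return pooled, var, tau2


# Hypothetical log odds ratios and their variances from three studies
log_or = [0.0, 0.3, 0.6]
var = [0.04, 0.04, 0.04]
print(fixed_effect(log_or, var))
print(dersimonian_laird(log_or, var))
```

With a between-study variance of zero the two functions return the same pooled estimate, which is why the random effects model is often described as a generalisation of the fixed effects one.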
Multivariate meta-analysis
■ In many situations we need to model simultaneously two or more effect sizes from each study
■ This may be due to their complementarity (e.g. sensitivity and specificity)
■ There may be multiple treatments, multiple outcomes, or multiple risk factors
■ The joint analysis usually increases the power by borrowing strength from external studies
■ The joint analysis takes into account the correlation of the estimates and thus, it allows the comparison of the estimates after fitting the model
Higgins JP, Whitehead A (1996) Borrowing strength from external trials in a meta-analysis. Stat Med 15(24): 2733-2749
The model
van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21(4):589-624
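For two outcomes, the general bivariate random-effects model described by van Houwelingen et al. is usually written as below. The symbols are an assumed notation chosen to agree with the τ and ρW used elsewhere in these slides: Y_i is the vector of estimates from study i, S_i its within-study covariance matrix (assumed known) and Σ the between-study covariance matrix.

```latex
\begin{aligned}
\mathbf{Y}_i \mid \boldsymbol{\theta}_i &\sim N(\boldsymbol{\theta}_i, \mathbf{S}_i),
\qquad
\boldsymbol{\theta}_i \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \\
\mathbf{S}_i &=
\begin{pmatrix}
s_{i1}^2 & \rho_{Wi}\, s_{i1} s_{i2} \\
\rho_{Wi}\, s_{i1} s_{i2} & s_{i2}^2
\end{pmatrix},
\qquad
\boldsymbol{\Sigma} =
\begin{pmatrix}
\tau_1^2 & \rho_B\, \tau_1 \tau_2 \\
\rho_B\, \tau_1 \tau_2 & \tau_2^2
\end{pmatrix}, \\
\mathbf{Y}_i &\sim N(\boldsymbol{\mu}, \mathbf{S}_i + \boldsymbol{\Sigma})
\quad \text{(marginal model)}.
\end{aligned}
```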
Estimation
■ Based on the multivariate normal distribution
■ ML, REML and method of moments (non-iterative)
■ Studies reporting a subset of the outcomes are treated as MAR
Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA (1998) Meta-analysis of multiple outcomes by regression with random effects. Stat Med 17(22): 2537-2550
Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Stat Med. 2011 Jan 26.
Jackson D, White IR, Thompson SG: Extending DerSimonian and Laird's methodology to perform multivariate random effects meta-analyses. Stat Med 2010, 29:1282-1297
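Whichever estimator is chosen for the between-study covariance matrix, the pooled mean vector for a given value of that matrix is a generalised least squares (GLS) average. The sketch below shows this step for the bivariate case, assuming complete data (both outcomes reported in every study) and a between-study matrix `sigma` supplied by the user; the function names are hypothetical and the 2x2 algebra is written out by hand to avoid dependencies.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def pooled_bivariate(ys, ss, sigma):
    """GLS estimate of the mean vector for a fixed between-study matrix.

    ys: list of (y1, y2) study estimates
    ss: list of 2x2 within-study covariance matrices
    sigma: 2x2 between-study covariance matrix
    Returns (mu, cov_mu).
    """
    sw = [[0.0, 0.0], [0.0, 0.0]]  # running sum of weight matrices
    swy = [0.0, 0.0]               # running sum of weight matrix times estimate
    for y, s in zip(ys, ss):
        total = tuple(
            tuple(s[r][c] + sigma[r][c] for c in range(2)) for r in range(2)
        )
        w = inv2(total)  # weight = inverse of the marginal covariance
        for r in range(2):
            for c in range(2):
                sw[r][c] += w[r][c]
            swy[r] += w[r][0] * y[0] + w[r][1] * y[1]
    cov_mu = inv2((tuple(sw[0]), tuple(sw[1])))
    mu = tuple(cov_mu[r][0] * swy[0] + cov_mu[r][1] * swy[1] for r in range(2))
    return mu, cov_mu


# Hypothetical data: two studies, uncorrelated outcomes, no heterogeneity
ys = [(0.2, 1.0), (0.4, 2.0)]
ss = [((0.1, 0.0), (0.0, 0.2)), ((0.1, 0.0), (0.0, 0.2))]
sigma = ((0.0, 0.0), (0.0, 0.0))
print(pooled_bivariate(ys, ss, sigma))
```

ML and REML repeat this step while updating the between-study matrix, whereas the method of moments plugs in a non-iterative estimate once. Studies that report only one outcome would need the corresponding rows and columns dropped, which this sketch does not handle.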
What about within-studies correlation?
■ Just like the variance, the covariance is assumed known and needs to be available
■ In the majority of the situations however, unlike variance, the covariance is not reported in publications and one needs access to individual data in order to calculate it
■ Ignoring or approximating the correlation results in biased overall estimates of variance
Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007 12;7:3.
An alternative model
This model differs from all those mentioned above in that it uses an estimate of the overall correlation (ρ). That is, rather than partitioning the overall correlation into within-study and between-study components, it uses a single parameter, ρ, to model the overall correlation directly. The additional variation beyond sampling error is indicated by ψ1, ψ2. These are not directly equivalent to τ1, τ2, the between-study variances in the general model, although in some circumstances they may be similar. The model is not hierarchical, and it can also include studies that provide only one of the 2 endpoints under a missing at random assumption.
Riley RD, Thompson JR, Abrams KR (2008) An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 9(1): 172-186
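In the form given by Riley et al. (2008), the alternative model is a single-level marginal model. The notation below is an assumption made for illustration, with s_ij² the reported within-study variances, ψ_j² the additional variation and ρ the overall correlation:

```latex
\mathbf{Y}_i \sim N(\boldsymbol{\beta}, \boldsymbol{\Phi}_i),
\qquad
\boldsymbol{\Phi}_i =
\begin{pmatrix}
s_{i1}^2 + \psi_1^2 &
\rho \sqrt{(s_{i1}^2 + \psi_1^2)(s_{i2}^2 + \psi_2^2)} \\
\rho \sqrt{(s_{i1}^2 + \psi_1^2)(s_{i2}^2 + \psi_2^2)} &
s_{i2}^2 + \psi_2^2
\end{pmatrix}
```

Only the variances of the two estimates are needed from each study, which is what makes the model usable when no within-study covariance is reported.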
The general case
■ Let Y, X1, and X2 denote three categorical random variables with two levels each (coded 0 and 1), used to classify n individuals.
■ Usually, in a retrospective case-control study Y denotes the case-control status and X1, X2 denote the two different exposures.
■ In clinical trials (or in prospective observational studies), Y will denote the disease/non-disease outcome whereas X1, X2 may denote two different treatments (or exposures).
■ Alternatively, Y may denote the single treatment whereas X1, X2 will denote the two alternative outcomes.
Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012
The data
The data (cont.)
The odds ratios
The odds ratios
The main result
■ The main purpose of this work is to derive an estimate for the covariance and express it using solely the observed counts of the contingency tables (Table 1 and Table 2). I will show that the estimate of the covariance is given by:
■ It is interesting to note at this point that the covariance depends only on the observed counts nijk, nij+ and ni+k. If nij+ or ni+k becomes zero, a simple correction can be employed adding c=½ to the cell counts.
■ It is clear that calculating the covariance requires knowledge of the full distribution of the counts in Table 1 (i.e. nijk). However, if we recall that nij+, ni+k and n+jk are the minimal sufficient statistics for obtaining the maximum likelihood (ML) estimates of the 2x2x2 contingency table in the case of no three-factor interaction, we realize that the covariance can theoretically be calculated in certain cases even when the nijk are not directly observed, provided that we assume no three-way interaction.
Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012
Special cases already known in the literature
■ Dose-response models
■ Genetic association studies
■ Observational studies that share the same group of controls
■ Clinical trials with multiple treatments that share a common placebo group
■ Mutually exclusive outcomes
Dose-response models and Genetic association studies
In such cases where we have two (or more) categories of exposure that are compared against the same baseline group (no exposure), we will have n1+0=n10+ and n00+= n0+0, and thus:
Berrington A, Cox DR. Generalized least squares for the synthesis of correlated information. Biostatistics 2003, 4(3):423-431.
Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 1992, 135(11):1301-1309
Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. Stat Appl Genet Mol Biol 2008, 7:Article31
Bagos PG. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature. BMC Genet 2011, 12:8
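For reference, the familiar large-sample (delta-method) result for two log odds ratios that share a baseline category can be written as follows. The notation is an assumption that differs from the n_ijk notation of the slides: a_j and b_j are the numbers of cases and controls in exposure category j (j = 1, 2), and a_0 and b_0 the numbers of cases and controls in the common baseline category.

```latex
\begin{aligned}
\log \widehat{OR}_j &= \log \frac{a_j\, b_0}{a_0\, b_j}, \\
\widehat{\operatorname{Var}}\bigl(\log \widehat{OR}_j\bigr)
  &= \frac{1}{a_j} + \frac{1}{b_j} + \frac{1}{a_0} + \frac{1}{b_0}, \\
\widehat{\operatorname{Cov}}\bigl(\log \widehat{OR}_1, \log \widehat{OR}_2\bigr)
  &= \frac{1}{a_0} + \frac{1}{b_0}.
\end{aligned}
```

The covariance is always positive here, because both estimates depend on the same baseline counts.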
Observational studies that share the same group of controls/ clinical trials with multiple treatments (common placebo group)
■ The latter has become very important recently, since it finds applications in the so-called multiple treatment comparison or network meta-analysis. In such a case we will have n01+ =n0+1 and n00+ =n0+0 and thus:
Gleser LJ, Olkin I. Stochastically dependent effect sizes. In: The Handbook of Research Synthesis Edited by Cooper HM, Hedges LV. New York: Russell Sage Foundation; 1994: 339-355
Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004, 23(20):3105-3124.
Mutually exclusive outcomes
■ In this particular situation (e.g. death from cancer, death from another cause, or no death at all), the odds ratios are calculated against all other alternatives and not only against the “alive” category. Thus, using the notation of Table 2, Y denotes the treatment and X1 and X2 denote the mutually exclusive outcomes, and it is easily understood that the two log-odds ratios will be negatively correlated. To reconstruct this scenario using the notation followed here, we have to resort to Table 1 and observe that n111=n011=0 by design (a person cannot die from both causes) and that the remaining counts are disjoint:
Trikalinos TA, Olkin I. A method for the meta-analysis of mutually exclusive binary outcomes. Stat Med 2008, 27(21):4279-4300
New applications
■ Combining matched and unmatched case-control studies
– Moreno V, Martin ML, Bosch FX, de Sanjose S, Torres F, Munoz N. Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies. Am J Epidemiol 1996, 143(3):293-300.
■ Combining population-based and family-based genetic association studies
– Pfeiffer RM, Pee D, Landi MT. On combining family and case-control studies. Genet Epidemiol 2008, 32(7):638-646
■ Multipoint meta-analysis of genetic association studies
■ Meta-analysis of diagnostic tests
– For all the above, only methods that use individual data were available
More information
● Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012
● Bagos PG, Dimou NL, Liakopoulos TD, Nikolopoulos GK. Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases. Statistical Applications in Genetics and Molecular Biology.2011, 10(1):Article19
● Bagos PG. Meta-analysis of haplotype-association studies: Comparison of methods and empirical evaluation of the literature, 2011, BMC Genetics, 12:8
● Bagos PG, Liakopoulos TD. A multipoint method for meta-analysis of genetic association studies. 2010, Genetic Epidemiology, 34(7):702-15
http://www.compgen.org/publications/by-subject
Applications in Stata
● Baseline risk
● Diagnostic tests
● Multiple outcomes with known within studies correlation
● Multiple outcomes with unknown within studies correlation
● Multiple treatments
● Genetic association studies
● Mendelian randomization
Requirements
● A working version of Stata (www.stata.com)
● GLLAMM for Stata (www.gllamm.org)
● mvmeta v.2 for Stata (from within Stata type: net from http://www.mrc-bsu.cam.ac.uk/IW_Stata/)
● The Stata do-files are available at: http://www.compgen.org/material/meta-analysis/multivariate
Baseline risk (1)
■ The meta-analysis concerns 13 trials on the efficacy of BCG vaccine against tuberculosis.
■ In each trial a vaccinated group is compared with a non-vaccinated control group.
■ Some covariates are available that might explain the heterogeneity among studies: geographic latitude of the place where the study was done; year of publication, and method of treatment allocation (random, alternate or systematic).
■ The main question behind the discussion on baseline risk is whether the baseline risk (risk in the non-vaccinated group) can be a source of heterogeneity. However, the log-odds ratio and the log-odds of the non-vaccinated group are correlated (regression to the mean)
Example: BCG data
Colditz GA, Brewer FB, Berkey CS, Wilson EM, Burdick E, Fineberg HV, Mosteller F. Efficacy of BCG vaccine in the prevention of tuberculosis. Journal of the American Medical Association 1994; 271:698 –702.
Baseline risk (2)
The data look like this:
Where:
X_T: Vaccinated-Disease
n_T: Vaccinated-No disease
X_C: Not vaccinated-Disease
n_C: Not vaccinated-No disease
Lat: Latitude
alloc: Allocation (1: Random, 2: Alternate, 3: Systematic)
Baseline risk (3)
Figure 1. L’Abbe plot of observed log(odds) of the not-vaccinated trial arm versus the vaccinated trial arm. The size of the circle is an indication for the inverse of the variance of the log-odds ratio in that trial.
Baseline risk (4)
We use bcg.do to perform a univariate random effects meta-analysis.
Baseline risk (5)
The results are identical using the mvmeta command:
. mvmeta b V, vars(b1 b2) ml
Baseline risk (6)
■ The conditional variance of the true log-odds (and therefore also of the log-odds ratio) in the vaccinated group, given the true log-odds in the not-vaccinated group, is 1.4313709 − 1.7573268²/2.4073333 = 0.1485417, which is interpreted as the variance between treatment effects among trials with the same baseline risk.
■ The variance of the treatment effect, measured as the log-odds ratio, calculated from Σ is (1.4313709+2.4073333-2*1.7573268)=0.3240506
■ So baseline risk, measured as the true log-odds in the not-vaccinated group, explains (0.3240506−0.1485417)/0.3240506=54 per cent of the heterogeneity in vaccination effect between the trials.
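The arithmetic on this slide can be checked directly from the three elements of the estimated between-study covariance matrix Σ. This is a small sketch; the variable names are hypothetical, and the assignment of 1.4313709 to the vaccinated arm and 2.4073333 to the not-vaccinated arm is inferred from the figures quoted above.

```python
# Elements of the estimated between-study covariance matrix (from the slide)
var_vacc = 1.4313709  # variance of true log-odds, vaccinated arm (inferred)
var_ctrl = 2.4073333  # variance of true log-odds, not-vaccinated arm (inferred)
cov = 1.7573268       # covariance between the two arms

# Variance of the vaccinated log-odds given the not-vaccinated log-odds
cond_var = var_vacc - cov ** 2 / var_ctrl

# Variance of the log-odds ratio (difference of the two log-odds)
var_lor = var_vacc + var_ctrl - 2 * cov

# Share of the heterogeneity in treatment effect explained by baseline risk
explained = (var_lor - cond_var) / var_lor

print(cond_var, var_lor, explained)
```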
Baseline risk (7)
We can also use the gllamm command, assuming a bivariate normal distribution:
. gllamm logit grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) s(wgt) constraint(1) from(A) long adapt
Baseline risk (8)
Or a binomial distribution:
. gllamm X grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) family(binomial) denom(z) link(logit) adapt trace allc
Diagnostic tests (1)
Usually, in diagnostic tests there are two parameters of interest (sensitivity and specificity) which are modelled simultaneously:
The marginal model on which we base the inference is:
Table 1. Classification of the findings of a diagnostic test according to the test result (positive or negative) and the disease status for study i (i=1,2,…,k).
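The two study-level estimates that enter a bivariate analysis of this kind are commonly the logit of the sensitivity and the logit of the specificity, with their large-sample variances. The sketch below assumes that parameterisation and a continuity correction of 0.5 when any cell is zero; the function name and the output names (b1, b2, v11, v22, v12) are hypothetical and may differ from what roc.do actually computes.

```python
import math


def logit_sens_spec(tp, fp, fn, tn, cc=0.5):
    """Logit sensitivity and specificity with their variances.

    tp, fp, fn, tn: cells of the 2x2 table of test result by disease status.
    cc: continuity correction added to every cell if any cell is zero
        (an assumption of this sketch).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    b1 = math.log(tp / fn)     # logit(sensitivity) = log(TP / FN)
    b2 = math.log(tn / fp)     # logit(specificity) = log(TN / FP)
    v11 = 1.0 / tp + 1.0 / fn  # variance of b1
    v22 = 1.0 / tn + 1.0 / fp  # variance of b2
    v12 = 0.0  # diseased and non-diseased subjects are independent groups
    return b1, b2, v11, v22, v12


# Hypothetical study: 80 true positives, 10 false positives,
# 20 false negatives, 90 true negatives
print(logit_sens_spec(80, 10, 20, 90))
```

Because sensitivity and specificity are estimated in different subjects, the within-study covariance is zero, and any correlation between them arises only between studies.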
Diagnostic tests (2)
■ The method was applied to the data obtained from a meta-analysis that aimed to determine whether Rheumatoid Factor (RF) identifies patients with Rheumatoid Arthritis (RA).
■ A total of 50 studies provided information concerning RF.
Example: RF for the diagnosis of Rheumatoid Arthritis
Nishimura K, Sugiyama D, Kogata Y, Tsuji G, Nakazawa T, Kawano S, et al. Meta-analysis: diagnostic accuracy of anti-cyclic citrullinated peptide antibody and rheumatoid factor for rheumatoid arthritis. Ann Intern Med. 2007 Jun 5;146(11):797-808.
Diagnostic tests (3)
Thus, after running roc.do we get the following output for RF:
. mvmeta b V, vars(b1 b2)
Diagnostic tests (4)
The construction of the ROC curves is also feasible for RF:
Diagnostic tests (5)
Or we can use the midas command:
. midas tp fp fn tn, res(sum)
Known within studies correlation (1)
■ A recent meta-analysis by Antczak-Bouckoms et al. located 5 randomized controlled trials that compared a surgical procedure with a non-surgical procedure for the treatment of moderate periodontal disease.
■ The two outcomes assessed on each patient were (pre- to post-treatment mm changes in) probing depth (PD) and attachment level (AL) which are modeled simultaneously in our example.
■ The goal of treatment is to decrease probing depths and to increase attachment levels around the teeth.
Example: Periodontitis (PD-AL)
Antczak-Bouckoms, A., Joshipura, K., Burdick, E. and Tulloch, J. F. C. ‘Meta-analysis of surgical versus non-surgical method of treatment for periodontal disease’, Journal of Clinical Periodontology, 20, 259-268 (1993).
Known within studies correlation (2)
The data look like this:
Where:
PD: Improvement in Probing depth (surgical minus non surgical values)
AL: Improvement in Attachment level (surgical minus non surgical values)
V11, V22, V12: The elements of the within-trial covariance matrix of the two outcomes (means) in trial i: the variance of PD, the variance of AL, and their covariance
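The within-study correlation implied by these three columns is the covariance divided by the product of the two standard errors. A minimal sketch, with a hypothetical function name and made-up values rather than the periodontitis data:

```python
import math


def within_study_corr(v11, v22, v12):
    """Within-study correlation from the elements of a 2x2 covariance matrix."""
    rho = v12 / math.sqrt(v11 * v22)
    if not -1.0 <= rho <= 1.0:
        # A covariance larger than the product of the standard errors
        # cannot come from a valid covariance matrix
        raise ValueError("V12 is inconsistent with V11 and V22")
    return rho


# Hypothetical values for a single trial
print(within_study_corr(0.04, 0.09, 0.03))
```

Running such a check on every trial before fitting is a cheap way to catch data-entry errors in V12.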
Known within studies correlation (3)
Thus, after running periodontitis.do we get the following output:
Known within studies correlation (4)
Known within studies correlation (5)
Unknown within studies correlation (1)
■ A systematic review in neuroblastoma sought to establish the prognostic importance of MYCN, a protooncogene.
■ In 17 studies, a log-hazard ratio estimate for “amplified” versus “non-amplified” MYCN was available for both disease-free survival (Yi1) and overall survival (Yi2).
■ However, no studies reported the within-study correlations, which are likely to be strongly positive due to the structural relationship between these endpoints.
■ Further, there were 64 studies which provided data for only one of the 2 endpoints.
Example: MYCN data
RILEY, R. D., HENEY, D., JONES, D. R., SUTTON, A. J., LAMBERT, P. C., ABRAMS, K. R., YOUNG, B., WAILOO, A. J. AND BURCHILL, S. A. (2004). A systematic review of molecular and biological tumor markers in neuroblastoma. Clinical Cancer Research 10, 4–12.
Unknown within studies correlation (2)
The data look like this:
Where:
b1: log-hazard ratio for “amplified” versus “non-amplified” MYCN for disease-free survival
b2: log-hazard ratio for “amplified” versus “non-amplified” MYCN for overall survival
V11: variance of b1
V22: variance of b2
There is no V12!!!
Unknown within studies correlation (3)
After running mycn.do we get the following output:
Multiple treatments (1)
■ 26 clinical trials which investigate the prevention of first bleeding in cirrhosis using beta-blockers and sclerotherapy
■ Nine randomized clinical trials of beta-blockers and 19 trials of sclerotherapy were reviewed.
■ Crude rates of bleeding and death in treated and control groups were recorded.
Example: Cirrhosis and beta-blockers or sclerotherapy
Pagliaro L, D'Amico G, Sörensen TI, Lebrec D, Burroughs AK, Morabito A, Tiné F, Politi F, Traina M. Prevention of first bleeding in cirrhosis. A meta-analysis of randomized trials of nonsurgical treatment. Ann Intern Med.1992 Jul 1;117(1):59-70.
Multiple treatments (2)
The data look like this:
Where:
r1: number of bleedings in patients treated with beta-blockers
n1: total number of patients treated with beta-blockers
r2: number of bleedings in patients treated with sclerotherapy
n2: total number of patients treated with sclerotherapy
r0: number of bleedings in controls
n0: total number of controls
Multiple treatments (3)
After running bleeding.do we get the following output:
. mvmeta b V, vars(b1 b2) ml
Multiple treatments (4)
. mvmeta b V, vars(b1 b2) ml bscov(prop C)
. gllamm logit c1 c2 c3, nocons i(id) nrf(1) eqs(c) s(wgt) constraint(1) nip(8) adapt
. gllamm r c1 c2 c3, nocons fam(binom) i(id) link(logit) eqs(c) nrf(1) adapt denom(n)
Genetic association studies (1)
Table 2. A typical layout of the data used in a meta-analysis of genetic association studies involving a single bi-allelic locus with a continuous outcome.
The marginal model on which we base the inference is:
Genetic association studies (2)
Example: Association of AGT M235T polymorphisms with Essential Hypertension
■ A total of 7 studies addressed the association of AGT M235T with Hypertension
■ The two logORs derived from the mutant allele (TT vs. MM and MT vs. MM) were modeled simultaneously as a bivariate response.
Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. 2008, Statistical Applications in Genetics and Molecular Biology, 7(1), Article 31
Genetic association studies (3)
The data look like this:
Where:
aa0: MM genotype for controls, ab0: MT genotype for controls, bb0: TT genotype for controls
aa1: MM genotype for cases, ab1: MT genotype for cases, bb1: TT genotype for cases
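From these six genotype counts the two log odds ratios and their within-study covariance matrix can be obtained in closed form, because both contrasts use the MM genotype as the reference. The following sketch uses the standard large-sample formulas; the function name, the ordering of b1 and b2, and the example counts are assumptions and need not match genetics.do.

```python
import math


def genotype_log_ors(aa0, ab0, bb0, aa1, ab1, bb1):
    """Log odds ratios TT vs MM and MT vs MM with their covariance matrix.

    Suffix 0 = controls, suffix 1 = cases; aa = MM, ab = MT, bb = TT.
    Returns (b1, b2, v11, v22, v12).
    """
    b1 = math.log((bb1 * aa0) / (aa1 * bb0))  # TT vs MM (assumed to be b1)
    b2 = math.log((ab1 * aa0) / (aa1 * ab0))  # MT vs MM (assumed to be b2)
    shared = 1.0 / aa1 + 1.0 / aa0            # contribution of the MM reference
    v11 = 1.0 / bb1 + 1.0 / bb0 + shared
    v22 = 1.0 / ab1 + 1.0 / ab0 + shared
    v12 = shared  # covariance comes only from the common reference group
    return b1, b2, v11, v22, v12


# Hypothetical genotype counts for one study
print(genotype_log_ors(aa0=100, ab0=100, bb0=50, aa1=50, ab1=100, bb1=100))
```

Zero counts would need a continuity correction before calling the function, which is left out here.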
Genetic association studies (4)
Thus, after running genetics.do we get the following output:
. mvmeta b V, vars(b1 b2) ml
Genetic association studies (5)
The genetic model-free approach is a special case of the general model. It re-parameterizes y1, y2 using λ = y1/y2 and imposes a single between-studies variance τ². Thus, λ is treated as a fixed-effects parameter.
Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005) The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6): 1319-1328
Genetic association studies (6)
Under the genetic model-free approach proposed by Minelli et al. we get the following output (model_free.do):
Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005) The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6): 1319-1328
Mendelian randomisation (1)
■ Suppose that we have genotype, G, a continuous intermediate phenotype, IP, and a binary disease outcome D.
■ The gene is selected because of its influence on the phenotype and we wish to establish the relationship between phenotype and disease.
■ The hypothesised causal pathway is:
G → IP → D
Minelli C, Thompson JR, Tobin MD, Abrams KR (2004) An integrated approach to the meta-analysis of genetic association studies using Mendelian randomization. Am J Epidemiol 160(5): 445-452
Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD (2005) Meta-analysis of genetic studies using Mendelian randomization--a multivariate approach. Stat Med 24(14): 2241-2254
Mendelian randomisation (2)
■ Let the log odds ratio of disease given genotype be y1i and the mean difference in phenotype be y2i.
■ The ith study produces two potential estimates, y1i and y2i, although in practice only one or the other may be obtained or reported.
■ We assume that the within-study correlation of y1i and y2i is negligible.
■ Thus we wish to estimate y1i, y2i and the parameters of between studies heterogeneity.
Mendelian randomisation (3)
Model A
It is a standard multivariate model with ρW=0. It can be fitted using standard software.
Model B
It is similar to model A, except that λ is treated as a random-effects parameter (with variance τ²_λ), while the within studies correlation is again zero (ρW=0). It needs specialised software.
Mendelian randomisation (4)
■ Using the paper by Wald et al. a total of 64 genetic studies were identified (mthfr.dta).
■ 31 evaluated only genotype-disease association, 16 only genotype-phenotype association, and 17 both
■ Among the 17 studies evaluating both associations, 7 measured the mean difference in phenotype level with genotype in both cases and controls (2 reporting only combined means), but 4 studies measured homocysteine only in cases and 4 only in controls, while two reports were unclear.
Example: MTHFR, Homocysteine and CHD
Wald DS, Law M, Morris JK. Homocysteine and cardiovascular disease: evidence on causality from a meta-analysis. BMJ 2002;325:1202.
Mendelian randomisation (5)
The data look like this:
Where:
z: logOR TT vs. CC genotypes
varz: variance of z
y: mean differences in serum homocysteine concentrations (μmol/l) for TT vs. CC genotypes
vary: variance of y
wy: 1/vary
wz: 1/varz
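Once pooled values of the log odds ratio and of the mean difference in phenotype are available, a simple ratio-type estimate of the phenotype-disease association is their quotient, in line with the definition λ = y1/y2 used earlier. The sketch below is an illustration only: it assumes the two pooled estimates are independent and uses a first-order delta-method standard error, which is simpler than the multivariate models fitted in mthfr.do. The function name and the input values are hypothetical.

```python
import math


def ratio_estimate(z, varz, y, vary):
    """Log odds ratio per unit increase in phenotype, with delta-method SE.

    z, varz: pooled log odds ratio (genotype-disease) and its variance
    y, vary: pooled mean difference in phenotype (genotype-phenotype)
             and its variance
    Assumes z and y are independent (an assumption of this sketch).
    """
    lam = z / y
    # First-order delta method for a ratio of independent estimates
    var_lam = varz / y ** 2 + (z ** 2) * vary / y ** 4
    return lam, math.sqrt(var_lam)


# Hypothetical pooled estimates
lam, se = ratio_estimate(z=0.2, varz=0.01, y=2.0, vary=0.04)
print(lam, se, math.exp(lam))  # exp(lam) is the odds ratio per unit of phenotype
```

The delta-method approximation deteriorates when y is close to zero relative to its standard error, which is one motivation for fitting the joint model instead.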
Mendelian randomisation (6)
Thus, after running mthfr.do we get the following output:
. mvmeta b V, vars(b1 b2) ml
The results are identical to those derived after fitting Model A.
Mendelian randomisation (7)
Mendelian randomisation (8)
If we assume model B:
We obtain the following parameters estimates:
Mendelian randomisation (9)
Conclusions
● Multivariate meta-analysis is an important tool in systematic reviews
● It can be applied in several settings:
○ in a single 2×2 table with a special scope (baseline risk, diagnostic studies)
○ in cases of several outcomes or risk factors (multiple outcomes)
○ in case of a single outcome or risk factor which is, however, a vector (genetic association)
● The within studies correlation is very important
● The multivariate analysis usually increases the power by borrowing strength from external studies
● The multivariate analysis takes into account the correlation of the estimates and thus, it allows the comparison of the estimates after fitting the model
● Recent developments in statistical theory and software allow such models to be fitted easily
Thank you
Questions?