Multivariate Meta-analysis

Pantelis G. Bagos, PhD, Assistant Professor

Niki L. Dimou, MSc, PhD Candidate

Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, Greece

Lamia 2012

Meta-analysis

■ Combining the estimates of several studies

■ The methodology dates back to Fisher

■ The term appeared for the first time in Psychology (Glass, 1976)

■ In its simplest form, it is a weighted average of the estimates

■ Improves the statistical power to detect weak effects

Glass GV. Primary, secondary, and meta-analysis of research. Educational Researcher, 1976; 5: 3-8

Nikolopoulos GΚ, Tsantes AΕ, Bagos PG, Travlou A, Vaiopoulos G. Integrin, alpha 2 gene C807T Polymorphism and Risk of Ischemic Stroke: a Meta-Analysis. 2007, Thrombosis Research; 119 (4): 501-510

Tsantes AΕ, Nikolopoulos GΚ, Bagos PG, Rapti E, Mantzios G, Kapsimali V, Travlou A. Association between the Plasminogen Activator Inhibitor-1 4G/5G Polymorphism and Venous Thrombosis: a Meta-Analysis. 2007, Thrombosis and Haemostasis; 97(6):907-13

Statistical models

Fixed effects models

Random effects models
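The two univariate models can be sketched numerically. The snippet below is a minimal illustration, assuming inverse-variance weights for the fixed-effects model and the DerSimonian-Laird moment estimator of the between-study variance for the random-effects model. The function names and the three (estimate, variance) pairs are hypothetical and are not taken from the slides.

```python
def fixed_effect(y, v):
    """Inverse-variance weighted average and its variance."""
    w = [1.0 / vi for vi in v]
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return est, 1.0 / sum(w)


def dersimonian_laird(y, v):
    """Random-effects pooled estimate with the DL moment estimator of tau^2."""
    w = [1.0 / vi for vi in v]
    fe, _ = fixed_effect(y, v)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # truncated at zero
    w_star = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return est, 1.0 / sum(w_star), tau2


# Hypothetical log-odds ratios and their within-study variances
y = [0.1, 0.5, 0.9]
v = [0.04, 0.04, 0.04]
fe_est, fe_var = fixed_effect(y, v)
re_est, re_var, tau2 = dersimonian_laird(y, v)
```

With equal within-study variances both models give the same point estimate, but the random-effects variance is larger because tau2 is added to every study's variance.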

Multivariate meta-analysis

■ In many situations we need to model simultaneously two or more effect sizes from each study

■ This may be due to their complementarity (e.g. sensitivity and specificity)

■ There may be multiple treatments, multiple outcomes, or multiple risk factors

■ The joint analysis usually increases the power by borrowing strength from external studies

■ The joint analysis takes into account the correlation of the estimates and thus, it allows the comparison of the estimates after fitting the model

Higgins JP, Whitehead A (1996) Borrowing strength from external trials in a meta-analysis. Stat Med 15(24): 2733-2749

The model

van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21(4):589-624
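The equations of this slide are not available here. As a hedged reminder, the general bivariate random-effects model is usually written as below. The symbols S_i, Σ, ρ_Wi (within-study correlation) and ρ_B (between-study correlation) follow common usage and are assumed, not confirmed, to match the slide's notation.

```latex
\begin{aligned}
\mathbf{y}_i \mid \boldsymbol{\theta}_i &\sim N_2(\boldsymbol{\theta}_i, \mathbf{S}_i), &
\mathbf{S}_i &= \begin{pmatrix} s_{i1}^2 & \rho_{Wi}\, s_{i1} s_{i2} \\ \rho_{Wi}\, s_{i1} s_{i2} & s_{i2}^2 \end{pmatrix}, \\
\boldsymbol{\theta}_i &\sim N_2(\boldsymbol{\theta}, \boldsymbol{\Sigma}), &
\boldsymbol{\Sigma} &= \begin{pmatrix} \tau_1^2 & \rho_B\, \tau_1 \tau_2 \\ \rho_B\, \tau_1 \tau_2 & \tau_2^2 \end{pmatrix}, \\
\mathbf{y}_i &\sim N_2(\boldsymbol{\theta}, \mathbf{S}_i + \boldsymbol{\Sigma}).
\end{aligned}
```

The last line is the marginal model: the total covariance of the observed estimates is the sum of the within-study and between-study parts.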

Estimation

■ Based on the multivariate normal distribution

■ ML, REML and method of moments (non-iterative)

■ Studies reporting a subset of the outcomes are treated as missing at random (MAR)

Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA (1998) Meta-analysis of multiple outcomes by regression with random effects. Stat Med 17(22): 2537-2550

Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Stat Med. 2011 Jan 26.

Jackson D, White IR, Thompson SG: Extending DerSimonian and Laird's methodology to perform multivariate random effects meta-analyses. Stat Med 2010, 29:1282-1297
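Once the between-study covariance matrix has been estimated (by ML, REML or the method of moments), the pooled effects follow from generalized least squares. The sketch below treats that matrix as known, which is a simplification. The helper names and all numbers are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def pooled_bivariate(ys, Ss, Sigma):
    """GLS estimate (sum W_i)^-1 (sum W_i y_i), with W_i = (S_i + Sigma)^-1."""
    W_sum = [[0.0, 0.0], [0.0, 0.0]]
    Wy_sum = [0.0, 0.0]
    for y, S in zip(ys, Ss):
        # Marginal covariance of study i: within-study plus between-study
        V = [[S[r][c] + Sigma[r][c] for c in range(2)] for r in range(2)]
        W = inv2(V)
        for r in range(2):
            Wy_sum[r] += W[r][0] * y[0] + W[r][1] * y[1]
            for c in range(2):
                W_sum[r][c] += W[r][c]
    cov = inv2(W_sum)  # covariance matrix of the pooled estimates
    est = [cov[r][0] * Wy_sum[0] + cov[r][1] * Wy_sum[1] for r in range(2)]
    return est, cov


# Two hypothetical studies with identical within-study covariance matrices
ys = [(0.2, 0.4), (0.6, 0.8)]
S = [[0.04, 0.01], [0.01, 0.09]]
Sigma = [[0.02, 0.005], [0.005, 0.03]]  # assumed known for this sketch
est, cov = pooled_bivariate(ys, [S, S], Sigma)
```

Because both studies carry the same weight matrix here, the pooled vector is the plain average of the two study vectors. With unequal covariance matrices the correlation terms change the weights.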

What about within-studies correlation?

■ Just like the variance, the covariance is assumed known and needs to be available

■ In the majority of the situations however, unlike variance, the covariance is not reported in publications and one needs access to individual data in order to calculate it

■ Ignoring or approximating the correlation results in biased overall estimates of variance

Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007 12;7:3.

An alternative model

This model differs from all of the above in that it uses a single parameter, ρ, for the overall correlation. That is, rather than partitioning the overall correlation into within-study and between-study components, it models the overall correlation directly. The additional variation beyond sampling error is captured by ψ1² and ψ2². These are not directly equivalent to τ1² and τ2², the between-study variances in the general model, although in some circumstances they may be similar. The model is not hierarchical, and it can also include studies that provide only one of the two endpoints under a missing-at-random assumption.

Riley RD, Thompson JR, Abrams KR (2008) An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 9(1): 172-186
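As a hedged sketch of the model described above (the slide's own equation is not available here), the marginal distribution can be written with a single overall correlation ρ and additional variances ψ_1², ψ_2² added to the known within-study variances s_i1², s_i2²:

```latex
\begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix}
\sim N_2\!\left(
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix},\;
\boldsymbol{\Phi}_i \right),
\qquad
\boldsymbol{\Phi}_i =
\begin{pmatrix}
s_{i1}^2 + \psi_1^2 & \rho \sqrt{(s_{i1}^2 + \psi_1^2)(s_{i2}^2 + \psi_2^2)} \\
\rho \sqrt{(s_{i1}^2 + \psi_1^2)(s_{i2}^2 + \psi_2^2)} & s_{i2}^2 + \psi_2^2
\end{pmatrix}
```

No within-study covariance appears anywhere in Φ_i, which is why the model can be fitted when the within-study correlations are unknown.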

The general case

■ Let Y, X1, and X2 denote three categorical random variables with two levels (i.e. 0,1), used to classify n individuals.

■ Usually, in a retrospective case-control study Y denotes the case-control status and X1, X2 denote the two different exposures.

■ In clinical trials (or in prospective observational studies), Y will denote the disease/non-disease outcome whereas X1, X2 may denote two different treatments (or exposures).

■ Alternatively, Y may denote the single treatment whereas X1, X2 will denote the two alternative outcomes.

Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012

The data

The data (cont.)

The odds ratios

The main result

■ The main purpose of this work is to derive an estimate for the covariance and express it using solely the observed counts of the contingency tables (Table 1 and Table 2). I will show that the estimate of the covariance is given by:

■ It is interesting to note at this point that the covariance depends only on the observed counts nijk, nij+ and ni+k. If nij+ or ni+k becomes zero, a simple correction can be employed adding c=½ to the cell counts.

■ It is clear that calculating the covariance requires knowledge of the full distribution of the counts in Table 1 (i.e. nijk). However, if we recall that nij+, ni+k and n+jk are the minimal sufficient statistics for obtaining the maximum likelihood (ML) estimates of the 2x2x2 contingency table in the case of no three-factor interaction, we realize that the covariance can theoretically be calculated in certain cases even when the nijk are not directly observed, provided that we assume no three-way interaction.

Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012
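The formula of this slide is not available here. The expression below is a reconstruction by the delta method, assuming multinomial sampling of the counts in Table 1 and the usual definitions of the two log-odds ratios from the marginal tables of Y by X1 and Y by X2. It is a sketch consistent with the description above (it involves only nijk, nij+ and ni+k), not a copy of the original typesetting.

```latex
\begin{aligned}
\log \widehat{OR}_1 &= \log\frac{n_{11+}\, n_{00+}}{n_{10+}\, n_{01+}}, \qquad
\log \widehat{OR}_2 = \log\frac{n_{1+1}\, n_{0+0}}{n_{1+0}\, n_{0+1}}, \\
\widehat{\mathrm{Cov}}\left(\log \widehat{OR}_1, \log \widehat{OR}_2\right)
&\approx \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1}
(-1)^{j+k}\, \frac{n_{ijk}}{n_{ij+}\, n_{i+k}}.
\end{aligned}
```

The sign (-1)^{j+k} is the product of the signs with which log n_{ij+} and log n_{i+k} enter the two log-odds ratios. The terms of order 1/n produced by the multinomial covariances cancel in the sum.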

Special cases already known in the literature

■ Dose-response models

■ Genetic association studies

■ Observational studies that share the same group of controls

■ Clinical trials with multiple treatments that share a common placebo group

■ Mutually exclusive outcomes

Dose-response models and Genetic association studies

In such cases, where two (or more) categories of exposure are compared against the same baseline group (no exposure), we will have n1+0=n10+ and n00+=n0+0, and thus the covariance reduces to 1/n10+ + 1/n00+.

Berrington A, Cox DR. Generalized least squares for the synthesis of correlated information. Biostatistics 2003, 4(3):423-431.

Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 1992, 135(11):1301-1309

Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. Stat Appl Genet Mol Biol 2008, 7:Article31

Bagos PG. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature. BMC Genet 2011, 12:8

Observational studies that share the same group of controls/ clinical trials with multiple treatments (common placebo group)

■ The latter has become very important recently, since it finds applications in the so-called multiple treatment comparison or network meta-analysis. In such a case we will have n01+=n0+1 and n00+=n0+0, and thus the covariance reduces to 1/n01+ + 1/n00+.

Gleser LJ, Olkin I. Stochastically dependent effect sizes. In: The Handbook of Research Synthesis Edited by Cooper HM, Hedges LV. New York: Russell Sage Foundation; 1994: 339-355

Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004, 23(20):3105-3124.
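The shared-control case can be checked numerically. The sketch below assumes a delta-method covariance of the form: sum over i, j, k of (-1)^(j+k) m_ijk / (a_ij b_ik), where a and b are the two 2x2 tables and m_ijk counts the individuals that appear in both cell (i, j) of the first table and cell (i, k) of the second. The function name and all counts are hypothetical.

```python
def cov_log_or(a, b, m):
    """Delta-method covariance of two log-odds ratios.

    a[i][j]    : count in cell (Y=i, X1=j) of the first 2x2 table
    b[i][k]    : count in cell (Y=i, X2=k) of the second 2x2 table
    m[i][j][k] : individuals counted in both a[i][j] and b[i][k]
    """
    total = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                if m[i][j][k]:
                    sign = 1 if (j + k) % 2 == 0 else -1
                    total += sign * m[i][j][k] / (a[i][j] * b[i][k])
    return total


# Hypothetical three-arm trial: row 0 is the shared placebo arm
# (80 without the event, 20 with it); row 1 holds the two active arms.
a = [[80, 20], [90, 10]]   # placebo vs. treatment A
b = [[80, 20], [85, 15]]   # placebo vs. treatment B
m = [
    [[80, 0], [0, 20]],    # placebo patients are common to both tables
    [[0, 0], [0, 0]],      # the active arms share no patients
]
cov = cov_log_or(a, b, m)
```

Only the two placebo cells contribute, so the result equals 1/20 + 1/80, the familiar covariance for two comparisons against a common control group.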

Mutually exclusive outcomes

■ In this particular situation (i.e. when we have death from cancer, death from other cause and no death at all), the odds-ratios are calculated against all other alternatives and not only against the “alive” category. Thus, using the notation of Table 2, we will have Y denoting the treatment and X1 and X2 denoting the mutually exclusive outcomes and it is easily understood that the two log-odds ratios will be negatively correlated. To reconstruct this scenario using the notation followed here, we have to resort to Table 1 and observe that n111=n011=0 by design (a person cannot die from both causes) and that the remaining counts are disjoint:

Trikalinos TA, Olkin I. A method for the meta-analysis of mutually exclusive binary outcomes. Stat Med 2008, 27(21):4279-4300

New applications

■ Combining matched and unmatched case-control studies

– Moreno V, Martin ML, Bosch FX, de Sanjose S, Torres F, Munoz N. Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies. Am J Epidemiol 1996, 143(3):293-300.

■ Combining population-based and family-based genetic association studies

– Pfeiffer RM, Pee D, Landi MT. On combining family and case-control studies. Genet Epidemiol 2008, 32(7):638-646

■ Multipoint meta-analysis of genetic association studies

■ Meta-analysis of diagnostic tests

– For all the above, only methods that use individual data were available

More information

● Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012

● Bagos PG, Dimou NL, Liakopoulos TD, Nikolopoulos GK. Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases. Statistical Applications in Genetics and Molecular Biology.2011, 10(1):Article19

● Bagos PG. Meta-analysis of haplotype-association studies: Comparison of methods and empirical evaluation of the literature, 2011, BMC Genetics, 12:8

● Bagos PG, Liakopoulos TD. A multipoint method for meta-analysis of genetic association studies. 2010, Genetic Epidemiology, 34(7):702-15

http://www.compgen.org/publications/by-subject



Applications in Stata

● Baseline risk

● Diagnostic tests

● Multiple outcomes with known within-studies correlation

● Multiple outcomes with unknown within-studies correlation

● Multiple treatments

● Genetic association studies

● Mendelian randomization

Requirements

● A working version of Stata (www.stata.com)

● GLLAMM for Stata (www.gllamm.org)

● mvmeta v.2 for Stata (from within Stata type: net from http://www.mrc-bsu.cam.ac.uk/IW_Stata/)

● The Stata do-files are available at: http://www.compgen.org/material/meta-analysis/multivariate



Baseline risk (1)

■ The meta-analysis concerns 13 trials on the efficacy of BCG vaccine against tuberculosis.

■ In each trial a vaccinated group is compared with a non-vaccinated control group.

■ Some covariates are available that might explain the heterogeneity among studies: geographic latitude of the place where the study was done; year of publication, and method of treatment allocation (random, alternate or systematic).

■ The main question behind the discussion on baseline risk is whether the baseline risk (risk in the non-vaccinated group) can be a source of heterogeneity. However, the log-odds ratio and the log-odds of the non-vaccinated group are correlated (regression to the mean)

Example: BCG data

Colditz GA, Brewer FB, Berkey CS, Wilson EM, Burdick E, Fineberg HV, Mosteller F. Efficacy of BCG vaccine in the prevention of tuberculosis. Journal of the American Medical Association 1994; 271:698 –702.

Baseline risk (2)

The data look like this: Where:

X_T: Vaccinated-Disease

n_T: Vaccinated-No disease

X_C: Not vaccinated-Disease

n_C: Not vaccinated-No disease

Lat: Latitude

alloc: Allocation (1: Random, 2: Alternate, 3: Systematic)
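The two effect sizes per trial can be derived from these counts. The snippet below is a minimal sketch, assuming the standard large-sample variance of a log-odds (the sum of the reciprocal cell counts). The counts are illustrative only, and bcg.do may prepare the data differently.

```python
import math


def arm_log_odds(events, non_events):
    """Log-odds of disease in one trial arm and its approximate variance."""
    return math.log(events / non_events), 1.0 / events + 1.0 / non_events


# Illustrative counts in the slide's notation
# (X_* = disease, n_* = no disease)
x_t, n_t = 4, 119     # vaccinated arm
x_c, n_c = 11, 128    # not-vaccinated arm

b_t, v_t = arm_log_odds(x_t, n_t)
b_c, v_c = arm_log_odds(x_c, n_c)

# The arms are independent, so the variance of the log-odds ratio
# is the sum of the two arm variances.
log_or = b_t - b_c
var_log_or = v_t + v_c
```

Modelling the pair (b_t, b_c) jointly, rather than log_or alone, is what allows the baseline risk to be studied.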

Baseline risk (3)

Figure 1. L’Abbe plot of observed log(odds) of the not-vaccinated trial arm versus the vaccinated trial arm. The size of the circle is an indication for the inverse of the variance of the log-odds ratio in that trial.

Baseline risk (4)

We use bcg.do and we perform a univariate random effects meta-analysis.

Baseline risk (5)

The results are identical using the mvmeta command:

. mvmeta b V, vars(b1 b2) ml

Baseline risk (6)

■ The conditional variance of the true log-odds, and therefore also of the log-odds ratio, in the vaccinated group given the true log-odds in the not-vaccinated group is 1.4313709 − 1.7573268²/2.4073333 = 0.1485417, which is interpreted as the variance between treatment effects among trials with the same baseline risk.

■ The variance of the treatment effect, measured as the log-odds ratio, calculated from Σ is (1.4313709+2.4073333-2*1.7573268)=0.3240506

■ So baseline risk, measured as the true log-odds in the not-vaccinated group, explains (0.3240506−0.1485417)/0.3240506=54 per cent of the heterogeneity in vaccination effect between the trials.
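The arithmetic on this slide can be reproduced in a few lines. The assignment of the three reported values to the vaccinated variance, the not-vaccinated variance and their covariance is inferred from the fact that it reproduces the quoted 0.1485417. The variable names are hypothetical.

```python
# Elements of the estimated between-trial covariance matrix (from the slide)
var_t = 1.4313709    # variance of true log-odds, vaccinated group
var_c = 2.4073333    # variance of true log-odds, not-vaccinated group
cov_tc = 1.7573268   # covariance between the two

# Variance of the vaccinated log-odds given the not-vaccinated log-odds
cond_var = var_t - cov_tc ** 2 / var_c

# Unconditional variance of the log-odds ratio (vaccinated minus control)
var_lor = var_t + var_c - 2 * cov_tc

# Share of the heterogeneity in treatment effect explained by baseline risk
explained = (var_lor - cond_var) / var_lor
```

The value of explained is about 0.54, which matches the 54 per cent quoted above.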

Baseline risk (7)

We can also use the gllamm command, assuming a bivariate normal distribution:

. gllamm logit grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) s(wgt) constraint(1) from(A) long adapt

Baseline risk (8)

Or a binomial distribution:

. gllamm X grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) family(binomial) denom(z) link(logit) adapt trace allc

Diagnostic tests (1)

Usually, in diagnostic tests there are two parameters of interest (sensitivity and specificity) which are modelled simultaneously:

The marginal model on which we base the inference is:

Table 1. Classification of the findings of a diagnostic test according to the test result (positive or negative) and the disease status for study i (i=1,2,…,k).
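The two responses per study can be computed from the cells of Table 1. The sketch below assumes that the responses are logit(sensitivity) and logit(specificity) with the usual large-sample variances. roc.do may instead use logit(1 − specificity), which only changes the sign of the second response. The counts are hypothetical.

```python
import math


def logit_accuracy(tp, fp, fn, tn):
    """Logit sensitivity and specificity with their approximate variances."""
    b1 = math.log(tp / fn)       # logit(sensitivity), diseased subjects
    b2 = math.log(tn / fp)       # logit(specificity), non-diseased subjects
    v11 = 1.0 / tp + 1.0 / fn
    v22 = 1.0 / tn + 1.0 / fp
    # Sensitivity and specificity come from different subjects,
    # so the within-study covariance is taken as zero.
    v12 = 0.0
    return b1, b2, v11, v22, v12


# Hypothetical study: sensitivity 0.80, specificity 0.90
b1, b2, v11, v22, v12 = logit_accuracy(tp=80, fp=10, fn=20, tn=90)
```

The between-study correlation of the two logits is still estimated by the bivariate model, since studies with a lower positivity threshold tend to have higher sensitivity and lower specificity.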


■ The method was applied to data obtained from a meta-analysis that aimed to determine whether Rheumatoid Factor (RF) identifies patients with Rheumatoid Arthritis (RA).

■ A total of 50 studies provided information concerning RF.

Example: RF for the diagnosis of Rheumatoid Arthritis

Nishimura K, Sugiyama D, Kogata Y, Tsuji G, Nakazawa T, Kawano S, et al. Meta-analysis: diagnostic accuracy of anti-cyclic citrullinated peptide antibody and rheumatoid factor for rheumatoid arthritis. Ann Intern Med. 2007 Jun 5;146(11):797-808.


Thus, after running roc.do we get the following output for RF:

. mvmeta b V, vars(b1 b2)


The construction of the ROC curves is also feasible for RF:


Or we can use the MIDAS command:

. midas tp fp fn tn, res(sum)

Known within studies correlation (1)

■ A recent meta-analysis by Antczak-Bouckoms et al. located 5 randomized controlled trials that compared a surgical procedure with a non-surgical procedure for the treatment of moderate periodontal disease.

■ The two outcomes assessed on each patient were (pre- to post-treatment mm changes in) probing depth (PD) and attachment level (AL) which are modeled simultaneously in our example.

■ The goal of treatment is to decrease probing depths and to increase attachment levels around the teeth.

Example: Periodontitis (PD-AL)

Antczak-Bouckoms, A., Joshipura, K., Burdick, E. and Tulloch, J. F. C. ‘Meta-analysis of surgical versus non-surgical method of treatment for periodontal disease’, Journal of Clinical Periodontology, 20, 259-268 (1993).


The data look like this:

Where:

PD: Improvement in Probing depth (surgical minus non surgical values)

AL: Improvement in Attachment level (surgical minus non surgical values)

V11, V22, V12: The within-trial covariance matrix of the two outcomes (means) in trial i


Thus, after running periodontitis.do we get the following output:

Unknown within studies correlation (1)

■ A systematic review in neuroblastoma sought to establish the prognostic importance of MYCN, a protooncogene.

■ In 17 studies, a log-hazard ratio estimate for “amplified” versus “non-amplified” MYCN was available for both disease-free survival (Yi1) and overall survival (Yi2).

■ However, no studies reported the within-study correlations, which are likely to be strongly positive due to the structural relationship between these endpoints.

■ Further, there were 64 studies which provided data for only one of the 2 endpoints.

Example: MYCN data

RILEY, R. D., HENEY, D., JONES, D. R., SUTTON, A. J., LAMBERT, P. C., ABRAMS, K. R., YOUNG, B., WAILOO, A. J. AND BURCHILL, S. A. (2004). A systematic review of molecular and biological tumor markers in neuroblastoma. Clinical Cancer Research 10, 4–12.

Unknown within studies correlation (2)

The data look like this:

Where:

b1: log-hazard ratio for “amplified” versus “non-amplified” MYCN for disease-free survival

b2: log-hazard ratio for “amplified” versus “non-amplified” MYCN for overall survival

V11: variance of b1

V22: variance of b2

There is no V12!!!
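Without V12, the alternative model works with an overall covariance matrix per study that needs only V11 and V22. The sketch below builds that matrix for one study. The parameter values (psi1_sq, psi2_sq, rho) are hypothetical placeholders for quantities that would be estimated by maximum likelihood, and the function name is an assumption.

```python
import math


def overall_covariance(v11, v22, psi1_sq, psi2_sq, rho):
    """Marginal covariance of (b1, b2) with a single overall correlation rho."""
    t1 = v11 + psi1_sq   # total variance of b1
    t2 = v22 + psi2_sq   # total variance of b2
    c = rho * math.sqrt(t1 * t2)
    return [[t1, c], [c, t2]]


# Hypothetical study-level variances and parameter values
phi = overall_covariance(v11=0.10, v22=0.15, psi1_sq=0.06, psi2_sq=0.10, rho=0.8)
```

For a study that reports only one endpoint, only the corresponding diagonal element enters the likelihood.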

Unknown within studies correlation (3)

After running mycn.do we get the following output:

Multiple treatments (1)

■ 26 clinical trials which investigate the prevention of first bleeding in cirrhosis using beta-blockers and sclerotherapy

■ Nine randomized clinical trials of beta-blockers and 19 trials of sclerotherapy were reviewed.

■ Crude rates of bleeding and death in treated and control groups were recorded.

Example: Cirrhosis and beta-blockers or sclerotherapy

Pagliaro L, D'Amico G, Sörensen TI, Lebrec D, Burroughs AK, Morabito A, Tiné F, Politi F, Traina M. Prevention of first bleeding in cirrhosis. A meta-analysis of randomized trials of nonsurgical treatment. Ann Intern Med.1992 Jul 1;117(1):59-70.


The data look like this: Where:

r1: number of bleedings in patients treated with beta-blockers

n1: total number of patients treated with beta-blockers

r2: number of bleedings in patients treated with sclerotherapy

n2: total number of patients treated with sclerotherapy

r0: number of bleedings in controls

n0: total number of controls


After running bleeding.do we get the following output:

. mvmeta b V, vars(b1 b2) ml


. mvmeta b V,vars( b1 b2) ml bscov(prop C)

. gllamm logit c1 c2 c3, nocons i(id) nrf(1) eqs(c) s(wgt) constraint(1) nip(8) adapt

. gllamm r c1 c2 c3, nocons fam(binom) i(id) link(logit) eqs(c) nrf(1) adapt denom(n)

Genetic association studies (1)

Table 2. A typical layout of the data used in a meta-analysis of genetic association studies involving a single bi-allelic locus with a continuous outcome.

The marginal model on which we base the inference is:


Example: Association of AGT M235T polymorphisms with Essential Hypertension

■ A total of 7 studies addressed the association of AGT M235T with Hypertension

■ The two logORs derived from the mutant allele (TT vs. MM and MT vs. MM) were modeled simultaneously as a bivariate response.

Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. 2008, Statistical Applications in Genetics and Molecular Biology, 7(1), Article 31


The data look like this:

Where:

aa0: MM genotype for controls, ab0: MT genotype for controls, bb0: TT genotype for controls

aa1: MM genotype for cases, ab1: MT genotype for cases, bb1: TT genotype for cases
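The bivariate response and its within-study covariance matrix can be computed from the six genotype counts. The sketch below assumes the usual large-sample variances and the shared-baseline covariance (both logORs use the MM genotype as reference). The counts are hypothetical, and genetics.do may organise the calculation differently.

```python
import math


def genotype_log_ors(aa0, ab0, bb0, aa1, ab1, bb1):
    """Two logORs against the MM genotype and their covariance matrix."""
    b1 = math.log((bb1 * aa0) / (bb0 * aa1))   # TT vs. MM
    b2 = math.log((ab1 * aa0) / (ab0 * aa1))   # MT vs. MM
    v11 = 1 / bb1 + 1 / bb0 + 1 / aa1 + 1 / aa0
    v22 = 1 / ab1 + 1 / ab0 + 1 / aa1 + 1 / aa0
    # Only the shared MM cells contribute to the covariance
    v12 = 1 / aa1 + 1 / aa0
    return b1, b2, v11, v22, v12


# Hypothetical genotype counts for one study
b1, b2, v11, v22, v12 = genotype_log_ors(
    aa0=100, ab0=80, bb0=20, aa1=80, ab1=90, bb1=30
)
```

Unlike most multiple-outcome settings, here the within-study covariance is available from the published counts, so no individual data are needed.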


Thus, after running genetics.do we get the following output:

. mvmeta b V, vars(b1 b2) ml


It is a special case of the general model. It re-parameterizes y1 and y2 using λ = y1/y2 and imposes a single between-studies variance τ². Thus, λ is treated as a fixed-effects parameter.

Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005) The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6): 1319-1328


Under the genetic model-free approach proposed by Minelli et al. we get the following output (model_free.do):


Mendelian randomisation (1)

■ Suppose that we have genotype, G, a continuous intermediate phenotype, IP, and a binary disease outcome D.

■ The gene is selected because of its influence on the phenotype and we wish to establish the relationship between phenotype and disease.

■ The hypothesised causal pathway is:

G → IP → D

Minelli C, Thompson JR, Tobin MD, Abrams KR (2004) An integrated approach to the meta-analysis of genetic association studies using Mendelian randomization. Am J Epidemiol 160(5): 445-452

Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD (2005) Meta-analysis of genetic studies using Mendelian randomization--a multivariate approach. Stat Med 24(14): 2241-2254


■ Let the Log Odds Ratio of Disease given genotype be y1i and the mean difference in phenotype be y2i.

■ The ith study produces two potential estimates y1i and y2i although in practice only one or other may be obtained or reported.

■ We assume that the within-study correlation of y1i and y2i is negligible.

■ Thus we wish to estimate the means of y1i and y2i and the parameters of between-studies heterogeneity.
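Under the causal pathway above, a common summary of the phenotype-disease relationship is the ratio of the gene-disease log odds ratio to the gene-phenotype mean difference. The snippet below is a sketch of that ratio with a first-order delta-method variance, assuming zero covariance between the two estimates (in line with the negligible within-study correlation). It is not the multivariate model of the following slides, and the function name and numbers are hypothetical.

```python
def ratio_estimate(y1, v1, y2, v2):
    """Ratio y1 / y2 and its delta-method variance (zero covariance assumed)."""
    ratio = y1 / y2
    var = v1 / y2 ** 2 + (y1 ** 2) * v2 / y2 ** 4
    return ratio, var


# Hypothetical pooled values: logOR of disease per genotype (y1)
# and mean difference in phenotype per genotype (y2)
ratio, var = ratio_estimate(y1=0.15, v1=0.0025, y2=2.5, v2=0.04)
```

Here ratio is the log odds ratio of disease per unit increase in the intermediate phenotype. The approximation is poor when y2 is close to zero relative to its standard error.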


Model A

It is a standard multivariate model with ρW=0. It can be fitted using standard software

Model B

It is similar to model A except that λ is treated as a random-effects parameter (with variance τ²λ), whereas the within-studies correlation is zero (ρW=0). It needs specialised software.


■ Using the paper by Wald et al. a total of 64 genetic studies were identified (mthfr.dta).

■ 31 evaluated only genotype-disease association, 16 only genotype-phenotype association, and 17 both

■ Among the 17 studies evaluating both associations, 7 measured the mean difference in phenotype level with genotype in both cases and controls (2 reporting only combined means), but 4 studies measured homocysteine only in cases and 4 only in controls, while two reports were unclear.

Example: MTHFR, Homocysteine and CHD

Wald DS, Law M, Morris JK. Homocysteine and cardiovascular disease: evidence on causality from a meta-analysis. BMJ 2002;325:1202.


The data look like this:

Where:

z: logOR TT vs. CC genotypes

varz: variance of z

y: mean differences in serum homocysteine concentrations (μmol/l) for TT vs. CC genotypes

vary: variance of y

wy: 1/vary

wz: 1/varz


Thus, after running mthfr.do we get the following output:

. mvmeta b V, vars(b1 b2) ml

The results are identical with those derived after fitting Model A.


If we assume model B:

We obtain the following parameters estimates:

Conclusions

● Multivariate meta-analysis is an important tool in systematic reviews

● It can be applied in several settings:

○ in a single 2x2 table with a special scope (baseline risk, diagnostic studies)

○ in cases of several outcomes or risk factors (multiple outcomes)

○ in the case of a single outcome or risk factor which is, however, a vector (genetic association)

● The within-studies correlation is very important

● The multivariate analysis usually increases the power by borrowing strength from external studies

● The multivariate analysis takes into account the correlation of the estimates and thus, it allows the comparison of the estimates after fitting the model

● Recent developments in statistical theory and software allow such models to be fitted easily

Thank you

Questions?
