(20181118, ada) distributed analysis in multi-center studies...methods that allow robust and...




Distributed analysis in multi-center studies

Sharing of individual-level data across health plans or healthcare delivery systems continues to be challenging due to concerns about loss of patient privacy, unauthorized uses of transferred data, inaccurate analysis or interpretation of data, or contractual or legal restrictions. Although these challenges can be addressed in part by proper governance and appropriate updates to existing regulations, newer privacy-protecting analytic and data-sharing methods offer another potential solution. This presentation will describe the use of privacy-protecting analytic methods that allow robust and flexible statistical analysis using aggregate-level information, without centralized pooling of individual-level datasets across data sources. We will present several comparative safety and effectiveness studies of medical treatments that employ these methods to generate actionable real-world evidence.

1. Toh S, Gagne JJ, Rassen JA, Fireman BH, Kulldorff M, Brown JS. Confounding adjustment in comparative effectiveness research conducted within distributed research networks. Med Care 2013;51(8 Suppl 3):S4-S10

2. Toh S, Hampp C, Reichman ME, Graham DJ, Balakrishnan S, Pucino F, Hamilton J, Lendle S, Iyer A, Rucker M, Pimentel M, Nathwani N, Griffin MR, Brown NJ, Fireman BH. Risk of hospitalized heart failure among new users of saxagliptin, sitagliptin, and other antihyperglycemic drugs: A retrospective cohort study. Ann Intern Med 2016;164(11):705-714 (PMC5178978)

3. Toh S, Reichman ME, Graham DJ, Hampp C, Zhang R, Butler MG, Iyer A, Rucker M, Pimentel M, Hamilton J, Lendle S, Fireman BH; for the Mini-Sentinel AMI-Saxagliptin Surveillance Writing Group. Prospective post-marketing surveillance of acute myocardial infarction in new users of saxagliptin: A population-based study. Diabetes Care 2018;41(1):39-48


Distributed analysis in multi‐center studies

Darren Toh, ScD
Department of Population Medicine
Harvard Medical School & Harvard Pilgrim Health Care Institute
Boston, MA

November 18, 2018


Disclosures

Research support
• Patient‐Centered Outcomes Research Institute (ME‐1403‐11305)
• Office of the Assistant Secretary for Planning and Evaluation & Food and Drug Administration (HHSF223200910006I)
• National Institutes of Health (U01EB023683)
• Agency for Healthcare Research and Quality (R01HS026214)

Board of Directors, International Society for Pharmacoepidemiology

My spouse is an employee of Biogen


Overview

Evolution of multi‐center studies

Analytic methods in multi‐center studies

Select examples

Discussion


Multi‐center studies

Many studies are now done in multi‐center settings


Why do multi‐center studies?

Larger sample sizes
• Allow studies of rare treatments or rare outcomes
• Allow studies in specific subpopulations
• Allow studies to be done more quickly

More diverse populations
• Allow more generalizable findings
• Allow assessment of treatment effect heterogeneity


Multi‐center studies v1.0

Analysis center


Pooling study‐specific individual‐level datasets


Typical datasets shared in multi‐center studies v1.0

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    1         0        312   33   1    0   1    1    …
002    0         0        40    45   1    0   1    0    …
003    0         0        365   76   0    0   0    0    …
004    0         0        200   56   0    1   0    0    …
005    0         1        2     21   0    0   1    0    …
006    1         1        15    80   1    0   0    1    …
007    1         0        4     65   1    1   0    1    …
008    1         0        145   77   0    1   0    0    …
009    0         1        33    48   1    0   0    0    …
010    0         0        98    52   1    0   0    0    …
011    0         0        34    32   0    0   0    0    …
…      …         …        …     …    …    …   …    …    …


Each row represents an individual


Each column represents a covariate


Typical datasets shared in multi‐center studies v1.0

Data Partner 1

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    1         0        312   33   1    0   1    1    …
002    1         0        40    45   1    0   1    0    …
003    1         0        365   76   0    0   0    0    …
004    1         0        200   56   0    1   0    0    …
005    0         1        2     21   0    0   1    0    …
006    0         1        15    80   1    0   0    1    …
007    0         0        4     65   1    1   0    1    …
008    0         0        145   77   0    1   0    0    …
…      …         …        …     …    …    …   …    …    …

Data Partner 2

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    0         1        35    44   0    1   3    0    …
002    0         1        213   54   0    1   1    1    …
003    0         1        453   78   0    0   4    1    …
004    0         0        58    87   1    0   3    1    …
005    1         0        31    22   1    0   3    0    …
006    1         0        56    46   0    1   2    0    …
007    1         0        123   53   0    1   1    1    …
008    1         0        546   35   0    0   3    0    …
…      …         …        …     …    …    …   …    …    …

Pooled dataset

Site  PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
1     001    1         0        312   33   1    0   1    1    …
1     002    1         0        40    45   1    0   1    0    …
1     003    1         0        365   76   0    0   0    0    …
1     004    1         0        200   56   0    1   0    0    …
1     005    0         1        2     21   0    0   1    0    …
1     006    0         1        15    80   1    0   0    1    …
1     007    0         0        4     65   1    1   0    1    …
1     008    0         0        145   77   0    1   0    0    …
…     …      …         …        …     …    …    …   …    …    …
2     001    0         1        35    44   0    1   3    0    …
2     002    0         1        213   54   0    1   1    1    …
2     003    0         1        453   78   0    0   4    1    …
2     004    0         0        58    87   1    0   3    1    …
2     005    1         0        31    22   1    0   3    0    …
2     006    1         0        56    46   0    1   2    0    …
2     007    1         0        123   53   0    1   1    1    …
2     008    1         0        546   35   0    0   3    0    …
…     …      …         …        …     …    …    …   …    …    …


Multi‐center studies v2.0

Individual data partners: Site 1, Site 2, Site 3, Site 4

Data standardization (common data model): Site 1, Site 2, Site 3, Site 4

Data accessible to research projects
• Research projects
• Programs written against common data model

Data quality improvement feedback loop

Adapted from: http://www.hcsrn.org/asset/b9efb268‐eb86‐400e‐8c74‐2d42ac57fa4F/VDW.Infographic031511.jpg


Data standardization – Common data model


Distributed analysis in networks with common data model

Analysis Center: Secure Network Portal

Data Partner 1 and Data Partner 2 (each holding Enrollment, Demographics, Utilization, Pharmacy, etc.): Review & Run Query, then Review & Return Results

1. User creates and submits query
2. Data partners retrieve query
3. Data partners review and run query against their local data
4. Data partners review results
5. Data partners return results via secure network
6. Results are aggregated and reported
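The six-step workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataPartner` class, the `count_query` function and the choice to return aggregate counts are hypothetical and are not the network's actual software. The (exposure, outcome) pairs are taken from the Data Partner 1 and Data Partner 2 example tables in this deck.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, int]      # (exposure, outcome) for one patient
Counts = Dict[str, int]       # aggregate result returned by a site
Query = Callable[[List[Record]], Counts]


@dataclass
class DataPartner:
    """Hypothetical site that keeps its individual-level records locally."""
    name: str
    records: List[Record]

    def review_and_run(self, query: Query) -> Counts:
        # Steps 2-4: retrieve the query, run it on local data, review the result.
        result = query(self.records)
        # Assumed local check: only integer counts are allowed to leave the site.
        assert all(isinstance(value, int) for value in result.values())
        return result


def count_query(records: List[Record]) -> Counts:
    """Step 1: a query written once against the shared data layout."""
    return {
        "persons_exposed": sum(1 for e, _ in records if e == 1),
        "persons_unexposed": sum(1 for e, _ in records if e == 0),
        "events_exposed": sum(1 for e, y in records if e == 1 and y == 1),
        "events_unexposed": sum(1 for e, y in records if e == 0 and y == 1),
    }


def run_distributed(query: Query, partners: List[DataPartner]) -> Counts:
    """Steps 5-6: collect the returned results and aggregate them."""
    total: Counts = {}
    for partner in partners:
        for key, value in partner.review_and_run(query).items():
            total[key] = total.get(key, 0) + value
    return total


partners = [
    DataPartner("Data Partner 1", [(1, 0)] * 4 + [(0, 1)] * 2 + [(0, 0)] * 2),
    DataPartner("Data Partner 2", [(0, 1)] * 3 + [(0, 0)] + [(1, 0)] * 4),
]
totals = run_distributed(count_query, partners)
print(totals)
```

The analysis center only ever sees the dictionaries of counts; the record lists stay inside each `DataPartner` object.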


Typical datasets shared in multi‐center studies v2.0

Data Partner 1

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    1         0        312   33   1    0   1    1    …
002    1         0        40    45   1    0   1    0    …
003    1         0        365   76   0    0   0    0    …
004    1         0        200   56   0    1   0    0    …
005    0         1        2     21   0    0   1    0    …
006    0         1        15    80   1    0   0    1    …
007    0         0        4     65   1    1   0    1    …
008    0         0        145   77   0    1   0    0    …
…      …         …        …     …    …    …   …    …    …

Data Partner 2

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    0         1        35    44   0    1   3    0    …
002    0         1        213   54   0    1   1    1    …
003    0         1        453   78   0    0   4    1    …
004    0         0        58    87   1    0   3    1    …
005    1         0        31    22   1    0   3    0    …
006    1         0        56    46   0    1   2    0    …
007    1         0        123   53   0    1   1    1    …
008    1         0        546   35   0    0   3    0    …
…      …         …        …     …    …    …   …    …    …

Pooled dataset

Site  PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
1     001    1         0        312   33   1    0   1    1    …
1     002    1         0        40    45   1    0   1    0    …
1     003    1         0        365   76   0    0   0    0    …
1     004    1         0        200   56   0    1   0    0    …
1     005    0         1        2     21   0    0   1    0    …
1     006    0         1        15    80   1    0   0    1    …
1     007    0         0        4     65   1    1   0    1    …
1     008    0         0        145   77   0    1   0    0    …
…     …      …         …        …     …    …    …   …    …    …
2     001    0         1        35    44   0    1   3    0    …
2     002    0         1        213   54   0    1   1    1    …
2     003    0         1        453   78   0    0   4    1    …
2     004    0         0        58    87   1    0   3    1    …
2     005    1         0        31    22   1    0   3    0    …
2     006    1         0        56    46   0    1   2    0    …
2     007    1         0        123   53   0    1   1    1    …
2     008    1         0        546   35   0    0   3    0    …
…     …      …         …        …     …    …    …   …    …    …


Concerns about data sharing in multi‐center studies v1 & v2

Loss of patient privacy

Unauthorized uses of data

Inaccurate analysis or interpretation of data

Disclosures of sensitive institutional or corporate information

Contractual restrictions


Data sharing – A balancing act

Granularity or identifiability of information

Analytic flexibility


Multi‐center studies v3.0

Analysis Center


Pooling study‐specific summary‐level datasets


Privacy‐protecting methods for multi‐center studies v3.0

Summary score‐based methods

Meta‐analysis of database‐specific effect estimates

Distributed regression


Summary scores

PS: Propensity scores
DRS: Disease risk scores

[Diagram: Confounders, Treatment, and Outcome, with PS and DRS as summary scores]


Individual‐level dataset with individual covariates

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    1         0        312   33   1    0   1    1    …
002    0         0        40    45   1    0   1    0    …
003    0         0        365   76   0    0   0    0    …
004    0         0        200   56   0    1   0    0    …
005    0         1        2     21   0    0   1    0    …
006    1         1        15    80   1    0   0    1    …
007    1         0        4     65   1    1   0    1    …
008    1         0        145   77   0    1   0    0    …
009    0         1        33    48   1    0   0    0    …
010    0         0        98    52   1    0   0    0    …
011    0         0        34    32   0    0   0    0    …
…      …         …        …     …    …    …   …    …    …


Individual‐level dataset with summary scores

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0        40    0.21
003    0         0        365   0.56
004    0         0        200   0.11
005    0         1        2     0.97
006    1         1        15    0.56
007    1         0        4     0.40
008    1         0        145   0.22
009    0         1        33    0.43
010    0         0        98    0.78
011    0         0        34    0.38
…      …         …        …     …


Summary score‐based method #1 – Matching

Individual‐level dataset with summary scores

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0        40    0.21
003    0         0        365   0.56
004    0         0        200   0.11
005    0         1        2     0.97
006    1         1        15    0.56
007    1         0        4     0.40
008    1         0        145   0.22
009    0         1        33    0.43
010    0         0        98    0.78
011    0         0        34    0.38
…      …         …        …     …

Summary‐level dataset

Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
500                 500                   80                 75


Data Partner 1

Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
500                 500                   87                 85

Data Partner 2

Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
400                 400                   68                 65

Pooled dataset

Site  Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
1     500                 500                   87                 85
2     400                 400                   68                 65
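Once each site has returned one row of counts from its matched cohort, the analysis center can estimate an effect from those rows alone. The sketch below simply sums the two site rows shown above and computes a crude risk ratio with a Wald confidence interval on the log scale. Both choices are assumptions made for illustration: the slides do not state which estimator was used, and summing ignores both the matched pairs and the site.

```python
import math

# One summary row per site, copied from the slide:
# (site, persons exposed, persons unexposed, events exposed, events unexposed)
site_rows = [
    (1, 500, 500, 87, 85),
    (2, 400, 400, 68, 65),
]


def risk_ratio(n1, n0, a, c, z=1.96):
    """Risk ratio and Wald CI (log scale) from four aggregate counts."""
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)


# Sum each count across sites; no patient-level record is needed.
n1 = sum(row[1] for row in site_rows)
n0 = sum(row[2] for row in site_rows)
a = sum(row[3] for row in site_rows)
c = sum(row[4] for row in site_rows)

rr, lower, upper = risk_ratio(n1, n0, a, c)
print(f"RR = {rr:.3f} (95% CI {lower:.3f}, {upper:.3f})")
```

A site-stratified estimator (for example Mantel-Haenszel with site as the stratum) would be a natural alternative when sites differ in baseline risk.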


Summary score‐based method #2 – Stratification

Individual‐level dataset with summary scores

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0        40    0.21
003    0         0        365   0.56
004    0         0        200   0.11
005    0         1        2     0.97
006    1         1        15    0.56
007    1         0        4     0.40
008    1         0        145   0.22
009    0         1        33    0.43
010    0         0        98    0.78
011    0         0        34    0.38
…      …         …        …     …

Summary‐level dataset

PS or DRS stratum  Persons in exposed  Persons in unexposed  Events in exposed  Events in unexposed
1                  200                 150                   30                 35
2                  150                 100                   20                 40
3                  200                 180                   21                 21
4                  150                 200                   26                 18
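A stratum-level table like the one above is enough to compute a stratified effect estimate. As a minimal sketch, assuming a Mantel-Haenszel risk ratio is the estimator of interest (the slides do not name one), the four stratum rows can be combined as follows.

```python
# (stratum, persons exposed, persons unexposed, events exposed, events unexposed)
# Rows copied from the stratification slide.
strata = [
    (1, 200, 150, 30, 35),
    (2, 150, 100, 20, 40),
    (3, 200, 180, 21, 21),
    (4, 150, 200, 26, 18),
]


def mantel_haenszel_rr(rows):
    """Mantel-Haenszel summary risk ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for _, n1, n0, a, c in rows:
        total = n1 + n0
        numerator += a * n0 / total      # exposed events weighted by unexposed size
        denominator += c * n1 / total    # unexposed events weighted by exposed size
    return numerator / denominator


rr_mh = mantel_haenszel_rr(strata)
print(f"Mantel-Haenszel RR = {rr_mh:.3f}")
```

With several sites, each site would return its own stratum table and the same function could be applied to the concatenated rows, treating every site-by-stratum cell as a stratum.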


Summary score‐based method #3 – Risk set analysis

Individual‐level dataset with summary scores

PatID  Exposure  Outcome  Time  PS
001    1         0        312   0.33
002    0         0        40    0.21
003    0         0        365   0.56
004    0         0        200   0.11
005    0         1        2     0.97
006    1         1        15    0.56
007    1         0        4     0.40
008    1         0        145   0.22
009    0         1        33    0.43
010    0         0        98    0.78
011    0         0        34    0.38
…      …         …        …     …

Summary‐level dataset

Event  Event time  Event exposed  Risk set exposed  Risk set unexposed
1      8           0              300               299
2      12          1              296               295
3      20          1              290               288
4      21          0              286               283
…      …           …              …                 …
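With a binary exposure and no tied event times, each risk-set row contributes the term β·e − log(n₁·exp(β) + n₀) to the Cox log partial likelihood, where e indicates whether the event occurred in an exposed person and n₁, n₀ are the exposed and unexposed counts at risk. The sketch below maximizes that likelihood by Newton-Raphson using only the four rows visible on the slide (the table is truncated, so the result is purely illustrative). It assumes the risk-set counts include the person who had the event and omits the stratification by PS or site that a real analysis would likely add.

```python
import math

# (event time, event exposed, risk set exposed, risk set unexposed)
# The four rows visible on the slide; further rows are elided there.
risk_sets = [
    (8, 0, 300, 299),
    (12, 1, 296, 295),
    (20, 1, 290, 288),
    (21, 0, 286, 283),
]


def score_and_information(beta, rows):
    """First derivative and negative second derivative of the log partial likelihood."""
    score = 0.0
    information = 0.0
    for _, event_exposed, n1, n0 in rows:
        weight = n1 * math.exp(beta)
        p = weight / (weight + n0)       # chance the event falls on an exposed person
        score += event_exposed - p
        information += p * (1 - p)
    return score, information


def fit_log_hazard_ratio(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson estimate of the log hazard ratio and its standard error."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, rows)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    _, information = score_and_information(beta, rows)
    return beta, 1 / math.sqrt(information)


beta_hat, se_beta = fit_log_hazard_ratio(risk_sets)
hazard_ratio = math.exp(beta_hat)
print(f"HR = {hazard_ratio:.3f}, SE(log HR) = {se_beta:.3f}")
```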


Meta‐analysis of database‐specific effect estimates

Individual‐level dataset with individual covariates

PatID  Exposure  Outcome  Time  Age  Sex  DM  HTN  CVD  …
001    1         0        312   33   1    0   1    1    …
002    0         0        40    45   1    0   1    0    …
003    0         0        365   76   0    0   0    0    …
004    0         0        200   56   0    1   0    0    …
005    0         1        2     21   0    0   1    0    …
006    1         1        15    80   1    0   0    1    …
007    1         0        4     65   1    1   0    1    …
008    1         0        145   77   0    1   0    0    …
009    0         1        33    48   1    0   0    0    …
010    0         0        98    52   1    0   0    0    …
011    0         0        34    32   0    0   0    0    …
…      …         …        …     …    …    …   …    …    …

Summary‐level dataset

Hazard ratio  Lower 95% CI  Upper 95% CI
2.97          1.95          4.52


Distributed regression

Analyst inputs individual‐level dataset into statistical software

ID    E  X1     X2    Y
A001  0  13.89  3.42  28.70
A002  1  18.10  1.29  27.90
A003  0  6.41   4.86  33.10
A004  1  16.30  1.45  17.20
A005  1  17.57  2.51  21.70
…     …  …      …     …
A100  0  5.78   2.53  23.76

Statistical software produces intermediate statistics as part of computing process

Type  Name       Intercept  E       X1       X2      Y
SSCP  Intercept  100.0      52.0    1157.1   405.9   2235.5
SSCP  E          52.0       52.0    813.2    138.1   1060.9
SSCP  X1         1157.1     813.2   17751.3  3458.7  23815.8
SSCP  X2         405.9      138.1   3458.7   2240.8  9572.3
SSCP  Y          2235.5     1060.9  23815.8  9572.3  56911.9
MEAN             1.0        0.5     11.6     4.1     22.4
STD              0.0        0.5     6.6      2.5     8.4
N                100        100     100      100     100

Statistical software produces final results

Variable   Parameter estimate  Standard error
Intercept  25.4540             3.7959
E          -0.4323             1.7865
X1         -0.5643             0.1432
X2         -0.6564             0.4532


“Regular” regression shares the individual‐level dataset


Distributed regression shares the intermediate statistics
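For linear regression the idea can be shown in a few lines: each site computes its own sums of squares and cross products (SSCP) matrix, the analysis center adds the matrices, and the normal equations are solved from the sum. The sketch below is a toy illustration, not the software used in the studies. It takes the six dataset rows visible on the slide and splits them across two hypothetical sites, so its estimates are not expected to match the slide's 100-row example. Logistic and Cox models need iterative exchanges of intermediate statistics, which this sketch does not cover.

```python
def sscp(rows):
    """SSCP matrix of [1, E, X1, X2, Y] computed locally at one site."""
    size = len(rows[0]) + 1
    matrix = [[0.0] * size for _ in range(size)]
    for row in rows:
        values = [1.0] + list(row)           # prepend the intercept column
        for i in range(size):
            for j in range(size):
                matrix[i][j] += values[i] * values[j]
    return matrix


def add_matrices(a, b):
    """Element-wise sum; this is all the analysis center does with site output."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def solve_normal_equations(matrix):
    """Solve (X'X) b = X'y by Gauss-Jordan elimination; Y is the last row/column."""
    p = len(matrix) - 1
    aug = [row[:] for row in matrix[:p]]     # [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# (E, X1, X2, Y) rows visible on the slide; the two-site split is hypothetical.
site_a = [(0, 13.89, 3.42, 28.70), (1, 18.10, 1.29, 27.90), (0, 6.41, 4.86, 33.10)]
site_b = [(1, 16.30, 1.45, 17.20), (1, 17.57, 2.51, 21.70), (0, 5.78, 2.53, 23.76)]

distributed = solve_normal_equations(add_matrices(sscp(site_a), sscp(site_b)))
pooled = solve_normal_equations(sscp(site_a + site_b))
print(distributed)
print(pooled)
```

The two printed coefficient vectors should agree up to floating-point rounding, because the SSCP matrix of the pooled data is exactly the sum of the site-level SSCP matrices.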


Example 1

http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/laparoscopic_adjustable_gastric_banding_135,63/

http://www.hopkinsmedicine.org/healthlibrary/test_procedures/gastroenterology/roux‐en‐y_gastric_bypass_weight‐loss_surgery_135,65/


Study design

• ≥21 years at time of bariatric surgery
• ≥1 BMI of 35 kg/m2 or greater
• Continuous enrollment w/ benefits
• No prior bariatric surgery
• No prior diagnosis of study outcome

Study period: 1/1/2005 to 12/31/2010
Baseline period: 365 days before the index bariatric hospitalization
Start of follow up: discharge date
Contributing person‐time ends at:
• Re‐hospitalization
• Death
• Health plan disenrollment
• 12/31/2010
• 730 days of follow‐up

Toh et al, Med Care, 2014;52:664‐668


Confounders

Age                           Asthma*
Sex                           Deep vein thrombosis*
Race/ethnicity                Pulmonary embolism*
Diabetes*                     Congestive heart failure*
Baseline BMI*                 Hyperlipidemia*
Year of procedure             Coronary artery disease*
Charlson comorbidity score*   Oxygen use*
Atrial fibrillation*          Assistive walking device*
GERD*                         Smoking status*
Hypertension*                 Blood pressure*
Sleep Apnea*                  Length of stay assoc. with procedure

*Identified during the 365‐day baseline period prior to the index bariatric hospitalization

Toh et al, Med Care, 2014;52:664‐668 


Statistical analysis

Propensity score stratification

Analysis
• Pooled patient‐level data analysis (benchmark)
• Risk set‐based analysis
• PS‐stratified analysis (by quintile)
• Meta‐analysis of site‐specific effect estimates

Toh et al, Med Care, 2014;52:664‐668


Select baseline patient characteristics

Characteristics              Adjustable gastric band   Roux‐en‐y gastric bypass
                             (n=1,550)                 (n=5,792)
                             N        %*               N        %*
Mean age (SD)                46.7     11.2             45.7     10.7
Age > 65 years               76       4.9              141      2.4
Female sex                   1,266    81.7             4,823    83.3
Race/ethnicity
  Black or African American  137      8.8              522      9.0
  White                      1,130    72.9             3,840    66.3
  Hispanic                   142      9.2              769      13.3
  Other                      62       4.0              280      4.8
  Unknown                    79       5.1              381      6.6
Baseline BMI
  30‐34.9                    96       6.2              174      3.0
  35‐39.9                    480      31.0             1,410    24.3
  40‐49.9                    813      52.4             3,126    54.0
  ≥50                        161      10.4             1,082    18.7

Toh et al, Med Care, 2014;52:664‐668


Individual‐level data analysis, by site

Site    Adjusted HR  95% CI
Site 1  0.68         0.45, 1.02
Site 2  0.65         0.37, 1.15
Site 3  0.52         0.26, 1.04
Site 4  0.72         0.35, 1.50
Site 5  0.82         0.46, 1.48
Site 6  0.32         0.13, 0.75
Site 7  0.79         0.62, 1.01

Toh et al, Med Care, 2014;52:664‐668


Results, by method

Method             Adjusted HR  95% CI
Individual‐level   0.71         0.59, 0.84
Risk set           0.71         0.59, 0.84
PS stratification  0.70         0.59, 0.83
Meta‐analysis      0.71         0.60, 0.84
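The meta-analysis row can be approximated from the seven site-specific estimates reported earlier in this example. The sketch below assumes an inverse-variance fixed-effect model and recovers each site's standard error from the width of its 95% CI on the log scale. Both are assumptions: the slides do not say which meta-analytic model was used, and the published estimates are rounded to two decimals.

```python
import math

# (site, adjusted HR, lower 95% CI, upper 95% CI) from the by-site results
site_estimates = [
    (1, 0.68, 0.45, 1.02),
    (2, 0.65, 0.37, 1.15),
    (3, 0.52, 0.26, 1.04),
    (4, 0.72, 0.35, 1.50),
    (5, 0.82, 0.46, 1.48),
    (6, 0.32, 0.13, 0.75),
    (7, 0.79, 0.62, 1.01),
]


def fixed_effect_pool(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of hazard ratios on the log scale."""
    sum_weights = 0.0
    sum_weighted_log_hr = 0.0
    for _, hr, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)   # SE from CI width
        weight = 1 / se ** 2
        sum_weights += weight
        sum_weighted_log_hr += weight * math.log(hr)
    log_hr = sum_weighted_log_hr / sum_weights
    se_pooled = math.sqrt(1 / sum_weights)
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se_pooled),
        math.exp(log_hr + z * se_pooled),
    )


pooled_hr, pooled_lower, pooled_upper = fixed_effect_pool(site_estimates)
print(f"Pooled HR = {pooled_hr:.2f} (95% CI {pooled_lower:.2f}, {pooled_upper:.2f})")
```

Under these assumptions the pooled estimate should land close to the meta-analysis row above, which is the point of the comparison: only three numbers per site need to leave each data partner.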

Toh et al, Med Care, 2014;52:664‐668


Example 2 – Distributed regression

Distributed Regression vs. Pooled Patient‐Level Regression – LINEAR

             Distributed Regression    Pooled Patient‐Level      Differences in
Covariates   Parameter    Standard     Parameter    Standard     Parameter    Standard
             Estimates    Errors       Estimates    Errors       Estimates    Errors
Intercept    35.50548     1.57690      35.50548     1.57690      -8.38E-13    2.26E-14
Variable 1   -0.27283     0.04401      -0.27283     0.04401      4.44E-16     9.92E-16
Variable 2   -1.01582     0.23259      -1.01582     0.23259      1.09E-13     3.22E-15
Variable 3   -0.73017     0.07229      -0.73017     0.07229      3.54E-14     1.32E-15

Distributed Regression vs. Pooled Patient‐Level Regression – LOGISTIC

             Distributed Regression    Pooled Patient‐Level      Differences in
Covariates   Parameter    Standard     Parameter    Standard     Parameter    Standard
             Estimates    Errors       Estimates    Errors       Estimates    Errors
Intercept    2.49660      0.49057      2.49660      0.49060      1.33E-15     9.99E-16
Variable 1   -0.14465     0.03686      -0.14460     0.03690      2.04E-13     -2.97E-14
Variable 2   -0.14105     0.06976      -0.14100     0.06980      1.38E-14     -2.22E-16
Variable 3   -0.13889     0.02376      -0.13890     0.02380      -2.42E-14    -2.19E-16

Distributed Regression vs. Pooled Patient‐Level Regression – COX

             Distributed Regression    Pooled Patient‐Level      Differences in
Covariates   Parameter    Standard     Parameter    Standard     Parameter    Standard
             Estimates    Errors       Estimates    Errors       Estimates    Errors
Variable 1   -0.06692     0.02084      -0.06692     0.02084      -1.39E-16    2.78E-17
Variable 2   -0.34644     0.19024      -0.34644     0.19024      2.22E-16     -2.78E-17
Variable 3   0.09653      0.02724      0.09653      0.02724      -1.80E-16    1.73E-17


Example 3 – PCORnet Bariatric Study

Use of bariatric surgery has expanded considerably

Evidence on the comparative effectiveness and safety of these procedures is limited


Study design

Comparisons
  Main analysis:
  • RYGB vs. SG
  • RYGB vs. AGB
  • AGB vs. SG
  Aggregate analysis:
  • RYGB vs. SG

Outcomes
  Main analysis:
  • Weight change 1, 3, and 5 yrs post‐surgery
  • Diabetes remission and relapse
  • Major adverse events
  Aggregate analysis:
  • Weight change 1 yr post surgery

Analysis
  Main analysis:
  • One model that combines all data
  • Additional data‐driven approaches to select covariates
  Aggregate analysis:
  • Site‐specific PS model
  • Fixed set of covariates


Combining propensity scores with distributed regression


Pooled individual‐level data analysis

Variable        Parameter estimate   Standard error
RYGB vs. SG     -0.05470             0.00113
PS stratum 1    Reference            Reference
PS stratum 2    -0.00754             0.00209
PS stratum 3    -0.00671             0.00210
PS stratum 4    -0.00717             0.00211
PS stratum 5     0.00034218          0.00212
PS stratum 6    -0.00583             0.00213
PS stratum 7    -0.00135             0.00214
PS stratum 8    -0.00435             0.00216
PS stratum 9    -0.00523             0.00218
PS stratum 10   -0.00812             0.00222


Combining propensity scores with distributed regression


Pooled = pooled individual‐level data analysis; Distributed = distributed regression

                Parameter estimate           Standard error
Variable        Pooled        Distributed    Pooled      Distributed
RYGB vs. SG     -0.05470      -0.05470       0.00113     0.00113
PS stratum 1    Reference     Reference      Reference   Reference
PS stratum 2    -0.00754      -0.00754       0.00209     0.00209
PS stratum 3    -0.00671      -0.00671       0.00210     0.00210
PS stratum 4    -0.00717      -0.00717       0.00211     0.00211
PS stratum 5     0.00034218    0.00034218    0.00212     0.00212
PS stratum 6    -0.00583      -0.00583       0.00213     0.00213
PS stratum 7    -0.00135      -0.00135       0.00214     0.00214
PS stratum 8    -0.00435      -0.00435       0.00216     0.00216
PS stratum 9    -0.00523      -0.00523       0.00218     0.00218
PS stratum 10   -0.00812      -0.00812       0.00222     0.00222
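
The identical pooled and distributed columns can be reproduced in miniature. The sketch below assumes the outcome model is ordinary least squares with a treatment indicator and PS-stratum indicators (the slide does not name the model), in which case each site only needs to share the matrices X'X and X'y, the scalar y'y, and its sample size. All names (design_matrix, site_summaries, combine) and the simulated data are hypothetical, and the stratum labels are drawn at random rather than from a fitted PS model.

```python
import numpy as np

def design_matrix(treated, stratum, n_strata=10):
    """Intercept, treatment indicator, and dummies for strata 2..n_strata
    (stratum 1 is the reference)."""
    dummies = (stratum[:, None] == np.arange(2, n_strata + 1)).astype(float)
    return np.column_stack([np.ones(len(treated)), treated, dummies])

def site_summaries(X, y):
    """Run inside one site: only these aggregates are shared."""
    return X.T @ X, X.T @ y, y @ y, len(y)

def combine(summaries):
    """Run at the coordinating center: sum the aggregates, then solve."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    yty = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)
    sigma2 = (yty - beta @ Xty) / (n - len(beta))   # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se

# Hypothetical data for three sites
rng = np.random.default_rng(1)
sites = []
for n in (800, 1200, 1000):
    treated = rng.integers(0, 2, size=n).astype(float)
    stratum = rng.integers(1, 11, size=n)
    y = -0.25 - 0.05 * treated + rng.normal(scale=0.1, size=n)
    sites.append((design_matrix(treated, stratum), y))

beta_dist, se_dist = combine([site_summaries(X, y) for X, y in sites])

# Reference analysis on the stacked individual-level data
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pool, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
resid = y_all - X_all @ beta_pool
sigma2_pool = resid @ resid / (len(y_all) - X_all.shape[1])
se_pool = np.sqrt(sigma2_pool * np.diag(np.linalg.inv(X_all.T @ X_all)))

print(np.max(np.abs(beta_dist - beta_pool)), np.max(np.abs(se_dist - se_pool)))
```

For linear regression the summed aggregates are sufficient statistics, so the two analyses agree up to floating-point rounding and no iteration between sites and the coordinating center is needed.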


www.sentinelinitiative.org/sites/default/files/Drugs/Assessments/Mini-Sentinel_AMI-and-Anti-Diabetic-Agents_Protocol_0.pdf

Example 4: Prospective surveillance of saxagliptin


http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm071627.pdf


SAVOR‐TIMI 53 Trial


www.sentinelinitiative.org/sites/default/files/Drugs/Assessments/Mini-Sentinel_AMI-and-Anti-Diabetic-Agents_Protocol_0.pdf

Prospective surveillance of saxagliptin


Saxagliptin vs. sitagliptin


Saxagliptin vs. pioglitazone


Saxagliptin vs. sulfonylureas


Saxagliptin vs. long‐acting insulin


Comparisons with SAVOR‐TIMI 53 trial

Characteristics          SAVOR‐TIMI 53 Trial         Mini‐Sentinel surveillance*
Comparator               Placebo                     Select anti‐hyperglycemics
No. saxagliptin users    8,280                       82,264
No. comparator users     8,212                       146,045 to 452,969
No. AMI in saxagliptin   265                         94 to 171
No. AMI in comparator    278                         75 to 1,085
Length of follow‐up      2.1 years (median)          4 to 8 months (mean)
Statistical analysis     Intention‐to‐treat          As‐treated
Hazard ratio for AMI     0.95 (95% CI: 0.80, 1.12)   0.54 to 1.17

* From end‐of‐surveillance analysis that included all patients

Interim results from the first 5 sequential analyses were made available to FDA prior to the publication of SAVOR‐TIMI 53 findings


Overview

Evolution of multi‐center studies

Analytic methods in multi‐center studies

Select examples

Discussion


Analytical flexibility vs. granularity of information


[Figure: data‐sharing approaches positioned along two axes, analytic flexibility and privacy protection. Approaches shown: individual‐level data with individual covariates; effect‐estimate data; individual‐level data with summary scores; summary‐table data; risk‐set data; intermediate statistics.]


Analytic methods in multi‐center studies

Covariate summarization technique (What to share?)
• Individual covariates*
• Propensity scores
• Disease risk scores
• Summary scores + individual covariates
• A hybrid of above

Data sharing approach (How to share?)
• Individual‐level data
• Summary‐table data
• Risk‐set data
• Effect‐estimate data
• Intermediate statistics

Covariate adjustment technique (What can we do?)
• Matching
• Stratification
• Restriction
• Weighting
• Modeling

Outcome type (What outcome?)
• Continuous
• Binary
• Count
• Survival
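
To make one of the data sharing approaches above concrete: with effect‐estimate data, each site shares only its own adjusted estimate and standard error, and the coordinating center combines them. A minimal sketch follows, assuming inverse-variance (fixed-effect) weighting on the log scale; the slides do not say which pooling method was used, and the function name and the site-level numbers are hypothetical.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of site-specific estimates
    (e.g. log hazard ratios) and the standard error of that average."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log hazard ratios and standard errors from three sites
log_hr = [-0.10, 0.05, -0.02]
se = [0.12, 0.20, 0.08]
est, est_se = pool_fixed_effect(log_hr, se)
lo, hi = est - 1.96 * est_se, est + 1.96 * est_se
print(math.exp(est), math.exp(lo), math.exp(hi))   # pooled HR with 95% CI
```

This approach shares the least information per site, which is why it limits what the coordinating center can do afterwards: covariate sets, subgroups, and model form are fixed by the time the estimates are transmitted.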


Conclusion

A suite of analytic methods is available for multi‐center studies

There are often trade‐offs between analytic flexibility and identifiability of information shared

Some newer methods offer excellent analytic flexibility and good privacy protection 


[email protected]

@darrentoh_epi

https://www.distributedanalysis.org
