ONLINE DATA SUPPLEMENT

DO “COPD SUBTYPES” REALLY EXIST?

Assessment of COPD Heterogeneity and Clustering Reproducibility in 17,154 Individuals Across Ten Independent Cohorts

Peter J Castaldi, MD, Marta Benet, MS, Hans Petersen, MS, Nicholas Rafaels, MS, James Finigan, MD, Matteo Paoletti, PhD, H. Marike Boezen, PhD, Judith M. Vonk, PhD, Russell Bowler, MD, PhD, Massimo Pistolesi, MD, Milo A. Puhan, MD, PhD, Josep Anto, MD, Els Wauters, MD, Diether Lambrechts, PhD, Wim Janssens, MD, Francesca Bigazzi, MD, Gianna Camiciottoli, MD, Michael H Cho, MD, Craig P Hersh, MD, Kathleen Barnes, PhD, Stephen Rennard, MD, Meher Preethi Boorgula, MS, Jennifer Dy, PhD, Nadia H Hansel, James D Crapo, MD, Yohannes Tesfaigzi, PhD, Alvar Agusti, MD, Edwin K Silverman, MD, PhD, Judith Garcia-Aymerich, PhD

COHORT DESCRIPTIONS

STATISTICAL ANALYSIS
Cluster Analysis
Assessment of reproducibility of clustering methods
Identification of Severe Airflow Limitation and Moderate Airflow Limitation Clusters
Clustering of More Comprehensive Feature Set in the COPDGene-ECLIPSE Substudy

RESULTS
Table E1. Feature Importance Scores* from Unsupervised Random Forests
Table E2. Characteristics of The Most Replicable Solution for CLIPCOPD study.
Table E3. Characteristics of The Most Replicable Solution for COPDGene study.
Table E4. Characteristics of The Most Replicable Solution for ECLIPSE study.
Table E5. Characteristics of The Most Replicable Solution for ICECOLDERIC study.
Table E6. Characteristics of The Most Replicable Solution for LEUVEN study.
Table E7. Characteristics of The Most Replicable Solution for LifeLines study.
Table E8. Characteristics of The Most Replicable Solution for Lovelace study.
Table E9. Characteristics of The Most Replicable Solution for Lung Health Study.
Table E10. Characteristics of The Most Replicable Solution for NJH study.
Table E11. Characteristics of The Most Replicable Solution for PAC-COPD study.
Table E12. Characteristics of Most Reproducible COPDGene Clusters from the Clustering Substudy Limited to Subjects in GOLD Spirometric Stage 2.
Table E13. Correlation Matrix for COPDGene Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=4053)
Table E14. Correlation Matrix for ECLIPSE Substudy Clustering Variables (GOLD 2-4 subjects, N=1611)
Figure E1. PCA Screeplots in Participating Cohorts.
Figure E2. Multi-dimensional Scaling (MDS) Visualization of Similarity Matrix for All Cohorts
Figure E3. Reproducibility of Different Clustering Methods in the COPDGene-ECLIPSE Substudy.

REFERENCES

ACKNOWLEDGEMENTS

COHORT DESCRIPTIONS

CLIP-COPD: CLIP-COPD is a single-center observational study designed to assign 412 Caucasian patients with COPD to a predominant airway or predominant parenchymal disease phenotype on the basis of CT densitometric changes. The design of the study has been reported previously[1-3]. The aim of the study was to establish a link between quantitative CT data on lung density and airway wall thickening and clinical and whole pulmonary function evaluation. In particular, by using a statistical approach allowing the classification of patients by large sets of variables, avoiding a priori expectations about disease characteristics, we wanted to ascertain whether the overall severity and the predominant type of the lung pathologic changes quantitatively assessed by CT could be predicted by clinical and pulmonary function data. Static and dynamic lung volumes and single-breath diffusing capacity of the lung for carbon monoxide (DLCO) were measured by a mass-flow sensor and a multi-gas analyzer (V6200 Autobox Body Plethysmograph; Sensor Medics, Yorba Linda, CA, USA) according to American Thoracic Society (ATS)/European Respiratory Society (ERS) guidelines, and expressed as percentages of the predicted values. Patients enrolled in the study displayed an obstructive pattern (FEV1/FVC<0.7) after the administration of 400 mcg of salbutamol. Asthma and cardiovascular disease (defined as any one of the following conditions: idiopathic arterial hypertension, ischemic heart disease, heart failure, peripheral vascular disease) were ascertained by means of an interview, also taking into account records of medical diagnoses obtained with objective diagnostic criteria and pharmacological therapy prescribed by general practitioners.

COPDGene: COPDGene is a multicenter, longitudinal study designed to investigate the genetic and epidemiologic characteristics of COPD and other smoking-related lung diseases. The design of the study has been reported previously[4]. Briefly, 10,192 smokers with a wide range of lung function were recruited into the COPDGene Study from 2007 to 2011. Non-Hispanic white (NHW) and African-American (AA) subjects between the ages of 45 and 80 with at least a ten pack-year smoking history were enrolled. Spirometry was performed using an ultrasound-based spirometer (EasyOne Spirometer, NDD Medizintechnik AG, Zurich, Switzerland) before and after administration of a short-acting β2-agonist (albuterol) in accordance with ATS recommendations[5]. For this analysis, post-bronchodilator spirometry values were used. Asthma and cardiovascular disease were ascertained from patient self-report, with cardiovascular disease defined as a composite of any of the following conditions: myocardial infarction, angina, coronary artery disease, congestive heart failure, stroke, or peripheral vascular disease.

ECLIPSE: The ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) Study is a multicenter, longitudinal study with three-year follow-up data available for 2,501 smoking subjects (2,164 subjects with COPD and 337 smoking controls). The detailed study protocol and inclusion criteria have been previously published[6]. The recruitment criteria included an age of 40 to 75 years, a smoking history of ten or more pack-years, a forced expiratory volume in 1 second (FEV1) of less than 80% of predicted value after bronchodilator use, and a ratio of FEV1 to forced vital capacity (FVC) of <0.7 after bronchodilator use. At baseline, patients underwent standard spirometry after the administration of 400 μg of inhaled albuterol. Computed tomography (CT) scanning of the chest was performed to evaluate the severity and distribution of emphysema. The patients’ self-reported respiratory symptoms, medications, smoking history, occupational exposure, and coexisting medical conditions were documented at study entry with the use of an updated version of the American Thoracic Society–Division of Lung Disease (ATS-DLD) questionnaire[7].

ICE COLD ERIC: The International Collaborative Effort on Chronic Obstructive Lung Disease: Exacerbation Risk Index Cohorts (ICE COLD ERIC) is an international multi-site prospective cohort study of primary care patients with COPD from Switzerland and the Netherlands. All included patients have provided written informed consent. The study has been approved by all local ethics committees and is registered on www.ClinicalTrials.gov (NCT00706602). Detailed information on the study design[8] and the baseline results[9] has been published elsewhere.

At study enrollment (April 2008 to August 2009), patients had to be ≥40 years of age, have Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage 2 to 4 disease (based on post-bronchodilator values after the administration of 400 μg of inhaled albuterol), and be free of exacerbation for at least 4 weeks at baseline. Follow-up assessments took place every 6 months for up to five years. The assessment of comorbidities was done by experienced and well-trained study nurses or physicians during the baseline visits of the cohort study, which took place at the primary care practices. The patients were asked which comorbidities they had using open-ended questions. The patients also brought a list of all drugs they were taking to the baseline interview. The study nurses or physicians compared the patient-reported comorbidities with the list of medications (and in Switzerland also with the patient records) and clarified with the general practitioners any uncertainties or mismatches between the patients’ reports, the drug list, or the patients’ obvious health condition. Cardiovascular disease included any symptomatic disease (e.g. coronary heart disease or heart failure) or previous events, but not risk factors such as arterial hypertension or hypercholesterolemia.

Leuven: The LEUVEN cohort is composed of 548 subjects who were prospectively recruited at the COPD outpatient clinic of University Hospital of Leuven (Belgium)[10]. Inclusion criteria were a smoking history of at least 15 pack-years, a minimum age of 50 years and the availability of a complete pulmonary function test. All pulmonary function measurements were performed with standardized equipment (Sensormedics Whole Body Plethysmograph, Viasys Healthcare, Belgium) and according to ATS and ERS guidelines. Spirometric values were post-bronchodilator measurements. Patients with suspicion or diagnosis of asthma were excluded, as well as patients with other respiratory diseases affecting pulmonary function. All COPD patients had a stable clinical condition with no exacerbation within 6 weeks before inclusion. For all study subjects, an extensive list of demographic variables (including age, gender, body mass index in kg/m2), questionnaires determining smoking history, and a CT scan of the chest within one year of enrollment were collected. Symptom level was assessed using the modified Medical Research Council (mMRC) dyspnea scale. Cardiovascular disease (CVD) was self-reported and confirmed through medical record review, with CVD defined as a composite of any one of the following conditions: ischemic heart disease, stroke or peripheral artery disease.

LifeLines: LifeLines is a three-generation, longitudinal, general population-based cohort including 167,729 subjects from the three northern provinces of The Netherlands[11,12]. During the LifeLines baseline visits (2007-2013), extensive phenotyping was performed: anthropometric measurements, ECG, lung function tests, psychiatric interview, cognitive function tests, blood and urine samples (now stored in the Biobank), and questionnaires on health, lifestyle, stress and quality of life. The first round of follow-up, approximately 5 years after the baseline visit, started in January 2014, and total follow-up duration in LifeLines will be 30 years. For the current analyses only the baseline measurements were used. We included subjects aged 40 years and over of self-reported Caucasian descent with a pre-bronchodilator FEV1/FVC ratio lower than 70%. Asthma and cardiovascular disease were obtained from the questionnaire; asthma was defined as self-reported doctor-diagnosed asthma (ever), and cardiovascular disease was defined as at least one cardiovascular event (self-reported heart attack, cardiovascular surgery, or cerebrovascular accident).

Lovelace: The Lovelace Smokers Cohort (LSC) is a cohort of 2,400 current and former smokers in Albuquerque, NM. Since women are underrepresented in most studies of airflow obstruction, this large cohort of primarily female ever-smokers (approximately 80% women) was assembled to study the susceptibility of women to the adverse effects of cigarette smoking. Details regarding this cohort have been previously published. Enrollment was restricted to current and former smokers aged 40–74 years with a minimum of ten pack-years of smoking[13,14]. An average of four pre- and post-bronchodilator spirometry tests have been performed on each subject by respiratory therapists who were periodically re-credentialed as part of a standardized laboratory proficiency testing plan[15]. Spirometric reference values were taken from the National Health and Nutrition Examination Survey (NHANES) III. A detailed questionnaire written in English was used to collect information on demographics; medical, smoking, and exposure history; socioeconomic status; and quality of life. Standardized questionnaires were also administered at each examination to record each patient’s history of asthma and cardiovascular disease. Cardiovascular disease was defined for this study as any one of the following conditions: myocardial infarction, coronary artery disease, congestive heart failure, stroke, or peripheral vascular disease.

Lung Health Study: The LHS was a multicenter (ten centers) clinical study in the US and Canada. The initial study population consisted of 5,887 men and women (63% male) who were current smokers (aged 35–60) with spirometric evidence of mild to moderate airflow limitation[16]. Of these participants, 96% self-reported as European American and 4,126 had clean genotype and phenotype data. From this subset, 3,989 had non-missing mMRC (dyspnea) values, and 3,132 had post-bronchodilator FEV1/FVC < 0.7. Thus 3,132 participants were included in this analysis. All participants were randomized with equal probability into three groups: (i) smoking intervention plus bronchodilator (ipratropium bromide); (ii) smoking intervention plus placebo; or (iii) usual care. The primary outcomes were the rate of change and the cumulative change in lung function (forced expiratory volume in one second (FEV1)) over a 5-year period. Lung function was measured annually according to ATS guidelines using identical spirometers, software, procedures and reading center personnel. The quality of the spirometry was monitored centrally throughout the testing, and comparison of baseline spirometry measures showed good reproducibility. Post-bronchodilator spirometry values were used for the current analysis. Asthma, chronic bronchitis, exacerbations, mMRC (dyspnea) and cardiovascular disease were obtained from patient self-report, with cardiovascular disease defined as a composite of any one of the following conditions: myocardial infarction, coronary artery disease, congestive heart failure or stroke. Baseline exclusion criteria for enrollment included heart attack or stroke within the past 2 years or other important medical conditions, including high blood pressure. Individuals prescribed beta-blockers or nitrates were excluded from the study[16].

NJH: The National Jewish Health COPD cohort includes patients with COPD drawn from the National Jewish Health pulmonary clinic. The cohort includes over 2,000 patients. All patients have a forced expiratory volume in 1 second (FEV1) of less than 80% of predicted value after bronchodilator (albuterol) nebulization, and a ratio of FEV1 to forced vital capacity (FVC) of 0.7 or less after bronchodilator (albuterol) nebulization. Standard evaluation of these patients includes extensive phenotyping, including race, smoking and other exposure history, other current or previous diagnoses, body mass index, spirometry, CT scan of the chest and MMRC. Asthma and cardiovascular disease were obtained by self-report. Cardiovascular disease was defined as the presence of coronary artery disease or heart attack, heart failure, peripheral vascular disease or stroke. For the current analysis, 60 patients with complete data for clustering variables were included.

PAC-COPD: The Phenotype and Course of COPD Study (PAC-COPD, Spain) is a prospective longitudinal study of 342 COPD patients hospitalized for the first time because of a COPD exacerbation in nine teaching hospitals in Spain between January 2004 and March 2006[17,18]. Patients aged <45 years and those with cancer, residual extensive tuberculosis lesions of more than one third of the pulmonary parenchyma, pneumonectomy and/or pneumoconiosis were excluded. All epidemiological, clinical, functional and biological data were collected when patients had been clinically stable for ≥3 months after hospital discharge. The diagnosis of COPD was established according to the American Thoracic Society/European Respiratory Society definition of post-bronchodilator forced expiratory volume in 1 s (FEV1) to forced vital capacity ratio ≤0.70[19]. Post-bronchodilator spirometry values were used for the current analysis. Sociodemographic data and smoking status were obtained from epidemiological questionnaires. Dyspnea was assessed using the mMRC questionnaire, yielding a score ranging from zero to four. Weight and height were obtained from a physical examination performed by a respiratory physician participating in the study. Co-morbidities were defined as per doctor diagnosis after physical examination and review of medical charts. Cardiovascular disease was defined for this study as any one of the following conditions: myocardial infarction, coronary artery disease, congestive heart failure, stroke, or peripheral vascular disease.

STATISTICAL ANALYSIS

Cluster Analysis: The clustering features were: FEV1 percent of predicted (based on local equations), FVC percent of predicted, FEV1/FVC ratio, body mass index (BMI), modified Medical Research Council (MMRC) dyspnea score (zero to four), and self-reported asthma and cardiovascular disease diagnosis (defined as at least one of the following: ischaemic heart disease, stroke, congestive heart failure or peripheral vascular disease). The clustering process consisted of two distinct stages: 1) variable prioritization and generation of a subject similarity matrix and 2) cluster identification. The unsupervised random forests method was used for feature (i.e. variable) prioritization and generation of a subject similarity matrix. Briefly, this method can be applied to mixed (i.e. continuous and categorical) data types, and provides a data-driven approach to feature weighting and selection[20]. This approach leverages a supervised learning method (i.e. random forests) to discriminate between the actual data and permuted data drawn from the same data distributions (in which correlations between features have been broken via a permutation procedure). In this formulation, the actual and permuted data are labeled and combined, and the random forests procedure is used to distinguish the “actual” observations from the permuted ones. This leads to a natural weighting of variables based on those that are most useful for discrimination between real and permuted data, because these variables are selected more frequently in the random forests procedure. From the resulting trees, a similarity matrix is constructed based on the number of times pairs of observations co-occur within the same terminal node.
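The two steps described above can be sketched as follows, assuming subjects are stored as rows of feature tuples. This is a minimal illustration, not the study's R code: the function names and the `leaf_ids` layout are hypothetical, and the random forest fit that would produce the terminal-node assignments is omitted.

```python
import random


def make_real_vs_permuted(rows, seed=0):
    """Return (data, labels): the actual rows (label 1) followed by an equal
    number of synthetic rows (label 0) built by shuffling each column
    independently, which keeps the marginal distributions but breaks the
    correlations between features."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    permuted = [tuple(vals) for vals in zip(*columns)]
    data = [tuple(r) for r in rows] + permuted
    labels = [1] * len(rows) + [0] * len(permuted)
    return data, labels


def proximity_matrix(leaf_ids):
    """leaf_ids[t][i] is the terminal node reached by subject i in tree t
    (hypothetical layout). Returns, for every pair of subjects, the fraction
    of trees in which both land in the same terminal node."""
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

A classifier trained on `data` and `labels` would supply `leaf_ids`; only the proximities among the actual subjects would then be carried forward to clustering.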

To quantify the importance of specific variables, an additional permutation procedure is performed within the subset of “actual” observations. Each feature is individually permuted, and the impact of this permutation on overall prediction performance (i.e. ability to successfully discriminate between actual and permuted observations) is quantified. Thus, an importance of 0.05 for a given feature indicates that permutation of this feature alone resulted in a 5% decrease in prediction accuracy.
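As a rough illustration of this importance measure, the sketch below shuffles one feature across rows and reports the resulting drop in accuracy. It assumes a generic `predict` callable and rows stored as tuples; both are hypothetical stand-ins, not the interface of the random forests software used in the study.

```python
import random


def permutation_importance(predict, rows, labels, feature, seed=0):
    """Decrease in accuracy when the values of one feature (a column index)
    are shuffled across rows. `predict` maps a row tuple to a label."""
    rng = random.Random(seed)

    def accuracy(data):
        hits = sum(predict(r) == y for r, y in zip(data, labels))
        return hits / len(labels)

    baseline = accuracy(rows)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [
        r[:feature] + (v,) + r[feature + 1:] for r, v in zip(rows, shuffled)
    ]
    return baseline - accuracy(permuted_rows)
```

A feature the classifier never uses has an importance of exactly zero under this definition, and chance effects can produce small negative values, as with the -0.01 reported for BMI in the NJH column of Table E1.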

We used two approaches (k-medoids and hierarchical clustering) to generate clusters from the subject similarity matrix. Both are standard methods frequently applied in clustering problems. For k-medoids, the number of clusters, k, was varied from two to ten, resulting in nine distinct k-medoids cluster solutions in each cohort. For hierarchical clustering, we used the dynamic tree cut algorithm to identify the number of clusters, as described in detail by Langfelder and Horvath[21]. The dynamic tree cut algorithm uses specific similarity criteria to identify subjects that are not sufficiently similar to other members in their assigned cluster. These subjects are then reassigned to a miscellaneous group of “poorly clustered” subjects. In subsequent reproducibility analyses, these results were analysed with and without these “poorly clustered” individuals.
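To make the k-medoids step concrete, here is a minimal sketch that clusters a precomputed dissimilarity matrix. It uses a simple alternating assign-and-update scheme rather than the full PAM algorithm, and the deterministic starting medoids are an arbitrary choice made for illustration. How the similarity matrix is converted to dissimilarities (for example, one minus proximity) is also an assumption, as the text does not specify it.

```python
def assign(dist, medoids):
    """Label each subject with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a square dissimilarity matrix `dist`.
    Returns (medoid indices, cluster labels)."""
    medoids = list(range(k))  # start from the first k subjects (illustrative)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total dissimilarity
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)
```

Running `k_medoids(dist, k)` for each `k` in `range(2, 11)` yields the nine solutions per cohort mentioned in the text.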

In our study, the dynamic tree cut algorithm resulted in 14 distinct cluster solutions per cohort[21]. In subsequent analysis of these hierarchical solutions, cross-cohort concordance was assessed both including and excluding unclustered subjects. In total, 23 clustering solutions (nine from k-medoids and 14 from hierarchical) were generated in each of the ten cohorts, giving a total of 230 solutions. All analyses were performed in R (v3.1.0).

Assessment of reproducibility of clustering methods: In this section, for each cluster solution we distinguish between the dataset in which the clustering model was identified, i.e. the source dataset, and the nine other replication datasets to which the clustering model was transferred to “re-generate” cluster assignments. For each cluster solution within each given dataset, referred to as the source solution, a predictive model using the input features was trained using supervised random forests (as implemented in the randomForest R package) in the source dataset[22]. This model was then applied in each of the nine replication datasets to assign subjects to the clusters identified in the source solution, thereby “transferring” a cluster solution from the source to the replication dataset; such re-generated solutions are referred to as the transferred solutions (see Figure 1). To assess reproducibility of cluster solutions, we calculated the normalized mutual information (NMI)[23] between the 23 source solutions and 207 (230 minus 23) transferred solutions. The NMI is a measure of concordance ranging from zero to one, where higher values can be interpreted as better replication (i.e. greater agreement between the source and transferred solutions). Then, for each participating study, the source solution with the highest average NMI across the nine other studies was identified, resulting in a single “best NMI” (i.e. most reproducible) solution for each dataset (i.e. ten best NMI solutions).
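For readers unfamiliar with NMI, the sketch below computes it for two cluster labelings of the same subjects and then picks the solution with the highest average NMI. Several normalizations of mutual information exist; this sketch assumes the geometric mean of the two entropies, which may differ from the variant in reference [23]. The function names and the handling of single-cluster labelings are likewise illustrative choices.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same subjects."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (nij / n) * math.log(nij * n / (count_a[x] * count_b[y]))
        for (x, y), nij in joint.items()
    )


def nmi(a, b):
    """Mutual information divided by sqrt(H(a) * H(b)) (assumed variant)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 and hb == 0:
        return 1.0  # both labelings are a single cluster (convention)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def best_nmi_solution(nmi_by_solution):
    """nmi_by_solution maps a source-solution name to its list of NMI values
    in the replication cohorts; returns the name with the highest mean."""
    return max(
        nmi_by_solution,
        key=lambda s: sum(nmi_by_solution[s]) / len(nmi_by_solution[s]),
    )
```

NMI ignores how clusters are numbered, so two labelings that define the same partition under different cluster IDs score one.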

Identification of Severe Airflow Limitation and Moderate Airflow Limitation Clusters: A recent meta-analysis by Pinto et al. qualitatively determined that two subtypes appeared to be common across multiple COPD clustering analyses, based on comparison of cluster characteristics across studies[24]. The first subtype was described as “younger with severe respiratory disease, having a low probability of cardiovascular co-morbidities, high prevalence of poor nutritional status and poor health status with poor longitudinal health outcomes” and the second subtype was defined as having “moderate respiratory disease, and a high prevalence of obesity, and increased prevalence of cardiovascular and metabolic co-morbidities and inflammatory markers.” Adapting these definitions to the cluster variables used in our analysis, we examined the best-NMI solution from each cohort to determine whether any clusters met the following criteria: 1) low FEV1 (<45% predicted), low BMI (<27), and high MMRC score (>1) or 2) moderately reduced FEV1 (45% < FEV1 < 80% predicted), high BMI (>29), and high MMRC score (>1).
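The two criteria can be written as a small rule, sketched below. It assumes the thresholds are applied to each cluster's mean values, as reported in Tables E2 to E11; the function name and return strings are illustrative.

```python
def classify_cluster(fev1_pct_pred, bmi, mmrc):
    """Apply the two cluster criteria from the text to cluster-level means.
    Returns a subtype label, or None when neither criterion is met."""
    if fev1_pct_pred < 45 and bmi < 27 and mmrc > 1:
        return "severe airflow limitation"
    if 45 < fev1_pct_pred < 80 and bmi > 29 and mmrc > 1:
        return "moderate airflow limitation"
    return None
```

For example, COPDGene cluster 3 in Table E3 (FEV1 27% predicted, BMI 26, MMRC 3.2) meets the first criterion, while LEUVEN cluster 4 in Table E6 (FEV1 51% predicted, BMI 30, MMRC 2.1) meets the second.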

Clustering of More Comprehensive Feature Set in the COPDGene-ECLIPSE Substudy: Because the set of COPD-related features common to all ten datasets was limited, we performed a separate analysis in the COPDGene and ECLIPSE studies using the same clustering methods but with a more extensive set of features. In addition to the seven features used in the primary analysis, the additional clustering features were: LAA950 (the percentage of voxels on the inspiratory chest CT scan with a density below -950 Hounsfield units), Pi10 (estimated airway wall thickness for a 10 mm internal perimeter airway), the Saint George’s Respiratory Questionnaire total score, chronic bronchitis (presence of cough and phlegm for ≥ 3 months a year for at least 2 consecutive years), and respiratory exacerbations (self-reported change in respiratory symptoms requiring either antibiotics or oral steroids). The clustering methods and assessment of replication were the same as in the primary study.
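As an illustration of the LAA950 definition, the sketch below computes the percentage of voxels below a density threshold, taking the threshold as -950 Hounsfield units, the conventional reading of LAA950. It assumes the input is a flat list of HU values already restricted to lung voxels; lung segmentation and scanner calibration are outside its scope, and the function name is hypothetical.

```python
def laa_percent(hu_values, threshold=-950):
    """Percentage of voxels with attenuation strictly below `threshold` HU.
    `hu_values` is assumed to contain lung voxels only."""
    below = sum(1 for hu in hu_values if hu < threshold)
    return 100.0 * below / len(hu_values)
```

With four voxels at -1000, -960, -900 and -800 HU, two fall below the threshold, giving an LAA950 of 50%.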

Table E1. Feature Importance Scores* from Unsupervised Random Forests

Feature             CLIPCOPD  COPDGene  ECLIPSE  ICECOLDERIC  LEUVEN  LifeLines  Lovelace  Lung Health Study  NJH    PAC-COPD
FEV1 (% predicted)  0.13      0.27      0.18     0.12         0.15    0.22       0.16      0.22               0.04   0.12
FEV1/FVC (%)        0.07      0.2       0.13     0.09         0.12    0.09       0.08      0.15               <0.01  0.08
FVC (% predicted)   0.09      0.2       0.12     0.08         0.09    0.19       0.13      0.21               0.03   0.07
BMI (kg/m2)         <0.01     0.01      0.01     <0.01        0.01    0.01       <0.01     <0.01              -0.01  0.02
MMRC (0-4)          0.01      0.03      0.01     0.01         0.01    0.01       0.01      <0.01              <0.01  0.01
Asthma              <0.01     <0.01     <0.01    <0.01        <0.01   <0.01      <0.01     <0.01              <0.01  <0.01
CVD                 <0.01     <0.01     <0.01    <0.01        <0.01   <0.01      <0.01     <0.01              <0.01  <0.01

* Feature Importance Scores are the mean decrease in prediction accuracy when each feature is permuted, as estimated by the unsupervised random forests procedure.

Table E2. Characteristics of The Most Replicable Solution for CLIPCOPD study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4

N 113 144 58 52

FEV1 (% predicted) 65 (13) 42 (12) 96 (16) 86 (9)

FEV1/FVC (%) 51 (10) 48 (11) 67 (2) 59 (3)

FVC (% predicted) 103 (19) 71 (16) 114 (17) 114 (13)

BMI (kg/m2) 27 (6) 25 (4) 26 (4) 28 (4)

MMRC (0-4) 2.1 (0.9) 2.3 (1.0) 1.4 (0.9) 2.0 (0.9)

Asthma, % 2 1 0 0

CVD, % 49 36 57 46

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E3. Characteristics of The Most Replicable Solution for COPDGene study.

Cluster 1 Cluster 2 Cluster 3

N 2660 931 880

FEV1 (% predicted) 58 (15) 86 (10) 27 (7)

FEV1/FVC (%) 54 (11) 64 (4) 35 (8)

FVC (% predicted) 83 (17) 102 (12) 59 (12)

BMI (kg/m2) 29 (6) 27 (5) 26 (6)

MMRC (0-4) 2.1 (1.4) 0.2 (0.4) 3.2 (0.7)

Asthma, % 24 13 27

CVD, % 22 13 23

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E4. Characteristics of The Most Replicable Solution for ECLIPSE study.

Cluster 1 Cluster 2 Cluster 3

N 1654 250 190

FEV1 (% predicted) 44 (13) 25 (5) 67 (8)

FEV1/FVC (%) 46 (11) 30 (4) 55 (6)

FVC (% predicted) 80 (20) 64 (12) 100 (10)

BMI (kg/m2) 27 (6) 24 (4) 25 (4)

MMRC (0-4) 1.7 (1.0) 2.5 (0.8) 0.5 (0.5)

Asthma, % 22 26 22

CVD, % 23 20 16

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E5. Characteristics of The Most Replicable Solution for ICECOLDERIC study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10

N 67 90 51 36 32 31 30 24 21 21

FEV1 (% predicted) 58 (15) 71 (6) 28 (7) 37 (8) 67 (6) 50 (4) 64 (6) 50 (5) 61 (5) 55 (6)

FEV1/FVC (%) 56 (10) 63 (4) 38 (10) 38 (8) 48 (5) 40 (5) 54 (5) 51 (4) 55 (4) 64 (3)

FVC (% predicted) 87 (23) 92 (11) 63 (15) 78 (12) 115 (10) 102 (12) 98 (6) 80 (8) 92 (10) 67 (6)

BMI (kg/m2) 28 (6) 29 (6) 23 (3) 23 (4) 27 (4) 25 (3) 22 (2) 26 (5) 28 (2) 24 (3)

MMRC (0-4) 1.9 (1.6) 1.3 (1.3) 2.5 (1.3) 2.5 (1.3) 1.8 (1.4) 1.9 (1.5) 2.1 (1.7) 1.6 (1.5) 2.2 (1.3) 1.3 (1.1)

Asthma, % 4 1 2 3 6 6 3 0 14 5

CVD, % 22 21 14 17 22 16 17 25 24 24

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E6. Characteristics of The Most Replicable Solution for LEUVEN study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8

N 80 132 95 60 56 46 42 37

FEV1 (% predicted) 43 (10) 75 (12) 30 (6) 51 (7) 56 (5) 51 (3) 26 (5) 41 (3)

FEV1/FVC (%) 40 (11) 57 (8) 38 (7) 58 (7) 47 (3) 39 (3) 29 (4) 36 (3)

FVC (% predicted) 86 (16) 106 (15) 63 (10) 69 (8) 94 (9) 103 (10) 73 (15) 89 (5)

BMI (kg/m2) 24 (5) 26 (4) 24 (5) 30 (6) 23 (3) 26 (5) 21 (3) 22 (2)

MMRC (0-4) 2.1 (1.1) 1.4 (1.1) 2.4 (1.1) 2.1 (0.9) 1.3 (0.9) 1.9 (0.9) 2.3 (1.4) 2.1 (1.2)

Asthma, % 0 0 0 0 0 0 0 0

CVD, % 35 43 37 45 41 39 19 35

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E7. Characteristics of The Most Replicable Solution for LifeLines study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4

N 3748 791 380 279

FEV1 (% predicted) 88 (11) 110 (8) 98 (3) 59 (10)

FEV1/FVC (%) 65 (5) 67 (2) 67 (2) 54 (9)

FVC (% predicted) 113 (13) 136 (9) 120 (3) 89 (11)

BMI (kg/m2) 26 (4) 25 (2) 24 (2) 27 (4)

MMRC (0-4) 0.4 (0.7) 0 (0) 0 (0) 0.9 (1.2)

Asthma, % 15 0 0 35

CVD, % 55 1 0 10

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E8. Characteristics of The Most Replicable Solution for Lovelace study.

Cluster 1 Cluster 2

N 350 189

FEV1 (% predicted) 63 (15) 91 (9)

FEV1/FVC (%) 57 (10) 65 (3)

FVC (% predicted) 85 (15) 107 (11)

BMI (kg/m2) 27 (7) 26 (4)

MMRC (0-4) 1.6 (1.3) 0.9 (0.9)

Asthma, % 31 22

CVD, % 29 18

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E9. Characteristics of The Most Replicable Solution for Lung Health Study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5

N 2435 200 168 165 164

FEV1 (% predicted) 76 (7) 60 (4) 88 (2) 91 (3) 84 (1)

FEV1/FVC (%) 63 (5) 57 (4) 69 (1) 67 (2) 67 (1)

FVC (% predicted) 95 (11) 84 (5) 101 (2) 108 (3) 99 (2)

BMI (kg/m2) 25 (4) 26 (5) 26 (3) 25 (4) 25 (3)

MMRC (0-4) 0.5 (0.7) 0.7 (0.9) 0.4 (0.6) 0.3 (0.6) 0.3 (0.6)

Asthma, % 8 6 4 6 9

CVD, % 2 1 2 2 1

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E10. Characteristics of The Most Replicable Solution for NJH study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12 Cluster 13

N 7 7 6 6 5 5 4 4 4 3 3 3 3

FEV1 (% predicted) 42 (5) 39 (7) 54 (6) 23 (3) 29 (4) 24 (3) 66 (2) 63 (6) 34 (2) 18 (2) 37 (2) 24 (4) 28 (6)

FEV1/FVC (%) 49 (6) 53 (6) 65 (7) 40 (2) 60 (7) 51 (8) 62 (2) 64 (4) 64 (3) 45 (4) 53 (3) 60 (4) 36 (2)

FVC (% predicted) 80 (8) 74 (9) 75 (8) 52 (4) 46 (2) 42 (3) 102 (4) 88 (12) 54 (3) 39 (12) 62 (3) 48 (2) 72 (13)

BMI (kg/m2) 26 (2) 18 (4) 32 (3) 21 (4) 42 (9) 21 (3) 25 (1) 40 (8) 23 (5) 30 (1) 28 (1) 28 (2) 31 (9)

MMRC (0-4) 2.7 (0.5) 3.3 (0.5) 1.7 (1.4) 2.8 (0.8) 3.4 (0.6) 3.4 (0.6) 1.5 (1.3) 3.3 (0.5) 3.0 (0.8) 3.7 (0.6) 3.0 (0) 3.0 (0) 2.7 (0.6)

Asthma, % 0 0 33 0 0 0 25 0 25 0 0 0 0

CVD, % 29 29 0 50 0 20 50 25 25 0 67 0 0

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E11. Characteristics of The Most Replicable Solution for PAC-COPD study.

Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6

N 136 58 47 45 25 23

FEV1 (% predicted) 51 (8) 32 (8) 42 (5) 64 (4) 78 (6) 81 (10)

FEV1/FVC (%) 56 (8) 41 (8) 39 (7) 66 (4) 65 (5) 62 (5)

FVC (% predicted) 70 (17) 58 (9) 80 (8) 72 (5) 89 (6) 96 (13)

BMI (kg/m2) 29 (5) 25 (4) 26 (4) 32 (3) 32 (3) 25 (3)

MMRC (0-4) 1.7 (1.3) 1.9 (1.3) 2.2 (1.2) 1.4 (0.8) 1.1 (1.0) 1.2 (1.2)

Asthma, % 65 79 77 64 36 70

CVD, % 33 7 17 31 28 22

Values are mean (SE) or median (IQR) unless otherwise indicated.

Table E12. Characteristics of the Most Reproducible COPDGene Clusters from the Clustering Substudy Limited to Subjects in GOLD Spirometric Stage 2.

Cluster 1 Cluster 2 Cluster 3 Cluster 4

N 1151 75 59 52

Age (years) 63 (9) 61 (9) 67 (8) 61 (8)

Pack Years 51 (28) 45 (21) 56 (28) 55 (27)

Sex: female, % 47 39 44 54

Ethnicity: African-American, % 27 20 13 22

FEV1 (% predicted) 65 (8) 74 (4) 56 (4) 59 (6)

FEV1/FVC (%) 59 (7) 64 (4) 45 (6) 60 (6)

FVC (% predicted) 85 (13) 90 (7) 94 (14) 75 (6)

BMI (kg/m2) 29 (6) 27 (5) 26 (4) 33 (7)

Pi10 3.70 (0.14) 3.64 (0.11) 3.68 (0.11) 3.75 (0.15)

LAA950 6.0 (6.4) 4.0 (4.7) 19.7 (8.4) 4.7 (4.6)

SGRQ 33.8 (19.8) 10.7 (8.6) 40.4 (17.6) 60.7 (13.6)

MMRC (0-4) 1.7 (1.3) 0.20 (0.40) 2.1 (1.2) 3.2 (0.7)

Exacerbations 0.54 (1.02) 0 (0.06) 0.64 (0.96) 1.72 (1.53)

Chronic Bronchitis, % 24 1 22 63

CVD, % 20 14 17 35

Asthma, % 27 12 21 38

Cluster 1 - cluster for unassigned subjects as determined by the clustering algorithm.

Pi10 - estimated wall area of a 10 mm internal perimeter airway.

LAA950 - quantitative emphysema from chest computed tomography, defined as the percentage of lung voxels with density less than -950 Hounsfield units.

SGRQ - total score from Saint George's Respiratory Questionnaire.

MMRC - score on Modified Medical Research Council dyspnea questionnaire.

Exacerbations - number of reported respiratory exacerbations over the previous 12 months.

CVD - self-reported history of coronary artery disease, myocardial infarction, congestive heart failure, peripheral vascular disease, or stroke.

Asthma - self-reported history of asthma.
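
The LAA950 definition above reduces to a voxel count. The following is a minimal sketch, not the image-analysis pipeline used in COPDGene: it assumes the lung has already been segmented, that `lung_hu` (a hypothetical name) holds the attenuation of each lung voxel in Hounsfield units, and that the conventional threshold of -950 HU applies.

```python
def laa_percent(lung_hu, threshold=-950.0):
    """Percentage of lung voxels with attenuation strictly below `threshold` (HU).

    `lung_hu` is assumed to contain only voxels inside a lung segmentation mask.
    """
    if not lung_hu:
        raise ValueError("no lung voxels supplied")
    below = sum(1 for hu in lung_hu if hu < threshold)
    return 100.0 * below / len(lung_hu)


# Toy example: 2 of 8 voxels fall below -950 HU.
voxels = [-980, -960, -940, -900, -870, -850, -820, -800]
laa950 = laa_percent(voxels)
print(f"LAA950 = {laa950:.1f}%")
```

In practice the voxel values would come from a segmented CT volume rather than a short list, and the choice of strict versus non-strict inequality at the threshold is an assumption here.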

Table E13. Correlation Matrix for COPDGene Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=4053)

FEV1 FEV1/FVC FVC BMI Pi10 LAA950 SGRQ

FEV1 1 0.82 0.82 0.06 -0.36 -0.56 -0.54

FEV1/FVC 0.82 1 0.38 0.21 -0.21 -0.71 -0.44

FVC 0.82 0.38 1 -0.08 -0.36 -0.26 -0.43

BMI 0.06 0.21 -0.08 1 0.11 -0.31 0.08

Pi10 -0.36 -0.21 -0.36 0.11 1 -0.04 0.31

LAA950 -0.56 -0.71 -0.26 -0.31 -0.04 1 0.29

SGRQ -0.54 -0.44 -0.43 0.08 0.31 0.29 1

Table E14. Correlation Matrix for ECLIPSE Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=1611)

FEV1 FEV1/FVC FVC BMI Pi10 LAA950 SGRQ

FEV1 1 0.71 0.65 0.13 -0.11 -0.47 -0.37

FEV1/FVC 0.71 1 0.04 0.26 0.06 -0.61 -0.18

FVC 0.65 0.04 1 -0.07 -0.25 -0.05 -0.33

BMI 0.13 0.26 -0.07 1 0.29 -0.25 0.05

Pi10 -0.11 0.06 -0.25 0.29 1 -0.17 0.15

LAA950 -0.47 -0.61 -0.05 -0.25 -0.17 1 0.19

SGRQ -0.37 -0.18 -0.33 0.05 0.15 0.19 1
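
Tables E13 and E14 can be compared entry by entry. The sketch below is an illustration only and is not part of the published analysis: it transcribes the two matrices, checks that each is a valid correlation matrix (symmetric with a unit diagonal), and ranks variable pairs by the absolute difference between cohorts. The function names are hypothetical.

```python
LABELS = ["FEV1", "FEV1/FVC", "FVC", "BMI", "Pi10", "LAA950", "SGRQ"]

COPDGENE = [  # Table E13
    [1.00, 0.82, 0.82, 0.06, -0.36, -0.56, -0.54],
    [0.82, 1.00, 0.38, 0.21, -0.21, -0.71, -0.44],
    [0.82, 0.38, 1.00, -0.08, -0.36, -0.26, -0.43],
    [0.06, 0.21, -0.08, 1.00, 0.11, -0.31, 0.08],
    [-0.36, -0.21, -0.36, 0.11, 1.00, -0.04, 0.31],
    [-0.56, -0.71, -0.26, -0.31, -0.04, 1.00, 0.29],
    [-0.54, -0.44, -0.43, 0.08, 0.31, 0.29, 1.00],
]

ECLIPSE = [  # Table E14
    [1.00, 0.71, 0.65, 0.13, -0.11, -0.47, -0.37],
    [0.71, 1.00, 0.04, 0.26, 0.06, -0.61, -0.18],
    [0.65, 0.04, 1.00, -0.07, -0.25, -0.05, -0.33],
    [0.13, 0.26, -0.07, 1.00, 0.29, -0.25, 0.05],
    [-0.11, 0.06, -0.25, 0.29, 1.00, -0.17, 0.15],
    [-0.47, -0.61, -0.05, -0.25, -0.17, 1.00, 0.19],
    [-0.37, -0.18, -0.33, 0.05, 0.15, 0.19, 1.00],
]


def is_valid_correlation(m, tol=1e-9):
    """True if m is square, symmetric, and has ones on the diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    return all(
        abs(m[i][i] - 1.0) <= tol and abs(m[i][j] - m[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


def pairwise_differences(a, b, labels):
    """Absolute between-matrix differences for each variable pair, largest first."""
    n = len(labels)
    diffs = [
        (abs(a[i][j] - b[i][j]), labels[i], labels[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return sorted(diffs, reverse=True)


ranked = pairwise_differences(COPDGENE, ECLIPSE, LABELS)
largest_gap, var_a, var_b = ranked[0]
print(f"Largest difference: {var_a} vs {var_b} ({largest_gap:.2f})")
```

With the values as printed in the two tables, the pair that differs most between cohorts is FEV1/FVC and FVC (0.38 in COPDGene versus 0.04 in ECLIPSE).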

Figure E1. PCA Screeplots in Participating Cohorts.

Figure E2. Multi-dimensional Scaling (MDS) Visualization of Similarity Matrix for All Cohorts (GOLD 1 = blue, GOLD 2 = orange, GOLD 3 = green, GOLD 4 = red).
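
For readers unfamiliar with MDS, the sketch below shows classical (Torgerson) MDS on a small distance matrix using only the Python standard library. It is illustrative and rests on assumptions not stated in this supplement: similarities are converted to dissimilarities as sqrt(1 - similarity), eigenvectors are found by power iteration with deflation (adequate for a toy example, not for cohort-sized matrices), and all function names are hypothetical. The MDS variant actually used for Figure E2 is not specified here.

```python
import math
import random


def similarity_to_distance(s):
    """Assumed conversion from a similarity in [0, 1] to a dissimilarity."""
    return math.sqrt(max(0.0, 1.0 - s))


def double_center(d2):
    """Return B = -1/2 * J * D2 * J for a symmetric squared-distance matrix D2."""
    n = len(d2)
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


def top_eigenpair(b, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of symmetric b."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * bv[i] for i in range(n))
    return lam, v


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    b = double_center([[d * d for d in row] for row in dist])
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(b, seed=k)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest component.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy check: the corners of a 2 x 1 rectangle should be recovered
# up to rotation and reflection, so pairwise distances are preserved.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

For a similarity matrix, each entry would first be passed through `similarity_to_distance` to build `dist`. Such dissimilarities need not be exactly Euclidean, in which case the two-dimensional embedding is only an approximation.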

Figure E3. Reproducibility of Different Clustering Methods in the COPDGene-ECLIPSE Substudy. Distribution of normalized mutual information (NMI) is shown for clustering with partitioning around medoids (PAM), hierarchical clustering including unclassified subjects (HC_U), and hierarchical clustering excluding unclassified subjects (HC). Results are shown for clustering in spirometric GOLD 2-4 subjects (Panel A) and GOLD 2 only (Panel B).
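
As an aid to reading Figure E3, the sketch below computes NMI between two cluster assignments of the same subjects. It assumes the geometric-mean normalization, NMI = I(A;B) / sqrt(H(A) H(B)); which normalization underlies the figure is not restated in this caption, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a cluster assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (natural log) between two assignments of the same items."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information with geometric-mean normalization."""
    if len(a) != len(b) or not a:
        raise ValueError("assignments must be non-empty and of equal length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # A single-cluster assignment carries no information; 0 is returned
        # here by convention (an assumption of this sketch).
        return 0.0
    return mutual_information(a, b) / math.sqrt(h_a * h_b)


# Identical partitions under different label names give NMI of 1.
same = nmi([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])
# Statistically independent partitions give NMI of 0.
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI does not depend on how clusters are labeled, which is why it suits comparisons between clusterings obtained in different runs or cohorts.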

REFERENCES

1 Pistolesi M, Camiciottoli G, Paoletti M, et al. Identification of a predominant COPD phenotype in clinical practice. Respiratory Medicine 2008;102:367–76. doi:10.1016/j.rmed.2007.10.019

2 Paoletti M, Camiciottoli G, Meoli E, et al. Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes. Journal of Biomedical Informatics 2009;42:1013–21. doi:10.1016/j.jbi.2009.05.008

3 Camiciottoli G, Bigazzi F, Paoletti M, et al. Pulmonary function and sputum characteristics predict computed tomography phenotype and severity of COPD. European Respiratory Journal 2013;42:626–35. doi:10.1183/09031936.00133112

4 Regan EA, Silverman E, Hokanson JE, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 2010;7:32–43. doi:10.3109/15412550903499522

5 Miller MR, Hankinson J, Brusasco V, et al. Standardisation of spirometry. Eur. Respir. J. 2005;26:319–38. doi:10.1183/09031936.05.00034805

6 Vestbo J, Anderson W, Coxson HO, et al. Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points (ECLIPSE). European Respiratory Journal 2008;31:869–73. doi:10.1183/09031936.00111707

7 Comstock GW, Tockman MS, Helsing KJ, et al. Standardized respiratory questionnaires: comparison of the old with the new. Am Rev Respir Dis 1979;119:45–53. doi:10.1164/arrd.1979.119.1.45

8 Siebeling L, Puhan MA, Muggensturm P, et al. Characteristics of Dutch and Swiss primary care COPD patients - baseline data of the ICE COLD ERIC study. Clin Epidemiol 2011;3:273–83. doi:10.2147/CLEP.S24818

9 Siebeling L, Riet ter G, van der Wal WM, et al. ICE COLD ERIC--International collaborative effort on chronic obstructive lung disease: exacerbation risk index cohorts--study protocol for an international COPD cohort study. BMC Pulm Med 2009;9:15. doi:10.1186/1471-2466-9-15

10 Wauters E, Smeets D, Coolen J, et al. The TERT-CLPTM1L locus for lung cancer predisposes to bronchial obstruction and emphysema. European Respiratory Journal 2011;38:924–31. doi:10.1183/09031936.00187110

11 Scholtens S, Smidt N, Swertz MA, et al. Cohort Profile: LifeLines, a three-generation cohort study and biobank. International Journal of Epidemiology 2015;44:1172–80. doi:10.1093/ije/dyu229

12 Stolk RP, Rosmalen JGM, Postma DS, et al. Universal risk factors for multifactorial diseases. Eur J Epidemiol 2008;23:67–74. doi:10.1007/s10654-007-9204-4

13 Bruse S, Sood A, Petersen H, et al. New Mexican Hispanic smokers have lower odds of chronic obstructive pulmonary disease and less decline in lung function than non-Hispanic whites. American Journal of Respiratory and Critical Care Medicine 2011;184:1254–60. doi:10.1164/rccm.201103-0568OC

14 Hunninghake GM, Cho M, Tesfaigzi Y, et al. MMP12, lung function, and COPD in high-risk populations. N Engl J Med 2009;361:2599–608. doi:10.1056/NEJMoa0904006

15 Pellegrino R, Decramer M, van Schayck CPO, et al. Quality control of spirometry: a lesson from the BRONCUS trial. Eur Respir J 2005;26:1104–9. doi:10.1183/09031936.05.00026705

16 Connett JE, Kusek JW, Bailey WC, et al. Design of the Lung Health Study: a randomized clinical trial of early intervention for chronic obstructive pulmonary disease. Control Clin Trials 1993;14:3S–19S.

17 Garcia-Aymerich J, Gómez FP, Benet M, et al. Identification and prospective validation of clinically relevant chronic obstructive pulmonary disease (COPD) subtypes. Thorax 2011;66:430–7. doi:10.1136/thx.2010.154484

18 Garcia-Aymerich J, Gómez FP, Antó JM, et al. [Phenotypic characterization and course of chronic obstructive pulmonary disease in the PAC-COPD Study: design and methods]. Arch Bronconeumol 2009;45:4–11. doi:10.1016/j.arbres.2008.03.001

19 Celli BR, MacNee W, ATS/ERS Task Force. Standards for the diagnosis and treatment of patients with COPD: a summary of the ATS/ERS position paper. Eur. Respir. J. 2004;23:932–46.

20 Shi T, Horvath S. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics 2006;15:118–38. doi:10.1198/106186006X94072

21 Langfelder P, Zhang B, Horvath S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 2008;24:719–20. doi:10.1093/bioinformatics/btm563

22 Breiman L. Random Forests. Machine Learning 2001;45:5–32. doi:10.1023/A:1010933404324

23 Strehl A, Ghosh J. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. The Journal of Machine Learning Research 2003;3:583–617. doi:10.1162/153244303321897735

24 Pinto LM, Alghamdi M, Benedetti A, et al. Derivation and validation of clinical phenotypes for COPD: a systematic review. Respiratory Research 2015;16:50. doi:10.1186/s12931-015-0208-4

ACKNOWLEDGEMENTS

The authors acknowledge Dr. Steve Horvath and Dr. Peter Langfelder for helpful email correspondence.

ECLIPSE: ECLIPSE Investigators. Bulgaria: Y. Ivanov, Pleven; K. Kostov, Sofia. Canada: J. Bourbeau, Montreal; M. Fitzgerald, Vancouver, BC; P. Hernandez, Halifax, NS; K. Killian, Hamilton, ON; R. Levy, Vancouver, BC; F. Maltais, Montreal; D. O'Donnell, Kingston, ON. Czech Republic: J. Krepelka, Prague. Denmark: J. Vestbo, Hvidovre. The Netherlands: E. Wouters, Horn-Maastricht. New Zealand: D. Quinn, Wellington. Norway: P. Bakke, Bergen. Slovenia: M. Kosnik, Golnik. Spain: A. Agusti, J. Sauleda, P. de Mallorca. Ukraine: Y. Feschenko, V. Gavrisyuk, L. Yashina, Kiev; N. Monogarova, Donetsk. United Kingdom: P. Calverley, Liverpool; D. Lomas, Cambridge; W. MacNee, Edinburgh; D. Singh, Manchester; J. Wedzicha, London. United States: A. Anzueto, San Antonio, TX; S. Braman, Providence, RI; R. Casaburi, Torrance, CA; B. Celli, Boston; G. Giessel, Richmond, VA; M. Gotfried, Phoenix, AZ; G. Greenwald, Rancho Mirage, CA; N. Hanania, Houston; D. Mahler, Lebanon, NH; B. Make, Denver; S. Rennard, Omaha, NE; C. Rochester, New Haven, CT; P. Scanlon, Rochester, MN; D. Schuller, Omaha, NE; F. Sciurba, Pittsburgh; A. Sharafkhaneh, Houston; T. Siler, St. Charles, MO; E. Silverman, Boston; A. Wanner, Miami; R. Wise, Baltimore; R. ZuWallack, Hartford, CT.

ECLIPSE Steering Committee: H. Coxson (Canada), C. Crim (GlaxoSmithKline, USA), L. Edwards (GlaxoSmithKline, USA), D. Lomas (UK), W. MacNee (UK), E. Silverman (USA), R. Tal-Singer (Co-chair, GlaxoSmithKline, USA), J. Vestbo (Co-chair, Denmark), J. Yates (GlaxoSmithKline, USA).

ECLIPSE Scientific Committee: A. Agusti (Spain), P. Calverley (UK), B. Celli (USA), C. Crim (GlaxoSmithKline, USA), B. Miller (GlaxoSmithKline, USA), W. MacNee (Chair, UK), S. Rennard (USA), R. Tal-Singer (GlaxoSmithKline, USA), E. Wouters (The Netherlands), J. Yates (GlaxoSmithKline, USA).

COPDGene: NIH Grant Support and Disclaimer: The project described was supported by Award Number R01HL089897 and Award Number R01HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.

COPD Foundation Funding: The COPDGene® project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.

COPDGene® Investigators – Core Units: Administrative Core: James Crapo, MD (PI), Edwin Silverman, MD, PhD (PI), Barry Make, MD, Elizabeth Regan, MD, PhD. Genetic Analysis Core: Terri Beaty, PhD, Nan Laird, PhD, Christoph Lange, PhD, Michael Cho, MD, Stephanie Santorico, PhD, John Hokanson, MPH, PhD, Dawn DeMeo, MD, MPH, Nadia Hansel, MD, MPH, Craig Hersh, MD, MPH, Peter Castaldi, MD, MSc, Merry-Lynn McDonald, PhD, Emily Wan, MD, Megan Hardin, MD, Jacqueline Hetmanski, MS, Margaret Parker, MS, Marilyn Foreman, MD, Brian Hobbs, MD, Robert Busch, MD, Adel El-Bouiez, MD, Peter Castaldi, MD, Megan Hardin, MD, Dandi Qiao, PhD, Elizabeth Regan, MD, Eitan Halper-Stromberg, Ferdouse Begum, Sungho Won, Sharon Lutz, PhD. Imaging Core: David A Lynch, MB, Harvey O Coxson, PhD, MeiLan K Han, MD, MS, Eric A Hoffman, PhD, Stephen Humphries, MS, Francine L Jacobson, MD, Philip F Judy, PhD, Ella A Kazerooni, MD, John D Newell, Jr., MD, Elizabeth Regan, MD, James C Ross, PhD, Raul San Jose Estepar, PhD, Berend C Stoel, PhD, Juerg Tschirren, PhD, Eva van Rikxoort, PhD, Bram van Ginneken, PhD, George Washko, MD, Carla G Wilson, MS, Mustafa Al Qaisi, MD, Teresa Gray, Alex Kluiber, Tanya Mann, Jered Sieren, Douglas Stinson, Joyce Schroeder, MD, Edwin Van Beek, MD, PhD. PFT QA Core, Salt Lake City, UT: Robert Jensen, PhD. Data Coordinating Center and Biostatistics, National Jewish Health, Denver, CO: Douglas Everett, PhD, Anna Faino, MS, Matt Strand, PhD, Carla Wilson, MS. Epidemiology Core, University of Colorado Anschutz Medical Campus, Aurora, CO: John E. Hokanson, MPH, PhD, Gregory Kinney, MPH, PhD, Sharon Lutz, PhD, Kendra Young, PhD, Katherine Pratte, MSPH, Lindsey Duca, MS.

COPDGene® Investigators – Clinical Centers: Ann Arbor VA: Jeffrey L. Curtis, MD, Carlos H. Martinez, MD, MPH, Perry G. Pernicano, MD.
Baylor College of Medicine, Houston, TX: Nicola Hanania, MD, MS, Philip Alapat, MD, Venkata Bandi, MD, Mustafa Atik, MD, Aladin Boriek, PhD, Kalpatha Guntupalli, MD, Elizabeth Guy, MD, Amit Parulekar, MD, Arun Nachiappan, MD. Brigham and Women’s Hospital, Boston, MA: Dawn DeMeo, MD, MPH, Craig Hersh, MD, MPH, George Washko, MD, Francine Jacobson, MD, MPH. Columbia University, New York, NY: R. Graham Barr, MD, DrPH, Byron Thomashow, MD, John Austin, MD, Belinda D’Souza, MD, Gregory D.N. Pearson, MD, Anna Rozenshtein, MD, MPH, FACR. Duke University Medical Center, Durham, NC: Neil MacIntyre, Jr., MD, Lacey Washington, MD, H. Page McAdams, MD. Health Partners Research Foundation, Minneapolis, MN: Charlene McEvoy, MD, MPH, Joseph Tashjian, MD. Johns Hopkins University, Baltimore, MD: Robert Wise, MD, Nadia Hansel, MD, MPH, Robert Brown, MD, Karen Horton, MD, Nirupama Putcha, MD, MHS. Los Angeles Biomedical Research Institute at Harbor UCLA Medical Center, Torrance, CA: Richard Casaburi, PhD, MD, Alessandra Adami, PhD, Janos Porszasz, MD, PhD, Hans Fischer, MD, PhD, Matthew Budoff, MD, Harry Rossiter, PhD. Michael E. DeBakey VAMC, Houston, TX: Amir Sharafkhaneh, MD, PhD, Charlie Lan, DO. Minneapolis VA: Christine Wendt, MD, Brian Bell, MD. Morehouse School of Medicine, Atlanta, GA: Marilyn Foreman, MD, MS, Gloria Westney, MD, MS, Eugene Berkowitz, MD, PhD. National Jewish Health, Denver, CO: Russell Bowler, MD, PhD, David Lynch, MD. Reliant Medical Group, Worcester, MA: Richard Rosiello, MD, David Pace, MD. Temple University, Philadelphia, PA: Gerard Criner, MD, David Ciccolella, MD, Francis Cordova, MD, Chandra Dass, MD, Gilbert D’Alonzo, DO, Parag Desai, MD, Michael Jacobs, PharmD, Steven Kelsen, MD, PhD, Victor Kim, MD, A. James Mamary, MD, Nathaniel Marchetti, DO, Aditi Satti, MD, Kartik Shenoy, MD, Robert M. Steiner, MD, Alex Swift, MD, Irene Swift, MD, Maria Elena Vega-Sanchez, MD. University of Alabama, Birmingham, AL: Mark Dransfield, MD, William Bailey, MD, J. Michael Wells, MD, Surya Bhatt, MD, Hrudaya Nath, MD. University of California, San Diego, CA: Joe Ramsdell, MD, Paul Friedman, MD, Xavier Soler, MD, PhD, Andrew Yen, MD. University of Iowa, Iowa City, IA: Alejandro Cornellas, MD, John Newell, Jr., MD, Brad Thompson, MD. University of Michigan, Ann Arbor, MI: MeiLan Han, MD, Ella Kazerooni, MD, Carlos Martinez, MD. University of Minnesota, Minneapolis, MN: Joanne Billings, MD, Tadashi Allen, MD. University of Pittsburgh, Pittsburgh, PA: Frank Sciurba, MD, Divay Chandra, MD, MSc, Joel Weissfeld, MD, MPH, Carl Fuhrman, MD, Jessica Bon, MD. University of Texas Health Science Center at San Antonio, San Antonio, TX: Antonio Anzueto, MD, Sandra Adams, MD, Diego Maselli-Caceres, MD, Mario E. Ruiz, MD.

Lung Health Study: The principal investigators and senior staff of the clinical and coordinating centers, the NHLBI, and members of the Safety and Data Monitoring Board of the Lung Health Study are as follows: Case Western Reserve University, Cleveland, OH: M.D. Altose, M.D. (Principal Investigator), C.D. Deitz, Ph.D. (Project Coordinator); Henry Ford Hospital, Detroit, MI: M.S. Eichenhorn, M.D. (Principal Investigator), K.J. Braden, A.A.S. (Project Coordinator), R.L. Jentons, M.A.L.L.P. (Project Coordinator); Johns Hopkins University School of Medicine, Baltimore, MD: R.A. Wise, M.D. (Principal Investigator), C.S. Rand, Ph.D. (Co-Principal Investigator), K.A. Schiller (Project Coordinator); Mayo Clinic, Rochester, MN: P.D. Scanlon, M.D. (Principal Investigator), G.M. Caron (Project Coordinator), K.S. Mieras, L.C. Walters; Oregon Health Sciences University, Portland: A.S. Buist, M.D. (Principal Investigator), L.R. Johnson, Ph.D. (LHS Pulmonary Function Coordinator), V.J. Bortz (Project Coordinator); University of Alabama at Birmingham: W.C. Bailey, M.D. (Principal Investigator), L.B. Gerald, Ph.D., M.S.P.H. (Project Coordinator); University of California, Los Angeles: D.P. Tashkin, M.D. (Principal Investigator), I.P. Zuniga (Project Coordinator); University of Manitoba, Winnipeg: N.R. Anthonisen, M.D. (Principal Investigator, Steering Committee Chair), J. Manfreda, M.D. (Co-Principal Investigator), R.P. Murray, Ph.D. (Co-Principal Investigator), S.C. Rempel-Rossum (Project Coordinator); University of Minnesota Coordinating Center, Minneapolis: J.E. Connett, Ph.D. (Principal Investigator), P.L. Enright, M.D., P.G. Lindgren, M.S., P. O'Hara, Ph.D. (LHS Intervention Coordinator), M.A. Skeans, M.S., H.T. Voelker; University of Pittsburgh, Pittsburgh, PA: R.M. Rogers, M.D. (Principal Investigator), M.E. Pusateri (Project Coordinator); University of Utah, Salt Lake City: R.E. Kanner, M.D. (Principal Investigator), G.M. Villegas (Project Coordinator); Safety and Data Monitoring Board: M. Becklake, M.D., B. Burrows, M.D. (deceased), P. Cleary, Ph.D., P. Kimbel, M.D. (Chairperson; deceased), L. Nett, R.N., R.R.T. (former member), J.K. Ockene, Ph.D., R.M. Senior, M.D. (Chairperson), G.L. Snider, M.D., W. Spitzer, M.D. (former member), O.D. Williams, Ph.D.; Morbidity and Mortality Review Board: T.E. Cuddy, M.D., R.S. Fontana, M.D., R.E. Hyatt, M.D., C.T. Lambrew, M.D., B.A. Mason, M.D., D.M. Mintzer, M.D., R.B. Wray, M.D.; National Heart, Lung, and Blood Institute staff, Bethesda, MD: S.S. Hurd, Ph.D. (Former Director, Division of Lung Diseases), J.P. Kiley, Ph.D. (Former Project Officer and Director, Division of Lung Diseases), G. Weinmann, M.D. (Former Project Officer and Director, Airway Biology and Disease Program, DLD), M.C. Wu, Ph.D. (Division of Epidemiology and Clinical Applications).