ONLINE DATA SUPPLEMENT
DO “COPD SUBTYPES” REALLY EXIST?
Assessment of COPD Heterogeneity and Clustering Reproducibility in 17,154
Individuals Across Ten Independent Cohorts
Peter J Castaldi, MD, Marta Benet, MS, Hans Petersen, MS, Nicholas Rafaels, MS, James Finigan, MD, Matteo Paoletti, PhD, H. Marike Boezen, PhD, Judith M. Vonk, PhD, Russell Bowler, MD, PhD, Massimo Pistolesi, MD, Milo A. Puhan, MD, PhD, Josep Anto, MD, Els Wauters, MD, Diether Lambrechts, PhD, Wim Janssens, MD, Francesca Bigazzi, MD, Gianna Camiciottoli, MD, Michael H Cho, MD, Craig P Hersh, MD, Kathleen Barnes, PhD, Stephen Rennard, MD, Meher Preethi Boorgula, MS, Jennifer Dy, PhD, Nadia H Hansel, James D Crapo, MD, Yohannes Tesfaigzi, PhD, Alvar Agusti, MD, Edwin K Silverman, MD, PhD, Judith Garcia-Aymerich, PhD
COHORT DESCRIPTIONS
STATISTICAL ANALYSIS
Cluster Analysis
Assessment of reproducibility of clustering methods
Identification of Severe Airflow Limitation and Moderate Airflow Limitation Clusters
Clustering of More Comprehensive Feature Set in the COPDGene-ECLIPSE Substudy
RESULTS
Table E1. Feature Importance Scores* from Unsupervised Random Forests
Table E2. Characteristics of The Most Replicable Solution for CLIPCOPD study.
Table E3. Characteristics of The Most Replicable Solution for COPDGene study.
Table E4. Characteristics of The Most Replicable Solution for ECLIPSE study.
Table E5. Characteristics of The Most Replicable Solution for ICECOLDERIC study.
Table E6. Characteristics of The Most Replicable Solution for LEUVEN study.
Table E7. Characteristics of The Most Replicable Solution for LifeLines study.
Table E8. Characteristics of The Most Replicable Solution for Lovelace study.
Table E9. Characteristics of The Most Replicable Solution for Lung Health Study.
Table E10. Characteristics of The Most Replicable Solution for NJH study.
Table E11. Characteristics of The Most Replicable Solution for PAC-COPD study.
Table E12. Characteristics of Most Reproducible COPDGene Clusters from the Clustering Substudy Limited to Subjects in GOLD Spirometric Stage 2.
Table E13. Correlation Matrix for COPDGene Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=4053)
Table E14. Correlation Matrix for ECLIPSE Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=1611)
Figure E1. PCA Screeplots in Participating Cohorts.
Figure E2. Multi-dimensional Scaling (MDS) Visualization of Similarity Matrix for All Cohorts
Figure E3. Reproducibility of Different Clustering Methods in the COPDGene-ECLIPSE Substudy.
REFERENCES
ACKNOWLEDGEMENTS
COHORT DESCRIPTIONS
CLIP-COPD: CLIP-COPD is a single center observational study designed to assign
412 Caucasian patients with COPD to a predominant airway or predominant
parenchymal disease phenotype on the basis of CT densitometric changes. The
design of the study has been reported previously[1-3]. The aim of the study was to
establish a link between quantitative CT data on lung density and airway wall
thickening and clinical and whole pulmonary function evaluation. In particular, by
using a statistical approach allowing the classification of patients by large sets of
variables, avoiding a priori expectations about disease characteristics, we wanted to
ascertain whether the overall severity and the predominant type of the lung
pathologic changes quantitatively assessed by CT could be predicted by clinical and
pulmonary function data. Static and dynamic lung volumes and single breath
diffusing capacity of the lung for carbon monoxide (DLCO) were measured by a
mass-flow sensor and a multi-gas analyzer (V6200 Autobox Body Plethysmograph;
Sensor Medics, Yorba Linda, CA, USA) according to American Thoracic Society
(ATS)/ European Respiratory Society (ERS) guidelines, and expressed as
percentages of the predicted values. Patients enrolled in the study displayed an
obstructive pattern (FEV1/FVC<0.7) after the administration of 400 mcg of
salbutamol. Asthma and cardiovascular disease (defined as any one of the following
conditions: idiopathic arterial hypertension, ischemic heart disease, heart failure,
peripheral vascular disease) were ascertained by means of an interview, also taking
into account records of medical diagnoses obtained with objective diagnostic
criteria and pharmacological therapy prescribed by general practitioners.
COPDGene: COPDGene is a multicenter, longitudinal study designed to investigate
the genetic and epidemiologic characteristics of COPD and other smoking-related
lung diseases. The design of the study has been reported previously[4]. Briefly,
10,192 smokers with a wide range of lung function were recruited into the
COPDGene Study from 2007 to 2011. Non-Hispanic white (NHW) and African-
American (AA) subjects between the ages of 45 and 80 with at least a ten pack-year
smoking history were enrolled. Spirometry was performed using an ultrasound-based
spirometer (EasyOne Spirometer; ndd Medizintechnik AG, Zurich,
Switzerland) before and after administration of short-acting β2-agonist (albuterol)
in accordance with ATS recommendations[5]. For this analysis, post-bronchodilator
spirometry values were used. Asthma and cardiovascular disease status were obtained
from patient self-report, with cardiovascular disease defined as a composite
of any of the following conditions: myocardial infarction, angina, coronary
artery disease, congestive heart failure, stroke, or peripheral vascular disease.
ECLIPSE: The ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive
Surrogate Endpoints) Study is a multicenter, longitudinal study with three-year
follow-up data available for 2,501 smoking subjects (2,164 subjects with COPD and
337 smoking controls). The detailed study protocol and inclusion criteria have been
previously published[6]. The recruitment criteria included an age between 40 and 75
years, a smoking history of ten or more pack-years, a forced expiratory volume in 1
second (FEV1) of less than 80% of predicted value after bronchodilator use, and a
ratio of FEV1 to forced vital capacity (FVC) of <0.7 after bronchodilator use. At
baseline, patients underwent standard spirometry after the administration of 400
μg of inhaled albuterol. Computed tomography (CT) scanning of the chest was
performed to evaluate the severity and distribution of emphysema. The patients’
self-reported respiratory symptoms, medications, smoking history, occupational
exposure, and coexisting medical conditions were documented at study entry with
the use of an updated version of the American Thoracic Society–Division of Lung
Disease (ATS-DLD) questionnaire[7].
ICE COLD ERIC: The International Collaborative Effort on Chronic Obstructive Lung
Disease: Exacerbation Risk Index Cohorts (ICE COLD ERIC) is an international multi-site
prospective cohort study of primary care patients with COPD from
Switzerland and the Netherlands. All included patients provided written
informed consent. The study was approved by all local ethics committees
and is registered on www.ClinicalTrials.gov (NCT00706602). Detailed information
on the study design[8] and the baseline results[9] has been published elsewhere.
At study enrollment (April 2008 to August 2009), patients had to be ≥40 years of
age, have Global Initiative for Chronic Obstructive Lung Disease (GOLD) stage 2 to 4
disease (based on post-bronchodilator values after the administration of 400 μg of
inhaled albuterol), and be free of exacerbation for at least 4 weeks at baseline.
Follow-up assessments took place every 6 months for up to five years. The assessment of
comorbidities was done by experienced and well-trained study nurses or physicians
during the baseline visits of the cohort study, which took place at the primary care
practices. The patients were asked which comorbidities they had using open-ended
questions. The patients also brought a list with all drugs they were taking to the
baseline interview. The study nurses or physicians compared the patient-reported
comorbidities with the list of medications (and in Switzerland also with the patient
records) and clarified with the general practitioners any uncertainties or
mismatches between the patients’ reports, the drug list, or the patients’ obvious
health condition. Cardiovascular disease included any symptomatic condition (e.g. coronary
heart disease or heart failure) or previous event, but not risk factors such as arterial
hypertension or hypercholesterolemia.
Leuven: The LEUVEN cohort is composed of 548 subjects who were prospectively
recruited at the COPD outpatient clinic of University Hospital of Leuven
(Belgium)[10]. Inclusion criteria were a smoking history of at least 15 pack-years, a
minimum age of 50 years and the availability of a complete pulmonary function test.
All pulmonary function measurements were performed with standardized
equipment (Sensormedics Whole Body Plethysmograph, Viasys Healthcare,
Belgium) and according to ATS and ERS guidelines. Spirometric values were post-
bronchodilator measurements. Patients with suspicion or diagnosis of asthma were
excluded, as well as patients with other respiratory diseases affecting pulmonary
function. All COPD patients had a stable clinical condition with no exacerbation
within 6 weeks before inclusion. From all study subjects, an extensive list of
demographic variables (including age, gender, body mass index in kg/m2),
questionnaires determining smoking history, and a CT scan of the chest within one
year of enrollment were collected. Symptom level was assessed by using
the modified Medical Research Council (mMRC) dyspnea scale. Cardiovascular
disease (CVD) was self-reported and confirmed through medical record review, with
CVD consisting of a composite definition of any one of the following conditions:
ischemic heart disease, stroke or peripheral artery disease.
LifeLines: LifeLines is a three-generation, longitudinal, general population-based
cohort including 167,729 subjects from the three northern provinces of The
Netherlands[11,12]. During the LifeLines baseline visits (2007-2013) extensive
phenotyping has been performed: anthropometric measurements, ECG, lung
function tests, psychiatric interview, cognitive function tests, blood and urine
samples (now stored in the Biobank), and questionnaires on health, lifestyles, stress
and quality of life. The first round of follow-up, approximately 5 years after the
baseline visit, started in January 2014 and total follow-up duration in LifeLines will
be 30 years. For the current analyses only the baseline measurements were used.
We included subjects aged 40 years and over of self-reported Caucasian descent
with a pre-bronchodilator FEV1/FVC ratio lower than 70%. Asthma and
cardiovascular disease were obtained from the questionnaire; asthma was defined
as self-reported, doctor-diagnosed asthma ever, and cardiovascular disease was
defined as at least one cardiovascular event (self-reported heart attack,
cardiovascular surgery, or cerebrovascular accident).
Lovelace: The Lovelace Smokers Cohort (LSC) is a cohort of 2400 current and
former smokers in Albuquerque, NM. Since women are underrepresented in most
studies of airflow obstruction, this large cohort of primarily female ever-smokers
(approximately 80% women) was assembled to study the susceptibility of women
to the adverse effects of cigarette smoking. Details regarding this cohort have been
previously published. Enrollment was restricted to current and former smokers aged
40–74 years with a minimum of ten pack-years of smoking[13,14]. An average of
four pre- and post-bronchodilator spirometry tests have been performed on each
subject by respiratory therapists who were periodically re-credentialed, as part of a
standardized laboratory proficiency testing plan[15]. The reference standards were
those from the National Health and Nutrition Examination Survey (NHANES) III
spirometric reference. A detailed questionnaire written in English was used to
collect information on demographics; medical, smoking, and exposure history;
socioeconomic status; and quality of life. Standardized questionnaires were also
administered at each examination to record each patient’s history of asthma and
cardiovascular disease. Cardiovascular disease was defined for this study as any
one of the following conditions: myocardial infarction, coronary artery disease,
congestive heart failure, stroke, or peripheral vascular disease.
Lung Health Study: The LHS was a multicenter (ten centers) clinical study in the US
and Canada. The initial study population consisted of 5,887 men and women (63%
male) who were current smokers (aged 35–60) with spirometric evidence of mild to
moderate airflow limitation[16]. Of these participants, 96% self-reported as
European American and 4,126 had clean genotype and phenotype data; 3,989 of this
subset had non-missing mMRC (dyspnea) values, and 3,132 had post-bronchodilator
FEV1/FVC < 0.7. Thus 3,132 participants were included in this analysis. All
participants were randomized with equal probability into three groups: (i) smoking
intervention plus bronchodilator (ipratropium bromide); (ii) smoking intervention
plus placebo; or (iii) usual care. The primary outcomes were the rate of change and
the cumulative change in lung function (forced expiratory volume in one second
(FEV1)) over a 5-year period. Lung function was measured annually according to
ATS guidelines using identical spirometers, software, procedures and reading center
personnel. The quality of the spirometry was monitored centrally throughout the
testing, and comparison of baseline spirometry measures showed good
reproducibility. Post-bronchodilator spirometry values were used for the current
analysis. Asthma, chronic bronchitis, exacerbations, mMRC (dyspnea) and
cardiovascular disease were obtained from patient self-report, with cardiovascular
disease consisting of a composite definition of any one of the following conditions
(myocardial infarction, coronary artery disease, congestive heart failure or stroke).
Baseline exclusion criteria for enrollment included heart attack or stroke within the
past 2 years or other important medical conditions, including high blood pressure.
Individuals prescribed beta-blockers or nitrates were excluded from the study[16].
NJH: The National Jewish Health COPD cohort includes patients with COPD drawn
from the National Jewish Health pulmonary clinic. The cohort includes over 2000
patients. All patients have a forced expiratory volume in 1 second (FEV1) of less
than 80% of predicted value after bronchodilator (albuterol) nebulization, and a
ratio of FEV1 to forced vital capacity (FVC) of 0.7 or less after bronchodilator
(albuterol) nebulization. Standard evaluation of these patients includes extensive
phenotyping including race, smoking and other exposure history, other current or
previous diagnoses, body mass index, spirometry, CT scan of the chest and MMRC score.
Asthma and cardiovascular disease were obtained by self-report. Cardiovascular
disease was defined as the presence of coronary artery disease or heart attack, heart
failure, peripheral vascular disease or stroke. For the current analysis, 60 patients
with complete data for clustering variables were included.
PAC-COPD: The Phenotype and Course of COPD Study (PAC-COPD, Spain) is a
prospective longitudinal study of 342 COPD patients hospitalized for the first time
because of a COPD exacerbation in nine teaching hospitals in Spain between January
2004 and March 2006[17,18]. Patients aged <45 years and those with cancer,
residual extensive tuberculosis lesions of more than one third of the pulmonary
parenchyma, pneumonectomy and/or pneumoconiosis were excluded. All
epidemiological, clinical, functional and biological data were collected when patients
had been clinically stable for ≥3 months after hospital discharge. The diagnosis of
COPD was established according to the American Thoracic Society/European
Respiratory Society definition of post-bronchodilator forced expiratory volume in 1
s (FEV1) to forced vital capacity ratio ≤0.70[19]. Post-bronchodilator spirometry
values were used for the current analysis. Sociodemographic data and smoking
status were obtained from epidemiological questionnaires. Dyspnea was assessed
by using the mMRC questionnaire, yielding a score ranging from zero to four. Weight
and height were obtained from a physical examination performed by a respiratory
physician participating in the study. Co-morbidities were defined according to doctor
diagnosis after physical examination and review of medical charts. Cardiovascular
disease was defined for this study as any one of the following conditions: myocardial
infarction, coronary artery disease, congestive heart failure, stroke, or peripheral
vascular disease.
STATISTICAL ANALYSIS
Cluster Analysis: The clustering features were: FEV1 percent of predicted (based
on local equations), FVC percent of predicted, FEV1/FVC ratio, body mass index
(BMI), modified Medical Research Council (MMRC) dyspnea score (zero to four), and
self-reported asthma and cardiovascular disease diagnosis (defined as at least one
of the following: ischaemic heart disease, stroke, congestive heart failure or
peripheral vascular disease). The clustering process consisted of two distinct
stages: 1) variable prioritization and generation of a subject similarity matrix and 2)
cluster identification. Unsupervised random forests was used for feature (i.e.
variable) prioritization and generation of a subject similarity matrix. Briefly, this
method can be applied to mixed (i.e. continuous and categorical) data types, and
provides a data-driven approach to feature weighting and selection[20]. This
approach leverages a supervised learning method (i.e. random forests) to
discriminate between the actual data and permuted data drawn from the same data
distributions (in which correlations between features have been broken via a
permutation procedure). In this formulation, the actual and permuted data are
labeled and combined, and the random forests procedure is used to distinguish the
“actual” observations from the permuted ones. This leads to a natural weighting of
variables based on those that are most useful for discrimination between real and
permuted data, because these variables are selected more frequently in the random
forests procedure. From the resulting trees, a similarity matrix is constructed
based on the number of times pairs of observations co-occur within the same
terminal node.
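As a rough illustration of the two ingredients described in this paragraph (the permuted comparison data and the terminal-node similarity), the following Python sketch uses only the standard library. It is not the authors' code: the analyses were run in R, the function names are hypothetical, the terminal-node assignments are assumed to come from whichever random forest implementation is used, and dividing the co-occurrence count by the number of trees is a normalization chosen here for convenience.

```python
import random

def permute_columns(rows, rng):
    """Shuffle each feature column independently: marginal distributions
    are preserved, correlations between features are broken."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]

def label_and_combine(rows, rng):
    """Stack actual (label 1) and permuted (label 0) observations."""
    permuted = permute_columns(rows, rng)
    data = [list(row) for row in rows] + permuted
    labels = [1] * len(rows) + [0] * len(permuted)
    return data, labels

def similarity_matrix(leaf_ids):
    """leaf_ids[t][i] is the terminal node of subject i in tree t
    (hypothetical input). Entry (i, j) of the result is the fraction of
    trees in which subjects i and j share a terminal node."""
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for tree in leaf_ids:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

A classifier trained on the output of `label_and_combine` would supply the terminal-node assignments passed to `similarity_matrix`; only the rows belonging to actual subjects would normally be kept for clustering.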
To quantify the importance of specific variables, an additional permutation
procedure is performed within the subset of “actual” observations. Each feature is
individually permuted, and the impact of this permutation on overall prediction
performance (i.e. ability to successfully discriminate between actual and permuted
observations) is quantified. Thus, an importance of 0.05 for a given feature indicates
that permutation of this feature alone resulted in a 5% decrease in prediction
accuracy.
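The importance score can be sketched generically as a drop in accuracy after shuffling one feature. The Python below is a minimal illustration under stated assumptions: `predict` stands in for any fitted classifier (hypothetical), rows are lists of feature values, and the function names are not taken from the original analysis, which was performed in R.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier returns the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, feature, rng):
    """Decrease in accuracy after shuffling one feature across rows.
    A value of 0.05 corresponds to a 5% drop in prediction accuracy."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    permuted = []
    for row, value in zip(rows, shuffled):
        new_row = list(row)
        new_row[feature] = value  # only this feature is disturbed
        permuted.append(new_row)
    return baseline - accuracy(predict, permuted, labels)
```

A feature the classifier never uses yields an importance of exactly zero, which is consistent with the near-zero scores for asthma and CVD in Table E1.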
We used two approaches (k-medoids and hierarchical clustering) to generate
clusters from the subject similarity matrix. Both are standard methods frequently
applied in clustering problems. For k-medoids, the range of number of clusters, k,
was evaluated between two and ten, resulting in nine distinct k-medoids cluster
solutions in each cohort. For hierarchical clustering, we used the dynamic tree cut
algorithm to identify the number of clusters, as described in detail by Langfelder
and Horvath[21]. The dynamic tree cut algorithm uses specific similarity criteria to
identify subjects that are not sufficiently similar to other members in their assigned
cluster. These subjects are then reassigned to a miscellaneous group of “poorly
clustered” subjects. In subsequent reproducibility analyses, these results were
analysed with and without these “poorly clustered” individuals.
In our study, the dynamic tree cut algorithm resulted in 14 distinct cluster
solutions per cohort[21]. In subsequent analysis of these hierarchical solutions,
cross-cohort concordance was assessed both including and excluding unclustered
subjects. In total, 23 clustering solutions (nine from k-medoids and 14 from
hierarchical) were generated in each of the ten cohorts, giving a total of 230
solutions. All analyses were performed in R (v3.1.0).
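To make the k-medoids step concrete, here is a minimal Python sketch that clusters subjects from a dissimilarity matrix. It is a simplified alternating (assign, then update) heuristic rather than the swap-based partitioning around medoids procedure, and it is not the authors' R code. The conversion of similarity to dissimilarity as 1 minus similarity, the deterministic initialization, and all function names are assumptions made for illustration.

```python
def to_dissimilarity(similarity):
    """Assumed conversion: dissimilarity = 1 - similarity."""
    return [[1.0 - s for s in row] for row in similarity]

def init_medoids(dist, k):
    """Start from the most central subject, then repeatedly add the
    subject farthest from the medoids chosen so far."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = (i for i in range(n) if i not in medoids)
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))
    return medoids

def k_medoids(dist, k, max_iter=100):
    """Return (labels, medoids) for k clusters; requires k <= len(dist)."""
    n = len(dist)
    medoids = init_medoids(dist, k)

    def assign(current):
        # Each subject joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep medoid of an empty cluster
                continue
            # New medoid: member with the smallest total distance to the rest.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids
```

Running `k_medoids(dist, k)` for each `k` in `range(2, 11)` gives the nine k-medoids solutions per cohort mentioned above; together with the 14 hierarchical solutions this makes 23 per cohort and 23 × 10 = 230 overall.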
Assessment of reproducibility of clustering methods: In this section, for each
cluster solution we distinguish between the dataset in which the clustering model
was identified, i.e. the source dataset, and the nine other replication datasets to
which the clustering model was transferred to “re-generate” cluster assignments.
For each cluster solution within each given dataset, referred to as the source
solution, a predictive model using the input features was trained using supervised
random forests (as implemented in the randomForest R package) in the source
dataset[22]. This model was then applied in each of the nine replication datasets to
assign subjects to the clusters identified in the source solution, thereby
“transferring” a cluster solution from the source to the replication dataset; such re-
generated solutions are referred to as the transferred solutions (see Figure 1). In
order to assess reproducibility of cluster solutions, we calculated the normalized
mutual information (NMI)[23] between the 23 source solutions and 207 (230 minus
23) transferred solutions. The NMI is a measure of concordance ranging from zero
to one, where higher values can be interpreted as better replication (i.e. greater
agreement between the source and transferred solutions). Then, for each
participating study, the source solution with the highest average NMI across the
nine other studies was identified, resulting in a single “best NMI” (i.e., most
reproducible) solution for each dataset (i.e. ten best NMI solutions).
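For readers unfamiliar with NMI, the sketch below computes it for two cluster assignments of the same subjects and then picks the solution with the highest average NMI. Several normalizations of mutual information exist; dividing by the geometric mean of the two entropies is an assumption here and may differ from the variant used in the cited reference. The function names and the convention for single-cluster solutions are likewise illustrative.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a cluster assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two assignments of the same subjects."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())

def nmi(a, b):
    """Normalized mutual information, ranging from zero to one."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # convention: two single-cluster solutions agree fully
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)

def most_reproducible(nmi_by_solution):
    """Map of solution name -> NMI values across replication datasets;
    return the name with the highest mean NMI."""
    return max(nmi_by_solution,
               key=lambda s: sum(nmi_by_solution[s]) / len(nmi_by_solution[s]))
```

NMI is invariant to relabeling, so two solutions that split subjects identically but number the clusters differently still score 1.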
Identification of Severe Airflow Limitation and Moderate Airflow Limitation
Clusters: A recent meta-analysis by Pinto et al. qualitatively determined that two
subtypes appeared to be common across multiple COPD clustering analyses, based
on comparison of cluster characteristics across studies[24]. The first subtype was
described as “younger with severe respiratory disease, having a low probability of
cardiovascular co-morbidities, high prevalence of poor nutritional status and poor
health status with poor longitudinal health outcomes” and the second subtype was
defined as having “moderate respiratory disease, and a high prevalence of obesity,
and increased prevalence of cardiovascular and metabolic co-morbidities and
inflammatory markers.” Adapting these definitions to the cluster variables used in
our analysis, we examined the best-NMI solution from each cohort to determine
whether any clusters met the following criteria: 1) low FEV1 (<45% predicted), low
BMI (<27), and high MMRC score (>1) or 2) moderately reduced FEV1
(45% < FEV1 < 80% predicted), high BMI (>29), and high MMRC score (>1).
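The two sets of thresholds can be written as a small rule. The Python sketch below applies them to cluster-level mean values; the function name and return labels are hypothetical, and applying the cut-offs to cluster means is an assumption about how the criteria were evaluated.

```python
def classify_cluster(fev1_pct, bmi, mmrc):
    """Apply the two threshold sets to one cluster's mean values.

    fev1_pct: FEV1 as % of predicted; bmi: kg/m2; mmrc: dyspnea score (0-4).
    Returns a label, or None when neither set of criteria is met.
    """
    if fev1_pct < 45 and bmi < 27 and mmrc > 1:
        return "severe airflow limitation"
    if 45 < fev1_pct < 80 and bmi > 29 and mmrc > 1:
        return "moderate airflow limitation"
    return None
```

With the means reported in Table E3, `classify_cluster(27, 26, 3.2)` (COPDGene Cluster 3) returns "severe airflow limitation", and with those in Table E11, `classify_cluster(64, 32, 1.4)` (PAC-COPD Cluster 4) returns "moderate airflow limitation". All inequalities are strict, so a cluster with a mean BMI of exactly 29 does not satisfy the second set.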
Clustering of More Comprehensive Feature Set in the COPDGene-ECLIPSE
Substudy: Because the set of COPD-related features common to all 10 datasets was
limited, we performed a separate analysis in the COPDGene and ECLIPSE studies
using the same clustering methods but with a more extensive set of features. In
addition to the seven features used in the primary analysis, the additional clustering
features were: LAA950 (the percentage of voxels on the inspiratory chest CT scan
with a density below −950 Hounsfield units), Pi10 (estimated airway wall thickness
for a 10mm internal perimeter airway), the Saint George’s Respiratory
Questionnaire total score, chronic bronchitis (presence of cough and phlegm for ≥ 3
months a year for at least 2 consecutive years), and respiratory exacerbations (self-
reported change in respiratory symptoms requiring either antibiotics or oral
steroids). The clustering methods and assessment of replication were the same as in
the primary study.
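As a small worked example of one of the added features, LAA950 can be computed from a list of lung voxel densities. This Python sketch assumes the conventional threshold of −950 Hounsfield units and that the voxels have already been restricted to segmented lung; the function name is hypothetical and the original quantitative CT software is not reproduced here.

```python
def laa950(voxels_hu, threshold=-950):
    """Percentage of lung voxels with density below the threshold (HU)."""
    if not voxels_hu:
        raise ValueError("at least one voxel is required")
    below = sum(1 for hu in voxels_hu if hu < threshold)
    return 100.0 * below / len(voxels_hu)
```

For instance, `laa950([-1000, -960, -900, -800])` returns 50.0 because two of the four voxels fall below −950 HU.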
Table E1. Feature Importance Scores* from Unsupervised Random Forests
CLIPCOPD COPDGene ECLIPSE ICECOLDERIC LEUVEN LifeLines Lovelace Lung Health Study NJH PAC-COPD
FEV1 (% predicted) 0.13 0.27 0.18 0.12 0.15 0.22 0.16 0.22 0.04 0.12
FEV1/FVC (%) 0.07 0.2 0.13 0.09 0.12 0.09 0.08 0.15 <0.01 0.08
FVC (% predicted) 0.09 0.2 0.12 0.08 0.09 0.19 0.13 0.21 0.03 0.07
BMI (kg/m2) <0.01 0.01 0.01 <0.01 0.01 0.01 <0.01 <0.01 -0.01 0.02
MMRC (0-4) 0.01 0.03 0.01 0.01 0.01 0.01 0.01 <0.01 <0.01 0.01
Asthma <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01
CVD <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01 <0.01
* Feature Importance Scores are mean decrease in prediction accuracy when each feature is permuted, as estimated by the unsupervised random
forests procedure.
Table E2. Characteristics of The Most Replicable Solution for CLIPCOPD study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4
N 113 144 58 52
FEV1 (% predicted) 65 (13) 42 (12) 96 (16) 86 (9)
FEV1/FVC (%) 51 (10) 48 (11) 67 (2) 59 (3)
FVC (% predicted) 103 (19) 71 (16) 114 (17) 114 (13)
BMI (kg/m2) 27 (6) 25 (4) 26 (4) 28 (4)
MMRC (0-4) 2.1 (0.9) 2.3 (1.0) 1.4 (0.9) 2.0 (0.9)
Asthma, % 2 1 0 0
CVD, % 49 36 57 46
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E3. Characteristics of The Most Replicable Solution for COPDGene study.
Cluster 1 Cluster 2 Cluster 3
N 2660 931 880
FEV1 (% predicted) 58 (15) 86 (10) 27 (7)
FEV1/FVC (%) 54 (11) 64 (4) 35 (8)
FVC (% predicted) 83 (17) 102 (12) 59 (12)
BMI (kg/m2) 29 (6) 27 (5) 26 (6)
MMRC (0-4) 2.1 (1.4) 0.2 (0.4) 3.2 (0.7)
Asthma, % 24 13 27
CVD, % 22 13 23
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E4. Characteristics of The Most Replicable Solution for ECLIPSE study.
Cluster 1 Cluster 2 Cluster 3
N 1654 250 190
FEV1 (% predicted) 44 (13) 25 (5) 67 (8)
FEV1/FVC (%) 46 (11) 30 (4) 55 (6)
FVC (% predicted) 80 (20) 64 (12) 100 (10)
BMI (kg/m2) 27 (6) 24 (4) 25 (4)
MMRC (0-4) 1.7 (1.0) 2.5 (0.8) 0.5 (0.5)
Asthma, % 22 26 22
CVD, % 23 20 16
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E5. Characteristics of The Most Replicable Solution for ICECOLDERIC study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10
N 67 90 51 36 32 31 30 24 21 21
FEV1 (% predicted) 58 (15) 71 (6) 28 (7) 37 (8) 67 (6) 50 (4) 64 (6) 50 (5) 61 (5) 55 (6)
FEV1/FVC (%) 56 (10) 63 (4) 38 (10) 38 (8) 48 (5) 40 (5) 54 (5) 51 (4) 55 (4) 64 (3)
FVC (% predicted) 87 (23) 92 (11) 63 (15) 78 (12) 115 (10) 102 (12) 98 (6) 80 (8) 92 (10) 67 (6)
BMI (kg/m2) 28 (6) 29 (6) 23 (3) 23 (4) 27 (4) 25 (3) 22 (2) 26 (5) 28 (2) 24 (3)
MMRC (0-4) 1.9 (1.6) 1.3 (1.3) 2.5 (1.3) 2.5 (1.3) 1.8 (1.4) 1.9 (1.5) 2.1 (1.7) 1.6 (1.5) 2.2 (1.3) 1.3 (1.1)
Asthma, % 4 1 2 3 6 6 3 0 14 5
CVD, % 22 21 14 17 22 16 17 25 24 24
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E6. Characteristics of The Most Replicable Solution for LEUVEN study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8
N 80 132 95 60 56 46 42 37
FEV1 (% predicted) 43 (10) 75 (12) 30 (6) 51 (7) 56 (5) 51 (3) 26 (5) 41 (3)
FEV1/FVC (%) 40 (11) 57 (8) 38 (7) 58 (7) 47 (3) 39 (3) 29 (4) 36 (3)
FVC (% predicted) 86 (16) 106 (15) 63 (10) 69 (8) 94 (9) 103 (10) 73 (15) 89 (5)
BMI (kg/m2) 24 (5) 26 (4) 24 (5) 30 (6) 23 (3) 26 (5) 21 (3) 22 (2)
MMRC (0-4) 2.1 (1.1) 1.4 (1.1) 2.4 (1.1) 2.1 (0.9) 1.3 (0.9) 1.9 (0.9) 2.3 (1.4) 2.1 (1.2)
Asthma, % 0 0 0 0 0 0 0 0
CVD, % 35 43 37 45 41 39 19 35
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E7. Characteristics of The Most Replicable Solution for LifeLines study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4
N 3748 791 380 279
FEV1 (% predicted) 88 (11) 110 (8) 98 (3) 59 (10)
FEV1/FVC (%) 65 (5) 67 (2) 67 (2) 54 (9)
FVC (% predicted) 113 (13) 136 (9) 120 (3) 89 (11)
BMI (kg/m2) 26 (4) 25 (2) 24 (2) 27 (4)
MMRC (0-4) 0.4 (0.7) 0 (0) 0 (0) 0.9 (1.2)
Asthma, % 15 0 0 35
CVD, % 55 1 0 10
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E8. Characteristics of The Most Replicable Solution for Lovelace study.
Cluster 1 Cluster 2
N 350 189
FEV1 (% predicted) 63 (15) 91 (9)
FEV1/FVC (%) 57 (10) 65 (3)
FVC (% predicted) 85 (15) 107 (11)
BMI (kg/m2) 27 (7) 26 (4)
MMRC (0-4) 1.6 (1.3) 0.9 (0.9)
Asthma, % 31 22
CVD, % 29 18
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E9. Characteristics of The Most Replicable Solution for Lung Health Study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5
N 2435 200 168 165 164
FEV1 (% predicted) 76 (7) 60 (4) 88 (2) 91 (3) 84 (1)
FEV1/FVC (%) 63 (5) 57 (4) 69 (1) 67 (2) 67 (1)
FVC (% predicted) 95 (11) 84 (5) 101 (2) 108 (3) 99 (2)
BMI (kg/m2) 25 (4) 26 (5) 26 (3) 25 (4) 25 (3)
MMRC (0-4) 0.5 (0.7) 0.7 (0.9) 0.4 (0.6) 0.3 (0.6) 0.3 (0.6)
Asthma, % 8 6 4 6 9
CVD, % 2 1 2 2 1
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E10. Characteristics of The Most Replicable Solution for NJH study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12 Cluster 13
N 7 7 6 6 5 5 4 4 4 3 3 3 3
FEV1 (% predicted) 42 (5) 39 (7) 54 (6) 23 (3) 29 (4) 24 (3) 66 (2) 63 (6) 34 (2) 18 (2) 37 (2) 24 (4) 28 (6)
FEV1/FVC (%) 49 (6) 53 (6) 65 (7) 40 (2) 60 (7) 51 (8) 62 (2) 64 (4) 64 (3) 45 (4) 53 (3) 60 (4) 36 (2)
FVC (% predicted) 80 (8) 74 (9) 75 (8) 52 (4) 46 (2) 42 (3) 102 (4) 88 (12) 54 (3) 39 (12) 62 (3) 48 (2) 72 (13)
BMI (kg/m2) 26 (2) 18 (4) 32 (3) 21 (4) 42 (9) 21 (3) 25 (1) 40 (8) 23 (5) 30 (1) 28 (1) 28 (2) 31 (9)
MMRC (0-4) 2.7 (0.5) 3.3 (0.5) 1.7 (1.4) 2.8 (0.8) 3.4 (0.6) 3.4 (0.6) 1.5 (1.3) 3.3 (0.5) 3.0 (0.8) 3.7 (0.6) 3.0 (0) 3.0 (0) 2.7 (0.6)
Asthma, % 0 0 33 0 0 0 25 0 25 0 0 0 0
CVD, % 29 29 0 50 0 20 50 25 25 0 67 0 0
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E11. Characteristics of The Most Replicable Solution for PAC-COPD study.
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6
N 136 58 47 45 25 23
FEV1 (% predicted) 51 (8) 32 (8) 42 (5) 64 (4) 78 (6) 81 (10)
FEV1/FVC (%) 56 (8) 41 (8) 39 (7) 66 (4) 65 (5) 62 (5)
FVC (% predicted) 70 (17) 58 (9) 80 (8) 72 (5) 89 (6) 96 (13)
BMI (kg/m2) 29 (5) 25 (4) 26 (4) 32 (3) 32 (3) 25 (3)
MMRC (0-4) 1.7 (1.3) 1.9 (1.3) 2.2 (1.2) 1.4 (0.8) 1.1 (1.0) 1.2 (1.2)
Asthma, % 65 79 77 64 36 70
CVD, % 33 7 17 31 28 22
Values are mean (SE) or median (IQR) unless otherwise indicated.
Table E12. Characteristics of the Most Reproducible COPDGene Clusters from the Clustering Substudy Limited to
Subjects in GOLD Spirometric Stage 2.
Cluster 1 Cluster 2 Cluster 3 Cluster 4
N 1151 75 59 52
Age (years) 63 (9) 61 (9) 67 (8) 61 (8)
Pack Years 51 (28) 45 (21) 56 (28) 55 (27)
Sex: female, % 47 39 44 54
Ethnicity: African-American, % 27 20 13 22
FEV1 (% Predicted) 65 (8) 74 (4) 56 (4) 59 (6)
FEV1/FVC (%) 59 (7) 64 (4) 45 (6) 60 (6)
FVC (% predicted) 85 (13) 90 (7) 94 (14) 75 (6)
BMI (kg/m2) 29 (6) 27 (5) 26 (4) 33 (7)
Pi10 3.70 (0.14) 3.64 (0.11) 3.68 (0.11) 3.75 (0.15)
LAA950 6.0 (6.4) 4.0 (4.7) 19.7 (8.4) 4.7 (4.6)
SGRQ 33.8 (19.8) 10.7 (8.6) 40.4 (17.6) 60.7 (13.6)
MMRC (0-4) 1.7 (1.3) 0.20 (0.40) 2.1 (1.2) 3.2 (0.7)
Exacerbations 0.54 (1.02) 0 (0.06) 0.64 (0.96) 1.72 (1.53)
Chronic Bronchitis, % 24 1 22 63
CVD, % 20 14 17 35
Asthma, % 27 12 21 38
Cluster 1 - cluster for unassigned subjects as determined by the clustering algorithm.
Pi10 - estimated wall area of a 10mm internal perimeter airway.
LAA950 - quantitative emphysema from chest computed tomography defined as percentage of lung voxels with density less than −950 Hounsfield units.
SGRQ - total score from Saint George's Respiratory Questionnaire.
MMRC - score on Modified Medical Research Council dyspnea questionnaire.
Exacerbations - number of reported respiratory exacerbations over the previous 12 months.
CVD - self reported history of coronary artery disease, myocardial infarction, congestive heart failure, peripheral vascular disease, or stroke.
Asthma - self-reported history of asthma.
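The LAA950 definition in the footnotes above amounts to a simple proportion. A minimal sketch, assuming the lung voxels have already been segmented and are available as a flat list of Hounsfield-unit values; the function name `laa_percent` and the toy values are hypothetical and are not taken from any cohort or from the study's imaging software:

```python
def laa_percent(hu_values, threshold=-950):
    """Percentage of lung voxels whose attenuation is below `threshold` HU."""
    voxels = list(hu_values)
    if not voxels:
        raise ValueError("no lung voxels supplied")
    # Count voxels strictly below the threshold (low-attenuation area).
    below = sum(1 for hu in voxels if hu < threshold)
    return 100.0 * below / len(voxels)


# Hypothetical toy values for illustration only (not cohort data).
toy_voxels = [-980, -960, -940, -900, -870, -955, -800, -700, -990, -920]
print(laa_percent(toy_voxels))
```

In practice the input would come from a segmented CT volume rather than a hand-written list, and the segmentation step is outside the scope of this sketch.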
Table E13. Correlation Matrix for COPDGene Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=4053)
          FEV1      FEV1/FVC  FVC       BMI       Pi10      LAA950    SGRQ
FEV1      1         0.82      0.82      0.06      -0.36     -0.56     -0.54
FEV1/FVC  0.82      1         0.38      0.21      -0.21     -0.71     -0.44
FVC       0.82      0.38      1         -0.08     -0.36     -0.26     -0.43
BMI       0.06      0.21      -0.08     1         0.11      -0.31     0.08
Pi10      -0.36     -0.21     -0.36     0.11      1         -0.04     0.31
LAA950    -0.56     -0.71     -0.26     -0.31     -0.04     1         0.29
SGRQ      -0.54     -0.44     -0.43     0.08      0.31      0.29      1
Table E14. Correlation Matrix for ECLIPSE Substudy Continuous Clustering Variables (GOLD 2-4 subjects, N=1611)
          FEV1      FEV1/FVC  FVC       BMI       Pi10      LAA950    SGRQ
FEV1      1         0.71      0.65      0.13      -0.11     -0.47     -0.37
FEV1/FVC  0.71      1         0.04      0.26      0.06      -0.61     -0.18
FVC       0.65      0.04      1         -0.07     -0.25     -0.05     -0.33
BMI       0.13      0.26      -0.07     1         0.29      -0.25     0.05
Pi10      -0.11     0.06      -0.25     0.29      1         -0.17     0.15
LAA950    -0.47     -0.61     -0.05     -0.25     -0.17     1         0.19
SGRQ      -0.37     -0.18     -0.33     0.05      0.15      0.19      1
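For readers who want to compare the correlation structure of the two substudy cohorts, the sketch below transcribes Tables E13 and E14, checks that each transcription is a well-formed correlation matrix (symmetric, unit diagonal), and reports the variable pair whose correlation differs most between COPDGene and ECLIPSE. It is an illustrative reading aid rather than part of the study's analysis; the helper names are hypothetical.

```python
from itertools import combinations

VARS = ["FEV1", "FEV1/FVC", "FVC", "BMI", "Pi10", "LAA950", "SGRQ"]

# Values transcribed from Table E13 (COPDGene, GOLD 2-4, N=4053).
COPDGENE = [
    [1, 0.82, 0.82, 0.06, -0.36, -0.56, -0.54],
    [0.82, 1, 0.38, 0.21, -0.21, -0.71, -0.44],
    [0.82, 0.38, 1, -0.08, -0.36, -0.26, -0.43],
    [0.06, 0.21, -0.08, 1, 0.11, -0.31, 0.08],
    [-0.36, -0.21, -0.36, 0.11, 1, -0.04, 0.31],
    [-0.56, -0.71, -0.26, -0.31, -0.04, 1, 0.29],
    [-0.54, -0.44, -0.43, 0.08, 0.31, 0.29, 1],
]

# Values transcribed from Table E14 (ECLIPSE, GOLD 2-4, N=1611).
ECLIPSE = [
    [1, 0.71, 0.65, 0.13, -0.11, -0.47, -0.37],
    [0.71, 1, 0.04, 0.26, 0.06, -0.61, -0.18],
    [0.65, 0.04, 1, -0.07, -0.25, -0.05, -0.33],
    [0.13, 0.26, -0.07, 1, 0.29, -0.25, 0.05],
    [-0.11, 0.06, -0.25, 0.29, 1, -0.17, 0.15],
    [-0.47, -0.61, -0.05, -0.25, -0.17, 1, 0.19],
    [-0.37, -0.18, -0.33, 0.05, 0.15, 0.19, 1],
]


def is_valid_correlation_matrix(m, tol=1e-9):
    """True if m is square, symmetric, and has ones on the diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    if any(abs(m[i][i] - 1) > tol for i in range(n)):
        return False
    return all(abs(m[i][j] - m[j][i]) <= tol
               for i, j in combinations(range(n), 2))


def largest_difference(a, b, names):
    """Variable pair with the largest absolute difference in correlation."""
    i, j = max(combinations(range(len(names)), 2),
               key=lambda p: abs(a[p[0]][p[1]] - b[p[0]][p[1]]))
    return names[i], names[j], abs(a[i][j] - b[i][j])


assert is_valid_correlation_matrix(COPDGENE)
assert is_valid_correlation_matrix(ECLIPSE)
first, second, diff = largest_difference(COPDGENE, ECLIPSE, VARS)
print(f"{first} vs {second}: |difference| = {diff:.2f}")
```

With the values as transcribed, the largest gap is between FEV1/FVC and FVC (0.38 in COPDGene versus 0.04 in ECLIPSE).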
Figure E1. PCA Scree Plots in Participating Cohorts.
Figure E2. Multi-dimensional Scaling (MDS) Visualization of Similarity Matrix for All Cohorts (GOLD 1 = blue, GOLD 2 = orange, GOLD 3 = green, GOLD 4 = red).
Figure E3. Reproducibility of Different Clustering Methods in the COPDGene-ECLIPSE Substudy. Distribution of normalized mutual information (NMI) is shown for clustering with partitioning around medoids (PAM), hierarchical clustering including unclassified subjects (HC_U), and hierarchical clustering excluding unclassified subjects (HC). Results are shown for clustering in spirometric GOLD 2-4 subjects (Panel A) and GOLD 2 only (Panel B).
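Normalized mutual information, the reproducibility measure shown in Figure E3, compares two assignments of the same subjects to clusters. The sketch below is a minimal standard-library illustration that assumes the geometric-mean normalization, NMI = I(A;B) / sqrt(H(A) H(B)), described by Strehl and Ghosh; the function names are hypothetical, and the study's own implementation and handling of unclassified subjects may differ.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same subjects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nij / n) * log(nij * n / (count_a[i] * count_b[j]))
               for (i, j), nij in joint.items())


def nmi(a, b):
    """NMI with geometric-mean normalization (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same subjects")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        # A single-cluster labeling carries no information; return 0 by convention.
        return 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


# Toy labelings for illustration only (not study data).
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent partitions
```

NMI is invariant to how clusters are numbered, which is why it suits comparisons between clusterings produced in different cohorts or runs.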
REFERENCES
1 Pistolesi M, Camiciottoli G, Paoletti M, et al. Identification of a predominant COPD phenotype in clinical practice. Respiratory Medicine 2008;102:367–76. doi:10.1016/j.rmed.2007.10.019
2 Paoletti M, Camiciottoli G, Meoli E, et al. Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes. Journal of Biomedical Informatics 2009;42:1013–21. doi:10.1016/j.jbi.2009.05.008
3 Camiciottoli G, Bigazzi F, Paoletti M, et al. Pulmonary function and sputum characteristics predict computed tomography phenotype and severity of COPD. European Respiratory Journal 2013;42:626–35. doi:10.1183/09031936.00133112
4 Regan EA, Silverman E, Hokanson JE, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 2010;7:32–43. doi:10.3109/15412550903499522
5 Miller MR, Hankinson J, Brusasco V, et al. Standardisation of spirometry. Eur. Respir. J. 2005;26:319–38. doi:10.1183/09031936.05.00034805
6 Vestbo J, Anderson W, Coxson HO, et al. Evaluation of COPD Longitudinally to Identify Predictive Surrogate End-points (ECLIPSE). European Respiratory Journal 2008;31:869–73. doi:10.1183/09031936.00111707
7 Comstock GW, Tockman MS, Helsing KJ, et al. Standardized respiratory questionnaires: comparison of the old with the new. Am Rev Respir Dis 1979;119:45–53. doi:10.1164/arrd.1979.119.1.45
8 Siebeling L, Puhan MA, Muggensturm P, et al. Characteristics of Dutch and Swiss primary care COPD patients - baseline data of the ICE COLD ERIC study. Clin Epidemiol 2011;3:273–83. doi:10.2147/CLEP.S24818
9 Siebeling L, ter Riet G, van der Wal WM, et al. ICE COLD ERIC - International collaborative effort on chronic obstructive lung disease: exacerbation risk index cohorts - study protocol for an international COPD cohort study. BMC Pulm Med 2009;9:15. doi:10.1186/1471-2466-9-15
10 Wauters E, Smeets D, Coolen J, et al. The TERT-CLPTM1L locus for lung cancer predisposes to bronchial obstruction and emphysema. European Respiratory Journal 2011;38:924–31. doi:10.1183/09031936.00187110
11 Scholtens S, Smidt N, Swertz MA, et al. Cohort Profile: LifeLines, a three-generation cohort study and biobank. International Journal of Epidemiology 2015;44:1172–80. doi:10.1093/ije/dyu229
12 Stolk RP, Rosmalen JGM, Postma DS, et al. Universal risk factors for multifactorial diseases. Eur J Epidemiol 2008;23:67–74. doi:10.1007/s10654-007-9204-4
13 Bruse S, Sood A, Petersen H, et al. New Mexican Hispanic smokers have lower odds of chronic obstructive pulmonary disease and less decline in lung function than non-Hispanic whites. American Journal of Respiratory and Critical Care Medicine 2011;184:1254–60. doi:10.1164/rccm.201103-0568OC
14 Hunninghake GM, Cho M, Tesfaigzi Y, et al. MMP12, lung function, and COPD in high-risk populations. N Engl J Med 2009;361:2599–608. doi:10.1056/NEJMoa0904006
15 Pellegrino R, Decramer M, van Schayck CPO, et al. Quality control of spirometry: a lesson from the BRONCUS trial. Eur Respir J 2005;26:1104–9. doi:10.1183/09031936.05.00026705
16 Connett JE, Kusek JW, Bailey WC, et al. Design of the Lung Health Study: a randomized clinical trial of early intervention for chronic obstructive pulmonary disease. Control Clin Trials 1993;14:3S–19S.
17 Garcia-Aymerich J, Gómez FP, Benet M, et al. Identification and prospective validation of clinically relevant chronic obstructive pulmonary disease (COPD) subtypes. Thorax 2011;66:430–7. doi:10.1136/thx.2010.154484
18 Garcia-Aymerich J, Gómez FP, Antó JM, et al. [Phenotypic characterization and course of chronic obstructive pulmonary disease in the PAC-COPD Study: design and methods]. Arch Bronconeumol 2009;45:4–11. doi:10.1016/j.arbres.2008.03.001
19 Celli BR, MacNee W, ATS/ERS Task Force. Standards for the diagnosis and treatment of patients with COPD: a summary of the ATS/ERS position paper. Eur. Respir. J. 2004;23:932–46.
20 Shi T, Horvath S. Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics 2006;15:118–38. doi:10.1198/106186006X94072
21 Langfelder P, Zhang B, Horvath S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 2008;24:719–20. doi:10.1093/bioinformatics/btm563
22 Breiman L. Random Forests. Machine Learning 2001;45:5–32. doi:10.1023/A:1010933404324
23 Strehl A, Ghosh J. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. The Journal of Machine Learning Research 2003;3:583–617. doi:10.1162/153244303321897735
24 Pinto LM, Alghamdi M, Benedetti A, et al. Derivation and validation of clinical phenotypes for COPD: a systematic review. Respiratory Research 2015;16:50. doi:10.1186/s12931-015-0208-4
ACKNOWLEDGEMENTS
The authors acknowledge Dr. Steve Horvath and Dr. Peter Langfelder for helpful email correspondence.
ECLIPSE: ECLIPSE Investigators – Bulgaria: Y. Ivanov, Pleven; K. Kostov, Sofia. Canada: J. Bourbeau, Montreal; M. Fitzgerald, Vancouver, BC; P. Hernandez, Halifax, NS; K. Killian, Hamilton, ON; R. Levy, Vancouver, BC; F. Maltais, Montreal; D. O'Donnell, Kingston, ON. Czech Republic: J. Krepelka, Prague. Denmark: J. Vestbo, Hvidovre. The Netherlands: E. Wouters, Horn-Maastricht. New Zealand: D. Quinn, Wellington. Norway: P. Bakke, Bergen. Slovenia: M. Kosnik, Golnik. Spain: A. Agusti, J. Sauleda, Palma de Mallorca. Ukraine: Y. Feschenko, V. Gavrisyuk, L. Yashina, Kiev; N. Monogarova, Donetsk. United Kingdom: P. Calverley, Liverpool; D. Lomas, Cambridge; W. MacNee, Edinburgh; D. Singh, Manchester; J. Wedzicha, London. United States: A. Anzueto, San Antonio, TX; S. Braman, Providence, RI; R. Casaburi, Torrance, CA; B. Celli, Boston; G. Giessel, Richmond, VA; M. Gotfried, Phoenix, AZ; G. Greenwald, Rancho Mirage, CA; N. Hanania, Houston; D. Mahler, Lebanon, NH; B. Make, Denver; S. Rennard, Omaha, NE; C. Rochester, New Haven, CT; P. Scanlon, Rochester, MN; D. Schuller, Omaha, NE; F. Sciurba, Pittsburgh; A. Sharafkhaneh, Houston; T. Siler, St. Charles, MO; E. Silverman, Boston; A. Wanner, Miami; R. Wise, Baltimore; R. ZuWallack, Hartford, CT.
ECLIPSE Steering Committee: H. Coxson (Canada), C. Crim (GlaxoSmithKline, USA), L. Edwards (GlaxoSmithKline, USA), D. Lomas (UK), W. MacNee (UK), E. Silverman (USA), R. Tal-Singer (Co-chair, GlaxoSmithKline, USA), J. Vestbo (Co-chair, Denmark), J. Yates (GlaxoSmithKline, USA).
ECLIPSE Scientific Committee: A. Agusti (Spain), P. Calverley (UK), B. Celli (USA), C. Crim (GlaxoSmithKline, USA), B. Miller (GlaxoSmithKline, USA), W. MacNee (Chair, UK), S. Rennard (USA), R. Tal-Singer (GlaxoSmithKline, USA), E. Wouters (The Netherlands), J. Yates (GlaxoSmithKline, USA).
COPDGene: NIH Grant Support and Disclaimer: The project described was supported by Award Number R01HL089897 and Award Number R01HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.
COPD Foundation Funding: The COPDGene® project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, Siemens, Sunovion, and GlaxoSmithKline.
COPDGene® Investigators – Core Units: Administrative Core: James Crapo, MD (PI), Edwin Silverman, MD, PhD (PI), Barry Make, MD, Elizabeth Regan, MD, PhD. Genetic Analysis Core: Terri Beaty, PhD, Nan Laird, PhD, Christoph Lange, PhD, Michael Cho, MD, Stephanie Santorico, PhD, John Hokanson, MPH, PhD, Dawn DeMeo, MD, MPH, Nadia Hansel, MD, MPH, Craig Hersh, MD, MPH, Peter Castaldi, MD, MSc, Merry-Lynn McDonald, PhD, Emily Wan, MD, Megan Hardin, MD, Jacqueline Hetmanski, MS, Margaret Parker, MS, Marilyn Foreman, MD, Brian Hobbs, MD, Robert Busch, MD, Adel El-Bouiez, MD, Peter Castaldi, MD, Megan Hardin, MD, Dandi Qiao, PhD, Elizabeth Regan, MD, Eitan Halper-Stromberg, Ferdouse Begum, Sungho Won, Sharon Lutz, PhD. Imaging Core: David A Lynch, MB, Harvey O Coxson, PhD, MeiLan K Han, MD, MS, MD, Eric A Hoffman, PhD, Stephen Humphries, MS, Francine L Jacobson, MD, Philip F Judy, PhD, Ella A Kazerooni, MD, John D Newell, Jr., MD, Elizabeth Regan, MD, James C Ross, PhD, Raul San Jose Estepar, PhD, Berend C Stoel, PhD, Juerg Tschirren, PhD, Eva van Rikxoort, PhD, Bram van Ginneken, PhD, George Washko, MD, Carla G Wilson, MS, Mustafa Al Qaisi, MD, Teresa Gray, Alex Kluiber, Tanya Mann, Jered Sieren, Douglas Stinson, Joyce Schroeder, MD, Edwin Van Beek, MD, PhD. PFT QA Core, Salt Lake City, UT: Robert Jensen, PhD. Data Coordinating Center and Biostatistics, National Jewish Health, Denver, CO: Douglas Everett, PhD, Anna Faino, MS, Matt Strand, PhD, Carla Wilson, MS. Epidemiology Core, University of Colorado Anschutz Medical Campus, Aurora, CO: John E. Hokanson, MPH, PhD, Gregory Kinney, MPH, PhD, Sharon Lutz, PhD, Kendra Young, PhD, Katherine Pratte, MSPH, Lindsey Duca, MS.
COPDGene® Investigators – Clinical Centers: Ann Arbor VA: Jeffrey L. Curtis, MD, Carlos H. Martinez, MD, MPH, Perry G. Pernicano, MD. Baylor College of Medicine, Houston, TX: Nicola Hanania, MD, MS, Philip Alapat, MD, Venkata Bandi, MD, Mustafa Atik, MD, Aladin Boriek, PhD, Kalpatha Guntupalli, MD, Elizabeth Guy, MD, Amit Parulekar, MD, Arun Nachiappan, MD. Brigham and Women’s Hospital, Boston, MA: Dawn DeMeo, MD, MPH, Craig Hersh, MD, MPH, George Washko, MD, Francine Jacobson, MD, MPH. Columbia University, New York, NY: R. Graham Barr, MD, DrPH, Byron Thomashow, MD, John Austin, MD, Belinda D’Souza, MD, Gregory D.N. Pearson, MD, Anna Rozenshtein, MD, MPH, FACR. Duke University Medical Center, Durham, NC: Neil MacIntyre, Jr., MD, Lacey Washington, MD, H. Page McAdams, MD. Health Partners Research Foundation, Minneapolis, MN: Charlene McEvoy, MD, MPH, Joseph Tashjian, MD. Johns Hopkins University, Baltimore, MD: Robert Wise, MD, Nadia Hansel, MD, MPH, Robert Brown, MD, Karen Horton, MD, Nirupama Putcha, MD, MHS. Los Angeles Biomedical Research Institute at Harbor UCLA Medical Center, Torrance, CA: Richard Casaburi, PhD, MD, Alessandra Adami, PhD, Janos Porszasz, MD, PhD, Hans Fischer, MD, PhD, Matthew Budoff, MD, Harry Rossiter, PhD. Michael E. DeBakey VAMC, Houston, TX: Amir Sharafkhaneh, MD, PhD, Charlie Lan, DO. Minneapolis VA: Christine Wendt, MD, Brian Bell, MD. Morehouse School of Medicine, Atlanta, GA: Marilyn Foreman, MD, MS, Gloria Westney, MD, MS, Eugene Berkowitz, MD, PhD. National Jewish Health, Denver, CO: Russell Bowler, MD, PhD, David Lynch, MD. Reliant Medical Group, Worcester, MA: Richard Rosiello, MD, David Pace, MD. Temple University, Philadelphia, PA: Gerard Criner, MD, David Ciccolella, MD, Francis Cordova, MD, Chandra Dass, MD, Gilbert D’Alonzo, DO, Parag Desai, MD, Michael Jacobs, PharmD, Steven Kelsen, MD, PhD, Victor Kim, MD, A. James Mamary, MD, Nathaniel Marchetti, DO, Aditi Satti, MD, Kartik Shenoy, MD, Robert M. Steiner, MD, Alex Swift, MD, Irene Swift, MD, Maria Elena Vega-Sanchez, MD. University of Alabama, Birmingham, AL: Mark Dransfield, MD, William Bailey, MD, J. Michael Wells, MD, Surya Bhatt, MD, Hrudaya Nath, MD. University of California, San Diego, CA: Joe Ramsdell, MD, Paul Friedman, MD, Xavier Soler, MD, PhD, Andrew Yen, MD. University of Iowa, Iowa City, IA: Alejandro Cornellas, MD, John Newell, Jr., MD, Brad Thompson, MD. University of Michigan, Ann Arbor, MI: MeiLan Han, MD, Ella Kazerooni, MD, Carlos Martinez, MD. University of Minnesota, Minneapolis, MN: Joanne Billings, MD, Tadashi Allen, MD. University of Pittsburgh, Pittsburgh, PA: Frank Sciurba, MD, Divay Chandra, MD, MSc, Joel Weissfeld, MD, MPH, Carl Fuhrman, MD, Jessica Bon, MD. University of Texas Health Science Center at San Antonio, San Antonio, TX: Antonio Anzueto, MD, Sandra Adams, MD, Diego Maselli-Caceres, MD, Mario E. Ruiz, MD.
Lung Health Study: The principal investigators and senior staff of the clinical and coordinating centers, the NHLBI, and members of the Safety and Data Monitoring Board of the Lung Health Study are as follows: Case Western Reserve University, Cleveland, OH: M.D. Altose, M.D. (Principal Investigator), C.D. Deitz, Ph.D. (Project Coordinator); Henry Ford Hospital, Detroit, MI: M.S. Eichenhorn, M.D. (Principal Investigator), K.J. Braden, A.A.S. (Project Coordinator), R.L. Jentons, M.A.L.L.P. (Project Coordinator); Johns Hopkins University School of Medicine, Baltimore, MD: R.A. Wise, M.D. (Principal Investigator), C.S. Rand, Ph.D. (Co-Principal Investigator), K.A. Schiller (Project Coordinator); Mayo Clinic, Rochester, MN: P.D. Scanlon, M.D. (Principal Investigator), G.M. Caron (Project Coordinator), K.S. Mieras, L.C. Walters; Oregon Health Sciences University, Portland: A.S. Buist, M.D. (Principal Investigator), L.R. Johnson, Ph.D. (LHS Pulmonary Function Coordinator), V.J. Bortz (Project Coordinator); University of Alabama at Birmingham: W.C. Bailey, M.D. (Principal Investigator), L.B. Gerald, Ph.D., M.S.P.H. (Project Coordinator); University of California, Los Angeles: D.P. Tashkin, M.D. (Principal Investigator), I.P. Zuniga (Project Coordinator); University of Manitoba, Winnipeg: N.R. Anthonisen, M.D. (Principal Investigator, Steering Committee Chair), J. Manfreda, M.D. (Co-Principal Investigator), R.P. Murray, Ph.D. (Co-Principal Investigator), S.C. Rempel-Rossum (Project Coordinator); University of Minnesota Coordinating Center, Minneapolis: J.E. Connett, Ph.D. (Principal Investigator), P.L. Enright, M.D., P.G. Lindgren, M.S., P. O'Hara, Ph.D. (LHS Intervention Coordinator), M.A. Skeans, M.S., H.T. Voelker; University of Pittsburgh, Pittsburgh, PA: R.M. Rogers, M.D. (Principal Investigator), M.E. Pusateri (Project Coordinator); University of Utah, Salt Lake City: R.E. Kanner, M.D. (Principal Investigator), G.M. Villegas (Project Coordinator); Safety and Data Monitoring Board: M. Becklake, M.D., B. Burrows, M.D. (deceased), P. Cleary, Ph.D., P. Kimbel, M.D. (Chairperson; deceased), L. Nett, R.N., R.R.T. (former member), J.K. Ockene, Ph.D., R.M. Senior, M.D. (Chairperson), G.L. Snider, M.D., W. Spitzer, M.D. (former member), O.D. Williams, Ph.D.; Morbidity and Mortality Review Board: T.E. Cuddy, M.D., R.S. Fontana, M.D., R.E. Hyatt, M.D., C.T. Lambrew, M.D., B.A. Mason, M.D., D.M. Mintzer, M.D., R.B. Wray, M.D.; National Heart, Lung, and Blood Institute staff, Bethesda, MD: S.S. Hurd, Ph.D. (Former Director, Division of Lung Diseases), J.P. Kiley, Ph.D. (Former Project Officer and Director, Division of Lung Diseases), G. Weinmann, M.D. (Former Project Officer and Director, Airway Biology and Disease Program, DLD), M.C. Wu, Ph.D. (Division of Epidemiology and Clinical Applications).