FLOCK: A Density-Based Clustering Method for Automated Cell Population Identification in High-Dimensional Flow Cytometry Data and the Cell Ontology
Richard H. Scheuermann, Ph.D.
Department of Pathology and Division of Biomedical Informatics
U.T. Southwestern Medical Center, Dallas, TX
TRADITIONAL FLOW ANALYSIS
Flow Cytometry (FCM)
• a.k.a. Fluorescence Activated Cell Sorting (FACS™)
• Method:
– Stain cell population with fluorescent reagents that bind to specific molecules, e.g. fluorescein-conjugated anti-CD40 antibodies
– Measure fluorescence properties of each cell using flow cytometer
• Direct and indirect measurement of individual cell characteristics, e.g. cell size, membrane protein expression, secreted protein expression, cell cycle state, DNA ploidy, signal transduction activation
Uses of Flow Cytometry (FCM)
• Differences in cell populations between specimens
• Study of normal cell activation, differentiation and function
• Study of abnormal cell activation, differentiation and function
• Isolate cells from mixture based on their molecular characteristics
• Diagnostics - leukemia, lymphoma, myeloproliferative disorders
• Novel biomarkers
[Figure: flow cytometry plots comparing normal and leukemia specimens; x-axis: CD15 FITC (log scale, 10^0 to 10^4). Red - Myeloblasts; Green - Granulocytes; L. Blue - Monocytes]
FCM can measure many parameters simultaneously, e.g., the BD LSR-II can produce data for up to 19 parameters for every cell in a given sample
FCM instrumentation & reagents
Traditional Flow Cytometry Analysis
• Subjective
• Time-consuming
• Doesn’t handle overlapping distributions well
• Sensitive to slight differences in fluorescence intensity distributions between samples
• Requires at least one 2D plot that clearly segregates the populations in question
Goal - group together cells with similar characteristics
Traditional approach - manual gating, two dimensions at a time
Improved Approach
• Identifying cell populations automatically, objectively, and quickly in multi-dimensional flow cytometry data (eliminate manual gating)
• Quantitatively compare the identified populations across different samples and across different experiments
Characteristics of FCM Data
Data sets are:
• Large (and of various sizes)
– From hundreds to millions of events
• Multidimensional
– 19 parameter instrument already available
• Noise and outliers
– Dead cells and dirt
Populations are different in:
• shapes
– Elongated, ellipsoid, spherical, banana shapes…
• densities
– Some cell populations are relatively sparse even in 2D space
• compositions
– Events that pile up on an axis can change the data distribution
• positions
– Some are very close while others are far away
• sizes
– From several events to hundreds of thousands of events
FLOCK APPROACH
Grid-based Clustering Approach
• Divide n-dimensional space with hyper-grids
• Identify dense hyper-regions
• Merge neighboring dense hyper-regions to define k populations
• Determine centroids of each population
• Cluster data using k centroids to seed
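The five steps above can be sketched in a few lines of Python. This is a minimal illustration and not the FLOCK implementation: the function name `grid_cluster`, the equal-width binning, the fixed `density_threshold`, and the rule that cells touching on a face or corner count as neighbors are all assumptions made for the example. Enumerating every neighbor offset grows as 3^d, so the sketch is only practical in low dimensions.

```python
import numpy as np
from collections import deque
from itertools import product


def grid_cluster(X, bins=10, density_threshold=5):
    """Illustrative grid-based density clustering of an (events x dims) array."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # 1. Divide the space with a hyper-grid: equal-width bins per dimension.
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / bins, 1.0)
    cells = np.minimum(((X - lo) / width).astype(int), bins - 1)

    # 2. Identify dense hyper-regions: cells holding enough events.
    members = {}
    for i, cell in enumerate(map(tuple, cells)):
        members.setdefault(cell, []).append(i)
    dense = {c for c, idx in members.items() if len(idx) >= density_threshold}
    if not dense:
        raise ValueError("no dense hyper-regions; lower density_threshold")

    # 3. Merge neighboring dense hyper-regions (breadth-first flood fill).
    offsets = [o for o in product((-1, 0, 1), repeat=d) if any(o)]
    group_of = {}
    k = 0
    for start in sorted(dense):
        if start in group_of:
            continue
        group_of[start] = k
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for o in offsets:
                nb = tuple(a + b for a, b in zip(cur, o))
                if nb in dense and nb not in group_of:
                    group_of[nb] = k
                    queue.append(nb)
        k += 1

    # 4. Centroid of each group: mean of the events in its dense cells.
    centroids = np.zeros((k, d))
    sizes = np.zeros(k)
    for cell, g in group_of.items():
        idx = members[cell]
        centroids[g] += X[idx].sum(axis=0)
        sizes[g] += len(idx)
    centroids /= sizes[:, None]

    # 5. Use the k centroids as seeds: assign every event to the nearest one.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1), centroids
```

Note that events in sparse cells are not discarded here; they are simply assigned to the nearest centroid in step 5, so every event ends up with a population label.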
2D example
Divide with hyper-grids
Find dense hyper-regions
Merge neighboring dense hyper-regions
Clustering based on region centers
FLOCK v2.0 STEPS
1. File Conversion - Convert binary .fcs file into a data matrix
2. Data Cleansing - Remove boundary events (noise) in FSC and SSC dimensions
3. Data Shrinking - Collapse data toward distribution modes
4. Normalization - Z-score normalization for values in each dimension ((x_i - µ)/SD)
5. Dimension Selection - Select most informative dimensions based on measures of dispersion and distortion
6. FLOCK LoD
   i. Partition each dimension to generate a hyper-grid
   ii. Identify dense hyper-regions in hyper-grid
   iii. Merge neighboring dense hyper-regions to define hyper-region groups (n)
   iv. Determine centroids for each hyper-region group
   v. Use n centroids to seed single round of distance-based clustering
7. FLOCK HiD - Refine population definition based on histogram partitioning
8. Group Merging - Merge close hyper-region groups based on [distance metric]
9. Centroid Calculation - Compute centroid for each hyper-region group
10. Clustering - Cluster events to nearest centroid
11. Population statistics - Summarize population proportions, intensity levels, etc.
12. Visualization
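Steps 4, 10, and 11 are simple enough to sketch directly. The Python below is an illustration under stated assumptions, not FLOCK code: the function names are hypothetical, the SD is taken as the population standard deviation (the slide does not say which), distance is assumed Euclidean, and "intensity level" is summarized as a per-dimension mean.

```python
import numpy as np


def zscore(X):
    """Step 4: z-score each dimension, (x_i - mean) / SD."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)               # population SD (ddof=0), an assumption
    sd = np.where(sd > 0, sd, 1.0)   # constant column: avoid division by zero
    return (X - mu) / sd


def assign_to_centroids(X, centroids):
    """Step 10: label each event with the index of its nearest centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


def population_summary(X, labels):
    """Step 11: proportion (%) and mean intensity per dimension for each population."""
    X = np.asarray(X, dtype=float)
    summary = {}
    for k in np.unique(labels):
        events = X[labels == k]
        summary[int(k)] = {
            "proportion_pct": 100.0 * len(events) / len(X),
            "mean_intensity": events.mean(axis=0),
        }
    return summary
```

In a pipeline following the step list, `zscore` would run before clustering, while `population_summary` would typically be applied to the original (unnormalized) intensities so that reported levels stay in instrument units.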
Data
• Source: University of Rochester (Sanz)
• Normal human PBMC sample stained with:
– FITC‑IgD
– PE‑CD1c
– PE‑Alexa610‑CD24
– PE‑Cy5‑IgG
– PerCP‑Cy5.5‑CD3
– PE‑Cy7‑B220
– PacificBlue‑CD38
– PacificOrange‑Aqua dead cell staining
– APC‑CD27
– APC‑Cy7‑CD19
• 10 color; 12 parameter
• Gated on CD19+, CD3- (~67,000 events)
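A pre-gate such as CD19+, CD3- amounts to a boolean filter on two columns of the event matrix. The sketch below is only an illustration: the function name, the column indices, and the threshold values are hypothetical, since the slides do not state how the gate boundaries were chosen.

```python
import numpy as np


def gate_cd19pos_cd3neg(events, cd19_col, cd3_col, cd19_min, cd3_max):
    """Keep events with CD19 at or above cd19_min and CD3 below cd3_max.

    Column indices and thresholds are supplied by the caller; nothing here
    is taken from the presentation.
    """
    events = np.asarray(events, dtype=float)
    keep = (events[:, cd19_col] >= cd19_min) & (events[:, cd3_col] < cd3_max)
    return events[keep]
```

For example, with CD19 in column 0 and CD3 in column 1, `gate_cd19pos_cd3neg(events, 0, 1, cd19_min=1000.0, cd3_max=500.0)` would return only the rows passing both conditions; the retained rows are what downstream clustering would see.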
17 B Cell Populations in Blood
[Figure A: plots with axes CD27, IgD, B220, CD24, CD38, and IgG; labeled population groups: N1-3, UM1-2, UM3-4, PB, GSM, GNSM, DNM]
Naïve B cells (CD27low, IgD+, IgG-)
• N1 (B220+, CD38+)
• N2 (B220+, CD38-)
• N3 (B220low, CD38+)
[Figure: plots with axes B220, CD24, CD38, and IgG]
Population characteristics
| Population | Color | CD27 | IgD | IgG | CD38 | CD24 | B220 | Proportion | Putative cell type |
|---|---|---|---|---|---|---|---|---|---|
| N1 | Gray | - | + | - | + | int | + | 48.94% | naïve (CD38+) [Bm2?] |
| N2 | Magenta | - | + | - | - | + | + | 4.69% | naïve (CD38-) |
| N3 | Purple | - | + | - | + | + | low | 4.41% | naïve (CD38+ B220low) |
| UM1 | Darkred | + | + | - | + | + | + | 1.55% | unswitched memory (CD38+) |
| UM2 | Salmon | + | + | - | - | + | + | 0.94% | unswitched memory (CD38-) [Bm1?] |
| UM3 | Darkblue | + | int | - | + | + | low | 6.16% | IgDlow unswitched memory (CD38+) |
| UM4 | Green | + | int | - | - | + | low | 11.50% | IgDlow unswitched memory (CD38-) |
| GSM1 | Grayishgreen | + | + | + | + | + | + | 0.36% | switching memory (IgD+ IgG+ CD38+) |
| GSM2 | Yellow | + | - | + | + | + | low | 4.05% | switched memory (CD38+) [early Bm5?] |
| GSM3 | Blue | + | - | + | - | + | low | 4.40% | switched memory (CD38-) [late Bm5?] |
| GNSM1 | Cyan | + | - | - | + | + | low | 4.84% | IgD- IgG- memory |
| GNSM2 | Darkgreen | + | - | - | - | + | low | 3.84% | IgD- IgG- memory |
| GNSM3 | Teal | + | - | - | + | + | + | 1.30% | IgD- IgG- memory |
| GNSM4 | Orange | + | - | - | - | - | low | 0.51% | IgD- IgG- memory |
| DNSM1 | Pink | - | - | + | - | - | + | 0.85% | double negative memory (IgG+) |
| DNSM2 | Darkgray | - | - | - | - | - | + | 0.91% | double negative memory (IgG-) |
| PB | Red | high | - | - | high | - | low | 0.75% | plasmablasts |
Summary Statistics
B cell component of the Cell Ontology
http://www.obofoundry.org/
Tube Marker Summary
| | Tube 26 | Tube 27 | Tube 28 | Tube 29 | Tube 30 | Tube 31 | Tube 33 |
|---|---|---|---|---|---|---|---|
| | Major PBMC subsets and FcE | T cell subsets | NK & T cells | Naïve TH | Memory TH | T cell subsets | NK cells |
| FL1 | CD14 | CD4 | CD4 | CD4 | CD4 | CD4 | CD56 |
| FL2 | CD23 | CCR3 | CD25 | CD25 | CD25 | CXCR3 | CXCR3 |
| FL3 | CD3 | CD8 | CD3 | CD3 | CD3 | CD8 | CD3 |
| FL4 | CD19 | CCR4 | CD161 | CD45RA | CD45RO | CCR5 | CCR5 |
Tube 26 - CD19 vs CD3
[Figure: CD19 vs. CD3 plot with the T and B cell populations labeled]
Ontology Schematic
Normal 2324
[Figure: <APC-A>: CD27 vs. <FITC-A>: IgD plot (axes 0 to 10^5) with gate percentages 1.67, 17.2, 7.79, 65.7, and 9.29]
Marker Expression

| ID | Population | Color Code | CD27 | IgD | CD21 | CD38 | CD24 | B220 | CXCR3 | Percentage (%), 2324 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PB | red | CD27high | IgD- | CD21low | CD38+ | CD24- | B220low | CXCR3low | 3.11 |
| 2 | CD27+ | cyan | CD27+ | IgD- | CD21+ | CD38- | CD24+ | B220+ | CXCR3+ | 5.95 |
| 6 | Memory | magenta | CD27+ | IgD- | CD21+ | CD38- | CD24+ | B220low | CXCR3- | 4.37 |
| 9 | | blue | CD27+ | IgD- | CD21low | CD38- | CD24- | B220low | CXCR3- | 1.14 |
| 4 | CD27- | gray | CD27low | IgD- | CD21- | CD38- | CD24- | B220low | CXCR3- | 0.91 |
| 8 | memory | pink | CD27low | IgD- | CD21- | CD38low | CD24- | B220+ | CXCR3- | 2.28 |
| 13 | | darkblue | CD27low | IgD- | CD21- | CD38- | CD24- | B220+ | CXCR3+ | 1.98 |
| 5 | | green | CD27- | IgD- | CD21+ | CD38- | CD24low | B220low | CXCR3- | 0.47 |
| 12 | | darkgreen | CD27- | IgDlow | CD21+ | CD38- | CD24+ | B220low | CXCR3- | 1.01 |
| 3 | unswitched | yellow | CD27+ | IgDlow | CD21+ | CD38- | CD24+ | B220low | CXCR3- | 9.12 |
| 14 | memory | purple | CD27+ | IgDlow | CD21- | CD38low | CD24+ | B220+ | CXCR3- | 0.29 |
| 7 | naive | darkGray | CD27+ | IgD+ | CD21+ | CD38low | CD24low | B220+ | CXCR3- | 20.47 |
| 10 | | grayish green | CD27low | IgD+ | CD21+ | CD38+ | CD24+ | B220+ | CXCR3- | 3.79 |
| 11 | | darkred | CD27- | IgD+ | CD21+ | CD38- | CD24low | B220+ | CXCR3- | 45.09 |
B cells from Immgen
Acknowledgments
UT Southwestern: Yu (Max) Qian, Jamie Lee, Megan Kong, Jennifer Cai, Jie Huang, Nishanth Marthandan, Diane Xiang, Young Bun Kim, Paula Guidry, Eva Sadat
Ignacio Sanz (Rochester), Chungwen Wei (Rochester), Tim Mosmann (Rochester), Adam Seegmiller (UTSW), Nitin Karandikar (UTSW), Christine Martens (Emory), Chris Ding (UTA)
Alex Diehl (Jackson Labs), Martin Zand (Rochester)
Northrop Grumman: John Campbell, Liz Thompson, Jeff Wiser, Mike Attasi
Immune Tolerance Network: Dave Parrish, Keith Boyce, Tom Casale, Jeff Bluestone
Supported by NIH N01AI40076 and N01AI40041