MACHINE LEARNING-BASED GATING AUTOMATION


Posted on 29-Sep-2020


Page 1: MACHINE LEARNING-BASED GATING AUTOMATION · Tree layouts (e.g. SPADE), Multi Dimensional Scaling, t-stochastic neighbor embeddings (e.g. ViSNE), UMAP etc.) ... ‣ Supervised: Gating

MACHINE LEARNING-BASED GATING AUTOMATION

Page 2

WHO ARE WE?

Working on different kinds of data:
• Omics (genomics, transcriptomics, proteomics, …)
• Cytometry
• Clinical/medical

CRO in data analysis, founded in 2006, with a focus on:
• Life science data analysis/mining in immunology, oncology, dermo-cosmetology, …
• Biological interpretation of results
• Development of dedicated databases and proprietary tools for data processing, analysis and interpretation

Transcriptomic (DNA chip, RNA-Seq) · Proteomic (LC-MS/MS) · Genomic/Epigenomic (DNA chip, NGS) · Flow/Mass Cytometry

With the aim of:
• Characterizing new models and the effect of treatments at different doses
• Identifying molecular mechanisms/markers of pathology
• Following biological pathways of interest (oxidative stress, immune response, …)
• Studying the microbiota
• Immunophenotyping

Page 3

ALTRABIO’S WORKFLOW

What we need:
• Biological question
• Generated data, from your own providers or from our partners

What we do:
• Implementation of biostatistical and biomathematical methods
• State-of-the-art methods/customized statistical models/classifier analyses
• Unbiased approaches
• Gating automation: cell-population-guided approach, patient-output-guided approach

What we deliver:
• Comprehensive report: analysis methods & results, publication-quality images, oral presentation of the report
• Gating automata available on a secured web interface for raw data deposit and for results visualization and exploration

Page 4

ALTRABIO’S TOOLBOX FOR CYTOMETRY DATA

➡ Data preprocessing: « Transform raw data for better identification of cell populations »
‣ Compensation
‣ Transformation (logarithmic transformations; log-linear hybrid transformations: logicle, Hyperlog, hyperbolic arcsine, biexponential, etc.)
‣ Normalization (reduction of technical variance: statistical normalization, peak alignment & registration methods, bead-based normalization, etc.)
‣ Debarcoding & « pre-gating » (debarcoding, bead identification, margin events/doublets removal, live cells identification/DNA gating, etc.)
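The compensation and hyperbolic arcsine steps listed above can be sketched in plain Python. This is a minimal illustration, not AltraBio's pipeline: it assumes a spillover matrix S whose row i gives the fraction of fluorochrome i's signal read in each detector (so observed = true · S), and a cofactor of 5, a value often used for mass cytometry (values around 150 are often used for fluorescence flow cytometry). All function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def compensate(observed, spillover):
    """Recover true signals of one event, assuming observed[j] = sum_i true[i] * spillover[i][j]."""
    n = len(observed)
    transposed = [[spillover[i][j] for i in range(n)] for j in range(n)]
    return solve(transposed, observed)


def arcsinh_transform(values, cofactor=5.0):
    """Hyperbolic arcsine transform: roughly linear near 0, logarithmic for large values."""
    return [math.asinh(v / cofactor) for v in values]


# Hypothetical 2-channel example: 10% of channel 0's signal spills into channel 1.
spill = [[1.0, 0.1],
         [0.0, 1.0]]
event = compensate([100.0, 60.0], spill)  # true signals were [100, 50]
print([round(v, 6) for v in event])
print(arcsinh_transform(event))
```

Compensation is applied before the transformation, because spillover is additive on the linear intensity scale.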

➡ Data quality control & visualization: « Remove artifacts & poor-quality data »
‣ Detection of inconsistencies among individual samples (technical errors, labelling errors, batch effect, sample effect, etc.):
‣ Summary statistics, statistical tests, outlier detection methods, probability binning, fingerprinting, etc.
‣ Visualization to get insights:
‣ Principal Component Analysis, t-SNE, UMAP, Minimum Spanning Tree layouts, Multi Dimensional Scaling, etc.
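One simple way to implement the « summary statistics + outlier detection » step is to compare per-sample marker medians across samples with a robust modified z-score, 0.6745·(x − median)/MAD, using the conventional 3.5 cut-off. The sketch below is an illustration under those assumptions, not one of the listed methods such as probability binning; the data layout (a dict mapping sample IDs to lists of events) is hypothetical.

```python
import statistics


def flag_outlier_samples(samples, threshold=3.5):
    """Flag (sample, marker) pairs whose per-sample median is a robust outlier.

    samples: dict sample_id -> list of events, each event a list of marker values.
    Returns dict sample_id -> list of flagged marker indices.
    """
    ids = list(samples)
    n_markers = len(samples[ids[0]][0])
    flags = {sid: [] for sid in ids}
    for m in range(n_markers):
        medians = {sid: statistics.median(ev[m] for ev in samples[sid]) for sid in ids}
        center = statistics.median(medians.values())
        mad = statistics.median(abs(v - center) for v in medians.values())
        if mad == 0:
            continue  # no spread across samples: nothing to compare against
        for sid, value in medians.items():
            modified_z = 0.6745 * (value - center) / mad
            if abs(modified_z) > threshold:
                flags[sid].append(m)
    return flags


# Hypothetical data: 5 samples, 2 markers; sample "s5" has a shifted first marker.
samples = {
    sid: [[v, 2.0], [v, 2.0], [v, 2.0]]
    for sid, v in [("s1", 1.0), ("s2", 1.1), ("s3", 0.9), ("s4", 1.0), ("s5", 5.0)]
}
print(flag_outlier_samples(samples))
```

Medians and MAD are used instead of means and standard deviations so that a single aberrant sample cannot inflate the spread it is judged against.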

➡ Automated unsupervised population identification: « Automatically identify cell populations in an unsupervised way »
‣ Clustering approaches (proprietary approaches; topological/graph-based approaches, e.g. SamSPECTRAL; density-based approaches, e.g. FLOCK; model-based approaches, e.g. immunoClust, FLAME, flowClust, flowMerge; hybrid approaches, e.g. FlowSOM, PhenoGraph, flowPeaks, flowMeans; ensemble approaches; etc.)
‣ Gating by dimension reduction (Principal Component Analysis, Minimum Spanning Tree layouts (e.g. SPADE), Multi Dimensional Scaling, t-distributed stochastic neighbor embedding (t-SNE, e.g. viSNE), UMAP, etc.)
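To make clustering-based gating concrete, here is a deliberately simple stand-in: Lloyd's k-means on transformed events, in plain Python. None of the tools named above is reproduced here (FlowSOM, for example, combines a self-organizing map with meta-clustering); the number of clusters k, the seed and the iteration cap are assumptions chosen for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two events."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial events
    labels = None
    for _ in range(n_iter):
        # Assignment step: each event goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its events.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical transformed events (2 markers) forming two populations.
events = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(events, k=2)
print(labels, centroids)
```

k-means assumes roughly spherical, similarly sized clusters, which is exactly the kind of assumption that density-based and model-based methods try to relax for rare or oddly shaped cell populations.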

➡ Automated supervised population identification: « Automatically identify cell populations in a (partially) supervised way » → cell-populations-guided solution
‣ Partially supervised: flowDensity
‣ Supervised: Gating Automata (proprietary)
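The partially supervised idea behind tools such as flowDensity, placing a gate where the one-dimensional density of a marker dips between a negative and a positive population, can be sketched as follows. This is a simplified illustration: flowDensity itself relies on more elaborate rules, and the Gaussian kernel density estimate, bandwidth and grid size used here are assumptions.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
            for g in grid]


def valley_threshold(values, bandwidth=0.3, n_grid=200):
    """Return the density minimum between the two highest peaks, or None if unimodal."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = gaussian_kde(values, grid, bandwidth)
    peaks = [i for i in range(1, n_grid - 1) if dens[i - 1] < dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        return None  # no valley to place a gate in
    # Keep the two highest peaks, ordered left to right.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[valley]


# Hypothetical transformed marker values: a negative and a positive population.
negative = [1.0 + 0.1 * j for j in range(-5, 6)]
positive = [4.0 + 0.1 * j for j in range(-5, 6)]
threshold = valley_threshold(negative + positive)
print(threshold)
```

Returning None for a unimodal marker leaves the decision to the caller, which is where prior knowledge (for example an FMO control) would normally come in.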

➡ Automated semi-supervised population identification: « Automatically identify cell populations while guiding the identification toward a clustering that fits the objective of the study, thanks to knowledge embedding » → patient-output-guided solution

➡ Cross-sample analysis:
‣ Statistical modeling: generalized linear models, mixed models, etc. for (1) differential analysis of the abundance of cell populations or (2) differential analysis of marker expression stratified by cell population
‣ Machine learning: supervised learning (e.g. Random Forest, Boosting, SVM, (sparse) PLS), correlation identification, etc.
‣ Specific task-dedicated algorithms: CITRUS, RchyOptimyx, etc.
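For the differential analysis of cell-population abundance mentioned above, a common non-parametric choice is the Wilcoxon rank-sum (Mann–Whitney U) test on per-sample cluster proportions. The sketch below uses the normal approximation with a tie correction; for small groups an exact test (the default of R's wilcox.test when there are fewer than 50 observations and no ties) would be preferable. Function names and data are hypothetical.

```python
import math


def cluster_fraction(labels, cluster):
    """Proportion of a sample's events assigned to `cluster`."""
    return sum(1 for lab in labels if lab == cluster) / len(labels)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Hypothetical per-sample proportions of one cluster in two groups of patients.
group_a = [0.01, 0.02, 0.03, 0.04, 0.05]
group_b = [0.2, 0.3, 0.4, 0.5, 0.6]
print(rank_sum_test(group_a, group_b))
```

When many clusters are tested, the resulting p-values would normally be adjusted for multiple testing before being interpreted.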

Page 5

WHY AUTOMATE YOUR GATING STRATEGY?

Gating is a fundamental step of data analysis, e.g. for immunophenotyping. But gating is usually performed manually, which:
• is highly time-consuming and tedious
• requires experience/knowledge
• is subject to errors and subjectivity, and raises reproducibility issues

NEED FOR AUTOMATION

Data generation (cells characterization) → Gating (cell populations) → Cross-sample analyses (samples/individuals)

Page 6

TWO APPROACHES FOR AUTOMATED GATING, DEPENDING ON NEEDS

1. To identify discriminative cell populations with little or no a priori knowledge → ALTRABIO’S PATIENT-OUTPUT-GUIDED SOLUTION
Applications, mainly for research projects:
• Predictive modeling (e.g. definition of cell populations to diagnose a disease and/or its evolution)
• Discovery (e.g. identification & description of responding populations for a therapy, identification of therapeutic targets)

2. To identify predefined cell populations → ALTRABIO’S CELL-POPULATIONS-GUIDED SOLUTION
Applications, mainly for development & diagnostic projects:
• Reproduce & standardize clinical trial data processing
• Diagnostic, Minimal Residual Disease monitoring, …

Page 7

ALTRABIO’S PATIENT-OUTPUT-GUIDED SOLUTION

Page 8

WORKFLOWS FOR AUTOMATED CYTOMETRY-BASED BIOMARKER IDENTIFICATION

Classical automated ML-based approach, based on statistical analyses:
Cytometry data → Automated sample characterization without embedded knowledge (unbiased clustering) → Identification of relevant cellular populations (biomarkers) → Predictive modelling (statistical modeling/machine learning)
Clustering algorithms perform automated, unbiased population identification => gating is done automatically, with no use of prior knowledge.

Pros: gating is performed automatically, which:
• reduces time & cost (and may make the task feasible at all)
• requires little or no experience/knowledge
• avoids errors, subjectivity and reproducibility issues

Cons:
• Identification does not take previously accumulated knowledge into account; it is based only on the data (=> risk of drowning in data)
• Understanding and interpretation of results are difficult

Page 9

WORKFLOWS FOR AUTOMATED CYTOMETRY-BASED BIOMARKER IDENTIFICATION

ALTRABIO’s automated ML-based approach, based on statistical analyses, compared with the classical one:
Cytometry data → Automated sample characterization with embedded knowledge (output-guided solution) → Identification of relevant cellular populations (biomarkers) → Predictive modelling (statistical modeling/machine learning)
A proprietary clustering algorithm performs automated, output-guided population identification => gating is done automatically, with embedded knowledge.

Pros: the same automation benefits as the classical approach (lower time & cost, little or no experience/knowledge required, fewer errors, less subjectivity, better reproducibility), plus:
• Identification based on both the data and all previously accumulated knowledge
• Understanding and interpretation of results are easier

Page 10

WHY?
▸ Enhancement of the results
▸ « Automatic identification of the resolution » of the clustering
▸ Better adequacy of the results to the problem being addressed

WHAT?
▸ Tool to perform unbiased clustering while possibly « guiding » it (i.e. « guiding » the kind of knowledge it should extract)

WHEN?
▸ In any case
▸ Especially useful when samples must be described in order to perform comparisons/cross-sample analyses

Page 11

HOW?
▸ Own clustering approach based on density
▸ Clusters each sample independently (which takes sample specificities, such as marker shifts or differences in population abundance, into account), followed by meta-clustering
▸ Clustering approach with few assumptions on the shapes and sizes of cellular populations, or on the imbalance between them
▸ Automatic identification of the « good resolution » for clustering
▸ Possible knowledge embedding, in order to adopt a point of view on the data (clustering while taking into account that the extracted structure should help address a given problem)
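AltraBio's density-based algorithm is proprietary and not described in this document. The sketch below only illustrates the general « cluster each sample independently, then meta-cluster » pattern: each sample is clustered with a compact k-means (a stand-in for a density-based method), and per-sample centroids closer than a hypothetical radius are merged by single linkage, so that a population slightly shifted between samples still maps to one meta-cluster. Automatic selection of the resolution is not shown; k and radius are assumed inputs.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Compact Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def meta_cluster(centroids, radius):
    """Single-linkage merge: centroids closer than `radius` share a meta-cluster."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < radius:
                parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(centroids))]


def two_level_clustering(samples, k, radius, seed=0):
    """Cluster each sample independently, then meta-cluster all per-sample centroids.

    samples: dict name -> list of events. Returns dict name -> meta-cluster id per event.
    """
    centroids, owners, per_sample = [], [], {}
    for name, events in samples.items():
        labels, cents = kmeans(events, k, seed=seed)
        per_sample[name] = labels
        for c, cent in enumerate(cents):
            centroids.append(cent)
            owners.append((name, c))
    meta = dict(zip(owners, meta_cluster(centroids, radius)))
    return {name: [meta[(name, lab)] for lab in labels] for name, labels in per_sample.items()}


# Hypothetical data: sample B is sample A with a +0.5 marker shift on both channels.
a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
b = [(x + 0.5, y + 0.5) for x, y in a]
result = two_level_clustering({"A": a, "B": b}, k=2, radius=3.0)
print(result)
```

Clustering each sample separately means a shift in one sample's marker intensities moves that sample's centroids rather than splitting a population in a pooled clustering.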

Page 12

RESULT VISUALIZATIONS

[Figure: Multi Dimensional Scaling (MDS) plot of samples (axes MDS1, MDS2), colored by Day.post.infection (0, 1, 2, 7, 15); heat map with hierarchical clustering over the markers 171Yb_CD44, 141Pr_Ly6G, 172Yb_CD11b, 149Sm_CD19, 160Gd_CD62L, 209Bi_CD11c, 168Er_CD8a, 151Eu_CD25_bille, 170Er_CD161, Tm169Di, 145Nd_CD4, 152Sm_CD3e; various t-SNE plots with sample stratification, condition stratification and cluster stratification]

Page 13

[Figure, shown for groups 0 to 3: cluster importance plots comparing clusters (X1, X2, …) with shadow features (shadowMin, shadowMean, shadowMax), each cluster labelled Confirmed, Rejected, Undetermined or Synthetic; Random Forest voting per sample index (« Classification for 0 » to « Classification for 3 »); box plots of cluster values by Type (FALSE/TRUE) with Wilcoxon p-values ranging from 8e−08 to 0.095]

ENHANCED RESULTS (CASE STUDY)

« Re-analysis » of a dataset (AltraBio & a French hospital):
• Residual disease context
• Known « disease cellular population », which may be rare or frequent (~1% to ~50% of a given subpopulation)
• Complex targeted phenotype based on a combination of 7 markers
• Goals:
• Build a model to classify samples
• Automatically identify the « disease » population
• Results:
• Identification of a predictive cell cluster
• Perfect prediction: 100% accuracy by cross-validation (versus only 75% accuracy with CITRUS)
• The most relevant automatically identified stratifying cluster is highly discriminative
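In the figure residue above, clusters are compared with « shadow » features (shadowMin, shadowMean, shadowMax) and labelled Confirmed, Rejected or Undetermined, which resembles the Boruta feature-selection procedure (random-forest importances plus a statistical test). The sketch below only illustrates the shadow-feature idea: it swaps in absolute point-biserial correlation as the importance measure and fixed hit-rate cut-offs, both assumptions, and it is not the analysis used in the case study.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (point-biserial when y is 0/1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def shadow_selection(features, y, n_iter=50, confirm=0.9, reject=0.1, seed=0):
    """Compare each feature's importance with the best permuted ("shadow") copy.

    features: dict name -> per-sample values; y: per-sample 0/1 labels.
    Returns dict name -> "Confirmed" / "Rejected" / "Undetermined".
    """
    rng = random.Random(seed)
    hits = dict.fromkeys(features, 0)
    for _ in range(n_iter):
        shadow_max = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)  # destroys any link with y
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        for name, values in features.items():
            if abs_corr(values, y) > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        rate = h / n_iter
        decisions[name] = "Confirmed" if rate >= confirm else "Rejected" if rate <= reject else "Undetermined"
    return decisions


# Hypothetical per-sample cluster abundances for 20 samples (10 negative, 10 positive).
y = [0] * 10 + [1] * 10
features = {
    "X1": [0.1 + 0.01 * i for i in range(10)] + [5.0 + 0.01 * i for i in range(10)],
    "X2": [float(i % 3) for i in range(20)],
}
print(shadow_selection(features, y))
```

The point of shadows is that a cluster only counts as relevant if it beats what pure noise with the same value distribution can achieve.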

REAL \ PREDICTED    MRD -    MRD +
MRD -                  28        0
MRD +                   0       15
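The counts in the table can be turned into the usual summary metrics. The sketch also shows how an accuracy « by cross-validation » is typically obtained, using leave-one-out cross-validation of a deliberately simple one-feature nearest-class-mean classifier. The classifier, its data and the choice of leave-one-out are assumptions for illustration; the document does not state which classifier or cross-validation scheme was used.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),  # MRD+ samples correctly called MRD+
        "specificity": tn / (tn + fp),  # MRD- samples correctly called MRD-
    }


def loocv_accuracy(values, labels):
    """Leave-one-out CV of a one-feature nearest-class-mean classifier."""
    correct = 0
    for i in range(len(values)):
        train = [(v, lab) for j, (v, lab) in enumerate(zip(values, labels)) if j != i]
        means = {}
        for cls in {lab for _, lab in train}:
            members = [v for v, lab in train if lab == cls]
            means[cls] = sum(members) / len(members)
        predicted = min(means, key=lambda cls: abs(values[i] - means[cls]))
        correct += predicted == labels[i]
    return correct / len(values)


# Counts from the table above (MRD+ treated as the positive class).
print(classification_metrics(tn=28, fp=0, fn=0, tp=15))
# Hypothetical abundances of one cluster in 6 samples.
print(loocv_accuracy([0.1, 0.2, 0.15, 5.0, 6.0, 5.5], ["MRD-"] * 3 + ["MRD+"] * 3))
```

Here 43 of 43 samples are correctly classified, so accuracy, sensitivity and specificity are all 1.0.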

Page 14

CELL-POPULATIONS-GUIDED SOLUTION

Page 15

WHY?
▸ Reduce time/cost
▸ Address scalability/exhaustivity issues
▸ Address errors/subjectivity/reproducibility issues
▸ « Real-time » delivery of results
▸ Concentrate the required experience/knowledge

WHAT?
▸ Tool to automate manual gating tasks

WHEN?
▸ Large number of samples to process
▸ Need to accelerate processing time
▸ Clinical trials/immuno-monitoring
▸ Increased reproducibility needed
▸ Online (« real-time ») processing & reporting needed
▸ Embedding the gating process into another (medical) device

Page 16

HOW?
▸ Learns and applies your gating strategy (exactly mimics what you would do)
▸ Uses the latest machine learning approaches, with emphasis on:
- Handling variability
- Addressing issues related to strongly imbalanced cellular population sizes and to rare populations
▸ Can embed additional knowledge (e.g. use of FMO, controls, copy of gates)
▸ Performance validated on numerous studies

Learning stage (1-4 week(s)): learning data generation (FCS files + manual gating) → train automaton → trained automaton
Application stage (5-10 min/file), using the trained automaton: batch processing (FCS files → FCS files + auto gating) or single file processing (FCS file → FCS file + auto gating)
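The document does not describe how CytAutomaton learns a gating strategy. As a minimal stand-in for the learning and application stages, the sketch below fits a Gaussian naive Bayes classifier with equal class priors on events labelled by manual gating, then labels the events of a new file. Equal priors are one simple way to keep large populations from dominating rare ones; a real gating automaton would also need to respect the gate hierarchy and handle file-to-file variability. All names and data are hypothetical.

```python
import math


def train_gate_model(events, gate_labels):
    """Learning stage: mean and standard deviation of each marker, per population."""
    model = {}
    for pop in set(gate_labels):
        members = [e for e, g in zip(events, gate_labels) if g == pop]
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        sds = [max(math.sqrt(sum((x - m) ** 2 for x in col) / len(col)), 1e-3)  # floor avoids /0
               for col, m in zip(columns, means)]
        model[pop] = (means, sds)
    return model


def apply_gate_model(model, events):
    """Application stage: assign each event to the most likely population."""
    def neg_log_lik(event, pop):
        means, sds = model[pop]
        # 2 x negative log-likelihood of a diagonal Gaussian, up to a constant;
        # equal priors, so population size plays no role in the decision.
        return sum(((x - m) / s) ** 2 + 2 * math.log(s) for x, m, s in zip(event, means, sds))
    return [min(model, key=lambda pop: neg_log_lik(e, pop)) for e in events]


# Hypothetical learning data: manually gated events (CD4, CD8 intensities after transformation).
learn_events = [(5.0, 1.0), (5.2, 0.9), (4.9, 1.1), (1.0, 5.0), (1.1, 5.2), (0.8, 4.9)]
learn_labels = ["CD4+ T cells"] * 3 + ["CD8+ T cells"] * 3
model = train_gate_model(learn_events, learn_labels)
print(apply_gate_model(model, [(4.8, 1.2), (0.9, 5.3)]))
```

The trained model is just a small dictionary of per-population statistics, which is why applying it to a new file is much faster than learning it.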

Page 17

THE CYTAUTOMATON WORKFLOW

What we need:
• Learning data: a few representative manually gated FCS files + complementary information (e.g. embedded assumptions, controls, FMO) to transfer human expertise to the computer

What we do:
• AltraBio’s toolbox & experts build a dedicated gating automaton
• Iterative 3-step workflow: Build / Evaluate / Validate
• Deploy and apply it to new FCS files

What we deliver:
• Upload your data and access customisable population and outlier reports through a secure web interface

Page 18

LEARNING DATA

▸ Acquisition of the human operator’s knowledge/experience from a few manually gated examples
▸ Representative examples are needed
▸ Complementary information: details on how manual gating was done (e.g. embedded assumptions, controls, FMO)
▸ Collection made easier thanks to interfaces with FlowJo / Kaluza / DIVA

Page 19

BUILD A DEDICATED GATING AUTOMATON - THE DESIGN WORKFLOW

▸ Iterative 3-step workflow: Build / Evaluate / Validate
▸ A loop to enhance learning performance if required
▸ Validation of learning performance before deployment
▸ Go/No Go decision to deploy the Gating Automaton (GA), based on an objective, predefined criterion
▸ For example, tests can check that:
• the GA handles variability arising from instrument calibration (GAs learnt on data from different acquisition days were applied to data acquired on another day)
• the GA handles natural biological variation (the series included different donors and different sample stimulations)

Design loop: get learning data → build automaton → a priori evaluation of learning performance → learning performance validation → Yes: validated Gating Automaton for deployment; No: enhance (get some more learning data and rebuild)
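The Go/No Go step needs an objective, predefined criterion. One plausible choice, assumed here for illustration rather than taken from the source, is to compare automated and manual gating event by event on validation files, compute an F1 score per population, and deploy only if every population reaches a preset threshold (0.9 below is a hypothetical value).

```python
def f1_per_population(manual, automated):
    """Event-level F1 score per population, with manual gating as the reference."""
    scores = {}
    for pop in set(manual) | set(automated):
        tp = sum(1 for m, a in zip(manual, automated) if m == pop and a == pop)
        fp = sum(1 for m, a in zip(manual, automated) if m != pop and a == pop)
        fn = sum(1 for m, a in zip(manual, automated) if m == pop and a != pop)
        scores[pop] = 2 * tp / (2 * tp + fp + fn)
    return scores


def go_no_go(manual, automated, min_f1=0.9):
    """Deploy (True) only if every population reaches the predefined F1 threshold."""
    scores = f1_per_population(manual, automated)
    return all(s >= min_f1 for s in scores.values()), scores


# Hypothetical validation file: one "B" event is mislabelled "A" by the automaton.
manual = ["A"] * 10 + ["B"] * 10
automated = ["A"] * 10 + ["B"] * 9 + ["A"]
print(go_no_go(manual, automated))
```

Taking the minimum over populations, rather than an overall accuracy, keeps a poorly gated rare population from being hidden by well-gated large ones.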

Page 20

DEPLOY AND APPLY TO NEW FCS FILES
▸ Batch or single FCS file processing
▸ Web interface to limit the exchange of files
▸ Deliverables:
• Customised « per file » reports and tables (e.g. #events, % total, % parent, MFIs)
• Customised « per study » reports and tables (e.g. comparison of files (samples), outlier detection)
• Optionally: automation of additional biological and technical quality controls; automation of cross-sample analyses
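The « per file » statistics listed above can be derived from the gating result and the gate hierarchy. This sketch assumes each event carries the label of the deepest gate it falls in (None if it is outside all gates), a hypothetical parent map describing the hierarchy, and MFI computed as the median intensity of one channel; it is an illustration, not the report generator.

```python
import statistics


def ancestors(pop, parent):
    """The population itself, followed by all its parent gates."""
    chain = [pop]
    while parent[pop] is not None:
        pop = parent[pop]
        chain.append(pop)
    return chain


def population_table(events, leaf_labels, parent, channel):
    """Per population: #events, % total, % parent and median intensity (MFI) on one channel."""
    members = {pop: [] for pop in parent}
    for event, leaf in zip(events, leaf_labels):
        if leaf is None:
            continue  # event outside every gate
        for pop in ancestors(leaf, parent):
            members[pop].append(event)
    total = len(events)
    table = {}
    for pop, evs in members.items():
        parent_count = total if parent[pop] is None else len(members[parent[pop]])
        table[pop] = {
            "events": len(evs),
            "pct_total": 100.0 * len(evs) / total,
            "pct_parent": 100.0 * len(evs) / parent_count if parent_count else 0.0,
            "mfi": statistics.median(e[channel] for e in evs) if evs else float("nan"),
        }
    return table


# Hypothetical gating hierarchy and 5 events with one intensity channel.
parent = {"Lymphocytes": None, "T cells": "Lymphocytes", "B cells": "Lymphocytes"}
events = [[100], [200], [300], [400], [50]]
leaves = ["T cells", "T cells", "B cells", "Lymphocytes", None]
for pop, row in population_table(events, leaves, parent, channel=0).items():
    print(pop, row)
```

For top-level gates, « % parent » falls back to the total event count, so it equals « % total ».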

[Figure 2: Univariate identification of potential outliers (Plot 1) based on MFI per population for morphological parameters. Each row is a file / population-parameter pair (e.g. Panel_1_32151466_UCL_CANTO2_29JAN2016_29JAN2016.fcs_intra.fcs_comp.fcs / #mfi2_MONOCYTES_SSC-A); x-axis: deviation from closest hinge in number of interquartile ranges, 0 to 3.5 (black -, grey +).]

Automaton: PRECISEADS Panel1 / Project: PRECISEADS Panel1. Outlier Identification Report generated with CytAutomaton by AltraBio. https://www.altrabio.com
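The x-axis of the univariate outlier plot, « deviation from closest hinge in number of interquartile ranges », can be computed as below. This sketch assumes the hinges are the first and third quartiles, computed with Python's `statistics.quantiles(method="inclusive")`, and signs deviations below Q1 as negative and above Q3 as positive; the exact definition used by the report is not given.

```python
import statistics


def hinge_deviation(values):
    """Signed distance of each value from the nearest quartile, in IQR units (0 inside the box)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    deviations = []
    for v in values:
        if v < q1:
            d = v - q1  # negative: below the lower hinge
        elif v > q3:
            d = v - q3  # positive: above the upper hinge
        else:
            d = 0.0
        deviations.append(d / iqr if iqr else 0.0)
    return deviations


# Hypothetical MFIs of one population across 6 files; the last file stands out.
mfis = [1.0, 2.0, 3.0, 4.0, 5.0, 11.0]
print(hinge_deviation(mfis))
```

With this scale, Tukey's usual fences correspond to flagging values whose absolute deviation exceeds 1.5.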

[Figure 6: Principal Component Analysis (2): MFI for all populations for morphological parameters. Only the 10 most contributive factors are drawn (e.g. #mfi2_BEADS_SSC-A, #mfi1_S2_SSC-A, #mfi1_S1_FSC-A, #mfi2_MONOCYTES_SSC-A). Axes Dim1 (47.7%) and Dim2 (23%); groups: CHP, DRFZ, IRCCS, MHH, UBO, UCL.]

CONFIDENTIAL 29 of 54

Figure 11: Univariate identification of potential outliers (Plot 2) based on MFI per population for Non-Morphological Parameters (bars labelled by FCS file / population-marker MFI, e.g. #mfi2_CD19+_BCELLS_APC.A)



REPRODUCIBLE, ROBUST & ACCURATE RESULTS (CASE STUDY #1)

Human variability vs automaton robustness

Comparison of gating performances for several human experts and a gating automaton (AltraBio & CIRI: J. Marvel, INSERM, Lyon)
• Mice, immunological data
• Longitudinal data (several time points)
• 5 different human experts
• Consensus gating calculated from the experts' gatings

Figure panels: Consensus vs Expert#1, Expert#2, Expert#3, Expert#4, Expert#5 and Consensus vs Automaton (each panel compares one gating, #1 to #5 or Auto, with the consensus)
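The slide does not say how the consensus gating is built. A common, simple choice, assumed here purely for illustration, is a per-cell majority vote across the experts' gatings; each gating (expert or automaton) can then be scored against the consensus with a per-population F1 score. The function names and the toy labels below are hypothetical.

```python
from collections import Counter

def consensus_labels(gatings):
    """Per-cell majority vote across several gatings of the same cells.

    `gatings` is a list of equal-length label sequences, one per expert.
    Ties go to the label seen first in expert order (Counter.most_common
    keeps first-encountered order for equal counts), an arbitrary choice.
    """
    n_cells = len(gatings[0])
    if any(len(g) != n_cells for g in gatings):
        raise ValueError("all gatings must label the same cells")
    return [Counter(g[i] for g in gatings).most_common(1)[0][0] for i in range(n_cells)]

def f1_per_population(reference, predicted):
    """F1 score of `predicted` against `reference` for every population label."""
    pairs = list(zip(reference, predicted))
    scores = {}
    for pop in sorted(set(reference) | set(predicted)):
        tp = sum(r == pop and p == pop for r, p in pairs)
        fp = sum(r != pop and p == pop for r, p in pairs)
        fn = sum(r == pop and p != pop for r, p in pairs)
        scores[pop] = 2 * tp / (2 * tp + fp + fn)  # pop occurs, so denominator > 0
    return scores

# Toy example: three experts labelling the same five cells
experts = [
    ["T", "T", "B", "B", "NK"],   # expert #1
    ["T", "B", "B", "B", "NK"],   # expert #2
    ["T", "T", "B", "NK", "NK"],  # expert #3
]
consensus = consensus_labels(experts)
for k, gating in enumerate(experts, start=1):
    print(f"Consensus vs Expert#{k}:", f1_per_population(consensus, gating))
```

The same `f1_per_population(consensus, automaton_labels)` call would give the "Consensus vs Automaton" comparison, provided the automaton labels the same cells.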


SIMILAR RESULTS BUT FASTER (CASE STUDY #2)

Manual gating:
• ~1 year of work
• Subjectivity/reproducibility issues

Automated gating:
• A few days
• No subjectivity/reproducibility issues

Figure: #cells (manual gating) vs #cells (automated gating): R = 0.999, p-value < 2×10^-16

Profiling/Immunomonitoring Study on Human data (AltraBio & CRCL: C. Caux, INSERM, Lyon)
• ~200 patients (8 samples per patient)
• A very large number of gates: 519 gates per patient
• Some gates with high complexity (continuum, non-obvious boundaries, very rare populations)
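The R = 0.999 reported in Case Study #2 is a correlation between manual and automated cell counts. A minimal sketch of the Pearson coefficient and its t statistic follows; the slide does not state which correlation method or data transformation was used, and the counts in the usage lines are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of cell counts."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of the same length (n >= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (needs |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical per-gate counts from manual and automated gating of the same files
manual = [1520, 830, 44, 12010, 305]
auto = [1498, 845, 41, 11950, 311]
r = pearson_r(manual, auto)
print(round(r, 4), t_statistic(r, len(manual)))
```

The p-value then comes from Student's t distribution with n − 2 degrees of freedom (`scipy.stats.pearsonr` returns both r and p). Because gate counts can span several orders of magnitude, checking agreement on a log scale or with a rank correlation is a common complement.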


ACCURACY: AUTOMATON BENEFITS FROM MULTIDIMENSIONALITY (CASE STUDY #3)

Figure panels: Manual gating vs Automated gating (legend: correctly assigned cells, inaccurately assigned cells)

The automaton takes all markers into account simultaneously, whereas a manual gate considers two markers at a time; this enables better discrimination of cellular populations.

Profiling/Immunomonitoring Study on Mouse data (AltraBio, IMPC & Ciphe: H. Luche, INSERM, Marseille)
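To illustrate why using all markers at once can help, the toy example below (made-up values, not data from the study) has two populations that overlap completely in the marker A vs marker B plane but differ on marker C. A nearest-centroid classifier, a deliberately simple stand-in and not the automaton's actual algorithm, cannot separate them with A and B alone but separates them once C is included.

```python
import math

# Toy cells: (marker A, marker B, marker C). Made-up values for illustration.
TOY_CELLS = [
    (1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (1.0, 2.0, 0.5), (2.0, 1.0, 0.5),  # population P1
    (1.0, 1.0, 5.0), (2.0, 2.0, 5.0), (1.0, 2.0, 5.5), (2.0, 1.0, 5.5),  # population P2
]
TOY_LABELS = ["P1"] * 4 + ["P2"] * 4

def fit_centroids(cells, labels, dims):
    """Mean position of each population using only the marker indices in `dims`."""
    sums, counts = {}, {}
    for cell, lab in zip(cells, labels):
        acc = sums.setdefault(lab, [0.0] * len(dims))
        for k, d in enumerate(dims):
            acc[k] += cell[d]
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict(centroids, cells, dims):
    """Assign each cell to the closest centroid (ties go to the first population)."""
    out = []
    for cell in cells:
        point = [cell[d] for d in dims]
        out.append(min(centroids, key=lambda lab: math.dist(point, centroids[lab])))
    return out

def accuracy(truth, pred):
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

for dims, name in [((0, 1), "A, B only"), ((0, 1, 2), "A, B and C")]:
    cents = fit_centroids(TOY_CELLS, TOY_LABELS, dims)
    print(name, accuracy(TOY_LABELS, predict(cents, TOY_CELLS, dims)))
```

With these values the first line reports 0.5 (chance level, since both centroids coincide in the A-B plane) and the second 1.0. Real gating data are far less clean, but the same effect underlies the claim that a classifier working in the full marker space can resolve populations that look merged on any single biaxial plot.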


USABLE IN CLINICAL TRIAL CONTEXT (CASE STUDY #4)

Assessment of the gating automaton's performance in a clinical trial context (AltraBio & a French Big Pharma)
• Human donors
• Longitudinal study
• Different antigen stimulations
• Signatures of polyfunctionality

The automaton handles biological and technical variability.

But also:

Gating automation of a huge immunophenotyping study (AltraBio & a US Big Pharma)
• More than 150,000 files to process

Gating automation of a complex multicentric study (AltraBio & an IMI project)
• 11 centres with different cytometers
• Meta-automaton generation

And others…


FOR MORE INFORMATION…

http://www.altrabio.com
30 rue Pré-Gaudry, 69007 Lyon, France
+33 (0)4 26 84 69 63