machine learning view of omics integration

16
Machine Learning View of OMICs Integration OMICs Integration and Systems Biology course Nikolay Oskolkov, NBIS SciLifeLab Lund, 6.09.2021

Upload: others

Post on 25-Dec-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning View of OMICs Integration

Machine Learning View of OMICs IntegrationOMICs Integration and Systems Biology course

Nikolay Oskolkov, NBIS SciLifeLabLund, 6.09.2021

Page 2: Machine Learning View of OMICs Integration

Various Data Distributions

DATA

Tabular

Text

Image

Time SeriesVideo

Sound

Page 3: Machine Learning View of OMICs Integration

High-Dimensional Data

N

P1

OMIC1

N

P2

OMIC2

N

P3

OMIC3

MetabolomicsN ≈ P

ProteomicsN ≈ P

TranscriptomicsN << P

(Single cell: N <= P)

manageable

Genomics

N <<<< P

MethylomicsN <<<< P

impossible

Page 4: Machine Learning View of OMICs Integration

The Curse of Dimensionalitycomplicates OMICs Integration

Page 5: Machine Learning View of OMICs Integration

Ex.1

The Curse of Dimensionality

P is the number of features (genes, proteins, genetic variants etc.)N is the number of observations (samples, cells, nucleotides etc.)

Biomedicine

Ex.2

Biased ML varianceestimator in HD-space

Page 6: Machine Learning View of OMICs Integration

Equidistant Points in High Dimensions

Data points become far from each other and equidistant from each other in high dimensions

The differences between closest and furthest data point neighbours disappearsin high-dimensional spaces – can’t cluster

In high-dimensional space we can not separate cases and controls any more

Page 7: Machine Learning View of OMICs Integration

Big Data in Single Cell

Watch out Underfitting! Paradise for Deep Learning!

Page 8: Machine Learning View of OMICs Integration

How to define and evaluate OMICs Integration?

Page 9: Machine Learning View of OMICs Integration

What I Mean by OMICs Integration

OnPLS

JIVE

DISCOClustering of Clusters

Idea Behind OMICs Integration:See Patterns Hidden in Individual OMICS

Page 10: Machine Learning View of OMICs Integration

How I Evaluate OMICs Integration,Data Science: Boost in Prediction

TEXT (78%) IMAGE (83%) SOUND (75%)

Predict Facebookuser interests

Data IntegrationAccuracy: 96%

Page 11: Machine Learning View of OMICs Integration

Prediction is an Ultimate Criterion of Successful OMICS Integration

Statistics searches for candidates

Consequence

Machine Learning optimizes prediction

Consequence

B. Maher, Nature 456, 18-21 (2008)

Page 12: Machine Learning View of OMICs Integration

How I Evaluate OMICs Integration,Life Sciences: Boost in Prediction

HEALTHY

Methylation (78%) Gene Expression (83%) Phenotype (75%)

SICK

Data IntegrationAccuracy: 96%

1) Convert to common space:Neural Networks, SNF, UMAP

2) Explicitly model distributions:MOFA, Bayesian Networks

3) Extract common variation:PLS, CCA, Factor Analysis

Page 13: Machine Learning View of OMICs Integration

What Method of Integration to Select?

For Example:1) With ~110 samples it is a good idea to do linear OMICs integration 2) T2D is a phenotype of interest, therefore supervised integration

Linear Non-Linear

Supervised PLS / OPLS / mixOmics,LASSO / Ridge / Elastic Net

Neural Networks, Random Forest, Bayesian Networks

Unsupervised Factor Analysis / MOFAAutoencoder, SNF, UMAP,

Clustering of Clusters

Page 14: Machine Learning View of OMICs Integration

Feature Pre-Selection to Overcome the Curse of Dimensionality

Data Set (4 OMICs)

Train Set (n = 88)

29% of Diabetics

Test Set (n = 22)

Feature Pre-Selection

OMICs IntegrationTrained Model

Evaluationunsupervised supervised

WGS (~30 mln dims)BSseq (~30 mln dims)

Page 15: Machine Learning View of OMICs Integration

Protocol of Integrative OMICS Analysis

1) Check that there is a relation between the OMICs (MOFA)

2) Choose integrative model based on amount of data and goal (linear, supervised)

3) Do feature pre-selection (supervised or unsupervised) on train data set

4) Integrate the OMICs using your favorite model chosen in 2) on train data set

5) Compare prediction of integrative model with predictions from individual OMICs

Page 16: Machine Learning View of OMICs Integration

National BioinformaticsInfrastructure Sweden (NBIS)