Statistical methods for big data
in life sciences and health with R
Philippe Jacquet
Data Scientist at CADMOS
June 2018
CADMOS
CADMOS: Center for Advanced Modeling Of Science
Free service for all UNIL, EPFL and UNIGE researchers
Help with HPC parallel computing (code optimization) or Big Data analysis (pipeline development), including some computing power at SCITAS (the EPFL cluster)
Contact:
UNIL: [email protected], [email protected]
EPFL: [email protected]
UNIGE: [email protected]
Outline
TODAY
13:30 – 15:00 Lecture on Neural Network
15:00 – 15:30 Coffee break
15:30 – 17:00 Practicals on Neural Network
TOMORROW (Room 321 - Amphipôle)
09:00 – 10:30 Lecture on Decision Tree and Random Forest
10:30 – 11:00 Coffee break
11:00 – 12:30 Practicals on Decision Tree and Random Forest
Outline
If you have any questions during the lecture,
you are welcome to ask
Outline
We will analyse the same dataset (iris data)
with all the methods
Outline
And present some applications
in health science
The iris dataset
4 measurements on 50 flowers from each of 3 species
Sepal length   Sepal width   Petal length   Petal width   Species
5.1            3.5           1.4            0.2           setosa
7.0            3.2           4.7            1.4           versicolor
6.3            3.3           6.0            2.5           virginica
input = the four measurements, output = the species
The iris dataset
GOAL
Given an input (4 measurements)
predict its output (species)
Machine learning
MACHINE LEARNING
UNSUPERVISED LEARNING
- only input data
- explorative models: clustering, dimension reduction
SUPERVISED LEARNING
- input data + output data
- predictive models: classification, regression
Supervised learning
Data splitting:
|---- train ----|-- validation --|---- test ----|
0              60%             80%           100%
Model training: estimate the model parameters
Model selection: estimate the performance of different models in order to choose the best one
Model assessment: having chosen a final model, estimate its prediction accuracy on new data
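The 60% / 80% split above can be sketched in a few lines (shown in Python for illustration; the practicals use R, and the helper name split_indices as well as the seed are our own choices):

```python
import random

def split_indices(n, train=0.6, validation=0.2, seed=1):
    # Shuffle the row indices, then cut at 60% and 80%,
    # matching the diagram above (any proportions summing to 1 work)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(train * n)
    n_val = int((train + validation) * n)
    return idx[:n_train], idx[n_train:n_val], idx[n_val:]

# The iris data has 150 rows: 90 for training, 30 for validation, 30 for testing
train_idx, val_idx, test_idx = split_indices(150)
```

Shuffling before cutting matters: the iris rows are ordered by species, so an unshuffled split would put all setosa flowers in the training set.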
Supervised learning
EXAMPLE
Model 1 = Neural Network
- Estimate the model parameters using the training set
- Measure the model accuracy on the validation set
Model 2 = Decision Tree
- Estimate the model parameters using the training set
- Measure the model accuracy on the validation set
Model 3 = Random Forest
- Estimate the model parameters using the training set
- Measure the model accuracy on the validation set
Model selection
- Select the best model based on its validation accuracy
- Neural Network = Decision Tree
Model assessment
- Measure the best model's accuracy on the test set
THE END
Neural network for the iris data
How to build a neural network for the iris data?
[Diagram: input layer x1–x4 (Sepal Length, Sepal Width, Petal Length, Petal Width), one hidden layer, output layer y1–y3 (Prob. Setosa, Prob. Versicolor, Prob. Virginica)]
Neural network for the iris data
[Diagram: the same network with many hidden layers between input and output]
Deep Learning = More than 1 Hidden Layer
Neural network for the iris data
[Diagram: inputs x1–x4 are connected by weights w(1)11 … w(1)54 to hidden nodes z(1)1–z(1)5; an activation function maps these to z(2)1–z(2)5; weights w(2)11 … w(2)35 lead to z(3)1–z(3)3; a SoftMax converts these into the outputs y1–y3 (Prob. Setosa, Prob. Versicolor, Prob. Virginica)]
Neural network for the iris data
What is the mathematical formula
for this neural network?
Neural network for the iris data
[Diagram as above, with the formula built up layer by layer underneath]
Z(1) = W(1) X
Z(2) = ReLU(Z(1)) = ReLU(W(1) X)
Z(3) = W(2) Z(2) = W(2) ReLU(W(1) X)
Y = SoftMax(Z(3)) = SoftMax(W(2) ReLU(W(1) X))
Neural network for the iris data
In summary
[Diagram as above]
Y = f(X) = SoftMax(W(2) ReLU(W(1) X))
Forward propagation
In forward propagation, we go from X to Y by applying a series of functional transformations:
Y = SoftMax(W(2) ReLU(W(1) X))
Questions:
- How do we determine the weights W(1) and W(2)?
- Why do we need the activation function ReLU?
- Why do we use the SoftMax function at the end?
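The whole forward pass can be sketched in a few lines (shown in Python for illustration; the practicals use R). The helper names matvec, relu, softmax and forward are our own; the weight values are the ones printed on the later slides:

```python
import math

def matvec(W, x):
    # Matrix-vector product: (W x)_i = sum_j W[i][j] * x[j]
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    # ReLU(x) = max(0, x), applied element-wise
    return [max(0.0, x) for x in v]

def softmax(v):
    # Convert arbitrary real values into probabilities in (0, 1)
    exps = [math.exp(x) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def forward(W1, W2, x):
    # Y = SoftMax(W2 ReLU(W1 X))
    return softmax(matvec(W2, relu(matvec(W1, x))))

x = [5.1, 3.5, 1.4, 0.2]            # first iris flower (a setosa)
W1 = [[ 0.5,  0.1, -0.2, -0.4],     # 5 x 4 weight matrix W(1)
      [-0.4,  1.0,  0.5,  1.0],
      [-0.2, -0.2, -0.5, -0.1],
      [ 0.2,  0.7,  0.3,  0.2],
      [ 0.6,  0.6,  0.1, -0.4]]
W2 = [[ 0.6,  0.1,  0.9, -0.2, -0.5],   # 3 x 5 weight matrix W(2)
      [ 0.3, -0.3,  0.3, -0.9, -0.9],
      [ 0.3,  0.2,  0.4, -1.0,  0.6]]
y = forward(W1, W2, x)  # three class probabilities summing to 1
```

Whatever the weights, the output is always a valid probability vector, which is exactly what the SoftMax at the end guarantees.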
Weight matrices
How do we determine
the weight values?
Weight matrices
We want to choose the weights to make
the best predictions possible
Weight matrices
We will use an algorithm
to find these weights
Weight matrices
This algorithm needs
some initial weight values
Weight matrices
How do we choose
the initial weight values?
Weight matrices
The naive choice
Weight matrices
We set all the weights to zero
Weight matrices
All the predictions are the same:
P=1/3 for the three species
Weight matrices
The weights are all updated by the same amount
and thus can never become unequal
Weight matrices
The network cannot learn
Weight matrices
A good choice
Weight matrices
Random initialisation around 0
Weight matrices
We choose the weights according to
a normal distribution N (0, σ2)
Weight matrices
We choose the weights according to
a normal distribution N (0, σ2)
The value of σ2 affects the performance
(speed and accuracy)
Weight matrices
Two common choices:
σ² = 1 / N_in                (Xavier)
σ² = 2 / (N_in + N_out)      (Glorot & Bengio)
N_in = the number of nodes going into the weight matrix W
N_out = the number of nodes coming out of the weight matrix W
Weight matrices
Probabilistic framework
Linear model: Y = W X, i.e.
Y_i = Σ_{j=1}^{N_in} W_ij X_j ,   i = 1, …, N_out
Assumptions: X_j ~ N(0, σ_x²) and W_ij ~ N(0, σ²)
The condition V(Y_i) = V(X_j) leads to the values of σ² above: Xavier, and Glorot & Bengio (when also including backpropagation)
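The variance condition can be spelled out (assuming the X_j and W_ij are independent and zero-mean, as stated above):

```latex
V(Y_i) \;=\; \sum_{j=1}^{N_{\mathrm{in}}} V(W_{ij} X_j)
       \;=\; \sum_{j=1}^{N_{\mathrm{in}}} V(W_{ij})\, V(X_j)
       \;=\; N_{\mathrm{in}}\, \sigma^2 \sigma_x^2 .
```

Setting V(Y_i) = V(X_j) = σ_x² gives σ² = 1/N_in, the Xavier choice; the same argument applied to the gradients flowing backwards gives σ² = 1/N_out, and a compromise between the two constraints yields σ² = 2/(N_in + N_out) (Glorot & Bengio).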
Weight matrices
EXAMPLE
[Diagram: input layer x1–x4 fully connected by weights w(1)11 … w(1)54 to hidden layer z(1)1–z(1)5]
N_in = 4
N_out = 5
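Drawing the initial weights for this example can be sketched as follows (Python for illustration; xavier_init is a hypothetical helper name, and the seed is arbitrary):

```python
import math
import random

def xavier_init(n_in, n_out, seed=0):
    # Xavier initialisation: each weight drawn from N(0, sigma^2)
    # with sigma^2 = 1 / N_in
    sigma = math.sqrt(1.0 / n_in)
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_in)]
            for _ in range(n_out)]

# The 5 x 4 matrix W(1) of the example: N_in = 4 nodes in, N_out = 5 nodes out
W1 = xavier_init(n_in=4, n_out=5)
```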
STEP 1
Y = SoftMax(W(2) ReLU(W(1) X))
Input: the first flower of the table, X = (5.1, 3.5, 1.4, 0.2), a setosa.
Matrix multiplication (weights drawn with σ = 0.5):

W(1) X = [ 0.5   0.1  -0.2  -0.4 ]   [ 5.1 ]   [  5.4 ]
         [-0.4   1.0   0.5   1.0 ]   [ 3.5 ]   [  2.2 ]
         [-0.2  -0.2  -0.5  -0.1 ] · [ 1.4 ] = [  2.9 ]
         [ 0.2   0.7   0.3   0.2 ]   [ 0.2 ]   [ -1.7 ]
         [ 0.6   0.6   0.1  -0.4 ]             [ -2.4 ]
Activation function
Why do we need the activation function ReLU?
A neural network without any non-linear activation function would simply be a linear regression model.
There are several possible non-linear activation functions.
ReLU(x) = max(0, x) is very simple and very popular in deep neural networks.
STEP 2
Y = SoftMax(W(2) ReLU(W(1) X))
Activation function (ReLU):
ReLU(W(1) X) = ReLU( (5.4, 2.2, 2.9, -1.7, -2.4)ᵀ ) = (5.4, 2.2, 2.9, 0, 0)ᵀ
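This step can be checked in a couple of lines (Python for illustration; relu is our own helper name):

```python
def relu(v):
    # ReLU(x) = max(0, x), applied element-wise:
    # negative entries are clipped to zero, positive ones pass through
    return [max(0.0, x) for x in v]

z1 = [5.4, 2.2, 2.9, -1.7, -2.4]  # W(1) X from STEP 1
z2 = relu(z1)                     # -> [5.4, 2.2, 2.9, 0.0, 0.0]
```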
STEP 3
Y = SoftMax(W(2) ReLU(W(1) X))
Matrix multiplication (σ = 0.5):

W(2) ReLU(W(1) X) = [ 0.6   0.1   0.9  -0.2  -0.5 ]   [ 5.4 ]   [ 1.9 ]
                    [ 0.3  -0.3   0.3  -0.9  -0.9 ] · [ 2.2 ] = [ 0.3 ]
                    [ 0.3   0.2   0.4  -1.0   0.6 ]   [ 2.9 ]   [ 2.9 ]
                                                      [ 0   ]
                                                      [ 0   ]
SoftMax function
Why do we use the SoftMax function?
[Diagram: the network as before, with the SoftMax between the z(3) nodes and the outputs y1–y3]
The outputs Z(3) are arbitrary real values and we want to convert them into probabilities:
SoftMax(Z(3))_i = exp(z(3)_i) / ( exp(z(3)_1) + exp(z(3)_2) + exp(z(3)_3) ) = Prob. of class i, in (0, 1)
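As a short sketch (Python for illustration; softmax is our own helper name), applying this formula to the Z(3) values computed in STEP 3:

```python
import math

def softmax(v):
    # SoftMax(v)_i = exp(v_i) / sum_k exp(v_k)
    # Each output lies in (0, 1) and the outputs sum to 1
    exps = [math.exp(x) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

z3 = [1.9, 0.3, 2.9]  # W(2) ReLU(W(1) X) from STEP 3
probs = softmax(z3)   # roughly (0.25, 0.05, 0.70), the values used in STEP 4
```

Note that SoftMax preserves the ordering of its inputs: the largest z(3) entry always gets the largest probability.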
SoftMax function
Binary classification:
Empirical motivation: logistic regression, with the sigmoid function (1 output neuron)
[Figure: sigmoid curve, Wikipedia]
Multinomial logistic regression leads to the SoftMax function (K classes):
P_l(X) = exp(β_l X) / Σ_{k=1}^{K} exp(β_k X)
SoftMax(X)_l = exp(X_l) / Σ_{k=1}^{K} exp(X_k)
STEP 4
Y = SoftMax(W(2) ReLU(W(1) X))
SoftMax function:
SoftMax(W(2) ReLU(W(1) X)) = SoftMax( (1.9, 0.3, 2.9)ᵀ ) = (0.25, 0.05, 0.70)ᵀ
Prediction: Virginica. This is wrong! It is a Setosa.
Network training using gradient descent
What to do when
the prediction is wrong?