
Page 1: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches


PhD in Applied Mathematics

Past: Postdoctoral research on brain MRI segmentation

Current: Applied machine learning in materials science

Nataliya Portman

Postdoctoral Fellow, Faculty of Science, UOIT, Oshawa, ON, Canada

“AI with the best” online conference September 24, 2016

Page 2: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Overview

• Statistical versus machine learning:
  - Principles
  - Goals
  - Applications in biomedical sciences
• Automatic brain tissue classification of infant brain MRI (Montreal Neurological Institute):
  - Challenges of automated segmentation
  - Combined solution: kernel-based classifier + perceptual image quality model
• Conclusion

Page 3: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Statistical Learning

• Learning is a process of probabilistic inference
• Instance space X (quantities of interest, e.g., wind)
• Hypothesis space H (e.g., h1 = strong, h2 = weak)
• Training samples D (observed data, N recordings of wind)

P(h | D) = P(D | h) P(h) / P(D)


The Posterior: the probability that hypothesis h is true given the evidence D.

The Likelihood: the probability of getting the evidence D if the hypothesis h were true.

The Prior: the probability of h being true, before gathering evidence.

The Evidence: the marginal probability of the evidence (the probability of D over all possible hypotheses).

Common statistical learning methods:
• Bayesian
• Maximum a posteriori (MAP)
• Maximum likelihood
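To make the formula concrete, here is a minimal Python sketch of Bayes' rule for the wind example on this slide; the prior and likelihood values are hypothetical, chosen only for illustration.

```python
# Bayes' rule for the wind example: hypotheses h1 = "strong", h2 = "weak".
# All numbers are hypothetical, chosen only to illustrate the formula.
prior = {"strong": 0.3, "weak": 0.7}          # P(h)
likelihood = {"strong": 0.8, "weak": 0.1}     # P(D | h) for the observed recordings D

# The evidence: P(D) = sum over all hypotheses of P(D | h) P(h)
evidence = sum(likelihood[h] * prior[h] for h in prior)

# The posterior: P(h | D) = P(D | h) P(h) / P(D)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)   # {'strong': ~0.77, 'weak': ~0.23}
```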

Page 4: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Bayesian Learning

• An unknown quantity is a random variable
• Requires the hypothesis prior P(hi)
• Combines prior probabilities with observed data
• Predictions are made by using all the hypotheses weighted by their probabilities

Usually, a hypothesis determines a probability distribution over the unknown quantity of interest X (e.g., the parameters μ, σ of a Gaussian distribution).

The posterior: P(hi | D) = α P(D | hi) P(hi)

The predictive probability: P(X | D) = Σi P(X | D, hi) P(hi | D)
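A minimal sketch of Bayesian prediction over a discrete hypothesis space, continuing the wind example; the hypotheses, prior, and data counts below are hypothetical. The posterior is formed from prior and likelihood, and the predictive probability averages over all hypotheses weighted by their posterior probabilities.

```python
import numpy as np

# Each hypothesis h_i is a candidate probability that the next wind
# recording is "strong" (hypothetical values, for illustration only).
theta = np.array([0.2, 0.5, 0.8])       # hypotheses h_i
prior = np.array([1/3, 1/3, 1/3])       # P(h_i)

# Observed data D: 8 "strong" recordings out of N = 10 (hypothetical).
n_strong, n_total = 8, 10
likelihood = theta**n_strong * (1 - theta)**(n_total - n_strong)   # P(D | h_i)

# The posterior: P(h_i | D) = alpha * P(D | h_i) P(h_i)
posterior = likelihood * prior
posterior /= posterior.sum()            # alpha normalizes over all hypotheses

# The predictive probability: P(X = "strong" | D) = sum_i P(X | h_i) P(h_i | D)
predictive = np.sum(theta * posterior)
print(posterior, predictive)
```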

Page 5: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Bayesian Learning?

Page 6: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

MAP Learning

• For each hypothesis h in H, calculate the posterior probability

• Output the hypothesis h_MAP with the highest posterior probability

P(h | D) = P(D | h) P(h) / P(D)

h_MAP = argmax_{h ∈ H} P(h | D)


Page 7: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Maximum Likelihood Learning

P(h | D) = P(D | h) P(h) / P(D)

• Assumes the prior P(h) is uniform over the space of hypotheses H
• Chooses an h that maximizes P(D | h)
• Reasonable approach when there is no reason to prefer one hypothesis over another a priori
• A good approximation to MAP and Bayesian learning when the dataset is large

h_ML = argmax_{h ∈ H} P(D | h)
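As a hedged illustration of the last point (not an example from the slides), the sketch below estimates the mean of a Gaussian with known variance under a Gaussian prior: the ML estimate is the sample mean, the MAP estimate has a closed form, and the two converge as the dataset grows.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 1.0                 # known data standard deviation (assumed)
mu0, tau = 0.0, 1.0         # Gaussian prior on the mean: N(mu0, tau^2)

for n in (5, 50, 5000):
    x = rng.normal(loc=2.0, scale=sigma, size=n)    # simulated data D
    ml = x.mean()                                   # h_ML = argmax P(D | h)
    # h_MAP for a Gaussian likelihood with a Gaussian prior on the mean
    map_ = (mu0 / tau**2 + x.sum() / sigma**2) / (1 / tau**2 + n / sigma**2)
    print(f"N={n:5d}  ML={ml:.3f}  MAP={map_:.3f}")   # estimates converge as N grows
```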


Page 8: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

MAP Learning implementation

Distribution of grey level intensities of 3D adult brain MRI

• Training dataset D: 3D brain MR images
• Hypothesis space per voxel: {h1, h2, h3} with h1 = WM, h2 = GM, h3 = CSF
• Probability models of each tissue type: N(μk, σk), k = 1, 2, 3
• Tissue class priors: P(WM), P(GM), P(CSF)

Output: posterior probabilities (“soft” segmentation) P(xi,j,k = hl | D), l = 1, 2, 3

Decision boundaries
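A minimal sketch of the per-voxel “soft” segmentation described above; the Gaussian parameters and tissue class priors below are made-up placeholders, not values estimated from the MRI data.

```python
import numpy as np

# Hypothetical per-tissue intensity models N(mu_k, sigma_k) and priors
# P(WM), P(GM), P(CSF); real values would be fit to the training MR images.
classes = ["WM", "GM", "CSF"]
mu    = np.array([150.0, 100.0, 40.0])
sigma = np.array([12.0, 15.0, 10.0])
prior = np.array([0.40, 0.45, 0.15])

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def soft_segmentation(intensities):
    """Posterior P(tissue | intensity) for each voxel ("soft" labels)."""
    x = np.asarray(intensities, dtype=float)[:, None]    # shape (voxels, 1)
    post = gauss_pdf(x, mu, sigma) * prior                # likelihood * prior per class
    return post / post.sum(axis=1, keepdims=True)         # normalize per voxel

print(np.round(soft_segmentation([145.0, 95.0, 60.0]), 3))
```

Taking the argmax of each row would give the corresponding “hard” MAP label per voxel; the decision boundaries fall where neighbouring posteriors are equal.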


Page 9: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

MAP Learning: Expectation-Maximization Algorithm

We estimate initial tissue class priors:
• Interactively select representative voxels for each tissue type from each individual scan in the training dataset (and fit the Gaussians)
• Compute the ratios of each tissue class's voxels with respect to all the representative voxels in the training data.


Page 10: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

MAP Learning: Expectation-Maximization Algorithm

Expectation step, m-th iteration: compute

p(hj | xi,n,k, Φ(m)) = N(xi,n,k | hj, Φ(m)) P(hj)(m) / Σl N(xi,n,k | hl, Φ(m)) P(hl)(m),   where Φ(m) = {μj(m), σj(m)} for j = 1, 2, 3.

Maximization step: update the Gaussian parameters μk(m), σk(m) → μk(m+1), σk(m+1) corresponding to the new posterior distribution obtained at the expectation step, i.e., by maximizing E_{p(h | x, Φ(m))}[ln p(h | Φ)].

The resulting posteriors: P(WM | D), P(GM | D), P(CSF | D).

If D is the training dataset, then P(h | D) is a probabilistic brain atlas.
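A minimal sketch of these E and M steps for a one-dimensional, three-component Gaussian mixture on synthetic “intensities”; the class means, spreads, and sample sizes below are hypothetical, and the update runs over a flat list of voxels rather than a 3D volume.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 1-D "intensities" from three hypothetical tissue-like classes.
x = np.concatenate([rng.normal(40, 10, 300),     # CSF-like
                    rng.normal(100, 15, 500),    # GM-like
                    rng.normal(150, 12, 400)])   # WM-like

mu    = np.array([30.0, 90.0, 160.0])   # initial means
sigma = np.array([20.0, 20.0, 20.0])    # initial standard deviations
pi    = np.ones(3) / 3                  # initial tissue class priors

def gauss(x, m, s):
    return np.exp(-0.5 * ((x[:, None] - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibilities p(h_j | x_i, Phi^(m))
    resp = gauss(x, mu, sigma) * pi
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update Gaussian parameters and class priors
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)

print(np.round(mu, 1), np.round(sigma, 1), np.round(pi, 2))
```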

Page 11: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Clinical applications

• Statistical learning is used in diagnostic classification.
• Example: diagnostics in oncology (e.g., the diagnosis of a tumor as being “benign” or “malignant”).
• Relies on a logistic regression model of the conditional probability.
• Regression coefficients are estimated from a sample of N individuals with known covariate values x(n) = (x1(n), x2(n), …, xp(n)) and known class h(n) in {0, 1} via the minimization of a distance measure.

Odds(x) = P(h = 1 | D = x) / P(h = 0 | D = x) = exp(β0 + Σi=1..p βi xi)


G. Schwarzer et al., Statistics in Medicine, 2000
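A minimal sketch of the odds model above with hypothetical coefficients; in practice β0, …, βp would be estimated from the N patients as described.

```python
import numpy as np

# Hypothetical logistic-regression coefficients for p = 3 covariates
# (e.g., tumor features); real values would be fitted to patient data.
beta0 = -2.0
beta  = np.array([0.8, 1.5, -0.4])

def odds(x):
    """Odds(x) = P(h=1 | x) / P(h=0 | x) = exp(beta0 + sum_i beta_i x_i)."""
    return np.exp(beta0 + x @ beta)

def prob_malignant(x):
    o = odds(x)
    return o / (1.0 + o)          # logistic transform of the linear score

x_patient = np.array([1.2, 0.5, 2.0])   # hypothetical covariate values
print(odds(x_patient), prob_malignant(x_patient))
```

Each exp(βi) can be read as the multiplicative change in the odds per unit increase in covariate xi, which is why such models stay easy to interpret clinically.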

Page 12: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Clinical applications (Machine Learning?)

[Diagram: a neural network with inputs X1, X2, X3, X4, weights wij and Wi, and output h, modeling P(h = 1 | x) = f(x, w, W).]

Neural networks are another approach to modeling the conditional probability, using a logistic transfer function.

G. Schwarzer et al., Statistics in Medicine, 2000.

• Lacks an easy interpretation of NN model parameters

• Generates implausible functions

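For concreteness, a minimal sketch of the model form P(h = 1 | x) = f(x, w, W) with a logistic transfer function; the weights below are random placeholders rather than fitted values, and training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with a logistic transfer: P(h = 1 | x) = f(x, w, W).
# Weights are random placeholders here; in practice they are learned.
n_inputs, n_hidden = 4, 3
w = rng.normal(size=(n_hidden, n_inputs))   # input-to-hidden weights w_ij
W = rng.normal(size=n_hidden)               # hidden-to-output weights W_i

def f(x, w, W):
    hidden = sigmoid(w @ x)        # hidden-unit activations
    return sigmoid(W @ hidden)     # output probability P(h = 1 | x)

x = np.array([0.2, 1.0, -0.5, 0.7])   # covariates X1..X4
print(f(x, w, W))
```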

Page 13: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Given the training dataset of N observations of a K-dimensional feature vector X and the corresponding outcomes Y, learn a mapping f(X) that minimizes the loss L(Y, f(X)).

[Diagram: outcomes Y are related to inputs X through an unknown mechanism, which machine learning approximates with an algorithm.]

Machine learning
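A minimal sketch of this loss-minimization view: with synthetic data, a linear f, and squared loss (all illustrative assumptions), learning f reduces to optimizing its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: N observations of a K-dimensional feature vector X
# and outcomes Y (a hypothetical linear ground truth plus noise).
N, K = 200, 3
X = rng.normal(size=(N, K))
Y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

# Model f(X) = X w; loss L(Y, f(X)) = mean squared error.
w = np.zeros(K)
lr = 0.1
for _ in range(500):
    residual = X @ w - Y
    grad = 2.0 * X.T @ residual / N     # gradient of the loss w.r.t. w
    w -= lr * grad                      # gradient-descent step

print(np.round(w, 3))                   # close to the generating weights
```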

Page 14: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Machine learning

Modeling reduces to a problem of function optimization

Machine learning = algorithmic modeling

Target: find an algorithm that predicts the outcome for new samples outside of the training dataset

Algorithms:
• Support Vector Machines
• Artificial Neural Networks
• Convolutional Neural Networks
• Random Forests
• Boosting
• Decision Trees


Page 15: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

McConnell Brain Imaging Centre

Montreal Neurological Institute, McGill University

Postdoctoral fellow


Page 16: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

The NIH (National Institutes of Health) pediatric “Objective-2” MRI database is the largest demographically diverse U.S. sample, consisting of 69 subjects aged 10 days to 4.5 years.


Page 17: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI


Page 18: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Adult: noise + intensity non-uniformity + partial volume effect + natural tissue intensity variation

Child: the above, plus greater intensity variation due to myelination of WM

Brain tissue classification of infant brain MRI

Challenges with existing software:
• The CIVET pipeline (developed at the MNI) fails to perform accurate automatic classification into GM, WM and CSF
• General anatomical image processing pipelines such as FSL (Smith et al., 2004) and SPM (Ashburner, 1997) poorly detect major tissue classes in the NIH “Objective-2” dataset


Page 19: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

Three major (supervised) segmentation frameworks:
• Expectation-Maximization: [Van Leemput et al., 1999], [Tohka et al., 2004], [Prastawa and Gerig, 2004], [Xue et al., 2007], [Murgasova et al., 2007]
• Registration-based: [Collins et al., 1999], [Murgasova et al., 2007]
• Label Fusion: [Weisenfeld and Warfield, 2009]


Page 20: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Methodological limitations:
• Global estimation of tissue intensity distributions (EM, Label Fusion). Due to biological intensity variation and the Partial Volume Effect (PVE), tissue intensity distributions in infant MRI can differ from the Gaussian (EM).
• Supervised (atlas-dependent) approach that assumes small deviations from average brain anatomy (EM, Registration-based).

Brain tissue classification of infant brain MRI


Page 21: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Imagine… that we have built an intelligent machine (software) that effectively identifies brain structures with the same accuracy as our Human Visual System.

[Diagram: Human Visual System (HVS) and Computer Vision, both performing information extraction.]

Page 22: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

Classification machine requirements:
• Does not depend on a probabilistic brain atlas
• Does not assume global models of tissue intensity distributions
• Objectively evaluates the quality of classification as perceived by the Human Visual System
• Multichannel
• Flexible, can be extended to multiclass classification

Impact:
• Alleviates the agonizing pain of probabilistic atlas construction and manual segmentation
• Improves the accuracy of segmentation of child brain MRI
• Accelerates the rate of research in the field of early brain development
• Revolutionizes the field of MRI segmentation


Page 23: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Birth of a “Visionary”

Brain tissue classification of infant brain MRI

The “Visionary” is MATLAB software that accomplishes the challenging task of brain tissue classification in child brain MRI.

Perceptual image quality model: in the absence of “ground truth”, it tries to mimic human perception of the quality of classification using the Structural SIMilarity index (SSIM). The philosophy underlying the SSIM approach: the Human Visual System is highly adapted to extract structural information from images.

How is “Visionary” built?


Page 24: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

SSIM(x, y) = l(x, y) · c(x, y) · s(x, y)
           = [(2 μx μy + C1) / (μx² + μy² + C1)] · [(2 σx σy + C2) / (σx² + σy² + C2)] · [(σxy + C3) / (σx σy + C3)],

where
μx, μy - the local means of the corresponding image patches x and y,
σx, σy - the local standard deviations (respectively),
σxy - the local cross-covariance of x and y,
C1, C2, C3 - small positive constants to stabilize each term.

[Figure: Visionary-classified image vs. T1w template (08-11 months); MSSIM = 0.8614.]

MSSIM quantifies the degree of structural similarity between the input and classified images.

Brain tissue classification of infant brain MRI
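A minimal sketch of the SSIM formula above applied to two equally sized patches; the constants follow one common choice (an assumption, since the slides do not give values), and averaging this quantity over local windows of a full image would give the MSSIM used here.

```python
import numpy as np

def ssim_patch(x, y, L=255.0):
    """SSIM between two equally sized image patches x and y.

    Uses the luminance * contrast * structure form from the slide; the
    constants C1, C2, C3 follow one common choice (an assumption here).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    C3 = C2 / 2.0

    mu_x, mu_y = x.mean(), y.mean()
    sig_x, sig_y = x.std(), y.std()
    sig_xy = ((x - mu_x) * (y - mu_y)).mean()

    l = (2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)
    c = (2 * sig_x * sig_y + C2) / (sig_x**2 + sig_y**2 + C2)
    s = (sig_xy + C3) / (sig_x * sig_y + C3)
    return l * c * s

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(11, 11)).astype(float)
b = np.clip(a + rng.normal(0, 10, size=a.shape), 0, 255)   # noisy copy of a
print(ssim_patch(a, a), ssim_patch(a, b))   # 1.0 for identical patches, less for the noisy one
```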

Page 25: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Brain tissue classification of infant brain MRI

The choice of the reference depends on the age of the subject. T1w serves as a reference for MR brain data for ages 8 months and later.

Age: 02-05 months


Page 26: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Brain tissue classification of infant brain MRI

KFDA: Kernel Fisher Discriminant Analysis

Page 27: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Modified KFDA criterion:

The criterion includes a spatial regularization term in the feature space; K and H are the kernel and negative Laplacian matrices, and M and N are the between-class and within-class covariance matrices.

• A feature selection method (tissue intensities, morphological measurements, etc.) in machine learning
• The KFDA separability criterion measures the discriminating ability of a feature, or a subset of features, to distinguish between different classes.
• The power of KFDA lies in its generality (it does not assume multivariate probability models of the classes) and its closed-form (algebraic) solution.

Input and KFDA-classified data in stereotaxic and intensity spaces

Kernel Fisher Discriminant Analysis
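A minimal two-class sketch of the standard kernel Fisher discriminant (the Mika et al. formulation) on synthetic data; it omits the spatial regularization term of the modified criterion described above and is only meant to show the closed-form, kernel-based solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Two synthetic classes in a 2-D feature space (e.g., two intensity channels).
X0 = rng.normal([0.0, 0.0], 0.5, size=(40, 2))
X1 = rng.normal([1.5, 1.5], 0.5, size=(40, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 40 + [1] * 40)

K = rbf_kernel(X, X)                                # kernel matrix
m = [K[:, y == c].mean(axis=1) for c in (0, 1)]     # class kernel means

# Within-class scatter in the feature space
Nmat = np.zeros_like(K)
for c in (0, 1):
    Kc = K[:, y == c]
    lc = Kc.shape[1]
    Nmat += Kc @ (np.eye(lc) - np.ones((lc, lc)) / lc) @ Kc.T

# Closed-form discriminant direction: alpha = (N + reg*I)^(-1) (m1 - m0)
alpha = np.linalg.solve(Nmat + 1e-3 * np.eye(len(X)), m[1] - m[0])

# Project samples onto the discriminant and inspect class separation
proj = K @ alpha
print(proj[y == 0].mean(), proj[y == 1].mean())
```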


Page 28: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

Results for the brain template 08 to 11 months


Page 29: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

[Figure: WM, GM and CSF detection in the brain MRI template for ages 08 to 11 months; panels: T1w, PVE, Visionary; MSSIM = 0.8234 and MSSIM = 0.8537.]

Myelinated WM detection in the brain MRI template for ages 02 to 05 months.


Page 30: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Brain tissue classification of infant brain MRI

[Figure panels: T2w, PVE, Visionary (unmyelinated WM).]

Objective-2 template, age range: 02-05 months


Page 31: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Objective-2 template, age range: 44-60 months

[Figure panels: Reference, Initialization, Visionary (label transfer from an older brain).]

Brain tissue classification of infant brain MRI

Page 32: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman


Brain tissue classification of infant brain MRI

… still unpublished.

Page 33: The Art and Power of Data-Driven Modeling: Statistical and Machine Learning Approaches - Nataliya Portman

Conclusion

• Machine learning (ML) methods provide algorithmic models for an unknown mapping between predictor and outcome variables
• ML techniques are differently motivated: the goal is to forecast the outcome with acceptable accuracy and to be transferable to new datasets
• Statistical learning methods are focused on estimation of the probability distribution over the hypothesis space
• In biomedical applications, models that explain the data are preferable, as they allow one to reveal statistically significant influences of some covariates on the outcome
• In order to devise an appropriate method for data processing and analysis, one has to understand the data, namely the sources of noise and signal variation, and the mathematical assumptions of the inference methodology