statistical learning: some principles and applicationsadeline.e-samson.org › wp-content ›...
TRANSCRIPT
Statistical learning: some principles and applications
Adeline LECLERCQ SAMSON
Massih-Reza AMINI
A. Samson Grenoble, 7/06/2018 1 / 24
Challenges in Statistical Learning
Some vocabulary in statistical learning1. Learning a decision rule
I Prediction of an (labeled) outcome based on observed variablesI Prediction of the outcome from new observations
2. ClusteringI Creation of groups of similar individuals / objects / variablesI Research of patterns
3. Learning a model from the dataI Physical / biological modelsI Estimation of the parameters, shape of the model
4. Learning associations, statistical testsI Correlation between variablesI Interactions in a network
5. Data visualizationI Exploration of the data, outliers detection, errorsI Reduction of the dimension
A. Samson Grenoble, 7/06/2018 2 / 24
Challenges in Statistical Learning
Challenges in statistical learning
High dimensionI A lot of variables per individualI Aim: learn the effect of all these variables even when the number of
individuals / units remains small
Repeated measuresI Repeated trialsI Longitudinal measurements: several measures in timeI Aim: learn the variability in the process and the evolution in time
Structured dataI Connected objects: a function measured with high frequencyI Functional dataI Aim: learn a function (infinite dimension) and not only some parameters
A. Samson Grenoble, 7/06/2018 3 / 24
Challenges in Statistical Learning
Main approaches in statistical learning
Parametric versus Non Parametric
Non parametricI No prior knowledge of a model, of a relationship between variablesI Objectives: learn the model, the distribution of the variables, the network
ParametricI Models with meaningful parameters: chemical interactions, physical laws,
biological transformationI Prior models: linear regression, reliabilityI Objectives: numerical optimization of a criteria
Challenges
Parsimony: high dimension but maybe few significant signals
Optimization: new technics to optimize complex criteria/space
Distributed calcul: scalability of the solution
A. Samson Grenoble, 7/06/2018 4 / 24
1. Decision rule
1. Decision rule, classification
Supervised learning: class labelsare provided
Aim: learn a classifier to predictclass labels of novel data
Statistical toolsI Logistic regression
(parametric)I K-nearest neighbors
(non-parametric)I Decision tree
(non-parametric)
A. Samson Grenoble, 7/06/2018 5 / 24
1. Decision rule
Decision tree
V1 >= 0.5
V2 >= 0.75
V2 >= 0.67
V1 < 0.2
V2 >= 0.25
1126 3 1 3 2 0
214 355 10 10 3 0
33 2 152 2 3 0
43 5 2 125 0 0
54 1 1 1 98 0
61 0 2 1 3 64
yes no
V1 >= 0.5
V2 >= 0.75
V2 >= 0.87
V2 < 0.85
V1 < 0.75
V2 >= 0.85
V1 < 0.88
V1 < 0.84
V2 < 0.022
V1 >= 0.93
V2 >= 0.019
V1 >= 0.69
V1 < 0.67
V2 < 0.015
V2 >= 0.67
V1 < 0.2
V1 >= 0.2
V1 < 0.2
V1 < 0.13
V1 >= 0.13
V1 >= 0.099
V1 < 0.1
V2 >= 0.25
V1 >= 0.46
V1 < 0.47
V1 < 0.47
178 0 0 1 1 0
144 1 0 2 0 0
14 0 0 0 0 0
20 1 0 0 0 0
20 1 0 0 0 0
30 0 1 0 0 0
50 0 0 0 1 0
11 0 0 0 0 0
11 0 0 0 0 0
20 7 0 0 0 0
20 2 0 0 0 0
30 0 1 0 0 0
30 0 1 0 0 0
212 346 8 10 3 0
33 2 152 2 3 0
30 0 1 0 0 0
40 0 0 2 0 0
11 0 0 0 0 0
20 2 0 0 0 0
40 1 0 10 0 0
41 2 1 64 0 0
41 0 0 49 0 0
11 0 0 0 0 0
40 0 0 1 0 0
50 1 0 0 8 0
53 0 1 0 90 0
61 0 2 1 3 64
yes no
A. Samson Grenoble, 7/06/2018 6 / 24
1. Decision rule
Examples
Advanced personalized medicine
I Predict the best treatment from the knowledge of biomarkers measured at aninitial clinical visit
I Logistic regression and decision tree
A. Samson Grenoble, 7/06/2018 8 / 24
1. Decision rule
Examples
Manufacturing industryI High production costsI Some non conformity at the end of the
production processI Large number of sensors
I Decision tree to predict early in the production process which piece is likely tobe not conform
I Reduce the proportion of non conformityA. Samson Grenoble, 7/06/2018 9 / 24
2. Clustering
2. Clustering
Unsupervised learning: no classlabel is given
Aim
I Creation of groups of similarindividuals / objects /variables;
I Understanding the structureunderlying the data
Statistical toolsI K-means (non-parametric)I Mixture model (parametric)I Bi-clustering, Stochastic
Block Model (parametric)
A. Samson Grenoble, 7/06/2018 10 / 24
2. Clustering
Examples
Consumption curvesI A curve per consumerI Prediction of the future consumption
I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters
A. Samson Grenoble, 7/06/2018 11 / 24
2. Clustering
Examples
Consumption curvesI A curve per consumerI Prediction of the future consumption
I Clustering and identification of profiles by mixture model of functional dataI Prediction based on these clusters
A. Samson Grenoble, 7/06/2018 12 / 24
2. Clustering
Bi-clustering
A. Samson Grenoble, 7/06/2018 13 / 24
2. Clustering
Stochastic block model
A. Samson Grenoble, 7/06/2018 14 / 24
2. Clustering
Examples
Autonomous VehicleI Precision of the positionI Sequence of images
I Segmentation of the images by Stochastic Block ModelI Prediction of the position
A. Samson Grenoble, 7/06/2018 15 / 24
3. Learning a model
3. Learning a model
AimI Fit the data with a modelI Regression model
Statistical toolsI Differential equationsI Point processI Estimation of the parameters
(parametric)I Maximum likelihoodI BayesianI Penalization with high
dimensionI Estimation of the model
(non-parametric)I Splines, Wavelets, Fourier
A. Samson Grenoble, 7/06/2018 16 / 24
3. Learning a model
Examples
Immunotherapy
I Efficacy of the treatmentI Optimization of the treatment, in terms of dose and time to treatment
A. Samson Grenoble, 7/06/2018 17 / 24
3. Learning a model
Examples
Maintenance, reliabilityI Maintenance optimizationI Imperfect maintenance
I Virtual age modeling, point processesI Optimization of the next preventive maintenance
A. Samson Grenoble, 7/06/2018 18 / 24
4. Learning associations
4. Learning associations, statistical tests
AimI Correlation between
variables,I Learning communities
and networks
Statistical toolsI Correlation tests,
multiple testsI Graphical models
A. Samson Grenoble, 7/06/2018 19 / 24
4. Learning associations
Examples
GenomicsI Effect of the pollution on epigenetics
and baby growth
I Associations tests and multiple testingI Mediation to infer causality
A. Samson Grenoble, 7/06/2018 20 / 24
4. Learning associations
Examples
NeurosciencesI Understanding the connexions in the brainI Longitudinal, functional data through EEG, MEG data
A. Samson Grenoble, 7/06/2018 21 / 24
4. Learning associations
Statistical learning frameworks
Open source development frameworks available
Possible to use in industry
A. Samson Grenoble, 7/06/2018 22 / 24
4. Learning associations
Take-Home message
Core-idea of statistical learningI Variety of (industrial) problemsI Variety of statistical questionsI Variety of learning approachesI Machine learning: decision rule, clusteringI Statistical learning: model learning, association/correlation
A strategy that is effective across different disciplinesI HealthI EnvironmentI EnergyI MarketingI Manufacturing industry
A. Samson Grenoble, 7/06/2018 23 / 24
4. Learning associations
Some directions of ongoing research
Structured data, connected objects
Social networks, interaction graph
High dimension
Optimization with constraints
Distributed computation
A. Samson Grenoble, 7/06/2018 24 / 24