SSB Debate:
Model-based Inference vs. Machine Learning
June 3, 2018
SSB 2018 June 3, 2018 1 / 20
Machine learning in the biological sciences
Machine learning in the biological sciences...
...and in our meeting!
Introduction to topics and terminology
Machine learning:
A collection of algorithmic methods for pattern recognition, classification, and prediction that may be based on models derived from existing data
Model-based inference:
Techniques in which probability models are defined and then fit to a data set, with the goals of evaluating model fit, estimating parameters, or testing hypotheses
Example
Motivating problem:
Given a set of measurements on a sample of lizards, determine species-level memberships for the individuals
- Data:
For each lizard sampled, collect two continuous measurements: snout-to-vent length (SVL) and hind-leg length (HLL)
- Questions of interest:
How many species are represented in the sample?
To what species does each individual belong?
Given a new lizard, how can we assign it to a species using what we’ve learned from our initial sample?
Types of machine learning algorithms
Supervised learning: Lizards in the sample are associated with species labels
[Figure: scatter plot of the sample; x-axis snout-to-vent length, y-axis hind-leg length]
Types of machine learning algorithms
Supervised learning example 1: discriminant analysis
- Basic idea: Find a discriminant function for each group that can be evaluated at the observed variables and used to classify individuals into groups
- Example:
For these data, the linear discriminant function is
$$0.0695\,\mathrm{SVL} - 0.0591\,\mathrm{HLL}$$
Evaluating this for our two groups, we find
Green, long-tailed lizards   Blue, short-tailed lizards
-0.65035437                   1.71239124
-2.52041576                   2.13046090
-0.82433696                   0.57807884
-2.18249691                   2.01962806
-2.21079877                  -0.02934657
 0.11382142                   2.39276035
-0.77233604                   0.24294457
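The coefficients above were produced by software applied to the real sample. As a minimal sketch of how a two-group linear discriminant can be computed from scratch, the Python below applies Fisher's rule with equal priors: coefficients a = S⁻¹(m₁ − m₂) from the pooled within-group covariance S, with the cut-off placed at the midpoint of the two group means. The (SVL, HLL) pairs are hypothetical toy values, so the sketch will not reproduce the slide's numbers, and packages may scale the coefficients differently.

```python
# Two-group linear discriminant analysis (Fisher's rule, equal priors), pure Python.
# The measurements are HYPOTHETICAL (SVL, HLL) pairs, not the sample from the slides.

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_cov(groups):
    # Pooled within-group 2x2 covariance matrix.
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vec(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]

def inv2(a):
    # Inverse of a 2x2 matrix (assumed non-singular).
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def fit_lda(g1, g2):
    m1, m2 = mean_vec(g1), mean_vec(g2)
    s_inv = inv2(pooled_cov([g1, g2]))
    diff = (m1[0] - m2[0], m1[1] - m2[1])
    # Discriminant coefficients: a = S^-1 (m1 - m2).
    a = (s_inv[0][0] * diff[0] + s_inv[0][1] * diff[1],
         s_inv[1][0] * diff[0] + s_inv[1][1] * diff[1])
    # Cut-off: the discriminant evaluated at the midpoint of the two group means.
    mid = ((m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2)
    return a, a[0] * mid[0] + a[1] * mid[1]

def classify(x, a, cutoff):
    # Scores above the cut-off go to group 1, otherwise group 2.
    return 1 if a[0] * x[0] + a[1] * x[1] > cutoff else 2

group1 = [(50, 20), (52, 21), (48, 19), (51, 22), (49, 20)]
group2 = [(50, 26), (53, 27), (47, 24), (52, 28), (48, 25)]
a, cutoff = fit_lda(group1, group2)
```

With only two measured variables the 2×2 inverse is written out by hand; for more variables a linear-algebra library would be the natural choice.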
Example
[Figure: x-axis snout-to-vent length, y-axis hind-leg length]
Types of machine learning algorithms
Supervised learning example 2: decision trees
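To make the decision-tree idea concrete, here is a minimal sketch of the simplest possible tree: a single split (a "decision stump") chosen by scanning every variable and every midpoint threshold for the lowest weighted Gini impurity, the split criterion used by CART-style trees. A full tree would repeat this search recursively on each side of the split. The measurements and the species labels "A"/"B" are hypothetical.

```python
# One-level decision tree ("decision stump") chosen by Gini impurity.
# HYPOTHETICAL (SVL, HLL) data with species labels "A" and "B".

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(k) / n) ** 2 for k in set(labels))

def best_split(X, y):
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        raise ValueError("no split possible: every variable is constant")
    return best

def majority(labels):
    return max(set(labels), key=labels.count)

def fit_stump(X, y):
    _, j, t = best_split(X, y)
    left = [lab for row, lab in zip(X, y) if row[j] <= t]
    right = [lab for row, lab in zip(X, y) if row[j] > t]
    # Each leaf predicts the majority species among its training lizards.
    return j, t, majority(left), majority(right)

def predict_stump(stump, x):
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label

X = [(50, 20), (52, 21), (48, 19), (51, 22), (50, 26), (53, 27), (47, 24), (52, 28)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
stump = fit_stump(X, y)  # splits on HLL (index 1) at 23.0 for these toy data
```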
Example
[Figure: x-axis snout-to-vent length, y-axis hind-leg length]
Types of machine learning algorithms
Supervised learning example 3: Support Vector Machines (SVM)
- Basic idea: Find a line (or plane) that produces maximum separation between two groups of data
[Figure: x-axis snout-to-vent length, y-axis hind-leg length]
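One way to find such a maximum-margin line is sketched below, under simplifying assumptions: a soft-margin linear SVM fitted by minimizing the L2-regularized hinge loss with full-batch sub-gradient descent. Dedicated SVM solvers use other optimizers; the penalty `lam`, step size `lr`, epoch count, and the toy points (assumed already centred and scaled) are illustrative choices, not values from the slides.

```python
# Soft-margin linear SVM via full-batch sub-gradient descent on
#   lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))
# Labels must be +1 / -1. Toy, HYPOTHETICAL, already-scaled data.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    d = len(X[0])
    n = len(X)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_svm(w, b, x):
    # Which side of the separating line w . x + b = 0 the point falls on.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

Only the points closest to the line (the support vectors) keep contributing to the hinge term once the others clear the margin, which is why the fitted line depends mainly on those boundary points.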
Types of machine learning algorithms
Supervised learning: many, many other algorithms/methods exist
Examples:
- Neural networks
- k-Nearest Neighbor (k-NN) classifier
- Random forests
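Of the methods listed, the k-NN classifier is the easiest to state in a few lines: a new lizard gets the majority species among the k labelled lizards nearest to it. The sketch below assumes Euclidean distance on raw (SVL, HLL) values and k = 3; the training data and labels are hypothetical, and in practice variables are usually standardized first so that neither measurement dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    # Rank training lizards by Euclidean distance to the new one.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    # most_common(1) returns [(label, count)]; tied counts keep first-seen order.
    return votes.most_common(1)[0][0]

# HYPOTHETICAL (SVL, HLL) measurements with species labels.
train_X = [(50, 20), (52, 21), (48, 19), (51, 22), (50, 26), (53, 27), (47, 24), (52, 28)]
train_y = ["A", "A", "A", "A", "B", "B", "B", "B"]
new_label = knn_predict(train_X, train_y, (50, 21))
```

Unlike the discriminant or SVM sketches, nothing is fitted in advance: all the work happens when a new lizard is classified.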
Types of machine learning algorithms
Unsupervised learning: Lizards in the sample are NOT associated with species labels – species must be learned from the data
Example data:
[Figure: x-axis snout-to-vent length, y-axis hind-leg length]
Types of machine learning algorithms
Unsupervised learning example 1: k-means clustering
Basic idea: partition the data into k groups such that the sum of squared distances from the data points to their assigned cluster centers is minimized
Example: For our data with k = 2, the cluster assignments are
Green, long-tailed lizards   Blue, short-tailed lizards
2                            2
1                            2
1                            2
1                            2
1                            2
2                            2
2                            2
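The usual way to attack the k-means objective is Lloyd's algorithm, which alternates an assignment step (each point joins its nearest center) and an update step (each center moves to the mean of its members) until nothing changes. It reaches a local, not necessarily global, minimum of the within-cluster sum of squares. The sketch below assumes a deterministic start (the first k points) for reproducibility; real implementations use random or k-means++ starts with several restarts. The points are hypothetical, and cluster numbers are arbitrary labels, as in any k-means output.

```python
import math

def kmeans(points, k, max_iter=100):
    # Lloyd's algorithm. For reproducibility the first k points serve as the
    # initial centres (a simplification of the usual random / k-means++ starts).
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # leave an empty cluster's centre in place
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    # Within-cluster sum of squared distances, the quantity k-means tries to minimise.
    wss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wss

# HYPOTHETICAL two-dimensional points forming two obvious groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centers, wss = kmeans(points, k=2)
```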
Example
[Figure: x-axis snout-to-vent length, y-axis hind-leg length]
Types of machine learning algorithms
Unsupervised learning example 2: Principal components analysis
Basic idea: use an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
[Figure: x-axis PCA1 (55%), y-axis PCA2 (45%)]
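With two measured variables, the orthogonal transformation can be written in closed form: the principal component directions are the eigenvectors of the 2×2 sample covariance matrix, and each eigenvalue's share of the total is the proportion of variance that component explains (percentages attached to component axes are typically this quantity). The sketch below is a hypothetical illustration in pure Python; the points lie exactly on a line, so the first component should carry all of the variance.

```python
import math

def pca_2d(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form (lam1 >= lam2).
    half_tr = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_tr + disc, half_tr - disc
    total = lam1 + lam2
    if total == 0:
        raise ValueError("all points are identical")
    # Unit eigenvector for lam1 (PC1); PC2 is the orthogonal direction.
    if b != 0:
        v1 = (lam1 - c, b)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])
    # Scores: coordinates of each centred point along PC1 and PC2.
    scores = [((p[0] - mx) * v1[0] + (p[1] - my) * v1[1],
               (p[0] - mx) * v2[0] + (p[1] - my) * v2[1]) for p in points]
    return scores, (lam1 / total, lam2 / total)

# HYPOTHETICAL points lying on the line y = 2x.
points = [(0, 0), (1, 2), (2, 4), (3, 6)]
scores, explained = pca_2d(points)
```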
Types of machine learning algorithms
Unsupervised learning: many, many other algorithms/methods exist
Examples:
- Hierarchical clustering
- Self-organizing map (SOM)
Model-based methods
Basic idea: Write a formal model and fit the model to the data
The assumption is that the model describes the data sufficiently well that the desired insights can be gained through parameter estimation (Schrider and Kern 2017)
Examples:
- Maximum likelihood
- Bayesian methods
- Discriminant analysis ??
- Logistic regression ??
In many cases, a method has elements of both machine learning and model-based inference, and methods are difficult to classify!
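A minimal sketch of the model-based recipe applied to the lizard problem: write down a probability model, estimate its parameters by maximum likelihood, then use the fitted likelihoods to answer the question. Here the model is an assumption chosen for brevity: within each species, SVL and HLL are independent and normally distributed, so the maximum-likelihood estimates are each variable's sample mean and its 1/n variance. A new lizard is assigned to the species under whose fitted model it is most likely. All measurements are hypothetical.

```python
import math

def fit_normal_mle(values):
    # MLE for a normal distribution: sample mean and the (biased, 1/n) variance.
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

def normal_logpdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit_species_model(rows):
    # Assumed model: each measurement column is independent and normal within a species.
    return [fit_normal_mle([r[j] for r in rows]) for j in range(len(rows[0]))]

def log_likelihood(model, x):
    return sum(normal_logpdf(xj, mu, var) for xj, (mu, var) in zip(x, model))

def assign(models, x):
    # Pick the species whose fitted model gives the new lizard the highest likelihood.
    return max(models, key=lambda name: log_likelihood(models[name], x))

# HYPOTHETICAL (SVL, HLL) measurements for two labelled species.
species_a = [(50, 20), (52, 21), (48, 19), (51, 22), (49, 20)]
species_b = [(50, 26), (53, 27), (47, 24), (52, 28), (48, 25)]
models = {"A": fit_species_model(species_a), "B": fit_species_model(species_b)}
```

Because the fitted parameters (means and variances) are explicit, the same model can also be used to compare model fit or test hypotheses about them, which is the emphasis of model-based inference.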
Machine learning vs. model-based methods
Now that we’ve got the basics down, we can make some broad comparisons
Machine learning methods
- Most often used for classification and prediction
- Useful when there are LOTS of data (training set, test set, pattern extraction)
- Useful when the number of predictor variables is much larger than the number of observations
Model-based methods
- Most often used for inference: parameter estimation and hypothesis testing
- Model fitting may become difficult for large data sets / complex models
- Methods perform best when the number of observations is larger than the number of predictors
Introducing our panelists
Cecile Ane
Professor, Departments of Statistics and Botany
University of Wisconsin-Madison
Machine learning
Joe Rusinko
Associate Professor, Department of Mathematics and Computer Science
Hobart and William Smith Colleges
Model-based methods