SSB Debate:

Model-based Inference vs. Machine Learning

June 3, 2018

SSB 2018 June 3, 2018 1 / 20

Machine learning in the biological sciences

... and in our meeting!

Introduction to topics and terminology

Machine learning:

Collection of algorithmic methods for pattern recognition, classification, and prediction, that may be based on models derived from existing data

Model-based inference:

Techniques in which probability models are defined and then fit to a data set, with the goals of evaluating model fit, estimating parameters, or testing hypotheses

Example

Motivating problem:

Given a set of measurements on a sample of lizards, determine species-level memberships for the individuals

- Data:

For each lizard sampled, collect two continuous measurements: snout-to-vent length (SVL) and hind-leg length (HLL)

- Questions of interest:

How many species are represented in the sample?

To what species does each individual belong?

Given a new lizard, how can we classify it into a species given what we’ve learned from our initial sample?

Types of machine learning algorithms

Supervised learning: Lizards in the sample are associated with species labels

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]

Types of machine learning algorithms

Supervised learning example 1: discriminant analysis

- Basic idea: Find a discriminant function for each group that can be evaluated at the observed variables and used to classify individuals into groups

- Example:

For these data, the linear discriminant function is

$0.0695\,\mathrm{SVL} - 0.0591\,\mathrm{HLL}$

Evaluating this for our two groups, we find

Green, long-tailed lizards    Blue, short-tailed lizards
-0.65035437                    1.71239124
-2.52041576                    2.13046090
-0.82433696                    0.57807884
-2.18249691                    2.01962806
-2.21079877                   -0.02934657
 0.11382142                    2.39276035
-0.77233604                    0.24294457
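As a rough illustration of how a discriminant function can be turned into a classifier, the sketch below evaluates the linear function from the slide and applies a sign rule to the fourteen scores listed above. The cutoff of 0 and the mapping of negative scores to the green group are assumptions made here for illustration; the slides do not state the classification rule. The function and variable names are hypothetical.

```python
# Coefficients of the linear discriminant function shown on the slide.
SVL_COEF = 0.0695
HLL_COEF = -0.0591


def discriminant_score(svl, hll):
    """Evaluate the linear discriminant function at one lizard's measurements."""
    return SVL_COEF * svl + HLL_COEF * hll


def classify(score, cutoff=0.0):
    """Assumed rule (not stated on the slide): scores below the cutoff are 'green'."""
    return "green" if score < cutoff else "blue"


# Discriminant scores reported on the slide for the two labeled groups.
green_scores = [-0.65035437, -2.52041576, -0.82433696, -2.18249691,
                -2.21079877, 0.11382142, -0.77233604]
blue_scores = [1.71239124, 2.13046090, 0.57807884, 2.01962806,
               -0.02934657, 2.39276035, 0.24294457]

# Count how many lizards in each group land on their own side of the cutoff.
green_correct = sum(classify(s) == "green" for s in green_scores)
blue_correct = sum(classify(s) == "blue" for s in blue_scores)
print(green_correct, "of", len(green_scores), "green lizards classified as green")
print(blue_correct, "of", len(blue_scores), "blue lizards classified as blue")
```

Under this assumed rule, one lizard in each group falls on the wrong side of the cutoff, which is the kind of overlap a discriminant function cannot remove when the groups are not perfectly separated.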

Example

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]

Types of machine learning algorithms

Supervised learning example 2: decision trees

Example

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]

Types of machine learning algorithms

Supervised learning example 3: Support Vector Machines (SVM)

- Basic idea: Find a line (or plane) that produces maximum separation between two groups of data

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]
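To make "maximum separation" concrete, the sketch below computes the geometric margin of a candidate separating line, that is, the distance from the line to the closest data point. It is not an SVM solver: it only compares two hand-picked lines on hypothetical measurements, and an SVM would prefer whichever separating line has the larger margin. All data and names are illustrative assumptions.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    Labels are +1 or -1. A negative result means at least one point
    lies on the wrong side of the line.
    """
    norm = math.hypot(w[0], w[1])
    return min(
        y * (w[0] * x[0] + w[1] * x[1] + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical (SVL, HLL) measurements for two labeled groups.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 3.0)]
labels = [-1, -1, +1, +1]

# Two candidate lines, each given as (w, b); both separate the groups.
line_a = ((1.0, 0.0), -3.0)  # vertical line SVL = 3
line_b = ((1.0, 1.0), -5.0)  # diagonal line SVL + HLL = 5

margin_a = geometric_margin(*line_a, points, labels)
margin_b = geometric_margin(*line_b, points, labels)

# The line with the larger margin gives the wider separation.
best = "a" if margin_a > margin_b else "b"
print("margin a:", margin_a, "margin b:", margin_b, "preferred:", best)
```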

Types of machine learning algorithms

Supervised learning: many, many other algorithms/methods exist

Examples:

- Neural networks

- k-Nearest Neighbor (k-NN) classifier

- Random forests
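Of the methods listed, the k-NN classifier is the simplest to write down, and it speaks directly to the question of how to classify a new lizard from a labeled sample. The sketch below is a minimal version using Euclidean distance and a majority vote; the measurements, labels, and the choice k = 3 are hypothetical.

```python
import math
from collections import Counter


def knn_classify(new_point, training_points, training_labels, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(training_points, training_labels),
        key=lambda pair: math.dist(new_point, pair[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled sample of (SVL, HLL) measurements.
training_points = [(5.0, 2.0), (5.5, 2.2), (6.0, 2.1),
                   (7.5, 4.0), (8.0, 4.2), (8.5, 4.1)]
training_labels = ["green", "green", "green", "blue", "blue", "blue"]

# Classify two new lizards from their measurements.
label_small = knn_classify((5.8, 2.3), training_points, training_labels)
label_large = knn_classify((7.9, 3.9), training_points, training_labels)
print(label_small, label_large)
```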

Types of machine learning algorithms

Unsupervised learning: Lizards in the sample are NOT associated with species labels – species must be learned from the data

Example data:

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]

Types of machine learning algorithms

Unsupervised learning example 1: k-means clustering

Basic idea: partition the data into k groups such that the sum of squared distances from the data points to their assigned cluster centers is minimized

Example: For our data with k = 2, we have the following cluster assignments

Green, long-tailed lizards    Blue, short-tailed lizards
2                             2
1                             2
1                             2
1                             2
1                             2
2                             2
2                             2
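The partitioning idea can be sketched with Lloyd's algorithm, the usual iterative heuristic for k-means: assign each point to its nearest center, move each center to the mean of its points, and repeat until the assignments stop changing. This is an assumption about the procedure, since the slides do not say which algorithm produced the assignments above, and it only finds a local minimum of the sum of squares. The data and starting centers are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center updates until stable."""
    centers = list(initial_centers)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignments = [
            min(range(len(centers)),
                key=lambda j: squared_distance(point, centers[j]))
            for point in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: each center moves to the mean of its assigned points.
        for j in range(len(centers)):
            members = [pt for pt, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return assignments, centers


# Hypothetical unlabeled (SVL, HLL) measurements.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]

# k = 2, starting from two of the observed points.
assignments, centers = kmeans(points, [points[0], points[3]])
total_ss = sum(squared_distance(pt, centers[a])
               for pt, a in zip(points, assignments))
print(assignments, centers, total_ss)
```

The cluster numbers themselves are arbitrary: swapping the two starting centers would swap the labels without changing the partition.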

Example

[Figure: Hind-leg Length (y-axis) vs. Snout-to-Vent Length (x-axis)]

Types of machine learning algorithms

Unsupervised learning example 2: Principal components analysis

Basic idea: use an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components

[Figure: PCA2 (45%) on the y-axis vs. PCA1 (55%) on the x-axis]
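For two variables, the orthogonal transformation is just a rotation of the centered data, and the rotation angle has a closed form in terms of the covariance matrix. The sketch below uses that closed form to compute principal component scores and the share of variance on each axis. It centers the variables but does not rescale them, which is an assumption; the slides do not say how the percentages in the figure were obtained. The data are hypothetical.

```python
import math


def pca_2d(points):
    """Rotate centered 2-D data onto its principal axes.

    Returns the component scores and the fraction of variance on each axis.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Orthogonal transformation (rotation) of the centered data.
    scores = [(cos_t * x + sin_t * y, -sin_t * x + cos_t * y)
              for x, y in centered]
    var1 = sum(s1 * s1 for s1, _ in scores) / (n - 1)
    var2 = sum(s2 * s2 for _, s2 in scores) / (n - 1)
    total = var1 + var2
    return scores, (var1 / total, var2 / total)


# Hypothetical (SVL, HLL) measurements with correlated variables.
measurements = [(5.0, 2.0), (5.5, 2.3), (6.0, 2.1),
                (7.5, 4.0), (8.0, 4.2), (8.5, 4.1)]
scores, explained = pca_2d(measurements)

# The component scores should be uncorrelated (covariance near zero).
score_cov = sum(s1 * s2 for s1, s2 in scores) / (len(scores) - 1)
print(explained, score_cov)
```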

Types of machine learning algorithms

Unsupervised learning: many, many other algorithms/methods exist

Examples:

- Hierarchical clustering

- Self-organizing map (SOM)

Model-based methods

Basic idea: Write a formal model and fit the model to the data

Assumption is that the model describes the data sufficiently well that the desired insights can be made through parameter estimation (Schrider and Kern 2017)

Examples:

- Maximum likelihood

- Bayesian methods

- Discriminant analysis ??

- Logistic regression ??

In many cases, a method has elements of both machine learning and model-based inference, and methods are difficult to classify!
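As a small instance of "write a formal model and fit the model to the data", the sketch below assumes the measurements are independent draws from a normal distribution and fits that model by maximum likelihood. The normal model and the snout-to-vent lengths are illustrative assumptions, not taken from the slides.

```python
import math


def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of independent normal observations with mean mu and variance sigma2."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))


def normal_mle(data):
    """Closed-form maximum likelihood estimates for the normal model."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


# Hypothetical snout-to-vent lengths for one species.
svl = [5.0, 5.5, 6.0, 6.5, 7.0]
mu_hat, sigma2_hat = normal_mle(svl)
loglik_at_mle = normal_log_likelihood(svl, mu_hat, sigma2_hat)
print(mu_hat, sigma2_hat, loglik_at_mle)
```

Once fitted, the same log-likelihood function can be evaluated at other parameter values, which is the basis for the model comparison and hypothesis testing mentioned in the definition of model-based inference.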

Machine learning vs. model-based methods

Now that we’ve got the basics down, we can make some broad comparisons

Machine learning methods

- Most often used for classification and prediction

- Useful when there are LOTS of data (training set, test set, pattern extraction)

- Useful when the number of predictor variables is much larger than the number of observations

Model-based methods

- Most often used for inference: parameter estimation and hypothesis testing

- Model fitting may become difficult for large data sets / complex models

- Methods perform best when the number of observations is larger than the number of predictors
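One way to see why model-based fitting prefers more observations than predictors: in a linear model with more coefficients than data points, different coefficient vectors can reproduce the observations exactly, so the data alone cannot identify the parameters. The toy design matrix and coefficients below are hypothetical and chosen so the arithmetic is exact.

```python
def predict(X, beta):
    """Fitted values of a linear model: one dot product per observation."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Two observations, three predictors (more predictors than observations).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

# Two different coefficient vectors.
beta_a = [1.0, 2.0, 0.0]
beta_b = [0.0, 1.0, 1.0]

# Both fit the observed responses perfectly.
fits_a = predict(X, beta_a) == y
fits_b = predict(X, beta_b) == y
print(fits_a, fits_b)
```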

Introducing our panelists

Cecile Ane

Professor, Departments of Statistics and Botany

University of Wisconsin-Madison

Machine learning

Joe Rusinko

Associate Professor, Department of Mathematics and Computer Science

Hobart and William Smith Colleges

Model-based methods
