Machine Learning with MATLAB
DESCRIPTION
Machine learning examples using MATLAB
TRANSCRIPT
1 © 2013 The MathWorks, Inc.
Machine Learning with MATLAB
Stefan Duprey, Application Engineer
2
What You Will Learn
Overview of machine learning
Algorithms available with MATLAB
MATLAB as an interactive environment
for evaluating and choosing the best algorithm
3
Machine Learning Basic Concepts
Start with an initial set of data
“Learn” from this data
– “Train” your algorithm
with this data
Use the resulting model
to predict outcomes
for new data sets
[Figure: scatter plot of the sample dataset, colored by Group1–Group8]
4
Machine Learning: Characteristics and Examples
Characteristics
– Lots of data (many variables)
– System too complex to know
the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Figure: credit rating example (ratings AAA–D). Each row shows, in percent, how observations with that rating are distributed across the rating classes (rows sum to 100%):]
         AAA     AA      A       BBB     BB      B       CCC     D
AAA      93.68   5.55    0.59    0.18    0.00    0.00    0.00    0.00
AA       2.44    92.60   4.03    0.73    0.15    0.00    0.00    0.06
A        0.14    4.18    91.02   3.90    0.60    0.08    0.00    0.08
BBB      0.03    0.23    7.49    87.86   3.78    0.39    0.06    0.16
BB       0.03    0.12    0.73    8.27    86.74   3.28    0.18    0.64
B        0.00    0.00    0.11    0.82    9.64    85.37   2.41    1.64
CCC      0.00    0.00    0.00    0.37    1.84    6.24    81.88   9.67
D        0.00    0.00    0.00    0.00    0.00    0.00    0.00    100.00
5
Model Development Process
Exploration → Modeling → Evaluation → Deployment
6
Exploratory Data Analysis
Gain insight from visual examination
– Identify trends and interactions
– Detect patterns
– Remove outliers
– Shrink data
– Select and pare predictors
– Feature transformation
[Figure: scatter plot matrix of MPG, Acceleration, Displacement, Weight, and Horsepower]
7
Data Exploration: Interactions Between Variables
Plot Matrix by Group – Parallel Coordinates Plot – Andrews’ Plot – Glyph Plot – Chernoff Faces
[Figures: the five plot types above, drawn for the car dataset (MPG, Acceleration, Displacement, Weight, Horsepower), with individual car models labeled on the glyph and face plots]
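As a hedged sketch of how plots like these can be produced in MATLAB: the example below uses the carsmall demo dataset that ships with Statistics Toolbox (an assumption for illustration; the webinar's exact data and code are not shown in the slides).

load carsmall                                    % demo car dataset (Statistics Toolbox)
X      = [MPG Acceleration Displacement Weight Horsepower];
labels = {'MPG','Acceleration','Displacement','Weight','Horsepower'};
ok     = all(~isnan(X),2);                       % drop observations with missing values
X = X(ok,:);  grp = Cylinders(ok);  names = Model(ok,:);

figure, gplotmatrix(X,[],grp,[],[],[],[],'variable',labels)          % plot matrix by group
figure, parallelcoords(X,'Group',grp,'Labels',labels)                % parallel coordinates plot
figure, andrewsplot(X,'Group',grp)                                   % Andrews' plot
figure, glyphplot(X(1:9,:),'ObsLabels',names(1:9,:))                 % glyph (star) plot
figure, glyphplot(X(1:9,:),'Glyph','face','ObsLabels',names(1:9,:))  % Chernoff faces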
8
Machine Learning Overview: Types of Learning, Categories of Algorithms
Machine learning splits into two types of learning, each with its own categories of algorithms:
– Unsupervised Learning (group and interpret data based only on input data): Clustering
– Supervised Learning (develop a predictive model based on both input and output data): Classification, Regression
9
Unsupervised Learning: Clustering
Group and interpret data based only on input data
Clustering algorithms: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
10
Clustering Overview
What is clustering?
– Segment data into groups,
based on data similarity
Why use clustering?
– Identify outliers
– Resulting groups may be
the matter of interest
How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)
11
Dataset We’ll Be Using
Cloud of randomly generated points
– Each cluster center is randomly chosen inside specified bounds
– Each cluster contains the specified number of points per cluster
– Each cluster point is sampled from a Gaussian distribution
– Multi-dimensional dataset
[Figure: scatter plot of the generated dataset, colored by Group1–Group8]
12
Example: Cluster Analysis with K-Means
K-means is a partitioning method
Partitions data into k mutually
exclusive clusters
Each cluster has a
centroid (or center)
– Sum of distances from
all objects to the center
is minimized
Statistics Toolbox
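A minimal sketch of k-means clustering, assuming a small synthetic 2-D point cloud called data (the webinar's randomly generated dataset is not reproduced here):

rng(1)                                                        % for reproducibility
data = [0.2 + 0.05*randn(100,2); 0.5 + 0.05*randn(100,2)];    % two synthetic blobs
k    = 2;
[idx, centroids] = kmeans(data, k, 'Replicates', 5);          % restarts help avoid poor local minima

figure
gscatter(data(:,1), data(:,2), idx)                           % points colored by assigned cluster
hold on
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)   % cluster centers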
13
Distance Metrics & Group Quality
Choice of distance measure
– Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); % pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') % not all metrics are supported by kmeans
Create silhouette plots to assess cluster quality
>> silhouette(data,clusters)
Euclidean distance – the default
Cityblock distance – useful for discrete variables
Cosine distance – useful for clustering variables
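Putting the snippets above together, one way (a sketch, reusing the data and k assumed in the k-means example) to compare distance metrics via silhouette values:

idxEuc  = kmeans(data, k);                                % default (squared Euclidean) distance
idxCity = kmeans(data, k, 'distance', 'cityblock');       % L1 (city block) distance

figure, sEuc  = silhouette(data, idxEuc);                 % silhouette plot, Euclidean clusters
figure, sCity = silhouette(data, idxCity, 'cityblock');   % silhouette plot, city block clusters
fprintf('Mean silhouette: %.3f (Euclidean) vs %.3f (cityblock)\n', mean(sEuc), mean(sCity))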
14
Clustering: Neural Network
Networks are composed of one or more layers
Outputs are computed by applying a nonlinear transfer function to a weighted sum of the inputs
Trained by letting the network continually adjust itself to new inputs (this determines the weights)
[Diagram: input variables, weights, bias, transfer function, output variable]
15
Clustering: Neural Network
Neural Network Toolbox provides interactive apps for easily creating and training networks
Multi-layered networks are created by cascading layers (and can provide better accuracy)
Example architectures for clustering:
– Self-organizing maps
– Competitive layers
Neural Network Toolbox
16
Self-Organizing Map (SOM) Neural Net: How It Works
Start with a regular grid of “neurons” laid over the dataset
The size of the grid determines the number of clusters
Neurons compete to recognize data points (by being close to them)
Winning neurons are moved closer to the data points
Repeat until convergence (a MATLAB sketch follows the figure below)
[Figures: SOM weight positions (Weight 1 vs. Weight 2) over the data, before and after training]
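A hedged sketch of SOM clustering with Neural Network Toolbox's selforgmap, again assuming the 2-D matrix data from the earlier clustering examples:

x   = data';                          % Neural Network Toolbox expects observations in columns
net = selforgmap([3 3]);              % 3-by-3 grid of neurons, i.e. up to 9 clusters
net = train(net, x);                  % competitive training moves neurons toward the data

y          = net(x);                  % one-hot output: which neuron "won" each point
clusterIdx = vec2ind(y);              % convert to cluster indices 1..9
figure, plotsompos(net, x)            % SOM weight positions overlaid on the data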
17
Gaussian Mixture Models
Good when clusters have different sizes and are correlated
Assumes that the data is drawn from a fixed number k of normal distributions
Statistics Toolbox
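A minimal sketch using the R2013-era gmdistribution.fit syntax, assuming data and k as in the earlier examples:

gm  = gmdistribution.fit(data, k, 'Replicates', 5);    % fit a mixture of k Gaussians
idx = cluster(gm, data);                               % hard assignment to the most likely component
p   = posterior(gm, data);                             % soft (probabilistic) memberships

figure
gscatter(data(:,1), data(:,2), idx)                    % points colored by component
hold on
ezcontour(@(x,y) pdf(gm, [x y]), [0 1 0 1])            % contours of the fitted mixture density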
18
Cluster Analysis Summary
Segments data into groups, based on data similarity
No method is perfect (depends on data)
Process is iterative; explore different algorithms
Beware of local minima (global optimization can help)
Clustering algorithms covered: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
19
Model Development Process
Exploration → Modeling → Evaluation → Deployment
20
Supervised Learning: Classification for Predictive Modeling
Develop a predictive model based on both input and output data
Classification algorithms: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
21
Classification Overview
What is classification?
– Predicting the best group for each point
– “Learns” from labeled observations
– Uses input features
Why use classification?
– Accurately group data never seen before
How is classification done?
– Can use several algorithms to build a predictive model
– Good training data is critical
[Figure: scatter plot of labeled training data, colored by Group1–Group8]
22
Example: Classification with Decision Trees
Builds a tree from training data
– Model is a tree where each node is a
logical test on a predictor
Traverse tree by comparing
features with threshold values
The “leaf” of the tree
specifies the group
Statistics Toolbox
23
Ensemble Learners Overview
Decision trees are “weak” learners
– Good at classifying the data they were trained on
– Often not very good with new data
– Note the rectangular group regions in the figure
What are ensemble learners?
– Combine many decision trees to create a “strong” learner
– Uses “bootstrap aggregation” (bagging)
Why use ensemble methods?
– Classifier has better predictive power
– Note improvement in cluster shapes
Statistics Toolbox
[Figure: decision-tree class regions (note the rectangular boundaries) in the (x1, x2) plane for group1–group8]
24
Decision Trees: How Do I Build Them with MATLAB?
Build a tree model:
>> tree = classregtree(x,y);
>> view(tree)
Evaluate the model on new data:
>> tree(x_new)
[Figure: decision-tree class regions in the (x1, x2) plane for group1–group8]
Statistics Toolbox
25
Enhancing the Model: Ensemble Learning
Combine weak learners into a stronger learner:
>> ens = fitensemble(x,y,'AdaBoostM2',200,'Tree');
Bootstrap-aggregated (“bagged”) trees, i.e., a forest:
>> ens = fitensemble(x,y,'Bag',200,'Tree','type','classification');
>> y_pred = predict(ens,x);
Visualize class boundaries (see the sketch below)
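One way to visualize the class boundaries (a sketch, assuming 2-D predictors x and labels y as above) is to predict on a dense grid:

[x1g, x2g] = meshgrid(linspace(min(x(:,1)), max(x(:,1)), 200), ...
                      linspace(min(x(:,2)), max(x(:,2)), 200));
gridPred = predict(ens, [x1g(:) x2g(:)]);     % predicted class for every grid point

figure
gscatter(x1g(:), x2g(:), gridPred)            % colored class regions
hold on
gscatter(x(:,1), x(:,2), y, 'k', '.')         % overlay the training points
xlabel('x1'), ylabel('x2')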
26
K-Nearest Neighbor Classification
One of the simplest classifiers
Takes the K nearest points
from the training set, and
chooses the majority class
of those K points
No training phase – all the work is done when the model is applied
[Figure: K-nearest-neighbor class regions in the (x1, x2) plane for group1–group8]
Statistics Toolbox
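A hedged sketch using the R2013-era ClassificationKNN.fit syntax; x and y are the training data from the earlier examples and x_new is a hypothetical set of new observations:

knn    = ClassificationKNN.fit(x, y, 'NumNeighbors', 5);   % K = 5 nearest neighbors
y_pred = predict(knn, x_new);                              % majority class among the 5 nearest points

cvknn = crossval(knn);                                     % 10-fold cross-validation
kfoldLoss(cvknn)                                           % estimated misclassification rate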
27
MATLAB Helps to Manage Complexity
A single calling syntax
for all methods
Documentation helps
you choose an
appropriate algorithm for
your particular problem
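For illustration only (the slide does not show code), a sketch of that common calling pattern with the R2013-era object API, assuming training data x, y and hypothetical new data x_new:

tree = ClassificationTree.fit(x, y);                                     % decision tree
ens  = fitensemble(x, y, 'Bag', 200, 'Tree', 'type', 'classification');  % bagged ensemble
knn  = ClassificationKNN.fit(x, y);                                      % k-nearest neighbors

% Same calling syntax regardless of the underlying algorithm:
y_tree = predict(tree, x_new);
y_ens  = predict(ens,  x_new);
y_knn  = predict(knn,  x_new);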
28
Support Vector Machines Overview
Good for modeling with complex
boundaries between groups
– Can be very accurate
– No restrictions on the predictors
What does it do?
– Uses non-linear “kernel” to
calculate the boundaries
– Can be computationally intensive
Version in Statistics Toolbox only
classifies into two groups
Statistics Toolbox
(as of R2013a)
[Figure: two-class SVM decision boundary with the support vectors highlighted]
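A minimal sketch with the R2013a svmtrain/svmclassify functions, assuming 2-D predictors x, a two-class label vector y2, and hypothetical new data x_new:

svmStruct = svmtrain(x, y2, 'kernel_function', 'rbf', 'showplot', true);  % nonlinear (RBF) kernel
y_pred    = svmclassify(svmStruct, x_new);                                % classify new observations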
29
Classification Summary
No absolute best method
Simple does not
mean inefficient
Watch for overfitting
– Decision trees and neural networks may overfit the noise
– Use ensemble learning and cross-validation
Parallelize for speedup
Classification algorithms covered: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
30
Supervised Learning: Regression for Predictive Modeling
Develop a predictive model based on both input and output data
Regression algorithms: Linear, Non-linear, Non-parametric
31
Regression
Why use regression?
– Predict the continuous response
for new observations
Type of predictive modeling
– Specify a model that describes
Y as a function of X
– Estimate coefficients that minimize the difference between predicted and actual responses
You can apply techniques from earlier sections to regression as well (e.g., neural networks)
Statistics Toolbox
Curve Fitting Toolbox
32
Linear Regression
Y is a linear function of the regression coefficients
Common examples:
Straight line: Y = B0 + B1*X1
Plane: Y = B0 + B1*X1 + B2*X2
Polynomial: Y = B0 + B1*X1^3 + B2*X1^2 + B3*X1
Polynomial with cross terms: Y = B0 + B1*X1^2 + B2*(X1*X2) + B3*X2^2
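A hedged sketch of fitting the plane example with the R2013-era LinearModel.fit syntax; X1, X2, and Y are hypothetical column vectors of observed data:

ds  = dataset(X1, X2, Y);                    % collect the variables in a dataset array
mdl = LinearModel.fit(ds, 'Y ~ X1 + X2');    % Wilkinson notation; the intercept B0 is implicit
disp(mdl)                                    % estimated coefficients, R^2, p-values, ...

Y_new = predict(mdl, [X1_new X2_new]);       % predict the response for new (hypothetical) predictor values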
33
Nonlinear Regression
Y is a nonlinear function of the regression coefficients
Common examples, with the corresponding MATLAB syntax (formula string or anonymous function):
Fourier series: y = b0 + b1*cos(b3*x) + b2*sin(b3*x)
  Syntax: y ~ b0 + b1*cos(x*b3) + b2*sin(x*b3)
Exponential growth: N = N0*exp(k*t)
  Syntax: @(b,t)(b(1)*exp(b(2)*t))
Logistic growth: P(t) = b0 / (1 + b1*exp(-k*t))
  Syntax: @(b,t)(b(1)./(1 + b(2)*exp(-b(3)*t)))
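A hedged sketch of fitting the exponential growth example with NonLinearModel.fit (R2013-era syntax); t and N are hypothetical observed data:

modelfun = @(b,t) b(1)*exp(b(2)*t);                   % b(1) = N0, b(2) = k
beta0    = [1 0.1];                                   % starting guess for the coefficients
mdl      = NonLinearModel.fit(t, N, modelfun, beta0);
disp(mdl)                                             % estimated N0 and k, with confidence statistics

N_fit = predict(mdl, t);                              % fitted curve at the observed times
figure, plot(t, N, 'o', t, N_fit, '-')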
34
Generalized Linear Models
Extends the linear model
– Define relationship between model and response variable
– Model error distributions other than normal
Logistic regression
– Response variable is binary (true / false)
– Results are typically expressed as an odds ratio
Poisson regression
– Model count data (non-negative integers)
– Response variable comes from a Poisson distribution
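A hedged sketch of logistic and Poisson regression with GeneralizedLinearModel.fit (R2013-era syntax); X, yBinary, and yCounts are hypothetical data:

% Logistic regression: binary (0/1) response
logitMdl   = GeneralizedLinearModel.fit(X, yBinary, 'linear', ...
                 'Distribution', 'binomial', 'Link', 'logit');
oddsRatios = exp(logitMdl.Coefficients.Estimate)    % coefficients expressed as odds ratios

% Poisson regression: count (non-negative integer) response
poisMdl = GeneralizedLinearModel.fit(X, yCounts, 'linear', 'Distribution', 'poisson');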
35
Machine Learning with MATLAB
Interactive environment
– Visual tools for exploratory data analysis
– Easy to evaluate and choose best algorithm
– Apps available to help you get started (e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
– Clustering
– Classification
– Regression
36
Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
Related resources:
– Classification with MATLAB
– Credit Risk Modeling with MATLAB
– Multivariate Classification in the Life Sciences
– Electricity Load and Price Forecasting
– Data Driven Fitting with MATLAB
– Regression with MATLAB