Machine Learning with MATLAB
DESCRIPTION
Machine learning examples using MATLAB
TRANSCRIPT
1 © 2013 The MathWorks, Inc.
Machine Learning with MATLAB
Stefan Duprey, Application Engineer
2
What You Will Learn
Overview of machine learning
Algorithms available with MATLAB
MATLAB as an interactive environment
for evaluating and choosing the best algorithm
3
Machine Learning Basic Concepts
Start with an initial set of data
“Learn” from this data
– “Train” your algorithm
with this data
Use the resulting model
to predict outcomes
for new data sets
[Figure: scatter plot of the sample dataset, colored by Group1–Group8]
4
Machine Learning: Characteristics and Examples
Characteristics
– Lots of data (many variables)
– System too complex to know
the governing equation (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Figure: credit rating example (ratings AAA–D). Each row shows, in percent, how observations with that rating are distributed across the rating classes (rows sum to 100%):]
         AAA     AA      A       BBB     BB      B       CCC     D
AAA      93.68   5.55    0.59    0.18    0.00    0.00    0.00    0.00
AA       2.44    92.60   4.03    0.73    0.15    0.00    0.00    0.06
A        0.14    4.18    91.02   3.90    0.60    0.08    0.00    0.08
BBB      0.03    0.23    7.49    87.86   3.78    0.39    0.06    0.16
BB       0.03    0.12    0.73    8.27    86.74   3.28    0.18    0.64
B        0.00    0.00    0.11    0.82    9.64    85.37   2.41    1.64
CCC      0.00    0.00    0.00    0.37    1.84    6.24    81.88   9.67
D        0.00    0.00    0.00    0.00    0.00    0.00    0.00    100.00
5
Model Development Process
Exploration → Modeling → Evaluation → Deployment
6
Exploratory Data Analysis
Gain insight from visual examination
– Identify trends and interactions
– Detect patterns
– Remove outliers
– Shrink data
– Select and pare predictors
– Feature transformation
[Figure: scatter plot matrix of MPG, Acceleration, Displacement, Weight, and Horsepower]
7
Data Exploration: Interactions Between Variables
Plot Matrix by Group – Parallel Coordinates Plot – Andrews’ Plot – Glyph Plot – Chernoff Faces
[Figures: the five plot types above, drawn for the car dataset (MPG, Acceleration, Displacement, Weight, Horsepower), with individual car models labeled on the glyph and face plots]
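As a hedged sketch of how plots like these can be produced in MATLAB: the example below uses the carsmall demo dataset that ships with Statistics Toolbox (an assumption for illustration; the webinar's exact data and code are not shown in the slides).

load carsmall                                    % demo car dataset (Statistics Toolbox)
X      = [MPG Acceleration Displacement Weight Horsepower];
labels = {'MPG','Acceleration','Displacement','Weight','Horsepower'};
ok     = all(~isnan(X),2);                       % drop observations with missing values
X = X(ok,:);  grp = Cylinders(ok);  names = Model(ok,:);

figure, gplotmatrix(X,[],grp,[],[],[],[],'variable',labels)          % plot matrix by group
figure, parallelcoords(X,'Group',grp,'Labels',labels)                % parallel coordinates plot
figure, andrewsplot(X,'Group',grp)                                   % Andrews' plot
figure, glyphplot(X(1:9,:),'ObsLabels',names(1:9,:))                 % glyph (star) plot
figure, glyphplot(X(1:9,:),'Glyph','face','ObsLabels',names(1:9,:))  % Chernoff faces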
8
Machine Learning Overview: Types of Learning, Categories of Algorithms
Machine learning splits into two types of learning, each with its own categories of algorithms:
– Unsupervised Learning (group and interpret data based only on input data): Clustering
– Supervised Learning (develop a predictive model based on both input and output data): Classification, Regression
9
Unsupervised Learning: Clustering
Group and interpret data based only on input data
Clustering algorithms: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
10
Clustering Overview
What is clustering?
– Segment data into groups,
based on data similarity
Why use clustering?
– Identify outliers
– Resulting groups may be
the matter of interest
How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)
11
Dataset We’ll Be Using
Cloud of randomly generated points
– Each cluster center is randomly chosen inside specified bounds
– Each cluster contains the specified number of points per cluster
– Each cluster point is sampled from a Gaussian distribution
– Multi-dimensional dataset
[Figure: scatter plot of the generated dataset, colored by Group1–Group8]
12
Example: Cluster Analysis with K-Means
K-means is a partitioning method
Partitions data into k mutually
exclusive clusters
Each cluster has a
centroid (or center)
– Sum of distances from
all objects to the center
is minimized
Statistics Toolbox
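A minimal sketch of k-means clustering, assuming a small synthetic 2-D point cloud called data (the webinar's randomly generated dataset is not reproduced here):

rng(1)                                                        % for reproducibility
data = [0.2 + 0.05*randn(100,2); 0.5 + 0.05*randn(100,2)];    % two synthetic blobs
k    = 2;
[idx, centroids] = kmeans(data, k, 'Replicates', 5);          % restarts help avoid poor local minima

figure
gscatter(data(:,1), data(:,2), idx)                           % points colored by assigned cluster
hold on
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)   % cluster centers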
13
Distance Metrics & Group Quality
Choice of distance measure
– Many built-in distance metrics, or define your own
>> doc pdist
>> distances = pdist(data,metric); % pdist = pairwise distances
>> squareform(distances)
>> kmeans(data,k,'distance','cityblock') % not all metrics are supported by kmeans
Create silhouette plots to assess cluster quality
>> silhouette(data,clusters)
Euclidean distance – the default
Cityblock distance – useful for discrete variables
Cosine distance – useful for clustering variables
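Putting the snippets above together, one way (a sketch, reusing the data and k assumed in the k-means example) to compare distance metrics via silhouette values:

idxEuc  = kmeans(data, k);                                % default (squared Euclidean) distance
idxCity = kmeans(data, k, 'distance', 'cityblock');       % L1 (city block) distance

figure, sEuc  = silhouette(data, idxEuc);                 % silhouette plot, Euclidean clusters
figure, sCity = silhouette(data, idxCity, 'cityblock');   % silhouette plot, city block clusters
fprintf('Mean silhouette: %.3f (Euclidean) vs %.3f (cityblock)\n', mean(sEuc), mean(sCity))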
14
Clustering: Neural Network
Networks are composed of one or more layers
Outputs are computed by applying a nonlinear transfer function to a weighted sum of the inputs
Trained by letting the network continually adjust itself to new inputs (this determines the weights)
[Diagram: input variables, weights, bias, transfer function, output variable]
15
Clustering: Neural Network
Neural Network Toolbox provides interactive apps for easily creating and training networks
Multi-layered networks are created by cascading layers (and can provide better accuracy)
Example architectures for clustering:
– Self-organizing maps
– Competitive layers
Neural Network Toolbox
16
Self-Organizing Map (SOM) Neural Net: How It Works
Start with a regular grid of “neurons” laid over the dataset
The size of the grid determines the number of clusters
Neurons compete to recognize data points (by being close to them)
Winning neurons are moved closer to the data points
Repeat until convergence (a MATLAB sketch follows the figure below)
[Figures: SOM weight positions (Weight 1 vs. Weight 2) over the data, before and after training]
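A hedged sketch of SOM clustering with Neural Network Toolbox's selforgmap, again assuming the 2-D matrix data from the earlier clustering examples:

x   = data';                          % Neural Network Toolbox expects observations in columns
net = selforgmap([3 3]);              % 3-by-3 grid of neurons, i.e. up to 9 clusters
net = train(net, x);                  % competitive training moves neurons toward the data

y          = net(x);                  % one-hot output: which neuron "won" each point
clusterIdx = vec2ind(y);              % convert to cluster indices 1..9
figure, plotsompos(net, x)            % SOM weight positions overlaid on the data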
17
Gaussian Mixture Models
Good when clusters have different sizes and are correlated
Assumes that the data is drawn from a fixed number k of normal distributions
Statistics Toolbox
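A minimal sketch using the R2013-era gmdistribution.fit syntax, assuming data and k as in the earlier examples:

gm  = gmdistribution.fit(data, k, 'Replicates', 5);    % fit a mixture of k Gaussians
idx = cluster(gm, data);                               % hard assignment to the most likely component
p   = posterior(gm, data);                             % soft (probabilistic) memberships

figure
gscatter(data(:,1), data(:,2), idx)                    % points colored by component
hold on
ezcontour(@(x,y) pdf(gm, [x y]), [0 1 0 1])            % contours of the fitted mixture density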
18
Cluster Analysis Summary
Segments data into groups, based on data similarity
No method is perfect (depends on data)
Process is iterative; explore different algorithms
Beware of local minima (global optimization can help)
Clustering algorithms covered: K-means, Fuzzy K-means, Hierarchical, Neural Network, Gaussian Mixture
19
Model Development Process
Exploration → Modeling → Evaluation → Deployment
20
Supervised Learning: Classification for Predictive Modeling
Develop a predictive model based on both input and output data
Classification algorithms: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
21
Classification Overview
What is classification?
– Predicting the best group for each point
– “Learns” from labeled observations
– Uses input features
Why use classification?
– Accurately group data never seen before
How is classification done?
– Can use several algorithms to build a predictive model
– Good training data is critical
[Figure: scatter plot of labeled training data, colored by Group1–Group8]
22
Example: Classification with Decision Trees
Builds a tree from training data
– Model is a tree where each node is a
logical test on a predictor
Traverse tree by comparing
features with threshold values
The “leaf” of the tree
specifies the group
Statistics Toolbox
23
Ensemble Learners Overview
Decision trees are “weak” learners
– Good at classifying the data they were trained on
– Often not very good with new data
– Note the rectangular group regions in the figure
What are ensemble learners?
– Combine many decision trees to create a “strong” learner
– Uses “bootstrap aggregation” (bagging)
Why use ensemble methods?
– Classifier has better predictive power
– Note improvement in cluster shapes
Statistics Toolbox
[Figure: decision-tree class regions (note the rectangular boundaries) in the (x1, x2) plane for group1–group8]
24
Decision Trees: How Do I Build Them with MATLAB?
Build a tree model:
>> tree = classregtree(x,y);
>> view(tree)
Evaluate the model on new data:
>> tree(x_new)
[Figure: decision-tree class regions in the (x1, x2) plane for group1–group8]
Statistics Toolbox
25
Enhancing the Model: Ensemble Learning
Combine weak learners into a stronger learner:
>> ens = fitensemble(x,y,'AdaBoostM2',200,'Tree');
Bootstrap-aggregated (“bagged”) trees, i.e., a forest:
>> ens = fitensemble(x,y,'Bag',200,'Tree','type','classification');
>> y_pred = predict(ens,x);
Visualize class boundaries (see the sketch below)
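One way to visualize the class boundaries (a sketch, assuming 2-D predictors x and labels y as above) is to predict on a dense grid:

[x1g, x2g] = meshgrid(linspace(min(x(:,1)), max(x(:,1)), 200), ...
                      linspace(min(x(:,2)), max(x(:,2)), 200));
gridPred = predict(ens, [x1g(:) x2g(:)]);     % predicted class for every grid point

figure
gscatter(x1g(:), x2g(:), gridPred)            % colored class regions
hold on
gscatter(x(:,1), x(:,2), y, 'k', '.')         % overlay the training points
xlabel('x1'), ylabel('x2')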
26
K-Nearest Neighbor Classification
One of the simplest classifiers
Takes the K nearest points
from the training set, and
chooses the majority class
of those K points
No training phase – all the work is done when the model is applied
[Figure: K-nearest-neighbor class regions in the (x1, x2) plane for group1–group8]
Statistics Toolbox
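A hedged sketch using the R2013-era ClassificationKNN.fit syntax; x and y are the training data from the earlier examples and x_new is a hypothetical set of new observations:

knn    = ClassificationKNN.fit(x, y, 'NumNeighbors', 5);   % K = 5 nearest neighbors
y_pred = predict(knn, x_new);                              % majority class among the 5 nearest points

cvknn = crossval(knn);                                     % 10-fold cross-validation
kfoldLoss(cvknn)                                           % estimated misclassification rate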
27
MATLAB Helps to Manage Complexity
A single calling syntax
for all methods
Documentation helps
you choose an
appropriate algorithm for
your particular problem
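For illustration only (the slide does not show code), a sketch of that common calling pattern with the R2013-era object API, assuming training data x, y and hypothetical new data x_new:

tree = ClassificationTree.fit(x, y);                                     % decision tree
ens  = fitensemble(x, y, 'Bag', 200, 'Tree', 'type', 'classification');  % bagged ensemble
knn  = ClassificationKNN.fit(x, y);                                      % k-nearest neighbors

% Same calling syntax regardless of the underlying algorithm:
y_tree = predict(tree, x_new);
y_ens  = predict(ens,  x_new);
y_knn  = predict(knn,  x_new);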
28
Support Vector Machines Overview
Good for modeling with complex
boundaries between groups
– Can be very accurate
– No restrictions on the predictors
What does it do?
– Uses non-linear “kernel” to
calculate the boundaries
– Can be computationally intensive
Version in Statistics Toolbox only
classifies into two groups
Statistics Toolbox
(as of R2013a)
[Figure: two-class SVM decision boundary with the support vectors highlighted]
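A minimal sketch with the R2013a svmtrain/svmclassify functions, assuming 2-D predictors x, a two-class label vector y2, and hypothetical new data x_new:

svmStruct = svmtrain(x, y2, 'kernel_function', 'rbf', 'showplot', true);  % nonlinear (RBF) kernel
y_pred    = svmclassify(svmStruct, x_new);                                % classify new observations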
29
Classification Summary
No absolute best method
Simple does not
mean inefficient
Watch for overfitting
– Decision trees and neural networks may overfit the noise
– Use ensemble learning and cross-validation
Parallelize for speedup
Classification algorithms covered: Decision Tree, Ensemble Method, Neural Network, Support Vector Machine
30
Supervised Learning: Regression for Predictive Modeling
Develop a predictive model based on both input and output data
Regression algorithms: Linear, Non-linear, Non-parametric
31
Regression
Why use regression?
– Predict the continuous response
for new observations
Type of predictive modeling
– Specify a model that describes
Y as a function of X
– Estimate coefficients that minimize the difference between predicted and actual responses
You can apply techniques from earlier sections to regression as well (e.g., neural networks)
Statistics Toolbox
Curve Fitting Toolbox
32
Linear Regression
Y is a linear function of the regression coefficients
Common examples:
Straight line: Y = B0 + B1*X1
Plane: Y = B0 + B1*X1 + B2*X2
Polynomial: Y = B0 + B1*X1^3 + B2*X1^2 + B3*X1
Polynomial with cross terms: Y = B0 + B1*X1^2 + B2*(X1*X2) + B3*X2^2
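A hedged sketch of fitting the plane example with the R2013-era LinearModel.fit syntax; X1, X2, and Y are hypothetical column vectors of observed data:

ds  = dataset(X1, X2, Y);                    % collect the variables in a dataset array
mdl = LinearModel.fit(ds, 'Y ~ X1 + X2');    % Wilkinson notation; the intercept B0 is implicit
disp(mdl)                                    % estimated coefficients, R^2, p-values, ...

Y_new = predict(mdl, [X1_new X2_new]);       % predict the response for new (hypothetical) predictor values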
33
Nonlinear Regression
Y is a nonlinear function of the regression coefficients
Common examples, with the corresponding MATLAB syntax (formula string or anonymous function):
Fourier series: y = b0 + b1*cos(b3*x) + b2*sin(b3*x)
  Syntax: y ~ b0 + b1*cos(x*b3) + b2*sin(x*b3)
Exponential growth: N = N0*exp(k*t)
  Syntax: @(b,t)(b(1)*exp(b(2)*t))
Logistic growth: P(t) = b0 / (1 + b1*exp(-k*t))
  Syntax: @(b,t)(b(1)./(1 + b(2)*exp(-b(3)*t)))
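A hedged sketch of fitting the exponential growth example with NonLinearModel.fit (R2013-era syntax); t and N are hypothetical observed data:

modelfun = @(b,t) b(1)*exp(b(2)*t);                   % b(1) = N0, b(2) = k
beta0    = [1 0.1];                                   % starting guess for the coefficients
mdl      = NonLinearModel.fit(t, N, modelfun, beta0);
disp(mdl)                                             % estimated N0 and k, with confidence statistics

N_fit = predict(mdl, t);                              % fitted curve at the observed times
figure, plot(t, N, 'o', t, N_fit, '-')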
34
Generalized Linear Models
Extends the linear model
– Define relationship between model and response variable
– Model error distributions other than normal
Logistic regression
– Response variable is binary (true / false)
– Results are typically expressed as an odds ratio
Poisson regression
– Model count data (non-negative integers)
– Response variable comes from a Poisson distribution
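A hedged sketch of logistic and Poisson regression with GeneralizedLinearModel.fit (R2013-era syntax); X, yBinary, and yCounts are hypothetical data:

% Logistic regression: binary (0/1) response
logitMdl   = GeneralizedLinearModel.fit(X, yBinary, 'linear', ...
                 'Distribution', 'binomial', 'Link', 'logit');
oddsRatios = exp(logitMdl.Coefficients.Estimate)    % coefficients expressed as odds ratios

% Poisson regression: count (non-negative integer) response
poisMdl = GeneralizedLinearModel.fit(X, yCounts, 'linear', 'Distribution', 'poisson');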
35
Machine Learning with MATLAB
Interactive environment
– Visual tools for exploratory data analysis
– Easy to evaluate and choose best algorithm
– Apps available to help you get started (e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
– Clustering
– Classification
– Regression
36
Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
Related resources:
– Classification with MATLAB
– Credit Risk Modeling with MATLAB
– Multivariate Classification in the Life Sciences
– Electricity Load and Price Forecasting
– Data Driven Fitting with MATLAB
– Regression with MATLAB