Machine Learning Demystified
Michelle Hardwick
Director, Data Science & Analytics
Salt Lake Community College
About Me
Director, Data Science & Analytics at Salt Lake Community College
Adjunct Professor at Utah State University
Oracle ACE Director, IOUG Executive Vice President, UTOUG President
Twitter: @datacheesehead
Agenda
What is Machine Learning?
Algorithms
Pop Quiz
Key Concepts
Machine Learning Process
Applications of Machine Learning
What is Machine Learning?
Machine Learning is the Process of Finding Insightful Patterns in your Data
Why is machine learning important?
How is machine learning being used?
Google search
Recommendation systems
Fraud detection
Tagging people in photos
Algorithms
Problem Type:
• Anomaly Detection
• Regression
• Classification
• Clustering
• Association
Algorithm Family:
• Support Vector Machines
• Decision Tree Learning
• Instance-Based Learning
• Generalized Linear Models
• Artificial Neural Network
• Centroid-Based Clustering
• Hierarchical Clustering
• Density-Based Clustering
Algorithm:
• One-Class SVM
• Linear SVM
• Non-Linear SVM
• Classification/Regression Decision Tree
• Random Forest
• Isolation Forest
• Radius Neighbors
• K-Nearest Neighbors
• Logistic Regression
• Bayesian Naïve Classifier
• Linear Regression
• Bayesian Linear Regression
• Feedforward ANN (Multilayer Perceptron)
• K-Means Clustering
• Complete-Linkage Clustering
• Single-Linkage Clustering
• Average-Linkage Clustering
• DBSCAN
• Association Rules
Unsupervised Machine Learning
When we do not know what the output values should be
Slide source: Towards Data Science blog
Used when we wish to learn the inherent structure of our data without using explicitly provided labels
Supervised Machine Learning
When we have prior knowledge to know what the output values should be
Dimensionality Reduction
Learn about the data to find the dimensions that interrelate the features. Used for eliminating redundant features to speed up data processing.
Types:
Regression
Supervised
Classification
Identifying to which category an object belongs
Types:
Decision Tree
Support Vector Machine
Logistic Regression
Neural Networks
Instance Based Learning
Supervised
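As one illustration of classification, here is a minimal sketch of instance-based learning using a k-nearest-neighbors vote. The function name `knn_predict`, the sample points, and the labels are hypothetical and are not taken from the slides.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a category by majority vote among the k closest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k nearest neighbors and return the most common.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data with two categories.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (2, 2))
near_b = knn_predict(points, labels, (8, 7))
```

The model here is simply the stored training data, which is why this family is called instance-based.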
Regression
Predicting a continuous-valued attribute associated with an object. Regression predicts how much or how many of something.
Types:
Generalized Linear Model
Support Vector Machine
Neural Networks
Decision Tree
Supervised
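A small sketch of regression, assuming the simplest case of one input feature fitted with ordinary least squares (linear regression is one member of the generalized linear model family). The helper `fit_line` and the hours/scores data are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied versus exam score.
hours = [1, 2, 3, 4]
scores = [52, 54, 56, 58]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5 + intercept  # continuous prediction for 5 hours
```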
Naïve Bayes
Finds the probability of an event occurring given the probability of another event that has already occurred
Types:
Bayes' Theorem
Supervised
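The rule behind this is Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B). The sketch below works one example; the spam scenario and all of its probabilities are assumed values chosen for illustration, not figures from the talk.

```python
def posterior(prior_a, likelihood_b_given_a, prob_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    if prob_b <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood_b_given_a * prior_a / prob_b


# Assumed numbers: 20% of mail is spam; a given word appears in 50% of
# spam messages and in 5% of non-spam messages.
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Probability that a message is spam given that it contains the word.
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
```

With these assumed inputs the posterior is roughly 0.71, well above the 0.2 prior, because the word is ten times more common in spam.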
Exploratory Analysis
Used to automatically identify structure in the data
Types:
Clustering
Unsupervised
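To make "identify structure" concrete, here is a bare-bones sketch of centroid-based clustering in the style of k-means. It assumes a naive initialization (the first k points become the starting centroids) and a fixed number of iterations; the function `k_means` and the sample points are hypothetical.

```python
import math


def k_means(points, k, iterations=10):
    """Group points around k centroids; no labels are used."""
    # Naive, deterministic start: the first k points (an assumption made
    # for simplicity; real tools usually pick starting points more carefully).
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
```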
Association
Discover the rules that describe large portions of the data
Such as: People who buy X also tend to buy Y
Types:
Association Rules
Decision Tree
Unsupervised
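A rule such as "people who buy X also tend to buy Y" is usually scored with support (how often X and Y appear together) and confidence (how often Y appears when X does). The sketch below computes both; the helper names and the shopping baskets are invented for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

Here bread and butter appear together in 2 of 4 baskets, and butter appears in 2 of the 3 baskets that contain bread.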
Anomaly Detection
Finds cases that are unusual or slightly different
Types:
Support Vector Machine
Decision Tree
Supervised or Unsupervised
How do you pick which algorithm to run?
Activity
What algorithm families would you run for the following ML problem?
Which students are most likely to succeed at SLCC?
Key ML Concepts
Train vs Test
For supervised learning, hold out a portion of the dataset for testing so you can validate the accuracy of your predictions
The split should be done randomly
Typical splits are 70% train and 30% test, or 80%/20%
The test dataset still contains the output field, but the model does not see it when making predictions; it is used afterward to check them
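A random 70/30 split can be sketched in a few lines. The function `train_test_split` below is a hypothetical helper written for this example (it is not a library call), and the fixed seed is an assumption added so the shuffle is repeatable.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows, then hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)      # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records of the form (feature, output).
records = [(i, i % 2) for i in range(10)]
train, test = train_test_split(records, test_fraction=0.3)
```

Every record lands in exactly one of the two sets, so no test record is ever used for training.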
Feature Engineering
Feature engineering is the process of transforming the raw data into features that better represent the underlying problem, resulting in improved model accuracy
The Attribute Importance model can be used for this (minimum description length algorithm)
Bias
When an algorithm produces results that are systematically prejudiced due to erroneous assumptions in the ML process
This is usually related to the gathering or usage of data
You should check your data, models and results for this bias
The Machine Learning Process
The ML Process
Step 01: Gather Data
Step 02: Data Preparation
Step 03: Training
Step 04: Testing
Step 05: Evaluation
Step 1: Gather Data
To improve the accuracy of the predictions, the quantity and quality of the data matter most
Step 1: Gather Data
Profile data for quality
Reliable data avoids:
• Duplicated data
• Bad labels
• Bad values
• Omitted values
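A quick profiling pass can flag two of the problems listed above, duplicated rows and omitted values. This is a minimal sketch: the `profile` helper, the field names, and the sample rows are all hypothetical, and spotting bad labels or bad values would need rules specific to your data.

```python
def profile(rows, required_fields):
    """Count exact duplicate rows and rows with a missing required field."""
    seen = set()
    duplicates = 0
    rows_with_missing = 0
    for row in rows:
        # A row is a duplicate if an identical row was already seen.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
        # Treat None and the empty string as omitted values.
        if any(row.get(field) in (None, "") for field in required_fields):
            rows_with_missing += 1
    return {
        "rows": len(rows),
        "duplicates": duplicates,
        "rows_with_missing": rows_with_missing,
    }


# Hypothetical records.
rows = [
    {"id": 1, "age": 34, "zip": "84101"},
    {"id": 1, "age": 34, "zip": "84101"},   # duplicated row
    {"id": 2, "age": None, "zip": "84115"}, # omitted age
    {"id": 3, "age": 51, "zip": ""},        # omitted zip code
]
report = profile(rows, required_fields=["age", "zip"])
```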
Step 1: Gather Data
Identify Features
A feature is a measurable property of the object you are trying to analyze.
These are data points that describe the object, such as age, gender, and zip code.
Step 1: Gather Data
Label Sources
If your training data is not classified with the outcome, it needs to be labeled
Step 2: Data Preparation
Perhaps the most time-consuming task is preparing your data for the algorithm you’ll be running
Where should you transform? Prior to the training? Or in the model?
Step 2: Data Preparation
Numeric Transformations
Convert non-numeric data to numeric
Step 2: Data Preparation
Numeric Normalization
Transform features to be on the same scale
Methods:
• Scaling to a range
• Clipping
• Log scaling
• Z-score
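The four methods above can each be sketched in a few lines of plain Python. The function names and sample values are illustrative choices, and the z-score version assumes the population standard deviation.

```python
import math
import statistics


def scale_to_range(values, lo=0.0, hi=1.0):
    """Min-max scaling: map the smallest value to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


def clip(values, lower, upper):
    """Clipping: cap extreme values at fixed lower and upper limits."""
    return [min(max(v, lower), upper) for v in values]


def log_scale(values):
    """Log scaling: compress a wide range (inputs must be positive)."""
    return [math.log(v) for v in values]


def z_score(values):
    """Z-score: express each value as standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


values = [2.0, 4.0, 6.0, 8.0]
scaled = scale_to_range(values)
clipped = clip(values, 3.0, 7.0)
logged = log_scale(values)
standardized = z_score(values)
```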
Step 2: Data Preparation
Bucketing
When a numeric feature does not have a linear relationship with the output, you can bucket it
Two Types of Bucketing:
• Equal spaced boundaries
• Quantile boundaries
Images source: Google machine learning
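The two bucketing styles differ only in how the boundaries are chosen. The sketch below shows both with hypothetical helper names and made-up values; the quantile version uses a simple index-based cut, which is one of several reasonable ways to pick quantiles.

```python
import bisect


def equal_width_boundaries(values, n_buckets):
    """Boundaries spaced evenly between the minimum and maximum value."""
    vmin, vmax = min(values), max(values)
    width = (vmax - vmin) / n_buckets
    return [vmin + width * i for i in range(1, n_buckets)]


def quantile_boundaries(values, n_buckets):
    """Boundaries chosen so each bucket holds about the same number of values."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def bucket_index(value, boundaries):
    """Index of the bucket a value falls into, given sorted boundaries."""
    return bisect.bisect_right(boundaries, value)


# Skewed sample data: one value is far larger than the rest.
values = [1, 2, 3, 4, 5, 6, 7, 100]

equal = equal_width_boundaries(values, 4)
quant = quantile_boundaries(values, 4)

equal_buckets = [bucket_index(v, equal) for v in values]
quant_buckets = [bucket_index(v, quant) for v in values]
```

With skewed data like this, equally spaced boundaries put almost everything in the first bucket, while quantile boundaries spread the values evenly.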
Step 2: Data Preparation
Transforming Categorical Data
Used when the feature values have no inherent order
Two methods:
• Vocabulary
• Hashing
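Both methods turn a category into a number the algorithm can use. In this sketch the vocabulary method assigns each distinct category its own index (shown here as a one-hot vector), and the hashing method maps any category into a fixed number of buckets. The helper names and the color values are hypothetical.

```python
import hashlib


def build_vocabulary(categories):
    """Vocabulary method: give each distinct category its own index."""
    vocab = {}
    for category in categories:
        if category not in vocab:
            vocab[category] = len(vocab)
    return vocab


def one_hot(category, vocab):
    """Encode a category as a vector with a single 1 at its vocabulary index."""
    vector = [0] * len(vocab)
    index = vocab.get(category)
    if index is not None:  # unseen categories stay all zeros
        vector[index] = 1
    return vector


def hash_bucket(category, n_buckets):
    """Hashing method: map any category, seen or unseen, to one of n_buckets."""
    # A stable digest is used because Python's built-in hash() of strings
    # changes between runs.
    digest = hashlib.sha256(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


colors = ["red", "green", "blue", "green", "red"]
vocab = build_vocabulary(colors)
green_vector = one_hot("green", vocab)
red_bucket = hash_bucket("red", n_buckets=8)
```

Hashing needs no stored vocabulary, at the cost that two different categories can collide in the same bucket.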
Step 3: Training
At this point you’ll put together your machine learning model
You can use many tools for this
Build the model, accepting defaults, then run it
Step 4: Testing
Now run your model (with the same parameter settings) against your test dataset
Step 5: Evaluation
Check the accuracy of your model
How many of your test records did the model predict correctly?
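Accuracy in this sense is just the share of test records predicted correctly. The sketch below assumes you already have the actual outcomes and the model's predictions as two lists; the `accuracy` helper and the pass/fail values are made up for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of test records where the prediction matches the actual output."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical test-set outcomes and model predictions.
actual = ["pass", "fail", "pass", "pass", "fail"]
predicted = ["pass", "pass", "pass", "fail", "fail"]

score = accuracy(actual, predicted)  # 3 of 5 predictions match
```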
Repeat and Repeat and Repeat
Applications of Machine Learning
Demo Time!