INTRODUCTION TO MACHINE LEARNING
Decision tree learning
Task of classification
● Automatically assign class to observations with features
● Observation: vector of features, with a class
● Automatically assign class to new observation with features, using previous observations
● Binary classification: two classes
● Multiclass classification: more than two classes
Example
● A dataset consisting of persons
● Features: age, weight and income
● Class:
● binary: happy or not happy
● multiclass: happy, satisfied or not happy
Examples of features
● Features can be numerical
● age: 23, 25, 75, …
● height: 175.3, 179.5, …
● Features can be categorical
● travel_class: first class, business class, coach class
● smokes?: yes, no
The decision tree
● Suppose you’re classifying patients as sick or not sick
● Intuitive way of classifying: ask questions

Is the patient young or old?
    Young -> Vaccinated against the measles?
        Yes -> …
        No -> …
    Old -> Smoked for more than 10 years?
        Yes -> …
        No -> …

It’s a decision tree!
Define the tree

        A              Root
      B   C            Children of A
    D  E F  G          Leaves: children of B and C, grandchildren of A

● Nodes: A, B, C, D, E, F, G
● Edges: the connections between a node and its children
Introduction to Machine Learning
Questions to ask
age <= 18
vaccinated smoked
not sick sick sick not
sick
yes
yes yes
no
nono
Categorical feature
● Can be a feature test by itself
● travel_class: coach, business or first

travel_class?
    coach -> …
    business -> …
    first -> …
Classifying with the tree
Observation: patient of 40 years, vaccinated and didn’t smoke
● age <= 18? no
● smoked? no
Prediction: not sick
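
The walk from the root to a leaf can be written as nested conditionals. This is a minimal Python sketch of the example tree only; the function name `classify_patient` and its parameter names are illustrative assumptions, not something the slides define.

```python
def classify_patient(age, vaccinated, smoked):
    """Follow the example tree from the root to a leaf and return the leaf's class."""
    if age <= 18:
        # "yes" branch of the root: ask about vaccination
        return "not sick" if vaccinated else "sick"
    # "no" branch of the root: ask about smoking
    return "sick" if smoked else "not sick"


# Observation from the slide: 40 years old, vaccinated, didn't smoke
prediction = classify_patient(age=40, vaccinated=True, smoked=False)
print(prediction)  # not sick
```

Only the tests on the path are evaluated: for this patient the vaccination answer is never consulted.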
Learn a tree
● Use training set
● Come up with queries (feature tests) at each node
[Figure: the training set is split by the feature test age <= 18 into a TRUE part and a FALSE part]
● Split into parts: 2 parts for a binary test
● Each part of the training set is split again by a further feature test (yes / no)
● Keep splitting until leaves contain a small portion of the training set
Learn the tree
● Goal: end up with pure leaves, i.e. leaves that contain observations of one particular class
● In practice: almost never the case, because of noise
● When classifying new instances
● end up in leaf
● assign class of majority of training instances
[Figure: a leaf holding part of the training set, with observations of class 1 and class 2]
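
The majority rule for an impure leaf can be sketched in a few lines of Python. The helper name `leaf_prediction` and the example leaf are hypothetical; ties are resolved by whichever class `Counter` lists first, which is an arbitrary choice.

```python
from collections import Counter


def leaf_prediction(leaf_classes):
    """Return the majority class among the training instances in a leaf."""
    return Counter(leaf_classes).most_common(1)[0][0]


# Hypothetical impure leaf: four observations of class 1, one of class 2
leaf = ["class 1", "class 1", "class 2", "class 1", "class 1"]
print(leaf_prediction(leaf))  # class 1
```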
Learn the tree
● At each node
● Iterate over different feature tests
● Choose the best one
● Comes down to two parts
● Make list of feature tests
● Choose test with best split
Construct list of tests
● Categorical features
● Parents/grandparents/… didn’t use the test yet
● Numerical features
● Choose feature
● Choose threshold
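
One way to build such a list is sketched below in Python. The slides do not say how thresholds are chosen; taking midpoints between consecutive observed values is an assumption made here for illustration, as are the function name `candidate_tests` and the row format (a list of dicts).

```python
def candidate_tests(rows, categorical, numerical, used=()):
    """List (feature, threshold) tests for a node; threshold is None for categorical tests.

    rows: list of dicts mapping feature name -> value (hypothetical format).
    used: categorical features already tested by ancestors of this node.
    """
    tests = []
    for feature in categorical:
        if feature not in used:  # parents/grandparents didn't use the test yet
            tests.append((feature, None))
    for feature in numerical:
        values = sorted({row[feature] for row in rows})
        # Assumption: candidate thresholds are midpoints between consecutive values
        for low, high in zip(values, values[1:]):
            tests.append((feature, (low + high) / 2))
    return tests


rows = [
    {"age": 10, "smokes": "no"},
    {"age": 30, "smokes": "yes"},
    {"age": 50, "smokes": "no"},
]
print(candidate_tests(rows, categorical=["smokes"], numerical=["age"]))
# [('smokes', None), ('age', 20.0), ('age', 40.0)]
```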
Choose best feature test
● More complex
● Use splitting criteria to decide which test to use
● Information gain ~ entropy
Information gain
● Information gained from split based on feature test
● Test leads to nicely divided classes -> high information gain
● Test leads to scrambled classes -> low information gain
● Test with highest information gain will be chosen
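
The two extremes can be checked numerically. The sketch below uses the usual entropy-based definition (entropy of the parent minus the size-weighted entropy of the parts); the slides only hint at this with "Information gain ~ entropy", so treat the exact formula, the function names and the toy labels as assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, parts):
    """Entropy of the parent minus the size-weighted entropy of the parts."""
    total = len(parent)
    remainder = sum(len(part) / total * entropy(part) for part in parts)
    return entropy(parent) - remainder


parent = ["sick"] * 4 + ["not sick"] * 4
clean_split = [["sick"] * 4, ["not sick"] * 4]                     # nicely divided
scrambled_split = [["sick", "sick", "not sick", "not sick"]] * 2   # scrambled

print(information_gain(parent, clean_split))      # 1.0
print(information_gain(parent, scrambled_split))  # 0.0
```

The learner would evaluate every candidate test this way and keep the one with the highest gain.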
Pruning
● Number of nodes influences the chance of overfitting
● Restrict size: higher bias
● Decreases the chance of overfitting
● Pruning the tree
Let’s practice!
k-Nearest Neighbors
Instance-based learning
● Save training set in memory
● No real model like decision tree
● Compare unseen instances to training set
● Predict using the comparison of unseen data and the training set
k-Nearest Neighbor
● Form of instance-based learning
● Simplest form: 1-Nearest Neighbor or Nearest Neighbor
Nearest Neighbor - example
● 2 features: X1 and X2
● Class: red or blue
● Binary classification
● Save complete training set
● Given: unseen observation with features X = (1.3, -2)
● Compare training set with new observation
● Find closest observation (the nearest neighbor) and assign same class
● Distance used: just Euclidean distance, nothing fancy
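
A minimal Python sketch of 1-Nearest Neighbor for this setup follows. The training points are invented for illustration (the slide's scatter plot did not survive extraction); only the unseen observation X = (1.3, -2) comes from the slides. `math.dist` computes the Euclidean distance and needs Python 3.8 or later.

```python
from math import dist  # Euclidean distance between two points

# Hypothetical training set: ((X1, X2), class)
training_set = [
    ((1.0, -1.5), "red"),
    ((2.5, 0.5), "blue"),
    ((-1.0, 2.0), "blue"),
    ((0.5, -3.0), "red"),
]


def nearest_neighbor(x, training):
    """Return the class of the training observation closest to x."""
    _, label = min(training, key=lambda obs: dist(x, obs[0]))
    return label


print(nearest_neighbor((1.3, -2), training_set))  # red
```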
k-Nearest Neighbors
● k is the number of neighbors
● If k = 5
● Use 5 most similar observations (neighbors)
● Assigned class will be the most represented class within the 5 neighbors
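
The vote among k neighbors can be sketched as follows. The training points and the name `knn_predict` are hypothetical; with two classes and an odd k there are no ties, otherwise the class `Counter` lists first wins, which is an arbitrary choice.

```python
from collections import Counter
from math import dist


def knn_predict(x, training, k=5):
    """Majority class among the k training observations closest to x."""
    neighbors = sorted(training, key=lambda obs: dist(x, obs[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: ((X1, X2), class)
training_set = [
    ((1.0, -1.5), "red"), ((0.5, -3.0), "red"), ((2.0, -2.5), "red"),
    ((2.5, 0.5), "blue"), ((-1.0, 2.0), "blue"), ((1.5, -1.0), "blue"),
    ((3.0, 3.0), "blue"),
]

# Among the 5 nearest neighbors of (1.3, -2), 3 are red and 2 are blue
print(knn_predict((1.3, -2), training_set, k=5))  # red
```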
Distance metric
● Important aspect of k-NN
● Euclidean distance, for observations $a$ and $b$ with $n$ features: $d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
● Manhattan distance: $d(a, b) = \sum_{i=1}^{n} |a_i - b_i|$
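
Both metrics are short to write out in Python. This is a sketch using the standard definitions; the function names and the two points are illustrative.

```python
from math import sqrt


def euclidean(a, b):
    """Square root of the summed squared differences per feature."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences per feature."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


a, b = (0, 0), (3, 4)   # hypothetical points
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7
```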
Scaling - example
● Dataset with
● 2 features: weight and height
● 3 observations

     height (m)   weight (kg)
1    1.83         80
2    1.83         80.5
3    1.70         80

● Distance between observations 1 and 2: 0.5
● Distance between observations 1 and 3: 0.13
The same dataset with height in centimeters:

     height (cm)  weight (kg)
1    183          80
2    183          80.5
3    170          80

● Distance between observations 1 and 2: 0.5
● Distance between observations 1 and 3: 13
Scale influences distance!
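
The distances in this example can be reproduced with a few lines of Python, using the Euclidean distance and the three observations from the slides. The helper name `pairwise` is an assumption.

```python
from math import dist


def pairwise(obs):
    """Distances from observation 1 to observations 2 and 3, rounded to 2 decimals."""
    return round(dist(obs[0], obs[1]), 2), round(dist(obs[0], obs[2]), 2)


in_meters = [(1.83, 80), (1.83, 80.5), (1.70, 80)]     # (height in m, weight in kg)
in_centimeters = [(183, 80), (183, 80.5), (170, 80)]   # (height in cm, weight in kg)

print(pairwise(in_meters))       # (0.5, 0.13)
print(pairwise(in_centimeters))  # (0.5, 13.0)
```

With height in meters, observation 3 is the nearest neighbor of observation 1; with height in centimeters, observation 2 is.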
Scaling
● Normalize all features
● e.g. rescale values between 0 and 1
● Gives better measure of real distance
● Don’t forget to scale new observations
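
Rescaling to the range 0 to 1 is commonly done with min-max normalization. The sketch below assumes that method (the slides do not name one); the function names and the new height of 176.5 cm are hypothetical. The point of the last line is that a new observation is rescaled with the minimum and maximum of the training set, not with its own.

```python
def fit_min_max(values):
    """Remember the minimum and maximum of a training feature."""
    return min(values), max(values)


def rescale(value, low, high):
    """Map value to [0, 1] relative to the training range (assumes high > low)."""
    return (value - low) / (high - low)


heights_cm = [183, 183, 170]           # training heights from the example
low, high = fit_min_max(heights_cm)

scaled = [rescale(h, low, high) for h in heights_cm]
print(scaled)                          # [1.0, 1.0, 0.0]

# A new observation is scaled with the training minimum and maximum
print(rescale(176.5, low, high))       # 0.5
```

A new value outside the training range lands outside [0, 1]; whether to clip it is a design choice the slides do not address.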
Categorical features
● How to use in distance metric?
● Dummy variables
● 1 categorical feature with N possible outcomes becomes N binary features (2 outcomes each)
Dummy variables - Example
mother tongue: Spanish, Italian or French

mother_tongue   spanish   italian   french
Spanish         1         0         0
Italian         0         1         0
Italian         0         1         0
Spanish         1         0         0
French          0         0         1
French          0         0         1
French          0         0         1
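
The encoding in this example can be sketched in plain Python. The function name `to_dummies` is an assumption; the data are the seven values from the slide.

```python
mother_tongue = ["Spanish", "Italian", "Italian", "Spanish",
                 "French", "French", "French"]
categories = ["Spanish", "Italian", "French"]


def to_dummies(values, categories):
    """Turn each categorical value into one binary feature per category."""
    return [[int(value == category) for category in categories]
            for value in values]


for row in to_dummies(mother_tongue, categories):
    print(row)
```

Each printed row has exactly one 1, in the column of the observed category, so the rows can be fed to a distance metric like any other numerical features.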
Let’s practice!
Introducing: The ROC curve
Introducing
● Very powerful performance measure
● For binary classification
● Receiver Operating Characteristic curve (ROC curve)
Probabilities as output
● Used decision trees and k-NN to predict class
● They can also output probability that instance belongs to class
Probabilities as output - example
● Binary classification
● Decide whether patient is sick or not sick
● Define probability threshold from which you decide patient to be sick
● Decision tree output for a new patient: 70% for one class, 30% for the other
● Probability of sick higher than 50%: classify as sick. This threshold is the decision function!
● Avoid sending sick patient home: lower threshold to 30%
● More sick patients classified as sick, but also more not-sick patients classified as sick
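
The effect of moving the threshold can be sketched in Python. The probabilities below are invented for illustration, and `classify` is a hypothetical helper; the two thresholds, 50% and 30%, are the ones from the slide.

```python
def classify(p_sick, threshold=0.5):
    """Classify as sick when the predicted probability is higher than the threshold."""
    return "sick" if p_sick > threshold else "not sick"


# Hypothetical predicted probabilities of being sick for five patients
probabilities = [0.9, 0.7, 0.45, 0.35, 0.1]

at_50 = [classify(p, 0.5) for p in probabilities]
at_30 = [classify(p, 0.3) for p in probabilities]

print(at_50.count("sick"))  # 2
print(at_30.count("sick"))  # 4
```

Lowering the threshold never turns a "sick" prediction into "not sick"; it only adds patients to the sick group, whether they are truly sick or not.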
Confusion matrix
● Other performance measure for classification
● Important to construct the ROC curve
Confusion matrix
● Binary classifier: positive or negative (1 or 0)

               Prediction
               P      N
Truth    p     TP     FN
         n     FP     TN

● True Positives (TP): prediction P, truth P
● False Negatives (FN): prediction N, truth P
● False Positives (FP): prediction P, truth N
● True Negatives (TN): prediction N, truth N
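
Counting the four cells is straightforward once truth and prediction are coded as 1 or 0. A minimal Python sketch with invented labels; the function name and the dictionary layout are assumptions.

```python
def confusion_matrix(truth, prediction):
    """Count TP, FN, FP and TN for labels coded as 1 (positive) or 0 (negative)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(truth, prediction):
        if t == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


# Hypothetical labels
truth      = [1, 1, 1, 0, 0, 0, 0, 1]
prediction = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(truth, prediction))
# {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```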
![Page 68: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/68.jpg)
Introduction to Machine Learning
Ratios in the confusion matrix
● True positive rate (TPR) = recall
● False positive rate (FPR)
TPR = TP / (TP + FN)
![Page 69: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/69.jpg)
Introduction to Machine Learning
Ratios in the confusion matrix
● True positive rate (TPR) = recall
● False positive rate (FPR)
TPR = TP / (TP + FN) = truly positive / (truly positive + falsely negative)
![Page 70: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/70.jpg)
Introduction to Machine Learning
Ratios in the confusion matrix
● True positive rate (TPR) = recall
● False positive rate (FPR)
FPR = FP / (FP + TN)
![Page 71: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/71.jpg)
Introduction to Machine Learning
Ratios in the confusion matrix
● True positive rate (TPR) = recall
● False positive rate (FPR)
FPR = FP / (FP + TN) = falsely positive / (falsely positive + truly negative)
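As a worked example of the two ratios, the sketch below computes TPR and FPR from the four confusion matrix counts. The function name `rates` and the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides specify.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion matrix counts."""
    # TPR (recall): share of truly positive observations predicted positive
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: share of truly negative observations predicted positive
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical counts (not from the slides)
tpr, fpr = rates(tp=3, fn=1, fp=1, tn=3)
print("TPR:", tpr, "FPR:", fpr)
```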
![Page 72: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/72.jpg)
Introduction to Machine Learning
ROC curve
● Horizontal axis: FPR
● Vertical axis: TPR
● How to draw the curve?
[Plot: empty axes; x: False positive rate, y: True positive rate, both from 0.0 to 1.0]
![Page 73: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/73.jpg)
Introduction to Machine Learning
Draw the curve
● Need classifier which outputs probabilities
● The decision function
[Diagram labels: probability; threshold by decision function; decide to diagnose]
![Page 74: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/74.jpg)
Introduction to Machine Learning
Draw the curve
● Need classifier which outputs probabilities
● The decision function
[Diagram labels: probability; threshold by decision function; decide to diagnose]
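A decision function of this kind can be sketched as a simple comparison of the predicted probability against a threshold. The name `decide`, the default threshold of 0.5, and the example probabilities are assumptions made for illustration; 1 stands for "sick" (positive) and 0 for "healthy" (negative).

```python
def decide(probability, threshold=0.5):
    """Turn a predicted probability into a class: 1 (sick) or 0 (healthy)."""
    return 1 if probability >= threshold else 0


# Hypothetical classifier outputs (not from the slides)
probabilities = [0.9, 0.4, 0.5, 0.1]
decisions = [decide(p) for p in probabilities]
print(decisions)
```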
![Page 75: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/75.jpg)
Introduction to Machine Learning
[Plot: ROC curve; x: False positive rate, y: True positive rate, both from 0.0 to 1.0]
Probability threshold: 50% (>= 50%: sick, < 50%: healthy)
![Page 76: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/76.jpg)
Introduction to Machine Learning
[Plot: ROC curve; x: False positive rate, y: True positive rate, both from 0.0 to 1.0]
Probability threshold: 0% (all sick)
![Page 77: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/77.jpg)
Introduction to Machine Learning
[Plot: ROC curve; x: False positive rate, y: True positive rate, both from 0.0 to 1.0]
Probability threshold: 100% (all healthy)
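Each threshold yields one (FPR, TPR) point, and sweeping the threshold from 0% to 100% traces the curve. The sketch below illustrates this with made-up data; `roc_points` is a hypothetical helper, and it assumes an observation is called sick when its probability is at or above the threshold. Note that under this rule a probability of exactly 1.0 would still be classified as sick at the 100% threshold; the example data avoids that case.

```python
def roc_points(truth, probabilities, thresholds):
    """Return one (FPR, TPR) point per threshold."""
    points = []
    for thr in thresholds:
        tp = fn = fp = tn = 0
        for t, p in zip(truth, probabilities):
            pred = 1 if p >= thr else 0  # decision function
            if t == 1 and pred == 1:
                tp += 1
            elif t == 1 and pred == 0:
                fn += 1
            elif t == 0 and pred == 1:
                fp += 1
            else:
                tn += 1
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points


# Hypothetical data: 1 = sick, 0 = healthy
truth = [1, 1, 0, 1, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

points = roc_points(truth, probabilities, thresholds=[0.0, 0.5, 1.0])
for thr, (fpr, tpr) in zip([0.0, 0.5, 1.0], points):
    print(f"threshold={thr:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

With a threshold of 0% every observation is predicted sick, giving the upper right corner of the plot; with 100% every observation here is predicted healthy, giving the lower left corner.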
![Page 78: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/78.jpg)
Introduction to Machine Learning
Interpreting the curve
[Plot: ROC curve; x: False positive rate, y: True positive rate, both from 0.0 to 1.0]
● Is it a good curve?
● Closer to upper left corner = better
● Good classifiers have a big area under the curve
![Page 79: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/79.jpg)
Introduction to Machine Learning
Area under the curve (AUC)
[Plot: ROC curve; x: False positive rate, y: True positive rate, both from 0.0 to 1.0; annotated AUC = 0.905]
● AUC > 0.9 = very good
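One common way to approximate the area under the curve is the trapezoidal rule applied to the (FPR, TPR) points. This is a sketch under that assumption, not necessarily how the value on the slide was computed; the function name `auc` and the example points are hypothetical.

```python
def auc(points):
    """Approximate the area under an ROC curve given (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Trapezoid between two neighbouring points
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical curves (not from the slides)
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hugs the upper left corner
diagonal = [(0.0, 0.0), (1.0, 1.0)]              # no better than guessing
example = [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]

print(auc(perfect), auc(diagonal), auc(example))
```

A curve that hugs the upper left corner has an area close to 1, while the diagonal has an area of 0.5.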
![Page 80: Decision tree learning - Amazon S3...Introduction to Machine Learning Examples of features Features can be numerical age: 23, 25, 75, … height: 175.3, 179.5, … Features can be](https://reader036.vdocuments.us/reader036/viewer/2022062917/5ed1b62c7dccd150e82ae09f/html5/thumbnails/80.jpg)
INTRODUCTION TO MACHINE LEARNING
Let’s practice!