decision trees machine learning

44
Decision Trees & Machine Learning CS16: Introduction to Data Structures & Algorithms Summer 2021

Upload: others

Post on 21-May-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Decision Trees Machine Learning

Decision Trees&

Machine LearningCS16: Introduction to Data Structures & Algorithms

Summer 2021

Page 2: Decision Trees Machine Learning

Machine Learning‣ Algorithms that use data to design algorithms

‣ Allows us to design algorithms‣ that predict the future (e.g., picking stocks)‣ even when we don’t know how (e.g., facial

recognition)2

Learning Algodata Algo Algo

input

output

Page 3: Decision Trees Machine Learning

3CS 147

Page 4: Decision Trees Machine Learning

Applications of ML‣ Agriculture

‣ Astronomy

‣ Bioinformatics

‣ Classifying DNA

‣ Computer Vision

‣ Finance

‣ Linguistics

‣ Medical diagnostics

‣ Insurance

‣ Economics

‣ Advertising

‣ Self-driving cars

‣ Recommendation systems (e.g., Netflix)

‣ Search engines

‣ Translations

‣ Robotics

‣ Risk assessment

‣ Drug discovery

‣ Fraud discovery

‣ Computational Anatomy

Page 5: Decision Trees Machine Learning

Classes of ML‣ Supervised learning‣ learn to make accurate predictions from training data

‣ Unsupervised learning‣ find patterns in data without training data

‣ Reinforcement learning‣ improve performance with positive and negative

feedback

5

Page 6: Decision Trees Machine Learning

Supervised Learning‣ Make accurate predictions/classifications

‣ Is this email spam?‣ Will the snowstorm cancel class?‣ Will this flight be delayed?‣ Will this candidate win the next election?

‣ How can our algorithm predict the future?‣ We train it using “training data” which are past examples‣ Examples of emails classified as spam and of emails classified as non-spam‣ Examples of snowstorms that have lead to cancelations and of snowstorms that

have not‣ Examples of flights that have been delayed and of flights that have left on time‣ Examples of candidates that won and of candidates that have lost

6

Page 7: Decision Trees Machine Learning

Supervised Learning‣ Training data is a collection of examples

‣ An example includes an input and its classification‣ inputs: flights, snowstorms, candidates, …‣ classifications: delayed/non-delayed, canceled/not canceled, win/lose

‣ But how do we represent inputs for our algorithm?‣ What is a student? what is a flight? what is an email?

‣ We have to choose attributes that describe the inputs

‣ flight is represented by: source, destination, airline, number of passengers, …

‣ snowstorm is represented by: duration, expected inches, winds, …

‣ candidate is represented by: district, political affiliation, experience, …

7

Page 8: Decision Trees Machine Learning

Example: Waiting for a Table‣ Design algorithm that predicts if patron will wait

for a table‣ What are the inputs?‣ the “context” of the patron’s decision

‣ What are the attributes of this context?‣ is patron hungry? is the line long?

8

Page 9: Decision Trees Machine Learning

Example: Waiting for a Table?‣ Input attributes

‣ A1: Alternatives = {Yes, No}‣ A2: Bar = {Yes, No}‣ A3: Fri/Sat = {Yes, No}‣ A4: Hungry = {Yes, No}‣ A5: Patrons = {None, Some, Full}‣ A6: Price = {$, $$, $$$}‣ A7: Raining = {Yes, No}‣ A8: Reservation = {Yes, No}‣ A9: Type = {French, Italian, Thai, Burger}‣ A10: Wait = {10-30, 30-60, >60}

‣ Classification: {Yes, No}

9

Page 10: Decision Trees Machine Learning

Training Data

10

S. Russel & P. Norvig. Artificial Intelligence - A Modern Approach

Page 11: Decision Trees Machine Learning

Supervised Learning‣ Classification‣ If classifications are from a finite set ‣ ex: spam/not spam, delayed/not delayed

‣ Regression‣ If classifications are real numbers‣ ex: temperature

11

Page 12: Decision Trees Machine Learning

Decision Trees‣ A decision tree maps ‣ inputs represented by attributes… ‣ …to a classification

‣ Examples‣ snowstorm_dt(12h,8”,strong winds) returns Yes

‣ flight_dt(DL,PVD,Paris,night,no_storm,…) returns No

‣ restaurant_dt(estimate,hungry,patrons,…) returns No

12

Page 13: Decision Trees Machine Learning

Decision Tree Example

13

Page 14: Decision Trees Machine Learning

Decision Tree Example

14

Page 15: Decision Trees Machine Learning

Our Goal: Learning a Decision Tree

15

Training Data

Decision Tree

Learn

Page 16: Decision Trees Machine Learning

What is a Good Decision Tree?

16

Page 17: Decision Trees Machine Learning

What is a Good Decision Tree?‣ Consistent with training data

‣ classifies training examples correctly

‣ Performs well on future examples‣ classifies future inputs correctly

‣ As small as possible‣ Efficient classification

‣ How can we find a small decision tree?‣ there are possible decision

trees‣ so brute force is not possible

17

⌦(22n

)

Page 18: Decision Trees Machine Learning

Iterative Dichotomizer 3 (ID3) Algorithm

18

Data

(Learned) Decision Tree

ID3 Ross Quinlan

Page 19: Decision Trees Machine Learning

ID3

‣ Starting at root‣ node is either an attribute node or a classification node (leaf)‣ outgoing edges are labeled with attribute values

‣ children are either a classification node or another attribute node

‣ Tree should be as small as possible19

Page 20: Decision Trees Machine Learning

ID3

20

Patrons

Some

FullNone

4xYes2xYes4xNo2xNo

6xYes6xNo

Type

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

6xYes6xNo

Page 21: Decision Trees Machine Learning

ID3

21

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

Type

6xYes6xNo

Uncertainty about whetherwe should wait or not

Page 22: Decision Trees Machine Learning

ID3

22

Type

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

Patrons

Some

FullNone

4xYes2xYes4xNo2xNo

6xYes6xNo

6xYes6xNo

Page 23: Decision Trees Machine Learning

ID3‣ Start at root with entire training data‣ Choose attribute that creates a “good split”‣ Attribute “splits” data into subsets‣ good split: children with subsets that are unmixed (with

same classification)‣ bad split: children with subsets that are mixed (with

different classification)

‣ Children with unmixed subsets lead to a classification‣ Children with mixed subsets handled with recursion

23

Page 24: Decision Trees Machine Learning

ID3

24

Type

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

Patrons

Some

FullNone

4xYes2xYes4xNo2xNo

How do we distinguish

from“bad” attributes “good” attributes

many mixed subsets many unmixed subsets

6xYes6xNo

6xYes6xNo

Page 25: Decision Trees Machine Learning

ID3‣ How do we decide if attribute is good?‣ Compute entropy of each child ‣ quantifies how mixed/alike it is

‣ quantifies amount of certainty/uncertainty

‣ Combine the entropies of all the children‣ Compare combined entropy of children to entropy

of node‣ This is called the information gain

25

Page 26: Decision Trees Machine Learning

Entropy‣ Entropy of a dataset of examples

26

H(data) = �✓

#Yes

#Yes+#No· log2

✓#Yes

#Yes+#No

+

✓1� #Yes

#Yes+#No

◆· log2

✓1� #Yes

#Yes+#No

◆◆

<latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit>

log(0) = 0<latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit>

[ ]

Page 27: Decision Trees Machine Learning

Entropy‣ Entropy of a dataset of examples

‣ Intuition:‣ H(perfectly mixed) = 1‣ H(all the same) = 0

27

H(data) = �✓

#Yes

#Yes+#No· log2

✓#Yes

#Yes+#No

+

✓1� #Yes

#Yes+#No

◆· log2

✓1� #Yes

#Yes+#No

◆◆

<latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit>

log(0) = 0<latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit><latexit sha1_base64="NDBv5UwjCJzY/ZoaB4AHXijEA4w=">AAAB8XicbVBNS8NAEJ3Ur1q/qh69LBahXkoignoQil48VjC2kIay2W7apZvdsLsRSujP8OJBxav/xpv/xm2bg7Y+GHi8N8PMvCjlTBvX/XZKK6tr6xvlzcrW9s7uXnX/4FHLTBHqE8ml6kRYU84E9Q0znHZSRXEScdqORrdTv/1ElWZSPJhxSsMEDwSLGcHGSkGXy0HdPUXXyO1Va27DnQEtE68gNSjQ6lW/un1JsoQKQzjWOvDc1IQ5VoYRTieVbqZpiskID2hgqcAJ1WE+O3mCTqzSR7FUtoRBM/X3RI4TrcdJZDsTbIZ60ZuK/3lBZuLLMGcizQwVZL4ozjgyEk3/R32mKDF8bAkmitlbERlihYmxKVVsCN7iy8vEP2tcNbz781rzpkijDEdwDHXw4AKacAct8IGAhGd4hTfHOC/Ou/Mxby05xcwh/IHz+QNcSY+I</latexit>

[ ]

Page 28: Decision Trees Machine Learning

ID3 - Information Gain

28

Type

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

Patrons

Some

FullNone

4xYes2xYes4xNo2xNo

6xYes6xNo

6xYes6xNo

1

entropy

1

entropy

1

entropy

1

entropy

0 0 0.8

entropy entropy entropyWeighted Sum

Weighted Sum

entropy

1

remainder

1- = 0entropy

1

remainder

0.4- = 0.6

Page 29: Decision Trees Machine Learning

ID3 - Notation

29

Type

French

Thai Burger Italian

1xYes1xNo

2xYes2xNo

2xYes2xNo

1xYes1xNo

6xYes6xNodata0

data2 data3data1 data4

Page 30: Decision Trees Machine Learning

Information gain‣ Entropy of a dataset of examples

‣ Remainder of an attribute

‣ Information gain of an attribute

30

Rem(Att) =dX

i=1

#datai#data1 + · · ·+#datad

·H(datai)<latexit sha1_base64="bhQeD3I6bm4u58OIK2OjRa+Dlb4=">AAACYXicbVFNT9swGHYyGNAxCHDk8ooKqQipSriwHZBgHODIJgpITRccx6EWdhLZb5CqyH9oP2c3Tlz4Ibgph/LxSpYePx+y/TitpDAYho+e/2Vh8evS8krn2+r3tfVgY/PKlLVmfMBKWeqblBouRcEHKFDym0pzqlLJr9P706l+/cC1EWVxiZOKjxS9K0QuGEVHJcGkiU0Of7iyvRadINo9OILY1CppxFFk/2YQ55qyJu62jowiTYS1b/aRhX2IWVaigf15IbN2xsN5bz6+10mCbtgP24GPIHoF3eMz+HdLCLlIgv9xVrJa8QKZpMYMo7DCUUM1Cia57cS14RVl9/SODx0sqOJm1LQVWdh1TAZ5qd0qEFp2PtFQZcxEpc6pKI7Ne21KfqYNa8x/jBpRVDXygs0OymsJWMK0b8iE5gzlxAHKtHB3BTamrk90vzItIXr/5I9gcND/2Y9+uzJ+kdksk22yQ3okIofkmJyTCzIgjDx5i96at+49+x0/8DdnVt97zWyRN+NvvwA+PbZf</latexit><latexit sha1_base64="2CojO+9lrlrcFJZSlyilaAQHBFo=">AAACYXicbVFNT9swGHayAV3HR+iOXF5RTSpCqhIuwAEJ2GEc2bQOpKZEjuMUCzuJ7DdIVfAf2s/Zbadd+CG4KYfy8UqWHj8fsv04raQwGIb/PP/Dx5XVtc6n7uf1jc2tYLv325S1ZnzESlnq65QaLkXBRyhQ8utKc6pSya/Su29z/eqeayPK4hfOKj5RdFqIXDCKjkqCWRObHH5yZQctOkO0e3ACsalV0oiTyN5kEOeasibut46MIk2EtS/2kYV9iFlWooH9ZSGzdsHDxWA5vtdNgn44DNuBtyB6Bv3T7/AnTh6ml0nwN85KViteIJPUmHEUVjhpqEbBJLfduDa8ouyOTvnYwYIqbiZNW5GFr47JIC+1WwVCyy4nGqqMmanUORXFW/Nam5PvaeMa86NJI4qqRl6wxUF5LQFLmPcNmdCcoZw5QJkW7q7AbqnrE92vzEuIXj/5LRgdDI+H0Q9XxjlZTIfskF0yIBE5JKfkglySEWHkv7fibXpb3qPf9QO/t7D63nPmC3kx/s4T5dK3nQ==</latexit><latexit sha1_base64="2CojO+9lrlrcFJZSlyilaAQHBFo=">AAACYXicbVFNT9swGHayAV3HR+iOXF5RTSpCqhIuwAEJ2GEc2bQOpKZEjuMUCzuJ7DdIVfAf2s/Zbadd+CG4KYfy8UqWHj8fsv04raQwGIb/PP/Dx5XVtc6n7uf1jc2tYLv325S1ZnzESlnq65QaLkXBRyhQ8utKc6pSya/Su29z/eqeayPK4hfOKj5RdFqIXDCKjkqCWRObHH5yZQctOkO0e3ACsalV0oiTyN5kEOeasibut46MIk2EtS/2kYV9iFlWooH9ZSGzdsHDxWA5vtdNgn44DNuBtyB6Bv3T7/AnTh6ml0nwN85KViteIJPUmHEUVjhpqEbBJLfduDa8ouyOTvnYwYIqbiZNW5GFr47JIC+1WwVCyy4nGqqMmanUORXFW/Nam5PvaeMa86NJI4qqRl6wxUF5LQFLmPcNmdCcoZw5QJkW7q7AbqnrE92vzEuIXj/5LRgdDI+H0Q9XxjlZTIfskF0yIBE5JKfkglySEWHkv7fibXpb3qPf9QO/t7D63nPmC3kx/s4T5dK3nQ==</latexit><latexit sha1_base64="AIBnFzLOhWi2JHEzoGiTa9O/yNY=">AAACYXicbVFNS8MwGE7r9/yq8+glOISJMFov6kHw47KjilNhnSVNUw1L2pK8FUbJn/TmyYs/xKzbYXO+EHjyfJDkSVwIrsH3vxx3aXlldW19o7G5tb2z6+01n3ReKsp6NBe5eomJZoJnrAccBHspFCMyFuw5Ht6O9ecPpjTPs0cYFWwgyVvGU04JWCryRlWoU/zApGnX6BrAHONLHOpSRhW/DMxrgsNUEVqFrdqRECARN2ZuHxh8gkOa5KDxyayQGDPhcbc9Gz9uRF7L7/j14EUQTEELTecu8j7DJKelZBlQQbTuB34Bg4oo4FQw0whLzQpCh+SN9S3MiGR6UNUVGXxkmQSnubIrA1yzs4mKSK1HMrZOSeBd/9XG5H9av4T0fFDxrCiBZXRyUFoKDDke940TrhgFMbKAUMXtXTF9J7ZPsL8yLiH4++RF0DvtXHSCe791dTNtYx0doEPURgE6Q1eoi+5QD1H07aw4O86u8+M2XM9tTqyuM83so7lxD34B9yq0rg==</latexit>

Gain(Att) = H(data0)� Rem(Att)<latexit sha1_base64="SPj7aZrFYj9woaAL0BHWfCVaFTI=">AAACJXicbVBNSwMxEM36WetX1aOXYBHag2VXBPVQqHrQo4pVoS1lNs1qaDa7JLNCWfbXePGvePFQRfDkXzHd9uDXg8Cb92aYzPNjKQy67oczNT0zOzdfWCguLi2vrJbW1q9NlGjGmyySkb71wXApFG+iQMlvY80h9CW/8fsnI//mgWsjInWFg5h3QrhTIhAM0ErdUj1tm4CeglBZJadHiFm1Ts/GVQ8Qsq5bpTs0ry95+L2vWyq7NTcH/Uu8CSmTCc67pWG7F7Ek5AqZBGNanhtjJwWNgkmeFduJ4TGwPtzxlqUKQm46aX5mRret0qNBpO1TSHP1+0QKoTGD0LedIeC9+e2NxP+8VoLBQScVKk6QKzZeFCSSYkRHmdGe0JyhHFgCTAv7V8ruQQNDm2zRhuD9Pvkvae7WDmvexV65cTxJo0A2yRapEI/skwY5I+ekSRh5JM9kSF6dJ+fFeXPex61TzmRmg/yA8/kFd56jhQ==</latexit><latexit sha1_base64="SPj7aZrFYj9woaAL0BHWfCVaFTI=">AAACJXicbVBNSwMxEM36WetX1aOXYBHag2VXBPVQqHrQo4pVoS1lNs1qaDa7JLNCWfbXePGvePFQRfDkXzHd9uDXg8Cb92aYzPNjKQy67oczNT0zOzdfWCguLi2vrJbW1q9NlGjGmyySkb71wXApFG+iQMlvY80h9CW/8fsnI//mgWsjInWFg5h3QrhTIhAM0ErdUj1tm4CeglBZJadHiFm1Ts/GVQ8Qsq5bpTs0ry95+L2vWyq7NTcH/Uu8CSmTCc67pWG7F7Ek5AqZBGNanhtjJwWNgkmeFduJ4TGwPtzxlqUKQm46aX5mRret0qNBpO1TSHP1+0QKoTGD0LedIeC9+e2NxP+8VoLBQScVKk6QKzZeFCSSYkRHmdGe0JyhHFgCTAv7V8ruQQNDm2zRhuD9Pvkvae7WDmvexV65cTxJo0A2yRapEI/skwY5I+ekSRh5JM9kSF6dJ+fFeXPex61TzmRmg/yA8/kFd56jhQ==</latexit><latexit sha1_base64="SPj7aZrFYj9woaAL0BHWfCVaFTI=">AAACJXicbVBNSwMxEM36WetX1aOXYBHag2VXBPVQqHrQo4pVoS1lNs1qaDa7JLNCWfbXePGvePFQRfDkXzHd9uDXg8Cb92aYzPNjKQy67oczNT0zOzdfWCguLi2vrJbW1q9NlGjGmyySkb71wXApFG+iQMlvY80h9CW/8fsnI//mgWsjInWFg5h3QrhTIhAM0ErdUj1tm4CeglBZJadHiFm1Ts/GVQ8Qsq5bpTs0ry95+L2vWyq7NTcH/Uu8CSmTCc67pWG7F7Ek5AqZBGNanhtjJwWNgkmeFduJ4TGwPtzxlqUKQm46aX5mRret0qNBpO1TSHP1+0QKoTGD0LedIeC9+e2NxP+8VoLBQScVKk6QKzZeFCSSYkRHmdGe0JyhHFgCTAv7V8ruQQNDm2zRhuD9Pvkvae7WDmvexV65cTxJo0A2yRapEI/skwY5I+ekSRh5JM9kSF6dJ+fFeXPex61TzmRmg/yA8/kFd56jhQ==</latexit><latexit sha1_base64="SPj7aZrFYj9woaAL0BHWfCVaFTI=">AAACJXicbVBNSwMxEM36WetX1aOXYBHag2VXBPVQqHrQo4pVoS1lNs1qaDa7JLNCWfbXePGvePFQRfDkXzHd9uDXg8Cb92aYzPNjKQy67oczNT0zOzdfWCguLi2vrJbW1q9NlGjGmyySkb71wXApFG+iQMlvY80h9CW/8fsnI//mgWsjInWFg5h3QrhTIhAM0ErdUj1tm4CeglBZJadHiFm1Ts/GVQ8Qsq5bpTs0ry95+L2vWyq7NTcH/Uu8CSmTCc67pWG7F7Ek5AqZBGNanhtjJwWNgkmeFduJ4TGwPtzxlqUKQm46aX5mRret0qNBpO1TSHP1+0QKoTGD0LedIeC9+e2NxP+8VoLBQScVKk6QKzZeFCSSYkRHmdGe0JyhHFgCTAv7V8ruQQNDm2zRhuD9Pvkvae7WDmvexV65cTxJo0A2yRapEI/skwY5I+ekSRh5JM9kSF6dJ+fFeXPex61TzmRmg/yA8/kFd56jhQ==</latexit>

H(data) = �✓

#Yes

#Yes+#No· log2

✓#Yes

#Yes+#No

+

✓1� #Yes

#Yes+#No

◆· log2

✓1� #Yes

#Yes+#No

◆◆

<latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit><latexit sha1_base64="CUCt8fc/HqYRscmeEmG1tEW24LM=">AAADGXiclVJNb9QwEHVSPkr42sKRy4gV1VZVo6RCajkgVXDpCRWJpUXr1cpxnF2rThzsCdIqyu/opX+lFw6AOMKJf4M3G6HS5UBHsvw8b+Z5PJ6kVNJiFP3y/LUbN2/dXr8T3L13/8HD3saj91ZXhosh10qbk4RZoWQhhihRiZPSCJYnShwnp68X/PEnYazUxTucl2Kcs2khM8kZOtdkw4sOBzW1GaQMWbMFL2FnkyqR4YBmhvGa9lv2g7BNc/mw3eE3umkoTzVSpaeT3WUq/H8uUCOnM9yCDoRAabBJP1Yshe2glQuX2yCGnWsId7qrtcXXUflTXtCBIJj0+lEYtQarIO5An3R2NOn9oKnmVS4K5IpZO4qjEsc1Myi5Ek1AKytKxk/ZVIwcLFgu7Lhuv7aBZ86TQqaNWwVC672cUbPc2nmeuMic4cxe5RbOf3GjCrP9cS2LskJR8OVFWaUANSzmBFJpBEc1d4BxI12twGfM9Q3dNC2aEF998ioY7oYvwvjt8/7Bq64b6+QJeUoGJCZ75IAckiMyJNw78y68L95X/9z/7H/zvy9Dfa/LeUz+Mv/nb1d/+DI=</latexit>

Page 31: Decision Trees Machine Learning

ID3 Pseudocode

31

id3_algorithm(data, attributes, parent_data):if data is empty:

return new node w/ most frequent classification in parent_data else if all examples in data have same classification:

return new node with that classificationelse if attributes is empty:

return new node w/ most frequent classification in dataelse:

A = attribute with largest information gaintree = new decision tree with root Afor each value a of attribute A

new_data = all examples in data such that example.A = asubtree = id3_algorithm(new_data, attributes - A, data)attach subtree to tree with branch labeled “a”

return tree

No examples w/ this combination of attribute values

Examples w/ this combination of

attribute values but they have different

classifications

Page 32: Decision Trees Machine Learning

Testing Decision Trees‣ ID3 learns a decision tree from training data‣ How good is this decision tree?

‣ How accurately does it classify inputs it hasn’t seen?‣ New flights, new snowstorms, new patrons, …

‣ Split examples into two non-overlapping sets ‣ training data & testing data‣ by assigning each example to one set uniformly at random

‣ Accuracy of decision tree‣ # of correct classifications on testing data / # of testing examples

32

Page 33: Decision Trees Machine Learning

ML Applications‣ Learned algorithms are different than classical algorithms

‣ A learned algorithm depends on training data‣ A learned algorithm depends on features

‣ Learned algorithms are being embedded everywhere‣ Traditional applications

‣ spell checking‣ ad targeting‣ recommendations‣ speech & handwriting recognition‣ computer vision

33

Page 34: Decision Trees Machine Learning

ML Applications‣ Newer applications‣ News feeds, self-driving cars‣ insurance companies, medical systems, K-12 education‣ Risk assessment

‣ These have serious impact on society‣ should news be tailored to your political beliefs?‣ should car save driver or pedestrian?‣ should we deny freedoms based on risk assessments?

34

Page 35: Decision Trees Machine Learning

Risk Assessment‣ Learned algorithms that predict risk‣ Insurance premiums‣ Credit‣ Crime‣ Recidivism

‣ Bonds

‣ Sentencing

35

Page 36: Decision Trees Machine Learning

Criminal Risk Assessment‣ ML in criminal justice system‣ better predict who will commit new crimes

‣ Purported goals‣ a fairer system‣ less people in jail‣ for less time

36

Page 37: Decision Trees Machine Learning

Criminal Risk Assessment Tools‣ COMPAS by Northpointe predicts‣ Risk of new violent crime‣ Risk of general recidivism‣ Pretrial risk (failure to appear)

‣ Public Safety Assessment by Arnold Foundation‣ helps Judges make release/detention decisions

37

Page 38: Decision Trees Machine Learning

Criminal Risk Assessment Tools‣ Used in ‣ Arizona, Colorado, Delaware, Kentucky, Louisiana,

Maryland, New York, Oklahoma, Virginia, Washington, Wisconsin, …

‣ Often adopted without independent study of effectiveness

‣ Example ‣ New York started using tool in 2001

‣ Studied it only in 201238

Page 39: Decision Trees Machine Learning

ProPublica Study‣ In 2016 ProPublica conducted a study of COMPAS

‣ 7000 arrests in Broward County, FL

‣ between 2013 and 2014

‣ OK predictions for all crimes (misdemeanors included)

‣ 61% of people labeled high risk committed new crimes

‣ But unreliable for violent crimes

‣ 20% of people labeled high risk committed new violent crimes

39

Page 40: Decision Trees Machine Learning

ProPublica Study‣ Found significant racial disparities‣ Out of people labeled high risk but didn’t re-offend‣ 44.9% were African American

‣ 23.5% were white

‣ Out of people labeled low risk but did re-offend‣ 28% were African American

‣ 47.7% were white

‣ Study accounted for ‣ Criminal history, age and gender

40

Page 41: Decision Trees Machine Learning

Algorithms‣ Algorithms are impacting our lives‣ You are learning how to ‣ design algorithms‣ analyze algorithms‣ implement algorithms

‣ But don’t forget that… ‣ …your algorithms will impact people

41

Page 42: Decision Trees Machine Learning

Algorithms‣ Algorithms can do a lot amazing things‣ But they can also‣ deny people their freedom‣ addict people‣ sway elections‣ expose people’s private lives‣ …

42

Page 43: Decision Trees Machine Learning

References‣ Artificial Intelligence - A Modern Approach by S. Russel & P. Norvig‣ Machine Bias by ProPublica

‣ https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

‣ New York State COMPAS-Probation Risk and Need Assessment Study

‣ http://www.northpointeinc.com/downloads/research/DCJS_OPCA_COMPAS_Probation_Validity.pdf

‣ Evaluating the Predictive Validity of the COMPAS Risk and Needs Assessment System by Northpointe‣ http://www.northpointeinc.com/files/publications/Criminal-Justice-

Behavior-COMPAS.pdf

43

Page 44: Decision Trees Machine Learning

References‣ Fairness, Accountability and Transparency in ML‣ http://fatml.mysociety.org

‣ Center for Humane Technology‣ http://humanetech.com/

44