machine learning for suits
TRANSCRIPT
!#ODSC,!Boston,!May!30th,!2015
MACHINE(LEARNING
FOR$SUITSRahul&Dave&([email protected])
LxPrior'Inc.'&'Harvard'IACS LXPrior !!! @rahuldave 1
CLASSIFICATION• will%a%customer%churn?
• is%this%a%check?%For%how%much?
• a%man%or%a%woman?
• will%this%customer%buy?
• do%you%have%cancer?
• is%this%spam?
• whose%picture%is%this?
• what%is%this%text%about?j
j"image"from"code"in"h/p://bit.ly/1Azg29G
LXPrior !!! @rahuldave 2
100 1 2 3 4 5 6 7 8 9
10
0
1
2
3
4
5
6
7
8
9
x
y
REGRESSION• how%many%dollars%will%you%spend?
• what%is%your%creditworthiness
• how%many%people%will%vote%for%Bernie%t%days%before%elec9on
• use%to%predict%probabili9es%for%classifica9on
• causal%modeling%in%econometrics
LXPrior !!! @rahuldave 3
HYPOTHESIS)SPACES
A"polynomial"looks"so:
!
All#polynomials#of#a#degree#or#complexity# #cons4tute#a#hypothesis#space.
LXPrior !!! @rahuldave 5
100 1 2 3 4 5 6 7 8 9
10
0
1
2
3
4
5
6
7
8
9
x
y
What%does%it%mean%to%FIT?
Minimize'distance'from'the'line?
Minimize'squared'distance'from'the'line.
Get$intercept$ $and$slope$ .
LXPrior !!! @rahuldave 7
EMPIRICAL)RISK)MINIMIZATION
The$sample$must$be$representa/ve$of$the$popula/on!
A:#Empirical#risk#es/mates#out#of#sample#risk.#B:#Thus#the#out#of#sample#risk#is#also#small.
Is#this#enough?
LXPrior !!! @rahuldave 8
TARGET y = f(x)
SAMPLE (TRAINING EXAMPLES)
( , ), ( , ), . . . , ( , ) x1 y1 x2 y2 xn yn
LEARNER
FINALHYPOTHESIS
g ≈ f
INPUT DISTRIBUTION
P(x)
RISK/ERRORMEASURE
HYPOTHESISSET H
, , . . . , g, . . . , h1 h2 hM
*
*"image"based"on"amlbook.com
LXPrior !!! @rahuldave 9
THE$REAL$WORLD$HAS$NOISE
Which%fit%is%be+er%now?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%The%line%or%the%curve?
LXPrior !!! @rahuldave 10
SAMPLE (TRAINING EXAMPLES)
( , ), ( , ), . . . , ( , ) x1 y1 x2 y2 xn yn
LEARNER
FINALHYPOTHESIS
g ≈ f
INPUT DISTRIBUTION
P(x)
RISK/ERRORMEASURE
HYPOTHESISSET H
, , . . . , g, . . . , h1 h2 hM
TARGET DISTRIBUTION P(y|x) JOINT
DISTRIBUTION P(x, y)
y = f(x) + ϵ *
*"image"based"on"amlbook.com
LXPrior !!! @rahuldave 11
DETERMINISTIC*NOISE*(Bias)*vs*STOCHASTIC*NOISE**
*"image"based"on"amlbook.com
LXPrior !!! @rahuldave 12
!If$you$are$having$problems$in$machine$learning$the$reason$
is$almost$always
OVERFITTING!"Background"from"h0p://commons.wikimedia.org/wiki/File:Overfi>ng.svg
LXPrior !!! @rahuldave 14
DATA$SIZE$MATTERS:$straight$line$fits$to$a$sine$curve
Corollary:(Must(fit(simpler(models(to(less(data!
LXPrior !!! @rahuldave 15
OverfittingUnderfitting
Complexity “d”
Erro
r or r
isk
Low BiasHigh Variance
High BiasLow Variance
BALANCE'THE'COMPLEXITY
LXPrior !!! @rahuldave 17
TrainingSet
ValidationSet
Training Settrains g*
… … … …
trains g-0
estimates Rout(g-0)
trains g-1
estimates Rout(g-1)
trains g-*
estimates Rout(g-*)
trains g-n
estimates Rout(g-n)
∈ H∗Test Settests g* ∈ H∗estimates Rout(g*)
… … … …
in H0
in H1
in H∗
in Hn
pick H∗ with lowest Rout(g-*), then retrain in H∗ on entire set
D
VALIDATION• train'test*not*enough*as*we*fit*for* *on*test*set*and*contaminate*it
• thus*do*train'validate'test
TrainingSet
TestSet
D Dataset
ValidationSet
LXPrior !!! @rahuldave 18
TrainingSet
ValidationSet
Training Settrains g*
… … … …
estimates R0CV
∈ H∗Test Settests g* ∈ H∗estimates Rout(g*)
… … … …
in H0
in H1
in H∗
in Hn
pick H∗ with lowest RCV , then retrain in H∗ on entire set
estimates R1CV
estimates R*CV
estimates RnCV
D
CROSS%VALIDATION
Test Setleft over
D Dataset
LXPrior !!! @rahuldave 19
CROSS%VALIDATION
is
• robust(to(outlier(valida/on(set
• allows(for(larger(training(sets
• allows(for(error(es/mates
Here$we$find$ .
LXPrior !!! @rahuldave 21
OverfittingUnderfittingErro
r or r
isk
Low BiasHigh Variance
High BiasLow Variance
Regularizer α
H13
subsets of
α∗
S*
REGULARIZATION
Keep$higher$a*priori$complexity$and$impose$a
complexity+penalty
on#risk#instead,#to#choose#a#SUBSET#of# .#We'll#make#the#coefficients#small:
LXPrior !!! @rahuldave 22
REGULARIZATION
As#we#increase# ,#coefficients#go#towards#0.#
Lasso%uses% %sets%
coefficients%to%exactly%0.
Thus%regulariza-on%automates:
!!!!!!!FEATURE!ENGINEERING
LXPrior !!! @rahuldave 24
CLASSIFICATIONBY#LINEAR#SEPARATION
Which%line?
• Different)Algorithms,)different)lines.
• SVM)uses)max:marginj
j"image"from"code"in"h/p://bit.ly/1Azg29G
LXPrior !!! @rahuldave 27
PROBABILISTIC+CLASSIFICATION
Model& &or& .&&&&&
If&la,er&use&Bayes&Theorem:&
LXPrior !!! @rahuldave 28
The$virtual$ATM:
:"Inspired"by"h.p://blog.yhathq.com/posts/image9classifica;on9in9Python.html
LXPrior !!! @rahuldave 31
PCA$$$$$$unsupervised*dim*reduc-on*from*332x137x3*to*50
!!Diagram!from!h+p://stats.stackexchange.com/a/140579
LXPrior !!! @rahuldave 32
0 1
PNPredictedNegative
PPPredictedPositive
1 FNFalse Negative
TPTrue Positive
OPObservedPositive
0 TNTrue Negative
FPFalse Positive
ONObservedNegative
Predicted
Obs
erve
d
EVALUATING*CLASSIFIERS
confusion_matrix(bestcv.predict(Xtest), ytest)
[[11, 0], [3, 4]]
checks=blue=1,,dollars=red=0
LXPrior !!! @rahuldave 39
CLASSIFIER)PROBABILITIES
• classifiers*output*rankings*or*probabili3es
• ought*to*be*well*callibrated,*or,*atleast,*similarly*ordered
LXPrior !!! @rahuldave 40
CLASSIFICATION*RISK
•
• The%usual%loss%is%the%1.0%loss% .
• Thus,% %and%
!!!!!!!CHOOSE!CLASS!WITH!LOWEST!RISK
1"if" "1"if" .
!!!!!!!choose&1&if& &!&Intui/ve!
LXPrior !!! @rahuldave 41
ASYMMETRIC*RISK^
want%no%checks%misclassified%as%dollars,%i.e.,%no%false%nega6ves:% .
Start%with%
choose&1&if&
^"image"of"breast"carcinoma"from"h1p://bit.ly/1QgqhBw
LXPrior !!! @rahuldave 42
ASYMMETRIC*RISK
i.e.$ $where
0 1
PNPredictedNegative
PPPredictedPositive
1 FNFalse Negative
TPTrue Positive
OPObservedPositive
0 TNTrue Negative
FPFalse Positive
ONObservedNegative
Predicted
Obs
erve
d
confusion_matrix(ytest, bestcv2, r=10, Xtest) [[5, 4], [0, 9]]
LXPrior !!! @rahuldave 43
ROC$SPACE+
0 1
PNPredictedNegative
PPPredictedPositive
1 FNFalse Negative
TPTrue Positive
OPObservedPositive
0 TNTrue Negative
FPFalse Positive
ONObservedNegative
Predicted
Obs
erve
d
+"this+next"fig:"Data"Science"for"Business,"Foster"et."al.
LXPrior !!! @rahuldave 44
ROC$CURVE• Rank&test&set&by&prob/score&from&highest&to&lowest
• At&beginning&no&+ives
• Keep&moving&threshold
• confusion&matrix&at&each&threshold
LXPrior !!! @rahuldave 46
COMPARING*CLASSIFERSTelecom'customer'Churn'data'set'from'@YhatHQ<
<"h$p://blog.yhathq.com/posts/predic8ng:customer:churn:with:sklearn.html
LXPrior !!! @rahuldave 47
ASYMMETRIC*CLASSES
• A#has#large#FP#
• B#has#large#FN.#
• On#asymmetric#data#sets,#A#will#do#very#bad.
• Both#downsampling#and#unequal#classes#are#used#in#pracAce
#"figure"from"Data"Science"for"Business,"Foster"et."al.
LXPrior !!! @rahuldave 49
ASYMMETRIC*CLASSES
We#look#for#lines#with#slope
Large& &penalizes&FN.
Churn&and&Cancer&u&dont&want&FN:&an&uncaught&churner&or&cancer&pa3ent&(P=churn/cancer)
LXPrior !!! @rahuldave 50
Li#$Curve• Rank&test&set&by&prob/score&from&highest&to&lowest
• Calculate&TPR&for&each&confusion&matrix&( )&
• Calculate&frac?on&of&test&set&predicted&as&posi?ve&( )
• Plot& &vs& .&LiD&represents&the&
advantage&of&a&classifier&over&random&guessing.
LXPrior !!! @rahuldave 51
EXPECTED'VALUE'FORMALISM
Can$be$used$for$risk$or$profit/u3lity$(nega3ve$risk)
!!!!!!!!!
Frac%on(of(test(set(pred(to(be(posi%ve( :
LXPrior !!! @rahuldave 52
Profit&curve• Rank&test&set&by&prob/score&from&highest&to&lowest
• Calculate&the&expected&profit/u=lity&for&each&confusion&matrix&( )&
• Calculate&frac=on&of&test&set&predicted&as&posi=ve&( )
• plot& &against&
LXPrior !!! @rahuldave 53
Finite&budget#
• 100,000%customers,%40,000%budget,%5$%per%customer
• we%can%target%8000%customers
• thus%target%top%8%
• classifier%1%does%be>er%there,%even%though%classifier%2%makes%max%profit
#"figure"from"Data"Science"for"Business,"Foster"et."al.
LXPrior !!! @rahuldave 54
WHERE%TO%GO%FROM%HERE?• Follow&@YhatHQ,&@kdnuggets&etc
• Read&Provost,&Foster;&Fawce<,&Tom.&Data&Science&for&Business:&What&you&need&to&know&about&data&mining&and&dataIanalyJc&thinking.&O'Reilly&Media.&
• Read&Learning&From&Data&by&Yaser&Abu&Mustafa&et.al.&if&you&want&the&math.&The&learning&diagram&and&some&demos&were&inspired&by&that&book.
• check&out&Harvard's&cs109&or&Andrew&Ng's&coursera&course&videos&online
LXPrior !!! @rahuldave 55