Additive Models, Trees, and Related Models

DESCRIPTION

Additive Models, Trees, and Related Models. Prof. Liqing Zhang, Dept. Computer Science & Engineering, Shanghai Jiaotong University. Introduction. 9.1: Generalized Additive Models; 9.2: Tree-Based Methods; 9.3: PRIM: Bump Hunting.

TRANSCRIPT
Additive Models, Trees, and Related Models

Prof. Liqing Zhang
Dept. Computer Science & Engineering,
Shanghai Jiaotong University
Introduction

9.1: Generalized Additive Models
9.2: Tree-Based Methods
9.3: PRIM: Bump Hunting
9.4: MARS: Multivariate Adaptive Regression Splines
9.5: HME: Hierarchical Mixture of Experts
9.1 Generalized Additive Models

• In the regression setting, a generalized additive model has the form:

  E(Y | X_1, X_2, …, X_p) = α + f_1(X_1) + f_2(X_2) + … + f_p(X_p)

Here the f_j's are unspecified smooth ("nonparametric") functions. Instead of using a linear basis expansion (LBE) as in Chapter 5, we fit each function using a scatterplot smoother (e.g., a cubic smoothing spline).
GAM (cont.)

• For two-class classification, the additive logistic regression model is:

  log[ μ(X) / (1 − μ(X)) ] = α + f_1(X_1) + … + f_p(X_p)

Here μ(X) = Pr(Y = 1 | X).
GAM (cont.)

• In general, the conditional mean μ(X) of a response Y is related to an additive function of the predictors via a link function g:

  g[μ(X)] = α + f_1(X_1) + … + f_p(X_p)

Examples of classical link functions:
• Identity: g(μ) = μ
• Logit: g(μ) = log[μ / (1 − μ)]
• Probit: g(μ) = Φ⁻¹(μ)
• Log: g(μ) = log(μ)
Fitting Additive Models

• The additive model has the form:

  Y = α + Σ_{j=1}^p f_j(X_j) + ε

Here the error term satisfies E(ε) = 0. Given observations (x_i, y_i), a criterion like the penalized residual sum of squares can be specified for this problem:

  PRSS(α, f_1, …, f_p) = Σ_{i=1}^N [ y_i − α − Σ_{j=1}^p f_j(x_ij) ]² + Σ_{j=1}^p λ_j ∫ f_j''(t_j)² dt_j

where the λ_j ≥ 0 are tuning parameters.
Fitting Additive Models (cont.)

Conclusions:
• The minimizer of PRSS is an additive cubic spline model (each f_j is a cubic spline in X_j); however, without further restrictions the solution is not unique.
• If the restriction Σ_{i=1}^N f_j(x_ij) = 0 for all j holds, it is easy to see that α̂ = mean(y_i).
• If, in addition to this restriction, the matrix of input values has full column rank, then (9.7) is a strictly convex criterion and has a unique solution. If the matrix is singular, then the linear part of f_j cannot be uniquely determined (Buja et al., 1989).
Learning GAM: Backfitting

Backfitting algorithm
1. Initialize: α̂ = (1/N) Σ_{i=1}^N y_i ; f̂_j ≡ 0 for all j.
2. Cycle: j = 1, 2, …, p, …, 1, 2, …, p, … (m cycles):
   Fit f̂_j with data { x_ij , y_i − α̂ − Σ_{k≠j} f̂_k(x_ik) }, i = 1, …, N,
   then center: f̂_j ← f̂_j − (1/N) Σ_{i=1}^N f̂_j(x_ij).
Until the functions f̂_j change less than a prespecified threshold.
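As a concrete illustration, the algorithm above can be sketched in a few lines of Python. This is a minimal sketch, not code from the slides: the function names are my own, and a crude moving-average smoother stands in for the cubic smoothing spline.

```python
import numpy as np

def running_mean_smoother(x, r, window=11):
    """Crude scatterplot smoother: moving average of r ordered by x.
    Stands in for the cubic smoothing spline on the slides."""
    order = np.argsort(x)
    smoothed = np.convolve(r[order], np.ones(window) / window, mode="same")
    out = np.empty_like(r)
    out[order] = smoothed
    return out

def backfit(X, y, n_cycles=20):
    """Backfitting for the additive model y ~ alpha + sum_j f_j(X_j)."""
    N, p = X.shape
    alpha = y.mean()                    # step 1: alpha-hat = mean(y), f_j = 0
    f = np.zeros((N, p))
    for _ in range(n_cycles):           # step 2: cycle over j = 1..p
        for j in range(p):
            # partial residuals: y minus alpha and all other components
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = running_mean_smoother(X[:, j], partial)
            f[:, j] -= f[:, j].mean()   # center f_j for identifiability
    return alpha, f

# Usage: recover a two-component additive signal.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)
alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
```

Note that the centering step in the inner loop enforces the identifiability restriction Σ_i f̂_j(x_ij) = 0, so α̂ stays at mean(y).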
Backfitting: Points to Ponder
The model and the quantity each backfitting step estimates:

  Y = α + Σ_{j=1}^p f_j(X_j) + ε,   ε ~ N(0, σ²)

  E( Y − α − Σ_{k≠j} f_k(X_k) | X_j ) = f_j(X_j)

• Computational advantage?
• Convergence?
• How to choose the fitting functions?
Fitting Additive Models (cont.)
Additive Logistic Regression

23/04/21 11

  log[ Pr(Y = 1 | X_1, …, X_p) / Pr(Y = 0 | X_1, …, X_p) ] = α + f_1(X_1) + … + f_p(X_p)
Logistic Regression

• Model the class posteriors Pr(G = k | X = x) in terms of K − 1 log-odds:

  log[ Pr(G = k | X = x) / Pr(G = K | X = x) ] = β_k0 + β_k^T x,   k = 1, …, K − 1

• The decision boundary between classes k and l is the set of points where the posteriors are equal:

  { x : Pr(G = k | X = x) = Pr(G = l | X = x) }
  = { x : log[ Pr(G = k | X = x) / Pr(G = l | X = x) ] = 0 }
  = { x : (β_k0 − β_l0) + (β_k − β_l)^T x = 0 }

• Linear discriminant function for class k:

  δ_k(x) = β_k0 + β_k^T x

• Classify to the class with the largest value of δ_k(x):

  G(x) = argmax_k δ_k(x)
Logistic Regression (cont.)

• Parameter estimation
  – Objective function: the conditional log-likelihood

    θ̂ = argmax_θ Σ_{i=1}^N log Pr(y_i | x_i; θ)

  – Parameters are estimated by IRLS (iteratively reweighted least squares).

• In particular, for the two-class case, using the Newton–Raphson algorithm to solve the score equations, the objective function is

  l(β) = Σ_{i=1}^N [ y_i log p(x_i; β) + (1 − y_i) log(1 − p(x_i; β)) ]

where

  p(x; β) = Pr(y = 1 | x; β) = exp(β^T x) / (1 + exp(β^T x)),   Pr(y = 0 | x; β) = 1 − p(x; β)
Logistic Regression (cont.)

Substituting p(x; β) gives

  l(β) = Σ_{i=1}^N { y_i β^T x_i − log(1 + exp(β^T x_i)) }

Setting the derivatives to zero yields the score equations:

  ∂l(β)/∂β = Σ_{i=1}^N x_i ( y_i − p(x_i; β) ) = 0

The second derivative (Hessian) is

  ∂²l(β)/∂β∂β^T = − Σ_{i=1}^N x_i x_i^T p(x_i; β)(1 − p(x_i; β))
Logistic Regression (cont.)

Newton–Raphson update:

  β^new = β^old − [ ∂²l(β)/∂β∂β^T ]⁻¹ ∂l(β)/∂β

with the score and Hessian from the previous slide:

  ∂l(β)/∂β = Σ_{i=1}^N x_i ( y_i − p(x_i; β) )

  ∂²l(β)/∂β∂β^T = − Σ_{i=1}^N x_i x_i^T p(x_i; β)(1 − p(x_i; β))
Logistic Regression (cont.)

In matrix notation, with y the vector of y_i, X the matrix of x_i, p the vector of fitted probabilities p(x_i; β^old), and W the N × N diagonal matrix with entries W_ii = p(x_i; β^old)(1 − p(x_i; β^old)):

  ∂l(β)/∂β = X^T (y − p)

  ∂²l(β)/∂β∂β^T = − X^T W X

The Newton update becomes

  β^new = β^old + (X^T W X)⁻¹ X^T (y − p) = (X^T W X)⁻¹ X^T W z,   where z = X β^old + W⁻¹ (y − p)
Logistic Regression (cont.)

  z = X β^old + W⁻¹ (y − p),   W_ii = p(x_i; β^old)(1 − p(x_i; β^old))

  β^new = β^old + (X^T W X)⁻¹ X^T (y − p) = (X^T W X)⁻¹ X^T W z

Each Newton step is therefore a weighted least squares fit to the adjusted response z:

  β^new = argmin_β (z − Xβ)^T W (z − Xβ)
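The weighted-least-squares reading of the Newton step can be checked numerically. A minimal NumPy sketch (the naming and the simulated data are my own, not from the slides):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Two-class logistic regression by IRLS: each Newton step is a
    weighted least squares fit to the adjusted response
    z = X beta_old + W^{-1}(y - p)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # p_i = p(x_i; beta)
        w = np.clip(p * (1.0 - p), 1e-9, None)     # diagonal of W
        z = X @ beta + (y - p) / w                 # adjusted response
        # beta_new = (X^T W X)^{-1} X^T W z
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

# Usage on simulated data: an intercept column plus one feature.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(400), rng.standard_normal(400)])
true_beta = np.array([-0.5, 2.0])
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = irls_logistic(X, y)
```

At convergence the score X^T(y − p) vanishes, which is a quick way to verify the fit.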
Logistic Regression (cont.)

• When is it used?
  – Binary responses (two classes)
  – As a data analysis and inference tool to understand the role of the input variables in explaining the outcome

• Feature selection
  – Find a subset of the variables that is sufficient for explaining their joint effect on the response.
  – One way is to repeatedly drop the least significant coefficient and refit the model until no further terms can be dropped.
  – Another strategy is to refit each model with one variable removed, and perform an analysis of deviance to decide which variable to exclude.

• Regularization
  – Maximum penalized likelihood
  – Shrinking the parameters via an L1 constraint, imposing a margin constraint in the separable case:

    max_β Σ_{i=1}^N log Pr(y_i | x_i; β) − λ Σ_{j=1}^p |β_j|
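For the L1 regularization mentioned above, a short proximal-gradient (ISTA) sketch shows how the penalty drives irrelevant coefficients to exactly zero. This is illustrative only: the optimizer choice, names, and simulated data are my own assumptions, not prescribed by the slides.

```python
import numpy as np

def l1_logistic(X, y, lam=0.1, step=0.1, n_iter=3000):
    """L1-penalized logistic regression via proximal gradient (ISTA):
    a gradient step on the average negative log-likelihood, followed
    by soft-thresholding to apply the L1 penalty."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = -X.T @ (y - p) / len(y)          # gradient of avg NLL
        beta = beta - step * grad
        # soft-threshold: the proximal operator of lam * ||beta||_1
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return beta

# Usage: one strong predictor plus four noise predictors.
rng = np.random.default_rng(2)
X = rng.standard_normal((400, 5))
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)
beta_hat = l1_logistic(X, y)
```

With a moderate λ, the noise coefficients land at exactly 0.0 while the true predictor keeps a (shrunken) nonzero coefficient.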
Additive Logistic Regression

  log[ Pr(Y = 1 | X_1, …, X_p) / Pr(Y = 0 | X_1, …, X_p) ] = α + f_1(X_1) + … + f_p(X_p)
Additive Logistic Regression
Additive Logistic Regression: Backfitting

Fitting logistic regression:
1. Initialize β̂ = 0.
2. Iterate:
   a. Compute p_i = exp(β̂^T x_i) / (1 + exp(β̂^T x_i)).
   b. Compute the weights w_i = p_i(1 − p_i) and the working response z_i = β̂^T x_i + (y_i − p_i)/w_i.
   c. Use weighted least squares to fit a linear model to z_i with weights w_i, giving new estimates β̂.
3. Continue step 2 until convergence.

Fitting additive logistic regression:
1. Initialize α̂ = log[ ȳ / (1 − ȳ) ], where ȳ = avg(y_i); set f̂_j ≡ 0 for all j.
2. Iterate:
   a. Compute η_i = α̂ + Σ_j f̂_j(x_ij) and p_i = exp(η_i) / (1 + exp(η_i)).
   b. Compute the weights w_i = p_i(1 − p_i) and the working response z_i = η_i + (y_i − p_i)/w_i.
   c. Use the weighted backfitting algorithm to fit an additive model to z_i with weights w_i, giving new estimates α̂, f̂_j.
3. Continue step 2 until convergence.
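The two procedures share the same skeleton: an outer Newton ("local scoring") loop producing weights w_i and working responses z_i, and an inner fit. A minimal illustrative sketch, with my own moving-average smoother standing in for cubic-spline smoothing and, for simplicity, the inner backfitting run unweighted (the slides call for weighted backfitting):

```python
import numpy as np

def smooth(x, r, window=15):
    """Crude stand-in for a scatterplot smoother."""
    order = np.argsort(x)
    out = np.empty_like(r)
    out[order] = np.convolve(r[order], np.ones(window) / window, mode="same")
    return out

def local_scoring(X, y, n_outer=10, n_inner=5):
    """Additive logistic regression: outer local-scoring iterations,
    inner backfitting on the working response z."""
    N, p = X.shape
    ybar = y.mean()
    alpha = np.log(ybar / (1.0 - ybar))   # step 1: alpha = log[ybar/(1-ybar)]
    f = np.zeros((N, p))
    for _ in range(n_outer):              # step 2: iterate
        eta = alpha + f.sum(axis=1)
        prob = 1.0 / (1.0 + np.exp(-eta))             # a. p_i
        w = np.clip(prob * (1.0 - prob), 1e-6, None)  # b. w_i
        z = eta + (y - prob) / w                      #    z_i
        for _ in range(n_inner):          # c. backfit an additive model to z
            for j in range(p):
                partial = z - alpha - f.sum(axis=1) + f[:, j]
                f[:, j] = smooth(X[:, j], partial)
                f[:, j] -= f[:, j].mean()
            alpha = np.mean(z - f.sum(axis=1))
    return alpha, f

# Usage: a nonlinear two-class problem.
rng = np.random.default_rng(3)
X = rng.uniform(-2.0, 2.0, size=(600, 2))
eta_true = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 - 0.5
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)
alpha, f = local_scoring(X, y)
```

The fitted additive predictor α̂ + Σ_j f̂_j should track the true log-odds surface, even though each component was fit one coordinate at a time.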
SPAM Detection via Additive Logistic Regression

• Input variables (predictors):
  – 48 quantitative variables: the percentage of words in the email that match a given word. Examples include business, address, internet, etc.
  – 6 quantitative variables: the percentage of characters in the email that match a given character, such as the characters ';' and '('.
  – The average length of uninterrupted sequences of capital letters
  – The length of the longest uninterrupted sequence of capital letters
  – The sum of the lengths of uninterrupted sequences of capital letters
• Output variable: SPAM (1) or email (0)
• The f_j's are taken as cubic smoothing splines
SPAM Detection: Results

True Class    Predicted Email (0)    Predicted SPAM (1)
Email (0)     58.5%                  2.5%
SPAM (1)      2.7%                   36.2%

Sensitivity: probability of predicting spam given the true state is spam = 36.2 / (36.2 + 2.7) ≈ 0.93

Specificity: probability of predicting email given the true state is email = 58.5 / (58.5 + 2.5) ≈ 0.96
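The arithmetic behind the two rates can be checked directly from the table; the values below are the joint percentages from the confusion table:

```python
# Joint percentages from the confusion table (true class x predicted class).
email_email, email_spam = 58.5, 2.5   # true Email row
spam_email, spam_spam = 2.7, 36.2     # true SPAM row

# Sensitivity: Pr(predict SPAM | true SPAM)
sensitivity = spam_spam / (spam_spam + spam_email)
# Specificity: Pr(predict Email | true Email)
specificity = email_email / (email_email + email_spam)
# Overall error rate: the off-diagonal mass
error_rate = (email_spam + spam_email) / 100.0

print(round(sensitivity, 2), round(specificity, 2), round(error_rate, 3))
# prints: 0.93 0.96 0.052
```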
GAM: Summary

• Useful, flexible extensions of linear models

• The backfitting algorithm is simple and modular

• Interpretability of the predictors (input variables) is not obscured

• Not suitable for very large data mining applications (why?)