Introduction to Machine Learning (cs.kangwon.ac.kr/~leeck/AI2/05.pdf, 2018-09-04)
Multivariate Data
The data can be arranged as an $N \times d$ matrix:

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$
- Multiple measurements (sensors)
- d inputs/features/attributes: d-variate
- N instances/observations/examples
Multivariate Parameters
Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\rho_{ij} \equiv \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$
Parameter Estimation

Given a sample of N instances, the standard estimators are:

Sample mean: $$m_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$

Sample covariance: $$s_{ij} = \frac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$$

Sample correlation: $$r_{ij} = \frac{s_{ij}}{s_i s_j}$$
Estimation of Missing Values

What to do if certain instances have missing attributes?
- Ignore those instances: not a good idea if the sample is small
- Use 'missing' as an attribute: it may carry information
- Imputation: fill in the missing value
  - Mean imputation: use the most likely value (e.g., the mean)
  - Imputation by regression: predict the missing value from the other attributes
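As an illustration, here is a minimal numpy sketch of the two imputation strategies above; the data and the choice of which column predicts which are made up:

```python
import numpy as np

# Toy data with a missing value (np.nan) in column 0.
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [3.0, 4.0],
              [5.0, 7.0]])

# Mean imputation: replace each NaN with its column's mean over observed entries.
col_means = np.nanmean(X, axis=0)
X_mean = X.copy()
idx = np.where(np.isnan(X_mean))
X_mean[idx] = np.take(col_means, idx[1])

# Imputation by regression: predict column 0 from column 1,
# fitting a least-squares line on the rows where column 0 is observed.
obs = ~np.isnan(X[:, 0])
w1, w0 = np.polyfit(X[obs, 1], X[obs, 0], deg=1)
X_reg = X.copy()
X_reg[~obs, 0] = w0 + w1 * X[~obs, 1]
```

Regression imputation uses more information than the column mean, at the cost of fitting a model per incomplete attribute.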
Multivariate Normal Distribution
$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$
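The density transcribes directly into numpy; this is a sketch (in practice `scipy.stats.multivariate_normal` provides the same, more robustly):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, straight from the formula above."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff        # squared Mahalanobis distance
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * maha) / norm

# Sanity check against the 1-D normal: N(0, 1) at x = 0 is 1/sqrt(2*pi).
p = mvn_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]]))
```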
Multivariate Normal Distribution
Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and correlations).

Bivariate case ($d = 2$):

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right], \qquad z_i = \frac{x_i - \mu_i}{\sigma_i}$$

with the correlation estimated by $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$.
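A small sketch of the squared Mahalanobis distance, with two checks that follow from the definition (identity covariance gives the Euclidean distance; a larger variance shrinks the distance along that axis):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = np.asarray(x) - np.asarray(mu)
    return float(diff @ np.linalg.solve(Sigma, diff))

# With Sigma = I this is the squared Euclidean distance: 3^2 + 4^2 = 25.
d_id = mahalanobis_sq([3.0, 4.0], [0.0, 0.0], np.eye(2))
# With variance 4 on the second axis that term is divided by 4: 9 + 16/4 = 13.
d_sc = mahalanobis_sq([3.0, 4.0], [0.0, 0.0], np.diag([1.0, 4.0]))
```

Using `np.linalg.solve` instead of explicitly inverting the covariance is the usual numerically safer choice.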
Bivariate Normal
Mahalanobis distance: a correlation-based distance measure.
Independent Inputs: Naive Bayes

If the $x_i$ are independent, the off-diagonal entries of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

If the variances are also equal, it reduces to the Euclidean distance.
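Under the independence assumption the joint density factors into univariate normals; a quick numpy check with toy numbers:

```python
import numpy as np

def diag_gauss_pdf(x, mu, sigma):
    """Joint density under independence: diagonal-covariance normal."""
    z = (np.asarray(x) - np.asarray(mu)) / np.asarray(sigma)
    norm = (2 * np.pi) ** (len(x) / 2) * np.prod(sigma)
    return np.exp(-0.5 * np.sum(z ** 2)) / norm

x = np.array([1.0, -0.5])
mu = np.array([0.0, 0.0])
sigma = np.array([1.0, 2.0])

joint = diag_gauss_pdf(x, mu, sigma)
# The same value, computed as the product of the two univariate densities:
uni = np.prod(np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma))
```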
Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$
Estimation of Parameters

With $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise:

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}$$

$$\mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}$$

$$\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

Plugging these into the discriminant (dropping the constant term):

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
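The estimators above amount to per-class proportions, means, and ML covariances; a sketch on a made-up labeled sample:

```python
import numpy as np

# Toy labeled sample: y[t] is the class index of X[t].
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
              [6.0, 7.0], [7.0, 6.0], [6.5, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

priors, means, covs = [], [], []
for c in np.unique(y):
    Xc = X[y == c]
    priors.append(len(Xc) / len(X))          # P̂(C_i)
    means.append(Xc.mean(axis=0))            # m_i
    covs.append(np.cov(Xc.T, bias=True))     # S_i (ML estimate, divides by N_i)
```

`bias=True` gives the maximum-likelihood covariance (divide by $N_i$) rather than the unbiased $N_i - 1$ version.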
Different S_i: Quadratic Discriminant

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$
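The expansion into $\mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$ can be verified numerically; a sketch with made-up class parameters:

```python
import numpy as np

# Illustrative class parameters (made-up numbers).
m = np.array([1.0, 2.0])
S = np.array([[2.0, 0.3], [0.3, 1.0]])
logP = np.log(0.4)
Sinv = np.linalg.inv(S)

def g_direct(x):
    """Discriminant in its original form."""
    d = x - m
    return -0.5 * np.log(np.linalg.det(S)) - 0.5 * d @ Sinv @ d + logP

# Expanded quadratic form with the coefficients defined above.
W = -0.5 * Sinv
w = Sinv @ m
w0 = -0.5 * m @ Sinv @ m - 0.5 * np.log(np.linalg.det(S)) + logP

def g_quadratic(x):
    return x @ W @ x + w @ x + w0

x = np.array([0.5, -1.0])
```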
(figure: class likelihoods, the posterior for C1, and the discriminant P(C1 | x) = 0.5)
Common Covariance Matrix S

Shared common sample covariance:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$
Common Covariance Matrix S (figure)
Diagonal S

When the $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:

$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(naive Bayes assumption)}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean.
Diagonal S (figure: variances may be different)
Diagonal S, Equal Variances

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}(x_j^t - m_{ij})^2 + \log \hat{P}(C_i)$$

Nearest mean classifier: classify based on Euclidean distance to the nearest mean. Each mean can be considered a prototype or template, and this is template matching.
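The nearest mean classifier is a few lines; a sketch with illustrative means:

```python
import numpy as np

def nearest_mean(x, means):
    """Template matching: pick the class whose mean is closest in Euclidean distance."""
    dists = [np.linalg.norm(x - m) for m in means]
    return int(np.argmin(dists))

means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
label = nearest_mean(np.array([4.0, 4.5]), means)   # closer to the second mean
```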
Diagonal S, equal variances
Model Selection

| Assumption | Covariance matrix | No. of parameters |
|---|---|---|
| Shared, hyperspheric | S_i = S = s²I | 1 |
| Shared, axis-aligned | S_i = S, with s_ij = 0 | d |
| Shared, hyperellipsoidal | S_i = S | d(d+1)/2 |
| Different, hyperellipsoidal | S_i | K · d(d+1)/2 |

As we increase complexity (a less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).
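The parameter counts in the table are easy to compute; a small sketch (the d and K values are just examples):

```python
# Number of free covariance parameters for each assumption in the table,
# for d features and K classes.
def cov_params(d, K):
    return {
        "shared_hyperspheric": 1,
        "shared_axis_aligned": d,
        "shared_hyperellipsoidal": d * (d + 1) // 2,
        "different_hyperellipsoidal": K * d * (d + 1) // 2,
    }

counts = cov_params(d=10, K=3)   # e.g. 10 features, 3 classes
```

The jump from d to d(d+1)/2 to K·d(d+1)/2 makes concrete why the less restricted models need much more data to estimate reliably.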
Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (naive Bayes):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}(1 - p_{ij})^{1 - x_j}$$

The discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
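A minimal Bernoulli naive Bayes sketch following the formulas above (toy data; the clipping constant is only an implementation detail to avoid log 0):

```python
import numpy as np

# Toy binary data: rows are instances, y holds class indices.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([0, 0, 1, 1])

def fit_bernoulli_nb(X, y):
    """Estimate p̂_ij and P̂(C_i) by class-conditional frequencies."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    pij = np.array([X[y == c].mean(axis=0) for c in classes])
    return priors, pij

def g(x, prior, p, eps=1e-9):
    """Linear discriminant from the binary-feature formula."""
    p = np.clip(p, eps, 1 - eps)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) + np.log(prior)

priors, pij = fit_bernoulli_nb(X, y)
scores = [g(np.array([1, 0, 1]), priors[c], pij[c]) for c in (0, 1)]
pred = int(np.argmax(scores))
```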
Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

Let $z_{jk} = 1$ if $x_j = v_k$ and 0 otherwise, and define

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
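The estimator $\hat{p}_{ijk}$ is again a class-conditional frequency, now per value; a sketch for a single categorical feature (toy data, values encoded as integers):

```python
import numpy as np

# One categorical feature with 3 possible values (encoded 0..2); two classes.
xj = np.array([0, 0, 1, 2, 2, 2])
y  = np.array([0, 0, 0, 1, 1, 1])

# p̂_ijk: class-conditional frequency of each value v_k.
n_values = 3
classes = np.unique(y)
p_ijk = np.array([[np.mean(xj[y == c] == k) for k in range(n_values)]
                  for c in classes])
```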
Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left[r^t - \left(w_0 + w_1 x_1^t + \cdots + w_d x_d^t\right)\right]^2$$

Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$ and use the linear model in this new z space (basis functions, kernel trick: Chapter 13).
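The squared-error criterion above is minimized by least squares; a numpy sketch on noiseless synthetic data generated from known weights (so the fit recovers them exactly):

```python
import numpy as np

# Toy data generated from known weights w0=1, w1=2, w2=-3, with no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
r = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of 1s so w0 is estimated along with w1..wd,
# then solve the least-squares problem.
A = np.hstack([np.ones((len(X), 1)), X])
w, *_ = np.linalg.lstsq(A, r, rcond=None)
```

For the polynomial model, the same call applies after augmenting `A` with the higher-order z columns (e.g. `X[:, 0]**2`, `X[:, 0] * X[:, 1]`).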