Introduction to Machine Learning — cs.kangwon.ac.kr/~leeck/AI2/05.pdf · 2018-09-04


Page 1: Introduction to Machine Learning

Page 2:

Multivariate Data

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

Multiple measurements (sensors)

d inputs/features/attributes: d-variate

N instances/observations/examples

Page 3:

Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \mu_2, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Covariance matrix:

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$

Correlation: $\rho_{ij} \equiv \sigma_{ij} / (\sigma_i \sigma_j)$

Page 4:

Parameter Estimation

Sample mean: $\mathbf{m} = \dfrac{\sum_{t=1}^{N} \mathbf{x}^t}{N}$

Sample covariance: $s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Sample correlation: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
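These estimators can be sketched in a few lines of NumPy; the data matrix below is made up for illustration:

```python
import numpy as np

# Toy data matrix X: N=5 instances, d=2 attributes (illustrative values).
X = np.array([[1.0, 2.0],
              [2.0, 4.1],
              [3.0, 6.2],
              [4.0, 7.9],
              [5.0, 10.0]])
N, d = X.shape

# Sample mean: m = sum_t x^t / N
m = X.mean(axis=0)

# Sample covariance (maximum-likelihood form, divide by N):
# s_ij = sum_t (x_i^t - m_i)(x_j^t - m_j) / N
S = (X - m).T @ (X - m) / N

# Sample correlation: r_ij = s_ij / (s_i * s_j)
s = np.sqrt(np.diag(S))
R = S / np.outer(s, s)
```

For this nearly linear sample, the off-diagonal correlation is close to 1, as expected.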

Page 5:

Estimation of Missing Values What to do if certain instances have missing attributes?

Ignore those instances: not a good idea if the sample is small

Use ‘missing’ as an attribute: may give information

Imputation: Fill in the missing value

Mean imputation: Use the most likely value (e.g., mean)

Imputation by regression: Predict based on other attributes
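A minimal sketch of mean imputation with NumPy, assuming missing entries are encoded as NaN (the sample values are hypothetical):

```python
import numpy as np

# Toy sample with missing attributes encoded as NaN (hypothetical data).
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [np.nan, 30.0],
              [4.0, 40.0]])

# Mean imputation: replace each missing entry with the mean of that
# attribute computed over the observed (non-missing) values only.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)
```

Imputation by regression would instead fit a predictor of the missing attribute from the observed ones.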


Page 6:

Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$
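The density can be computed directly from this formula; a NumPy sketch (the example μ and Σ are arbitrary):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, straight from the formula."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    maha = diff @ np.linalg.inv(Sigma) @ diff  # squared Mahalanobis distance
    return np.exp(-0.5 * maha) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
p = mvn_pdf(np.array([0.0, 0.0]), mu, Sigma)
```

At the mean the exponent vanishes, so the density is just the normalizer $1/((2\pi)^{d/2}|\Sigma|^{1/2})$.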

Page 7:

Multivariate Normal Distribution

Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$

measures the distance from x to μ in terms of Σ (normalizes for differences in variances and correlations).

Bivariate: d = 2

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

where $z_i = (x_i - \mu_i)/\sigma_i$ are the standardized variables.

Sample correlation: $r_{ij} = s_{ij}/(s_i s_j)$
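A small sketch of how the Mahalanobis distance discounts directions of high variance (the covariance values below are made up):

```python
import numpy as np

# Covariance with a high-variance first axis (illustrative values).
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
mu = np.array([0.0, 0.0])

def mahalanobis_sq(x, mu, Sigma):
    """(x - mu)^T Sigma^{-1} (x - mu)"""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

# A point 2 units from mu along the high-variance axis scores lower
# (is "closer" in Mahalanobis terms) than a point 2 units away along
# the low-variance axis, even though both have Euclidean distance 2.
d_high = mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma)
d_low = mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma)
```

With Σ = I the measure reduces to the squared Euclidean distance.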

Page 8:

Bivariate Normal


Page 9:

Page 10:

Mahalanobis distance A correlation-based distance measure.


Page 11:

Independent Inputs: Naive Bayes

If the $x_i$ are independent, the off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} \exp\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

If the variances are also equal, it reduces to the Euclidean distance.
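The identity above — the product of d univariate normals equals the joint density with diagonal Σ — can be verified numerically (the values below are arbitrary):

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 2.0, 1.0])  # per-dimension std devs (diagonal Sigma)
x = np.array([1.2, -1.0, 0.0])

def univariate(x, mu, s):
    """Univariate normal density, vectorized over dimensions."""
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

# Product of the d univariate densities ...
p_product = np.prod(univariate(x, mu, sigma))

# ... equals the joint formula with the shared normalizer:
d = len(mu)
p_joint = (np.exp(-0.5 * np.sum(((x - mu) / sigma) ** 2))
           / ((2 * np.pi) ** (d / 2) * np.prod(sigma)))
```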

Page 12:

Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}_d(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)$$

Page 13:

Estimation of Parameters

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t} \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise. Substituting the estimates into the discriminant:

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}_i^{-1}(\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$
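Putting the estimators and the discriminant together, a minimal NumPy sketch of this parametric classifier on a made-up two-class sample:

```python
import numpy as np

# Toy two-class sample; r holds the class labels (hypothetical data).
X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])
r = np.array([0, 0, 0, 1, 1, 1])
N, d = X.shape
K = 2

priors, means, covs = [], [], []
for i in range(K):
    Xi = X[r == i]
    priors.append(len(Xi) / N)                        # P_hat(C_i)
    m_i = Xi.mean(axis=0)                             # m_i
    means.append(m_i)
    covs.append((Xi - m_i).T @ (Xi - m_i) / len(Xi))  # S_i (ML estimate)

def g(x, i):
    """Gaussian discriminant g_i(x) with the estimated parameters."""
    diff = x - means[i]
    sign, logdet = np.linalg.slogdet(covs[i])
    return (-0.5 * logdet
            - 0.5 * diff @ np.linalg.solve(covs[i], diff)
            + np.log(priors[i]))

# A point near the first cluster should get the larger g_0.
x = np.array([0.5, 0.5])
pred = int(g(x, 1) > g(x, 0))
```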

Page 14:

Different $\mathbf{S}_i$: Quadratic discriminant

Expanding the quadratic term gives

$$g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1} \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$
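The quadratic form is algebraically identical to the discriminant of the previous slide; a quick numerical check with hypothetical class parameters:

```python
import numpy as np

# Hypothetical class parameters.
m = np.array([1.0, 2.0])
S = np.array([[2.0, 0.3],
              [0.3, 1.0]])
logP = np.log(0.4)

Sinv = np.linalg.inv(S)

# Quadratic-form coefficients:
W = -0.5 * Sinv                       # W_i = -1/2 S_i^{-1}
w = Sinv @ m                          # w_i  = S_i^{-1} m_i
w0 = (-0.5 * m @ Sinv @ m             # w_i0
      - 0.5 * np.log(np.linalg.det(S))
      + logP)

def g_direct(x):
    """Discriminant in its original (x - m) form."""
    diff = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * diff @ Sinv @ diff + logP)

def g_quadratic(x):
    """Same discriminant, expanded as x^T W x + w^T x + w0."""
    return x @ W @ x + w @ x + w0
```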

Page 15:

[figure: likelihoods; posterior for C1; discriminant: P(C1 | x) = 0.5]

Page 16:

Common Covariance Matrix S

Shared common sample covariance:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}^{-1}(\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$$

where

$$\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$
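The linear form drops the shared quadratic term $-\frac{1}{2}\mathbf{x}^T\mathbf{S}^{-1}\mathbf{x}$, which is class-independent, so the ranking of classes is unchanged; a sketch with hypothetical parameters:

```python
import numpy as np

# Hypothetical two-class parameters with a shared covariance S.
m = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
P = [0.5, 0.5]
S = np.array([[1.0, 0.2],
              [0.2, 1.5]])
Sinv = np.linalg.inv(S)

# Linear coefficients: w_i = S^{-1} m_i, w_i0 = -1/2 m_i^T S^{-1} m_i + log P(C_i)
w = [Sinv @ m[i] for i in range(2)]
w0 = [-0.5 * m[i] @ Sinv @ m[i] + np.log(P[i]) for i in range(2)]

def g_lin(x, i):
    return w[i] @ x + w0[i]

def g_full(x, i):
    diff = x - m[i]
    return -0.5 * diff @ Sinv @ diff + np.log(P[i])

x = np.array([1.3, 0.4])
```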

Page 17:

Common Covariance Matrix S


Page 18:

Diagonal S

When the $x_j$, $j = 1, \ldots, d$, are independent, Σ is diagonal:

$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean.

Page 19:

Diagonal S


variances may be different

Page 20:

Diagonal S, equal variances

Nearest mean classifier: classify based on Euclidean distance to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j^t - m_{ij}\right)^2 + \log \hat{P}(C_i)$$

Each mean can be considered a prototype or template, and this is template matching.
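With equal priors, maximizing this discriminant is just picking the closest prototype; a few lines of NumPy (the class means below are hypothetical):

```python
import numpy as np

# Class means (prototypes / templates) — hypothetical values.
means = np.array([[0.0, 0.0],
                  [5.0, 5.0],
                  [0.0, 5.0]])

def nearest_mean(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))

label = nearest_mean(np.array([4.2, 4.8]), means)
```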

Page 21:

Diagonal S, equal variances


Page 22:

Model Selection

Assumption                   Covariance matrix         No. of parameters
Shared, hyperspheric         S_i = S = s^2 I           1
Shared, axis-aligned         S_i = S, with s_ij = 0    d
Shared, hyperellipsoidal     S_i = S                   d(d+1)/2
Different, hyperellipsoidal  S_i                       K · d(d+1)/2

As we increase complexity (less restricted S), bias decreases and variance increases

Assume simple models (allow some bias) to control variance (regularization)

Page 23:

Page 24:

Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (Naive Bayes'):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\,(1 - p_{ij})^{1 - x_j}$$

the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
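A sketch of the binary naive Bayes estimator and its linear discriminant on a made-up sample. Note the Laplace (+1/+2) smoothing is an addition here, not on the slide; it avoids log 0 for attribute values unseen in a class:

```python
import numpy as np

# Toy binary data: rows are x^t, labels r^t in {0, 1} (hypothetical).
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])
r = np.array([0, 0, 1, 1])

K, d = 2, X.shape[1]

# p_hat_ij = sum_t x_j^t r_i^t / sum_t r_i^t, with Laplace smoothing added.
p = np.array([(X[r == i].sum(axis=0) + 1.0) / (np.sum(r == i) + 2.0)
              for i in range(K)])
prior = np.array([np.mean(r == i) for i in range(K)])

def g(x, i):
    """sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)"""
    return np.sum(x * np.log(p[i]) + (1 - x) * np.log(1 - p[i])) + np.log(prior[i])

pred = int(g(np.array([1, 0, 0]), 1) > g(np.array([1, 0, 0]), 0))
```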

Page 25:

Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d}\prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$

Page 26:

Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$

Multivariate polynomial model: define new higher-order variables $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$ and use the linear model in this new z space (basis functions, kernel trick: Chapter 13).
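Minimizing E is ordinary least squares on the design matrix $[1, x_1, \ldots, x_d]$; a sketch on synthetic, noiseless data generated from known weights so the fit can be checked exactly:

```python
import numpy as np

# Synthetic data from known weights (w0=1, w1=2, w2=-0.5), no noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
r = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

# Minimize E(w | X) = 1/2 sum_t (r^t - w0 - w1 x1^t - ... - wd xd^t)^2
# via least squares on the design matrix [1, x1, x2].
D = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(D, r, rcond=None)
```

The polynomial model is the same computation after augmenting the design matrix with columns for $x_1^2$, $x_2^2$, and $x_1 x_2$.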