
Post on 22-Jul-2020


Introduction to Independent Component Analysis

Jarmo Hurri, Patrik Hoyer, Aapo Hyvärinen


Course timetable (this week)

• today: ICA examples and mathematical background

• tomorrow: ICA model, decorrelation, non-Gaussianity, FastICA (me)

• Wednesday: practical considerations, ICA by maximum likelihood, image representations (Patrik)


Course timetable (next week)

• next Monday: nonnegative sparse coding, modeling residual dependencies (Patrik)

• next Tuesday: recent advances and open questions (Aapo)

• all classes at 14:15–16:00 in this room (A414)

• http://www.cs.helsinki.fi/jarmo.hurri


Contents

I. some ICA examples

II. multivariate optimization

III. multivariate statistics

IV. estimation theory


Example

• 6 images

• linear mixtures of 6 originals

• determine originals


Independent components


Independent components

• independent latent (hidden) variables

• linear phenomenon


[Figure: MEG recordings from channels 1 to 6 together with VEOG, HEOG, and ECG signals during saccades, blinking, and biting; scale bars: 10 s, 500 µV, 1000 fT/cm]


[Figure: independent components IC1 to IC9; scale bar: 10 s]


Some ICA application areas

• biomedical signal analysis (EEG/MEG/MRI/fMRI)

• computational neuroscience

• multispectral image analysis

• bioinformatics (transcriptome analysis)

• gas sensor array analysis

• telecommunications


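II. Multivariate optimization (1/2)

• differentiable objective $f(\mathbf{w})$

• derivative: local linear approximation of change $\Delta f$

• $f'(\mathbf{w}) = \left[\frac{\partial f}{\partial w_1}\ \frac{\partial f}{\partial w_2}\ \cdots\ \frac{\partial f}{\partial w_n}\right]$

• $f(\mathbf{w} + \Delta\mathbf{w}) - f(\mathbf{w}) \approx f'(\mathbf{w})\,\Delta\mathbf{w}$

• gradient rule: $\mathbf{w}(k+1) = \mathbf{w}(k) \pm \alpha f'(\mathbf{w}(k))^T$ (plus sign for maximization, minus sign for minimization)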
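The gradient rule can be sketched in a few lines of NumPy. This is only an illustration, not code from the course: the quadratic objective, the step size `alpha`, the iteration count, and all function names are assumptions chosen for the example.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, n_steps=200):
    """Iterate w(k+1) = w(k) - alpha * f'(w(k))^T (minus sign: minimization)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - alpha * grad(w)
    return w

# Hypothetical objective f(w) = ||w - target||^2, whose gradient is 2 (w - target).
target = np.array([1.0, -2.0])

def grad_f(w):
    return 2.0 * (w - target)

w_min = gradient_descent(grad_f, w0=[0.0, 0.0])
print(w_min)  # approaches `target`, the minimizer of f
```

With a fixed step size the iteration only converges if `alpha` is small enough for the objective at hand; the value used here suits this particular quadratic.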


Multivariate optimization (2/2)

• differentiable constraint $h(\mathbf{w}) = 0$

• necessary conditions: $f'(\mathbf{w}) + \lambda h'(\mathbf{w}) = \mathbf{0}^T$

• different iterative methods (e.g., projected gradient)
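One way to picture the projected gradient idea is the following sketch for the constraint $h(\mathbf{w}) = \|\mathbf{w}\|^2 - 1 = 0$: take an ordinary gradient step, then project back onto the constraint set by renormalizing. The objective $\mathbf{w}^T\mathbf{A}\mathbf{w}$, the matrix `A`, the step size, and the function name are assumptions made for illustration.

```python
import numpy as np

def projected_gradient_ascent(A, w0, alpha=0.01, n_steps=2000):
    """Maximize w^T A w subject to ||w||^2 = 1 (A symmetric)."""
    w = np.asarray(w0, dtype=float)
    w = w / np.linalg.norm(w)
    for _ in range(n_steps):
        w = w + alpha * 2.0 * A @ w   # gradient of w^T A w is 2 A w
        w = w / np.linalg.norm(w)     # projection back onto the unit sphere
    return w

# Hypothetical symmetric matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
w_opt = projected_gradient_ascent(A, w0=[1.0, 0.0])

# At a constrained optimum, f'(w) + lambda h'(w) = 0^T reads 2 w^T A + 2 lambda w^T = 0^T,
# i.e. A w = -lambda w: w is an eigenvector of A.
mu = w_opt @ A @ w_opt            # equals -lambda under that sign convention
residual = A @ w_opt - mu * w_opt
print(mu, residual)
```

For this objective the necessary condition reduces to an eigenvector equation, which is why the residual `A @ w_opt - mu * w_opt` is a convenient convergence check.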


III. Multivariate statistics

• notation

• basic statistics

• multivariate Gaussian density

• principal component analysis

• statistical independence


Notation

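• random variables: $x, y, \ldots$

• density functions: $p_x(x), p_y(y), \ldots$

• random vectors: $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]^T$

• each component a (continuous) random variable

• $F(\mathbf{x}_0) = P(\mathbf{x} \le \mathbf{x}_0)$

• $p_{\mathbf{x}}(\mathbf{x}_0) = \left.\frac{\partial}{\partial x_1}\frac{\partial}{\partial x_2}\cdots\frac{\partial}{\partial x_n} F(\mathbf{x})\right|_{\mathbf{x}=\mathbf{x}_0}$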


Basic statistics (1/3)

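• expectation: $E\{g(\mathbf{x})\} = \int g(\mathbf{x})\, p_{\mathbf{x}}(\mathbf{x})\, d\mathbf{x}$

• mean: $E\{\mathbf{x}\} = [E\{x_1\}\ E\{x_2\}\ \cdots\ E\{x_n\}]^T$

• correlation matrix:

$$\mathbf{R}_{\mathbf{x}} = E\{\mathbf{x}\mathbf{x}^T\} = \begin{bmatrix} E\{x_1^2\} & E\{x_1 x_2\} & \cdots & E\{x_1 x_n\} \\ E\{x_2 x_1\} & E\{x_2^2\} & \cdots & E\{x_2 x_n\} \\ \vdots & \vdots & \ddots & \vdots \\ E\{x_n x_1\} & E\{x_n x_2\} & \cdots & E\{x_n^2\} \end{bmatrix}$$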


Basic statistics (2/3)

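• covariance matrix $\mathbf{C}_{\mathbf{x}} = E\{(\mathbf{x} - E\{\mathbf{x}\})(\mathbf{x} - E\{\mathbf{x}\})^T\}$

• note: if $E\{\mathbf{x}\} = \mathbf{0}$, then $\mathbf{C}_{\mathbf{x}} = \mathbf{R}_{\mathbf{x}}$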
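The mean, correlation matrix, and covariance matrix can be estimated from samples by replacing expectations with averages. The sketch below does this with NumPy; the synthetic data, the sample count `T`, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: T samples of a 2-dimensional random vector, one sample per row.
T = 1000
X = rng.normal(loc=[1.0, -2.0], scale=[1.0, 3.0], size=(T, 2))

mean = X.mean(axis=0)        # sample estimate of E{x}
R = X.T @ X / T              # sample estimate of R_x = E{x x^T}
Xc = X - mean                # mean-removed samples
C = Xc.T @ Xc / T            # sample estimate of C_x

# C_x = R_x - E{x} E{x}^T, so the two matrices coincide when the mean is zero.
print(np.allclose(C, R - np.outer(mean, mean)))
```

The division by `T` (rather than `T - 1`) mirrors the plain average used for expectations here; either normalization is common in practice.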


Basic statistics (3/3)

[Figure: scatter plot of weight (kg) against height (cm); height axis 140 to 200 cm, weight axis 40 to 100 kg]

• $x_1$: height, $x_2$: weight

• $\mathbf{C}_{\mathbf{x}} = \begin{bmatrix} 37.26 & 17.95 \\ 17.95 & 77.32 \end{bmatrix}$

• $\mathrm{cov}\{x_1, x_2\} > 0$

• $\mathrm{var}\{x_2\} > \mathrm{var}\{x_1\}$


Multivariate Gaussian density (1/2)


Multivariate Gaussian density (2/2)

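• $p_{\mathbf{x}}(\mathbf{x}) = K \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T \mathbf{C}_{\mathbf{x}}^{-1} (\mathbf{x} - \mathbf{m}_{\mathbf{x}})\right)$

• $K = \left((2\pi)^{n/2} \det(\mathbf{C}_{\mathbf{x}})^{1/2}\right)^{-1}$

• $E\{\mathbf{x}\} = \mathbf{m}_{\mathbf{x}}$

• covariance matrix $\mathbf{C}_{\mathbf{x}}$

• completely specified by 1st and 2nd order statistics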
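As a small sketch, the density formula can be evaluated directly from a mean vector and a covariance matrix. The function name and the parameter values below are hypothetical and only serve to exercise the formula.

```python
import numpy as np

def gaussian_density(x, m, C):
    """Evaluate K * exp(-0.5 (x - m)^T C^{-1} (x - m)) for an n-dimensional Gaussian."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    C = np.asarray(C, dtype=float)
    n = m.shape[0]
    d = x - m
    # Normalization constant K = ((2 pi)^(n/2) det(C)^(1/2))^(-1).
    K = 1.0 / ((2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(C)))
    # Solve C z = d instead of forming the inverse explicitly.
    return K * np.exp(-0.5 * d @ np.linalg.solve(C, d))

# Hypothetical mean and covariance.
m = np.array([0.0, 0.0])
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
p_at_mean = gaussian_density(m, m, C)   # at the mean the exponent vanishes, leaving K
print(p_at_mean)
```

Only `m` and `C` enter the function, which is the sense in which the density is fully specified by first- and second-order statistics.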


Principal component analysis (1/4)

[Figure: scatter plot of deviation from mean weight (kg) against deviation from mean height (cm); both axes from −40 to 40]

• describe data with one linear projection

• $\mathbf{w} = [w_1\ w_2\ \cdots\ w_n]^T$, $\|\mathbf{w}\|^2 = 1$

• new data $\mathbf{x}^* = (\mathbf{w}^T\mathbf{x})\,\mathbf{w}$

• $\min f(\mathbf{w}) = E\{\|\mathbf{x} - \mathbf{x}^*\|^2\}$, s.t. $h(\mathbf{w}) = \|\mathbf{w}\|^2 - 1 = 0$


Principal component analysis (2/4)

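$\mathbf{u}:\ \mathbf{u}^T\mathbf{w} = 0,\ \|\mathbf{u}\|^2 = 1$

$$\begin{aligned}
f(\mathbf{w}) &= E\Bigl\{\bigl\|\underbrace{(\mathbf{w}^T\mathbf{x})\mathbf{w} + (\mathbf{u}^T\mathbf{x})\mathbf{u}}_{=\,\mathbf{x}} - \underbrace{(\mathbf{w}^T\mathbf{x})\mathbf{w}}_{=\,\mathbf{x}^*}\bigr\|^2\Bigr\} \\
&= E\bigl\{\|(\mathbf{u}^T\mathbf{x})\mathbf{u}\|^2\bigr\} \quad \text{(expected squared norm of the projection onto } \mathbf{u}\text{)} \\
&= E\bigl\{\mathbf{u}^T(\mathbf{u}^T\mathbf{x})(\mathbf{u}^T\mathbf{x})\mathbf{u}\bigr\} = \mathbf{u}^T E\{\mathbf{x}\mathbf{x}^T\}\,\mathbf{u} = \mathbf{u}^T\mathbf{R}_{\mathbf{x}}\mathbf{u}
\end{aligned}$$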


Principal component analysis (3/4)

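$f(\mathbf{u}) = \mathbf{u}^T\mathbf{R}_{\mathbf{x}}\mathbf{u}$

$h(\mathbf{u}) = \mathbf{u}^T\mathbf{u} - 1 = 0$

$f'(\mathbf{u}_{\mathrm{opt}}) - \lambda h'(\mathbf{u}_{\mathrm{opt}}) = \mathbf{0}^T$

$2\mathbf{u}_{\mathrm{opt}}^T\mathbf{R}_{\mathbf{x}} - \lambda\, 2\mathbf{u}_{\mathrm{opt}}^T = \mathbf{0}^T$

$\mathbf{R}_{\mathbf{x}}\mathbf{u}_{\mathrm{opt}} = \lambda\mathbf{u}_{\mathrm{opt}}$

$f(\mathbf{u}_{\mathrm{opt}}) = \lambda$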
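The derivation says that the optimal direction is an eigenvector of the correlation matrix and that the objective value there equals the eigenvalue, so minimizing picks the smallest eigenvalue. A rough numerical check follows, using the height/weight covariance matrix quoted on the "Basic statistics (3/3)" slide (for mean-removed data the correlation and covariance matrices coincide). The variable names are assumptions for illustration.

```python
import numpy as np

# Covariance of the height/weight data from the "Basic statistics (3/3)" slide;
# for data with the mean removed, R_x equals C_x.
R = np.array([[37.26, 17.95],
              [17.95, 77.32]])

# eigh handles symmetric matrices: eigenvalues ascending, eigenvectors as orthonormal columns.
eigvals, eigvecs = np.linalg.eigh(R)

u_opt = eigvecs[:, 0]    # residual direction: minimizes u^T R u under ||u|| = 1
w_opt = eigvecs[:, -1]   # projection direction, orthogonal to u_opt

f_u = u_opt @ R @ u_opt  # objective value at the optimum, equal to the smallest eigenvalue
print(eigvals, f_u)
```

Using an eigen-decomposition routine is a design choice of this sketch; the slides only state the eigenvector condition, not how it is solved.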


Principal component analysis (4/4)

[Figure: scatter plot of deviation from mean weight (kg) against deviation from mean height (cm); both axes from −40 to 40]

• orthogonal PCA basis

• deflationary minima / maxima of variance

• dimensionality reduction

• noise attenuation


PCA example — original


PCA example — PCA basis


PCA example — 90% compressed


PCA example — 75% compressed


PCA example — 50% compressed


PCA example — original


Statistical independence

• random variables x and y

• statistical independence: knowing the value of $x$ does not provide information about the distribution of $y$

• $p_y(y \mid x) = \frac{p_{x,y}(x, y)}{p_x(x)} = p_y(y) \iff p_{x,y}(x, y) = p_x(x)\, p_y(y)$
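The factorization condition can be tried out on a discrete analogue, where a joint probability table plays the role of the joint density. The sketch below is an illustration under that assumption; the tables and the function name are made up for the example.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """Check whether a joint probability table equals the product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)   # marginal of x (rows)
    py = joint.sum(axis=0)   # marginal of y (columns)
    return bool(np.allclose(joint, np.outer(px, py), atol=tol))

# Hypothetical independent pair: the table is built as a product of marginals.
indep = np.outer([0.2, 0.8], [0.5, 0.3, 0.2])

# Hypothetical dependent pair: y always equals x.
dep = np.array([[0.5, 0.0],
                [0.0, 0.5]])

print(is_independent(indep), is_independent(dep))
```

In the dependent table, knowing `x` fixes `y` completely, so the conditional distribution of `y` differs from its marginal and the product form fails.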


IV. Estimation theory

• basic concepts

• maximum-likelihood estimation


Basic concepts (1/3)

• estimation: finding an approximate value or distribution of some parameter (e.g., mean, standard deviation) from random samples of the population

• estimator: a function computing the approximate value or distribution from the samples (e.g., $\hat{\mu} = \frac{1}{T}\sum_{i=1}^{T} x(i)$)

• estimate: numerical value of the estimator for a given set of samples


Basic concepts (2/3)

• some good properties of estimators:

• unbiasedness: $E\{\hat{\theta}\} = \theta$

• consistency: $\forall \varepsilon > 0\ \left[\lim_{T \to \infty} P\left(\|\hat{\theta} - \theta\| < \varepsilon\right) = 1\right]$
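Both properties can be illustrated by simulation for the sample mean. The following is a sketch under assumed values: the true mean, the standard deviation, the number of trials, the sample sizes, and the tolerance `eps` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma = 2.0, 1.0    # hypothetical population parameters
n_trials = 2000              # number of independent sample sets per sample size

def sample_mean_estimates(T):
    """Return n_trials estimates, each the mean of T Gaussian samples."""
    samples = rng.normal(mu_true, sigma, size=(n_trials, T))
    return samples.mean(axis=1)

est_small = sample_mean_estimates(10)
est_large = sample_mean_estimates(1000)

# Unbiasedness: the average of the estimates stays close to the true value even for small T.
bias_small = est_small.mean() - mu_true

# Consistency: the fraction of estimates within eps of the true value grows towards 1 with T.
eps = 0.1
frac_small = np.mean(np.abs(est_small - mu_true) < eps)
frac_large = np.mean(np.abs(est_large - mu_true) < eps)
print(bias_small, frac_small, frac_large)
```

A simulation like this only suggests the limiting behaviour; it does not prove either property.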


Basic concepts (3/3)

• data model: mathematical representation of data

• parametric / nonparametric

• static / dynamic

• probabilistic / deterministic

• example: $p_x(x \mid \mu, \sigma) = K e^{-\frac{(x - \mu)^2}{2\sigma^2}}$


Maximum-likelihood estimation (1/3)

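• select data model parameter values that maximize the probability of the observed data

• $\theta_{\mathrm{ML}} = \arg\max_{\theta} P(x(1), x(2), \ldots, x(T) \mid \theta)$

• continuous data: solve the likelihood equation

$$\left.\frac{\partial}{\partial\theta} \ln p_{x(1),\ldots,x(T)}\bigl(x(1), \ldots, x(T) \mid \theta\bigr)\right|_{\theta = \theta_{\mathrm{ML}}} = 0$$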


Maximum-likelihood estimation (2/3)

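• when samples are independent

$$p_{x(1),\ldots,x(T)}\bigl(x(1), \ldots, x(T) \mid \theta\bigr) = \prod_{i=1}^{T} p_{x(i)}\bigl(x(i) \mid \theta\bigr)$$

• example: mean of a Gaussian

• $p_{x(1),\ldots,x(T)}\bigl(x(1), \ldots, x(T) \mid \mu\bigr) = K^T \prod_{i=1}^{T} e^{-\frac{(x(i) - \mu)^2}{2\sigma^2}}$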
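Taking the logarithm of this likelihood turns the product into a sum, which is the expression that the likelihood equation differentiates. The step is written out here as a worked equation, using the fact that the Gaussian normalization constant $K$ does not depend on $\mu$:

```latex
\ln p_{x(1),\ldots,x(T)}\bigl(x(1),\ldots,x(T)\mid\mu\bigr)
  = T\ln K - \frac{1}{2\sigma^{2}}\sum_{i=1}^{T}\bigl(x(i)-\mu\bigr)^{2}
```

Since $T \ln K$ is constant in $\mu$, only the sum of squared deviations matters when differentiating with respect to $\mu$.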


Maximum-likelihood estimation (3/3)

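$$\left.\frac{\partial}{\partial\mu}\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{T}\bigl(x(i) - \mu\bigr)^2\right]\right|_{\mu = \mu_{\mathrm{ML}}} = 0$$

$$\sum_{i=1}^{T}\bigl(x(i) - \mu_{\mathrm{ML}}\bigr) = 0$$

$$\mu_{\mathrm{ML}} = \frac{1}{T}\sum_{i=1}^{T} x(i)$$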
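The closed-form result can be compared with a brute-force maximization of the log-likelihood over a grid of candidate means. This is a sketch with assumed values: the true mean, the known standard deviation `sigma`, the sample count, and the grid are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.5                                      # standard deviation, assumed known
x = rng.normal(loc=3.0, scale=sigma, size=500)   # hypothetical observed samples

def log_likelihood(mu):
    """Gaussian log-likelihood of the samples, omitting the constant term T ln K."""
    return -np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)

# Brute-force search over candidate means with grid spacing 0.001.
mu_grid = np.linspace(0.0, 6.0, 6001)
ll = np.array([log_likelihood(mu) for mu in mu_grid])
mu_grid_best = mu_grid[np.argmax(ll)]

# Closed-form maximum-likelihood estimate: the sample mean.
mu_ml = x.mean()
print(mu_grid_best, mu_ml)
```

The two values agree up to the grid spacing, and the sum of deviations from `mu_ml` is zero up to rounding, matching the middle line of the derivation.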


Tomorrow

ICA in images and video (and equations...)