Non-Negative and Geodesic approaches to Independent Component Analysis Mark Plumbley Queen Mary, University of London, Email: [email protected]. ICA Workshop, 20 December 2002.


Non-Negative and Geodesic approaches to

Independent Component Analysis

Mark Plumbley

Queen Mary, University of London,

Email: [email protected].

ICA Workshop, 20 December 2002.


Overview

• Introduction

• Nonnegative ICA using nonlinear PCA

• Successive rotations

• Geodesic line search

• Results

• Conclusions


Introduction

• Observations of mixed data - generative model

X = AS (1)

with sources $S \in \mathbb{R}^{n \times p}$ and mixing matrix $A \in \mathbb{R}^{m \times n}$.

• Task - to discover the source samples S and mixing matrix A given only the observations X.

• An underdetermined problem: if $(A^*, S^*)$ is a solution, so is $(A^* M, M^{-1} S^*)$ for any invertible M.

• So - need constraints.
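The ambiguity can be checked numerically. The following is a minimal sketch, assuming arbitrary illustrative sizes and random matrices (none of these values come from the slides): two different factorizations reproduce exactly the same observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 2, 3, 5                      # sources, observations, samples (illustrative)
A = rng.normal(size=(m, n))            # hypothetical mixing matrix
S = rng.uniform(size=(n, p))           # hypothetical source samples
X = A @ S                              # observations, X = AS

M = np.array([[2.0, 1.0],
              [0.5, 3.0]])             # any invertible n x n matrix
A_alt = A @ M                          # alternative mixing matrix  A M
S_alt = np.linalg.inv(M) @ S           # alternative sources        M^{-1} S

same_observations = np.allclose(A_alt @ S_alt, X)
```

Here `same_observations` is true although `S_alt` differs from `S`, which is why X alone cannot identify the sources.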


Constraints

1. Independence of sources: $s_{jk}$ sampled from independent random variables $S_j$.

2. Non-negativity of sources: $s_{jk} \ge 0$ for all $1 \le j \le n$, $1 \le k \le p$.

Independence alone → classical noiseless ICA.

Non-negativity alone (of S and A) → non-negative matrix factorization [Lee & Seung, 1999].

Both constraints → non-negative independent component analysis.


Non-negative ICA using Nonlinear PCA

ICA is often simplified by pre-whitening: transform the observations z by

$$x = Qz \qquad (2)$$

to get identity covariance $C_x = E((x - \bar{x})(x - \bar{x})^T) = I$.

The problem is now to find an orthonormal weight matrix W, satisfying $W^T W = W W^T = I_n$, such that the outputs $y = Wx = WQAs$ are independent.

Typical ICA algorithms search for an extremum of a contrast function (e.g. kurtosis).
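A minimal sketch of the whitening step, assuming a symmetric inverse square root of the sample covariance as the matrix Q (the slides do not say how Q is built) and assuming the mean is left in the data so that sign information survives. The function name `whiten` and the test data are hypothetical.

```python
import numpy as np

def whiten(Z):
    """Return (X, Q) with X = Q @ Z having identity sample covariance.

    Z holds one observation per column. Only the covariance estimate uses
    centred data; the mean is not subtracted from the returned X.
    """
    Zc = Z - Z.mean(axis=1, keepdims=True)
    C = Zc @ Zc.T / Z.shape[1]                     # sample covariance of z
    evals, evecs = np.linalg.eigh(C)
    Q = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric C^{-1/2}
    return Q @ Z, Q

rng = np.random.default_rng(1)
S = rng.uniform(size=(2, 5000))                    # non-negative sources
A = np.array([[1.0, 0.6],
              [0.3, 1.0]])                         # hypothetical mixing matrix
Z = A @ S                                          # observed mixtures
X, Q = whiten(Z)
Cx = np.cov(X, bias=True)                          # covariance of the whitened data
```

After this step `Cx` is the identity up to rounding, so only a rotation W remains to be found.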


Whitening of Non-Negative Data

[Figure: scatter plots of (a) the original data, axes x1 and x2, and (b) the whitened data, axes z1 and z2.]

Original data (a) is whitened (b) to remove second-order correlations. This suggests we just try to fit the data into the positive quadrant.


Cost/Contrast Function for Non-negative ICA?

Let U = WQA, i.e. y = Us.

For non-negative sources s which are well-grounded (i.e. $\Pr(s < \delta) > 0$ for any $\delta > 0$), U is a permutation matrix (i.e. the sources are separated) iff all components of y are non-negative with probability 1.

So use a cost function, e.g. the mean squared reconstruction error

$$J = \tfrac{1}{2} E\left(\|x - \hat{x}\|^2\right), \qquad \hat{x} = W^T y^+$$

where $y^+$ is the rectified version of $y = Wx$.
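A minimal sketch of this cost, assuming the expectation is replaced by a sample mean over the columns of a data matrix. The function name `nonneg_ica_cost` and the test data are hypothetical; the data stands in for whitened observations that already lie in the positive quadrant.

```python
import numpy as np

def nonneg_ica_cost(W, X):
    """J = 1/2 * mean ||x - W^T y+||^2 with y = W x, one sample per column of X."""
    Y_plus = np.maximum(W @ X, 0.0)        # rectified outputs y+
    X_hat = W.T @ Y_plus                   # reconstruction x_hat = W^T y+
    return 0.5 * np.mean(np.sum((X - X_hat) ** 2, axis=0))

rng = np.random.default_rng(5)
X = rng.uniform(size=(2, 1000))            # non-negative data

phi = 0.5                                  # an arbitrary misalignment angle
W_rot = np.array([[np.cos(phi), np.sin(phi)],
                  [-np.sin(phi), np.cos(phi)]])

J_aligned = nonneg_ica_cost(np.eye(2), X)  # outputs all non-negative
J_rotated = nonneg_ica_cost(W_rot, X)      # some outputs go negative
```

For orthonormal W the residual is $W^T y^-$, so J reduces to half the mean squared negative part of y. It is zero when every output is non-negative and positive otherwise.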


Nonlinear PCA algorithms

It is natural to consider the nonlinear PCA algorithm

$$\Delta W = \eta\, g(y)\,[x - W^T g(y)]^T$$

in the special case $g(y) = y^+$, i.e.

$$\Delta W = \eta\, y^+ [x - \hat{x}]^T, \qquad \hat{x} = W^T y^+$$

("non-negative PCA").

Convergence? $g(y) = y^+$ is neither odd nor twice differentiable, so the standard proof is not applicable.

However, it behaves like a 'switching subspace network', so the PCA subspace convergence proofs can be modified (to be confirmed!).
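A minimal sketch of one update of this rule, assuming a batch average over the columns of X instead of a per-sample update. The function name `nonneg_pca_step` and the step size are hypothetical.

```python
import numpy as np

def nonneg_pca_step(W, X, eta):
    """One batch-averaged update  W <- W + eta * y+ (x - W^T y+)^T."""
    Y_plus = np.maximum(W @ X, 0.0)            # g(y) = y+
    residual = X - W.T @ Y_plus                # x - x_hat
    return W + eta * (Y_plus @ residual.T) / X.shape[1]

# Single sample x = (1, -1) with W = I: y+ = (1, 0), x - x_hat = (0, -1)
x = np.array([[1.0], [-1.0]])
W_new = nonneg_pca_step(np.eye(2), x, eta=0.1)

# Non-negative data with W = I reconstructs perfectly, so W does not move
X_pos = np.array([[0.2, 1.0, 0.0],
                  [0.5, 0.0, 2.0]])
W_fixed = nonneg_pca_step(np.eye(2), X_pos, eta=0.1)
```

The second call illustrates that an orthonormal W with all outputs non-negative is a fixed point of the rule.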


Orthonormality through Axis Rotation

Nonlinear PCA updates W in Euclidean matrix space, tending towards orthonormality $W W^T = I$.

However, we can instead construct W from 2D rotations [Comon, 1994],

$$\begin{pmatrix} y_{i_1} \\ y_{i_2} \end{pmatrix} = \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix} \begin{pmatrix} x_{i_1} \\ x_{i_2} \end{pmatrix}$$

Rotations (and their products) always remain orthonormal.

Construct the update multiplicatively as

$$W(t) = R(t) R(t-1) \cdots R(1) W(0)$$

where R(t) is a 2D Givens rotation.
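A minimal sketch of the multiplicative construction, assuming zero-based axis indices and arbitrary illustrative angles. The helper name `givens` is hypothetical.

```python
import numpy as np

def givens(n, i, j, phi):
    """n x n identity with a 2D rotation by phi embedded in axes i and j."""
    R = np.eye(n)
    c, s = np.cos(phi), np.sin(phi)
    R[i, i], R[i, j] = c, s
    R[j, i], R[j, j] = -s, c
    return R

W = np.eye(3)                                   # W(0)
for i, j, phi in [(0, 1, 0.3), (1, 2, -1.1), (0, 2, 2.0)]:
    W = givens(3, i, j, phi) @ W                # W(t) = R(t) W(t-1)

orthonormality_error = np.abs(W @ W.T - np.eye(3)).max()
```

However many rotations are applied, `orthonormality_error` stays at rounding level, so no separate re-orthonormalization step is needed.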


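2D Axis rotations

[Figure: a data point x at distance l from the origin, drawn against the input axes x1, x2 and the rotated output axes y1, y2.]

$$2J = \begin{cases} 0 & \text{if } y_1 \ge 0,\ y_2 \ge 0 \\ y_2^2 & \text{if } y_1 \ge 0,\ y_2 < 0 \\ y_1^2 & \text{if } y_1 < 0,\ y_2 \ge 0 \\ y_1^2 + y_2^2 = l^2 & \text{otherwise (i.e. } y_1 < 0,\ y_2 < 0\text{)} \end{cases}$$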


Derivative and Algorithm

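$$\frac{dJ}{d\theta} = \begin{cases} 0 & \text{if } y_1 \ge 0,\ y_2 \ge 0 \\ y_2 y_1 & \text{if } y_1 \ge 0,\ y_2 < 0 \\ -y_1 y_2 & \text{if } y_1 < 0,\ y_2 \ge 0 \\ 0 & \text{otherwise (} y_1 < 0,\ y_2 < 0\text{)} \end{cases} \;=\; y^+ \times y^- \;=\; y_1^+ y_2^- - y_2^+ y_1^-$$

where $y^- = y - y^+$ is the negative part of y.

Gradient descent algorithm:

$$\Delta\phi = -\eta_\phi \frac{dJ}{d\phi} = +\eta_\phi \frac{dJ}{d\theta} = \eta_\phi \left( y_1^+ y_2^- - y_1^- y_2^+ \right)$$

Relate this to the concept of torque in a mechanical system.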
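A minimal sketch that checks the torque expression against a finite-difference derivative, assuming J is summed over all samples and the rotation convention $y = R(\phi)x$ with rows $(\cos\phi, \sin\phi)$ and $(-\sin\phi, \cos\phi)$. The helper names, data and step size are hypothetical.

```python
import numpy as np

def rot2(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])

def cost(Y):
    # J = 1/2 * sum of squared negative parts of y
    return 0.5 * np.sum(np.minimum(Y, 0.0) ** 2)

def torque(Y):
    # sum over samples of  y1+ y2-  -  y1- y2+   (this is dJ/dtheta = -dJ/dphi)
    Yp, Ym = np.maximum(Y, 0.0), np.minimum(Y, 0.0)
    return np.sum(Yp[0] * Ym[1] - Ym[0] * Yp[1])

rng = np.random.default_rng(2)
S = rng.uniform(size=(2, 200))        # non-negative sources
X = rot2(0.4) @ S                     # data misaligned by 0.4 rad

g = torque(X)                         # torque at phi = 0
h = 1e-6
dJ_dphi = (cost(rot2(h) @ X) - cost(rot2(-h) @ X)) / (2 * h)

eta = 1e-3                            # small illustrative step size
phi_new = eta * g                     # Delta phi = eta * torque
```

The numerical derivative `dJ_dphi` agrees with `-g`, and rotating by `phi_new` lowers the cost, which is the sense in which the torque "turns" the axes towards the data.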


Line Search over Rotation Angle

Instead of simple gradient descent, we can use a line search, e.g. Matlab fzero to find a zero of dJ/dφ.

If the sources are non-negative as required, we know min(J) = 0, so make a local quadratic approximation and jump to

$$\phi(t+1) = \phi(t) - \frac{2J(t)}{dJ(t)/d\phi}$$

This is OK since the solution is locally quadratic and the curvature increases away from the solution, as more data points 'escape' from the positive quadrant.
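A minimal sketch of the repeated quadratic jump in 2D, assuming the same rotation convention as the 2D rotation slide and a stopping rule once J is negligible (needed because dJ/dφ also vanishes at the solution). All names, the data and the iteration limits are hypothetical.

```python
import numpy as np

def rot2(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])

def cost(Y):
    return 0.5 * np.sum(np.minimum(Y, 0.0) ** 2)

def torque(Y):
    Yp, Ym = np.maximum(Y, 0.0), np.minimum(Y, 0.0)
    return np.sum(Yp[0] * Ym[1] - Ym[0] * Yp[1])    # equals -dJ/dphi

def quadratic_jump_search(X, n_steps=30, tol=1e-12):
    phi = 0.0
    for _ in range(n_steps):
        Y = rot2(phi) @ X
        J, g = cost(Y), torque(Y)
        if J < tol or g == 0.0:        # already separated, or no slope to follow
            break
        phi += 2.0 * J / g             # phi - 2J / (dJ/dphi), with dJ/dphi = -g
    return phi

rng = np.random.default_rng(6)
S = rng.uniform(size=(2, 500))         # non-negative sources
X = rot2(0.4) @ S                      # misaligned by 0.4 rad
phi_star = quadratic_jump_search(X)
J_before, J_after = cost(X), cost(rot2(phi_star) @ X)
```

On this toy problem the angle moves towards -0.4 rad, undoing the misalignment. Each jump tends to fall short of the solution rather than overshoot it, consistent with the curvature argument above.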


More Than 2 Dimensions: Algorithm

1. Set X(0) = X, W(0) = I, t = 0.

2. Calculate Y = X(t) = W(t)X(0).

3. Calculate torques $g_{ij} = \sum_k \left( y^+_{ik} y^-_{jk} - y^-_{ik} y^+_{jk} \right)$.

4. Exit if $|g_{ij}| <$ tolerance.

5. For $i^*, j^*$ maximizing $|g_{ij}|$, construct $X^*$ by selecting rows $i^*$ and $j^*$ from X(t).

6. Do a line search to find $\phi^*(t+1)$ which minimizes J.

7. Form the rotation matrix $R(t+1) = [r(t+1)_{ij}]$ from $\phi^*(t+1)$.

8. Form the updated weight matrix W(t+1) = R(t+1)W(t) and the modified input data X(t+1) = R(t+1)X(t) = W(t+1)X(0).

9. Increment the step count t, and repeat from 2.
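The steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the line search of step 6 is replaced by a crude grid over candidate angles, the exit test uses the largest torque, and the data stands in for whitened observations. All function names and parameter values are hypothetical.

```python
import numpy as np

def cost(Y):
    return 0.5 * np.sum(np.minimum(Y, 0.0) ** 2)

def torques(Y):
    # g_ij = sum_k ( y+_ik y-_jk  -  y-_ik y+_jk )
    Yp, Ym = np.maximum(Y, 0.0), np.minimum(Y, 0.0)
    return Yp @ Ym.T - Ym @ Yp.T

def givens(n, i, j, phi):
    R = np.eye(n)
    c, s = np.cos(phi), np.sin(phi)
    R[i, i], R[i, j], R[j, i], R[j, j] = c, s, -s, c
    return R

def successive_rotations(X0, tol=1e-9, max_steps=60, n_grid=721):
    n = X0.shape[0]
    W, X = np.eye(n), X0.copy()                       # step 1
    phis = np.linspace(-np.pi, np.pi, n_grid)         # candidate angles, includes ~0
    for _ in range(max_steps):                        # step 9 (loop)
        G = torques(X)                                # steps 2-3, with Y = X(t)
        i, j = np.unravel_index(np.argmax(np.abs(G)), G.shape)
        if abs(G[i, j]) < tol:                        # step 4
            break
        pair = X[[i, j]]                              # step 5: rows i*, j*
        costs = [cost(givens(2, 0, 1, phi) @ pair) for phi in phis]
        phi_star = phis[int(np.argmin(costs))]        # step 6: grid "line search"
        R = givens(n, i, j, phi_star)                 # step 7
        W, X = R @ W, R @ X                           # step 8
    return W, X

rng = np.random.default_rng(3)
S = rng.uniform(size=(3, 300))                        # non-negative sources
U0, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # random orthonormal mixing
X0 = U0 @ S
W, Y = successive_rotations(X0)
```

Because the angle grid contains (numerically) zero, a step never makes the cost worse. A proper one-dimensional minimizer would replace the grid in practice.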


Geodesic search

Successive rotations are equivalent to a line search along axis directions. Can we search in more general directions?

Geodesic - the shortest path between 2 points on a manifold.

For orthonormal matrices, we have [Edelman, Arias & Smith, 1998]

$$W(\tau) = e^{\tau B} W(0)$$

where $B^T = -B$ and τ is a scalar [Fiori 2001, Nishimori 1999].

NB: In 2D we get

$$B = \begin{pmatrix} 0 & b \\ -b & 0 \end{pmatrix}, \qquad e^{\tau B} = \begin{pmatrix} \cos(\tau b) & \sin(\tau b) \\ -\sin(\tau b) & \cos(\tau b) \end{pmatrix}$$
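A minimal sketch of the geodesic formula, assuming the matrix exponential of a skew-symmetric B is computed from the eigendecomposition of the Hermitian matrix iB (a general-purpose routine such as `scipy.linalg.expm` would also do). The helper name `expm_skew` and the numerical values are hypothetical.

```python
import numpy as np

def expm_skew(B):
    """exp(B) for a real skew-symmetric B, using iB = V diag(lam) V^H."""
    lam, V = np.linalg.eigh(1j * B)                 # iB is Hermitian
    return (V @ np.diag(np.exp(-1j * lam)) @ V.conj().T).real

# 2D case: exp(tau B) is a plane rotation by angle tau * b
b, tau = 0.7, 1.3
B2 = np.array([[0.0, b], [-b, 0.0]])
R2 = expm_skew(tau * B2)
R2_expected = np.array([[np.cos(tau * b), np.sin(tau * b)],
                        [-np.sin(tau * b), np.cos(tau * b)]])

# 3D case: any skew-symmetric B gives an orthonormal matrix
rng = np.random.default_rng(7)
C = np.triu(rng.normal(size=(3, 3)), k=1)           # strictly upper triangular
B3 = C - C.T
R3 = expm_skew(B3)
```

`R2` matches the closed form on the slide, and `R3 @ R3.T` is the identity up to rounding, so moving along $e^{\tau B}$ never leaves the set of orthonormal matrices.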


Steepest Descent Geodesic

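Parameterize $B = C - C^T$ with $c_{ij} = 0$ for $i \ge j$. C has n(n−1)/2 free parameters.

For steepest descent in C space, maximize

$$-\lim_{\Delta\tau \to 0} \frac{(\text{change in } J \text{ due to } \Delta\tau)/\Delta\tau}{(\text{distance moved by } \tau C \text{ due to } \Delta\tau)/\Delta\tau} = -\frac{dJ/d\tau}{\|C\|_F}$$

We find

$$\frac{dJ}{d\tau} = \operatorname{trace}\left( (Y_- Y^T - Y Y_-^T)\, C^T \right) = \left\langle Y_- Y^T - Y Y_-^T,\ C \right\rangle$$

so for steepest descent choose

$$C \propto \mathrm{UT}(Y_- Y^T - Y Y_-^T) = \mathrm{UT}(Y_- Y_+^T - Y_+ Y_-^T)$$

where UT(·) denotes the upper-triangular part.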
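A minimal sketch that checks the expression for dJ/dτ numerically, assuming $J = \tfrac{1}{2}\|Y_-\|_F^2$ summed over samples and $Y(\tau) = e^{\tau B} Y$. The random data, the test direction C and all names are hypothetical. The last line takes the constant of proportionality to be negative, which is an assumption about the sign convention needed for positive τ to be downhill.

```python
import numpy as np

def expm_skew(B):
    lam, V = np.linalg.eigh(1j * B)                 # iB is Hermitian
    return (V @ np.diag(np.exp(-1j * lam)) @ V.conj().T).real

def cost(Y):
    return 0.5 * np.sum(np.minimum(Y, 0.0) ** 2)

rng = np.random.default_rng(4)
Y = rng.normal(size=(3, 50))                        # outputs with mixed signs
Yp, Ym = np.maximum(Y, 0.0), np.minimum(Y, 0.0)

D = Ym @ Y.T - Y @ Ym.T                             # Y- Y^T  - Y  Y-^T
D_alt = Ym @ Yp.T - Yp @ Ym.T                       # Y- Y+^T - Y+ Y-^T (same matrix)

C = np.triu(rng.normal(size=(3, 3)), k=1)           # arbitrary direction, c_ij = 0 for i >= j
B = C - C.T

analytic = np.sum(D * C)                            # <D, C> = trace(D C^T)
h = 1e-5
numeric = (cost(expm_skew(h * B) @ Y) - cost(expm_skew(-h * B) @ Y)) / (2 * h)

C_descent = -np.triu(D, k=1)                        # downhill direction (sign assumed)
slope_along_descent = np.sum(D * C_descent)
```

`analytic` and `numeric` agree to finite-difference accuracy, and `slope_along_descent` is negative, i.e. J decreases for small positive τ along that geodesic.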


Gradient descent

Simply update W according to

$$W(t+1) = e^{-\eta (Y_- Y_+^T - Y_+ Y_-^T)}\, W(t)$$

with small update η. This is the geodesic flow method [Fiori 2001, Nishimori 1999].

BUT - no need to restrict to small updates.

Can do e.g. line search along the steepest-descent geodesic.


Line Search along Geodesic: Algorithm

1. X(0) = X and W(0) = I at step t = 0.

2. Calculate Y = X(t) = W(t)X(0).

3. Calculate the gradient $G(t) = \mathrm{UT}(Y_- Y_+^T - Y_+ Y_-^T)$ and the B-space movement direction $H(t) = -(G(t) - G(t)^T) = Y_+ Y_-^T - Y_- Y_+^T$.

4. Stop if $\|G(t)\| <$ tolerance.

5. Perform a line search for $\tau^*$ which minimizes J(τ) using $Y(\tau) = R(\tau) X(t)$ and $R(\tau) = e^{-\tau H}$.

6. Update $W(t+1) = R(\tau^*) W(t)$ and $X(t+1) = R(\tau^*) X(t) = W(t+1) X(0)$.

7. Increment t and repeat from 2 until exit at 4.

For simple tasks, can guess a single quadratic jump to J = 0:

$$\Delta C = 2JG / \|G\|^2$$
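The steps above can be sketched as follows, under several assumptions that are not in the slides: the search moves along $R(\tau) = e^{-\tau (G - G^T)}$ as in the gradient-descent update, so that positive τ is downhill; the line search is a simple backtracking search started from the quadratic-jump guess $\tau = 2J/\|G\|^2$; and the data stands in for whitened observations. All function names and tolerances are hypothetical.

```python
import numpy as np

def expm_skew(B):
    lam, V = np.linalg.eigh(1j * B)                 # iB is Hermitian
    return (V @ np.diag(np.exp(-1j * lam)) @ V.conj().T).real

def cost(Y):
    return 0.5 * np.sum(np.minimum(Y, 0.0) ** 2)

def descent_rotation(X, D, J0, tau):
    """Backtrack along R(tau) = exp(-tau D) until the cost drops; None if it never does."""
    while tau > 1e-12:
        R = expm_skew(-tau * D)
        if cost(R @ X) < J0:
            return R
        tau *= 0.5
    return None

def geodesic_search(X0, tol=1e-9, max_steps=100):
    n = X0.shape[0]
    W, X = np.eye(n), X0.copy()                     # step 1
    for _ in range(max_steps):                      # step 7 (loop)
        Yp, Ym = np.maximum(X, 0.0), np.minimum(X, 0.0)   # step 2, Y = X(t)
        D = Ym @ Yp.T - Yp @ Ym.T                   # step 3: skew-symmetric, D = G - G^T
        g2 = 0.5 * np.sum(D ** 2)                   # ||G||_F^2 for G = UT(D)
        if np.sqrt(g2) < tol:                       # step 4
            break
        J0 = cost(X)
        R = descent_rotation(X, D, J0, tau=2.0 * J0 / g2)  # step 5
        if R is None:
            break
        W, X = R @ W, R @ X                         # step 6
    return W, X

rng = np.random.default_rng(8)
S = rng.uniform(size=(3, 300))                      # non-negative sources
U0, _ = np.linalg.qr(rng.normal(size=(3, 3)))       # random orthonormal mixing
X0 = U0 @ S
W, Y = geodesic_search(X0)
```

Each accepted step lowers the cost while W stays orthonormal. A more careful one-dimensional minimizer along the geodesic could replace the backtracking loop.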


Results - Image separation problem

Source images and histograms [Cichocki, Kasprzak & Amari 1996].


Nonlinear PCA

[Figure: error (log scale, 10^-10 to 10^0) against epoch t (0 to 200), with images at (a) initial state, (b) after 50 epochs, (c) after 200 epochs.]


Successive rotations

[Figure: error (log scale, 10^-10 to 10^0) against epoch (0 to 25), with images at (a) initial state, (b) 5 epochs: rotated axes 1-3, (c) 15 epochs: rotated axes 2-3, (d) 22 epochs: rotated axes 1-3.]


Geodesic step

[Figure: error (log scale, 10^-15 to 10^0) against epoch (0 to 15).]

(Visually similar images)


Music example: Liszt Etude No 5 (extract)

[Figure: two panels plotted against frame number (50 to 450): frequency bin (50 to 250) in the upper panel and output (2 to 10) in the lower panel.]


Conclusions

• Considered problem of non-negative ICA

• Separation of whitened sources when zero reconstruction error.

• Nonlinear PCA with g(y) = y+.

• Successive rotations keep orthonormality

• Geodesic line search
