
Posted on 14-Jan-2016


TRANSCRIPT

Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing


Valero Laparra

Jesús Malo

Gustavo Camps

INDEX

- What?
- Why?
- How?
- Conclusions
- Toolbox

What?

• Estimate multidimensional Probability Densities

• How the N-D data is distributed in the N-D space

What to pay attention to! What is important in our data


Why?

• GENERIC OPTIMAL SOLUTIONS


How?

• PDF estimation from samples always assumes a model.

• HISTOGRAM: estimation without assuming a functional model

How?

• X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]

How?

• Problem: number-of-bins estimation. A common rule: Nbins = √Nsamples
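The square-root rule above can be sketched as follows (an illustrative Python sketch, not the authors' MATLAB code):

```python
import numpy as np

# Illustrative sketch: a 1-D histogram density estimate using the
# square-root rule for the number of bins, Nbins = sqrt(Nsamples).
def histogram_pdf(x):
    nbins = int(np.sqrt(len(x)))              # Nbins = sqrt(Nsamples)
    counts, edges = np.histogram(x, bins=nbins)
    widths = np.diff(edges)
    pdf = counts / (counts.sum() * widths)    # normalize so the integral is 1
    return pdf, edges

x = np.random.randn(10_000)                   # 10,000 samples -> 100 bins
pdf, edges = histogram_pdf(x)
```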

How?

• Problem: “the curse of dimensionality”

- Nb_total = Nb_dim ^ N_dim

- If we assume Ns = Nb^2 samples per dimension,

- then Ns = Nb^(2·Nd) samples in total

How?

• Problem: “the curse of dimensionality”: Nb_total = Nb_dimension ^ N_dimensions

e.g., assuming a minimum of Nb = 11 bins, we need Ns = 11^(2·Nd) samples:

Nd | Ns                  | Memory
1  | 121                 | 968 bytes
2  | 14,641              | 117,128 bytes
3  | 1,771,561           | 14,172,488 bytes
4  | 214,358,881         | 1,714,871,048 bytes
5  | ~25,937,000,000     | HELP, MEMORY
6  | ~3,138,400,000,000  | HELP, MEMORY
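The blow-up in the table can be reproduced with the slide's formula (assuming 8 bytes per double-precision count):

```python
# Sketch of the slide's memory blow-up: Ns = Nb**(2*Nd) samples,
# at 8 bytes per double-precision value.
Nb = 11
table = [(Nd, Nb ** (2 * Nd), 8 * Nb ** (2 * Nd)) for Nd in range(1, 7)]
for Nd, Ns, nbytes in table:
    print(f"Nd={Nd}: Ns={Ns}, {nbytes} bytes")
```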

How?

• From P(x) to P(y) (Gaussian): which transform T?
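For a single dimension such a transform is classical: map each sample through the empirical CDF, then through the inverse Gaussian CDF. A hedged Python sketch (not the authors' code):

```python
import numpy as np
from scipy.stats import norm

# 1-D Gaussianization:  y = Phi^{-1}( F_x(x) ), with F_x the empirical CDF
# and Phi the standard Gaussian CDF.
def gaussianize_1d(x):
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1    # ranks 1..n
    u = ranks / (n + 1)                      # empirical CDF values in (0, 1)
    return norm.ppf(u)                       # inverse Gaussian CDF

y = gaussianize_1d(np.random.exponential(size=5000))
```

The multidimensional case is the hard part, which is where GPCA comes in.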


How?

MATLAB, MATLAB, WHAT A WONDERFUL WORLD

Answer: GPCA
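The GPCA idea can be sketched as an iteration of two steps: Gaussianize each marginal, then rotate with PCA, and repeat until the joint distribution is approximately Gaussian. An illustrative Python re-implementation (not the authors' MATLAB toolbox):

```python
import numpy as np
from scipy.stats import norm

def marginal_gaussianization(X):
    # X: (N dimensions, N samples), as in the toolbox convention.
    # Rank-based empirical CDF per row, then inverse Gaussian CDF.
    U = (np.argsort(np.argsort(X, axis=1), axis=1) + 1) / (X.shape[1] + 1)
    return norm.ppf(U)

def gpca(X, n_iter=10):
    for _ in range(n_iter):
        X = marginal_gaussianization(X)      # make each marginal Gaussian
        _, V = np.linalg.eigh(np.cov(X))     # PCA: eigenvectors of covariance
        X = V.T @ X                          # rotate to principal axes
    return X
```

The rotation re-mixes the dimensions so that the next marginal Gaussianization removes a little more dependence at each pass.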

How? Theoretical convergence proof

• Negentropy:
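The negentropy formula on this slide was lost in conversion; a plausible reconstruction, using the standard definition, is:

```latex
J(X) = H(X_{\mathrm{gauss}}) - H(X) \;\ge\; 0,
```

where $X_{\mathrm{gauss}}$ is a Gaussian with the same covariance as $X$, and $J(X) = 0$ if and only if $X$ is Gaussian. Convergence arguments for iterative Gaussianization typically show that each iteration does not increase this divergence from the Gaussian.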

How?

OPEN ISSUE

How?

• Stop criterion:

NOTE THAT:

Measuring Mutual Information

THE GAUSSIAN IS THE UNIQUE DISTRIBUTION WITH INDEPENDENT GAUSSIAN MARGINALS

I(Xn) ≈ 0
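The quantity behind this slide is presumably the multi-information, the N-D generalization of mutual information:

```latex
I(X) = \sum_{n=1}^{N} H(X_n) - H(X),
```

which is zero exactly when the components are independent. Since the Gaussian is the only distribution with independent Gaussian marginals, fully Gaussianized data satisfy $I(X_n) \approx 0$, and the dependence removed at each iteration can be accumulated to estimate the original multi-information.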

How? GPCA Inverse

NOTE THAT:

Synthesis

How? GPCA Jacobian
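The Jacobian matters because of the change-of-variables formula, which is presumably what `GPCA_probability` evaluates:

```latex
p_x(x) = p_y\!\big(T(x)\big)\,\bigl|\det \nabla T(x)\bigr|,
```

with $T$ the learned Gaussianizing transform and $p_y$ a standard multivariate Gaussian. A Gaussianizing transform with a tractable Jacobian therefore yields the pdf of the original data at any point.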

CONCLUSIONS

• The optimal solution of many problems involves knowledge of the data pdf.

• GPCA obtains a transform that converts any pdf into a Gaussian pdf.

• It has an easy inverse.

• It has an easy Jacobian.

• This transform can be used to calculate the pdf of any data.

GPCA toolbox (Matlab): 3 examples

• PDF estimation

• Mutual Information Measures

• Synthesis

Wiki-page


Beta version


Basic toolbox

• [datT Trans] = GPCA(dat, Nit, Perc)

- dat = data matrix with [N dimensions x N samples]

e.g., 100 samples from a 2-D Gaussian: dat = [2 x 100]

- Nit = Number of iterations

- Perc = percentage by which to increase the pdf range.


Basic toolbox

• [datT Trans] = auto_GPCA(dat)

• [datT] = apply_GPCA(dat,Trans)

• [dat] = inv_GPCA(datT,Trans)

• [Px pT detJ JJ] = GPCA_probability(x0,Trans)

Estimating PDF/manifold

• [datT Trans] = auto_GPCA(dat)
• [Px pT detJ JJ] = GPCA_probability(XX, Trans);


Estimating PDF/manifold

• PROBLEMS

– It does not always arrive at a Gaussian
– Pdfs with clusters are more complicated
– The Jacobian estimation is highly point-dependent
– The derivative (in the Jacobian estimation) is much more irregular than the integral
– The pdf has to be estimated for each point

Measuring Mutual Information

• [datT Trans] = auto_GPCA(dat)
• MI = abs(min(cumsum(cat(1,Trans.I))));

Error = (Real MI – Estimated MI) / Real MI, averaged over 10 realizations:

N-dim | Pdf-1  | Pdf-2  | Pdf-3
3     | 0.0697 | 0.0787 | 0.0630
4     | 0.0150 | 0.0031 | 0.0048
5     | 0.0353 | 0.0297 | 0.0328
8     | 0.0313 | 0.0369 | 0.0372
10    | 0.0148 | 0.0145 | 0.0132


Measuring Mutual Information

• PROBLEMS

– Entropy estimators are not perfectly defined
– More iterations, more error
– The more complicated the pdf, the more error

Synthesizing data

• [datT Trans] = auto_GPCA(dat)
• [dat2] = inv_GPCA(randn(Dim,Nsamples), Trans);

(Figure: forward transforms T1, T2 and their inverses Inv T2, Inv T1.)
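The synthesis idea can be illustrated in one dimension (a hedged Python sketch, not the toolbox's `inv_GPCA`): push Gaussian noise through the inverse of the Gaussianizing map, i.e. the Gaussian CDF followed by the data's empirical quantile function.

```python
import numpy as np
from scipy.stats import norm

# 1-D synthesis sketch: new samples with (approximately) the data's pdf.
rng = np.random.default_rng(0)
data = rng.exponential(size=5000)     # samples whose pdf was "learned"
z = rng.standard_normal(5000)         # new samples in the Gaussian domain
u = norm.cdf(z)                       # Gaussian -> uniform in (0, 1)
synth = np.quantile(data, u)          # uniform -> data distribution
```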


Synthesizing data

• PROBLEMS

– It does not always arrive at a Gaussian
– Small variations in the variance of the random data yield very different results
– No information about the features of the data in the transformed domain

• Thanks for your time
