All you wanted to know about (sparse / high dimensional)
PCA and Covariance Estimation
but didn't dare to ask

Boaz Nadler
Department of Computer Science and Applied Mathematics
The Weizmann Institute of Science
June 2018
Part 0: Introduction
Introduction
In many applications we need to analyze multivariate data,
x_1, . . . , x_n ∈ R^p

Throughout the talk: p = dimension, n = number of samples.
Many different scenarios:
- p small, n large.
- both p and n large.
- p large, n small.
Unsupervised Tasks
- Visualization
- Dimensionality Reduction
- Compression, Denoising
- Exploratory Data Analysis - finding structure in data
- Many hypothesis testing problems.
Supervised Tasks
In some cases we also observe a response y_i for each x_i.
Common tasks:
- Classification (Linear Discriminant Analysis)
- Regression (Multivariate Linear Regression).
Density Estimation and Moments
Typical assumption: the x_i are i.i.d. samples from some random variable X with unknown density f(x).
In principle, f (x) describes everything about X .
So, we can try to estimate it.
Non-parametric (kernel) density estimation achieves

|f̂(x) − f(x)| ∼ n^{−2β/(2β+p)}

where β is a measure of the smoothness of f(x).
Curse of dimensionality: to achieve a small error ε, the number of samples must grow exponentially in p, n = O(exp(Cp)).

Typically not practical if p > 15.
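To see what this rate implies, invert it: a target error ε requires n ∼ ε^{−(2β+p)/(2β)} samples. A minimal Python sketch of this arithmetic (the values β = 2 and ε = 0.1 are illustrative choices, not from the talk):

```python
# Sample size needed for kernel density estimation error eps,
# from n ~ eps^{-(2*beta + p) / (2*beta)} (beta, eps are illustrative).
beta, eps = 2.0, 0.1

for p in [1, 2, 5, 10, 15, 20]:
    n = eps ** (-(2 * beta + p) / (2 * beta))
    print(f"p = {p:2d}: need n ~ {n:.1e} samples")
```

Already at p = 20 this gives n ~ 10^6 samples, consistent with the "not practical for p > 15" rule of thumb.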
First and Second Moments of X
Since f(x) cannot be estimated already at moderate dimensions, we settle for its first and second moments:
Definition: The population mean vector µ ∈ Rp is
µ = E[X] = ∫ x f(x) dx
Definition: The population covariance matrix Σp×p is defined as
Σ = E[(X − µ)(X − µ)^T] = ∫ (x − µ)(x − µ)^T f(x) dx
In many problems, estimates of µ and Σ suffice to solve the data analysis task.
Example I: Quadratic Discriminant Analysis
Binary (2-class) classification problem:
Observe x_1, . . . , x_{n_+} from class +1 and x'_1, . . . , x'_{n_−} from class −1.
Goal: classify the label y ∈ {±1} of a new x.
QDA: Assume each class is multivariate Gaussian,
Likelihood Ratio = C(Σ_+, Σ_−) · exp(−(x − µ_+)^T Σ_+^{−1} (x − µ_+)/2) / exp(−(x − µ_−)^T Σ_−^{−1} (x − µ_−)/2)
To apply QDA, we need to estimate µ_± and Σ_± (actually their inverses).
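A minimal numpy sketch of the resulting plug-in classifier, with class means and covariances replaced by their sample estimates (the data and variable names here are illustrative, not from the talk):

```python
import numpy as np

def qda_fit(X):
    """Sample mean and precision (inverse covariance) of one class."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)       # unbiased, 1/(n-1) scaling
    return mu, np.linalg.inv(Sigma)

def qda_log_ratio(x, mu_p, Om_p, mu_m, Om_m):
    """Log-likelihood ratio; the log-det term is the constant C."""
    dp, dm = x - mu_p, x - mu_m
    log_C = 0.5 * (np.linalg.slogdet(Om_p)[1] - np.linalg.slogdet(Om_m)[1])
    return log_C - 0.5 * dp @ Om_p @ dp + 0.5 * dm @ Om_m @ dm

# Toy data: two Gaussian classes in p = 5 dimensions.
rng = np.random.default_rng(0)
X_plus = rng.normal(loc=+1.0, size=(200, 5))
X_minus = rng.normal(loc=-1.0, size=(200, 5))
mu_p, Om_p = qda_fit(X_plus)
mu_m, Om_m = qda_fit(X_minus)
x_new = rng.normal(loc=+1.0, size=5)
y_hat = 1 if qda_log_ratio(x_new, mu_p, Om_p, mu_m, Om_m) > 0 else -1
print(y_hat)
```

Note that inverting the sample covariance requires n_± > p; this is exactly where high-dimensional covariance estimation becomes the bottleneck.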
Example II: Markowitz Portfolio Optimization
[Harry Markowitz, Nobel Prize '90]
An investor can invest in p stocks, with weight vector w = (w_1, . . . , w_p), s.t. ∑_i w_i = 1.

Stock i has expected return µ_i. The p stocks have joint covariance matrix Σ, where Σ_ii is the variance of the i-th stock.

Goal: construct a portfolio that achieves an average target return µ_R, namely ∑_i w_i µ_i = µ_R, but with minimal risk (variance):

min_w  w^T Σ w

To construct the portfolio, we need to estimate the µ_i and Σ.
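Under the two linear constraints the minimizer has a closed form via Lagrange multipliers: w* = Σ^{−1} A^T (A Σ^{−1} A^T)^{−1} b, where A stacks 1^T and µ^T and b = (1, µ_R)^T. A sketch with toy numbers (the stock parameters below are made up for illustration):

```python
import numpy as np

def min_variance_portfolio(mu, Sigma, mu_R):
    """Minimize w' Sigma w  s.t.  sum(w) = 1 and w' mu = mu_R."""
    p = len(mu)
    A = np.vstack([np.ones(p), mu])        # constraint matrix, 2 x p
    b = np.array([1.0, mu_R])
    Sinv_At = np.linalg.solve(Sigma, A.T)  # Sigma^{-1} A'
    lam = np.linalg.solve(A @ Sinv_At, b)  # (A Sigma^{-1} A')^{-1} b
    return Sinv_At @ lam                   # w* = Sigma^{-1} A' lam

# Toy example with 3 stocks.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = min_variance_portfolio(mu, Sigma, mu_R=0.08)
print(w, w.sum(), w @ mu)                  # weights, sum = 1, return = 0.08
```

In practice µ and Σ are replaced by their sample estimates, so estimation error propagates directly into the portfolio weights.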
Example III: Dimensionality of Data / Signals in Noise
Analytical Chemistry, Array Signal Processing, ...:
Assume “signal+noise” model
x = ∑_{j=1}^{K} s_j v_j + noise
Given observed data x_1, . . . , x_n, a basic question is: what is K?

The covariance matrix of the data has K spikes: eigenvalues larger than the noise variance σ², with the remaining eigenvalues all equal to σ².

Question: How can K be estimated? Which signal strengths can be detected?
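A quick simulation of this spiked spectrum for a single signal (K = 1; the signal strength, noise level, and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, sigma = 25, 1000, 1.0
v = np.zeros(p)
v[0] = 1.0                                   # signal direction (here e_1)
s = rng.normal(scale=2.0, size=(n, 1))       # random signal strengths
X = s * v + sigma * rng.normal(size=(n, p))  # x_i = s_i v + noise
S_n = np.cov(X, rowvar=False)                # sample covariance matrix
eigs = np.sort(np.linalg.eigvalsh(S_n))[::-1]
print(eigs[:5])  # one spike near 2^2 + 1 = 5, the rest near sigma^2 = 1
```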
Detection of Structure / Signals in Noise
Eigenvalues of S_n (log scale):

[Figure: two eigenvalue spectra, p = 25, n = 1000 (left) and p = 25, n = 50 (right).]
Estimation of Signal Direction
Suppose there is a single signal in the data,

x = s v + σ ξ

Goal: estimate the signal direction (the vector v).

Questions: How should it be estimated? How well can it be estimated? What happens when the dimension p is high?
Estimation of Signal Direction
[Figure: left, the two largest eigenvalues λ_1, λ_2 of S_n as a function of the noise level σ; right, the overlap ⟨v_k, e_1⟩ as a function of σ.]
Estimation of Signal Direction
[Figure: left, λ_1, λ_2 as a function of the sample size n (up to 6 × 10^4); right, the overlap ⟨v_PCA, e_1⟩ as a function of n.]
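A simulation in the same spirit as these plots: estimate v by the leading eigenvector of S_n and track the overlap |⟨v_PCA, v⟩| as n grows (a sketch; the signal strength and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma = 100, 1.0
v = np.zeros(p)
v[0] = 1.0                                   # true direction e_1

for n in [100, 1000, 10000, 60000]:
    s = rng.normal(size=(n, 1))              # unit-variance signal
    X = s * v + sigma * rng.normal(size=(n, p))
    S_n = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(S_n)               # eigenvalues in ascending order
    v_pca = V[:, -1]                         # leading eigenvector
    print(n, abs(v_pca @ v))                 # overlap <v_PCA, e_1> -> 1
```

For small n (comparable to p) the overlap is far from 1, which is the high-dimensional phenomenon the next parts of the talk quantify.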
Common Statistical Inference Problems
Given data x1, . . . , xn:
- Estimate the population covariance matrix Σ
- Estimate the precision matrix Ω = Σ^{−1}
- Estimate the largest eigenvalues and eigenvectors of Σ
- Suppose Σ (or Σ^{−1}) has many zeros. Find their support.
- Many hypothesis testing problems: Is Σ = Σ_0?
- Uncorrelated variables: Is Σ diagonal? Is Σ banded?
Inference Problems
Typically, µ and Σ are unknown and must be estimated from data.
Classical Estimates:
Sample Mean:

x̄ = (1/n) ∑_i x_i

Sample Covariance Matrix:

S_n = 1/(n − 1) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)^T
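In code, these classical estimates are one-liners; a minimal numpy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))        # n = 500 samples in p = 10 dimensions

x_bar = X.mean(axis=0)                # sample mean, length p
S_n = np.cov(X, rowvar=False)         # sample covariance, 1/(n-1) scaling
print(x_bar.shape, S_n.shape)         # (10,) (10, 10)
```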
Outline
Q-1: How close are S_n and its eigenvalues/eigenvectors to those of Σ?

Part 1. p small, n → ∞ [classical asymptotic statistics]

Part 2. p, n both large [modern high dimensional statistics]

Q-2: What if we have additional information, e.g. sparsity?

Part 3. Sparse Principal Component Analysis, Sparse Covariance Estimation
The Important Question
Why is this interesting, relevant, and important?
Covariance Estimation will be important to you
References (Partial List)
* Nadler, Finite sample approximation results for PCA, Annals of Statistics, 2008.
* Kritchman and Nadler, Determining the number of components in a factor model from limited noisy data, 2008.
* Birnbaum et al., Minimax bounds for sparse PCA with noisy high dimensional data, Annals of Statistics, 2013.
* Baik and Silverstein, Eigenvalues of large sample covariance matrices..., Journal of Multivariate Analysis, 2006.
* Paul, Asymptotics of sample eigenstructure..., Statistica Sinica, 2008.
* Bickel and Levina, Covariance regularization by thresholding, Annals of Statistics, 2008.
* Cai and Liu, Adaptive thresholding for sparse covariance matrix estimation, Journal of the American Statistical Association, 2011.
* Cai, Zhang, and Zhou, Optimal rates of convergence for covariance matrix estimation, Annals of Statistics, 2010.

and many others...