High-Dimensional Robust Mean Estimation
in Nearly-Linear Time
Yu Cheng^1, Ilias Diakonikolas^2, Rong Ge^1
^1 Duke University
^2 University of Southern California
Statistical Learning
(Unknown) Parameters → Samples → Algorithms
Performance criteria:
Sample complexity
Running time
Robustness
Robust Statistical Learning
(Unknown) Parameters → Corrupted samples → Algorithms
Q: Can we design provably robust and computationally efficient learning algorithms when a small fraction of the data is corrupted?
Motivation:
Model misspecification / Robust statistics [Huber 1960s, Tukey 1960s, . . . ]
Data poisoning attacks, Reliable / Adversarial / Secure ML
Motivation
Data Poisoning: High-frequency trading algorithms.
The Twitter account of the Associated Press was hacked in April 2013
(erasing $136 billion in market value in 3 minutes).
Biological Datasets: POPRES project, HGDP datasets.
High-dimensional datasets tend to be inherently noisy.
Outliers are hard to detect in several cases [Rosenberg et al., Science ’02; Li et al., Science ’08; Paschou et al., Medical Genetics ’10].
Reliable/Adversarial/Secure ML:
Recommendation Systems, Crowdsourcing, . . .
An attacker can generate malicious data to maximize their objectives [Mayzlin et al. ’14] [Wang et al. ’14] [Li et al. ’16].
Mean Estimation
Input: N samples X1, . . . , XN drawn from N(µ⋆, I) on R^d.
Goal: Learn µ⋆.
The empirical mean µ̂ = (1/N) ∑i Xi works:
∥µ̂ − µ⋆∥_2 ≤ ε when N = Ω(d/ε^2).
Running time: O(Nd).
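As a quick sanity check of this baseline, a minimal numpy sketch (dimension, seed, and constants are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps = 100, 0.1
N = int(d / eps**2)                       # N = Theta(d / eps^2) samples
mu_star = rng.normal(size=d)
X = rng.normal(size=(N, d)) + mu_star     # X_i ~ N(mu_star, I)
mu_hat = X.mean(axis=0)                   # empirical mean, O(Nd) time
print(np.linalg.norm(mu_hat - mu_star))   # roughly eps with high probability
```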
Robust Mean Estimation
Definition (ε-Corruption)
N samples are drawn i.i.d. from the ground-truth distribution D.
An adversary replaces εN samples with arbitrary points (after inspecting D, the samples, and the algorithm).
Robust Mean Estimation
Input: an ε-corrupted set of N samples X1, . . . , XN drawn from an unknown distribution D on R^d with mean µ⋆.
Goal: Learn µ⋆ in ℓ2-norm.
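To see why corruption breaks the baseline, here is a minimal sketch (again with arbitrary constants): a single far-away cluster of εN points drags the empirical mean by about ε · R, where the adversary can make R as large as it wants.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps, R = 100, 0.1, 50.0
N = int(d / eps**2)
X = rng.normal(size=(N, d))               # clean samples from N(0, I), mu_star = 0
X[: int(eps * N)] = R * np.eye(d)[0]      # adversary: eps*N copies of one far point
mu_hat = X.mean(axis=0)
print(np.linalg.norm(mu_hat))             # ~ eps * R = 5, grows linearly with R
```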
Previous Work
Robustly learn µ⋆ given ε-corrupted samples from N(µ⋆, I):

Algorithm          Error Guarantee   Poly-Time?
Tukey Median       O(ε)              No
Geometric Median   O(ε√d)            Yes
Tournament         O(ε)              No
Pruning            O(ε√d)            Yes
RANSAC             ∞                 Yes
[LRV’16]           O(ε√log d)        Yes
[DKKLMS’16]        O(ε√log(1/ε))     Yes
Previous Work
Robustly learn µ⋆ given ε-corrupted samples from N(µ⋆, I):

Algorithm                        Error (δ)       Runtime
Dimension Halving [LRV’16]       O(ε√log d)      Ω(Nd^2) + SVD
Convex Programming [DKKLMS’16]   O(ε√log(1/ε))   Ellipsoid Algorithm
Filtering [DKKLMS’16]            O(ε√log(1/ε))   Ω(Nd^2)
This paper                       O(ε√log(1/ε))   O(Nd/ε^6)

All these algorithms have the right sample complexity N = O(d/δ^2).
Our Results
Robustly learn µ⋆ given ε-corrupted samples from D on R^d.

Distribution                 Error (δ)       # of Samples (N)   Runtime
Sub-Gaussian                 O(ε√log(1/ε))   O(d/δ^2)           O(Nd/ε^6)
Bounded Covariance (Σ ⪯ I)   O(√ε)           O(d/δ^2)           O(Nd/ε^6)

When ε is constant, our algorithm has the best possible error guarantee, sample complexity, and running time (up to polylogarithmic factors).
Our Results

Distribution                 Error (δ)       # of Samples (N)   Runtime
Sub-Gaussian                 O(ε√log(1/ε))   O(d/δ^2)           O(Nd/ε^6)
Bounded Covariance (Σ ⪯ I)   O(√ε)           O(d/δ^2)           O(Nd/ε^6)

Robust mean estimation under bounded covariance assumptions has been used as a subroutine to obtain robust learners for a wide range of supervised learning problems that can be phrased as stochastic convex programs.
Our result provides a faster implementation of this subroutine, and hence yields faster robust algorithms for all of these problems.
Reweight the Samples
[DKKLMS’16]: To shift the empirical mean far from µ⋆, the corrupted samples must introduce a large eigenvalue in the second-moment matrix.
For X ∼ N(µ⋆, I), E[(X − µ⋆)(X − µ⋆)⊺] = I.

Good Weights
minimize    λmax(∑_{i=1}^N wi (Xi − µ⋆)(Xi − µ⋆)⊺)
subject to  w ∈ ∆_{N,ε}   (∑i wi = 1 and 0 ≤ wi ≤ 1/((1−ε)N))

Lemma ([DKKLMS’16])
If we can find a near-optimal solution w, we can output µw = ∑i wi Xi.

This looks like a packing SDP in w (which we can solve in nearly-linear time).
Except that ... we do not know µ⋆.
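For concreteness, a minimal numpy illustration of this objective (the corruption pattern and all constants are invented for the demo; the dense eigendecomposition below is exactly the kind of step the nearly-linear-time solvers avoid):

```python
import numpy as np

def primal_objective(X, w, center):
    # lambda_max( sum_i w_i (X_i - center)(X_i - center)^T ), formed via the
    # N x d matrix of reweighted deviations; eigh costs O(d^3) here.
    Y = (X - center) * np.sqrt(w)[:, None]
    return np.linalg.eigvalsh(Y.T @ Y)[-1]

rng = np.random.default_rng(2)
N, d, eps = 10000, 50, 0.1
X = rng.normal(size=(N, d))
X[: int(eps * N), 0] = 30.0                 # corrupt an eps-fraction
w = np.full(N, 1.0 / N)                     # uniform weights
print(primal_objective(X, w, np.zeros(d)))  # large: outliers create a big eigenvalue
w[: int(eps * N)] = 0.0
w /= w.sum()                                # still feasible: w_i <= 1/((1-eps)N)
print(primal_objective(X, w, np.zeros(d)))  # ~1: good weights exist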
Our Approach
Idea: guess the mean ν and solve the SDP with parameter ν.

Primal SDP (with parameter ν)
minimize    λmax(∑_{i=1}^N wi (Xi − ν)(Xi − ν)⊺)
subject to  w ∈ ∆_{N,ε}

We give a win-win analysis: either
a near-optimal solution w to the primal SDP gives a good answer µw, or
a near-optimal solution to the dual SDP yields a new guess ν′ that is closer to µ⋆ by a constant factor.
Our Approach
Iteratively move ν closer to µ⋆ using the dual SDP,
until the primal SDP has a good solution and we can output µw.

[Figure: sample points scattered around µ⋆; successive guesses ν1, ν2, ν3 approach µ⋆ (ν1: dual step, ν2: dual step, ν3: primal succeeds).]
Our Approach
Iteratively move ν closer to µ⋆ using the dual SDP:
In which direction is µ⋆?
How far is µ⋆?
Direction of µ⋆: Dual SDP

Primal SDP (with parameter ν)
minimize    λmax(∑_{i=1}^N wi (Xi − ν)(Xi − ν)⊺)
subject to  w ∈ ∆_{N,ε}

SDP Duality
min_{w ∈ ∆_{N,ε}}  max_{M ⪰ 0, tr(M) ≤ 1}  ⟨M, ∑i wi (Xi − ν)(Xi − ν)⊺⟩
  =  max_{M ⪰ 0, tr(M) ≤ 1}  min_{w ∈ ∆_{N,ε}}  ⟨M, ∑i wi (Xi − ν)(Xi − ν)⊺⟩
Direction of µ⋆: Dual SDP

Dual SDP (with parameter ν)
maximize    the mean of the smallest (1 − ε)-fraction of ((Xi − ν)⊺ M (Xi − ν))_{i=1}^N
subject to  M ⪰ 0, tr(M) ≤ 1

The dual SDP certifies that there are no good weights that can make the spectral norm small.
If the solution is rank-one, M = yy⊺, then in the direction of y the variance is large no matter how we reweight the samples.
Intuition: When ν is far from µ⋆, y should align with (ν − µ⋆).
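For intuition, a small sketch (my own illustration, not the paper's solver) of the dual value for a rank-one candidate M = yy⊺ with ∥y∥_2 = 1, so that tr(M) = 1:

```python
import numpy as np

def dual_value_rank_one(X, nu, y, eps):
    # Dual objective for M = y y^T (with ||y|| = 1, so tr(M) = 1): the mean
    # of the smallest (1-eps)-fraction of (X_i - nu)^T M (X_i - nu).
    y = y / np.linalg.norm(y)
    q = ((X - nu) @ y) ** 2                 # quadratic forms, O(Nd) total
    k = int(np.ceil((1 - eps) * len(X)))
    return np.sort(q)[:k].mean()
```

If this value is large for some unit y, then even after discarding the worst ε-fraction of points, every admissible reweighting sees variance much larger than 1 along y.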
Direction of µ⋆: Dual SDP
Why would the dual SDP pick the direction (ν − µ⋆)?

[Figure: samples centered at µ⋆ and a guess ν; two candidate directions, y ≈ (ν − µ⋆) and a different direction y′, with M = yy⊺.]

Why is y better than y′? Along y, even the clean samples are far from ν, so the quadratic forms (Xi − ν)⊺M(Xi − ν) are large and no feasible reweighting can hide this; along y′, the clean samples have variance close to 1.
How Far is µ⋆: Optimal Value of the SDPs

Lemma
When ∥ν − µ⋆∥_2 ≥ . . . ,
1 + 0.99 ∥ν − µ⋆∥_2^2 ≤ OPT_ν ≤ 1 + 1.01 ∥ν − µ⋆∥_2^2.
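A quick numeric illustration of the lemma on clean data (it checks only the rank-one direction y ∝ ν − µ⋆ and skips the (1 − ε)-truncation; constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 50, 100000
X = rng.normal(size=(N, d))                   # clean samples, mu_star = 0
nu = np.zeros(d); nu[0] = 2.0                 # ||nu - mu_star||_2 = 2
y = nu / np.linalg.norm(nu)                   # unit vector along nu - mu_star
print((((X - nu) @ y) ** 2).mean())           # ~ 1 + 2^2 = 5
```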
Putting it Together: Moving ν Closer to µ⋆
We show that despite
the errors in the concentration bounds, and
the fact that we only solve the SDPs approximately,
we still have OPT_ν ≈ 1 + ∥ν − µ⋆∥_2^2, and the top eigenvector of M still aligns approximately with (ν − µ⋆).

[Figure: from ν, step a distance r along ±v1 to candidate points ν′ and ν′′; θ is the angle between v1 and (ν − µ⋆).]
Full Algorithm: Sub-Gaussians

Algorithm 1: Robust Mean Estimation for Known-Covariance Sub-Gaussian Distributions
Let ν be the coordinate-wise median of {Xi}_{i=1}^N;
for i = 1 to O(log d) do
    Compute either
      (i) a good solution w ∈ R^N for the primal SDP with parameters (ν, 2ε); or
      (ii) a good solution M ∈ R^{d×d} for the dual SDP with parameters (ν, ε);
    if the objective value of w in the primal SDP is ≤ 1 + c0 ε ln(1/ε) then
        return the weighted empirical mean µw = ∑_{i=1}^N wi Xi;
    else
        Move ν closer to µ⋆ using the top eigenvector of M.
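A simplified, runnable sketch of this loop, under assumptions of my own (truncation-based weights stand in for the packing-SDP solver, the dual step is a rank-one top eigenvector computed by a dense eigendecomposition, and c0 and the step rule are heuristic); it is illustrative, not the paper's nearly-linear-time algorithm:

```python
import numpy as np

def robust_mean(X, eps, c0=4.0):
    N, d = X.shape
    k = int(np.ceil((1 - eps) * N))          # size of the kept (1-eps)-fraction
    nu = np.median(X, axis=0)                # coordinate-wise median start
    keep = np.arange(N)
    for _ in range(int(np.ceil(np.log2(d))) + 3):
        Y = X - nu
        # Uniform weights on the (1-eps)N points closest to nu stand in for
        # a feasible w in Delta_{N,eps}.
        keep = np.argsort(np.einsum('ij,ij->i', Y, Y))[:k]
        Z = Y[keep] / np.sqrt(k)
        # lambda_max and top eigenvector of sum_i w_i (X_i - nu)(X_i - nu)^T.
        vals, vecs = np.linalg.eigh(Z.T @ Z)
        lam, v = vals[-1], vecs[:, -1]
        if lam <= 1 + c0 * eps * np.log(1 / eps):
            break                            # primal certificate: output mu_w
        # Dual step: move ~ sqrt(lam - 1) along +/- v (the nu', nu'' of the
        # earlier figure), keeping the candidate with smaller truncated spread.
        r = np.sqrt(max(lam - 1.0, 0.0))
        score = lambda c: np.sort(((X - c) ** 2).sum(axis=1))[:k].mean()
        nu = min([nu + r * v, nu - r * v], key=score)
    return X[keep].mean(axis=0)              # weighted empirical mean mu_w
```

On data corrupted as in the earlier demo, this heuristic typically returns a point far closer to µ⋆ than the plain empirical mean.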
Full Algorithm: Bounded Covariance

Algorithm 2: Robust Mean Estimation for Bounded-Covariance Distributions
Let ν be the coordinate-wise median of {Xi}_{i=1}^N;
for i = 1 to O(log d) do
    Compute either
      (i) a good solution w ∈ R^N for the primal SDP with parameters (ν, 2ε); or
      (ii) a good solution M ∈ R^{d×d} for the dual SDP with parameters (ν, ε);
    if the objective value of w in the primal SDP is at most c1 then
        return the weighted empirical mean µw = ∑_{i=1}^N wi Xi;
    else
        Move ν closer to µ⋆ using the top eigenvector of M.
Summary: Robust Mean Estimation

Distribution         Error (δ)       # of Samples (N)   Runtime
Sub-Gaussian         O(ε√log(1/ε))   O(d/δ^2)           O(Nd/ε^6)
Bounded Covariance   O(√ε)           O(d/δ^2)           O(Nd/ε^6)

We hope our work will serve as a starting point for the design of faster algorithms for high-dimensional robust estimation.
Follow-up: Robust Covariance Estimation [C Diakonikolas Ge Woodruff ’19]
Input: an ε-corrupted set of N samples drawn from N(0, Σ).
Goal: Estimate Σ.

Distribution   Error (δ)                                     # of Samples (N)   Runtime
Gaussian       ∥Σ^{-1/2} Σ̂ Σ^{-1/2} − I∥_F = O(ε log(1/ε))   O(d^2/δ^2)         O(d^{3.26} log κ / ε^8)
Gaussian       ∥Σ̂ − Σ∥_F = O(ε log(1/ε))                     O(d^2/δ^2)         O(d^{3.26} / ε^8)

All previous algorithms with a similar error guarantee run in time Ω(d^{2ω}) = Ω(d^{4.74}).
Follow-up: Robust Covariance Estimation [C Diakonikolas Ge Woodruff ’19]

Distribution   Error (δ)                                     # of Samples (N)   Runtime
Gaussian       ∥Σ^{-1/2} Σ̂ Σ^{-1/2} − I∥_F = O(ε log(1/ε))   O(d^2/δ^2)         O(d^{3.26} log κ / ε^8)
Gaussian       ∥Σ̂ − Σ∥_F = O(ε log(1/ε))                     O(d^2/δ^2)         O(d^{3.26} / ε^8)

Fast rectangular multiplication: a (d × d^2) times (d^2 × d) matrix product can be computed in time O(d^{3.26}).
Our runtime almost matches that of the best non-robust covariance estimation algorithm:
computing the empirical covariance matrix (1/N) ∑_{i=1}^N Xi Xi⊺ takes O(d^{3.26}/ε^2) time.
E[XX⊺] = Σ, so we reduce to robust mean estimation with input X ⊗ X ∈ R^{d^2}.
We use the primal-dual framework presented in this talk.
A naive implementation takes Ω(Nd^2) = Ω(d^4) time; we need to open up the positive SDP solvers.
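A hedged sketch of this reduction (my own illustration; robust_mean stands for any robust mean estimator, such as the sketch earlier in this talk; the paper's solver never materializes the d^2-dimensional points, exploiting their tensor structure via fast rectangular matrix multiplication instead):

```python
import numpy as np

def robust_covariance(X, eps, robust_mean):
    # Since E[X X^T] = Sigma, flattening each X_i X_i^T (i.e., X_i tensor X_i,
    # a vector in R^{d^2}) reduces covariance estimation to mean estimation.
    # Warning: this naive version materializes an N x d^2 array.
    N, d = X.shape
    Z = np.einsum('ni,nj->nij', X, X).reshape(N, d * d)  # rows are X_i (x) X_i
    return robust_mean(Z, eps).reshape(d, d)             # estimate of Sigma
```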
Open Problems
Faster algorithms for other high-dimensional robust learning problems (e.g., sparse mean estimation / sparse PCA)?
Can we avoid the poly(1/ε) factors in the runtime?