Non-Adaptive Adaptive Sampling on Turnstile Streams
Sepideh Mahabadi (TTIC), Ilya Razenshteyn (MSR Redmond), Samson Zhou (CMU), David Woodruff (CMU)

TRANSCRIPT

Page 1:

Non-Adaptive Adaptive Sampling on Turnstile Streams

Sepideh Mahabadi (TTIC)
Ilya Razenshteyn (MSR Redmond)
Samson Zhou (CMU)
David Woodruff (CMU)

Page 2-3: Adaptive Sampling

An algorithmic paradigm for solving many data summarization tasks.

Given: $n$ vectors in $\mathbb{R}^d$
• Sample a vector with probability proportional to its norm
• Project all vectors away from the selected subspace
• Repeat on the residuals

Page 4-13: Adaptive Sampling Example (a sequence of figures illustrating the rounds: sample a vector, project the remaining vectors away from it, and repeat on the residuals)
Page 14: Data Summarization Tasks

Given:
• an $n \times d$ matrix $A \in \mathbb{R}^{n \times d}$
• a parameter $k$
Goal:
• Find a representation (of "size $k$") for the data
• Optimize a predefined function

Rows correspond to $n$ data points, e.g. feature vectors of objects in a dataset.

Page 15-18: Data Summarization Tasks, Instances

Instances: row/column subset selection, subspace approximation, projective clustering, volume sampling/maximization.

• Row subset selection (best set of representatives): find a subset $S$ of $k$ rows minimizing the squared distance of all rows to the subspace spanned by $S$, i.e., minimizing $\|A - \mathrm{Proj}_S(A)\|_F$.
• Subspace approximation (best approximation with a subspace): find a subspace $H$ of dimension $k$ minimizing the squared distance of all rows to $H$, i.e., minimizing $\|A - \mathrm{Proj}_H(A)\|_F$.
• Projective clustering (best approximation with several subspaces): find $s$ subspaces $H_1, \dots, H_s$, each of dimension $k$, minimizing $\sum_{i=1}^{n} d(A_i, H)^2$ where $H = H_1 \cup \dots \cup H_s$.
• Volume sampling/maximization (a notion for capturing diversity): find a subset $S$ of $k$ rows that maximizes the volume of the parallelepiped spanned by $S$.

Page 19: Data Summarization Tasks

Adaptive sampling is used to derive algorithms for all of these tasks.

Page 20-21: Adaptive Sampling [DeshpandeVempala06, DeshpandeVaradarajan07, DeshpandeRademacherVempalaWang06]

• Given: $n \times d$ matrix $A \in \mathbb{R}^{n \times d}$, parameter $k$
• Sample a row $A_i$ with probability $\frac{\|A_i\|_2^2}{\|A\|_F^2}$ (i.e., proportional to its squared norm)

Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j A_{i,j}^2}$

Page 22-23: Adaptive Sampling [DeshpandeVempala06, DeshpandeVaradarajan07, DeshpandeRademacherVempalaWang06]

• Given: $n \times d$ matrix $A \in \mathbb{R}^{n \times d}$, parameter $k$
• $M \leftarrow \emptyset$
• For $k$ rounds:
  • Sample a row $A_i$ with probability $\frac{\|A_i(I - M^+M)\|_2^2}{\|A(I - M^+M)\|_F^2}$
  • Append $A_i$ to $M$

$M^+$ is the Moore-Penrose pseudoinverse; multiplying by $I - M^+M$ projects away from the sampled subspace. The process seems inherently sequential.
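For reference, here is a minimal offline (non-streaming) sketch of the adaptive sampling loop just described, written in numpy. The function name and the pseudoinverse-based projection are illustrative choices for this sketch, not the authors' code.

```python
import numpy as np

def adaptive_sampling(A: np.ndarray, k: int, rng=np.random.default_rng()):
    """Offline adaptive sampling: k rounds of residual-norm sampling (a sketch)."""
    n, d = A.shape
    M = np.empty((0, d))            # rows sampled so far
    indices = []
    for _ in range(k):
        # project every row away from the span of the sampled rows
        P = np.eye(d) - np.linalg.pinv(M) @ M if M.size else np.eye(d)
        residual = A @ P
        norms2 = np.sum(residual ** 2, axis=1)
        probs = norms2 / norms2.sum()
        i = int(rng.choice(n, p=probs))   # sample w.p. proportional to squared residual norm
        indices.append(i)
        M = np.vstack([M, A[i]])
    return indices, M
```

The one-pass algorithms later in the talk simulate exactly this loop without ever seeing the rows again.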

Page 24:

Question:

Can we implement Adaptive Sampling in one pass (non-adaptively)?

Page 25-27: Streaming Algorithms

Motivation: the data is huge and cannot be stored in main memory.

Streaming algorithms: given sequential access to the data, make one or several passes over the input.
• Solve the problem on the fly
• Use sublinear storage
Parameters: space, number of passes, approximation.

Models:
• Row arrival: rows of $A$ arrive one by one
• Turnstile: we receive updates to the entries of the matrix, i.e., $(i, j, \Delta)$ means $A_{i,j} \leftarrow A_{i,j} + \Delta$

We focus on the row-arrival model for this talk.

Page 28: Streaming Algorithms

Our goal: simulate $k$ rounds of adaptive sampling in one pass over the stream. Data summarization tasks were considered in streaming models in earlier works that used adaptive sampling [e.g. DV'06, DR'10, DRVW'06].

Page 29-30: Outline of Results

1. Simulate adaptive sampling in one pass over a turnstile stream
   • $L_{p,2}$ sampling with a post-processing matrix $P$
2. Applications in turnstile streams
   • Row/column subset selection
   • Subspace approximation
   • Projective clustering
   • Volume maximization
3. Volume maximization lower bounds
4. Volume maximization in the row-arrival model

Page 31-32: Results: $L_{2,2}$ Sampling with Post-Processing

Input:
• $A \in \mathbb{R}^{n \times d}$ as a (turnstile) stream
• a post-processing matrix $P \in \mathbb{R}^{d \times d}$
Output: sample an index $i \in [n]$ with probability $\frac{\|A_i P\|_2^2}{\|AP\|_F^2}$

$P$ corresponds to the projection matrix $(I - M^+M)$.

Page 33-34: Results: $L_{2,2}$ Sampling with Post-Processing

Output: sample an index $i \in [n]$ with probability $(1 \pm \epsilon)\frac{\|A_i P\|_2^2}{\|AP\|_F^2} + \frac{1}{\mathrm{poly}(n)}$, in one pass, using $\mathrm{poly}(d, \epsilon^{-1}, \log n)$ space.

It is impossible to return the entire row (instead of an index) in sublinear space: consider a long stream of small updates followed by an arbitrarily large update.

Page 35: Results: $L_{p,2}$ Sampling with Post-Processing

Input: $A \in \mathbb{R}^{n \times d}$ as a (turnstile) stream, $p \in \{1, 2\}$, and a post-processing matrix $P \in \mathbb{R}^{d \times d}$
Output: sample an index $i \in [n]$ with probability $(1 \pm \epsilon)\frac{\|A_i P\|_2^p}{\|AP\|_{p,2}^p} + \frac{1}{\mathrm{poly}(n)}$, in one pass, using $\mathrm{poly}(d, \epsilon^{-1}, \log n)$ space.

Page 36: Outline of Results (repeated as a section marker)

Page 37-40: Results: Adaptive Sampling

Input: $A \in \mathbb{R}^{n \times d}$ as a (turnstile) stream
Output: return each set $S \subset [n]$ of $k$ indices with probability $p_S$ such that $\sum_S |p_S - q_S| \le \epsilon$
• $q_S$: the probability of selecting $S$ via adaptive sampling
• with respect to either distance or squared distance (i.e., $p \in \{1, 2\}$)
• in one pass, using $\mathrm{poly}(d, k, \epsilon^{-1}, \log n)$ space

Besides the indices $S$, a set of noisy rows $r_1, \dots, r_k$ is returned; each $r_i$ is close to the corresponding $A_i$ (with respect to the residual).

It is impossible to return the rows exactly in sublinear space: consider a long stream of small updates followed by an arbitrarily large update.

Page 41: Outline of Results (repeated as a section marker)

Page 42-44: Applications: Row Subset Selection

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k > 0$
Output: $k$ rows of $A$ forming $M$ to minimize $\|A - AM^+M\|_F$

Our result: finds $M$ such that $\Pr[\|A - AM^+M\|_F^2 \le 16\,(k+1)!\,\|A - A_k\|_F^2] \ge 2/3$
• $A_k$: best rank-$k$ approximation of $A$
• first one-pass turnstile streaming algorithm
• $\mathrm{poly}(d, k, \log n)$ space

Previous works: centralized setting [e.g. DRVW06, BMD09, GS'12] and row arrival [e.g. CMM'17, GP'14, BDMMUWZ'18].

Page 45-48: Applications: Subspace Approximation

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k > 0$
Output: a $k$-dimensional subspace $H$ minimizing $\left(\sum_{i=1}^n d(A_i, H)^p\right)^{1/p}$
• $p \in \{1, 2\}$
• $d(A_i, H) = \|A_i(I - H^+H)\|_2$

Our Result I: finds $H$ (consisting of $k$ noisy rows of $A$) such that
$\Pr\left[\left(\sum_{i=1}^n d(A_i, H)^p\right)^{1/p} \le 4\,(k+1)!\left(\sum_{i=1}^n d(A_i, A_k)^p\right)^{1/p}\right] \ge 2/3$
• $A_k$: best rank-$k$ approximation of $A$
• $\mathrm{poly}(d, k, \log n)$ space
• first relative-error result on turnstile streams that returns noisy rows of $A$; [Levin, Sevekari, Woodruff'18] give a $(1+\epsilon)$-approximation but with a larger number of rows, and those rows are not rows of $A$

Our Result II: finds $H$ (consisting of $\mathrm{poly}(k, 1/\epsilon)$ noisy rows of $A$) such that
$\Pr\left[\left(\sum_{i=1}^n d(A_i, H)^p\right)^{1/p} \le (1+\epsilon)\left(\sum_{i=1}^n d(A_i, A_k)^p\right)^{1/p}\right] \ge 2/3$
• $\mathrm{poly}(d, k, 1/\epsilon, \log n)$ space
• [Levin, Sevekari, Woodruff'18] use $\mathrm{poly}(\log(nd), k, 1/\epsilon)$ rows, and those rows are not rows of $A$

Page 49-50: Applications: Projective Clustering

Input: $A \in \mathbb{R}^{n \times d}$, target dimension $k$, and target number of subspaces $s$
Output: $s$ $k$-dimensional subspaces $H_1, \dots, H_s$ minimizing $\left(\sum_{i=1}^n d(A_i, H)^p\right)^{1/p}$
• $H = H_1 \cup \dots \cup H_s$ and $p \in \{1, 2\}$
• $d(A_i, H) = \min_{j \in [s]} \|A_i(I - H_j^+H_j)\|_2$

Our Result: finds $S$ (consisting of $\mathrm{poly}(k, s, 1/\epsilon)$ noisy rows of $A$) containing a union $T$ of $s$ $k$-dimensional subspaces such that
$\Pr\left[\left(\sum_{i=1}^n d(A_i, T)^p\right)^{1/p} \le (1+\epsilon)\left(\sum_{i=1}^n d(A_i, H)^p\right)^{1/p}\right] \ge 2/3$
• $H$: optimal solution to projective clustering
• first one-pass turnstile streaming algorithm with sublinear space
• $\mathrm{poly}(d, k, \log n, s, 1/\epsilon)$ space
• [BHI'02, HM'04, Che09, FMSW'10] are coreset-based and work in the row-arrival model; [KR'15] works in the turnstile model but uses space linear in the number of points

Page 51-54: Applications: Volume Maximization

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k$
Output: $k$ rows $r_1, \dots, r_k$ of $A$, forming $M$, with maximum volume (the volume of the parallelepiped spanned by those vectors; illustrated for $k = 2$)

Our Result (Upper Bound I): for an approximation factor $\alpha$, finds $S$ (a set of $k$ noisy rows of $A$) such that
$\Pr[\alpha^k\, k!\, \mathrm{Vol}(S) \ge \mathrm{Vol}(M)] \ge 2/3$
• first one-pass turnstile streaming algorithm
• $\tilde{O}(ndk^2/\alpha^2)$ space
• [Indyk, M, Oveis Gharan, Rezaei '19, '20]: coreset-based, $\tilde{O}(k)^{k/\epsilon}$ approximation and $\tilde{O}(n^{\epsilon}kd)$ space for row-arrival streams

Page 55: Outline of Results (repeated as a section marker)

Page 56: Volume Maximization Lower Bounds

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k$; Output: $k$ rows of $A$ with maximum volume

Our Result (Lower Bound I): for any $\alpha$, any $p$-pass algorithm that finds an $\alpha^k$-approximation with probability $\ge 63/64$ in the turnstile model requires $\Omega(n/(kp\alpha^2))$ space.
• Our upper bound matches this lower bound up to a factor of $k^3d$ in the space and $k!$ in the approximation factor.

Page 57: Volume Maximization Lower Bounds

Our Result (Lower Bound II): for a fixed constant $C$, any one-pass algorithm that finds a $C^k$-approximation with probability $\ge 63/64$ in the random-order row-arrival model requires $\Omega(n)$ space.

Page 58-59: Volume Maximization, Row Arrival

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k$; Output: $k$ rows of $A$ with maximum volume

Our Result (Upper Bound II): for an approximation parameter $C < (\log n)/k$, finds $S$ (a set of $k$ rows of $A$) such that
• the approximation factor is $\tilde{O}(Ck)^{k/2}$ with high probability
• one-pass row-arrival streaming algorithm
• $\tilde{O}(n^{O(1/C)}d)$ space
• [Indyk, M, Oveis Gharan, Rezaei '19, '20]: coreset-based, $\tilde{O}(k)^{Ck/2}$ approximation and $\tilde{O}(n^{1/C}kd)$ space for row-arrival streams

Page 60: $L_{p,2}$ Sampler (outline section marker: part 1, simulating adaptive sampling in one pass via $L_{p,2}$ sampling with a post-processing matrix $P$)

Page 61-63: $L_{2,2}$ Sampler with Post-Processing Matrix

Input: a matrix $A$ as a data stream and a post-processing matrix $P$
Output: an index $i$ of a row of $AP$, sampled with probability $\sim \frac{\|A_iP\|_2^2}{\|AP\|_F^2}$

This is an extension of the $L_2$ sampler [Andoni et al.'10] [Monemizadeh, Woodruff'10] [Jowhari et al.'11] [Jayaram, Woodruff'18]:
Input: a vector $f$ as a data stream; Output: an index $i$ of a coordinate of $f$, sampled with probability $\sim \frac{f_i^2}{\|f\|_2^2}$

What is new:
1. Generalizing from vectors to matrices
2. Handling the post-processing matrix $P$

Page 64: $L_{2,2}$ Sampler

Input: a matrix $A$ as a data stream
Output: an index $i$ of a row of $A$, sampled with probability $\sim \frac{\|A_i\|_2^2}{\|A\|_F^2}$
(Ignore $P$ for now.)

Page 65-68: $L_{2,2}$ Sampler

Step 1:
• pick $t_i \in [0,1]$ uniformly at random
• set $B_i := \frac{1}{\sqrt{t_i}} \cdot A_i$

Return the index $i$ that satisfies $\|B_i\|_2^2 \ge \|A\|_F^2$. Then
$\Pr\left[\|B_i\|_2^2 \ge \|A\|_F^2\right] = \Pr\left[\frac{\|A_i\|_2^2}{\|A\|_F^2} \ge t_i\right] = \frac{\|A_i\|_2^2}{\|A\|_F^2}$

Page 69: $L_{2,2}$ Sampler

Issues:
1. Multiple rows may pass the threshold.
2. We don't have access to the exact values of $\|B_i\|_2^2$ and $\|A\|_F^2$.
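To make the scaling trick concrete, here is an offline toy sketch of Step 1 for the row case, written in numpy; `l22_sample_once` and the `gamma2` parameter are names introduced only for this illustration. In the streaming algorithm the exact norms used below are replaced by CountSketch and AMS estimates.

```python
import numpy as np

def l22_sample_once(A: np.ndarray, gamma2: float = 1.0, rng=np.random.default_rng()):
    """One attempt at the L_{2,2} scaling trick (offline toy sketch).

    Returns a row index if exactly one scaled row clears the threshold, else None.
    Raising gamma2 (e.g. to C*log(n)/eps, as on the next slides) makes collisions rare
    at the cost of succeeding only with probability ~1/gamma2, so attempts are repeated.
    """
    t = rng.uniform(size=A.shape[0])              # t_i ~ Uniform[0, 1]
    B = A / np.sqrt(t)[:, None]                   # B_i = A_i / sqrt(t_i)
    threshold = gamma2 * np.sum(A ** 2)           # gamma^2 * ||A||_F^2
    passing = np.flatnonzero(np.sum(B ** 2, axis=1) >= threshold)
    return int(passing[0]) if passing.size == 1 else None
```

Row $i$ clears the threshold with probability $\|A_i\|_2^2 / (\gamma^2\|A\|_F^2)$, which is exactly the distribution we want, conditioned on a single row passing.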

Page 70-72: $L_{2,2}$ Sampler

Issue 1: multiple rows may pass the threshold. Fix: set the threshold higher.

Ideally, return the only index $i$ that satisfies $\|B_i\|_2^2 \ge \gamma^2\|A\|_F^2$, where $\gamma^2 := \frac{C\log n}{\epsilon}$. Then
$\Pr\left[\|B_i\|_2^2 \ge \gamma^2\|A\|_F^2\right] = \frac{1}{\gamma^2}\cdot\frac{\|A_i\|_2^2}{\|A\|_F^2}$
• $\Pr[\text{the squared norm of at least one row exceeds } \gamma^2\|A\|_F^2] = \Omega(1/\gamma^2)$
• $\Pr[\text{the squared norms of more than one row exceed } \gamma^2\|A\|_F^2] = O(1/\gamma^4)$
• Success probability: $\Omega(\epsilon/\log n)$; to succeed, repeat $\tilde{O}(1/\epsilon)$ times.

Page 73-74: $L_{2,2}$ Sampler

Issue 2: we don't have access to the exact values of $\|B_i\|_2$ and $\|A\|_F$. Fix: estimate $\|B_i\|_2$ and $\|A\|_F$:
• Estimate the norm of $A$ using AMS
• Find the heaviest row using CountSketch
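For readers less familiar with AMS, here is a minimal sketch of an AMS-style estimator of $\|A\|_F^2$ under turnstile updates. The class name is introduced for this illustration, and the dense random-sign table is a simplification: a real implementation would use 4-wise independent hash functions and medians of means rather than storing one sign per entry.

```python
import numpy as np

class AMSFrobenius:
    """Estimate ||A||_F^2 from turnstile updates (i, j, delta) (illustrative sketch)."""
    def __init__(self, n, d, reps=40, seed=0):
        rng = np.random.default_rng(seed)
        # one random sign per matrix entry and repetition (a shortcut for illustration)
        self.signs = rng.choice([-1.0, 1.0], size=(reps, n, d))
        self.z = np.zeros(reps)                       # one linear counter per repetition

    def update(self, i, j, delta):
        self.z += self.signs[:, i, j] * delta         # linear in the update

    def estimate(self):
        # E[z_r^2] = ||A||_F^2; combine repetitions robustly
        return float(np.median(self.z ** 2))
```

Because each counter is a linear function of the entries of $A$, the estimator continues to work under arbitrary positive and negative updates, which is exactly what the turnstile model requires.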

Page 75: Count Sketch: estimate $\|B_i\|_2$ for rows with large norms

Page 76-80: Count Sketch

Given a stream of items, estimate the frequency of each item (i.e., the coordinates of a vector $f$).
• number of rows $r = O(\log n)$, number of buckets per row $b = O(1/\epsilon^2)$
• hash functions $h_j: [n] \to [b]$, sign functions $\sigma_j: [n] \to \{-1, +1\}$
• Update: $C[j, h_j(i)] \mathrel{+}= \sigma_j(i)\cdot f_i$
• Estimate: $\hat{f}_i := \mathrm{median}_j\; \sigma_j(i)\, C[j, h_j(i)]$
• Estimation guarantee: $|\hat{f}_i - f_i| \le \epsilon\,\|f\|_2$

Page 81: Count Sketch: estimate $\|B_i\|_2$ for rows with large norms

Applied to the rows $B_i$ (each bucket stores a $d$-dimensional vector):
• Estimation guarantee: $\big|\,\|B_i\|_2 - \|\hat{B}_i\|_2\,\big| \le \epsilon\,\|B\|_F$
• Space usage: $O\!\left(\log n \times \frac{1}{\epsilon^2} \times d\right)$
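A minimal CountSketch for vector frequencies, matching the update and median-estimate rules on the slides. The hash and sign functions are stored as plain pseudorandom tables here, which is an illustrative shortcut; a space-efficient implementation would use pairwise-independent hash families, and the row version used in the talk would make each bucket a $d$-dimensional vector so that whole rows $B_i$ can be hashed.

```python
import numpy as np

class CountSketch:
    """Frequency estimation over a turnstile stream of (item, delta) updates (sketch)."""
    def __init__(self, n, rows=16, buckets=256, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, buckets, size=(rows, n))   # bucket of item i in each row
        self.sigma = rng.choice([-1, 1], size=(rows, n))    # sign of item i in each row
        self.C = np.zeros((rows, buckets))

    def update(self, i, delta):
        # C[j, h_j(i)] += sigma_j(i) * delta, for every row j at once
        rows = np.arange(self.C.shape[0])
        self.C[rows, self.h[:, i]] += self.sigma[:, i] * delta

    def estimate(self, i):
        # f_hat_i = median_j sigma_j(i) * C[j, h_j(i)]
        rows = np.arange(self.C.shape[0])
        return float(np.median(self.sigma[:, i] * self.C[rows, self.h[:, i]]))
```

Usage: feed every stream update through `update(i, delta)` and query `estimate(i)` for any item; only the items whose weight is a noticeable fraction of $\|f\|_2$ (the "heavy" rows in our setting) are recovered accurately.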

Page 82-84: $L_{2,2}$ Sampler

Step 1:
• pick $t_i \in [0,1]$ uniformly at random
• set $B_i := \frac{1}{\sqrt{t_i}} \cdot A_i$
Step 2:
• $\|\hat{B}_i\|_2$ is an estimate of $\|B_i\|_2$ by a modified CountSketch
• $\hat{F}$ is an estimate of $\|A\|_F$ by a modified AMS

Goal: $\|B_i\|_2 \ge \gamma\,\|A\|_F$. Test: $\|\hat{B}_i\|_2 \ge \gamma\,\hat{F}$.

The test succeeds with probability $\approx \epsilon$, namely when the estimate of the largest row exceeds the threshold.

Page 85-87: Handling the Post-Processing Matrix

Input: a matrix $A$ as a data stream, a post-processing matrix $P$
Output: an index $i$ of a row of $AP$, sampled with probability $\sim \frac{\|A_iP\|_2^2}{\|AP\|_F^2}$

Run the proposed algorithm on $A$, then multiply by $P$:
• CountSketch and AMS are both linear transformations
• $A$ is mapped to $SA$, and $S(AP) = (SA)P$

Total space for the sampler: $O\!\left(\frac{d}{\epsilon^2}\log^2 n\right)$ bits.
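The key observation that $S(AP) = (SA)P$ holds for any linear sketch can be checked directly; here is a tiny numpy check, with dimensions chosen arbitrarily for illustration (a dense Gaussian $S$ stands in for CountSketch/AMS, which are also linear maps).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 8, 20
A = rng.standard_normal((n, d))     # the streamed matrix
P = rng.standard_normal((d, d))     # post-processing matrix, known only at query time
S = rng.standard_normal((m, n))     # any linear sketching matrix

# Sketch A during the stream, multiply by P afterwards: identical to sketching AP directly.
assert np.allclose((S @ A) @ P, S @ (A @ P))
```

This is what lets the algorithm commit to the sketch before the projection matrix $P = I - M^+M$ is even known.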

Page 88: $L_{2,2}$ Sampling with Post-Processing (summary)

Input: $A \in \mathbb{R}^{n \times d}$ as a (turnstile) stream and a post-processing matrix $P \in \mathbb{R}^{d \times d}$
Output: samples an index $i \in [n]$ with probability $(1 \pm \epsilon)\frac{\|A_iP\|_2^2}{\|AP\|_F^2} + \frac{1}{\mathrm{poly}(n)}$, in one pass, using $\mathrm{poly}(d, \epsilon^{-1}, \log n)$ space.

Page 89: Adaptive Sampler (outline section marker: part 1, simulating adaptive sampling in one pass)

Page 90-92: Algorithm Using the $L_{2,2}$ Sampler

Maintain $k$ instances of the $L_{2,2}$ sampler with post-processing: $S_1, \dots, S_k$.
$M \leftarrow \emptyset$
For round $i = 1$ to $k$:
• Set $P \leftarrow I - M^+M$
• Use $S_i$ to sample a noisy row $r_j$ of $A$ with post-processing matrix $P$
• Append $r_j$ to $M$

Issues:
✗ Noisy perturbation of rows (unavoidable): when index $j$ is sampled, we get $r_j = A_jP + v$, where $v$ has small norm, $\|v\| < \epsilon\,\|A_jP\|$, so $r_j \approx A_jP$.
✗ This can drastically change the probabilities: it may zero out the probabilities of some rows.
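A schematic sketch of how the $k$ sampler instances are consumed, one per round. The `samplers[i].sample(P)` interface is hypothetical shorthand for the streaming $L_{2,2}$ sampler with post-processing described above; it is not the authors' code.

```python
import numpy as np

def nonadaptive_adaptive_sampling(samplers, d, k):
    """Consume k pre-built L_{2,2}-with-post-processing sampler instances (schematic sketch).

    Each element of `samplers` is assumed to expose sample(P) -> (index, noisy_row);
    this is a hypothetical interface standing in for the streaming sampler.
    """
    M = np.empty((0, d))
    indices, noisy_rows = [], []
    for i in range(k):
        # project away from the span of the rows chosen so far
        P = np.eye(d) - np.linalg.pinv(M) @ M if M.size else np.eye(d)
        j, r_j = samplers[i].sample(P)      # one independent sampler instance per round
        indices.append(j)
        noisy_rows.append(r_j)
        M = np.vstack([M, r_j])             # append the *noisy* row: the source of the additive error
    return indices, noisy_rows
```

Using a fresh sampler per round keeps the rounds independent, while the noise in the appended rows is what the bad example and the lemma below are about.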

Page 93-98: Bad Example

• Two rows: $A_1 = (M, 0)$ for a huge value $M$, and $A_2 = (0, 1)$.
• The first round returns a noisy row $r_1 \approx A_1$.
• Noisy row sampling (projecting away from $r_1$): $\|A_1(I - r_1^+r_1)\| \ge \|A_2(I - r_1^+r_1)\|$, so the same row can be sampled again and again.
• True row sampling (projecting away from $A_1$ itself): $\|A_1(I - A_1^+A_1)\| = 0$.

Page 99: Bad Example

✗ We cannot hope for a multiplicative bound on the probabilities.

Page 100-103: Bad Example

✗ We cannot hope for a multiplicative bound on the probabilities.

Lemma: not only is the norm of $v$ small compared to $\|A_jP\|$, but its norm projected away from $A_jP$ is also small:
• $r_j = A_jP + v$
• where $\|vQ\| \le \epsilon\,\|A_jP\| \cdot \frac{\|APQ\|_F}{\|AP\|_F}$ for any projection matrix $Q$

This lets us bound the additive error of the sampling probabilities in subsequent rounds.

Page 104-107: Overview of How to Bound the Error

Suppose the indices reported by our algorithm are $j_1, \dots, j_k$. Consider two bases $U$ and $W$:
• $U$ follows the true rows: $U = \{u_1, \dots, u_d\}$ such that $\{u_1, \dots, u_i\}$ spans $\{A_{j_1}, \dots, A_{j_i}\}$
• $W$ follows the noisy rows: $W = \{w_1, \dots, w_d\}$ such that $\{w_1, \dots, w_i\}$ spans $\{r_{j_1}, \dots, r_{j_i}\}$

For a row $A_x$, write $A_x = \sum_{i=1}^d \lambda_{x,i} u_i = \sum_{i=1}^d \xi_{x,i} w_i$.

Sampling probabilities in terms of $U$ and $W$ in the $t$-th round:
• the correct probability: $\dfrac{\sum_{i=t}^d \lambda_{x,i}^2}{\sum_{y=1}^n \sum_{i=t}^d \lambda_{y,i}^2}$
• what we sample from: $\dfrac{\sum_{i=t}^d \xi_{x,i}^2}{\sum_{y=1}^n \sum_{i=t}^d \xi_{y,i}^2}$

The difference between the correct probability and our algorithm's sampling probability, summed over all rows, is at most $\epsilon$ for one round:
• the change-of-basis matrix is close to the identity matrix
• this bounds the total variation distance of a single round by $\epsilon$
The error in each round gets propagated up to $k$ times, so the total error is $O(k^2\epsilon)$.
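One hedged way to see the shape of the $O(k^2\epsilon)$ bound is a hybrid argument over the rounds, taking the per-round bound from the slide as given; the exact accounting in the paper may differ, this is only a sketch:

\[
d_{TV}(p, q) \;\le\; \sum_{t=1}^{k} \mathbb{E}\Bigl[\, d_{TV}\bigl(p_t(\cdot \mid j_{<t}),\, q_t(\cdot \mid j_{<t})\bigr) \Bigr] \;\le\; \sum_{t=1}^{k} O(t\,\epsilon) \;=\; O(k^2\epsilon),
\]

where $p_t$ and $q_t$ are the $t$-th round conditional distributions of the algorithm and of true adaptive sampling, and the $O(t\epsilon)$ term accounts for the error of round $t$ together with the accumulated perturbation from the $t-1$ earlier noisy rows.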

Page 108: Theorem

Our algorithm reports a set of $k$ indices such that, with high probability:
• the total variation distance between the probability distribution output by the algorithm and the probability distribution of adaptive sampling is at most $O(\epsilon)$
• the algorithm uses $\mathrm{poly}(k, \frac{1}{\epsilon}, d, \log n)$ space.

Page 109: Applications (outline section marker: part 2, applications in turnstile streams)

Page 110: Applications

Main challenge: showing that it suffices to have only a noisy perturbation of the rows.

Page 111: Applications: Row Subset Selection

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k > 0$
Output: $k$ rows of $A$ forming $M$ to minimize $\|A - AM^+M\|_F$

Page 112-115: Applications: Row Subset Selection

Adaptive sampling provides a $(k+1)!$ approximation for subset selection:
• [DRVW'06]: volume sampling provides a $(k+1)$-factor approximation to row subset selection with constant probability.
• [DV'06]: the sampling probability of any $k$-set $S$ under adaptive sampling is at most $k!$ times its sampling probability under volume sampling.

Non-adaptive adaptive sampling provides a good approximation to adaptive sampling:
1. For a set of indices $J$ output by our algorithm, $\|A(I - R^+R)\|_F \le (1+\epsilon)\,\|A(I - M^+M)\|_F$ w.h.p.
   • $R$: the set of noisy rows corresponding to $J$
   • $M$: the set of true rows corresponding to $J$
2. For most $k$-sets $J$, their probability under adaptive sampling is within an $O(1)$ factor of their probability under non-adaptive sampling.

Page 116: Applications: Row Subset Selection

Our result: finds $M$ such that $\Pr[\|A - AM^+M\|_F^2 \le 16\,(k+1)!\,\|A - A_k\|_F^2] \ge 2/3$
• $A_k$: best rank-$k$ approximation of $A$
• first one-pass turnstile streaming algorithm
• $\mathrm{poly}(d, k, \log n)$ space

Page 117-118: Applications: Volume Maximization

Input: $A \in \mathbb{R}^{n \times d}$ and an integer $k$
Output: $k$ rows $r_1, \dots, r_k$ of $A$, forming $M$, with maximum volume (the volume of the parallelepiped spanned by those vectors; illustrated for $k = 2$)

Page 119-123: Applications: Volume Maximization

[Civril, Magdon'09]: the greedy algorithm provides a $k!$ approximation to volume maximization.

Greedy: for $k$ rounds, pick the vector that is farthest away from the current subspace (illustrated for $k = 2$).
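An offline sketch of this greedy farthest-row selection in numpy; the pseudoinverse-based projection mirrors the earlier adaptive-sampling sketch and is an illustrative choice, not the authors' implementation.

```python
import numpy as np

def greedy_volume_maximization(A: np.ndarray, k: int):
    """Greedy k-round farthest-row selection (offline sketch)."""
    n, d = A.shape
    chosen, M = [], np.empty((0, d))
    for _ in range(k):
        # distance of every row from the span of the rows chosen so far
        P = np.eye(d) - np.linalg.pinv(M) @ M if M.size else np.eye(d)
        dist = np.linalg.norm(A @ P, axis=1)
        i = int(np.argmax(dist))             # pick the farthest row
        chosen.append(i)
        M = np.vstack([M, A[i]])
    return chosen
```

The streaming version on the next slides replaces the exact argmax and exact norms with CountSketch, AMS, and the $L_{2,2}$ sampler.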

Page 124-127: Applications: Volume Maximization

[Civril, Magdon'09]: the greedy algorithm provides a $k!$ approximation to volume maximization.

Simulate greedy:
• Maintain $k$ instances of CountSketch, AMS, and the $L_{2,2}$ sampler.
• For $k$ rounds:
  • Let $r$ be the row of $AP$ with the largest norm (found by CountSketch).
  • If $\|r\|^2 < \frac{\alpha^2}{4nk}\,\|AP\|_F^2$, instead sample a row $r$ according to the norms of the rows.
  • Add $r$ to the solution, and update the post-processing matrix $P$.

If the largest row exceeds the threshold, then it is correctly found by CountSketch w.h.p. Otherwise, there are enough large rows and the sampler chooses one of them w.h.p.
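A schematic sketch of this simulated-greedy loop. The `heaviest_row`, `frob2`, and `l22_sample` methods are hypothetical interfaces standing in for the CountSketch, AMS, and $L_{2,2}$-sampler subroutines; only the threshold test matches the slide.

```python
import numpy as np

def simulated_greedy(sketches, d, k, alpha, n):
    """Turnstile simulation of greedy volume maximization (schematic sketch).

    `sketches` is assumed to hold k independent instances, each exposing
    heaviest_row(P), frob2(P), and l22_sample(P); these are hypothetical interfaces.
    """
    solution, M = [], np.empty((0, d))
    for i in range(k):
        P = np.eye(d) - np.linalg.pinv(M) @ M if M.size else np.eye(d)
        r = sketches[i].heaviest_row(P)                        # candidate from CountSketch
        if np.sum(r ** 2) < (alpha ** 2 / (4 * n * k)) * sketches[i].frob2(P):
            _, r = sketches[i].l22_sample(P)                   # fall back to norm-proportional sampling
        solution.append(r)
        M = np.vstack([M, r])                                  # update the projection for the next round
    return solution
```

The two branches correspond exactly to the two cases on the slide: a single dominating row is recovered directly, while a flat residual spectrum is handled by norm-proportional sampling.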

Page 128: Applications: Volume Maximization

Our result: for an approximation factor $\alpha$, finds $S$ (a set of $k$ noisy rows of $A$) such that $\Pr[\alpha^k\, k!\, \mathrm{Vol}(S) \ge \mathrm{Vol}(M)] \ge 2/3$
• first one-pass turnstile streaming algorithm
• $\tilde{O}(ndk^2/\alpha^2)$ space

Page 129-131: Summary of Results and Open Problems

• $L_{p,2}$ Sampler (turnstile): $(1+\epsilon)$ relative error $+\,1/\mathrm{poly}(n)$; space $\mathrm{poly}(d, \epsilon^{-1}, \log n)$
• Adaptive Sampling (turnstile): $O(\epsilon)$ total variation distance; space $\mathrm{poly}(d, k, \epsilon^{-1}, \log n)$
• Row Subset Selection (turnstile): $O((k+1)!)$ approximation; space $\mathrm{poly}(d, k, \log n)$
• Subspace Approximation (turnstile): $O((k+1)!)$ approximation with space $\mathrm{poly}(d, k, \log n)$, or $(1+\epsilon)$ approximation with space $\mathrm{poly}(d, k, \log n, 1/\epsilon)$ and $\mathrm{poly}(k, 1/\epsilon)$ rows
• Projective Clustering (turnstile): $(1+\epsilon)$ approximation; space $\mathrm{poly}(d, k, \log n, s, 1/\epsilon)$; $\mathrm{poly}(k, s, 1/\epsilon)$ rows
• Volume Maximization (turnstile): $\alpha^k\, k!$ approximation with space $\tilde{O}(ndk^2/\alpha^2)$; lower bound of $\Omega(n/(kp\alpha^2))$ space for $\alpha^k$ approximation in $p$ passes
• Volume Maximization (row arrival): lower bound of $\Omega(n)$ space for $C^k$ approximation with random-order arrivals; upper bound of $\tilde{O}(Ck)^{k/2}$ approximation with space $\tilde{O}(n^{O(1/C)}d)$ for $C < (\log n)/k$

Open problems:
• Get tight dependence on the parameters.
• Find further applications of non-adaptive adaptive sampling.
• The result on volume maximization in the row-arrival model is not tight; can we get an $O(k)^k$ approximation without dependence on $n$?

Thank You!