sublinear algorihms for big data

11
Sublinear Algorihms for Big Data Grigory Yaroslavtsev http://grigory.us Lectur e 4 Slides are available at http://grigory.us/big-data.html

Upload: colman

Post on 06-Jan-2016

17 views

Category:

Documents


0 download

DESCRIPTION

Slides are available at http://grigory.us/big-data.html. Sublinear Algorihms for Big Data. Lecture 4. Grigory Yaroslavtsev http://grigory.us. Today. Dimensionality reduction AMS as dimensionality reduction Johnson- Lindenstrauss transform Sublinear time algorithms - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Sublinear Algorihms  for Big Data

Sublinear Algorihms for Big Data

Grigory Yaroslavtsevhttp://grigory.us

Lecture 4

Slides are available at http://grigory.us/big-data.html

Page 2: Sublinear Algorihms  for Big Data

Today

• Dimensionality reduction– AMS as dimensionality reduction– Johnson-Lindenstrauss transform

• Sublinear time algorithms– Definitions: approximation, property testing– Basic examples: approximating diameter, testing

properties of images– Testing sortedness– Testing connectedness

Page 3: Sublinear Algorihms  for Big Data

-norm Estimation

• Stream: updates that define vector where . • Example: For

• -norm:

Page 4: Sublinear Algorihms  for Big Data

-norm Estimation

• -norm: • Two lectures ago:– -moment – -moment (via AMS sketching)

• Space: • Technique: linear sketches– for random set s – for random signs

Page 5: Sublinear Algorihms  for Big Data

AMS as dimensionality reduction

• Maintain a “linear sketch” vector

, where • Estimator for :

• “Dimensionality reduction”: “heavy” tail

Page 6: Sublinear Algorihms  for Big Data

Normal Distribution

• Normal distribution – Range: – Density: – Mean = 0, Variance = 1

• Basic facts:– If and are independent r.v. with normal

distribution then has normal distribution

– If are independent, then

Page 7: Sublinear Algorihms  for Big Data

Johnson-Lindenstrauss Transform

• Instead of let be i.i.d. random variables from normal distribution

• We still have because:– “variance of “

• Define and define:

• JL Lemma: There exists s.t. for small enough

Page 8: Sublinear Algorihms  for Big Data

Proof of JL Lemma

• JL Lemma: s.t. for small enough

• Assume . • We have and

• Alternative form of JL Lemma:

Page 9: Sublinear Algorihms  for Big Data

Proof of JL Lemma

• Alternative form of JL Lemma:

• Let and • For every we have:

• By Markov and independence of :

• We have , hence:

Page 10: Sublinear Algorihms  for Big Data

Proof of JL Lemma

• Alternative form of JL Lemma:

• For every we have:

• Let and recall that • A calculation finishes the proof:

Page 11: Sublinear Algorihms  for Big Data

Johnson-Lindenstrauss Transform

• Single vector: – Tight: [Woodruff’10]

• vectors simultaneously: – Tight: [Molinaro, Woodruff, Y. ’13]

• Distances between vectors = vectors: