sublinear algorihms for big data

Post on 06-Jan-2016

17 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Slides are available at http://grigory.us/big-data.html. Sublinear Algorihms for Big Data. Lecture 4. Grigory Yaroslavtsev http://grigory.us. Today. Dimensionality reduction AMS as dimensionality reduction Johnson- Lindenstrauss transform Sublinear time algorithms - PowerPoint PPT Presentation

TRANSCRIPT

Sublinear Algorihms for Big Data

Grigory Yaroslavtsevhttp://grigory.us

Lecture 4

Slides are available at http://grigory.us/big-data.html

Today

• Dimensionality reduction– AMS as dimensionality reduction– Johnson-Lindenstrauss transform

• Sublinear time algorithms– Definitions: approximation, property testing– Basic examples: approximating diameter, testing

properties of images– Testing sortedness– Testing connectedness

-norm Estimation

• Stream: updates that define vector where . • Example: For

• -norm:

-norm Estimation

• -norm: • Two lectures ago:– -moment – -moment (via AMS sketching)

• Space: • Technique: linear sketches– for random set s – for random signs

AMS as dimensionality reduction

• Maintain a “linear sketch” vector

, where • Estimator for :

• “Dimensionality reduction”: “heavy” tail

Normal Distribution

• Normal distribution – Range: – Density: – Mean = 0, Variance = 1

• Basic facts:– If and are independent r.v. with normal

distribution then has normal distribution

– If are independent, then

Johnson-Lindenstrauss Transform

• Instead of let be i.i.d. random variables from normal distribution

• We still have because:– “variance of “

• Define and define:

• JL Lemma: There exists s.t. for small enough

Proof of JL Lemma

• JL Lemma: s.t. for small enough

• Assume . • We have and

• Alternative form of JL Lemma:

Proof of JL Lemma

• Alternative form of JL Lemma:

• Let and • For every we have:

• By Markov and independence of :

• We have , hence:

Proof of JL Lemma

• Alternative form of JL Lemma:

• For every we have:

• Let and recall that • A calculation finishes the proof:

Johnson-Lindenstrauss Transform

• Single vector: – Tight: [Woodruff’10]

• vectors simultaneously: – Tight: [Molinaro, Woodruff, Y. ’13]

• Distances between vectors = vectors:

top related