TRANSCRIPT
Algorithm Frameworks Based on Structure Preserving Sampling
Richard Peng
Georgia Tech
OUTLINE
• Sampling / sparsification
• Sparsifying without construction
• Algorithmic frameworks
RANDOM SAMPLING
Pick a small subset of a
collection of many objects
Goal:
• Estimate quantities
• Reduce sizes
• Speed up algorithms
Examples: point sets, gradients, graphs
SAMPLING IN ALGORITHMS
• Compute on the sample
• Need to preserve more structure
Randomized size reductions used in
• Low rank approximations
• Linear system solvers
• Combinatorial optimization
PRESERVING GRAPH STRUCTURES
Undirected graph, n vertices, m < n² edges
Are n² edges (dense) sometimes necessary?
For some information, e.g. connectivity: < n edges suffice
Preserving more: [Benczur-Karger `96]
for ANY G, can sample to get H with
O(n log n) edges s.t. G ≈ H on all cuts
HOW TO SAMPLE?
Widely used: uniform sampling
Works well when data is
uniform e.g. complete graph
Problem: long path, removing
any edge changes connectivity
(can also have both in one graph)
More systematic view of sampling?
ALGEBRAIC REPRESENTATION OF GRAPHS
n vertices, m edges
Edge-vertex incidence matrix B (m rows, n columns):
B_eu = ±1 if u is an endpoint of e (sign by orientation), 0 otherwise
(e.g. rows such as (1, −1, 0, …) and (−1, 0, 1, …))
• x: 0/1 indicator vector for a cut
• Bx: difference across edges
• ║Bx║²: size of the cut given by x
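A minimal numerical check of the claim above (the toy graph, vertex numbering, and cut are assumptions for illustration): build the edge-vertex incidence matrix B and verify that ║Bx║² counts the edges crossing a 0/1 cut.

import numpy as np

# Example graph on 4 vertices (an assumed toy instance).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n, m = 4, len(edges)

# Edge-vertex incidence matrix: one row per edge, +1/-1 at its endpoints.
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = 1.0, -1.0          # orientation is arbitrary

# 0/1 indicator of the cut {0,1} vs {2,3}.
x = np.array([1.0, 1.0, 0.0, 0.0])
print(np.linalg.norm(B @ x) ** 2)         # 3.0 = number of crossing edges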
ALGEBRAIC VIEW OF SAMPLING EDGES
L2 row sampling: given B with m >> n, sample a few rows to form B' s.t. ║Bx║₂ ≈ ║B'x║₂ ∀ x
(Note: numerical linear algebra usually writes A instead of B, and n, d instead of m, n.)
≈: multiplicative approximation
Sampled rows are rescaled, e.g. a row (0, −1, 0, 0, 0, 1, 0) becomes (0, −5, 0, 0, 0, 5, 0), so that B'ᵀB' still approximates BᵀB.
UNIFORM SAMPLING
Works well when data is uniform, e.g. complete graph
More general: importance sampling, flip a biased coin for each element
Hard cases (figure): most elements kept with probability ≈ n/m, critical ones with probability 1
PATTERN
• L2: leverage scores / effective resistances: τ_i = b_i^T (B^T B)^{-1} b_i
• Column subset: robust leverage scores
• Lp norms: Lewis weights: w_i^{2/p} = a_i^T (A^T W^{1-2/p} A)^{-1} a_i
For many objectives, there exists a set of `correct' probabilities that sum to the rank
Sampling by ANY upper bounds on them, times O(log n), gives a good approximation w.h.p.
UNDERSTANDING SPECTRAL SPARSIFICATION
[Spielman-Srivastava `08]:
• Leverage scores are the same as weight times effective resistance
• Sampling a graph with probabilities at least O(log n) times these values gives a good spectral approximation
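A hedged sketch of the sampling pattern above: compute τ_i = b_i^T (B^T B)^{-1} b_i, keep row i with probability p_i = min(1, c·log n·τ_i), and rescale kept rows by 1/√p_i. The dense pseudoinverse, the constant c, and the Gaussian test matrix are illustrative simplifications, not the fast routines from the cited papers.

import numpy as np

def leverage_score_sample(B, c=4.0, rng=np.random.default_rng(0)):
    m, n = B.shape
    M = np.linalg.pinv(B.T @ B)                    # (B^T B)^+ , dense for clarity
    tau = np.einsum('ij,jk,ik->i', B, M, B)        # tau_i = b_i^T M b_i
    p = np.minimum(1.0, c * np.log(n) * tau)       # oversample by O(log n)
    keep = rng.random(m) < p
    return B[keep] / np.sqrt(p[keep])[:, None]     # rescaled sampled rows B'

B = np.random.default_rng(1).standard_normal((2000, 20))
Bp = leverage_score_sample(B)
x = np.random.default_rng(2).standard_normal(20)
print(Bp.shape[0], np.linalg.norm(B @ x)**2 / np.linalg.norm(Bp @ x)**2)  # few rows, ratio near 1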
MORE SYSTEMATIC VIEW
[Talagrand `90], "Embedding subspaces of L1 into l^N_1",
can be interpreted as row-sampling / sparsification
[Cohen-P `15]: for any 1 ≤ p ≤ 2, input-sparsity time routines for producing A' with about O(n log n) rows s.t. ║Ax║_p ≈ ║A'x║_p ∀ x
Rest of this talk: p = 2 case
WHY SPARSIFICATION?
• ║Bx║₂ ≈ ║B'x║₂ ∀ x is the same as a small relative condition number in numerical analysis
• Easier to generate approximations than to verify them
• Probabilities are computed adaptively, but still using all of the data
However, most data are sparse to begin with!
OUTLINE
• Sampling / sparsification
• Sparsifying without construction
• Algorithmic frameworks
DENSE OBJECTS
• Matrix powers
• Matrix inverse
• Transitive closures
Cost-prohibitive to compute / store
Directly access sparse approximations?
RANDOM WALKS
A: adjacency matrix of random walk
Still a graph, can sparsify!
A^k: k-step random walk
• Formally, I − A = B^T B
• I − A is the Gram matrix of B
LONGER RANDOM WALKS
A: one step of random walk
A^3: 3 steps of the random walk
(part of) edge uv in A^3:
length-3 path in A, u-y-z-v
SAMPLING WITHOUT CONSTRUCTING
I − A^3 ≈ I − A
Rayleigh's monotonicity: can use the resistance of a path (subgraph) as an upper bound
Bound R(u, v) using the length-3 path in A, u-y-z-v:
R(u, v) ≤ w_uy^{-1} + w_yz^{-1} + w_zv^{-1}
Sampling probability: w(edge) × R(path)
≈: operator approximation, same as ║Bx║₂ ≈ ║B'x║₂ ∀ x
SPARSIFYING I − A^k
Repeat O(m log n ε^{-2}) times:
1. Pick 0 ≤ i ≤ k − 1 and an edge e = xy at random.
2. Random walk i steps from x to u, and k − 1 − i steps from y to v.
3. Add a (rescaled) edge between u and v to the sparsifier.
Resembles:
• Local clustering
• Approximate triangle counting (c = 3)
[Cheng-Cheng-Liu-P-Teng `15]: any random
walk polynomial can be sparsified in nearly-linear time.
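A sketch of the sampling loop in the procedure above, for a weighted graph given as an adjacency list (the input format and the uniform edge choice are assumptions; the exact rescaling of each sampled edge into a sparsifier weight follows [Cheng-Cheng-Liu-P-Teng `15] and is not reproduced here, only the Rayleigh resistance bound is recorded).

import numpy as np

def sample_walk_edges(adj, edges, k, num_samples, rng=np.random.default_rng(0)):
    """adj[u] = list of (nbr, weight); edges = list of (x, y, w). Assumed formats."""
    w_of = {(u, v): w for u, v, w in edges}
    w_of.update({(v, u): w for u, v, w in edges})

    def step(u):                                    # one weighted random-walk step
        nbrs, ws = zip(*adj[u])
        ws = np.asarray(ws, float)
        return nbrs[rng.choice(len(nbrs), p=ws / ws.sum())]

    samples = []
    for _ in range(num_samples):
        i = rng.integers(0, k)                      # offset 0 <= i <= k-1
        x, y, _ = edges[rng.integers(len(edges))]   # edge e = xy, chosen uniformly here
        walk = [x, y]
        for _ in range(i):                          # i steps back from x
            walk.insert(0, step(walk[0]))
        for _ in range(k - 1 - i):                  # k-1-i steps forward from y
            walk.append(step(walk[-1]))
        # Rayleigh monotonicity: the path resistance, sum of 1/w along the walk,
        # upper bounds the effective resistance between its endpoints.
        r_bound = sum(1.0 / w_of[(walk[j], walk[j + 1])] for j in range(k))
        samples.append((walk[0], walk[-1], r_bound))  # to be rescaled into sparsifier edges
    return samples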
‘LIGHT WEIGHT’ SPARSIFICATION
• Terrible approximations to the ‘right’
probabilities can still give size reductions
• Such estimates are often easy to compute
• Many routines already compute sparsifiers:
e.g. approx. triangle counting via 2-step walks
GAUSSIAN ELIMINATION
Partial state of Gaussian elimination: the Schur complement, a linear system on a subset of the variables
For graphs:
• The Schur complement is another graph
• Analog of the Y-Δ transform: remove u, then connect all pairs of neighbors v ~ u
Alternate view: equivalent circuit on the boundary vertices
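A small sketch of the graph view of one elimination step (the example graph and the eliminated vertex are assumptions): removing u replaces its star by a clique on its neighbors with weights w_uv·w_uw/deg(u), and this matches the algebraic Schur complement.

import numpy as np

def laplacian(n, wedges):
    """Graph Laplacian from a list of (u, v, weight) edges."""
    L = np.zeros((n, n))
    for u, v, w in wedges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

wedges = [(0, 1, 2.0), (0, 2, 1.0), (0, 3, 3.0), (1, 2, 1.0)]
L = laplacian(4, wedges)
u, rest = 0, [1, 2, 3]                      # eliminate vertex 0

# Algebraic Schur complement onto the remaining vertices.
schur = L[np.ix_(rest, rest)] - np.outer(L[rest, u], L[u, rest]) / L[u, u]

# Combinatorial version: clique on the neighbors of u (generalized Y-Delta).
nbrs = [(v, -L[u, v]) for v in rest if L[u, v] != 0]
clique = [(a, b, wa * wb / L[u, u])
          for i, (a, wa) in enumerate(nbrs) for (b, wb) in nbrs[i + 1:]]
L2 = laplacian(4, [e for e in wedges if u not in e[:2]] + clique)
print(np.allclose(schur, L2[np.ix_(rest, rest)]))    # True: same graph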
FILL
Degree-d vertex: O(d²) new non-zero entries
Quickly leads to dense matrices
(after about 5000 vertices)
WAYS OF REDUCING FILL
• Elimination orderings: m ~ 10^5
• Drop entries by magnitude (e.g. ichol in MATLAB): m ~ 10^6
[Kyng-Lee-P-Sachdeva-Spielman `15]: approximate Schur complements in O(m log^c n) time
HIGH QUALITY SPARSIFICATION
Methods that work in theory:
• Solve linear systems in graph Laplacians
• PageRank / local clustering
• Spanners / low-diameter clusterings
• cs.cmu.edu/~jkoutis/SpectralAlgorithms.htm
• Wall clock: a few minutes for 10^7 edges
τ_i = b_i^T (B^T B)^{-1} b_i
OUTLINE
• Sampling / sparsification
• Sparsifying without construction
• Algorithmic frameworks
ALGORITHMIC USES OF SAMPLES
Given B' s.t. ║Bx║ ≈ ║B'x║ ∀ x,
optimization problems with extra conditions on x can be solved on B'
• Relative-error approximations give preconditioners
• Can use iterative methods to reduce error
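A hedged sketch of the preconditioning claim above (Gaussian rows, a uniformly sampled stand-in for B', and the damping factor are illustrative assumptions): iterative refinement with (B'^T B')^{-1} as the preconditioner drives down the residual of B^T B x = b.

import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 30
B = rng.standard_normal((m, n))
keep = rng.random(m) < 0.5                     # stand-in for a sampled B'
Bp = B[keep] / np.sqrt(0.5)                    # rescale so B'^T B' ~ B^T B

M = B.T @ B
P = np.linalg.inv(Bp.T @ Bp)                   # preconditioner (dense for clarity)
b = rng.standard_normal(n)

x = np.zeros(n)
for _ in range(60):
    x = x + 0.5 * P @ (b - M @ x)              # damped, preconditioned Richardson step
print(np.linalg.norm(M @ x - b) / np.linalg.norm(b))   # small relative residual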
MULTISCALE ANALYSIS / MULTILEVEL METHODS
[P-Spielman `14] solving linear
systems / computing electrical flows:
(I − A)^{-1} = (I + A)(I + A^2)(I + A^4)…
Study of graph properties
via A^2, A^4, A^8, A^16…
[Cheng-Cheng-Liu-P-Teng `15]: sparsified Newton’s
method for matrix roots and Gaussian sampling
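A numerical check of the identity above, for a small dense substochastic A (the random test matrix and the truncation to finitely many factors are assumptions): (I − A)(I + A)(I + A²)…(I + A^{2^{k−1}}) = I − A^{2^k} → I.

import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.random((n, n))
A = A / (2 * A.sum(axis=1, keepdims=True))        # row sums 1/2, so ||A|| < 1

prod = np.eye(n)
Apow = A.copy()
for _ in range(8):                                # (I + A)(I + A^2)(I + A^4)...
    prod = prod @ (np.eye(n) + Apow)
    Apow = Apow @ Apow                            # A, A^2, A^4, A^8, ...
print(np.linalg.norm(prod - np.linalg.inv(np.eye(n) - A)))   # ~ 0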
MATRIX SQUARING
                    Connectivity            More general
Iteration           A_{i+1} ≈ A_i^2         I − A_{i+1} ≈ I − A_i^2
Until               ║A_d║ small             ║A_d║ small
Size reduction      Low degree              Sparse graph
Method              Derandomized            Randomized
Solution transfer   Connectivity            Solution vectors
• NC algorithm for shortest path
• Logspace connectivity: [Reingold `02]
• Deterministic squaring: [Rozenman-Vadhan `05]
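A toy dense version of the "Connectivity" column above (dense boolean matrices and the small example graph are assumptions; the frameworks above keep the iterates low degree or sparse instead): squaring the adjacency-with-self-loops relation O(log n) times yields all connected pairs.

import numpy as np

def connected_pairs(adj):
    """adj: boolean adjacency matrix; returns the reachability relation."""
    n = adj.shape[0]
    R = adj | np.eye(n, dtype=bool)                      # one step or stay put
    for _ in range(int(np.ceil(np.log2(n)))):            # O(log n) squarings
        R = R | ((R.astype(int) @ R.astype(int)) > 0)    # compose two half-length walks
    return R

A = np.zeros((6, 6), dtype=bool)
for u, v in [(0, 1), (1, 2), (3, 4)]:
    A[u, v] = A[v, u] = True
R = connected_pairs(A)
print(R[0, 2], R[0, 3])                                  # True False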
INCORPORATING THE SAMPLING KITCHEN SINK
[Kyng-Lee-P-Sachdeva-Spielman `15]:
O(m log^2 n) time sparse Cholesky / multigrid algorithm, incorporating
• Jacobi iteration on `heavy' blocks
• Expanders
CONNECTION LAPLACIANS
Motivated by AC circuits: wires
have both magnitude and phase
Blocks of the form:
U: unitary matrix, U^T U = I
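A minimal sketch of one edge block of a connection Laplacian, using 2×2 rotations as the unitaries (the dimension, weight, and angle are assumptions): for edge (u, v) with weight w and rotation U, the block's quadratic form is w·║x_u − U x_v║², which is why it is positive semidefinite.

import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def edge_block(w, U):
    """Connection Laplacian block [[w*I, -w*U], [-w*U^T, w*I]] for one edge."""
    I = np.eye(U.shape[0])
    return np.block([[w * I, -w * U], [-w * U.T, w * I]])

U, w = rotation(0.7), 2.0
Blk = edge_block(w, U)
xu, xv = np.array([1.0, 0.0]), np.array([0.3, -0.4])
x = np.concatenate([xu, xv])
print(np.isclose(x @ Blk @ x, w * np.linalg.norm(xu - U @ xv) ** 2))   # True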
APPLICATIONS OF CONNECTION LAPLACIANS
Applications:
• Cryo-electron microscopy
• Phase retrieval
• Image processing
Many combinatorial graph tools
break down due to phase shifts:
cannot turn cycles into paths
Cannot use combinatorial graph
sparsification algorithms
SELF-REDUCTION OF SAMPLING
Nystrom method (on matrices):
• Pick random subset of data
• Compute on subset
• Post-process result
[Cohen-Lee-Musco-Musco-P-Sidford `15]:
random half of the rows give good
probabilities for sampling the original
Can recursively use a
solver to sparsify itself!
τ_i' = b_i^T (B'^T B')^{-1} b_i
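A hedged sketch of the self-reduction above (Gaussian test matrix, dense pseudoinverse, and the oversampling constant are illustrative assumptions): leverage scores computed against a uniform half of the rows, τ_i' = b_i^T (B'^T B')^{-1} b_i, serve as sampling probabilities for all of B.

import numpy as np

def half_sample_scores(B, rng):
    m, n = B.shape
    half = rng.random(m) < 0.5                     # uniform half of the rows
    M = np.linalg.pinv(B[half].T @ B[half])        # (B'^T B')^+
    tau = np.einsum('ij,jk,ik->i', B, M, B)        # tau_i' for every row of B
    return np.minimum(tau, 1.0)                    # scores are clamped at 1

rng = np.random.default_rng(0)
B = rng.standard_normal((4000, 25))
tau = half_sample_scores(B, rng)
p = np.minimum(1.0, 8.0 * np.log(25) * tau)        # oversampled probabilities
keep = rng.random(4000) < p
Bp = B[keep] / np.sqrt(p[keep])[:, None]           # rescaled sparsifier rows
print(keep.sum(), "rows kept out of", 4000)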
INTUITIONS ABOUT SELF REDUCTION
Needle in a haystack (what a uniform half might miss):
• Needles are unique; their number is bounded by the dimension n
• Fix mistakes by post-processing, with only O(n) more samples
Hay in a haystack:
• Coherent data
• Half the data still
contains same info
Motivation: O(m) MST by [Klein-Karger-Tarjan `93]
QUESTIONS
• Do existing algorithms already
produce sparsifiers?
• Sparsification packages
• Notions of approximation other
than relative condition number?
• Non-linear extensions of
multilevel and sparse Cholesky?
• Sparsify + precondition more
general linear systems?