iterative methods with special structures

Iterative methods with special structures David F. Gleich Purdue University David Gleich · Purdue 1

Upload: david-gleich

Post on 15-Jan-2015


DESCRIPTION

In a talk at the Institute for Physics and Computational Mathematics in Beijing, I discuss a few different types of structure in iterative methods.

TRANSCRIPT

Page 1: Iterative methods with special structures

Iterative methods with special structures

David F. Gleich, Purdue University

David Gleich · Purdue 1

Page 2: Iterative methods with special structures

Two projects.

1.  Circulant structure in tensors and a linear system solver.

2.  Localized structure in the matrix exponential and relaxation methods.


Page 3: Iterative methods with special structures

Circulant algebra

David F. Gleich (Purdue) LA/Opt Seminar 2 / 29

Introduction

Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalize matrix algebra to three-way data and provided an SVD.

The essence of this approach amounts to viewing three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors).

Braman (2010) developed spectral and other decompositions.

We have extended this algebra with the ingredients required for iterative methods such as the power method and Arnoldi method, and have characterized the behavior of these algorithms.

With Chen Greif and James Varah at UBC.


Page 4: Iterative methods with special structures

Three-way arrays

Given an $m \times n \times k$ table of data, we view this data as an $m \times n$ matrix where each "scalar" is a vector of length $k$.

$$A \in \mathbb{K}_k^{m \times n}$$

We denote the space of length-$k$ scalars as $\mathbb{K}_k$. These scalars interact like circulant matrices.

Page 5: Iterative methods with special structures

Circulants

Circulant matrices are a commutative class that is closed under the standard matrix operations.

$$\begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}$$

We'll see more of their properties shortly!

The circ operation

$$\alpha = \{\alpha_1\ \ldots\ \alpha_k\} \in \mathbb{K}_k.$$

$$\alpha \leftrightarrow \mathrm{circ}(\alpha) \equiv \begin{bmatrix} \alpha_1 & \alpha_k & \cdots & \alpha_2 \\ \alpha_2 & \alpha_1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \alpha_k \\ \alpha_k & \cdots & \alpha_2 & \alpha_1 \end{bmatrix}.$$

$$\alpha + \beta \leftrightarrow \mathrm{circ}(\alpha) + \mathrm{circ}(\beta) \quad\text{and}\quad \alpha \circ \beta \leftrightarrow \mathrm{circ}(\alpha)\,\mathrm{circ}(\beta);$$

$$0 = \{0\ 0\ \ldots\ 0\}, \qquad 1 = \{1\ 0\ \ldots\ 0\}.$$

$\mathbb{K}_k$ is the ring of length-$k$ circulants.
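As a small illustration of the correspondence above, the following sketch builds circ(α) from a length-k list and checks that multiplying two circulant matrices matches a circular convolution of the underlying scalars. The helper names `circ`, `kprod`, and `matmul` are made up for this example and are not taken from the authors' code.

```python
def circ(alpha):
    """k-by-k circulant matrix whose first column is alpha."""
    k = len(alpha)
    return [[alpha[(i - j) % k] for j in range(k)] for i in range(k)]

def kprod(alpha, beta):
    """Product of two scalars in K_k, computed as a circular convolution."""
    k = len(alpha)
    return [sum(alpha[m] * beta[(i - m) % k] for m in range(k)) for i in range(k)]

def matmul(X, Y):
    """Plain dense matrix product on lists of lists."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

alpha = [2, 3, 1]
beta = [3, 1, 1]
one = [1, 0, 0]  # multiplicative identity of K_3

product_matches = matmul(circ(alpha), circ(beta)) == circ(kprod(alpha, beta))
commutes = kprod(alpha, beta) == kprod(beta, alpha)
```

Integer inputs keep every comparison exact, which is why plain equality is enough here.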

Page 6: Iterative methods with special structures

Scalars to matrix-vector products

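Operations (cont.)

More operations are simplified in the Fourier space too. Let $\mathrm{cft}(\alpha) = \mathrm{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]$. Because the $\hat\alpha_j$ values are the eigenvalues of $\mathrm{circ}(\alpha)$, we have:

$$\mathrm{abs}(\alpha) = \mathrm{icft}(\mathrm{diag}[|\hat\alpha_1|, \ldots, |\hat\alpha_k|]),$$

$$\overline{\alpha} = \mathrm{icft}(\mathrm{diag}[\overline{\hat\alpha_1}, \ldots, \overline{\hat\alpha_k}]) = \mathrm{icft}(\mathrm{cft}(\alpha)^*), \text{ and}$$

$$\mathrm{angle}(\alpha) = \mathrm{icft}(\mathrm{diag}[\hat\alpha_1/|\hat\alpha_1|, \ldots, \hat\alpha_k/|\hat\alpha_k|]).$$

Proofs are simple, e.g. $\overline{\mathrm{angle}(\alpha)} \circ \mathrm{angle}(\alpha) = 1$.

Live Matlab demo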

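cft and icft

We define the "Circulant Fourier Transform" or cft

$$\mathrm{cft} : \alpha \in \mathbb{K}_k \mapsto \mathbb{C}^{k \times k}$$

and its inverse

$$\mathrm{icft} : \mathbb{C}^{k \times k} \mapsto \mathbb{K}_k$$

as follows:

$$\mathrm{cft}(\alpha) \equiv \begin{bmatrix} \hat\alpha_1 & & \\ & \ddots & \\ & & \hat\alpha_k \end{bmatrix} = F^* \mathrm{circ}(\alpha) F,$$

$$\mathrm{icft}\left(\begin{bmatrix} \hat\alpha_1 & & \\ & \ddots & \\ & & \hat\alpha_k \end{bmatrix}\right) \equiv \alpha \leftrightarrow F\, \mathrm{cft}(\alpha)\, F^*,$$

where $\hat\alpha_j$ are the eigenvalues of $\mathrm{circ}(\alpha)$ as produced in the Fourier transform. These transformations satisfy $\mathrm{icft}(\mathrm{cft}(\alpha)) = \alpha$ and provide a convenient way of moving between operations in $\mathbb{K}_k$ and the more familiar environment of diagonal matrices in $\mathbb{C}^{k \times k}$.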
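A minimal sketch of these two maps, assuming the eigenvalues are ordered as the discrete Fourier transform of the scalar with the sign convention that sends {2 3 1} to [6, -ι√3, ι√3] (the ordering used in the examples of this talk). Only the diagonal is stored, and the function names are illustrative.

```python
import cmath

def cft(alpha):
    """Diagonal of cft(alpha): eigenvalues of circ(alpha), computed as a DFT."""
    k = len(alpha)
    return [sum(alpha[m] * cmath.exp(-2j * cmath.pi * j * m / k) for m in range(k))
            for j in range(k)]

def icft(diag):
    """Inverse map: recover the length-k scalar from the diagonal entries."""
    k = len(diag)
    return [sum(diag[j] * cmath.exp(2j * cmath.pi * j * m / k) for j in range(k)) / k
            for m in range(k)]

def kabs(alpha):
    """abs(alpha): take absolute values of the eigenvalues."""
    return icft([abs(a) for a in cft(alpha)])

def kangle(alpha):
    """angle(alpha): normalize each eigenvalue (assumes none of them is zero)."""
    return icft([a / abs(a) for a in cft(alpha)])

alpha = [2, 3, 1]
alpha_hat = cft(alpha)
round_trip = icft(alpha_hat)
# In Fourier space the K_k product is elementwise, so angle(alpha) o abs(alpha) = alpha.
recombined = icft([p * q for p, q in zip(cft(kangle(alpha)), cft(kabs(alpha)))])
```

Because every operation becomes elementwise on the diagonal, identities in the algebra reduce to identities on ordinary complex numbers.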

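The circ operation on matrices

$$A \circ x = \begin{bmatrix} \sum_{j=1}^{n} A_{1,j} \circ x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j} \circ x_j \end{bmatrix} \leftrightarrow \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix} \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$

Define

$$\mathrm{circ}(A) \equiv \begin{bmatrix} \mathrm{circ}(A_{1,1}) & \cdots & \mathrm{circ}(A_{1,n}) \\ \vdots & \ddots & \vdots \\ \mathrm{circ}(A_{m,1}) & \cdots & \mathrm{circ}(A_{m,n}) \end{bmatrix}, \qquad \mathrm{circ}(x) \equiv \begin{bmatrix} \mathrm{circ}(x_1) \\ \vdots \\ \mathrm{circ}(x_n) \end{bmatrix}.$$

$A \circ x \leftrightarrow \mathrm{circ}(A)\,\mathrm{circ}(x)$: matrix-vector products.

$x \circ \alpha \leftrightarrow \mathrm{circ}(x)\,\mathrm{circ}(\alpha)$: vector-scalar products.

This is equivalent to Kilmer, Martin, Perrone (2008).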
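The matrix-vector correspondence can be checked directly on a small case. The sketch below uses hypothetical helpers (`kmatvec`, `block_circ`) to compute A ∘ x entry by entry and compare it with the ordinary product circ(A) circ(x); the matrix is the 2 × 2 example with length-3 scalars that appears later in the talk, and the vector x is chosen only for illustration.

```python
def circ(alpha):
    """k-by-k circulant matrix whose first column is alpha."""
    k = len(alpha)
    return [[alpha[(i - j) % k] for j in range(k)] for i in range(k)]

def kprod(alpha, beta):
    """Product in K_k as a circular convolution."""
    k = len(alpha)
    return [sum(alpha[m] * beta[(i - m) % k] for m in range(k)) for i in range(k)]

def kmatvec(A, x):
    """A o x: each output entry is a sum of K_k products along a row of A."""
    k = len(x[0])
    y = []
    for row in A:
        acc = [0] * k
        for a_ij, x_j in zip(row, x):
            acc = [s + p for s, p in zip(acc, kprod(a_ij, x_j))]
        y.append(acc)
    return y

def block_circ(A):
    """Replace every K_k entry by its circulant block, giving an ordinary matrix."""
    k = len(A[0][0])
    rows = []
    for block_row in A:
        blocks = [circ(a) for a in block_row]
        for r in range(k):
            rows.append([entry for blk in blocks for entry in blk[r]])
    return rows

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[[2, 3, 1], [8, -2, 0]],
     [[-2, 0, 2], [3, 1, 1]]]
x = [[1, 0, 0], [0, 1, 0]]          # illustrative vector of two K_3 scalars
y = kmatvec(A, x)
lhs = matmul(block_circ(A), block_circ([[x_j] for x_j in x]))
rhs = block_circ([[y_i] for y_i in y])
```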

Page 7: Iterative methods with special structures

The special structure

This circulant structure is our special structure for this first problem. We look at two types of iterative methods:

1.  the power method and
2.  the Arnoldi method.

Page 8: Iterative methods with special structures

A perplexing result!

Example

Run the power method on

$$A = \begin{bmatrix} \{2\ 3\ 1\} & \{0\ 0\ 0\} \\ \{0\ 0\ 0\} & \{3\ 1\ 1\} \end{bmatrix}$$

Result: $\lambda = (1/3)\,\{10\ 4\ 4\}$

Page 9: Iterative methods with special structures

Some understanding through decoupling

Page 10: Iterative methods with special structures

Some understanding through decoupling

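Example

Let $A = \begin{bmatrix} \{2\ 3\ 1\} & \{8\ {-2}\ 0\} \\ \{-2\ 0\ 2\} & \{3\ 1\ 1\} \end{bmatrix}$. The results of the circ and cft operations are:

$$\mathrm{circ}(A) = \begin{bmatrix} 2 & 1 & 3 & 8 & 0 & -2 \\ 3 & 2 & 1 & -2 & 8 & 0 \\ 1 & 3 & 2 & 0 & -2 & 8 \\ -2 & 2 & 0 & 3 & 1 & 1 \\ 0 & -2 & 2 & 1 & 3 & 1 \\ 2 & 0 & -2 & 1 & 1 & 3 \end{bmatrix},$$

$$(I \otimes F^*)\,\mathrm{circ}(A)\,(I \otimes F) = \begin{bmatrix} 6 & & & 6 & & \\ & -\sqrt{3}\iota & & & 9+\sqrt{3}\iota & \\ & & \sqrt{3}\iota & & & 9-\sqrt{3}\iota \\ 0 & & & 5 & & \\ & -3+\sqrt{3}\iota & & & 2 & \\ & & -3-\sqrt{3}\iota & & & 2 \end{bmatrix},$$

$$\mathrm{cft}(A) = \begin{bmatrix} 6 & 6 & & & & \\ 0 & 5 & & & & \\ & & -\sqrt{3}\iota & 9+\sqrt{3}\iota & & \\ & & -3+\sqrt{3}\iota & 2 & & \\ & & & & \sqrt{3}\iota & 9-\sqrt{3}\iota \\ & & & & -3-\sqrt{3}\iota & 2 \end{bmatrix}.$$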

Page 11: Iterative methods with special structures

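Some understanding through decoupling

Example

$$A = \begin{bmatrix} \{2\ 3\ 1\} & \{0\ 0\ 0\} \\ \{0\ 0\ 0\} & \{3\ 1\ 1\} \end{bmatrix}$$

$$A_1 = \begin{bmatrix} 6 & 0 \\ 0 & 5 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -\iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}, \quad A_3 = \begin{bmatrix} \iota\sqrt{3} & 0 \\ 0 & 2 \end{bmatrix}.$$

The eigenvalues are

$$\lambda_1 = \mathrm{icft}(\mathrm{diag}[6\ \ 2\ \ 2]) = (1/3)\,\{10\ 4\ 4\}$$

$$\lambda_2 = \mathrm{icft}(\mathrm{diag}[5\ \ {-\iota\sqrt{3}}\ \ \iota\sqrt{3}]) = (1/3)\,\{5\ 8\ 2\}$$

$$\lambda_3 = \mathrm{icft}(\mathrm{diag}[6\ \ {-\iota\sqrt{3}}\ \ \iota\sqrt{3}]) = \{2\ 3\ 1\}$$

$$\lambda_4 = \mathrm{icft}(\mathrm{diag}[5\ \ 2\ \ 2]) = \{3\ 1\ 1\}.$$

The corresponding eigenvectors are

$$x_1 = \begin{bmatrix} \{1/3\ \ 1/3\ \ 1/3\} \\ \{2/3\ \ {-1/3}\ \ {-1/3}\} \end{bmatrix}; \quad x_2 = \begin{bmatrix} \{2/3\ \ {-1/3}\ \ {-1/3}\} \\ \{1/3\ \ 1/3\ \ 1/3\} \end{bmatrix};$$

$$x_3 = \begin{bmatrix} \{1\ 0\ 0\} \\ \{0\ 0\ 0\} \end{bmatrix}; \quad x_4 = \begin{bmatrix} \{0\ 0\ 0\} \\ \{1\ 0\ 0\} \end{bmatrix}.$$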
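The decoupling in this example can be reproduced with a few lines. Because A is diagonal here, each Fourier block is itself diagonal, so its dominant eigenvalue is simply the diagonal entry of larger magnitude; the sketch relies on that shortcut instead of running a power iteration, and the DFT sign convention is an assumption chosen to match the blocks shown above.

```python
import cmath

def cft(alpha):
    """Eigenvalues of circ(alpha), computed as a DFT."""
    k = len(alpha)
    return [sum(alpha[m] * cmath.exp(-2j * cmath.pi * j * m / k) for m in range(k))
            for j in range(k)]

def icft(diag):
    """Recover the length-k scalar from its Fourier-space eigenvalues."""
    k = len(diag)
    return [sum(diag[j] * cmath.exp(2j * cmath.pi * j * m / k) for j in range(k)) / k
            for m in range(k)]

diagonal_entries = [[2, 3, 1], [3, 1, 1]]        # the two diagonal scalars of A
hats = [cft(a) for a in diagonal_entries]        # hats[i][j]: entry i of Fourier block j

# A power iteration run independently on block j converges to the entry of largest magnitude.
dominant = [max((h[j] for h in hats), key=abs) for j in range(3)]
lam1 = icft(dominant)                            # approximately (1/3){10 4 4}

# The remaining entry of each block gives the second eigenvalue of the canonical set.
other = [min((h[j] for h in hats), key=abs) for j in range(3)]
lam2 = icft(other)                               # approximately (1/3){5 8 2}
```

The "perplexing" part is visible in `dominant`: it mixes one eigenvalue from the first diagonal entry with two from the second, so the result is neither {2 3 1} nor {3 1 1}.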

Page 12: Iterative methods with special structures

Convergence of the power method is in terms of the individual blocks

The power method converges

Let $A \in \mathbb{K}_k^{n \times n}$ have a canonical set of eigenvalues $\lambda_1, \ldots, \lambda_n$ where $|\lambda_1| > |\lambda_2|$. Then the power method in the circulant algebra converges to an eigenvector $x_1$ with eigenvalue $\lambda_1$. Here we use the ordering

$$\alpha < \beta \leftrightarrow \mathrm{cft}(\alpha) < \mathrm{cft}(\beta) \text{ elementwise.}$$

Canonical set

There are more eigenvalues

$$\lambda_5 = \mathrm{icft}(\mathrm{diag}[6\ \ {-\iota\sqrt{3}}\ \ 2]) \qquad \lambda_6 = \mathrm{icft}(\mathrm{diag}[6\ \ 2\ \ \iota\sqrt{3}])$$

$$\lambda_7 = \mathrm{icft}(\mathrm{diag}[5\ \ {-\iota\sqrt{3}}\ \ 2]) \qquad \lambda_8 = \mathrm{icft}(\mathrm{diag}[5\ \ 2\ \ \iota\sqrt{3}]),$$

altogether a polynomial number, which exceeds the dimension of the matrix.

Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2) \ge \ldots \ge \mathrm{abs}(\lambda_k)$, which contains the information to reproduce any eigenvalue or eigenvector of $A$.

In this case, the only canonical set is $\{(\lambda_1, x_1), (\lambda_2, x_2)\}$. (Need two, and have $\mathrm{abs}(\lambda_1) \ge \mathrm{abs}(\lambda_2)$.)

Page 13: Iterative methods with special structures

[Figure: eigenvalue error and eigenvector change (magnitude, log scale from $10^{-15}$ to $10^0$) versus iteration (up to 8000), with fit lines $\left(\frac{2 + 2\cos(2\pi/n)}{2 + 2\cos(\pi/n)}\right)^{2i}$, $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{2i}$, and $\left(\frac{6 + 2\cos(2\pi/n)}{6 + 2\cos(\pi/n)}\right)^{i}$.]

Figure: The convergence behavior of the power method in the circulant algebra. The gray lines show the error in each eigenvalue in Fourier space. These curves track the predictions made based on the eigenvalues as discussed in the text. The red line shows the magnitude of the change in the eigenvector. We use this as the stopping criterion. It also decays as predicted by the ratio of eigenvalues. The blue fit lines have been visually adjusted to match the behavior in the convergence tail.

[Figure: absolute error and residual magnitude (log scale from $10^{-15}$ to $10^0$) versus Arnoldi iteration (0 to 50).]

Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process. The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one Fourier component until the Arnoldi basis captures all of the eigenvalues after N/2 + 1 = 26 iterations. These results show how the two computations are performing individual power methods or Arnoldi processes in Fourier space.

The Arnoldi Method

The Arnoldi process

•  Let $A$ be an $n \times n$ matrix with real-valued entries. Then the Arnoldi method is a technique to build an orthogonal basis for the Krylov subspace $\mathcal{K}_t(A, v) = \mathrm{span}\{v, Av, \ldots, A^{t-1}v\}$, where $v$ is an initial vector.
•  We have the decomposition $A Q_t = Q_{t+1} H_{t+1,t}$, where $Q_t$ is an $n \times t$ matrix, and $H_{t+1,t}$ is a $(t+1) \times t$ upper Hessenberg matrix.
•  Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix $A_j$.
•  Equivalent to a block Arnoldi process.
•  Using the cft and icft operations, we produce an Arnoldi factorization: $A \circ Q_t = Q_{t+1} \circ H_{t+1,t}$.
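For reference, here is a minimal sketch of the ordinary (real, dense) Arnoldi process that each Fourier block would run, written with modified Gram-Schmidt. It is a textbook version, not the circulant implementation from the talk, and it assumes no breakdown occurs within t steps.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi(A, v, t):
    """Return (Q, H) with A Q_t = Q_{t+1} H_{t+1,t}.

    Q is a list of t+1 orthonormal vectors; H is a (t+1) x t upper Hessenberg
    matrix stored as a list of rows.
    """
    norm_v = math.sqrt(dot(v, v))
    Q = [[x / norm_v for x in v]]
    H = [[0.0] * t for _ in range(t + 1)]
    for j in range(t):
        w = matvec(A, Q[j])
        for i in range(j + 1):                      # orthogonalize against earlier vectors
            H[i][j] = dot(Q[i], w)
            w = [wi - H[i][j] * qi for wi, qi in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < 1e-14:
            raise ValueError("Arnoldi breakdown: Krylov subspace is invariant")
        Q.append([wi / H[j + 1][j] for wi in w])
    return Q, H

# Illustrative 4 x 4 tridiagonal matrix and starting vector.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
```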

Page 14: Iterative methods with special structures

A number of interesting mathematical results from this algebra

1.  A case study of how “decoupled” block iterations arise and are meaningful for an application.

2.  It’s a beautiful algebra. E.g.

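Operations (cont.)

More operations are simplified in the Fourier space too. Let $\mathrm{cft}(\alpha) = \mathrm{diag}[\hat\alpha_1, \ldots, \hat\alpha_k]$. Because the $\hat\alpha_j$ values are the eigenvalues of $\mathrm{circ}(\alpha)$, we have:

$$\mathrm{abs}(\alpha) = \mathrm{icft}(\mathrm{diag}[|\hat\alpha_1|, \ldots, |\hat\alpha_k|]),$$

$$\overline{\alpha} = \mathrm{icft}(\mathrm{diag}[\overline{\hat\alpha_1}, \ldots, \overline{\hat\alpha_k}]) = \mathrm{icft}(\mathrm{cft}(\alpha)^*), \text{ and}$$

$$\mathrm{angle}(\alpha) = \mathrm{icft}(\mathrm{diag}[\hat\alpha_1/|\hat\alpha_1|, \ldots, \hat\alpha_k/|\hat\alpha_k|]).$$

Proofs are simple, e.g. $\overline{\mathrm{angle}(\alpha)} \circ \mathrm{angle}(\alpha) = 1$.

Live Matlab demo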

Page 15: Iterative methods with special structures

Conclusion to the circulant algebra

Paper available from
https://www.cs.purdue.edu/homes/dgleich/

The power and Arnoldi method in an algebra of circulants, NLA 2013, Gleich, Greif, Varah.

Code available from
https://www.cs.purdue.edu/homes/dgleich/codes/camat

Page 16: Iterative methods with special structures

Project 2

Fast relaxation methods to estimate a column of the matrix exponential. With Kyle Kloster

Page 17: Iterative methods with special structures

Matrix exponentials

exp(A) is defined as

$$\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k$$

Always converges.

$$\frac{dx}{dt} = A x(t) \iff x(t) = \exp(tA)\, x(0)$$

Evolution operator for an ODE.

$A$ is $n \times n$, real.

This is a special case of a function of a matrix $f(A)$; others are $f(x) = 1/x$; $f(x) = \sinh(x)$ ...
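To make the series definition concrete, the sketch below sums a fixed number of Taylor terms for a small dense matrix. The function name and the default of 30 terms are choices made for this illustration; a naive Taylor sum is only reasonable when the norm of A is modest.

```python
def expm_taylor(A, terms=30):
    """Approximate exp(A) by the first `terms` terms of sum_k A^k / k!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k! by multiplying the previous term by A and dividing by k
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) / k for j in range(n)]
                for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

nilpotent = expm_taylor([[0.0, 1.0], [0.0, 0.0]])   # series terminates after two terms
diagonal = expm_taylor([[1.0, 0.0], [0.0, 2.0]])    # should give diag(e, e^2)
```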

Page 18: Iterative methods with special structures

Matrix exponentials on large networks

$$\exp(A) = \sum_{k=0}^{\infty} \frac{1}{k!} A^k$$

If $A$ is the adjacency matrix, then $A^k$ counts the number of length-$k$ walks between node pairs.

[Estrada 2000, Farahat et al. 2002, 2006] Large entries denote important nodes or edges. Used for link prediction and centrality.

$$\exp(P) = \sum_{k=0}^{\infty} \frac{1}{k!} P^k$$

If $P$ is a transition matrix (column stochastic), then $P^k$ is the probability of a length-$k$ walk between node pairs.

[Kondor & Lafferty 2002, Kunegis & Lommatzsch 2009, Chung 2007] Used for link prediction, kernels, and clustering or community detection.

Page 19: Iterative methods with special structures

This talk: a column of the matrix exponential

$$x = \exp(P)\, e_c$$

$x$: the solution
$P$: the matrix
$e_c$: the column

Page 20: Iterative methods with special structures

This talk: a column of the matrix exponential

$$x = \exp(P)\, e_c$$

$x$: the solution (localized)
$P$: the matrix (large, sparse, stochastic)
$e_c$: the column

Page 21: Iterative methods with special structures

Uniformly localized solutions in livejournal

[Figure, left panel: plot(x), magnitude versus index (x $10^6$), nnz = 4815948. Right panel: 1-norm error versus largest non-zeros retained, log-log axes.]

$$x = \exp(P)\, e_c$$

nnz(x) = 4,815,948

Gleich & Kloster, arXiv:1310.3423

Page 22: Iterative methods with special structures

Our mission

Exploit the structure. Find the solution with work roughly proportional to the localization, not the matrix.

Page 23: Iterative methods with special structures

Our algorithms for uniform localization
www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Figure, two panels: 1-norm error versus non-zeros, log-log axes, for gexpm, gexpmq, and expmimv.]

$$\text{work} = O\left(\log\left(\tfrac{1}{\varepsilon}\right) \left(\tfrac{1}{\varepsilon}\right)^{3/2} d^2 (\log d)^2\right)$$

$$\text{nnz} = O\left(\log\left(\tfrac{1}{\varepsilon}\right) \left(\tfrac{1}{\varepsilon}\right)^{3/2} d\, (\log d)\right)$$

Page 24: Iterative methods with special structures

Matrix exponentials on large networks

Is a single column interesting? Yes!

$$\exp(P)\, e_c = \sum_{k=0}^{\infty} \frac{1}{k!} P^k e_c$$

Link prediction scores for node c. A community relative to node c.

But … modern networks are large ~ $O(10^9)$ nodes, sparse ~ $O(10^{11})$ edges, constantly changing …

and so we'd like speed over accuracy.

Page 25: Iterative methods with special structures

SIAM Review, Vol. 20, No. 4, October 1978

Nineteen Dubious Ways to Compute the Exponential of a Matrix

Cleve Moler and Charles Van Loan

SIAM Review, Vol. 45, No. 1, pp. 3–49, 2003

Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later

Cleve Moler (The MathWorks, Inc.) and Charles Van Loan (Department of Computer Science, Cornell University)

Abstract. In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others but that none are completely satisfactory.

Most of this paper was originally published in 1978. An update, with a separate bibliography, describes a few recent developments.

Key words. matrix, exponential, roundoff error, truncation error, condition

AMS subject classifications. 15A15, 65F15, 65F30, 65L99

PII. S0036144502418010

1. Introduction. Mathematical models of many physical, biological, and economic processes involve systems of linear, constant coefficient ordinary differential equations

$$\dot{x}(t) = A x(t).$$

Here $A$ is a given, fixed, real or complex $n$-by-$n$ matrix. A solution vector $x(t)$ is sought which satisfies an initial condition

$$x(0) = x_0.$$

In control theory, $A$ is known as the state companion matrix and $x(t)$ is the system response.

In principle, the solution is given by $x(t) = e^{tA} x_0$, where $e^{tA}$ can be formally defined by the convergent power series

$$e^{tA} = I + tA + \frac{t^2 A^2}{2!} + \cdots.$$

The effective computation of this matrix function is the main topic of this survey.

Published electronically February 3, 2003. A portion of this paper originally appeared in SIAM Review, Volume 20, Number 4, 1978, pages 801–836.
http://www.siam.org/journals/sirev/45-1/41801.html

Page 26: Iterative methods with special structures

Our underlying method

Direct expansion

$$x = \exp(P)\, e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$$

A few matvecs, quick loss of sparsity due to fill-in.

This method is stable for stochastic P … no cancellation, unbounded norm, etc.
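The direct expansion and its fill-in can be seen on a toy graph. The sketch below stores P by sparse columns in dictionaries (a representation chosen for this example, not the one in the authors' code), accumulates the truncated series, and records how many non-zeros the running sum has after each term.

```python
def expm_column_taylor(P_cols, c, N):
    """Return (x_N, nnz_history) for x_N = sum_{k=0}^{N} P^k e_c / k!.

    P_cols[j] is a dict {i: P[i][j]} holding the non-zeros of column j.
    """
    term = {c: 1.0}                      # k = 0 term: e_c
    x = dict(term)
    nnz_history = [len(x)]
    for k in range(1, N + 1):
        nxt = {}
        for j, val in term.items():      # sparse matvec: nxt = P * term / k
            for i, p_ij in P_cols[j].items():
                nxt[i] = nxt.get(i, 0.0) + p_ij * val / k
        term = nxt
        for i, val in term.items():
            x[i] = x.get(i, 0.0) + val
        nnz_history.append(len(x))
    return x, nnz_history

# Illustrative graph: a path on 6 nodes, with P the column-stochastic random-walk matrix.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
P_cols = {j: {i: 1.0 / len(nbrs) for i in nbrs} for j, nbrs in adj.items()}
x_N, nnz_history = expm_column_taylor(P_cols, 0, 8)
```

Because every column of P sums to one, each term P^k e_c / k! has total mass 1/k!, which gives a quick sanity check on the result.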

Page 27: Iterative methods with special structures

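Our underlying method as a linear system

Direct expansion

$$x = \exp(P)\, e_c \approx \sum_{k=0}^{N} \frac{1}{k!} P^k e_c = x_N$$

$$\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i$$

$$(I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c$$

Lemma: we approximate $x_N$ well if we approximate $v$ well.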

Page 28: Iterative methods with special structures

Our mission (2)

Approximately solve $Ax = b$ when $A$, $b$ are sparse and $x$ is localized.

Page 29: Iterative methods with special structures

Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods

Algebraically:

$$Ax = b$$

$$r^{(k)} = b - A x^{(k)}$$

$$x^{(k+1)} = x^{(k)} + e_j e_j^T r^{(k)}$$

$$r^{(k+1)} = r^{(k)} - r_j^{(k)} A e_j$$

Procedurally:

Solve(A,b)
  x = sparse(size(A,1),1)
  r = b
  While (1)
    Pick j where r(j) != 0
    z = r(j)
    x(j) = x(j) + r(j)
    For i where A(i,j) != 0
      r(i) = r(i) - z*A(i,j)
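A runnable version of the procedure can be sketched as follows, under two assumptions that the slide leaves implicit: A has a unit diagonal (so the update zeroes the chosen residual entry), and the loop stops once the largest residual entry falls below a tolerance. Picking the largest entry is the Gauss-Southwell rule; the dictionary-based storage and the names are illustrative.

```python
def gauss_southwell(A_cols, b, tol=1e-10, max_steps=100000):
    """Approximately solve A x = b by relaxing one coordinate at a time.

    A_cols[j] is a dict {i: A[i][j]} with the non-zeros of column j (A[j][j] == 1
    is assumed); b is a dict of non-zeros. Returns (x, r) as dicts.
    """
    x = {}
    r = dict(b)
    for _ in range(max_steps):
        if not r:
            break
        j = max(r, key=lambda i: abs(r[i]))    # Gauss-Southwell: largest residual entry
        z = r[j]
        if abs(z) < tol:
            break
        x[j] = x.get(j, 0.0) + z               # x <- x + e_j e_j^T r
        for i, a_ij in A_cols[j].items():      # r <- r - r_j A e_j
            r[i] = r.get(i, 0.0) - z * a_ij
    return x, r

# Illustrative system: A = I - 0.5 P with P the random-walk matrix of a triangle.
A_cols = {j: {i: (1.0 if i == j else -0.25) for i in range(3)} for j in range(3)}
x, r = gauss_southwell(A_cols, {0: 1.0})
```

For this small system the exact solution is (1.2, 0.4, 0.4), and only coordinates that ever receive residual mass are touched, which is what makes the approach attractive when x is localized.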

Page 30: Iterative methods with special structures

Back to the exponential

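$$\begin{bmatrix} I & & & & \\ -P/1 & I & & & \\ & -P/2 & \ddots & & \\ & & \ddots & I & \\ & & & -P/N & I \end{bmatrix} \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ \vdots \\ v_N \end{bmatrix} = \begin{bmatrix} e_c \\ 0 \\ \vdots \\ \vdots \\ 0 \end{bmatrix}, \qquad x_N = \sum_{i=0}^{N} v_i$$

$$(I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c$$

Solve this system via the same method.

Optimization 1: build the system implicitly.
Optimization 2: don't store $v_i$, just store the sum $x_N$.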
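Both optimizations can be illustrated in a short sketch. The residual is keyed by (block index k, node j), the block system is never formed (relaxing entry (k, j) pushes z·P[:, j]/(k+1) into block k+1), and only the running sum x is stored. This is a simplified reading of the idea, not the nexpokit implementation; the function name, the stopping rule on the total residual, and the toy graph are assumptions.

```python
def gexpm_sketch(P_cols, c, N=12, tol=1e-10):
    """Gauss-Southwell on the implicit block system for x_N = sum_k P^k e_c / k!.

    P_cols[j] is a dict {i: P[i][j]} for a column-stochastic P.
    """
    x = {}
    r = {(0, c): 1.0}                       # right-hand side e_1 (x) e_c
    while r and sum(r.values()) > tol:
        k, j = max(r, key=r.get)            # largest residual entry (all are nonnegative)
        z = r.pop((k, j))
        x[j] = x.get(j, 0.0) + z            # Optimization 2: keep only the sum of the v_k
        if k < N:                           # Optimization 1: block row k+1 holds -P/(k+1)
            for i, p_ij in P_cols[j].items():
                key = (k + 1, i)
                r[key] = r.get(key, 0.0) + z * p_ij / (k + 1)
    return x

# Illustrative graph: a path on 6 nodes with its random-walk matrix.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
P_cols = {j: {i: 1.0 / len(nbrs) for i in nbrs} for j, nbrs in adj.items()}
x = gexpm_sketch(P_cols, 0)
```

A linear scan for the largest residual keeps the sketch short; a heap or bucketed queue would be the natural replacement on large graphs.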

Page 31: Iterative methods with special structures

Error analysis for Gauss-Southwell

Theorem

Assume $P$ is column-stochastic and $v^{(0)} = 0$.

(Nonnegativity, "easy") The iterates and residuals are nonnegative:

$$v^{(l)} \ge 0 \quad\text{and}\quad r^{(l)} \ge 0$$

(Convergence, "annoying") The residual goes to 0:

$$\|r^{(l)}\|_1 \le \prod_{k=1}^{l} \left(1 - \frac{1}{2dk}\right) \le l^{\left(-\frac{1}{2d}\right)}$$

Here $d$ is the largest degree, and the system is

$$(I \otimes I_N - S_N \otimes P)\, v = e_1 \otimes e_c$$

Page 32: Iterative methods with special structures

Proof sketch

Gauss-Southwell picks largest residual
⇒  Bound the update by avg. nonzeros in residual (sloppy)
⇒  Algebraic convergence with slow rate, but each update is REALLY fast: $O(d_{\max} \log n)$.

If $d$ is $\log \log n$, then our method runs in sub-linear time (but so does just about anything).

Page 33: Iterative methods with special structures

Overall error analysis

Components:
•  Truncation to N terms
•  Residual to error
•  Approximate solve

Theorem. After $\ell$ steps of Gauss-Southwell,

$$\|x_N^{(\ell)} - x\|_1 \le \frac{1}{N!\,N} + e \cdot \ell^{-\frac{1}{2d}}$$
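To get a feel for the two terms in this bound, the sketch below computes how many Taylor terms make the truncation term small and how many Gauss-Southwell steps the worst-case residual term would require. The helper names are hypothetical, and the step count is the pessimistic value implied by the bound rather than an observed runtime.

```python
import math

def terms_needed(eps):
    """Smallest N with 1 / (N! * N) <= eps (the truncation term)."""
    N = 1
    while 1.0 / (math.factorial(N) * N) > eps:
        N += 1
    return N

def steps_needed(eps, d):
    """Value of l at which e * l**(-1 / (2 d)) equals eps, i.e. l = (e / eps)**(2 d)."""
    return (math.e / eps) ** (2 * d)

N = terms_needed(1e-5)
l = steps_needed(0.1, 2)
bound_at_l = math.e * l ** (-1.0 / (2 * 2))
```

The contrast is the point of the proof sketch: the truncation term needs only a handful of terms, while the algebraic rate in the residual term grows quickly with the maximum degree d.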

Page 34: Iterative methods with special structures

More recent error analysis

Theorem (Gleich and Kloster, 2013, arXiv:1310.3423). Consider computing the matrix exponential using the Gauss-Southwell relaxation method in a graph with a Zipf law in the degrees with exponent $p = 1$ and max-degree $d$. Then the work involved in getting a solution with 1-norm error $\varepsilon$ is

$$\text{work} = O\left(\log\left(\tfrac{1}{\varepsilon}\right) \left(\tfrac{1}{\varepsilon}\right)^{3/2} d^2 (\log d)^2\right)$$

Page 35: Iterative methods with special structures

Problem size & runtimes

[Figure: runtime (s) versus |V| + nnz(P), log-log axes, for expmv, half, gexpmq, gexpm, and expmimv.]

Figure 7 – The median runtime of our methods for the seven graphs over 100 trials (only 50 trials for the largest two datasets). The coordinate relaxation methods have highly variable runtime, but can be very fast on graphs such as webbase (the point nearest $10^9$ on the x-axis). We did not run the gexpm function for matrices larger than the livejournal graph.

[Figure, three panels. Left and middle: time in seconds versus graph size, log-log axes. Right: time in seconds versus max-degree squared, log-log axes.]

Figure 8 – The distribution of runtimes for the gexpm method on two forest-fire graphs (left: p = 0.4; middle: p = 0.48) of various graph sizes, where graph size is computed as the sum of the number of vertices and the number of non-zeros in the adjacency matrix. The thick line is the median runtime over 50 trials, and the shaded region shows the 25% to 75% quartiles (the shaded region is very tight for the second two figures). The final plot shows the relationship between the max-degree squared and the runtime in seconds. This figure shows that the runtime scales in a nearly linear relationship with the max-degree squared, as predicted by our theory. The large deviations from the line of best fit might be explained by the fact that only a single forest-fire graph was generated for each graph size.

Table 3 – The real-world datasets we use in our experiments span three orders of magnitude in size.

Graph            |V|            nnz(P)           nnz(P)/|V|
itdk0304         190,914        1,215,220        6.37
dblp-2010        226,413        1,432,920        6.33
flickr-scc       527,476        9,357,071        17.74
ljournal-2008    5,363,260      77,991,514       14.54
webbase-2001     118,142,155    1,019,903,190    8.63
twitter-2010     33,479,734     1,394,440,635    41.65
friendster       65,608,366     3,612,134,270    55.06

Real-world networks. The datasets used are summarized in Table 3. They include a version of the flickr graph from [Bonchi et al., 2012] containing just the largest strongly-connected component of the original graph; dblp-2010 from [Boldi et al., 2011], itdk0304 in [(The Cooperative Association for Internet Data Analysis), 2005], ljournal-2008 from [Boldi et al., 2011, Chierichetti et al., 2009], twitter-2010 [Kwak et al., 2010], webbase-2001 from [Hirai et al., 2000, Boldi and Vigna, 2005], and the friendster graph in [Yang and Leskovec, 2012].

Implementation details. All experiments were performed on either a dual processor Xeon e5-2670 system with 16 cores (total) and 256 GB of RAM or a single processor Intel i7-990X, 3.47 GHz CPU and 24 GB of RAM. Our algorithms were implemented in C++ using the Matlab MEX interface. All data structures used are memory-efficient: the solution and residual are stored as hash tables using Google's sparsehash package. The precise code for the algorithms and the experiments below are available via https://www.cs.purdue.edu/homes/dgleich/codes/nexpokit/.

Comparison. We compare our implementation with a state-of-the-art Matlab function for computing the exponential of a matrix times a vector, expmv [Al-Mohy and Higham, 2011]. We customized this method with knowledge that $\|P\|_1 = 1$. This single change results in a great improvement to the runtime of their code. In each experiment, we use as the "true solution" the result of a call to expmv using the 'single' option, which guarantees a 1-norm error bounded by $2^{-24}$, or, for smaller problems, we use a Taylor approximation with the number of terms predicted by Lemma 12.

6.1 Accuracy on Large Entries

When both gexpm and gexpmq terminate, they satisfy a 1-norm error of $\varepsilon$. Many applications do not require precise solution values but instead would like the correct set of

Page 36: Iterative methods with special structures

References and ongoing work

Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013. Also see the journal version on arXiv.
www.cs.purdue.edu/homes/dgleich/codes/nexpokit

•  Error analysis using the queue (almost done …)
•  Better linear systems for faster convergence
•  Asynchronous coordinate descent methods
•  Scaling up to billion node graphs (done …)

Supported by NSF CAREER 1149756-CCF
www.cs.purdue.edu/homes/dgleich