Iterative methods with special structures
DESCRIPTION
In a talk at the Institute for Physics and Computational Mathematics in Beijing, I discuss a few different types of structure in iterative methods.
TRANSCRIPT
Iterative methods with special structures
David F. Gleich, Purdue University
David Gleich · Purdue 1
Two projects.
1. Circulant structure in tensors and a linear system solver.
2. Localized structure in the matrix exponential and relaxation methods.
Circulant algebra
David F. Gleich (Purdue) LA/Opt Seminar
Introduction
Kilmer, Martin, and Perrone (2008) presented a circulant algebra: a set of operations that generalizes matrix algebra to three-way data, and provided an SVD. The essence of this approach is to view three-dimensional objects as two-dimensional arrays (i.e., matrices) of one-dimensional arrays (i.e., vectors). Braman (2010) developed spectral and other decompositions. We have extended this algebra with the ingredients required for iterative methods such as the power method and the Arnoldi method, and have characterized the behavior of these algorithms. This is joint work with Chen Greif and James Varah at UBC.
Three-way arrays
Given an m × n × k table of data, we view this data as an m × n matrix where each "scalar" is a vector of length k:

    A ∈ K_k^{m×n}

We denote the space of length-k scalars as K_k. These scalars interact like circulant matrices.
Circulants
Circulant matrices are a commutative class, closed under the standard matrix operations:

    [ α_1  α_k   ⋯   α_2 ]
    [ α_2  α_1   ⋱       ]
    [  ⋮    ⋱    ⋱   α_k ]
    [ α_k   ⋯   α_2  α_1 ]

We'll see more of their properties shortly!
The circ operation
We denote the space of length-k scalars as K_k. These scalars interact like circulant matrices. For

    α = {α_1 ... α_k} ∈ K_k,

    α ↔ circ(α) ≡ [ α_1  α_k   ⋯   α_2 ]
                  [ α_2  α_1   ⋱       ]
                  [  ⋮    ⋱    ⋱   α_k ]
                  [ α_k   ⋯   α_2  α_1 ],

so that

    α + β ↔ circ(α) + circ(β)   and   α ∘ β ↔ circ(α) circ(β);
    0 = {0 0 ... 0},   1 = {1 0 ... 0}.

K_k is the ring of length-k circulants.
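The ring structure above can be checked numerically. Here is a minimal numpy sketch (the talk's live demo was in Matlab; `circ` and `cmul` are illustrative names, not part of any library):

```python
import numpy as np

def circ(a):
    """The k x k circulant matrix whose first column is the scalar a."""
    k = len(a)
    return np.column_stack([np.roll(a, i) for i in range(k)])

def cmul(a, b):
    """The product a ∘ b in K_k: circular convolution, i.e. circ(a) @ b,
    computed here through the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

a = np.array([2.0, 3.0, 1.0])
b = np.array([3.0, 1.0, 1.0])
one = np.array([1.0, 0.0, 0.0])    # the multiplicative identity {1 0 ... 0}

assert np.allclose(circ(a) @ b, cmul(a, b))   # a ∘ b ↔ circ(a) circ(b)
assert np.allclose(cmul(a, b), cmul(b, a))    # the algebra is commutative
assert np.allclose(circ(one), np.eye(3))      # 1 maps to the identity matrix
```

The FFT route works because circulant matrices are diagonalized by the DFT, which is also what makes the cft/icft operations below cheap.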
Scalars to matrix-vector products
Operations (cont.)
More operations are simplified in Fourier space too. Let cft(α) = diag[α̂_1, ..., α̂_k]. Because the α̂_j values are the eigenvalues of circ(α), we have:

    abs(α) = icft(diag[|α̂_1|, ..., |α̂_k|]),
    ᾱ = icft(diag[α̂_1*, ..., α̂_k*]) = icft(cft(α)*), and
    angle(α) = icft(diag[α̂_1/|α̂_1|, ..., α̂_k/|α̂_k|]).

Proofs are simple, e.g., angle(α) ∘ angle(ᾱ) = 1.
[Live Matlab demo]
cft and icft
We define the "circulant Fourier transform," or cft,

    cft : α ∈ K_k ↦ C^{k×k},

and its inverse,

    icft : C^{k×k} ↦ K_k,

as follows:

    cft(α) ≡ diag[α̂_1, ..., α̂_k] = F* circ(α) F,
    icft(diag[α̂_1, ..., α̂_k]) ≡ α,   with circ(α) = F cft(α) F*,

where the α̂_j are the eigenvalues of circ(α) as produced by the Fourier transform. These transformations satisfy icft(cft(α)) = α and provide a convenient way of moving between operations in K_k and the more familiar environment of diagonal matrices in C^{k×k}.
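A numpy sketch of cft and icft, using the FFT in place of explicit F* circ(α) F products (the function names mirror the slides; the implementation details are assumptions):

```python
import numpy as np

def cft(a):
    """cft(α): the eigenvalues of circ(α) on a diagonal, via the FFT."""
    return np.diag(np.fft.fft(a))

def icft(D):
    """icft: recover the length-k scalar from its diagonal of eigenvalues."""
    return np.real_if_close(np.fft.ifft(np.diag(D)))

a = np.array([2.0, 3.0, 1.0])
D = cft(a)
assert np.allclose(icft(D), a)            # icft(cft(α)) = α

# The diagonal entries are exactly the eigenvalues of circ(a):
# circ(a) has eigenvectors v_j[m] = exp(2πi j m / k).
k = len(a)
C = np.column_stack([np.roll(a, i) for i in range(k)])
lam = np.diag(D)
for j in range(k):
    v = np.exp(2j * np.pi * j * np.arange(k) / k)
    assert np.allclose(C @ v, lam[j] * v)
```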
The circ operation on matrices

    A ∘ x = [ Σ_{j=1}^n A_{1,j} ∘ x_j ]
            [           ⋮            ]
            [ Σ_{j=1}^n A_{m,j} ∘ x_j ]

corresponds to

    [ circ(A_{1,1})  ⋯  circ(A_{1,n}) ] [ circ(x_1) ]
    [      ⋮         ⋱       ⋮        ] [     ⋮     ]
    [ circ(A_{m,1})  ⋯  circ(A_{m,n}) ] [ circ(x_n) ].

Define circ(A) to be the block matrix above and circ(x) the stacked block vector. Then

    A ∘ x ↔ circ(A) circ(x)   (matrix-vector products),
    x ∘ α ↔ circ(x) circ(α)   (vector-scalar products).

This is equivalent to Kilmer, Martin, and Perrone (2008).
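The correspondence A ∘ x ↔ circ(A) circ(x) can be sanity-checked numerically. A sketch, assuming the scalar axis is stored last in a numpy array:

```python
import numpy as np

def cmul(a, b):
    """Scalar product in K_k via the FFT (circular convolution)."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))

rng = np.random.default_rng(0)
m, n, k = 3, 2, 4
A = rng.standard_normal((m, n, k))   # an m x n matrix of length-k scalars
x = rng.standard_normal((n, k))      # a length-n vector of length-k scalars

# Definition: (A ∘ x)_i = sum_j A[i, j] ∘ x[j]
y_def = np.array([sum(cmul(A[i, j], x[j]) for j in range(n)) for i in range(m)])

# Decoupled form: FFT along the scalar axis, then k independent m x n matvecs.
Ah = np.fft.fft(A, axis=2)           # Ah[:, :, t]: the t-th Fourier component
xh = np.fft.fft(x, axis=1)
yh = np.stack([Ah[:, :, t] @ xh[:, t] for t in range(k)], axis=1)
y_fft = np.fft.ifft(yh, axis=1)

assert np.allclose(y_def, y_fft)
```

The decoupled form is the workhorse for everything that follows: every circulant-algebra operation becomes k ordinary matrix operations in Fourier space.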
The special structure
This circulant structure is our special structure for this first problem. We look at two types of iterative methods: 1. the power method and 2. the Arnoldi method.
A perplexing result!
Example
Run the power method on

    A = [ {2 3 1}  {0 0 0} ]
        [ {0 0 0}  {3 1 1} ]

Result: λ = (1/3) {10 4 4}.
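Later slides explain that these computations perform individual power methods in Fourier space; assuming that equivalence, a numpy sketch reproduces the perplexing result:

```python
import numpy as np

# The perplexing example: a 2 x 2 matrix of length-3 circulant scalars,
# A = [{2 3 1} {0 0 0}; {0 0 0} {3 1 1}].
A = np.zeros((2, 2, 3))
A[0, 0] = [2, 3, 1]
A[1, 1] = [3, 1, 1]

# The circulant power method decouples into k ordinary power methods,
# one per Fourier component of the scalar axis.
Ah = np.fft.fft(A, axis=2)
k = 3
lam_hat = np.zeros(k, dtype=complex)
for t in range(k):
    M = Ah[:, :, t]
    q = np.ones(2) / np.sqrt(2)
    for _ in range(200):
        q = M @ q
        q = q / np.linalg.norm(q)
    lam_hat[t] = np.conj(q) @ M @ q       # Rayleigh quotient

# Assemble the circulant eigenvalue via the inverse transform.
lam = np.fft.ifft(lam_hat)
assert np.allclose(lam, np.array([10, 4, 4]) / 3)
```

The result is perplexing because (1/3){10 4 4} is not an eigenvalue of either diagonal scalar: each Fourier component picks its own dominant eigenvalue (6, 2, and 2 here), and icft mixes them.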
Some understanding through decoupling
Back to figure
Some understanding through decoupling
Example
Let

    A = [ {2 3 1}   {8 −2 0} ]
        [ {−2 0 2}  {3 1 1}  ].

The results of the circ and cft operations are:

    circ(A) =
        [  2  1  3    8  0 −2 ]
        [  3  2  1   −2  8  0 ]
        [  1  3  2    0 −2  8 ]
        [ −2  2  0    3  1  1 ]
        [  0 −2  2    1  3  1 ]
        [  2  0 −2    1  1  3 ],

conjugating each circulant block by F gives a 2 × 2 block matrix of diagonals,

    (I ⊗ F*) circ(A) (I ⊗ F) =
        [ diag(6, −√3i, √3i)        diag(6, 9+√3i, 9−√3i) ]
        [ diag(0, −3+√3i, −3−√3i)   diag(5, 2, 2)         ],

and a perfect-shuffle permutation Π regroups the Fourier components into

    cft(A) = blkdiag( [ 6  6 ],  [ −√3i     9+√3i ],  [ √3i      9−√3i ] )
                      [ 0  5 ]   [ −3+√3i   2     ]   [ −3−√3i   2     ].
Example

    A = [ {2 3 1}  {0 0 0} ]
        [ {0 0 0}  {3 1 1} ],

with Fourier-component matrices

    A_1 = [ 6  0 ],   A_2 = [ −√3i  0 ],   A_3 = [ √3i  0 ]
          [ 0  5 ]          [  0    2 ]          [  0   2 ].

The canonical eigenvalues are

    λ_1 = icft(diag[6, 2, 2]) = (1/3) {10 4 4},
    λ_2 = icft(diag[5, −√3i, √3i]) = (1/3) {5 8 2},
    λ_3 = icft(diag[6, −√3i, √3i]) = {2 3 1},
    λ_4 = icft(diag[5, 2, 2]) = {3 1 1}.

The corresponding eigenvectors are

    x_1 = [ {1/3 1/3 1/3}   ],   x_2 = [ {2/3 −1/3 −1/3} ],
          [ {2/3 −1/3 −1/3} ]          [ {1/3 1/3 1/3}   ]

    x_3 = [ {1 0 0} ],   x_4 = [ {0 0 0} ].
          [ {0 0 0} ]          [ {1 0 0} ]
Convergence of the power method is in terms of the individual blocks
The power method converges
Let A ∈ K_k^{n×n} have a canonical set of eigenvalues λ_1, ..., λ_n with |λ_1| > |λ_2|. Then the power method in the circulant algebra converges to an eigenvector x_1 with eigenvalue λ_1, where we use the ordering

    α < β  ↔  cft(α) < cft(β) elementwise.
Canonical set
There are more eigenvalues:

    λ_5 = icft(diag[6, −√3i, 2]),   λ_6 = icft(diag[6, 2, √3i]),
    λ_7 = icft(diag[5, −√3i, 2]),   λ_8 = icft(diag[5, 2, √3i]),

and altogether a polynomial number of them, exceeding the dimension of the matrix.

Definition. A canonical set of eigenvalues and eigenvectors is a set of minimum size, ordered such that abs(λ_1) ≥ abs(λ_2) ≥ ... ≥ abs(λ_k), which contains the information to reproduce any eigenvalue or eigenvector of A.

In this case, the only canonical set is {(λ_1, x_1), (λ_2, x_2)}. (We need two, and we have abs(λ_1) ≥ abs(λ_2).)
40 60 80 100 120
40
60
80
mm
David F. Gleich (Purdue) LA/Opt Seminar 28 / 29
Figure: The convergence behavior of the power method in the circulant algebra (magnitude against iteration, 2000–8000). The gray lines show the error in each eigenvalue in Fourier space; these curves track the predictions made from the eigenvalue ratios

    ((2 + 2 cos(2π/n)) / (2 + 2 cos(π/n)))^{2i},
    ((6 + 2 cos(2π/n)) / (6 + 2 cos(π/n)))^{2i},
    ((6 + 2 cos(2π/n)) / (6 + 2 cos(π/n)))^{i},

as discussed in the text. The red line shows the magnitude of the change in the eigenvector, which we use as the stopping criterion; it also decays as predicted by the ratio of eigenvalues. The blue fit lines have been visually adjusted to match the behavior in the convergence tail.
Figure: The convergence behavior of a GMRES procedure using the circulant Arnoldi process (magnitude against Arnoldi iteration; absolute error and residual magnitude). The gray lines show the error in each Fourier component and the red line shows the magnitude of the residual. We observe poor convergence in one Fourier component until the Arnoldi basis captures all of the eigenvalues, after N/2 + 1 = 26 iterations. These results show how the two computations perform individual power methods or Arnoldi processes in Fourier space.
The Arnoldi method
Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix, i.e., to a block Arnoldi process. Using the cft and icft operations, we produce an Arnoldi factorization:
The Arnoldi process
… Let A be an n × n matrix with real-valued entries. Then the Arnoldi method is a technique to build an orthogonal basis for the Krylov subspace K_t(A, v) = span{v, Av, ..., A^{t−1}v}, where v is an initial vector.
… We have the decomposition

    A Q_t = Q_{t+1} H_{t+1,t},

where Q_t is an n × t matrix, and H_{t+1,t} is a (t+1) × t upper Hessenberg matrix.
… Using our repertoire of operations, the Arnoldi method in the circulant algebra is equivalent to individual Arnoldi processes on each matrix A_j.
… Equivalent to a block Arnoldi process.
… Using the cft and icft operations, we produce an Arnoldi factorization:

    A ∘ Q_t = Q_{t+1} ∘ H_{t+1,t}.
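A minimal sketch of the standard (non-circulant) Arnoldi process that produces this decomposition; the circulant version runs one such process per Fourier component:

```python
import numpy as np

def arnoldi(A, v, t):
    """Build an orthonormal basis Q for the Krylov subspace K_t(A, v)
    and the (t+1) x t upper Hessenberg H with A Q[:, :t] = Q H."""
    n = len(v)
    Q = np.zeros((n, t + 1))
    H = np.zeros((t + 1, t))
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(t):
        w = A @ Q[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
v = rng.standard_normal(8)
t = 4
Q, H = arnoldi(A, v, t)
# The Arnoldi decomposition A Q_t = Q_{t+1} H_{t+1,t}:
assert np.allclose(A @ Q[:, :t], Q @ H)
assert np.allclose(Q.T @ Q, np.eye(t + 1))
```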
A number of interesting mathematical results from this algebra
1. A case study of how “decoupled” block iterations arise and are meaningful for an application.
2. It’s a beautiful algebra. E.g.
Operations (cont.)
More operations are simplified in Fourier space too. Let cft(α) = diag[α̂_1, ..., α̂_k]. Because the α̂_j values are the eigenvalues of circ(α), we have:

    abs(α) = icft(diag[|α̂_1|, ..., |α̂_k|]),
    ᾱ = icft(diag[α̂_1*, ..., α̂_k*]) = icft(cft(α)*), and
    angle(α) = icft(diag[α̂_1/|α̂_1|, ..., α̂_k/|α̂_k|]).

Proofs are simple, e.g., angle(α) ∘ angle(ᾱ) = 1.
[Live Matlab demo]
Conclusion to the circulant algebra
Paper: "The power and Arnoldi method in an algebra of circulants," Gleich, Greif, and Varah, NLA 2013. Available from https://www.cs.purdue.edu/homes/dgleich/
Code available from https://www.cs.purdue.edu/homes/dgleich/codes/camat
Project 2
Fast relaxation methods to estimate a column of the matrix exponential. With Kyle Kloster.
Matrix exponentials
exp(A) is defined as

    exp(A) = Σ_{k=0}^∞ (1/k!) A^k,

which always converges; here A is n × n and real. It is the evolution operator for an ODE:

    dx/dt = A x(t)  ⟺  x(t) = exp(tA) x(0)
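The series definition can be summed directly for small dense matrices. A minimal numpy sketch (`exp_taylor` is an illustrative name, not a library routine), checked against an eigendecomposition:

```python
import numpy as np

def exp_taylor(A, terms=30):
    """Sum exp(A) = sum_k A^k / k! directly; the series always converges,
    and for modest ||A|| a few dozen terms suffice."""
    S = np.eye(A.shape[0])
    T = np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k              # T holds A^k / k!
        S = S + T
    return S

# Check against the exact exponential of a symmetric matrix (via eigh).
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
w, V = np.linalg.eigh(A)
exact = V @ np.diag(np.exp(w)) @ V.T
assert np.allclose(exp_taylor(A), exact)
```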
exp(A) is a special case of a function of a matrix, f(A); others are f(x) = 1/x, f(x) = sinh(x), ...
Matrix exponentials on large networks

    exp(A) = Σ_{k=0}^∞ (1/k!) A^k

If A is the adjacency matrix, then A^k counts the number of length-k paths between node pairs. [Estrada 2000; Farahat et al. 2002, 2006] Large entries denote important nodes or edges; used for link prediction and centrality.

    exp(P) = Σ_{k=0}^∞ (1/k!) P^k

If P is a transition matrix (column stochastic), then P^k gives the probability of a length-k walk between node pairs. [Kondor & Lafferty 2002; Kunegis & Lommatzsch 2009; Chung 2007] Used for link prediction, kernels, and clustering or community detection.
This talk: a column of the matrix exponential

    x = exp(P) e_c

x — the solution: localized. P — the matrix: large, sparse, stochastic. e_c — the column.
Uniformly localized solutions in livejournal

    x = exp(P) e_c,    nnz(x) = 4,815,948

[Figure: plot(x), magnitude against index, and the 1-norm error as only the largest non-zeros of x are retained, from 10^0 up to 10^6.]

Gleich & Kloster, arXiv:1310.3423
Our mission!
Exploit the structure. Find the solution with work roughly proportional to the localization, not the matrix.
Our algorithms for uniform localization
www.cs.purdue.edu/homes/dgleich/codes/nexpokit

[Figure: 1-norm error against non-zeros for gexpm, gexpmq, and expmimv.]

    work = O( log(1/ε) · (1/ε)^{3/2} · d² (log d)² )
    nnz  = O( log(1/ε) · (1/ε)^{3/2} · d (log d) )
Matrix exponentials on large networks
Is a single column interesting? Yes!

    exp(P) e_c = Σ_{k=0}^∞ (1/k!) P^k e_c

This gives link-prediction scores for node c, or a community relative to node c. But modern networks are large, ~O(10^9) nodes, sparse, ~O(10^11) edges, and constantly changing, and so we'd like speed over accuracy.
[Slide: scanned first page of Cleve Moler and Charles Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Review, Vol. 20, No. 4, October 1978, pp. 801–836.]
[Slide: first page of the follow-up survey.]

SIAM Review, Vol. 45, No. 1, pp. 3–49, © 2003 Society for Industrial and Applied Mathematics

Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later
Cleve Moler and Charles Van Loan

Abstract. In principle, the exponential of a matrix could be computed in many ways. Methods involving approximation theory, differential equations, the matrix eigenvalues, and the matrix characteristic polynomial have been proposed. In practice, consideration of computational stability and efficiency indicates that some of the methods are preferable to others but that none are completely satisfactory. Most of this paper was originally published in 1978. An update, with a separate bibliography, describes a few recent developments.

1. Introduction. Mathematical models of many physical, biological, and economic processes involve systems of linear, constant coefficient ordinary differential equations

    ẋ(t) = A x(t).

Here A is a given, fixed, real or complex n-by-n matrix. A solution vector x(t) is sought which satisfies an initial condition x(0) = x_0. In control theory, A is known as the state companion matrix and x(t) is the system response. In principle, the solution is given by x(t) = e^{tA} x_0, where e^{tA} can be formally defined by the convergent power series

    e^{tA} = I + tA + t²A²/2! + · · · .

The effective computation of this matrix function is the main topic of this survey.
Our underlying method
Direct expansion! A few matvecs, with quick loss of sparsity due to fill-in. This method is stable for stochastic P: no cancellation, no unbounded norms, etc.
    x = exp(P) e_c ≈ Σ_{k=0}^N (1/k!) P^k e_c = x_N
Our underlying method as a linear system
Direct expansion:
    x = exp(P) e_c ≈ Σ_{k=0}^N (1/k!) P^k e_c = x_N

    [  I                 ] [ v_0 ]   [ e_c ]
    [ −P/1   I           ] [ v_1 ]   [  0  ]
    [       −P/2   ⋱     ] [  ⋮  ] = [  ⋮  ]
    [              ⋱   I ] [  ⋮  ]   [  ⋮  ]
    [            −P/N  I ] [ v_N ]   [  0  ]

    x_N = Σ_{i=0}^N v_i,

or in Kronecker form,

    (I ⊗ I_N − S_N ⊗ P) v = e_1 ⊗ e_c.

Lemma: we approximate x_N well if we approximate v well.
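The block system and the identity x_N = Σ v_i can be verified densely on a small random stochastic P; a sketch with illustrative variable names:

```python
import numpy as np

n, N = 6, 10
rng = np.random.default_rng(3)
P = rng.random((n, n))
P = P / P.sum(axis=0)          # column-stochastic
c = 0
ec = np.zeros(n); ec[c] = 1.0

# S_N has 1/k on its subdiagonal, so row k of the block system reads
#   v_k - (P / k) v_{k-1} = 0,   with   v_0 = e_c.
S = np.zeros((N + 1, N + 1))
for k in range(1, N + 1):
    S[k, k - 1] = 1.0 / k

M = np.eye((N + 1) * n) - np.kron(S, P)
rhs = np.zeros((N + 1) * n); rhs[:n] = ec     # e_1 ⊗ e_c
v = np.linalg.solve(M, rhs)

# x_N = sum_i v_i reproduces the truncated Taylor series for exp(P) e_c.
xN = v.reshape(N + 1, n).sum(axis=0)
x_direct = ec.copy(); term = ec.copy()
for k in range(1, N + 1):
    term = P @ term / k
    x_direct = x_direct + term
assert np.allclose(xN, x_direct)
```

The system is block lower bidiagonal, so back-substitution gives v_k = P^k e_c / k! exactly; the point of writing it as Ax = b is that an approximate sparse solve now yields an approximate x_N.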
Our mission (2)!
Approximately solve A x = b when A and b are sparse and x is localized.
Coordinate descent, Gauss-Southwell, Gauss-Seidel, relaxation & “push” methods
Algebraically:

    Ax = b,
    r^(k) = b − A x^(k),
    x^(k+1) = x^(k) + e_j e_j^T r^(k),
    r^(k+1) = r^(k) − r_j^(k) A e_j.

Procedurally:

    Solve(A,b)
      x = sparse(size(A,1),1)
      r = b
      while (1)
        pick j where r(j) != 0
        z = r(j)
        x(j) = x(j) + z
        for i where A(i,j) != 0
          r(i) = r(i) - z*A(i,j)
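A runnable sketch of the procedural loop above in numpy (`gauss_southwell` is an illustrative name; the test system is hypothetical, chosen so the relaxation converges):

```python
import numpy as np

def gauss_southwell(A, b, tol=1e-10, maxit=100000):
    """Relaxation ('push') solver for A x = b: repeatedly pick the residual
    entry with largest magnitude and relax that coordinate."""
    x = np.zeros(len(b))
    r = b.astype(float).copy()
    for _ in range(maxit):
        j = int(np.argmax(np.abs(r)))
        z = r[j]
        if abs(z) < tol:
            break
        x[j] += z                 # x^(k+1) = x^(k) + e_j e_j^T r^(k)
        r -= z * A[:, j]          # r^(k+1) = r^(k) - r_j^(k) A e_j
    return x, r

# Hypothetical test system: unit diagonal plus small off-diagonal entries,
# so each relaxation step shrinks the 1-norm of the residual.
rng = np.random.default_rng(4)
n = 20
A = np.eye(n) + rng.random((n, n)) * 0.5 / n
b = rng.random(n)
x, r = gauss_southwell(A, b)
assert np.linalg.norm(A @ x - b) < 1e-8
```

Picking the largest residual entry is the Gauss-Southwell rule; picking any nonzero entry (as the slide's pseudocode allows) gives the general relaxation/"push" family.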
Back to the exponential
    [  I                 ] [ v_0 ]   [ e_c ]
    [ −P/1   I           ] [ v_1 ]   [  0  ]
    [       −P/2   ⋱     ] [  ⋮  ] = [  ⋮  ]
    [              ⋱   I ] [  ⋮  ]   [  ⋮  ]
    [            −P/N  I ] [ v_N ]   [  0  ]

    x_N = Σ_{i=0}^N v_i,    (I ⊗ I_N − S_N ⊗ P) v = e_1 ⊗ e_c

Solve this system via the same method.
Optimization 1: build the system implicitly.
Optimization 2: don't store the v_i, just store the sum x_N.
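A hedged sketch of both optimizations together: the system is never formed, the residual lives in a hash map keyed by (node, Taylor step), and updates fold straight into x. This mirrors the deck's approach in spirit; the names and the max-based selection rule are assumptions:

```python
import numpy as np

def exp_ec_relax(P, c, N=10, tol=1e-9, maxit=200000):
    """Relaxation on (I ⊗ I - S_N ⊗ P) v = e_1 ⊗ e_c without forming the
    matrix (Optimization 1) and accumulating only x = sum_k v_k instead of
    storing each v_k (Optimization 2)."""
    n = P.shape[0]
    x = np.zeros(n)
    r = {(c, 0): 1.0}              # sparse residual of the big system
    for _ in range(maxit):
        if not r:
            break
        (i, k), z = max(r.items(), key=lambda kv: abs(kv[1]))
        if abs(z) < tol:
            break
        del r[(i, k)]
        x[i] += z                  # the v_k[i] update, folded into x
        if k < N:
            # column (i, k) couples to block k+1 through the -P/(k+1) block
            for ii in np.nonzero(P[:, i])[0]:
                key = (int(ii), k + 1)
                r[key] = r.get(key, 0.0) + z * P[ii, i] / (k + 1)
    return x

# Compare against the truncated Taylor series on a small stochastic P.
rng = np.random.default_rng(5)
n = 8
P = rng.random((n, n)); P = P / P.sum(axis=0)
x = exp_ec_relax(P, 0, N=12, tol=1e-12)
direct = np.zeros(n); term = np.zeros(n); term[0] = 1.0
direct += term
for k in range(1, 13):
    term = P @ term / k
    direct += term
assert np.linalg.norm(x - direct, 1) < 1e-6
```

On a graph, the inner loop touches only the neighbors of node i, which is what makes the work proportional to the localization rather than the matrix.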
Error analysis for Gauss-Southwell
Theorem
Assume P is column-stochastic and v^(0) = 0, and consider the system

    (I ⊗ I_N − S_N ⊗ P) v = e_1 ⊗ e_c.

(Nonnegativity, the "easy" part) The iterates and residuals are nonnegative: v^(l) ≥ 0 and r^(l) ≥ 0.
(Convergence, the "annoying" part) The residual goes to 0:

    ||r^(l)||_1 ≤ Π_{k=1}^l (1 − 1/(2dk)) ≤ l^{−1/(2d)},

where d is the largest degree.
Proof sketch
Gauss-Southwell picks the largest residual ⇒ bound the update by the average number of nonzeros in the residual (sloppy) ⇒ algebraic convergence with a slow rate, but each update is REALLY fast: O(d max log n). If d is log log n, then our method runs in sub-linear time (but so does just about anything).
Overall error analysis
Components: truncation to N terms; residual to error; approximate solve.

Theorem. After ℓ steps of Gauss-Southwell,

    ||x_N^(ℓ) − x||_1 ≤ 1/(N! N) + e · ℓ^{−1/(2d)}.
More recent error analysis
Theorem (Gleich and Kloster, 2013, arXiv:1310.3423)
Consider computing the matrix exponential using the Gauss-Southwell relaxation method on a graph with a Zipf-law degree distribution with exponent p = 1 and max degree d. Then the work involved in getting a solution with 1-norm error ε is

    work = O( log(1/ε) · (1/ε)^{3/2} · d² (log d)² ).
Problem size & runtimes

Figure 7 – The median runtime of our methods (runtime in seconds against |V| + nnz(P); methods: expmv, gexpmq, gexpm, expmimv) for the seven graphs over 100 trials (only 50 trials for the largest two datasets). The coordinate relaxation methods have highly variable runtime, but can be very fast on graphs such as webbase (the point nearest 10^9 on the x-axis). We did not run the gexpm function for matrices larger than the livejournal graph.
Figure 8 – The distribution of runtimes for the gexpm method on two forest-fire graphs (left: p = 0.4; middle: p = 0.48) of various graph sizes, where graph size is computed as the sum of the number of vertices and the number of non-zeros in the adjacency matrix. The thick line is the median runtime over 50 trials, and the shaded region shows the 25% to 75% quartiles (the shaded region is very tight for the second two figures). The final plot shows the relationship between the max-degree squared and the runtime in seconds. This figure shows that the runtime scales in a nearly linear relationship with the max-degree squared, as predicted by our theory. The large deviations from the line of best fit might be explained by the fact that only a single forest-fire graph was generated for each graph size.
Table 3 – The real-world datasets we use in our experiments span three orders of magnitude in size.

    Graph          |V|          nnz(P)         nnz(P)/|V|
    itdk0304       190,914      1,215,220      6.37
    dblp-2010      226,413      1,432,920      6.33
    flickr-scc     527,476      9,357,071      17.74
    ljournal-2008  5,363,260    77,991,514     14.54
    webbase-2001   118,142,155  1,019,903,190  8.63
    twitter-2010   33,479,734   1,394,440,635  41.65
    friendster     65,608,366   3,612,134,270  55.06
Real-world networks. The datasets used are summarized in Table 3. They include a version of the flickr graph from [Bonchi et al., 2012] containing just the largest strongly-connected component of the original graph; dblp-2010 from [Boldi et al., 2011]; itdk0304 from [The Cooperative Association for Internet Data Analysis, 2005]; ljournal-2008 from [Boldi et al., 2011; Chierichetti et al., 2009]; twitter-2010 [Kwak et al., 2010]; webbase-2001 from [Hirai et al., 2000; Boldi and Vigna, 2005]; and the friendster graph from [Yang and Leskovec, 2012].
Implementation details. All experiments were performed on either a dual-processor Xeon e5-2670 system with 16 cores (total) and 256GB of RAM or a single-processor Intel i7-990X, 3.47 GHz CPU with 24 GB of RAM. Our algorithms were implemented in C++ using the Matlab MEX interface. All data structures used are memory-efficient: the solution and residual are stored as hash tables using Google's sparsehash package. The precise code for the algorithms and the experiments below are available via https://www.cs.purdue.edu/homes/dgleich/codes/nexpokit/.
Comparison. We compare our implementation with a state-of-the-art Matlab function for computing the exponential of a matrix times a vector, expmv [Al-Mohy and Higham, 2011]. We customized this method with the knowledge that ||P||_1 = 1; this single change results in a great improvement to the runtime of their code. In each experiment, we use as the "true solution" the result of a call to expmv using the 'single' option, which guarantees a 1-norm error bounded by 2^{−24}, or, for smaller problems, we use a Taylor approximation with the number of terms predicted by Lemma 12.

6.1 Accuracy on Large Entries
When both gexpm and gexpmq terminate, they satisfy a 1-norm error of ε. Many applications do not require precise solution values but instead would like the correct set of …
References and ongoing work
Kloster and Gleich, Workshop on Algorithms for the Web-graph, 2013. Also see the journal version on arXiv. www.cs.purdue.edu/homes/dgleich/codes/nexpokit

• Error analysis using the queue (almost done …)
• Better linear systems for faster convergence
• Asynchronous coordinate descent methods
• Scaling up to billion-node graphs (done …)
Supported by NSF CAREER 1149756-CCF www.cs.purdue.edu/homes/dgleich