From Matrix to Tensor: The Transition to Numerical Multilinear Algebra
Lecture 9. The Curse of Dimensionality
Charles F. Van Loan
Cornell University
The Gene Golub SIAM Summer School 2010, Selva di Fasano, Brindisi, Italy
⊗ Transition to Computational Multilinear Algebra ⊗ Lecture 9. The Curse of Dimensionality 1 / 27
Where We Are
Lecture 1. Introduction to Tensor Computations
Lecture 2. Tensor Unfoldings
Lecture 3. Transpositions, Kronecker Products, Contractions
Lecture 4. Tensor-Related Singular Value Decompositions
Lecture 5. The CP Representation and Tensor Rank
Lecture 6. The Tucker Representation
Lecture 7. Other Decompositions and Nearness Problems
Lecture 8. Multilinear Rayleigh Quotients
Lecture 9. The Curse of Dimensionality
Lecture 10. Special Topics
What is this Lecture About?
Big Problems
1. A single N-by-N matrix problem is big if N is big.
2. A problem that involves N small p-by-p problems is big if N is big.
3. A problem that involves a tensor A ∈ IR^(n1×···×nd) is big if

N = n1 n2 · · · nd

is big, and that can happen rather easily if d is big.
What is this Lecture About?
Data Sparse Representation
We are used to solving big matrix problems when the matrix is data-sparse, i.e., when A ∈ IR^(N×N) can be represented with many fewer than N^2 numbers.
What if N is so big that we cannot even store length-N vectors?
How could we apply (for example) the Rayleigh Quotient procedure in such a situation?
What is this Lecture About?
A Very Large Eigenvalue Problem
We will look at a problem where A ∈ IR^(2^d × 2^d) is data sparse but where d is sufficiently big to make the storage of length-2^d vectors impossible.
Vectors will be approximated by data sparse tensors of high order.
A Very Large Eigenvalue Problem
A Problem From Quantum Chemistry
Given a 2^d-by-2^d symmetric matrix H, find a vector a that minimizes

r(a) = (a^T H a) / (a^T a)

Of course: a = a_min, λ = r(a_min) ⇒ H a = λ a.
What if d = 100?
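A quick back-of-the-envelope check makes the answer plain. A sketch in plain Python (no packages needed):

```python
# Storage for one dense eigenvector of H when d = 100:
d = 100
entries = 2 ** d              # the vector has 2^d components
bytes_needed = 8 * entries    # 8 bytes per double-precision entry

# 2^100 is a 31-digit number; the storage exceeds 10^31 bytes,
# i.e. roughly ten billion zettabytes.
assert len(str(entries)) == 31
assert bytes_needed > 10 ** 31
```

So for d = 100 we cannot even write the vector down, let alone run an iteration on it.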
The Google Slide
A Very Large Eigenvalue Problem
The H-Matrix
H = Σ_{i,j=1:d} t_ij H_i^T H_j + Σ_{i,j,k,l=1:d} v_ijkl H_i^T H_j^T H_k H_l

H_i = I_(2^(i-1)) ⊗ [ 0 1 ; 0 0 ] ⊗ I_(2^(d-i))

T ∈ IR^(d×d) is symmetric and V ∈ IR^(d×d×d×d) has symmetries.
Sparsity
nzeros = ( (1/64) d^4 − (3/32) d^3 + (27/64) d^2 − (11/32) d + 1 ) 2^d − 1
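For tiny d the operators H_i can be formed explicitly, which makes the structure above concrete. A minimal numpy sketch (the weights t and the rank-one choice of v are illustrative, not physical data):

```python
import numpy as np

def Hmat(i, d):
    """H_i = I_{2^(i-1)} kron [[0,1],[0,0]] kron I_{2^(d-i)}."""
    a = np.array([[0.0, 1.0], [0.0, 0.0]])
    return np.kron(np.kron(np.eye(2 ** (i - 1)), a), np.eye(2 ** (d - i)))

d = 3
Hs = [Hmat(i, d) for i in range(1, d + 1)]
for Hi in Hs:
    assert Hi.shape == (2 ** d, 2 ** d)
    assert np.count_nonzero(Hi) == 2 ** (d - 1)  # each H_i is very sparse
    assert np.allclose(Hi @ Hi, 0)               # and nilpotent: H_i^2 = 0

# Assemble H from random symmetric weights.  Taking v(i,j,k,l) = B(i,j)B(k,l)
# with B symmetric gives a v with the symmetries required of V.
rng = np.random.default_rng(0)
t = rng.standard_normal((d, d)); t = t + t.T
B = rng.standard_normal((d, d)); B = B + B.T
H = sum(t[i, j] * Hs[i].T @ Hs[j] for i in range(d) for j in range(d))
K = sum(B[i, j] * Hs[i] @ Hs[j] for i in range(d) for j in range(d))
H = H + K.T @ K
assert np.allclose(H, H.T)                       # H is symmetric
```

The dense construction is only feasible because d = 3 here; the whole point of the lecture is that for d = 100 the matrix exists only implicitly.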
A Very Large Eigenvalue Problem
Modeling Electron Interactions
Have d "sites" (grid points) in physical space.
The goal is to compute a wave function, an element of a 2^d-dimensional Hilbert space.
The Hilbert space is a product of d 2-dimensional Hilbert spaces. (A site is either occupied or not occupied.)
A (discretized) wave function is a d-tensor, 2-by-2-by-· · ·-by-2. It is the vector that minimizes a^T H a / a^T a where...
A Very Large Eigenvalue Problem
The H-Matrix
H = Σ_{i,j} t_ij H_i^T H_j + Σ_{i,j,k,l} v_ijkl H_i^T H_j^T H_k H_l
        ⇑ kinetic energy weights       ⇑ potential energy weights

H_i = I_(2^(i-1)) ⊗ [ 0 1 ; 0 0 ] ⊗ I_(2^(d-i))
A Very Large Eigenvalue Problem
Dealing with N = 2^d ≈ 2^100

Intractable:   min over a ∈ IR^N of  (a^T H a)/(a^T a)

Tractable:     min over a ∈ IR^N, a data sparse, of  (a^T H a)/(a^T a)
Tensor Networks
Definition
A tensor network is a tensor of high dimension that is built up from many sparsely connected tensors of low dimension.
TN slides
Linear Tensor Network
Recall the Block Vec Operation
Writing [X1; X2; . . . ] for a block column vector:

[F1; F2] ⊗ [G1; G2; G3] ⊗ [H1; H2]
   = [F1; F2] ⊗ [G1H1; G1H2; G2H1; G2H2; G3H1; G3H2]
   = [F1G1H1; F1G1H2; F1G2H1; F1G2H2; F1G3H1; F1G3H2;
      F2G1H1; F2G1H2; F2G2H1; F2G2H2; F2G3H1; F2G3H2]
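In code, the block vec operation is just "every block of the left factor times every block of the right factor," with the left index varying slowest. A small numpy sketch (the helper name bvec is ours, not from the lecture):

```python
import numpy as np

def bvec(X, Y):
    """Block vec product of two block column vectors (lists of blocks):
    every block of X times every block of Y, X's index varying slowest."""
    return [Xi @ Yj for Xi in X for Yj in Y]

rng = np.random.default_rng(0)
F = [rng.standard_normal((2, 3)) for _ in range(2)]   # F1, F2
G = [rng.standard_normal((3, 3)) for _ in range(3)]   # G1, G2, G3
H = [rng.standard_normal((3, 2)) for _ in range(2)]   # H1, H2

blocks = bvec(F, bvec(G, H))          # the 12 blocks F_a G_b H_c
assert len(blocks) == 12
assert np.allclose(blocks[0], F[0] @ G[0] @ H[0])     # F1 G1 H1
assert np.allclose(blocks[-1], F[1] @ G[2] @ H[1])    # F2 G3 H2

# With 1-by-1 blocks the operation reduces to the ordinary Kronecker product:
x, y = rng.standard_normal(2), rng.standard_normal(3)
k = bvec([np.array([[xi]]) for xi in x], [np.array([[yi]]) for yi in y])
assert np.allclose(np.concatenate(k).ravel(), np.kron(x, y))
```

The last check is the point of the notation: the block vec product generalizes the Kronecker product by letting block-times-block mean matrix multiplication.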
Linear Tensor Network
In the "Language" of Block Vec Products...

a = [A11; A21] ⊗ [A12; A22] ⊗ · · · ⊗ [A1,d−1; A2,d−1] ⊗ [A1d; A2d]

where

[A11; A21] = [w1^T; w2^T]      (2 row vectors)
[A1k; A2k],  k = 2:d−1          (each block m-by-m)
[A1d; A2d] = [z1; z2]           (2 column vectors)

a is a length-2^d vector that depends on O(dm^2) numbers
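The storage claim can be made concrete by expanding a small linear tensor network into the full length-2^d vector and comparing the two counts. A numpy sketch (d and m are illustrative values; bvec is our helper name for the block vec product):

```python
import numpy as np

def bvec(X, Y):
    """Block vec product: every block of X times every block of Y."""
    return [Xi @ Yj for Xi in X for Yj in Y]

d, m = 10, 3
rng = np.random.default_rng(0)
factors = [[rng.standard_normal((1, m)) for _ in range(2)]]       # w1^T, w2^T
factors += [[rng.standard_normal((m, m)) for _ in range(2)]
            for _ in range(d - 2)]                                # A1k, A2k
factors += [[rng.standard_normal((m, 1)) for _ in range(2)]]      # z1, z2

blocks = factors[0]
for fk in factors[1:]:
    blocks = bvec(blocks, fk)
# each block is (1-by-m)(m-by-m)...(m-by-1), i.e. a scalar
a = np.array([b.item() for b in blocks])

params = 2 * m + 2 * (d - 2) * m * m + 2 * m
assert a.size == 2 ** d        # the represented vector has length 2^d = 1024
assert params == 156           # ...but is defined by only O(d m^2) numbers
```

For d = 100 and modest m the representation still needs only a few thousand numbers, while the vector it encodes has 2^100 entries.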
Back to the Main Problem...
Constrained Minimization
Minimize

r(a) = (a^T H a)/(a^T a)

subject to the constraint that

a = [A11; A21] ⊗ [A12; A22] ⊗ · · · ⊗ [A1,d−1; A2,d−1] ⊗ [A1d; A2d]

Let us look at both the denominator and the numerator in light of the fact that N = 2^d.
Avoiding O(2^d)
2-Norm of a Linear Tensor Network...
If

a = [A11; A21] ⊗ [A12; A22] ⊗ · · · ⊗ [A1,d−1; A2,d−1] ⊗ [A1d; A2d]

then

a^T a = w^T ( ∏_{k=2:d−1} ( A1k ⊗ A1k + A2k ⊗ A2k ) ) z

where

w = A11 ⊗ A11 + A21 ⊗ A21 = w1 ⊗ w1 + w2 ⊗ w2
z = A1d ⊗ A1d + A2d ⊗ A2d = z1 ⊗ z1 + z2 ⊗ z2

A1k and A2k are m-by-m, k = 2:d−1. Overall work is O(dm^3).
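The identity can be checked numerically on a small network: the left-hand side costs O(2^d), while the right-hand side never touches a length-2^d vector. A numpy sketch (bvec is our helper name; the per-factor update uses the reshape form of the Kronecker identity, an O(m^3) step):

```python
import numpy as np

def bvec(X, Y):
    """Block vec product: every block of X times every block of Y."""
    return [Xi @ Yj for Xi in X for Yj in Y]

d, m = 8, 3
rng = np.random.default_rng(0)
W = [rng.standard_normal((1, m)) for _ in range(2)]            # w1^T, w2^T
A = [[rng.standard_normal((m, m)) for _ in range(2)] for _ in range(d - 2)]
Z = [rng.standard_normal((m, 1)) for _ in range(2)]            # z1, z2

# O(2^d) reference value: expand a in full and square its norm.
blocks = W
for Ak in A:
    blocks = bvec(blocks, Ak)
blocks = bvec(blocks, Z)
a = np.array([b.item() for b in blocks])
ref = float(a @ a)

# Data-sparse evaluation:
#   a^T a = w^T ( prod_k (A1k kron A1k + A2k kron A2k) ) z.
# Propagating the length-m^2 row vector w through each factor via a
# reshape (row-major unvec) costs O(m^3) per factor, never O(m^4) or 2^d.
w = sum(np.kron(Wi, Wi) for Wi in W)       # 1-by-m^2
z = sum(np.kron(Zi, Zi) for Zi in Z)       # m^2-by-1
for A1k, A2k in A:
    Wm = w.reshape(m, m)
    w = (A1k.T @ Wm @ A1k + A2k.T @ Wm @ A2k).reshape(1, m * m)
assert np.isclose(float(w @ z), ref)
```

The same left-to-right sweep evaluates a^T H a once H is written in terms of Kronecker-structured pieces.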
Avoiding O(d^4)
Recall...
H = Σ_{i,j} t_ij H_i^T H_j + Σ_{i,j,k,l} v_ijkl H_i^T H_j^T H_k H_l

H_i = I_(2^(i-1)) ⊗ [ 0 1 ; 0 0 ] ⊗ I_(2^(d-i))

The V-Tensor Has Familiar Symmetries

V(i, j, k, l) = V(j, i, k, l) = V(i, j, l, k) = V(k, l, i, j)

and so we can find symmetric matrices B1, . . . , Br so

V = B1 ◦ B1 + · · · + Br ◦ Br
Avoiding O(d^4)
Idea
Approximate V with B1 ◦ B1 (or some short sum of the B's) because then v_ijkl = B1(i, j) B1(k, l) and

H = Σ_{i,j} t_ij H_i^T H_j + Σ_{i,j,k,l} v_ijkl H_i^T H_j^T H_k H_l

  = Σ_{i,j} t_ij H_i^T H_j + ( Σ_{i,j} B1(i, j) H_i H_j )^T ( Σ_{i,j} B1(i, j) H_i H_j )

Think about a^T H a and note that we have reduced the evaluation by a factor of O(d^2).
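The decomposition V = B1 ◦ B1 + · · · + Br ◦ Br and the truncation idea can be seen on a toy tensor. A numpy sketch (V is synthetic, built from two random symmetric B's; ◦ is the outer product, coded with einsum):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B1 = rng.standard_normal((n, n)); B1 = B1 + B1.T
B2 = rng.standard_normal((n, n)); B2 = B2 + B2.T

# V = B1 o B1 + B2 o B2, i.e. v(i,j,k,l) = B1(i,j)B1(k,l) + B2(i,j)B2(k,l)
V = np.einsum('ij,kl->ijkl', B1, B1) + np.einsum('ij,kl->ijkl', B2, B2)

# V has exactly the symmetries listed above:
assert np.allclose(V, V.transpose(1, 0, 2, 3))   # v(i,j,k,l) = v(j,i,k,l)
assert np.allclose(V, V.transpose(0, 1, 3, 2))   # v(i,j,k,l) = v(i,j,l,k)
assert np.allclose(V, V.transpose(2, 3, 0, 1))   # v(i,j,k,l) = v(k,l,i,j)

# Conversely, the symmetric n^2-by-n^2 unfolding V_(ij),(kl) reveals the
# number of terms: its rank equals the length of the sum.
M = V.reshape(n * n, n * n)
assert np.allclose(M, M.T)
r = np.linalg.matrix_rank(M)
assert r == 2
```

Eigenvectors of the unfolding, reshaped back to n-by-n, recover (scaled) candidates for the B_r, which is one way to find a short sum to truncate.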
Optimization Approach
For k = 1:d ...
Minimize

r(a) = (a^T H a)/(a^T a) = r(A1k, A2k)

subject to the constraint that

a = [A11; A21] ⊗ [A12; A22] ⊗ · · · ⊗ [A1,d−1; A2,d−1] ⊗ [A1d; A2d]

and all but A1k and A2k are fixed.

This projected subproblem can be reshaped into a smaller, 2m^2-by-2m^2 Rayleigh Quotient minimization...
Optimization Approach
The Subproblem
Minimize

r(a_k) = (a_k^T H_k a_k)/(a_k^T a_k)

where

a_k = [vec(A1k); vec(A2k)]

and

H_k = T_k^T H T_k,   T_k ∈ IR^(2^d × 2m^2)

can be formed in time polynomial in m.
Tensor-Based Thinking
Key Attributes
1. An ability to reason at the index level about the constituent contractions and the order of their evaluation.
2. An ability to reason at the block matrix level in order to expose fast, underlying Kronecker product operations.
Data-Sparse Representations and Factorizations
How Could We Compute the QR Factorization of This?
[F1; F2] ⊗ [G1; G2; G3] ⊗ [H1; H2]
   = [F1; F2] ⊗ [G1H1; G1H2; G2H1; G2H2; G3H1; G3H2]
   = [F1G1H1; F1G1H2; F1G2H1; F1G2H2; F1G3H1; F1G3H2;
      F2G1H1; F2G1H2; F2G2H1; F2G2H2; F2G3H1; F2G3H2]

Without "Leaving" the Data Sparse Representation?
Data-Sparse Representations and Factorizations
QR Factorization and Block Vec Products
If

[F1; F2] = [Q1; Q2] R

then

[F1; F2] ⊗ [G1; G2; G3] ⊗ [H1; H2] = [Q1; Q2] ⊗ [RG1; RG2; RG3] ⊗ [H1; H2]

If

[RG1; RG2; RG3] = [U1; U2; U3] S

then...
Data-Sparse Representations and Factorizations
QR Factorization and Block Vec Products

[F1; F2] ⊗ [G1; G2; G3] ⊗ [H1; H2] = [Q1; Q2] ⊗ [U1; U2; U3] ⊗ [SH1; SH2]

If

[SH1; SH2] = [V1; V2] T

then

[F1; F2] ⊗ [G1; G2; G3] ⊗ [H1; H2] = ( [Q1; Q2] ⊗ [U1; U2; U3] ⊗ [V1; V2] ) T

The Matrix in Parentheses is Orthogonal
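The three-stage sweep is easy to verify numerically: QR-factor each stacked factor, push the triangular factor to the right, and check that the expanded result has orthonormal columns. A numpy sketch (block sizes are illustrative; bvec and stack_qr are our helper names):

```python
import numpy as np

def bvec(X, Y):
    """Block vec product: every block of X times every block of Y."""
    return [Xi @ Yj for Xi in X for Yj in Y]

def stack_qr(blocks):
    """QR of the vertically stacked blocks; return Q split back into
    blocks of the same row size, plus the shared triangular factor."""
    p = blocks[0].shape[0]
    Q, R = np.linalg.qr(np.vstack(blocks))
    return [Q[i * p:(i + 1) * p, :] for i in range(len(blocks))], R

rng = np.random.default_rng(0)
n = 3
F = [rng.standard_normal((n, n)) for _ in range(2)]
G = [rng.standard_normal((n, n)) for _ in range(3)]
H = [rng.standard_normal((n, n)) for _ in range(2)]

Qb, R = stack_qr(F)                       # [F1;F2]         = [Q1;Q2] R
Ub, S = stack_qr([R @ Gi for Gi in G])    # [RG1;RG2;RG3]   = [U1;U2;U3] S
Vb, T = stack_qr([S @ Hi for Hi in H])    # [SH1;SH2]       = [V1;V2] T

full = np.vstack(bvec(F, bvec(G, H)))       # explicit 12n-by-n matrix
Qfull = np.vstack(bvec(Qb, bvec(Ub, Vb)))   # the data-sparse Q, expanded
assert np.allclose(Qfull.T @ Qfull, np.eye(n))  # orthonormal columns
assert np.allclose(Qfull @ T, full)             # QR of the original product
```

Only the small per-factor QRs are ever computed; the explicit vstack expansions appear here purely as a check and would be skipped in a real data-sparse computation.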
Summary of Lecture 9.
Key Words
The Curse of Dimensionality refers to the challenges that arise when dimension increases.
Clever data-sparse representations are one way to address the issues.
A tensor network is a way of combining low-order tensors to obtain a high-order tensor.
Reliable methods that scale with dimension are the goal.