
From Matrix to Tensor: The Transition to Numerical Multilinear Algebra

Lecture 1. Introduction to Tensor Computations

Charles F. Van Loan

Cornell University

The Gene Golub SIAM Summer School 2010, Selva di Fasano, Brindisi, Italy


What is a Tensor?

Definition

An order-$d$ tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is a real $d$-dimensional array $\mathcal{A}(1{:}n_1, \ldots, 1{:}n_d)$ where the index range in the $k$-th mode is from 1 to $n_k$.

Low-Order Tensors

A scalar is an order-0 tensor.

A vector is an order-1 tensor.

A matrix is an order-2 tensor.

We will use calligraphic font to designate tensors that have order 3 or greater, e.g., $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, etc.


Where Might They Come From?

Discretization.

$\mathcal{A}(i,j,k,\ell)$ might house the value of $f(w,x,y,z)$ at $(w,x,y,z) = (w_i, x_j, y_k, z_\ell)$.

Multiway Analysis.

$\mathcal{A}(i,j,k,\ell)$ is a value that captures an interaction between four variables/factors.
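The discretization case can be sketched in a few lines of Matlab. The function `f` and the four grids below are made up for illustration and are not from the lecture; `ndgrid` is used so that entry (i,j,k,l) of the array holds f evaluated at the corresponding grid point.

```matlab
% Hypothetical sample function and grids (assumptions, not from the lecture).
f = @(w,x,y,z) w + 2*x + 3*y + 4*z;
w = linspace(0,1,3);  x = linspace(0,1,4);
y = linspace(0,1,5);  z = linspace(0,1,6);

% ndgrid replicates the grid vectors so that W(i,j,k,l) = w(i), X(i,j,k,l) = x(j), etc.
[W,X,Y,Z] = ndgrid(w,x,y,z);

% Order-4 tensor with A(i,j,k,l) = f(w(i),x(j),y(k),z(l)).
A = f(W,X,Y,Z);
```

Here `A` is a 3-by-4-by-5-by-6 array, one mode per independent variable.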


You Have Seen them Before

Images

A color picture is an m-by-n-by-3 tensor:

A(:, :, 1) = red pixel values

A(:, :, 2) = green pixel values

A(:, :, 3) = blue pixel values
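As a small illustration of the m-by-n-by-3 layout, the sketch below builds a synthetic image (the sizes and intensity values are assumptions chosen only for the example) and reads one pixel and one averaged plane from it.

```matlab
% Synthetic 4-by-5 color image; values are made up for illustration.
m = 4; n = 5;
A = zeros(m,n,3);
A(:,:,1) = 1.0;                  % red plane
A(:,:,2) = 0.5;                  % green plane
A(:,:,3) = 0.0;                  % blue plane

pixel = squeeze(A(2,3,:))';      % 1-by-3 RGB triple at row 2, column 3
gray  = mean(A,3);               % m-by-n average over the color mode
```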


You Have Seen them Before

Block Matrices

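$$
A = \left[\begin{array}{cc|cc|cc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} \\ \hline
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \\
a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66}
\end{array}\right]
$$

Matrix entry $a_{45}$ is the (2,1) entry of the (2,3) block:

$a_{45} \Leftrightarrow \mathcal{A}(2, 3, 2, 1)$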
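One way to realize this correspondence in Matlab is a `reshape` followed by a `permute`. This is a sketch under the assumption that the 6-by-6 matrix is split into 2-by-2 blocks and that the tensor is indexed as (block row, block column, local row, local column); the test matrix is made up so that each entry displays its own indices.

```matlab
% Test matrix with A(i,j) = 10*i + j, so A(4,5) = 45.
[I,J] = ndgrid(1:6,1:6);
A = 10*I + J;

% Column-major reshape splits each index into (local, block) parts:
% T(r,bi,c,bj) = A(r + 2*(bi-1), c + 2*(bj-1)).
T = reshape(A, 2, 3, 2, 3);

% Reorder the modes to (block row, block col, local row, local col).
B = permute(T, [2 4 1 3]);

entry   = B(2,3,2,1);             % the (2,1) entry of the (2,3) block
block23 = squeeze(B(2,3,:,:));    % the whole (2,3) block
```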


The Decomposition Paradigm in Matrix Computations

Typical:

Convert the given problem into an equivalent easy-to-solve problem by using the “right” matrix decomposition.

$$PA = LU, \quad Ly = Pb, \quad Ux = y \quad \Longrightarrow \quad Ax = b$$

Also Typical:

Uncover hidden relationships by computing the “right” decomposition of the data matrix.

$$A = U\Sigma V^T \quad \Longrightarrow \quad A \approx \sum_{i=1}^{\hat{r}} \sigma_i u_i v_i^T$$
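Both uses of a decomposition can be tried on a small example. The matrix, right-hand side, and truncation level `rhat` below are assumptions chosen only for illustration; the calls are the standard `lu` and `svd`.

```matlab
% Small made-up system and data matrix (illustration only).
A = [4 3 2; 2 1 3; 3 4 1];
b = [1; 2; 3];

% "Typical": solve Ax = b through PA = LU and two triangular solves.
[L,U,P] = lu(A);
y = L \ (P*b);             % L*y = P*b
x = U \ y;                 % U*x = y
res = norm(A*x - b);       % residual, at roundoff level

% "Also typical": approximate A by the leading terms of its SVD.
[Us,S,V] = svd(A);
rhat = 2;                  % hypothetical truncation level
Ar = Us(:,1:rhat)*S(1:rhat,1:rhat)*V(:,1:rhat)';
err = norm(A - Ar);        % 2-norm error equals the first discarded singular value
```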


Two Natural Questions

Question 1.

Can we solve tensor problems by converting them to equivalent, easy-to-solve problems using a tensor decomposition?

Question 2.

Can we uncover hidden patterns in tensor data by computing an appropriate tensor decomposition?

We explore the decomposition issue for 2-by-2-by-2 tensors...


The Singular Value Decomposition

The 2-by-2 case...

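$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
=
\begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix}
\begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
\begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}^T
$$

$$
= \sigma_1 \begin{bmatrix} u_{11} \\ u_{21} \end{bmatrix} \begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix}^T
+ \sigma_2 \begin{bmatrix} u_{12} \\ u_{22} \end{bmatrix} \begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix}^T
$$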

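A reshaped presentation of the same thing...

$$
\begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}
= \sigma_1 \begin{bmatrix} v_{11}u_{11} \\ v_{11}u_{21} \\ v_{21}u_{11} \\ v_{21}u_{21} \end{bmatrix}
+ \sigma_2 \begin{bmatrix} v_{12}u_{12} \\ v_{12}u_{22} \\ v_{22}u_{12} \\ v_{22}u_{22} \end{bmatrix}
$$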


The Kronecker Product

Definition As Applied to 2-vectors

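$$
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \otimes \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} x_1 y_1 \\ x_1 y_2 \\ x_2 y_1 \\ x_2 y_2 \end{bmatrix}
$$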

2-by-2 SVD in Kronecker Terms...

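$$
\begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}
= \sigma_1 \begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix} \otimes \begin{bmatrix} u_{11} \\ u_{21} \end{bmatrix}
+ \sigma_2 \begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix} \otimes \begin{bmatrix} u_{12} \\ u_{22} \end{bmatrix}
$$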
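The Kronecker form of the 2-by-2 SVD can be checked numerically with the built-in `kron`. The test matrix below is an arbitrary choice; the check relies only on the fact that stacking the columns of u*v' gives kron(v,u).

```matlab
A = [1 2; 3 4];                     % arbitrary 2-by-2 example
[U,S,V] = svd(A);

a = A(:);                           % [a11; a21; a12; a22]
b = S(1,1)*kron(V(:,1),U(:,1)) + S(2,2)*kron(V(:,2),U(:,2));

err = norm(a - b);                  % roundoff-level if the identity holds
```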


Reshaping

Appreciate the Duality..

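$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
= \sigma_1 \cdot u_1 v_1^T + \sigma_2 \cdot u_2 v_2^T
$$

$$
\begin{bmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{bmatrix}
= \sigma_1 \cdot (v_1 \otimes u_1) + \sigma_2 \cdot (v_2 \otimes u_2)
$$

$$U = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \qquad V = \begin{bmatrix} v_1 & v_2 \end{bmatrix}$$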


Write $\mathcal{A} \in \mathbb{R}^{2\times 2\times 2}$ as a minimal sum of rank-1 tensors.

What’s a rank-1 tensor?

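If $\mathcal{R}$ has unit rank then

$$\mathcal{R} = (r_{ijk}), \qquad r_{ijk} = h_k g_j f_i$$

If $\mathcal{R} \in \mathbb{R}^{2\times 2\times 2}$ has unit rank then there exist $f, g, h \in \mathbb{R}^2$ such that

$$
\begin{bmatrix} r_{111} \\ r_{211} \\ r_{121} \\ r_{221} \\ r_{112} \\ r_{212} \\ r_{122} \\ r_{222} \end{bmatrix}
= \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} \otimes \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} \otimes \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}
$$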
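A rank-1 tensor of this kind can be built directly from the three vectors. The vectors below are made up; the sketch uses `kron` for the triple Kronecker product and a column-major `reshape` to recover the 2-by-2-by-2 array.

```matlab
f = [1; 2];  g = [3; 4];  h = [5; 6];    % made-up vectors

r = kron(h, kron(g, f));                 % 8-by-1 vector h (x) g (x) f
R = reshape(r, 2, 2, 2);                 % R(i,j,k) = f(i)*g(j)*h(k)

% Entry-by-entry check of the defining formula.
maxdiff = 0;
for i = 1:2
    for j = 1:2
        for k = 1:2
            maxdiff = max(maxdiff, abs(R(i,j,k) - f(i)*g(j)*h(k)));
        end
    end
end
```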



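Find thinnest possible $X, Y, Z \in \mathbb{R}^{2\times r}$ so

$$
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
= \sum_{k=1}^{r} z_k \otimes y_k \otimes x_k
$$

where

$$X = [\, x_1 \,|\, \cdots \,|\, x_r \,] \qquad Y = [\, y_1 \,|\, \cdots \,|\, y_r \,] \qquad Z = [\, z_1 \,|\, \cdots \,|\, z_r \,]$$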
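Going in the easy direction, from factor matrices to the tensor, takes only a short loop. The factor matrices below are made-up 2-by-r examples with r = 2; finding the thinnest X, Y, Z for a given tensor is the hard direction and is not attempted here.

```matlab
r = 2;
X = [1 2; 3 4];  Y = [1 0; 1 1];  Z = [2 1; 0 1];   % made-up 2-by-r factors

% Accumulate the sum of r triple Kronecker products.
a = zeros(8,1);
for k = 1:r
    a = a + kron(Z(:,k), kron(Y(:,k), X(:,k)));
end

A = reshape(a, 2, 2, 2);     % A(i,j,l) = sum over k of X(i,k)*Y(j,k)*Z(l,k)
val = sum(X(1,:).*Y(2,:).*Z(1,:));   % the same sum written out for entry (1,2,1)
```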



A Surprising Fact

If

$$
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
= \texttt{randn(8,1)}
$$

then

r = 2 79% of the time

r = 3 21% of the time

Compare to..

If A = randn(n,n), then 100% of the time rank(A) = n

What are the “full rank” 2-by-2-by-2 tensors?


The 79/21 Property

Interesting Fact

If the $a_{ijk}$ are randn then

$$
\det\left( \begin{bmatrix} a_{111} & a_{121} \\ a_{211} & a_{221} \end{bmatrix}
- \lambda \begin{bmatrix} a_{112} & a_{122} \\ a_{212} & a_{222} \end{bmatrix} \right) = 0
$$

has real distinct roots 79% of the time and complex conjugate roots 21% of the time.

What is the connection between this generalized eigenvalue problem and the 2-by-2-by-2 tensor rank problem?


The 79% Situation

(Real Distinct Eigenvalues)

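There exist nonsingular $X = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$ and $Y = \begin{bmatrix} y_1 & y_2 \end{bmatrix}$ so

$$
\begin{bmatrix} a_{111} & a_{121} \\ a_{211} & a_{221} \end{bmatrix}
= X \begin{bmatrix} \alpha_1 & 0 \\ 0 & \alpha_2 \end{bmatrix} Y^T
= \alpha_1 x_1 y_1^T + \alpha_2 x_2 y_2^T
$$

$$
\begin{bmatrix} a_{112} & a_{122} \\ a_{212} & a_{222} \end{bmatrix}
= X \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} Y^T
= x_1 y_1^T + x_2 y_2^T
$$

i.e.,

$$
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \end{bmatrix}
= \alpha_1 (y_1 \otimes x_1) + \alpha_2 (y_2 \otimes x_2)
$$

$$
\begin{bmatrix} a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
= (y_1 \otimes x_1) + (y_2 \otimes x_2)
$$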


The 79% Situation

Stack ’em

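$$
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
= \begin{bmatrix} \alpha_1 \\ 1 \end{bmatrix} \otimes (y_1 \otimes x_1)
+ \begin{bmatrix} \alpha_2 \\ 1 \end{bmatrix} \otimes (y_2 \otimes x_2)
$$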

A has rank 2.
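The rank-2 construction can be traced numerically. The sketch below assumes a tensor whose back slice is nonsingular and whose pencil has real distinct eigenvalues (the slices are made up so that this holds); it obtains X and the alphas from an ordinary eigenvalue problem, which is one convenient route and not necessarily how one would do it in production code.

```matlab
% Made-up tensor: symmetric front slice gives real distinct eigenvalues.
A = zeros(2,2,2);
A(:,:,1) = [2 1; 1 3];              % front slice
A(:,:,2) = [1 0; 0 1];              % back slice (assumed nonsingular)
A1 = A(:,:,1);  A2 = A(:,:,2);

% A1 = X*D*Y' and A2 = X*Y' imply A1*inv(A2) = X*D*inv(X).
[X,D] = eig(A1 / A2);
alpha = diag(D);
Y = (X \ A2)';                      % from Y' = inv(X)*A2

% Two rank-1 terms, stacked as in the slide.
a2 = kron([alpha(1); 1], kron(Y(:,1), X(:,1))) ...
   + kron([alpha(2); 1], kron(Y(:,2), X(:,2)));
err = norm(A(:) - a2);
```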


The 21% Situation

(Complex Conjugate Eigenvalues)

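There exist nonsingular $X = \begin{bmatrix} x_1 & x_2 \end{bmatrix}$ and $Y = \begin{bmatrix} y_1 & y_2 \end{bmatrix}$ so

$$
\begin{bmatrix} a_{111} & a_{121} \\ a_{211} & a_{221} \end{bmatrix}
= X \begin{bmatrix} \alpha_1 & \alpha_2 \\ -\alpha_2 & \alpha_1 \end{bmatrix} Y^T
$$

$$= \alpha_1 (x_1 y_1^T + x_2 y_2^T) + \alpha_2 (x_1 y_2^T - x_2 y_1^T)$$

$$= \alpha_1 (x_1 + x_2)(y_1 + y_2)^T - (\alpha_1 - \alpha_2)\, x_1 y_2^T - (\alpha_1 + \alpha_2)\, x_2 y_1^T$$

$$
\begin{bmatrix} a_{112} & a_{122} \\ a_{212} & a_{222} \end{bmatrix}
= X \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} Y^T
$$

$$= x_1 y_1^T + x_2 y_2^T$$

$$= (x_1 + x_2)(y_1 + y_2)^T - x_1 y_2^T - x_2 y_1^T$$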


The 21% Situation

(Complex Conjugate Eigenvalues)

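$$
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
= \begin{bmatrix} \alpha_1 \\ 1 \end{bmatrix} \otimes (y_1 + y_2) \otimes (x_1 + x_2)
- \begin{bmatrix} \alpha_1 - \alpha_2 \\ 1 \end{bmatrix} \otimes y_2 \otimes x_1
- \begin{bmatrix} \alpha_1 + \alpha_2 \\ 1 \end{bmatrix} \otimes y_1 \otimes x_2
$$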

A has rank 3.
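The three-term identity can be verified on a concrete instance. The values of X, Y, alpha1, and alpha2 below are made up; the three terms are written in the form that follows from the two matrix expansions above, where x1*y2' stacks to kron(y2,x1) and x2*y1' stacks to kron(y1,x2).

```matlab
alpha1 = 1;  alpha2 = 2;                 % made-up values
X = [1 2; 0 1];  Y = [1 0; 3 1];         % made-up nonsingular matrices
x1 = X(:,1); x2 = X(:,2); y1 = Y(:,1); y2 = Y(:,2);

A1 = X*[alpha1 alpha2; -alpha2 alpha1]*Y';   % front slice
A2 = X*Y';                                   % back slice
a  = [A1(:); A2(:)];

t1 = kron([alpha1; 1],          kron(y1 + y2, x1 + x2));
t2 = kron([alpha1 - alpha2; 1], kron(y2, x1));
t3 = kron([alpha1 + alpha2; 1], kron(y1, x2));
err = norm(a - (t1 - t2 - t3));
```

The check confirms that three rank-1 terms suffice for this instance; it does not by itself show that two are impossible.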


Matlab: 2-by-2-by-2 Tensor Rank

% Script L1
% Estimates the probability that randn(2,2,2) has rank 2:
N = 1000000; count = 0;
for eg = 1:N
    A = randn(2,2,2);
    A_front = [A(1,1,1) A(1,2,1); A(2,1,1) A(2,2,1)];
    A_back  = [A(1,1,2) A(1,2,2); A(2,1,2) A(2,2,2)];
    % Real QZ: S is quasi-upper triangular, so S(2,1) is zero
    % when the eigenvalues of the pencil are real.
    [S,T] = qz(A_front,A_back,'real');
    if S(2,1) == 0
        count = count + 1;
    end
end
ProbARank2 = count/N


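Problem 1.1. What happens if we use rand instead of randn in the script L1?

Problem 1.2. Explain why the script L1 generates the same results regardless of the choice for A_front and A_back:

Choice 1: A_front = $\begin{bmatrix} a_{111} & a_{121} \\ a_{211} & a_{221} \end{bmatrix}$, A_back = $\begin{bmatrix} a_{112} & a_{122} \\ a_{212} & a_{222} \end{bmatrix}$

Choice 2: A_front = $\begin{bmatrix} a_{111} & a_{112} \\ a_{211} & a_{212} \end{bmatrix}$, A_back = $\begin{bmatrix} a_{121} & a_{122} \\ a_{221} & a_{222} \end{bmatrix}$

Choice 3: A_front = $\begin{bmatrix} a_{111} & a_{112} \\ a_{121} & a_{122} \end{bmatrix}$, A_back = $\begin{bmatrix} a_{211} & a_{212} \\ a_{221} & a_{222} \end{bmatrix}$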


Matlab: Generating Tensors

% A 2-by-3-by-4 random tensor...
A = randn(2,3,4);
% Dimensions can be specified by a vector...
n = [2,3,4];
A = randn(n);
% The functions ones and zeros...
A = ones(n);
A = zeros(n);
% Size and reshaping...
n = size(A);
N = prod(n);
a = reshape(A,N,1);
B = reshape(a,n(length(n):-1:1));
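Since `reshape` never moves data, it helps to see where a given entry lands. The sketch below fills a tensor with its own linear indices (an illustrative choice) and confirms the column-major position formula; it also shows that reshaping to the reversed dimension vector keeps the same data order and is therefore not a transposition.

```matlab
n = [2,3,4];
A = reshape(1:prod(n), n);       % each entry equals its own linear index

% Column-major rule: A(i1,i2,i3) sits at position
% i1 + n(1)*(i2-1) + n(1)*n(2)*(i3-1) of A(:).
i = [2,3,4];
pos = i(1) + n(1)*(i(2)-1) + n(1)*n(2)*(i(3)-1);

a = A(:);
same = (a(pos) == A(2,3,4));     % true if the position formula is right

B = reshape(a, n(end:-1:1));     % 4-by-3-by-2 with the same data order
```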


Matlab: Tensor Operations

% Applying a function to each component...
n = [2,3,4];
A = randn(n);
for i1=1:n(1)
    for i2=1:n(2)
        for i3=1:n(3)
            A(i1,i2,i3) = sin(A(i1,i2,i3));
        end
    end
end
% In lieu of the triple loop...
A = sin(A);
% The usual cast of characters...
A = floor(A); A = ceil(A); A = round(A);
A = real(A); A = imag(A);
A = sqrt(A); A = abs(A);


Matlab: Tensor Operations

n = [2,3,4];
A = randn(n);
B = randn(n);
% These operations are legal...
C = 3*A;
C = -A;
C = A+1;
C = A.^2;
% These operations are legal if A and B
% have the same dimension...
C = A + B;
C = A./B;
C = A.*B;
C = A.^B;


Nearness Problems

The Nearest Rank-1 Matrix Problem

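If $A \in \mathbb{R}^{n_1 \times n_2}$ has SVD $A = U\Sigma V^T$ then $B_{opt} = \sigma_1 u_1 v_1^T$ minimizes

$$\phi(B) = \| A - B \|_F \quad \text{subject to} \quad \operatorname{rank}(B) = 1.$$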
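For matrices the nearest rank-1 problem is solved by one call to `svd`. The matrix below is a made-up 3-by-2 example, so it has just two singular values and the optimal Frobenius-norm error equals the second one.

```matlab
A = [3 1; 1 3; 0 2];               % made-up example matrix
[U,S,V] = svd(A);

Bopt = S(1,1)*U(:,1)*V(:,1)';      % sigma_1 * u_1 * v_1'
phi  = norm(A - Bopt, 'fro');      % equals sigma_2 for this 3-by-2 matrix
```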

The Nearest Rank-1 Tensor to A ∈ IR2×2×2

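Find unit 2-norm vectors $u, v, w \in \mathbb{R}^2$ and scalar $\sigma$ so that $\| a - \sigma \cdot w \otimes v \otimes u \|_2$ is minimized where

$$
a = \begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
$$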


A Highly Structured Nonlinear Optimization Problem

It depends on four parameters...

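$$
\phi(\sigma, \theta_1, \theta_2, \theta_3) =
\left\| a - \sigma
\begin{bmatrix} \cos(\theta_3) \\ \sin(\theta_3) \end{bmatrix} \otimes
\begin{bmatrix} \cos(\theta_2) \\ \sin(\theta_2) \end{bmatrix} \otimes
\begin{bmatrix} \cos(\theta_1) \\ \sin(\theta_1) \end{bmatrix}
\right\|_2
$$

$$
= \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
- \sigma \cdot
\begin{bmatrix} c_3 c_2 c_1 \\ c_3 c_2 s_1 \\ c_3 s_2 c_1 \\ c_3 s_2 s_1 \\ s_3 c_2 c_1 \\ s_3 c_2 s_1 \\ s_3 s_2 c_1 \\ s_3 s_2 s_1 \end{bmatrix}
\right\|_2
$$

$$c_i = \cos(\theta_i), \quad s_i = \sin(\theta_i), \quad i = 1{:}3$$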



and it can be reshaped...

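$$
\phi = \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
- \sigma \cdot
\begin{bmatrix} c_3 c_2 c_1 \\ c_3 c_2 s_1 \\ c_3 s_2 c_1 \\ c_3 s_2 s_1 \\ s_3 c_2 c_1 \\ s_3 c_2 s_1 \\ s_3 s_2 c_1 \\ s_3 s_2 s_1 \end{bmatrix}
\right\|_2
= \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
-
\begin{bmatrix}
c_3 c_2 & 0 \\ 0 & c_3 c_2 \\
c_3 s_2 & 0 \\ 0 & c_3 s_2 \\
s_3 c_2 & 0 \\ 0 & s_3 c_2 \\
s_3 s_2 & 0 \\ 0 & s_3 s_2
\end{bmatrix}
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
\right\|_2
$$

$$
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} c_1 \\ s_1 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} \cos(\theta_1) \\ \sin(\theta_1) \end{bmatrix}
$$

Idea: Improve $\sigma$ and $\theta_1$ by minimizing with respect to $x_1$ and $y_1$, holding $\theta_2$ and $\theta_3$ fixed.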
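A single update of this kind can be sketched as follows. The data vector and the two fixed angles are made up for illustration. The 8-by-2 coefficient matrix is formed with `kron`, and because its columns are orthonormal whenever the fixed vectors have unit length, the backslash solve agrees with a plain matrix-vector product.

```matlab
a = (1:8)';                           % made-up data vector
theta2 = 0.3;  theta3 = 1.1;          % hypothetical fixed angles
v = [cos(theta2); sin(theta2)];
w = [cos(theta3); sin(theta3)];

M  = kron(kron(w,v), eye(2));         % the 8-by-2 matrix multiplying [x1; y1]
xy = M \ a;                           % linear least squares for [x1; y1]

sigma  = norm(xy);                    % since [x1; y1] = sigma*[c1; s1]
theta1 = atan2(xy(2), xy(1));
u   = [cos(theta1); sin(theta1)];
phi = norm(a - sigma*kron(w, kron(v,u)));
```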




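$$
\phi = \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
- \sigma \cdot
\begin{bmatrix} c_3 c_2 c_1 \\ c_3 c_2 s_1 \\ c_3 s_2 c_1 \\ c_3 s_2 s_1 \\ s_3 c_2 c_1 \\ s_3 c_2 s_1 \\ s_3 s_2 c_1 \\ s_3 s_2 s_1 \end{bmatrix}
\right\|_2
= \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
-
\begin{bmatrix}
c_3 c_1 & 0 \\ c_3 s_1 & 0 \\
0 & c_3 c_1 \\ 0 & c_3 s_1 \\
s_3 c_1 & 0 \\ s_3 s_1 & 0 \\
0 & s_3 c_1 \\ 0 & s_3 s_1
\end{bmatrix}
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}
\right\|_2
$$

$$
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} c_2 \\ s_2 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} \cos(\theta_2) \\ \sin(\theta_2) \end{bmatrix}
$$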





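$$
\phi = \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
- \sigma \cdot
\begin{bmatrix} c_3 c_2 c_1 \\ c_3 c_2 s_1 \\ c_3 s_2 c_1 \\ c_3 s_2 s_1 \\ s_3 c_2 c_1 \\ s_3 c_2 s_1 \\ s_3 s_2 c_1 \\ s_3 s_2 s_1 \end{bmatrix}
\right\|_2
= \left\|
\begin{bmatrix} a_{111} \\ a_{211} \\ a_{121} \\ a_{221} \\ a_{112} \\ a_{212} \\ a_{122} \\ a_{222} \end{bmatrix}
-
\begin{bmatrix}
c_2 c_1 & 0 \\ c_2 s_1 & 0 \\
s_2 c_1 & 0 \\ s_2 s_1 & 0 \\
0 & c_2 c_1 \\ 0 & c_2 s_1 \\
0 & s_2 c_1 \\ 0 & s_2 s_1
\end{bmatrix}
\begin{bmatrix} x_3 \\ y_3 \end{bmatrix}
\right\|_2
$$

$$
\begin{bmatrix} x_3 \\ y_3 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} c_3 \\ s_3 \end{bmatrix}
= \sigma \cdot \begin{bmatrix} \cos(\theta_3) \\ \sin(\theta_3) \end{bmatrix}
$$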



Alternating Least Squares

A Common Framework for Tensor-Related Optimization

Choose a subset of the unknowns such that if they are (temporarily) fixed, then we are presented with an easy-to-solve problem in the remaining unknowns.

By choosing different subsets, cycle through all the unknowns.

Repeat until converged.

“Easy-to-solve” usually means “linear.”
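Applied to the nearest rank-1 problem for a 2-by-2-by-2 tensor, the framework might look like the sketch below. The test tensor, the naive starting vectors, and the fixed sweep count are all assumptions made for illustration; a careful implementation would choose a better start, test for convergence, and guard against a zero update.

```matlab
A = reshape((1:8)'.^2, 2, 2, 2);      % made-up 2-by-2-by-2 tensor
a = A(:);
u = [1;0];  v = [1;0];  w = [1;0];    % naive start (an assumption)

for sweep = 1:50                      % fixed sweep count, no convergence test
    % Each 8-by-2 matrix below has orthonormal columns because the two
    % frozen vectors have unit length, so its transpose times a is the
    % least squares solution.
    x = kron(kron(w,v), eye(2))' * a;  sigma = norm(x);  u = x/sigma;
    x = kron(kron(w,eye(2)), u)' * a;  sigma = norm(x);  v = x/sigma;
    x = kron(kron(eye(2),v), u)' * a;  sigma = norm(x);  w = x/sigma;
end

phi = norm(a - sigma*kron(w, kron(v,u)));
```

Each of the three updates freezes two of the vectors and solves a linear problem for the third, which is the "easy-to-solve means linear" pattern described above.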


Problem 1.3. Write a Matlab function [sigma,theta] = NearestRank1(A) that takes a tensor $\mathcal{A} \in \mathbb{R}^{2\times 2\times 2}$ and uses alternating least squares to produce an estimate of a nearest rank-1 approximant. What is a good starting value? The linear least squares problems that need to be solved are highly structured. Explain and exploit.

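Problem 1.4. Fix $\theta_3$. How would you choose $\sigma$, $\theta_1$ and $\theta_2$ so that

$$
\phi(\sigma, \theta_1, \theta_2, \theta_3) =
\left\| a - \sigma
\begin{bmatrix} \cos(\theta_3) \\ \sin(\theta_3) \end{bmatrix} \otimes
\begin{bmatrix} \cos(\theta_2) \\ \sin(\theta_2) \end{bmatrix} \otimes
\begin{bmatrix} \cos(\theta_1) \\ \sin(\theta_1) \end{bmatrix}
\right\|_2
$$

is minimized?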


Summary of Lecture 1.

Key Words

Reshaping is about turning a matrix into a differently sized matrix or into a tensor or into a vector.

The Kronecker product is an operation between two matrices that produces a highly structured block matrix.

A rank-1 tensor can be reshaped into a multiple Kronecker product of vectors.

A tensor decomposition expresses a given tensor in terms of simpler tensors.

The alternating least squares approach to multilinear LS is based on solving a sequence of linear LS problems, each obtained by freezing all but a subset of the unknowns.


What is the Course About?

The Lectures...

Lecture 1. Introduction to Tensor Computations

Lecture 2. Tensor Unfoldings

Lecture 3. Transpositions, Kronecker Products, Contractions

Lecture 4. Tensor-Related Singular Value Decompositions

Lecture 5. The CP Representation and Rank

Lecture 6. The Tucker Representation

Lecture 7. Other Decompositions and Nearness Problems

Lecture 8. Multilinear Rayleigh Quotients

Lecture 9. The Curse of Dimensionality

Lecture 10. Special Topics



The Next Big Thing

Scalar-Level Thinking

1960’s ⇓ ⇐ The factorization paradigm: $LU$, $LDL^T$, $QR$, $U\Sigma V^T$, etc.

Matrix-Level Thinking

1980’s ⇓ ⇐ Cache utilization, parallel computing, LAPACK, etc.

Block Matrix-Level Thinking

2000’s ⇓ ⇐ New applications, factorizations, data structures, nonlinear analysis, optimization strategies, etc.

Tensor-Level Thinking



The Curse of Dimensionality

In Matrix Computations, to say that $A \in \mathbb{R}^{n_1 \times n_2}$ is “big” is to say that both $n_1$ and $n_2$ are big.

In Tensor Computations, to say that $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ is “big” is to say that $n_1 n_2 \cdots n_d$ is big and this need not require big $n_k$. E.g., $n_1 = n_2 = \cdots = n_{1000} = 2$.

Algorithms that scale with d will induce a transition...

Matrix-Based Scientific Computation

⇓

Tensor-Based Scientific Computation
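A quick calculation shows how fast the entry count grows with the order d even when every mode has dimension 2. The value d = 30 in the last line is an extra illustrative case, not one taken from the lecture, and the storage figure assumes 8-byte double precision entries.

```matlab
d = 1000;                          % number of modes, each of dimension 2
log10_entries = d*log10(2);        % log10 of n_1*n_2*...*n_d = 2^d, about 301

bytes_d30 = 8 * 2^30;              % bytes needed when d = 30 (8 GiB)
```

So the example with 1000 modes has roughly 10^301 entries, far beyond anything that can be stored explicitly.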


The “Geometry” of the Tensor Research Community

[Figure: diagram with arrows linking Applications, Nonlinear Optimization, Matrix Computations, Multilinear Algebra, Nonlinear Analysis, Statistics, Programming Languages, Decompositions, and Software Libraries.]

