
For Peer Review

Quantic Tensor Train Method for Compression of Volume

Integral Equations Operators

Journal: Microwave and Optical Technology Letters

Manuscript ID MOP-17-1463

Wiley - Manuscript type: Research Article

Date Submitted by the Author: 21-Oct-2017

Complete List of Authors: Corona, Eduardo; University of Michigan, Ann Arbor, Mathematics; Gomez, Luis; Michielssen, Eric; University of Michigan, Ann Arbor, Department of Electrical Engineering and Computer Science

Keywords: Volume integral equations, Quantic tensor train, Adaptive integral method, Fast Multipole Method, Conjugate Gradient Fast Fourier Transform

John Wiley & Sons

Microwave and Optical Technology Letters


Quantic Tensor Train Method for Compression of Volume

Integral Equations Operators

Eduardo Corona (1), Luis J. Gomez (2)* and Eric Michielssen (3)

(1) Department of Mathematics

University of Michigan

(2) Psychiatry and Behavioral Sciences

Duke University School of Medicine

(3) Department of Electrical and Computer Engineering

University of Michigan

*Corresponding Author: Luis Gomez, 2306 huron st. Durham, NC, [email protected], (954)8165011

(ROUGH) PREPRINT – DO NOT DISTRIBUTE

Abstract – The FFT-based iterative solution of volume integral equations pertinent to the analysis of

electromagnetic scattering phenomena requires O(N) storage stemming from the need to store Toeplitz

matrices. A quantic tensor train scheme is shown to permit compression and storage of these matrices

using sublinear memory. For many applications this leads to a 3-5 orders of magnitude reduction in

storage requirements compared to FFT methods.

Index Terms – Volume integral equations (VIE), Quantic tensor train (QTT), Adaptive integral method

(AIM)


I. INTRODUCTION

Volume integral equations (VIEs) are widely used for analyzing electromagnetic (EM) scattering from

heterogeneous penetrable objects [1, 2, 3]. Applications of VIE solvers range from geoscience and remote

sensing (e.g. oil exploration [4]) to bio-electromagnetics (e.g. stimulation devices [5]), plasma physics

(e.g. communication with re-entry vehicles [6]), and circuit characterization (e.g. EMI susceptibility [7]).

Like all integral equation schemes, VIE solvers require “fast” methods to reduce the prohibitive storage

and CPU costs associated with classical implementations. All fast methods for solving VIEs proposed to

date belong to one of two categories:

(i) Fast Fourier Transform (FFT) methods, which exploit the convolutional nature of the VIE operator

to perform matrix-vector products via FFTs; representative examples include k-space [8],

pre-corrected FFT [9], and adaptive integral [10] methods.

(ii) Fast multipole methods (FMM), which exploit the low rank nature of blocks of the interaction

matrix in conjunction with diagonal translation operators [11, 12] to enable a multilevel hierarchical

representation of the interaction matrix.

The storage requirements of all these fast methods scale as $O(N)$. The CPU requirements of FFT and

FMM schemes scale as $O(N \log N)$ and $O(N)$, respectively. FFT methods generally have

the edge over fast multipole schemes, as their asymptotic CPU costs exhibit much smaller leading

constants.

Here we introduce the Quantic Tensor Train (QTT) as a new tool for compressing VIE operators and

applying them to vectors. The QTT simultaneously exploits the convolutional nature of the VIE operator

and the compressibility of low-rank sub-blocks of VIE matrices [13, 14]. Just like (3D) FFT methods,

QTT schemes represent the VIE matrix as a product of sparse matrices with (three-level) Toeplitz

matrices resulting from sampling Green’s functions on a regular grid. The QTT allows for these Toeplitz

matrices to be compressed, eliminating all storage resulting from oversampling of the Green’s function

[15]. To this end, the QTT scheme first reshapes the Toeplitz matrices into higher-dimensional tensors


(quantization step). Next, the scheme compresses them using low-rank methods (tensor train (TT)

decomposition step) [16].

The QTT stores Toeplitz matrices arising from the discretization of VIE operators using $O(\log N)$

and $O(N^{2/3} \log N)$ memory for electrically small and large scatterers, respectively. In many practical

applications, the storage requirements of QTT-compressed Toeplitz matrices are three to five orders of

magnitude below those of classical FFT methods. Application of a QTT-compressed Toeplitz matrix to a

vector requires $O(N \log N)$ operations with a leading constant that typically is roughly an order of

magnitude larger than that of classical FFT schemes. While QTT compression of VIE Toeplitz matrices

does not negate the need for $O(N)$ storage resources for other code components (e.g. trial solution

vectors), it results in significant memory savings compared to classical FFT codes. In many ways, the

QTT scheme presents an attractive compromise between storing Toeplitz matrices and re-computing them

on the fly during each solver iteration. QTT schemes also have important implications for parallel VIE

solvers, which will be addressed in a future communication.

This Letter is organized as follows. Section II elucidates the QTT scheme; for expositional purposes,

the focus is on VIEs discretized on a Cartesian grid; the procedure however is trivially extended, for

example, to AIM-accelerated VIEs discretized on tetrahedral grids [17]. Section III demonstrates the

technique’s accuracy and efficiency through the compression of scalar Helmholtz and VIE operators.

Section IV presents our conclusions. Throughout this Letter, a time variation $e^{j\omega t}$ is assumed and

suppressed; here $\omega = 2\pi f$ is the angular frequency, $f$ denotes the frequency, and $t$ stands for time.

II. FORMULATION

A. Traditional VIE formulation

Consider an inhomogeneous dielectric object with support $\Omega$ and permittivity $\varepsilon(\mathbf{r})$ that resides in a


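homogeneous background with permittivity $\varepsilon_b$ and permeability $\mu_b$. Upon illumination by incident

electric and magnetic fields $\mathbf{E}^{inc}(\mathbf{r})$ and $\mathbf{H}^{inc}(\mathbf{r})$, the object scatters electric and magnetic fields

$\mathbf{E}^{scat}(\mathbf{r}) = \mathbf{E}(\mathbf{r}) - \mathbf{E}^{inc}(\mathbf{r})$ and $\mathbf{H}^{scat}(\mathbf{r}) = \mathbf{H}(\mathbf{r}) - \mathbf{H}^{inc}(\mathbf{r})$; here $\mathbf{E}(\mathbf{r})$ and $\mathbf{H}(\mathbf{r})$ denote total electric and

magnetic fields. Using the volume equivalence principle, the object is replaced by equivalent polarization

currents $\mathbf{J}_V(\mathbf{r}) = j\omega \chi(\mathbf{r}) \mathbf{D}(\mathbf{r})$, where $\chi(\mathbf{r}) = 1 - \varepsilon_b / \varepsilon(\mathbf{r})$ denotes the electric contrast and

$\mathbf{D}(\mathbf{r}) = \varepsilon(\mathbf{r}) \mathbf{E}(\mathbf{r})$ is the electric flux. The classical VIE for $\mathbf{D}(\mathbf{r})$ reads

$$ \mathbf{E}^{inc}(\mathbf{r}) = \frac{\mathbf{D}(\mathbf{r})}{\varepsilon(\mathbf{r})} - \frac{1}{\varepsilon_b} \left[ k_b^2 \int_\Omega g_b(\mathbf{r} - \mathbf{r}') \chi(\mathbf{r}') \mathbf{D}(\mathbf{r}') \, d\mathbf{r}' + \nabla \int_\Omega g_b(\mathbf{r} - \mathbf{r}') \nabla' \cdot \left( \chi(\mathbf{r}') \mathbf{D}(\mathbf{r}') \right) d\mathbf{r}' \right], $$ (1)

where $k_b = \omega \sqrt{\varepsilon_b \mu_b}$ and $g_b(\mathbf{x}) = \exp(-j k_b |\mathbf{x}|) / (4\pi |\mathbf{x}|)$ are the wavenumber and scalar Green’s function of

the background medium, respectively.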

B. Discretization of the VIE and factorization of system matrices

To simplify the description of the QTT scheme, we discretize (1) using the two-step procedure in [18].

First, $\Omega$ is embedded in a cuboid $C$, which is subdivided into $N_{vox}$ voxels, each of which is assigned a

constant permittivity. Next, the flux $\mathbf{D}(\mathbf{r})$ is expanded as

$$ \mathbf{D}(\mathbf{r}) = \sum_{i=1}^{6} \sum_{j=1}^{N_{vox}} d_j^i \, \mathbf{b}_j^i(\mathbf{r}), $$ (2)

where $\mathbf{b}_j^i(\mathbf{r})$ denotes the ith half-rooftop basis function (i = 1,…,6) on the jth voxel, and $d_j^i$ is the

associated expansion coefficient. Inserting (2) into (1) and applying a Galerkin testing procedure yields

the algebraic system of equations $\mathbf{V} = \mathbf{A} \mathbf{d}$, where


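$$ \mathbf{A} = \begin{bmatrix} \mathbf{G}_1^1 - \mathbf{Z}_1^1 \mathbf{I}_\chi & \mathbf{G}_2^1 - \mathbf{Z}_2^1 \mathbf{I}_\chi & \cdots & \mathbf{G}_6^1 - \mathbf{Z}_6^1 \mathbf{I}_\chi \\ \mathbf{G}_1^2 - \mathbf{Z}_1^2 \mathbf{I}_\chi & \mathbf{G}_2^2 - \mathbf{Z}_2^2 \mathbf{I}_\chi & \cdots & \mathbf{G}_6^2 - \mathbf{Z}_6^2 \mathbf{I}_\chi \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{G}_1^6 - \mathbf{Z}_1^6 \mathbf{I}_\chi & \mathbf{G}_2^6 - \mathbf{Z}_2^6 \mathbf{I}_\chi & \cdots & \mathbf{G}_6^6 - \mathbf{Z}_6^6 \mathbf{I}_\chi \end{bmatrix}, $$ (3)

$\mathbf{V} = [\mathbf{V}^1, \mathbf{V}^2, \ldots, \mathbf{V}^6]^T$, and $\mathbf{d} = [\mathbf{d}^1, \mathbf{d}^2, \ldots, \mathbf{d}^6]^T$. The entries of sub-matrices $\mathbf{G}_i^j$, $\mathbf{I}_\chi$, $\mathbf{Z}_i^j$ and

vectors $\mathbf{V}^j$ are

$$ (\mathbf{G}_i^j)_{m,n} = \left\langle \mathbf{b}_m^j(\mathbf{r}), \frac{\mathbf{b}_n^i(\mathbf{r})}{\varepsilon(\mathbf{r})} \right\rangle, $$ (4)

$$ (\mathbf{I}_\chi)_{m,n} = \left\langle p_m(\mathbf{r}), \chi(\mathbf{r}) \, p_n(\mathbf{r}) \right\rangle, $$ (5)

$$ (\mathbf{Z}_i^j)_{m,n} = \frac{1}{\varepsilon_b} \left[ k_b^2 \left\langle \mathbf{b}_m^j(\mathbf{r}), \int_\Omega g_b(\mathbf{r} - \mathbf{r}') \mathbf{b}_n^i(\mathbf{r}') \, d\mathbf{r}' \right\rangle - \left\langle \nabla \cdot \mathbf{b}_m^j(\mathbf{r}), \int_\Omega g_b(\mathbf{r} - \mathbf{r}') \nabla' \cdot \mathbf{b}_n^i(\mathbf{r}') \, d\mathbf{r}' \right\rangle \right], $$ (6)

and

$$ (\mathbf{V}^j)_{m,1} = \left\langle \mathbf{b}_m^j(\mathbf{r}), \mathbf{E}^{inc}(\mathbf{r}) \right\rangle. $$ (7)

In the above equations,

$$ \left\langle \mathbf{A}(\mathbf{r}), \mathbf{B}(\mathbf{r}) \right\rangle = \int_\Omega \mathbf{A}(\mathbf{r}) \cdot \mathbf{B}(\mathbf{r}) \, d\mathbf{r} $$ (8)

and $p_n(\mathbf{r})$ is a pulse basis function on the nth voxel. Next, we express $\mathbf{d} = \mathbf{B} \mathbf{z}$, where $\mathbf{B}$ changes the

basis of the flux $\mathbf{D}(\mathbf{r})$ from full to half rooftops, enforcing its continuity across cells [18]. The final

system of equations reads $\mathbf{B}^T \mathbf{V} = \mathbf{B}^T \mathbf{A} \mathbf{B} \mathbf{z}$.

Matrices $\mathbf{B}$, $\mathbf{G}_i^j$ and $\mathbf{I}_\chi$ are sparse and can be stored and applied using $O(N)$ memory and CPU

resources, respectively. There are a total of 36 matrices $\mathbf{Z}_i^j$, each having $N^2$ non-zero entries.

However, each entry $(\mathbf{Z}_i^j)_{m,n}$ can be expressed as a function of $\mathbf{r}_m - \mathbf{r}_n$, where $\mathbf{r}_m$ and $\mathbf{r}_n$ are the

positions of the centers of voxels m and n, respectively, as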


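$$ (\mathbf{Z}_i^j)_{m,n} = \frac{1}{\varepsilon_b} \left[ k_b^2 \left\langle \hat{\mathbf{b}}^j(\mathbf{r}), \int_C g_b(\mathbf{r} - \mathbf{r}' + \mathbf{r}_m - \mathbf{r}_n) \hat{\mathbf{b}}^i(\mathbf{r}') \, d\mathbf{r}' \right\rangle - \left\langle \nabla \cdot \hat{\mathbf{b}}^j(\mathbf{r}), \int_C g_b(\mathbf{r} - \mathbf{r}' + \mathbf{r}_m - \mathbf{r}_n) \nabla' \cdot \hat{\mathbf{b}}^i(\mathbf{r}') \, d\mathbf{r}' \right\rangle \right], $$ (9)

where $\hat{\mathbf{b}}^i(\mathbf{r})$ is the ith half-rooftop basis function (i = 1,…,6) in a voxel centered about the origin. In

other words, matrices $\mathbf{Z}_i^j$ are dense three-level Toeplitz matrices that can be stored using $O(N)$

memory and applied to vectors using $O(N \log N)$ CPU resources using FFT methods. We next

introduce the QTT as a memory-saving alternative.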
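The FFT-based product mentioned above can be sketched in a few lines. This is a minimal illustration, not the solver used in this Letter: it assumes NumPy, a cubic $n \times n \times n$ grid, and a generic kernel array holding one sample per voxel offset; the names `toeplitz3_matvec` and `dense_toeplitz3` are hypothetical. The circulant embedding of size $2n$ per dimension is what leads to storing eight times as many samples as there are voxels.

```python
import numpy as np

def toeplitz3_matvec(kernel, x):
    """Apply a three-level Toeplitz matrix to x using FFTs.

    kernel[a, b, c] is the matrix entry for the voxel offset
    (a-(n-1), b-(n-1), c-(n-1)); x has shape (n, n, n).
    """
    n = x.shape[0]
    size = 2 * n  # circulant embedding: (2n)^3 = 8 * n^3 samples
    c = np.zeros((size, size, size), dtype=complex)
    # Offset 0 sits at index 0; negative offsets wrap to the end.
    idx = np.arange(-(n - 1), n) % size
    c[np.ix_(idx, idx, idx)] = kernel
    y = np.fft.ifftn(np.fft.fftn(c) * np.fft.fftn(x, s=(size,) * 3))
    return y[:n, :n, :n]

def dense_toeplitz3(kernel, n):
    """Dense reference matrix, only feasible for very small n."""
    pts = np.array([(i, j, k) for i in range(n)
                    for j in range(n) for k in range(n)])
    d = pts[:, None, :] - pts[None, :, :] + (n - 1)
    return kernel[d[..., 0], d[..., 1], d[..., 2]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3
    kernel = rng.standard_normal((2 * n - 1,) * 3)
    x = rng.standard_normal((n, n, n))
    y_ref = (dense_toeplitz3(kernel, n) @ x.reshape(-1)).reshape(n, n, n)
    print(np.max(np.abs(toeplitz3_matvec(kernel, x) - y_ref)))
```

The dense reference is included only to check the FFT route on a tiny grid; it needs $N^2$ entries and is exactly what the Toeplitz structure avoids.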

C. Quantic tensor train decomposition

Previously, the QTT scheme has been used to hierarchically compress linear operators arising in a

wide range of applications into a format termed “Tensor Train (TT) decomposition” [19]. Specifically,

when used to store inverse operators arising in the direct solution of volume and boundary integral

equations with non-oscillatory convolutional kernels in electrostatics, magnetostatics, and low Reynolds

number flow, the QTT format has been shown to require $O(\log N)$ memory and precomputation costs

[15]. Constructing the QTT compressed data structure requires first converting the matrices into a high-

dimensional tensor via a process called quantization and, second, compressing the resulting tensor using

the TT decomposition.

First, we show how to reshape the matrices $\mathbf{Z}_i^j$ into a single high-dimensional tensor $\mathcal{A}$ using an

indexing scheme that results in a hierarchy reminiscent of the geometric bisection tree structure used in

multi-level FMM schemes. Without loss of generality, it is assumed that $C$ is a $2^s \times 2^s \times 2^s$ cube (where

$s$ is a positive integer) composed of unit-length cubic voxels with centers residing in the first octant

and one voxel center anchored at the origin. The center of each voxel can be expressed as a function

of $3s$ parameters as


$$ \mathbf{r}_m = \mathbf{f}_r(\iota_1^m, \iota_2^m, \ldots, \iota_{3s}^m) = \sum_{i=1}^{s} 2^{i-1} \left( \iota_{3i-2}^m, \iota_{3i-1}^m, \iota_{3i}^m \right), $$ (10)

where $m = 1, 2, \ldots, N_{vox}$, $\iota_k^m \in \{0, 1\}$ and $k = 1, 2, \ldots, 3s$. The function $\mathbf{f}_r$ is the result of subdividing the

domain into groups by geometric bisection. This procedure can be represented using a binary tree $T_C$ with

leaf nodes corresponding to voxels. Each index $\iota_k^m \in \{0, 1\}$ ($k = 1, 2, \ldots, 3s$) indicates the group where

voxel m is located at level k. For example, Fig. 1 depicts $T_C$ of a $2 \times 2 \times 2$ domain. In Fig. 1, each leaf

voxel is further subdivided into 6 elements, each corresponding to a facet basis function. Since each entry

$(\mathbf{Z}_i^j)_{m,n}$ is a function of $\mathbf{r}_m - \mathbf{r}_n$, it is also a function of $\mathbf{f}_r(\iota_1^m - \iota_1^n, \iota_2^m - \iota_2^n, \ldots, \iota_{3s}^m - \iota_{3s}^n)$.

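Tensorization simply leverages this property to construct tensor $\mathcal{A}$ as

$$ \mathcal{A}(k_1, k_2, \ldots, k_d) = (\mathbf{Z}_i^j)_{m,n}, \qquad k_1 = i + 6(j - 1), \qquad k_l = 1 + \iota_{l-1}^m + 2 \iota_{l-1}^n, \quad l = 2, 3, \ldots, d. $$ (11)

Here $d = 3s + 1$ and $\mathcal{A}(k_1, k_2, \ldots, k_d)$ has dimensions $p_1 \times p_2 \times \cdots \times p_d = 36 \times 4 \times \cdots \times 4$. Parameter $k_1$ indicates a half-

rooftop basis and testing function pair. Parameters $k_2, k_3, \ldots, k_d$ indicate a basis and testing voxel group

pair at their corresponding level.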
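The quantization step can be illustrated with a small index-mapping sketch. It assumes the binary weighting of (10), with three bits per level and the coarsest level last, and one possible ordering of the combined indices in (11), namely $k_1 = i + 6(j-1)$ and $k_l = 1 + (\text{bit of } m) + 2\,(\text{bit of } n)$. The function names are hypothetical and the ordering is an assumption made for illustration.

```python
def center_bits(center, s):
    """Bits (b_1, ..., b_3s) of an integer voxel center (x, y, z)."""
    bits = []
    for level in range(s):
        for coordinate in center:
            bits.append((coordinate >> level) & 1)
    return tuple(bits)

def voxel_center(bits):
    """Inverse map: sum over levels of 2^(level) * (bx, by, bz)."""
    s = len(bits) // 3
    return tuple(
        sum(2 ** level * bits[3 * level + axis] for level in range(s))
        for axis in range(3)
    )

def tensor_index(i, j, bits_m, bits_n):
    """1-based multi-index (k_1, ..., k_d), d = 3s + 1.

    i, j in 1..6 select the half-rooftop pair; each further entry
    merges one bit of voxel m with one bit of voxel n.
    """
    k1 = i + 6 * (j - 1)
    return (k1,) + tuple(1 + bm + 2 * bn for bm, bn in zip(bits_m, bits_n))

if __name__ == "__main__":
    s = 2
    bits_m = center_bits((3, 1, 2), s)
    bits_n = center_bits((0, 2, 1), s)
    print(voxel_center(bits_m), tensor_index(1, 1, bits_m, bits_n))
```

Each merged index takes one of four values, which is why every mode after the first has size 4.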

Second, the TT decomposition is used to compress tensor $\mathcal{A}$. The TT decomposition factorizes $\mathcal{A}$

into a product of matrices

$$ \mathcal{A}(k_1, k_2, \ldots, k_d) = \prod_{l=1}^{d} \mathcal{G}_l(k_l), $$ (12)

where each $\mathcal{G}_l$ ($l = 1, 2, \ldots, d$) is known as a tensor core, each $\mathcal{G}_l(k_l)$ ($k_l = 1, 2, \ldots, p_l$) is an $r_{l-1} \times r_l$

matrix, and $r_0 = r_d = 1$. Each $r_l$ is known as a TT rank and generalizes the concept of matrix rank. Each

tensor core is a 3-dimensional array of numbers having $r_{l-1} p_l r_l$ entries. The total memory cost of the

QTT decomposition is $\sum_{l=1}^{d} r_{l-1} p_l r_l$, in contrast to $\prod_{l=1}^{d} p_l$ for storing the whole tensor. The successful

compression of the tensor relies on the TT ranks being small. The maximum rank of the cores, $r_{\max}$, is


proportional to the number of multipole terms required for accurate representation of a scalar Helmholtz

operator using addition formulas and can be shown to be small for electrically small domains [15].

Correspondingly, $r_{\max}$ increases linearly with increasing electrical dimension of the domain.
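To make the core format and the storage count concrete, the sketch below builds a TT decomposition with the sequential-SVD construction (often called TT-SVD). This is not the cross algorithm used in this Letter: it needs the full tensor, and it uses a simple relative singular-value cutoff. It assumes NumPy and 0-based indices; all function names are hypothetical.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """Return TT cores of shape (r_{l-1}, p_l, r_l) with r_0 = r_d = 1."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = np.asarray(tensor)
    for p in dims[:-1]:
        mat = mat.reshape(r_prev * p, -1)
        u, sv, vh = np.linalg.svd(mat, full_matrices=False)
        # Keep singular values above a relative threshold (at least one).
        r = max(1, int(np.sum(sv > tol * sv[0])))
        cores.append(u[:, :r].reshape(r_prev, p, r))
        mat = sv[:r, None] * vh[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_entry(cores, index):
    """Evaluate one tensor entry as a product of core slices."""
    out = np.eye(1)
    for core, k in zip(cores, index):
        out = out @ core[:, k, :]
    return out[0, 0]

def tt_storage(cores):
    """Total number of stored entries, sum of r_{l-1} * p_l * r_l."""
    return sum(core.size for core in cores)

if __name__ == "__main__":
    d = 10
    x = np.arange(2 ** d) / 2 ** d
    # Quantize 2^d samples of exp(x) into a d-dimensional 2 x ... x 2 tensor.
    cores = tt_svd(np.exp(x).reshape((2,) * d), tol=1e-12)
    print([c.shape for c in cores], tt_storage(cores), 2 ** d)
```

The exponential is chosen because it separates across binary digits, so every TT rank collapses to one and the storage drops from $2^d$ samples to $2d$ numbers; kernels of practical interest have larger but still small ranks.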

To accomplish the actual QTT compression step we use a rank-revealing factorization, specifically the

multi-pass Alternating Minimal Energy (AMEn) cross algorithm in the TT-Toolbox [20], in which an

initial low-QTT-rank approximation is improved upon by a series of passes through all QTT cores.

Each iteration increases the maximum QTT rank until the difference between the QTT approximations

of the last two iterations is below a user-defined threshold $\varepsilon_{th}$. Using AMEn, $\mathcal{A}$ can be stored with

accuracy $\varepsilon_{th}$ while only evaluating $O(r_{\max}^2 \log N)$ tensor entries, requiring $O(r_{\max}^3 \log N)$ CPU

resources and $O(r_{\max}^2 \log N)$ memory. Matrix-vector products can be computed using $O(r_{\max}^2 N \log N)$

CPU resources [20] using routines in the TT-Toolbox, thus allowing for the fast iterative solution of the

systems of equations. For additional details on the QTT scheme and TT format, see [15, 16, 21].

In what follows and in the Appendix we elucidate the connection of the ranks of the TT-compressed

tensor $\mathcal{A}$ with multipole counts used in the FMM. Here we show a connection between the tensorization

of $\mathcal{A}$ and variable groupings of the FMM. In the Appendix, we provide a construction of the QTT cores for a

compressed representation of a 2D Laplace kernel leveraging FMM expansions. For $\mathcal{A}$, $\mathcal{G}_1(k_1)$ is a row

vector containing weights to linearly combine the output column vector of the product $\prod_{l=2}^{d} \mathcal{G}_l(k_l)$.

Thus, $\mathcal{G}_1(k_1)$ can be viewed as a set of interpolation coefficients that takes samples of basis functions

$\prod_{l=2}^{d} \mathcal{G}_l(k_l)$ to entry $\mathcal{A}(k_1, k_2, \ldots, k_d)$, and $r_1$ is the total number of interpolants. (Note: For notational

brevity we consider products of three consecutive cores in what follows. The same reasoning applies to

single cores; however, the precise inequalities and definitions of near groups are much more nuanced.)

Also for $\mathcal{A}$, the product $\mathcal{G}_2(k_2) \mathcal{G}_3(k_3) \mathcal{G}_4(k_4)$ is a matrix of weights (or interpolants) that, upon


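multiplication with the product (or samples of basis functions) $\prod_{l=5}^{d} \mathcal{G}_l(k_l)$, generates $\prod_{l=2}^{d} \mathcal{G}_l(k_l)$.

Here $r_4$ corresponds to the number of interpolants required to produce $\prod_{l=2}^{d} \mathcal{G}_l(k_l)$ from

$\mathcal{G}_2(k_2) \mathcal{G}_3(k_3) \mathcal{G}_4(k_4)$. The product of the cores $\mathcal{G}_2(k_2) \mathcal{G}_3(k_3) \mathcal{G}_4(k_4)$ encodes dependencies of $\mathcal{A}$

with respect to variables $\iota_1^m - \iota_1^n$, $\iota_2^m - \iota_2^n$ and $\iota_3^m - \iota_3^n$, and the product $\prod_{l=5}^{d} \mathcal{G}_l(k_l)$ encodes

dependencies with respect to variables $\iota_4^m - \iota_4^n$, $\iota_5^m - \iota_5^n$, …, $\iota_{3s}^m - \iota_{3s}^n$. From (10),

$$ \mathbf{r}_m - \mathbf{r}_n = \left( \iota_1^m - \iota_1^n, \iota_2^m - \iota_2^n, \iota_3^m - \iota_3^n \right) + \sum_{i=2}^{s} 2^{i-1} \left( \iota_{3i-2}^m - \iota_{3i-2}^n, \iota_{3i-1}^m - \iota_{3i-1}^n, \iota_{3i}^m - \iota_{3i}^n \right). $$ (13)

Furthermore, whenever $\left( \iota_1^m - \iota_1^n, \iota_2^m - \iota_2^n, \iota_3^m - \iota_3^n \right) \neq \mathbf{r}_m - \mathbf{r}_n$ the following is true

$$ \left| \left( \iota_1^m - \iota_1^n, \iota_2^m - \iota_2^n, \iota_3^m - \iota_3^n \right) \right| < \left| \sum_{i=2}^{s} 2^{i-1} \left( \iota_{3i-2}^m - \iota_{3i-2}^n, \iota_{3i-1}^m - \iota_{3i-1}^n, \iota_{3i}^m - \iota_{3i}^n \right) \right|. $$ (14)

Inequality (14) is valid recursively:

$$ \text{if } \sum_{i=1}^{l} 2^{i-1} \left( \iota_{3i-2}^m - \iota_{3i-2}^n, \iota_{3i-1}^m - \iota_{3i-1}^n, \iota_{3i}^m - \iota_{3i}^n \right) \neq \mathbf{r}_m - \mathbf{r}_n, \text{ then } \left| \sum_{i=1}^{l} 2^{i-1} \left( \iota_{3i-2}^m - \iota_{3i-2}^n, \iota_{3i-1}^m - \iota_{3i-1}^n, \iota_{3i}^m - \iota_{3i}^n \right) \right| < \left| \sum_{i=l+1}^{s} 2^{i-1} \left( \iota_{3i-2}^m - \iota_{3i-2}^n, \iota_{3i-1}^m - \iota_{3i-1}^n, \iota_{3i}^m - \iota_{3i}^n \right) \right|, $$ (15)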

where $l = 1, 2, \ldots, s - 1$. Whenever (15) is valid, the column vector $\prod_{l' = 3l+2}^{d} \mathcal{G}_{l'}(k_{l'})$ is analogous to

global-to-local interpolants of an FMM scheme and $\prod_{l' = 2}^{3l+1} \mathcal{G}_{l'}(k_{l'})$ is analogous to local-to-local

interpolation coefficients of an FMM scheme. For situations where (15) is not valid, extra rows and

columns can be appended to the cores, thereby storing the correct values directly; this is analogous to

inserting a near-field representation in the FMM. These two facts are used in [15] to gain insight into the rank of

QTT-compressed integral equations. Although precise FMM expansions are not known for (9), the rank

of the QTT decomposition of $\mathbf{Z}_i^j$ is expected to behave like that of a scalar Helmholtz equation. Note:

the TT decomposition implicitly exploits the translation invariance of the Green’s function by using the

same core for all groups at each level having the same index; this results in additional compression over


standard FMM schemes [15].

III. NUMERICAL RESULTS

This section presents numerical examples that demonstrate the memory requirements, accuracy, and

applicability of the new QTT acceleration scheme. Even though the QTT scheme can be applied to

matrices arising from the discretization of VIEs on any Cartesian mesh, we restrict ourselves to

$N_L \times N_L \times N_L$ meshes having voxels of dimensions $\Delta_L \times \Delta_L \times \Delta_L$. Mesh dimensions therefore are

$N_L \Delta_L \times N_L \Delta_L \times N_L \Delta_L$. Voxel centers are denoted $\mathbf{r}_m = \Delta_L (i \hat{\mathbf{x}} + j \hat{\mathbf{y}} + k \hat{\mathbf{z}})$ with $m = i + (j - 1) N_L + (k - 1) N_L^2$ and

$m = 1, \ldots, N_{vox} = N_L^3$. In all numerical examples, we consider one of 98 meshes arising from the possible

combinations of $N_L \in \{2^1, 2^2, \ldots, 2^{14}\}$ and $\Delta_L \in \{1/100, 1/200, \ldots, 1/6400\}$ m. Furthermore, we assume time-

harmonic illuminations with f = 300 MHz; for this frequency, $k_b = 2\pi$ as scatterers are assumed to

reside in free space, i.e. the background permittivity and permeability are $\varepsilon_b = \varepsilon_0$ and $\mu_b = \mu_0$, and the

background wavelength is $\lambda_b = 1$ m. Finally, flux expansion coefficients are always obtained by applying

a transpose-free quasi-minimal residual (TFQMR) solver [22] until a relative residual error (RRE) of $10^{-6}$

is reached, or the number of iterations reaches 2,000. Unless stated otherwise, the QTT compression

threshold is set to $\varepsilon_{th} = 10^{-5}$.
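The mesh family and the free-space constants above can be reproduced with a short calculation. The sketch assumes that the voxel sizes halve from 1/100 m down to 1/6400 m (seven values), which is what yields 98 combinations with the fourteen values of the grid size; the variable names are illustrative only.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s
FREQUENCY = 300e6   # Hz

wavelength = C0 / FREQUENCY            # close to 1 m
wavenumber = 2 * math.pi / wavelength  # close to 2*pi rad/m

# (grid size per dimension, voxel edge length in metres)
meshes = [(2 ** p, 1.0 / (100 * 2 ** q))
          for p in range(1, 15)
          for q in range(7)]

if __name__ == "__main__":
    print(len(meshes), wavelength, wavenumber)
    largest = max(n * size for n, size in meshes)
    print("largest linear mesh dimension (m):", largest)
```

The exact wavelength at 300 MHz differs from 1 m by less than 0.1 percent, so the rounded values quoted in the text are adequate for sizing the examples.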

A. Scalar Helmholtz Kernel

First, the QTT decomposition is used to compress three-level Toeplitz matrices $\mathbf{Z}^{scalar}$ comprised of

scalar Helmholtz Green’s function samples on the grid of voxel centers, i.e.

$$ \mathbf{Z}^{scalar}_{m,n} = \begin{cases} \dfrac{\exp(-j k_b |\mathbf{r}_m - \mathbf{r}_n|)}{4\pi |\mathbf{r}_m - \mathbf{r}_n|}, & m \neq n, \\ 0, & m = n, \end{cases} $$ (16)

where $n, m = 1, 2, \ldots, N_{vox}$. Compression of these matrices provides valuable insights into the rank


dependencies of QTT decompositions of VIE matrices on discretization densities and the electrical size of

scatterers.

Small electrical dimensions. Here we investigate QTT ranks and memory requirements assuming that

linear mesh dimensions do not exceed one wavelength, i.e. $N_L \Delta_L < 1$ m. Fig. 2 shows the maximum rank

of the QTT cores and total QTT storage requirements versus $N_L \Delta_L$. We note that the maximum QTT rank

is nearly constant in the following cases: (i) for fixed $\Delta_L$, with increasing $N_L$ (or $N_{vox}$), and (ii) for fixed

$N_L \Delta_L$, with decreasing $\Delta_L$. In other words, for electrically small domains, QTT memory requirements

increase as $O(\log N_{vox})$ independent of whether the unknown count increases because of increased

domain size or mesh refinement.

Large electrical dimensions. Here we study QTT ranks and memory requirements for domains with

linear mesh dimensions up to ten wavelengths, i.e. $N_L \Delta_L \le 10$ m. Fig. 3 shows the maximum rank of the

QTT cores and total QTT memory requirements versus $N_L \Delta_L$. If $\Delta_L$ is held constant, the maximum rank

increases linearly with increasing $N_L \Delta_L$ and, since memory scales quadratically with rank, for cubic domains

(i.e. $N_{vox} \propto (N_L \Delta_L)^3$) the memory increases as $O(N_{vox}^{2/3} \log N_{vox})$. For a fixed $N_L \Delta_L$, the ranks are nearly

independent of $\Delta_L$ and memory increases as $O(\log N_{vox})$ as $\Delta_L$ decreases. In other words, if the unknown

count increases because of increased domain size, memory increases as $O(N_{vox}^{2/3} \log N_{vox})$. If, in

contrast, it increases because of mesh refinement, memory requirements scale as $O(\log N_{vox})$.
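Both scaling laws follow from the TT storage count of Section II.C. A short worked version is given below; it assumes that every core has rank at most $r_{\max}$ and mode size at most $p_{\max}$ (a symbol introduced here only for this illustration), and that the grid has $N_{vox} = 2^{3s}$ voxels.

```latex
\begin{align*}
\text{memory} &= \sum_{l=1}^{d} r_{l-1}\, p_l\, r_l
  \;\le\; d\, p_{\max}\, r_{\max}^{2},
  \qquad d = 3s + 1 = \log_2 N_{vox} + 1, \\
\text{electrically small domain:}\quad
  & r_{\max} = O(1)
  \;\Rightarrow\; \text{memory} = O(\log N_{vox}), \\
\text{electrically large domain, fixed voxel size:}\quad
  & r_{\max} \propto N_{vox}^{1/3}
  \;\Rightarrow\; \text{memory} = O\!\left(N_{vox}^{2/3} \log N_{vox}\right).
\end{align*}
```

The second case uses the observation that the maximum rank grows linearly with the linear size of the domain, which for a cubic domain and fixed voxel size is proportional to the cube root of the voxel count.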

B. Solution of VIE

Here we study the rank and memory of the QTT approximation of matrices $\mathbf{Z}_i^j$. Figure 4 shows the

maximum rank of the QTT cores versus $N_L \Delta_L$. As expected, the ranks exhibit the same trends as those for

the scalar Helmholtz kernel. For domains with electrical sizes less than one wavelength [Fig. 4(a)], the

rank is nearly constant with increasing $N_L \Delta_L$. For domains with electrical size more than one wavelength


[Fig. 4(b)], the maximum rank increases linearly with $N_L \Delta_L$. In Fig. 5, the memory requirements for the

QTT data structure and a standard FFT-based scheme (requiring $8N$ memory for each $\mathbf{Z}_i^j$) are shown.

The QTT can store the matrices $\mathbf{Z}_i^j$ for a $512 \times 512 \times 512$ mesh using $10^5$ times less memory than the FFT-based method.

To test the accuracy of the method we consider concentric two-layer spheres with inner radius

$N_L \Delta_L / 4$, outer radius $N_L \Delta_L / 2$, inner permittivity $2.5 \varepsilon_0$ F/m, and outer permittivity $5 \varepsilon_0$ F/m. First, we

compare results obtained by using the QTT and the FFT for matrix-vector multiplication. We apply each

operator to twenty vectors of random numbers. Table 1 lists the average relative L2-norm errors

and CPU times observed for matrix-vector multiplication with the Toeplitz matrices and the VIE matrix.

As expected, the error in the matrix-vector multiplication decreases as the accuracy of the QTT

compression increases. Furthermore, the matrix-vector multiply is 10 to 25 times faster when

performed using the FFT than using the QTT-based scheme.

Finally, we consider a scattering scenario. A two-layer sphere with inner radius 0.16 m, outer radius

0.32 m, inner permittivity $2.5 \varepsilon_0$ F/m, and outer permittivity $5 \varepsilon_0$ F/m is illuminated by an x-polarized

plane wave propagating along the z direction. The two-layer sphere is approximated by a voxel mesh

having $\Delta_L = 1/100$ m and $N_L = 64$ [Fig. 4]. The bistatic radar cross section (RCS) obtained via the QTT- and FFT-

accelerated VIE solvers and the Mie series is shown in Fig. 6. As expected, the FFT- and QTT-compressed VIEs

yield identical solutions and the results match those of the Mie series solution.

IV. CONCLUSIONS

The QTT format was used to compress Toeplitz matrices arising in the discretization of

VIEs. Numerical results demonstrated the accuracy and lowered memory requirements relative to FFT-

based methods. When refining a mesh to increase the accuracy of a simulation, the memory of the QTT data

structure grows as $O(\log N_{vox})$. Because accurate EM analysis of bio-physical phenomena oftentimes

requires dense discretization of the geometry, QTT memory savings will be significant for bio-

computational electromagnetic analysis. Future work will involve lowering the computation time of the


QTT matrix vector multiplication through algorithmic improvements and extensions of the QTT method

to non-regular discretizations (e.g. tetrahedral meshes).

V. APPENDIX

Here the QTT decomposition of a 2-D Laplace kernel sampled on a regular grid is constructed. A

large portion of the appeal of the QTT decomposition is that accurate compressed representations of

kernels can be rapidly constructed via rank revealing algorithms thereby automatically yielding accurate

compressed forms of systems of equations resulting from the discretization of integral equations without

the need of mathematical analysis. The aim of this section is only to illustrate links between QTT and

multi-level FMM and thus many details pertaining to choice of parameters to attain a given accuracy are

omitted.

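The 2-D Laplace kernel is $\log |\boldsymbol{\rho} - \boldsymbol{\rho}'|$, where $\boldsymbol{\rho} = (x, y)$ and $\boldsymbol{\rho}' = (x', y')$ are observation and source

points. It is assumed that the Laplace kernel is sampled on a square $2^d \times 2^d$ regular lattice and each

sample is spaced a unit distance apart. Each point $\mathbf{p}$ on the lattice can be indexed with a d-dimensional

index $(i_1, i_2, \ldots, i_d)$, where $i_l \in \{1, 2, 3, 4\}$ and $l = 1, 2, \ldots, d$, as

$\mathbf{p}(i_1, i_2, \ldots, i_d) = \mathbf{m}_{i_1} + 2 \mathbf{m}_{i_2} + 2^2 \mathbf{m}_{i_3} + \cdots + 2^{d-1} \mathbf{m}_{i_d}$, with $\mathbf{m}_1 = (0, 0)$, $\mathbf{m}_2 = (1, 0)$,

$\mathbf{m}_3 = (0, 1)$ and $\mathbf{m}_4 = (1, 1)$. Correspondingly, the kernel sampled for all observation and source

points can be written as a d-dimensional array

$$ \mathcal{A}(k_1, k_2, \ldots, k_d) = \log \left| \mathbf{p}(i_1, i_2, \ldots, i_d) - \mathbf{p}(j_1, j_2, \ldots, j_d) \right| = \log \left| \left[ \mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0) \right] + \left[ \mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0) \right] + \cdots + \left[ \mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d) \right] \right|, $$ (17)

where $k_l = 4(i_l - 1) + j_l$, $i_l \in \{1, 2, 3, 4\}$, $j_l \in \{1, 2, 3, 4\}$, $k_l \in \{1, 2, \ldots, 16\}$ and $l = 1, 2, \ldots, d$ (a zero index marks a level

that contributes no offset). Indices $k_l$ combine source and observation points into a single index. The dimensions

$l = 1, 2, \ldots, d$ each correspond to a level in a multi-level FMM quad-tree. Furthermore, the dimensions of the tensor are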


numbered from lowest, corresponding to the leaf nodes of the FMM quad-tree, to highest, corresponding to the

root node of the quad-tree. Because the dimensions were chosen the same as the levels of the FMM quad-tree, each of

$\mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0)$, $\mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0)$, …, $\mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d)$ is either

zero or satisfies

$$ \left| \mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0) \right| < \left| \mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0) \right| < \cdots < \left| \mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d) \right|. $$

Thus, not unlike multi-level FMM schemes, constructing a low-rank tensor train decomposition of $\mathcal{A}$ simply

requires the iterative application (and truncation) of the following series expansions:

$$ \log \left| (x_1, y_1) + (x_2, y_2) \right| = \mathrm{real}\left\{ \log(z_1 + z_2) \right\} = \mathrm{real}\left\{ \log z_2 - \sum_{n=1}^{\infty} \frac{(-1)^n}{n} \frac{z_1^n}{z_2^n} \right\}, \qquad \frac{1}{(z_1 + z_2)^k} = \sum_{n=0}^{\infty} (-1)^n \binom{n + k - 1}{n} \frac{z_1^n}{z_2^{n+k}}, $$ (18)

where $z_1 = x_1 + j y_1$, $z_2 = x_2 + j y_2$ and $|z_1| < |z_2|$. When the infinite sums are approximated by their first q

terms (i.e. truncation), the result is known to converge exponentially with increasing q to the correct

value. Furthermore, there are well-known bounds on the error that can be used to determine the number of

terms required to achieve a guaranteed prescribed accuracy. The TT decomposition will use the following

matrices and vectors that codify (18) up to the q-th order term:


2

2 3

2 1 1

2 2

log 1

0 log 1 1 1

0 0 0 0 0 0

12 3

2 30 1

1 2 1

30 0 1

1 2

0 0 0 0 11

0 0 0 0 01

q q

Tq

q q

q q

q q

q

q

q

q

q

q

q

q

q

q

S

T

G

. (19)

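By combining (18)-(19) we have $\mathbf{T}_q(z_1 + z_2 + \cdots + z_d) = \mathbf{G}_q(z_1) \mathbf{G}_q(z_2) \cdots \mathbf{G}_q(z_{d-1}) \mathbf{T}_q(z_d)$, whenever

$0 \le |z_1| < |z_2| < \cdots < |z_d|$, and $\log(z_1 + z_2) = \mathbf{S}_q(z_1) \mathbf{T}_q(z_2)$, whenever $0 \le |z_1| < |z_2|$. Operators $\mathbf{S}_q$, $\mathbf{T}_q$ and

$\mathbf{G}_q$ form the building blocks of the TT decomposition of $\mathcal{A}$. (Note: In what follows positions

$\mathbf{p} = (x_p, y_p)$ are interpreted as complex numbers $x_p + j y_p$.) From the above, the cores of a TT

decomposition of $\mathcal{A}$ are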

1 1

1 1

1 1

0 1 0

,0, ,0 ,0, ,0

2, 3, ..., 10, , , , 0 0, , , , 0

1 0 0

0, ,0, 0, ,0,

l l

l l

l l

l l

l l l l

T

d d

d d T

d d l l

i jk

i j i j

i jk l d

i j i j

i jk

i j i j

q

q q

q

q

S p p

I

G p p

T p p

G

G

G

, (20)

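where $\mathbf{I}_{(q+1) \times (q+1)}$ is an identity matrix. Correspondingly, $\mathcal{A}(k_1, k_2, \ldots, k_d) = \mathrm{Re}\left\{ \mathcal{G}_1(k_1) \mathcal{G}_2(k_2) \cdots \mathcal{G}_d(k_d) \right\}$.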
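The exponential convergence of truncated multipole-type series mentioned in this Appendix can be checked numerically. The sketch below uses the standard expansions of $\log(z_1 + z_2)$ and $(z_1 + z_2)^{-k}$ about $z_2$, valid for $|z_1| < |z_2|$; whether these match the expansions used to build the cores term for term is an assumption, and the function names are hypothetical.

```python
import cmath
import math

def log_sum_truncated(z1, z2, q):
    """log(z1 + z2) ~ log(z2) + sum_{n=1..q} (-1)^(n+1)/n * (z1/z2)^n."""
    w = z1 / z2
    series = sum((-1) ** (n + 1) / n * w ** n for n in range(1, q + 1))
    return cmath.log(z2) + series

def inv_power_truncated(z1, z2, k, q):
    """(z1 + z2)^(-k) ~ sum_{n=0..q} (-1)^n C(n+k-1, n) z1^n / z2^(n+k)."""
    return sum((-1) ** n * math.comb(n + k - 1, n) * z1 ** n / z2 ** (n + k)
               for n in range(q + 1))

def log_error(z1, z2, q):
    """Error in the real part, i.e. in log|z1 + z2|."""
    return abs(log_sum_truncated(z1, z2, q).real - math.log(abs(z1 + z2)))

if __name__ == "__main__":
    z1, z2 = 1 + 1j, 4 + 2j  # |z1| / |z2| is about 0.32
    for q in (5, 10, 20):
        print(q, log_error(z1, z2, q))
```

Each additional term reduces the error by roughly the ratio $|z_1| / |z_2|$, so the number of terms needed for a given accuracy, and hence the core size, stays small when the two groups are well separated.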


VI. REFERENCES

1. D.E. Livesay and K.-M. Chen, Electromagnetic fields induced inside arbitrarily shaped biological bodies,

IEEE Trans. Microw. Theory Tech. 22 (1974), no. 12, 1273-1280.

2. D.H. Schaubert, D.R. Wilton and A.W. Glisson, A tetrahedral modeling method for electromagnetic

scattering by arbitrarily shaped inhomogenous dielectric bodies, IEEE Trans. Antennas Propagat. 32

(1984), no. 1, 77-85.

3. B.C. Usner, K. Sertel, M.A. Carr and J.L. Volakis, Generalized volume-surface integral equation for

modeling inhomogeneities within high contrast composite structures, IEEE Trans. Antennas Propagat. 54

(2006), 68-75.

4. D.B. Avdeev, A.V. Kuvshinov, O.V. Pankratov and G.A. Newman, Three-dimensional induction logging

problems, part i: An integral equation solution and model comparisons, Geophysics 67 (2002), no. 2, 413-

426.

5. L.J. Gomez, A.C. Yucel and E. Michielssen, The icvsie: A general purpose integral equation method for

bio-electromagnetic analysis, IEEE Transactions on Biomedical Engineering PP (2017), no. 99, 1-1.

6. L.J. Gomez, A.C. Yücel and E. Michielssen, Volume-surface combined field integral equation for

plasma scatterers, IEEE Antennas and Wireless Propagation Letters 14 (2015), 1064-1067.

7. H. Bagci, A.E. Yilmaz, J.M. Jin and E. Michielssen, Fast and rigorous analysis of emc/emi phenomena on

electrically large and complex cable-loaded structures, IEEE Transactions on Electromagnetic

Compatibility 49 (2007), no. 2, 361-381.

8. N.N. Bojarski, The k‐space formulation of the scattering problem in the time domain, The Journal of the

Acoustical Society of America 72 (1982), no. 2, 570-584.

9. J.R. Phillips and J.K. White, A precorrected-fft method for electrostatic analysis of complicated 3-d

structures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16 (1997), no.

10, 1059-1072.

10. E. Bleszynski, M. Bleszynski and T. Jaroszewicz, Aim: Adaptive integral method for solving large‐scale

electromagnetic scattering and radiation problems, Radio Science 31 (1996), no. 5, 1225-1251.

11. C.-C. Lu, A fast algorithm based on volume integral equation for analysis of arbitrarily shaped

dielectric radomes, IEEE Transactions on Antennas and Propagation 51 (2003), no. 3, 606-612.

12. W.C. Chew, J.-M. Jin, C.-C. Lu, E. Michielssen and J.M. Song, Fast solution methods in electromagnetics,

IEEE Transactions on Antennas and Propagation 45 (1997), no. 3, 533-543.

13. I. Oseledets, Tensors inside of matrices give logarithmic complexity, Preprint (unpublished), 2009,

http://pub.inm.ras.ru/preprint-2009-04.

14. B.N. Khoromskij, O (dlog n)-quantics approximation of nd tensors in high-dimensional numerical

modeling, Constructive Approximation 34 (2011), no. 2, 257-280.

15. E. Corona, A. Rahimian and D. Zorin, A tensor-train accelerated solver for integral equations in complex

geometries, arXiv preprint arXiv:1511.06029 (2015).

16. I. Oseledets and E. Tyrtyshnikov, Tt-cross approximation for multidimensional arrays, Linear Algebra and

its Applications 432 (2010), no. 1, 70-88.

17. H. Bagci, F.P. Andriulli, K. Cools, F. Olyslager and E. Michielssen, A calderón multiplicative

preconditioner for coupled surface-volume electric field integral equations, IEEE Transactions on

Antennas and Propagation 58 (2010), no. 8, 2680-2690.

18. H. Gan and W.C. Chew, A discrete bcg-fft algorithm for solving 3d inhomogeneous scatterer problems,

Journal of Electromagnetic Waves and Applications 9 (1995), no. 10, 1339-1357.

19. B.N. Khoromskij, Tensor numerical methods for multidimensional pdes: Theoretical analysis and initial

applications, ESAIM: Proceedings and Surveys 48 (2015), 1-28.

20. I. Oseledets, "Tt-toolbox 2.2," 2012.

21. I.V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing 33 (2011), no. 5, 2295-

2317.

22. R.W. Freund, A transpose-free quasi-minimal residual algorithm for non-hermitian linear systems, SIAM J.

Sci. Stat. Comput. 14 (1993), 470-482.


Figure and Table Captions

Figure 1. Recursive bisection of a $2 \times 2 \times 2$ domain.

Figure 2. QTT results for Helmholtz kernel for small electrical lengths. (a) Maximum rank of the QTT

versus electrical size (b) Memory for QTT data structure versus electrical size.

Figure 3. QTT results for Helmholtz kernel for large electrical lengths. (a) Maximum rank of the QTT

versus electrical size (b) Memory for QTT data structure versus electrical size.

Figure 4. Maximum QTT ranks versus electrical size for VIE Toeplitz matrix. (a) Electrically small

domains (b) Electrically large domains.

Figure 5. Total memory required for storing the QTT approximation of each VIE Toeplitz matrix for

domains having electrical sizes of up to ten wavelengths.

Figure 6. Radar Cross Section (RCS) of the two-layer sphere.

Table 1. L2-norm error and CPU time for matrix-vector multiply using QTT compression. Matrix-vector

multiply using the FFT method takes 2.6 s for $N_L = 64$ and 20 s for $N_L = 128$.


Figure 1

Figure 2


Figure 3

Figure 4


Figure 5

Figure 6

Table 1

eps_th    Delta_L (m)   N_L    error, Z_i^j x    error, B^T A B z    CPU time

10^-3     1/1600        64     6.7 x 10^-4       1.6 x 10^-2         22 s
10^-3     1/1600        128    1.2 x 10^-3       7.8 x 10^-2         225 s
10^-3     1/100         64     8.7 x 10^-3       3.0 x 10^-2         23 s
10^-3     1/100         128    5.0 x 10^-3       3.1 x 10^-2         245 s

10^-4     1/1600        64     5.1 x 10^-4       7.0 x 10^-3         33 s
10^-4     1/1600        128    3.0 x 10^-4       1.2 x 10^-2         350 s
10^-4     1/100         64     4.6 x 10^-4       2.6 x 10^-3         34 s
10^-4     1/100         128    4.4 x 10^-4       5.3 x 10^-3         344 s

10^-5     1/1600        64     5.3 x 10^-5       4.7 x 10^-4         48 s
10^-5     1/1600        128    1.8 x 10^-5       1.1 x 10^-3         515 s
10^-5     1/100         64     3.7 x 10^-5       2.3 x 10^-4         48 s
10^-5     1/100         128    4.1 x 10^-5       5.8 x 10^-4         480 s
