For Peer Review
Quantic Tensor Train Method for Compression of Volume
Integral Equations Operators
Journal: Microwave and Optical Technology Letters
Manuscript ID MOP-17-1463
Wiley - Manuscript type: Research Article
Date Submitted by the Author: 21-Oct-2017
Complete List of Authors: Corona, Eduardo; University of Michigan, Ann Arbor, Mathematics; Gomez, Luis; Michielssen, Eric; University of Michigan, Ann Arbor, Department of Electrical Engineering and Computer Science
Keywords: Volume integral equations, Quantic tensor train, Adaptive integral method, Fast Multipole Method, Conjugate Gradient Fast Fourier Transform
John Wiley & Sons
Microwave and Optical Technology Letters
Quantic Tensor Train Method for Compression of Volume
Integral Equations Operators
Eduardo Corona1, Luis J. Gomez2* and Eric Michielssen3
(1) Department of Mathematics
University of Michigan
(2) Psychiatry and Behavioral Sciences
Duke University School of Medicine
(3) Department of Electrical and Computer Engineering
University of Michigan
*Corresponding Author: Luis Gomez, 2306 huron st. Durham, NC, [email protected], (954)8165011
(ROUGH) PREPRINT – DO NOT DISTRIBUTE
Abstract – The FFT-based iterative solution of volume integral equations pertinent to the analysis of
electromagnetic scattering phenomena requires O(N) storage stemming from the need to store Toeplitz
matrices. A quantic tensor train scheme is shown to permit compression and storage of these matrices
using sublinear memory. For many applications this leads to a 3-5 orders of magnitude reduction in
storage requirements compared to FFT methods.
Index Terms – Volume integral equations (VIE), Quantic tensor train (QTT), Adaptive integral method
(AIM)
I. INTRODUCTION
Volume integral equations (VIEs) are widely used for analyzing electromagnetic (EM) scattering from
heterogeneous penetrable objects [1, 2, 3]. Applications of VIE solvers range from geoscience and remote
sensing (e.g. oil exploration [4]) to bio-electromagnetics (e.g. stimulation devices [5]), plasma physics
(e.g. communication with re-entry vehicles [6]), and circuit characterization (e.g. EMI susceptibility [7]).
Like all integral equation schemes, VIE solvers require “fast” methods to reduce the prohibitive storage
and CPU costs associated with classical implementations. All fast methods for solving VIEs proposed to
date belong to one of two categories:
(i) Fast Fourier Transform (FFT) methods, which exploit the convolutional nature of the VIE operator
to perform matrix-vector products via FFT; representative examples include k-space [8],
pre-corrected FFT [9], and adaptive integral [10] methods.
(ii) Fast multipole methods (FMM), which exploit the low rank nature of blocks of the interaction
matrix in conjunction with diagonal translation operators [11, 12] to enable a multilevel hierarchical
representation of the interaction matrix.
The storage requirements of all these fast methods scale as $O(N)$. The CPU requirements of FFT and
FMM schemes scale as $O(N \log N)$ and $O(N)$, respectively. Generally speaking, FFT methods have
the edge over fast multipole schemes because their asymptotic CPU costs exhibit much smaller leading
constants.
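The FFT idea in category (i) can be illustrated in one dimension. The sketch below is not the authors' code: it assumes a generic $n \times n$ Toeplitz matrix given by its $2n-1$ distinct entries and applies it to a vector through a circulant embedding. The function names `toeplitz_matvec_fft` and `toeplitz_dense` are hypothetical.

```python
import numpy as np

def toeplitz_matvec_fft(t, x):
    """Multiply the n x n Toeplitz matrix T[m, k] = t[m - k + n - 1] by x via FFT.

    t holds the 2n - 1 distinct entries, ordered from lag -(n - 1) to lag n - 1.
    """
    n = x.shape[0]
    # First column of the (2n - 1)-point circulant that embeds T:
    # lags 0, 1, ..., n - 1 followed by lags -(n - 1), ..., -1.
    c = np.concatenate([t[n - 1:], t[:n - 1]])
    x_pad = np.concatenate([x, np.zeros(n - 1, dtype=x.dtype)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_pad))
    return y[:n]

def toeplitz_dense(t, n):
    """Dense reference matrix, used only to check the FFT product."""
    idx = np.arange(n)
    return t[idx[:, None] - idx[None, :] + n - 1]

rng = np.random.default_rng(0)
n = 16
t = rng.standard_normal(2 * n - 1) + 1j * rng.standard_normal(2 * n - 1)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y_fft = toeplitz_matvec_fft(t, x)
y_ref = toeplitz_dense(t, n) @ x
```

Only the $2n-1$ generating entries are stored, which is the $O(N)$ storage the text refers to; the three-level (3D) case applies the same embedding along each axis.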
Here we introduce the Quantic Tensor Train (QTT) as a new tool for compressing VIE operators and
applying them to vectors. The QTT simultaneously exploits the convolutional nature of the VIE operator
and the compressibility of low-rank sub-blocks of VIE matrices [13, 14]. Just like (3D) FFT methods,
QTT schemes represent the VIE matrix as a product of sparse matrices with (three-level) Toeplitz
matrices resulting from sampling Green’s functions on a regular grid. The QTT allows for these Toeplitz
matrices to be compressed, eliminating all storage resulting from oversampling of the Green’s function
[15]. To this end, the QTT scheme first reshapes the Toeplitz matrices into higher-dimensional tensors
(quantization step). Next, the scheme compresses them using low-rank methods (tensor train (TT)
decomposition step) [16].
The QTT stores Toeplitz matrices arising from the discretization of VIE operators using $O(\log N)$
and $O(N^{2/3} \log N)$ memory for electrically small and large scatterers, respectively. In many practical
applications, the storage requirements of QTT-compressed Toeplitz matrices are three to five orders of
magnitude below those of classical FFT methods. Application of a QTT-compressed Toeplitz matrix to a
vector requires $O(N \log N)$ operations with a leading constant that typically is roughly an order of
magnitude larger than that of classical FFT schemes. While QTT compression of VIE Toeplitz matrices
does not negate the need for $O(N)$ storage resources for other code components (e.g. trial solution
vectors), it results in significant memory savings compared to classical FFT codes. In many ways, the
QTT scheme presents an attractive compromise between storing Toeplitz matrices and re-computing them
on the fly during each solver iteration. QTT schemes also have important implications for parallel VIE
solvers, which will be addressed in a future communication.
This Letter is organized as follows. Section II elucidates the QTT scheme; for expositional purposes,
the focus is on VIEs discretized on a Cartesian grid; the procedure however is trivially extended, for
example, to AIM-accelerated VIEs discretized on tetrahedral grids [17]. Section III demonstrates the
technique’s accuracy and efficiency through the compression of scalar Helmholtz and VIE operators.
Section IV presents our conclusions. Throughout this Letter, a time variation $e^{j\omega t}$ is assumed and
suppressed; here $\omega = 2\pi f$ is the angular frequency, $f$ denotes the frequency, and $t$ stands for time.
II. FORMULATION
A. Traditional VIE formulation
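Consider an inhomogeneous dielectric object with support $\Omega$ and permittivity $\varepsilon(\mathbf{r})$ that resides in a
homogeneous background with permittivity $\varepsilon_b$ and permeability $\mu_b$. Upon illumination by incident
electric and magnetic fields $\mathbf{E}^{inc}(\mathbf{r})$ and $\mathbf{H}^{inc}(\mathbf{r})$, the object scatters electric and magnetic fields
$\mathbf{E}^{scat}(\mathbf{r}) = \mathbf{E}(\mathbf{r}) - \mathbf{E}^{inc}(\mathbf{r})$ and $\mathbf{H}^{scat}(\mathbf{r}) = \mathbf{H}(\mathbf{r}) - \mathbf{H}^{inc}(\mathbf{r})$; here $\mathbf{E}(\mathbf{r})$ and $\mathbf{H}(\mathbf{r})$ denote total electric and
magnetic fields. Using the volume equivalence principle, the object is replaced by equivalent polarization
currents $\mathbf{J}_V(\mathbf{r}) = j\omega\chi(\mathbf{r})\mathbf{D}(\mathbf{r})$, where $\chi(\mathbf{r}) = 1 - \varepsilon_b/\varepsilon(\mathbf{r})$ denotes the electric contrast and
$\mathbf{D}(\mathbf{r}) = \varepsilon(\mathbf{r})\mathbf{E}(\mathbf{r})$ is the electric flux. The classical VIE for $\mathbf{D}(\mathbf{r})$ reads
$$\frac{\mathbf{D}(\mathbf{r})}{\varepsilon(\mathbf{r})} - \frac{k_b^2}{\varepsilon_b}\int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\chi(\mathbf{r}')\mathbf{D}(\mathbf{r}')\,d\mathbf{r}' - \frac{1}{\varepsilon_b}\nabla\int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\nabla'\cdot\big[\chi(\mathbf{r}')\mathbf{D}(\mathbf{r}')\big]\,d\mathbf{r}' = \mathbf{E}^{inc}(\mathbf{r}), \tag{1}$$
where $k_b = \omega\sqrt{\mu_b\varepsilon_b}$ and $g_b(\mathbf{x}) = \exp(-jk_b|\mathbf{x}|)/(4\pi|\mathbf{x}|)$ are the wavenumber and scalar Green's function of
the background medium, respectively.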
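As a consistency check on (1), the step from the polarization current to the integral operator can be sketched as follows. The sketch assumes the standard mixed-potential representation of the field radiated by $\mathbf{J}_V$ in the homogeneous background under the $e^{j\omega t}$ convention; that representation is not spelled out in the text and is used here only for illustration.

```latex
\begin{aligned}
\mathbf{E}^{scat}(\mathbf{r})
 &= -j\omega\mu_b \int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\mathbf{J}_V(\mathbf{r}')\,d\mathbf{r}'
    + \frac{1}{j\omega\varepsilon_b}\,\nabla\!\int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\nabla'\cdot\mathbf{J}_V(\mathbf{r}')\,d\mathbf{r}' \\
 &= \frac{k_b^2}{\varepsilon_b} \int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\chi(\mathbf{r}')\mathbf{D}(\mathbf{r}')\,d\mathbf{r}'
    + \frac{1}{\varepsilon_b}\,\nabla\!\int_\Omega g_b(\mathbf{r}-\mathbf{r}')\,\nabla'\cdot\big[\chi(\mathbf{r}')\mathbf{D}(\mathbf{r}')\big]\,d\mathbf{r}',
 \qquad \mathbf{J}_V = j\omega\chi\mathbf{D},\quad k_b^2 = \omega^2\mu_b\varepsilon_b, \\
\mathbf{E}^{inc}(\mathbf{r}) &= \mathbf{E}(\mathbf{r}) - \mathbf{E}^{scat}(\mathbf{r})
 = \frac{\mathbf{D}(\mathbf{r})}{\varepsilon(\mathbf{r})} - \mathbf{E}^{scat}(\mathbf{r}).
\end{aligned}
```

Both integral terms act on the contrast-weighted flux $\chi\mathbf{D}$ through the same translation-invariant kernel $g_b(\mathbf{r}-\mathbf{r}')$, which is what later produces Toeplitz matrices on a regular grid.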
B. Discretization of the VIE and factorization of system matrices
To simplify the description of the QTT scheme, we discretize (1) using the two-step procedure in [18].
First, $\Omega$ is embedded in a cuboid $C$, which is subdivided into $N_{vox}$ voxels, each of which is assigned a
constant permittivity. Next, the flux $\mathbf{D}(\mathbf{r})$ is expanded as
$$\mathbf{D}(\mathbf{r}) = \sum_{i=1}^{6}\sum_{j=1}^{N_{vox}} d_j^i\,\mathbf{b}_j^i(\mathbf{r}), \tag{2}$$
where $\mathbf{b}_j^i(\mathbf{r})$ denotes the $i$th half-rooftop basis function ($i = 1,\ldots,6$) on the $j$th voxel, and $d_j^i$ is the
associated expansion coefficient. Inserting (2) into (1) and applying a Galerkin testing procedure yields
the algebraic system of equations $\mathbf{V} = \mathbf{A}\mathbf{d}$, where
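$$\mathbf{A} = \begin{bmatrix}
\mathbf{G}_1^1 - \mathbf{Z}_1^1\mathbf{I}_\chi & \mathbf{G}_1^2 - \mathbf{Z}_1^2\mathbf{I}_\chi & \cdots & \mathbf{G}_1^6 - \mathbf{Z}_1^6\mathbf{I}_\chi \\
\mathbf{G}_2^1 - \mathbf{Z}_2^1\mathbf{I}_\chi & \mathbf{G}_2^2 - \mathbf{Z}_2^2\mathbf{I}_\chi & \cdots & \mathbf{G}_2^6 - \mathbf{Z}_2^6\mathbf{I}_\chi \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{G}_6^1 - \mathbf{Z}_6^1\mathbf{I}_\chi & \mathbf{G}_6^2 - \mathbf{Z}_6^2\mathbf{I}_\chi & \cdots & \mathbf{G}_6^6 - \mathbf{Z}_6^6\mathbf{I}_\chi
\end{bmatrix}, \tag{3}$$
$\mathbf{V} = [\mathbf{V}^1, \mathbf{V}^2, \ldots, \mathbf{V}^6]^T$, and $\mathbf{d} = [\mathbf{d}^1, \mathbf{d}^2, \ldots, \mathbf{d}^6]^T$. The entries of sub-matrices $\mathbf{G}_i^j$, $\mathbf{I}_\chi$, $\mathbf{Z}_i^j$ and
vectors $\mathbf{V}^j$ are
$$(\mathbf{G}_i^j)_{m,n} = \left\langle \mathbf{b}_m^i(\mathbf{r}), \frac{\mathbf{b}_n^j(\mathbf{r})}{\varepsilon(\mathbf{r})} \right\rangle, \tag{4}$$
$$(\mathbf{I}_\chi)_{m,n} = \left\langle p_m(\mathbf{r}), \chi(\mathbf{r})\,p_n(\mathbf{r}) \right\rangle, \tag{5}$$
$$(\mathbf{Z}_i^j)_{m,n} = \frac{k_b^2}{\varepsilon_b}\left\langle \mathbf{b}_m^i(\mathbf{r}), \int g_b(\mathbf{r}-\mathbf{r}')\,\mathbf{b}_n^j(\mathbf{r}')\,d\mathbf{r}' \right\rangle - \frac{1}{\varepsilon_b}\left\langle \nabla\cdot\mathbf{b}_m^i(\mathbf{r}), \int g_b(\mathbf{r}-\mathbf{r}')\,\nabla'\cdot\mathbf{b}_n^j(\mathbf{r}')\,d\mathbf{r}' \right\rangle, \tag{6}$$
and
$$(\mathbf{V}^j)_{m,1} = \left\langle \mathbf{b}_m^j(\mathbf{r}), \mathbf{E}^{inc}(\mathbf{r}) \right\rangle. \tag{7}$$
In the above equations,
$$\left\langle \mathbf{A}(\mathbf{r}), \mathbf{B}(\mathbf{r}) \right\rangle = \int \mathbf{A}(\mathbf{r})\cdot\mathbf{B}(\mathbf{r})\,d\mathbf{r} \tag{8}$$
and $p_n(\mathbf{r})$ is a pulse basis function on the $n$th voxel. Next, we express $\mathbf{d} = \mathbf{B}\mathbf{z}$, where $\mathbf{B}$ changes the
basis of the flux $\mathbf{D}(\mathbf{r})$ from full to half rooftops, enforcing its continuity across cells [18]. The final
system of equations reads $\mathbf{B}^T\mathbf{V} = \mathbf{B}^T\mathbf{A}\mathbf{B}\mathbf{z}$.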
Matrices $\mathbf{B}$, $\mathbf{G}_i^j$ and $\mathbf{I}_\chi$ are sparse and can be stored and applied using $O(N)$ memory and CPU
resources. There are a total of 36 matrices $\mathbf{Z}_i^j$, each having $N^2$ non-zero entries.
However, each entry $(\mathbf{Z}_i^j)_{m,n}$ can be expressed as a function of $\mathbf{r}_m - \mathbf{r}_n$, where $\mathbf{r}_m$ and $\mathbf{r}_n$ are the
positions of the centers of voxels $m$ and $n$, respectively, as
$$(\mathbf{Z}_i^j)_{m,n} = \frac{k_b^2}{\varepsilon_b}\int_C\!\int_C \hat{\mathbf{b}}^i(\mathbf{r})\cdot\hat{\mathbf{b}}^j(\mathbf{r}')\,g_b(\mathbf{r}-\mathbf{r}'+\mathbf{r}_m-\mathbf{r}_n)\,d\mathbf{r}'\,d\mathbf{r} - \frac{1}{\varepsilon_b}\int_C\!\int_C \nabla\cdot\hat{\mathbf{b}}^i(\mathbf{r})\,\nabla'\cdot\hat{\mathbf{b}}^j(\mathbf{r}')\,g_b(\mathbf{r}-\mathbf{r}'+\mathbf{r}_m-\mathbf{r}_n)\,d\mathbf{r}'\,d\mathbf{r}, \tag{9}$$
where $\hat{\mathbf{b}}^i(\mathbf{r})$ is the $i$th half-rooftop basis function ($i = 1,\ldots,6$) in a voxel centered about the origin. In
other words, matrices $\mathbf{Z}_i^j$ are dense three-level Toeplitz matrices that can be stored using $O(N)$
memory and applied to vectors using $O(N \log N)$ CPU resources using FFT methods. We next
introduce the QTT as a memory-saving alternative.
C. Quantic tensor train decomposition
Previously, the QTT scheme has been used to hierarchically compress linear operators arising in a
wide range of applications into a format termed “Tensor Train (TT) decomposition” [19]. Specifically,
when used to store inverse operators arising in the direct solution of volume and boundary integral
equations with non-oscillatory convolutional kernels in electrostatics, magnetostatics, and low Reynolds
number flow, the QTT format has been shown to require $O(\log N)$ memory and precomputation costs
[15]. Constructing the QTT compressed data structure requires first converting the matrices into a high-
dimensional tensor via a process called quantization and, second, compressing the resulting tensor using
the TT decomposition.
First, we show how to reshape the matrices $\mathbf{Z}_i^j$ into a single high-dimensional tensor $\mathcal{A}$ using an
indexing scheme that results in a hierarchy reminiscent of the geometric bisection tree structure used in
multi-level FMM schemes. Without loss of generality, it is assumed that $C$ is a $2^s \times 2^s \times 2^s$ cube (where
$s$ is a positive integer) composed of unit-length cubic voxels, with all voxel centers residing in the first octant
and one voxel center anchored at the origin. The center of each voxel can be expressed as a function
of $3s$ parameters as
$$\mathbf{r}_m = \mathbf{f}_r(\tau_1^m, \tau_2^m, \ldots, \tau_{3s}^m) = \sum_{i=1}^{s} 2^{i-1}\,(\tau_{3i-2}^m, \tau_{3i-1}^m, \tau_{3i}^m), \tag{10}$$
where $m = 1, 2, \ldots, N_{vox}$, $\tau_k^m \in \{0,1\}$ and $k = 1, 2, \ldots, 3s$. The function $\mathbf{f}_r$ is the result of subdividing the
domain into groups by geometric bisection. This procedure can be represented using a binary tree $T_C$ with
leaf nodes corresponding to voxels. Each index $\tau_k^m \in \{0,1\}$ ($k = 1, 2, \ldots, 3s$) indicates the group where
voxel $m$ is located at level $k$. For example, Fig. 1 depicts $T_C$ of a $2 \times 2 \times 2$ domain. In Fig. 1, each leaf
voxel is further subdivided into 6 elements, each corresponding to a facet basis function. Since each entry
$(\mathbf{Z}_i^j)_{m,n}$ is a function of $\mathbf{r}_m - \mathbf{r}_n$, it is also a function of $\mathbf{f}_r(\tau_1^m - \tau_1^n, \tau_2^m - \tau_2^n, \ldots, \tau_{3s}^m - \tau_{3s}^n)$.
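Tensorization simply leverages this property to construct tensor $\mathcal{A}$ as
$$\mathcal{A}(k_1, k_2, \ldots, k_d) = (\mathbf{Z}_i^j)_{m,n}, \qquad k_1 = 6(i-1) + j, \qquad k_l = 2\tau_{l-1}^m + \tau_{l-1}^n + 1, \quad l = 2, 3, \ldots, d. \tag{11}$$
Here $d = 3s + 1$ and $\mathcal{A}(k_1, k_2, \ldots, k_d)$ is $p_1 \times p_2 \times \cdots \times p_d = 36 \times 4 \times \cdots \times 4$. Parameter $k_1$ indicates a half-
rooftop basis and testing function pair. Parameters $k_2, k_3, \ldots, k_d$ indicate a basis and testing voxel group
pair at their corresponding level.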
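The index bookkeeping of (10) and (11) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: voxel centers are taken as integer triples, the binary parameters of (10) are called `tau` in the code, and the function names are hypothetical.

```python
def voxel_bits(center, s):
    """Binary parameters (tau_1, ..., tau_3s) of an integer voxel center (x, y, z)."""
    x, y, z = center
    bits = []
    for i in range(s):  # i = 0 carries weight 2**0, as in the i = 1 term of (10)
        bits += [(x >> i) & 1, (y >> i) & 1, (z >> i) & 1]
    return bits

def center_from_bits(bits):
    """Inverse map: rebuild (x, y, z) from the 3s binary parameters."""
    s = len(bits) // 3
    return tuple(sum(bits[3 * i + a] << i for i in range(s)) for a in range(3))

def tensor_index(i, j, center_m, center_n, s):
    """1-based multi-index (k_1, ..., k_d), d = 3s + 1, for basis pair (i, j) and voxel pair (m, n)."""
    tau_m = voxel_bits(center_m, s)
    tau_n = voxel_bits(center_n, s)
    k = [6 * (i - 1) + j]                                 # k_1 in 1..36
    k += [2 * a + b + 1 for a, b in zip(tau_m, tau_n)]    # k_l in 1..4
    return k

s = 3
k_example = tensor_index(2, 5, (5, 0, 7), (1, 6, 2), s)
```

Each of the $3s$ trailing indices pairs one bit of the testing voxel with the matching bit of the basis voxel, so every tensor dimension beyond the first has size 4.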
Second, the TT decomposition is used to compress tensor $\mathcal{A}$. The TT decomposition factorizes $\mathcal{A}$
into a product of matrices
$$\mathcal{A}(k_1, k_2, \ldots, k_d) = \prod_{l=1}^{d} \mathbf{G}_l(k_l), \tag{12}$$
where each $\mathbf{G}_l$ ($l = 1, 2, \ldots, d$) is known as a tensor core, each $\mathbf{G}_l(k_l)$ ($k_l = 1, 2, \ldots, p_l$) is an $r_{l-1} \times r_l$
matrix, and $r_0 = r_d = 1$. Each $r_l$ is known as a TT rank and generalizes the concept of matrix rank. Each
tensor core is a 3-dimensional array of numbers having $p_l\,r_{l-1}\,r_l$ entries. The total memory cost of the
QTT decomposition is $\sum_{l=1}^{d} p_l\,r_{l-1}\,r_l$, in contrast to $\prod_{l=1}^{d} p_l$ for storing the whole tensor. The successful
compression of the tensor relies on the TT ranks being small. The maximum rank of the cores, $r_{max}$, is
proportional to the number of multipole terms required for accurate representation of a scalar Helmholtz
operator using addition formulas and can be shown to be small for electrically small domains [15].
Correspondingly, $r_{max}$ increases linearly with increasing electrical dimension of the domain.
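To make the core structure and the storage count concrete, the following sketch compresses a quantized vector with the sequential-SVD (TT-SVD) construction of [21]. It is a stand-in for illustration only: it is not the cross algorithm used in this Letter, it forms the full tensor first, and the relative singular-value cutoff `tol` is a simplified truncation rule assumed here.

```python
import numpy as np

def tt_svd(tensor, tol=1e-10):
    """TT-SVD: return cores of shape (r_prev, p_l, r_l) for a full tensor."""
    dims = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for p in dims[:-1]:
        mat = mat.reshape(r_prev * p, -1)
        u, sv, vh = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(sv > tol * sv[0])))   # numerical rank of this unfolding
        cores.append(u[:, :r].reshape(r_prev, p, r))
        mat = sv[:r, None] * vh[:r]                 # carry the remainder to the next core
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_entry(cores, index):
    """One tensor entry as the product of core slices G_l(k_l), with 0-based k_l."""
    out = np.ones((1, 1), dtype=cores[0].dtype)
    for core, k in zip(cores, index):
        out = out @ core[:, k, :]
    return out[0, 0]

def tt_storage(cores):
    """Total number of stored entries, i.e. the sum of p_l * r_{l-1} * r_l."""
    return sum(core.size for core in cores)

d = 10
f = np.exp(1j * 0.3 * np.arange(2 ** d))    # samples of a complex exponential
cores = tt_svd(f.reshape((2,) * d))         # quantization: 2 x 2 x ... x 2 tensor
bits = [int(b) for b in format(37, "010b")] # binary digits of sample 37, most significant first
value = tt_entry(cores, bits)
```

For this separable example every TT rank equals 1, so the cores hold $2d = 20$ numbers instead of $2^{10} = 1024$; oscillatory Green's function samples need larger ranks, as discussed in the text.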
To accomplish the actual QTT compression step we use a rank-revealing factorization (specifically, the
multi-pass Alternating Minimal Energy (AMEn) cross algorithm in the TT-Toolbox [20]), in which an
initial low-QTT-rank approximation is improved upon by a series of passes through all QTT cores. Each
iteration increases the maximum QTT rank until the difference between the QTT approximations of the
last two iterations is below a user-defined threshold $\epsilon_{th}$. Using AMEn, $\mathcal{A}$ can be stored with
accuracy $\epsilon_{th}$ while only evaluating $O(r_{max}^2 \log N)$ tensor entries, requiring $O(r_{max}^3 \log N)$ CPU
resources and $O(r_{max}^2 \log N)$ memory. Matrix-vector products can be computed using
$O(r_{max}^2 N \log N)$ CPU resources [20] using routines in the TT-Toolbox,
thus allowing for the fast iterative solution of the systems of equations. For additional details on the QTT
scheme and TT format, see [15, 16, 21].
In what follows and in the Appendix we elucidate the connection of the ranks of the TT-compressed
tensor $\mathcal{A}$ with multipole counts used in the FMM. Here we show a connection between the tensorization
of $\mathcal{A}$ and variable groupings of the FMM. In the Appendix, we provide a construction of the QTT cores for a
compressed representation of a 2D Laplace kernel leveraging FMM expansions. For $\mathcal{A}$, $\mathbf{G}_1(k_1)$ is a row
vector containing weights to linearly combine the output column vector of the product $\prod_{l=2}^{d}\mathbf{G}_l(k_l)$.
Thus, $\mathbf{G}_1(k_1)$ can be viewed as interpolation coefficients that take samples of basis functions
$\prod_{l=2}^{d}\mathbf{G}_l(k_l)$ to entry $\mathcal{A}(k_1, k_2, \ldots, k_d)$, and $r_1$ is the total number of interpolants. (Note: For notational
brevity we consider products of three consecutive cores in what follows. The same reasoning applies to
single cores; however, the precise inequalities and definitions of near groups are much more nuanced.)
Also for $\mathcal{A}$, the product $\mathbf{G}_2(k_2)\mathbf{G}_3(k_3)\mathbf{G}_4(k_4)$ is a matrix of weights (or interpolants) that, upon
multiplication with the product (or samples of basis functions) $\prod_{l=5}^{d}\mathbf{G}_l(k_l)$, generates $\prod_{l=2}^{d}\mathbf{G}_l(k_l)$.
Here $r_4$ corresponds to the number of interpolants required to produce $\prod_{l=2}^{d}\mathbf{G}_l(k_l)$ from
$\mathbf{G}_2(k_2)\mathbf{G}_3(k_3)\mathbf{G}_4(k_4)$. The product of the cores $\mathbf{G}_2(k_2)\mathbf{G}_3(k_3)\mathbf{G}_4(k_4)$ encodes dependencies of $\mathcal{A}$
with respect to variables $\tau_1^m - \tau_1^n$, $\tau_2^m - \tau_2^n$ and $\tau_3^m - \tau_3^n$, and the product $\prod_{l=5}^{d}\mathbf{G}_l(k_l)$ encodes
dependencies with respect to variables $\tau_4^m - \tau_4^n$, $\tau_5^m - \tau_5^n$, ..., $\tau_{3s}^m - \tau_{3s}^n$. From (10),
$$\mathbf{r}_m - \mathbf{r}_n = (\tau_1^m - \tau_1^n, \tau_2^m - \tau_2^n, \tau_3^m - \tau_3^n) + \sum_{i=2}^{s} 2^{i-1}\,(\tau_{3i-2}^m - \tau_{3i-2}^n, \tau_{3i-1}^m - \tau_{3i-1}^n, \tau_{3i}^m - \tau_{3i}^n). \tag{13}$$
Furthermore, whenever $(\tau_1^m - \tau_1^n, \tau_2^m - \tau_2^n, \tau_3^m - \tau_3^n) \neq \mathbf{r}_m - \mathbf{r}_n$ the following is true
$$\left|(\tau_1^m - \tau_1^n, \tau_2^m - \tau_2^n, \tau_3^m - \tau_3^n)\right| < \left|\sum_{i=2}^{s} 2^{i-1}\,(\tau_{3i-2}^m - \tau_{3i-2}^n, \tau_{3i-1}^m - \tau_{3i-1}^n, \tau_{3i}^m - \tau_{3i}^n)\right|. \tag{14}$$
Inequality (14) is valid recursively:
$$\text{if } \sum_{i=1}^{l} 2^{i-1}\,(\tau_{3i-2}^m - \tau_{3i-2}^n, \tau_{3i-1}^m - \tau_{3i-1}^n, \tau_{3i}^m - \tau_{3i}^n) \neq \mathbf{r}_m - \mathbf{r}_n, \text{ then } \left|\sum_{i=1}^{l} 2^{i-1}\,(\tau_{3i-2}^m - \tau_{3i-2}^n, \tau_{3i-1}^m - \tau_{3i-1}^n, \tau_{3i}^m - \tau_{3i}^n)\right| < \left|\sum_{i=l+1}^{s} 2^{i-1}\,(\tau_{3i-2}^m - \tau_{3i-2}^n, \tau_{3i-1}^m - \tau_{3i-1}^n, \tau_{3i}^m - \tau_{3i}^n)\right|, \tag{15}$$
where $l = 1, 2, \ldots, s-1$. Whenever (15) is valid, the column vector $\prod_{l'=3l+2}^{d}\mathbf{G}_{l'}(k_{l'})$ is analogous to
global-to-local interpolants of an FMM scheme and $\prod_{l'=2}^{3l+1}\mathbf{G}_{l'}(k_{l'})$ is analogous to local-to-local
interpolation coefficients of an FMM scheme. For situations where (15) is not valid, extra rows and
columns can be appended to the cores, thereby storing the correct values directly; this is analogous to
inserting a near-field representation in FMM. These two facts are used in [15] to gain insight into the rank of
QTT-compressed integral equations. Although precise FMM expansions are not known for (9), the rank
of the QTT decomposition of $\mathbf{Z}_i^j$ is expected to behave like that of a scalar Helmholtz equation. Note:
the TT decomposition implicitly exploits the translation invariance of the Green’s function by using the
same core for all groups at each level having the same index; this results in additional compression over
standard FMM schemes [15].
III. NUMERICAL RESULTS
This section presents numerical examples that demonstrate the memory requirements, accuracy, and
applicability of the new QTT acceleration scheme. Even though the QTT scheme can be applied to
matrices arising from the discretization of VIEs on any Cartesian mesh, we restrict ourselves to
$N_L \times N_L \times N_L$ meshes having voxels of dimensions $L \times L \times L$. Mesh dimensions therefore are
$N_L L \times N_L L \times N_L L$. Voxel centers are denoted $\mathbf{r}_m = L\,(i\hat{\mathbf{x}} + j\hat{\mathbf{y}} + k\hat{\mathbf{z}})$ with $m = i + (j-1)N_L + (k-1)N_L^2$ and
$m = 1, \ldots, N_{vox} = N_L^3$. In all numerical examples, we consider one of 98 meshes arising from the possible
combinations of $N_L \in \{2^1, 2^2, \ldots, 2^{14}\}$ and $L \in \{1/100, 1/200, \ldots, 1/6400\}$ m. Furthermore, we assume time-
harmonic illuminations with $f = 300$ MHz; for this frequency, $k_b \approx 2\pi$ rad/m, as scatterers are assumed to
reside in free space, i.e. the background permittivity and permeability are $\varepsilon_b = \varepsilon_0$ and $\mu_b = \mu_0$, and the
background wavelength is $\lambda_b \approx 1$ m. Finally, flux expansion coefficients are always obtained by applying
a transpose-free quasi-minimal residual (TFQMR) solver [22] until a relative residual error (RRE) of $10^{-6}$
is reached, or the number of iterations reaches 2,000. Unless stated otherwise, the QTT compression
threshold is set to $\epsilon_{th} = 10^{-5}$.
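The mesh family and the voxel numbering described above can be enumerated directly. The snippet is a bookkeeping sketch only; the variable names and the helper `voxel_number` are hypothetical, and the free-space speed of light is the only constant introduced.

```python
import math
from itertools import product

C0 = 299_792_458.0            # speed of light in vacuum, m/s
freq = 300e6                  # illumination frequency, Hz
wavelength = C0 / freq        # background wavelength, close to 1 m
k_b = 2 * math.pi / wavelength

n_l_values = [2 ** p for p in range(1, 15)]              # N_L = 2, 4, ..., 2**14
cell_sizes = [1.0 / (100 * 2 ** p) for p in range(7)]    # L = 1/100, ..., 1/6400 m
meshes = list(product(n_l_values, cell_sizes))

def voxel_number(i, j, k, n_l):
    """1-based voxel number m for 1-based grid indices (i, j, k)."""
    return i + (j - 1) * n_l + (k - 1) * n_l ** 2
```

The product of the 14 values of $N_L$ and the 7 values of $L$ gives the 98 meshes quoted in the text.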
A. Scalar Helmholtz Kernel
First, the QTT decomposition is used to compress three-level Toeplitz matrices $\mathbf{Z}^{scalar}$ comprised of
scalar Helmholtz Green’s function samples on the grid of voxel centers, i.e.
$$(\mathbf{Z}^{scalar})_{m,n} = \begin{cases} \dfrac{\exp(-jk_b|\mathbf{r}_m - \mathbf{r}_n|)}{4\pi|\mathbf{r}_m - \mathbf{r}_n|}, & m \neq n \\ 0, & m = n \end{cases}, \tag{16}$$
where $n, m = 1, 2, \ldots, N_{vox}$. Compression of these matrices provides valuable insights into the rank
dependencies of QTT decompositions of VIE matrices on discretization densities and the electrical size of
scatterers.
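A small dense instance of (16) makes the three-level Toeplitz structure visible. The sketch below assumes a tiny $3 \times 3 \times 3$ grid, orders voxels in NumPy's default (C) order rather than with the numbering used in this Letter, and uses hypothetical function names; it is meant for inspection, not for realistic problem sizes.

```python
import numpy as np

def scalar_kernel_matrix(n_l, cell, k_b):
    """Dense matrix of (16) on an n_l**3 grid of voxel centers, plus the grid indices."""
    idx = np.indices((n_l, n_l, n_l)).reshape(3, -1).T   # integer grid index of each voxel
    diff = (idx[:, None, :] - idx[None, :, :]) * cell    # r_m - r_n for every pair
    dist = np.linalg.norm(diff, axis=-1)
    z = np.zeros_like(dist, dtype=complex)
    off = dist > 0                                       # leave the diagonal at zero
    z[off] = np.exp(-1j * k_b * dist[off]) / (4 * np.pi * dist[off])
    return z, idx

def distinct_offsets(z, idx):
    """Collect one value per index offset and check that equal offsets give equal entries."""
    table = {}
    for m in range(len(idx)):
        for n in range(len(idx)):
            key = tuple(idx[m] - idx[n])
            table.setdefault(key, z[m, n])
            assert np.isclose(table[key], z[m, n])
    return table

z, idx = scalar_kernel_matrix(3, 0.01, 2 * np.pi)
table = distinct_offsets(z, idx)
```

The $27 \times 27$ matrix has 729 entries but only $(2 \cdot 3 - 1)^3 = 125$ distinct index offsets, which is the set of values an FFT scheme stores and the QTT compresses further.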
Small electrical dimensions. Here we investigate QTT ranks and memory requirements assuming that
linear mesh dimensions do not exceed one wavelength, i.e. $N_L L < 1$ m. Fig. 2 shows the maximum rank
of the QTT cores and total QTT storage requirements versus $N_L L$. We note that the maximum QTT rank
is nearly constant in the following cases: (i) for fixed $L$, with increasing $N_L$ (or $N_{vox}$), and (ii) for fixed
$N_L L$, with decreasing $L$. In other words, for electrically small domains, QTT memory requirements
increase as $O(\log N_{vox})$ independent of whether the unknown count increases because of increased
domain size or mesh refinement.
Large electrical dimensions. Here we study QTT ranks and memory requirements for domains with
linear mesh dimensions up to ten wavelengths, i.e. $N_L L \le 10$ m. Fig. 3 shows the maximum rank of the
QTT cores and total QTT memory requirements versus $N_L L$. If $L$ is held constant, the maximum rank
increases linearly with increasing $N_L L$ and, since memory scales quadratically with rank, for cubic domains
(i.e. $N_{vox} = N_L^3$) the memory increases as $O(N_{vox}^{2/3} \log N_{vox})$. For a fixed $N_L L$, the ranks are nearly
independent of $L$ and memory increases as $O(\log N_{vox})$ as $L$ decreases. In other words, if the unknown
count increases because of increased domain size, memory increases as $O(N_{vox}^{2/3} \log N_{vox})$. If, in
contrast, it increases because of mesh refinement, memory requirements scale as $O(\log N_{vox})$.
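The two scaling regimes follow from the storage formula of Section II.C. A short worked bound is given below; it assumes every core has the maximum rank $r_{max}$ and the largest mode size $p_1 = 36$, which are upper-bound simplifications and not statements about the measured ranks.

```latex
\begin{aligned}
\text{memory} &= \sum_{l=1}^{d} p_l\, r_{l-1}\, r_l \;\le\; 36\, d\, r_{max}^2,
  \qquad d = 3s + 1 = \log_2 N_{vox} + 1, \\
\text{fixed } L: &\quad r_{max} \propto N_L L \propto N_{vox}^{1/3}
  \;\Rightarrow\; \text{memory} = O\!\left(N_{vox}^{2/3} \log N_{vox}\right), \\
\text{fixed } N_L L: &\quad r_{max} \approx \text{const}
  \;\Rightarrow\; \text{memory} = O\!\left(\log N_{vox}\right).
\end{aligned}
```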
B. Solution of VIE
Here we study the rank and memory of the QTT approximation of matrices $\mathbf{Z}_i^j$. Fig. 4 shows the
maximum rank of the QTT cores versus $N_L L$. As expected, the ranks exhibit the same trends as those for
the scalar Helmholtz kernel. For domains with electrical sizes less than one wavelength [Fig. 4(a)], the
rank is nearly constant with increasing $N_L L$. For domains with electrical size more than one wavelength
[Fig. 4(b)], the maximum rank increases linearly with $N_L L$. In Fig. 5, the memory requirements for the
QTT data structure and a standard FFT-based scheme (requiring $8N$ memory for each $\mathbf{Z}_i^j$) are shown.
The QTT can store the matrices $\mathbf{Z}_i^j$ for a $512 \times 512 \times 512$ mesh using $10^5$ times less memory than the FFT-based method.
To test the accuracy of the method we consider concentric two-layer spheres with inner radius
$N_L L/4$, outer radius $N_L L/2$, inner permittivity $2.5\varepsilon_0$ F/m, and outer permittivity $5\varepsilon_0$ F/m. First, we
compare results obtained by using the QTT and the FFT for matrix-vector multiplication. We apply each
operator to twenty vectors of random numbers. Table 1 lists the average relative L2-norm errors
and CPU times observed for matrix-vector multiplication with the Toeplitz matrices and the VIE matrix.
As expected, the error in the matrix-vector multiplication decreases as the accuracy of the QTT
compression increases. Furthermore, the matrix-vector multiply is 10 to 25 times faster when
performed using the FFT than using the QTT-based scheme.
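The error measure used in this comparison can be written down compactly. The helper below is a sketch under assumptions: the distribution of the random test vectors is not specified in the text, so complex standard-normal entries are assumed, and the two operators are passed in as generic callables with hypothetical names.

```python
import numpy as np

def average_relative_error(apply_ref, apply_test, n, trials=20, seed=0):
    """Mean of ||apply_test(x) - apply_ref(x)||_2 / ||apply_ref(x)||_2 over random x."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        # Assumed test vectors: complex entries with standard-normal real and imaginary parts.
        x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        y_ref = apply_ref(x)
        errors.append(np.linalg.norm(apply_test(x) - y_ref) / np.linalg.norm(y_ref))
    return float(np.mean(errors))

# Toy check with a reference operator and a copy perturbed by a relative factor of 1e-3.
mat = np.eye(8)
err_same = average_relative_error(lambda x: mat @ x, lambda x: mat @ x, 8)
err_pert = average_relative_error(lambda x: mat @ x, lambda x: 1.001 * (mat @ x), 8)
```

In the setting of Table 1, `apply_ref` would be the FFT-based product and `apply_test` the QTT-based product.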
Finally, we consider a scattering scenario. A two-layer sphere with inner radius 0.16 m, outer radius
0.32 m, inner permittivity $2.5\varepsilon_0$ F/m, and outer permittivity $5\varepsilon_0$ F/m is illuminated by an $x$-polarized
plane wave propagating along the $z$ direction. The two-layer sphere is approximated by a voxel mesh
having $L = 1/100$ m and $N_L = 64$ [Fig. 4]. The bistatic radar cross section (RCS) obtained via the QTT- and
FFT-compressed VIEs and the Mie series is shown in Fig. 6. As expected, the FFT- and QTT-compressed VIEs
yield identical solutions and the results match those of the Mie series solution.
IV. CONCLUSIONS
The QTT format was used to compress Toeplitz matrices arising in the discretization of
VIEs. Numerical results demonstrated the method's accuracy and its lower memory requirements relative to
FFT-based methods. When a mesh is refined to increase the accuracy of a simulation, the memory of the QTT
data structure grows as $O(\log N_{vox})$. Because accurate EM analysis of bio-physical phenomena oftentimes
requires dense discretization of the geometry, QTT memory savings will be significant for
bio-computational electromagnetic analysis. Future work will involve lowering the computation time of the
QTT matrix vector multiplication through algorithmic improvements and extensions of the QTT method
to non-regular discretizations (e.g. tetrahedral meshes).
V. APPENDIX
Here the QTT decomposition of a 2-D Laplace kernel sampled on a regular grid is constructed. A
large portion of the appeal of the QTT decomposition is that accurate compressed representations of
kernels can be rapidly constructed via rank revealing algorithms thereby automatically yielding accurate
compressed forms of systems of equations resulting from the discretization of integral equations without
the need of mathematical analysis. The aim of this section is only to illustrate links between QTT and
multi-level FMM and thus many details pertaining to choice of parameters to attain a given accuracy are
omitted.
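The 2-D Laplace kernel is $\log|\boldsymbol{\rho} - \boldsymbol{\rho}'|$, where $\boldsymbol{\rho} = (x, y)$ and $\boldsymbol{\rho}' = (x', y')$ are observation and source
points. It is assumed that the Laplace kernel is sampled on a square $2^d \times 2^d$ regular lattice and each
sample is spaced a unit distance apart. Each point $\mathbf{p}$ on the lattice can be indexed with a $d$-dimensional
index $(i_1, i_2, \ldots, i_d)$, where $i_l \in \{1, 2, 3, 4\}$ and $l = 1, 2, \ldots, d$, as
$\mathbf{p}(i_1, i_2, \ldots, i_d) = \mathbf{m}_{i_1} + 2\,\mathbf{m}_{i_2} + 2^2\,\mathbf{m}_{i_3} + \cdots + 2^{d-1}\,\mathbf{m}_{i_d}$ and $\mathbf{m}_1 = (0,0)$, $\mathbf{m}_2 = (1,0)$,
$\mathbf{m}_3 = (0,1)$ and $\mathbf{m}_4 = (1,1)$. Correspondingly, the kernel sampled for all observation and source
points can be written as a $d$-dimensional array
$$\begin{aligned}\mathcal{A}(k_1, k_2, \ldots, k_d) &= \log\left|\mathbf{p}(i_1, i_2, \ldots, i_d) - \mathbf{p}(j_1, j_2, \ldots, j_d)\right| \\ &= \log\Big|\big[\mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0)\big] + \big[\mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0)\big] + \cdots + \big[\mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d)\big]\Big|,\end{aligned} \tag{17}$$
where $k_l = 4(i_l - 1) + j_l$, $i_l \in \{1, 2, 3, 4\}$, $j_l \in \{1, 2, 3, 4\}$, $k_l \in \{1, 2, \ldots, 16\}$ and $l = 1, 2, \ldots, d$. Indices $k_l$
combine source and observation points into a single index. The dimensions $l = 1, 2, \ldots, d$ each
correspond to a level in a multi-level FMM quad-tree. Furthermore, the dimensions of the tensor are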
numbered from lowest, corresponding to the leaf nodes of the FMM quad-tree, to highest, corresponding to the
root node of the quad-tree. Because the dimensions were chosen the same as in the FMM quad-tree, each of
$|\mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0)|$, $|\mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0)|$, ..., $|\mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d)|$ is either
zero or satisfies
$|\mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0)| < |\mathbf{p}(0, i_2, \ldots, 0) - \mathbf{p}(0, j_2, \ldots, 0)| < \cdots < |\mathbf{p}(0, 0, \ldots, i_d) - \mathbf{p}(0, 0, \ldots, j_d)|$. Thus,
not unlike multi-level FMM schemes, constructing a low-rank tensor train decomposition of $\mathcal{A}$ simply
requires the iterative application (and truncation) of the following series expansions:
$$\log\left|(x_1, y_1) + (x_2, y_2)\right| = \mathrm{Re}\{\log(z_1 + z_2)\} = \mathrm{Re}\left\{\log z_2 - \sum_{n=1}^{\infty}\frac{1}{n}\left(-\frac{z_1}{z_2}\right)^{n}\right\}, \qquad \frac{1}{(z_1 + z_2)^n} = \sum_{k=0}^{\infty}\binom{n+k-1}{k}\frac{(-z_1)^k}{z_2^{\,n+k}}, \tag{18}$$
where $z_1 = x_1 + jy_1$, $z_2 = x_2 + jy_2$ and $|z_1| < |z_2|$. When the infinite sums are approximated by their first $q$
terms (i.e. truncation), the result is known to converge exponentially with increasing $q$ to the correct
value. Furthermore, there are well-known bounds on the error that can be used to determine the number of
terms required to achieve a guaranteed prescribed accuracy. The TT decomposition will use the following
matrices and vectors that codify (18) up to the $q$-th order term:
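$$\begin{aligned}
\mathbf{S}_q(z) &= \left[\,1,\; z,\; -\tfrac{z^2}{2},\; \tfrac{z^3}{3},\; \ldots,\; -\tfrac{(-z)^q}{q}\,\right], \\
\mathbf{T}_q(z) &= \left[\,\log z,\; z^{-1},\; z^{-2},\; \ldots,\; z^{-q}\,\right]^T, \\
\left[\mathbf{G}_q(z)\right]_{0,n} &= \left[\mathbf{S}_q(z)\right]_n, \qquad \left[\mathbf{G}_q(z)\right]_{n,n+k} = \binom{n+k-1}{k}(-z)^k \quad (1 \le n \le q,\; 0 \le k \le q-n), \qquad \left[\mathbf{G}_q(z)\right]_{n,n'} = 0 \quad (n' < n),
\end{aligned} \tag{19}$$
where rows and columns are numbered from 0 to $q$, so that $\mathbf{G}_q(z)$ is a $(q+1) \times (q+1)$ upper-triangular matrix with unit diagonal.
By combining (18) and (19) we have $\mathbf{T}_q(z_1 + z_2 + \cdots + z_d) \approx \mathbf{G}_q(z_1)\mathbf{G}_q(z_2)\cdots\mathbf{G}_q(z_{d-1})\mathbf{T}_q(z_d)$, whenever
$0 < |z_1| < |z_2| < \cdots < |z_d|$, and $\log(z_1 + z_2) \approx \mathbf{S}_q(z_1)\mathbf{T}_q(z_2)$, whenever $0 < |z_1| < |z_2|$. Operators $\mathbf{S}_q$, $\mathbf{T}_q$ and
$\mathbf{G}_q$ form the building blocks of the TT decomposition of $\mathcal{A}$. (Note: In what follows positions
$\mathbf{p} = (p_x, p_y)$ are interpreted as a complex number $p_x + jp_y$.) From the above, the cores of a TT
decomposition of $\mathcal{A}$ are
$$\begin{aligned}
\mathbf{G}_1(k_1) &= \begin{cases} [\,1\;\;0\;\;\cdots\;\;0\,], & \mathbf{p}(i_1, 0, \ldots, 0) = \mathbf{p}(j_1, 0, \ldots, 0) \\ \mathbf{S}_q\big(\mathbf{p}(i_1, 0, \ldots, 0) - \mathbf{p}(j_1, 0, \ldots, 0)\big), & \text{otherwise} \end{cases} \\
\mathbf{G}_l(k_l) &= \begin{cases} \mathbf{I}_{(q+1)\times(q+1)}, & \mathbf{p}(0, \ldots, i_l, \ldots, 0) = \mathbf{p}(0, \ldots, j_l, \ldots, 0) \\ \mathbf{G}_q\big(\mathbf{p}(0, \ldots, i_l, \ldots, 0) - \mathbf{p}(0, \ldots, j_l, \ldots, 0)\big), & \text{otherwise} \end{cases} \qquad l = 2, 3, \ldots, d-1, \\
\mathbf{G}_d(k_d) &= \begin{cases} [\,1\;\;0\;\;\cdots\;\;0\,]^T, & \mathbf{p}(0, \ldots, 0, i_d) = \mathbf{p}(0, \ldots, 0, j_d) \\ \mathbf{T}_q\big(\mathbf{p}(0, \ldots, 0, i_d) - \mathbf{p}(0, \ldots, 0, j_d)\big), & \text{otherwise} \end{cases}
\end{aligned} \tag{20}$$
where $\mathbf{I}_{(q+1)\times(q+1)}$ is an identity matrix. Correspondingly, $\mathcal{A}(k_1, k_2, \ldots, k_d) \approx \mathrm{Re}\{\mathbf{G}_1(k_1)\mathbf{G}_2(k_2)\cdots\mathbf{G}_d(k_d)\}$.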
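The role of the truncated expansions as translation operators can be checked numerically. The sketch below assumes the sum-form conventions $\mathbf{S}_q(z) = [1, z, -z^2/2, \ldots, -(-z)^q/q]$, $\mathbf{T}_q(z) = [\log z, z^{-1}, \ldots, z^{-q}]^T$ and an upper-triangular $\mathbf{G}_q(z)$ built from binomial coefficients; these layouts are one consistent choice made for this illustration, and the well-separated test points are arbitrary.

```python
import numpy as np
from math import comb

def s_vec(z, q):
    """Row vector S_q(z) = [1, z, -z**2/2, ..., -(-z)**q/q]."""
    return np.array([1.0] + [-(-z) ** n / n for n in range(1, q + 1)], dtype=complex)

def t_vec(z, q):
    """Column vector T_q(z) = [log z, z**-1, ..., z**-q]."""
    return np.array([np.log(z)] + [z ** (-n) for n in range(1, q + 1)], dtype=complex)

def g_mat(z, q):
    """Translation matrix with T_q(z + w) ~= G_q(z) @ T_q(w) when |z| < |w|."""
    g = np.zeros((q + 1, q + 1), dtype=complex)
    g[0] = s_vec(z, q)                       # first row reproduces the log expansion
    for n in range(1, q + 1):
        for k in range(q - n + 1):
            g[n, n + k] = comb(n + k - 1, k) * (-z) ** k
    return g

q = 20
z1, z2, z3 = complex(1, 1), complex(8, -4), complex(40, 30)   # |z1| < |z2| < |z3|
log_two = (s_vec(z1, q) @ t_vec(z2, q)).real                  # ~ log|z1 + z2|
t_two = g_mat(z1, q) @ t_vec(z2, q)                           # ~ T_q(z1 + z2)
t_three = g_mat(z1, q) @ g_mat(z2, q) @ t_vec(z3, q)          # ~ T_q(z1 + z2 + z3)
```

With these conventions `g_mat(0, q)` is the identity and `s_vec(0, q)` is the first unit vector, which matches the special cases used for coincident indices; on an actual lattice the ratios between consecutive levels are closer to one, so more terms or the near-field corrections mentioned in Section II.C would be needed.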
VI. REFERENCES
1. D.E. Livesay and K.-M. Chen, Electromagnetic fields induced inside arbitrarily shaped biological bodies,
IEEE Trans. Microw. Theory Tech. 22 (1974), no. 12, 1273-1280.
2. D.H. Schaubert, D.R. Wilton and A.W. Glisson, A tetrahedral modeling method for electromagnetic
scattering by arbitrarily shaped inhomogenous dielectric bodies, IEEE Trans. Antennas Propagat. 32
(1984), no. 1, 77-85.
3. B.C. Usner, K. Sertel, M.A. Carr and J.L. Volakis, Generalized volume-surface integral equation for
modeling inhomogeneities within high contrast composite structures, IEEE Trans. Antennas Propagat. 54
(2006), 68-75.
4. D.B. Avdeev, A.V. Kuvshinov, O.V. Pankratov and G.A. Newman, Three-dimensional induction logging
problems, part i: An integral equation solution and model comparisons, Geophysics 67 (2002), no. 2, 413-
426.
5. L.J. Gomez, A.C. Yucel and E. Michielssen, The icvsie: A general purpose integral equation method for
bio-electromagnetic analysis, IEEE Transactions on Biomedical Engineering PP (2017), no. 99, 1-1.
6. L.J. Gomez, A.C. Yücel and E. Michielssen, Volume-surface combined field integral equation for
plasma scatterers, IEEE Antennas and Wireless Propagation Letters 14 (2015), 1064-1067.
7. H. Bagci, A.E. Yilmaz, J.M. Jin and E. Michielssen, Fast and rigorous analysis of emc/emi phenomena on
electrically large and complex cable-loaded structures, IEEE Transactions on Electromagnetic
Compatibility 49 (2007), no. 2, 361-381.
8. N.N. Bojarski, The k‐space formulation of the scattering problem in the time domain, The Journal of the
Acoustical Society of America 72 (1982), no. 2, 570-584.
9. J.R. Phillips and J.K. White, A precorrected-fft method for electrostatic analysis of complicated 3-d
structures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16 (1997), no.
10, 1059-1072.
10. E. Bleszynski, M. Bleszynski and T. Jaroszewicz, Aim: Adaptive integral method for solving large‐scale
electromagnetic scattering and radiation problems, Radio Science 31 (1996), no. 5, 1225-1251.
11. C.-C. Lu, A fast algorithm based on volume integral equation for analysis of arbitrarily shaped
dielectric radomes, IEEE Transactions on Antennas and Propagation 51 (2003), no. 3, 606-612.
12. W.C. Chew, J.-M. Jin, C.-C. Lu, E. Michielssen and J.M. Song, Fast solution methods in electromagnetics,
IEEE Transactions on Antennas and Propagation 45 (1997), no. 3, 533-543.
13. I. Oseledets, Tensors inside of matrices give logarithmic complexity, Preprint (unpublished), 2009,
http://pub.inm.ras.ru/preprint-2009-04.
14. B.N. Khoromskij, O(d log N)-quantics approximation of N-d tensors in high-dimensional numerical
modeling, Constructive Approximation 34 (2011), no. 2, 257-280.
15. E. Corona, A. Rahimian and D. Zorin, A tensor-train accelerated solver for integral equations in complex
geometries, arXiv preprint arXiv:1511.06029 (2015).
16. I. Oseledets and E. Tyrtyshnikov, Tt-cross approximation for multidimensional arrays, Linear Algebra and
its Applications 432 (2010), no. 1, 70-88.
17. H. Bagci, F.P. Andriulli, K. Cools, F. Olyslager and E. Michielssen, A calderón multiplicative
preconditioner for coupled surface-volume electric field integral equations, IEEE Transactions on
Antennas and Propagation 58 (2010), no. 8, 2680-2690.
18. H. Gan and W.C. Chew, A discrete bcg-fft algorithm for solving 3d inhomogeneous scatterer problems,
Journal of Electromagnetic Waves and Applications 9 (1995), no. 10, 1339-1357.
19. B.N. Khoromskij, Tensor numerical methods for multidimensional pdes: Theoretical analysis and initial
applications, ESAIM: Proceedings and Surveys 48 (2015), 1-28.
20. I. Oseledets, "Tt-toolbox 2.2," 2012.
21. I.V. Oseledets, Tensor-train decomposition, SIAM Journal on Scientific Computing 33 (2011), no. 5, 2295-
2317.
22. R.W. Freund, A transpose-free quasi-minimal residual algorithm for non-hermitian linear systems, SIAM J.
Sci. Stat. Comput. 14 (1993), 470-482.
Figure and Table Captions
Figure 1. Recursive bisection of a $2 \times 2 \times 2$ domain.
Figure 2. QTT results for Helmholtz kernel for small electrical lengths. (a) Maximum rank of the QTT
versus electrical size (b) Memory for QTT data structure versus electrical size.
Figure 3. QTT results for Helmholtz kernel for large electrical lengths. (a) Maximum rank of the QTT
versus electrical size (b) Memory for QTT data structure versus electrical size.
Figure 4. Maximum QTT ranks versus electrical size for VIE Toeplitz matrix. (a) Electrically small
domains (b) Electrically large domains.
Figure 5. Total memory required for storing the QTT approximation of each VIE Toeplitz matrix for
domains having electrical sizes of up to ten wavelengths.
Figure 6. Radar Cross Section (RCS) of the two-layer sphere.
Table 1. L2-norm error and CPU time for matrix-vector multiply using QTT compression. Matrix-vector
multiply using the FFT method takes 2.6 s for $N_L = 64$ and 20 s for $N_L = 128$.
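Each mesh column lists the relative error for $\mathbf{Z}_i^j\mathbf{x}$, followed by the relative error for $\mathbf{B}^T\mathbf{A}\mathbf{B}\mathbf{z}$ with its CPU time in parentheses.

| $\epsilon_{th}$ | $L = 1/1600$, $N_L = 64$ | $L = 1/1600$, $N_L = 128$ | $L = 1/100$, $N_L = 64$ | $L = 1/100$, $N_L = 128$ |
| $10^{-3}$ | $6.7 \times 10^{-4}$; $1.6 \times 10^{-2}$ (22 s) | $1.2 \times 10^{-3}$; $7.8 \times 10^{-2}$ (225 s) | $8.7 \times 10^{-3}$; $3.0 \times 10^{-2}$ (23 s) | $5.0 \times 10^{-3}$; $3.1 \times 10^{-2}$ (245 s) |
| $10^{-4}$ | $5.1 \times 10^{-4}$; $7.0 \times 10^{-3}$ (33 s) | $3.0 \times 10^{-4}$; $1.2 \times 10^{-2}$ (350 s) | $4.6 \times 10^{-4}$; $2.6 \times 10^{-3}$ (34 s) | $4.4 \times 10^{-4}$; $5.3 \times 10^{-3}$ (344 s) |
| $10^{-5}$ | $5.3 \times 10^{-5}$; $4.7 \times 10^{-4}$ (48 s) | $1.8 \times 10^{-5}$; $1.1 \times 10^{-3}$ (515 s) | $3.7 \times 10^{-5}$; $2.3 \times 10^{-4}$ (48 s) | $4.1 \times 10^{-5}$; $5.8 \times 10^{-4}$ (480 s) |

Table 1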