
EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science

CASA-Report 12-34 October 2012

Efficient solution of Maxwell’s equations for geometries with repeating patterns by an exchange of discretization directions in the aperiodic Fourier modal method

by

M. Pisarenco, J.M.L. Maubach, I.D. Setija, R.M.M. Mattheij

Centre for Analysis, Scientific computing and Applications

Department of Mathematics and Computer Science

Eindhoven University of Technology

P.O. Box 513

5600 MB Eindhoven, The Netherlands

ISSN: 0926-4507

Efficient solution of Maxwell’s equations for geometries with repeating patterns by an exchange of discretization directions in the aperiodic Fourier modal method

M. Pisarenco^{a,b}, J.M.L. Maubach^{a}, I.D. Setija^{b}, R.M.M. Mattheij^{a}

^{a} Department of Mathematics and Computer Science, Eindhoven University of Technology, PO Box 513, 5600MB Eindhoven, The Netherlands

^{b} Department of Research, ASML Netherlands B.V., De Run 6501, 5504DR Veldhoven, The Netherlands

Abstract

The aperiodic Fourier modal method in contrast-field formulation is a numerical discretization and solution technique for solving scattering problems in electromagnetics. Typically, spectral discretization is used in the finite periodic direction and spatial discretization in the orthogonal direction. In light of the fact that the structures of interest often have a large width-to-height ratio and that the two discretization approaches have different computational complexities, we propose exchanging the directions for spatial and spectral discretization. Moreover, if the scatterer has repeating patterns, swapping the discretization directions facilitates the reuse of previous computations. Therefore, the new method is suited for scattering from objects with a finite number of periods, such as gratings, memory arrays, metamaterials, etc. Numerical experiments demonstrate a considerable reduction of the computational costs in terms of time and memory. For a specific test case considered in this paper, the new method (based on alternative discretization) is 40 times faster and requires 100 times less memory than the method based on classical discretization.

Keywords: aperiodic Fourier modal method, AFMM-CFF, rigorous coupled-wave analysis, RCWA, electromagnetic scattering, perfectly matched layer, PML, computational complexity, alternative discretization

1. Introduction

Electromagnetic scattering occurs whenever propagating waves encounter obstacles on their path. It may be observed in various locations and at various length scales, from scattering of sunlight from the ocean surface to diffraction of a laser beam by a nanostructure. Maxwell’s equations define a mathematical model which rigorously describes the behavior of electromagnetic fields. The number of situations where an analytical solution of Maxwell’s equations can be found is very limited. Ref. (Bowman et al., 1987) gives a thorough overview of such very special cases. In all other cases numerical methods must be used to get an approximate solution of the equations. During the last decades many numerical methods for solving Maxwell’s equations have been developed. Several of them are widely used: the finite-element method (FEM) (Monk, 1992; Zschiedrich et al., 2006; Monk, 2003), the finite-difference time-domain method (FDTD) (Yee, 1966; Prather and Shi, 1999; Taflove and Hagness, 2005) and the integral equation methods (IEM), which include the boundary element method (BEM) (Prather et al., 1997; Li, 2010) and the volume integral method (VIM) (Zwamborn and van den Berg, 1991; Botha, 2006; Chang et al., 2006; Shcherbakov and Tishchenko, 2010; van Beurden, 2011).

The Fourier modal method (FMM) and the aperiodic Fourier modal method in contrast-field formulation (AFMM-CFF) are numerical solution methods for Maxwell’s equations which are less well known to the wider audience in the field of computational electromagnetics. The FMM (Moharam et al., 1995a) is the older of the two and is also known under the name of rigorous coupled-wave analysis (RCWA). It is applicable to infinitely periodic structures and originated in the diffractive optics community more than 30 years ago (Knop, 1978). In the past decades the method has matured due to fundamental studies and improvements to its stability (Moharam et al., 1995b; Li, 1996a) and convergence (Granet and Guizal, 1996; Lalanne and Morris, 1996; Li, 1996b). Other important contributions to the evolution of the method are the techniques of adaptive spatial resolution (Granet, 1999) and normal vector fields (Popov and Neviere, 2000, 2001; Schuster et al., 2007). Ref. (Hench and Strakos, 2008) gives a mathematical perspective on the challenges that have been overcome in the FMM and on the open problems still to be addressed.

The AFMM-CFF (Pisarenco et al., 2010, 2011) is a recent extension of the FMM to aperiodic structures. It builds upon the aperiodic Fourier modal method developed and refined by Lalanne and co-workers (Lalanne and Silberstein, 2000; Silberstein et al., 2001; Cao et al., 2002; Hugonin and Lalanne, 2005; Lecamp et al., 2007) with the key difference that the AFMM-CFF solves Maxwell’s equations formulated in terms of a contrast (scattered) field instead of a total field. This reformulation allows prescription of arbitrary incoming fields onto the structure of interest (Pisarenco et al., 2010). Aperiodicity is achieved by using perfectly matched layers (PMLs) (Berenger, 1994) on the vertical boundaries in order to annihilate the periodic boundary condition. An ideal PML yields an effective radiation boundary condition. Thus, unlike in the (periodic) FMM, effectively the same boundary condition is imposed on all boundaries. This fact facilitates the exchange of discretization directions and, as shown later, decreases the computational costs.

Figure 1 (panels: FDTD, FEM, FMM/AFMM-CFF, BEM, VIM; axes x and z): The rectangular grid of the FDTD method with sub-pixel smoothing to approximate the cylinder. The triangular grid of the FEM that conforms to the surface of the cylinder. The discrete layers of the FMM and AFMM-CFF, with the permittivity described by a few Fourier modes per layer. The source contributions of the surface elements for the BEM and the polarization densities in the cylinder for the VIM. Image reproduced from Janssen (2010) with permission.

Figure 1 shows a two-dimensional cylinder problem discretized by several numerical methods. From the point of view of computational costs, the discretization process determines the number of degrees of freedom of the discrete problem. This number in turn determines the execution time and the memory requirements. For instance, a discretization of Maxwell’s equations (containing six unknowns) on a regular grid as in FDTD yields $N_{DOF} = 6NM$ degrees of freedom, where $N$ and $M$ represent the number of discretization points in the x- and z-direction respectively. If the system is subsequently solved by an iterative method the computational cost is typically $O(N_{DOF}^2) = O(N^2M^2)$. Similar reasoning holds for the FEM and the integral equation methods, although the grid may be irregular in these cases. The key point here is that the computational cost of the methods mentioned above scales in the same way for all the discretization directions.

As can be seen from Figure 1, the FMM and AFMM-CFF make a distinction between the directions. In the vertical direction (the z-direction) the domain is divided into $M$ layers in which the material properties are assumed to be z-independent. In each layer, the horizontal direction (the x-direction) is discretized by applying the Galerkin method with "shifted" Fourier harmonics as basis and test functions. This yields a system of ordinary differential equations of a size equal to the number of basis functions, $N$. The general solution (with unknown integration constants) of this system may be derived by numerically solving a matrix diagonalization problem. The diagonalization of dense matrices (based on QR-decomposition) has a computational complexity of $O(N^3)$ (Golub and Van Loan, 1996, Section 5.2). Thus, for $M$ layers the total computational cost is $O(N^3M)$. Similarly the memory requirements can be estimated as $O(N^2M)$.

It appears that the x-direction is "more expensive" both in terms of time and memory. For rectangular scatterers/domains that are much longer in the x-direction, it is reasonable to choose an alternative discretization: make the longer direction "cheaper" by using spatial discretization into layers and apply spectral discretization in the shorter direction. This exchange of directions turns out to be possible for the AFMM-CFF and not for the FMM since the latter requires different boundary conditions on the vertical and horizontal boundaries. For structures with repeating patterns, the alternative discretization introduces slicing in the direction of repetition. We exploit this fact by proposing a new recursive algorithm which steps through the slices in a geometric progression. This further reduces the computational costs by replacing the linear dependence of the costs on the number of slices by a logarithmic one.
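The effect of exchanging the directions can be illustrated with the asymptotic estimates $O(N^3M)$ and $O(N^2M)$ quoted above. The following is only a sketch: the function name `afmm_cost` and the harmonic and slice counts are hypothetical, and the constants hidden in the O-notation are ignored.

```python
def afmm_cost(n_harmonics, n_slices):
    """Return (time, memory) estimates N^3 * M and N^2 * M, up to constants."""
    time = n_harmonics**3 * n_slices
    memory = n_harmonics**2 * n_slices
    return time, memory


# Hypothetical wide and flat scatterer: the classical discretization needs
# many harmonics along the long direction and few slices along the short one.
classical = afmm_cost(n_harmonics=400, n_slices=10)

# After exchanging the directions (again hypothetical numbers), the long
# direction is sliced and the short one is discretized with harmonics.
alternative = afmm_cost(n_harmonics=40, n_slices=100)

time_ratio = classical[0] / alternative[0]
memory_ratio = classical[1] / alternative[1]
print(f"time ratio: {time_ratio:.0f}, memory ratio: {memory_ratio:.0f}")
```

The cubic dependence on the number of harmonics dominates, so moving the long direction into the sliced role pays off even though the number of slices grows.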

In many important applications the scatterer is placed in a stratified medium. Examples include broad scientific areas such as submarine detection, geophysical exploration, optical microscopy and metrology (Li, 2010). Therefore, in this paper we consider the more general case of a bounded scatterer placed in a stratified medium, also referred to as background multilayer.

This paper is structured as follows. In the next section we formulate a horizontal and an equivalent vertical problem whose solutions are to be computed with the AFMM-CFF using respectively classical and alternative discretizations. Section 3 contains a general formulation of the AFMM-CFF suitable for both problems defined previously. The attention here is on the computation of the background field for the vertical problem. In Section 4 we specifically consider structures with repeating patterns and present a fast recursive algorithm for solving the final linear system exploiting this property. A summary of the theoretical estimates of time and memory required by the AFMM-CFF is given in Section 5. Section 6 presents numerical evidence on the efficiency of the proposed approach. Experimental speed-up factors are compared to the theoretical ones. Finally, our conclusions are presented in Section 7.

2. Problem formulation

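We consider electromagnetic fields with a time-harmonic behavior, i.e.

$$\mathbf{e}(\mathbf{x}, t) = \operatorname{Re}\{\mathbf{e}(\mathbf{x}) e^{i\omega t}\}, \tag{1}$$

$$\mathbf{h}(\mathbf{x}, t) = \operatorname{Re}\{\mathbf{h}(\mathbf{x}) e^{i\omega t}\}, \tag{2}$$

where $\omega$ is the angular temporal frequency. Using the time-harmonic property of the fields, Maxwell’s equations transform to

$$\nabla \times \mathbf{e}(\mathbf{x}) = -k_0 \mathbf{h}(\mathbf{x}), \tag{3a}$$

$$\nabla \times \mathbf{h}(\mathbf{x}) = -k_0 \epsilon(x, z) \mathbf{e}(\mathbf{x}). \tag{3b}$$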

Here $k_0 = \omega\sqrt{\epsilon_0\mu_0}$, $\mathbf{x} = (x, y, z)^T$ is the position vector, $\mathbf{e} = (e_x, e_y, e_z)^T$ is the electric field and $\mathbf{h} = (h_x, h_y, h_z)^T$ is the magnetic field scaled by $-i\sqrt{\epsilon_0/\mu_0}$ (with respect to the physical magnetic field). We assume that all materials are non-magnetic and that the electric permittivity $\epsilon$ is a scalar y-invariant function. The incident field is given by

$$\mathbf{e}^{inc}(\mathbf{x}) = \mathbf{a} e^{-i\mathbf{k}^{inc}\cdot\mathbf{x}}, \tag{3c}$$

where $\mathbf{k}^{inc} = (k^{inc}_x, k^{inc}_y, k^{inc}_z)^T$ is the wavevector and $\mathbf{a} = (a_x, a_y, a_z)^T$ is the amplitude vector. Note that Maxwell’s equations in homogeneous isotropic media require that $\mathbf{k}^{inc}\cdot\mathbf{a} = 0$.

Figure 2 (axes x and z): Alternative discretization by a turn of the coordinate system. Geometry of the horizontal problem $P$ (left) and the vertical problem $\bar{P}$ (right).

The numerical solution of (3) is to be computed on a domain $\Omega = [0, \Lambda] \times \mathbb{R} \times \mathbb{R}$ using AFMM-CFF with classical and alternative discretization. Note that alternative discretization may be applied by a "turn" of the coordinate system

$$(x, y, z)^T \to (z, -y, x)^T, \tag{4}$$

such that formally the discretization directions remain unchanged, i.e. the x-direction is discretized with harmonics and the z-direction is discretized with slices in both approaches. Figure 2 demonstrates the turn for a finite grating with three rectangular lines. Note that instead of three slices, we now have seven. From now on, we distinguish a horizontal problem $P$ consisting of (3) with a geometry $\epsilon(x, z) = \epsilon(x, z)$ and an incoming field $\mathbf{e}^{inc}(\mathbf{x}) = \mathbf{a} e^{-i\mathbf{k}^{inc}\cdot\mathbf{x}}$ and a vertical problem $\bar{P}$ with $\epsilon(x, z) = \bar{\epsilon}(x, z)$ and $\mathbf{e}^{inc}(\mathbf{x}) = \bar{\mathbf{a}} e^{-i\bar{\mathbf{k}}^{inc}\cdot\mathbf{x}}$. We define a transformation matrix

$$T = \begin{pmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \tag{5}$$

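which realizes the rotation of the problem. Note that a second application of the transformation yields the original coordinate system, i.e. $TT = I$ or $T = T^{-1}$. The vertical problem is related to the horizontal problem through

$$\bar{\epsilon}(\mathbf{x}) = \epsilon(T\mathbf{x}), \tag{6a}$$

$$\bar{\mathbf{a}} = T\mathbf{a}, \tag{6b}$$

$$\bar{\mathbf{k}}^{inc} = T\mathbf{k}^{inc}, \tag{6c}$$

and the solutions of the two problems satisfy

$$\mathbf{e}(\mathbf{x}) = T\bar{\mathbf{e}}(T\mathbf{x}), \tag{7a}$$

$$\mathbf{h}(\mathbf{x}) = T\bar{\mathbf{h}}(T\mathbf{x}). \tag{7b}$$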
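The properties of the transformation (5) are easy to check numerically. The sketch below uses NumPy; the helper `to_vertical` and the sample plane wave are illustrative assumptions, not part of the method itself.

```python
import numpy as np

# Transformation matrix (5): swaps x and z and flips the sign of y.
T = np.array([[0, 0, 1],
              [0, -1, 0],
              [1, 0, 0]])


def to_vertical(amplitude, wavevector):
    """Map incident-field data of the horizontal problem to the vertical one, as in (6b)-(6c)."""
    return T @ amplitude, T @ wavevector


# Hypothetical incident plane wave, chosen such that k . a = 0.
a = np.array([0.0, 1.0, 0.0])
k_inc = np.array([1.0, 0.0, 2.0])
a_bar, k_bar = to_vertical(a, k_inc)

# Applying T twice gives the identity, so T is its own inverse.
is_involution = np.array_equal(T @ T, np.eye(3, dtype=int))

# T is orthogonal, so the transversality condition k . a = 0 is preserved.
still_transverse = np.isclose(k_bar @ a_bar, k_inc @ a)
```

Since the determinant of $T$ equals $+1$, the map is a proper rotation rather than a reflection, in agreement with the description of (5) as a rotation of the problem.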

The functions $\epsilon$ and $\bar{\epsilon}$ determine the geometry of the problem which typically consists of a scatterer placed on a substrate (see Figure 2). In many applications the scatterer has a width which is much larger than its height, which motivates the turn of coordinates. Although the incident field is assumed to be a plane wave in (3c), any other wavefront may be modeled by a superposition of multiple plane waves.

In the next section we describe a generalized AFMM-CFF which can be applied to both horizontal and vertical problems. Where necessary we discuss the differences and modifications required in order to solve the vertical problem $\bar{P}$.

3. Generalized AFMM-CFF for horizontal and vertical backgrounds

3.1. Perfectly matched layers and the Maxwell equations for the contrast field

We use PMLs (Berenger, 1994) in order to impose the radiation condition at the lateral boundaries. The radiation condition however imposes a restriction on the problems which can be solved: no incident field is allowed. Therefore, Maxwell’s equations need to be reformulated such that the incident field is replaced by an equivalent source term. To this end, an associated background problem is defined,

$$\nabla \times \mathbf{e}^b(\mathbf{x}) = -k_0 \mathbf{h}^b(\mathbf{x}), \tag{8a}$$

$$\nabla \times \mathbf{h}^b(\mathbf{x}) = -k_0 \epsilon^b(x, z) \mathbf{e}^b(\mathbf{x}), \tag{8b}$$

with

$$\mathbf{e}^{b,inc}(\mathbf{x}) = \mathbf{a} e^{-i\mathbf{k}^{inc}\cdot\mathbf{x}}. \tag{8c}$$

This problem is chosen such that it admits an analytical solution and the function $\epsilon - \epsilon^b$ has compact support. As explained in (Pisarenco et al., 2010), the compact support condition is required in order to avoid non-zero source terms in the PML. Typically $\epsilon^b$ represents the permittivity of the background multilayer which supports the scatterer. It may be horizontal or vertical, depending on the problem being solved. Subtraction of (8) from (3) yields the so-called contrast-field formulation

$$\nabla \times \mathbf{e}^c(\mathbf{x}) = -k_0 \mathbf{h}^c(\mathbf{x}), \tag{9a}$$

$$\nabla \times \mathbf{h}^c(\mathbf{x}) = -k_0 \epsilon(x, z) \mathbf{e}^c(\mathbf{x}) - k_0 (\epsilon(x, z) - \epsilon^b(x, z)) \mathbf{e}^b(\mathbf{x}), \tag{9b}$$

with

$$\mathbf{e}^{c,inc}(\mathbf{x}) = 0. \tag{9c}$$

Figure 3 (axes: $x$ versus $\beta(x)$; marked positions $x_l$, $x_r$, $\Lambda$): A typical example of an imaginary part $\beta(x)$ of the coordinate stretching function $f(x)$ used in the implementation of PMLs. For this particular $\beta(x)$ the PMLs are located in the intervals $x \in [0, 1]$ and $x \in [4, 5]$.

The PMLs can be viewed as an analytical continuation of the solution into the complex plane (Chew and Weedon, 1994; Collino and Monk, 1998b). For PMLs placed in the x-direction, this implies a change of the x-derivative in the differential equations (9):

$$\frac{\partial}{\partial x} \to \frac{1}{f'(x)} \frac{\partial}{\partial x}, \quad \text{with } f(x) = x + i\beta(x). \tag{10}$$

The function $\beta$ is continuous and non-zero only in the PMLs, which are placed in the strips $[0, x_l]$ and $[x_r, \Lambda]$. We mention that although PMLs were originally designed for hyperbolic equations, they have also recently been applied successfully to the heat equation (Lantos and Nataf, 2010). Lalanne and co-workers (Lalanne and Silberstein, 2000; Silberstein et al., 2001; Hugonin and Lalanne, 2005) were the first to use absorbing layers and PMLs with Fourier modal methods. In (Hugonin and Lalanne, 2005) they chose trigonometric stretching functions whose Fourier coefficients can be computed analytically. Many other forms for $f$ (and implicitly for $\beta$) have been suggested (Berenger, 1996; Collino and Monk, 1998a; Petropoulos, 2003). The most successful use a polynomial or geometric variation of $f$ in the PML. We adopt the first (polynomial) form. An example of such a function is shown in Figure 3.

Figure 4 (curves: real part and magnitude of the incident plane wave at $z = 0$, plotted against $x$): Propagation of a plane wave through PMLs defined by $\beta(x)$ in (11).

We assume a PML given by the coordinate transformation

$$f(x) = x + i\beta(x) = x + i k_0^p (\beta_0 |x - x_0|)^{p+1}, \quad p \in \mathbb{R}^+, \tag{11}$$

where $x_0 = x_l$ for the left PML and $x_0 = x_r$ for the right PML. To the authors’ knowledge, no study exists on the choice of $p$ for PMLs in the aperiodic Fourier modal methods. For comparison, for FDTD methods nearly optimal results have been obtained for $p \in [3, 4]$ (Berenger, 1996; Wu and Fang, 1996).

We look at the amplitude of a plane wave with unit amplitude after passing through a PML of width $\Delta x$ (see Figure 4),

$$|e^{-ik_x f(x)}| = |e^{-ik_x x}|\,|e^{-k_x k_0^p (\beta_0 \Delta x)^{p+1}}| \le e^{-(k_0 \beta_0 \Delta x)^{p+1}}.$$

Now we determine the distance over which the amplitude decays from $1$ to $e^{-1}$:

$$-(k_0 \beta_0 \Delta x)^{p+1} = -1 \;\Rightarrow\; k_0 \Delta x = \frac{1}{\beta_0}.$$

This relation provides an intuitive meaning for $\beta_0$: it determines the inverse decay length. Since $k_0 = 2\pi/\lambda$, with $\lambda$ the wavelength of the incident field, the length is scaled by the wavelength of the incident field.

Figure 5: Sliced geometry. The dashed line represents the smooth profile being approximated. The hatched areas indicate PMLs.
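The decay estimate for the polynomial PML (11) can be checked with a few lines of NumPy. This is a minimal sketch: the function names and the parameter values ($\beta_0 = 0.5$, $p = 3$, unit wavelength, PMLs in $[0, 1]$ and $[4, 5]$ as in Figure 3) are assumptions made for illustration only.

```python
import numpy as np


def beta(x, x_l, x_r, k0, beta0, p):
    """Imaginary part of the stretching f(x) = x + i*beta(x) in (11); zero between the PMLs."""
    x = np.asarray(x, dtype=float)
    # Depth of the point inside the left or right PML (zero in the interior).
    depth = np.where(x < x_l, x_l - x, np.where(x > x_r, x - x_r, 0.0))
    return k0**p * (beta0 * depth) ** (p + 1)


def amplitude_bound(width, k0, beta0, p):
    """Bound exp(-(k0*beta0*width)^(p+1)) on the amplitude after a PML of the given width."""
    return np.exp(-(k0 * beta0 * width) ** (p + 1))


# Hypothetical parameters: unit wavelength, so k0 = 2*pi.
k0 = 2 * np.pi
beta0, p = 0.5, 3

# Width at which the bound has dropped to exp(-1), i.e. k0 * width = 1 / beta0.
decay_length = 1.0 / (k0 * beta0)
bound_at_decay_length = amplitude_bound(decay_length, k0, beta0, p)
print(decay_length, bound_at_decay_length)
```

A larger $\beta_0$ shortens the decay length, while a larger $p$ makes the transition into the absorbing region smoother at the PML interface.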

3.2. Discretization

Spatial discretization in $z$ is achieved in the FMM by dividing the computational domain vertically into $M$ slices such that the permittivity may be considered z-independent in each separate slice. The terminology of "slice" is preferred here to "layer", as the latter is reserved for physical layers in the multilayer stack. As illustrated in Figure 5, the upper and lower interface of slice $l$ are located at $h_{l-1}$ and $h_l$ respectively. For future reference, we define the set $h = \{h_l,\ l = 0, \ldots, M\}$. Since $z \in \mathbb{R}$, we take $h_0 = -\infty$, $h_M = +\infty$. The permittivities in each slice are given by

$$\epsilon_l(x) = \epsilon(x, z_l), \quad \text{with } z_l \in [h_{l-1}, h_l), \tag{12}$$

$$\epsilon^b_l(x) = \epsilon^b(x, z_l), \quad \text{with } z_l \in [h_{l-1}, h_l). \tag{13}$$

Thus, the profile of the scatterer is approximated by a staircase as in Figure 5. In the horizontal problem, the permittivity of the background $\epsilon^b_l$ is constant in each layer. The electric and magnetic fields on the computational domain now consist of fields in separate slices

$$\mathbf{e}_l(x, y, z) = \mathbf{e}(x, y, z), \quad z \in [h_{l-1}, h_l), \tag{14}$$

$$\mathbf{h}_l(x, y, z) = \mathbf{h}(x, y, z), \quad z \in [h_{l-1}, h_l). \tag{15}$$


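The Maxwell equations for the contrast field (9) with PMLs (11) applied to slice $l$ read

$$\frac{\partial}{\partial y} e^c_{z,l} - \frac{\partial}{\partial z} e^c_{y,l} = -k_0 h^c_{x,l}, \tag{16a}$$

$$\frac{\partial}{\partial z} e^c_{x,l} - \frac{1}{f'(x)} \frac{\partial}{\partial x} e^c_{z,l} = -k_0 h^c_{y,l}, \tag{16b}$$

$$\frac{1}{f'(x)} \frac{\partial}{\partial x} e^c_{y,l} - \frac{\partial}{\partial y} e^c_{x,l} = -k_0 h^c_{z,l}, \tag{16c}$$

$$\frac{1}{\epsilon_l(x)} \frac{\partial}{\partial y} h^c_{z,l} - \frac{1}{\epsilon_l(x)} \frac{\partial}{\partial z} h^c_{y,l} = -k_0 e^c_{x,l} - k_0 \left(1 - \frac{1}{\epsilon_l(x)} \epsilon^b_l(x)\right) e^b_{x,l}, \tag{16d}$$

$$\frac{\partial}{\partial z} h^c_{x,l} - \frac{1}{f'(x)} \frac{\partial}{\partial x} h^c_{z,l} = -k_0 \epsilon_l(x) e^c_{y,l} - k_0 (\epsilon_l(x) - \epsilon^b_l(x)) e^b_{y,l}, \tag{16e}$$

$$\frac{1}{f'(x)} \frac{\partial}{\partial x} h^c_{y,l} - \frac{\partial}{\partial y} h^c_{x,l} = -k_0 \epsilon_l(x) e^c_{z,l} - k_0 (\epsilon_l(x) - \epsilon^b_l(x)) e^b_{z,l}, \tag{16f}$$

for $(x, y, z) \in [0, \Lambda] \times \mathbb{R} \times [h_{l-1}, h_l)$. Equation (16d) has been divided by $\epsilon_l(x)$ in order to avoid products of functions with concurrent (in the same point) jump discontinuities on the right-hand side. As shown by Li (1996b), discretization of the original equation leads to slower convergence.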

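Spectral discretization in $x$ is achieved with a Galerkin approach that uses "shifted" Fourier harmonics as basis functions and test functions,

$$\phi_n(x, y) = e^{-i(k_{xn} x + k_y y)}, \tag{17}$$

where

$$k_{xn} = k^{inc}_x + n\frac{2\pi}{\Lambda}, \quad k_y = k^{inc}_y, \quad \text{for } n = -N, \ldots, +N.$$

In each slice $l$ the contrast (electric and magnetic) fields are expanded as

$$e^{c,N}_{\alpha,l}(x, y, z) = \sum_{n=-N}^{N} s^c_{\alpha,l,n}(z) \phi_n(x, y) = (\mathbf{s}^c_{\alpha,l}(z))^T \cdot \boldsymbol{\phi}(x, y), \tag{18a}$$

$$h^{c,N}_{\alpha,l}(x, y, z) = \sum_{n=-N}^{N} u^c_{\alpha,l,n}(z) \phi_n(x, y) = (\mathbf{u}^c_{\alpha,l}(z))^T \cdot \boldsymbol{\phi}(x, y). \tag{18b}$$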

The $\alpha$ symbol stands for the x-, y-, or z-component of the field. The background fields, which are known in advance, have to be represented in the same basis as the contrast field, i.e.

$$e^{b,N}_{\alpha,l}(x, y, z) = \sum_{n=-N}^{N} s^b_{\alpha,l,n}(z) \phi_n(x, y) = (\mathbf{s}^b_{\alpha,l}(z))^T \cdot \boldsymbol{\phi}(x, y), \tag{19a}$$

$$h^{b,N}_{\alpha,l}(x, y, z) = \sum_{n=-N}^{N} u^b_{\alpha,l,n}(z) \phi_n(x, y) = (\mathbf{u}^b_{\alpha,l}(z))^T \cdot \boldsymbol{\phi}(x, y). \tag{19b}$$

Computation of $s^b_{\alpha,l,n}(z)$ and $u^b_{\alpha,l,n}(z)$ for both horizontal and vertical multilayers will be discussed in Section 3.3.

We apply the Galerkin method with a standard inner product on the interval $[0, \Lambda)$ to the contrast field equations (16) to obtain a system of ordinary differential equations in $z$ for each slice $l$,

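$$-i K_y \mathbf{s}^c_{z,l}(z) - k_0^{-1} \frac{d}{dz} \mathbf{s}^c_{y,l}(z) = -\mathbf{u}^c_{x,l}(z), \tag{20a}$$

$$k_0^{-1} \frac{d}{dz} \mathbf{s}^c_{x,l}(z) + i F K_x \mathbf{s}^c_{z,l}(z) = -\mathbf{u}^c_{y,l}(z), \tag{20b}$$

$$-i F K_x \mathbf{s}^c_{y,l}(z) + i K_y \mathbf{s}^c_{x,l}(z) = -\mathbf{u}^c_{z,l}(z), \tag{20c}$$

$$-i K_y \mathbf{u}^c_{z,l}(z) - k_0^{-1} \frac{d}{dz} \mathbf{u}^c_{y,l}(z) = -P_l^{-1} \mathbf{s}^c_{x,l}(z) - (P_l^{-1} - (P^b_l)^{-1}) \mathbf{s}^b_{x,l}(z), \tag{20d}$$

$$k_0^{-1} \frac{d}{dz} \mathbf{u}^c_{x,l}(z) + i F K_x \mathbf{u}^c_{z,l}(z) = -E_l \mathbf{s}^c_{y,l}(z) - (E_l - E^b_l) \mathbf{s}^b_{y,l}(z), \tag{20e}$$

$$-i F K_x \mathbf{u}^c_{y,l}(z) + i K_y \mathbf{u}^c_{x,l}(z) = -E_l \mathbf{s}^c_{z,l}(z) - (E_l - E^b_l) \mathbf{s}^b_{z,l}(z). \tag{20f}$$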

Introducing the notation for Fourier coefficients $\xi_n$ of a function $\xi(x)$ on the interval $x \in [0, \Lambda)$,

$$\xi_n = \int_0^\Lambda \xi(x) e^{-i n \frac{2\pi}{\Lambda} x}\, dx,$$

the matrices in the expressions above are defined as follows,

$$(K_x)_{mn} = (k_{xn}/k_0)\,\delta_{mn}, \tag{21a}$$

$$(K_y)_{mn} = (k_y/k_0)\,\delta_{mn}, \tag{21b}$$

$$(E_l)_{mn} = \epsilon_{l,n-m}, \tag{21c}$$

$$(P_l)_{mn} = p_{l,n-m}, \tag{21d}$$

$$(E^b_l)_{mn} = \epsilon^b_{l,n-m}, \tag{21e}$$

$$(P^b_l)_{mn} = p^b_{l,n-m}, \tag{21f}$$

$$(F)_{mn} = \gamma_{n-m}, \tag{21g}$$

for $m, n = -N, \ldots, +N$. Here $\delta_{mn}$ is the Kronecker delta and

$$p_l(x) = 1/\epsilon_l(x), \tag{22a}$$

$$p^b_l(x) = 1/\epsilon^b_l(x), \tag{22b}$$

$$\gamma(x) = 1/f'(x). \tag{22c}$$

The function $f(x)$ is the complex coordinate transformation implementing the PML and has been defined in Section 3.1. Note that for a horizontal background the matrices $E^b_l = \epsilon^b_l I$ and $P^b_l = (\epsilon^b_l)^{-1} I$ are diagonal.
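The Toeplitz structure of the matrices in (21c)-(21g) can be sketched as follows. The code is an illustration under stated assumptions: the Fourier coefficients are approximated by a rectangle rule, they are normalized by $1/\Lambda$ (so that a constant permittivity gives $\epsilon I$, as noted above for the horizontal background), and the function names and the step-shaped permittivity profile are hypothetical.

```python
import numpy as np


def fourier_coefficients(func, period, max_order, n_samples=4096):
    """Approximate (1/period) * int_0^period func(x) exp(-i n 2 pi x / period) dx
    for n = -max_order..max_order with a rectangle rule (assumed normalization)."""
    x = np.arange(n_samples) * period / n_samples
    orders = np.arange(-max_order, max_order + 1)
    kernel = np.exp(-2j * np.pi * np.outer(orders, x) / period)
    return kernel @ func(x) / n_samples


def toeplitz_matrix(coeffs, n_harmonics):
    """Build the (2N+1) x (2N+1) matrix with entry (m, n) equal to the coefficient
    of order n - m, following the index convention of (21c)."""
    idx = np.arange(-n_harmonics, n_harmonics + 1)
    diff = idx[None, :] - idx[:, None]      # n - m, ranging over -2N..2N
    return coeffs[diff + 2 * n_harmonics]   # coeffs[0] holds order -2N


N, period = 3, 5.0

# Constant permittivity: the matrix reduces to a multiple of the identity.
eps_const = fourier_coefficients(lambda x: np.full_like(x, 2.25), period, 2 * N)
E_const = toeplitz_matrix(eps_const, N)

# Hypothetical slice containing one rectangular line of permittivity 4 in vacuum.
eps_step = fourier_coefficients(
    lambda x: np.where((x > 1.0) & (x < 2.0), 4.0, 1.0), period, 2 * N)
E_step = toeplitz_matrix(eps_step, N)
```

Orders up to $\pm 2N$ are needed because the index difference $n - m$ spans that range when $m$ and $n$ run from $-N$ to $+N$.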

3.3. Discretization of the background field

The background field is the solution to the problem of light propagation in a multilayer stack. In the horizontal problem, we assume a multilayer stack which extends infinitely in the xy-plane and is invariant in the x- and y-directions. Let the locations of interfaces in the multilayer be given by the set $h^b = \{h^b_l,\ l = 0, \ldots, M^b\}$. Due to the nature of spatial discretization, we have $h^b \subseteq h$. We will therefore decompose the background field depending on discretization slices instead of physical layers. In each slice the field consists of an upward- and downward-traveling wave,

$$\mathbf{e}^b(x, y, z) = \mathbf{t}_{s,l}\, e^{-k_0 q_l (z - h_l)} e^{-i(k^{inc}_x x + k^{inc}_y y)} + \mathbf{r}_{s,l}\, e^{k_0 q_l (z - h_l)} e^{-i(k^{inc}_x x + k^{inc}_y y)}, \tag{23a}$$

$$\mathbf{h}^b(x, y, z) = \mathbf{t}_{u,l}\, e^{-k_0 q_l (z - h_l)} e^{-i(k^{inc}_x x + k^{inc}_y y)} + \mathbf{r}_{u,l}\, e^{k_0 q_l (z - h_l)} e^{-i(k^{inc}_x x + k^{inc}_y y)}, \tag{23b}$$

for $z \in [h_{l-1}, h_l)$, where

$$q_l = i \sqrt{\epsilon^b_l - \left(\frac{k^{inc}_x}{k_0}\right)^2 - \left(\frac{k^{inc}_y}{k_0}\right)^2}, \tag{24}$$

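and the amplitude vectors $\mathbf{t}$ and $\mathbf{r}$ can be determined from the interface conditions resulting from the Maxwell equations, see (Yeh, 2005; Schadle et al., 2007).

Figure 6 (labels: $t_1$, $r_1$, $t_M$ and $\bar{t}_1$, $\bar{r}_1$, $\bar{t}_M$; axes x and z): The background field consisting of transmitted and reflected waves for the horizontal (left) and vertical (right) problem.

In the vertical problem the multilayer is invariant in the z- and y-directions. Application of the transformation (7) to (23) yields the vertical background field

$$\bar{\mathbf{e}}^b(x, y, z) = \bar{\mathbf{t}}_{s,l}\, e^{-i\bar{k}^{inc}_z z} e^{-k_0 q_l (x - h_l)} e^{-i\bar{k}^{inc}_y y} + \bar{\mathbf{r}}_{s,l}\, e^{-i\bar{k}^{inc}_z z} e^{k_0 q_l (x - h_l)} e^{-i\bar{k}^{inc}_y y}, \quad x \in [h_{l-1}, h_l), \tag{25a}$$

$$\bar{\mathbf{h}}^b(x, y, z) = \bar{\mathbf{t}}_{u,l}\, e^{-i\bar{k}^{inc}_z z} e^{-k_0 q_l (x - h_l)} e^{-i\bar{k}^{inc}_y y} + \bar{\mathbf{r}}_{u,l}\, e^{-i\bar{k}^{inc}_z z} e^{k_0 q_l (x - h_l)} e^{-i\bar{k}^{inc}_y y}, \quad x \in [h_{l-1}, h_l), \tag{25b}$$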

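where $\bar{\mathbf{r}}_{u,l} = T\mathbf{r}_{u,l}$, $\bar{\mathbf{r}}_{s,l} = T\mathbf{r}_{s,l}$, $\bar{\mathbf{t}}_{u,l} = T\mathbf{t}_{u,l}$, $\bar{\mathbf{t}}_{s,l} = T\mathbf{t}_{s,l}$. Figure 6 depicts the background fields for the horizontal and vertical problems. We need to write the background fields in a discretized form (19). It will be shown that the following less general form can be used for both settings (horizontal and vertical),

$$e^b_\alpha(x, y, z) = (\mathbf{s}^b_\alpha(z))^T \cdot \boldsymbol{\phi}(x, y) = (\mathbf{d}_{s,\alpha}\, s^b_\alpha(z))^T \cdot \boldsymbol{\phi}(x, y), \tag{26a}$$

$$h^b_\alpha(x, y, z) = (\mathbf{u}^b_\alpha(z))^T \cdot \boldsymbol{\phi}(x, y) = (\mathbf{d}_{u,\alpha}\, u^b_\alpha(z))^T \cdot \boldsymbol{\phi}(x, y). \tag{26b}$$

The horizontal background field can be exactly represented in this form. Since the $(x, y)$-dependent part of the horizontal background field (23) coincides with $\phi_0(x, y)$, we have $\mathbf{d}_{s,\alpha} = \mathbf{d}_{u,\alpha} = \mathbf{e}_{N+1}$, where $\mathbf{e}_{N+1} \in \mathbb{R}^{2N+1}$ is defined as $(\mathbf{e}_{N+1})_n = \delta_{n,N+1}$, and

$$s^b_\alpha(z) = t_{s,\alpha,l}\, e^{-k_0 q_l (z - h_l)} + r_{s,\alpha,l}\, e^{k_0 q_l (z - h_l)}, \quad z \in [h_{l-1}, h_l), \tag{27a}$$

$$u^b_\alpha(z) = t_{u,\alpha,l}\, e^{-k_0 q_l (z - h_l)} + r_{u,\alpha,l}\, e^{k_0 q_l (z - h_l)}, \quad z \in [h_{l-1}, h_l). \tag{27b}$$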

On the other hand, the vertical background field can only be represented in the form (19) or (26) through a projection. The basis function $\phi_0(x, 0)$ gives the x-dependence for all plane waves constituting the background field in Figure 6 (left). None of the basis functions in the vertical problem gives an exact representation of the background field in Figure 6 (right). In other words, in the vertical problem the basis used for the contrast field is not an eigenbasis for the background problem. We separate the x-dependent part of the vertical background field (25),

$$b_{s,\alpha}(x) = \bar{t}_{s,\alpha,l}\, e^{-k_0 q_l (x - h_l)} + \bar{r}_{s,\alpha,l}\, e^{k_0 q_l (x - h_l)}, \quad x \in [h_{l-1}, h_l), \tag{28a}$$

$$b_{u,\alpha}(x) = \bar{t}_{u,\alpha,l}\, e^{-k_0 q_l (x - h_l)} + \bar{r}_{u,\alpha,l}\, e^{k_0 q_l (x - h_l)}, \quad x \in [h_{l-1}, h_l), \tag{28b}$$

and project $b_{s/u,\alpha}(x)$ on the available basis $\phi_n(x, 0) = e^{-i k_{xn} x}$,

$$b_{s/u,\alpha}(x) = \sum_{n=-N}^{N} b_{s/u,\alpha,n}\, e^{-i k_{xn} x}. \tag{29}$$

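Multiplying both sides by $e^{i k_{xn} x}$ and integrating over the interval $[0, \Lambda]$ yields, using $k_{xn} = k_{x0} + n\frac{2\pi}{\Lambda}$,

$$b_{s/u,\alpha,n} = \frac{1}{\Lambda} \int_0^\Lambda b_{s/u,\alpha}(x)\, e^{i k_{xn} x}\, dx = \frac{1}{\Lambda} \int_0^\Lambda \left(b_{s/u,\alpha}(x)\, e^{i k_{x0} x}\right) e^{i n \frac{2\pi}{\Lambda} x}\, dx.$$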

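Thus the coefficients $b_{s/u,\alpha,n}$ may be computed with the help of a fast Fourier transform (FFT). We denote by $\mathbf{b}_{s/u,\alpha}$ the vector formed from these coefficients. Now, the background field in the scatterer can be written in the available basis

$$\bar{e}^{b,N}_\alpha(x, y, z) = e^{-i\bar{k}^{inc}_z z} \sum_{n=-N}^{N} b_{s,\alpha,n}\, \phi_n(x, y) = (\mathbf{b}_{s,\alpha}\, e^{-i\bar{k}^{inc}_z z})^T \boldsymbol{\phi}(x, y), \tag{30}$$

$$\bar{h}^{b,N}_\alpha(x, y, z) = e^{-i\bar{k}^{inc}_z z} \sum_{n=-N}^{N} b_{u,\alpha,n}\, \phi_n(x, y) = (\mathbf{b}_{u,\alpha}\, e^{-i\bar{k}^{inc}_z z})^T \boldsymbol{\phi}(x, y). \tag{31}$$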

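The fields $\bar{e}^{b,N}_\alpha$, $\bar{h}^{b,N}_\alpha$ approximate the exact fields $\bar{e}^b_\alpha$, $\bar{h}^b_\alpha$ in (25) with $2N + 1$ basis functions. Thus in representation (26) we have $\bar{\mathbf{d}}_{s,\alpha} = \mathbf{b}_{s,\alpha}$, $\bar{\mathbf{d}}_{u,\alpha} = \mathbf{b}_{u,\alpha}$ and

$$\bar{s}^b_\alpha(z) = \bar{u}^b_\alpha(z) = e^{-i\bar{k}^{inc}_z z}. \tag{32}$$

To achieve uniform notation in the horizontal and vertical settings, we denote

$$\bar{q}_l = i\bar{k}^{inc}_z. \tag{33}$$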
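The projection (29) of the x-dependent part of the vertical background field onto the shifted harmonics can be sketched numerically. The example below is an assumption-laden illustration: it replaces the FFT mentioned in the text by a direct rectangle-rule quadrature for readability, and the function name, the period and the wavenumber $k_{x0}$ are hypothetical values.

```python
import numpy as np


def project_on_shifted_harmonics(b, period, kx0, n_harmonics, n_samples=2048):
    """Coefficients b_n = (1/period) * int_0^period b(x) exp(+i kxn x) dx of the
    expansion b(x) ~ sum_n b_n exp(-i kxn x), with kxn = kx0 + n * 2 pi / period."""
    x = np.arange(n_samples) * period / n_samples
    orders = np.arange(-n_harmonics, n_harmonics + 1)
    kxn = kx0 + orders * 2 * np.pi / period
    # Rectangle rule; an FFT evaluates the same sums more efficiently.
    coeffs = np.exp(1j * np.outer(kxn, x)) @ b(x) / n_samples
    return coeffs, kxn


period, kx0, N = 5.0, 0.7, 4

# Sanity check with a function that is itself a basis function (order n = 3):
# the projection should return a unit vector.
k3 = kx0 + 3 * 2 * np.pi / period
coeffs, kxn = project_on_shifted_harmonics(
    lambda x: np.exp(-1j * k3 * x), period, kx0, N)

# A decaying exponential, similar in shape to (28), is not in the span of the
# basis; its projection spreads over all 2N + 1 coefficients.
decaying, _ = project_on_shifted_harmonics(
    lambda x: np.exp(-2.0 * x), period, kx0, N)
```

The second case illustrates why the vertical background field is only approximated: unlike the horizontal case, no single harmonic carries the whole x-dependence.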


From (27) and (32) we conclude that the following differential equation is satisfied for $z \in [h_{l-1}, h_l)$,

$$\frac{d^2}{dz^2} s^b_\alpha = q_l^2\, s^b_\alpha, \tag{34a}$$

$$\frac{d^2}{dz^2} u^b_\alpha = q_l^2\, u^b_\alpha. \tag{34b}$$

This property will be used in the next subsection.

3.4. General solution in a slice

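The y- and z-components of the electric and magnetic field can be eliminated from the discretized Maxwell’s equations for the contrast field (20),

$$\frac{d^2}{dz^2} \mathbf{s}^c_{x,l}(z) = k_0^2 C_l\, \mathbf{s}^c_{x,l}(z) + k_0^2 (B_l P_l^{-1} - B^b_l (P^b_l)^{-1})\, \mathbf{d}_{s,\alpha}\, s^b_\alpha(z), \tag{35a}$$

$$\frac{d^2}{dz^2} \mathbf{u}^c_{x,l}(z) = k_0^2 D_l\, \mathbf{u}^c_{x,l}(z) - k_0^2 (E_l - E^b_l)\, \mathbf{d}_{u,\alpha}\, u^b_\alpha(z), \tag{35b}$$

with

$$A_l = (F K_x)^2 - E_l, \tag{36a}$$

$$B_l = F K_x E_l^{-1} F K_x - I, \tag{36b}$$

$$B^b_l = F K_x (E^b_l)^{-1} F K_x - I, \tag{36c}$$

$$C_l = K_y^2 + B_l P_l^{-1}, \tag{36d}$$

$$D_l = K_y^2 + A_l. \tag{36e}$$

In order to arrive at (35) from (20), additionally the Maxwell equations for the background field (8) have been used. The solution vector consists of a homogeneous and a particular solution

$$\mathbf{s}^c_{x,l} = \mathbf{s}^c_{x,hom,l} + \mathbf{s}^c_{x,part,l}, \tag{37a}$$

$$\mathbf{u}^c_{x,l} = \mathbf{u}^c_{x,hom,l} + \mathbf{u}^c_{x,part,l}. \tag{37b}$$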

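The homogeneous solution is given by

$$\mathbf{s}^c_{x,hom,l} = W_{s,l} \left(e^{-k_0 Q_{s,l} (z - h_l)}\, \mathbf{c}^+_{s,l} + e^{k_0 Q_{s,l} (z - h_{l+1})}\, \mathbf{c}^-_{s,l}\right), \tag{38a}$$

$$\mathbf{u}^c_{x,hom,l} = W_{u,l} \left(e^{-k_0 Q_{u,l} (z - h_l)}\, \mathbf{c}^+_{u,l} + e^{k_0 Q_{u,l} (z - h_{l+1})}\, \mathbf{c}^-_{u,l}\right), \tag{38b}$$

where the pairs $W_{s,l}$, $Q_{s,l}$ and $W_{u,l}$, $Q_{u,l}$ contain the matrix of eigenvectors and the diagonal matrix of square roots of eigenvalues of, respectively, $C_l$ and $D_l$. The radiation condition is applied in the z-direction by allowing only outgoing waves in the top and bottom layers,

$$\mathbf{c}^+_{s,1} = \mathbf{c}^+_{u,1} = \mathbf{c}^-_{s,M} = \mathbf{c}^-_{u,M} = 0. \tag{39}$$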

Both the PML and the above constraint implement the same effective boundary condition, such that the same problem is solved with classical and alternative discretizations. In the classical FMM, where a periodic boundary condition is imposed at two of the sides, the two discretizations solve different problems. This implies that the exchange of directions is only possible for finite structures, and not for infinitely periodic ones.

To find the particular solution we assume the form

$$\mathbf{s}^c_{x,part,l}(z) = \mathbf{p}_{s,l}\, s^b_x(z), \tag{40a}$$

$$\mathbf{u}^c_{x,part,l}(z) = \mathbf{p}_{u,l}\, u^b_x(z). \tag{40b}$$

The solution has this form for right-hand sides with a finite family of derivatives, such as polynomial and trigonometric functions (method of undetermined coefficients applied to systems). Substituting this ansatz in Equation (35) and using property (34), we obtain two linear systems which can be solved for $\mathbf{p}_{s,l}$ and $\mathbf{p}_{u,l}$,

$$(C_l - q_l^2 I)\, \mathbf{p}_{s,l} = -(B_l P_l^{-1} - B^b_l (P^b_l)^{-1})\, \mathbf{d}_{s,x}, \tag{41a}$$

$$(D_l - q_l^2 I)\, \mathbf{p}_{u,l} = (E_l - E^b_l)\, \mathbf{d}_{u,x}. \tag{41b}$$
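The two linear-algebra steps of this subsection, the diagonalization behind (38) and the linear solve (41), can be sketched with NumPy. Everything specific below is an assumption for illustration: a random dense matrix stands in for $C_l$, the right-hand side and $q_l$ are arbitrary, and the principal branch of the square root (non-negative real part) is one possible convention for $Q$.

```python
import numpy as np


def slice_modes(C):
    """Eigen-decomposition C = W diag(q^2) W^{-1}; returns W and the vector q of
    square roots of the eigenvalues (principal branch, Re(q) >= 0, assumed here)."""
    eigenvalues, W = np.linalg.eig(C)
    q = np.sqrt(eigenvalues.astype(complex))
    return W, q


def particular_amplitude(C, q_b, rhs):
    """Solve (C - q_b^2 I) p = rhs, the structure of the systems (41)."""
    n = C.shape[0]
    return np.linalg.solve(C - q_b**2 * np.eye(n), rhs)


# Hypothetical stand-ins for C_l, q_l and the right-hand side of (41a).
rng = np.random.default_rng(0)
n = 5
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
q_b = 0.3 + 0.7j
rhs = rng.standard_normal(n)

W, q = slice_modes(C)                       # O(n^3) work per distinct slice
p_s = particular_amplitude(C, q_b, rhs)     # one extra linear solve per slice
```

The diagonalization is the $O(N^3)$ step that dominates the cost per slice, which is why avoiding repeated diagonalizations for identical slices is attractive.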

3.5. Matching of solutions at interfaces

At the interface, continuity of the tangential components of the contrast fields is required

$$(\mathbf{e}^c_{l+1} - \mathbf{e}^c_l) \times \mathbf{n} = 0, \tag{42a}$$

$$(\mathbf{h}^c_{l+1} - \mathbf{h}^c_l) \times \mathbf{n} = 0. \tag{42b}$$

These interface conditions follow from the continuity of the tangential components of the total and background fields. After applying the Galerkin approach as described in Section 3.2 these equations become

$$\mathbf{s}^c_{x,l}(h_{l+1}) = \mathbf{s}^c_{x,l+1}(h_{l+1}), \tag{43a}$$

$$\mathbf{s}^c_{y,l}(h_{l+1}) = \mathbf{s}^c_{y,l+1}(h_{l+1}), \tag{43b}$$

$$\mathbf{u}^c_{x,l}(h_{l+1}) = \mathbf{u}^c_{x,l+1}(h_{l+1}), \tag{43c}$$

$$\mathbf{u}^c_{y,l}(h_{l+1}) = \mathbf{u}^c_{y,l+1}(h_{l+1}). \tag{43d}$$

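Using (20c) in (20e) and (20f) in (20b), the y-components of the fields are expressed in terms of x-components,

$$\mathbf{s}^c_{y,l} = A_l^{-1} \left(F K_x K_y\, \mathbf{s}^c_{x,l} + k_0^{-1} \frac{d}{dz} \mathbf{u}^c_{x,l} + (E_l - E^b_l)\, \mathbf{d}_{s,y}\, s^b_y\right), \tag{44a}$$

$$\mathbf{u}^c_{y,l} = B_l^{-1} \left(F K_x E_l^{-1} K_y\, \mathbf{u}^c_{x,l} + k_0^{-1} \frac{d}{dz} \mathbf{s}^c_{x,l} - i F K_x (I - E_l^{-1} E^b_l)\, \mathbf{d}_{s,z}\, s^b_z\right). \tag{44b}$$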

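We define

$$W_l = \begin{bmatrix} 0 & W_{s,l} \\ A_l^{-1} W_{u,l} Q_{u,l} & A_l^{-1} F K_x K_y W_{s,l} \end{bmatrix}, \tag{45a}$$

$$V_l = -\begin{bmatrix} W_{u,l} & 0 \\ B_l^{-1} F K_x E_l^{-1} K_y W_{u,l} & B_l^{-1} W_{s,l} Q_{s,l} \end{bmatrix}, \tag{45b}$$

$$X_l = \begin{bmatrix} e^{-k_0 Q_{u,l} (h_{l+1} - h_l)} & 0 \\ 0 & e^{-k_0 Q_{s,l} (h_{l+1} - h_l)} \end{bmatrix} \tag{45c}$$

and

$$\mathbf{c}^+_l = \begin{bmatrix} -\mathbf{c}^+_{u,l} \\ \mathbf{c}^+_{s,l} \end{bmatrix}, \quad \mathbf{c}^-_l = \begin{bmatrix} \mathbf{c}^-_{u,l} \\ \mathbf{c}^-_{s,l} \end{bmatrix}. \tag{46}$$

Then, from (43), (37), (38), (40) we have for each slice

$$\begin{bmatrix} W_l X_l & W_l \\ V_l X_l & -V_l \end{bmatrix} \begin{bmatrix} \mathbf{c}^+_l \\ \mathbf{c}^-_l \end{bmatrix} + \mathbf{g}_l(h_{l+1}) = \begin{bmatrix} W_{l+1} & W_{l+1} X_{l+1} \\ V_{l+1} & -V_{l+1} X_{l+1} \end{bmatrix} \begin{bmatrix} \mathbf{c}^+_{l+1} \\ \mathbf{c}^-_{l+1} \end{bmatrix} + \mathbf{g}_{l+1}(h_{l+1}), \tag{47a}$$

where

$$\mathbf{g}_l(z) = \begin{bmatrix} \mathbf{p}_{s,l}\, s^b_x \\ A_l^{-1} \left(F K_x K_y\, \mathbf{p}_{s,l}\, s^b_x + k_0^{-1} \mathbf{p}_{u,l} \frac{d}{dz} u^b_x + (E_l - E^b_l)\, \mathbf{d}_{s,y}\, s^b_y\right) \\ \mathbf{p}_{u,l}\, u^b_x \\ B_l^{-1} \left(F K_x E_l^{-1} K_y\, \mathbf{p}_{u,l}\, u^b_x + k_0^{-1} \mathbf{p}_{s,l} \frac{d}{dz} s^b_x - i F K_x (I - E_l^{-1} E^b_l)\, \mathbf{d}_{s,z}\, s^b_z\right) \end{bmatrix}.$$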

Figure 7 (labels: the pair $(S_{l,l+1}, f_{l,l+1})$ with amplitudes $\mathbf{c}^+_l$, $\mathbf{c}^-_l$, $\mathbf{c}^+_{l+1}$, $\mathbf{c}^-_{l+1}$): Interface description.

In (47a) the vectors $\mathbf{c}^+_l$, $\mathbf{c}^-_l$ are unknown, except the vectors corresponding to incoming fields in the upper and bottom slices,

$$\mathbf{c}^+_1 = 0, \quad \mathbf{c}^-_M = 0. \tag{47b}$$

The coupled linear systems (47) for $l = 1, \ldots, M - 1$ have to be solved in a stable manner by using a non-homogeneous S-matrix algorithm (Pisarenco et al., 2011), which is an adaptation of the classical (homogeneous) S-matrix algorithm to the contrast-field formulation.

4. Non-homogeneous S-matrix algorithm for repeating slices

In this section we develop a dedicated non-homogeneous S-matrix algorithm for the vertical problem. The repeating patterns in the direction of slicing suggest that this property can be exploited in order to achieve even higher efficiency in terms of computational costs.

The interface relation (47a) is unstable in the sense that its direct use for sequential elimination of field amplitudes leads to large round-off errors (Ko and Inkson, 1988; Li, 1996a; Pisarenco et al., 2011). To avoid instability, we use an S-matrix representation. This representation maps the incoming waves on an interface to outgoing waves from that interface. For instance, the S-matrix relation associated with layers $l$ and $l + 1$ is (see Figure 7)

$$\begin{bmatrix} \mathbf{c}^+_{l+1} \\ \mathbf{c}^-_l \end{bmatrix} = \underbrace{\begin{bmatrix} S^{11}_{l,l+1} & S^{12}_{l,l+1} \\ S^{21}_{l,l+1} & S^{22}_{l,l+1} \end{bmatrix}}_{S_{l,l+1}} \begin{bmatrix} \mathbf{c}^+_l \\ \mathbf{c}^-_{l+1} \end{bmatrix} + \underbrace{\begin{bmatrix} \mathbf{f}^1_{l,l+1} \\ \mathbf{f}^2_{l,l+1} \end{bmatrix}}_{\mathbf{f}_{l,l+1}}.$$

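The pair $(S_{l,l+1}, \mathbf{f}_{l,l+1})$ describes the scattering and source properties of the interface. It can be determined from the T-matrix $T_{l,l+1}$ and the vector $\mathbf{g}'_{l,l+1}$ (see (Pisarenco et al., 2011)),

$$S^{11}_{l,l+1} = (T^{11}_{l,l+1} - T^{12}_{l,l+1} (T^{22}_{l,l+1})^{-1} T^{21}_{l,l+1}) X_l, \tag{48a}$$

$$S^{12}_{l,l+1} = T^{12}_{l,l+1} (T^{22}_{l,l+1})^{-1} X_{l+1}, \tag{48b}$$

$$S^{21}_{l,l+1} = -(T^{22}_{l,l+1})^{-1} T^{21}_{l,l+1} X_l, \tag{48c}$$

$$S^{22}_{l,l+1} = (T^{22}_{l,l+1})^{-1} X_{l+1}, \tag{48d}$$

$$\mathbf{f}^1_{l,l+1} = \mathbf{g}'^1_{l,l+1} - T^{12}_{l,l+1} (T^{22}_{l,l+1})^{-1} \mathbf{g}'^2_{l,l+1}, \tag{48e}$$

$$\mathbf{f}^2_{l,l+1} = -(T^{22}_{l,l+1})^{-1} \mathbf{g}'^2_{l,l+1}, \tag{48f}$$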

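where

$$\begin{bmatrix} T^{11}_{l,l+1} & T^{12}_{l,l+1} \\ T^{21}_{l,l+1} & T^{22}_{l,l+1} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} W_{l+1}^{-1} & V_{l+1}^{-1} \\ W_{l+1}^{-1} & -V_{l+1}^{-1} \end{bmatrix} \begin{bmatrix} W_l & W_l \\ V_l & -V_l \end{bmatrix}, \tag{49a}$$

$$\begin{bmatrix} \mathbf{g}'^1_{l,l+1} \\ \mathbf{g}'^2_{l,l+1} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} W_{l+1}^{-1} & V_{l+1}^{-1} \\ W_{l+1}^{-1} & -V_{l+1}^{-1} \end{bmatrix} (\mathbf{g}_l(h_{l+1}) - \mathbf{g}_{l+1}(h_{l+1})). \tag{49b}$$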
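The conversion (48) from the T-matrix blocks to the pair $(S_{l,l+1}, \mathbf{f}_{l,l+1})$ can be sketched as below. This is an illustrative sketch only: the function name `s_from_t` and the random test data are hypothetical, and the consistency check assumes the T-form relations $\mathbf{c}^+_{l+1} = T^{11} X_l \mathbf{c}^+_l + T^{12} \mathbf{c}^-_l + \mathbf{g}'^1$ and $X_{l+1} \mathbf{c}^-_{l+1} = T^{21} X_l \mathbf{c}^+_l + T^{22} \mathbf{c}^-_l + \mathbf{g}'^2$, which follow from multiplying (47a) by the inverse block matrix used in (49).

```python
import numpy as np


def s_from_t(T, X_l, X_next, g_prime):
    """Blocks of (S, f) for one interface from the T-matrix blocks, following (48)."""
    (T11, T12), (T21, T22) = T
    g1, g2 = g_prime
    T22_inv = np.linalg.inv(T22)
    S = ((T11 - T12 @ T22_inv @ T21) @ X_l, T12 @ T22_inv @ X_next,
         -T22_inv @ T21 @ X_l, T22_inv @ X_next)
    f = (g1 - T12 @ T22_inv @ g2, -T22_inv @ g2)
    return S, f


# Hypothetical random data standing in for the quantities in (49).
rng = np.random.default_rng(1)
n = 3
T = [[rng.standard_normal((n, n)) for _ in range(2)] for _ in range(2)]
X_l = np.diag(rng.uniform(0.1, 1.0, n))      # decaying propagation factors
X_next = np.diag(rng.uniform(0.1, 1.0, n))
g_prime = (rng.standard_normal(n), rng.standard_normal(n))

S, f = s_from_t(T, X_l, X_next, g_prime)

# Incoming amplitudes are prescribed; outgoing ones follow from the S-matrix relation.
c_plus_l, c_minus_next = rng.standard_normal(n), rng.standard_normal(n)
c_plus_next = S[0] @ c_plus_l + S[1] @ c_minus_next + f[0]
c_minus_l = S[2] @ c_plus_l + S[3] @ c_minus_next + f[1]

# Residuals of the assumed T-form relations; both should vanish up to round-off.
residual_1 = T[0][0] @ X_l @ c_plus_l + T[0][1] @ c_minus_l + g_prime[0] - c_plus_next
residual_2 = T[1][0] @ X_l @ c_plus_l + T[1][1] @ c_minus_l + g_prime[1] - X_next @ c_minus_next
```

In the S-matrix form the factors $X_l$ and $X_{l+1}$ only ever multiply amplitudes, never divide them, which is consistent with the stability argument given in the text. The explicit inverse keeps the sketch close to (48); a linear solve would be the usual choice in production code.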

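Computation of matrices $W_l$ and $V_l = W_l Q_l$ requires diagonalization of $C_l$ and $D_l$ (Section 3.4), which is a relatively expensive computation. The time and memory requirements are estimated by

$$T_{diag} = O(M' N^3), \tag{50a}$$

$$M_{diag} = O(M' N^2), \tag{50b}$$

where $M'$ is the number of interfaces per period.

We now consider the problem of replacing two neighboring interfaces by one equivalent interface, depicted in Figure 8. Let the pair $(S_{a,b}, \mathbf{f}_{a,b})$ describe the interface(s) joining layers $a$ and $b$, and the pair $(S_{b,c}, \mathbf{f}_{b,c})$ describe the interface(s) joining layers $b$ and $c$ ($a < b < c$; $a, b, c \in \mathbb{N}$),

$$\begin{bmatrix} \mathbf{c}^+_b \\ \mathbf{c}^-_a \end{bmatrix} = \begin{bmatrix} S^{11}_{a,b} & S^{12}_{a,b} \\ S^{21}_{a,b} & S^{22}_{a,b} \end{bmatrix} \begin{bmatrix} \mathbf{c}^+_a \\ \mathbf{c}^-_b \end{bmatrix} + \begin{bmatrix} \mathbf{f}^1_{a,b} \\ \mathbf{f}^2_{a,b} \end{bmatrix}, \tag{51a}$$

$$\begin{bmatrix} \mathbf{c}^+_c \\ \mathbf{c}^-_b \end{bmatrix} = \begin{bmatrix} S^{11}_{b,c} & S^{12}_{b,c} \\ S^{21}_{b,c} & S^{22}_{b,c} \end{bmatrix} \begin{bmatrix} \mathbf{c}^+_b \\ \mathbf{c}^-_c \end{bmatrix} + \begin{bmatrix} \mathbf{f}^1_{b,c} \\ \mathbf{f}^2_{b,c} \end{bmatrix}. \tag{51b}$$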

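We would like to find the pair $(S_{a,c}, f_{a,c})$ which realizes the mapping for layers $a$ and $c$,

$$
\begin{bmatrix} c^{+}_{c} \\ c^{-}_{a} \end{bmatrix}
=
\begin{bmatrix} S^{11}_{a,c} & S^{12}_{a,c} \\ S^{21}_{a,c} & S^{22}_{a,c} \end{bmatrix}
\begin{bmatrix} c^{+}_{a} \\ c^{-}_{c} \end{bmatrix}
+
\begin{bmatrix} f^{1}_{a,c} \\ f^{2}_{a,c} \end{bmatrix}. \tag{52}
$$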

Figure 8: The problem of interface merging.

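It is shown in (Pisarenco et al., 2011) that $(S_{a,c}, f_{a,c})$ is given by

$$
S_{a,c} =
\begin{bmatrix}
S^{11}_{b,c}(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}S^{11}_{a,b} &
S^{12}_{b,c} + S^{11}_{b,c}S^{12}_{a,b}(I - S^{21}_{b,c}S^{12}_{a,b})^{-1}S^{22}_{b,c} \\
S^{21}_{a,b} + S^{22}_{a,b}S^{21}_{b,c}(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}S^{11}_{a,b} &
S^{22}_{a,b}(I - S^{21}_{b,c}S^{12}_{a,b})^{-1}S^{22}_{b,c}
\end{bmatrix}, \tag{53}
$$

$$
f_{a,c} =
\begin{bmatrix}
S^{11}_{b,c}(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}(S^{12}_{a,b}f^{2}_{b,c} + f^{1}_{a,b}) + f^{1}_{b,c} \\
S^{22}_{a,b}S^{21}_{b,c}(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}(S^{12}_{a,b}f^{2}_{b,c} + f^{1}_{a,b}) + S^{22}_{a,b}f^{2}_{b,c} + f^{2}_{a,b}
\end{bmatrix}. \tag{54}
$$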

To simplify further exposition, based on the classical Redheffer matrix product (Redheffer, 1961), we define a non-homogeneous Redheffer product

$$(S_{a,b}, f_{a,b}) * (S_{b,c}, f_{b,c}) = (S_{a,c}, f_{a,c}). \tag{55}$$

It can be proven that this new product, just like the regular matrix product and the Redheffer matrix product, is associative but not commutative. We will exploit the associativity property in order to group the interfaces in a convenient way.
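The product (55) and its associativity can be checked numerically. The following is a minimal sketch, assuming that each pair is stored as a tuple of four square blocks and a tuple of two source vectors; the names `nh_redheffer`, `random_pair` and `flatten` and the random test data are hypothetical and not part of the original algorithm description.

```python
import numpy as np

def nh_redheffer(pair_ab, pair_bc):
    """Merge (S_ab, f_ab) and (S_bc, f_bc) into (S_ac, f_ac), see (53)-(54)."""
    (A11, A12, A21, A22), (fa1, fa2) = pair_ab
    (B11, B12, B21, B22), (fb1, fb2) = pair_bc
    I = np.eye(A11.shape[0])
    E = np.linalg.inv(I - A12 @ B21)   # (I - S12_ab S21_bc)^{-1}
    F = np.linalg.inv(I - B21 @ A12)   # (I - S21_bc S12_ab)^{-1}
    src = E @ (A12 @ fb2 + fa1)        # source part of c+_b
    S11 = B11 @ E @ A11
    S12 = B12 + B11 @ A12 @ F @ B22
    S21 = A21 + A22 @ B21 @ E @ A11
    S22 = A22 @ F @ B22
    f1 = B11 @ src + fb1
    f2 = A22 @ B21 @ src + A22 @ fb2 + fa2
    return (S11, S12, S21, S22), (f1, f2)

def random_pair(rng, n, scale=0.1):
    """Hypothetical test data: small random blocks keep I - S12 S21 well conditioned."""
    blocks = tuple(scale * (rng.standard_normal((n, n))
                            + 1j * rng.standard_normal((n, n))) for _ in range(4))
    sources = tuple(rng.standard_normal(n) + 1j * rng.standard_normal(n)
                    for _ in range(2))
    return blocks, sources

def flatten(pair):
    """Stack all entries of a pair into one vector for easy comparison."""
    blocks, sources = pair
    return np.concatenate([b.ravel() for b in blocks] + [s for s in sources])

rng = np.random.default_rng(0)
P1, P2, P3 = (random_pair(rng, 4) for _ in range(3))

left = nh_redheffer(nh_redheffer(P1, P2), P3)    # (P1 * P2) * P3
right = nh_redheffer(P1, nh_redheffer(P2, P3))   # P1 * (P2 * P3)
associative = np.allclose(flatten(left), flatten(right))
commutative = np.allclose(flatten(nh_redheffer(P1, P2)),
                          flatten(nh_redheffer(P2, P1)))
```

For generic random pairs the two groupings agree to rounding error, while swapping the operands gives a different result.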

4.1. Global stack description

We can repeat the process of merging interfaces until the whole stack of interfaces is described by one equivalent interface. The non-homogeneous S-matrix algorithm (Pisarenco et al., 2011) uses the iteration

$$(S_{1,l+1}, f_{1,l+1}) = (S_{1,l}, f_{1,l}) * (S_{l,l+1}, f_{l,l+1}). \tag{56}$$

The global matrix-vector pair for $M-1$ interfaces ($M$ layers) is given by

$$(S_{1,M}, f_{1,M}) = [\dots[[(S_{1,2}, f_{1,2}) * (S_{2,3}, f_{2,3})] * (S_{3,4}, f_{3,4})] * \dots * (S_{M-1,M}, f_{M-1,M})]. \tag{57}$$

Figure 9: A schematic representation of the non-homogeneous S-matrix algorithm applied to a grating with $R = 4$ periods and $M' = 2$ interfaces per period: classical linear recursion (left side, $O(M'R)$ products) and fast exponential recursion (right side, $O(M' + \log_2 R)$ products).

The left side of Figure 9 gives a visual representation of the merging process.

We consider a structure with $R$ repeating periods, for simplicity assumed to be a power of 2, $R = 2^K$ (the algorithm can be extended to an arbitrary number of periods), where each period has $M'$ interfaces in the discretization ($M - 1 = RM'$).

The cost of computing (57) scales linearly with the number of periods,

$$T_S = O(RM'N^3), \tag{58a}$$

$$M_S = O(RM'N^2). \tag{58b}$$

Due to the associativity property of the non-homogeneous Redheffer product, we can regroup the multiplication operations. This allows us to exploit the local periodicity and reduce the computational costs down to

$$T_S = O((M' + \log_2(R))N^3), \tag{59a}$$

$$M_S = O((M' + \log_2(R))N^2). \tag{59b}$$

We observe that once we have computed the matrix-vector pair corresponding to the stack $1, \dots, M'+1$, the matrix-vector pair corresponding to the stack $M'+1, \dots, 2M'+1$ can be computed with minimal effort,

$$(S_{M'+1,2M'+1}, f_{M'+1,2M'+1}) = (S_{1,M'+1}, \nu f_{1,M'+1}), \tag{60}$$

where $\nu = \exp(-k_0 q (h_{M'+1} - h_1))$ is a phase factor which appears due to the scalar $z$-dependence of $f$. Thus, we can directly compute the matrix-vector pair corresponding to the stack $1, \dots, 2M'+1$,

$$(S_{1,2M'+1}, f_{1,2M'+1}) = (S_{1,M'+1}, f_{1,M'+1}) * (S_{1,M'+1}, \nu f_{1,M'+1}). \tag{61}$$

The iteration which computes the matrix-vector pair for an exponentially increasing number of layers is defined by

$$(S_{1,2^{k+1}M'+1}, f_{1,2^{k+1}M'+1}) = (S_{1,2^{k}M'+1}, f_{1,2^{k}M'+1}) * (S_{1,2^{k}M'+1}, \nu^{2^{k}} f_{1,2^{k}M'+1}), \tag{62}$$

for $k = 0, \dots, K-1$, so that after $K = \log_2 R$ steps the pair for the full stack of $RM' + 1 = M$ layers is obtained.

Once the matrix-vector pair for the entire stack is computed, the coefficients in the upper and lower layers are given by

$$
\begin{pmatrix} c^{+}_{M} \\ c^{-}_{1} \end{pmatrix}
=
\begin{pmatrix} f^{1}_{1,M} \\ f^{2}_{1,M} \end{pmatrix}. \tag{63}
$$

The last expression is a result of (52) evaluated for $a = 1$ and $c = M$ with vanishing incoming amplitudes, $c^{+}_{1} = 0$ and $c^{-}_{M} = 0$.
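The difference between the linear and the exponential recursion can be illustrated with a short NumPy sketch. It assumes that the pair for one period, $(S_{1,M'+1}, f_{1,M'+1})$, is already available, and it merges whole periods rather than single interfaces. All function names, the value of `nu` and the random test data are hypothetical.

```python
import numpy as np

def nh_redheffer(pair_ab, pair_bc):
    """Non-homogeneous Redheffer product (55), written out from (53)-(54)."""
    (A11, A12, A21, A22), (fa1, fa2) = pair_ab
    (B11, B12, B21, B22), (fb1, fb2) = pair_bc
    I = np.eye(A11.shape[0])
    E = np.linalg.inv(I - A12 @ B21)
    F = np.linalg.inv(I - B21 @ A12)
    src = E @ (A12 @ fb2 + fa1)
    S = (B11 @ E @ A11, B12 + B11 @ A12 @ F @ B22,
         A21 + A22 @ B21 @ E @ A11, A22 @ F @ B22)
    f = (B11 @ src + fb1, A22 @ B21 @ src + A22 @ fb2 + fa2)
    return S, f

def shift(pair, factor):
    """Same stack moved along the slicing direction: only f is rescaled, cf. (60)."""
    S, f = pair
    return S, tuple(factor * fi for fi in f)

def stack_linear(period, nu, R):
    """Period-by-period recursion in the spirit of (56): R - 1 products."""
    total = period
    for j in range(1, R):
        total = nh_redheffer(total, shift(period, nu ** j))
    return total

def stack_doubling(period, nu, K):
    """Exponential recursion (62): K products for R = 2**K periods."""
    total = period
    for k in range(K):
        total = nh_redheffer(total, shift(total, nu ** (2 ** k)))
    return total

# Hypothetical one-period pair with small random blocks.
rng = np.random.default_rng(1)
n, K = 4, 3
blocks = tuple(0.1 * (rng.standard_normal((n, n))
                      + 1j * rng.standard_normal((n, n))) for _ in range(4))
sources = tuple(rng.standard_normal(n) + 1j * rng.standard_normal(n)
                for _ in range(2))
period = (blocks, sources)
nu = np.exp(-0.1 + 0.4j)   # stands in for exp(-k0 q (h_{M'+1} - h_1))

S_lin, f_lin = stack_linear(period, nu, 2 ** K)
S_dbl, f_dbl = stack_doubling(period, nu, K)
```

For $R = 8$ the linear variant needs 7 products and the doubling variant only 3, and both return the same pair up to rounding because the product is associative and linear in the source vectors.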

4.2. Intermediary coefficients

Relation (63) allows the computation of the coefficients in the first and last layers of the stack. In order to compute the coefficients in the intermediary layers, we start with the matrix-vector pair associated with the whole stack and recursively halve the stack. One iteration of this process is explained by again considering the problem in Figure 8. This time we are given the pairs $(S_{a,b}, f_{a,b})$, $(S_{b,c}, f_{b,c})$ and the amplitudes $c^{+}_{a}$, $c^{-}_{c}$, and we need to determine the intermediary coefficients $c^{+}_{b}$ and $c^{-}_{b}$. Substitution of the second equation of (51b) in the first equation of (51a) yields

$$(I - S^{12}_{a,b}S^{21}_{b,c})\,c^{+}_{b} = S^{11}_{a,b}c^{+}_{a} + S^{12}_{a,b}S^{22}_{b,c}c^{-}_{c} + S^{12}_{a,b}f^{2}_{b,c} + f^{1}_{a,b}. \tag{64}$$

Substitution of the first equation of (51a) in the second equation of (51b) yields

$$(I - S^{21}_{b,c}S^{12}_{a,b})\,c^{-}_{b} = S^{21}_{b,c}S^{11}_{a,b}c^{+}_{a} + S^{22}_{b,c}c^{-}_{c} + S^{21}_{b,c}f^{1}_{a,b} + f^{2}_{b,c}. \tag{65}$$

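We can now write the amplitudes in layer $b$ in terms of the amplitudes of the incoming modes in layers $a$ and $c$,

$$
\begin{pmatrix} c^{+}_{b} \\ c^{-}_{b} \end{pmatrix}
=
\begin{pmatrix}
(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}S^{11}_{a,b} &
S^{12}_{a,b}(I - S^{21}_{b,c}S^{12}_{a,b})^{-1}S^{22}_{b,c} \\
S^{21}_{b,c}(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}S^{11}_{a,b} &
(I - S^{21}_{b,c}S^{12}_{a,b})^{-1}S^{22}_{b,c}
\end{pmatrix}
\begin{pmatrix} c^{+}_{a} \\ c^{-}_{c} \end{pmatrix}
+
\begin{pmatrix}
(I - S^{12}_{a,b}S^{21}_{b,c})^{-1}(S^{12}_{a,b}f^{2}_{b,c} + f^{1}_{a,b}) \\
(I - S^{21}_{b,c}S^{12}_{a,b})^{-1}(S^{21}_{b,c}f^{1}_{a,b} + f^{2}_{b,c})
\end{pmatrix}. \tag{66}
$$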

The update relation (66) is applied recursively (by halving the stack) until the coefficients in all layers are determined. This requires $RM' - 1$ evaluations of (66) which, if computed in the right order, involve only matrix-vector products. Therefore, the computational cost of computing and storing the intermediary coefficients is given by

$$T_c = O(RM'N^2), \tag{67a}$$

$$M_c = O(RM'N). \tag{67b}$$
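One halving step can be sketched as follows. The sketch solves (64) and (65) directly and then checks the result against the interface relations (51a) and (51b). It uses `np.linalg.solve` for clarity, whereas the cost estimate in the text presumes that the required factors are reused so that only matrix-vector products remain. The function name `interior_amplitudes` and the random data are hypothetical.

```python
import numpy as np

def interior_amplitudes(pair_ab, pair_bc, c_plus_a, c_minus_c):
    """Return (c+_b, c-_b) from the incoming amplitudes c+_a and c-_c."""
    (A11, A12, A21, A22), (fa1, fa2) = pair_ab
    (B11, B12, B21, B22), (fb1, fb2) = pair_bc
    I = np.eye(A11.shape[0])
    # Right-hand sides of (64) and (65).
    rhs_plus = A11 @ c_plus_a + A12 @ (B22 @ c_minus_c) + A12 @ fb2 + fa1
    rhs_minus = B21 @ (A11 @ c_plus_a) + B22 @ c_minus_c + B21 @ fa1 + fb2
    c_plus_b = np.linalg.solve(I - A12 @ B21, rhs_plus)
    c_minus_b = np.linalg.solve(I - B21 @ A12, rhs_minus)
    return c_plus_b, c_minus_b

# Hypothetical test data.
rng = np.random.default_rng(2)
n = 4

def random_pair():
    blocks = tuple(0.1 * rng.standard_normal((n, n)) for _ in range(4))
    sources = tuple(rng.standard_normal(n) for _ in range(2))
    return blocks, sources

pair_ab, pair_bc = random_pair(), random_pair()
c_plus_a, c_minus_c = rng.standard_normal(n), rng.standard_normal(n)
c_plus_b, c_minus_b = interior_amplitudes(pair_ab, pair_bc, c_plus_a, c_minus_c)

# Consistency with the first row of (51a) and the second row of (51b).
(A11, A12, _, _), (fa1, _) = pair_ab
(_, _, B21, B22), (_, fb2) = pair_bc
residual_51a = c_plus_b - (A11 @ c_plus_a + A12 @ c_minus_b + fa1)
residual_51b = c_minus_b - (B21 @ c_plus_b + B22 @ c_minus_c + fb2)
```

Both residuals vanish to rounding error, which confirms that the two separately solved systems describe the same layer amplitudes.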

5. Summary of computational costs

From the point of view of computational cost, the AFMM-CFF with classical and alternative discretizations consists of three consecutive tasks:

1. Matrix diagonalization (or eigenvalue decomposition);

2. Calculation of the global S-matrix;

3. Calculation of the intermediary coefficients.

The computational costs (in terms of time and memory) for each of the tasks are given respectively by (50), (59) and (67). By combining these estimates we obtain the total computational costs,

$$T = \left((c^{\mathrm{diag}}_{T} + c^{S}_{T})M' + c^{S}_{T}\log_2(R)\right)N^3 + c^{c}_{T}RM'N^2, \tag{68a}$$

$$M = \left((c^{\mathrm{diag}}_{M} + c^{S}_{M})M' + c^{S}_{M}\log_2(R)\right)N^2 + c^{c}_{M}RM'N. \tag{68b}$$

We specify that $R$ in the above expressions gives the number of repeating patterns in the vertical direction. Thus, for horizontally repeating patterns we always have $R = 1$, and $M'$ represents the total number of interfaces. With this convention estimate (68) is general and applicable to the vertical as well as the horizontal problem. For a reasonably large $N$, the second term in the above expressions (being one order lower) can be neglected. We will also assume (supported by evidence from practical experiments) that $c^{\mathrm{diag}}_{T} = c^{S}_{T} = c_{T}$. Then estimate (68) reduces to

$$T \approx (2c_{T}M' + c_{T}\log_2(R))N^3, \tag{69a}$$

$$M \approx \left((c^{\mathrm{diag}}_{M} + c^{S}_{M})M' + c^{S}_{M}\log_2(R)\right)N^2. \tag{69b}$$

6. Results

We consider the problem of scattering from finite gratings. These are periodic structures with a finite number of periods. The problem is inspired by real-life applications in lithography, where gratings are printed on the wafer to be later used for quality control. By measuring the light scattered from a grating, its shape can be determined and the quality of the lithographic process can be assessed.

We first consider a small grating, infinitely long in the $y$-direction. The geometry of the problem is shown in Figure 10. It consists of $R = 8$ rectangular lines made of resist (with a refractive index $n_{\mathrm{resist}} = 1.5$), referred to as the scatterer, supported by a silicon substrate ($n_{\mathrm{silicon}} = 4.28 - 0.05i$). The material above the substrate and the grating lines is air ($n_{\mathrm{air}} = 1.0$). The refractive index $n$ of the material is related to the electric permittivity through

$$\epsilon(x, z) = n^2(x, z). \tag{70}$$

Figure 10: Near field (real part) at $y = 0$ for a binary grating with 8 lines computed with classical (top) and alternative discretization (bottom). Axes: $x$ (horizontal) and $z$ (vertical); color scale: $\|e_c\|$.

We obtain the numerical solution by using the classical and alternative discretization approaches for AFMM-CFF or, as explained in Section 2, by solving a horizontal problem and a vertical problem. For the horizontal problem, the incident plane wave is given by (3c) with

$$a = (0, 1, 0)^T, \qquad k^{\mathrm{inc}} = 10\,(\sqrt{2}/2,\, 0,\, \sqrt{2}/2)^T,$$

and the PMLs are placed in the regions $x \in [0, 0.1] \cup [1.8, 1.9]$. The distance units are µm. For the vertical problem, the amplitude and the wave vector of the incident plane wave are given by

$$\bar{a} = (0, -1, 0)^T, \qquad \bar{k}^{\mathrm{inc}} = 10\,(\sqrt{2}/2,\, 0,\, \sqrt{2}/2)^T,$$

and the PMLs are placed in the regions $x \in [0, 0.1] \cup [0.4, 0.5]$. Both problems use a PML defined by (11) with $p = 1$ and $\beta_0 = 6$. It follows from the wave vectors $k^{\mathrm{inc}}$ that in both cases the incident plane wave is TE-polarized and has a wavelength $\lambda = 2\pi/|k^{\mathrm{inc}}| \approx 0.6283$ µm. Note that the spatial discretization requires $M = 3$ slices in the horizontal problem and $\bar{M} = \bar{R}\bar{M}' + 1 = 17$ slices in the vertical problem.
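The two numbers quoted above (the wavelength and the slice count of the vertical problem) follow from simple arithmetic, reproduced here as a small check. The variable names are hypothetical; the values $\bar{R} = 8$ and $\bar{M}' = 2$ are the ones used for this test case.

```python
import math

# Incident wave vector of the test problem (units: 1/micrometre).
k_inc = (10 * math.sqrt(2) / 2, 0.0, 10 * math.sqrt(2) / 2)
k_norm = math.sqrt(sum(component ** 2 for component in k_inc))
wavelength = 2 * math.pi / k_norm          # in micrometres

# Slices in the vertical problem: R_bar periods with M_bar' interfaces each.
periods_vertical = 8
interfaces_per_period = 2
slices_vertical = periods_vertical * interfaces_per_period + 1

print(f"|k_inc| = {k_norm:.4f}, wavelength = {wavelength:.4f} um")
print(f"slices in the vertical problem: {slices_vertical}")
```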

Figure 10 shows the solutions of the two problems. The solution of the vertical problem is plotted in rotated coordinates $(z, -y, x)$ to make the comparison with the horizontal problem easier. Clearly, for both problems the fields decay to zero in the PMLs. The solution in the PML is not physically relevant and the placement of the PMLs is different in the two problems. Therefore, the horizontal problem has a relevant solution in the stripe $(x, y, z) \in \Omega_r = [0.1, 1.8] \times \mathbb{R} \times \mathbb{R}$, while the vertical problem has a relevant solution on $(z, -y, x) \in \bar{\Omega}_r = \mathbb{R} \times \mathbb{R} \times [0.1, 0.4]$. In the light of the above discussion, the fields should be compared only on a domain where both are relevant, that is on

$$\Omega_c = \Omega_r \cap \bar{\Omega}_r = [0.1, 1.8] \times \mathbb{R} \times [0.1, 0.4].$$

Thus we expect that

$$e_{c,N}(\mathbf{x}) \approx T\bar{e}_{c,N}(T\mathbf{x}), \quad \text{for } \mathbf{x} = (x, y, z) \in \Omega_c. \tag{71}$$

A visual inspection of the obtained solutions in $\Omega_c$ confirms this expectation. Note that $N$ in (71) may be different for the horizontal and vertical problems.

Figure 11: Comparison of classical and alternative discretization approaches for a grating with 32 rectangular lines on resist substrate: (a) Convergence (error vs. $N$), (b) Complexity (time [s] vs. $N$), (c) Error vs. memory [MBytes], (d) Error vs. time [s]. The solid line corresponds to the horizontal problem (classical discretization), the dashed line corresponds to the vertical problem (alternative discretization), and the dotted line indicates theoretical cubic complexity, $T = O(N^3)$.

In order to clearly assess the advantages of alternative discretization over classical discretization, we need to consider convergence and complexity of the two approaches. We first define the discretization errors,

$$E_N = \|e_{c,N}(\mathbf{x}) - e_{c,\mathrm{ref}}(\mathbf{x})\|, \tag{72}$$

$$\bar{E}_N = \|T\bar{e}_{c,N}(T\mathbf{x}) - e_{c,\mathrm{ref}}(\mathbf{x})\|, \tag{73}$$

where the norm for a vector field $v$ is defined as

$$\|v\|^2 = \sum_i \sum_j \sum_\alpha |v_\alpha(x_i, 0, z_j)|^2, \tag{74}$$

for $\alpha \in \{x, y, z\}$, where $x_i$ and $z_j$ sample the smallest rectangular domain enclosing the scatterer. For scatterers invariant in the $y$-direction, the dependence of the field on $y$ is known, so it suffices to consider the field at $y = 0$. In the absence of an exact solution for this type of problem, we choose as reference the numerical solution for the (trusted) horizontal setup computed with a large truncation number $N$,

$$e_{c,\mathrm{ref}} = e_{c,1048}.$$
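The norm (74) and the errors (72)-(73) translate directly into array operations. The sketch below assumes that a sampled field is stored as a complex array of shape `(3, nx, nz)` holding the components $(x, y, z)$ at the points $(x_i, 0, z_j)$; this storage layout, the function names and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def field_norm(v):
    """Discrete norm (74): root of the summed squared magnitudes of all samples."""
    return np.sqrt(np.sum(np.abs(v) ** 2))

def discretization_error(field, reference):
    """Error (72)/(73) of a sampled field with respect to the reference solution."""
    return field_norm(field - reference)

# Synthetic data on a hypothetical 5 x 4 sampling grid.
rng = np.random.default_rng(3)
nx, nz = 5, 4
reference = rng.standard_normal((3, nx, nz)) + 1j * rng.standard_normal((3, nx, nz))
field = reference + 1e-3          # uniform perturbation of every sample

error = discretization_error(field, reference)
expected = 1e-3 * np.sqrt(3 * nx * nz)
```

For the vertical problem the rotated field $T\bar{e}_{c,N}(T\mathbf{x})$ would have to be resampled on the same points before calling `discretization_error`.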

The computations are performed in MATLAB on an Intel Core 2 Quad CPU Q6600 (4 x 2.40 GHz) with 2 GB of RAM.

Figure 11 (a) shows the convergence of the solution for the vertical and horizontal setups. For small truncation numbers, $N < 10$, the error for the vertical problem is larger. The reason for this behavior lies in the representation of the vertical background field in the available basis. While in the horizontal problem a single basis function is required for an exact representation, in the vertical problem the background field $e_b$, $h_b$ can only be approximated. Clearly, for small $N$ the accuracy of the approximation is unsatisfactory. The slow decrease of the error for both approaches is explained by an insufficient resolution of the basis functions, so that the exponential decay of the solution in the PMLs cannot be accurately resolved. For $N > 10$, the error for the vertical problem, $\bar{E}$, becomes much smaller than the error for the horizontal one, $E$. Because the width of the computational domain for $\bar{P}$ is smaller than the width of the computational domain for $P$, fewer basis functions are needed for the former to achieve the same accuracy. From the plot we conclude that the same number of harmonics yields an error for the vertical problem which is two orders of magnitude smaller than for the horizontal problem. However, this plot is not sufficient to judge the performance of the approaches. Indeed, for a fixed number of harmonics, more work needs to be done for the vertical problem than for the horizontal one due to a larger number of slices.

We turn to Figure 11 (b), which shows the execution time as a function of $N$ for the two problems. Since $M' = 2$, $R = 1$ and $\bar{M}' = 2$, $\bar{R} = 32$, using (69) we can estimate the ratio of computation times for a large (but fixed) $N = \bar{N}$:

$$\frac{(2c_T M' + c_T\log_2(R))N^3}{(2c_T\bar{M}' + c_T\log_2(\bar{R}))\bar{N}^3} = \frac{4 + 0}{4 + 5} = \frac{1}{2.25}.$$

Thus solving the vertical problem is estimated to take 2.25 times longer than the horizontal problem. The distance between the two lines on the plot for $N$ approaching $10^3$ comes close to this value. We also note that although asymptotically we expect a cubic complexity $T = O(N^3)$, indicated by the dotted line in the plot, it is only approached for $N > 10^2$. This means that the real speed-up factor will be lower than the one predicted theoretically.
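The ratio above can be reproduced from the leading term of (69a). In this small sketch the common constant $c_T$ and the factor $N^3$ cancel because both problems are evaluated at the same $N$; the function name `leading_time_factor` is hypothetical.

```python
import math

def leading_time_factor(interfaces_per_period, periods):
    """Prefactor of N^3 in (69a) divided by c_T: 2 M' + log2(R)."""
    return 2 * interfaces_per_period + math.log2(periods)

horizontal = leading_time_factor(2, 1)     # M' = 2, R = 1
vertical = leading_time_factor(2, 32)      # M_bar' = 2, R_bar = 32
ratio = vertical / horizontal              # time(vertical) / time(horizontal) at equal N

print(horizontal, vertical, ratio)
```

With a linear instead of logarithmic dependence on the number of periods, the vertical factor would grow in proportion to $\bar{R}$, which is the cost the regrouped product avoids.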

We combine data from Figure 11 (a) and (b) in order to obtain the direct dependence of error on time displayed in Figure 11 (d). We conclude that for errors larger than $10^{-2}$ the vertical approach is slower. It only becomes faster for smaller errors. For instance, an error of $10^{-3}$ can be achieved approximately 8 times faster by solving $\bar{P}$ instead of $P$.

For large computations with AFMM-CFF, the memory requirements limit the number of harmonics to be used. For this reason we also consider the memory required by the horizontal and vertical approaches to achieve a certain error. The size of a matrix or vector element in double precision floating-point arithmetic is 8 Bytes. Considering that we have about 4 matrices per layer, the estimated memory space required for a computation is given by (68b) with $c^{\mathrm{diag}}_{M} = c^{S}_{M} = 36$ and $c^{c}_{M} = 16$. Thus, the information from the convergence plot, Figure 11 (a), can be used to determine the error-vs-memory behavior displayed in Figure 11 (c).

We expect the gain in computational costs to change depending on the width of the structures. For this reason we now consider three geometries with the same material properties but having a different number of periods. Using the functions $T(E)$ and $\bar{T}(E)$ that give the computation time for the two problems, we define the speed-up factor for a given error,

$$\eta_T(E) = \frac{T(E)}{\bar{T}(E)}. \tag{75}$$

In analogy with the speed-up factor we also define the memory use factor,

$$\eta_M(E) = \frac{M(E)}{\bar{M}(E)}. \tag{76}$$

Figure 12: Speed-up and memory use factors for $R \in \{16, 32, 64\}$, $n_s = 1.5$: (a) speed-up factor $\eta_T$ vs. error, (b) memory use factor $\eta_M$ vs. error, each for 16, 32 and 64 lines.

A plot of the speed-up and memory use factors for different values of $R$ ($R \in \{16, 32, 64\}$) is shown in Figure 12. We observe the general trend of increasing factors for decreasing errors. For $E = 10^{-3}$ the speed-up and memory use factors are in the range 3 to 15 and 10 to 110, respectively. Unfortunately, it is difficult to simulate larger gratings ($R \in \{128, 256, \dots\}$) with classical discretization (horizontal problem) due to the prohibitively large (in terms of memory) number of harmonics. This is also the reason why the error of $10^{-4}$ has not been reached for $R = 64$ (it is however easily reached in the vertical problem, see Figure 11). We can extrapolate the data presented in the plot in order to predict an approximate speed-up of $10^2$ and a memory use factor of $10^3$ at $E = 10^{-3}$ for a grating having 256 lines.

The convergence of the solution depends strongly on its smoothness, which in turn is determined by the presence and the size of the jump discontinuities in the discretized permittivity function $\epsilon_l(x)$. Clearly, these functions are different for the vertical and horizontal problems. The jump discontinuities are material interfaces located either at the boundary of the scatterer or at the boundaries of the background multilayer. Independently of the discretization direction, the discontinuities introduced by the scatterer have the same effect on the jumps in $\epsilon_l(x)$. The effect of the background multilayer, however, is different depending on the discretization direction. For the horizontal problem the jumps in $\epsilon_b(z)$ are not visible in $\epsilon_l(x)$ since they coincide with the interfaces of discretization slices. On the other hand, in the vertical problem the jump discontinuities in $\bar{\epsilon}_b(x)$ are present in $\bar{\epsilon}_l(x)$ for all layers.

Figure 13: Speed-up and memory use factors for $R = 32$, $n_s \in \{1.0, 1.5, 4.28 - 0.04i\}$: (a) speed-up factor $\eta_T$ vs. error, (b) memory use factor $\eta_M$ vs. error.

This might slow down the convergence when solving the vertical problem. To quantify the effect, we consider a grating with $R = 32$ lines with three different materials in the substrate: air ($n_s = 1.0$), resist ($n_s = 1.5$), silicon ($n_s = 4.28 - 0.04i$). A plot of the speed-up and memory use factors for these cases is shown in Figure 13. In the case of air substrate the background permittivity function $\epsilon_b$ is continuous, which leads to a faster convergence of the vertical problem's solution. This results in large speed-up and memory use factors (up to 40 and 100, respectively). On the other hand, for silicon substrate we have a large jump discontinuity in $\epsilon_b$ at the interface between air and silicon and much smaller gains (2 and 8 for speed and memory, respectively).

We can get a theoretical estimate of the attainable speed-up using (75) and (69),

$$\eta_T(E \to 0) = \frac{2M' + \log_2(R)}{2\bar{M}' + \log_2(\bar{R})}\left(\frac{N}{\bar{N}}\right)^3. \tag{77}$$

We consider the case of a resist grating with 32 lines on air substrate studied in Figure 13. Because the superstrate and substrate are of the same material, we have similar smoothness properties of the solution and permittivity in both directions. Then, we may assume that in order to achieve the same accuracy, we need to have equal resolution of the discretization. Thus, we require that the number of harmonics per unit length is constant for both discretizations. This yields the estimate

$$N/\bar{N} = \Lambda/\bar{\Lambda} = 67/5 = 13.4. \tag{78}$$

We also have $M' = 2$, $R = 1$ and $\bar{M}' = 2$, $\bar{R} = 32$. Then, the speed-up factor estimated with (77) is close to 1000. We stress that this is a theoretical estimate for very large $N$ or, equivalently, for extremely small $E$. The estimate is useful as an upper bound for a potentially attainable speed-up. In practice, the pure cubic complexity in $N$ is hardly reached. For the considered problem, the error of $10^{-3}$ is reached for $N$ between $10^1$ and $10^2$. In this range a quadratic complexity can be assumed, as confirmed by Figure 11 (b). In this case our estimated speed-up factor becomes

$$\eta_T(E \approx 10^{-3}) = \frac{2M' + \log_2(R)}{2\bar{M}' + \log_2(\bar{R})}\left(\frac{N}{\bar{N}}\right)^2 = 79.8, \tag{79}$$

which is reasonably close to the measured $\eta_T$ for $n_s = 1.0$ shown in Figure 13. It is important to stress that convergence depends on the smoothness of the permittivity. For problems which have different smoothness properties in different directions, the requirement on equal resolution (harmonics per unit length) in order to achieve equal errors (used in (78)) is not applicable. Moreover, in the vertical problem the background field has an exact representation only for a homogeneous background permittivity.
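The two estimates (77) and (79) can be recomputed from the quoted values. The sketch uses only numbers stated in the text ($M' = \bar{M}' = 2$, $R = 1$, $\bar{R} = 32$, $\Lambda/\bar{\Lambda} = 67/5$); the function name `speed_up` and its `exponent` parameter are hypothetical.

```python
import math

def speed_up(m_prime, r, m_prime_bar, r_bar, n_ratio, exponent):
    """Estimate (77) for exponent = 3 and (79) for exponent = 2."""
    prefactor = (2 * m_prime + math.log2(r)) / (2 * m_prime_bar + math.log2(r_bar))
    return prefactor * n_ratio ** exponent

n_ratio = 67 / 5                                        # N / N_bar from (78)
eta_cubic = speed_up(2, 1, 2, 32, n_ratio, 3)           # asymptotic estimate (77)
eta_quadratic = speed_up(2, 1, 2, 32, n_ratio, 2)       # practical estimate (79)

print(f"cubic: {eta_cubic:.1f}, quadratic: {eta_quadratic:.1f}")
```

The cubic value is a little above 1000, in line with "close to 1000", and the quadratic value rounds to 79.8.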

We mention that speed-up is achieved even if the structure has no repeating patterns. In this case the logarithmic complexity in the S-matrix algorithm is lost, but the cubic complexity can still be replaced by a linear complexity in the longer direction. We estimate the speed-up for an arbitrary structure with a width-to-height ratio given by $\Lambda/\bar{\Lambda}$. We use expressions (77) and (79) with $R = \bar{R} = 1$ (no repeating patterns), $M'/\bar{M}' = \bar{\Lambda}/\Lambda$ and $N/\bar{N} = \Lambda/\bar{\Lambda}$ (assuming similar smoothness of the permittivity in both directions) to get

$$\eta_T(E \to 0) = \left(\frac{\Lambda}{\bar{\Lambda}}\right)^2, \tag{80}$$

$$\eta_T(E \approx 10^{-3}) = \frac{\Lambda}{\bar{\Lambda}}. \tag{81}$$

Thus, the theoretical speed-up for an arbitrary structure scales quadratically with the width-to-height ratio. For accuracies required in practical applications this behavior will be closer to a linear one, as indicated by (81).
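For completeness, the intermediate steps leading to (80) and (81) can be written out as follows. This is a worked restatement of the substitutions named above, with $\log_2(1) = 0$; the numerical value in the last line reuses the ratio 13.4 from (78) purely as an illustration for a hypothetical structure without repeating patterns.

```latex
\begin{align*}
\eta_T(E \to 0)
  &= \frac{2M' + \log_2 1}{2\bar{M}' + \log_2 1}\left(\frac{N}{\bar{N}}\right)^{3}
   = \frac{M'}{\bar{M}'}\left(\frac{N}{\bar{N}}\right)^{3}
   = \frac{\bar{\Lambda}}{\Lambda}\left(\frac{\Lambda}{\bar{\Lambda}}\right)^{3}
   = \left(\frac{\Lambda}{\bar{\Lambda}}\right)^{2}, \\
\eta_T(E \approx 10^{-3})
  &= \frac{M'}{\bar{M}'}\left(\frac{N}{\bar{N}}\right)^{2}
   = \frac{\bar{\Lambda}}{\Lambda}\left(\frac{\Lambda}{\bar{\Lambda}}\right)^{2}
   = \frac{\Lambda}{\bar{\Lambda}}, \\
\text{e.g. } \frac{\Lambda}{\bar{\Lambda}} = 13.4:\quad
  &\eta_T(E \to 0) = 13.4^{2} = 179.56, \qquad \eta_T(E \approx 10^{-3}) = 13.4 .
\end{align*}
```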

Figure 14: Plot of $\|\bar{e}_c\|$ for an ultra-large grating with 1024 lines. Only small parts of the field at the edges of the grating are shown.

Although the most noticeable improvement resulting from the alternative discretization is speed-up, the memory saving might become crucial when solving large problems. Thus, with the new approach we may easily simulate scattering from structures with repeating patterns having a width of the order of hundreds of wavelengths. This is a difficult task for the AFMM-CFF with classical discretization, as well as for other numerical methods such as FDTD and FEM. Figure 14 shows a small part of the computed field for a grating with 1024 lines (more than 300 wavelengths).

In practical applications it is often necessary to compute the field far above the scatterer. We briefly comment on this issue. The far field can be computed by a Green's function approach (Michalski and Zheng, 1990) or by a Fourier transformation of the field on a line above the scatterer (denoted by a dotted line in Figure 2). The second approach (Goodman, 2004, Section 3.10.2) is faster since the integrals can be computed numerically with fast Fourier transform (FFT) routines. From Figure 2 it is visible that the field on the dotted line is not available in the PMLs and beyond for the horizontal problem. Therefore, the faster approach can only be used in the vertical problem.
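The FFT-based route can be illustrated with a minimal sketch. It decomposes a field sampled on a line into plane-wave components and keeps those that propagate. The sampling-line length, the test field, the sign and normalization conventions, and the function name `angular_spectrum` are all assumptions of this example and are not taken from the paper.

```python
import numpy as np

def angular_spectrum(field, dx):
    """Return (kx, A): transverse wavenumbers and plane-wave amplitudes, sorted by kx."""
    n = field.size
    kx = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    amplitudes = np.fft.fft(field) / n
    order = np.argsort(kx)
    return kx[order], amplitudes[order]

k0 = 10.0                          # wavenumber, matching |k_inc| = 10 of the test problem
length = 2.0 * np.pi               # hypothetical length of the sampling line
n = 256
x = np.arange(n) * (length / n)
kx_true = 7.0                      # multiple of 2*pi/length, so it lies on the FFT grid
field = np.exp(1j * kx_true * x)   # synthetic field: a single plane-wave component

kx, amp = angular_spectrum(field, length / n)
peak_index = np.argmax(np.abs(amp))
peak_kx = kx[peak_index]
propagating = np.abs(kx) < k0      # only these components reach the far field
angle_deg = np.degrees(np.arcsin(peak_kx / k0))
```

For this synthetic input the spectrum has a single peak at `kx = 7`, which corresponds to a propagation angle of about 44.4 degrees from the normal of the sampling line.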

7. Conclusions

A technique for speeding up computations with the AFMM-CFF has been presented. The speed-up is achieved by exchanging the directions of spatial and spectral discretizations. The computationally cheaper spatial discretization is used in the longer direction, while the more expensive spectral discretization is applied in the shorter direction. Moreover, local periodicity in the slicing direction is exploited by using the associativity of the non-homogeneous Redheffer star product. This further decreases the computational costs: linear complexity is replaced by logarithmic complexity. The exchange of discretization directions is implemented by a rotation of the coordinate system. While the rotated scatterer can be treated automatically by the method, the rotated background multilayer introduces a fundamental difference: the background field cannot be exactly represented in the available basis and a projection has to be used. This step introduces an additional approximation error, especially when few basis functions are used. For this reason, larger speed-up is achieved when smaller errors are required (or equivalently more basis functions are employed). Besides the imposed accuracy, the speed-up and memory use factors obtained with alternative discretization depend on the number of repeating patterns as well as the size of jump discontinuities in the background multilayer. The numerical experiments confirm that the alternative discretization shows significant improvements for geometries with small jumps at the layer interfaces and a large number of periods.

In this paper we have considered two-dimensional scatterers (invariant in one direction). The presented approach can be extended to three-dimensional scatterers supported on multilayers invariant in the $x$- and $y$-directions. In this case, instead of the classical approach (spectral discretization of the $x$- and $y$-directions and spatial discretization of $z$), we can choose either $x$ or $y$ to be spatially discretized.

Acknowledgements

The authors acknowledge ASML B.V. for funding this project. We are also grateful to dr. Sjoerd Rienstra for his incisive question at a local seminar which has ignited this research.

References

Berenger, J.-P., Oct. 1994. A perfectly matched layer for the absorption of electromagnetic waves. Journal of Computational Physics 114 (2), 185–200.

Berenger, J. P., Jan. 1996. Perfectly matched layer for the FDTD solution of wave-structure interaction problems. IEEE Transactions on Antennas and Propagation 44 (1), 110–117.

Botha, M., Oct. 2006. Solving the volume integral equations of electromagnetic scattering. Journal of Computational Physics 218 (1), 141–158.

Bowman, J. J., Senior, T. B. A., Uslenghi, P. L. E., 1987. Electromagnetic and acoustic scattering by simple shapes (Revised edition). New York, Hemisphere Publishing Corp.

Cao, Q., Lalanne, P., Hugonin, J.-P., Feb. 2002. Stable and efficient Bloch-mode computational method for one-dimensional grating waveguides. J. Opt. Soc. Am. A 19 (2), 335–338.

Chang, Y.-C., Li, G., Chu, H., Opsal, J., Mar. 2006. Efficient finite-element, Green's function approach for critical-dimension metrology of three-dimensional gratings on multilayer films. J. Opt. Soc. Am. A 23 (3), 638–645.

Chew, W. C., Weedon, W. H., Sep. 1994. A 3D perfectly matched medium from modified Maxwell's equations with stretched coordinates. Microwave and Optical Technology Letters 7, 599–604.

Collino, F., Monk, P., Oct. 1998a. Optimizing the perfectly matched layer. Computer Methods in Applied Mechanics and Engineering 164 (1-2), 157–171.

Collino, F., Monk, P., Nov. 1998b. The Perfectly Matched Layer in Curvilinear Coordinates. SIAM J. Sci. Comput. 19, 2061–2090.

Golub, G. H., Van Loan, C. F., Oct. 1996. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences), 3rd Edition. The Johns Hopkins University Press.

Goodman, J. W., Dec. 2004. Introduction to Fourier Optics, 3rd Edition. Roberts & Company Publishers.

Granet, G., Oct. 1999. Reformulation of the lamellar grating problem through the concept of adaptive spatial resolution. J. Opt. Soc. Am. A 16 (10), 2510–2516.

Granet, G., Guizal, B., May 1996. Efficient implementation of the coupled-wave method for metallic lamellar gratings in TM polarization. J. Opt. Soc. Am. A 13 (5), 1019–1023.

Hench, J. J., Strakos, Z., 2008. The RCWA method - A case study with open questions and perspectives of algebraic computations. Electronic Transactions on Numerical Analysis 31, 331–357.

Hugonin, J. P., Lalanne, P., 2005. Perfectly matched layers as nonlinear coordinate transforms: a generalized formalization. J. Opt. Soc. Am. A 22 (9), 1844–1849.

Janssen, O. T. A., 2010. Rigorous simulations of emitting and non-emitting nano-optical structures. Ph.D. thesis.

Knop, K., Sep. 1978. Rigorous diffraction theory for transmission phase gratings with deep rectangular grooves. J. Opt. Soc. Am. 68 (9), 1206–1210.

Ko, D. Y. K., Inkson, J. C., Nov. 1988. Matrix method for tunneling in heterostructures: Resonant tunneling in multilayer systems. Physical Review B 38, 9945–9951.

Lalanne, P., Morris, G. M., Apr. 1996. Highly improved convergence of the coupled-wave method for TM polarization. J. Opt. Soc. Am. A 13 (4), 779–784.

Lalanne, P., Silberstein, E., Aug. 2000. Fourier-modal methods applied to waveguide computational problems. Opt. Lett. 25 (15), 1092–1094.

Lantos, N., Nataf, F., Dec. 2010. Perfectly matched layers for the heat and advection-diffusion equations. Journal of Computational Physics 229 (24), 9042–9052.

Lecamp, G., Hugonin, J. P., Lalanne, P., Sep. 2007. Theoretical and computational concepts for periodic optical waveguides. Opt. Express 15 (18), 11042–11060.

Li, L., May 1996a. Formulation and comparison of two recursive matrix algorithms for modeling layered diffraction gratings. J. Opt. Soc. Am. A 13 (5), 1024–1035.

Li, L., Sep. 1996b. Use of Fourier series in the analysis of discontinuous periodic structures. J. Opt. Soc. Am. A 13 (9), 1870–1876.

Li, P., Jan. 2010. Coupling of finite element and boundary integral methods for electromagnetic scattering in a two-layered medium. Journal of Computational Physics 229 (2), 481–497.

Michalski, K. A., Zheng, D., Mar. 1990. Electromagnetic scattering and radiation by surfaces of arbitrary shape in layered media. I - Theory. II - Implementation and results for contiguous half-spaces. IEEE Transactions on Antennas and Propagation 38, 335–352.

Moharam, M. G., Grann, E. B., Pommet, D. A., Gaylord, T. K., May 1995a. Formulation for stable and efficient implementation of the rigorous coupled-wave analysis of binary gratings. J. Opt. Soc. Am. A 12 (5), 1068–1076.

Moharam, M. G., Pommet, D. A., Grann, E. B., Gaylord, T. K., May 1995b. Stable implementation of the rigorous coupled-wave analysis for surface-relief gratings: enhanced transmittance matrix approach. J. Opt. Soc. Am. A 12 (5), 1077–1086.

Monk, P., Dec. 1992. A finite element method for approximating the time-harmonic Maxwell equations. Numerische Mathematik 63 (1), 243–261.

Monk, P., 2003. Finite Element Methods for Maxwell's Equations. Oxford University Press, Oxford.

Petropoulos, P. G., Jul. 2003. An analytical study of the discrete perfectly matched layer for the time-domain Maxwell equations in cylindrical coordinates. IEEE Transactions on Antennas and Propagation 51 (7), 1671–1675.

Pisarenco, M., Maubach, J., Setija, I., Mattheij, R., Nov. 2010. Aperiodic Fourier modal method in contrast-field formulation for simulation of scattering from finite structures. J. Opt. Soc. Am. A 27 (11), 2423–2431.

Pisarenco, M., Maubach, J., Setija, I., Mattheij, R., Jul. 2011. Modified S-matrix algorithm for the aperiodic Fourier modal method in contrast-field formulation. J. Opt. Soc. Am. A 28 (7), 1364–1371.

Popov, E., Neviere, M., Oct. 2000. Grating theory: new equations in Fourier space leading to fast converging results for TM polarization. J. Opt. Soc. Am. A 17 (10), 1773–1784.

Popov, E., Neviere, M., Nov. 2001. Maxwell equations in Fourier space:fast-converging formulation for diffraction by arbitrary shaped, periodic,anisotropic media. J. Opt. Soc. Am. A 18 (11), 2886–2894.

Prather, D. W., Mirotznik, M. S., Mait, J. N., Jan. 1997. Boundary integralmethods applied to the analysis of diffractive optical elements. J. Opt. Soc.Am. A 14 (1), 34–43.

Prather, D. W., Shi, S., May 1999. Formulation and application of thefinite-difference time-domain method for the analysis of axially symmetricdiffractive optical elements. J. Opt. Soc. Am. A 16 (5), 1131–1142.

Redheffer, R., 1961. Difference equations and functional equations intransmission-line theory. McGraw-Hill, New York, Ch. 12, pp. 282–337.

Schadle, A., Zschiedrich, L., Burger, S., Klose, R., Schmidt, F., Sep. 2007.Domain decomposition method for Maxwell’s equations: Scattering offperiodic structures. Journal of Computational Physics 226 (1), 477–493.

Schuster, T., Ruoff, J., Kerwien, N., Rafler, S., Osten, W., Sep. 2007. Normalvector method for convergence improvement using the RCWA for crossedgratings. J. Opt. Soc. Am. A 24 (9), 2880–2890.

Shcherbakov, A. A., Tishchenko, A. V., Aug. 2010. Fast numerical method for modelling one-dimensional diffraction gratings. Quantum Electronics, 538+.

Silberstein, E., Lalanne, P., Hugonin, J.-P., Cao, Q., Nov. 2001. Use of grating theories in integrated optics. J. Opt. Soc. Am. A 18 (11), 2865–2875.

Taflove, A., Hagness, S. C., Jun. 2005. Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd Edition. Artech House Publishers.

van Beurden, M. C., Nov. 2011. Fast convergence with spectral volume integral equation for crossed block-shaped gratings with improved material interface conditions. J. Opt. Soc. Am. A 28 (11), 2269–2278.

Wu, Z., Fang, J., Sep. 1996. High-performance PML algorithms. IEEE Microwave and Guided Wave Letters 6 (9), 335–337.

Yee, K., May 1966. Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media. IEEE Transactions on Antennas and Propagation 14 (3), 302–307.

Yeh, P., Mar. 2005. Optical Waves in Layered Media (Wiley Series in Pure and Applied Optics), 2nd Edition. Wiley-Interscience.

Zschiedrich, L., Burger, S., Kettner, B., Schmidt, F., Jan. 2006. Advanced Finite Element Method for Nano-Resonators. In: Proceedings of SPIE. Vol. 6115.

Zwamborn, A. P. M., van den Berg, P. M., Feb. 1991. A weak form of the conjugate gradient FFT method for plate problems. IEEE Transactions on Antennas and Propagation 39 (2), 224–228.

PREVIOUS PUBLICATIONS IN THIS SERIES:

Number. Author(s). Title. Month

12-30. M. Abdel Malik, E.H. van Brummelen. Numerical approximation of the Boltzmann equation: Moment closure. Sept. '12

12-31. F. Lippoth, G. Prokert. Stability of equilibria for a two-phase osmosis model. Sept. '12

12-32. G. Ali, N. Banagaaya, W.H.A. Schilders, C. Tischendorf. Index-aware model order reduction for differential-algebraic equations. Oct. '12

12-33. G. Ali, N. Banagaaya, W.H.A. Schilders, C. Tischendorf. Index-aware model order reduction for index-2 differential-algebraic equations. Oct. '12

12-34. M. Pisarenco, J.M.L. Maubach, I.D. Setija, R.M.M. Mattheij. Efficient solution of Maxwell’s equations for geometries with repeating patterns by an exchange of discretization directions in the aperiodic Fourier modal method. Oct. '12
