GPU-Based Combination of GO and PO for Electromagnetic Scattering of Satellite

Peng-Bo Wei, Min Zhang, Wei Niu, and Wang-Qiang Jiang



Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Abstract—In radar cross section (RCS) prediction for complex targets, computing the multiple-scattering effect imposes a heavy computational burden. To reduce this burden, we present a program that executes on graphics processing units (GPUs). In this paper, we analyze the scattering properties of a satellite, on which the antennas are modeled as cubes and columns, by employing a GPU-based combination of geometrical optics (GO) and physical optics (PO) together with the kd-tree technique. Owing to this treatment, the improved method performs particularly well at high frequency. Several examples are presented. The agreement of the results obtained in this paper with experimental and other exact results demonstrates the accuracy and efficiency of the technique.

Index Terms—Compute unified device architecture (CUDA), kd-tree, complex satellite

I. INTRODUCTION

ALTHOUGH approximate, the combination of geometrical optics (GO) and physical optics (PO) is an effective method for the electromagnetic scattering analysis of electrically large targets with corner reflector structures, where multiple reflections contribute significantly to the scattered field. Knott [1] analyzed the radar cross section of dihedral corner reflectors in the backward region using physical optics for single reflections and the combination method for double reflections. Ross followed a similar procedure in a previous study [2] and even considered edge-diffraction terms, but treated only 90° dihedrals. Anderson [3] added higher-order reflections to this combination method and further investigated the effects of the dihedral angle. These works are limited to dihedral and trihedral corner reflectors.

Without question, the combination of geometrical optics and physical optics is much more efficient than low-frequency numerical methods, such as the method of moments (MoM), for high-frequency RCS prediction. Unfortunately, for electrically large and complex targets, the combination method is still time-consuming and can hardly meet practical requirements. Moreover, if the target is described by triangle patches, as is usually the case, the number of intersection tests for each ray without any acceleration is proportional to the number of triangles N, giving a computational cost of order N^2, which considerably aggravates the computational burden of ray tracing. Because of this burden, the combination method is still not fast enough for practical application.

Manuscript received June 29, 2011. This work was supported in part by the National Natural Science Foundation of China under Grants No. 60871070 and No. 61179010, the Fundamental Research Funds for the Central Universities and the Foundation of State Key Laboratory of Astronautic Dynamics.

Peng-Bo Wei, Min Zhang and Wang-Qiang Jiang are with the School of Science, Xidian University, Xi’an, 710071, China. (Email: pengbo029.ok@163.com; [email protected]; [email protected])

Wei Niu is with the State Key Laboratory of Astronautic Dynamics, Xi’an, 710043, China. (Email: [email protected])

To decrease the computation time, many acceleration techniques for ray tracing have been proposed. Sundararajan and Niamat [4] gave a ray-box intersection algorithm that efficiently determines whether rays hit or miss the bounding box of the target. Jin et al. [5] used the octree, which is constructed by recursively subdividing a box into eight child boxes, to reduce the number of ray-patch intersection tests. Later, the kd-tree, which recursively subdivides a box into two uneven boxes using one axis-perpendicular plane, proved to be the best general-purpose acceleration structure for ray tracing of static scenes in computer graphics [6]. Tao et al. [7] proposed applying the kd-tree to reduce the time needed to trace each ray tube in the shooting and bouncing ray (SBR) method.
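As an illustration of a ray-box test, the widely used slab method is sketched below. It is not necessarily the algorithm of [4]; the function name `ray_hits_box` and the tuple-based representation of points and directions are assumptions made for this example.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray origin + t*direction (t >= 0) meets the box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray parallel to this pair of planes: it must lie between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the parameter interval in which the ray is inside all slabs.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

A ray that misses the bounding box of a node cannot hit any patch stored in it, which is what allows tree structures such as the octree and the kd-tree to skip most intersection tests.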

During the past few years, the GPU has changed from special-purpose graphics hardware into a general computing resource: a highly parallel multi-core processor with tremendous computational horsepower and very high memory bandwidth. These advantages allow the GPU to be applied successfully to other complex computational problems, which is known as general-purpose processing on the GPU (GPGPU) [8]. In particular, the compute unified device architecture (CUDA) developed by NVIDIA provides a simple and efficient way to leverage the massively parallel resources of the GPU. Many works in computational electromagnetics have reported using the GPU for acceleration. The graphical electromagnetic computing (GRECO) method [9][10] was the first proposal to use graphics hardware to accelerate the computation of the first-order scattered fields of the visible surfaces and wedges of a target. Inman and Elsherbeni [11] discussed the GPU implementation of FDTD and obtained a speedup factor of 40 in the 2D case and 14 in the 3D case. Peng and Nie [12] proposed a GPU-accelerated method of moments and achieved an acceleration ratio of about 20. Much further work has been done on GPU-based acceleration in electromagnetic computation [13]-[15]. To incorporate the multiple-order scattered fields in the scattering analysis of complex targets, Tao et al. [16] proposed a GPU-based shooting and bouncing ray method, which yields better results for targets with evident coupling effects. However, to satisfy the convergence requirement, the density of ray tubes on the virtual aperture perpendicular to the incident direction should be greater than about ten rays per wavelength, which significantly increases the amount of computation as the frequency of the incident wave increases.
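The frequency dependence of the SBR ray count can be made concrete with a rough estimate. The sketch below assumes a square virtual aperture sampled at ten rays per wavelength along each side; the function name `sbr_ray_count` and the square-aperture assumption are illustrative and are not taken from [16].

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def sbr_ray_count(aperture_side_m, freq_hz, rays_per_wavelength=10):
    """Rough number of rays needed on a square virtual aperture (assumed shape)."""
    wavelength = C0 / freq_hz
    per_side = math.ceil(aperture_side_m * rays_per_wavelength / wavelength)
    return per_side * per_side
```

Under these assumptions, doubling the frequency roughly quadruples the number of rays to be traced.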

In this paper, we present a GPU-based combination of geometrical optics and physical optics. In this method, ray tracing is well suited to parallel processing because the triangle patches are independent of one another, and the computation for every patch and every order of the scattered field is performed in parallel. Moreover, to determine precisely whether an area is illuminated by reflected rays, the ray tracing is performed backwards with the stackless kd-tree traversal algorithm. The proposed method is demonstrated to be an efficient and effective approach for the scattering analysis of electrically large and complex space targets, especially at high frequency.

II. GPU-BASED COMBINATION METHOD

CUDA offers an effective way to access the massively parallel computing resources of the GPU directly and is specialized for computationally demanding, highly parallel tasks. GPU computing can appropriately be regarded as programming massively parallel processors. A 32-thread warp operates in single instruction multiple data (SIMD) fashion, i.e., 32 threads execute the same instruction on different data simultaneously. A thread block consists of several warps, which run in single program multiple data (SPMD) fashion.

Since the electric current distribution induced on one patch by the incident field and by each order of reflected field can be calculated independently of the other patches, the combination of geometrical optics and physical optics can easily be restructured in a multi-threaded fashion. Given the structure of the space target, on which several cubes and columns are erected, the dominant scattering terms are the current distributions induced by the incident field, the first-order reflected field and the second-order reflected field; diffraction terms are neglected for simplicity. Consequently, as illustrated in Fig.1, the procedure of the GPU-based combination method can be divided into three steps, each of which executes a corresponding CUDA kernel function in a multi-threaded fashion. The details of these steps are described below.

Fig.1 The procedure of GPU-based combination method

A. Preparation Work

First of all, once the target model has been constructed, it must be partitioned into small triangle patches. The partitioning follows two rules:

1) Every patch is assumed to be in one of two states, illuminated or not illuminated. The patches should therefore not be too large, so that the illumination can be described accurately: which region is illuminated by the incident rays, which by the first reflected rays, which by the second reflected rays, and so on. Generally, we let the longest side of each triangle be shorter than one-third of the dimension of the smallest structure in the target.

2) Subject to rule 1), the patches should be as large as possible. The memory requirement and computation time are then significantly reduced because the number of patches decreases. Besides, to compute the integral exactly without affecting the overall efficiency, we use the Gordon method [17] instead of assuming that the field on each patch is constant.
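Rule 1) can be checked mechanically once a mesh is available. The following minimal sketch assumes triangles are given as triples of 3-D points; the names `longest_side` and `mesh_satisfies_rule` are hypothetical helpers introduced for this example.

```python
import math


def longest_side(tri):
    """Length of the longest edge of a triangle given as three 3-D points."""
    a, b, c = tri
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a))


def mesh_satisfies_rule(triangles, smallest_feature):
    """True if every longest side is below one-third of the smallest structure."""
    limit = smallest_feature / 3.0
    return all(longest_side(t) < limit for t in triangles)
```

Rule 2) then amounts to choosing the coarsest mesh for which this check still passes.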

For engineering applications, as long as the surface mesh gives a reasonably precise description of the illumination states, the accuracy of the proposed method can, to some extent, be assumed to be independent of the frequency of the incident wave. As the frequency increases, the patches need not be made smaller. Consequently, the amount of computation, and hence the computation time, does not change as the frequency increases, while the result becomes more accurate, which makes this method advantageous for RCS prediction at high frequency.

The kd-tree is constructed recursively from top to bottom. The root node corresponds to the bounding box of the target and contains all patches. At each step of the recursive construction, a node, which contains the group of patches that overlap its axis-aligned box, is processed to become either an interior node or a leaf node. The splitting plane is chosen according to a ray-tracing cost estimation model, and the best known greedy surface area heuristic (SAH) [18] gives the least cost. The patches of the node are then associated with the child node they overlap in space; a patch that straddles the splitting plane is associated with both child nodes. The two child nodes are then processed recursively. If the number of patches in a node is less than a user-defined number, or the depth of the node exceeds the maximum depth, or there is no benefit in splitting the node, a leaf node is created with its associated patches and the recursion for this node terminates. Because of space limitations, further illustration is omitted. The details of kd-tree construction are well described in Pharr and Humphreys’ book [19].
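The cost comparison behind the SAH can be sketched as follows. This is a simplified illustration rather than the construction used in the paper: the cost constants `c_trav` and `c_isect`, the candidate list, and the function names are assumptions, and each patch is represented only by its (min, max) extent along the splitting axis.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box with corners lo and hi."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(lo, hi, axis, pos, n_left, n_right, c_trav=1.0, c_isect=1.5):
    """Expected cost of splitting the box at `pos` along `axis`."""
    left_hi = list(hi)
    left_hi[axis] = pos
    right_lo = list(lo)
    right_lo[axis] = pos
    parent = surface_area(lo, hi)
    # Probability of a ray hitting a child is taken as its area ratio.
    p_left = surface_area(lo, left_hi) / parent
    p_right = surface_area(right_lo, hi) / parent
    return c_trav + c_isect * (p_left * n_left + p_right * n_right)


def best_split(lo, hi, axis, extents, candidates):
    """Return (cost, position) of the cheapest candidate plane."""
    best = None
    for pos in candidates:
        # A patch straddling the plane is counted in both children.
        n_left = sum(1 for mn, mx in extents if mn < pos)
        n_right = sum(1 for mn, mx in extents if mx > pos)
        cost = sah_cost(lo, hi, axis, pos, n_left, n_right)
        if best is None or cost < best[0]:
            best = (cost, pos)
    return best
```

A node would be split only if the best cost is below the leaf cost `c_isect * len(extents)`, which corresponds to the "no benefit to split" termination criterion above.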

A kd-tree with ropes augments the leaf nodes with links, so that a direct traversal to adjacent nodes is possible: for each face of a leaf node, a pointer is stored to the adjacent leaf node or, in case more than one adjacent leaf node overlaps that face, to the smallest node containing all adjacent leaf nodes. Faces that have no adjacent nodes lie on the border and point to a special nil node. During traversal, a ray passing through a leaf node can move directly to the adjacent node through its exit rope, which avoids the need for a stack of to-be-visited nodes and removes unnecessary traversal steps through interior nodes. More details can be found in [20].
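The stackless traversal can be sketched with a toy data structure. The sketch assumes that every rope points directly to a leaf (the descent through an interior node described above is omitted), and the `Leaf` class, the `(axis, side)` face key and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    name: str
    lo: tuple
    hi: tuple
    # (axis, side) -> neighbouring Leaf; a missing entry plays the role of nil.
    ropes: dict = field(default_factory=dict)


def exit_face(leaf, origin, direction):
    """Face (axis, side) through which the ray leaves the leaf box."""
    best_t, best_face = float("inf"), None
    for axis in range(3):
        d = direction[axis]
        if d == 0.0:
            continue
        side = 1 if d > 0 else 0
        plane = leaf.hi[axis] if d > 0 else leaf.lo[axis]
        t = (plane - origin[axis]) / d
        if t < best_t:
            best_t, best_face = t, (axis, side)
    return best_face


def traverse(start, origin, direction):
    """Follow exit ropes from `start` until the nil node; return visited names."""
    visited = []
    leaf = start
    while leaf is not None:
        visited.append(leaf.name)
        # Ray-patch intersection tests for this leaf would be done here.
        leaf = leaf.ropes.get(exit_face(leaf, origin, direction))
    return visited
```

No stack is needed because the only state carried between leaves is the ray itself and the current node.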

B. Step One: Computation about Incident Wave

In this method, the square root of the RCS is composed of three components, which are calculated in the following three steps respectively, and can be described as:

\[
\begin{aligned}
\sqrt{\sigma} ={} & \sum_{0} \left\{ \frac{ik}{\sqrt{\pi}} \int_{s_m} \hat{e}_r \cdot \left[ \hat{s} \times \left( \hat{n}_m \times \hat{h}_i \right) \right] \exp\left( ik\,\vec{r} \cdot (\hat{i} - \hat{s}) \right) ds \right\} \\
& + \sum_{1} \left\{ \frac{ik}{\sqrt{\pi}} \int_{s_m} \hat{e}_r \cdot \left[ \hat{s} \times \left( \hat{n}_m \times \vec{h}_{r1m} \right) \right] \exp\left( ik\,\vec{r} \cdot (\hat{i}_{r1m} - \hat{s}) \right) ds \right\} \\
& + \sum_{2} \left\{ \frac{ik}{\sqrt{\pi}} \int_{s_m} \hat{e}_r \cdot \left[ \hat{s} \times \left( \hat{n}_m \times \vec{h}_{r2m} \right) \right] \exp\left( ik\,\vec{r} \cdot (\hat{i}_{r2m} - \hat{s}) \right) ds \right\}
\end{aligned}
\tag{1}
\]

where: the sums $\sum_{0}$, $\sum_{1}$ and $\sum_{2}$ are taken over the patches illuminated by the incident field, the first reflected field and the second reflected field respectively; $\hat{i}$ is a unit vector along the direction of incidence; $\hat{s}$ is a unit vector along the scattering direction; $\hat{e}_r$ is a unit vector along the electric polarization of a far-field receiver; $\hat{h}_i$ is a unit intensity vector of the incident magnetic field; $\vec{h}_{r1m}$ and $\vec{h}_{r2m}$ are the magnitude vectors of the first reflected and second reflected magnetic fields illuminating the m th patch respectively; $\hat{i}_{r1m}$ and $\hat{i}_{r2m}$ are the unit vectors along the directions of the first reflected and second reflected magnetic fields illuminating the m th patch respectively; $\hat{n}_m$ is the unit normal of the m th patch; $s_m$ in the integral represents the surface of the m th patch. Moreover, the polygon integration over each patch can easily be converted to a contour integration with the Gordon method. Take the m th patch for example:

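\[
\frac{ik}{\sqrt{\pi}} \int_{s_m} \hat{e}_r \cdot \left[ \hat{s} \times \left( \hat{n}_m \times \hat{h}_i \right) \right] e^{ik\,\vec{r} \cdot (\hat{i} - \hat{s})}\, ds
= \frac{\hat{e}_r \cdot \left[ \hat{s} \times \left( \hat{n}_m \times \hat{h}_i \right) \right]}{\sqrt{\pi}\, T}\, e^{ik\,\vec{r}_0 \cdot \vec{w}} \sum_{m=1}^{M} \left( \hat{p} \cdot \vec{a}_m \right) e^{ik\,\vec{r}_m \cdot \vec{w}}\, \frac{\sin\left( k\,\vec{a}_m \cdot \vec{w} / 2 \right)}{k\,\vec{a}_m \cdot \vec{w} / 2}
\tag{2}
\]

where: $\vec{r}_0$ is the position vector of an origin on or near the plate; $\vec{w} = \hat{i} - \hat{s}$; $\vec{a}_m$ is a vector describing the length and orientation of the m th edge of the plate, with the edges arranged tip to tail around the perimeter; $\vec{r}_m$ is the position vector of the midpoint of the m th edge; $T$ is the length of the projection of $\vec{w}$ onto the plane of the plate; $\hat{p} = \hat{n} \times \vec{w} / |\hat{n} \times \vec{w}|$ is a unit vector in the plane of the plate perpendicular to $\vec{w}$; $M$ is the number of plate edges. Note that in the sum of (2) the index m runs over the edges of the plate, not over the patches.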
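The contour form in (2) can be checked numerically. The sketch below evaluates only the phase integral $\int \exp(ik\,\vec{r} \cdot \vec{w})\,ds$ over a flat polygon, so the term in (2) is this value multiplied by $ik/\sqrt{\pi}$ and by the polarization factor. It assumes that the vertices are listed counterclockwise about a unit normal and that $\vec{r}_0$ is the coordinate origin; the function names and the special case for $T = 0$ are choices made for this example.

```python
import cmath
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def sinc(x):
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def gordon_integral(vertices, n_hat, w, k):
    """Integral of exp(i*k*r.w) over a flat polygon via its edge contributions."""
    n_cross_w = cross(n_hat, w)
    T = math.sqrt(dot(n_cross_w, n_cross_w))
    M = len(vertices)
    if T < 1e-12:
        # w is normal to the plate: the phase is constant, so return area * phase.
        s = (0.0, 0.0, 0.0)
        for m in range(M):
            c = cross(vertices[m], vertices[(m + 1) % M])
            s = tuple(x + y for x, y in zip(s, c))
        return 0.5 * dot(s, n_hat) * cmath.exp(1j * k * dot(vertices[0], w))
    p_hat = tuple(c / T for c in n_cross_w)
    total = 0j
    for m in range(M):
        start, end = vertices[m], vertices[(m + 1) % M]
        a_m = sub(end, start)                                   # edge vector
        r_m = tuple(0.5 * (x + y) for x, y in zip(start, end))  # edge midpoint
        total += (dot(p_hat, a_m) * cmath.exp(1j * k * dot(r_m, w))
                  * sinc(0.5 * k * dot(a_m, w)))
    return total / (1j * k * T)
```

For a unit square in the xy-plane the result can be compared with the closed-form product of two one-dimensional integrals, which is a convenient sanity check before using the routine on triangle patches.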

Fig.2 Flow chart of Algorithm 1 for step 1.

Once the kd-tree with ropes is constructed, the rays for all incident and reflected directions can be traced efficiently. Unlike conventional forward tracing, we use backward ray tracing in all steps. Moreover, because the patches are small enough, we assume that if a ray intersects a patch at its central point, the entire patch is illuminated by the rays parallel to that ray. For step one, taking the backward central ray tracing of the m th patch as an example, Algorithm 1, which runs in one of the parallel threads on the GPU, is shown in Fig.2. Since the origin of the ray is the central point of the m th patch and its direction is opposite to the incident direction $\hat{i}$, the traversal begins with the leaf node to which the m th patch belongs. At the leaf node, the ray is tested for intersection with the patches that satisfy the condition $\hat{i} \cdot \hat{n}_m < 0$, since only these patches can possibly be illuminated by the incident wave. If there is no intersection, the traversal moves to the node to which the rope of the exit face leads. If that node is an interior node, the kd-tree is traversed downward until a leaf node is reached, and the intersection test is then performed there. The traversal continues until the nil node is encountered or the ray intersects a patch. If the nil node is encountered, the m th patch is considered to be illuminated by the incident wave. Accordingly, the integral over the m th patch is calculated, and the intensity and propagation direction of the field first reflected by the m th patch, obtained from GO theory, are stored for the next step. Otherwise, if the ray intersects a patch, the central ray of the m th patch is shaded by that patch; the thread terminates immediately and nothing is done for the m th patch.
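The logic of the backward central-ray test can be sketched without the kd-tree. The version below replaces the rope traversal by a brute-force loop over all patches and uses the Möller-Trumbore ray-triangle test, which is a substitution made for brevity and not the traversal of Algorithm 1; all function names are hypothetical. Each iteration of the outer loop corresponds to the work of one GPU thread.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: True if the ray hits tri at some t > eps."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:
        return False  # ray parallel to the triangle plane
    inv = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv
    if u < 0.0 or u > 1.0:
        return False
    qvec = cross(tvec, e1)
    v = dot(direction, qvec) * inv
    if v < 0.0 or u + v > 1.0:
        return False
    return dot(e2, qvec) * inv > eps


def illuminated_by_incident(patches, i_hat):
    """One flag per patch: is its centre lit directly by the incident wave?"""
    back = tuple(-c for c in i_hat)  # backward ray direction
    lit = []
    for m, tri in enumerate(patches):
        normal = cross(sub(tri[1], tri[0]), sub(tri[2], tri[0]))
        if dot(i_hat, normal) >= 0.0:
            lit.append(False)  # patch faces away from the source
            continue
        centre = tuple(sum(c) / 3.0 for c in zip(*tri))
        blocked = any(ray_hits_triangle(centre, back, other)
                      for j, other in enumerate(patches) if j != m)
        lit.append(not blocked)
    return lit
```

The brute-force inner loop costs one intersection test per patch, which is exactly the cost that the kd-tree with ropes is meant to avoid.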

In Algorithm 1 above, the intensity and propagation direction of the field first reflected by the m th patch are calculated by GO theory at the intersection, as discussed in detail in [21]. The results within a block are summed in shared memory, which is faster to read and write than global memory, using the parallel reduction algorithm shown in Fig.3 to improve efficiency. Bank conflicts should be avoided in this step; details can be found in [22].
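The access pattern of a block-level reduction can be illustrated on the CPU. The sketch below simulates a tree reduction with sequential addressing, one common way of avoiding shared-memory bank conflicts; whether Fig.3 uses exactly this variant is an assumption, and the function name `block_reduce` is hypothetical.

```python
def block_reduce(values):
    """Sum `values` the way a thread block would reduce a shared-memory array."""
    n = 1
    while n < len(values):
        n *= 2
    shared = list(values) + [0.0] * (n - len(values))  # pad to a power of two
    stride = n // 2
    while stride > 0:
        # On the GPU these additions run concurrently, one per active thread,
        # and are followed by a barrier before the stride is halved.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]
```

The number of passes grows with the logarithm of the block size, and in each pass neighbouring threads touch neighbouring array elements.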

Fig.3 Parallel reduction algorithm in a block

Fig.4. Dihedral corner reflector with incident angle $\theta_i$. (a) Front view. (b) Lateral view.

C. Step Two: Computation about the First Reflected Field

In this part, the reflected field is also traced backwards, for the following reason. Take the dihedral corner reflector as an example and assume first that if the central ray reflected by patch A intersects patch B, the entire patch B is illuminated by the parallel rays reflected by patch A. As Fig.4 shows, the reflector is a right-angle corner and the two plate surfaces are partitioned into small triangle patches. Incident rays reflected by plate 1 intersect plate 2 in some area. For $\theta_i < 45°$, as Fig.4(b) shows, the entire plate 2 is illuminated by the rays reflected by the bottom part of plate 1. In this case, the number of patches in the bottom part of plate 1 is much smaller than the number of patches on the entire plate 2. If forward central ray tracing were used, many patches of plate 2 would be missed. Conversely, if $\theta_i > 45°$, many patches on plate 2 would be traced repeatedly.

Accordingly, we adopt backward ray tracing for the identification. Taking one of the patches of plate 2 as an example, we test it one by one against the patches that have been illuminated by the incident rays. Every test involves one patch A from plate 2 and one patch B from the illuminated patches of plate 1. The origin of the ray is the central point of patch A, and the direction of the ray is opposite to that of the ray reflected by patch B. If the ray intersects patch B, patch A is considered to be illuminated by the field reflected by patch B. Otherwise, the test moves to the next illuminated patch. If the ray intersects none of the illuminated patches, patch A is not illuminated by the reflected field. Moreover, if the structure of the target is relatively complex, we must ensure that the ray is not cut off by other patches between patch A and patch B before making the positive identification.

Fig.5. Flow chart of Algorithm 2 for step 2.

For step two, Algorithm 2, which runs in one of the parallel threads on the GPU, is illustrated in Fig.5. In this step, taking the m th thread (i.e., the m th patch) as an example, the m th patch is tested against all the n patches illuminated by the incident wave. During the tests, if the direction $\vec{r}_{r1i}$ of the field reflected by the i th patch does not satisfy $\vec{r}_{r1i} \cdot \vec{n}_m < 0$ (i.e., the m th patch cannot be illuminated by the field reflected by the i th patch), the test skips the i th patch and moves to the next of the n patches. Otherwise, the backward ray test with the i th patch is performed. If the ray intersects the i th patch, the m th patch is likely to be illuminated by the field first reflected by the i th patch, unless it is shaded by another patch between them. Accordingly, when this test succeeds, backward ray tracing is carried out to find whether the field is cut off by another patch during propagation. The traversal process is similar to that of step one. If the ray reaches the i th patch, the m th patch is considered to be illuminated by the field reflected by the i th patch. Accordingly, the integral over the m th patch is calculated, and the intensity and propagation direction of the field second reflected by the m th patch, obtained from GO theory, are stored for the next step. If the ray intersects another patch first or meets the nil node, the m th patch is not illuminated by the field reflected by the i th patch, and the test moves to the next of the n patches. Moreover, if the m th patch is illuminated by first reflected fields from several of the n patches, its scattering contribution due to the first reflected field is the sum of the results calculated over the entire cycle, and the different second reflected fields are calculated and stored separately. The parallel reduction algorithm is used to sum the parallel results in a block.
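The work of one thread in step two can be sketched as follows. As a simplification, the kd-tree traversal is replaced by a brute-force nearest-hit search, so the separate "intersects the i th patch" and "not cut off by another patch" checks collapse into one comparison; the function names, the use of unit normals passed in by the caller, and the dictionary of reflected directions are assumptions made for this example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hit_distance(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: distance t > eps to the hit point, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    pvec = cross(direction, e2)
    det = dot(e1, pvec)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(direction, qvec) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, qvec) * inv
    return t if t > eps else None


def reflect(i_hat, n_hat):
    """GO (specular) reflection of direction i_hat at a surface with unit normal n_hat."""
    d = dot(i_hat, n_hat)
    return tuple(i - 2.0 * d * n for i, n in zip(i_hat, n_hat))


def first_reflection_sources(m, patches, normals, lit, reflected):
    """Indices i of lit patches whose first reflected field reaches patch m."""
    centre = tuple(sum(c) / 3.0 for c in zip(*patches[m]))
    sources = []
    for i in lit:
        if i == m or dot(reflected[i], normals[m]) >= 0.0:
            continue  # the field reflected by patch i cannot illuminate patch m
        back = tuple(-x for x in reflected[i])
        nearest, nearest_t = None, math.inf
        for j, tri in enumerate(patches):
            if j == m:
                continue
            t = hit_distance(centre, back, tri)
            if t is not None and t < nearest_t:
                nearest, nearest_t = j, t
        if nearest == i:  # patch i is hit first: nothing lies in between
            sources.append(i)
    return sources
```

Step three would reuse the same routine with `lit` and `reflected` replaced by the patches illuminated by the first reflected fields and their second reflected directions.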

D. Step Three: Computation about the Second Reflected Field

For this step, we only need to replace the patches illuminated by the incident wave with those illuminated by the first reflected fields; the rest of the algorithm is exactly the same as in the second step.

III. RESULTS AND DISCUSSION

Table 1 lists the parameters of the GPU and of the host platform on which all the following examples are run. All targets considered are assumed to be perfectly electrically conducting.

Table 1 Parameters of GPU and other reference parameters

GPU: NVIDIA GeForce GTX 460            Other parameters
Memory              1024 MB            CPU type                Intel Core i5
Memory type         GDDR5              Multi-core technology   Quad-core
Stream processors   336                CPU clock speed         2.66 GHz
Core clock          700 MHz            Main memory             4 GB
Shader clock        3600 MHz

The trihedral corner reflector is a typical benchmark target for verifying high-frequency multiple-bounce scattering [23],[24]. We first apply the method to the trihedral corner reflector shown in Fig.6, which is the same as the one in [11], in order to make a comparison. The trihedral corner reflector used in this paper is constructed of three right-angled triangles with a side length of 1 m. Two different sets of incident parameters are used to evaluate the accuracy of the proposed combination method: (a) φ from 0° to 90° in the θ = 60° plane with an angular resolution of 1° at 3 GHz; (b) θ from 0° to 90° in the φ = 45° plane with an angular resolution of 1° at 6 GHz.

Fig.6. Comparison of the GPU-based combination method result and the MLFMM result for the trihedral corner reflector (RCS in dBsm versus angle in degrees). (a) HH-polarization result for the incident plane θ = 60° at 3 GHz, plotted against φ. (b) VV-polarization result for the incident plane φ = 45° at 6 GHz, plotted against θ.

Table 2 Comparison of the computation time of different methods.

Situation     Frequency   MLFMM (sec.)   GPU-based SBR (sec.)   GPU-based combination method (sec.)
Situation 1   3 GHz       20250          8.73                   4.36
Situation 2   6 GHz       148338         32.17                  4.36

The monostatic RCS results for HH polarization with parameter set (a) and for VV polarization with parameter set (b) are shown in Fig. 6. The results of the GPU-based combination method agree well with the MLFMM results. The computation times of the different methods are compared in Table 2, where each computation time is the total time for all 90 incident angles. It is worth noting that the computation time of the GPU-based combination method (4.36 s in both situations) is independent of the frequency of the incident wave.
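The timings in Table 2 can be turned into speed-up and growth factors with a few lines of arithmetic. The following Python sketch only restates the tabulated values; the dictionary layout and the function names speedup_over and growth_with_frequency are hypothetical helpers introduced for illustration.

```python
# Total times in seconds for all incident angles, copied from Table 2.
TABLE2 = {
    "3 GHz": {"MLFMM": 20250.0, "GPU-based SBR": 8.73,
              "GPU-based combination": 4.36},
    "6 GHz": {"MLFMM": 148338.0, "GPU-based SBR": 32.17,
              "GPU-based combination": 4.36},
}


def speedup_over(method, frequency, reference="GPU-based combination"):
    """How many times faster the reference method is than `method`."""
    row = TABLE2[frequency]
    return row[method] / row[reference]


def growth_with_frequency(method, low="3 GHz", high="6 GHz"):
    """Factor by which the run time of `method` grows from `low` to `high`."""
    return TABLE2[high][method] / TABLE2[low][method]


if __name__ == "__main__":
    for method in TABLE2["3 GHz"]:
        print(method, round(growth_with_frequency(method), 2))
```

With the Table 2 values, the MLFMM time grows by a factor of about 7.3 and the GPU-based SBR time by a factor of about 3.7 when the frequency doubles, while the time of the GPU-based combination method is unchanged.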

The second target is an example taken from [25]. The geometry of the plate and cylinder is shown in Fig. 7. We evaluate the VV-polarization result at 10 GHz, where φ varies from −180° to 180° on the θ = 90° plane, and compare it with the experimental result. As Fig. 8 shows, the two results agree well.


Fig.7. Geometry of the plate and cylinder. (a) Front view. (b) Top view.

[Fig. 8 plot: RCS (dBsm) versus φ (degree), from −180° to 180°. Curves: GPU-based combination method, experiment result.]

Fig. 8. The comparison of GPU-based combination method result and the experiment result of the geometry of the plate and cylinder for VV-polarization on the incident plane θ = 90° at 10 GHz.

Fig. 9. The model of the satellite.

We now apply the GPU-based combination method to the RCS prediction of a satellite whose platform carries many antenna structures, such as columns and cubes, as shown in Fig. 9. The model is 10.5 m long in its largest dimension and is partitioned into 39446 small patches.

The result of the GPU-based combination method is compared with the experimental result; the experiment was carried out at 3 GHz for VV polarization. Note that the VV polarization is defined relative to the ground coordinate system, not the satellite coordinate system shown in Fig. 9. During the calculation, the transmitted V-polarized wave is decomposed into V and H components relative to the satellite system, and the scattered field is then transformed back to the ground system. In addition, because the testing orientations θ and ϕ in the satellite system both change during the process, the abscissas in the figures are labeled only with sample numbers. Fig. 10(a) compares the GPU-based combination method result, the experimental result, and the pure PO result (single reflections only); Fig. 10(b) compares the GPU-based combination method result with the experimental result for another test. The GPU-based combination method, which incorporates the higher orders of scattering, performs well, whereas the pure PO method performs rather poorly. Higher-order scattering plays an important role in the RCS of targets whose structures are strongly coupled with each other, and the deviation between the proposed method and the pure PO method is mainly due to these multiple-bounce interactions.
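The change of polarization basis described above can be sketched as follows. The paper does not give the transformation formulas, so this Python example shows only one common construction and rests on several assumptions: the V and H unit vectors of each system are built from that system's vertical axis, the same basis is used on transmit and receive (monostatic case), and the 2 x 2 scattering matrix S_sat is ordered as [[VV, VH], [HV, HH]] in the satellite system. The names pol_basis and ground_vv_from_satellite are hypothetical.

```python
import numpy as np


def pol_basis(r_hat, z_axis):
    """Return unit vectors (v, h) transverse to r_hat for a given vertical axis.

    h is perpendicular to both z_axis and r_hat; v completes the basis.
    The basis is undefined when r_hat is parallel to z_axis.
    """
    h = np.cross(z_axis, r_hat)
    h = h / np.linalg.norm(h)
    v = np.cross(h, r_hat)
    return v, h


def ground_vv_from_satellite(S_sat, r_hat, z_ground_in_sat):
    """Ground-referenced VV response from a satellite-referenced matrix.

    S_sat           : 2x2 scattering matrix [[VV, VH], [HV, HH]] (assumed order)
    r_hat           : unit vector from target to radar, satellite coordinates
    z_ground_in_sat : ground vertical axis expressed in satellite coordinates
    """
    z_sat = np.array([0.0, 0.0, 1.0])
    v_s, h_s = pol_basis(r_hat, z_sat)
    v_g, _ = pol_basis(r_hat, z_ground_in_sat)
    # Decompose the ground V polarization into satellite V and H components.
    w = np.array([v_g @ v_s, v_g @ h_s])
    # Scatter in the satellite basis, then project back onto ground V.
    return w @ (S_sat @ w)


# Example with an arbitrary, made-up scattering matrix.
S_example = np.array([[1.0, 2.0], [3.0, 4.0]])
look = np.array([0.0, 1.0, 0.0])
vv_aligned = ground_vv_from_satellite(S_example, look, np.array([0.0, 0.0, 1.0]))
vv_rotated = ground_vv_from_satellite(S_example, look, np.array([1.0, 0.0, 0.0]))
```

When the two vertical axes coincide, the ground VV response reduces to the satellite VV element; when the ground vertical lies along the satellite X axis and the radar is along Y, ground V coincides with satellite H and the response reduces to the HH element.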

[Fig. 10 plots: RCS (dBsm) versus testing samples of incident direction. (a) Curves: GPU-based combination method, experimental result, pure PO method. (b) Curves: GPU-based combination method, experimental result.]

Fig. 10. The comparison of GPU-based combination method result and the experiment result for the RCS prediction of satellite. (a) and (b) are two different tests.

In addition, Table 3 gives some details and an efficiency comparison between the GPU-based combination method and the conventional combination method for the satellite RCS prediction. Owing to the high computing power of the GPU and the stack-less kd-tree traversal algorithm, the proposed GPU-based combination method is on average about 71.78 times faster per incident angle than the CPU combination method.

Table 3. Details and efficiency comparison in the satellite RCS prediction. GPU-COM and CON-COM represent GPU-based combination method and conventional combination method respectively. Total kd-tree memory includes the information of kd-tree nodes, triangle patches’ IDs in leaf nodes, rope pointers, and bounding boxes of leaf nodes.

Method | Triangle number | Triangle memory (kB) | Leaf number | Total kd-tree memory (kB) | Average time for a test point (s)
GPU-COM | 39446 | 2683.18 | 49732 | 5336.29 | 1.09
CON-COM | 39446 | 2683.18 | ----- | ----- | 78.23
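To show how the rope pointers and leaf bounding boxes counted in Table 3 are used, the following toy Python sketch walks a ray through a three-leaf kd-tree without a stack, in the spirit of the stackless traversal of [20]. It is not the CUDA implementation used in this work: the class layout, the names Inner, Leaf, descend, exit_face and visited_leaves, and the tiny hand-built tree are all assumptions made for illustration, and the triangle intersection test inside each leaf is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Inner:
    axis: int       # split axis: 0 = x, 1 = y, 2 = z
    split: float    # position of the splitting plane
    left: object    # child on the low side of the plane
    right: object   # child on the high side of the plane


@dataclass(eq=False)
class Leaf:
    name: str
    lo: tuple       # bounding-box minimum corner
    hi: tuple       # bounding-box maximum corner
    tri_ids: List[int] = field(default_factory=list)
    # ropes[2 * axis + side]: neighbour through the min (0) or max (1) face
    ropes: List[Optional[object]] = field(default_factory=lambda: [None] * 6)


def descend(node, point):
    """Walk down from node to the leaf that contains point."""
    while isinstance(node, Inner):
        node = node.left if point[node.axis] < node.split else node.right
    return node


def exit_face(leaf, origin, direction):
    """Ray parameter and face index where the ray leaves the leaf box."""
    best_t, best_face = float("inf"), None
    for axis in range(3):
        d = direction[axis]
        if d == 0.0:
            continue
        side = 1 if d > 0.0 else 0
        plane = leaf.hi[axis] if side else leaf.lo[axis]
        t = (plane - origin[axis]) / d
        if t < best_t:
            best_t, best_face = t, 2 * axis + side
    return best_t, best_face


def visited_leaves(root, origin, direction):
    """Leaves crossed by the ray, in order, following ropes instead of a stack."""
    leaf = descend(root, origin)
    order = []
    while leaf is not None:
        order.append(leaf)
        # A real tracer would test leaf.tri_ids here and stop at the first hit.
        t, face = exit_face(leaf, origin, direction)
        if face is None or leaf.ropes[face] is None:
            break  # the ray leaves the scene
        exit_point = tuple(o + t * d for o, d in zip(origin, direction))
        leaf = descend(leaf.ropes[face], exit_point)
    return order


# Toy scene: root splits at x = 1, the right half splits again at y = 0.5.
A = Leaf("A", (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), [0, 1])
B = Leaf("B", (1.0, 0.0, 0.0), (2.0, 0.5, 1.0), [2])
C = Leaf("C", (1.0, 0.5, 0.0), (2.0, 1.0, 1.0), [3])
right = Inner(1, 0.5, B, C)
root = Inner(0, 1.0, A, right)
A.ropes[1] = right               # +x face of A borders both B and C
B.ropes[0], B.ropes[3] = A, C    # -x face to A, +y face to C
C.ropes[0], C.ropes[2] = A, B    # -x face to A, -y face to B

path = [leaf.name for leaf in
        visited_leaves(root, (0.25, 0.25, 0.5), (1.0, 0.25, 0.0))]
```

In this toy scene the ray starts in leaf A, crosses into B through the rope on the +x face of A, and then reaches C through the rope on the +y face of B, without ever returning to the root.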

IV. CONCLUSION

A combination method of GO and PO, which focuses on the higher-order terms of multiple reflections, is implemented in the CUDA GPU computing environment together with the stack-less kd-tree traversal algorithm. Thanks to the GPU hardware and the stack-less kd-tree traversal algorithm, the efficiency of the combination method is significantly improved, and practical RCS prediction of satellites with complex structures can be performed effectively. Moreover, in the ray tracing part, the rays are traced backwards so that the contribution of no patch is missed or counted more than once. Numerical results show good agreement with the MLFMM and experimental results, and demonstrate that the GPU-based combination method is an effective method for the computation of electrically large complex targets. It is also worth noting that the amount of computation in this method does not grow as the frequency of the incident wave increases, which makes the method well suited to the high-frequency case.

REFERENCES

[1] E. F. Knott, “RCS reduction of dihedral corners,” IEEE Trans. Antennas Propagat., vol. 25, no. 3, pp. 406-409, May 1977.

[2] R. A. Ross, “Application of geometrical diffraction theory to reflex scattering centers,” IEEE International Antennas and Propagation Symposium, Conference Publication 68 C 29 AP, September 9-11, 1968.

[3] W. C. Anderson, “Consequence of nonorthogonality on the scattering properties of dihedral reflectors,” IEEE Trans. Antennas Propagat., vol. 35, no. 10, pp. 1154-1159, Oct. 1987.

[4] P. Sundararajan and M. Y. Niamat, “FPGA implementation of the ray tracing algorithm used in the XPATCH software,” in Proc. IEEE MWSCAS’01, Dayton, OH, vol. 1, pp. 446-449, Aug. 2001.

[5] K. S. Jin, T. I. Suh, S. H. Suk, B. C. Kim, and H. T. Kim, “Fast ray tracing using a space-division algorithm for RCS prediction,” Journal of Electromagnetic Waves and Applications, vol. 20, no. 2, pp. 194-205, 2008.

[6] V. Havran, “Heuristic Ray Shooting Algorithm,” Ph.D. dissertation, Czech Technical Univ., Prague, 2000.

[7] Y. B. Tao, H. Lin, and H. J. Bao, “Kd-tree based fast ray tracing for RCS prediction,” Progress Electromagn. Res. (PIER), vol. 81, pp. 329-341, 2008.

[8] J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, “A survey of general-purpose computation on graphics hardware,” Comput. Graphics Forum, vol. 26, no. 1, pp. 80-113, 2007.

[9] J. M. Rius, M. Ferrando, and L. Jofre, “High frequency RCS of complex radar targets in real time,” IEEE Trans. Antennas Propag., vol. 41, no. 9, pp. 1308–1318, 1993.

[10] J. M. Rius, M. Ferrando, and L. Jofre, “GRECO: Graphical electromagnetic computing for RCS prediction in real time,” IEEE Antennas Propag. Mag., vol. 35, no. 2, pp. 7–17, 1993.

[11] M. J. Inman and A. Z. Elsherbeni, “Programming video cards for computational electromagnetics applications,” IEEE Antennas Propag. Mag., vol. 47, no. 6, pp. 71–78, 2005.

[12] S. X. Peng and Z. P. Nie, “Acceleration of the method of moments calculations by using graphics processing units,” IEEE Trans. Antennas Propag., vol. 56, no. 7, pp. 2130–2133, 2008.

[13] Y. B. Tao, H. Lin, and H. J. Bao, “From CPU to GPU: GPU-based electromagnetic computing (GPUECO),” PIERS 81, pp. 1-19, 2008.

[14] S. H. Zainud-Deen and E. El-Deen, “Electromagnetic Scattering Using GPU-Based Finite Difference Frequency Domain Method,” PIER B, vol. 16, pp. 351-369, 2009.

[15] S. Chen, S. Dong, and X.-L. Wang, “GPU-based accelerated FDTD simulations for double negative (DNG) materials applications,” International Conference on Microwave and Millimeter Wave Technology (ICMMT), pp. 839-841, 2010.

[16] Y. B. Tao, H. Lin, and H. J. Bao, “GPU-based shooting and bouncing ray method for fast RCS prediction,” IEEE Trans. Antennas Propag., vol. 58, no. 2, pp. 494-502, 2010.

[17] W. Gordon, “Far-Field Approximations to the Kirchhoff-Helmholtz Representations of Scattered Fields,” IEEE Trans. Antennas Propag., vol. 23, no. 4, pp. 590-592, July 1975.

[18] J. D. Macdonald and K. S. Booth, “Heuristics for ray tracing using space subdivision,” Proc. Graphics Interface’89, pp. 152–163, Jun. 1989.

[19] M. Pharr and G. Humphreys, Physically Based Rendering: From Theory to Implementation, Morgan Kaufmann, New York, 2004.

[20] S. Popov, J. Günther, H.-P. Seidel, and P. Slusallek, “Stackless kd-tree traversal for high performance GPU ray tracing,” Comput. Graphics Forum, vol. 26, no. 3, pp. 415–424, 2007.

[21] C. A. Balanis, Advanced Engineering Electromagnetics, Wiley, New York, 1989.

[22] M. Harris, S. Sengupta, and J. D. Owens, “Parallel prefix sum (scan) with CUDA,” in GPU Gems 3, H. Nguyen, Ed. Addison Wesley, Aug. 2007.

[23] J. Baldauf, S. W. Lee, L. Lin, S. K. Jeng, S. M. Scarborough, and C. L. Yu, “High frequency scattering from trihedral corner reflectors and other benchmark targets: SBR vs. experiments,” IEEE Trans. Antennas Propag., vol. 39, no. 9, pp. 1345-1351, 1991.

[24] F. Weinmann, “Ray tracing with PO/PTD for RCS modeling of large complex objects,” IEEE Trans. Antennas Propag., vol. 54, no. 6, pp. 1797-1860, 2006.

[25] W. J. Zhao, “Study on Prediction Techniques of Radar Cross Section of Complex Targets,” Ph.D. dissertation, Xidian University, 1999.

[26] W. J. Zhao, L. W. Li, and L. Hu, “Efficient current-based hybrid analysis of wire antennas mounted on a large realistic aircraft,” IEEE Trans. Antennas Propag., vol. 58, no. 8, pp. 2666-2672, 2010.


Peng-Bo Wei was born in Shaanxi, China, in 1988. He received the B.S. degree in applied physics from Xidian University, Xi’an, China, in 2007. He is currently with the School of Science, Xidian University, where he is working toward the Ph.D. degree in radio physics. His research interests include computational electromagnetics, EM theory and its applications, and parallel-computing techniques.

Min Zhang received the B.Sc. degree in physics from Shaanxi Normal University, Xi’an, China, in 1990, the M.Sc. degree in radio physics from Xidian University, Xi’an, in 1998, and the Ph.D. degree in astrometry and celestial mechanics from Shaanxi Astronomical Observatory, Chinese Academy of Sciences, Xi’an, in 2001. From 2001 to 2003, he was a Postdoctoral Research Fellow in electromagnetic field and microwave technique with Xidian University, where he is currently a full Professor with the School of Science. From May 2003 to October 2004, he was a Research Fellow with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore. His current research interests include electromagnetic model design, integral equation techniques, and fast hybrid algorithms for electromagnetic scattering, radiation, and wave propagation.

Wei Niu is currently an Associate Professor with the State Key Laboratory of Astronautic Dynamics. His current research interests include numerical methods in electromagnetic fields and electromagnetic scattering of satellites.

Wang-Qiang Jiang was born in Fujian, China, in 1986. He received the B.S. degree in applied physics from Xidian University, Xi’an, China, in 2009. He is currently with the School of Science, Xidian University, where he is working toward the Ph.D. degree in radio physics. His research interests include computational electromagnetics and theoretical modeling of electromagnetic scattering from objects.