4/18/00 Spring 2000 FFTw workshop 1
AHPCC/NCSA Workshop: Fast Fourier Transform Using FFTw
Guobin Ma (1), Sirpa Saarinen (2), and Paul M. Alsing (1)
(1) AHPCC, (2) NCSA
http://www.ahpcc.unm.edu/Workshop/FFTW
Contents
FFT basics (Paul): what the FFT is and why it is used
FFTw
  Outline of FFTW (Guobin): characteristics, C routines
  Performance and C example codes (Sirpa)
  Fortran wrappers and example codes (Guobin)
Exercises (skipped)
FFT Basics
What is the FFT, and why use it?
by Paul Alsing
Fourier Transform: frequency analysis of time-series data
DFT: Discrete Fourier Transform (N time/frequency points)
FFT: Fast Fourier Transform, an efficient implementation of the DFT costing ~O(N log2 N) operations instead of O(N^2)
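$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt, \qquad h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df$$
$$h_k \equiv h(t_k), \quad t_k = k\,\Delta t, \quad k = 0, 1, \ldots, N-1; \qquad f_n = \frac{n}{N\,\Delta t}, \quad n = -\frac{N}{2}, \ldots, \frac{N}{2}$$
$$H(f_n) \approx \Delta t \sum_{k=0}^{N-1} h_k\, e^{2\pi i f_n t_k} = \Delta t \sum_{k=0}^{N-1} h_k\, e^{2\pi i n k/N} \equiv \Delta t\, H_n$$
$$h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i n k/N}$$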
Aliasing issues:
Let f_c = 1/(2Δt) be the Nyquist frequency. A sine wave of frequency f_c is sampled at exactly 2 points per period: the peak and the trough.
Frequency components with |f| > f_c are falsely folded back (aliased) into the range -f_c < f < f_c.
Fourier Transform: radix 2, Danielson-Lanczos
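$$H_n = \sum_{k=0}^{N-1} e^{2\pi i n k/N} h_k = \sum_{k=0}^{N/2-1} e^{2\pi i n (2k)/N} h_{2k} + \sum_{k=0}^{N/2-1} e^{2\pi i n (2k+1)/N} h_{2k+1}$$
$$= \sum_{k=0}^{N/2-1} e^{2\pi i n k/(N/2)} h_{2k} + W^n \sum_{k=0}^{N/2-1} e^{2\pi i n k/(N/2)} h_{2k+1} = H^e_n + W^n H^o_n, \qquad W \equiv e^{2\pi i/N}$$
where $H^e_n$ is the $n$th component of the FT of length $N/2$ formed from the even components of the original $h_k$'s, and $H^o_n$ is the $n$th component of the FT of length $N/2$ formed from the odd components of the original $h_k$'s. Both are periodic in $n$ with period $N/2$.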
Fourier Transform: radix 2, Danielson-Lanczos (cont.)
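Applying the lemma recursively (shown for N = 8):
$$H_n = H^e_n + W^n H^o_n$$
$$= H^{ee}_n + W^{2n} H^{eo}_n + W^n \left( H^{oe}_n + W^{2n} H^{oo}_n \right)$$
$$= H^{eee}_n + W^{4n} H^{eeo}_n + W^{2n} \left( H^{eoe}_n + W^{4n} H^{eoo}_n \right) + W^n \left[ H^{oee}_n + W^{4n} H^{oeo}_n + W^{2n} \left( H^{ooe}_n + W^{4n} H^{ooo}_n \right) \right]$$
where $H_n$ is of length $N$; $H^e_n, H^o_n$ are of length $N/2$; $H^{ee}_n, H^{eo}_n, H^{oe}_n, H^{oo}_n$ are of length $N/4$; and $H^{eee}_n, \ldots, H^{ooo}_n$ are of length $N/8$. For $N = 8$ this takes $\log_2 8 = 3$ steps.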
Fourier Transform: radix 2, butterfly Cooley-Tukey algorithm
We finally get down to 1-point transforms such as
$$H^{eoeeoeo\cdots oee}_n = h_m \quad \text{for some } m,\ 0 \le m \le N-1 \quad \text{(some input value, independent of } n\text{)}$$
The question is: which value of m corresponds to which pattern of e's and o's?
The answer is: let {e = 0, o = 1}. Reverse the pattern of e's and o's and you will have the value of m in binary.
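The rule can be checked in a few lines of C. A minimal sketch with hypothetical helper names: `pattern_to_index` treats the first letter of an e/o label as the first even/odd split, i.e. the lowest bit of m, which is the same as reversing the pattern and reading it as a binary number.

```c
/* Reverse the lowest 'bits' bits of m (bits = log2 N). */
unsigned bit_reverse(unsigned m, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (m & 1u);   /* shift the lowest bit of m into r */
        m >>= 1;
    }
    return r;
}

/* Index m of the input value h_m equal to the 1-point transform labelled
   by an e/o pattern such as "eeo" (e = 0, o = 1, first letter = lowest bit). */
unsigned pattern_to_index(const char *pattern)
{
    unsigned m = 0;
    for (unsigned b = 0; pattern[b] != '\0'; b++)
        if (pattern[b] == 'o')
            m |= 1u << b;
    return m;
}
```

For N = 8, H^{eeo} equals h_4 (reversed pattern "oee" = binary 100) and H^{oee} equals h_1.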
Bit reversal: the Cooley-Tukey algorithm first rearranges the data in bit-reversed order, then builds up the transform in log2 N passes over the data, about N log2 N operations in total (decimation in time).
[Figure: for N = 8, the input values, labelled by their three-letter e/o patterns (eee, eeo, eoe, ...), rearranged into bit-reversed order]
Ordering of the time series (coordinate space) and of the frequencies in Fourier (momentum) space.
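In the transformed array, element n of an N-point output holds the frequency index n for n ≤ N/2 and n - N above that, giving the ordering (0, 1, ..., N/2-1, N/2, -N/2+1, ..., -1); the physical frequency is f = k/(NΔt). The element at N/2 is the Nyquist term and can be read as either +N/2 or -N/2. A small sketch of the mapping, with hypothetical helper names:

```c
/* Signed frequency index k stored at array position idx of an N-point
   transform, for the ordering (0, 1, ..., N/2, -N/2+1, ..., -1). */
long index_to_k(long idx, long N)
{
    return (idx <= N / 2) ? idx : idx - N;
}

/* Inverse mapping: array position of the signed frequency index k. */
long k_to_index(long k, long N)
{
    return (k >= 0) ? k : k + N;
}
```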
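Example Application: Quantum Mechanics
Propagation of the (dimensionless) Schrödinger wave function:
$$i\,\frac{\partial \psi(x,t)}{\partial t} = H\,\psi(x,t), \qquad H = T + V = -\frac{1}{2}\frac{\partial^2}{\partial x^2} + V(x)$$
$$\psi(x, t+\Delta t) = e^{-iH\Delta t}\,\psi(x,t) \approx e^{-iT\Delta t/2}\, e^{-iV\Delta t}\, e^{-iT\Delta t/2}\,\psi(x,t)$$
In coordinate space, V is diagonal:
$$e^{-iV\Delta t}\,\psi(x,t) = e^{-iV(x)\Delta t}\,\psi(x,t)$$
In Fourier (momentum) space, T is diagonal:
$$e^{-iT\Delta t/2}\,\hat\psi(k,t) = e^{-ik^2\Delta t/4}\,\hat\psi(k,t)$$
Over N steps the adjacent kinetic half steps combine:
$$\psi(x, t+N\Delta t) \approx e^{-iT\Delta t/2} \left( e^{-iV\Delta t}\, e^{-iT\Delta t} \right)^{N-1} e^{-iV\Delta t}\, e^{-iT\Delta t/2}\,\psi(x,t)$$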
[Figure: an (x, y) data array transposed; in the parallel case the array is divided among processors P0-P3 before and after the transpose]
Serial FFTs: x data are contiguous in memory (Fortran). Transpose the data to keep the y transforms contiguous in memory as well.
Parallel FFTs: all x transforms are local operations on each processor (no communication). In performing the transpose, the processors must perform an all-to-all communication.
Outline of FFTw
By Guobin Ma
Characteristics of FFTw
C routines generated automatically by a code generator written in Caml-Light (a dialect of ML)
1D/nD transforms, real/complex data
Arbitrary input size, not necessarily 2^n
Serial/parallel versions, for shared and distributed memory
High performance (faster than most other FFT codes)
Portable; automatically adapts to the machine
Two Phases of FFTw
Hardware-dependent algorithm
Planner: "learns" the fastest way to compute the transform on your machine and produces a data structure, the "plan", which is reusable
Executor: computes the transform using the plan
Both phases apply to all FFTw operation modes: 1D/nD, complex/real, serial/parallel
C Routines of FFTw
Routines: 1D/nD complex, 1D/nD real, and the corresponding parallel (MPI) ones
Arguments, special notes, data formats
1D Complex Transform: typical call
#include <fftw.h>
...
{
    fftw_complex in[N], out[N];   /* N: transform size */
    fftw_plan p;
    ...
    p = fftw_create_plan(N, FFTW_FORWARD, FFTW_ESTIMATE);
    ...
    fftw_one(p, in, out);
    ...
    fftw_destroy_plan(p);
}
1D Complex Transform (cont.) Routines
fftw_plan fftw_create_plan(int n, fftw_direction dir, int flags);
void fftw_one(fftw_plan plan, fftw_complex *in, fftw_complex *out);
fftw_plan fftw_create_plan_specific(int n, fftw_direction dir, int flags,
fftw_complex *in, int istride,fftw_complex *out, int ostride);
1D Complex Transform (cont.) Routines (cont.)
void fftw(fftw_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_complex *out, int ostride, int odist);
void fftw_destroy_plan(fftw_plan plan);
1D Complex Transform (cont.) Arguments
plan: data structure containing all the information needed to compute the transform
n: transform size
dir: FFTW_FORWARD (-1) or FFTW_BACKWARD (+1), the sign of the exponent
flags: FFTW_MEASURE or FFTW_ESTIMATE, FFTW_OUT_OF_PLACE or FFTW_IN_PLACE, FFTW_USE_WISDOM; combined with |
howmany: number of transforms (input arrays)
in, istride, idist: input arrays; element i of transform j is in[i*istride + j*idist]
out, ostride, odist: output arrays, laid out the same way (out[i*ostride + j*odist])
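The stride/distance convention covers both common batch layouts: transforms stored one after another (istride = 1, idist = n) and interleaved transforms (istride = howmany, idist = 1). A sketch with hypothetical helper names, using plain `double` in place of `fftw_complex`:

```c
#include <stddef.h>

/* Offset of element i of transform j in a batch described by
   (stride, dist), i.e. i*stride + j*dist. */
size_t batch_offset(size_t i, size_t j, size_t stride, size_t dist)
{
    return i * stride + j * dist;
}

/* Copy the n elements of transform j out of a strided batch. */
void gather_transform(const double *batch, double *dst, size_t n,
                      size_t j, size_t stride, size_t dist)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = batch[batch_offset(i, j, stride, dist)];
}
```

For two length-4 transforms stored in 8 elements, element 2 of transform 1 sits at offset 6 in the back-to-back layout and at offset 5 in the interleaved layout.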
1D Complex Transform (cont.) Notes
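Out of place (default): separate arrays in[N] and out[N]
In place: saves memory but costs more time; out, ostride, and odist are ignored
Output is in order: zero frequency at out[0]
Transforms are unnormalized: a forward transform followed by a backward one multiplies the data by N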
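nD Complex Transform: routines similar to the 1D case, except …
fftwnd_plan fftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
void fftwnd_one(fftwnd_plan plan, fftw_complex *in, fftw_complex *out);
fftwnd_plan fftwnd_create_plan_specific(int rank, const int *n, fftw_direction dir, int flags, fftw_complex *in, int istride, fftw_complex *out, int ostride);
void fftwnd(fftwnd_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_complex *out, int ostride, int odist);
void fftwnd_destroy_plan(fftwnd_plan plan);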
nD Complex Transform (cont.) Arguments
rank: dimensionality of the arrays to be transformed
n: pointer to an array of length rank giving the size of each dimension, e.g. n = {8, 4, 5}
Arrays are row-major in C and column-major in Fortran
Special routines for the 2D and 3D cases:
  nd -> 2d, 3d (e.g. fftw2d_create_plan, fftw3d_create_plan)
  n_dim -> nx, ny or nx, ny, nz
1D Real Transform: routines similar to the 1D complex case, except …
rfftw_plan rfftw_create_plan(int n, fftw_direction dir, int flags);
void rfftw_one(rfftw_plan plan, fftw_real *in, fftw_real *out);
rfftw_plan rfftw_create_plan_specific(int n, fftw_direction dir, int flags, fftw_real *in, int istride, fftw_real *out, int ostride);
void rfftw(rfftw_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_real *out, int ostride, int odist);
void rfftw_destroy_plan(rfftw_plan plan);
1D Real Transform (cont.) Arguments
dir: FFTW_REAL_TO_COMPLEX = FFTW_FORWARD = -1; FFTW_COMPLEX_TO_REAL = FFTW_BACKWARD = +1
Other arguments have the same meaning as before.
nD Real Transform: routines similar to the 1D real case, but …
rfftwnd_plan rfftwnd_create_plan(int rank, const int *n, fftw_direction dir, int flags);
void rfftwnd_one_real_to_complex(rfftwnd_plan plan, fftw_real *in, fftw_complex *out);
void rfftwnd_one_complex_to_real(rfftwnd_plan plan, fftw_complex *in, fftw_real *out);
void rfftwnd_real_to_complex(rfftwnd_plan plan, int howmany, fftw_real *in, int istride, int idist, fftw_complex *out, int ostride, int odist);
nD Real Transform (cont.) Routines (cont.)
void rfftwnd_complex_to_real(rfftwnd_plan plan, int howmany, fftw_complex *in, int istride, int idist, fftw_real *out, int ostride, int odist);
void rfftwnd_destroy_plan(rfftwnd_plan plan);
Special 2D and 3D routines
nD Array Format
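nD arrays are stored as a single contiguous block.
C order (row-major): the first index varies most slowly, the last most quickly.
Fortran order (column-major): the first index varies most quickly, the last most slowly.
Static arrays: no problem. Dynamic arrays: may cause problems in the nD case, e.g. when allocated as an array of pointers to separately allocated rows instead of as one contiguous block.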
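The two orders differ only in which index is folded in first when computing the offset into the single block. A sketch with hypothetical helper names, using the n = {8, 4, 5} array from the arguments slide; allocating one block of n[0]·n[1]·n[2] elements and indexing it this way sidesteps the dynamic-array problem.

```c
#include <stddef.h>

/* Offset of element (i[0], ..., i[rank-1]) in a contiguous array with
   dimensions n[0..rank-1], C (row-major) order: last index fastest. */
size_t offset_row_major(size_t rank, const size_t *n, const size_t *i)
{
    size_t off = 0;
    for (size_t d = 0; d < rank; d++)
        off = off * n[d] + i[d];
    return off;
}

/* Same, Fortran (column-major) order: first index fastest. */
size_t offset_col_major(size_t rank, const size_t *n, const size_t *i)
{
    size_t off = 0;
    for (size_t d = rank; d-- > 0; )
        off = off * n[d] + i[d];
    return off;
}
```

For element (1, 2, 3) of the 8 x 4 x 5 array, the row-major offset is (1·4 + 2)·5 + 3 = 33 and the column-major offset is 1 + 8·(2 + 4·3) = 113.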
Parallel FFTw
Multi-threaded: skipped
MPI:
  nD complex: routines, notes, data layout
  1D complex
  nD real
nD Complex MPI FFTw: routines similar to the uniprocessor case, except with mpi in the names…
fftwnd_mpi_plan fftwnd_mpi_create_plan(MPI_Comm comm, int rank, const int *n, fftw_direction dir, int flags);
void fftwnd_mpi_local_sizes(fftwnd_mpi_plan p, int *local_first, int *local_first_start, int *local_second_after_transpose, int *local_second_start_after_transpose, int *total_local_size);
local_data = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
work = (fftw_complex*) malloc(sizeof(fftw_complex) * total_local_size);
nD Complex MPI FFTw (cont.)
Routines (cont.)
void fftwnd_mpi(fftwnd_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work, fftw_mpi_output_order output_order);
void fftwnd_mpi_destroy_plan(fftwnd_mpi_plan p);
nD Complex MPI FFTw (cont.) Notes
First argument: comm, the MPI communicator
Data layout: see the next slide
All fftw_mpi transforms are in place
work: optional array of the same size as local_data; gives greater efficiency at the cost of extra storage
output_order: normal or transposed (FFTW_NORMAL_ORDER, FFTW_TRANSPOSED_ORDER); transposed order improves performance, but the data must be reshaped manually, which can be error-prone
nD Complex MPI FFTw (cont.) Data layout
Distributed data: divided by rows (1st dimension) in C, by columns (last dimension) in Fortran
Given the plan, all other data-layout parameters are returned by fftwnd_mpi_local_sizes; the local storage needed is about (n1/np)·n2·…·nk·n_fields elements
With transposed order, n2 becomes the 1st dimension of the output, so the inverse transform is planned with n[n2, n1, n3, ..., nk]
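To see where the local sizes come from, here is a hypothetical slab decomposition of the first dimension over np processes, in which the first n0 mod np processes get one extra plane. This is an assumption for illustration, not FFTW's exact distribution rule; real code should always use the values returned by fftwnd_mpi_local_sizes. Dimensions are 0-based here, so n[0] plays the role of n1.

```c
#include <stddef.h>

/* Hypothetical slab split of n0 planes over np processes: process 'proc'
   owns planes local_start .. local_start + local_n0 - 1. */
void slab_split(size_t n0, size_t np, size_t proc,
                size_t *local_n0, size_t *local_start)
{
    size_t base = n0 / np, extra = n0 % np;
    *local_n0    = base + (proc < extra ? 1 : 0);
    *local_start = proc * base + (proc < extra ? proc : extra);
}

/* Elements held locally: local_n0 * n[1] * ... * n[dims-1] * n_fields. */
size_t slab_local_size(size_t local_n0, size_t dims, const size_t *n, size_t n_fields)
{
    size_t total = local_n0 * n_fields;
    for (size_t d = 1; d < dims; d++)
        total *= n[d];
    return total;
}
```

With n0 = 10 planes on 4 processes, the split is 3, 3, 2, 2 planes starting at 0, 3, 6, 8.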
1D Complex MPI FFTw: routines similar to the nD case, except without nd…
fftw_mpi_plan fftw_mpi_create_plan(MPI_Comm comm, int n, fftw_direction dir, int flags);
void fftw_mpi_local_sizes(fftw_mpi_plan p, int *local_n, int *local_n_start, int *local_n_after_transpose, int *local_start_after_transpose, int *total_local_size);
1D Complex MPI FFTw (cont.) Routines (cont.)
void fftw_mpi(fftw_mpi_plan p, int n_fields, fftw_complex *local_data, fftw_complex *work);
void fftw_mpi_destroy_plan(fftw_mpi_plan p);
Generally gives worse speedup than the nD transforms; suited to very large sizes
nD Real MPI FFTw
Similar to the uniprocessor and complex MPI cases
Speedup of about 2 and half the storage, at the expense of a more complicated data format
Can produce transposed-order output data
No 1D real MPI FFTw
Break
FFTw Performance
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
C Example Codes
By Sirpa Saarinen
http://www.ncsa.uiuc.edu/MEDIA/agppt/myFFTW2.ppt
FFTW Fortran Wrappers and Example Codes
By Guobin Ma
FFTw Fortran-Callable Wrappers
Routine names: insert _f77 into the C routine names
  fftw / fftwnd / rfftw / rfftwnd -> fftw_f77 / fftwnd_f77 / rfftw_f77 / rfftwnd_f77
  fftw_mpi / fftwnd_mpi -> fftw_f77_mpi / fftwnd_f77_mpi
e.g. plan = fftwnd_create_plan(3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE | FFTW_IN_PLACE)
  -> call fftwnd_f77_create_plan(plan, 3, n_dim, FFTW_FORWARD, FFTW_ESTIMATE + FFTW_IN_PLACE)
FFTw Fortran-Callable Wrappers: Notes
Any function that returns a value is converted into a subroutine with an additional (first) parameter holding the result.
There is no NULL in Fortran: you must allocate and pass an array for out.
nD arrays are column-major (Fortran order).
Plan variables must be declared as integer.
Constants FFTW_FORWARD, FFTW_BACKWARD, FFTW_IN_PLACE, … are combined with '+' instead of '|'; they are defined in the files fortran/fftw_f77.i, fftw_f90.i, and fftw_f90_mpi.i.
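'+' can stand in for '|' only because each flag is a distinct power of two (or zero), so adding distinct flags sets the same bits as OR-ing them. A C sketch with made-up flag values (not FFTW's constants) that also shows the pitfall: adding the same flag twice carries into a different bit.

```c
/* Made-up single-bit flag values, for illustration only. */
enum {
    DEMO_ESTIMATE   = 0,
    DEMO_MEASURE    = 1,
    DEMO_IN_PLACE   = 8,
    DEMO_USE_WISDOM = 16
};

/* C style: combine options with bitwise OR. */
int combine_or(int a, int b) { return a | b; }

/* Fortran style: combine options with '+'; equals a | b only when
   a and b have no set bits in common. */
int combine_add(int a, int b) { return a + b; }
```

Here DEMO_IN_PLACE + DEMO_IN_PLACE equals DEMO_USE_WISDOM, whereas DEMO_IN_PLACE | DEMO_IN_PLACE is still DEMO_IN_PLACE, so each flag should appear only once in a '+' expression.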
Fortran Examples
Source code at AHPCC (tested on Turing, BB, and SGI):
~gbma/workshop/fftw/codes or http://www.arc.unm.edu/~gbma/Workshop/FFTW/codes
Complex data:
  1D serial: fftw_1d.f90
  1D parallel: fftw_1d_p.f90
  nD serial: fftw_3d.f90
  nD parallel, normal order: fftw_3d_p_n.f90
  nD parallel, transposed order: fftw_3d_p_t.f90
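Fortran Examples (cont.)
1D case
Input: $f(x) = \int_{-N/2}^{N/2} e^{ikx}\, dk$
Forward output: $F(k)$, with $k = (0, 1, 2, \ldots, N/2-1, N/2, -N/2+1, \ldots, -1)$
Inverse output: $f(x)$
nD case
Input: $f(x,y,z) = \iiint e^{i(k_x x + k_y y + k_z z)}\, dk_x\, dk_y\, dk_z$
Forward output: $F(k_x, k_y, k_z)$, with each of $k_x, k_y, k_z$ ordered as $(0, 1, 2, \ldots, -1)$
Inverse output: $f(x,y,z)$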
1D Serial Fortran Example: FFTw code
...
call fftw_f77_create_plan(plan_forward,N, &
FFTW_FORWARD, FFTW_ESTIMATE)
call fftw_f77_create_plan(plan_reverse,N, &
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_one(plan_forward,in,out)
...
call fftw_f77_one(plan_reverse,out,in)
...
call fftw_f77_destroy_plan(plan_forward)
call fftw_f77_destroy_plan(plan_reverse)
1D Parallel Fortran Example: FFTw code
...
call fftw_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,N, &
FFTW_FORWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi_local_sizes(p_fwd, local_n, local_start, &
local_n_after_trans, local_start_after_trans, total_local_size)
...
allocate( psi_local(0:total_local_size-1) )
...
allocate( work(0:total_local_size-1) )
1D Parallel Fortran Example (cont.): FFTw code (cont.)
...
call fftw_f77_mpi(p_fwd,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_fwd)
...
call fftw_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD,N, &
FFTW_BACKWARD,FFTW_ESTIMATE)
...
call fftw_f77_mpi(p_rvs,1,psi_local,work,USE_WORK)
...
call fftw_f77_mpi_destroy_plan(p_rvs)
nD Serial Fortran Example: FFTw code
call fftwnd_f77_create_plan(p_fwd,nd,n_dim, &
FFTW_FORWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_fwd,psi,0)
call fftwnd_f77_destroy_plan(p_fwd)
call fftwnd_f77_create_plan(p_rvs,nd,n_dim, &
FFTW_BACKWARD,FFTW_ESTIMATE + FFTW_IN_PLACE)
call fftwnd_f77_one(p_rvs,psi,0)
call fftwnd_f77_destroy_plan(p_rvs)
nD Parallel Fortran Example: FFTw code, normal order, nD local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
local_last_start, local_nlast2_after_trans, &
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:nx-1,0:ny-1,0:local_nlast-1) )
allocate( work(0:nx-1,0:ny-1,0:local_nlast-1) )
nD Parallel Fortran Example (cont.): FFTw code, normal order, nD local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
nD Parallel Fortran Example (cont.): FFTw code, transposed order, 1D local array
n_dim(1)=nx; n_dim(2)=ny; n_dim(3)=nz
call fftwnd_f77_mpi_create_plan(p_fwd,MPI_COMM_WORLD,&
nd,n_dim,FFTW_FORWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi_local_sizes(p_fwd, local_nlast, &
local_last_start, local_nlast2_after_trans, &
local_last2_start_after_trans, total_local_size)
allocate( psi_local(0:total_local_size-1) )
allocate( work(0:total_local_size-1) )
nD Parallel Fortran Example (cont.): FFTw code, transposed order, 1D local array (cont.)
call fftwnd_f77_mpi(p_fwd,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_fwd)
n_dim(1)=nx; n_dim(2)=nz; n_dim(3)=ny
call fftwnd_f77_mpi_create_plan(p_rvs,MPI_COMM_WORLD, &
nd,n_dim,FFTW_BACKWARD,FFTW_ESTIMATE)
call fftwnd_f77_mpi(p_rvs,1,psi_local,work,USE_WORK,order)
call fftwnd_f77_mpi_destroy_plan(p_rvs)
nD Parallel Fortran Example (cont.) Notes
Normal order: easy to code, 'low' performance
Transposed order: 'high' performance, complicated to code, the user must reorder the data
USE_WORK: high efficiency, but more memory
Run the Examples at AHPCC
Copy the files to your directory:
  cp ~gbma/workshop/fftw/codes/*.* .
Compile:
  make filename.tur
  make filename.bb
  make filename.sgi
  linking with -lfftw, plus -lfftw_mpi for the MPI codes only
Run:
  BB: qsub -I -l nodes=2
      mpirun -np 2 -machinefile $PBS_NODEFILE filename.bb
  Turing: filename.tur
  SGI: mpirun -np 2 filename.sgi
Acknowledgements
Brian Baltz: installation of FFTw at AHPCC; running MPI at AHPCC
John Greenfield: setting up the grid access
Andrew Pineda: computer work environment at AHPCC
Brian Smith & Susan Atlas: many stimulating discussions
Many others ...