Using Sun Performance Library Fast Fourier Transform Routines

link管理

链接快照平台

输入网页链接，自动生成快照
标签化管理网页链接

相关文章推荐

独立的冰棍 · CSS 动画与 JavaScript 动画 ...· 4 周前 ·

卖萌的充电器 · 3.3 数据读入 — 深入浅出PyTorch· 1 周前 ·

爱健身的木瓜 · main@rasterio.groups.i ...· 1 周前 ·

玩滑板的黄花菜 · Transform Logits to ...· 1 周前 ·

气宇轩昂的香瓜 · Copy and transform ...· 4 天前 ·

无聊的猕猴桃 · Java面试考点4之数据结构-腾讯云开发者社 ...· 3 月前 ·

无邪的灌汤包 · WildFly 24· 6 月前 ·

从容的金针菇 · 嵌入式IDE(1)：IAR中ICF链接文件详 ...· 6 月前 ·

谦逊的书包 · Spring ...· 6 月前 ·

稳重的皮带 · QObject::connect: ...· 7 月前 ·

Many problems involve computing the discrete Fourier transform (DFT) of a periodic sequence of length N , where N is the number of data points or samples. The number of calculations required to compute the DFT is proportional to N ² . The fast Fourier transform (FFT) was developed to efficiently compute the DFT, where the number of calculations required to compute the FFT is proportional to N log ₂ N . Sun Performance Library ^TM provides routines for computing the FFT or inverse transform (synthesis) of a sequence of length N . The FFT routines are based on FFTPACK and VFFTPACK, which are collections of public domain subroutines available from Netlib (


    http://www.netlib.org

). These routines have been enhanced and optimized for SPARC ^TM platforms, and then bundled with the Sun Performance Library. The Sun Performance Library also includes two-dimensional FFT routines, three-dimensional FFT routines, and convolution and correlation routines. FFTPACK routines operate on a single one-dimensional sequence of length N . After storing the sequence as a vector in an array, the fast sine, fast cosine, fast Fourier transform, or inverse transform of the sequence is computed. To process an additional sequence, the new sequence must be stored in the array before computing the FFT or inverse transform. VFFTPACK routines store the multiple one-dimensional sequences in a two-dimensional array, but the routines compute only a one-dimensional Fourier transform of each sequence. The two-dimensional and three-dimensional FFT routines provided with Sun Performance Library differ from the VFFTPACK routines. The two-dimensional FFT routines perform a two-dimensional Fourier transform of a sequence stored in a two-dimensional array, and the three-dimensional FFT routines perform a three-dimensional transform of a sequence stored in a three-dimensional array. TABLE 1 summarizes some of the similarities and differences between the single vector FFTPACK, multiple vector VFFTPACK routines, two-dimensional FFT routines, and three-dimensional FFT routines.

TABLE 1 Comparison Between Single Vector and Multiple Vector Routines

Double precision and double complex transforms. Because routines that process double precision and double complex data are not available in the standard package from Netlib, calls to these routines might not be portable.

Two-dimensional and three-dimensional FFTs. Netlib FFTPACK and VFFTPACK routines support one-dimensional FFTs.

Convolution and correlation routines.

Fortran 95 and C interfaces to FFTPACK and VFFTPACK. Conventions for these interfaces are described in detail in the Sun Performance Library User's Guide .

Optimizations for specific SPARC instruction set architectures.

Support for a 64-bit enabled Solaris ^TM Operating Environment.

Support for parallel processing compiler options.

Support for multiple processor hardware options.

The Discrete Fourier Transform (DFT) The following definition of the DFT is used when calculating the complex discrete Fourier transform of a periodic sequence, where

When computing the DFT of a real sequence, the resulting array of Fourier coefficients is conjugate symmetric, where

for k > 1, when using a one-based notation, or

for k > 0, when using a zero-based notation. The asterisk * denotes complex conjugation, where

. The number of calculations required to compute the DFT is reduced by taking advantage of this symmetry. When computing the transform of a real sequence, the complex discrete Fourier transform can be rewritten in the real trigonometric form shown in TABLE 2 . In TABLE 2 ,

equals the real part of

equals the imaginary part of

, and

equals the real part of

TABLE 2 Formulas for Real FFT Routines The FFT routines can be used to compute the discrete Fourier cosine transform, discrete Fourier sine transform, and inverse transforms of the functions listed in TABLE 3 .

TABLE 3 Symmetries Supported by FFT and VFFT Routines Symmetry Definition Trigonometric Expansion Cosine Even-Wave An even function f ( t ) that satisfies the condition f (- t ) = f ( t ). Trigonometric series containing only cosine terms. Cosine Quarter-Wave A even function with half-wave symmetry , where T is the period of the function. Trigonometric series containing only cosine terms with odd wave numbers. Sine Odd-Wave An odd function f ( t ) that satisfies the condition f (- t ) =- f ( t ). Trigonometric series containing only sine terms. Sine Quarter-Wave A odd function with half-wave symmetry Trigonometric series containing only sine terms with odd wave numbers. Prefixes used with FFT and VFFT base names are shown in TABLE 5 .

TABLE 5 Prefix and Operand Data Types Prefix Operand Data Type FFT Routines No prefix REAL , REAL*4 , REAL(4)

I: Initialize the Fourier transform or inverse Fourier transform routine

F: Compute the forward transform (the Fourier transform)

B: Compute the backward transform (the inverse Fourier transform or synthesis)

TABLE 6 FFT and VFFT Base Names Base Name Operation COSQB Inverse cosine quarter-wave transform (synthesis) COSQF Cosine quarter-wave transform COSQI Initialize cosine quarter-wave transform or inverse transform Cosine even-wave transform COSTI Initialize cosine even-wave transform EZFFTB Inverse EZ transform (synthesis) EZFFTF EZ transform EZFFTI Initialize EZ transform Inverse transform (synthesis) Forward transform Initialize before computing a transform or inverse transform SINQB Inverse sine quarter-wave transform (synthesis) SINQF Sine quarter-wave transform SINQI Initialize sine quarter-wave transform or inverse transform Sine odd-wave transform SINTI Initialize sine odd-wave transform

The prefix x is added to the base name when the information applies to


           REAL


           DOUBLE


           COMPLEX

, and


           DOUBLE


           COMPLEX

versions of that routine.

Specific prefixes are listed in square brackets [ ] before the base name when information does not apply to all versions of the routine.

The following example shows samples of these naming conventions.

Convention Routines x FFTF RFFTF , DFFTF , CFFTF , and ZFFTF [R,D]FFTI RFFTI or DFFTI [C,Z]FFTF CFFTF or ZFFTF V[R,D,C,Z]FFTF VRFFTF , VDFFTF , VCFFTF , or VZFFTF Sun Performance Library contains the routines shown in TABLE 8 . The data type of the arguments follows the conventions shown in TABLE 7 .

TABLE 7 Argument Data Types Argument Data Type AZERO , A , B , R (EZFFT routines) FULL , PLACE , ROWCOL Character N , M , K , LDA , LD2A , LDB , LWORK , MDIMX Integer A , B , X , XT Same as data type of routine called WSAVE , WORK See TABLE 11 In addition to the FFT and VFFT routines listed in TABLE 8 , the following routines are described in this manual.

TABLE 9 Convolution and Correlation Routines Routine Arguments Function SCNVCOR , DCNVCOR , CCNVCOR , ZCNVCOR CNVCOR,FOUR,NX,X,IFX, INCX,NY,NPRE,M,Y,IFY, INC1Y,INC2Y,NZ,K,Z, IFZ,INC1Z,INC2Z,WORK, LWORK Convolution or correlation of two vectors SCNVCOR2 , DCNVCOR2 , CCNVCOR2 , ZCNVCOR2 CNVCOR,METHOD,TRANSX, SCRATCHX,TRANSY, SCRATCHY,MX,NX,X,LDX, MY,NY,MPRE,NPRE,Y,LDY, MZ,NZ,Z,LDZ,WORKIN, LWORK Convolution or correlation of two matrices

All arguments are passed by reference.

The number of arguments to a routine is fixed.

Types of arguments must match.

Arrays are stored columnwise.

Indices are based at one, following standard Fortran practice.

C Interface Conventions

Input-only scalars are passed by value rather than by reference. Complex and double complex arguments are not considered scalars because they are not implemented as a scalar type by C.

Complex scalars can be passed as either structures or arrays of length 2.

Types of arguments must match even after C does type conversion. For example, be careful when passing a single precision real value, because a C compiler can automatically promote the argument to double precision.

Arrays are stored columnwise. For Fortran programmers, this is the natural order in which arrays are stored. For C programmers, this is the transpose of the order in which they usually work. References in the documentation and man pages to rows refer to columns and vice versa.

Array indices are based at zero in conformance with C conventions, rather than being based at one in conformance with Fortran conventions.

Using 64-Bit FFT Routines Sun Performance Library provides 32-bit and 64-bit interfaces. To use the 64-bit interfaces:

Modify the Sun Performance Library routine name. For C, FORTRAN 77, and Fortran 95 code, append

_64

to the names of Sun Performance Library routines (for example,


              rfftf_64


              CFFTB_64)

. For Fortran 95 code with the


              USE SUNPERF

statement, routines can be called using the optional interfaces (interfaces where certain arguments can be omitted), but

_64

must still be appended to the Sun Performance Library routine names. The compiler will infer the correct interface and values for the optional arguments, but the compiler cannot determine if the optional arguments are 32-bit integers or 64-bit integers.

Promote integers to 64 bits. Double precision variables and the real and imaginary parts of double complex variables are already 64 bits. Only the integers are promoted to 64 bits.

To control promotion of integer arguments, do one of the following:

To promote all default integers (integers declared without explicit byte sizes) from 32 bits to 64 bits, compile with


              -xtypemap=integer:64

When using Fortran, to promote specific integers, change


              INTEGER


              INTEGER*4

declarations to


              INTEGER*8

Note  When calling a 32-bit interface, such as


             ZFFTF

, from a 64-bit code, Sun Performance Library internally converts the arguments to 64 bits, and then calls the 64-bit interface (


             ZFFTF_64

). Extra overhead is associated with this argument conversion.

If N can be factored into powers of 2, 3, 4, 5, 7, 11, or 13, the transform is computed using the FFT, which is an N log ₂ N operation.

If N is a product of some of these prime factors, along with additional prime factors, the part of N that can be factored into powers of 2, 3, 4, 5, 7, 11, or 13 is computed using the FFT. The transform of the sequence corresponding to the additional prime factors is computed using the DFT, which is an N ² operation.

If N cannot be factored into powers of 2, 3, 4, 5, 7, 11, or 13, the transform of the sequence is computed using the DFT.

For example, the transform of a vector of length N = 1024 = 4 × 4 × 4 × 4 × 4 can be computed using an FFT, because 4 is a factor of 1024. If N = 6080 = 4 × 4 × 4 × 5 × 19, the transform of the vector corresponding to 4 × 4 × 4 × 5 is computed using the FFT, but the part of the vector corresponding to 19 is computed using the DFT. The transform of a vector of length N =1019 is computed using the DFT, because 1019 is not the product of small primes. Computing the Fourier transform is most efficient when

N-1

for fast cosine transforms,

N+1

for fast sine transforms, and

, and

for multi-dimensional FFT routines can be factored into powers of 2, 3, 4, 5, 7, 11, or 13, as summarized in TABLE 10 .

TABLE 10 Values That Must Have 2, 3, 4, 5, 7, 11, or 13 as Factors for Best Performance Routine Values COST , DCOST , VCOST, VDCOST N - 1 SINT , DSINT , VSINT, VDSINT N + 1 All other one-dimensional FFT and VFFT routines Two-dimensional FFT routines M and N Three-dimensional FFT routines M , N , and K The function x FFTOPT can be used to determine the optimal sequence length, as shown in CODE EXAMPLE 1 .


              COSQI, DCOSQI


              3N + 15


              REAL, REAL*8


              COSQB, DCOSQB


              3N + 15


              REAL, REAL*8


              COSQF, DCOSQF


              3N + 15


              REAL, REAL*8


              COST, DCOST


              3N + 15


              REAL, REAL*8


              COSTI, DCOSTI


              3N + 15


              REAL, REAL*8


              EZFFTB


              3N + 15


              REAL, REAL*8


              EZFFTF


              3N + 15


              REAL, REAL*8


              EZFFTI


              3N + 15


              REAL, REAL*8


              RFFTB, DFFTB


              2N + 15


              REAL, REAL*8


              RFFTF, DFFTF


              2N + 15


              REAL, REAL*8


              RFFTI, DFFTI


              2N + 15


              REAL, REAL*8


              CFFTB, ZFFTB


              4N + 15


              REAL, REAL*8


              CFFTF, ZFFTF


              4N + 15


              REAL, REAL*8


              CFFTI, ZFFTI


              4N + 15


              REAL, REAL*8


              SINQB, DSINQB


              3N + 15


              REAL, REAL*8


              SINQF, DSINQF


              3N + 15


              REAL, REAL*8


              SINQI, DSINQI


              3N + 15


              REAL, REAL*8


              SINT, DSINT


              2N + N/2 + 15


              REAL, REAL*8


              SINTI, DSINTI


              2N + N/2 + 15


              REAL, REAL*8


              VRFFTB, VDFFTB


              N + 15


              REAL, REAL*8


              VRFFTF, VDFFTF


              N + 15


              REAL, REAL*8


              VRFFTI, VDFFTI


              N + 15


              REAL, REAL*8

FFT and VFFT routines use the arguments shown in TABLE 12 . Some routines use additional arguments that are described in the sections for those routines.

FFT Routines The results of a complex one-dimensional FFT are stored in-place (in the original input array). Storage problems do not occur when performing the Fourier transform of a complex sequence, because the number of calculated Fourier coefficients equals the number of input values. The real and imaginary values of the Fourier coefficients can be stored in the original complex array without additional storage manipulations. Computing the Fourier transform of a real sequence produces complex Fourier coefficients. The number of computed Fourier coefficients is twice the number of values in the original sequence, because of the real and imaginary parts of the complex Fourier coefficients. The complex vector must be packed before it can be stored in the original real array. This packing is done by not storing the imaginary parts of the one or two Fourier coefficients that are always 0, and by not storing the complex conjugates of the Fourier coefficients.

If N is even:

The real part of X ₀ is stored.
The imaginary part of X ₀ is equal to 0; this part is not stored.
The real and imaginary parts of X ₁ , up to and including the real part of X ₍ _N _/2) , are stored sequentially.
The imaginary part of X ₍ _N _/2) is equal to 0; this part is not stored.
X ₍ _N _- _k ₎ is the complex conjugate of X _k , for k = 1 : N /2 - 1 and is not stored.
If N is odd,

The real part of X ₀ is stored.
The imaginary part of X ₀ is equal to 0 and is not stored.
The real and imaginary parts of X ₁ , up to and including the imaginary part of X ₍₍ _N _-1)/2) , are stored sequentially.
X ₍ _N _- _k ₎ is the complex conjugate of X _k , for k = 1 : (( N -1)/2) - 1 and is not stored. For example, if N = 6, the input array X contains the following six real data points: CODE EXAMPLE 2 computes the FFT and inverse of a real or complex sequence for even and odd values of N . The transform of the complex sequence shows all the Fourier coefficients in an unpacked, complex array. The transform of the real sequence shows the Fourier coefficients stored in a packed, real array. Differences between the real arrays for even and odd values of N can also be compared.

**CODE EXAMPLE 1** `RFFTOPT` Example
Arguments	Description
1. Specify the minimum dimension and data type of the work array `WSAVE` . The minimum dimension and data type depends upon the operand data type and FFT or VFFT routine, as shown in TABLE 11 . 2. Initialize the work array by calling the corresponding FFT or VFFT routine whose base name ends with the character I. For example, when using `RFFTF` or `RFFTB` , initialize the work array by calling `RFFTI` . The same work array can be used for both the transform or inverse transform as long as `N` remains unchanged. Different `WSAVE` arrays are required for different values of `N` . As long as `N` and `WSAVE` remain unchanged, subsequent transforms can be obtained faster than the first transform, because the initialization does not have to be repeated between calls to the transform or inverse transform routines.
Routine	Minimum Work Array Size ( `WSAVE` )	One-Dimensional Routines			VFFT Routines			`VCFFTB, VZFFTB`	If transforming rows: `2 * M + 15` If transforming columns: `2 * N + 15`	`REAL, REAL*8`

CODE EXAMPLE 3 shows a C example that uses


                  dfftf

to compute the Fourier coefficients of a real sequence.

In addition to the VFFT arguments defined in Arguments for One-Dimensional FFT and VFFT Routines , the


                       VCFFTF


                       VZFFTF


                       VCFFTB

, and


                       VZFFTB

routines use one additional argument called


                       ROWCOL.


                       ROWCOL

specifies whether to transform the rows or columns of


                       X(M,N)

. Set


                       ROWCOL

equal to

`R'

`r'

perform the transform or inverse transform on the rows of


                       X(M,N)

. Set


                       ROWCOL

equal to

`C'

`c'

perform the transform or inverse transform on the columns of


                       X(M,N)

. CODE EXAMPLE 4 uses


                       RFFTF

to compute the FFT of a real sequence and


                       RFFTB

to compute the inverse transform. The computed Fourier coefficients are packed and stored in the original real array. The inverse transform is unnormalized and can be normalized by dividing each value by

CODE EXAMPLE 5 uses


                            CFFTF

to compute the FFT of a complex sequence and


                            CFFTB

to compute the inverse transform. Because the number of calculated Fourier coefficients equals the number of input values, the real and imaginary values of the Fourier coefficients can be stored in the original array without additional storage manipulations. The inverse transform is unnormalized and can be normalized by dividing each value by N .

Sequence length For


                            EZFFTF

, a real array containing the sequence to be transformed, unchanged on exit. For


                            EZFFTB

, a real array containing the Fourier coefficients of the inputs.


                            AZERO

The Fourier constant A ₀ Real array containing the real parts of the complex Fourier coefficients. If

is even, then

is length

/2, otherwise

is length (

-1)/2. Real array containing the imaginary parts of the complex Fourier coefficients. If

is even, then

is length

/2, otherwise

is length (

-1)/2.


                            WSAVE

Work array initialized by


                            EZFFTI

CODE EXAMPLE 6 uses


                             EZFFTF

to compute a Fourier transform of a real sequence and


                             EZFFTB

to compute the inverse transform. When using


                             EZFFTF

, the computed Fourier coefficients are stored in the arrays

and

. The input array

is not overwritten. Unlike the output of


                             RFFTF

and


                             DFFTF

, no packing is performed, and the complex conjugates are retained.

**CODE EXAMPLE 5** Fast Fourier Transform and Inverse Transform for Complex Values
Argument	Definition
The EZFFT routines use the arguments shown in TABLE 13 .

CODE EXAMPLE 7 uses


                                 COSQF

to compute the cosine quarter-wave transform of a real sequence and


                                 COSQB

4*N

CODE EXAMPLE 8 uses


                                      VCOSQF

to compute the cosine quarter-wave transform of a single real sequence and


                                      VCOSQB

to compute the inverse transform. The computed Fourier coefficients are packed and stored in the original real array. The inverse transform is normalized.

CODE EXAMPLE 9 uses


                                           VCOSQF

to compute the cosine quarter-wave transform of multiple real sequences and


                                           VCOSQB

to compute the inverse transforms. The computed Fourier coefficients of each sequence are packed and stored in the rows of the original real array. The inverse transforms are normalized.

CODE EXAMPLE 10 uses


                                                COST

to compute the cosine even-wave transform of a real sequence and the inverse transform. The computed Fourier coefficients are packed and stored in the original real array. The inverse transform is unnormalized and can be normalized by dividing each value by


                                                2*(N-1)

CODE EXAMPLE 11 uses


                                                     SINQF

to compute sine quarter-wave transform of a real sequence and


                                                     SINQB

4*N

CODE EXAMPLE 12 uses


                                                          VSINQF

to compute the sine quarter-wave transform of a single real sequence and


                                                          VSINQB

to compute the inverse transform. The computed Fourier coefficients are packed and stored in the original real array. The inverse transform is normalized.

CODE EXAMPLE 13 uses


                                                               VSINQF

to compute the sine quarter-wave transform of multiple real sequences and


                                                               VSINQB

to compute the inverse transforms. The computed Fourier coefficients of each sequence are packed and stored in the rows of the original real array. The inverse transforms are normalized.

CODE EXAMPLE 14 uses


                                                                    SINT

to compute the sine odd-wave transform of a real sequence and the inverse transform. The computed Fourier coefficients are packed and stored in the original real array. The inverse transform is unnormalized and can be normalized by dividing each value by


                                                                    2*(N+1)

Number of rows to be transformed Number of columns to be transformed Two-dimensional array


                                                                    A(LDA,N)

containing the sequences to be transformed and the results of an in-place transform Leading dimension of array containing data to be transformed Work array initialized by x FFT2I


                                                                    LWORK

Dimension of work array


                                                                    WORK

Real two-dimensional FFT routines use the arguments shown in TABLE 15 .


                                                                     PLACE

`I'

`i'

specifies that an in-place transform is performed.

`O'

`o'

specifies that an out-of-place transform is performed.


                                                                     RFFT2F


                                                                     DFFT2F

only:

`F'

`f'

specifies that a full result matrix is generated.
Any other character specifies that a partial result matrix is generated. Number of rows to be transformed Number of columns to be transformed Two-dimensional array


                                                                     A(LDA,N)

containing the sequences to be transformed and the results of an in-place transform Leading dimension of array containing data to be transformed Two-dimensional array


                                                                     B(2*LDB,N

) that stores the results of an out-of-place transform One half of the actual leading dimension of array that stores results of out-of-place transform Work array initialized by x FFT2I


                                                                     LWORK

Dimension of work array


                                                                     WORK

In-place or Out-of-place. When using In-Place, the results are stored in the modified input array that contains one or two additional rows, depending upon whether

is odd or even. When using Out-of-Place, the results are stored in a separate array.

Full or Partial. When using Full, the complex conjugates are retained. When using Partial, the complex conjugates are discarded.

When computing a real one-dimensional FFT, the complex result can be packed and stored in the original array, because the values identically equal to zero and the complex conjugates are not stored. When computing the real two-dimensional FFT using the in-place and partial storage options, the complex conjugates are not stored, but the values identically equal to zero are stored. Saving the values identically equal to zero simplifies the indexing that occurs when computing the two-dimensional FFT. However, the size of the original array is modified to contain one or two additional rows, which are needed to store the values identically equal to zero. The values of the arguments used with the real two-dimensional FFT routines depend upon whether an in-place or out-of place transform is performed, and whether the results are stored in a full or partial result matrix, as shown in TABLE 16 .

In-Place Transform

unused

unused When computing the real two-dimensional FFT of an input sequence of

rows and

columns, the computed Fourier coefficients will be stored in a result matrix with

2*M

rows and

columns when using the Full storage option. When using the Partial storage option, the Fourier coefficients will be stored in a result matrix with

M+2

rows and

columns when

is even, or in a result matrix with

M+1

rows and

columns when

is odd. Sun Performance Library provides the


                                                                       [S,D,C,Z]CNVCOR

routines for computing the convolution or correlation of a filter with one or more input vectors and the


                                                                       [S,D,C,Z]CNVCOR2

routines for computing the two-dimensional convolution or correlation of two matrices. These routines are described in Section Convolution and Correlation Routines . The two-dimensional FFT routines can also be used to compute the two-dimensional convolutions of the two two-dimensional arrays

and

, as described in the following procedure. 1. Compute the two-dimensional FFT of A . 2. Compute the two-dimensional FFT of B . 3. Perform pointwise multiplication of A and B . 4. Compute the inverse two-dimensional FFT of the previous result. The second transpose can be avoided for increased performance by using the


                                                                       VFFT

and


                                                                       [SDCZ]TRANS

routines to explicitly compute the transposed two-dimensional FFT, as described in the following procedure. 1. Use VFFT to compute one dimensional FFTs along the columns of A . 2. Use ZTRANS to transpose A . 3. Use VFFT to compute one-dimensional FFTs along the columns of the new A . 4. Use VFFT to compute one-dimensional FFTs along the columns of B . 5. Use ZTRANS to transpose B . 6. Use VFFT to compute one-dimensional FFTs along the columns of the new B . 7. Perform pointwise multiplication of A and B . 8. Use VFFT to compute inverse one-dimensional FFTs along the columns of the result. 9. Use ZTRANS to transpose the result back into its original order. Sample Program: Two-Dimensional FFT and Inverse Transform CODE EXAMPLE 15 uses


                                                                       CFFT2F

to compute the two-dimensional FFT of a two-dimensional complex sequence and


                                                                       CFFT2B

to compute the inverse transform. The computed Fourier coefficients are stored in the original complex array. The inverse transform is unnormalized and can be normalized by dividing each value by

**CODE EXAMPLE 14** Sine Odd-Wave Transform and Inverse Transform
Argument	Definition
Perform a one-dimensional transform of the columns of the input vector. Transpose the result matrix. Perform a one-dimensional transform of the columns of the result matrix. Transpose the result matrix to restore the original order of the data points. Arguments for Two-Dimensional FFT Routines Complex two-dimensional FFT routines use the arguments shown in TABLE 14 .
Argument	Definition
Full Result Matrix	Partial Result Matrix

Re( X ₀ ) Im( X ₀ ) Re( X ₀ ) Im( X ₀ ) Re( X ₁ ) Im( X ₁ ) Re( X ₁ ) Im( X ₁ ) The inverse transform is unnormalized and can be normalized by dividing each value by

**CODE EXAMPLE 15** Two-Dimensional FFT and Inverse of Complex Sequence
The computed Fourier coefficients are stored in a `(2*M,` `N)` array where one row contains the real part of the complex coefficient and the next row contains the imaginary part of the complex coefficient. In CODE EXAMPLE 15 , to better display the complex conjugate symmetry, the real and imaginary parts of each complex coefficient are displayed on one line. For example, the following output: Transformed Out-of-Place, Full ( 6.241, 0.000) ( 1.173, 0.000) ( -0.018, 1.169) ( 0.304, 0.111) represents the following values for the Fourier coefficients.	Column 1		Column 2

CODE EXAMPLE 17 is a C example that uses


                                                                         zfft2f

to compute the two-dimensional FFT of a two-dimensional complex sequence and


                                                                         zfft2b

to compute the inverse transform. The computed Fourier coefficients are stored in the original complex array. The inverse transform is unnormalized and can be normalized by dividing each value by

CODE EXAMPLE 18 is a C example that uses


                                                                              rfft2f

to compute the two-dimensional FFT of a two-dimensional real sequence and


                                                                              rfft2b

to compute the inverse transform. The computed Fourier coefficients are stored in the original real array using the partial storage option. The inverse transform is unnormalized and can be normalized by dividing each value by

Number of rows to be transformed Number of columns to be transformed Number of planes to be transformed Three-dimensional array


                                                                              A(LDA,N,K)

containing the sequences to be transformed and the results of an in-place transform Leading dimension of array containing data to be transformed, where

LDA

Second dimension of array to be transformed, where


                                                                              LD2A

Work array initialized by x FFT3I


                                                                              LWORK

Dimension of work array


                                                                              WORK

Real three-dimensional FFT routines use the arguments shown in TABLE 18 .


                                                                               PLACE

`I'

`i'

specifies that an in-place transform is performed.

`O'

`o'

specifies that an out-of-place transform is performed.


                                                                               RFFT3F


                                                                               DFFT3F

only:

`F'

`f'


                                                                               A(LDA,N,K)

containing the sequences to be transformed and the results of an in-place transform Leading dimension of array containing data to be transformed Three-dimensional array


                                                                               B(2*LDB,N,K)

that stores the results of an out-of-place transform Leading dimension of array that stores results of out-of-place transform Work array initialized by x FFT3I


                                                                               LWORK

Dimension of work array


                                                                               WORK

In-place or Out-of-place. When using In-Place, the results are stored in the modified input array that contains one or two additional rows, depending upon whether

is odd or even. When using Out-of-Place, the results are stored in a separate array.

Full or Partial. When using Full, complex conjugates are retained. When using Partial, the complex conjugates are discarded.

When computing a real one-dimensional FFT, the complex result can be packed and stored in the original array, because the values identically equal to zero and the complex conjugates are not stored. When computing the real three-dimensional FFT using the in-place and partial storage options, the complex conjugates are not stored, but the values identically equal to zero are stored. Saving the values identically equal to zero simplifies the indexing that occurs when computing the three-dimensional FFT. However, the size of the original array is modified to contain one or two additional rows, which are needed to store the values identically equal to zero. The values of the arguments used with the real three-dimensional FFT routines depend upon whether an in-place or out-of place transform is performed, and whether the results are stored in a full or partial result matrix, as shown in TABLE 19 .

In-Place Transform

unused

unused When computing the real 3D FFT of an input sequence of

rows,

columns, and

planes, the computed Fourier coefficients will be stored in a result matrix with

2*M

rows,

columns for each value of

when using the Full storage option. When using the Partial storage option, the Fourier coefficients will be stored in a result matrix with

M+2

rows and

columns for each value of

when

is even, or in

M+1

rows and

columns when

is odd. For each value of

, the storage format of the Fourier coefficients in the

rows and

columns is the same as for the real two-dimensional FFT routines. See Storage of Real Two-Dimensional Sequences . CODE EXAMPLE 19 uses


                                                                                 CFFT3F

to compute the three-dimensional FFT of a three-dimensional complex sequence and


                                                                                 CFFT3B

to compute the inverse transform. The computed Fourier coefficients are stored in the original complex array. The inverse transform is unnormalized and can be normalized by dividing each value by

N*K

**CODE EXAMPLE 18** Example of Using the Partial Storage Option
Argument	Definition
Perform a one-dimensional transform of the columns of the input vector. Transpose the result matrix. Perform a one-dimensional transform of the columns of the result matrix. Reflect the result matrix so that the planes become columns. Perform a one-dimensional transform of the columns of the result matrix. Reflect and transpose the result matrix to restore the original order of the data points. Arguments for Three-Dimensional FFT Routines Complex three-dimensional FFT routines use the arguments shown in TABLE 17 .
Argument	Definition
Full Result Array	Partial Result Array


                                                                                 CNVCOR

`V'

`v'

specifies that convolution is computed.

`R'

`r'

specifies that correlation is computed.

`T'

`t'

specifies that the Fourier transform method is used.

`D'

`d'

specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. (See Note 1) Length of filter vector, where

NX

0. Filter vector Index of first element of

, where

NX

IFX

1 Stride between elements of the vector in

, where


                                                                                 INCX

> 0. Length of input vectors, where

NY

0. Number of implicit zeros prefixed to the

vectors, where


                                                                                 NPRE

0. Number of input vectors, where

0. Input vectors. Index of the first element of

, where

NY

IFY

1 INC1Y Stride between elements of the input vectors in

, where


                                                                                 INC1Y

> 0. INC2Y Stride between input vectors in

, where


                                                                                 INC2Y

> 0. Length of the output vectors, where

NZ

0. Number of

vectors, where

0. If

, only the first

vectors will be processed. If

, all input vectors will be processed and the last

output vectors will be set to zero on exit. Result vectors Index of the first element of

, where

NZ

IFZ

1 INC1Z Stride between elements of the output vectors in

, where


                                                                                 INCYZ

> 0. INC2Z Stride between output vectors in

, where


                                                                                 INC2Z

> 0. Work array LWORK Length of work array Note 1. When the lengths of the two sequences to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large time-series signal with a small filter, the direct method performs faster than the FFT-based method. The two-dimensional convolution and correlation routines use the arguments shown in TABLE 21 .


                                                                                  CNVCOR

`V'

`v'

specifies that convolution is computed.

`R'

`r'

specifies that correlation is computed.


                                                                                  METHOD

`T'

`t'

specifies that the Fourier transform method is used.

`D'

`d'

specifies that the direct method is used, where the convolution or correlation is computed from the definition of convolution and correlation. (See Note 1)


                                                                                  TRANSX

`N'

`n'

specifies that

is the filter matrix

`T'

`t'

specifies that the transpose of

is the filter matrix


                                                                                  SCRATCHX

`N'

`n'

specifies that

must be preserved

`S'

`s'

specifies that

can be used for scratch space. The contents of

are undefined after returning from a call where

is used for scratch space.


                                                                                  TRANSY

`N'

`n'

specifies that

is the input matrix

`T'

`t'

specifies that the transpose of

is the input matrix


                                                                                  SCRATCHY

`N'

`n'

specifies that

must be preserved

`S'

`s'

specifies that

can be used for scratch space. The contents of

are undefined after returning from a call where

is used for scratch space. Number of rows in the filter matrix

, where

MX

0 Number of columns in the filter matrix

, where

NX

0 Filter matrix.

is unchanged on exit when


                                                                                  SCRATCHX

`N'

`n'

and undefined on exit when


                                                                                  SCRATCHX

`S'

`s'

. Leading dimension of array containing the filter matrix

. Number of rows in the input matrix

, where

MY

0. Number of columns in the input matrix

, where

NY

0 Number of implicit zeros prefixed to each row of the input matrix

vectors, where


                                                                                  MPRE

0. Number of implicit zeros prefixed to each column of the input matrix

, where


                                                                                  NPRE

0. Input matrix.

is unchanged on exit when


                                                                                  SCRATCHY

`N'

`n'

and undefined on exit when


                                                                                  SCRATCHY

`S'

`s'

. Leading dimension of array containing the input matrix

. Number of output vectors, where

MZ

0. Length of output vectors, where

NZ

0. Result vectors Leading dimension of the array containing the result matrix

, where

LDZ


                                                                                  MAX(1,MZ)

. WORKIN Work array LWORK Length of work array Note 1. When the sizes of the two matrices to be convolved are similar, the FFT method is faster than the direct method. However, when one sequence is much larger than the other, such as when convolving a large data set with a small filter, the direct method performs faster than the FFT-based method. The minimum dimensions for the


                                                                                   WORK

work arrays used with the one-dimensional and two-dimensional convolution and correlation routines are shown in TABLE 24 . The minimum dimensions for one-dimensional convolution and correlation routines depend upon the values of the arguments


                                                                                   NPRE

NX

NY

, and

NZ

. The minimum dimensions for two-dimensional convolution and correlation routines depend upon the values of the arguments shown TABLE 22 .

Number of rows in the filter matrix Number of rows in the input matrix Number of output vectors Number of columns in the filter matrix Number of columns in the input matrix Length of output vectors Number of implicit zeros prefixed to each row of the input matrix Number of implicit zeros prefixed to each column of the input matrix


                                                                                   MPOST


                                                                                   MAX(0,MZ-MYC)


                                                                                   NPOST


                                                                                   MAX(0,NZ-NYC)


                                                                                   MPRE


                                                                                   MPOST


                                                                                   MYC_INIT

, where


                                                                                   MYC_INIT

depends upon filter and input matrices, as shown in TABLE 23


                                                                                   NPRE


                                                                                   NPOST


                                                                                   NYC_INIT

, where


                                                                                   NYC_INIT

depends upon filter and input matrices, as shown in TABLE 23


                                                                                    MYC_INIT

and


                                                                                    NYC_INIT

depend upon the following, where

is the filter matrix and

is the input matrix.


                                                                                    MYC_INIT


                                                                                    MAX(MX,MY)


                                                                                    MAX(NX,MY)


                                                                                    MAX(MX,NY)


                                                                                    MAX(NX,NY)


                                                                                    NYC_INIT


                                                                                    MAX(NX,NY)


                                                                                    MAX(MX,NY)


                                                                                    MAX(NX,MY)


                                                                                    MAX(MX,MY)

The values assigned to the minimum work array size is shown in TABLE 24 .


                                                                                     SCNVCOR, DCNVCOR


                                                                                     4*(MAX(NX,NPRE+NY)


                                                                                     +
                                                                                     

                                                                                     MAX(0,NZ-NY))


                                                                                     REAL, REAL*8


                                                                                     CCNVCOR, ZCNVCOR


                                                                                     2*(MAX(NX,NPRE+NY)


                                                                                     +
                                                                                     

                                                                                     MAX(0,NZ-NY)))


                                                                                     COMPLEX, COMPLEX*16

The most common form of the computation, and the case that executes fastest, is applying a filter vector

to a series of vectors stored in the columns of

with the result placed into the columns of

. In that case,


                                                                                      INCX

= 1,


                                                                                      INC1Y

= 1,


                                                                                      INC2Y

NY


                                                                                      INC1Z

= 1,


                                                                                      INC2Z

NZ

. Another common form is applying a filter vector

to a series of vectors stored in the rows of

and store the result in the row of

, in which case


                                                                                      INCX

= 1,


                                                                                      INC1Y

NY


                                                                                      INC2Y

= 1,


                                                                                      INC1Z

NZ

, and


                                                                                      INC2Z

= 1. Convolution can be used to compute the products of polynomials. CODE EXAMPLE 21 uses


                                                                                      SCNVCOR

to compute the product of 1 + 2 x + 3 x ² and 4 + 5 x + 6 x ² .

**CODE EXAMPLE 19** Three-Dimensional Fast Fourier Transform and Inverse Transform
Argument	Definition
The one-dimensional convolution and correlation routines use the arguments shown in TABLE 20 .
Argument	Definition
Argument	Definition
Transpose(Y)		Transpose(X)	Transpose(X)
Routine	Minimum Work Array Size ( `WORK` )	`SCNVCOR2` ^¹ `, DCNVCOR2` ¹	`MY` `+` `NY` `+` `30`	`COMPLEX, COMPLEX*16`

Making the output vector longer than the input vectors, as in the example above, implicitly adds zeros to the end of the input. No zeros are actually required in any of the vectors, and none are used in the example, but the padding provided by the implied zeros has the effect of an end-off shift rather than an end-around shift of the input vectors. CODE EXAMPLE 22 will compute the product between the vector [ 1, 2, 3 ] and the circulant matrix defined by the initial column vector [ 4, 5, 6 ]:

CODE EXAMPLE 22 Convolution Used to Compute the Product of a Vector and Circulant Matrix

The difference between this example and the previous example is that the length of the output vector is the same as the length of the input vectors, so there are no implied zeros on the end of the input vectors. With no implied zeros to shift into, the effect of an end-off shift from the previous example does not occur and the end-around shift results in a circulant matrix product.

**CODE EXAMPLE 23** Two-Dimensional Convolution Using Direct Method