Rader's FFT algorithm

Rader's algorithm (1968),[1] named for Charles M. Rader of MIT Lincoln Laboratory, is a fast Fourier transform (FFT) algorithm that computes the discrete Fourier transform (DFT) of prime sizes by re-expressing the DFT as a cyclic convolution (the other algorithm for FFTs of prime sizes, Bluestein's algorithm, also works by rewriting the DFT as a convolution).

Since Rader's algorithm only depends upon the periodicity of the DFT kernel, it is directly applicable to any other transform (of prime order) with a similar property, such as a number-theoretic transform or the discrete Hartley transform.

The algorithm can be modified to gain a factor of two savings for the case of DFTs of real data, using a slightly modified re-indexing/permutation to obtain two half-size cyclic convolutions of real data;[2] an alternative adaptation for DFTs of real data uses the discrete Hartley transform.[3]

Winograd extended Rader's algorithm to include prime-power DFT sizes $p^m$,[4][5] and today Rader's algorithm is sometimes described as a special case of Winograd's FFT algorithm, also called the multiplicative Fourier transform algorithm (Tolimieri et al., 1997),[6] which applies to an even larger class of sizes. However, for composite sizes such as prime powers, the Cooley–Tukey FFT algorithm is much simpler and more practical to implement, so Rader's algorithm is typically only used for large-prime base cases of Cooley–Tukey's recursive decomposition of the DFT.[3]

Algorithm

Figure: Visual representation of a DFT matrix in Rader's FFT algorithm. The array of colored clocks represents a DFT matrix of size 11. Permuting the rows and columns (except the first row and column) by a sequence generated by a primitive root of 11 (here 2) turns the original DFT matrix into a circulant matrix. Multiplying a data sequence by a circulant matrix is equivalent to a cyclic convolution. This relation is an example of the fact that the multiplicative group is cyclic: $(\mathbb{Z}/p\mathbb{Z})^{\times} \cong C_{p-1}$.

If $N$ is a prime number, then the set of non-zero indices $n = 1, \ldots, N-1$ forms a group under multiplication modulo $N$. One consequence of the number theory of such groups is that there exists a generator of the group (sometimes called a primitive root, which can be found quickly by exhaustive search or slightly better algorithms[7]): an integer $g$ such that $n = g^q \pmod{N}$ for any non-zero index $n$ and for a unique $q$ in $0, \ldots, N-2$ (forming a bijection from $q$ to non-zero $n$). Similarly, $k = g^{-p} \pmod{N}$ for any non-zero index $k$ and for a unique $p$ in $0, \ldots, N-2$, where the negative exponent denotes the multiplicative inverse of $g^p$ modulo $N$. That means that we can rewrite the DFT, $X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} nk}$ for $k = 0, \ldots, N-1$, using these new indices $p$ and $q$ as:

$$X_0 = \sum_{n=0}^{N-1} x_n,$$

$$X_{g^{-p}} = x_0 + \sum_{q=0}^{N-2} x_{g^q} e^{-\frac{2\pi i}{N} g^{-(p-q)}}, \qquad p = 0, \ldots, N-2.$$

(Recall that $x_n$ and $X_k$ are implicitly periodic in $N$, and also that $e^{2\pi i} = 1$. Thus, all indices and exponents are taken modulo $N$ as required by the group arithmetic.)
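The re-indexing can be checked numerically. The following is a minimal sketch, assuming an odd prime N and using a hypothetical helper `primitive_root` that performs the exhaustive search mentioned above; it is an illustration, not an optimized routine.

```python
def primitive_root(N):
    """Return the smallest generator of the multiplicative group modulo an odd prime N.

    Exhaustive search: g is a generator iff its powers hit all N-1 non-zero residues.
    """
    for g in range(2, N):
        seen = set()
        value = 1
        for _ in range(N - 1):
            value = (value * g) % N
            seen.add(value)
        if len(seen) == N - 1:
            return g
    raise ValueError("no generator found; N must be an odd prime")


N = 11
g = primitive_root(N)
g_inv = pow(g, N - 2, N)  # inverse of g modulo N, via Fermat's little theorem

n_of_q = [pow(g, q, N) for q in range(N - 1)]      # n = g^q  (mod N)
k_of_p = [pow(g_inv, p, N) for p in range(N - 1)]  # k = g^-p (mod N)

# The product n*k depends only on (q - p) mod (N - 1); this is what makes the
# permuted DFT matrix circulant.
for p in range(N - 1):
    for q in range(N - 1):
        assert (n_of_q[q] * k_of_p[p]) % N == pow(g, (q - p) % (N - 1), N)

print(g, n_of_q)  # 2 [1, 2, 4, 8, 5, 10, 9, 7, 3, 6]
```

Both index lists are permutations of 1, ..., N-1, so every non-zero input and output index is visited exactly once.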

The final summation, above, is precisely a cyclic convolution of the two sequences $a_q$ and $b_q$ of length $N-1$ ($q = 0, \ldots, N-2$) defined by:

$$a_q = x_{g^q},$$

$$b_q = e^{-\frac{2\pi i}{N} g^{-q}}.$$
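As an illustration, the sketch below evaluates the cyclic convolution directly (in quadratic time, so it shows the structure rather than the speed-up) and compares the result with a naive DFT. The function names `naive_dft` and `rader_dft` are hypothetical, and the primitive root g is assumed to be supplied by the caller.

```python
import cmath


def naive_dft(x):
    """Reference O(N^2) DFT straight from the definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def rader_dft(x, g):
    """DFT of prime length N = len(x) via a length-(N-1) cyclic convolution.

    g is assumed to be a primitive root modulo N.
    """
    N = len(x)
    M = N - 1
    g_inv = pow(g, N - 2, N)  # inverse of g modulo the prime N

    a = [x[pow(g, q, N)] for q in range(M)]                                   # a_q = x_{g^q}
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, N) / N) for q in range(M)]  # b_q

    X = [0j] * N
    X[0] = sum(x)
    for p in range(M):
        # Cyclic convolution: sum over q of a_q * b_{(p - q) mod (N-1)}
        conv = sum(a[q] * b[(p - q) % M] for q in range(M))
        X[pow(g_inv, p, N)] = x[0] + conv
    return X


x = [complex(n, 1 - 2 * n) for n in range(11)]
error = max(abs(u - v) for u, v in zip(rader_dft(x, 2), naive_dft(x)))
assert error < 1e-9
```

In a practical implementation the inner sum would be replaced by an FFT-based convolution, as discussed in the next section.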

Evaluating the convolution

Since $N-1$ is composite, this convolution can be performed directly via the convolution theorem and more conventional FFT algorithms. However, that may not be efficient if $N-1$ itself has large prime factors, requiring recursive use of Rader's algorithm. Instead, one can compute a length-$(N-1)$ cyclic convolution exactly by zero-padding it to a length of at least $2(N-1)-1$, say to a power of two, which can then be evaluated in $O(N \log N)$ time without the recursive application of Rader's algorithm.
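One possible way to realize the zero-padding is sketched below. It assumes a simple recursive radix-2 FFT (`fft_pow2`, a hypothetical helper written for this example), pads the input-derived sequence with zeros, and wraps the twiddle sequence around the end of the padded buffer so that the first N-1 outputs of the longer cyclic convolution equal the original one.

```python
import cmath


def fft_pow2(v, sign=-1):
    """Recursive radix-2 FFT; len(v) must be a power of two. sign=+1 gives the
    unnormalized inverse transform."""
    n = len(v)
    if n == 1:
        return list(v)
    even = fft_pow2(v[0::2], sign)
    odd = fft_pow2(v[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rader_dft_padded(x, g):
    """Rader's algorithm with the convolution zero-padded to a power of two.
    g is assumed to be a primitive root modulo the prime N = len(x)."""
    N = len(x)
    M = N - 1
    g_inv = pow(g, N - 2, N)
    a = [x[pow(g, q, N)] for q in range(M)]
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, N) / N) for q in range(M)]

    L = 1
    while L < 2 * M - 1:  # smallest power of two >= 2(N-1)-1
        L *= 2

    a_pad = a + [0j] * (L - M)   # input sequence: plain zero-padding
    b_pad = [0j] * L             # twiddles: cyclic extension around the end
    for q in range(M):
        b_pad[q] = b[q]
    for j in range(1, M):
        b_pad[L - j] = b[M - j]  # plays the role of b at index -j

    A = fft_pow2(a_pad)
    B = fft_pow2(b_pad)
    c = [v / L for v in fft_pow2([A[k] * B[k] for k in range(L)], sign=+1)]

    X = [0j] * N
    X[0] = sum(x)
    for p in range(M):
        X[pow(g_inv, p, N)] = x[0] + c[p]
    return X


def naive_dft(x):
    """Reference O(N^2) DFT used only to check the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


x = [complex(n % 4, n) for n in range(11)]
error = max(abs(u - v) for u, v in zip(rader_dft_padded(x, 2), naive_dft(x)))
assert error < 1e-9
```

The two filled regions of `b_pad` do not overlap precisely because the padded length is at least 2(N-1)-1. Since the twiddle sequence depends only on N, its padded FFT could be precomputed once per transform size.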

This algorithm, then, requires $O(N)$ additions plus $O(N \log N)$ time for the convolution. In practice, the $O(N)$ additions can often be performed by absorbing the additions into the convolution: if the convolution is performed by a pair of FFTs, then the sum of $x_n$ is given by the DC (0th) output of the FFT of $a_q$ plus $x_0$, and $x_0$ can be added to all the outputs by adding it to the DC term of the convolution prior to the inverse FFT. Still, this algorithm requires intrinsically more operations than FFTs of nearby composite sizes, and typically takes 3–10 times as long in practice.
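The absorption trick can be sketched as follows. For brevity a naive length-(N-1) DFT (`dft`, a hypothetical helper) stands in for the FFT pair. Note one detail that depends on normalization: because the inverse transform here divides by N-1, the DC term is incremented by (N-1)·x_0 so that every output gains exactly x_0. With an unnormalized inverse, x_0 itself would be added.

```python
import cmath


def dft(v, sign=-1):
    """Naive DFT standing in for an FFT; sign=+1 gives the unnormalized inverse."""
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def rader_dft_absorbed(x, g):
    """Rader's algorithm with the O(N) additions folded into the convolution.
    g is assumed to be a primitive root modulo the prime N = len(x)."""
    N = len(x)
    M = N - 1
    g_inv = pow(g, N - 2, N)
    a = [x[pow(g, q, N)] for q in range(M)]
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, N) / N) for q in range(M)]

    A = dft(a)
    B = dft(b)

    X = [0j] * N
    # A[0] is the sum of all a_q, i.e. x_1 + ... + x_{N-1}.
    X[0] = x[0] + A[0]

    C = [A[k] * B[k] for k in range(M)]
    # The inverse below divides by M, so adding M * x_0 to the DC term
    # adds x_0 to every convolution output.
    C[0] += M * x[0]
    c = [v / M for v in dft(C, sign=+1)]

    for p in range(M):
        X[pow(g_inv, p, N)] = c[p]
    return X


x = [complex(2 * n, 3 - n) for n in range(7)]
error = max(abs(u - v) for u, v in zip(rader_dft_absorbed(x, 3), dft(x)))
assert error < 1e-9
```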

If Rader's algorithm is performed by using FFTs of size $N-1$ to compute the convolution, rather than by zero padding as mentioned above, the efficiency depends strongly upon $N$ and the number of times that Rader's algorithm must be applied recursively. The worst case would be if $N-1$ were $2N_2$ where $N_2$ is prime, with $N_2-1 = 2N_3$ where $N_3$ is prime, and so on. In such cases, supposing that the chain of primes extended all the way down to some bounded value, the recursive application of Rader's algorithm would actually require $O(N^2)$ time. Such $N_j$ are called Sophie Germain primes, and such a sequence of them is called a Cunningham chain of the first kind. The lengths of Cunningham chains, however, are observed to grow more slowly than $\log_2(N)$, so Rader's algorithm applied in this way is probably not $O(N^2)$, though it is possibly worse than $O(N \log N)$ for the worst cases. Fortunately, a guarantee of $O(N \log N)$ complexity can be achieved by zero padding.
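To make the worst case concrete, the sketch below follows the chain N, (N-1)/2, ... for as long as each term is prime. The helper names `is_prime` and `rader_worst_case_chain` are hypothetical, and trial division is used only because the example sizes are small.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small illustrative sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def rader_worst_case_chain(N):
    """Follow N -> (N - 1) / 2 for as long as the result is an integer prime."""
    chain = [N]
    while (chain[-1] - 1) % 2 == 0 and is_prime((chain[-1] - 1) // 2):
        chain.append((chain[-1] - 1) // 2)
    return chain


print(rader_worst_case_chain(47))  # [47, 23, 11, 5, 2]
print(rader_worst_case_chain(13))  # [13]
```

For N = 47 each level of the recursion hits another prime size, whereas for N = 13 the convolution length 12 has only small prime factors, so no further application of Rader's algorithm is needed.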

References

  1. C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," Proc. IEEE 56, 1107–1108 (1968).
  2. S. Chu and C. Burrus, "A prime factor FTT [sic] algorithm using distributed arithmetic," IEEE Transactions on Acoustics, Speech, and Signal Processing 30 (2), 217–227 (1982).
  3. Matteo Frigo and Steven G. Johnson, "The Design and Implementation of FFTW3," Proceedings of the IEEE 93 (2), 216–231 (2005).
  4. S. Winograd, "On Computing the Discrete Fourier Transform", Proc. National Academy of Sciences USA, 73(4), 1005–1006 (1976).
  5. S. Winograd, "On Computing the Discrete Fourier Transform", Mathematics of Computation, 32(141), 175–199 (1978).
  6. R. Tolimieri, M. An, and C. Lu, Algorithms for Discrete Fourier Transform and Convolution, Springer-Verlag, 2nd ed., 1997.
  7. Donald E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd edition, section 4.5.4, p. 391 (Addison–Wesley, 1998).
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.