This is not a tutorial. It’s just an overview for those who already (hopefully) understood FFT. (Actually, more of a personal note to myself)
DFT but fast
Converts a polynomial into point value forms where the points sampled are principal $n$-th root of unity.
Yes, by doing the exact same thing as DFT but the sampled points are inverted too, then divide the final result by $n$.
Evaluate each sample point in $O(N)$, and you end up in $O(N^2)$ transformation.
Yes, that’s when FFT comes in.
The most commonly used algorithm is Cooley-Tukey algorithm, and the DFT size is usually taken in the form of $2^k$. It is basically a Divide and Conquer algorithm, by separating the odd and even terms, and does FFT recursively. By doing some reordering using bit reversal, you can actually does an iterative version of it. Final complexity is $O(N \log N)$ without any additional space. Details on FFT can be found in this Codeforces blog.
The main idea is sum-convolution (can also be viewed as polynomial multiplication). That is, given two sequences $a$ and $b$, for each pair of $i$, $j$, $a_i \times b_j$ is “contributed” to $c_{i + j}$, where $c$ is the convolution. The way to do that is: $c = IFFT(FFT(a) * FFT(b))$, where $*$ is point-wise multiplication.
Note that by default DFT is a cyclic convolution, i.e. $a_i b_j$ is actually contributed to $c_{(i + j) \bmod n}$ (in other words, the multiplication is in modulo $(x^n - 1)$). Therefore, to avoid “overflow”, you pad the polynoms with $0$s until the polynom size is of size $2^k ≥ |a| + |b|$.
An example problem on how FFT helps. For this problem, you create a polynom where the coefficient of $x^i$ is the number of elements in a that are equal to $i$. After that, you square the polynomial, then start processing from there.
Sometimes FFT is closely related to string “matching” too. In this case, usually each alphabet is associated with a polynom where coefficient $x^i$ equals to $1$ if $s_i$ is equal to the associated alphabet, or $0$ otherwise. Here is an example.
Another time, a DP problem can be optimized using FFT too. This can be noticed when the transition is $O(N)$ and looks like a convolution. In the case of DP, often we mention the polynomial as (Ordinary) Generating Function. The coefficients of the OGF will be of the interest when arriving at the final answer.
Talking about generating functions, there is also something called Exponential Generating Function. In EGF, the coefficient of $x^i$ is $\frac{\text{actual value}}{i!}$. And when you multiply, the contributed value is $\binom{i + j}{i} a_i b_j$.
These are operations on formal power series. The idea is, if you want to compute $Q$ that satisfies $f(Q) = 0$, for some polynom $Q$ and function $f$, we can use the Newton’s method $Q_{n+1} = Q_n - \frac{f(P)}{f’(P)}$.
For example, computing $Q = \frac{1}{P} \implies Q^{-1} - P = 0 \implies f(Q) = Q^{-1} - P$
$Q := Q - \frac{Q^{-1} - P}{-Q^{-2}} = Q + (Q - PQ) = 2Q - PQ^2$
Another example, computing $Q = \sqrt{P} \implies Q^2 - P = 0 \implies f(Q) = Q^2 - P$
$Q := Q - \frac{Q^2 - P}{2Q} = \frac{1}{2}(Q + \frac{P}{Q})$
In every iteration of the Newton’s method, the precision is doubled, i.e. the degree of $Q$ is improved from $n$ to $2n + 1$, and initially $Q$ is only a constant.
And here are several operations along with their tricks:
You can find the details of division, modulo, multipoint evaluation, and interpolation in adamant’s lecture note.
You can actually compute for $\{{z^i\}}_{i=0}^{n-1}$, for any complex number $z$. This is called Chirp Z-transform and can be done using Bluestein’s algorithm. The main idea is to write $ij = \binom{i + j}{2} - \binom{i}{2} - \binom{j}{2}$.
Hence, $\sum(a_i z^{ik}) = \sum \left(a_i z^{-\binom{i}{2}} z^{\binom{i + k}{2}} \right) z^{-\binom{k}{2}}$, which is exactly a convolution and can be computed using FFT. Example problem.
Yes, by using CZT, and $z$ such that $z$ is primitive $n$-th root of unity where $n$ is the DFT size. One example problem that use DFT any size can be found on Codeforces.
Multidimentional DFT can be done by performing DFT on each dimension one by one. An example problem can be found here. In this problem, you will need both sum-convolution for one variable, and xor-convolution for the other in a single polynomial (spoiler in white text).
Xor convolution means that $a_i b_j$ is contributed to $c_{i \oplus j}$. It is basically the standard DFT but multidimensional, and each dimension is DFT of size $2$. This is working because xor is basically addition in vector space $\bmod 2$.
In XOR, you tranform $(u, v)$ to $(u + v, u - v)$.
In AND, you tranform $(u, v)$ to $(u + v, v)$. After performing AND-tranform, you actually ends up with sum of supermasks.
In OR, you tranform $(u, v)$ to $(u, u + v)$. After performing OR-tranform, you actually ends up with sum of submasks.
In NAND, you perform AND, then do the bit inverting (i.e. swap coefficients of bit 0 and bit 1).
An example problem that uses various binary gates can be found on Codeforces
DFT works over an arbitrary ring, as long as you are using the principal $n$-th root of unity as the sampled points. Primitive $n$-th root is also principal $n$-th root. Since primitive roots are roots of the $n$-th cyclotomic polynomial, by ensuring your points satisfy $\Phi_n(x) = 0$, then DFT should work. An example problem.
Suppose that the sum of degrees of the given polynomials is $N$, then we can do either of the following:
All of these run in $O(N \log^2 N)$. This problem require this technique.