ZeroCorrelation Linear Cryptanalysis
Introduction
Update: Slides from the related presentation at DTU, can now be found here.
Those who have read Howard M. Heys excellent introduction to Linear and Differential cryptanalysis or even worked with linear hulls (e.g. read the Rijndael book) might be tempted to believe that the absence of linear approximations with a nonzero correlation implies that the cipher is immune to linear cryptanalysis. However the existence of linear hulls with correlation zero also allows an attacker to distinguish the cipher from a random permutation with high probability. In this entry we will explore the work by Andrey Bogdanov and Vincent Rijmen on socalled “ZeroCorrelation Linear Cryptanalysis” and apply the technique to a toy cipher.
This post is based on the orignal paper by Bogdanov and Rijmen. I highly recommend reading the paper for those who find this entry of interest. The original attack is mostly of theoretical interest (due to high data complexity), but has important implications for designing ciphers secure against linear cryptanalysis, e.i. Challenging the notation that Widetrail and counting active SBoxes is sufficient for claiming resistance against linear cryptanalsysis.
Beyond Heys : An Introduction to KeyDependence
I assume the reader has some familiarity with linear cryptanalysis (e.g. has read Heys and is familiar with the concept of linear trails). This section is meant as a brief overview to provide motivation for the attack and the discussion below is therefore highly simplified.
A blockcipher is a function \(E : \mathbb{F}_2^{k} \times \mathbb{F}_2^{n} \to \mathbb{F}_2^{n} \). A linear approximation is a pair \( (\alpha, \beta) \in \mathbb{F}_2^{n} \times \mathbb{F}_2^{n} \). The dotproduct in \( \mathbb{F}_2^{n} \) allows us to define the “parity” functions \( f(x) = \alpha^{t} x \) and \( g(x) = \beta^{t} x \) of \( \alpha \) and \( \beta \) respectively. If \( \alpha \neq 0 \) and \( \beta \neq 0 \) then we say that the approximation is nontrivial. We often call \( \alpha \) the input mask and \( \beta \) the output mask.
Essentially linear cryptanalysis considers the correlation \( C(f(x), g(E_K(x)) ) \) over the plaintext space \( x \in \mathbb{F}_2^{n} \), between parity functions \(f\) and \(g\) for approximations \( ( \alpha, \beta ) \) for the cipher \( E_K \). The hope is that the correlation for the “target cipher” and the correlation for a “random permutation”, will deviate significantly.
In a classical attack, we then let the “target cipher” be R rounds of the cipher under attack and assume that R+2 rounds of the cipher acts as a random permutation (atleast with regards to the approximation we have chosen). We then attack R+1 rounds of the cipher, by guessing (parts of) the last round key and (partially) decrypting the last round. If the guess was correct we obtain plaintext/ciphertext pairs from the “target cipher” (R rounds), if we guessed wrong we essentially encrypted one more round and obtained plaintext/ciphertext pairs from the random permutation. If we can differentiate between the correlation for some approximation over the random permutation and “target cipher” we can distinguish the two cases, from which we obtain a list of candidates for (parts of) the last roundkey. This allows us to move from distinguishing to keyrecovery.
Of course a blockcipher does not define a single permutation, but a family of permutations indexed by the key \( K \in \mathbb{F}^{k} \). So it makes little sense to talk about a single correlation for blockcipher, one assumption which is often implicit in early work is the socalled keyequivalence hypothesis, which states that the correlation for a particular approximation remains constant over the keyspace (or does not change much). But this needs to be justified and for the most part it is not the case. In fact for randomly sampled permutations over a sufficiently large domain we expect the correlation of any nontrivial approximation to follow a normal distribution (see Probability of distibutions of Correlation and Differentials in Block Ciphers).
In other words we obtain a distribution of the correlation over the keyspace and so we are not distinguishing between correlations directly, but distributions of correlations.
It turns out that even evaluating the exact correlation for a single instantiation of the cipher under a given key is infeasible for all but the smallest blocksizes, since it would require us to enumerate all possible inputs to the cipher. However we can calculate the correlation between parities over the cipher as the sum of correlation (contributions) for individual trails \( T \): \[ C(\alpha^{t}, \beta^{t}) = \sum_{T \ = \ \alpha  \ldots  \beta} C(T) \] We call the set of trails starting in \( \alpha \) and ending in \( \beta \), the “linear hull” between \( \alpha \) and \( \beta \). In modern linear cryptanalysis we often estimate the correlation for a single key by attempting to discover the most significant terms in the summation above (“dominant trails”); each of which can be evaluated efficiently against all practical ciphers for a particular key. We obtain an estimate for the distribution by sampling keys at random and computing the estimated correlation.
Example: Below I have generated an estimate for the correlation distribution over the keyspace of PRESENT for the approximation (0x8000000000000000, 0x8000000000000000) over 22 rounds, based on 1000000 keys and the hull of all trails with Hamming weight \( \leq 4 \):
Hence, if we sampled a correlation of \(2^{30.50}\) we midge judge that we are sampling from the PRESENT distribution. Whereas a correlation of \( \approx 0 \) is likely sampled from the random permutation distribution. Observe that for many keys, discerning which distribution the correlation is sampled becomes hard and our distinguisher becomes ineffective. Therefore keyrecovery may fail to find the correct key or return a large number of false positives.
ZeroCorrelation Hulls
With this introduction in mind, the idea of zerocorrelation linear attacks is simple:
Suppose we can find a nontrivial approximation \( ( \alpha, \beta ) \) where all trails in the hull have a correlation contribution of 0 for any key. Then the summation above is 0 and the “target cipher” / “right key” distribution is constant zero. Linear cryptanalysis without keydependence!
In this case the distribution (like seen above) for the target cipher is a point distribution at 0, yet we still expect the random permutations to follow a normal distribution. Hence, unlike before, if we sample exactly zero we judge that we are interacting with the target cipher. If we sample any other correlation we judge that the samples originate from the random permutation.
ZeroCorrelation Hulls for Feistel Networks
The paper above (and subsequent work by Bogdanov et al.), details multiple zerocorrelation hulls for concrete ciphers (AES), as well as general constructions (e.g. Feistel). The Feistel attacks are particularly sweet, since it only depends on the Ffunction being invertible. In other words, the Ffunction could be sampled uniformly at random from the set of all permutations for each round key (of any length); and the cipher would still be vulnerable.
In the remainder of this post, we always omit the final swap from the Feistel network. We shall consider a class of hulls with correlation zero described for a 5round balanced Feistel. Let \( a \in \mathbb{F}^{n/2}, a \neq 0 \) and consider the approximation: \[ (\alpha = 0  a, \beta = 0  a) \] Consider a linear trail \( (u_0, u_1, u_2, u_3, u_4, u_5) \) with \(u_0 = \alpha\), \(u_5 = \beta\), as illustrated below:
Let us assume that the trail has nonzero correlation. The approximation over the first Ffunction (orange), must be \( (0, 0) \), since the input mask is \( 0 \) and F is a permutation, hence \( u_1 = (a, 0) \). The approximation over the second Ffunction (yellow), must be \( (b, a) \) for some nonzero b, since the output mask is \( a \neq 0 \). The arguments for \( u_3, u_4 \) are completely analogous to the two above. We observe that the approximation over the middle (red) Ffunction becomes \( (0, b)\)
Hence we have an approximation with a zero mask and a nonzero mask over the middle Ffunction (red). The correlation is therefore that between a constant (0) and a balanced function over a permutation, which implies that the correlation is zero.
We conclude that every trail in the hull has correlation contribution 0. Therefore, the approximation must have correlation zero, \( C(\alpha^{t}, \beta^{t}) = 0 \). We use such approximations to construct a distinguisher between a 5round Feistel network and a “random” permutation; for which we do not generally expect the approximation to have correlation zero.
With this distinguisher we can then mount an attack on 6 or more rounds.
Evaluating The Correlation Without Full Codebook
We are required to evaluate the correlation accurately to check if it is nonzero. With the naive method, this would require the full codebook, which obviously makes the attack rather uninteresting: why would we need the key if we already have a lookup table for all inputs?
However as noted by Rijmen and Bogdanov we can evaluate the exact correlation using \( 2^{n1} \) chosen plaintexts, when the approximation is over a permutation: Let \( \alpha, \beta \) be a nontrivial approximation, let \( P \) be a permutation.
Consider the following 4 disjunct sets of input/outputs \( (x, y) \) to \( P \):
\[ T_{00} = \{ (x, y)  \alpha^{t} x = 0 \text{ and } \beta^{t} y = 0 \} \]
\[ T_{01} = \{ (x, y)  \alpha^{t} x = 0 \text{ and } \beta^{t} y = 1 \} \]
\[ T_{10} = \{ (x, y)  \alpha^{t} x = 1 \text{ and } \beta^{t} y = 0 \} \]
\[ T_{11} = \{ (x, y)  \alpha^{t} x = 1 \text{ and } \beta^{t} y = 1 \} \]
Since any nontrivial parity \( \alpha^{t} \) takes values \( 1 \) and \( 0 \) equally often, we conclude:
\[ T_{00} + T_{01} = T_{10} + T_{11} = 2^{n  1} \]
Furthermore since \( dom(P) = codom(P) \), the same applies to \( \beta^{t} \) on the outputs:
\[ T_{01} + T_{11} = T_{00} + T_{10} = 2^{n  1} \]
From which we can derive:
\[ T_{00} = T_{11} \]
\[ C(\alpha^t x, \beta^t y) = \frac{ T_{00} + T_{11} }{ 2^{n  1} }  1 = \frac{ 2 \cdot T_{00} }{ 2^{n  1} }  1 = \frac{ 2 \cdot T_{11} }{ 2^{n  1} }  1 \]
Therefore obtaining the size of \( T_{11} \) is sufficient for determining the exact correlation. Observe that this makes the attack a chosen plaintext attack, since we must choose all the inputs \( x \) such that \( \alpha^{t} x = 1 \) and obtain the corresponding outputs.
The ZeroCipher
Our target will be a 6round balanced 32bit Feistelcipher designed for the occasion, dubbed the “ZeroCipher”. The Ffunction of the ZeroCipher consists of key addition (XOR), the parallel application of two instances of the AESSbox, followed by a bit permutation (rotation) to obtain some level of diffusion. A single round is illustrated below:
The final swap is omitted (so that encryption / decryption is symmetric), and all round keys are independent. Here is Ccode for encryption/decryption procedure:


The Attack

Let \( a = \verb!0xffff! \). So our approximation becomes \( \alpha = \beta = (0^{16} \  \ a) \).

Request the encryption \(c\) of all \( 2^{31} \) plaintexts \(p\) where \( \langle \alpha, p \rangle = 1 \).

Attempt decryption of the last round for every possible roundkey \( k \) and calculate the correlation between \( \langle \alpha, p \rangle \) and \( \langle \beta, c' \rangle \) for the partially decrypted ciphertexts \( c' \).

If the round key was correct, we obtain the ciphertexts for a 5round version of the cipher, where the hull has correlation zero. If we calculate the correlation to be zero (e.i. \( \langle \beta, c' \rangle \) is balanced over \( c' \)), we output the roundkey \( k \) as a possible candidate.
Note that we do not need to store the plaintexts after obtaining the ciphertexts.
Overall the attack is very simple, since there is no complex trailsearch algorithm:


The attack requires 8GB of RAM and took \( \approx \) 12 hours on 12 nodes with 8 cores. \
It recovers 3 key candidates (from 65536 possible round keys):


Where the last roundkey is the correct one found in oracle.c
:


Summary
We saw that the existence of linear hulls with correlation zero can be used to create distinguishers in a manner very similar to impossible differentials. We explored one such class of hulls for Feistel networks and applied the attack to a 32bit toy cipher.
This post only covers the absolute basics of zerocorrelation linear attacks, in particular later work details how data complexity can be reduced by using multiple zerocorrelation approximations and employing a more complex distinguisher based on the expected mean of the sum of sampled correlations for multiple zerocorrelation hulls.