Punctured Low-Bias Codes Behave Like Random Linear Codes
- Venkatesan Guruswami, Department of EECS, University of California, Berkeley, and Simons Institute for the Theory of Computing
- Jonathan Mosheiff, Computer Science, Ben-Gurion University of the Negev
Editorial introduction
Punctured low-bias codes behave like random linear codes, Discrete Analysis 2024:4, 37 pp.
A binary error-correcting code, often referred to simply as a code, is a subset $A$ of $\{0,1\}^n$ with the property that any two distinct elements of $A$ are far apart in Hamming distance, which typically means that there is a constant $c>0$ such that any two distinct members $x$ and $y$ of $A$ differ in at least $cn$ coordinates. Given such a set, we can transmit messages via a noisy channel as follows. First, one encodes a message as an element of $A$ in some agreed way. Next, one sends the message to its recipient. However, the noisy channel may corrupt this message, changing a few of its 0s to 1s and 1s to 0s. The recipient then takes the 01-string and replaces it by the nearest element of $A$. Provided the number of changed bits is less than $cn/2$, this is guaranteed to be the initial message, so the error has been “corrected”.
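To make the decoding step concrete, here is a minimal sketch of nearest-codeword decoding in Python, using a toy repetition code chosen purely for illustration (brute-force search like this is hopeless at realistic block lengths, but it shows the guarantee at work):

```python
def hamming(x, y):
    """Number of coordinates in which the bit-tuples x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def decode(received, code):
    """Replace a received word by the nearest codeword."""
    return min(code, key=lambda c: hamming(received, c))

# Toy code of length 6 with minimum distance 6, so any pattern of
# fewer than 3 flipped bits is corrected.
code = [(0,) * 6, (1,) * 6]
sent = (1,) * 6
corrupted = (1, 0, 1, 1, 0, 1)  # the channel flipped two bits
assert decode(corrupted, code) == sent
```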
That simple account raises all sorts of questions. If we want to encode a bit string of length $m$ as an element of $A$, then we need $A$ to have size at least $2^m$. How large does $n$ need to be for $\{0,1\}^n$ to contain a $cn$-separated subset of size $2^m$? If we can find such a subset, how can we encode a bit string in a computationally efficient way? And once the corrupted message is received, how can we find the nearest member of $A$ in a computationally efficient way?
The first of these questions has a surprisingly easy solution, discovered by Gilbert and Varshamov: even if $c$ is close to $1/2$, we can get away with $n$ depending linearly on $m$. This can be achieved in various ways. One is to pick the points of $A$ greedily until you cannot pick any more, as in the sketch below. This was Gilbert’s approach. Another is to use an approach that is now very standard in probabilistic combinatorics but was much less standard at the time, which is to choose points of $A$ independently at random with a probability that ensures that the expected number of pairs of points at distance less than $cn$ is less than half the expected size of $A$, after which one can remove a point from each bad pair. These constructions yield a linear dependence of $n$ on $m$, though the precise dependence of the constant of linearity on $c$ is not known.
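Gilbert’s greedy argument is easy to simulate at small block lengths. The following sketch (illustrative parameters only; the scan is exponential in $n$) keeps each point of $\{0,1\}^n$ that is at distance at least $d$ from everything kept so far:

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def gilbert_greedy(n, d):
    """Greedily build a code of length n and minimum distance d."""
    code = []
    for candidate in product((0, 1), repeat=n):
        if all(hamming(candidate, c) >= d for c in code):
            code.append(candidate)
    return code

code = gilbert_greedy(8, 3)
# The resulting code is maximal: no further point can be added, which
# forces its size up to the Gilbert bound 2^n / V(n, d-1), where V is
# the volume of a Hamming ball.
assert all(hamming(x, y) >= 3 for i, x in enumerate(code) for y in code[i + 1:])
```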
However, there is no obvious way of encoding a bit string of length $m$ as a member of a random subset $A$ of $\{0,1\}^n$ of size $2^m$, and there is also no obvious way of decoding messages – that is, finding the nearest member of $A$ to a given element of $\{0,1\}^n$. Therefore, random constructions are not much use in practice for transmitting messages across noisy channels.
A very good way of dealing with the encoding problem is to use linear codes – that is, we identify $\{0,1\}^n$ with $\mathbb{F}_2^n$ and take $A$ to be an $m$-dimensional $\mathbb{F}_2$-linear subspace. Of course, not every linear subspace will have the desired separation property, but many examples are known that do, including random linear subspaces, which was Varshamov’s approach to the Gilbert-Varshamov bound. We can think of an $m$-dimensional subspace as the image of an injective linear map $\phi:\mathbb{F}_2^m\to\mathbb{F}_2^n$, and then there is an obvious encoding procedure: one simply applies the linear map $\phi$. It is still not obvious how to decode, but there are famous examples of linear codes for which efficient decoding procedures are also known.
These examples, however, are of very specific linear codes: unfortunately, for random linear codes it does not seem to be possible to find efficient decoding algorithms. The main theme of this paper is to show that random linear codes can be significantly derandomized, in the following sense. We can identify a random $m$-dimensional subspace of $\mathbb{F}_2^n$ with a random sequence of $m$ vectors in $\mathbb{F}_2^n$: given such a sequence, we take the subspace it generates, which will be $m$-dimensional with high probability, and any two $m$-dimensional subspaces will be chosen with equal probability if we choose them this way.
A sequence of $m$ vectors in $\mathbb{F}_2^n$ is just an $n\times m$ matrix, so for this method of choosing a random linear code we need $nm$ random bits.
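In code, the $nm$ random bits are just the entries of a random generator matrix, and encoding is one matrix-vector multiplication over $\mathbb{F}_2$. A minimal sketch, with hypothetical parameters:

```python
import random

def random_generator_matrix(n, m, rng):
    """n x m matrix over F_2: its nm entries are the random bits."""
    return [[rng.randrange(2) for _ in range(m)] for _ in range(n)]

def encode(G, message):
    """Apply the linear map phi: multiply G by the message mod 2."""
    return tuple(sum(g * x for g, x in zip(row, message)) % 2 for row in G)

rng = random.Random(0)
G = random_generator_matrix(n=16, m=5, rng=rng)
codeword = encode(G, (1, 0, 1, 1, 0))
assert len(codeword) == 16
```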
Here is another way of thinking about random linear subspaces. Begin by defining the Hadamard code, which is a subspace of $\mathbb{F}_2^{2^r}$ consisting of all sequences $(\langle u,v\rangle)_{u\in\mathbb{F}_2^r}$ with $v\in\mathbb{F}_2^r$. Here the angle brackets denote the mod-2 inner product, and we assume that the elements of $\mathbb{F}_2^r$ have been enumerated somehow. Note that what we have just defined is just the set of rows of a Walsh matrix with 0s and 1s instead of 1s and $-1$s.
This subspace of $\mathbb{F}_2^{2^r}$ has an excellent property, namely that any two of its elements differ in exactly $2^{r-1}$ out of the $2^r$ coordinates, which is the best distance one can hope for. However, it has size $2^r$, which is only logarithmic in $2^{2^r}$. (In coding theory parlance, the rate of the code is very small.) A natural way to try to improve the rate is to choose $m$ coordinates at random and restrict the vectors in the subspace to those $m$ coordinates, hoping that the good properties of the Hadamard code will be inherited by random projections of this kind. Since the coordinates are indexed by the elements of $\mathbb{F}_2^r$, this is the same as choosing $m$ random vectors $u_1,\dots,u_m$ and taking the subspace to consist of all sequences $(\langle u_1,v\rangle,\dots,\langle u_m,v\rangle)$ with $v\in\mathbb{F}_2^r$. But this is the same as taking the image of $\mathbb{F}_2^r$ under the linear map defined by the random matrix with rows $u_1,\dots,u_m$, so it is nothing other than a random linear subspace of $\mathbb{F}_2^m$ of dimension $r$.
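This identification can be checked directly at small parameters. The sketch below (parameters chosen purely for illustration) punctures the Hadamard code at $m$ random coordinates and confirms that the resulting code coincides with the image of $\mathbb{F}_2^r$ under the matrix whose rows are the chosen coordinates $u_1,\dots,u_m$:

```python
import random
from itertools import product

r, m = 3, 5
rng = random.Random(1)
domain = list(product((0, 1), repeat=r))  # an enumeration of F_2^r

def ip(u, v):
    """Mod-2 inner product on F_2^r."""
    return sum(a * b for a, b in zip(u, v)) % 2

def hadamard(v):
    """Hadamard codeword of v: one coordinate for each u in F_2^r."""
    return tuple(ip(u, v) for u in domain)

# Random puncturing: keep the m coordinates indexed by u_1, ..., u_m
# (chosen independently, one natural way to model a random puncturing).
selected = [rng.choice(domain) for _ in range(m)]
punctured = {tuple(hadamard(v)[domain.index(u)] for u in selected)
             for v in domain}

# Image of F_2^r under the m x r matrix with rows u_1, ..., u_m.
linear_image = {tuple(ip(u, v) for u in selected) for v in domain}
assert punctured == linear_image
```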
The restriction of a code to some subset of the coordinates is called a puncturing of the code, so the above observation says that a random puncturing of the Hadamard code is the same as a random linear code. Since random linear codes have very good properties, and the two statements are equivalent, random puncturings of the Hadamard code have very good properties too.
However, in order to get a good rate this way, we have to take a tiny puncturing of a huge code, so the amount of randomness we need, that is, the number of random bits we need in order to define the puncturing, is large. This paper is concerned with derandomizing the above process. Broadly speaking, the idea is that instead of randomly puncturing the Hadamard code, we can randomly puncture a much smaller and denser code, provided that it satisfies a condition known as small bias. A binary code $C\subset\mathbb{F}_2^n$ is said to have bias at most $\epsilon$ if every nonzero sequence in $C$ has between $(1-\epsilon)n/2$ and $(1+\epsilon)n/2$ 1s in it. (The definitions here can all be generalized straightforwardly from $\mathbb{F}_2$ to more general $\mathbb{F}_q$, and indeed that is the level of generality at which the paper operates.) The main result of the paper is that random puncturings of low-bias linear codes have many of the same good properties that purely random linear codes have. This therefore allows us to produce random linear codes with good properties, but using far fewer random bits, so although they are still random, they are much “less random” than purely random linear codes.
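As a quick sanity check on the definition, the bias of a small linear code can be computed by inspecting the weight of every nonzero codeword. A sketch (the Hadamard code is used only because it is easy to generate; it attains the extreme value, bias 0):

```python
from itertools import product

def ip(u, v):
    return sum(a * b for a, b in zip(u, v)) % 2

def bias(code, n):
    """Smallest eps such that every nonzero codeword has between
    (1 - eps) * n / 2 and (1 + eps) * n / 2 ones."""
    return max(abs(2 * sum(c) / n - 1) for c in code if any(c))

r = 4
n = 2 ** r
domain = list(product((0, 1), repeat=r))
hadamard_code = [tuple(ip(u, v) for u in domain) for v in domain]
print(bias(hadamard_code, n))  # 0.0: every nonzero codeword has n/2 ones
```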
Why should one be interested in doing this? There are two main reasons. The first is the general consideration that randomness, like time and memory, is a computational resource that one typically likes to minimize the use of wherever possible. The second, and perhaps more important, is that it raises the possibility of choosing a “mother code” about which one has a lot of information and exploiting that information in the decoding process (which, as was mentioned above, is difficult for purely random linear codes).
Finally, what are the “good properties” enjoyed by these random puncturings of low-bias codes? The answer is quite general: any property that is monotone decreasing (meaning that any subset of a code with the property also has the property) and local (meaning that if a code does not have the property, then that fact is witnessed by a bounded number of elements of the code) will hold for a random puncturing of a low-bias code if it holds for a purely random subspace. A simple example of such a property is “any two elements have distance at least $cn$”, since removing elements from the code only helps, and to disprove the property it is enough to exhibit two elements that have distance less than $cn$. A generalization of this property known as list decodability, on which this paper focuses, states, for some pair of parameters $c$ and $L$, that the number of elements of the code within $cn$ of any element of $\{0,1\}^n$ is at most $L$. List decodability has been much studied, since for many purposes a code is useful in practice if it is list decodable, especially if for most elements of the code, the number of elements within $cn$ of it is much smaller than $L$. The authors also show that to obtain optimal list decodability, it is enough if the distances between the elements of the mother code are large, which is a slightly weaker condition than low bias.
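For tiny parameters, list decodability can be verified by brute force: check that no Hamming ball of radius $cn$ contains more than $L$ codewords. A sketch with an illustrative toy code:

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def max_list_size(code, radius, n):
    """Largest number of codewords in any Hamming ball of the given
    radius, over all centres in {0,1}^n (exponential-time check)."""
    return max(sum(hamming(center, c) <= radius for c in code)
               for center in product((0, 1), repeat=n))

# Repetition code of length 6: at radius 2 every ball holds at most one
# codeword; at radius 3 a centre of weight 3 is within distance 3 of
# both codewords, so the list size is 2.
code = [(0,) * 6, (1,) * 6]
assert max_list_size(code, 2, 6) == 1
assert max_list_size(code, 3, 6) == 2
```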