Punctured Low-Bias Codes Behave Like Random Linear Codes

- Venkatesan Guruswami, Department of EECS, University of California, Berkeley, and Simons Institute for the Theory of Computing
- Jonathan Mosheiff, Computer Science, Ben-Gurion University of the Negev

*Discrete Analysis*, June 2024. https://doi.org/10.19086/da.117574.

### Editorial introduction

Punctured low-bias codes behave like random linear codes, Discrete Analysis 2024:4, 37 pp.

A binary *error-correcting code*, often referred to simply as a code, is a subset $A$ of $\{0,1\}^n$ with the property that any two distinct elements of $A$ are far apart in Hamming distance; typically this means that there is a constant $c>0$ such that if $x$ and $y$ are distinct members of $A$, then they differ in at least $cn$ coordinates. Given such a set, we can transmit messages via a *noisy channel* as follows. First, one encodes a message as an element of $A$ in some agreed way. Next, one sends the message to its recipient. However, the noisy channel may corrupt this message, changing a few of its 0s to 1s and 1s to 0s. The recipient then takes the 01-string and replaces it by the nearest element of $A$. Provided the number of changed bits is less than $cn/2$, this is guaranteed to be the initial message, so the error has been “corrected”.
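The decoding rule just described can be sketched in a few lines of Python (a toy illustration, not from the paper; the helper names are ours):

```python
def hamming(x, y):
    # Number of coordinates in which the bit strings x and y differ.
    return sum(a != b for a, b in zip(x, y))

def decode(received, code):
    # Nearest-codeword decoding: return the element of `code` closest
    # to `received` in Hamming distance (brute force over the code).
    return min(code, key=lambda c: hamming(received, c))

# Toy example: the repetition code {000000, 111111} has distance 6,
# so up to two flipped bits are corrected.
code = ["000000", "111111"]
received = "010001"          # "000000" with two bits flipped
print(decode(received, code))   # -> 000000
```

Brute-force search over the code is of course exponential in general, which is exactly the efficiency problem the paragraphs below discuss.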

That simple account raises all sorts of questions. If we want to encode a bit string of length $m$ as an element of $A$, then we need $A$ to have size at least $2^m$. How large does $n$ need to be for $\{0,1\}^n$ to contain a subset of size $2^m$ in which any two points are at distance at least $cn$? If we can find such a subset, how can we encode a bit string in a computationally efficient way? And once the corrupted message is received, how can we find the nearest member of $A$ in a computationally efficient way?

The first of these questions has a surprisingly easy solution, discovered by Gilbert and Varshamov: even when $c$ is close to $1/2$, we can get away with $n$ depending linearly on $m$. This can be achieved in various ways. One is to pick the points of $A$ greedily until no further point can be added; this was Gilbert’s approach. Another, now very standard in probabilistic combinatorics but much less so at the time, is to choose points of $A$ independently at random with a probability that ensures that the expected number of pairs of points at distance less than $cn$ is less than half the expected size of $A$, after which one can remove a point from each bad pair. These constructions yield a linear dependence of $n$ on $m$, though the precise dependence of the constant of linearity on $c$ is not known.
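Gilbert's greedy argument is easy to simulate for tiny parameters (a sketch, not from the paper; the function name is ours):

```python
from itertools import product

def greedy_code(n, d):
    # Gilbert's greedy construction: scan {0,1}^n in a fixed order and
    # keep a point whenever it is at distance >= d from everything kept so far.
    code = []
    for x in product((0, 1), repeat=n):
        if all(sum(a != b for a, b in zip(x, y)) >= d for y in code):
            code.append(x)
    return code

# For n = 5 and d = 3 the scan keeps 00000, 00111, 11001 and 11110.
print(len(greedy_code(5, 3)))   # -> 4
```

The greedy code is maximal by construction: no further point of $\{0,1\}^n$ can be added without violating the distance requirement.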

However, there is no obvious way of encoding a bit string of length $m$ as a member of a random subset $A$ of $\{0,1\}^n$ of size $2^m$, and there is also no obvious way of decoding messages – that is, finding the nearest member of $A$ to a given element of $\{0,1\}^n$. Therefore, random constructions are not much use in practice for transmitting messages across noisy channels.

A very good way of dealing with the encoding problem is to use *linear codes* – that is, we identify $\{0,1\}^n$ with $\mathbb{F}_2^n$ and take $A$ to be an $m$-dimensional $\mathbb{F}_2$-linear subspace. Of course, not every linear subspace will have the desired separation property, but many examples are known that do, including random linear subspaces, which was Varshamov’s approach to the Gilbert-Varshamov bound. We can think of an $m$-dimensional subspace as the image of an injective linear map $\phi:\mathbb{F}_2^m\to\mathbb{F}_2^n$, and then there is an obvious encoding procedure: one simply applies the linear map $\phi$. It is still not obvious how to decode, but there are famous examples of linear codes for which efficient decoding procedures are also known.
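Applying the linear map $\phi$ amounts to multiplying the message by a generator matrix over $\mathbb{F}_2$. A minimal sketch (the matrix shown is one standard systematic generator of the $[7,4]$ Hamming code, used purely as an illustration):

```python
def encode(u, G):
    # The codeword uG over F_2: message u has length m, G is m x n.
    m, n = len(G), len(G[0])
    return [sum(u[i] * G[i][j] for i in range(m)) % 2 for j in range(n)]

# A systematic generator matrix of the [7,4] Hamming code:
# the first 4 coordinates copy the message, the last 3 are parity checks.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
print(encode([1, 0, 1, 1], G))   # -> [1, 0, 1, 1, 0, 1, 0]
```

Encoding is thus a single matrix-vector product, which is the computational advantage of linear codes over arbitrary random subsets.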

These examples, however, are of very specific linear codes: unfortunately, for random linear codes it does not seem to be possible to find efficient decoding algorithms. The main theme of this paper is to show that random linear codes can be significantly derandomized, in the following sense. We can identify a random $m$-dimensional subspace of $\mathbb{F}_2^n$ with a random sequence of $m$ vectors in $\mathbb{F}_2^n$: given such a sequence, we take the subspace it generates, which will be $m$-dimensional with high probability, and any two $m$-dimensional subspaces will be chosen with equal probability if we choose them this way.

A sequence of $m$ vectors in $\mathbb{F}_2^n$ is just an $n\times m$ matrix, so this method of choosing a random linear code requires $nm$ random bits.

Here is another way of thinking about random linear subspaces. Begin by defining the *Hadamard code*, which is the subspace of $\mathbb{F}_2^{2^r}$ consisting of all sequences $(\langle u,v\rangle)_{u\in\mathbb{F}_2^r}$ with $v\in\mathbb{F}_2^r$. Here the angle brackets denote the mod-2 inner product, and we assume that the elements of $\mathbb{F}_2^r$ have been enumerated somehow. Note that what we have just defined is just the set of rows of a Walsh matrix with 0s and 1s instead of 1s and $-1$s.

This subspace of $\mathbb{F}_2^{2^r}$ has an excellent property, namely that any two of its elements differ in exactly $2^{r-1}$ out of the $2^r$ coordinates, which is the best distance one can hope for. However, it has size $2^r$, which is only logarithmic in $2^{2^r}$. (In coding theory parlance, the *rate* of the code is very small.) A natural way to try to improve the rate is to choose $m$ coordinates at random and restrict the vectors in the subspace to those $m$ coordinates, hoping that the good properties of the Hadamard code will be inherited by random projections of this kind. This is the same as choosing $m$ random vectors $v_1,\dots,v_m$ and taking the subspace to consist of all sequences $(\langle u,v_1\rangle,\dots,\langle u,v_m\rangle)$ with $u\in\mathbb{F}_2^r$. But this is the same as taking the image of $\mathbb{F}_2^r$ under the linear map defined by the random matrix with rows $v_1,\dots,v_m$, so it is nothing other than a random linear subspace of $\mathbb{F}_2^m$ of dimension $r$.
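The construction in the last two paragraphs is easy to carry out for a tiny $r$ (a sketch following the definitions above; the function names are ours):

```python
import random
from itertools import product

def hadamard_code(r):
    # Codewords (<u, v>)_{u in F_2^r}, one for each v in F_2^r.
    pts = list(product((0, 1), repeat=r))
    return [[sum(a * b for a, b in zip(u, v)) % 2 for u in pts] for v in pts]

r, m = 3, 5
H = hadamard_code(r)

# Any two distinct codewords differ in exactly 2^(r-1) coordinates.
# Restricting every codeword to m random coordinates ("puncturing")
# yields a random linear code of length m and dimension at most r.
coords = random.sample(range(2 ** r), m)
punctured = [[c[j] for j in coords] for c in H]
print(len(H), len(punctured[0]))   # -> 8 5
```

For $r=3$ the Hadamard code has $2^3=8$ codewords of length $2^3=8$, and each pair of distinct codewords is at distance exactly $2^{r-1}=4$.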

The restriction of a code to some subset of the coordinates is called a *puncturing* of the code, so the above observation says that a random puncturing of the Hadamard code is the same thing as a random linear code. Since random linear codes have very good properties, the equivalence tells us that random puncturings of Hadamard codes have very good properties too.

However, in order to get a good rate this way, we have to take a tiny puncturing of a huge code, so the amount of randomness we need, that is, the number of random bits we need in order to define the puncturing, is large. This paper is concerned with *derandomizing* the above process. Broadly speaking, the idea is that instead of randomly puncturing the Hadamard code, we can randomly puncture a much smaller and denser code, provided that it satisfies a condition known as small bias. A binary linear code $C\subset\mathbb{F}_2^n$ is said to have *bias* at most $\epsilon$ if every nonzero codeword of $C$ has between $(1-\epsilon)n/2$ and $(1+\epsilon)n/2$ 1s in it. (The definitions here can all be generalized straightforwardly from $\mathbb{F}_2$ to more general $\mathbb{F}_q$, and indeed that is the level of generality at which the paper operates.) The main result of the paper is that random puncturings of low-bias linear codes have many of the same good properties that purely random linear codes have. This therefore allows us to produce random linear codes with good properties using far fewer random bits: although they are still random, they are much “less random” than purely random linear codes.
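The bias condition is simple to state in code (a sketch of the definition above, with names of our choosing; the Hadamard code, whose nonzero codewords all have weight exactly $n/2$, realises the extreme case $\epsilon=0$):

```python
from itertools import product

def bias(code, n):
    # Smallest eps such that every nonzero codeword has between
    # (1 - eps) n / 2 and (1 + eps) n / 2 ones.
    return max(
        (abs(2 * sum(c) / n - 1) for c in code if any(c)),
        default=0.0,
    )

# Nonzero Hadamard codewords have weight exactly 2^(r-1), so the bias is 0.
r = 3
pts = list(product((0, 1), repeat=r))
hadamard = [[sum(a * b for a, b in zip(u, v)) % 2 for u in pts] for v in pts]
print(bias(hadamard, 2 ** r))   # -> 0.0
```

A codeword whose weight deviates from $n/2$ pushes the bias up: for instance a single word of weight $3n/4$ forces $\epsilon \geq 1/2$.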

Why should one be interested in doing this? There are two main reasons. The first is the general consideration that randomness, like time and memory, is a computational resource that one typically likes to minimize the use of wherever possible. The second, and perhaps more important, is that it raises the possibility of choosing a “mother code” about which one has a lot of information and exploiting that information in the decoding process (which, as was mentioned above, is difficult for purely random linear codes).

Finally, what are the “good properties” enjoyed by these random puncturings of low-bias codes? The answer is quite general: any property that is *monotone decreasing* (meaning that if a code has the property, then so does every subset of it) and *local* (meaning that if a code does not have the property, then that fact is witnessed by a bounded number of elements of the code) will hold for a random puncturing of a low-bias code if it holds for a purely random subspace. A simple example of such a property is “any two elements have distance at least $cn$”, since removing elements from the code only helps, and to disprove the property it is enough to exhibit two elements at distance less than $cn$. A generalization of this property known as *list decodability*, on which this paper focuses, states, for a pair of parameters $c,L$, that the number of elements of the code within distance $cn$ of any point of $\{0,1\}^n$ is at most $L$. List decodability has been much studied, since for many purposes a code is useful in practice if it is list decodable, especially if for most points of $\{0,1\}^n$, the number of codewords within distance $cn$ is much smaller than $L$. The authors also show that to obtain optimal list decodability, it is enough for the distances between the elements of the mother code to be large, which is a slightly weaker condition than low bias.
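For tiny parameters, list decodability can be checked by brute force (an illustrative sketch of the definition, not the paper's method; names are ours):

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def is_list_decodable(code, n, radius, L):
    # (radius, L) list decodability: every Hamming ball of the given
    # radius contains at most L codewords (brute force over all centres).
    return all(
        sum(1 for c in code if hamming(y, c) <= radius) <= L
        for y in product((0, 1), repeat=n)
    )

# The repetition code {00000, 11111} has distance 5, so radius-2 balls
# contain at most one codeword, but radius-3 balls can contain both.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
print(is_list_decodable(code, 5, 2, 1))   # -> True
print(is_list_decodable(code, 5, 3, 1))   # -> False
```

The unique-decoding property of the first paragraph is the special case $L=1$ with radius just under half the minimum distance.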