An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices
- Computer Science, Columbia University
- More about Jaroslaw Blasiok
- Department of Mathematics, University of Colorado, Boulder
- More about Kyle Luh
- Division of Applied Mathematics, Brown University
- More about Patrick Lopatto
- Computer Science, Northwestern University (IL)
- More about Shravas Rao
An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices, Discrete Analysis 2023:3, 9 pp.
There are many situations, often of great practical importance, where one would like to obtain a good approximation for a mathematical object after taking a small number of measurements. For instance, we can store an image by storing the colour and brightness of every pixel, but typical images have properties such as regions of smoothness that enable us to store them far more economically using other bases, wavelets being a popular choice.
One property that can make an object easier to store is if it is sparse, meaning that in a suitable basis it can be approximated by a vector of small support. A smooth function, for example, will have a Fourier transform that decays rapidly. Because of this phenomenon, the following definition, introduced by Emmanuel Candès and Terence Tao in a paper that became landmark in the field of compressed sensing , has become established as a concept of central importance. A real or complex matrix m×n matrix A has the (k,ϵ)-restricted isometry property if for every v∈Rn or Cn of support size at most k, we have the inequality
If k is substantially smaller than n, then it turns out that m can often be taken substantially smaller than n as well. If we think of the rows of A as measurements, then roughly speaking this is telling us that we do not need many measurements to determine all sparse vectors. (For practical purposes there are also algorithmic questions to consider, but in many situations there are efficient algorithms that allow one to reconstruct the sparse vectors from the measurements.)
A result of Candès and Tao of particular relevance to this paper is that if you start with the matrix M of the discrete Fourier transform on the cyclic group of order n, then in order to reconstruct signals of support size k it is sufficient to choose k(logn)6 random rows of M. (Better bounds can be obtained using random matrices, but the importance of this result is that measuring Fourier coefficients is far more practical.) This bound has subsequently been improved several times and currently stands at k(logk)2logn.
It turns out all known proofs of results of the above type apply equally well to any orthogonal (or unitary) matrix M with coefficients bounded above by Cn−1/2 for some constant C. In particular, they apply to the Walsh matrix (suitably normalized), which is the 2t×2t matrix obtained inductively by the rule
where W0=(1), and also the matrix of the discrete Fourier transform over the group Ft2.
It is not hard to see why the logn factor is necessary in the case of the Walsh matrix. To sketch the proof briefly, the rows of the Walsh matrix correspond to points in Ft2. But if A⊂Ft2 is any set of density 1/2, then by averaging there exists x1 such that A1=A∩(A−x1) has density at least 1/4, and then x2 such that A2=A1∩(A1−x2) has density at least 1/16, and so on. We can continue this process for r=log2t steps before the density drops below 2−t, which yields some point x such that x plus the subspace generated by x1,…,xr is a subset of A. Note that this subspace has cardinality 2r=t.
It follows that if we choose half the rows of the Walsh matrix, obtaining a matrix U, say, then the other half contain rows that correspond to an affine subspace of dimension r, and because the Fourier transform of an affine subspace is supported in the linear subspace orthogonal to it, the kernel of U contains a vector of support size at most 2t−r=t−12t. Setting n=2t, we see that this gives us a vector of support size at most n/logn that belongs to the kernel of U, and in particular does not have its norm approximately preserved (even when U is normalized appropriately).
An even simpler argument shows the weaker result that a random choice of rows does not work: one can simply take all the cosets of some fixed subspace of dimension r and look at the expected number of them that belong to A. Applying this argument more generally we find that in order for no vectors of support size k to belong to the kernel, we need to take at least klogn random rows of Wt.
In the light of these results, it is natural to conjecture that the true bound should be klogn. However, this paper shows the surprising fact that a logk factor is needed, at least for the Walsh matrix.
To see why some gain might be possible over the argument just sketched, note that the second version proves something too strong: it shows that for every subspace of dimension r, with high probability it has a translate that lives inside A, but all we need for the conclusion is the existence of one single subspace with that property.
It is not completely obvious how to exploit this weakness in the argument. The authors do so by means of a second-moment argument, but it is quite delicate because the events that A contains a given affine subspace of dimension r are correlated, and the correlations differ according to how the affine subspaces intersect. However, the authors manage to keep track of these correlations and obtain a lower bound of Ω(klogklog(n/k)), not just for the restricted isometry property but for the weaker property of not containing a sparse vector in the kernel. It remains an interesting open problem whether a similar lower bound can be obtained in the deterministic case: that is, whereas we know that the logk factor is required if we make a random choice of rows of the Walsh matrix, it is not known whether there exists a choice of rows that works with a bound that does not include the logk factor.