An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices

Jaroslaw Blasiok; Kyle Luh; Patrick Lopatto; Jake Marcinek; Shravas Rao

doi:10.19086/da.74802

Computational Complexity

May 10, 2023 BST

An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices

https://doi.org/10.19086/da.74802

cs.it (computer science - information theory)

ccby-4.0

Photo by Olga Kovalski on Unsplash

Blasiok, Jaroslaw, Kyle Luh, Patrick Lopatto, Jake Marcinek, and Shravas Rao. 2023. “An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices.” Discrete Analysis, May. https://doi.org/10.19086/da.74802.

Editorial introduction

Read article at ArXiv

An Improved Lower Bound for Sparse Reconstruction from Subsampled Walsh Matrices, Discrete Analysis 2023:3, 9 pp.

There are many situations, often of great practical importance, where one would like to obtain a good approximation for a mathematical object after taking a small number of measurements. For instance, we can store an image by storing the colour and brightness of every pixel, but typical images have properties such as regions of smoothness that enable us to store them far more economically using other bases, wavelets being a popular choice.

One property that can make an object easier to store is if it is sparse, meaning that in a suitable basis it can be approximated by a vector of small support. A smooth function, for example, will have a Fourier transform that decays rapidly. Because of this phenomenon, the following definition, introduced by Emmanuel Candès and Terence Tao in a paper that became landmark in the field of compressed sensing , has become established as a concept of central importance. A real or complex matrix $m\times n$ matrix $A$ has the $(k,\epsilon)$ -restricted isometry property if for every $v\in\mathbb R^n$ or of support size at most we have the inequality
$(1-\epsilon)\|v\|\leq Av\leq(1+\epsilon)\|v\|.$
If $k$ is substantially smaller than $n$ , then it turns out that $m$ can often be taken substantially smaller than $n$ as well. If we think of the rows of $A$ as measurements, then roughly speaking this is telling us that we do not need many measurements to determine all sparse vectors. (For practical purposes there are also algorithmic questions to consider, but in many situations there are efficient algorithms that allow one to reconstruct the sparse vectors from the measurements.)

A result of Candès and Tao of particular relevance to this paper is that if you start with the matrix $M$ of the discrete Fourier transform on the cyclic group of order $n$ , then in order to reconstruct signals of support size $k$ it is sufficient to choose $k(\log n)^6$ random rows of $M$ . (Better bounds can be obtained using random matrices, but the importance of this result is that measuring Fourier coefficients is far more practical.) This bound has subsequently been improved several times and currently stands at $k(\log k)^2\log n$ .

It turns out all known proofs of results of the above type apply equally well to any orthogonal (or unitary) matrix $M$ with coefficients bounded above by $Cn^{-1/2}$ for some constant In particular, they apply to the Walsh matrix (suitably normalized), which is the matrix obtained inductively by the rule
$W_t=\begin{pmatrix}W_{t-1}&W_{t-1}\\ W_{t-1}&-W_{t-1}\\ \end{pmatrix},$
where $W_0=(1)$ , and also the matrix of the discrete Fourier transform over the group $\mathbb F_2^t$ .

It is not hard to see why the $\log n$ factor is necessary in the case of the Walsh matrix. To sketch the proof briefly, the rows of the Walsh matrix correspond to points in $\mathbb F_2^t$ . But if $A\subset\mathbb F_2^t$ is any set of density 1/2, then by averaging there exists $x_1$ such that $A_1=A\cap (A-x_1)$ has density at least 1/4, and then $x_2$ such that $A_2=A_1\cap(A_1-x_2)$ has density at least $1/16$ , and so on. We can continue this process for $r=\log_2t$ steps before the density drops below $2^{-t}$ , which yields some point $x$ such that $x$ plus the subspace generated by $x_1,\dots,x_r$ is a subset of $A$ . Note that this subspace has cardinality $2^r=t$ .

It follows that if we choose half the rows of the Walsh matrix, obtaining a matrix $U$ , say, then the other half contain rows that correspond to an affine subspace of dimension $r$ , and because the Fourier transform of an affine subspace is supported in the linear subspace orthogonal to it, the kernel of $U$ contains a vector of support size at most $2^{t-r}=t^{-1}2^t$ . Setting $n=2^t$ , we see that this gives us a vector of support size at most $n/\log n$ that belongs to the kernel of $U$ , and in particular does not have its norm approximately preserved (even when $U$ is normalized appropriately).

An even simpler argument shows the weaker result that a random choice of rows does not work: one can simply take all the cosets of some fixed subspace of dimension $r$ and look at the expected number of them that belong to $A$ . Applying this argument more generally we find that in order for no vectors of support size $k$ to belong to the kernel, we need to take at least $k\log n$ random rows of $W_t$ .

In the light of these results, it is natural to conjecture that the true bound should be $k\log n$ . However, this paper shows the surprising fact that a $\log k$ factor is needed, at least for the Walsh matrix.

To see why some gain might be possible over the argument just sketched, note that the second version proves something too strong: it shows that for every subspace of dimension $r$ , with high probability it has a translate that lives inside $A$ , but all we need for the conclusion is the existence of one single subspace with that property.

It is not completely obvious how to exploit this weakness in the argument. The authors do so by means of a second-moment argument, but it is quite delicate because the events that $A$ contains a given affine subspace of dimension $r$ are correlated, and the correlations differ according to how the affine subspaces intersect. However, the authors manage to keep track of these correlations and obtain a lower bound of $\Omega(k\log k\log(n/k))$ , not just for the restricted isometry property but for the weaker property of not containing a sparse vector in the kernel. It remains an interesting open problem whether a similar lower bound can be obtained in the deterministic case: that is, whereas we know that the $\log k$ factor is required if we make a random choice of rows of the Walsh matrix, it is not known whether there exists a choice of rows that works with a bound that does not include the $\log k$ factor.

Read article at ArXiv