Good Bounds in Certain Systems of True Complexity One

Let $\Phi = (\phi_1,\dots,\phi_6)$ be a system of $6$ linear forms in $3$ variables, i.e. $\phi_i \colon \mathbb{Z}^3 \to \mathbb{Z}$ for each $i$. Suppose also that $\Phi$ has Cauchy--Schwarz complexity $2$ and true complexity $1$, in the sense defined by Gowers and Wolf; in fact this is true generically in this setting. Finally let $G = \mathbb{F}_p^n$ for any prime $p$ and $n \ge 1$. Then we show that multilinear averages by $\Phi$ are controlled by the $U^2$-norm, with a polynomial dependence; i.e. if $f_1,\dots,f_6 \colon G \to \mathbb{C}$ are functions with $\|f_i\|_{\infty} \le 1$ for each $i$, then for each $j$, $1 \le j \le 6$: \[ \left| \mathbb{E}_{x_1,x_2,x_3 \in G} f_1(\phi_1(x_1,x_2,x_3)) \dots f_6(\phi_6(x_1,x_2,x_3)) \right| \le \|f_j\|_{U^2}^{1/C} \] for some $C>0$ depending on $\Phi$. This recovers and strengthens a result of Gowers and Wolf in these cases. Moreover, the proof uses only multiple applications of the Cauchy--Schwarz inequality, avoiding appeals to the inverse theory of the Gowers norms. We also show that some dependence of $C$ on $\Phi$ is necessary; that is, the constant $C$ can unavoidably become large as the coefficients of $\Phi$ grow.

(Here we have abused notation to let φ_i : Z^d → Z induce a function G^d → G, in the obvious way.) So, in our examples above, we take f_1 = · · · = f_r = 1_X, the indicator function of X; however, it is convenient to allow more general functions in the definition of Λ_Φ, as they arise in intermediate computations.
A fundamental observation in much recent progress on such questions (as applied to Szemerédi-type theorems, or counting solutions to linear equations in the primes), originally due to Gowers [3], is that averages Λ_Φ are controlled by Gowers uniformity norms. A weak statement of this type is that if X has density α and X is suitably quasirandom, in the sense that ‖1_X − α‖_{U^{s+1}} = o(1) where s is some positive integer, then Λ_Φ(1_X, . . . , 1_X) = α^r + o(1), (1) i.e. the number of solutions to the system Φ in X is roughly the same as the expected count in a random set, i.e. α^r |G|^d. A stronger type of statement one could make is that if f_1, . . . , f_r : G → C are any functions with ‖f_i‖_∞ ≤ 1 for all i, and ‖f_j‖_{U^{s+1}} = o(1) for any one j ∈ {1, . . . , r}, then |Λ_Φ(f_1, . . . , f_r)| = o(1); (2) and indeed this kind of statement implies the previous one.
The remaining question is when one has such a statement for a system of linear forms Φ, and if so, how small the positive integer s can be; i.e. how far one has to go in the hierarchy of Gowers norms to control Λ_Φ. For instance, for k-term arithmetic progressions, Gowers [3] showed that a statement of type (2) holds for s = k − 2, and with a good bound; specifically, |Λ_{kAP}(f_1, . . . , f_k)| ≤ ‖f_j‖_{U^{k−1}} whenever ‖f_i‖_∞ ≤ 1 for each i, and for any 1 ≤ j ≤ k. The proof is s + 1 applications of the Cauchy-Schwarz inequality. This value of s cannot be improved in general: for instance, when k = 4 and G = Z/pZ, take the quadratic phases f_1(x) = e(x²/p), f_2(x) = e(−3x²/p), f_3(x) = e(3x²/p), f_4(x) = e(−x²/p), and observe that f_1(x)f_2(x+h)f_3(x+2h)f_4(x+3h) = 1 pointwise for any x, h ∈ G, by the identity x² − 3(x+h)² + 3(x+2h)² − (x+3h)² = 0. So, Λ_{4AP}(f_1, . . . , f_4) = 1, but one can show that ‖f_j‖_{U²} = O(p^{−1/4}) for each j. This rules out a statement of type (2) for s = 1; taking appropriate level sets of these functions rules out (1) also.
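Both the pointwise identity and the smallness of the U²-norm are easy to check numerically for a small prime. The following sketch (ours, not from the paper; the choice p = 31 is arbitrary) uses the standard quadratic-phase choice f_1(x) = e(x²/p), f_2(x) = e(−3x²/p), f_3(x) = e(3x²/p), f_4(x) = e(−x²/p) and evaluates both quantities by direct summation:

```python
import cmath

p = 31

def e(t):
    # e(x) = exp(2*pi*i*x), as in the notation section
    return cmath.exp(2j * cmath.pi * t)

# quadratic phases with f1(x) f2(x+h) f3(x+2h) f4(x+3h) = 1 pointwise,
# via the identity x^2 - 3(x+h)^2 + 3(x+2h)^2 - (x+3h)^2 = 0
coeffs = [1, -3, 3, -1]
f = [lambda x, c=c: e(c * (x * x % p) / p) for c in coeffs]

# Lambda_{4AP}(f1,...,f4) = E_{x,h} f1(x) f2(x+h) f3(x+2h) f4(x+3h)
avg = sum(f[0](x) * f[1](x + h) * f[2](x + 2 * h) * f[3](x + 3 * h)
          for x in range(p) for h in range(p)) / p**2

def u2_norm(g):
    # ||g||_{U^2}^4 = E_{x,h,k} g(x) conj(g(x+h)) conj(g(x+k)) g(x+h+k)
    s = sum(g(x) * g(x + h).conjugate() * g(x + k).conjugate() * g(x + h + k)
            for x in range(p) for h in range(p) for k in range(p)) / p**3
    return abs(s) ** 0.25

print(abs(avg), u2_norm(f[0]))  # the average is 1; the norm is about (2/p)^{1/4}
```

For a quadratic Gauss phase one can compute ‖f‖_{U²}⁴ = (2p − 1)/p² exactly, so the norm is Θ(p^{−1/4}), matching the numerics.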
The first systematic approach to this question for general systems of linear forms was given by Green and Tao [9] in the course of their work on linear equations in primes. The following is essentially implicit as a much easier case of results from that paper, and was isolated in [4]; however, the terminology we use is slightly different to both.
Proposition 1.1 (Essentially from [9]). Given a prime p, a system Φ = (φ_1, . . . , φ_r) of linear forms Z^d → Z, and an index j, 1 ≤ j ≤ r, we say Φ has Cauchy-Schwarz complexity ≤ s at j, modulo p, if the following holds: the indices {1, . . . , r} \ {j} can be partitioned into s + 1 classes I_1, . . . , I_{s+1} such that φ_j modulo p, considered as a linear form F_p^d → F_p, is not contained in span_{F_p}(φ_i : i ∈ I_k) for any 1 ≤ k ≤ s + 1.
Let G = F_p^n, where p, n may be any size (including say n = 1). If Φ has Cauchy-Schwarz complexity ≤ s at j modulo p, then for any functions f_1, . . . , f_r : G → C with ‖f_i‖_∞ ≤ 1 for each i, we have |Λ_Φ(f_1, . . . , f_r)| ≤ ‖f_j‖_{U^{s+1}}. If Φ has Cauchy-Schwarz complexity ≤ s at every index, modulo p, we just say the system has Cauchy-Schwarz complexity ≤ s modulo p, and write s_CS(Φ) for the smallest s for which this holds (where s_CS implicitly depends on p). If p is very large, taking the span of φ_1, . . . , φ_r as linear forms F_p^d → F_p is essentially equivalent to working over Q, and the value of s_CS stabilizes.
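The partition condition in Proposition 1.1 is a finite linear algebra check, and for small systems it can be verified by brute force over all partitions. Here is a minimal sketch (the helper names are ours, not the paper's), using Gaussian elimination mod p to test membership in a span:

```python
from itertools import product

def rank_mod_p(rows, p):
    # row rank of an integer matrix, reduced mod p (Gaussian elimination)
    rows = [[x % p for x in r] for r in rows]
    rank, ncols = 0, (len(rows[0]) if rows else 0)
    for col in range(ncols):
        piv = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def cs_complexity_at(forms, j, p):
    # smallest s such that the indices other than j split into s+1 classes,
    # with phi_j outside the span (mod p) of each class
    others = [i for i in range(len(forms)) if i != j]
    for s in range(len(others)):
        for labels in product(range(s + 1), repeat=len(others)):
            classes = [[forms[i] for i, l in zip(others, labels) if l == k]
                       for k in range(s + 1)]
            if all(rank_mod_p(cl + [forms[j]], p) > rank_mod_p(cl, p)
                   for cl in classes):
                return s
    return None

# k-term APs x, x+h, ..., x+(k-1)h have Cauchy-Schwarz complexity k-2
print(cs_complexity_at([(1, 0), (1, 1), (1, 2)], 0, 5))         # 3-APs: 1
print(cs_complexity_at([(1, 0), (1, 1), (1, 2), (1, 3)], 0, 5)) # 4-APs: 2
```

(Modular inversion via `pow(x, -1, p)` needs Python 3.8+.)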
As the name suggests, the proof of Proposition 1.1 is s + 1 applications of Cauchy-Schwarz, as in Gowers' work. The content of the proposition is really in establishing the linear algebra condition that guarantees this Cauchy-Schwarz argument will work.
Following this, Gowers and Wolf, in a series of papers [4,5,6,7], considered the question: is the value of s given by Cauchy-Schwarz complexity optimal? It is natural to try to adapt the examples given by Gowers for k-term progressions to the general case to give a lower bound. The task comes down to finding phase polynomials f_1, . . . , f_r : G → C of degree D, i.e. functions of the form f_i(x) = e(P_i(x)) where P_i : G → R/Z is a degree D polynomial (in a natural sense) for each i, such that the product f_1(φ_1(x)) · · · f_r(φ_r(x)) is equal to 1 pointwise, and hence Λ_Φ(f_1, . . . , f_r) = 1. By contrast ‖f_i‖_{U^D} will typically be very small when f_i is a degree D phase polynomial, so this rules out a statement of type (2) or (1) for s ≤ D − 1.
It turns out that this is possible if and only if φ_1^D, . . . , φ_r^D ∈ ((F_p^d)*)^{⊗D} are linearly dependent, where φ_i^D = φ_i^{⊗D} are interpreted as symmetric multilinear forms over F_p^d. In other words, Λ_Φ can only be fully controlled by the U^{s+1}-norm if φ_1^{s+1}, . . . , φ_r^{s+1} are linearly independent elements of ((F_p^d)*)^{⊗(s+1)}. It also turns out that this lower bound, arising from explicit phase polynomials, and the upper bound coming from Cauchy-Schwarz complexity, do not agree in general. Gowers and Wolf conjectured that the lower bound is the truth; that is, if the true complexity of Φ over F_p is defined to be the smallest s such that φ_1^{s+1}, . . . , φ_r^{s+1} are linearly independent, then (2) holds for this s and any 1 ≤ j ≤ r. By our previous discussion, such a statement would be (qualitatively) best possible in s.
In what follows, we write s = s(Φ) to denote this notion of the true complexity of a system of linear forms (over F_p), unless otherwise stated. This conjecture has now been resolved in essentially all cases of interest. The original paper [4] by Gowers and Wolf proved the case where s = 1, s_CS = 2, and G = F_p^n for p fixed and n large. This case was proved again in [6] (also by Gowers and Wolf) with a better bound. Still with G = F_p^n for p fixed, but not too small, the general case (i.e. arbitrary finite s and s_CS) was proven in another paper [5] by the same authors. They also showed the case s_CS = 2 and s = 1 for G = Z/pZ, where this time p is large, in [7]. The general result in the cyclic setting G = Z/pZ for p large was shown by Green and Tao [8] as an application of their nilsequence-based arithmetic regularity lemma.
Later, Hatami and Lovett [11] extended the results of [5] to the asymmetric case, where φ_1^{s+1}, . . . , φ_r^{s+1} may be linearly dependent but not all of these multilinear forms are in the linear span of the others, which corresponds to (2) holding for some choices of j but not others. Finally, Hatami, Hatami and Lovett [10] removed the requirement, in the case G = F_p^n for p fixed, that p be not too small. We comment only very briefly on the proofs, as they will not play a large role in the current work.
We focus on the simplest case of p fixed, s = 1 and s_CS = 2. By the assumption on s_CS and Proposition 1.1, we are free to discard small U³ errors at any point. By the inverse theorem for the Gowers U³-norm, this means we are free to assume that each f_i is a linear combination of a few phase polynomials of degree at most 2. We would like to argue that when s = 1, the quadratic terms do not contribute much to Λ_Φ; however, this requires a more robust version of our assumption that φ_1^2, . . . , φ_r^2 are linearly independent (in effect, we need that no non-trivial linear combination of these, considered as quadratic forms, has low rank). Bridging this gap between the robust and non-robust statements is the heart of the argument.
At least qualitatively, these works verify all the central conjectures concerning true complexity.
Nonetheless, there are some unresolved questions of interest.
Question 1.2. What are the best possible bounds in the true complexity statement? That is, how small must δ be in terms of ε to ensure that ‖f_j‖_{U^{s+1}} ≤ δ implies |Λ_Φ(f_1, . . . , f_r)| ≤ ε?
(In fact Gowers and Wolf set up the definitions slightly differently: they define true complexity to be the smallest s such that (1) holds, and conjecture it is equal to the algebraic quantity we have just defined. Since this conjecture is known to be true in cases of interest, defining things the other way round should hopefully not cause too much confusion.)
In the case s = 1 and s_CS = 2, Gowers and Wolf [6,7] obtained a doubly exponential dependence, i.e. δ ≈ exp(−exp(O(ε^{−C}))). In all other cases where s ≠ s_CS, the best known bounds are ineffective, or as good as ineffective, as they rely on the inverse theorems for the U^k-norms for k ≥ 4, for which no good bounds are known.
In [5, Problem 7.8], Gowers and Wolf suggested that the dependence cannot be too good, and specifically, not polynomial; that is, they asked whether one could find a counterexample ruling out a bound of the shape δ = ε^{O(1)}. This is closely related to the following question.
Question 1.3. In cases where the true complexity and Cauchy-Schwarz complexity differ, could a true complexity bound be proven by elementary means, e.g. by many applications of the Cauchy-Schwarz inequality; or is some appeal to the structural theory of higher order Fourier analysis essential? Is there some qualitative feature which separates the elementary and non-elementary cases?
The primary motivation behind Gowers and Wolf's appeal for counterexamples to good bounds is that this would rule out a proof based only on complicated applications of the Cauchy-Schwarz inequality, as that would surely give a polynomial bound.
Our final question is at first appearance more eccentric, but we will see its relevance shortly. The bounds in the known true complexity results may in principle depend on the coefficients of the system Φ, and not merely on the parameters d, r and s. By contrast, the Cauchy-Schwarz complexity bound is completely uniform in the coefficients, provided the hypotheses are satisfied.
Question 1.4. Is this restriction to bounded coefficients necessary, whenever s ≠ s_CS?
Working over F_p^n for p fixed and n large, there are only finitely many choices of linear forms, so any dependence on the coefficients can be removed. In this setting, the analogous question is whether the bounds should genuinely depend on p.
In this paper, we consider what is in some sense the smallest non-trivial case where s ≠ s_CS, which concerns systems of 6 linear forms in 3 variables (i.e., r = 6 and d = 3). Indeed, when d = 2 it is always the case that s = s_CS, and similarly for d = 3 and r ≤ 5. However, a generic system Φ with r = 6 and d = 3 will have s = 1 but s_CS = 2 (see Section 2 for a discussion).
In this limited setting, we are able to give fairly complete answers to the questions above. We now outline the main results.
Theorem 1.5. Let Φ = (φ_1, . . . , φ_6) be a system of 6 linear forms in 3 variables, let p be a prime (not necessarily small), and let G = F_p^n for any n ≥ 1.
Then, provided the system Φ has true complexity 1 over F_p, for any functions f_1, . . . , f_6 : G → C with ‖f_i‖_∞ ≤ 1 for each i, and for any j, 1 ≤ j ≤ 6, we have the bound |Λ_Φ(f_1, . . . , f_6)| ≤ ‖f_j‖_{U²}^{1/C}, where C = C(Φ, j) > 0 is some constant depending on the coefficients of Φ, and perhaps j, but crucially not on p or n.
Moreover, the above inequality can be derived using only multiple applications of the Cauchy-Schwarz inequality. However, the number of applications used in the proof increases without bound as the coefficients of Φ grow.
Note that, since no restrictions are placed on n and p, this encompasses the cases Z/pZ for p a large prime as well as F n p for p fixed and n large. In intermediate cases where p and n are both large, even qualitatively the result may officially be new, although these cases are rarely of interest.
The key observation underlying the proof is the following.
Slogan 1.6. Cauchy-Schwarz complexity is not preserved under applying the Cauchy-Schwarz inequality.
By this we mean the following. If we start with a system of linear forms Φ and apply the Cauchy-Schwarz inequality to one of the functions, what we get can be thought of as a new linear system Φ′ with 2(r − 1) forms and 2d − 1 variables. It is not always true that if s_CS(Φ) > 1 then s_CS(Φ′) > 1; so in some cases we can now apply the Cauchy-Schwarz complexity bound (Proposition 1.1) to Λ_{Φ′} to bound it by ‖f_j‖_{U²}, meaning that in turn Λ_Φ is bounded by ‖f_j‖_{U²}^{1/2}. More generally, we can hope to apply Cauchy-Schwarz repeatedly and systematically, eventually arriving at a system with Cauchy-Schwarz complexity 1.
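At the level of coefficient matrices, the passage from Φ to Φ′ is easy to write down explicitly. The sketch below is ours (not the paper's), and assumes, after a change of basis as in Section 2, that the form being eliminated is the coordinate form x_1; it produces the 2(r − 1) forms of Φ′ in 2d − 1 variables, ordered as (x_1, then the remaining variables, then their primed copies):

```python
def cs_step(forms, j):
    # forms: coefficient tuples of length d; assumes phi_j = x_1, i.e. (1, 0, ..., 0).
    # Returns the 2(r-1) forms of Phi' on 2d-1 variables,
    # with variable order (x_1, y_2..y_d, y'_2..y'_d).
    d = len(forms[0])
    assert forms[j][0] == 1 and all(c == 0 for c in forms[j][1:])
    out = []
    for copy in (0, 1):  # unprimed and primed copies of the other forms
        for i, f in enumerate(forms):
            if i == j:
                continue
            v = [f[0]] + [0] * (2 * (d - 1))
            for k in range(1, d):
                v[k + copy * (d - 1)] = f[k]
            out.append(tuple(v))
    return out

# a hypothetical system of 6 forms in 3 variables with phi_6 = x
phi = [(0, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 2), (1, 2, 1), (1, 0, 0)]
phi_prime = cs_step(phi, 5)
print(len(phi_prime))  # 10 forms in 5 variables
```

One can then feed `phi_prime` into a span-checking routine to test whether the new system has Cauchy-Schwarz complexity 1 at the desired index, which is exactly the phenomenon the slogan describes.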
On the other hand, we show that the quantity C(Φ), which quantifies the number of times the Cauchy-Schwarz inequality is used, must necessarily grow without bound as Φ varies.
Theorem 1.7. For any sufficiently large prime p with p ≡ ±1 (mod 8), there exist a system Φ of 6 linear forms in 3 variables, and functions f_1, . . . , f_6 : Z/pZ → C with ‖f_i‖_∞ ≤ 1 for each i, such that the bound of Theorem 1.5 fails for any constant C independent of p.
Unlike in Theorem 1.5, here the system Φ is allowed to change as p grows, with no control on the size of its coefficients. The condition p ≡ ±1 (mod 8) is an inessential one related to the precise construction used, and could be removed without too much added difficulty. The difficulty turns out not to be that the Cauchy-Schwarz inequality is insufficiently powerful, or too blunt to detect the algebraic nature of the boundary between systems with s = 1 and s = 2; in fact it handles such considerations surprisingly easily.
Instead, the issue is that the Cauchy-Schwarz steps used must necessarily be tailor-made to the system Φ being considered. The task of describing a mapping from systems to Cauchy-Schwarz arguments could be likened to that of building a primitive computer using only the Cauchy-Schwarz inequality. Setting up the technical machinery required to achieve this will occupy most of the paper.
Remark 1.9. The value of C(Φ) given by the proof of Theorem 1.5 is completely explicit but in many cases unreasonably large. No serious attempt has been made to optimize it, although minor changes would probably produce only minor improvements.
For large p, the worst-case behavior given by the proof is something like exp(O(K^{O(1)})) where K is the size of the largest (integer) coefficient appearing in Φ. Although typically one expects not to hit the worst case, nonetheless in practice, for integer coefficients of size about 10, values such as C ≈ 2^{400} are not unusual. It seems likely such values are not best possible.
When p is fixed, we may state a bound in terms of p rather than the size of the coefficients. Here the method gives C(p) = O(p^{O(1)}). It is possible one could modify the argument to improve this to O(log p), which would be best possible up to absolute constants. However, significant additional technical challenges arise, and so we will not attempt this here. There is no immediately apparent obstruction to the overall approach of repeated application of the Cauchy-Schwarz inequality succeeding in general, but conversely it is not obvious how to generalize the specific strategies used when d = 3 and r = 6 to the general case. Therefore, this is left to possible future work.
1.1. Outline of the paper. In Section 2 we present some preliminaries concerning the case of 6 forms in 3 variables. In particular, we will deal with some initial degenerate cases where, in a technical and slightly disingenuous sense, we will see that applying the Cauchy-Schwarz inequality causes s CS to decrease. We will need these cases in what follows, but this also serves as an introduction to the general approach behind the proof of Theorem 1.5 without the notational complexities.
In Section 3 we introduce formalisms to keep track of the effects of multiple applications of Cauchy-Schwarz in a systematic manner. This has the effect of reducing the proof of Theorem 1.5 in any given instance, to winning a Cauchy-Schwarz "game" which has a well-defined set of possible moves and which can readily be simulated on a computer. Section 4 addresses the core problem of solving this game in general. This comes down to finding sequences of moves which have the effect of implementing predictable arithmetic operations on the system Φ, and using them to walk Φ to a degenerate configuration of the type considered in Section 2.
Finally, Section 5 gives the proof of the negative result, Theorem 1.7.
1.2. Notation. We use O(1) to denote any quantity bounded above by an absolute constant, and O(X) to mean O(1)·X. The notation [m] for m a positive integer denotes the set {1, 2, . . . , m}. For a real parameter x, e(x) denotes exp(2πix). The notation [A = B] (for example) denotes the indicator function of the event A = B. If W is a finite-dimensional vector space over F_p, we write W* for its dual space and P(W) for the corresponding projective space (i.e. the space of 1-dimensional subspaces of W). Also, P^k = P^k(F_p) means the same as P(F_p^{k+1}). Given w ∈ W \ {0}, we write [w] for the corresponding element of P(W).
1.3. Acknowledgements. The author would like to thank Sean Eberhard, Ben Green, Rudi Mrazović and Julia Wolf for discussions on these topics at various times.

Preliminaries concerning six forms in three variables
We start by giving a brief analysis of the different cases that can arise concerning a system of six forms Φ = (φ_1, . . . , φ_6) in three variables, and the associated Cauchy-Schwarz complexity and true complexity. Throughout this section we write V = F_p^3, so the φ_i (modulo p) can be thought of as linear functionals V → F_p, always assumed to be non-zero.
It is clear that nothing substantial changes when we replace φ_i by a non-zero scalar multiple λφ_i. Indeed, the quantities f_i(φ_i(v)) and f_i(λφ_i(v)) are essentially the same, up to replacing f_i with a dilate of itself, and so this has no effect on the conclusion of Theorem 1.5; and by inspection our definitions of true complexity and Cauchy-Schwarz complexity are also unchanged.
Therefore it makes sense to think of the forms φ_i : V → F_p as points [φ_i] in the projective plane P(V*) ≅ P²(F_p), quotienting out by the action of scalar multiplication. This allows us to phrase the different cases geometrically.
We have said that Φ has true complexity s = 1 if the symmetric bilinear forms φ_1^2, . . . , φ_6^2 ∈ (V*)^{⊗2} are linearly independent. Note that the space of symmetric bilinear forms on V has dimension (3 · 4)/2 = 6, and there are six forms, so we expect this to be true generically. Indeed, a dependence relation on φ_1^2, . . . , φ_6^2 exists if and only if there is a non-zero linear functional µ : {σ ∈ (V*)^{⊗2} : σ = σ^T} → F_p which evaluates to 0 on each φ_i^2; and this in turn is the same thing as a non-zero quadratic form V* → F_p which vanishes at each φ_i; i.e., a conic in the projective plane P(V*) containing [φ_i] for each i. In other words, we have shown that Φ has true complexity 1 if and only if the points [φ_1], . . . , [φ_6] do not lie on a conic in P(V*). It is possible for the true complexity to be greater than 2: for instance, if five of the points lie on a line in P², in which case the true complexity is 3; or if [φ_i] = [φ_j] for some i ≠ j, in which case the system has infinite complexity.
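Concretely, the six points lie on a conic if and only if the 6 × 6 matrix of quadratic monomials a², b², c², ab, ac, bc, evaluated at each [φ_i] = [a : b : c], is singular mod p. A quick sketch (our code, not the paper's):

```python
def rank_mod_p(rows, p):
    # row rank of an integer matrix, reduced mod p (Gaussian elimination)
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                c = rows[r][col]
                rows[r] = [(x - c * y) % p for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def on_a_conic(points, p):
    # a non-zero quadratic form vanishing at all the points exists
    # iff the matrix of quadratic monomials has rank < 6
    rows = [[a * a, b * b, c * c, a * b, a * c, b * c] for (a, b, c) in points]
    return rank_mod_p(rows, p) < 6

p = 31
# six points on the conic xz = y^2: true complexity > 1
conic_pts = [(t * t, t, 1) for t in range(5)] + [(1, 0, 0)]
# a generic configuration: true complexity 1
generic_pts = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 2, 3), (1, 4, 9)]
print(on_a_conic(conic_pts, p), on_a_conic(generic_pts, p))  # True False
```

Since five points always lie on some conic, the test only becomes non-trivial for six points, in line with the dimension count above.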
However, all such cases may be fully analyzed in terms of Cauchy-Schwarz complexity, which gives a bound |Λ_Φ(f_1, . . . , f_6)| ≤ ‖f_j‖_{U^{s_j+1}} for each j, where the values s_j are best possible, even when they vary with j. The details are an uninteresting check that will not be relevant to the argument, so are omitted.
We therefore restrict our attention to the cases with s = 1, i.e. where the points [φ_1], . . . , [φ_6] do not lie on a conic in P(V*). In particular, we can henceforth make the following assumptions, each of which follows from this (since five points always lie on a conic, as do six points covered by two lines): (i) the points [φ_1], . . . , [φ_6] are pairwise distinct; (ii) no four of the points are collinear; (iii) the points are not contained in the union of any two lines. For a generic such system we have s_CS = 2. Indeed, any way we partition all but one of the forms into two classes, one of the classes will contain three forms and so their span will be all of V*; hence s_CS > 1. Conversely, any 2-2-1 split of the remaining five forms achieves s_CS ≤ 2, provided φ_j lies in the span of neither pair. Our remaining task in this section is to consider the cases where (i)-(iii) hold, but nonetheless the points are not in general position. This is a setting in which Cauchy-Schwarz complexity has some purchase, but nonetheless there is a subtlety meaning, technically speaking, that typically s_CS = 2.
Proposition 2.2. Assume (i)-(iii), and suppose that [φ_1], [φ_2], [φ_3] are collinear but [φ_4], [φ_5], [φ_6] are not collinear.
(i) For functions f_1, . . . , f_6 : F_p^n → C with ‖f_i‖_∞ ≤ 1 for each i, and for any j = 4, 5, 6, we have a bound |Λ_Φ(f_1, . . . , f_6)| ≤ ‖f_j‖_{U²} coming from Proposition 1.1.
(ii) Under the same conditions as (i), the system has true complexity s = 1. In particular, by results of Gowers and Wolf, Λ_Φ is qualitatively controlled by ‖f_j‖_{U²} for every j.
(iii) Suppose in addition that {[φ_1], [φ_2], [φ_3]} is the only collinear triple. Then for j = 1, 2, 3 there is no way to partition {1, . . . , 6} \ {j} into two pieces such that φ_j is in the span of neither piece, and hence s_CS = 2 for this system.
Proof. For (i), say when j = 6, we can partition {1, . . . , 5} into {1, 2, 3} and {4, 5}. By our assumptions, it is clear that φ_6 lies in the span of neither class: the first span corresponds to the line through [φ_1], [φ_2], [φ_3], which does not contain [φ_6], and [φ_4], [φ_5], [φ_6] are not collinear.
For (ii), we note that a conic containing three distinct collinear points must be degenerate, i.e. a union of two lines; but the six points are not contained in the union of any two lines. Hence the points do not lie on a conic, and so s = 1.
For (iii), when say j = 1, given any partition of {2, . . . , 6} into two pieces, one of the pieces contains three of the forms. Since that triple is not φ_1, φ_2, φ_3, the corresponding points are not collinear and so their span is all of V*.
So, this is a case where s and s_CS differ, albeit for what feels like a bad reason. Indeed, it is not too challenging to recover a good bound on Λ_Φ(f_1, . . . , f_6) in terms of ‖f_1‖_{U²} in this setting, for instance by decomposing f_6 into two parts corresponding to its large and small Fourier coefficients, bounding away the uniform contribution and treating what is left as essentially a system of five forms.
Instead, we will now recover such a bound purely by using the Cauchy-Schwarz inequality, and thereby provide the first (admittedly unimpressive) instantiation of Slogan 1.6.
Proposition 2.3. Assume (i)-(iii), and suppose that [φ_1], [φ_2], [φ_3] are collinear, no two of the points are the same, and no other triple of the points is collinear. Then for functions f_1, . . . , f_6 : F_p^n → C with ‖f_i‖_∞ ≤ 1 for each i, we have |Λ_Φ(f_1, . . . , f_6)| ≤ ‖f_1‖_{U²}^{1/2}.
Proof. By applying a suitable change of basis to V = F_p^3, we may assume without loss of generality that φ_6(x, y, z) = x; this is not essential but eases the notation. So, Λ_Φ(f_1, . . . , f_6) = E_{x,y,z} f_1(φ_1(x, y, z)) · · · f_5(φ_5(x, y, z)) f_6(x). We can apply the Cauchy-Schwarz inequality (in the variable x, to eliminate f_6) to obtain |Λ_Φ(f_1, . . . , f_6)|² ≤ E_x |E_{y,z} f_1(φ_1(x, y, z)) · · · f_5(φ_5(x, y, z))|². Now, the term on the right expands to an average over x, y, y′, z, z′, and we can think of this as Λ_{Φ′}(f_1, . . . , f_5, f_1, . . . , f_5) (with the second copies conjugated), where Φ′ is the system of 10 forms in the five variables x, y, y′, z, z′, given by φ_{i0}(x, y, y′, z, z′) = φ_i(x, y, z) and φ_{i1}(x, y, y′, z, z′) = φ_i(x, y′, z′) for 1 ≤ i ≤ 5, each thought of as a linear functional F_p^5 → F_p. We claim that, under our hypotheses, it is possible to partition the nine forms φ_{20}, φ_{30}, . . . , φ_{50}, φ_{11}, . . . , φ_{51} into two classes such that φ_{10} is not in the span of either class. Specifically, we will take S = {φ_{20}, φ_{40}, φ_{11}, φ_{21}, φ_{31}} and T = {φ_{30}, φ_{50}, φ_{41}, φ_{51}} to be the two classes. If this claim holds, then by the standard Cauchy-Schwarz complexity bound (Proposition 1.1) again we have |Λ_{Φ′}| ≤ ‖f_1‖_{U²}, and the result follows.
We now verify the claim. We first note that, since [φ_1], [φ_2], [φ_3] are collinear, we have φ_3 ∈ span(φ_1, φ_2) and hence φ_{31} ∈ span(φ_{11}, φ_{21}); so we may discard φ_{31} from S without changing its span. This makes the claim plausible for dimension reasons: it is reasonable to expect the span of four linear forms on F_p^5 not to contain a fifth, unless something untoward happens. However, something untoward could genuinely happen if too many of the original forms are collinear, and more generally we need to show that all bad cases are ruled out by our hypotheses. This is the technical part of the calculation, and may be skipped on first (or subsequent) reading.
To show φ_{10} is not in the span of φ_{20}, φ_{40}, φ_{11}, φ_{21}, it would suffice to show that φ_{10} together with these four forms a basis for (F_p^5)*; equivalently, writing φ_i(x, y, z) = a_i x + b_i y + c_i z, that the matrix M_S with rows
(a_1, b_1, 0, c_1, 0), (a_2, b_2, 0, c_2, 0), (a_4, b_4, 0, c_4, 0), (a_1, 0, b_1, 0, c_1), (a_2, 0, b_2, 0, c_2)
(whose columns correspond to x, y, y′, z, z′ respectively, and whose rows are φ_{10}, φ_{20}, φ_{40}, φ_{11}, φ_{21}) is non-singular. However, it is not hard to see that det M_S = ± det(φ_1, φ_2, φ_4) · (b_1 c_2 − b_2 c_1), where det(φ_1, φ_2, φ_4) denotes the determinant of the 3 × 3 matrix whose rows are the coefficients of φ_1, φ_2, φ_4. The determinants on the right hand side are zero precisely when, respectively, [φ_1], [φ_2], [φ_4] or [φ_1], [φ_2], [φ_6] are collinear (recalling φ_6(x, y, z) = x). Under our assumptions, neither can be true (as then four points would lie on a line) and so M_S is non-singular.
The argument for T is very similar. We define M_T to be the matrix with rows
(a_1, b_1, 0, c_1, 0), (a_3, b_3, 0, c_3, 0), (a_5, b_5, 0, c_5, 0), (a_4, 0, b_4, 0, c_4), (a_5, 0, b_5, 0, c_5),
which is non-singular if and only if φ_{10}, φ_{30}, φ_{50}, φ_{41}, φ_{51} form a basis. Then det M_T = ± det(φ_1, φ_3, φ_5) · (b_4 c_5 − b_5 c_4), and again this is zero if and only if either [φ_1], [φ_3], [φ_5] or [φ_4], [φ_5], [φ_6] are collinear; these are explicitly ruled out by our hypotheses, and this proves the claim.
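The determinant factorizations are easy to verify numerically for a sample system satisfying the hypotheses. In the sketch below (ours), φ_1, φ_2, φ_3 lie on the line a = 0, φ_6 = x, and the coefficients of φ_4, φ_5 are a hypothetical choice:

```python
def det(m):
    # integer determinant by cofactor expansion (fine for 5x5)
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** k * m[0][k] * det([r[:k] + r[k + 1:] for r in m[1:]])
               for k in range(len(m)))

p = 31
# phi_i(x,y,z) = a_i x + b_i y + c_i z; [phi_1],[phi_2],[phi_3] collinear (a = 0),
# phi_6 = x, and no other collinear triple
phi = [(0, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 2), (1, 2, 1), (1, 0, 0)]

def row0(i):  # phi_{i0}: columns (x, y, y', z, z')
    a, b, c = phi[i]
    return [a, b, 0, c, 0]

def row1(i):  # phi_{i1}
    a, b, c = phi[i]
    return [a, 0, b, 0, c]

M_S = [row0(0), row0(1), row0(3), row1(0), row1(1)]
M_T = [row0(0), row0(2), row0(4), row1(3), row1(4)]

# det M_S = +- det(phi_1, phi_2, phi_4) * (b_1 c_2 - b_2 c_1), similarly for T
pred_S = det([list(phi[0]), list(phi[1]), list(phi[3])]) \
    * (phi[0][1] * phi[1][2] - phi[1][1] * phi[0][2])
pred_T = det([list(phi[0]), list(phi[2]), list(phi[4])]) \
    * (phi[3][1] * phi[4][2] - phi[4][1] * phi[3][2])
print(det(M_S) % p, det(M_T) % p)  # both non-zero mod p
```

The check confirms that both 5 × 5 determinants are non-zero mod p and agree, up to sign, with the predicted products of smaller determinants.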
Remark 2.4. One way to think of this proof on a high level is as a combinatorial analogue of the method we sketched above: namely, first observing that Λ_Φ is controlled by ‖f_6‖_{U²}, then noting this allows us to essentially eliminate f_6 by replacing it with the sum of its large Fourier coefficients, and finally applying Cauchy-Schwarz on the remaining five forms.
What we do here is first make two copies of the original system, joined by φ_6; on the right, we decompose the remaining forms as if we were attempting to prove a Cauchy-Schwarz complexity bound in ‖f_6‖_{U²}, as in Proposition 2.2; and on the left we decompose as if we were trying to prove a Cauchy-Schwarz complexity bound in ‖f_1‖_{U²} and the form φ_6 didn't exist.
So, the initial Cauchy-Schwarz allows us to somehow substitute the information gained from the former argument into the latter.
Remark 2.5. As we have said, this is an application of Slogan 1.6, but not a very convincing one. Before embarking on the programme in full generality, we briefly sketch an example of six forms in general position, having s = 1 (but necessarily s_CS = 2), where we nonetheless get a bound using only Cauchy-Schwarz.
Consider a system of six forms whose coefficients involve parameters a, b, c ∈ F_p, arbitrary subject to the condition that the forms be in general position. For concreteness one could substitute a = 7, b = 11, c = 13.
The fact that these kinds of linear algebra conditions can detect whether the points lie on a conic should perhaps not be surprising in light of Pascal's hexagon theorem.

Formalisms for iterated Cauchy-Schwarz
The purpose of this section is to introduce some formalisms necessary to keep track of what happens when we apply the Cauchy-Schwarz inequality repeatedly. The notational overhead here is high, but preferable to handling yet larger explicit calculations in the style of the previous section.
3.1. Linear data. Although the central objects of study are systems of linear forms, it will be convenient to use a natural generalization of this notion, which handles the objects that arise in intermediate stages of the calculation. We introduce the relevant definitions now.
Definition 3.1. Let a prime p be fixed. By a linear datum, we mean a tuple Ψ = (V, (W_i)_{i∈I}, (ψ_i)_{i∈I}), where I is some finite index set, V and the W_i are finite-dimensional vector spaces over F_p, and each ψ_i : V → W_i is a surjective linear map. Given a positive integer n, we abuse notation to write ψ_i : V^n → W_i^n for the map that applies ψ_i to each coordinate. Now, for a collection of functions f_i : W_i^n → C (i ∈ I), we define Λ_Ψ((f_i)_{i∈I}) = E_{v∈V^n} ∏_{i∈I} f_i(ψ_i(v)). It is clear that in the special case that dim W_i = 1 for each i, this is essentially the same information as a system of linear forms on V ≅ F_p^d for some d. The reader should always imagine dim V as being small, even when we are working over G = F_p^n for some large n: the n is taken care of in the definition of Λ_Ψ, not of Ψ.
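For concreteness, here is a direct (and inefficient) evaluation of Λ_Ψ in the special case n = 1, with each ψ_i given by an integer matrix mod p; a sketch in our notation, not the paper's:

```python
from itertools import product

def lam(psi, fs, p):
    # Lambda_Psi((f_i)) = E_{v in V} prod_i f_i(psi_i(v)),
    # with n = 1, V = F_p^d, and each psi_i a matrix (list of rows)
    d = len(psi[0][0])
    total = 0
    for v in product(range(p), repeat=d):
        term = 1
        for M, f in zip(psi, fs):
            term *= f(tuple(sum(r[k] * v[k] for k in range(d)) % p for r in M))
        total += term
    return total / p ** d

# the 3-AP system x, x+h, x+2h as a linear datum with dim W_i = 1
psi = [[(1, 0)], [(1, 1)], [(1, 2)]]
delta = lambda w: 1 if w == (0,) else 0
print(lam(psi, [delta] * 3, 5))  # only (x,h) = (0,0) contributes: 1/p^2 = 0.04
```

Taking indicator functions as inputs recovers normalized solution counts, as in the discussion of Λ_Φ in the introduction.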
Attempting to analyse linear data in general exposes hard problems; see [1,2]. Since the linear data we will consider ultimately come from systems of linear forms, these subtleties will not arise here.
Remark 3.2. Typically we are not too concerned by replacing W_i by isomorphic vector spaces, or by the exact form of the linear map ψ_i: for instance, as we have said, the difference between f(ψ_i(v)) and f(2ψ_i(v)) is usually immaterial.
As such, the only really important information is the collection of subspaces ker ψ_i of V, as we can always recover W_i up to isomorphism as V / ker ψ_i. One can interpret ker ψ_i as the subspace of V that the function f_i cannot depend on.
Alternatively, we could think about the dual subspaces (ker ψ_i)* ⊆ V*, corresponding to the span of all linear forms derived from ψ_i. This is consistent with the geometric picture from Section 2: such subspaces correspond to points, lines, planes etc. in P(V*).
For technical reasons it is useful to keep track of the linear maps ψ_i explicitly; but the reader will rarely lose anything, and possibly gain something, by thinking of a linear datum as simply a collection of subspaces of V or V*.
We need some notion of when one linear datum bounds another; for instance, but not exclusively, because one is obtained by applying the Cauchy-Schwarz inequality to the other.
Definition 3.3. Let Ψ = (V, (W_i)_{i∈I}, (ψ_i)_{i∈I}) and Ψ′ = (V′, (W′_{i′})_{i′∈I′}, (ψ′_{i′})_{i′∈I′}) be linear data. Suppose further that for some pair j ∈ I, j′ ∈ I′ the subspaces W′_{j′} = W_j are identified. Finally, let c > 0 be a positive real number. We say that Ψ′ dominates Ψ respecting (j, j′), with exponent c, if for every n and every collection of functions f_i : W_i^n → C (i ∈ I) with ‖f_i‖_∞ ≤ 1, there exists a collection g_{i′} : (W′_{i′})^n → C (i′ ∈ I′) with ‖g_{i′}‖_∞ ≤ 1 and g_{j′} = f_j, such that |Λ_Ψ((f_i)_{i∈I})| ≤ |Λ_{Ψ′}((g_{i′})_{i′∈I′})|^c.
Some straightforward examples of domination include (i) replacing Ψ by an isomorphic system (i.e. reparameterizing); (ii) augmenting Ψ by introducing further averaging, or by replacing ker ψ_i by a strictly larger subspace for some i; or (iii) taking a supremum over some part of the average. All of these are subsumed in the following general proposition.
Proposition 3.4. Suppose Ψ = (V, (W_i)_{i∈I}, (ψ_i)_{i∈I}) and Ψ′ = (V′, (W′_i)_{i∈I}, (ψ′_i)_{i∈I}) are two linear data on the same index set I, and that we are given linear maps θ : V′ → V and σ_i : W′_i → W_i compatible with the ψ_i and ψ′_i (i.e. a morphism of linear data). If j ∈ I is some index such that W_j = W′_j and σ_j is the identity, then Ψ′ dominates Ψ respecting (j, j), with exponent 1.
Proof. For any collection of functions f_i : W_i^n → C, i ∈ I, we can decompose Λ_Ψ((f_i)_{i∈I}) as an average of expressions indexed by a parameter ℓ. Now fix ℓ to be any maximal choice, and define g_i : (W′_i)^n → C accordingly. We deduce that |Λ_Ψ((f_i)_{i∈I})| ≤ |Λ_{Ψ′}((g_i)_{i∈I})|. Moreover, it follows from our assumptions that g_j = f_j, and so the conditions of Definition 3.3 are satisfied.
We now consider how to describe an application of the Cauchy-Schwarz inequality in this language.
Proposition 3.6. Suppose Ψ = (V, (W_i)_{i∈I}, (ψ_i)_{i∈I}) is a linear datum, and some j ∈ I is given. Let Ψ′ = (V′, (W′_{i′})_{i′∈I′}, (ψ′_{i′})_{i′∈I′}) be the linear datum defined as follows:
• V′ is the fiber product of V with itself over W_j, i.e. V′ = V ×_{W_j} V = {(v, w) ∈ V × V : ψ_j(v) = ψ_j(w)};
• I′ is the disjoint union of two copies of I \ {j}, denoted {i_0 : i ∈ I \ {j}} ∪ {i_1 : i ∈ I \ {j}};
• W′_{i_0} = W′_{i_1} = W_i, with ψ′_{i_0}(v, w) = ψ_i(v) and ψ′_{i_1}(v, w) = ψ_i(w).
Then for any i ∈ I \ {j}, Ψ′ dominates Ψ respecting (i, i_0) and with exponent 1/2.
We note that ψ′_{i_0}, ψ′_{i_1} are indeed surjective, e.g. by observing that the projection V′ → V onto either coordinate is surjective (since ψ_j is).
Proof. As promised, this is just the statement of the Cauchy-Schwarz inequality as it applies in this context. Given f_i : W_i^n → C, we have |Λ_Ψ((f_i)_{i∈I})|² ≤ E_{w∈W_j^n} |E_{v∈V^n : ψ_j(v)=w} ∏_{i≠j} f_i(ψ_i(v))|² by Cauchy-Schwarz (using ‖f_j‖_∞ ≤ 1), and the right hand side expands as an average over pairs in (V′)^n. Defining (g_{i′})_{i′∈I′} in the obvious way (g_{i_0} = f_i, and g_{i_1} the complex conjugate of f_i), and provided ‖f_i‖_∞ ≤ 1 for each i ∈ I, we get the desired inequality.
Definition 3.7. We denote the system Ψ′ defined in Proposition 3.6 by CS_j(Ψ).
Often we need to apply Cauchy-Schwarz not just to one function, but to several at a time. The preferred way of formalizing this for our purposes is in two steps. First, we merge all the functions being considered for Cauchy-Schwarz into a single function. That is, we forget that they are separate functions, and consider their product as just one function of all the variables they collectively depend on. For instance, we might merge f_1(x) and f_2(x + y) into F(x, y) = f_1(x) f_2(x + y). Next, we apply the Cauchy-Schwarz inequality in the form of Proposition 3.6 to the new function F.
In fact we will want to apply this merging operation in other contexts as well, because doing so is one way to eliminate redundant information. Having this ability is one of the main motivations for working in this more general language of linear data.
Again, we encode this operation with a proposition.
Proposition 3.8. Let Ψ = (V, (W_i)_{i∈I}, (ψ_i)_{i∈I}) be a linear datum, let J be a finite set, and let τ : I → J be a surjective function. Define a new linear datum Ψ′ = (V, (W′_j)_{j∈J}, (ψ′_j)_{j∈J}) on the same underlying space V, as follows:
• for each j ∈ J, let ψ′_j : V → ∏_{i∈τ^{-1}(j)} W_i be the map v ↦ (ψ_i(v))_{i∈τ^{-1}(j)};
• define W′_j = im ψ′_j; and
• by abuse of notation consider ψ′_j as a map V → W′_j.
Then for any i ∈ I such that τ^{-1}(τ(i)) = {i} is a singleton, Ψ′ dominates Ψ respecting (i, τ(i)), with exponent 1.
Proof. Given $(f_i)_{i \in I}$, for each $j \in J$ let $g_j \colon \prod_{i \in \tau^{-1}(j)} W_i^n \to \mathbb{C}$ be given by $g_j((w_i)_i) = \prod_{i \in \tau^{-1}(j)} f_i(w_i)$, and then restrict this function to the subspace ${W'_j}^n$. Then it is easy to see that the corresponding averages for $\Psi$ and $\Psi'$ agree, and so the necessary inequality is in fact an equality. Moreover, if $\tau^{-1}(j) = \{i\}$ is a singleton then $g_j = f_i$.
In a slight abuse of notation, we may omit any indices that are unchanged by $\tau$ from the description of $\tau$. For instance, the operation that merges indices $4$ and $7$ and labels the new combined index $A$ might be denoted $\mathrm{MERGE}_{\{4,7\} \to A}$.
We are now in a position to state a version of Theorem 1.5 coded in this language.
Lemma 3.11. Let p be a prime, and let φ 1 , . . . , φ 6 : Z 3 → Z be six linear forms in three variables. Let V = F 3 p , and by abuse of notation let φ i : V → F p for 1 ≤ i ≤ 6 denote the same forms reduced modulo p, assumed to be non-zero. Write W i = F p for 1 ≤ i ≤ 6, set I = [6], and hence define the linear datum Ψ = (V, (W i ) i∈I , (φ i ) i∈I ).
Suppose $[\phi_1], \dots, [\phi_6]$ do not lie on a conic in $\mathbb{P}(V^*)$. Then there is some sequence of operations TRIVIAL, $\mathrm{CS}_j$ and $\mathrm{MERGE}_\tau$ which can be applied to $\Psi$ in turn to produce a final linear datum $\Psi'$, such that:
• $\Psi' = (V, (W_i)_{i \in I}, (\phi'_i)_{i \in I})$ with $\phi'_i = \phi_i$ for $1 \le i \le 4$; i.e., $\Psi'$ again corresponds to $6$ linear forms in $3$ variables over $\mathbb{F}_p$, where only $\phi'_5$ and $\phi'_6$ may have changed;
• by applying Propositions 3.4, 3.6 or 3.8 as appropriate as we go, we can deduce that $\Psi'$ dominates $\Psi$ respecting $(1, 1)$ and with exponent $2^{-m}$, where $m$ is the number of CS steps;
• some triple of the points $[\phi'_1], \dots, [\phi'_6]$ is collinear, while the complementary triple is not.
This last condition means that one of Proposition 2.2 or Proposition 2.3 applies to the forms $\phi'_1, \dots, \phi'_6$, and so the average associated to $\Psi'$ is controlled by a power of $\|f_1\|_{U^2}$. Combining this with the domination statement allows us to deduce Theorem 1.5 (at least for $j = 1$; the other cases follow by relabelling the indices).
Remark 3.12. The combinatorial operations CS j , MERGE τ describe the heart of any strategy, whereas TRIVIAL steps are really just book-keeping to aid with proofs. One could in principle delay all TRIVIAL steps to the end of the argument, or perhaps remove them completely, without fundamentally changing the approach.

3.2.
Graphs of vector spaces. One remaining difficulty in reasoning about the effect of repeated invocations of CS j is finding a good notation for discussing the iterated fiber products that arise in the definition of the ambient vector space V ′ .
At the expense of yet further notational overhead, we introduce one more tool to help with this.
This subsection has very little content beyond allowing us to draw certain diagrams and make sense of what they mean.
Definition 3.13. Let $G = (X, E)$ be a (multi-)graph with vertex set $X$ and edge set $E$, and let $V$ be any vector space over $\mathbb{F}_p$. Suppose that to every edge $e = (x, y) \in E$ is associated a subspace $H_e$ of $V$. Then the vector space $G(V, (H_e)_{e \in E})$ associated to this set-up is the subspace of $V^X$ given by \[ \{ (v_x)_{x \in X} \in V^X : v_x - v_y \in H_e \text{ for each edge } e = (x, y) \in E \} . \] In other words, we place a copy of $V$ at every vertex and impose a compatibility restriction for every edge.
We will always apply this when each subspace H e is one of ker ψ i for 1 ≤ i ≤ 6, where ψ i : V → W i is part of some original linear datum with underlying space V . It then makes sense to label each edge e with a number i ∈ [6], in place of the subspace H e = ker ψ i .
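As a sanity check on Definition 3.13, one can enumerate a small example by brute force: take $V = \mathbb{F}_2^2$ and a path on three vertices whose two edges both carry $H = \ker \psi$ for $\psi(x, y) = x$, so the compatibility conditions force all three first coordinates to agree. The count below (an illustrative toy of ours, not from the text) confirms the expected dimension:

```python
from itertools import product

p = 2
V = list(product(range(p), repeat=2))   # V = F_2^2
psi = lambda v: v[0]                    # psi(x, y) = x, so H = ker psi
H = [v for v in V if psi(v) == 0]

def sub(u, v):
    return tuple((a - b) % p for a, b in zip(u, v))

# vertices {0, 1, 2}; edges (0,1) and (1,2), both labelled by H
members = [(v0, v1, v2)
           for v0, v1, v2 in product(V, V, V)
           if sub(v0, v1) in H and sub(v1, v2) in H]

# expected: one shared first coordinate (p choices) and three free second
# coordinates, so the subspace has dimension 1 + 3 = 4
assert len(members) == p ** 4
```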
The useful feature of this set-up is that CS j steps correspond to simple combinatorial operations on the graph G: we replace X by two copies X 0 , X 1 , keeping all the edges in each half; and we add an edge between X 0 and X 1 for every linear form involved in that Cauchy-Schwarz step (which is applied to the merge of some linear forms). This is best illustrated by example. Suppose we start with a linear datum Ψ = V, (W i ) i∈ [6] , (ψ i ) i∈ [6] .
At this point, the graph G consists of a single vertex and no edges.
If we now apply CS 6 , we get a linear datum whose underlying vector space corresponds to the following graph: Any formal justification of this general pattern would be tedious and unreadable, so we will not attempt one. The reader may, if they wish, treat all such diagrams as visual aids having no formal impact on the proofs.

The detailed strategy for Theorem 1.5
The formalism of the previous section gives us very significant freedom to make radical changes to a linear datum Ψ. However, to prove a general result, what we want is to find a sequence of operations that changes Ψ as conservatively as possible, ideally giving back another datum of the same form with a small predictable change to some of the parameters.
Our task in this section then splits into two parts: (i) to describe such a sequence of operations -henceforth called a block -and analyze and verify the change it produces; and (ii) to show how to chain these blocks together to reach sufficiently arbitrary points in the parameter space.
We will approach these tasks in reverse order.

4.1.
The effect of the block construction. We again write V = F 3 p . Suppose X 1 , . . . , X 6 are six points in P(V * ), corresponding to some system of six linear forms. Suppose furthermore that X 1 , . . . , X 4 are in general position, and that X 5 , X 6 lie on some given line ℓ but X 1 , . . . , X 4 do not lie on ℓ. Note we allow that, say, X 1 X 2 X 5 be collinear, or even that X 5 = X 6 ; in the latter case, ℓ forms part of the data of the set-up since it cannot be recovered from X 1 , . . . , X 6 .
We will now describe an operation that modifies this collection of points. Specifically, it will leave X 1 , . . . , X 4 unchanged, and replace X 5 , X 6 with two different points X ′ 5 , X ′ 6 that both lie on the unchanged line ℓ.
The points X ′ 5 , X ′ 6 are constructed as follows. Let Y be the point at the intersection of the lines X 1 X 5 and X 3 X 4 ; then X ′ 5 is the intersection of X 2 Y and ℓ. Similarly, letting Z be the intersection of X 2 X 6 and X 3 X 4 , the point X ′ 6 is the intersection of X 1 Z and ℓ. This construction is shown in Figure  1.
Given our hypotheses on X 1 , . . . , X 6 , this definition always makes sense.
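This construction is easy to carry out in coordinates: the line through two points of $\mathbb{P}^2$ and the intersection of two lines are both given by cross products of representative vectors. The following sketch runs the recipe on one illustrative configuration (the specific points and the line $x + y + z = 0$ are our own choices, not the paper's) and checks that the output points do land on $\ell$:

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# four points in general position, and a line l avoiding all of them
X1, X2, X3, X4 = (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)
l = (1, 1, 1)                      # the line x + y + z = 0
X5, X6 = (1, 2, -3), (1, -3, 2)    # two points on l

# Y = (X1 X5) ∩ (X3 X4); then X5' = (X2 Y) ∩ l
Y = cross(cross(X1, X5), cross(X3, X4))
X5p = cross(cross(X2, Y), l)

# Z = (X2 X6) ∩ (X3 X4); then X6' = (X1 Z) ∩ l
Z = cross(cross(X2, X6), cross(X3, X4))
X6p = cross(cross(X1, Z), l)

# both new points are well defined and lie on the unchanged line l
for P in (X5p, X6p):
    assert any(P) and dot(P, l) == 0
```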
We call this construction a block operation $B_{1\to2}$, i.e. $B_{1\to2}(X_1, \dots, X_6; \ell) = (X_1, \dots, X_4, X'_5, X'_6; \ell)$. By exchanging the roles of $X_1, X_2, X_3, X_4$ we create a family of 12 operations $B_{i\to j}$ for each pair $i, j \in [4]$, $i \ne j$. Swapping $X_3$ and $X_4$ but leaving $X_1$ and $X_2$ the same gives the same construction, which accounts for the fact that there are 12 operations rather than $4! = 24$, and for the choice of notation.
Figure 1. The block construction
In Section 4.2, we will implement a sequence of CS, MERGE and TRIVIAL operations whose overall effect is equivalent to this $B_{1\to2}$ move. That is, starting with a linear datum corresponding (indirectly) to forms $\phi_1, \dots, \phi_6$ and applying this sequence of operations, we obtain a linear datum corresponding to $B_{1\to2}([\phi_1], \dots, [\phi_6])$. We will not state this result precisely yet, as there are some technical subtleties to do with the case $[\phi_5] = [\phi_6]$, where the previous sentence does not even make sense and the datum has to be modified to encode the line $\ell$ as well as the points. 4 For the time being we will consider the operations $B_{i\to j}$ as a black box. In order to prove Lemma 3.11, we broadly need to show that some sequence of moves $B_{i\to j}$ takes the original $X_5, X_6$ to some final $X''_5, X''_6$, with the property that one of the triples $X_i, X_j, X''_5$ or $X_i, X_j, X''_6$ is collinear for some choice of $1 \le i < j \le 4$, but the complementary triple is not collinear.
The second part of this is guaranteed by the following lemma, which shows that we never lose control of true complexity by applying $B_{1\to2}$ (and symmetrically $B_{i\to j}$ for other pairs $(i, j)$).
Lemma 4.1. Let $(X_1, \dots, X_6; \ell)$ be as above, and suppose:
• if $X_5 \ne X_6$, that $X_1, \dots, X_6$ do not lie on a (possibly degenerate) conic; or
• if $X_5 = X_6$, that $X_1, \dots, X_5$ do not lie on a (possibly degenerate) conic that is tangent to $\ell$ at $X_5$.
Then the same property holds for the configuration $B_{1\to2}(X_1, \dots, X_6; \ell)$.
Footnote 4: Handling this case correctly is an irritating source of complexity in the argument, but seems to be slightly less irritating than avoiding it.
Note that in the degenerate case, saying a degenerate conic consisting of two lines µ 1 ,µ 2 is "tangent" to ℓ translates algebraically to saying that µ 1 , µ 2 , ℓ are concurrent.
Proof. For any point $Y$ on $\ell$, there is a unique (possibly degenerate) conic $C$ passing through $X_1, \dots, X_4$ and $Y$. This conic meets $\ell$ again at precisely one other point, counting multiplicity, which we denote by $\tau(Y)$. So, $\tau$ is an involution $\tau \colon \ell \to \ell$ of the points of $\ell$.
It is clear without doing detailed calculations that $\tau$ is a birational map $\mathbb{P}^1(\mathbb{F}_p) \to \mathbb{P}^1(\mathbb{F}_p)$, and so it must be a Möbius transformation (or one can check this directly). Moreover, if we write $A_{ij}$ for $1 \le i < j \le 4$ for the intersection point of the lines $X_i X_j$ and $\ell$, then $\tau$ is characterized by \[ \tau(A_{12}) = A_{34}, \quad \tau(A_{13}) = A_{24}, \quad \tau(A_{14}) = A_{23} , \tag{4} \] since in all of these cases the conic $C$ is degenerate (a pair of lines through $X_1, \dots, X_4$) and so $\tau(Y)$ can be found by inspection.
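For one concrete configuration, the degenerate-conic characterization of $\tau$ ($\tau(A_{12}) = A_{34}$, $\tau(A_{13}) = A_{24}$, $\tau(A_{14}) = A_{23}$, i.e. the information (4)) pins $\tau$ down completely, and the involution property can be verified by squaring the corresponding $2 \times 2$ matrix. In the sketch below the four points, the line, and the chart $[x : y]$ on $\ell$ are all illustrative choices of ours, and the matrix $M$ was found by hand from the three conditions; the assertions check it:

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

X = {1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1), 4: (1, 1, 1)}
l = (1, 1, 1)  # x + y + z = 0; a point of l is recorded in the chart [x : y]

def A(i, j):
    """A_ij = (X_i X_j) ∩ l, in the chart [x : y] on l."""
    P = cross(cross(X[i], X[j]), l)
    return (P[0], P[1])

def proportional(u, v):
    return u != (0, 0) and v != (0, 0) and u[0]*v[1] - u[1]*v[0] == 0

def act(M, P):  # Moebius action on [x : y]
    return (M[0][0]*P[0] + M[0][1]*P[1], M[1][0]*P[0] + M[1][1]*P[1])

# candidate matrix for tau, found by hand from tau(A12) = A34 etc.
M = ((-1, -2), (2, 1))

# tau swaps the three pairs of the characterization ...
for (i, j), (k, m) in [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]:
    assert proportional(act(M, A(i, j)), A(k, m))
    assert proportional(act(M, A(k, m)), A(i, j))

# ... and M^2 is a scalar matrix, i.e. tau is an involution
M2 = tuple(tuple(sum(M[r][t]*M[t][c] for t in range(2)) for c in range(2))
           for r in range(2))
assert M2[0][1] == M2[1][0] == 0 and M2[0][0] == M2[1][1]
```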
This means that if we can find a sequence of operations $B_{i\to j}$ that takes $X_5$ to some $X''_5$ such that some triple $X_k, X_\ell, X''_5$ for $1 \le k < \ell \le 4$ is collinear, then the complementary triple cannot be collinear, as then $X_1, \dots, X_4, X''_5, X''_6$ would lie on a degenerate conic. So, we can largely forget about $X_6$ in what follows and concentrate on the action of $B_{i\to j}$ on $X_5$, which corresponds to the Möbius transformations $\sigma_{i\to j}$ defined in the proof of Lemma 4.1.
The following lemma explains how to take X 5 to an arbitrary point on ℓ, relatively efficiently, using multiple transformations σ i→j . Lemma 4.2. We continue the notation from the proof of Lemma 4.1. Suppose we identify ℓ with P 1 (F p ) (i.e., choose coordinates) by identifying A 12 with ∞, A 13 with 0 and A 23 with 1 (again noting these are guaranteed to be distinct ). Every Möbius transformation of ℓ now corresponds to a 2 × 2 matrix in PGL 2 (F p ).
Then the following hold: Consequently, the action of σ i→j on P 1 (F p ) is transitive, and more specifically any point [r : s] ∈ P 1 (F p ), where r, s ∈ Z, may be mapped to [1 : 1] using a word of length O(|r| + |s|) or O(log p) in σ i→j .
Proof. The first three identities may be verified using only (4) (and the corresponding statement for the other σ i→j ) to deduce the action on A 12 ,A 13 ,A 23 (which correspond to ∞, 0, 1).
For the next four, that approach does not appear to be sufficient. Our strategy is just to pick coordinates and compute explicitly. Using the information (4), we can compute the matrices of $\sigma_{i\to j}$ explicitly. To see this is possible, note that we can certainly choose a projective transformation sending $X_1, X_2, X_3$ to $[1:0:0]$, $[0:1:0]$ and $[0:0:1]$, as $X_1, X_2, X_3$ are not collinear. In these coordinates, $\ell = \{ (x, y, z) \in \mathbb{P}(V^*) : ax + by + cz = 0 \}$ for some $a, b, c \in \mathbb{F}_p \setminus \{0\}$ (as none of $X_1, X_2, X_3$ lie on $\ell$); by further rescaling we can ensure $a = b = c = 1$. This convention now differs from the one above (which is somewhat more convenient) by a further fixed change of coordinates.
with the other six following from the relation σ i→j = σ −1 j→i . Verifying the remaining formulae is now just an exercise in multiplying (projective) matrices.
By a modified version of Euclid's algorithm, any point $[r : s] \in \mathbb{P}^1$ may be reduced to one of $\infty, 0, 1$ using the matrices $\begin{pmatrix} 1 & \pm 2 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ \pm 2 & 1 \end{pmatrix}$ in $O(|r| + |s|)$ steps (the worst case being something like $[r : 1]$ for $r$ a large integer; the typical case is better). Alternatively, this can be done in $O(\log p)$ steps, since the Cayley graph on $\mathrm{PSL}_2(\mathbb{F}_p)$ with these generators has diameter $O(\log p)$, as a corollary of celebrated results concerning expander graphs; see [12]. Finally, one of the first three matrices moves the end point to $[1:1]$, if necessary.
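The Euclid-style reduction can be simulated directly. We cannot reproduce the precise generator matrices of the argument here (they come from the explicit $\sigma_{i\to j}$ above); the sketch below uses the unipotent matrices $\left(\begin{smallmatrix}1 & \pm2 \\ 0 & 1\end{smallmatrix}\right)$, $\left(\begin{smallmatrix}1 & 0 \\ \pm2 & 1\end{smallmatrix}\right)$ suggested by the text, and checks that every coprime $[r : s]$ in a test range reaches one of $\infty$, $0$, $1$ within $O(|r| + |s|)$ steps:

```python
from math import gcd

def reduce_point(r, s):
    """Drive [r : s] to one of oo = [1:0], 0 = [0:1], 1 = [1:1] by repeatedly
    applying (1 ±2; 0 1) or (1 0; ±2 1), Euclid-style."""
    steps = 0
    while not (s == 0 or r == 0 or r == s):
        if abs(r) >= abs(s):
            r = min(r - 2 * s, r + 2 * s, key=abs)   # act on the first coordinate
        else:
            s = min(s - 2 * r, s + 2 * r, key=abs)   # act on the second coordinate
        steps += 1
    return r, s, steps

for r in range(-30, 31):
    for s in range(-30, 31):
        if (r, s) != (0, 0) and gcd(r, s) == 1:
            rf, sf, steps = reduce_point(r, s)
            assert sf == 0 or rf == 0 or rf == sf
            assert steps <= 2 * (abs(r) + abs(s)) + 4
```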
We should also verify that the values of $[r : s]$ representing the original point $X_5$ are not too large.
Proof. Write $|X_i X_j X_k|$ for the determinant of the $3 \times 3$ matrix whose columns are the vectors $(x_i, y_i, z_i)$, $(x_j, y_j, z_j)$, $(x_k, y_k, z_k)$. Then we set $[r : s]$ in terms of these determinants.
We have one further technical issue to consider. It will be convenient to construct the block that implements $B_{i\to j}$ only in cases where none of the triples $X_i, X_j, X_5$ or $X_i, X_j, X_6$ for $1 \le i < j \le 4$ is collinear. This is not too onerous, as the first time we obtain a set of points where one of these triples is collinear, we can just stop and the conclusion of Lemma 3.11 will be satisfied. However, for this to work we need to check that, when this happens, we are not in one of the degenerate cases where $X_5 = X_6$.
We therefore check the following lemma.
Lemma 4.4. Suppose $X_1, \dots, X_6$ and $\ell$ are as above and $X'_5, X'_6$ are the points returned by the block move $B_{1\to2}(X_1, \dots, X_6; \ell)$. Suppose also that no triple $X_i, X_j, X_5$ or $X_i, X_j, X_6$ for $1 \le i < j \le 4$ is collinear, but that some triple $X_i, X_j, X'_5$ or $X_i, X_j, X'_6$ is collinear. Then $X'_5 \ne X'_6$.

4.2.
Implementing a block move. In this subsection we describe a sequence of CS, MERGE and TRIVIAL operations that have the effect of a block transformation B 1→2 . This is the last but most central ingredient in the proof of Theorem 1.5.
In the course of the argument in Section 4.1, we may need to consider intermediate configurations $(X_1, \dots, X_6) \in \mathbb{P}(V^*)^6$ for which $X_5 = X_6$. Typically we do not expect this case to arise, but it would be onerous to try to avoid it in general. Also, it is not true that in such cases we are immediately done by some easy Cauchy--Schwarz technique as in Section 2: it appears this degeneracy is not one we can use to our advantage.
To handle this, we need to build more slack into our linear datum. We again set V = F 3 p .
There are many roughly equally cryptic ways to phrase this rigorously. Geometrically, what has happened is that we have embedded our projective plane P 2 = P(V * ) in a three-dimensional space P 3 = P((V ⊕ F p ) * ), and each of ψ 1 , . . . , ψ 4 corresponds to the respective point [φ 1 ], . . . , [φ 4 ] in the embedded copy of P 2 . Meanwhile, ψ 5 and ψ 6 correspond to lines in P 3 whose intersections with the embedded P 2 are [φ 5 ], [φ 6 ], and whose canonical projections onto the embedded P 2 are both (contained in, but secretly equal to) the line ℓ.
Proof. Both directions are by TRIVIAL steps (i.e. Proposition 3.4). First we show that Ψ 1 dominates Ψ 2 trivially (respecting 1 ≤ j ≤ 4). Indeed, we may consider the surjective maps θ : V ⊕ F p → V given by θ(v, t) = v, and σ i : W i → W ′ i given by the identity if 1 ≤ i ≤ 4 and σ 5 (x, y) = σ 6 (x, y) = x. It is immediate from our hypotheses that φ i • θ = σ i • ψ i for each i, and the claim follows.
To show $\Psi_2$ dominates $\Psi_1$ trivially (respecting $1 \le j \le 4$), we consider an injective map $\imath \colon V \to V \oplus \mathbb{F}_p$ given by $\imath(v) = (v, \mu(v))$ for some $\mu \in V^*$ which will have to be chosen carefully, together with $\tau_i \colon W'_i \to W_i$ given by the identity if $1 \le i \le 4$ and to be specified when $i = 5, 6$. It suffices to show that $\psi_i \circ \imath = \tau_i \circ \phi_i$ for each $1 \le i \le 6$, under suitable choices. Note that this is already immediate for $1 \le i \le 4$, given our hypotheses. For $i = 5, 6$, we need precisely that $\psi_i(v, \mu(v)) = \tau_i(\phi_i(v))$ for any $v \in V$. If $\mu + \chi_i \in \operatorname{span}(\phi_i)$ for $i = 5, 6$ as elements of $V^*$, so $\mu + \chi_i = \gamma_i \phi_i$ for some $\gamma_5, \gamma_6 \in \mathbb{F}_p$, we could define $\tau_i(x) = (x, \gamma_i x)$ and the equation would be satisfied.
Again it may be instructive to think about this geometrically. In the first part, we used the canonical embedding of P 2 into P 3 discussed above to get our morphism of linear data. In the second part, we chose a particular non-standard projection P 3 → P 2 that collapses the line corresponding to ψ 5 onto [φ 5 ] and that corresponding to ψ 6 onto [φ 6 ].
Finally, we can state a lemma which is the workhorse of the whole argument.
The argument has roughly two phases. In the first phase, our goal is to build a datum corresponding to the following graph of vector spaces over V ⊕ F p : Here the numbers next to the vertices denote two classes corresponding to those indices that get merged into ψ ′ 5 and those that get merged into ψ ′ 6 respectively. The indices at vertex A will turn into ψ ′ 1 , . . . , ψ ′ 4 . It is not possible to construct this graph directly using CS steps, so we have to build a larger graph using CS steps and then prune it back using MERGE and TRIVIAL steps.
In the second phase, we need to apply a carefully chosen TRIVIAL operation to reduce this to a system Ψ ′ defined on a single copy of V ⊕ F p .
Proof of Lemma 4.7. We abbreviate $V \oplus \mathbb{F}_p$ to $V'$. Beginning with the augmented datum $\Psi = (V', (W_i)_{i \in [6]}, (\psi_i)_{i \in [6]})$, we first apply $\mathrm{CS}_6$. Our next task is to prune back all of the $(5, 6)$ squares apart from the bottom right one using MERGE and TRIVIAL steps. We will need the following standard linear algebra fact.
Lemma 4.8. Let $L$ be a finite-dimensional vector space and let $s_1, s_2$ be linear maps defined on $L$. Then there exist linear maps $\sigma_1, \sigma_2 \colon L \to L$ such that:
• $\sigma_1, \sigma_2$ are projections (i.e. $\sigma_i^2 = \sigma_i$) with $\ker \sigma_1 = \ker s_1$ and $\ker \sigma_2 = \ker s_2$; and
• the maps $\sigma_1, \sigma_2$ commute.
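A concrete instance of this fact may help (a toy example of our own, not from the text): on $L = \mathbb{Q}^3$ take $s_1(x, y, z) = x$ and $s_2(x, y, z) = x + y$, so $\ker s_1 \cap \ker s_2 = \langle e_3 \rangle$, and the decomposition in the proof below can be taken to be $K_{00} = \langle e_3 \rangle$, $K_{01} = \langle e_2 \rangle$, $K_{10} = \langle (1, -1, 0) \rangle$, $K_{11} = 0$. The resulting projections are written out and checked exhaustively on a grid:

```python
def matvec(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

s1 = lambda v: v[0]           # ker s1 = <e2, e3>
s2 = lambda v: v[0] + v[1]    # ker s2 = <(1,-1,0), e3>

# sigma1 projects onto K10 = <(1,-1,0)> along ker s1;
# sigma2 projects onto K01 = <e2> along ker s2
sigma1 = ((1, 0, 0), (-1, 0, 0), (0, 0, 0))
sigma2 = ((0, 0, 0), (1, 1, 0), (0, 0, 0))

grid = [(x, y, z) for x in range(-2, 3)
                  for y in range(-2, 3) for z in range(-2, 3)]
for v in grid:
    a, b = matvec(sigma1, v), matvec(sigma2, v)
    # projections: idempotent
    assert matvec(sigma1, a) == a and matvec(sigma2, b) == b
    # same kernels as s1, s2
    assert (a == (0, 0, 0)) == (s1(v) == 0)
    assert (b == (0, 0, 0)) == (s2(v) == 0)
    # they do not change the value of the corresponding map
    assert s1(a) == s1(v) and s2(b) == s2(v)
    # and they commute
    assert matvec(sigma1, b) == matvec(sigma2, a)
```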
Proof. Pick a basis for $\ker s_1 \cap \ker s_2$, and extend it separately to a basis for $\ker s_1$ and a basis for $\ker s_2$; merging these gives a basis for $\ker s_1 + \ker s_2$. Finally extend this to a basis for $L$. This gives a direct sum decomposition $L = K_{00} \oplus K_{01} \oplus K_{10} \oplus K_{11}$ where $\ker s_1 = K_{00} + K_{01}$ and $\ker s_2 = K_{00} + K_{10}$. Let $\sigma_1(x_{00}, x_{01}, x_{10}, x_{11}) = (0, 0, x_{10}, x_{11})$ and $\sigma_2(x_{00}, x_{01}, x_{10}, x_{11}) = (0, x_{01}, 0, x_{11})$. It is clear these maps have the desired properties.
Returning to the main proof, we define a datum $\Psi_2$. (It is not very important, but these maps are all surjective, as can be seen by considering the image of the diagonal embedding $V' \to V'$.) We claim that $\Psi_2$ is dominated trivially by the "pruned" datum $\Psi_3$. To justify this, we first apply Lemma 4.8 to $V'$, $\psi_5$ and $\psi_6$ to obtain maps $\sigma_1, \sigma_2 \colon V' \to V'$. We can then define an injection, modifying by $\sigma_1$ and $\sigma_2$ at the vertices $\omega = 0010, 1010, 1110$.
For this to make sense, we need the compatibility conditions associated to the graph for $V$ to hold. In particular we need compatibility along the edges meeting the modified vertices, and indeed these conditions follow from the properties of $\sigma_1$ and $\sigma_2$. The remaining compatibility conditions are inherited from $V'$.
is just the identity. For R and S, consider that ker ψ as the original constituent forms of ψ R depend only on v ′ 0010 and those of ψ It follows that there exist unique linear maps σ R : W S , respectively. Hence the conditions of Proposition 3.4 are satisfied and Ψ 3 dominates Ψ 2 trivially.
It is natural to relabel R as 5 0110 and S as 6 0110 in Ψ 3 , as these indices now behave exactly like copies of ψ 5 and ψ 6 respectively associated to the vertex 0110.
We now perform our remaining MERGE operation. This partitions all remaining forms, apart from those in the $0000$ copy, into two classes $A$ and $B$.
Finally, we wish to dominate $\Psi_5$ trivially by an augmented datum $\Psi'$. Recall that $\Psi'$ is as follows:
• its base space is $V'$;
• the index set is $\{1, \dots, 6\}$;
• the spaces $W'_i$ are as in the definition of an augmented datum, where $\chi'_5, \chi'_6 \in \ell$ are forms we may choose.
In what follows we identify indices $r_{0000}$ and $r$ for $1 \le r \le 4$, $A$ and $5$, and $B$ and $6$. Our remaining task is therefore to construct linear maps realizing this last TRIVIAL step.
We fix some notation. Again write $X_1, \dots, X_6$ for the points $[\phi_1], \dots, [\phi_6]$ in $\mathbb{P}(V^*)$. Let $Y$, $Z$, $X'_5$, $X'_6$ be defined as above (see Figure 1); that is, $Y$ is the intersection of the lines $X_1 X_5$ and $X_3 X_4$, and $Z$ is the intersection of the lines $X_2 X_6$ and $X_3 X_4$. We regard $V$ as a summand of $V' = V \oplus \mathbb{F}_p$, and we write $T = \operatorname{span}((0, 1))$ for the other summand. We also write $V^*$, $T^*$ for the corresponding dual subspaces of $V'^*$. Let $\xi \in V'^*$ be the linear form $\xi(v, t) = t$, meaning $\operatorname{span}(\xi) = T^*$.
We make a simplifying observation. If $Y = Z$, then $X'_6 = X_5$ and $X'_5 = X_6$, so the effect of the whole block move was just to swap $X_5$ and $X_6$. In this case, the result is trivially satisfied by exchanging the indices 5 and 6 (and ignoring everything we've done up to this point). Hence we can assume $Y \ne Z$ in what follows.
We isolate a linear algebraic lemma which states concretely what is needed for this TRIVIAL step.
Lemma 4.9. There exist subspaces $H'_5, H'_6$ of $\ell + T^*$ (which is itself a subspace of $V'^*$), and linear maps $\tau_1, \tau_2, \tau_3, \tau_4 \colon V' \to V'$, with the following properties:
(i) $\tau_1$ fixes $\psi_3$ and $\psi_4$, in the sense that $\psi_3 \circ \tau_1 = \psi_3$ and $\psi_4 \circ \tau_1 = \psi_4$;
(ii) $\tau_2$ fixes $\psi_5$ and $\psi_6$, in the sense that $\psi_5 \circ \tau_2 = \psi_5$ and $\psi_6 \circ \tau_2 = \psi_6$;
(iii) similarly, $\psi_3 \circ \tau_3 = \psi_3$ and $\psi_6 \circ \tau_4 = \psi_6$;
(iv)-(viii) [further kernel and containment conditions].
Indeed, suppose this lemma holds. We may define the required injection, which is perhaps best summarized by further annotating the above diagram. Statements (i)-(iii) ensure that this map makes sense, i.e. that all the compatibility conditions in the definition of $V''$ are satisfied. By the fact that $H'_5, H'_6 \subseteq \ell + T^*$ and statement (viii), we can complete the definition of $\psi'_5$, $\psi'_6$ and thereby $\Psi'$. Then, statements (iv)-(vii) are precisely what we need to deduce the required kernel containments, and as before this guarantees that there exist unique maps $\nu_5 \colon W'_5 \to W_A$ and $\nu_6 \colon W'_6 \to W_B$ compatible with $\psi_A$ and $\psi_B$.
This is the last ingredient in the proof of Theorem 1.5. We briefly summarize the proof as a whole, as the different parts have been spread over the last few sections.
First one calculates the point [r : s] ∈ P 1 (F p ), given explicitly in Lemma 4.3, corresponding to the point X 5 on ℓ in our chosen coordinates.
Next, we convert the standard datum given into an augmented datum (by Lemma 4.6).
In the main part of the argument, we apply Lemma 4.7 repeatedly, under various permutations of the indices $1, 2, 3, 4$ (i.e., the various block moves $B_{i\to j}$). If at any point we arrive at a datum where $X_i, X_j, X_k$ are collinear for some $1 \le i < j \le 4$ and $k = 5, 6$, we terminate this process early; but if it runs to completion, some such collinearity is guaranteed at the end. By Lemma 4.6 again (and Lemma 4.4) we dominate this by the corresponding standard datum.
Finally, we apply Proposition 2.3, or the standard Cauchy--Schwarz complexity bound (Proposition 1.1), to control this final datum by $\|f_1\|_{U^2}^{1/2}$ or $\|f_1\|_{U^2}$ respectively. By keeping track of the various domination statements, and noting in particular that we did not apply Lemma 4.7 too many times, we deduce the required bound on the original linear datum.

A proof of Theorem 1.7
Here we describe the construction of the counterexample described in Theorem 1.7.
As in the statement, let p ≡ ±1 (mod 8) be a large prime. The congruence condition ensures that 2 is a quadratic residue modulo p. In what follows, we will assume that some choice of square root of 2 in F p has been fixed, and refer to it simply as √ 2.
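The congruence condition is easy to check computationally: $2$ is a quadratic residue modulo $p$ exactly when $p \equiv \pm 1 \pmod 8$ (the second supplement to quadratic reciprocity). A quick brute-force sketch, together with the closed form $2^{(p+1)/4}$, which gives a square root of $2$ whenever $p \equiv 7 \pmod 8$ (the primes and the brute-force search are purely illustrative):

```python
def sqrt2_mod(p):
    """Smallest square root of 2 modulo p, by brute force (illustration only)."""
    for x in range(p):
        if x * x % p == 2 % p:
            return x
    return None

# p ≡ ±1 (mod 8): a square root of 2 exists
for p in [7, 17, 23, 31, 41, 47]:
    assert p % 8 in (1, 7)
    r = sqrt2_mod(p)
    assert r is not None and r * r % p == 2

# p ≡ ±3 (mod 8): 2 is a non-residue
for p in [3, 5, 11, 13, 19, 29]:
    assert sqrt2_mod(p) is None

# when p ≡ 7 (mod 8), a root is given in closed form by 2^((p+1)/4) mod p
for p in [7, 23, 31, 47]:
    r = pow(2, (p + 1) // 4, p)
    assert r * r % p == 2
```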
We let X ⊆ F p denote the two-dimensional arithmetic progression: for some small absolute constant α > 0 to be specified.
Finally we need to consider $\|f\|_{U^2}$. This is a fairly standard estimate on quadratic exponential sums, but with some variations. For simplicity we use a mean value strategy.