Monochromatic sums and products

Suppose that $\mathbb{F}_p$ is coloured with $r$ colours. Then there is some colour class containing at least $c_r p^2$ quadruples of the form $(x, y , x + y, xy)$.


Introduction and notation
The following beautiful question was asked on numerous occasions by Hindman (see, for example, [14, Question 3]) and is very well-known.
Question 1. Suppose that the natural numbers $\mathbb{N}$ are finitely coloured. Do there exist $x$ and $y$ such that $x$, $y$, $x + y$ and $xy$ all have the same colour?
It follows from Schur's theorem [19, Hilfssatz] that the answer is affirmative if either $x + y$ or $xy$ is omitted from the list. (When $x + y$ is omitted, so that we seek $x$, $y$ and $xy$ of the same colour, it is an observation of Graham [13, p1] that we can consider the colouring of $\mathbb{N}$ induced on $\{2^n : n \in \mathbb{N}\}$.) It is also known that the answer to Question 1 is affirmative for 2-colourings; in fact, every 2-colouring of $\{1, 2, \dots, 252\}$ contains a monochromatic quadruple of the stated type, as established by Graham [13, Theorem 4.3]. In general, however, Question 1 is quite open, as indeed is the following considerably weaker statement. (See the remarks following [14, Question 3].)
Question 2. Suppose that the natural numbers $\mathbb{N}$ are finitely coloured. Do there exist $x$ and $y$, not both 2, such that $x + y$ and $xy$ have the same colour?
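Graham's reduction from products to sums can be made concrete. The following is a minimal sketch, not taken from the paper: the function names and the example colouring are hypothetical, and the search bound 5 relies on the standard fact that every 2-colouring of $\{1, \dots, 5\}$ contains a monochromatic solution to $a + b = c$.

```python
from itertools import product

def schur_triple(colour, n_max):
    """Return (a, b) with a <= b, a + b <= n_max and a, b, a + b monochromatic, or None."""
    for a in range(1, n_max + 1):
        for b in range(a, n_max - a + 1):
            if colour(a) == colour(b) == colour(a + b):
                return a, b
    return None

def product_triple(colour, n_max):
    """Find x, y with x, y, xy monochromatic by colouring exponents: n -> colour(2**n)."""
    found = schur_triple(lambda n: colour(2 ** n), n_max)
    if found is None:
        return None
    a, b = found
    return 2 ** a, 2 ** b  # then x * y = 2 ** (a + b)

def every_two_colouring_has_triple(n_max):
    """Brute-force check over all 2-colourings of {1, ..., n_max}."""
    return all(
        schur_triple(lambda n, bits=bits: bits[n - 1], n_max) is not None
        for bits in product(range(2), repeat=n_max)
    )

def example_colouring(n):
    """A hypothetical 2-colouring of the natural numbers, used only for illustration."""
    return (n % 3) % 2
```

Calling `product_triple(example_colouring, 5)` returns a pair of powers of 2 whose product has the same colour as both factors; the additive structure of the exponents is what Schur's theorem acts on.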
In this paper we are concerned with so-called finite field analogues of the above questions, where we replace $\mathbb{N}$ by the finite fields $\mathbb{F}_p$. Schur's theorem was originally designed for application to $\mathbb{F}_p^*$, so it should be of little surprise that its finite field analogue is routine. Shkredov seems to have been the first to address the finite field analogue of Question 2 in [21, Theorem 1.2] (later generalised by Cilleruelo to any finite field [3, Corollary 4.2]) and in fact he proves rather more, namely the following [21, Theorem 1.2].
Theorem 1.1 (Shkredov). Suppose that $A \subset \mathbb{F}_p$ has size $\alpha p$. Then there are at least $c_\alpha p^2$ triples $(x, x + y, xy)$ in $A^3$, for some $c_\alpha > 0$ which does not depend on $p$.
(Technically, Shkredov only states that there is at least one such triple, not $c_\alpha p^2$, but it is obvious that his proof gives this stronger statement.) It follows trivially that if $\mathbb{F}_p$ is $r$-coloured and $p$ is sufficiently large in terms of $r$ then there are non-zero elements $x$ and $y$ such that $x$, $x + y$ and $xy$ all have the same colour. A corresponding result for colourings over $\mathbb{Q}$ (or any countable field) was obtained by Bergelson and Moreira [2, Theorem 1.2] using ergodic-theoretic techniques.
In this paper we solve the finite field analogue of Question 1 by establishing the following.
Theorem 1.2. Suppose that $\mathbb{F}_p$ is $r$-coloured. Then there are at least $c_r p^2$ monochromatic quadruples $(x, y, x + y, xy)$, where $c_r > 0$ does not depend on $p$.
Note that while Theorem 1.1 is a density result, there is no such version of Theorem 1.2, as can be seen by considering the set $\{x \in \mathbb{F}_p : p/3 < x < 2p/3\}$. This has density roughly $1/3$ but does not even contain a set of the form $\{x, y, x + y\}$.
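The sum-free example above is easy to check by machine for small primes. The sketch below is illustrative only (the function names are hypothetical); it represents $\mathbb{F}_p$ by the integers $0, \dots, p - 1$.

```python
def middle_third(p):
    """The set {x in F_p : p/3 < x < 2p/3}, with F_p identified with {0, ..., p-1}."""
    return {x for x in range(p) if p < 3 * x < 2 * p}

def contains_sum(A, p):
    """True if there are x, y in A (possibly equal) with x + y (mod p) also in A."""
    return any((x + y) % p in A for x in A for y in A)
```

For example, `contains_sum(middle_third(101), 101)` is false, even though the set occupies about a third of the field.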
Before diving into the remainder of the paper it may be useful to say that we discuss the outline of the main argument in §4, after setting up some notation in §3. §2 also includes some notation but, more importantly, develops the tools for counting the configurations we are interested in. After that the remaining sections fill out the details of §4.
Notation. We use fairly standard asymptotic notation such as $O()$, $o()$, and $\ll$. We write $o_{M; p \to \infty}(1)$ to mean a quantity tending to 0 as $p \to \infty$, but in a manner that may depend on the parameter $M$. Similarly, for example, $O_\varepsilon(1)$ denotes a constant which may depend on some parameter $\varepsilon$. Throughout the paper $\mathbb{F}$ will denote the finite field with $p$ elements (we do not explicitly indicate the prime $p$). Occasionally we shall write $\mu_{\mathbb{F}}$ for the uniform probability measure on $\mathbb{F}$. As is standard in additive combinatorics we write $\mathbb{E}_{x \in X} = \frac{1}{|X|} \sum_{x \in X}$ for averages over some finite set $X$.

Counting quadruples
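Given functions $f_1, f_2, f_3, f_4 : \mathbb{F} \to \mathbb{C}$ we write $T(f_1, f_2, f_3, f_4) := \mathbb{E}_{x, y \in \mathbb{F}} f_1(x) f_2(y) f_3(x + y) f_4(xy)$, so that if $A \subset \mathbb{F}$ then $p^2 T(1_A, 1_A, 1_A, 1_A)$ is the number of quadruples $(x, y, x + y, xy) \in A^4$.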
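As a sanity check on the normalisation, the counting operator can be evaluated directly for small $p$. The sketch below assumes that $T$ is the average over all pairs $(x, y) \in \mathbb{F}^2$ of $f_1(x) f_2(y) f_3(x + y) f_4(xy)$, which is the normalisation forced by the counting statement; the helper names are hypothetical.

```python
from fractions import Fraction

def T(f1, f2, f3, f4, p):
    """Average of f1(x) f2(y) f3(x+y) f4(xy) over all (x, y) in F_p x F_p (integer-valued f_i)."""
    total = sum(
        f1(x) * f2(y) * f3((x + y) % p) * f4((x * y) % p)
        for x in range(p) for y in range(p)
    )
    return Fraction(total, p * p)

def indicator(A):
    """Indicator function 1_A of a subset A of {0, ..., p-1}."""
    return lambda x: 1 if x in A else 0

def count_quadruples(A, p):
    """Number of pairs (x, y) with x, y, x + y and xy all in A."""
    return sum(1 for x in A for y in A if (x + y) % p in A and (x * y) % p in A)
```

With `one_A = indicator(A)`, the quantity `p * p * T(one_A, one_A, one_A, one_A, p)` agrees with `count_quadruples(A, p)`.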
The additive Fourier transform and counting sums. The quantity $p^2 T(1_A, 1_A, 1_A, 1_{\mathbb{F}})$ is the number of triples $(x, y, x + y) \in A^3$ and this is well understood through the additive Fourier transform. We write $\widehat{\mathbb{F}}$ for the dual group of the additive group of $\mathbb{F}$, and given $f : \mathbb{F} \to \mathbb{C}$ and $\gamma \in \widehat{\mathbb{F}}$ define the (additive) Fourier transform of $f$ by $\hat{f}(\gamma) := \mathbb{E}_{x \in \mathbb{F}} f(x) \overline{\gamma(x)}$.
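Writing $e_p(x) := \exp(2\pi i x/p)$, we know that the elements of $\widehat{\mathbb{F}}$ are just the maps of the form $x \mapsto e_p(rx)$ as $r$ ranges over $\mathbb{F}$, and so we shall frequently identify $\widehat{\mathbb{F}}$ with $\mathbb{F}$ and write $\hat{f}(r)$ for $\hat{f}(\gamma)$ where $\gamma(x) = e_p(rx)$ for all $x \in \mathbb{F}$. As usual the key tools are the inversion formula $f(x) = \sum_{r \in \mathbb{F}} \hat{f}(r) e_p(rx)$ (2.1) and Parseval's theorem $\mathbb{E}_{x \in \mathbb{F}} |f(x)|^2 = \sum_{r \in \mathbb{F}} |\hat{f}(r)|^2$ (2.2). The convolution of two functions $f, g : \mathbb{F} \to \mathbb{C}$ is defined to be $f * g(y) := \mathbb{E}_x f(x) g(y - x)$ for all $y \in \mathbb{F}$.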
It is well-known and easy to prove that $\widehat{f * g}(r) = \hat{f}(r) \hat{g}(r)$ for all $r \in \mathbb{F}$. We define the $u_2^+$-norm of $f$ by $\|f\|_{u_2^+} := \sup_{r \in \mathbb{F}} |\hat{f}(r)|$. Usually, the $u_2^+$-norm is simply denoted $u_2$; we include the plus sign because we shall shortly need the multiplicative analogue. It is easy to see that $\|\cdot\|_{u_2^+}$ is a norm and that it is dominated by $\|\cdot\|_1$. This norm is important because of the following proposition.
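The basic facts of additive Fourier analysis used here can be checked numerically. The following sketch assumes the normalisation $\hat{f}(r) = \mathbb{E}_x f(x) e_p(-rx)$ and represents a function on $\mathbb{F}_p$ as a list of length $p$; the function names are hypothetical.

```python
import cmath

def e_p(x, p):
    """The additive character x -> exp(2 pi i x / p)."""
    return cmath.exp(2j * cmath.pi * x / p)

def fourier(f, p):
    """Fourier transform with the averaging normalisation: fhat(r) = E_x f(x) e_p(-r x)."""
    return [sum(f[x] * e_p(-r * x, p) for x in range(p)) / p for r in range(p)]

def convolve(f, g, p):
    """Normalised convolution: (f * g)(y) = E_x f(x) g(y - x)."""
    return [sum(f[x] * g[(y - x) % p] for x in range(p)) / p for y in range(p)]

def u2_plus_norm(f, p):
    """Largest Fourier coefficient in absolute value."""
    return max(abs(c) for c in fourier(f, p))
```

With these conventions the transform of `convolve(f, g, p)` agrees coefficient by coefficient with the product of the transforms, and `u2_plus_norm(f, p)` never exceeds the average of $|f|$.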
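Proposition 2.1. Suppose that $f_1, f_2, f_3 : \mathbb{F} \to \mathbb{C}$ are such that $\|f_1\|_2, \|f_2\|_2, \|f_3\|_2 \leq 1$. Then $|T(f_1, f_2, f_3, 1_{\mathbb{F}})| \leq \inf_{i \in \{1,2,3\}} \|f_i\|_{u_2^+}$.
Proof. This is very standard. By the inversion formula (2.1) we have $T(f_1, f_2, f_3, 1_{\mathbb{F}}) = \sum_{r \in \mathbb{F}} \hat{f}_1(r) \hat{f}_2(r) \hat{f}_3(-r)$. The stated inequality now follows from the Cauchy-Schwarz inequality and Parseval's theorem (2.2). For example, $|T(f_1, f_2, f_3, 1_{\mathbb{F}})| \leq \sup_r |\hat{f}_1(r)| \sum_r |\hat{f}_2(r)| |\hat{f}_3(-r)| \leq \|f_1\|_{u_2^+} \|f_2\|_2 \|f_3\|_2 \leq \|f_1\|_{u_2^+}$.
Multiplicative characters and counting products. The quantity $p^2 T(1_A, 1_A, 1_{\mathbb{F}}, 1_A)$ counts the number of triples $(x, y, xy) \in A^3$ and this is well understood through the multiplicative Fourier transform. If $\chi : \mathbb{F}^* \to \mathbb{C}$ is a character, we extend $\chi$ to all of $\mathbb{F}$ by setting $\chi(0) = 1$. We write $\widehat{\mathbb{F}^*}$ for the set of all such extended characters and then define the $u_2^\times$-semi-norm of a function $f : \mathbb{F} \to \mathbb{C}$ to be $\|f\|_{u_2^\times} := \sup_{\chi \in \widehat{\mathbb{F}^*}} |\mathbb{E}_{x \in \mathbb{F}} f(x) \overline{\chi(x)}|$. The analogue of Proposition 2.1 is then the following.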
By exactly the same analysis as in the proof of Proposition 2.1, but using the (multiplicative) Fourier transform on F * instead of the additive one, we obtain |T (g 1 , g 2 , g 4 )| p p − 1 sup for each i ∈ {1, 2, 4}. The extra factor of p/(p − 1) comes from the fact that g j 2 L 2 (µ F * ) p p − 1 g j 2 L 2 (µ F ) for all j ∈ {1, 2, 4}.
Counting sums and products. In the light of Proposition 2.1 and Proposition 2.2 one might hope that Unfortunately this is not the case, as the following example shows. Given $\gamma \in \widehat{\mathbb{F}}$ and $\chi \in \widehat{\mathbb{F}^*}$ let Then On the other hand if $\gamma$ and $\chi$ are non-trivial then it may be checked using character sum estimates of the type discussed at the beginning of §7 that $\|f_i\|_{u_2^+}, \|f_i\|_{u_2^\times} \ll p^{-1/2}$. (As this is only a motivating example, the exact details of the proof of this need not concern us.) Write $Q(\mathbb{F})$ for the set of quadratic phases on $\mathbb{F}$, that is to say $Q(\mathbb{F}) := \{x \mapsto e_p(rx^2 + sx) : r, s \in \mathbb{F}\}$.
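Quadratic phases are the simplest functions that are invisible to the additive Fourier transform: by the classical evaluation of Gauss sums, for an odd prime $p$ and $r \neq 0$ the average of $e_p(rx^2 + sx)$ over $x \in \mathbb{F}$ has absolute value exactly $p^{-1/2}$. The sketch below checks this numerically; it is an illustration under that standard fact, and the function name is hypothetical.

```python
import cmath

def quadratic_phase_average(r, s, p):
    """E_x e_p(r x^2 + s x) over x in F_p, for an odd prime p."""
    return sum(
        cmath.exp(2j * cmath.pi * ((r * x * x + s * x) % p) / p)
        for x in range(p)
    ) / p

def max_deviation_from_sqrt_bound(p):
    """Largest gap between |E_x e_p(r x^2 + s x)| and p^(-1/2) over r != 0 and all s."""
    target = p ** -0.5
    return max(
        abs(abs(quadratic_phase_average(r, s, p)) - target)
        for r in range(1, p) for s in range(p)
    )
```

Since every Fourier coefficient of a quadratic phase with $r \neq 0$ is itself such an average, this shows that these phases have additive Fourier coefficients of size $p^{-1/2}$, as small as Parseval's theorem allows for a function of modulus 1.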
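In the above example, these quadratic phases are mixed with multiplicative characters. This suggests the following definition, which is a key definition in our paper.
Definition 2.1 (QM-norm). Suppose that $f : \mathbb{F} \to \mathbb{C}$ is a function. Then we define $\|f\|_{QM} := \sup_{\phi \in Q(\mathbb{F}), \chi \in \widehat{\mathbb{F}^*}} |\mathbb{E}_{x \in \mathbb{F}} f(x) \phi(x) \chi(x)|$, which is easily seen to be a norm.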
The reason this is such an important definition for us is the following fact, which is the main result of the section. It says that in a sense these quadratic-multiplicative examples are the only ones affecting the count Then We shall begin the proof of this shortly, but first we must introduce one final norm.
The $u_3^+$-norm. Higher-order variants of the $u_2^+$-norm examining correlation with quadratic phases have been widely studied since the ground-breaking paper [6] of Gowers. We shall only need a fragment of his ideas here. We define the $u_3^+$-norm by $\|f\|_{u_3^+} := \sup_{\phi \in Q(\mathbb{F})} |\mathbb{E}_{x \in \mathbb{F}} f(x) \overline{\phi(x)}|$. We have the chain of inequalities A key ingredient in the proof of Proposition 2.3 is the following result which, in view of (2.4), is in fact stronger than that result when $i = 3$.
and such that $f_3$ satisfies the slightly weaker bounds $\|f_3\|_2 \leq 1$ and $\|f_3\|_\infty \leq p^{1/16}$. Then
Remark. Of course, the conclusion is valid under the stronger assumption that $\|f_3\|_\infty \leq 1$, but it will be helpful to allow weaker bounds when applying Lemma 2.6 below.
The proof follows the ideas of [6], although the additional multiplicative structure makes the argument considerably simpler. In particular we make no use of the Balog-Szemerédi-Gowers theorem or any results in the direction of Freiman's theorem. Following Gowers we introduce some notation. For any f : The operator acts as a difference operator in the exponent so that if φ ∈ Q(F) then ∆ h φ is (a constant times) a linear character, and that character itself depends linearly on h. As in [6], we use a fairly straightforward converse of this fact.
Lemma 2.5. Suppose that $f : \mathbb{F} \to \mathbb{C}$ has $\|f\|_2 \leq 1$. Then
Proof. Let $h \in \mathbb{F}^*$ and $r \in \mathbb{F}$ be arbitrary. Since $h \neq 0$ we can write $g(x) := f(x) e_p(-rx^2/2h)$ and note that $\|g\|_2 = \|f\|_2 \leq 1$ and that $\|g\|_{u_2^+} \leq \|f\|_{u_3^+}$. Applying the basic facts of additive Fourier analysis we have The result follows.
Proof of Proposition 2.4. If z ∈ F * then, as (x, y) ranges over F × F, so does (xz −1 , zy). It follows that Averaging over all z, we have By Cauchy-Schwarz and the inequality f 4 ∞ 1 it follows that For fixed z 0 , z 1 , we apply Cauchy-Schwarz to the inner average over x. We obtain Since f 1 ∞ 1 (in fact f 1 4 1 would be enough) this is bounded above by Substituting back in to (2.5), we see that Write y = y 0 and h = y 1 − y 0 . The pair (y, h) ranges uniformly over F × F as (y 0 , y 1 ) does. Evidently If z ∈ F * , we write m z : L ∞ (F) → L ∞ (F) for the operator defined by (m z g)(x) := g(zx) for all x ∈ F. Then and Therefore by (2.6) we have By Cauchy-Schwarz once more (and the fact that f 2 ∞ 1) we have We have arranged the three averages in this way to make the next step clearer. If z 2 0 = z 2 1 then the inner average over y is precisely There are at most 2(p − 1) pairs satisfying the second condition, so we have where in the second line we used the inequality f 3 ∞ p 1/16 , and in the third (the additive) Parseval's identity and the fact that convolution goes to multiplication. Now observe that for any g ∈ L ∞ (F) and z ∈ F * we have Thus It follows from (2.7) that Using Parseval's identity, we see that the contribution to this average from h = 0 is Thus by Parseval's identity (since z ranges over F * ) and interchanging the order of summation we have However, for z ∈ F * we have and hence We are now almost ready to apply Lemma 2.5 to (at last) bound this in terms of the u + 3 -norm of f 3 . To do this, we need only extend the z-average to include z = 0. We have Now we use f 3 ∞ p 1/16 , (2.8) and Lemma 2.5 to obtain which is precisely the stated result.
DISCRETE ANALYSIS, 2016:5, 48pp. Now we deduce Proposition 2.3 itself. A key ingredient in this process is the following decomposition result, reminiscent of results of "Koopman von Neumann" type. There are closely-related results in [7,8,9,10,17]. However, in our particular setting it is not too hard to establish what we need quite directly. Lemma 2.6. Suppose that f : F → C has f 2 1 and that ε 4p −1/8 is a parameter. Then there are complex numbers λ φ , φ ∈ Q(F), such that Proof. Define Write I for the set of φ such that λ φ = 0, and let I ⊂ I. By Cauchy-Schwarz we have |λ φ | 1 for all φ . Using this and the Gauss sum estimate On the other hand from the definition of the λ φ s and the Cauchy-Schwarz inequality we have Comparing this with (2.9) (and using f 2 1) gives This is true for all I ⊂ I. Since |λ φ | ε/2 for all φ ∈ I, it follows that With the assumption that ε 4p −1/8 if follows that if |I| 8/ε 2 then we can take m ∈ N with 8/ε 2 m < 8/ε 2 + 1 contradicting the above inequality. Therefore Taking I = I in (2.9) and (2.10), then using (2.11), we have (2.12) Taking I = I in (2.10), then using (2.11), we have It follows from this and the Gauss sum estimate that if φ ∈ I then Taken together, (2.12), (2.13), (2.14) and (2.15) cover all the statements we claimed, and the proof is complete.
We are finally ready for the proof of Proposition 2.3, the main result of this section. Set ε. By Proposition 2.4 we have, since ε 8p −1/16 , provided p is sufficiently large absolutely which we may certainly assume. Thus and hence that there is some φ ∈ Q(F) for which Suppose that φ (t) = e p (at 2 + bt), and write g 1 (t) := f 1 (t)e p (at 2 + bt), g 2 (t) := f 2 (t)e p (at 2 + bt) and g 4 (t) := f 4 (t)e p (2at). Then By Proposition 2.2 it follows that inf i∈{1,2,4} and so inf i∈{1,2,4} g i QM δ 5 .
This is precisely (2.16), so the proof is concluded.
To conclude this section we state Proposition 2.3 in a qualitatively equivalent form, more useful for our later applications.

QM-systems and related concepts
We begin by defining the notion of a QM-system, an important concept in our paper. As is fairly standard, we write $S^1 := \{z \in \mathbb{C} : |z| = 1\}$.
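We shall also require a non-standard piece of notation: define $G := (\mathbb{R}/\mathbb{Z})^2 \times S^1$. The group $G$ is of course abelian, but it is convenient to use juxtaposition for the group operation; thus $(\theta_1, \theta_2, z)(\theta_1', \theta_2', z') = (\theta_1 + \theta_1', \theta_2 + \theta_2', zz')$. We shall often be considering the product group $G^d$, and we use the same convention there.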
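A QM-system of dimension $d$ is a map $\Psi : \mathbb{F} \to G^d$ of the form $\Psi(x) = (a_i x^2/p, 2a_i x/p, \psi_i(x))_{i=1}^d$, where $a_i \in \mathbb{F}$ and $\psi_i \in \widehat{\mathbb{F}^*}$. Recall from §2 that we extend $\psi_i$ to all of $\mathbb{F}$ by setting $\psi_i(0) = 1$.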
Given a QM-system Ψ it will be helpful to write Ψ ⊃ Ψ to mean that Ψ extends Ψ in the sense that Ψ has dimension d d and An important fact for us is that the "orbit" {Ψ(x) : x ∈ F} is highly equidistributed inside a certain closed subgroup of G d . To explain what this group is, we need to make some definitions. Let Ψ be a QM-system as above. Then we associate to Ψ the sublattices of Z d , and so both Λ + Ψ and Λ × Ψ have full rank. With these lattices defined we introduce the closed subgroups These two groups G + Ψ and G × Ψ are both closed subgroups of compact groups ((R/Z) d and (S 1 ) d respectively) and as such they carry natural Haar probability , and put the natural probability measure on this group. By abuse of notation, we regard this as a probability measure on G d as well (which is permissible, as H Ψ is a subgroup of G d ). Note that Ψ(F) ⊂ H Ψ . It turns out that Ψ(F) is close to being equidistributed in H Ψ . To formulate this fact in a convenient form (Proposition 3.1 below), we need a further definition.
We call this a norm, although "measure of complexity" would be more accurate. We have as well as the shift property that if we define We call those functions F : G d → C with finite trig norm trigonometric polynomials. They are dense in C(G d ) (with the sup norm). This follows from the Stone-Weierstrass theorem: the trigonometric polynomials form an algebra and contain the characters, hence separate points. (This could also be established directly using harmonic analysis.) We turn now to the promised equidistribution statement. We call it the "baby counting lemma" as it is a simpler cousin of one of the key ingredients of our work which we shall call the counting lemma.
Proposition 3.1 (Baby counting lemma). Let Ψ be a QM-system of dimension d, and let F : G d → C be a trigonometric polynomial. Then we have The proof of Proposition 3.1 relies on estimates for character sums twisted by quadratic phases. We recall what we need on this topic at the beginning of §7, and give the proof of Proposition 3.1 later in that same section.
Since S 1 already has a notion of absolute value inherited from C, we have to be slightly careful and for z ∈ S 1 write where 1 2πi log z naturally takes values in R/Z and so we can use the notation in (3.3). These combine in the obvious way for G so that Thus | · | is used in several different ways in our paper and in any given situation its meaning must be inferred from context. This should not be difficult.
The last piece of notation we need is for boxes in G d : given ε > 0 define The group H Ψ may look rather complicated with respect to | · | (on G d ). (Informally, it may "wind around" G d a very large number of times). However, we do at least have some control, as shown in Lemma 3.3 below. We deduce that lemma from the following more general fact which will be useful in §6.

Lemma 3.2.
Suppose that G is a compact abelian group with Haar measure µ G , and that π : Taking x = π(g) and integrating over G, we obtain Hence there is some θ such that Note that this proof is really just the same as that of [25,Lemma 4.19].
Proof. Apply the previous lemma with $\pi$ being the restriction to $H_\Psi$ of the natural homomorphism from $G^d$ to $(\mathbb{R}/\mathbb{Z})^{3d}$.
A useful corollary of Proposition 3.1 and Lemma 3.3 is the following assertion.
then the following is true.
For every d-dimensional QM-system Ψ and h ∈ H Ψ we have Proof. We use the fact that the trigonometric polynomials are dense in • F 0 0 outside X(ε), the box defined in (3.4); • F 0 1 on X(ε/2); • F 0 2 on G d and the second equality being a consequence of the invariance of Haar measure. Since F 0 1 on X(ε/2) and F 0 −ε everywhere, it follows from Lemma 3.3 that . It remains to set This function p 1 is monotonic in the desired sense, and the result follows since there is some j ∈ Z 0 with ε 2 − j ε/2 and Monotonic functions. By invoking the Stone-Weierstrass theorem we have taken a soft approach to approximating intervals by trigonometric polynomials in Corollary 3.4. This adds a slight technical complexity because the constant behind the O ε,d (1) term in the proof could, in principle, depend in a very peculiar way on ε and d. This complexity will crop up in other parts of the argument and we shall deal with it by ensuring that "universal functions" (in the above case p 1 ) are monotonic in a suitable sense.
We take the following convention: all our "universal functions" will be of the form f : The two places we have encountered monotonicity so far are in Corollary 3.4 where p 1 is monotonic in the above sense and Corollary 2.7 where ν is monotonic.
Finally we come to a crucial definition in the paper. It should not come as a surprise that we have an analogue of the usual lower bound for the size of Bohr sets [25,Lemma 4.19].

Structure of the main argument
We turn now to a discussion of the basic form of the rest of the argument. The basic strategy is to work by induction on the number of colours, but to get this to work effectively we must establish a more general statement. It may be useful to recall the notion of monotone function we are using from the discussion after Corollary 3.4. Remarks. This immediately gives our main result. One may think of c as an "almost" colouring (or more precisely a (1 − η(δ , d, r))-almost r-colouring). The nature of our inductive arguments necessitates the consideration of almost colourings in addition to true colourings.
Proof of Theorem 1.2. Apply Proposition 4.1 with d = 0, δ = 1/2 andB = B = F. Since (0, 0, 0 + 0, 0 · 0) is always monochromatic we see that every colouring contains at least 1 monochromatic quadruple. It follows that the total number of monochromatic quadruples is at least The result is proved.
The proof of Proposition 4.1 combines three fairly substantial pieces: a regularity lemma, a counting lemma, and a Ramsey-theory result. These three ingredients and their proofs can be understood more-or-less independently, the only common features being the language of §3.
Regularity lemma. The basic principle of the regularity lemma is that it allows one to replace an arbitrary colouring of F by one that is induced from a "nice" colouring of This is a well-trodden idea which of course goes back to Szemerédi [23]. Perhaps closer to our particular instance is the arithmetic development in [11,Theorem 5.2]; the probabilistic framing in [24,Theorem 2.11]; and the combination in [12, Theorem 1.2].
Suppose that Ψ is a d-dimensional QM-system of width δ , B := B(Ψ, δ ),B ⊂ B has |B| (1 − ε 2 100 )|B| for some parameter ε ∈ (0, 1], and c :B → [r] is an r-colouring ofB. Then there is a QM-system Ψ ⊃ Ψ of some dimension d , functions F 1 , . . . , F r : G d → R 0 , and functions g 1 , . . . , g r : The proof of the regularity lemma involves an "energy increment" argument of a type familiar to experts, but some of the technical details are a little tricky. The proof is given in §5.
Counting lemma. As usual we complement the regularity lemma with a counting lemma. There is always a trade-off between the effort needed to prove a regularity lemma and its companion counting lemma; in our work the former contains essentially all of the difficulties.
The basic idea of the counting lemma is that if we have a colouring on $\mathbb{F}$ pulled back from a colouring of $G^d$ under a QM-map (as provided by the regularity lemma) then the number of monochromatic quadruples $x, y, x + y, xy$ in $\mathbb{F}$ is related to the number of monochromatic copies of a certain type of linear configuration in $G^d$: specifically, if $y$ is constrained to lie in a small QM-Bohr set then $\Psi(x), \Psi(x + y), \Psi(xy)$ correspond to triples of the form $(t, u, v), (t + u', u, v'), (t', u', v)$. To get an idea of why this is so consider the case when $d = 1$. In this case we can put $\Psi(x) = (ax^2/p, 2ax/p, \chi(x))$ and the colouring of $\mathbb{F}$ is pulled back from a colouring of $G$. If $\Psi(y) \approx \mathrm{id}_G = (0, 0, 1)$ then the remaining three points $\Psi(x)$, $\Psi(x + y)$, and $\Psi(xy)$ have constraints resulting from the identities $a(x + y)^2 = ax^2 + 2a(xy) + ay^2$, $2a(x + y) = 2ax + 2ay$ and "$\chi(xy) = \chi(x)\chi(y)$" (where the quotation marks in the last are because our characters $\chi$ are not 0 at 0), which imply that the first coordinate of $\Psi(x + y)$ is approximately the sum of the first coordinate of $\Psi(x)$ and the second coordinate of $\Psi(xy)$, that the second coordinates of $\Psi(x + y)$ and $\Psi(x)$ approximately agree, and that the third coordinates of $\Psi(xy)$ and $\Psi(x)$ approximately agree. This tells us that $\Psi(x), \Psi(x + y), \Psi(xy)$ have approximately the form $(t, u, v)$, $(t + u', u, v')$, and $(t', u', v)$ respectively. The counting lemma is a much stronger statement than this, essentially asserting that the above is the only type of constraint that occurs.
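The coordinate constraints in the case $d = 1$ can be verified exactly for a small prime. The sketch below is an illustration only: it assumes the one-dimensional map $x \mapsto (ax^2/p, 2ax/p, \chi(x))$ with $\chi$ the quadratic character extended by $\chi(0) = 1$, and all function names are hypothetical.

```python
from fractions import Fraction

def chi(x, p):
    """Quadratic character modulo an odd prime p, extended by chi(0) = 1."""
    if x % p == 0:
        return 1
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1

def qm_map(x, a, p):
    """x -> (a x^2 / p, 2 a x / p, chi(x)); the first two coordinates are taken mod 1."""
    return (Fraction(a * x * x % p, p), Fraction(2 * a * x % p, p), chi(x, p))

def constraints_hold(x, y, a, p):
    """Check the three exact identities linking Psi(x), Psi(y), Psi(x + y), Psi(xy)."""
    t, u, v = qm_map(x, a, p)
    t_y, u_y, v_y = qm_map(y, a, p)
    t_s, u_s, _ = qm_map((x + y) % p, a, p)
    _, u_p, v_p = qm_map((x * y) % p, a, p)
    first = (t_s - (t + u_p + t_y)) % 1 == 0    # a(x+y)^2 = ax^2 + 2a(xy) + ay^2
    second = (u_s - (u + u_y)) % 1 == 0         # 2a(x+y) = 2ax + 2ay
    third = v_p == v * v_y or (x * y) % p == 0  # chi(xy) = chi(x) chi(y) away from 0
    return first and second and third
```

When the coordinates of $\Psi(y)$ are close to $(0, 0, 1)$, the error terms `t_y`, `u_y` and `v_y` in these identities are small, which is the source of the approximate shape of the three points.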
The counting lemma is established using harmonic analysis in §7. It relies on bounds for certain character sums (mixing quadratic phases and shifts of multiplicative characters), which in turn use some fairly deep number-theoretic inputs. These issues are discussed at the beginning of §7.
Ramsey lemma. Finally, we need a result of a Ramsey-theoretic nature. The counting lemma above transfers our nonlinear question about quadruples x, y, x + y, xy to a question about linear configurations, but we must still solve this linear problem. A toy version of the result we need is the following: if G is a sufficiently large abelian group and if c : . We do indeed prove such a statement, but again we need something a little more complicated for the purposes at hand, designed to dovetail with the conclusions of the counting lemma.
Proposition 4.4 (Ramsey lemma). There is a monotonic function ρ : Z 0 × Z 0 × (0, 1] → (0, 1] with the following property. Suppose that X and Y are compact Abelian groups with Haar probability measures µ X , µ Y , that π X : X → (R/Z) d and π Y : Y → (R/Z) d are continuous homomorphisms, and that F 1 , . . . , F r : This result is established in §6. We again proceed by induction on the number of colours (thus, taken as a whole, our paper has two nested inductions on the number of colours). The basic scheme of the argument is inspired by work of Cwalina and Schoen [4] on Rado's theorem, but the details are quite different. We make critical use of the "dependent random choice" technique pioneered by Gowers [6].
Proposition 4.1. We shall shortly give the proof of Proposition 4.1, assuming the regularity, counting and Ramsey lemmas. First we note that the counting and Ramsey lemmas may be combined to give the following statement. . Then for any sets S 1 , . . . , S r with S i ⊂ B(Ψ , κ(r, d, δ , F i trig )) and µ F (S i ) σ there is some i such that provided that p p 2 (M, d , r, σ , δ ). (Where ρ is as in the Ramsey lemma, Proposition 4.4.) Remark. A crucial point here is that neither κ nor ρ depends on d . If they did, our arguments would be circular.
Proof. Apply the counting lemma (Proposition 4.3) to each of the F i s and the QM-system Ψ to get that trig ) for some absolute c > 0, which satisfies the relevant monotonicity properties since ρ does.
Let ε := δ /12πrd (1 + M) 2 ; the reason for this choice will become clear later. We should like to apply the Ramsey lemma to which end we put X := G + Ψ and Y := G × Ψ then H Ψ = X × X × Y . Moreover, if we write π X : G + Ψ → G + Ψ , and π Y : G × Ψ → G × Ψ for the respective projections onto the first d-coordinates then these are well-defined continuous homomorphisms, as is π : Two things follow from this. First, |π(h) −1 π(Ψ (z))| ε, so if |π(h)| δ /4 then |Ψ(z)| 1 4 δ + ε 1 2 δ by the triangle inequality i.e. z ∈ B(Ψ, 1 2 δ ). Secondly, it is easy to see that the functions F i are Lipschitz in the sense that so they are continuous, but we also have We conclude from these two facts and the hypothesis that We now apply the Ramsey lemma (Proposition 4.4) to the functions 2F i to get that for some i ∈ [r]. This gives the result provided which is easily seen to be monotone. The proposition is proved.
Finally, we record a fairly simple application of the Cauchy-Schwarz inequality. Then Proof. By the Cauchy-Schwarz inequality we have The proofs for i ∈ {3, 4} are very similar.
We are now ready for the proof of Proposition 4.1.
Suppose first that |S i | 1 2 η(δ i , d , r − 1)|B i | for some i. Then by the lower bound for the density of QM-Bohr sets (Lemma 3.5); the definition of η; the fact that d D r,d,δ and δ i δ ; and the monotonicity of η(·, ·, r − 1) and β , we have It follows that c restricts to an (r − 1)-colouring of B i \ (B ∪ S i ) where, by the triangle inequality, we have By the inductive hypothesis and monotonicity of ζ (·, ·, r − 1) we conclude that there are at least pairs (x, y) with x, y, x + y, xy all the same colour. Thus in this case the result is proved.

The second case is that |S
by the lower bound for the density of QM-Bohr sets (Lemma 3.5), the monotonicity of η(·, ·, r − 1) and β , and the fact that d D r,d,δ and δ i δ .
Note that by (6) and the triangle inequality we have Thus by the triangle inequality, Lemma 4.6 and item (3) above we have By the choice of ε, and provided Of course by (4.1) we see and so (4.2) holds provided p p 0 (r, d, δ ).
It follows from the hypothesis on the size of the S i s, and the lower bound on the density of QM-Bohr sets that By the generalised von Neumann theorem (Corollary 2.7), monotonicity of ν −1 , the choice of Ω and (4) above, this implies that By monotonicity of Ω r,d,δ and the fact that d D r,d,δ and F i trig M r,d,δ , this follows from p p 0 (δ , d, r) as defined. By the monotonicity of η(·, ·, r − 1) and β , and the pointwise inequality 1 S i 1 c −1 (i) we conclude that The result is proved given the choice of ζ .
Remark. The above argument is not straightforward, in particular with regard to checking that the parameters do not depend on one another in a circular manner. A different way to arrange the arguments might be via the use of an ultraproduct. However, this introduces a considerable amount of additional language, which propagates out to other sections as well. Therefore, even though on some conceptual level an ultraproduct formulation could be the "right" way to phrase the argument, we have chosen not to follow this route.

Proof of the regularity lemma
In this section we prove the regularity lemma, Proposition 4.2. The reader may wish to recall its statement. The proof proceeds using an "energy-increment" argument of a type that will be familiar to experts. However, it takes some effort to sort out the technical details specific to our situation. Let d be a non-negative integer (which d we are talking about at any given point will be clear from context). We begin by describing a partition of G d into certain boxes. Let R > 0 be a power of 2. Suppose that t, u, v ∈ {0, 1, . . . , R − 1} d . Then we define generalised intervals It is worth making a few remarks about these sets.
(i) For each R the set (ii) We restrict R to be a power of 2 as a technical convenience, to ensure that if R > R then I R is a refinement of I R .
(iii) The √ 2 here is present as a technical device to help control edge effects later on; it has the usual property of being poorly approximated by rationals. The only point in the argument at which it is relevant is in the proof of estimate (5.6), a technical point in the proof of Lemma 5.3.
Let $\Psi = (a_i x^2/p, 2a_i x/p, \psi_i(x))_{i=1}^d$ be a QM-system of dimension $d$. Suppose that $R > 0$ is a power of 2. Where $\Psi$ is clear like this we write $A_{R;t,u,v} := \Psi^{-1}(I_{R;t,u,v})$ for each $t, u, v \in \{0, \dots, R - 1\}^d$.
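The sets $\{A_{R;t,u,v} : t, u, v \in \{0, \dots, R - 1\}^d\}$ form a partition of $\mathbb{F}$ (possibly with the addition of the empty set) since $I_R$ is a partition of $G^d$ and, in particular, generate a $\sigma$-algebra $\mathcal{B}$. We define the associated projection operator at resolution $R$ to be $\Pi_R^\Psi f(x) := \mathbb{E}(f | \mathcal{B})(x) = \mathbb{E}_{x' \in A(x)} f(x')$, where $A(x)$ is the atom containing $x$. As with any conditional expectation operator, $\Pi_R^\Psi$ is self-adjoint. Indeed $\langle f, \mathbb{E}(g | \mathcal{B}) \rangle = \mathbb{E}_{x} \sum_{x'} f(x) \overline{g(x')} 1_{A(x)}(x') \frac{1}{|A(x)|}$, and the kernel $1_{A(x)}(x') \frac{1}{|A(x)|}$ is symmetric in $x$ and $x'$ so this also equals $\langle \mathbb{E}(f | \mathcal{B}), g \rangle$.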
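The formal properties of such projections depend only on the partition into atoms, not on where the partition comes from. The sketch below is a generic illustration with hypothetical names: it takes real-valued functions given as lists, and an arbitrary labelling of points by atoms in place of the atoms $A_{R;t,u,v}$.

```python
from fractions import Fraction

def project(f, atom_of):
    """Conditional expectation: replace f(x) by the average of f over the atom containing x."""
    sums, sizes = {}, {}
    for x, label in enumerate(atom_of):
        sums[label] = sums.get(label, 0) + f[x]
        sizes[label] = sizes.get(label, 0) + 1
    return [Fraction(sums[label], sizes[label]) for label in atom_of]

def inner(f, g):
    """Normalised inner product E_x f(x) g(x) for real-valued f and g."""
    return sum(Fraction(a) * b for a, b in zip(f, g)) / len(f)
```

For instance, with `atom_of = [x % 4 for x in range(11)]` the operator is self-adjoint and idempotent, and projecting onto a coarser partition after a finer one gives the same result as projecting onto the coarser partition directly.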
Lemma 5.1. Suppose that f has f ∞ 1 and f QM δ . Let R Cδ −1 be a power of 2. Then there is a QM-system Φ of dimension at most 2 and a function g ∈ L ∞ (F) with g ∞ 1, such that | f , Π Φ R g | δ .
Remark. The functions of the form $\Pi_R^\Phi g$, where $\|g\|_\infty \leq 1$, are precisely those functions which are bounded by 1 in modulus and are constant on atoms $A_{R;t,u,v}$.
Proof. It follows from the definition of the QM-norm (Definition 2.1) that, under the hypotheses on f , there are a 1 , a 2 and a multiplicative function ψ such that Then Now since F is quite a smooth function, we have g ≈ Π Ψ R g. More precisely, since the Lipschitz constant of F is O(1). It follows from this and (5.1) that if R > Cδ −1 with C large enough then | f , Π Φ R g | δ , and the result follows.
The next lemma is a result of "Koopman von Neumann" type. In establishing this we shall need the following property of the projection operators Π Ψ R : if Ψ ⊃ Ψ and R|R , then This is because each atom associated to Π Ψ R is a union of atoms associated to Π Ψ R . Lemma 5.2. Suppose that f 1 , . . . , f r : F → C are such that f i ∞ 1 for all i ∈ {1, . . . , r}. Let Ψ be a QMsystem, and let R Cδ −1 be a power of 2. Then there is a Proof. Define a nested sequence Ψ =: Ψ 0 ⊂ Ψ 1 ⊂ Ψ 2 ⊂ . . . of QM-systems with dim Ψ j = dim Ψ + 2 j in the following manner. For j = 0, 1, 2, 3, . . . , proceed as follows. Set f i, j : . . , r} then stop; otherwise, by Lemma 5.1, there is an i ∈ {1, . . . , r}, some QM-system Φ of dimension 2 and a function g ∈ L ∞ (F) with g ∞ 1, such that | f i, j , Π Φ R g | δ . In this case, set Ψ j+1 := Ψ j ∪ Φ and note by idempotence of Π Φ R and (5.2) we have By the Cauchy-Schwarz inequality we conclude and so (expanding out the L 2 -norm square and using (5.2)) it follows that Hence, defining the energy As long as this process continues, we obviously have the trivial bound E j r. It follows that the process terminates after at most O(rδ −2 ) steps, and the result follows. For all f : F → [0, 1], d-dimensional QM-systems Ψ, and parameters R > 0 a power of 2, and ε ∈ (0, 1] there is a function F : Proof. Set For each t, u, v ∈ {0, 1, . . . , R − 1} d we define an "η-enlargement" and an "η-reduction" of I R;t,u,v by We then define the "boundary" to be and make two claims: We shall establish these claims later, but give the rest of the proof first. Since the functions on G d with bounded trig norm are dense in C(G d ) we see that there are functions F R;t,u,v : G d → R and a function M * with the following properties: 1. 0 F R;t,u,v 1 + ε/10R 3d pointwise; 2. F R;t,u,v (z) 1 for all z ∈ I R;t,u,v ; 3. |F R;t,u,v (z)| ε/10R 3d for all z ∈ I + R;t,u,v ; 4. F R;t,u,v trig M * (ε, d, R).
The function M * need not be monotonic but this can be quickly fixed. For reasons that will become clear later we shall in fact put which is monotonic.
The function Π Ψ R f is constant on atoms A R;t,u,v . If A R;t,u,v is non-empty then write λ R;t,u,v for the value of Π Ψ R f on this set and if A R;t,u,v = / 0 we set λ R;t,u,v = 0, so that the λ R;t,u,v s are all non-negative. Then we define It follows that Π Ψ R f = F 0 • Ψ, and since the F R;t,u,v s and the λ R;t,u,v s are non-negative, by (2) above we have We now show that F satisfies the relevant properties.
By Claim B we have that $z \notin I^{+}_{R;t',u',v'}$ for any $(t', u', v') \ne (t, u, v)$. Hence by (3) we have $|F_{R;t',u',v'}(z)| \le \varepsilon/(10R^{3d})$ whenever $(t', u', v') \ne (t, u, v)$, and so On the other hand, if $z \in E$ then we just have the trivial bound

It follows that
and we have (iii) by Claim A.
(iv) Finally, using the above we have from which (iv) follows.
A suitable choice of $p_3$ can be made given the definition of $\eta$ and the hypothesis of Claim A. We now turn to establishing the claims.
and so it is enough to establish the following statements for any $a \in \mathbb{F}^*$, non-trivial $\psi \in \widehat{\mathbb{F}^*}$ and for any interval $J \subset \mathbb{R}/\mathbb{Z}$ of the form $J = \frac{j}{R} +$ and $\#\{x \in \mathbb{F} :$ The claim then follows from allowing $j$, $a$, $\psi$ to range over all choices from the sets $\{0, \dots, R-1\}$, $\{a_1, \dots, a_d\}$ and $\{\psi_1, \dots, \psi_d\}$ respectively. Of course we must still establish (5.4), (5.5) and (5.6).
Proof of (5.6). The image of $\mathbb{F}$ under $\frac{1}{2\pi i}\log\psi$ is $\{0, \frac{1}{Q}, \dots, \frac{Q-1}{Q}\}$ for some $Q$, and each point is hit the same number $\frac{p-1}{Q}$ of times as $x$ ranges over $\mathbb{F}^*$. The number of the points $\{0, \frac{1}{Q}, \dots, \frac{Q-1}{Q}\}$ lying in $J$ is at most $1 + 2\eta Q$, and so Without a lower bound on $Q$, this is useless. To obtain a lower bound on $Q$, we assume that there exists at least one $x \in \mathbb{F}^*$ for which $\frac{1}{2\pi i}\log\psi(x) \in J$. (Otherwise (5.6) really is trivial, provided $p \ge C\eta^{-1}$.) Suppose that for this point we have $\frac{1}{2\pi i}\log\psi(x) = \frac{q}{Q}$. Then we have $\frac{q}{Q} - \frac{j}{R} - \sqrt{2}\,\eta$, and so $QR$

Proof of Claim B: We apply the triangle inequality for each $j \in \{1, \dots, d\}$ to get Since $-(R-1) \le t_j - t'_j \le R-1$ it follows that $t_j = t'_j$ for all $j \in \{1, \dots, d\}$ and similarly for $u$ and $u'$, and $v$ and $v'$. The claim is proved, and this concludes the proof of Lemma 5.3.
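The equidistribution fact used at the start of the proof of (5.6) can be checked numerically for small primes. The following is a minimal sketch, not part of the argument: the helper names `primitive_root` and `phase_counts` and the parametrisation $\psi(g^k) = e(km/(p-1))$ of a multiplicative character by an integer $m$ are assumptions made purely for illustration.

```python
from collections import Counter
from fractions import Fraction


def primitive_root(p):
    """Return the smallest generator of the multiplicative group of F_p (p an odd prime)."""
    for g in range(2, p):
        # g generates F_p^* exactly when its powers take p - 1 distinct values
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def phase_counts(p, m):
    """Count how often each value of (1 / 2 pi i) log psi(x) occurs for x in F_p^*.

    Here psi is the (illustrative) character with psi(g^k) = exp(2 pi i k m / (p - 1)),
    where g is a primitive root; phases are stored as exact fractions in [0, 1).
    """
    g = primitive_root(p)
    discrete_log = {pow(g, k, p): k for k in range(p - 1)}
    counts = Counter()
    for x in range(1, p):
        counts[Fraction(discrete_log[x] * m % (p - 1), p - 1)] += 1
    return counts


# p = 13 and m = 4 give a character of order Q = 3.
counts = phase_counts(13, 4)
print(sorted(counts.items()))
```

With $p = 13$ and $m = 4$ the phases are $0, \frac13, \frac23$, so $Q = 3$, and each phase is attained $(p-1)/Q = 4$ times.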
We are now ready for the proof of the regularity lemma itself.

With these definitions in hand we are ready for the proof. Suppose that $p \ge p_{\Omega}(r, d, \delta, \varepsilon)$. Suppose, further, that $\Psi$ is a $d$-dimensional QM-system of width $\delta$, that $B := B(\Psi, \delta)$, that $\tilde{B} \subset B$ has $|\tilde{B}| \ge (1 - \frac{\varepsilon^2}{100})|B|$, and finally that $c : \tilde{B} \to [r]$ is an $r$-colouring. We extend $c : \tilde{B} \to [r]$ to a full colouring $c : B \to [r]$ in some arbitrary way such that On the $j$th such application we apply Lemma 5.2 with parameter $R_j$, obtaining nested QM-systems for all $i \in \{1, \dots, r\}$. It is convenient to write $P_j := \Pi^{\Psi_j}_{R_j}$ for short; it is also convenient to write $\mathcal{B}_j$ for the $\sigma$-algebra on $\mathbb{F}$ generated by Thus $P_j = \mathbb{E}(\cdot \mid \mathcal{B}_j)$. Note that, since $\Psi_j \subset \Psi_{j+1}$ and $R_j \mid R_{j+1}$ (since $R_j$ and $R_{j+1}$ are powers of 2), $\mathcal{B}_{j+1}$ is a refinement of $\mathcal{B}_j$ and hence $P_j = P_j P_{j+1}$. By Pythagoras' theorem we have so by the pigeonhole principle and the choice of $J$ there is some $j \le J$ such that Fix this value of $j$, and set $d_j := \dim \Psi_j$ (thus $d_j \le d^{\max}_j$). Let $l \in \mathbb{N}$ be minimal such that $2^{-l} \le \varepsilon$, and apply Lemma 5.3 to $f_i$ for each $i \in \{1, \dots, r\}$. Since

We are now ready to describe the functions $F_1, \dots, F_r, g_1, \dots, g_r$ in the regularity lemma. Set $d := d_j$, , and for all $i \in [r]$. Note with this choice that the $g_i$s all map into $[-1, 1]$ as required. We claim that these functions $F_i$ and $g_i$ do indeed verify (1) to (6) of the regularity lemma; we check these points in turn.
Point (1). This is immediate from (ii) above since

Point (2). This is immediate from (5.10) since that tells us

Point (3). We know from point (iii) above, (5.12), and (5.9) above that By the choice of $\delta_{j+1}$, this is at and $\Omega$ is monotonic, it follows that this is at most $1/\Omega(r, d, \delta, \|F_i\|_{\mathrm{trig}}, d)$ and (4) follows.

Point (5). By design we have
On the other hand, by (i) we have $F^{j}_i \circ \Psi_j \ge P_j f_i$ pointwise, and so summing over $i$ we have since $P_j$ is linear. However, since $R_j \ge 2\delta^{-1}$ every atom of $\mathcal{B}_j$ which meets $B(\Psi, \delta/2)$ is entirely contained in $B$, and so $P_j 1_B \ge 1$ pointwise on $B(\Psi, \frac{1}{2}\delta)$.

Point (6). This follows immediately from (iv).
The result is proved.

Some results in Ramsey theory
In this section we state and prove some auxiliary results of a Ramsey-theoretic nature, the main aim being to establish Proposition 4.4. A model for the type of result we are interested in is the following: if $\mathbb{N} \times \mathbb{N}$ is finitely coloured, then there is a monochromatic triple of distinct elements $(t_1, u)$, $(t_2, u)$, $(t_3, t_2 - t_1)$. Such a result is certainly true, and can be easily proved as follows.
Given an $r$-colouring of $[N]^2$, consider the colouring induced on $\{(1, 1), \dots, (N, 1)\}$ which is certainly at most an $r$-colouring. By van der Waerden's theorem [26] there is a monochromatic arithmetic progression If there is some $t_3$ such that $(t_3, jd)$ has the same colour as $\Delta$ for some $1 \le j \le M$ then we have a suitable monochromatic triple, namely $(x, 1)$, $(x + jd, 1)$, $(t_3, jd)$. If there is no such $t_3$, then the colour of $\Delta$ does not occur on $\{(id, jd) : 1 \le i, j \le M\}$, which is therefore a dilated copy of $[M]^2$ coloured with at most $r - 1$ colours, and one can conclude by induction on $r$.

Of course this result is just the tip of the iceberg, and there is a vast generalisation available in recent work of Bergelson, Johnson, and Moreira [1]. Indeed, [1, Corollary 3.7] contains the above as a special case by taking (in the language of that result) $G := \mathbb{N}_0^2$, $m := 1$, $c : G \to G$ to be the identity, and letting $F_1$ be the set containing the two maps While these extensions are clearly interesting we need to generalise our model in a different direction. In particular, we need not just one monochromatic triple but many triples. A result of this type extending Rado's theorem was established in [5, Theorem 1], and extended to the case of torsion groups in [20, Theorem 1.3]. In particular it is worth noting that [20, Section 7.1] identifies some difficulties that emerge in the presence of torsion.
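The model statement at the start of this section can be explored by brute force on small grids. The sketch below is illustrative only: the function name `find_triple` and the parity colouring are assumptions chosen for the example, and the exhaustive search says nothing about the quantitative bounds needed later.

```python
from itertools import product


def find_triple(colour, N):
    """Search [N]^2 for distinct points (t1, u), (t2, u), (t3, t2 - t1) of one colour.

    `colour` is any function of two integers; returns the first triple found, or None.
    """
    pts = range(1, N + 1)
    for t1, t2, u, t3 in product(pts, repeat=4):
        diff = t2 - t1
        if diff < 1:  # (t3, t2 - t1) must lie in [N]^2; this also forces t1 != t2
            continue
        triple = ((t1, u), (t2, u), (t3, diff))
        distinct = len(set(triple)) == 3
        monochromatic = len({colour(*pt) for pt in triple}) == 1
        if distinct and monochromatic:
            return triple
    return None


# Example: the 2-colouring of [6]^2 by the parity of t + u.
print(find_triple(lambda t, u: (t + u) % 2, 6))
```

The search costs $O(N^4)$ colour evaluations, so it is only practical for very small $N$.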
We turn, now, to our arguments. Let us recall the statement we are aiming to prove.
We shall prove this via the following result.
Proposition 6.1. Suppose that $G$ is a compact Abelian group, that $T \subset G$ is open, and that $A_1, \dots, A_r$ are the measurable colour classes of some $r$-colouring on $T \times (T - T)$. Then Here, $\mu_T := \mu_G(T)^{-1}\mu_G$, where $\mu_G$ is the normalised Haar measure on $G$, which makes sense since $T$ is open and so is measurable and of positive measure.
We have written the bound here explicitly, first to show that it may be taken to be monotonically decreasing in $r$, and secondly because it is not too far from the right order. Indeed, suppose $G = \mathbb{F}_2^r$ and we have a $(2r + 1)$-colouring of the pairs $(t, u) \in G \times (G - G)$ with colour classes defined by where in both cases $i$ ranges over $\{1, \dots, r\}$. There are no triples $(t, u)$, $(t', u)$, $(t'', t + t')$ all lying in $A_i$ for any $i > 0$ since this would imply that $t_i = t'_i$ and so $(t + t')_i = 0$, a contradiction. Hence all the monochromatic triples lie in $A_0$, which implies that their total measure is at most $4^{-r}$.
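The phenomenon in this example can be checked by exhaustive enumeration for small $r$. Since the displayed definition of the colour classes is not reproduced above, the colouring in the following sketch is a hypothetical choice that is merely compatible with the surrounding argument (pairs with $u = 0$ form the exceptional class, and otherwise the colour records the first non-zero coordinate $i$ of $u$ together with $t_i$); it should not be read as the authors' definition.

```python
from fractions import Fraction
from itertools import product


def colour(t, u):
    """Hypothetical (2r + 1)-colouring of pairs (t, u) of vectors over F_2."""
    for i, ui in enumerate(u):
        if ui == 1:  # i is the first coordinate in which u is non-zero
            return (i + 1, t[i])  # two classes for each i, split by the value of t_i
    return 0  # u = 0: the exceptional class


def monochromatic_measure(r):
    """Proportion of (t, t', t'', u) with (t, u), (t', u), (t'', t + t') in one class."""
    G = list(product((0, 1), repeat=r))
    hits = 0
    for t, t1, t2, u in product(G, repeat=4):
        s = tuple((a + b) % 2 for a, b in zip(t, t1))  # t + t' in F_2^r
        if colour(t, u) == colour(t1, u) == colour(t2, s):
            hits += 1
    return Fraction(hits, len(G) ** 4)


print(monochromatic_measure(2))
```

For this particular colouring the only monochromatic triples have $u = 0$ and $t = t'$, so the proportion is exactly $4^{-r}$; the enumeration over $2^{4r}$ quadruples is feasible only for small $r$.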
Before turning to the proof of Proposition 6.
the last step here being a consequence of the Cauchy-Schwarz inequality. Let B ⊂ X × X be the set of pairs (t, u) such that |π

By hypothesis we have
and so by averaging It follows that the $A_i$ restrict to give an $r$-colouring of $T \times (T - T)$, where $T := \{x \in X : |\pi_X(x)| \le \frac{1}{8}\delta\}$. Since $\mu_X(T) \ge (\delta/8)^d$ by Lemma 3.2, the claim now follows from Proposition 6.1.
The remaining task of the section, then, is to establish Proposition 6.1. A key ingredient of this is a "dependent random selection" result of the type pioneered by Gowers [6]. It is given in almost this specific form as [25,Lemma 6.17] the proof of which is itself contained in [22,Lemma 4.2]. We need a weighted version of their result so we provide a self-contained proof, though the argument is more-or-less identical.
Lemma 6.2. Suppose that $(X, \nu_X)$ and $(Y, \nu_Y)$ are probability spaces, $A \subset X \times Y$ is measurable with $(\nu_X \times \nu_Y)(A) = \alpha$, and let $\eta \in (0, 1]$ be a parameter. Then there is a measurable set $X' \subset X$ with $\nu_X(X') \ge \frac{1}{2}\alpha$ such that the set In words, a (weighted) proportion $1 - \eta$ of the "edges" in $X'$ have many common neighbours in $Y$.
Proof. For $x \in X$, write $N_Y(x) := \{y \in Y : (x, y) \in A\}$ and for $y \in Y$ write $N_X(y) := \{x \in X : (x, y) \in A\}$. By Fubini's theorem we have with both sides being equal to By the Cauchy-Schwarz inequality and the identity we see that By definition of $E$ we have Putting this together with (6.3), we see that or in other words (by Fubini's theorem) In particular, there is some specific choice of $y$ for which ($N_X(y)$ is measurable and) For this $y$ set $X' := N_X(y)$, and we have both whence $\nu_X(X') \ge \frac{1}{2}\alpha$, and

In what follows we shall be working in products $T$ $(t_1, t_2, t_3, t_4, t_5)$.
In this notation, Proposition 6.1 may be restated as follows: if $c : T \times (T - T) \to [r]$ is a colouring then there is some $i \in [r]$ for which $\Lambda_T(c^{-1}(i)) \ge r^{-O(r)}$. We shall establish this by induction on the number of colours (one of two places in our paper where we do this, the other being in the proof of Proposition 4.1). To carry this out we need to prove a slightly stronger statement.
Proof. We proceed by induction on $r$, the result being vacuously true when $r = 0$ since $\varepsilon_0 = \frac{1}{2}$ and there are no 0-colourings of non-empty sets. Suppose we know the result for $r - 1$, and that we have a partial $r$-colouring of $T \times (T - T)$ as described.
Certainly $\varepsilon_r < \frac{1}{2}$, so by the pigeonhole principle there is some $i$ for which $\delta_T(A_i) \ge 1/(2r)$, where $A_1, \dots, A_r$ are the colour classes of $c$. We shall apply Lemma 6.2 with $X = T$, $Y = T - T$, $A = c^{-1}(i)$ and with $\eta = \frac{1}{4}\varepsilon_{r-1}$. Let $\mu_X$ be the probability measure induced on $T$ by the Haar probability measure $\mu_G$ and let the measure on $Y$ be given by $\mu_Y := \mu_T * \mu_{-T}$. The lemma outputs a measurable set $T'$ (that is, $X'$ in the lemma) with $\mu_T(T') \ge 1/(4r)$ and a measurable set $Z \subset T' \times T'$ consisting of a proportion at least $1 - \frac{1}{4}\varepsilon_{r-1}$ of all pairs $(t_1, t_2)$ in $T' \times T'$ such that Suppose in the first instance that $\delta_{T'}(A_i) \ge \frac{1}{2}\varepsilon_{r-1}$. Then The alternative is that $\delta_{T'}(A_i) < \frac{1}{2}\varepsilon_{r-1}$. Noting that This concludes the proof.

The baby counting lemma and the counting lemma
The objective of this section is to prove the baby counting lemma from §3 and the counting lemma from §4, the first of these acting as a kind of warm up to the second.
The proofs require various lemmas on exponential sums with characters, and we begin by assembling these. There is a great deal of general theory on this topic; see, for example, [15,Chapter 11]. We develop just what we need for our application, namely Proposition 7.2 below, eschewing any temptation to seek the strongest available bounds. In fact, a bound of o(1) times the trivial bound is all we need.
Shkredov [21] makes use of the following result from Johnsen [16] which he (Shkredov)

Remark. Note, of course, that our multiplicative characters are extended to the whole of $\mathbb{F}$ in a slightly different way to usual since they are 1 at 0. This can only add an error of size $t$ in the sum in Theorem 7.1 which will be of no consequence to us.
With this in hand we are in a position to prove our key proposition.
Proof. Suppose first that $\chi = \chi' = 1$. Then the sum reduces to $\mathbb{E}_x e_p(ax^2 + bx)$, and it follows immediately from the standard Gauss sum estimate that this is bounded by $O(p^{-1/2})$ unless $a = 0$; if $a = 0$ then the sum is 0 by orthogonality of characters unless $b = 0$.

Suppose, then, that we do not have by the Gowers-Cauchy-Schwarz inequality (see, for example, [25, Equation (11.6)]; here $\|\cdot\|_{U^3}$ is the Gowers $U^3$-norm on $\mathbb{F}$, discussed in many places including [25, Chapter 11], and $C$ is the complex conjugation operator). Thus it suffices to establish that $\|F\|_{U^3} = o(1)$, uniformly in $\chi$, $\chi'$ and $h$; we shall in fact establish the stronger bound $\|F\|_{U^3} = O(p^{-1/16})$. In other words, we shall prove that where $\omega \cdot z := \omega_1 z_1 + \omega_2 z_2 + \omega_3 z_3$. Since $h \ne 0$, there are $O(p^2)$ triples $z \in \mathbb{F}^3$ such that there are some $\omega, \omega' \in \{0, 1\}^3$ with $h + \omega \cdot z = \omega' \cdot z$. For all other triples we may apply Theorem 7.1 to see that

Equation (7.1) follows immediately, and so does the proposition.
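The trivial-character case at the start of the proof rests on the classical evaluation of quadratic Gauss sums, which is easy to confirm numerically. The following sketch is only a sanity check: the function name `normalised_quadratic_sum` is an assumption, and $e_p(y)$ is taken to mean $\exp(2\pi i y/p)$.

```python
import cmath
import math


def normalised_quadratic_sum(p, a, b):
    """Return E_x e_p(a x^2 + b x), the average over x in F_p of exp(2 pi i (a x^2 + b x) / p)."""
    total = sum(
        cmath.exp(2j * math.pi * ((a * x * x + b * x) % p) / p) for x in range(p)
    )
    return total / p


p = 101
print(abs(normalised_quadratic_sum(p, 3, 7)), p ** -0.5)  # a != 0: modulus p^(-1/2)
print(abs(normalised_quadratic_sum(p, 0, 5)))             # a = 0, b != 0: essentially 0
print(abs(normalised_quadratic_sum(p, 0, 0)))             # a = b = 0: the trivial value 1
```

For an odd prime $p$ and $a \ne 0$ the modulus is exactly $p^{-1/2}$, which is the $O(p^{-1/2})$ bound quoted in the proof.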
This concludes our discussion of character sum estimates. We turn now to the proofs of the counting lemma and baby counting lemma. Before proceeding it may help the reader to recall some of the definitions from §3, particularly the definition of the trig-norm, Definition 3.2. We begin by recalling some basic facts about duality.

Lemma 7.3. Suppose that $\Lambda \subset \mathbb{Z}^d$ is a lattice of full rank. Write $G^{+} := \{g \in (\mathbb{R}/\mathbb{Z})^d : \xi \cdot g = 0 \text{ for all } \xi \in \Lambda\}$. Suppose that $\lambda \in \mathbb{Z}^d$ satisfies $\lambda \cdot g = 0$ for all $g \in G^{+}$. Then $\lambda \in \Lambda$.
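A small explicit instance may help fix ideas for Lemma 7.3. The sketch below takes the hypothetical lattice $\Lambda = 2\mathbb{Z} \times 3\mathbb{Z} \subset \mathbb{Z}^2$ (an assumption chosen for illustration, as are all the names in the code), computes its annihilator in $(\mathbb{R}/\mathbb{Z})^2$ with exact fractions, and checks on a box of integer vectors that the vectors annihilating it are exactly those in $\Lambda$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example lattice: Lambda = 2Z x 3Z, with basis (2, 0), (0, 3).
BASIS = [(2, 0), (0, 3)]


def in_lattice(lam):
    """Membership test for Lambda = 2Z x 3Z."""
    return lam[0] % 2 == 0 and lam[1] % 3 == 0


def dot_mod_one(xi, g):
    """Return xi . g reduced modulo 1, as an exact fraction in [0, 1)."""
    return sum(x * y for x, y in zip(xi, g)) % 1


# Elements g of (R/Z)^2 with xi . g = 0 for all xi in Lambda. It suffices to test
# the basis, and for this lattice every such g has denominator dividing 6.
candidates = [(Fraction(a, 6), Fraction(b, 6)) for a, b in product(range(6), repeat=2)]
G_plus = [g for g in candidates if all(dot_mod_one(xi, g) == 0 for xi in BASIS)]


def annihilates(lam):
    """True if lam . g = 0 in R/Z for every g in G_plus."""
    return all(dot_mod_one(lam, g) == 0 for g in G_plus)


print(len(G_plus))
print(all(annihilates(lam) == in_lattice(lam) for lam in product(range(-6, 7), repeat=2)))
```

Here $G^{+}$ has six elements, matching the index of $\Lambda$ in $\mathbb{Z}^2$, and the final check is a finite verification of the conclusion of the lemma in this one example rather than a proof.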
(In fact, the full rank hypothesis is unnecessary, but it is satisfied in our applications.) An easy consequence of this is the following.
By the orthogonality relations (7.2) and (7.3), this is precisely the left-hand side of (7.5).
Remark. We did not make any use of the fact that $\hat{F}$ was supported where $\|\xi_1\|_1, \|\xi_2\|_1, \|\xi_3\|_1 \le M$ in this argument, but this will be important in the next argument.