Logarithmic bounds for Roth's theorem via almost-periodicity

We give a new proof of logarithmic bounds for Roth's theorem on arithmetic progressions, namely that if $A \subset \{1,2,\ldots,N\}$ is free of three-term progressions, then $\lvert A\rvert \leq N/(\log N)^{1-o(1)}$. Unlike previous proofs, this is almost entirely done in physical space using almost-periodicity.


Introduction
We shall prove here the following version of Roth's theorem on arithmetic progressions.¹

Theorem 1.1. Let $r_3(N)$ denote the largest size of a subset of $\{1, 2, \ldots, N\}$ with no non-trivial three-term arithmetic progressions. Then
$$r_3(N) \ll \frac{(\log\log N)^7}{\log N}\, N.$$

Roth [8] proved this with a denominator of $\log\log N$ in the 1950s, laying the foundation for using harmonic analysis to tackle problems of an additive nature in rather arbitrary sets of integers. Subsequent improvements were made by Heath-Brown [6] and Szemerédi [13], increasing the denominator to $(\log N)^c$ for some positive constant $c$, and then by Bourgain [2,3], obtaining such a bound with $c = \frac{1}{2} - o(1)$ and then $c = \frac{2}{3} - o(1)$. Sanders [10,9] then proved this with $c = \frac{3}{4} - o(1)$ and was then the first to reach the logarithmic barrier in the problem, obtaining $c = 1 - o(1)$. The best bounds currently known were then given by the first author [1]:
$$r_3(N) \ll \frac{(\log\log N)^4}{\log N}\, N.$$

Sanders's result [9] had a power of 6 in place of the 4 here, but the two techniques were quite orthogonal: [1] proceeds by getting structural information about the spectrum of the indicator function of a set $A$ with few three-term progressions, whereas [9] employed a result on the almost-periodicity of convolutions [5] due to Croot and the second author, coupling this with a somewhat intricate combinatorial thickening argument on the physical side.
This article presents a fairly simple proof of logarithmic bounds for Roth's theorem, showing that they follow quite directly from almost-periodicity results along the lines of [5]. Our focus is on clarity of exposition, and we therefore do not take steps to optimise the power of the log log N term that we would obtain.
2010 Mathematics Subject Classification. 11B30; 11K70; 28C10.
¹ For details of the asymptotic notation we use, see the next section.

Notation, main theorem and outline of proof
Notation for averaging and counting. The argument proceeds by studying high $L^p$-norms of the convolution $1_A * 1_A$ of the indicator function of a set $A$ with itself. We use the following conventions for these objects. Let $G$ be a finite abelian group and let $f, g : G \to \mathbb{C}$ be functions. We define the convolution $f * g : G \to \mathbb{C}$ by
$$f * g(x) = \sum_{y} f(y)\, g(x - y).$$
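As a small illustration of the discrete normalisation of convolution, the following sketch computes $1_A * 1_A$ on a cyclic group $\mathbb{Z}/n\mathbb{Z}$. The choice of group, the set $A$ and the helper names `convolve` and `indicator` are assumptions made only for this example.

```python
def convolve(f, g, n):
    """Return f * g on Z/nZ, where (f * g)(x) = sum_y f(y) g(x - y)."""
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]


def indicator(subset, n):
    """Indicator function 1_A of a subset A of Z/nZ, as a list of 0/1 values."""
    members = {a % n for a in subset}
    return [1 if x in members else 0 for x in range(n)]


n = 7
A = {0, 1, 3}  # hypothetical example set
one_A = indicator(A, n)

# conv[x] counts ordered pairs (a, b) in A x A with a + b = x (mod n).
conv = convolve(one_A, one_A, n)
print(conv)       # [1, 2, 1, 2, 2, 0, 1]
print(sum(conv))  # 9, which is |A|^2
```

With this normalisation the total mass of $1_A * 1_A$ is $|A|^2$, which is the quantity that the $L^p$-norms in the argument are compared against.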
In considering $L^p$-norms on subsets of $G$, it will be convenient to sometimes use sums and to sometimes use averages. To distinguish between these, we write, for $B \subseteq G$,
$$\|f\|_{\ell^p(B)}^p = \sum_{x \in B} |f(x)|^p \quad\text{and}\quad \|f\|_{L^p(B)}^p = \mathbb{E}_{x \in B} |f(x)|^p,$$
where $\mathbb{E}_{x \in B} = \frac{1}{|B|}\sum_{x \in B}$. If we write just $\|f\|_p$ then we mean $\|f\|_{L^p(G)}$. As usual $\|f\|_\infty = \sup_{x \in G} |f(x)|$. We also write

Finally, if $A \subseteq B \subseteq G$, we write $1_B$ for the indicator function of $B$, and $\mu_B$ for both the function $1_B/|B|$ and for the measure $\mu_B(A) = |A|/|B|$; this latter quantity is known as the relative density of $A$ in $B$. In the case $B = G$, this is known simply as the density of $A$.
Where we have chosen discrete normalisations, the reader who is used to 'compact normalisations' should find comfort in the fact that much of what we shall consider is normalisation-independent. For example, regardless of normalisation-convention, the function $1_A * \mu_B$ is always

This immediately implies Theorem 1.1, by embedding a subset of $\{1, \ldots, N\}$ into $G = \mathbb{Z}/(2N + 1)\mathbb{Z}$ in the natural way, so that a (non-trivial) 3AP found in the set in $G$ is also a (non-trivial) 3AP in the original set.
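The embedding step can be checked by brute force on small examples. The sketch below compares non-trivial 3APs of a set of integers in $\{1, \ldots, N\}$ with non-trivial 3APs of the same set viewed modulo $2N + 1$, and also shows that a smaller modulus can create wrap-around progressions. The function names and the example sets are assumptions for illustration only.

```python
from itertools import product


def three_aps_mod(A, modulus):
    """Non-trivial 3APs (x, x+d, x+2d), d != 0, inside A viewed in Z/modulus."""
    S = {a % modulus for a in A}
    return {(x, (x + d) % modulus, (x + 2 * d) % modulus)
            for x in S for d in range(1, modulus)
            if (x + d) % modulus in S and (x + 2 * d) % modulus in S}


def three_aps_int(A):
    """Non-trivial 3APs (x, y, z) of integers in A, i.e. x + z = 2y with x != z."""
    S = set(A)
    return {(x, (x + z) // 2, z) for x, z in product(S, S)
            if x != z and (x + z) % 2 == 0 and (x + z) // 2 in S}


N = 4
A = [1, 2, 4]  # hypothetical 3AP-free subset of {1, ..., N}

print(three_aps_int(A))             # set(): no 3APs over the integers
print(three_aps_mod(A, 2 * N + 1))  # set(): none modulo 2N + 1 either
print(three_aps_mod(A, N + 1))      # contains (1, 4, 2): 1, 4, 7 = 2 (mod 5)
```

The modulus $2N + 1$ works because $x + z - 2y$ lies strictly between $-(2N+1)$ and $2N+1$ for $x, y, z \in \{1, \ldots, N\}$, so a congruence modulo $2N + 1$ forces equality over the integers.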
To prove Theorem 2.1, we employ a density increment strategy following the framework of Roth [8].
Density increments. Starting with $A \subseteq G$ of density $\alpha$, we show that if $A$ has few 3APs then there is a structured part $B \subseteq G$ (in some cases a genuine subgroup) such that some translate of $A$ has increased density on $B$, namely relative density at least $(1 + c)\alpha$, where $c > 0$. Such a condition is succinctly summarised by $\|1_A * \mu_B\|_\infty \geq (1 + c)\alpha$. We then repeat the argument with $G$ replaced by $B$ and $A$ replaced by $A_2$, the part of this translate of $A$ lying in $B$: if $A_2$ has few 3APs, then we find a new structured piece and a new, denser subset, and repeat the argument. This cannot go on for too long, since the densities can never increase beyond 1. At this point we will have shown that some translate of $A$ has many 3APs, which by translation-invariance of 3APs implies that $A$ itself does.
Outline of argument. Finding the structured piece $B$ and the appropriate translate of $A$ relies on an almost-periodicity result for convolutions that says that $1_A * 1_A$ is approximately translation-invariant in $L^p$ by something like a large subgroup. How we apply this depends on which of two cases we are in. If $\|1_A * 1_A\|_p$ is small, where $p \approx \log(1/\alpha)$, then the $L^{2p}$-almost-periodicity result is particularly efficient, and has as a straightforward consequence that if $T(A)$ deviates much from $\alpha|A|^2$ then it must have a density increment on some subgroup-like object $B$. If, on the other hand, $\|1_A * 1_A\|_p$ is large, then, by $L^p$-almost-periodicity, we see that $\|1_A * 1_A * \mu_B\|_p$ must also be large for some group-like $B$, from which a density increment is immediate.
Asymptotic notation. We employ both Vinogradov notation $X \ll Y$ and the 'constantly changing constant'. Thus, any statement involving one or more expressions of the form $X_i \ll Y_i$ should be considered to mean "There exist absolute constants $C_i > 0$ such that a true statement is obtained when $X_i \ll Y_i$ is replaced by $X_i \leq C_i Y_i$." Similarly, any sequence of statements involving unspecified constants $c, C$ should be read with the understanding that there exist positive constants to make the statements true, and that these constants may change from instance to instance. Generally the expectation will be that $c \leq 1$ and $C \geq 1$, a device intended to guide the reader.

The finite field argument
As is customary, we begin with a proof in the finite field case, as there are very few technical hurdles here. Our goal is the following density increment result.
Theorem 3.1. If $A \subseteq \mathbb{F}_q^n$ has density $\alpha$ and $T(A) \leq \frac{\alpha}{2}|A|^2$ then there is a subspace $V$ with codimension $\lesssim_\alpha \alpha^{-1}$ such that $\|1_A * \mu_V\|_\infty \geq \frac{5}{4}\alpha$.
The notation $X \lesssim_\alpha Y$ here means that $X \ll (\log(2/\alpha))^C\, Y$.
We prove this result by considering two possibilities: $\|\mu_A * 1_A\|_{2m}$ is small for some large $m$, and $\|\mu_A * 1_A\|_{2m}$ is large for some large $m$. It clearly suffices to show that both possibilities (combined with the hypothesis on $T(A)$) lead to a suitable density increment.
We will require the following almost-periodicity result. While it is not explicitly given in the literature, the deduction from the almost-periodicity results proved by Croot and the second author [5] is routine, and is given in an appendix.
Theorem 3.2. Let $p \geq 2$ and $\epsilon \in (0, 1)$. Let $G = \mathbb{F}_q^n$ be a vector space over a finite field and suppose $A \subseteq G$ has $|A| \geq \alpha|G|$. Then there is a subspace $V \leq G$ of codimension

Proof. Apply Theorem 3.2 with $p = 4m$ and $\epsilon = \alpha^{1/2}/100$ to get a subspace $V$ of the required codimension such that

by our assumption on $\|\mu_A * 1_A\|_{2m}$. Now, if $1/r + 1/4m = 1$, Hölder's inequality gives

Since $\mu_A * 1_A * 1_{-2\cdot A}(0) \leq \alpha^2/2$ by assumption, this means that

It remains to convert this upper bound on the average into a lower bound for 1
There are a number of ways to do this, either in Fourier space or physical space; here we present a particularly short method using purely physical arguments.
On the other hand, if $\|\mu_A * 1_A\|_{2m}$ is very large, then this directly implies a large density increment, without any assumptions on $T(A)$.
Proof. Applying Theorem 3.2 as in the proof of Lemma 3.3, but with $p = 2m$, there is a subspace $V$ of the required codimension such that 2m + 1 by nesting. Since $\|\mu_A * 1_A\|_{2m} \geq 10\alpha$, this is at least $5\alpha$, say. Hence and we have a density increment.
The two preceding lemmas together immediately imply Theorem 3.1. A routine iterative application of this theorem then proves the finite field version of Theorem 2.1: we can increase the density as in the theorem at most $C\log(1/\alpha)$ times before reaching 1, and so a translate of $A$ must have plenty of 3APs on some subspace of codimension $\lesssim_\alpha \alpha^{-1}$.
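The bound on the number of iterations is simple arithmetic, and can be sketched as follows. The multiplicative factor $5/4$ is taken from the density increment above; the function name `max_increments` and the sample densities are assumptions for illustration.

```python
import math


def max_increments(alpha, factor=1.25):
    """Largest k with factor**k * alpha <= 1: how often the density can grow by `factor`."""
    k = 0
    density = alpha
    while density * factor <= 1:
        density *= factor
        k += 1
    return k


for alpha in (0.5, 0.1, 0.01, 0.001):
    k = max_increments(alpha)
    # The count never exceeds log(1/alpha) / log(5/4), i.e. it is O(log(1/alpha)).
    print(alpha, k, math.log(1 / alpha) / math.log(1.25))
```

Since each step multiplies the density by at least $5/4$ and densities cannot exceed 1, the number of steps is at most $\log(1/\alpha)/\log(5/4)$.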

Bohr sets and L p -almost-periodicity
Following Bourgain [2], the role played by subspaces in the density increment argument above will in general groups be played by Bohr sets, whose basic theory we review below. For proofs of these results, one may consult [14]. Throughout, $G$ will be a finite abelian group, and we write $\widehat{G} = \{\gamma : G \to \mathbb{C}^\times : \gamma \text{ a homomorphism}\}$ for the dual group of $G$, the group operation being pointwise multiplication of functions.

and call this a Bohr set. Denoting it by $B$, we call $\mathrm{rk}(B) := |\Gamma|$ the rank of $B$ and $\rho$ its radius. We shall often need to narrow the radius: if $\tau \geq 0$, we write $B_\tau = \mathrm{Bohr}(\Gamma, \tau\rho)$.
If furthermore $B' = \mathrm{Bohr}(\Lambda, \delta)$ where $\Lambda \supseteq \Gamma$ and $\delta \leq \rho$, then we write $B' \leq B$ and say that $B'$ is a sub-Bohr set of $B$; note that this implies that $B' \subseteq B$ as sets.
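For concreteness, here is a minimal sketch of Bohr sets in a cyclic group. It assumes the definition $\mathrm{Bohr}(\Gamma, \rho) = \{x \in G : |1 - \gamma(x)| \leq \rho \text{ for all } \gamma \in \Gamma\}$, which is consistent with how Bohr sets are used later in the text, and it identifies characters of $\mathbb{Z}/n\mathbb{Z}$ with frequencies $r$ via $x \mapsto e^{2\pi i r x/n}$. The function name `bohr_set`, the tolerance and the parameters are assumptions for this example.

```python
import cmath


def bohr_set(freqs, rho, n, tol=1e-12):
    """Bohr(freqs, rho) in Z/nZ: all x with |1 - e(r x / n)| <= rho for every r in freqs.

    `tol` guards against floating-point error at the boundary of the set.
    """
    def char(r, x):
        return cmath.exp(2j * cmath.pi * r * x / n)

    return {x for x in range(n)
            if all(abs(1 - char(r, x)) <= rho + tol for r in freqs)}


n = 12
B = bohr_set({1}, 1.1, n)            # rank 1, radius 1.1
B_sub = bohr_set({1, 4}, 0.55, n)    # more frequencies, smaller radius

print(sorted(B))      # [0, 1, 2, 10, 11]
print(sorted(B_sub))  # [0]
print(B_sub <= B)     # True: a sub-Bohr set is contained in the original
```

Radius 2 with any frequency set gives the whole group, since $|1 - \gamma(x)| \leq 2$ always holds; enlarging the frequency set or shrinking the radius can only shrink the set.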

Lemma 4.2 (Size estimates). If $B$ is a Bohr set of rank $d$ and radius $\rho \leq 2$, then

One deficit of Bohr sets compared to subspaces is that the number of 3APs in a Bohr set $B$ need not be approximately $|B|^2$ (the trivial upper bound), as it would be for a subspace. The standard work-around for this is to work with pairs $(B, B')$ of Bohr sets where $B'$ is a radius-narrowed copy of $B$. Provided $B$ is regular, defined as follows, one then has $T(B, B', B) \approx |B||B'|$, matching the trivial upper bound.
Note in particular that if $B$ is regular, then $|B + B_{c/\mathrm{rk}(B)}| \leq 2|B|$, for example. Importantly, regular Bohr sets are in plentiful supply, a fact that we use frequently:

Let us now assume that $G$ has odd order, so that the map $x \mapsto 2x$ is injective on $G$. The square-root map is then well-defined on $\widehat{G}$, and we write $\gamma^{1/2}$ for the unique element in $\widehat{G}$ such that $(\gamma^{1/2})^2 = \gamma$. We extend this to sets via $\Gamma^{1/2} = \{\gamma^{1/2} : \gamma \in \Gamma\}$. Note that this is compatible with the notation for set-dilation:

Lemma 4.6. If $B$ is a Bohr set and $\tau \geq 0$, then

In particular, if $B$ is regular, then so is $2 \cdot B$.
We shall use the following almost-periodicity result for convolutions that works relative to Bohr sets. While it does not explicitly appear in the literature, it is not a far cry from the combination of the almost-periodicity ideas of [5] with the Chang-Sanders lemma on large spectra as in [4,12]. The main differences are the presence of an $L^1$-norm (as opposed to an $L^0$-type estimate in [5]) and that the $L^p$-norms are restricted to a Bohr set. We delay the proof of this (and some generalisations) to Section 6.
Theorem 4.7 ($L^p$-almost-periodicity relative to a Bohr set). Let $m \geq 1$ and $\epsilon, \delta \in (0, 1)$. Let $A, L$ be subsets of a finite abelian group $G$, with $\eta := |A|/|L| \leq 1$, and let $B \subseteq G$ be a regular Bohr set of rank $d$ and radius $\rho$. Suppose $|A + S| \leq K|A|$ for a subset $S \subseteq B_\tau$, where $B_\tau$ is regular and $\tau \leq (c\delta)^{2m}/d\log(2/\delta\eta)$. Then there is a regular Bohr set $T \leq B_\tau$ of rank at most $d + d'$ and radius at least

In particular,

The main argument
We can now describe the main argument. As mentioned in the previous section, we shall work with a pair (B, B ′ ) of Bohr sets, regularity ensuring that B + B ′ ≈ B. We shall correspondingly have a pair (A, A ′ ) of sets, with A ⊆ B and 2 · A ′ ⊆ B ′ , each of relative density at least α. There will then be two cases: 10α, then we apply L 2m (B ′ )-almost-periodicity to get that Assuming that the number of 3APs across (A, α, this tells us that the same thing is true with an extra convolution with µ T , which quickly leads to a density increment.
Large $L^p$-norm of convolution implies density increment. Here we expand upon the first case above, namely the one in which

Proposition 5.1. Let $G$ be a finite abelian group of odd order, let $B \subseteq G$ be a regular Bohr set, and let B ′ 2 · B be regular of rank $d$ and radius $\rho$. If $A \subseteq B$ is a set of relative density at least $\alpha$ with

for some $m \in \mathbb{N}$, then there is a regular Bohr set $T \leq B'$ of rank at most $d + d'$ and radius at least

Proof. Let $\epsilon = c\alpha^{1/2}$, $\delta = c\alpha$ and apply Theorem 4.7 with these parameters to the convolution $\mu_A * 1_A$, with the Bohr set $B'$ in place of $B$, and $\tau = (c\alpha)^{Cm}/d$ chosen so that $S := B'_\tau$ is regular. We then have that

Lemma 4.6 and regularity, allowing us to take $K = 2/\alpha$. This gives us a Bohr set $T \leq B'$ of the required rank and radius such that

By nesting of $L^p$-norms, the right-hand side here is at least

by our choice of $\epsilon$ and $\delta$. Thus, provided the constants in these parameters are chosen appropriately, we are done, as

Small $L^p$-norm of convolution and few 3APs implies density increment. Here we expand upon how to argue in the case

Proposition 5.2. Let $G$ be a finite abelian group of odd order, let $B \subseteq G$ be a regular Bohr set, and let $B'$ be a regular Bohr set of rank $d$ and radius $\rho$ with

for some $m \geq C\log(2/\alpha)$, then either

Proof. Either we are in the first case of the proposition, or

We now apply Theorem 4.7 to $\mu_A * 1_A$ with parameters $2m$, $\epsilon = c\alpha^{1/2}$, $\delta = c\alpha$, the Bohr set $B'$ in place of $B$, and $S = B'_\tau$ with $\tau = (c\alpha)^{Cm}/d$, giving us a Bohr set $T \leq B'_\tau$ of the required rank and radius such that

By assumption and choice of parameters, and assuming that $\|\mu_A * 1_A\|_{L^1(B')} \leq \frac{3}{2}\alpha$ (or else increment) as in the previous argument, we thus have that

where the positive constant $c$ may be chosen as small as we wish. Thus, letting $q$ be such that $1/q + 1/4m = 1$, Hölder's inequality yields

Since $m \geq C\log(2/\alpha)$, this is at most $2c\alpha$. Picking $c$ small enough thus gives that $\frac{1}{2}\alpha$. We are then done by the following lemma.
In particular, we have the pointwise inequality We now use regularity to estimate the right-hand side for x ∈ B τ . Indeed, where d := rk(B), since B is regular, and furthermore The second term in (5.1) can be bounded trivially: again by regularity. Renormalising (5.1) and picking the implied constant in the bound for τ in the hypothesis small enough, we thus have where c > 0 is as small a fixed constant as we like. Picking c = 1/2, say, makes this bigger than (1 − 2λ 2 )α, as desired.
Remark 5.4. There are several variants of this type of result, converting deviations to increments. Perhaps the most standard one uses Fourier analysis, which gives a slightly better λ-dependence, but this is of no relevance in our application.
If not for the fact that we need to work with the two copies of the set A here, one living in a slightly narrower Bohr set than the other, we could just iterate this proposition to yield the theorem. This is where the following 'two scales' lemma of Bourgain's [2] comes in: it converts a single set A in a Bohr set to two copies of roughly the original density living inside narrower Bohr sets (or else we have a density increment). The lemma is now fairly standard, but we include the proof for completeness.
Proof. Picking the constant $c$ in the radius-narrowing small enough, regularity yields

and similarly for $B''$. Since $1_A * \mu_B(0) = \mu_B(A) = \alpha$, this implies that

With such an $x$, if we are not in the second case of the conclusion then $1_A * \mu_{B'}(x) \geq (2 - \frac{1}{8})\alpha - \frac{9}{8}\alpha = \frac{3}{4}\alpha$, and similarly for $B''$, and so we are done.
Proposition 5.7 (Main iterator). Let $G$ be a finite abelian group of odd order, let $B \subseteq G$ be a regular Bohr set of rank $d$ and radius $\rho$, and let $A \subseteq B$ be a set of relative density at least $\alpha$. Then either
(i) (Many 3APs) $T(A) \geq \exp(-Cd\log(d/\alpha))\,|A|^2$, or
(ii) (Density increment) there is a regular Bohr set $T \leq B$ of rank at most $d + C\alpha^{-1}\log(2/\alpha)^4$, and radius at least $c\rho\alpha^{C\log(2/\alpha)}/d^5$, such that 1 A * µ T ∞

c/d , with small constants $c$ picked so that these are regular. Applying Lemma 5.6 with these sets, we are either done, obtaining a density increment with $T$ being $B^{(1)}$ or $B^{(2)}$, or else we find an $x$ such that $1_A * \mu_{B^{(i)}}(x) \geq \frac{3}{4}\alpha$ for $i = 1, 2$. In the latter case, we define 3 4 α, and, moreover by Lemma 4.2, |A|.
Note that by translation-invariance of three-term progressions, and if this quantity is at least 3 16 α|A (1) ||A (2) | then we are in the first case of the conclusion. If not, apply Proposition 5.5 with B (1) in place of B, B ′ = 2 · B (2) , which is regular by Lemma 4.6, and A (1) , A (2) in place of A, A ′ , respectively. We must then be in the second case of the conclusion of that lemma, giving us the Bohr set T required in the conclusion, since It is now straightforward to iterate this to prove our main theorem.
Theorem 5.8. Let $G$ be a finite abelian group of odd order, and let $A \subseteq G$ be a set of density at least $\alpha$. Then

Proof. We define a sequence of Bohr sets $B^{(i)}$ of rank $d_i$ and radius $\rho_i$, and corresponding subsets $A^{(i)}$ of relative densities $\alpha_i$, starting with $B^{(0)} = \mathrm{Bohr}(\{1\}, 2) = G$ and $A^{(0)} = A$.
Having defined $B^{(i)}$ and $A^{(i)}$, we apply Proposition 5.7 to these sets. If we are in the first case of the conclusion, we exit the iteration, and if we are in the second case, say

Since the densities are increasing exponentially and can never be bigger than 1, the procedure must terminate with some set $A^{(k)}$ with $k \ll \log(1/\alpha)$. By summing the geometric progression, the final rank satisfies $d_k \ll \alpha^{-1}\log(2/\alpha)^4$, and the final radius satisfies $\rho_k \geq \exp(-C\log(2/\alpha)^3)$. Having exited the iteration, we thus have

by Lemma 4.2, as desired.
$L^p$-almost-periodicity with more general measures
In this section we record some results on the $L^p$-almost-periodicity of convolutions, including a proof of Theorem 4.7. These results have their origins in [5], but since we require a couple of slight twists in the fundamentals of the arguments, we give an essentially self-contained treatment. Our presentation is at a somewhat greater level of generality than needed for the current application; we expect this to be useful for future applications, however, as well as being conceptually illuminating, perhaps. The first few results are phrased in terms of an arbitrary group $G$, which we view as a discrete group with the discrete σ-algebra when discussing measures. Thus when we work with $L^p$ norms restricted to some measure $\mu$ on $G$, we have

We take as our definition of convolution

and, for a $k$-tuple $a = (a_1, \ldots, a_k)$, we write $\mu_a = \mathbb{E}_{j \in [k]} 1_{\{a_j\}}$.
The following moment-type estimates were essentially proved in [5].
Lemma 6.1. Let $m, k \geq 1$. Let $A, L$ be finite subsets of a group $G$, let $\mu$ be a measure on $G$, and denote

If $a \in A^k$ is sampled uniformly at random, then, provided $k \geq Cm/\epsilon^2$,

We include a proof in Appendix B in order to cater for the differences from [5].
Definition 6.2 (Translation operator). Given a function $f$ on a group $G$, and an element $t \in G$, we write $\tau_t f$ for the function on $G$ defined by

Similarly, if $\mu$ is a measure on $G$, we write $\tau_t \mu$ for the measure given by $\tau_t \mu(X) = \mu(tX)$. Thus

Definition 6.3. Let $\nu, \mu$ be two measures on a group $G$. We say that $\nu \leq \mu$ if $\nu(X) \leq \mu(X)$ for every measurable $X$, that is, if $\int f \, d\nu \leq \int f \, d\mu$ for every integrable $f \geq 0$.

Definition 6.4 ($S$-invariant pairs of measures). Let $\nu, \mu$ be two measures on a group $G$, and let $S \subseteq G$. We say that $(\nu, \mu)$ is $S$-invariant if $\tau_t \nu \leq \mu$ for every $t \in S$.
A prototypical example is the pair

In the following proof, if $X$ is a subset of a group then we write $X^{\otimes k}$ for the $k$th Cartesian power of $X$, in order to distinguish it from the product set $X^k = X \cdot X \cdots X$.

Theorem 6.5. Let $m, n \geq 1$, $\epsilon \in (0, 1)$. Let $A, L, S$ be finite subsets of a group $G$, and suppose $(\nu, \mu)$ is an $(S^{-1}S)^n$-invariant pair of measures on $G$. Suppose $|S \cdot A| \leq K|A|$. Then there is a subset $T \subseteq S$, $|T| \geq 0.99K^{-Cmn^2/\epsilon^2}|S|$, such that, for every $t \in (T^{-1}T)^n$,

The main differences between this and the results in [5] lie in the restriction of the norms and in the slight extra care to give an $L^1$-norm rather than an $L^0$-type estimate.
Proof. Let $\epsilon_0 = \epsilon/2n$. By Lemma 6.1 applied with $k = Cm/\epsilon_0^2$, we get that if $a \in A^{\otimes k}$ is sampled uniformly then with probability at least 0.99,

Let us call tuples $a \in A^{\otimes k}$ satisfying this bound good, so that $\mathbb{P}_{a \in A^{\otimes k}}(a \text{ is good}) \geq 0.99$. Now let us write $\Delta(S) = \{(t, \ldots, t) \in S^{\otimes k}\}$, and let us identify elements $t \in S$ with the corresponding tuple in $\Delta(S)$. Define, for each $a \in \Delta(S) \cdot A^{\otimes k}$,

We now claim two things: firstly, that $(T_a^{-1} \cdot T_a)^n$ is a set of almost-periods for any $a$; secondly, that $|T_a|$ is large on average. We begin with the second claim: for each $t \in S$,

This was the second claim; we turn now to showing the first.
Fix any $a$ and let $T = T_a$, and for brevity write $g = \mu_A * 1_L$. Then, by definition, for $t \in T$ we have

Now let $t_1, \ldots, t_n \in T^{-1}T$. Then

Carrying on in this way, we have

where $r_j \in (T^{-1}T)^{n-j}$. Consider one of the summands here, with $r = r_j$ and $t = t_j = s_1^{-1}s_2$ for some elements $s_i \in T$. We have

and so, since $T \subseteq S$ and $(\nu, \mu)$ is $(S^{-1}S)^n$-invariant, both of these terms can be bounded as in (6.1). Thus

which proves the claim that the set $(T^{-1}T)^n$ is a set of almost-periods for $\mu_A * 1_L$.
Letting $a$ be some tuple for which $T = T_a$ has size at least $0.99K^{-k}|S|$ yields the theorem.
We now bootstrap this in a standard way using Fourier analysis, making use of the following local version of Chang's lemma on large spectra due to Sanders [11].

Lemma 6.6 (Chang-Sanders). Let $\delta, \nu \in (0, 1]$. Let $G$ be a finite abelian group, let $B = \mathrm{Bohr}(\Gamma, \rho) \subseteq G$ be a regular Bohr set of rank $d$ and let $X \subseteq B$. Then there is a set of characters $\Lambda \subseteq \widehat{G}$ and a radius $\rho'$ with $|\Lambda| \ll \delta^{-2}\log(2/\mu_B(X))$ and $\rho' \gg \rho\nu\delta^2/d^2\log(2/\mu_B(X))$ such that $|1 - \gamma(t)| \leq \nu$ for all $\gamma \in \mathrm{Spec}_\delta(\mu_X)$ and $t \in \mathrm{Bohr}(\Gamma \cup \Lambda, \rho')$.

Theorem 6.7 ($L^p$-almost-periodicity relative to Bohr-compatible measures). Let $m \geq 1$ and $\epsilon, \delta \in (0, 1)$. Let $A, L$ be subsets of a finite abelian group $G$ with $\eta := |A|/|L| \leq 1$, let $B \subseteq G$ be a regular Bohr set of rank $d$ and radius $\rho$, and let $(\nu, \mu)$ be an $rB$-invariant pair of measures on $G$, where $r \geq C\log(2/\delta\eta)$. Suppose $|A + S| \leq K|A|$ for a subset $S \subseteq B$. Then there is a regular Bohr set $B' \leq B$ of rank at most $d + d'$ and radius at least

Proof. We could deduce a version of this from Theorem 6.5 as stated, working with an intermediate measure $\nu_2$ for which $(\nu, \nu_2)$ and $(\nu_2, \mu)$ are invariant, but for a cleaner statement we instead argue directly, picking up where the proof of that theorem left off. Indeed, say we have followed that argument with parameters $m$, $n = \lfloor (r - 1)/2 \rfloor$ and $\epsilon/2$, thus obtaining a set $T \subseteq S$ with $\mu_B(T) \geq 0.99K^{-Cmr^2/\epsilon^2}\mu_B(S)$ such that, for each $s \in nT - nT$,

X represents the $n$-fold convolution $\mu_X * \cdots * \mu_X$. By the triangle inequality, we then have

where we have written $s = t_1 + \cdots + t_n - t_{n+1} - \cdots - t_{2n}$ in the expectation. We also want this estimate to hold for any translate $\tau_t \nu$ of $\nu$ with $t \in B$, which follows from $(\nu, \mu)$ being $(2n + 1)B$-invariant: for any $t_1, \ldots, t_n \in T - T$ and $t \in B$, the bound (6.2) holds with $\nu$ replaced by $\tau_{-t}(\nu)$, and the final measures appearing thereafter in the proof are still dominated by $\mu$, by $(2n + 1)B$-invariance, meaning that also $\|\tau_t(g * \sigma) - \tau_t g\|_{L^{2m}(\nu)} \leq \epsilon'$ holds for all $t \in B$.
Now we carry out the Fourier-bootstrapping in a standard way. By the triangle inequality, we have that, for any $t \in B$,

which, by the above, is at most

The last term here is at most

and it is in bounding this that we shall need to pick $t$ carefully. Indeed, apply Lemma 6.6 to $T \subseteq B$ with parameter $\delta = 1/2$ to get a regular Bohr set $B' \leq B$ of rank at most $d + d'$ and radius at least

such that $|1 - \gamma(t)| \leq \delta\eta^{1/2}$ for all $\gamma \in \mathrm{Spec}_{1/2}(\mu_T)$ and $t \in B'$.
The main almost-periodicity theorem used in this paper, Theorem 4.7, is a simple corollary of this, using the regularity of Bohr sets through the following lemma. Using regularity at this point is somewhat inefficient quantitatively, adding an extra log log to our final bound for Roth's theorem, but it allows for simpler statements.
Lemma 6.8. Let $B$ be a regular Bohr set of rank $d$, let $\delta \in [0, 1]$, and suppose $\tau \leq c\delta^p/d$. Then, for any $F : G \to \mathbb{C}$ and $p \geq 1$,

Proof. By the triangle inequality

It follows from regularity that $|B \setminus B_{1-\tau}| \ll \tau d|B|$, and so the result follows if we choose $c$ small enough.
It is now a short matter to deduce Theorem 4.7, the almost-periodicity result with all the L p -norms being relative to the same Bohr set.
Proof of Theorem 4.7. Let $r = \lceil C\log(2/\delta\eta) \rceil$ and apply Theorem 6.7 to $A$ and $L$ with parameters $m$, $\epsilon$, $\delta/2$, the Bohr set $B_\tau$ in place of $B$ and the $rB_\tau$-invariant pair of measures $\nu = 1_{B_{1-r\tau}}$, $\mu = 1_B$. This gives a Bohr set $T \leq B_\tau$ of the required rank and radius such that, for each $t \in T$,

Since $\tau \leq c(\delta/2)^{2m}/dr$, the main claim follows from Lemma 6.8. The 'in particular' then follows by averaging and the triangle inequality.

Concluding remarks
In some sense, it should not be altogether surprising that the almost-periodicity arguments of [5] can be used to prove logarithmic bounds for Roth's theorem, as these results were used to reach this barrier in several other related problems, already in [5] but also in [4]. Being able to do this rests on using the more elaborate moment-bounds present in [5] (or in this paper) for the random sampling, rather than the more usual Khintchine-type bounds.
The number of log logs. The argument presented in this paper gives a bound of $r_3(N)/N \ll (\log\log N)^C/\log N$ with $C = 7$. One of these log logs is caused by applying Bohr-set regularity to an $L^p$ norm with $p$ large, which makes for clean statements but is otherwise quite wasteful. Circumventing this and taking into account some further optimisations allows one to reduce this $C$, but not to below 4, which is the best bound currently known [1].

Appendix B. Central moments of the binomial distribution
Here we prove Lemma 6.1, a version of the sampling lemma at the heart of the probabilistic approach to almost-periodicity. As mentioned before, it is a variant of results from [5].
Lemma B.1. Let $m, k \geq 1$. Let $A, L$ be finite measure subsets of a σ-finite locally compact group $G$, let $\mu$ be a σ-finite Borel measure on $G$, and denote

If $a \in A^k$ is sampled uniformly at random, then, provided $k \geq Cm/\epsilon^2$,

Note that the measures of $A$ and $L$, the σ-finiteness, and the convolutions are with respect to (left) Haar measure $\mu_G$ on $G$. Thus

The function $\mu_a * 1_L$ is to be interpreted as

We remark that although introducing the function $f$ might seem cumbersome, it turns out to be somewhat natural. Note for example that if $A = L$ is a subgroup, the right-hand side is actually 0, since then $\mu_A * 1_A = 1_A$.
To prove this lemma, we shall use the following bounds for the central moments of the binomial distribution. These are surely standard, but we include a self-contained proof as we have not been able to locate a readily available reference. (We note that they follow from general results on iid random variables, but only after some calculation.)

Lemma B.2. Let $p \in [0, 1]$ and $m, n \in \mathbb{N}$. If $X$ is a $\mathrm{Bin}(n, p)$ random variable, with $q = 1 - p$, then
$$\mathbb{E}|X - np|^{2m} \leq m \max\left(m^{2m-1} npq,\; e^{m-1}(mnpq)^m\right).$$
In particular, if $Z = X/n$ and $n \geq 4m/\delta$, we have

The particular constants here could be improved, but are of no consequence to us. Before proving this, let us see how it implies Lemma B.1.
Proof of Lemma B.1. Fix $x \in G$. For $a = (a_1, \ldots, a_k)$ sampled uniformly from $A^k$, we have $\mu_a * 1_L(x) = \mathbb{E}_{j \in [k]} 1_L(a_j^{-1}x)$. This is an average of $k$ Bernoulli random variables $1_L(a_j^{-1}x)$, each with parameter

The sum of these $k$ Bernoulli random variables is a binomial random variable, and so Lemma B.2 (with $n = k$) implies that

Integrating over all $x \in G$ with respect to $\mu$ and swapping orders of integration using Fubini-Tonelli yields the result.
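The first step of this proof, that $\mu_a * 1_L(x)$ is an average of Bernoulli variables whose mean is $\mu_A * 1_L(x)$, can be checked exactly on a small example. The sketch below works in the cyclic group $\mathbb{Z}/n\mathbb{Z}$ written additively, so that $a_j^{-1}x$ becomes $x - a_j$; the group, the sets and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def mu_a_conv_L(a, L, x, n):
    """mu_a * 1_L(x): the average over j of 1_L(x - a_j) in Z/nZ."""
    return Fraction(sum(1 for aj in a if (x - aj) % n in L), len(a))


def mu_A_conv_L(A, L, x, n):
    """mu_A * 1_L(x): the proportion of y in A with x - y in L."""
    return Fraction(sum(1 for y in A if (x - y) % n in L), len(A))


def mean_over_tuples(A, L, x, n, k):
    """Exact expectation of mu_a * 1_L(x) over a sampled uniformly from A^k."""
    tuples = list(product(sorted(A), repeat=k))
    return sum(mu_a_conv_L(a, L, x, n) for a in tuples) / len(tuples)


n, k = 7, 2
A, L = {0, 1, 3}, {0, 1, 2}  # hypothetical example sets
for x in range(n):
    print(x, mean_over_tuples(A, L, x, n, k), mu_A_conv_L(A, L, x, n))
```

The two printed columns agree for every $x$, by linearity of expectation; the content of the lemma is the much stronger statement that the $2m$th moments of the deviation are small once $k$ is large.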
To prove the above moment bounds, we use a few standard facts about a binomially distributed random variable $X \sim \mathrm{Bin}(n, p)$. Throughout, let
$$\mu_r = \mathbb{E}(X - np)^r = \sum_{j=0}^{n} \binom{n}{j} p^j q^{n-j} (j - np)^r.$$
The moment generating function of $X - np$ is
$$\sum_{k=0}^{\infty} \mu_k \frac{t^k}{k!} = \left(q e^{-tp} + p e^{tq}\right)^n.$$
We note that $\mu_r \geq 0$ provided $p \leq 1/2$. Furthermore, formal manipulation of the above power series yields, as noted in [7, §5.5], the recurrence

for $r \geq 2$, which, together with the initial conditions $\mu_0 = 1$, $\mu_1 = 0$ can be used to compute these moments. We use it to bound the moments as follows.

The claim thus follows by induction.
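The central moments can also be computed exactly from the defining sum, which gives a quick numerical check on small cases. The sketch below does this with rational arithmetic and compares the even moments against the bound of Lemma B.2, assuming that bound reads $\mathbb{E}|X - np|^{2m} \leq m\max(m^{2m-1}npq, e^{m-1}(mnpq)^m)$. The function names and the parameters $n = 5$, $p = 1/3$ are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, exp


def central_moment(n, p, r):
    """mu_r = E (X - np)^r for X ~ Bin(n, p), computed exactly from the defining sum."""
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) * (j - n * p)**r
               for j in range(n + 1))


def assumed_lemma_b2_bound(n, p, m):
    """m * max(m^(2m-1) npq, e^(m-1) (m npq)^m); the form of the bound is an assumption."""
    x = float(n * Fraction(p) * (1 - Fraction(p)))  # npq
    return m * max(m**(2 * m - 1) * x, exp(m - 1) * (m * x)**m)


n, p = 5, Fraction(1, 3)
moments = [central_moment(n, p, r) for r in range(7)]
print(moments[:5])  # mu_0, ..., mu_4 = 1, 0, 10/9, 10/27, 10/3
print(all(mu >= 0 for mu in moments))  # nonnegative, as p <= 1/2
print(float(moments[4]), assumed_lemma_b2_bound(n, p, 2))
```

Here $\mu_2 = npq$ and $\mu_3 = npq(q - p)$, matching the standard closed forms, and the even moments sit comfortably below the assumed bound.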
The polynomials $\nu_r$ so defined give the best upper bound possible for $\mu_r$ that is a polynomial in $npq$ and otherwise uniform in $p$. We can describe them fairly explicitly:

Proposition B.4 (Explicit description of the polynomials $\nu_r$). For $r \geq 0$,
$$\nu_r(x) = \sum_{k \geq 0} S_2(r, k)\, x^k,$$
where $S_2(r, k)$ is a 2-associated Stirling number of the second kind, defined as the number of partitions of a set of size $r$ into $k$ parts, each of size at least 2. In particular, $\nu_r$ has degree $\lfloor r/2 \rfloor$ and, if $r \geq 1$, no constant term.
Lemma B.5. For $r \geq 0$ and $k \geq 1$,
$$S_2(r, k) = \sum_{n=1}^{r-1} \binom{r-1}{n} S_2(r - 1 - n, k - 1).$$

Proof. For $r \leq 1$ the result is trivial, so assume $r \geq 2$. We consider the partitions of $[r]$ into $k$ parts, each of size at least 2. We count these according to how many elements 1 is placed with. If the part containing 1 is to have size $n + 1$, there are $\binom{r-1}{n}$ choices for the other elements to place with 1, and $S_2(r - 1 - n, k - 1)$ ways to partition the remaining elements into $k - 1$ parts, each of size at least 2. Summing up all these (disjoint) ways yields the result.
Proof of Proposition B.4. The recursion in Lemma B.5 shows immediately that the sequence $p_r = \sum_{k \geq 0} S_2(r, k) x^k$ satisfies the recursion defining $\nu_r$. Since the initial conditions also match, the sequences are the same.
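The counting argument behind the recursion for $S_2(r, k)$ can be checked mechanically. The sketch below implements the recursion, with the sum taken over $1 \leq n \leq r - 1$ as in the counting argument, and compares it with a direct enumeration of partitions into parts of size at least 2. The function names and the base cases $S_2(0, 0) = 1$, $S_2(r, 0) = 0$ for $r \geq 1$ are assumptions consistent with the combinatorial definition.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def s2(r, k):
    """2-associated Stirling number: partitions of an r-set into k parts, each of size >= 2."""
    if r == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    # The part containing element 1 has size n + 1 with n >= 1.
    return sum(comb(r - 1, n) * s2(r - 1 - n, k - 1) for n in range(1, r))


def s2_brute(r, k):
    """Count the same partitions by explicit enumeration (small r only)."""
    def count(elements, parts_left):
        if not elements:
            return 1 if parts_left == 0 else 0
        if parts_left == 0:
            return 0
        rest = elements[1:]
        total = 0
        # Each non-empty bitmask selects the companions of the first element.
        for mask in range(1, 1 << len(rest)):
            remaining = [e for i, e in enumerate(rest) if not (mask >> i) & 1]
            total += count(remaining, parts_left - 1)
        return total
    return count(list(range(r)), k)


def nu_coefficients(r):
    """Coefficients of sum_k S_2(r, k) x^k, listed from the constant term upwards."""
    return [s2(r, k) for k in range(r // 2 + 1)]


print(s2(6, 2), s2(6, 3))  # 25 15
print(nu_coefficients(4))  # [0, 1, 3], i.e. x + 3x^2
```

For example, $S_2(6, 3) = 15$ counts the perfect matchings on six points, and the polynomial $x + 3x^2$ for $r = 4$ has degree $\lfloor 4/2 \rfloor = 2$ and no constant term.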
We next use this combinatorial description to place an upper bound on $\nu_r$.

Rearranging, this completes the proof.
One could of course be more careful here in order to obtain better constants, but we have no need for it, opting instead for uniform bounds.
Proof of Lemma B.2. The first claim follows immediately from combining Proposition B.3 and Proposition B.6. The second one follows from the first upon replacing the maximum by a sum.