Popular progression differences in vector spaces II

Green used an arithmetic analogue of Szemer\'edi's celebrated regularity lemma to prove the following strengthening of Roth's theorem in vector spaces. For every $\alpha>0$, $\beta<\alpha^3$, and prime number $p$, there is a least positive integer $n_p(\alpha,\beta)$ such that if $n \geq n_p(\alpha,\beta)$, then for every subset of $\mathbb{F}_p^n$ of density at least $\alpha$ there is a nonzero $d$ for which the density of three-term arithmetic progressions with common difference $d$ is at least $\beta$. We determine for $p \geq 19$ the tower height of $n_p(\alpha,\beta)$ up to an absolute constant factor and an additive term depending only on $p$. In particular, if we want half the random bound (so $\beta=\alpha^3/2$), then the dimension $n$ required is a tower of twos of height $\Theta \left((\log p) \log \log (1/\alpha)\right)$. It turns out that the tower height in general takes on a different form in several different regions of $\alpha$ and $\beta$, and different arguments are used both in the upper and lower bounds to handle these cases.


Introduction
The game Set consists of a deck of cards. Each card has four attributes: color, shape, shading, and number, and there are three possibilities for each attribute, for a total of $3^4 = 81$ cards. The goal of the game is to find a "set", which is a triple of distinct cards in which each attribute is the same or all different on the three cards. What is the largest collection of cards that contains no set? We can naturally view each card as an element of $\mathbb{F}_3^4$, and a set is then a line (or, equivalently, a three-term arithmetic progression) in this vector space. While a seemingly recreational problem, its generalization to higher dimensions is the well-known those with common difference zero) at least $\alpha^{C_p}$. As the density of three-term arithmetic progressions with common difference zero is the density $\alpha$ of the set, by averaging, there is a nonzero $d$ for which the density of three-term arithmetic progressions with common difference $d$ is at least $\frac{\alpha^{C_p} p^n - \alpha}{p^n - 1} \geq \alpha^{C_p}/2$. A version of Green's arithmetic removal lemma in $\mathbb{F}_p^n$ from [21] states that for each $\varepsilon > 0$ there is $\delta = \delta(\varepsilon) > 0$ such that if $X = \{x_i\}_{i=1}^m$, $Y = \{y_i\}_{i=1}^m$, and $Z = \{z_i\}_{i=1}^m$ are subsets of $\mathbb{F}_p^n$ with $m \geq \varepsilon p^n$ and $x_i, y_i, z_i$ form a three-term arithmetic progression for each $i$, then there are at least $\delta p^{2n}$ three-term arithmetic progressions $x_i, y_j, z_k$. Green's proof uses the arithmetic regularity lemma and gives an upper bound on $1/\delta$ which is a tower of twos of height $\varepsilon^{-O(1)}$. Recently, it was observed by Blasiak et al. [8] and Alon that the breakthrough on the cap set problem extends to prove a multicolor sum-free result, and results of Kleinberg, Sawin, and Speyer [42], Norin [45], and Pebody [46] show that the bound for the multicolor sum-free result is sharp. Using this result, the first author and Lovász [21] proved $\delta(\varepsilon) \geq \varepsilon^{C_p}$, which is essentially tight. Note that taking $X = Y = Z = A$, we have that any subset $A \subset \mathbb{F}_p^n$ of density $\alpha$ has three-term arithmetic progression density at least $\alpha^{C_p}$.
It is an interesting problem to understand how $n_p(\alpha, \beta)$ grows as we increase $\beta$. This function is on the order of $\log(1/\alpha)$ when $0 < \beta \leq \alpha^{C_p}/2$, and is a tower of $p$'s of height $\Theta(\log(1/\varepsilon))$ for $\alpha$ fixed and $\varepsilon$ small, where $\varepsilon = \alpha^3 - \beta$.
We determine for $p \geq 19$ the tower height of $n_p(\alpha, \beta)$ up to an absolute constant factor and an additive constant depending on $p$. One of the difficulties in doing this is that the tower height takes a different form in different regions of $\alpha$ and $\beta$, and the proofs use additional ideas beyond those in the proof of the main result in [22] both in the upper and lower bounds.
We first discuss the case when $\alpha \leq 1/2$. When $0 < \beta < \alpha^{3+e^{-133}}$, the discussion above and Theorem 6 together show that $n_p(\alpha, \beta)$ grows as an exponential tower of constant height (depending on $p$) with $\log(1/\alpha)$ on top. So we assume $\beta > \alpha^{3+e^{-133}}$. This case splits into three cases, depending on whether $\varepsilon$ is small, in an intermediate range, or large with respect to $\alpha$ and $p$, where $\varepsilon = \alpha^3 - \beta$. In particular, when $\varepsilon < \alpha^3/(\log(1/\alpha))^{\log p}$, $n_p(\alpha, \beta)$ grows as a tower of $p$'s of height $\Theta(\log(\alpha^3/\varepsilon)) \pm O_p(1)$. When $\alpha^3/(\log(1/\alpha))^{\log p} \leq \varepsilon < \alpha^3(1 - 2^{-8-8C_p})$, $n_p(\alpha, \beta)$ grows as a tower of $p$'s of height $(\log p) \log\log(1/\alpha) \pm O_p(1)$. When $\varepsilon \geq \alpha^3(1 - 2^{-8-8C_p})$, $n_p(\alpha, \beta)$ grows as a tower of $p$'s of height $\Theta\left(\log \frac{\log(1/\alpha)}{\log(\alpha^3/\beta)}\right) \pm O_p(1)$ with a $1/\beta$ on top. This is summarized in the theorem below. We conjecture these bounds should also hold for $p < 19$. The upper bounds still hold when $p < 19$, as does the lower bound when $\varepsilon$ is small. The only issue is the case when $\varepsilon$ is large. In this case, the lower bound construction fails as it uses bounds on the largest subset of $\mathbb{F}_p^n$ without a three-term arithmetic progression, and the known bounds for this are not good enough to imply the desired estimates.
Theorem 2. Let $p \geq 19$ be prime, $0 < \alpha \leq 1/2$, and $\alpha^3 > \beta > \alpha^{3+e^{-133}}$. Recall that $n_p(\alpha, \beta)$ is the least positive integer such that for each $n \geq n_p(\alpha, \beta)$ and every subset $A$ of $\mathbb{F}_p^n$ with density at least $\alpha$, there is a nonzero $d$ in $\mathbb{F}_p^n$ such that the density of three-term arithmetic progressions with common difference $d$ in $A$ is at least $\beta$. Let $\varepsilon = \alpha^3 - \beta$.
A special case of this theorem is when we want a nonzero $d$ for which the arithmetic progression density with common difference $d$ is at least half the random bound. In this case, for set density $\alpha$, the dimension needed to guarantee such a common difference grows as a tower of height proportional to $(\log p) \log\log(1/\alpha)$, up to an additive error depending on $p$.
Corollary 3. The minimum dimension $n = n_p(\alpha, \alpha^3/2)$ needed to guarantee that for any subset of $\mathbb{F}_p^n$ of density at least $\alpha$, there is a nonzero $d$ for which the density of three-term arithmetic progressions with common difference $d$ is at least half the random bound grows as a tower of $p$'s of height $\Theta((\log p) \log\log(1/\alpha)) \pm O_p(1)$.
Corollary 3 follows by substituting $\beta = \alpha^3/2$ into the bound in Theorem 2. If we only want to guarantee a density which is considerably smaller than the random bound, we still get a tower-type bound by substituting $\beta = \alpha^{3+z}$.
Corollary 4. For $\alpha \leq 1/2$, $z < e^{-133}$ and $\alpha^z \leq 2^{-8-8C_p}$, the minimum dimension $n = n_p(\alpha, \alpha^{3+z})$ needed to guarantee that for any subset of $\mathbb{F}_p^n$ of density at least $\alpha$, there is a nonzero $d$ for which the density of three-term arithmetic progressions with common difference $d$ is at least $\alpha^{3+z}$ grows as a tower of $p$'s of height $\Theta((\log p)(\log(1/z))) \pm O_p(1)$ with $1/\alpha$ on top.
The previous results only apply for α ≤ 1/2, and there is a good reason for this. The tower height for the function n p (α, β ) actually changes behavior for α close to one, as given by the following theorem. It determines the tower height up to an absolute constant factor and an additive term depending on p. The proof uses additional ideas both in the upper and lower bounds.
We remark that when $\varepsilon \geq \gamma^2$, a simple argument shows that $n_p(\alpha, \beta) \leq 3\log_p(1/\gamma)$, which is not of tower type.
Organization. We prove the tight upper and lower bounds in Theorem 2 in Sections 2 and 3, respectively. In Section 4, we discuss the case where the set density $\alpha$ is close to 1, and in particular prove Theorem 5. In the concluding remarks we discuss many related problems and results.
For the sake of clarity of presentation, we omit floor and ceiling signs where they are not crucial. We use log to denote the logarithm base 2, and ln to denote the natural logarithm. We often use 3-AP as shorthand for three-term arithmetic progression.
Let $\mathrm{Tow}(a, k)$ denote a tower of $a$'s of height $k$, and $\mathrm{Tow}(a, k, r)$ denote a tower of $a$'s of height $k$ with an $r$ on top. So $\mathrm{Tow}(a, 0) = 1$ and $\mathrm{Tow}(a, k+1) = a^{\mathrm{Tow}(a,k)}$ for $k \geq 0$, and $\mathrm{Tow}(a, 0, r) = r$ and $\mathrm{Tow}(a, k+1, r) = a^{\mathrm{Tow}(a,k,r)}$ for $k \geq 0$.
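The recursive definition of the tower function can be sketched in a few lines of Python. The function name `tow` and its default argument are illustrative choices made here, not notation from the paper.

```python
def tow(a, k, r=1):
    """Tower of a's of height k with r on top.

    tow(a, k) corresponds to Tow(a, k) (an implicit 1 on top), and
    tow(a, k, r) corresponds to Tow(a, k, r).
    """
    value = r                # Tow(a, 0, r) = r, and Tow(a, 0) = 1
    for _ in range(k):
        value = a ** value   # Tow(a, k + 1, r) = a ** Tow(a, k, r)
    return value
```

Only very small heights can be evaluated this way, since the values grow extremely quickly: already `tow(2, 5)` has 19729 decimal digits.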
We next give an upper bound on $n_p(\alpha, \beta)$, which tightens the above bound when $\varepsilon = \alpha^3 - \beta$ is large compared to $\alpha$. Recall that the exponent $C_p$ in the arithmetic removal lemma satisfies $C_p = \Theta(\log p)$. We assume $\beta \geq \alpha^{C_p}/2$, as otherwise we know $n_p(\alpha, \beta) = \Theta(\log(1/\alpha))$ by the discussion in the introduction.
Theorem 6. Let $\beta = \alpha^3 - \varepsilon$ and suppose that $\beta \geq \alpha^{C_p}/2$. We have the following upper bounds.
So we may assume $\beta \leq 2^{-8-8C_p}\alpha$. If $\varepsilon < \alpha^3/(\log(1/\alpha))^{\log p}$, as $\beta < \alpha^3$, the first term in the sum in the tower height in the bound in Theorem 6(2) is the larger of the two terms (up to an absolute constant factor), and we get that $n_p(\alpha, \beta)$ in this case is at most a tower of $p$'s of height $O(\log(\alpha^3/\varepsilon))$. If $\alpha^3/(\log(1/\alpha))^{\log p} \leq \varepsilon < \alpha^3(1 - 2^{-8-8C_p})$, the second term is larger (up to a multiplicative constant and an additive term depending on $p$), and as $2^{-8-8C_p}\alpha^3 < \beta < \alpha^3$, this is $O((\log p) \log\log(1/\alpha)) \pm O_p(1)$. Otherwise, we have $\varepsilon \geq \alpha^3(1 - 2^{-8-8C_p})$ and we can apply Theorem 6(3) to get an upper bound on $n_p(\alpha, \beta)$ which is a tower of $p$'s of height $O\left((\log p) \log \frac{\log(\alpha/\beta)}{\log(\alpha^3/\beta)}\right)$ with a $1/\beta$ on top. In any case, we get the desired upper bounds in Theorem 2.
A 3-AP with common difference d is an ordered triple (a, b, c) such that c − b = b − a = d. A 3-AP is trivial if the common difference d is zero, i.e., it contains the same element three times. Otherwise, we call the 3-AP nontrivial.
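The definition above can be checked by brute force on small examples. The sketch below assumes the normalization in which the density of 3-APs with common difference $d$ in a set $A$ is the fraction of starting points $a \in \mathbb{F}_p^n$ with $a$, $a+d$, $a+2d$ all in $A$; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def shift(x, d, p):
    """Return x + d in F_p^n, with vectors represented as tuples."""
    return tuple((xi + di) % p for xi, di in zip(x, d))


def ap_density(A, d, p, n):
    """Fraction of a in F_p^n such that a, a + d, a + 2d all lie in A."""
    A = set(A)
    count = 0
    for a in A:
        b = shift(a, d, p)
        c = shift(b, d, p)
        if b in A and c in A:
            count += 1
    return Fraction(count, p ** n)


# Example in F_3^2: all vectors whose first coordinate is 1 or 2 (density 2/3).
example_set = [x for x in product(range(3), repeat=2) if x[0] != 0]
```

For this example set, a common difference that is nonzero in the first coordinate gives density 0, because the three terms of the progression take all three values in the first coordinate, while a nonzero difference supported on the second coordinate gives density $2/3$.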
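Let $G = \mathbb{F}_p^n$. We will more generally prove the upper bounds in Theorem 6 for weighted sets, given by a function $f : \mathbb{F}_p^n \to [0, 1]$, where $\alpha = \mathbb{E}_{x \in G}[f(x)]$ is the density of $f$. For an affine subspace $H + y$, we write $\alpha(H + y)$ for the density of $f$ in $H + y$, that is, the average value of $f$ on $H + y$. For a subspace $H$, the mean cube density $b(H)$ is defined to be the average of the cube of the density of $f$ in the affine translates of $H$ which partition $\mathbb{F}_p^n$. It is also given by $b(H) = \mathbb{E}_{y \in G}[\alpha(H + y)^3]$, where $H + y = \{h + y : h \in H\}$ is the affine translate of $H$ by $y$.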
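As an illustration of the mean cube density, here is a minimal sketch for the special case where $f$ is the indicator function of a set $A$ and $H$ is the coordinate subspace of vectors whose first $c$ coordinates are zero. The restriction to coordinate subspaces and the function name are simplifying assumptions made for this example.

```python
from fractions import Fraction
from itertools import product


def mean_cube_density(A, p, n, c):
    """b(H) for H = {x in F_p^n : x_1 = ... = x_c = 0} and f = indicator of A.

    The affine translates of H are indexed by the first c coordinates,
    so we count the points of A in each translate and average the cubes
    of the resulting densities over all p**c translates.
    """
    size_of_H = p ** (n - c)
    counts = {}
    for x in A:
        counts[x[:c]] = counts.get(x[:c], 0) + 1
    # Translates that miss A contribute 0 to the sum.
    total = sum(Fraction(cnt, size_of_H) ** 3 for cnt in counts.values())
    return total / p ** c


# Example in F_3^2: all vectors whose first coordinate is 1 or 2 (density 2/3).
dense_set = [x for x in product(range(3), repeat=2) if x[0] != 0]
```

In this example the mean cube density is $\alpha^3 = 8/27$ for the whole space ($c = 0$) and rises to $\alpha = 2/3$ once $H$ separates the first coordinate ($c = 1$), which matches the general fact that $b(H)$ lies between $\alpha^3$ and $\alpha$.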
We define the density of 3-APs with common difference d of a weighted set f : . The density of 3-APs with common difference d of a set A is the same as that of the characteristic function of A. For a function f : For an affine subspace H, we let Λ H ( f ) denote the density of three-term arithmetic progressions of f in H. That is, We let λ H ( f ) denote the density of nontrivial three-term arithmetic progressions of f in H. That is, As observed in [22], By averaging the previous inequality over all translates of H and letting The proof of Theorem 6 is by a density increment argument using the mean cube density.
In [22], we proved the following lemma, which shows that if the density of 3-APs with nonzero common difference in a subspace $H$ is small, then the mean cube density can be increased substantially by passing to a subspace $H'$ of $H$ of bounded codimension.
Lemma 7. If $f : \mathbb{F}_p^n \to [0, 1]$ has density $\alpha$, $H$ is a subspace of $\mathbb{F}_p^n$ of size larger than $4\alpha/\varepsilon$, and
Lemma 10 below has hypotheses and a conclusion similar to those of the previous lemma. However, it both assumes a stronger hypothesis on the 3-AP density and has a stronger conclusion: the ratio of the mean cube density to the bound on the 3-AP density increases by a factor in the exponent. As before, the proof uses the weak regularity lemma and counting lemma, but it also uses the tight bound from [21] on the arithmetic removal lemma to get a larger density increment. For convenience, we restate the weak regularity lemma and the counting lemma here.
For G = F n p and f : , which is constant on each affine translate of H and has value equal to the average value of f over that affine translate. A subspace H is defined to be δ -weakly-regular with respect to Lemma 8. (Weak regularity lemma.) For any function f : F n p → [0, 1], there is a subspace H which is δ -weakly-regular with respect to f such that H has codimension at most δ −2 .
As remarked in [22], while stated only for functions on F n p , the weak regularity lemma and counting lemma can also be applied to affine subspaces of F n p . The following density increment lemma assumes that the mean cube density is significantly larger than β , and concludes that, in passing to a large subspace, b(H)/β increases by a power. 1], and H be a subspace of F n p with |H| ≥ 2α/β and b(H) ≥ 2 8+8C p β , where C p = Θ(log p) is the exponential constant in the arithmetic removal lemma.
Proof. Let y = b(H)/β ≥ 2 8+8C p > 2 8 and η = β /6. Denote the affine translates of H by H j , where j ∈ F n p /H, so each affine translate of H is labeled by the corresponding element in F n p /H. For each affine translate H j of H, we apply Lemma 8 and Lemma 9, as remarked to apply on affine subspaces, to the translate H j of H and the restriction of f to H j . We obtain a subspace where t j : H j → [0, 1] is defined by t j (x) = E t∈T j +x [ f (t)] for x ∈ H j . We then let We next prove that Denote the affine translates of T j in H j by T jk for k ∈ H/T j . Since t j is constant on each translate of T j in H j , we can denote by t j (T jk ) the constant value t j (x) for x ∈ T jk . Let X j be the set of k ∈ H/T j with t j (T jk ) > α(H j ) y 1/6 and let x j = |X j | |H/T j | . By the arithmetic removal lemma as discussed in the introduction, there is at least a x C p j fraction of the ordered triples (k 1 , k 2 , k 3 ) that form a 3-AP in H/T j with k 1 , k 2 , k 3 ∈ X j . Each 3-AP (k 1 , k 2 , k 3 ) in H/T j with k 1 , k 2 , k 3 ∈ X j corresponds to three affine translates of T j in H j that form a 3-AP of subspaces, where the value of t j on each of them is more than α(H j ) y 1/6 . Hence, Moreover, the mean cube density over H j of t j is where the first inequality is by Karamata's inequality (a generalization of Jensen's inequality, also known as the majorization inequality) applied to the convex function h(x) = x 3 , and a 1 − x j fraction of the translates of T j have density at most α(H j ) (2) and (3), we have where in the last inequality we use the condition |H| > 2α/β and η = β /6. Let a ∈ [0, 1] be a constant to be defined later, A be the set of j such that x j > a, and I( j ∈ A) be the indicator function which is 1 if j ∈ A and 0 otherwise. From (5), we have Observe that f (x) = 1 − x + (z+x) 3 x + 3z is a decreasing function in x for z > 0 and x ∈ [0, 1]. 
Hence, when x j ≤ a and z = y 1/6 − 1 > 0, Thus, recalling that H = j T j , we have where the first inequality follows from Jensen's inequality applied to the convex function h(x) = x 3 , noting that the partition by H is a refinement of the partition by T j in each affine subspace H j . The second inequality follows since the left hand side is a sum of nonnegative terms and therefore we can delete some of them and the sum does not increase. The third inequality is by substituting in (4) and (7). The fourth inequality is by substituting in (6). As y > 2 6 and a ∈ [0, 1], we have Choose a = 2 2/C p y −1/(2C p ) , so a ∈ [0, 1] as y > 16. It follows from (8) and (9) that The second inequality above follows from y ≥ 2 8+8C p = 2 Moreover, we have Thus, the subspace H has the desired properties.
In [22], we proved the following bound on $n_p(\alpha, \alpha^3 - \varepsilon)$ by repeated application of Lemma 7. This bound is tight up to a constant factor in the tower height when $\varepsilon$ is relatively small.
We next prove the main theorem in this section, Theorem 6, by a similar proof to that of Theorem 11. As remarked earlier, we will prove the upper bounds in the more general setting of functions f : F n p → [0, 1] instead of subsets. We assume that the function f : F n p → [0, 1] has density α and, for each nonzero d, the density of 3-APs with common difference d is less than β < α 3 . Starting from H 0 = F n p , we repeatedly apply Lemma 7 until we can apply Lemma 10, at each step finding a subspace of substantially larger mean cube density at the expense of having a larger codimension. As the mean cube density is at most α, this yields the desired upper bound on the dimension n.
Proof of Theorem 6. Theorem 11 gives the first desired bound in Theorem 6. So we may and will assume that β ≤ 2 −8−8C p α. We assume that f : F n p → [0, 1] has density α and, for each nonzero d, the density of 3-APs with common difference d is less than is the trivial subspace containing only 0.
If |H s 1 | ≥ 4α/ε, and |H i | > 2α/β for some i ≥ s 1 , then we apply Lemma 10 to find a subspace Here τ p = 1/(2C p ). It follows that 2Codim(H i+1 ) ≤ max 80 2 β −4 , p 2Codim(H i ) . Let s 2 be the number of times that we apply Lemma 10 before we cannot anymore. Hence, the number of subspaces s we pick before stopping is s = s 1 + s 2 . If β ≤ 2 −8−8C p α 3 , then s 1 = 0 and we have from which it follows that As 80 2 β −4 < p p p 1/β , we have We also have |H s | < 2α/β . Since p n = |H s |p Codim(H s ) , we obtain that n is less than Tow(p, s 2 + 4, 1/β ). This gives the third desired bound.

Lower bound
In this section, a lower bound construction is given which matches the tower height in the upper bound from the previous section up to an absolute constant factor and an additive constant depending on the characteristic p.
In [22], we gave a probabilistic construction which proves the following theorem. It matches the upper bound when $\varepsilon$ is small compared to $\alpha$.
Theorem 12. [22] For $0 < \alpha \leq 1/2$ and $\varepsilon \leq 2^{-161} p^{-8} \alpha^3$, there exists $A \subset \mathbb{F}_p^n$ of density at least $\alpha$, where $n$ is a tower of $p$'s of height at least $\frac{1}{52}\log(\alpha^3/\varepsilon)$, such that for all nonzero $d$ in $\mathbb{F}_p^n$, the density of 3-APs with common difference $d$ in $A$ is less than $\alpha^3 - \varepsilon$.
We next discuss how to obtain the lower bound in the remaining ranges in Theorem 2. If $\varepsilon > 2^{-161} p^{-8}$, as $\varepsilon < \alpha^3$, we have $\alpha > 2^{-54} p^{-3}$, and it follows from the upper bound that $n_p(\alpha, \beta)$ is at most a constant depending only on $p$. So we may suppose $\varepsilon < 2^{-161} p^{-8}$. If $\varepsilon < \alpha^3/(\log(1/\alpha))^{\log p}$, then the above theorem gives the lower bound in the first case of Theorem 2. We will deduce the other case, when $\varepsilon \geq \alpha^3/(\log(1/\alpha))^{\log p}$, from the following theorem, which gives a lower bound in the case where $\varepsilon$ is large and $p \geq 19$.
Theorem 13. For $p \geq 19$, $0 < \alpha \leq 1/2$, and $\alpha^{3+e^{-133}} \leq \beta \leq \alpha^3 \min(p^{-\log p}, p^{-50})$, there exists a subset $A \subset \mathbb{F}_p^n$ of density at least $\alpha$, where $n$ is a tower of $p$'s of height at least $\frac{1}{30}(\log p) \ln \frac{\log(1/\alpha)}{\log(\alpha^3/\beta)}$ with an $\alpha^3/\beta$ on top, such that for each nonzero $d \in \mathbb{F}_p^n$, the density of 3-APs with common difference $d$ in $A$ is less than $\beta$. That is,

Note that Theorem 13 does not directly apply for
then α is bounded below by a constant depending only on p, and from Theorem 6, n p (α, β ) is at most a tower of p's of constant height (depending on p). Hence, we can assume that α e −133 ≤ p −C log p (since the bounds in Theorem 2 are up to additive constants depending on p). By monotonicity of n p (α, β ) in β , as α e −133 ≤ p −C log p , we can apply Theorem 13 with β = α 3 p −C log p ∈ [α 3+e −133 , α 3 min(p − log p , p −50 )] to get the bound We thus have the following corollary.
This corollary gives the lower bound in Theorem 2 when , we can check that the lower bound in Theorem 2 directly follows from the bound in Corollary 14 when and from the bound in Theorem 13 when ε ≥ α 3 (1 − p −C log p ). This completes the proof of Theorem 2. Our goal for the remainder of the section is to prove Theorem 13.

From weighted to unweighted
For the construction of the set A in Theorem 13, as in [22], it will be more convenient to work with a weighted set in F n p , which is given by a function f : F n p → [0, 1]. The weighted analogue of Theorem 13 is given below. Note that for the weighted constructions, it will be convenient to normalize and replace ε by εα 3 and β by β α 3 .
with a 1/β on top, such that for each nonzero d, the density of 3-APs with common difference d of f is less than β α 3 .
As in [22], we can go from the weighted version to the unweighted version by sampling. , then there exists a subset A ⊂ F n p such that the density of A and, for each nonzero d ∈ F n p , the density of 3-APs with common difference d of A deviate no more than ε from those of f .
Using Lemma 16, Theorem 13 follows from Theorem 15.
Proof of Theorem 13: Apply Theorem 15 with $\alpha$ and $\beta' = (\beta/\alpha^3)^2$ to obtain $n$ and $f$ satisfying the conclusion of Theorem 15. Let $\beta = \alpha^{3+z}$. In particular, $n$ is at least a tower of $p$'s of height with a $1/\beta' = \alpha^{-2z}$ on top. By the lower bound on $n$, we have We apply Lemma 16 with $\varepsilon = \beta/6$. We obtain a set whose density is in $[\alpha - \beta/6, \alpha + \beta/6]$ and such that the density of 3-APs for each nonzero common difference is less than $\beta'\alpha^3 + \beta/6 < \beta/4$. Now, we simply delete or add arbitrary elements to make the set have density $\alpha$. For each nonzero $d$, the 3-AP density with common difference $d$ can change by at most $3\beta/6$, so the density of 3-APs with common difference $d$ is less than $\beta/4 + \beta/2 < \beta$.

Construction idea
In the next subsection, we prove Theorem 15. The general idea for the construction has some similarities to the construction we used in [22] to prove Theorem 12. We partition the dimension $n = m_1 + m_2 + \cdots + m_s$, where $m_{i+1}$ is roughly exponential in $m_i$ for each $i$, and let $n_i = m_1 + m_2 + \cdots + m_i$ be the $i$th partial sum, so $n_1 = m_1$ and $n_i = n_{i-1} + m_i$ for $2 \leq i \leq s$. Consider the vector space as a product of smaller vector spaces: $\mathbb{F}_p^n = \mathbb{F}_p^{m_1} \times \mathbb{F}_p^{m_2} \times \cdots \times \mathbb{F}_p^{m_s}$. In each step $i$, we determine a partial function $f_i : \mathbb{F}_p^{n_i} \to [0, 1]$ with density $\alpha$. The function $f_i$ has the property that for each nonzero $d \in \mathbb{F}_p^{n_i}$, the density of 3-APs with common difference $d$ of $f_i$ is less than $\beta\alpha^3$.
For Theorem 15, we let $m_1 = 10000 \log_p(1/\beta)$ and choose $f_1$ to be the characteristic function, appropriately scaled to have average value $\alpha$ on $\mathbb{F}_p^{m_1}$, of a maximum subset of $\mathbb{F}_p^{m_1}$ with no 3-AP.
For $i \geq 2$, observe that we can use $f_{i-1}$ to define a function $g_i : \mathbb{F}_p^{n_i} \to [0, 1]$ by $g_i(x) = f_{i-1}(y)$, where $y$ is the first $n_{i-1}$ coordinates of $x$. Thus, $g_i$ has constant value $f_{i-1}(y)$ on the copy of $\mathbb{F}_p^{m_i}$ consisting of those elements of $\mathbb{F}_p^{n_i}$ whose first $n_{i-1}$ coordinates equal $y$. We perturb $g_i$ to obtain $f_i$ so that it has several useful properties.
We first describe some of the useful properties $f_i$ will have. While $g_i$ has constant value $f_{i-1}(y)$ on each copy of $\mathbb{F}_p^{m_i}$ whose first $n_{i-1}$ coordinates are equal to $y$, the function $f_i$ will not have this property, but will still have average value $f_{i-1}(y)$ on each of these copies. Another useful property is that for each $d \in \mathbb{F}_p^{n_i}$ such that $d$ is not identically 0 on the first $n_{i-1}$ coordinates, the density of 3-APs with common difference $d$ of $f_i$ equals the density of 3-APs of $f_{i-1}$ with common difference $d^*$, where $d^*$ denotes the first $n_{i-1}$ coordinates of $d$. Once we have established this property, it suffices to check that for each nonzero $d \in \mathbb{F}_p^{n_i}$ with the first $n_{i-1}$ coordinates of $d$ equal to 0, the density of 3-APs with common difference $d$ is less than $\beta\alpha^3$. In order to check this, it now makes sense to explain a little more about how we obtain $f_i$ from $g_i$.
Consider a set $B \subset \mathbb{F}_p^{m_i}$ with relatively few three-term arithmetic progressions (considerably fewer than the random bound) given its size. In [22], we took $B$ to be the elements whose first coordinate is in an interval of length roughly $2p/3$ in $\mathbb{F}_p$. Here, we take $B$ to be the elements whose first $r_i$ coordinates (for an appropriately chosen $r_i$) are in a maximum subset of $\mathbb{F}_p^{r_i}$ with no 3-AP.
We consider the $p^{n_{i-1}}$ copies of $\mathbb{F}_p^{m_i}$ in $\mathbb{F}_p^{n_i}$, where each copy has the first $n_{i-1}$ coordinates fixed to some $y \in \mathbb{F}_p^{n_{i-1}}$. For each copy $A$, we consider a random copy of $B$ in $A$ by taking a random linear transformation of full rank from $\mathbb{F}_p^{m_i}$ to $A$ and considering the image of $B$ under this linear transformation. We then scale the indicator function of the image of $B$ by the constant factor $p^{m_i}/|B|$ to keep the average density unchanged on $A$. We do this independently for each $A$ where $g_i$ is nonzero on $A$. We show that with high probability, for every nonzero $d \in \mathbb{F}_p^{n_i}$ with the first $n_{i-1}$ coordinates of $d$ equal to 0, the density of 3-APs with common difference $d$ is less than $\beta\alpha^3$. One can show this for each such $d$ by observing that the density of 3-APs with common difference $d$ is just the average of the densities of 3-APs with common difference $d$ on each of the $p^{n_{i-1}}$ copies of $\mathbb{F}_p^{m_i}$. The densities of 3-APs with common difference $d$ in the perturbed subspaces are independent random variables that have expected value (appropriately scaled) equal to the density of 3-APs in $B$, which is much less than the random bound for a set of this size. We can then use Hoeffding's inequality, which shows that the sum of independent random variables with values in $[0, 1]$ is highly concentrated around its mean, to show that it is very unlikely that the density of 3-APs with common difference $d$ is at least $\beta\alpha^3$. Since the probability is so tiny, a simple union bound allows us to get this to hold simultaneously for all nonzero $d$. This completes the construction idea.
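For reference, one standard form of Hoeffding's inequality is recorded here. The symbols $M$, $X_j$, and $t$ are notation introduced only for this statement, and the exact form and normalization used in the proof may differ. If $X_1, \ldots, X_M$ are independent random variables with values in $[0, 1]$ and $t > 0$, then

\[
\Pr\left[\frac{1}{M}\sum_{j=1}^{M} X_j - \mathbb{E}\left[\frac{1}{M}\sum_{j=1}^{M} X_j\right] \geq t\right] \leq \exp\left(-2Mt^2\right).
\]

In the setting above, one would take $M$ to be the number of perturbed copies of $\mathbb{F}_p^{m_i}$ and $X_j$ to be the 3-AP density on the $j$th copy, rescaled to lie in $[0, 1]$. The bound decays exponentially in $M$, which is what leaves room for a union bound over all nonzero $d$.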
To compare with the construction in [22], there, we perturb only a sufficient fraction of the affine subspaces. Here, to account for the large decrease in 3-AP density we need to make in each step (as ε is large), we need to perturb all subspaces, and additionally use a perturbation that has significantly smaller 3-AP density. In the next subsection we present the construction of a set that has significantly few three-term arithmetic progressions, which serves as a main ingredient in our construction.

Subsets with few arithmetic progressions
An important ingredient in our constructions is subsets of $\mathbb{F}_p^n$ with very few arithmetic progressions. For this purpose, we will use a set with no nontrivial 3-AP. We let $A_{p,n}$ be a subset of $\mathbb{F}_p^n$ of maximum size with no nontrivial 3-AP. Let $r(p, n) = |A_{p,n}|$. Alon, Shpilka, and Umans [3] gave a construction of a subset of $(\mathbb{Z}/p\mathbb{Z})^n$ of cardinality $(p/2)^{n-o(n)}$ with no 3-AP. More recently, Alon [1] observed that a variant of the Behrend construction gives an even better bound of the same form. We present Alon's construction in the proof of the following lemma.
When viewed as a subset of $\mathbb{R}^d$, the set $A$ is a subset of the sphere centered at the origin with radius $\sqrt{c}$. In $\mathbb{R}^d$, any line intersects a sphere in at most two points, so a sphere cannot contain three distinct collinear points and hence cannot contain a nontrivial 3-AP. So $A$ has no 3-AP when viewed as a subset of $\mathbb{R}^d$. As the coordinates of the points in $A$ have value between 0 and $\frac{p-1}{2}$, there is no wrap-around when adding two elements of $A$, and it follows that any 3-AP in $A$ must also be a 3-AP in $\mathbb{R}^d$. Hence, $A \subset \mathbb{F}_p^n$ has no 3-AP and has the desired size.
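The sphere argument can be verified by brute force for tiny parameters. The sketch below assumes that $c$ is chosen as the most popular value of the sum of squares among points with coordinates in $\{0, \ldots, (p-1)/2\}$, which is one natural reading of the construction; the function names are hypothetical.

```python
from itertools import product


def sphere_set(p, n):
    """Points of {0, ..., (p-1)/2}^n on the most popular sphere x_1^2 + ... + x_n^2 = c."""
    half = (p - 1) // 2
    by_norm = {}
    for x in product(range(half + 1), repeat=n):
        by_norm.setdefault(sum(t * t for t in x), []).append(x)
    # Pigeonhole: the largest class has at least (half + 1)**n / (n * half**2 + 1) points.
    return max(by_norm.values(), key=len)


def has_nontrivial_3ap(A, p):
    """Check whether A contains a, b, c = 2b - a (mod p) with a != b."""
    points = set(A)
    for a in points:
        for b in points:
            if a != b:
                c = tuple((2 * bi - ai) % p for ai, bi in zip(a, b))
                if c in points:
                    return True
    return False
```

For example, `sphere_set(7, 3)` returns a 3-AP-free subset of $\mathbb{F}_7^3$, whereas the full box $\{0, 1, 2, 3\}^3$ contains the progression $(0,0,0), (1,0,0), (2,0,0)$. The check is quadratic in the size of the set, so it is only meant for small $p$ and $n$.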
The bound in Lemma 17 for p ≥ 19 and n ≥ 10 6 gives the bounds in the following corollary.

The Construction
Choice of constants. Let s = 1 11 (log p) ln log(1/α) log(1/β ) − 4 log p . Observe that s ≥ 8 log p > 33 as β ≥ α e −132 and p ≥ 19. In particular, s ≥ 1 20 (log p) ln log(1/α) log(1/β ) + 3. We soon recursively define sequences of positive integers m 1 , ..., m s and r 1 , . . . , r s . We let n = ∑ s i=1 m i , and n j = ∑ j i=1 m i and u j = ∑ j i=1 r i be the jth partial sums. Let N j = p n j , R j = p r j , and U j = p u j , so in these cases we use capital letters to denote p raised to the lower case power. We recall that A p,r is a maximum subset of F r p that contains no 3-AP. Let i−1 β . The constants above were carefully chosen so as to work in the case p = 19. For larger values of p, there is more flexibility in the choices of the constants. We collect several useful bounds between the constants in Appendix A.1.
We divide the construction process into levels in order, starting with level 1 and ending at level s. Construction for level 1. Let f 1 : at the previous level such that the density of 3-APs with common difference d of f i−1 is less than β α 3 for each nonzero d in F , v j (c)} of 3r i vectors are linearly independent, and, for each codimension The existence of such a choice of vectors is guaranteed by Lemma 19 below.
For each a ∈ F n i p , with x ∈ F n i−1 p being the first n i−1 coordinates of a and y ∈ F m i p the last m i coordinates of a, let z a = (y if z a ∈ A p,r i and f i (a) = 0 otherwise. This completes the construction for level i.
When we have finished the construction for level s, let f = f s . It is clear from the construction that the density of each f i is α, and there are

The proof
The following lemma shows that we can choose the vectors v j (x) as specified in the above construction.
where in the first inequality we repeatedly used the inequality (1 − y)(1 − z) > 1 − (y + z) for y, z nonnegative, and in the second inequality we bounded a finite geometric series by the infinite one. Thus, by the union bound, the probability that for some distinct a, b, c ∈ F n i−1 p , the vectors in the set , v j (c)} are not linearly independent is at most where in the first inequality we used m i ≥ 8n i−1 and m i ≥ 8r i , which follow from (26) and (27). Let S be a codimension one affine subspace of F m i p . For each x ∈ F n i−1 p , the probability that the set {v j (x) : 1 ≤ j ≤ r i } is a subset of S is p −r i . By the union bound, the probability that the number of Thus, by the union bound, the probability that there is a codimension one affine subspace S of F m i p for which at least m i of the x ∈ F where the second to last inequality is by (29). As 1/3 + 1/3 < 1, with positive probability (and hence there exists an instance such that) for all distinct a, b, c ∈ F We now prove Theorem 15.
Proof of Theorem 15. We need to prove that for each i, 1 ≤ i ≤ s, the following holds. For every nonzero d ∈ F n i p , the density of 3-APs with common difference d of f i is less than β α 3 . We will prove this by induction on i.
The base case i = 1 follows from the fact that the set where f 1 is nonzero has no nontrivial 3-AP. Assume that for i = k and every nonzero d ∈ F n i p , the density of 3-APs with common difference d of f i is less than β α 3 . We next show this for level i = k + 1.
Let $\rho_j(d)$ be the density of 3-APs with common difference $d$ of $f_j$. If a nonzero $d \in \mathbb{F}_p^{n_{k+1}}$ is 0 in the first $n_k$ coordinates, let the last $m_{k+1}$ coordinates of $d$ be $d'$, which must be nonzero. If $d'$ is perpendicular to all $v_j(x)$, $1 \leq j \leq r_{k+1}$, for some $x$ where $f_k(x) \neq 0$, then the density of 3-APs with common difference $d$ inside the affine subspace of points where the first $n_k$ coordinates are equal to $x$ is $\frac{|A_{p,r_{k+1}}|}{R_{k+1}} \alpha_{k+1}^3 = \left(\frac{R_{k+1}}{|A_{p,r_{k+1}}|}\right)^2 \alpha_k^3$; otherwise this density is 0, since $A_{p,r_{k+1}}$ is 3-AP free.
Let $t$ be the number of $x$ in $\mathbb{F}_p^{n_k}$ with $f_k(x) \neq 0$ such that $d' \cdot v_j(x) = 0$ for all $1 \leq j \leq r_{k+1}$. The density of 3-APs with common difference $d$ is $\rho_{k+1}(d) = \frac{t}{N_k} \left(\frac{R_{k+1}}{|A_{p,r_{k+1}}|}\right)^2 \alpha_k^3 = \frac{t}{N_k} \left(\frac{R_{k+1}}{|A_{p,r_{k+1}}|}\right)^2 \mu_k^{-3} \alpha^3$, where the second equality is by $\alpha_k = \mu_k^{-1}\alpha$. Since $d' \neq 0$, we have $t < m_{k+1}$, as the construction requires that the number of $x \in \mathbb{F}_p^{n_k}$ with $f_k(x) \neq 0$ such that $v_j(x)$, $1 \leq j \leq r_{k+1}$, are contained in the orthogonal complement of $d'$ (which has codimension one) is less than $m_{k+1}$, which is guaranteed by Lemma 19. Hence, as $m_{k+1} = N_k \left(\frac{|A_{p,r_{k+1}}|}{R_{k+1}}\right)^2 \mu_k^3 \beta$, we have $\rho_{k+1}(d) < \beta\alpha^3$. If $d \in \mathbb{F}_p^{n_{k+1}}$ is nonzero in the first $n_k$ coordinates, letting $d^*$ denote the first $n_k$ coordinates of $d$, by Lemma 20 below, we have $\rho_{k+1}(d) = \rho_k(d^*) < \beta\alpha^3$. We have thus proved by induction that for each nonzero $d \in \mathbb{F}_p^{n_i}$, the density of 3-APs with common difference $d$ of $f_i$ is less than $\beta\alpha^3$.
We chose s to ensure that the functions f i take values α i ∈ [0, 1]. Indeed, where we used Corollary 18 to get |A p,r i | ≥ p −2 2.0001 −r i |R i | and hence the fourth inequality, Inequality (21) to get the fifth inequality, the next equality is by the choice of γ = 3/ log(p/8), the next inequality is by Inequality (23), the next equality is by the choice of r s , the second to last inequality is by m 1 ≤ 10000 log p (1/β ), and the last inequality is by the choice of γ and s so that and hence e γs < e −12 log(1/α) log(1/β ) < 10 −5 log(1/α) log(1/β ) . Now, we estimate the dimension n = n s of our final space. By the definition, we have n s ≥ m s . By (28), we have m s is at least a tower of p's of height s − 3 with a m 3 on top.
We now provide the proof of Lemma 20, which is very similar to the proof of Lemma 12 in [22] but requires some modifications. We include it here for completeness. Proof. Since d * is nonzero, for any 3-AP a, b, c with common difference d, the restrictions of a, b, c to the first n i−1 coordinates are distinct. Let a * be the first n i−1 coordinates of a. Similarly define b * , c * , d * . Fix a * = a 0 , b * = b 0 , c * = c 0 for any 3-AP (a 0 , b 0 , c 0 ) in F n i−1 p with common difference d * , and consider all 3-APs a, b, c with common difference d such that the first n i−1 coordinates of a, b, c coincide with For 1 ≤ j ≤ r i , let a 1 j , b 1 j , c 1 j be any three fixed values in F p . Let L = p m i −3r i . We prove that the number of 3-APs in F n i p with common difference d, a * = a 0 and a · v j (a 0 ) = a 1 j , b Since there are p m i elements in F m i p , and p m i possible tuples (x · v) v∈B , each tuple must appear exactly once. The 3-APs with common difference d, a * = a 0 and a · v j (a 0 ) = a 1 j , b * = b 0 and b · v j (b 0 ) = b 1 j , c * = c 0 and c · v j (c 0 ) = c 1 j are given by triples (a , a + d , a + 2d ) Hence the number of such 3-APs is equal to the number of a ∈ F m i p such that a · v j (a 0 ) = a 1 j , . There is exactly one element of F m i p such that its dot product with each vector in the basis B ⊃ {v j (a 0 ), v j (b 0 ), v j (c 0 ), 1 ≤ j ≤ r i } is fixed to an arbitrary value. Since there are exactly p m i −3r i = L ways to choose the value of a ·v for Hence, which completes the proof.

Popular differences in very dense sets
In Theorem 2, we proved essentially tight bounds on the tower height of $n_p(\alpha, \beta)$ when $\alpha \le 1/2$ and $p \ge 19$. We next discuss what happens when the set density $\alpha$ is close to one and prove Theorem 5, which gives a tight bound on the tower height in this regime. No bound on $p$ is needed. Some of this discussion works for all abelian groups of odd order, so we indicate where we are assuming the group is $\mathbb{F}_p^n$. Let $G$ be a finite abelian group of odd order and let $f : G \to [0, 1]$ have density $\alpha$. Since $\alpha$ is close to one, it will be convenient to work with the complementary function $g : G \to [0, 1]$ given by $g(x) = 1 - f(x)$, which has density $\gamma := 1 - \alpha$. The weight of a 3-AP $(a, b, c)$ is $f(a)f(b)f(c) = (1 - g(a))(1 - g(b))(1 - g(c)) \ge 1 - g(a) - g(b) - g(c)$. As each element is in exactly three 3-APs with a given common difference, for every nonzero $d$, the density of 3-APs with common difference $d$ is at least $1 - 3\gamma$. This bound is best possible in $\mathbb{F}_3^n$ if $\gamma = 1/3$, as shown by the characteristic function of the subset consisting of all elements with 1 or 2 in the first coordinate, together with any common difference $d$ that is nonzero in the first coordinate. However, the total density of 3-APs is considerably larger than this bound. Indeed, the density of 3-APs $(a, b, c)$ is $\mathbb{E}[f(a)f(b)f(c)] = 1 - 3\gamma + 3\gamma^2 - \mathbb{E}[g(a)g(b)g(c)] \ge 1 - 3\gamma + 2\gamma^2$, since any two terms of a uniformly random 3-AP are independent and uniform, and $\mathbb{E}[g(a)g(b)g(c)] \le \mathbb{E}[g(a)g(b)] = \gamma^2$. It is not difficult to see that this bound is best possible if $g$ is the indicator function for a subgroup of $G$. It follows from (14) by taking away the contribution from the trivial arithmetic progressions (those with common difference zero) that, when $|G| \ge \gamma^{-3}$, the density of nontrivial 3-APs is at least $\alpha^3 - \gamma^2$. Recall that $\beta = \alpha^3 - \varepsilon$. The above discussion shows that, for $\varepsilon \ge \gamma^2$, we have $n_p(\alpha, \beta) \le 3 \log_p(1/\gamma)$. We next discuss the proof of Theorem 5. We first prove the upper bound as given in the following theorem, which improves for $\alpha$ close to 1 the tower height from the bound in Theorem 6(1) by a factor $\Theta(\log(1/\gamma))$. Note that we may simply apply the bound in Theorem 6(1) for $1/2 \le \alpha < 59/60$ to get the range of $\alpha$ in Theorem 5 not covered by the theorem below.
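The tightness example for the bound $1 - 3\gamma$ can be checked by brute force. The following is a minimal sketch, assuming the illustrative parameters $p = 3$, $n = 3$ and hypothetical helper names (`ap_density` is not from the paper); it uses exact rational arithmetic to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

def ap_density(A, d, p, n):
    """Fraction of x in F_p^n with x, x+d, x+2d all in A (A is a set of tuples)."""
    points = list(product(range(p), repeat=n))
    hits = 0
    for x in points:
        y = tuple((xi + di) % p for xi, di in zip(x, d))
        z = tuple((xi + 2 * di) % p for xi, di in zip(x, d))
        if x in A and y in A and z in A:
            hits += 1
    return Fraction(hits, len(points))

p, n = 3, 3  # illustrative choice, small enough for exhaustive search
# All elements of F_3^3 whose first coordinate is 1 or 2.
A = {x for x in product(range(p), repeat=n) if x[0] != 0}
gamma = 1 - Fraction(len(A), p ** n)        # density of the complement
lower_bound = 1 - 3 * gamma                 # the bound 1 - 3*gamma from the text
tight = ap_density(A, (1, 0, 2), p, n)      # d nonzero in the first coordinate
slack = ap_density(A, (0, 1, 0), p, n)      # d zero in the first coordinate
print(gamma, lower_bound, tight, slack)
```

For a difference that is nonzero in the first coordinate, every progression passes through an element with first coordinate 0, so the density meets the bound exactly; for other differences the density is much larger.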
The only difference between the proofs of Theorem 21 and Theorem 6(1) is that we repeatedly apply Lemma 22 below instead of Lemma 7. Lemma 22 gives a better density increment at each step than Lemma 7, giving a factor 1 + 1/(36γ) instead of a factor two.
Proof. The proof begins along the lines of the proof of Lemma 7 with $H = \mathbb{F}_p^n$. However, we can improve the bound on the mean cube density by a more careful analysis than simply using Schur's inequality. Denote the translates of $H$ by $H_j$, and denote the density of $f$ in $H_j$ by $\alpha_j = \alpha + \delta_j$, so $\mathbb{E}[\delta_j] = 0$. Let $D$ be the number of translates of $H$. By the above discussion in (14) and (15), the density of nontrivial 3-APs in $H_j$ is at least $\alpha_j^3 - (1 - \alpha_j)^2$. We will use this bound when $\delta_j < -5\gamma$.
We apply the weak regularity lemma, Lemma 8, to each translate H j of H with δ j ≥ −5γ to obtain an η-weakly-regular subspace T j of relative codimension at most η −2 in H with η = ε/12. Denote the translates of T j in H j by T jk for k ∈ H/T j . Let D j = |H/T j | ≤ p η −2 . Denote the density of f in T jk by α jk = α j + δ jk , so, for each j, we have E[δ jk ] = 0. As in the proof of Lemma 7, the density of 3-APs in H j with nonzero common difference is at least where the expectations are over all triples of subspaces T ja , T jb , T jc that form a 3-AP, and the equality is by expanding and using E[δ ja ] = 0. Hence, We next estimate the terms above. Observe that, for δ j ≤ −5γ, we have Substituting in (17) and η = ε/12 into (16), We still need to bound the last term. Observe that ∑ δ ja δ jb δ jc = D 2 j E[δ ja δ jb δ jc ] as there are D 2 j choices for the triple (a, b, c) for which T ja , T jb , T jc are translates of T j in arithmetic progression, because there are D j choices for each of a and b (which uniquely determine c). By throwing away the nonnegative terms, we have ∑ δ ja δ jb δ jc ≥ ∑ δ ja δ jb δ jc <0 δ ja δ jb δ jc .
The above sum includes terms where all three of δ ja , δ jb , δ jc are negative or exactly one is negative and the other two are positive. Hence, |δ ja δ jb δ jc |, as each term in the sums are nonnegative, and each term on the left hand side appears either once or three times on the right hand side. Thus, at least one of the three sums on the right hand side is at least 1 3 ∑ δ ja δ jb δ jc <0 |δ ja δ jb δ jc |. Without loss of generality, suppose it is the first sum, so − ∑ δ ja <0 |δ ja δ jb δ jc | ≤ 1 3 D 2 j E[δ ja δ jb δ jc ]. By the arithmetic mean -geometric mean inequality, we have |δ ja δ jb δ jc | ≤ |δ ja |(δ 2 jb + δ 2 jc )/2. Summing over all triples of subspaces in arithmetic progression, we obtain As E y [λ H+y ( f )] < α 3 − ε, we therefore obtain Let By expanding with α ja = α j + δ ja and using E[δ ja ] = 0, we have and so 1.
Substituting into (20), we obtain completing the proof.
We next discuss the proof of the lower bound in Theorem 5. Note that there must be a reason why Theorem 8 in [22] (the weighted analogue which is used to deduce Theorem 12) does not hold for α close to one, as the lower bound it claims would be better than the upper bound we just got. We next discuss where in the proof of Theorem 8 in [22] we relied on α not being close to one, and discuss how to properly modify it to get the lower bound in Theorem 5. Since the proof is only a minor modification of Theorem 8 in [22], for brevity we do not repeat the details and only specify the key difference.
We reuse the notations in [22]. The first level of the construction is identical. Observe that in the proof of Theorem 8 in [22], at each level i ≥ 2 for x ∈ H i we partition the subspace with the first n i−1 coordinates equal to x into p affine subspaces of relative codimension one (which are translates of each other) depending on the dot product of the last m i coordinates with v(x), and for some of these codimension one subspaces we make the value of f i equal to zero, and on the other subspaces we make the value 1 ζ (1 + η)α ≈ 3 2 α. The problem with this construction for α close to one is that we would get a density which is bigger than one on some subspaces, and this is not allowed. So instead of making the values 0 or something else (namely 1 ζ (1 + η)α) to keep the average value to be (1 + η)α on these subspaces, we make the values 1 (instead of 1 ζ (1 + η)α) on the denser subspaces and (1 − ζ ) −1 ((1 + η)α − ζ ) (instead of 0) on the sparser subspaces, to keep the average value to be (1 + η)α and all values in [0, 1]. The rest of the proof is the same, apart from appropriately modifying the parameters. In particular, the exponential growth of µ i (which is the fraction of the space F n i−1 p that H i takes up) has the base exponential constant in this version γ −O(1) instead of 90 in order to counteract the smaller decrease in 3-AP density that we get from the modification described above.

Concluding remarks

Arithmetic Progressions in Groups
Green [31] further proved that Theorem 1 holds not just in $\mathbb{F}_p^n$ but in any abelian group $G$ of odd order or in $[N] = \{1, 2, \dots, N\}$. In joint work with Zhao [25], we extend Theorem 2 of [22] to the interval setting. We also generalize the upper bound in Theorem 2 of [22] to general abelian groups of odd order. A substantial difficulty in this setting is dealing with the possible lack of subgroups. The upper bound proof uses Fourier analysis on Bohr sets to extend the ideas here. With some additional ideas, the lower bound can be generalized to cyclic groups which can be written as a product of groups with appropriate growth in size. The construction, however, runs into a serious obstruction when the group has few subgroups, for example, when it is a cyclic group of prime order.
A major direction in additive combinatorics in recent years has been to extend results from abelian groups to all (not necessarily abelian) groups. For example, the Freiman-Ruzsa theorem gives a characterization of subsets in abelian groups that have small doubling, i.e., sets A for which |A + A| = O(|A|). Breuillard, Green, and Tao [9] have recently extended this to nonabelian groups, which has diverse applications.
Another important example is Roth's theorem, which was extended by Bergelson, McCutcheon, and Zhang [7] to the nonabelian setting. This result says that if $G$ is a group of order $N$ and $A \subset G$ is without a nontrivial solution to $xy = z^2$ with $x, y, z$ distinct, then $|A| = o(N)$. This also follows from the arithmetic triangle removal lemma of Král', Serra, and Vena [43] (see also [54]), which was proved through an application of the triangle removal lemma. However, by using the triangle removal lemma, the proof gives a weak quantitative estimate. Recently, Sanders [49] used representation theory in order to extend the standard Fourier proof to the nonabelian setting and give a new proof of this nonabelian Roth's theorem with the bound $|A| = N/(\log \log N)^{\Omega(1)}$.
Another natural extension of Roth's theorem to groups states that if $G$ is a group of order $N$, and $A \subset G$ has no triple $x, xd, xd^2$ with $d \ne 1$, then $|A| = o(N)$. Pyber [47] proved that every group $G$ of order $N$ has an abelian subgroup $H$ of order at least $2^{\Omega(\sqrt{\log N})}$, which is in general best possible. By applying Roth's theorem for abelian groups in the cosets of $H$, we get this Roth-type theorem in the group $G$ with a reasonable quantitative bound; see Solymosi [54].
We think it would be interesting to know if Green's theorem, Theorem 1, extends to nonabelian groups. One version asks: does there exist, for each $\varepsilon > 0$, an $N_1(\varepsilon)$ such that for every group $G$ of order at least $N_1(\varepsilon)$ and every subset $A \subset G$ of density $\alpha$, there is a nonidentity element $d \in G$ such that the density of triples $x, xd, xd^2$ which are in $A$ is at least $\alpha^3 - \varepsilon$? If $G$ has an abelian subgroup $H$ of index $O(1)$, then applying the regularity-type upper bound argument, starting with the subgroup $H$, shows that such a result holds in this case with a similar bound on $N_1(\varepsilon)$ as in the abelian case.
A variant of this question asks: does there exist, for each ε > 0, an N 2 (ε) such that if G is a group of order at least N 2 (ε) and A ⊂ G has density α, then there is a nonidentity element d ∈ G such that the density of triples x, y, z with xz = y 2 and yx −1 = d which are in A is at least α 3 − ε? For quasirandom groups, which were introduced by Gowers [37], it is easy to show that this version holds as the density of solutions to xz = y 2 which are in A is asymptotically α 3 − o(1). Further, it holds for quasirandom groups with only a polynomial bound on N 2 (ε), instead of the tower-type bound as in the abelian case. This shows that, unlike in the abelian case, the quantitative bounds can depend substantially on the group structure.
One possible approach to proving a nonabelian Green's theorem is by developing a nonabelian generalization of the arithmetic regularity lemma, which would likely have further applications. One would likely want to develop a nonabelian version of Bohr sets. Maybe some of the ideas on approximate subgroups in the important work of Breuillard, Green, and Tao [9] or from representation theory as in the work of Gowers on quasirandom groups [37], [38] could be helpful here.

Four-term arithmetic progressions
Green and Tao [32,33,34] proved that for each ε > 0 there is n (ε) such that for any n ≥ n (ε) and any subset of F n 5 of density α, there is a nonzero d ∈ F n 5 such that the density of four-term arithmetic progressions with common difference d is at least α 4 − ε. Ruzsa [5] proved that an analogous result does not hold for longer lengths. We think it would be interesting to estimate n (ε). Does it grow as a tower-type function? It appears the proof given in [33] with the bound on the inverse Gowers theorem for U 3 from [36] gives a wowzer-type upper bound, which is in the next level of the Grzegorczyk hierarchy after the tower function. The lower bound construction we presented here for three-term arithmetic progressions can be modified to give a lower bound on n (ε) which is a tower of twos of height Θ(log(1/ε)). To get an improved upper bound, some of the ideas of Green and Tao in the paper [35] might be useful.

Multidimensional generalization of cap sets and popular differences
Recall that the multidimensional cap set problem discussed in the beginning of the introduction asks to estimate the maximum size $r(n, m)$ of a subset of $\mathbb{F}_3^n$ which does not contain an $m$-dimensional affine subspace, and $N^{1-(m+1)3^{-m}} \le r(n, m) \le (1 + o(1))N^{1-C^{-m}}$, where $C \approx 13.901$ is an explicit constant.
It remains an interesting problem to tighten the bounds on r(n, m). Is the right exponential constant 3, which comes from the random bound, or is it C, which comes from applying the arithmetic triangle removal lemma, or is it something in between?
We have the following multidimensional generalization of Green's theorem.
Theorem 23. For every $\varepsilon > 0$ and positive integer $r$, there is a (least) positive integer $n(r, \varepsilon)$ such that for every $n \ge n(r, \varepsilon)$ and $A \subset \mathbb{F}_3^n$ of density $\alpha$, there is a subspace $S$ of dimension $r$ such that the density of translates of $S$ in $A$ is at least $\alpha^{3^r} - \varepsilon$.
The multidimensional cap set result can be generalized to vector spaces over a fixed finite field, and Theorem 23 to abelian groups of odd order or to intervals, but we need to replace the notion of subspace by "box", also known as a generalized arithmetic progression. A k-box B of dimension r is a set of the form {a 0 + i 1 d 1 + i 2 d 2 + · · · + i r d r : 0 ≤ i j ≤ k − 1 for 1 ≤ j ≤ r}. So a k-box of dimension one is just a k-term arithmetic progression. It is proper if all the elements are distinct, that is, if |B| = k r . We refer to d 1 , . . . , d r as the common differences of the k-box, and a 0 as the initial term.
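The definition of a $k$-box and of properness can be made concrete in a cyclic group. The following is a minimal sketch, assuming the ambient group is $\mathbb{Z}_N$ and using hypothetical helper names `box` and `is_proper` that do not appear in the paper.

```python
from itertools import product

def box(a0, diffs, k, N):
    """List the elements a0 + i_1*d_1 + ... + i_r*d_r (mod N) with 0 <= i_j <= k-1.

    Repetitions are kept, so the list always has k**r entries."""
    return [(a0 + sum(i * d for i, d in zip(idx, diffs))) % N
            for idx in product(range(k), repeat=len(diffs))]

def is_proper(a0, diffs, k, N):
    """A k-box of dimension r is proper if its k**r elements are all distinct."""
    elems = box(a0, diffs, k, N)
    return len(set(elems)) == k ** len(diffs)

# A 3-box of dimension one is a three-term arithmetic progression.
print(box(5, (7,), 3, 101))           # [5, 12, 19]
# Differences (1, 3) give nine distinct sums i_1 + 3*i_2, so the box is proper.
print(is_proper(0, (1, 3), 3, 101))   # True
# Differences (1, 2) repeat some sums (for example 2 = 2*1 = 1*2), so it is not.
print(is_proper(0, (1, 2), 3, 101))   # False
```

Properness depends only on the common differences and not on the initial term, which is why the theorems below quantify over $d_1, \dots, d_r$.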
Theorem 24. For every $\varepsilon > 0$ and positive integer $r$, there is a (least) positive integer $N(r, \varepsilon)$ such that the following holds. For every $N \ge N(r, \varepsilon)$, if $G$ is an abelian group of odd order $N$ or $G = [N]$, and $A \subset G$ has density $\alpha$, then there are $d_1, \dots, d_r$ such that the 3-boxes of dimension $r$ with common differences $d_1, \dots, d_r$ are proper and the number of them in $A$ is at least $(\alpha^{3^r} - \varepsilon)N$.
The proof of Theorem 24 is by induction on $r$. The base case $r = 1$ is simply Green's theorem. Suppose we would like to prove it for $r > 1$. Let $A \subset G$ with $|G| = N$ sufficiently large and $|A| = \alpha N$. We apply the induction hypothesis to $A$ with parameters $r - 1$ and $\varepsilon/4$ to obtain $d_1, \dots, d_{r-1}$ such that the sums $i_1 d_1 + \cdots + i_{r-1} d_{r-1}$ with $0 \le i_j \le 2$ for $1 \le j \le r - 1$ are distinct, and the number of $a_0 \in G$ for which $\{a_0 + i_1 d_1 + i_2 d_2 + \cdots + i_{r-1} d_{r-1} : 0 \le i_j \le 2 \text{ for } 1 \le j \le r - 1\} \subset A$ is at least $(\alpha^{3^{r-1}} - \varepsilon/4)N$. We let $A_0$ be the set of such initial terms $a_0$. The proof of Green's theorem shows that not only is there a single nonzero $d$ which is a popular difference, but in fact a positive constant fraction (depending on $\varepsilon$) of $d$ are popular differences. Applying Green's theorem to $A_0$, we get many choices of $d_r$ (more than $5^r$ suffices) such that the number of 3-APs in $A_0$ of common difference $d_r$ is at least $\left((\alpha^{3^{r-1}} - \varepsilon/4)^3 - \varepsilon/4\right)N \ge (\alpha^{3^r} - \varepsilon)N$, and so we can pick one for which the sums $i_1 d_1 + \cdots + i_r d_r$ with $0 \le i_j \le 2$ for $1 \le j \le r$ are distinct. The $r$-dimensional 3-boxes with common differences $d_1, \dots, d_r$ are proper and at least $(\alpha^{3^r} - \varepsilon)N$ of them are contained in $A$, completing the proof. This proof shows that for $r$ fixed, $N(r, \varepsilon)$ in Theorem 24 grows at most as a tower of twos of height $\Theta_r(\log(1/\varepsilon))$. A modification to our lower bound constructions shows that this bound is tight over $\mathbb{F}_p^n$ for each fixed $r$. For brevity, we leave out the details of the proof.
The proof of Theorem 24 together with the result of Green and Tao [34] for four-term progressions shows that Theorem 24 also holds with the length 3 replaced by 4.

From Tower to Wowzer
Ron Graham [29] asked if faster growing functions like wowzer-type (the next level in the Grzegorczyk hierarchy after tower) naturally appear in similar problems. Formally, the wowzer function is defined by Wow(1) = 2 and Wow(n) = Tow(Wow(n − 1)), where Tow(n) = Tow(2, n) is an exponential tower of twos of height n.
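To get a feel for how quickly these functions grow, the recursive definitions can be evaluated for the first few arguments. The following is a minimal sketch, assuming the convention that a tower of height 1 is just 2 (so Tow(1) = 2); the function names `tow` and `wow` are illustrative.

```python
def tow(n, base=2):
    """Exponential tower of `base` of height n: tow(1) = base, tow(n) = base ** tow(n - 1)."""
    if n < 1:
        raise ValueError("height must be a positive integer")
    value = base
    for _ in range(n - 1):
        value = base ** value
    return value

def wow(n):
    """Wowzer function: wow(1) = 2 and wow(n) = tow(wow(n - 1))."""
    if n < 1:
        raise ValueError("argument must be a positive integer")
    value = 2
    for _ in range(n - 1):
        value = tow(value)
    return value

print([tow(k) for k in range(1, 5)])   # [2, 4, 16, 65536]
print([wow(k) for k in range(1, 4)])   # [2, 4, 65536]
# wow(4) = tow(65536) is a tower of twos of height 65536 and cannot be evaluated.
```

Already the fourth value of the wowzer function is out of reach of any computation, while the tower function is still manageable at height 5.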
We first remark that Green's proof of his theorem, which is obtained by directly applying the arithmetic regularity lemma and the counting lemma, gives the following strengthening. It is stronger by the fact that, for a set of density α, the mean cube density of a subspace is at least α 3 by convexity.
Theorem 25. For each ε > 0 and p, there is a least positive integer m p (ε) such that for every A ⊂ F n p of density α, there is a subspace H of F n p of codimension at most m p (ε) such that the density of 3-APs with common difference in H is at least b(H) − ε.
The original proof of Theorem 25 using the arithmetic regularity lemma gives an upper bound on m p (ε) for p fixed which grows as a tower of height ε −O (1) . Adapting the upper bound proof from [22] gives a better upper bound which is a tower of height Θ(1/ε). We can get a matching lower bound, so a tower of height Ω(1/ε), by modifying the construction used to give a lower bound in Green's theorem from [22]. In the lower bound construction in [22], the fraction of subspaces we make perturbations to increases by a constant factor at each step. To get the lower bound here, after the first level, the fraction of subspaces we make perturbations to at each step is the same. For brevity, we leave out the details of the proof.
Theorem 26. The function m p (ε) defined in Theorem 25 for p fixed grows as a tower of height Θ(1/ε).
While the tower height grows faster in the above result than in Green's theorem, it is still in the same level of the Grzegorczyk hierarchy. To go to the next level of the Grzegorczyk hierarchy, it is natural to try to find an extremal problem that essentially encodes an arithmetic strong regularity lemma. The (graph) strong regularity lemma of Alon, Fischer, Krivelevich, and Szegedy [2] finds a pair of partitions P and Q, with Q a refinement of P, and the regularity of Q is allowed to depend on the size of P. The proof of the strong regularity lemma involves applying Szemerédi's regularity lemma at each step, and so the bound one gets on the number of parts of Q is iteratively applying the tower function ε −Θ(1) times. In other words, it is of wowzer-type. That such a bound is necessary was shown by Conlon and the first author [13] and independently with a weaker wowzer-type by Kalyanasundaram and Shapira [40].
An arithmetic analogue of the strong regularity lemma is as follows. For each function g : Z ≥0 → (0, 1), there is M p (g) such that the following holds. If A ⊂ F n p , then there are subspaces H 1 ⊂ H 0 ⊂ F n p such that the codimension of H 1 is at most M p (g), b(H 1 ) ≤ b(H 0 ) + g(0), and H 1 is g(m)-regular, where m is the codimension of H 0 . As long as 1/g(n) grows not too slowly and not too fast with n, which here means it is bounded between any constant number of iterations of the logarithmic function and any constant number of iterations of the exponential function, then the function M p (g) grows as wowzer in Θ(1/ε) where ε = g(0).
The next result follows easily from the arithmetic strong regularity lemma and the counting lemma. The upper bound follows directly from the proof of the arithmetic strong regularity lemma. The lower bound is by appropriately modifying the lower bound construction for Green's theorem with modifications similar to that used in [13] to mimic the upper bound proof of the strong regularity lemma. For brevity, we leave out the details of the proof.

Monochromatic arithmetic progressions with popular differences
Van der Waerden's theorem [60] states that, for all positive integers k and r, there exists W (k, r) such that if N ≥ W (k, r), then every r-coloring of Z N contains a monochromatic k-term arithmetic progression. Many results in Ramsey theory are of this flavor, that in any finite coloring of a large enough system, there is a monochromatic pattern. In some instances, a stronger density-type theorem also holds, showing that any dense set contains the desired pattern. This is the case for van der Waerden's theorem, as Szemerédi's theorem [56] is such a strengthening, implying that the densest of the r color classes will necessarily contain the desired arithmetic progression. Szemerédi's theorem states that for each positive integer k and ε > 0, there is S(k, ε) such that, if N ≥ S(k, ε), then any subset of Z N of size at least εN contains a k-term arithmetic progression. Note that Roth's theorem is the case k = 3. By a Varnavides-type averaging argument, one can further show that a stronger, multiplicity version of van der Waerden's theorem (and of Szemerédi's theorem) holds, which shows that a fraction c(k, r) − o(1) of the k-term arithmetic progressions must be monochromatic. Observe that a random coloring gives an upper bound on c(k, r) of r 1−k . For r > 2 it is possible to show that there are colorings with relatively few monochromatic arithmetic progressions, considerably smaller than the random bound. For example, using the Behrend construction giving a lower bound for Roth's theorem, one can construct an r-coloring of Z n such that the fraction of three-term arithmetic progressions which are monochromatic is only r −Ω(log r) , which is much less than the random bound of r −2 .
Let $G$ be an abelian group of odd order. Note that, in a random $r$-coloring of $G$, for each nonzero $d$, the density of monochromatic 3-APs with common difference $d$ will likely be concentrated around $\frac{1}{r^2}$. Just like in the density version, we can get arbitrarily close to the random bound for the most popular difference. Green [31] proved that, for $r$ fixed, the arithmetic regularity lemma in $G$ extends to $r$ subsets of $G$ (in particular, to the $r$ color classes in an $r$-coloring of $G$), so that the decomposition is regular with respect to each of the $r$ subsets. Green's proof of the density theorem on arithmetic progressions with popular differences extends in a straightforward manner to obtain the following coloring variant. Indeed, using this extension of the arithmetic regularity lemma and scaling the approximation parameter $\varepsilon$ by $r$, as long as $|G| > N(\varepsilon, r)$, there is a nonzero $d$ such that, for each color $i$, the density of 3-APs with common difference $d$ which are monochromatic with color $i$ is at least $\alpha_i^3 - \frac{\varepsilon}{r}$, where $\alpha_i$ is the density of color $i$. Summing over all $i$, we get that the density of monochromatic 3-APs with common difference $d$ is at least $\sum_{i=1}^{r} \left(\alpha_i^3 - \frac{\varepsilon}{r}\right) \ge r \left(\frac{1}{r}\right)^3 - \varepsilon = \frac{1}{r^2} - \varepsilon$, where the inequality uses Jensen's inequality applied to the convex function $f(x) = x^3$ and the fact that the average of the $\alpha_i$ is $1/r$.
Theorem 29. For each $\varepsilon > 0$ and positive integer $r$, there is $N = N(\varepsilon, r)$ such that if $G$ is an abelian group with odd order $|G| \ge N$, then for every $r$-coloring of $G$, there is a nonzero $d \in G$ such that the density of 3-APs with common difference $d$ that are monochromatic is at least $\frac{1}{r^2} - \varepsilon$.
Picking the most popular of the $r$ colors, we have the following corollary.
Corollary 30. For each $\varepsilon > 0$ and positive integer $r$, there is $N' = N'(\varepsilon, r)$ such that if $G$ is an abelian group of odd order with $|G| \ge N'$, then for every $r$-coloring of $G$, there is a nonzero $d$ and a color $i$ such that the density of 3-APs with common difference $d$ that are monochromatic in color $i$ is at least $\frac{1}{r^3} - \varepsilon$.
For $r = 2$, these results are actually quite simple to prove, with bounds that are much better than those obtained from the arithmetic regularity lemma, because of the folklore observation that the total number of monochromatic 3-APs is determined by the size of the first color class. Indeed, if $R$ and $B$ are the two color classes, so $|B| = |G| - |R|$, then we can count the number $P$ of three-term arithmetic progressions with a distinguished pair of elements of the same color in two different ways. First, for each pair of elements there are three arithmetic progressions containing that pair, so we get $P = 3\binom{|R|}{2} + 3\binom{|B|}{2}$. Alternatively, every three-term arithmetic progression contains one or three such monochromatic pairs, and there are $\binom{|G|}{2}$ such three-term arithmetic progressions in $G$, so we also get $P = \binom{|G|}{2} + 2M$, where $M$ is the number of monochromatic three-term arithmetic progressions. We thus get $M = \frac{1}{2}\left(3\binom{|R|}{2} + 3\binom{|B|}{2} - \binom{|G|}{2}\right)$. This is minimized when $|R| = \lfloor |G|/2 \rfloor = (|G| - 1)/2$, and we get $(|G|^2 - 4|G| + 3)/8$ monochromatic arithmetic progressions in this case. Thus, the density of monochromatic 3-APs is at least $\frac{(|G|^2 - 4|G| + 3)/8}{\binom{|G|}{2}} = \frac{1}{4} - \frac{3}{4|G|}$. Hence, as long as $|G| \ge \frac{3}{4}\varepsilon^{-1}$, the density of monochromatic 3-APs is at least $\frac{1}{4} - \varepsilon$, and it follows that $N(\varepsilon, 2) \le \frac{3}{4}\varepsilon^{-1}$. Thus we get a linear upper bound in $1/\varepsilon$ on $N(\varepsilon, 2)$, much smaller than the tower-type bound that comes from applying the arithmetic regularity lemma.
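The double-counting identity for two colors is easy to test numerically. The following is a minimal sketch, assuming the group is the cyclic group $\mathbb{Z}_N$ with $N$ odd and that each progression $\{a, a+d, a+2d\}$ is counted once, by taking $1 \le d \le (N-1)/2$; the function names are illustrative.

```python
import random
from math import comb

def count_mono_3aps(N, red):
    """Brute-force count of monochromatic 3-APs a, a+d, a+2d in Z_N (N odd).

    Each progression is counted once by restricting to 1 <= d <= (N-1)/2;
    `red` is the set of red elements and all other elements are blue."""
    count = 0
    for a in range(N):
        for d in range(1, (N - 1) // 2 + 1):
            colors = {(x % N) in red for x in (a, a + d, a + 2 * d)}
            if len(colors) == 1:
                count += 1
    return count

def formula(N, r):
    """M = (3*C(r,2) + 3*C(N-r,2) - C(N,2)) / 2, depending only on the class sizes."""
    return (3 * comb(r, 2) + 3 * comb(N - r, 2) - comb(N, 2)) // 2

rng = random.Random(0)            # fixed seed so the run is reproducible
N = 15                            # illustrative odd order
for r in range(N + 1):
    red = set(rng.sample(range(N), r))
    assert count_mono_3aps(N, red) == formula(N, r)

# The balanced split gives (N^2 - 4N + 3)/8 monochromatic progressions.
print(formula(N, (N - 1) // 2), (N * N - 4 * N + 3) // 8)
```

The check passes for every class size, regardless of which elements are colored red, which is exactly the content of the folklore observation.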
However, for $r \ge 3$, there is no such simple formula for the number of monochromatic 3-APs in an $r$-coloring. In fact, we can prove a coloring variant of Theorem 2 in [22], showing that $N(\varepsilon, r)$ and $N'(\varepsilon, r)$ for $r \ge 3$ grow as a tower of twos of height $\Theta_r(\log(1/\varepsilon))$.
Further discussion and proofs are contained in [23].

Linear equations
A well known conjecture of Sidorenko [51] and Erdős-Simonovits [52] states that if H is a bipartite graph, then the random graph with edge density α has in expectation asymptotically the minimum density of copies of H (which is α e(H) ) over all graphs with the same number of vertices and edge density. Simple constructions show that the assumption that H is bipartite is necessary. A stronger conjecture, known as the forcing conjecture, states that if H is bipartite and contains a cycle, and G is a graph with edge density α and H-density α e(H) + o(1), then G is quasirandom with edge density α. Sidorenko's conjecture and the stronger forcing conjecture are still open but are now known to be true for a large class of bipartite graphs, see [14,16,17,39,41,44,55].
Saad and Wolf [48] began the systematic study of analogous questions for linear systems of equations in finite abelian groups. While much of this discussion extends to general finite abelian groups, for simplicity we restrict our attention to F n p . Let L(x 1 , ..., x k ) = ∑ k i=1 a i x i be a linear form with coefficients a i ∈ F p \{0}. The linear homogeneous equation L = 0 is called Sidorenko if for every subset A ⊂ F n p of density α, the density of solutions to L = 0 which are in A is at least α k , the random bound. We say that the equation L = 0 is matched if the coefficients of L can be paired up so that each pair sums to zero. It is a simple application of the Cauchy-Schwarz inequality that if L = 0 is matched, then it is Sidorenko. Zhao and the authors [26] recently proved that L = 0 is Sidorenko if and only if it is matched.
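Whether an equation is matched is a purely combinatorial condition on its coefficients and can be decided by counting residues. The following is a minimal sketch, with a hypothetical function name `is_matched` and coefficients given as integers that are reduced modulo $p$.

```python
from collections import Counter

def is_matched(coeffs, p):
    """Return True if the coefficients (nonzero mod p) can be paired up so that
    each pair sums to zero modulo p."""
    residues = [a % p for a in coeffs]
    if any(a == 0 for a in residues):
        raise ValueError("coefficients must be nonzero modulo p")
    counts = Counter(residues)
    for a, c in counts.items():
        neg = (-a) % p
        if neg == a:
            # Only possible for p = 2: copies of a must pair among themselves.
            if c % 2 != 0:
                return False
        elif counts.get(neg, 0) != c:
            # Each copy of a needs its own partner equal to -a.
            return False
    return True

print(is_matched((1, -1, 2, -2), 5))   # True: pairs (1, -1) and (2, -2)
print(is_matched((1, 1, -1, -1), 7))   # True
print(is_matched((1, -2, 1), 5))       # False: the 3-AP equation x1 - 2*x2 + x3 = 0
```

In particular the three-term progression equation is not matched, consistent with the fact that it is not Sidorenko.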
The equation L = 0 is called common if for every 2-coloring of F n p , the density of monochromatic solutions to L = 0 is at least 2 1−k , the random bound. It is easy to see that if L = 0 is Sidorenko, then it is common. Cameron et al. [10] observed that if k is odd, then L = 0 is common. Saad and Wolf [48] conjectured that if k is even, then L = 0 is common if and only if it is matched. Zhao and the authors [26] also proved this conjecture.
The popular difference property can be generalized to general linear equations. We say that $L = 0$ is popular if, for each $\varepsilon > 0$, there is $n_L(\varepsilon)$ such that if $n \ge n_L(\varepsilon)$ and $A \subset \mathbb{F}_p^n$ has density $\alpha$, then there are nonzero and distinct $d_1, \dots, d_{k-1}$ such that the density of solutions to $L = 0$ with $x_{i+1} - x_1 = d_i$ for $i = 1, \dots, k - 1$ is at least $\alpha^k - \varepsilon$. In particular, when $k = 3$, and $a_1 = a_2 = 1$ and $a_3 = -2$ (viewed as elements of $\mathbb{F}_p$), then $n_L(\varepsilon) = \max_\alpha n_p(\alpha, \alpha^3 - \varepsilon)$. Note that if $L = 0$ is Sidorenko, then simply by averaging, $L = 0$ is popular and furthermore, $n_L(\varepsilon)$ is bounded above by $O(\log(1/\varepsilon))$. We say that the linear homogeneous equation $L = 0$ is translation invariant if the sum of its coefficients is zero. Theorem 1 can be extended to show that $L = 0$ is popular if and only if $L = 0$ is translation invariant. Indeed, if $L = 0$ is not translation invariant, then the affine subspace $S$ of codimension one (so density $\alpha = 1/p$) consisting of those elements whose first coordinate is 1 has no solution to $L = 0$. Hence, it follows that $n_L(\varepsilon)$ does not exist for $\varepsilon < 1/p^k$. On the other hand, if $L = 0$ is translation invariant, then it follows from the arithmetic regularity lemma proof that there is a regular subspace $H$, and the counting lemma and Jensen's inequality give that the density of solutions to $L = 0$ with $x_1, \dots, x_k$ all in the same translate of $H$ is at least nearly $\alpha^k$. By throwing out the solutions with $x_i = x_j$ for some $i \ne j$ (which are of smaller order) and averaging, we get that there exist nonzero $d_1, \dots, d_{k-1} \in H$ for which the density of solutions to $L = 0$ with $x_{i+1} - x_1 = d_i$ for $1 \le i \le k - 1$ is at least $\alpha^k - \varepsilon$. This proof gives an upper bound on $n_L(\varepsilon)$ which is a tower of height $\varepsilon^{-O(1)}$.
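The obstruction for equations that are not translation invariant can be seen in a small exhaustive search. The following is a minimal sketch, assuming the illustrative parameters $p = 5$, $n = 2$ and hypothetical helper names; it counts solutions with all variables in the affine subspace of elements whose first coordinate is 1.

```python
from itertools import product

def is_translation_invariant(coeffs, p):
    """The equation sum_i a_i x_i = 0 is translation invariant if sum_i a_i = 0 mod p."""
    return sum(coeffs) % p == 0

def count_solutions_in(S, coeffs, p):
    """Number of tuples (x_1, ..., x_k) of elements of S with sum_i a_i x_i = 0 in F_p^n."""
    n = len(S[0])
    count = 0
    for xs in product(S, repeat=len(coeffs)):
        if all(sum(a * x[j] for a, x in zip(coeffs, xs)) % p == 0 for j in range(n)):
            count += 1
    return count

p, n = 5, 2  # illustrative choice
# Affine subspace of codimension one: first coordinate equal to 1 (density 1/p).
S = [x for x in product(range(p), repeat=n) if x[0] == 1]

not_invariant = (1, 1, 1)     # coefficient sum 3, nonzero mod 5
invariant = (1, -2, 1)        # the 3-AP equation, coefficient sum 0

print(is_translation_invariant(not_invariant, p), count_solutions_in(S, not_invariant, p))
print(is_translation_invariant(invariant, p), count_solutions_in(S, invariant, p))
```

For the equation with nonzero coefficient sum, the first coordinate of any combination of elements of $S$ equals that sum, so there are no solutions at all; for the translation-invariant equation, $S$ behaves like a copy of $\mathbb{F}_p^{n-1}$ and has the full number of solutions.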
In fact we can directly adapt the proof of Theorem 2 in [22] to show that n L (ε) is bounded above by a tower of height Θ(log(1/ε)), using a density increment argument with the mean k-th power density, We note that our construction of the lower bound in Theorem 2 of [22] and the lower bound in Theorem 2 heavily depends on the fact that the equation x 1 − 2x 2 + x 3 = 0 is not Sidorenko, as a crucial ingredient of our construction is a model function with low 3-AP density. As mentioned above, if L = 0 is Sidorenko, then the lower bound on n L (ε) is not of tower type, but in fact, only logarithmic in ε. However, the tower-type lower bound in Theorem 2 of [22] does not necessarily hold for a general linear equation which is not Sidorenko. In fact, we can construct explicit examples of linear homogeneous equations in 2 t variables for t ≥ 3 which are not Sidorenko but n L (ε) = O(log(ε −1 )). On the other hand, when the number of variables in the equation L = 0 is 3 as in the case of 3-APs, we can adapt directly the proof of Theorem 2 of [22] to show that if an equation L(x 1 , x 2 , x 3 ) = 0 is not Sidorenko, then n L (ε) grows as a tower of height proportional to log(ε −1 ).
The above discrepancy suggests that it would be very interesting to understand and characterize the growth rate of $n_L(\varepsilon)$ for linear equations in at least four variables. In fact, we do not know if there are any linear equations in at least four variables for which $n_L(\varepsilon)$ grows as a tower function.
We refer the reader to [26] for further details and questions.

Beyond the random bound when there are relatively few total arithmetic progressions
We can strengthen Green's theorem, Theorem 1, as follows. If our space is large enough and the total density of three-term arithmetic progressions is substantially less than the random bound, then there is a nonzero $d$ for which the density of three-term arithmetic progressions with common difference $d$ is substantially larger than the random bound.
The proof is as follows. Let H 0 = F n p , so b(H 0 ) = α 3 . By Lemma 7, there is a subspace H 1 of bounded codimension with b(H 1 ) ≥ α 3 + (α 3 − β )/2. If α 3 ≥ 2 8+8C p β , by Lemma 10, we can instead find a subspace H 1 of bounded codimension with b(H 1 ) ≥ β (α 3 /β ) 1+1/(2C p ) = (α 3 /β ) 1/(2C p ) α 3 . We then apply the arithmetic regularity lemma to find a subspace H 2 of H 1 of bounded codimension which is δ -regular, where δ = δ /8. That is, all but at most a δ -fraction of the translates of H 2 are δ -regular, where an affine subspace S is δ -regular if the function f is δ -close to the constant function E x∈S [ f (x)] on the subspace. By the counting lemma, Lemma 9, applied to each of the δ -regular translates of H 2 , the average density of three-term arithmetic progressions with common difference in H 2 is at least where we used E y∈H 2 +x [ f (y)] ≤ 1 and the density of translates of H 2 that are not δ -regular is at most δ . This includes the arithmetic progressions with common difference zero. As long as H 2 is sufficiently large, which holds as n is sufficiently large to start with, then the 3-AP density with common difference in H 2 is negligibly affected by whether the common difference zero arithmetic progressions are included or not. We thus get the average 3-AP density with nonzero common difference in H 2 is at least b(H 2 ) − δ , and hence there is a nonzero d ∈ H 2 for which the 3-AP density with common difference d is at least b(H 2 ) − δ . By the lower bound b(H 2 ) ≥ b(H 1 ), which holds as H 2 is a subspace of H 1 , this completes the proof.

Quasirandomness and arithmetic progressions
The study of quasirandomness in graphs and other structures has played an important role in combinatorics, number theory, and theoretical computer science. For graphs, Chung, Graham, and Wilson [11], building on earlier work of Thomason [58,59], discovered many equivalent properties of graphs that are all shared (almost surely) by random graphs. Following this work, Chung and Graham [12] studied quasirandom properties in other combinatorial structures such as hypergraphs, permutations, boolean functions, and subsets of $\mathbb{Z}_N$. Here we focus on $\mathbb{F}_p^n$ with $p$ fixed, as it is simpler due to the existence of subspaces, although the results can be extended to general abelian groups using Bohr sets.
For a set $A \subset \mathbb{F}_p^n$, we let $A : \mathbb{F}_p^n \to \{0, 1\}$ denote the indicator function. That is, $A(x) = 1$ if $x \in A$, and $A(x) = 0$ otherwise. We let $N = p^n$. We say a set $A \subset \mathbb{F}_p^n$ is $\varepsilon$-quasirandom if it satisfies the following property.
$P(\varepsilon)$: For every affine subspace $H \subset \mathbb{F}_p^n$, we have $\bigl||A \cap H| - |A||H|/N\bigr| \leq \varepsilon N$.
As discovered by Chung and Graham [12] (in the setting of $\mathbb{Z}_N$), there are many equivalent properties up to changing $\varepsilon$. For example, the set $A$ having all nonzero Fourier coefficients at most $\varepsilon$ is such an equivalent property. Another example is that for all but at most $\varepsilon N$ elements $x \in \mathbb{F}_p^n$, the size of the intersection of $A$ and its translate $A + x$ is within $\varepsilon N$ of $|A|^2/N$. Yet another such property is that the Cayley sum graph of $A$ is $\varepsilon$-quasirandom. This relates quasirandomness of subsets of $\mathbb{F}_p^n$ to the quasirandomness of an associated graph.
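As an illustration of the Fourier-analytic property mentioned above, here is a minimal sketch that computes the Fourier coefficients of the indicator function of a small set by brute force. The normalization $\widehat{A}(r) = \mathbb{E}_x A(x)\,\omega^{r \cdot x}$ with $\omega = e^{2\pi i/p}$ is an assumption made for this example (the text does not fix one), and the function names are hypothetical.

```python
import cmath
from itertools import product

def fourier_coefficients(A, p, n):
    """Return {r: E_x A(x) * omega^(r.x)} over r in F_p^n (assumed normalization)."""
    A = set(A)
    N = p ** n
    omega = cmath.exp(2j * cmath.pi / p)
    coeffs = {}
    for r in product(range(p), repeat=n):
        total = sum(
            omega ** (sum(ri * xi for ri, xi in zip(r, x)) % p) for x in A
        )
        coeffs[r] = total / N
    return coeffs

def max_nonzero_coefficient(A, p, n):
    """Largest |A^(r)| over r != 0; a small value indicates quasirandomness."""
    zero = (0,) * n
    return max(abs(c) for r, c in fourier_coefficients(A, p, n).items() if r != zero)

p, n = 3, 2
whole_space = list(product(range(p), repeat=n))
subspace = [x for x in whole_space if x[0] == 0]  # far from quasirandom

bias_whole = max_nonzero_coefficient(whole_space, p, n)
bias_subspace = max_nonzero_coefficient(subspace, p, n)
print("whole space:", round(bias_whole, 6))
print("subspace x_1 = 0:", round(bias_subspace, 6))
```

A subspace of codimension one has a nonzero Fourier coefficient as large as its density, so it fails the Fourier property for any $\varepsilon$ below that density, whereas the whole space has all nonzero coefficients equal to zero.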
A graph $H$ is forcing if, for every fixed $0 < \alpha < 1$, a graph with edge density $\alpha$ and $H$-density $\alpha^{e(H)} + o(1)$ is necessarily quasirandom. It is not hard to show that if $H$ is acyclic or not bipartite, then $H$ is not forcing, and the forcing conjecture is that all other graphs are forcing. However, Simonovits and Sós [53] proved that if $H$ is a fixed graph with at least one edge, and $G$ is a graph on $n$ vertices such that every vertex subset $S$ has $\alpha^{e(H)}|S|^{v(H)} + o(n^{v(H)})$ ordered copies of $H$, then the graph is necessarily quasirandom. In particular, if all linear-sized induced subgraphs of a graph have about the same density of triangles, then the graph is quasirandom. Their proof used Szemerédi's regularity lemma, and gave a weak (tower-type) bound on the dependency between the error parameter for this quasirandom property and the traditional quasirandom properties. They posed the problem of finding a new proof that avoids using Szemerédi's regularity lemma and gives a better dependency. This problem was recently solved by Conlon, Sudakov, and the first author [15], giving a linear dependence for cliques, and a polynomial bound for other graphs.
Proofs of Roth's theorem typically begin by observing that if the set is quasirandom, then a counting lemma shows that the total number of three-term arithmetic progressions is about the random bound. However, if the total number of three-term arithmetic progressions is about the random bound, then the set need not be quasirandom. Motivated by this observation and the results of [53] and [15] in the case of triangles in graphs, it is natural to ask whether there is an analogue of the quasirandom property of hereditary triangle counts in the arithmetic setting, and how such a property relates quantitatively to other notions of arithmetic quasirandomness. In the arithmetic setting, we count three-term arithmetic progressions instead of triangles, and affine subspaces play the role of vertex subsets. Indeed, we can prove that if the number of 3-APs is not much larger than the random bound in any large affine subspace, then we do get quasirandomness. We have two different versions below. The first involves counting 3-APs in a single affine subspace, and the second counts 3-APs inside translates of a subspace, or equivalently, with common difference in the subspace.
Both $Q$ and $R$ are quasirandom properties, that is, they are equivalent to $P$ up to changing $\varepsilon$. In particular, $P(\varepsilon)$ implies $Q(8p^2\varepsilon)$ and $R(8p^2\varepsilon)$. Conversely, $Q(\delta)$ with $\delta = 2^{-(p/\varepsilon)^{O(1)}}$ implies $P(\varepsilon)$, whereas $R(\delta)$ implies $P(\varepsilon)$ when $\delta^{-1}$ grows as a tower of $p$'s of height $\Theta(\log(1/\varepsilon))$. Surprisingly, while the dependency between the quasirandom parameters of $Q$ and $P$ is only exponential and can be further improved, the dependency between the quasirandom parameters of $R$ and $P$ is of tower-type, which can be shown to be tight using the construction in the proof of Theorem 12 in [22]. This shows that, unlike for hereditary triangle counting where the dependency between the quasirandom parameters turns out to be linear, for property $R$ the dependency turns out to be tower-type (of height logarithmic in $1/\varepsilon$).
We thus give a somewhat unexpected answer to the arithmetic analogue of the Simonovits-Sós problem [53] (see also [14]) on the dependency between hereditary counts and other quasirandom properties.
Further discussion and proofs are contained in [24].

Popular restricted differences
There is now an extensive literature on strengthenings of Szemerédi's theorem in which the common difference lies in a particular set $S$. An early result of this sort is the Furstenberg-Sárközy theorem [50], which guarantees that any dense subset of the integers contains a pair of distinct elements whose difference is a perfect square. A result of Bergelson and Leibman [6] implies that any dense subset of the integers contains a $k$-term arithmetic progression whose common difference is a perfect $r$th power. Another result of this type covers the case where $S$ is the set of primes shifted by one; see [27]. Quantitative bounds have also received much attention; see, e.g., Green [30].
One naturally wonders whether similar strengthenings of Green's theorem hold, where the popular nonzero common difference must lie in a particular set $S$. A simple example in $\mathbb{F}_p^n$ is when $S$ is a subspace of codimension $D$, where $D$ is fixed. Indeed, such a result follows from the following somewhat stronger version of Theorem 1.
Theorem 33. For each $\varepsilon > 0$, nonnegative integer $D$ and odd prime $p$, there is $\delta = \delta_p(\varepsilon, D) > 0$ such that for any subset of $\mathbb{F}_p^n$ and any subspace $H_0$ of codimension at most $D$, there is a subspace $H$ of $H_0$ with $|H| > \delta p^n$ such that the density of 3-APs with common difference in $H$ is at least $b(H_0) - \varepsilon$.
The proof of the above theorem can be obtained directly from our quantitative improvement of Green's theorem, showing that we can take $\delta_p(\varepsilon, D)$ so that $1/\delta_p(\varepsilon, D)$ grows as a tower of $p$'s of height $O(\log(1/\varepsilon))$ with a $D$ on top. Thus, there is a popular nonzero common difference in $S$ if $n$ is larger than a tower of $p$'s of height $O(\log(1/\varepsilon))$ with a $D$ on top. We can also modify the lower bound construction in the proof of Theorem 12 in [22] by starting with the partition into translates of the subspace $S$, and get a lower bound showing that this is essentially tight. In particular, in $\mathbb{F}_p^n$ with $p$ fixed, if $S$ is a subspace of codimension $D$, to guarantee that for any set there is a nonzero $d \in S$ for which the density of 3-APs with common difference $d$ is at least $\varepsilon$ less than the random bound, the smallest dimension $n$ we need is a tower of $p$'s of height $\Theta(\log(1/\varepsilon))$ with a $D$ on top.
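For readers unfamiliar with the phrase "a tower of $p$'s of height $h$ with a $D$ on top", the following minimal sketch spells out one reading of it: start from $D$ and exponentiate with base $p$ a total of $h$ times. The function name `tower` and the default `top=1` are illustrative assumptions, not notation from the paper, and the implied constants in the $O(\cdot)$ and $\Theta(\cdot)$ bounds are not modelled.

```python
def tower(base, height, top=1):
    """Return base^(base^(...^(base^top))) with `height` copies of `base`."""
    value = top
    for _ in range(height):
        value = base ** value
    return value

# Small cases only: the values become astronomically large very quickly.
print(tower(2, 3))         # 2^(2^(2^1))
print(tower(3, 2, top=1))  # 3^(3^1)
print(tower(2, 2, top=3))  # 2^(2^3), i.e. a tower of twos of height 2 with a 3 on top
```

Even for tiny inputs the growth is striking: raising the height by one replaces the value $v$ by $p^v$, which is why only the tower height is tracked in the bounds above.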
It would be interesting to prove other strengthenings of Theorem 1 with restricted differences. For example, what if S is a random set with a given density? Even the threshold probability for Roth's theorem with a random restricted difference set is poorly understood (see, e.g., [28]), so this is likely a challenging problem.