Quantitative bounds in the polynomial Szemerédi theorem: the homogeneous case

We obtain quantitative bounds in the polynomial Szemerédi theorem of Bergelson and Leibman, provided the polynomials are homogeneous and of the same degree. Such configurations include arithmetic progressions with common difference equal to a perfect $k$th power.


Introduction
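The polynomial Szemerédi theorem of Bergelson and Leibman [BL96] states that for polynomials $P_1, \dots, P_n \in \mathbb{Z}[x]$ with zero constant term, a set $A \subset [N] := \{1, 2, \dots, N\}$ lacking configurations of the form
$$x,\ x + P_1(y),\ \dots,\ x + P_n(y) \quad \text{with } y \in \mathbb{Z} \setminus \{0\} \tag{1}$$
satisfies the size bound $|A| = o_P(N)$. Gowers [Gow01a,Gow01b] has posed the problem of obtaining quantitative bounds for the $o_P(N)$ term appearing in this theorem. The purpose of this note is to prove a special case of such a result.
Theorem 1.1. Let $c_1, \dots, c_n \in \mathbb{Z}$ and $k \in \mathbb{N}$. If $A \subset [N]$ lacks configurations of the form
$$x,\ x + c_1 y^k,\ \dots,\ x + c_n y^k \quad \text{with } y \in \mathbb{Z} \setminus \{0\} \tag{2}$$
then $A$ satisfies the size bound $|A| \ll_{c,k} N(\log\log N)^{-c(n,k)}$.
This has a seemingly more general consequence.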
Corollary 1.2. Let $P_1, \dots, P_n \in \mathbb{Z}[y_1, \dots, y_m]$ be homogeneous polynomials of degree $k$, and let $K$ denote a finite union of proper subspaces of $\mathbb{R}^m$. If $A \subset [N]$ lacks configurations of the form
$$x,\ x + P_1(y),\ \dots,\ x + P_n(y) \quad \text{with } y \in \mathbb{Z}^m \setminus K \tag{3}$$
then $A$ satisfies the size bound $|A| \ll_{P,K} N(\log\log N)^{-c(n,k)}$.
There is a long history of quantitative bounds for certain configurations of the form (1). For three-point linear configurations (when $\deg P_i \le 1$) we refer the reader to [Blo14]; for longer linear configurations see [Gow01b]; and for two-point non-linear configurations see [Luc06].
Our approach is to apply the method of van der Corput differencing to relate the nonlinear configuration (3) to a longer linear configuration $x, x + a_1 y, \dots, x + a_d y$ with length $d = d(n, k)$ dependent only on $n$ and $k$. We then treat this linear configuration using the methods of Gowers [Gow01b]. The use of van der Corput's inequality allows us to control the size of the coefficients $a_i$.
Date: September 30, 2014. During this work the author was supported by a Richard Rado Postdoctoral Fellowship at the University of Reading.
The main technical difficulty in our approach is that the common difference $y$ in the resulting linear configuration is constrained to lie in a much shorter interval than the shift parameter $x$. Unfortunately, the current inverse theory for the Gowers norms can only handle parameters $x$ and $y$ ranging over similarly sized intervals. Our strategy, heuristically at least, is to decompose $y$ into a difference of smaller parameters $y = y_2 - y_1$. Changing variables in the shift $x$, we transform the configuration $x, x + a_1 y, \dots, x + a_d y$ into one of the form
$$x + b_1 y_1,\ x + c_2 y_2,\ x + b_3 y_1 + c_3 y_2,\ \dots,\ x + b_d y_1 + c_d y_2.$$
For each fixed value of $x$, one can view this as a shift of the linear configuration $b_1 y_1,\ c_2 y_2,\ b_3 y_1 + c_3 y_2,\ \dots,\ b_d y_1 + c_d y_2$.
Crucially, in this linear configuration the parameters $y_1$ and $y_2$ range over the same interval. To each of these shifted 'short' configurations we apply Gowers's local inverse theorem for the $U^d$-norm [Gow01b], which yields a density increment on an even shorter subprogression.
Unfortunately, we cannot use Gowers's local inverse theorem as a black box. Instead, we must modify the theorem in a standard way to deliver a density increment on an arithmetic progression with common difference equal to a kth power. The proof of this modification requires extra information on small fractional parts of polynomials to be incorporated into a few of the initial lemmas of [Gow01b]. This slight twist must then be carried through the remainder of the paper, and appears to require re-running the majority of the arguments. We devote a separate note [Pre] to an exposition of this modified local inverse theorem.
To the author's knowledge, the only existing quantitative bound for a non-linear configuration of the form (3) and with length greater than two is a result of Green [Gre02], which deals with the configuration $x, x + y, x + 2y$ where $y = a^2 + b^2$.
In a technical tour de force, Green employs the Gowers $U^3$-norm to attack this configuration, whereas a naive application of the method in our present paper would require the $U^7$-norm. In a future paper, we show how one can in fact use the $U^2$-norm to deal with this configuration, and consequently obtain good quantitative bounds.
Unfortunately, we have not been able to devise a way to directly apply our methods to inhomogeneous configurations, such as
$$x,\ x + y,\ x + y^2. \tag{4}$$
The difficulty lies in our application of the local inverse theorem for the Gowers norms. This gives us a density increment on a subprogression whose modulus can be much larger than its length, and this poses a difficulty for the inhomogeneous density increment strategy. In another future paper, we employ the global $U^3$-inverse theorem of Green and Tao [GT08] in order to obtain quantitative bounds for sets lacking the configuration (4).
The structure of our argument should be familiar to those acquainted with the density increment method, and is hopefully ordered in a self-explanatory manner. For those unfamiliar with this strategy, see Green [Gre05] or Gowers [Gow01a] for an overview.
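We end this introduction by showing how Corollary 1.2 follows from Theorem 1.1. Since $K$ is a finite union of proper subspaces, it cannot contain all of $\mathbb{Z}^m$, so we may fix $z \in \mathbb{Z}^m \setminus K$. Notice that we must have $yz \notin K$ for all $y \in \mathbb{Z} \setminus \{0\}$, since each subspace is closed under scaling. Let us define $c_i := P_i(z)$ for $i = 1, \dots, n$. Then by homogeneity we have $P_i(yz) = c_i y^k$, so the set $A$ lacks configurations of the form $x, x + c_1 y^k, \dots, x + c_n y^k$ with $y \in \mathbb{Z} \setminus \{0\}$. The result now follows on employing Theorem 1.1.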
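To make the configuration in the last step concrete, the following is a minimal brute-force sketch that tests whether a finite set of integers contains $x, x + c_1 y^k, \dots, x + c_n y^k$ for some non-zero integer $y$. The function name `has_power_configuration` is hypothetical, the coefficients are assumed to be non-zero, and the search is only meant for small examples; it plays no role in the argument of the paper.

```python
def has_power_configuration(A, c, k):
    """Return True if A contains x, x + c[0]*y**k, ..., x + c[-1]*y**k
    for some x in A and some non-zero integer y.
    Assumption: c is a non-empty tuple of non-zero integers."""
    A = set(A)
    if not A:
        return False
    if any(ci == 0 for ci in c):
        raise ValueError("coefficients must be non-zero")
    span = max(A) - min(A)
    smallest = min(abs(ci) for ci in c)
    y = 1
    # Once smallest * y**k exceeds the diameter of A, no point x + c_i * y**k
    # can lie in A together with x, so the search can stop.
    while smallest * y**k <= span:
        for signed_y in (y, -y):
            p = signed_y**k
            for x in A:
                if all(x + ci * p in A for ci in c):
                    return True
        y += 1
    return False
```

For instance, with $c = (1, 2)$ and $k = 2$ the set $\{1, 5, 9\}$ contains the configuration (take $x = 1$, $y = 2$), whereas $\{1, 2, 4\}$ does not.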

Configuration-free sets have large balanced count
Given a set $A \subset [N] := \{1, 2, \dots, N\}$ with $|A| = \delta N$, define the balanced function of $A$ with respect to $[N]$ by
$$f_A := 1_A - \delta 1_{[N]}.$$
The function $f_A$ has mean zero. Hence, if $A$ were a random set, one would expect the statistics of $f_A$ to be close to zero when averaged along polynomial configurations. Our first lemma shows that if $A$ lacks configurations of the form (2), then this non-randomness is witnessed by one such arithmetic average.
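As a small sanity check of the mean-zero property, here is a sketch that builds the balanced function $1_A - \delta 1_{[N]}$ for a toy set using exact rational arithmetic. The helper name `balanced_function` and the example set are illustrative assumptions, and the set is assumed to lie inside $[N]$.

```python
from fractions import Fraction

def balanced_function(A, N):
    """Return f_A = 1_A - delta * 1_[N] as a dict on {1, ..., N},
    where delta = |A| / N. Assumption: A is a subset of {1, ..., N}."""
    A = set(A)
    delta = Fraction(len(A), N)
    return {x: (1 if x in A else 0) - delta for x in range(1, N + 1)}

# Toy example: A = {2, 3, 7} inside [10], so delta = 3/10.
f = balanced_function({2, 3, 7}, 10)
mean_is_zero = sum(f.values()) == 0
```

The values are $1 - \delta$ on $A$ and $-\delta$ on the rest of $[N]$, which is why the sum over $[N]$ vanishes.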
In the remainder of the paper we assume that $c_1, \dots, c_n$ are distinct non-zero integers. Notice one may assume this in Theorem 1.1 by reducing $n$ if necessary.
then there exist functions $f_0, f_1, \dots, f_n \in \{\delta 1_{[N]}, f_A\}$ with $f_n = f_A$ and distinct non-zero integers $\tilde c_1, \tilde c_2, \dots, \tilde c_n$ of magnitude at most $B$ such that x,y
Proof. Since $A$ lacks configurations of the form (5), we have Making the substitution $1_A = \delta 1_{[N]} + f_A$ and expanding, we deduce that there exist $f_0, f_1, \dots, f_n$ satisfying and δN + (2 n+1 − 1) x,y Choosing $B = O_{c,k}(1)$ sufficiently large, the assumption $N \ge B$ yields x,y Since $N \ge B\delta^{-kn}$ we can subtract $\delta N$ from either side of (7) and deduce that x,y If $f_A = f_i$ for some $i > 0$ then, after re-labelling the indices of the $f_j$, we take $\tilde c$ to be any permutation of $c$ which ensures that $f_n = f_A$. If $f_0 = f_A$ we shift the $x$ variable in (8) by $-c_n y^k$ and take $\tilde c := (c_1 - c_n, \dots, c_{n-1} - c_n, -c_n)$.

The linearisation process
In this section we show how the large non-linear arithmetic average (6) leads to a large linear arithmetic average, albeit over a longer configuration. This deduction proceeds via the method of van der Corput differencing.
Proof. By a change of variables, for any $h \in \mathbb{Z}$ we have $\sum_y g(y) = \sum_y g(y + h)$.
Averaging over $h \in H$ and interchanging the order of summation gives $\sum_y g(y) = \frac{1}{|H|} \sum_y \sum_{h \in H} g(y + h)$.

The function $y \mapsto \sum_{h \in H} g(y + h)$ is supported on the difference set $S - H$. Squaring and applying Cauchy-Schwarz, we deduce that
Lemma 3.2 (weak van der Corput). Suppose that g : A change of variables gives the identity Combining this with the fact that Using the trivial estimates By the pigeon-hole principle and monotonicity of expectation, there exists The required inequality follows.
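The averaging step used in the proof of van der Corput's inequality can be checked numerically. The sketch below takes a toy finitely supported function $g$ (support $S$) and a finite set of shifts $H$, and verifies the identity $\sum_y g(y) = \frac{1}{|H|}\sum_y \sum_{h \in H} g(y+h)$, the support claim, and the Cauchy-Schwarz bound $|\sum_y g(y)|^2 \le \frac{|S - H|}{|H|^2} \sum_y |\sum_{h \in H} g(y+h)|^2$. The exact normalisation of the inequality in the lemma is an assumption on our part, and the names `window_sums`, `g` and `H` are illustrative.

```python
from fractions import Fraction

def window_sums(g, H):
    """Return G(y) = sum over h in H of g(y + h), stored as a dict {y: G(y)}.
    g is a dict {s: g(s)} listing the support; H is a list of distinct shifts."""
    G = {}
    for s, value in g.items():
        for h in H:
            # g(s) contributes to G(y) exactly when y + h = s, i.e. y = s - h.
            G[s - h] = G.get(s - h, 0) + value
    return G

g = {1: 2, 2: -1, 5: 3, 7: 1}   # toy function supported on S = {1, 2, 5, 7}
H = [0, 1, 2, 3]                # toy set of shifts

G = window_sums(g, H)
total = sum(g.values())
averaged = Fraction(sum(G.values()), len(H))
difference_set = {s - h for s in g for h in H}   # the set S - H
cs_bound = Fraction(len(difference_set) * sum(v * v for v in G.values()),
                    len(H) ** 2)
```

Here `averaged` equals `total`, every key of `G` lies in `difference_set`, and `total ** 2` is at most `cs_bound`.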
Remark. Again, the same conclusion can be drawn for some $h$ with $-1 - 8|H_0|\delta^{-2} \le h \le -1$, so that we may choose whether $h$ is positive or negative.
Proof. Shifting the argument of the functions P i , we may assume that I = [M]. Set g(x, y) := f 1 x + P 1 (y) · · · f n x + P n (y) 1 [N ] The 4N 2 M 2 H 0 H −1 term in the above is sufficiently small, and our result follows, provided that we can take H to satisfy That such an H exists in the interval [1, M] follows from our assumption that M ≥ 8H 0 δ −2 .
We iteratively apply this lemma, beginning with the configuration $x, x + c_1 y^k, \dots, x + c_n y^k$ and eventually obtaining a configuration of the form $x, x + a_1 y, \dots, x + a_d y$. The complexity of the intermediate configurations requires us to take an abstract approach. Moreover, each application of the linearisation step necessitates a number of technical assumptions whose purpose is to guarantee that the coefficients $a_i$ in our final linear configuration are non-zero and distinct.
Before proceeding to describe this argument in general, we illustrate the underlying ideas for the configuration $x, x + c_1 y^2, x + c_2 y^2$.
Lemma 3.4 (Linearisation for square 3APs). Let $f_0, f_1, f_2 : \mathbb{Z} \to [-1, 1]$ be supported on $[N]$ and let $c_1, c_2$ be distinct non-zero integers. Suppose that Then there exists an absolute constant
Proof. Our assumption on the support of $f_i$ ensures that $f_0(x) f_1(x + c_1 y^2) f_2(x + c_2 y^2) \neq 0$ only when $|y| < \sqrt{N}$. Let us set We can then write (10) as δNM ≪ x y∈I This inequality is in a form amenable to an application of Lemma 3.3 with $H_0 = \{0\}$, provided that $M \ge C\delta^{-2}$ with $C = O(1)$. Let $\epsilon_1 \in \{\pm 1\}$ (to be determined later) and write $c := c_2 - c_1$. Applying Lemma 3.3, together with the remark that follows it, we deduce that there exists an integer $1 \le h_1 \ll \delta^{-2}$, an interval $I_1 \subset I$ and integers $b_i$ satisfying Here we have made use of the simple identity $(y + h)^2 - y^2 = 2hy + h^2$.
The above argument required three applications of Lemma 3.3 in order to linearise the simplest example of a non-linear kth power configuration of length greater than two. In general we require many more applications of the linearisation step, and at each stage of the iteration, it is not immediately obvious that we have reduced the 'degree' of the configuration at all. To see that we have indeed reduced an invariant associated to the configuration, we require the following definition.
In words, $D_r(P)$ is the number of distinct leading coefficients occurring amongst the degree $r$ polynomials in $P$. Let us define the degree sequence of $P = (P_1, \dots, P_n)$ by $D(P) := (D_1(P), D_2(P), D_3(P), \dots)$.
Definition (Colex order). We order degree sequences according to the colexicographical ordering, so that $D(P) \prec D(Q)$ if there exists $r \in \mathbb{N}$ such that $D_r(P) < D_r(Q)$ and for all $s > r$ we have $D_s(P) = D_s(Q)$.
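The following sketch computes degree sequences and compares them in colex order, following the verbal description above. Polynomials are represented by coefficient lists in increasing degree, which is a representation chosen here for illustration; the names `degree_sequence` and `colex_less` are hypothetical, and degree sequences are truncated after the last non-zero entry.

```python
from itertools import zip_longest

def degree_sequence(polys):
    """Return (D_1, D_2, ..., D_top), where D_r counts the distinct leading
    coefficients among the polynomials of degree exactly r.
    Each polynomial is a list of coefficients in increasing degree."""
    leading = {}
    for coeffs in polys:
        c = list(coeffs)
        while c and c[-1] == 0:      # strip trailing zero coefficients
            c.pop()
        degree = len(c) - 1
        if degree >= 1:              # constants do not contribute
            leading.setdefault(degree, set()).add(c[-1])
    top = max(leading, default=0)
    return tuple(len(leading.get(r, ())) for r in range(1, top + 1))

def colex_less(p, q):
    """Return True if p strictly precedes q in colex order: the entries agree
    above some index r, and at index r the entry of p is smaller."""
    pairs = list(zip_longest(p, q, fillvalue=0))
    for a, b in reversed(pairs):
        if a != b:
            return a < b
    return False

# The configuration (y^2, 2y^2) has degree sequence (0, 2).
squares = degree_sequence([[0, 0, 1], [0, 0, 2]])
# A configuration with one quadratic leading coefficient and several linear ones.
mixed = degree_sequence([[0, 3], [1, 3], [0, 5], [0, 0, 1]])
```

Note that `mixed` precedes `squares` in colex order even though it has more non-zero entries in low degree: the comparison is decided by the highest index at which the sequences differ.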

then P (m) is true for all m ∈ S.
Proof. We leave the reader to check that $\preceq$ is transitive, anti-symmetric and total. We show that every non-empty subset of $S$ has a least element.
Let $F$ be a non-empty subset of $S$ and fix $m \in F$. Write $k$ for the maximum index satisfying $m_k \neq 0$. Let $F_{k+1}$ consist of those $m' \in F$ such that $m'_i = 0$ for all $i \ge k + 1$. Suppose that we have constructed a non-empty subset $F_{i+1} \subset F_{k+1}$. Writing $\pi_i$ for the projection onto the $i$th coordinate, $\pi_i(F_{i+1})$ is a non-empty set of non-negative integers, hence contains a least element $m^*_i$. Define $F_i$ to be the set of $m' \in F_{i+1}$ for which $m'_i = m^*_i$. Iterating this process we eventually obtain $F_1 = \{m^*\}$ and one can check that $m^* \preceq m'$ for all $m' \in F$.
Proof. At the cost of increasing the height $H$ by a factor of 2, we may assume that the polynomial $P_n$ occurring in the argument of the function $f_n$ has degree greater than one. To see why this is so, suppose that $j := \max\{i : \deg P_i > 1\} < n$. Then the configuration $\tilde P := (-P_1, P_1 - P_j, \dots, P_n - P_j)$ has distinct non-constant parts, height at most $2H$, satisfies an analogous inequality to (13) and $\deg \tilde P_n > 1$.
Given this assumption, we may re-order indices to ensure the existence of an integer $l \le n$ such that $\deg P_i = 1 \iff i < l$.
Claim. There are at most $n^2$ choices of $h$ for which the following polynomials have indistinct non-constant parts:
$$P_1(y),\ \dots,\ P_n(y),\ P_l(y + h),\ \dots,\ P_n(y + h). \tag{15}$$
To establish the claim, let us suppose that $h$ is such that two of the polynomials in the list (15) have the same non-constant part. Since the polynomials $P_1, \dots, P_n$ have distinct non-constant parts, the only possibility is that $P_i(y + h) - P_j(y)$ is constant for some $l \le i \le n$ and $1 \le j \le n$. Let $P_i(y) = a_d y^d + a_{d-1} y^{d-1} + \dots$ with $a_d \neq 0$. Then since $d > 1$ we have that the coefficient of $y^{d-1}$ in $P_i(y + h)$, namely $d a_d h + a_{d-1}$, is equal to the coefficient of $y^{d-1}$ in $P_j$, and this completely determines $h$. Since there are at most $n$ choices for $P_j$ and at most $n$ choices for $P_i$ the claim follows.
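The key computation in the claim is that shifting the argument of a polynomial of degree $d > 1$ by $h$ changes the coefficient of $y^{d-1}$ in a way that is linear in $h$ with non-zero slope $d a_d$. The sketch below verifies this with the binomial theorem on a toy polynomial; the helper `shift` and the example coefficients are illustrative assumptions.

```python
from math import comb

def shift(coeffs, h):
    """Return the coefficients of P(y + h), given those of P(y).
    Coefficients are listed in increasing degree."""
    out = [0] * len(coeffs)
    for j, a in enumerate(coeffs):
        # Expand a * (y + h)**j with the binomial theorem.
        for i in range(j + 1):
            out[i] += a * comb(j, i) * h ** (j - i)
    return out

P = [7, -2, 0, 5]          # P(y) = 5y^3 - 2y + 7, so d = 3, a_d = 5, a_{d-1} = 0
d, a_d, a_d_minus_1 = 3, 5, 0
h = 4
shifted = shift(P, h)
```

Since the coefficient `shifted[d - 1]` equals `d * a_d * h + a_d_minus_1`, prescribing its value (as the coefficient of $y^{d-1}$ in some $P_j$) leaves at most one admissible $h$.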
Let H 0 denote the set of h for which two of the polynomials in (15) have the same non-constant part. Notice that H 0 contains 0. Applying Lemma 3.3, we deduce that there exists h ∈ Z \ H 0 with 1 ≤ h ≤ 1 + 8n 2 δ −2 and an interval I ′ ⊂ I such that ω∈{0,1} f i x + P i (y + ωh) − P 1 (y) .
Given f : Z → R and a ∈ Z, write ∆(f, a) for the function Then for each i < l there exists an integer a i = a i (h) such that We can therefore re-write the right-hand side of (16) as From our claim we see that 0, Q 1 , · · · , Q n ′ have distinct non-constant parts, since adding P 1 to each polynomial in this sequence gives the sequence (15). From the way we have written (17), we see that g n ′ = f A . From the binomial theorem, one can check that the height of each Q i is at most H(8nδ −1 ) 2k .

It remains to show that D(Q) has the form given in (14), and consequently D(Q) ≺ D(P). From our ordering of indices, we have r = deg P 1 . Hence if deg P i > r then for either choice of ω ∈ {0, 1}, the polynomial P i (y + ωh) − P 1 (y) has the same leading term as P i . It follows that for s > r we have D s (Q) = D s (P). Let {a 1 , . . . , a t } denote the set of leading coefficients which appear in some P i with deg P i = r. We may assume that a 1 is the leading coefficient of P 1 . Then the set of leading coefficients occurring amongst those Q i with deg Q i = r is equal to {a 2 − a 1 , . . . , a t − a 1 }, which has cardinality one less than {a 1 , . . . , a t }. Moreover, since there are at most 2n − 1 polynomials Q i , the number of Q i with deg Q i < r is also at most 2n − 1. This leads to the bound for i 1 + · · · + i r−1 claimed in the theorem. f 0 (x)f 1 x + P 1 (y) · · · f n x + P n (y) .
Writing m := D(P), there exist absolute constants C = C(n; m) and d(n; m), dependent only on n and m, such that for M ≥ Cδ −C one can find: • an interval I ′ ⊂ I; • 1-bounded functions g 0 , . . . , g d supported on [N] with d ≤ d(n; m) and g d = f n ; • integers a i , b i with a i distinct, non-zero, of magnitude at most HCδ −C , and together these satisfy Let us therefore assume that k := max i deg P i > 1. Provided that M ≥ 8n 2 δ −2 we may apply Lemma 3.6 and conclude the existence of: • an interval I ′ ⊂ I, • 1-bounded functions g 0 , . . . , g n ′ supported on [N] with n ′ ≤ 2n − 1 and g n ′ = f n , • polynomials 0, Q 1 , . . . , Q n ′ of height at most H(8nδ −1 ) 2k and distinct non-constant parts, and together these satisfy the inequality Furthermore, writing r for the smallest index such that m r > 0, we have m ′ := D(Q) = (i 1 , . . . , i r−1 , m r − 1, m r+1 , . . . ) ≺ m for some i 1 + · · · + i r−1 ≤ 2n − 1.
Applying the induction hypothesis, we conclude that there exist absolute constants C = C(n ′ ; m ′ ) and d(n ′ ; m ′ ) such that for M ≥ C(δ 2 /2) −C there exist the following.
• Integers $a_i, b_i$ with $a_1, \dots, a_d$ distinct, non-zero and of magnitude at most Moreover, we have the inequality The lemma follows provided that we can define appropriate constants $C(n; m)$ and $d(n; m)$ dependent only on $n$ and $m$. Let us define the set $M(n; m) := \{m' : m'_j \le 2n - 1 \text{ for } j < r,\ m'_r = m_r - 1,\ m'_j = m_j \text{ for } j > r\}$. Notice that since $r$ is the minimal index for which $m_r \neq 0$, this set $M(n; m)$ is finite and completely determined by $n$ and $m$. By induction along colex, for any $n'$ and any $m' \in M(n; m)$, the constants $C(n'; m')$ and $d(n'; m')$ exist. We can therefore take d(n; m) := max Similarly, one can check that we can take C(n; m) := max
Then there exists $C = C(n, k)$ and $d(n, k)$ such that for $N \ge C\delta^{-C}$ there are 1-bounded functions $g_0, g_1, \dots, g_d$ supported on $[N]$ with $d \le d(n, k)$ and $g_d = f_A$, along with integers $a_i, b_i, M$ with $M \ge \frac{1}{C}\delta^C N^{1/k}$ and the $a_i$ distinct, non-zero, of order $O_{c,k}(\delta^{-C})$ and such that

A modified local inverse theorem for the $U^d$-norm
Unfortunately we cannot use Gowers's local inverse theorem as it is presently found in the literature. Our difficulty is that the theorem as stated only gives us information about the correlation of a function on long arithmetic progressions. Our present approach requires information about correlation on progressions of a special form.
Definition ($k$th power progression). Let us call an arithmetic progression a $k$th power progression if it has common difference of the form $y^k$ for some positive integer $y$.
Theorem 5.1 ($k$th power local inverse theorem, see [Pre]). For $d, k \ge 2$ there exists $C = C(d, k) > 0$ such that the following is true. Suppose that and that f : Then one can partition $[N]$ into $k$th power progressions $P_i$, each of length at least $N^{\delta^C/C}$ such that
Remarks.
(i) The case $k = 1$ of Theorem 5.1 is due to Gowers [Gow01b].
(ii) A proof of Theorem 5.1 for $k = 2$ and $d = 2, 3$ can be found in Green [Gre02].
(iii) For $d = 2$ and any $k$, the theorem is a nice exercise in Diophantine approximation and Fourier analysis.
(iv) The theorem should follow in a standard way from an adaptation of the methods of Gowers [Gow01b]. Unfortunately, the only way we can think of obtaining such an adaptation is by re-running a large section of the argument of [Gow01b], together with some extra applications of basic facts on small fractional parts of polynomials.
(v) For inhomogeneous polynomial configurations such as $x, x + y, x + y^2$, one requires some control on the size of the $k$th power in the given progression. This requires a further strengthening of the theorem.
(vi) A good estimate for the value of $C(d, k)$ is important if one wishes to determine the nature of the $\log\log N$ exponent in Theorem 1.1.
We relegate the proof of Theorem 5.1 to a separate note [Pre], where we give a modified exposition of Gowers's argument [Gow01b].

The density increment and final iteration
Lemma 6.1 (Density increment lemma). There exist absolute constants $B(c, k)$ and $C = C(n, k)$ such that the following is true. Suppose that and that $A$ is a subset of $[N]$ of size at least $\delta N$ lacking a configuration of the form $x, x + c_1 y^k, \dots, x + c_n y^k$ with $y \in \mathbb{Z} \setminus \{0\}$.
Then there exists a $k$th power progression $P$ of length at least $N^{\delta^C/B}$ such that $|A \cap P| \ge (\delta + \frac{1}{B}\delta^C)|P|$.
f (x + ay) = 0.
Adding the above to (24) we find that Hence there exists a $k$th power arithmetic progression $P$ of length at least $N^{\delta^C/B}$ such that $\sum_{y \in P} f(x + ay) \gg_{c,k} \delta^C |P|$.
The modulus $a$ appearing above is of order $O_{c,k}(\delta^{-C})$. Thanks to (22), this is small enough to ensure that on partitioning $P$ into congruence classes mod $|a|^{k-1}$ each subprogression has length at least $N^{\delta^C/B}$ (increasing $B$ and $C$ if necessary). We therefore conclude that there exists an arithmetic progression $Q$ with $k$th power common difference and length at least $N^{\delta^C/B}$ such that $\sum_{y \in Q} f(y) \gg_{c,k} \delta^C |Q|$.
This completes the proof of the lemma.
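The reason for splitting into classes mod $|a|^{k-1}$ in the proof above can be illustrated numerically: if $P$ has common difference $q^k$ and one takes every $|a|^{k-1}$-th term of $P$, then the image under $y \mapsto x + ay$ has common difference $a \cdot |a|^{k-1} q^k$, which is $(|a| q)^k$ when $a > 0$. The sketch below assumes that the classes are taken with respect to the index along $P$ and that $a$ is positive; these are our reading of the step, and all numerical values and helper names are illustrative.

```python
def is_kth_power(n, k):
    """Return True if the positive integer n is a perfect k-th power."""
    root = round(n ** (1.0 / k))
    # Check neighbours of the rounded root to guard against float error.
    return any(r > 0 and r**k == n for r in (root - 1, root, root + 1))

def common_difference(seq):
    """Return the common difference of an arithmetic progression given as a list."""
    diffs = {b - a for a, b in zip(seq, seq[1:])}
    assert len(diffs) == 1, "not an arithmetic progression"
    return diffs.pop()

k, q, a, x = 3, 2, 3, 11                  # illustrative values only
P = [5 + q**k * t for t in range(40)]     # k-th power progression, difference q^k
m = abs(a) ** (k - 1)
classes = [P[r::m] for r in range(m)]     # split by index along P modulo |a|^(k-1)
images = [[x + a * y for y in cls] for cls in classes]
differences = [common_difference(image) for image in images]
```

Each subprogression retains roughly a $1/|a|^{k-1}$ proportion of the length of $P$, which is the loss absorbed by increasing $B$ and $C$.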
Proof of Theorem 1.1. Suppose that $A \subset [N]$ with $|A| = \delta N$ lacks a configuration of the form
$$x,\ x + c_1 y^k,\ \dots,\ x + c_n y^k \quad \text{with } y \in \mathbb{Z} \setminus \{0\}. \tag{25}$$
Then by Lemma 6.1, provided that $N \ge \exp(B\delta^{-C})$, there exists a $k$th power progression $P = r + a^k \cdot [N_1]$ of length at least $N^{\delta^C/B}$ such that $|A \cap P| \ge (\delta + \frac{1}{B}\delta^C)|P|$. Let $A_1 := \{x \in \mathbb{Z} : r + a^k x \in A \cap P\}$. Then we have obtained a set $A_1 \subset [N_1]$ lacking configurations of the form (25) and of density $\delta_1 := |A_1|/N_1$ satisfying $\delta_1 \ge \delta + \frac{1}{B}\delta^C$. Setting $\delta_0 := \delta$, $N_0 := N$, $A_0 := A$, and iteratively applying Lemma 6.1, we see that provided there exists a set $A_n \subset [N_n]$ lacking configurations of the form (25) and of density $\delta_n := |A_n|/N_n$ satisfying $\delta_n \ge \delta_{n-1} + \frac{1}{B}\delta_{n-1}^C$ and $N_n \ge N_{n-1}^{\delta_{n-1}^C/B}$.
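Since a density cannot exceed 1, the recursion $\delta_n \ge \delta_{n-1} + \frac{1}{B}\delta_{n-1}^C$ can only be iterated a bounded number of times. The sketch below simulates the extremal case of equality and compares the number of steps with the standard doubling estimate $2B\delta^{1-C} + \log_2(1/\delta) + 1$, valid for $C \ge 2$. The constants are illustrative, the function name `increment_steps` is hypothetical, and the doubling estimate is the usual heuristic for density increment arguments rather than a bound quoted from the paper.

```python
import math

def increment_steps(delta, B, C):
    """Count iterations of delta -> delta + delta**C / B until delta exceeds 1."""
    steps = 0
    while delta <= 1:
        delta += delta**C / B
        steps += 1
    return steps

delta0, B, C = 0.1, 10.0, 2      # illustrative constants, not those of Lemma 6.1
steps = increment_steps(delta0, B, C)

# Doubling argument: while the density lies in [2^j * delta0, 2^(j+1) * delta0)
# each step gains at least (2^j * delta0)**C / B, so that range is left after
# at most B * (2^j * delta0)**(1 - C) + 1 steps; summing over j gives the bound.
doubling_bound = 2 * B * delta0 ** (1 - C) + math.log2(1 / delta0) + 1
```

In the proof, each step also shrinks the ambient interval from $N_{n-1}$ to at least $N_{n-1}^{\delta_{n-1}^C/B}$, so a bound on the number of steps translates into a lower bound on how large $N$ must be for the iteration to run to completion.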