On an almost all version of the Balog-Szemeredi-Gowers theorem

We deduce, as a consequence of the arithmetic removal lemma, an almost-all version of the Balog-Szemer\'{e}di-Gowers theorem: For any $K\geq 1$ and $\varepsilon>0$, there exists $\delta = \delta(K,\varepsilon)>0$ such that the following statement holds: if $|A+_{\Gamma}A| \leq K|A|$ for some $\Gamma \subset A \times A$ with $|\Gamma| \geq (1-\delta)|A|^2$, then there is a subset $A' \subset A$ with $|A'| \geq (1-\varepsilon)|A|$ such that $|A'+A'| \leq |A+_{\Gamma}A| + \varepsilon |A|$. We also discuss issues around quantitative bounds in this statement, in particular showing that when $A \subset \mathbb{Z}$ the dependence of $\delta$ on $\varepsilon$ cannot be polynomial for any fixed $K>2$.


Introduction
Let G be an abelian group, and let A ⊂ G be a finite set. The sumset A + A is defined by A + A = {a + a′ : a, a′ ∈ A}. A central subject in additive combinatorics is to study the structure of sets A with small sumset. It has emerged that, in some applications, one only has information about a restricted version or a popular-sum version of the complete sumset A + A. For Γ ⊂ A × A, we define the restricted sumset A +_Γ A = {a + a′ : (a, a′) ∈ Γ}.
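To make the two definitions concrete, the following minimal sketch computes a full sumset and a restricted sumset for a small set of integers. The function names and the sample set are illustrative choices, not notation from the paper; Γ is taken here to be the off-diagonal pairs purely as an example.

```python
from itertools import product

def sumset(A, B):
    """Full sumset A + B = {a + b : a in A, b in B}."""
    return {a + b for a, b in product(A, B)}

def restricted_sumset(gamma):
    """Restricted sumset {a + b : (a, b) in gamma} for a set of pairs gamma."""
    return {a + b for a, b in gamma}

A = {0, 1, 2, 4}
# Example choice of Gamma: all pairs (a, b) in A x A with a != b.
gamma = {(a, b) for a, b in product(A, A) if a != b}

full = sumset(A, A)
restricted = restricted_sumset(gamma)
print(sorted(full), sorted(restricted))
```

The restricted sumset is always contained in the full sumset, and it can be strictly smaller even when Γ omits only a few pairs.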
The following natural question arises: given that A +_Γ A is small, can we still obtain structural information on the set A? The Balog-Szemerédi-Gowers theorem (see [13, Theorem 2.29]) answers this question in the affirmative, by producing a subset A′ ⊂ A with positive density, such that the complete sumset A′ + A′ is small.
Theorem (Balog-Szemerédi-Gowers). Let G be an abelian group, and let A, B ⊂ G be two subsets. Let Γ ⊂ A × B be a subset with |Γ| ≥ |A||B|/K′ for some K′ ≥ 1. If |A +_Γ B| ≤ K|A|^{1/2}|B|^{1/2} for some K ≥ 1, then there exist subsets A′ ⊂ A and B′ ⊂ B with the properties that
This paper focuses on an "almost all" version (or 99% version) of the Balog-Szemerédi-Gowers theorem. More precisely, if Γ in the statement above is almost all (as opposed to just a positive proportion) of A × B, can we take the sets A′, B′ in the conclusion to be almost all of A, B, and moreover can we ensure that the sumset A′ + B′ is just a little larger than A +_Γ B?
Thanks to Ben Green for valuable discussions. X.S. was supported by the NSF grant DMS-1802224.
Theorem 1.1. Let G be an abelian group, and let A, B ⊂ G be two subsets with |A| = |B| = N. Let K ≥ 1 and ε ∈ (0, 1/2), and let δ > 0 be sufficiently small in terms of K, ε. Let Γ ⊂ A × B be a subset with |Γ| ≥ (1 − δ)N^2. If |A +_Γ B| ≤ KN, then there exist subsets A′ ⊂ A and B′ ⊂ B with the properties that |A′| ≥ (1 − ε)N, |B′| ≥ (1 − ε)N, and |A′ + B′| ≤ |A +_Γ B| + εN. Moreover, if G = F_p^n with p fixed then we may take δ = (ε/K)^{O_p(1)}.
The proof of this theorem is not difficult. In fact, it is closely related to an equivalent formulation of the arithmetic removal lemma; see Proposition 3.1 below. Consequently, the quantitative dependence of δ on K, ε that we are able to obtain in Theorem 1.1 is the same as that in the arithmetic removal lemma.
As an immediate consequence of Theorem 1.1, we derive the following structure theorem for sets of integers whose restricted doubling is less than 3.
In fact, one may even hope that δ = cε^2 should work. It is easy to see that this would be best possible: Let ε > 0 be small. Take and form Γ by removing the (2εN)^2 pairs (n, n′) with 1.1N ≤ n, n′ ≤ (1.1 + 2ε)N. Then |A +_Γ A| = 2.1N − 1. On the other hand, if P is an arithmetic progression that shares all but at most εN elements of A, then P must contain the interval from 1 to (1.1 + 2ε)N, and so |P| > |A +_Γ A| − (1 − ε)N + 1. Hence one must have δ ≪ ε^2 in Corollary 1.2.
We are unable to settle Conjecture 1.3, but we show that Theorem 1.1 does not hold with polynomial bounds if G = Z, so that one cannot settle Conjecture 1.3 purely via Theorem 1.1.
Theorem 1.4. In Theorem 1.1, if G = Z then for any K > 2 and ε > 0 with D := ε^{−1} min(K − 2, 1) sufficiently large, we must have for some absolute constant c > 0.
Our construction to prove Theorem 1.4 is motivated by, but different from, Behrend's construction of a large set of integers without 3-term arithmetic progressions. Let A ⊂ [M] be the 3-AP-free Behrend set, so that |A| ≥ exp(−C(log M)^{1/2})M. Let Γ ⊂ A × A be the set of all non-diagonal pairs (i.e. those pairs (a, a′) with a ≠ a′), so that δ = |A|^{−1}. Then A +_Γ A misses all elements 2a (a ∈ A) by the 3-AP-free property, so that |A +_Γ A| ≤ |A + A| − |A|. If one removes ε|A| elements from A to form a subset A′, one might guess that A′ + A′ is smallest if the ε|A| elements removed are from an initial interval {1, 2, · · · , L} for some L. In this case, A′ + A′ should more-or-less be (A + A) \ {1, 2, · · · , 2L}, and so we should have |A′ + A′| ≥ |A +_Γ A| + ε|A| if we take L = 0.1|A| (say). Since ε is the proportion of elements from A lying in {1, 2, · · · , L}, it should be of the form ε ≈ L/M ≈ exp(−C(log M)^{1/2}). Since δ ≈ 1/M, this should show that we must have in Theorem 1.4 when G = Z. We will make this argument rigorous by constructing A as a discretized version of a thin annulus in R^d (with large d), and by a trick to make the doubling constant K as close to 2 as possible. Our analysis leads to an extra log log ε^{−1} factor, which we do not know how to remove.
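The first step of this heuristic (a 3-AP-free set whose off-diagonal restricted sumset misses every double 2a) can be checked on a toy example. The sketch below does not use Behrend's set; as a stand-in it uses the simpler 3-AP-free set of integers whose base-3 digits are all 0 or 1, and the function names are illustrative.

```python
from itertools import product

def ternary_01_set(k):
    """Integers below 3**k whose base-3 digits are all 0 or 1 (a 3-AP-free set)."""
    return {sum(digit * 3**i for i, digit in enumerate(digits))
            for digits in product((0, 1), repeat=k)}

def is_3ap_free(A):
    """True if there are no b != a in A with 2a - b also in A."""
    return not any((2 * a - b) in A for a in A for b in A if a != b)

A = ternary_01_set(4)
full = {a + b for a in A for b in A}
# Gamma = all non-diagonal pairs, as in the heuristic.
off_diagonal = {a + b for a in A for b in A if a != b}
doubles = {2 * a for a in A}

print(len(A), len(full), len(off_diagonal))
```

For this stand-in set the off-diagonal sumset loses exactly |A| elements, matching the inequality |A +_Γ A| ≤ |A + A| − |A| in the text.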

2.1. Dense models for sets with small doubling.
Recall that a map π : G → G̃ from one abelian group G to another is a Freiman isomorphism on A ⊂ G if it is injective on A and moreover a_1 + a_2 = a_3 + a_4 ⇐⇒ π(a_1) + π(a_2) = π(a_3) + π(a_4) for all a_1, a_2, a_3, a_4 ∈ A. In that case we say that A is Freiman isomorphic to π(A). Using Freiman's theorem, one can show that any set with small doubling is Freiman isomorphic to a dense set in a finite abelian group (see [7, Proposition 1.2]).
Proposition 2.1. Let G be an abelian group, and let A ⊂ G be a finite subset. If |A + A| ≤ K|A| for some K ≥ 1, then one can find a finite abelian group G̃ and a subset à ⊂ G̃ which is Freiman isomorphic to A, with |Ã| ≥ c(K)|G̃| for some c(K) > 0 depending only on K.
In the case G = F_p^n with p fixed, one can take c(K) to be polynomial in K. This is a slight generalization of [7, Proposition 6.1].
Proof. The proof is motivated by Ruzsa's embedding lemma [11] (see also [13, Lemma 5.26]). Let m be the smallest integer satisfying p^m > K^4|A|, so that p^m ≤ pK^4|A|. We may assume that m ≤ n, since otherwise the proposition holds simply with G̃ = G and à = A.
Let π : F_p^n → F_p^m be the projection map onto the first m coordinates, and let σ ∈ GL_n(F_p) be an element chosen uniformly at random. It suffices to show that, with positive probability, π ∘ σ is a Freiman isomorphism from A to its image. Indeed, if this is true for some σ then we may take G̃ = F_p^m and à = π(σ(A)).
Suppose that π ∘ σ fails to be a Freiman isomorphism when restricted to A. Then π fails to be a Freiman isomorphism when restricted to σ(A), which means that there exist x, y, z, w ∈ σ(A) such that π(x) + π(y) = π(z) + π(w) but x + y ≠ z + w.
Thus x + y − z − w is a nonzero element in 2σ(A) − 2σ(A) whose first m coordinates are all zero. By the union bound, the probability of this happening is at most the sum of P(π(σ(a)) = 0) over all nonzero a ∈ 2A − 2A.
For any fixed a ≠ 0, we have P(π(σ(a)) = 0) = (p^{n−m} − 1)/(p^n − 1) ≤ p^{−m}. This can be seen by noting that σ(a) is uniformly distributed in F_p^n \ {0}. It follows that the failure probability P satisfies P ≤ |2A − 2A| p^{−m}. By Plünnecke's inequality (see [13, Corollary 6.29]), we have |2A − 2A| ≤ K^4|A|, and thus P < 1 by our choice of m. This completes the proof.
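The probability used in this proof can be checked by brute force for small parameters. Since σ(a) is uniform on the nonzero vectors of F_p^n, it is enough to count nonzero vectors whose first m coordinates vanish. The sketch below does this by enumeration; the function name and the sample parameters are illustrative, and the closed form (p^{n−m} − 1)/(p^n − 1) it is compared against is the elementary count rather than a formula quoted from the paper.

```python
from fractions import Fraction
from itertools import product

def kernel_probability(p, n, m):
    """P(first m coordinates of v vanish) for v uniform on F_p^n minus {0}."""
    nonzero = [v for v in product(range(p), repeat=n) if any(v)]
    hits = [v for v in nonzero if not any(v[:m])]
    return Fraction(len(hits), len(nonzero))

p, n, m = 3, 4, 2
prob = kernel_probability(p, n, m)
closed_form = Fraction(p ** (n - m) - 1, p ** n - 1)
print(prob, closed_form, Fraction(1, p ** m))
```

Multiplying the bound p^{−m} by the number of nonzero elements of 2A − 2A gives the union bound that the choice p^m > K^4|A| makes smaller than 1.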

Arithmetic removal lemmas.
The key ingredient in proving Theorem 1.1 is the arithmetic removal lemma due to Green [6] (see also [9]).
Theorem 2.3. Let G be a finite abelian group, and let A, B, C ⊂ G be three subsets. Let ε > 0, and let δ > 0 be sufficiently small in terms of ε. If the number of solutions to a + b = c with a ∈ A, b ∈ B, c ∈ C is at most δ|G|^2, then one can remove at most ε|G| elements from A, B, C to obtain A′, B′, C′, respectively, such that there is no solution to a′ + b′ = c′ with a′ ∈ A′, b′ ∈ B′, c′ ∈ C′.
The best known bound in the removal lemma is of tower type; see [3]. However, in the finite field model G = F_p^n with p fixed, the removal lemma can be proved with polynomial bounds [4], building on the recent breakthrough on cap-sets [1, 2].
Theorem 2.3 is only non-trivial if A, B, C are dense in G, but using the dense model theorem (Proposition 2.1), we can deduce a "local" version of the arithmetic removal lemma that applies to sets with small doubling.
Corollary 2.5. Let G be an abelian group, let X ⊂ G be a finite subset with |X + X| ≤ K|X| for some K ≥ 1, and let A, B, C ⊂ X be three subsets. Let ε > 0, and let δ > 0 be sufficiently small in terms of K, ε. If the number of solutions to a + b = c with a ∈ A, b ∈ B, c ∈ C is at most δ|X|^2, then one can remove at most ε|X| elements from A, B, C to obtain A′, B′, C′, respectively, such that there is no solution to a′ + b′ = c′ with a′ ∈ A′, b′ ∈ B′, c′ ∈ C′.
Proof. We may assume that 0 ∈ X (at the cost of replacing K by K + 1). By Proposition 2.1, there exists a Freiman isomorphism φ : X → X̃, where X̃ is a subset of a finite abelian group G̃ with |X̃| ≥ c(K)|G̃|. By an appropriate translation we may assume that φ(0) = 0. Note that
Proof. Run the same argument as above, but using the polynomial bounds in Proposition 2.2 and Theorem 2.4.

A weak version of Theorem 1.1.
Lemma 2.7. Let G be an abelian group, and let A, B ⊂ G be two subsets
Proof. When A = B this is [5, Lemma 5.1]. For the general case, we consider "paths of length 3" in addition to "paths of length 2", as in [13, Section 6.4]. Let A′ be the set of a ∈ A such that (a, b) ∈ Γ for at least (1 − δ^{1/2})N elements b ∈ B, and similarly let B′ be the set of b ∈ B such that (a, b) ∈ Γ for at least (1 − δ^{1/2})N elements a ∈ A. Thus
It follows that for any a ∈ A′, there are at least

Proof of Theorem 1.1 and Corollary 1.2
The following proposition encapsulates the connection between Theorem 1.1 and the arithmetic removal lemma.
Proposition 3.1. Let G be an abelian group, and let A, B ⊂ G be two subsets with |A| = |B| = N. Let ε, δ > 0. The following two statements are equivalent.
Proof. To show that (2) implies (1) ∈ Γ leads to a solution to a + b = c with a ∈ A, b ∈ B, c ∈ C, so the number of pairs not in Γ is at most δN^2. Then (1) implies that there exist subsets
Proof of Theorem 1.1. Suppose that δ < 1/100 is small enough in terms of K, ε. By Proposition 3.1, it suffices to prove the second statement of Proposition 3.1 for any C ⊂ G. By Lemma 2.7, we can find A_0 ⊂ A and B_0 ⊂ B such that By shrinking A_0 or B_0, we may assume that |A_0| = |B_0|. By translating appropriately, Let C_0 = C ∩ (A_0 + B_0), and let X = A_0 ∪ B_0 ∪ C_0. Then X + X is contained in the union of the iterated sumsets nA_0 + mB_0 with n, m ∈ {1, 2}. Using the Ruzsa triangle inequality, one can deduce that (See [13, Corollary 2.24]). The number of solutions to a + b = c with a ∈ A_0, b ∈ B_0, c ∈ C_0 is at most δN^2 ≤ 2δ|X|^2, so Corollary 2.5 (applied with ε replaced by εK^{−O(1)}/2) implies that one can remove at most εN/2 elements from A_0, B_0, C_0 to obtain The polynomial dependence of δ on K, ε in the case when G = F_p^n follows by using Corollary 2.6.
Proof of Corollary 1.2. By Theorem 1.1 (applied with ε replaced by ε/10), there exist subsets A′ ⊂ A and B′ ⊂ B with the properties that Apply (an asymmetric version of) Freiman's 3k − 4 theorem (see [10]) to conclude that there exist arithmetic progressions P, Q with the same common difference and sizes at most such that A′ ⊂ P and B′ ⊂ Q. This completes the proof.

A continuous version of Behrend's construction
Our construction for proving Theorem 1.4 is motivated by Behrend's construction of a large 3-AP-free set, and in particular by the construction in [8], which starts with a continuous version and then converts it into a discrete one via a probabilistic argument. We will also start with a continuous set, but to convert it into a discrete set we adopt a more rudimentary approach.
Let d be a positive integer, and henceforth we will always assume that it is sufficiently large. Define S = {x ∈ R^d : 1 − η ≤ ‖x‖ ≤ 1}, where ‖·‖ denotes the L^2-norm, d is a large positive integer, and η > 0 is small (say η ≤ d^{−10}). Denote by V_d the volume of the unit ball in R^d. Then the volume of S is vol(S) = (1 − (1 − η)^d)V_d. In particular dηV_d ≪ vol(S) ≪ dηV_d. We will use the crude estimates
Proof. If x, x − y, x + y is a 3-term progression in S, then from the identity 2‖x‖^2 + 2‖y‖^2 = ‖x + y‖^2 + ‖x − y‖^2 one can deduce that It follows that the volume of T is
Clearly S + S is the ball with radius 2 centered at the origin, so that vol(S + S) = 2^d V_d.
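The estimate vol(S) ≈ dηV_d can be checked numerically. The sketch below assumes that S is the annulus of points with norm between 1 − η and 1 (which is what the later estimates suggest), so that vol(S) = (1 − (1 − η)^d)V_d; the function names and the sample values of d and η are illustrative.

```python
import math

def unit_ball_volume(d):
    """Volume V_d of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def annulus_volume(d, eta):
    """Volume of {x in R^d : 1 - eta <= |x| <= 1}, assumed to be the set S."""
    # 1 - (1 - eta)^d, computed stably for very small eta.
    shell_fraction = -math.expm1(d * math.log1p(-eta))
    return shell_fraction * unit_ball_volume(d)

d = 20
eta = float(d) ** -10
ratio = annulus_volume(d, eta) / (d * eta * unit_ball_volume(d))
print(ratio)
```

In the regime η ≤ d^{−10} the ratio is extremely close to 1, consistent with the two-sided bound dηV_d ≪ vol(S) ≪ dηV_d.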
In particular, if ε = 25^{−d}η^3 then Our result is weaker than this, but turns out to be sufficient for our purposes.
To prove Proposition 4.2, first we need some estimates on the volume of the sets R_y := S ∩ (y − S).
Proof. By symmetry, we may assume that y = (2 − t, 0, · · · , 0). First we show that R_y contains the set To see that R_y^− is also contained in y − S, take any x ∈ R_y^− and use the identity , a little bit of algebra reveals that 1 − η ≤ ‖y − x‖ ≤ 1, as desired.
Thus it suffices to estimate the volume of R_y^−, which can be done by an integral. For any x_1 ∈ (0, 1), denote by I(x_1) the volume of

So overall we always have
for all x_1 ∈ (0, 1). Hence,
Unfortunately this will not be enough for our purpose due to the exponent 2/(d + 1) decaying like 1/d as d grows. To do better we will make use of the fact that most of the R_y's are disjoint from each other.
Lemma 4.6. Suppose that ‖y_1‖ = 2 − t_1 and ‖y_2‖ = 2 − t_2 for some t_1, t_2 ∈ (0, 2). If R_{y_1} ∩ R_{y_2} is nonempty, then ‖y_1 − y_2‖ < 2(t_1^{1/2} + t_2^{1/2}).
Proof. Take x ∈ R_{y_1} ∩ R_{y_2}. Since ‖x‖ ≤ 1, ‖y_1 − x‖ ≤ 1 and ‖y_1‖ = 2 − t_1, it follows from the parallelogram identity that ‖2x − y_1‖ < 2t_1^{1/2}, and similarly ‖2x − y_2‖ < 2t_2^{1/2}. The claim follows from the triangle inequality.
We will get an upper bound for vol(D_t). Pick a maximal set of elements y_1, · · · , y_m ∈ D_t with R_{y_1}, · · · , R_{y_m} mutually disjoint. Then for any y ∈ D_t, the set R_y must intersect some R_{y_i}, and thus by Lemma 4.6, ‖y − y_i‖ ≤ 4t^{1/2}. It follows that where B(y_i, 4t^{1/2}) denotes the ball with radius 4t^{1/2} centered at y_i. By considering the volume, we get vol . Summing over all i, and using the disjointness of R_{y_i} and Lemma 4.4, we get We also have the trivial bound vol which is better when t ≤ ε^{2/3}. Combining them together we have By summing over t dyadically we get as desired.
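The geometric fact behind Lemma 4.6 is that every point x with ‖x‖ ≤ 1 and ‖y − x‖ ≤ 1, where ‖y‖ = 2 − t, lies close to y/2: the parallelogram identity gives ‖2x − y‖^2 ≤ 4 − (2 − t)^2 < 4t. The following sketch tests this inequality on random points in low dimension. It is only a numerical sanity check under these stated assumptions, with an illustrative function name and parameters, and it samples from the two unit balls rather than from the thin annulus itself.

```python
import math
import random

def check_lens_points(d=3, t=0.5, samples=2000, seed=0):
    """Sample x with |x| <= 1 and |y - x| <= 1, where |y| = 2 - t, and return
    (number of accepted samples, largest observed |2x - y|^2 / (4t))."""
    rng = random.Random(seed)
    y = [2 - t] + [0.0] * (d - 1)
    # Any admissible x lies within this distance of y/2, so the box suffices.
    r = math.sqrt(4 * t - t * t) / 2
    accepted, worst = 0, 0.0
    for _ in range(samples):
        x = [yi / 2 + rng.uniform(-r, r) for yi in y]
        y_minus_x = [yi - xi for yi, xi in zip(y, x)]
        if math.hypot(*x) <= 1 and math.hypot(*y_minus_x) <= 1:
            accepted += 1
            dist_sq = sum((2 * xi - yi) ** 2 for xi, yi in zip(x, y))
            worst = max(worst, dist_sq / (4 * t))
    return accepted, worst

accepted, worst = check_lens_points()
print(accepted, worst)
```

Applying the same bound to two centers y_1 and y_2 sharing a point x, and then the triangle inequality, is what yields the separation estimate between centers with intersecting sets R_y.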
While this finishes the analysis of S, for technical reasons later on we will need to cut the corners of S and consider instead the set S̃ = {x ∈ S : |x_i| ≤ 1 − d^{−10} for each i}. We summarize the required properties of S̃ in the following proposition.
Proposition 4.7. Let d be large, and suppose that η ≥ d^{−d}. The set S̃ defined above has the following properties: (1) vol(S̃) = dηV_d(1 + O(dη)); (2) the set T̃ := {(x, y) : x, x − y, x + y ∈ S̃} has volume vol(T̃) then vol(S′ + S′) ≥ vol(S̃ + S̃) − (1/90) vol(S̃).
Proof. Note that S \ S̃ is contained in the union of 2d "caps" of height h = d^{−10}, and each cap has volume Hence (1) is clear from the estimate on vol(S), and (2) follows from Lemma 4.1.
where the second inequality follows from the assumption that η ≥ d^{−d}.
In particular A lies inside the ball of radius M centered at 0, and moreover if a = (a_1, · · · , a_d) ∈ A then |a_i| ≤ (1 − d^{−10})M for each i. This implies the lower bound for |A|.

Lemma 5.2. The number of 3-term arithmetic progressions in
Proof. Let T̃ be defined as in Proposition 4.7 (2). Any 3-term progression a, a − b, a + b in A leads to a subset Since the sets T_{a,b} for different choices of (a, b) are disjoint from each other, and the volume of each T_{a,b} is (4M^2)^{−d}, we can conclude that the number of such (a, b) is where we used Lemma 5.1 in the last inequality.
Remark 5.3. If one wants to construct a 3-AP-free set A with this approach, then one would like 6^d η^{d/2−1}|A|^2 to be smaller than |A|. Since |A| ≈ M^d, we would need to require that M is smaller than η^{−1/2}. This scale is certainly too coarse for the argument to work. However, the construction here is sufficient for the purpose of requiring A to have few 3-APs rather than none, and it allows us to analyze the sumset A + A (and also A′ + A′ with A′ almost all of A) rigorously.
For any a, a′ ∈ A, we have We have where the last inequality follows from Lemma 5. (2) |A + A| ≪ 4^d|A|; This already shows that, when G = Z in Theorem 1.1, δ cannot depend polynomially on K/ε. More precisely, in Theorem 1.1 we must have for some absolute constant c > 0.
Proof. Let d be large, let η = 2^{−d}, and let M be sufficiently large in terms of d. Let à ⊂ Z^d be the set constructed above. Take A to be the image of à under the map π : Z^d → Z defined by π(a) = a_1 + a_2(10M) + · · · + a_d(10M)^{d−1} for a = (a_1, · · · , a_d). Since à ⊂ [−M, M]^d, π is a Freiman isomorphism from à to A, and thus for the properties (2), (3), and (4), it suffices to prove them with A replaced by Ã.
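The claim that the base-(10M) encoding π is a Freiman isomorphism on subsets of [−M, M]^d can be tested directly on small examples. The sketch below checks the defining equivalence on all pairs of a random subset; the function names, the random subset, and the deliberately bad encoding used for contrast are illustrative assumptions, not objects from the paper.

```python
import random
from itertools import product

def encode(a, M):
    """pi(a) = a_1 + a_2*(10M) + ... + a_d*(10M)**(d-1)."""
    return sum(ai * (10 * M) ** i for i, ai in enumerate(a))

def is_freiman_isomorphism(points, f):
    """True if u + v = u' + v'  <=>  f(u) + f(v) = f(u') + f(v') on points."""
    pairs = {(tuple(x + y for x, y in zip(u, v)), f(u) + f(v))
             for u in points for v in points}
    vector_sums = {s for s, _ in pairs}
    image_sums = {t for _, t in pairs}
    # Equal sizes mean that vector sums and image sums determine each other.
    return len(pairs) == len(vector_sums) == len(image_sums)

M, d = 3, 3
rng = random.Random(0)
box = list(product(range(-M, M + 1), repeat=d))
points = rng.sample(box, 30)

good = is_freiman_isomorphism(points, lambda a: encode(a, M))
# A base that is too small lets distinct vector sums collide.
bad = is_freiman_isomorphism([(0, 0), (2, 0), (0, 1)], lambda a: a[0] + 2 * a[1])
print(good, bad)
```

The base 10M works because each coordinate of a difference of two pair sums lies in [−4M, 4M], so an integer of the form c_1 + c_2(10M) + · · · with such coefficients vanishes only when every c_i does.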
For (1), A lies in an interval of length For (2), we have For ( It remains to establish (5). This is the place where we needed to trim the corners of S, so that for any a = (a_1, · · · , a_d) ∈ à we have |a_i| ≤ (1 − d^{−10})M for each i. Let Define X ⊂ S̃ by We claim that all elements b ∈ B are close to a in the sense that |a − b| ≤ 0.1|A|. Indeed, if This is ≤ 0.1|A| as claimed, because Hence it suffices to show that |B| ≥ d^{−O(d)}|A|. By the construction of B from X, we have |B| ≥ M^d vol(X), and vol(X) can be estimated via an integral:
Finally, we modify the construction above to get counterexamples whose doubling constant can be arbitrarily close to 2. The following result clearly implies Theorem 1.4, by taking d = c log D/ log log D for some small absolute constant c > 0.
To form Γ, we remove from A × A those pairs in (A_0 × A_0) \ Γ_0, so that Since all non-positive elements in A + A must come from A_0 + A_0, we see that A +_Γ A misses all the elements 2a with a ∈ A_0 and a ≤ 0. By the symmetry of A_0 we have |A +_Γ A| ≤ |A + A| − (1/2)|A_0|. Now let A′ ⊂ A be a subset with |A′| ≥ (1 − ε)|A|. We can write A′ = A′_0 ∪ I, where A′_0 ⊂ A_0 and I ⊂ {N + 1, · · · , LN} satisfy We need to bound the number of sums in A + A that are missing in A′ + A′. There are three types of them: To bound |U_1|, note that any sum in the range {−2N, · · · , 0} must come from A_0 + A_0, so that U_1 ⊂ (A_0 + A_0) \ (A′_0 + A′_0). By choosing the absolute constant C in the definition ε = λd^{−Cd} large enough, we can ensure that Ld^{O(d)}ε ≤ 800^{−d}, and thus by property (4) of A_0 we have To bound |U_2|, note that by property (5)