Product mixing in the alternating group

We prove the following one-sided product-mixing theorem for the alternating group: Given subsets $X,Y,Z \subset A_n$ of densities $\alpha,\beta,\gamma$ satisfying $\min(\alpha\beta,\alpha\gamma,\beta\gamma)\gg n^{-1}(\log n)^7$, there are at least $ (1+o(1))\alpha\beta\gamma |A_n|^2$ solutions to $xy=z$ with $x\in X, y\in Y, z\in Z$. One consequence is that the largest product-free subset of $A_n$ has density at most $n^{-1/2}(\log n)^{7/2}$, which is best possible up to logarithms and improves the best previous bound of $n^{-1/3}$ due to Gowers. The main tools are a Fourier-analytic reduction noted by Ellis and Green to a problem just about the standard representation, a Brascamp--Lieb-type inequality for the symmetric group due to Carlen, Lieb, and Loss, and a concentration of measure result for rearrangements of inner products.


Introduction
Product mixing for a group G generally refers to any estimate of the following form: whenever subsets X, Y, Z ⊂ G have densities α, β, γ above some threshold, the number of solutions to xy = z with x ∈ X, y ∈ Y, z ∈ Z is (1 + o(1))αβγ|G|². The following foundational theorem, proved by Gowers [Gow08] (and expanded by Babai, Nikolov, and Pyber [BNP08]), makes this idea precise.
Theorem 1.1 (Gowers [Gow08]). Let G be a finite group having no nontrivial representation of dimension less than m, and let X, Y, Z ⊂ G be subsets of densities α, β, γ. Then

|⟨1_X * 1_Y, 1_Z⟩ − αβγ| ≤ (αβγ/m)^{1/2}.

In particular if αβγ ≫ m^{-1} then the number of solutions to xy = z with x ∈ X, y ∈ Y, z ∈ Z is (1 + o(1))αβγ|G|².

Here and throughout this paper we write X ≲ Y to mean that X ≤ O(Y), and we write X ≪ Y to mean that X ≤ o(Y). We will write X ∼ Y to mean X ≲ Y and X ≳ Y. This differs from the standard convention in analytic number theory, but it will be convenient for us.
There are several immediate corollaries of Theorem 1.1. For example, if αβγ > m^{-1}, then the intersection XY ∩ Z is nonempty, and in fact XYZ^{-1} = G. In particular, if X ⊂ G is product-free (meaning that there are no solutions to xy = z with x, y, z ∈ X), then X has density at most m^{-1/3}.
For the purpose of illustration let us assume α ∼ β ∼ γ. Then Theorem 1.1 asserts that there is a product-mixing phenomenon for sets of density greater than m^{-1/3}. On the other hand Kedlaya [Ked97] proved that any group G acting transitively on a set of size n has a product-free subset of density ∼ n^{-1/2}. For a broad class of groups, including for example the alternating groups and the special linear groups, we have m ∼ n, so for these groups this leaves a gap between m^{-1/3} and m^{-1/2}.
In Section 2 we partly explain this gap by showing that any group G acting transitively on a set of size n has a subset X of density ∼ n^{-1/3} for which there are significantly more than the expected number of solutions to xy = z. In groups with m ∼ n this shows that the density threshold for two-sided product mixing is m^{-1/3}, as in Gowers's theorem.
Our main purpose, however, is to demonstrate that a one-sided product-mixing phenomenon persists in the alternating group A n for somewhat lower densities. Specifically we prove the following theorem.
Theorem 1.2. Let X, Y, Z ⊂ A_n be subsets of densities α, β, γ, and suppose min(αβ, αγ, βγ) ≫ n^{-1}(log n)^7. Then the number of solutions to xy = z with x ∈ X, y ∈ Y, z ∈ Z is at least (1 + o(1))αβγ|A_n|².

In particular if X is product-free then X has density at most O(n^{-1/2}(log n)^{7/2}). This is best possible up to the logarithmic factors.
As to the methods, we first use nonabelian Fourier analysis to reduce to a problem taking place only in the standard representation, an idea due to Ellis and Green. This problem is then interpreted in terms of random rearrangements of inner products, and we tackle it using concentration of measure and entropy subadditivity. The backbone of our proof is a Brascamp–Lieb-type inequality for the symmetric group due to Carlen, Lieb, and Loss, which we explain in Section 4.
Notation. As already mentioned, in addition to the usual asymptotic notation O(·) and o(·), we write X ≲ Y to mean that X ≤ O(Y), and we write X ≪ Y to mean that X ≤ o(Y). We write X ∼ Y to mean X ≲ Y and X ≳ Y.
We write Ω throughout for the ground set {1, . . . , n} on which S_n and A_n act. We attach the uniform probability measures to S_n, A_n, and Ω, and we write an unadorned integral ∫ f to mean the integral with respect to the uniform measure on the domain of f. We also define inner products, L^p norms, and convolutions accordingly.

Examples of sets with poor product mixing
In this section we give two concrete examples of fairly dense sets with poor product-mixing properties. The first example, a relatively large product-free set, is due to Kedlaya [Ked97] and independently Edward Crane (Ben Green, personal communication), but we recall the construction here as it shows that Theorem 1.2 is best possible up to logarithms. The second construction is original, and shows that Theorem 1.1 is best possible for two-sided mixing.
2.1 Sets with no solutions to xy = z

First we give an example of a set X of density ∼ n^{-1/2} with no solutions to xy = z. Fix a set T ⊂ Ω of size t not containing the point 1, and let X be the set of all π ∈ A_n such that π(1) ∈ T and π(T) ⊂ T^c. Then clearly X² is disjoint from X, as every π ∈ X² satisfies π(1) ∈ T^c, and it is straightforward to see that X has density at least (t/n)(1 − 2t/n)^t. Thus if t ∼ n^{1/2} then X has density ∼ n^{-1/2}. As explained by Kedlaya [Ked97], the construction adapts straightforwardly to any 2-transitive subgroup G ≤ S_n, and in fact it adapts to any transitive subgroup G ≤ S_n through an averaging argument.
Proposition 2.1 (Kedlaya [Ked97]). Let G be a transitive subgroup of S_n. Then there is a subset X ⊂ G of density ∼ n^{-1/2} such that X² ∩ X = ∅.
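For small n the construction can be checked by brute force. A short Python sketch (using the point 0 in place of 1, with n = 7 and T = {1, 2} as arbitrary illustrative choices, not from the paper):

```python
from itertools import permutations

def parity(p):
    # parity of a permutation = parity of its inversion count
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p))) % 2

n = 7
T = {1, 2}                      # a set T of size t, not containing the point 0
A_n = [p for p in permutations(range(n)) if parity(p) == 0]

# X = {pi in A_n : pi(0) in T and pi(T) contained in the complement of T}
X = [p for p in A_n if p[0] in T and all(p[i] not in T for i in T)]

def compose(x, y):
    # (xy)(i) = x(y(i))
    return tuple(x[y[i]] for i in range(n))

Xset = set(X)
# every product of two elements of X maps 0 into the complement of T,
# hence lies outside X: X is product-free
assert all(compose(x, y) not in Xset for x in X for y in X)
print(len(X), len(A_n))
```

The check succeeds because (xy)(0) = x(y(0)) with y(0) ∈ T and x(T) ⊂ T^c, exactly as in the argument above.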
2.2 Sets with too many solutions to xy = z

Next we give an example of a set X of density α ∼ n^{-1/3} having many more than the expected number of solutions (namely, ∼ α³|A_n|²) to xy = z. Fix a set T of size t and let X be the set of all π ∈ A_n such that π(T) ∩ T is nonempty. As long as t = o(n^{1/2}) then X has density roughly t²/n, and if we choose π₁, π₂ randomly from X then π₁π₂ is again in X with probability of order t²/n + 1/t. To see this it may help to notice that X is symmetric (closed under inverses), and that π₁⁻¹π₂ ∈ X if and only if π₁(T) ∩ π₂(T) ≠ ∅. Each of π₁(T) and π₂(T) is required to intersect T nontrivially, so π₁(T) and π₂(T) intersect with probability at least about 1/t. Aside from that restriction π₁(T) and π₂(T) are just random sets of size t, so they intersect with probability at least about t²/n. (We can afford to be somewhat lax with this computation as we will shortly prove a more general proposition.) Note that the probability t²/n + 1/t is much larger than the expected probability t²/n whenever t is small compared to n^{1/3}.
As with the previous construction, this construction adapts straightforwardly to any 2-transitive subgroup G ≤ S n , and to an arbitrary transitive subgroup G ≤ S n through an averaging argument.
Proposition 2.2. Let G be a transitive subgroup of S_n. Then there is a subset X ⊂ G of density α ∼ n^{-1/3} for which there are at least 100α³|G|² solutions to xy = z with x, y, z ∈ X.
Proof. For T ⊂ Ω of size t let X_T be the set of all g ∈ G for which g(T) ∩ T ≠ ∅. Also fix an arbitrary total order < on Ω. Clearly |X_T|/|G| ≤ t²/n. We will bound |X_T| below by the number of g ∈ G for which there are i, j ∈ T with i < j such that g(i) = j. Thus by inclusion-exclusion we have

|X_T|/|G| ≥ ∑_{i,j∈T : i<j} P(g(i) = j) − ∑ P(g(i) = j, g(i′) = j′),

where the second sum runs over pairs (i, j) ≠ (i′, j′) with i, j, i′, j′ ∈ T, i < j, and i′ < j′. The first sum here is ∼ t²/n by transitivity, for any T. The second sum can be rewritten as

∑_{i,i′∈T : i≠i′} P(g(i), g(i′) ∈ T, g(i) > i, g(i′) > i′).   (2.1)

Now note that for any fixed i, i′ ∈ Ω such that i ≠ i′ and for any g satisfying g(i) > i and g(i′) > i′ we have |{i, g(i), i′, g(i′)}| ≥ 3, and in fact |{i, g(i), i′, g(i′)}| = 4 except for a proportion at most O(1/n) of g ∈ G. It follows that the average of (2.1) over T ⊂ Ω of size t is bounded by

O(n²(t/n)⁴ + n(t/n)³) = O(t⁴/n²).

Thus, by Markov's inequality, (2.1) is O(t⁴/n²) with probability at least 9/10. Similarly let us count solutions to xy = z in X_T. We will bound the number N_T of solutions below by the number of pairs (g₁, g₂) ∈ G² for which there exist i, j, k ∈ T with i < j < k such that g₁(i) = j and g₂(j) = k; note that each such pair gives the solution (x, y, z) = (g₂, g₁, g₂g₁) with x, y, z ∈ X_T. Thus by inclusion-exclusion again we have

N_T/|G|² ≥ ∑_{i,j,k∈T : i<j<k} P(g₁(i) = j, g₂(j) = k) − ∑ P(g₁(i) = j, g₂(j) = k, g₁(i′) = j′, g₂(j′) = k′),

where the second sum runs over pairs of distinct triples (i, j, k) ≠ (i′, j′, k′) with entries in T, i < j < k, and i′ < j′ < k′. The first sum is ∼ t³/n² by transitivity. The second sum can be rewritten

∑_{j,j′∈T : j≠j′} P(g₁⁻¹(j), g₁⁻¹(j′), g₂(j), g₂(j′) ∈ T, g₁⁻¹(j) < j < g₂(j), g₁⁻¹(j′) < j′ < g₂(j′)).   (2.2)

To bound this we again average over T ⊂ Ω. For j ≠ j′ and g₁, g₂ under the stated restrictions the set {g₁⁻¹(j), j, g₂(j), g₁⁻¹(j′), j′, g₂(j′)} always has size at least 4, has size 4 for at most a proportion O(1/n²) of (g₁, g₂) ∈ G², has size 5 for at most a proportion O(1/n) of (g₁, g₂) ∈ G², and otherwise has size 6. It follows that the average of (2.2) over T ⊂ Ω is bounded by O(n²(t/n)⁶ + n(t/n)⁵ + (t/n)⁴) = O(t⁶/n⁴). Thus, by Markov's inequality, (2.2) is O(t⁶/n⁴) with probability at least 9/10. We deduce that there is some T for which (2.1) is O(t⁴/n²) and (2.2) is O(t⁶/n⁴). For this T it follows that

|X_T|/|G| ≳ t²/n − O(t⁴/n²)

and that

N_T/|G|² ≳ t³/n² − O(t⁶/n⁴).
Thus as long as t = o(n^{1/2}) we see that X_T has density α ∼ t²/n while there are at least ∼ (t³/n²)|G|² ∼ (n/t³)α³|G|² solutions to xy = z in X_T. Now take t = cn^{1/3} for a sufficiently small constant c.
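The heuristic probability t²/n + 1/t is easy to test empirically. A minimal Monte Carlo sketch (the parameters n = 1000, t = 5, the seed, and the lazy extension of π₁ are our own illustrative choices):

```python
import random

random.seed(0)
n, t, trials = 1000, 5, 2000
T = set(range(t))
T_list = list(range(t))

def random_image_of_T():
    # ordered image (pi(0), ..., pi(t-1)) of a uniform pi in S_n,
    # rejection-sampled so that pi(T) meets T, i.e. pi lies in X
    while True:
        img = random.sample(range(n), t)
        if any(x in T for x in img):
            return img

def product_in_X(img1, img2):
    # does (pi1 pi2)(T) meet T, where pi1 agrees with img1 on T and is
    # otherwise extended uniformly at random, and pi2(T) = img2?
    pi1 = dict(zip(T_list, img1))
    used = set(img1)
    for j in img2:
        if j not in pi1:                 # lazily extend pi1 at position j
            while True:
                v = random.randrange(n)
                if v not in used:
                    pi1[j] = v
                    used.add(v)
                    break
        if pi1[j] in T:
            return True
    return False

hits = sum(product_in_X(random_image_of_T(), random_image_of_T())
           for _ in range(trials))
est = hits / trials
print(est, t * t / n + 1 / t, t * t / n)   # observed vs t^2/n + 1/t vs t^2/n
```

With these parameters 1/t = 0.2 dominates the naive prediction t²/n = 0.025, and the observed frequency is of the order t²/n + 1/t, as the heuristic suggests.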

Nonabelian Fourier analysis
Here we briefly recall the fundamentals of nonabelian Fourier analysis, and then we give a short Fourier-analytic proof of Theorem 1.1. This proof seems to be well known among experts.

Let G be a compact group endowed with the uniform (Haar probability) measure. The Fourier transform of a function f ∈ L²(G) at an irreducible unitary representation ξ : G → U(d_ξ) is defined by

f̂(ξ) = ∫_G f(x) ξ(x) dx.

We then have the inversion formula and Parseval's identity

f(x) = ∑_ξ d_ξ ⟨f̂(ξ), ξ(x)⟩_HS,    ⟨f, g⟩ = ∑_ξ d_ξ ⟨f̂(ξ), ĝ(ξ)⟩_HS.   (3.1)

Here the sums are taken over a complete set of representatives of the irreducible representations of G up to equivalence, and the Hilbert–Schmidt inner product ⟨·, ·⟩_HS is defined by

⟨R, S⟩_HS = tr(RS*).

Like classical Fourier analysis, nonabelian Fourier analysis is a powerful tool for understanding the behaviour of convolutions. Here the convolution f * g of two functions f, g ∈ L²(G) is defined by

f * g(x) = ∫_G f(xy⁻¹) g(y) dy,   (3.2)

and by an application of Fubini's theorem we have the rule

(f * g)ˆ(ξ) = f̂(ξ) ĝ(ξ).   (3.3)

For all this and more the reader might refer to Tao [Tao14, §2.8].
We can now give a short proof of Theorem 1.1.
Proof of Theorem 1.1. Suppose that G is finite, that d_ξ ≥ m for ξ ≠ 1, and that X, Y, Z ⊂ G have densities α, β, γ, respectively. Let f = 1_X, g = 1_Y, h = 1_Z. Then by the convolution rule (3.3) and Parseval (3.1) we have

⟨1_X * 1_Y, 1_Z⟩ = ∑_ξ d_ξ ⟨f̂(ξ)ĝ(ξ), ĥ(ξ)⟩_HS = αβγ + ∑_{ξ≠1} d_ξ ⟨f̂(ξ)ĝ(ξ), ĥ(ξ)⟩_HS.

Here we have written 1 for the trivial representation of G. Now by Cauchy–Schwarz and the algebra property ‖RS‖_HS ≤ ‖R‖_HS ‖S‖_HS of the Hilbert–Schmidt norm we have

|⟨f̂(ξ)ĝ(ξ), ĥ(ξ)⟩_HS| ≤ ‖f̂(ξ)‖_HS ‖ĝ(ξ)‖_HS ‖ĥ(ξ)‖_HS,

and since d_ξ ‖f̂(ξ)‖²_HS ≤ ‖f‖₂² = α we have ‖f̂(ξ)‖_HS ≤ (α/m)^{1/2} for ξ ≠ 1, so by using Cauchy–Schwarz together with Parseval again we have

|⟨1_X * 1_Y, 1_Z⟩ − αβγ| ≤ (α/m)^{1/2} ∑_{ξ≠1} d_ξ ‖ĝ(ξ)‖_HS ‖ĥ(ξ)‖_HS ≤ (αβγ/m)^{1/2}.   (3.4)

This proves Theorem 1.1.
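These identities are easy to verify numerically for a small group. The following sketch checks Parseval and inversion for S₃, whose irreducible representations are the trivial, sign, and 2-dimensional standard representations (with the convention f̂(ξ) = ∫ f·ξ and the uniform measure as above; the orthonormal basis B of the sum-zero plane and the random test function are our own choices):

```python
import numpy as np
from itertools import permutations

perms = list(permutations(range(3)))          # the six elements of S3

def sign(p):
    # sign of a permutation via its inversion count
    inv = sum(p[i] > p[j] for i in range(3) for j in range(i + 1, 3))
    return (-1) ** inv

def perm_matrix(p):
    # permutation matrix sending e_i to e_{p(i)}
    P = np.zeros((3, 3))
    for i in range(3):
        P[p[i], i] = 1.0
    return P

# orthonormal basis of the sum-zero plane, on which the standard rep acts
B = np.array([[1 / np.sqrt(2),  1 / np.sqrt(6)],
              [-1 / np.sqrt(2), 1 / np.sqrt(6)],
              [0.0,            -2 / np.sqrt(6)]])

reps = {
    "triv": lambda p: np.array([[1.0]]),
    "sign": lambda p: np.array([[float(sign(p))]]),
    "std":  lambda p: B.T @ perm_matrix(p) @ B,
}
dims = {"triv": 1, "sign": 1, "std": 2}

rng = np.random.default_rng(0)
f = rng.standard_normal(6)

# Fourier transform fhat(xi) = E_x f(x) xi(x) (uniform measure on S3)
fhat = {xi: sum(f[k] * rho(perms[k]) for k in range(6)) / 6
        for xi, rho in reps.items()}

# Parseval: E|f|^2 = sum_xi d_xi ||fhat(xi)||_HS^2
lhs = np.mean(f ** 2)
rhs = sum(dims[xi] * np.sum(fhat[xi] ** 2) for xi in reps)
assert np.isclose(lhs, rhs)

# inversion: f(x) = sum_xi d_xi <fhat(xi), xi(x)>_HS (real reps, so * = T)
for k in range(6):
    val = sum(dims[xi] * np.trace(fhat[xi] @ reps[xi](perms[k]).T)
              for xi in reps)
    assert np.isclose(f[k], val)
print("Parseval and inversion hold for S3")
```

Realizing the standard representation by restricting permutation matrices to the sum-zero plane is also how σ is used in the rest of this section.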
For the rest of the paper we specialize to the alternating group G = A_n. As explained in the introduction, Theorem 1.1 provides a satisfactory estimate for ⟨1_X * 1_Y, 1_Z⟩ only if αβγ ≫ 1/n. However, as observed by Ellis and Green (personal communication), by examination of the proof above it is clear that only the standard (n−1)-dimensional representation σ is problematic: since d_ξ ≳ n² for ξ ≠ 1, σ (this follows from the hook formula: see for example [Ras77, Result 2]) we have, by straightforward adaptation of (3.4),

|⟨1_X * 1_Y, 1_Z⟩ − αβγ − (n−1)⟨f̂(σ)ĝ(σ), ĥ(σ)⟩_HS| ≲ (αβγ/n²)^{1/2}.

This is negligible compared to the main term αβγ whenever αβγ ≫ n^{-2}. Thus it remains only to control (n−1)⟨f̂(σ)ĝ(σ), ĥ(σ)⟩_HS.
For each i ∈ Ω we have a map S_n → Ω given by π ↦ π(i), which induces a map L²(Ω) → L²(S_n) given by composition with π ↦ π(i). We denote by p_i the adjoint of this map, and we call p_i f the pushforward of f under π ↦ π(i). Explicitly p_i f is defined by

p_i f(j) = E(f(π) | π(i) = j),

and for any g ∈ L²(Ω) we have

⟨p_i f, g⟩_{L²(Ω)} = ⟨f, g(π(i))⟩_{L²(S_n)}.

Now by direct computation, whenever at least one of ∫f, ∫g, ∫h is zero we have

(n−1)⟨f̂(σ)ĝ(σ), ĥ(σ)⟩_HS = (1 − 1/n) ∑_{i∈Ω} ⟨f * p_i g, p_i h⟩.

Here we define the convolution of functions f ∈ L²(S_n) and u ∈ L²(Ω) by the same formula:

f * u(ω) = ∫ f(π) u(π⁻¹(ω)) dπ.

f * u is then a function defined on Ω, and one may check the relation p_i(f * g) = f * p_i g for f, g ∈ L²(S_n). Note that the assumption that one of ∫f, ∫g, ∫h is zero is innocuous, since changing f by a constant does not change f̂(σ).
Similarly, whenever ∫f = 0 we have the following remnant of Parseval's identity:

∑_{i∈Ω} ‖p_i f‖₂² = n ‖f̂(σ)‖²_HS ≤ (1 − 1/n)⁻¹ ‖f‖₂².   (3.7)

We can now summarize the rest of the proof. We will prove a concentration-of-measure result for the randomly rearranged inner product

⟨π * p_i g, p_i h⟩ = ∫_Ω p_i g(π⁻¹(ω)) p_i h(ω) dω.
This result will ensure that ⟨π * p_i g, p_i h⟩ ≈ (∫ p_i g)(∫ p_i h) = (∫ g)(∫ h) with high probability, with a tail depending on the variances ‖p_i g − ∫g‖₂² and ‖p_i h − ∫h‖₂² of p_i g and p_i h and on the entropies of p_i g/∫g and p_i h/∫h. Crucially, when the variances are small there is rather strong concentration from below, unless one of the entropies is large. We will then apply the Parseval remnant (3.7) and a version of subadditivity of entropy to conclude.

An inequality of Carlen, Lieb, and Loss
The following inequality was proved by Carlen, Lieb, and Loss [CLL06].

Theorem 4.1 (Carlen–Lieb–Loss). Let f₁, . . . , f_n : Ω → [0, ∞). Then

∫_{S_n} ∏_{i∈Ω} f_i(π(i)) dπ ≤ ∏_{i∈Ω} ‖f_i‖₂.
This inequality can be viewed in at least two ways. First, as it resembles the classical Loomis-Whitney inequality, or more generally the Brascamp-Lieb inequality, it can be viewed as an inequality of Brascamp-Lieb-type for the symmetric group. In this light Theorem 4.1 bears a striking resemblance to another Brascamp-Lieb-type inequality proved by Carlen, Lieb, and Loss for the sphere: see [CLL04].
Theorem 4.1 can also be viewed as a Hadamard-type inequality for permanents. The classical Hadamard inequality states that if M is a matrix with columns v₁, . . . , v_n ∈ Cⁿ then

|det M| ≤ ∏_{j=1}^n |v_j|,

where |·| is the usual Euclidean norm on Cⁿ. By comparison Theorem 4.1 states that

|per M| ≤ (n!/n^{n/2}) ∏_{j=1}^n |v_j|.

In this section we deduce two consequences of Theorem 4.1, neither of them original: a version of entropy subadditivity for the symmetric group, and a concentration-of-measure result for a statistic of Hoeffding.

Entropy subadditivity for the symmetric group
Given f : S_n → [0, ∞) with α = ∫f we define the entropy of f to be

S(f) = ∫ (f/α) log(f/α).

To be more precise we might call S(f) the Kullback–Leibler divergence of (f/α) dπ from uniform, but we will use the shorter term for simplicity. Similarly, given g : Ω → [0, ∞) with β = ∫g we define S(g) = ∫_Ω (g/β) log(g/β).
All logarithms are of course taken to the natural base.
In the coup de grâce of our argument we will apply the following entropy-subadditivity inequality.
Theorem 4.2 (Subadditivity of entropy). Suppose f : S_n → [0, ∞). Then

∑_{i∈Ω} S(p_i f) ≤ 2 S(f).

Note that this is much stronger than what one gets from just applying usual entropy subadditivity to f as a function [n]ⁿ → [0, ∞).
Theorems 4.1 and 4.2 are more closely related than it may appear, as shown in some generality by Carlen and Cordero-Erausquin [CCE09]. We repeat the rather simple deduction of Theorem 4.2 from Theorem 4.1 here for the convenience of the reader.

Proof of Theorem 4.2. Put

g(π) = ∏_{i∈Ω} (p_i f(π(i)))^{1/2}

and put α = ∫ f. Then by Jensen's inequality we have

(1/2) ∑_{i∈Ω} S(p_i f) ≤ S(f) + log ∫ g − (n/2) log α.   (4.1)

On the other hand by Theorem 4.1 we have

∫ g ≤ ∏_{i∈Ω} ‖(p_i f)^{1/2}‖₂ = ∏_{i∈Ω} (∫ p_i f)^{1/2} = α^{n/2},

so log ∫ g ≤ (n/2) log α and the theorem follows from (4.1).
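The inequality of Theorem 4.2 can be probed by brute force for small n. A quick numerical sketch for S₄, computing S and the pushforwards p_i f directly from their definitions (the random test function is an arbitrary choice):

```python
import math
import random
from itertools import permutations

n = 4
perms = list(permutations(range(n)))

def S(values):
    # S(f) = integral of (f/a) log(f/a) against the uniform measure, a = mean
    values = list(values)
    a = sum(values) / len(values)
    return sum((v / a) * math.log(v / a) for v in values if v > 0) / len(values)

random.seed(1)
f = {p: random.random() for p in perms}

def pushforward(i):
    # p_i f(j) = E[f(pi) | pi(i) = j]
    return [sum(f[p] for p in perms if p[i] == j) / (len(perms) // n)
            for j in range(n)]

lhs = sum(S(pushforward(i)) for i in range(n))
rhs = 2 * S(f.values())
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```

Replacing the random f by a delta function at a single permutation gives ∑_i S(p_i f) = n log n against 2S(f) = 2 log n!, which shows the factor 2 cannot be replaced by anything below n log n / log n! → 1.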

Concentration for Hoeffding's statistic
Given an n × n complex matrix (a_ij) we consider the sum

X = ∑_{i=1}^n a_{iπ(i)},

where π ∈ S_n is a random permutation. The study of such sums goes back at least to Hoeffding [Hoe51], who proved a central limit theorem for X under suitable hypotheses, and so we refer to X as Hoeffding's statistic. More recently, work on Hoeffding's statistic has been more or less wedded to Stein's method of exchangeable pairs, starting with Bolthausen's [Bol84] Berry–Esseen-type estimate for the error in Hoeffding's theorem, and following with the work of Chatterjee [Cha07], who proved the first nonasymptotic concentration-type result for such sums.
In the next section we will need the following Bernstein-type concentration inequality for Hoeffding's statistic, which was proved in the more general context of random matrix theory by Mackey, Jordan, Chen, Farrell, and Tropp [MJC + 14, Corollary 10.3], using an extension of Chatterjee's method.
Theorem 4.3. Let (a_ij) be an n × n matrix such that ∑_{i,j=1}^n a_ij = 0 and such that |a_ij| ≤ M for each i, j. Let v = (1/n) ∑_{i,j=1}^n |a_ij|². Let π ∈ S_n be chosen uniformly at random, and let

X = ∑_{i=1}^n a_{iπ(i)}.

Then for all t > 0 we have

P(|X| ≥ t) ≤ 2 exp(−ct²/(v + Mt)),

where c is some positive constant.
The purpose of this subsection is to give another proof of the above theorem, not relying on Stein's method, but instead relying on the Carlen-Lieb-Loss inequality Theorem 4.1. The main value of doing so is to reduce the reliance of the present paper on results proved elsewhere, but it may also be of independent interest.
Proof of Theorem 4.3. By replacing (a_ij) with (a_ij − (1/n)∑_j a_ij) if necessary and slightly reducing the constant c, we may assume that ∑_j a_ij = 0 for each i: note that this operation does not change X, it can at worst double max |a_ij|, and it can only reduce v. We may also assume that (a_ij) is real, for otherwise we may just deal with the real and imaginary parts separately. Now for λ > 0 we have, by Theorem 4.1,

E exp(λX) = ∫_{S_n} ∏_{i=1}^n e^{λ a_{iπ(i)}} dπ ≤ ∏_{i=1}^n ((1/n) ∑_{j=1}^n e^{2λ a_ij})^{1/2}.   (4.2)

The claimed result now follows by bounding

P(X > t) ≤ e^{−λt} E exp(λX),

optimizing over λ as in the usual proof of Bernstein's inequality, and similarly bounding P(−X > t).
The reader familiar with the usual Bernstein inequality may recognize that from (4.2) onwards all we have done is reproduce the usual proof. Indeed, if Y is the sum of n independent random variables, the i-th of which takes values a_{i1}, . . . , a_{in} each with probability 1/n, then (4.2) states that E exp(λX) ≤ (E exp(2λY))^{1/2}, so it suffices to extract from the proof of the usual Bernstein inequality an upper bound for E exp(2λY).
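The Gaussian-scale concentration asserted by Theorem 4.3 is easy to observe in simulation. A minimal Monte Carlo sketch (the matrix distribution, sample sizes, and the 3√v threshold are arbitrary illustrative choices; for such a matrix v is roughly the variance of X):

```python
import math
import random

random.seed(0)
n = 200

# a random matrix, centered so that the total sum is zero
a = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
avg = sum(map(sum, a)) / (n * n)
a = [[x - avg for x in row] for row in a]

v = sum(x * x for row in a for x in row) / n      # as in Theorem 4.3
M = max(abs(x) for row in a for x in row)

def sample_X():
    # Hoeffding's statistic X = sum_i a_{i, pi(i)} for uniform pi
    pi = list(range(n))
    random.shuffle(pi)
    return sum(a[i][pi[i]] for i in range(n))

samples = [sample_X() for _ in range(2000)]
mean = sum(samples) / len(samples)
tail = sum(abs(x) > 3 * math.sqrt(v) for x in samples) / len(samples)
print(mean, math.sqrt(v), tail)
```

The empirical mean is near 0 (as ∑ a_ij = 0 forces EX = 0) and excursions beyond a few multiples of √v are rare, in line with the sub-Gaussian range t ≲ v/M of the Bernstein-type bound.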

Refined concentration for rearrangements
In this section we prove a refined concentration estimate for Hoeffding's statistic under the hypothesis that a_ij = u_i v_j for some (u_i) and (v_j) for which we have some sort of entropy control. Moreover we are particularly interested in the concentration from below, which in certain regimes we expect to be stronger than the concentration from above.
Lemma 5.2. Let h₁, h₂ : Ω → [0, 1] be functions such that h_i is supported on a set H_i of density δ_i, and such that 1/2 ≤ h_i ≤ 1 on H_i. Let f : S_n → [0, 1] be a function with ∫f = α. Then if δ₁δ₂ ≳ n⁻¹ we have

To explain the two cases appearing in Lemma 5.2, let us momentarily think of h_i as the indicator of H_i. The inner product ⟨f * h₁, h₂⟩/α is then the expected density of a random intersection π(H₁) ∩ H₂, where π is chosen randomly according to f/α. If H₁ and H₂ are not too small then we expect |π(H₁) ∩ H₂| to be highly concentrated around δ₁δ₂n with a Gaussian-type tail: this is the first case in the lemma. However if H₁ and H₂ are small then |π(H₁) ∩ H₂| has a Poisson-type distribution, so we expect π(H₁) ∩ H₂ to be nonempty with probability about δ₁δ₂n, and in any case almost surely bounded in size by about log n: this is the second case in the lemma. The lower bound in the second case is trivial.
If δ₁δ₂ ≳ n⁻¹ and C is sufficiently large it follows that

as claimed.
For the second part of the lemma put t = C log n for some constant C. Then we obtain

Now if δ₁δ₂ ≪ n⁻¹ and C is sufficiently large it follows that

The remaining inequalities asserted by the lemma are trivial: just note that

and α ∫h₁ ∫h₂ ≳ αδ₁δ₂.
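The Gaussian/Poisson dichotomy for |π(H₁) ∩ H₂| (here with π uniform rather than weighted by f/α) shows up clearly in simulation; the set sizes below are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)
n, trials = 10000, 2000

def intersection_sizes(k1, k2):
    # samples of |pi(H1) ∩ H2| for uniform pi, with |H1| = k1, |H2| = k2
    H2 = set(range(k2))
    return [sum(x in H2 for x in random.sample(range(n), k1))
            for _ in range(trials)]

# small sets: Poisson-type regime with mean k1*k2/n = 0.25
small = intersection_sizes(50, 50)
frac_nonempty = sum(s > 0 for s in small) / trials
print(frac_nonempty)                   # roughly 1 - exp(-0.25) ≈ 0.22
assert max(small) <= 3 * math.log(n)   # almost surely O(log n)

# larger sets: Gaussian-type concentration around k1*k2/n = 100
big = intersection_sizes(1000, 1000)
mean = sum(big) / trials
print(mean)
assert abs(mean - 100) < 5
```

For uniform π the intersection size is exactly hypergeometric, which interpolates between the Poisson and Gaussian behaviours described above.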
We will deduce Theorem 5.1 from Lemma 5.2 using a dyadic decomposition, but first we need two basic entropy computations.
Lemma 5.3. Let g : Ω → [0, 1] be a function such that ∫g = β and such that g ≤ β − t on a set of density at least δ, where t, δ > 0. Then S(g) ≳ δt²/β².

Proof. We must have t ≤ β, so by replacing t with t/100 if necessary we may assume that t/β ≤ 1/100. Similarly, by reducing δ if necessary we may assume that δ ≤ 1/2 and that δn is an integer. Now by convexity S(g) is minimized under the stated conditions when g = β − t on a set of density δ and otherwise equal to β + (δ/(1−δ))t, and in this case

S(g) = δ(1 − t/β) log(1 − t/β) + (1 − δ)(1 + δt/((1−δ)β)) log(1 + δt/((1−δ)β)).

By inserting the Taylor expansion

log(1 + x) = x − x²/2 + O(x³)   (5.1)

we find S(g) ≳ δt²/β². The last inequality follows from our assumption t/β ≤ 1/100, which allows the error terms to be absorbed.

Lemma 5.4. Let g : Ω → [0, 1] be a function such that ∫g = β and such that g ≥ β + t on a set of density at least δ, where t, δ > 0. Then

S(g) ≳ δ(t/β) log(1 + t/β).

Proof. We must have (β + t)δ ≤ ∫g = β, i.e., δt/β ≤ 1 − δ ≤ 1, so by replacing t with t/100 if necessary we may assume that δt/β ≤ 1/100.
As before we may also assume that δ ≤ 1/2 and that δn is an integer. Now by convexity S(g) is minimized under the stated conditions when g = β + t on a set of density δ and otherwise equal to β − (δ/(1−δ))t, and in this case

S(g) = δ(1 + t/β) log(1 + t/β) + (1 − δ)(1 − δt/((1−δ)β)) log(1 − δt/((1−δ)β)).

By inserting (5.1) we thus have

S(g) ≥ δ((1 + t/β) log(1 + t/β) − t/β) − O(δ²t²/β²).

Now we separate into cases depending on the size of t/β. If t/β ≥ 1 then we have

S(g) ≳ δ(t/β) log(1 + t/β).

On the other hand if t/β ≤ 1 then by reducing t if necessary we may assume that t/β ≤ 1/100, and then by inserting (5.1) again we have

S(g) ≳ δt²/β² ≳ δ(t/β) log(1 + t/β).

As before we used our assumption about the size of δt/β or t/β to justify the absorption of the error terms.
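The extremal configuration appearing in the convexity step of Lemma 5.3 can be checked numerically; a quick sketch (the parameters β, t, δ are arbitrary illustrative choices):

```python
import math

def S(values):
    # S(g) = (1/n) * sum of (g/b) log(g/b), where b = mean of g
    b = sum(values) / len(values)
    return sum((v / b) * math.log(v / b) for v in values if v > 0) / len(values)

n = 1000
beta, t, delta = 0.5, 0.05, 0.2
k = int(delta * n)

# the extremal g from the convexity argument: beta - t on a set of
# density delta, and beta + (delta/(1-delta)) t elsewhere
g = [beta - t] * k + [beta + delta / (1 - delta) * t] * (n - k)
assert abs(sum(g) / n - beta) < 1e-9   # mean is exactly beta

lower = delta * t * t / beta ** 2
print(S(g), lower)
assert S(g) > lower / 100              # S(g) >> delta t^2 / beta^2, up to constants
```

With these parameters S(g) comes out within a constant factor of δt²/β², matching the Taylor computation in the proof.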
Proof of Theorem 5.1. Write

g_i = ∫g_i + ∑_s g_i^s + O(n^{-100})   (i = 1, 2),

where s ranges over all s of the form ±2^{-k} for which n^{-100} ≤ |s| ≤ 1, and where g_i^s is defined to be equal to g_i − ∫g_i where g_i − ∫g_i has the same sign as s and |s|/2 < |g_i − ∫g_i| ≤ |s|, and zero elsewhere. Then

⟨f * g₁, g₂⟩ = α ∫g₁ ∫g₂ + ∑_{s,t} ⟨f * g₁^s, g₂^t⟩ + O(n^{-100}).

For each s, t we apply Lemma 5.2 with h₁ = g₁^s/s and h₂ = g₂^t/t. Let δ₁^s be the density of points where g₁ − ∫g₁ has the same sign as s and |s|/2 < |g₁ − ∫g₁| ≤ |s|, and let δ₂^t be the density of points where g₂ − ∫g₂ has the same sign as t and |t|/2 < |g₂ − ∫g₂| ≤ |t|. If δ₁^s δ₂^t ≳ 1/n then we get the bound

⟨f * g₁^s, g₂^t⟩ − α ∫g₁^s ∫g₂^t ≲ α|s||t|(δ₁^s)^{1/2}(δ₂^t)^{1/2} (log n)/n^{1/2} + O(n^{-100}),

and the total contribution from all such cases is bounded by

∑_{s,t} α|s||t|(δ₁^s)^{1/2}(δ₂^t)^{1/2} (log n)/n^{1/2} + O(n^{-100}) ≲ α ‖g₁ − ∫g₁‖₂ ‖g₂ − ∫g₂‖₂ (log n)²/n^{1/2} + O(n^{-99}).
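The dyadic decomposition used in this proof is straightforward to carry out explicitly. A small sketch (with 12 dyadic scales standing in for the range n^{-100} ≤ |s| ≤ 1, and a random g as an arbitrary illustrative choice):

```python
import random

random.seed(0)
n = 512
g = [random.random() for _ in range(n)]
mean = sum(g) / n
dev = [x - mean for x in g]               # g minus its integral

# dyadic scales s = +/- 2^-k; here 12 scales stand in for the range
# n^-100 <= |s| <= 1 of the proof
scales = [sgn * 2.0 ** -k for k in range(12) for sgn in (1, -1)]

pieces = {}
for s in scales:
    lo, hi = abs(s) / 2, abs(s)
    # the piece g^s: same sign as s and |s|/2 < |g - mean| <= |s|
    pieces[s] = [d if (d > 0) == (s > 0) and lo < abs(d) <= hi else 0.0
                 for d in dev]

# the pieces reassemble g - mean, up to the smallest scale
recon = [sum(pieces[s][i] for s in scales) for i in range(n)]
assert all(abs(r - d) <= 2.0 ** -12 for r, d in zip(recon, dev))
print("dyadic decomposition reassembles with error <= 2^-12")
```

Each nonzero entry of pieces[s]/s lies in (1/2, 1], which is exactly the normalization required to apply Lemma 5.2 with h = g^s/s.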

Open questions
The most obvious outstanding open question is whether the logarithms can be removed from Theorem 1.2. Specifically, does the largest product-free subset of A n have density O(n −1/2 )? Can you say anything about the extremal examples? It is possible that all near-extremizers look roughly like the first example in Section 2, or its inverse, but this may be difficult to quantify, and even more difficult to prove.
Another obvious outstanding open question is whether a one-sided product-mixing phenomenon persists in other groups for densities lower than that given by Theorem 1.1. For example take G = SL₂(p). For this group m ∼ p. By Theorem 1.1 there is two-sided product mixing for sets of density greater than p^{-1/3}, by Proposition 2.2 there is no two-sided product mixing for sets of density less than p^{-1/3}, and by Proposition 2.1 there is no product mixing at all below density p^{-1/2}. Do we have one-sided product mixing for sets of densities between p^{-1/2} and p^{-1/3}? Another great question, which has been asked before by both Kedlaya [Ked98] and Gowers [Gow08], is about the product-mixing properties of SU(n). To make the question concrete, what is the measure of the largest product-free subset of SU(n)? By straightforward adaptation of Theorem 1.1 it is at most O(n^{-1/3}), but the only lower bounds we know have the form cⁿ for some c < 1. Apart from being an interesting and natural question in its own right, answering this question may be relevant for understanding the product-mixing behaviour of groups not having a permutation representation of dimension ∼ m.