Permutations contained in transitive subgroups

In the first paper in this series we estimated the probability that a random permutation $\pi\in\mathcal{S}_n$ has a fixed set of a given size. In this paper, we elaborate on the same method to estimate the probability that $\pi$ has $m$ disjoint fixed sets of prescribed sizes $k_1,\dots,k_m$, where $k_1+\cdots+k_m=n$. We deduce an estimate for the proportion of permutations contained in a transitive subgroup other than $\mathcal{S}_n$ or $\mathcal{A}_n$. This theorem consists of two parts: an estimate for the proportion of permutations contained in an imprimitive transitive subgroup, and an estimate for the proportion of permutations contained in a primitive subgroup other than $\mathcal{S}_n$ or $\mathcal{A}_n$.

In the first paper [EFG15b] in this series we showed that the proportion $i(n, k)$ of permutations $\pi \in S_n$ having some fixed set of size $k$ is of order $k^{-\delta}(1 + \log k)^{-3/2}$ uniformly for $1 \le k \le n/2$, where $\delta = 1 - \frac{1}{\log 2} - \frac{\log\log 2}{\log 2}$. If $n$ is even, it follows that the proportion of $\pi \in S_n$ contained in a transitive subgroup other than $S_n$ or $A_n$ is at least $c n^{-\delta} (\log n)^{-3/2}$ for some constant $c > 0$. In that paper we stated our belief that a matching upper bound holds, and that stronger upper bounds hold for odd $n$. The purpose of the present paper is to prove this. Specifically, we prove the following theorem.

Date: July 26, 2016.
2000 Mathematics Subject Classification. Primary 05A05, 20B30; Secondary 20B15.
Here and throughout the paper the notation $X \asymp Y$ means that $c_1 Y \le X \le c_2 Y$ for some constants $c_1, c_2 > 0$. We will also use $X \ll Y$ to mean $X \le cY$ for some constant $c$, as well as standard $O(\cdot)$ and $o(\cdot)$ notation.
Theorem 1.1. Let $T(n)$ be the proportion of $\pi \in S_n$ contained in a transitive subgroup other than $S_n$ or $A_n$, and let $p$ be the smallest prime factor of $n$.

We record here the first few values of the sequence $\delta_m$ for easy reference.

The theorem that $T(n) \to 0$ as $n \to \infty$ is due to Luczak and Pyber [LP93], whose method can be used to prove $T(n) = O(n^{-c})$ for some small $c > 0$. This theorem has been widely hailed in the literature and has seen several applications: see for example Cameron and Kantor [CK93] for an application to the group generated by the first two rows of a random Latin square, Babai and Hayes [BH06] for an application to generating the symmetric group with one random and one fixed generator, Diaconis, Fulman, and Guralnick [DFG08] for an application to counting derangements in arbitrary actions of the symmetric group, and Kowalski and Zywina [KZ12] and Eberhard, Ford, and Green [EFG15a] for applications to invariable generation. The rate of decay of $T(n)$ had remained somewhat of a mystery, however, and this question was emphasized by Cameron and Kantor as well as by Babai and Hayes. Theorem 1.1 therefore fills a rather large gap in our understanding of the subgroup structure of the symmetric group.
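The first few values of $\delta_m$ are easy to compute numerically. The following sketch is purely illustrative: it assumes the closed form $\delta_m = \int_1^{(m-1)/\log m} \log t \, dt = \beta \log \beta - \beta + 1$ with $\beta = (m-1)/\log m$, which is an elementary evaluation of the integral recorded in the proof outline below.

```python
import math

def delta(m):
    # delta_m = int_1^beta log t dt = beta*log(beta) - beta + 1,
    # where beta = (m - 1) / log m  (elementary antiderivative)
    beta = (m - 1) / math.log(m)
    return beta * math.log(beta) - beta + 1

for m in range(2, 7):
    print(m, round(delta(m), 4))
```

In particular $\delta_2$ agrees with the constant $\delta = 1 - \frac{1}{\log 2} - \frac{\log\log 2}{\log 2} \approx 0.086$ of [EFG15b].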
Theorem 1.1 is actually a composite of two theorems, one about imprimitive transitive subgroups and one about primitive subgroups. Recall that a subgroup $H \le S_n$ is called imprimitive if it preserves some nontrivial partition of $\{1, \dots, n\}$ into blocks. If $H$ is transitive, then the blocks of such a partition must all have the same size. Therefore, if $I(n)$ is the proportion of $\pi \in S_n$ contained in an imprimitive transitive subgroup, and $I(n, \nu)$ is the proportion of $\pi \in S_n$ preserving some partition of $\{1, \dots, n\}$ into $\nu$ blocks of size $n/\nu$, then
$$I(n) \le \sum_{\substack{\nu \mid n \\ 1 < \nu < n}} I(n, \nu).$$
On the other hand, if H does not preserve a nontrivial partition of {1, . . . , n}, then H is called primitive. Let P (n) be the proportion of π ∈ S n contained in a primitive subgroup other than S n or A n . We prove the following estimates for I(n) and P (n).
Theorem 1.2. Let $\nu$ be a divisor of $n$. Then
$$I(n, \nu) \asymp \begin{cases} n^{-1+1/(\nu-1)} & \text{if } 5 \le \nu \le \log n, \\ n^{-1} & \text{if } \log n \le \nu \le n/\log n, \\ n^{-1+\nu/n} & \text{if } n/\log n \le \nu < n. \end{cases}$$
Thus, if $n$ is composite and $p$ is the smallest prime factor of $n$, then $I(n) \asymp I(n, p) + n^{-1+O(1/\log\log n)}$, with $I(n, p)$ as above.
Remark 1.1. The term $n^{-1+O(1/\log\log n)}$ cannot be completely removed. In Remark 6.1, we construct integers $n$ for which $I(n) \gg \frac{\log n}{\log\log n} \, I(n, p)$.
The theorem that $I(n) \to 0$ as $n \to \infty$ is due to Luczak and Pyber [LP93]. The somewhat older theorem that $P(n) \to 0$ as $n \to \infty$ is due to Bovey [Bov80], who proved the bound $P(n) \le n^{-1/2+o(1)}$. More recently Bovey's estimate was improved to $P(n) \le n^{-2/3+o(1)}$ by Diaconis, Fulman, and Guralnick [DFG08, Section 7], who also conjectured that $P(n) = O(n^{-1})$. In truth, $P(n)$ depends rather delicately on the arithmetic of $n$, and in fact $P(n) = 0$ for almost all $n$ (see Cameron, Neumann, and Teague [CNT82]), but $O(n^{-1})$ would be the best possible bound depending only on the size of $n$. For example, if $n$ happens to be prime then every $n$-cycle generates a primitive subgroup; similarly, if $p = n - 1$ is prime then every $n$-cycle is contained in a primitive subgroup isomorphic to $\mathrm{PGL}_2(p)$. Our proof of the bound $n^{-1+o(1)}$ is essentially that of [DFG08], except that we insert our new bound for $I(n, \nu)$ at a critical stage in the proof.
The proof of Theorem 1.2 is self-contained, except for a theorem we borrow from [DFG08] to deal with $\nu$ of size $n^{1-o(1)}$. The proof of Theorem 1.3, on the other hand, makes essential use of the classification of finite simple groups via work of Liebeck and Saxl [LS91] classifying primitive subgroups of small minimal degree (extended by Guralnick and Magaard [GM98]).
The connection between $I(n, \nu)$ and $i(n, k)$ is easy to explain. Suppose $\pi$ preserves a partition of $\{1, \dots, n\}$ into $\nu$ blocks of size $n/\nu$. Then $\pi$ induces a permutation $\bar\pi \in S_\nu$ on the set of blocks. If $\bar\pi$ has cycle lengths $d_1, \dots, d_m$, then it follows that $\pi$ has disjoint fixed sets $A_1, \dots, A_m$ such that $|A_i| = d_i n/\nu$ and such that all cycle lengths of $\pi|_{A_i}$ are divisible by $d_i$. For example, consider the permutation
$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 4 & 5 & 6 & 1 & 2 & 3 & 8 & 9 & 7 \end{pmatrix},$$
which is counted by $I(9, 3)$, since it permutes the blocks $\{1,2,3\}$, $\{4,5,6\}$, and $\{7,8,9\}$. The induced permutation $\bar\pi$ is
$$\bar\pi = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix},$$
whose cycle lengths are $2$ and $1$. We may then take $A_1 = \{1,2,3,4,5,6\}$ and $A_2 = \{7,8,9\}$, which are both fixed sets of $\pi$. In addition, $\pi|_{A_1} = (1\,4)(2\,5)(3\,6)$ consists only of $2$-divisible cycles. The converse to the above relation holds as well: if $\pi$ has disjoint fixed sets $A_1, \dots, A_m$ such that $|A_i| = d_i n/\nu$ and such that all cycle lengths of $\pi|_{A_i}$ are divisible by $d_i$, then $\pi$ preserves a system of $\nu$ blocks of size $n/\nu$. We are thus naturally led to the following definition: for $\mathbf{k} = (k_1, \dots, k_m)$ such that $\sum_{i=1}^m k_i = n$ and $\mathbf{d} = (d_1, \dots, d_m)$, let $i(n, \mathbf{k}, \mathbf{d})$ be the proportion of $\pi \in S_n$ having disjoint fixed sets $A_1, \dots, A_m$ such that $|A_i| = k_i$ and such that all cycle lengths of $\pi|_{A_i}$ are divisible by $d_i$. Then we have
$$\max_{(d_i)} i\big(n, (d_i n/\nu)_i, (d_i)_i\big) \le I(n, \nu) \le \sum_{(d_i)} i\big(n, (d_i n/\nu)_i, (d_i)_i\big), \tag{1.1}$$
where the max and sum run over partitions $(d_1, \dots, d_m)$ of $\nu$. Thus, at least for small $\nu$, it suffices to understand $i(n, \mathbf{k}, \mathbf{d})$.
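The worked example above is easy to check mechanically. The following sketch (hypothetical helper code, not part of the paper's argument) verifies that $\pi$ permutes the three blocks, that $\bar\pi$ has cycle lengths $2$ and $1$, and that the restrictions to $A_1$ and $A_2$ have the claimed divisibility properties.

```python
# pi in one-line notation on {1,...,9}: pi(i) = PI[i-1]
PI = [4, 5, 6, 1, 2, 3, 8, 9, 7]
pmap = {i + 1: PI[i] for i in range(9)}
blocks = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]

def cycle_lengths(perm, domain):
    # multiset of cycle lengths of perm restricted to an invariant domain
    seen, lengths = set(), []
    for x in domain:
        if x not in seen:
            n, y = 0, x
            while y not in seen:
                seen.add(y)
                y = perm[y]
                n += 1
            lengths.append(n)
    return sorted(lengths)

# pi permutes the blocks, so the induced permutation pi_bar is well defined
pi_bar = {i: next(j for j, B in enumerate(blocks)
                  if {pmap[x] for x in blocks[i]} == B)
          for i in range(3)}
```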
Moreover, it turns out that the only nontrivial case for which we need sharp bounds is the case in which $d_i = 1$ for each $i$. In this case we write just $i(n, \mathbf{k})$ for $i(n, \mathbf{k}, \mathbf{d})$: this is simply the proportion of permutations $\pi$ having disjoint fixed sets of sizes $k_1, \dots, k_m$. Our main task therefore is to establish the following estimate for $i(n, \mathbf{k})$. Note that because $i(n, k) = i(n, (k, n-k))$, this generalizes the main result of [EFG15b].
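For very small $n$ the quantity $i(n, \mathbf{k})$ can be computed exactly by brute force, which is useful as a sanity check (an illustrative sketch only; the paper's estimates of course concern large $n$). A permutation has disjoint fixed sets of sizes $k_1, \dots, k_m$ precisely when its cycle lengths can be grouped into parts with those sums.

```python
from itertools import permutations
from fractions import Fraction
from math import factorial

def cycle_type(p):
    # p is a tuple with p[i] the image of i; returns cycle lengths
    seen, lens = set(), []
    for x in range(len(p)):
        if x not in seen:
            n, y = 0, x
            while y not in seen:
                seen.add(y)
                y = p[y]
                n += 1
            lens.append(n)
    return sorted(lens, reverse=True)

def splittable(lengths, sizes):
    # can the cycle lengths be grouped into parts with the given sums?
    remaining = list(sizes)
    def rec(i):
        if i == len(lengths):
            return True  # total lengths match, so all parts are filled
        tried = set()
        for j in range(len(remaining)):
            if remaining[j] >= lengths[i] and remaining[j] not in tried:
                tried.add(remaining[j])
                remaining[j] -= lengths[i]
                ok = rec(i + 1)
                remaining[j] += lengths[i]
                if ok:
                    return True
        return False
    return rec(0)

def i_exact(n, sizes):
    assert sum(sizes) == n
    count = sum(1 for p in permutations(range(n))
                if splittable(cycle_type(p), sizes))
    return Fraction(count, factorial(n))
```

For instance, $i(4, (2,2)) = 5/12$: all of $S_4$ except the eight $3$-cycles and six $4$-cycles have a fixed set of size $2$.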
Theorem 1.4. Let $m \ge 2$ and assume $2 \le k_1 \le \cdots \le k_m$ and $\sum_{i=1}^m k_i = n$. In particular, if $k_i \asymp_m n$ for each $i$ then $i(n, \mathbf{k}) \asymp_m n^{-\delta_m} (\log n)^{-3/2}$.
In [EFG15b], we relied on an analogy with analytic number theory wherein the problem of estimating $i(n, k)$ corresponds to the problem of estimating the proportion of integers $n \le x$ with a divisor in a given dyadic interval $(y, 2y]$: this is the so-called multiplication table problem, which was solved up to a constant factor by the second author [For08a, For08b]. Similarly, the problem of estimating $i(n, \mathbf{k}, \mathbf{d})$ is related to higher-dimensional versions of the multiplication table problem. The connection is closest for $i(n, \mathbf{k})$, which under the analogy corresponds to the proportion of $n \le x$ that are decomposable as $n_1 \cdots n_m$ with $n_i \in (y_i, 2y_i]$ for each $i$. Except in some cases in which the sizes of the parameters $y_i$ are too wildly different, this proportion was computed up to a constant factor by the third author [Kou10, Kou14]. For comparison with Theorem 1.4, refer in particular to [Kou10, Theorem 1]. Thus, as in [EFG15b], the task of proving Theorem 1.4 is largely one of translation.
Given the strength of the analogy with [Kou10, Theorem 1], one might hope to be able to deduce the result directly using transference ideas. While unfortunately this does not appear to be possible, the basic outline of the proof is the same.
When the vector d is allowed to be arbitrary, however, there are some additional complications, and while there is still some connection with the generalized multiplication table problem, in fact it is somewhat fortunate that the partitions of ν constituting the main contribution to I(n, ν) correspond to d for which we know how to estimate i(n, k, d) satisfactorily, while for the rest we can get away with a crude bound.
We have made an effort to follow the exposition and technical notation previously used in [For08a, For08b, Kou10, Kou14, EFG15b], but unfortunately many notational clashes have been unavoidable.

Outline of the proof
In this section we sketch the broad idea and initial reductions involved in the proof of Theorem 1.2. The proof of Theorem 1.3 relies on Theorem 1.2 but is otherwise unrelated, so we defer discussion to Section 7.
Let $\nu$ be a proper nontrivial divisor of $n$. When $\nu$ becomes large we will survive on a combination of crude arguments and previous work of Diaconis, Fulman, and Guralnick [DFG08], so in this outline assume $\nu$ is bounded. As explained in the introduction, our starting point is the relation (1.1). The estimation of $I(n, \nu)$ for bounded $\nu$ is thus immediately subsumed by the general problem of estimating $i(n, \mathbf{k}, \mathbf{d})$.
Call a partition $(d_i)$ of $\nu$ dominant if it maximizes $i(n, (d_i n/\nu)_i, (d_i)_i)$. There is a comparatively simple bound for $i(n, \mathbf{k}, \mathbf{d})$ which already shows that, for every $\nu$, every dominant partition has the form $(d, 1, \dots, 1)$ for some $d \ge 1$.
Here we assume of course that $\mathbf{k}$ and $\mathbf{d}$ have the same length $m$ and that $\sum_i k_i = n$.
(c) For every $\mathbf{k}$ and $\mathbf{d}$, we have the bound stated below.
(d) For every fixed $\nu \ge 1$ and sufficiently large $n$, every dominant partition of $\nu$ has the form $(d, 1, \dots, 1)$ for some $d \ge 1$.
(c) This follows immediately from parts (a) and (b). (d) If at least two of the $d_i$ exceed $1$ then $\sum_i (1 - 1/d_i) \ge 1$, so by part (c) such a partition is never dominant. On the other hand, part (b) implies that $i(n, (n), (\nu)) \gg n^{-1+1/\nu}$.

Though in general $i(n, \mathbf{k}, \mathbf{d})$ is a rather subtle quantity (the case $\mathbf{d} = (1, 1)$, for instance, being the subject of the paper [EFG15b]), some cases are elementary. For instance, in Lemma 2.1(a) we saw rather simply that $i(n, (n), (d)) \asymp n^{-1+1/d}$. It turns out that the estimation of $i(n, (k_1, k_2), (d, 1))$ is also elementary whenever $d \ge 3$.
Lemma 2.2. Let $d \ge 3$, and assume $k_1, k_2 \ge 1$ and that $k_1$ is divisible by $d$. Then $i(n, (k_1, k_2), (d, 1)) \asymp k_1^{-1+1/d}$.

Proof. The upper bound is contained in Lemma 2.1(c). Recall the proof, which follows from parts (a) and (b) of that lemma: the number of ways of choosing a set $A_1$ of size $k_1$ is $\binom{n}{k_1}$, and the number of permutations of $A_1$ having all cycle lengths divisible by $d$ is $\asymp k_1!/k_1^{1-1/d}$, so
$$i(n, (k_1, k_2), (d, 1)) \ll \frac{1}{n!} \binom{n}{k_1} \cdot \frac{k_1!}{k_1^{1-1/d}} \cdot (n - k_1)! = k_1^{-1+1/d}.$$
Given $\pi \in S_n$, let $X = X(\pi)$ denote the number of acceptable choices for $A_1$: sets of size $k_1$ that are fixed by $\pi$ and such that $\pi|_{A_1}$ consists of $d$-divisible cycles. The argument in the above paragraph uses the simple relations
$$i(n, (k_1, k_2), (d, 1)) = \mathbf{P}(X > 0) \le \mathbf{E}X,$$
where the underlying probability measure is the uniform measure on $S_n$, and then proceeds by showing that $\mathbf{E}X \asymp k_1^{-1+1/d}$. To find a matching lower bound, we will compute the second moment $\mathbf{E}X^2$, or in other words the number of pairs of $k_1$-sets $A_1, A_1'$ such that $\pi$ fixes both $A_1$ and $A_1'$ and such that $\pi|_{A_1}$ and $\pi|_{A_1'}$ are both wholly composed of $d$-divisible cycles. Note then that $\pi$ must fix each of the sets $A_1 \cap A_1'$, $A_1 \setminus A_1'$, $A_1' \setminus A_1$, and the restriction of $\pi$ to each of these sets must be wholly composed of $d$-divisible cycles. The number of ways of choosing two sets of size $k_1$ which overlap in a set of size $k_{11}$ is
$$\binom{n}{k_{11},\ k_1 - k_{11},\ k_1 - k_{11},\ k_2 - k_1 + k_{11}},$$
so we deduce that $\mathbf{E}X^2 \ll k_1^{-1+1/d}$. Hence by Cauchy-Schwarz we have
$$\mathbf{P}(X > 0) \ge \frac{(\mathbf{E}X)^2}{\mathbf{E}X^2} \gg k_1^{-1+1/d}.$$
This proves the lemma.
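The moment inequalities $\mathbf{P}(X > 0) \le \mathbf{E}X$ and $\mathbf{P}(X > 0) \ge (\mathbf{E}X)^2/\mathbf{E}X^2$ used in this proof can be tested exactly on a tiny instance. A sketch, with $n = 7$, $d = 3$, $k_1 = 3$ chosen purely so that $S_7$ can be exhausted (not the asymptotic regime of the lemma):

```python
from itertools import combinations, permutations
from fractions import Fraction
from math import factorial

n, d, k1 = 7, 3, 3

def good(pdict, A):
    # is A invariant under the permutation, with every cycle length
    # of the restriction divisible by d?
    if {pdict[x] for x in A} != set(A):
        return False
    seen = set()
    for x in A:
        if x not in seen:
            m, y = 0, x
            while y not in seen:
                seen.add(y)
                y = pdict[y]
                m += 1
            if m % d != 0:
                return False
    return True

N = factorial(n)
EX = EX2 = Ppos = Fraction(0)
subsets = list(combinations(range(n), k1))
for perm in permutations(range(n)):
    pdict = dict(enumerate(perm))
    X = sum(good(pdict, A) for A in subsets)  # X(pi) from the proof
    EX += Fraction(X, N)
    EX2 += Fraction(X * X, N)
    Ppos += Fraction(int(X > 0), N)
```

Here $\mathbf{E}X = \binom{7}{3} \cdot 2 \cdot 4!/7! = 1/3$ exactly, and the chain $(\mathbf{E}X)^2/\mathbf{E}X^2 \le \mathbf{P}(X>0) \le \mathbf{E}X$ holds as it must.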
On the other hand, estimation of $i(n, \mathbf{k})$ (that is, $i(n, \mathbf{k}, \mathbf{d})$ in the case in which $d_i = 1$ for each $i$) is not nearly so straightforward, and most of the paper will be devoted to establishing an estimate in this case, namely Theorem 1.4. The proof of this theorem is divided over the next three sections. Specifically, we prove a useful local-global principle in Section 3, we then prove the upper bound in Section 4, and finally we prove the lower bound in Section 5.
Assuming that we have proved Theorem 1.4, we can then combine our various bounds for $i(n, \mathbf{k}, \mathbf{d})$ to determine the dominant partition of $\nu$ for each bounded $\nu$. Moreover, since we have a sharp estimate for $i(n, (d_i n/\nu)_i, (d_i)_i)$ for each such dominant partition, we are able to deduce a sharp estimate for $I(n, \nu)$.

Meanwhile, by Lemma 2.1(a) we have $i(n, (n), (\nu)) \asymp n^{-1+1/\nu}$.
Thus the exponents we are comparing are $\delta_\nu$, $1 - 1/\nu$, and $1 - 1/(\nu - 1)$, and we claim that the last of these is the smallest whenever $\nu \ge 5$. Since $\delta_m = \int_1^{(m-1)/\log m} \log t \, dt$, the sequence $(\delta_m)_{m \ge 2}$ is increasing. In particular, $\delta_\nu \ge \delta_6 > 1$ for $\nu \ge 6$, and one checks by direct computation that $\delta_5 = 0.77\ldots > 1 - 1/4$ too.
Next, if $2 \le d \le \nu - 4$, then the required inequality again holds, as one checks by direct computation.
This completes the sketch of the proof of Theorem 1.2 when ν is bounded. As ν begins to grow with n, we must be more careful about some of our bounds, but we can afford to be more relaxed about others, and, by and large, the proof becomes simpler, using as key input Lemma 2.1 and the case m = 2 of Theorem 1.4. As ν becomes very large, say of size n 1−o(1) , then our method begins to falter, and we outsource most of the work to [DFG08]. For all this, see Section 6.

A local-to-global principle
Given a $k$-tuple $\mathbf{c} = (c_1, \dots, c_k)$ of nonnegative integers, let $L_m(\mathbf{c})$ be the set of all $m$-tuples
$$\Big( \sum_{j=1}^k j x_{1j}, \ \dots, \ \sum_{j=1}^k j x_{mj} \Big),$$
where $(x_{ij})$ is an $m \times k$ matrix whose entries are nonnegative integers such that $\sum_{i=1}^m x_{ij} = c_j$ for each $j$. Note then that $i(n, \mathbf{k})$ is precisely the probability of the event $\mathbf{k} \in L_m(\mathbf{c})$, where $\mathbf{c}$ is the cycle type of a random permutation: here we say that $\pi \in S_n$ has cycle type $\mathbf{c}$ if $\pi$ has exactly $c_j$ $j$-cycles for each $j \le n$. Instead of measuring this probability directly, however, we will use a convenient local-to-global principle which relates $i(n, \mathbf{k})$ to the average size of $L_m(\mathbf{c})$, given in Proposition 3.1 below. The terminology 'local-to-global' means that we turn a question about the local distribution of the set $L_m(\mathbf{c})$ (whether it contains the point $\mathbf{k}$) to a question about its global distribution. Notice that if $k_{m-1} \asymp k_1 = k$, then a naive heuristic implies that the event $\mathbf{k} \in L_m(\mathbf{c})$ occurs with probability $\approx |L_m(\mathbf{c})|/k^{m-1}$. Our local-to-global estimate proves that this naive heuristic is true on average. We start with a few basic upper bounds for $L_m(\mathbf{c})$. Throughout this section we will denote by $P_{m-1}$ the projection onto the first $m - 1$ coordinates, and we will often use the observation that $|L_m(\mathbf{c})| = |P_{m-1} L_m(\mathbf{c})|$: this holds simply because $L_m(\mathbf{c})$ is contained in the hyperplane of $\mathbf{R}^m$ defined by $x_1 + \cdots + x_m = \sum_j j c_j$. We can find $(y_{ij})$ and $(z_{ij})$ such that $x_{ij} = y_{ij} + z_{ij}$ for all $i, j$, and such that $\sum_i y_{ij} = c_j$ and $\sum_i z_{ij} = c_j'$ for each $j$.
as claimed.
(c) The claimed inequality follows immediately from parts (a) and (b).
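The set $L_m(\mathbf{c})$ can be generated directly for small parameters. The following sketch (an illustration, not an efficient algorithm) distributes the $c_j$ cycles of each length $j$ among the $m$ coordinates and records the resulting size tuples, confirming in passing that every element has coordinate sum $\sum_j j c_j$, which is what makes $|L_m(\mathbf{c})| = |P_{m-1} L_m(\mathbf{c})|$.

```python
from itertools import product

def compositions(c, m):
    # ordered ways to write c as a sum of m nonnegative integers
    if m == 1:
        yield (c,)
        return
    for first in range(c + 1):
        for rest in compositions(c - first, m - 1):
            yield (first,) + rest

def L(m, c):
    # c = (c_1, ..., c_k): c_j cycles of length j to distribute
    k = len(c)
    pts = set()
    for cols in product(*(list(compositions(cj, m)) for cj in c)):
        pts.add(tuple(sum((j + 1) * cols[j][i] for j in range(k))
                      for i in range(m)))
    return pts

# example: two fixed points, one 2-cycle, one 3-cycle (total size 7)
pts = L(3, (2, 1, 1))
```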
Since $\sum_{j=k+1}^{k'} 1/j \le \int_k^{k'} dt/t = \log(k'/k)$, the claimed result follows. We need some further notation in connection with type vectors $\mathbf{c} = (c_1, \dots, c_k)$. We define $S(\mathbf{c}) = \sum_j j c_j$. If $\mathbf{c} = (c_1, \dots, c_n)$ is the cycle type of some $\pi \in S_n$ then note that $S(\mathbf{c}) = n$. Occasionally however we will keep track of cycle types of partial permutations, in which case $S(\mathbf{c})$ can be thought of as the total length represented by $\mathbf{c}$. We define also $C^+(\mathbf{c})$ to be the largest $j$ such that $c_j > 0$, or else zero if none exists. Similarly we define $C^-(\mathbf{c})$ to be the smallest $j$ such that $c_j > 0$, else $\infty$ if none exists. If $\mathbf{c}$ is the cycle type of $\pi \in S_n$ then $C^+(\mathbf{c})$ and $C^-(\mathbf{c})$ are the lengths of respectively the longest and shortest cycles of $\pi$; we will take the liberty of also using the alternative notation $C^+(\pi)$ and $C^-(\pi)$ to denote the same quantities.
The result follows immediately from this, the observation that $\mathbf{E}|L_m(X_k)| \le \mathbf{E}|L_m(X_{k'})|$, and the bound $\sum_{r \ge j} \frac{(2am)^r}{r!} \ll \big(\frac{2eam}{j}\big)^j$.
(b) By the multinomial theorem and part (a), we may argue as follows. Let $J$ be the set of indices $i$ such that $a_i \neq 0$. For each $J$, the product on the right side above is $\ll k^{r - |J|}$, and there are $O_r(1)$ choices for the numbers $a_i$, $i \in J$, with sum $r$. For each $j \in \{1, 2, \dots, r\}$, there are $\binom{k}{j}$ subsets $J \subset \{1, \dots, k\}$ of cardinality $j$. Thus the sum above over $a_1, \dots, a_r$ is $O(k^r)$, as claimed.
(c) For the second summand, we have the claimed bound by Lemma 3.2(a,b); and by a straightforward modification of the proof in part (b) we obtain the remaining estimate. We also need to recall [EFG15b, Proposition 2.1].
Proposition 3.5. Let $c_1, \dots, c_k$ be nonnegative integers such that $n - S(\mathbf{c}) \ge k + 1$.
Then the number of $\pi \in S_n$ with exactly $c_i$ $i$-cycles for each $i \le k$ is:

We are now ready to prove Proposition 3.1. In keeping with the analogy with analytic number theory, in the proof we will speak about "factorizations" $\pi = \pi_1 \cdots \pi_m$. By this we mean simply that $\pi$ has fixed sets $A_1, \dots, A_m$ such that $\pi_i = \pi|_{A_i}$ for each $i$. We may think of $\pi_1, \dots, \pi_m$ as partially defined permutations, and we define their cycle types accordingly. Note in this connection that if $\mathbf{c}_i$ is the cycle type of $\pi_i$ then $S(\mathbf{c}_i) = |A_i|$.

3.1.
The lower bound in Proposition 3.1. Recall that $k = k_1 \le k_2 \le \cdots \le k_m$ and that $k_{m-1} \le ck$. Assume that $n$ is sufficiently large depending on $m$ and $c$. Let $M = 2e^{2m}$ and $h = \lfloor k/(4M) \rfloor$. We also fix integers $L_i = O_{m,c}(1)$ for $i \le m - 1$. We focus our attention on permutations $\pi$ factorizing as a product of disjoint permutations $\pi = \alpha \cdot \prod_{i,j} \sigma_{ij} \cdot \beta$, where every cycle of $\alpha$ has length $\le h$, the total length of $\alpha$ is $|\alpha| < Mh$, each $\sigma_{ij}$ is a cycle of length in the range $Mh < |\sigma_{ij}| < 3Mh$, and all cycles of $\beta$ have length $\ge 3Mh$. If $\alpha$ is of type $\mathbf{c} = (c_1, \dots, c_h)$ and $|\sigma_{ij}| = \ell_{ij}$ for each $i, j$, then we further assume that (3.1) holds. This implies that $\pi$ is counted by $i(n, \mathbf{k})$. Indeed, (3.1) is equivalent to the existence of nonnegative integers $(x_{ij})_{i \le m-1, j \le h}$ with the appropriate sums. This means that there are sets $A_1, \dots, A_{m-1}$ of sizes $k_1, \dots, k_{m-1}$, respectively, left invariant by $\pi$. We then define $A_m = \{1, \dots, n\} \setminus \bigcup_{j=1}^{m-1} A_j$, which is also kept invariant by $\pi$ and has size $k_m$. Thus $\pi$ as above is counted by $i(n, \mathbf{k})$, as claimed.
Thus Proposition 3.5 applies and gives a lower bound for the number of such $\pi$, in which $L = \sum_{i=1}^{m-1} L_i$ is the total number of $\sigma_{ij}$. Fix $\mathbf{c}$ such that $S(\mathbf{c}) \le Mh$, and suppose that $L_i$ and $(\ell_{ij})_{1 \le i \le m-1, 1 \le j \le L_i - 1}$ have been chosen so that each $\ell_{ij}$ is in the range $Mh < \ell_{ij} < 3Mh$ and so that (3.1) holds. To bound this from below, we apply Lemma 3.4(a). The lower bound in Proposition 3.1 follows from the resulting inequality and Lemma 3.3.

3.2.
The upper bound in Proposition 3.1. Put $k = k_1$ and $K = k_{m-1}$. Suppose that $\pi \in S_n$ has invariant sets of sizes $k_1, \dots, k_m$. Then $\pi = \pi_1 \pi_2 \cdots \pi_m$, where $\pi_i$ is a product of disjoint cycles of total length $k_i$. Fix a permutation $\tau \in S_m$ such that $C^+(\pi_{\tau(1)}) \le \cdots \le C^+(\pi_{\tau(m)})$ and, for each $i$, choose a cycle $\sigma_i$ of $\pi_{\tau(i)}$ of length $\ell_i = C^+(\pi_{\tau(i)})$. Note then that $\ell_1 \le k$ and $\ell_{m-1} \le K$. We can then write $\pi$ as a product of disjoint permutations $\pi = \alpha \alpha' \sigma_1 \cdots \sigma_{m-1} \beta$. Moreover, since all cycles of $\pi_{\tau(1)}$ other than $\sigma_1$ are cycles of $\alpha$, we must have $\ell_1 + S(\mathbf{c}) \ge k$; in particular, (3.5) applies. We can now show our hand. We will bound the number of choices for $\pi$ by choosing first $\tau \in S_m$, then $\mathbf{c}$ such that (3.4) holds, then $\mathbf{c}'$ such that (3.8) holds, $(\ell_i)$ such that (3.7) and (3.6) hold, and finally disjoint $\alpha, \alpha', \sigma_1, \dots, \sigma_{m-1}, \beta$ of total length $n$ such that $\alpha$ has type $\mathbf{c}$, $\alpha'$ has type $\mathbf{c}'$, $\sigma_i$ is a cycle of length $\ell_i$ for each $i$, and every cycle of $\beta$ has length at least $\ell_{m-1}$, with at least one cycle of length $\ell_m$.

The upper bound in Theorem 1.4
We now turn to the upper bound in Theorem 1.4. Having proved our local-global principle Proposition 3.1, our aim is now to prove (4.1). We begin with the following observation. If we fix $r = c_1 + \cdots + c_k$, then we obtain (4.3) and (4.4). The most common way for $|L_m^*(\mathbf{a})|$ to be small is for many of the $a_i$ to be small. To capture this, let $\tilde a_1 \le \tilde a_2 \le \cdots$ be the increasing rearrangement of the sequence $\mathbf{a}$ (the order statistics of $\mathbf{a}$). Following the proof of Lemma 3.2(c), we find that
$$|L_m^*(\mathbf{a})| \ll \min_{0 \le j \le r} (1 + \tilde a_1 + \cdots + \tilde a_j)^{m-1} m^{r-j}. \tag{4.5}$$
It is not unreasonable to expect that the sum over $a_1, \dots, a_r \in \{1, \dots, k\}$ can be compared with the corresponding integral, as in (4.6), where here we have enlarged the domain of $G$ to include $r$-tuples of positive real numbers. However, $G$ is not an especially regular function and so (4.6) is perhaps too much to hope for. The function $G$ is, however, increasing in every coordinate, and we may exploit this to prove an approximate version of (4.6).
To see the equality (4.3), associate to each vector $\mathbf{a}$ the vector $\mathbf{c}$ with $c_i$ the number of indices $j$ such that $a_j = i$. Then $L_m(\mathbf{c}) = L_m^*(\mathbf{a})$, $\prod_{j=1}^k j^{c_j} = a_1 \cdots a_r$, and each $\mathbf{c}$ comes from $r!/(c_1! \cdots c_k!)$ different choices of $\mathbf{a}$. If one thinks of $c_1, \dots, c_k$ as representing the number of $j$-cycles for $j \le k$ in a random permutation $\pi \in S_n$ (which is only really valid in the limit $n \to \infty$, with $k$ fixed), then one can think of $a_1, \dots, a_r$ as the lengths of the cycles of length at most $k$, in no particular order.

Lemma 4.1. We have
$$\sum_{a_1, \dots, a_r = 1}^{k} \frac{|L_m^*(\mathbf{a})|}{a_1 \cdots a_r} \ll m^r (1 + \log k)^r \, r! \int_{\Omega_r} \min_{0 \le j \le r} m^{-j} \big(1 + k^{\xi_1} + \cdots + k^{\xi_j}\big)^{m-1} \, d\xi,$$
where $\Omega_r = \{\xi : 0 \le \xi_1 \le \cdots \le \xi_r \le 1\}$.

Proof.
Write $h_a$ for the harmonic sum $\sum_{j=1}^a 1/j$. Motivated by the equality above, define the product sets $R(\mathbf{a})$, and bound the sum over $a_1, \dots, a_r$ using (4.5). Consider some $t \in R(\mathbf{a})$. Writing $\tilde t_1 \le \tilde t_2 \le \cdots \le \tilde t_r$ for the increasing rearrangement of $t$, noting that $a_i < a_j$ implies $t_i \le t_j$, and using the inequality $h_a \ge \log(a + 1)$, we see that $\tilde t_i \ge \log \tilde a_i$ for all $i$. The lemma now follows from the symmetry of the integrand and the bound $h_k \le 1 + \log k$.
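The multiplicity $r!/(c_1! \cdots c_k!)$ in the correspondence between tuples $\mathbf{a}$ and type vectors $\mathbf{c}$ (see the footnote above) can be verified exhaustively for small $r$ and $k$ (illustrative sketch):

```python
from itertools import product
from math import factorial, prod
from collections import Counter

k, r = 4, 3
counts = Counter()
for a in product(range(1, k + 1), repeat=r):
    # c_i = number of indices j with a_j = i
    c = tuple(a.count(i) for i in range(1, k + 1))
    counts[c] += 1
```

Each type vector $\mathbf{c}$ with $c_1 + \cdots + c_k = r$ indeed arises from exactly $r!/(c_1! \cdots c_k!)$ ordered tuples, and the multiplicities sum to $k^r$.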
Having established Lemma 4.1, we can finish the proof of (4.1) by quoting [Kou10, Lemma 4.4]. Indeed, in the notation of that paper:

Now, by [Kou10, Lemma 4.4] we have
$$U_r\Big(\frac{m-1}{\log m} \log k;\ m-1\Big) \ll \frac{(r+1)! \, (m^{r-r^*} + 1)}{1 + |r - r^*|^2}$$
uniformly for $0 \le r \le 10(m-1) r^*$, where $r^*$ is as in [Kou10]. Otherwise, we use the trivial bound (from the $j = 0$ term in the minimum), since $10(m-1) r^* \ge 5m(1 + \log k)$ for large enough $k$ in terms of $m$. Stirling's formula then completes the proof of (4.1) and thus that of the upper bound in Theorem 1.4.

The lower bound in Theorem 1.4
We now turn to the lower bound in Theorem 1.4. Having proved our local-global principle Proposition 3.1, our aim is now to prove the corresponding lower bound. For each $j \ge 1$, $b_j$ represents the number of cycles with length in the interval $[e^{j-1}, e^j)$. By arguing just as in the derivation of (4.4), we obtain the analogous identity.

Proof. Given $\mathbf{a} \in \mathbb{N}^r$ and $\mathbf{x} \in \mathbb{Z}_{\ge 0}^m$, let $R(\mathbf{a}, \mathbf{x})$ be the number of partitions $\mathcal{P}$ such that $x_i = \sum_{s \in P_i} a_s$ for each $i = 1, \dots, m$. Then the support of $R(\mathbf{a}, \cdot)$ is $L_m^*(\mathbf{a})$, and $\sum_{\mathbf{x}} R(\mathbf{a}, \mathbf{x}) = m^r$, the total number of partitions $\mathcal{P}$. Thus, Hölder's inequality yields a first bound, and by another application of Hölder's inequality we obtain the stated estimate. The lemma follows from this and (5.4).

5.2.
Bounding the low moment. Next, fix $\mathcal{P}$ and $\mathcal{Q}$ and consider $S(\mathcal{P}, \mathcal{Q})$, the sum of $1/(a_1 \cdots a_r)$ over all solutions $\mathbf{a}$ to the linear system (5.5). In order to bound $S(\mathcal{P}, \mathcal{Q})$ we will in effect upper-triangularize this system. This process admits a convenient combinatorial description. Form a weighted graph $G$ with vertices $\{1, \dots, m\}$ by placing an edge between $i_1$ and $i_2$ whenever the equations in (5.5) indexed by $i_1$ and $i_2$ have a variable in common; such an edge $e$ carries a label $s_e$ and weight $w_e = j_{s_e}$.
Note that if $P_i = Q_i$ for some $i$, then the vertex labeled $i$ is isolated in the graph $G$. Also, note that the labels must be distinct, while the weights need not be. If $I_n$, $1 \le n \le N$, are the components of $G$, we then find that $P_{i_1} \cap Q_{i_2} = \emptyset$ whenever $i_1 \in I_{n_1}$ and $i_2 \in I_{n_2}$ for $n_1 \neq n_2$. Consequently, the more components $G$ has, the more relations we have between the partitions $\mathcal{P}$ and $\mathcal{Q}$.
For a subgraph $H \subset G$ (a subset of the vertices and edges of $G$), we denote by $A(H)$ the set of labels occurring in $H$. We show in the next lemma that, given a subforest $F \subset G$ (that is to say, an acyclic subgraph of $G$, or equivalently a disjoint union of subtrees of $G$), the variables $(a_s)_{s \in A(F)}$ are determined by $(a_s)_{s \notin A(F)}$ and (5.5). Moreover, the quality of the bound implied for $S(\mathcal{P}, \mathcal{Q})$ is measured by the total weight of $F$.

Proof. Write $A = A(F)$ for convenience. For the first part, first note that for any edge $e = \{i_1, i_2\} \in F$, the variable $a_{s_e}$ appears in the equations
$$\sum_{s \in P_i \setminus Q_i} a_s - \sum_{s \in Q_i \setminus P_i} a_s = 0$$
for $i = i_1$ and $i = i_2$, and no others, since the sets $P_1, \dots, P_m$ are pairwise disjoint, and the same is true for the sets $Q_1, \dots, Q_m$. Thus, if $i$ is a leaf of $F$ and $e$ is the edge of $F$ incident with $i$, then, out of all the variables $(a_s)_{s \in A}$, the equation indexed by $i$ involves only $a_{s_e}$, so indeed $a_{s_e}$ is determined by $(a_s)_{s \notin A}$ and (5.5). Next, remove $e$ from $F$ and continue inductively.

To apply Lemma 5.2 most profitably, we should choose a subforest $F \subset G$ which maximizes the total weight $W(F) := \sum_{e \in F} w_e$. Such an $F$ will necessarily be a spanning subforest, and thus have the same number of connected components as $G$. See e.g. Figure 1.
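Selecting a maximum-weight spanning subforest can be done greedily, as in Kruskal's algorithm run with edges scanned in decreasing order of weight. A sketch (illustrative helper, with union-find to avoid creating cycles):

```python
def max_weight_spanning_forest(num_vertices, edges):
    # edges: list of (weight, u, v). Greedily add the heaviest edge
    # that does not create a cycle (Kruskal, maximizing total weight).
    parent = list(range(num_vertices))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    forest = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

# toy graph on 4 vertices containing one cycle
edges = [(5, 0, 1), (4, 1, 2), (3, 0, 2), (2, 2, 3)]
forest = max_weight_spanning_forest(4, edges)
```

On this toy input the edge of weight $3$ closes a cycle and is skipped, leaving a spanning tree of weight $11$.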
Proof. We will consider several graphs throughout the proof, but we fix for all time the vertex set as {1, . . . , m}.
Before we begin, we make some observations. Fix, for the moment, two partitions $\mathcal{P}$ and $\mathcal{Q}$, and consider the associated weighted graph $G$. As noted earlier, an $F$ which is a heaviest subforest is a spanning subforest of $G$. For any $s$, denote by $F_s$ the subforest of $F$ consisting of all edges $e \in F$ with $s_e \ge s$. We now show that there is a heaviest subforest $F$ with the following property: whenever $s \in P_i \cap Q_j$, then $i$ and $j$ lie in the same component of $F_s$. To see this, we separate three cases; in the critical one, $F$ contains a path $i \to \cdots \to i' \to j' \to \cdots \to j$ with at least one edge $\{i', j'\}$ of label $s' < s$ and weight $j_{s'}$. But then we can create another subforest $F'$ by removing the edge $\{i', j'\}$ from $F$ (breaking the tree) and adding the edge $\{i, j\}$ (reconnecting the tree), whose label is $\max(P_i \cap Q_j) \ge s$ with substitute weight at least $j_s \ge j_{s'}$.
We are now ready to prove the lemma. Given an ordered partition $\mathcal{P}$, a forest $F$ with $N$ components, and a set of labels $A = A(F)$ on the edges of $F$, write $M(\mathcal{P}, F, A)$ for the number of $\mathcal{Q}$ for which the associated graph $G$ has a heaviest subforest $F$. The above discussion implies that for each $s \in \{1, \dots, r\}$, the number of possible $j \in \{1, \dots, m\}$ with $s \in Q_j$ is at most $|I_{s t_s}|$, where $I_{s1}, \dots, I_{s N_s}$ denote the components of $F_s$ and $t_s$ is defined by $s \in P_i$ and $i \in I_{s t_s}$. It follows that
$$M(\mathcal{P}, F, A) \le \prod_{s=1}^r |I_{s t_s}|.$$
Note that
$$\max\{x_1^p + \cdots + x_n^p : x_1 + \cdots + x_n = m,\ x_1, \dots, x_n \ge 1\} = (m - n + 1)^p + n - 1$$
for $m \ge n$: this follows from convexity of the function $(x_1, \dots, x_n) \mapsto x_1^p + \cdots + x_n^p$, since the maximum of a convex function on a simplex occurs at one of its vertices. Let $f$ denote the number of edges in $F$, and write $s_1 < \cdots < s_f$ for the edge labels of $F$, which we know are distinct. We also write $s_0 = 0$ and $s_{f+1} = r$ for convenience. Recall that $N$ is the number of components of $F$, so that $N_1 = N$. Since a tree on $n$ vertices contains exactly $n - 1$ edges, we must have $f = m - N$.
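The extremal identity above can be checked by brute force over the integer points of the simplex; since $p > 1$ the maximum over integer points is also attained at a permutation of the vertex $(m - n + 1, 1, \dots, 1)$ (illustrative sketch):

```python
from itertools import combinations

def compositions(total, parts):
    # integer solutions x_1 + ... + x_parts = total with each x_i >= 1
    for cuts in combinations(range(1, total), parts - 1):
        xs, prev = [], 0
        for c in cuts + (total,):
            xs.append(c - prev)
            prev = c
        yield xs

p, m, n = 1.5, 9, 4
best = max(sum(x ** p for x in xs) for xs in compositions(m, n))
claimed = (m - n + 1) ** p + n - 1
```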
Note that $N_s = N_{s_i}$ is constant for $s \in (s_{i-1}, s_i]$, and that $N_{s_i} = \min\{m, N_{s_{i-1}} + 1\}$, since the removal of one edge from $F_{s_{i-1}}$ cuts one component into two pieces, creating exactly one additional component in $F_{s_i}$. Consequently, $N_{s_i} = N + i - 1$ for $i \le f$ and $N_s = m$ for $s > s_f$. There are $O_m(1)$ forests $F$, and $O_m(1)$ orderings of the edges within each forest, with the summand $m^r$ corresponding to $f = 0$, that is to say the forest with no edges. Lemma 3.7 in [Kou10] then gives the required estimate provided that $p$ is sufficiently close to $1$ in terms of $m$.

Proposition 5.4. If $p \in (1, 2]$ is sufficiently close to $1$ in terms of $m$, then the stated moment bound holds.

Remark. The analysis given in this subsection differs technically from the corresponding analysis in [Kou10]. First of all, the combinatorial language of trees and forests used to describe the interdependencies in the relevant linear system is new, but even when both arguments are cast in this language, there is a difference, related to how we analyze the partitions giving rise to a particular heaviest subforest. The difference is parallel to that between two of the best known algorithms for finding a minimal spanning tree, namely Prim's algorithm, which builds a tree by repeatedly adding the least expensive edge growing out of the current tree, and Kruskal's algorithm, which builds a forest by repeatedly adding the least expensive edge which does not create a cycle. In [Kou10], the argument is more closely related to Prim's algorithm, while the argument here is more closely related to Kruskal's algorithm.

5.3. Input from order statistics. Now fix $r = \frac{m-1}{\log m} J + O(1)$, and let $B = B_{C,C'}$ be the set of all $\mathbf{b} = (b_1, \dots, b_J)$ satisfying the conditions below. Here, $C$ and $C'$ are two integers which we will choose to be sufficiently large depending only on $m$. In this case, Proposition 5.4 implies the displayed bound, where the second inequality holds just because the product is convergent and we can choose $C'$ sufficiently large.
Let $R(\mathbf{b})$ be the set of all $\xi \in [0, 1]^r$ such that $0 \le \xi_1 \le \cdots \le \xi_r < 1$ and such that, for each $j \in \{1, \dots, J - C\}$, exactly $b_{j+C}$ of the variables $\xi_s$ lie in the corresponding interval. Here $Y$ is the set of all $\xi \in [0, 1]^r$ such that $0 \le \xi_1 \le \cdots \le \xi_r < 1$ and $\xi_s \ge (s - C^2)/(CJ - C^2)$ for each $s$. If $C$ is large enough in terms of $m$, and then $C'$ is sufficiently large in terms of $C$, $p$ and $m$, then [Kou10, Lemma 3.10] applies. It follows from this and a short calculation using Stirling's formula that the desired estimate holds. The lower bound in Theorem 1.4 is now a direct corollary of this estimate and of Proposition 3.1.

Imprimitive transitive subgroups
In this section we use Theorem 1.4 to prove Theorem 1.2 by fleshing out the argument outlined in Section 2. We will start with bounded ν and gradually treat larger and larger ν.
To bound $I(n)$ we will then use the trivial bound
$$I(n) \le \sum_{\substack{\nu \mid n \\ 1 < \nu < n}} I(n, \nu). \tag{6.1}$$

6.1. Small $\nu$. We did most of the work for the case in which $\nu$ is bounded already in Section 2. We state the conclusion here.
Proposition 6.1. Let $\nu$ be a bounded divisor of $n$.

Proof. This follows immediately from Lemma 2.

6.2. Intermediate $\nu$. For unbounded but not too large $\nu$ our goal is still to prove $I(n, \nu) \asymp n^{-1+1/(\nu-1)}$. As long as $\nu$ is less than $\log n$, this is not the same as $n^{-1}$, and as long as $\nu$ is less than $(\log n)^{1/2}$, this is not the same as $n^{-1+1/\nu}$, so we must continue to give special status to the partition $(\nu - 1, 1)$.
Thus it suffices to prove the upper bound. Consider a partition $(d_i)$ of $\nu$ into $m$ parts, where $d_1 \le d_2 \le \cdots \le d_m$. If $m = 1$, we have $i(n, (n), (\nu)) \ll n^{-1+1/\nu}$ by Lemma 2.1(a). If $m = 2$, $d_1 = 1$ and $d_2 = \nu - 1$, we get $i(n, ((\nu-1)n/\nu, n/\nu), (\nu - 1, 1)) \ll n^{-1+1/(\nu-1)}$ as above. We will show that the sum of all other terms $i(n, (d_i n/\nu)_i, (d_i)_i)$ is $O(n^{-1})$, which will prove the lemma. We will use Lemma 2.1(b), together with Lemma 2.1(c) for the parts $d_i \ge 2$, and the main result of [EFG15b] for the parts $d_i = 1$, namely $i(2n/\nu, (n/\nu, n/\nu)) \ll (cn/\nu)^{-\delta_2}$ for some absolute constant $c \in (0, 1]$. Writing $\lambda$ for the number of $i < m$ such that $d_i = 1$, we obtain a bound in which we used the fact that $d_m \ge \nu/m$. We also make use of the following estimate: this is easily proved by observing that the term $d = 2$ is $\ll (2x)^{-1/2}$, while the terms with $d > 2$ contribute at most a smaller amount. Now, we use the above discussion to estimate the total contribution of the remaining terms $i(n, (d_i n/\nu)_i, (d_i)_i)$. First, we deal with those terms that have $m = 2$. Relations (6.2) and (6.3) then imply the required bound.

By [DFG08, Theorem 6.3(1)], $I(n, \nu)$ is bounded by the coefficient of $z^\nu$ in a certain generating function $f(z)$. Consider for a moment the associated polynomial $p$. Clearly, $p$ has nonnegative coefficients, $p(0) = 1$, and $p(1) = s$. In particular, $p$ is a convex function on $[0, 1]$, and we deduce that $p(x) \le 1 + (s - 1)x$ for $x \in [0, 1]$. Inserting $x = 1/k$, we find the corresponding bound. Thus the coefficients of $f(z)$ are bounded by those of a comparison series. Since $s \le n^{1/2}$, this implies that $I(n, \nu) \ll n^{-1+1/s}$, as claimed.
Theorem 1.2 follows immediately from Propositions 6.1, 6.2, and 6.4, the bound (6.1), and the divisor bound, which states that the number of divisors of $n$ is at most $n^{O(1/\log\log n)}$.
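As a numerical illustration of the divisor bound (this sketch is ours; `num_divisors` is a hypothetical helper and the sample values of $n$ are arbitrary), one can check that the ratio $\log d(n) \cdot \log\log n / \log n$ stays of bounded size even for rather composite $n$:

```python
from math import log

def num_divisors(n):
    """d(n) by trial division (adequate for the small n used here)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# The divisor bound states d(n) <= n^{O(1/ log log n)}; equivalently the
# ratio log d(n) * log log n / log n stays bounded.  A few sample values:
for n in (360, 2520, 720720):
    ratio = log(num_divisors(n)) * log(log(n)) / log(n)
    print(n, num_divisors(n), round(ratio, 3))
```

For all three samples the printed ratio is close to $1$, consistent with the bound.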
Also, by Theorem 1.2, $I(n, \nu) \ll n^{-1}$ for every divisor $\nu \notin \{1, p\}$ of $n$. Hence
$$I(n) \ll \frac{\log n}{\log\log n}\, n^{-1} + I(n, p) \ll \frac{\log n}{\log\log n}\, I(n, p).$$

7. Primitive subgroups
We start by recalling the definition of the wreath product. The reader may refer to [Rot95, Chapter 7] for more details. Let $D$ and $Q$ be groups, with $Q$ acting on some set $\Omega$. Then $Q$ acts on the set of functions $D^\Omega$ via the operation
$$q \cdot (d_\omega)_{\omega\in\Omega} := (d_{q^{-1}\omega})_{\omega\in\Omega},$$
and the wreath product $D \wr Q$ is the corresponding semidirect product $D^\Omega \rtimes Q$.
If $D$ also acts on some set, say $\Lambda$, then $D \wr Q$ acts on $\Lambda^\Omega$ via the operation
$$((d_\omega)_{\omega\in\Omega}, q) \cdot (\lambda_\omega)_{\omega\in\Omega} := (d_\omega \lambda_{q^{-1}\omega})_{\omega\in\Omega}.$$
(There is also a natural action of $D \wr Q$ on $\Lambda \times \Omega$, defined by $((d_\omega)_{\omega\in\Omega}, q) \cdot (\lambda, \omega) := (d_{q\omega}\lambda, q\omega)$, but this action is generically imprimitive, so it will not concern us here.) Moreover, this action is faithful if the actions of $D$ on $\Lambda$ and $Q$ on $\Omega$ are (and $|\Lambda| \ge 2$), in which case $D \wr Q$ can be realized as a subgroup of $S_{\Lambda^\Omega}$. In the special case when $D = S_a$, $Q = S_b$, $\Lambda = \{1, \dots, a\}$ and $\Omega = \{1, \dots, b\}$, we find that $S_a \wr S_b$ is a transitive subgroup of $S_{a^b}$.
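The product action just described can be verified mechanically on small parameters. The following sketch (ours; all names and the parameters `a`, `b` are illustrative) checks both the action axiom and the transitivity claim for $S_a \wr S_b$ acting on $\{0, \dots, a-1\}^b$:

```python
from itertools import product, permutations

a, b = 2, 3  # small parameters for a brute-force check

def act(elem, pt):
    """Product action of S_a wr S_b on {0,...,a-1}^b:
    ((d_w)_w, q) . (l_w)_w = (d_w(l_{q^{-1}(w)}))_w."""
    ds, q = elem
    qinv = tuple(q.index(w) for w in range(b))
    return tuple(ds[w][pt[qinv[w]]] for w in range(b))

def compose(e1, e2):
    """Group law in the wreath product: (d, q)(e, p) = (d * (q.e), qp),
    where (q.e)_w = e_{q^{-1}(w)}."""
    d, q = e1
    e, p = e2
    qinv = tuple(q.index(w) for w in range(b))
    newd = tuple(tuple(d[w][e[qinv[w]][x]] for x in range(a)) for w in range(b))
    newq = tuple(q[p[w]] for w in range(b))
    return (newd, newq)

elems = [(ds, q) for ds in product(list(permutations(range(a))), repeat=b)
                 for q in permutations(range(b))]
pts = list(product(range(a), repeat=b))

# (g h) . x == g . (h . x): the formula really defines an action
assert all(act(compose(g, h), x) == act(g, act(h, x))
           for g in elems[:20] for h in elems[:20] for x in pts)

# the action is transitive: a single orbit of size a^b
orbit = {act(g, pts[0]) for g in elems}
assert len(orbit) == a ** b
```

Since the base group $S_a^{\,b}$ already moves each coordinate independently, transitivity here does not even need the top group $S_b$.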
We need one last definition: given a nontrivial subgroup $G \le S_n$, the minimal degree of $G$ is the smallest number of points moved by a nontrivial element of $G$. Obviously, if $1 \ne H \le G$, then the minimal degree of $G$ is at most that of $H$.
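For concreteness, the minimal degree of a small permutation group can be computed by brute force. In this sketch (ours, with illustrative helper names), $S_4$ has minimal degree $2$ (a transposition) while its subgroup $A_4$ has minimal degree $3$ (a $3$-cycle), consistent with the monotonicity just noted:

```python
from itertools import permutations

def moved(g):
    """Number of points moved by the permutation g (a tuple of images)."""
    return sum(1 for i, gi in enumerate(g) if gi != i)

def minimal_degree(group):
    """Smallest number of points moved by a nontrivial element."""
    n = len(next(iter(group)))
    ident = tuple(range(n))
    return min(moved(g) for g in group if g != ident)

def sign(g):
    """Sign of a permutation via its inversion count."""
    return (-1) ** sum(1 for i in range(len(g)) for j in range(i) if g[j] > g[i])

s4 = set(permutations(range(4)))
a4 = {g for g in s4 if sign(g) == 1}
print(minimal_degree(s4), minimal_degree(a4))  # 2 3
```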
We will combine the following two results.
Theorem 7.1 (Bovey [Bov80]). Let $\alpha \in (0, 1)$. If we choose $\pi$ from $S_n$ uniformly at random, then the probability that $\pi \ne 1$ and $\pi$ has minimal degree at least $n^\alpha$ is $n^{-\alpha + o_\alpha(1)}$.

Theorem 7.2 (Liebeck and Saxl). Let $G$ be a primitive subgroup of $S_n$ other than $S_n$ or $A_n$ whose minimal degree is less than $n/3$. Then there are positive integers $m, k, r$ such that $A_m^{\times r} \le G \le S_m \wr S_r$, where $S_m$ acts on the set $\Delta$ of $k$-sets of $\{1, \dots, m\}$, $S_m \wr S_r$ acts on $\Delta^r$ with the product action described above, and $n = \binom{m}{k}^r$.

In fact the constant $1/3$ in this theorem can be improved to $3/7$, and even to $1/2$ with explicit exceptions: see Guralnick and Magaard [GM98]. However, we only need the following corollary.
Corollary 7.3. Let $G$ be a primitive subgroup of $S_n$ of minimal degree at most $n^{1-\varepsilon}$, and assume that $n$ is sufficiently large in terms of $\varepsilon$. Then there are positive integers $m, k, r$ with $k, r \ll_\varepsilon 1$ such that $A_m^{\times r} \le G \le S_m \wr S_r$, with the action described in Theorem 7.2. In particular, one of the following alternatives holds: (i) $G = S_n$ or $A_n$; (ii) $G \le S_m$, where $S_m$ acts on the $k$-sets of $\{1, \dots, m\}$, $n = \binom{m}{k}$, and $1 < k \ll_\varepsilon 1$; or (iii) $G \le S_m \wr S_r$, where $S_m \wr S_r$ acts on $\{1, \dots, m\}^r$, $n = m^r$, and $1 < r \ll_\varepsilon 1$.
Proof. Let $\Delta$ be the set of $k$-sets in $\{1, \dots, m\}$. We must show that the minimal degree of $S_m \wr S_r$ acting on $\Delta^r$ is at least $n^{1-\varepsilon}$ unless $k, r \ll_\varepsilon 1$. Let $g = (\pi_1, \dots, \pi_r; \sigma) \in S_m \wr S_r$. We note that an $r$-tuple $(A_1, \dots, A_r) \in \Delta^r$ is a fixed point of $g$ if, and only if,
$$\pi_j(A_{\sigma^{-1}(j)}) = A_j \qquad (1 \le j \le r). \tag{7.1}$$
We separate two cases. First, suppose that $\sigma \ne 1$. In particular, $\sigma$ has a cycle of length $s > 1$, say $(1 \cdots s)$. We then find that $g$ respects the decomposition $\Delta^r = \Delta^s \times \Delta^{r-s}$, and $g$ has at most $\binom{m}{k}$ fixed points in its action on $\Delta^s$: if we know $A_1$ and $\pi_1, \dots, \pi_s$, then $A_2, \dots, A_s$ are determined by the relations (7.1). Thus $g$ has at most
$$\binom{m}{k}^{r-s+1} \le \binom{m}{k}^{r-1} \tag{7.2}$$
fixed points. On the other hand, if $\sigma = 1$, then $g$ fixes the point $(A_1, \dots, A_r) \in \Delta^r$ if and only if $\pi_i$ fixes $A_i$ for each $i$. Clearly then the greatest number of points are fixed by an element of the form $(\pi_1, 1, \dots, 1)$ with $\pi_1 \ne 1$. Find $x \in \{1, \dots, m\}$ such that $\pi_1(x) \ne x$. Then if $\pi_1$ fixes $A$, either $x, \pi_1(x) \in A$ or $x, \pi_1(x) \notin A$. We thus find that the number of fixed points of $g$ acting on $\Delta^r$ is at most
$$\left[\binom{m-2}{k} + \binom{m-2}{k-2}\right] \binom{m}{k}^{r-1},$$
and, as a matter of fact, exactly that if $\pi_1$ is a transposition. By comparing with (7.2), we see that the greatest number of points are fixed by a transposition in one coordinate of the base, so the minimal degree of $S_m \wr S_r$ acting on $\Delta^r$ is
$$\binom{m}{k}^r - \left[\binom{m-2}{k} + \binom{m-2}{k-2}\right] \binom{m}{k}^{r-1} = 2\binom{m-2}{k-1} \binom{m}{k}^{r-1}.$$
This is at least $\binom{m}{k}^{r(1-\varepsilon)}$ unless $k, r \ll_\varepsilon 1$.
The last part of the corollary follows by assigning the case $k = r = 1$ to (i), the case $k > 1$, $r = 1$ to (ii), and the case $r > 1$ to (iii). In the last case we must replace $m$ by $\binom{m}{k}$.

We need a couple of lemmas to help rule out cases (ii) and (iii) of Corollary 7.3.
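The fixed-point analysis in the proof of Corollary 7.3 can be checked by exhaustive search for small parameters. In the sketch below (ours; the parameters $m, k, r$ are illustrative), the brute-force minimal degree of $S_m \wr S_r$ on $\Delta^r$ agrees with the count $2\binom{m-2}{k-1}\binom{m}{k}^{r-1}$ arising from a transposition in one base coordinate:

```python
from itertools import combinations, permutations, product
from math import comb

m, k, r = 4, 2, 2
Delta = list(combinations(range(m), k))  # k-sets of {0,...,m-1}
n = len(Delta) ** r                      # degree of the product action

def apply_set(pi, A):
    return tuple(sorted(pi[x] for x in A))

def fixed_points(pis, sigma):
    """Number of tuples (A_1,...,A_r) fixed by g = (pi_1,...,pi_r; sigma),
    i.e. satisfying pi_j(A_{sigma^{-1}(j)}) = A_j for all j."""
    sinv = tuple(sigma.index(j) for j in range(r))
    return sum(1 for As in product(Delta, repeat=r)
               if all(apply_set(pis[j], As[sinv[j]]) == As[j] for j in range(r)))

ident_m = tuple(range(m))
ident_r = tuple(range(r))
best = 0  # most fixed points of any nontrivial element
for pis in product(list(permutations(range(m))), repeat=r):
    for sigma in permutations(range(r)):
        if pis == (ident_m,) * r and sigma == ident_r:
            continue
        best = max(best, fixed_points(pis, sigma))

min_degree = n - best
print(min_degree, 2 * comb(m - 2, k - 1) * comb(m, k) ** (r - 1))
```

Here the two printed numbers coincide, matching the claim that the extremal element is a transposition in a single coordinate.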

Lemma 7.4. Let $2 \le k \le m/2$. Then every $\pi \in S_m$ has $\gg_k m^{1/2}$ cycles in its action on the $k$-sets of $\{1, \dots, m\}$.

Proof.
Write $\Omega$ for $\{1, \dots, m\}$ and $\Omega_k$ for the set of $k$-sets of $\Omega$. Either there are at least $m^{1/2}$ disjoint cycles in $\Omega$, or there is a cycle of length at least $m^{1/2}$. In the former case we get at least one cycle in $\Omega_k$ for each choice of $k$ distinct cycles in $\Omega$, so there are at least $\binom{m^{1/2}}{k} \gg_k m^{k/2} \ge m^{1/2}$ cycles in $\Omega_k$. In the latter case, fix a cycle $C$ in $\Omega$ of length at least $m^{1/2}$. There are $\binom{|C|}{k}$ $k$-sets contained in $C$, and each cycle in $C_k$ has length at most $|C|$, so there are at least $\frac{1}{|C|}\binom{|C|}{k} \gg_k |C|^{k-1} \ge m^{1/2}$ cycles in $C_k$.

Lemma 7.5. If $r \ge 2$, then every $g \in S_m \wr S_r$ which is nontrivial in the $S_r$ factor has at least $m/r$ cycles in its action on $\{1, \dots, m\}^r$.
The coordinates appearing here are conjugate to one another, so they have the same number of cycles of each length $i$, say $c_i$. But if $x_1, \dots, x_s$ are each contained in cycles of length $i$, then $(x_1, \dots$
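The cycle-counting argument in the proof of Lemma 7.4 can be spot-checked numerically. The sketch below (ours; the parameters and the helper `cycles_on_ksets` are illustrative) verifies the $m^{1/2}$ lower bound on random permutations for one small choice of $(m, k)$:

```python
from itertools import combinations
from math import isqrt
import random

def cycles_on_ksets(pi, k):
    """Number of cycles of the permutation pi (of {0,...,m-1}) in its
    induced action on the k-element subsets."""
    m = len(pi)
    seen, count = set(), 0
    for A in combinations(range(m), k):
        if A in seen:
            continue
        count += 1
        B = A
        while True:
            B = tuple(sorted(pi[x] for x in B))
            seen.add(B)
            if B == A:
                break
    return count

# Spot-check the m^{1/2} lower bound on a few random permutations.
random.seed(0)
m, k = 12, 3
for _ in range(20):
    pi = list(range(m))
    random.shuffle(pi)
    assert cycles_on_ksets(tuple(pi), k) >= isqrt(m)
```

For instance, a single $m$-cycle already has at least $\binom{m}{k}/m$ induced cycles, comfortably above $m^{1/2}$ in this range.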