Beyond Expansion IV: Traces of Thin Semigroups

We continue our study of particular instances of the Affine Sieve, producing levels of distribution beyond those attainable from expansion alone. Motivated by McMullen's Arithmetic Chaos Conjecture regarding low-lying closed geodesics on the modular surface defined over a given number field, we study the set of traces for certain sub-semi-groups of SL2(Z) corresponding to absolutely Diophantine numbers. In particular, we are concerned with the level of distribution for this set. While the standard Affine Sieve procedure, combined with Bourgain-Gamburd-Sarnak's resonance-free region for the resolvent of a "congruence" transfer operator, produces some exponent of distribution alpha > 0, we are able to produce the exponent alpha < 1/3. This recovers unconditionally the same exponent as what one would obtain under a Ramanujan-type conjecture for thin groups. A key ingredient, of independent interest, is a bound on the additive energy of SL2(Z).


Introduction
In this paper, we reformulate McMullen's (Classical) Arithmetic Chaos Conjecture (see Conjecture 1.5) as a local-global problem for the set of traces in certain thin semigroups, see Conjecture 1.13. Our main goal is to make some partial progress towards this conjecture by establishing strong levels of distribution for this trace set, see §1.5.

Low-Lying Closed Geodesics With Fixed Discriminant.
This paper is motivated by the study of long closed geodesics on the modular surface defined over a given number field, which do not have high excursions into the cusp. Let us make this precise.
To set notation, let $\mathbb{H}$ denote the upper half plane, and let $\mathcal{X} = T^1(\mathrm{SL}_2(\mathbb{Z})\backslash\mathbb{H}) \cong \mathrm{SL}_2(\mathbb{Z})\backslash\mathrm{SL}_2(\mathbb{R})$ be the unit tangent bundle of the modular surface. A closed geodesic $\gamma$ on $\mathcal{X}$ corresponds to a hyperbolic matrix $M \in \mathrm{SL}_2(\mathbb{Z})$ (more precisely, to its conjugacy class). Let $\alpha_M \in \partial\mathbb{H}$ be one of the two fixed points of $M$, the other being its Galois conjugate $\overline{\alpha_M}$; then $\gamma$ is the projection mod $\mathrm{SL}_2(\mathbb{Z})$ of the geodesic connecting $\alpha_M$ and $\overline{\alpha_M}$. We will say that $\gamma$ is defined over the (real quadratic) field $K = \mathbb{Q}(\alpha_M)$ and has discriminant $\Delta_M$, where $\Delta_M$ is the discriminant of $K$. This discriminant is (up to factors of 4) the square-free part of $\operatorname{tr}(M)^2 - 4$. (1.1)
To study excursions into the cusp, let $Y(\gamma)$ denote the largest imaginary part of $\gamma$ in the standard upper-half plane fundamental domain for the modular surface. Given a "height" $C > 1$, we say that the closed geodesic $\gamma$ is low-lying (of height $C$) if $Y(\gamma) < C$. By the well-known connection [Art24, Ser85] between continued fractions and the cutting sequence of the geodesic flow on $\mathcal{X}$, the condition that $\gamma$ be low-lying can be reformulated as a Diophantine property on the fixed point $\alpha_M$ of $M$, as follows. Write the (eventually periodic) continued fraction expansion $\alpha_M = [a_0, a_1, a_2, \dots]$; given a height $A \ge 1$, we say that $\alpha_M$ is Diophantine of height $A$ if all of its partial quotients satisfy $a_j \le A$.
Question 1.2. Given a real quadratic field $K$ and a height $C$, can one find longer and longer primitive closed geodesics defined over $K$ which are low-lying of height $C$? Equivalently, given a fixed fundamental discriminant $\Delta > 0$ and a height $A \ge 1$, we wish to find larger and larger (non-conjugate) matrices $M$ so that their fixed points $\alpha_M$ are Diophantine of height $A$, and so that $t = \operatorname{tr}(M)$ solves the Pell equation $t^2 - \Delta s^2 = 4$; cf. (1.1). If solutions exist, how rare/ubiquitous are they?
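The Pell condition in Question 1.2 is easy to explore numerically. The following is a minimal sketch, not taken from the paper: the function name `pell_traces` and the brute-force search over $s$ are illustrative assumptions, chosen only to list the candidate traces $t$ with $t^2 - \Delta s^2 = 4$ below a bound.

```python
from math import isqrt


def pell_traces(delta, bound):
    """Return all (t, s) with t^2 - delta*s^2 = 4, s >= 1 and t <= bound.

    Brute-force search over s; fine for illustration, not for large bounds.
    """
    solutions = []
    s = 1
    while delta * s * s + 4 <= bound * bound:
        t_squared = delta * s * s + 4
        t = isqrt(t_squared)
        if t * t == t_squared:  # perfect square, so (t, s) solves the equation
            solutions.append((t, s))
        s += 1
    return solutions


if __name__ == "__main__":
    # Delta = 12 is the discriminant of K = Q(sqrt(3)).
    print(pell_traces(12, 200))
```

For $\Delta = 12$ the candidate traces grow geometrically (4, 14, 52, 194, ...), which illustrates why solutions to the Pell equation are sparse among the integers.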
Example 1.3. An example with the field $K = \mathbb{Q}(\sqrt{3})$ of discriminant $\Delta = 12$, and heights $C = 2$, $A = 3$ is illustrated in Figure 1. Observe that $\alpha_M$ is Diophantine: the partial quotients of $\alpha_M$ are all at most $A = 3$. Moreover, it is evident from Figure 1 that $\gamma$ is low-lying: we clearly have $Y(\gamma) < C = 2$.
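The Diophantine property can be checked by computing partial quotients directly. The sketch below is an illustration only: it expands $\sqrt{d}$ itself (not the specific fixed point $\alpha_M$ of Example 1.3) using the standard integer recurrence for quadratic irrationals, and the helper names are hypothetical.

```python
from math import isqrt


def sqrt_partial_quotients(d, count):
    """First `count` partial quotients of sqrt(d), for a non-square integer d > 1."""
    root = isqrt(d)
    if root * root == d:
        raise ValueError("d must not be a perfect square")
    p, q = 0, 1  # the current complete quotient is (p + sqrt(d)) / q
    quotients = []
    for _ in range(count):
        a = (p + root) // q  # floor of the complete quotient (q > 0 throughout)
        quotients.append(a)
        p = a * q - p
        q = (d - p * p) // q  # exact division by construction
    return quotients


def is_diophantine(quotients, height):
    """True if every listed partial quotient is at most `height`."""
    return all(a <= height for a in quotients)


if __name__ == "__main__":
    cf = sqrt_partial_quotients(3, 9)
    print(cf, is_diophantine(cf, 3))
```

Here $\sqrt{3} = [1; 1, 2, 1, 2, \dots]$ has all partial quotients at most 2, so it is an element of $\mathbb{Q}(\sqrt{3})$ that is Diophantine of height 3, while $\sqrt{7}$ (with a partial quotient equal to 4) is not.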

Arithmetic Chaos.
On one hand, the answer to Question 1.2 is, on average, negative. Indeed, Duke's equidistribution theorem [Duk88] forces generic closed geodesics to have arbitrarily high excursions into the cusp. On the other hand, McMullen's (Classical) Arithmetic Chaos Conjecture (see [McM09, McM12] for the dynamical perspective and origin of this problem) predicts that solutions exist, and while not of positive proportion, they should be of positive entropy:
Conjecture 1.5 (Arithmetic Chaos [McM12]). There exists an absolute height $A \ge 2$ so that, for any fixed real quadratic field $K$, the cardinality of the set $\{[\overline{a_0, a_1, \dots, a_\ell}] \in K : 1 \le a_j \le A\}$ (1.6) grows exponentially, as $\ell \to \infty$.
Remark 1.7. Though we have stated the conjecture with some absolute height A, McMullen formulated this problem with A = 2 (of course A = 1 only produces the golden mean). He further suggested it should also hold whenever the corresponding growth exponent exceeds 1/2, see Remark 1.15.
Remark 1.8. As pointed out to us by McMullen, one can also formulate a GL n (Z) version of Arithmetic Chaos by strengthening [McM09, Conjecture 1.7 (3)] so as to postulate exponential growth of periodic points and positive entropy, instead of just infinitude.
It is not currently known whether the following much weaker statement is true: for some $A$ and every $K$, the cardinality of the set (1.6) is unbounded. Even worse, it is not known whether (1.6) is eventually non-empty, that is, whether there exists an $A \ge 2$ so that any $K$ contains at least one element which is Diophantine of height $A$. Some progress towards Conjecture 1.5 appears in [Woo78, Wil80, McM09, Mer12], where special periodic patterns of partial quotients are constructed to lie in certain prescribed real quadratic fields. (In particular, we constructed our example (1.4) by following Wilson's algorithm [Wil80, p. 139].) These results prove that for any $K$, there exists an $A = A(K)$ so that the cardinality of (1.6) is unbounded with $\ell$; but exponential growth is not known in a single case.
In light of this conjecture, we call a number absolutely Diophantine if it is Diophantine of height $A$ for some absolute constant $A \ge 1$. That is, when we speak of a number being absolutely Diophantine, the height $A$ is fixed in advance. In the next subsection, we describe a certain "local-global" conjecture which has the Arithmetic Chaos Conjecture as a consequence.
First we need some more notation. Consider a finite subset $\mathcal{A} \subset \mathbb{N}$, which we call an alphabet, and let $C_{\mathcal{A}}$ denote the set of all $\alpha \in \mathbb{R}$ with all partial quotients in $\mathcal{A}$. If $\max \mathcal{A} \le A$ for an absolute constant $A$, then every $\alpha \in C_{\mathcal{A}}$ is clearly absolutely Diophantine. Assuming $2 \le |\mathcal{A}| < \infty$, each $C_{\mathcal{A}}$ is a Cantor set (see Figure 2) of some Hausdorff dimension $0 < \delta_{\mathcal{A}} < 1$, and by choosing $\mathcal{A}$ appropriately (for example, $\mathcal{A} = \{1, 2, \dots, A\}$ with $A$ large), one can make $\delta_{\mathcal{A}}$ arbitrarily close to 1 [Hen92].
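It is easy to see that the matrix $M = \begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_\ell & 1 \\ 1 & 0 \end{pmatrix}$ fixes the quadratic irrational $\alpha_M = [\overline{a_0, a_1, \dots, a_\ell}]$, so we introduce the semi-group of all such matrices whose fixed points $\alpha_M$ lie in $C_{\mathcal{A}}$. Preferring to work in $\mathrm{SL}_2$, we immediately pass to the even-length (determinant-one) sub-semi-group $\Gamma_{\mathcal{A}}$, (1.11) which is (finitely) generated by the products $\begin{pmatrix} a & 1 \\ 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} b & 1 \\ 1 & 0 \end{pmatrix}$, for $a, b \in \mathcal{A}$.
Having accounted for the "low-lying" (or Diophantine) criterion, we must study the discriminants, or what is essentially the same, the set $T_{\mathcal{A}} := \{\operatorname{tr} M : M \in \Gamma_{\mathcal{A}}\}$ of traces. Borrowing language from Hilbert's 11th problem on numbers represented by quadratic forms, we call an integer $t$ admissible (for the alphabet $\mathcal{A}$) if for every $q \ge 1$, $t \in T_{\mathcal{A}} \pmod q$, that is, if $t$ passes all finite local obstructions.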
Remark 1.12. If {1, 2} ⊆ A, then, allowing inverses, the group Γ A generated by the semigroup Γ A is all of SL 2 (Z), and hence every integer is admissible. In general, Strong Approximation [MVW84] shows that admissibility can be checked using a single modulus q(A).
We say t is represented if t ∈ T A , and let M A (t) denote its multiplicity, M A (t) := #{M ∈ Γ A : tr M = t}.
Since the entries of Γ A are all positive, the multiplicity is always finite.
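The trace set and its multiplicities can be tabulated for small heights. The following is a minimal sketch under stated assumptions: matrices are stored as row-major 4-tuples, the identity is not counted as an element of the semigroup, and the function names are illustrative. The search is finite because right multiplication by a generator with positive entries strictly increases the trace, so any branch exceeding `max_trace` can be pruned.

```python
from collections import Counter
from itertools import product


def generators(alphabet):
    """Products (a 1; 1 0)(b 1; 1 0) = (ab+1 a; b 1), stored as (m11, m12, m21, m22)."""
    return [(a * b + 1, a, b, 1) for a, b in product(alphabet, repeat=2)]


def multiply(m, g):
    """2x2 matrix product m * g on row-major 4-tuples."""
    a, b, c, d = m
    e, f, h, k = g
    return (a * e + b * h, a * f + b * k, c * e + d * h, c * f + d * k)


def trace_multiplicities(alphabet, max_trace):
    """Map each trace t <= max_trace to the number of distinct matrices with that trace."""
    gens = generators(alphabet)
    seen = set()  # distinct matrices found so far (guards against double counting)
    frontier = [g for g in gens if g[0] + g[3] <= max_trace]
    while frontier:
        m = frontier.pop()
        if m in seen:
            continue
        seen.add(m)
        for g in gens:
            child = multiply(m, g)
            if child[0] + child[3] <= max_trace:  # traces only grow, so prune here
                frontier.append(child)
    return Counter(m[0] + m[3] for m in seen)


if __name__ == "__main__":
    print(sorted(trace_multiplicities([1, 2], 10).items()))
```

For the alphabet $\{1, 2\}$ and traces up to 10, this finds nine matrices, with traces 3, 4, 6, 7 and 10; the integers 5, 8 and 9 are not represented in this range.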
The following conjecture seems plausible.
Conjecture 1.13 (Local-Global Conjecture). If the dimension $\delta_{\mathcal{A}}$ exceeds 1/2, then the set $T_{\mathcal{A}}$ of traces contains every sufficiently large admissible integer. Moreover, the multiplicity $M_{\mathcal{A}}(t)$ of an admissible $t \in [N, 2N)$ is at least $M_{\mathcal{A}}(t) > N^{2\delta_{\mathcal{A}} - 1 - o(1)}$. (1.14)
Remark 1.15. It is now clear how to generalize Conjecture 1.5; the same should hold for $a_j$ restricted to any alphabet $\mathcal{A}$, as long as $\delta_{\mathcal{A}} > 1/2$.
Our interest in this conjecture stems from the following where M 2 = tr(M t M ). That is, the log-norm and wordlength metrics are commensurable. Choose a large parameter N , and find a solution t ∆ N to the Pell equation t 2 − ∆s 2 = 4. Every t is admissible for this alphabet A, so (1.14) implies the existence of at least M A (t) > N c > c matrices M ∈ Γ A with trace t, as desired.
Remark 1.18. It is clear from the proof that to establish the Arithmetic Chaos Conjecture, it would be enough to demonstrate Conjecture 1.13 with the exponent 2δ A −1 in (1.14) replaced by any constant c > 0. The reason behind predicting this particular exponent is clarified below.
In light of the lemma, we switch our focus henceforth to the local-global problem. In the next subsection, we present some evidence for the conjecture, before stating our partial results.

Hensley's Theorem.
A reformulation of a theorem of Hensley's [Hen89] shows that the traces up to $N$, counted with multiplicity, satisfy $\sum_{t < N} M_{\mathcal{A}}(t) \asymp N^{2\delta_{\mathcal{A}}}$. (1.20) If one expects the admissible traces of size $t \asymp N$ to be roughly uniformly distributed, then each should appear with multiplicity about $N^{2\delta_{\mathcal{A}} - 1}$. This is some evidence for the exponent in the multiplicity bound (1.14). When $\delta_{\mathcal{A}} > 1/2$, this exponent is positive, so the multiplicity should eventually be positive; that is, sufficiently large admissible $t$'s should be represented, as claimed in Conjecture 1.13.
The estimate (1.20) also shows that the number 1/2 in Conjecture 1.13 cannot be reduced; if δ A < 1/2, then the set of traces is certainly a thin subset of the integers.

The Circle Method.
A circle method approach as in [BK11] would predict that for t N , where S(t) ≥ 0 is a certain "singular series" which vanishes for nonadmissible t, and otherwise can fluctuate by factors of log log N . Some numerical evidence for this behavior is presented in Figure 3. Here we have taken the alphabet A = {1, 2, . . . , 10} with dimension δ A ≈ 0.9257 [Jen04], and plotted t versus the ratio of the multiplicity M A (t) to the expected size t 2δ A −1 . All numbers are admissible for this alphabet, and the value t = 49 is the largest known to not be represented.

A Lower Bound.
For another piece of evidence, we have the following trivial Lemma 1.21. Counting without multiplicity, we have Proof. We first bound the multiplicity of t < N by In particular, choosing A so that δ A is sufficiently near 1, one can produce N 1−ε traces in T A , for any fixed ε > 0. Of course this is not even a positive proportion of numbers, so is still very far from the Local-Global Conjecture 1.13. The exponent 2δ A − 1 in (1.22) can be improved by methods similar in spirit to those going into [BK11, Theorem 1.22].

Zaremba's Conjecture.
Perhaps our most convincing evidence is the similarity of Conjecture 1.13 to Zaremba's Conjecture on absolutely Diophantine rational numbers, see [BK11]. Here one considers the same semi-groups $\Gamma_{\mathcal{A}}$ for a fixed finite alphabet $\mathcal{A}$, but instead of studying the traces, one studies the top-left entries $a$ for $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma_{\mathcal{A}}$. To make this more precise, let $A := \{a \in \mathbb{Z} : \exists \begin{pmatrix} a & * \\ * & * \end{pmatrix} \in \Gamma_{\mathcal{A}}\}$ be the set of top-left entries in question and let $B$ denote the set of "admissible" numbers in this context. Building on Hensley's Conjecture [Hen96, Conjecture 3], the authors proposed in [BK11, Conjecture 1.7] the conjecture that, if $\delta_{\mathcal{A}} > 1/2$, then every sufficiently large member of $B$ also belongs to $A$. Moreover, they showed the following density-one version of this conjecture, together with a multiplicity bound, under a more stringent restriction on the dimension $\delta_{\mathcal{A}}$.
as $N \to \infty$. Moreover, the multiplicity of an admissible $a \in A$ of size $a \asymp N$ produced in the above is at least of the order $N^{2\delta_{\mathcal{A}} - 1 - \varepsilon}$.
Remark 1.26. To make the connection to Conjecture 1.13 more clear, Theorem 1.24 is a translation of the former in which (1) the set $T_{\mathcal{A}}$ of traces is replaced by the set $A$ of top-left entries, (2) a new version of "admissibility" is encoded in the set $B$, (3) "contains every sufficiently large" is replaced by "contains almost every," and (4) the condition $\delta_{\mathcal{A}} > 1/2$ is replaced by the more stringent condition on $\delta_{\mathcal{A}}$ in Theorem 1.24.
Even a density-one version of Conjecture 1.13 (as conjectured by McMullen [McM12]) would not, as far as we know, have the Arithmetic Chaos Conjecture 1.5 as a consequence, since solutions to Pellian equations are exponentially rare. Nevertheless, proving such a result may be an important first step, and it is not even known for a single finite alphabet $\mathcal{A}$ that $T_{\mathcal{A}}$ contains a positive proportion of numbers.
Results of this strength seem out of reach of current technology. Therefore we shift our focus to the study of the arithmetical properties of the trace set, more specifically to its equidistribution along progressions, with applications to almost primes. Our main goal is to make some progress in this direction.

Statements of the Main Theorems.
In this subsection, we state our main theorems, though we defer the precise (and somewhat technical) definitions to the next section.
For several applications, an important barometer of our understanding of a sequence is its level of distribution, defined roughly as follows. In our context, we wish to know that the traces in T A up to some growing parameter N are equi-distributed along multiples of integers q, with q as large as possible relative to N . That is, the quantity #{t ∈ T A : t < N, t ≡ 0(q)}, counted with multiplicity, should be "close" to in the sense that their difference should be much smaller than the total number of t ∈ T A up to N . This proximity cannot be expected once q is as large as N , say, but perhaps can be established with q of size N 1/2 or more generally N α for some α > 0. If this is the case, in an average sense, then N α is called a level of distribution for T A , and α is called an exponent of distribution. Let us make matters a bit more precise. Looking at traces up to N counted with multiplicity is essentially the same as looking at matrices in the semigroup Γ A of norm at most N . Writing for the "remainder" terms, we will say, again roughly, that In applications it is enough to consider only square-free q in the sum. If (1.27) can be established with Q as large as N α , then we will say T A has exponent of distribution α. See §2.1 for a precise definition of level and exponent of distribution.
Remark 1.28. Note that a level and exponent of distribution is not a quantity intrinsic to T A , but rather a function of what one can prove about T A . The larger this exponent, the more control one has on the distribution of T A on such arithmetic progressions.
Remark 1.29. The set T A is of Affine Sieve type; see [BK13] for a definition. As such, the general Affine Sieve procedure introduced in [BGS06, BGS10], combined with the "expansion property" established in [BGS11], shows that T A has some exponent of distribution α > 0, see §2.2.
Our main goal in this paper is to make some partial progress towards Conjecture 1.13 by establishing levels of distribution for T A beyond those available from expansion alone.
Theorem 1.30. For any small $\eta > 0$, there is an effectively computable $\delta_0 = \delta_0(\eta) < 1$ so that, if the dimension $\delta_{\mathcal{A}}$ of the alphabet $\mathcal{A}$ exceeds $\delta_0$, then the set $T_{\mathcal{A}}$ has exponent of distribution $\alpha = \frac{1}{4} - \eta$. (1.31)
We can make further progress, assuming the following
Conjecture 1.32 (Additive Energy in $\mathrm{SL}_2(\mathbb{Z})$). For any $\varepsilon > 0$, $\#\{(\gamma_1, \gamma_2, \gamma_3, \gamma_4) \in \mathrm{SL}_2(\mathbb{Z})^4 : \|\gamma_j\| < N,\ \gamma_1 + \gamma_2 = \gamma_3 + \gamma_4\} \ll_\varepsilon N^{4+\varepsilon}$. (1.33)
If true, this conjecture is sharp, in the sense that the exponent 4 in (1.33) cannot be replaced by a smaller number (since the diagonal $\gamma_1 = \gamma_3$, $\gamma_2 = \gamma_4$ already contributes at least $N^4$ terms to the count). Conditioned on this conjecture, we have the following
Theorem 1.34. Assume Conjecture 1.32. Then Theorem 1.30 holds with (1.31) replaced by $\alpha = \frac{1}{3} - \eta$. (1.35)
Applying standard sieve theory [Gre86], these levels of distribution have the following immediate corollary on almost primes. Recall that a number is R-almost-prime if it has at most R prime factors.
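The additive energy in Conjecture 1.32 can be computed by brute force for very small $N$. The sketch below is illustrative only and rests on assumptions not confirmed by the text above: it takes the energy to be the number of quadruples with $\gamma_1 + \gamma_2 = \gamma_3 + \gamma_4$, it uses the Frobenius norm $\|\gamma\|^2 = \operatorname{tr}(\gamma\,{}^t\gamma)$ with a non-strict inequality, and the function names are hypothetical.

```python
from collections import Counter


def sl2_ball(n):
    """All integer (a, b, c, d) with ad - bc = 1 and a^2 + b^2 + c^2 + d^2 <= n^2."""
    rng = range(-n, n + 1)
    return [
        (a, b, c, d)
        for a in rng for b in rng for c in rng for d in rng
        if a * d - b * c == 1 and a * a + b * b + c * c + d * d <= n * n
    ]


def additive_energy(elements):
    """Number of quadruples (g1, g2, g3, g4) from `elements` with g1 + g2 = g3 + g4."""
    # r(s) = number of ordered pairs summing to s; the energy is the sum of r(s)^2.
    pair_sums = Counter(
        tuple(x + y for x, y in zip(g1, g2))
        for g1 in elements for g2 in elements
    )
    return sum(r * r for r in pair_sums.values())


if __name__ == "__main__":
    for n in (2, 3, 4):
        ball = sl2_ball(n)
        print(n, len(ball), additive_energy(ball))
```

The diagonal quadruples alone contribute the square of the ball's cardinality, which is the source of the lower bound of order $N^4$ mentioned above.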
Corollary 1.36. There exists an effectively computable δ 0 < 1 so that, if the dimension δ A of the alphabet A exceeds δ 0 , then the set T A of traces contains an infinitude of R-almost-primes, with R = 5. Assuming Conjecture 1.32, the same holds with R = 4.
As an afterthought, we explore what can be said about R-almost-primes, not in the set $T_{\mathcal{A}}$ of traces, but in the set of discriminants which arise. To this end, recalling (1.1), we define $D_{\mathcal{A}} := \{\operatorname{sqf}(t^2 - 4) : t \in T_{\mathcal{A}}\}$, where sqf(·) denotes the square-free part. As explained in §7, an easy consequence of Mercat's thesis [Mer12], combined with Theorem 1.24 and Iwaniec's theorem [Iwa78], gives the following
Theorem 1.38. For the alphabet $\mathcal{A} = \{1, \dots, 50\}$, the set $D_{\mathcal{A}}$ contains an infinitude of R-almost-primes with R = 2.
In §2, we give precise definitions of level and exponent of distribution, thus making unambiguous the statements of Theorems 1.30 and 1.34. There we also discuss the main ingredients involved in the proofs. In §3, we give some preliminaries needed in the analysis, the proofs of which are reserved for the two appendices. We spend §4 constructing the sequence A and executing the main term analysis. The error analysis is handled separately: Theorem 1.30 is proved in §5 and Theorem 1.34 is proved in §6. Finally, Theorem 1.38 is proved quickly in §7. Some technical calculations are reserved for the appendices.

Notation.
We use the following notation throughout. Set $e(x) = e^{2\pi i x}$ and $e_q(x) = e(x/q)$. We use the symbol $f \sim g$ to mean $f/g \to 1$. The symbols $f \ll g$ and $f = O(g)$ are used interchangeably to mean the existence of an implied constant $C > 0$ so that $f(x) \le C g(x)$ holds for all $x > C$; moreover $f \asymp g$ means $f \ll g \ll f$. The letters $c$, $C$ denote positive constants, not necessarily the same in each occurrence. Unless otherwise specified, implied constants may depend at most on $\mathcal{A}$, which is treated as fixed. The letter $\varepsilon > 0$ is an arbitrarily small constant, not necessarily the same at each occurrence. When it appears in an inequality, the implied constant may also depend on $\varepsilon$ without further specification. The symbol $\mathbf{1}_{\{\cdot\}}$ is the indicator function of the event $\{\cdot\}$. The trace of a matrix $\gamma$ is denoted $\operatorname{tr} \gamma$. The greatest common divisor of $n$ and $m$ is written $(n, m)$ and their least common multiple is $[n, m]$. The function $\nu(n)$ denotes the number of distinct prime factors of $n$. The cardinality of a finite set $S$ is denoted $|S|$ or $\#S$. The transpose of a matrix $g$ is written ${}^t g$. When there can be no confusion, we use the shorthand $r(q)$ for $r \pmod q$. The prime symbol in $\sum'_{r(q)}$ means the range of $r \pmod q$ is restricted to $(r, q) = 1$. The set of primitive vectors in $\mathbb{Z}^4$ (ones with coprime coordinates) is denoted $\mathcal{P}(\mathbb{Z}^4)$.
It is our pleasure to thank Curt McMullen for many detailed comments and suggestions on an earlier version of this paper, and Tim Browning, Zeev Rudnick, and Peter Sarnak for illuminating conversations. Thanks also to Michael Rubinstein for numerics in support of Conjecture 1.32. The second-named author would like to thank the IAS for its hospitality, where much of this work was carried out.

Levels of Distribution.
In this subsection, we give precise definitions of level and exponent of distribution. Fix the alphabet $\mathcal{A}$ and let $T_{\mathcal{A}}$ be the set of traces of $\Gamma_{\mathcal{A}}$. First we assume that the set of traces is primitive, that is, $\gcd(T_{\mathcal{A}}) = 1$. (2.1) If not, then replace $T_{\mathcal{A}}$ by $T_{\mathcal{A}} / \gcd(T_{\mathcal{A}})$. Given a large parameter $N$, let $A = \{a_N(n)\}$ be a sequence of non-negative numbers supported on $T_{\mathcal{A}} \cap [1, N]$, and set $|A| = \sum_n a_N(n)$.
We require that $A$ is well-distributed on average over multiples of square-free integers $q$. More precisely, setting $|A_q| := \sum_{n \equiv 0 (q)} a_N(n)$, we insist that $|A_q| = \beta(q)\,|A| + r(q)$, (2.2) where (1) the "local density" $\beta$ is a multiplicative function assumed to satisfy the "linear sieve" condition $\prod_{w \le p < z} (1 - \beta(p))^{-1} \le C \cdot \frac{\log z}{\log w}$ (2.3) for some $C > 1$ and any $2 \le w < z$; and (2) the "remainders" $r(q)$ are small on average, in the sense that $\sum_{q < Q} |r(q)| \ll_K \frac{1}{(\log N)^K} |A|$, (2.4) for some $Q \ge 1$ and any $K \ge 1$. That is, we require an arbitrary power of log savings. If a sequence $A$ exists for which the conditions (2.2)-(2.4) hold, then we say that $T_{\mathcal{A}}$ has a level of distribution $Q$. If (2.4) can be established with $Q$ as large as a power $N^\alpha$ of $N$, then we say that $T_{\mathcal{A}}$ has an exponent of distribution $\alpha$.

The Main Ingredients.
This subsection is purely heuristic and expository. First we recall how the "standard" Affine Sieve procedure applies in this context, explaining Remark 1.29. Since δ A is assumed to be large, we must have {1, 2} ⊂ A, whence for all q ≥ 1, the reduction Γ A (mod q) is all of SL 2 (q); cf. Remark 1.12. Initially, we could construct the sequence A by setting which is clearly supported on n ∈ T A , n N . Then (1.20) gives and |A q | can be expressed as where we have decomposed the γ sum into residue classes mod q. A theorem of Bourgain-Gamburd-Sarnak [BGS11] in this context states very roughly (see Theorem 3.2 for a precise statement) that #{γ ∈ Γ A : γ < N, γ ≡ γ 0 (q)} (2.9) for some Θ > 0. (We reiterate that the error in (2.9) is heuristic only; a statement of this strength is not currently known. That said, the true statement serves the same purpose in our application.) This is the "spectral gap" or "expander" property of Γ A , and follows from a resonance-free region for the resolvent of a certain "congruence" transfer operator, see §3.1. Inserting the expander property (2.9) into |A q | in (2.8) gives the desired decomposition (2.2), with local density Then the local density condition (2.3) follows classically from primitivity (2.1), and, in light of (2.7), the average error condition (2.4) requires (a condition weaker than) In this way, one can prove some exponent of distribution α > 0, cf. Remark 1.29, but without making numeric the error term in (2.9), there is not more one can currently say. While the constants C and Θ are in principle effectively computable, if one were to estimate them numerically, the known methods would lead to an astronomically small exponent α.
The novel technique employed here, used in some form already in [BK10, BK11, BK12, BK13], is to take inspiration from Vinogradov's method, developing a "bilinear forms" approach, as follows. Instead of (2.6), let X and Y be two more parameters, each a power of N , with XY = N , and set (roughly) (2.10) This sum better encapsulates the group structure of Γ A , while still only being supported on the traces T A of Γ A . Again, this is still an oversimplification; see §4 for the actual construction of A. Instead of directly appealing to expansion as in (2.8), we first invoke finite abelian harmonic analysis, writing (2.11) After some manipulations, we decompose our treatment according to whether q is "small" or "large". For q small, we apply expansion as before. For q large, the corresponding exponential sum already has sufficient cancellation (on average over q up to the level Q) that it can be treated as an error term in its entirety. It is in this range of large q that we exploit the bilinear structure of (2.10). On several occasions we replace the deficient group Γ A by all of SL 2 (Z); this perturbation argument only works when δ A is near 1, at least some δ 0 . Assuming Conjecture 1.32, we are able to more efficiently estimate the resulting exponential sums, giving the improvement from Theorem 1.30 to Theorem 1.34.
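In its simplest form, the finite abelian harmonic analysis invoked here detects the congruence $n \equiv 0 \pmod q$ by additive characters. The following is a minimal numerical check, assuming the standard orthogonality relation $\mathbf{1}_{\{q \mid n\}} = \frac{1}{q} \sum_{r (q)} e_q(rn)$ is the identity in play; the function names are illustrative.

```python
import cmath


def e_q(x, q):
    """Additive character e_q(x) = exp(2*pi*i*x/q)."""
    return cmath.exp(2j * cmath.pi * x / q)


def divisibility_indicator(n, q):
    """(1/q) * sum over r mod q of e_q(r*n); close to 1 if q divides n, else close to 0."""
    return sum(e_q(r * n, q) for r in range(q)) / q


if __name__ == "__main__":
    for n in (12, 13):
        value = divisibility_indicator(n, 6)
        print(n, round(value.real, 6), round(abs(value.imag), 6))
```

Applied to $n = \operatorname{tr}(\gamma)$, this turns the count of traces divisible by $q$ into a sum of exponential sums $\sum_\gamma e_q(r \operatorname{tr} \gamma)$, which is the shape of the sums appearing in the later sections.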

Expansion.
Let $\mathcal{A} \subset \mathbb{N}$ be a finite alphabet with dimension $\delta_{\mathcal{A}}$ sufficiently near 1. As such, it must contain the sub-alphabet $\mathcal{A}_0 := \{1, 2\} \subset \mathcal{A}$. This has the consequence that for all $q \ge 1$, $\Gamma \pmod q \cong \mathrm{SL}_2(q)$, (3.1) cf. Remark 1.12. Furthermore, we will only require expansion for the fixed alphabet $\mathcal{A}_0$, so as to make the expansion constants absolute, and not dependent on $\mathcal{A}$; see Remark 5.24. To this end, let $\Gamma_0 \subset \mathrm{SL}_2(\mathbb{Z})$ be the semigroup as in (1.11) corresponding to $\mathcal{A}_0$. It is easy to see that $\Gamma_0$ is free, that every non-identity matrix $\gamma \in \Gamma_0$ is hyperbolic, and that $\operatorname{tr} \gamma \asymp \|\gamma\|$.
The following theorem is a consequence of the general expansion theorem proved by Bourgain-Gamburd-Sarnak in [BGS11]. so that, for any square-free q ≡ 0(B) and any ω ∈ SL 2 (q), as Y → ∞, we have Remark 3.7. This theorem is proved in [BGS11, see Theorem 1.5] under the assumption that Γ is a convex-cocompact subgroup of SL 2 (Z), but the proof is the same when the group is replaced by our free semigroup Γ 0 ; we emphasize again that Γ 0 has no parabolic elements. The error term (3.6) is the consequence of a Tauberian argument applied to a resonance-free region [BGS11, see Theorem 9.1] of the form for the resolvent of a certain "congruence" transfer operator, see [BGS11,§12] for details. For small q, we only obtain a "Prime Number Theorem"quality error (given here in crude form), while for larger q, (3.8) is as good as a resonance-free strip.
We have stated the result only for the case B | q. The distribution modulo B cannot be obtained directly from present methods, even though all reductions of Γ are surjective; cf. (3.1). Nevertheless, one can construct a set which has the desired equidistribution for all q, as claimed in the following Proposition 3.9. Given any Y 1, there is a non-empty subset ℵ = ℵ(Y ) ⊂ Γ 0 so that (1) for all a ∈ ℵ, a < Y , and (2) for all square-free q and a 0 ∈ SL 2 (q), (3.10) Here E is given in (3.6).
Note that we do not have particularly good control on the cardinality of ℵ; regardless, the estimate (3.10) is only nontrivial if q < Y Θ/C . The construction of the set ℵ proceeds in a similar way to [BK11, §8]; we sketch a proof of Proposition 3.9 for the reader's convenience in Appendix A.

An Exponential Sum over SL 2 (Z).
In this subsection, we state an estimate, showing roughly that there is cancellation in a certain exponential sum over $\mathrm{SL}_2(\mathbb{Z})$ in a ball. We identify $\mathbb{Z}^4$ with $M_{2\times 2}(\mathbb{Z})$, and observe that for $A, B \in M_{2\times 2}(\mathbb{Z})$, $\operatorname{tr}(A\,{}^tB) = A \cdot B$, (3.11) where the operation on the right is the dot product in $\mathbb{Z}^4$. Recall that $\mathcal{P}(\mathbb{Z}^4)$ denotes the set of primitive vectors in $\mathbb{Z}^4$, that is, those for which the coordinates are coprime.
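The identification of the trace with a dot product can be verified directly. The following is a small sketch, assuming the identity intended is $\operatorname{tr}(A\,{}^tB) = A \cdot B$ with matrices flattened row by row into $\mathbb{Z}^4$; the helper names are illustrative.

```python
def mat_mul(a, b):
    """Product of 2x2 matrices stored as row-major 4-tuples."""
    return (
        a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
        a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
    )


def transpose(a):
    """Transpose of a 2x2 matrix stored as a row-major 4-tuple."""
    return (a[0], a[2], a[1], a[3])


def trace(a):
    return a[0] + a[3]


def dot(a, b):
    """Dot product of the two matrices viewed as vectors in Z^4."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    A = (2, 1, 1, 1)
    B = (3, 2, 1, 1)
    print(trace(mat_mul(A, transpose(B))), dot(A, B))
```

In particular, for a fixed matrix $B$ the map $A \mapsto \operatorname{tr}(AB) = A \cdot {}^tB$ is linear in $A$, which is the property used when a square is opened after Cauchy-Schwarz.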
Proposition 3.12. Let X 1 be a growing parameter and for a fixed non-negative, smooth, even function ϕ : R → R ≥0 of compact support, which is assumed to be at least 1 on [−10, 10], let ϕ X : M 2×2 (R) → R ≥0 be given by (Then ϕ X (γ) ≥ 1 if γ < X.) For any q ≥ 1, any a primitive vector s ∈ P(Z 4 ), and any ε > 0, The proof is an application of Kloosterman's version of the circle method. Since it is of a more classical nature, we give a sketch in Appendix B.

Construction of A.
The first goal in this subsection is to construct the appropriate sequence A = {a N (n)}. Let A ⊂ N be our fixed alphabet with corresponding dimension δ A near 1, and let Γ A be the semigroup in (1.11). Since A is fixed, we drop the subscripts, writing Γ = Γ A and δ = δ A .
Let N be the main growing parameter, and let be some parameters to be chosen later; in particular, We think of X as large, X > N 1/2 , and Y as tiny. The final choices of the parameters depend on the treatment, that is, whether we are proving Theorem 1.30 or the conditional Theorem 1.34. which is supported on n N by (4.2). We have that |A| = |Ξ| · |ℵ| · |Ω| |ℵ|(XZ) 2δ . (4.6) Next for parameters 1 Q 0 < Q and any square-free q < Q, we decompose (4.7) say, according to whether or not q < Q 0 . Here will be treated as a "main" term, the remainder r(q) being an error.

Main Term Analysis.
We now analyze the M q term, proving the following Proposition 4.9. Let β be the multiplicative function given at primes by where χ 4 is the Dirichlet character mod 4. There is a decomposition Proof. Inserting the definition (4.5) of a N into (4.8) gives Apply (3.10) to the innermost sum, giving e q (r tr(γ)), and The error E is as given in (3.6). We estimate thus proving (4.12).
Returning to M (1) q , we add back in the large divisors q | q, writing e q (r tr(γ)).
Lastly, we deal with r (2) . It is easy to see from the above that |ρ(p)| 1/p, so |ρ(q)| q ε /q, giving the bound The estimate (4.13) follows immediately, completing the proof.
Remark 4.14. Since Y in (4.1) is a small power of N , the first error term in (4.12) saves an arbitrary power of log N , as required in (2.4).
For the rest of the paper, all other error terms will be power savings.
In particular, setting the error in (4.13) is already a power savings, while the second term in (4.12) requires that It is here that we crucially use the expander property for Γ, but the final level of distribution will be independent of Θ.

Initial Manipulations.
Returning to (4.7), it remains to control the average error term e q (r tr(ξaω)) . (4.17) We first massage E into a more convenient form. Let ζ(q) := |r(q)|/r(q) be the complex unit corresponding to the absolute value in (4.17), and rearrange terms as: where we have set Leaving the special set ℵ alone, we break the q sum into dyadic pieces and estimate ζ 1 (q) log Q. We obtain It remains to estimate E 1 (Q; a). In the next two sections, we give two different treatments, depending on whether or not we allow ourselves to use the Additive Energy Conjecture 1.32.

Proof of Theorem 1.30
In this section, we analyze E 1 (Q; a) in (4.19) unconditionally, that is, without use of the Additive Energy Conjecture 1.32. Our first main result is the following Theorem 5.1. For any ε > 0, and any 1 we have To begin the proof, we apply Cauchy-Schwarz in the "long" variable ξ in (4.19), giving Here we have extended the ξ sum to all of SL 2 (Z), and inserted the weighting function ϕ X from Proposition 3.12. Since the trace of a product is a dot-product (on identifying Z 4 with M 2×2 (Z) as in (3.11)), it is linear, and hence when we open the square, we obtain |E 1 (Q; a)| 2 |Ξ| q,q Q ω,ω r(q) r (q ) ξ∈SL 2 (Z) ϕ X (ξ) e ξ · r q aω − r q aω .
(5.4) Write the bracketed expression in lowest terms as with s = s(q, q , r, r , ω, ω , a) ∈ P(Z 4 ) a primitive vector and q 0 ≥ 1 depending on the same parameters as s. To study this expression in greater detail, we introduce some more notation. All variables labelled q, however decorated, denote square-free numbers. Write q := (q, q ), q = q 1 q, q = q 1 q, q := [q, q ] = q 1 q 1 q, and observe from (5.5) that q 1 q 1 | q 0 and q 0 | q. Hence we can furthermore write q 0 := (q 0 , q), q = q 0 q 0 = q 1 q 1 q 0 q 0 , whence q 0 = q 1 q 1 q 0 . Note also that Q q Q 2 .
Since (q 1 r , q 0 ) = 1 = (q 1 r, q 0 ), we obtain where u 2 ≡ 1( q 0 ). There are at most 2 ν( q 0 ) N ε such u(mod q 0 ), where ν(m) is the number of distinct prime factors of m. It follows that ω ≡ uω mod q 0 . (5.7) Returning to (5.4), we will only get sufficient cancellation in the ξ sum when q 0 is not too small. So we introduce another parameter and break the sum according to whether or not q 0 ≤ Q 1 , writing Here for ♦ ∈ {≤, >}, we have written E ♦ := Q q Q 2 q 1 q 1 q 0 q 0 = q, q:=q 1 q 0 q 0 Q, q :=q 1 q 0 q 0 Q, q 0 :=q 1 q 1 q 0 ♦ Q 1 ϕ X (ξ) e q 0 (ξ · s) . (5.10) We give a separate analysis for the values of ♦ in the following two subsections.

Analysis of E ≤ .
In this subsection, we prove the following Proposition 5.11. With notation as above, Proof. Since q 0 is small, we may not have any cancellation in the ξ sum and instead save by turning the modular restriction (5.7) into an archimedean one, as follows.
Observe that we have Then choosing with implied constants small enough forces ω = uω from (5.7) and (4.3) that ω , ω < Z.

1.
Working from the inside out, the innermost ξ sum contributes X 2 .
There are at most q / q 0 values for r , and at most q values for r; note that The u sum contributes N ε , as does the sum on divisors of q. Putting everything together gives our final estimate The claim easily follows.

Estimate of E > .
Now we give the following estimate.
Proposition 5.14. Keeping notation as above, we have Proof. Returning to (5.10), we are now in the large q 0 range, so (5.7) cannot be effectively used; on the other hand, we are in position to exploit cancellation in the ξ sum using Proposition 3.12. Applying the estimate (3.13) and using (5.6) gives Now we argue as follows. We estimate the ω, ω sum trivially. The r, r sums again contribute at most qq / q 0 = qq q 0 / q, and the u sum at most N ε . Then we have |E > | N ε q Q 2 q 1 q 1 q 0 q 0 = q, q=q 1 q, q =q 1 q, q,q Q q 0 =q 1 q 1 q 0 >Q 1 qq q |Ω| 2 X 2 q 1/2 0 from which the claim follows.
We are now in position to prove (5.3).
5.3. Proof of Theorem 5.1. Combining (5.9) with (5.12) and (5.15) gives We choose $Q_1$ to balance the first two errors, setting as in (2.2), with $\beta$ given by (4.10). It is classical that (2.3) holds, so it remains to verify (2.4) with $Q$ as large as possible. Write $x + y + z = 1$. (5.16)
The bounds (4.12) and (4.13) are sufficient as long as To bound the average of r(q), assume (5.2) and insert (5.3) into (4.18), giving
Remark 5.23. Of course (5.21) implies (5.22), so the latter condition may be dropped. Taking $y$, $\alpha_0$, and $z$ very small and $x$ and $\delta$ very near 1, it is clear that (5.21) will not allow us to do better than $\alpha < 1/4$; this is what we achieve below.
Let $\eta > 0$ be given, and set as claimed in (1.31). We may already assume that (more stringent restrictions on $\delta$ follow), and set Then whence (5.21) is satisfied. Next we replace $C$ by $C + 1$, say, so that we can make the choice $\alpha_0 = y\Theta/C$ and satisfy (5.17). Making the choice $z = \frac{3y\Theta}{7C} = \frac{3\alpha_0}{7} < \frac{3\alpha_0}{3 + 4\delta}$ will satisfy (5.19), whence (5.16) requires In other words, we set .
Remark 5.24. It is here that we need ℵ to come from the fixed alphabet {1, 2} ⊂ A. Indeed, the constants Θ and C, and hence δ 0 , are then absolute, and do not depend on A.

Proof of Theorem 1.34
Returning to (4.19), we devote this section to proving an even stronger (but conditional) bound for E 1 (Q; a), by now allowing ourselves to use the Additive Energy Conjecture 1.32. Our main result is the following Theorem 6.1. Assume Conjecture 1.32. For any ε > 0, and any we have Proof. This time, start the proof by applying Cauchy-Schwarz in q, r, and the "short" variable ω to (4.19). This opens the "long" variable ξ into a pair of such, as follows.
Here we have extended both the $\omega$ and $\xi, \xi'$ sums to all of $\mathrm{SL}_2(\mathbb{Z})$ (after inserting absolute values). Collect the difference of $\xi$ and $\xi'$ into a single variable, writing The Additive Energy Conjecture 1.32 is then the assertion that So writing $\sum_{\|\omega\| < Z} e_q(r\,\mathrm{tr}(M a \omega))$, we apply Cauchy-Schwarz in the $M$ variable, giving where we have inserted a suitable bump function $\Psi$ and applied Poisson summation. Assuming $qq' \ll Q^2 = o(X)$ as in (6.2), the innermost condition implies $q' r \omega \equiv q r' \omega' \pmod{qq'}$.
But this implies $(q')^2 \equiv 0 \pmod q$, since $(r, q) = 1$. Because $q$ is squarefree, we have thus forced $q' \equiv 0 \pmod q$. By symmetry, we also have $q \equiv 0 \pmod{q'}$, and hence $q = q'$, $r' \equiv ur \pmod q$, and $\omega' \equiv u\omega \pmod q$, where $u^2 \equiv 1 \pmod q$; again there are at most $2^{\nu(q)} \ll N^{\varepsilon}$ such $u$'s. We then have We dispose of $u$ in the last summation via Cauchy-Schwarz: since $(u, q) = 1$. Applying this estimate gives where we have again used the notation We first isolate the $M = 0$ term, writing Next apply Cauchy-Schwarz yet again, now to $E_2$ in the $q$ and $M$ variables. Assuming (6.2) that $Q < Z$ gives where we used the Additive Energy Conjecture 1.32 a second time. Combining (6.6) with (6.5) gives (6.3), as claimed.
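The count of involutions $u$ modulo a squarefree $q$ used above can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names are hypothetical, and the check is brute force over small moduli only.

```python
def distinct_prime_factors(q):
    """Distinct primes dividing q, found by trial division."""
    primes, p = [], 2
    while p * p <= q:
        if q % p == 0:
            primes.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        primes.append(q)
    return primes


def is_squarefree(q):
    """True when no prime square divides q."""
    return all(q % (p * p) != 0 for p in distinct_prime_factors(q))


def square_roots_of_unity(q):
    """All residues u mod q with u^2 = 1 (mod q), by brute force."""
    return [u for u in range(q) if (u * u - 1) % q == 0]


def bound_holds(q):
    """Check #{u mod q : u^2 = 1 (mod q)} <= 2^nu(q) for a single modulus q."""
    nu = len(distinct_prime_factors(q))
    return len(square_roots_of_unity(q)) <= 2 ** nu
```

The squarefree hypothesis matters for this form of the bound: for $q = 8$ every odd residue squares to 1, so there are four solutions while $2^{\nu(8)} = 2$.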
6.1. Proof of Theorem 1.34. The proof is now nearly identical to that in §5.4, so we give a brief sketch. Again let A be constructed as in (4.5). Assuming (6.2) and Conjecture 1.32, insert (6.3) into (4.18). Together with (4.4), this gives With $\delta$ very near 1, the second term is a power savings as long as $Q_0$ is some tiny power, which requires $Y = N^{y}$ to be some tiny power, $y \approx \varepsilon$.
Writing $Q = N^{\alpha}$, $X = N^{x}$ and $Z = N^{z}$ with $1 = x + y + z \approx x + z$, the first term is a power savings if $z = \alpha + \varepsilon$, while (6.2), which requires $Q^2 = o(X)$ and $Q < Z$, is satisfied if in addition $x = 2\alpha + \varepsilon$. Since $x + z \approx 1$, this gives a maximal value of $\alpha \approx \frac{1}{3} - \varepsilon$. We leave the details to the reader.
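The exponent bookkeeping behind the value $1/3$ can be sketched as follows. This is only an illustration under an assumed reading of (6.2), namely $Q^2 = o(X)$ and $Q < Z$ as these conditions are used in the proof of Theorem 6.1, so that $x > 2\alpha$ and $z > \alpha$ with $x + y + z = 1$. The function names are hypothetical, and the power-saving requirement on the first term is not modelled.

```python
from fractions import Fraction


def sup_alpha(y):
    """Supremum of alpha subject to x > 2*alpha, z > alpha, x + y + z = 1."""
    return (1 - Fraction(y)) / 3


def witness(alpha, y):
    """Return exponents (x, z) meeting the constraints, or None if none exist."""
    alpha, y = Fraction(alpha), Fraction(y)
    slack = 1 - y - 3 * alpha  # room left after taking x = 2*alpha, z = alpha
    if slack <= 0:
        return None
    # Split the slack evenly so both strict inequalities hold.
    return 2 * alpha + slack / 2, alpha + slack / 2
```

As $y \to 0$ the supremum tends to $1/3$ but is never attained, matching the strict inequality $\alpha < 1/3$.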

Proof of Theorem 1.38
Recall from (1.37) that $D_{\mathcal{A}}$ is the set of discriminants which arise from the alphabet $\mathcal{A}$. Set $\mathcal{A} = \{1, \dots, A\}$ with $A = 50$. In his thesis, Mercat connects the Arithmetic Chaos Conjecture with Zaremba's, by proving the following.
Theorem 7.1 ([Mer12]). If the reduced rational $m/n$ has all partial quotients bounded by $A$, and if the denominator $n$ arises as a solution to the Pellian equation $n^2 - \Delta r^2 = \pm 1$, then Q[
In fact, he exhibits a periodic continued fraction in $\mathbb{Q}[\sqrt{\Delta}]$ via an explicit construction involving the partial quotients of $m/n$.
With his theorem, we can now sketch a proof of Theorem 1.38. Iwaniec's theorem [Iwa78] states that the number of $n$ up to $N$ with $\Delta = n^2 + 1$ having at most 2 prime factors is at least $CN/\log N$. Taking the alphabet $\mathcal{A} = \{1, \dots, 50\}$ in Theorem 1.24, the error term in the estimate (1.25) is much smaller than $N/\log N$, and hence 100% of such denominators $n$ have a coprime numerator $m$ with $m/n$ having all partial quotients bounded by $A = 50$. Clearly setting $r = 1$ gives a solution to $n^2 - \Delta r^2 = -1$, whence $\Delta \in D_{\mathcal{A}}$ by Mercat's theorem.
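The two elementary ingredients of this sketch, the Pell identity for $\Delta = n^2 + 1$ and the search for a numerator $m$ with bounded partial quotients, can be illustrated on a small denominator. The code below is a minimal sketch with hypothetical function names; it uses brute force and says nothing about the density statements of Theorem 1.24 or of [Iwa78].

```python
from math import gcd


def partial_quotients(m, n):
    """Continued fraction [a0; a1, ..., ak] of m/n via the Euclidean algorithm."""
    quotients = []
    while n:
        a, r = divmod(m, n)
        quotients.append(a)
        m, n = n, r
    return quotients


def zaremba_numerator(n, A):
    """Smallest 1 <= m < n coprime to n with all partial quotients <= A, else None."""
    for m in range(1, n):
        # Skip a0 = 0; only the quotients a1, ..., ak are constrained.
        if gcd(m, n) == 1 and max(partial_quotients(m, n)[1:]) <= A:
            return m
    return None


def pell_value(n, delta, r):
    """Left-hand side n^2 - delta * r^2 of the Pellian equation."""
    return n * n - delta * r * r
```

For instance, with $n = 13$ and the small alphabet bound $A = 2$ the search returns $m = 5$, since $5/13 = [0; 2, 1, 1, 2]$, and $\Delta = 13^2 + 1 = 170$ satisfies the equation with $r = 1$ and sign $-1$.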
Appendix A. Construction of $\aleph$
In this section, we sketch a proof of Proposition 3.9, constructing the special set $\aleph \subset \Gamma_0$ with good modular distribution properties, and all its elements having archimedean size at most a given parameter $Y$. Recall that $\Gamma_0$ is the semigroup corresponding to the fixed alphabet $\mathcal{A}_0 = \{1, 2\}$. The constants $B$, $c$, $C$, and $\Theta$ in Theorem 3.2 depend only on $\mathcal{A}_0$, and thus are absolute. As we no longer need the original alphabet $\mathcal{A}$, we drop the subscript 0 from $\Gamma_0$, writing just $\Gamma$; similarly write $\delta$ for $\delta_{\mathcal{A}_0}$.
Let $T$ be a parameter to be chosen later relative to $Y$. Let Observe that the elements in $S(T) \cdot s_T^{R-1}$ are all congruent to the identity mod $B$. Write $\mathrm{SL}_2(B) = \{\gamma_1, \dots, \gamma_R\}$, and find $x_1, \dots, x_R \in \Gamma$ with $x_j \equiv \gamma_j \pmod B$.
Such $x_j$ can be found of size $\|x_j\| \ll 1$.
For each $j = 1, \dots, R$, let This is a subset of $\Gamma$, in which each element $s \in S_j(T)$ has size at most $\|s\| \ll T^{R}$.
Choose $T \asymp Y^{1/R}$ so that all elements $s \in \bigcup_j S_j(T)$ satisfy $\|s\| < Y$.
Applying Theorem 3.2 gives that for each $j = 1, \dots, R$, any $q < Q_0$ with $q \equiv 0 \pmod B$, and any $\omega \in \mathrm{SL}_2(q)$ with $\omega \equiv x_j \pmod B$, we have Then the sets $S_j(T)$ each have good modular distribution properties for distinct residues mod $B$. Note that they also all have the same cardinality, namely that of $S(T)$. Moreover after renaming constants, $E(T; q) \ll E(Y; q)$. Hence setting gives the desired special set. The equidistribution (3.10) is now clear for any $q \equiv 0 \pmod B$, while the same for other $q$ is obtained by summing over suitable arithmetic progressions. This completes the proof.
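From stationary phase, we estimate $|J_X(\beta; z)| \ll \min(X, |\beta|^{-1/2})$; moreover only $\ll X^{\varepsilon}$ values of $k$ contribute to (B.3). A standard argument (see, e.g. [Bou93]) then gives the bound
$$\int_{V_{r,K}} \prod_{j=1}^{4} G_X\!\left(\delta_j \theta; \frac{s_j}{q}\right) e(-4\theta)\, d\theta \ll X^{\varepsilon}\, \frac{X^2}{K}\, \frac{1}{r^{3/2}}, \qquad \text{(B.4)}$$
which is insufficient for our purposes, since it has no decay in the $q$ aspect. Therefore we must extract more cancellation from the fact that the $\lambda$-variables in (B.4) are actually rational numbers, $\lambda = s_j/q$. Assume for simplicity that $(s_1, q) = 1$. Returning to (B.2) with $\lambda = s/q$, $(s, q) = 1$, apply van der Corput's shifting trick, averaging (in smooth form) over an auxiliary parameter $|\ell| < L$, with $L$ given by
$$L = L_{r,K} = \frac{X^{1-\varepsilon}}{Kr}. \qquad \text{(B.5)}$$
Thus shifting $x \mapsto x + r\ell$ and writing $\theta = \frac{a}{r} + \beta$ gives the shifted summand
$$\varphi\!\left(\frac{x + r\ell}{X}\right) e\!\left(\theta x^2 + \frac{s}{q} x\right) e\big(\beta r\ell(2x + r\ell)\big)\, e\!\left(\frac{s}{q} r\ell\right),$$
to be averaged in $\ell$ against $\psi(\ell/L)$, where $\psi$ is an appropriate bump function. Then $G_X$ may be replaced by
$$\sum_{x \in \mathbb{Z}} \varphi_1\!\left(\frac{x}{X}\right) e\!\left(\theta x^2 + \frac{s}{q} x\right) \left[\frac{1}{L} \sum_{\ell \in \mathbb{Z}} \psi\!\left(\frac{\ell}{L}\right) e\!\left(\frac{s}{q} r\ell\right)\right],$$
where we have replaced $e(\beta r\ell(2x + r\ell))$ by 1 using (B.5), and $\varphi_1$ has the same properties as $\varphi$. Applying Poisson summation allows us essentially to replace the bracketed term by $\mathbf{1}_{\{\|sr/q\| < 1/L\}}$, where now $\|\cdot\|$ is the distance to the nearest integer. Having inserted this factor into the first copy of $G_X$, we proceed as before, replacing (B.4) with
$$\int_{V_{r,K}} \cdots \ll X^{\varepsilon}\, \frac{X^2}{K}\, \frac{1}{r^{3/2}}\, \mathbf{1}_{\{\|sr/q\| < 1/L\}}, \qquad \text{(B.6)}$$
where $s = s_1$ is coprime to $q$.
Group the contributions according to whether or not $L_{r,K} > q$. In the former range, $\|sr/q\| = 0$, and hence $q \mid r$ since $(s, q) = 1$; in particular, $r \ge q$. The contribution in this range to (B.6) is
$$X^{\varepsilon} \sum_{\substack{q < r < X \\ r \equiv 0\,(q)}} \ \sum_{\substack{K < X/r \\ \text{dyadic}}} \frac{X^2}{K}\, \frac{1}{r^{3/2}} \ll X^{\varepsilon}\, \frac{X^2}{q^{3/2}},$$
giving the first term on the right hand side of (3.13). When $L_{r,K} \le q$, break $r$ into dyadic regions, $r \asymp R < X$, and in each, write $sr = uq + v$, $-q/2 \le v < q/2$, so that $\|sr/q\| = |v|/q < 1/L$. Then by (B.5), there are at most $q/L \ll qX^{\varepsilon}KR/X$ choices for $v$, and at most $R/q + 1$ values for $u$, giving a contribution summed over dyadic $R, K$ with $RK < X$, $r \asymp R$, and $L_{r,K} \le q$. These constitute the last two terms in (3.13), thus completing the proof.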
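The decomposition $sr = uq + v$ with a centered residue $v$, and the resulting count of $r$ in a dyadic block with $\|sr/q\| < 1/L$, can be sketched numerically. This is an illustration only: the function names are hypothetical, the dyadic block is taken to be $[R, 2R)$ by assumption, and the bound tested is the elementary one (number of admissible $v$ times number of $r$ per residue class), with explicit constants rather than the $\ll$ of the text.

```python
from fractions import Fraction


def centered_residue(s, r, q):
    """Write s*r = u*q + v with -q/2 <= v < q/2 and return (u, v)."""
    half = q // 2
    v = (s * r + half) % q - half
    u = (s * r - v) // q
    return u, v


def dist_to_nearest_integer(s, r, q):
    """||s*r/q|| = |v|/q, returned as an exact fraction."""
    _, v = centered_residue(s, r, q)
    return Fraction(abs(v), q)


def count_close(s, q, R, L):
    """Number of r in [R, 2R) with ||s*r/q|| < 1/L, by brute force."""
    return sum(1 for r in range(R, 2 * R)
               if dist_to_nearest_integer(s, r, q) < Fraction(1, L))
```

When $(s, q) = 1$, each admissible $v$ pins down $r$ modulo $q$, so the count is at most $(2q/L + 1)(R/q + 1)$; this is the shape of the estimate used in the range $L_{r,K} \le q$.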