The Erdős discrepancy problem

We show that for any sequence $f: {\bf N} \to \{-1,+1\}$ taking values in $\{-1,+1\}$, the discrepancy $$ \sup_{n,d \in {\bf N}} \left|\sum_{j=1}^n f(jd)\right| $$ of $f$ is infinite. This answers a question of Erd\H{o}s. In fact the argument also applies to sequences $f$ taking values in the unit sphere of a real or complex Hilbert space. The argument uses three ingredients. The first is a Fourier-analytic reduction, obtained as part of the Polymath5 project on this problem, which reduces the problem to the case when $f$ is replaced by a (stochastic) completely multiplicative function ${\bf g}$. The second is a logarithmically averaged version of the Elliott conjecture, established recently by the author, which effectively reduces to the case when ${\bf g}$ usually pretends to be a modulated Dirichlet character. The final ingredient is (an extension of) a further argument obtained by the Polymath5 project which shows unbounded discrepancy in this case.


Introduction
Given a sequence $f: {\bf N} \to H$ taking values in a real or complex Hilbert space $H$, define the discrepancy of $f$ to be the quantity $$ \sup_{n,d \in {\bf N}} \left\|\sum_{j=1}^n f(jd)\right\|_H. $$ In other words, the discrepancy is the largest magnitude of a sum of $f$ along homogeneous arithmetic progressions $\{d, 2d, \dots, nd\}$ in the natural numbers ${\bf N} = \{1, 2, 3, \dots\}$.
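The definition can be explored numerically on finite truncations. The following Python sketch computes the largest magnitude of a sum along homogeneous arithmetic progressions contained in $\{1, \dots, N\}$ for a scalar sequence; the names `discrepancy` and `alternating` are illustrative choices and do not come from the paper.

```python
def discrepancy(f, N):
    """Largest |f(d) + f(2d) + ... + f(nd)| over all n, d with n*d <= N.

    `f` is any real-valued function on {1, ..., N} (for instance with values +1 or -1).
    """
    worst = 0
    for d in range(1, N + 1):
        partial = 0
        # Walk along the progression d, 2d, 3d, ... and track every partial sum.
        for multiple in range(d, N + 1, d):
            partial += f(multiple)
            worst = max(worst, abs(partial))
    return worst


def alternating(n):
    # Hypothetical test sequence: +1 at odd n, -1 at even n.
    return 1 if n % 2 == 1 else -1
```

The alternating sequence has bounded partial sums along $d = 1$, but along $d = 2$ every term equals $-1$, so its truncated discrepancy grows linearly; this illustrates why all common differences $d$ must be controlled at once.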
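The main objective of this paper is to establish the following result:

Theorem 1.1 (Vector-valued Erdős discrepancy problem). Let $H$ be a real or complex Hilbert space, and let $f: {\bf N} \to H$ be a function such that $\|f(n)\|_H = 1$ for all $n$. Then the discrepancy of $f$ is infinite.

for every $i = 0, \dots, k$. Summing in $i$, we conclude that $\sum_{j=1}^n \tilde\chi_3(j) = k + 1 \gg \log n$.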
More generally, for natural numbers $n$, $\sum_{j=1}^n \tilde\chi_3(j)$ is equal to the number of 1s in the base 3 expansion of $n$. Thus $\tilde\chi_3$ has infinite discrepancy, but the divergence is only logarithmic in the $n$ parameter; indeed from the above calculations and the complete multiplicativity of $\tilde\chi_3$ we see that $\sup_{n \leq N; d \in {\bf N}} |\sum_{j=1}^n \tilde\chi_3(jd)|$ is comparable to $\log N$ for $N > 1$. This can be compared with random sequences $f: {\bf N} \to \{-1,+1\}$, whose discrepancy would be expected to diverge like $N^{1/2+o(1)}$. See the paper of Borwein, Choi, and Coons [2] for further analysis of functions such as $\tilde\chi_3$, which seems to have been first discussed in [20]; see also [4] for some further discussion of the sign patterns in $\tilde\chi_3$. One can also reduce the discrepancy of this example slightly (by a factor of about two) by changing the value of the completely multiplicative function $\tilde\chi_3$ at 3 from $+1$ to $-1$.
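The base 3 description of the partial sums is easy to test by machine. The sketch below assumes that $\tilde\chi_3$ is the completely multiplicative function agreeing with the non-principal character modulo 3 at primes other than 3 and taking the value $+1$ at 3 (consistent with the last sentence above); the function names are illustrative.

```python
def chi3_tilde(n):
    """Completely multiplicative modification of the character mod 3.

    Assumed definition: equal to the non-principal character mod 3 on integers
    coprime to 3, and equal to +1 at the prime 3.
    """
    while n % 3 == 0:
        n //= 3  # the value at 3 is +1, so factors of 3 do not change the value
    return 1 if n % 3 == 1 else -1


def ones_in_base3(n):
    """Number of digits equal to 1 in the base 3 expansion of n."""
    count = 0
    while n > 0:
        if n % 3 == 1:
            count += 1
        n //= 3
    return count


def partial_sums(N):
    """Return [S(1), ..., S(N)] where S(n) = chi3_tilde(1) + ... + chi3_tilde(n)."""
    sums, total = [], 0
    for j in range(1, N + 1):
        total += chi3_tilde(j)
        sums.append(total)
    return sums
```

For instance, $n = 13 = 1 + 3 + 9$ has base 3 expansion 111, and `partial_sums(13)[-1]` is 3, matching the value $k + 1$ with $k = 2$.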
If one lets $\chi$ be a primitive Dirichlet character whose period is a prime $p = 1\ (4)$, and defines $\tilde\chi$ similarly to $\tilde\chi_3$ above, then the partial sums $\sum_{j=1}^n \tilde\chi(j)$ remain unbounded, but the Cesàro sum $\sum_{j=1}^n (1 -$

Thus the discrepancy of this function is infinite and diverges like $\sqrt{\log N}$.

Example 1.6 (Random Borwein-Choi-Coons example). Let $g: {\bf N} \to \{-1,+1\}$ be the stochastic (i.e. random) multiplicative function defined by setting $g(n) := \chi_3(n)$ when $n$ is coprime to 3, and $g(3^j) := \varepsilon_j$ for $j = 1, 2, 3, \dots$, where $\chi_3$ is as in Example 1.4 and $\varepsilon_1, \varepsilon_2, \dots \in \{-1,+1\}$ are independent identically distributed signs, attaining $-1$ and $+1$ with equal probability. Arguing similarly to Example 1.5, we have where to reach the second line we use the additivity of variance for independent random variables. Thus $g$ in some sense has discrepancy growth like $\sqrt{\log N}$ "on the average". (Note that one can interpret this example as a special case of Example 1.5, by setting $H$ to be the Hilbert space of real-valued square-integrable random variables.) However, by carefully choosing the base 3 expansion of $n$ depending on the signs $\varepsilon_1, \dots, \varepsilon_k$ (similarly to Example 1.4) one can show that and so the actual discrepancy grows like $\log N$. So this random example actually has essentially the same discrepancy growth as Example 1.4. We do not know if scalar sequences of significantly slower discrepancy growth than this can be constructed.
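The "on the average" behaviour of the random example can be computed exactly for small $n$ by enumerating all sign patterns. The sketch below reads the example as $g(3^j m) = \varepsilon_j \chi_3(m)$ for $m$ coprime to 3, with $\varepsilon_0 := 1$; that reading, and all function names, are assumptions made for illustration. Under this reading the partial sum equals $\sum_i \varepsilon_i 1_{d_i = 1}$, where $d_i$ is the $i$-th base 3 digit of $n$, so the mean square should equal the number of 1s in the base 3 expansion of $n$.

```python
from itertools import product


def chi3(m):
    """Non-principal Dirichlet character of period 3."""
    r = m % 3
    return 0 if r == 0 else (1 if r == 1 else -1)


def ones_in_base3(n):
    """Number of digits equal to 1 in the base 3 expansion of n."""
    count = 0
    while n > 0:
        count += (n % 3 == 1)
        n //= 3
    return count


def g_value(n, signs):
    """Assumed reading: g(3^j * m) = signs[j] * chi3(m) for m coprime to 3."""
    j = 0
    while n % 3 == 0:
        n //= 3
        j += 1
    return signs[j] * chi3(n)


def mean_square_partial_sum(n):
    """Exact E|g(1) + ... + g(n)|^2, averaging over all equally likely sign patterns."""
    k = 0
    while 3 ** (k + 1) <= n:
        k += 1  # 3^k is the largest power of 3 not exceeding n
    patterns = list(product([-1, 1], repeat=k))
    total = 0
    for eps in patterns:
        signs = (1,) + eps  # signs[0] = 1 because g(m) = chi3(m) for m coprime to 3
        s = sum(g_value(j, signs) for j in range(1, n + 1))
        total += s * s
    return total / len(patterns)
```

The enumeration is exponential in $k$ and is only meant for small $n$; it confirms the variance computation rather than the worst-case growth, which depends on choosing $n$ after seeing the signs.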
Example 1.7 (Numerical examples). In [12] a sequence f (n) supported on n ≤ N := 1160, with values in {−1, +1} in that range, was constructed with discrepancy 2 (and a SAT solver was used to show that 1160 was the largest possible value of N with this property). A similar sequence with N = 130 000 of discrepancy 3 was also constructed in that paper, as well as a sequence with N = 127 645 of discrepancy 3 that was the restriction to {1, . . . , N} of a completely multiplicative sequence taking values in {−1, +1} (with the latter value of 127 645 being the best possible value of N; see also [1] for a separate computation confirming this threshold). This slow growth in discrepancy may be compared with the √ log N type divergence in Example 1.5.
The above examples suggest that completely multiplicative functions are an important test case for Theorem 1.1 and Corollary 1.2; the importance of this case was already isolated in [9]. More recently, the Polymath5 project [19] obtained a number of equivalent formulations of the Erdős discrepancy problem and its variants, including the logical equivalence of Theorem 1.1 with the following assertion involving such functions. We define a stochastic element of a measurable space $X$ to be a random variable $g$ taking values in $X$, or equivalently a measurable map $g: \Omega \to X$ from an ambient probability space $(\Omega, \mu)$ (known as the sample space) to $X$.

Theorem 1.8 (Equivalent form of vector-valued Erdős discrepancy problem). Let $g: {\bf N} \to S^1$ be a stochastic completely multiplicative function taking values in the unit circle $S^1 := \{z \in {\bf C}: |z| = 1\}$ (where we give the space $(S^1)^{\bf N}$ of functions from ${\bf N}$ to $S^1$ the product $\sigma$-algebra). Then $$ \sup_n {\bf E} \left|\sum_{j=1}^n g(j)\right|^2 = +\infty. $$

By converting all the probabilistic language to measure-theoretic language, Theorem 1.8 has the following equivalent form:

Theorem 1.9 (Measure-theoretic formulation). Let $(\Omega, \mu)$ be a probability space, and let $g: \Omega \to (S^1)^{\bf N}$ be a measurable function to the space $(S^1)^{\bf N}$ of functions from ${\bf N}$ to $S^1$, such that $g(\omega) \in (S^1)^{\bf N}$ is completely multiplicative for $\mu$-almost every $\omega \in \Omega$ (that is to say, $g(\omega)(nm) = g(\omega)(n) g(\omega)(m)$ for all $n, m \in {\bf N}$ and $\mu$-almost all $\omega \in \Omega$). Then one has $$ \sup_n \int_\Omega \left|\sum_{j=1}^n g(\omega)(j)\right|^2 d\mu(\omega) = +\infty. $$

The equivalence between Theorem 1.1 and Theorem 1.8 (or Theorem 1.9) was obtained in [19] using a Fourier-analytic argument; for the convenience of the reader, we reproduce this argument in Section 2. The close similarity between Example 1.5 and Example 1.6 can be interpreted as a special case of this equivalence.
It thus remains to establish Theorem 1.8. To do this, we use a recent result of the author [21] regarding correlations of multiplicative functions:

Theorem 1.10 (Logarithmically averaged nonasymptotic Elliott conjecture). [21, Theorem 1.3] Let $a_1, a_2$ be natural numbers, and let $b_1, b_2$ be integers such that $a_1 b_2 - a_2 b_1 \neq 0$. Let $\varepsilon > 0$, and suppose that $A$ is sufficiently large depending on $\varepsilon, a_1, a_2, b_1, b_2$. Let $x \geq w \geq A$, and let $g_1, g_2: {\bf N} \to {\bf C}$ be multiplicative functions with $|g_1(n)|, |g_2(n)| \leq 1$ for all $n$, with $g_1$ "non-pretentious" in the sense that for all Dirichlet characters $\chi$ of period at most $A$, and all real numbers $t$ with $|t| \leq Ax$. Then

This theorem is a variant of the Elliott conjecture [8] (as corrected in [16]), which in turn is a generalisation of a well known conjecture of Chowla [5]. See [21] for further discussion of this result, the proof of which relies on a number of tools, including the recent results in [14], [16] on mean values of multiplicative functions in short intervals. It can be viewed as a sort of "inverse theorem" for pair correlations of multiplicative functions, asserting that such correlations can only be large when both of the multiplicative functions "pretend" to be like modulated Dirichlet characters $n \mapsto \chi(n) n^{it}$.
Using this result and a standard van der Corput argument, one can show that the only potential counterexamples to Theorem 1.8 come from (stochastic) completely multiplicative functions that usually "pretend" to be like modulated Dirichlet characters (cf. Examples 1.3, 1.4, 1.6). More precisely, we have

Proposition 1.11 (van der Corput argument). Suppose that $g: {\bf N} \to S^1$ is a stochastic completely multiplicative function, such that $$ {\bf E}\left|\sum_{j=1}^n g(j)\right|^2 \leq C^2 \qquad (1.3) $$ for some finite $C > 0$ and all natural numbers $n$ (thus, $g$ would be a counterexample to Theorem 1.8).
Let ε > 0, and suppose that X is sufficiently large depending on ε,C. Then with probability 1 − O(ε), one can find a (stochastic) Dirichlet character χ of period q = O C,ε (1) and a (stochastic) real number (See Section 1.1 below for our asymptotic notation conventions.) We give the (easy) derivation of Proposition 1.11 from Theorem 1.10 in Section 3. One can of course reformulate Proposition 1.11 in measure-theoretic language if desired, much as Theorem 1.8 may be reformulated as Theorem 1.9; we leave this to the interested reader. Of course, Theorem 1.8 implies that the hypotheses of Proposition 1.11 cannot hold, and so Proposition 1.11 is in fact vacuously true; nevertheless it is necessary to establish this proposition independently of Theorem 1.8 to avoid circularity.
It remains to demonstrate Theorem 1.8 for random completely multiplicative functions $g$ that obey (1.4) with high probability for large $X$ and small $\varepsilon$. Such functions $g$ can be viewed as (somewhat complicated) generalisations of the Borwein-Choi-Coons example (Example 1.4), and it turns out that a more complicated version of the analysis in Example 1.4 (or Example 1.5) suffices to establish a lower bound for ${\bf E}|\sum_{j=1}^n g(j)|^2$ (of logarithmic type, similar to that in Example 1.5) which is enough to conclude Theorem 1.8 and hence Theorem 1.1 and Corollary 1.2. We give this argument in Section 4.
In principle, the arguments in [21] provide an effective value for A as a function of ε, a 1 , a 2 , b 1 , b 2 in Theorem 1.10, which would in turn give an explicit lower bound for the divergence of the discrepancy in Theorem 1.1 or Corollary 1.2. However, this bound is likely to be far too weak to match the √ log N type divergence observed in Example 1.5. Nevertheless, it seems reasonable to conjecture that the √ log N order of divergence is best possible for Theorem 1.1 (although it is unclear to the author whether such a slowly diverging example can also be attained for Corollary 1.2).
The arguments in this paper can also be used to partially classify the multiplicative (but not completely multiplicative) functions taking values in {−1, +1} that have bounded partial sums; see Section 5. Remark 1.12. In [19,10], Theorem 1.1 was also shown to be equivalent to the existence of sequences (c m,d ) m,d∈N , (b n ) n∈N of non-negative reals such that ∑ m,d c m,d = 1, ∑ n b n = ∞, and such that the real quadratic form is positive semi-definite. The arguments of this paper thus abstractly show that such sequences exist, but do not appear to give any explicit construction for such a sequence. Remark 1.13. In [10, Conjecture 3.12], the following stronger version of Theorem 1.1 was proposed: if C ≥ 0 and N is sufficiently large depending on C, then for any matrix (a i j ) 1≤i, j≤N of reals with diagonal entries equal to 1, there exist homogeneous arithmetic progressions P = {d, 2d, . . . , nd} and Setting a i j := f (i), f ( j) H , we see that this would indeed imply Theorem 1.1 and thus Corollary 1.2. We do not know how to resolve this conjecture, although it appears that a two-dimensional variant of the Fourier-analytic arguments in Section 2 below can handle the special case when a i j = ±1 for all i, j (which would still imply Corollary 1.2 as a special case). We leave this modification of the argument to the interested reader. If d is a natural number and D is any odd number larger than d, then we see that the block . . , f (D!) and thus sums to zero; in fact if we divide f (d), f (2d), . . . into consecutive blocks of length (D + 1)!/d then all such blocks sum to zero, and so sup n | ∑ n j=1 f ( jd)| is finite for all d.
There is a curious superficial similarity between the arguments in this paper and the Hardy-Littlewood circle method. In the latter, Fourier analytic arguments are used to reduce matters to estimates on "major arcs" and "minor arcs"; in this paper, Fourier analytic arguments are used to reduce matters to estimates for "pretentious multiplicative functions" and "non-pretentious multiplicative functions". We do not know if there is any deeper significance to this similarity.

Notation
We adopt the usual asymptotic notation of $X \ll Y$, $Y \gg X$, or $X = O(Y)$ to denote the assertion that $|X| \leq CY$ for some constant $C$. If we need $C$ to depend on an additional parameter we will denote this by subscripts, e.g. $X = O_\varepsilon(Y)$ denotes the bound $|X| \leq C_\varepsilon Y$ for some $C_\varepsilon$ depending on $\varepsilon$. For any real number $\alpha$, we write $e(\alpha) := e^{2\pi i \alpha}$.
All sums and products will be over the natural numbers N = {1, 2, . . . } unless otherwise specified, with the exception of sums and products over p which is always understood to be prime.
We use d|n to denote the assertion that d divides n, and n (d) to denote the residue class of n modulo d. We use (a, b) to denote the greatest common divisor of a and b.
We will frequently use probabilistic notation such as the expectation EX of a random variable X or a probability P(E) of an event E. We will use boldface symbols such as g to refer to random (i.e. stochastic) variables, to distinguish them from deterministic variables, which will be in non-boldface.

Fourier analytic reduction
In this section we establish the logical equivalence between Theorem 1.1 and Theorem 1.8 (or Theorem 1.9). The arguments here are taken from a website of the Polymath5 project [19].
The deduction of Theorem 1.9 from Theorem 1.1 is straightforward: if (Ω, µ) and g are as in Theorem 1.9, one takes H to be the complex Hilbert space L 2 (Ω, µ), and for each natural number n, we let f (n) ∈ H be the function is clearly a unit vector in H. For any homogeneous arithmetic progression {d, 2d, . . . , nd}, one has and on taking suprema in n and d we conclude that Theorem 1.9 follows from Theorem 1.1. (Note that this argument also explains the similarity between Example 1.6 and Example 1.5.) Since Theorem 1.9 is equivalent to Theorem 1.8, it remains to show that Theorem 1.8 implies Theorem 1.1. We take contrapositives, thus we assume that Theorem 1.1 fails, and seek to conclude that Theorem 1.8 also fails. By hypothesis, we can find a function f : N → H taking values in the unit sphere of a Hilbert space H and a finite quantity C such that for all homogeneous arithmetic progressions d, 2d, . . . , nd. By complexifying H if necessary, we may take H to be a complex Hilbert space. To obtain the required conclusion, it will suffice to construct a random completely multiplicative function g taking values in S 1 , such that We claim that it suffices to construct, for each X ≥ 1, a stochastic completely multiplicative function g X taking values in S 1 such that for all n ≤ X, where the implied constant is uniform in n and X, but we allow the underlying probability space defining the stochastic function g X to depend on X. This reduction is obtained by a standard compactness argument 4 , but we give the details for sake of completeness. Suppose that for each X, we have such a g X obeying (2.2) as above. Let M be the space of completely multiplicative functions g : N → S 1 from N to S 1 ; one can view this space as isomorphic to an infinite product of S 1 's, since completely multiplicative functions are determined by their values at the primes. 
In particular, $M$ is a compact metrisable space; it can be viewed as a compact subspace of the space $(S^1)^{\bf N}$ of arbitrary functions (not necessarily multiplicative) from ${\bf N}$ to $S^1$. Just as Theorem 1.8 is equivalent to Theorem 1.9, we can view each $g_X$ as a measurable map $f_X: \Omega_X \to M$, such that $$ \int_{\Omega_X} \left|\sum_{j=1}^n f_X(\omega)(j)\right|^2 d\mu_X(\omega) \ll_C 1 $$ for all $n \leq X$. We can then define a Radon probability measure $\nu_X$ on $M$ to be the probability distribution (or law) of the random variable $g_X$, or equivalently the pushforward of the measure $\mu_X$ via $f_X$. That is to say, $$ \int_M F(g)\, d\nu_X(g) = \int_{\Omega_X} F(f_X(\omega))\, d\mu_X(\omega) $$ for any continuous function $F: M \to {\bf C}$. The functions $g \mapsto |\sum_{j=1}^n g(j)|^2$ are continuous on $M$, and hence $$ \int_M \left|\sum_{j=1}^n g(j)\right|^2 d\nu_X(g) \ll_C 1 $$ for all $n \leq X$. By vague compactness of probability measures on compact metrisable spaces such as $M$ (Prokhorov's theorem), we can thus extract a subsequence $\nu_{X_j}$ of the $\nu_X$ with $X_j \to \infty$ such that the $\nu_{X_j}$ converge to a Radon probability measure $\nu$ on $M$, that is to say $$ \int_M F(g)\, d\nu_{X_j}(g) \to \int_M F(g)\, d\nu(g) $$ as $j \to \infty$ for all continuous functions $F: M \to {\bf C}$. Applying this in particular to the continuous functions $g \mapsto |\sum_{j=1}^n g(j)|^2$, we conclude that $$ \int_M \left|\sum_{j=1}^n g(j)\right|^2 d\nu(g) \ll_C 1 $$ for all $n$. We then define the random completely multiplicative function $g: {\bf N} \to S^1$ (or equivalently, a measurable map from a probability space to $M$) by choosing $(M, \nu)$ as the underlying probability space, and using the identity function $g \mapsto g$ as the measurable map.

4 If one wished to obtain a more quantitative version of Theorem 1.1, one would avoid this compactness argument and work instead with truncated versions of Theorem 1.8 (or Theorem 1.9) in which one restricts the $n$ parameter to be less than some large cutoff. This would then require similar truncations to be made in the arguments in later sections, which in particular requires some treatment of error terms created when truncating Euler products, but such errors can be made negligible by making the truncation parameter extremely large with respect to all other parameters. We leave the details of this reformulation of the argument to the interested reader.
We then have for all n, and the claim follows. It remains to construct the random multiplicative functions g X for each X. Let X ≥ 1, and let p 1 , . . . , p r be the primes up to X. Let M ≥ X be a natural number that we assume to be sufficiently large depending on C, X. Define a function F : (Z/MZ) r → H by the formula note that π is well defined for M ≥ X. Applying (2.1) for n ≤ X and d of the form p a 1 1 . . . p a r r with 1 ≤ a i ≤ M − X, we conclude that for all n ≤ X and all but O X (M r−1 ) of the M r elements x = (x 1 , . . . , x r ) of (Z/MZ) r . For the exceptional elements, we have the trivial bound n ∑ j=1 F(x + π( j)) H ≤ n ≤ X from the triangle inequality. Square-summing in x, we conclude (if M is sufficiently large depending on C, X) that By Fourier expansion, we can write where (x 1 , . . . , x r ) · (ξ 1 , . . . , ξ r ) := x 1 ξ 1 + · · · + x r ξ r , and the Fourier transformF : (Z/MZ) r → H is defined by the formulaF A routine Fourier-analytic calculation (using the Plancherel identity) then allows us to write the left-hand side of (2.3) as On the other hand, from a further application of the Plancherel identity we have for all n ≤ X. If we then define the stochastic completely multiplicative function g X by setting g X (p j ) := e(ξ j /M) for j = 1, . . . , r, and g X (p) := 1 for all other primes, we obtain for all n ≤ X, as desired.
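The last step of the construction uses the fact that a completely multiplicative function is determined by its values at the primes. The Python sketch below builds such a function from a frequency vector $(\xi_1, \dots, \xi_r)$ by setting its value at the $j$-th prime $p_j \leq X$ to $e(\xi_j/M)$ and its value at every larger prime to 1, as in the definition of $g_X$ above. The helper names (`make_g`, `primes_up_to`) are hypothetical, and the random choice of $\xi$ according to the distribution coming from the Fourier transform of $F$ is not modelled here.

```python
import cmath


def e(alpha):
    """e(alpha) = exp(2*pi*i*alpha), following the paper's notation."""
    return cmath.exp(2j * cmath.pi * alpha)


def primes_up_to(X):
    """All primes p <= X, by trial division (adequate for small X)."""
    return [p for p in range(2, X + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def make_g(xi, M, X):
    """Completely multiplicative g with g(p_j) = e(xi_j / M) for the primes
    p_1 < ... < p_r up to X, and g(p) = 1 for every prime p > X."""
    primes = primes_up_to(X)
    if len(xi) != len(primes):
        raise ValueError("need exactly one frequency per prime up to X")
    value_at_prime = {p: e(x / M) for p, x in zip(primes, xi)}

    def g(n):
        value = 1 + 0j
        p = 2
        while n > 1:
            while n % p == 0:
                # Multiply in the value at each prime factor, with multiplicity.
                value *= value_at_prime.get(p, 1)
                n //= p
            p += 1
        return value

    return g
```

For example, `make_g([1, 2, 3, 4], 5, 10)` assigns phases to the primes 2, 3, 5, 7 and returns a function with $g(12) = g(3) g(4)$ and $|g(n)| = 1$ for every $n$, up to floating-point error.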
Remark 2.1. It is instructive to see how the above argument breaks down when one tries to use the Dirichlet character example in Example 1.3. While $\chi$ often has magnitude 1 in the ordinary (Archimedean) sense, the function $(a_1, \dots, a_r) \mapsto \chi(p_1^{a_1} \dots p_r^{a_r})$ is almost always zero, since the argument $p_1^{a_1} \dots p_r^{a_r}$ of $\chi$ is likely to be a multiple of $q$. As such, the quantity $\|\hat F(\xi)\|_H^2$ sums to something much less than 1, and one does not generate a stochastic completely multiplicative function $g$ with bounded discrepancy.
Remark 2.2. The above arguments also show that Theorem 1.1 automatically implies an apparently stronger version 5 of itself, in which one assumes f (n) H ≥ 1 for all n, rather than f (n) H = 1. Indeed, if f has bounded discrepancy then it must be bounded (since f (n) is the difference of ∑ n j=1 f ( j) and ∑ n−1 j=1 f ( j)), and the above arguments then carry through; the sum ∑ ξ ∈(Z/MZ) r F (ξ ) 2 H is now greater than or equal to 1, but one can still define a suitable probability distribution from the F (ξ ) 2 H by normalising. Remark 2.3. If Theorem 1.8 failed, then we could find a constant C > 0 and a stochastic completely multiplicative function g : N → S 1 such that for all n. In particular, by the triangle inequality we have for all deterministic completely multiplicative functions g : N → S 1 , all N ≥ 1, and some function ω(N) of N that goes to infinity as N → ∞. This was in fact the preferred form of the Fourier-analytic reduction obtained by the Polymath5 project [19], [10]. It is conceivable that some refinement of the analysis in this paper in fact yields a bound of the form (2.4), though this seems to require removing the logarithmic averaging from Theorem 1.10, as well as avoiding the use of Lemma 4.1 below.

Applying the Elliott-type conjecture
In this section we prove Proposition 1.11. Let g,C, ε be as in that proposition. Let H ≥ 1 be a moderately large natural number depending on ε to be chosen later, and suppose that X is sufficiently large depending on H, ε. a similar argument (for X large enough) gives and hence by the triangle inequality Thus from Markov's inequality we see with which we rewrite as We can expand out the left-hand side of (3.1) as The diagonal term h 1 , h 2 contributes a term of size H log X to this expression. Thus, choosing H to be a sufficiently large quantity depending on C, ε, we can apply the triangle inequality and pigeonhole principle to find distinct (and stochastic) h 1 , h 2 ∈ [1, H] such that ∑ √ X≤n≤X g(n + h 1 )g(n + h 2 ) n C,ε,H log X.
Applying Theorem 1.10 in the contrapositive, we obtain the claim. (It is easy to check that the quantities χ,t produced by Theorem 1.10 can be selected to be measurable, for instance one can use continuity to restrict t to be rational and then take a minimal choice of (χ,t) with respect to some explicit well-ordering of the countable set of possible pairs (χ,t).) Remark 3.1. The same argument shows that the hypothesis |g(n)| = 1 may be relaxed to |g(n)| ≤ 1, and g need only be multiplicative rather than completely multiplicative, provided that one has a lower bound of the form ∑ √ X≤n≤X |g(n)| 2 n log X. Thus the Dirichlet character example in Example 1.3 is in some sense the "only" example of a bounded multiplicative function with bounded discrepancy that is large for many values of n, in that any other such example must "pretend" to be like a (modulated) Dirichlet character. (We thank Gil Kalai for suggesting this remark.)

A generalised Borwein-Choi-Coons analysis
We can now complete the proof of Theorem 1.8 (and thus Theorem 1.1 and Corollary 1.2). Our arguments here will be based on those from a website of the Polymath5 project [19], which treated the case in which the functions $g$ and $\chi$ appearing in Proposition 1.11 were real-valued (and the quantity $t$ was set to zero).
Suppose for contradiction that Theorem 1.8 failed; then we can find a constant $C > 0$ and a stochastic completely multiplicative function $g: {\bf N} \to S^1$ such that $$ {\bf E}\left|\sum_{j=1}^n g(j)\right|^2 \leq C^2 $$ for all natural numbers $n$. We now allow all implied constants to depend on $C$, thus $$ {\bf E}\left|\sum_{j=1}^n g(j)\right|^2 \ll 1 $$ for all $n$. The stochastic nature of $g$ is a mild technical nuisance for our arguments, but the reader may wish to assume that $g$ is a deterministic completely multiplicative function for a first reading, as this case already captures the key aspects of the argument.
We will need the following large and small parameters, selected in the following order:
• A quantity 0 < ε < 1/2 that is sufficiently small depending on C.
• A natural number H ≥ 1 that is sufficiently large depending on C, ε.
• A quantity 0 < δ < 1/2 that is sufficiently small depending on C, ε, H.
• A natural number k ≥ 1 that is sufficiently large depending on C, ε, H, δ.
• A real number X ≥ 1 that is sufficiently large depending on C, ε, H, δ, k.
We will implicitly assume these size relationships in the sequel to simplify the computations, for instance by absorbing a smaller error term into a larger if the latter dominates the former under the above assumptions. The reader may wish to keep the hierarchy in mind in the arguments that follow. One could reduce the number of parameters in the argument by setting δ := 1/k, but this does not lead to significant simplifications in the arguments below.
By Proposition 1.11, we see with probability 1 − O(ε) that there exists a Dirichlet character χ of period q = O ε (1) and a real number t = O ε (X) such that By reducing χ if necessary we may assume that χ is primitive. It will be convenient to cut down the size of t.
Proof. By Proposition 1.11 with $X$ replaced by $X^\delta$, we see that with probability $1 - O(\varepsilon)$, one can find a Dirichlet character $\chi'$ of period $q' = O_\varepsilon(1)$ and a real number We may restrict to the event that $|t' - t| \geq X^\delta$, since we are done otherwise. Applying the pretentious triangle inequality (see [11, Lemma 3.1]), we conclude that The character $\chi \overline{\chi'}$ has period $O_\varepsilon(1)$. Applying the Vinogradov-Korobov zero-free region for $L(\cdot, \chi \overline{\chi'})$ (see [17, §9.5]), we see that $L(\sigma + it, \chi \overline{\chi'}) \neq 0$ for $|t| \geq 10$ and $$ \sigma \geq 1 - \frac{c_\varepsilon}{(\log |t|)^{2/3} (\log\log |t|)^{1/3}} $$ for some $c_\varepsilon > 0$ depending only on $\varepsilon$; furthermore, an inspection of the Vinogradov-Korobov arguments (based on estimation of the logarithmic derivative of $L(\cdot, \chi \overline{\chi'})$ in the zero-free region) in fact yields the crude bound $|\log L(\sigma + it, \chi \overline{\chi'})| \ll \log^{O(1)} |t|$ in this region (using a suitable branch of the logarithm), possibly after shrinking $c_\varepsilon$ if necessary. Using the contour-shifting arguments in [15, Lemma 2] and the bounds $X^\delta \leq |t' - t| \ll_\varepsilon X$, it is then not difficult to show that if $X$ is sufficiently large depending on $\varepsilon, \delta$, a contradiction (note that the summands in (4.3) are nonnegative). The claim follows.
Let us now condition to the probability 1 − O(ε) event that χ, t exist obeying (4.1) and the bound (4.2); we can of course do this as ε is assumed to be small.
The bound (4.1) asserts that $g$ "pretends" to be like the completely multiplicative function $n \mapsto \chi(n) n^{it}$. We can formalise this by making the factorisation $$ g(n) := \tilde\chi(n) n^{it} h(n) \qquad (4.4) $$ where $\tilde\chi$ is the completely multiplicative function of magnitude 1 defined by setting $\tilde\chi(p) := \chi(p)$ for $p \nmid q$ and $\tilde\chi(p) := g(p) p^{-it}$ for $p | q$, and $h$ is the completely multiplicative function of magnitude 1 defined by setting $h(p) := g(p) \overline{\chi(p)} p^{-it}$ for $p \nmid q$, and $h(p) = 1$ for $p | q$. The function $\tilde\chi$ should be compared with the function $\tilde\chi_3$ in Example 1.4 and the function $g$ in Example 1.6.
With the above notation, the bound (4.1) simplifies to The model case to consider here is when t = 0 and h = 1, in which case g =χ. In this case, one could skip directly ahead to (4.8) below. Of course, in general t will be non-zero (albeit not too large) and h will not be identically 1 (but "pretends" to be 1 in the sense of (4.5)). We will now perform some manipulations to remove the n it and h factors from g and isolate medium-length sums (4.8) of theχ factor, which are more tractable to compute with than the corresponding sums of g; then we will perform more computations to arrive at an expression (4.12) just involving χ which we will be able to control fairly easily.
We turn to the details. The first step is to eliminate the role of $n^{it}$. From (1.3) and the triangle inequality we have for all $n$ (even after conditioning to the $1 - O(\varepsilon)$ event mentioned above). The $\frac{1}{H} \sum_{H < H' \leq 2H}$ averaging will not be used until much later in the argument, and the reader may wish to ignore it for the time being.
By (4.4), the above estimate can be written as For n ≥ X 2δ , we can use (4.2) and Taylor expansion to conclude that (n + m) it = n it + O ε,H,δ (X −δ ). n 1+1/ log X log X.
(The zeta function type weight of 1 n 1+1/ log X will be convenient later in the argument when one has to perform some multiplicative number theory, as the relevant sums can be computed quite directly and easily using Euler products.) Thus, with probability 1 − O(ε), one has from Markov's inequality that We condition to this event, which we may do as ε is assumed to be small. From this point onwards, our arguments will be purely deterministic in nature (in particular, one can ignore the boldface fonts in the arguments below if one wishes).
We have successfully eliminated the role of n it ; we now work to eliminate h. To do this we will have to partially decouple theχ and h factors in the above expression, which can be done 9 by exploiting the almost periodicity properties ofχ as follows. Call a residue class a (q k ) bad if a + m is divisible by p k for some p|q and 1 ≤ m ≤ 2H, and good otherwise. We restrict n to good residue classes, thus By Cauchy-Schwarz, we conclude that Now we claim that for n in a given good residue class a (q k ), the quantityχ(n + m) does not depend on n. Indeed, by hypothesis, (n + m, q k ) = (a + m, q k ) is not divisible by p k for any p|q and is thus a factor of 9 The argument here was loosely inspired by the Maier matrix method [13].
q k−1 , and n+m (n+m,q k ) = n+m (a+m,q k ) is coprime to q. We then factor where in the last line we use the periodicity of χ. Thus we haveχ(n + m) =χ(a + m), and so Shifting n by m, we see that and thus (for X large enough) (4.6) Now, we perform some multiplicative number theory to understand the innermost sum in (4.6), with the aim of showing that the summand here is approximately equidistributed modulo q k . From taking Euler products, we have ∑ n h(n) n 1+1/ log X = S where S is the Euler product From (4.5) and Mertens' theorem one can easily verify that log X ε |S| ε log X. (4.7) More generally, for any Dirichlet character χ 1 we have If χ 1 is a non-principal character of period dividing q k , then the L-function L(s, χ) := ∑ n χ 1 (n) n s is analytic near s = 1, and in particular we have We conclude that where we have used the Cauchy-Schwarz inequality, Mertens' theorem, and (4.5). For a principal character χ 0 of period r dividing q k we have ∑ n χ 0 h(n) thanks to (4.7) and the fact that h(p) = 1 for all p|r, and that all prime factors of r divide q and are thus of size O ε (1). By expansion into Dirichlet characters we conclude that for all r|q k and primitive residue classes b (r). For non-primitive residue classes b (r), we write r = (b, r)r and b = (b, r)b . The previous arguments then give which since h((b, r)) = 1 gives (again using (4.7)) for all b (r) (not necessarily primitive). Inserting this back into (4.6) we see that The contribution of the O q,k (exp(O ε ((log log X) 1/2 ))) error term here can be shown by (4.7) to be at most c ε log 2 X/q k in magnitude if X is large enough, for any c ε > 0 depending only on ε. Removing this error term and then applying (4.7) again to cancel off the S term, we conclude that We have now eliminated both t and h. The remaining task is to establish some lower bound on the discrepancy of medium-length sums ofχ that will contradict (4.8). 
As mentioned above, this will be a more complicated variant of the analysis in Examples 1.4, 1.5, 1.6 in which the perfect orthogonality in Example 1.5 is replaced by an almost orthogonality argument.
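The almost periodicity claim used above (for $n$ in a good residue class $a\ (q^k)$, the value $\tilde\chi(n + m)$ depends only on $a$ and $m$) can be checked numerically in the model case $q = 3$. The sketch below takes $\tilde\chi = \tilde\chi_3$, assumed to equal the non-principal character modulo 3 away from 3 and $+1$ at 3; the function names and the finite search range are illustrative assumptions.

```python
def chi3_tilde(n):
    """Assumed model for chi-tilde when q = 3: strip factors of 3 (value +1 at 3),
    then apply the non-principal character mod 3."""
    while n % 3 == 0:
        n //= 3
    return 1 if n % 3 == 1 else -1


def good_classes(k, H, q_primes=(3,), q=3):
    """Residues a mod q^k such that a + m is not divisible by p^k
    for any prime p dividing q and any 1 <= m <= 2H."""
    modulus = q ** k
    return [a for a in range(modulus)
            if all((a + m) % p ** k != 0
                   for p in q_primes for m in range(1, 2 * H + 1))]


def periodic_on_good_classes(k, H, periods=50):
    """Check chi3_tilde(n + m) == chi3_tilde(a + m) for n = a + t * 3^k,
    over all good a, all 1 <= m <= 2H and 0 <= t < periods."""
    modulus = 3 ** k
    for a in good_classes(k, H):
        for m in range(1, 2 * H + 1):
            reference = chi3_tilde(a + m)
            for t in range(periods):
                if chi3_tilde(a + t * modulus + m) != reference:
                    return False
    return True
```

With $k = 3$ and $H = 2$ only the four classes $a = 23, 24, 25, 26$ modulo 27 are bad, and on a bad class the periodicity genuinely fails: $\tilde\chi_3(27) = +1$ while $\tilde\chi_3(54) = -1$.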
We turn to the details. We first dispose of the easy case when $q = 1$. In that case $\tilde\chi$ is identically one, and the left-hand side simplifies to $\frac{1}{H} \sum_{H < H' \leq 2H} (H')^2$, which is comparable to $H^2$ and leads to a contradiction since $H$ is large. Thus we may restrict to the event that $q > 1$, so that the primitive character $\chi$ is non-principal.
Write d 1 := (a + m 1 , q k ) and d 2 := (a + m 2 , q k ), thus d 1 , d 2 |q k−1 and for i = 1, 2 we havẽ We  We reinstate the bad a. The number of such a is at most so their total contribution here is O H (2 −k q k ) which is negligible, thus we may drop the requirement in (4.9) that a is good. Note that as χ is already restricted to numbers coprime to q, and d 1 , d 2 divide q k−1 , we may replace the constraints (a + m i , q k ) = d i with d i |a + m i for i = 1, 2. Summarising these modifications, we have arrived at the estimate  Consider the contribution to the left-hand side of (4.10) of an off-diagonal term d 1 = d 2 for a fixed choice of m 1 , m 2 . To handle these terms we use the Fourier transform to expand the character χ(n) (which, as mentioned before Lemma 4.1, can be taken to be primitive) as a linear combination of e(ξ n/q) for ξ ∈ (Z/qZ) × . Thus, the function n → 1 d 1 |n χ n d 1 can be written as a linear combination of n → 1 d 1 |n e(ξ n/d 1 q) for ξ ∈ Z coprime to q, which by Fourier expansion of the 1 d 1 |n factor (and the fact that all the prime factors of d 1 also divide q) can in turn be written as a linear combination of n → e(ξ n/d 1 q) for (ξ , d 1 q) = 1. Translating, we see that the function a → 1 d 1 |a+m 1 χ a + m 1 d 1 can be written as a linear combination of a → e(ξ a/d 1 q) for (ξ , d 1 q) = 1. Similarly a → 1 d 2 |a+m 2 χ a + m 2 d 2 can be written as a linear combination of a → e(ξ a/d 2 q) for (ξ , d 2 q) = 1. If d 1 = d 2 , then the frequencies involved here are distinct; since q k is a multiple of both d 1 q and d 2 q, we conclude the perfect cancellation 11 Thus we only need to consider the diagonal contribution d 1 = d 2 to (4.10). For these diagonal terms we do not perform a Fourier expansion of the character χ. 
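The Fourier expansion of a primitive character used for the off-diagonal terms can be illustrated with the classical Gauss sum identity $\chi(n) \tau(\overline{\chi}) = \sum_{\xi\ (q)} \overline{\chi}(\xi) e(\xi n/q)$. The Python sketch below checks it for the quadratic (Legendre symbol) character modulo an odd prime $p$, which is real and primitive; restricting to this family, and the function names, are choices made here for illustration rather than part of the argument above.

```python
import cmath


def e(alpha):
    """e(alpha) = exp(2*pi*i*alpha)."""
    return cmath.exp(2j * cmath.pi * alpha)


def legendre(n, p):
    """Quadratic character mod an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)  # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def gauss_sum(p):
    """tau(chi) = sum over xi mod p of chi(xi) e(xi / p)."""
    return sum(legendre(xi, p) * e(xi / p) for xi in range(1, p))


def fourier_expansion(n, p):
    """Right-hand side of chi(n) = (1/tau) * sum_xi chi(xi) e(xi n / p).

    Only frequencies xi coprime to p contribute, since chi(xi) = 0 otherwise.
    """
    tau = gauss_sum(p)
    return sum(legendre(xi, p) * e(xi * n / p) for xi in range(1, p)) / tau
```

For $p = 3$ the function `legendre(n, 3)` is exactly the character $\chi_3$, and `fourier_expansion(n, 3)` reproduces it up to floating-point error, including the value 0 at multiples of 3.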
The $\tilde\chi(d_1) \overline{\tilde\chi(d_2)}$ terms helpfully cancel, and we

Informally, this theorem asserts that the only multiplicative functions $f: {\bf N} \to \{-1,+1\}$ with bounded

afterwards, the author obtained the averaged version of that conjecture in [21], which turned out to be sufficient to complete the argument. The author also thanks Timothy Gowers for helpful discussions and encouragement, as well as Cristóbal Camarero, Christian Elsholtz, Andrew Granville, Gergely Harcos, Gil Kalai, Joseph Najnudel, Royce Peng, Uwe Stroinski, and anonymous blog commenters for corrections and comments on the above-mentioned blog post and on other previous versions of this manuscript. Finally, we thank the anonymous referee for a thorough reading of the manuscript and for many comments and corrections.