Good Weights for the Erdős Discrepancy Problem

The Erdős discrepancy problem, now a theorem of T. Tao, asks whether every sequence with values plus or minus one has unbounded discrepancy along all homogeneous arithmetic progressions. We establish weighted variants of this problem, for weights given either by structured sequences that enjoy some irrationality features or by certain random sequences. As an intermediate result, we establish that weighted sums of bounded multiplicative functions and of products of shifts of such functions are unbounded. A key ingredient in our analysis of the structured weights is a structural result for measure preserving systems naturally associated with bounded multiplicative functions that was recently obtained in joint work with B. Host.

1 Introduction and main results

Introduction
The Erdős discrepancy problem is an elementary question that dates back to the 1930s and asks whether there is a sequence a : N → {−1, 1} that is evenly distributed along all homogeneous arithmetic progressions, in the sense that the sequence of partial sums (∑_{k=1}^{n} a(dk))_{n∈N} is bounded uniformly in d ∈ N. The problem remained dormant for a long time, and it was not until 2010 that interest was rejuvenated, when it became the subject of the Polymath5 project (see [7,13] for related details). The problem was finally solved in 2015 by T. Tao [15], who proved the following (henceforth, with S we denote the unit circle and with U the complex unit disc):

Theorem 1.1 (Tao [15]). For every sequence a : N → S we have

sup_{d,n∈N} |∑_{k=1}^{n} a(dk)| = +∞. (1)

We seek to obtain weighted variants of the previous result. To facilitate exposition, we introduce the following notion:

Definition 1.2. We say that a sequence w : N → U is a good weight for the Erdős discrepancy problem, or simply a good weight, if for every a : N → S we have

sup_{d,n∈N} |∑_{k=1}^{n} a(dk) w(k)| = +∞. (2)

Theorem 1.1 implies that w = 1 (and more generally w = f, where f : N → S is a completely multiplicative function) is a good weight for the Erdős discrepancy problem. On the other hand, sequences with bounded partial sums, like the sequence (e(kα))_{k∈N}, where α ∈ R \ Z and e(t) := e^{2πit}, are not good weights; more generally, the product of a completely multiplicative function f : N → S with a sequence that has bounded partial sums is not a good weight (take a = \overline{f}). It is less clear whether some other oscillatory sequences, like (e(k^l α))_{k∈N}, where l ≥ 2 and α is irrational, or random sequences of ±1's, are good weights. We will show in Corollary 1.5 and Theorem 1.7 that they are; that is, for every a : N → S we have

sup_{d,n∈N} |∑_{k=1}^{n} a(dk) e(k^l α)| = +∞,

and a similar statement holds if we use random sequences of ±1's as weights. Moreover, in Theorem 1.4 we give a rather general criterion that allows us to show that a large class of zero entropy sequences that enjoy certain irrationality features are good weights for the Erdős discrepancy problem.
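For readers who wish to experiment, the following minimal Python sketch (ours, purely illustrative) computes the truncated discrepancy, i.e. the maximum of |∑_{k=1}^{n} a(dk) w(k)| over d, n with dn ≤ N, here for the trivial weight w = 1 and a random ±1 sequence a:

```python
import random

def truncated_discrepancy(a, w, N):
    """Max of |sum_{k=1}^n a(d*k) w(k)| over d, n with d*n <= N.

    a and w are 1-indexed callables; this is a finite-scale probe of the
    quantity in Definition 1.2, not a substitute for the theorems."""
    best = 0.0
    for d in range(1, N + 1):
        s = 0
        for k in range(1, N // d + 1):
            s += a(d * k) * w(k)
            best = max(best, abs(s))
    return best

random.seed(0)
N = 2000
vals = [0] + [random.choice((-1, 1)) for _ in range(N)]  # vals[n] = a(n)
a = vals.__getitem__
disc = truncated_discrepancy(a, lambda k: 1, N)
# For random signs the partial sums along d = 1 already fluctuate on the
# order of sqrt(n), so the truncated discrepancy is well above 1 here.
assert disc > 5
```

Of course no finite computation can certify (2); the sketch only makes concrete the quantity whose supremum the theorems assert to be infinite.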
As a related result of independent interest, we show that certain weighted sums of multiplicative functions are unbounded. For instance, we prove in Corollary 1.10 that if l ≥ 2, α is irrational, and f, g : N → S are multiplicative functions, then

sup_{n∈N} |∑_{k=1}^{n} f(k) g(k + 1) e(k^l α)| = +∞,

and in Theorem 1.11 we prove an analogous result when the weights are given by random sequences of ±1's.

Results related to the weighted Erdős discrepancy problem
The next result gives sufficient conditions for a bounded sequence of complex numbers to be a good weight for the Erdős discrepancy problem. In order to explain the exact assumptions needed, we use ergodic terminology that is explained in Section 3.2, and in Corollary 1.5 we give some explicit examples. See also Section 1.6 for our notation regarding averages; for reasons that are explained in Section 3.2 we use logarithmic averages.

Definition 1.3. We say that the sequence a : N → U
• has vanishing self-correlations, if for every h ∈ N we have E^{log}_{n∈N} a(n + h) \overline{a(n)} = 0;
• is non-null for logarithmic averages, or simply non-null, if lim inf_{N→∞} E^{log}_{n∈[N]} |a(n)|² > 0.

Our main result regarding structured (zero entropy) weights is the following one:

Theorem 1.4. Suppose that w : N → U is non-null, totally ergodic, has zero entropy, and has vanishing self-correlations. Then w is a good weight for the Erdős discrepancy problem.

Remarks.
• As was the case in [15], if H is an arbitrary inner product space and a : N → H is such that ‖a(k)‖_H = 1 for all k ∈ N, then our argument works without any change and shows that sup_{d,n∈N} ‖∑_{k=1}^{n} a(dk) w(k)‖_H = +∞.
• Using Theorem 1.9 below, it is straightforward to adapt the proof of Theorem 1.4 in order to get the following stronger conclusion: For Q(k) = ∏_{j=1}^{ℓ} (k + h_j), k ∈ N, where ℓ ∈ N, h_1, . . . , h_ℓ ∈ Z^+, and w as before, we have for every sequence a : N → S that sup_{d,n∈N} |∑_{k=1}^{n} a(dQ(k)) w(k)| = +∞.
But our methods do not allow us to deal with the non-weighted version (where w = 1) even when Q(k) = k(k + 1), k ∈ N.
• The zero entropy assumption cannot be removed. To see this, let a(k) = f(k) and w(k) = f(k), k ∈ N, where f : N → {−1, 1} is any multiplicative function that satisfies the Elliott conjecture, in which case w has vanishing self-correlations and is totally ergodic (in fact Bernoulli). Also, the assumption that the self-correlations of w vanish cannot be removed. To see this, let a = 1 and w(k) = e(kα), k ∈ N, where α is irrational. On the other hand, it is not clear whether the assumption of total ergodicity can be removed.
The proof of Theorem 1.4 has a few interesting features. Unlike the proof of Theorem 1.1 in [15], we do not use, explicitly or implicitly, results from [10,11,14] on averages of multiplicative functions in short intervals, and we do not carry out a separate analysis in the case where the sequence (a(k))_{k∈N} is a pretentious multiplicative function. To compensate for this, our argument crucially uses the following ergodic result, which was proved in [3] using a combination of ergodic theory and number theory tools developed in [2] and [16] (the notions involved are defined in Section 3):

Theorem 1.6 (F., Host [3]). All Furstenberg systems of a multiplicative function with values on U are disjoint from all zero entropy totally ergodic systems.
To get a sense of why Theorem 1.6 is useful, we note that it implies (via Proposition 4.1 below) that if w is a totally ergodic sequence with zero entropy and f : N → U is a multiplicative function, then the self-correlations of the sequence f · w split into a product of the self-correlations of f and the self-correlations of w. Hence, if we assume that w has vanishing self-correlations, then the same holds for f · w, and this property implies Theorem 1.4 (see Proposition 2.7).
Lastly, we give examples of good weights that are given by random sequences. The first result applies to independent symmetric random variables and its proof is rather elementary.

Theorem 1.7. Let (X_k(ω))_{k∈N} be a sequence of independent random variables with P(X_k = −1) = P(X_k = 1) = 1/2, k ∈ N, and let a : N → U be a non-null sequence. Then ω-almost surely the sequence (a(k) X_k(ω))_{k∈N} is a good weight for the Erdős discrepancy problem.
The second result applies to independent random variables that are not necessarily symmetric, as long as they take a fixed non-zero complex value not too rarely. Its proof, due to M. Kolountzakis, is simple, but makes essential use of Theorem 1.1 (via the criterion given in Lemma 5.5 below).

Theorem 1.8. Let (X_k(ω))_{k∈N} be a sequence of independent, complex valued, random variables. Suppose that for some c ∈ C \ {0} the sequence ρ_k := P(X_k = c), k ∈ N, is decreasing and satisfies ∑_{k∈N} ρ_k^l = +∞ for every l ∈ N. Then ω-almost surely the sequence (X_k(ω))_{k∈N} is a good weight for the Erdős discrepancy problem.
Remark. The assumption of monotonicity cannot be removed. To see this, take P(X_k = 1) = 1 if k is prime, P(X_k = 0) = 1 for all other k ∈ N, and let a : N → {−1, 1} be a completely multiplicative function that is equal to (−1)^n on the n-th prime. Then ω-almost surely we have sup_{d,n∈N} |∑_{k=1}^{n} a(dk) X_k(ω)| ≤ 1.
If we take c = 1 and decreasing ρ_k such that ρ_k ≥ 1/log k and P(X_k = 0) = 1 − ρ_k for k ≥ 2, then Theorem 1.8 applies and gives that the indicator functions of certain sparse random subsets of the integers are good weights for the Erdős discrepancy problem.
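The sparse random sets just described are easy to sample; the following sketch (ours, purely illustrative) draws the random set with ρ_k = 1/log k and checks that its density in [N] is of order (log N)^{−1}:

```python
import math, random

# Sample the sparse random weight from the discussion above: X_k = 1 with
# probability rho_k = 1/log k (k >= 3), else 0. These densities are
# decreasing and sum_k rho_k^l diverges for every l, so Theorem 1.8 applies.
random.seed(3)
N = 100_000
S = [k for k in range(3, N + 1) if random.random() < 1 / math.log(k)]
density = len(S) / N
# The expected density in [N] is roughly 1/log N (log(1e5) is about 11.5).
assert 0.5 / math.log(N) < density < 2 / math.log(N)
```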

Results related to weighted sums of multiplicative functions
As was the case in the proof of Theorem 1.1 in [15], the unboundedness of weighted discrepancy sums for arbitrary unit modulus sequences follows from similar unboundedness properties of unit modulus completely multiplicative functions. We state next some related results that are of independent interest.

Theorem 1.9. Let f : N → U be a non-null multiplicative function and let w : N → U be non-null, totally ergodic, with zero entropy, and with vanishing self-correlations. Then

sup_{n∈N} |∑_{k=1}^{n} f(k) w(k)| = +∞. (3)

In fact, the following stronger property holds: If w is as before, f_1, . . . , f_ℓ : N → U are multiplicative functions, and h_1, . . . , h_ℓ ∈ Z^+ are such that the sequence (∏_{j=1}^{ℓ} f_j(k + h_j))_{k∈N} is non-null, then we have

sup_{n∈N} |∑_{k=1}^{n} w(k) ∏_{j=1}^{ℓ} f_j(k + h_j)| = +∞. (4)

Remark. Note that for w = 1, although (3) holds for all completely multiplicative functions with values on S, it fails for some non-null multiplicative functions with values on U. For instance, it fails for f(k) = (−1)^{k+1}, k ∈ N, and for all non-trivial Dirichlet characters.
Regarding the non-weighted version of (4), not much is known for ℓ ≥ 2. For instance, it is not known whether for every completely multiplicative function f : N → S we have

sup_{n∈N} |∑_{k=1}^{n} f(k) f(k + 1)| = +∞.

This problem was raised by J. Teräväinen and A. Klurman, who remarked that it is not even clear how to prove that

sup_{n∈N} |∑_{k=1}^{n} λ(k) λ(k + 1)| = +∞,

where λ is the Liouville function. On the other hand, it is an immediate consequence of the next corollary that if f : N → S is a multiplicative function, l ≥ 2, and α is irrational, then we have sup_{n∈N} |∑_{k=1}^{n} f(k) \overline{f(k + 1)} e(k^l α)| = +∞.

Corollary 1.10. Let φ : T → U be a Riemann integrable function with ∫_T φ dm_T = 0 and ∫_T |φ| dm_T ≠ 0, and let P : R → T be a polynomial with degree at least 2 and irrational leading coefficient. Then for all multiplicative functions f_1, . . . , f_ℓ : N → U and h_1, . . . , h_ℓ ∈ Z^+ such that the sequence (∏_{j=1}^{ℓ} f_j(k + h_j))_{k∈N} is non-null, we have

sup_{n∈N} |∑_{k=1}^{n} φ(P(k)) ∏_{j=1}^{ℓ} f_j(k + h_j)| = +∞.

Regarding weights given by random ±1 sequences, we have the following result:

Theorem 1.11. Let (X_k(ω))_{k∈N} be a sequence of independent random variables with P(X_k = −1) = P(X_k = 1) = 1/2, k ∈ N. Then ω-almost surely the following holds: For every ℓ ∈ N, all multiplicative functions f_1, . . . , f_ℓ : N → U, and all h_1, . . . , h_ℓ ∈ Z^+ such that the sequence (∏_{j=1}^{ℓ} f_j(k + h_j))_{k∈N} is non-null, we have

sup_{n∈N} |∑_{k=1}^{n} X_k(ω) ∏_{j=1}^{ℓ} f_j(k + h_j)| = +∞. (5)

Remarks. • It is not hard to show that for any fixed collection of arbitrary sequences f_1, . . . , f_ℓ : N → U, (5) holds ω-almost surely. So the important point in Theorem 1.11 is that the set of ω's for which the conclusion holds is independent of the (uncountably many) multiplicative functions f_1, . . . , f_ℓ.
• For ℓ = 1, Theorem 1.7 gives better results that apply to not necessarily symmetric random variables. But for ℓ ≥ 2 the method of proof of Theorem 1.7 fails to give (5) (since the relevant unweighted result is not known). Theorem 1.11 is based on Theorem 5.3 below, which is proved by combining some simple counting arguments and concentration of measure estimates for sums of independent random variables.

Proof strategy
Let us first recall the proof strategy of Theorem 1.1 given in [15]. An immediate consequence of Theorem 1.1 is that for every completely multiplicative function f : N → S we have

sup_{n∈N} |∑_{k=1}^{n} f(k)| = +∞. (6)

It turns out that a variant of this special case (see Proposition 2.5 below for w = 1) is the key ingredient in the proof of Theorem 1.1. The proof of (6) given in [15] proceeds by considering separately the case where f is structured ("pretentious") and random ("non-pretentious"). The latter case can be treated (as in Proposition 2.6 below) using the identities

E^{log}_{n∈N} f(n + h) \overline{f(n)} = 0, h ∈ N, (7)

which hold for random-like ("non-pretentious") multiplicative functions. Likewise, our arguments rely on weighted variants of (6) and (7) that are of independent interest. For instance, we prove that if l ≥ 2 and α is irrational, then for every multiplicative function f : N → S we have

sup_{n∈N} |∑_{k=1}^{n} f(k) e(k^l α)| = +∞, (8)

and we also prove stronger results involving weighted sums of products of shifts of several multiplicative functions. To prove (8) we rely on one of the main results in [3], which implies that for every l ∈ N and every multiplicative function f : N → S we have

E^{log}_{n∈N} f(n + h) e((n + h)^l α) \overline{f(n) e(n^l α)} = 0, h ∈ N. (9)

The fact that (9) holds for every multiplicative function f : N → S (which is not true for (7)) simplifies the proof of (8), versus the argument given for the proof of (6) in [15], and ultimately of the fact that (e(k^l α))_{k∈N} is a good weight. One reason is that we do not have to carry out a separate analysis in the case where f is structured ("pretentious"), as was the case in [15]. The proofs of the results concerning random weights are simpler. Theorem 1.7 is based on a variant of (9) that uses random weights and is proved in Theorem 5.3 via elementary techniques. Theorem 1.8 is deduced from Theorem 1.1 using an elementary argument given in Section 5.2.

Some open problems
When w = 1 the problem is open even when a = b = f, where f : N → S is a completely multiplicative function (see the remarks in Section 1.3). More generally, one can ask the following:

Problem 1. For the previous choices of the sequence w, is it true that for every a_1, . . . , a_ℓ : N → S and all h_1, . . . , h_ℓ ∈ Z^+ we have

sup_{d,n∈N} |∑_{k=1}^{n} w(k) ∏_{j=1}^{ℓ} a_j(d(k + h_j))| = +∞ ?

Corollary 1.10 shows that the answer is yes when a_1, . . . , a_ℓ are multiplicative functions with values on S and w is the sequence (e(k²α))_{k∈N} with α irrational. But unlike the previous discrepancy statements, we do not have a way to reduce Problem 1 to one about weighted sums of multiplicative functions. Any such reduction probably depends upon obtaining an integral representation result, analogous to Proposition 2.4 below, for sequences of the form

A(k_1, . . . , k_ℓ) := E_{d∈Φ} ∏_{j=1}^{ℓ} a_j(d k_j), k_1, . . . , k_ℓ ∈ N,

where Φ is a multiplicative Følner sequence (see Section 2.1) along which all previous averages exist. Note though that more complicated "higher order multiplicative functions" arise this way; for instance, if f : N → S is defined by f(k) := e((n_1 α_1 + · · · + n_l α_l)²), where k = p_1^{n_1} · · · p_l^{n_l} is the unique factorization of k ∈ N and α_1, . . . , α_l ∈ R, then f is such a higher order multiplicative function. In a different direction, it seems likely that the zero integral condition in Corollary 1.5 can be removed. Proving this would probably necessitate combining the arguments of this article with a detailed analysis of the pretentious case (similar to the one in [15]), and it is not clear how to do this.

Problem 2. Is it true that Corollary 1.5 holds even if we do not assume that ∫_T φ dm_T = 0?
Let us say that a subset S of N is good for the Erdős discrepancy problem, or simply good, if the indicator function 1_S is a good weight for the Erdős discrepancy problem. By taking the sequence (a(k))_{k∈N} in (2) to be an appropriate multiplicative function, one easily verifies that the sets {n ≡ 0 (mod r)} for r ≥ 3, {2^n, n ∈ N}, and {p_n, n ∈ N}, where p_n is the n-th prime, are bad. On the other hand, it is easy to deduce from Theorem 1.1 that the sets rZ for r ∈ N and {n^l, n ∈ N} for l ∈ N are good. But it is not at all clear whether certain simple sets that lack multiplicative structure are good.
Problem 3. Are the sets {p_n + 1, n ∈ N}, {n² ± 1, n ∈ N}, {2^n + 1, n ∈ N}, or {⌊n^c⌋, n ∈ N} for c > 1 not an integer, good for the Erdős discrepancy problem?

Theorem 1.8 implies that random subsets of the integers with positive density, and certain sparse random subsets with density roughly (log N)^{−1} in [N], are almost surely good. But how about sparser random subsets?
Is it true that ω-almost surely the sequence (X k (ω)) k∈N is a good weight for the Erdős discrepancy problem?

Notation
With U we denote the complex unit disc {z ∈ C : |z| ≤ 1} and with S the complex unit circle {z ∈ C : |z| = 1}. With T we denote the 1-dimensional torus, which we identify with R/Z. With N we denote the positive integers and with Z^+ the non-negative integers. For N ∈ N we let [N] := {1, . . . , N}. For t ∈ R we also let e(t) := e^{2πit}.
If A is a non-empty finite subset of N, we let

E_{n∈A} a(n) := (1/|A|) ∑_{n∈A} a(n) and E^{log}_{n∈A} a(n) := (∑_{n∈A} 1/n)^{−1} ∑_{n∈A} a(n)/n.

We also let E_{n∈N} a(n) := lim_{N→∞} E_{n∈[N]} a(n) and E^{log}_{n∈N} a(n) := lim_{N→∞} E^{log}_{n∈[N]} a(n), whenever the limits exist. Using partial summation, one sees that if E_{n∈N} a(n) = 0, then also E^{log}_{n∈N} a(n) = 0 (but the converse does not hold in general).
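The two averaging schemes can be compared numerically; the sketch below (ours, purely illustrative, with one arbitrary irrational α) evaluates both the Cesàro and the logarithmic average of a(n) = e(nα) at a finite scale:

```python
import math, cmath

def cesaro(a, N):
    """Cesàro average E_{n in [N]} a(n)."""
    return sum(a(n) for n in range(1, N + 1)) / N

def log_avg(a, N):
    """Logarithmic average: weights 1/n, normalized by sum_{n<=N} 1/n."""
    w = sum(1 / n for n in range(1, N + 1))
    return sum(a(n) / n for n in range(1, N + 1)) / w

alpha = math.sqrt(2)
a = lambda n: cmath.exp(2j * math.pi * alpha * n)  # a(n) = e(n*alpha)
N = 200_000
# Both averages tend to 0 for this sequence; the logarithmic one converges
# more slowly (its partial sums are damped only by the harmonic weight).
assert abs(cesaro(a, N)) < 0.001
assert abs(log_avg(a, N)) < 0.1
```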

2 Reduction to statements about multiplicative functions

2.1 Multiplicative averages
We denote by Q + the multiplicative group of positive rationals.
If Φ = (Φ_N)_{N∈N} is a multiplicative Følner sequence and a : N → U is such that the average below exists, we define the multiplicative average of the sequence a along Φ by

E_{d∈Φ} a(d) := lim_{N→∞} E_{d∈Φ_N} a(d).

Note that property (10) implies the following dilation invariance property of the multiplicative averages: For every a : Q^+ → U, multiplicative Følner sequence Φ, and r ∈ Q^+, we have

E_{d∈Φ} a(rd) = E_{d∈Φ} a(d). (11)

Reduction to multiplicative functions via Bochner's theorem
A variant of the next lemma was proved in [15, Section 2] using Fourier analysis on an appropriate finite Abelian group (of the form (Z/MZ)^r for large M, r ∈ N) and a compactness argument. We use a somewhat different approach (also used in [1, Section 10.2]) that invokes Bochner's theorem on positive definite functions. We first introduce some notation. With M we denote the set of completely multiplicative functions f : N → S. Endowed with pointwise multiplication and the topology of pointwise convergence, the set M is a compact (metrizable) Abelian group.
Lemma 2.4. Let

A(k, l) := E_{d∈Φ} a(kd) \overline{a(ld)}, k, l ∈ N,

where a : N → C is a bounded sequence and Φ = (Φ_N)_{N∈N} is a multiplicative Følner sequence such that all the averages above exist. Then there exists a (positive) measure σ on the space M, with total mass equal to E_{d∈Φ} |a(d)|², such that

A(k, l) = ∫ f(k) \overline{f(l)} dσ(f), k, l ∈ N.

Proof. We first extend the sequence a to the positive rationals Q^+ by letting a(r) = 0 for r ∈ Q^+ \ N. We define B : Q^+ → C as follows:

B(r) := E_{d∈Φ} a(rd) \overline{a(d)}, r ∈ Q^+.
Using the dilation invariance property (11) and our assumption that the averages defining the sequence A exist, we deduce that the averages below exist and we have

B(rs^{−1}) = E_{d∈Φ} a(rd) \overline{a(sd)}, r, s ∈ Q^+.
We are going to use this identity in order to verify that B is a positive definite function on Q^+ with pointwise multiplication. Indeed, for all c_1, . . . , c_N ∈ C and r_1, . . . , r_N ∈ Q^+, we have

∑_{i,j=1}^{N} c_i \overline{c_j} B(r_i r_j^{−1}) = E_{d∈Φ} |∑_{i=1}^{N} c_i a(r_i d)|² ≥ 0.

Note that the dual group of (Q^+, ·) consists of the completely multiplicative functions on Q^+ with unit modulus, and any such ψ : Q^+ → S satisfies ψ(m/n) = f(m) \overline{f(n)}, m, n ∈ N, for some completely multiplicative function f ∈ M. A well known theorem of Bochner gives that there exists a (positive) Borel measure σ on the space M such that

B(m/n) = ∫ f(m) \overline{f(n)} dσ(f), m, n ∈ N.

The total mass of σ is B(1) = E_{d∈Φ} |a(d)|². Lastly, we have B(k/l) = E_{d∈Φ} a(kd/l) \overline{a(d)} = E_{d∈Φ} a(kd) \overline{a(ld)}, and the proof is complete.
Using the previous representation theorem we get the following criterion:

Proposition 2.5. Let w : N → U be such that for every probability measure σ on the space M we have

sup_{n∈N} ∫ |∑_{k=1}^{n} f(k) w(k)|² dσ(f) = +∞.

Then w is a good weight for the Erdős discrepancy problem.
Proof. Arguing by contradiction, suppose that w is not a good weight for the Erdős discrepancy problem. Then there exists a sequence a : N → S such that

sup_{d,n∈N} |∑_{k=1}^{n} a(dk) w(k)| < +∞.
We average with respect to d over a multiplicative Følner sequence Φ = (Φ_N)_{N∈N}, chosen so that all relevant averages below exist (such a sequence can always be found using a diagonalisation argument), and deduce that

sup_{n∈N} E_{d∈Φ} |∑_{k=1}^{n} a(dk) w(k)|² < +∞. (12)

Expanding the square, we get that the expression in (12) is equal to

sup_{n∈N} ∑_{k,l=1}^{n} A(k, l) w(k) \overline{w(l)}, (13)

where A(k, l) := E_{d∈Φ} a(dk) \overline{a(dl)}, k, l ∈ N.
By Lemma 2.4, there exists a (positive) measure σ on the space M, with total mass E_{d∈Φ} |a(d)|² = 1 (so σ is a probability measure, since a takes values on S), such that A(k, l) = ∫ f(k) \overline{f(l)} dσ(f) for all k, l ∈ N. We deduce that the expression in (13), and hence the expression in (12), is equal to

sup_{n∈N} ∫ |∑_{k=1}^{n} f(k) w(k)|² dσ(f).

This contradicts our assumption and completes the proof.

Reduction to correlation estimates
As was the case in [15], a key step in the proof of our main results is an elementary observation that allows us to deduce unboundedness of partial sums from vanishing of self-correlations (which are defined using logarithmic averages for reasons explained in the next section).
Proposition 2.6. Let b : N → U be a non-null sequence such that for every h ∈ N we have

E^{log}_{n∈N} b(n + h) \overline{b(n)} = 0.

Then sup_{n∈N} |∑_{k=1}^{n} b(k)| = +∞.

Proof. Arguing by contradiction, suppose that the conclusion fails. Then there exists C > 0 such that |∑_{k=1}^{n} b(k)| ≤ C for every n ∈ N. Using this, we can find a sequence of intervals N = ([N_l])_{l∈N}, with N_l → ∞, such that all averages E^{log}_{n∈N} written below exist and for every H ∈ N we have

E^{log}_{n∈N} |∑_{h=1}^{H} b(n + h)|² ≤ 4C².

Since the sequence b is non-null, we have B := E^{log}_{n∈N} |b(n)|² > 0. On the other hand,

E^{log}_{n∈N} |∑_{h=1}^{H} b(n + h)|² = ∑_{h=1}^{H} E^{log}_{n∈N} |b(n + h)|² = HB,

since by our assumption E^{log}_{n∈N} b(n + h_1) \overline{b(n + h_2)} = 0 for h_1 ≠ h_2, and we also used twice that the logarithmic averages of a bounded sequence are translation invariant. From the above we deduce that HB ≤ 4C², and we get a contradiction by choosing H > 4C²/B.

Proposition 2.7. Let w : N → U be a non-null sequence such that for every multiplicative function f : N → S and every h ∈ N we have

E^{log}_{n∈N} f(n + h) w(n + h) \overline{f(n) w(n)} = 0.

Then w is a good weight for the Erdős discrepancy problem.
Proof. Arguing by contradiction, suppose that the conclusion fails. Then by Proposition 2.5 there exist a probability measure σ on the space M and C > 0 such that

sup_{n∈N} ∫ |∑_{k=1}^{n} f(k) w(k)|² dσ(f) ≤ C.

Using this and a diagonalization argument, we can find a sequence of intervals N = ([N_l])_{l∈N}, with N_l → ∞, such that E^{log}_{n∈N} |w(n)|² and all averages E^{log}_{n∈N} written below exist and for every H ∈ N we have

∫ E^{log}_{n∈N} |∑_{h=1}^{H} f(n + h) w(n + h)|² dσ(f) ≤ 4C. (14)

We let A := E^{log}_{n∈N} |w(n)|² > 0, where the positivity follows since the sequence w is non-null by our assumption. Next, notice that for every f ∈ M we have

E^{log}_{n∈N} |∑_{h=1}^{H} f(n + h) w(n + h)|² = ∑_{h=1}^{H} E^{log}_{n∈N} |f(n + h) w(n + h)|² = HA,

where the cross terms vanish because of our assumption on w (arguing as in the proof of Proposition 2.6). Since σ is a probability measure, we deduce using the bounded convergence theorem that

∫ E^{log}_{n∈N} |∑_{h=1}^{H} f(n + h) w(n + h)|² dσ(f) = HA. (15)

Combining (14) and (15) we deduce that HA ≤ 4C, and we get a contradiction by choosing H > 4C/A.

Notions and results from ergodic theory
The proof of our main results regarding structured (zero entropy) sequences depends on some notions and results from ergodic theory that we describe next. The material in this section is not needed for the results concerning random weights.

Measure preserving systems
A measure preserving system, or simply a system, is a quadruple (X, X, µ, T), where (X, X, µ) is a probability space and T : X → X is an invertible, measurable, measure preserving transformation. We typically omit the σ-algebra X and write (X, µ, T). Throughout, for n ∈ N we denote by T^n the composition T ∘ · · · ∘ T (n times) and let T^{−n} := (T^n)^{−1} and T^0 := id_X. Also, for f ∈ L¹(µ) and n ∈ Z we denote by T^n f the function f ∘ T^n.
We say that the system (X, µ, T ) is ergodic if the only functions f ∈ L 1 (µ) that satisfy T f = f are the constant ones. It is totally ergodic if (X, µ, T d ) is ergodic for every d ∈ N.

Furstenberg systems
For the reader's convenience, we reproduce here some ergodic notions and constructions that can also be found in [2,3]. For the purposes of this article, all averages in the definitions below are taken to be logarithmic. The reason is that later on we invoke results from ergodic theory, like Theorem 3.7 below, that are only known when the joint Furstenberg systems are defined using logarithmic averages. This limitation comes from the number theoretic input used in the proof of Theorem 3.7, in particular the identities in [3, Theorem 3.1]. For every finite collection of sequences that admits log-correlations on a given sequence of intervals, we use a variant of the correspondence principle of Furstenberg [5,6] in order to associate a measure preserving system that captures the statistical properties of these sequences. Namely, given a collection A = (a_1, . . . , a_ℓ) of sequences a_j : Z → U that admits log-correlations on a sequence of intervals N = ([N_l])_{l∈N}, we let X := (U^ℓ)^Z, we let T : X → X be the shift transformation, and we let µ be the weak-star limit lim_{l→∞} E^{log}_{n∈[N_l]} δ_{T^n x_0}, where the point x_0 := (a_1, . . . , a_ℓ) is thought of as an element of X. We call (X, µ, T) the joint Furstenberg system associated with (A, N).
Remark. If we are given sequences a_1, . . . , a_ℓ : N → U that are defined on N, we extend them to Z in an arbitrary way. It is easy to check that the measure µ does not depend on the extension.
Note that a collection of sequences a_1, . . . , a_ℓ : Z → U may have several non-isomorphic joint Furstenberg systems, depending on which sequence of intervals N we use in the evaluation of their joint correlations. For convenience of exposition, we sometimes associate a property of ergodic nature with a given finite collection of sequences if all joint Furstenberg systems of the collection have this property. In particular, we often use the following terminology:

Definition 3.3. We say that a sequence a : Z → U is totally ergodic and/or has zero entropy if all its Furstenberg systems are totally ergodic and/or have zero entropy.
Remark. In [8], a zero entropy sequence is called completely deterministic.
Examples of zero entropy sequences include the sequences (e(n^l α))_{n∈N}, where l ∈ N and α ∈ R; these sequences are also totally ergodic if α is irrational (see Proposition 4.2 below).

Disjointness properties
We will use the following notion that was introduced by Furstenberg in [4]: Definition 3.4. We say that two systems (X, µ, T ) and (Y, ν, S) are disjoint, if the only T × S invariant measure on the product space (X × Y, µ × ν), with first and second marginals the measures µ and ν respectively, is the product measure µ × ν.
The notion of disjointness in ergodic theory naturally introduces the following notion of statistical disjointness of two finite collections of bounded sequences.
Proof. Arguing by contradiction, suppose that the conclusion fails. Then there exists a sequence of intervals N = ([N_l])_{l∈N}, with N_l → ∞, on which the family A ∪ A′ admits log-correlations and we have

E^{log}_{n∈N} A_n A′_n ≠ E^{log}_{n∈N} A_n · E^{log}_{n∈N} A′_n (16)

for some choice of A_n = ∏_{j=1}^{m} ã_j(n + h_j), A′_n = ∏_{j=1}^{m′} ã′_j(n + h′_j), n ∈ N, where m, m′, h_j, h′_j ∈ N, ã_j ∈ A ∪ \overline{A}, and ã′_j ∈ A′ ∪ \overline{A′}. Let (X, µ, T) and (X′, µ′, T′) be the joint Furstenberg systems associated with (A, N) and (A′, N) respectively.
We let x_0 := (a_1, . . . , a_ℓ) ∈ X and x′_0 := (a′_1, . . . , a′_{ℓ′}) ∈ X′. After passing to a subsequence of N (which for simplicity we denote again by N), we can assume that the weak-star limit

ρ := lim_{l→∞} E^{log}_{n∈[N_l]} δ_{(T^n x_0, T′^n x′_0)} (17)

exists and defines a T × T′ invariant measure on X × X′. The projection of ρ on X is the weak-star limit lim_{l→∞} E^{log}_{n∈[N_l]} δ_{T^n x_0}, which is the measure µ. Likewise, the projection of ρ on X′ is the measure µ′. Since the families A and A′ are statistically disjoint, the systems (X, µ, T) and (X′, µ′, T′) are disjoint, hence

ρ = µ × µ′. (18)

Now for x = (x_1(n), . . . , x_ℓ(n))_{n∈Z} ∈ X we let F_{h,k}(x) := x_k(h), h ∈ Z, k ∈ {1, . . . , ℓ}. Likewise, for x′ = (x′_1(n), . . . , x′_{ℓ′}(n))_{n∈Z} ∈ X′ we let F′_{h,k}(x′) := x′_k(h), h ∈ Z, k ∈ {1, . . . , ℓ′}.
With the above notation, we define the function F(x) := ∏_{j=1}^{m} G_{h_j,j}(x), x ∈ X, where for j = 1, . . . , m, if ã_j = a_{k_j} or \overline{a_{k_j}} for some k_j ∈ {1, . . . , ℓ}, we set G_{h_j,j} to be F_{h_j,k_j} or \overline{F_{h_j,k_j}} respectively. Likewise, we define the function F′(x′) := ∏_{j=1}^{m′} G′_{h′_j,j}(x′), x′ ∈ X′. Then using (16) and the definition of the measures µ, µ′ and the measure ρ given by (17), we get that

∫ F(x) F′(x′) dρ(x, x′) ≠ ∫ F dµ · ∫ F′ dµ′.

This contradicts (18) and completes the proof.

Restating Theorem 3.7 using terminology introduced in the previous definitions, we get the following result:

Theorem 3.8. Every finite collection of multiplicative functions with values on U is statistically disjoint from every finite collection of totally ergodic sequences with zero entropy.

We now turn to the proofs of our main results for structured weights. First we show that the assumption of Proposition 2.6 is satisfied for various sequences of interest.

Proposition 4.1. Suppose that w : N → U is a totally ergodic sequence with zero entropy and vanishing self-correlations. Let also f_1, . . . , f_ℓ : N → U be multiplicative functions, h_1, . . . , h_ℓ ∈ Z^+, and b(n) := w(n) ∏_{j=1}^{ℓ} f_j(n + h_j), n ∈ N. Then for every h ∈ N we have E^{log}_{n∈N} b(n + h) \overline{b(n)} = 0.

Proof. By Theorem 3.8 and Proposition 3.6, the difference

E^{log}_{n∈[N]} b(n + h) \overline{b(n)} − E^{log}_{n∈[N]} w(n + h) \overline{w(n)} · E^{log}_{n∈[N]} ∏_{j=1}^{ℓ} f_j(n + h + h_j) \overline{f_j(n + h_j)}

converges to zero as N → ∞. Since by our assumption E^{log}_{n∈N} w(n + h) \overline{w(n)} = 0 for every h ∈ N, the result follows.
To prove Theorem 1.9, we note first that by Theorem 3.8 the collections of sequences {f_1, . . . , f_ℓ} and {w} are statistically disjoint. Hence, Proposition 3.6 gives that the difference

E^{log}_{n∈[N]} |w(n)|² ∏_{j=1}^{ℓ} |f_j(n + h_j)|² − E^{log}_{n∈[N]} |w(n)|² · E^{log}_{n∈[N]} ∏_{j=1}^{ℓ} |f_j(n + h_j)|²

converges to 0 as N → ∞. Using this and our assumption that the sequences (w(n))_{n∈N} and (∏_{j=1}^{ℓ} f_j(n + h_j))_{n∈N} are non-null, we deduce that their product is also non-null. With this in mind, Theorem 1.9 follows from Propositions 2.6 and 4.1.

Proof of Corollaries 1.5 and 1.10
We will need the following fact:

Proposition 4.2. Let P : R → T be a non-constant polynomial with irrational leading coefficient and let φ : T → U be Riemann integrable. Then the sequence (φ(P(n)))_{n∈N} has zero entropy, is totally ergodic, and has a unique Furstenberg system.
Remark. In order to have total ergodicity, it is essential that the leading coefficient of P (and not just any non-constant coefficient) is irrational. For example, if a(n) := e(n³/3 + n²α), n ∈ N, where α is irrational, then it turns out that the sequence (a(n))_{n∈N} is not totally ergodic. We thank S. Pattison for pointing this out; see Sections 5.3 and 5.4 in [12] for a related discussion.
Proof. Let d := deg P. We start with the well known fact (see [6, Section 1.7] or [12, Section 4.4]) that there exist a unipotent affine transformation S : T^d → T^d, with unique invariant measure the Haar measure m_{T^d}, such that the system (T^d, m_{T^d}, S) is totally ergodic (here we used that the leading coefficient of P is irrational), a Riemann integrable function Ψ : T^d → U, and y_0 ∈ T^d, such that

Ψ(S^n y_0) = φ(P(n)) for every n ∈ Z. (19)

We define π : T^d → X by π(y) := (Ψ(S^n y))_{n∈Z}, y ∈ T^d.
Clearly we have π ∘ S = T ∘ π. Next, let m ∈ N and l_{−m}, . . . , l_m ∈ Z. We define the function

F(x) := ∏_{j=−m}^{m} x(j)^{l_j}, x ∈ X, (20)

where we used the following conventions: for z ∈ U and k < 0 we have z^k := \overline{z}^{−k}, and 0^0 := 0. Note that the linear span of all such functions forms a conjugation closed subalgebra of C(X) that separates points, hence it is dense in C(X). Next note that for x_0 := (φ(P(n)))_{n∈Z} ∈ X we have

lim_{N→∞} E_{n∈[N]} F(T^n x_0) = lim_{N→∞} E_{n∈[N]} F(T^n π(y_0)) = lim_{N→∞} E_{n∈[N]} (F ∘ π)(S^n y_0) = ∫ F ∘ π dm_{T^d} = ∫ F d(π_* m_{T^d}),

where to justify the second identity we use (19), for the third we use the unique ergodicity of S and the fact that Ψ ∘ S^n is Riemann integrable for n ∈ Z, and for the fourth we use the definition (20). By linearity and density, it follows that the sequence of measures (E_{n∈[N]} δ_{T^n x_0})_{N∈N} (and hence the sequence (E^{log}_{n∈[N]} δ_{T^n x_0})_{N∈N}) converges weak-star to a measure µ on X, which is equal to the image of the measure m_{T^d} under π. From the above, we deduce that the sequence (φ(P(n)))_{n∈Z} has a unique Furstenberg system, which is (X, µ, T), and π is a factor map from the system (T^d, m_{T^d}, S) to the system (X, µ, T). Since the system (T^d, m_{T^d}, S) is totally ergodic and has zero entropy, the same holds for its factor (X, µ, T). This completes the proof.
Proof of Corollaries 1.5 and 1.10. It suffices to verify that the sequence w(n) := φ(P(n)), n ∈ N, satisfies the assumptions of Theorem 1.4. Since P has a non-constant coefficient irrational, the sequence (P(n))_{n∈N} is equidistributed in T, which gives that E^{log}_{n∈N} |w(n)|² = ∫_T |φ|² dm_T > 0, so w is non-null. Moreover, it follows from Proposition 4.2 that w has zero entropy and is totally ergodic. It remains to verify that it has vanishing self-correlations, meaning E^{log}_{n∈N} w(n + h) \overline{w(n)} = 0 for every h ∈ N. In fact, we establish a stronger property: If φ, ψ : T → C are Riemann integrable, then for every h ∈ N we have

E^{log}_{n∈N} φ(P(n + h)) \overline{ψ(P(n))} = ∫_T φ dm_T · \overline{∫_T ψ dm_T}. (21)

Using standard Weyl estimates, this is easily shown to be the case when φ(t) := e(kt) and ψ(t) := e(lt) for some k, l ∈ Z (this is the only point where we use the assumption that P has a non-linear coefficient irrational). Using linearity and uniform approximation by trigonometric polynomials, we deduce that (21) holds for all φ, ψ ∈ C(T). Finally, we deduce that (21) holds for all Riemann integrable φ, ψ by approximating them in L¹(m_T) by continuous functions and using that the sequence (P(n + h))_{n∈N} is equidistributed in T for every h ∈ Z. This completes the proof.
We also let B_ε be an ε-net of points in U of minimal cardinality (thus |B_ε| ≤ 4ε^{−2}) and define

M_{ε,N} := {g ∈ M_N : g(k) ∈ B_ε for all prime powers k ∈ [N]}.
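A concrete (though non-minimal) ε-net of U can be produced by projecting a square grid of spacing ε√2 onto the disc; the sketch below (ours, purely illustrative) verifies the covering property numerically. Its cardinality is of order ε^{−2}, within a constant factor of the minimal-net bound 4ε^{−2} quoted above:

```python
import itertools, math, random, cmath

def eps_net_disc(eps):
    """Grid-based eps-net of the closed unit disc.

    Every point of the plane is within eps of a node of the grid with
    spacing eps*sqrt(2); projecting nodes onto the disc (a 1-Lipschitz map
    toward any point of the disc) preserves that bound for points of U."""
    s = eps * math.sqrt(2)
    n = int(1 / s) + 1
    pts = []
    for i, j in itertools.product(range(-n, n + 1), repeat=2):
        z = complex(i * s, j * s)
        if abs(z) > 1:
            z = z / abs(z)  # project onto the unit disc
        pts.append(z)
    return pts

net = eps_net_disc(0.1)
random.seed(0)
for _ in range(1000):
    # uniform random point of the unit disc
    z = cmath.rect(math.sqrt(random.random()), 2 * math.pi * random.random())
    assert min(abs(z - p) for p in net) <= 0.1 + 1e-12
```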
We need two lemmas. The first is an approximation property.

Lemma 5.1. For every ε > 0, N ∈ N, and f ∈ M_N, there exists g ∈ M_{ε,N} such that |f(n) − g(n)| ≤ ε log₂ N for every n ∈ [N].

Proof. Since B_ε is an ε-net of U, and an element of M can take arbitrary prescribed values on prime powers, as long as these values are taken in U, there exists g ∈ M_{ε,N} such that g(1) = f(1) and

|f(k) − g(k)| ≤ ε for all prime powers k ∈ [N]. (22)

For n ∈ {2, . . . , N}, let n = k_1 · · · k_l, where l ≤ log₂ N, be the unique factorization of n into prime powers k_1, . . . , k_l. Using the multiplicativity of f and g, the estimate (22), and telescoping, we get

|f(n) − g(n)| ≤ ∑_{i=1}^{l} |f(k_i) − g(k_i)| ≤ lε ≤ ε log₂ N.

This completes the proof.
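The factorization into prime powers and the telescoping estimate are easy to check numerically. In the sketch below (ours, purely illustrative), g is obtained from f by rotating each prime-power value by 0.05 radians, so that |f(k) − g(k)| ≤ 0.05 on prime powers:

```python
import math, random, cmath

def prime_power_factors(n):
    """Return the prime-power components k_1,...,k_l with n = k_1 * ... * k_l."""
    comps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                q *= p
                n //= p
            comps.append(q)
        p += 1
    if n > 1:
        comps.append(n)
    return comps

def extend_multiplicatively(vals_on_prime_powers, n):
    """Multiplicative extension: f(n) = product of f over the components of n."""
    prod = 1
    for k in prime_power_factors(n):
        prod *= vals_on_prime_powers(k)
    return prod

random.seed(1)
cache = {}
def fval(k):  # random unimodular values on prime powers (demo assumption)
    if k not in cache:
        cache[k] = cmath.exp(2j * math.pi * random.random())
    return cache[k]
def gval(k):  # rotate by 0.05 rad: |f(k) - g(k)| = 2 sin(0.025) <= 0.05
    return fval(k) * cmath.exp(0.05j)

eps, N = 0.05, 1000
for n in range(2, N + 1):
    l = len(prime_power_factors(n))
    fn = extend_multiplicatively(fval, n)
    gn = extend_multiplicatively(gval, n)
    # telescoping bound as in the lemma: |f(n) - g(n)| <= l*eps <= eps*log2(N)
    assert abs(fn - gn) <= l * eps + 1e-9
    assert l <= math.log2(n) + 1e-9
```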
The next lemma gives an upper bound on the cardinality of M_{ℓ,ε,N} that suffices for our purposes.

Lemma 5.2. For every ℓ ∈ N, ε > 0, and all large enough N ∈ N, we have |M_{ℓ,ε,N}| ≤ (4ε^{−2})^{2ℓN/log N}.

Proof. Since for large enough N there are at most 2N/log N prime powers up to N and each f_j(k), j = 1, . . . , ℓ, takes values in B_ε on prime powers, we deduce that |M_{ℓ,ε,N}| ≤ |B_ε|^{2ℓN/log N}. The asserted bound follows since |B_ε| ≤ 4ε^{−2}.
Combining the previous two lemmas we can prove the following result, which is an essential ingredient of the proofs of Theorems 1.7 and 1.11.

Theorem 5.3. Let (X_n(ω))_{n∈N} be a sequence of independent random variables with P(X_n = −1) = P(X_n = 1) = 1/2, n ∈ N. Then for every a : N → U we have that ω-almost surely the following holds: for every ℓ ∈ N, all multiplicative functions f_1, . . . , f_ℓ : N → U, and all h_1, . . . , h_ℓ ∈ Z_+, we have

E_{n∈N} a(n) X_n(ω) ∏_{j=1}^{ℓ} f_j(n + h_j) = 0.    (24)

Remarks.
• As was the case with Theorem 1.11, the important point in this statement is that the set of ω's for which (24) holds can be chosen independently of the (uncountably many) multiplicative functions f_1, . . . , f_ℓ : N → U.
• We note that for ℓ = 1 the previous result can also be proved using an orthogonality criterion, utilizing the fact that for every b : N → U we have ω-almost surely E_{n∈N} b(n) X_{np}(ω) X_{nq}(ω) = 0 for all primes p ≠ q. But this method does not seem to be of much help when ℓ ≥ 2, and it is the ℓ = 2 case that is needed in the proof of Theorem 1.7.
Proof. Since ℓ and h_1, . . . , h_ℓ take values in a countable set, it suffices to show that for all fixed ℓ ∈ N, h_1, . . . , h_ℓ ∈ Z_+, and a : N → U, the following statement holds ω-almost surely: (24) holds for all multiplicative functions f_1, . . . , f_ℓ : N → U. To prove this, we first note that using standard concentration of measure estimates (for example, Bernstein's exponential inequality), for every fixed sequence b : N → U and every N ∈ N and δ > 0 we have

P(|∑_{n=1}^{N} b(n) X_n(ω)| ≥ δN) ≤ 4 exp(−cδ^2 N)

for some absolute constant c > 0. We let δ_N := (log N)^{−1/3} and ε_N := (log N)^{−2}, N ∈ N.
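One standard way to obtain such a concentration estimate is to apply Hoeffding's inequality to the real and imaginary parts separately (a sketch; the constant 1/8 is what this particular argument gives):

```latex
\[
  \mathbb{P}\Big(\Big|\sum_{n=1}^{N} b(n)\,X_n\Big| \ge \delta N\Big)
  \;\le\; \mathbb{P}\Big(\Big|\sum_{n=1}^{N} \Re\, b(n)\,X_n\Big| \ge \tfrac{\delta N}{2}\Big)
        + \mathbb{P}\Big(\Big|\sum_{n=1}^{N} \Im\, b(n)\,X_n\Big| \ge \tfrac{\delta N}{2}\Big)
  \;\le\; 4\exp\Big(-\frac{\delta^2 N}{8}\Big),
\]
where each term is bounded by Hoeffding's inequality: the summands
$\Re\, b(n)\,X_n$ are independent, have mean zero, and take values in intervals
of length at most $2$, so
$\mathbb{P}\big(|\sum_{n\le N} \Re\, b(n)\,X_n| \ge t\big) \le 2\exp(-t^2/(2N))$.
```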
Using the notation introduced in (23), we get for every large enough N ∈ N that the probability that the averages up to N corresponding to some element of M_{ℓ,ε_N,N} exceed δ_N in modulus is at most |M_{ℓ,ε_N,N}| · 4 exp(−cδ_N^2 N), by the union bound (here c is the constant in the concentration estimate above). By Lemma 5.2 and the choice of δ_N and ε_N, this bound is summable in N, so by the Borel–Cantelli lemma, ω-almost surely these averages are eventually bounded by δ_N. Since δ_N → 0 and, by Lemma 5.1, arbitrary multiplicative functions can be approximated on [N] by elements of M_{ε_N,N} within ε_N log_2 N = o(1), the asserted statement follows. This completes the proof.
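The choices δ_N = (log N)^{−1/3} and ε_N = (log N)^{−2} are designed so that the concentration estimate beats the cardinality of the net; a sketch of the exponent comparison, using the cardinality bound |B_ε| ≤ 4ε^{-2} and writing c for the constant in the concentration estimate:

```latex
\[
  |\mathcal{M}_{\ell,\varepsilon_N,N}| \cdot \exp\big(-c\,\delta_N^2 N\big)
  \le \exp\Big( \frac{2\ell N}{\log N}\,\log\big(4\varepsilon_N^{-2}\big)
        - \frac{cN}{(\log N)^{2/3}} \Big)
  = \exp\Big( O_\ell\Big(\frac{N\log\log N}{\log N}\Big)
        - \frac{cN}{(\log N)^{2/3}} \Big),
\]
and since $N/(\log N)^{2/3}$ grows faster than $N \log\log N/\log N$, the right
hand side tends to $0$ fast enough to be summable in $N$, so the
Borel--Cantelli lemma applies.
```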
Proof of Theorems 1.7 and 1.11. Let f_1, . . . , f_ℓ and h_1, . . . , h_ℓ be as in Theorem 1.11. Note that ω-almost surely the sequence (X_k(ω) ∏_{j=1}^{ℓ} f_j(k + h_j))_{k∈N} is non-null, since ω-almost surely |X_k(ω)| = 1, k ∈ N, and by assumption (∏_{j=1}^{ℓ} f_j(k + h_j))_{k∈N} is non-null. Likewise, if a : N → U is a non-null sequence and f : N → S is a multiplicative function, then ω-almost surely (a(k) X_k(ω) f(k))_{k∈N} is non-null. Since all fixed parameters that appear below take values in a countable set, by Proposition 2.7 (for Theorem 1.7) and Proposition 2.6 (for Theorem 1.11) it suffices to show that for every fixed b : N → S, all h, ℓ ∈ N, and all h_1, . . . , h_ℓ ∈ Z_+, we have ω-almost surely the following (for Theorem 1.7 we only need to use the case ℓ = 1, h_1 = 0): for all multiplicative functions f_1, . . . , f_ℓ : N → U we have

E_{n∈N} b(n) X_{n+h}(ω) · X_n(ω) ∏_{j=1}^{ℓ} f_j(n + h + h_j) ∏_{j=1}^{ℓ} \overline{f_j(n + h_j)} = 0.
(26) (Note that then (26) also holds with E^{log}_{n∈N} in place of E_{n∈N}.) We partition the positive integers into the following two sets:

S_1 := {n ∈ N : ⌊(n−1)/h⌋ is even},  S_2 := {n ∈ N : ⌊(n−1)/h⌋ is odd},

that is, S_1 and S_2 are unions of alternating blocks of h consecutive integers. We let Y_n(ω) := X_{n+h}(ω) · X_n(ω), n ∈ N.
Note that P(Y_n = −1) = P(Y_n = 1) = 1/2 for all n ∈ N. Moreover, for n ∈ S_1 (and fixed h ∈ N) the random variables Y_n(ω) are independent, and the same holds for the random variables Y_n(ω) for n ∈ S_2. For i = 1, 2 we consider independent random variables Z_{n,i}(ω), n ∈ N, such that P(Z_{n,i} = −1) = P(Z_{n,i} = 1) = 1/2, n ∈ N, and Z_{n,i} := Y_n for n ∈ S_i. For i = 1, 2, we apply Theorem 5.3 for the random variables (Z_{n,i}(ω))_{n∈N} and a_i(n) := b(n) 1_{S_i}(n) (then a_i(n) Z_{n,i} = b(n) 1_{S_i}(n) Y_n, n ∈ N), and deduce that ω-almost surely we have

E_{n∈N} 1_{S_i}(n) b(n) Y_n(ω) ∏_{j=1}^{ℓ} f_j(n + h + h_j) ∏_{j=1}^{ℓ} \overline{f_j(n + h_j)} = 0

for i = 1, 2. Adding the two identities we get (26). This completes the proof.
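The independence claim for (Y_n)_{n∈S_i} can be verified exactly in a small case (a sketch; it assumes the blocks-of-length-h partition S_1 = {n : ⌊(n−1)/h⌋ even}, which is one natural choice): for h = 2 the indices 1, 2, 5, 6 lie in S_1, and the products Y_n = X_{n+2} X_n involve pairwise disjoint variables, so (Y_1, Y_2, Y_5, Y_6) should be uniform on {−1, 1}^4.

```python
import itertools
from collections import Counter

h = 2
# blocks of length h with even block index: gives [1, 2, 5, 6]
S1 = [n for n in range(1, 7) if ((n - 1) // h) % 2 == 0]

counts = Counter()
# enumerate all 2^8 assignments of X_1, ..., X_8 in {-1, 1}
for xs in itertools.product((-1, 1), repeat=8):
    X = dict(enumerate(xs, start=1))
    counts[tuple(X[n + h] * X[n] for n in S1)] += 1

# Uniformity of (Y_1, Y_2, Y_5, Y_6): each of the 16 sign patterns occurs
# 2^8 / 2^4 = 16 times, which is exactly joint independence of four fair signs.
assert len(counts) == 16 and set(counts.values()) == {16}
print("(Y_n), n in S_1, are jointly independent for h = 2")
```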

Proof of Theorem 1.8
We will use the following finitistic strengthening of Theorem 1.1, which can be deduced from Theorem 1.1 using a compactness argument:

Theorem 5.4. For every C > 0 there exists m ∈ N such that for every sequence a : [m] → S there exist d, n ∈ N with dn ≤ m such that |∑_{k=1}^{n} a(dk)| > C.
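The compactness argument can be phrased as follows (a sketch):

```latex
Suppose the conclusion fails for some $C>0$: for every $m\in\mathbb{N}$ there is
$a_m\colon[m]\to\mathbb{S}$ with $\big|\sum_{k=1}^{n} a_m(dk)\big|\le C$ whenever
$dn\le m$. By compactness of $\mathbb{S}^{\mathbb{N}}$ (a diagonal argument), a
subsequence of $(a_m)$ converges pointwise to some
$a\colon\mathbb{N}\to\mathbb{S}$, which then satisfies
$\big|\sum_{k=1}^{n} a(dk)\big|\le C$ for all $d,n\in\mathbb{N}$, contradicting
Theorem 1.1.
```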
We deduce from this some necessary conditions for a sequence to be a good weight for the Erdős discrepancy problem.
Lemma 5.5. Let w : N → C be a sequence and c ∈ C \ {0}. Suppose that for infinitely many m ∈ N there exists r ∈ N such that

w(rm!/i + j) = c for all i, j ∈ {1, . . . , m}.    (27)
Then w is a good weight for the Erdős discrepancy problem.
Remark. The conclusion fails if we merely assume that w is equal to a non-zero constant on a union of arbitrarily long intervals. To see this, let (a(k))_{k∈N} be a completely multiplicative function that is equal to (−1)^k on a sequence of intervals whose lengths are even numbers increasing to infinity (such a multiplicative function can be constructed explicitly). Let also w be the indicator function of the union of this sequence of intervals. Then sup_{d,n∈N} |∑_{k=1}^{n} a(dk) w(k)| ≤ 1, since each complete interval contributes 0 to the sum and at most one partial interval contributes at most 1.
Proof. Let a : N → S be a sequence and C > 0. Let m ∈ N be such that Theorem 5.4 applies with the constant C/|c| and (27) holds for some c ∈ C \ {0} and r ∈ N. We apply Theorem 5.4 to the sequence (a(rm! + k))_{k∈[m]} and get that there exist d, n ∈ N, with dn ≤ m, such that

|∑_{k=1}^{n} a(rm! + dk)| > C/|c|.
We let