Ergodicity of the Liouville system implies the Chowla conjecture

The Chowla conjecture asserts that the values of the Liouville function form a normal sequence of plus and minus ones. Reinterpreted in the language of ergodic theory it asserts that the Liouville function is generic for the Bernoulli measure on the space of sequences with values plus or minus one. We show that these statements are implied by the much weaker hypothesis that the Liouville function is generic for an ergodic measure. We also give variants of this result related to a conjecture of Elliott on correlations of multiplicative functions with values on the unit circle. Our argument has an ergodic flavor and combines recent results in analytic number theory, finitistic and infinitary decomposition results involving uniformity seminorms, and qualitative equidistribution results on nilmanifolds.


Introduction and main results
1.1. Introduction. Let $\lambda\colon \mathbb{N} \to \{-1, 1\}$ be the Liouville function, which is defined to be $1$ on integers with an even number of prime factors, counted with multiplicity, and $-1$ elsewhere. It is generally believed that the values of the Liouville function enjoy various randomness properties, and one manifestation of this principle is an old conjecture of Chowla [5], which asserts that for all $\ell \in \mathbb{N}$ and all distinct $n_1, \ldots, n_\ell \in \mathbb{N}$ we have
(1) $\lim_{M\to\infty} \frac{1}{M} \sum_{m=1}^{M} \lambda(m + n_1) \cdots \lambda(m + n_\ell) = 0.$
The conjecture is known to be true only for $\ell = 1$; this case is elementarily equivalent to the prime number theorem. For $\ell = 2$ and for all odd values of $\ell \in \mathbb{N}$, a variant involving logarithmic averages was recently established by Tao [42] and by Tao and Teräväinen [44] respectively, and an averaged form of Chowla's conjecture was established by Matomäki, Radziwiłł, and Tao [36] using a recent breakthrough of Matomäki and Radziwiłł [35] concerning averages of bounded multiplicative functions on typical short intervals. For all $\ell \geq 2$ the conjecture remains open for Cesàro averages, and for all even $\ell \geq 4$ it remains open for logarithmic averages. It is a consequence of the previous results that all size three sign patterns are taken by consecutive values of $\lambda$ with positive lower density [37] (and in fact with logarithmic density $1/8$ [44]) and all size four sign patterns are taken with positive lower density [44]. Similar results are not known for longer patterns; in fact, out of the $2^\ell$ possible size $\ell$ sign patterns only $O(\ell)$ of them are known to be taken by consecutive values of $\lambda$ (the Chowla conjecture predicts that all $2^\ell$ patterns are taken, each one with density $2^{-\ell}$).
We can reinterpret the Chowla conjecture in the language of ergodic theory, hoping that this offers some appreciable advantage (a point of view also taken for example in [1,39]). Assuming for the moment that the limit on the left hand side of (1) exists for all $\ell \in \mathbb{N}$ and $n_1, \ldots, n_\ell \in \mathbb{N}$, we introduce in a natural way a dynamical system (see Proposition 2.3), which we call the "Liouville system". The Chowla conjecture implies that this system is a Bernoulli system, but up to now, randomness properties of the Liouville system that are much weaker than independence remain elusive. For instance, it is not known whether this system is of positive entropy, weakly mixing, or even ergodic.
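As a purely numerical illustration of the quantities discussed above (it proves nothing about the conjecture), the following minimal sketch computes the Liouville function with a smallest-prime-factor sieve, evaluates a finite Cesàro average of the type appearing in Chowla's conjecture, and counts sign patterns of consecutive values. The function names `liouville_sieve`, `correlation`, and `sign_pattern_counts`, as well as the cutoff `M`, are our own illustrative choices and do not come from the article.

```python
from collections import Counter


def liouville_sieve(limit):
    """Return a list lam with lam[n] = lambda(n) for 1 <= n <= limit (lam[0] is unused)."""
    spf = list(range(limit + 1))  # spf[n] will hold the smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for multiple in range(p * p, limit + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
    lam = [0] * (limit + 1)
    if limit >= 1:
        lam[1] = 1
    for n in range(2, limit + 1):
        # lambda is completely multiplicative and lambda(p) = -1 for every prime p
        lam[n] = -lam[n // spf[n]]
    return lam


def correlation(lam, shifts, M):
    """Finite Cesàro average (1/M) * sum_{m=1}^{M} prod_j lam[m + n_j]."""
    total = 0
    for m in range(1, M + 1):
        term = 1
        for n in shifts:
            term *= lam[m + n]
        total += term
    return total / M


def sign_pattern_counts(lam, length, M):
    """Count the sign patterns (lam[m], ..., lam[m + length - 1]) for m = 1, ..., M."""
    return Counter(tuple(lam[m:m + length]) for m in range(1, M + 1))


# Illustrative cutoff (an assumption of this sketch, not a value from the article).
M = 10_000
lam = liouville_sieve(M + 10)
pair_correlation = correlation(lam, (1, 2), M)
pattern_counts = sign_pattern_counts(lam, 3, M)
print(pair_correlation, len(pattern_counts))
```

The sieve relies only on complete multiplicativity, so the same structure would apply to any completely multiplicative function once its values at the primes are specified.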
We prove that the much weaker hypothesis of ergodicity implies Bernoullicity of the Liouville system and the Chowla conjecture: Theorem. If the Liouville system is ergodic, then the Chowla conjecture is satisfied.
Thinking of $\lambda$ as a point in the sequence space $\{-1, 1\}^{\mathbb{N}}$, we can reformulate this result using notation from [18, Definition 3.4] as follows: If the Liouville function is generic for an ergodic measure on the sequence space, then the Chowla conjecture is satisfied.
An implicit assumption made in these statements is that the Liouville function admits correlations for Cesàro averages on the integers. In Section 1.2.1 we give results that do not depend on such a strong hypothesis; we work with sequences of intervals with left end equal to 1 along which the Liouville function admits correlations for logarithmic averages (such sequences are guaranteed to exist), and our main result is Theorem 1.1, which states that ergodicity of the corresponding measure preserving system implies that the Chowla conjecture holds for logarithmic averages on the same sequence of intervals.
Three main ingredients enter the proof of Theorem 1.1:
(i) A recent result of Tao (see Theorem 1.7) enables us to reduce the Chowla conjecture for logarithmic averages to a local uniformity property of the Liouville function (this is the only reason why some of our statements involve logarithmic averages). Our goal then becomes to prove this uniformity property (stated in Theorem 1.6).
(ii) An inverse theorem for local uniformity seminorms, which takes a particularly useful form for ergodic sequences (see Theorem 4.1). In order to prove it we use both infinitary and finitary decomposition results (see Propositions 4.4 and 4.6). The former is proved via an ergodic inverse theorem of Host and Kra [29], and the latter via a finitistic inverse theorem of Green, Tao, and Ziegler [26]. The ergodicity of the sequence is essential; without this assumption we are led to conditions that we are unable to verify for the Liouville function.
(iii) An asymptotic orthogonality property of the Liouville function with nilsequences taken on typical short intervals (see Proposition 5.1); this is needed in order to verify that the aforementioned inverse theorem is applicable to the Liouville function. For Abelian nilsequences the orthogonality property follows from recent work of Matomäki, Radziwiłł, and Tao (see Proposition 2.10). For general nilsequences additional tools are needed; the heart of the argument is a result of purely dynamical nature (see Proposition 5.6), and the only extra number-theoretic input needed is the orthogonality criterion of Lemma 5.5.
Our argument also works for the Möbius function; hence, ergodicity of the Möbius function implies a related Chowla-type result, and as a consequence, it also implies a conjecture of Sarnak [39,40] stating that the Möbius function is uncorrelated with any bounded deterministic sequence.
Moreover, our argument shows that every ergodic strongly aperiodic multiplicative function (see Definition 2.9) is locally uniform (see Theorem 1.6). This last property implies an Elliott-type result for this larger class of multiplicative functions (see Theorem 1.4) and in turn gives non-correlation with any bounded deterministic sequence.
1.2. Main results. In this subsection we give the precise statements of our main results, modulo notation that appears in the next section. We let [N ] = {1, . . . , N }. Remarks. • Since for every ℓ ∈ N each size ℓ sign pattern is expected to be taken by consecutive values of λ, we cannot substitute the intervals [N k ] with intervals that do not start at 1. The same comment applies to the results of the next subsection.
• We stress that if we assume ergodicity of the Liouville system for Cesàro (instead of logarithmic) averages on I, our argument does not allow us to deduce that the Chowla conjecture is satisfied for Cesàro averages on I.
Since for every $a \in \ell^\infty(\mathbb{N})$ and $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$, there exists a subsequence $I'$ of $I$ on which the sequence $a$ admits correlations, we deduce from Theorem 1.1 the following: Corollary 1.2. Suppose that whenever the Liouville (or the Möbius) function admits correlations for logarithmic averages on a sequence of intervals, the induced measure preserving system is ergodic. Then the Liouville (resp. the Möbius) function satisfies the Chowla conjecture for logarithmic averages on $([N])_{N\in\mathbb{N}}$.
Since convergence of Cesàro averages on $I = ([N])_{N\in\mathbb{N}}$ implies convergence to the same limit of logarithmic averages on $I$, we deduce the result stated in the introduction.
Further analysis of structural properties of measure preserving systems naturally associated with the Liouville or the Möbius function appears in the recent article [15]. The direction taken in [15] is complementary to the one in this article, and the techniques used are very different.

1.2.2. Ergodicity and Elliott's conjecture. We give a variant of our main result which applies to correlations of arbitrary multiplicative functions with values on the unit circle. This relates to logarithmically averaged variants of conjectures made by Elliott in [9,10].
Theorem 1.4. Let $f_1 \in \mathcal{M}$ be a strongly aperiodic multiplicative function which is ergodic for logarithmic averages on $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$. Then for every $s \geq 2$, all $f_2, \ldots, f_s \in \mathcal{M}$ and all distinct $n_1, \ldots, n_s \in \mathbb{N}$, we have
(2) $\mathbb{E}^{\log}_{m\in I} f_1(m + n_1) \cdots f_s(m + n_s) = 0.$
Elliott conjectured that the conclusion holds for Cesàro averages without the ergodicity assumption and under the weaker assumption of aperiodicity (which coincides with strong aperiodicity for real valued multiplicative functions), but in [36, Theorem B.1] it was shown that for complex valued multiplicative functions a stronger assumption is needed, and strong aperiodicity seems to be the right one.
Specializing the previous result to the case $f_1 = \cdots = f_s = f$, where $f$ is an aperiodic multiplicative function taking values plus or minus one only (aperiodicity implies strong aperiodicity in this case), we deduce the following: Corollary 1.5. Let $f\colon \mathbb{N} \to \{-1, 1\}$ be an aperiodic multiplicative function which admits correlations on $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$, for logarithmic averages. Then the Furstenberg system induced by $f$ and $I$ for logarithmic averages is ergodic if and only if it is Bernoulli.

1.2.3.
Ergodicity and local uniformity. The key step taken in this article in order to prove Theorem 1.1, is to establish local uniformity for the class of ergodic strongly aperiodic multiplicative functions. The precise statement is as follows (the notions used are explained in Section 2): Theorem 1.6. Let f ∈ M be a strongly aperiodic multiplicative function which is ergodic for Cesàro (or logarithmic) averages on Remark. It is shown in [14] (and previously in [23,25,26] for the Möbius and the Liouville function) that if f is an aperiodic multiplicative function, then for every s ∈ N we have lim N →∞ f U s (Z N ) = 0 where · U s (Z N ) are the Gowers uniformity norms. It should be stressed though, that when I = ([N ]) N ∈N , the local uniformity condition f U s (I) = 0 is strictly stronger and cannot be inferred from Gowers uniformity for any s ≥ 2. For example, the (non-ergodic) sequence a(n) is ergodic for Cesàro averages on I and satisfies lim N →∞ b U s (Z N ) = 0 for every s ∈ N, but b U 2 (I) = 1.
For a sketch of the proof of Theorem 1.6 see Section 5.1. The link between Theorem 1.1 and Theorem 1.6 is given by the following result of Tao (it follows from [43, Theorem 1.8 and Remarks 1.9, 3.4]): Theorem 1.7 (Tao [43]). Let $s \in \mathbb{N}$, $f$ be the Liouville or the Möbius function, and suppose that $f$ admits correlations for logarithmic averages on $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$. If $\|f\|_{U^s_{*,\log}(I)} = 0$, then $f$ satisfies the logarithmic Chowla conjecture on $I$ for correlations involving $s + 1$ terms.
Remarks. • The equivalence is proved in [43] only when $N_k = k$, $k \in \mathbb{N}$, but the argument in [43] also gives the stated result.
• An extension of this result that covers more general multiplicative functions is suggested in [43, Remarks 1.10 and 3.5]. We give a related result in Theorem 1.8 below.
• The two main ingredients used in the proof of Theorem 1.7 are a newly devised "entropy decrement" argument from [42] and the Gowers uniformity of the $W$-tricked von Mangoldt function established in [23,24,26].
In order to obtain Theorem 1.4 we use the following variant of the previous result, which is established in Section 2.7. The starting point of the proof is an identity for general sequences (see Proposition 2.11) which is implicit in [42]. Theorem 1.8. Let $f_1 \in \mathcal{M}$ be a multiplicative function which admits correlations for logarithmic averages on $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$, and satisfies $\|f_1\|_{U^s_{\log}(I)} = 0$ for some $s \geq 2$. Then $\mathbb{E}^{\log}_{m\in I} f_1(m + n_1) \cdots f_s(m + n_s) = 0$ holds for all $f_2, \ldots, f_s \in \mathcal{M}$ and distinct $n_1, \ldots, n_s \in \mathbb{N}$.
1.3. A problem. The previous results motivate the following problem: Problem. Let $f \in \mathcal{M}$ be a strongly aperiodic multiplicative function which admits correlations for Cesàro (or logarithmic) averages on $I = ([N_k])_{k\in\mathbb{N}}$, $N_k \to \infty$. Then the sequence $(f(n))_{n\in\mathbb{N}}$ is ergodic on $I$ for Cesàro (corr. logarithmic) averages.
Remark. In fact, it seems likely that every real valued bounded multiplicative function is ergodic for Cesàro averages on I = ([N ]) N ∈N .
A solution to this problem for logarithmic averages for the Liouville (or the Möbius) function, coupled with Corollary 1.2, would imply that the Liouville (or the Möbius) function satisfies the Chowla conjecture, and hence the Sarnak conjecture, for logarithmic averages. It would also imply that all possible sign patterns are taken by consecutive values of $\lambda$, each size $\ell$ pattern with logarithmic density $2^{-\ell}$, and as a consequence, with upper natural density greater than $2^{-\ell}$.
Currently, we cannot even exclude the (unlikely) possibility that $\lambda$ is generic for a measure on $\{-1, 1\}^{\mathbb{N}}$ which induces a system whose ergodic components are circle rotations.

Background, notation, and tools
In this section we define some concepts used throughout the article.
2.1. Cesàro and logarithmic averages. Recall that for N ∈ N we let [N ] = {1, . . . , N }. If A is a finite non-empty subset of N and a : A → C, then we define the • Cesàro average of (a(n)) n∈A on A to be E n∈A a(n) := 1 |A| n∈A a(n); • logarithmic average of (a(n)) n∈A on A to be We say that the sequence of intervals I = (I N ) N ∈N is a Følner sequence for If a : N → C is a bounded sequence, and I = (I N ) N ∈N is a Følner sequence of intervals for Cesàro or logarithmic averages, we define the • Cesàro mean of (a(n)) n∈N on I to be if the limit exists; • logarithmic mean of (a(n)) n∈N on I to be if the limit exists. • If the previous mean values exist for every Følner sequence of intervals I, then we denote the common mean value by E n∈N a(n) and E log n∈N a(n) respectively. Note that all these mean values are shift invariant, meaning, for every a ∈ ℓ ∞ (N) and h ∈ N the sequences (a(n)) n∈N and (a(n + h)) n∈N have the same Cesàro/logarithmic mean on I.
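Several displayed formulas in this subsection did not survive extraction. The following sketch records the usual normalizations of the averages and means referred to above; the Cesàro average agrees with the formula given in the text, while the logarithmic average and the two means are written in their standard form, which we assume (but cannot confirm from the surviving text) is the one intended here.

```latex
% Cesàro and logarithmic averages over a finite non-empty set A \subset \mathbb{N}
\mathbb{E}_{n\in A}\, a(n) := \frac{1}{|A|}\sum_{n\in A} a(n),
\qquad
\mathbb{E}^{\log}_{n\in A}\, a(n) := \frac{1}{\sum_{n\in A}\frac{1}{n}}\sum_{n\in A}\frac{a(n)}{n}.

% Corresponding means along I = (I_N)_{N\in\mathbb{N}}, whenever the limits exist
\mathbb{E}_{n\in I}\, a(n) := \lim_{N\to\infty}\mathbb{E}_{n\in I_N}\, a(n),
\qquad
\mathbb{E}^{\log}_{n\in I}\, a(n) := \lim_{N\to\infty}\mathbb{E}^{\log}_{n\in I_N}\, a(n).
```

With these normalizations, both averages of a constant sequence equal that constant, which is the sense in which the two schemes are comparable.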
It is easy to see using partial summation, that if (a(n)) n∈N has a mean value on Definition 2.1. Let I = (I N ) N ∈N be a sequence of intervals with |I N | → ∞. We say that the sequence a ∈ ℓ ∞ (N) satisfies the Chowla conjecture for Cesàro averages on I if E m∈I a(c 1 m + n 1 ) · · · a(c s m + n s ) = 0, for all s ∈ N, c 1 , . . . , c s ∈ N, and non-negative integers n 1 , . . . , n s such that c i n j = c j n i for all i = j.
Similar definitions apply for logarithmic averages and when we restrict the number of terms in the product.

2.2. Measure preserving systems. A measure preserving system, or simply a system, is a quadruple $(X, \mathcal{X}, \mu, T)$ where $(X, \mathcal{X}, \mu)$ is a probability space and $T\colon X \to X$ is an invertible measure preserving transformation. The system is ergodic if the only sets that are left invariant by $T$ have measure 0 or 1. The von Neumann ergodic theorem states that for ergodic systems we have $\lim_{N\to\infty} \mathbb{E}_{n\in I_N} \int T^n F \cdot G \, d\mu = \int F \, d\mu \int G \, d\mu$ for every sequence of intervals $(I_N)_{N\in\mathbb{N}}$ with $|I_N| \to \infty$ and functions $F, G \in L^2(\mu)$. In the previous statement and throughout, with $TF$ we denote the composition $F \circ T$.

2.3. Ergodicity of sequences.
To each bounded sequence that is distributed "regularly" along a sequence of intervals with lengths increasing to infinity, we associate a measure preserving system; the notion of ergodicity of this sequence is then naturally inherited from the corresponding property of the system. Definition 2.2. Let $I := (I_N)_{N\in\mathbb{N}}$ be a sequence of intervals with $|I_N| \to \infty$. We say that the sequence $a \in \ell^\infty(\mathbb{N})$ admits correlations for Cesàro averages on $I$ if the limit $\lim_{N\to\infty} \mathbb{E}_{m\in I_N} b_1(m + n_1) \cdots b_s(m + n_s)$ exists for every $s \in \mathbb{N}$, $n_1, \ldots, n_s \in \mathbb{N}$ (not necessarily distinct), and all sequences $b_1, \ldots, b_s$ that belong to the set $\{a, \bar{a}\}$.
A similar definition applies for logarithmic averages; in place of E m∈I N use E log m∈I N . Remark. If a ∈ ℓ ∞ (Z), then using a diagonal argument we get that any sequence of intervals I = (I N ) N ∈N has a subsequence I ′ = (I N k ) k∈N , such that the sequence (a(n)) n∈N admits correlations on I ′ .
The correspondence principle of Furstenberg was originally used in [17] in order to translate Szemerédi's theorem on arithmetic progressions to an ergodic statement. We will use the following variant, which applies to general bounded sequences: Proposition 2.3. Let $a \in \ell^\infty(\mathbb{N})$ be a sequence that admits correlations for Cesàro averages on the sequence of intervals $I := (I_N)_{N\in\mathbb{N}}$ with $|I_N| \to \infty$. Then there exist a system $(X, \mathcal{X}, \mu, T)$ and a function $F \in L^\infty(\mu)$ such that $\mathbb{E}_{m\in I}\, a_1(m + n_1) \cdots a_s(m + n_s) = \int T^{n_1} F_1 \cdots T^{n_s} F_s \, d\mu$ for every $s \in \mathbb{N}$, $n_1, \ldots, n_s \in \mathbb{N}$, where for $j = 1, \ldots, s$ the sequence $a_j$ is either $a$ or $\bar{a}$ and $F_j$ is $F$ or $\bar{F}$ respectively. A similar statement holds for logarithmic averages.
Remark. For sequences bounded by 1, in the previous correspondence, X, X , T , and F can be taken to be fixed, and it is only the measure µ that varies. Furthermore, the system constructed is uniquely determined up to isomorphism by the pair (a, I).
Proof. Let $X := D^{\mathbb{Z}}$, where $D$ is the closed disk in $\mathbb{C}$ of radius $\|a\|_\infty$, be endowed with the product topology and with the invertible and continuous shift $T$ given by $(Tx)(k) = x(k + 1)$, $k \in \mathbb{Z}$. We define $F \in C(X)$ by $F(x) := x(0)$ and $\omega \in D^{\mathbb{Z}}$ by $\omega(k) := a(k)$ for $k \in \mathbb{N}$ and $\omega(k) = 0$ for $k \leq 0$. Lastly, we let $\mu$ be a $w^*$-limit point for the sequence of measures $\mu_N := \frac{1}{|I_N|} \sum_{n\in I_N} \delta_{T^n \omega}$, $N \in \mathbb{N}$. Then $\mu$ is a $T$-invariant probability measure on $X$, and since $F(T^n \omega) = a(n)$ for $n \in \mathbb{N}$ and $(a(n))_{n\in\mathbb{N}}$ admits correlations for Cesàro averages on $I$, the asserted identity follows immediately.
Definition 2.4. Let a ∈ ℓ ∞ (N) be a sequence that admits correlations for Cesàro averages on the sequence of intervals I := (I N ) N ∈N with |I N | → ∞. We call the system defined in Proposition 2.3 the Furstenberg system induced by a and I for Cesàro averages.
A similar definition applies for logarithmic averages.
Remarks. • A priori a sequence $a \in \ell^\infty(\mathbb{Z})$ may have uncountably many non-isomorphic Furstenberg systems depending on which sequence of intervals $I$ we choose to work with. Furthermore, for fixed $(a, I)$ the Furstenberg systems associated with Cesàro and logarithmic averages could be very different.
• If we assume that the Liouville function admits correlations on ([N ]) N ∈N , then the corresponding Furstenberg system is the Liouville system alluded to in the introduction.
Definition 2.5. Let I = (I N ) N ∈N be a sequence of intervals with |I N | → ∞. We say that a sequence a ∈ ℓ ∞ (N) is ergodic for Cesàro averages on I if (i) it admits correlations for Cesàro averages on I; (ii) the induced measure preserving system for Cesàro averages is ergodic. A similar definition applies for logarithmic averages and we say that a ∈ ℓ ∞ (N) is ergodic for logarithmic averages on I.
Note that condition (ii) for Cesàro averages is equivalent to having the identities for all b, c ∈ ℓ ∞ (N) of the form b(m) = a 1 (m + h 1 ) · · · a s (m + h s ), m ∈ N, for some s ∈ N, non-negative integers h 1 , . . . , h s , and a i ∈ {a, a}, and similarly for (c(m)) m∈N . For logarithmic averages a similar condition holds with E m∈I replaced by E log m∈I .
2.4. Ergodic seminorms and the factors Z s . Following [29], if (X, X , µ, T ) is a system we define the Host-Kra seminorms of F ∈ L ∞ (µ) inductively by for s ∈ N, where the implicit limits defining the mean values E h∈N are known to exist by [29]. It is also shown in the same article that for every We are going to use the following important structure theorem (nilsystems are defined in Section 3.1): Theorem 2.6 (Host, Kra [29]). Let (X, X , µ, T ) be an ergodic system and s ∈ N. Then the system (X, Z s , µ, T ) is an inverse limit of s-step nilsystems.
The last property means that there exist T -invariant sub-σ-algebras Z s,n , n ∈ N, that span Z s , such that for every n ∈ N the factor system associated with Z s,n is isomorphic to an s-step nilsystem.
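The inductive formulas defining the Host-Kra seminorms in Section 2.4 were lost in extraction. As a sketch, one standard way to write them, which we assume (without being able to confirm it from the surviving text) matches the intended definition, is the following.

```latex
% One standard form of the Host-Kra seminorms of F \in L^\infty(\mu)
|||F|||_{1}^{2} := \mathbb{E}_{h\in\mathbb{N}} \int \overline{F}\cdot T^{h}F \, d\mu,
\qquad
|||F|||_{s+1}^{2^{s+1}} := \mathbb{E}_{h\in\mathbb{N}}\, |||\overline{F}\cdot T^{h}F|||_{s}^{2^{s}},
\quad s\in\mathbb{N}.

% For an ergodic system the von Neumann ergodic theorem gives
|||F|||_{1} = \Bigl|\int F \, d\mu\Bigr|.
```

The ergodic case follows because the average over $h$ of $\int \overline{F}\cdot T^{h}F \, d\mu$ then converges to $\int \overline{F}\, d\mu \int F \, d\mu$.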
2.5. Local uniformity seminorms. Let I = (I N ) N ∈N be a sequence of intervals with |I N | → ∞ and a ∈ ℓ ∞ (N) be a sequence that admits correlations for Cesàro averages on I. Following [30], we define the uniformity seminorms a U s (I) inductively as follows: It is not immediately clear that all the iterative limits defining the above averages exist. This can be proved by reinterpreting these seminorms in ergodic terms using the measure preserving system (X, X , µ, T ) and the function F ∈ L ∞ (µ) induced by a and I. We then have a U s (I) = |||F ||| s where |||F ||| s is defined as in Section 2.4. Using the ergodic reinterpretation and [29, Theorem 1.2] we deduce the identity and for s ∈ N we let 0 := (0, . . . , 0), and for ǫ = (ǫ 1 , . . . , ǫ s ) we let |ǫ| := ǫ 1 + · · · + ǫ s . Furthermore, the limit E h∈N s can be defined using averages taken over arbitrary Følner sequences of subsets of N s , or can be taken to be the iterative limit E hs∈N · · · E h 1 ∈N . All these limits exist and are equal; this follows from [29,Theorem 1.2]. It is shown in [29] that |||F ||| s ≤ |||F ||| s+1 for every F ∈ L ∞ (µ) and s ∈ N; we deduce that In a similar fashion, if a ∈ ℓ ∞ (N) admits correlations for logarithmic averages on I, we define the uniformity seminorms for logarithmic averages a U s log (I) as follows: All implicit limits defining the mean values E h∈N can be shown to exist. Note that in the definition of the uniformity seminorms for logarithmic averages only the inner-most average is logarithmic, the others can be given by any shift invariant averaging scheme we like. For example, we have We also use variants of some local uniformity seminorms introduced by Tao in [43] when I = ([N ]) N ∈N . For s ∈ N, a ∈ ℓ ∞ (N), and Følner sequence of intervals I, we let (where (S n a)(m) := a(n + m)) and We used that and a U s (Z N ) are the Gowers uniformity norms. 
These were defined in [20] as follows: , where for N ∈ N we use the periodic extension of a · 1 [N ] to Z N in the previous computations, or equivalently, we define (S h a)(n) := a(n + h mod N ) for n ∈ Z N . Proposition 2.7. Let s ∈ N. If a ∈ ℓ ∞ (N) is a sequence that admits correlations for Cesàro averages on the sequence of intervals I, then a U s * (I) ≤ 4 a U s (I) . A similar statement holds for logarithmic averages on I and the corresponding estimate is a U s * ,log (I) ≤ 4 a U s log (I) . Proof. Let H ∈ N and a : Z H → C be bounded by 1. Then arguing as in the proof of Proposition 3.2 in [6], we get that for every H 1 , . . . , H s ∈ N the following estimate holds and we extend it periodically to Z 2H . Using the definition of a U s [H] in conjunction with the estimate (5), applied to a H , and using that where the sums h+ǫ·h are taken in N. Using this estimate for the sequence S n a, averaging over n ∈ I N , taking N → ∞, and then making the change of variables n → n − h, we get that for every H, H 1 , . . . , H s ∈ N we have Finally, recall that a U s . Thus, if on the last estimate we take H → ∞ and then let H s → ∞, . . . , H 1 → ∞, we get that This proves the first estimate. The proof of the second estimate is similar.
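To make the Gowers norms on $\mathbb{Z}_N$ more concrete, the following minimal sketch computes $\|a\|_{U^2(\mathbb{Z}_N)}^4$ in two ways: from the inductive definition with the periodic shift $(S_h a)(n) = a(n + h \bmod N)$ described above, and from the classical identity expressing it as the sum of fourth powers of the normalized Fourier coefficients. The Fourier identity is a standard fact that is not stated in the text, and the function names and test sequences are hypothetical choices made for this illustration.

```python
import cmath


def u2_from_definition(a):
    """Return ||a||_{U^2(Z_N)}^4 = E_h |E_n a(n + h mod N) * conj(a(n))|^2, with N = len(a)."""
    N = len(a)
    total = 0.0
    for h in range(N):
        inner = sum(a[(n + h) % N] * complex(a[n]).conjugate() for n in range(N)) / N
        total += abs(inner) ** 2
    return total / N


def u2_from_fourier(a):
    """Return sum_xi |hat a(xi)|^4, where hat a(xi) = E_n a(n) * exp(-2*pi*i*n*xi/N)."""
    N = len(a)
    total = 0.0
    for xi in range(N):
        coeff = sum(a[n] * cmath.exp(-2j * cmath.pi * n * xi / N) for n in range(N)) / N
        total += abs(coeff) ** 4
    return total


# Illustrative inputs (assumptions of this sketch): a character of Z_16 and a short sign sequence.
N = 16
character = [cmath.exp(2j * cmath.pi * 3 * n / N) for n in range(N)]
signs = [1, -1, -1, 1, -1, 1, -1, -1]
print(u2_from_definition(character), u2_from_fourier(character))
print(u2_from_definition(signs), u2_from_fourier(signs))
```

Both routines take quadratic time in $N$; they are meant for checking small cases by hand, not for large-scale computation.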
It is called completely multiplicative if the previous identity holds for every m, n ∈ N. We let M := {f : N → C is multiplicative, bounded, and |f (p)| = 1 for every p ∈ P}.
We say that f ∈ M is aperiodic (or non-pretentious following [21]) if The uniformity result stated in the introduction (see Theorem 1.6) holds for a class of multiplicative functions that satisfy a condition introduced in [36] which is somewhat stronger than aperiodicity. In order to state it we need the notion of the distance between two multiplicative functions defined as in [21]: Definition 2.8. Let P be the set of primes. We let D : M × M → [0, ∞] be given by We also let D : and A celebrated theorem of Halász [27] states that a multiplicative function f ∈ M has zero mean value if and only if for every t ∈ R we either have For our purposes we need information on averages of multiplicative functions taken on typical short intervals. Such results were obtained in [35,36], under conditions that motivate the following definition: Note that strong aperiodicity implies aperiodicity. The converse is not in general true (see [36,Theorem B.1]), but it is true for (bounded) real valued multiplicative functions (see [36,Appendix C]). In particular, the Liouville and the Möbius function are strongly aperiodic. Furthermore, if f ∈ M satisfies (i) f (p) is a d-th root of unity for all but finitely many primes p, and (ii) D(f, χ) = ∞ for every Dirichlet character χ, then f is strongly aperiodic (see [13, Proposition 6.1]).
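The displayed formulas of Definition 2.8 were lost in extraction. As a sketch, the distance between multiplicative functions is usually written as follows; we assume this standard form (and its truncation to primes up to $N$) is what the definition intends, and we add a one-line worked instance for the Liouville function.

```latex
% Standard form of the distance between f, g \in \mathcal{M} and of its truncation
\mathbb{D}(f,g)^{2} := \sum_{p\in\mathbb{P}} \frac{1}{p}\Bigl(1 - \operatorname{Re}\bigl(f(p)\,\overline{g(p)}\bigr)\Bigr),
\qquad
\mathbb{D}(f,g;N)^{2} := \sum_{p\in\mathbb{P},\ p\le N} \frac{1}{p}\Bigl(1 - \operatorname{Re}\bigl(f(p)\,\overline{g(p)}\bigr)\Bigr).

% Worked instance: \lambda(p) = -1 for every prime p, hence
\mathbb{D}(\lambda, 1; N)^{2} = \sum_{p\le N} \frac{2}{p} \longrightarrow \infty
\quad (N\to\infty).
```

The divergence in the worked instance is just the divergence of the sum of reciprocals of the primes.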
We will need the following result; a quantitative variant of which is implicit in [36] (the stated version is also deduced from [ [36]). Let f ∈ M be a strongly aperiodic multiplicative function that admits correlations for Cesàro averages on the sequence of A similar statement also holds for logarithmic averages on I.
Remark. It follows from [36, Theorem B.1] that strong aperiodicity cannot be replaced by aperiodicity; in particular, there exist an aperiodic multiplicative function f ∈ M, a positive constant c, and a sequence of intervals

2.7.
Local uniformity implies the Elliott conjecture. In this subsection we prove Theorem 1.8 by adapting the argument in [42] which deals with the case where all the multiplicative functions are equal to the Liouville or the Möbius function. In what follows, if (a(p)) p∈P is a sequence indexed by the primes, we denote by E p∈P a(p) the limit lim N →∞ log N N p≤N a(p) if it exists. Our starting point is the following identity which is implicit in [42] and its proof was sketched in [15, Appendix C] (see also [44,Theorem 3.6] for a variant of this identity): k∈N be a sequence of intervals with N k → ∞, (c p ) p∈P be a bounded sequence of complex numbers, s ∈ N, a 1 , . . . , a s ∈ ℓ ∞ (N), and n 1 , . . . , n s ∈ N. Then, assuming that on the left and right hand side below the limit E log m∈I exists for every p ∈ P and the limit E p∈P exists, we have the identity We deduce from this the following identity for multiplicative functions: . . , f s ∈ M, and n 1 , . . . , n s ∈ N. Suppose that for every p ∈ P on the left and right hand side below the limit E log m∈I exists and the limit E p∈P exists. Suppose further that the limit E log m∈I s j=1 f j (pm + pn j ) exists for every p ∈ P. Then we have the identity Proof. For p ∈ P and j = 1, . . . s, we have f j (p(m + n j )) = f j (p) f j (m + n j ) unless m ≡ −n j (mod p). Hence, where the implicit constant depends only on s and on the sup-norm of f 1 , . . . , f s . Averaging over p ∈ P we get Applying Proposition 2.11, we get the asserted identity.
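The first step of the proof above uses that $f(p(m + n)) = f(p) f(m + n)$ can fail only when $p$ divides $m + n$, so the exceptional set has density at most about $1/p$. The following minimal sketch checks this numerically for the Möbius function. The helper names `mobius_sieve` and `exceptional_fraction` and the parameters `M` and `shift` are hypothetical choices made for this illustration.

```python
def mobius_sieve(limit):
    """Return a list mu with mu[n] equal to the Möbius function at n for 1 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for multiple in range(p, limit + 1, p):
                if multiple > p:
                    is_prime[multiple] = False
                mu[multiple] = -mu[multiple]  # one sign flip per prime divisor
            for multiple in range(p * p, limit + 1, p * p):
                mu[multiple] = 0  # integers divisible by a square get value 0
    return mu


def exceptional_fraction(mu, p, shift, M):
    """Fraction of m in [1, M] with mu(p * (m + shift)) != mu(p) * mu(m + shift)."""
    bad = sum(
        1
        for m in range(1, M + 1)
        if mu[p * (m + shift)] != mu[p] * mu[m + shift]
    )
    return bad / M


# Illustrative parameters (assumptions of this sketch, not values from the article).
M, shift = 1000, 3
mu = mobius_sieve(7 * (M + shift))
for p in (2, 3, 5, 7):
    print(p, exceptional_fraction(mu, p, shift, M))
```

For a completely multiplicative function such as the Liouville function the exceptional set is empty, which is why the error term only matters for general elements of the class of multiplicative functions considered here.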
We will also need the following multiple ergodic theorem: Proof. It suffices to show that For w ∈ N let W denote the product of the first w primes. Following the proof of [16, Theorem 1.3] (which uses the Gowers uniformity of the W -tricked von Mangoldt function established in [23,24,26]) we get that the average on the left hand side is equal to In order to show that this limit vanishes, it suffices to show that for all distinct l 1 , . . . , l s ∈ N and for arbitrary k 1 , . . . , k s ∈ N we have It follows from [32, Theorem A.8] (for s ≥ 3, but a simple argument works for s = 2) that in order to establish this identity it suffices to show that s,T and by assumption we have |||F 1 ||| s,T = 0. This completes the proof.
Proof of Theorem 1.8. Arguing by contradiction, suppose that the conclusion fails. Then there exist multiplicative functions f 2 , . . . , f s ∈ M, distinct n 1 , . . . , n s ∈ N, and a subse- exists for every s, k 1 , . . . , k s , l 1 , . . . , l s ∈ N, and g 1 , . . . , g s ∈ {a 1 , . . . , a s , a 1 , . . . , a s }, and such that Using Corollary 2.12, we will get a contradiction if we show that In order to prove this identity we will reinterpret it in ergodic terms. Using a variant of Proposition 2.3 which applies to several sequences (see [13,Proposition 3.3]) we get that there exist a system (X, X , µ, T ) and functions F 1 , . . . , F s ∈ L ∞ (µ) such that holds for every p ∈ P and f 1 U s log (I ′ ) = |||F 1 ||| s . Since by assumption f 1 U s log (I) = 0 and I ′ is a subsequence of I, we have f 1 U s log (I ′ ) = 0. Hence, |||F 1 ||| s = 0, and Proposition 2.13 gives that We deduce from this and identity (7) that (6) holds. This completes the proof.
3. Nilmanifolds, nilcharacters, and nilsequences
3.1. Nilmanifolds. If $G$ is a group we let $G_1 := G$ and $G_{j+1} := [G, G_j]$, $j \in \mathbb{N}$. We say that $G$ is $s$-step nilpotent if $G_{s+1}$ is the trivial group. An $s$-step nilmanifold is a homogeneous space $X = G/\Gamma$, where $G$ is an $s$-step nilpotent Lie group and $\Gamma$ is a discrete cocompact subgroup of $G$. An $s$-step nilsystem is a system of the form $(X, \mathcal{G}/\Gamma, m_X, T_b)$, where $X = G/\Gamma$ is an $s$-step nilmanifold, $b \in G$, $T_b\colon X \to X$ is defined by $T_b(g\Gamma) := (bg)\Gamma$, $g \in G$, $m_X$ is the normalized Haar measure on $X$, and $\mathcal{G}/\Gamma$ is the completion of the Borel $\sigma$-algebra of $G/\Gamma$. We call the map $T_b$ or the element $b$ a nilrotation. Henceforth, we assume that every nilsystem is equipped with a fixed Riemannian metric $d$. If $\Psi$ is a function on $X$ we let $\|\Psi\|_{\mathrm{Lip}(X)} := \sup_{x\in X} |\Psi(x)| + \sup_{x,y\in X,\, x\neq y} \frac{|\Psi(x) - \Psi(y)|}{d(x,y)}$. With $\mathrm{Lip}(X)$ we denote the set of all functions $\Psi\colon X \to \mathbb{C}$ with bounded $\mathrm{Lip}(X)$-norm.
If H is a closed subgroup of G, then it is shown in [33, Section 2.2] that the following three properties are equivalent: For any such H, the nilmanifold H/(H ∩ Γ) is called a sub-nilmanifold of X.
With G 0 we denote the connected component of the identity element in G (this is a normal subgroup of G). If the nilsystem is ergodic, then since G ′ : For every ergodic nilsystem we will use such a representation for X and thus assume that G = G 0 , b . This implies (see for example [3,Theorem 4.1]) that for j ≥ 2 all the commutator subgroups G j are connected.
Throughout the article, in the case of ergodic nilsystems, we are going to use these properties without further reference.

3.2.
Equidistribution. Let X = G/Γ be a nilmanifold. We say that a sequence g : where m X denotes the normalized Haar measure on X.
It is proved in [34] (see also [33]) that for every $b \in G$ the set $Y = \overline{\{b^n \cdot e_X,\ n \in \mathbb{N}\}}$ is a sub-nilmanifold of $X$, the nilrotation $b$ acts ergodically on $Y$, and the sequence $(b^n \cdot y)_{n\in\mathbb{N}}$ is equidistributed in $Y$ for every $y \in Y$. Furthermore, we can represent $Y$ as $Y = H/\Delta$ where $H$ is a closed subgroup of $G$ that contains the element $b$ (see the remark following [33, Theorem 2.21]). If $Y$ is connected, then for every $k \in \mathbb{N}$ the nilrotation $b^k$ acts ergodically on $Y$. If $Y$ is not connected, then there exists $r \in \mathbb{N}$ such that the nilrotation $b^r$ acts ergodically on the connected component $Y_0$ of $e_X$ in $Y$.

3.3.
Vertical nilcharacters on X and on X 0 . Let s ∈ N and X = G/Γ be a (not necessarily connected) s-step nilmanifold and suppose that G = G 0 , b for some b ∈ G. If s ≥ 2, then G s is connected and the group K s := G s /(G s ∩Γ) is a finite dimensional torus (perhaps the trivial one). Let K s be the dual group of K s ; it consists of the characters of G s that are If χ is a non-trivial character of K s , we say that Φ is a non-trivial vertical nilcharacter, otherwise we say that it is a trivial vertical nilcharacter. The linear span of vertical nilcharacters with modulus 1 is dense in C(X) with the uniform norm.
If the nilmanifold X is not connected, let X 0 be the connected component of e X in X. We claim that for s ≥ 2 the restriction of a non-trivial vertical nilcharacter Φ of X onto X 0 is a non-trivial vertical nilcharacter of X 0 with the same frequency. To see this, note first that since (G 0 Γ)/Γ is a non-empty closed and open subset of the connected space X 0 , we have X 0 = (G 0 Γ)/Γ. It thus suffices to show that (G 0 Γ) s = G s . To this end, let r be the smallest integer such that b r ∈ G 0 Γ. Then G/(G 0 Γ) is isomorphic to the cyclic group Z r . By induction, for every k ≥ 1 and all g 1 , . . . , g k ∈ G, we have , . . . , g r k ] mod G k+1 . Letting k = s and using that G s+1 is trivial, we get for all g 1 , . . . , g s ∈ G that , . . . , g r s ] ∈ (G 0 Γ) s . Using this and because G s is Abelian and spanned by the elements [[. . . [g 1 , g 2 ], g 3 ], . . . , g s ], we deduce that for every h ∈ G s we have h r s ∈ (G 0 Γ) s . Since G s is connected for s ≥ 2 it is divisible, hence the map h → h r s is onto, and we conclude that (G 0 Γ) s = G s .
3.4. Nilsequences. Following [3] we define: Definition 3.1. If X = G/Γ is an s-step nilmanifold, F ∈ C(X), and b ∈ G, we call the sequence (F (b n · e X )) n∈N an s-step nilsequence (we omit the adjective "basic"). A 0-step nilsequence is a constant sequence.
Remarks. • As remarked in Section 3.2, the closure Y of the set {b n · e X , n ∈ N} is a sub-nilmanifold of X that can be represented as Y = H/∆ for some closed subgroup H of G with b ∈ H. Thus, upon replacing X with Y we can assume that b is an ergodic nilrotation of X.
• For every x = gΓ ∈ X, the sequence (F (b n x)) n∈N is a nilsequence, as it can be represented in the form (F ′ (b ′n · e X )) n∈N where b ′ := g −1 bg and F ′ (x) := F (gx), x ∈ X.
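The change of base point in the last remark can be checked in one line. The following sketch uses only the definitions b ′ := g −1 bg and F ′ (y) := F (gy) and the identity (g −1 bg) n = g −1 b n g.

```latex
\begin{align*}
b^{n} x &= b^{n} g\,\Gamma = g\,(g^{-1} b g)^{n}\,\Gamma = g \cdot \big(b'^{\,n} \cdot e_X\big),
   \qquad b' := g^{-1} b g,\\
F(b^{n} x) &= F\big(g \cdot (b'^{\,n} \cdot e_X)\big) = F'\big(b'^{\,n} \cdot e_X\big),
   \qquad F'(y) := F(g y).
\end{align*}
```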
3.4.1. Nilsequences of bounded complexity. To every nilmanifold X (equipped with a Riemannian metric) we associate a class of nilsequences of "bounded complexity" which will be used in the formulation of the inverse theorem in the next section.
Definition 3.2. Let X = G/Γ be a nilmanifold. We let Ψ X be the set of nilsequences of the form (Ψ(b n x)) n∈N where b ∈ G, x ∈ X, and Ψ ∈ Lip(X) satisfies ‖Ψ‖ Lip(X) ≤ 1.
Remark. Although Ψ X is not an algebra, there exists a nilmanifold X ′ (take X ′ = X ×X with a suitable Riemannian metric) such that Ψ X ′ contains the sum and the product of any two elements of Ψ X . We will often use this observation without further notice.
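A sketch of the algebraic part of this observation follows. The names ψ 1 , ψ 2 , Ψ ⊕ , Ψ ⊗ are introduced here for illustration only; the normalization of the Lipschitz norm on X ′ is not addressed and is left to the choice of metric mentioned in the remark.

```latex
% Hypothetical notation: \psi_j(n) = \Psi_j(b_j^n x_j), j = 1, 2, are two elements of \Psi_X.
% X' = X \times X = (G \times G)/(\Gamma \times \Gamma), with the rotation (b_1, b_2).
\begin{align*}
(b_1, b_2)^{n} \cdot (x_1, x_2) &= (b_1^{n} x_1,\; b_2^{n} x_2),\\
\psi_1(n) + \psi_2(n) &= \Psi_{\oplus}\big((b_1, b_2)^{n} \cdot (x_1, x_2)\big),
   & \Psi_{\oplus}(y_1, y_2) &:= \Psi_1(y_1) + \Psi_2(y_2),\\
\psi_1(n)\,\psi_2(n) &= \Psi_{\otimes}\big((b_1, b_2)^{n} \cdot (x_1, x_2)\big),
   & \Psi_{\otimes}(y_1, y_2) &:= \Psi_1(y_1)\,\Psi_2(y_2).
\end{align*}
```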

3.4.2. Approximation by multiple-correlation sequences. The next lemma will help us establish certain anti-uniformity properties of nilsequences that will be needed later. It is a consequence of [12, Proposition 2.4].
Lemma 3.3. Let s ∈ N and X be an s-step nilmanifold. Then for every ε, L > 0 there exists M = M (ε, X, L) such that the following holds: If ψ ∈ L · Ψ X , then there exist a system (Y, Y, µ, T ) and functions F 0 , . . . , F s ∈ L ∞ (µ), all bounded by M , such that the sequence (b(n)) n∈N , defined by $b(n) := \int F_0 \cdot T^{k_1 n} F_1 \cdots T^{k_s n} F_s \, d\mu$, n ∈ N, satisfies ‖ψ − b‖ ∞ ≤ ε.
Remark. Alternatively, we can use as approximants sequences of the form b(n) := lim M →∞ E m∈[M ] a 0 (m) · a 1 (m + k 1 n) · . . . · a s (m + k s n), n ∈ N, where for j = 0, . . . , s the sequences a j ∈ ℓ ∞ (N) are defined by a j (m) := F j (T m y 0 ), m ∈ N, for suitable y 0 ∈ Y .
Proof. Let ε > 0. First note that since the space of functions on (X, d X ) with Lipschitz constant at most L is compact with respect to the ‖·‖ ∞ -norm, we can cover this space by a finite number of ‖·‖ ∞ -balls of radius at most ε. It follows from this that in order to verify the asserted approximation property, it suffices to verify the property for every fixed nilsequence ψ without asking for additional uniformity properties for the L ∞ (µ) norms of the functions F 0 , . . . , F s ∈ L ∞ (µ). This statement now follows immediately from [12, Proposition 2.4].

3.4.3. Reduction of degree of nilpotency. The next result will be used in the proof of the inverse theorem in the next section. It is a direct consequence of the constructions in [24,Section 7] and it is stated in a form equivalent to the one below in [41, Lemma 1.6.13]:
Proposition 3.4 (Green, Tao [24]). For s ≥ 2 let X = G/Γ be an s-step nilmanifold.
Then there exist an (s − 1)-step nilmanifold Y and C = C(X) > 0 such that for every vertical nilcharacter Φ of X with ‖Φ‖ Lip(X) ≤ 1, every b ∈ G, and every h ∈ N, the sequence $(\Phi(b^{n+h} \cdot e_X)\, \overline{\Phi(b^{n} \cdot e_X)})_{n \in \mathbb{N}}$ is an (s − 1)-step nilsequence in C · Ψ Y .
An example is given by the 2-step nilsequence (e(n 2 α)) n∈N which can be defined by a vertical nilcharacter on the Heisenberg nilmanifold; taking the difference operation results in the 1-step nilsequences (e(2nhα + h 2 α)) n∈N , which can be represented as $(e(h^2\alpha)\, e(n\beta))_{n \in \mathbb{N}}$, where h ∈ N and β := 2hα.
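The example can be verified by expanding the square. The sketch below only uses e(t) := exp(2πit) and β := 2hα as in the text.

```latex
\begin{align*}
e\big((n+h)^{2}\alpha\big)\,\overline{e\big(n^{2}\alpha\big)}
  &= e\big((n+h)^{2}\alpha - n^{2}\alpha\big)\\
  &= e\big(2nh\alpha + h^{2}\alpha\big)\\
  &= e\big(h^{2}\alpha\big)\, e(n\beta), \qquad \beta := 2h\alpha .
\end{align*}
% For fixed h this is a constant of modulus 1 times a linear phase in n,
% that is, a 1-step nilsequence on the circle.
```

The degree in n drops from 2 to 1, which is the phenomenon that Proposition 3.4 captures for general vertical nilcharacters.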

U s (I)-inverse theorem for ergodic sequences
Henceforth, we assume that I = (I N ) N ∈N is a sequence of intervals with |I N | → ∞.
Remarks. • A variant for logarithmic averages also holds; one needs to assume ergodicity for logarithmic averages and replace ‖a‖ U s+1 (I) with ‖a‖ U s+1 log (I) and E n∈I N with E log n∈I N . Despite its apparent simplicity, this condition is very hard to verify for particular arithmetic sequences, and it is still unknown for the Möbius and the Liouville function.

4.2. Sketch of proof for s = 2 versus s > 2. The proof of Theorem 4.1 is rather simple for s = 2; we sketch it in order to motivate and explain some of the maneuvers needed in the general case. The argument proceeds as follows:
• We first use ergodicity of the sequence (a(n)) n∈N in order to establish the identity Using this identity and our assumption ‖a‖ U 2 (I) > 0, we deduce that This step generalizes straightforwardly when s > 2 and gives relation (13) below.
• We can decompose the (positive definite) sequence (A(n)) n∈N into a structured component which is a trigonometric polynomial sequence and an error term which is small in uniform density. Hence, we can assume that (9) holds when A(n) = e(nt), n ∈ N, for some t ∈ R. This infinitary decomposition result is crucial in order to get for s = 2 an inverse condition that does not involve a supremum in the inner-most average and for s ≥ 3 an inverse condition that involves a supremum over (s − 2)-step (and not (s − 1)-step) nilsequences of bounded complexity. The appropriate decomposition result when s ≥ 3 is Proposition 4.4 which is proved using deep results from ergodic theory (the main ingredient is Theorem 2.6). Since in this more complicated setup we cannot later on utilize simple identities that linear exponential sequences satisfy, we take particular care to use as structured components sequences which take a very convenient (though seemingly complicated) form.
• After interchanging the averages over h and n in (9) which immediately implies the conclusion of Theorem 4.1 when s = 2. This step is harder when s > 2 and two additional maneuvers are needed (described in Steps 3 and 4 in the proof of Theorem 4.1). One key idea is to introduce an additional short range average that allows us to replace some unwanted expressions with (s − 2)-step nilsequences.
This part of the argument uses the finitary decomposition result of Proposition 4.6 which is the reason why we get an inverse condition involving a sup over all (s − 2)-step nilsequences of bounded complexity. Another idea needed is to use Proposition 3.4 in order to remove an unwanted supremum over a parameter h ∈ N; not doing so would cause problems later on when we try to verify the inverse condition for the class of multiplicative functions we are interested in. We give the details of the proof of Theorem 4.1 in the next subsections.

4.3. Uniformity estimates. We will use the next estimate in the proof of Lemma 4.3, which in turn will be used in the proof of Proposition 4.6.

Proof. Notice that the left hand side is equal to Since ‖a 0 ‖ ∞ ≤ 1, the claimed estimate follows from the Gowers-Cauchy-Schwarz inequality [20, Lemma 3.8].

We use this lemma in order to deduce a similar estimate for non-periodic sequences: where C s := (s + 1) s+1 ((2s) s + 1).
Proof. Let M̃ := (s + 1)M . We first reduce matters to estimating a similar average over Z M̃ . Let h = (h 1 , . . . , h s ) and notice that the average in (10) is bounded by (s + 1) s+1 times where the sums m + ǫ · h are taken (mod M̃ ).
Next, we reduce matters to estimating a similar average that does not contain the indicator functions. Let R be an integer that will be specified later and satisfies 0 < R < M/2. We define the "trapezoid function" φ on Z M̃ so that φ(0) = 0, φ increases linearly from 0 to 1 on the interval [0, R], φ(r) = 1 for R ≤ r ≤ M − R, φ decreases linearly from 1 to 0 on [M − R, M ], and φ(r) = 0 for M < r < M̃ .
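For concreteness, the trapezoid function described above can be written piecewise as follows. This sketch assumes that the modulus is M̃ := (s + 1)M (the tilde appears to have been lost in the source) and that 0 < R < M/2, as in the text.

```latex
% Assumption: \tilde{M} := (s+1)M is the modulus, and 0 < R < M/2.
\[
\phi(r) :=
\begin{cases}
  r/R,       & 0 \le r \le R,\\[2pt]
  1,         & R \le r \le M - R,\\[2pt]
  (M - r)/R, & M - R \le r \le M,\\[2pt]
  0,         & M < r < \tilde{M}.
\end{cases}
\]
% Consecutive values differ by at most 1/R, including across the wrap-around
% from r = \tilde{M} - 1 to r = 0, where both values are 0.
```

The function agrees with the indicator of the interval [0, M ] except on the two ramps of length R, which is why the error from replacing indicators by φ is proportional to R.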
After telescoping, we see that the absolute value of the difference between the average (11) and the average is bounded by 2sR/ M̃ . Moreover, it is classical that and thus (12) is bounded by For j = 1, . . . , s, let ǫ j ∈ [[s]] * be the element that has 1 in the j-th coordinate and 0's elsewhere. Upon replacing a ǫ j (n) with a ǫ j (n) e(−nξ j / M̃ ), j = 1, . . . , s, and a (1,...,1) (n) with a (1,...,1) (n) e(n(ξ 1 + · · · + ξ s )/ M̃ ), and leaving all other sequences unchanged, the U s (Z M̃ )-norm of all sequences remains unchanged (we use here that s ≥ 2) and the term ∏ s j=1 e(h j ξ j / M̃ ) disappears. We are thus left with estimating the average Combining the preceding estimates, we get that the average in the statement is bounded by (s + 1) s+1 times Choosing R := ⌊U 1 s+1 M /(4s)⌋ + 1 (then R ≤ M/2 for M ≥ 5) we get that the last quantity is bounded by When M ≤ 4 the asserted estimate is trivial, completing the proof.

4.4. Two decompositions. We will use the following infinitary decomposition result which is proved using tools from ergodic theory.
C |ǫ| a(n + ǫ · h), h ∈ N s , admits a decomposition of the form A = A st + A er such that (i) A st : N s → C is a uniform limit of sequences of the form where the integration takes place on a probability space (X, X , µ), and for x ∈ X the sequence A st,x : N s → C is defined by
Remarks. • The ergodicity assumption in this statement is a convenience; we can prove a similar statement without it by using the decomposition result in [7,Proposition 3.1] in place of Theorem 2.6.
• It can be shown that the sequence A st is a uniform limit of s-step nilsequences in s variables, but such a decomposition result is less useful for our purposes.
Proof. Let (X, X , µ, T ) be the ergodic system and F ∈ L ∞ (µ) be the function associated to (a(n)) n∈N and I by the correspondence principle of Proposition 2.3. Then

We set
where F st := E(F |Z s ) is the orthogonal projection of F onto L 2 (Z s ) and Z s is the σ-algebra defined in Section 2.4. Furthermore, we let We first deal with the sequence A er . It follows from [29,Theorem 13 T ǫ·h F ǫ dµ = 0.

Using this, telescoping, and since
Next we establish the asserted structural property of the sequence A st . Theorem 2.6 gives that the system (X, Z s , µ, T ) is an inverse limit of s-step nilsystems. It follows from this that the sequence A st is a uniform limit of sequences of the form where X = G/Γ is an s-step nilmanifold, b ∈ G, m X is the normalized Haar measure of X, and Φ ∈ C(X) satisfies As remarked in Section 3.1, for every h ∈ N s the limit E n∈I ǫ∈[[s]] C |ǫ| Φ(b n+ǫ·h x) exists. Using this property, the bounded convergence theorem, and the preservation of m X by left translation by b n , n ∈ N, we get The next result is proved in [22] using the finitary inverse theorem for the Gowers uniformity norms in [26]. norms are comparable (see [14,Lemma A.4]). Lastly, the statement in [22] contains a third term that is small in L 2 [M ], this term has been absorbed by the a M,un term in our statement.
We will use Theorem 4.5 and Lemma 4.3 in order to establish the following finitary decomposition result:
Proof. Let ε > 0 and s ≥ 2. We use the decomposition result of Theorem 4.5 for δ = δ(ε, s) to be determined momentarily. We get an (s−1)-step nilmanifold X = X(δ, s) and L = L(δ, s) > 0, such that for every large enough M ∈ N we have a decomposition a(n) = a M,st (n) + a M,un (n), n ∈ [(s + 1)M ], where a M,st ∈ L · Ψ X , ‖a M,st ‖ ∞ ≤ 4, and ‖a M,un ‖ U s (Z (s+1)M ) ≤ δ. Without loss of generality we can assume that ‖a‖ ∞ ≤ 1.
Step 1 (Using ergodicity). Using the ergodicity of a(n) for Cesàro averages on I (this is the only place where we make essential use of ergodicity in the proof of Theorem 4.1) we get that To see this note that C |ǫ| a(n + ǫ · h) where the first identity follows from (4), the second follows from the remarks made in Section 2.5, and the third from our ergodicity assumption using identity (3).
As remarked in Section 2.5, we can replace the average E h∈N s−1 with lim H→∞ E h∈[H] s−1 .

By Proposition 4.4 we have a decomposition
where for x ∈ X the sequence φ Using uniform approximation and the second condition we deduce that Using Fatou's lemma we deduce that there exists an x ∈ X such that Using the form of A st,x and the fact that both limits E n∈I · · · and E n ′ ∈I · · · exist, we get that Hence, renaming φ x as φ, we get that there exists an (s − 1)-step nilsequence φ such that where for n, n ′ ∈ N the sequence (ã n,n ′ (k)) k∈N is defined by ã n,n ′ (k) := a(n + k) φ(n ′ + k), k ∈ N.
Step 3 (Using a finitary decomposition). Next, we shift the averages over n and n ′ by m ∈ N and average over m ∈ [M ]. We deduce that where for M, n, n ′ ∈ N we let For M, n, n ′ ∈ N, we use Proposition 4.6 for ε := δ/3 in order to decompose the finite sequence A M,n,n ′ (m), m ∈ [M ]. We get that there exist C = C(δ, s) > 0, an (s − 2)-step nilmanifold Y = Y (δ, s), and for large enough M ∈ N there exist (s−2)-step nilsequences ψ M,n,n ′ ,h ∈ C · Ψ Y , where h ∈ [M ] s−1 , n, n ′ ∈ N, such that This implies that (notice that n → ψ(n + k) is in Ψ Y for every k ∈ Z) (14) lim sup for some (s − 1)-step nilsequence φ.
Step 4 (Removing the sup over h). It remains to show that the supremum over h ∈ N can be removed. As explained in Section 3.3, we can assume that φ(n) = Φ(b n ·e X ), n ∈ N, for some (s − 1)-step nilmanifold X, b ∈ G, and vertical nilcharacter Φ of X with |Φ| = 1 and ‖Φ‖ Lip(X) ≤ 1. It follows from Proposition 3.4 that there exist an (s − 2)-step nilmanifold in (14), and notice that upon enlarging the (s − 2)-step nilmanifold Y , the (s − 2)-step nilsequence (φ(n + h) φ̄(n)) n∈N can be absorbed in the supremum over ψ ∈ Ψ Y . We deduce that This completes the proof.

U s (I)-uniformity for the Liouville function
Our goal in this section is to prove Theorem 1.6. Note that this uniformity result combined with Theorem 1.7 gives Theorem 1.1 and combined with Theorem 1.8 gives Theorem 1.4. We present the proof of Theorem 1.6 for Cesàro averages but a similar argument also works for logarithmic averages and we indicate in the remarks which statements need to be modified for this purpose.

5.1. Sketch of the proof. We proceed by induction as follows:
• For s = 2 we get that Theorem 1.6 follows from the ergodicity of f and Proposition 2.10 (the strong aperiodicity of f is only used here). Assuming that s ≥ 2 and ‖f‖ U s (I) = 0, our goal then becomes to show that ‖f‖ U s+1 (I) = 0.
• We first use the inverse result of Theorem 4.1 in order to reduce matters to an orthogonality property of f with nilsequences on typical short intervals (see Proposition 5.1). Essential use of ergodicity of f is made here.
• The orthogonality property involves a fixed s-step nilsequence φ and a supremum over a set of (s − 1)-step nilsequences of bounded complexity. If φ is an (s − 1)-step nilsequence, then we are done by the induction hypothesis (the ergodicity of f is used again here) via elementary estimates. If not, we reduce matters to the case where the s-step nilsequence φ is defined by a non-trivial vertical nilcharacter (see Proposition 5.3).
• We then use the orthogonality criterion of Lemma 5.5 (the multiplicativity of f is only used here) in order to reduce matters to a purely dynamical statement about "irrational nilsequences" (see Proposition 5.6).
• Lastly, we verify the dynamical statement using elementary estimates, qualitative equidistribution results on nilmanifolds, and ideas motivated from [14].

5.2. Step 1 (Setting up the induction and cases s = 1, 2). We prove Theorem 1.6 by induction on s ∈ N. We cover the cases s = 1 and s = 2 separately, partly because we want to show their relation to recently established results, but also because the inductive step s → s + 1 is slightly different when s ≥ 2.
and the last limit is 0 by [36,Theorem A.1]. Note that this argument did not use our ergodicity assumption on f . Assuming ergodicity of f for Cesàro averages on I, one simply notes that ‖f‖ U 1 (I) = |E n∈I f (n)| = 0. For s = 2, using our hypothesis that the sequence (f (n)) n∈N is ergodic for Cesàro averages on I we derive exactly as in the first step of the proof of Theorem 4.1 the identity This limit is 0 by Proposition 2.10. Suppose now that Theorem 1.6 holds for s ≥ 2; in the remaining subsections we will show that it holds for s + 1.

5.3. Step 2 (Using the inverse theorem). We start by using the inverse theorem proved in the previous section. It follows from Theorem 4.1 and Proposition 2.7 that in order to prove Theorem 1.6 it suffices to establish the following result:
Remarks. • A variant for logarithmic averages also holds where one assumes ergodicity for logarithmic averages on I and replaces E n∈[N k ] with E log n∈[N k ] . The proof is similar.
• If we remove the sup ψ∈Ψ Y , then our proof works without an ergodicity assumption on f . This simpler result was also obtained recently in [11], and prior to this, related results were obtained in [14,25]. But none of these results allows one to treat the more complicated setup with the supremum over the set Ψ Y , and this is crucial for our purposes.

5.4. Step 3 (Reduction to non-trivial nilcharacters). Since φ is an s-step nilsequence, there exist an s-step nilmanifold X = G/Γ, an ergodic nilrotation b ∈ G, and a function Φ ∈ C(X), such that φ(n) = Φ(b n · e X ), n ∈ N. Since vertical nilcharacters of X span C(X) (see Section 3.3) we can assume that Φ is a vertical nilcharacter of X.
If Φ is a trivial nilcharacter of X, then it factorizes through the nilmanifold X ′ := G/(G s Γ). The group G/G s is (s − 1)-step nilpotent and X ′ is an (s − 1)-step nilmanifold. Writing b ′ for the image of b in G/G s , we have that φ(n) = Φ ′ (b ′n ·e X ′ ), n ∈ N, for some Φ ′ ∈ C(X ′ ). We deduce that φ is an (s − 1)-step nilsequence. Moreover, since the sequence (f (n)) n∈N is ergodic for Cesàro averages on I = ([N k ]) k∈N , the induction hypothesis gives that ‖f‖ U s (I) = 0. Hence, the following direct theorem (which does not require ergodicity assumptions) implies that the conclusion of Proposition 5.1 holds when the function Φ defining the nilsequence φ is a trivial nilcharacter of X:
Lemma 5.2. Let s ≥ 2 and a ∈ ℓ ∞ (N) be a sequence that admits correlations for Cesàro averages on the sequence of intervals I = (I N ) N ∈N . Suppose that ‖a‖ U s (I) = 0. Then for every (s − 1)-step nilsequence φ and every (s − 1)-step nilmanifold Y , we have
Remark. A variant for logarithmic averages also holds where one replaces ‖a‖ U s (I) with ‖a‖ U s log (I) and E n∈I N with E log n∈I N . The proof is similar.
Proof. First notice that since every (s − 1)-step nilsequence φ can be uniformly approximated by (s − 1)-step nilsequences defined by functions with bounded Lipschitz norm, the sequence φ can be absorbed in the sup (upon enlarging the nilmanifold Y ). Hence, we can assume that φ = 1. We implicitly assume that the s = 1 case corresponds to an estimate where there are no functions on the left hand side. So in order to verify the base case, we need to show that To this end, we apply the van der Corput lemma for complex numbers. We get for all M, R ∈ N with R ≤ M and all n ∈ N, that Hence, where the last estimate holds because E r∈ [R] (1 − rR −1 ) Re(E n∈I a(n + r) · a(n)) is the Cesàro average of E r∈ [R] Re(E n∈I a(n + r) · a(n)) with respect to R. Hence, the asserted estimate holds. Suppose now that the estimate (15) holds for s − 1 ∈ N, where s ≥ 2, we will show that it holds for s. We apply the van der Corput lemma in the Hilbert space L 2 (µ M,n ) and then use the Cauchy-Schwarz inequality. We get for all M, R ∈ N with R ≤ M and all n ∈ N, that Applying the induction hypothesis for the sequences S r a · a, r ∈ N (which also admit correlations for Cesàro averages on I and are bounded by 1), the functionsF j,M,n,r , and the integersk j , j = 1, . . . , s − 2, and averaging over r ∈ N, we deduce that the last expression is bounded by 16 times Taking square roots, we deduce that (15) holds, completing the induction.
Hence, it suffices to consider the case where Φ is a non-trivial vertical nilcharacter of X, and we have thus reduced matters to proving the following result (note that strong aperiodicity and ergodicity are no longer needed): We follow the argument used in the proof of [2,Theorem 4] in order to disjointify the intervals [n, n+M ]. Since M k /N k → 0, we have for every bounded sequence (a(k, n)) k,n∈N that lim Applying this for the sequence we deduce that for every large enough k ∈ N there exists r k ∈ [M k ] such that Upon changing the sequences (ψ k,n (m)) m∈N by multiplicative constants of modulus 1 that depend on k and n only, we can remove the norm in the previous estimate. Hence, without loss of generality, we can assume for all large enough k ∈ N that Since M k /N k → 0, we deduce from the last estimate that Hence, in order to get a contradiction and complete the proof of Proposition 5.3, it remains to verify that lim The only property to be used for the intervals [(ℓ − 1)M k , ℓM k ) is that their lengths tend to infinity as k → ∞ uniformly in ℓ ∈ N.
We deduce from this lemma the following: Applying Lemma 5.5, it suffices to show that for every p, p ′ ∈ N with p ≠ p ′ and every c > 0, we have lim where g k is as in (16). Equivalently, we have to show that (the sum below is finite) Note that for fixed k ∈ N the intervals I k,ℓ,ℓ ′ , ℓ, ℓ ′ ∈ N, are disjoint (and some of them empty). Since M k → ∞, they partition the interval [cN k ] into subintervals J k,l , l = 1, . . . , L k , with L k → ∞ and min l∈[L k ] |J k,l | → ∞ as k → ∞, and a set Z k with |Z k |/N k → 0 as k → ∞. Since |Z k |/N k → 0 as k → ∞, it suffices to show that Thus, in order to prove Proposition 5.3 it suffices to verify the following asymptotic orthogonality property which has purely dynamical context:
Proposition 5.6. For s ≥ 2 let X = G/Γ be an s-step nilmanifold and b ∈ G be an ergodic nilrotation. Furthermore, let Φ, Φ ′ be non-trivial vertical nilcharacters of X with the same frequency. Then for every p, p ′ ∈ N with p ≠ p ′ , every sequence of intervals (I N ) N ∈N with |I N | → ∞, and every (s − 1)-step nilmanifold Y , we have (17) lim
A model case is when Φ(b n · e X ) = Φ ′ (b n · e X ) = e(n s β) with β irrational. Then the statement to be proved reduces to lim N →∞ sup ψ∈Ψ Y |E n∈I N e(n s α) ψ(n)| = 0 where α := (p s − p ′s )β is irrational. This can be verified easily by using van der Corput's lemma for complex numbers and Lemma 3.3. The proof in the general case is much harder though; it is given in the next subsection.
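The reduction in the model case is a direct computation. The sketch below assumes, consistently with the value of α in the model case, that the second nilcharacter enters the average with a complex conjugate; the conjugation bar is not visible in the surviving text.

```latex
% Assumption: the average in (17) involves \Phi(b^{pn} e_X)\,\overline{\Phi'(b^{p'n} e_X)}.
\begin{align*}
\Phi\big(b^{pn} \cdot e_X\big)\,\overline{\Phi'\big(b^{p'n} \cdot e_X\big)}
  &= e\big((pn)^{s}\beta\big)\,\overline{e\big((p'n)^{s}\beta\big)}\\
  &= e\big(n^{s}\,(p^{s} - p'^{\,s})\,\beta\big)
   = e\big(n^{s}\alpha\big), \qquad \alpha := (p^{s} - p'^{\,s})\,\beta .
\end{align*}
% Since p \neq p' are positive integers, p^s - p'^s is a non-zero integer,
% so \alpha is irrational whenever \beta is.
```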

5.7. Step 6 (Proof of the dynamical property). The goal of this last subsection is to prove Proposition 5.6. Let us remark first that, although we were not able to adapt a related argument in [14, Theorem 6.1] to the current setup, we found some of the ideas used there very useful.
The main idea is as follows: We apply the van der Corput lemma for complex numbers (s−1) times in order to cancel out the term ψ (Lemma 3.3 is useful in this regard) and we reduce (17) to verifying U s (I)-uniformity for the sequence (Φ(b pn · e X ) Φ̄ ′ (b p ′ n · e X )) n∈N . The fact that the supremum over Ψ Y no longer appears has the additional advantage that we only need to use qualitative (and not quantitative) equidistribution results on nilmanifolds.
The key in obtaining the needed U s (I)-uniformity is to establish that the nilcharacter Φ ⊗ Φ̄ ′ is non-trivial on the s-step nilmanifold W , defined as the closure of the set {(b pn · e X , b p ′ n · e X ), n ∈ N} in X × X.
Although the precise structure of the nilmanifold W is very difficult to determine (and depends on the choice of the ergodic nilrotation b) it is possible to extract partial information on W that suffices for our purposes. This last idea is taken from the proof of [14, Proposition 6.1] and the precise statement is as follows:
Proposition 5.7. For s ∈ N let X = G/Γ be a connected s-step nilmanifold and b ∈ G be an ergodic nilrotation. Let p, p ′ ∈ N be distinct and let W be the closure of the sequence (b pn · e X , b p ′ n · e X ) n∈N in X × X. Then W is a nilmanifold that can be represented as W = H/∆ where ∆ = Γ × Γ and H is a subgroup of G × G such that (b p , b p ′ ) ∈ H and (18) (u p s , u p ′s ) ∈ H s for every u ∈ G s .
Lemma 5.8. For s ≥ 2 let W = H/∆ be an s-step nilmanifold and h ∈ H be an ergodic nilrotation. Let Φ be a non-trivial vertical nilcharacter of W and φ(n) := Φ(h n · e W ), n ∈ N.
Then ‖φ‖ U s (I) = 0 for every sequence of intervals I = (I N ) N ∈N with |I N | → ∞.
Proof. As remarked in Section 3.2, we have ‖φ‖ U s (I) = |||Φ||| s , where the seminorm is computed for the system induced on W with the normalized Haar measure m W by the ergodic nilrotation by h ∈ H. Let Z s−1 (W ) be defined as in Section 2.4. It is implicit in [29,Theorem 13.1] and also follows by combining [45,Lemma 4.5] and [32], that L 2 (Z s−1 (W )) consists exactly of those functions in L 2 (m W ) that are H s -invariant ([32] shows that the factors Z s and Y s defined in [29] and [45] respectively are the same). Since Φ is a non-trivial vertical nilcharacter of W , it is orthogonal to any H s -invariant function in L 2 (m W ), hence Φ is orthogonal to any function in L 2 (Z s−1 (W )). As remarked in Section 2.4, this implies that |||Φ||| s = 0 and completes the proof.
Proof. Let φ(n) := Φ(h n · e W ), n ∈ N. By Lemma 5.8 we have that ‖φ‖ U s (I) = 0.
It follows from Lemma 3.3 that it suffices to establish the following: Let s ∈ N, I = (I N ) N ∈N be a sequence of intervals with |I N | → ∞, and a ∈ ℓ ∞ (N) be a sequence that admits correlations for Cesàro averages on I. Furthermore, for N ∈ N, let (X N , X N , µ N , T N ) be a system, F 0,N , . . . , F s−1,N ∈ L ∞ (µ N ) be functions bounded by 1, and let k 1 , . . . , k s−1 ∈ Z. Then we have This estimate can be proved by induction on s in a rather standard way using the van der Corput lemma for inner product spaces; the details are given in [12, Section 2.3.1].
We are now ready to prove Proposition 5.6.
Proof of Proposition 5.6. We argue by contradiction. Suppose that for some s ≥ 2 there exist an s-step nilmanifold X = G/Γ, an ergodic nilrotation b ∈ G, non-trivial vertical nilcharacters Φ, Φ ′ of X with the same frequency, p, p ′ ∈ N with p ≠ p ′ , a sequence of intervals (I N ) N ∈N with |I N | → ∞, and an (s − 1)-step nilmanifold Y , such that (19) lim sup We first reduce matters to the case where the nilmanifold X is connected. As remarked in Section 3.2, there exists r ∈ N such that b r acts ergodically on the connected component X 0 of the nilmanifold X. Then for some j ∈ {0, . . . , r − 1} we have lim sup N →∞ sup ψ∈Ψ Y |E n∈I N Φ(b p(rn+j) · e X ) Φ̄ ′ (b p ′ (rn+j) · e X ) ψ(rn + j)| > 0.
Let h := (b p , b p ′ ). By the discussion in Section 3.2 and Proposition 5.7, the element h acts ergodically on a nilmanifold W that can be represented as W = H/∆ where H is a subgroup of G × G such that h ∈ H and (u p s , u p ′s ) ∈ H s for every u ∈ G s . We will show that the restriction of the function Φ ⊗ Φ̄ ′ on W is a non-trivial vertical nilcharacter of W . To this end, we use our hypothesis that Φ(u · x) = χ(u) Φ(x) and Φ ′ (u · x) = χ(u) Φ ′ (x) for u ∈ G s and x ∈ X, where χ is a non-trivial element of the dual of G s that is (G s ∩ Γ)-invariant. Hence, $(\Phi \otimes \overline{\Phi'})\big((u, u') \cdot (x, x')\big) = \chi(u)\, \overline{\chi(u')}\, (\Phi \otimes \overline{\Phi'})(x, x')$, for u, u ′ ∈ G s and x, x ′ ∈ X.
Since H s ⊂ G s × G s , it follows from this identity that Φ ⊗ Φ̄ ′ is a vertical nilcharacter of W = H/∆. It remains to show that χ ⊗ χ̄ is non-trivial on H s . Arguing by contradiction, suppose that it is trivial. Since (u p s , u p ′s ) ∈ H s for every u ∈ G s , we get χ(u p s −p ′s ) = χ(u p s ) χ̄(u p ′s ) = 1 for every u ∈ G s .
Since G s is connected for s ≥ 2 and p ≠ p ′ , the map u → u p s −p ′s is onto G s , hence χ is the trivial character on G s , a contradiction.
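The surjectivity of the power map and the resulting contradiction can be sketched as follows. The sketch assumes the standard structure of connected Abelian Lie groups, namely that G s is isomorphic to a product of a Euclidean space and a torus (this is not spelled out in the text), and it introduces the symbol k for the exponent.

```latex
% Assumptions for this sketch: G_s \cong \mathbb{R}^a \times \mathbb{T}^b;
% k is notation introduced here. The group law is written additively in the
% second line and multiplicatively in the third, so that k u and u^k agree.
\begin{align*}
k &:= p^{s} - p'^{\,s} \neq 0 \qquad (\text{since } p \neq p' \text{ are positive integers}),\\
v &= (x,\; t + \mathbb{Z}^{b}) \in \mathbb{R}^{a} \times \mathbb{T}^{b}
   \;\Longrightarrow\; v = k\,u \ \text{ with } \ u := (x/k,\; t/k + \mathbb{Z}^{b}),\\
\chi(v) &= \chi(u^{k}) = \chi\big(u^{p^{s}}\big)\,\overline{\chi\big(u^{p'^{\,s}}\big)} = 1 .
\end{align*}
% Hence \chi(v) = 1 for every v \in G_s, contradicting the non-triviality of \chi.
```

Connectedness is essential here: on a finite cyclic group the map u → u k need not be onto, which is why the argument first reduces to s ≥ 2 and uses that G s is connected.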
Combining the above, we get that Proposition 5.9 applies and gives This contradicts (19) and completes the proof of Proposition 5.6.