Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteristic factors

We establish a number of "concatenation theorems" that assert, roughly speaking, that if a function exhibits "polynomial" (or "Gowers anti-uniform", "uniformly almost periodic", or "nilsequence") behaviour in two different directions separately, then it also exhibits the same behaviour (but at higher degree) in both directions jointly. Among other things, this allows one to control averaged local Gowers uniformity norms by global Gowers uniformity norms. In a sequel to this paper, we will apply such control to obtain asymptotics for "polynomial progressions" $n+P_1(r),\dots,n+P_k(r)$ in various sets of integers, such as the prime numbers.


Concatenation of polynomiality
Suppose P : Z^2 → R is a function with the property that n ↦ P(n, m) is an (affine-)linear function of n for each m, and m ↦ P(n, m) is an (affine-)linear function of m for each n. Then it is easy to see that P is of the form P(n, m) = αnm + βn + γm + δ for some coefficients α, β, γ, δ ∈ R. In particular, (n, m) ↦ P(n, m) is a polynomial of degree at most 2.
The above phenomenon generalises to higher degree polynomials. Let us make the following definition:
Definition 1.1 (Polynomials). Let P : G → K be a function from one additive group G = (G, +) to another K = (K, +). For any subgroup H of G and for any integer d, we say that P is a polynomial of degree < d along H according to the following recursive definition:
(i) If d ≤ 0, we say that P is of degree < d along H if and only if it is identically zero.
(ii) If d ≥ 1, we say that P is of degree < d along H if and only if, for each h ∈ H, there exists a polynomial P_h : G → K of degree < d − 1 along H such that P(x + h) = P(x) + P_h(x) for all x ∈ G.
We then have
Proposition 1.2 (Concatenation theorem for polynomials). Let P : G → K be a function from one additive group G to another K, let H_1, H_2 be subgroups of G, and let d_1, d_2 be integers. Suppose that
(i) P is a polynomial of degree < d_1 along H_1.
(ii) P is a polynomial of degree < d_2 along H_2.
Then P is a polynomial of degree < d_1 + d_2 − 1 along H_1 + H_2.
The degree bound here is sharp, as can for instance be seen using the example P : Z × Z → R of the monomial P(n, m) := n^{d_1−1} m^{d_2−1}, with H_1 := Z × {0} and H_2 := {0} × Z.
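As a quick sanity check of Definition 1.1 and of the sharpness example, the following sketch tests the monomial with d_1 = 3, d_2 = 2 on G = Z^2. It uses the observation that the recursive definition forces P_h(x) = P(x + h) − P(x). The helper names (`diff`, `has_degree_below`) and the finite lists of shifts and sample points are illustrative assumptions, so a positive answer is only evidence on a finite window, not a proof.

```python
from itertools import product

def diff(P, h):
    """Discrete derivative x -> P(x + h) - P(x) on Z^2 (the function P_h)."""
    return lambda x: P((x[0] + h[0], x[1] + h[1])) - P(x)

def has_degree_below(P, d, shifts, points):
    """Finite-window test of 'degree < d along H'.

    `shifts` is a finite sample of H and `points` a finite sample of Z^2,
    so True is only evidence, while False is a genuine counterexample.
    """
    if d <= 0:
        return all(P(x) == 0 for x in points)
    return all(has_degree_below(diff(P, h), d - 1, shifts, points) for h in shifts)

d1, d2 = 3, 2
P = lambda x: x[0] ** (d1 - 1) * x[1] ** (d2 - 1)    # the monomial n^2 * m

points = list(product(range(-2, 3), repeat=2))
H1 = [(1, 0), (2, 0), (-1, 0)]     # sample of Z x {0}
H2 = [(0, 1), (0, 2), (0, -1)]     # sample of {0} x Z
H12 = [(1, 0), (0, 1), (1, 1)]     # sample of H1 + H2 = Z^2

print(has_degree_below(P, d1, H1, points))             # hypothesis (i)
print(has_degree_below(P, d2, H2, points))             # hypothesis (ii)
print(has_degree_below(P, d1 + d2 - 1, H12, points))   # concatenated bound
print(has_degree_below(P, d1 + d2 - 2, H12, points))   # one degree less
```

The last call fails because the triple difference of n^2 m in the direction (1, 1) is a non-zero constant, which is the reason the bound d_1 + d_2 − 1 cannot be lowered for this example.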
Proof. The claim is trivial if d_1 ≤ 0 or d_2 ≤ 0, so we suppose inductively that d_1, d_2 ≥ 1 and that the claim has already been proven for smaller values of d_1 + d_2.
Let h_1 ∈ H_1 and h_2 ∈ H_2. By (i), there is a polynomial P_{h_1} : G → K of degree < d_1 − 1 along H_1 such that P(x + h_1) = P(x) + P_{h_1}(x) for all x ∈ G. Similarly, by (ii), there is a polynomial P_{h_2} : G → K of degree < d_2 − 1 along H_2 such that P(x + h_2) = P(x) + P_{h_2}(x) for all x ∈ G. Replacing x by x + h_1 and combining with (3), we see that P(x + h_1 + h_2) = P(x) + P_{h_1,h_2}(x) for all x ∈ G, where P_{h_1,h_2}(x) := P_{h_1}(x) + P_{h_2}(x + h_1).
Proof. The claim is again trivial when d 1 = 0 or d 2 = 0, so we may assume inductively that d 1 , d 2 ≥ 1 and that the claim has already been proven for smaller values of d 1 + d 2 .

Concatenation of anti-uniformity
Now we turn to a more non-trivial variant of Proposition 1.2, in which the property of being polynomial in various directions is replaced by that of being anti-uniform in the sense of being almost orthogonal to Gowers uniform functions. To make this concept precise, and to work at a suitable level of generality, we need some notation. Recall that a finite multiset is the same concept as a finite set, but in which elements are allowed to appear with multiplicity. Given a non-empty finite multiset A and a function f : A → C, we define the average E_{a∈A} f(a) := (1/|A|) ∑_{a∈A} f(a), where |A| denotes the cardinality of A counted with multiplicity, and the sum ∑_{a∈A} is also counting multiplicity. For instance, E_{a∈{1,2,2}} a = 5/3.
Definition 1.7 (G-system). Let G = (G, +) be an at most countable additive group. A G-system (X, T) is a probability space X = (X, B, µ), together with a collection T = (T^g)_{g∈G} of invertible measure-preserving maps T^g : X → X, such that T^0 is the identity and T^{g+h} = T^g T^h for all g, h ∈ G. For technical reasons we will require that the probability space X is countably generated modulo null sets (or equivalently, that the Hilbert space L^2(X) is separable). Given a measurable function f : X → C and g ∈ G, we define T^g f := f ∘ T^{−g}. We shall often abuse notation and abbreviate (X, T) as X.
Remark 1.8. As it turns out, a large part of our analysis would be valid even when G is an uncountable additive group (in particular, no amenability hypothesis on G is required); however, the countable case is the one which is the most important for applications, and so we shall restrict to this case to avoid some minor technical issues involving measurability. Once the group G is restricted to be countable, the requirement that X is countably generated modulo null sets is usually harmless in applications, as one can often easily reduce to this case.
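The role of multiplicity in the averaging notation can be illustrated with a few lines of code. This is a minimal sketch in which a multiset is modelled as a Python list with repeats; the function name `multiset_average` is an illustrative choice, not notation from the paper.

```python
from fractions import Fraction

def multiset_average(values):
    """Average over a finite non-empty multiset, given as an iterable with repeats."""
    values = list(values)
    if not values:
        raise ValueError("the multiset must be non-empty")
    # Exact rational arithmetic, so the result can be compared with 5/3 directly.
    return sum(Fraction(v) for v in values) / len(values)

print(multiset_average([1, 2, 2]))   # the multiset {1, 2, 2}
print(multiset_average({1, 2, 2}))   # a Python set forgets the repeated 2
```

The first call counts the element 2 twice, as in the example from the text, while the second call averages over the underlying set {1, 2} only.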
In combinatorial applications, one usually works with the case when G is a finite group, and X is G with the uniform probability measure and the translation action T^g x := x + g, but for applications in ergodic theory, and also because we will eventually apply an ultraproduct construction to the combinatorial setting, it will be convenient to work with the more general setup in Definition 1.7.
Definition 1.9 (Gowers uniformity norm). Let G be an at most countable additive group, and let (X, T) be a G-system. If f ∈ L^∞(X), Q is a non-empty finite multiset and d is a positive integer, we define the Gowers uniformity (semi-)norm ‖f‖_{U^d_Q(X)} by the formula
‖f‖_{U^d_Q(X)}^{2^d} := E_{h_1,…,h_d,h′_1,…,h′_d ∈ Q} ∫_X Δ_{h_1,h′_1} ⋯ Δ_{h_d,h′_d} f dµ,
where Δ_{h,h′} is the nonlinear operator
Δ_{h,h′} f := (T^h f) \overline{(T^{h′} f)}.
More generally, given d non-empty finite multisets Q_1, …, Q_d, we define the Gowers box (semi-)norm ‖f‖_{□^d_{Q_1,…,Q_d}(X)} by the formula
‖f‖_{□^d_{Q_1,…,Q_d}(X)}^{2^d} := E_{h_i,h′_i ∈ Q_i for i=1,…,d} ∫_X Δ_{h_1,h′_1} ⋯ Δ_{h_d,h′_d} f dµ,
so in particular ‖f‖_{U^d_Q(X)} = ‖f‖_{□^d_{Q,…,Q}(X)}. Note that the Δ_{h,h′} commute with each other, and so the ordering of the Q_i is irrelevant.
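For the model case X = Z/NZ with the translation action, these norms can be computed by brute force. The sketch below assumes the standard form of the definition, namely that the 2^d-th power of the box norm is the average over h_i, h'_i in Q_i of the integral of Δ_{h_1,h'_1} ⋯ Δ_{h_d,h'_d} f, with Δ_{h,h'} f = (T^h f) times the conjugate of T^{h'} f and T^h f(x) = f(x − h). The function names are illustrative, and the cost grows like the product of |Q_i|^2, so this is only suitable for tiny examples.

```python
import cmath
from itertools import product

def shift(f, h):
    """T^h f(x) = f(x - h) on Z/NZ, with f stored as a list of length N."""
    n = len(f)
    return [f[(x - h) % n] for x in range(n)]

def delta(f, h, hp):
    """Multiplicative derivative (T^h f) * conj(T^hp f)."""
    return [u * v.conjugate() for u, v in zip(shift(f, h), shift(f, hp))]

def box_norm(f, multisets):
    """Gowers box norm for multisets Q_1, ..., Q_d in Z/NZ (lists with repeats)."""
    d = len(multisets)
    total, count = 0.0, 0
    for pairs in product(*[list(product(Q, Q)) for Q in multisets]):
        g = f
        for h, hp in pairs:
            g = delta(g, h, hp)
        total += sum(g) / len(g)          # integral against uniform measure
        count += 1
    mean = (total / count).real
    return max(mean, 0.0) ** (1.0 / 2 ** d)   # clamp tiny negative rounding errors

N = 8
chi = [cmath.exp(2j * cmath.pi * x / N) for x in range(N)]   # a linear phase
Q = list(range(N))                                            # all of Z/NZ

print(box_norm(chi, [Q]))        # U^1 norm of a non-trivial character
print(box_norm(chi, [Q, Q]))     # U^2 norm of the same character
print(box_norm(chi, [[0, 1, 1]]))  # U^1 norm along a small multiset
```

Passing the same multiset d times gives the uniformity norm U^d_Q, in line with the identity between the U^d_Q norm and the box norm with all Q_i equal to Q.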
It is well known that ‖f‖_{U^d_Q(X)} and ‖f‖_{□^d_{Q_1,…,Q_d}(X)} are indeed semi-norms. We define the dual (semi-)norm ‖f‖_{U^d_Q(X)^*} := sup{ |⟨f, g⟩_{L^2(X)}| : g ∈ L^∞(X), ‖g‖_{U^d_Q(X)} ≤ 1 } for all f ∈ L^∞(X), where ⟨f, g⟩_{L^2(X)} := ∫_X f \overline{g} dµ.
Our first main theorem is analogous to Proposition 1.2, and is stated as follows.
Theorem 1.11 (Concatenation theorem for anti-uniformity norms). Let Q 1 , Q 2 be coset progressions of ranks r 1 , r 2 respectively in an at most countable additive group G, let (X, T ) be a G-system, let d 1 , d 2 be positive integers, and let f lie in the closed unit ball of L ∞ (X). Let c 1 , c 2 : (0, +∞) → (0, +∞) be functions such that c i (ε) → 0 as ε → 0 for i = 1, 2. We make the following hypotheses: 2 Strictly speaking, one should refer to the tuple (Q, H, r, ) as the coset progression, rather than just the multi-set Q, as one cannot define key concepts such as rank or the dilates εQ without this additional data. However, we shall abuse notation and use the multi-set Q as a metonym for the entire coset progression.
Heuristically, hypothesis (i) (resp. (ii)) is asserting that f "behaves like" a function of the form x → e 2πiP(x) for some function P : X → R/Z that "behaves like" a polynomial of degree <d 1 along Q 1 (resp. <d 2 along Q 2 ), thus justifying the analogy between Theorem 1.11 and Proposition 1.2. The various known inverse theorems for the Gowers norms [21,5,27,28,33] make this heuristic more precise in some cases; however, our proof of the above theorem does not require these (difficult) theorems (which are currently unavailable for the general Gowers box norms).
We prove Theorem 1.11 in Sections 6, 7, after translating these theorems to a nonstandard analysis setting in Section 5.1. The basic idea is to use the hypothesis (i) to approximate f by a linear combination of "dual functions" along the Q 1 direction, and then to use (ii) to approximate the arguments of those dual functions in turn by dual functions along the Q 2 direction. This gives a structure analogous to the identities (3)-(6) obeyed by the function P considered in Proposition 1.2, and one then uses an induction hypothesis to conclude. To obtain the desired approximations, one could either use structural decomposition theorems (as in [14]) or nonstandard analysis (as is used for instance in [28]). We have elected to use the latter approach, as the former approach becomes somewhat messy due to the need to keep quantitative track of a number of functions such as ε → c(ε), whereas these functions are concealed to the point of invisibility by the nonstandard approach. We give a more expanded sketch of Theorem 1.11 in Section 4 below. Remark 1.12. It may be possible to establish a version of Theorem 1.11 in which one does not shrink the coset progressions Q by a small parameter ε, so that the appearance of εQ in (9) is replaced by Q. This would give the theorem a more "combinatorial" flavor, as opposed to an "ergodic" one (if one views the limit ε → 0 as being somewhat analogous to the ergodic limit n → ∞ of averaging along a Følner sequence Φ n ). Unfortunately, our methods rely heavily on techniques such as the van der Corput inequality, which reflects the fact that Q is almost invariant with respect to translations in εQ when ε is small. As such, we do not know how to adapt our methods to remove this shrinkage of the coset progressions Q. Similarly for Theorem 1.13 below.
We also have an analogue of Proposition 1.5: Theorem 1.13 (Concatenation theorem for anti-box norms). Let d 1 , d 2 be positive integers. For any i = 1, 2 and 1 ≤ j ≤ d i , let Q i, j be a coset progression of rank r i, j in an at most countable additive group G. Let (X, T ) be a G-system, let d 1 , d 2 be positive integers, and let f lie in the unit ball of L ∞ (X). Let c 1 , c 2 : (0, +∞) → (0, +∞) be functions such that c i (ε) → 0 as ε → 0 for i = 1, 2. We make the following hypotheses: Then there exists a function c : (0, ∞) → (0, +∞) with c(ε) → 0 as ε → 0, which depends only on d 1 , d 2 , c 1 , c 2 and the r 1, j , r 2, j , such that The proof of Theorem 1.13 is similar to that of Theorem 1.11, and is given at the end of Section 7.

Concatenation of characteristic factors
Analogues of the above results can be obtained for characteristic factors of the Gowers-Host-Kra seminorms [30] in ergodic theory. To construct these factors for arbitrary abelian group actions (including uncountable ones), it is convenient to introduce the following notation (which should be viewed as a substitute for the machinery of Følner sequences that does not require amenability). Given an additive group H, we consider the set F[H] of non-empty finite multisets Q in H. We can make F[H] a directed set by declaring Q 1 ≤ Q 2 if one has Q 2 = Q 1 + R for some non-empty finite multiset R; note that any two Q 1 , Q 2 have a common upper bound Q 1 + Q 2 . One can then define convergence along nets in the usual fashion: given a sequence of elements x Q of a Hausdorff topological space indexed by the non-empty finite multisets Q in H, we write lim Q→H x Q = x if for every neighbourhood U of x, there is a finite non-empty multiset Q 0 in H such that x Q ∈ U for all Q ≥ Q 0 . Similarly one can define joint limits lim (Q 1 ,...,Q k )→(H 1 ,...,H k ) x Q 1 ,...,Q k , where each Q i ranges over finite non-empty multisets in H i , using the product directed set F[H 1 ] × · · · × F[H d ]. Thus for instance lim (Q 1 ,Q 2 )→(H 1 ,H 2 ) x Q 1 ,Q 2 = x if, for every neighbourhood U of x, there exist Q 1,0 , Q 2,0 in H 1 , H 2 respectively such that x Q 1 ,Q 2 ∈ U whenever Q 1 ≥ Q 1,0 and Q 2 ≥ Q 2,0 . If the x Q or x Q 1 ,...,Q k take values in R, we can also define limit superior and limit inferior in the usual fashion. Remark 1.14 (Amenable case). If G is an amenable (discrete) countable additive group with a Følner sequence Φ n , and a g is a bounded sequence of complex numbers indexed by g ∈ G, then we have the relationship lim between traditional averages and the averages defined above, whenever the right-hand side limit exists. 
Indeed, for any ε > 0, we see from the Følner property that for any given Q ∈ F[G], that E g∈Φ n a g and E g∈Φ n +Q a g differ by at most ε if n is large enough; while from the convergence of the right-hand side limit we see that E g∈Φ n +Q a g and E g∈Q a g differ by at most ε for all n if Q is large enough, and the claim follows. A similar result holds for joint limits, namely that lim n 1 ,...,n d →∞ whenever Φ n,i is a Følner sequence for H i and the a h 1 ,...,h d are a bounded sequence of complex numbers.
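The first step of this argument, that averaging over Φ_n and over the multiset sum Φ_n + Q gives nearly the same answer for large n, can be observed numerically in the simplest case G = Z with Φ_n = {1, …, n}. In this sketch the bounded sequence, the multiset Q and the helper names are arbitrary illustrative choices; the quantity `bound` is the elementary estimate 2 max|q| sup|a| / n obtained by comparing Φ_n with each translate Φ_n + q.

```python
import math

def average(seq, multiset):
    """Average of seq(g) over a finite multiset of integers (a list with repeats)."""
    return sum(seq(g) for g in multiset) / len(multiset)

def sumset(A, B):
    """Multiset sum A + B, keeping multiplicity."""
    return [a + b for a in A for b in B]

a = lambda g: math.cos(g * g)      # an arbitrary bounded sequence, |a_g| <= 1
Q = [0, 1, 1, 5]                   # a small multiset in Z

for n in (10, 100, 1000):
    folner = list(range(1, n + 1))                 # Phi_n = {1, ..., n}
    gap = abs(average(a, folner) - average(a, sumset(folner, Q)))
    bound = 2 * max(abs(q) for q in Q) / n         # uses sup |a_g| <= 1
    print(n, gap, bound)
```

The gap shrinks at rate O(1/n) for fixed Q, which is the quantitative form of the Følner property used in the remark.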
Given a G-system (X, T ), a natural number d, a subgroup H of G, and a function f ∈ L ∞ (X), we define the Gowers-Host-Kra seminorm we will show in Theorem 2.1 below that this limit exists, and agrees with the more usual definition of the Gowers-Host-Kra seminorm from [30]; in fact the definition given here even extends to the case when G and H are uncountable abelian groups. More generally, given subgroups H 1 , . . . , H d of G, we define the Gowers-Host-Kra box seminorm again, the existence of this limit will be shown in Theorem 2.1 below. Define a factor of a G-system (X, T ) with X = (X, B, µ) to be a G-system (Y, T ) with Y = (Y, Y, ν) together with a measurable factor map π : X → Y intertwining the group actions (thus T g • π = π • T g for all g ∈ G) such that ν is equal to the pushforward π * µ of µ, thus µ(π −1 (E)) = ν(E) for all E ∈ Y. By abuse of notation we use T to denote the action on the factor Y as well as on the original space X. Note that L ∞ (Y) can be viewed as a (G-invariant) subalgebra of L ∞ (X), and similarly L 2 (Y) is a (G-invariant) closed subspace of the Hilbert space L 2 (X); if f ∈ L 2 (X), we write E( f |Y) for the orthogonal projection onto L 2 (Y). We also call X an extension of Y. Note that any subalgebra Y of B can be viewed as a factor of X by taking Y = X and ν = µ Y . For instance, given a subgroup H of G, the invariant σ -algebra B H consisting of sets E ∈ B such that T h E = E up to null sets for any h ∈ H generates a factor X H of X, and so we can meaningfully define the conditional expectation E( f |X H ) for any f ∈ L 2 (X).
Two factors Y, Y′ of X are said to be equivalent if the algebras L^∞(Y) and L^∞(Y′) agree (using the usual convention of identifying functions in L^∞ that agree almost everywhere), in which case we write Y ≡ Y′. We partially order the factors of X up to equivalence by declaring Y ≤ Y′ if L^∞(Y) ⊆ L^∞(Y′). This gives the factors of X up to equivalence the structure of a lattice: the meet Y ∧ Y′ of two factors is given (up to equivalence) by setting L^∞(Y ∧ Y′) = L^∞(Y) ∩ L^∞(Y′), and the join Y ∨ Y′ of two factors is given by setting L^∞(Y ∨ Y′) to be the von Neumann algebra generated by L^∞(Y) ∪ L^∞(Y′) (i.e., the smallest von Neumann subalgebra of L^∞(X) containing L^∞(Y) ∪ L^∞(Y′)).
We say that a G-system X is H-ergodic for some subgroup H of G if the invariant factor X H is trivial (equivalent to a point). Note that if a system is G-ergodic, it need not be H-ergodic for subgroups H of G. Because of this, it will be important to not assume H-ergodicity for many of the results and arguments below, which will force us to supply new proofs of some existing results in the literature that were specialised to the ergodic case.
For the seminorm U d H (X), it is known 3 (see [30] or Theorem 2.4 below) that there exists a characteristic factor (Z <d H (X), T ) = (Z <d H , Z <d H , ν, T ) of (X, T ), unique up to equivalence, with the property that for all f ∈ L ∞ (X); for instance, Z <1 H (X) can be shown to be equivalent to the invariant factor X H , and the factors Z <d H (X) are non-decreasing in d. In the case when H is isomorphic to the integers Z, and assuming for simplicity that the system X is H-ergodic, the characteristic factor was studied by Host and Kra [30], who obtained the important result that the characteristic factor Z <d H (X) was isomorphic to a limit of d − 1-step nilsystems (see also [37] for a related result regarding characteristic factors of multiple averages); the analogous result for actions of infinite-dimensional vector spaces F ω := n F n was obtained in [5]. More generally, given subgroups H 1 , . . . , H d , there is a unique (up to equivalence) characteristic factor Z <d for all f ∈ L ∞ (X); this was essentially first observed in [29], and we establish it in Theorem 2.4. However, a satisfactory structural description of the factors Z <d H 1 ,...,H d (X) (in the spirit of [30]) is not yet available; see [1] for some recent work in this direction.
We can now state the ergodic theory analogues of Theorems 1.11 and 1.13. In these results G is always understood to be an at most countable additive group. Because our arguments will require a Følner sequence of coset progressions of bounded rank, we will also have to temporarily make a further technical restriction on G, namely that G be the sum of a finitely generated group and a profinite group (or equivalently, a group which becomes finitely generated after quotienting by a profinite subgroup). This class of groups includes the important examples of lattices Z d and vector spaces F ω = n F n over finite fields, but excludes the infinitely generated torsion-free group Z ω = n Z n . Observe that this class of groups is also closed under quotients and taking subgroups. Theorem 1.15 (Concatenation theorem for characteristic factors). Suppose that G is the sum of a finitely generated group and a profinite group. Let X be a G-system, let H 1 , H 2 be subgroups of G, and let d 1 , d 2 be positive integers. Then Equivalently, using the lattice structure on factors discussed previously, or equivalently, In Section 2, we deduce these results from the corresponding combinatorial results in Theorems 1.11, 1.13. and where α ∈ R/Z is a fixed irrational number. These shifts commute and generate a Z 2 -system T (n,m) (x, y, z) := (x + nα, y + mα, z + ny + mx + nmα) (compare with (1)). The shift T (1,0) does not act ergodically on X, but one can perform an ergodic decomposition into ergodic components R/Z × {y} × R/Z for almost every y ∈ R/Z, with T (1,0) acting as a circle shift (x, z) → (x + α, z + y) on each such component. From this one can easily verify that On the other hand, Z <2 Z 2 (X) = X, as there exist functions in L ∞ (Z) whose U 2 Z 2 (X) norm vanish (for instance the function (x, y, z) → e 2πiz ). 
Nevertheless, Theorem 1.15 concludes that Z^{<3}_{Z^2}(X) = X (roughly speaking, this means that X exhibits "quadratic" or "2-step" behaviour as a Z^2-system, despite only exhibiting "linear" or "1-step" behaviour as a Z × {0}-system or {0} × Z-system).
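The formula for T^{(n,m)} in this example can be checked mechanically: it should satisfy the group law of a Z^2-action, and the two generating shifts should commute. The following sketch does this on (R/Z)^3 with exact rational arithmetic. Since an irrational α cannot be represented exactly, a rational stand-in `ALPHA` is used; this is an assumption of the illustration and is harmless here because the identities being tested hold for every α. The test point and the window of shifts are likewise arbitrary.

```python
from fractions import Fraction
from itertools import product

ALPHA = Fraction(1, 7)   # rational stand-in for the irrational alpha (assumption)

def T(n, m, point):
    """T^{(n,m)}(x, y, z) = (x + n a, y + m a, z + n y + m x + n m a) mod 1."""
    x, y, z = point
    return ((x + n * ALPHA) % 1,
            (y + m * ALPHA) % 1,
            (z + n * y + m * x + n * m * ALPHA) % 1)

p = (Fraction(1, 3), Fraction(2, 5), Fraction(3, 11))
shifts = list(product(range(-2, 3), repeat=2))

# Group law T^{g + g'} = T^g T^{g'} on a finite window of Z^2.
group_law = all(
    T(n + k, m + l, p) == T(n, m, T(k, l, p))
    for (n, m) in shifts for (k, l) in shifts
)
# The two generating shifts T^{(1,0)} and T^{(0,1)} commute.
commute = T(1, 0, T(0, 1, p)) == T(0, 1, T(1, 0, p))
print(group_law, commute)
```

Setting (n, m) = (1, 0) in the formula gives the map (x, y, z) ↦ (x + α, y, z + y), which is the circle-shift description of T^{(1,0)} on each fibre of fixed y used in the text.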
Remark 1.18. In the case that H is an infinite cyclic group acting ergodically, Host and Kra [30] show that the characteristic factor Z^{<d}_H(X) = Z^{<d}_{H,…,H}(X) is an inverse limit of (d − 1)-step nilsystems. If H does not act ergodically, then (assuming some mild regularity on the underlying measure space X) one has a similar characterization of Z^{<d}_H(X) on each ergodic component. The arguments in [30] were extended to finitely generated groups H acting ergodically in [19]; see also [5] for an analogous result in the case of actions of infinite-dimensional vector spaces F^ω over a finite field. Theorem 1.15 can then be interpreted as an assertion that if X acts as an inverse limit of nilsystems of step d_1 − 1 along the components of one group action H_1, and as an inverse limit of nilsystems of step d_2 − 1 along the components of another (commuting) group action H_2, then X is an inverse limit of nilsystems of step at most d_1 + d_2 − 2 along the components of the joint H_1 + H_2 action. It seems of interest to obtain a more direct proof of this assertion. A related question would be to establish a nilsequence version of Proposition 1.2. For instance, one could conjecture that whenever a sequence f : Z × Z → C was such that n_1 ↦ f(n_1, n_2) was a Lipschitz nilsequence of step d_1 − 1 uniformly in n_2 (as defined for instance in [24]), and n_2 ↦ f(n_1, n_2) was a Lipschitz nilsequence of step d_2 − 1 uniformly in n_1, then f itself would be a Lipschitz nilsequence jointly on Z^2 of step d_1 + d_2 − 2. It seems that Theorem 1.11 is at least able to show that f can be locally approximated (in, say, an L^2 sense) by such nilsequences on arbitrarily large scales, but some additional argument is needed to obtain the conjecture as stated.
We are able to remove the requirement that G be the sum of a finitely generated group and a profinite group from Theorem 1.15: Theorem 1.19 (Concatenation theorem for characteristic factors). Let G be an at most countable additive group. Let (X, T ) be a G-system, let H 1 , H 2 be subgroups of G, and let d 1 , d 2 be positive integers. Then We prove this result in Section 3 using an ergodic theory argument that relies on the machinery of cubic measures and cocycle type that was introduced by Host and Kra [30], rather than on the combinatorial arguments used to establish Theorems 1.11, 1.13. It is at this point that we use our requirement that G-systems be countably generated modulo null sets, in order to apply the ergodic decomposition (after first passing to a compact metric space model), as well as the Mackey theory of isometric extensions. It is likely that a similar strengthening of Theorem 1.16 can be obtained, but this would require extending much of the Host-Kra machinery to tuples of commuting actions, which we will not do here.

Globalizing uniformity
We have seen that anti-uniformity can be "concatenated", in that functions which are approximately orthogonal to functions that are locally Gowers uniform in two different directions are necessarily also approximately orthogonal to more globally Gowers uniform functions. By duality, one then expects to be able to decompose a globally Gowers uniform function into functions that are locally Gowers uniform in different directions. For instance, in the ergodic setting, one has the following consequence of Theorem 1.15:
Corollary 1.20. Let (X, T) be a G-system, let H_1, H_2 be subgroups of G, and let d_1, d_2 be positive integers. If f ∈ L^∞(X) is orthogonal to L^∞(Z^{<d_1+d_2−1}_{H_1+H_2}(X)) (with respect to the L^2 inner product), then one can write f = f_1 + f_2, where f_1 ∈ L^∞(X) is orthogonal to L^∞(Z^{<d_1}_{H_1}(X)) and f_2 ∈ L^∞(X) is orthogonal to L^∞(Z^{<d_2}_{H_2}(X)); furthermore, f_1 and f_2 are orthogonal to each other.
Proof. By Theorem 1.19 and (13) applied to the system Z <d 1 H 1 (X), any function in L ∞ (Z <d 1 H 1 (X)) orthogonal to L ∞ (Z <d 1 +d 2 −1 This seminorm is the same as the U d 2 H 2 (X) seminorm, and so by (13) again this function must be necessarily orthogonal to L ∞ (Z <d 2 H 2 (X)). We conclude that the restrictions of the spaces L ∞ (Z <d 1 H 1 (X)) and L ∞ (Z <d 2 H 2 (X)) to L ∞ (Z <d 1 +d 2 −1 H 1 +H 2 (X)) are orthogonal, and the claim follows.
One can similarly use Theorem 1.16 to obtain Corollary 1.21. Suppose that G is the sum of a finitely generated group and a profinite group. Let (X, T ) be a G-system, let d 1 , d 2 be positive integers, and let H )) and f 2 ∈ L ∞ (X) is orthogonal to L ∞ (Z <d 2 H 2,1 ,...,H 2,d 2 (X)); furthermore, f 1 and f 2 are orthogonal to each other.
We can use the orthogonality in Corollary 1.20 to obtain a Bessel-type inequality: Corollary 1.22 (Bessel inequality). Let G be an at most countable additive group. Let (X, T ) be a G-system, let (H i ) i∈I be a finite family of subgroups of G, and let (d i ) i∈I be a family of positive integers.
Then for any f ∈ L ∞ (X), we have Proof. Write f i := E( f |Z <d i H i (X)). We can write the left-hand side of (14) as f , ∑ i∈I f i which by the Cauchy-Schwarz inequality is bounded by and the claim follows.
One can use Corollary 1.21 to obtain an analogous Bessel-type inequality involving finitely generated subgroups H i,k , k = 1, . . . , d i , which we leave to the interested reader.
Returning now to the finitary Gowers norms, one has a qualitative analogue of the Bessel inequality involving the Gowers uniformity norms: Theorem 1.23 (Qualitative Bessel inequality for uniformity norms). Let (Q i ) i∈I be a finite non-empty family of coset progressions Q i , all of rank at most r, in an additive group G. Let (X, T ) be a G-system, and let d be a positive integer. Let f lie in the unit ball of L ∞ (X), and suppose that for some ε > 0. Then where c : (0, +∞) → (0, +∞) is a function such that c(ε) → 0 as ε → 0. Furthermore, c depends only on r and d. (In particular, c is independent of the size of I.) We prove this theorem in Section 8. We remark that the theorem is only powerful when the cardinality of the set I is large compared to ε, otherwise the claim would easily follow from considering the diagonal contribution i = j to (15). Theorem 1.23 has an analogue for the Gowers box norms: Theorem 1.24 (Qualitative Bessel inequality for box norms). Let d be a positive integer. For each 1 ≤ j ≤ d, let (Q i, j ) i∈I be a finite family of coset progressions Q i, j , all of rank at most r, in an additive group G. Let (X, T ) be a G-system. Let f lie in the unit ball of L ∞ (X), and suppose that Then where c : (0, +∞) → (0, +∞) is a function such that c(ε) → 0 as ε → 0. Furthermore, c depends only on r and d.
implies some non-trivial lower bound for f U d 1 +d 2 −1 εQ 1 +εQ 2 (X) ). Unfortunately this is not the case; a simple counterexample is provided by a function f of the form , Q 2 are large subgroups of G, X = G with the translation action, and f 1 is constant in the Q 1 direction but random in the Q 2 direction), and vice versa for f 2 .

A sample application
In a sequel to this paper [36], we will use the concatenation theorems proven in this paper to study polynomial patterns in sets such as the primes. Here, we will illustrate how this is done with a simple example, namely controlling the average In that case we will be able to control this expression by the global Gowers U 3 norm: Z/NZ (Z/NZ) ≤ ε for some ε > 0 and i = 1, . . . , 4, where we give Z/NZ the uniform probability measure. Then for some quantity c(ε) depending only on ε and C that goes to zero as ε → 0.
For instance, using this proposition (embedding [N] in, say, Z/5NZ) and the known uniformity properties of the Möbius function µ (see [22]) we can now obtain the asymptotic as N → ∞; we leave the details to the interested reader. As far as we are able to determine, this asymptotic appears to be new. In the sequel [36] to this paper we will consider similar asymptotics involving various polynomial averages such as , and arithmetic functions such as the von Mangoldt function Λ.
We prove Proposition 1.26 in Section 9. Roughly speaking, the strategy is to observe that the average A_{N,M} can be controlled by averages of "local" Gowers norms of the form U^2_Q, where Q is an arithmetic progression in Z/NZ of length comparable to M. Each individual such norm is not controlled directly by the U^3_{Z/NZ} norm, due to the sparseness of Q; however, after invoking Theorem 1.23, we can control the averages of the U^2_Q norms with averages of U^3_{Q+Q′} norms, where Q, Q′ are two arithmetic progressions of length comparable to M. For typical choices of Q, Q′, the rank two progression Q + Q′ will be quite dense in Z/NZ, allowing one to obtain the proposition.
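The contrast between the sparseness of a single progression and the density of a rank two progression can be seen in a toy computation. The sizes N = 101 and M = 10 (so that M^2 is close to N), the choice of steps, and the helper names are all hypothetical and serve only as an illustration; they are not the parameters used in the proof.

```python
def progression(step, length, modulus):
    """Set of residues {0, step, ..., (length - 1) * step} mod modulus."""
    return {(j * step) % modulus for j in range(length)}

def sumset_size(a, b, length, modulus):
    """Number of distinct residues in Q + Q' for progressions with steps a and b."""
    Q = progression(a, length, modulus)
    Qp = progression(b, length, modulus)
    return len({(u + v) % modulus for u in Q for v in Qp})

N, M = 101, 10                          # hypothetical sizes with M * M close to N
print(len(progression(1, M, N)) / N)    # density of a single progression
print(sumset_size(1, 10, M, N) / N)     # steps 1 and 10: Q + Q' is {0, ..., 99}
print(sumset_size(1, 1, M, N) / N)      # degenerate steps: Q + Q' is {0, ..., 18}
```

A single progression occupies about a tenth of Z/NZ here, the sum with steps 1 and 10 misses only one residue, and the degenerate choice of equal steps shows why the statement is restricted to typical pairs Q, Q′.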
One would expect that the U 3 norm in Proposition 1.26 could be replaced with a U 2 norm. Indeed this can be done invoking the inverse theorem for the U 3 norm [21] as well as equidistribution results for nilsequences [26]: . This result can be proven by adapting the arguments based on the arithmetic regularity lemma in [25, §7]; we sketch the key new input required in Section 10. In the language of Gowers and Wolf [15], this proposition asserts that the true complexity of the average A N,M is 1, rather than 2 (the latter being roughly analogous to the "Cauchy-Schwarz complexity" discussed in [15]). This drop in complexity is consistent with similar results established in the ergodic setting in [4], and in the setting of linear patterns in [15,16,17,18,25], and is proven in a broadly similar fashion to these results. In principle, Proposition 1.27 is purely an assertion in "linear" Fourier analysis, since it only involves the U 2 Gowers norm, but we do not know of a way to establish it without exploiting both the concatenation theorem and the inverse U 3 theorem.
We thank the anonymous referees for a careful reading of the paper and for many useful suggestions and corrections.

The ergodic theory limit
In this section we show how to obtain the main ergodic theory results of this paper (namely, Theorem 1.15 and Theorem 1.16) as a limiting case of their respective finitary results, namely Theorem 1.11 and Theorem 1.13. The technical hypothesis that the group G be the sum of a finitely generated group and a profinite group will only be needed near the end of the section. Readers who are only interested in the combinatorial assertions of this paper can skip ahead to Section 4.
We first develop some fairly standard material on the convergence of the Gowers-Host-Kra norms, and on the existence of characteristic factors. Given a G-system (X, T ), finite non-empty multisets Q 1 , . . . , Q d of G, and elements f ω of L 2 d (X) for each ω ∈ {0, 1} d , define the Gowers inner product where ω = (ω 1 , . . . , ω d ), |ω| := ω 1 + · · · + ω d , and C : f → f is the complex conjugation operator; the absolute convergence of the integral is guaranteed by Hölder's inequality. Comparing this with Definition 1.9, we see that We also recall the Cauchy-Schwarz-Gowers inequality (see [20,Lemma B.2]). By setting f ω to equal f when ω d +1 = · · · = ω d = 0, and equal to 1 otherwise, we obtain as a corollary the monotonicity property We have the following convergence result: Theorem 2.1 (Existence of Gowers-Host-Kra seminorm). Let (X, T ) be a G-system, let d be a natural number, and let H 1 , . . . , H d be subgroups of G. For each ω ∈ {0, 1} d , let f ω be an element of L 2 d (X).
Then the limit exists. In particular, the limit exists for any f ∈ L 2 d (X).
It is likely that one can deduce this theorem from the usual ergodic theorem, by adapting the arguments in [30], but we will give a combinatorial proof here, which applies even in cases in which G or H 1 , . . . , H d are uncountable.
Proof. By multilinearity we may assume that the f_ω are all real-valued, so that we may dispense with the complex conjugation operations. We also normalise the f_ω to lie in the closed unit ball of L^{2^d}(X).
and ω = (ω 1 , . . . , ω d ) agree in the first d components (that is, ω i = ω i for i = 1, . . . , d ). We will prove Theorem 2.1 by downward induction on d , with the d = 0 case establishing the full theorem.
Thus, assume that 0 ≤ d ≤ d and that the claim has already been proven for larger values of d (this hypothesis is vacuous for d = d). We will show that for any given (and sufficiently small) ε > 0, and for sufficiently large only increases by at most ε if one increases any of the Q i , that is to say for any i = 1, . . . , d and any finite non-empty multiset R. Applying this once for each i, we see that the limit superior of the f d Q 1 ,...,Q d (X) does not exceed the limit inferior by more than dε, and sending ε → 0 (and using the boundedness of the f d Q 1 ,...,Q d (X) , from Hölder's inequality) we obtain the claim. It remains to establish (20). There are two cases, depending on whether i ≤ d or i > d . First suppose that i ≤ d ; by relabeling we may take i = 1. Using (17) and the d -symmetry (and hence 1-symmetry) of f , we may rewrite f d From the unitarity of shift operators and the triangle inequality, we have for any finite non-empty multiset R in H 1 . This gives (20) (without the epsilon loss!) in the case i ≤ d . Now suppose i > d . By relabeling we may take i = d. In this case, we rewrite f d where Note from Hölder's inequality that the f ω d all lie in the closed unit ball of L 2 (X). A similar rewriting shows that the quantity . This tuple of functions is d + 1-symmetric after rearrangement, and so by induction hypothesis this expression converges to a limit as (Q 1 , . . . , Q d ) → (H 1 , . . . , H d ). In particular, for (Q 1 , . . . , Q d ) sufficiently large, we have for any n ∈ H d . By the parallelogram law, this implies that for all n ∈ H d , which by the triangle inequality implies that for any finite non-empty multiset R in H d . By Cauchy-Schwarz and (21), this implies (for ε small enough) that In the degenerate case d = 0 we adopt the convention D 0 () = 1. We also abbreviate D d H,...,H as D d H . We can similarly define the local dual operators ..,H d in the weak operator topology. 
We can upgrade this convergence to the strong operator topology: Proof. The claim is trivially true for d = 0, so assume d ≥ 1. By multilinearity we may assume the f ω are real. By a limiting argument using Hölder's inequality, we may assume without loss of generality that f ω all lie in L ∞ (X) and not just in . Theorem 2.1 already gives weak convergence, so it suffices to show that the limit exists in the strong L 2 (X) topology. By the cosine rule (and the completeness of L 2 (X)), it suffices to show that the joint limit exists. But the expression inside the limit can be written as an inner product andf ω := 1 for all other ω , and the claim then follows from Theorem 2.1 (with d replaced by 2d).
In the d = 1 case, we have H f is orthogonal to all H-invariant functions; thus we obtain the mean ergodic theorem In particular, we have which on taking limits using Theorem 2.1 and dominated convergence implies that From this, we see that the seminorm U d H (X) defined here agrees with the Gowers-Host-Kra seminorm from [30]; see [5, Appendix A] for details.
A key property of dual functions is that they are closed under multiplication after taking convex closures: Proposition 2.3. Let X be a G-system, and let $H_1,\dots,H_d$ be subgroups of G. Let B be the closed convex hull (in $L^2(X)$) of all functions of the form $D^d_{H_1,\dots,H_d}((f_\omega)_{\omega \in \{0,1\}^d \setminus \{0\}^d})$, where the $f_\omega$ all lie in the closed unit ball of $L^\infty(X)$. Then B is closed under multiplication: if $F, F' \in B$, then $FF' \in B$.
Proof. We may assume d ≥ 1, as the d = 0 case is trivial. By convexity and a density argument, we may assume that F, F are themselves dual functions, thus and for some f ω , f ω in the closed unit ball of L ∞ (X). By Proposition 2.2, we can write FF as where the limits are in the L 2 (X) topology. For any given h ∈ G, averaging a bounded sequence over Q i and averaging over Q i + h are approximately the same if Q i is sufficiently large (e.g. if Q i is larger than the progression {0, h, . . . , Nh} for some large h). Because of this, we can shift the k i variable by h i in the above expression without affecting the limit. In other words, FF is equal to By Proposition 2.2 (with d replaced by 2d), the above expression remains convergent if we work with the joint limit lim In particular, we may interchange limits and write the above expression as Computing the inner limit, this simplifies to This is the strong limit of convex averages of elements of B, and thus lies in B as required.
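As a sanity check on Proposition 2.3, it may help to look at the case d = 1. The sketch below assumes (consistently with the mean ergodic theorem recalled earlier, and up to the complex conjugation convention for dual functions, which is not reproduced here) that the degree one dual function of f is the orthogonal projection of f onto the H-invariant functions. The notation $\mathbb{E}(f \mid X^H)$ for this projection is introduced only for the illustration.

```latex
% Case d = 1 of Proposition 2.3 (illustrative sketch).
% \mathbb{E}(\cdot \mid X^H) denotes the orthogonal projection onto the
% H-invariant functions; this notation is an assumption of the sketch.
\begin{align*}
F &= \mathbb{E}(f \mid X^H), \qquad F' = \mathbb{E}(f' \mid X^H),
   \qquad \|f\|_{L^\infty(X)},\ \|f'\|_{L^\infty(X)} \leq 1, \\
T^h (F F') &= (T^h F)(T^h F') = F F' \quad \text{for all } h \in H, \\
\|F F'\|_{L^\infty(X)} &\leq \|F\|_{L^\infty(X)} \, \|F'\|_{L^\infty(X)} \leq 1, \\
F F' &= \mathbb{E}(F F' \mid X^H).
\end{align*}
```

Under these assumptions the product $FF'$ is again the dual function of a function in the closed unit ball of $L^\infty(X)$, so it lies in B without any need for convex combinations; the averaging over shifts in the proof is only required for $d \geq 2$.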
We can now construct the characteristic factor (cf. [ to be the set of measurable sets E such that 1 E is expressible as the limit (in L 2 (X)) of a uniformly bounded sequence in the set RB :  (25) we conclude that f is orthogonal to B, and hence to L ∞ (Z <d We have a basic corollary of Theorem 2.4 (cf. [30,Proposition 4.6]): Corollary 2.5. Let X be a G-system, let Y be a factor of X, and let H 1 , . . . , H d be subgroups of G. Then Proof and (by Theorem 2.4) f can be expressed as the limit of dual functions of functions in L ∞ (Y), and hence in L ∞ (X), and so the inclusion then follows from another application of We now can deduce Theorem 1.15 and Theorem 1.16 from Theorem 1.11 and Theorem 1.13 respectively. At this point we will begin to need the hypothesis that G is the sum of a finitely generated group and a profinite group. We just give the argument for Theorem 1.15; the argument for Theorem 1.16 is completely analogous and is left to the reader. Let (X, T ), G, H 1 , H 2 , d 1 , d 2 be as in Theorem 1.15. By Corollary 2.5, we may assume without loss of generality that Indeed, if we set we see from Corollary 2.5 that X obeys the condition (27), and that Z <d 1 +d 2 −1 . By Theorem 2.4, we see that for every δ > 0, there exists a real number F(δ ) such that f lies within δ in L 2 (X) norm of both F(δ ) · B 1 and F(δ ) · B 2 . On the other hand, from the Cauchy-Schwarz-Gowers inequality (18) one has whenever i = 1, 2 and f i ∈ B i , and f is in the closed unit ball of L ∞ (X). We conclude that (22) and (9) we conclude that for any coset progression Q i in H i . The right-hand side goes to zero as ε → 0.
Since G is the sum of a finitely generated group and a profinite group, the subgroups H 1 , H 2 are also. In particular, for each i = 1, 2, we may obtain a Følner sequence Q i,n for H i of coset progressions of bounded rank (thus for any h ∈ H i , Q i,n and Q i,n + h differ (as multisets) by o(|Q i,n |) elements as n → ∞). (Indeed, if H i is finitely generated, one can use ordinary progressions as the Følner sequence, whereas if H i is at most countable and bounded torsion, one can use subgroups for the Følner sequence, and the general case follows by addition.) Applying Theorem 1.11, we conclude that for some c(ε) independent of n that goes to zero as ε → 0. Since the Q i,n are Følner sequences for H i , Q 1,n + Q 2,n is a Følner sequence for H 1 + H 2 . In particular, by (11) one has for every ε > 0. Sending ε → 0, we obtain the claim.

An ergodic theory argument
We now give an ergodic theory argument that establishes Theorem 1.19. The arguments here rely heavily on those in [30], but are not needed elsewhere in this paper. For this section it will be convenient to restrict attention to G-systems (X, T) in which X is a compact metric space with the Borel σ-algebra, in order to access tools such as disintegration of measure. The requirement of being a compact metric space is stronger than our current hypothesis that X is countably generated modulo null sets; however, it is known (see [8, Proposition 5.3]) that every G-system that is countably generated modulo null sets is equivalent (modulo null sets) to another G-system (X′, T′) in which X′ is a compact metric space with the Borel σ-algebra. The corresponding characteristic factors such as $Z^{<d_1}_{H_1}(X)$ are also equivalent up to null sets (basically because the Gowers-Host-Kra seminorms are equivalent). Because of this, we see that to prove Theorem 1.19 it suffices to do so when X is a compact metric space.
We now recall the construction of cubic measures from [30], which in [29] was generalised 4 to our current setting of arbitrary actions of multiple subgroups of an at most countable additive group. Definition 3.1 (Cubic measures). Let (X, T ) be a G-system with X = (X, B, µ), and let H 1 , . . . , H d be subgroups of G. We define the G-system (X H 1 ,...,H d is the unique probability measure such that for all f ω ∈ L ∞ (X), where the tensor product ω∈{0,1} d C |ω| f ω is defined as Finally, the shift T on X H 1 ,...,H d is defined via the diagonal action:  (28). We leave the details of these arguments to the interested reader. Once this measure is constructed, it is easy to see that the diagonal action of T preserves the measure µ H 2 ,H 1 , with the isomorphism given by the map (x 00 , x 10 , x 01 , x 11 ) → (x 00 , x 01 , x 10 , x 11 ).
One can informally view the probability space $X^{[d]}_{H_1,\dots,H_d}$ as describing the distribution of certain d-dimensional "parallelepipeds" in X, where the d "directions" of the parallelepiped are "parallel" to $H_1,\dots,H_d$. We will also need the following variant of these spaces, which informally describes the distribution of "L-shaped" objects in X.
Definition 3.2 (L-shaped measures). Let (X, T) be a G-system with X = (X, B, µ), and let H, K be subgroups of G. We define the system $(X^L_{H,K}, T)$ by setting $X^L_{H,K} := (X^L, B^L, \mu^L_{H,K})$, where $X^L := X^{\{00,01,10\}}$ is the set of tuples $(x_{00}, x_{10}, x_{01})$ with $x_{00}, x_{10}, x_{01} \in X$, $B^L$ is the product σ-algebra, and $\mu^L_{H,K}$ is the unique probability measure such that for all $f_{00}, f_{01}, f_{10} \in L^\infty(X)$. The shift T on $X^L_{H,K}$ is defined by the diagonal action.
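The system $X^L_{H,K}$ is clearly a factor of $X^{[2]}_{H,K}$, and also has factor maps to $X^{[1]}_H$ and $X^{[1]}_K$ given by $(x_{00}, x_{10}, x_{01}) \mapsto (x_{00}, x_{10})$ and $(x_{00}, x_{10}, x_{01}) \mapsto (x_{00}, x_{01})$ respectively. A crucial fact for the purposes of establishing concatenation is that $X^L_{H,K}$ additionally has a third factor map, to the space $X^{[1]}_{H+K}$: Lemma 3.3. Let (X, T) be a G-system, and let H, K be subgroups of G. Then the map $(x_{00}, x_{10}, x_{01}) \mapsto (x_{10}, x_{01})$ is a factor map from $(X^L_{H,K}, T)$ to $(X^{[1]}_{H+K}, T)$.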
Informally, this lemma reflects the obvious fact that if x 00 and x 10 are connected to each other by an element of the H action, and x 00 and x 01 are connected to each other by an element of the K action, then x 10 and x 01 are connected to each other by an element of the H + K action.
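The informal remark above amounts to a one-line computation. In the sketch below, the shifts $h \in H$ and $k \in K$ are hypothetical labels for the group elements connecting the points, and we use that G is additive, so that $T^a T^b = T^{a+b}$.

```latex
\begin{align*}
x_{10} &= T^{h} x_{00}, \qquad x_{01} = T^{k} x_{00}
   \qquad (h \in H,\ k \in K), \\
x_{01} &= T^{k} T^{-h} x_{10} = T^{k-h} x_{10},
   \qquad k - h \in H + K.
\end{align*}
```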
Proof. If f 00 , f 10 , f 01 ∈ L ∞ (X) are real-valued, then by Theorem 2.1, Definition 1.9, and Definition 3.2 we have where we use Proposition 2.2 and (26) in the last two lines. Specialising to the case f 00 = 1, we conclude in particular that H+K (X) = X [1] f 10 ⊗ f 01 dµ We now begin the proof of Theorem 1.19. Let (X, T ), G, H 1 , H 2 , d 1 , d 2 be as in Theorem 1.15. We begin with a few reductions. By induction we may assume that the claim is already proven for smaller values of d 1 + d 2 . By shrinking G if necessary, we may assume that G = H 1 + H 2 (note that replacing G with H 1 + H 2 does not affect factors such as Z <d 1 H 1 (X)). Next, we observe that we may reduce without loss of generality to the case where the action of G on (X, T ) is ergodic. To see this, we argue as follows. As X was assumed to be a compact metric space, we have an ergodic decomposition µ = Y µ y dν(y) for some probability space (Y, ν) (the invariant factor X G of X), and some probability measures µ y on X depending measurably on y, and ergodic in G for almost every y; see [8,Theorem 5.8] or [6,Theorem 6.2]. Let X y = (X, B, µ y ) denote the components of this decomposition. From (12), (7) we have the identity for any f ∈ L ∞ (X). We conclude that a bounded measurable function f ∈ L ∞ (X) vanishes in U d 1 H 1 (X) if and only if it vanishes in U d 1 H 1 (X y ) for almost every y. By (13), this implies that f is measurable (modulo null sets) with respect to Z <d 1 H 1 (X) if and only if it is measurable (modulo null sets) with respect to Z <d 1 H 1 (X y ) for almost every y. Similarly for Z <d 2 H 2 (X) and Z <d H 1 +H 2 (X). From this it is easy to see that Theorem 1.15 for X will follow from Theorem 1.15 for almost every X y . Thus we may assume without loss of generality that the system (X, T ) is G-ergodic.
Next, if we set X := Z <d 1 H 1 (X) ∧ Z <d 2 H 2 (X), we see from Corollary 2.5 (as in the proof of Theorem 1.15) that Z <d 1 +d 2 −1 Thus we may replace X by X and assume without loss of generality that Following [30] (somewhat loosely 5 ), let us say that a G-system X is of H-order <d if X ≡ Z <d (X); thus we are assuming that X has H 1 -order <d 1 and H 2 -order <d 2 . Our task is now to show that X has G-order <d 1 + d 2 − 1. For future reference we note from Corollary 2.5 that any factor of a G-system with H-order <d also has H-order <d.
For future reference, we observe that the property of having G-order <d is also preserved under taking Host-Kra powers: Lemma 3.4. Let H, H′ be subgroups of G. Let Y be a G-system with H-order <d for some d ≥ 1. Then $Y^{[1]}_{H'}$ is also of H-order <d.
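Proof. The space $Y^{[1]}_{H'}$ contains two copies of Y as factors, which we will call $Y_1$ and $Y_2$, and which together generate $Y^{[1]}_{H'}$. By Corollary 2.5, we have $Z^{<d}_H(Y^{[1]}_{H'}) \geq Z^{<d}_H(Y_1) \vee Z^{<d}_H(Y_2) \equiv Y_1 \vee Y_2 \equiv Y^{[1]}_{H'}$, and the claim follows.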
If d 1 = 1, then every function in L ∞ (X) is H 1 -invariant, and it is then easy to see that the U d 2 H 2 (X) and U d 2 H 1 +H 2 (X) seminorms agree. By (13), we conclude that Z <d 2 H 2 (X) is equivalent to Z <d 2 H 1 +H 2 (X), and the claim follows. Similarly if d 2 = 1. Thus we may assume that d 1 , d 2 > 1.
We now set Y := Z <d 1 +d 2 −2 From the induction hypothesis we have We now analyse X as an extension over Y, following the standard path in [10], [30]. Given a subgroup H of G, we say (as in [9]) that X is a compact extension of Y with respect to the H action if any function in L ∞ (X) can be approximated to arbitrary accuracy (in L 2 (X)) by an H-invariant finite rank module over L ∞ (Y). We have We now invoke [9, Proposition 2.3], which asserts that if one system X is a compact extension of another Y for two commuting group actions H, K, then it is also a compact extension for the combined action of H + K. Since we are assuming G = H 1 + H 2 , we conclude that X is a compact extension of Y as a G-system.
Since X is also assumed to be G-ergodic, we may now use the Mackey theory of isometric extensions from [10, §5], and conclude that X is an isometric extension of Y in the sense that X is equivalent to a system of the form Y × ρ K/L where K = (K, ·) is a compact group, L is a closed subgroup of K, ρ : G × Y → K is a measurable function obeying the cocycle equation and the Y × ρ K/L is the product of the probability spaces Y and K/L (the latter being given the Haar measure) with action given by the formula T g (y,t) := (T g y, ρ(g, y)t) for all y ∈ Y and t ∈ K/L. We now give the standard abelianisation argument, originating from [10] and used also in [30], that allows us to reduce to the case of abelian extensions. Proposition 3.6 (Abelian extension). With ρ, K, L as above, L contains the commutator group [K, K]. In particular, after quotienting out by L we may assume without loss of generality that K is abelian and L is trivial.
Proof. This is essentially [30, Proposition 6.3], but for the convenience of the reader we provide an arrangement (essentially due to Szegedy [32]) of the argument here, which uses the action of H 1 but does not presume H 1 -ergodicity.
We identify X with Y × ρ K/L. For any k ∈ K, we define the rotation actions τ k on L ∞ (X) by τ k f (y,t) := f (y, k −1 t).
At present we do not know that these actions commute with the shift T g ; however they are certainly measure-preserving, so in particular = 0 for all f ∈ L ∞ (X) and k ∈ K. From the Cauchy-Schwarz-Gowers inequality (23) we conclude that the inner product ( f ω ) ω∈{0,1} d 1 −1 U d 1 −1 H 1 (X) vanishes whenever one of the functions f ω ∈ L ∞ (X) is of the form f − τ k f for some f ∈ L ∞ (X) and k ∈ K. By linearity, this implies that the inner product is unchanged if τ k is applied to one of the functions f ω ∈ L ∞ (X) for some k ∈ K. Using the recursive relations between the Gowers inner products, connected by an edge, for any k ∈ K. Taking the commutator of this fact using two intersecting edges on {0, 1} d 1 (recalling that d 1 > 1), we conclude that ( f ω ) ω∈{0,1} d 1 U d 1 H 1 (X) is unchanged if a single f ω is shifted by τ k for some k in the commutator group [K, K]. Equivalently, for f ∈ L ∞ (X) and k ∈ [K, K], f − τ k f is orthogonal to all dual functions for U d 1 H 1 (X); since X ≡ Z <d 1 H 1 (X), this implies that f − τ k f is trivial. Thus the action of [K, K] on Y × ρ K/L is trivial, and so L lies in [K, K] as required.
With this proposition, we may thus write X = Y × ρ K for some compact abelian group K = (K, ·); our task is now to show that Y × ρ K has G-order <d 1 + d 2 − 1.
Define an S 1 -cocycle (or cocycle, for short) of a G-system (Y, T ) to be a map η : G × Y → S 1 taking values in the unit circle S 1 := {z ∈ C : |z| = 1} obeying the cocycle equation (32); this is clearly an abelian group. Observe that for any character χ : K → S 1 in the Pontryagin dual of K, χ • ρ is a cocycle. We say that a cocycle is an H-coboundary if there is a measurable F : X → S 1 such that η(h, x) = F(T h x)/F(x) for all h ∈ H and almost every x ∈ X; this is a subgroup of the space of all cocycles for each H. Given a cocycle η : G × Y → S 1 on Y and a subgroup H of G, we define the cocycle d [1] H η : G × Y [1] H → S 1 on the Host-Kra space Y [1] H by the formula d [1] H η(g, x, x ) := η(g, x)η(g, x ) for all g ∈ G and x, x ∈ X; it is easy to see that d [1] H is a homomorphism from cocycles on Y to cocycles on Y [1] H , which maps H -coboundaries to H -coboundaries for any subgroup H of G. We may iterate this construction to define a homomorphism d Note that d [1] H and d [1] K commute for any H, K (after identifying Y [2] H,K with Y [2] K,H in the obvious fashion), and so the operator d H . We say that a cocycle σ : H σ is a H-coboundary. Because the operator d [1] H maps H-coboundaries to H-coboundaries for any H ≤ G, we see that d [1] H maps cocycles of H-type d to cocycles of H-type d, and that any cocycle of H-type d is also of H-type d for any d > d.
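For orientation, here is a short verification that every coboundary is a cocycle. The cocycle equation (32) is not reproduced in this section, so the sketch assumes it has the usual form forced by requiring $T^{g+g'} = T^g T^{g'}$ for the skew-product action $T^g(y,t) = (T^g y, \rho(g,y)t)$. The function F below is a hypothetical measurable map from Y to $S^1$.

```latex
% Assumed form of the cocycle equation (32), obtained by comparing
% T^{g+g'}(y,t) with T^g T^{g'}(y,t) for the skew-product action:
\[
\rho(g+g', y) = \rho(g, T^{g'} y)\,\rho(g', y).
\]
% A coboundary \eta(h,y) := F(T^h y)/F(y) satisfies this by telescoping:
\begin{align*}
\eta(h+h', y) &= \frac{F(T^{h+h'} y)}{F(y)}
  = \frac{F(T^{h} T^{h'} y)}{F(T^{h'} y)} \cdot \frac{F(T^{h'} y)}{F(y)} \\
 &= \eta(h, T^{h'} y)\,\eta(h', y).
\end{align*}
```

The same telescoping shows that the product of two coboundaries is a coboundary, which is one way to see that the H-coboundaries form a subgroup of the group of cocycles.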
We have a fundamental connection between type and order from [30]: Proposition 3.7 (Type equation). Let d ≥ 1 be an integer, let H be a subgroup of an at most countable additive group G, and let Y be a G-system of H-order <d that is G-ergodic. Let K be a compact abelian group, and let ρ : H × Y → K be a cocycle.
(i) If the system $Y \times_\rho K$ has H-order <d, then χ ∘ ρ has H-type d − 1 for all characters χ : K → $S^1$.
(ii) Conversely, if χ ∘ ρ has H-type d − 1 for all characters χ : K → $S^1$, then the system $Y \times_\rho K$ has H-order <d.
One can use a result of Moore and Schmidt [31] to conclude that ρ itself is of H-type d − 1 in conclusion (i), but we will not need to do so here. A key technical point to note is that no H-ergodicity hypothesis is imposed.
Proof. For part (i), see 6 [30,Proposition 6.4] or [5,Proposition 4.4]. For part (ii), we argue as follows 7 . Suppose for contradiction that Y × ρ K did not have H-order <d, then by (13) there is a non-zero function 6 Again, the argument in [30] is stated only for G = Z, but extends to other at most countable additive groups G without any modification of the argument. In any event, the version of the argument in [5] is explicitly stated for all such groups G. 7 One can also establish (ii) by using [30,Proposition 7.6] to handle the case when Y × ρ K is ergodic, and Mackey theory and the ergodic decomposition to extend to the non-ergodic case; we leave the details to the interested reader. respect to the K-action, since this action commutes with the H-action. Using Fourier inversion in the K direction (i.e. decomposition of L 2 (Y × ρ K) into K-isotypic components of the K-action) and the triangle inequality, we may assume that f takes the form f (y, k) = F(y)χ(k) for some F ∈ L ∞ (Y) and some character χ : K → S 1 , with F not identically zero. By (13) and the hypothesis that Y has H-order <d, the property of having vanishing U d H norm is unaffected by multiplication by functions in L ∞ (Y), as well as shifts by G, thus (y, k) → (T g |F(y)|)χ(k) also has vanishing U d H norm for g ∈ G. Averaging and using the ergodic theorem, we conclude that the function u : (y, k) → χ(k) has vanishing U d H (Y × ρ K) seminorm, and hence so doesFu for anyF ∈ L ∞ (Y). By the Gowers-Cauchy-Schwarz inequality, this implies that H ), where the integrals are understood to be with respect to the cubic measure µ , where by abuse of notation we write B ⊗ B for the function ((y , k ), (y, k)) → B(y )B(y). By construction of the cubic measure on (Y × ρ K) [d] H and the ergodic theorem, this implies that In our current context, Proposition 3.7(i) shows that χ • ρ is of H i -type d i − 1 for all i = 1, 2 and all characters χ : K → S 1 .
We will shortly establish the following proposition:  where e(x) := e 2πix , then σ is a cocycle. One can view Y [1] H 1 as the space of pairs ((x, y), (x , y)) with x, y, x ∈ R/Z, with the product shift map. We have d [1] H 1 σ ((n, 0), ((x, y), (x , y))) = 1 and so σ is certainly of H 1 -type 1; it is similarly of H 2 -type 1. The system Y × σ S 1 is the system X in Example 1.17, and is thus of G-order < 3. One can also check that d [2] G σ is identically 1, basically because the phase ny + mx + nmα is linear in x, y.
Assuming Proposition 3.8 for the moment, we combine it with Proposition 3.7(i) to conclude that χ • ρ is of G-type d 1 + d 2 − 2 for all characters χ : K → S 1 . Applying Proposition 3.7(ii), we conclude that Y × ρ K is of G-order <d 1 + d 2 − 1, as required.
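The index bookkeeping in this step can be laid out explicitly. The display below only restates the implications quoted above, together with the special case $d_1 = d_2 = 2$ as an illustration.

```latex
\begin{align*}
&\text{Prop. 3.7(i) with } (H,d) = (H_i, d_i):
   && \chi \circ \rho \text{ has } H_i\text{-type } d_i - 1 \quad (i = 1,2), \\
&\text{Prop. 3.8}:
   && \chi \circ \rho \text{ has } G\text{-type } (d_1 - 1) + (d_2 - 1) = d_1 + d_2 - 2, \\
&\text{Prop. 3.7(ii) with } (H,d) = (G, d_1 + d_2 - 1):
   && Y \times_\rho K \text{ has } G\text{-order } < d_1 + d_2 - 1, \\
&\text{Example } d_1 = d_2 = 2:
   && H_i\text{-type } 1 \ \Rightarrow\ G\text{-type } 2 \ \Rightarrow\ G\text{-order } < 3.
\end{align*}
```

The case $d_1 = d_2 = 2$ matches the example discussed earlier, in which a cocycle of $H_1$-type 1 and $H_2$-type 1 gives a system of G-order < 3.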
It remains to establish Proposition 3.8. We first need a technical extension of a result of Host and Kra: Proposition 3.11. Let Y be an G-system that is G-ergodic, and let ρ : G ×Y → S 1 be a cocycle which is of H 1 -type d 1 − 2. Then ρ differs by a G-coboundary from a cocycle which is measurable with respect to Proof. If Y was H 1 -ergodic then this would be immediate from [30,Corollary 7.9] (the argument there is stated for H 1 = Z, but extends to more general at most countable additive groups). To extend this result to the G-ergodic case, we will give an alternate arrangement 8 of the arguments in [30], which does not rely on H 1 -ergodicity. Let X denote the G-system X := Y × ρ S 1 , then S 1 acts on X by translation, with each element ζ of S 1 transforming a function f : (y, z) → f (y, z) in L ∞ (X) to the translated function τ ζ f : (y, z) → f (y, ζ −1 z). As X is an abelian extension of Y, the S 1 -action commutes with the G-action and in particular with the H 1 -action. This implies that the factor Z <d 1 −1 H 1 (X) of X inherits an S 1 -action (which by abuse of notation we will also call τ ζ ) which commutes with the G-action.
Let us say that a function f ∈ L ∞ (X) has S 1 -frequency one if one has τ ζ f = ζ −1 f for all ζ ∈ S 1 , or equivalently if f has the form f (y, ζ ) =f (y)ζ for somef ∈ L ∞ (Y). We claim 9 that there is a function f of S 1 -frequency one with non-vanishing U d 1 −1 H 1 (X) seminorm. Suppose for the moment that this were the case, then by (13) there is a function f ∈ L ∞ (Z <d 1 −1 H 1 (X)) which has a non-zero inner product with a function of S 1 -frequency one. Decomposing f into Fourier components with respect to the S 1 action, and recalling that this action preserves L ∞ (Z <d 1 −1 H 1 (X)), we conclude that L ∞ (Z <d 1 −1 H 1 (X)) contains a function F of S 1 -frequency one. The absolute value |F| of this function is S 1 -invariant and lies in L ∞ (Z <d 1 −1 . The support of |F| may not be all of Y, but from G-ergodicity we can cover Y (up to null sets) by the support of countably many translates |T g F| of |F|. By gluing these translates T g F together and then normalizing, we may thus find a function u in L ∞ (Z <d 1 −1 H 1 (X)) of S 1 -frequency one which has magnitude one, that is to say it takes values in S 1 almost everywhere. One can then check that the functionρ : as usual) is a cocycle that differs from ρ by a G-coboundary, giving the claim.
It remains to prove the claim. We use an argument similar to the one used to prove Proposition 3.7(ii). Suppose for contradiction that all functions of S 1 -frequency one had vanishing U d 1 −1 H 1 (X) norm. By the Cauchy-Schwarz-Gowers inequality (18), we then have ) by tensor products, we conclude that ). Now recall that ρ is of H 1 -type d 1 − 2, so that one has an identity of the form for g ∈ H 1 and almost every y in Y ) taking values in S 1 . Since T g (y, z) = (T g y, ρ(g, y)z), this implies that for almost every ((y , y), z) in X We use this result to obtain a variant of Proposition 3.8: Concatenation of type, variant). Let Y be a G-system of G-order <d 1 + d 2 − 2, of H 1 -order <d 1 , and H 2 -order <d 2 . Let σ : G × Y → S 1 be a cocycle which is of H 1 -type d 1 − 2 and Similarly if one assumes instead that σ has H 1 -type d 1 − 1 and H 2 -type d 2 − 2.
Proof. We just prove the first claim, as the second is similar. Applying the preceding proposition to the restriction of σ to H ×Y , we see that σ differs by a G-coboundary from a cocycle σ : G ×Y → S 1 which is measurable with respect to Z <d 1 −1 (Y) when restricted to H 1 ×Y . By Proposition 3.7(ii), we now conclude that the system Z <d 1 −1 On the other hand, since σ is of H 2 -type d 2 − 1, σ is also. Since Y is of H 2 -order <d 2 , we may apply Proposition 3.7(ii) to conclude that X is of H 2 -order <d 2 . In particular f also lies in Z <d 2 −1 Applying the induction hypothesis for Theorem 1. 19, we conclude that f lies in L ∞ (Z <d 1 +d 2 −2 G (X)). Since Y was already of G-order <d 1 + d 2 − 2, L ∞ (Y) also lies in L ∞ (Z <d 1 +d 2 −2 G (X)). By Fourier analysis, any element of L ∞ (X) can be approximated in L 2 to arbitrary accuracy by polynomial combinations of f and elements of L ∞ (Y), and hence L ∞ (X) is contained in L ∞ (Z <d 1 +d 2 −2 G (X)); that is to say, X has G-order <d 1 + d 2 − 2. By Proposition 3.7(i), this implies that σ is of G-type d 1 + d 2 − 3 on Y. Since σ differs from σ by a G-coboundary, we conclude that σ is of G-type d 1 + d 2 − 3 also, as required.

Sketch of combinatorial concatenation argument
In this section we give an informal sketch of how Theorem 1.11 is proven, glossing over several technical issues that the nonstandard analysis formalism is used to handle.
We assume inductively that d 1 , d 2 > 1, and that the theorem has already been proven for smaller values of d 1 + d 2 . Let us informally call a function f structured of order < d 1 along H 1 if it obeys bounds similar to (i), and similarly define the notion of structured of order < d 2 along H 2 ; these notions can be made rigorous once one sets up the nonstandard analysis formalism. Roughly speaking, Theorem 1.11 then asserts that functions that are structured of order < d 1 along H 1 and structured of order < d 2 along H 2 are also structured of order < d 1 + d 2 − 1 along H 1 + H 2 . A key point (established using the machinery of dual functions) is that the class of functions that have a certain structure (e.g. being structured of order < d 1 along H 1 ) form a shift-invariant algebra, in that they are closed under addition, scalar multiplication, pointwise multiplication, and translation.
By further use of the machinery of dual functions, one can show that if f is structured of order < $d_1$ along $H_1$, then the shifts $T^n f$ of f with $n \in H_1$ admit a representation roughly of the form
$$T^n f \approx \mathbb{E}_h\, c_{n,h}\, g_h \qquad (35)$$
where $\mathbb{E}_h$ represents some averaging operation 11 with respect to some parameter h, the $g_h$ are bounded functions, and the $c_{n,h}$ are functions that are structured of order < $d_1 - 1$ along $H_1$; this type of "higher order uniformly almost periodic" representation of shifts of structured functions generalizes (2), and first appeared in [34]. In particular, if $n' \in H_2$, we have $T^{n+n'} f \approx \mathbb{E}_h (T^{n'} c_{n,h}) (T^{n'} g_h)$.
A crucial point now (arising from the shift-invariant algebra property mentioned earlier) is that the structure one has on the original function f is inherited by the functions $c_{n,h}$ and $g_h$. Specifically, since f is structured of order < $d_1$ along $H_1$ and of order < $d_2$ along $H_2$, the functions $g_h$ appearing in (35) should also be structured in this fashion. This implies that
$$T^{n'} g_h \approx \mathbb{E}_{h'}\, c_{n',h,h'}\, g_{h,h'} \qquad (36)$$
where $c_{n',h,h'}$ is structured of order < $d_1$ along $H_1$ and of order < $d_2 - 1$ along $H_2$, and $g_{h,h'}$ is some bounded function. This leads to a representation of the form
$$T^{n+n'} f \approx \mathbb{E}_h \mathbb{E}_{h'}\, c_{n,n',h,h'}\, g_{h,h'} \qquad (37)$$
where $c_{n,n',h,h'} := (T^{n'} c_{n,h})\, c_{n',h,h'}$. But by the induction hypothesis as before one can show that $c_{n',h,h'}$ (and likewise $T^{n'} c_{n,h}$, which is structured of order < $d_1 - 1$ along $H_1$ and inherits the structure of order < $d_2$ along $H_2$) is structured of order < $d_1 + d_2 - 2$ along $H_1 + H_2$, and then from the shift-invariant algebra property mentioned earlier, we see that $c_{n,n',h,h'}$ is also structured of order < $d_1 + d_2 - 2$ along $H_1 + H_2$. The representation (37) can then be used (basically by a Cauchy-Schwarz argument, similar to one used in [34]) to establish that f is structured of order < $d_1 + d_2 - 1$ along $H_1 + H_2$.
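The passage from the two one-directional representations to the joint one is a direct substitution. The sketch below assumes the representations have the schematic forms $T^n f \approx \mathbb{E}_h c_{n,h} g_h$ for $n \in H_1$ and $T^{n'} g_h \approx \mathbb{E}_{h'} c_{n',h,h'} g_{h,h'}$ for $n' \in H_2$, and ignores all error terms, as the rest of this informal section does.

```latex
\begin{align*}
T^{n+n'} f &= T^{n'} (T^{n} f)
   \approx \mathbb{E}_h \, (T^{n'} c_{n,h}) \, (T^{n'} g_h) \\
 &\approx \mathbb{E}_h \, (T^{n'} c_{n,h}) \, \mathbb{E}_{h'} \, c_{n',h,h'} \, g_{h,h'} \\
 &= \mathbb{E}_h \mathbb{E}_{h'} \, c_{n,n',h,h'} \, g_{h,h'},
 \qquad c_{n,n',h,h'} := (T^{n'} c_{n,h}) \, c_{n',h,h'}.
\end{align*}
```

The first step uses only that $T^{n'}$ is multiplicative and commutes with the averaging in h; the coefficient $c_{n,n',h,h'}$ is a product of two structured functions, which is where the algebra property enters.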
Remark 4.1. We were not able to directly adapt this argument to give a purely ergodic theory proof of Theorem 1.15 or Theorem 1.16, mainly due to technical problems defining the notion of "uniform almost periodicity" in the ergodic context, and in ensuring that this almost periodicity was uniformly controlled with respect to parameters such as n, n′, h, h′. Instead, the natural ergodic analogue of this argument appears to be the variant inclusion $F^{<d_1}_{H_1}(X) \wedge F^{<d_2}_{H_2}(X) \leq F^{<d_1+d_2-1}_{H_1+H_2}(X)$ under the hypotheses of Theorem 1.19, where the Furstenberg factors $F^{<d}_H(X)$ [7] are defined recursively by setting $F^{<1}_H(X) = X^H$ to be the invariant factor and $F^{<d+1}_H(X)$ to be the maximal compact extension of $F^{<d}_H(X)$. This inclusion can be deduced from [9, Proposition 2.3] (which was already used in Section 3) and an induction on $d_1 + d_2$; we leave the details to the interested reader. We remark that the proof of [9, Proposition 2.3] can be viewed as a variant of the arguments sketched in this section. The Furstenberg factors $F^{<d}_H(X)$ are, in general, larger than the Host-Kra factors $Z^{<d}_H(X)$, because the cocycles in the latter must obey the type condition in Proposition 3.7, whereas the former factors have no such constraint.

Taking ultraproducts
To prove our main combinatorial theorems rigorously, it is convenient to use the device of ultraproducts to pass to a nonstandard analysis formulation, in order to hide most of the "epsilon management", as well as to exploit infinitary tools such as countable saturation, Loeb measure and conditional expectation. The use of nonstandard analysis to analyze Gowers uniformity norms was first introduced by Szegedy [32], [33], and also used by Green and the authors in [28].
We quickly set up the necessary formalism. (See for instance [11] for an introduction to the foundations of nonstandard analysis that is used here.) We will need to fix a non-principal ultrafilter α ∈ β N\N on the natural numbers, thus α is a collection of subsets of natural numbers such that the function A → 1 A∈α forms a finitely additive {0, 1}-valued probability measure on N, which assigns zero measure to every finite set. The existence of such a non-principal ultrafilter is guaranteed by the axiom of choice. We refer to elements of α as α-large sets.
We assume the existence of a standard universe U, that is, a set that contains all the mathematical objects of interest to us, in particular containing all the objects mentioned in the theorems in the introduction. Objects in this universe will be referred to as standard objects. A standard set is a set consisting entirely of standard objects, and a standard function is a function whose domain and range are standard sets. The standard universe will not need to obey all of the usual ZFC set theory axioms (though one can certainly assume this if desired, given a suitable large cardinal axiom); however we will at least need this universe to be closed under the ordered pair construction $x, y \mapsto (x, y)$, so that the Cartesian product of finitely many standard sets is again a standard set.
A nonstandard object is an equivalence class of tuples (x n ) n∈A of standard objects indexed by an α-large set A, with two tuples (x n ) n∈A , (y n ) n∈B equivalent if they agree on an α-large set. We write lim n→α x n for the equivalence class associated with a tuple (x n ) n∈A , and refer to this nonstandard object as the ultralimit of the x n . Thus for instance a nonstandard natural number is an ultralimit of standard natural numbers, a nonstandard real number is an ultralimit of standard real numbers, and so forth. If (X n ) n∈A is a sequence of standard sets indexed by an α-large set A, we define the ultraproduct ∏ n→α X n to be the collection of all ultralimits lim n→α x n , where x n ∈ X n for an α-large set of n. An internal set is a set which is an ultraproduct of standard sets. We use the term external set to denote a set of nonstandard objects that is not necessarily internal. Note that every standard set X embeds into the nonstandard set * X := ∏ n→α X, which we call the ultrapower of X, by identifying every standard object x with its nonstandard counterpart lim n→α x. In particular, the standard universe U embeds into the nonstandard universe * U of all nonstandard objects.
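To see how the choice of ultrafilter enters, consider the ultralimit of an oscillating sequence. This small example is not taken from the paper; it only uses the fact that exactly one of a set of natural numbers and its complement is α-large, and the identification of a standard object x with $\lim_{n\to\alpha} x$.

```latex
% E := even natural numbers, O := odd natural numbers.
% Exactly one of E, O is \alpha-large.
\[
\lim_{n \to \alpha} (-1)^n =
\begin{cases}
 1  & \text{if } E \in \alpha, \\
 -1 & \text{if } O \in \alpha,
\end{cases}
\qquad \text{since } (-1)^n = 1 \text{ on } E
\text{ and } (-1)^n = -1 \text{ on } O.
\]
```

In particular the ultralimit of a sequence taking finitely many values is always one of those values, but which one depends on α.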
If $X = \prod_{n\to\alpha} X_n$ and $Y = \prod_{n\to\alpha} Y_n$ are internal sets, we have a canonical isomorphism
$$X \times Y \equiv \prod_{n\to\alpha} (X_n \times Y_n)$$
which (by abuse of notation) allows us to identify the Cartesian product of two internal sets as another internal set. Similarly for Cartesian products of any (standard) finite number of internal sets. We will implicitly use such identifications in the sequel without further comment. An internally finite set is an ultraproduct of finite sets (such sets are also known as hyperfinite sets in the literature). Similarly with "set" replaced by "multiset". (The multiplicity of an element of an internally finite multiset will of course be a nonstandard natural number in general, rather than a standard natural number.) Given a sequence $(f_n)_{n \in A}$ of standard functions $f_n : X_n \to Y_n$ indexed by an α-large set A, we define the ultralimit $\lim_{n\to\alpha} f_n : \prod_{n\to\alpha} X_n \to \prod_{n\to\alpha} Y_n$ to be the function
$$\Bigl(\lim_{n\to\alpha} f_n\Bigr)\Bigl(\lim_{n\to\alpha} x_n\Bigr) := \lim_{n\to\alpha} f_n(x_n).$$
This is easily seen to be a well-defined function. Functions that are ultralimits of standard functions will be called internal functions. We use the term external function to denote a function between external sets that is not necessarily an internal function. We will use boldface symbols such as $\mathbf{f}$ to refer to internal functions, distinguishing them in particular from functions f that take values in the standard complex numbers $\mathbb{C}$ rather than the nonstandard complex numbers ${}^*\mathbb{C}$.
Using the ultralimit construction, any ultraproduct $X = \prod_{n\to\alpha} X_n$ of structures $X_n$ for some first-order language $L$ will remain a structure of that language $L$; furthermore, thanks to the well-known theorem of Łoś, any first-order sentence will hold in $X$ if and only if it holds in $X_n$ for an α-large set of $n$. For instance, if $G_n$ is an additive group for an α-large set of $n$, then $\prod_{n\to\alpha} G_n$ will also be an additive group.
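As a small illustration of this transfer principle, commutativity of addition in the ultraproduct can be checked coordinatewise. The sketch below assumes that addition on $\prod_{n\to\alpha} G_n$ is the ultralimit of the additions on the $G_n$; the elements $x_n, y_n \in G_n$ are introduced only for this example.

```latex
% Illustration only: x_n, y_n are arbitrary elements of G_n.
\begin{align*}
\lim_{n\to\alpha} x_n + \lim_{n\to\alpha} y_n
  &= \lim_{n\to\alpha} (x_n + y_n) \\
  &= \lim_{n\to\alpha} (y_n + x_n)
     && \text{(the two tuples agree on an $\alpha$-large set of $n$)} \\
  &= \lim_{n\to\alpha} y_n + \lim_{n\to\alpha} x_n .
\end{align*}
```

The other group axioms transfer in the same way, since each is a first-order sentence that holds in $G_n$ for an α-large set of $n$.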
A crucial property of internal sets for us will be the following compactness-like property.

Theorem 5.1 (Countable saturation).
(i) Let $(X^{(i)})_{i \in \mathbb{N}}$ be a countable sequence of internal sets. If the finite intersections $\bigcap_{i=1}^N X^{(i)}$ are non-empty for every (standard) natural number $N$, then $\bigcap_{i=1}^\infty X^{(i)}$ is also non-empty.
(ii) Let X be an internal set. Then any countable cover of X by internal sets has a finite subcover.
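Proof. It suffices to prove (i), as the claim (ii) follows from taking complements and contrapositives. Write $X^{(i)} = \prod_{n\to\alpha} X^{(i)}_n$; then for each standard $N$ there is an α-large set $A_N$ such that $\bigcap_{i=1}^N X^{(i)}_n$ is non-empty for all $n \in A_N$. By shrinking the $A_N$ as necessary, we may assume that the $A_N$ are nonincreasing in $N$. If we then choose $x_n$, for any $n$ in $A_N \setminus A_{N+1}$, to lie in $\bigcap_{i=1}^N X^{(i)}_n$ for all $N$, the ultralimit $\lim_{n\to\alpha} x_n$ lies in $\bigcap_{i=1}^\infty X^{(i)}$, giving the claim.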
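A simple instance of Theorem 5.1(i) may help fix ideas. The sketch below assumes, as is usual for this construction, that α is a non-principal ultrafilter on $\mathbb{N}$ (so that cofinite sets are α-large); the sets $X^{(i)}$ are chosen purely for illustration.

```latex
% Illustration only; assumes \alpha is a non-principal ultrafilter on \mathbb{N}.
\[
  X^{(i)} := \{ n \in {}^*\mathbb{N} : n \ge i \}
           = \prod_{n\to\alpha} \{ m \in \mathbb{N} : m \ge i \},
  \qquad i \in \mathbb{N}.
\]
% The sets are nested, so each finite intersection is X^{(N)}, which contains N:
\[
  \bigcap_{i=1}^{N} X^{(i)} = X^{(N)} \neq \emptyset .
\]
% Countable saturation gives an element lying in every X^{(i)} simultaneously:
\[
  \bigcap_{i=1}^{\infty} X^{(i)} \neq \emptyset ,
  \qquad\text{for instance}\quad
  \lim_{n\to\alpha} n \in \bigcap_{i=1}^{\infty} X^{(i)} .
\]
```

Any element of this intersection is a nonstandard natural number exceeding every standard natural number.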
A nonstandard complex number $z \in {}^*\mathbb{C}$ is said to be bounded if one has $|z| \le C$ for some standard $C$, and infinitesimal if $|z| \le \varepsilon$ for all standard $\varepsilon > 0$. We write $z = O(1)$ when $z$ is bounded and $z = o(1)$ when $z$ is infinitesimal. By modifying the proof of the Bolzano-Weierstrass theorem, we see that every bounded $z$ can be uniquely written as $z = \operatorname{st} z + o(1)$ for some standard complex number $\operatorname{st} z$, known as the standard part of $z$.
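To make the standard part concrete, here is a short worked example. It assumes that α is a non-principal ultrafilter on $\mathbb{N}$, so that every cofinite set is α-large; the particular sequences are hypothetical and chosen only for illustration.

```latex
% Assumption: \alpha is non-principal, so cofinite sets of n are \alpha-large.
\[
  z := \lim_{n\to\alpha}\Bigl(1+\frac{1}{n}\Bigr), \qquad
  |z-1| = \lim_{n\to\alpha}\frac{1}{n} \le \varepsilon
  \quad\text{for every standard } \varepsilon>0,
\]
% because the set \{ n : 1/n \le \varepsilon \} is cofinite. Hence
\[
  z = 1 + o(1), \qquad \operatorname{st} z = 1 .
\]
% By contrast, w := \lim_{n\to\alpha} n satisfies |w| > C for every standard C,
% so w is not bounded and the decomposition above does not apply to it.
```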
Countable saturation has the following important consequence:
Corollary 5.2 (Overspill/underspill). Let A be an internal subset of * C.
(i) If A contains all standard natural numbers, then A also contains an unbounded natural number.
(ii) If all elements of A are bounded, then A is contained in {z ∈ * C : |z| ≤ C} for some standard C > 0.
(iii) If all elements of A are infinitesimal, then A is contained in {z ∈ * C : |z| ≤ ε} for some infinitesimal ε > 0.
(iv) If A contains all positive standard reals, then A also contains a positive infinitesimal real.
Proof. If (i) failed, then we would have N = A ∩ * N, and hence N would be internal, which contradicts Theorem 5.1(ii). The claim (ii) follows from the contrapositive of (i) applied to the internal set {n ∈ * N : |z| ≥ n for some z ∈ A}. The claim (iii) similarly follows from (i) applied to the internal set {n ∈ * N : |z| ≤ 1/n for all z ∈ A}. Finally, the claim (iv) follows from (i) applied to the internal set {n ∈ * N : 1/n ∈ A}.
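The deduction of (ii) from (i) is brief, so it may be worth unpacking. The following is a sketch of that step only; the names $B$, $C_z$ and $n_0$ are introduced here for the illustration.

```latex
% Expanded version of the argument for Corollary 5.2(ii).
\[
  B := \{ n \in {}^*\mathbb{N} : |z| \ge n \text{ for some } z \in A \}.
\]
% If every z in A is bounded, then B contains no unbounded natural number:
\[
  n \in B \;\Longrightarrow\;
  n \le |z| \le C_z \text{ for some } z \in A \text{ and some standard } C_z .
\]
% By the contrapositive of (i), B cannot contain all standard natural numbers,
% so some standard n_0 lies outside B, which says exactly that
\[
  |z| < n_0 \quad\text{for all } z \in A .
\]
```

The point is that boundedness of each individual element, with a constant depending on the element, upgrades to a uniform standard bound because $A$ is internal.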
An internal function f : X → * C is said to be bounded if it is bounded at every point, or equivalently (thanks to overspill or countable saturation) if there is a standard C such that | f (x)| ≤ C for all x ∈ X, and we denote this assertion by f = O(1). Similarly, an internal function f : X → * C is said to be infinitesimal if it is infinitesimal at every point, or equivalently (thanks to underspill or countable saturation) if there is an infinitesimal ε > 0 such that | f (x)| ≤ ε for all x ∈ X, and we denote this assertion by f = o(1).
Let $X_n = (X_n, \mathcal{B}_n, \mu_n)$ be a sequence of standard probability spaces, indexed by an α-large set $A$. In the ultraproduct $X := \prod_{n\to\alpha} X_n$, we have a Boolean algebra $\mathcal{B}$ of internally measurable sets, that is to say internal sets of the form $E = \prod_{n\to\alpha} E_n$, where $E_n \in \mathcal{B}_n$ for an α-large set of $n$. Similarly, we have a complex *-algebra $\mathcal{A}[X]$ of bounded internally measurable functions, that is to say functions $\mathbf{f} : X \to {}^*\mathbb{C}$ that are ultralimits of measurable functions $f_n : X_n \to \mathbb{C}$, and which are bounded. We have a finitely additive nonstandard probability measure ${}^*\mu := \lim_{n\to\alpha} \mu_n : \mathcal{B} \to {}^*[0,1]$, and a finitely additive nonstandard integral
$$\int_X \mathbf{f}\, d{}^*\mu := \lim_{n\to\alpha} \int_{X_n} f_n\, d\mu_n$$
defined for bounded internally measurable functions $\mathbf{f} = \lim_{n\to\alpha} f_n \in \mathcal{A}[X]$. The standard part $\mu := \operatorname{st} {}^*\mu$ of the nonstandard measure ${}^*\mu$ is then an (external) finitely additive probability measure on $\mathcal{B}$. From Theorem 5.1(ii), this finitely additive measure is automatically a premeasure, and so by the Carathéodory extension theorem it may be extended to a countably additive measure $\mu : \mathcal{L} \to [0,1]$ on the σ-algebra $\mathcal{L}$ generated by $\mathcal{B}$. The space $X := (X, \mathcal{L}, \mu)$ is then known as the Loeb probability space associated to the standard probability spaces $X_n$. By construction, every Loeb-measurable set $E \in \mathcal{L}$ can be approximated up to arbitrarily small error in $\mu$ by an internally measurable set. As a corollary, for any (standard) $1 \le p < \infty$, any function in $L^p(X)$ can be approximated to arbitrary accuracy in $L^p(X)$ norm by the standard part of a bounded internally measurable function, that is to say $\operatorname{st} \mathcal{A}[X]$ is dense in $L^p(X)$. Indeed, we can say a little more:
Lemma 5.3. Let $f \in L^\infty(X)$. Then there exists $\mathbf{f} \in \mathcal{A}[X]$ such that $f = \operatorname{st} \mathbf{f}$ almost everywhere.
Proof. We may normalise $f$ to lie in the closed unit ball of $L^\infty(X)$. By density, we may find a sequence $\mathbf{f}_n \in \mathcal{A}[X]$ for $n \in \mathbb{N}$, bounded in magnitude by 1, such that $\|f - \operatorname{st} \mathbf{f}_n\|_{L^1(X)} < 1/n$ for all $n$. In particular, $\|\mathbf{f}_N - \mathbf{f}_n\|_{{}^*L^1(X)} \le 2/n$ for all $n \le N$. By countable saturation, there thus exists an internally measurable function $\mathbf{f}$ bounded in magnitude by 1 such that $\|\mathbf{f} - \mathbf{f}_n\|_{{}^*L^1(X)} \le 2/n$ for all $n$. Taking limits we see that $\|f - \operatorname{st} \mathbf{f}\|_{L^1(X)} = 0$, and the claim follows.
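By working first with simple functions and then taking limits, we easily establish the identity
$$\int_X \operatorname{st} \mathbf{f}\, d\mu = \operatorname{st} \int_X \mathbf{f}\, d{}^*\mu$$
for any $\mathbf{f} \in \mathcal{A}[X]$.
If $H = \prod_{n\to\alpha} H_n$ is an internally finite non-empty multiset, and $h \mapsto \mathbf{f}_h$ is an internal map from $H$ to $\mathcal{A}[X]$, then the internal function $h \mapsto \|\mathbf{f}_h\|_{{}^*L^\infty(X)}$ is bounded, and hence (by Corollary 5.2) its image is bounded above by some standard $C$, so that $\{\operatorname{st} \mathbf{f}_h : h \in H\}$ is bounded in $L^\infty(X)$. We can also define the internal average $\mathbf{E}_{h \in H} \mathbf{f}_h$ as
$$\mathbf{E}_{h \in H} \mathbf{f}_h := \lim_{n\to\alpha} \mathbf{E}_{h_n \in H_n} f_{h_n,n}$$
where $h \mapsto \mathbf{f}_h$ is the ultralimit of the maps $h_n \mapsto f_{h_n,n}$. We have a basic fact about the location of this average:
Lemma 5.4. With $H$ and $h \mapsto \mathbf{f}_h$ as above, the function $\operatorname{st} \mathbf{E}_{h \in H} \mathbf{f}_h$ lies in the closed convex hull (in $L^2(X)$) of $\{\operatorname{st} \mathbf{f}_h : h \in H\}$.
Proof. If this were not the case, then by the Hahn-Banach theorem there would exist $\varepsilon > 0$ and $g \in L^2(X)$ such that $\operatorname{Re} \langle \operatorname{st} \mathbf{E}_{h' \in H} \mathbf{f}_{h'}, g \rangle_{L^2(X)} > \operatorname{Re} \langle \operatorname{st} \mathbf{f}_h, g \rangle_{L^2(X)} + \varepsilon$ for all $h \in H$. By truncation (and the boundedness of $\{\operatorname{st} \mathbf{f}_h : h \in H\}$) we may assume (after shrinking $\varepsilon$ slightly) that $g \in L^\infty(X)$, and then by Lemma 5.3 we may write $g = \operatorname{st} \mathbf{g}$ for some $\mathbf{g} \in \mathcal{A}[X]$. But then we have
$$\operatorname{Re} \int_X (\mathbf{E}_{h' \in H} \mathbf{f}_{h'}) \overline{\mathbf{g}}\, d{}^*\mu \ge \operatorname{Re} \int_X \mathbf{f}_h \overline{\mathbf{g}}\, d{}^*\mu + \varepsilon/2$$
for all $h \in H$, which on taking internal averages in $h$ implies
$$\operatorname{Re} \int_X (\mathbf{E}_{h' \in H} \mathbf{f}_{h'}) \overline{\mathbf{g}}\, d{}^*\mu \ge \operatorname{Re} \int_X (\mathbf{E}_{h \in H} \mathbf{f}_h) \overline{\mathbf{g}}\, d{}^*\mu + \varepsilon/2,$$
which is absurd.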
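The identity relating the Loeb integral of a standard part to the standard part of the internal integral can be checked by hand in the simplest case. The following sketch treats an indicator of an internally measurable set $E$, and then a finite combination with bounded nonstandard coefficients $c_j$ (the names $E_j$, $c_j$, $k$ are introduced here for illustration, with $k$ a standard natural number and the $E_j$ disjoint).

```latex
% Indicator of an internally measurable set E: st 1_E = 1_E, and \mu = st {}^*\mu on such sets.
\[
  \int_X \operatorname{st} 1_E \, d\mu
  = \mu(E)
  = \operatorname{st} {}^*\mu(E)
  = \operatorname{st} \int_X 1_E \, d{}^*\mu .
\]
% Finite combinations: st is additive and multiplicative on bounded numbers.
\[
  \int_X \operatorname{st}\Bigl(\sum_{j=1}^{k} c_j 1_{E_j}\Bigr) d\mu
  = \sum_{j=1}^{k} (\operatorname{st} c_j)\, \mu(E_j)
  = \operatorname{st} \sum_{j=1}^{k} c_j \, {}^*\mu(E_j)
  = \operatorname{st} \int_X \sum_{j=1}^{k} c_j 1_{E_j} \, d{}^*\mu .
\]
```

The general case then follows by the limiting argument indicated in the text.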

Translation to nonstandard setting
We now translate Theorems 1.11 and 1.13 to a nonstandard setting. Let $G = \prod_{n\to\alpha} G_n$ be a nonstandard additive group, that is to say an ultraproduct of standard additive groups $G_n$. Define an internal coset progression $Q$ in $G$ to be an ultraproduct $Q = \prod_{n\to\alpha} Q_n$ of standard coset progressions $Q_n$ in $G_n$. We will be interested in internal coset progressions of bounded rank, which is equivalent to $Q_n$ having bounded rank on an α-large set of $n$. Given an internal coset progression $Q$, we define the (external) set $o(Q)$ as
$$o(Q) := \bigcap_{\varepsilon \in \mathbb{R}^+} (\varepsilon Q),$$
that is to say the set of all $x \in G$ such that $x \in \varepsilon Q$ for all standard $\varepsilon > 0$. Here we are interpreting $\varepsilon Q$ and $o(Q)$ as sets rather than multisets.
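For orientation, here is what $o(Q)$ looks like in the simplest rank one case. This is a sketch under assumptions not spelled out in this section: we take $G_n = \mathbb{Z}$, $Q_n = \{-N_n, \dots, N_n\}$ for some hypothetical standard natural numbers $N_n$, and we assume the dilation convention $\varepsilon Q_n = \{x \in \mathbb{Z} : |x| \le \varepsilon N_n\}$.

```latex
% Assumptions: G_n = \mathbb{Z}, Q_n = \{-N_n,\dots,N_n\},
% and \varepsilon Q_n = \{ x : |x| \le \varepsilon N_n \}. Put N := \lim_{n\to\alpha} N_n.
\[
  \varepsilon Q = \{ x \in {}^*\mathbb{Z} : |x| \le \varepsilon N \},
  \qquad
  o(Q) = \bigcap_{\varepsilon \in \mathbb{R}^+} \varepsilon Q
       = \{ x \in {}^*\mathbb{Z} : x/N = o(1) \}.
\]
% If N is unbounded, then every standard integer lies in o(Q), and so does
% \lfloor \sqrt{N} \rfloor, since \sqrt{N}/N = 1/\sqrt{N} is infinitesimal;
% but \lfloor N/2 \rfloor does not, since its ratio to N is not infinitesimal.
```

In this model case $o(Q)$ consists of the elements that are negligible at the scale of $Q$, which matches the notation.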
Given a sequence $(X_n, T_n)$ of standard $G_n$-systems, we can form a $G$-system $(X, T)$ by setting $X$ to be the Loeb probability space associated to the $X_n$, and $T^g$ to be the ultralimit of the $T_n^{g_n}$ for any $g = \lim_{n\to\alpha} g_n$. It is easy to verify that this is a $G$-system; we will refer to such systems as Loeb $G$-systems.
Let d be a standard positive integer, and let Q = ∏ n→α Q n be an internal coset progression. If f = lim n→α f n ∈ A[X] is a bounded internally measurable function, we can define the internal Gowers norm

From the Hölder and triangle inequalities one has
for each n, and hence on taking ultralimits and then on taking standard parts .
This can be rewritten in turn as The claim (39) now follows from the Cauchy-Schwarz-Gowers inequality (18). (X) on L 2 d (X), and hence on L ∞ (X).
We can now state the nonstandard theorem that Theorem 1.11 will be derived from.
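Theorem 5.6 (Concatenation theorem for anti-uniformity norms, nonstandard version). Let $Q_1, Q_2$ be internal coset progressions of bounded rank in a nonstandard additive group $G$, let $(X, T)$ be a Loeb $G$-system, let $d_1, d_2$ be standard positive integers, and let $f$ lie in the closed unit ball of $L^\infty(X)$. We make the following hypotheses:
(i) $f$ is orthogonal (in $L^2(X)$) to every $g \in L^\infty(X)$ with $\|g\|_{U^{d_1}_{o(Q_1)}(X)} = 0$.
(ii) $f$ is orthogonal (in $L^2(X)$) to every $g \in L^\infty(X)$ with $\|g\|_{U^{d_2}_{o(Q_2)}(X)} = 0$.
Then $f$ is orthogonal (in $L^2(X)$) to every $g \in L^\infty(X)$ with $\|g\|_{U^{d_1+d_2-1}_{o(Q_1+Q_2)}(X)} = 0$.
Let us assume this theorem for the moment, and show how it implies Theorem 1.11. It suffices to show that for any $d_1, d_2, r_1, r_2, c_1, c_2$ as in Theorem 1.11, and any $\delta > 0$, there exists an $\varepsilon > 0$ such that
$$\|f\|_{U^{d_1+d_2-1}_{Q_1+Q_2}(X)^{*,\varepsilon}} \le \delta$$
whenever $f$ obeys the hypotheses of that theorem. Suppose this were not the case. Then there exist $d_1, d_2, r_1, r_2, c_1, c_2, \delta$ as above, together with a sequence $G_n$ of standard additive groups, sequences $Q_{1,n}, Q_{2,n}$ of standard coset progressions in $G_n$ of ranks $r_1, r_2$ respectively, a sequence $(X_n, T_n)$ of $G_n$-systems, and a sequence $f_n$ of functions in the closed unit ball of $L^\infty(X_n)$ such that
$$\|f_n\|_{U^{d_1}_{Q_{1,n}}(X_n)^{*,\varepsilon}} \le c_1(\varepsilon) \quad\text{and}\quad \|f_n\|_{U^{d_2}_{Q_{2,n}}(X_n)^{*,\varepsilon}} \le c_2(\varepsilon) \tag{40}$$
for all (standard) $\varepsilon > 0$, but such that
$$\|f_n\|_{U^{d_1+d_2-1}_{Q_{1,n}+Q_{2,n}}(X_n)^{*,\varepsilon_n}} > \delta \tag{41}$$
for some $\varepsilon_n > 0$ that goes to zero as $n \to \infty$.
Now we take ultraproducts, obtaining a nonstandard additive group $G$, internal coset progressions $Q_1, Q_2$ of bounded rank in $G$, a Loeb $G$-system $(X, T)$, and a bounded internally measurable function $\mathbf{f} := \lim_{n\to\alpha} f_n$. Since the $f_n$ lie in the closed unit ball of $L^\infty(X_n)$, we see that $\operatorname{st} \mathbf{f}$ lies in the closed unit ball of $L^\infty(X)$.
Now suppose that $g \in L^\infty(X)$ is such that $\|g\|_{U^{d_1}_{o(Q_1)}(X)} = 0$; we claim that $\operatorname{st} \mathbf{f}$ is orthogonal to $g$. We may normalise $g$ to lie in the closed unit ball of $L^\infty(X)$. By Lemma 5.5 one has $\|g\|_{U^{d_1}_{\varepsilon Q_1}(X)} = 0$ for all standard $\varepsilon > 0$. We can write $g$ as the limit in $L^{2^{d_1}}(X)$ of $\operatorname{st} \mathbf{g}^{(i)}$ as $i \to \infty$, for some $\mathbf{g}^{(i)} \in \mathcal{A}[X]$ that are bounded in magnitude by 1. Since the $L^{2^{d_1}}(X)$ norm controls the $U^{d_1}_{\varepsilon Q_1}(X)$ semi-norm, we have for any standard $\delta > 0$ that
$$\|\operatorname{st} \mathbf{g}^{(i)}\|_{U^{d_1}_{\varepsilon Q_1}(X)} \le \delta$$
for sufficiently large $i$ and all $\varepsilon > 0$. In particular, writing $\mathbf{g}^{(i)} = \lim_{n\to\alpha} g^{(i)}_n$ and setting $\varepsilon = 2\delta$, we have for sufficiently large $i$ that
$$\|g^{(i)}_n\|_{U^{d_1}_{2\delta Q_{1,n}}(X_n)} \le 2\delta$$
for an α-large set of $n$. By (40) we conclude that
$$|\langle f_n, g^{(i)}_n \rangle_{L^2(X_n)}| \le c_1(2\delta)$$
and thus on taking ultralimits and then standard parts
$$|\langle \operatorname{st} \mathbf{f}, \operatorname{st} \mathbf{g}^{(i)} \rangle_{L^2(X)}| \le c_1(2\delta)$$
and hence on sending $i$ to infinity
$$|\langle \operatorname{st} \mathbf{f}, g \rangle_{L^2(X)}| \le c_1(2\delta).$$
Sending $\delta$ to zero, we obtain the claim. For similar reasons, $\operatorname{st} \mathbf{f}$ is orthogonal to any $g \in L^\infty(X)$ with $\|g\|_{U^{d_2}_{o(Q_2)}(X)} = 0$. Applying Theorem 5.6, we conclude that $\operatorname{st} \mathbf{f}$ is orthogonal to any $g \in L^\infty(X)$ with $\|g\|_{U^{d_1+d_2-1}_{o(Q_1+Q_2)}(X)} = 0$. On the other hand, from (41) and (9), we can find a $g_n$ in the closed unit ball of $L^\infty(X_n)$ for each $n$ such that
$$\|g_n\|_{U^{d_1+d_2-1}_{\varepsilon_n Q_{1,n} + \varepsilon_n Q_{2,n}}(X_n)} \le \varepsilon_n \quad\text{and}\quad |\langle f_n, g_n \rangle_{L^2(X_n)}| \ge \delta.$$
Setting $g := \operatorname{st} \lim_{n\to\alpha} g_n$, we conclude on taking ultralimits that $g$ is in the closed unit ball of $L^\infty(X)$ with
$$\|g\|_{U^{d_1+d_2-1}_{\varepsilon Q_1 + \varepsilon Q_2}(X)} = 0$$
for some infinitesimal $\varepsilon > 0$ and
$$|\langle \operatorname{st} \mathbf{f}, g \rangle_{L^2(X)}| \ge \delta.$$
But from Lemma 5.5 we have $\|g\|_{U^{d_1+d_2-1}_{o(Q_1+Q_2)}(X)} = 0$, and we contradict the previously established orthogonality properties of $\operatorname{st} \mathbf{f}$. This concludes the derivation of Theorem 1.11 from Theorem 5.6.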
An identical argument shows that Theorem 1.13 is a consequence of the following nonstandard version.

Nonstandard dual functions
We now develop some nonstandard analogues of the machinery in Section 2, and use this machinery to prove Theorem 5.6 and Theorem 5.7 (and hence Theorem 1.11 and Theorem 1.13 respectively).
Let (X, T ) be a Loeb G-system for some nonstandard abelian group G, and let Q 1 , . . . , Q d be internal coset progressions of bounded rank in G for some standard d ≥ 0. Given bounded inter- where ω = (ω 1 , . . . , ω d ), |ω| := ω 1 + · · · + ω d , and C : f → f is the complex conjugation operator. (When d = 0, we adopt the convention that * D 0 () = 1.) This is a multilinear map from A[X] 2 d −1 to A[X], and from the definition of the internal box norm * d Q 1 ,...,Q d (X) we have the identity for any f ∈ A[X]. From Hölder's inequality and the triangle inequality we see that By a limiting argument and multilinearity, we may thus uniquely define a bounded multilinear dual operator D d Q 1 ,...,Q d : for all f ω ∈ L 2 d (X), and such that An important fact about dual functions, analogous to Theorem 2.4, is that the dual operator maps factors to characteristic subfactors for the associated Gowers norm: Theorem 6.1 (Dual functions and characteristic factors). Let (Y, T ) be a factor of (X, T ), and let Q 1 , . . . , Q d be internal coset progressions of bounded rank. Let V be the linear span of the space  We now prove this theorem. From (i) and (ii) it is clear that Z <d o(Q 1 ),...,o(Q d ) (Y) is unique; the difficulty is to establish existence. We first observe Lemma 6.2. We have V ⊂ L ∞ (Y).

Proof. It suffices by linearity to show that
But such functions lie in a bounded subset of L ∞ (Y), and the claim follows.
Next, we observe that $V$ is almost closed under multiplication (cf. Proposition 2.3):
Lemma 6.3. If $f, f' \in V$, then $f f'$ is the limit (in $L^2(X)$, or equivalently in $L^{2^d/(2^d-1)}(X)$) of a sequence in $V$ that is uniformly bounded in $L^\infty$ norm.
Proof. By linearity and density, as well as Lemma 5.3 and (43), we may assume that , which we may take to be real-valued. We can thus write Let ε * > 0 be infinitesimal. We can shift εQ i − εQ i or ε Q i − ε Q i by an element of ε * Q i − ε * Q i while only affecting the above average by o (1). We conclude that Performing the k 1 , . . . , k d average first, we conclude that This bound holds for all infinitesimal ε * > 0. By overspill, we conclude that for any standard δ > 0 we have for all sufficiently small standard ε * > 0. By Lemma 5.4, the second term inside the norm is the limit in L 2 of a bounded sequence in V , and the claim follows.
Let W denote the space of all functions in L ∞ (X) which are the limit (in L 2 d /(2 d −1) ) of a sequence in V that is uniformly bounded in L ∞ norm. From the previous two lemmas we see that W is a subspace of L ∞ (Y) that is closed under multiplication, and which is also closed with respect to limits in L 2 d /(2 d −1) of sequences uniformly bounded in L ∞ . If we let Z denote the set of all measurable sets E in Y such that 1 E lies in W , we then see that Z is a σ -algebra. From the translation invariance identity we see that V , and hence W and Z, are invariant with respect to shifts T n , n ∈ G. If we set Z In particular, f is not orthogonal to V , and hence to L ∞ (Z <d o(Q 1 ),...,o(Q d ) (Y)). Conversely, suppose that By the (internal) Cauchy-Schwarz-Gowers inequality (18), we thus have We conclude that f is orthogonal to V , and hence to L ∞ (Z <d o(Q 1 ),...,o(Q d ) (Y)). This concludes the proof of Theorem 6.1.
We record a basic consequence of Theorem 6.1 (cf. Corollary 2.5 or [30, Proposition 4.6]): Corollary 6.4 (Localisation). Let X be a Loeb G-system for some nonstandard additive group G, let Y be a factor of X, and let Q 1 , . . . , Q d be internal coset progressions of bounded rank in G for some standard positive integer d. Then Proof Remark 6.5. Following the informal sketch in Section 4, functions that are measurable with respect to Z <d o(Q 1 ),...,o(Q d ) (X) should be viewed as being "structured" along the directions Q 1 , . . . , Q d . The factor Y represents some additional structure, perhaps along some directions unrelated to Q 1 , . . . , Q d . Corollary 6.4 then guarantees that this additional structure is preserved when one performs such operations as orthogonal projection to L 2 (Z <d o(Q 1 ),...,o(Q d ) (X)).
For the purpose of concatenation theorems, the following special case of Corollary 6.4 is crucial (cf. Corollary 1.20): Corollary 6.6. Let (X, T ) be a Loeb G-system for some nonstandard additive group G, and let Q 1,1 , . . . , Q 1,d 1 , Q 2,1 , . . . , Q 2,d 2 be internal coset progressions of bounded rank. Then Proof. The first claim is immediate from Corollary 6.4. For the second claim, note from the first claim and Theorem 6.1 that f 1 has vanishing d 2 o(Q 2,1 ),...,o(Q 2,d 2 ) (X) norm, and the claim follows from a second application of Theorem 6.1.
We now briefly discuss the proof of Theorem 5.6, which is proven very similarly to Theorem 5.7. With d 1 , d 2 , G, Q 1 , Q 2 , X, T as in that theorem, our task is to show that f , g are orthogonal whenever The base case d 1 = d 2 = 1 is treated as before, so assume inductively that d 1 + d 2 > 2 and the claim has already been proven for smaller values of d 1 + d 2 . As before, we reduce to the case where f is expressible in terms of f ω , f ω,ω , f, f ω , f ω,ω as in the preceding argument (with Q 1,i = Q 1 and Q 2, j = Q 2 ), and again arrive at the representation (49). Repeating the previous analysis of stc n,h , but now using the inductive hypothesis for Theorem 5.6 rather than Theorem 5.7 (and not using the monotonicity of Gowers box norms), we see that stc n,h is measurable with respect to Z <d 1 +d 2 −2 (o(Q 1 +Q 2 ),...,o(Q 1 +Q 2 ) for any n, h ∈ G; similarly for stc n ,h and stc n 1 ,n 1 ,h,h c n 2 ,n 2 ,h,h . Repeating the previous arguments (substituting Gowers box norms by Gowers uniformity norms as appropriate), we conclude Theorem 5.6.

Proof of qualitative Bessel inequality
We now prove Theorem 1.23. The proof of Theorem 1.24 is very similar and is left to the interested reader. Our arguments will be parallel to those used to prove Corollary 1.22, but using a nonstandard limit rather than an ergodic limit.
It will be convenient to reduce to a variant of this theorem in which the index set I has bounded size. More precisely, we will derive Theorem 1.23 from Theorem 8.1. Let M ≥ 1, and let Q 1 , . . . , Q M be a finite sequence of coset progressions Q i , all of rank at most r, in an additive group G. Let X be a G-system, and let d be a positive integer. Let f lie in the unit ball of L ∞ (X), and suppose that for some coefficients c ξ with ∑ ξ ∈Z/NZ |c ξ | 1. By the pigeonhole principle, we may thus find ξ 1 , ξ 2 , ξ 3 ∈ Z/NZ such that E x,a,b,c∈Z/NZ ∏ ω 1 ,ω 2 ,ω 3 ∈{0,1} f i (x + ω 1 a + ω 2 b + ω 3 c) e 2πi(ξ 1 a+ξ 2 b+ξ 3 c)/N 1.
Writing $e^{2\pi i \xi_1 a/N} = e^{2\pi i \xi_1 (x+a)/N} e^{-2\pi i \xi_1 x/N}$, and similarly for the other two phases in the above expression, we thus have

Reducing the complexity
In this section we sketch how the $U^3$ norm in Proposition 1.26 can be lowered to $U^2$. The situation here is closely analogous to that of [25, Theorem 7.1], and we will assume familiarity here with the notation from that paper. That theorem was proven by combining an "arithmetic regularity lemma", which decomposes an arbitrary function into an "irrational" nilsequence plus a uniform error, with a "counting lemma" that controls the contribution of the irrational nilsequence. The part of the proof of [25, Theorem 7.1] that involves the regularity lemma goes through essentially unchanged when establishing Proposition 1.26; the only difficulty is to establish a suitable counting lemma for the irrational nilsequence, which in this context can be taken to be a nilsequence of degree ≤ 2. More precisely, we need to show using the asymptotic notation conventions from [25].