An Analytic Approach to Sparse Hypergraphs: Hypergraph Removal

The use of tools from analysis to approach problems in graph theory has become an active area of research. Usually such methods are applied to problems involving dense graphs and hypergraphs; here we give an extension of such methods to sparse but pseudorandom hypergraphs. We use this framework to give a proof of hypergraph removal for sub-hypergraphs of sparse random hypergraphs.


Introduction
In this paper we attempt to bring together two recent trends in extremal graph theory: the study of "sparse random" analogs of density theorems, and the use of methods from analysis and logic to handle complex dependencies of parameters.
To illustrate these methods, we will prove a version of the Hypergraph Removal Lemma for dense sub-hypergraphs of sparse but sufficiently pseudorandom hypergraphs. The original removal theorem was Ruzsa and Szemerédi's Triangle Removal Lemma [27], which states:
Theorem 1.1. For every $\epsilon > 0$, there is a $\delta > 0$ such that whenever $G \subseteq \binom{V}{2}$ is a graph with at most $\delta|V|^3$ triangles, there is a set $C \subseteq G$ with $|C| \le \epsilon|V|^2$ such that $G \setminus C$ contains no triangles at all.
This result was later extended to graphs other than triangles [8], and ultimately to hypergraphs [10,15,25]. All these arguments depend heavily on the celebrated Szemerédi Regularity Lemma [29], and its generalization, the hypergraph regularity lemma [15,26]. (Recently, Fox [9] has given a proof of graph removal without the use of the regularity lemma, which gives better bounds as a result.)
There has been a growing interest in analytic approaches to graph theory. Probably the most widely studied approach is the method of graph limits and graphons introduced by Lovász and coauthors [2,23,24]. Related but distinct approaches have been studied by Hrushovski, Tao, and others [1,13,16]. Analytic proofs of regularity and removal lemmas have been given using all these methods [7,30,31,34]. These techniques obtain a correspondence between a sequence of arbitrarily large finite graphs on the one hand, and some sort of infinitary structure on the other. Statements about density fit naturally in these frameworks since the normalized counting measure on large finite graphs corresponds to an ordinary measure on the infinitary structure.
In this paper, we describe a similar correspondence which applies to subhypergraphs of sparse, pseudorandom hypergraphs. In the finite setting, the natural replacement for the normalized counting measure is the counting measure normalized by the ambient hypergraph. This introduces new complications in the infinitary world: we end up with a natural measure on sets of k-tuples which is not a genuine product measure. (This perspective on the problem was suggested to us by Hrushovski.) In place of a single measure, we end up with a family of measures, and the pseudorandomness from the finitary setting is used to ensure that this family of measures obeys certain compatibility properties.
We use this method to give an analytic proof of sparse hypergraph removal. Our proof is heavily inspired by the combinatorial proof of triangle removal in sparse graphs [20,21]. Very recently, combinatorial proofs of sparse hypergraph removal [5,11,28] have also been given by several authors.
Our approach to hypergraph removal depends heavily on the use of the Gowers uniformity (semi)norms [14]. As Conlon and Gowers point out [5], such an approach cannot hope to give optimal bounds, and, relatedly, depends on a much stronger notion of pseudorandomness than strictly needed. We stick to this method both because we believe these norms are interesting in their own right, and because we believe it illustrates the analytic approach to sparse hypergraphs more clearly than an attempt to derive optimal bounds would.
In [13], Isaac Goldbring and the author proposed a general framework for handling analytic arguments of this type, which we called approximate measure logic. In this paper, there is no assumption that the reader is familiar with that particular framework, but we pass quickly over the logical preliminaries, and refer the reader to that paper for more detailed exposition.
We now give a brief outline of the paper. Sections 3 and 4 culminate in a proof of the dense hypergraph removal lemma, giving an outline of how we will prove sparse hypergraph removal. The reader interested primarily in the formal framework for handling sparse hypergraphs may wish to read Section 2 (which simply introduces notational conventions we use throughout the paper) and then skip to Section 5, where we actually introduce the framework.
Section 3 introduces the σ-algebras $\mathcal{B}_{V,I}$, which contain those sets of tuples which can be defined (approximately) using only certain restricted sets of coordinates. (For instance, the "cut-norm" used in the theory of graph limits is closely related to the norm of the projection of a function onto the simplest non-trivial example, $\mathcal{B}_{V,1}$.) Proofs of hypergraph removal are typically divided into a regularity lemma and a counting lemma; in Section 4 we define the notion of a measure having regularity (an infinitary analog of satisfying the regularity lemma) and then prove a counting lemma. We then show that the measures corresponding to dense hypergraphs have regularity, giving a proof of the ordinary (dense) hypergraph removal lemma. This forms the outline of our approach, and we spend the rest of the paper showing that certain sparse measures also have regularity.
Section 5 finally introduces our formal framework: we define the notion of a canonical family of measures, a collection of measures having certain joint properties, and show that the ultraproduct of sufficiently random finite graphs gives us such a family. Section 6 then introduces the Gowers uniformity seminorms and begins the project of showing that, under suitable conditions, a function has positive uniformity seminorm exactly when it correlates with certain σ-algebras. Finally, in Section 7 we complete the proof of this relationship for canonical families of measures, and show that
We are grateful to Ehud Hrushovski for providing the crucial motivating idea, and to Isaac Goldbring for many helpful discussions on this topic.

Notation
Throughout this paper we use a slightly unconventional notation for tuples which is particularly conducive to our arguments. When $V$ is a finite set, a Conversely, if we have specified a $V$-tuple $x_V$, we often write $x_v$ for $x_V(v)$. When $V, W$ are disjoint sets, we write $x_V \cup x_W$ for the corresponding $V \cup W$-tuple. When $I \subseteq V$ and $x_V$ is a given $V$-tuple, we write $x_I$ for the corresponding $I$-tuple: $x_I(i) = x_V(i)$ for $i \in I$. We write $0_V$ for the tuple which is constantly equal to 0. (This is the only constant tuple we will explicitly refer to.)
One of the key tools in this paper will be the use of the collection of definable sets in a model of first-order logic. We will refer to our models as $\mathfrak{M}, \mathfrak{N}$, and to the corresponding universes of these models as $M, N$ respectively. We will refer to formal variables in the language of first-order logic with the letter $w$, reserving the letters $x, y$ and so on for elements of models (for instance, when integrating over a model). We will often refer to fixed elements of a model (used as constants or parameters) with the letters $a, b, c$. In keeping with our tuple notation, we will often refer to finite sets of variables as $w_V$, $w_W$, etc.
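The tuple conventions introduced at the start of this section can be illustrated informally in code. The following Python sketch models a $V$-tuple as a dictionary from indices to elements; the helper names `restrict`, `union` and `zero_tuple` are hypothetical, chosen only for this illustration, and are not part of the formal development.

```python
# Illustrative sketch only: a V-tuple is modeled as a dict {index: element}.
# The helper names below are hypothetical, chosen for this example.

def restrict(x_V, I):
    """Return x_I, the I-tuple with x_I(i) = x_V(i) for each i in I."""
    if not set(I) <= set(x_V):
        raise ValueError("I must be a subset of the index set V")
    return {i: x_V[i] for i in I}

def union(x_V, x_W):
    """Return the (V u W)-tuple x_V u x_W; V and W must be disjoint."""
    if set(x_V) & set(x_W):
        raise ValueError("index sets must be disjoint")
    return {**x_V, **x_W}

def zero_tuple(V):
    """Return 0_V, the V-tuple that is constantly equal to 0."""
    return {v: 0 for v in V}

x_V = {1: "a", 2: "b", 3: "c"}   # a V-tuple with V = {1, 2, 3}
x_W = {4: "d"}                   # a W-tuple with W = {4}
x_I = restrict(x_V, {1, 3})      # the I-tuple for I = {1, 3}
x_VW = union(x_V, x_W)           # the (V u W)-tuple
```

Treating tuples as functions on index sets, rather than as ordered lists, is what makes restriction and disjoint union unambiguous here.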
Recall that when $\phi$ is a formula with free variables $w_V$, $\mathfrak{M}$ is a model of first-order logic, and $x_V \in M^V$, we write $\mathfrak{M} \models \phi(x_V)$ to indicate that the formula holds when we interpret each free variable $w_v$ by the element $x_v$. When the model $\mathfrak{M}$ is clear from context, we will often equate formulas with the sets they define: for instance, if $B$ is a definable set, we will also consider $B$ to be the formula defining this set, so by abuse of notation, If a set has multiple groups of parameters (say, $B \subseteq M^{W \cup V}$) we will write $B(a_W)$ for the slice $\{x_V \mid a_W \cup x_V \in B\}$ corresponding to those coordinates. We say $B$ is definable from parameters if $B = C(a_W)$ for some definable set $C$.
Similarly, when $f$ is a simple function built from sets definable from parameters, so $f = \sum_i \alpha_i \chi_{C_i}$ where each $\alpha_i$ is rational and each $C_i$ is definable from parameters, we sometimes view $f$ as being a "rational linear combination" of formulas, and refer to the union of the parameters defining all the sets $C_i$ as the parameters of $f$.

σ-Algebras
Models come equipped with certain natural σ-algebras.
Definition 3.1. Let $\mathfrak{M}$ be a model and let $V$ be a finite set of indices. We define $\mathcal{B}^0_V$ to be the Boolean algebra of subsets of $M^V$ definable from parameters.
For I ⊆ V , we define B 0 V,I to be the Boolean algebra generated by subsets of M n of the form V,I for the Boolean algebra generated by For any I ⊆ V , we write <I for the set of proper subsets of I.The principal algebras are those of the form In all cases, we drop the superscript 0 to indicate the σ-algebra generated by the algebra.
The algebras $\mathcal{B}^0_{V,I}$ are generally uncountable, and so the corresponding σ-algebras $\mathcal{B}_{V,I}$ are generally non-separable. It is possible to recover separability by allowing only formulas whose parameters come from an elementary submodel. (This causes some additional complications, since the slices of some set $A \subseteq M^2$ are no longer necessarily measurable; rather, the slices are measurable with respect to some slightly larger σ-algebra which depends on the choice of slice. These complications can be addressed by a small amount of additional model-theoretic work; this separable approach is used in [13,34].)
These σ-algebras are closely related to the Szemerédi Regularity Lemma; for instance, in [13] it is shown that the usual regularity lemma follows almost immediately from the existence of the projection of a set onto $\mathcal{B}_{\{1,2\},1}$.
Note that, while a σ-algebra is well-defined independently of the choice of a particular measure, notions like the projection onto a σ-algebra do depend on a particular choice of measure.
The simplest interesting case of these algebras is $\mathcal{B}_{\{1,2\},1}$, which is a σ-algebra on $M^2$ consisting of sets of pairs which can be "defined one coordinate at a time": $\mathcal{B}_{\{1,2\},1}$ is generated by sets of the form (The first introduction of these algebras that we know of is in [32], where Tao already notes the relationship with the Gowers uniformity norms which we will discuss in detail below. The work in this paper builds on further developments in [31,33].)
There is some flexibility in the choice of the set $\mathcal{I}$; for instance, $\mathcal{B}_{\{1,2,3\},\{\{1,2\}\}} = \mathcal{B}_{\{1,2,3\},\{\{1,2\},\{1\}\}}$ (since $\{1,2\} \in \mathcal{I}$, we may already use formulas which refer only to the coordinate 1, so adding $\{1\}$ does nothing). This leads to two canonical choices for $\mathcal{I}$: a minimal choice with only the sets of coordinates absolutely necessary, or a maximal choice which adds every set of coordinates allowed without changing the meaning. Depending on the situation, we want one or the other canonical form.
Lemma 3.2. If for every $I \in \mathcal{I}$ there is an But this is easily seen from the definition, since a formula containing only the variables $w_I$ certainly contains only the variables $w_{I'}$.
Corollary 3.3. For any $V, \mathcal{I}$, there exist $\mathcal{I}_0, \mathcal{I}_1$ such that: (1) If $I, J \in \mathcal{I}_1$ then $J \subseteq I$.
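Definition 3.4. Given $\mathcal{I}, \mathcal{J} \subseteq \mathcal{P}(V)$, we define $\mathcal{I} \wedge \mathcal{J}$ to consist of those sets $K$ such that there is an $I \in \mathcal{I}$ and a $J \in \mathcal{J}$ such that $K \subseteq I \cap J$.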
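As a small sanity check on Definition 3.4, the following Python sketch computes the meet of two finite families of index sets directly from the definition. The function names `subsets` and `meet` are hypothetical and serve only this illustration.

```python
from itertools import combinations

def subsets(s):
    """Return all subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def meet(family_i, family_j):
    """Return every K with K contained in I & J for some I, J in the two families."""
    result = set()
    for I in family_i:
        for J in family_j:
            result.update(subsets(frozenset(I) & frozenset(J)))
    return result

# Example with V = {1, 2, 3}: the families {{1, 2}} and {{2, 3}, {1}}.
m = meet([{1, 2}], [{2, 3}, {1}])
```

In this example the pairwise intersections are {2} and {1}, so the meet consists of the empty set together with {1} and {2}; in particular the result is always closed under taking subsets.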

Hypergraph Removal
In this section, we present a proof of the ordinary hypergraph removal theorem, essentially the one given in [34], which is in turn based on the arguments in [30,31]. We first state a necessary property on measures, and prove a lemma reminiscent of the hypergraph counting lemma.
Definition 4.1. Let $\nu_V$ be a probability measure on $\mathcal{B}_V$. We say $\nu_V$ has $J$-regularity for $J \subseteq V$ if: Suppose $\mathcal{I} \subseteq \mathcal{P}(V)$ and for each $I \in \mathcal{I}$, I ∩ J I. For each
When $J = V$, this is trivial.
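Theorem 4.2. Suppose $\nu_V$ has $J$-regularity for all $J \subseteq V$ with $|J| \le k$, that $k < |V|$, $\mathcal{I} \subseteq \binom{V}{k}$, and for each $I \in \mathcal{I}$ we have a set $A_I \in \mathcal{B}_{V,I}$. Further, suppose there is a $\delta > 0$ such that whenever $B_I \in \mathcal{B}^0_{V,I}$ and $\nu_V(A_I \setminus B_I) < \delta$ for all $I \in \mathcal{I}$, $\bigcap_{I \in \mathcal{I}} B_I$ is non-empty. Then $\nu_V(\bigcap_{I \in \mathcal{I}} A_I) > 0$.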
Proof. We proceed by main induction on $k$. When $k = 1$, the claim is trivial: we must have $\nu_V(A_I) > 0$ for all $I$, since otherwise we could take $B_I = \emptyset$; then $\nu_V(\bigcap_I A_I) = \prod_I \nu_V(A_I) > 0$. So we assume that $k > 1$ and that whenever $B_I \in \mathcal{B}_{V,I}$ and $\nu_V(A_I \setminus B_I) < \delta$ for all $I$, $\bigcap_{I \in \mathcal{I}} B_I$ is non-empty. Throughout this proof, the variables $I$ and $I_0$ range over $\mathcal{I}$.
Claim 1. For any $I_0$, there is an $A'_{I_0} \in \mathcal{B}_{V,<I_0}$ such that: Suppose that for each )dν V = 0, we have $\nu_V(A_{I_0} \setminus B_{I_0}) < \delta$ as well, and therefore $\bigcap_{I \in \mathcal{I}} B_I$ is non-empty. ⊣
By applying the previous claim to each $I \in \mathcal{I}$, we may assume for the rest of the proof that for each $I$, $A_I \in \mathcal{B}_{V,<I}$.
Fix some finite algebra B ⊆ B 0 V,k−1 so that for every I, such a B exists because there are finitely many I and each . By Chebyshev's inequality, the measure of this set is at most
Proof. For each I 0 , ⊣
Each A * I may be written in the form i≤r I A * I,i where A * i I ,J,I ).
For each i ∈ I [1, r I ], let D i = I J∈( I k−1 ) A * i I ,J,I . Each A * I,i I ,J is an element of B 0 V,J , so we may group the components and write D i = J∈( V k−1 ) D i,J where D i,J = I⊃J A * I,i I ,J . Suppose, for a contradiction, that ν V ( I A * I ) = 0. Then for every i ∈
Proof. Suppose x ∈ I B * I = I i≤r I J B * I,i,J . Then for each I, there is an i I ≤ r I such that x ∈ J B * I,i I ,J . Since B * I,i I ,J ⊆ A * I,i I ,J , for each I and J ⊂ I, x ∈ A * I,i I ,J . For any J, let I ⊃ J. Then ) for the particular i we have chosen. Since x ∈ A * I,i I ′ ,J for each I ′ ⊃ J, it must be that x ∈ B i,J . This holds for any J, so x ∈ J B i,J . ⊣
From our assumption, I B * I is non-empty, and therefore there is some i such that J B i,J . But this leads to a contradiction, so it must be that ν V ( I A * I ) > 0, and therefore, as we have shown,
In order to prove the hypergraph removal theorem, we would then hope to argue as follows: the failure of hypergraph removal implies the existence of a family of counterexamples of unbounded size. We could then take the ultraproduct of these counterexamples to obtain an infinite model in which $\nu_V(\bigcap_{I \in \binom{V}{k}} A_I) = 0$ for a family of sets $A_I$ corresponding to the graph we are trying to remove. By the previous theorem, we would have an arbitrarily small family of definable sets $B_I$, and we would then argue that these sets correspond to sets in the finite models whose removal causes the removal of all copies of the hypergraph. The only remaining difficulty in this argument is showing that the measure we obtain has $J$-regularity for all $J \subseteq V$.
In the remainder of this section, we carry out the proof for the dense case.
Definition 4.3. Let $\phi(w_V, w_W)$ be a formula with the displayed free variables. Then there is a corresponding function . We say $\nu$ is a definable Keisler probability measure if for every formula $\phi$ with parameters, $\nu_\phi$ is continuous with respect to the topology generated by $\mathcal{B}^0_W$.
Lemma 4.4. Suppose that for each $J \subseteq V$, $\nu_J$ is a definable Keisler probability measure on $\mathcal{B}_J$ such that for any
Proof. We have For each $a_{V \setminus J}$, the function I f I (a I\J , x I∩J ) is measurable with respect to $\mathcal{B}_{V,<J}$, so we have Since this holds for every $a_{V \setminus J}$, the claim follows by integrating over all choices of $a_{V \setminus J}$.
Definition 4.5. Let $K, A$ be k-uniform hypergraphs on vertex sets $V(K), V(A)$ respectively. π :
Theorem 4.6. For every k-uniform hypergraph $K$ and constant $\epsilon > 0$, there is a $\delta$ such that whenever $A$ is a finite k-uniform hypergraph with $d(K, A) < \delta$, there is a subset
Proof. Suppose not. Let $K, \epsilon$ be a counterexample, and since there is no such $\delta$, for each $n$ we may choose a k-uniform hypergraph ), and predicates making the normalized counting measure In particular, this means the counting measure is a uniformly definable Keisler probability measure. Let and therefore Now take an ultraproduct of the models $(M_n, A_n, \ldots)$ to obtain $\mathfrak{M} = (M, A, \ldots)$. (See [12] for the construction and, in particular, the demonstration that the measures defined by $\nu_J$, the ultraproduct of the $\nu^n_J$, extend to probability measures on $\mathcal{B}_J$.) By [17,18] (or see the next section), the decomposition in the statement of Lemma 4.4 holds in $\mathfrak{M}$, and therefore $\nu_V$ has $I$-regularity for all $I \subseteq V$. We have $\nu_V(\bigcap_{I \in K} A_I) = 0$, and therefore by the previous theorem, there are C is definable from parameters in $M$, and therefore is a formula, which is therefore satisfied by the corresponding set in almost every $(M_n, A_n, \ldots)$. Let $C_n$ be the set defined in the model $(M_n, A_n, \ldots)$ by the formula defining $C$.
Then there is some sufficiently large n such that ν contradicting the assumption.
Our goal is to obtain the same result when $A$ is not a dense hypergraph, but rather a dense subset of a sparse random graph. The main idea is that we will replace $\nu_V$ with a measure concentrating on the sparse random graph; however this will not satisfy the easy Fubini decomposition we used for the dense case, so we will need to use the randomness, plus a large amount of additional machinery, to prove that the resulting measures nonetheless have regularity.

Families of Measures
To motivate our construction, we first consider the situation in large finite graphs. Suppose we have a large finite set of vertices $G$ and a sparse random graph $\Gamma$ on $G$. There are two natural measures we might consider on subsets of $G^2$: the usual normalized counting measure and the counting measure normalized by $\Gamma$: When we consider subsets of $G^3$, we have even more choices; we could normalize with respect to all possible triangles or only those triangles entirely in $\Gamma$ or only those triangles where certain specified edges belong to $\Gamma$: Indeed, further consideration suggests that we have multiple choices for measures even on subsets of $G$: in addition to the normalized counting measure, we could fix any element $x \in G$ and define When $\Gamma$ is a k-uniform hypergraph, we have yet more possibilities.
We therefore introduce a general notation for referring to all such measures. We first describe this notation in the setting of a large finite graph, but we will primarily use it in the infinitary setting. We assume that a value for k and a k-uniform hypergraph $\Gamma$ have been fixed. We let $V$ be a finite set of indices, and we describe a family of measures on Note the significance of our notation for tuples here: $x_e$ is a k-tuple which may consist both of elements from the fixed set $x_W$ and from $x_V$. $E$ specifies which sets of k indices from $V \cup W$ are required to belong to $\Gamma$. For instance, in the case where k = 2, so $\Gamma$ is a graph, Γ We then define For instance, in the measures above, λ = µ {(1,2),(1,3)},∅ , and When $W$ and $x_W$ are clear from context, we just write $\mu^V_E$ for $\mu^V_{E,x_W}$, and call $x_W$ the background parameters of $\mu^V_E$. When integrating over $\mu^V_{E,x_W}$, we always assume the variable being integrated is $x_V$.
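For the finite motivating example at the start of this section, the two measures on subsets of $G^2$ can be computed directly. The Python sketch below assumes the natural formulas $|S|/|G|^2$ for the normalized counting measure and $|S \cap \Gamma|/|\Gamma|$ for the counting measure normalized by $\Gamma$; these formulas, the function names, and the small example graph are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def counting_measure(S, G):
    """Normalized counting measure of S, a set of pairs from G (assumed form |S| / |G|^2)."""
    return Fraction(len(S), len(G) ** 2)

def gamma_measure(S, Gamma):
    """Counting measure normalized by Gamma (assumed form |S & Gamma| / |Gamma|)."""
    if not Gamma:
        raise ValueError("Gamma must be non-empty")
    return Fraction(len(S & Gamma), len(Gamma))

# A path on four vertices, stored as a symmetric set of ordered pairs.
G = [0, 1, 2, 3]
Gamma = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)}

# S is the set of pairs whose first coordinate is the vertex 1.
S = {(x, y) for x, y in product(G, repeat=2) if x == 1}

dense_value = counting_measure(S, G)     # 4 of the 16 pairs
sparse_value = gamma_measure(S, Gamma)   # 2 of the 6 edges
```

The two values differ (1/4 against 1/3) because vertex 1 has more than the average share of edges; in a sufficiently random $\Gamma$ one would expect such discrepancies to be small for most vertices.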
A key feature of this notation is that it makes it easy to specify the Fubini-type properties that we would like these measures to satisfy.
To avoid having to endlessly specify the restriction of $E$ to the appropriate vertices, we will generally allow $E$ to have extra edges not included in the vertex set $V$; for instance, we will not distinguish between $\mu^{V_0}_{E',x_W}$ and $\mu^{V_0}_{E,x_W}$, and will usually write
In our infinitary setting, we no longer have the underlying counting measures to refer to, so we will have to define formally the properties we want a family of measures to have.
We will use the meta-variable $\mu$ for a family of probability measures (technically, a function from appropriate finite sets to probability measures), so when $\mu$ is a family of probability measures, $\mu^V_{E,x_W}$ is an actual probability measure for suitable values of $V$, $E$, $x_W$.
Definition 5.1. Let $\mathfrak{M}$ be a model. A weakly canonical family of probability measures of degree k and size d, $\mu$, consists of, for any sets $V, W$ with .
We say $\mu$ is a canonical family of probability measures if additionally When Weak canonicity merely enforces a certain amount of uniformity on these measures: the parameter $x_w$ only matters if there is an edge in $E$ connecting $w$ to $V$, and the measures don't depend on the particular choice of indices used. The Fubini condition is non-trivial, and it is ensuring this property that requires us to work only with sufficiently random sparse graphs.
We wish to work in models which have two additional features: first, the model actually includes formulas defining all of the measures in the family $\mu$. Second, the model contains extra function symbols max which pick out values maximizing certain integrals. (The construction of such languages has appeared a few times, see [16,35]; a general theory of constructions of this kind is given in [13].)
Definition 5.2. Let $L$ be a language of first-order logic containing a k-ary relation symbol $\gamma$, and let d be given. $L_{\gamma,d}$ is the smallest language containing $L$ such that:
• Whenever $\phi(w_V, w_W, w_P)$ is a formula with the displayed free variables, $W$ is a set disjoint from $V$, $E$ is a k-uniform hypergraph on $V \cup W$ with $|E| \le d$, and $q \in [0,1]$ is rational, there are formulas $m^V_{E,w_W} \le q.\phi$ and $m^V_{E,w_W} < q.\phi$ with free variables $w_W, w_P$, and a partition of $V$, $W$ and $P$ are finite sets with $V, W, P$ pairwise disjoint, $f$ is a rational linear combination of formulas with free variables $w_W, w_V$, and $\phi(w_W, w_P, w_V)$ is a formula with the displayed free variables, for each $p \in P$ there is a function symbol $\max^{E,V_0,f,\phi}_p(w_W, w_{V_0})$.
Note that the formulas $m^V_{E,w_W} \le q.\phi$ and $m^V_{E,w_W} < q.\phi$ bind the variables $w_V$. We will "abbreviate" these formulas as $m^V_{E,w_W}(\phi) \le q$ and $m^V_{E,w_W}(\phi) < q$ respectively. We will abbreviate ¬m ≤ q whenever $B$ is definable from parameters, and similarly for $m^V_{E,a_W}(B) < q$.
Suppose we have interpreted the formula $\phi$ and all the formulas defining the simple function $f$. Let $B$ be the set defined by $\phi$. For each a W ∈ Note that we consistently use $m$ to refer to the formula of first-order logic describing a measure, and $\mu$ to the actual measure corresponding to $m$. Also, note that in the interpretation of $\max^{E,V_0,f,\phi}_P(a_W, x_{V_0})$, $B$ depends on $a_W$, $x_V$, and $b_P$, while $f$ depends on only $a_W$ and $x_V$.
We need to ensure we work with sufficiently random sparse hypergraphs:
Definition 5.4. Let $\Gamma$ be a k-uniform hypergraph on a set of n vertices and let $V, W$ be disjoint sets and W such that there is some partition Let $\Gamma$ be a k-uniform hypergraph on a set of n vertices. We say $\Gamma$ is $\delta, d$-suitably random if whenever $V, W$ are disjoint sets with This definition says that "most" $x_{V_0} \in \Gamma^{V_0}_{E,a_W}$ have roughly the correct number of extensions to elements of $\Gamma^V_{E,a_W}$. This can be seen as a hypergraph generalization of the notion of uniformity used in, for instance, [19] to prove versions of the regularity lemma in sparse random graphs.
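The informal reading of this definition ("most" partial tuples have roughly the correct number of extensions) can be illustrated in the simplest graph case. The Python sketch below is a deliberately simplified analog, not the definition itself: it assumes k = 2, takes a single edge as the pattern, so that an extension of a vertex is just a neighbor, and measures the fraction of vertices whose degree deviates from the expected value by more than a factor δ. The functions `random_graph` and `atypical_fraction` and the threshold rule are assumptions made for this illustration.

```python
import random

def random_graph(n, p, seed=0):
    """Sample an Erdos-Renyi style graph on n vertices as an adjacency dict."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def atypical_fraction(adj, p, delta):
    """Fraction of vertices whose degree differs from p * (n - 1) by more than delta * p * (n - 1)."""
    n = len(adj)
    expected = p * (n - 1)
    bad = sum(1 for v in adj if abs(len(adj[v]) - expected) > delta * expected)
    return bad / n

# A star on five vertices: only the center has an atypical number of neighbors.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_fraction = atypical_fraction(star, p=0.4, delta=0.5)

# A sampled sparse graph; the fraction is expected to be small for large n.
sample_fraction = atypical_fraction(random_graph(300, 0.1, seed=1), p=0.1, delta=0.5)
```

A graph would count as "suitably random" in this toy sense when the returned fraction is below δ; the definition in the text imposes the analogous requirement for every pattern $E$ with at most d edges.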
Let L be the language consisting of two k-ary relation symbols, γ and α.
Theorem 5.5. Let $\epsilon > 0$. Suppose that for each n, $\Gamma_n$ is a $\delta_n, d$-suitably random k-uniform hypergraph where $\delta_n \to 0$, and let $A_n \subseteq \Gamma_n$ be given with Let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$ and let $\mathfrak{M}$ be the ultraproduct of the models M Γn,d n . Then $\mathfrak{M}$ is a model of $L_{\gamma,d}$ such that: (2) There is a canonical family of probability measures of degree k and size d, $\mu^V_{E,x_W}$ on the σ-algebra generated by the definable subsets of $M^V$ such that whenever $B$ is definable from parameters, a partition of $V$, $W$ and $P$ are finite sets with $V, W, P$ pairwise disjoint, $f$ is a rational linear combination of formulas with free variables $w_W, w_V$, and $\phi(w_W, w_P, w_V)$ is a formula with the displayed free variables, for every
Proof.
(1) The first part is the standard Łoś Theorem for ultraproducts.
(2) That the measure $\mu^V_{E,a_W}$ defined in this way extends to a genuine measure on $\mathcal{B}_V$ is the standard Loeb measure construction. Weak canonicity holds in each finite model, and therefore there are formulas holding in each finite model specifying that weak canonicity holds. By the Łoś Theorem, these formulas hold in $\mathfrak{M}$, and therefore the measures $\mu^V_{E,a_W}$ are weakly canonical. Note that the formulas satisfied by $m^V_{E,a_W}$ in $\mathfrak{M}$ and the actual measure $\mu^V_{E,a_W}$ almost line up: when < q then we can only be sure that $\mu^V_{E,a_W}(B) \le q$.
To see that the measures $\mu^V_{E,a_W}$ are actually canonical, it suffices to show that for each Suppose not; then for some set $B$ definable from parameters, there is a set of $x_W$ of positive measure such that this equality fails. It follows that for some rational $\delta > 0$ there is a set $X_0$ of $x_W$ of positive measure such that We need to represent the integral in this definition closely enough by a formula to let us define a set of points where this violation occurs. Consider the function (Roughly speaking, the problem is that integrals are not directly definable in our language, and there are "different ways" a function could have a given integral: say, by having a small number of points where the value is large, or a larger number of points where the value is smaller. However, we will show that there must be a set of positive measure where the functions $f_{x_W}$ not only all have nearly the same integral, but all these integrals can be finitely approximated using the same level sets. This will allow us to write down a formula defining a set of points of positive measure, and with the property that every point satisfying this formula belongs to $X_0$.)
We may partition the interval $[0,1]$ into finitely many intervals $I_i = [\delta_i, \delta_{i+1})$ of size $< \delta/8$ and with rational endpoints. Let us set .
We choose $X_1 \subseteq X_0$ of positive measure and, for each $i$, an interval $J_i = (\eta_i, \eta'_i)$ with rational end points such that $\pi_i(x_W) \in J_i$ for each Choose a rational $\sigma > 0$ very small, and let . By choosing $\sigma$ small enough, we may find a set $X_2 \subseteq X_1$ of positive measure so that for $x_W \in X_2$, each $\pi'_i(x_W) \in J_i$ as well. Now we may consider the set $\Theta$ of Note that $\Theta$ is definable from parameters and $X_2 \subseteq \Theta$. Consider any $x_W \in \Theta$, not necessarily in $X_2$. Since each On the other hand, since each (since we chose $\sigma$ small enough). So when $x_W \in \Theta$, we have , and therefore Let $\psi$ be the conjunction of this formula with the formula defining $\Theta$.
Then we have $\mathfrak{M} \models \psi(x_W)$ whenever $x_W \in X_2$, and therefore $\mathfrak{M} \models m^W_E(\psi) > \zeta$ for some $\zeta > 0$. We also have that whenever $\mathfrak{M} \models \psi(x_W)$, it is actually true that µ Since the formula $m^W_E(\psi) > \zeta$ holds in the ultraproduct, it also holds in infinitely many finite models. But any finite model where this holds fails to be $\zeta, d$-suitably random. This contradicts the assumption that the finite models are $\delta_n, d$-suitably random with $\delta_n \to 0$.
(3) The third requirement follows immediately from the Łoś Theorem: the formula m holds in every finite model, and therefore in $\mathfrak{M}$ as well, and therefore µ
(4) Fortunately, the integral in this statement does not cause as much difficulty, since we do not need to deal with it uniformly in param- > ε for some ε, there is a formula holding of the parameters $a_W, x_{V_0}, b_P$ which is a conjunction of components of the form or negations of such components, and which implies that the integral is $\ge \epsilon$. But then this formula holds in $\mathcal{U}$-almost every finite model, which means that we must have f χ B(a W ,x V 0 ,max E,V 0 ,f,ϕ almost every finite model (where $a_P$, etc., refer to the corresponding parameters in those finite models). But then this formula also holds in $\mathfrak{M}$, so

Uniformity Seminorms
6.1. Seminorms for Principal Algebras. In this section we define a family of seminorms, the Gowers uniformity seminorms, corresponding to the σ-algebras defined above. Fix disjoint sets $V, P$ and a k-uniform hypergraph $E \subseteq \binom{V \cup P}{k}$; let $m = |E \cap \binom{P}{k}|$ and let $\mu$ be a canonical family of measures of degree k and size $\sum_{I \in E} 2^{|I \cap V|}$. Let $a_P$ be such that the measure $\mu^V_{E,a_P}$, and the measures we generate from it below, satisfy the appropriate Fubini properties. (We will only work with a finite family of measures, so the set of such $a_P$ has $\mu^P_E$-measure 1.) To avoid repeating the background parameters $a_P$ over and over, we will write $\mu^V_E$ as an abbreviation for $\mu^V_{E,a_P}$.
Definition 6.1. For each $I \subseteq V$, we define µ where $E_{V+I}$ is given as follows: for each $J \in E$ and each $\omega : J \cap I \to \{0,1\}$, there is an edge is the measure obtained by making a second copy of the indices in {(1,2)} , the measure whose underlying graph has two points connected by an edge, $\mu^{V+V}_E$ is a measure on 4-tuples, where the four coordinates are $\{(1,0),(1,1),(2,0),(2,1)\}$ and there is an edge between each pair $(1,b), (2,b')$.
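The 4-tuple example just described has a direct finite analog. The Python sketch below assumes the usual box-norm form of the uniformity seminorm in the graph case: it averages $f(x_0,y_0) f(x_0,y_1) f(x_1,y_0) f(x_1,y_1)$ over those 4-tuples all four of whose pairs $(x_b, y_{b'})$ lie in $\Gamma$. The function name `box_average`, the treatment of $\Gamma$ as a set of ordered pairs, and the identification of this average with the fourth power of the seminorm are assumptions for illustration.

```python
from itertools import product

def box_average(f, Gamma, G):
    """Average the product of f over the four corners, restricted to 4-tuples with all corners in Gamma."""
    total = 0.0
    count = 0
    for x0, x1, y0, y1 in product(G, repeat=4):
        corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
        if all(c in Gamma for c in corners):
            count += 1
            term = 1.0
            for x, y in corners:
                term *= f(x, y)
            total += term
    if count == 0:
        raise ValueError("Gamma contains no admissible 4-tuples")
    return total / count

G = [0, 1, 2]
dense = set(product(G, repeat=2))             # the dense case: every pair is an edge

diagonal = lambda x, y: 1.0 if x == y else 0.0
signs = lambda x, y: (-1.0) ** (x + y)        # a product of one-coordinate functions

diag_value = box_average(diagonal, dense, G)  # only the 3 constant 4-tuples contribute
sign_value = box_average(signs, dense, G)     # every term equals 1
```

The second example reflects the theme of this section: a function built "one coordinate at a time" has the largest possible value of this average, while the diagonal indicator, which is not of that form, has a much smaller one.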
Note that where the variables being integrated over are exactly the ones displayed. If $\omega : I \to \{0,1\}$, we write $x^\omega_I$ for the tuple . Note that we chose the size of our family of measures to be $\sum_{I \in E} 2^{|I \cap V|}$ because this is precisely the size needed to ensure Fubini properties for µ We have to check that these are well-defined. We actually prove the following stronger lemma, which will be useful later.
Proof. It suffices to show the claim in the case when Expanding the product gives a sum of $2^{2^n}$ terms of the form where each $S_\omega$ is either χ B or χ B . We will show that each of these terms is non-negative. Since is one of these terms, both inequalities follow.
Note that $\chi_{S_\omega}(x^\omega_V)$ depends only on $x^\omega_I$. In particular, if there are any In particular, one of these two values must be 0, so the whole product is 0.
So we may restrict to the case where $S_\omega$ depends only on $\omega \upharpoonright I$. Let $v$ be the unique element in $V \setminus I$ and let $E' = E \upharpoonright \binom{I}{k}$. Then we have the decomposition The second equality holds because the graph $E_{V+V}$ used to define the measure $\mu^{V+V}_E$ does not contain any edges containing both $(v,0)$ and $(v,1)$. So we have Since the inside of the integral is always non-negative, this term is non-negative.
In particular, since is defined. Next we want a Cauchy-Schwarz style inequality for these seminorms:
Proof. Fix some $v \in V$, and let $I = V \setminus \{v\}$. Note that we have the decomposition As above, the second equality holds because the graph in $\mu^{V+V}_E$ does not contain any edges containing both $(v,0)$ and $(v,1)$. For $\omega \in \{0,1\}^I$ and $b \in \{0,1\}$, let us write $\omega b$ for the element of $\{0,1\}^V$ given by (ωb Therefore, using Cauchy-Schwarz, we have: In particular, applying this repeatedly to each coordinate in $V$, we have for a suitable $m$. In particular, this bound is precisely as desired.
In particular, the work above gives: > 0 then we may find, for each By repeatedly applying Lemma 6.3, once to each $I$, we have We will obtain the converse, which will show that and in particular will enable us to show that $\mu$ has $J$-regularity.
6.2. Characterization in Product Measures.
Definition 6.8. We say $\mu^V_E$ is a product measure if no element of $E$ contains more than one element of $V$.
(Recall that $\mu^V_E$ abbreviates $\mu^V_{E,a_P}$, so there may be edges in $E$ connecting elements of $V$ to elements of $P$.) We call such measures product measures because they are extensions of the ordinary product measure $\prod_{v \in V} \mu^v_E$. We are not concerned with the converse in the case where $P \neq \emptyset$, so we state it only when $E \subseteq \binom{V}{k}$.
Theorem 6.9. If $\mu^V_E$ is a product measure, and
Proof. This is essentially identical to the argument we gave for regularity for ordinary measures. Suppose This last equality holds because $\mu^V_E$ is a product measure, and so the inner copy of $\mu^V_E$ does not depend on the choice of $x^1_V$.
Observe that, for every particular value of
6.3. Seminorms for Non-Principal Algebras. We will need a more general family of seminorms corresponding to arbitrary algebras of the form $\mathcal{B}_{V,\mathcal{I}}$.
Definition 6.10. For $J \subseteq V$, define . We need to generalize to norms $U^{V,\mathcal{J}}$ where $\mathcal{J}$ is a set. A natural choice would be to take the product of $U^{V,J}$ over all $J \in \mathcal{J}$, but this is not a seminorm. Instead we need the following form:
Definition 6.11. Let $\mathcal{J} \subseteq \mathcal{P}(V)$ be a set such that if $J, J' \in \mathcal{J}$ are distinct then $J \not\subseteq J'$. Then we define where the infimum is taken over all sequences $f_0, \ldots, f_k$ such that $f = \sum_{i \le k} f_i$.
It is not immediately obvious that $U^{V,J}_\infty$ and $U^{V,\{J\}}_\infty$ calculate the same value, but this will follow immediately once we show that $U^{V,J}_\infty$ is a seminorm.
Lemma 6.12.
Proof. First consider the case where $\mathcal{J}$ is a singleton $\{J\}$. Again, let For the general case, first observe that, setting c .

This holds for any
Once again positive homogeneity is obvious from the definition, so we need only check that the triangle inequality holds.
We first consider the case where $\mathcal{J}$ is a singleton:

The main thing that makes the uniformity seminorms useful to us is that they easily pass across different measures:

Proof.
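In order to associate these more general seminorms with the correct algebras, we introduce the following definition:

Definition 6.16. If $\mathcal{I} \subseteq \mathcal{P}(V)$ is non-empty, we define $\mathcal{I}^\perp$ to be the set of $J \subseteq V$ such that: (1) There is no $I \in \mathcal{I}$ such that $J \subseteq I$, (2) If $J' \subsetneq J$ then there is an $I \in \mathcal{I}$ such that $J' \subseteq I$.

The operations $\cdot^\perp$ and $\cdot^-$ depend on the choice of the ambient set $V$. Note that $\mathcal{I}^\perp$ always has the property that if $J, J' \in \mathcal{I}^\perp$ are distinct then $J \not\subseteq J'$. We will show that when $\mu^V_E$ is nice enough, $\mathcal{B}_{V,\mathcal{I}}$ and $\bigcap_{J \in \mathcal{I}^\perp} \mathcal{B}_{V,J^-}$ agree up to $\mu^V_E$ measure 0.

Lemma 6.17. If there is no $J \in \mathcal{J}$ such that $J \subseteq I$ and .

Proof. It suffices to show this for $\mathcal{J}$ a singleton $\{J\}$. Write $V' = V \setminus J$. Observe that for any fixed $x_{V'}$, the function $\chi_B(x_{V'}, \cdot)$ is a $\mathcal{B}_{J,J \cap I}$-measurable function, where $J \cap I$ must be a proper subset of $J$. So we have: > 0 then we may find, for each $I \in \mathcal{I}$, a set .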
Observe that for each $I \in \mathcal{I}$ we may apply the previous lemma, so we have .

Definition 6.19. Let $\mu$ be a canonical family of measures of size $k$ and degree $\sum_{I \in E} 2^{|I \cap V|}$. For some $\mathcal{I} \subseteq \mathcal{P}(V)$, we say

Lemma 6.20. If $U^{J,\mathcal{I}}_\infty(\mu^J_E)$ is characteristic for each $\mathcal{I} \subseteq \mathcal{P}(J)$ then $\mu^V_E$ has $J$-regularity.
Proof. Let $J \subseteq V$ and $\mathcal{I} \subseteq \mathcal{P}(V)$ be given, and let $g$ and $f_I$ be as in the definition of regularity. Let The exact choice of which set of measure 1 this holds on depends on the choice of representative of $h$.)¹

¹ We note the similarity of this argument to the one in [21]. The argument there uses two equivalent characterizations of a regularity type property, DISC and PAIR, the former analogous to having 0 projection and the latter to having 0 uniformity norm; a key step is that DISC implies PAIR in the dense setting, PAIR in the dense setting implies PAIR in the sparse setting, and then PAIR in the sparse setting implies DISC.
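Including $x_{V \setminus J}$ as part of the background parameters, Theorem 6.18 implies that $\|E(h \mid \mathcal{B}_{J,\mathcal{I} \wedge J})\|_{L^2(\mu^J_{E,x_{V \setminus J}})} = 0$, and so since for almost every fixed $x_{V \setminus J}$, $\prod_I f_I$ is $\mathcal{B}_{J,\mathcal{I} \wedge J}$-measurable.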
6.4. Characterization for Non-Principal Seminorms.

The principal seminorms are the controlling case for showing that the uniformity norms are characteristic: in this subsection we show that if the principal algebra of a given size is characterized by its uniformity norm then all algebras of the same size are characterized by their uniformity norms. We only need this for the case of a product measure, but we include the general argument for completeness.
Lemma 6.21. Let $\mathcal{I}$ be given and let

Proof. By definition, we have $\mathcal{B}_{V,\mathcal{I} \wedge \{J\}} \subseteq \mathcal{B}_{V,\mathcal{I}} \cap \mathcal{B}_{V,J}$.
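For the converse, we first show that if $g$ is $\mathcal{B}_{V,J}$-measurable with $E(g \mid \mathcal{B}_{V,\mathcal{I} \wedge \{J\}}) = 0$, then also $E(g \mid \mathcal{B}_{V,\mathcal{I}}) = 0$. Let such a $g$ be given, and for each $I \in \mathcal{I}$, let $f_I$ be $\mathcal{B}_{V,I}$-measurable, so $\prod_I f_I$ is $\mathcal{B}_{V,\mathcal{I}}$-measurable. Since $g = g - E(g \mid \mathcal{B}_{V,\mathcal{I} \wedge \{J\}})$ and $\mu^V_E$ has $J$-regularity, g showing that $E(g \mid \mathcal{B}_{V,\mathcal{I}}) = 0$.

Now let a $\mathcal{B}_{V,\mathcal{I}} \cap \mathcal{B}_{V,J}$-measurable function $g$ be given. It suffices to show that $g' = g - E(g \mid \mathcal{B}_{V,\mathcal{I} \wedge \{J\}}) = 0$. But $g'$ is $\mathcal{B}_{V,\mathcal{I}} \cap \mathcal{B}_{V,J}$-measurable and satisfies $E(g' \mid \mathcal{B}_{V,\mathcal{I} \wedge \{J\}}) = 0$, and therefore satisfies $E(g' \mid \mathcal{B}_{V,\mathcal{I}}) = g' = 0$.

Lemma 6.22. For any $\mathcal{I}, \mathcal{J} \subseteq \mathcal{P}(V)$, if $\mu^V_E$ has $J$-regularity for every

Proof. The direction $\mathcal{B}_{V,\mathcal{I} \wedge \mathcal{J}} \subseteq \mathcal{B}_{V,\mathcal{I}} \cap \mathcal{B}_{V,\mathcal{J}}$ is immediate from the definition.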
For the converse, we may assume $\mathcal{J} = \{J_1, \ldots, J_n\}$ where $i \neq j$ implies $J_i \not\subseteq J_j$, and we proceed by induction on $n$. When $n = 1$ this is just the previous lemma. Suppose the claim holds for $\mathcal{J}$ and we wish to show it for $\mathcal{J} \cup \{J\}$. Note that It suffices to show that whenever $f$ is $\mathcal{B}_{V,\mathcal{I}}$-measurable then $E(f \mid \mathcal{B}_{V,\mathcal{J} \cup \{J\}})$ is $\mathcal{B}_{V,\mathcal{I} \wedge (\mathcal{J} \cup \{J\})}$-measurable. For any $f$, we have measurable, and therefore, by IH, $\mathcal{B}_{V,\mathcal{I} \wedge \mathcal{J}}$-measurable. By the previous lemma, E -measure 0 (it is easy to see that $\wedge$ is associative and commutative, so this follows by repeated application of Lemma 6.22). We need only check that $\bigwedge_{J \in \mathcal{I}^\perp} J^- = \mathcal{I}$.
If $I \in \mathcal{I}$ (or even $I \subseteq I' \in \mathcal{I}$) then for every $J \in \mathcal{I}^\perp$, we have $J \not\subseteq I$, and therefore $I \in J^-$, and therefore $I \in \bigwedge_{J \in \mathcal{I}^\perp} J^-$. Conversely, if there is no $I' \in \mathcal{I}$ such that $I \subseteq I'$ then there is a $J \subseteq I$ such that $J \in \mathcal{I}^\perp$, and therefore no $J' \in J^-$ such that $I \subseteq J'$, and therefore $I \notin \bigwedge_{J \in \mathcal{I}^\perp} J^-$.

Note that the following theorem is one of the places where we directly appeal to the definability structure of our σ-algebras. This is for a good reason: the statement would not be true if we replaced our σ-algebras with, say, simple product algebras.

Theorem 6.24. Suppose that for every $J \in \mathcal{I}^\perp$ and every $J' \subseteq J$, $U^{J'}_\infty(\mu^{J'}_E)$ is characteristic and that for µ

We proceed by main induction on $|V|$. In particular, if $V \in \mathcal{I}^\perp$ then the claim is given by the assumption, so we may assume that every element $J \in \mathcal{I}^\perp$ has $|J| < |V|$, and so by IH, each $U^{J,J^\perp}_\infty(\mu^J_E)$ is characteristic.

We start with the case where This means that for almost every $x_{V'} \in S_0$, we may choose a set (Note that here even the index set $Q_{x_{V'}}$ may depend on $x_{V'}$.) There are only countably many formulas, so we may assume that there is a single formula defining $B(x_{V'}, b^{x_{V'}}_Q)$, independently of $x_{V'} \in S_0$. However there are uncountably many choices of parameters, so we may not assume that the parameters $b^{x_{V'}}_Q$ are independent of $x_{V'}$.

We may choose an $\epsilon > 0$, an approximation of $f$ by a simple function $f'$, and a set $S_1 \subseteq S_0$ of positive measure so that for (where $a_W$ are the parameters in the definition of $f'$), In particular, for each Clearly at least one of $S^+_1$ and $S^-_1$ has measure $\ge \mu^{V'}_E(S_1)/2$; without loss of generality, we assume $S^+_1$ does. Since $f'$ is simple, we have $f' = \sum_{i \le n} \alpha_i \chi_{C_i}$. We may write a large union $D$ of sets consisting of those ) and every element of $D$ satisfies The formula defining this set has only free variables a Since we chose $f'$ to be an arbitrarily close approximation of $f$, we may assume , and so we have

For the general case, suppose $\|E(f \mid \mathcal{B}_{V,\mathcal{I}})\|_{L^2(\mu^V_E)} = 0$. By Lemma 6.20, $\mu^V_E$ has $J$-regularity for each $J \in \mathcal{I}^\perp$, so by Lemma 6.23, $\mathcal{B}_{V,\mathcal{I}} = \bigcap_{J \in \mathcal{I}^\perp} \mathcal{B}_{V,J^-}$, and so = 0, and therefore by the previous part, But this means the whole sum is 0, and therefore

We show a more general result: Let $I \subseteq V$ be given and let The main result is then the case where $I = V$ and $f_{\emptyset,\omega} = f$ for all $\omega$.
We proceed by induction on $|I|$. When $I = \emptyset$, this is trivial, so assume $|I| > 0$.
Fix some $v \in I$, and let $I' = I \setminus \{v\}$. For each $\omega \in \{0,1\}^{I'}$ and each $b \in \{0,1\}$ we will write $\omega b$ for the corresponding element of $\{0,1\}^I$. We define a function Let $W$ be the vertex set of the measure µ . (There is likely some room here for optimizing the exact degree of the canonical family needed.) Let $\mathcal{J} \subseteq \mathcal{P}(W)$ be the collection of subsets of the form for some $\omega \in \{0,1\}^{I'}$. That is, $\mathcal{J}$ consists of those sets which contain $V'$ together with exactly one copy of each coordinate from $I'$. The elements of $\mathcal{J}^\perp$ are pairs $J = \{(i,0),(i,1)\}$ for some $i \in I'$. No edge of $E'$ contains both elements of one of the pairs $\{(i,0),(i,1)\}$, so $\mu^J_{E'}$ and $\mu^J_{E',x_{W \setminus J}}$ are product measures, and in particular, $U^J_\infty(\mu^J_{E'})$ and $U^J_\infty(\mu^J_{E',x_{W \setminus J}})$ are characteristic by Theorem 6.9.
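The description of $\mathcal{J}^\perp$ above can be sanity-checked on a small instance. The sketch below is an illustration under stated assumptions, not part of the argument: it reads Definition 6.16 as saying that $\mathcal{I}^\perp$ consists of the minimal subsets of the ambient set that are not contained in any member of $\mathcal{I}$, and it uses a hypothetical ambient set $W = V' \cup (I' \times \{0,1\})$ with $V' = \{a\}$ and $I' = \{1, 2\}$. The names `perp`, `V_prime`, and `I_prime` are invented for this example.

```python
from itertools import combinations, product


def perp(family, ground):
    """Minimal subsets of `ground` not contained in any member of `family`.

    This is an assumed reading of Definition 6.16, used for illustration.
    """
    family = [frozenset(member) for member in family]
    ground = list(ground)
    uncovered = [
        frozenset(c)
        for r in range(len(ground) + 1)
        for c in combinations(ground, r)
        if not any(frozenset(c) <= member for member in family)
    ]
    # Keep only the uncovered sets that have no uncovered proper subset.
    return {s for s in uncovered if not any(t < s for t in uncovered)}


# Hypothetical toy instance: V' = {"a"}, I' = {1, 2}.
V_prime = {"a"}
I_prime = [1, 2]
W = V_prime | {(i, b) for i in I_prime for b in (0, 1)}

# One member per omega in {0,1}^{I'}: V' plus exactly one copy of each coordinate.
family = [
    V_prime | {(i, bit) for i, bit in zip(I_prime, omega)}
    for omega in product((0, 1), repeat=len(I_prime))
]

result = perp(family, W)
expected = {frozenset({(i, 0), (i, 1)}) for i in I_prime}
print(result == expected)  # True
```

On this instance the computed family consists exactly of the two pairs $\{(1,0),(1,1)\}$ and $\{(2,0),(2,1)\}$, and its members are pairwise incomparable under inclusion. The brute-force enumeration is exponential in the size of the ambient set, so it is only suitable for toy checks of this kind.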
We claim that $G$ is $\mathcal{B}_{W,\mathcal{J}}$-measurable (with respect to the measure $\mu^W_{E'}$). Suppose $H$ is a function with $\|E(H \mid \mathcal{B}_{W,\mathcal{J}})\|_{L^2(\mu^W_{E'})} = 0$. By Theorem 6.24, = 0, and therefore for $\mu^v_E$-almost-every Since this holds for any $H$ with $\|E(H \mid \mathcal{B}_{W,\mathcal{J}})\|_{L^2(\mu^W_{E'})} = 0$, it follows that $G$ is $\mathcal{B}_{W,\mathcal{J}}$-measurable. This means that we may write in the $L^2(\mu^W_{E'})$-norm. We may assume the $g_{\omega,n,N}$ are $L^\infty(\mu^W_{E'})$ functions. Then we have some $\epsilon$ such that Choosing $N$ large enough, we may make and therefore In particular, there must be some $n$ such that Consider the functions given by, for each $\omega \in \{0,1\}^{I'}$, setting $f'_\omega = f_{\omega 0}\, g_{\omega,n,N}$. We apply IH to $I'$, and conclude that ||E(f

We can now give a sparse version of the hypergraph removal lemma:

Theorem 7.2. For every $k$-uniform hypergraph $K$ on vertices $V$ and every constant $\epsilon > 0$, there are $\delta, \zeta$ so that whenever $\Gamma$ is a $\zeta, |K|2^{2k}$-suitably random $k$-uniform hypergraph and $A_n \subseteq \Gamma_n$ with hom(K,An) < $\delta$ then there is a subset $L$ of $A$ with $|L| \le \epsilon|\Gamma|$ such that $\hom(K, A \setminus L) = 0$.
Proof. Suppose not. Let $K, \epsilon$ be a counterexample. Since there are no such $\delta, \zeta$, for each $n$ we may choose $k$-uniform hypergraphs $H_n \subseteq \Gamma_n$ with $\Gamma_n$ $1/n, |K|2^{2k}$-suitably random and hom(K,H) < $1/n$. Let $M$ be the model given by Theorem 5.5.
Let $V$ be the set of vertices of $K$. For any $J \subseteq V$, Theorem 7.1 implies that $U^J_\infty(\mu^J_K)$ is characteristic, and therefore by Lemma 6.20, $\mu^V_E$ has $J$-regularity. Therefore there must be arbitrarily large finite models where these formulas are satisfied. But this contradicts the choice of the hypergraphs $H_n, \Gamma_n$.

Conclusion
The notion of suitable randomness used in this paper is strong compared to other notions of pseudorandomness for hypergraphs that have been considered [3,4,6,22].The next step towards developing a rich analytic approach to working with sparse random hypergraphs would be a detailed investigation of the relationship between notions of pseudorandomness in the finite setting and the corresponding properties of measures in the infinitary setting.With less than suitable randomness, we would expect to lose the full Fubini property, but the notions that replace it are likely to be of interest themselves.
The approach Conlon and Gowers use to prove hypergraph regularity [5] depends, like our approach, on the use of various norms to detect the presence of certain properties.Their norms are much more narrowly tailored than the general uniformity norms.The uniformity norms are strikingly natural in the infinitary setting, lining up with canonical algebras of definable sets; it is possible that other norms also correspond to algebras which might be of independent interest.