Hypergraph Removal Lemmas via Robust Sharp Threshold Theorems

The classical sharp threshold theorem of Friedgut and Kalai (1996) asserts that any symmetric monotone function $f:\{0,1\}^{n}\to\{0,1\}$ exhibits a sharp threshold phenomenon. This means that the expectation of $f$ with respect to the biased measure $\mu_{p}$ increases rapidly from 0 to 1 as $p$ increases. In this paper we present `robust' versions of the theorem, which assert that it holds also if the function is `almost' monotone, and satisfies a much weaker notion of symmetry. Unlike the original proof of the theorem, which relies on hypercontractivity, our proof relies on a `regularity' lemma (in the spirit of Szemer\'edi's regularity lemma and its generalizations) and on the `invariance principle' of Mossel, O'Donnell, and Oleszkiewicz, which allows (under certain conditions) replacing functions on the cube $\{0,1\}^{n}$ with functions on Gaussian random variables. The hypergraph removal lemma of Gowers (2007) and independently of Nagle, R\"odl, Schacht, and Skokan (2006) says that if a $k$-uniform hypergraph on $n$ vertices contains few copies of a fixed hypergraph $H$, then it can be made $H$-free by removing few of its edges. While this settles the `hypergraph removal problem' in the case where $k$ and $H$ are fixed, the result is meaningless when $k$ is large (e.g. $k>\log\log\log n$). Using our robust version of the Friedgut-Kalai Theorem, we obtain a hypergraph removal lemma that holds for $k$ up to linear in $n$ for a large class of hypergraphs. These contain all the hypergraphs such that both their number of edges and the sizes of the intersections of pairs of their edges are upper bounded by some constant.


1.1. Problems on H-free families. For any set $V$ we use $\binom{V}{k}$ to denote the family of all subsets of $V$ of size $k$. Any $H \subseteq \binom{V}{k}$ is called a k-uniform hypergraph or a k-uniform family on the vertex set $V$, and the elements of $H$ are its edges. We write $[n]$ for the set $\{1, \ldots, n\}$.
The celebrated Mantel's Theorem [46] from 1907 says that the largest triangle-free graph $G \subseteq \binom{[n]}{2}$ is the complete bipartite graph whose parts are as equal as possible. More generally, the Turán number $\mathrm{ex}_k(n, H)$ of a $k$-uniform hypergraph $H$ is the maximal size of an $H$-free family $F \subseteq \binom{[n]}{k}$, i.e. a family containing no copy of $H$. The $k$-expansion $H^+$ of a hypergraph $H$ is the $k$-uniform hypergraph obtained by enlarging each edge of $H$ to size $k$ using new vertices, distinct for different edges. A $k$-uniform hypergraph is said to be $(h, d)$-expanded if it has at most $h$ edges and the intersection of any two of its edges has size at most $d$. Note that the k-expansion of a d-uniform hypergraph with h edges is (h, d)-expanded. Conversely, any (h, d)-expanded hypergraph can be easily seen to be the k-expansion of some hypergraph that is at most $d(h-1)$-uniform.
Many problems in extremal combinatorics can be expressed as determining $\mathrm{ex}_k(n, H^+)$ for a fixed hypergraph $H$ (see e.g. [28,34,43,26], and the survey of Mubayi and Verstraëte [50] for the case where H is a graph). The methods used for attacking such problems are varied. One of the most successful methods is the delta-system method of Erdős, Deza, and Frankl [13]. This method was applied by Frankl and Füredi [24,27,28] to solve various Turán problems for expanded hypergraphs (including the case where H is a special simplex, a sunflower, or the hypergraph that consists of two edges whose intersection has a fixed size). This allowed them to make significant progress on several longstanding open problems in extremal combinatorics.
Another notable technique is the shifting technique of Erdős, Ko, and Rado [20]. This technique was applied, e.g., in a recent breakthrough of Frankl [26]. He gave the best bound for the Erdős Matching Conjecture [18], which asks to determine $\mathrm{ex}_k(n, M_s)$, where $M_s$ is a matching of size $s$. Other methods include the Erdős-Simonovits stability method [59], and the random sampling from the shadow method of Kostochka, Mubayi, and Verstraëte (see [43,44,45]).
Recently, a new approach towards the Turán problem for expansions was initiated by Keller and the author [42] and further developed by Ellis, Keller, and the author [17].

Definition 1.4. A family $F \subseteq \binom{[n]}{k}$ is said to depend on the set of coordinates $J$ if for all sets $A, B \in \binom{[n]}{k}$ that satisfy $A \cap J = B \cap J$ we have $A \in F \iff B \in F$. A family $F$ is said to be a j-junta if it depends on a set $J$ of size at most $j$. We say that a family $F_1$ is $\epsilon$-essentially contained in $F_2$ if $|F_1 \setminus F_2| \le \epsilon \binom{n}{k}$.

The notion of a junta was introduced to extremal combinatorics by Dinur and Friedgut [14], who showed the following.

Theorem 1.5 (Dinur-Friedgut [14]). For each $r \in \mathbb{N}$, there exist $C > 0$ and $j \in \mathbb{N}$, such that any intersecting family $F \subseteq \binom{[n]}{k}$ is $C(k/n)^r$-essentially contained in an intersecting j-junta.
Note that the theorem is trivial for $k/n = \Theta(1)$, while it is meaningful once $k/n$ is sufficiently small. Inspired by [14], Keller and the author [42] extended Theorem 1.5 to show that for each $h, r$ there exist $C > 0$ and $j \in \mathbb{N}$, such that any $M_h^+$-free family $F \subseteq \binom{[n]}{k}$ is $C(k/n)^r$-essentially contained in an $M_h^+$-free j-junta, and obtained the following result for general expanded hypergraphs.
Theorem 1.6 ([42]). For each $\epsilon > 0$ and $h, d \in \mathbb{N}$, there exist $C > 0$ and $j \in \mathbb{N}$, such that the following holds. Let $C < k < n/C$, and let $H$ be a k-uniform (h, d)-expanded hypergraph. Then any H-free family $F \subseteq \binom{[n]}{k}$ is $\epsilon$-essentially contained in an H-free j-junta.

Theorem 1.6 serves as the first step in the following strategy for determining $\mathrm{ex}_k(n, H)$.
(1) Show that any H-free family is essentially contained in an H-free junta J.
(2) Find the extremal junta $J_{ex}$ that is free of H.
(3) Show that if an H-free junta has size that is close to $|J_{ex}|$, then it must be a small perturbation of $J_{ex}$.
(4) Show that any H-free small perturbation of $J_{ex}$ must have smaller size than $J_{ex}$ itself.
These four steps together suffice in order to show that the extremal H-free family is $J_{ex}$. Indeed, if F is the extremal H-free family, then Step 1 implies that F is essentially contained in an H-free junta J. The fact that F is of the extremal size implies that the size of J cannot be much smaller than the size of $J_{ex}$. Step 3 implies that F is essentially contained in $J_{ex}$, and Step 4 implies that F is actually equal to $J_{ex}$.
This strategy was successfully carried out in [42] to solve the Turán problem for various (h, d)-expanded hypergraphs, in the regime where $C < k < n/C$ for some $C = C(h, d)$. Later, [17] showed that this strategy can be carried out also for some hypergraphs in the regime where $\epsilon n < k < (\frac{1}{2} - \epsilon) n$ for an arbitrarily small constant ǫ and a sufficiently large n. Specifically, they considered the case where the forbidden hypergraph H is $I_{2,d}$, which consists of two edges that intersect in d elements.
Their basic observation was that any junta that does not contain a copy of $I_{2,d}$ must be free of $I_{2,d'}$ for every $d' < d$ as well. In other words, any two sets in an $I_{2,d}$-free junta have intersection of size at least d + 1. This essentially reduces the problem to the well-known problem of the maximal size of (d+1)-intersecting families, which was solved decades ago using the shifting technique (see Ahlswede-Khachatrian [3], Filmus [21], and Frankl [23]).
It is our belief that this strategy may be carried out for various other (h, d)-expanded hypergraphs, and that the following result we prove in this paper will serve as the first step in the solution of the Turán problem for various other hypergraphs in the regime where $\epsilon n < k \le (\frac{1}{h} - \epsilon) n$.

Theorem 1.7. For each $\epsilon > 0$ and $h, d \in \mathbb{N}$, there exist $C > 0$ and $j \in \mathbb{N}$, such that the following holds. Let $C \le k \le (\frac{1}{h} - \epsilon) n$, and let $H$ be a k-uniform (h, d)-expanded hypergraph. Then any H-free family $F \subseteq \binom{[n]}{k}$ is $\epsilon$-essentially contained in an H-free j-junta.
The special case of Theorem 1.7 where $H = M_2^+ = M_2$ was already proved recently by Friedgut and Regev [33], who built upon the work of Dinur and Friedgut [14]. Other special cases of Theorem 1.7 were proved in [17], which settles the case h = 2 of the theorem. Theorem 1.7 is actually a special case of our main Theorem 1.13 below, which also deals with families that contain few copies of H, rather than dealing only with H-free families.
Similarly to the case where $H = I_{2,d}$, it turns out to be a general phenomenon that H-free juntas are automatically free of some other hypergraphs.

Definition 1.8. Let H be a hypergraph and let v be a vertex of H. The resolution of H at v, denoted by res(H, v), is the hypergraph obtained from H by taking v out of each edge of H that contains v, and by replacing it with a new vertex that belongs only to this edge. The resolution of H at a set of vertices S, denoted by res(H, S), is the hypergraph obtained by resolving H at the vertices of S one after the other. Any hypergraph of the form res(H, S) will be called a resolution of H.

Example 1.9. Any hypergraph H is a resolution of itself, since res(H, ∅) = H. Defining the center of a hypergraph H to be the set of its vertices that belong to at least two of its edges, the k-uniform h-matching $M_h := M_h^+$ is the resolution of any k-uniform hypergraph with h edges at its center. Another simple example is the hypergraph $I_{2,d}$: its resolutions are the hypergraphs of the form $I_{2,d'}$ for $0 \le d' \le d$.

It is easy to show that any j-junta $G \subseteq \binom{[n]}{k}$ that is free of a hypergraph H with h edges is also free of every resolution of H, provided that $C < k \le (\frac{1}{h} - \epsilon) n$ and n is large enough. Hence, in order to show that a given junta J is an extremal H-free family, it would essentially be enough to show that it is the extremal family that is free of a copy of H as well as of all of its resolutions.
1.3. Removal lemma for expanded hypergraphs. While Theorem 1.7 tells us the structure of H-free families, it tells us nothing about families that are 'almost H-free', a notion that may be defined more precisely as follows.

Definition 1.10. Let δ > 0 and let H be a k-uniform hypergraph. We say that a family $F \subseteq \binom{[n]}{k}$ is δ-almost H-free if a uniformly random copy of H in $\binom{[n]}{k}$ lies within F with probability at most δ.
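Definition 1.10 lends itself to a direct Monte Carlo test. The sketch below is ours, not from the paper: a random copy of H is realized as a uniformly random injection of the vertices of H into [n], and we estimate the probability that all image edges lie in F.

```python
import random
from itertools import combinations

def random_copy(H_edges, vertices, n):
    """Place the vertices of H into [n] via a uniformly random injection
    and return the images of its edges."""
    phi = dict(zip(vertices, random.sample(range(1, n + 1), len(vertices))))
    return [frozenset(phi[v] for v in e) for e in H_edges]

def almost_H_free_estimate(F, H_edges, n, samples=10_000):
    """Estimate Pr[a random copy of H lies inside F]; F is delta-almost
    H-free precisely when this probability is at most delta."""
    vertices = sorted({v for e in H_edges for v in e})
    hits = sum(all(e in F for e in random_copy(H_edges, vertices, n))
               for _ in range(samples))
    return hits / samples

# Toy check with the 2-uniform 2-matching M_2: the complete graph contains
# every copy of M_2, while the empty family contains none.
n = 20
M2 = [frozenset({1, 2}), frozenset({3, 4})]
complete = {frozenset(e) for e in combinations(range(1, n + 1), 2)}
assert almost_H_free_estimate(complete, M2, n) == 1.0
assert almost_H_free_estimate(set(), M2, n) == 0.0
```

For a fixed H this estimator needs roughly $1/\delta$ samples to certify δ-almost H-freeness, which is exactly why the $\frac{\delta}{n^s}$-type hypotheses appearing later are much stronger statements.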
The celebrated triangle removal lemma says that for any ǫ > 0, there exists δ > 0, such that any δ-almost triangle-free graph is ǫ-essentially contained in a triangle-free graph. This was generalized by Gowers [35,36], and independently by Nagle, Rödl, Schacht, and Skokan [51,53], who showed that for each fixed k-uniform hypergraph H and each ǫ > 0, there exists δ > 0, such that if a family $F \subseteq \binom{[n]}{k}$ is δ-almost H-free, then F is ǫ-essentially contained in an H-free family. This result is known as the hypergraph removal lemma. (See the survey of Conlon and Fox [11] for a more thorough history, and for quantitative aspects of removal lemmas.) While the hypergraph removal lemma settles the case where k, H, and ǫ are fixed, it becomes quite useless for k that tends to infinity with n. Indeed, the initial dependence of δ on ǫ in the graph case k = 2 was of tower type, with the height of the tower polynomial in $\frac{1}{\epsilon}$, and this was improved by Fox [22] to a tower of height $O_H(\log \frac{1}{\epsilon})$. For k = 3 the best known bound is already far worse, and the bounds similarly worsen as k increases (see [60, Remark 2.11]).
Friedgut and Regev [33] were the first to prove a removal lemma in the case where k is linear in n. They showed that for each ǫ > 0 there exists δ > 0, such that if $F \subseteq \binom{[n]}{k}$ is a δ-almost $M_2$-free family, then F is ǫ-essentially contained in an $M_2$-free family, for k up to linear in n. Note that the Friedgut-Regev Theorem does not follow from the hypergraph removal lemma: while the hypergraph removal lemma deals with the case where k and the hypergraph H are fixed, the Friedgut-Regev Theorem deals with the case where k is linear in n. Our goal in this paper is to prove removal lemmas for other expanded hypergraphs in the regime where k is up to linear in n.
In light of Theorem 1.7, it may seem that the Friedgut-Regev Theorem can be generalized to all (h, d)-expanded hypergraphs. However, we show that the following surprising statement holds.

Theorem 1.11. For each $h, d \in \mathbb{N}$ and ǫ > 0, there exist C, δ > 0, such that the following holds. Let $C \le k \le (\frac{1}{h} - \epsilon) n$, let H be a k-uniform (h, d)-expanded hypergraph, and let $F \subseteq \binom{[n]}{k}$. Then we have the following.
(1) If the family F is δ-almost H-free, then F is ǫ-essentially contained in an $M_h$-free family.
(2) Conversely, if the family F is δ-essentially contained in an M h -free family, then F is ǫ-almost H-free.
Suppose that F is a family that we wish to test for H-freeness. One natural way to do so is to choose copies of H uniformly at random, and to check that none of them is contained in F. While we might expect that passing this test would tell us that F is close to some H-free family, Theorem 1.11 instead tells us that F is close to a family that is free of the hypergraph $M_h$. Even more surprisingly, the converse holds as well! Any family that is close to an $M_h$-free family contains few copies of H. This phenomenon becomes clearer by inspecting the following example.
Example 1.12. The star $\{A \in \binom{[n]}{n/3} : 1 \in A\}$ is o(1)-almost free of the hypergraph $I_{2,1}$, which consists of two edges that intersect in a singleton $\{i\}$. Indeed, the probability that a random copy of this hypergraph lies in the star is $\frac{1}{n}$, as it is the probability that a random injection from the vertices of $I_{2,1}$ to [n] sends the vertex i to 1. As Theorem 1.11 guarantees, the star is o(1)-essentially contained in an $M_2$-free family, as it is itself $M_2$-free. However, the star is not o(1)-essentially contained in any family free of the hypergraph $I_{2,1}$.
More generally, suppose that G is a j-junta depending on a set J and that H is an (h, d)-expanded hypergraph. Then the center of a random copy of H most likely does not intersect J. So from the 'point of view' of the junta G, a random copy of H and a random copy of $M_h$ look the same; in particular, it is easy to see that G is almost H-free if and only if it is almost $M_h$-free.

Definition. Let $F \subseteq \binom{[n]}{k}$, and let H be a hypergraph. We say that F is (H, s)-free if it is free of every resolution of H whose center is of size at most s.

While Example 1.12 shows that being o(1)-almost free of H is not sufficient for guaranteeing closeness to an H-free family, the following theorem shows that a stronger assumption is sufficient.
Theorem 1.13. For each ǫ > 0 and $h, d, s \in \mathbb{N}$, there exist C, δ > 0 and $j \in \mathbb{N}$, such that the following holds. Let $C \le k \le (\frac{1}{h} - \epsilon) n$, let H be a k-uniform (h, d)-expanded hypergraph, and let F be a $\frac{\delta}{n^s}$-almost H-free family. Then F is ǫ-essentially contained in an (H, s)-free j-junta.
Note that Theorem 1.7 is a special case of Theorem 1.13. Indeed, Theorem 1.13 implies that if H is a hypergraph whose center is of size c, then any $\frac{\delta}{n^c}$-almost H-free family is ǫ-essentially contained in an (H, c)-free family, i.e. in a family free of H and of any resolution of it. On the other hand, Theorem 1.7 yields the same conclusion under the stronger hypothesis that F is H-free.
The following proposition is a converse to Theorem 1.13. It shows that any (H, s)-free j-junta is $O(\frac{1}{n^{s+1}})$-almost H-free. So, in particular, such juntas are $\frac{\delta}{n^s}$-almost H-free, provided that n is sufficiently large as a function of δ.

Proposition 1.14. For each $h, c, j, s \in \mathbb{N}$ and ǫ > 0, there exists a constant C > 0, such that the following holds. Let H be a hypergraph with h edges whose center is of size c. Let $C \le k \le (\frac{1}{h} - \epsilon) n$, and let J be an (H, s)-free j-junta. Then J is $\frac{C}{n^{s+1}}$-almost H-free.

1.4. Sketch of proof of Theorem 1.13 for matchings. We shall now sketch the proof of Theorem 1.13 in the case where the forbidden hypergraph is $M_h$. The proof relies on the regularity method and on a novel sharp threshold result for 'almost monotone' Boolean functions that will be presented in Section 2.
Let ǫ > 0 and $h \in \mathbb{N}$ be fixed constants, and let F be a family which is not ǫ-essentially contained in any family free of the matching $M_h$. Our goal is to show that a random matching lies in F with probability Θ(1).
Note that any set J decomposes the sets in F into $2^{|J|}$ parts according to their intersection with J. Following Friedgut and Regev [33] and [17], we apply a regularity lemma which says that we may find a set J, such that in the decomposition of F induced by J, almost all of the parts are either 'random-like' or sufficiently small that we can ignore them. We may then take as our approximating junta the family G of all sets whose intersection with J corresponds to a 'random-like' part.

The fact that F is not ǫ-essentially contained in an $M_h$-free family will allow us to show that G is not $M_h$-free. This in turn will imply that there exist pairwise disjoint sets $A_1, \ldots, A_h \subseteq J$ that correspond to random-like parts $F_1, \ldots, F_h$ of F. Now note that a random matching $\{B_1, \ldots, B_h\}$ intersects the set J in the sets $A_1, \ldots, A_h$ with probability Θ(1). So the remaining task is to show that if $F_1, \ldots, F_h$ are 'random-like' parts, then a random matching $\{B_1, \ldots, B_h\}$ satisfies $B_i \in F_i$ for each i with probability Θ(1).

We will accomplish this task using an enhancement of the 'sharp threshold technology' presented by Dinur and Friedgut [14]. Let us first recall the method of [14]. We say that families $F_1, \ldots, F_h$ are cross-free of a matching if there exist no pairwise disjoint sets $A_1, \ldots, A_h$ such that $A_i \in F_i$ for each i; otherwise they cross-contain a matching.
The p-biased distribution on P([n]) is a probability distribution on sets A ⊆ [n], where each element is chosen to be in A independently with probability p. For a family G ⊆ P([n]), we write $\mu_p(G)$ for the probability that a p-biased random set lies in G. The 'sharp threshold principle' essentially says that for a random-like monotone family F, the p-biased measure of F jumps from being near 0 to being near 1 in a short interval.
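As a concrete illustration of the sharp threshold phenomenon (a toy computation of ours, not from the paper): the majority family is monotone and symmetric, with critical probability 1/2, and the interval in which its p-biased measure climbs from 0.1 to 0.9 shrinks as n grows, roughly like $n^{-1/2}$.

```python
from math import comb

def mu_p_majority(n, p, _cache={}):
    """Exact p-biased measure of the n-bit majority family (n odd):
    the probability that a Binomial(n, p) variable exceeds n/2."""
    if n not in _cache:
        _cache[n] = [comb(n, t) for t in range(n + 1)]
    c = _cache[n]
    return sum(c[t] * p**t * (1 - p)**(n - t) for t in range(n // 2 + 1, n + 1))

def window_width(n, grid=100):
    """Width of the interval over which mu_p rises from 0.1 to 0.9,
    measured on a grid of biases p."""
    ps = [t / grid for t in range(1, grid)]
    lo = max(p for p in ps if mu_p_majority(n, p) < 0.1)
    hi = min(p for p in ps if mu_p_majority(n, p) > 0.9)
    return hi - lo

widths = {n: window_width(n) for n in (11, 101, 1001)}
assert widths[11] > widths[101] > widths[1001]  # the threshold sharpens with n
```

A j-junta, by contrast, has a window width bounded below by a constant depending only on j, which is the sense in which juntas are the canonical examples of coarse thresholds.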
Roughly speaking, the analogue of the strategy of [17] for the hypergraph $M_h$ goes as follows. (1) If the families $F_1, \ldots, F_h$ are cross-free of a matching, then their up-closures $F_i^{\uparrow} := \{A : B \subseteq A$ for some $B \in F_i\}$ are also cross-free of a matching (in the sense that there are no pairwise disjoint sets $A_1, \ldots, A_h$ with $A_i \in F_i^{\uparrow}$ for each i). (2) The sharp threshold principle allows one to bound the biased measures of the up-closures. (3) Families whose biased measures are too large cannot be cross-free of a matching.

This plan fails completely when we try to show the desired statement that random-like families contain many matchings. The step which stops working is the first one. While it is true that if $\{F_i\}$ are cross-free of a matching, then their up-closures $F_i^{\uparrow}$ are cross-free of a matching, it is not true that if $\{F_i\}$ are almost cross-free of a matching (in the sense that they cross-contain few matchings), then the families $F_i^{\uparrow}$ are also almost cross-free of a matching. We resolve this issue by replacing the up-closure of F by the family
$\{A \in P([n]) : |A| \ge k$ and a random k-subset of A lies in F with probability $\Theta(1)\}$.
However, this new family is not monotone; instead, it satisfies a weaker property that may be called 'almost monotonicity'.
So to make the above plan work, we shall need to generalize the sharp threshold principle from monotone families to 'almost monotone' families. This statement is made more precise in Section 2. It is accomplished with the help of the invariance principle of Mossel, O'Donnell, and Oleszkiewicz [49].
In our view, the main contribution of this paper is that we relate sharp threshold results to hypergraph removal problems. We believe that further exploration of the relation between these two well-studied topics will improve the understanding of both.
In the following section we give a more thorough introduction to the sharp threshold principle for monotone Boolean functions, and state our sharp threshold result for almost monotone Boolean functions. (Note that Boolean functions $f : \{0,1\}^n \to \{0,1\}$ and families $F \subseteq P([n])$ can be identified.)

2. Sharp threshold theorems for almost monotone functions
We use bold letters to denote random variables, and we write [n] for the set $\{1, \ldots, n\}$. We shall use the convention that the ith coordinate of an $x \in \{0,1\}^n$ is denoted by $x_i$. The p-biased measure $\mu_p$ is the distribution on $\{0,1\}^n$, where a random element $x \sim \mu_p$ is chosen by letting its coordinates $x_i$ be independent random variables that take the value 1 with probability p. For a function $f : \{0,1\}^n \to \{0,1\}$, we write $\mu_p(f) := \Pr_{x \sim \mu_p}[f(x) = 1]$. It is easy to see that for any monotone function $f : \{0,1\}^n \to \{0,1\}$, the map $p \mapsto \mu_p(f)$ is a monotone increasing function of p. Roughly speaking, a Boolean function $f : \{0,1\}^n \to \{0,1\}$ is said to have a sharp threshold if there exists a 'short' interval [q, p], such that $\mu_q(f)$ is 'close' to 0 and $\mu_p(f)$ is 'close' to 1. Otherwise, it is said to have a coarse threshold.
A central problem in the area of analysis of Boolean functions is to determine which functions exhibit a sharp threshold (see e.g. [10,31,32,38]). We shall now make the above discussion more formal. For a non-constant monotone $f : \{0,1\}^n \to \{0,1\}$, the critical probability of f (denoted by $p_c(f)$) is the unique number in the interval (0, 1) such that $\mu_{p_c}(f) = \frac{1}{2}$. Bollobás and Thomason [7] showed that for any fixed ǫ > 0 and each monotone Boolean function f, there exists an interval [q, p] with $q, p = \Theta(p_c(f))$, such that $\mu_q(f) < \epsilon$ and $\mu_p(f) > 1 - \epsilon$. Therefore, f should be considered to have a sharp threshold if there exists an interval [q, p] of length significantly smaller than $p_c(f)$, such that $\mu_q(f) < \epsilon$ and $\mu_p(f) > 1 - \epsilon$.
A function $f : \{0,1\}^n \to \{0,1\}$ is said to be transitive symmetric if the group of all permutations $\sigma \in S_n$ such that $\forall x \in \{0,1\}^n : f(x_{\sigma(1)}, \ldots, x_{\sigma(n)}) \equiv f(x_1, \ldots, x_n)$ acts transitively on $\{1, \ldots, n\}$. The Friedgut-Kalai Theorem [32] (Theorem 2.2) says that if f is transitive symmetric and $p_c(f)$ is bounded away from 0 and 1, then f exhibits a sharp threshold. On the other hand, f need not exhibit a sharp threshold if it is no longer assumed to be transitive symmetric. Let j be a constant. A function f is said to be a j-junta if it depends on at most j coordinates. It is easy to see that any non-constant monotone j-junta exhibits an ǫ-coarse threshold for some constant $\epsilon = \epsilon(j) > 0$. A well-known corollary of the celebrated Friedgut Junta Theorem [30] is a partial converse to this statement. We shall say that f is $(\mu_r, \epsilon)$-close to g if $\mu_r(\{x : f(x) \ne g(x)\}) < \epsilon$.

Theorem 2.3 (Friedgut [30]). For each ǫ > 0, there exists $j \in \mathbb{N}$, such that the following holds. Let $f : \{0,1\}^n \to \{0,1\}$ be a monotone Boolean function, and let q, p be numbers in the interval (0, 1) that satisfy $p > q + \epsilon$. Then there exists some r in the interval [q, p], such that f is $(\mu_r, \epsilon)$-close to a j-junta.
Note that Friedgut's Junta Theorem becomes trivial if $\mu_q(f) < \epsilon$ or if $\mu_p(f) > 1 - \epsilon$, since in these cases we may take the junta to be the corresponding constant function. For that reason, Friedgut's Junta Theorem can be interpreted as saying that non-junta-like functions exhibit a sharp threshold behavior.
2.1. Structural results on monotone families. We extend the Friedgut-Kalai Theorem (Theorem 2.2) and Friedgut's Junta Theorem (Theorem 2.3) in the following directions.
• We replace the condition that f is monotone with the weaker condition that f satisfies a notion we call (q, p, δ)-almost monotonicity.
• We strengthen the Friedgut-Kalai Theorem by relaxing the condition that f is transitive symmetric to the weaker condition that f satisfies a notion called $(r, \delta, \mu_q)$-regularity.
• Bearing in mind our applications to the removal problem, we modify Theorem 2.3 by replacing the condition that f is 'close' to a junta with respect to the $\mu_r$ measure with a condition that says that f is 'close' to a junta in a sense that involves only the measures $\mu_p$ and $\mu_q$, i.e. the measures at the ends of the interval.

We shall now define the above notions more precisely, starting with (q, p, δ)-almost monotonicity.
Intuitively, a function f should be called 'almost monotone' if f (x) ≤ f (y) for almost all values of x and y that satisfy ∀i ∈ [n] : x i ≤ y i . However, there are many ways to interpret the notion 'almost all values of x and y'. For instance, the following definitions all seem to fit equally well.
• Choose x uniformly out of $\{0,1\}^n$, and then choose y uniformly among the set of all elements $y \in \{0,1\}^n$ that satisfy $\forall i : x_i \le y_i$.
• Choose y uniformly out of $\{0,1\}^n$, and then choose x uniformly among the set of all $x \in \{0,1\}^n$ that satisfy $\forall i : x_i \le y_i$.
• Choose a uniformly random pair of elements $x, y \in \{0,1\}^n$ among the pairs that satisfy $\forall i : x_i \le y_i$.

All these notions are captured by the following framework. Let $0 < q \le p < 1$, and let D(q, p) be a distribution on pairs $(x, y) \in \{0,1\}^n \times \{0,1\}^n$ with the following properties.
(1) The pairs (x i , y i ) are independent random variables.
(2) We have x i ≤ y i with probability 1.
(3) We have $\Pr[x_i = 1] = q$ and $\Pr[y_i = 1] = p$.

We write $x, y \sim D(q, p)$ to denote that they are chosen according to this distribution. We say that a function $f : \{0,1\}^n \to \{0,1\}$ is (q, p, δ)-almost monotone if $\Pr_{x, y \sim D(q,p)}[f(x) > f(y)] \le \delta$.

It will be convenient for us to define the following notion of 'closeness' between a function f and a function g that takes into consideration both the p-biased measure and the q-biased one. We say that functions $f, g : \{0,1\}^n \to \{0,1\}$ are (q, p, ǫ)-close if the set $\{x : f(x) \ne g(x)\}$ can be partitioned into the union of two sets $S_p$ and $S_q$, such that $\mu_p(S_p) < \epsilon$ and $\mu_q(S_q) < \epsilon$.
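Properties (1)-(3) determine the joint distribution of each pair $(x_i, y_i)$, and one natural way to realize D(q, p) in code (a sketch of ours) is to draw $y \sim \mu_p$ and then keep each 1-coordinate of y in x independently with probability q/p:

```python
import random

def sample_D(q, p, n, rng=random):
    """Sample (x, y) ~ D(q, p): the pairs (x_i, y_i) are independent,
    x_i <= y_i always, Pr[x_i = 1] = q and Pr[y_i = 1] = p."""
    assert 0 < q <= p < 1
    y = [1 if rng.random() < p else 0 for _ in range(n)]
    x = [yi if yi and rng.random() < q / p else 0 for yi in y]
    return x, y

# Empirical sanity check of the three defining properties.
random.seed(0)
q, p, trials = 0.2, 0.5, 20_000
pairs = [sample_D(q, p, 1) for _ in range(trials)]
assert all(x[0] <= y[0] for x, y in pairs)                   # monotone coupling
assert abs(sum(x[0] for x, _ in pairs) / trials - q) < 0.02  # x-marginal is q-biased
assert abs(sum(y[0] for _, y in pairs) / trials - p) < 0.02  # y-marginal is p-biased
```

With this sampler in hand, the (q, p, δ)-almost monotonicity of a function f is simply the statement that sampled pairs violate $f(x) \le f(y)$ with frequency at most δ.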
We give the following variant of Friedgut's Junta Theorem, which implies that if ǫ and q < p are fixed numbers in the interval (0, 1), if δ > 0 is small enough, and if $j \in \mathbb{N}$ is sufficiently large, then any (q, p, δ)-almost monotone function f is (q, p, ǫ)-close to a monotone j-junta.
Theorem 2.5. For each ǫ > 0, there exist $j \in \mathbb{N}$ and δ > 0, such that the following holds. Let p, q be numbers in the interval (ǫ, 1 − ǫ) that satisfy $p - q > \epsilon$, and let $f : \{0,1\}^n \to \{0,1\}$ be (q, p, δ)-almost monotone. Then there exists a monotone j-junta g, such that f is (q, p, ǫ)-close to g.

Note that Theorem 2.5 is really a theorem about functions that have a coarse threshold. Indeed, if either $\mu_p(f) > 1 - \epsilon$ or $\mu_q(f) < \epsilon$, then the theorem becomes trivial by taking g to be a suitable constant function.
The conclusion of Theorem 2.5 says that f can be 'approximated' by the junta g, where our approximation notion is the 'two-sided' notion of (q, p, ǫ)-closeness. It is natural to ask whether f can also be approximated by a junta according to a 'one-sided' notion, such as the notions of $(\mu_p, \epsilon)$-closeness and $(\mu_q, \epsilon)$-closeness. The following example demonstrates that the two-sided approximation is actually necessary.

The Central Limit Theorem implies that both $\mu_q(f)$ and $\mu_p(f)$ are bounded away from 0 and 1, and hence f has an ǫ-coarse threshold for some constant ǫ independent of n. On the other hand, it is easy to see that f is not $(\mu_p, p(1-p))$-close to an O(1)-junta and is not $(\mu_q, q(1-q))$-close to an O(1)-junta, provided that n is sufficiently large. However, if we take g to be the dictator function defined by $g(x) = x_1$, then f is (q, p, ǫ)-close to g, as Theorem 2.5 guarantees.
The proof of Theorem 2.5 is based on the invariance principle of Mossel, O'Donnell, and Oleszkiewicz [49] and on a recent unpublished regularity lemma of O'Donnell, Servedio, Tan, and Wan. A presentation of their proof was recently given by Jones [39].
For our next extension of the Friedgut-Kalai Theorem, we need the notion of $(r, \epsilon, \mu_p)$-regularity (see O'Donnell [52, Chapter 7] for more about this notion). Let R be a subset of [n], and let $y \in \{0,1\}^R$. We write $f_{R \to y}$ for the Boolean function on the domain $\{0,1\}^{[n] \setminus R}$ defined by $f_{R \to y}(x) := f(z)$, where z is the vector whose projection to $\{0,1\}^R$ is y and whose projection to $\{0,1\}^{[n] \setminus R}$ is x. Note that a function f is a j-junta if there exists a set J of size j, such that all the restrictions $f_{J \to x}$ are constant functions. On the other extreme, we have the following notion of regularity, which could be thought of as the complete opposite of being a junta. It says that for each set J of constant size r, the $\mu_p$ measures of f and of $f_{J \to y}$ are not far apart.
Definition. A function $f : \{0,1\}^n \to \{0,1\}$ is said to be $(r, \epsilon, \mu_p)$-regular if $|\mu_p(f_{J \to y}) - \mu_p(f)| \le \epsilon$ for each set $J \subseteq [n]$ of size at most r and each $y \in \{0,1\}^J$.
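A toy computation of ours (helper names are ours as well) illustrating the regularity notion just defined: the XOR of the first two coordinates is perfectly regular at p = 1/2 with respect to single-coordinate restrictions, but restricting both of its relevant coordinates shifts the measure completely, so the parameter r genuinely matters.

```python
from itertools import product

def restrict(f, n, J, y):
    """The restriction f_{J->y}: fix the coordinates in J to the bits of y
    and view f as a function of the remaining n - |J| coordinates."""
    fixed = dict(zip(sorted(J), y))
    def f_restricted(z):
        zs, x = iter(z), []
        for i in range(n):
            x.append(fixed[i] if i in fixed else next(zs))
        return f(tuple(x))
    return f_restricted

def mu_p(f, m, p):
    """Exact p-biased expectation of a Boolean function on {0,1}^m (small m)."""
    return sum(f(x) * p**sum(x) * (1 - p)**(m - sum(x))
               for x in product((0, 1), repeat=m))

n, p = 4, 0.5
f = lambda x: x[0] ^ x[1]          # XOR of the first two coordinates
assert mu_p(f, n, p) == 0.5
# Every single-coordinate restriction keeps the measure at 1/2,
# so f is (1, 0, mu_{1/2})-regular.
for i in range(n):
    for b in (0, 1):
        assert mu_p(restrict(f, n, [i], [b]), n - 1, p) == 0.5
# Restricting J = {0, 1} makes f constant, so f is far from being
# (2, eps, mu_{1/2})-regular for any small eps.
assert mu_p(restrict(f, n, [0, 1], [0, 1]), n - 2, p) == 1.0
```

In the paper's regime one should think of r as a large constant (of order 1/δ), so such junta-like behavior is exactly what regularity rules out.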
As we explain below, the following is a robust version of the Friedgut-Kalai Theorem.

Theorem 2.8. For each ǫ > 0, there exists δ > 0, such that the following holds. Let q, p be numbers in the interval (ǫ, 1 − ǫ) that satisfy $p - q > \epsilon$, and let $f, g : \{0,1\}^n \to \{0,1\}$ satisfy $\Pr_{x, y \sim D(q,p)}[f(x) > g(y)] \le \delta$. Suppose that f is $(\frac{1}{\delta}, \delta, \mu_q)$-regular and that $\mu_q(f) > \epsilon$. Then $\mu_p(g) > 1 - \epsilon$.
Applying Theorem 2.8 (with f = g), we see that it strengthens Theorem 2.2 in the following ways. It shows that we may replace the hypothesis that f is monotone by the weaker hypothesis that f is (q, p, δ)-almost monotone, and that we may replace the hypothesis that f is transitive symmetric with the weaker hypothesis that f is $(\frac{1}{\delta}, \delta, \mu_q)$-regular for some δ = δ(n), where $\lim_{n \to \infty} \delta(n) = 0$. Example 3.8 below shows that the latter hypothesis is indeed weaker.

Remark 2.9. While Theorem 2.8 is more general than the Friedgut-Kalai Theorem, we remark that the Friedgut-Kalai Theorem is better in the quantitative aspects, which we have not addressed. We would also like to remark that the proofs of Theorems 2.5 and 2.8 are very different from the standard proofs of Theorems 2.2 and 2.3. While the traditional proofs are based on the hypercontractivity theorem of Bonami, Gross, and Beckner [8,37,5] and on Russo's Lemma [54], our proofs are based instead on the invariance principle of Mossel, O'Donnell, and Oleszkiewicz [49].

2.2. Sketch of the proof of Theorems 2.5 and 2.8. Our proof of Theorem 2.5 is based on the regularity method. In the setting of the regularity method we are given a space S, and our goal is to show a 'removal lemma' asserting that any subset A ⊆ S that contains few copies of a given 'forbidden' configuration may be approximated by a subset that contains no copies of that configuration. The proof consists of two ingredients.
(1) A regularity lemma showing that for any set A, we may decompose S into parts, such that the intersections of A with 'almost all' of the parts are either 'quasirandom' or 'small'.
(2) A counting lemma showing that if we take the quasirandom parts, then together they contain many forbidden configurations.
These two ingredients are put together by approximating A by the set J defined to be the union of all the quasirandom parts. The task is then to use the counting lemma to show that any forbidden configuration that appears in J results in many forbidden configurations back in A.
The invariance principle of Mossel, O'Donnell, and Oleszkiewicz [49] concerns a notion of smoothness called having small noisy influences. It roughly says that we may replace the variables of a smooth function $f : \{0,1\}^n \to [0,1]$ by Gaussian random variables and obtain a function that behaves similarly. We call this function the Gaussian analogue of f. The proof of Theorem 2.5 goes through the following steps.
(1) We apply a regularity lemma presented by Jones [39], which shows that we may find a set J of constant size that decomposes f into the parts $\{f_{J \to y}\}_{y \in \{0,1\}^J}$, such that almost all of the parts either have expectation very close to 0, or have small noisy influences.
(2) We give a counting lemma which shows that if two functions $f_1, f_2$ have small noisy influences, $\mu_q(f_1)$ is bounded away from 0, and $\mu_p(f_2)$ is bounded away from 1, then $\Pr_{x, y \sim D(q,p)}[f_1(x) = 1, f_2(y) = 0]$ is bounded away from 0.

The proof of the second part follows [49]. We express the probability $\Pr_{x, y \sim D(q,p)}[f_1(x) = 1, f_2(y) = 0]$ in terms of the Fourier expansions of $f_1$ and $f_2$, and we show that this expression can be approximated by a similar expression involving the Gaussian analogues of $f_1$ and $f_2$. We then apply a classical theorem of Borell [9] to lower bound the value of the corresponding Gaussian expression.

The proof of Theorem 2.8 is similar. Suppose that f is a $(\frac{1}{\delta}, \delta, \mu_q)$-regular function with $\mu_q(f) > \epsilon$.
(1) We apply the regularity lemma of [39] to find a set J of constant size that decomposes f into the parts $\{f_{J \to y}\}_{y \in \{0,1\}^J}$, such that most of the parts either have expectation very close to 0, or have small noisy influences. The $(\frac{1}{\delta}, \delta, \mu_q)$-regularity of f implies that there are no parts with expectation close to 0, so only the latter option is available.
(2) We note that the term $\Pr_{x, y \sim D(q,p)}[f(x) = 1, g(y) = 0]$ is an average of terms of the form $\Pr[f_{J \to x}(\tilde x) = 1, g_{J \to y}(\tilde y) = 0]$; since it is small, almost all of these terms are small as well.
(3) We deduce from the above counting lemma (applied with $f_1 = f_{J \to x}$, $f_2 = g_{J \to y}$) that for such x, y, if $f_{J \to x}$ has small noisy influences, then the function $g_{J \to y}$ has expectation close to 1.
(4) It is easy to see that for 'almost all' $y \in \{0,1\}^J$ we may find $x \in \{0,1\}^J$ with $x_i \le y_i$ for each i, such that $f_{J \to x}$ has small noisy influences. So Step 3 implies that for almost all y the expectation of $g_{J \to y}$ is close to 1. Therefore, the expectation of g is close to 1.

3. Prior results and notions that we make use of
In this section we review some facts on the Fourier analysis of the p-biased cube. Many of them are standard results that can be found e.g. in O'Donnell [52, Chapters 2, 8, and 11].
3.1. Fourier analysis on the p-biased cube. Given a distribution D on a space Ω, we write x ∼ (Ω, D) or x ∼ D to denote that x is chosen out of Ω according to the distribution D. We shall use bold letters to denote random variables.
For functions $f, g : \{0,1\}^n \to \mathbb{R}$, we denote by $\langle f, g \rangle := \mathbb{E}_{x \sim \mu_p}[f(x) g(x)]$ the p-biased inner product, and the p-biased norm is defined by setting $\|f\| := \sqrt{\langle f, f \rangle}$. Any time that we write that f is an element of the space $L^2(\{0,1\}^n, \mu_p)$, we refer to this inner product and norm. The p-biased Fourier characters are an orthonormal basis of $L^2(\{0,1\}^n, \mu_p)$ defined as follows.
The Fourier character corresponding to the singleton {i} is the function $\chi^p_i(x) := \frac{x_i - p}{\sqrt{p(1-p)}}$. More generally, let S be a subset of [n]. The Fourier character corresponding to the set $S \subseteq [n]$ is the function $\chi^p_S := \prod_{i \in S} \chi^p_i$. The Fourier characters are known to be an orthonormal basis for $L^2(\{0,1\}^n, \mu_p)$. Thus, each function has a unique expansion of the form $f = \sum_{S \subseteq [n]} \hat f(S) \chi^p_S$. This expansion is called the p-biased Fourier expansion of f, or just the Fourier expansion of f when p is clear from context. We also have the following identities, known as the Parseval identities: $\langle f, g \rangle = \sum_{S \subseteq [n]} \hat f(S) \hat g(S)$, and in particular $\|f\|^2 = \sum_{S \subseteq [n]} \hat f(S)^2$.
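The orthonormality of the characters and the Parseval identities can be verified by brute force on a small cube; the following sketch is ours, for illustration only.

```python
from itertools import product, combinations
from math import sqrt, isclose

def chi(S, x, p):
    """p-biased Fourier character: chi^p_S(x) = prod_{i in S} (x_i - p)/sqrt(p(1-p))."""
    out = 1.0
    for i in S:
        out *= (x[i] - p) / sqrt(p * (1 - p))
    return out

def inner(f, g, n, p):
    """Exact p-biased inner product <f, g> = E_{x ~ mu_p}[f(x) g(x)] (small n)."""
    return sum(f(x) * g(x) * p**sum(x) * (1 - p)**(n - sum(x))
               for x in product((0, 1), repeat=n))

n, p = 3, 0.3
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]

# Orthonormality: <chi_S, chi_T> equals 1 when S = T and 0 otherwise.
for S in subsets:
    for T in subsets:
        val = inner(lambda x: chi(S, x, p), lambda x: chi(T, x, p), n, p)
        assert isclose(val, 1.0 if S == T else 0.0, abs_tol=1e-12)

# Parseval for f = AND: the squared Fourier coefficients sum to <f, f> = mu_p(f).
f = lambda x: float(all(x))
coeffs = [inner(f, lambda x, S=S: chi(S, x, p), n, p) for S in subsets]
assert isclose(sum(c * c for c in coeffs), inner(f, f, n, p), abs_tol=1e-12)
```

Note that the Fourier coefficient $\hat f(S)$ is computed here as $\langle f, \chi^p_S \rangle$, which is valid precisely because of the orthonormality checked above.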
For any $T \subseteq [n]$, the averaging operator over T has the following nice Fourier-analytic interpretation: it is the operator that annihilates all the Fourier coefficients corresponding to sets that have nonempty intersection with T. That is, if $f \in L^2(\{0,1\}^n, \mu_p)$ has the Fourier expansion $f = \sum_{S \subseteq [n]} \hat f(S) \chi^p_S$, then the averaging operator sends f to $\sum_{S : S \cap T = \emptyset} \hat f(S) \chi^p_S$.

Another notion of importance for us is the notion of influence, due to Ben-Or and Linial [6]: the influence of the ith coordinate on f is $I_i[f] := \Pr_{x \sim \mu_p}[f(x) \ne f(x \oplus e_i)]$, where $x \oplus e_i$ is obtained from x by flipping its ith coordinate.
We shall also need to introduce the noise operator.
is a probability distribution on elements y ∈ {0, 1} n , where we set each coordinate y i independently to be x i with probability ρ, and to a new p-biased element of {0, 1} with probability 1 − ρ.
The noise operator $T_{\rho,p}$ on the space $L^2(\{0,1\}^n,\mu_p)$ is the operator that associates to each $f \in L^2(\{0,1\}^n,\mu_p)$ the function $T_{\rho,p}[f](x) = \mathbb{E}_{y\sim N_{\rho,p}(x)}[f(y)]$. We have the following Fourier formula for $T_{\rho,p}[f]$: $T_{\rho,p}[f] = \sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)\chi^p_S$.
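The Fourier formula for $T_{\rho,p}$ can likewise be verified by brute force on a small cube. This sketch (illustrative only, using a toy AND function) compares the operator computed from its defining transition probabilities against $\sum_S \rho^{|S|}\hat f(S)\chi^p_S$.

```python
from itertools import product, combinations
from math import sqrt, prod

def mu_p(x, p):
    return prod(p if xi else 1 - p for xi in x)

def chi(S, x, p):
    return prod((x[i] - p) / sqrt(p * (1 - p)) for i in S)

def T_direct(f, x, rho, p, n):
    # T_{rho,p} f(x) = E over y ~ N_{rho,p}(x): each y_i equals x_i with
    # probability rho, and is a fresh p-biased bit with probability 1 - rho
    out = 0.0
    for y in product([0, 1], repeat=n):
        w = 1.0
        for xi, yi in zip(x, y):
            q1 = rho * xi + (1 - rho) * p      # P(y_i = 1 | x_i)
            w *= q1 if yi else 1 - q1
        out += w * f(y)
    return out

n, p, rho = 2, 0.4, 0.7
f = lambda x: float(x[0] and x[1])             # AND function (toy example)
pts = list(product([0, 1], repeat=n))
subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
coef = {S: sum(mu_p(x, p) * f(x) * chi(S, x, p) for x in pts) for S in subsets}
for x in pts:
    fourier = sum(rho ** len(S) * coef[S] * chi(S, x, p) for S in subsets)
    assert abs(T_direct(f, x, rho, p, n) - fourier) < 1e-9
```

The agreement is exact (up to floating point) because $\mathbb{E}[\chi^p_i(y_i)\mid x_i] = \rho\,\chi^p_i(x_i)$ coordinate-wise.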

3.2.
The directed noise operators. We shall now introduce a directed analogue of the noise operator. Recall that $D(q,p)$ is the joint distribution on pairs $(x,y)\in\{0,1\}^n\times\{0,1\}^n$ such that $x\sim\mu_q$, $y\sim\mu_p$, and $y_i\ge x_i$ for all $i$. We define an operator $T_{p\to q}\colon L^2(\{0,1\}^n,\mu_p)\to L^2(\{0,1\}^n,\mu_q)$ by $T_{p\to q}[g](x)=\mathbb{E}[g(\mathbf y)\mid \mathbf x=x]$, and its adjoint $T_{q\to p}\colon L^2(\{0,1\}^n,\mu_q)\to L^2(\{0,1\}^n,\mu_p)$ by $T_{q\to p}[f](y)=\mathbb{E}[f(\mathbf x)\mid \mathbf y=y]$, where $(\mathbf x,\mathbf y)\sim D(q,p)$. These operators were first studied by Ahlberg, Broman, Griffiths, and Morris [2], and then again by Abdullah and Venkatasubramanian [1]. The one-sided noise operators have the following Fourier formulas. Set $\rho=\sqrt{\frac{q(1-p)}{p(1-q)}}$. If $f=\sum_{S\subseteq[n]}\hat f(S)\chi^q_S$ is a function in $L^2(\{0,1\}^n,\mu_q)$, then $T_{q\to p}(f)$ has the $p$-biased Fourier expansion $T_{q\to p}(f)=\sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)\chi^p_S$. Similarly, if $g=\sum_{S\subseteq[n]}\hat g(S)\chi^p_S$ is a function in $L^2(\{0,1\}^n,\mu_p)$, then the function $T_{p\to q}(g)$ has the $q$-biased Fourier expansion $T_{p\to q}(g)=\sum_{S\subseteq[n]}\rho^{|S|}\hat g(S)\chi^q_S$.
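One standard way to realize the joint distribution $D(q,p)$ is by thinning: draw $y\sim\mu_p$, then keep each 1-coordinate of $y$ independently with probability $q/p$. The following sketch (with illustrative helper names) checks the monotonicity $x\le y$, which holds by construction, and the marginal densities empirically.

```python
import random

def sample_D(q, p, n, rng):
    # One draw from the monotone "thinning" coupling realizing D(q, p):
    # y ~ mu_p, and x keeps each 1-coordinate of y with probability q/p,
    # so P(x_i = 1) = p * (q/p) = q and x_i <= y_i for every i.
    assert 0 < q <= p < 1
    y = [1 if rng.random() < p else 0 for _ in range(n)]
    x = [yi if (yi and rng.random() < q / p) else 0 for yi in y]
    return x, y

rng = random.Random(0)
q, p, n, N = 0.3, 0.6, 8, 2000
samples = [sample_D(q, p, n, rng) for _ in range(N)]
assert all(xi <= yi for x, y in samples for xi, yi in zip(x, y))  # monotone
x_density = sum(sum(x) for x, _ in samples) / (n * N)             # ~ q
y_density = sum(sum(y) for _, y in samples) / (n * N)             # ~ p
```

The conditional expectations defining $T_{q\to p}$ and $T_{p\to q}$ are exactly averages over this coupling.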
Proof. We shall prove it for the operator $T_{q\to p}$, as the proof for the other operator $T_{p\to q}$ is similar. By linearity, it is enough to prove the lemma in the case where $f=\chi^q_S$ for some $S\subseteq[n]$. Let $y\in\{0,1\}^n$. Note that Claim 3.7 below shows that $\mathbb{E}[\chi^q_i(\mathbf x_i)\mid \mathbf y_i=y_i]=\rho\,\chi^p_i(y_i)$. By the independence of the random variables $\chi^q_i(\mathbf x_i)$ under the conditional distribution of $\mathbf x$ given $\mathbf y=y$, we obtain $T_{q\to p}[\chi^q_S](y)=\mathbb{E}\Big[\prod_{i\in S}\chi^q_i(\mathbf x_i)\,\Big|\,\mathbf y=y\Big]=\prod_{i\in S}\rho\,\chi^p_i(y_i)=\rho^{|S|}\chi^p_S(y)$, which completes the proof.
Since the functions $\chi^q_i,\chi^p_i$ depend only on the $i$th coordinate we may assume that $n=1$, and we shall write $\chi^p=\chi^p_1$ as well as $\chi^q=\chi^q_1$ for brevity. We shall start by showing (3.4); the proof of (3.5) will be similar. Let $h\in L^2(\{0,1\},\mu_p)$ be the map $h(y)=\mathbb{E}[\chi^q(\mathbf x)\mid \mathbf y=y]$. Note that the space $L^2(\{0,1\},\mu_p)$ is a linear space of dimension 2. We shall show that $h=\rho\chi^p$ by exhibiting two independent linear functionals on the space $L^2(\{0,1\},\mu_p)$ that agree on the functions $h$ and $\rho\chi^p$. Namely, the first functional is evaluation at 0, and the second functional is the expectation according to the $p$-biased distribution. Indeed, we may use the fact that elements $(\mathbf x,\mathbf y)\sim D(q,p)$ satisfy $\mathbf x\le \mathbf y$ with probability 1 to obtain that conditioning on $\mathbf y=0$ forces $\mathbf x=0$, so $h(0)=\chi^q(0)=-\sqrt{\tfrac{q}{1-q}}=\rho\cdot\Big(-\sqrt{\tfrac{p}{1-p}}\Big)=\rho\chi^p(0)$. On the other hand, $\mathbb{E}_{y\sim\mu_p}[h(y)]=\mathbb{E}[\chi^q(\mathbf x)]=0=\mathbb{E}_{\mu_p}[\rho\chi^p]$. Since the expectation functional and the evaluation-at-0 functional are independent, and since the space $L^2(\{0,1\},\mu_p)$ is of dimension 2, we obtain $h=\rho\chi^p$. This completes the proof of (3.4). We prove (3.5) in a similar fashion. Define $h\in L^2(\{0,1\},\mu_q)$ by $h(x)=\mathbb{E}[\chi^p(\mathbf y)\mid \mathbf x=x]$. Similarly to the proof of (3.4), it is enough to prove the identities $h(1)=\rho\chi^q(1)$ and $\mathbb{E}_{\mu_q}[h]=\mathbb{E}_{\mu_q}[\rho\chi^q]$. To prove the former, note that conditioning on $\mathbf x=1$ forces $\mathbf y=1$, so $h(1)=\chi^p(1)=\sqrt{\tfrac{1-p}{p}}=\rho\sqrt{\tfrac{1-q}{q}}=\rho\chi^q(1)$. To prove the latter, note that $\mathbb{E}_{\mu_q}[h]=\mathbb{E}[\chi^p(\mathbf y)]=0=\mathbb{E}_{\mu_q}[\rho\chi^q]$.

3.3. Fourier regularity.
We shall say that a function $f\in L^2(\{0,1\}^n,\mu_p)$ is $(r,\delta,\mu_p)$-Fourier regular if $|\hat f(S)|\le\delta$ for every nonempty set $S$ of size at most $r$. It is easy to see (see O'Donnell [52, Chapter 7]) that any $(r,\delta,\mu_p)$-regular function is $(r,\delta,\mu_p)$-Fourier regular, and conversely any $(r,\delta,\mu_p)$-Fourier regular function is $(r,2^r\delta,\mu_p)$-regular. So in a sense these notions are equivalent. If $S$ is a nonempty set of size at most $r$, then the fact that $f$ is transitive symmetric implies that there exist distinct $r$-subsets of $[n]$, $S_1,\ldots,S_{\lceil n/r\rceil}$, such that $\hat f(S_i)=\hat f(S)$ for each $i$. By Parseval's identity, we have $\lceil n/r\rceil\,\hat f(S)^2\le\|f\|^2\le1$. After rearranging, we obtain that $f$ is $\big(r,2^r\sqrt{r/n},\mu_p\big)$-Fourier regular, so it is in fact $(r,\delta,\mu_p)$-Fourier regular, provided that $n>4^r r/\delta^2$. 3.4. The noisy influences. Let $f\in L^2(\{0,1\}^n,\mu_p)$ be a function. The noise stability of $f$ is defined by $\mathrm{Stab}_{\rho,p}[f]=\langle T_{\rho,p}[f],f\rangle$. By Fact 3.5 and by Parseval's identity, we have $\mathrm{Stab}_{\rho,p}[f]=\sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)^2$, where the $\hat f(S)$ are the Fourier coefficients of $f$ with respect to the $p$-biased distribution. The $(\rho,\mu_p)$-noisy influences of $f$ are defined by $\mathrm{Inf}_i^{\rho}[f]=\sum_{S\ni i}\rho^{|S|-1}\hat f(S)^2$. 3.5. Regularity lemmas we use. We shall make use of the following regularity lemma, due to Jones [39]. Remark 3.11. Jones [39] proved Theorem 3.10 only for the case where $p=\frac12$. However, as with most of the results in the area, the proof extends verbatim to the $p$-biased distribution, for any $p$ bounded away from 0 and 1.
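For illustration only, the noisy influences and the noise stability can be computed directly from a toy, hand-picked Fourier spectrum, in the normalization $\mathrm{Inf}_i^{\rho}[f]=\sum_{S\ni i}\rho^{|S|-1}\hat f(S)^2$ assumed in this sketch.

```python
def noisy_influence(i, coeffs, rho):
    # (rho, mu_p)-noisy influence of coordinate i:
    # sum over sets S containing i of rho^(|S|-1) * fhat(S)^2
    return sum(rho ** (len(S) - 1) * c ** 2 for S, c in coeffs.items() if i in S)

def noise_stability(coeffs, rho):
    # Stab_rho[f] = sum_S rho^|S| * fhat(S)^2, via Parseval
    return sum(rho ** len(S) * c ** 2 for S, c in coeffs.items())

coeffs = {(): 0.5, (0,): 0.3, (0, 1): 0.2}   # toy Fourier spectrum
rho = 0.5
assert abs(noisy_influence(0, coeffs, rho) - (0.3 ** 2 + rho * 0.2 ** 2)) < 1e-12
assert abs(noise_stability(coeffs, rho)
           - (0.5 ** 2 + rho * 0.3 ** 2 + rho ** 2 * 0.2 ** 2)) < 1e-12
```

A coordinate appearing in no set of the spectrum has noisy influence 0, as expected.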
We also make use of the following regularity lemma of [16]. 3.6. Functions on Gaussian spaces. Let $\gamma$ be the standard normal probability distribution $N(0,1)$ on $\mathbb{R}$. Abusing notation, we will also use $\gamma$ to denote the product normal probability distribution $N(0,1)^n$ on $\mathbb{R}^n$. We shall denote by $L^2(\mathbb{R}^n,\gamma)$ the space of functions $f:\mathbb{R}^n\to\mathbb{R}$ such that $\|f\|_\gamma:=\sqrt{\mathbb{E}_\gamma[f^2]}<\infty$. This space is equipped with the inner product $\langle f,g\rangle=\mathbb{E}_\gamma[fg]$. The operator $T_\rho$ on the space $(\mathbb{R}^n,N(0,1)^n)$, also known as the Ornstein--Uhlenbeck operator, is defined as follows.
Definition 3.13. Let $\rho\in(0,1)$ and let $x\in\mathbb{R}^n$. The $\rho$-noisy distribution of $x$ is the distribution $N_{\rho,\gamma}(x)$, where we choose $y$ by setting each coordinate $y_i$ independently to be $\rho x_i+\sqrt{1-\rho^2}\,z_i$, where $z$ is a new independent $\gamma$-distributed element of $\mathbb{R}^n$. The noise operator $T_\rho$ on the space $L^2(\mathbb{R}^n,\gamma)$ is the operator that associates to each $f\in L^2(\mathbb{R}^n,\gamma)$ the function $T_\rho[f](x)=\mathbb{E}_{y\sim N_{\rho,\gamma}(x)}[f(y)]$. Remark 3.14. The analogy between the distributions $N_{\rho,p}$ and $N_{\rho,\gamma}$ stems from the fact that if we choose $x\sim\gamma$ and $y\sim N_{\rho,\gamma}(x)$, then we have the following properties.
• The $\mathbb{R}^2$-valued random variables $(x_i,y_i)$ are independent of each other. These properties are similarly satisfied when we choose $x\sim\mu_p$ and then choose $y\sim N_{\rho,p}(x)$.
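The correlation structure of $N_{\rho,\gamma}$ can be checked by simulation. The following sketch draws pairs $(x,\;\rho x+\sqrt{1-\rho^2}\,z)$ and verifies empirically that each coordinate pair has correlation approximately $\rho$ and standard normal marginals (sampling tolerances are illustrative).

```python
import random
from math import sqrt

def noisy_gaussian_pair(rho, rng):
    # x ~ N(0,1); y = rho*x + sqrt(1 - rho^2)*z with a fresh z ~ N(0,1),
    # so y ~ N(0,1) and Corr(x, y) = rho
    x = rng.gauss(0, 1)
    y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    return x, y

rng = random.Random(1)
rho, N = 0.6, 20000
pairs = [noisy_gaussian_pair(rho, rng) for _ in range(N)]
mx = sum(x for x, _ in pairs) / N
my = sum(y for _, y in pairs) / N
corr = sum((x - mx) * (y - my) for x, y in pairs) / N   # ~ rho
assert abs(corr - rho) < 0.05
```

This mirrors the Boolean case, where $y_i$ equals $x_i$ with probability $\rho$ and is resampled otherwise.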
We would also like to remark that we have the following Fourier formula for $T_\rho[f]$ in the case where $f$ is a multilinear polynomial: if $f(x)=\sum_S\hat f(S)\prod_{i\in S}x_i$, then $T_\rho[f](x)=\sum_S\rho^{|S|}\hat f(S)\prod_{i\in S}x_i$.

3.7.
The invariance principle. The invariance principle is a powerful theorem due to Mossel, O'Donnell, and Oleszkiewicz [49] that relates the distribution of a 'smooth' function f : {0, 1} n → R with the distribution of functions on Gaussian spaces. To state a corollary of it that we shall apply, we need to introduce some terminology.
Let $f:\mathbb{R}^n\to\mathbb{R}$ be a function. Following [15], we define the function $\mathrm{Chop}(f)$ by setting $\mathrm{Chop}(f)(x)=\min(\max(f(x),0),1)$; that is, $\mathrm{Chop}(f)$ truncates the values of $f$ into $[0,1]$. We shall also need the following definition. Given a function $f=\sum_{S\subseteq[n]}\hat f(S)\chi^p_S$ in $L^2(\{0,1\}^n,\mu_p)$, we let its Gaussian analogue be the multilinear polynomial $\tilde f\in L^2(\mathbb{R}^n,\gamma)$ defined by $\tilde f(y)=\sum_{S\subseteq[n]}\hat f(S)\prod_{i\in S}y_i$. Roughly speaking, the invariance principle says that if the function $f$ is sufficiently `smooth', then the distribution of $f(\mathbf x)$, where $\mathbf x\sim(\{0,1\}^n,\mu_p)$, is somewhat similar to the distribution of $\tilde f(\mathbf y)$, where $\mathbf y\sim(\mathbb{R}^n,\gamma)$ is a Gaussian random variable. The smoothness requirement that we need is the following. Let $\delta,\epsilon>0$; we shall say that a function $f\in L^2(\{0,1\}^n,\mu_p)$ is $(\delta,1-\epsilon)$-smooth. As a corollary of the invariance principle, one can show (see [15, Theorem 3.8])
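A minimal sketch of the two constructions above, under the stated reading that Chop truncates values into $[0,1]$ and that the Gaussian analogue is the multilinear polynomial with the same coefficients (the helper names are ours, for illustration).

```python
def chop(t):
    # Chop truncates a real value into [0, 1] (applied pointwise to f)
    return min(max(t, 0.0), 1.0)

def gaussian_analogue(coeffs):
    # Given Fourier coefficients {S: fhat(S)}, return the multilinear
    # polynomial y -> sum_S fhat(S) * prod_{i in S} y_i
    def f_tilde(y):
        out = 0.0
        for S, c in coeffs.items():
            term = c
            for i in S:
                term *= y[i]
            out += term
        return out
    return f_tilde

ft = gaussian_analogue({(): 0.5, (0,): 0.25, (0, 1): -0.5})
assert ft([1.0, 2.0]) == 0.5 + 0.25 - 1.0      # evaluates the polynomial
assert chop(ft([1.0, 2.0])) == 0.0             # value -0.25 is chopped to 0
```

On Boolean-like inputs the analogue reproduces $f$; on Gaussian inputs it may leave $[0,1]$, which is exactly why Chop is needed before applying Borell's Theorem.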

Counting lemma for the ρ-noisy influence regularity lemma
In this section we prove our version of the Majority is Stablest theorem, which serves as a counting lemma in the proof of Theorem 1.7. The proof is a straightforward adaptation of the proof of the Majority is Stablest Theorem by Mossel, O'Donnell, and Oleszkiewicz [49], and of its generalizations by Mossel [47]. The conclusion of the proposition is the inequality (4.1): $\sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)\hat g(S)<\Lambda_\rho(\mu_q(f),\mu_p(g))+\epsilon$.
We divide the proof into three parts. In each part we prove that if $f,g$ satisfy a certain requirement, then (4.1) holds. The hypothesis is strongest in the first part, weaker in the second part, and weakest in the third part. The parts are as follows.

4.1.
Proof of the proposition in the case where $f$ is $(\delta,1-\epsilon,\mu_q)$-smooth and $g$ is $(\delta,1-\epsilon,\mu_p)$-smooth. The idea of the proof is to convert the statement about $f$ and $g$ into a corresponding statement about their Gaussian analogues $\tilde f$ and $\tilde g$, and then to prove the latter statement by applying Borell's Theorem. A difficulty that arises in this approach is the fact that Borell's Theorem may be applied only to functions that take their values in the interval $[0,1]$, while the functions $\tilde f$ and $\tilde g$ may take values outside this interval. However, we overcome this difficulty by noting that Borell's theorem may be applied to the functions $\mathrm{Chop}(\tilde f)$ and $\mathrm{Chop}(\tilde g)$, and by observing that these are close to $\tilde f$ and $\tilde g$. Lemma 4.2. For each $\epsilon>0$, there exists $\delta>0$ such that the following holds. Let $q,p\in(\epsilon,1-\epsilon)$, let $\rho\in(0,1)$, let $f=\sum\hat f(S)\chi^q_S$ be a $(\delta,1-\epsilon,\mu_q)$-smooth function, and let $g=\sum\hat g(S)\chi^p_S$ be a $(\delta,1-\epsilon,\mu_p)$-smooth function. Then (4.1) holds. Proof. Let $\epsilon>0$ and suppose that $\delta=\delta(\epsilon)$ is sufficiently small. Let $\tilde f$ be the Gaussian analogue of $f$ and let $\tilde g$ be the Gaussian analogue of $g$. By Fact 3.18 we have $\sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)\hat g(S)=\langle T_\rho\tilde f,\tilde g\rangle$.
So our goal is to show that $\langle T_\rho\tilde f,\tilde g\rangle<\Lambda_\rho(\mu_q(f),\mu_p(g))+\epsilon$, provided that $\delta$ is sufficiently small. Let $\epsilon_1=\big|\Lambda_\rho(\mathbb{E}[\mathrm{Chop}(\tilde f)],\mathbb{E}[\mathrm{Chop}(\tilde g)])-\Lambda_\rho(\mu_q(f),\mu_p(g))\big|$ and let $\epsilon_2=\big|\langle T_\rho\tilde f,\tilde g\rangle-\langle T_\rho\,\mathrm{Chop}(\tilde f),\mathrm{Chop}(\tilde g)\rangle\big|$. Applying Borell's Theorem to the functions $\mathrm{Chop}(\tilde f),\mathrm{Chop}(\tilde g)$, we obtain $\langle T_\rho\,\mathrm{Chop}(\tilde f),\mathrm{Chop}(\tilde g)\rangle\le\Lambda_\rho(\mathbb{E}[\mathrm{Chop}(\tilde f)],\mathbb{E}[\mathrm{Chop}(\tilde g)])$, and hence $\langle T_\rho\tilde f,\tilde g\rangle\le\Lambda_\rho(\mu_q(f),\mu_p(g))+\epsilon_1+\epsilon_2$. So to complete the proof we need to show that $\epsilon_1+\epsilon_2<\epsilon$, provided that $\delta$ is sufficiently small. Proof. Note that it follows from Jensen's inequality that the operator $T_\rho$ on the space $L^2(\mathbb{R}^n,\gamma)$ is a contraction. Indeed, for each function $h\in L^2(\mathbb{R}^n,\gamma)$ we have $\|T_\rho h\|_\gamma^2=\mathbb{E}[(T_\rho h)^2]\le\mathbb{E}[T_\rho(h^2)]=\|h\|_\gamma^2$. Moreover, we note that by Parseval $\|g\|^2=\sum\hat g(S)^2=\mathbb{E}_{y\sim\mu_p}[g(y)^2]\le1$. Therefore, $\epsilon_2$ is small once $\mathrm{Chop}(\tilde f)$ is close to $\tilde f$ and $\mathrm{Chop}(\tilde g)$ is close to $\tilde g$. We may now apply Corollary 3.20 with $\epsilon$ replacing $\eta$ and $\frac\epsilon4$ replacing $\epsilon$, to obtain the required bound on $\epsilon_2$, provided that $\delta$ is sufficiently small. This completes the proof of the claim.
To finish the proof of the lemma it remains to prove the following claim.
is the probability of the event $X<t_1$, $Y<t_2$ for the proper values of $t_1,t_2$. Similarly, $\Lambda_\rho(\mathbb{E}[\mathrm{Chop}(\tilde f)],\mathbb{E}[\mathrm{Chop}(\tilde g)])$ is the probability of the event $X<t_3$, $Y<t_4$ for the proper values of $t_3,t_4$. These events differ only if $X$ lies in the interval whose endpoints are $t_1,t_3$, or if $Y$ lies in the interval whose endpoints are $t_2,t_4$. The probability of the former event is $|\mathbb{E}[\mathrm{Chop}(\tilde f)]-\mathbb{E}[\tilde f]|$, and the probability of the latter event is $|\mathbb{E}[\mathrm{Chop}(\tilde g)]-\mathbb{E}[\tilde g]|$. Therefore, a union bound implies the desired bound on $\epsilon_1$. Lemma 4.5. For each $\epsilon>0$, there exists $\delta>0$ such that the following holds. Let $q,p\in(\epsilon,1-\epsilon)$, let $\rho\in(0,1)$, let $f=\sum\hat f(S)\chi^q_S$ be a function, and suppose that $\max(\mathrm{Inf}_i^{\rho}[f],\mathrm{Inf}_i^{\rho}[g])\le\delta$ for each $i\in[n]$. Then (4.1) holds. Proof. Let $\epsilon>0$, let $\delta_1=\delta_1(\epsilon)$ be sufficiently small, and let $\delta=\delta(\delta_1)$ be sufficiently small. Let $f'=T_{1-\delta_1}[f]$ and $g'=T_{1-\delta_1}[g]$. We assert that the function $f'$ is $(\delta,1-\delta_1,\mu_q)$-smooth and the function $g'$ is $(\delta,1-\delta_1,\mu_p)$-smooth, provided that $\delta$ is small enough. Indeed, the smoothness of $f'$ follows from the bound on the noisy influences of $f$; the function $g'$ is $(\delta,1-\delta_1,\mu_p)$-smooth for similar reasons. Provided that $\delta$ is small enough, Lemma 4.2 implies that (4.1) holds for $f',g'$. By Lemma 3.17, the conclusion for $f',g'$ implies the conclusion for $f,g$, provided that $\delta_1$ is sufficiently small. We now turn to the third part, in which we only assume that $\min(\mathrm{Inf}_i^{\rho}[f],\mathrm{Inf}_i^{\rho}[g])\le\delta$ for every $i$. Proof. Let $\epsilon>0$, let $\delta_1=\delta_1(\epsilon)$ be sufficiently small, and let $\delta=\delta(\delta_1)$ be sufficiently small. Let $A_1=\{i:\mathrm{Inf}_i^{\rho}[f]>\delta_1\}$, $A_2=\{i:\mathrm{Inf}_i^{\rho}[g]>\delta_1\}$, $A=A_1\cup A_2$, and $B=[n]\setminus A$. We shall now bound $\sum_{S\subseteq[n]}\rho^{|S|}\hat f(S)\hat g(S)$ by splitting it as $\sum_{S\subseteq B}\rho^{|S|}\hat f(S)\hat g(S)+\sum_{S\cap A\ne\emptyset}\rho^{|S|}\hat f(S)\hat g(S)$ and bounding each of the terms on the right-hand side.
Upper bounding $\sum_{S\subseteq B}\rho^{|S|}\hat f(S)\hat g(S)$. Since $f',g'$ satisfy the hypothesis of Lemma 4.5 (with $\delta_1$ replacing $\delta$), we obtain the required bound on this sum, provided that $\delta_1$ is small enough.
Upper bounding $\sum_{S\cap A\ne\emptyset}\rho^{|S|}\hat f(S)\hat g(S)$. By the Cauchy--Schwarz inequality, we have, for any $i\in[n]$, $\sum_{S\ni i}\rho^{|S|}|\hat f(S)\hat g(S)|\le\sqrt{\mathrm{Inf}_i^{\rho}[f]\,\mathrm{Inf}_i^{\rho}[g]}$. Moreover, we have $\min(\mathrm{Inf}_i^{\rho}[f],\mathrm{Inf}_i^{\rho}[g])\le\delta$ for each $i$. So this completes the proof provided that $\delta\le\frac{\epsilon^2}{4|A|^2}$. We shall now complete the proof by showing that $|A|=O_{\delta_1}(1)$.
Upper bounding $|A|$. We show that $|A_1|=O_{\delta_1}(1)$, as the proof that $|A_2|=O_{\delta_1}(1)$ is similar. Note that the quantity $\sum_{i\in A_1}\mathrm{Inf}_i^{\rho}[f]$ is on the one hand bounded from below by $|A_1|\delta_1$, and on the other hand we have the upper bound $\sum_{i}\mathrm{Inf}_i^{\rho}[f]=\sum_S|S|\rho^{|S|-1}\hat f(S)^2\le\max_m m\rho^{m-1}=O_\rho(1)$. Hence $|A_1|=O_{\delta_1}(1)$. This completes the proof of the proposition.

Proof of the structural result on almost monotone functions
In this section we prove Theorem 2.5. We restate it for the convenience of the reader.
Theorem. For each $\epsilon>0$, there exist $j\in\mathbb{N}$ and $\delta>0$, such that the following holds. Let $p,q$ be numbers in the interval $(\epsilon,1-\epsilon)$ that satisfy $p-q>\epsilon$, and let $f:\{0,1\}^n\to\{0,1\}$ be a $(q,p,\delta)$-almost monotone function. Then there exists a monotone $j$-junta $g$, such that $\Pr_{x\sim\mu_q}[f(x)>g(x)]<\epsilon$ and $\Pr_{x\sim\mu_p}[f(x)<g(x)]<\epsilon$. We recall that the proof relies on the regularity method, with the regularity lemma being Theorem 3.10 of [39], and with the corresponding counting lemma being Proposition 4.1.
The regularity lemma allows us to decompose $f$ into functions $\{f_{J\to x}\}_{x\in\{0,1\}^J}$, such that for most of the parts the function $f_{J\to x}$ has small noisy influences and a $q$-biased measure that is bounded away from 0. We shall then approximate $f$ by the `least' monotone junta $g:\{0,1\}^J\to\{0,1\}$ that takes the value 1 on all the $x$ such that the function $f_{J\to x}$ has small noisy influences. Here, by least we mean smallest with respect to the partial order: $g\le h$ if and only if $g(x)\le h(x)$ for each $x$.
Let $Q\subseteq\{0,1\}^J$ be the set of `quasirandom parts', consisting of all $x\in\{0,1\}^J$ for which the function $f_{J\to x}$ has small noisy influences, and let $N$ be the set of all $x\in\{0,1\}^J$ with $\mu_q(f_{J\to x})\le\epsilon/2$. Let $A$ be the up-closure of $Q\setminus N$, i.e. the set of all $x\in\{0,1\}^J$ such that there exists some $y\in Q\setminus N$ that satisfies $y_i\le x_i$ for all $i$, and let $g$ be the indicator function of $A$. Showing that $\Pr_{x\sim\mu_q}[f(x)>g(x)]<\epsilon$. Suppose that $f(x)=1$ and $g(x)=0$; then in particular $x_J\notin Q\setminus N$. So we either have $x_J\notin Q$, or we have the unlikely event that $f(x)=1$ although $x_J\in N$. The former event occurs with probability at most $\delta_2$, and the latter event occurs with probability at most $\epsilon/2$, so $\Pr_{x\sim\mu_q}[f(x)>g(x)]<\epsilon$, provided that $\delta_2$ is sufficiently small. Showing that $\Pr_{x\sim\mu_p}[f(x)<g(x)]<\epsilon$.
Let $y\in A$, let $x\in Q\setminus N$ satisfy $x_i\le y_i$ for all $i$, and let $\rho=\sqrt{\frac{q(1-p)}{p(1-q)}}$. Since $x$ is in $Q$, we may apply Proposition 4.1, provided that $\delta_2$ is sufficiently small; this gives us an upper bound on $\langle T_{q\to p}f_{J\to x},f_{J\to y}\rangle$. On the other hand, we may use the fact that $f$ is almost monotone to obtain a lower bound on $\langle T_{q\to p}f_{J\to x},f_{J\to y}\rangle$, provided that $\delta=\delta(\delta_1,j,\epsilon)$ is small enough. Combining (5.1) and (5.3), we obtain, by Lemma 3.16, that $\mu_p(f_{J\to y})>1-\frac\epsilon2$, provided that $\delta_1$ is small enough (note that $\mu_q(f_{J\to x})>\epsilon/2$, since $x\notin N$). This shows that any $y$ with $g(y)=1$ and $f(y)=0$ satisfies the unlikely event that $f(y)=0$ while $\mu_p(f_{J\to y_J})>1-\epsilon/2$. Since a random $y\sim\mu_p$ satisfies this event with probability at most $\epsilon$, we obtain $\Pr_{y\sim\mu_p}[f(y)<g(y)]<\epsilon$. This completes the proof of the theorem.
We may repeat the proof of Theorem 2.5 to obtain the following lemma that we use in the proof of Theorem 2.8.
We shall now prove Theorem 2.8. We restate it for the convenience of the reader.

Counting matchings
In this section we prove Theorem 7.1 in the case where $H$ is a matching. Theorem 6.1. For each $h\in\mathbb{N}$, $\epsilon>0$, there exists $\delta>0$, such that the following holds. Let $F_1\subseteq\binom{[n]}{k_1},\ldots,F_h\subseteq\binom{[n]}{k_h}$ be families whose measure is at least $\epsilon$. Suppose that for each $i\in[h]$ such that $k_i\ge\delta n$ the family $F_i$ is $(\frac1\delta,\delta)$-regular, and choose uniformly at random a matching $\{A_1,\ldots,A_h\}$ such that $|A_i|=k_i$ for each $i$. Then $\Pr[\forall i\colon A_i\in F_i]\ge\delta$. We start by stating some constructions that we shall use throughout the proof.
6.1. Basic constructions and overview of the proof. We completely identify an element $x\in\{0,1\}^n$ with the set of $i\in[n]$ such that $x_i=1$. Thus, we shall use the notations $F^B_J$, $F^{1_B}_J$ interchangeably, we write $P(x)$ for the family of all subsets of $\{i:x_i=1\}$, we shall write $\binom{x}{k}$ for the family of all subsets in $P(x)$ whose size is $k$, and we write $|x|$ for $\#\{i:x_i=1\}$.
The first construction that we need associates to each family $F\subseteq\binom{[n]}{k}$ a function $f_F:\{0,1\}^n\to[0,1]$. This construction originates in the work of Friedgut and Regev [33].
Given a family $F\subseteq\binom{[n]}{k}$, we associate to $F$ the function $f_F$ defined by $f_F(x)=\Pr_{A\sim\binom{x}{k}}[A\in F]$ when $|x|\ge k$, and $f_F(x)=0$ otherwise. Another construction we need turns a function $f:\{0,1\}^n\to\mathbb{R}$ into a Boolean function $\mathrm{Cut}_\delta(f)$.
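A small sketch of the Friedgut--Regev construction, under the reading that $f_F(x)$ is the probability that a uniformly random $k$-subset of $x$ lies in $F$; the thresholding operator $\mathrm{Cut}_\delta$ defined next is sketched as well. Both helpers are illustrative, not part of the formal development.

```python
from itertools import combinations

def f_F(F, k, x):
    # f_F(x) = fraction of k-subsets of {i : x_i = 1} that lie in F
    # (0 when x has fewer than k ones)
    ones = [i for i, xi in enumerate(x) if xi]
    if len(ones) < k:
        return 0.0
    subs = list(combinations(ones, k))
    return sum(1 for A in subs if frozenset(A) in F) / len(subs)

def cut(f_val, delta):
    # Cut_delta thresholds a [0,1]-valued function into a Boolean one
    return 1 if f_val >= delta else 0

# toy 2-uniform family on [4]: all pairs containing the element 0
F = {frozenset(A) for A in combinations(range(4), 2) if 0 in A}
assert f_F(F, 2, (1, 1, 1, 0)) == 2 / 3
assert cut(f_F(F, 2, (1, 1, 1, 0)), 0.5) == 1
```

Note that on inputs $x$ with $|x|=k$ exactly, $f_F$ reduces to the indicator of $F$ itself.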
Given a function $f:\{0,1\}^n\to\mathbb{R}$ and a $\delta\in\mathbb{R}$, we define the function $\mathrm{Cut}_\delta(f)$ by setting $\mathrm{Cut}_\delta(f)(x)=1$ if $f(x)\ge\delta$, and $\mathrm{Cut}_\delta(f)(x)=0$ otherwise. We shall also need to introduce the following distributions.
• We write $\mu^{\ge k}_p$ for the conditional probability distribution on $x\sim(\{0,1\}^n,\mu_p)$ given that $|x|\ge k$. (The distributions $\mu^{>k}_p,\mu^{<k}_p,\mu^{\le k}_p$ are defined accordingly.) • We write $(\mu^{\ge k}_p,J\to B)$ for the conditional distribution on sets $A\sim\mu^{\ge k}_p$ given that $A\cap J=B$. The distributions $(\mu_p,J\to B)$ and $(\binom{[n]}{k},J\to B)$ are defined accordingly. Another construction we need is a random matching such that each of the sets $B_i$ is distributed according to the $\frac1h$-biased distribution. Definition 6.5. Choose uniformly and independently $[0,1]$-valued random variables $X_1,\ldots,X_n$. For each $i\in\{1,\ldots,h\}$ we let $B_i$ be the set of all $j\in[n]$ such that $X_j$ is in the interval $\left(\frac{i-1}{h},\frac{i}{h}\right]$. We call the sets $(B_1,\ldots,B_h)$ a $\frac1h$-biased matching. We will be concerned with the case where $k\le\frac nh-\Theta(n)$. This yields that $|B_i|\ge k$ asymptotically almost surely for all $i$. So intuitively, the distribution of a $\frac1h$-biased matching is not much different from the distribution of a $(\frac1h,k)$-biased matching. The proof of Theorem 6.1 consists of three steps. (In the following, $\epsilon_1$ is sufficiently small and $\epsilon_2=\epsilon_2(\epsilon_1)$ is sufficiently small.) (1) We set $q$ to be slightly larger than $\frac kn$. The first step is to show that for each of the families $F_i$ of Theorem 6.1, the function $f_{F_i}$ is $(\frac1{\epsilon_1},\epsilon_1,\mu_q)$-regular.
(2) The second step is to show that the measure $\mu_{\mathrm{matching}}(\mathrm{Cut}_{\epsilon_2}(f_{F_i}))$ is very close to 1. This step is based on Theorem 2.8, and the proof roughly goes as follows.
• We shall apply Theorem 2.8 to deduce that $\mu_{\frac1h}(\mathrm{Cut}_{\epsilon_2}(f_{F_i}))$ is large. • We shall use the similarity between $\mu_{\frac1h}$ and $\mu_{\mathrm{matching}}$: for each choice of disjoint $B_1,\ldots,B_h$ these events are independent. Therefore, the probability that $M_i$ is in $F$ for each $i$ cannot be much smaller than $\epsilon_2^h$. We shall start with the proof of the first step.
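The $\frac1h$-biased matching of Definition 6.5 is straightforward to simulate. This illustrative sketch builds the sets $B_i$ from iid uniform labels and checks that they are pairwise disjoint, as the definition guarantees.

```python
import random

def biased_matching(n, h, rng):
    # B_i = { j : X_j falls in ((i-1)/h, i/h] } for iid uniform X_j in [0,1],
    # so the B_i are pairwise disjoint and each B_i is marginally
    # distributed like a mu_{1/h}-biased subset of [n]
    X = [rng.random() for _ in range(n)]
    return [{j for j in range(n) if (i - 1) / h < X[j] <= i / h}
            for i in range(1, h + 1)]

rng = random.Random(2)
B = biased_matching(100, 4, rng)
# disjointness: the total size equals the size of the union
assert sum(len(b) for b in B) == len(set().union(*B))
```

Each $|B_i|$ concentrates around $n/h$, which is the source of the remark that $|B_i|\ge k$ asymptotically almost surely when $k$ is bounded away from $n/h$.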

6.2.
Showing that if $F$ is regular, then the function $f_F$ is $(\frac1\epsilon,\epsilon,\mu_q)$-regular. In order to show that the function $f_F$ is regular, we need to show that $\mu_q(f_F)$ is approximately $\mu_q((f_F)_{J\to x})$ for each $|J|\le\frac1\epsilon$ and each $x\in\{0,1\}^J$. In order to accomplish this we need to write both of the quantities $\mu_q(f_F)$, $\mu_q((f_F)_{J\to x})$ in terms of $F$. We start by showing that $\mu_q(f_F)$ is approximately equal to $\mu(F)$. Lemma 6.6. For each $\epsilon>0$, there exists $n_0>0$, such that the following holds. Let $n>n_0$, let $q\in(0,1)$, $k\le n$ satisfy $q\ge\frac kn+\epsilon$, and let $F\subseteq\binom{[n]}{k}$ be some family. Then $|\mu_q(f_F)-\mu(F)|\le\epsilon$. Proof. We have $\mu_q(f_F)=\Pr_{x\sim\mu_q}[|x|\ge k]\cdot\mathbb{E}_{x\sim\mu^{\ge k}_q}[f_F(x)]$. However, whenever we choose $x\sim\mu^{\ge k}_q$ and $A\sim\binom{x}{k}$, we obtain a set $A$ that is distributed uniformly in $\binom{[n]}{k}$; thus $\mathbb{E}_{x\sim\mu^{\ge k}_q}[f_F(x)]=\mu(F)$. The lemma follows by combining this with the fact that $\Pr_{x\sim\mu_q}[|x|\ge k]$ tends to 1 as $n$ tends to infinity.
We now turn to the task of approximating $\mu_q((f_F)_{J\to x})$ in terms of $F$. We show that for some $\lambda>0$ the term $\mu_q((f_F)_{J\to x})$ can be approximated by $\sum_{C\subseteq x}\lambda^{|C|}(1-\lambda)^{|x|-|C|}\mu(F^C_J)$. Lemma 6.7. For each $\epsilon>0$ there exists $n_0$, such that the following holds. Let $n>n_0$, let $k\le n$, let $q$ be a number in the interval $(\frac kn+\epsilon,1)$, and set $\lambda=\frac{k}{qn}$. Then $\big|\mu_q((f_F)_{J\to x})-\sum_{C\subseteq x}\lambda^{|C|}(1-\lambda)^{|x|-|C|}\mu(F^C_J)\big|\le\epsilon$. Proof. As in Lemma 6.6, we have

Note that $\Pr_{y\sim(\mu_q,J\to x)}[|y|\ge k]=1-o(1)$, where the $o(1)$ is with respect to $n$ tending to infinity. So to complete the proof it remains to analyze $\mathbb{E}_{y\sim(\mu^{\ge k}_q,J\to x)}[f_F(y)]$. Choose $y\sim(\mu^{\ge k}_q,J\to x)$ and $A\sim\binom{y}{k}$; then $\mathbf C:=A\cap J$ is equal to some subset $C$ of $x$. Note also that the conditional distribution of $A$ given that $\mathbf C=C$ is the distribution of a uniformly random element of $\binom{[n]}{k}$ that intersects $J$ at the set $C$. Therefore, to complete the proof it remains to show that $\Pr[\mathbf C=C]=\lambda^{|C|}(1-\lambda)^{|x|-|C|}(1+o(1))$. Indeed, with high probability $|y|=qn(1+o(1))$, and writing $s=|y|$, the conditional probability that $\mathbf C=C$ given that $|y|=s$ is $\left(\frac ks\right)^{|C|}\left(1-\frac ks\right)^{|x|-|C|}(1+o(1))$. Thus $\Pr[\mathbf C=C]=\lambda^{|C|}(1-\lambda)^{|x|-|C|}(1+o(1))$, where the last equality follows from the fact that $\frac ks=\lambda(1+o(1))$ with high probability. This completes the proof of the lemma.
We are now ready to show that if $F\subseteq\binom{[n]}{k}$ is a regular family, and if we choose $q$ bounded from below away from $\frac kn$, then the function $f_F$ is regular, provided that $n$ is sufficiently large. Lemma 6.8. For each $\epsilon>0$, there exists $n_0$, such that the following holds. Let $n>n_0$, let $F\subseteq\binom{[n]}{k}$ be a $(\frac1{2\epsilon},2\epsilon)$-regular family, and let $q\ge\frac kn+\epsilon$. Then the function $f_F$ is $(\frac1\epsilon,\epsilon,\mu_q)$-regular.
Proof. Fix $\epsilon>0$, let $n_0$ be sufficiently large, and let $F\subseteq\binom{[n]}{k}$ be as in the hypothesis of the lemma. Let $B\subseteq J\subseteq[n]$ be sets such that $|J|\le\frac1\epsilon$. By Lemma 6.6, $\mu_q(f_F)$ is close to $\mu(F)$, provided that $n_0$ is large enough. By Lemma 6.7, $\mu_q((f_F)_{J\to B})$ is approximated in terms of the families $F^C_J$, provided that $n_0$ is large enough.
By hypothesis, $F$ is $(\frac1{2\epsilon},2\epsilon)$-regular; combining the above estimates completes the proof that $f_F$ is $(\frac1\epsilon,\epsilon,\mu_q)$-regular. 6.3. Showing that if $\frac kn$ is small, then $f_F$ is $(\frac1\epsilon,\epsilon,\mu_q)$-regular. Lemma 6.9. For each $\epsilon>0$, there exists $\delta>0$ such that the following holds. Let $\frac kn<\delta$, let $q\ge\epsilon$, and let $F\subseteq\binom{[n]}{k}$ be some family. Then the function $f_F$ is $(\frac1\epsilon,\epsilon,\mu_q)$-regular. Proof. Let $J$ be of size at most $\frac1\epsilon$, and let $B\subseteq J$. We have (6.9). We shall complete the proof by giving an upper bound of $\frac\epsilon2$ for each of the summands on the right-hand side of (6.9).
Showing that $|\mu(F^\emptyset_J)-\mu_q(f_F)|\le\frac\epsilon2$. By decreasing $\delta$ if necessary we may assume that $n$ is as large as we wish; therefore Lemma 6.6 gives the desired bound, provided that $\delta$ is small enough. The remaining summand is bounded similarly to (6.10), provided that $\delta$ is small enough and $n$ is large enough. This completes the proof of the lemma.

6.4.
Showing that if $F$ is $(\frac1\delta,\delta,\mu_q)$-regular, then $\mu_{\frac1h}(\mathrm{Cut}_\delta(f_F))$ is large. In the last two subsections we gave two criteria on a family $F$ that imply that the function $f_F$ is $(\frac1\delta,\delta,\mu_q)$-regular for a small $\delta$. We shall now show that both criteria imply that $\mu_{\mathrm{matching}}(\mathrm{Cut}_\delta(f_F))$ is large. As mentioned, we will first relate $T_{q\to p}[f_F]$ to $\mathrm{Cut}_\delta(f_F)$. Proof. We show the stronger statement that for each value $y$ of $\mathbf y$: if we choose $\mathbf x$ conditionally, $(\mathbf x,\mathbf y)\sim D(q,p)$ given that $\mathbf y=y$, then the desired inequality holds. This clearly holds if $\mathrm{Cut}_\delta(f_F)(y)=1$. So suppose that $\mathrm{Cut}_\delta(f_F)(y)=0$; we may also suppose that $|y|\ge k$, for otherwise we would have $|\mathbf x|<k$ and hence $f_F(\mathbf x)=0$. A short computation now completes the proof of the lemma.
We shall now complete the second step, showing that $\mu_{\frac1h}(\mathrm{Cut}_\delta(f_F))$ is large. Lemma 6.11. For each $\epsilon>0$, there exists $\delta>0$, such that the following holds. Let $p\ge\frac kn+\epsilon$, and let $F\subseteq\binom{[n]}{k}$ be some family whose measure is at least $\epsilon$. Suppose that we either have $k\le\delta n$ or the family $F$ is $(\frac1\delta,\delta)$-regular. Then $\mu_p(\mathrm{Cut}_\delta(f_F))>1-\epsilon$. Proof. Let $q=\frac kn+\frac\epsilon2$, and note that Lemma 6.10 relates the $q$-biased and $p$-biased measures. By decreasing $\delta$ if necessary, we may assume that $n$ is sufficiently large for Lemma 6.6 to imply that $\mu_q(f_F)\ge\frac\epsilon2$. By Theorem 2.8 (applied with $\frac\epsilon2$ rather than $\epsilon$) we have $\mu_p(f_F)\ge1-\frac\epsilon2>1-\epsilon$, provided that $\delta$ is small enough. This completes the proof of the lemma. Corollary 6.12. For each $\epsilon>0$, there exists $\delta>0$, such that the following holds. Let $F\subseteq\binom{[n]}{k}$ be some family whose measure is at least $\epsilon$. Suppose that we either have $k\le\delta n$ or the family $F$ is $(\frac1\delta,\delta)$-regular. Then $\mu_{\mathrm{matching}}(\mathrm{Cut}_\delta(f_F))>1-\epsilon$. Proof. By Lemma 6.11, we have $\mu_{\frac1h}(\mathrm{Cut}_\delta(f_F))>1-\frac\epsilon2$, provided that $\delta$ is small enough. Also note that we may assume that $n$ is sufficiently large by decreasing $\delta$ if necessary.
We shall now define a coupling between $\frac1h$-biased matchings and $(\frac1h,k)$-biased matchings as follows. The corollary then follows, provided that $n$ is sufficiently large to imply $\Pr[M_1\ne M'_1]<\frac\epsilon2$.
Proof of Theorem 6.1.
Let $F_1\subseteq\binom{[n]}{k_1},\ldots,F_h\subseteq\binom{[n]}{k_h}$ be some families that satisfy the hypothesis of the theorem, let $\delta'=\delta'(\epsilon)$ be sufficiently small, and let $\delta=\frac{(\delta')^h}{3}$. By Corollary 6.12, $\mu_{\mathrm{matching}}(\mathrm{Cut}_{\delta'}(f_{F_i}))$ is large for each $i$, provided that $\delta'$ is small enough. A union bound implies that if we choose a $(\frac1h,k)$-biased matching $A_1,\ldots,A_h$, then the event $\forall i\colon\mathrm{Cut}_{\delta'}(f_{F_i})(A_i)=1$ happens with probability close to 1. This completes the proof of the theorem, since the hypergraph $\{M_1,\ldots,M_h\}$ is a uniformly random matching.

Counting expanded hypergraphs
We shall now generalize Theorem 6.1 to general expanded hypergraphs.
Note that $H\subseteq P(V)$ can be written in the form $H=\{E_1\cup D_1,\ldots,E_h\cup D_h\}$, where $C:=E_1\cup\cdots\cup E_h$ is the center of $H$, and where the sets $C,D_1,\ldots,D_h$ are pairwise disjoint. If $\pi:V\to[n]$ is a random injection, then $\pi(E_1\cup D_1),\ldots,\pi(E_h\cup D_h)$ is a uniformly random copy of $H$. Write $\mathbf E_i=\pi(E_i)$, $\mathbf D_i=\pi(D_i)$, and $\mathbf C=\pi(C)$. Our basic observation is that the following events are equal.
(1) The families $F_1,\ldots,F_h$ cross contain the random copy of $H$. (2) The families $(F_i)^{\mathbf E_i}_{\mathbf C}$ cross contain the uniformly random matching $\mathbf D_1,\ldots,\mathbf D_h$. Therefore it is natural to try to apply Theorem 6.1 to the families $(F_i)^{\mathbf E_i}_{\mathbf C}$. As it turns out, the only hypothesis of Theorem 6.1 that the families $(F_i)^{\mathbf E_i}_{\mathbf C}$ do not obviously satisfy is the lower bound on their measures. The Fairness Proposition tells us that for any $F\subseteq\binom{[n]}{k}$, a uniformly random $J\sim\binom{[n]}{s}$ is $\epsilon$-fair for $F$ with high probability, provided that $k$ is not too close to either 0 or $n$. Proof of Theorem 7.1. Let $\mathbf E_i,\mathbf C,\mathbf D_i$ be as above. Our goal is to show that the families $(F_i)^{\mathbf E_i}_{\mathbf C}$ cross contain the uniformly random matching $\mathbf D_1,\ldots,\mathbf D_h$ with probability $\ge\delta$. Noting that the size of $C$ is fixed, the following observations are easy to verify, provided that $\delta$ is sufficiently small: • Proposition 7.3 implies that the set $\mathbf C$ is $\frac12$-fair with probability at least $\frac12$. For any such $C$ the measure of the family $(F_i)^{E_i}_{C}\subseteq\binom{[n]\setminus C}{k_i-|E_i|}$ is at least $\frac\epsilon2$. • For any $i$ such that $\frac{k_i}{n}<\delta$, we have $\frac{k_i-|E_i|}{n-|C|}<2\delta$.
• If $F_i$ is $(\frac1\delta,\delta)$-regular, then $(F_i)^{E_i}_{C}$ is $(\frac1{2\delta},2\delta)$-regular. We shall also assume that the $\delta$ of this theorem is small enough for Theorem 6.1 to hold with $2\delta$ replacing $\delta$ and $\frac\epsilon2$ replacing $\epsilon$. These observations allow us to apply Theorem 6.1, and to deduce that for each set $C'$ that is $\frac12$-fair for $F$, and for each set $E'_i\in\binom{C'}{|E_i|}$, we have $\Pr[\forall i\colon\mathbf D_i\in F^{E'_i}_{C'}]>2\delta$. Therefore, averaging over $\mathbf C$, the desired bound follows. This completes the proof of the theorem.

Removal lemma for expanded hypergraphs
In this section we prove Theorem 1.13, Proposition 1.14, and Theorem 1.11. Let $H=\{A_1,\ldots,A_h\}\subseteq P(V)$. Any hypergraph of the form $\{A_1\cap S,\ldots,A_h\cap S\}$ is called a trace of $H$. We shall need the following lemma.
Lemma 8.1. For each $h,c,s,j\in\mathbb{N}$, $\epsilon>0$ there exists $\delta>0$, such that the following holds. Let $H$ be a hypergraph with $h$ edges whose center is of size $c$, let $J$ be a set of size at most $j$, and let $G\subseteq P(J)$. Let $\frac1\delta\le k\le(\frac1h-\epsilon)n$. Then the following are equivalent.
(1) The $k$-uniform family determined by $G$ is $(H,s)$-free.
(2) There exists no copy of a trace of $H$ in $G$ whose center is of size at most $s$.

Proof.
We start by showing that if (2) does not hold, then (1) does not hold. By hypothesis, there exists a trace $\{C_1,\ldots,C_h\}$ of $H$ in $G$ whose center is of size at most $s$. Let $B_1\in\binom{[n]\setminus J}{k-|C_1|},\ldots,B_h\in\binom{[n]\setminus J}{k-|C_h|}$ be some pairwise disjoint sets (such sets exist provided that $\delta$ is small enough). Then the hypergraph $\{C_1\cup B_1,\ldots,C_h\cup B_h\}$ is contained in the $k$-uniform family determined by $G$, it is a resolution of the hypergraph $H$, and its center is of size at most $s$. Therefore, that family is not $(H,s)$-free, and so (1) does not hold. We now show that if (1) does not hold, then (2) does not hold. Let $\{A_1,\ldots,A_h\}$ be a resolution of $H$, contained in the family, whose center is of size at most $s$. The hypergraph $\{A_1\cap J,\ldots,A_h\cap J\}$ is contained in $G$, its center is of size at most $s$, and in order to complete the proof we need to show that it is a trace of $H$.
For each $i=1,\ldots,h$ let $D_i\subseteq[n]\setminus J$ be a sufficiently large set that is contained in $A_i$ and does not intersect any other edge of the copy. The fact that $\{A_1,\ldots,A_h\}$ is a resolution of $H$ implies that there exist sets $E_1,\ldots,E_h\subseteq[n]\setminus J$ such that the hypergraph $H'$ whose edges are the sets $(A_i\cap J)\cup D_i\cup E_i$ is a copy of $H$. Now note that if we intersect each of the edges of $H'$ with $J$, we obtain the original hypergraph $\{A_1\cap J,\ldots,A_h\cap J\}$. Therefore, $\{A_1\cap J,\ldots,A_h\cap J\}$ is indeed a trace of $H$. This completes the proof of the lemma.
We will also need the following lemma. Provided that δ is small enough, this contradicts the hypothesis.
Note that in the proof of Theorem 1.13, the hypothesis $k>\frac1\delta$ is not needed in the case where $H$ is a matching, as we may apply Theorem 6.1 rather than Theorem 7.1.
We shall now prove Proposition 1.14. We restate it for the convenience of the reader.
Proposition. For all constants $h,c,j,s\in\mathbb{N}$, there exists a constant $C>0$, such that the following holds. Let $H$ be a hypergraph with $h$ edges whose center is of size $c$. Let $\epsilon n\le k\le(\frac1h-\epsilon)n$, and let $J$ be some $(H,s)$-free $j$-junta. Then $J$ is $\frac{C}{n^{s+1}}$-almost $H$-free. Proof. Let $\{A_1,\ldots,A_h\}$ be a random copy of $H$, and let $J_0$ be a set of size at most $j$ such that the junta $J$ depends only on the coordinates in $J_0$. Let $C_1=A_1\cap J_0,\ldots,C_h=A_h\cap J_0$. Since $J$ is $(H,s)$-free, the desired bound follows from Lemma 8.1. Finally, we prove Theorem 1.11, which we restate for the convenience of the reader.
Theorem. For each $h,d\in\mathbb{N}$, $\epsilon>0$ there exist $C,\delta>0$ such that the following holds. Let $C\le k\le(\frac1h-\epsilon)n$, and let $H$ be a $k$-uniform $(h,d)$-expanded hypergraph. Then we have the following.
(1) If the family F is δ-almost H-free, then F is ǫ-essentially contained in an M h -free family.
(2) Conversely, if the family F is δ-essentially contained in an M h -free family, then F is ǫ-almost H-free.
Proof of Theorem 1.11. Item (1) follows by applying Theorem 1.13 with $s=0$, noting that a family is $(H,0)$-free if and only if it is free of a matching. We now show the converse implication, item (2). Suppose that the family $F$ is $\delta$-essentially contained in an $M_h$-free family. By Theorem 1.13, $F$ is $\frac{\epsilon}{h+1}$-essentially contained in an $M_h$-free junta $J$, provided that $\delta$ is small enough. Let $\{A_1,\ldots,A_h\}$ be a random copy of $H$. Note that the event $\{A_1,\ldots,A_h\}\subseteq F$ can occur only if for some $i$ we have $A_i\in F\setminus J$, or if $\{A_1,\ldots,A_h\}\subseteq J$. So a union bound implies that it is enough to show that each of these events occurs with probability $<\frac{\epsilon}{h+1}$. By Proposition 1.14 a random copy of $H$ lies in $J$ with probability $O(\frac1n)<\frac{\epsilon}{h+1}$, provided that $C$ is sufficiently large to imply the needed lower bound on $n$. Moreover, each $A_i$ is uniformly distributed in $\binom{[n]}{k}$. Therefore, for each $i$ the probability that $A_i$ is in $F$ but not in $J$ is at most $\frac{\epsilon}{h+1}$. This completes the proof of the theorem.