An efficient container lemma

We prove a new, efficient version of the hypergraph container theorems that is suited for hypergraphs with large uniformities. The main novelty is a refined approach to constructing containers that employs simple ideas from high-dimensional convex geometry. The existence of smaller families of containers for independent sets in such hypergraphs, which is guaranteed by the new theorem, allows us to improve upon the best currently known bounds for several problems in extremal graph theory, discrete geometry, and Ramsey theory.


Introduction
The hypergraph container theorems, proved several years ago by Balogh, Morris, and Samotij [6] and, independently, by Saxton and Thomason [42], state that the family of independent sets of any uniform hypergraph whose edges are distributed somewhat evenly may be covered by (the families of subsets of) a small collection of sets, called containers, each of which is nearly independent. The original motivation for these results was several specific questions concerning the enumeration of graphs avoiding a given subgraph and of sets of integers containing no arithmetic progressions of a given length. (The idea of considering these problems in the general context of independent sets in hypergraphs had been successfully pursued earlier in the breakthrough works of Conlon-Gowers [14] and Schacht [44] on extremal properties of random graphs and random sets of integers.) However, over the years, the scope of applicability of the container theorems has grown quite substantially (see, for example, the survey [7] and references therein). There are two major reasons for this. The first is the general form of the theorems: many problems can be cast in the language of independent sets in auxiliary hypergraphs. The second is the explicit, optimal dependence between the various parameters disguised under the vague phrases 'small collection', 'evenly distributed', and 'nearly independent' above.
The vast majority of applications of the container theorems concern sequences of hypergraphs of fixed uniformity and growing order and size. As a result, in all these works, the explicit dependence of the various parameters in the container theorems on the uniformity of the hypergraph is only a minor detail. Recently, however, the container theorems have been used to analyse sequences of hypergraphs whose uniformity grows with the number of vertices and edges. These applications of the container method arose in Ramsey theory [12,40], discrete geometry [9], and extremal graph theory [5,33]. In them, the explicit dependence between the uniformity and the other parameters turned out to lie at the heart of the matter, and it obstructed the way to optimal bounds for several well-studied functions. It is only fair to note here that this dependence is more favourable in the version of the theorem proved by Saxton and Thomason [42]. Having said that, the two constructions of containers presented in [6] and [42] are essentially equivalent, and the differences between the final results merely reflect differences in their analysis. The authors of [42] performed this analysis more carefully and with a wiser choice of parameters.
The basic container lemma is the building block of both proofs, and it really lies at the heart of the matter. It asserts the existence of a small family C of containers for the independent sets of an s-uniform hypergraph H such that |C| ≤ (1 − δ)v(H) for every C ∈ C, for some positive constant δ; see [6, Proposition 3.1] and [42, Theorem 3.4]. The stronger form of the theorem described in the first paragraph is then derived by recursively applying this basic lemma to the subhypergraphs of H induced by the sets C ∈ C, for as long as C still contains many edges of H. The caveat here is that the proof methods used in both [6] and [42] necessarily yield δ ≤ 1/s!. (The short, non-algorithmic proof of the basic container lemma given recently by Bernshteyn, Delcourt, Towsner, and Tserunyan [10] seems to yield a δ that is doubly exponentially small in s.) One typically requires the ratio |C|/v(H) to be bounded away from one for each final container C. Consequently, at least $\exp(\Omega(s \log s))$ iterations are required, which substantially blows up the final number of containers when s is no longer a fixed constant. Finally, we remark that Saxton and Thomason [43] proposed and analysed a different method of building containers for independent sets in hypergraphs. The parameter δ in the basic container lemma proved in [43] is only polynomially small in the uniformity, but the upper bound on the number of containers is far from optimal. Moreover, that lemma applies only to simple hypergraphs (i.e., hypergraphs in which every pair of vertices is contained in at most one edge), whereas the hypergraphs considered in most applications of the container method are far from simple.
The main result of this work is a new, more efficient version of the basic container lemma in which the parameter δ is only polynomially small in the uniformity. We postpone stating the strongest form of our new lemma until Section 2 and state here only a corollary that can easily be compared with [6, Proposition 3.1]. Following the notational convention of [6], given a nonempty s-uniform hypergraph H, we denote the numbers of its vertices and edges by v(H) and e(H), respectively. Moreover, for every T ⊆ V(H), we define $\deg_H T = |\{A \in E(H) : T \subseteq A\}|$ and, for every t ∈ {1, . . . , s}, we let $\Delta_t(H) = \max\{\deg_H T : T \subseteq V(H) \text{ and } |T| = t\}$.
The following theorem, an efficient basic container lemma, is a simplified version of our main technical result, Theorem 2.1 below.
Theorem 1.1. Let s be a positive integer and let H be a nonempty s-uniform hypergraph. Suppose that q ∈ (0, 1) and K > 0 are such that $q \cdot v(H) \ge 10^8 s^6 K$ and, for every t ∈ {1, . . . , s}, [...] weaker assumption $2 \le r \le (\log n)^{1/4}$, almost all $K_{r+1}$-free subgraphs of $K_n$ are r-partite. (Both [33] and [5] relied on the original hypergraph container theorems.) Our first application of the new, efficient container lemma is the following strengthening of this result.
Theorem 1.2. If a function r : ℕ → ℕ satisfies $2 \le r(n) \le \log n / (120 \log\log n)$, then almost all $K_{r+1}$-free subgraphs of $K_n$ are r-partite.
We point out that the assumption on the growth rate of r in Theorem 1.2 is nearly optimal. Indeed, a standard first-moment calculation shows that, for every positive constant ε, a uniformly random subgraph G ⊆ $K_n$ contains, with high probability, no clique with $\lfloor (2+\varepsilon)\log_2 n \rfloor$ vertices, whereas $\chi(G) = \Omega(n/\log n)$. On the other hand, it may well be that the assertion of the theorem remains true as long as $r(n) \le (2-\varepsilon)\log_2 n$ for some positive constant ε. However, even removing the doubly logarithmic term from the denominator of the assumed upper bound on r(n) will likely require significantly new ideas.
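The first-moment calculation mentioned above can be written out as follows. This is a standard sketch and not taken from the paper. It uses that a uniformly random subgraph of $K_n$ is the binomial random graph G(n, 1/2), and it writes $k = \lfloor (2+\varepsilon)\log_2 n \rfloor$:

```latex
\mathbb{E}\bigl[\#\{\text{copies of } K_k \text{ in } G\}\bigr]
  = \binom{n}{k}\, 2^{-\binom{k}{2}}
  \le \bigl(n \cdot 2^{-(k-1)/2}\bigr)^{k}
  \le \bigl(2\, n^{-\varepsilon/2}\bigr)^{k}
  \xrightarrow[\;n \to \infty\;]{} 0 .
```

The second step uses $\binom{n}{k} \le n^k$. The third step uses $k \ge (2+\varepsilon)\log_2 n - 1$, so that $(k-1)/2 \ge (1+\varepsilon/2)\log_2 n - 1$. By Markov's inequality, G contains no clique on k vertices with probability tending to one.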

Lower bounds on ε-nets.
Suppose that X is a finite set and let R be an arbitrary collection of subsets of X. Given a positive number ε, an ε-net in R is any set N ⊆ X that intersects every element of R with cardinality at least ε|X|. In other words, N is an ε-net if N ∩ A ≠ ∅ for every A ∈ R with |A| ≥ ε|X|.
One is usually interested in finding a small ε-net. However, this is not always possible. For example, if R comprises all subsets of X, then every ε-net in R must have more than (1 − ε)|X| elements. One can rule out such 'pathological' examples by imposing a natural assumption on a measure of complexity of the family R called the VC dimension. We say that a set S is shattered by a family R if {A ∩ S : A ∈ R} contains all $2^{|S|}$ subsets of S. The VC dimension (short for Vapnik-Chervonenkis dimension) of R is the largest cardinality of a set that R shatters. A seminal result of Haussler and Welzl [26] states that, for every ε > 0, every family of subsets whose VC dimension is at most d admits an ε-net with at most $\lceil (8d/\varepsilon)\log(8d/\varepsilon) \rceil$ elements. Komlós, Pach, and Woeginger [29] improved this upper bound on the smallest size of an ε-net to $(d + o(1)) \cdot (1/\varepsilon)\log(1/\varepsilon)$, where o(1) denotes some function tending to zero with ε. Moreover, for every d ≥ 2, they constructed (random) families with VC dimension d that have no ε-net smaller than $(d - 2 + 2/(d+1) - o(1)) \cdot (1/\varepsilon)\log(1/\varepsilon)$. On the other hand, various set families arising in geometry were proved to admit ε-nets of cardinality merely O(1/ε), see [29,31]. In view of this, many researchers believed that in 'geometric scenarios' (with bounded VC dimension) there always exists an ε-net of size O(1/ε). Alon [1] showed that this belief was wrong. He proved that, for arbitrarily small ε, there are finite sets X of points in the plane such that every ε-net for the family of intersections of X with straight lines (the range space of lines on X) must have at least $(1/\varepsilon) \cdot \omega(1/\varepsilon)$ points, for some (very slowly growing) function ω with $\lim_{x \to \infty} \omega(x) = \infty$. Alon speculated that there are planar point sets X for which the factor ω(1/ε) in the above statement could be replaced by $\Omega(\log(1/\varepsilon))$.
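To make the definitions concrete, here is a brute-force Python sketch that checks the ε-net property and computes the VC dimension. It runs in exponential time and is meant only for tiny examples; the function names are ours. The first usage example is discrete intervals on a line, a standard family of VC dimension 2. The power set of X illustrates the 'pathological' case above, where every ε-net has more than (1 − ε)|X| points.

```python
from itertools import combinations

def is_eps_net(N, X, R, eps):
    """Return True if N meets every range A in R with |A| >= eps * |X|."""
    N = set(N)
    return all(N & A for A in R if len(A) >= eps * len(X))

def is_shattered(S, R):
    """S is shattered by R if the traces {A & S : A in R} give all 2^|S| subsets of S."""
    traces = {frozenset(A & S) for A in R}
    return len(traces) == 2 ** len(S)

def vc_dimension(X, R):
    """Brute-force VC dimension of a nonempty family R of subsets of a finite set X."""
    best = 0
    for d in range(1, len(X) + 1):
        if any(is_shattered(frozenset(S), R) for S in combinations(X, d)):
            best = d
        else:
            # Subsets of shattered sets are shattered, so no larger set is shattered either.
            break
    return best

# Usage: discrete intervals {i, ..., j-1} on six points (the empty interval included).
X = list(range(6))
intervals = [frozenset(range(i, j)) for i in range(7) for j in range(i, 7)]
print(vc_dimension(X, intervals))                   # 2
print(is_eps_net({1, 3, 5}, X, intervals, 1 / 3))   # True: every interval of size >= 2 is hit
```

Every range is stored as a frozenset so that intersections and traces can be computed with set operators.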
In a paper that served as the main motivation for this work, Balogh and Solymosi [9] showed that, for arbitrarily small ε > 0, there are sets X such that the range space of lines on X has no ε-nets with fewer than $\frac{1}{\varepsilon}\bigl(\log\frac{1}{\varepsilon}\bigr)^{1/3 - o(1)}$ points; their proof relied on the hypergraph container theorems. We review the construction of Balogh and Solymosi [9] and, using our new, efficient container lemma, we further improve their lower bound, replacing the constant 1/3 in the exponent with 1/2.
Theorem 1.3. The following holds for every ε₀ > 0. There exist an ε ∈ (0, ε₀) and a finite set X ⊆ ℝ² such that the smallest size of an ε-net for the family of intersections of straight lines with X is at least [...]
We should mention here that, several years before [9], Pach and Tardos [36] showed that the families defined by intersections of finite point sets with axis-parallel rectangles (in ℝ²) and with axis-parallel boxes in ℝ⁴ may require ε-nets of sizes $\Omega\bigl(\frac{1}{\varepsilon}\log\log\frac{1}{\varepsilon}\bigr)$ and $\Omega\bigl(\frac{1}{\varepsilon}\log\frac{1}{\varepsilon}\bigr)$, respectively. Both of these lower bounds are tight up to multiplicative constants, see [3].
1.3. Upper bounds on Ramsey numbers. Given graphs G and H and a positive integer k, we write $G \to (H)_k$, and say that G is Ramsey for H in k colours, if every k-colouring of the edges of G contains a monochromatic copy of H. In other words, $G \to (H)_k$ if, for every $c \colon E(G) \to [k]$, there is some $i \in [k]$ such that the graph $c^{-1}(i)$ contains H as a subgraph. The famous theorem of Ramsey [38] states that, for all positive integers n and k, there is an integer N such that $K_N \to (K_n)_k$. We denote the smallest such N, the k-colour Ramsey number of $K_n$, by R(n; k). It is well known that $R(n;k) \le (kn)!/(n!)^k \le k^{kn}$, see [19,25].
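The second inequality holds because (kn)!/(n!)^k is a single term of the multinomial expansion of $(1 + \dots + 1)^{kn} = k^{kn}$. A small Python sketch checks it numerically for small n and k; the helper name is ours.

```python
from math import factorial

def multinomial_bound(n, k):
    """(kn)! / (n!)^k: the number of ways to split kn labelled items into k labelled blocks of size n."""
    return factorial(k * n) // factorial(n) ** k

# One term of the multinomial expansion of k^(kn), hence at most k^(kn).
for n in range(1, 8):
    for k in range(1, 6):
        assert multinomial_bound(n, k) <= k ** (k * n)

print(multinomial_bound(3, 2))  # 20, an upper bound for R(3; 2) = 6
```

Integer division is exact here because the multinomial coefficient is an integer.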
Fifty years ago, Folkman [21] proved that, for every n, there exists a graph G such that $G \not\supseteq K_{n+1}$ but, nevertheless, $G \to (K_n)_2$. Nešetřil and Rödl [35] generalised this result to an arbitrary number of colours. Define the k-colour Folkman number for $K_n$ by
$F(n;k) = \min\{v(G) : G \not\supseteq K_{n+1} \text{ and } G \to (K_n)_k\}$.
The constructions given in [21,35] yield upper bounds on F(n; k) that are tower functions of height polynomial in n and k. On the other hand, the strongest lower bound on F(n; k), due to Lefmann [30], is only exponential in kn. In recent years, the transference theorems of Conlon-Gowers [14] and Schacht [44] (see also [23]) were used by Rödl, Ruciński, and Schacht [41] and by Conlon and Gowers (unpublished) to give improved upper bounds on F(n; k) that are merely doubly exponential in n and k. Soon afterwards, the first of these two groups of authors [40] used the hypergraph container theorems to give the first exponential bound, $F(n;k) \le \exp(O(n^4 \log n + n^3 k \log k))$. Our next application of the efficient container lemma is the following modest improvement of this result.
Theorem 1.4. There exists a constant C such that, for all positive integers n and k, $F(n;k) \le (CknR(n;k))^{21n^2} \le \exp(Ckn^3 \log k)$.
Another well-studied variation of the classical Ramsey numbers is the induced Ramsey number. Given graphs G and H and a positive integer k, we write $G \to_{\mathrm{ind}} (H)_k$, and say that G is induced-Ramsey for H in k colours, if every k-colouring of the edges of G contains a monochromatic induced copy of H. In other words, $G \to_{\mathrm{ind}} (H)_k$ if, for every $c \colon E(G) \to [k]$, there are an $i \in [k]$ and an injection $\varphi \colon V(H) \to V(G)$ such that $\varphi(E(H)) \subseteq c^{-1}(i)$ and $\varphi(E(H)^c) \cap E(G) = \emptyset$, where $E(H)^c = \binom{V(H)}{2} \setminus E(H)$. The existence of induced-Ramsey graphs for every H and any number of colours k was established independently by Deuber [15], by Erdős, Hajnal, and Pósa [18], and by Rödl [39]. We may thus define the k-colour induced Ramsey number of H by
$R_{\mathrm{ind}}(H;k) = \min\{v(G) : G \to_{\mathrm{ind}} (H)_k\}$.
The upper bounds on $R_{\mathrm{ind}}(H;k)$ implied by the constructions of [15,18,39] were enormous. In spite of that, Erdős [16] conjectured that, for every n-vertex graph H, the 2-colour induced Ramsey number $R_{\mathrm{ind}}(H;2)$ is only exponential in n. The best-known result to date was obtained by Conlon, Fox, and Sudakov [13], who proved that $R_{\mathrm{ind}}(H;2) \le \exp(O(n \log n))$ for every n-vertex graph H. However, their method does not work when the number of colours is larger than two. For k > 2, the strongest general upper bound on k-colour induced Ramsey numbers that can be found in the literature is due to Fox and Sudakov [22]. They showed that $R_{\mathrm{ind}}(H;k) \le \exp(C_k n^3)$ for every n-vertex H, where $C_k$ depends only on k. However, Fox (private communication) informed us that the methods of [22], which were optimised for sparse graphs H, may be used to prove that $R_{\mathrm{ind}}(H;k) \le \exp(Ckn^2 \log k)$. Our final application of the efficient container lemma is a short derivation of this bound.
Theorem 1.5. There exists a constant C such that, for every positive integer k and every n-vertex graph H, $R_{\mathrm{ind}}(H;k) \le \exp(Ckn^2 \log k)$.
Finally, let us mention that Conlon, Dellamonica Jr., La Fleur, Rödl, and Schacht [12] used the original container theorems to prove strong bounds on the induced Ramsey numbers of uniform hypergraphs.
1.4. Packaged statement. Each of the four illustrations of Theorem 1.1 presented in this paper requires iterative or recursive applications of the theorem. To avoid repeating similar, routine arguments and calculations several times, we will work with the following 'packaged' version of the theorem, which is analogous to [6, Theorem 2.2] and [42, Corollary 3.6].
Theorem 1.6. Let s be a positive integer and let H be a nonempty s-uniform hypergraph. Suppose that α, β, q ∈ (0, 1) and E ≥ v(H) are such that $\alpha\beta q \cdot v(H) \ge 10^9 s^7$ and $10^4 s^5 q \le \beta$ and, for every t ∈ {2, . . . , s}, [...] Then there is a family C ⊆ P(V(H)) of at most $\exp\bigl(10^4 s^5 \beta^{-1} \log(e/\alpha) \cdot q \log(e/q) \cdot v(H)\bigr)$ sets such that: [...]
The derivation of Theorem 1.6 from Theorem 1.1 is presented in Section 2.4.
1.5. Organisation of the paper. The remainder of this paper is organised as follows. In Section 2, we introduce the crucial concept of degree measures, state our main technical result, Theorem 2.1, and derive Theorems 1.1 and 1.6 from it. Section 3 establishes key properties of degree measures. These properties are used in Section 4, which contains the proof of Theorem 2.1. Section 5 states two probabilistic inequalities needed for the four applications of our new container lemma. Finally, Section 6 contains the proof of Theorem 1.2, Section 7 the proof of Theorem 1.3, and Section 8 the proofs of Theorems 1.4 and 1.5.
1.6. Acknowledgement. First of all, we are indebted to Rob Morris, David Saxton, and Andrew Thomason for sharing their numerous insights about the container theorems that had a strong bearing on this work. The notion of degree measures, which is central to our approach here, as well as the important idea of allowing hypergraphs to have multiple edges were first introduced by Andrew Thomason and David Saxton [42]. Additionally, we would like to thank Noga Alon for his comments and suggestions regarding lower bounds on ε-nets. The second named author thanks Jacob Fox, Frank Mousset, and Bhargav Narayanan for inspiring discussions about upper-bounding induced Ramsey numbers. Last but not least, the second named author owes his deepest gratitude to Lev Buhovski for an inspiring discussion about high-dimensional convex geometry that laid foundations for Lemma 4.12, which lies at the very heart of the proof of Theorem 2.1.

A word of motivation.
The key idea behind the proof of the container lemma due to Morris and the authors [6] is the following. Given an (r + 1)-uniform hypergraph H and an independent set I ∈ I(H), we consider a sequence of vertices of H for inclusion in a small 'signature' set S. We then construct an r-uniform hypergraph G from the neighbourhoods (link hypergraphs) of those considered vertices that belong to I. Crucially, each element of this sequence may depend only on the intersection of I with the set of its predecessors; this guarantees that G depends solely on S. Since G comprises only neighbourhoods of vertices in I, we have I ∈ I(G). This facilitates induction on the uniformity of the hypergraph.
Whereas there is essentially one way to define containers for independent sets in a 1-uniform hypergraph, the general description of the inductive step given above leaves plenty of room for manoeuvre. Roughly speaking, the approach taken in [6] was to cap the degrees of all vertices of G at some predefined value ∆ and, at the same time, to make sure that e(G) ≥ β∆v(G) for some constant β. This way, the ratio of the maximum and the average degrees of the constructed hypergraph G remained bounded by a constant. The advantage of this approach was its relative simplicity. However, this simplicity came at a price: the gap between the maximum and the average degrees was forced to grow by a factor of at least r + 1 at each step of the induction (reducing the uniformity from r + 1 to r). As a result, the crucial parameter δ in the basic container lemma could not exceed 1/s!, where s is the uniformity of the original hypergraph. (For readers who are somewhat familiar with the proof in [6], the essence of this shortcoming was the following. A single vertex of degree ∆ in G forced us to remove an edge from H. However, while counting the edges of G that contain some vertex of degree ∆, we accounted for the possibility that every edge of G contains r vertices of degree ∆.)
Here, we use a similar high-level inductive strategy. However, we take a refined approach to choosing a sequence of vertices of the (r + 1)-uniform H while constructing the r-uniform G, and this yields a much more favourable dependence of the parameter δ on the uniformity s. The key new idea is to abandon the wish to control the maximum degree of G and to focus instead on the ℓ2-norm of its degree sequence. In other words, we measure hypergraphs with the ℓ2-norms, rather than the ℓ∞-norms, of their degree sequences. Viewing hypergraphs as vectors in high-dimensional Euclidean spaces allows us to reduce the problem of constructing a sequence of vertices to be considered for inclusion in the 'signature' to an elementary problem in convex geometry.

Degree measures.
We begin by extending the notion of the degree measure of a hypergraph, which was introduced by Saxton and Thomason [42]. For a nonempty r-uniform hypergraph H with vertex set V and a t ∈ [r], we define the t-degree measure of H, denoted by $\sigma^{(t)}_H$, to be the probability distribution on $\binom{V}{t}$, the family of all t-element subsets of V, given by
$\sigma^{(t)}_H(T) = \dfrac{\deg_H T}{\binom{r}{t} \cdot e(H)}$ for every $T \in \binom{V}{t}$.
In other words, $\sigma^{(t)}_H$ is the probability distribution induced by the following random process. Select an edge A of H uniformly at random and output a t-element subset T ⊆ A chosen uniformly at random from $\binom{A}{t}$. Throughout this paper, we identify (as we already did in the above definition) the measure $\sigma^{(t)}_H$ with its density with respect to the counting measure. We view this density as an element of the $\binom{|V|}{t}$-dimensional vector space of ℝ-valued functions on $\binom{V}{t}$. Since the 1-degree measure will be particularly important, we refer to it simply as the degree measure and often suppress the superscript (1) from the notation, denoting it by $\sigma_H$. Given a positive integer d and a vector $\xi = (\xi_1, \dots, \xi_d) \in \mathbb{R}^d$, we denote by $\|\xi\|$ its ℓ2-norm, so that $\|\xi\| = (\xi_1^2 + \dots + \xi_d^2)^{1/2}$.
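The definition translates directly into code. The following is a minimal Python sketch under our own assumptions: a hypergraph is stored as a list of frozensets, repeated entries encode multiplicities, and the helper names are ours. Exact rational arithmetic keeps the measures honest probability vectors.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def degree_measure(edges, t):
    """t-degree measure of a nonempty r-uniform multi-hypergraph.

    `edges` is a list of frozensets (repeats encode multiplicities). Returns a sparse dict
    mapping each t-set T with deg(T) > 0 to deg(T) / (C(r, t) * e(H)).
    """
    r = len(edges[0])
    deg = Counter(frozenset(T) for A in edges for T in combinations(sorted(A), t))
    total = comb(r, t) * len(edges)
    return {T: Fraction(d, total) for T, d in deg.items()}

def norm_sq(sigma):
    """Squared l2-norm of a measure stored as a sparse dict."""
    return sum(x * x for x in sigma.values())

H = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 2, 3})]  # {1,2,3} has multiplicity 2
sigma1 = degree_measure(H, 1)
print(sigma1[frozenset({1})], sum(sigma1.values()), norm_sq(sigma1))  # 1/3 1 23/81
```

Storing only the t-sets with positive degree keeps the representation sparse. Absent keys stand for zero coordinates of the $\binom{|V|}{t}$-dimensional vector.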

2.3. The main technical result. We are now ready to state the main technical result of this paper, Theorem 2.1 below. We postpone the proof of the theorem to Section 4; the proof uses several simple properties of degree measures that are derived in Section 3.
Theorem 2.1. Let s ∈ ℕ and suppose that a nonempty s-uniform hypergraph H and positive reals p and δ satisfy
[...] (2)
Then there exist a family $\mathcal{S} \subseteq \binom{V(H)}{\le 30 s^2 p \cdot v(H)}$ and functions f : S → P(V(H)) and g : I(H) → S such that, for every I ∈ I(H), [...] and |f(g(I))| ≤ (1 − δ) · v(H).
2.4. The simple and packaged versions. In this section, we derive Theorems 1.1 and 1.6 from our main technical result, Theorem 2.1 above. We start with a short proof of Theorem 1.1.
Derivation of Theorem 1.1 from Theorem 2.1. Let $p = q/(30s^2)$ and let $\delta = (10^3 s^4 K)^{-1}$. It suffices to verify that H, p, and δ satisfy the assumptions of Theorem 2.1, which will give us the claimed family S and functions g and f. To this end, note that [...] for every t ∈ [s], and thus the assumptions of the theorem imply that [...] Moreover, $300 s^4 \cdot 2K \le 1/\delta$ and $p \cdot \delta \cdot v(H) = q \cdot (30 s^2 \cdot 10^3 s^4 K)^{-1} \cdot v(H) \ge 500$.
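The arithmetic in this derivation can be double-checked with exact rational arithmetic. The Python sketch below takes v(H) as small as the hypothesis $q \cdot v(H) \ge 10^8 s^6 K$ of Theorem 1.1 allows; the function names are ours. Note that $30 s^2 p = q$, so the sets in the family produced by Theorem 2.1 have at most q · v(H) elements. With this choice of v(H), the product p · δ · v(H) is exactly 10^4/3.

```python
from fractions import Fraction

def derived_parameters(s, K, q):
    """p = q / (30 s^2) and delta = (10^3 s^4 K)^(-1), with exact rational arithmetic."""
    p = Fraction(q) / (30 * s**2)
    delta = 1 / (1000 * s**4 * Fraction(K))
    return p, delta

def check_derivation(s, K, q):
    """Check the three numerical claims, taking v(H) as small as q * v(H) >= 10^8 s^6 K allows."""
    p, delta = derived_parameters(s, K, q)
    v = 10**8 * s**6 * Fraction(K) / Fraction(q)
    return (
        30 * s**2 * p == Fraction(q),               # signatures have at most q * v(H) elements
        300 * s**4 * 2 * Fraction(K) <= 1 / delta,  # 600 s^4 K <= 1000 s^4 K
        p * delta * v >= 500,                       # in fact p * delta * v == 10^4 / 3
    )

print(check_derivation(3, Fraction(5, 2), Fraction(1, 100)))  # (True, True, True)
```

Larger values of v(H) only increase p · δ · v(H), so checking the smallest admissible value suffices.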
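We now turn to the proof of Theorem 1.6. The key ingredient here is the following lemma. Roughly speaking, it states that a hypergraph that is 'robustly dense' contains a large subhypergraph whose maximum degree is not much larger than its average degree. The statement and the proof of the lemma are inspired by the work of Morris and Saxton [32].
Lemma. Let H be an s-uniform hypergraph, let β ∈ (0, 1) and M > 0, and suppose that e(H − X) ≥ M for every X ⊆ V(H) with |X| ≤ βv(H). Then H has a subhypergraph H′ with e(H′) ≥ M and $\Delta_1(H') \le \lceil sM/(\beta v(H)) \rceil$.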
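Proof. Let H′ be a largest (in terms of the number of edges) subhypergraph of H satisfying $\Delta_1(H') \le \lceil sM/(\beta v(H)) \rceil$, and let X ⊆ V(H) be the set of vertices of H′ whose degree attains the bound $\lceil sM/(\beta v(H)) \rceil$. Observe that every edge of H that is disjoint from X must belong to H′ and, consequently, e(H′) ≥ e(H − X). If |X| ≤ βv(H), then e(H′) ≥ M by our assumption on H. Otherwise, |X| > βv(H). Since each edge of H′ contains at most s vertices of X, we then have $e(H') \ge \frac{1}{s}\sum_{v \in X} \deg_{H'} v = \frac{|X|}{s}\bigl\lceil \frac{sM}{\beta v(H)} \bigr\rceil > M$. This completes the proof of the lemma.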
Proof of Theorem 1.6. We shall say that a set C ⊆ V (H) is a good container if either |C| αv(H) or if there is a subset W ⊆ C with |W | (1−β)|C| such that e(H[W ]) < E. We will construct a rooted tree T whose vertices are subsets of V (H) that has the following properties: (i) The root of T is V (H).
(ii) If an independent set I ∈ I(H) is contained in a non-leaf vertex of T, then I is contained in some child of this vertex in T.
(iii) Every leaf of T is a good container.
(iv) Every non-leaf vertex of T has at most $(e/q)^{q \cdot v(H)}$ children.
(v) The height of T is at most $10^4 s^5 \beta^{-1} \log(e/\alpha)$.
The set of leaves of T will then form a collection of containers for the independent sets of H that has the desired properties.
We build such a tree starting from the root V(H). We repeatedly take a leaf C of T that is not yet a good container and apply Theorem 1.1 to (a carefully chosen subhypergraph of) the subhypergraph of H induced by C. The resulting family of containers for the independent sets of H[C] (that is, the independent sets of H that are contained in C) is then attached as the children of C, which as a result ceases to be a leaf of T. We stop when no such leaves are left. This way, properties (i)-(iii) are clearly satisfied. However, we still need to show that the final tree has properties (iv) and (v). To [...] where the second inequality follows from our assumption that E ≥ v(H) ≥ |C|. Since, for every t ∈ {2, . . . , s}, [...], establishing (iv). Finally, for every D ∈ C, [...] since we assumed that $10^4 s^5 q \le \beta$. In particular, if a set C is a non-leaf vertex of the final tree T that lies at distance d from the root, then [...] This implies that $\mathrm{height}(T) \le \frac{10^4 s^5}{\beta} \cdot \log\frac{1}{\alpha} + 1 \le \frac{10^4 s^5}{\beta} \cdot \log\frac{e}{\alpha}$, establishing (v).
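Property (iv) presumably rests on the standard estimate that an n-element set has at most $(e/q)^{qn}$ subsets with at most qn elements. Combined with the height bound (v), it shows that the number of leaves is at most $\exp(10^4 s^5 \beta^{-1}\log(e/\alpha) \cdot q\log(e/q) \cdot v(H))$, which matches the count in Theorem 1.6. The Python sketch below is a numerical check of that estimate (in logarithms, to avoid overflow); the helper names are ours.

```python
from math import comb, e, floor, log

def log_num_small_sets(n, q):
    """Natural log of the number of subsets of an n-element set with at most q*n elements."""
    m = floor(q * n)
    return log(sum(comb(n, i) for i in range(m + 1)))

def log_children_bound(n, q):
    """Natural log of (e/q)^(q n)."""
    return q * n * log(e / q)

for n in (10, 50, 200):
    for q in (0.01, 0.05, 0.1, 0.3, 0.5):
        assert log_num_small_sets(n, q) <= log_children_bound(n, q) + 1e-9
```

The estimate follows from $\sum_{i \le m}\binom{n}{i} \le (en/m)^m$, together with the fact that $x \mapsto (en/x)^x$ is nondecreasing for $0 < x \le n$.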

Properties of degree measures
3.1. Norms of degree measures. As we shall be estimating the ℓ2-norms of the t-degree measures of various uniform hypergraphs, we collect here several useful properties of this quantity. We first give general lower and upper bounds on the ℓ2-norm of the t-degree measure of a hypergraph in terms of the numbers of its vertices and edges and its maximum t-degree. Throughout this section, r is a positive integer. We stress here that all of our hypergraphs are allowed to have multiple edges, that is, every edge can have an arbitrary positive multiplicity. (This idea was first introduced by Saxton and Thomason [42].) Moreover, when computing deg_H and e(H), we always count edges with multiplicities.
Fact 3.1. Suppose that H is a nonempty r-uniform hypergraph. For every t ∈ [r], [...]
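Proof. The upper bound is straightforward: $\|\sigma^{(t)}_H\|^2 = \sum_T \sigma^{(t)}_H(T)^2 \le \max_T \sigma^{(t)}_H(T) \cdot \sum_T \sigma^{(t)}_H(T) = \frac{\Delta_t(H)}{\binom{r}{t} \cdot e(H)}$.
For the lower bound, let [...] It follows from the Cauchy-Schwarz inequality that [...], implying the lower bound.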
Our second observation states that the ℓ2-norm of a t-degree measure of a hypergraph cannot increase much when one deletes a small proportion of its edges.

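Fact 3.2. If H′ is a nonempty subhypergraph of an r-uniform hypergraph H, then, for every t ∈ [r], $\|\sigma^{(t)}_{H'}\| \le \frac{e(H)}{e(H')} \cdot \|\sigma^{(t)}_H\|$.
Proof. The assertion follows simply because $\deg_{H'} T \le \deg_H T$ for every T ⊆ V(H′), and hence $\sigma^{(t)}_{H'}(T) = \frac{\deg_{H'} T}{\binom{r}{t} e(H')} \le \frac{e(H)}{e(H')} \cdot \sigma^{(t)}_H(T)$.
Our final lemma relates the ℓ2-norm of the degree measure of a uniform hypergraph to a simple property of its edge distribution.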
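Since $\deg_{H'} T \le \deg_H T$, we get the pointwise bound $\sigma^{(t)}_{H'}(T) \le \frac{e(H)}{e(H')}\sigma^{(t)}_H(T)$, and hence the same factor bounds the ratio of the ℓ2-norms. The Python sketch below checks both inequalities on random examples. It uses our own list-of-frozensets representation and helper names, and it compares squared norms so that the arithmetic stays exact.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def degree_measure(edges, t):
    """t-degree measure deg(T) / (C(r, t) e(H)) of a nonempty r-uniform multi-hypergraph."""
    r = len(edges[0])
    deg = Counter(frozenset(T) for A in edges for T in combinations(sorted(A), t))
    return {T: Fraction(d, comb(r, t) * len(edges)) for T, d in deg.items()}

def norm_sq(sigma):
    return sum(x * x for x in sigma.values())

def check_fact_3_2(seed, r=4, n=12, m=40, m_sub=15):
    """Compare sigma^(t) of a random r-uniform H with that of H' made of m_sub of its edges."""
    rng = random.Random(seed)
    H = [frozenset(rng.sample(range(n), r)) for _ in range(m)]
    H_sub = rng.sample(H, m_sub)              # H' keeps m_sub of the m edges
    ratio = Fraction(len(H), len(H_sub))      # e(H) / e(H')
    for t in range(1, r + 1):
        sig, sig_sub = degree_measure(H, t), degree_measure(H_sub, t)
        pointwise = all(x <= ratio * sig[T] for T, x in sig_sub.items())
        in_norm = norm_sq(sig_sub) <= ratio**2 * norm_sq(sig)
        if not (pointwise and in_norm):
            return False
    return True

print(all(check_fact_3_2(seed) for seed in range(10)))  # True
```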
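Proof. Let $\mathbb{1}_D \in \mathbb{R}^{V(H)}$ be the characteristic vector of D. It follows from the Cauchy-Schwarz inequality that $\langle \sigma_H, \mathbb{1}_D \rangle \le \|\sigma_H\| \cdot \|\mathbb{1}_D\| = \|\sigma_H\| \cdot |D|^{1/2}$. Since at least an ε-proportion of the edges of H contain at least one vertex of D, we have $\langle \sigma_H, \mathbb{1}_D \rangle = \sum_{v \in D} \frac{\deg_H v}{r \cdot e(H)} \ge \frac{\varepsilon}{r}$, giving the desired lower bound on |D|.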
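For an r-uniform H and a set D that meets an ε-proportion of the edges, the Cauchy-Schwarz argument above yields $|D| \ge \varepsilon^2 / (r^2 \|\sigma_H\|^2)$. We assume this is the shape of the lower bound in the lemma. The Python sketch below checks this inequality on random hypergraphs and random sets D; the representation and helper names are ours.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def degree_measure(edges, t):
    """t-degree measure deg(T) / (C(r, t) e(H)) of a nonempty r-uniform multi-hypergraph."""
    r = len(edges[0])
    deg = Counter(frozenset(T) for A in edges for T in combinations(sorted(A), t))
    return {T: Fraction(d, comb(r, t) * len(edges)) for T, d in deg.items()}

def norm_sq(sigma):
    return sum(x * x for x in sigma.values())

def check_cover_bound(seed, r=3, n=15, m=30):
    """Check |D| >= eps^2 / (r^2 ||sigma_H||^2), where eps is the share of edges meeting D."""
    rng = random.Random(seed)
    H = [frozenset(rng.sample(range(n), r)) for _ in range(m)]
    D = set(rng.sample(range(n), rng.randint(1, n)))
    eps = Fraction(sum(1 for A in H if A & D), len(H))
    bound = eps**2 / (r**2 * norm_sq(degree_measure(H, 1)))
    return len(D) >= bound

print(all(check_cover_bound(seed) for seed in range(50)))  # True
```

When D meets every edge (ε = 1), the bound reads $|D| \ge 1/(r^2\|\sigma_H\|^2)$. In this form a small ℓ2-norm forces every vertex cover to be large.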

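3.2. Degree measures and link hypergraphs. Suppose that H is an (r + 1)-uniform hypergraph with vertex set V. For each v ∈ V, let $H_v$ denote the link hypergraph of v, that is, the r-uniform hypergraph with vertex set V whose edges are all the r-element sets A such that {v} ∪ A is an edge of H (with the same multiplicity). A property of crucial importance for us is that, for each t ∈ [r], the t-degree measure of H is a convex combination of the t-degree measures of the link hypergraphs of its vertices. Moreover, all of these convex combinations have the same coefficients, namely the coordinates of the 1-degree measure vector $\sigma_H$.
Fact 3.4. Suppose that H is a nonempty (r + 1)-uniform hypergraph with vertex set V. Then, for every t ∈ [r], $\sigma^{(t)}_H = \sum_{v \in V} \sigma_H(v) \cdot \sigma^{(t)}_{H_v}$.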

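Remark. Even though $\sigma^{(t)}_H$ was defined only for nonempty hypergraphs H, for the sake of brevity we shall often write $0 \cdot \sigma^{(t)}_H$ also when H is empty, interpreting this product as the zero vector.
Proof. It follows from our definition of a link hypergraph that $e(H_v) = \deg_H v$ for each v ∈ V and, more generally, that $\sum_{v \in V} \deg_{H_v} T = (r + 1 - t) \cdot \deg_H T$ for every $T \in \binom{V}{t}$. Consequently, for every such T,
$\sum_{v \in V} \sigma_H(v)\, \sigma^{(t)}_{H_v}(T) = \sum_{v \in V} \frac{\deg_H v}{(r+1)\, e(H)} \cdot \frac{\deg_{H_v} T}{\binom{r}{t} \deg_H v} = \frac{(r+1-t) \deg_H T}{(r+1)\binom{r}{t}\, e(H)} = \frac{\deg_H T}{\binom{r+1}{t}\, e(H)} = \sigma^{(t)}_H(T)$,
where we used the identity $(r+1)\binom{r}{t} = \binom{r+1}{t}(r+1-t)$.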
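The convex-combination identity $\sigma^{(t)}_H = \sum_v \sigma_H(v)\,\sigma^{(t)}_{H_v}$ for an (r + 1)-uniform H can be checked exactly on random multi-hypergraphs. In this Python sketch, the representation and the helper names `degree_measure` and `link` are ours. Vertices of degree zero get weight $\sigma_H(v) = 0$ and are simply skipped, in the spirit of the remark above.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def degree_measure(edges, t):
    """t-degree measure deg(T) / (C(r, t) e(H)) of a nonempty r-uniform multi-hypergraph."""
    r = len(edges[0])
    deg = Counter(frozenset(T) for A in edges for T in combinations(sorted(A), t))
    return {T: Fraction(d, comb(r, t) * len(edges)) for T, d in deg.items()}

def link(edges, v):
    """Link hypergraph H_v: the sets A - {v} over the edges A containing v, with multiplicity."""
    return [A - {v} for A in edges if v in A]

def check_fact_3_4(seed, r=3, n=9, m=25):
    """Check sigma^(t)_H == sum_v sigma_H(v) sigma^(t)_{H_v} for a random (r+1)-uniform H."""
    rng = random.Random(seed)
    H = [frozenset(rng.sample(range(n), r + 1)) for _ in range(m)]
    sigma = degree_measure(H, 1)  # only vertices of positive degree appear
    for t in range(1, r + 1):
        combo = Counter()
        for S, weight in sigma.items():
            v = next(iter(S))
            for T, x in degree_measure(link(H, v), t).items():
                combo[T] += weight * x
        if dict(combo) != degree_measure(H, t):
            return False
    return True

print(all(check_fact_3_4(seed) for seed in range(10)))  # True
```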
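In our arguments, we shall employ the following relation between the ℓ2-norm of the (t + 1)-degree measure of a hypergraph and the ℓ2-norms of the t-degree measures of the link hypergraphs of its vertices.
Fact 3.5. Suppose that H is a nonempty (r + 1)-uniform hypergraph with vertex set V. Then, for every t ∈ [r], $\frac{1}{t+1}\|\sigma^{(t+1)}_H\|^2 = \sum_{v \in V} \sigma_H(v)^2 \cdot \|\sigma^{(t)}_{H_v}\|^2$.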
Proof. A quick way to verify the claimed identity is to observe that both the left-hand and the right-hand sides express the probability of obtaining the same outcome in two independent executions of the following random process. Pick an edge A of H uniformly at random, choose a (t + 1)-element subset S of A uniformly at random, mark a vertex v ∈ S chosen uniformly at random, and return the pair (v, S). More explicitly, using the identities deg [...]
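Written out, the collision probability of the pair (v, S) in the process above equals $\|\sigma^{(t+1)}_H\|^2/(t+1)$, because each (t + 1)-set S is returned with each of its t + 1 marked vertices equally often. It also equals $\sum_v \sigma_H(v)^2\|\sigma^{(t)}_{H_v}\|^2$, because the marked vertex v is distributed as $\sigma_H$ and, given v, the set S \ {v} is distributed as $\sigma^{(t)}_{H_v}$. The Python sketch below checks this identity exactly on random (r + 1)-uniform multi-hypergraphs; the helpers are ours.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def degree_measure(edges, t):
    """t-degree measure deg(T) / (C(r, t) e(H)) of a nonempty r-uniform multi-hypergraph."""
    r = len(edges[0])
    deg = Counter(frozenset(T) for A in edges for T in combinations(sorted(A), t))
    return {T: Fraction(d, comb(r, t) * len(edges)) for T, d in deg.items()}

def norm_sq(sigma):
    return sum(x * x for x in sigma.values())

def link(edges, v):
    """Link hypergraph H_v, with multiplicities."""
    return [A - {v} for A in edges if v in A]

def check_fact_3_5(seed, r=3, n=9, m=25):
    """Check ||sigma^(t+1)_H||^2 / (t+1) == sum_v sigma_H(v)^2 ||sigma^(t)_{H_v}||^2."""
    rng = random.Random(seed)
    H = [frozenset(rng.sample(range(n), r + 1)) for _ in range(m)]
    sigma = degree_measure(H, 1)
    for t in range(1, r + 1):
        lhs = norm_sq(degree_measure(H, t + 1)) / (t + 1)
        rhs = sum(w**2 * norm_sq(degree_measure(link(H, next(iter(S))), t))
                  for S, w in sigma.items())
        if lhs != rhs:
            return False
    return True

print(all(check_fact_3_5(seed) for seed in range(10)))  # True
```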

3.3. Linear combinations of degree measures. It will be convenient to introduce another piece of notation. Given a vector $\alpha \in \mathbb{R}^r$ with nonnegative coordinates and a nonempty hypergraph K with uniformity at least r, we define [...] so that [...] The following generalisation of Fact 3.4 holds.
Fact 3.6. Suppose that K is a nonempty hypergraph with uniformity at least r + 1. For every $\alpha \in \mathbb{R}^r$ with nonnegative coordinates, [...]

4.1. Outline. Our proof of Theorem 2.1 follows the general strategy of [6]. We construct functions g, f* : I(H) → P(V(H)) that satisfy the following three conditions for every I ∈ I(H): [...] The existence of such functions easily implies the assertion of the theorem. Indeed, condition (c) guarantees that there is an implicit decomposition f* = f ∘ g.
Given an independent set I, the sets g(I) and f*(I) are constructed by an algorithm that operates in a sequence of at most s − 1 rounds, indexed by r = s − 1, . . . , 1. At the start of round r, the algorithm receives as input an (r + 1)-uniform hypergraph $H^{(r+1)}$ satisfying $I \in \mathcal{I}(H^{(r+1)})$; the first round, indexed by r = s − 1, is initialised with $H^{(s)} = H$. During the round, the algorithm tries to construct an r-uniform hypergraph $H^{(r)}$ with $I \in \mathcal{I}(H^{(r)})$ and some additional desirable properties. The definition of 'desirable' is where we depart significantly from the previous approaches. In [6], as well as in [42], this property was defined in terms of a lower bound on the number of edges of $H^{(r)}$ and upper bounds on the maximum degrees $\Delta_t(H^{(r)})$, for all t ∈ [r]. Here, we aim to control the ℓ2-norms of the t-degree measures of $H^{(r)}$. More precisely, 'desirable' means that a carefully chosen linear combination of the ℓ2-norms of $\sigma^{(t)}_{H^{(r)}}$, where t ranges over [r], is small. As in [6], if such a hypergraph $H^{(r)}$ cannot be constructed for our input set I, the algorithm is able to define the required set f*(I) already at the end of round r. Crucially, the amount of information about the set I that is needed to describe $H^{(r)}$, or the set f*(I), is rather small. More precisely, it naturally corresponds to a set of O(p · s · v(H)) elements of I, which we denote here by $S^{(r)}$. In particular, let g(I) be the union of the sets $S^{(r)}$ over all (at most s − 1) rounds of the algorithm. Then the knowledge of g(I) alone, without any additional knowledge of the set I other than the fact that it is independent, suffices to recreate the entire execution of the algorithm, and thus also the final set f*(I).
In order to construct the r-uniform $H^{(r)}$ given the (r + 1)-uniform $H^{(r+1)}$ and the set I, our algorithm makes a sequence of queries 'Does v belong to I?' for some carefully chosen sequence of vertices v ∈ V(H). We record all the positive answers by placing the respective vertices v in the (initially empty) set $S^{(r)}$. Each queried vertex v is clearly not in the set $I \setminus S^{(r)}$, and hence it may be omitted from the set f*(I). In particular, the algorithm will produce the desired set f*(I) if at least δ · v(H) queries are made. Whenever a queried vertex v does belong to I, we add its r-uniform link hypergraph $H^{(r+1)}_v$ to the (initially empty) hypergraph $H^{(r)}$. Note that this guarantees that $I \in \mathcal{I}(H^{(r)})$.
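The query mechanism described above can be illustrated with a small Python sketch. The vertex-selection rule used here (the largest degree in the current G, which plays the role of $H^{(r)}$) is a hypothetical stand-in chosen only for illustration; it is not the refined rule from the paper. What the sketch shows holds for any rule that depends on I only through the answers received so far. The whole round can be replayed from the signature S alone, I stays independent in G, and queried vertices outside S can be dropped from f*(I). The function names are ours.

```python
import random
from collections import Counter

def run_round(H, vertices, in_I, budget):
    """One hypothetical query round; `in_I(v)` answers 'Does v belong to I?'.

    Returns (queried, S, G): the query order, the positively answered vertices S,
    and the multiset G of link edges A - {v} of those vertices.
    """
    queried, S, G = [], [], []
    for _ in range(budget):
        remaining = [v for v in vertices if v not in queried]
        if not remaining:
            break
        deg_G = Counter(u for A in G for u in A)
        # Illustrative rule only: query the unqueried vertex of largest degree in G
        # (ties broken by label); it depends on I only through earlier answers.
        v = max(remaining, key=lambda u: (deg_G[u], -u))
        queried.append(v)
        if in_I(v):
            S.append(v)
            G.extend(A - {v} for A in H if v in A)
    return queried, S, G

def is_independent(I, edges):
    """True if no edge is contained in I."""
    return not any(A <= I for A in edges)

# Usage: a random 3-uniform hypergraph and a greedily built independent set I.
rng = random.Random(1)
V = list(range(12))
H = [frozenset(rng.sample(V, 3)) for _ in range(25)]
I = set()
for v in V:
    if is_independent(I | {v}, H):
        I.add(v)
queried, S, G = run_round(H, V, lambda v: v in I, budget=8)
S_set = set(S)
assert run_round(H, V, lambda v: v in S_set, budget=8) == (queried, S, G)  # S alone replays the round
assert is_independent(I, G)                     # I is independent in G
assert I.isdisjoint(set(queried) - S_set)       # queried non-members may leave f*(I)
```

Replaying with the answers 'v ∈ S' reproduces the round: every queried vertex lies in S exactly when it lies in I, so each answer, and therefore each subsequent choice, is the same.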
Recall that our aim is to produce a hypergraph $H^{(r)}$ whose t-degree measures have small ℓ2-norms. The crux of the matter is the choice of the next vertex v to be queried for membership in I. Indeed, if v happens to belong to I, then $H^{(r+1)}_v$ will be added to $H^{(r)}$, and as a result of this operation the t-degree measures of $H^{(r)}$ will change. The new $\sigma^{(t)}_{H^{(r)}}$ will be a convex combination of the old $\sigma^{(t)}_{H^{(r)}}$ and $\sigma^{(t)}_{H^{(r+1)}_v}$, with appropriate coefficients (recall that our hypergraphs are allowed to have multiple edges). It turns out that choosing the 'right' candidate vertex v is an optimisation problem that admits a rather simple geometric description. The solution to this geometric problem lies at the heart of our argument. It is presented as Lemma 4.12 and expressed in the language of degree measures of hypergraphs in Proposition 4.11.
The bottom line is that there is a way to choose a sequence of vertices to be queried for membership in I with the following property. If at least Ω(p · s · v(H)) of the first δ · v(H) queried vertices belong to the set I, then some linear combination of the ℓ2-norms of $\sigma^{(t)}_{H^{(r)}}$, where t ranges over [r], will be at most 1 + O(1/s) times larger than a corresponding linear combination of the ℓ2-norms of $\sigma^{(t)}_{H^{(r+1)}}$, where t ranges over [r + 1]. Consequently, either one of the s − 1 rounds of the algorithm will output a desired set f*(I) of size at most (1 − δ) · v(H), or the algorithm will eventually produce a 1-uniform hypergraph $H^{(1)}$ such that $I \in \mathcal{I}(H^{(1)})$ and [...] In the latter case, we may simply let f*(I) comprise all vertices v such that $\{v\} \notin H^{(1)}$. As shown in Lemma 3.3, the upper bound on $\|\sigma_{H^{(1)}}\|$ implies that there are at most (1 − δ) · v(H) such vertices.

4.2. The key lemma. The following lemma summarises a single round of our new, refined algorithm for constructing containers. We denote by $\mathrm{Hyp}_r(V)$ the family of r-uniform hypergraphs with vertex set V; recall again that we allow our hypergraphs to have multiple edges.
Lemma 4.1. [...]
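Let $\alpha \in \mathbb{R}^r$ be a vector with nonnegative coordinates and define the vector $\alpha^* \in \mathbb{R}^{r+1}$ by (3). Then, there exist (disjoint) families $\mathcal{S}', \mathcal{S}'' \subseteq \binom{V}{\le b}$ and functions $S \colon \mathcal{I}(G) \to \mathcal{S}' \cup \mathcal{S}''$, $C \colon \mathcal{S}' \to \mathcal{P}(V)$, and $F \colon \mathcal{S}'' \to \mathrm{Hyp}_r(V)$ such that, for every $I \in \mathcal{I}(G)$, we have $S_I \subseteq I$. Moreover: (2) If $S_I \in \mathcal{S}''$, then $I \in \mathcal{I}(F(S_I))$ and $\|\sigma_\alpha(F(S_I))\| \le \|\sigma_{\alpha^*}(G)\|$.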
Finally, if $S_I \subseteq I'$ and $S_{I'} \subseteq I$ for some $I, I' \in \mathcal{I}(G)$, then $S_I = S_{I'}$.
Before we embark on the proof of Lemma 4.1, we shall first show, in the next subsection, how it implies Theorem 2.1. The remainder of this section, Subsections 4.4 to 4.9, will be devoted to the proof of the lemma.

4.3. Derivation of Theorem 2.1. Let $H$ be a nonempty $s$-uniform hypergraph with vertex set $V$ and suppose that $\delta, p \in (0,1)$ satisfy the main assumption (2) of Theorem 2.1. Define $\varepsilon = \frac{1}{10s}$ and $\Gamma = \frac{50s}{\varepsilon^2}$ and let $\alpha^{(1)} \in \mathbb{R}^1, \dotsc, \alpha^{(s)} \in \mathbb{R}^s$ be vectors defined by for every $r \in [s]$ and each $t \in [r]$. Given an independent set $I$ of $H$, we construct the sets $g(I)$ and $f^*(I)$ using the following procedure.
(C4) Otherwise, if $S_I \in \mathcal{S}''$, we let $H^{(r)} \leftarrow F(S_I)$ and CONTINUE.
If STOP has not been called, then $r = 1$ and $H^{(1)}$ has been defined. Let $g(I) = S^{(s-1)} \cup \dotsb \cup S^{(1)}$ and $f^*(I) = \{v \in V : \{v\} \notin H^{(1)}\}$. In the remainder of this section, we shall show that, for every $I \in \mathcal{I}(H)$, the above procedure indeed constructs sets $g(I)$ and $f^*(I)$ that have the desired properties. In particular, we shall show that $f^*$ decomposes as $f^* = f \circ g$.
Claim 4.2. For each $r \in [s]$, the hypergraph $H^{(r)}$, if it was defined, satisfies:

The proof of Claim 4.2 requires the following simple fact, which justifies our definition of the vectors $\alpha^{(1)}, \dotsc, \alpha^{(s)}$.

Fact 4.3. Suppose that $r \in [s-1]$. Let $\alpha \in \mathbb{R}^r$ and let $\alpha^* \in \mathbb{R}^{r+1}$ be defined as in (3). If $\alpha = \alpha^{(r)}$, then $\alpha^*_t \le \alpha^{(r+1)}_t$ for all $t \in [r+1]$.
Proof. Note first that and, since $50(r+1) \le 50s = \Gamma\varepsilon^2$, Finally, if $1 < t \le r$, then

⁶ In order to do so, we have to make sure that $\sigma_{H^{(r+1)}} \le \frac{\varepsilon^3 p}{50(r+1)}$. In the analysis of the procedure, below, we will verify that this is always the case.
where the second to last equality is Pascal's formula.
Proof of Claim 4.2. We prove the claim by induction on $s - r$. The basis of the induction is the case $r = s$. Property (i) is satisfied, as $H^{(s)} = H$ and $I \in \mathcal{I}(H)$. In order to see that property (ii) holds as well, note first that the first inequality in the main assumption (2) of Theorem 2.1 gives where the second inequality holds as $\Gamma = 5000s^3$. The last inequality holds since $\varepsilon = 1/(10s)$ and, consequently, $(1+\varepsilon)^{10s} \le e^{10\varepsilon s} = e \le 3$. The second inequality in (2) implies that $\frac{1}{s} \cdot \delta \cdot |V| = 10\varepsilon\delta \cdot |V| \quad \frac{\varepsilon p}{50}$ and thus we may conclude that

Suppose now that $r \in [s-1]$ and that the hypergraph $H^{(r+1)}$ was defined. We first argue that we are allowed to invoke Lemma 4.1 in step (C1) above. Indeed, $\alpha^{(r+1)}_1 = (1+\varepsilon)^{10r} \ge 1$ and thus our inductive assumption implies that as needed (in order to apply Lemma 4.1 with $G \leftarrow H^{(r+1)}$). If the hypergraph $H^{(r)}$ was defined in step (C4), then Lemma 4.1 guarantees that where $\alpha^*$ is defined as in (3), with $\alpha \leftarrow \alpha^{(r)}$. However, Fact 4.3 states that $\alpha^* \le \alpha^{(r+1)}$ coordinate-wise and, therefore, we may conclude that as needed.
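The numerical facts about $\varepsilon$ and $\Gamma$ used in this proof and in the proof of Fact 4.3 are easy to check mechanically. The sketch below is ours. It recomputes $\Gamma$ exactly with rational arithmetic and checks $(1+\varepsilon)^{10s} \le e^{10\varepsilon s} = e \le 3$, which follows from $1 + x \le e^x$.

```python
import math
from fractions import Fraction


def parameters(s):
    """epsilon = 1/(10 s) and Gamma = 50 s / epsilon^2, computed exactly."""
    eps = Fraction(1, 10 * s)
    gamma = 50 * s / eps ** 2
    return eps, gamma


for s in range(2, 60):
    eps, gamma = parameters(s)
    assert gamma == 5000 * s ** 3             # Gamma = 5000 s^3
    assert 50 * s == gamma * eps ** 2         # 50 s = Gamma * eps^2, as used in Fact 4.3
    # (1 + eps)^(10 s) <= e^(10 eps s) = e <= 3, since 1 + x <= e^x
    assert (1 + float(eps)) ** (10 * s) <= math.exp(10 * float(eps) * s) <= 3
```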
We now verify that $g(I)$ and $f^*(I)$ have the desired properties and that $f^*$ decomposes as $f^* = f \circ g$. Since $g(I) = S^{(s-1)} \cup \dotsb \cup S^{(r)}$ for some $r \in [s-1]$, the fact that $g(I) \subseteq I$ follows from two observations. The first is the definitions of $S^{(s-1)}, \dotsc, S^{(r)}$, made in step (C2). The second is that the respective sets $S_I$ are all contained in $I$, as guaranteed by Lemma 4.1. Moreover, since each of these sets $S_I$ has at most $\lceil 2p|V|/\varepsilon \rceil$ elements, see Lemma 4.1, we have where the last inequality holds because the main assumption (2) and Fact 3.1 imply that $p \ge s^4 \cdot \|\sigma^{(1)}_H\|^2 \ge s^4/|V|$. The set $f^*(I)$ is defined either in step (C3), for some $r \in [s-1]$, or at the end of the procedure, if the 1-uniform hypergraph $H^{(1)}$ is constructed. In the former case, $f^*(I) = C(S_I)$ for functions $S$ and $C$ obtained from Lemma 4.1. Note that $I \setminus g(I) \subseteq$ and, on the other hand, Claim 4.2 states that In the latter case, $f^*(I) = \{v \in V : \{v\} \notin H^{(1)}\}$. In particular, we must have $I \subseteq f^*(I)$. Otherwise $I$ would not be an independent set in $H^{(1)}$, which would contradict property (i) in Claim 4.2. Property (ii) in Claim 4.2 allows us to conclude that

Finally, we show that $f^*$ decomposes as $f^* = f \circ g$. To this end, it suffices to show that if $g(I) = g(I')$ for some $I, I' \in \mathcal{I}(H)$, then $f^*(I) = f^*(I')$. In fact, we shall prove the following stronger statement.

Claim 4.4. If $I, I' \in \mathcal{I}(H)$ satisfy $g(I) \subseteq I'$ and $g(I') \subseteq I$, then $g(I) = g(I')$ and $f^*(I) = f^*(I')$.

Note that Claim 4.4 implies the desired property of $f^*$. Indeed, assume that $g(I) = g(I')$. Since $g(I) \subseteq I$ and $g(I') \subseteq I'$, as shown above, we have $g(I) = g(I') \subseteq I \cap I'$ and the claim yields $f^*(I) = f^*(I')$.
Proof of Claim 4.4. The claim is an easy consequence of the respective property of the function $S$ from the statement of Lemma 4.1. Indeed, it suffices to show that the container-constructing procedure described above defines the same sets $S^{(r)}$ and the same hypergraphs $H^{(r)}$ when applied to both $I$ and $I'$. This is because $g(I)$ and $g(I')$ are unions of the respective sets $S^{(r)}$, and the sets $f^*(I)$ and $f^*(I')$ depend only on the sets $S^{(r)}$ and the hypergraphs $H^{(r)}$. One may prove this assertion by induction on $s - r$. For the induction step, note that while the procedure performs step (C1), the respective hypergraphs $H^{(r+1)}$ are identical (by the inductive assumption) and therefore so are the functions $S$, $C$, and $F$. Moreover, since $S_I \subseteq g(I)$ and $S_{I'} \subseteq g(I')$, by our definition of $g(I)$ and $g(I')$, we may conclude that $S_I \subseteq I'$ and $S_{I'} \subseteq I$. Thus the final assertion of Lemma 4.1 gives us the equality $S_I = S_{I'}$.
4.4. Pruning hypergraphs. In order to streamline the analysis of our algorithm that constructs the $r$-uniform hypergraph $H^{(r)}$ from the $(r+1)$-uniform $H^{(r+1)}$, we will first prune the latter hypergraph by removing from it vertices with unusually high degree. More precisely, define, for a nonempty $r$-uniform hypergraph $H$ and $t \in [r]$, One should think of $\hat\Delta_t(H)$ as a robust analogue of the maximum degree $\Delta_t(H)$. In particular, Fact 3.1 implies that $\hat\Delta_1(H) \le \Delta_1(H)$. Equality sometimes holds (when $H$ is regular), but in general the ratio $\Delta_1(H)/\hat\Delta_1(H)$ can be arbitrarily large. Our next lemma shows that this inequality becomes nearly tight, up to a multiplicative factor of $O(r)$, after we delete a small proportion of the edges of $H$.
Lemma 4.5. Suppose that $H$ is a nonempty $r$-uniform hypergraph. Then, for every $R \ge r$, there is an

Proof. Given a nonempty $r$-uniform hypergraph $H$ and $R \ge r$, define By the definition of $X$ and $\hat\Delta_1(H)$, we have In particular, deleting from $H$ all edges containing at least one vertex of $X$ yields a hypergraph $H'$ satisfying the assertion of this lemma.
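The pruning step can be sketched as follows, under an explicit assumption. The formula defining the robust maximum degree is not available above. As a hypothetical stand-in we use the degree-weighted average degree $\sum_v \deg(v)^2 / \sum_v \deg(v)$. This quantity is at most $\Delta_1(H)$ and equals it for regular $H$, matching the two properties quoted above, but it may differ from the paper's definition. With this stand-in, deleting all edges that meet $X = \{v : \deg v > R \cdot \hat\Delta_1\}$ removes at most $\sum_{v \in X}\deg v \le \sum_v \deg(v)^2/(R\hat\Delta_1) = r\,e(H)/R$ edges, a Markov-type count.

```python
from collections import Counter
from fractions import Fraction


def degrees(edges):
    """Vertex degrees of a hypergraph given as a list of frozensets."""
    deg = Counter()
    for e in edges:
        deg.update(e)
    return deg


def robust_max_degree(edges):
    """Hypothetical stand-in for hat-Delta_1: sum_v deg(v)^2 / sum_v deg(v)."""
    deg = degrees(edges)
    return Fraction(sum(d * d for d in deg.values()), sum(deg.values()))


def prune(edges, R):
    """Delete every edge that meets X = {v : deg(v) > R * robust_max_degree(edges)}."""
    threshold = R * robust_max_degree(edges)
    X = {v for v, d in degrees(edges).items() if d > threshold}
    return [e for e in edges if not (e & X)], X, threshold


# A star with 20 leaves plus a matching of 10 disjoint edges (a graph, so r = 2).
H = [frozenset({0, i}) for i in range(1, 21)] + \
    [frozenset({100 + 2 * k, 101 + 2 * k}) for k in range(10)]
kept, X, threshold = prune(H, R=2)
```

After pruning, every surviving vertex has degree at most the threshold, so the maximum degree of the pruned hypergraph is within a factor $R$ of the robust one.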
Since the degree measures $\sigma^{(t)}_H$ are not defined when $H$ is an empty hypergraph, we will start building the hypergraph $H^{(r)}$ by seeding it with a fixed well-behaved $r$-uniform hypergraph, which streamlines our analysis. In order to guarantee that, at the end of the algorithm, this initial seed constitutes only a negligible proportion of $H^{(r)}$, we need to make sure that the link hypergraphs $H^{(r+1)}_v$ that the algorithm adds to $H^{(r)}$ are somewhat large. We will achieve this by (temporarily) removing vertices of very small degree from various subhypergraphs of $H^{(r+1)}$.

Input. Let $G$ be an $(r+1)$-uniform hypergraph with vertex set $V$. Let $\varepsilon \in (0, 1/(9r))$ and $p \in (0,1)$, define $a = 25/\varepsilon^2$, and suppose that
Observe that uniformly scaling the multiplicities of all edges of $G$ by a positive integer factor $k$ does not affect the $t$-degree measure $\sigma^{(t)}_G$, for any $t \in [r]$, nor does it change the family $\mathcal{I}(G)$ of independent sets of $G$. It does, however, increase the value of $e(G)$, and thus also the value of $\hat\Delta_t(G)$, for each $t \in [r]$, by the same multiplicative factor $k$. Consequently, we may assume, without loss of generality, that there is a (large) positive integer $m$ such that (6) holds. Finally, let $\alpha \in \mathbb{R}^r$ be a vector with nonnegative coordinates and let $I$ be an independent set of $G$.
Setup. Let $L$ be the empty set and let $G^{(0)}_*$ be the hypergraph obtained from the complete $r$-uniform hypergraph with vertex set $V$ by changing the multiplicities of all of its edges to $m$, so that $e(G^{(0)}_*) = m\binom{|V|}{r}$. Further, apply Lemma 4.5, with $R \leftarrow \frac{r+1}{\varepsilon}$, to find an and

Main loop. Do the following for $j = 0, 1, \dotsc$:

(S1) If $|L| = b$ or $e(A^{(j)}) < (1 - 2\varepsilon) \cdot e(G)$, then let $J = j$ and STOP.
(S2) Let $\hat A^{(j)}$ be a canonically chosen spanning subgraph of $A^{(j)}$ satisfying the assertion of Fact 4.6 with $\beta \leftarrow \varepsilon$.
$v$ and let $v_j$ be a canonically chosen vertex that minimises the quantity If $v_j \in I$, then add $j$ to the set $L$ and let Otherwise, let $G$

4.6. Basic properties of the algorithm and the key dichotomy. In this section, we establish several basic properties of the algorithm and state its key 'dichotomy' property, which we shall derive in later sections. Moreover, we explain how to use the algorithm to prove Lemma 4.1. We start by showing that the algorithm terminates on every input and that the output hypergraph $G_*$ and the final set $L$ retain important information about the input set $I$.

Proof. The first assertion holds because in step (S4), all edges containing $v_j$ are removed, and hence its degree remains zero in each $A^{(j')}$ with $j' > j$. Therefore, the algorithm stops after at most $|V|$ iterations of the main loop. The second assertion holds because $G_*$ comprises only edges of the link hypergraphs $G_v$ for which $v \in I$, and because $j \in L$ if and only if $v_j \in I$.
We next observe that the set L contains all the information about the input set I that is needed to reconstruct the execution of the algorithm. Proof. The only decisions that depend on the input set I are taken in step (S3) of the algorithm. Each time this step is executed, the decision taken is encoded in the set L by placing, or not placing, the index j in L.
The function $S \colon \mathcal{I}(G) \to \mathcal{P}(V)$ whose existence is asserted by Lemma 4.1 will be defined as follows:
$$S_I = \{v_j : j \in L\}. \qquad (8)$$
In other words, $S_I$ comprises precisely those among the queried vertices $v_0, \dotsc, v_{J-1}$ that belong to the input set $I$. Note that $|S_I| = |L| \le b$, since the algorithm terminates in step (S1) as soon as $L$ has $b$ elements. We shall now show that the knowledge of the set $S_I$ is enough to reconstruct the final set $L$ and hence, as stated in Observation 4.8, the entire execution of the algorithm. In fact, the following stronger statement is true.
Lemma 4.9. Suppose that, for two inputs $I, I' \in \mathcal{I}(G)$, we have $S_I \subseteq I'$ and $S_{I'} \subseteq I$. Then, for both these inputs, the algorithm outputs the same set $L$.
Suppose that $S_I = S_{I'}$ for some $I, I' \in \mathcal{I}(G)$. As $S_I \subseteq I$ and $S_{I'} \subseteq I'$, by construction, Lemma 4.9 implies that the output set $L$ must be the same for both $I$ and $I'$.
Proof of Lemma 4.9. Suppose that two inputs $I$ and $I'$ yield sets $L$ and $L'$, respectively, with $L \neq L'$. Let $j$ be the smallest index such that $j \in (L \setminus L') \cup (L' \setminus L)$; without loss of generality, we may assume that $j \in L \setminus L'$. Since $L \cap \{0, \dotsc, j-1\} = L' \cap \{0, \dotsc, j-1\}$, the algorithm produces the same sequences $v_0, \dotsc, v_j$ while working with inputs $I$ and $I'$. Since $j \in L \setminus L'$, we must have $v_j \in S_I$ and $v_j \notin I'$. In particular, $S_I \not\subseteq I'$, contrary to our assumption.
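The reconstruction property behind Observation 4.8 and Lemma 4.9 can be seen in a simplified simulation of the main loop. The sketch below makes several assumptions that the text does not.

- It omits the pruning via Lemma 4.5 and the subgraph from Fact 4.6.
- It picks $v_j$ by minimising $\sum_t \alpha_t \|\sigma^{(t)}\|^2$ after adding the candidate's link. The paper minimises a related quantity whose exact form is not reproduced above.
- It normalises $\sigma^{(t)}_H(T) = \deg_H(T)/(\binom{r}{t} e(H))$.

All function names are hypothetical. Because every decision depends only on the answers to the membership queries, rerunning on $S_I$ in place of $I$ reproduces the same $L$, as Lemma 4.9 predicts.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def weighted_norm_sq(H, r, alpha):
    """sum_t alpha_t * ||sigma^{(t)}_H||^2, assuming sigma^{(t)}_H(T) = deg_H(T) / (C(r, t) e(H))."""
    e = sum(H.values())
    total = Fraction(0)
    for t, a in enumerate(alpha, start=1):
        deg = Counter()
        for edge, mult in H.items():
            for T in combinations(sorted(edge), t):
                deg[T] += mult
        total += a * sum(Fraction(d, comb(r, t) * e) ** 2 for d in deg.values())
    return total


def link(A, v):
    """Link of v in A: every edge containing v, with v removed (multiplicities kept)."""
    return Counter({edge - {v}: mult for edge, mult in A.items() if v in edge})


def is_independent(H, I):
    return not any(edge <= I for edge in H)


def run(G, vertices, r, alpha, b, eps, m, I):
    """Simplified sketch of the main loop: no pruning via Lemma 4.5 and no subgraph from Fact 4.6."""
    A = Counter(G)                       # the current (r+1)-uniform hypergraph A^{(j)}
    e_G = sum(G.values())
    seed = Counter({frozenset(T): m for T in combinations(vertices, r)})  # complete, multiplicity m
    added = Counter()                    # union of the links of queried vertices that lie in I
    L, queried = [], []
    j = 0
    while len(L) < b and sum(A.values()) >= (1 - 2 * eps) * e_G:         # stopping rule (S1)
        candidates = sorted({v for edge in A for v in edge})
        # canonical choice: smallest weighted norm after adding the link, ties broken by label
        v = min(candidates,
                key=lambda u: (weighted_norm_sq(seed + added + link(A, u), r, alpha), u))
        queried.append(v)
        if v in I:                       # the only place where the input set I is consulted
            L.append(j)
            added += link(A, v)
        A = Counter({edge: mult for edge, mult in A.items() if v not in edge})  # drop edges at v
        j += 1
    return L, queried, added
```

For example, on the 3-uniform hypergraph of consecutive triples $\{i, i+1, i+2\}$ modulo $30$, the even vertices form an independent set. Running `run` on it and then on the resulting $S_I$ yields identical query sequences.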
Finally, define the vector $\alpha^* \in \mathbb{R}^{r+1}$ as in (3). The key dichotomy property is stated in our next lemma. It says that either the algorithm inspects many vertices of the hypergraph (before encountering the $b$th vertex of $I$), or the final hypergraph $G_*$ is a good 'model' of $G$. Here a good model means that the $\ell_2$-norm of $\sigma_\alpha(G_*)$ does not exceed the $\ell_2$-norm of $\sigma_{\alpha^*}(G)$.
Lemma 4.10. At least one of the following holds:

We shall prove Lemma 4.10, which lies at the heart of the matter, in the next two sections. We finish the current section with a short derivation of Lemma 4.1, which is now straightforward. Given an $(r+1)$-uniform hypergraph $G$ and numbers $\varepsilon$ and $p$ as in the statement of the lemma, we may define the function $S \colon \mathcal{I}(G) \to \mathcal{P}(V)$ as in (8), by running the algorithm on each input $I \in \mathcal{I}(G)$. If $J$

Since we want the final hypergraph $G_*$ to have small $\ell_2$-norm of $\sigma_\alpha(G_*)$, in step (S2) of the algorithm, we consider a vertex $v$ that, essentially, minimises $\|\sigma_\alpha(G^{(j,v)}_*)\|$ over all eligible $v \in V$. The following proposition, which is the core of the proof of Theorem 2.1, bounds the minimum of $\|\sigma_\alpha(G^{(j,v)}_*)\|$ from above. Given an $(r+1)$-uniform hypergraph $A$ and a vector $\alpha \in \mathbb{R}^r$, we define $\hat\Delta_\alpha(A)$

Proposition 4.11. Suppose that $A$ is an $(r+1)$-uniform hypergraph with vertex set $V$ and that $G_*$ is an $r$-uniform hypergraph with the same vertex set. Suppose that $\alpha \in \mathbb{R}^r$ has nonnegative coordinates. Then, there exists a vertex $v \in V$ with nonzero degree in $A$ such that the hypergraph

Since the right-hand side of (9) is rather complicated, let us explain the underlying intuition. The two terms $\hat\Delta_1(A)/e(G_*)$ and $\hat\Delta_\alpha(A)/e(G_*)$ should be viewed as 'error terms'. If we assumed that they are both zero, inequality (9) would simplify to (10). This simplified inequality states that, as long as the $\ell_2$-norm of $\sigma_\alpha(G_*)$ exceeds that of $\sigma_\alpha(A)$, there is a vertex $v \in V$ such that the $\ell_2$-norm of $\sigma_\alpha(G^v_*)$ is strictly smaller than that of $\sigma_\alpha(G_*)$. Moreover, the decrease $\|\sigma_\alpha(G_*)\| - \|\sigma_\alpha(G^v_*)\|$ is proportional to the excess $\|\sigma_\alpha(G_*)\| - \|\sigma_\alpha(A)\|$. Proposition 4.11 will allow us to show that, as we repeatedly update $G_* \leftarrow G^v_*$ in step (S3) of the algorithm, the value $\|\sigma_\alpha(G_*)\|$ drifts, rather quickly, towards $\|\sigma_\alpha(A)\|$.
The reason why Proposition 4.11 is true stems from Fact 3.6. It states that the vector $\sigma_\alpha(A)$ is a convex combination of the vectors $\sigma_\alpha(A_v)$, where $v$ ranges over $V$, and that the coefficient of each $\sigma_\alpha(A_v)$ in this combination is proportional to $\deg_A v$. This basic property of the degree measures lets us express the problem of minimising $\|\sigma_\alpha(G^v_*)\|$, solved by the proposition, in a simple, abstract way. We do so in the next lemma.
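Fact 3.6 can be checked numerically on a small example once a normalisation is fixed. Section 3 is not reproduced here, so we assume $\sigma^{(t)}_H(T) = \deg_H(T)/(\binom{k}{t} e(H))$ for a $k$-uniform $H$; this makes each $\sigma^{(t)}_H$ a probability measure on $t$-sets. Under that assumption, the sketch below confirms the identity $\sigma^{(t)}_A = \sum_v \frac{\deg_A v}{(r+1)e(A)}\,\sigma^{(t)}_{A_v}$ for $t \in [r]$. The weights sum to one because $\sum_v \deg_A v = (r+1)e(A)$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def degree_measure(H, k, t):
    """sigma^{(t)}_H(T) = deg_H(T) / (C(k, t) e(H)) for a k-uniform multi-hypergraph (assumed normalisation)."""
    e = sum(H.values())
    deg = Counter()
    for edge, mult in H.items():
        for T in combinations(sorted(edge), t):
            deg[T] += mult
    return {T: Fraction(d, comb(k, t) * e) for T, d in deg.items()}


def link(A, v):
    """Link of v in A, with multiplicities."""
    return Counter({edge - {v}: mult for edge, mult in A.items() if v in edge})


def mixture_of_links(A, r, t):
    """sum_v deg_A(v) / ((r+1) e(A)) * sigma^{(t)}_{A_v} for an (r+1)-uniform A."""
    e = sum(A.values())
    total = Counter()
    for v in {v for edge in A for v in edge}:
        A_v = link(A, v)
        weight = Fraction(sum(A_v.values()), (r + 1) * e)   # e(A_v) = deg_A(v)
        for T, x in degree_measure(A_v, r, t).items():
            total[T] += weight * x
    return dict(total)


A = Counter({frozenset({0, 1, 2}): 2, frozenset({1, 2, 3}): 1, frozenset({0, 3, 4}): 3})
```

For instance, the coordinate of the vertex $0$ equals $5/18$ on both sides.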
Lemma 4.12. Suppose that $\nu_1, \dotsc, \nu_k \in \mathbb{R}^d$ and $\lambda \in \mathbb{R}^k$ all have nonnegative coordinates and $\|\lambda\|_1 = \lambda_1 + \dotsb + \lambda_k = 1$. Define $\nu = \lambda_1\nu_1 + \dotsb + \lambda_k\nu_k$. For every positive $x$, every $\mu \in \mathbb{R}^d$ with nonnegative coordinates, and all $x_1, \dotsc, x_k \in (0, x]$, there exists an $i \in [k]$ such that $\lambda_i > 0$ and the vector $\mu_i$ defined by

Proof. Note first that Since $\nu_i, \mu \ge 0$, by our assumption on non-negativity of the coordinates, we have Substituting this inequality into (12), dividing both sides by $x_i$, and summing over The definition of $\nu$, the assumption $\|\lambda\|_1 = 1$, and the Cauchy-Schwarz inequality give Substituting this inequality into (13), recalling the assumption that $\max_i x_i \le x$, yields Finally, as $\|\lambda\|_1 = 1$, there must exist an $i \in [k]$ such that $\lambda_i > 0$ and the $i$th summand in the left-hand side of (14) is at most $\lambda_i$ times the right-hand side of (14). This gives which is easily seen to be equivalent to the desired inequality (11).
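The exact displays of Lemma 4.12 are not reproduced above, so the sketch below does not implement its statement. It only illustrates the two ingredients named in the proof and the kind of update that the lemma controls.

- Averaging: since $\|\lambda\|_1 = 1$, some $i$ with $\lambda_i > 0$ satisfies $\langle \nu_i, \mu\rangle \le \langle \nu, \mu\rangle$.
- Cauchy-Schwarz: $\langle \nu, \mu\rangle \le \|\nu\|\,\|\mu\|$.

The helper `mix` is modelled on the update $G_* \leftarrow G_* \cup A_v$, in which degree measures combine with weights proportional to edge counts. It shows that when $\|\nu\| < \|\mu\|$, a small step towards the chosen $\nu_i$ strictly decreases the norm. All names are ours.

```python
import math
import random


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def norm(u):
    return math.sqrt(dot(u, u))


def averaging_witness(nus, lam, mu):
    """Pick i with lam_i > 0 minimising <nu_i, mu>; averaging gives <nu_i, mu> <= <nu, mu>."""
    d = len(mu)
    nu = [sum(l * v[c] for l, v in zip(lam, nus)) for c in range(d)]
    i = min((k for k, l in enumerate(lam) if l > 0), key=lambda k: dot(nus[k], mu))
    return i, nu


def mix(mu, nu_i, x, x_i):
    """Weighted average (x * mu + x_i * nu_i) / (x + x_i), mirroring sigma(G_* + A_v)."""
    return [(x * a + x_i * b) / (x + x_i) for a, b in zip(mu, nu_i)]
```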
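Proof of Proposition 4.11. For every $v \in V$, let $G^v_* = G_* \cup A_v$. We claim that, for each $t \in [r]$, the $t$-degree measure of $G^v_*$ is a convex combination of the $t$-degree measures of $G_*$ and $A_v$. The coefficients in this convex combination are proportional to $e(G_*)$ and $\deg_A v$, respectively. Indeed, for every $T \subseteq V$, the degree of $T$ in $G^v_*$ is simply the sum of the degrees of $T$ in $G_*$ and $A_v$. Since also $e(A_v) = \deg_A v$, we have
$$e(G^v_*) \cdot \sigma^{(t)}_{G^v_*} = e(G_*) \cdot \sigma^{(t)}_{G_*} + \deg_A v \cdot \sigma^{(t)}_{A_v}.$$
Dividing the above equality through by $e(G^v_*) = e(G_*) + \deg_A v$, we obtain
$$\sigma^{(t)}_{G^v_*} = \frac{e(G_*)}{e(G_*) + \deg_A v} \cdot \sigma^{(t)}_{G_*} + \frac{\deg_A v}{e(G_*) + \deg_A v} \cdot \sigma^{(t)}_{A_v}.$$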
It is now straightforward to verify that Lemma 4.12 implies the existence of a vertex v ∈ V satisfying the assertion of the proposition.
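The convex-combination identity at the start of this proof is easy to confirm with exact arithmetic. The sketch below assumes the normalisation $\sigma^{(t)}_H(T) = \deg_H(T)/(\binom{r}{t} e(H))$, which is our reading rather than a quotation of Section 3. It compares the degree measure of the union $G_* \cup A_v$, represented as a sum of multiplicity counters, with the weighted average of the two measures. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def degree_measure(H, r, t):
    """sigma^{(t)}_H(T) = deg_H(T) / (C(r, t) e(H)); the normalisation is our assumption."""
    e = sum(H.values())
    deg = Counter()
    for edge, mult in H.items():
        for T in combinations(sorted(edge), t):
            deg[T] += mult
    return {T: Fraction(d, comb(r, t) * e) for T, d in deg.items()}


def convex_combination(G_star, A_v, r, t):
    """e(G_*)/(e(G_*)+deg) * sigma(G_*) + deg/(e(G_*)+deg) * sigma(A_v), where deg = e(A_v)."""
    e, d = sum(G_star.values()), sum(A_v.values())
    out = Counter()
    for H, w in ((G_star, Fraction(e, e + d)), (A_v, Fraction(d, e + d))):
        for T, x in degree_measure(H, r, t).items():
            out[T] += w * x
    return dict(out)


G_star = Counter({frozenset({0, 1}): 3, frozenset({1, 2}): 1})
A_v = Counter({frozenset({0, 2}): 2, frozenset({1, 2}): 1})
```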

4.8. Proof of the key dichotomy property. In this section, we use Proposition 4.11 to bound the expression from step (S2) in the description of our algorithm. This is the most technically demanding part of the proof. Throughout this section, we use the notation introduced in Section 4.5. We start with an easy dichotomy.
Proof. Since $A$ is obtained from $A^{(0)}$ by removing all edges that contain at least one of the vertices $v_0, \dotsc, v_{J-1}$, we have $e(A^{(0)}) - e(A)$ Consequently, it follows from (7) and our upper bound on $e(A)$ that

Lemma 4.14. If $e(A) \ge (1 - 2\varepsilon) \cdot e(G)$, then $e(G_*) \ge p \cdot e(G)$.
Proof. We prove (17) by induction on $j$. Since $G^{(0)}_*$ is (an integer multiple of) the complete $r$-uniform hypergraph, $\sigma^{(t)}_{G^{(0)}_*}$ is the uniform probability measure on $\binom{V}{t}$ and, consequently, $\|\sigma^{(t)}_{G^{(0)}_*}\|^2 = \binom{|V|}{t}^{-1}$ for every $t \in [r]$. On the other hand, Fact 3.5 implies that, for every $t \in [r]$, Fact 3.1 implies that, for every $v \in V$ such that $G_v$ is nonempty, Recall from (6) that we have chosen $m$ so that which, substituted into the previous inequality, implies that After we multiply both sides of (18) by $\alpha_t$ and sum the resulting inequalities over all $t \in [r]$, we obtain, which implies (17) when $j = 0$.
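The base case uses the fact that the seed $G^{(0)}_*$ has uniform degree measures whose squared norm is $1/\binom{|V|}{t}$, whatever the multiplicity $m$. This holds because $\deg(T) = m\binom{|V|-t}{r-t}$ for every $t$-set $T$. The small check below assumes the normalisation $\sigma^{(t)}_H(T) = \deg_H(T)/(\binom{r}{t} e(H))$. It also illustrates the earlier remark that scaling all multiplicities by a common factor leaves the degree measures unchanged. The function names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def degree_measure(H, r, t):
    """sigma^{(t)}_H(T) = deg_H(T) / (C(r, t) e(H)); the normalisation is our assumption."""
    e = sum(H.values())
    deg = Counter()
    for edge, mult in H.items():
        for T in combinations(sorted(edge), t):
            deg[T] += mult
    return {T: Fraction(d, comb(r, t) * e) for T, d in deg.items()}


def complete(n, r, m):
    """Complete r-uniform hypergraph on {0, ..., n-1} with every edge of multiplicity m."""
    return Counter({frozenset(T): m for T in combinations(range(n), r)})
```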
Suppose now that $j \ge 0$ and assume that (17) holds; we shall show that this inequality remains true after we replace $j$ with $j+1$. We may assume that $G^{(j)}_*$ and there is nothing to prove. Let $v$ be a vertex satisfying the assertion of Proposition 4.11 with $A \leftarrow \hat A^{(j)}$ and $G_* \leftarrow G^{(j)}_*$. The vertex $v_j$ was chosen in step (S2) so that so it suffices to bound the right-hand side of (19) from above by $a \cdot \hat\Delta$.
The assertion of Proposition 4.11, inequality (9), is equivalent to the inequality (20). Since $e(G^{(j,v)}_*) = e(G^{(j)}_*) + \deg_{\hat A^{(j)}} v$, inequality (20) may be rewritten as We shall now simplify the right-hand side of (21) somewhat. To this end, observe first that, as the algorithm did not terminate in step (S1), we must have Consequently, Fact 3.2 implies that, for every $t \in [r+1]$, and, since clearly $e(\hat A^{(j)}) \le e(G)$, Summing (22), with both sides squared, and (23) over all $t$, with appropriate weights, yields Furthermore, recall from (6) that we have chosen $m$ so that and, by (7), where the last inequality holds as $a^2 = 625/\varepsilon^4 \ge 2(r+1)/\varepsilon$. We may now substitute (24) and (25) into (21) and rearrange the terms to obtain the following inequality: We now consider two cases, depending on how large $\|\sigma_\alpha(G^{(j)}_*)\|^2$ is.
We first claim that inequality (28) holds. To see this, note that the left-hand side of (28) is negative when $\|\sigma_\alpha(G^{(j)}_*)\| > 2\sigma$, as $a \ge 12$. Otherwise, the first factor in the left-hand side is at most $3/a$ and the second factor is at most $4\sigma^2$. Substituting (28) into (27), using the assumed upper bound on where the second inequality holds because $\deg_{\hat A^{(j)}} v \le \Delta_1(\hat A^{(j)}) \le (a/2) \cdot e(G^{(j)}_*)$, see (26).
We will show that the second term in the right-hand side of (27) is nonpositive, which will give which in turn implies the desired inequality, as e(G where the last inequality holds due to our assumption that ε < 1/9.
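The parameter inequalities invoked in this section concern $a = 25/\varepsilon^2$ with $\varepsilon \in (0, 1/(9r))$. Namely, $a^2 = 625/\varepsilon^4 \ge 2(r+1)/\varepsilon$ and $a \ge 12$. They can be confirmed with exact rational arithmetic. The sketch below is ours and checks a range of admissible pairs $(r, \varepsilon)$. The first inequality holds because $1/\varepsilon^3 > 729r^3$ on this range.

```python
from fractions import Fraction


def check_parameters(r, eps):
    """Check a = 25/eps^2, a^2 = 625/eps^4 >= 2(r+1)/eps and a >= 12 for eps in (0, 1/(9r))."""
    assert 0 < eps < Fraction(1, 9 * r)
    a = 25 / eps ** 2
    assert a ** 2 == 625 / eps ** 4
    assert a ** 2 >= 2 * (r + 1) / eps
    assert a >= 12
    return a
```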

4.9. Proof of the key lemma. After having made all the preparations, we are finally ready to prove Lemma 4.10.
Consequently, Fact 3.2 implies that It now follows from Lemma 4.15 that .

Probabilistic inequalities
The proofs of Theorems 1.2, 1.3, and 1.4 make use of well-known probabilistic inequalities. The first of these are the standard tail bounds for binomial distributions. We shall also need the following version of Janson's inequality [27], which can be easily deduced from the statements found in [2, Chapter 8].
Theorem 5.2 (Janson's inequality). Suppose that $\Omega$ is a finite set and let $B_1, \dotsc, B_k$ be arbitrary subsets of $\Omega$. Form a random subset $R \subseteq \Omega$ by independently keeping each $\omega \in \Omega$ with probability $p_\omega \in [0,1]$. For each $i \in [k]$, let $X_i$ be the indicator of the event that $B_i \subseteq R$ and define Then,

6. The typical structure of $K_{r+1}$-free graphs

6.1. Outline. The first, key part of the proof of Theorem 1.2 is showing that, for sufficiently small $\delta$, the number of $K_{r+1}$-free subgraphs of $K_n$ that are not $\delta n^2$-close to being $r$-partite is much smaller than $2^{\mathrm{ex}(n,K_{r+1})}$, which is a trivial lower bound on the number of $K_{r+1}$-free graphs. This statement is derived from two ingredients. The first is a container lemma for $K_{r+1}$-free graphs (Proposition 6.1 below), which is obtained by applying Theorem 1.6 to the $\binom{r+1}{2}$-uniform hypergraph that encodes copies of $K_{r+1}$ in $K_n$. The second is the 'supersaturated' version of the stability theorem of Erdős and Simonovits proved in [5] and stated as Lemma 6.3 below. Proposition 6.1, which is the main result of this section, supplies a covering of all $K_{r+1}$-free subgraphs of $K_n$ with few containers. Each container is a subgraph of $K_n$ with either fewer than $n^2/8$ edges or fewer than $n^{r+1/2}$ copies of $K_{r+1}$ (after we delete from it some $n^{2-1/(8r)}$ edges). Lemma 6.3 is used to show that all containers with nearly $\mathrm{ex}(n, K_{r+1})$ edges must be close to being $r$-partite.
The remainder of the proof is showing that all but a $2^{-n/(10r)^4}$-proportion of $K_{r+1}$-free subgraphs of $K_n$ that are $\delta n^2$-close to being $r$-partite are in fact $r$-partite. Our three-step argument is loosely based on the methods of [8]. First, we show that all but a tiny fraction of graphs in our collection admit an optimal, balanced $r$-partition with at most $\delta n^2$ monochromatic edges (i.e., edges both of whose endpoints belong to the same part of the partition). An $r$-partition is optimal if it minimises the number of monochromatic edges, and it is balanced if each partite set comprises at least $n/(2r)$ vertices. Second, we bound from above the number of remaining graphs whose associated $r$-partition induces a monochromatic copy of $K_{1,D}$ in one of the parts, where $D = \lfloor n/(2^{14} r^4 \log n) \rfloor$. Third, we bound from above the number of remaining graphs whose associated $r$-partition induces a monochromatic matching with a given number of edges in one of the parts. The second and third steps complement one another as every graph with $t$ edges contains either a copy of $K_{1,D}$ or a matching with at least $t/D$ edges.

6.2. An efficient container lemma for $K_{r+1}$-free graphs. The following statement, which is the main technical result of this section, is an efficient container lemma for $K_{r+1}$-free subgraphs of $K_n$. It is obtained by applying Theorem 1.6 to the $\binom{r+1}{2}$-uniform hypergraph that encodes copies of $K_{r+1}$ in $K_n$.

Proposition 6.1. For all sufficiently large $n$ and all $r$ satisfying $2 \le r \le \log n/(120\log\log n)$, there exists a collection $\mathcal{G}$ of at most $\exp\left(n^{2-1/(8r)}\right)$ subgraphs of $K_n$ such that:
(i) Each $K_{r+1}$-free subgraph of $K_n$ is contained in some member of $\mathcal{G}$.
(ii) Each $G \in \mathcal{G}$ either has fewer than $n^2/8$ edges or it contains a subgraph $G'$ with $e(G') \ge e(G) - n^{2-1/(8r)}$ that has fewer than $n^{r+1/2}$ copies of $K_{r+1}$.
Proof. Let $n$ be a large integer and suppose that $r$ satisfies $2 \le r \le \log n/(120\log\log n)$. Let $\gamma = 1/(8r)$ and observe that $n^\gamma = \exp\left(\frac{\log n}{8r}\right) \ge \exp(12\log\log n) = (\log n)^{12}$.

6.3. Almost all $K_{r+1}$-free graphs are almost $r$-partite. The following theorem may be viewed as an approximate version of Theorem 1.2. It is a rather straightforward consequence of our container lemma for $K_{r+1}$-free graphs (Proposition 6.1) and of the 'supersaturated' version of the stability theorem of Erdős and Simonovits (Lemma 6.3 below) proved by Balogh, Bushaw, Collares, Liu, Morris, and Sharifzadeh [5]. It states that, under the assumptions of Theorem 1.2, almost all $K_{r+1}$-free subgraphs of $K_n$ are almost $r$-partite. To make this notion precise, given nonnegative integers $r$ and $t$ with $r \ge 2$, we shall say that a graph $G$ is $t$-close to being $r$-partite if $G$ can be made $r$-partite by removing from it at most $t$ edges. In other words, $G$ is $t$-close to being $r$-partite if $G$ contains an $r$-partite subgraph $G'$ with $e(G') \ge e(G) - t$. Conversely, we shall say that a graph $G$ is $t$-far from being $r$-partite if $G$ is not $t$-close to being $r$-partite or, in other words, if $\chi(G') > r$ for every $G' \subseteq G$ with $e(G') \ge e(G) - t$.
Theorem 6.2. The following holds for all sufficiently large $n$ and all $r$ satisfying $2 \le r \le \log n/(120\log\log n)$. Let $\mathcal{F}$ denote the family of $K_{r+1}$-free subgraphs of $K_n$ that are $(8\log n)^{-13}n^2$-far from being $r$-partite. Then $|\mathcal{F}| \le 2^{\mathrm{ex}(n,K_{r+1}) - n}$.

Lemma 6.3 ([5]). Suppose that $n$, $r$, and $t$ are positive integers. Every $n$-vertex graph $G$ that is $t$-far from being $r$-partite contains at least

Proof of Theorem 6.2. Let $\delta = (8\log n)^{-13}$ so that $\mathcal{F}$ is the family of $K_{r+1}$-free subgraphs of $K_n$ that are $\delta n^2$-far from being $r$-partite. Let $\mathcal{G}$ be the family of containers for $K_{r+1}$-free graphs supplied by Proposition 6.1. We partition $\mathcal{G}$ into two parts as follows: Fix an arbitrary $G \in \mathcal{G}_1$. Since $e(G) > n^2/8$, it must be the case that $G$ contains a subgraph $G'$ with that contains fewer than $n^{r+1/2}$ copies of $K_{r+1}$. Let $G'$ be any such subgraph of $G$ and let $t'$ be the smallest number of edges one can delete from $G'$ to make it $r$-partite. Lemma 6.3 implies that and hence, by (30), $t' - 3n^{2-1/(8r)} \le e^{2r} \cdot r! \cdot n^{3/2} \le r^{3r} \cdot n^{3/2} \le n^{7/4}$.
6.4. Balanced and unbalanced $r$-partitions. Let $\Pi$ be an arbitrary $r$-partition of $[n]$. We shall say that $\Pi$ is balanced if $\min_{P \in \Pi}|P| \ge \frac{n}{2r}$ and that it is unbalanced otherwise. In the sequel, we denote by $K_\Pi$ the complete $r$-partite graph whose colour classes are the parts of $\Pi$.

Fact 6.4. Suppose that $r \ge 2$ and let $\Pi$ be an unbalanced $r$-partition of $[n]$. Then $e(K_\Pi) \le \mathrm{ex}(n, K_{r+1}) - \frac{n^2}{16r^2} + n$.
Proof. Let $P \in \Pi$ be an arbitrary part satisfying $|P| < \frac{n}{2r}$ and let $Q \in \Pi$ be an arbitrary part satisfying $|Q| \ge \frac{n}{r}$. Let $\Pi'$ be the partition obtained from $\Pi$ by moving some $m$ vertices from $Q$ to $P$, and observe that Since $|Q| - |P| > \frac{n}{2r}$, we have $m \ge \frac{n}{4r} - 1$ and, consequently, $m^2 \ge \frac{n^2}{16r^2} - n$. Finally, since $K_{\Pi'}$ is an $r$-partite graph, we have $e(K_{\Pi'}) \le \mathrm{ex}(n, K_{r+1})$ and the claimed upper bound on $e(K_\Pi)$ follows.
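Fact 6.4 can be checked by brute force for small $n$ and $r$, using two standard facts. First, $e(K_\Pi) = \frac{1}{2}\left(n^2 - \sum_{P \in \Pi}|P|^2\right)$. Second, by Turán's theorem, $\mathrm{ex}(n, K_{r+1})$ is the number of edges of the balanced complete $r$-partite graph. The sketch below is ours. It enumerates all multisets of part sizes and verifies the inequality, scaled by $16r^2$ to stay in integers, for every unbalanced partition.

```python
def edges_of_complete_multipartite(sizes):
    """e(K_Pi) = (n^2 - sum of squared part sizes) / 2."""
    n = sum(sizes)
    return (n * n - sum(s * s for s in sizes)) // 2


def turan_number(n, r):
    """ex(n, K_{r+1}) = number of edges of the balanced complete r-partite graph (Turan's theorem)."""
    q, rem = divmod(n, r)
    return edges_of_complete_multipartite([q + 1] * rem + [q] * (r - rem))


def part_sizes(n, r, largest=None):
    """All non-increasing r-tuples of nonnegative integers with sum n."""
    largest = n if largest is None else largest
    if r == 1:
        if n <= largest:
            yield (n,)
        return
    for first in range(min(n, largest), -1, -1):
        for rest in part_sizes(n - first, r - 1, first):
            yield (first,) + rest


def check_fact_6_4(n, r):
    ex = turan_number(n, r)
    for sizes in part_sizes(n, r):
        e = edges_of_complete_multipartite(sizes)
        assert e <= ex                          # balanced partitions maximise e(K_Pi)
        if 2 * r * min(sizes) < n:              # unbalanced: some part has < n/(2r) vertices
            # e <= ex - n^2/(16 r^2) + n, multiplied through by 16 r^2
            assert 16 * r * r * e <= 16 * r * r * (ex + n) - n * n
    return True
```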
Our next lemma bounds from above the number of subgraphs of K n that admit an unbalanced r-partition with few monochromatic edges.
Lemma 6.5. The following holds for all sufficiently large $n$ and all $r$ satisfying $2 \le r \le \log n$. Let $\mathcal{F}$ denote the family of all $G \subseteq K_n$ that satisfy $e(G \setminus K_\Pi) \le \frac{n^2}{(r\log n)^2}$ for some unbalanced $r$-partition $\Pi$. Then $|\mathcal{F}| \le 2^{\mathrm{ex}(n,K_{r+1}) - n}$.
Proof. Denote by $\mathcal{P}_u$ the family of all unbalanced $r$-partitions of $[n]$. For every $\Pi \in \mathcal{P}_u$, let $\mathcal{F}_\Pi$ denote the family of all graphs $G \subseteq K_n$ that satisfy $e(G \setminus K_\Pi) \le \frac{n^2}{(r\log n)^2}$. We have
$$|\mathcal{F}_\Pi| \le \binom{n^2}{\le n^2/(r\log n)^2} \cdot 2^{e(K_\Pi)} \le 2^{e(K_\Pi) + \frac{4n^2}{r^2\log n}}.$$
Let $\mathrm{Col}_r(n)$ denote the family of all $r$-partite subgraphs of $K_n$. Even though some graphs in $\mathrm{Col}_r(n)$ admit many different proper $r$-colourings, the average number of proper $r$-colourings of a graph in $\mathrm{Col}_r(n)$ is only slightly larger than one. This is the content of our next lemma, which is implicit in the work of Prömel and Steger [37]. Our proof of the lemma is an adaptation of the argument underlying the proof of [8, Proposition 5.5].
Lemma 6.6. The following holds for all sufficiently large $n$ and all $r$ satisfying $2 \le r \le \log n$. Denoting by $\mathcal{P}$ the family of all $r$-partitions of $[n]$, we have

Proof. Denote by $\mathcal{P}_b$ the family of all balanced $r$-partitions of $[n]$ and let $\mathcal{P}_u = \mathcal{P} \setminus \mathcal{P}_b$.
For every pair $\Pi, \Pi' \in \mathcal{P}$, there are exactly $2^{e(K_\Pi \cap K_{\Pi'})}$ subgraphs of $K_n$ that are properly $r$-coloured by both $\Pi$ and $\Pi'$. Hence Bonferroni's inequality (the inclusion-exclusion principle) gives $|\mathrm{Col}_r(n)|$ The claimed inequality will thus follow once we establish the following claim.
Claim 6.7. For every $\Pi \in \mathcal{P}_b$,

Fix distinct $\Pi, \Pi' \in \mathcal{P}_b$. Suppose that $\Pi = \{P_1, \dotsc, P_r\}$ and $\Pi' = \{P'_1, \dotsc, P'_r\}$ and, for all $i, j \in [r]$, let $P_{i,j} = P_i \cap P'_j$. We will say that the vertices in $P_{i,j}$ are moved from $P_i$ to $P'_j$. For every $i \in [r]$, define $L_i$ and $S_i$ as the largest and the second largest subclasses of $P_i$, respectively (with ties broken arbitrarily). Note that $|P_i| \ge \frac{n}{2r}$ implies that $|L_i| \ge \frac{n}{2r^2}$. Set $s = \max_{j \in [r]}|S_j|$ and let $S = S_j$ for the smallest $j$ for which the maximum in the definition of $s$ is achieved. Note that $1 \le s \le n/2$. Indeed, $s = 0$ would imply that $(P'_1, \dotsc, P'_r)$ is a permutation of $(P_1, \dotsc, P_r)$, and therefore $\Pi = \Pi'$. By the pigeonhole principle, one of two things happens. Either some pair $\{L_i, L_j\}$ of largest subclasses, or some largest subclass $L_i$ and $S$, where $S \not\subseteq P_i$, are moved to the same vertex class $P'_k$. Since $P'_k$ is an independent set in $K_{\Pi'}$, it follows that $K_\Pi \cap K_{\Pi'}$ has no edges between the sets $L_i$ and $L_j$ or $L_i$ and $S$. Both $|L_i| \cdot |L_j| \ge \frac{n^2}{4r^4}$ and $|L_i| \cdot |S| \ge \frac{sn}{2r^2}$ are at least $\frac{sn}{2r^4}$, as $s \le n/2$. Therefore we have $e(K_\Pi \cap K_{\Pi'}) \le e(K_\Pi) - \frac{sn}{2r^4}$. Observe that, given a $\Pi \in \mathcal{P}$, we can describe any $\Pi' \in \mathcal{P} \setminus \{\Pi\}$ as follows. We first pick the (ordered) partitions $(P_{i,j})_{j \in [r]}$ for every $i$ and then set $P'_j = \bigcup_{i \in [r]} P_{i,j}$. We claim that, for every $s$, the number of ways to choose all $P_{i,j}$ in such a way that $\max_{i \in [r]}|S_i| = s$ is at most $n^{r^2} \cdot n^{sr^2}$. Indeed, one may first specify the sequence $(|P_{i,j}|)_{i,j \in [r]}$. Then one specifies, for each $i \in [r]$, the elements of each $P_{i,j}$ with $j \in [r]$, apart from $L_i$ (which will comprise all the remaining, unspecified elements of $P_i$).
We may thus conclude that as claimed.
6.5. The number of $K_{r+1}$-free graphs with a monochromatic star. The following lemma will be used to bound from above the number of $K_{r+1}$-free graphs whose optimal $r$-partition induces a monochromatic copy of $K_{1,D}$ in one of the parts.
Lemma 6.8. Let $D$ be an integer satisfying $D \ge 2^r r$. Suppose that $\Pi$ is an $r$-partition of $[n]$ and that $S$ is a copy of $K_{1,D}$ with $V(K_{1,D}) \subseteq P$ for some $P \in \Pi$. If $v \in P$ is the centre vertex of $S$, then

Proof. Let $G$ be a uniformly chosen random subgraph of $K_\Pi$. Expose $G$ on all the edges of $K_\Pi$ that have an endpoint in $P$ and condition on an outcome in which $\deg_G(v, Q) \ge D$ for every $Q \in \Pi \setminus \{P\}$. It suffices to show that, for every such conditioning, $\Pr(K_{r+1} \not\subseteq G \cup S) \le 2^{-\frac{D^2}{8r^2}}$. For each $Q \in \Pi \setminus \{P\}$, choose an arbitrary set of $D$ neighbours of $v$ in $Q$. Let $\mathcal{K}$ be the family of all $D^r$ copies of $K_r$ in $K_\Pi$ whose vertices belong to the chosen $D$-element sets or to $V(K_{1,D}) \setminus \{v\} \subseteq P$. Under our conditioning, $v$ is adjacent (in $G \cup S$) to all the vertices of each $K \in \mathcal{K}$, so we have $\Pr(K_{r+1} \not\subseteq G \cup S) \le \Pr(K \not\subseteq G \text{ for all } K \in \mathcal{K})$. We may bound the latter probability from above using Janson's inequality. Define the relevant quantities as in the statement of Theorem 5.2.
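Because the displayed definitions in Theorem 5.2 are not reproduced above, it may help to see Janson's inequality checked on a tiny instance. We assume the convention of Alon and Spencer (Chapter 8 of The Probabilistic Method). There, $\mu = \sum_i \Pr[B_i \subseteq R]$ and $\Delta$ sums $\Pr[B_i \cup B_j \subseteq R]$ over ordered pairs $i \neq j$ with $B_i \cap B_j \neq \emptyset$; the bound is $\Pr[\text{no } B_i \subseteq R] \le \exp(-\mu + \Delta/2)$. The paper's exact normalisation of $\Delta$ may differ. The sketch below is ours and computes the left-hand side exactly by enumerating all subsets of an 8-element ground set.

```python
import math
import random
from itertools import product


def prob_all(S, p):
    """Probability that every point of S is kept in R."""
    return math.prod(p[w] for w in S)


def janson_comparison(p, sets):
    """Return (exact Pr[no B_i is a subset of R], exp(-mu + Delta/2)), Alon-Spencer convention."""
    points = sorted(p)
    exact = 0.0
    for bits in product((False, True), repeat=len(points)):
        R = {w for w, keep in zip(points, bits) if keep}
        if any(B <= R for B in sets):
            continue
        exact += math.prod(p[w] if keep else 1 - p[w] for w, keep in zip(points, bits))
    mu = sum(prob_all(B, p) for B in sets)
    delta = sum(prob_all(Bi | Bj, p)
                for i, Bi in enumerate(sets) for j, Bj in enumerate(sets)
                if i != j and Bi & Bj)
    return exact, math.exp(-mu + delta / 2)
```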
6.6. The number of $K_{r+1}$-free graphs with a monochromatic matching. The following lemma will be used to bound from above the number of $K_{r+1}$-free graphs whose optimal $r$-partition induces a monochromatic matching with a given number of edges in one of the parts.

Lemma 6.9. Suppose that $\Pi$ is a balanced $r$-partition of $[n]$ and that $M$ is a matching with $m$ edges such that $V(M) \subseteq P$ for some $P \in \Pi$. If $r^2 \cdot 2^{r+3} \le n$, then

Proof. Let $G$ be a uniformly chosen random subgraph of $K_\Pi$, so that the assertion of the lemma becomes equivalent to the inequality $\Pr(K_{r+1} \not\subseteq G \cup M) \le 2^{-\frac{mn}{2^{10} r^4}}$. Let $N = \prod_{Q \in \Pi \setminus \{P\}}|Q|$ and note that the assumption that $\Pi$ is balanced implies that $N \ge \left(\frac{n}{2r}\right)^{r-1}$. Denote by $K^-_{r+1}$ the graph obtained from $K_{r+1}$ by removing from it a single edge. Let $\mathcal{K}$ be the collection of all copies of $K^-_{r+1}$ in $K_\Pi$ that form a $K_{r+1}$ with an edge of $M$. Note that $|\mathcal{K}| = mN$ and that $\Pr(K_{r+1} \not\subseteq G \cup M) = \Pr(K \not\subseteq G \text{ for all } K \in \mathcal{K})$. We may thus bound this probability from above using Janson's inequality. Define the relevant quantities as in the statement of Theorem 5.2.
We conclude that as claimed.

6.7. Proof of Theorem 1.2. Suppose that positive integers $n$ and $r$ satisfy $2 \le r \le \log n/(120\log\log n)$. Let $\mathrm{Col}_r(n)$ and $\mathcal{F}$ denote the families of all $r$-partite and all $K_{r+1}$-free subgraphs of $K_n$, respectively. Since $\mathrm{Col}_r(n) \subseteq \mathcal{F}$, it suffices to show that Let $\delta = (8\log n)^{-13}$ and let $\mathcal{P}$ be the family of all $r$-partitions of $[n]$. Define, for every graph $G \in \mathcal{F}$, $t(G) = \min\{e(G \setminus K_\Pi) : \Pi \in \mathcal{P}\}$. Let $\mathcal{F}_{\mathrm{close}} = \{G \in \mathcal{F} : 1 \le t(G) \le \delta n^2\}$ and $\mathcal{F}_{\mathrm{far}} = \{G \in \mathcal{F} : t(G) > \delta n^2\}$, so that $\mathcal{F}_{\mathrm{close}} \cup \mathcal{F}_{\mathrm{far}} = \mathcal{F} \setminus \mathrm{Col}_r(n)$. Furthermore, for every $G \in \mathcal{F}_{\mathrm{close}}$, let $\Pi(G)$ be an arbitrary $r$-partition that achieves the minimum in the definition of $t(G)$. Let $\mathcal{F}^b_{\mathrm{close}}$ comprise those $G$ in $\mathcal{F}_{\mathrm{close}}$ for which $\Pi(G)$ is a balanced partition and let $\mathcal{F}^u_{\mathrm{close}} = \mathcal{F}_{\mathrm{close}} \setminus \mathcal{F}^b_{\mathrm{close}}$. Finally, for every balanced partition $\Pi \in \mathcal{P}$ and every integer $t$ satisfying $1 \le t \le \delta n^2$, define $\mathcal{F}_{t,\Pi} = \{G \in \mathcal{F}^b_{\mathrm{close}} : t(G) = t \text{ and } \Pi(G) = \Pi\}$. Letting $\mathcal{P}_b$ denote the set of balanced $r$-partitions of $[n]$, we thus have It follows from Theorem 6.2 and Lemma 6.5 that the first and the second terms in the right-hand side of (32) are at most $2^{\mathrm{ex}(n,K_{r+1}) - n}$ each. To bound the final term, we shall derive the following estimate.
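Proof of Claim 6.10. Let $D = \left\lfloor \frac{n}{2^{14} r^4 \log n} \right\rfloor$. Define $\mathcal{F}^S_{t,\Pi}$ as the family of all $G \in \mathcal{F}_{t,\Pi}$ such that $G[P]$ contains a copy of $K_{1,D}$ for some $P \in \Pi$. Define $\mathcal{F}^M_{t,\Pi}$ as the family of all $G \in \mathcal{F}_{t,\Pi}$ such that $G[P]$ has a matching of size $\lceil t/(Dr) \rceil$ for some $P \in \Pi$. Every graph with $t$ edges contains either a vertex with degree at least $D$ or a matching with at least $t/D$ edges. Moreover, the $t$ monochromatic edges of each $G \in \mathcal{F}_{t,\Pi}$ lie in the $r$ parts of $\Pi$, so some $G[P]$ contains at least $t/r$ of them. Hence we have $\mathcal{F}_{t,\Pi} = \mathcal{F}^S_{t,\Pi} \cup \mathcal{F}^M_{t,\Pi}$, and we may bound $|\mathcal{F}_{t,\Pi}|$ from above in two steps.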
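The dichotomy 'a vertex of degree at least $D$ or a matching with at least $t/D$ edges' follows from Vizing's theorem. A graph of maximum degree at most $D - 1$ has a proper edge-colouring with at most $D$ colours. Each colour class is a matching, so the largest class has at least $t/D$ edges. The sketch below is ours and merely confirms the statement by brute force on small random graphs. It computes maximum matchings exactly by memoised branching, which is only practical for tiny graphs.

```python
import random
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def max_matching_size(edges):
    """Exact maximum matching of a graph given as a tuple of pairs (branch on the first edge)."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    disjoint_rest = tuple(e for e in rest if u not in e and v not in e)
    return max(max_matching_size(rest), 1 + max_matching_size(disjoint_rest))


def max_degree(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return max(deg.values(), default=0)


def star_or_matching(edges, D):
    """True if the graph has a vertex of degree >= D or a matching with >= t/D edges."""
    return max_degree(edges) >= D or D * max_matching_size(tuple(edges)) >= len(edges)
```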
First, we claim that if $G \in \mathcal{F}^S_{t,\Pi}$ and $v \in P \in \Pi$ is the centre vertex of a copy of $K_{1,D}$ in $G[P]$, then $\deg_G(v, Q) \ge D$ for all $Q \in \Pi$. Indeed, if this were not true for some $Q$, then moving $v$ from $P$ to $Q$ would yield a partition $\Pi'$ such that
$$e(G \setminus K_{\Pi'}) = e(G \setminus K_\Pi) - \deg_G(v, P) + \deg_G(v, Q) < e(G \setminus K_\Pi),$$
which would contradict our assumption that $\Pi = \Pi(G)$. It thus follows from Lemma 6.8 (which we may apply as $D \ge n^{1/2} \cdot 2^r r$ when $n$ is sufficiently large) that where the last inequality holds because, by our choice of $\delta$ and $D$,
$$\frac{D^2}{8r^2} \ge \frac{n^2}{2^{32} r^{10} (\log n)^2} \ge \frac{n^2}{(8 \log n)^{12}} \ge 8\delta n^2 \log n \ge 4t \log n + n + t.$$
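The first step rests on a local optimality property: if $\Pi$ minimises $e(G \setminus K_\Pi)$, then moving one vertex $v$ from its part $P$ to a part $Q$ changes this quantity by $\deg_G(v, Q) - \deg_G(v, P) \ge 0$. The brute-force Python sketch below is our own illustration on tiny graphs; it assumes partitions are encoded as maps $V \to \{0, \dots, r-1\}$ (empty parts allowed), computes $t(G)$, and checks that every minimising partition satisfies $\deg_G(v, Q) \ge \deg_G(v, P)$ for all $v \in P$ and all parts $Q$. In particular, a vertex that is the centre of a copy of $K_{1,D}$ inside its part has at least $D$ neighbours in every part.

```python
from itertools import product


def inside_edges(edges, part):
    """e(G minus K_Pi): the number of edges of G with both ends in one part."""
    return sum(part[u] == part[v] for u, v in edges)


def minimising_partitions(n_vertices, edges, r):
    """Brute force: t(G) and all maps V -> {0, ..., r-1} attaining it."""
    best, minimisers = None, []
    for part in product(range(r), repeat=n_vertices):
        value = inside_edges(edges, part)
        if best is None or value < best:
            best, minimisers = value, [part]
        elif value == best:
            minimisers.append(part)
    return best, minimisers


def locally_optimal(n_vertices, edges, part, r):
    """True if deg(v, Q) >= deg(v, P) for every vertex v in P and every part Q."""
    for v in range(n_vertices):
        deg = [0] * r                      # deg[q] = number of neighbours of v in part q
        for a, b in edges:
            if a == v:
                deg[part[b]] += 1
            elif b == v:
                deg[part[a]] += 1
        if min(deg) < deg[part[v]]:
            return False
    return True
```

For the 5-cycle and $r = 2$, for instance, $t(C_5) = 1$, and every optimal 2-colouring passes the check.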
Second, it follows from Lemma 6.9 that where the last inequality holds because
$$\left\lceil \frac{t}{Dr} \right\rceil \cdot \frac{n}{2^{10} r^4} - 4t \log n - t \ge \min_{\tau \in \{1, 2, \dots\}} \left( \frac{\tau n}{2^{10} r^4} - \tau D r \cdot (4 \log n + 1) \right),$$
as the definition of $D$ assures that $n = 2^{14} D r^4 \log n$. Since $|\mathcal{F}_{t,\Pi}| \le |\mathcal{F}^S_{t,\Pi}| + |\mathcal{F}^M_{t,\Pi}|$, combining the two bounds above gives the assertion of the claim.

Lower bounds for ε-nets
7.1. Outline. Our (randomised) construction of planar point sets $X$ without a small ε-net for the range space of lines on $X$ is a slight simplification of the construction of Balogh and Solymosi [9]. The high-level idea of both constructions, which can be traced back to the work of Alon [1], may be summarised as follows. We find an integer $s$, a finite set $X \subseteq \mathbb{R}^2$, and a subcollection $\mathcal{L}$ of all lines in $\mathbb{R}^2$ such that:
(i) Every line in $\mathcal{L}$ contains at most $s$ points from $X$.
(ii) The $s$-uniform hypergraph $\mathcal{H}$ whose edges are all intersections of the lines in $\mathcal{L}$ with $X$ that have exactly $s$ points has no independent sets larger than $(1-c)|X|$, for some constant $c > 0$.
Given such $s$, $X$, and $\mathcal{L}$, we set $\varepsilon = s/|X|$ and observe that, by (i), the complement of every ε-net $N$ for the range space of lines on $X$ is an independent set of $\mathcal{H}$. By (ii), this means that $|N| \ge c|X| = cs/\varepsilon$, which improves upon the trivial bound $|N| = \Omega(1/\varepsilon)$ if $s$ can be made arbitrarily large. The challenge is to make $s$ as large as possible, as a function of $|X|$.
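The reduction from ε-nets to independent sets can be checked by brute force on a toy example. In the Python sketch below (our own illustration, not part of the construction), $X$ is the $3 \times 3$ integer grid, $s = 3$, and, as a simplifying assumption, all lines spanned by $X$ play the role of $\mathcal{L}$. Then (i) holds, the edges of $\mathcal{H}$ are the three rows, three columns, and two diagonals, a set $N$ is an ε-net for $\varepsilon = s/|X|$ exactly when $X \setminus N$ is independent in $\mathcal{H}$, and the smallest ε-net has $|X| - \alpha(\mathcal{H})$ points.

```python
from itertools import combinations
from math import gcd


def line_key(p, q):
    """Normalised coefficients (a, b, c) of the line a*x + b*y = c through p != q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    g = gcd(a, b)
    a, b = a // g, b // g
    if a < 0 or (a == 0 and b < 0):             # fix the sign so the key is unique
        a, b = -a, -b
    return (a, b, a * x1 + b * y1)


def lines_through(X):
    """Map every line spanned by two points of X to the set of points of X on it."""
    lines = {}
    for p, q in combinations(X, 2):
        lines.setdefault(line_key(p, q), set()).update((p, q))
    return lines


def min_net_and_alpha(X, s):
    """Smallest eps-net for lines (eps = s/|X|) and the independence number of H."""
    lines = lines_through(X)
    assert all(len(pts) <= s for pts in lines.values())       # property (i)
    edges = [pts for pts in lines.values() if len(pts) == s]  # edges of H
    points = set(X)
    min_net, alpha = len(points), 0
    for k in range(len(points) + 1):
        for net in combinations(sorted(points), k):
            N = set(net)
            is_net = all(pts & N for pts in edges)   # N meets every line with >= s points
            independent = not any(pts <= points - N for pts in edges)
            assert is_net == independent            # complement of a net is independent
            if is_net:
                min_net = min(min_net, k)
            if independent:
                alpha = max(alpha, len(points) - k)
    return min_net, alpha
```

On the $3 \times 3$ grid this gives a smallest ε-net of 3 points (a diagonal) and $\alpha(\mathcal{H}) = 6$.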
In the construction of Alon [1], the set $X$ is a generic projection of the $d$-dimensional grid $[s]^d$ to $\mathbb{R}^2$ and $\mathcal{L}$ is the image of all combinatorial lines in $[s]^d$ via this projection; property (i) holds trivially, whereas property (ii), for large enough $d$, is guaranteed by the density version of the Hales–Jewett theorem proved by Furstenberg and Katznelson [24]. In Balogh and Solymosi's construction [9], $X$ was a generic projection of a random subset of a larger, high-dimensional integer grid, trimmed appropriately to guarantee property (i) for a careful choice of $\mathcal{L}$; property (ii) was established using the hypergraph container theorem of Saxton and Thomason [42]. Here, we take $X$ to be a random subset of $[n]^2$, trimmed appropriately, and establish (ii) using our efficient container lemma, Theorem 1.6.
We shall find an $\varepsilon \in (0, 1/n)$ and a set $X \subseteq \mathbb{R}^2$ without a small ε-net among the subsets of the integer grid $[n]^2$, which we shall from now on denote by $P$. We will be able to prove the claimed lower bound on the smallest size of an ε-net of $X$ for the range space of all lines in $\mathbb{R}^2$ by considering only a fairly small family $\mathcal{L}$ of lines that we now specify. Given an integer $h \in [M-1]$ and a point $(x_0, y_0) \in [n] \times [M]$, we let $\ell(x_0, y_0; h)$ be the line passing through $(x_0, y_0)$ whose slope is $M/h$, that is,
$$\ell(x_0, y_0; h) = \{(x_0, y_0) + t \cdot (h, M) : t \in \mathbb{R}\}.$$
Since $M$ is prime, and thus coprime with $h$, the vector $t \cdot (h, M)$ has integer coordinates if and only if $t \in \mathbb{Z}$. Moreover, if $t$ is an integer, then $y_0 + tM \in [n]$ if and only if $t \in \{0, \dots, m-1\}$. In particular, $\ell(x_0, y_0; h)$ intersects $P$ in at most $m$ points; it intersects $P$ in exactly $m$ points if and only if $x_0 + (m-1)h \le n$. Now, for every $h \in [M-1]$, let
$$\mathcal{L}_h = \{\ell(x_0, y_0; h) : (x_0, y_0) \in [n - (m-1)h] \times [M]\},$$
so that every line in $\mathcal{L}_h$ intersects $P$ in exactly $m$ points. Since the lines in $\mathcal{L}_h$ are pairwise disjoint (as they are parallel), we have Let $h_{\max} = \lfloor n/(10m) \rfloor$, so that the lines in $\mathcal{L}_h$ cover at least $9n^2/10$ points of $P$ for every $h \in [h_{\max}]$. Finally, define $\mathcal{L} = \bigcup_{h \in [h_{\max}]} \mathcal{L}_h$.
We shall say that a set $A \subseteq P$ is $\mathcal{L}$-collinear if $A$ is contained in some line in $\mathcal{L}$. As every line in $\mathcal{L}$ contains exactly $m$ points of $P$, the number of $a$-element $\mathcal{L}$-collinear subsets of $P$ is precisely $|\mathcal{L}| \cdot \binom{m}{a}$ for every $a \in \{2, \dots, m\}$. Suppose that $p$ satisfies
$$K \cdot m^{10} \cdot n^{-1/(s-1)} \cdot \log n \le p \le m^{-1} \cdot n^{-1/s} \qquad (35)$$
for some large absolute constant $K$; such a number does indeed exist as
$$n^{1/(s-1) - 1/s} \ge n^{1/s^2} \ge m^{m^2/s^2} = m^{100} \ge K \cdot m^{10} \cdot \log n,$$
provided that $m$ is sufficiently large. Let $R$ be a $p$-random subset of $P$ and let $X \subseteq R$ be a largest subset of $R$ that contains no $\mathcal{L}$-collinear subset of $s+1$ points. By maximality of $X$, every point of $R \setminus X$ forms an $\mathcal{L}$-collinear $(s+1)$-element set with some $s$ points of $X$. In particular, $|R \setminus X|$ is at most the number of $\mathcal{L}$-collinear $(s+1)$-element subsets of $R$. It follows that by the second inequality in (35)
Claim 7.1.
With probability at least 1/2, every set $I \subseteq R$ with $|I| \ge 3n^2p/5$ contains an $\mathcal{L}$-collinear subset of $s$ points.
Together with the above calculations, Claim 7.1 implies that there exists a set $X \subseteq P$ of at least $4n^2p/5$ points that has the following two properties:
(a) $X$ has no $\mathcal{L}$-collinear subset with $s+1$ elements;
(b) every set of $3n^2p/5$ elements of $X$ contains an $\mathcal{L}$-collinear $s$-element subset.
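The elementary facts about the lines $\ell(x_0, y_0; h)$ used to define $\mathcal{L}$ can be checked by brute force for small parameters. The Python sketch below is our own illustration; it assumes that $M$ is prime and $n = mM$ (the equivalence $y_0 + tM \in [n] \iff t \in \{0, \dots, m-1\}$ for all $y_0 \in [M]$ requires the latter), and it encodes $\mathcal{L}_h$ as the lines with base point $(x_0, y_0) \in [n - (m-1)h] \times [M]$. It verifies that each line meets $P = [n]^2$ exactly in the points with integer parameter $t \in \{0, \dots, m-1\}$ and $x_0 + th \le n$, that the lines in $\mathcal{L}_h$ are disjoint, and that for $h \le h_{\max}$ they cover at least $9n^2/10$ points of $P$.

```python
def points_on_line(n, M, x0, y0, h):
    """Points of [n]^2 on the line through (x0, y0) with direction (h, M), h >= 1."""
    points = []
    for x in range(1, n + 1):            # h >= 1, so each point has its own x
        num = M * (x - x0)               # y - y0 = M (x - x0) / h
        if num % h == 0:
            y = y0 + num // h
            if 1 <= y <= n:
                points.append((x, y))
    return points


def check_line_family(M, m):
    """Sanity checks for the lines l(x0, y0; h); returns h_max = floor(n / (10 m))."""
    assert M > 1 and all(M % d for d in range(2, M))   # assumption: M is prime
    n = m * M                                           # assumption: n = m * M
    for h in range(1, M):
        for x0 in range(1, n + 1):
            for y0 in range(1, M + 1):
                count = len(points_on_line(n, M, x0, y0, h))
                # Only integers t in {0, ..., m-1} with x0 + t*h <= n contribute.
                assert count == sum(1 for t in range(m) if x0 + t * h <= n)
                assert (count == m) == (x0 + (m - 1) * h <= n)
    h_max = n // (10 * m)
    for h in range(1, h_max + 1):
        covered, total = set(), 0
        for x0 in range(1, n - (m - 1) * h + 1):    # hypothetical encoding of L_h
            for y0 in range(1, M + 1):
                pts = points_on_line(n, M, x0, y0, h)
                total += len(pts)
                covered.update(pts)
        assert total == len(covered)                # the lines in L_h are disjoint
        assert 10 * len(covered) >= 9 * n * n       # they cover >= 9n^2/10 points
    return h_max
```

Here the coverage is exactly $m \cdot M \cdot (n - (m-1)h) = n(n - (m-1)h)$, which exceeds $9n^2/10$ whenever $h \le n/(10m)$.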
Suppose that $X$ is such a set, let $\varepsilon = s/|X|$, and assume that $N \subseteq X$ is an ε-net for the range space of lines. In particular, $N$ intersects every $\mathcal{L}$-collinear subset of $X$ that has at least $s = \varepsilon|X|$ elements. Since $X$ contains no $\mathcal{L}$-collinear set with more than $s$ points, $X \setminus N$ contains no $\mathcal{L}$-collinear subset of $s$ points, and thus, by (b), $|X \setminus N| < 3n^2p/5 \le 3|X|/4$, that is, $|N| > |X|/4$. Finally, since $1/\varepsilon = |X|/s \le |P| = n^2$, we have, using (33),
This gives the assertion of the theorem. We now prove Claim 7.1. Let $\mathcal{H}$ be the $s$-uniform hypergraph with vertex set $P$ whose edges are all $\mathcal{L}$-collinear $s$-element subsets of $P$. The assertion of the claim is that, with probability at least 1/2, the random set $R$ contains no independent set of $\mathcal{H}$ that has at least $3n^2p/5$ elements. This is a simple consequence of the following lemma, which lies at the heart of the matter.
Lemma 7.2. There is a family $\mathcal{C}$ of at most $\exp(pn^2/300)$ containers for the independent sets of $\mathcal{H}$ such that $|C| \le n^2/2$ for every $C \in \mathcal{C}$.
We first show how Lemma 7.2 implies the assertion of Claim 7.1. Let $\mathcal{C}$ be a family of containers for the independent sets of $\mathcal{H}$ supplied by the lemma and let $B$ be the event that $R$ contains an independent set of $\mathcal{H}$ with at least $3n^2p/5$ elements. Since every independent set of $\mathcal{H}$ is contained in some member of $\mathcal{C}$, each of which has at most $n^2/2$ elements, we have so that $b_s$ is convex and $b_s(a) = \binom{a}{s}$ whenever $a$ is a nonnegative integer. Jensen's inequality gives Recall that, for every $h \in [h_{\max}]$, the lines in $\mathcal{L}_h$ cover all but at most $n^2/10$ points of $P$. In particular, for every such $h$,
We now verify that we may apply Theorem 1.6, with $\alpha \leftarrow 1/2$ and $\beta \leftarrow 1/3$, to the hypergraph $\mathcal{H}$. First, we have
$$\alpha\beta q \cdot v(\mathcal{H}) = \frac{pn^2}{1800 m^5 \log n} \ge 10^9 s^7 \quad \text{and} \quad 10^4 s^5 q \le \frac{1}{\log n} \le \beta,$$
provided that $n$ is sufficiently large. Second, for every $t \in \{2, \dots, s\}$, by (34) and (35), provided that $K$ is sufficiently large. The theorem supplies a collection $\mathcal{C}$ of containers for the independent sets of $\mathcal{H}$ such that
$$|\mathcal{C}| \le \exp\bigl(10^4 s^5 \beta^{-1} \log(e/\alpha) \cdot q \log(e/q) \cdot v(\mathcal{H})\bigr) \le \exp\bigl(m^5 \cdot q \log n \cdot n^2\bigr) \le \exp(pn^2/300),$$
where we used the inequality $q \ge e/n$, which holds if $K$ is sufficiently large, and, for every $C \in \mathcal{C}$, either $|C| \le \alpha \cdot v(\mathcal{H}) = n^2/2$ or there is a subset $W \subseteq C$ with $|W| \ge (1-\beta)|C| = 2|C|/3$ such that $e(\mathcal{H}[W]) < E = |\mathcal{L}|$. We claim that, in fact, $|C| \le n^2/2$ for every $C \in \mathcal{C}$. Indeed, if $|C| > n^2/2$ and $W \subseteq C$ satisfies $|W| \ge 2|C|/3 > n^2/3$, then $e(\mathcal{H}[W]) \ge |\mathcal{L}|$, by Lemma 7.3.

Upper bounds on Ramsey numbers
In this section, we derive the upper bounds on Folkman numbers and on induced Ramsey numbers stated in Theorems 1.4 and 1.5. We shall do this by building containers for non-Ramsey colourings of subgraphs of a large complete graph $K_N$ and examining how a random subgraph of $K_N$, drawn with an appropriately chosen distribution for each of the two theorems, intersects these containers. This approach to studying Ramsey properties of random graphs was introduced by Nenadov and Steger [34]. The only Ramsey-theoretic ingredient in our proof is the following supersaturated version of Ramsey's theorem, which is a refinement of [34, Corollary 2.2].
Lemma 8.1. Suppose that $n$ and $k$ are positive integers and let $R = R(n; k)$. If $N \ge R$, then every colouring $c \colon E(K_N) \to [k+1]$ either assigns the colour $k+1$ to at least $(1/2) \cdot (N/R)^2$ edges or contains at least $(1/2) \cdot (N/R)^n$ monochromatic copies of $K_n$ in colours $1, \dots, k$.
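Proof. The choice of $R$ guarantees that the edge-colouring induced by every set of $R$ vertices of $K_N$ contains either an edge coloured $k+1$ or a monochromatic copy of $K_n$ in one of the remaining $k$ colours. On the other hand, each edge and each copy of $K_n$ are contained in, respectively, $\binom{N-2}{R-2}$ and $\binom{N-n}{R-n}$ such sets. Denoting by $M$ the total number of monochromatic copies of $K_n$ in colours $1, \dots, k$, we thus have
$$\binom{N}{R} \le |c^{-1}(k+1)| \cdot \binom{N-2}{R-2} + M \cdot \binom{N-n}{R-n}. \qquad (36)$$
In particular, since
$$\binom{N-\ell}{R-\ell} = \binom{N}{R} \cdot \prod_{i=0}^{\ell-1} \frac{R-i}{N-i} \le \binom{N}{R} \cdot \left(\frac{R}{N}\right)^{\ell}$$
for every $\ell \in \{2, n\}$, (36) implies that either $|c^{-1}(k+1)| \ge (1/2) \cdot (N/R)^2$ or $M \ge (1/2) \cdot (N/R)^n$.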
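For $k = 1$ the Ramsey number is trivial, $R(n; 1) = n$: every $n$-vertex set either spans an edge of colour $2 = k+1$ or is a monochromatic $K_n$ in colour 1. In this degenerate case, Lemma 8.1 and the double-counting inequality (36) can be checked exhaustively on very small complete graphs. The Python sketch below is such a sanity check under that restriction (the function name is ours); it says nothing new about the interesting case $k \ge 2$, where exhaustive search quickly becomes impractical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def check_lemma_8_1_one_colour(n, N):
    """Exhaustively check Lemma 8.1 and inequality (36) for k = 1, R = R(n; 1) = n."""
    R = n
    edges = list(combinations(range(N), 2))
    index = {e: i for i, e in enumerate(edges)}
    # Each copy of K_n, stored as the list of indices of its edges.
    copies = [[index[e] for e in combinations(S, 2)]
              for S in combinations(range(N), n)]
    for colouring in product((1, 2), repeat=len(edges)):    # colour 2 plays k + 1
        extra = colouring.count(2)                            # |c^{-1}(k + 1)|
        mono = sum(all(colouring[i] == 1 for i in K) for K in copies)  # M
        assert comb(N, R) <= extra * comb(N - 2, R - 2) + mono * comb(N - n, R - n)
        if not (extra >= Fraction(N, R) ** 2 / 2 or mono >= Fraction(N, R) ** n / 2):
            return False
    return True
```

Exact `Fraction` arithmetic avoids any floating-point doubt in the comparisons with $(N/R)^2/2$ and $(N/R)^n/2$.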
8.1. Folkman numbers (proof of Theorem 1.4). Let $k$ and $n$ be positive integers, let $R = R(n; k)$, and suppose that an integer $N$ satisfies for some large constant $\Gamma$. We shall give a randomised construction of a $K_{n+1}$-free subgraph $G$ of $K_N$ that satisfies $G \to (K_n)_k$, proving that $F(n; k) \le N$. We shall from now on assume that $k \ge 2$ and $n \ge 3$, as otherwise the assertion of the theorem is trivial.
Suppose that $G \subseteq K_N$. We shall identify a $k$-colouring $c \colon E(G) \to [k]$ of the edges of $G$ with the set $\{(e, c_e) : e \in E(G)\} \subseteq E(K_N) \times [k]$.
Let $\mathcal{H}$ be the hypergraph with vertex set $E(K_N) \times [k]$ whose edges are all sets of the form $\varphi(E(K_n)) \times \{i\}$, where $\varphi \colon V(K_n) \to V(K_N)$ is an arbitrary injection and $i \in [k]$. If a graph $G \subseteq K_N$ admits a colouring $c \colon E(G) \to [k]$ with no monochromatic copy of $K_n$, then $c$, when viewed as a subset of $E(K_N) \times [k]$, is an independent set of $\mathcal{H}$. We shall say that a graph $G \subseteq K_N$ is compatible with a set $C \subseteq E(K_N) \times [k]$ if there exists a colouring $c \colon E(G) \to [k]$ that is contained in $C$. Equivalently, $G$ is compatible with $C$ if and only if $(\{e\} \times [k]) \cap C \neq \emptyset$ for every $e \in E(G)$. In other words, defining $X(C) = \{e \in E(K_N) : (\{e\} \times [k]) \cap C = \emptyset\}$, $G$ is compatible with $C$ if and only if $X(C) \cap E(G) = \emptyset$. Suppose that $p$ satisfies
$$D \cdot (knR)^{20} \cdot N^{-2/(n+1)} \log N \le p \le \frac{N^{-2/(n+2)}}{DR} \qquad (38)$$
for some large constant $D$; such a number does indeed exist as (37) implies that provided that $\Gamma$ is sufficiently large. The following lemma is key.
Lemma 8.2. There is a family $\mathcal{C}$ of at most $\exp\bigl(pN^2/(256R^2)\bigr)$ containers for the independent sets of $\mathcal{H}$ such that $|X(C)| \ge N^2/(4R^2)$ for every $C \in \mathcal{C}$.
We first show how Lemma 8.2 implies the assertion of the theorem. To this end, suppose that $G \sim G(N, p)$ and denote by $Z$ the number of copies of $K_{n+1}$ in $G$. The upper bound in (38) implies that
$$\mathbb{E}[Z] \le p^{\binom{n+1}{2}} N^{n+1} = pN^2 \cdot \bigl(p^{(n+2)/2} N\bigr)^{n-1} \le \frac{pN^2}{(DR)^{(n+2)(n-1)/2}} \le \frac{pN^2}{64R^2},$$
provided that $D$ is sufficiently large. Let $G'$ be the subgraph obtained from $G$ by deleting an arbitrary edge from every copy of $K_{n+1}$; observe that $K_{n+1} \not\subseteq G'$ and $e(G') \ge e(G) - Z$.
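The passage from $G$ to $G'$ is a deletion argument: removing one edge from each of the $Z$ copies of $K_{n+1}$ destroys all of them, creates no new copy, and costs at most $Z$ edges. A minimal Python sketch of this step on a small sample of $G(N, p)$ follows; the function names and the rule for choosing the deleted edge (the edge between the two smallest vertices of each copy) are our own illustrative choices.

```python
import random
from itertools import combinations


def sample_gnp(N, p, seed=0):
    """Edge set (as frozensets) of a sample of the binomial random graph G(N, p)."""
    rng = random.Random(seed)
    return {frozenset(e) for e in combinations(range(N), 2) if rng.random() < p}


def copies_of_clique(N, edges, size):
    """Vertex sets of all copies of K_size in the graph with the given edge set."""
    return [S for S in combinations(range(N), size)
            if all(frozenset(e) in edges for e in combinations(S, 2))]


def delete_one_edge_per_copy(N, edges, size):
    """Return (G', Z): G' has one edge removed from every copy of K_size in G."""
    copies = copies_of_clique(N, edges, size)
    reduced = set(edges)
    for S in copies:
        reduced.discard(frozenset(S[:2]))   # an arbitrary edge of this copy
    return reduced, len(copies)
```

Since only edges are deleted, the returned graph is $K_{\mathrm{size}}$-free and has at least $e(G) - Z$ edges, whatever the sample.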
Suppose that $G' \not\to (K_n)_k$. This means that there is a colouring $c \colon E(G') \to [k]$ that is an independent set of $\mathcal{H}$. Therefore, $G'$ must be compatible with some container from $\mathcal{C}$. In other words, there is some $C \in \mathcal{C}$ such that $X(C) \cap E(G') = \emptyset$ and, consequently, $|X(C) \cap E(G)| \le e(G) - e(G') \le Z$.
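The equivalence "$G$ is compatible with $C$ if and only if $X(C) \cap E(G) = \emptyset$" is what turns membership in a container into a condition on $E(G)$ alone. The Python sketch below checks it exhaustively for tiny $N$ and $k$, representing a vertex of $\mathcal{H}$ as a pair (edge, colour) with colours $1, \dots, k$; it is an illustration only, with function names of our choosing.

```python
from itertools import combinations, product


def X_of(C, all_edges, k):
    """X(C): edges e of K_N such that no pair (e, i) with i in [k] lies in C."""
    return {e for e in all_edges if not any((e, i) in C for i in range(1, k + 1))}


def compatible(G_edges, C, k):
    """Brute force: is there a colouring c: E(G) -> [k] with every (e, c_e) in C?"""
    return any(all((e, c) in C for e, c in zip(G_edges, colours))
               for colours in product(range(1, k + 1), repeat=len(G_edges)))


def check_compatibility(N, k):
    """Check 'compatible iff X(C) and E(G) are disjoint' for all C and all G in K_N."""
    all_edges = list(combinations(range(N), 2))
    vertices = [(e, i) for e in all_edges for i in range(1, k + 1)]   # V(H)
    for c_bits in range(1 << len(vertices)):
        C = {v for j, v in enumerate(vertices) if c_bits >> j & 1}
        X = X_of(C, all_edges, k)
        for g_bits in range(1 << len(all_edges)):
            G_edges = [e for j, e in enumerate(all_edges) if g_bits >> j & 1]
            if compatible(G_edges, C, k) != X.isdisjoint(G_edges):
                return False
    return True
```

The brute-force `compatible` is exponential in $e(G)$, which is exactly why the proof works with the set $X(C)$ instead.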
We may conclude that Fix an arbitrary $C \in \mathcal{C}$. Since $|X(C)|$ In particular, there is a graph $G' \subseteq K_N$ such that $K_{n+1} \not\subseteq G'$ and $G' \to (K_n)_k$, as claimed.
We shall prove that, with probability very close to one, the uniformly chosen random subgraph of $K_N$ is induced-Ramsey for $H$ in $k$ colours, proving that $R_{\mathrm{ind}}(H; k) \le N$. We shall from now on assume that $k \ge 2$ and $n \ge 3$, as otherwise the assertion of the theorem is trivial.
Suppose that $G \subseteq K_N$. We shall identify a $k$-colouring $c \colon E(G) \to [k]$ of the edges of $G$ with the set where $\varphi \colon V(H) \to V(K_N)$ is an arbitrary injection and $i \in [k]$. If a graph $G \subseteq K_N$ admits a colouring $c \colon E(G) \to [k]$ such that $c^{-1}(i)$ does not contain a copy of $H$ that is induced in $G$ for any $i \in [k]$, then $c$, when viewed as a subset of $E(K_N) \times \{0, \dots, k\}$, is an independent set of $\mathcal{H}$.
We shall say that a graph $G \subseteq K_N$ is compatible with a set $C \subseteq E(K_N) \times \{0, \dots, k\}$ if there exists a colouring $c \colon E(G) \to [k]$ that is contained in $C$. Equivalently, $G$ is compatible with $C$ if and only if $(e, 0) \in C$ for every $e \in E(K_N) \setminus E(G)$ and $(\{e\} \times [k]) \cap C \neq \emptyset$ for every $e \in E(G)$. In other words, defining $X_1(C) = \{e \in E(K_N) : (e, 0) \notin C\}$ and $X_2(C) = \{e \in E(K_N) : (\{e\} \times [k]) \cap C = \emptyset\}$, $G$ is compatible with $C$ if and only if $X_1(C) \subseteq E(G)$ and $X_2(C) \cap E(G) = \emptyset$. The following lemma is key.
It remains to show that $|X_1(C) \cup X_2(C)| \ge N^2/(4R^2)$ for every $C \in \mathcal{C}$. Suppose that this were not true and set $X(C) = X_1(C) \cup X_2(C)$ and let $c \colon E(K_N) \to [k+1]$ be an arbitrary colouring such that $(e, c_e) \in W$ for every $e \notin X(W)$ and $c_e = k+1$ otherwise. By Lemma 8.1, either $|X(W)| = |c^{-1}(k+1)| \ge (1/2) \cdot (N/R)^2$ or the colouring $c$ has at least $(1/2) \cdot (N/R)^n$ monochromatic copies of $K_n$ in colours $1, \dots, k$. However, the former inequality contradicts (42) and thus the latter must hold. Let $K$ be an arbitrary copy of $K_n$ in $K_N$ that $c$ colours with some $i \in [k]$. Then $E(K) \cap X(W) = \emptyset$; in particular, $E(K) \cap X_1(W) = \emptyset$, so $(e, 0) \in W$ for every $e \in E(K)$, and also $(e, i) = (e, c_e) \in W$ for every $e \in E(K)$. Hence $E(K) \times \{0, i\} \subseteq W$ and, as a result, any injection $\varphi \colon V(H) \to V(K)$ corresponds to an edge of $\mathcal{H}[W]$. This implies that $e(\mathcal{H}[W]) \ge (1/2) \cdot (N/R)^n \ge E$, contradicting our assumption.