Asymptotic Structure for the Clique Density Theorem

The famous Erd\H{o}s-Rademacher problem asks for the smallest number of $r$-cliques in a graph with the given number of vertices and edges. Despite decades of active attempts, the asymptotic value of this extremal function for all $r$ was determined only recently, by Reiher [Annals of Mathematics, 184 (2016) 683--707]. Here we describe the asymptotic structure of all almost extremal graphs. This task for $r=3$ was previously accomplished by Pikhurko and Razborov [Combinatorics, Probability and Computing, 26 (2017) 138--160].

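Let us turn to the asymptotic version. Namely, given $\alpha \in [0,1]$, take any integer-valued function $0 \le e(n) \le \binom{n}{2}$ with $e(n)/\binom{n}{2} \to \alpha$ as $n \to \infty$, and define $g_r(\alpha) := \lim_{n\to\infty} G_r(n,e(n))/\binom{n}{r}$ and $h_r(\alpha) := \lim_{n\to\infty} H_r(n,e(n))/\binom{n}{r}$. It is not hard to see from basic principles (see e.g. [PST19, Lemma 2.2]) that the limits exist and do not depend on the choice of the function $e(n)$. Thus, the determination of $g_r(\alpha)$ amounts to estimating the Erdős-Rademacher function within additive error $o(n^r)$. Clearly, $h_r(\alpha)$ is an upper bound on $g_r(\alpha)$.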
For each $\alpha \in [0,1]$, it is not hard to find some sequence of graphs $(H_{\alpha,n})_{n\in\mathbb{N}}$ that gives the value of $h_r(\alpha)$. If $\alpha = 1$, then we can let $H_{1,n} := K_n$ be the complete graph. Suppose that $0 \le \alpha < 1$. Let the integer $k \ge 1$ satisfy $\alpha \in [1-\frac{1}{k}, 1-\frac{1}{k+1})$. Then fix the (unique) $c > \frac{1}{k+1}$ so that the complete $(k+1)$-partite graph $H_{\alpha,n} := K(V_1,\dots,V_{k+1})$ with parts $V_1,\dots,V_{k+1}$, where $|V_1| = \dots = |V_k| = \lfloor cn \rfloor$, has $(\alpha+o(1))\binom{n}{2}$ edges as $n \to \infty$. It is easy to show, see (2.8), that $c \le \frac{1}{k}$. Thus $|V_{k+1}| = n - k\lfloor cn \rfloor \ge 0$. (In fact, our choice to round $cn$ down was rather arbitrary, just to make the graph $H_{\alpha,n}$ well-defined for each $\alpha$ and $n$.) It is routine to show (see e.g. [Nik11, Theorem 1.3] for a derivation) that these ratios give the value of $h_r(\alpha)$, that is,

$$h_r(\alpha) = r!\left(\binom{k}{r}c^r + \binom{k}{r-1}c^{r-1}(1-kc)\right). \tag{1.1}$$

The function $h_r$ stays $0$ when $\alpha \le 1-\frac{1}{r-1}$ (when $k \le r-2$). Also, as Lemma 2.7 implies, $h_r$ consists of countably many concave "scallops" with cusps at $\alpha = 1-\frac{1}{m}$ for integer $m \ge r-1$. For a while, the best known lower bound on the limit function $g_r$, by Bollobás [Bol76], was the piecewise linear function which coincides with $h_r$ on the cusp points. Fisher [Fis89] showed that $g_3(\lambda) = h_3(\lambda)$ for all $1/2 \le \lambda \le 2/3$, that is, he determined $g_3$ in the first scallop. Razborov used his newly developed theory of flag algebras first to give a different proof of Fisher's result in [Raz07] and then to determine the whole function $g_3$ in [Raz08]. The function $g_4$ was determined by Nikiforov [Nik11] and the function $g_r$ for any $r \ge 5$ was determined by Reiher [Rei16] (with these two papers also giving new proofs for the previously solved cases of $r$).
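The parameters $k$ and $c$ and the resulting clique density can be evaluated numerically. The following is a minimal sketch, assuming the closed form $h_r(\alpha) = r!\bigl(\binom{k}{r}c^r + \binom{k}{r-1}c^{r-1}(1-kc)\bigr)$, where $c$ is the larger root of $k(k+1)c^2 - 2kc + \alpha = 0$ (the edge-density equation of a complete partite graph with $k$ parts of relative size $c$ and one of size $1-kc$). The function names `k_and_c` and `h` are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial, sqrt


def k_and_c(alpha):
    """Return (k, c) for an edge density 0 <= alpha < 1.

    k is the integer with alpha in [1 - 1/k, 1 - 1/(k+1)); c is the larger
    root of k(k+1) c^2 - 2k c + alpha = 0 (assumed edge-density equation).
    """
    a = Fraction(alpha)
    if not 0 <= a < 1:
        raise ValueError("alpha must lie in [0, 1)")
    k = int(1 / (1 - a))  # exact floor of 1/(1 - alpha) via rationals
    disc = max(0.0, 1.0 - (k + 1) * float(a) / k)  # clamp rounding noise
    c = (1.0 + sqrt(disc)) / (k + 1)
    return k, c


def h(r, alpha):
    """Limit K_r-density of the complete partite construction (assumed form)."""
    if Fraction(alpha) == 1:
        return 1.0
    k, c = k_and_c(alpha)
    return factorial(r) * (
        comb(k, r) * c**r + comb(k, r - 1) * c ** (r - 1) * (1 - k * c)
    )
```

At the cusp $\alpha = 2/3$, for instance, `h(3, Fraction(2, 3))` agrees with the triangle density $2/9$ of the balanced complete tripartite graph, while `h(4, Fraction(2, 3))` vanishes, as expected for a $K_4$-free construction.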
1.1. Main result. We are interested in the asymptotic structure of (almost) extremal graphs for the Erdős-Rademacher $K_r$-minimisation problem, that is, a description up to $o(n^2)$ edges of every $(n,m)$-graph with $G_r(n,m) + o(n^r)$ copies of $K_r$ as $n \to \infty$. Of course, such a result tells us more about the problem than just the value of $g_r$. Asymptotic structure results are often very useful for proving enumerative and probabilistic versions of the corresponding extremal problem. For example, the more general problem of understanding the structure of graphons with any given edge and $r$-clique densities appears in the study of exponential random graphs (see Chatterjee and Diaconis [CD13] and its follow-up papers), phases in large graphs (see the survey by Radin [Rad18]), and large deviation inequalities for the clique density (see the survey by Chatterjee [Cha16]). Last but not least, asymptotic structure results often greatly help, as a first step, in obtaining the exact structure of extremal graphs via the so-called stability approach pioneered by Simonovits [Sim68]. Here, knowing extremal $n$-vertex graphs within $o(n^2)$ edges greatly helps in the ultimate aim of ruling out even a single "wrong" adjacency. In fact, almost all cases when the Erdős-Rademacher problem was solved exactly were established via the stability approach.
In order to state the main result of this paper, we have to define further graph families. For $\alpha = 1$, we let $\mathcal{H}_{1,n} := \{K_n\}$. For $\alpha \in [0,1)$, using the notation defined before (1.1), let $\mathcal{H}_{\alpha,n}$ consist of all graphs that are obtained from the complete $k$-partite graph on parts $V_1,\dots,V_{k-1}$ and $U := V_k \cup V_{k+1}$ by adding a triangle-free graph on $U$ with $|V_k| \cdot |V_{k+1}|$ edges. (In particular, $\mathcal{H}_{0,n} := \{\overline{K_n}\}$ consists of the empty graph only.) Clearly, for any $r \ge 3$, the number of $r$-cliques in the obtained graph does not depend on the choice of the graph added on $U$. Also, $\mathcal{H}_{\alpha,n} \ni H_{\alpha,n}$ is always non-empty (but typically has many non-isomorphic graphs). Finally, for $r \ge 3$, let $\mathcal{H}_{r,n}$ be the union of $\mathcal{H}_{\alpha,n}$ over all $\alpha \in [0,1]$ as well as the family of all $K_r$-free $n$-vertex graphs.
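The claim that the $r$-clique count does not depend on the triangle-free graph placed on $U$ can be checked by brute force on a toy instance. The sketch below uses hypothetical sizes chosen only for illustration: $k = 3$, $|V_1| = |V_2| = 3$ and $U$ of size $4$ (so $|V_3| = 3$, $|V_4| = 1$ and $3$ edges inside $U$), and compares a star with a path on $U$.

```python
from itertools import combinations


def count_cliques(n, edges, r):
    """Count r-subsets of {0, ..., n-1} that span a clique (brute force)."""
    adj = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(p) in adj for p in combinations(S, 2))
        for S in combinations(range(n), r)
    )


def build(inner_edges):
    """Complete 3-partite graph on V1, V2, U plus the given edges inside U."""
    V1, V2, U = [0, 1, 2], [3, 4, 5], [6, 7, 8, 9]
    cross = [
        (a, b)
        for P, Q in combinations([V1, V2, U], 2)
        for a in P
        for b in Q
    ]
    return cross + list(inner_edges)


# Two non-isomorphic triangle-free graphs with 3 edges on U = {6, 7, 8, 9}.
star = build([(6, 7), (6, 8), (6, 9)])
path = build([(6, 7), (7, 8), (8, 9)])
```

In both graphs every clique uses at most one vertex from each of $V_1, V_2$ and at most one edge of $U$, so the counts depend only on the number of edges inside $U$.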
Pikhurko and Razborov [PR17] proved that every almost extremal $(n,m)$-graph $G$ is $o(n^2)$-close in the edit distance to some graph in $\mathcal{H}_{3,n}$. Our main result is to extend this structural result to all $r \ge 4$:

Theorem 1.2. For every real $\varepsilon > 0$ and integer $r \ge 4$, there are $\delta > 0$ and $n_0$ such that every graph $G$ with $n \ge n_0$ vertices and at most $(g_r(\alpha)+\delta)\binom{n}{r}$ $r$-cliques, where $\alpha := e(G)/\binom{n}{2}$, can be made isomorphic to some graph in $\mathcal{H}_{r,n}$ by changing at most $\varepsilon n^2$ adjacencies.
Note that $\mathcal{H}_{3,n} \subseteq \mathcal{H}_{r,n}$. Also, all graphs in $\mathcal{H}_{r,n} \setminus \mathcal{H}_{3,n}$ are $K_r$-free; these are "trivial" minimisers for the $g_r$-problem that need not be minimisers for the $g_3$-problem. Thus, apart from these graphs, the $g_3$ and $g_r$ extremal problems have the same set of approximate minimisers, and we exploit this in our proof as follows. In brief, we take any almost $G_r(n,m)$-extremal graph $G$. Suppose that $G$ has strictly more than $G_3(n,m) + o(n^3)$ triangles, for otherwise $G$ is $o(n^2)$-close in the edit distance to $\mathcal{H}_{3,n}$ by the result in [PR17] and we are done since $\mathcal{H}_{3,n} \subseteq \mathcal{H}_{r,n}$. If $\alpha := m/\binom{n}{2}$ is $1-\frac{1}{k}+o(1)$ for some integer $k \ge r$ (that is, the edge density of $G$ is close to that of some Turán graph $T_k(n)$), then we use the result of Lovász and Simonovits [LS83] that $G$ has to be $o(n^2)$-close in the edit distance to $T_k(n)$, giving the desired conclusion.

Thus we can assume that the edge density is strictly inside one of the scallops. Lemma 2.7 shows that the function $h_r$ is differentiable for such $\alpha$. This allows us to derive various properties of $G$ via variational principles. The property that we will need is that, for a typical vertex $x$ of $G$, there is an asymptotic linear relation between the degree of $x$ and the number of $r$-cliques containing $x$. This relation comes from the Lagrange multiplier method. Since we know the extremal function $g_r$, we can determine all Lagrange multipliers and write an explicit relation. Since the graph $G$ is "heavy" on triangles, we can find a typical vertex $x$ that is "heavy" in terms of triangles containing it. When we restrict ourselves to the graph $G'$ induced by the set of neighbours of $x$, the counts of triangles and $r$-cliques in $G$ containing $x$ correspond to the counts of, respectively, edges and $(r-1)$-cliques in $G'$. Some calculations show that $G'$ is too "heavy" on $K_2$ when compared to the number of $(r-1)$-cliques, contradicting the asymptotic result for $r-1$ and finishing the proof.
Thus, the main results on which our proof of Theorem 1.2 for a given $r \ge 4$ crucially relies are the values of $g_r$ and $g_{r-1}$ as well as the asymptotic structure for $r = 3$.
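The reduction from $r$ to $r-1$ in the sketch above rests on a simple counting identity: the $r$-cliques of $G$ through a vertex $x$ are in bijection with the $(r-1)$-cliques of the graph induced by the neighbours of $x$. A minimal brute-force illustration follows; the helper names and the example graph (a wheel with hub $0$ and rim $1,\dots,5$) are chosen here for illustration only.

```python
from itertools import combinations


def is_clique(S, adj):
    """True if every pair of vertices in S is adjacent."""
    return all(v in adj[u] for u, v in combinations(S, 2))


def cliques_through(x, adj, r):
    """Number of r-cliques of the graph that contain the vertex x."""
    others = [v for v in adj if v != x]
    return sum(is_clique((x,) + S, adj) for S in combinations(others, r - 1))


def cliques_in_neighbourhood(x, adj, s):
    """Number of s-cliques in the subgraph induced by the neighbours of x."""
    return sum(is_clique(S, adj) for S in combinations(sorted(adj[x]), s))


# Wheel: hub 0 joined to the 5-cycle 1-2-3-4-5-1 (adjacency as neighbour sets).
wheel = {
    0: {1, 2, 3, 4, 5},
    1: {0, 2, 5},
    2: {0, 1, 3},
    3: {0, 2, 4},
    4: {0, 3, 5},
    5: {0, 1, 4},
}
```

For the hub, the triangles through $0$ correspond to the five rim edges, and there is no $K_4$ through $0$ because the rim is triangle-free.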
We found it more convenient to present our proof in terms of graphons, which are analytic objects representing subgraph densities in large dense graphs. This reduces the number of parameters in various statements. For example, the statement that some "natural" property fails for $o(n)$ vertices corresponds in the limit to the statement that the set of failures has measure $0$. Also, the variational principles are easier to state and derive using the graphon language, especially since the limit versions of the families $\mathcal{H}_{r,n}$, namely the families $\mathcal{H}_r$ defined in Section 1.2, are much cleaner to define. One downside of this is that we have to use various non-trivial (but standard) facts of measure theory. We rectify this by giving discrete analogues of some analytic constructions and properties that we use. Also, we believe that graphons, as a tool in extremal graph theory, are by now standard and widely known.
Organisation. The rest of the paper is organised as follows. We will rephrase our main result in terms of the structure of extremal graphons in the next subsection, Section 1.2. Further properties of graphons and of the family of extremal graphons are discussed in Sections 2.2 and 2.3. This is preceded by Section 2.1, which contains some notions and results of measure theory that we will need. In Section 2.4, we derive Theorem 1.2 from our result on graphons. Then in Section 3, we present the proof of the graphon version of our main result.
1.2. Graphons with minimum clique density. For an introduction to graphons, we refer the reader to the excellent book by Lovász [Lov12].
For the purposes of this paper, it is convenient to define a graphon as a pair $(W,\mu)$, where $W : [0,1] \times [0,1] \to [0,1]$ is a symmetric Borel function and $\mu$ is a non-atomic probability measure on Borel subsets of $[0,1]$. By a small abuse of notation, we may call just the function $W$ a graphon (if the measure $\mu$ is understood). Each graph $G = (V,E)$ with $V = \{v_1,\dots,v_n\}$ corresponds naturally to a graphon $(W_G,\mu)$ with $\mu$ being the Lebesgue measure on $[0,1]$ and $W_G$ being the adjacency function of $G$, which assumes value $1$ on $[\frac{i-1}{n},\frac{i}{n}) \times [\frac{j-1}{n},\frac{j}{n})$ whenever $v_iv_j \in E$ and value $0$ otherwise.

For a graph $F$ on $[r]$, define its homomorphism density in a graphon $W$ by

$$t(F,W) := \int_{[0,1]^r} \prod_{ij \in E(F)} W(x_i,x_j) \, d\mu(x_1)\cdots d\mu(x_r).$$

In particular, $t(K_2,W)$ is called the (edge) density of the graphon $W$. If $W = W_G$, then we get the homomorphism density $t(F,G) := t(F,W_G)$ of $F$ in $G$, which is the probability that a function $f : V(F) \to V(G)$ chosen uniformly at random maps every edge of $F$ to an edge of $G$.
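For finite graphs, the probabilistic description of $t(F,G)$ can be evaluated directly by enumerating all maps $V(F) \to V(G)$. The following brute-force sketch is meant only for very small graphs; the function name `hom_density` and the $0$-based vertex labels are conventions chosen here.

```python
from itertools import product


def hom_density(F_edges, r, G_edges, n):
    """t(F, G): fraction of maps {0..r-1} -> {0..n-1} sending every edge of F
    to an edge of G. Maps need not be injective; G has no loops, so a map
    that collapses an edge of F never counts."""
    adj = {frozenset(e) for e in G_edges}
    good = sum(
        all(frozenset((f[i], f[j])) in adj for i, j in F_edges)
        for f in product(range(n), repeat=r)
    )
    return good / n**r


K2 = [(0, 1)]
K3 = [(0, 1), (0, 2), (1, 2)]
edge_density = hom_density(K2, 2, K3, 3)      # 6 of the 9 maps hit an edge
triangle_density = hom_density(K3, 3, K3, 3)  # 6 of the 27 maps are bijections
```

Note that the edge density of the triangle is $2/3$ and its triangle density is $2/9$, so this pair of densities is exactly the cusp of the triangle problem at $\alpha = 2/3$.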
A sequence of graphons $W_n$ converges to $W$ if for every graph $F$ we have $\lim_{n\to\infty} t(F,W_n) = t(F,W)$. In the special case when $W_n = W_{G_n}$, we get the convergence of graphs $G_n$ to $W$. Let us call two graphons $U$ and $W$ weakly isomorphic if $t(F,U) = t(F,W)$ for every graph $F$. Theorem 13.10 in [Lov12] gives several equivalent definitions of weak isomorphism. Let $[W]$ denote the equivalence class of a graphon $W$ up to weak isomorphism and let $\mathcal{W} := \{[W] : W \text{ is a graphon}\}$. If we fix some enumeration $\mathcal{F} = \{F_1,F_2,\dots\}$ of all graphs up to isomorphism, then one can identify each $[W] \in \mathcal{W}$ with the sequence $(t(F,W))_{F\in\mathcal{F}} \in [0,1]^{\mathcal{F}}$, and the above convergence is the one corresponding to the product topology on $[0,1]^{\mathcal{F}}$. Since the product of compact sets is compact, the closed subspace $\mathcal{W} \subseteq [0,1]^{\mathcal{F}}$ is compact. As $\mathcal{F}$ is countable, every infinite sequence of graphons/graphs has a convergent subsequence. Also, for each $F$, the function $W \mapsto t(F,W)$ is continuous (as it is just the projection on the $F$-th coordinate). Another key property of graphons is that finite graphs are dense in $\mathcal{W}$. Let

Further let
This is a direct analogue of the graph family $\mathcal{H}_{r,n}$: for example, to construct a graphon in $\mathcal{G}_k$ we take a "complete partite" graphon with parts $\Omega_1,\dots,\Omega_k$ and add a triangle-free graphon into the last part. In fact, by Lemma 2.12, $\mathcal{H}_r$ is precisely the set of possible limits of graph sequences $(G_n)_{n=1}^{\infty}$ of increasing orders with each $G_n \in \mathcal{H}_{r,n}$. This (or an easy direct calculation) gives that $t(K_r,W) = h_r(t(K_2,W))$ for each $W \in \mathcal{H}_r$, where $h_r$ is defined in (1.1). Also, note that, by definition, $\mathcal{H}_3 \subseteq \mathcal{H}_4 \subseteq \dots \subseteq \mathcal{W}$.
The result of Reiher [Rei16] that determines the function g r can be equivalently rephrased in the language of graphons as follows.
We call a graphon $W$ $K_r$-extremal if $t(K_r,W) = g_r(t(K_2,W))$. In other words, it is $K_r$-extremal if it has the minimum $K_r$-density among all graphons with the same edge density. In fact, the asymptotic structure result for triangles by Pikhurko and Razborov was first derived via a statement about graph limits (see [PR17, Theorem 2.1]). Our main result in terms of graphons is as follows.
This completely characterises all graphons achieving the equality in Theorem 1.4 (and implies Theorem 1.2, see Section 2.4).
A graph is always a finite graph with non-empty vertex set. For a graph $G = (V,E)$ and a bijection $\varphi$ from $V$ to some set $X$, we denote by $\varphi(G)$ the graph on $X$ with edge set $\varphi(E) := \{\varphi(u)\varphi(v) : uv \in E\}$. The edit distance $|G \triangle H|$ between two graphs $G$ and $H$ of the same order is the minimum of $|E(G) \triangle E(\varphi(H))|$ over all bijections $\varphi : V(H) \to V(G)$; in other words, this is the minimum number of edge edits needed to make $G$ and $H$ isomorphic.
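The edit distance just defined can be computed for tiny graphs by trying every bijection. The sketch below is exponential in the number of vertices and is intended purely as an illustration of the definition; the name `edit_distance` and the labelling of vertices by $0,\dots,n-1$ are choices made here.

```python
from itertools import permutations


def edit_distance(n, edges_g, edges_h):
    """Minimum of |E(G) symmetric-difference E(phi(H))| over all bijections phi
    of {0, ..., n-1}; both graphs are given by edge lists on the same labels."""
    E_g = {frozenset(e) for e in edges_g}
    best = None
    for phi in permutations(range(n)):
        E_h = {frozenset((phi[u], phi[v])) for u, v in edges_h}
        d = len(E_g ^ E_h)  # edges present in exactly one of the two graphs
        best = d if best is None else min(best, d)
    return best


star = [(0, 1), (0, 2), (0, 3)]
path = [(0, 1), (1, 2), (2, 3)]
```

The star and the path on four vertices share at most two edges under any relabelling, so two edits (one deletion and one addition) are needed to turn one into the other.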
The constants in the hierarchies used to state our results have to be chosen from right to left. More precisely, if we claim that a result holds whenever e.g. $c \ll b \ll a_1,\dots,a_s$, then this means that there are coordinatewise non-decreasing functions $f : (0,1] \to (0,1]$ and $g : (0,1]^s \to (0,1]$ such that the claimed result holds whenever $0 < c < f(b)$ and $0 < b < g(a_1,\dots,a_s)$.
2.1. Some notions and results of measure theory. Let us recall some basic notions that apply when $\mathcal{A}$ is a σ-algebra on a set $X$ and $\nu$ is a measure on $(X,\mathcal{A})$. A function $f : X \to \mathbb{R}$ is called $\mathcal{A}$-measurable if the preimage of any open (equivalently, Borel) subset of $\mathbb{R}$ is in $\mathcal{A}$. This class of functions is closed under arithmetic operations, pointwise limits, etc., see e.g. [Coh13, Section 2.1]. A set $Y \subseteq X$ is called ($\nu$-)null if there is $Z \in \mathcal{A}$ with $Z \supseteq Y$ and $\nu(Z) = 0$. We say that a property holds ($\nu$-)a.e. if the set of $x$ where it fails is $\nu$-null. The $\nu$-completion $\mathcal{A}_\nu$ of $\mathcal{A}$ consists of those $A \subseteq X$ for which there exist $B, C \in \mathcal{A}$ with $B \subseteq A \subseteq C$ and $\nu(B) = \nu(C)$; equivalently, $\mathcal{A}_\nu$ is the σ-algebra generated by the union of $\mathcal{A}$ and all $\nu$-null sets.

Let $\mu$ be a probability measure on $([0,1],\mathcal{B})$. By $\mu^k$, we denote the measure on $([0,1]^k,\mathcal{B})$ which is the product of $k$ copies of $\mu$. We call the sets in the $\mu^k$-completion of $\mathcal{B}([0,1]^k)$ measurable and, when $k$ is understood, denote this σ-algebra by $\mathcal{B}_\mu$. We call a function $f : [0,1]^k \to \mathbb{R}$ measurable if it is $\mathcal{B}_\mu$-measurable.

Let us state some results that will be useful for us. The first one is an easy consequence of the countable additivity of a measure, see e.g. [Coh13, Proposition 1.2.5].
Lemma 2.1 (Continuity of measure). For every measure space $(X,\mathcal{A},\nu)$ and every nested sequence $X_0 \subseteq X_1 \subseteq X_2 \subseteq \dots$ of sets in $\mathcal{A}$, the measure of their union $\bigcup_{n\in\mathbb{N}} X_n$ is equal to $\lim_{n\to\infty} \nu(X_n)$.
The following result will allow us to work with just Borel sets and functions.
1} and, for definiteness, we require that infinitely many of b i (x) are 0 (thus do not allow expansions where all digits are eventually 1). Note that each Then g is A-measurable as the countable convergent sum of A-measurable functions. Also, the set where f and g differ is a subset of N := ∪ ∞ i=0 N i , which has measure 0. Some points in A 0 ⊆ N 0 may have g-value greater than 1, so let f ′ (x) := min{g(x), 1}. The set where the A-measurable function f ′ differs from f is still a subset of the null set N , as required.
The following result will be frequently used (allowing us, in particular, to change the order of integration), so we state it fully. For a proof, see e.g. [Coh13, Proposition 5.2.1].
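Theorem 2.3 (Tonelli's theorem). Let $(X,\mathcal{A},\mu)$ and $(Y,\mathcal{C},\nu)$ be σ-finite measure spaces and let $f : X \times Y \to [0,\infty]$ be $(\mathcal{A} \times \mathcal{C})$-measurable. Then the functions $x \mapsto \int_Y f(x,y)\,d\nu(y)$ and $y \mapsto \int_X f(x,y)\,d\mu(x)$ are, respectively, $\mathcal{A}$-measurable and $\mathcal{C}$-measurable, and

$$\int_{X\times Y} f \, d(\mu\times\nu) = \int_X \left(\int_Y f(x,y)\,d\nu(y)\right) d\mu(x) = \int_Y \left(\int_X f(x,y)\,d\mu(x)\right) d\nu(y).$$

Also, we will need the following result.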
Proof. We will use only the special case (X, A) = ([0, 1], B) of the theorem, whose proof is very simple. Namely, the function  [Lov12,Section 13.1]. The motivation for our definition is that our proof requires changing the measure a few times while the assumption that the function W is Borel ensures that all sets and functions that we will encounter are everywhere defined and Borel.
We could have also applied the so-called purification of the graphon introduced by Lovász and Szegedy [LS10] (see also [Lov12,Section 13.3]) which would eliminate a few (simple) applications of the continuity of measure in our proof. However, we decided against using this (non-trivial) result as this could obscure the simplicity of this step.
One consequence of Tonelli's theorem (Theorem 2.3) and the identity B For graph F on [r] with 1, . . . , k designated as roots, define its rooted homomorphism density in W by Note that if F ′ is obtained from a graph F by rooting it on a fixed vertex, then If F has two roots 1 and 2 that are adjacent and F − is obtained from F by removing the When the graphon is undestood, we abbreviate it to d(x). By Tonelli's theorem, d(x) is defined for every x ∈ [0, 1] and . Note that, in the above definition, the function W remains the same and only the measure is changed. With this definition, we have that for every r 2 For example, B r,ℓ consists of those r-tuples from [0, 1] such that the number of m-subtuples that belong to B ⊆ [0, 1] m is exactly ℓ. We will later need the property that B r,2 + is much smaller in measure than B ⊆ [0, 1] 2 . This is not true in general: for instance, consider ). However, the following lemma gives the desired property provided that we can pass to a subset of B first. We call a set B ⊆ [0, 1] 2 symmetric if (x, y) ∈ B implies that (y, x) ∈ B.
Lemma 2.6. Let µ be a non-atomic measure on ([0, 1], B), η −1 ∈ N and B ⊆ [0, 1] 2 be a symmetric Borel set. Then there exists a symmetric Borel subset C ⊆ B satisfying the following for all r ≥ 3: Clearly, C is a symmetric Borel set that satisfies (C1). Given x = (x 1 , x 2 ) ∈ [0, 1] 2 , we consider the following random experiment. We choose x 3 , . . . , x r ∈ [0, 1] independently at random with respect to the probability measure µ. Let E be the event that |{ij ∈ [r] 2 : (x i , x j ) ∈ C}| ≥ 2. Note that by the definition of C, if the event E happens then at least one of x 3 , . . . , x r lies inside I i 0 ∪ I j 0 . Thus the probability of E satisfies This implies (C2).

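2.3. Properties of $\mathcal{H}_r$ and $h_r$. Everywhere in this section, $r \ge 3$ is fixed. In order to deal with the graphon family $\mathcal{H}_r$, it is convenient to define some related parameters. For $t, \ell \in \mathbb{N}$ with $\ell \ge 2$ and $\gamma \in \mathbb{R}$, define

$$\kappa_{\ell,t}(\gamma) := \ell!\left(\binom{t}{\ell}\gamma^{\ell} + \binom{t}{\ell-1}\gamma^{\ell-1}(1-t\gamma)\right).$$

Note that if $0 \le \gamma \le \frac{1}{t}$ then $\kappa_{\ell,t}(\gamma)$ is the asymptotic density of $\ell$-cliques in a complete partite graph with $t$ parts of size $\gamma n$ and one part of size $(1-t\gamma)n$ as $n \to \infty$. Next, for $t \in \mathbb{N}$ and real $x \le 1-\frac{1}{t+1}$, define $p_{\ell,t}(x) := \kappa_{\ell,t}(\gamma_t(x))$, where

$$\gamma_t(x) := \frac{1}{t+1}\left(1+\sqrt{1-\frac{(t+1)x}{t}}\right). \tag{2.5}$$

This formula comes from taking $\gamma_t(x)$ to be the larger root $\gamma$ of the quadratic equation

$$\kappa_{2,t}(\gamma) = x. \tag{2.6}$$

With this preparation we are ready to define the two main parameters, $k$ and $c$, associated to edge density $\alpha \in [0,1)$, namely

$$k = k(\alpha) \text{ is the unique } t \in \mathbb{N} \text{ with } \alpha \in I_t := \left[1-\tfrac{1}{t},\, 1-\tfrac{1}{t+1}\right) \quad\text{and}\quad c = c(\alpha) := \gamma_k(\alpha). \tag{2.7}$$

Note that $c = c(\alpha)$ in (2.7) is the same as in (1.1) and Definition 1.3; also, (2.5) gives an explicit formula for $c$.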
In other words, the function $c : [0,1) \to (0,1]$ is obtained by taking $\gamma_t$ on the interval $I_t$ for $t \in \mathbb{N}$. (Recall that these intervals partition $[0,1)$.) Since the left and right limits of the function $c$ coincide at any internal boundary point $1-\frac{1}{t+1}$, with $t \in \mathbb{N}$ (namely, both are $\frac{1}{t+1}$), the explicit formula in (2.5) gives that $c$ is a continuous and strictly monotone decreasing function on $[0,1)$.
In particular,

$$\frac{1}{k+1} < c \le \frac{1}{k}. \tag{2.8}$$

Furthermore, $h_r(\alpha) = \kappa_{r,k}(c) = p_{r,k}(\alpha)$, where the function $h_r(\alpha)$ was defined in (1.1). In fact, as we will show in Lemma 2.11, $h_r(\alpha) = \max\{p_{r,t}(\alpha) : t \ge k\}$, that is, $h_r$ is the maximum of those functions $p_{r,t}$ that are defined at a given point, with $p_{r,t}$ being a largest one on $I_t$.
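The identity $h_r(\alpha) = \max\{p_{r,t}(\alpha) : t \ge k\}$ can be sanity-checked numerically on a grid. The sketch below assumes the forms $\kappa_{\ell,t}(\gamma) = \ell!\bigl(\binom{t}{\ell}\gamma^{\ell} + \binom{t}{\ell-1}\gamma^{\ell-1}(1-t\gamma)\bigr)$, $\gamma_t(x) = \frac{1}{t+1}\bigl(1+\sqrt{1-(t+1)x/t}\bigr)$ and $p_{r,t} = \kappa_{r,t}\circ\gamma_t$ on $x \le 1-\frac{1}{t+1}$; the helper `max_violation` and its grid parameters are hypothetical. A numerical check of this kind is of course no substitute for the proof.

```python
from fractions import Fraction
from math import comb, factorial, sqrt


def kappa(l, t, gamma):
    """Assumed l-clique density: t parts of size gamma, one of size 1 - t*gamma."""
    return factorial(l) * (
        comb(t, l) * gamma**l + comb(t, l - 1) * gamma ** (l - 1) * (1 - t * gamma)
    )


def gamma_t(t, x):
    """Larger root of 2*t*g - t*(t+1)*g^2 = x (clamped against rounding noise)."""
    return (1 + sqrt(max(0.0, 1 - (t + 1) * float(x) / t))) / (t + 1)


def p(r, t, x):
    return kappa(r, t, gamma_t(t, x))


def h(r, alpha):
    """h_r(alpha) = p_{r,k}(alpha) with k = floor(1 / (1 - alpha)), alpha < 1."""
    k = int(1 / (1 - Fraction(alpha)))
    return p(r, k, alpha)


def max_violation(r, t_max=6, steps=200):
    """Largest value of p_{r,t}(x) - h_r(x) over a rational grid, restricted
    to the points x <= t/(t+1) where p_{r,t} is defined."""
    worst = 0.0
    for t in range(1, t_max + 1):
        for i in range(steps):
            x = Fraction(i, steps)
            if x <= Fraction(t, t + 1):
                worst = max(worst, p(r, t, x) - h(r, x))
    return worst
```

On this grid `max_violation(3)` and `max_violation(4)` stay at the level of floating-point noise, consistent with $h_r$ dominating every $p_{r,t}$ wherever the latter is defined.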
The following lemma computes the first two derivatives of $p_{r,t}$ (and thus of $h_r$ in all interior points of each $I_t$), where we write these derivatives in terms of $\gamma_t$ for convenience. Note that $h_r$ is not differentiable at the points $1-\frac{1}{t}$ for integer $t \ge r-1$: the left and right derivatives of $h_r$ exist at these points but are different.
The following lemma directly follows from the previous lemma and Taylor's approximation.
Lemma 2.8. Let $\alpha \in [0,1)$. Let $k \in \mathbb{N}$ and $c$ be as in (2.7). If $\alpha \ne 1-\frac{1}{k}$ (that is, $\alpha$ is in the interior of $I_k$), then there is $\varepsilon > 0$ such that for each $\alpha' = \alpha \pm \varepsilon$, we have $\alpha' \in I_k$ and

The following lemma proves Theorem 1.6 for the special edge densities where the function $h_r$ is not differentiable.
Proof. The quickest way to prove the lemma is to use the weaker version of a result of Lovász and Simonovits [LS83, Theorem 2] that every graph of order $n \to \infty$ with $(\alpha+o(1))\binom{n}{2}$ edges and $(h_r(\alpha)+o(1))\binom{n}{r}$ copies of $K_r$ is $o(n^2)$-close in the edit distance to the Turán graph $T_t(n)$. Applying this result to a sequence of graphs $(G_n)_{n=1}^{\infty}$, where $G_n$ has $n$ vertices, that converges to the graphon $W$, we can make each $G_n$ into $T_t(n)$ by changing $o(n^2)$ adjacencies. This change does not affect the convergence to $W$. Now, the limit of the $t$-partite Turán graphs is clearly $[W_{K_t}]$, giving the required result.
Remark 2.10. Alternatively, one can prove Lemma 2.9 operating with graphons only. Namely, the proof of Lovász and Simonovits [LS83, Theorems 1-2] for graphons would be to write $t(K_r,W)/t(K_2,W)$ as a telescopic product over $3 \le s \le r$ of $t(K_s,W)/t(K_{s-1},W)$ and bound each ratio separately, using the Cauchy-Schwarz Inequality (with double counting replaced by Tonelli's theorem). In particular, since $t(K_r,W)$ is smallest possible, the graphon $W$ also minimises the triangle density for the given edge density $\alpha = 1-\frac{1}{t}$. By unfolding the corresponding argument from [LS83], one can show that the induced density of 3-sets spanning exactly one edge is $0$. It follows with a bit of work that, similarly to graphs, $W$ is a complete partite graphon a.e. Now, a routine optimisation (see e.g. [Nik11, Theorem 1.3]) shows that, apart from a null set, there are exactly $t$ parts of equal measure.
We will also need the following result, essentially a consequence of the piecewise concavity of the function h r .
Lemma 2.11. For every $t \in \mathbb{N}$ and $\alpha \in [0, 1-\frac{1}{t})$, we have that $h_r(\alpha) \ge p_{r,t}(\alpha)$.

Proof. Let $x_0 := 1-\frac{1}{t}$ and define $L_{r,t}(x) := p_{r,t}(x_0) + p'_{r,t}(x_0)(x-x_0)$ for $x \in \mathbb{R}$. In other words, $y = L_{r,t}(x)$ is the line tangent to the curve $y = p_{r,t}(x)$ at $x = x_0$. Since $\gamma_t(x) \ge \frac{1}{t} > \frac{1}{t+1}$ for $x \le x_0$ by (2.5), Lemma 2.7 gives that the function $p_{r,t}$ has negative second derivative and is thus concave. We conclude that $p_{r,t}(x) \le L_{r,t}(x)$ for all $x \le x_0$.
Thus we are done if we show that $h_r(x) \ge L_{r,t}(x)$ for all $x \in [0,x_0]$. Note that $h_r(x_0) = L_{r,t}(x_0)$. Since $h_r$ is a continuous function which is differentiable for every $x \in [0,x_0]$ apart from finitely many points, it is enough to show, by the Mean Value Theorem, that $h'_r(x) \le p'_{r,t}(x_0)$ for each $x \in [0,x_0]$ where $h_r$ is differentiable. So, let $x \in I_s$ with $0 \le s < t$. Since $h_r = p_{r,s}$ on $I_s$ and, by Lemma 2.7, the derivative $p'_{r,s}$ is a decreasing function, it is enough to check that $p'_{r,s}(1-\frac{1}{s}) \le p'_{r,t}(1-\frac{1}{t})$. Note that $\gamma_m(1-\frac{1}{m}) = \frac{1}{m}$ for each $m \in \mathbb{N}$. If $s \le r-2$, then by Lemma 2.7 we have that $p'_{r,t}(x_0) \ge 0$, also giving the desired inequality.

Lemma 2.12. Every graphon $(W,\mu)$ in $\mathcal{H}_r$ is the limit of some sequence $(H_n)_{n=1}^{\infty}$ where $H_n \in \mathcal{H}_{r,n}$ for each integer $n \ge 1$. Also, for all integers $n_1 < n_2 < \dots$ and graphs $H_i \in \mathcal{H}_{r,n_i}$ such that the sequence $(H_i)_{i=1}^{\infty}$ converges, its limit is in $\mathcal{H}_r$.

Proof. Assume that $\alpha := t(K_2,W) < 1$ (as otherwise we can take $H_n$ to be the complete graph) and that $t(K_r,W) > 0$ (as $\mathcal{H}_{r,n}$ contains all $K_r$-free graphs of order $n$). Let $\Omega_1 \cup \dots \cup \Omega_k$ be the partition of the underlying space $[0,1]$ for the graphon $W$, as in Definition 1.3.
For each $n \ge 1$, let $G_n \sim \mathbb{G}(n,W)$ be a graph on $[n]$ which is an $n$-vertex sample of $W$, that is, we pick $n$ points $x_{n,1},\dots,x_{n,n} \in [0,1]$ using the probability measure $\mu$ and make $i,j \in [n]$ adjacent with probability $W(x_{n,i},x_{n,j})$, with all choices being independent. Then the sequence $G_n$ converges to $W$ with probability 1, see Lovász and Szegedy [LS06, Corollary 2.6]. Each graph $G_n$ comes with a vertex partition $V_{n,1},\dots,V_{n,k}$, where we put $i \in V(G_n)$ into $V_{n,j}$ if $x_{n,i} \in \Omega_j$. By the Chernoff Bound, we have that $|V_{n,j}|/n$ converges to $\mu(\Omega_j) = c$ for every $j \in [k]$ as $n \to \infty$, with probability 1. Since $W$ is an (explicit) $\{0,1\}$-valued function, we know all edges of $G_n$ apart from the ones inside $V_{n,k}$. Using that $\lim_{n\to\infty} t(K_s,G_n) = t(K_s,W)$ in the special cases $s = 2,3$, we derive that $V_{n,k}$ induces $o(n^3)$ triangles in $G_n$ as well as the asymptotically correct number of edges. Fix a sequence $(G_n)_{n=1}^{\infty}$ that satisfies all above properties.
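The sampling step can be illustrated on a simplified graphon. The sketch below assumes, purely for illustration, that $\mu$ is the uniform measure, that the first $k-1$ parts are consecutive intervals of length $c$ with the last part being the remainder of $[0,1]$, and that $W$ equals $1$ between different parts and $0$ inside every part (so the triangle-free graphon on the last part is taken to be empty). Since this $W$ is $\{0,1\}$-valued, no coin flips are needed once the points are drawn. The names `part_of` and `sample_graph` are hypothetical.

```python
import random
from itertools import combinations


def part_of(x, k, c):
    """Index in 0..k-1 of the part containing x: the first k-1 parts are
    intervals of length c, the last part is the remainder of [0, 1]."""
    return min(int(x / c), k - 1)


def sample_graph(n, k, c, seed=0):
    """Draw n uniform points and return (part labels, edge list) of the
    W-random graph for the complete k-partite {0,1}-valued graphon."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    part = [part_of(x, k, c) for x in xs]
    edges = [(i, j) for i, j in combinations(range(n), 2) if part[i] != part[j]]
    return part, edges


part, edges = sample_graph(600, 3, 0.4)
sizes = [part.count(j) for j in range(3)]  # expected proportions 0.4, 0.4, 0.2
```

The part sizes concentrate around $n\mu(\Omega_j)$, and the sampled graph is exactly the complete partite graph on the sampled partition, which is the sense in which all edges outside the last part are known.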
Now we are ready to show that the edit distance between $G_n$ and some graph in $\mathcal{H}_{\alpha,n}$ is $o(n^2)$, which will be enough to prove the first part of the lemma. For each $n$, move $o(n)$ vertices between the parts of $G_n$ so that $|V_{n,i}| = \lfloor cn \rfloor$ for each $i \in [k-1]$. (The new adjacencies of a moved vertex are determined by its new part, except that if we move a vertex into $V_{n,k}$ we make it adjacent to e.g. every other vertex for definiteness.) The new graphs $G_n$ still satisfy the above properties and have the correct part sizes. Using the Triangle Removal Lemma [RS78, EFR86] (see e.g. [KS96, Theorem 2.9]), we make $G_n[V_{n,k}]$ triangle-free by changing $o(n^2)$ adjacencies. The definition of $\mathcal{H}_{\alpha,n}$ requires exactly $\lfloor cn \rfloor \cdot (|V_{n,k}| - \lfloor cn \rfloor)$ edges in $V_{n,k}$. This can be achieved by [PR17, Lemma 2.2], which states that if $G$ is a triangle-free graph with $m \to \infty$ vertices and $s = e(G) + o(m^2)$ is at most $t_2(m)$, then $G$ is $o(m^2)$-close in the edit distance to a triangle-free graph with exactly $s$ edges, as desired.
Let us show the second part of the lemma. Assume that a sequence $(H_i)_{i=1}^{\infty}$ contradicts it. As $\mathcal{H}_r$ contains the constant-1 graphon, the limiting density $\alpha := \lim_{i\to\infty} t(K_2,H_i)$ must be in $[0,1)$. Also, $\lim_{i\to\infty} t(K_r,H_i) > 0$ since $\mathcal{H}_r$ contains all graphons with zero $K_r$-density.
By the compactness of $\mathcal{W}$, some subsequence of $(F_i)_{i=1}^{\infty}$ converges to some graphon $W'$. The limiting graphon $W'$ has zero triangle density. Since we know all edges of $H_i$ except inside $U_i$, the graphon $W'$ has the correct edge density. Now, define $W \in \mathcal{H}_r$ as in Definition 1.3 with $c = c(\alpha)$, $k = k(\alpha)$, and $W[\Omega_k]$ being weakly isomorphic to $W'$. Since we know all adjacencies except inside $\Omega_k$, a routine calculation shows that $H_i$ converges to $W$, as required.
2.4. Asymptotic structure from extremal graphons. We are ready to show that Theorem 1.6 implies Theorem 1.2 by adapting the analogous step from [PR17, Section 2.2].
Proof of Theorem 1.2. Suppose for the sake of contradiction that Theorem 1.2 is false, which is witnessed by some $r \ge 4$ and $\varepsilon > 0$. Thus we can find a sequence $(G_n)_{n\in\mathbb{N}}$ of graphs of increasing orders $v_n := v(G_n)$ such that $t(K_r,G_n) = g_r(t(K_2,G_n)) + o(1)$ and each $G_n$ is $\varepsilon v_n^2$-far in the edit distance from $\mathcal{H}_{r,v_n}$. By using the compactness of $\mathcal{W}$ and passing to a subsequence, we can additionally assume that the sequence $(G_n)_{n\in\mathbb{N}}$ is convergent to some graphon $W$. Let $\alpha := t(K_2,W)$. Clearly, $t(K_r,W) = g_r(\alpha)$. By Theorem 1.6, $[W] \in \mathcal{H}_r$. By Lemma 2.12, pick $H_n \in \mathcal{H}_{r,v_n}$ such that the sequence $(H_n)_{n\in\mathbb{N}}$ converges to $W$.
For two graphs G and H of the same order n, define the cut distanceδ (G, H) to be the minimum over all bijections φ : give thatδ (G n , H n ) → 0. Namely, [BCL + 08, Theorem 2.7] states that if two graphs have similar subgraph densities, then they are close in the fractional cut-distance δ (which is defined the same way asδ except, informally speaking, φ distributes each vertex of H fractionally among V (G)), while [BCL + 08, Theorem 2.3] provides an upper bound ofδ in terms of δ .
Up to relabelling of each H n , assume thatd(G n , H n ) → 0. Take any n ∈ N and let v : (2.9), then we conclude that V i spans o(v 2 ) edges (resp. V i is almost complete to the rest). Thus, by changing o(v 2 ) adjacencies in G n , we can assume that the graphs G n and H n coincide except for the subgraphs induced by U . Suppose that |U | = Ω(v) for otherwise we get a contradiction to Lemma 2.9. We have Of course, when we modify o(v 2 ) adjacencies in G n , then the K r -density changes by o(1). Also, each edge of G n [U ] (and of H n [U ]) is in the same number of r-cliques whose remaining r−2 vertices are in V (G n )\U . Since H n [U ] is triangle-free and G n is asymptotically extremal, we conclude that G n [U ] spans o(v 3 ) triangles. We can change o(v 2 ) adjacencies and make G n [U ] to be triangle-free by the Triangle Removal Lemma and have the "correct" number of edges by [PR17, Lemma 2.2]. The obtained graph (which is o(v 2 )-close in the edit distance to G n ) is in H r,v , contradicting our assumption.

3. Proof of the main result
Suppose that Theorem 1.6 is not true. Let $r \ge 3$ be the minimum integer such that there exists a $K_r$-extremal graphon $W = (W,\mu)$ which does not belong to $\mathcal{H}_r$. Let $\alpha := t(K_2,W)$, $k := k(\alpha)$, and $c := c(\alpha)$ as in (2.7).
Our strategy is as follows. Using (W1), we will find a point $x \in [0,1]$ such that $t_x(K_r,W)$ is small while $t_x(K_3,W)$ is large. Note that (2.2) translates these two values together with $d_W(x)$ into $t(K_{r-1},N_W(x))$ and $t(K_2,N_W(x))$. Hence, this will eventually enable us to translate the assumption $[W] \notin \mathcal{H}_r$ into some conclusion about $N_W(x)$, which will violate Theorem 1.4 for $r-1$.
In order to work in N W (x), we need to relate d W (x) and t x (K r , W ). For this purpose, we will make use of the following auxiliary functions. For integer t 3 and real x ∈ [0, 1], define q t (x) : By Tonelli's theorem (Theorem 2.3), d W (x) and t x (K t , W ) are (everywhere defined) Borel functions of x ∈ [0, 1], so q t and f t are also Borel. Later, in Claim 1, we will show that f r (x) = 0 for almost all x, which provides the desired relation between d W (x) and t x (K r , W ). We first prove the following lemma, which partly motivates the definition of the function f t .
Lemma 3.1. For each integer t 3, we have Proof. By definition, we have Recalling the definition of h t (α) from (1.1), one can see that the right hand side above simplifies to h t (α) − t(K t , W ), as desired.
We shall try to locate the desired point x ∈ [0, 1] as outlined above in the following subsections.
3.1. Almost all points are "K r -typical". We further introduce the following two sets. Let that is, M 0 is the set of "K r -atypical" points. Let that is, N 0 is the set of "K 3 -heavy" points. Note that both sets are Borel as f 3 and f r are Borel functions. We first show that M 0 is of negligible measure.
Proof. The statement that $f_r = 0$ a.e. follows with some calculations from Razborov's differential calculus [Raz07, Corollary 4.6]. Informally speaking, the quantity $f_r(x)$ measures the "contribution" of $x$ to $h_r(t(K_2,W)) - t(K_r,W)$. The terms of $f_r$ that are linear in $d(x)$ and $t_x(K_r,W)$ give the gradient when we increase or decrease the density of $\mu$ at $x$ (while the constant term is chosen to make the average of $f_r$ zero). Here $\alpha = t(K_2,W)$ is in the interior of $I_k$, where $h_r$ is differentiable. Since we cannot push $h_r(t(K_2,W)) - t(K_r,W) = 0$ into positive values by Theorem 1.4, it follows that $f_r(x) = 0$ for almost every $x \in [0,1]$.
We next show that the set N 0 , which consists of "K 3 -heavy" points, has positive measure.
Proof. For each γ > 0, let N γ := {x ∈ [0, 1] : f 3 (x) < −γ}. Let On the other hand, we have that, rather roughly, f 3 (x) ≥ q 3 (x) − 1 −k 2 for all x ∈ [0, 1]. Thus we have  which one can think of as the set of edges that are "K r -heavy". As W is Borel, the set B * is Borel. The following claim states that most of the pairs of "adjacent" points are not contained in too many copies of K r .
In terms of graphs, the argument roughly says that if, on the contrary, Ω(n 2 ) edges of an almost extremal (n, m)-graph G are each in too many copies of K r (namely, in at least H r (n, m) − H r (n, m − 1) + Ω(n r−2 ) copies), then by removing a carefully selected subset of such edges we can destroy so many r-cliques so that the asymptotic result (Theorem 1.4) is violated. Again, let us give a direct proof of the claim. Suppose µ 2 (B * ) > 0. For each ε > 0, let is Borel by Tonelli's theorem. As {B ε } ε>0 forms a collection of nested sets and ε>0 B ε = B * , there is ε > 0 such that the µ 2 (B ε ) ≥ ε. We fix such ε > 0. By lowering the value of ε if necessary, and choosing a constant η, we assume that 0 < η ≪ ε ≪ µ 2 (B), α, 1/r, 1/k, c.
We can now show that D, the set of "large degree" points, is negligible, thus imposing an additional "maximum degree" condition on our graphon. Proof. In the graph theory language, the argument is informally as follows. Claim 3 bounds the number of r-cliques per typical edge of an almost extremal graph G. This, by double counting, bounds the number of r-cliques per typical vertex x in terms of its degree. On other hand, the last two parameters are linearly related by Claim 1. Putting all together, we derive the claim. ≤ µ({y ∈ Ω : (x, y) ∈ B * }) + (k − 1) (r−2) c r−2 d W (x) where the final inequality follows from the assumption x / ∈ B(0). Rearranging this, we obtain showing that x / ∈ D. Hence, D ⊆ M 0 ∪ B(0) as claimed.