Square functions and the Hamming cube: Duality

For $1<p\leq 2$, any $n\geq 1$ and any $f:\{-1,1\}^{n} \to \mathbb{R}$, we obtain $(\mathbb{E} |\nabla f|^{p})^{1/p} \geq C(p)(\mathbb{E}|f|^{p} - |\mathbb{E}f|^{p})^{1/p}$ where $C(p)$ is the smallest positive zero of the confluent hypergeometric function ${}_{1}F_{1}(\frac{p}{2(1-p)}, \frac{1}{2}, \frac{x^{2}}{2})$. Our approach is based on a certain duality between the classical square function estimates on the Euclidean space and the gradient estimates on the Hamming cube.

Theorem 1.1. For any $1 < p \leq 2$, $n \geq 1$, and any $f : \{-1,1\}^{n} \to \mathbb{R}$ we have
$$\left(\mathbb{E} |\nabla f|^{p}\right)^{1/p} \geq s_{p'}\left(\mathbb{E}|f|^{p} - |\mathbb{E}f|^{p}\right)^{1/p}. \qquad (1)$$
Here $p' = \frac{p}{p-1}$ is the conjugate exponent of $p$, and by $s_{q}$ we denote the smallest positive zero of the confluent hypergeometric function ${}_{1}F_{1}\left(-\frac{q}{2}, \frac{1}{2}, \frac{x^{2}}{2}\right)$ (see (6) for the definition).
In Lemma A.2 we obtain a lower bound $s_{p} \geq \sqrt{2/p}$ for $1 < p \leq 2$, which is precise in the limit $p \to 2$. If $p = 2k$ for $k \in \mathbb{N}$, then $s_{p}$ becomes the smallest positive zero of the Hermite polynomial $H_{2k}(x)$, where $H_{m}(x) = \int_{\mathbb{R}} (x + iy)^{m} \frac{e^{-y^{2}/2}}{\sqrt{2\pi}}\, dy$.
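These constants are easy to compute numerically. The following sketch (our own helper `s_zero` and scan bounds, not from the paper) locates $s_q$ as the first positive zero of ${}_1F_1(-\frac{q}{2},\frac{1}{2},\frac{x^2}{2})$ with SciPy, and checks the Hermite identity at $q = 4$ (where $H_4(x) = x^4 - 6x^2 + 3$ in the normalization above) together with the lower bound $\sqrt{2/q}$:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import hyp1f1

def N(q, x):
    """Confluent hypergeometric function 1F1(-q/2, 1/2, x^2/2)."""
    return hyp1f1(-q / 2.0, 0.5, x * x / 2.0)

def s_zero(q, hi=3.0, steps=4000):
    """Smallest positive zero s_q of x -> N(q, x): scan for the first
    sign change, then refine with Brent's method (hypothetical helper)."""
    xs = np.linspace(1e-8, hi, steps)
    vals = np.array([N(q, x) for x in xs])
    for i in range(len(xs) - 1):
        if vals[i] > 0 >= vals[i + 1]:
            return brentq(lambda x: N(q, x), xs[i], xs[i + 1])
    raise ValueError("no sign change found; increase hi")

# s_2 = 1, since N(2, x) = 1 - x^2.
print(s_zero(2))
# s_4 equals the smallest positive zero sqrt(3 - sqrt(6)) of H_4.
print(s_zero(4), np.sqrt(3 - np.sqrt(6)))
# Lower bound s_q >= sqrt(2/q) (Lemma A.2), spot-checked for a few q >= 2.
for q in (2, 3, 4, 6, 8):
    assert s_zero(q) >= np.sqrt(2.0 / q) - 1e-9
```

The scan-then-refine pattern avoids assuming anything about the location of the zero beyond $s_q \leq 1$ for $q \geq 2$.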
The constant $s_{p'}$ in (1) is larger than all previously known bounds [15,2] when $p$ is in a neighborhood of $2$, say $p \in (1.26, 2)$. For example, the estimate (1) improves the Naor-Schechtman bound [15] for the class of real valued functions for all $1 < p < 2$. Indeed, it follows from an application of the Khinchin inequality with the sharp constant, together with (1), that we have the following corollary, where $\mathbb{E}_x$ and $\mathbb{E}_{x'}$ average in the variables $x$ and $x' = (x'_1, \ldots, x'_n) \in \{-1,1\}^n$ correspondingly.
On the other hand, $s_{p'}$ degenerates to $0$ when $p \to 1+$, which should not be the case for the best possible constant by a result of Talagrand (see Section 3.5). For this endpoint case, when $p$ is close to $1$, the result of Ben-Efraim-Lust-Piquard [2] gives better bounds, and when $p = 1$ it is widely believed that the sharp constant in the left hand side of (3) should be $\sqrt{2/\pi}$ instead of $2/\pi$ (see Section 3.5 for more details).
We think that the main contribution of the current paper is not so much Theorem 1.1 itself as the new duality approach that we develop between two different classes of extremal problems: square function estimates on the interval $[0,1]$ and gradient estimates on the Hamming cube; Theorem 1.1 should be considered as one example. Roughly speaking, one can take a valid estimate for a square function, dualize it by a certain double Legendre transform, and write down the corresponding dual estimate on the Hamming cube, and vice versa. To illustrate our duality approach with another example, in Section 3.4 we present a short proof of the following theorem, which improves a well-known inequality of Beckner.
Theorem 1.3 (see [11]). For any $n \geq 1$ and any $f : \{-1,1\}^n \to \mathbb{R}$ we have (40), where $\Re$ denotes the real part, and $z^{3/2}$ is understood in the sense of the principal branch in the upper half-plane.
Going back to Theorem 1.1, it will be explained later that $s_{p'}$ coincides, in a "dual" sense, with the sharp constant found by B. Davis in the $L^q$ norm estimates (4) and (5). Here $B_t$ is the standard Brownian motion starting at zero, and $T$ is any stopping time. It was explained in [8] that the same sharp estimates (4) and (5) hold with $B_T$ replaced by an integrable function $g$ on $[0,1]$ with mean zero, and $T^{1/2}$ replaced by the dyadic square function of $g$.
We notice the essential difference between the Davis estimates (4), (5) and our estimate (1): for a given power $p$, $1 < p \leq 2$, the theorem requires the "dual" constant $s_{p'} = s_{p/(p-1)}$. Besides, inequality (1) cannot be extended to the full range of exponents $p$ with some finite strictly positive constant $c(p)$, unlike (4) and (5) (see [8,4,6] and (49)).

Proof of the main result

2.1 An anonymous Bellman function
In this section we define a function $U : \mathbb{R}^2 \to \mathbb{R}$ that satisfies some special properties. Let $\alpha \geq 2$ and let $\beta = \frac{\alpha}{\alpha-1} \leq 2$ be the conjugate exponent of $\alpha$. Let $N_\alpha(x) := {}_1F_1\left(-\frac{\alpha}{2}, \frac{1}{2}, \frac{x^2}{2}\right)$ be the confluent hypergeometric function. $N_\alpha$ satisfies the Hermite differential equation with initial conditions $N_\alpha(0) = 1$ and $N_\alpha'(0) = 0$. Let $s_\alpha$ be the smallest positive zero of $N_\alpha$, and let $u_\alpha$ be the associated smooth even concave function; the concavity follows from Lemma A.1 and the fact that $N_\alpha'(s_\alpha) < 0$. Finally we define $U(p,q) = |q|^\alpha u_\alpha(p/|q|)$. The function $U(p,q)$ appeared for the first time in [8]. Later it was also used in [20,21] in the form $u(p,t) = U(p, \sqrt{t})$, $t \geq 0$. It was explained in [8] that $U(p,q)$ satisfies the following properties:

$U(p,q) \geq \frac{|q|^\alpha}{s_\alpha^\alpha} - |p|^\alpha$ for all $(p,q) \in \mathbb{R}^2$, with equality when $q = 0$; $\quad$ (9)

$2U(p,q) \geq U(p+a, \sqrt{a^2+q^2}) + U(p-a, \sqrt{a^2+q^2})$ for all $(p,q,a) \in \mathbb{R}^3$. $\quad$ (10)
We shall refer to (9) as the obstacle condition, and to (10) as the main inequality. We caution the reader that (10) may not be found written explicitly in [8]; one will instead find its infinitesimal form (11), which follows from the main inequality by expanding it into a Taylor series in $a$ near $a = 0$ and comparing the second order terms. Here $u_{pp}$ is defined everywhere except on the curve $|p/\sqrt{t}| = s_\alpha$, where $u$ is only once differentiable.
In fact, the reverse implication also holds, i.e., one can derive (10) from (11) for this special U. This was done in the PhD thesis of Wang [21] but we will present a short proof in Section A.2, which partly follows the Davis argument. Essentially the same argument also appeared later in [1] in a slightly different setting.
The function $U(p,q)$ is essential in obtaining the result in the Davis paper, namely it is used in the proof of (4), and the argument goes as follows. First one shows that the relevant process is a supermartingale, which is guaranteed by (11). Finally, the optional stopping theorem yields (4). One may notice that $U(p,q)$ is the minimal function with properties (9) and (10). Davis mentions that the proof presented in his paper was suggested by an anonymous referee, and this explains the title of the current section.
Lemma 2.1. For each $(x, y) \in \mathbb{R} \times \mathbb{R}_+$ there exists $(p^*, q^*) = (p^*(x,y), q^*(x,y))$ such that
$$\min_{q \leq 0} \sup_{p \in \mathbb{R}} \Psi(p, q, x, y) = \max_{p \in \mathbb{R}} \inf_{q \leq 0} \Psi(p, q, x, y) = \Psi(p^*, q^*, x, y),$$
and, moreover, (13) and (14) hold.
Proof. First let us show that for each fixed $(x, y)$ the function $\Psi(p, q, x, y)$ is convex in $q$ and concave in $p$. The concavity in $p$ follows from Lemma A.1 and the fact that $U$ is even and $C^1$ smooth in $p$. Since $u_\alpha(z)$ coincides with $N_\alpha(z)$ up to a positive constant, the convexity follows from Lemma A.1 and the fact that $\alpha \geq 2$.

Proof.
Lemma 2.1 gives points $(p^*, q^*)$ and $(p_\pm, q_\pm)$ corresponding to $(x, y)$ and $(x_\pm, y_\pm)$. It follows from (14) that to prove (17) it would be enough to find numbers $p \in \mathbb{R}$ and $q_1, q_2 \leq 0$ satisfying (18); let us explain this choice in detail. Notice that, by the Cauchy–Schwarz inequality and denoting $r_j^2 = q_j^2 - (q^*)^2$ for $j = 1, 2$, we see that it is enough to find $p \in \mathbb{R}$ and $r_1, r_2 \geq 0$ with the required property. By choosing $p = \frac{p_+ + p_-}{2}$ and substituting the values $x_\pm = x \pm a$, we see that it would suffice to find suitable $r_1, r_2 \geq 0$, and with the choice made in (18) the desired inequality follows from (10).
To verify the obstacle condition (16), notice that (9) for $U(p,q)$ gives the required bound. Finally, if $y = 0$, then we obtain the desired conclusion, where equality (*) follows from the fact that the relevant map is an even convex function.
For any $a, x \in \mathbb{R}$, all $y, b \in \mathbb{R}^N$, and any $N \geq 1$, we have (20). Proof. It follows from the definition of $M$ that the map $y \mapsto M(x, y)$ is decreasing for $y \geq 0$. Therefore by (17) and the triangle inequality we obtain (20). The inequality (20) gives rise to the estimate (21); indeed, the reader can find in [11] the passage from (20) to (21). In fact, inequality (20) is the same as (22), where $\mathbb{E}_{x_j}$ takes the average in the coordinate $x_j$, i.e., $\mathbb{E}_{x_j} f(x) = \frac{1}{2}\left(f(x_1, \ldots, x_j, \ldots, x_n) + f(x_1, \ldots, -x_j, \ldots, x_n)\right)$. The rest follows by iterating (22), using the fact that $\mathbb{E} = \mathbb{E}_{x_1} \cdots \mathbb{E}_{x_n}$ and $|\nabla \mathbb{E} f| = 0$.

The proof of Theorem 1.1
Combining the estimates obtained above, we arrive at inequality (1).

Remarks and Applications
where $g_I = \frac{1}{|I|}\int_I g$. The square function $S(g)$ is defined accordingly. For convenience we always assume that the number of nonzero terms in (23) is finite, so that $S(g)(x)$ makes sense. Let $O(p,q)$ be a continuous real valued function, and suppose one wants to estimate the quantity $\int_0^1 O(g, S(g))$ from above in terms of $\int_0^1 g$. If one finds a function $U(p,q)$ with properties (24) and (25), then one obtains (see [20]) the corresponding bound. Conversely, suppose that the inequality holds for all integrable functions $g$ on $[0,1]$ and some $F$. Then there exists $U(p,q)$ such that the conditions (24), (25) are satisfied and $U(p,0) \leq F(p)$. Indeed, consider the extremal problem (27). This $U$ satisfies (24) (take $g = p$ constant) and, in fact, it satisfies (25). The latter fact can be proved using the standard Bellman principle (see Chapter 8 of [17], and the survey [16]). Besides, $U(p,0) \leq F(p)$ because of (27). Therefore there is a one-to-one correspondence between extremal problems for the square function of the form (27) and functions $U(p,q)$ with the properties (24) and (25).

The gradient estimates on the Hamming cube are more subtle. Take any real valued $O(x,y)$ and suppose that we want to estimate $\mathbb{E}\, O(f, |\nabla f|)$ from above in terms of $\mathbb{E} f$ for any $f : \{-1,1\}^n \to \mathbb{R}$ and for all $n \geq 1$. If one finds a function $M$ with properties (28) and (29), then one can obtain the estimate (30) (see [11]). Thus finding such an $M$ is sufficient to obtain the estimate, but it is unclear whether, conversely, the validity of (30) for any $f : \{-1,1\}^n \to J$ and any $n \geq 1$ yields a function $M$ satisfying (28) and (29).
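To make the dyadic square function concrete, here is a small numerical illustration, a sketch under our own discretization: $g$ is a step function sampled on $2^k$ dyadic cells of $[0,1]$, and $S(g)$ is assembled from the martingale differences of the dyadic averages (one common normalization of (23)). It checks the Parseval-type identity $\int_0^1 S(g)^2 = \int_0^1 g^2 - \left(\int_0^1 g\right)^2$, of which the trivial bound $\int_0^1 S(g)^2 \leq \int_0^1 g^2$ is an immediate consequence:

```python
import numpy as np

def dyadic_square_function(g):
    """g: values of a step function on 2^k dyadic cells of [0,1].
    Returns S(g) pointwise, where S(g)^2 is the sum of squared
    martingale differences E_j g - E_{j-1} g over dyadic levels j."""
    n = len(g)
    k = n.bit_length() - 1
    assert n == 1 << k, "length must be a power of two"
    S2 = np.zeros(n)
    prev = np.full(n, g.mean())           # E_0 g: average over [0,1]
    for j in range(1, k + 1):
        cells = 1 << j                     # 2^j dyadic cells at level j
        cur = np.repeat(g.reshape(cells, -1).mean(axis=1), n // cells)
        S2 += (cur - prev) ** 2            # squared martingale difference
        prev = cur
    return np.sqrt(S2)

rng = np.random.default_rng(0)
g = rng.standard_normal(64)
S = dyadic_square_function(g)
# Orthogonality of martingale differences gives the identity below.
assert abs(np.mean(S**2) - (np.mean(g**2) - np.mean(g) ** 2)) < 1e-10
```

The identity holds because $g - \int g$ telescopes into the sum of the (pairwise orthogonal) martingale differences.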
One may think that finding U(p, q) with the property (25) is a difficult problem. Let us make a quick remark here that if it happens that t → U(p, √ t) is convex for each fixed p ∈ I, then (25) is automatically implied by its infinitesimal form, i.e., by U pp +U q /q ≤ 0 (see the proof of Lemma A.4).
Proof. The proof essentially repeats the proof of Lemma 2.2. Let us sketch the argument. Define Ψ(p, q, x, y) := px + qy +U(p, |q|). The existence of a saddle point (p * , q * ) with properties (13) and (14) is guaranteed by Lemma A.5. The convexity of the set I allows us to choose p from I, and q 1 , q 2 ∈ (−∞, 0] according to (18). The rest of the proof of the theorem is the same as in Lemma 2.2. Inequality (32) follows from (24). Convexity of J is needed, for example, to ensure that if f : {−1, 1} n → J, then E f ∈ J, so that (30) makes sense.

Going from M to U: from Hamming cube to square function
Another interesting observation is that equality (31) was lurking in the solution of a certain Monge–Ampère equation. For example, letting $a, b \to 0$ in (29) and using the Taylor series expansion (assuming that $M$ is smooth enough), one obtains (33). When looking for the least function $M$ with $M \geq O$ and (33), it is reasonable to assume that condition (33) should degenerate except, possibly, on the set where $M$ coincides with its obstacle $O$. The degeneracy of (33) means that the determinant of the matrix in (33) is zero. This is a general Monge–Ampère type equation and, after an application of the exterior differential systems of Bryant–Griffiths (see [12]), we obtain that the solutions can be locally characterized by the representation (34), where $U$ satisfies the equation (35). In [12] we used $u(p,t) = -U(p, \sqrt{2t})$ instead of $U(p,q)$, in which case (35) becomes just the backward heat equation for $u(p,t)$. We will not formulate a formal statement, but we do remark that such reasoning allows us to guess the dual of $M$, i.e., to find $U$ given $M$. The way this guess works will be illustrated in Section 3.4.
Our final remark is that one may try to use $U(p,q) := M(p,q)$ with the same obstacle $O(p,q)$, because (29) clearly implies (25). It will definitely give some estimate for the square function, but not the sharp one. Indeed, for the sharp estimates, condition (25) for $U$ usually degenerates, namely (35) holds. On the other hand, if $M_{xx} + M_y/y = 0$ and (33) holds, then $M_{xy} = 0$, and $M(x,y) = C(x^2 - y^2) + Dx + Q$ for some constants $C, D, Q \in \mathbb{R}$. This family of functions corresponds to the trivial inequality $\int_0^1 S(g)^2 \leq \int_0^1 g^2$. Analogously, the best possible function $U$ satisfying (24) and (25) will almost never satisfy (33), except in the very particular case $U(p,q) = C(p^2 - q^2) + Dp + Q$.
Next, repeating a standard argument, namely considering $tg$ and applying Chebyshev's inequality (see Theorem 3.1 in [7]), one obtains the superexponential bound for any $\lambda \geq 0$. We recall that the log-Sobolev inequality, via the Herbst argument [13], gives Gaussian concentration inequalities for any $\lambda \geq 0$ and any smooth $f : \mathbb{R}^n \to \mathbb{R}$ with $\|\nabla f\|_\infty < \infty$. Here $\gamma$ is the standard Gaussian measure on $\mathbb{R}^n$.
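For the reader's convenience, here is the Herbst computation in outline; this is our summary of the standard argument from [13], not a display taken from the paper. Write $L = \|\nabla f\|_\infty$ and $F(t) = \mathbb{E}_\gamma e^{tf}$:

```latex
% Log-Sobolev for gamma applied to h = e^{tf/2}, using |nabla h|^2 = (t^2/4) e^{tf} |nabla f|^2:
\mathrm{Ent}_\gamma\!\left(e^{tf}\right) \;\le\; \frac{t^2 L^2}{2}\,\mathbb{E}_\gamma e^{tf}.
% Since Ent(e^{tf}) = t F'(t) - F(t)\log F(t), this reads
% (d/dt)\left[\tfrac{1}{t}\log F(t)\right] \le L^2/2, and (1/t)\log F(t) -> E_gamma f as t -> 0+, so
\mathbb{E}_\gamma e^{tf} \;\le\; \exp\!\left(t\,\mathbb{E}_\gamma f + \frac{t^2 L^2}{2}\right).
% Chebyshev's inequality and the optimal choice t = lambda / L^2 then give
\gamma\!\left(f - \mathbb{E}_\gamma f \ge \lambda\right) \;\le\; e^{-\lambda^2/(2L^2)}.
```

This is exactly the same "multiply by $t$, bound the moment generating function, optimize" scheme as the Chebyshev argument for the superexponential bound above.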
In other words, we have just illustrated that the estimates (39) and (38) are dual to each other in the sense of the duality between the functions $M = x \ln x - \frac{y^2}{2x}$ and $U = e^{p - q^2/2}$.
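One can check directly that the function $U = e^{p - q^2/2}$ of this dual pair satisfies the infinitesimal condition $U_{pp} + U_q/q \leq 0$ with equality, i.e., it is degenerate in the sense discussed above (a one-line computation we include for the reader):

```latex
U(p,q) = e^{\,p - q^2/2}, \qquad
U_{pp} = e^{\,p - q^2/2}, \qquad
U_{q} = -\,q\, e^{\,p - q^2/2}
\quad\Longrightarrow\quad
U_{pp} + \frac{U_{q}}{q} \;=\; e^{\,p - q^2/2} - e^{\,p - q^2/2} \;=\; 0 .
```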

Poincaré inequality 3/2: a simple proof via duality
It was proved in [11] that for any $f : \{-1,1\}^n \to \mathbb{R}$ we have (40), where $z^{3/2}$ is taken in the sense of the principal branch in the upper half-plane. Inequality (40) improves Beckner's bound for a particular exponent [11]. Consider the corresponding function $M(x,y)$. It was explained in [11] that to prove (40) it is enough to check that $M(x,y)$ satisfies (29), and the latter fact involved a careful investigation of the roots of several polynomials of very high degree with integer coefficients. Let us give a simple proof of (29) using our duality technique. Proof. $M(x,y)$ is a solution of the homogeneous Monge–Ampère equation (33), and therefore it has a representation of the form (34) (see Section 3.1.4 in [12]): $M(x,y) = px + qy + U(p,q)$.
This leads us to the following guess, which can be checked directly. Using Theorem A.6 with $(p_0, q_0) = (0,0)$ and following the proof of Lemma 2.2, it is enough to check that $U(p,q)$ satisfies (25). Notice that (25) is an identity for $U(p,q) = -\frac{4}{27}(p^3 - 3pq^2)$. This finishes the proof of the proposition.
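Taking (25) to be the two-point inequality $2U(p,q) \geq U(p+a, \sqrt{q^2+a^2}) + U(p-a, \sqrt{q^2+a^2})$, as for (10), the claim that it holds with equality for this cubic $U$ is a direct expansion, which we include for completeness:

```latex
U\big(p+a,\sqrt{q^{2}+a^{2}}\big) + U\big(p-a,\sqrt{q^{2}+a^{2}}\big)
 = -\tfrac{4}{27}\Big[(p+a)^{3} + (p-a)^{3}
     - 3(p+a)\big(q^{2}+a^{2}\big) - 3(p-a)\big(q^{2}+a^{2}\big)\Big]
```
```latex
 = -\tfrac{4}{27}\Big[\,2p^{3} + 6pa^{2} - 6p\big(q^{2}+a^{2}\big)\Big]
 = -\tfrac{8}{27}\big(p^{3} - 3pq^{2}\big)
 = 2\,U(p,q).
```

Note also $U(p,q) = -\tfrac{4}{27}\,\Re\,(p+iq)^3$, which is where the principal branch of $z^{3/2}$ in (40) ultimately comes from.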

Sobolev inequalities

3.5.1 The Hamming cube $\{-1,1\}^n$
For $p \in [1,2]$, let $c_p$ be the best possible constant such that (41) holds. Our theorem implies that $c_p \geq s_{p'}^p$ for $p \in (1,2]$. Notice that when $p = 2$ we have $c_2 = s_2^2 = 1$, and (41) recovers the classical Poincaré inequality. When $p \to 1+$ the constant $s_{p'}^p$ tends to zero, which should not be the case for $c_p$. Indeed, it follows from a deep result of Talagrand [19] that if $T_p$ is the best possible constant in the estimate (42), then $T_p > 0$ for all $p \in [1, \infty)$. Now notice that $T_1 = c_1$, $T_2 = c_2$ and $T_p \geq c_p$ for $p \in (1,2)$. When $p > 2$, by the example (49), we must have $c_p = 0$, unlike the fact that $T_p > 0$ for $p > 2$. So one may wonder whether the positivity of $T_p$ could fail to imply the positivity of $c_p$ on the interval $(1,2)$. This is not the case; in fact $2c_p \geq T_p$ for $p \in (1,2)$. Indeed, it will suffice to prove that $2\mathbb{E}|f - \mathbb{E}f|^p \geq \mathbb{E}|f|^p - |\mathbb{E}f|^p$, and for this it is enough to verify the pointwise inequality
$$2|x-1|^p - |x|^p + 1 \geq p(1-x). \qquad (43)$$
Plugging $x = f/\mathbb{E}f$ into (43) (we may normalize $\mathbb{E}f = 1$ by homogeneity, the case $\mathbb{E}f = 0$ being trivial) and taking the expectation, we obtain $2\mathbb{E}|f - \mathbb{E}f|^p \geq \mathbb{E}|f|^p - |\mathbb{E}f|^p$. To verify (43), without loss of generality assume that $p > 1$ (otherwise the inequality is trivial). Consider $g(x) = 2|x-1|^p - |x|^p + 1$. Its second derivative changes sign at the points $x$ which satisfy the equation $|x-1| = 2^{1/(2-p)}|x|$, i.e., at $x = x_\pm = \frac{1}{1 \pm 2^{1/(2-p)}}$. The right hand side of (43) represents the tangent line to the graph of $g$ at the point $x = 1$. Clearly $g$ is convex on $[x_+, \infty)$, therefore (43) is true on this interval. Next, $g$ is concave on $[x_-, x_+]$, and since $x_- < 0$ we have $g \geq p(1-x)$ on $[0, x_+]$ because $g(0) > p(1 - 0)$. Thus (43) is true for $x \geq 0$. For $x \leq 0$, the claim follows from Bernoulli's inequality. To the best of our knowledge, the constants $c_p, T_p$ are unknown for $p \in [1,2)$. There is a remarkable result of Ben-Efraim–Lust-Piquard [2] that $T_p \geq \frac{2}{\pi}$ for $1 \leq p \leq 2$. This, combined with our theorem, gives the lower bound $T_p \geq \max\{\frac{2}{\pi}, s_{p'}^p\}$ for $1 \leq p \leq 2$. However, due to the inequalities of Bobkov–Götze and Maurey–Pisier (see the next section), it is widely believed that $T_1 = \sqrt{2/\pi}$.
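The pointwise inequality (43) can also be sanity-checked numerically; this is a sketch with our own grid and tolerances:

```python
import numpy as np

def g(x, p):
    """g(x) = 2|x-1|^p - |x|^p + 1, to be compared with its tangent p(1-x)."""
    return 2 * np.abs(x - 1) ** p - np.abs(x) ** p + 1

# Verify g(x) >= p(1 - x) on a wide grid for several exponents 1 < p < 2.
xs = np.linspace(-50, 50, 200001)
for p in (1.1, 1.3, 1.5, 1.7, 1.9):
    assert np.min(g(xs, p) - p * (1 - xs)) >= -1e-9
# Tangency at x = 1: both sides vanish there.
assert abs(g(1.0, 1.5)) < 1e-12
print("inequality (43) verified on the grid")
```

Near $x = 1$ the margin is of order $2|x-1|^p$ against $\frac{p(p-1)}{2}(x-1)^2$, which is why the inequality is tightest near the tangency point but still strict for $p < 2$.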
An elegant idea of Naor–Schechtman [15], based on Burkholder's inequality [5], gives an estimate of this type. Let us show that our bound (2), obtained in Corollary 1.2, is better.

Gaussian measure on R n
An application of the Central Limit Theorem to (1) gives a dimension-independent Sobolev inequality.
Corollary 3.5. For any smooth bounded $f : \mathbb{R}^n \to \mathbb{R}$ and any $n \geq 1$, we have (47). The best possible constant in (47), unlike $s_{p'}^p$, should not degenerate when $p \to 1+$. Indeed (see [14], p. 115), one has (48), where the constant $\sqrt{2/\pi}$ is best possible in the left hand side of (48). We should mention that the estimate (48) can also be easily obtained by a remarkable trick of Maurey–Pisier [18].
Notice that (47) cannot be extended to the range of exponents $p > 2$ with some positive constant $C(p)$ in place of $s_{p'}^p$. Indeed, assume the contrary. Consider $n = 1$ and take $f(x) = 1 + ax$. Using Jensen's inequality, we obtain $\mathbb{E}|f|^p - |\mathbb{E}f|^p \geq (1 + a^2)^{p/2} - 1 \geq \frac{pa^2}{2}$, while $\mathbb{E}|\nabla f|^p = |a|^p$. Therefore, taking $a \to 0$, we obtain a contradiction, since $|a|^p = o(a^2)$ cannot dominate $C(p)\,\frac{pa^2}{2}$ for $p > 2$.

Discrete surface measure
Let $A \subset \{-1,1\}^n$ be a subset of the Hamming cube with cardinality $|A| = 2^{n-1}$. Define $w_A : \{-1,1\}^n \to \mathbb{N} \cup \{0\}$ so that $w_A(x)$ is the number of boundary edges of $A$ containing $x$, i.e., $w_A(x)$ counts the number of edges with one endpoint in $A$ and the other in the complement of $A$ such that one of the endpoints is $x$. Clearly $w_A(x) = 0$ if $x$ is in the "strict interior" of $A$ or in the "strict complement" of $A$, and it is nonzero if and only if $x$ is on the "boundary" of $A$. Notice that $w_A(x)$ can be nonzero for some $x \notin A$. The function $w_A$ may be understood as a discrete surface measure of the boundary of $A$. Consider the quantity $\sigma(p)$. It follows from Harper's edge-isoperimetric inequality [10] that $\sigma(2) = 1$ and the value is attained on the half-cube. The monotonicity of $\sigma(p)$ in $p$ implies that $\sigma(p) = 1$ for all $p \geq 2$. Also notice that, considering Hamming balls, one can easily show that $\sigma(p) = 0$ for $0 \leq p < 1$. Therefore the first nontrivial value is $\sigma(1)$.
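As a concrete illustration of $w_A$ (a small brute-force computation we add; the particular sets below are our own choices), for the half-cube $A = \{x : x_1 = 1\}$ every vertex lies on exactly one boundary edge, so $w_A \equiv 1$:

```python
from itertools import product

def w_A(A, n):
    """w_A(x): the number of boundary edges of A (edges with exactly
    one endpoint in A) that contain the vertex x of {-1,1}^n."""
    w = {}
    for x in product((-1, 1), repeat=n):
        count = 0
        for j in range(n):
            y = x[:j] + (-x[j],) + x[j + 1:]   # neighbor: flip coordinate j
            if (x in A) != (y in A):           # edge {x, y} crosses the boundary
                count += 1
        w[x] = count
    return w

n = 3
half_cube = {x for x in product((-1, 1), repeat=n) if x[0] == 1}
assert all(v == 1 for v in w_A(half_cube, n).values())

# For the "Hamming ball" A = {x : sum(x) > 0} with n = 3 (also of
# cardinality 2^{n-1}), w_A takes the values 0 and 2.
ball = {x for x in product((-1, 1), repeat=n) if sum(x) > 0}
print(sorted(set(w_A(ball, n).values())))
```

The contrast between the two sets is what drives the degeneration $\sigma(p) = 0$ for $p < 1$ mentioned above: Hamming balls concentrate the boundary on few vertices.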
Inequality (51) gives the lower bound $\sigma(p) \geq s_{p'}^p$, which tends to $1$ as $p \to 2-$ but fails to be sharp when $p \to 1+$. Thus, combining this result with Bobkov's inequality, we obtain the corresponding bound.
Proof. Consider $G_\alpha(t) := e^{-t^2/4} N_\alpha(t)$. Notice that the zeros of $G_\alpha$ and $N_\alpha$ are the same. It follows from (7) that $G_\alpha'' + \left(\alpha + \frac{1}{2} - \frac{t^2}{4}\right) G_\alpha = 0$. Besides, we know that the solution is even. Consider the critical case $\alpha = 2$: here $G_2(t) = e^{-t^2/4}(1 - t^2)$ and the smallest positive zero is $s_2 = 1$. Therefore it follows from the Sturm comparison principle that $0 < s_\alpha < 1$ for $\alpha > 2$ (see below). Moreover, the same principle applied to $G_{\alpha_1}$ and $G_{\alpha_2}$ with $\alpha_1 > \alpha_2$ implies that $G_{\alpha_1}$ has a zero inside the interval $(-s_{\alpha_2}, s_{\alpha_2})$. Thus we conclude that $s_\alpha$ is decreasing in $\alpha$.
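For completeness, the transformed equation is a standard computation, which we spell out under the convention that (7) is the Hermite equation $N_\alpha'' - x N_\alpha' + \alpha N_\alpha = 0$ (the equation satisfied by ${}_1F_1(-\frac{\alpha}{2}, \frac{1}{2}, \frac{x^2}{2})$):

```latex
% Substitute N_alpha = e^{t^2/4} G_alpha:
N_\alpha' = e^{t^2/4}\Big(G_\alpha' + \tfrac{t}{2}\,G_\alpha\Big), \qquad
N_\alpha'' = e^{t^2/4}\Big(G_\alpha'' + t\,G_\alpha'
            + \big(\tfrac{t^2}{4} + \tfrac{1}{2}\big)G_\alpha\Big),
```
```latex
% so N'' - tN' + alpha N = 0 becomes
G_\alpha'' + \Big(\alpha + \tfrac{1}{2} - \tfrac{t^2}{4}\Big)\,G_\alpha = 0 .
```

This removes the first-order term, putting the equation in the self-adjoint form needed for the Sturm comparison arguments below.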
To verify that $N_\alpha', N_\alpha'' \leq 0$ on $[0, s_\alpha]$, first we claim that $N_{\alpha_2} \geq N_{\alpha_1}$ on $[0, s_{\alpha_1}]$ for $\alpha_1 > \alpha_2 > 0$. Indeed, the proof works in the same way as the proof of Sturm's comparison principle; for the convenience of the reader we decided to include the argument. As before, consider $G_{\alpha_j} = e^{-t^2/4} N_{\alpha_j}$. It is enough to show that $G_{\alpha_2} \geq G_{\alpha_1}$ on $[0, s_{\alpha_1}]$. It follows from (53) that $G_{\alpha_2}''(0) > G_{\alpha_1}''(0)$. Therefore, using the Taylor series expansion at the point $0$, we see that the claim is true in some neighbourhood of zero, say $[0, \varepsilon)$ with $\varepsilon$ sufficiently small. Next we assume the contrary, i.e., that there is a point $a \in [\varepsilon, s_{\alpha_1}]$ such that $G_{\alpha_2} \geq G_{\alpha_1}$ on $[0, a]$, $G_{\alpha_2}(a) = G_{\alpha_1}(a)$ and $G_{\alpha_2}'(a) < G_{\alpha_1}'(a)$ (notice that the case $G_{\alpha_2}'(a) = G_{\alpha_1}'(a)$, by the uniqueness theorem for ODEs, would imply that $G_{\alpha_2} = G_{\alpha_1}$ everywhere, which is impossible). Consider the Wronskian $W := G_{\alpha_1}' G_{\alpha_2} - G_{\alpha_1} G_{\alpha_2}'$. We have $W(0) = 0$ and $W(a) = G_{\alpha_1}(a)\left(G_{\alpha_1}'(a) - G_{\alpha_2}'(a)\right) \geq 0$. On the other hand, $W' = (\alpha_2 - \alpha_1) G_{\alpha_1} G_{\alpha_2} < 0$ on $(0, a)$, so $W(a) < W(0) = 0$, which is a clear contradiction, and this proves the claim. It follows from (6) that, by the Sturm comparison principle (see the previous discussion), $G_\alpha > V_\alpha > 0$ on $(0, \sqrt{2/\alpha})$; in particular $s_\alpha \geq \sqrt{2/\alpha}$.

A.2 Heat inequality
Let U(p, q) be defined as in (8).
Lemma A.3. For any $p \in \mathbb{R}$, the map $t \mapsto U(p, \sqrt{t})$ is convex.
Proof. Without loss of generality, assume that $p \geq 0$. We recall that $U(p, \sqrt{t}) = t^{\alpha/2} u_\alpha(p/\sqrt{t})$. Since $\alpha \geq 2$, the only interesting case to consider is $p/\sqrt{t} < s_\alpha$ (otherwise $t^{\alpha/2}$ is convex). In this case we have $U(p, \sqrt{t}) = t^{\alpha/2} N_\alpha(p/\sqrt{t})$ up to a positive constant, which we are going to ignore, and therefore, by (7), it would be enough to show that for any $\gamma \geq 0$ the function $N_\gamma(x)/x^\gamma$ is decreasing for $x \in (0, s_{\gamma+2})$. Differentiating, and using (7) again, we obtain an expression which is nonpositive by Lemma A.1.
Lemma A.4 (Barthe–Maurey [1]). Let $J$ be a convex subset of $\mathbb{R}$, and let $V(p,q) : J \times \mathbb{R}_+ \to \mathbb{R}$ be such that

$V_{pp} + V_q/q \leq 0$ for all $(p,q) \in J \times \mathbb{R}_+$; $\quad$ (56)

$t \mapsto V(p, \sqrt{t})$ is convex for each fixed $p \in J$. $\quad$ (57)

Then $2V(p,q) \geq V(p+a, \sqrt{q^2+a^2}) + V(p-a, \sqrt{q^2+a^2})$ whenever $p \pm a \in J$ and $q \geq 0$. $\quad$ (58)
The lemma says that the global discrete inequality (58) is in fact implied by its infinitesimal form (56) under the extra condition (57).
Proof. The argument is borrowed from [1]. A similar argument was used by Davis [8] in obtaining sharp square function estimates from the corresponding ones for Brownian motion.
Without loss of generality assume $a \geq 0$. Consider the process $X_t = V(p + B_t, \sqrt{q^2 + t})$, $t \geq 0$.
Here $B_t$ is the standard Brownian motion starting at zero. It follows from Itô's formula together with (56) that $X_t$ is a supermartingale. Let $\tau$ be the stopping time $\tau = \inf\{t \geq 0 : B_t \notin (-a, a)\}$.
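The conclusion can then be summarized in one display; this is our paraphrase of the Barthe–Maurey/Davis argument, using that by symmetry $\mathbb{P}(B_\tau = \pm a) = \frac{1}{2}$ and $\mathbb{E}[\tau \mid B_\tau = \pm a] = a^2$, and that (57) permits a conditional Jensen step in the time variable:

```latex
% supermartingale property (56), then symmetry, then Jensen via (57):
V(p,q) = X_0 \ \ge\ \mathbb{E}\,X_\tau
 \ =\ \tfrac12\,\mathbb{E}\!\left[V\big(p+a,\sqrt{q^2+\tau}\big)\,\big|\,B_\tau = a\right]
    + \tfrac12\,\mathbb{E}\!\left[V\big(p-a,\sqrt{q^2+\tau}\big)\,\big|\,B_\tau = -a\right]
```
```latex
 \ \ge\ \tfrac12\,V\big(p+a,\sqrt{q^2+a^2}\big)
      + \tfrac12\,V\big(p-a,\sqrt{q^2+a^2}\big),
```

which is precisely (58).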

A.3 Minimax theorem for noncompact sets
Let $P, Q$ be nonempty closed convex sets in $\mathbb{R}$. We say that a pair $(p^*, q^*) \in P \times Q$ is a saddle point of $f$ on $P \times Q$ if $f(p, q^*) \leq f(p^*, q^*) \leq f(p^*, q)$ for all $(p,q) \in P \times Q$. For the proof we refer the reader to Proposition 1.2 in [9], p. 167.
Theorem A.6. Suppose that $f : P \times Q \to \mathbb{R}$ is continuous, concave in $p$, convex in $q$, and there exists $(p_0, q_0) \in P \times Q$ such that $f(p, q_0) \to -\infty$ as $|p| \to \infty$ and $f(p_0, q) \to +\infty$ as $|q| \to \infty$. Then $f$ has a saddle point on $P \times Q$. The theorem is Proposition 2.2 in [9], p. 173.