Coding for Sunflowers

- Paul G. Allen school of computer science and engineering, University of Washington
- More about Anup Rao

*Discrete Analysis*, February. https://doi.org/10.19086/da.11887.

### Editorial introduction

Coding for sunflowers, Discrete Analysis 2020:2, 8 pp.

The sunflower problem is an old problem of Erdős and Rado in extremal set theory. A *sunflower* is defined to be a collection of sets A1,…,Ar such that if B=⋂ri=1Ai is their intersection, then the sets Ai∖B are disjoint. Equivalently, all the pairwise intersections Ai∩Aj (with i≠j) are the same. Erdős asked how many sets of size k there can be without a sunflower of size r. To see that there is some finite bound, observe that if one starts with m sets, then either one can find r of them that are disjoint, in which case one has a sunflower, or one can find a set A that intersects at least m/r−1 of the other sets. By the pigeonhole principle, there is an element of A that is contained in at least k−1(m/r−1) of the other sets. Removing that element from each of those sets gives a collection of sets of size k−1 and one can apply induction.

A suitably careful version of this argument gives an upper bound on m of k!(r−1)k. Erdős and Rado conjectured that the correct bound is exponential in k for fixed r. This problem is still open even for r=3.

Until recently, the best known bound did not improve much on this factorial bound. The record when r=3, due to Kostochka, was Ck!(logloglogk/loglogk)k. To get a feel for this, note that k! is approximately (k/e)k, so Kostochka replaced the linear function k/e by a function that was sublinear, but only because of a loglog factor. Then in 2019, Alweiss, Lovett, Wu and Zhang improved the bound dramatically, to one of the form (logk)k(rloglogk)O(k), replacing Kostochka’s almost linear function by a logarithmic one. If one imagines a spectrum of growth rates with exponential at one end and factorial at the other, this replaced a growth rate that was almost at the factorial end by one that was almost at the exponential end.

To achieve this, they obtained an upper bound for the number of sets of size k one could find without a structure called a *robust sunflower*, which is a collection of sets A1,…,Ar with intersection B such that if elements of ⋃i(Ai∖B) are chosen independently at random with probability α, then with probability at least 1−β there is some i such that all the elements of Ai∖B are chosen. It can be shown that if α=β=r−1, then a robust sunflower with parameters α and β contains a sunflower of size r. Interestingly, their upper bound for set systems without robust sunflowers was close to sharp, so they reached the natural barrier for their method. A subsequent paper of Frankston, Kahn, Narayanan and Park removed the loglogk factor from the bound, giving a bound of (Clogklog(rk))k.

The main purpose of this paper is to give a shorter and cleaner proof of the result of Alweiss, Lovett, Wu and Zhang, using the converse of Shannon’s noiseless coding theorem. In fact, it achieves the same bound as Frankston, Kahn, Narayanan and Park, and also gives essentially sharp estimates for the size of a set system without a robust sunflower with parameters α and β, which have other applications in theoretical computer science.