Rank bounds for design matrices with block entries and geometric applications

Design matrices are sparse matrices in which the supports of different columns intersect in a few positions. Such matrices come up naturally when studying problems involving point sets with many collinear triples. In this work we consider design matrices with block (or matrix) entries. Our main result is a lower bound on the rank of such matrices, extending the bounds proved in [BDWY12,DSW12] for the scalar case. As a result we obtain several applications in combinatorial geometry. The first application involves extending the notion of structural rigidity (or graph rigidity) to the setting where we wish to bound the number of `degrees of freedom' in perturbing a set of points under collinearity constraints (keeping some family of triples collinear). Other applications are an asymptotically tight Sylvester-Gallai type result for arrangements of subspaces (improving [DH16]) and a new incidence bound for high dimensional line/curve arrangements. The main technical tool in the proof of the rank bound is an extension of the technique of matrix scaling to the setting of block matrices. We generalize the definition of doubly stochastic matrices to matrices with block entries and derive sufficient conditions for a doubly stochastic scaling to exist.


Introduction
Design matrices, defined in [BDWY12], are (complex) matrices that satisfy certain conditions on their support (the set of non-zero entries). Roughly speaking, a design matrix has few non-zero entries per row, many non-zero entries per column and, most importantly, the supports of every two columns intersect in a small number of positions. In [BDWY12,DSW12], lower bounds on the rank of such matrices were given and applied to upper bound the dimension of point configurations in C^d containing many collinear triples. In particular, [DSW12] used this method to give a new elementary proof of Kelly's theorem (the complex analog of the Sylvester-Gallai theorem). In this work we generalize the rank bounds from [BDWY12,DSW12] to handle design matrices with matrix entries. We then use these bounds to prove several new results in combinatorial geometry.
Our geometric applications are of three types. The first deals with bounding the number of 'degrees of freedom' when smoothly perturbing a set of points while keeping a certain family of triples collinear. This is in the same spirit as structural rigidity results [Lam70], in which pairwise distances are maintained along the edges of a graph embedded in the plane. The second application is a generalization of the Sylvester-Gallai theorem for arrangements of subspaces. Such a result was recently proved in [DH16], and we are able to give an asymptotically tight improvement to their results. The last application involves arrangements of lines and curves in C^d that have many pairwise incidences (each line/curve intersects many others). We show upper bounds on the dimension of such configurations as a function of the number of incidences, under the assumption that no low dimensional subspace contains 'too many' of the lines/curves.
The main tool used to prove the rank bounds for design matrices in [BDWY12,DSW12] was matrix scaling. Given a complex matrix A = (A_ij), we try to find coefficients r_i, c_j for each row/column so that the matrix with entries B_ij = r_i A_ij c_j is doubly stochastic. In this setting, one is actually interested in the ℓ_2 norms of all rows/columns being equal (instead of ℓ_1). The main technical difficulty is in giving sufficient conditions that guarantee the existence of such a scaling. Following the pioneering work of Sinkhorn [Sin64], such conditions were analyzed completely in [RS89]. To handle design matrices with block entries we study the problem of matrix scaling for block matrices. Finding sufficient conditions for scaling is intimately related to the well-studied problem of operator scaling [Gur04,LSW98,GGOW15]. We give a (mostly) self-contained and elementary derivation of sufficient conditions for a scaling to exist, relying only on the work of [BCCT08], which gives sufficient conditions for scaling of matrices with one column (see Theorem 2.16 below). We note that [BCCT08] does not mention matrix scaling explicitly in their work (which studies the Brascamp-Lieb inequalities). The observation that this part of their work can be interpreted through this angle seems not to have been made before.
We describe our results in more detail in the subsections below. The main technical work involving matrix scaling will be discussed in Section 2.

Design matrices with block entries
For the rest of the paper, all matrices are complex unless otherwise noted. By a positive definite (semi-definite) matrix we mean a Hermitian matrix with positive (non-negative) eigenvalues.
Let M_{m,n}(r, c) denote the set of m × n matrices with entries being r × c matrices. When referring to rows and columns of A ∈ M_{m,n}(r, c) we mean the m rows of blocks (and n columns). We sometimes refer to the entries of A as the blocks of A. For a matrix A ∈ M_{m,n}(r, c) we denote by Ã the M_{rm,cn}(1, 1) matrix obtained from A in the natural way (ignoring the block structure). We define rank(A) to be the rank of Ã (as a complex matrix). We will sometimes identify a matrix A ∈ M_{m,n}(r, c) with the linear map from C^{nc} to C^{mr} given by Ã.
To define design matrices with block entries we will need the following definition.
Definition 1.1 (well-spread set). Let S = {A_1, . . . , A_s} ⊂ M_{r,c}(1, 1) be a set (or multiset) of s complex r × c matrices. We say that S is well-spread if, for every subspace V ⊂ C^c, we have Σ_{i∈[s]} dim(A_i(V)) ≥ (rs/c)·dim(V).
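To make the definition concrete, here is a small numerical sketch (our own illustration, not part of the formal development) that tests the well-spread inequality against one candidate subspace at a time. Note that the definition quantifies over all subspaces of C^c, so a check of this form can only refute well-spreadness, never certify it; the function names are ours.

```python
import numpy as np

def spread_sum(blocks, V_basis):
    # Left-hand side of the well-spread inequality for V = span(V_basis):
    # sum_i dim(A_i(V)), where dim(A_i(V)) = rank of A_i restricted to V.
    return sum(np.linalg.matrix_rank(A @ V_basis) for A in blocks)

def satisfies_inequality(blocks, V_basis):
    s = len(blocks)
    r, c = blocks[0].shape
    dimV = np.linalg.matrix_rank(V_basis)
    return spread_sum(blocks, V_basis) >= (r * s / c) * dimV - 1e-9

# Two 1x2 blocks with a common kernel direction e_2 are not well-spread:
# for V = span{e_2} the left-hand side is 0 while (rs/c)*dim(V) = 1.
bad = [np.array([[1.0, 0.0]]), np.array([[2.0, 0.0]])]
e2 = np.array([[0.0], [1.0]])
print(satisfies_inequality(bad, e2))   # False

# The blocks [1 0] and [0 1] pass this particular test.
good = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
print(satisfies_inequality(good, e2))  # True
```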
The following definition extends the definition of design matrices given in [BDWY12].
Definition 1.2 (design matrix). A matrix A ∈ M_{m,n}(r, c) is called a (q, k, t)-design matrix if it satisfies the following three conditions: 1. Each row of A has at most q non-zero blocks.
2. Each column of A contains k blocks that, together, form a well-spread set.
3. For any j ≠ j′ ∈ [n] there are at most t values of i ∈ [m] so that both A_ij and A_ij′ are non-zero blocks. In other words, the supports of any two columns intersect in at most t positions.
Comment 1.3. Notice that, for the case r = c = 1, the second item simply requires that each column has at least k non-zero entries. Hence, this definition extends the definitions of design matrices from previous works ([BDWY12], [DSW12]). More generally, if r = c then the second item is equivalent to asking that each column contains at least k non-singular blocks.
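For the square case r = c mentioned in the comment, the three conditions can be checked mechanically. The following sketch (our own helper, not from the paper) extracts the parameters (q, k, t) from a block matrix stored as an (m, n, r, r) array, using "non-singular block" for condition 2, as in Comment 1.3.

```python
import numpy as np

def design_params(A):
    # A has shape (m, n, r, r): an m x n matrix of r x r blocks.
    m, n, r, _ = A.shape
    nz = np.array([[A[i, j].any() for j in range(n)] for i in range(m)])
    q = int(nz.sum(axis=1).max())              # max nonzero blocks in a row
    k = min(                                   # min nonsingular blocks per column
        sum(1 for i in range(m)
            if nz[i, j] and np.linalg.matrix_rank(A[i, j]) == r)
        for j in range(n))
    t = max((int((nz[:, j] & nz[:, jp]).sum())  # max column-support overlap
             for j in range(n) for jp in range(j + 1, n)), default=0)
    return q, k, t

# A 3 x 2 sign pattern of 1 x 1 blocks: rows (1,1), (1,0), (0,1).
A = np.array([[[[1.0]], [[1.0]]],
              [[[1.0]], [[0.0]]],
              [[[0.0]], [[1.0]]]])
print(design_params(A))  # (2, 2, 1)
```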
Our main theorem is the following lower bound on the rank of design matrices. Setting r = c = 1 we recover the rank bound from [DSW12].
Theorem 1.4 (rank of design matrices). Let A ∈ M_{m,n}(r, c) be a (q, k, t)-design matrix. Then rank(A) ≥ cn·rk/(rk + ct(q − 1)). We now describe the various geometric applications of this theorem.

Projective rigidity
Given a finite set of points V in C 2 containing some collinear triples, we can apply any projective transformation on V and keep all collinear triples collinear. This gives 8 'degrees of freedom' for us to 'move' V (keeping its collinearity structure). But are there more transformations we can perform? To study this question more formally, we begin with the following definition.
Definition 1.5 (Projective Rigidity). Let V = (v_1, . . . , v_n) ∈ (C^d)^n be a list of n points in C^d and let T ⊂ [n]³ be a multiset of triples on the set [n] (we allow repetitions of triples for technical reasons). Let K_T ⊂ C^{nd} be the variety of lists of n points in which all triples in T are collinear. Let P_V ∈ C^{nd} denote the concatenation of the coordinate vectors of all points in V. We say that (V, T) is r-rigid if P_V is a non-singular point of K_T and the dimension of its irreducible component is at most r. We denote the set of pairs (V, T) as above (with P_V ∈ K_T) by COL(n, d).
Hence, showing that a point set V ⊂ C 2 with a family of triples T is 8-rigid means showing that it cannot be changed smoothly in any nontrivial way. Using our rank bound for design block matrices, we are able to prove a general theorem (Theorem 4.1) giving quantitative bounds on the rigidity of pairs (V, T ) satisfying certain conditions. For example, if every pair of points in V is in exactly one triple in T and no line contains more than half of the points in V then we can prove an upper bound of 15 on the rigidity of the pair (V, T ). We refer the reader to Section 4 for a more complete description of these results.
Other notions of rigidity: A more well-studied notion of geometric rigidity has to do with fixing the distances between pairs of points. Let G = G(V, E) be a graph, where |V| = n, |E| = m. Let p = (p_v)_{v∈V} be an embedding of G in R^d, where to each vertex v ∈ V we assign the point p_v ∈ R^d. By fixing the order of the vertices in V, we can identify the set of embeddings of G in R^d with points p ∈ (R^d)^n = R^{dn}. Given such a bar-joint framework (G, p), one is generally interested in the study of all continuous paths in R^{dn} which preserve the distances of all pairs of points in E. More succinctly, given the distance function ∆_G : R^{dn} → R^m of G, defined by ∆_G(p) = (‖p_u − p_v‖²)_{{u,v}∈E}, we are interested in studying all continuous paths in R^{dn} starting from p which leave ∆_G unchanged.
If, for a given framework (G, p), every continuous path from p which preserves ∆_G terminates at a point q ∈ R^{dn} that is congruent to p (that is, ∆_G(p) = ∆_G(q) implies ∆_{K_n}(p) = ∆_{K_n}(q) for all q ∈ R^{dn} obtained from p in the above manner), we say that the framework (G, p) is rigid. Otherwise, we call the framework (G, p) flexible. For more concrete motivation for the study of rigidity, we refer the reader to [Lam70] and the references therein.
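The infinitesimal version of this setup is easy to sketch numerically: the Jacobian of ∆_G at p is the classical rigidity matrix, and the dimension of its kernel counts infinitesimal motions. The code below is our own illustration of this classical notion, not the formalism used later in the paper.

```python
import numpy as np

def rigidity_matrix(p, edges):
    # Jacobian of the squared-distance map Delta_G at the embedding p
    # (up to a factor of 2): the row for edge (u, v) has p_u - p_v in the
    # u-block and p_v - p_u in the v-block.
    n, d = p.shape
    R = np.zeros((len(edges), n * d))
    for row, (u, v) in enumerate(edges):
        R[row, u*d:(u+1)*d] = p[u] - p[v]
        R[row, v*d:(v+1)*d] = p[v] - p[u]
    return R

# A triangle in R^2: the kernel has dimension d(d+1)/2 = 3, i.e. only the
# trivial infinitesimal motions (2 translations + 1 rotation) survive, so
# the framework is infinitesimally rigid.
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.3, 1.0]])
R = rigidity_matrix(p, [(0, 1), (1, 2), (0, 2)])
print(p.size - np.linalg.matrix_rank(R))  # 3
```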
A different notion of rigidity, which is closer to ours in spirit, is the one given by Raz [Raz16], which we now define. Given a (multi)set of lines L = {ℓ_1, . . . , ℓ_n} in C³, we define the intersection graph of L as the graph G_L = G_L([n], E) where {i, j} ∈ E iff i ≠ j and the corresponding lines ℓ_i and ℓ_j intersect. For a graph G, we say that L is a realization of G if G ⊆ G_L. With these definitions, we say that a graph G is rigid if for any generic realization L = {ℓ_1, . . . , ℓ_n} of G, we must have G_L = K_n.

Sylvester-Gallai for subspaces
Another application of Theorem 1.4 gives a quantitative improvement to the results of [DH16], who generalized the Sylvester-Gallai theorem to arrangements of subspaces in C^d. We show the following: The original bound proven in [DH16] was a slightly worse O(ℓ⁴/δ²). For δ = 1 and ℓ = 1 the bound of 3 that we get is completely tight, as there are three-dimensional configurations of one-dimensional subspaces over C with every pair spanning some third subspace (this can be obtained by taking the Hesse configuration and moving to projective space). When δ = 1 and ℓ > 1 it remains open whether the bound of 4ℓ − 1 is tight (one can get a lower bound of 3ℓ by taking the product of the one-dimensional example).
The condition V_i ∩ V_j = {0} is needed due to the following example given in [DH16]: set ℓ = 2 and n = d(d − 1)/2, and let {e_1, e_2, . . . , e_d} be the standard basis of C^d. Define the n spaces to be V_ij = span{e_i, e_j} with 1 ≤ i < j ≤ d. Now, for each (i, j) ≠ (i′, j′) the sum V_ij + V_i′j′ will contain a third space (since the size of {i, j, i′, j′} is at least three). However, this arrangement has dimension d > √n.
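The example can be verified mechanically. The following sketch (ours) checks, for d = 5, that every pairwise sum V_ij + V_i′j′ indeed contains a third subspace of the arrangement.

```python
import numpy as np
from itertools import combinations

d = 5
E = np.eye(d)
spaces = {(i, j): E[:, [i, j]] for i, j in combinations(range(d), 2)}

def contained(W, U):
    # Is span(columns of W) contained in span(columns of U)?
    return np.linalg.matrix_rank(np.hstack([U, W])) == np.linalg.matrix_rank(U)

for (p1, V1), (p2, V2) in combinations(spaces.items(), 2):
    S = np.hstack([V1, V2])  # a basis of the sum V_{p1} + V_{p2}
    assert any(contained(V3, S) for p3, V3 in spaces.items()
               if p3 not in (p1, p2))
print("every pairwise sum contains a third subspace")
```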
The one dimensional (ℓ = 1) version of Theorem 1.6 was originally proven in [BDWY12,DSW12] as an application of the rank bound for (scalar) design matrices. In [DH16], a different, more lossy, proof technique was developed to handle the higher dimensional case (also relying on methods similar to [BCCT08]). Our proof goes back to the original proof strategy using rank of design matrices, now with block entries, and applying Theorem 1.4.

Pairwise incidences of lines and curves
Our final application of Theorem 1.4 is the following result about pairwise incidences in a given set of lines.
Theorem 1.7. Let L_1, . . . , L_n ⊂ C^d be lines such that each L_i intersects at least k other lines and, among those k lines, at most k/2 have the same intersection point on L_i. Then, the n lines are contained in an affine subspace of dimension at most 4n/(k + 2) − 1.
We also prove an analog of Theorem 1.7 for higher degree curves. We refer to a curve as a degree-r parametric curve if it is given as the image of a polynomial map (in one variable) of degree at most r.
Theorem 1.8. Let γ_1, . . . , γ_n ⊂ C^d be degree-r parametric curves such that each γ_i intersects at least k other curves and, among those k curves, at most k/(2r) have the same intersection point on γ_i. Then, the n curves are contained in a subspace of dimension at most 2(r + 1)⁴·n/k.

Organization
In Section 2 we develop the necessary machinery for scaling of matrices with block entries. In Section 3 we prove our main theorem, Theorem 1.4. In Section 4 we give the applications for geometric rigidity. In Section 5 we prove our improved Sylvester-Gallai theorem for subspaces (Theorem 1.6). In Section 6 we prove Theorem 1.7 and its generalization for higher degree curves.

Matrix scaling and capacity
In this section we develop the machinery needed to prove Theorem 1.4. We denote by I_s ∈ M_{s,s}(1, 1) the s × s identity matrix. For a matrix A we denote ‖A‖₂² = tr(AA*).
Definition 2.1 (Row Normalization). Let A ∈ M_{m,n}(r, c). For each i ∈ [m] let R_i(A) = Σ_{j=1}^n A_ij A*_ij. If all the matrices R_i(A) are non-singular (and hence positive definite) we define the row normalizing matrix of A as the block-diagonal matrix R(A) ∈ M_{m,m}(r, r) whose diagonal blocks are the matrices R(A)_{i,i} = (R_i(A))^{−1/2}. We define the row normalization of A as the product Row(A) = R(A)·A. If R_i(A) = I_r for all i ∈ [m] we say that A is row normalized.
Definition 2.2 (Column Normalization). Let A ∈ M_{m,n}(r, c). For each j ∈ [n] let C_j(A) = (nc/mr)·Σ_{i=1}^m A*_ij A_ij. If all the matrices C_j(A) are non-singular (and hence positive definite) we define the column normalizing matrix of A as the block-diagonal matrix C(A) ∈ M_{n,n}(c, c) whose diagonal blocks are the matrices C(A)_{j,j} = (C_j(A))^{−1/2}. We define the column normalization of A as the product Col(A) = A·C(A).
If C j (A) = I c for all j ∈ [n] we say that A is column normalized.
Definition 2.3 (Doubly stochastic block matrices). A matrix A ∈ M_{m,n}(r, c) is said to be doubly stochastic if it is both row normalized and column normalized. We define the distance of A from being doubly stochastic as ds(A) = Σ_{i∈[m]} ‖R_i(A) − I_r‖₂² + Σ_{j∈[n]} ‖C_j(A) − I_c‖₂².
Definition 2.4 (Scaling). A scaling of A ∈ M_{m,n}(r, c) is obtained as follows: Let R_1, . . . , R_m ∈ M_{r,r}(1, 1) and C_1, . . . , C_n ∈ M_{c,c}(1, 1) be non-singular complex matrices. We refer to the R_i's as row scaling coefficients and to the C_j's as column scaling coefficients. Now, we let B ∈ M_{m,n}(r, c) be the matrix with blocks B_ij = R_i A_ij C_j.
We would like to understand when a matrix has a doubly stochastic scaling. For technical reasons, it is more natural to ask when a matrix can be scaled to be arbitrarily close to doubly stochastic. This question turns out to have a much nicer answer and, for our purposes, an 'almost' doubly stochastic matrix will do just fine.
Definition 2.5 (Scalable Matrices). A matrix A ∈ M_{m,n}(r, c) is said to be scalable if, for every ε > 0, there exists a scaling B of A such that ds(B) ≤ ε.
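The normalizations above are straightforward to implement. The sketch below is our own code (in particular, the exact normalization inside ds follows our reading of Definition 2.3); it stores A ∈ M_{m,n}(r, c) as an (m, n, r, c) array and uses the Hermitian eigendecomposition for the inverse square roots.

```python
import numpy as np

def inv_sqrt(X):
    # X^{-1/2} for a Hermitian positive definite matrix X.
    w, U = np.linalg.eigh(X)
    return (U * (1.0 / np.sqrt(w))) @ U.conj().T

def row_sums(A):   # R_i(A) = sum_j A_ij A_ij^*
    return np.einsum('ijab,ijcb->iac', A, A.conj())

def col_sums(A):   # C_j(A) = (nc/mr) sum_i A_ij^* A_ij
    m, n, r, c = A.shape
    return (n * c) / (m * r) * np.einsum('ijba,ijbc->jac', A.conj(), A)

def ds(A):         # squared distance from being doubly stochastic
    m, n, r, c = A.shape
    return (np.abs(row_sums(A) - np.eye(r)) ** 2).sum() + \
           (np.abs(col_sums(A) - np.eye(c)) ** 2).sum()

def row_normalize(A):   # Row(A): left-multiply block row i by R_i(A)^{-1/2}
    R = np.array([inv_sqrt(Ri) for Ri in row_sums(A)])
    return np.einsum('iab,ijbc->ijac', R, A)

def col_normalize(A):   # Col(A): right-multiply block column j by C_j(A)^{-1/2}
    C = np.array([inv_sqrt(Cj) for Cj in col_sums(A)])
    return np.einsum('ijab,jbc->ijac', A, C)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2, 2, 2)) + 1j * rng.standard_normal((3, 2, 2, 2))
B = col_normalize(A)
print(np.allclose(col_sums(B), np.eye(2)))  # True: B is column normalized
```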
Our goal is to give sufficient conditions for a matrix to be scalable. For this we need to define a measure called capacity which is a generalization of capacity of non-negative matrices defined in [GY98] (used to study Sinkhorn's algorithm) and a special case of capacity of operators defined in [Gur04] (used to study an operator generalization of Sinkhorn's algorithm).
Definition 2.6 (Capacity). The capacity of a block matrix A ∈ M m,n (r, c) is defined as: Where the X i 's are r × r complex Hermitian positive definite matrices.
The main technical result of this section is the following lemma. We will prove it at the end of the section, following some preliminaries. The proof mimics the analogous result for scalar matrices (Sinkhorn's algorithm), alternating left/right scalings and using the capacity as a progress measure (this is also the approach taken in [GGOW15] for operator scaling).
Lemma 2.7. Let A ∈ M_{m,n}(r, c) be such that cap(A) > 0. Then A is scalable.
The proof of the lemma uses an iterative algorithm that, at each step, performs a row or column normalization of A. We will show that this process must converge to a doubly stochastic matrix. We start with some useful claims. The first claim relates the capacity of A to that of A*. For our purposes, we will only need the fact that, if one of them is zero, then so is the other.
Claim 2.8. Let A ∈ M_{m,n}(r, c). Then cap(A) is non-zero if and only if cap(A*) is non-zero.
Proof. Let PD_k denote the set of k × k Hermitian positive definite matrices. Suppose for now that cap(A*) is non-zero. Comparing the two infima, and using the AM-GM inequality (applied to the non-negative eigenvalues of a PSD matrix) for the second inequality, one deduces that cap(A) is non-zero as well. In the other direction, we apply a similar argument to A* to obtain that, if cap(A) is non-zero, then cap(A*)^{1/mr} / cap(A)^{1/nc} ≥ mr/nc. Rearranging completes the proof.
Claim 2.9 (Capacity of normalized matrices). Let A ∈ M_{m,n}(r, c) be a column-normalized matrix. Then cap(A) ≤ 1.
Claim 2.10 (Capacity of a scaling). Let A ∈ M m,n (r, c) and let B be a scaling of A with row scaling coefficients R 1 , . . . , R m ∈ M r,r (1, 1) and column scaling coefficients C 1 , . . . , C n ∈ M c,c (1, 1). Then Proof.
Where the last equality is obtained by observing the effect of scaling all the Y i 's by the same constant α on the capacity.
To prove a quantitative bound on the rate of growth of the capacity under row/column scaling we will need the following quantitative variant of the AM-GM inequality. A proof (along the lines of [LSW98]) is given for completeness.
Claim 2.11. Let x_1, . . . , x_n be non-negative reals satisfying Σ_i x_i = n and Σ_i (x_i − 1)² = ε. If ε ≤ 1 then Π_i x_i ≤ exp(−ε/6); if ε > 1 then Π_i x_i ≤ exp(−1/6).
Proof. First assume that ε ≤ 1. Let y_i = x_i − 1, so that Σ_i y_i = 0 and Σ_i y_i² = ε. Using the inequality 1 + t ≤ e^{t − t²/2 + t³/3}, which holds for all real t, we get that Π_i x_i = Π_i (1 + y_i) ≤ exp(Σ_i y_i − (1/2)Σ_i y_i² + (1/3)Σ_i y_i³) ≤ exp(−ε/2 + ε/3) = exp(−ε/6), where the last inequality used the fact that ε ≤ 1 (so that |y_i| ≤ 1 and hence Σ_i y_i³ ≤ Σ_i y_i² = ε). To argue about values of ε larger than 1, we observe that the function f(z) = Π_i (1 + z y_i) is decreasing in the range 0 ≤ z ≤ 1. To see this, notice that the derivative of ln f(z) is precisely Σ_i y_i/(1 + z y_i), which vanishes at z = 0 and is itself decreasing in z; hence it is non-positive on [0, 1]. Hence, we can apply the bound for small ε to get that f(1) ≤ f(z*) ≤ exp(−1/6) for z* = ε^{−1/2} ≤ 1.
Claim 2.12 (Capacity and row/column normalization). Let A ∈ M_{m,n}(r, c) be a matrix such that ds(A) = ε. Then:
1. If A is column-normalized, then cap(Row(A)) ≥ cap(A) (assuming Row(A) is defined).
2. If A is row-normalized, then cap(Col(A)) ≥ exp(ε/6)·cap(A) for ε ≤ 1 (assuming Col(A) is defined).
Comment: One can prove a similar quantitative bound in terms of ε also in item (1), but we will not need it.
Proof. We start by proving the first item.
Since the scaling coefficients used to get Row(A) from A are R_i(A)^{−1/2}, we get, by Claim 2.10, an expression (1) for cap(Row(A)) in terms of cap(A) and the determinants det(R_i(A)). Hence, by the AM-GM inequality, we can bound the relevant product of determinants; plugging this into Eq. (1) proves the first part of the claim.
To prove the second part, recall that C_j(A) = (nc/mr)·Σ_{i=1}^m A*_ij A_ij and that Col(A) is obtained from A by scaling the columns with coefficients C_j(A)^{−1/2}. Hence, by Claim 2.10, we have a corresponding expression (2) for cap(Col(A)). Let µ_{j1}, . . . , µ_{jc} be the eigenvalues of C_j(A). As before, using the fact that A is row normalized, we have Σ_{j=1}^n Σ_{a=1}^c µ_{ja} = Σ_{j=1}^n tr(C_j(A)) = (nc/mr)·m·tr(I_r) = nc.
Using the assumption ds(A) = ε and the fact that A is row normalized, we can also deduce that Σ_{j,a} (µ_{ja} − 1)² = ε. Hence, we can use Claim 2.11 to obtain the bound Π_{j,a} µ_{ja} ≤ exp(−ε/6), which, plugged into (2), proves the second part of the claim.
Claim 2.13. Let A ∈ M_{m,n}(r, c) and let B be a scaling of A. If Row(A) and Col(A) are well defined, then so are Row(B) and Col(B).
Proof. Suppose B_ij = R_i A_ij C_j for non-singular scaling coefficients R_1, . . . , R_m ∈ M_{r,r}(1, 1) and C_1, . . . , C_n ∈ M_{c,c}(1, 1). To show that Row(B) is well defined we need to argue that, for each i ∈ [m], the PSD matrix Σ_{j=1}^n B_ij B*_ij is non-singular. We can take out the non-singular factors R_i and R*_i, and so we need to show that Σ_{j=1}^n A_ij C_j C*_j A*_ij is non-singular. This r × r PSD matrix is singular iff there exists a non-zero vector v ∈ C^r so that C*_j A*_ij v = 0 for all j ∈ [n]. Since the C_j's are non-singular, such a v would also be in the kernel of R_i(A) = Σ_{j=1}^n A_ij A*_ij, in contradiction to our assumption that Row(A) is well defined. The proof for Col(B) is identical.
Proof of Lemma 2.7. Let A_0 = Col(A) and define recursively A_{k+1} = Col(Row(A_k)). Notice that Col(A) is well defined since cap(A) > 0, and that Row(A) is well defined since cap(A*) > 0 (using Claim 2.8). Hence, by Claim 2.13, this property remains true for all matrices A_k in the sequence (since they are all scalings of A). We wish to show that ds(A_k) approaches zero as k goes to infinity. Assume towards a contradiction that ds(A_k) ≥ ε for some 0 < ε < 1 and all k ≥ 0. Applying Claim 2.12, we get that cap(A_{k+1}) ≥ exp(ε/6) · cap(A_k).
The matrices A k are all column-normalized and so, by Claim 2.9, cap(A k ) ≤ 1 for all k ≥ 0. This gives a contradiction to the claimed growth of cap(A k ).
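In the scalar case r = c = 1 the iteration in the proof is just Sinkhorn-style alternating normalization with ℓ_2 norms, and is easy to run. A sketch (ours), under the normalization convention that row norms are 1 and column norms are √(m/n):

```python
import numpy as np

def l2_sinkhorn(A, iters=500):
    # Alternately normalize rows and columns (the scalar analogue of
    # A_{k+1} = Col(Row(A_k))) so that every row l2-norm is 1 and every
    # column l2-norm is sqrt(m/n).
    m, n = A.shape
    B = A.astype(float).copy()
    for _ in range(iters):
        B /= np.linalg.norm(B, axis=1, keepdims=True)                   # rows
        B *= np.sqrt(m / n) / np.linalg.norm(B, axis=0, keepdims=True)  # cols
    return B

A = np.random.default_rng(1).random((4, 4)) + 0.1   # strictly positive entries
B = l2_sinkhorn(A)
print(np.allclose(np.linalg.norm(B, axis=1), 1.0),
      np.allclose(np.linalg.norm(B, axis=0), 1.0))  # True True
```

For matrices with strictly positive entries this iteration converges (it is Sinkhorn's algorithm applied to the entrywise squares); the point of the section is to find the right convergence condition in the block setting.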

Bounding the capacity of a matrix
In this section we will develop machinery useful for proving that the capacity of certain matrices is positive.
Claim 2.14. Let A ∈ M_{m,n}(r, c) and let A′ be obtained from A by replacing some of the blocks of A with zero blocks. Then cap(A′) ≤ cap(A).
Proof. The claim follows from the simple fact that, for two PSD matrices X, Y, we have det(X + Y) ≥ det(X). Using this in the definition of capacity, we see that replacing some blocks in A with zeros can only decrease the product of determinants being minimized.
Claim 2.15. Let M be a block-diagonal matrix with diagonal blocks A_1, . . . , A_s ∈ M_{m,n}(r, c). Then cap(M) = Π_{i∈[s]} cap(A_i); in particular, cap(M) > 0 iff cap(A_i) > 0 for all i.
Proof. To save on notation, we will only prove the claim for s = 2 (the general case is proved along the same lines). Suppose therefore that M has diagonal blocks A, B ∈ M_{m,n}(r, c) and zero blocks in the two off-diagonal positions. More precisely, viewing M as an element of M_{2m,2n}(r, c) (and treating M_ij as the actual r × c blocks of M), we have M_ij = A_ij for 1 ≤ i ≤ m and 1 ≤ j ≤ n, M_ij = B_{(i−m),(j−n)} for m + 1 ≤ i ≤ 2m and n + 1 ≤ j ≤ 2n, and M_ij = 0 for all other pairs i, j.
To see that the capacity splits into the product of capacities, it is enough to rewrite the capacity in a scale-invariant form.
The following is a special case of Theorem 1.13 of [BCCT08].
Theorem 2.16. Let A ∈ M_{k,1}(r, c) be such that the set of blocks {A_1, . . . , A_k} is well-spread. Then cap(A) > 0.
Even though the treatment in [BCCT08] is for real matrices, the proof of this particular result carries over without difficulty to the complex numbers. In particular, we only rely on the arguments of Lemma 5.1 and Proposition 5.2 of [BCCT08], whose proofs use only basic properties of PSD matrices, as well as compactness of the set of orthonormal (in our case unitary) bases. Another comment is that [BCCT08] assumes the matrices A_1, . . . , A_k are non-degenerate, meaning that each A_i is of full rank and that their common kernel is trivial. These properties follow easily from being well-spread, and so we are allowed to apply their results.

Rank of design matrices with block entries
In this section we will prove Theorem 1.4. First, we analyze a transformation taking any design matrix to another design matrix which is scalable.

Regularization of a design matrix
Definition 3.1 (Design matrix in regular form). A (q, k, t)-design matrix A ∈ M_{m,n}(r, c) is in regular form if m = nk and, in each column i ∈ [n], the k blocks A_{(i−1)k+1,i}, . . . , A_{(i−1)k+k,i} form a well-spread set. That is, the second item in the definition of a design matrix is satisfied by k-tuples of blocks that are row-disjoint in A.
Claim 3.2. Let A ∈ M m,n (r, c) be a (q, k, t)-design matrix. Then, there exists a (q, k, tq)-design matrix B ∈ M nk,n (r, c) in regular form such that rank(B) ≤ rank(A).
Proof. We construct B in n steps. In the first step we add to B the k rows of A whose first-column entries form a well-spread set (these exist since A is a (q, k, t)-design matrix). In the next step we add k more rows to B, using the k rows of A in which the second-column entries form a well-spread set. We continue in this manner until we end up with B having nk rows. Since each row of A contains at most q non-zero blocks, each row of A is repeated at most q times in B. Hence, the supports of two columns in B can intersect in at most tq positions. Since all rows of B are rows of A, the rank cannot increase (it might decrease if we do not use all rows of A).
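In the scalar case the construction in the proof is easy to carry out: for each column we stack k rows supporting that column (below we simply take the first k; selecting well-spread k-tuples in the block case is the part that needs the design assumption). This is our own sketch, not the paper's code.

```python
import numpy as np

def regularize(A, k):
    # Stack, for each column j, k rows of A that are nonzero in column j.
    # The result has m = nk rows; a row of A with up to q nonzero entries
    # can be reused up to q times, which is why t becomes tq.
    rows = []
    for j in range(A.shape[1]):
        support = np.flatnonzero(A[:, j])
        assert len(support) >= k
        rows.extend(support[:k])
    return A[rows, :]

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
B = regularize(A, k=2)
print(B.shape)  # (4, 2)
```

Since the rows of B are (repeated) rows of A, rank(B) ≤ rank(A), matching Claim 3.2.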
Claim 3.3. Suppose B ∈ M nk,n (r, c) is a (q, k, t)-design matrix in regular form. Then B is scalable.
Proof. We call the entries of B in positions ((i − 1)k + ℓ, i) for ℓ ∈ [k] special. Let B′ ∈ M_{nk,n}(r, c) be the matrix obtained from B by replacing all the non-special entries of B with zero blocks. By Claim 2.14 and Lemma 2.7 it is enough to prove that cap(B′) > 0. We can consider B′ as a diagonal n × n matrix with entries in M_{k,1}(r, c) and so, using Claim 2.15, it is enough to show that the special entries in each column form an M_{k,1}(r, c) matrix with positive capacity. This follows from Theorem 2.16, using the assumption that the special entries in each column form a well-spread set.

Proof of Theorem 1.4
We will use the following folklore lemma on diagonally dominant matrices.
Lemma 3.4. Let H be an N × N Hermitian matrix such that H_ii ≥ L > 0 for all i ∈ [N] and such that Σ_{i≠j} |H_ij|² ≤ S. Then rank(H) ≥ N²L²/(NL² + S). We call a matrix H satisfying these two conditions an (L, S)-diagonal dominant matrix.
Proof. First, notice that we can assume w.l.o.g. that H_ii = L for all i. Indeed, otherwise we scale the i'th row and column by 0 < L/H_ii ≤ 1 to get a new Hermitian matrix with L on the diagonal and with smaller S. Then, rank(H) ≥ tr(H)²/tr(HH*) = (NL)²/(NL² + Σ_{i≠j}|H_ij|²) ≥ N²L²/(NL² + S).
The following claim is an easy consequence of the Cauchy-Schwarz inequality (applied coordinate-wise).
Claim 3.5. Let A_1, . . . , A_t ∈ M_{r,c}(1, 1). Then ‖Σ_{i=1}^t A_i‖₂² ≤ t·Σ_{i=1}^t ‖A_i‖₂².
Another useful claim:
Claim 3.6. Suppose C_1, . . . , C_q ∈ M_{r,c}(1, 1) are such that Σ_{i∈[q]} C_i C*_i = I_r. Then Σ_{i≠i′∈[q]} ‖C*_i C_i′‖₂² ≤ r(1 − 1/q).
Proof. The sum in the claim is equal to the difference of the two sums Σ_{i,i′∈[q]} ‖C*_i C_i′‖₂² and Σ_{i∈[q]} ‖C*_i C_i‖₂². First notice that Σ_{i,i′} ‖C*_i C_i′‖₂² = Σ_{i,i′} tr(C*_i C_i′ C*_i′ C_i) = tr((Σ_i C_i C*_i)²) = tr(I_r) = r. Next notice that, by Claim 3.5, Σ_i ‖C*_i C_i‖₂² = Σ_i ‖C_i C*_i‖₂² ≥ (1/q)·‖Σ_i C_i C*_i‖₂² = r/q. These two calculations complete the proof.
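The quantitative content of the diagonal-dominance lemma is the standard trace inequality rank(H) ≥ tr(H)²/tr(HH*), valid for any Hermitian H by Cauchy-Schwarz applied to the non-zero eigenvalues. A quick numerical sanity check (ours):

```python
import numpy as np

def trace_rank_bound(H):
    # rank(H) >= tr(H)^2 / tr(H H^*) for any Hermitian H, since
    # (sum of eigenvalues)^2 <= rank * (sum of squared eigenvalues).
    return np.trace(H).real ** 2 / np.trace(H @ H.conj().T).real

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 4))
H = M @ M.T            # a 6 x 6 PSD matrix of rank 4
print(np.linalg.matrix_rank(H) >= trace_rank_bound(H))  # True
```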
The bulk of the proof is given in the next lemma.
Lemma 3.7. Suppose M ∈ M_{m,n}(r, c) is a (q, k, t)-design matrix that is scalable. Then rank(M) ≥ cn·X/(1 + X), where X = mrq/(cnt(q − 1)).
Proof. Since scaling does not change the rank and preserves the property of being a (q, k, t)-design matrix, we may assume w.l.o.g. that M is already scaled, up to some error ε that we will later send to zero. Notice that we may also assume, w.l.o.g., that the 'row sums' of M are perfectly scaled and that the 'error' is only in the column sums (just apply one additional row normalization). That is, each R_i(M) equals I_r, while each C_j(M) − I_c goes to zero (entry-wise) as ε goes to zero. We consider the Hermitian matrix H = M*M ∈ M_{n,n}(c, c).
Equation (3) follows from the scaling condition on the columns of M, since the diagonal c × c blocks of H are (rm/cn)·I_c plus an error that vanishes with ε. We now turn to proving the bound (4) on S (the sum of squares of the off-diagonal entries of H). Using Claim 3.5 and the fact that the supports of two columns of M intersect in at most t blocks, and then applying Claim 3.6 together with the fact that each row of M has at most q non-zero blocks, we get S ≤ tmr(1 − 1/q).
We can now apply Lemma 3.4 with the above L and S to get that rank(H) ≥ cn·X/(1 + X) − o(1), with X = mrq/(cnt(q − 1)), where the o(1) term vanishes as ε goes to zero. Since this inequality holds for all ε, we can take ε to zero and conclude that it holds without the o(1) term as well. The final observation is that rank(M) = rank(H), and so we are done.
We can now prove the main rank theorem for design matrices.
Proof of Theorem 1.4. Let A ∈ M_{m,n}(r, c) be a (q, k, t)-design matrix. Let B ∈ M_{nk,n}(r, c) be the matrix given by Claim 3.2. So B is a (q, k, qt)-design matrix in regular form with rank(B) ≤ rank(A). By Claim 3.3, B is scalable. Thus, we can apply Lemma 3.7 (with m = nk and qt in place of t) to conclude that rank(A) ≥ rank(B) ≥ cn·rk/(rk + ct(q − 1)). This completes the proof.
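For the record, if the bound of Lemma 3.7 is read as rank(M) ≥ cn·X/(1 + X), with X = mrq/(cnt(q − 1)) the quantity appearing in its proof, then substituting m = nk and qt in place of t gives the arithmetic behind the last step; note that q cancels everywhere except through the factor q − 1:

```latex
X \;=\; \frac{(nk)\,r\,q}{cn\,(qt)(q-1)} \;=\; \frac{rk}{c\,t(q-1)},
\qquad
cn\cdot\frac{X}{1+X} \;=\; \frac{cn\,rk}{rk + c\,t(q-1)} .
```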

Projective rigidity
Below, we will prove the following rigidity theorem (its corollaries and the necessary preliminaries follow).
Theorem 4.1. Let (V, T) ∈ COL(n, d) and k, t be such that:
1. Every index i ∈ [n] appears in at least k triples of T (counting repetitions).
2. Every pair of distinct indices appears together in at most t triples of T (counting repetitions).
3. For all 0 < ℓ < d there are at most (ℓ/d)·k triples in T (counting repetitions) so that all of them intersect at some point and the corresponding triples in V are contained in an ℓ-dimensional affine subspace.
Then (V, T) is r-rigid with r ≤ ⌊2d²tn/((d − 1)k + 2dt)⌋.
For example, if we have a triple system in C² in which every pair of points is in exactly one triple and no line contains more than half the points, we get that the configuration is 15-rigid. Indeed, setting k = (n − 1)/2, d = 2, t = 1, q = 3, the bound on r becomes ⌊8n/(4 + (n − 1)/2)⌋ = ⌊16n/(n + 7)⌋ ≤ 15.
We now discuss the implications for δ-SG (Sylvester-Gallai) configurations, defined in [BDWY12]. A theorem from [DSW12] shows that a δ-SG configuration must be contained in a subspace of dimension at most O(1/δ). We can use Theorem 4.1 to prove the following. Proof. For each line containing r ≥ 3 points we construct a triple multiset of r² − r triples so that each point on the line is in exactly 3(r − 1) triples and every pair is in at most 6 triples (see Lemma 5.1). Taking the union of all these triples we get a family of triples T′ ⊂ T (containment as sets, not multisets), and so it is enough to bound the rigidity of the pair (V, T′). Each point is in at least k = 3δ(n − 1) triples in T′ and every pair is in at most 6. To apply Theorem 4.1 we need to argue that every ℓ-dimensional affine subspace can contain at most (ℓ/d)·k = 3ℓδ(n − 1)/d intersecting triples of T′. If there exists an affine subspace W that violates this inequality, then W would have to contain more points of V than the assumptions allow, a contradiction. Applying Theorem 4.1 we get that (V, T′) is r-rigid with the stated bound on r.

The rigidity matrix
For a pair (V, T) ∈ COL(n, d) we define a matrix A = A(V, T) ∈ M_{m,n}(d − 1, d) with m = |T|, called the rigidity matrix of (V, T). The matrix will be defined so that dn − rank(A) upper bounds the rigidity of (V, T). To this end, we first define a certain (d − 1) × d block that will be used in the construction of A.
For a vector w ∈ C^d with non-zero first coordinate, let ∆(w) ∈ M_{d−1,d}(1, 1) denote a fixed (d − 1) × d matrix of full rank whose kernel is spanned by w. The row of A corresponding to a triple (i, j, k) ∈ T has the blocks ∆(v_k − v_j), ∆(v_i − v_k) and ∆(v_j − v_i) in positions i, j and k respectively, and zero blocks everywhere else. If T is a multiset and a triple repeats several times, we also repeat the corresponding row in A the same number of times.
Claim 4.6. If A(V, T ) has rank dn − r then (V, T ) is r-rigid.
Proof. Let P(t) be a smooth curve in K_T ⊂ C^{nd} with P(0) = P_V, and let Ṗ(t) be its tangent vector. Then we claim that A · Ṗ(0) = 0. By the construction of A, it is enough to verify this for the d − 1 rows coming from a single triple (i, j, k) ∈ T. This follows by taking the derivative w.r.t. the variable t of the d − 1 identities (for ℓ = 2, . . . , d) that hold for every collinear triple v_i, v_j, v_k and any t.
Hence, the vector Ṗ(0) must lie in an r-dimensional subspace (the kernel of A). This implies that the dimension of K_T at P_V is at most r.
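The tangent-space computation in the proof can be reproduced numerically in the plane using the determinant form of the collinearity constraints (a sketch of ours, not the ∆-block construction of the paper): the kernel of the Jacobian below contains Ṗ(0) for every smooth curve in K_T, so its dimension bounds the local degrees of freedom.

```python
import numpy as np

def collinearity_jacobian(P, triples):
    # One row per triple (i, j, k): gradient of f = det[v_j - v_i, v_k - v_i],
    # which vanishes exactly when the three points are collinear.
    J = np.zeros((len(triples), 2 * len(P)))
    for row, (i, j, k) in enumerate(triples):
        (xi, yi), (xj, yj), (xk, yk) = P[i], P[j], P[k]
        J[row, 2*i:2*i+2] = [yj - yk, xk - xj]
        J[row, 2*j:2*j+2] = [yk - yi, xi - xk]
        J[row, 2*k:2*k+2] = [yi - yj, xj - xi]
    return J

# A 3 x 3 grid with its 3 rows, 3 columns and 2 diagonals collinear.
P = [(a, b) for a in range(3) for b in range(3)]
T = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8),
     (0, 4, 8), (2, 4, 6)]
J = collinearity_jacobian(P, T)
shift = np.tile([1.0, 0.0], len(P))       # an infinitesimal translation
assert np.allclose(J @ shift, 0)           # trivial motions lie in the kernel
print(2 * len(P) - np.linalg.matrix_rank(J))  # upper bound on local d.o.f.
```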

Proof of Theorem 4.1
Let (V, T) be as in the statement of the theorem and let A = A(V, T) be the corresponding rigidity matrix. We may assume w.l.o.g. that the points v_1, . . . , v_n forming V have pairwise distinct first coordinates (this can be achieved by applying a generic affine transformation).
Claim 4.7. Let w_1, . . . , w_k ∈ C^d be such that the first coordinate of each w_i is non-zero and such that, for all 0 < ℓ < d, any ℓ-dimensional subspace of C^d contains at most (ℓ/d)·k of the w_i's. Then, the set of matrices ∆(w_1), . . . , ∆(w_k) ∈ M_{d−1,d}(1, 1) is well-spread.
Proof. Fix a subspace V ⊂ C^d of dimension 0 < ℓ < d. We have that dim(∆(w_i)(V)) is equal to ℓ − 1 if w_i ∈ V and to ℓ otherwise. Hence, Σ_{i∈[k]} dim(∆(w_i)(V)) ≥ kℓ − (ℓ/d)·k = ((d − 1)k/d)·ℓ, as required. We then only have to argue that the definition of a well-spread set is satisfied also in the special cases V = {0} and V = C^d. The first is trivial to see and the second follows since each ∆(w_i) has full rank.
Claim 4.8. The matrix A = A(V, T) is a (3, k, t)-design matrix.
Proof. By construction, each row of A has three non-zero blocks. The bound on pairwise intersections of columns follows from the assumption that at most t triples contain any particular pair of points. Now, consider k triples of T containing a particular point v_i (at least k such triples exist by assumption). The corresponding blocks in the i'th column of A are given by ∆(v_k − v_j), with v_j, v_k being the other two points in the triple. Notice that all the vectors v_k − v_j have a non-zero first coordinate, and that the kernel of ∆(v_k − v_j) is spanned by v_k − v_j. Since we assume that no ℓ-dimensional affine subspace contains more than (ℓ/d)·k of these k (intersecting) triples, Claim 4.7 shows that these k blocks form a well-spread set.
Using the last claim, we can apply Theorem 1.4 to conclude that rank(A) ≥ dn·(d − 1)k/((d − 1)k + 2dt) = dn − 2d²nt/((d − 1)k + 2dt).
Noticing that the rank is an integer, we can add the floor to the obtained bound, so that rank(A) ≥ dn − ⌊2d²nt/((d − 1)k + 2dt)⌋. Together with Claim 4.6, this completes the proof of the theorem.

Sylvester-Gallai for subspaces
In this section we prove Theorem 1.6. Let k = δ(n − 1) and assume w.l.o.g. that k is an integer. For each i ∈ [n] pick some basis B_i = {v_i1, . . . , v_iℓ} of the subspace V_i. Let A_V ∈ M_{nℓ,d}(1, 1) be the matrix whose first ℓ rows are the elements of B_1, whose next ℓ rows are the elements of B_2, and so on, up to B_n. Our goal is then to prove an upper bound on the rank of A_V. For that purpose we will construct another matrix A_C ∈ M_{m,n}(ℓ, ℓ) of high rank such that A_C · A_V = 0.
We will now describe how to construct the matrix A_C. The first step is to construct a multiset of triples T ⊂ [n]^3. We will use the following simple lemma from [DSW12]. Notice that we use multisets since, for example, if r = 3 we must use the same (and only) triple with multiplicity 6. Since the pairwise intersections of the V_i's are all trivial, every pair of them spans a 2ℓ-dimensional subspace of C^d. We will call a 2ℓ-dimensional subspace of C^d special if it contains at least three of the V_i's. For every special 2ℓ-dimensional space containing r ≥ 3 spaces among the V_i's we use Lemma 5.1 to construct a multiset of r^2 − r triples on the r spaces contained in that special subspace, satisfying the two conditions of the lemma (we view these triples as triples in [n] since each subspace is indexed by an element of [n]). We then define the triple multiset T ⊂ [n]^3 to be the union (counting multiplicities) of all triples obtained this way (going over all special 2ℓ-dimensional spaces).
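The base case mentioned above (r = 3, the single triple taken with multiplicity 6) already exhibits the counting pattern of Lemma 5.1, whose statement is not reproduced in this excerpt: r^2 − r triples, each element in at least 3(r − 1) of them, each pair together in at most 6. A quick sketch checking these counts (the checker is our own, not from the paper):

```python
from collections import Counter
from itertools import combinations

def check_triple_multiset(triples, r):
    """Check the Lemma-5.1-style counts on a multiset of triples over [r]:
    exactly r^2 - r triples, each element in at least 3(r - 1) of them,
    and each pair appearing together in at most 6."""
    elem, pair = Counter(), Counter()
    for t in triples:
        assert len(set(t)) == 3              # three distinct elements
        for x in t:
            elem[x] += 1
        for p in combinations(sorted(t), 2):
            pair[p] += 1
    assert len(triples) == r * r - r
    assert all(elem[x] >= 3 * (r - 1) for x in range(r))
    assert max(pair.values()) <= 6
    return True

# r = 3: the only available triple, taken with multiplicity 6 (as noted above)
assert check_triple_multiset([(0, 1, 2)] * 6, r=3)
```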
Claim 5.2. The triple multiset T ⊂ [n]^3 constructed above satisfies the following three conditions (counting multiplicities).
• For every triple (i_1, i_2, i_3) ∈ T, each of the three subspaces V_{i_1}, V_{i_2}, V_{i_3} is contained in the span of the other two.
• Each i ∈ [n] appears in at least 3k triples in T.
• Every pair i ≠ j appears together in at most 6 triples in T.
Proof. The first item is satisfied since we only take triples contained in a 2ℓ-dimensional space, and every pair of the subspaces has trivial intersection (and so spans the entire 2ℓ-dimensional space). To prove the second item, fix some i ∈ [n] and suppose V_i is contained in s special 2ℓ-dimensional spaces W_1, . . . , W_s such that W_j contains r_j ≥ 3 spaces among V_1, . . . , V_n (including V_i). By the conditions of the theorem, we know that ∑_{j=1}^{s} (r_j − 1) ≥ k. Hence, using the bounds from Lemma 5.1, V_i (or rather i) will be in ∑_{j=1}^{s} 3(r_j − 1) ≥ 3k triples in T. The last item follows from the fact that a particular pair V_i, V_j can belong to at most one special 2ℓ-dimensional space, together with the bound on pairs from Lemma 5.1.
We now construct the matrix A_C ∈ M_{m,n}(ℓ, ℓ) by adding a specially constructed row (of ℓ × ℓ blocks) for each triple in T (if a triple repeats more than once we also repeat the corresponding row the same number of times). The construction of the row is given in the following claim.
Claim 5.3. For every triple t = (i_1, i_2, i_3) ∈ T there exists a row R^{(t)} ∈ M_{1,n}(ℓ, ℓ) with the following properties.
1. For each i ∉ {i_1, i_2, i_3}, the i'th block in R^{(t)} is zero.
2. The three blocks of R^{(t)} indexed by i_1, i_2, i_3 are non-singular ℓ × ℓ matrices.
3. The product R (t) · A V is zero (viewed as an ℓ × d scalar matrix).
Proof. Since V_{i_1}, V_{i_2}, V_{i_3} are all contained in a 2ℓ-dimensional space (spanned by any two of them), every basis element of one of the spaces, say of V_{i_1}, is spanned by the basis elements of the other two. Let B_i denote the ℓ × d matrix whose rows are the elements of the basis of V_i. We can thus find ℓ × ℓ matrices C_2, C_3 so that

B_{i_1} = C_2 · B_{i_2} + C_3 · B_{i_3}.

Moreover, both matrices C_2, C_3 are non-singular, since otherwise V_{i_1} would intersect one of the spaces V_{i_2}, V_{i_3} non-trivially. Hence, we can take the row R^{(t)} to have the identity ℓ × ℓ block in position i_1 and the non-singular blocks −C_2, −C_3 in positions i_2, i_3 (with zeros everywhere else). By construction of A_V we have that the product R^{(t)} · A_V = B_{i_1} − C_2 · B_{i_2} − C_3 · B_{i_3} is zero.
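For concreteness, here is the row construction carried out on a toy instance of our own (ℓ = 2, d = 4): B_2 and B_3 span complementary coordinate planes and B_1 is their "diagonal". We solve for C_2, C_3 exactly and check that the row (I, −C_2, −C_3) annihilates the stacked basis matrix.

```python
from fractions import Fraction

def solve_row(S, b):
    """Solve x @ S = b for x (S square and invertible), exactly over Q."""
    n = len(S)
    A = [[Fraction(S[j][i]) for j in range(n)] + [Fraction(b[i])]
         for i in range(n)]                      # the equations S^T x = b
    for c in range(n):
        piv = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for i in range(n):
            if i != c and A[i][c] != 0:
                A[i] = [v - A[i][c] * p for v, p in zip(A[i], A[c])]
    return [A[i][n] for i in range(n)]

ell = 2
B2 = [[1, 0, 0, 0], [0, 1, 0, 0]]   # basis of V_{i2} = span(e1, e2)
B3 = [[0, 0, 1, 0], [0, 0, 0, 1]]   # basis of V_{i3} = span(e3, e4)
B1 = [[1, 0, 1, 0], [0, 1, 0, 1]]   # basis of V_{i1}: the "diagonal" plane
S = B2 + B3                          # stacked 2*ell x d basis matrix
C = [solve_row(S, b) for b in B1]    # row i of B1 equals C[i] @ S
C2 = [c[:ell] for c in C]
C3 = [c[ell:] for c in C]
# C2, C3 must be non-singular (here both come out as the identity)
assert C2[0][0] * C2[1][1] - C2[0][1] * C2[1][0] != 0
assert C3[0][0] * C3[1][1] - C3[0][1] * C3[1][0] != 0
# The row R = (I, -C2, -C3) satisfies R . A_V = B1 - C2.B2 - C3.B3 = 0
for i in range(ell):
    for j in range(4):
        s = B1[i][j] - sum(C2[i][k] * B2[k][j] for k in range(ell)) \
                     - sum(C3[i][k] * B3[k][j] for k in range(ell))
        assert s == 0
```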
We now take the matrix A C ∈ M m,n (ℓ, ℓ) to have the rows (in whatever order we wish) R (t) for all t ∈ T (counting multiplicities). By the last claim we have that A C · A V = 0.
Claim 5.4. The matrix A_C constructed above is a (3, 3k, 6)-design matrix.
Proof. First notice that, by construction, each row of A_C has at most three non-zero blocks. By the properties of the triple system T, every pair of columns i ≠ j will have at most 6 rows of A_C in which both columns are non-zero (since there are at most 6 triples in T containing both i and j). So we only need to show that each column contains at least 3k blocks that form a well-spread (multi)set. By Claim 5.3, each non-zero block in A_C is non-singular and so, by Comment 1.3, it is enough to show that each column contains at least 3k non-zero blocks. This follows from the properties of T since each i appears in at least 3k triples.
We now apply Theorem 1.4 to bound the rank of A_C from below. Using the identity A_C · A_V = 0 we conclude that rank(A_V) ≤ nℓ − rank(A_C) < 4ℓ/δ. Now, using the fact that the rank is an integer and that the inequality is strict, we can in fact bound the rank by ⌈4ℓ/δ⌉ − 1. This concludes the proof of Theorem 1.6.

Incidences between lines and curves
In this section we use Theorem 1.4 to prove bounds on the incidence structure of arrangements of lines and curves in C^d. We begin by restating our theorem handling intersections of lines.
Theorem 6.1. Let L_1, . . . , L_n ⊂ C^d be distinct lines such that each L_i intersects at least k other lines and, among those k lines, at most k/2 have the same intersection point on L_i. Then, the n lines are contained in an affine subspace of dimension at most 4n/(k + 2) − 1.
This theorem can be equivalently stated as the following statement about two dimensional subspaces.
Theorem 6.2. Let V_1, . . . , V_n ⊂ C^d be distinct two-dimensional subspaces such that each V_i non-trivially intersects at least k other V_j's and, among those k subspaces, at most k/2 have the same intersection with V_i. Then, the n subspaces are contained in a linear subspace of dimension at most 4n/(k + 2).
Proof of equivalence of Theorem 6.1 and Theorem 6.2. Suppose Theorem 6.1 holds and proceed to prove Theorem 6.2 as follows. Let H be a generic affine hyperplane (not passing through the origin) and let L_i = V_i ∩ H be the set of n lines obtained by intersecting each V_i with H. Clearly, the incidence structure remains the same and so we can apply Theorem 6.1 to conclude that the lines L_1, . . . , L_n are contained in an affine subspace (inside H) of dimension at most 4n/(k + 2) − 1. This results in a dimension bound of 4n/(k + 2) on the V_i's since we add back the origin. In the opposite direction, suppose Theorem 6.2 holds and proceed to prove Theorem 6.1 as follows. Let L_1, . . . , L_n ⊂ C^d be lines as in the theorem. Embed C^d into C^{d+1} as the hyperplane x_{d+1} = 1. Each line L_i defines a two-dimensional subspace of C^{d+1} by taking its linear span. If the lines L_i span a d′-dimensional affine subspace of C^d then the resulting arrangement of two-dimensional spaces in C^{d+1} spans a (d′ + 1)-dimensional linear subspace. Again, the incidence structure stays the same and so we can apply Theorem 6.2 and subtract one from the resulting dimension bound.
6.1 Proof of Theorem 6.2
The overall proof structure is similar to that of Theorem 1.6. We pick a basis {u_i, v_i} ⊂ C^d for each V_i and consider the 2n × d (scalar) matrix A_V whose rows are u_1, v_1, u_2, v_2, . . . , u_n, v_n. To upper bound the rank of A_V we will construct a matrix A_C ∈ M_{m,n}(1, 2) of high rank such that A_C · A_V = 0. As before, each row of A_C will come from some dependency (in this case a pairwise intersection) among the spaces V_1, . . . , V_n. More specifically, for every pair V_i, V_j with non-trivial intersection we add a row R ∈ M_{1,n}(1, 2) to A_C (rows can be added in whatever order we wish), where R is constructed as follows. Let a_1, b_1, a_2, b_2 ∈ C be such that a_1 u_i + b_1 v_i + a_2 u_j + b_2 v_j = 0, with |a_1| + |b_1| ≠ 0 and |a_2| + |b_2| ≠ 0 (such coefficients exist since the intersection is non-trivial). We take the row R to have the block (a_1, b_1) in position i and the block (a_2, b_2) in position j, with zeros everywhere else. By construction we have R · A_V = 0 and so we end up with A_C · A_V = 0 as well.
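A small exact-arithmetic sketch of this row construction, on a made-up instance in C^3: the coefficients (a_1, b_1, a_2, b_2) are found as a null vector of the transpose of the 4 × d matrix with rows u_i, v_i, u_j, v_j, and the resulting row is checked to annihilate A_V.

```python
from fractions import Fraction

def null_vector(A):
    """A nonzero x with A @ x = 0, exact over Q (assumes the kernel is nontrivial)."""
    A = [[Fraction(v) for v in row] for row in A]
    m, n = len(A), len(A[0])
    pivots, r = {}, 0
    for c in range(n):
        piv = next((i for i in range(r, m) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        A[r] = [v / A[r][c] for v in A[r]]
        for i in range(m):
            if i != r and A[i][c] != 0:
                A[i] = [v - A[i][c] * p for v, p in zip(A[i], A[r])]
        pivots[c] = r
        r += 1
    free = next(c for c in range(n) if c not in pivots)  # a free column
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for c, i in pivots.items():
        x[c] = -A[i][free]
    return x

# V_i = span(u_i, v_i) and V_j = span(u_j, v_j) in C^3, meeting in span(e2)
ui, vi = [1, 0, 0], [0, 1, 0]
uj, vj = [0, 1, 0], [0, 0, 1]
M = [ui, vi, uj, vj]                         # 4 x d matrix of basis rows
MT = [[M[i][c] for i in range(4)] for c in range(3)]
a1, b1, a2, b2 = null_vector(MT)             # x @ M = 0  <=>  M^T x = 0
assert (a1, b1) != (0, 0) and (a2, b2) != (0, 0)   # both blocks are non-zero
for c in range(3):                           # the row R annihilates A_V
    assert a1 * ui[c] + b1 * vi[c] + a2 * uj[c] + b2 * vj[c] == 0
```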
Claim 6.3. The matrix A C constructed above is a (2, k, 1)-design matrix.
Proof. Clearly every row has at most two non-zero blocks, and a pair of columns can have at most one row in which both are non-zero (the row corresponding to their intersection, if one exists). So we only need to show that each column has k blocks forming a well-spread set. Fix some column i and let (a_1, b_1), . . . , (a_k, b_k) be the k blocks in the i'th column appearing in rows corresponding to the intersections of V_i with k subspaces V_{j_1}, . . . , V_{j_k}, of which at most k/2 have the same intersection with V_i. This last condition implies that at most k/2 of the k row vectors (a_1, b_1), . . . , (a_k, b_k) lie in any single one-dimensional subspace of C^2. This implies that they satisfy the definition of well-spread blocks. Indeed, since the blocks are 1 × 2, we only need to consider one-dimensional subspaces U ⊂ C^2 in the definition of well-spread. For such a subspace, the linear map φ_j from C^2 to C^1 defined by the block (a_j, b_j) has a one-dimensional image on U if and only if (a_j, b_j) is not in the orthogonal complement of U. Since at most k/2 of the (a_j, b_j) can lie in U^⊥, we get that

∑_{j=1}^{k} dim(φ_j(U)) ≥ k/2 = (k/2) · dim(U),

as required.
Applying Theorem 1.4 to A_C we get that rank(A_C) ≥ 2n − 2n/(1 + k/2). Hence, using A_C · A_V = 0, rank(A_V) ≤ 2n − rank(A_C) ≤ 2n/(1 + k/2) = 4n/(k + 2). Since the rank is an integer we get rank(A_V) ≤ ⌊4n/(k + 2)⌋. This completes the proof of the theorem.

Generalizing to curves
Here we extend Theorem 6.1 to handle curves of higher degree. For our methods to work we must require that the curves are given in parametric form as the image of a low degree polynomial map.
Definition 6.4. We say that γ ⊂ C^d is a degree-r parametric curve if there exist d polynomials γ_1, . . . , γ_d ∈ C[t], of degree at most r each, such that γ = {(γ_1(t), . . . , γ_d(t)) | t ∈ C} and at least one of the γ_i's is a non-constant polynomial.
It is easy to see that a parametric degree-r curve as defined above also has degree at most r under the usual algebraic-geometry definition of degree (intersecting it with a generic hyperplane, we get at most r intersection points). A parametric curve as defined above is also an irreducible curve, as it is the image of an irreducible curve under a polynomial map. Combining these two facts, and using Bezout's theorem (see e.g., [?]), we can deduce the following.
Claim 6.5. Two distinct degree-r parametric curves can intersect in at most r^2 points.
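The parenthetical degree argument can be made concrete: substituting a degree-r parametrization into a hyperplane equation yields a univariate polynomial of degree at most r, hence at most r intersection points. A sketch (the twisted cubic and the hyperplane below are our own example):

```python
def hyperplane_restriction(curve, h, c):
    """Coefficients (constant term first) of h . gamma(t) - c, where each
    coordinate of gamma is given by its own coefficient list."""
    r = max(len(p) for p in curve) - 1
    out = [0] * (r + 1)
    for hi, p in zip(h, curve):
        for j, a in enumerate(p):
            out[j] += hi * a
    out[0] -= c
    return out

# twisted cubic gamma(t) = (t, t^2, t^3): a degree-3 parametric curve in C^3
gamma = [[0, 1], [0, 0, 1], [0, 0, 0, 1]]
poly = hyperplane_restriction(gamma, h=[2, -1, 5], c=7)
deg = max(i for i, a in enumerate(poly) if a != 0)
assert deg <= 3  # a degree <= r polynomial in t: at most r = 3 intersections
```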
We now restate our theorem for curve arrangements.
Theorem 6.6. Let γ_1, . . . , γ_n ⊂ C^d be degree-r parametric curves such that each γ_i intersects at least k other curves and, among those k curves, at most k/(2r) have the same intersection point on γ_i. Then, the n curves are contained in a subspace of dimension at most 2(r + 1)^4 · n/k.
Proof of Theorem 6.6. We take the same general steps as in the proof of Theorem 6.1. First, for each curve γ_i, let v_{i0}, . . . , v_{ir} ∈ C^d be such that

γ_i(t) = v_{i0} + t · v_{i1} + . . . + t^r · v_{ir}.

In other words, v_{ij} contains the coefficients of t^j in the d polynomials defining γ_i. Clearly, upper bounding the dimension of the span of the v_{ij}'s (over all i and j) will give an upper bound on the dimension of the smallest subspace containing all of the curves. For that purpose, let Γ be the n(r + 1) × d matrix whose first r + 1 rows are v_{10}, . . . , v_{1r}, whose next r + 1 rows are v_{20}, . . . , v_{2r}, and so on.
We will now use the incidences between the curves to construct a design matrix A ∈ M_{m,n}(1, r + 1) so that A · Γ = 0. Each intersection between a pair of curves will give one row of A, as follows. Suppose γ_i intersects γ_{i′} for some i and i′. Let t, t′ ∈ C be such that γ_i(t) = γ_{i′}(t′). Then, we can add a row R to the matrix A such that the i'th block of R is (1, t, t^2, . . . , t^r), the (i′)'th block of R is −(1, t′, . . . , (t′)^r) and all other blocks are zero. By construction we have that R · Γ = γ_i(t) − γ_{i′}(t′) = 0 and so we will end up with a matrix A such that A · Γ = 0.
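A toy check of this row (our own example: two plane curves meeting at (1, 1)). Note that the product vanishes because the (i′)'th block enters with a minus sign, so that R · Γ = γ_i(t) − γ_{i′}(t′):

```python
r = 2
# gamma1(t) = (t, t^2) and gamma2(t) = (t, 2t - 1) meet at gamma1(1) = gamma2(1) = (1, 1)
V1 = [[0, 0], [1, 0], [0, 1]]   # rows v_10, v_11, v_12: coefficients of t^0, t^1, t^2
V2 = [[0, -1], [1, 2], [0, 0]]  # rows v_20, v_21, v_22 (padded up to degree r)
Gamma = V1 + V2                 # the 2(r + 1) x d coefficient matrix of the two curves
t, s = 1, 1
R = [1, t, t ** 2] + [-1, -s, -s ** 2]  # block (1, t, ..., t^r) and minus-block for curve 2
prod = [sum(R[k] * Gamma[k][j] for k in range(len(R))) for j in range(2)]
assert prod == [0, 0]           # R . Gamma = gamma1(t) - gamma2(s) = 0
```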
All that is left is to argue that A is a design matrix; specifically, that A is a (2, k/2, r^2)-design matrix.
Proof. By construction, each row of A contains at most 2 non-zero blocks. By Bezout's theorem (Claim 6.5), two curves can intersect in at most r^2 points, and so two columns of A can have at most r^2 non-zero common indices. To complete the proof we need to show that each column of A contains at least k/2 blocks forming a well-spread (multi)set. Fix some column i, and notice that there are at least k non-zero blocks in that column, each corresponding to an intersection of γ_i with some other curve. Let t_1, . . . , t_k ∈ C be such that the k non-zero blocks in the i'th column are given by (1, t_j, . . . , t_j^r) for j = 1 . . . k. Some of the t_j's could be the same (if a single point on γ_i is the intersection point with more than one curve). Notice that, by the Vandermonde determinant, if we take r + 1 blocks with distinct values of t_j then the corresponding blocks (treated as row vectors in C^{r+1}) are linearly independent and thus form a basis of C^{r+1}. Our strategy for picking a large well-spread set among these k blocks is as follows: we greedily pick r + 1 blocks corresponding to r + 1 distinct intersection points and add them to our set. As long as we can find r + 1 distinct intersections we continue. If we cannot find such a set, it means that all the remaining intersection points on γ_i are concentrated in at most r points. Since each point can be the intersection point with at most k/(2r) curves of the original k (by the conditions of the theorem), there can be at most (k/(2r)) · r = k/2 blocks left. This means that we have constructed a (multi)set of k′ ≥ k/2 blocks together with a partition of it into groups of r + 1 linearly independent blocks. It is now easy to see that such a set is well-spread, since a subspace V ⊂ C^{r+1} of dimension ℓ can contain at most (ℓ/(r + 1)) · k′ of the k′ blocks (at most ℓ from each group in the partition).
This completes the proof.
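The greedy step above can be sketched as follows (the parameters t_j are made up; rank is computed exactly): groups of r + 1 blocks with pairwise-distinct parameters are peeled off, and each group is verified to be a basis of C^{r+1}, as the Vandermonde argument promises.

```python
from fractions import Fraction

def rank(rows):
    """Exact rank over the rationals via Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(rk, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][c] != 0:
                f = m[i][c] / m[rk][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

r = 2
ts = [0, 1, 2, 0, 1, 3, 5]              # intersection parameters; repeats allowed
blocks = [[Fraction(t) ** j for j in range(r + 1)] for t in ts]
# greedily peel off groups of r + 1 blocks with pairwise-distinct parameters
remaining, groups = list(range(len(ts))), []
while True:
    seen, grp = set(), []
    for i in remaining:
        if ts[i] not in seen:
            seen.add(ts[i])
            grp.append(i)
        if len(grp) == r + 1:
            break
    if len(grp) < r + 1:
        break                            # leftovers concentrate on <= r points
    groups.append(grp)
    remaining = [i for i in remaining if i not in grp]
for grp in groups:                       # each group is a basis of C^{r+1}
    assert rank([blocks[i] for i in grp]) == r + 1
```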