cgad.ski, 2023-09-19

A Coin Flip by Any Other Name…

Consider the following graph.

If we remove each edge with independent probability $1/2$, what is the probability that there is still a path from the top to the bottom vertex?

Since the removal of each subset of edges is equally likely, this kind of question is equivalent to counting the subsets of edges that we can remove without disconnecting two vertices. How easily we crunch through this enumeration depends on how we manage to decompose the problem.

However, suppose we learn that the probability of a path remaining in this particular graph is exactly $1/2$. Forget combinatorics—something strange is happening! In fact, it turns out that our special case can be solved with a "duality" trick, illustrated below. (Click the graph to generate a new random configuration.)

There will be a path between the top and bottom nodes exactly when there is no path between the left and the right nodes. Furthermore, the top-to-bottom and left-to-right graphs are the same and, once identified, have the same distribution of missing edges. In other words, we have found a rearrangement of the outcomes of our random process—the subsets of missing edges—that swaps the event that a path exists with the event that no path exists. Since this rearrangement preserves probability, it follows that each event must have probability $1/2$. (I learned this beautiful idea from this mathpages article.)
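The graph itself is rendered interactively in the original post, so it isn't reproduced here. As a sanity check, here is a minimal Python sketch that enumerates all edge subsets, assuming the graph is the classic five-edge "bridge" network (the self-dual example from the mathpages article); the vertex and edge names are mine.

```python
from itertools import product

# Assumed graph: vertices T(op), B(ottom), L(eft), R(ight),
# with the five edges of the classic "bridge" network.
EDGES = [("T", "L"), ("T", "R"), ("L", "R"), ("L", "B"), ("R", "B")]

def connected(kept, s="T", t="B"):
    """Is there a path from s to t using only the kept edges?"""
    adj = {}
    for u, v in kept:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [s], {s}
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

# All 2^5 subsets of surviving edges are equally likely, so we can
# enumerate them exactly instead of sampling.
hits = sum(
    connected([e for e, keep in zip(EDGES, bits) if keep])
    for bits in product([False, True], repeat=len(EDGES))
)
print(hits, "of", 2 ** len(EDGES))  # 16 of 32: probability exactly 1/2
```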

Let's take a look at another suspicious problem. Suppose we choose four points independently and uniformly at random on the unit circle. What is the probability that the origin lies in their convex hull? (Again, click for random configurations.)

As it turns out, the answer is again exactly $1/2$. If you're skeptical, you can press the "play" icon above to resample the experiment at high speed. (The black lines under the bar graph delineate the $\pm 3\sigma$ interval that the empirical proportion should stay within about $99\%$ of the time for large $n$.)
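If you'd rather not trust an interactive widget, the experiment is easy to replicate offline. Here is a minimal Monte Carlo sketch, using the standard fact that points on the unit circle contain the origin in their convex hull exactly when no semicircle contains all of them:

```python
import numpy as np

rng = np.random.default_rng(0)

def contains_origin(angles):
    """Points on the unit circle contain the origin in their convex hull
    iff no semicircle contains them all, i.e. iff the largest angular
    gap between consecutive points is less than pi."""
    a = np.sort(angles)
    gaps = np.diff(np.append(a, a[0] + 2 * np.pi))
    return gaps.max() < np.pi

trials = 200_000
hits = sum(contains_origin(rng.uniform(0, 2 * np.pi, 4)) for _ in range(trials))
print(hits / trials)  # hovers around 0.5
```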

In my experience, things don't happen with even odds unless they have a very good reason to, and that reason is usually a symmetry. For example, a coin will land heads as often as it lands tails because it is approximately symmetric, and there is no reason for the laws of physics to privilege one side of a symmetric coin so long as we have given it enough energy to make its initial condition approximately irrelevant.

Is there any appropriate symmetry in the problem of the four points? In such questions, it is important to have a degree of faith. Of course there is a symmetry; symmetry is everywhere! Unfortunately, symmetry does occasionally work in mysterious ways, and here it is not exactly obvious.

Given a pair $(x, y)$ of unit vectors, define the function $\text{normal}(x, y)$ to give the unique pair $(x', y')$ of unit vectors that satisfies the relations
$$\begin{align*} & \langle x', x \rangle = 0, & \langle y', y \rangle = 0, \\ & \langle x', y \rangle \le 0, & \langle y', x \rangle \le 0. \end{align*}$$
Now, consider the operation taking the four points $((a, b), (c, d))$ to $(\text{normal}(a, b), \text{normal}(-c, -d))$. We claim that this is measure-preserving for the distribution under which each point is independent and uniformly distributed. Furthermore, ignoring an event of probability $0$, we claim that the original four points generate a convex hull that contains the origin iff our four new points do not. So, our polygon contains the origin with the same probability that it does not. This observation solves the four-points problem in the same way that we solved the graph problem.
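Here is a sketch of how one might implement this construction and check the claimed swap numerically. In the plane, the dual pair can be found by rotating each vector a quarter turn and choosing signs to satisfy the two inequality constraints; the helper names (rot90, contains_origin) are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

def rot90(v):
    return np.array([-v[1], v[0]])

def normal(x, y):
    """The unique pair (x', y') of unit vectors with <x', x> = 0, <y', y> = 0,
    <x', y> <= 0 and <y', x> <= 0: rotate by a quarter turn and fix signs."""
    xp, yp = rot90(x), rot90(y)
    if xp @ y > 0:
        xp = -xp
    if yp @ x > 0:
        yp = -yp
    return xp, yp

def contains_origin(points):
    # Same semicircle criterion as before.
    a = np.sort([np.arctan2(p[1], p[0]) for p in points])
    gaps = np.diff(np.append(a, a[0] + 2 * np.pi))
    return gaps.max() < np.pi

disagreements = 0
for _ in range(20_000):
    a, b, c, d = (np.array([np.cos(t), np.sin(t)])
                  for t in rng.uniform(0, 2 * np.pi, 4))
    ap, bp = normal(a, b)
    cp, dp = normal(-c, -d)
    # The hull of the new points should contain the origin exactly
    # when the hull of the old points does not.
    if contains_origin([a, b, c, d]) == contains_origin([ap, bp, cp, dp]):
        disagreements += 1
print(disagreements)  # 0, apart from measure-zero boundary cases
```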

We can visualize this construction, although the picture is admittedly a little messy. In the diagram below, vectors with the same color are in the same quadruplet. (Note that which quadruplet is which does not matter, since the operation we have defined is its own inverse.) Meanwhile, the vectors drawn with solid lines are the pairs $(a, b)$ and $\text{normal}(a, b)$, while the vectors with dashed lines are the pairs $(c, d)$ and $\text{normal}(-c, -d)$.

Geometrically, each vector is a $90$ degree rotation of one of the vectors of the opposite color. For the vectors drawn with solid lines, these rotations turn them "away" from the pair of oppositely colored solid lines, while the dashed-line vectors are turned "towards" the oppositely colored pair.

(After writing this post, I learned another geometric argument for this problem from Ian Agol on mathstodon.)

The remainder of this post is divided into three parts.

General Solution

We begin with an apparently unrelated combinatorial problem: how many cells do $k$ linear hyperplanes cut $\R^n$ into? This is related to the classic lazy caterer problem, which asks how many pieces we can cut out of a pizza pie with a certain number of cuts.

The lazy caterer problem is solved by observing that each cut increases the number of pieces of pizza by exactly $1 + c$, where $c$ is the number of existing cuts that our new cut intersects. In fact, any number of cuts can be made in such a way that each cut intersects all the ones made before, so the maximum number of pieces we can make with $n$ cuts is $1 + (1 + 2 + \dots + n) = 1 + \binom{n + 1}{2}$. The case of $k$ linear hyperplanes cutting $\R^n$ can be analyzed similarly. It is only important to note that, while in the lazy caterer problem a new cut may or may not intersect a given existing cut, two non-equal hyperplanes always intersect, and so it will happen that $k$ hyperplanes always make the same number of cells in $\R^n$.

Let $F^n_k$ be this number. The increase in cells cut out of $\R^n$ caused by the addition of one new hyperplane $H$ equals the number of cells that $H$ intersects. This is the same as the number of cells that the existing $k$ hyperplanes cut $H$ into. Since $H$ is $(n - 1)$-dimensional, we have the recurrence
$$F^n_{k + 1} = F^n_k + F^{n - 1}_k.$$
Especially in view of this recurrence, it makes sense to define $F^n_0 = 1$ for all $n$, and $F^0_k = 0$ for $k \ge 1$. (It's slightly odd to imagine a "hyperplane" cutting $\R^0$ into $0$ cells, but it's what's demanded by the recurrence relation we've observed.)

We can now compute $F^n_k$ in general using a generating function. Let's put $F_k(x) = \sum_{n \ge 0} F^n_k x^n$. Then the recurrence above means that, as far as the non-constant terms of $F_k(x)$ are concerned, $F_{k + 1}(x) = F_k(x) + x F_k(x) = (1 + x) F_k(x)$. Keeping in mind that $F^0_k = 0$ for $k \ge 1$, we find that
$$F_0(x) = \frac{1}{1 - x}, \quad F_1(x) = \frac{2x}{1 - x}, \quad F_2(x) = \frac{2x(1 + x)}{1 - x}, \quad \dots$$
and in general,
$$F_k(x) = \frac{2x(1 + x)^{k - 1}}{1 - x}.$$
We arrive at the following.

Proposition: Generically, $k > 0$ hyperplanes cut $\R^n$ into
$$F^n_k = [x^n] F_k(x) = [x^n] \frac{2x(1 + x)^{k - 1}}{1 - x} = 2 \sum_{j < n} \binom{k - 1}{j}$$
cells.
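The recurrence and the closed form are easy to check against each other numerically; a small sketch (function names are mine):

```python
from math import comb

def F_rec(n, k):
    """Cells cut out of R^n by k generic linear hyperplanes, via the
    recurrence F(n, k) = F(n, k-1) + F(n-1, k-1)."""
    if k == 0:
        return 1  # no hyperplanes: one cell
    if n == 0:
        return 0  # convention demanded by the recurrence
    return F_rec(n, k - 1) + F_rec(n - 1, k - 1)

def F_closed(n, k):
    """Closed form from the generating function: 2 * sum_{j < n} C(k-1, j)."""
    return 2 * sum(comb(k - 1, j) for j in range(n))

assert all(F_rec(n, k) == F_closed(n, k)
           for n in range(1, 7) for k in range(1, 9))
print(F_closed(2, 4))  # four lines cut the plane into 8 cells
```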

We now return to computing probabilities. Suppose we have $k$ points in $\R^n$, represented by the columns of an $n \times k$ matrix $M$. We are interested in the probability of the event
$$\exists x \in \R^k,\ x > 0 \land Mx = 0, \tag{1}$$
which is equivalent to the origin belonging to the convex hull generated by the columns of $M$.

Let's take a moment to look at this problem in a few different ways. Ignoring the singular event that the origin is on the boundary of our convex hull, Farkas' lemma tells us that the unsatisfiability of $(1)$ is equivalent to the condition
$$\exists y \in \R^n,\ -M^T y > 0. \tag{2}$$
Geometrically, this would mean that all the columns of $M$ lie strictly within some half-space. We can also read this as the event that the column space of $-M^T$ contains a strictly positive element, or as the event that a system of $k$ strict linear inequalities over $\R^n$ admits a solution.

As it turns out, it is most straightforward to compute the probability of this last event—the existence of a solution for a random system of strict inequalities. In fact, we need to know remarkably little about the distribution of $M$. We will assume merely that the distribution of its columns $\lambda_i$ is invariant under negation of individual columns; that is, under maps of the form
$$(\lambda_1, \dots, \lambda_i, \dots, \lambda_k) \mapsto (\lambda_1, \dots, -\lambda_i, \dots, \lambda_k).$$

Consider the $k$ hyperplanes defined by the equalities $\langle \lambda_i, x \rangle = 0$. As we explained above, any set of $k$ generic hyperplanes cuts $\R^n$ into a certain number $F^n_k$ of cells. Let's condition on the value of the vectors $\lambda_i$ modulo their sign, which we'll call the variable $L$. The $F^n_k$ cells of $\R^n$ are now fixed, and our event space is discrete; its elements correspond to choices of directions for each inequality. We will prove that the probability of the event $A$ that our system admits a solution is constant when conditioned on $L$. In particular, this proves that the overall probability equals this constant, since $P(A) = E(P(A \mid L))$.

Given choices of directions, we can assign each cell to a boolean vector in $\{0, 1\}^k$ that records which inequalities hold over it. We'll write $1$ to mean an inequality holds and $0$ to mean it doesn't. We're interested in the probability that the assignment $S$ of cells to binary vectors assigns some cell to the all-ones vector $\mathbf{1} = (1, 1, \dots, 1)$.

Consider the group $\Z^k_2$ that acts by flipping the directions of our inequalities. An element $g \in \Z^k_2$ will act on $S$ by flipping bits in the binary vectors it assigns to each cell. The set of all possible assignments is clearly the orbit of any particular assignment $S_0$ under this action, and by the assumed invariance of our distribution, each of the assignments $g S_0$ is equally likely.

Now, notice that $g S_0$ contains $\mathbf{1}$ iff $S_0$ contains $g \mathbf{1}$. So, we're looking at the probability that the random binary vector $g \mathbf{1}$, which is uniformly distributed over $\{0, 1\}^k$, belongs to some fixed set $S_0$ with $F^n_k$ elements, which is simply $F^n_k / 2^k$. Combined with our computation of $F^n_k$ from above, we have proven the following.

Proposition: Suppose $M$ is an $n \times k$ random matrix with distribution invariant under negation of any individual column. Then there exists a vector $y \in \R^n$ verifying $M^T y < 0$ with probability
$$\frac{F^n_k}{2^k} = 2^{-(k - 1)} \sum_{j < n} \binom{k - 1}{j}.$$
On the other hand, the columns of $M$ generate the origin in convex combination with probability
$$1 - \frac{F^n_k}{2^k} = 2^{-(k - 1)} \sum_{j \ge n} \binom{k - 1}{j}.$$
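We can test the proposition numerically for a distribution it covers, for instance i.i.d. Gaussian columns (certainly invariant under negating a column). The sketch below decides condition $(1)$ with a feasibility LP, whether some convex combination of the columns equals $0$, assuming scipy is available:

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

rng = np.random.default_rng(2)

def contains_origin(M):
    """Decide condition (1) by LP feasibility: is there x >= 0 with
    sum(x) = 1 and M x = 0? (Boundary cases have probability 0.)"""
    n, k = M.shape
    A_eq = np.vstack([M, np.ones(k)])
    b_eq = np.append(np.zeros(n), 1.0)
    return linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                   bounds=(0, None)).status == 0

n, k, trials = 3, 5, 10_000
hits = sum(contains_origin(rng.standard_normal((n, k))) for _ in range(trials))
predicted = sum(comb(k - 1, j) for j in range(n, k)) / 2 ** (k - 1)
print(hits / trials, "vs", predicted)  # both near 5/16 = 0.3125
```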

Note that these probabilities have another natural interpretation. If we introduce a variable $S_{k - 1}$ giving the sum of $(k - 1)$ independent Bernoulli$(1/2)$ variables, we find that the convex hull of $k$ vectors in $\R^n$ contains the origin with probability $P(S_{k - 1} \ge n)$.

Before moving on, let's review how our reasoning above applies to the original problem of the four points. We were interested in the probability of the condition
$$\exists x \in \R^k,\ x > 0 \land Mx = 0 \tag{1}$$
for a matrix $M$ with $k = 4$ columns and $n = 2$ rows. According to our Farkas' lemma argument, this is equivalent (with probability $1$) to the non-existence of a solution to the system $-M^T y > 0$ of $4$ inequalities in $\R^2$. (This corresponds to the fact that the polygon spanned by our vectors does not contain the origin exactly when there exists some half-plane containing all of them.) Since $4$ lines cut $\R^2$ into $F^2_4 = 8$ pieces and there are $2^4 = 16$ possible binary vectors of length $4$, our invariance argument tells us that our polygon contains the origin with probability $1/2$.

In general, a distribution of our four points that is invariant under $\Z^4_2$ is a mixture of distributions like the one depicted below in the circle on the left.

We've also depicted the dual problem—the system of inequalities—on the right by drawing the half-space normal to each vector. Farkas' lemma says that our polygon on the left will contain the origin exactly when one of the cells on the right is not colored by any half-space.

It's interesting to keep in mind that the method we used to compute this probability is very different from the duality-type bijection we mentioned at the beginning of this post. Rather than transforming the event space as a whole, here we are fixing a particular way that the vectors $\lambda_i$ decompose $\R^n$ into cells and using an invariance that acts on the relationship between those particular cells and inequalities.

Fun Applications

Now, let's take a look at the probabilities we have computed for general $k$ and $n$. To mix things up, let's also look at condition $(1)$ another way: it is the event that $k$ randomly chosen hemispheres cover the entire unit sphere in $\R^n$. How many hemispheres do we need to paint at random until the whole sphere is painted?

Let $X$ be the number of hemispheres required. Using the artificial variable $S_{k - 1}$ that gives a sum of $k - 1$ Bernoulli variables, we have $P(X - 1 > k) = P(S_k < n)$. So, $X - 1$ is distributed like the number of Bernoulli variables we need to sum to reach $n$. We can also write this as a sum of geometrically distributed variables, so it's easy to compute its expected value: $E(X) = 1 + E(X - 1) = 1 + 2n$. This results in one of my all-time favorite mathematical "party facts": if we are trying to paint the whole surface of a sphere in $\R^3$ by painting one random hemisphere at a time, on average we need to paint $7$ hemispheres. "Why $7$?"
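This fact is also pleasant to verify by brute force. The sketch below (again leaning on scipy's LP solver; the helper names are mine) adds one random hemisphere at a time and records how many were needed:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

def covers_sphere(poles):
    """Hemispheres {x : <v, x> >= 0} cover the sphere iff the origin lies
    in the convex hull of their poles v, i.e. iff condition (1) holds."""
    V = np.array(poles).T  # n x k matrix of poles
    n, k = V.shape
    A_eq = np.vstack([V, np.ones(k)])
    b_eq = np.append(np.zeros(n), 1.0)
    return linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                   bounds=(0, None)).status == 0

def hemispheres_needed(n=3):
    poles = []
    while True:
        v = rng.standard_normal(n)
        poles.append(v / np.linalg.norm(v))
        # Fewer than n + 1 poles almost surely cannot work.
        if len(poles) > n and covers_sphere(poles):
            return len(poles)

samples = [hemispheres_needed() for _ in range(1000)]
print(np.mean(samples))  # close to 1 + 2 * 3 = 7
```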

Let's do the same kind of analysis for the dual condition
$$\exists y \in \R^n,\ -M^T y > 0, \tag{2}$$
which we will view as the event that $n$ random vectors in $\R^k$ admit a strictly positive linear combination. What can we say about how many random vectors we need to draw from $\R^k$ until we can get a positive linear combination?

Again, we use the variables $S_k$. We proved above that condition $(2)$ is satisfied with probability $P(S_{k - 1} < n)$. When $n \ge k$, this equals $1$. Indeed, it is obvious that $k$ vectors in $\R^k$ generate the whole space with probability $1$. Less obviously, we have at least even odds of having a positive linear combination exactly when $n > k/2 - 1/2$. When $k$ is even, $n = k/2$ will have exactly even odds. On the other hand, applying the law of large numbers to the variable $S_{k - 1}/(k - 1)$, we find that a random subspace of dimension $n > \left( \frac{1}{2} + \epsilon \right) k$ will contain a positive vector with overwhelming probability for large $k$. Informally, a large random matrix with dimensions $n \times k$ will have a positive element in its column space with probability close to $0$ if $k < n/2$ but probability close to $1$ if $k > n/2$.
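These threshold claims are quick to tabulate from the binomial formula; a small sketch (the function name is mine):

```python
from math import comb

def p_feasible(n, k):
    """P(S_{k-1} < n): probability that k random strict inequalities over
    R^n admit a solution (equivalently, that n random vectors in R^k
    admit a strictly positive linear combination)."""
    return sum(comb(k - 1, j) for j in range(n)) / 2 ** (k - 1)

k = 10
for n in range(1, k + 1):
    print(n, p_feasible(n, k))
# For even k the probability passes through exactly 1/2 at n = k/2,
# and for large k it jumps from near 0 to near 1 around that threshold.
```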

Finding Duality

Finally, let's return to the original mystery of the $4$ points. Our combinatorial proof above shows that, indeed, $2k$ points in $\R^k$ with distribution invariant under the action of $\Z^{2k}_2$ will generate the origin in convex combination with probability $1/2$. However, we wonder if there might be another way of proving this fact. Specifically, we would like to prove it in the same manner as the bridge puzzle—by finding a natural way to rearrange our event space that swaps our event with its complement.

One idea is to try looking at duality. Throughout our analysis above, we used the fact that the conditions
$$\exists x \in \R^k,\ x > 0 \land Mx = 0 \tag{1}$$
and
$$\exists y \in \R^n,\ -M^T y > 0 \tag{2}$$
are mutually exclusive with probability $1$. Geometrically, the satisfiability of one gives a convex certificate for the unsatisfiability of the other. Unfortunately for us, the two conditions here have an inherently different structure; unlike the graph of our earlier puzzle, our condition is not "self-dual."

On the other hand, consider the similar-looking problem
$$\exists x \in \R^k,\ x > 0 \land Mx > 0. \tag{3}$$
Excluding a probability $0$ event as usual, the convex certificate for infeasibility of this problem would be
$$\exists y \in \R^n,\ y > 0 \land -M^T y > 0. \tag{4}$$
If $M$ is a square matrix, so that $k = n$, this problem is just the condition above with $M$ replaced by $-M^T$. So, for distributions on $M$ invariant under this transformation, we have proved that the probability of condition $(3)$ is $1/2$.

Can we fit our problem into this format? Take a matrix $M$ with dimensions $n \times 2n$ and divide it into blocks $M = \begin{bmatrix} M_0 & M_1 \end{bmatrix}$. We're interested in knowing when there exists a positive vector $x = (x_0 \; x_1)$ so that
$$Mx = M_0 x_0 + M_1 x_1 = 0. \tag{5}$$
Since $M_0$ is invertible with probability $1$, we can multiply this condition on the left to reach the equivalent equation $-M_0^{-1} M_1 x_1 = x_0$. We will have positive vectors $(x_0, x_1)$ solving this equation exactly when $-M_0^{-1} M_1$ satisfies condition $(3)$—that is, its columns generate a positive vector in conic combination. Finally, we need $M_0^{-1} M_1$ to coincide in distribution with $-(M_0^{-1} M_1)^T$. Unfortunately, this is a bit of a tricky step! I see that it is true when $M_0$ and $M_1$ are independently distributed over the Haar measure on orthogonal matrices, but not when they are, say, Gaussian random matrices. (Write me an email if you know whether this is true!)

Fortunately, there is another way ahead. From another point of view, the existence of $x > 0$ satisfying Equation $(5)$ means that the interior of the cone $C_0$ generated by the columns of $M_0$ intersects the interior of the cone $C_1$ generated by the columns of $-M_1$. (Keep in mind that an element is in the interior of a cone generated by $n$ vectors in $\R^n$ exactly when it can be written as a positive conic combination of them.) Meanwhile, the convex certificate for the non-existence of such an intersection would be a hyperplane strictly separating $C_0$ from $C_1$, which is the same as an element simultaneously in the interiors of $C_0^\perp$ and $-C_1^\perp$, where
$$C^\perp = \{ x \in \R^n : \forall y \in C,\ \langle x, y \rangle \le 0 \}$$
denotes the dual cone.

Now, with probability $1$, the dual of a cone generated by $n$ vectors in $\R^n$ is also generated by $n$ vectors. So, let $M_0^\perp$ and $M_1^\perp$ be matrices whose columns generate the duals of the cones spanned by the columns of $M_0$ and $M_1$, respectively. Altogether, we have argued that the convex hull of the columns of $M = \begin{bmatrix} M_0 & M_1 \end{bmatrix}$ contains the origin exactly when that of $M' = \begin{bmatrix} M_0^\perp & -M_1^\perp \end{bmatrix}$ does not. (We described exactly this construction for $n = 2$ at the beginning of this post.) To finish the argument, we would just need to argue that the operation $(-)^\perp$ can be defined in a way that leaves a certain distribution over sets of vectors invariant. For example, this is true (and not too hard to prove) for sets of vectors selected independently at random from the unit sphere.
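This, too, can be checked numerically. For a cone generated by the columns of an invertible matrix $M$, the dual cone is generated by the columns of $-M^{-T}$, since $\langle v, Mx \rangle \le 0$ for all $x \ge 0$ exactly when $M^T v \le 0$. A sketch of the swap, reusing the LP test from before (helper names are mine):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)

def contains_origin(M):
    # Same LP feasibility test for condition (1) as before.
    n, k = M.shape
    A_eq = np.vstack([M, np.ones(k)])
    b_eq = np.append(np.zeros(n), 1.0)
    return linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                   bounds=(0, None)).status == 0

def dual_generators(M):
    """Generators of the dual cone of cone(M): since <v, Mx> <= 0 for all
    x >= 0 iff M^T v <= 0, the dual cone is {-M^{-T} u : u >= 0}."""
    return -np.linalg.inv(M).T

n, disagreements = 3, 0
for _ in range(5000):
    M0, M1 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    M = np.hstack([M0, M1])
    Mp = np.hstack([dual_generators(M0), -dual_generators(M1)])
    # Exactly one of M, M' should contain the origin in its columns' hull.
    if contains_origin(M) == contains_origin(Mp):
        disagreements += 1
print(disagreements)  # 0, apart from measure-zero boundary cases
```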

(My colleague Reynaldo Gil Pons from across the hall came up with this argument to solve a problem I had taped to my office door.)

Over all the topics we've dealt with in this post, two unresolved issues stand out to me. First, I wonder if it is possible to interpret the "artificial" binomial variable $S_{k - 1}$ in a sensible way. For example, is there a "reasonable" way to map ordered bases of $\R^n$ to sequences of $(n - 1)$ bits in such a way that each bit ends up being independently and uniformly distributed and so that, where $c$ is the number of non-zero bits produced, $c + 1$ is the number of vectors we need to take from the sequence in order to get a positive linear combination? Because the probabilities agree, there are at least "strange and unreasonable" ways of making this assignment. Are there, say, any ways that we can compute in practice?

Second, I wonder if the duality argument that we have described at the end of this post can be extended to match the combinatorics of our problem in the general case. (The operation that maps generators of a cone to generators of its normal cone does not preserve $\Z^k_2$-invariant distributions in general, so for the moment our duality argument appears to be totally unrelated to our combinatorial argument.) For a more concrete problem, consider how the symmetry of $S_{k - 1}$ implies
$$\begin{align*} P(S_{k - 1} < n) &= P(k - 1 - S_{k - 1} < n) \\ &= P(S_{k - 1} > (k - 1) - n) = 1 - P(S_{k - 1} < k - n). \end{align*}$$
Does there exist a corresponding "geometric" proof of the fact that $k$ strict linear inequalities in $\R^n$ have a solution with the same probability that $k$ strict linear inequalities in $\R^{k - n}$ do not have a solution?
