Consider the following graph.
If we remove each edge independently with probability $\frac{1}{2}$, what is the probability that there is still a path from the top to the bottom vertex?
Since the removal of each subset of edges is equally likely, this kind of question is equivalent to counting the subsets of edges that we can remove without disconnecting the top vertex from the bottom one. How easily we crunch through this enumeration depends on how we manage to decompose the problem.
However, suppose we learn that the probability of a path remaining in this particular graph is exactly $\frac{1}{2}$. Forget combinatorics—something strange is happening! In fact, it turns out that our special case can be solved with a "duality" trick, illustrated below. (Click the graph to generate a new random configuration.)
There will be a path between the top and bottom nodes exactly when there is no path between the left and the right nodes. Furthermore, the top-to-bottom and left-to-right graphs are the same and, once identified, have the same distribution of missing edges. In other words, we have found a rearrangement of the outcomes of our random process—the subsets of missing edges—that swaps the event that a path exists with the event that no path exists. Since this rearrangement preserves probability, it follows that each event must have probability $\frac{1}{2}$. (I learned this beautiful idea from this mathpages article.)
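We can check this directly on a small example. The interactive figure isn't reproduced here, so the sketch below assumes the classic five-edge bridge graph (a square with vertices top, left, right, bottom, plus a left-right rung), which has exactly the self-duality described above; if the post's figure differs, only the edge list changes.

```python
from itertools import product

# Hypothetical stand-in for the post's figure: the five-edge "bridge"
# graph, whose top-bottom connection problem is dual to the left-right
# connection problem on the same graph.
edges = [("T", "L"), ("T", "R"), ("L", "R"), ("L", "B"), ("R", "B")]

def connected(kept, source, target):
    """Search outward from `source` using only the `kept` edges."""
    seen, frontier = {source}, [source]
    while frontier:
        node = frontier.pop()
        for u, v in kept:
            for a, b in ((u, v), (v, u)):
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return target in seen

# Every subset of surviving edges is equally likely when each edge is
# removed independently with probability 1/2, so we just enumerate.
survive = sum(
    connected([e for e, keep in zip(edges, pattern) if keep], "T", "B")
    for pattern in product([0, 1], repeat=len(edges))
)
print(survive, "/", 2 ** len(edges))  # 16 / 32: probability exactly 1/2
```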
Let's take a look at another suspicious problem. Suppose we choose four points independently and uniformly at random on the unit circle. What is the probability that the origin lies in their convex hull? (Again, click for random configurations.)
As it turns out, the answer is again exactly $\frac{1}{2}$. If you're skeptical, you can press the "play" icon above to resample the experiment at high speed. (The black lines under the bar graph delineate the interval that the empirical proportion should stay within most of the time for a large number of samples.)
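For readers without the interactive widget, here is a minimal version of the same experiment. It uses the fact (valid for points on a circle) that the hull misses the origin exactly when all four points fit in an open semicircle, which we can read off from the largest angular gap:

```python
import math, random

def contains_origin(angles):
    """Circle points capture the origin iff they do NOT all fit in an
    open semicircle, i.e. iff every gap between adjacent points is < pi."""
    a = sorted(angles)
    gaps = [t - s for s, t in zip(a, a[1:])]
    gaps.append(2 * math.pi - (a[-1] - a[0]))  # wrap-around gap
    return max(gaps) < math.pi

trials = 200_000
hits = sum(contains_origin([random.uniform(0, 2 * math.pi) for _ in range(4)])
           for _ in range(trials))
print(hits / trials)  # fluctuates around 0.5
```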
In my experience, things don't happen with even odds unless they have a very good reason to, and that reason is usually a symmetry. For example, a coin will land heads as often as it lands tails because it is approximately symmetric, and there is no reason for the laws of physics to privilege one side of a symmetric coin so long as we have given it enough energy to make its initial condition approximately irrelevant.
Is there any appropriate symmetry in the problem of the four points? In such questions, it is important to have a degree of faith. Of course there is a symmetry; symmetry is everywhere! Unfortunately, symmetry does occasionally work in mysterious ways, and here it is not exactly obvious.
Given a pair of unit vectors $(u_1, u_2)$, define the function $f(u_1, u_2) = (v_1, v_2)$ to give the unique pair of unit vectors that satisfy the relations $$\langle v_1, u_1 \rangle = \langle v_2, u_2 \rangle = 0, \qquad \langle v_1, u_2 \rangle \leq 0, \qquad \langle v_2, u_1 \rangle \leq 0.$$ Now, consider the operation taking the four points $(x_1, x_2, x_3, x_4)$ to $(f(x_1, x_2), -f(x_3, x_4))$. We claim that this is measure-preserving for the distribution under which each point is independent and uniformly distributed. Furthermore, ignoring an event of probability $0$, we claim that the original four points generate a convex hull that contains the origin iff our four new points do not. So, our polygon contains the origin with the same probability that it does not, and each of these probabilities must be $\frac{1}{2}$. This observation solves the four-points problem in the same way that we solved the graph problem.
We can visualize this construction, although the picture is admittedly a little messy. In the diagram below, vectors with the same color are in the same quadruplet. (Note that which quadruplet is which does not matter, since the operation we have defined is its own inverse.) Meanwhile, the vectors drawn with solid lines are the pairs $(x_1, x_2)$ and $f(x_1, x_2)$, while the vectors drawn with dashed lines are the pairs $(x_3, x_4)$ and $-f(x_3, x_4)$.
Geometrically, each vector is a $90$-degree rotation of one of the vectors of the opposite color. For the vectors drawn with solid lines, these rotations turn them "away" from the pair of oppositely colored solid lines, while the dashed-line vectors are turned "towards" the oppositely colored pair. (After writing this post, I learned another geometric argument for this problem from Ian Agol on mathstodon.)
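Since the formulas defining $f$ are easy to get turned around, here is a quick numerical check of the claims above—that the operation $(x_1, x_2, x_3, x_4) \mapsto (f(x_1, x_2), -f(x_3, x_4))$ swaps the event with its complement. A sketch (the semicircle test is valid only because all points lie on the circle):

```python
import math, random

def f(u1, u2):
    """The pair (v1, v2) with v1 ⟂ u1, v2 ⟂ u2, <v1,u2> <= 0, <v2,u1> <= 0."""
    def perp_away(u, w):
        cand = (-u[1], u[0])  # u rotated by +90 degrees
        return cand if cand[0] * w[0] + cand[1] * w[1] <= 0 else (u[1], -u[0])
    return perp_away(u1, u2), perp_away(u2, u1)

def contains_origin(pts):
    a = sorted(math.atan2(y, x) for x, y in pts)
    gaps = [t - s for s, t in zip(a, a[1:])] + [2 * math.pi - (a[-1] - a[0])]
    return max(gaps) < math.pi  # no open semicircle holds all the points

random.seed(0)
for _ in range(20_000):
    x = [(math.cos(t), math.sin(t))
         for t in [random.uniform(0, 2 * math.pi) for _ in range(4)]]
    v1, v2 = f(x[0], x[1])
    w1, w2 = f(x[2], x[3])
    y = [v1, v2, (-w1[0], -w1[1]), (-w2[0], -w2[1])]
    assert contains_origin(x) != contains_origin(y)  # always swapped
```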
The remainder of this post is divided into three parts.
First, we explain how this kind of problem can be solved in other dimensions and for other numbers of points using a combinatorial argument.
Second, we investigate some particularly memorable consequences of this general solution.
Finally, we return to the problem of finding a "duality-like" rearrangement to explain the special fact that $2d$ points in $\mathbb{R}^d$ have the origin in their convex hull with probability $\frac{1}{2}$. We also make note of some remaining mathematical mysteries.
We begin with an apparently unrelated combinatorial problem: how many cells do $n$ linear hyperplanes cut $\mathbb{R}^d$ into? This is related to the classic lazy caterer problem, which asks how many pieces we can cut out of a pizza pie with a certain number of straight cuts.
The lazy caterer problem is solved by observing that each new cut increases the number of pieces of pizza by exactly $k + 1$, where $k$ is the number of existing cuts that our new cut intersects. In fact, any number of cuts can be made in such a way that each cut intersects all the ones made before, so the maximum number of pieces we can make with $n$ cuts is $1 + \sum_{i=1}^{n} i = 1 + \binom{n+1}{2}$. The case of linear hyperplanes cutting $\mathbb{R}^d$ can be analyzed similarly. It is only important to note that, while in the lazy caterer problem a new cut may or may not intersect a given existing cut, two non-equal linear hyperplanes always intersect, and so it will happen that $n$ generic hyperplanes always make the same number of cells in $\mathbb{R}^d$.
Let $C(n, d)$ be this number. The increase in cells cut out of $\mathbb{R}^d$ caused by the addition of one new hyperplane $H$ equals the number of cells that $H$ intersects. This is the same as the number of cells that the existing hyperplanes cut $H$ into. Since $H$ is $(d-1)$-dimensional, we have the recurrence $$C(n+1, d) = C(n, d) + C(n, d-1).$$ Especially in view of this recurrence, it makes sense to define $C(n, 0) = 0$ for all $n \geq 1$ and $C(0, d) = 1$ for all $d \geq 0$. (It's slightly odd to imagine a "hyperplane" cutting $\mathbb{R}^0$ into $0$ cells, but it's what's demanded by the recurrence relation we've observed.)
We can now compute $C(n, d)$ in general using a generating function. Let's put $$F_d(x) = \sum_{n \geq 0} C(n, d)\, x^n.$$ Then the recurrence above means that, as far as the non-constant terms of $F_d$ are concerned, $$F_d(x) \equiv x \left( F_d(x) + F_{d-1}(x) \right).$$ Keeping in mind that $C(0, d) = 1$ for every $d$, we find that $$F_d(x) = \frac{1 + x F_{d-1}(x)}{1 - x},$$ and in general, $$F_d(x) = \frac{x^d}{(1 - x)^d} + \sum_{k=0}^{d-1} \frac{x^k}{(1 - x)^{k+1}}.$$ Extracting the coefficient of $x^n$ term by term, we arrive at the following.
Proposition: Generically, $n \geq 1$ hyperplanes cut $\mathbb{R}^d$ into $$C(n, d) = 2 \sum_{k=0}^{d-1} \binom{n-1}{k}$$ cells.
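The recurrence and the closed form are easy to cross-check mechanically; a small sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def cells(n, d):
    """Number of cells that n generic linear hyperplanes cut R^d into."""
    if n == 0:
        return 1
    if d == 0:
        return 0  # the convention forced by the recurrence
    return cells(n - 1, d) + cells(n - 1, d - 1)

def closed_form(n, d):
    return 2 * sum(comb(n - 1, k) for k in range(d))

for n in range(1, 12):
    for d in range(1, 8):
        assert cells(n, d) == closed_form(n, d)
print(cells(4, 2))  # four lines through the origin make 8 sectors
```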
We now return to computing probabilities. Suppose we have $N$ points in $\mathbb{R}^d$ represented by the columns of a $d \times N$ matrix $A$. We are interested in the probability of the event $$(\mathrm{P}): \quad \exists\, \lambda \geq 0,\ \lambda \neq 0, \quad A \lambda = 0,$$ which is equivalent to the origin belonging to the convex hull generated by the columns of $A$.
Let's take a moment to look at this problem in a few different ways. Ignoring the singular event that the origin is on the boundary of our convex hull, Farkas' lemma tells us that the unsatisfiability of $(\mathrm{P})$ is equivalent to the condition $$(\mathrm{D}): \quad \exists\, y \in \mathbb{R}^d, \quad A^{\top} y > 0.$$ Geometrically, this would mean that all the columns of $A$ lie strictly within some half-space. We can also read $(\mathrm{D})$ as the event that the column space of $A^{\top}$ contains a strictly positive element, or as the event that a system of $N$ strict linear inequalities over $\mathbb{R}^d$ admits a solution.
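We can watch this alternative in action numerically: for a random matrix, exactly one of the two problems should be feasible (almost surely). The sketch below assumes SciPy's `linprog` as the feasibility oracle:

```python
import numpy as np
from scipy.optimize import linprog

def origin_in_hull(A):
    """(P): is 0 a convex combination of the columns of A?"""
    d, N = A.shape
    res = linprog(c=np.zeros(N),
                  A_eq=np.vstack([A, np.ones((1, N))]),
                  b_eq=np.append(np.zeros(d), 1.0),
                  bounds=(0, None))
    return res.status == 0

def halfspace_exists(A):
    """(D): is there y with A^T y > 0?  By rescaling y, same as A^T y >= 1."""
    res = linprog(c=np.zeros(A.shape[0]),
                  A_ub=-A.T, b_ub=-np.ones(A.shape[1]),
                  bounds=(None, None))
    return res.status == 0

rng = np.random.default_rng(0)
for _ in range(1000):
    A = rng.standard_normal((2, 4))
    assert origin_in_hull(A) != halfspace_exists(A)  # exactly one holds
```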
As it turns out, it is most straightforward to compute the probability of this last event—the existence of a solution for a random system of strict inequalities. In fact, we need to know remarkably little about the distribution of $A$. We will assume merely that the distribution of its columns is invariant under negation of individual columns; that is, under maps of the form $$(a_1, \ldots, a_i, \ldots, a_N) \mapsto (a_1, \ldots, -a_i, \ldots, a_N).$$ Consider the $N$ hyperplanes defined by the equalities $\langle a_i, y \rangle = 0$. As we explained above, any such set of generic hyperplanes cuts $\mathbb{R}^d$ into $C(N, d)$ cells. Let's condition on the value of the vectors $a_i$ modulo their sign, which we'll call the variable $\bar{A}$. The cells of $\mathbb{R}^d$ are now fixed, and our event space is discrete; its elements correspond to choices of directions for each inequality $\langle a_i, y \rangle > 0$. We will prove that the probability of the event that our system admits a solution is constant when conditioned on $\bar{A}$. In particular, this proves that the overall probability equals this constant, since $\Pr(\mathrm{D}) = \mathbb{E}\left[ \Pr(\mathrm{D} \mid \bar{A}) \right]$.

Given choices of directions, we can assign each cell to a boolean vector in $\{0, 1\}^N$ that records which inequalities hold over it. We'll write $1$ to mean an inequality holds and $0$ to mean it doesn't. We're interested in the probability that the assignment of cells to binary vectors assigns some cell to the all-ones vector $(1, 1, \ldots, 1)$.
Consider the group $\{\pm 1\}^N$ that acts by flipping the directions of our inequalities. An element $\sigma \in \{\pm 1\}^N$ will act on an assignment by flipping the corresponding bits in the binary vectors it assigns to each cell. The set of all possible assignments is clearly the orbit of any particular assignment under this action, and by our assumption of invariance of the distribution, each of the $2^N$ assignments is equally likely.
Now, notice that the assignment $\sigma \cdot \phi$ sends some cell to the all-ones vector iff $\phi$ sends some cell to $\sigma$ itself (viewing $\sigma$ as a binary vector). So, we're looking at the probability that a uniformly random binary vector belongs to some fixed set with $C(N, d)$ elements—the set of vectors realized by the cells—which is simply $C(N, d) / 2^N$. Combined with our computation of $C(N, d)$ from above, we have proven the following.
Proposition: Suppose $A$ is a $d \times N$ random matrix with distribution invariant under negation of any individual column. Then there exists a vector $y$ verifying $A^{\top} y > 0$ with probability $$2^{-N+1} \sum_{k=0}^{d-1} \binom{N-1}{k}.$$ On the other hand, the columns of $A$ generate the origin in convex combination with probability $$1 - 2^{-N+1} \sum_{k=0}^{d-1} \binom{N-1}{k}.$$
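As a sanity check on the proposition, we can estimate the convex hull probability by Monte Carlo for any column-sign-invariant distribution—iid Gaussian columns in this sketch, again assuming SciPy:

```python
import numpy as np
from math import comb
from scipy.optimize import linprog

def origin_in_hull(A):
    """LP feasibility: does some convex combination of A's columns hit 0?"""
    d, N = A.shape
    res = linprog(c=np.zeros(N),
                  A_eq=np.vstack([A, np.ones((1, N))]),
                  b_eq=np.append(np.zeros(d), 1.0),
                  bounds=(0, None))
    return res.status == 0

def wendel(N, d):
    return 1 - 2 ** (1 - N) * sum(comb(N - 1, k) for k in range(d))

rng = np.random.default_rng(42)
d, N, trials = 3, 5, 2000
hits = sum(origin_in_hull(rng.standard_normal((d, N))) for _ in range(trials))
print(hits / trials, "vs", wendel(N, d))  # both near 5/16 = 0.3125
```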
Note that these probabilities have another natural interpretation. If we introduce a variable $S_{N-1}$ giving the sum of $N - 1$ independent Bernoulli$\left(\frac{1}{2}\right)$ variables, we find that the convex hull of $N$ vectors in $\mathbb{R}^d$ contains the origin with probability $\Pr(S_{N-1} \geq d)$.

Before moving on, let's review how our reasoning above applies to the original problem of the four points. We were interested in the probability of condition $(\mathrm{P})$ for a matrix $A$ with $4$ columns and $2$ rows. According to our Farkas' lemma argument, this is equivalent (with probability $1$) to the non-existence of a solution to the system of inequalities $A^{\top} y > 0$ in $\mathbb{R}^2$. (This corresponds to the fact that the polygon spanned by our vectors does not contain the origin exactly when there exists some half-plane containing all of them.) Since $4$ lines cut $\mathbb{R}^2$ into $C(4, 2) = 8$ pieces and there are $2^4 = 16$ possible binary vectors of length $4$, our invariance argument tells us that our polygon contains the origin with probability $1 - \frac{8}{16} = \frac{1}{2}$.
In general, a distribution of our four points that is invariant under $\{\pm 1\}^4$ is a mixture of distributions like the one depicted below in the circle on the left.
We've also depicted the dual problem—the system of inequalities $A^{\top} y > 0$—on the right by drawing the half-space normal to each vector. Farkas' lemma says that our polygon on the left will contain the origin exactly when no cell on the right is covered by every half-space.
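The figure isn't reproduced here, but the counting behind it is easy to carry out in the plane: the four lines cut the unit circle into eight arcs, one per cell, and the sign patterns realized by arc midpoints form exactly the fixed set that our uniformly random direction vector must hit. A sketch:

```python
import math, random

N = 4
random.seed(1)
thetas = [random.uniform(0, math.pi) for _ in range(N)]  # lines mod sign
a = [(math.cos(t), math.sin(t)) for t in thetas]

# Each line <a_i, y> = 0 meets the unit circle where the angle is
# theta_i +- pi/2; these 2N points cut the circle into 2N arcs (= cells).
cuts = sorted((t + s * math.pi / 2) % (2 * math.pi)
              for t in thetas for s in (1, -1))
mids = [(u + v) / 2 for u, v in zip(cuts, cuts[1:])]
mids.append((cuts[-1] + cuts[0] + 2 * math.pi) / 2 % (2 * math.pi))

# Record the binary vector of inequalities <a_i, y> > 0 over each cell.
patterns = set()
for m in mids:
    y = (math.cos(m), math.sin(m))
    patterns.add(tuple(ax * y[0] + ay * y[1] > 0 for ax, ay in a))

print(len(patterns), "cells out of", 2 ** N, "sign patterns")  # 8 of 16
```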
It's interesting to keep in mind that the method we used to compute this probability is very different from the duality-type bijection we mentioned at the beginning of this post. Rather than transforming the event space as a whole, here we are fixing a particular way that the vectors cut $\mathbb{R}^d$ into cells and using an invariance that acts on the relationship between those particular cells and the inequalities.
Now, let's take a look at the probabilities we have computed for general $N$ and $d$. To mix things up, let's also look at condition $(\mathrm{P})$ another way: it is the event that $N$ randomly chosen hemispheres cover the entire unit sphere in $\mathbb{R}^d$. How many hemispheres do we need to paint at random until the whole sphere is painted?
Let $T$ be the number of hemispheres required. Using the artificial variable $S_{N-1}$ that gives a sum of $N - 1$ Bernoulli variables, we have $$\Pr(T \leq N) = \Pr(S_{N-1} \geq d).$$ So, $T - 1$ is distributed like the number of Bernoulli variables we need to sum to reach $d$. We can also write this as a sum of $d$ geometrically distributed variables, so it's easy to compute its expected value: $$\mathbb{E}[T] = 2d + 1.$$ This results in one of my all-time favorite mathematical "party facts": if we are trying to paint the whole surface of a sphere in $\mathbb{R}^3$ by painting one random hemisphere at a time, on average we need to paint $7$ hemispheres. "Why $7$??"
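A simulation makes the party fact concrete; using the hull criterion from before (and SciPy again), we draw hemispheres one at a time until the sphere is covered:

```python
import numpy as np
from scipy.optimize import linprog

def covers_sphere(A):
    """Hemispheres {x : <a_i, x> >= 0} cover the sphere exactly when the
    origin is in the convex hull of the a_i (up to probability-0 events)."""
    d, n = A.shape
    res = linprog(c=np.zeros(n),
                  A_eq=np.vstack([A, np.ones((1, n))]),
                  b_eq=np.append(np.zeros(d), 1.0),
                  bounds=(0, None))
    return res.status == 0

rng = np.random.default_rng(7)
d, trials, total = 3, 400, 0
for _ in range(trials):
    pts = rng.standard_normal((d, 1))  # Gaussian directions are uniform
    while not covers_sphere(pts):
        pts = np.hstack([pts, rng.standard_normal((d, 1))])
    total += pts.shape[1]
print(total / trials)  # approaches E[T] = 2d + 1 = 7
```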
Let's do the same kind of analysis for the dual condition $(\mathrm{D})$, which we will view as the event that $d$ random vectors in $\mathbb{R}^N$ admit a strictly positive linear combination. What can we say about how many random vectors we need to draw from $\mathbb{R}^N$ until we can get a positive linear combination?
Again, we use the variables $S_n$. We proved above that condition $(\mathrm{D})$ is satisfied with probability $\Pr(S_{N-1} \leq d - 1)$. When $d = N$, this equals $1$. Indeed, it is obvious that $N$ generic vectors in $\mathbb{R}^N$ generate the whole space with probability $1$. Less obviously, we have better than even odds of having a positive linear combination exactly when $2d > N$. When $N = 2d$ is even, we will have exactly even odds. On the other hand, applying the law of large numbers to the variable $S_{N-1}$, we find that a subspace of dimension $d \geq \alpha N$ with $\alpha > \frac{1}{2}$ will have overwhelming probability to contain a positive vector for large $N$. Informally, a large random matrix with dimensions $N \times d$ will have a positive element in its column space with probability close to $0$ if $\frac{d}{N} < \frac{1}{2}$ but probability close to 1 if $\frac{d}{N} > \frac{1}{2}$.
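The exact formula makes this sharp threshold at $\frac{d}{N} = \frac{1}{2}$ visible directly:

```python
from math import comb

def p_solvable(N, d):
    """P(N strict linear inequalities in R^d have a solution)."""
    return sum(comb(N - 1, k) for k in range(d)) / 2 ** (N - 1)

N = 100
for d in (30, 45, 50, 55, 70):
    print(f"d/N = {d / N:.2f}: {p_solvable(N, d):.6f}")
# The probability jumps from ~0 to ~1 across d/N = 1/2, and equals
# exactly 1/2 at N = 2d.
```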
Finally, let's return to the original mystery of the four points. Our combinatorial proof above shows that, indeed, $2d$ points in $\mathbb{R}^d$ with distribution invariant under the action of $\{\pm 1\}^{2d}$ will generate the origin in convex combination with probability $\frac{1}{2}$. However, we wonder if there might be another way of proving this fact. Specifically, we would like to prove it in the same manner as the bridge puzzle—by finding a natural way to rearrange our event space that swaps our event with its complement.
One idea is to try looking at duality. Throughout our analysis above, we used the fact that the conditions $(\mathrm{P})$ and $(\mathrm{D})$ are mutually exclusive with probability $1$. Geometrically, the satisfiability of one gives a convex certificate for the unsatisfiability of the other. Unfortunately for us, the two conditions here have an inherently different structure; unlike the graph of our earlier puzzle, our condition is not "self-dual."
On the other hand, consider the similar-looking problem $$(\mathrm{Q}): \quad \exists\, \lambda \geq 0, \quad A \lambda > 0.$$ Excluding a probability $0$ event as usual, the convex certificate for the infeasibility of this problem would be $$\exists\, y \geq 0, \quad (-A^{\top})\, y > 0.$$ If $A$ is a square matrix, so that $-A^{\top}$ has the same dimensions as $A$, this certificate is just the condition $(\mathrm{Q})$ above with $A$ replaced by $-A^{\top}$. So, for distributions on square matrices invariant under the transformation $A \mapsto -A^{\top}$, we have proved that the probability of condition $(\mathrm{Q})$ is $\frac{1}{2}$.
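The iid Gaussian square matrix is one distribution that is invariant under $A \mapsto -A^{\top}$, so we can watch the even odds appear numerically (a sketch assuming SciPy):

```python
import numpy as np
from scipy.optimize import linprog

def cone_hits_positive_orthant(A):
    """Condition (Q): is there lambda >= 0 with A @ lambda > 0?
    By rescaling lambda, equivalent to A @ lambda >= 1, lambda >= 0."""
    res = linprog(c=np.zeros(A.shape[1]),
                  A_ub=-A, b_ub=-np.ones(A.shape[0]),
                  bounds=(0, None))
    return res.status == 0

rng = np.random.default_rng(3)
d, trials = 4, 4000
hits = sum(cone_hits_positive_orthant(rng.standard_normal((d, d)))
           for _ in range(trials))
print(hits / trials)  # near 1/2: iid Gaussian matches -A^T in distribution
```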
Can we fit our problem into this format? Given a matrix $A$ with dimensions $d \times 2d$, let's divide it into square blocks $A = [A_1 \;\; A_2]$. We're interested in knowing when there exists a positive vector $(\lambda_1, \lambda_2)$ so that $$A_1 \lambda_1 + A_2 \lambda_2 = 0.$$ Since $A_1$ is invertible with probability $1$, we can multiply this condition on the left by $-A_1^{-1}$ to reach the equivalent equation $$\lambda_1 = \left(-A_1^{-1} A_2\right) \lambda_2.$$ We will have positive vectors solving this equation exactly when $-A_1^{-1} A_2$ satisfies condition $(\mathrm{Q})$—that is, its columns generate a positive vector in conic combination. Finally, we need the distribution of $-A_1^{-1} A_2$ to be invariant under the transformation $M \mapsto -M^{\top}$. Unfortunately, this is a bit of a tricky step! I see that it is true when $A_1$ and $A_2$ are independently distributed over the Haar measure on orthogonal matrices, but not when they are, say, Gaussian random matrices. (Write me an email if you know more about when this is true!)
Fortunately, there is another way ahead. From another point of view, the existence of $(\lambda_1, \lambda_2)$ satisfying the equation above means that the interior of the cone generated by the columns of $A_1$ intersects the interior of the cone generated by the columns of $-A_2$. (Keep in mind that an element is in the interior of a cone generated by $d$ generic vectors in $\mathbb{R}^d$ exactly when it can be written as a conic combination of them with strictly positive coefficients.) Meanwhile, the convex certificate for the non-existence of such an intersection would be a hyperplane strictly separating $\operatorname{cone}(A_1)$ from $\operatorname{cone}(-A_2)$, which is the same as an element simultaneously in the interiors of $\operatorname{cone}(A_1)^*$ and $-\operatorname{cone}(-A_2)^*$, where $K^*$ denotes the dual cone of $K$. Now, with probability $1$, the dual of a cone generated by $d$ vectors in $\mathbb{R}^d$ is also generated by $d$ vectors. So, let $B_1$ and $B_2$ be matrices whose columns give us such generators for $\operatorname{cone}(A_1)^*$ and $\operatorname{cone}(-A_2)^*$. Altogether we have argued that the convex hull of the columns of the matrix $[A_1 \;\; A_2]$ contains the origin exactly when that of $[B_1 \;\; B_2]$ does not. (We described exactly this construction for $d = 2$ at the beginning of this post.) To finish the argument, we would just need to argue that the operation $[A_1 \;\; A_2] \mapsto [B_1 \;\; B_2]$ can be defined in a way that leaves a certain distribution over sets of $2d$ vectors invariant. For example, this is true (and not too hard to prove) for sets of vectors selected independently at random from the unit sphere.
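For generic square blocks, the dual cone has explicit generators—$\operatorname{cone}(M)^* = \operatorname{cone}(M^{-\top})$ when $M$ is invertible—so the whole involution can be tested numerically. The block formulas $B_1 = A_1^{-\top}$ and $B_2 = -A_2^{-\top}$ below are my reading of the construction above (a sketch assuming SciPy):

```python
import numpy as np
from scipy.optimize import linprog

def origin_in_hull(A):
    d, N = A.shape
    res = linprog(c=np.zeros(N),
                  A_eq=np.vstack([A, np.ones((1, N))]),
                  b_eq=np.append(np.zeros(d), 1.0),
                  bounds=(0, None))
    return res.status == 0

rng = np.random.default_rng(11)
d, trials, swaps = 3, 1000, 0
for _ in range(trials):
    A1, A2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    B1 = np.linalg.inv(A1).T   # generators of cone(A1)^*
    B2 = -np.linalg.inv(A2).T  # generators of cone(-A2)^*
    swaps += (origin_in_hull(np.hstack([A1, A2]))
              != origin_in_hull(np.hstack([B1, B2])))
print(swaps / trials)  # prints 1.0 (up to numerical hiccups): always swapped
```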
(My colleague Reynaldo Gil Pons from across the hall came up with this argument to solve a problem I had taped to my office door.)
Over all the topics we've dealt with in this post, two unresolved issues stand out to me. First, I wonder if it is possible to interpret the "artificial" binomial variable $S_{N-1}$ in a sensible way. For example, is there a "reasonable" way to map ordered bases of $\mathbb{R}^N$ to sequences of $N - 1$ bits in such a way that each bit ends up being independently and uniformly distributed and so that, where $k$ is the number of non-zero bits produced, $k + 1$ is the number of vectors we need to take from the sequence in order to get a positive linear combination? Because the probabilities agree, there are at least some "strange and unreasonable" ways of making this assignment. Are there, say, any that we can compute in practice?
Second, I wonder if the duality argument that we have described at the end of this post can be extended to match the combinatorics of our problem in the general case. (The operation that maps generators of a cone to generators of its normal cone does not preserve sign-invariant distributions in general, so for the moment our duality argument appears to be totally unrelated to our combinatorial argument.) For a more concrete problem, consider how the symmetry of the binomial distribution implies $$\Pr(S_{N-1} \leq d - 1) = \Pr(S_{N-1} \geq N - d).$$ Does there exist a corresponding "geometric" proof of the fact that $N$ strict linear inequalities in $\mathbb{R}^d$ have a solution with the same probability that $N$ strict linear inequalities in $\mathbb{R}^{N-d}$ do not have a solution?