An Integrability Condition
In my post last week on the policy gradient theorem, I mentioned that the matrix inverse has a nice derivative: writing $N(A) = A^{-1}$ and $dN(A, B) = \frac{d}{dt}\big|_{t=0} N(A + tB)$, we have
$$dN(A, B) = -N(A)\, B\, N(A).$$
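As a quick sanity check (a numerical aside of my own, assuming NumPy), a central finite difference in $t$ reproduces this formula:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 5 * np.eye(4)  # shift the spectrum so A is comfortably invertible
B = rng.standard_normal((4, 4))

N = np.linalg.inv
eps = 1e-6
finite_diff = (N(A + eps * B) - N(A - eps * B)) / (2 * eps)  # central difference in t at t = 0
closed_form = -N(A) @ B @ N(A)
print(np.allclose(finite_diff, closed_form, atol=1e-6))  # True
```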
After writing this, I wondered what other differential equations of this sort have solutions. For a simple example, consider
$$dN(A, B) = N(A)\, B.$$
Setting $f(t) = N(tA)$ and differentiating in $t$ gives
$$\frac{d}{dt} f(t) = dN(tA, A) = N(tA)\, A = f(t)\, A,$$
which implies $N(A) = f(1) = Ce^{A}$ for a choice of initial condition $f(0) = C$. However, we know that
$$\frac{d}{dt}\Big|_{t=0} e^{A + tB} \neq e^{A} B$$
in general for non-commutative $A$ and $B$. In fact, the only function $N$ solving this equation for $n \times n$ matrices with $n > 1$ turns out to be $N = 0$. How can we tell in advance when such a differential equation will have non-trivial solutions?
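Before moving on, here is a quick numerical check (my own, assuming SciPy's `expm`) of the claim that the derivative of $e^{A+tB}$ is not $e^A B$ once $A$ and $B$ stop commuting:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

eps = 1e-6
derivative = (expm(A + eps * B) - expm(A - eps * B)) / (2 * eps)  # d/dt e^{A+tB} at t = 0
print(np.allclose(derivative, expm(A) @ B))            # False for generic A, B
print(np.allclose((expm(A + eps * A) - expm(A - eps * A)) / (2 * eps),
                  expm(A) @ A))                         # True: here "B" = A commutes with A
```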
Let's consider this problem from the point of view of differential geometry. Let $f \colon \mathbb{R}^n \to \mathbb{R}^m$ be a smooth map and let $df$ be its total differential. We are curious to know when the equation
$$df(x) = g(f(x), x)$$
admits local solutions. Clearly, under nice regularity assumptions about $g$ (which we will make use of liberally in what follows), solutions to this equation are unique over path-connected domains, should they exist. Indeed, taking some path $\gamma(t)$, the equation above gives us enough information to compute
$$\frac{d}{dt} f(\gamma(t)) = g\big(f(\gamma(t)), \gamma(t)\big)\big(\dot\gamma(t)\big).$$
Thus, subject to a choice for the value of f at some point along γ, f(γ(t)) is determined as a unique solution to this ODE. On the other hand, it may happen that the values of f we obtain by fixing its value at some point and solving ODEs along different paths are not path-independent, in which case our equation won't have a solution.
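Here is a small numerical illustration of this path dependence (a toy example of my own, assuming SciPy): the equation $df = x_1\, dx_2$ is not integrable, while $df = x_2\, dx_1 + x_1\, dx_2 = d(x_1 x_2)$ is, and solving the ODE above along two different paths from $(0,0)$ to $(1,1)$ detects the difference.

```python
import numpy as np
from scipy.integrate import solve_ivp

def integrate_along(g, gamma, dgamma, f0):
    # solve p'(t) = g(p(t), gamma(t)) . gamma'(t) on [0, 1] and return p(1)
    rhs = lambda t, p: g(p, gamma(t)) @ dgamma(t)
    return solve_ivp(rhs, (0.0, 1.0), [f0], rtol=1e-9, atol=1e-12).y[0, -1]

# my own toy examples, not from the post:
g_dep = lambda p, x: np.array([[0.0, x[0]]])    # df = x1 dx2 (not integrable)
g_ind = lambda p, x: np.array([[x[1], x[0]]])   # df = x2 dx1 + x1 dx2 = d(x1 x2)

# two paths from (0, 0) to (1, 1): along x1 then x2, and along x2 then x1
path_a  = lambda t: np.array([min(2 * t, 1.0), max(2 * t - 1.0, 0.0)])
dpath_a = lambda t: np.array([2.0, 0.0]) if t < 0.5 else np.array([0.0, 2.0])
path_b  = lambda t: np.array([max(2 * t - 1.0, 0.0), min(2 * t, 1.0)])
dpath_b = lambda t: np.array([0.0, 2.0]) if t < 0.5 else np.array([2.0, 0.0])

print(integrate_along(g_dep, path_a, dpath_a, 0.0),
      integrate_along(g_dep, path_b, dpath_b, 0.0))   # ~1.0 vs ~0.0: path-dependent
print(integrate_along(g_ind, path_a, dpath_a, 0.0),
      integrate_along(g_ind, path_b, dpath_b, 0.0))   # both ~1.0: path-independent
```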
Intuition from differential geometry tells us that path-independence of this integral over a simply connected domain will come down to a system of equations involving the first partial derivatives of $g$. The nicest way to figure out exactly what these are is to use the Frobenius theorem. Let us introduce coordinates $(x_1, \dots, x_n, f_1, \dots, f_m)$ on $\mathbb{R}^n \times \mathbb{R}^m$ and, writing $g_{a,i}$ for the entries of the matrix $g$ (so that our equation reads $df_a = g_{a,i}\, dx_i$), define the vector fields
$$X_i = \frac{\partial}{\partial x_i} + g_{a,i}\, \frac{\partial}{\partial f_a}.$$
It is fairly clear that a (local) solution to our equation is the same thing as an integral submanifold (the graph of $f$) of the distribution spanned by $\{X_1, \dots, X_n\}$. Furthermore, involutivity of our distribution in this case boils down to the equations
$$[X_i, X_j] = 0,$$
for the simple reason that $[X_i, X_j]$ is, at each point, a linear combination of the tangent vectors $\partial/\partial f_i$, and our distribution contains no elements of this form except $0$.
These brackets are readily computed, taking a bit of care with indices of summation:
$$\begin{aligned}
[X_i, X_j] &= \left[\frac{\partial}{\partial x_i} + g_{a,i}\,\frac{\partial}{\partial f_a},\; \frac{\partial}{\partial x_j} + g_{b,j}\,\frac{\partial}{\partial f_b}\right] \\
&= \left[\frac{\partial}{\partial x_i},\, g_{b,j}\,\frac{\partial}{\partial f_b}\right] - \left[\frac{\partial}{\partial x_j},\, g_{a,i}\,\frac{\partial}{\partial f_a}\right] + \left[g_{a,i}\,\frac{\partial}{\partial f_a},\, g_{b,j}\,\frac{\partial}{\partial f_b}\right] \\
&= \left(\frac{\partial g_{k,j}}{\partial x_i} - \frac{\partial g_{k,i}}{\partial x_j} + g_{a,i}\,\frac{\partial g_{k,j}}{\partial f_a} - g_{a,j}\,\frac{\partial g_{k,i}}{\partial f_a}\right) \frac{\partial}{\partial f_k}.
\end{aligned}$$
From this, we can (perhaps in a future post) understand something about which matrix operator differential equations can be integrated.
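For instance, here is a quick check of my own (using the index conventions above, with implicit summation) that the matrix-inverse equation from the opening passes this test. Take the entries $A_{pq}$ as the $x$-coordinates and the entries $F_{rs}$ of $F = N(A)$ as the $f$-coordinates. The equation $dN(A, B) = -N(A)\, B\, N(A)$ then reads
$$\frac{\partial F_{rs}}{\partial A_{pq}} = -F_{rp} F_{q s}, \qquad\text{i.e.}\qquad g_{(rs),(pq)}(F, A) = -F_{rp} F_{q s},$$
which does not depend on $A$, so the $\partial/\partial x$ terms in the bracket coefficients vanish. For the remaining terms (summing over the pair $(a,b)$),
$$g_{(ab),(pq)}\, \frac{\partial g_{(rs),(p'q')}}{\partial F_{ab}} = \big({-F_{ap} F_{qb}}\big)\big({-\delta_{ra}\delta_{p'b} F_{q's} - F_{rp'}\delta_{q'a}\delta_{sb}}\big) = F_{rp} F_{qp'} F_{q's} + F_{rp'} F_{q'p} F_{qs},$$
which is symmetric under swapping $(p,q) \leftrightarrow (p',q')$, so the two $f$-derivative terms cancel and the brackets $[X_i, X_j]$ vanish, as they must for an equation we know has solutions.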
This result is actually an exercise in Lee's Introduction to Smooth Manifolds in the chapter on the Frobenius theorem. However, if we forget to use the Frobenius theorem—as I did, when I considered this question a few days ago—we can also discover the utility of Lie brackets for ourselves.
Suppose for simplicity that f is real-valued, and consider an equation of the simpler type
$$df(x) = g(x) = g_i(x)\, dx_i.$$
(Here, g is a differential form.) If a solution exists, we can recover it by integrating g over paths. Furthermore, path-independence of our integral is the same as saying that it vanishes over loops.
When do our loop integrals vanish? Stokes' theorem gives the answer: when our domain is simply connected, every loop is the boundary of a disk, a disk can be partitioned into little tiny subregions, and integrals over big loops are sums of many integrals over little tiny loops. So for our equation to be integrable, we just have to check that the 2-form telling us the integral of $g$ over little tiny loops (its exterior derivative) vanishes.
Let's recall the usual way these little-tiny-loop integrals are computed. Working with our form $g$ on $\mathbb{R}^n$ near the origin, integrate over the loops $\square_{i,j}(\epsilon)$ traversing the points
$$\big(0,\ \epsilon e_i,\ \epsilon(e_i + e_j),\ \epsilon e_j,\ 0\big)$$
for two basis vectors $e_i$ and $e_j$. We have
$$\int_{\square_{i,j}(\epsilon)} g = \int_0^\epsilon \Big( g_i(t e_i) + g_j(\epsilon e_i + t e_j) - g_i(t e_i + \epsilon e_j) - g_j(t e_j) \Big)\, dt.$$
By making substitutions of the form
$$g_i(x + \epsilon e_j) - g_i(x) = \int_0^\epsilon \frac{\partial g_i}{\partial x_j}(x + s e_j)\, ds$$
within the integral, we get the Stokes-type formula
$$\int_{\square_{i,j}(\epsilon)} g = \int_0^\epsilon \int_0^\epsilon \left( \frac{\partial g_j}{\partial x_i}(t e_j + s e_i) - \frac{\partial g_i}{\partial x_j}(t e_i + s e_j) \right) ds\, dt = \int_0^\epsilon \int_0^\epsilon \left( \frac{\partial g_j}{\partial x_i} - \frac{\partial g_i}{\partial x_j} \right)(t e_i + s e_j)\, ds\, dt.$$
In particular, the approximation for small ϵ is
$$\int_{\square_{i,j}(\epsilon)} g = \epsilon^2 \left( \frac{\partial g_j}{\partial x_i} - \frac{\partial g_i}{\partial x_j} \right)(0) + O(\epsilon^3).$$
The scalars $\partial g_j/\partial x_i - \partial g_i/\partial x_j$ are exactly the coefficients of the exterior derivative $dg$.
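To make this concrete, here is a numerical version of the little-loop computation (a toy $g$ of my own, assuming NumPy and SciPy): the loop integral divided by $\epsilon^2$ approaches the coefficient of $dg$ at the origin.

```python
import numpy as np
from scipy.integrate import quad

g1 = lambda x: -x[1] + 0.3 * x[0] ** 2   # my toy 1-form g = g1 dx1 + g2 dx2
g2 = lambda x: x[0] * np.exp(x[1])

def loop_integral(eps):
    # integrate g around the square loop (0, eps*e1, eps*(e1+e2), eps*e2, 0)
    e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    edges = [
        (lambda t: g1(t * e1), 1.0),              # bottom edge, along +x1
        (lambda t: g2(eps * e1 + t * e2), 1.0),   # right edge, along +x2
        (lambda t: g1(t * e1 + eps * e2), -1.0),  # top edge, traversed backwards
        (lambda t: g2(t * e2), -1.0),             # left edge, traversed backwards
    ]
    return sum(sign * quad(edge, 0.0, eps)[0] for edge, sign in edges)

eps = 1e-3
curl_at_0 = 1.0 - (-1.0)   # dg2/dx1 - dg1/dx2 at the origin for this g
print(loop_integral(eps) / eps**2, curl_at_0)  # both approximately 2
```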
Can we do the same thing for a system of equations
$$df_i(x) = g_i(f(x), x) = g_{i,j}(f(x), x)\, dx_j\,?$$
Path integrals of a sort can still be defined; for any path $\gamma$ parameterized on $[0,1]$ and initial value $f_0 \in \mathbb{R}^m$, define $G_\gamma(f_0) \in \mathbb{R}^m$ to equal $p(1)$, where $p \colon [0,1] \to \mathbb{R}^m$ solves the ODE
$$\begin{cases} p(0) = f_0, \\ \dot p_i(t) = g_{i,j}\big(p(t), \gamma(t)\big)\, \dot\gamma_j(t). \end{cases}$$
Figuring out a Stokes-type theorem in this situation will be considerably more confusing, since the path integral around an infinitesimal loop is now a map on possible values of $f$ (a vector field on the target $\mathbb{R}^m$) rather than a number. Nevertheless, we have the inkling that taking the limit
$$\lim_{\epsilon \to 0} \frac{1}{\epsilon^2} \Big( G_{\square_{i,j}(\epsilon)}(v) - v \Big)$$
should give us some functions playing a role similar to the one played above by the coefficients of the exterior derivative. Unfortunately, computing this limit seems pretty hopeless without either divine intuition or hard work.
One easy way out is to forget about path integrals and instead use the symmetry of partial derivatives
$$\frac{\partial^2 f_k}{\partial x_i\, \partial x_j} = \frac{\partial^2 f_k}{\partial x_j\, \partial x_i}$$
of a prospective solution $f$. Taking partial derivatives with respect to $x_i$ of
$$\frac{\partial f_k}{\partial x_j} = g_{k,j}(f(x), x)$$
gives
$$\frac{\partial^2 f_k}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_i}\Big[ g_{k,j}(f(x), x) \Big] = \frac{\partial g_{k,j}}{\partial x_i} + g_{a,i}\, \frac{\partial g_{k,j}}{\partial f_a},$$
from which we conclude that
$$\frac{\partial g_{k,j}}{\partial x_i} + g_{a,i}\, \frac{\partial g_{k,j}}{\partial f_a} - \frac{\partial g_{k,i}}{\partial x_j} - g_{a,j}\, \frac{\partial g_{k,i}}{\partial f_a} = 0.$$
Actually, these functions are exactly what we would find by computing the small-$\epsilon$ approximation to $G_{\square_{i,j}(\epsilon)}(v)$, as we will see next. The "divine intuition" we will need is to relate the flows of vector fields to the Lie bracket.
One of the many fun ways you get Lie brackets to show up is by developing a commutator product of formal exponential series in two non-commuting variables X and Y:
$$e^{\epsilon X} e^{\epsilon Y} e^{-\epsilon X} e^{-\epsilon Y} = I + [X, Y]\, \epsilon^2 + O(\epsilon^3).$$
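For matrices, this expansion is easy to test numerically (a small check of my own, assuming SciPy's `expm`):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

eps = 1e-4
lhs = expm(eps * X) @ expm(eps * Y) @ expm(-eps * X) @ expm(-eps * Y)
bracket = X @ Y - Y @ X
print(np.linalg.norm(lhs - np.eye(3)))                      # O(eps^2): the commutator term
print(np.linalg.norm(lhs - np.eye(3) - eps**2 * bracket))   # O(eps^3): what remains
```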
The operator sending vector fields to their flows can also be dealt with formally as an exponential map. Specifically, when $\phi^X_\epsilon(p)$ denotes the image of $p \in \mathbb{R}^n$ under the flow of $X$ for time $\epsilon$, the estimate
$$\phi^Y_{-\epsilon} \circ \phi^X_{-\epsilon} \circ \phi^Y_{\epsilon} \circ \phi^X_{\epsilon}(p) = p + \epsilon^2\, [X, Y](p) + O(\epsilon^3)$$
can be proven from the formal calculation above by looking at things from the right angle. The key observation is that, if p(t) is an integral curve of X, then for any smooth function f we have
$$\frac{d}{dt} f(p(t)) = df\big(X(p(t))\big) = X(f)(p(t)).$$
Iterating then gives
$$\left.\frac{d^k}{dt^k}\right|_{t=0} f(p(t)) = X^k(f)(p(0)).$$
Using the suggestive notation $e^{\epsilon X}(f) = f \circ \phi^X_\epsilon$, we conclude that
$$\sum_{k=0}^{\infty} \frac{\epsilon^k}{k!}\, X^k(f)$$
gives the right power series for $e^{\epsilon X}$. More precisely, we mean that evaluating this series at any point $p$ encodes the higher derivatives of
$$e^{\epsilon X}(f)(p) = f\big(\phi^X_\epsilon(p)\big)$$
with respect to ϵ. Furthermore, a bit of thought reveals that this correspondence from families of smooth maps to formal power series of differential operators (really, asymptotic series) is an antihomomorphism for composition. This curious observation simplifies the derivation of many series approximations involving compositions of flows; for example,
$$f\big(\phi^X_\epsilon(\phi^Y_\epsilon(p))\big) = \big(e^{\epsilon Y} e^{\epsilon X}\big)(f)(p) = \left(1 + (X + Y)\,\epsilon + \left(\frac{X^2 + Y^2}{2} + YX\right)\epsilon^2 + \cdots\right)(f)(p).$$
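As a sanity check of my own on a concrete pair of fields: take $X = \partial/\partial x$ and $Y = x\,\partial/\partial y$ on $\mathbb{R}^2$, whose flows can be written down exactly, and compare the composed flow against the operator series.

```python
eps = 0.1
x0, y0 = 0.7, -0.3
f = lambda x, y: x * y

# exact flows: phi^X_t(x, y) = (x + t, y) for X = d/dx,
#              phi^Y_t(x, y) = (x, y + t*x) for Y = x d/dy
lhs = f(x0 + eps, y0 + eps * x0)            # f(phi^X_eps(phi^Y_eps(p)))

# operator series: X f = y, Y f = x^2, X^2 f = Y^2 f = 0, (YX) f = x,
# so (e^{eps Y} e^{eps X})(f) = f + eps*(y + x^2) + eps^2 * x  (all higher terms vanish here)
series = f(x0, y0) + eps * (y0 + x0**2) + eps**2 * x0
print(lhs - series)  # 0 up to floating-point rounding
```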
With this result in hand, let's return to our equation
$$df_i(x) = g_i(f(x), x) = g_{i,j}(f(x), x)\, dx_j.$$
The vector fields
$$X_i = \frac{\partial}{\partial x_i} + g_{a,i}\, \frac{\partial}{\partial f_a}$$
that we defined above have another use: their flows give "path integrals" along coordinate axes. In particular, writing $v$ for the projection of $\mathbb{R}^n \times \mathbb{R}^m$ onto the $f$-coordinates,
$$G_{\square_{i,j}(\epsilon)}(v_0) = v\Big( \phi^{X_j}_{-\epsilon} \circ \phi^{X_i}_{-\epsilon} \circ \phi^{X_j}_{\epsilon} \circ \phi^{X_i}_{\epsilon}(0, v_0) \Big) = v_0 + \epsilon^2\, v\big([X_i, X_j](0, v_0)\big) + O(\epsilon^3).$$
For me, this is a nice way to see why Lie brackets express the integrability conditions of our differential equation so well.
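To close with a concrete example (my own, not from the post): take $n = 2$, $m = 1$, and the non-integrable equation $df = x_1\, dx_2$, i.e. $g_{1,1} = 0$ and $g_{1,2} = x_1$. Then $X_1 = \partial/\partial x_1$ and $X_2 = \partial/\partial x_2 + x_1\, \partial/\partial f$ have closed-form flows, and the loop "integral" picks up exactly the $\epsilon^2\, [X_1, X_2] = \epsilon^2\, \partial/\partial f$ displacement.

```python
import numpy as np

def flow_X1(p, t):  # flow of X1 = d/dx1: translate x1
    x1, x2, f = p
    return np.array([x1 + t, x2, f])

def flow_X2(p, t):  # flow of X2 = d/dx2 + x1 d/df: translate x2, shear f by x1
    x1, x2, f = p
    return np.array([x1, x2 + t, f + t * x1])

eps, f0 = 1e-3, 0.25
p = np.array([0.0, 0.0, f0])                 # loop based at the origin, initial value f0
q = flow_X2(flow_X1(flow_X2(flow_X1(p, eps), eps), -eps), -eps)
print((q[2] - f0) / eps**2)                  # exactly 1, the coefficient of d/df in [X1, X2]
```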