← cgad.ski 2023-04-17

An Integrability Condition

In my post last week on the policy gradient theorem, I mentioned that the matrix inverse has a nice derivative: writing $N(A) = A^{-1}$ and $dN(A, B) = \frac{d}{dt} N(A + tB)$, we have

$$dN(A, B) = -N(A) B N(A).$$

After writing this, I wondered what other differential equations of this sort have solutions. For a simple example, consider

$$dN(A, B) = N(A) B.$$

Setting $f(t) = N(tA)$ and differentiating in $t$ gives

$$\frac{d}{dt} f(t) = dN(tA, A) = N(tA) A = f(t) A,$$

which implies $N(A) = f(1) = C e^{A}$ for a choice of initial condition $f(0) = C$. However, we know that $\frac{d}{dt} e^{A + tB} \neq e^A B$ in general for non-commuting $A$ and $B$. In fact, the only function $N$ solving this equation for $n \times n$ matrices with $n > 1$ turns out to be $N = 0$. How can we tell in advance when such a differential equation will have non-trivial solutions?
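As a quick sanity check (my own snippet, not from the original post), both claims can be verified numerically with NumPy and SciPy; the matrices below are arbitrary random choices:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 3 * np.eye(3)  # shift keeps A well-conditioned
B = rng.normal(size=(3, 3))
h = 1e-6

# Central finite difference of N(A) = A^{-1} in direction B
fd = (np.linalg.inv(A + h * B) - np.linalg.inv(A - h * B)) / (2 * h)
exact = -np.linalg.inv(A) @ B @ np.linalg.inv(A)
print(np.allclose(fd, exact, atol=1e-4))  # True

# By contrast, d/dt e^{A + tB} at t = 0 differs from e^A B
fd_exp = (expm(A + h * B) - expm(A - h * B)) / (2 * h)
print(np.linalg.norm(fd_exp - expm(A) @ B))  # far from zero
```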

The Problem

Let's consider this problem from the point of view of differential geometry. Let $f \colon \R^n \to \R^m$ be a smooth map and let $df$ be its total differential. We are curious to know when the equation

$$df(x) = g(f(x), x)$$

admits local solutions. Clearly, under nice regularity assumptions on $g$ (which we will use liberally in the following), solutions to this equation are unique over path-connected domains, should they exist. Indeed, taking some path $\gamma(t)$, the equation above gives us enough information to compute

$$\frac{d}{dt} f(\gamma(t)) = g(f(\gamma(t)), \gamma(t))(\dot\gamma(t)).$$

Thus, subject to a choice for the value of $f$ at some point along $\gamma$, $f(\gamma(t))$ is determined as the unique solution to this ODE. On the other hand, it may happen that the values of $f$ we obtain by fixing its value at some point and solving ODEs along different paths disagree, in which case our equation won't have a solution.
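To make this failure mode concrete, here is a small numerical experiment (my own toy example, using SciPy's `solve_ivp`): for the scalar equation $df = f x_2 \, dx_1$ on $\R^2$, transporting $f$ from $(0,0)$ to $(1,1)$ along two different paths gives two different answers, so no solution can exist:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Transport f along gamma by solving  p'(t) = g(p, gamma(t)) . gamma'(t)
# for the toy equation df = f*x2 dx1 + 0 dx2.
def transport(path, f0):
    def rhs(t, p):
        x, dx = path(t)
        return [p[0] * x[1] * dx[0]]  # g_{1,1}*dx1 + g_{1,2}*dx2
    return solve_ivp(rhs, (0, 1), [f0], rtol=1e-10, atol=1e-12).y[0, -1]

# Two paths from (0, 0) to (1, 1), each traversed in unit time.
def right_then_up(t):
    return ((2 * t, 0), (2, 0)) if t < 0.5 else ((1, 2 * t - 1), (0, 2))

def up_then_right(t):
    return ((0, 2 * t), (0, 2)) if t < 0.5 else ((2 * t - 1, 1), (2, 0))

a = transport(right_then_up, 1.0)   # x2 = 0 while x1 moves: f stays 1
b = transport(up_then_right, 1.0)   # x2 = 1 while x1 moves: f grows to e
print(a, b)
```

Along the first path $f$ never moves; along the second it gets multiplied by $e$.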

An intuition for differential geometry tells us that path-independence of this integral over a simply connected domain will come down to a system of equations involving the first partial derivatives of $g$. The nicest way to figure out exactly what these are is by using the Frobenius theorem. Let us introduce coordinates $(x_1, \dots, x_n, f_1, \dots, f_m)$ on $\R^n \times \R^m$ and define the vector fields

$$X_i = \frac{\partial}{\partial x_i} + g_{j,i} \frac{\partial}{\partial f_j}.$$

It is fairly clear that a (local) solution to our equation is the same as an integral submanifold for the distribution spanned by $\{X_1, \dots, X_n\}$. Furthermore, involutivity of our distribution in this case boils down to the equations

$$[X_i, X_j] = 0,$$

for the simple reason that $[X_i, X_j]$ is, at each point, a linear combination of the tangent vectors $\partial/\partial f_k$, and our distribution admits no elements of this form except $0$.

These brackets are readily computed, taking a bit of care with indices of summation:

$$\begin{align*}
[X_i, X_j] &= \left[\frac{\partial}{\partial x_i} + g_{a,i} \frac{\partial}{\partial f_a},\ \frac{\partial}{\partial x_j} + g_{b,j} \frac{\partial}{\partial f_b}\right] \\
&= \left[\frac{\partial}{\partial x_i},\ g_{b,j} \frac{\partial}{\partial f_b}\right] - \left[\frac{\partial}{\partial x_j},\ g_{a,i} \frac{\partial}{\partial f_a}\right] + \left[g_{a,i} \frac{\partial}{\partial f_a},\ g_{b,j} \frac{\partial}{\partial f_b}\right] \\
&= \left(\frac{\partial g_{k,j}}{\partial x_i} - \frac{\partial g_{k,i}}{\partial x_j} + g_{a,i} \frac{\partial g_{k,j}}{\partial f_a} - g_{a,j} \frac{\partial g_{k,i}}{\partial f_a}\right) \frac{\partial}{\partial f_k}.
\end{align*}$$

From this, we can (perhaps in a future post) understand something about what matrix operator differential equations can be integrated.
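For a sanity check, the last line of this computation can be verified with SymPy (my own snippet, written for the case $n = 2$, $m = 1$, where I abbreviate $g_{1,1}, g_{1,2}$ to $g_1, g_2$):

```python
import sympy as sp

x1, x2, f = sp.symbols('x1 x2 f')
g1 = sp.Function('g1')(x1, x2, f)  # g_{1,1}: coefficient of dx1
g2 = sp.Function('g2')(x1, x2, f)  # g_{1,2}: coefficient of dx2
coords = (x1, x2, f)

# The vector fields X_i, written as coefficient tuples over (x1, x2, f)
X1 = (1, 0, g1)
X2 = (0, 1, g2)

def apply_field(V, h):  # V acting on a scalar function h
    return sum(v * sp.diff(h, c) for v, c in zip(V, coords))

def bracket(V, W):      # [V, W]_k = V(w_k) - W(v_k)
    return tuple(apply_field(V, w) - apply_field(W, v) for v, w in zip(V, W))

B = bracket(X1, X2)
claimed = (sp.diff(g2, x1) - sp.diff(g1, x2)
           + g1 * sp.diff(g2, f) - g2 * sp.diff(g1, f))
print(sp.simplify(B[0]), sp.simplify(B[1]))  # 0 0: no d/dx components
print(sp.simplify(B[2] - claimed))           # 0: matches the formula above
```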

This result is actually an exercise in Lee's Introduction to Smooth Manifolds, in the chapter on the Frobenius theorem. However, if we forget to use the Frobenius theorem (as I did when I considered this question a few days ago), we can also discover the utility of Lie brackets for ourselves.

Integrating over Loops

Suppose for simplicity that $f$ is real-valued, and consider an equation of the simpler type

$$df(x) = g(x) = g_i(x)\,dx_i.$$

(Here, $g$ is a differential form.) If a solution exists, we can recover it by integrating $g$ over paths. Furthermore, path-independence of our integral is the same as saying that it vanishes over loops.

When do our loop integrals vanish? Stokes' theorem gives the answer: when our domain is simply connected, every loop is the boundary of a disk, a disk can be partitioned into little tiny subregions, and integrals over big loops are sums of many integrals over little tiny loops. So for our equation to be integrable, we just have to check that the $2$-form telling us the integral of $g$ over little tiny loops, namely its exterior derivative, vanishes.

Let's recall the usual way these little-tiny-loop integrals are computed. Think of a $1$-form $g$ on $\R^n$ near the origin and integrate over the loops $\square_{i,j}(\epsilon)$ traversing the points $(0,\ \epsilon e_i,\ \epsilon(e_i + e_j),\ \epsilon e_j,\ 0)$ for two basis vectors $e_i$ and $e_j$. We have

$$\int_{\square_{i,j}(\epsilon)} g = \int_0^\epsilon g_i(t e_i) + g_j(\epsilon e_i + t e_j) - g_i(t e_i + \epsilon e_j) - g_j(t e_j)\,dt.$$

By making substitutions of the form

$$g_i(x + \epsilon e_j) - g_i(x) = \int_0^\epsilon \frac{\partial g_i}{\partial x_j}(x + s e_j)\,ds$$

within the integral, we get the Stokes'-type formula

$$\begin{align*}
\int_{\square_{i,j}(\epsilon)} g &= \int_0^\epsilon \int_0^\epsilon \frac{\partial g_j}{\partial x_i}(t e_j + s e_i) - \frac{\partial g_i}{\partial x_j}(t e_i + s e_j)\,ds\,dt \\
&= \int_0^\epsilon \int_0^\epsilon \left(\frac{\partial g_j}{\partial x_i} - \frac{\partial g_i}{\partial x_j}\right)(t e_i + s e_j)\,ds\,dt.
\end{align*}$$

In particular, the approximation for small $\epsilon$ is

$$\int_{\square_{i,j}(\epsilon)} g = \epsilon^2 \left(\frac{\partial g_j}{\partial x_i} - \frac{\partial g_i}{\partial x_j}\right)(0) + O(\epsilon^3).$$

The scalars $\frac{\partial g_j}{\partial x_i} - \frac{\partial g_i}{\partial x_j}$ are exactly the coefficients of the exterior derivative $dg$.
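Here is a quick numerical check of the small-$\epsilon$ approximation (my own example $1$-form, integrated with SciPy's `quad`): for $g = (x_1^2 - x_2)\,dx_1 + x_1 e^{x_2}\,dx_2$ we have $\frac{\partial g_2}{\partial x_1} - \frac{\partial g_1}{\partial x_2} = e^{x_2} + 1$, which equals $2$ at the origin:

```python
import numpy as np
from scipy.integrate import quad

g1 = lambda x: x[0]**2 - x[1]        # coefficient of dx1
g2 = lambda x: x[0] * np.exp(x[1])   # coefficient of dx2

def loop_integral(eps):
    # square loop 0 -> eps*e1 -> eps*(e1+e2) -> eps*e2 -> 0, leg by leg
    return (quad(lambda t: g1((t, 0.0)), 0, eps)[0]
            + quad(lambda t: g2((eps, t)), 0, eps)[0]
            - quad(lambda t: g1((t, eps)), 0, eps)[0]
            - quad(lambda t: g2((0.0, t)), 0, eps)[0])

eps = 1e-3
print(loop_integral(eps) / eps**2)  # approximately 2
```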

Can we do the same thing for a system of equations

$$df_i(x) = g_i(f(x), x) = g_{i,j}(f(x), x)\,dx_j?$$

Path integrals of a sort can still be defined; for any path $\gamma$ parameterized on $[0, 1]$ and initial value $f_0 \in \R^m$, define $G_\gamma(f_0) \in \R^m$ to equal $p(1)$, where $p \colon [0, 1] \to \R^m$ solves the ODE

$$\begin{cases} p(0) = f_0 \\ \dot p_i(t) = g_{i,j}(p(t), \gamma(t))\,\dot\gamma_j(t). \end{cases}$$

Figuring out a Stokes' theorem in this situation will be considerably more confusing, since a path integral over an infinitesimal loop now gives a vector field on the target space of $f$. Nevertheless, we have the inkling that taking the limit

$$\lim_{\epsilon \to 0} \frac{1}{\epsilon^2}\left(G_{\square_{i,j}(\epsilon)}(v) - v\right)$$

should give us some functions playing a role similar to the one played above by the coefficients of the exterior derivative. Unfortunately, computing this limit seems pretty hopeless without either divine intuition or hard work.

One easy way out is to forget about path integrals and instead use the symmetry of partial derivatives

$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial^2 f_k}{\partial x_j \partial x_i}$$

of a prospective solution $f$. Taking partial derivatives with respect to $x_i$ of

$$\frac{\partial f_k}{\partial x_j} = g_{k,j}(f(x), x)$$

gives

$$\frac{\partial^2 f_k}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} g_{k,j}(f(x), x) = \frac{\partial g_{k,j}}{\partial x_i} + g_{a,i} \frac{\partial g_{k,j}}{\partial f_a},$$

from which we conclude that

$$\frac{\partial g_{k,j}}{\partial x_i} + g_{a,i} \frac{\partial g_{k,j}}{\partial f_a} - \frac{\partial g_{k,i}}{\partial x_j} - g_{a,j} \frac{\partial g_{k,i}}{\partial f_a} = 0.$$

Actually, these functions are exactly what we would find by computing the approximation to $G_{\square_{i,j}(\epsilon)}(v)$, as we will see next. The "divine intuition" we will need is to relate the flow of vector fields to the Lie bracket.
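Returning to the motivating matrix examples, this condition can be tested numerically (my own sketch; since both candidate equations have $g$ independent of the base point $A$, only the $\partial/\partial f$ terms survive). The derivative of the inverse passes, while $dN(A, B) = N(A) B$ fails:

```python
import numpy as np

rng = np.random.default_rng(1)
N = rng.normal(size=(3, 3))
B, C = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
h = 1e-6

def defect(g):
    # With g independent of x, the integrability condition reads
    #   D_f g(., B)[g(N, C)] - D_f g(., C)[g(N, B)] = 0,
    # estimated here by central finite differences.
    d1 = (g(N + h * g(N, C), B) - g(N - h * g(N, C), B)) / (2 * h)
    d2 = (g(N + h * g(N, B), C) - g(N - h * g(N, B), C)) / (2 * h)
    return np.linalg.norm(d1 - d2)

inv_rule = lambda M, B: -M @ B @ M   # derivative of the matrix inverse
bad_rule = lambda M, B: M @ B        # the equation that had no solution

print(defect(inv_rule))  # essentially zero
print(defect(bad_rule))  # clearly nonzero
```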

Lie Brackets

One of the many fun ways you get Lie brackets to show up is by developing a commutator product of formal exponential series in two non-commuting variables $X$ and $Y$:

$$e^{\epsilon X} e^{\epsilon Y} e^{-\epsilon X} e^{-\epsilon Y} = I + [X, Y] \epsilon^2 + O(\epsilon^3).$$

The operator sending vector fields to their flows can also be dealt with formally as an exponential map. Specifically, when $\phi_{\epsilon X}(p)$ denotes the image of $p \in \R^n$ under the flow of $X$ for time $\epsilon$, the estimate

$$\phi_{-\epsilon Y} \circ \phi_{-\epsilon X} \circ \phi_{\epsilon Y} \circ \phi_{\epsilon X}(p) = p + \epsilon^2 [X, Y](p) + O(\epsilon^3)$$

can be proven from the formal calculation above by looking at things from the right angle. The key observation is that, if $p(t)$ is an integral curve of $X$, then for any smooth function $f$ we have

$$\frac{d}{dt} f(p(t)) = df(X(p(t))) = X(f)(p(t)).$$

Iterating then gives

$$\left(\frac{d^k}{dt^k}\right)_{t = 0} f(p(t)) = X^k(f)(p_0).$$

Using the suggestive notation $e^{\epsilon X}(f) = f \circ \phi_{\epsilon X}$, we conclude that $\sum_{k = 0}^\infty \frac{\epsilon^k}{k!} X^k(f)$ gives the right power series for $e^{\epsilon X}$. More precisely, we mean that evaluating this series at any point $p$ encodes the higher derivatives of $e^{\epsilon X}(f)(p) = f(\phi_{\epsilon X}(p))$ with respect to $\epsilon$. Furthermore, a bit of thought reveals that this correspondence from families of smooth maps to formal power series of differential operators (really, asymptotic series) is an antihomomorphism for composition.
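The flow estimate itself is easy to test numerically (my own choice of vector fields, integrated with SciPy's `solve_ivp`): for $X = x_2\,\partial/\partial x_1$ and $Y = x_1\,\partial/\partial x_2$, one computes $[X, Y] = -x_1\,\partial/\partial x_1 + x_2\,\partial/\partial x_2$:

```python
import numpy as np
from scipy.integrate import solve_ivp

X = lambda p: np.array([p[1], 0.0])   # X = x2 d/dx1
Y = lambda p: np.array([0.0, p[0]])   # Y = x1 d/dx2

def flow(V, p, eps):  # flow of V for time eps (eps may be negative)
    return solve_ivp(lambda t, q: V(q), (0, eps), p,
                     rtol=1e-12, atol=1e-12).y[:, -1]

def commutator_of_flows(p, eps):
    q = flow(X, p, eps)
    q = flow(Y, q, eps)
    q = flow(X, q, -eps)
    q = flow(Y, q, -eps)
    return q

p = np.array([1.0, 2.0])
eps = 1e-2
est = (commutator_of_flows(p, eps) - p) / eps**2
print(est)  # close to [X, Y](p) = (-1, 2)
```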
This antihomomorphism property simplifies the derivation of many series approximations involving compositions of flows; for example,

$$\begin{align*}
f(\phi_{\epsilon X}(\phi_{\epsilon Y}(p))) &= (e^{\epsilon Y} e^{\epsilon X})(f)(p) \\
&= \left(1 + (X + Y)\epsilon + \left(\frac{X^2 + Y^2}{2} + YX\right) \epsilon^2 + \dots\right)(f)(p).
\end{align*}$$

With this result in hand, let's return to our equation

$$df_i(x) = g_i(f(x), x) = g_{i,j}(f(x), x)\,dx_j.$$

The vector fields

$$X_i = \frac{\partial}{\partial x_i} + g_{j,i} \frac{\partial}{\partial f_j}$$

that we defined above have another use: their flows give "path integrals" along coordinate axes. In particular, writing $v$ for the projection onto the fiber coordinates $(f_1, \dots, f_m)$,

$$\begin{align*}
G_{\square_{i,j}(\epsilon)}(v_0) &= v(\phi_{-\epsilon X_j} \circ \phi_{-\epsilon X_i} \circ \phi_{\epsilon X_j} \circ \phi_{\epsilon X_i}(v_0, 0)) \\
&= v_0 + \epsilon^2 v([X_i, X_j](v_0, 0)) + O(\epsilon^3).
\end{align*}$$

For me, this is a nice way to see why Lie brackets express the integrability conditions of our differential equation so well.
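As a final check (again my own toy example): for the equation $df = f x_2\,dx_1$ considered earlier, the bracket formula gives $[X_1, X_2] = -f\,\partial/\partial f$, so transporting $v_0$ around $\square_{1,2}(\epsilon)$ should return $v_0 - \epsilon^2 v_0 + O(\epsilon^3)$:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Transport f along one straight leg x(t) = x_from + t*(x_to - x_from)
# for the toy equation df = f*x2 dx1.
def leg(f0, x_from, x_to):
    x_from, x_to = np.asarray(x_from, float), np.asarray(x_to, float)
    dx = x_to - x_from
    def rhs(t, p):
        x = x_from + t * dx
        return [p[0] * x[1] * dx[0]]
    return solve_ivp(rhs, (0, 1), [f0], rtol=1e-12, atol=1e-14).y[0, -1]

def loop_transport(v0, eps):
    f = leg(v0, (0, 0), (eps, 0))      # right: x2 = 0, nothing happens
    f = leg(f, (eps, 0), (eps, eps))   # up: dx1 = 0, nothing happens
    f = leg(f, (eps, eps), (0, eps))   # left: the only leg that changes f
    f = leg(f, (0, eps), (0, 0))       # down: dx1 = 0, nothing happens
    return f

v0, eps = 1.5, 1e-2
f_end = loop_transport(v0, eps)
print(f_end, v0 * (1 - eps**2))  # agree up to O(eps^3)
```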
