← cgad.ski 2023-09-12

A Note on Scale-Free Networks

In their 1999 paper Emergence of Scaling in Random Networks, Barabasi and Albert observed that some large real-world networks had degree distributions well-approximated by power laws. They called such networks "scale-free," and the name stuck.

The same paper also proposed a random model, now called the Barabasi-Albert model, for a network undergoing certain idealized forms of growth and preferential attachment. It can be shown that a network produced by the BA model is scale-free with high probability in the large time limit, which raises the hypothesis that scale-free networks in the real world are also produced by mechanisms of growth and preferential attachment.

But, why do we call scale-free networks scale-free? In physics, a law is scale-free when it is unaffected by an operation of scaling. In mathematics, the term "scale-invariant" seems to be preferred, but has the same meaning. So, when I first encountered "scale-free networks", I was confused. What scaling action leaves a scale-free network invariant?

In the usage of Barabasi and Albert, "scale-freeness" simply refers to the scale-free property of a network's degree distribution. Indeed, Pareto distributions of the form $P(X \ge x) = C x^{-\gamma}$ for $\gamma > 0$ are exactly the scale-invariant distributions of positive numbers, and the power law tails we find in degree statistics of scale-free networks are their discrete analogs. The widget below gives a geometrical sense of what it means for a distribution, represented by its density function, to be "invariant" under scaling. (Note that invariance under scaling means that the tail of the distribution scales uniformly under scaling—in other words, the distribution is unaffected by scaling after we condition on large outcomes. To help visualize this, I've amplified a tail of each distribution, displayed in grey.)

In reality and mathematics alike, symmetries are typically not surface-deep. If the rotational symmetry of tree trunks makes you suspect that the phloem of trees is also pretty round, you would be right. So, it is reasonable to wonder whether similar invariances can be found in other properties of scale-free networks, or perhaps in the mechanisms that produce them.

In this post, we'll take a look at two different mathematical arguments for the power law degree distributions of networks generated by the BA model. We begin with the argument used in Barabasi and Albert's original paper, which focuses on the expected degree of a given vertex and is very simple to derive. Next, we consider a more direct characterization of stationary degree distributions in the large time limit.

Both of these approaches are well-known in the network science literature, but in our presentation we highlight the role of invariance. In each case, we will find that scale-invariance of degree distributions is a mathematical consequence of a scale-invariance in the mechanism of preferential attachment. In our consideration of stationary distributions, the pursuit of scale-invariance also leads us to discover a simple approximate relationship between preferential attachment laws and limiting degree distributions in general. Overall, we hope to shine some light on the scale-freeness of scale-free networks, and to convince the reader that tracking the role of symmetries occasionally provides unexpected insights!

Scale-Free Degree Dynamics

In the BA model, a graph is grown iteratively by adding vertices one at a time. When a vertex is added, it connects to existing vertices of the graph by growing $m$ random edges, the destinations of which are chosen independently with replacement from a certain distribution $\Pi$ over vertices. (Multi-edges and self-edges are allowed.)

"Preferential attachment" is the property that new vertices will prefer to attach to existing vertices that already have high degrees. In the usual BA model, the probability that a particular vertex with degree $k$ is selected will be $\Pi(k, n) = \frac{k}{n}$, where the normalizing constant $n$ is the sum of degrees over our graph at a particular moment. In other words, the rate at which a vertex acquires new edges is proportional to the share of edges that it already owns in the graph.
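This mechanism is simple to simulate. Here is a minimal Python sketch (my own, not from the original paper): the standard trick is to keep a list in which each edge endpoint appears once, so that sampling uniformly from the list is exactly degree-proportional sampling.

```python
import random

def barabasi_albert(m, steps, seed=0):
    """Grow a BA graph: each new vertex attaches m edges whose endpoints
    are chosen, independently and with replacement, with probability
    proportional to current degree."""
    rng = random.Random(seed)
    # Start from a single vertex with m self-loops so every degree is positive.
    degree = {0: 2 * m}
    targets = [0] * (2 * m)  # one entry per edge endpoint
    for v in range(1, steps + 1):
        chosen = [rng.choice(targets) for _ in range(m)]  # Pi(k, n) = k / n
        for u in chosen:
            degree[u] += 1
            targets.append(u)
        degree[v] = m
        targets.extend([v] * m)
    return degree
```

Since endpoints are drawn with replacement, multi-edges and self-edges can occur, matching the model described above.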

There is already a kind of scale-invariance at work here. First of all, the function $\Pi(k, n)$ is homogeneous, meaning that the way new vertices connect into the graph is invariant under the scaling of existing degrees. This is a pretty reasonable-sounding property; for example, if we replaced every hyperlink on the internet with two identical hyperlinks, it seems that no web page would become more or less "relevant" or likely to be linked to in the future. If we assume homogeneity, the definition of $\Pi$ made above is fully determined by the additional property of linearity in $k$, which is another kind of scale-invariance: scaling the degree of an individual node causes a proportional scaling of its likelihood to receive new connections. We will return to these observations momentarily.

Let $K(t)$ be the degree of a particular node over time and $N(t)$ be the sum of degrees over the graph. In expected value, the number of edges our node gains at time $t + 1$ is
$$\begin{aligned} E[K(t + 1) - K(t)] & = m E[\Pi(K(t), N(t))] \\ & = m \Pi(E[K(t)], N(t)) = m \frac{E[K(t)]}{N(t)}. \end{aligned}$$
Note that the expected value commuted nicely with $\Pi$ due to the linear dependence of $\Pi$ on $k$ and the determinism of $N(t)$. (Since $m$ edges are added at every step, $N(t) = 2tm$.)

Now, a simple way to understand how $K(t)$ will evolve for large $t$ is to approximate its evolution by the ODE
$$\frac{dk}{dt} = \frac{mk}{n}. \tag{1}$$
In fact, it can be shown that the error of this approximation is small relative to the growth of $K(t)$. Furthermore, substituting $n = 2tm$, this equation is easy to solve by separation of variables:
$$\frac{dk}{k} = \frac{dt}{2t} \implies \ln(k) = \frac{1}{2}\ln(t) + C \implies k = e^C \sqrt{t}.$$
We conclude that a vertex started at time $t_0$ with $m$ initial edges should have degree around $K(t) = m\sqrt{t/t_0}$ at time $t$, in expectation. At a large time $T$, this lets us conclude that the (ensemble) average degree of a randomly chosen vertex $V$ will be approximately power-law distributed up to some upper bound on degree. Indeed, a vertex started at some time $t_i < T$ has degree larger than $k$ in expectation when
$$m\sqrt{T/t_i} \ge k \iff t_i \le T\left(\frac{m}{k}\right)^2.$$
So, where $K_V$ is the degree of a uniformly chosen vertex $V$, we get
$$P(\langle K_V \rangle \ge k) = \frac{\#\{t_i : m\sqrt{T/t_i} \ge k\}}{T} \approx \left(\frac{m}{k}\right)^2$$
for large $T$.
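The counting step at the end can be checked directly (a quick sketch of mine, not from the post): the fraction of arrival times $t_i \le T$ whose predicted degree $m\sqrt{T/t_i}$ is at least $k$ should be close to $(m/k)^2$.

```python
import math

def tail_fraction(m, T, k):
    """Fraction of arrival times t_i = 1..T whose predicted degree
    m * sqrt(T / t_i) is at least k."""
    return sum(1 for ti in range(1, T + 1) if m * math.sqrt(T / ti) >= k) / T

# With m = 3 and T = 100000, the fraction tracks (m/k)^2 closely:
for k in [10, 30, 100]:
    print(k, tail_fraction(3, 100_000, k), (3 / k) ** 2)
```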

Are we surprised to arrive at a power law? In fact, as we will argue now, the existence of a power law relationship is implied by the invariances of the preferential attachment mechanism we described above.

Let a group $S \cong \mathbb{R}^2$ act on the $(k, n)$-plane by scaling each axis independently. We claim that the family $\mathcal{R}$ of relationships between $k$ and $n$ produced by solutions to the system
$$\frac{dk}{dt} = \Pi(k, n) = \frac{k}{n}, \qquad \frac{dn}{dt} = 2$$
is invariant under $S$, in the sense that any possible relationship, when scaled by an element of $S$, gives another possible relationship.

For example, we said that the likelihood of a given vertex to attract new edges is unaffected when all degrees in our network are scaled. This corresponds to the fact that uniform scaling of both $k$ and $n$ preserves $\mathcal{R}$. What about the fact that preferential attachment scales proportionally with the degree of a particular vertex? This means that scaling $k$ while leaving $n$ fixed also preserves $\mathcal{R}$. These two subgroups generate all of $S$, so we are done.

What kind of family could $\mathcal{R}$ be? As we know, power law relations are called scale-free because they are invariant under scaling; specifically, they are orbits of one-parameter subgroups of $S$. From here, it can be seen that if a family $\mathcal{R}$ of curves is $S$-invariant and sufficiently regular—say, it satisfies the definition of a smooth foliation—then it must be a family of power laws.

For some geometric intuition, drag around on the following graph to view a power law transformed under different elements of $S$, and drag the slider to change the exponent of the power law. The colored lines represent the family of power law relationships with the chosen exponent. (We could only draw finitely many of them, of course!) Although individual curves may be affected by scaling, the overall family remains the same.

This insight does not reveal the exponent of the power law—for that, we must use the particular form of Equation $(1)$—but it is interesting to notice that the "scale-freeness" of the degree distribution is implied by nothing but the scale-freeness of preferential attachment.

Steady State Distributions

Rather than tracking the evolution of the degree of a given vertex, let us now suppose that the distribution of degrees has reached an approximate steady state $p(k)$ and see what we can figure out about this distribution.

We'll also consider a somewhat more general scenario than before. Let's say that preferential attachment gives a weight $\pi(k)$ to vertices of degree $k$, so that the probability that the vertex chosen by preferential attachment has degree $k$ is
$$\Pi(k) = \frac{\pi(k) p(k)}{\mu}$$
where $\mu$ is a certain normalizing constant
$$\mu = \sum_k \pi(k) p(k).$$
Note that when $\pi(k) = k$, $\mu$ is the expected degree of a uniformly chosen vertex, which is $2m$ because there are always $m$ times more edges than vertices.

How does the number $N(k)$ of vertices with degree $k$ change when one edge is added to the network? When $k > 1$, $N(k)$ will increase if the vertex selected to receive the new connection has degree $k - 1$ and decrease if the selected vertex has degree $k$. In expectation, a new vertex that arrives with $m$ independently chosen edges will cause an increase of
$$\langle \Delta N(k) \rangle = m(\Pi(k - 1) - \Pi(k))$$
to the value of $N(k)$ for $k > m$, along with an additional increase of exactly $1$ to $N(m)$ from the arriving vertex itself.

If $p(k)$ is a stationary distribution under this process—which strictly speaking will only happen in the large time limit—then $p(k)$ will be proportional to the expected increase $\langle \Delta N(k) \rangle$. Indeed, since only one vertex is added per step, the constant of proportionality is given by $\sum_k \langle \Delta N(k) \rangle = 1$. Overall, we conclude the stationarity condition
$$p(k) = m(\Pi(k - 1) - \Pi(k)) = \frac{m}{\mu}\left(\pi(k - 1)p(k - 1) - \pi(k)p(k)\right)$$
for all $k > m$.
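For the BA case $\pi(k) = k$, $\mu = 2m$, we can iterate this stationarity condition numerically. A sketch of mine, not from the post: rearranging gives $p(k) = p(k-1)(k-1)/(k+2)$, and balancing the $k = m$ case (where the arriving vertex contributes the extra $+1$) gives the boundary value $p(m) = 2/(m+2)$; the recurrence then telescopes to the known closed form $p(k) = \frac{2m(m+1)}{k(k+1)(k+2)}$, whose tail goes like $k^{-3}$.

```python
def stationary_ba(m, kmax):
    """Iterate the stationarity condition p(k) = (1/2)((k-1)p(k-1) - k p(k)),
    rearranged as p(k) = p(k-1) * (k-1) / (k+2), from p(m) = 2 / (m+2)."""
    p = {m: 2 / (m + 2)}
    for k in range(m + 1, kmax + 1):
        p[k] = p[k - 1] * (k - 1) / (k + 2)
    return p

# For m = 2, the tail constant p(k) * k^3 approaches 2m(m+1) = 12:
p = stationary_ba(2, 500)
print(p[500] * 500 ** 3)
```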

If we hope to talk about scale-freeness of the distribution $p(k)$, we need to upgrade it to a distribution over a continuous-valued quantity. So, let us take a leap of faith and make $k$ into a real-valued variable, replacing the equation above with the corresponding differential equation
$$p(k) = \frac{-m}{\mu}\left(\frac{d}{dk}\,\pi(k)p(k)\right) = \frac{-m}{\mu}\left(p(k)\frac{d\pi}{dk}(k) + \pi(k)\frac{dp}{dk}(k)\right). \tag{2}$$
Our hope, which we will not attempt to justify in this post, is that the limiting distributions for "continuous-valued degree" predicted by this equation will be representative of our limiting degree distributions for large $k$.

One complication here is that the normalizing constant $\mu$ also depends on $p$, and in fact was defined based on the "discrete version" of $p$. So, for the purposes of our continuous-valued degree analysis, we'll focus on solving the eigenfunction problem for the operator
$$A_\pi(p) = -p\frac{d\pi}{dk} - \pi\frac{dp}{dk}. \tag{3}$$
(Specifically, we're asking for functions $p$ so that $A_\pi(p) = \lambda p$ for sufficiently large inputs $k$.) Let's call $A_\pi$ the "attachment operator" for $\pi$. It can be viewed as giving the Lie derivative of probability densities with respect to the vector field $\pi(k)\partial_k$ that "boosts" degree $k$ proportionally to $\pi(k)$. Specifically, when $\pi(k) > 0$, if $\phi_t(k)$ gives the flow of the vector field $\pi(k)\partial_k$ and $(\phi_t)_*$ denotes the pushforward of a probability measure, then
$$A_\pi(p)(k) = \left.\frac{d}{dt}\right|_{t=0} (\phi_t)_*(p)(k) = \left.\frac{d}{dt}\right|_{t=0} p(\phi_{-t}(k))\,\phi_{-t}'(k).$$
Arriving at the attachment operator required a leap of the imagination, but now that we are here, the role of invariance is back in sight. Indeed, when $\pi(k)\partial_k$ is a complete vector field, summable eigenfunctions of $A_\pi$ are exactly the distributions whose tails are invariant under the flow of $\pi(k)\partial_k$. In other words, we expect a distribution of degrees that is stationary under an attachment function $\pi(k)$ to have a "continuous analog" that is invariant under the flow of $\pi(k)\partial_k$.

We can solve the eigenfunction equation
$$A_\pi(p) = -p\pi' - \pi\frac{dp}{dk} = \lambda p$$
in general by rearranging it as a separable equation
$$\frac{1}{p}\,dp = -\frac{\lambda + \pi'(k)}{\pi(k)}\,dk.$$
So, a stationary distribution for the attachment function $\pi(k)$ will be of the form
$$p(k) \propto \exp\left(-\int \frac{\lambda + \pi'(k)}{\pi(k)}\,dk\right) = \frac{1}{\pi(k)}\exp\left(-\lambda\int\frac{dk}{\pi(k)}\right) \tag{4}$$
for some unknown positive $\lambda$.
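We can spot-check equation $(4)$ symbolically. The sketch below (mine, using sympy) builds the candidate density from $(4)$ and confirms that the attachment operator $(3)$ returns $\lambda p$, for the BA weight $\pi(k) = k$ and for $\pi(k) = \sqrt{k}$.

```python
import sympy as sp

k, lam = sp.symbols('k lambda', positive=True)

def eigen_residual(pi):
    """Build p(k) from equation (4) and return A_pi(p) - lam * p."""
    p = sp.exp(-sp.integrate((lam + sp.diff(pi, k)) / pi, k))
    A_p = -p * sp.diff(pi, k) - pi * sp.diff(p, k)  # attachment operator (3)
    return sp.simplify(A_p - lam * p)

print(eigen_residual(k))           # 0: the Pareto tail of the BA model
print(eigen_residual(sp.sqrt(k)))  # 0: a stretched exponential
```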

We can also go backwards, solving for an attachment function $\pi$ that has $p$ as an eigenfunction. For this we need to solve the differential equation
$$\frac{d\pi}{dk} + \pi\frac{p'(k)}{p(k)} = -\lambda.$$
Applying usual techniques—my favorite being to informally imagine the constant $\lambda$ as a superposition of delta functions—we arrive at the expression
$$\pi(k) = \frac{-\lambda P(k) + C}{p(k)}$$
where $P(k)$ is the cumulative distribution function $P(k) = \int p(k)\,dk$ and $C$ is some constant. For $\pi(k)$ to be positive we need $C \ge \lambda$, while for $\pi(k)\partial_k$ to be a complete vector field we need
$$\int\frac{dk}{\pi(k)} = \int\frac{p(k)}{-\lambda P(k) + C}\,dk$$
to diverge, forcing $C = \lambda$. So, subject to these two constraints, we have a unique solution
$$\pi(k) = \lambda\,\frac{1 - P(k)}{p(k)} \tag{5}$$
up to a scalar. Another way to derive this is to view our distribution as the image of the uniform distribution on $[0, 1)$ under $P^{-1}$. Then there is a correspondence between invariances of $p(k)$ and invariances of the distribution on the interval, but the invariances of the latter are exactly affine maps sending $[0, 1)$ to subintervals. Pushing forward the family of invariances $\phi_t(x) = 1 - (1 - x)e^{\lambda t}$ under $P^{-1}$ and differentiating with respect to $t$ at $t = 0$ gives us exactly the same expression as $(5)$.

Some remarks about the meanings of these equations are in order. First of all, note that putting $\lambda = 0$ in $(4)$ gives us the bizarre conclusion that the density $p(k) \propto \pi(k)^{-1}$ is exactly conserved by the flow of $\pi(k)\partial_k$. If $p(k)$ is a probability distribution and $\pi(k)\partial_k$ is a complete non-zero vector field with $\pi(k) > 0$, then conservation of probability dictates that
$$\int_L^\infty A_\pi(p)(k)\,dk > 0$$
whenever the support of $p(k)$ contains $L$, so in particular if $p(k)$ is an eigenfunction of $A_\pi$ then it must have strictly positive eigenvalue. (For some intuition, see the widget at the start of this post.) But in fact, $\pi(k)\partial_k$ is complete exactly when $\int \pi(k)^{-1}\,dk$ diverges, while $p(k)$ can be viewed as a density function only when $\int p(k)\,dk$ converges, so if $p(k) \propto \pi(k)^{-1}$ one of these assumptions must fail.

Also, consider what it means to say that $\pi(k)\partial_k$ is a complete vector field. In fact, the reasonable condition to stipulate on $\pi(k)$ in view of our preferential attachment process is that, given the stationary density $p(k)$, we can define a density function $\Pi(k) \propto \pi(k)p(k)$ for the "selected" degree, or in other words that $\pi(k)p(k)$ is summable. But in fact, we have shown that any stationary distribution $p(k)$ for $\pi$ will admit a $\lambda$ such that
$$\pi(k)p(k) = \exp\left(-\lambda\int\frac{dk}{\pi(k)}\right),$$
and the RHS cannot converge to zero unless $\int\frac{dk}{\pi(k)}$ diverges, so completeness of $\pi(k)\partial_k$ is a necessary condition for $\Pi(k)$ to exist.

Overall, we get a nice bijective correspondence between families of probability distributions and complete vector fields $\pi(k)\partial_k$ with $\pi(k) \ge 0$, each considered modulo their tails. Indeed, it is straightforward to check that the relations
$$p(k) \propto \frac{1}{\pi(k)}\exp\left(-\lambda\int\frac{dk}{\pi(k)}\right), \qquad \pi(k) \propto \frac{1 - P(k)}{p(k)}$$
send summable functions $p(k)$ to inverse-summable functions $\pi(k)$ and vice versa. This appears to be an interesting framework for investigating how attachment functions $\pi(k)$ are related to the tails of the resulting degree distributions. However, recall that the eigenvalue $\lambda$ is not determined by our analysis, because it depends on the normalizing constant $\mu$ derived from the discrete model. Since the normalizing constant in turn depends on the distribution $p(k)$, solving for $\lambda$ in general requires returning to the discrete model and solving a (potentially intractable) non-linear equation.
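As a plausibility check on this correspondence (my own sketch, not in the post), the two simplest cases work out as expected: a Pareto tail maps back to linear preferential attachment, and an exponential tail maps back to uniform attachment.

```python
import math

lam, gamma = 2.0, 2.0

# Pareto density on [1, inf): p(k) = gamma * k^(-gamma - 1), tail 1 - P(k) = k^(-gamma)
p_pareto = lambda k: gamma * k ** (-gamma - 1)
tail_pareto = lambda k: k ** (-gamma)

# Exponential density: p(k) = lam * exp(-lam * k), tail 1 - P(k) = exp(-lam * k)
p_exp = lambda k: lam * math.exp(-lam * k)
tail_exp = lambda k: math.exp(-lam * k)

for k in [1.0, 5.0, 20.0]:
    # Formula (5): pi(k) = lam * (1 - P(k)) / p(k)
    assert abs(lam * tail_pareto(k) / p_pareto(k) - (lam / gamma) * k) < 1e-9  # linear
    assert abs(lam * tail_exp(k) / p_exp(k) - 1.0) < 1e-9                      # constant
```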

Examples of Invariant Distributions

Let's check some examples. With $\pi(k) = k$ as in the BA model, our attachment operator becomes $A_\pi(p) = -kp' - p$. The eigenfunction with eigenvalue $\lambda$ will be $p(k) = k^{-\lambda - 1}$, since indeed
$$A_\pi(k^{-\lambda - 1}) = -k(-\lambda - 1)k^{-\lambda - 2} - k^{-\lambda - 1} = \lambda k^{-\lambda - 1}.$$
Because we know that $\mu = 2m$, we can substitute into equation $(2)$ to determine the eigenvalue $\lambda$. In fact,
$$k^{-\lambda - 1} = \frac{m}{2m} A_\pi(k^{-\lambda - 1}) = \frac{\lambda}{2} k^{-\lambda - 1} \implies \lambda = 2,$$
so we will have $p(k) \propto k^{-3}$, which agrees with our derivation that $P(\langle K_V \rangle > k) \propto k^{-2}$ above.
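As a numerical cross-check (a sketch of mine), we can implement the attachment operator with central differences and confirm that $p(k) = k^{-3}$ is an eigenfunction with eigenvalue $2$ for $\pi(k) = k$:

```python
def attachment(p, pi, k, h=1e-5):
    """Attachment operator (3), A_pi(p) = -p * pi' - pi * p', with the
    derivatives approximated by central differences."""
    d = lambda f: (f(k + h) - f(k - h)) / (2 * h)
    return -p(k) * d(pi) - pi(k) * d(p)

p = lambda k: k ** -3.0  # candidate stationary density
pi = lambda k: k         # BA preferential attachment

for k in [2.0, 10.0, 50.0]:
    assert abs(attachment(p, pi, k) - 2 * p(k)) < 1e-6 * p(k)
```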

Another particularly simple case is $\pi(k) = 1$, corresponding to a lack of any preferential attachment. In this case we get $A_\pi = -\partial_k$, whose eigenfunctions are exponentials. Indeed, the exponential distribution is invariant under translation in the same way that Pareto distributions are invariant under scaling.

Now consider a more general case, $\pi(k) = k^\alpha$. With $\alpha > 1$ we find that the integral of $1/\pi(k)$ does not diverge, and so we predict that we will not obtain meaningful stationary distributions for the reason mentioned above. (Specifically, the density $\pi(k)p(k)$ would fail to be summable for any stationary distribution $p(k)$.) On the other hand, for $\alpha < 1$ our analysis correctly predicts stretched exponential distributions of the form
$$p(k) \propto k^{-\alpha}\exp\left(\frac{\lambda k^{1 - \alpha}}{\alpha - 1}\right).$$
In practice, power law distributions are often confused with log-normal distributions. Up to a scaling parameter, the density function of a log-normal distribution is of the form
$$p(k) \propto \frac{1}{k}\exp(-\ln(k)^2) = \exp(-\ln(k)^2 - \ln(k)).$$
This gives a slightly thinner tail than a power law, which could be written as $p(k) \propto \exp(-\gamma\ln(k))$. What kind of attachment operator might produce a log-normal distribution?

Plugging in equation $(5)$ gives
$$\pi(k) \propto \frac{\mathrm{Erfc}(\ln(k))}{k^{-1}\exp(-\ln(k)^2)} = k\,\frac{\mathrm{Erfc}(\ln(k))}{\exp(-\ln(k)^2)},$$
where $\mathrm{Erfc}$ is the complementary error function
$$\mathrm{Erfc}(k) \propto \int_k^\infty e^{-t^2}\,dt.$$
However, note that
$$\frac{\mathrm{Erfc}(k)}{\exp(-k^2)} \sim \frac{1}{k}$$
up to a constant, so we have the asymptotic approximation $\pi(k) \sim \frac{k}{\ln(k)}$. If we take $\pi(k) = k/\ln(k)$ exactly, our attachment operator $A_\pi$ turns out to have eigenfunctions
$$p(k) \propto \frac{\ln(k)}{k}\exp\left(-\frac{\lambda}{2}\ln(k)^2\right) = \exp\left(-\frac{\lambda}{2}\ln(k)^2 - \ln(k) + \ln(\ln(k))\right),$$
which is very similar to the density of a log-normal.
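The asymptotic used for $\mathrm{Erfc}$ here is the standard Gaussian tail estimate $\int_x^\infty e^{-t^2}\,dt \sim e^{-x^2}/(2x)$, which we can confirm numerically with Python's math.erfc (my addition; math.erfc carries an extra $2/\sqrt{\pi}$ normalization, so the product below tends to $1/\sqrt{\pi}$):

```python
import math

# x * erfc(x) * exp(x^2) approaches 1/sqrt(pi), confirming that
# Erfc(x) / exp(-x^2) falls off like 1/x up to a constant.
for x in [2.0, 5.0, 10.0]:
    print(x, x * math.erfc(x) * math.exp(x * x))
print(1 / math.sqrt(math.pi))
```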
