2023-09-12

In their 1999 paper *Emergence of Scaling in Random Networks*, Barabási and Albert observed that some large real-world networks had degree distributions well-approximated by power laws. They called such networks "scale-free," and the name stuck.

The same paper also proposed a random model, now called the Barabási–Albert model, for a network undergoing certain idealized forms of *growth* and *preferential attachment*. It can be shown that a network produced by the BA model is scale-free with high probability in the large time limit, which suggests the hypothesis that scale-free networks in the real world are also produced by mechanisms of growth and preferential attachment.

But why do we call scale-free networks scale-free? In physics, a law is scale-free when it is unaffected by an operation of scaling. In mathematics, the term "scale-invariant" seems to be preferred, but has the same meaning. So, when I first encountered "scale-free networks," I was confused. What scaling action leaves a scale-free network invariant?

In the usage of Barabási and Albert, "scale-freeness" simply refers to the scale-free property of a network's degree distribution. Indeed, Pareto distributions of the form
$P(X \ge x) = C x^{-\gamma}$
for $\gamma > 0$ are exactly the scale-invariant distributions of positive numbers, and the power law tails we find in degree statistics of scale-free networks are their discrete analogs. The widget below gives a geometrical sense of what it means for a distribution, represented by its density function, to be "invariant" under scaling. (Note that invariance under scaling means that the *tail* of the distribution scales uniformly under scaling—in other words, the distribution is unaffected by scaling after we condition on large outcomes. To help visualize this, I've amplified a tail of each distribution, displayed in grey.)
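To make this invariance concrete in code, here is a minimal Python sketch (function names are my own) checking that conditioning a Pareto tail on a large outcome and then rescaling everything leaves the conditional law unchanged:

```python
def pareto_tail(x, gamma=2.0):
    """P(X >= x) for a Pareto law with tail exponent gamma, supported on x >= 1."""
    return x ** -gamma

def conditional_tail(x, threshold, gamma=2.0):
    """P(X >= x | X >= threshold): the tail conditioned on a large outcome."""
    return pareto_tail(x, gamma) / pareto_tail(threshold, gamma)

# Rescaling both the outcome and the threshold by any c > 0 leaves the
# conditional law unchanged: it depends only on the ratio x / threshold.
c = 7.0
scaled = conditional_tail(c * 3.0, c * 1.5)
unscaled = conditional_tail(3.0, 1.5)
```

Both quantities equal $(3 / 1.5)^{-\gamma} = 0.25,$ independently of the scale factor $c.$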

In reality and mathematics alike, symmetries are typically not surface-deep. If the rotational symmetry of tree trunks makes you suspect that the phloem of a tree is also pretty round, you would be right. So, it is reasonable to wonder whether similar invariances can be found in other properties of scale-free networks, or perhaps in the mechanisms that produce them.

In this post, we'll take a look at two different mathematical arguments for the power law degree distributions of networks generated by the BA model. We begin with the argument used in Barabási and Albert's original paper, which focuses on the expected degree of a given vertex and is very simple to derive. Next, we consider a more direct characterization of stationary degree distributions in the large time limit.

Both of these approaches are well-known in the network science literature, but in our presentation we highlight the role of invariance. In each case, we will find that scale-invariance of degree distributions is a mathematical consequence of a scale-invariance in the mechanism of preferential attachment. In our consideration of stationary distributions, the pursuit of scale-invariance also leads us to discover a simple approximate relationship between preferential attachment laws and limiting degree distributions in general. Overall, we hope to shine some light on the scale-freeness of scale-free networks, and to convince the reader that tracking the role of symmetries occasionally provides unexpected insights!

In the BA model, a graph is grown iteratively by adding vertices one at a time. When a vertex is added, it connects to existing vertices of the graph by growing $m$ random edges, the destinations of which are chosen independently with replacement from a certain distribution $\Pi$ over vertices. (Multi-edges and self-edges are allowed.)

"Preferential attachment" is the property that new vertices will prefer to attach to existing vertices that already have high degrees. In the usual BA model, the probability that a particular vertex with degree $k$ is selected will be $\Pi(k, n) = \frac{k}{n}$ where the normalizing constant $n$ is the sum of degrees over our graph at a particular moment. In other words, the rate at which a vertex acquires new edges is proportional to the share of edges that it already owns in the graph.

There is already a kind of scale-invariance at work here. First of all, the function $\Pi(k, n)$ is homogeneous, meaning that the way new vertices connect into the graph is *invariant* under the scaling of existing degrees. This is a pretty reasonable-sounding property; for example, if we replaced every hyperlink on the internet with two identical hyperlinks, it seems that no web page would become more or less "relevant" or likely to be linked to in the future. If we assume homogeneity, the definition of $\Pi$ made above is fully determined by the additional property of linearity in $k,$ which is another kind of scale-invariance: scaling the degree of an individual node causes a proportional scaling of its likelihood to receive new connections. We will return to these observations momentarily.

Let $K(t)$ be the degree of a particular node over time and $N(t)$ be the sum of degrees over the graph. In expected value, the number of edges our node gains at time $t + 1$ is $\begin{align*} E[K(t + 1) - K(t)] & = m E[\Pi(K(t), N(t))] \\ & = m \Pi(E[K(t)], N(t)) = m \frac{E[K(t)]}{N(t)}. \end{align*}$ Note that the expected value commuted nicely with $\Pi$ due to the linear dependence of $\Pi$ on $k$ and the determinism of $N(t)$. (Since $m$ edges are added at every step, $N(t) = 2tm.$)

Now, a simple way to understand how $K(t)$ will evolve for large $t$ is to approximate its evolution by the ODE $\frac{dk}{dt} = \frac{m k}{n}. \tag{1}$ In fact, it can be shown that the error of this approximation is small relative to the growth of $K(t).$ Furthermore, substituting $n = 2tm,$ this equation is easy to solve by separation of variables: $\begin{align*} \frac{dk}{k} = \frac{dt}{2t} & \implies \ln(k) = \frac{1}{2} \ln(t) + C \\ & \implies k = e^C \sqrt{t}. \end{align*}$ We conclude that a vertex started at time $t_0$ with $m$ initial edges should have degree around $K(t) = m \sqrt{t / t_0}$ at time $t,$ in expectation. At a large time $T$, this lets us conclude that the (ensemble) average degree of a randomly chosen vertex $V$ will be approximately power-law distributed until some upper bound on degree. Indeed, a vertex started at some time $t_i < T$ has degree larger than $k$ in expectation when $m \sqrt{T / t_i} \ge k \iff t_i \le T \left( \frac{m}{k} \right)^2.$ So, where $K_V$ is the degree of a uniformly chosen vertex $V,$ we get $P(\langle K_V \rangle \ge k) = \frac{\# \{ t_i : m \sqrt{T/t_i} \ge k \}}{T} \approx \left( \frac{m}{k} \right)^{2}$ for large $T.$
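As a quick numerical sanity check on the separation-of-variables solution, we can integrate Equation $(1)$ with a forward Euler scheme (a rough sketch, with step size chosen arbitrarily) and compare against $m \sqrt{t / t_0}$:

```python
def euler_degree(t0, T, m, h=1e-3):
    """Forward-Euler integration of Equation (1), dk/dt = m*k/n with
    n(t) = 2*m*t, from the initial condition k(t0) = m."""
    k, t = float(m), float(t0)
    while t < T:
        k += h * (m * k) / (2 * m * t)
        t += h
    return k

numeric = euler_degree(t0=1.0, T=100.0, m=2)
closed_form = 2 * (100.0 / 1.0) ** 0.5  # m * sqrt(T / t0) = 20
```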

Are we surprised to arrive at a power law? In fact, as we will argue now, the existence of a power law relationship is implied by the invariances of the preferential attachment mechanism we described above.

Let a group $S \cong \R^2$ act on the $(k, n)$-plane by scaling each axis independently. We claim that the family $\Rc$ of relationships between $k$ and $n$ produced by solutions to the system $\frac{dk}{dt}=\Pi(k, n) = \frac{k}{n}, \quad \frac{dn}{dt} = 2$ is invariant under $S,$ in the sense that any possible relationship, when scaled by an element of $S,$ gives another possible relationship.

For example, we said that the likelihood of a given vertex to attract new edges is unaffected when all degrees in our network are scaled. This corresponds to the fact that uniform scaling of both $k$ and $n$ is an invariant for $\Rc.$ What about the fact that preferential attachment scales proportionally with the degree of a particular vertex? This means that scaling $k$ and leaving $n$ fixed is an invariant for $\Rc.$ These two subgroups generate all of $S,$ so we are done.

What kind of family could $\Rc$ be? As we know, power law relations are called *scale-free* because they are invariant under scaling; specifically, they are orbits of one-parameter subgroups of $S.$ From here, it can be seen that if a family $\Rc$ of curves is $S$-invariant and sufficiently regular—say, satisfies the definition of a smooth foliation—then it must be a family of power laws.

For some geometric intuition, drag around on the following graph to view a power law transformed under different elements of $S,$ and drag the slider to change the coefficient of the power law. The colored lines represent the family of power law relationships with the chosen coefficient. (We could only draw finitely many of them, of course!) Although individual curves may be affected by scaling, the overall family remains the same.

This insight does not reveal the coefficient of the power law—for that, we must use the particular form of Equation $(1)$—but it is interesting to notice that the "scale-freeness" of the degree distribution is implied by nothing but the scale-freeness of preferential attachment.

Rather than tracking the evolution of the degree of a given vertex, let us now suppose that the distribution of degrees has reached an approximate steady state $p(k)$ and see what we can figure out about this distribution.

We'll also consider a somewhat more general scenario than before. Let's say that preferential attachment gives a weight $\pi(k)$ to vertices of degree $k,$ so that the probability that the vertex chosen by preferential attachment has degree $k$ is $\Pi(k) = \frac{\pi(k) p(k)}{\mu}$ where $\mu$ is a certain normalizing constant $\mu = \sum_k \pi(k) p(k).$ Note that when $\pi(k) = k,$ $\mu$ is the expected degree of a uniformly chosen vertex, which is $2m$ because there are always $m$ times more edges than vertices.

How does the number $N(k)$ of vertices with degree $k$ change when one edge is added to the network? When $k > 1,$ $N(k)$ will increase if the vertex selected to receive the new connection has degree $k - 1$ and decrease if the selected vertex has degree $k.$ In expectation, a new vertex that arrives with $m$ independently chosen edges will cause an increase of $\begin{align*} \langle \Delta N(k) \rangle = m(\Pi(k - 1) - \Pi(k)) \end{align*}$ to the value of $N(k)$ for $k > m,$ plus an increase of exactly $1$ to $N(m)$ coming from the new vertex itself.

If $p(k)$ is a stationary distribution under this process—which strictly speaking will only happen in the large time limit—then $p(k)$ will be proportional to the expected increase $\langle \Delta N(k) \rangle.$ Indeed, since exactly one vertex is added per step, $\sum_k \langle \Delta N(k) \rangle = 1,$ so the constant of proportionality is $1.$ Overall, we conclude the stationarity condition $\begin{align*} p(k) & = m(\Pi(k - 1) - \Pi(k)) \\ & = \frac{m}{\mu} (\pi(k - 1) p(k - 1) - \pi(k) p(k)) \end{align*}$ for all $k > m.$
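In the BA case $\pi(k) = k,$ $\mu = 2m,$ this condition rearranges to $p(k)(k + 2) = (k - 1) p(k - 1),$ i.e. $p(k) = p(k-1) \cdot \frac{k-1}{k+2},$ which we can iterate numerically. A sketch (the normalization is left arbitrary, since only the tail shape matters here):

```python
def ba_stationary_tail(m=2, K=2000):
    """Iterate the stationarity condition for pi(k) = k and mu = 2m,
    which rearranges to p(k) = p(k-1) * (k - 1) / (k + 2) for k > m.
    The normalization is arbitrary: only the tail shape matters."""
    p = {m: 1.0}
    for k in range(m + 1, K + 1):
        p[k] = p[k - 1] * (k - 1) / (k + 2)
    return p

p = ba_stationary_tail()
# p(k) * k^3 should flatten out for large k, i.e. p(k) ~ C * k^{-3}.
ratio_1000 = p[1000] * 1000 ** 3
ratio_2000 = p[2000] * 2000 ** 3
```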

If we hope to talk about scale freeness of the distribution $p(k),$ we need to upgrade it to a distribution over continuous-valued quantities. So, let us take a leap of faith and make $k$ into a real-valued variable, replacing the equation above with the corresponding differential equation $p(k) = \frac{-m}{\mu} \left( \frac{d}{d k} \pi(k) p(k) \right) = \frac{-m}{\mu} \left( p(k) \frac{d \pi}{d k}(k) + \pi(k) \frac{d p}{d k}(k) \right). \tag{2}$ Our hope, which we will not attempt to justify in this post, is that the limiting distributions for "continuous-valued degree" predicted by this equation will be representative of our limiting degree distributions for large $k.$

One complication here is that the normalizing constant $\mu$ also depends on $p,$ and in fact was defined based on the "discrete version" of $p.$ So, for the purposes of our continuous-valued degree analysis, we'll focus on solving the eigenfunction problem for the operator $A_\pi(p) = - p \frac{d \pi}{d k} - \pi \frac{d p}{d k}. \tag{3}$ (Specifically, we're asking for functions $p$ so that $A_\pi(p) = \lambda p$ for sufficiently large inputs $k.$) Let's call $A_\pi$ the "attachment operator" for $\pi.$ It can be viewed as giving the Lie derivative of probability densities with respect to the vector field $\pi(k) \partial_k$ that "boosts" degree $k$ proportionally to $\pi(k).$ Specifically, when $\pi(k) > 0,$ writing $\phi_t$ for the flow of the vector field $\pi(k) \partial_k$ and $(\phi_t)_*$ for the pushforward of a probability measure, we have $A_\pi(p)(k) = \left. \frac{d}{dt} \right|_{t = 0} (\phi_t)_*(p)(k) = \left. \frac{d}{dt} \right|_{t = 0} p(\phi_{-t}(k)) \phi_{-t}'(k).$ Arriving at the attachment operator required a leap of the imagination, but now that we are here the role of invariance is back in sight. Indeed, when $\pi(k) \partial_k$ is a complete vector field, summable eigenfunctions of $A_\pi$ are exactly the distributions whose tails are invariant under the flow of $\pi(k) \partial_k.$ In other words, we expect a distribution of degrees that is stationary under an attachment function $\pi(k)$ to have a "continuous analog" that is invariant under the flow $\pi(k) \partial_k.$

We can solve the eigenfunction equation $A_\pi(p) = - p \pi' - \pi \frac{d p}{d k} = \lambda p$ in general by rearranging it as a separable equation $\frac{1}{p} dp = -\frac{\lambda + \pi'(k)}{\pi(k)} dk.$ So, a stationary distribution for the attachment function $\pi(k)$ will be of the form $p(k) \propto \exp \left( -\int \frac{\lambda + \pi'(k)}{\pi(k)} dk\right) = \frac{1}{\pi(k)} \exp \left( - \lambda \int \frac{dk}{\pi(k)} \right) \tag{4}$ for some unknown positive $\lambda.$
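We can spot-check $(4)$ numerically: for a sample attachment function, the density it prescribes should satisfy $A_\pi(p) = \lambda p$ pointwise. A sketch using finite differences, with $\pi(k) = \sqrt{k}$ and $\lambda = 1$ chosen arbitrarily (here $\int dk/\pi(k) = 2\sqrt{k},$ so $(4)$ predicts $p(k) \propto k^{-1/2} e^{-2\lambda\sqrt{k}}$):

```python
from math import exp

lam = 1.0

def pi(k):
    return k ** 0.5

def p(k):
    # Candidate eigenfunction from (4): (1/pi) * exp(-lam * integral of dk/pi).
    return k ** -0.5 * exp(-2 * lam * k ** 0.5)

def deriv(f, k, h=1e-6):
    return (f(k + h) - f(k - h)) / (2 * h)

def attachment_operator(k):
    """A_pi(p)(k) = -p * pi' - pi * p', equation (3)."""
    return -p(k) * deriv(pi, k) - pi(k) * deriv(p, k)

# attachment_operator(k) should agree with lam * p(k) pointwise.
```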

We can also go backwards, solving for an attachment function $\pi$ that has $p$ as an eigenfunction. For this we need to solve the differential equation $\frac{d \pi}{d k} + \pi \frac{p'(k)}{p(k)} = - \lambda.$ Applying usual techniques—my favorite being to informally imagine the constant $\lambda$ as a superposition of delta functions—we arrive at the expression $\pi(k) = \frac{-\lambda P(k) + C}{p(k)}$ where $P(k)$ is the cumulative distribution function $P(k) = \int p(k) \, dk$ and $C$ is some constant. For $\pi(k)$ to be positive we need $C \ge \lambda,$ while for $\pi(k) \partial_k$ to be a complete vector field we need $\int \frac{dk}{\pi(k)} = \int \frac{p(k)}{-\lambda P(k) + C} dk$ to diverge, forcing $C = \lambda.$ So, subject to these two constraints, we have a unique solution of $\pi(k) = \lambda \frac{1 - P(k)}{p(k)} \tag{5}$ up to a scalar. Another way to derive this is to view our distribution as the image of the uniform distribution on $[0, 1)$ under the map $P^{-1}.$ Then there is a correspondence between invariances of $p(k)$ and invariances of the distribution on the interval, but the invariances of the latter are exactly affine maps sending $[0, 1)$ to subintervals. Pushing forward the family of invariances $\phi_t(x) = 1 - (1 - x)e^{\lambda t}$ under $P^{-1}$ and differentiating with respect to $t$ at $t = 0$ gives exactly the same expression as $(5).$
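As a quick check of $(5),$ feeding in a Pareto density should recover linear preferential attachment, since Pareto distributions are the scale-invariant ones. A sketch with $\gamma = 2$ and $\lambda = 2$ chosen arbitrarily:

```python
lam, gamma = 2.0, 2.0

def pareto_density(k):
    """Pareto density p(k) = gamma * k^(-gamma - 1) on [1, infinity)."""
    return gamma * k ** (-gamma - 1)

def pareto_survival(k):
    """1 - P(k) = k^(-gamma)."""
    return k ** -gamma

def attachment_from_density(k):
    """Equation (5): pi(k) = lam * (1 - P(k)) / p(k)."""
    return lam * pareto_survival(k) / pareto_density(k)

# For a Pareto density this reduces to (lam / gamma) * k: linear
# preferential attachment, as in the BA model.
```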

Some remarks about the meanings of these equations are in order. First of all, note that putting $\lambda = 0$ in $(4)$ gives us the bizarre conclusion that the density $p(k) \propto \pi(k)^{-1}$ is exactly conserved by the flow of $\pi(k) \partial_k.$ If $p(k)$ is a probability distribution and $\pi(k) \partial_k$ is a complete non-zero vector field with $\pi(k) > 0,$ then conservation of probability dictates that $\int_L^\infty A_\pi(p)(k) dk > 0$ whenever the support of $p(k)$ contains $L,$ so in particular if $p(k)$ is an eigenfunction of $A_\pi$ then it must have strictly positive eigenvalue. (For some intuition, see the widget at the start of this post.) But in fact, $\pi(k) \partial_k$ is complete exactly when $\int \pi(k)^{-1} \, dk$ diverges, while $p(k)$ can be viewed as a density function when $\int p(k) \, dk$ converges, so if $p(k) \propto \pi(k)^{-1}$ one of these assumptions must fail.

Also, consider what it means to say that $\pi(k) \partial_k$ is a complete vector field. In fact, the reasonable condition to stipulate on $\pi(k)$ in view of our preferential attachment process is that, given the stationary density $p(k),$ we can define a density function $\Pi(k) \propto \pi(k) p(k)$ for the "selected" degree, or in other words that $\pi(k) p(k)$ is summable. But in fact, we have shown that any stationary distribution $p(k)$ for $\pi$ will admit a $\lambda$ such that $\pi(k) p(k) = \exp\left( -\lambda \int \frac{dk}{\pi(k)} \right),$ and the RHS cannot converge to zero unless $\int \frac{dk}{\pi(k)}$ diverges, so completeness of $\pi(k) \partial_k$ is a necessary condition for $\Pi(k)$ to exist.

Overall, we get a nice bijective correspondence between families of probability distributions and complete vector fields $\pi(k) \partial_k$ with $\pi(k) \ge 0,$ each considered modulo their tails. Indeed, it is straightforward to check that the relations $\begin{align*} p(k) & \propto \frac{1}{\pi(k)} \exp\left( -\lambda \int \frac{dk}{\pi(k)} \right), \\ \pi(k) & \propto \frac{1 - P(k)}{p(k)} \end{align*}$ send summable functions $p(k)$ to inverse-summable functions $\pi(k)$ and vice versa. This appears to be an interesting framework for investigating how attachment functions $\pi(k)$ are related to the tails of the resulting degree distributions. However, recall that the eigenvalue $\lambda$ is not determined by our analysis because it depends on the normalizing constant $\mu$ derived from the discrete model. Since the normalizing constant in turn depends on the distribution $p(k),$ solving for $\lambda$ in general requires returning to the discrete model and solving a (potentially intractable) non-linear equation.

Let's check some examples. With $\pi(k) = k$ as in the BA model, our attachment operator becomes $A_\pi(p) = -k p' - p.$ The eigenfunction with eigenvalue $\lambda$ will be $p(k) = k^{-\lambda - 1},$ since indeed $A_\pi(k^{-\lambda - 1}) = -k (-\lambda - 1) k^{-\lambda - 2} - k^{-\lambda - 1} = \lambda k^{-\lambda - 1}.$ Because we know that $\mu = 2m,$ we can substitute into equation $(2)$ to determine the eigenvalue $\lambda.$ In fact, $k^{-\lambda - 1} = \frac{m}{2m} A_\pi(k^{- \lambda - 1}) = \frac{\lambda}{2} k^{-\lambda - 1} \implies \lambda = 2,$ so we will have $p(k) \propto k^{-3},$ which agrees with our derivation that $P(\langle K_V \rangle > k) \propto k^{-2}$ above.

Another particularly simple case is $\pi(k) = 1,$ corresponding to a lack of any preferential attachment. In this case we get $A_\pi = -\partial_k$ whose eigenfunctions are exponentials. Indeed, the exponential distribution is invariant under translation in the same way that Pareto distributions are invariant under scaling.
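In the discrete model this case is easy to verify directly: with $\pi(k) = 1$ we have $\mu = \sum_k p(k) = 1,$ and the stationarity condition becomes $p(k) = m(p(k-1) - p(k)),$ i.e. $p(k) = p(k-1) \cdot \frac{m}{m+1},$ a geometric (discrete exponential) distribution. A sketch:

```python
def uniform_attachment_tail(m=2, K=50):
    """Stationarity condition with pi(k) = 1, so mu = sum_k p(k) = 1:
    p(k) = m * (p(k-1) - p(k))  =>  p(k) = p(k-1) * m / (m + 1).
    Normalization is arbitrary; only the tail shape matters."""
    p = {m: 1.0}
    for k in range(m + 1, K + 1):
        p[k] = p[k - 1] * m / (m + 1)
    return p

p = uniform_attachment_tail()
# A constant ratio p(k) / p(k-1) means a geometric, i.e. exponential, tail.
```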

Now consider a more general case $\pi(k) = k^\alpha.$ With $\alpha > 1$ we find that the integral of $1/\pi(k)$ does not diverge, and so we predict that we will not obtain meaningful stationary distributions, for the reason mentioned above. (Specifically, the integral of $\pi(k) p(k)$ would diverge for any stationary distribution $p(k).$) On the other hand, for $\alpha < 1$ our analysis correctly predicts *stretched exponential* distributions of the form
$p(k) \propto k^{-\alpha} \exp\left( \frac{\lambda k^{1 - \alpha}}{\alpha - 1} \right).$
In practice, power law distributions are often confused with log-normal distributions. Up to a scaling parameter, the density function of a log-normal distribution is of the form
$p(k) \propto \frac{1}{k} \exp(-\ln(k)^2) = \exp(- \ln(k)^2 - \ln(k)).$
This gives a slightly thinner tail than a power law, which could be written as
$p(k) \propto \exp(-\gamma \ln(k)).$
What kind of attachment operator might produce a log-normal distribution?

Plugging into equation $(5)$ gives $\pi(k) \propto \frac{\Erfc(\ln(k))}{k^{-1} \exp(-\ln(k)^2)} = k \frac{\Erfc(\ln(k))}{\exp(-\ln(k)^2)},$ where $\Erfc$ is the complementary error function $\Erfc(k) \propto \int_k^\infty e^{-t^2} \, dt.$ However, note that $\frac{\Erfc(k)}{\exp(-k^2)} \sim \frac{1}{k},$ so we have the asymptotic approximation $\pi(k) \sim \frac{k}{\ln(k)}.$ If we take $\pi(k) = k/\ln(k)$ exactly, our attachment operator $A_\pi$ turns out to have eigenfunctions $p(k) \propto \frac{\ln(k)}{k} \exp\left( -\frac{\lambda}{2} \ln(k)^2 \right) = \exp\left( -\frac{\lambda}{2} \ln(k)^2 - \ln(k) + \ln(\ln(k)) \right),$ which is very similar to the density of a log-normal.
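We can confirm these claimed eigenfunctions numerically, again by finite differences (with $\lambda = 1$ chosen arbitrarily; note $\pi(k) = k/\ln(k)$ only makes sense for $k > 1$):

```python
from math import exp, log

lam = 1.0

def pi(k):
    return k / log(k)  # valid for k > 1

def p(k):
    """Candidate eigenfunction (ln k / k) * exp(-lam/2 * ln(k)^2)."""
    return (log(k) / k) * exp(-lam / 2 * log(k) ** 2)

def deriv(f, k, h=1e-6):
    return (f(k + h) - f(k - h)) / (2 * h)

def attachment_operator(k):
    """A_pi(p)(k) = -p * pi' - pi * p'."""
    return -p(k) * deriv(pi, k) - pi(k) * deriv(p, k)

# attachment_operator(k) should agree with lam * p(k) for k > 1.
```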