In their 1999 paper "Emergence of Scaling in Random Networks," Barabási and Albert observed that some large real-world networks had degree distributions well-approximated by power laws. They called such networks "scale-free," and the name stuck.
The same paper also proposed a random model, now called the Barabási-Albert model, for a network undergoing certain idealized forms of growth and preferential attachment. It can be shown that a network produced by the BA model is scale-free with high probability in the large time limit, which raises the hypothesis that scale-free networks in the real world are also produced by mechanisms of growth and preferential attachment.
But why do we call scale-free networks scale-free? In physics, a law is scale-free when it is unaffected by an operation of scaling. In mathematics, the term "scale-invariant" seems to be preferred, but has the same meaning. So, when I first encountered "scale-free networks", I was confused. What scaling action leaves a scale-free network invariant?
In the usage of Barabási and Albert, "scale freeness" simply refers to the scale-free property of a network's degree distribution. Indeed, Pareto distributions of the form $p(x) \propto x^{-(1+\alpha)}$ for $x \geq x_{\min}$ (with $\alpha > 0$) are exactly the scale-invariant distributions of positive numbers, and the power law tails we find in the degree statistics of scale-free networks are their discrete analogs. The widget below gives a geometrical sense of what it means for a distribution, represented by its density function, to be "invariant" under scaling. (Note that invariance under scaling means that the tail of the distribution scales uniformly under scaling—in other words, the distribution is unaffected by scaling after we condition on large outcomes. To help visualize this, I've amplified a tail of each distribution, displayed in grey.)
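To make the conditioning statement concrete: if $X$ has the Pareto density $p(x) \propto x^{-(1+\alpha)}$ on $x \geq x_{\min}$, then for any cutoff $c \geq x_{\min}$ and any scaling factor $\lambda > 0$,
$$P(X > x \mid X > c) \;=\; \left(\frac{x}{c}\right)^{-\alpha} \;=\; P(X > \lambda x \mid X > \lambda c),$$
so the tail conditioned on exceeding a cutoff has the same shape before and after the scaling $x \mapsto \lambda x$.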
In reality and mathematics alike, symmetries are typically not surface-deep. If the rotational symmetry of tree trunks makes you suspect that the phloem of trees is also pretty round, you would be right. So, it is reasonable to wonder whether similar invariances can be found in other properties of scale-free networks, or perhaps in the mechanisms that produce them.
In this post, we'll take a look at two different mathematical arguments for the power law degree distributions of networks generated by the BA model. We begin with the argument used in Barabási and Albert's original paper, which focuses on the expected degree of a given vertex and is very simple to derive. Next, we consider a more direct characterization of stationary degree distributions in the large time limit.
Both of these approaches are well-known in the network science literature, but in our presentation we highlight the role of invariance. In each case, we will find that scale-invariance of degree distributions is a mathematical consequence of a scale-invariance in the mechanism of preferential attachment. In our consideration of stationary distributions, the pursuit of scale-invariance also leads us to discover a simple approximate relationship between preferential attachment laws and limiting degree distributions in general. Overall, we hope to shine some light on the scale-freeness of scale-free networks, and to convince the reader that tracking the role of symmetries occasionally provides unexpected insights!
In the BA model, a graph is grown iteratively by adding vertices one at a time. When a vertex is added, it connects to existing vertices of the graph by growing $m$ random edges, the destinations of which are chosen independently with replacement from a certain distribution over vertices. (Multi-edges and self-edges are allowed.)
"Preferential attachment" is the property that new vertices will prefer to attach to existing vertices that already have high degrees. In the usual BA model, the probability that a particular vertex with degree is selected will be where the normalizing constant is the sum of degrees over our graph at a particular moment. In other words, the rate at which a vertex acquires new edges is proportional to the share of edges that it already owns in the graph.
There is already a kind of scale-invariance at work here. First of all, the attachment probability $d / D$ is homogeneous, meaning that the way new vertices connect into the graph is invariant under a uniform scaling of existing degrees. This is a pretty reasonable-sounding property; for example, if we replaced every hyperlink on the internet with two identical hyperlinks, it seems that no web page would become more or less "relevant" or likely to be linked to in the future. If we assume homogeneity, the attachment rule defined above is fully determined by the additional property of linearity in $d$, which is another kind of scale-invariance: scaling the degree of an individual node causes a proportional scaling of its likelihood to receive new connections. We will return to these observations momentarily.
Let $d(t)$ be the degree of a particular node over time and $D(t)$ be the sum of degrees over the graph. In expected value, the number of edges our node gains at time $t$ is
$$\mathbb{E}[d(t+1) - d(t)] = m\,\frac{\mathbb{E}[d(t)]}{D(t)}.$$
Note that the expected value commuted nicely with the attachment rule due to the linear dependence of the attachment probability on $d(t)$ and the determinism of $D(t)$. (Since $m$ edges are added at every step, $D(t) = 2mt$.)
Now, a simple way to understand how $d(t)$ will evolve for large $t$ is to approximate its evolution by the ODE
$$\frac{\mathrm{d}}{\mathrm{d}t}\,d(t) = m\,\frac{d(t)}{D(t)}.$$
In fact, it can be shown that the error of this approximation is small relative to the growth of $d(t)$. Furthermore, substituting $D(t) = 2mt$, this equation is easy to solve by separation of variables:
$$\frac{\mathrm{d}}{\mathrm{d}t}\,d(t) = \frac{d(t)}{2t} \;\implies\; d(t) = d(t_0)\,\sqrt{t / t_0}.$$
We conclude that a vertex started at time $t_0$ with $m$ initial edges should have degree around $m\sqrt{t / t_0}$ at time $t$ in expectation. At a large time $t$, this lets us conclude that the (ensemble) average degree of a randomly chosen vertex will be approximately power-law distributed up to some upper bound on degree. Indeed, a vertex started at some time $t_0$ has degree larger than $k$ in expectation when $t_0 \leq m^2 t / k^2$. So, where $K$ is the degree of a uniformly chosen vertex, we get
$$P(K > k) \approx \frac{1}{t}\cdot\frac{m^2 t}{k^2} = \frac{m^2}{k^2}$$
for large $k$.
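As a quick sanity check on the separation-of-variables solution, here is a small numeric sketch: forward Euler on the approximating ODE, with an arbitrarily chosen starting time, starting degree, and step size.

```python
import math

# Integrate the approximating ODE d'(t) = d(t) / (2t) with forward Euler and
# compare against the closed-form solution d(t) = d(t0) * sqrt(t / t0).
t0, d0, dt = 10.0, 2.0, 0.01      # a vertex born at t0 = 10 with m = 2 edges
t, d = t0, d0
while t < 1000.0:
    d += dt * d / (2 * t)
    t += dt
print(d, d0 * math.sqrt(t / t0))  # the two values should agree closely
```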
Are we surprised to arrive at a power law? In fact, as we will argue now, the existence of a power law relationship is implied by the invariances of the preferential attachment mechanism we described above.
Let a group $G \cong \mathbb{R}_{>0} \times \mathbb{R}_{>0}$ act on the $(t, d)$-plane by scaling each axis independently. We claim that the family of relationships between $t$ and $d$ produced by solutions to the system
$$\frac{\mathrm{d}}{\mathrm{d}t}\,d(t) = m\,\frac{d(t)}{D(t)}, \qquad D(t) = 2mt$$
is invariant under $G$, in the sense that any possible relationship, when scaled by an element of $G$, gives another possible relationship.
For example, we said that the likelihood of a given vertex to attract new edges is unaffected when all degrees in our network are scaled. This corresponds to the fact that uniform scaling of both $d$ and $t$ (recall that $D(t) = 2mt$ scales along with the degrees) is an invariance of the system. What about the fact that preferential attachment scales proportionally with the degree of a particular vertex? This means that scaling $d$ while leaving $t$ fixed is an invariance of the system as well. These two subgroups generate all of $G$, so we are done.
What kind of family could this be? As we know, power law relations are called scale-free because they are invariant under scaling; specifically, their graphs are orbits of one-parameter subgroups of $G$. From here, it can be seen that if a family of curves is $G$-invariant and sufficiently regular—say, it satisfies the definition of a smooth foliation—then it must be a family of power laws.
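For instance, the solutions we found above form the family $d = C\sqrt{t}$ with $C > 0$. Acting by $(t, d) \mapsto (\lambda t, \mu d)$ carries the curve $d = C\sqrt{t}$ to
$$d = \frac{\mu\,C}{\sqrt{\lambda}}\,\sqrt{t},$$
which is again a member of the family, just with a different constant: the individual curve moves, but the family of square-root laws is preserved.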
For some geometric intuition, drag around on the following graph to view a power law transformed under different elements of $G$, and drag the slider to change the exponent of the power law. The colored lines represent the family of power law relationships with the chosen exponent. (We could only draw finitely many of them, of course!) Although individual curves may be affected by scaling, the overall family remains the same.
This insight does not reveal the exponent of the power law—for that, we must use the particular form of the ODE above—but it is interesting to notice that the "scale-freeness" of the degree distribution is implied by nothing but the scale-freeness of preferential attachment.
Rather than tracking the evolution of the degree of a given vertex, let us now suppose that the distribution of degrees has reached an approximate steady state and see what we can figure out about this distribution.
We'll also consider a somewhat more general scenario than before. Let's say that preferential attachment gives a weight $f(k)$ to vertices of degree $k$, so that the probability that the vertex chosen by preferential attachment has degree $k$ is
$$\pi(k) = \frac{f(k)\,p(k)}{C},$$
where $p(k)$ is the fraction of vertices with degree $k$ and $C = \sum_k f(k)\,p(k)$ is a certain normalizing constant. Note that when $f(k) = k$, $C$ is the expected degree of a uniformly chosen vertex, which is $2m$ because there are always $m$ times more edges than vertices.
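Here is a small sketch of one growth step under a general attachment weight. (The helper name `attach_step` and the sub-linear example weight are illustrative choices of mine, not anything fixed by the model.)

```python
import random

def attach_step(degrees, m, f, rng):
    """Add one vertex with m edges; each endpoint is an existing vertex chosen
    with probability proportional to f(degree), sampled with replacement."""
    weights = [f(d) for d in degrees]
    chosen = rng.choices(range(len(degrees)), weights=weights, k=m)
    degrees.append(m)
    for v in chosen:
        degrees[v] += 1

rng = random.Random(0)
degrees = [4]                      # one seed vertex with two self-loops (m = 2)
for _ in range(5000):
    attach_step(degrees, m=2, f=lambda d: d**0.5, rng=rng)   # example: sub-linear weight
print(max(degrees), sum(degrees) / len(degrees))
```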
How does the number $N_k$ of vertices with degree $k$ change when one edge is added to the network? $N_k$ will increase if the vertex selected to receive the new connection has degree $k - 1$, and decrease if the selected vertex has degree $k$. In expectation, a new vertex that arrives with $m$ independently chosen edges will cause an increase of $m\,\big(\pi(k-1) - \pi(k)\big)$ to the value of $N_k$ for $k > m$, and an increase of exactly $1$ to $N_m$.
If $p$ is a stationary distribution under this process—which strictly speaking will only happen in the large time limit—then $p(k)$ will be proportional to the expected increase in $N_k$ per step. Indeed, since only one vertex is added per step, so that $N_k(t) \approx p(k)\,t$, the constant of proportionality is $1$. Overall, we conclude the stationarity condition
$$p(k) = m\,\big(\pi(k-1) - \pi(k)\big) \quad \text{for all } k > m.$$
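To see the stationarity condition in action, here is a small numeric sketch for the BA case $f(k) = k$, $C = 2m$: substituting $\pi(k) = f(k)\,p(k)/C$ and rearranging the condition as a recurrence in $k$, the quantity $k^3\,p(k)$ should level off, reflecting a $k^{-3}$ tail. (The starting value at $k = m$ is arbitrary, since normalization does not affect the tail exponent.)

```python
# Stationarity condition with pi(k) = f(k) p(k) / C, rearranged as a recurrence:
#   p(k) = p(k-1) * (m * f(k-1) / C) / (1 + m * f(k) / C)   for k > m.
m = 2
C = 2 * m                  # normalizing constant for the BA weight f(k) = k
f = lambda k: k

p = 1.0                    # unnormalized value at k = m
k = m
for _ in range(100):
    k += 1
    p = p * (m * f(k - 1) / C) / (1 + m * f(k) / C)
    if k % 20 == 0:
        print(k, p, k**3 * p)   # the last column should approach a constant
```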
If we hope to talk about scale freeness of the distribution $p$, we need to upgrade it to a distribution over continuous-valued quantities. So, let us take a leap of faith and make $k$ into a real-valued variable $x$, replacing the equation above with the corresponding differential equation
$$p(x) = -m\,\frac{\mathrm{d}}{\mathrm{d}x}\,\pi(x) = -\frac{m}{C}\,\frac{\mathrm{d}}{\mathrm{d}x}\big(f(x)\,p(x)\big).$$
Our hope, which we will not attempt to justify in this post, is that the limiting distributions for "continuous-valued degree" predicted by this equation will be representative of our limiting degree distributions for large degrees.
One complication here is that the normalizing constant $C$ also depends on $p$, and in fact was defined based on the "discrete version" of $p$. So, for the purposes of our continuous-valued degree analysis, we'll focus on solving the eigenfunction problem for the operator
$$(A_f\,p)(x) = -\frac{\mathrm{d}}{\mathrm{d}x}\big(f(x)\,p(x)\big).$$
(Specifically, we're asking for functions $p$ and numbers $\lambda$ so that $A_f\,p(x) = \lambda\,p(x)$ for sufficiently large inputs $x$.) Let's call $A_f$ the "attachment operator" for $f$. It can be viewed as giving the Lie derivative of probability densities with respect to the vector field $f\,\partial_x$ that "boosts" degree proportionally to $f$. Specifically, when $\phi_t$ gives the flow of the vector field and $(\phi_t)_*$ denotes the pushforward of a probability measure,
$$A_f\,p = \frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\,(\phi_t)_*\,p.$$
Arriving at the attachment operator required a leap of the imagination, but now that we are here, the role of invariance is back in sight. Indeed, when $f\,\partial_x$ is a complete vector field, summable eigenfunctions of $A_f$ are exactly the distributions whose tails are invariant under the flow of $f\,\partial_x$. In other words, we expect a distribution of degrees that is stationary under an attachment function $f$ to have a "continuous analog" that is invariant, in this tail sense, under the flow of $f\,\partial_x$.
We can solve the eigenfunction equation in general by rearranging it as a separable equation:
$$\frac{\big(f\,p\big)'(x)}{\big(f\,p\big)(x)} = -\frac{\lambda}{f(x)}.$$
So, a stationary distribution for the attachment function $f$ will be of the form
$$p(x) \propto \frac{1}{f(x)}\,\exp\!\left(-\lambda \int^x \frac{\mathrm{d}s}{f(s)}\right)$$
for some unknown positive $\lambda$.
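A quick symbolic check of this formula (a sketch using sympy; the three example attachment functions are just illustrations):

```python
import sympy as sp

# Check that p(x) = (1/f) * exp(-lam * Integral(dx / f)) satisfies
# -(f * p)' = lam * p for a few choices of attachment function f.
x, lam = sp.symbols("x lam", positive=True)
for f in [x, sp.Integer(1), sp.sqrt(x)]:
    p = (1 / f) * sp.exp(-lam * sp.integrate(1 / f, x))
    residual = sp.simplify(-sp.diff(f * p, x) - lam * p)
    print(f, residual)      # each residual should simplify to 0
```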
We can also go backwards, solving for an attachment function $f$ that has $p$ as an eigenfunction. For this we need to solve the differential equation
$$-\big(f(x)\,p(x)\big)' = \lambda\,p(x)$$
for $f$. Applying usual techniques—my favorite being to informally imagine the right-hand side as a superposition of delta functions—we arrive at the expression
$$f(x) = \frac{c - \lambda\,P(x)}{p(x)},$$
where $P$ is the cumulative distribution function of $p$ and $c$ is some constant. For $f$ to be positive we need $c \geq \lambda$, while for $f\,\partial_x$ to be a complete vector field we need $\int^\infty \mathrm{d}s / f(s)$ to diverge, forcing $c = \lambda$. So, subject to these two constraints, we have a unique solution
$$f(x) = \lambda\,\frac{1 - P(x)}{p(x)}$$
up to a scalar. Another way to derive this is to view our distribution as the image of the uniform distribution on $[0, 1]$ under the map $P^{-1}$. Then there is a correspondence between invariances of $p$ and invariances of the uniform distribution on the interval, but the invariances of the latter are exactly the affine maps sending $[0, 1]$ to subintervals $[a, 1]$. Pushing forward this family of invariances under $P^{-1}$ and differentiating with respect to $a$ at $a = 0$ gives us exactly the same expression for $f$.
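Going the other way can also be checked symbolically (again a sympy sketch; the Pareto and exponential examples below are mine): a Pareto tail should give back a linear attachment function, and an exponential tail a constant one.

```python
import sympy as sp

# Check f(x) = lam * (1 - P(x)) / p(x) on two example densities:
# a Pareto density should recover a linear f, an exponential a constant f.
x, s, lam = sp.symbols("x s lam", positive=True)
examples = {
    "pareto (2/x^3 on [1, oo))": lambda t: 2 / t**3,
    "exponential (e^-x on [0, oo))": lambda t: sp.exp(-t),
}
for name, p in examples.items():
    tail = sp.integrate(p(s), (s, x, sp.oo))   # this is 1 - P(x)
    f = sp.simplify(lam * tail / p(x))
    print(name, "->", f)    # expect lam*x/2 and lam, respectively
```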
Some remarks about the meanings of these equations are in order. First of all, note that putting $\lambda = 0$ into the formula above gives us the bizarre conclusion that the density $p$ is exactly conserved by the flow of $f\,\partial_x$. If $p$ is a probability distribution and $f\,\partial_x$ is a complete non-zero vector field with $f \geq 0$, then conservation of probability dictates that $(\phi_t)_*\,p \neq p$ whenever the support of $p$ meets the region where $f > 0$, so in particular if $p$ is an eigenfunction of $A_f$ then it must have strictly positive eigenvalue. (For some intuition, see the widget at the start of this post.) But in fact, $f\,\partial_x$ is complete exactly when $\int^\infty \mathrm{d}s / f(s)$ diverges, while the corresponding solution $p \propto 1/f$ can be viewed as a density function only when $\int^\infty \mathrm{d}s / f(s)$ converges, so if $\lambda = 0$ one of these assumptions must fail.
Also, consider what it means to say that $f\,\partial_x$ is a complete vector field. In fact, the reasonable condition to stipulate on $f$ in view of our preferential attachment process is that, given the stationary density $p$, we can define a density function for the "selected" degree, or in other words that $f\,p$ is summable. But in fact, we have shown that any stationary distribution for $f$ will admit a $\lambda$ such that
$$f(x)\,p(x) \propto \exp\!\left(-\lambda \int^x \frac{\mathrm{d}s}{f(s)}\right),$$
and the right-hand side cannot converge to zero unless $\int^\infty \mathrm{d}s / f(s)$ diverges, so completeness of $f\,\partial_x$ is a necessary condition for the selected-degree density to exist.
Overall, we get a nice bijective correspondence between families of probability distributions and complete vector fields, with each considered modulo their tails. Indeed, it is straightforward to check that the relations above send summable densities to functions $f$ with $\int^\infty \mathrm{d}s / f(s)$ divergent, and vice versa. This appears to be an interesting framework for investigating how attachment functions are related to the tails of the resulting degree distributions. However, recall that the eigenvalue $\lambda$ is not determined by our analysis, because it depends on the normalizing constant $C$ derived from the discrete model. Since the normalizing constant in turn depends on the distribution $p$, solving for $\lambda$ in general requires returning to the discrete model and solving a (potentially intractable) non-linear equation.
Let's check some examples. With $f(x) = x$ as in the BA model, our attachment operator becomes $(A_f\,p)(x) = -\frac{\mathrm{d}}{\mathrm{d}x}\big(x\,p(x)\big)$. The eigenfunction with eigenvalue $\lambda$ will be $p(x) \propto x^{-(1+\lambda)}$, since indeed
$$-\frac{\mathrm{d}}{\mathrm{d}x}\big(x \cdot x^{-(1+\lambda)}\big) = -\frac{\mathrm{d}}{\mathrm{d}x}\,x^{-\lambda} = \lambda\,x^{-(1+\lambda)}.$$
Because we know that $C = 2m$, we can substitute into the stationarity equation $p = -\tfrac{m}{C}\,(f\,p)'$ to determine the eigenvalue. In fact, this equation says $A_f\,p = \tfrac{C}{m}\,p$, so $\lambda = C/m = 2$ and we will have $p(x) \propto x^{-3}$, which agrees with our derivation that $P(K > k) \approx m^2 / k^2$ above.
Another particularly simple case is $f(x) = 1$, corresponding to a lack of any preferential attachment. In this case we get $(A_f\,p)(x) = -p'(x)$, whose eigenfunctions are the exponentials $p(x) \propto e^{-\lambda x}$. Indeed, the exponential distribution is invariant under translation in the same way that Pareto distributions are invariant under scaling.
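As a small simulation sketch of this case (the helper name `grow_uniform` and the parameter choices are mine), growth with uniformly random attachment should produce a degree tail whose logarithm falls roughly linearly with the degree cutoff:

```python
import math
import random

def grow_uniform(n_vertices, m, seed=0):
    """Growth with no preferential attachment: each new vertex attaches m
    edges to existing vertices chosen uniformly at random."""
    rng = random.Random(seed)
    degrees = [2 * m]                    # one seed vertex with m self-loops
    for _ in range(1, n_vertices):
        chosen = [rng.randrange(len(degrees)) for _ in range(m)]
        degrees.append(m)
        for v in chosen:
            degrees[v] += 1
    return degrees

degrees = grow_uniform(100000, m=2)
n = len(degrees)
for k in [4, 8, 12, 16, 20]:
    tail = sum(1 for d in degrees if d > k) / n
    print(k, tail, math.log(tail))       # log-tail should decrease roughly linearly in k
```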
Now consider a more general case, $f(x) = x^{\alpha}$. With $\alpha > 1$ we find that the integral $\int^\infty \mathrm{d}s / f(s)$ does not diverge, and so we predict that we will not obtain meaningful stationary distributions, for the reason mentioned above. (Specifically, the normalizing integral $\int f\,p$ for the selected-degree density would diverge for any stationary distribution $p$.) On the other hand, for $\alpha < 1$ our analysis correctly predicts stretched exponential distributions of the form
$$p(x) \propto x^{-\alpha}\,\exp\!\left(-\frac{\lambda}{1 - \alpha}\,x^{1-\alpha}\right).$$
In practice, power law distributions are often confused with log-normal distributions. Up to scaling parameters, the density function of a log-normal distribution is of the form
$$p(x) \propto \frac{1}{x}\,\exp\!\left(-\frac{(\ln x)^2}{2}\right).$$
This gives a slightly thinner tail than a power law, which could be written as
$$p(x) \propto \frac{1}{x}\,\exp\!\left(-\beta \ln x\right).$$
What kind of attachment operator might produce a log-normal distribution?
Plugging the log-normal density into the equation for $f$ gives
$$f(x) \propto x\,e^{(\ln x)^2 / 2}\,\operatorname{erfc}\!\left(\frac{\ln x}{\sqrt{2}}\right),$$
where $\operatorname{erfc}$ is the complementary error function. However, note that $\operatorname{erfc}(z) \sim e^{-z^2} / (z\sqrt{\pi})$ for large $z$, so we have the asymptotic approximation $f(x) \propto x / \ln x$ up to a constant factor. If we take $f(x) = x / \ln x$ exactly, our attachment operator turns out to have eigenfunctions
$$p(x) \propto \frac{\ln x}{x}\,\exp\!\left(-\frac{\lambda\,(\ln x)^2}{2}\right),$$
which is very similar to the density of a log-normal.
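And as a final sanity check (a sympy sketch), one can verify symbolically that this density really is an eigenfunction of the attachment operator for $f(x) = x / \ln x$:

```python
import sympy as sp

# Verify that p(x) = (ln x / x) * exp(-lam * (ln x)^2 / 2) satisfies
# -(f * p)' = lam * p when f(x) = x / ln(x).
x, lam = sp.symbols("x lam", positive=True)
f = x / sp.log(x)
p = (sp.log(x) / x) * sp.exp(-lam * sp.log(x)**2 / 2)
print(sp.simplify(-sp.diff(f * p, x) - lam * p))   # expect 0
```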