Frequency of Friendship Predictors

Lada A. Adamic and Eytan Adar
Information Dynamics Lab


The internet has proven to be a rich source of information that allows one to "predict" real world relationships between people [1]. For example, by looking at the text or links which appear on two personal homepages, we can make a prediction as to whether those two people know each other. Here we focus on the distribution of those predictors for a subset of Stanford users with homepages. These predictors include text, in-links, out-links, and mailing lists. Text consists of words and phrases selected by ThingFinder from the user's homepage. In-links are links pointing to that page. Out-links are links on a user's homepage which point to other pages. Mailing lists are publicly accessible email lists on the main Stanford mailing list server.

Figure 1 shows how frequently these predictors occur for users. Note the log-log scale of the plots. The fact that the plots are straight lines on a log-log scale indicates that the distributions are power-law. That is, the number of lists with a given number of users drops off as n^(-a), where n is the number of users; the same holds for the number of words or links mentioned by n users. This means that many words or links are mentioned by only a few users, while only a few, such as "www.stanford.edu", are mentioned by a large number. The exponent a (the magnitude of the slope on the log-log plot) is in the [2.5, 3] range for text, in-links, and out-links, while for mailing lists it is closer to 1.6.
Figure 1a. Frequency of mention for text, out-links, and in-links at Stanford. Note that, as expected, users on average had more words than links on their homepages, and even fewer links on average pointing to their homepages.

Figure 1b. Distribution of users among mailing lists for Stanford.
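
As a rough illustration of how a frequency-of-mention distribution and its exponent could be computed, the following Python sketch counts how many users mention each item and fits a straight line to the log-log points. It is not the procedure used to produce Figure 1: the function names and the toy data are hypothetical, and a plain least-squares fit on log-log axes gives only a rough estimate of the exponent.

```python
import math
from collections import Counter


def mention_distribution(user_items):
    """Return a Counter mapping k -> number of items mentioned by exactly k users.

    user_items maps a user name to the items (words, links, or lists)
    associated with that user. This input format is an assumption.
    """
    users_per_item = Counter()
    for items in user_items.values():
        for item in set(items):  # count each user at most once per item
            users_per_item[item] += 1
    return Counter(users_per_item.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(k).

    For a power law count ~ k^(-a), the slope is approximately -a.
    """
    points = [
        (math.log(k), math.log(c))
        for k, c in distribution.items()
        if k > 0 and c > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy data, made up for illustration only (not the Stanford data set).
toy_users = {
    "alice": ["www.stanford.edu", "biology"],
    "bob": ["www.stanford.edu", "plant ecology"],
    "carol": ["www.stanford.edu", "biology", "biology"],
}
toy_distribution = mention_distribution(toy_users)
estimated_a = -loglog_slope(toy_distribution)
```

With only three users the estimate is meaningless; the sketch is meant to show the two steps (counting users per item, then fitting the log-log slope), not to reproduce the exponents reported above.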

Power-laws have been associated with many naturally occurring and man-made phenomena. For example, George Kingsley Zipf studied the frequency of occurrence of words in English text [2] and found it to be power-law. More recently, a number of power-laws have been observed on the internet, from the popularity of sites [3] to the number of links [4].

The power-laws in our data are a nice illustration of the hierarchical organization of human society. For example, a large number of students go to Stanford, a large but smaller number study biology, and very few study plant ecology. But the smaller subsets, such as "plant ecology", are much more numerous than entire departments, which are limited in number. Thus we see in the information gathered about people on the internet a reflection of real-world social networks.

References

1. L.A. Adamic and E. Adar, "Friends and Neighbors on the Web", 2000.
2. G.K. Zipf, "Human Behavior and the Principle of Least Effort", Hafner, New York, 1949.
3. L.A. Adamic and B.A. Huberman, "The Nature of Markets in the World Wide Web", QJEC 1(1), pp. 5-12 (2000).
4. M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On Power-Law Relationships of the Internet Topology", SIGCOMM '99 pp. 251-262.