The Laws of the Web: Patterns in the Ecology of Information

Stuart Hannabuss

Library Review

ISSN: 0024-2535

Article publication date: 1 September 2005

Citation

Hannabuss, S. (2005), "The Laws of the Web: Patterns in the Ecology of Information", Library Review, Vol. 54 No. 7, pp. 440-442. https://doi.org/10.1108/00242530510611983

Publisher: Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited


The internet is so large and ubiquitous that it is easy to regard it as a form of information chaos, which makes it like other large, apparently chaotic systems, including social and economic ones. Bernardo Huberman is an HP Fellow at Hewlett‐Packard Laboratories in Palo Alto, California, and has published widely in physics and information science. His cross‐disciplinary interest in understanding and quantifying non‐linear systems, and in eliciting orderly principles and “laws” from apparent disorder, characterizes many of his publications, which range over topics such as information flow, market dynamics, traffic optimization, patterns in unstable structures, and expectation. With colleagues and fellow researchers such as Adamic, Glance and Maurer, Huberman has also examined the ecology of information, above all on the internet and world wide web (www), and this research is drawn on regularly in The Laws of the Web to underwrite his claims and observations. Huberman’s earlier work, The Ecology of Computation (Amsterdam, North‐Holland, 1988), is the point of reference here.

In The Laws of the Web, the ecology of information takes the form of drawing out what Huberman regards as the “laws”, or regular patterns, of information use on the web. Readers familiar with Zipf’s law (which relates how often an item, such as a word, occurs to its rank order) will get the idea. Huberman’s argument, in a nutshell, is that there are regularities and link structures in the information held on the www, in the ways in which people search for information and entertain expectations of finding it, and in the ways in which markets (particularly e‐commerce markets) develop and perform. He looks at surfing and web congestion, teasing out conclusions about the social behaviour of the information age itself. The territory will be familiar to followers of Huberman’s work; what is new in The Laws of the Web is his aim to produce a book grounded in mathematical and statistical principles but relatively free of complex formulae, so that it is accessible to the general reader. In this he succeeds admirably, producing a densely fertile and thoughtful book likely to challenge information specialists to revisit what they know of information provision and search patterns. Readers expecting more of a mathematical and economic underlay may be frustrated by the absence of detail, though that detail is implicit in the analysis and is signposted by a short but highly focused reading list.

In essence, websites and the traffic on them appear to follow predictable patterns. The Internet Archive Project, and the research conducted by Huberman and his colleagues on massive data‐sets on the scale of AOL traffic (some 30 per cent of the www, it is alleged), suggest that the number of pages on websites follows a power law: the probability that a site has n pages is proportional to 1/n^x, where x is some number greater than or equal to one (so if x were two, websites with two pages would be a quarter as numerous as websites with one page, and so on). This is a highly skewed, long‐tailed distribution, very different from a normal distribution. Huberman argues that such a hidden pattern underlies the apparent chaos of the www: most websites are small, and most users click most on small websites and follow predictably grouped sequences of web pages as they try to locate the information they need. These, then, are the “regularities” or “laws” of the web. If, Huberman suggests, we think that traditional statistical and economic laws or rules are irrelevant to the internet, we should think again: the revealed existence of such regularities proves that patterns of provision and use of scarce resources still very much apply.
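As a rough illustration of the arithmetic in the parenthesis above, the Python sketch below normalizes a power law over site sizes from 1 page up to a chosen maximum. The exponent x = 2 and the cut‐off `max_pages` are illustrative assumptions rather than figures from the book, and the function name is hypothetical.

```python
def page_count_distribution(x: float, max_pages: int) -> list[float]:
    """Return P(n) for n = 1..max_pages, with P(n) proportional to 1 / n**x.

    Hypothetical helper: the power law is truncated at max_pages so that
    the probabilities can be normalized to sum to one.
    """
    weights = [1.0 / n ** x for n in range(1, max_pages + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative parameters (assumptions): exponent x = 2, sites of up to 10,000 pages.
probs = page_count_distribution(2.0, 10_000)

# The ratio P(2)/P(1) is 2**-x, i.e. 0.25 for x = 2, whatever the cut-off.
print(f"P(2) / P(1) = {probs[1] / probs[0]:.2f}")
print(f"share of one-page sites = {probs[0]:.3f}")
print(f"share of two-page sites = {probs[1]:.3f}")
```

Under these assumptions roughly 61 per cent of sites have a single page and about 15 per cent have two, so two‐page sites are a quarter as common as one‐page sites rather than a quarter of all sites.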

The probabilistic character of surfing behaviour is the information analogue of Milgram’s social connectedness (any one of us can reach anyone else through chains of “small world” links, usually no more than six) and of Einstein’s analysis of molecular behaviour. Most people click on very few pages on the www, which sheds both statistical and economic light on the price and time expectations and tolerances of web users, and provides valuable information for content providers such as eBay and Amazon (which themselves dominate in statistically skewed ways). Huberman applies this reasoning to internet congestion too, likening the internet to the commons (common grazing land), where pay‐offs and expectations shape social behaviour such as defecting (here, giving up on a search). Assumptions of rationality are usually made, but other factors may also be in play, Huberman hints. The same regularities apply to downloading information, where the variability of travel times (how long information takes to reach us) bears on the optimal decision to persist with a search or restart it. Huberman points here to computer simulations carried out with Maurer, and following up such leads shows that the reader determined to take full advantage of this book needs access to a well‐equipped library and/or online journal content.
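The restart decision can be made concrete with a standard renewal argument, offered here as an illustration and not necessarily the model behind Maurer and Huberman’s simulations. Assume each download attempt has an independent latency X drawn from the same distribution, and that an attempt still unfinished after a timeout τ is abandoned and restarted; the expected total time is then E[min(X, τ)] / P(X ≤ τ). The Python sketch below estimates this from a sample of observed latencies; the function names and the sample data are hypothetical.

```python
from typing import Sequence


def expected_time_with_restart(latencies: Sequence[float], timeout: float) -> float:
    """Expected completion time when attempts slower than `timeout` are restarted.

    Uses the renewal identity E[T] = E[min(X, timeout)] / P(X <= timeout),
    with both terms estimated from the empirical latency sample. Assumes
    every restart draws a fresh, independent latency.
    """
    n = len(latencies)
    mean_capped = sum(min(x, timeout) for x in latencies) / n
    p_finish = sum(1 for x in latencies if x <= timeout) / n
    if p_finish == 0:
        return float("inf")  # no attempt ever finishes within the timeout
    return mean_capped / p_finish


def best_timeout(latencies: Sequence[float]) -> float:
    """Pick the observed latency value that minimizes expected completion time."""
    candidates = sorted(set(latencies))
    return min(candidates, key=lambda t: expected_time_with_restart(latencies, t))


# Hypothetical sample: nine fast downloads (1 s) and one very slow one (100 s).
sample = [1.0] * 9 + [100.0]
print(expected_time_with_restart(sample, max(sample)))  # never restart: plain mean
print(expected_time_with_restart(sample, 1.0))          # restart after 1 s
print(best_timeout(sample))
```

For this sample, never restarting gives the plain mean of 10.9 s, while restarting after 1 s cuts the expectation to about 1.11 s: when travel times are highly variable, a timely restart can pay.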

Readers keen to draw out inferences for e‐commerce decisions will find Huberman’s final chapter, on markets and the web, of particular interest. He argues that digital markets have less friction, in the sense that they offer higher levels of price elasticity, and that the most effective websites have higher than normal market shares, with a small number of websites commanding large segments of traffic. In other words, the commercial intuition that success breeds success gets statistical and bibliometric confirmation. Data on visits and website dynamics seem to bear this out: the probability that a site has n visitors is proportional to 1/n^β, where the exponent β is close to 2.0. Even so, while some of the most effective websites attract most traffic, site popularity and age are correlated only in non‐linear ways, and simplistic inferences should not be drawn, above all by entrepreneurs contemplating market entry. The “laws”, then, aim to be both mathematical/statistical and economic/social.
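Readers who want to test a claimed exponent such as β ≈ 2 against visit counts of their own could use the standard maximum‐likelihood estimator for a continuous power law, β̂ = 1 + m / Σ ln(x_i / x_min), where the sum runs over the m observations at or above x_min. This is a general technique, not one taken from the book. The Python sketch below checks it on synthetic data drawn by inverse‐transform sampling; the function names, the seed and x_min = 1 are assumptions, and real traffic data would need a careful choice of x_min and a goodness‐of‐fit test.

```python
import math
import random
from typing import Sequence


def estimate_exponent(values: Sequence[float], x_min: float) -> float:
    """Maximum-likelihood estimate of beta for P(x) proportional to x**-beta, x >= x_min.

    Continuous approximation: beta_hat = 1 + m / sum(ln(x_i / x_min)),
    where m is the number of observations at or above x_min.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0:
        raise ValueError("need observations strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(beta: float, x_min: float, size: int, seed: int = 0) -> list[float]:
    """Draw synthetic 'visit counts' from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # If U is uniform on [0, 1), x_min * (1 - U) ** (-1 / (beta - 1)) follows the power law.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1.0)) for _ in range(size)]


# Synthetic check (assumed parameters): draw from beta = 2 and recover it.
visits = sample_power_law(beta=2.0, x_min=1.0, size=100_000, seed=42)
print(f"estimated beta = {estimate_exponent(visits, x_min=1.0):.3f}")
```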

All this has a legal dimension too, shown most clearly by MP3 file‐sharing services such as Napster, Gnutella and Freenet. They display all the characteristics of large systems: they form a digital commons, where free‐riding presents economic risks and system vulnerability presents privacy and copyright risks. Universal consumption at the expense of production is a possibility, as it is for any common good with no governing authority. Economists would identify this as the public good problem and ask whether systems like Gnutella can ever be a public good; the voluntarism and free‐spirited ethos of such systems might blind us to the implications. Huberman’s own analysis revealed 70 per cent free‐riding on such systems, with a relatively small number of peers providing the bulk of the shared files: this could make those peers easy for the big labels to identify and sue (though the Recording Industry Association of America has not, it has to be said, confined itself to such targets!). Huberman confronts the problem of free‐riding by suggesting scaled file‐sharing, more market features (letting users buy and sell, in a system based on utility and value), and lower costs. This is really kite‐flying and only incidental to his main arguments about the ecology of information, although these proposals draw on the principles developed for e‐commerce.
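The two measurements at issue here, the share of peers who share nothing and the share of files supplied by the most generous peers, are simple to compute from a list of per‐peer file counts. In the Python sketch below the function name, the `top_fraction` parameter and the ten‐peer sample are hypothetical illustrations, not Huberman’s data.

```python
from typing import Sequence


def free_riding_stats(shared_counts: Sequence[int], top_fraction: float = 0.01) -> tuple[float, float]:
    """Return (share of peers sharing no files, share of all files held by the top peers).

    `top_fraction` is the fraction of peers, ranked by files shared, treated as the "top".
    """
    n = len(shared_counts)
    free_riders = sum(1 for c in shared_counts if c == 0) / n
    ranked = sorted(shared_counts, reverse=True)
    k = max(1, round(n * top_fraction))  # always count at least one top peer
    total = sum(ranked)
    top_share = sum(ranked[:k]) / total if total else 0.0
    return free_riders, top_share


# Hypothetical network of ten peers: seven share nothing, one shares almost everything.
peers = [0, 0, 0, 0, 0, 0, 0, 1, 2, 97]
riders, top = free_riding_stats(peers, top_fraction=0.1)
print(f"free riders: {riders:.0%}, files from top 10% of peers: {top:.0%}")
```

Computed over a real crawl of many thousands of peers, the same two numbers show how concentrated the supply of files is, and hence how exposed the few heavy sharers are.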

This is a compact, thought‐provoking work, distilling a lot of topical research ideas and challenging conventional wisdom (always a feature of work from The MIT Press). The “laws” Huberman elicits and argues for are really regularities in information systems and information behaviour, concerning distributions, congestion and markets: the claims for their existence and relevance reinforce our understanding of how patterns in information use, on the www and generally, illuminate social dynamics and social systems. The ecology here, then, is universalistic. This is a book to be read and re‐read, thought about and argued over; but readers should also examine the evidence for themselves, starting with Huberman’s own. Recommended for any specialist, research or academic collection in which the sociology of information, informatics and computing science are taken seriously.
