What can Darwin’s finches tell us about the downturn?
Newspaper articles paint the markets in metaphors like “difficult climate” and “harsh landscape”, but these clichéd phrases have a kernel of truth. Thinking about markets as natural environments reveals that selective forces are at work. It also predicts when they work. In the natural world, as the story of Darwin’s finches tells us, selection acts in times of crisis: drought, famine, and disease. For our markets, that time is now.
(Aside: I confess that relating the economic crisis to Darwin is a symptom of an academic bad habit: namely, mapping every phenomenon onto the intellectual giant of one’s field. Somewhere there is a psychologist blogging about Freud and the economy.)
When does natural selection act? This question motivated two modern naturalists, Peter and Rosemary Grant, who studied Darwin’s finches over several decades on the Galápagos Islands, and whose work is chronicled in Jonathan Weiner’s The Beak of the Finch.
During the wet seasons, it was hard to see how a finch’s beak made any difference to its fitness.
“[F]inches with long thin beaks and short fat parrot-like beaks [were] all hopping on the same lava, eating identical bird food… All those beaks were cracking the same birdseed.” [p.52]
A long line of ornithologists had concluded that the beak of the finch was unimportant.
Despite this, Peter and Rosemary Grant kept returning to the islands and kept measuring beaks. In 1977, the rainy season brought no rain. Weiner describes what the naturalists witnessed:
“They found fewer than two hundred finches alive on the island. Just one finch in seven had made it through the drought… The average beak before the drought was 10.68 millimeters long and 9.42 deep. The average beak of the fortis that survived was 11.07 millimeters long and 9.96 deep… The birds were not simply magnified by the drought: they were reformed and revised. They were changed by their dead. Their beaks were carved by their losses.” [p.78]
The drought was the crucible that shaped the species. And it wasn’t simply size, but dimension (longer and deeper beaks, versus wider) that separated the survivors from the dead.
In the same way, the benefits of new technologies are often masked during good times. Firms with both new and old technologies remain solidly profitable, happily hopping along. Like ornithologists watching finches in the wet season, some analysts have questioned whether technological innovation even matters. Robert Solow summed up this paradox by quipping “You can see the computer age everywhere but in the productivity statistics.”
But when hard times hit, innovators survive. More importantly, they flourish when the business cycle swings up again. Work by Erik Brynjolfsson and others has shown strong positive evidence for technology’s impact on productivity, most markedly over five-to-seven-year periods, the resonant frequency of the business cycle. But as with Darwin’s finches, the survivors are not just those who have more technology investments, but those who get the dimensions right.
Downturns are not only good for innovation, they are necessary. While innovation may occur in times of plenty, crises allow the right innovations (hybrid cars) to outcompete the wrong ones (SUVs). This assumes that crises are allowed to run their course (the case against bailouts), but that there are at least some survivors (the case for them).
As a data guy, I’m cautiously optimistic that firms that have invested in analytics, and that have quietly innovated in understanding their business data, will emerge as winners on the other side of this downturn. As a contemporary of Darwin’s said, “That which does not kill us makes us stronger.”
How do you measure a major league slugger?
I gave a talk last month at SAP Labs in Palo Alto, along with Jim Porzak of ResponSys, introducing the R Statistical Language to a Business Intelligence interest group. The goal was to highlight how open source tools, like R, can be used to build predictive models. The example I gave centered on baseball and a simple question: how do you measure a baseball slugger?
Michael Lewis, in Moneyball, described how the baseball analyst Bill James was frustrated by the fact that major league hitters were consistently rated by their batting averages. James wrote:
“a hitter should be measured by his success in that which he is trying to do, … create runs. It is startling, when you think about it, how much confusion there is about this.”
- Bill James, 1979 Baseball Abstract
However, since teams create runs, not batters, the only way to connect batting statistics with runs is to use team averages. The idea is that if we know which statistics predict runs at the team level, these statistics could be used to measure individual hitters.
I decided to test the value of three batting statistics myself — batting average, slugging percentage, and OPS (on-base plus slugging) — and see how well they predicted team runs, using MLB team data for the years 2000-2005 (available from baseball-databank.org). The results are shown in the three scatter plots below, and no surprise, Bill James is right: a team’s overall batting average (top-most chart) is a comparatively poor predictor of how many runs it will score in an average game. Slugging percentage (middle plot) is a slightly better predictor, and OPS (bottom plot) is the best of the three statistics I looked at: it has a 0.95 correlation with runs scored (the r shown in the upper right corner of the plots is the Pearson correlation coefficient, the red lines represent least-squares fits to the points).
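For readers who want to see the two quantities behind those plots spelled out, here is a minimal sketch in Python (not the R code from the talk) of the Pearson correlation coefficient and a least-squares line. The team OPS and runs-per-game numbers are invented for illustration and are not the baseball-databank.org data, so the printed r will not match the 0.95 reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_fit(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical team-season values, made up for this example only.
team_ops = [0.720, 0.745, 0.760, 0.780, 0.805]
runs_per_game = [4.3, 4.6, 4.7, 5.0, 5.4]

r = pearson_r(team_ops, runs_per_game)
slope, intercept = least_squares_fit(team_ops, runs_per_game)
print(f"r = {r:.3f}, runs/game = {slope:.2f} * OPS + {intercept:.2f}")
```

Running the same two functions on team batting average and slugging percentage would give the comparison described above: the statistic with the highest r is the best linear predictor of runs among the three.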
I highlighted a couple of interesting outliers in the top batting average plot: teams that achieved a high level of scoring with a comparatively low team batting average. Who were these teams? None other than Billy Beane’s 2000 and 2001 Oakland Athletics. This suggests that the A’s management may have found excess value in fielding players who, despite having slightly lower batting averages, were capable of generating runs.
These results show what Bill James and others already know: that a baseball slugger should not be measured by his batting average, but by OPS or other hybrid statistics that best correlate with his success at generating runs. There is nothing novel about the results of my analysis. But what I hope is novel is showing how it can be done using open source tools (R and MySQL), free data (baseball-databank.org), and a few lines of code (see the sabermetrics using R page).
Cloud computing and the rise of a fungible, elastic computing infrastructure
Over the last few days I’ve attended a couple of events here in San Francisco discussing the promise of cloud computing. I believe there are several reasons why this technology represents a paradigm shift (and one that does justice to Kuhn’s original meaning).
First, what is cloud computing? From the ten-thousand-foot view, it is technology that uncouples web servers from their underlying hardware. It re-conceives servers, turning them from physical machines with plugs into “instances” running on top of the hardware: bundles of bits that can be moved and multiplied as easily as the software on our desktops. The “cloud”, like the “web”, is an abstraction whose physical reality (data centers with thousands of softly humming servers) we need not care about.
This shift has far-reaching consequences for the economics of computing, among them:
- Fungibility of computing power. When the hardware that powers servers is indistinguishable and interchangeable, it becomes a fungible commodity. This binds the entire market for computing infrastructure into one of massive scale (an estimated 1.5% of all energy use in the U.S. is due to servers). Servers can now run on hardware the way cars run on gasoline. Fungibility creates competition and market opportunities that didn’t previously exist, and companies like Amazon and Google are recognizing the opportunity to become the dominant providers of this new commodity.
- Computing power as a leasable (and releasable) commodity. As Werner Vogels, the CTO of Amazon.com, has observed, cloud computing allows firms to shift infrastructure costs from capital expenditures (owned) to operating costs (leased). Most servers are idle most of the time because firms have traditionally invested in computing infrastructure to meet their peak capacity needs. Leasing computing power allows firms to match capacity to demand more efficiently, leasing more capacity on the cloud when needed and releasing it when a peak period has passed.
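The peak-capacity point can be made concrete with a small sketch. It assumes a hypothetical daily demand profile and counts server-hours only, ignoring any price difference between an owned and a leased server-hour; the function names and the numbers are illustrative and do not come from Amazon or any other provider.

```python
def owned_server_hours(hourly_demand):
    """Server-hours paid for when capacity is sized to the busiest hour."""
    return max(hourly_demand) * len(hourly_demand)


def leased_server_hours(hourly_demand):
    """Server-hours paid for when capacity is leased hour by hour."""
    return sum(hourly_demand)


def utilization(hourly_demand):
    """Fraction of owned capacity that is actually used."""
    return leased_server_hours(hourly_demand) / owned_server_hours(hourly_demand)


# Hypothetical day: 2 servers needed for 20 hours, 10 servers for a 4-hour peak.
demand = [2] * 20 + [10] * 4

print("owned  :", owned_server_hours(demand), "server-hours")
print("leased :", leased_server_hours(demand), "server-hours")
print("utilization of owned capacity:", round(utilization(demand), 2))
```

In this made-up profile the owned fleet sits idle two thirds of the time, which is the gap that leasing and releasing capacity is meant to close.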
Given these two features of the cloud computing marketplace, one might wonder where the competitive advantages lie for firms in this space outside of cost competition. In other words, what’s to stop users from seeking only the lowest prices? There is room for differentiation in service offerings, most notably in terms of reliability and security.
But a less conspicuous competitive edge that accrues to the first-movers in this space owes to the weight of data. Because of the relatively high cost of bandwidth, data needs to live close to the computing power that operates on it. That same high cost of bandwidth makes moving data warehouses from one cloud provider to another, unlike migrating server “instances”, an expensive proposition. Indeed, one of the feature requests for the Amazon Web Services folks at June 24th’s CloudCamp was to accept data uploads mailed in on optical disks, showing that snail mail still lives as a cheap, fast way to move data.
For data analytics applications, many of which require short-lived but intense bursts of computation (performing daily or monthly trend analysis, for example), cloud computing offers cheap access to vast CPU power. It also provides a compelling incentive to parallelize these analytics algorithms, since leasing 100 servers for one hour is cost-equivalent to leasing one server for 100 hours.
Cloud computing promises to elevate our computing infrastructure to the level of a utility, like water, gas, and electricity: something we take for granted in the best sense.
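The arithmetic behind that incentive is simple enough to sketch. The example below assumes flat per-server-hour pricing and a perfectly parallel job with no coordination overhead; the price is a placeholder, not a quoted rate from any provider.

```python
import math


def lease_cost(servers, hours, price_per_server_hour):
    """Total cost of leasing `servers` machines for `hours` each."""
    return servers * hours * price_per_server_hour


def wall_clock_hours(total_server_hours, servers):
    """Elapsed time for a job, assuming it splits evenly across servers."""
    return total_server_hours / servers


PRICE = 0.10  # hypothetical dollars per server-hour
JOB = 100     # server-hours of work in the analysis

for servers in (1, 10, 100):
    hours = wall_clock_hours(JOB, servers)
    cost = lease_cost(servers, hours, PRICE)
    print(f"{servers:3d} servers: {hours:6.1f} h elapsed, cost ${cost:.2f}")
```

Under these assumptions the bill is the same in every row while the elapsed time drops a hundredfold, so whatever keeps a real job from scaling this cleanly is engineering effort worth spending on parallelization.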
Google App Engine and the read-write-execute web
Google’s App Engine cloud service has launched, and it reflects a very different philosophy than Amazon’s. First of all, as we are used to with Google, the first bit is free; this should get a lot of small users over the hump who were hesitant to start the flow of funds (and a $75/month minimum for a persistent presence) to Amazon. For many purposes, the free account will be enough.
Secondly, it is very much in the direction of the next generation of the web: the read-write-execute web. The read-write web made data first-class, enabling two-way communication; the rwx-web makes functions first-class. In this environment, not just the data displayed but the code executed on an rwx-web site is user-contributed.
Google provides a sandboxed platform, development environment, a Python runtime, and APIs to link to persistent storage and Google services (authentication, mail, etc.). This removes a lot of complexity for developers and allows many scalability issues to be dealt with by Google engineers, at the platform level.
I was delighted to see their choice of Python: it is a remarkably clear language, and well suited for web applications. Google’s endorsement of Python through employing the BDFL helped put some corporate power behind the language, but if App Engine catches on it could really vault Python to the next level in terms of acceptance. Python also strikes me as particularly suited to the kind of abstraction they are offering.
