Visualizing Tim Wakefield’s knuckleball
(May 2009 Update: The Pitch F/X viewer for all MLB pitchers is up and running.)
“Back in 1980, STATS Inc. … sent its own scorekeepers [to record] play-by-play information about the games that had never before been systematically collected: the pitch count at the end of at bats, pitch types and locations, the direction and distance of batted balls. They broke the field down into twenty-six wedges radiating out from home plate.”
– Michael Lewis, Moneyball, p. 84
A friend recently sent me a blog post (originally from Josh Kalk) visualizing the differences between two Red Sox pitchers’ styles using a data set, called the MLB Extended Game Log, which catalogs over a dozen attributes of each pitch thrown. This got me wondering why baseball has attracted such interest from statisticians, and how this pitching data, in particular, might be better visualized.
Among the major professional sports in America, baseball is the game most amenable to statistical analysis. Whereas in football an offensive drive can cover any number of yards, in baseball a hit comes in just four sizes. In football, hockey, and basketball, a clock continuously draws us toward the game’s end; in baseball, innings and outs mark the stepwise progress of the game. Baseball is, in many ways, a lovable finite-state machine: a game composed of a countable number of states (innings, outs, the count, base runner configurations).
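To make the finite-state idea concrete, here is a minimal sketch in Python that enumerates the states within a single half-inning. It is only an illustration under simplifying assumptions: it tracks outs, balls, strikes, and which bases are occupied, and it ignores the inning number and the score. All names in it are hypothetical.

```python
from itertools import product

# Assumed state components within one half-inning (inning number and score omitted).
OUTS = range(3)       # 0, 1, or 2 outs
BALLS = range(4)      # 0 to 3 balls
STRIKES = range(3)    # 0 to 2 strikes
# Each base is either empty (False) or occupied (True): 2**3 = 8 configurations.
BASES = list(product((False, True), repeat=3))


def half_inning_states():
    """Return every (outs, balls, strikes, bases) combination as a tuple."""
    return list(product(OUTS, BALLS, STRIKES, BASES))


states = half_inning_states()
print(len(states))  # 3 * 4 * 3 * 8 = 288
```

Even with the count included, a half-inning has only a few hundred distinct states, which is part of what makes the game so easy to tabulate.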
It’s not surprising, then, that statisticians have had a long love affair with baseball. Yet only in recent years have sabermetricians like Bill James finally gained the esteem of the front office and even the dugouts, as Michael Lewis’ Moneyball recounts.
(Aside: baseball is to professional sports what Wall Street is to the business world: a field dominated by numbers and consequently the leading adopter of quantitative techniques among its peers. It’s perhaps no coincidence that Michael Lewis’ earlier book, Liar’s Poker, centered on Wall Street.)
To get back to the original motivation for this post, MLB’s Extended Game Log introduces detailed data about pitches; for this data to be used effectively, it must be presented effectively. Unfortunately, data visualization is a hard problem: the choices of layout, color, point shape, and size are nearly infinite. I have taken the same 2007 data (data for every pitch thrown in MLB, courtesy of Josh Kalk’s blog) and generated a per-pitcher visualization. However, rather than overlaying all pitches on a single graph, I have created a mini-plot for each kind of pitch (fastball, slider, etc.). The result is that each pitcher has a panel that characterizes his pitching style, based on all of his pitches in the 2007 season.
In my visualization shown below, color depth is used to illustrate pitch count (deep blues represent more pitches), and pitch distribution is shown through the small multiples of the mini-graphs. One can see, for example, that Wakefield throws almost nothing but knuckleballs, and those knuckleballs break in every direction. We can also see that Matsuzaka’s and Sabathia’s choices of pitches are similar, but that Matsuzaka lacks a splitter. What is not preserved, but should be, is a fixed set of axes for the horizontal and vertical breaks (each sub-graph is currently centered on its own axes); shared axes would better demonstrate how different pitchers and pitches use the strike zone.
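The binning behind such a panel can be sketched in a few lines. The following is not the code that produced the graphic; it is a minimal Python illustration, using only the standard library, of grouping pitches by type and counting them in a grid of horizontal and vertical break. It uses one fixed axis range for every panel, which is the improvement suggested above. The record layout, the axis range of -20 to 20 inches, the single shading scale shared by all panels, and the sample values are all assumptions invented for the example, with text characters standing in for shades of blue.

```python
from collections import defaultdict

# Invented sample records: (pitch_type, horizontal_break, vertical_break), in inches.
# These are NOT real 2007 measurements; they exist only to exercise the code.
SAMPLE_PITCHES = [
    ("knuckleball", -3.0, 2.0),
    ("knuckleball", 4.5, -1.5),
    ("knuckleball", -3.2, 2.4),
    ("fastball", -6.0, 9.0),
]


def bin_pitches(pitches, lo=-20.0, hi=20.0, bins=8):
    """Count pitches per grid cell, one grid per pitch type, on shared fixed axes."""
    width = (hi - lo) / bins
    panels = defaultdict(lambda: [[0] * bins for _ in range(bins)])
    for pitch_type, x, z in pitches:
        # Clamp so that out-of-range breaks land in the edge cells.
        col = min(bins - 1, max(0, int((x - lo) // width)))
        row = min(bins - 1, max(0, int((z - lo) // width)))
        panels[pitch_type][row][col] += 1
    return dict(panels)


def shade(count, max_count, levels=" .:*#"):
    """Map a count to a character; denser characters mean more pitches."""
    if count == 0 or max_count == 0:
        return levels[0]
    top = len(levels) - 1
    # Ceiling division keeps any non-zero count visible.
    return levels[(count * top + max_count - 1) // max_count]


def render(panel, max_count):
    """Return the panel as text rows, highest vertical break first."""
    return ["".join(shade(c, max_count) for c in row) for row in reversed(panel)]


panels = bin_pitches(SAMPLE_PITCHES)
# One scale across all panels, so shading is comparable between pitch types.
overall_max = max(c for grid in panels.values() for row in grid for c in row)
for pitch_type, grid in sorted(panels.items()):
    print(pitch_type)
    print("\n".join(render(grid, overall_max)))
```

Because every panel shares the same bins, a cell in the knuckleball panel and the same cell in the fastball panel refer to the same amount of break, so panels can be compared directly across pitches and pitchers.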
What’s happening in baseball is a harbinger of what’s to come in other sports, and in other industries, as more data is generated and made available for analysis. A few lessons emerge from baseball. First, data analysis will not replace human decision-makers; a manager’s domain expertise and tacit knowledge cannot be replaced by an automated algorithm. However, data analysis can support and augment decision-makers’ instincts and reasoning abilities. Second, because decision-makers are people, and because, given our powerful visual system, seeing is believing, the way in which data is presented is critical.
3 Responses to “Visualizing Tim Wakefield’s knuckleball”

Interesting use of visualization… sure got me reading the blog though…
[…] very interesting blogpost on ‘Visualizing Tim Wakefield’s knuckleball‘. A statisticians view on analyzing and visualizing baseball pitches and baseball data in […]
[…] Open Source Hero II: Michael Driscoll, who makes awesome visualizations of huge data sets using the statistical analysis software R. Check out his six-dimensional analysis of baseball pitches. […]