Color: The Cinderella of dataviz
“Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.” — Envisioning Information, Edward Tufte, Graphics Press, 1990
Color is one of the most abused and neglected tools in data visualization. It is abused when we make poor color choices; it is neglected when we rely on poor software defaults. Yet despite its historically poor treatment at the hands of engineers and end users alike, color, used wisely, is unrivaled as a visualization tool.
Most of us think twice before walking outside in fluorescent red underoos. If only we were as cautious in choosing colors for infographics. The difference is that few of us design our own clothes. But until good palettes (like ColorBrewer) are commonplace, we must be our own tailors if we want colors that fit our purposes.
While obsessing about how to implement color in Dataspora Labs’ PitchFX viewer, I began with a basic motivating question:
People who love scatter plots & connecting dots

We hosted the first Dataviz Salon SF on Tuesday night, with lightning talks by boredom cop Shane Booth, dataviz wiz Lee Byron, computational journalist Brad Stenger, data wrangler Pete Skomoroch, and any/all data enthusiast Brendan O’Connor.
I was going to blog all about it — but Tom Carden of Stamen Design already has a great write-up.
… Dataspora invited a few people to a Dataviz Salon yesterday evening. Mike and I went along and huddled in a brick-built basement in SoMa to listen to the following:
The case for open source data visualization
When I was in graduate school, the most closely studied part of the scientific publications we read was not the results section but the methods section. (It was also, incidentally, often the hardest section to write for one’s own publications.) Methods sections are wonderful because they allow you to verify that someone else’s work is correct, by reproducing it yourself. But more importantly, methods sections allow you to build upon the work of others. They are the open source code of science.
Unfortunately, all but a small fraction of data visualizations on the web are published without a methods section. This is a shame, because it slows the free flow of ideas and prevents the creative extension of other people’s work.
Three conditions must be met for a data visualization to be considered open and reproducible:
Dataviz is… the right glyphs in the right places
“To achieve great things, we must be self-confined:
Mastery is revealed in limitation
And law alone can set us free again.”
– Goethe, Natur und Kunst
Data visualization suffers from the curse of dimensionality. Visualizing data is hard because there are so many ways to do it, and so few ways to do it right. There are orders of magnitude more arrangements of five points on a grid than there are five-word sentences. In his book Information Visualization, Colin Ware enumerates several dimensions in visualization (including form, color, and position) that can be combined to create this vast space of possibility. The graphs below all visualize the same five data points but with different forms.
Written language has a grammar, and only a minute fraction of possible five-word sentences are valid. Data visualization has no comparable grammar, and from this perspective it is deceptively easy: to a first approximation, anything goes. A fluorescently colored Excel ’97 pie chart may reflect poor design choices, but it doesn’t break any rules.
Constraints (whether formal grammars or accepted conventions) force choices upon creators, freeing them to train their expressive powers elsewhere.
Visualizing Tim Wakefield’s knuckleball
(May 2009 update: The Pitch F/X viewer for all MLB pitchers is up and running.)
“Back in 1980, STATS Inc. … sent its own scorekeepers [to record] play-by-play information about the games that had never before been systematically collected: the pitch count at the end of at bats, pitch types and locations, the direction and distance of batted balls. They broke the field down into twenty-six wedges radiating out from home plate.”
– Michael Lewis, Moneyball, p. 84
A friend recently sent me a blog post (originally from Josh Kalk) visualizing the differences between two Red Sox pitchers’ styles using a data set, called MLB Extended Game Log, that catalogs over a dozen attributes of each pitch thrown. This got me wondering why baseball has attracted such interest from statisticians, and also about ways in which this pitching data, in particular, might be better visualized.


