Dataviz Salon SF #2: Maps, Grammars, & Models

by Michael E. Driscoll | May 8, 2009

A few nights ago the talented folks at Stamen Design hosted us at their studios for our second dataviz salon in San Francisco. (Special thanks to Tom Carden and Michal Migurski for inviting us.) I’ll review the four talks in turn.

Stamen: Reaching through Maps

Eric Rodenbeck (Stamen) started by highlighting several mapping visualizations that Stamen has been hacking on, both recently and in the past, including Cabspotting in San Francisco, Crimespotting in Oakland, and Olympic Stadium spotting in London.

Eric showed how Stamen has attempted to move away from what Schuyler Erle has dubbed “red dot fever”, whereby the overlaid data can overwhelm our visual attention, and toward allowing various data layers to “reach through” the maps.

For example, the London Olympic maps provide a mixture of schematic, satellite, and webcam images.  These various drill-downs of detail are not all exposed, but rather collaged.  Even more interesting was a movable ‘lens’ that, as it is moved over regions of a map, reveals another layer (reminiscent of a polarized-light based mural at Boston’s MoS).  In these ways, additional layers of data are only selectively brought into focus (echoing a design pattern in Japanese gardening, mie gakure, meaning “seen and unseen”).

One practical gem that Mike Migurski shared regarding the Oakland Crimespotting site was, “the design of a comments section is a huge part of how it’s perceived and used.” Nota bene, social web developers.

Protovis: A Declarative, Open Source Graphical Toolkit

Mike Bostock (Stanford CS) introduced Protovis, an extensible visualization toolkit implemented in JavaScript on top of the HTML canvas element. Protovis draws inspiration from Leland Wilkinson’s Grammar of Graphics, which argues for moving away from the prevailing method of building visualizations, where data are simply poured into one of several chart types (pie, stacked bar, or scatter).

Wilkinson argues that visualizations should not be cast from chart typologies, but rather composed of graphical primitives. In Protovis, these primitives include dots, areas, lines, and labels (called “marks”).

Among Protovis’s strengths are:

A More Declarative Syntax for Creating Graphics
One disadvantage of drawing directly on the HTML canvas with JavaScript is its imperative style. To draw a diagonal line, the code must move a pen between x,y coordinates. With Protovis, however, the code declares (roughly) “add a bar to this graph” (example). Thus Protovis provides a grammar for statements about graphical marks, rather than statements about graphical mechanics.

Visible Open Source
With Protovis, the source code is not just open and available, it’s viewable from within the browser. I have an admittedly personal bias for open source data visualization, but lowering the barriers to sharing source code ultimately drives faster adoption and iteration of visualization techniques.
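
To make the imperative-versus-declarative distinction concrete, here is a minimal Python sketch. It is not Protovis’s API: the Pen class is a hypothetical stand-in for a canvas context that merely records commands, and BarMark, with its spacing and scale defaults, is an invented name for illustration. The point is only that the caller of BarMark states what marks exist, while the pen mechanics stay hidden in the renderer.

```python
from dataclasses import dataclass
from typing import List, Tuple


class Pen:
    """Hypothetical stand-in for a canvas context; it only records commands."""

    def __init__(self) -> None:
        self.commands: List[Tuple] = []

    def move_to(self, x: float, y: float) -> None:
        self.commands.append(("move_to", x, y))

    def line_to(self, x: float, y: float) -> None:
        self.commands.append(("line_to", x, y))


def draw_bar_imperatively(pen: Pen, left: float, width: float, height: float) -> None:
    # Imperative style: trace the four edges of one bar whose base sits on y = 0.
    pen.move_to(left, 0)
    pen.line_to(left, height)
    pen.line_to(left + width, height)
    pen.line_to(left + width, 0)
    pen.line_to(left, 0)


@dataclass
class BarMark:
    """Declarative style: describe the bars; render() works out the mechanics."""

    data: List[float]
    bar_width: float = 20   # assumed defaults, chosen only for the example
    spacing: float = 25
    scale: float = 80

    def render(self, pen: Pen) -> None:
        for index, value in enumerate(self.data):
            draw_bar_imperatively(pen, index * self.spacing, self.bar_width, value * self.scale)


pen = Pen()
BarMark(data=[1, 1.2, 1.7, 1.5, 0.7]).render(pen)
print(len(pen.commands))  # 25: five pen commands for each of the five bars
```

The one-line BarMark call is the “statement about graphical marks”; the 25 recorded pen commands are the “graphical mechanics” it spares the author from writing.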

Mike has used Protovis to recreate classic data visualizations by Will Burtin, Florence Nightingale, William Playfair, and others. You can find these at the Protovis site and in their InfoVis ‘09 paper.

(For those interested in a Wilkinson-inspired approach for graphics in R, check out Hadley Wickham’s ggplot).

A Mathematician’s View: A Visualization is a Hypothesis

Jason Morton (Stanford Mathematics) made the argument that a data visualization is not merely a descriptive vessel, it is a predictive model.

A visualization is a model because, especially with large data sets, not every dimension of every observation can be shown. Quite simply, a (compressed) 100 KB data visualization cannot losslessly describe a (compressed) 10 MB data set: information must be discarded. What remains is a model of the original data, albeit a visual model.
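
A small sketch of what “information must be discarded” means in practice, using a histogram as the visual summary. The two data sets are made up for illustration, and bin_counts is a hypothetical helper, not something presented at the salon. Two different data sets collapse to the same binned summary, so the summary alone cannot tell you which one it came from.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Summarize values as counts per bin, the way a histogram would."""
    return Counter(int(v // bin_width) for v in values)


# Two made-up data sets with different observations.
dataset_a = [1.0, 2.5, 3.9, 11.2, 14.8, 27.0]
dataset_b = [0.3, 0.4, 4.9, 10.0, 19.9, 20.1]

summary_a = bin_counts(dataset_a, bin_width=10)
summary_b = bin_counts(dataset_b, bin_width=10)

print(dataset_a == dataset_b)  # False: the raw data differ
print(summary_a == summary_b)  # True: the histograms are identical
```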

Moreover, a data visualization’s model is predictive: it presents a hypothesis about how observable data points were generated, and implies predictions about future, as-yet-unobserved data.
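
One way to read “predictive” here, sketched under my own assumptions: treat the binned view as a set of probabilities and ask what it says about an observation it has not yet seen. The incident hours below are invented, and fit_histogram and predicted_probability are hypothetical names for this example only.

```python
from collections import Counter


def fit_histogram(values, bin_width):
    """Turn observed values into per-bin probabilities (the 'hypothesis')."""
    counts = Counter(int(v // bin_width) for v in values)
    total = sum(counts.values())
    return {bin_index: count / total for bin_index, count in counts.items()}


def predicted_probability(model, value, bin_width):
    """Probability the model assigns to a new value's bin (0.0 if never seen)."""
    return model.get(int(value // bin_width), 0.0)


# Made-up hours of day (0-23) at which incidents were observed.
observed_hours = [22, 23, 23, 1, 2, 22, 14, 23]
model = fit_histogram(observed_hours, bin_width=6)  # four six-hour blocks

print(predicted_probability(model, 21, bin_width=6))  # 0.625: evenings look likely
print(predicted_probability(model, 9, bin_width=6))   # 0.0: mornings were never seen
```

A chart drawn from this model makes the same claim visually: expect the next incident in the evening. New data can confirm or contradict it, which is what makes it a hypothesis.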

Seen from this perspective, Stamen’s Crimespotting maps are powerful precisely because they make compelling hypotheses about when and where crime occurs in Oakland. Their London Olympic maps, which integrate time series photographs of the stadium site, take a position about the pace of construction and how it is impacting the landscape.

“Form Ever Follows Function”

And if the function of a data visualization is to make hypotheses, then its form should follow this function. The arbitrary use of color, position, shape, and ornament only adds noise.

The ever-popular Wordle provides a visual model for word distribution in a text: more frequent words are larger. However, a word’s color, position, and font are arbitrarily chosen: they carry no meaning, and model nothing. Indeed, the “randomize” button is an admission of as much (for it does not randomize size).
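
The split between modeled and unmodeled channels can be sketched in a few lines. This is not Wordle’s actual algorithm; word_sizes, layout, the size range, and the four-color palette are all assumptions made for illustration. Size is computed from frequency, while color is drawn at random, so re-running the layout with a different seed may change colors but never sizes.

```python
import random
from collections import Counter


def word_sizes(text, min_size=10, max_size=48):
    """Map each word's frequency to a font size: the only modeled channel."""
    counts = Counter(text.lower().split())
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


def layout(text, seed):
    """Pair each modeled size with a randomly chosen (unmodeled) color."""
    rng = random.Random(seed)
    palette = ["red", "green", "blue", "orange"]
    return {word: (size, rng.choice(palette)) for word, size in word_sizes(text).items()}


text = "data maps data models maps data"
first = layout(text, seed=1)
second = layout(text, seed=2)  # a second click on "randomize"

sizes_first = {word: size for word, (size, _) in first.items()}
sizes_second = {word: size for word, (size, _) in second.items()}
print(sizes_first == sizes_second)  # True: sizes carry the model, colors do not
```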

Adding arbitrary marks or dimensions to a visualization carries two related risks: first, it can obscure the model the visualization is trying to convey (what do same-colored words have in common?); second, this added complexity, beyond polluting the information channel, has a cost: the visualization is larger. Bar graphs with iPhone ads in the background cannot be succinctly rendered.

The parallels to the modernist movement in architecture are obvious. Adolf Loos wrote in 1908 that “the evolution of culture marches with the elimination of ornament from useful objects.” The American modernist Louis Sullivan proclaimed that “form ever follows function.”

But the truth is that stripping visualizations down to their bare models can be counterproductive. Call it noise or ornamentation, but even visual marks that do not advance a hypothesis can act to support it, by guiding the eye, providing context, or otherwise speeding the absorption of a pattern by the human brain. At the very least, this functionalist perspective can help data visualizers use ornamentation intentionally, not inadvertently.

UUorld: Multidimensional Extrusion Maps

Zach Wilson (UUorld) showcased his company’s software that simplifies creating and exploring extrusion maps. Among the several interesting applications of his software, Zach showed off a temporal visualization of the spread of swine flu in the United States over the past several weeks.

In response to the critique that layering data dimensions on two-dimensional maps could be done more effectively by using other indicators such as color, instead of simulating a third dimension of height, Zach indicated that research has shown that physical dimensions (or their simulation) possess greater visual saliency to the human eye.

Zach also mentioned UUorld’s data portal, which contains thousands of downloadable statistics from a variety of public sources, some of which have been used to generate UUorld visualizations.

comments

5 Responses to “Dataviz Salon SF #2: Maps, Grammars, & Models”

  1. Amyric Duclert on May 8th, 2009

    One issue I see with Protovis is that — while the visualization is pushed client side, the data for the visualization must also be funneled through the client.

    (For example, the source code for the Jobs Browser at http://vis.stanford.edu/protovis/ex/jobs.html shows a call to jobs.js, a 5000-line data file).

    This is fine for relatively small visualizations, but for large-scale data sets, it can become an issue, one that doesn’t exist — OTOH — when data and graphic generation live server-side.

  2. Michal Migurski on May 9th, 2009

    Amyric, I think that’s more of a feature than an issue - there is a universe of visualization dark matter that would benefit from client side rendering of tiny data sets. Sparklines are one example, all the other tiny charts on protovis.org are another.

  3. Tom Carden on May 9th, 2009

    It’s worth comparing the size of the data set: jobs.js is almost 500KB, but if it was served with gzip compression enabled it would be around 60KB. A png image of the initial view of the visualization is almost 300KB… but the visualization is interactive, so you’d need a png for each possible view.

    So while Amyric’s point is valid if you’re considering a dataset of more than a few megabytes, it doesn’t really hold for the jobs.js example. I think the jobs.js case is actually compelling evidence for pushing more datasets to the client in order to make things interactive.

  4. Overheard @ Stamen: Mie Gakure Maps, Graphical Grammars, & Visual Models on Datavisualization.ch on May 11th, 2009

    […] from the Dataspora Blog has writen a comprehensive blogpost about the DataViz Salon held at the Stamen office in Mission District, San Francisco. In the post […]

  5. Dataviz Salon SF #2: Maps, Grammars, & Models :Health Fitness Wealth on July 18th, 2009

    […] Stamen: Reaching through Maps […]
