Principles of Information Display for Visualization Practitioners

by Al Globus (globus@nas.nasa.gov), CSC @ NASA Ames Research Center
NASA Contract NAS 2-12961
28 November 1994

Note: a German translation by Jens Meiert is available here.

Introduction

This paper is intended to give the visualization practitioner an overview of Edward Tufte's work on information display. Dr. Tufte has written two classic books on information display: The Visual Display of Quantitative Information and Envisioning Information. I believe that many of the concepts in these books are important to scientific visualization, but are often not applied by practitioners.

Much of this paper is Tufte paraphrased; e.g., where Tufte might say `graphical excellence', I write `visualization excellence'. When you see the word `ink' (paper technology!), think `non-background pixels'. Passages in quotation marks are direct quotes. Most of the text is a re-wording of Dr. Tufte's ideas, but all comments on the current state of visualization are mine, and I am responsible for all errors.

The reader is encouraged to read Tufte's books. The treatment here is brief, incomplete, picture-poor, and low resolution.

Excellence

Visualization excellence Visualizations should

Principles

Visualizations should strive towards the following goals:

Content Focus

"Above all else show the data." The focus should be on the content of the data, not the visualization technique. This leads to design transparency. Avoid "fooling around with data" and use a clear, simple, straight-forward design with a richness of data. The success of a visualization is based on deep knowledge and care about the substance, and the quality, relevance and integrity of the content.

Assume that the viewer is just as smart as you and cares just as much. Never `dumb-down' a visualization.

Comparison vs. Description

"At the heart of quantitative reasoning is a single question: Compared to what?" Most visualizations today are descriptive rather than comparative. This may be part of the reason why scientific graphics, even those about multivariate phenomena, are dominated by the xy-plot. The xy-plot invites reasoning about causality in a way that even the most impressive isosurface does not. We should strive for relational, rather than merely descriptive, visualizations.

To focus a visualization on "Compared to what?", enforce visual comparisons, particularly within the eyespan. Avoid relying on the viewer's memory to make visual comparisons, a weak facility in most of us.

Integrity

Misleading visualizations are common. Although the following suggestions are tuned to statistical graphics, following a few of Dr. Tufte's rules may help limit unintentional visualization lies:

There is even an equation to quantify one approach to lack of integrity, the lie factor:

Lie-factor = size-of-effect-shown-in-visualization / size-of-effect-in-data
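The lie factor is easy to compute once "size of effect" is pinned down. The sketch below assumes that an effect is measured as a relative change between two values; that choice, the function names, and the example numbers are illustrative assumptions, not part of the definition above.

```python
def relative_change(start, end):
    """Size of an effect as a fractional change from start to end (an assumed measure)."""
    if start == 0:
        raise ValueError("relative change is undefined when start is zero")
    return abs(end - start) / abs(start)


def lie_factor(shown_start, shown_end, data_start, data_end):
    """Effect size shown in the visualization divided by effect size in the data."""
    data_effect = relative_change(data_start, data_end)
    if data_effect == 0:
        raise ValueError("the data show no effect, so the ratio is undefined")
    return relative_change(shown_start, shown_end) / data_effect


# Hypothetical example: the data double (10 -> 20), but the bars drawn
# for them grow from 1.0 to 4.0 units tall.
factor = lie_factor(1.0, 4.0, 10.0, 20.0)
print(factor)  # 3.0
```

A value near 1.0 means the picture shows about as much change as the data contain; the hypothetical bars above exaggerate the change threefold.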

High Resolution

The human eye registers 150 Mbits and can understand this avalanche of data because it is connected to a terrific editor: the brain. Consider some information sources in this context:

Consider also the character density of some information forms:

Nobody uses a 1200 baud modem when a 56 Kbit leased line is available. Similarly, we should not accept any less than the maximum information transfer rate for our visualizations. Not only is the information density of the computer screen somewhat low, it is further reduced in visualization packages that allocate only a smallish portion of the display to data and the rest to widgets and other "computer administrative debris."

Classic Designs

Tufte has researched a number of classic information designs and general principles. Some of these are small multiples, time series, and micro/macro composition.

Small Multiples


A. Ghizzo, B. Izrar, P. Bertrand, E. Fijalkow, M. R. Feix, and M. Shoucri, "Stability of Bernstein-Greene-Kruskal Plasma Equilibria: Numerical Experiments Over a Long Time," Physics of Fluids, 31 (January 1988).

A small multiple design consists of a single design repeated several times within the eyespan, each example showing a different value of the independent variable(s). "Comparison must be enforced within the scope of the eyespan," a task at which small multiples excel. Thus, "for a wide range of problems in data presentation, small multiples are the best design solution." "At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives."

"Well designed small multiples are

Note that "simultaneous two-dimensional indexing of the multiplied image, flatland within flatland, significantly deepens displays, with little added complication in reading."

Small multiples are a straightforward extension to many current visualization systems, although graphics performance may be a problem.
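As a rough sketch of what such an extension involves, the code below lays out one panel per value of the independent variable on a grid and computes a single value range shared by every panel. Sharing one scale across panels is my reading of "a single design repeated"; the function name and the sample data are hypothetical.

```python
def small_multiple_layout(panels, columns):
    """Plan a small-multiple display.

    panels  -- dict mapping each value of the independent variable
               to the list of data values shown in that panel
    columns -- number of panels per row

    Returns the (low, high) range shared by all panels and a list of
    (row, column, key) placements in sorted key order.
    """
    if columns < 1:
        raise ValueError("columns must be at least 1")
    low = min(min(values) for values in panels.values())
    high = max(max(values) for values in panels.values())
    placements = [
        (index // columns, index % columns, key)
        for index, key in enumerate(sorted(panels))
    ]
    return (low, high), placements


# Hypothetical data: three runs of a simulation at different parameter values.
runs = {0.1: [1, 5], 0.2: [2, 9], 0.3: [0, 4]}
shared_range, cells = small_multiple_layout(runs, columns=2)
print(shared_range)  # (0, 9)
print(cells)         # [(0, 0, 0.1), (0, 1, 0.2), (1, 0, 0.3)]
```

Drawing every panel with the same range keeps differences between panels attributable to the data rather than to a rescaled axis.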

Time Series


L. Hugh Newman, Man and Insects (London, 1965), pp. 104-105.

"The time-series plot is the most frequently used form of graphic design." One dimension, usually the horizontal, is time, and the graphics march along showing variation as time proceeds. Most visualization time-series works are videos, which show time by changing the picture, requiring the user to remember what came before. Finding innovative ways to incorporate time-series into visualization systems should be given serious consideration.

Micro/Macro Composition


Diagram by David H. Hathaway, Marshall Space Flight Center, NASA.

Micro/macro composition refers to an approach where a visualization contains enormous detail, but an overall pattern emerges. "Panorama, vista, and prospect deliver to viewers the freedom of choice that derives from an overview, a capacity to compare and sift through detail. And that micro-information, like smaller texture in landscape perception, provides a credible refuge where the pace of visualization is condensed, slowed, and personalized."

Design Guidelines

This section summarizes design guidelines and principles found in Tufte's work. Visualizations "are paragraphs about data and should be treated as such." Words, pictures, and numbers are all part of the information to be visualized. All should be integrated together, not separated into word processor documents, spreadsheet tables, and visualization package screens. Here are some guides for workaday designs:

Definition: Chartjunk - miscellaneous graphic gunk attached to a chart (visualization) that has nothing to do with the data and everything to do with poor taste.

Data-ink ratio

There is at least one quantitative measure of a visualization, here expressed in terms of ink rather than pixels. The translation is straightforward.

"Data-ink ratio
  = data-ink / total ink used to print the graphic
  = proportion of a graphic's ink devoted to the non-redundant display of data-information
  = 1.0 - proportion of a graphic that can be erased without loss of data-information"
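Translated to pixels, the ratio can be estimated by counting. The sketch below assumes a tiny character raster in which a space is background, `*` marks data, and every other character is non-data ink; deciding which pixels count as data is a judgment call in practice, and this labeling scheme is only an illustration.

```python
def data_ink_ratio(raster, data_marks="*", background=" "):
    """Fraction of non-background pixels that display data.

    raster -- list of equal-length strings, one character per pixel
    """
    total_ink = sum(1 for row in raster for pixel in row if pixel != background)
    data_ink = sum(1 for row in raster for pixel in row if pixel in data_marks)
    if total_ink == 0:
        raise ValueError("the raster contains no ink at all")
    return data_ink / total_ink


# Hypothetical plot: three data points inside a full box frame.
boxed = [
    "+------+",
    "|   *  |",
    "| *    |",
    "|*     |",
    "+------+",
]
# The same three points with the frame erased.
bare = [
    "        ",
    "    *   ",
    "  *     ",
    " *      ",
    "        ",
]
print(data_ink_ratio(boxed))  # 0.12
print(data_ink_ratio(bare))   # 1.0
```

In the boxed version only 3 of 25 inked pixels carry data; erasing the frame raises the ratio to 1.0 without losing a single data point.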

One should

This leads to tight visualizations with a minimum of extraneous junk. We see that "for non-data-ink, less is more. For data-ink, less is a bore." This suggests five principles of data graphics:

Just as good prose is often the result of revision and editing, good visualization requires criticism and rework.

An interesting example of non-data ink erasure is range frames. The frame of an xy-plot is non-data ink. Most of the frame can be erased without loss of information, and if only the portion of the frame between minimum and maximum values is left, the frame provides additional information!
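A range frame needs nothing more than the extremes of the data. The following minimal sketch returns the span each axis line should cover; the function name and return format are assumptions made for illustration, and where the lines are drawn is left to the plotting code.

```python
def range_frame(xs, ys):
    """Return the extent of each axis line in a range frame.

    Each axis line runs only from the minimum to the maximum data
    value along that axis, so the frame itself reports the data range.
    """
    if not xs or not ys:
        raise ValueError("both coordinate lists must be non-empty")
    return {"x_axis": (min(xs), max(xs)), "y_axis": (min(ys), max(ys))}


# Hypothetical measurements.
frame = range_frame([2, 5, 3], [10, 40, 25])
print(frame)  # {'x_axis': (2, 5), 'y_axis': (10, 40)}
```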

Clutter

When viewing a visualization jammed with incomprehensible, cluttered graphics, there is a great temptation to remove data, even relevant information. But "clutter and confusion are failures of design, not attributes of information." If a visualization is too cluttered, don't remove data; change the design. Credibility comes from detail and in many cases one can clarify a design by adding detail. "High-density designs also allow viewers to select, to narrate, to recast and personalize data for their own uses. ... Data-thin, forgetful displays move viewers toward ignorance and passivity, and at the same time diminish the credibility of the source."

Empty space may reduce clutter, but "it is not how much empty space there is, but rather how it is used. It is not how much information there is, but rather how effectively it is arranged."

Low-density computer displays lead to spreading information out over many screens or dialog boxes. This leads to the "one damn thing after another" syndrome, which causes users to get lost in an information maze. Place information adjacent in space, not stacked in time, to avoid the `Where am I?' problem.

Layering and Separation

Consider a colormapped surface that requires annotation. If the colormap uses all possible colors, positioning annotation will be difficult because of color clashes. A better approach might be to use intensity of a single hue for the colormap, leaving visual space for additional information; i.e., the annotation. This is an example of layering and separation. Layering and separation implies using color or other differentiation to separate important classes of information. Maps are often very good examples of this technique.
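A single-hue colormap of this kind takes only a few lines. The sketch below uses Python's standard `colorsys` module and maps data values to the intensity (HSV value) of one fixed hue. Picking the complementary hue for annotation is my own assumption about what leaves enough visual separation, as are the function names and the default hue.

```python
import colorsys


def to_rgb8(red, green, blue):
    """Convert 0..1 color components to 8-bit integers."""
    return tuple(int(channel * 255 + 0.5) for channel in (red, green, blue))


def single_hue_color(value, low, high, hue=0.6):
    """Map a data value to an intensity of one hue (hue is in 0..1)."""
    if high == low:
        raise ValueError("low and high must differ")
    intensity = (value - low) / (high - low)
    intensity = min(1.0, max(0.0, intensity))  # clamp out-of-range values
    return to_rgb8(*colorsys.hsv_to_rgb(hue, 1.0, intensity))


def annotation_color(hue=0.6):
    """Fully saturated complementary hue, reserved for the annotation layer."""
    return to_rgb8(*colorsys.hsv_to_rgb((hue + 0.5) % 1.0, 1.0, 1.0))


# Hypothetical red colormap (hue 0.0) over the data range 0..10.
print(single_hue_color(5, 0, 10, hue=0.0))   # (128, 0, 0)
print(single_hue_color(10, 0, 10, hue=0.0))  # (255, 0, 0)
print(annotation_color(hue=0.0))             # (0, 255, 255)
```

Because the data layer never leaves its one hue, anything drawn in the annotation color cannot be mistaken for a data value.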

1 + 1 = 3 or More

But "effective layering of information is often difficult... (because) an omnipresent, yet subtle, design issue is involved: the various elements collected together... interact, creating non-information patterns and texture." "Josef Albers described this visual effect as 1 + 1 = 3 or more."

For example, consider a single line -- a single graphic element. Now consider two parallel lines. Here we have at least three graphical elements, each of the lines and the space between them. This 1 + 1 = 3 or more effect is important, and "most of the time, that surplus visual activity is non-information, noise, and clutter." However, "the noise of 1 + 1 = 3 is directly proportional to the contrast in value (light/dark) between figure and ground. On white backgrounds, therefore, a varying range of lighter colors will minimize incidental clutter."

A particularly common example of 1 + 1 = 3 or more is boxes around text. "Unless deliberate obscurity is sought, avoid surrounding words by little boxes, which activate negative white spaces between word and box." Note that the box is non-data ink.

Color

"The fundamental uses of color in information design (are): to label, to measure, to represent or imitate reality, to enliven or decorate." Dr. Tufte provides a few specific guidelines on the use of color:

Tufte turns to the Swiss cartographer Eduard Imhof for additional insight:

"Color itself is subtle and exacting. And, furthermore, the process of translating perceived color marks on paper into quantitative data residing in the viewer's mind is beset by uncertainties and complexities. These translations are nonlinear (thus gamma curves), often noisy and idiosyncratic, with plenty of differences in perception found among viewers (including several percent who are color-deficient)."

Miscellaneous Admonitions

Parting Shots

After a detailed analysis of sunspot diagrams developed over a few centuries, Tufte writes:
"Note all the different techniques for displaying sunspots during 380 years of data analysis -- from Galileo's first precious observation of the solar disks, to small multiple images, to dimensionality and data compression, and finally to micro/macro displays combining pattern and detail, average and variation. Exactly the same design strategies are found, again and again, in the work of those faced with a flood of data and images, as they scramble to reveal, within the limits of flatland, their detailed and complex information. These design strategies are surprisingly widespread, albeit little appreciated, and occur quite independently of the content of the data."

"Graphical competence demands three quite different skills: the substantive, statistical, and artistic." Visualization competence requires no less.

References

Edward Tufte, The Visual Display of Quantitative Information and Envisioning Information, Graphics Press, PO Box 430, Cheshire, CT 06410.