Scatterplots as an interdisciplinary communication tool

Community member post by Erin Walsh

Erin Walsh (biography)

Scatterplots are used in many disciplines, which makes them useful for communicating across disciplines. They are also common in newspapers, online media and elsewhere as a tool to communicate research results to stakeholders, ranging from policy makers to the general public. What makes a good scatterplot? Why do scatterplots work? What do you need to watch out for in using scatterplots to communicate across disciplines and to stakeholders?

What makes a good scatterplot?

In his 1983 magnum opus, The Visual Display of Quantitative Information, statistician Edward Tufte outlined nine principles of excellence and integrity in data visualisation:

  1. Show the data
  2. Induce the viewer to think about the substance rather than about methodology, graphic design, the technology of the graphic production or something else
  3. Avoid distorting what the data have to say
  4. Present many numbers in a small space
  5. Make large datasets coherent
  6. Encourage the eye to compare different pieces of data
  7. Reveal the data at several levels of detail, from a broad overview to a fine structure
  8. Serve a reasonably clear purpose: description, exploration, tabulation or decoration
  9. Be closely integrated with the statistical and verbal descriptions of a dataset.

Noting “Graphics reveal data” (1983: 13), Tufte presented the classic case of Anscombe’s Quartet (Anscombe 1973) as an example of successful application of these principles. X, Y, and the relationship between X and Y in Anscombe’s four datasets are numerically indistinguishable (sharing a mean, variance, and correlation). Viewed as pure numbers, it is difficult to see any difference between the sets:

(Data generated by Erin Walsh in accordance with Anscombe’s Quartet (Anscombe 1973))
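The claim that the four sets share a mean, variance and correlation can be checked in a few lines. The following is a minimal sketch using only the Python standard library. It assumes the values published by Anscombe (1973), which are not necessarily the values behind the table above, and the names `QUARTET`, `pearson_r` and `summarise` are illustrative choices rather than anything from the original post.

```python
from math import sqrt
from statistics import mean, variance

# Assumption: these are Anscombe's (1973) published values, used for
# illustration; the table in this post may be based on different data.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I to III
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarise(xs, ys):
    """Summary statistics for one dataset (sample variance, n - 1)."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARIES = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}

# Print one row per dataset, rounded to two decimal places.
for name, s in SUMMARIES.items():
    print(f"{name:>3}  mean_x={s['mean_x']:.2f}  var_x={s['var_x']:.2f}  "
          f"mean_y={s['mean_y']:.2f}  var_y={s['var_y']:.2f}  r={s['r']:.2f}")
```

Rounded to two decimal places, the four rows are very hard to tell apart, which is the point of the example: the summary statistics alone give no hint of how differently the sets are shaped.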

Striking differences become immediately obvious once they are displayed as scatterplots.

(Source: Erin Walsh)
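For readers who want to reproduce the effect without assuming any particular plotting library, the sketch below draws a crude scatterplot on a character grid. It again assumes Anscombe's (1973) published values, here only sets I and IV, and the function name `text_scatter` and its default axis limits are hypothetical choices made for this illustration. Both panels use the same axis limits so that the comparison between them is not distorted.

```python
# Assumption: Anscombe's (1973) published values for sets I and IV.
X1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]


def text_scatter(xs, ys, x_lim=(0, 20), y_lim=(0, 14), width=41, height=15):
    """Return a list of strings with '*' marking each (x, y) point.

    The first string is the top of the plot (largest y). Points that fall
    in the same cell overwrite one another, and points outside the axis
    limits are clamped to the nearest edge.
    """
    x_lo, x_hi = x_lim
    y_lo, y_hi = y_lim
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) * (width - 1) / (x_hi - x_lo))
        row = round((y - y_lo) * (height - 1) / (y_hi - y_lo))
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        grid[height - 1 - row][col] = "*"  # flip so that y increases upwards
    return ["".join(cells) for cells in grid]


for title, xs, ys in (("Set I", X1, Y1), ("Set IV", X4, Y4)):
    print(title)
    for line in text_scatter(xs, ys):
        print("|" + line)
    print("+" + "-" * 41)
```

Even at this coarse resolution, set I appears as a loose upward trend, while set IV collapses into a single vertical column plus one distant point. A character grid also hides points that share a cell, a small-scale version of the overcrowding problem, so a proper plotting tool is preferable for real communication.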

This demonstrates the importance of data visualisation in a broad sense, and more specifically shows the power of the commonplace scatterplot.

Emerging late in the nineteenth century, scatterplots are now ubiquitous in the modern data visualisation landscape. Whether a simple monochrome display with two axes, or enhanced through colour, interactivity, motion, or the addition of a third dimension, scatterplots are in widespread use.

Why do scatterplots work?

So, what makes scatterplots so versatile? Scatterplots are remarkably accessible because their interpretation leverages the universal human capacity for pattern recognition. Apophenia is the unprompted awareness of connections and meaningfulness of phenomena.

Such heuristics are evolutionarily vital for making sense of ever-changing, complex visual input that may carry important information about predators, prey or social interactions. A more subtle, but equally pervasive, example of apophenia is the tendency to connect points to find lines, trends, and patterns. Scatterplots convey perceptually simple information, points within a field, which is straightforward to encapsulate neurally and perceptually. The combination of perceptual simplicity and bootstrapping of apophenic tendencies provides what appears, even to lay viewers, as conceptual simplicity and straightforward meaning extraction. This underlies the scatterplot’s appeal for conveying knowledge within, across and beyond disciplinary boundaries.

What do you need to watch out for in using scatterplots to communicate across disciplines and to stakeholders?

  • For cross-disciplinary communication:
    • Be aware of differences in conventions that underpin the data or topic (e.g., in chemistry beta means something very different from beta in psychology).
  • In the context of a single plot:
    • Try to always keep Tufte’s principles of excellence and integrity in data visualisation in mind.
    • Give yourself time to properly generate the plot (too many people leave it to the last minute).
    • Honest mistakes:
      • Too much data/overcrowding points.
      • Trying to say too much at once (multiple groups denoted by size and shape and colour…).
      • Too little (poor axis labels) or too much (caption takes more space than the figure) context.
    • Signs of nefarious intent:
      • Truncated axes without disclosure.
      • Aspect ratio distorted to exaggerate trends.
      • Plotting things which don’t make sense.
  • In the context of the larger communication, if multiple plots:
    • Use a consistent aesthetic across plots (so the eye focuses on meaning, not wondering why the fonts on the axes are different, or the colour scheme has changed).
    • Don’t use too many plots (only important things need a figure; nobody will properly read 10+).

When have you found scatterplots helpful for either obtaining or sharing knowledge? Are there circumstances where they got in the way of information exchange?

References:
Anscombe, F. J. (1973). Graphs in statistical analysis. The American Statistician, 27: 17-21.

Tufte, E. R. (1983). The visual display of quantitative information. Graphics Press: Connecticut, United States of America.

Biography: Erin Walsh PhD is a postdoctoral fellow at the Centre for Research on Ageing, Health and Wellbeing, Research School of Population Health, The Australian National University in Canberra, Australia. She is also a freelance scientific illustrator with over ten years of experience converting scientific ideas, data, and excitement into visual form. Her primary research interest is the impact of blood glucose on the ageing brain, which she investigates with an eclectic cross-disciplinary range of concepts and statistical techniques, spanning the fields of animal biology, psychology, geography, computer science and population health.

Erin Walsh is a member of blog partner PopHealthXchange, which is in the Research School of Population Health at The Australian National University.

One thought on “Scatterplots as an interdisciplinary communication tool”

  1. It was a pleasure to read Erin’s post about the scatterplot: so fundamental that it is introduced to school children as young as ten, and so powerful that it can change people’s view of the world. I’m thinking specifically of the Gapminder phenomenon here, where the “200 years that changed the world” notion has been used so successfully to challenge people’s thinking about the make-up of the developed and developing world and where nations are headed.

    Friendly and Denis have written an authoritative history of the scatterplot, and they note that the term “scatterplot” has been around since about the early 1900s, and was certainly known by the British biostatistician Karl Pearson. Its ability to display the relationship between two quantitative variables at once, compared with, say, the single-variable presentations of the bar chart, pie chart and line graph, may be one of the reasons it came late to the party in terms of statistical graphics. But we are so glad it came! What a difference it has made to our ability to visualise numerical relationships.

    References
    Friendly M, Denis D. The early origins and development of the scatterplot. Journal of the History of the Behavioral Sciences 2005; 41: 103-130. DOI 10.1002/jhbs.20078
