Module 5 Visualizing data

This course will focus heavily upon visualizing data in plots, maps, and dashboards. If there is anything you take from this course, it will be this: you will be able to take data and make some pretty pictures. And that’s not trivial.

Why? Because humans are wired to process information through pictures. We can translate images into meaning with amazing speed.

The value of data viz

This screenshot, from David McCandless’s TED talk about the beauty of data visualization, depicts how quickly each of our senses can process information.

 

 

This plot’s punchline: When we process data with our eyes – with pictures – we can take in a lot of information all at once.

Let’s try this out with an example. Below this paragraph is another paragraph describing a painting. Try this: scroll down quickly, look at this paragraph for just one second, then keep scrolling until it is out of view.

Go!

One-second paragraph:

Fog rises from the evergreen forest of a distance mountain range. A whitewater creek cascades down a streambed with large, rounded boulders, arriving at a broad flatwater pool where ducks are milling. There is one group of four and another group of two. On the shore near the ducks, a wooden dinghy is tied up to a small dock with eight pilings. The dock lead to a path through more round peddles and tall grass, past a chair and a fire ring, and continues uphill to a small cabin. The evening sun and sparse fairweather clouds are reflected in the cabin’s large, multi-paned windows under the small front porch. A cobblestone chimney on the side of the cabin has a whisp of smoke rising from it. The steep roof implies that this cabin is designed to withstand heavy snow. Tall evergreens tower over the diminutive cabin; the cabin seems to be placed up against a forested hillside. There are only a few deciduous trees in view, and their leaf colors – combined with the lack of snow in the distant mountains – imply that the time of year is early fall.

End of paragraph.

OK. Now try to answer these simple questions:

  • What was this a painting of, in general? Can you describe the scene?

  • What details do you recall?

OK. Try this next: the actual painting is at the very end of this chapter. Scroll down to it quickly, look at this painting for just one second, then scroll back up to this spot in the module.

Ready? Go!

 

< Return to this line! >

Now try to answer those same questions above. What was this a painting of? Did you catch any more details? Was there anything in the water? Was there smoke coming out of the chimney? What time of day was it?

Which type of visual information was easier for you to process quickly? Text, or a picture?

Think about the profound differences in these two forms of visual communication:

When we read text, we are working outward, from individual details to the big picture: we process each individual word, understand their individual meanings, understand their meanings in the context of each individual sentence, then use all of the information to step back and imagine the scene based on the details.

In contrast, when we look at a picture, we are working inward, from the big picture down to the details. We understand the scene first, then we start exploring the finer points. And, since each finer point is interpreted from within the context of the bigger picture, we can make sense of the details much more efficiently.

Pictures communicate data. This is why data science and data visualization nearly always go hand in hand.

Data scientists use visualizations both to communicate their insights externally, e.g., to the public in a Twitter post, but also internally: when they are working with the data themselves. A data scientist’s workflow is peppered with data visualization, because – again – visualizing your data is the most effective way of making sense of it: Download the data, then visualize it. Do something to the data, then visualize what you’ve done. Repeat, then visualize, then repeat again.

The point here is that great data visualizations are not simply pretty. Much more importantly, they are effective too. They are the best means you have of conveying insights from your data to someone else.

A final thought: Keep in mind that plots can be effective and misleading at the same time. There is a politics to plots and maps; they can have agendas, and they can manipulate viewers into interpreting the data in certain ways. So, it is incomplete for us to say simply that a good plot is an effective plot. Here’s a better definition: a good plot is one that is both effective and fair.

So, when you are viewing other people’s plots and making plots of your own, keep these five rules in mind:

  1. A bad plot is an ineffective one, even if it is beautiful.

  2. A good plot is an effective plot.

  3. A great plot is one that is both effective and beautiful.

  4. If you ever have to make a trade-off between effectiveness and beauty, sacrifice beauty.

  5. Any plot that misleads or manipulates the viewer is bad, no matter how effective or beautiful it is.

Before you begin evaluating the plots in the gallery below, enjoy this excellent talk by the Egyptian data scientist, David McCandless, about the beauty of data visualization (link here).

 

Chart junk

A few times already, we have referred to the concept of chart junk. This refers to the idea that the best plots are the ones that minimize the ink-to-data ratio. In other words, there should be no extraneous or unnecessary ink on your plot.

The chart junk principle applies to both graphical and tabular representations of data. Which of these tables is easier to read?

 

 

Which of these diagrams is easier to read?

 

 

And what about these basic scatterplots? Which is more effective and elegant?

 

Final thoughts

To get a sense of what can be done with data visualization – and just how enthusiastic data scientists can get about data viz – enjoy this video by the Swedish epidemiologist Hans Rosling, the pioneer of interactive data visualization (link here).

 

 

As you go down the rabbit hole of data visualization, it will be important to become familiar with the work of Edward Tufte, the grandfather of thinking about data viz as an art form. This video showcases Tufte and other data scientists who have been inspired by his work (link here).

 

 

Finally, this video offers a nice and concise summary of Edward Tufte’s principles of data visualization (link here).