Visualizing Data

SDS 192: Introduction to Data Science

Lindsay Poirier
Statistical & Data Sciences, Smith College

Fall 2022

For Today

  • Lab debriefing
  • What is data visualization?
  • Taxonomy of Data Visualizations
  • Visualization Conventions and Critiques
  • Work on Problem Solving Lab in Class

What is data visualization?

  • the translation of information into a graphical format
  • helps analysts summarize and identify patterns across large datasets
  • always involves critical judgment calls on the part of the designer

Elements of data graphics

  • visual cues/aesthetics
  • scale
  • context

Framework drawn from: Yau, Nathan. 2013. Data Points: Visualization That Means Something. 1st edition. Indianapolis, IN: Wiley.

Visual Cues

  • Where is the data positioned on the plot?
  • What is the length of shapes on the plot?
  • How large is the angle between vectors?
  • What shapes/symbols appear on the plot?
  • How much area do shapes take up on a plot?
  • How intense is the color presented on the plot?
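
As a minimal sketch of two of these cues, the Python below maps a numeric variable onto length (and each category onto a position) by drawing text bars. The category names and counts are invented for illustration, and `bar_lengths` is a hypothetical helper, not part of the course materials.

```python
# Hypothetical data: counts per category (made up for illustration).
counts = {"A": 12, "B": 30, "C": 21}


def bar_lengths(values, max_width=30):
    """Map each value onto a bar length proportional to the value."""
    largest = max(values.values())
    return {name: round(v * max_width / largest) for name, v in values.items()}


lengths = bar_lengths(counts)
for name, length in lengths.items():
    # Position cue: one row per category. Length cue: number of '#' marks.
    print(f"{name} | {'#' * length} {counts[name]}")
```

Because each bar's length is proportional to its count, a reader can compare categories by comparing lengths alone.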

Which variables are mapped onto which visual cues?

** This is the last time you will see me use a pie chart in this class!

Scale

  • Linear: Numeric values are evenly spaced on axis.
  • Logarithmic: Each evenly spaced step on axis multiplies the numeric value by the base of the logarithm.
  • Categorical: Categorical values are discretely placed on axis.
  • Ordinal: Categorical values are ordered on axis.
  • Percent: Percentages of a whole are evenly spaced on axis.
  • Time: Date/time values are placed on axis in years, months, days, hours, etc.
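
To illustrate the difference between the first two scales, here is a rough sketch that computes where a value would fall along an axis, as a fraction of the axis length. The function names and the sample values are assumptions chosen for illustration, and the logarithmic version assumes base 10 and strictly positive values.

```python
import math


def linear_position(value, lo, hi):
    """Fraction of the axis length (0 to 1) on a linear scale."""
    return (value - lo) / (hi - lo)


def log_position(value, lo, hi):
    """Fraction of the axis length (0 to 1) on a base-10 logarithmic scale."""
    span = math.log10(hi) - math.log10(lo)
    return (math.log10(value) - math.log10(lo)) / span


values = [1, 10, 100, 1000]  # hypothetical values, each 10 times the last
linear = [linear_position(v, 1, 1000) for v in values]
logged = [log_position(v, 1, 1000) for v in values]

for v, lin, lg in zip(values, linear, logged):
    print(f"{v:>5}  linear: {lin:.3f}  log: {lg:.3f}")
```

On the linear axis the first three values crowd into the bottom tenth of the axis; on the logarithmic axis each tenfold increase takes up the same distance.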

Examples

Context

In every plot you submit for this class, I will be looking for five pieces of context.

  • The data’s unit of observation
  • Variables represented on the plot
  • Filters applied to the data
  • Geographic context of the data
  • Temporal (date/time range) context of the data

Data Visualization Conventions

  • Edward Tufte, an American statistician sometimes considered the “father of data visualization”
  • Introduced the concept of “graphical integrity”
  • How do we present data as honestly as possible?

Lie Factor

  • Lie Factor = (size of effect in graphic)/(size of effect in data)
  • A lie factor of 1 means variations on the graph match variations in the data; the further it is from 1, the more the graph overstates (above 1) or understates (below 1) the effect

Tufte, The Visual Display of Quantitative Information
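
A worked example of the lie factor formula above, as a sketch. It assumes "size of effect" is measured as relative change between two values, and the numbers are hypothetical rather than taken from a real graphic.

```python
def effect_size(first, second):
    """Relative change from the first value to the second (assumed measure)."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """(size of effect in graphic) / (size of effect in data)."""
    return effect_size(graphic_first, graphic_second) / effect_size(
        data_first, data_second
    )


# Hypothetical case: the data rise from 100 to 150 (+50%),
# but the bars are drawn 2 cm and 6 cm tall (+200%).
factor = lie_factor(2.0, 6.0, 100.0, 150.0)
print(factor)  # 4.0
```

Here the graphic shows an effect four times larger than the one in the data.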

Inconsistent Scales

Example from callingbullshit.org

Presenting Data out of Context

Example from mediamatters.org

Disproportionate Data-to-Ink Ratio

  • Ensure that the amount of ink used matches the amount of data presented
  • Data-to-ink ratio = (ink used to represent data)/(ink used to print graphic)
  • Should be as close as possible to 1
  • Another way to think about it: How much of this graph could I erase without losing data?
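
The ratio and the "erase" test can be sketched with a small calculation. The ink amounts below are invented, in arbitrary units, and treating only the data points as data ink is an assumption made for this example.

```python
def data_ink_ratio(data_ink, total_ink):
    """(ink used to represent data) / (ink used to print graphic)."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Hypothetical ink budget for one graphic, in arbitrary units.
ink = {
    "data points": 40,
    "axis lines": 10,
    "gridlines": 30,
    "decorative background": 20,
}
data_ink = ink["data points"]

before = data_ink_ratio(data_ink, sum(ink.values()))

# Erase the elements that can go without losing any data.
erasable = {"gridlines", "decorative background"}
remaining = sum(amount for name, amount in ink.items() if name not in erasable)
after = data_ink_ratio(data_ink, remaining)

print(before, after)  # 0.4 0.8
```

Erasing the gridlines and background moves the ratio from 0.4 to 0.8, closer to 1, with no data lost.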

Deviating from Norms

Example from callingbullshit.org

When can I break convention?