Module 3: Visualizations and Effectiveness (Sep 29-Oct 10)
We will look at strategies for implementing (creating) visualizations and how visualization can be used for exploring data (in addition to communicating stories with it). We will focus on implementation and scale. Students will practice creating and critiquing visualizations.
Introduction
This module, we’ll actually get to making visualizations, so we’ll focus on two aspects that are central: how to “implement” them (what tools to use) and how to deal with the problem that you probably have more stuff (data, complexity, etc.) than you can easily put in front of the viewer.
We’ll be working on making “exploratory” visualizations - quick visualizations designed to help us understand the data - so we can make better “final” visualizations later. This is a common use of visualization: we use it to get a sense of what is in the data, and then do further analysis or design to see if things are really worth talking about and then to present them well.
Implementation is a tricky topic to talk about in the abstract - everyone is likely to use different tools (and the available tools are always changing), but it is hard to talk about tools in general without referring to specific ones. What we’ll try to do is discuss kinds of tools, and then let you look for relevant ones that you might actually use.
We’ll also have a visitor for one of the lectures.
Summary
The parts of the assignment (detailed below) can be done any time during the 2 weeks of the module.
- Reading is in three parts: implementation, scalability, and practical issues. Some of the “reading” is interactive or video, and some of the reading you get to pick yourself.
- There are two design exercises, both to make visualizations with the data sets you saw last week. This will force you to “implement” visualizations (you can use a tool like Tableau or write a program). The topics we are learning about are relevant.
- The content survey is meant to help you connect the different pieces in the module.
- The class survey helps us tune how things work.
Recommended Schedule
- Start early on the design exercises (do #1 first) - this way you can think about what tools you might want to use, and you can read about those tools for the readings.
- The lectures are designed to complement the readings - without an order dependency (there is a bit of redundancy). Either can go first.
- The practical readings should be helpful as you do the assignments.
- The design exercises are designed to be done in order.
- The seek and find will be more relevant after you do the readings (and lectures).
- The surveys are designed to be done at the end of the module after you’ve done the other parts.
Content Survey
While I recommend you do the content survey after completing other parts of the module, you should be aware of the questions (so you can think about them as you do the readings, listen in lecture, and do the design exercises).
Module Learning Outcomes (Goals)
- Select appropriate implementation strategies based on an awareness of a wide range of available approaches and tools.
- Understand the range of strategies for visualization implementation.
- Understand the main tradeoffs in implementation strategies.
- Exposure to examples of the different types of tools.
- Understanding of the different levels of abstraction of visualization tools/toolkits
- Specific awareness of key strategies/tools: D3, grammars
- Describe scalability problems in a broad/abstract way and apply the basic strategies to address them.
- Practice making exploratory visualizations (quick visualizations designed to expose interesting aspects of the data)
Readings
Unfortunately, I don’t have good readings on exploring with visualizations. The readings address two specific aspects: implementation and scalability. There is a third part to the readings: some practical advice that might help you with your design exercises.
Readings Part 1: Implementation
Reading about implementation is hard: everyone is likely to want to use a different tool, and for any tool, the best documentation is a moving target. What I really want to teach you is not any particular tool, but to give you a sense of what’s available and how you might choose amongst them. That’s what we’ll focus on in lecture.
In 2020, I had a guest lecturer for this topic: Prof. Dominik Moritz from CMU. Dominik was a central part of several of the systems/toolkits we’ll learn about. He gave an amazing survey that connected the key ideas from class (abstraction and encodings) to a range of implementation choices.
Remote guest lectures were an upside to online pandemic teaching. This year, you just get to watch the video.
- Dominik Moritz. Dominik Moritz's Guest Lecture. (video from CS765 2020 - in Kaltura Mediaspace). (video)
After that, I want you to read about 2 different visualization “toolkits” that might be relevant for you. I use the term “toolkit” to refer generically to libraries, packages, APIs, etc.
You should pick 2 things that you are likely to use. If you like to program in Python, pick 2 Python libraries. You may not pick Matplotlib.
I’d like you to pick a “high-level” (in terms of level of abstraction) toolkit and a “low-level” one. The concept of level of abstraction should be clearer after the lecture. Of course, you might not know the level of abstraction until after you read about the toolkit. That’s OK - think about how the two relate to each other (“high/low level” can be relative).
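To make “level of abstraction” a bit more concrete, here is a minimal sketch (not taken from any of the readings) that produces the same bar chart two ways using only the Python standard library. The first function writes a declarative specification in the style of Vega-Lite's JSON format: it says what to show (a bar mark, with fields mapped to x and y) and leaves the drawing to the toolkit. The second computes every pixel position by hand and emits SVG rectangles. The data values and the function names `declarative_spec` and `manual_svg` are made up for illustration.

```python
import json

# Hypothetical toy data, invented for this illustration.
data = [
    {"group": "A", "score": 3},
    {"group": "B", "score": 7},
    {"group": "C", "score": 5},
]


def declarative_spec(rows):
    """Higher level: declare the mark and the encodings, nothing else.

    The dictionary follows the shape of a Vega-Lite specification;
    a Vega-Lite renderer would decide scales, axes, and layout.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": "group", "type": "nominal"},
            "y": {"field": "score", "type": "quantitative"},
        },
    }


def manual_svg(rows, width=300, height=150, gap=10):
    """Lower level: we compute every position and size ourselves."""
    max_score = max(r["score"] for r in rows)
    bar_w = (width - gap * (len(rows) + 1)) / len(rows)
    rects = []
    for i, r in enumerate(rows):
        bar_h = height * r["score"] / max_score  # our own linear scale
        x = gap + i * (bar_w + gap)
        y = height - bar_h  # SVG's y axis points down
        rects.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" '
            f'width="{bar_w:.1f}" height="{bar_h:.1f}"/>'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(rects) + "</svg>"
    )


spec_json = json.dumps(declarative_spec(data), indent=2)
svg_text = manual_svg(data)
```

The spec is shorter and easier to change (swap `"bar"` for `"point"` and you have a different chart), but you only get what the toolkit supports. The hand-drawn version has no axes or labels until you write them yourself, but nothing stops you from drawing anything you like. That tradeoff is the thing to look for when you compare your two toolkits.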
If you need some ideas…
For a low level library, I recommend learning about Vega-Lite (or Altair, its Python binding). I recommend learning about it by going through the first 2 “Chapters” of the UW Visualization Curriculum (UW is the other UW, not us). I recommend that you watch the video first (it’s also linked in chapter 1). Chapter 3 is optional, but recommended if you want to really understand or use the tool. Reading the technical paper for Vega-Lite gets at the ideas more directly.
- (optional - but recommended) Vega Lite Tutorial. UW Visualization Curriculum. (url) (video)
- (optional) Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, Jeffrey Heer. Vega-Lite: A Grammar of Interactive Graphics. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis '16), 2017. (web pdf) (url)
D3 is an important low level toolkit in terms of its adoption. It is a very common tool used to make visualizations for the web (in JavaScript).
If you need ideas for a high-level toolkit…
- Plot.ly - high level charting API for Python, R and JavaScript
- Bokeh - Python Graphing Library that provides high- and low-level control
- Seaborn - a Python library that has lots of useful chart types
- HighCharts - a commercial (and industrial grade) graphing library. Not free, so you can’t really use it unless you work at a company. But interesting in the “why do people pay for this when there are free alternatives” sense.
If you want to learn about the “future” of toolkits, look at the Draco system. You should not pick this as one of your “practical” toolkits. But it gives a sense of where research is going. There are already several successors to it.
- (optional) Dominik Moritz, Chenglong Wang, Gregory Nelson, Halden Lin, Adam M. Smith, Bill Howe, Jeffrey Heer. Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco. IEEE Transactions on Visualization and Computer Graphics, (Proc InfoVis 2019), 25(1). (doi) (url)
Readings Part 2: Scalability
A big challenge in exploring is that we almost always have too much “stuff”. Even in the data sets you have to work with in this module, you will need some strategies for working with more data than you can show at once. For now, we’ll learn some basic strategies to help think about what to do.
- (required) Michael Gleicher. Considerations for Visualizing Comparisons. IEEE Transactions on Visualization and Computer Graphics (Proc. InfoVis '17), 2018. (doi) (url) (Summary)
- For this one, the summary may be sufficient. But hopefully, that makes you want to read the whole paper. The scalability strategies (the second of the three threes) are the most applicable, but the whole thing should help you think about designing visualizations the way I like to think about it.
- (required) Tamara Munzner. Reduce Items and Dimensions. Chapter 13 from Munzner's Visualization Analysis & Design. (Canvas File) (video) (UW Library)
- (required) Tamara Munzner. Embed: Focus+Context. Chapter 14 from Munzner's Visualization Analysis & Design. (Canvas File) (video) (UW Library)
- (optional) Sarikaya, Gleicher and Szafir. Design Factors for Summary Visualization in Visual Analytics. Computer Graphics Forum 37(3) (Proceedings EuroVis 2018). (doi) (url)
- This will make more sense after reading the summary of the comparisons paper. It is a survey of examples of different ways visualizations create summaries.
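As a small illustration of the kind of reduction the scalability readings above discuss, here is a minimal sketch of two common moves for dealing with “too much stuff”: filtering (show fewer items) and aggregation (replace many items with a summary). It is not code from any of the readings. The records are hypothetical toy values, and `filter_items` and `aggregate_by` are names invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration: (city, month, temperature).
records = [
    ("Madison", 1, -7.0), ("Madison", 7, 22.0),
    ("Seattle", 1, 5.0), ("Seattle", 7, 19.0),
    ("Austin", 1, 11.0), ("Austin", 7, 30.0),
]


def filter_items(rows, keep):
    """Filtering: keep only the rows for which keep(row) is true."""
    return [r for r in rows if keep(r)]


def aggregate_by(rows, key, value):
    """Aggregation: group rows by key(row) and average value(row) per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[key(r)].append(value(r))
    return {k: mean(v) for k, v in groups.items()}


# Fewer items: look at a single month.
july_only = filter_items(records, lambda r: r[1] == 7)

# Fewer marks: one average per city instead of one mark per record.
mean_by_city = aggregate_by(records, key=lambda r: r[0], value=lambda r: r[2])
```

Either result is small enough to chart directly, but each hides something: the filter drops the other months entirely, and the average hides the spread within each city. Part of exploring is noticing what a given reduction makes invisible.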
Readings Part 3: Practical Help
Here are some readings from Enrico Bertini’s class that I think give some very practical help to connect the concepts we’re learning to the process of creating and exploring with visualizations. These are optional, but might be helpful.
- (optional) Enrico Bertini. Shape the Data, Shape the Thinking. Fell in Love With Data Substack Posting. (url)
- (optional) Enrico Bertini. Shape the Data, Shape the Thinking #1: Selection and Aggregation. Fell in Love With Data Substack Posting. (url)
- (optional) Enrico Bertini. Shape the Data, Shape the Thinking #2: Visualizing Statistical Aggregations. Fell in Love With Data Substack Posting. (url)
- (optional) Enrico Bertini. Shape the Data, Shape the Thinking #3: Data Filtering and its Visual Effects. Fell in Love With Data Substack Posting. (url)
- (optional) Enrico Bertini. Shape the Data, Shape the Thinking #4: Granularity and Visual Patterns. Fell in Love With Data Substack Posting. (url)
Titles are important (and required for your visualizations!). These might help you appreciate them (and make better ones):
- (optional) Enrico Bertini. Titles in Data Visualization: Empirical Evidence. Fell in Love With Data Substack Posting. (url)
- (optional) Enrico Bertini. Data Visualization Titles: A Taxonomy. Fell in Love With Data Substack Posting. (url)
Lecture Plan
- Monday 1 (Sep 29) - Implementation Overview
- Wednesday 1 (Oct 1) - “Too Much Stuff” - how to think about scalability. We’ll motivate the questions with an in-class exercise.
- Monday 2 (Oct 6) - “Guest Lecture” - Prof. Remco Chang from Tufts University will visit class. He has an in-class exercise he’ll do, and will talk about some of his work.
- Wednesday 2 (Oct 8) - More on Scalability. We might also do some critiques of old student assignments in preparation for upcoming design exercises.
Assignments
All assignments are due at the end of the module.