How to Vis
How to Visualize
This document serves three purposes – that are not independent:
- It will give you my philosophy about how to do visualization
- It will give you a sense of why the Vis class is the way that it is – and suggest what you might expect to learn over the course of the semester.
- If you aren’t going to take the class, it will give you a sense of what I might do if I try to help you with your visualization problems, or what you should try to do if you go at it on your own.
You might think of this as the whole semester condensed into a single blog posting. Or a crash course in visualization.
This is adapted from my usual first lecture in class – except that it doesn’t have as many fun example pictures. However, since lots of people don’t get to see that lecture, this tries to get the main point across. And even if you do get to see the lecture, it’s a reminder of the steps of how to do visualization.
I was going to title this posting “All I ever really needed to know (to do Visualization) I (could have) learned in Tamara Munzner’s Nested Model Paper” – as a play on the title of the book “All I Ever Really Needed to Know I Learned in Kindergarten,” but I haven’t actually read that book, and the humor is lost if you haven’t heard of it. And explaining how the philosophy of the nested model morphs into my process below is a story not worth telling. The ideas of the nested model paper form the backbone of her book, which was also highly influential in my thinking (and the class).
There is another “What is this class and Why” posting that covers some of the same ideas, but focuses on how the ideas of visualization in this posting are taught in the class. This posting is also a bit redundant with my typical first class lecture. This document is an update of my 2015 “How To Do Visualization” posting.
What is Visualization Anyway?
It’s surprisingly hard to define visualization, but probably not important.
The more interesting question is to define what is a good visualization.
Here’s a rough definition of visualization that is purposefully broad, but surprisingly good enough:
Visualization: a picture(1) that helps someone(2) do something(3).
The picture part is hard actually, since a visualization may not be a picture in a traditional sense – it can be anything you look at. For example:
A physical object that you look at can be a visualization (like the blocks of snow or Lego model). Or a visualization could be an animation, or something interactive. You might argue that we should relax the “look at” and bring other senses to bear (e.g., auralization to communicate data via sounds). However, while there are similarities between vision and other senses, there are enough differences that I think it’s best to focus on visual things (things we see) for this discussion (and the class).
The important part of the definition is that it helps someone do something. What makes a picture a visualization is a sense of purpose: it’s going to be used for something.
The more specific term “data visualization” is challenging since almost any visualization has some data in it. I’m not going to try to define “data visualization”.
Example: Architectural visualization is often not considered data visualization. However, there is “data” (information about the building being visualized). It is a visualization since it is a picture (or an interactive system that generates pictures) that helps someone do something: in this case, it helps an architect or their client get a sense of what a designed building will look like before it is actually built.
A good thing about my definition of visualization is that it focuses on this sense of “task” – the picture is meant to do something, so we should think about what it is trying to do to make sure it really can help someone do the thing it’s meant to do.
The definition doesn’t necessarily say that the visualization succeeds at helping someone do something. We can certainly have bad visualizations that don’t help. Effective visualizations are pictures that really do help their intended audience achieve the task.
Making visualizations isn’t hard. Making good visualizations is hard.
Aside: In Munzner’s book, she defines visualization as being “designed to be effective.” In my mind, she is defining good visualizations – bad visualizations might not be effective, or might not be designed.
The goal of this document/class is to teach you how to design/create good visualizations. With the emphasis on the “good” part – making bad visualizations doesn’t have to be hard, and is probably not worth the effort.
There are a few messages in this:
- A core of this class will be understanding what makes for a good visualization, and what we can do to design them.
- Figuring out what good visualization to make (designing it) is important: we don’t want to waste our time implementing bad visualizations.
- Understanding the principles and process of visualization can help us figure out what visualizations will be good before we invest too much energy in making them.
- Generating ideas for visualization and making sure they are good (and will lead to good designs when they are fully implemented) is my preferred approach. Finding ways to “prototype” ideas so we can assess them before investing too much energy is important.
- Implementing the design once you have it is not a focus in this class. It is a detail. A sometimes challenging detail. And it is definitely a practical concern: a great design isn’t of much value if you can’t make it real.
Implementations can take many forms. I’m not going to suggest you make snow sculptures (like the pictures above), but maybe prototyping with Legos (picture above) is a way to try things out. Your choice of implementation strategy is almost always dictated by practical issues (where you need to show your visualization, what tools are available, …). The appropriate tools change quickly. The principles of choosing what to make with them do not.
A side effect of this is that we are not going to focus on programming. In fact, later in this document, I’ll argue that the goal should be to avoid programming. If you can get by with existing tools, you should.
What are good visualizations?
To make a good visualization, we need to decide what a good visualization is. And then we can consider a process to make them.
Defining “good” visualizations will be a major topic in this class. Evaluation considers how we decide if a visualization is good or not. At a high level, the definition of visualization provides an answer:
A good visualization is one that effectively serves its intended purpose (helping the audience do the thing the visualization was meant to help them do).
Exactly how to measure whether a visualization does what it needs to do is more challenging, and is a topic we’ll come back to.
One important and useful technique to assess a visualization (or just about anything) is critique. Critique is the “standard” design practice of looking at something carefully and discussing it. Critique is a really useful assessment approach because we can apply it to existing designs (e.g. created by others) to learn from them, or to our own ideas. It can be applied to finished designs, or rough ideas. Learning to critique is a valuable skill for all design – and it’s not something that is typically focused on in CS education.
Critique will be a key technique in this class.
Teaching students to do critique (via lots of practice) is a key component of the class.
Note that a good visualization doesn’t have to be fancy – it has to be effective / get the job done. In fact, using a standard design is often desirable: you don’t need to teach people how to use a new design, and you can probably find an existing implementation.
Here’s my favorite analogy. You go to the doctor’s office because you feel sick. The last thing you want to hear is “that’s a novel and interesting problem! we need to devise a novel treatment. let’s write a grant proposal and hire some research assistants…” No, you want to hear “I’ve seen that before. No problem. Take two aspirin and see me in the morning.”
As visualization practitioners, our goal is to be able to look at a problem and make those kinds of prescriptions. The task identification and abstraction are key here. It’s how we can say “I’ve seen that before” and get to “take two scatterplots and see me in the morning.”
How to make a good Visualization?
Here is my three-step recipe:
1. Why are you making this visualization? Who are you trying to help? What are you trying to help them do? I refer to the latter as the “task” – and it’s usually more important than the who part.
2. What data are you trying to use to achieve this task?
3. How are you going to use the data to help achieve the task?
I split question 3 into two parts. There’s a planning part, and a part where you realize that plan. This leads to the four-step recipe:
- Task
- Data / Resources
- Design
- Details
In the ideal world, you start at the top, and work your way down through the list.
The steps are iterative: at the end of each step (ideally) you do some evaluation (e.g., critique) and maybe go back to a previous step.
Sometimes the steps don’t happen in order. For example, you really want to use a particular tool, try out a new algorithm, or make things a particular color, so you go looking for something to make with these details.
Sometimes, the process seems to start with #2 (Data): one gets some data and needs to figure out what to do with it. But this is actually an initial task: find what is interesting in the data. Often there is an iterative cycle – as the designer understands the data more, they can refine the task.
In a little more detail
- Task – understand the purpose of the visualization. Who is it meant to help? What is it meant to help them do?
- Data – what resources are available to help achieve the task? The main thing is data.
- Design – what is the strategy for mapping the data into something visual?
- Details – how will you make this strategy into a specific picture / system that produces pictures? What are the specific choices (e.g., colors, implementation, …)
You may notice that this parallels Tamara Munzner’s nested model for validation. (It’s discussed in her book, but was a great paper first) I think in terms of visualization design, not just validation (but evaluation is so important to design that it might not matter), so I changed the layers a bit.
If all goes according to plan, you’ll understand these 4 steps in the first few weeks of class.
How do we think about tasks and data?
Visualizations help someone do something for some reason (who, what, why).
The better that you understand what the visualization is trying to achieve (what will it help the person do), the more likely you will come up with a good solution. In the end, everything serves the tasks.
Note the plural: you may have a set of tasks. Often, there isn’t just one at a time. There is a set of things that a set of someones may want to do for a set of reasons. And maybe your solution will address many of these.
I was going to say “it starts with the tasks,” but sometimes you start someplace else (like you have some data and say “I’d like to do something with it” – but even then, I would probably say you have a task: figure out what the right questions to ask are!). However, in those cases, it’s really important to remember that task is key: the sooner you get to “what is this thing going to do for someone,” the better off you are.
This is also not to say that you need to fully understand the task at the beginning. Sometimes, your understanding of the task is hazy, or changes as you learn more (from later stages).
Task is an informal, fuzzy notion. It doesn’t always get explicitly written down or defined. But the clearer you are about it, the better off everything else will be. You can’t succeed unless you have something to succeed at.
One other detail on task: there is a range of kinds of tasks. There are abstract tasks and concrete application tasks. This is actually a spectrum/continuum.
While task is the most central thing, it’s also hard to talk about. We lack good, rigorous ways to talk about it. For the longest time, it meant that it didn’t get discussed enough (in the literature, in my class, in my work, …). The fact that it is hard shouldn’t get in the way of us trying to get better at thinking about it. We particularly lack good ways to talk about different levels of task abstraction.
Where I start…
When I talk to a new (potential) domain collaborator, I always start with the question “tell me about your science.” I want to know the big picture (the why) – because without it, it’s hard to have context.
My first goal is to identify the problem that needs to be solved – it won’t help anyone if we solve the wrong problem.
Usually people come thinking they want specific help – they want to start with the data, or worse, with the way they are looking at their data (can you make a better chart for me? not without understanding what you are trying to do, so I know what “better” means!). We will get to that, but I think it’s important to identify the task first.
I’ll stress this: if you want to be a visualization scientist (or more generally, a data scientist or computer scientist), one of the best skills you can have is to be able to help people identify their problems. I think it’s hard for people to identify their problems. Part of this is that people get so caught up in the details, that they lose sight of the big picture. Or that they are so set in how they do things that they lose the ability to imagine alternatives.
And, as computer scientists (and/or mathematicians), we have a secret weapon: abstraction. This is something that we value/stress much more than other disciplines. For this task phase of visualization, abstraction is a key tool. If we can recognize the abstract task of which the real problem is an instance, the path to solving it becomes much clearer.
How do we make a design?
A design is the plan for how you are going to turn the data into a “picture” that helps with the task. This is why it’s so important to understand task and data before trying to make a design.
Once you know your task and your data, you can try to design a solution. I say “design” to explicitly separate the act of coming up with the idea from actually building it (implementation). Design is the act of making conscious choices to solve a problem. (Defining design is a whole philosophical debate – but that definition is one I like, and will work with for the moment.)
In terms of the class, a big part of what we’ll do is focus on design. What are the choices you can make, and how can you make good choices.
There are four main categories of things that we consider in designing a visualization. You can think of these as the kinds of choices you can make, or the kinds of building blocks you can build a visualization out of. I sometimes think of these like moves in a turn-based game: at each step I pick one of these things to add (or change, if I am doing redesign).
- Data Transformations – we compute some derived thing about the data that will be useful in one of the other steps
- Layout – we decide where things go. Technically, this is a position encoding (see encodings below), but position is such an important thing that it gets its own special category.
- Encodings – an encoding is how we choose to map a data variable to some “visual variable” (an attribute of what we see – like color). Position is a visual variable, but it’s special enough that it becomes its own category (see layout).
- Interaction – taking user input is another thing you can do in a visualization. Often, input can be thought of as mapping input actions to changes in the visualization.
For a simple example of applying these four steps see “A Simple Example: 4 Design Moves.”
Almost everything we do in designing a visualization turns out to be making one of those 4 kinds of choices. Almost every visualization can be thought of in terms of these 4 building blocks.
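To make the four kinds of choices concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy `CITIES` data, the `Mark` record, the `linear_scale` helper, and the `design` function are hypothetical names, not part of any tool used in class. It only shows one way a data transformation, a layout, an encoding, and an interaction could each appear as a separate decision.

```python
from dataclasses import dataclass

# Hypothetical toy data: (city, population in thousands, area in km^2).
CITIES = [
    ("Avon", 120, 40),
    ("Brook", 900, 150),
    ("Cedar", 300, 200),
]


@dataclass
class Mark:
    """One visual item: where it goes and what it looks like."""
    label: str
    x: float    # horizontal position in pixels (a layout choice)
    color: str  # fill color (an encoding choice)


def linear_scale(value, lo, hi, out_lo, out_hi):
    """Map value from the range [lo, hi] linearly onto [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    t = (value - lo) / (hi - lo)
    return out_lo + t * (out_hi - out_lo)


def design(rows, min_density=0.0, width=400):
    """Turn data rows into marks by making the four kinds of choices."""
    # Data transformation: derive density (thousand people per km^2).
    derived = [(name, pop, pop / area) for name, pop, area in rows]

    # Interaction: a user-controlled threshold changes what is shown.
    shown = [row for row in derived if row[2] >= min_density]
    if not shown:
        return []

    lo = min(density for _, _, density in shown)
    hi = max(density for _, _, density in shown)
    marks = []
    for name, pop, density in shown:
        # Layout: density decides the horizontal position.
        x = linear_scale(density, lo, hi, 0, width)
        # Encoding: population is mapped to color (two bins, arbitrary cutoff).
        color = "darkred" if pop >= 500 else "lightgray"
        marks.append(Mark(name, x, color))
    return marks


if __name__ == "__main__":
    for mark in design(CITIES, min_density=2.0):
        print(mark)
```

Changing any one choice (a different derived value, a different position mapping, a different color rule, a different input) yields a different design from the same data, which is the sense in which these are "moves" you can make one at a time.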
I find this list to be a useful way to organize the larger list of more specific things you might do. Most things fit into one category or another. I won’t waste time arguing this is the best categorization – but it’s good enough to give you a sense of the kinds of things that you can think about.
We’ll learn how to choose these different components, and use them together. We will look at visualizations and try to understand them in terms of these four components. We’ll think about redesigning visualizations by changing the choices. We’ll try to develop a sense of how to map tasks and user goals onto these kinds of choices.
How do we make good choices for design?
Creating a visualization is about making those choices for a design so that the result is effective for the task… but how can you choose wisely?
Part of it is trial and error. Sorry. But, this is why we emphasize prototype and critique so much.
But there are things we can use that can hopefully help us make better choices. Some examples (which are, of course, things we’ll study in class):
- Principles of Design – General ideas on how to make things that are “nice” visually and communicate effectively. These principles are the same if you’re designing a visualization, a web page, your resume, … – so they are good principles to learn!
- Principles of Visualization – Over time, people in the field have gotten some ideas about what works and what doesn’t. Sometimes, this folklore is made up and may not be true. Other times, it comes from experience or has been proven by experiments.
- Principles of Perception – Understanding how people see (as in how the visual system works and how the brain interprets images) provides a lot of useful clues as to what designs will (and won’t) work.
- Examples – Looking at existing examples – both good and bad – can help us. Sometimes, we can gain intuitions so we can make new designs. Other times, standard solutions provide us with answers, or at least a starting point.
But what about implementation?
Actually realizing the design is the last part. Well, not really, since usually the process of making a visualization is iterative: once you make something, you learn from it, and refine some of your earlier work, and try again.
If you were thinking “this is a CS class, we should focus on implementation,” you will be disappointed. As I’ve said, this class is more about how to figure out what the right picture to make is (e.g. the design) than how to make it. It’s a waste of energy to spend time making the wrong picture.
In the ideal world, you can think about implementation last – it’s an afterthought. In practice, the constraints of having to implement things will probably influence the kinds of designs you will want to consider. A design becomes less attractive if it’s too hard to build. In practice, there’s often a tradeoff between the practical issues of implementation and having the best design.
Even within implementation, there is a spectrum of levels. I like to think of this as “fidelity of prototypes.” In a sense, you can think of a back-of-the-napkin sketch as an implementation of a design. Most likely an incomplete, non-final one, but a concrete instantiation. It might be a good enough implementation that you can evaluate your design and decide if you want to pursue the design further (and make a higher-fidelity prototype). If you’re lucky, a crude prototype might just solve the actual problem.
One thing I like to stress is the importance of prototyping to explore designs. It’s best to try out lots of ideas, and see if you can figure out their problems before investing a lot in implementing them. Good “Designers” (graphic designers, industrial designers, …) usually like to explore an entire space of designs – by using very crude “implementations” (e.g. sketches).
Data analysis tools – things like Excel (yes, Excel will turn out to be one of my favorite visualization tools) or Tableau or … – often let you prototype lots of different things with your data. This “playing” with data – re-ordering it, making various kinds of pictures with it, looking at it all kinds of different ways – is actually a form of rapid prototyping. You can explore a lot of designs easily – often to decide that they don’t solve your problem – but sometimes to see that some of the simple elements actually can help. This “playing with data” (if you can do it) is a lot like sketching a lot of visual designs.
Having a good toolbox so that you can implement your designs is useful. If you don’t have one, you will be limited in what designs you can explore, and won’t be able to choose designs that you can’t realize (that’s not quite true: if you can come up with a great design, you may be able to get someone else to implement it). Part of my premise for this class (or at least this instantiation of it) is that we can all have different toolboxes – some students might be wizard programmers, some might be fabulous artists – but we all can have some common basic tools (e.g. sketching), and we can all explore designs using our respective toolboxes.
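If a spreadsheet isn’t handy, the same kind of “playing” can be done with a few lines of code. The following is a rough sketch, not a recommended tool: the `text_bars` helper and the `SALES` numbers are made up for illustration, and it assumes non-negative values. It draws a throwaway text bar chart so you can try different orderings of the same data before committing to a real implementation.

```python
def text_bars(rows, width=20, fill="#"):
    """Return one text line per (label, value) row.

    Bar length is proportional to the value, with the largest value
    filling the full width. Assumes values are non-negative.
    """
    if not rows:
        return []
    top = max(value for _, value in rows)
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, value in rows:
        n = round(width * value / top) if top > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {fill * n} {value}")
    return lines


# Hypothetical data, in the order it happened to arrive.
SALES = [("north", 12), ("south", 30), ("east", 7), ("west", 21)]

if __name__ == "__main__":
    print("As given:")
    print("\n".join(text_bars(SALES)))

    # Re-ordering is a cheap design variation: does sorting by value
    # make the comparison easier for the task at hand?
    print("Sorted by value:")
    by_value = sorted(SALES, key=lambda row: row[1], reverse=True)
    print("\n".join(text_bars(by_value)))
```

The point is not the chart itself but how cheap each variation is: a crude picture like this is often enough to decide whether an idea deserves a higher-fidelity prototype.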
Now, if you’re saying “but I want visualization to be about writing fancy programs using complex data analysis methods and algorithms and spiffy programming things …” let me give you a bit of caution.
Building a custom visualization solution by programming should be a last resort. You should really believe that your problem cannot be solved by some easier method. Going back to the medical analogy, writing a program for a new design is like inventing a completely new (and therefore untested) treatment. Yes, if your patient has a mysterious disease and is going to die you want to take these drastic measures. Or, you might do an experiment if you believe that you can afford the risk on this patient in order to learn something to save the next ones (this is the excuse we use as researchers).
That said, all too often there are other factors that make us want to take the extreme measure. Sometimes, we just want to practice our inventive skills. Sometimes our “customers” think they want to have something novel (don’t make it look too easy!). Sometimes we really want to try out some implementation idea, or show off some challenging design idea. And sometimes, it might just be easier to re-implement a standard design than to figure out how to make an “easy” tool do what we want. (you’d be amazed how often I’ve found myself writing Python code for scatterplots because I wasn’t in the mood to wrestle with Excel). Sometimes, it’s hard to find a decent “easy” tool for something that should be easy (like graph layout).
So what do we do?
After that, you can guess what the topics of the class should be. But, to be explicit, here’s a list of what we did in the Spring of 2017 (this year we’ll follow a similar, but not identical, plan).
- (2017:2) Understand What is Visualization
We will try to get a better sense of the broad range of what visualization is.
- (2017:3) Understand why to use visualization
This gets at that notion of task, and sets up for the notion of evaluation.
- (2017:4) Discuss strategies for Evaluation
There are many different ways to assess if a visualization is good. And since good visualization is our goal, knowing how to measure “good” will be important.
- (2017:4?) Critique Skills
Critique is one method for evaluation – that we will use extensively throughout class.
I believe that it is best learned by practice, so we will do it a lot. But at the beginning of class we’ll take some time to develop critique skills.
- Design School
I can’t fit 4 years of an art degree into one lecture. But we can try to get a little better at doing design.
- (2017:5) Understand Data and Task Abstractions
Abstraction is the key way that we use to talk about tasks and data.
- (2017:?) Visualization Principles
There are some basic good ideas.
- (2017:6) Understanding Encodings and Standard Designs
Encodings are the basic building blocks of visualizations.
We’ll use these as a way to look at how the standard designs can be broken apart.
- (2017:7) Understanding Perception
A good source of principles and design ideas is the science of perception.
We’ll try to learn a bit about how we see, and what this might mean in terms of the design of visualizations.
- (2017:8) Color
How we see color, and what this means in terms of how to use color effectively in visualization, will turn out to be a big topic.
- (2017:9) Interaction
Interaction is a key tool in creating effective visualizations. We’ll try to understand how to use it effectively. Interaction is (often) tricky to implement and prototype.
- (2017:11) Implementation
We’ll talk about basic strategies for implementation – types of tools and how to choose. We’ll talk about some specific tools as examples, but more in terms of understanding the kinds of tools and toolkits (and how to choose between them) than the specifics of any particular tool.
- (2017:?) Implementation: Specific Tools in Depth
I hate to spend much time on any specific tool.
But inevitably, you may want to make a visualization with something more than a sketch, so getting exposed to some common tools is helpful. Even if you choose to use different ones.
- (2017:12) Multi-Variate
We’ll look at common strategies for the challenging cases of multi-variate data.
- (2017:13) Dealing with Scale
Having “too much data” is a common reason why visualization becomes hard.
We’ll look at some common approaches to deal with it.
- (2017:10) Graph and Network Data
Graphs (in the CS/Math sense of a network of connected things) are a common kind of data, and offer some important challenges.
- (2017:15) Scientific Visualization
There are some common types of data that come up in science and engineering applications.
- (2017:15) 3D
- Presentations – sadly, we usually run out of time before getting to this
- Animation – sadly, we usually run out of time before getting to this