What Is This Class and Why?
This class is a principles-and-design-oriented exploration of Data Visualization. This posting will explain the content, style and rationale for the class. The class is different from most other CS classes.
Summary
This class focuses on the principles of Visualization, with a focus on design. It does not focus on implementation. If you want to learn about implementation, you may be disappointed.
The class focuses on the basics/foundations, not the fancy stuff.
To learn, we’ll do a lot of small activities. There will be a lot of reading and writing (for a CS class). We’ll do many small design exercises (in class and at home). We’ll practice “critique” (looking carefully at examples). The small assignments will build into bigger “mini-projects”, but this isn’t a “single big project class”.
Backstory (Why you should read this)
Why do I feel compelled to write this? (and why is it worth your time to read it?)
This class is different from most other CS classes. I want to set expectations in the right place. If you (a student) are expecting something else, you may be disappointed. This class might not be for you.
I want to explain the rationale for why this class is the way that it is. There is a method to the madness. I believe that if students appreciate why I am teaching the way I am teaching, they are more likely to trust me and learn.
This is a way to introduce the components of the class. The class has a lot of small pieces.
This is actually a way for me to make some of the foundational (content) points of the class. Subversively, there is some actual content here.
Each past class, I’ve written a “what is this class and why” (see the 2022 version), and I made a point of keeping it consistent from year to year. But this year, I decided to revise it because the message was getting muted.
What is Data Visualization?
Hopefully, you have some idea what data visualization is - otherwise, why would you consider taking a class about it?
I define visualization in a very broad way (I’ve moved that to a separate document). My definition is:
Visualization: a picture(*) that helps someone do something(**).
Short version (I’ll have a lot more to say about this - see What is Visualization? for a start): By picture(*), I mean something to look at; it could be a figure in a paper, an image on a web page, an interactive gadget on a dashboard, a video, or even a sculpture. The “helps someone do something”(**) gets at the notion that there needs to be a task or purpose that we are trying to achieve. If there’s no task, it’s art, not vis.
Maybe I’ll give a more conventional definition for the purposes of getting started…
Data Visualization is the practice of creating images that communicate information to achieve a goal.
These definitions imply that the important thing is that the “pictures” we make communicate effectively (to meet their goal or help the viewer achieve their task). The emphasis is on communicating effectively.
The things you probably think of as data visualizations (bar charts, line graphs, choropleth maps, …) can fit these definitions. The trick is that for them to be “good” they need to communicate effectively.
So What Is This Class?
Here’s an oversimplified version (from the advertisements from old editions of the class):
This class is more about what pictures to make to understand data than about how to make them. We will spend a lot of time understanding design principles. We will not spend lots of time talking about how to program visualizations, or how to use tools to make visualizations.
A point to emphasize: this class is not about implementation. The goal is to help you figure out what the right picture is. The details of how to make the picture are, well, details.
This class is about visualization principles - how visualizations “work” (communicate effectively). We start with thinking in terms of the goals (what we’re trying to communicate). We’ll learn to think about how visualizations are assembled from basic building blocks, which lets us understand what each design makes easy to see. We’ll look at common patterns for addressing problems like scale.
To teach this, the class is more like an art/design class than a typical computer science class. We’ll read and discuss. We’ll do critique - look at examples carefully to learn from them. We’ll practice with small exercises (often pen and paper). We’ll also try some slightly larger “mini-projects”.
This class involves lectures, in-class design exercises (pen and paper), readings, on-line discussion, and some “Design Challenges” (or mini-projects). If you think that a graduate CS class should be a big programming project and not much else, this class is not for you.
In the past, I’ve tried to move away from traditional lectures (Professor monologues) to use class time for in-class exercises where we work together to practice. This year, that might not be possible because we’re stuck in an inappropriate room (see below).
What this class is not:
We won’t spend a lot of time talking about “computational issues” or implementation. There are “design challenges” where you will make visualizations (possibly by programming), but this isn’t the focus of the class.
We will not spend lots of time looking at specific visualization tools or implementation methods (e.g., libraries or frameworks). We will point you at some to help you learn about the concepts, but mastering a particular tool is not a learning goal of the class.
We do not let students “bring their own data” to class. This class isn’t just to help you solve the current visualization problem you have. It’s to teach you the principles you need to design a solution to problems in the future.
Why focus on “Visualization Principles” not implementation?
The principles of good visualization apply to everyone. For each person, the appropriate tools and development process may be different.
The principles of good visualization are constant and unchanging (although, our understanding of them is improving). The tools for implementing visualizations change continually.
One of the premises of this class is that implementation is a detail (I will explain this in class). Implementation may be a big and important detail, but it is just a detail - it doesn’t matter if you don’t get the design right. One lesson in the class: don’t bother solving the wrong problem well. It’s a waste of time to make a great implementation of an inappropriate or ineffective design.
The skills for thinking about visualization principles (design, critique, task-oriented analysis, abstraction, …) are generally useful for many things. The skills for implementing visualizations are pretty specific.
Over the years, I think I’ve learned to teach people the principles. For many of the implementation skills (e.g., web development), many of you either already know a lot more than me (i.e., you are up-to-date web developers) or aren’t at a place on the learning curve where a class like this will help (i.e., you need to learn basic web programming skills first).
Why focus on Basics/Foundations and not Fancy Stuff?
First, a lot of the best cutting edge research is building a better understanding of the foundations.
Second, understanding the foundations is the basis for doing fancy stuff.
Third, and maybe most important, effectively using the basic stuff and foundations is usually what you need. Fancy stuff should be a last resort!
Visualization’s goal is to solve people’s problems. Sometimes, that requires inventing a novel and complicated visualization. Other times, it might mean applying some simple, off-the-shelf solution.
Here is my favorite analogy. You go to the doctor’s office because you feel sick. The last thing you want to hear is “That’s a novel and interesting problem! We need to devise a novel treatment. Let’s write a grant proposal and hire some research assistants…” No, you want to hear “I’ve seen that before. No problem. Take two aspirin and see me in the morning.”
As visualization practitioners, our goal is to be able to look at a problem and make those kinds of prescriptions. The foundations (e.g., task and data abstractions) are key here. It’s how we can say “I’ve seen that before” and get to “take two scatterplots and see me in the morning.”
A standard design (like a scatterplot or line chart) can be really effective in many situations. And if a standard design can be effective, there are lots of good reasons to prefer it. For example, standard designs are familiar to the viewer and you probably don’t need to reinvent the implementation. The key is to be able to identify when a standard design is effective and how to use it appropriately. You’ll need to make similar choices in inventing a novel design.
This class is, admittedly, not for everyone
This class will teach you how to design good visualizations, and will teach you a little about the choices you have in how to make them. If you learn how to use some tools for making visualizations (Tableau, D3, R, matplotlib, etc.) you will know what to make with those tools!
There are good resources for learning about specific tools. And in class, you may meet others who are also trying to learn these tools.
The focus on the non-technical elements (design, perception, design methodologies, etc.) makes this different from the standard CS class.
If you dislike classes where you have to attend class and do lots of small regular assignments each week, this class may not be for you.
But, I don’t care about general principles – I just want to visualize my data
Sorry, we generally don’t let students “bring their own data” to class, for a number of reasons.
The principles you will learn will help you work with your own data – but for learning in class, it’s best that we work on data that everyone has access to, and that we believe is the right level of challenge.
What are we going to do?
We’ll learn about visualization by:
- Reading (especially to learn about the principles).
- Discussing (often written, for practical reasons) - since this is the best way to force students to think about what they’ve heard/read.
- Critiquing - this means looking at examples and trying to learn from them.
- Practice - we’ll actually try to make visualizations. This will range from quick in-class exercises (with pen and paper), to longer design challenges where you will be expected to make things yourself.
You can look at the course web to get a sense of what’s planned (although, not everything is in place yet). You can look at the web from the previous class offering to get a sense of what this class has been like, but be warned…
What’s different this year?
This year, some things will be different than previous years.
There are four driving factors:
The room: For the past several years, I taught the class in a “collaboration classroom” that enabled us to use class time for collaborative work. It was great for in-class assignments, and not-so-great for traditional lectures (which discouraged me from doing them). This year, we are in a traditional lecture room with fixed seating. (We’ll see how it works for in-class group work.)
The students: Obviously, each class is a different set of students. But this year, there may be more of you (in the past we were limited by a smaller room). Historically, we had a large range of backgrounds in class; but this year it is likely to be mainly CS (or CS-adjacent) students.
Me: I am coming back from a sabbatical. Part of my sabbatical was working in industry (at Amazon Robotics) to see how Visualization is done “in the real world.” Part of my sabbatical was thinking about Visualization, and how we should be thinking about it (a bit meta).
Experience: Each time I teach the class, I think “this is how it could be better”. I have some things that I really want to fix. There are some things that work that I really want to preserve.
The difference in the room means the previous mix of class (fewer lectures, more in-class exercises) may not work - we’ll need to experiment. I was hoping to do less lecturing (replace that with readings and videos) and more “interactive” class activities. Unfortunately, those may not work in the traditional lecture hall this class is assigned to.
The number of students means that many of the mechanisms for feedback and evaluation won’t work. There isn’t sufficient staff for us to look at every assignment and provide meaningful responses. Assessment has always been a problem in this class.
I want to update the content of the class a bit based on my new thinking. However, realistically I might need to be less ambitious.
How will we make visualizations?
There is a learn-by-doing-and-practicing component to the class, which means we need to use tools to make visualizations. I try to keep the tools from being the focus.
There will be some pen-and-paper design exercises. Often these happen in class, but this year some might be “try at home.”
Some of the work will require using visualization tools. In the past, we’ve given students access to Tableau (a commercial system for data analysis), as it is a good way to explore visualization primitives. We don’t “teach Tableau”; we expect students to use it in their learning.
In the past, I have tried to make programming optional in the class. I plan to “try” again - but no promises.
There is no requirement for any specific programming languages or tools. You can use whatever programming language you like - provided you have access to it. We care what you make, not necessarily how you make it (within some reasonable bounds of academic honesty). We won’t teach you any particular tools (but we can give you some guidance on choosing and learning them).