DC2: Spaghetti Plots
This Design Challenge Has Been Cancelled!
An Explanation is coming soon - but you will not be doing Design Challenge 2 in this year’s class.
In a Nutshell…
The goal of this project is to have students consider scalability issues in a very common data type (multi-line series data, or “Spaghetti Plot Data”). Students will create designs and tools that may be used to visualize a range of datasets (not just create a visualization of a particular dataset).
The assignment has four phases / milestones:
DC2-1: Tasks and Data (due Mon, Oct 19) - You will turn in lists of tasks and situations where this data applies.
DC2-2: Designs and Sketches (due Mon, Oct 26) - You will turn in design ideas that address some of the situations and tasks that you have identified.
DC2-3: Draft (due Mon, Nov 2) - You will describe what you are trying to build, and give us some indication that you have started building it.
DC2-4: Final Handin (due Mon, Nov 9) - You will turn in the system that you have built.
Overview
“Spaghetti plot” is a pejorative term for a multi-line line graph that has too many lines for people to see everything in it. Despite their problems, they are ubiquitous. In this design challenge we will try to understand them, figure out what they are good and bad for, and (hopefully) come up with something better for the places where they do not work.
This assignment is about multi-line line graph data. A multi-line line graph is just one visualization of the particular kind of data that we’re interested in, but “multi-line line graph data” is easier to say than “data sets with a nominal/categorical set of items, each with a quantitative (interval/ratio) dimension, where at each sample there is a quantitative (interval/ratio) value.” To make our lives a little easier…
- There is a set (potentially a nominal/categorical set - but it might be ordered) of “objects” - these are the things that we have a line for. (we have N objects)
- Each line covers a range (for purposes of discussion, let’s call it “time”, even though the dimension need not be time). For this assignment, we can assume that each line has the same start and end “time”. Even though this is a continuous dimension, we’ll assume we have a uniformly spaced set of samples (so we can simply refer to them by integers), without any gaps. (we have M samples)
- At any “time” (for any one of the samples on the line), the line has a “value” - which is a quantitative value within some range.
- We’ll ignore the fact that the “time” dimension is time (since it isn’t always). This means that there aren’t “obvious” cycles that we need to account for (like seasons, or day/night).
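One way to hold data with these properties in a program is as N rows of M values each. The following is a minimal Python sketch of that idea. The names `make_fake_data` and `shape`, and the random-walk generator, are assumptions made for illustration; they are not taken from the sample implementations mentioned later.

```python
import random
from typing import List, Tuple

# values[i][j] is the value of object (line) i at sample ("time") j.
Series = List[List[float]]


def make_fake_data(n: int, m: int, seed: int = 0) -> Series:
    """Generate n random-walk lines with m samples each (fake data for testing)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        level = rng.uniform(0.0, 10.0)  # arbitrary starting value
        row = []
        for _ in range(m):
            level += rng.uniform(-1.0, 1.0)  # small random step per sample
            row.append(level)
        values.append(row)
    return values


def shape(values: Series) -> Tuple[int, int]:
    """Return (N, M), checking that every line has the same number of samples."""
    n = len(values)
    m = len(values[0]) if n else 0
    if any(len(row) != m for row in values):
        raise ValueError("all lines must have the same number of samples")
    return n, m
```

Because the samples are uniformly spaced and have no gaps, the sample index j can stand in for “time”, so no separate time column is needed.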
For example…
- We may have climate data. For 50 cities around the world (N=50), over the course of 3 years, we have measured the temperature each day (M=365*3). For each of those 50*365*3 observations we have a temperature in degrees Celsius (in the range -20 to 50).
- We may have sales data. For 100 products (N=100), over the course of 10 years, each month (M=10*12) we have the number of items sold.
- We may have noise data on a train. For each of the 12 cars on a train (N=12), over the course of the 100km route (measured every km) (M=100), we have a measure of the noise level.
- We may have data for trans-ocean cables. For each of N cables, over the length of each cable (M), we have a measurement of the cable’s depth.
- We may have heart rate information for runners in a marathon. For all 2000 runners, we have a stream of measurements of heart rate for every position along the route sampled every kilometer.
Notice that there are (at least) 3 types of scale we need to contend with:
- The number of “lines” (N)
- The number of samples of each line (M)
- The dynamic range of the values (if the values are over a very wide range they can be harder to show)
If all three of those are small (N,M, range), then it’s easy - you can use a multi-line graph. But as N, M and/or the range grows, the problem gets trickier. There are other complexities beyond these three. For example, if the samples aren’t uniform, there may be uneven gaps and different spacings. Imagine that for each different line, we get the measurements on different days… This is an additional type of complexity that you may optionally consider.
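To get a feel for these three kinds of scale, it can help to compute them for a data set. The sketch below assumes the data is stored as a list of N lists of M numbers; the function name `scale_summary` is a hypothetical helper, and the `examples` table simply restates the N and M values from the examples in the text.

```python
from typing import List, Tuple


def scale_summary(values: List[List[float]]) -> Tuple[int, int, float, float]:
    """Return (N, M, lowest value, highest value) for a non-empty data set."""
    n = len(values)
    m = len(values[0])
    lo = min(min(row) for row in values)
    hi = max(max(row) for row in values)
    return n, m, lo, hi


# N (lines) and M (samples per line) for three of the examples in the text.
examples = {
    "climate": (50, 365 * 3),
    "sales": (100, 10 * 12),
    "train noise": (12, 100),
}

if __name__ == "__main__":
    for name, (n, m) in examples.items():
        print(f"{name}: N={n}, M={m}, observations={n * m}")
```

The climate example already has 54,750 observations, far more than the train example's 1,200, even though neither N nor M looks large on its own.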
The goal of this assignment is to have students explore issues of scale in a fundamental data type. We want students to understand the range of applicability and tasks, to think about how different strategies for dealing with scale may be applied, and to implement some solutions to explore how these strategies may work. The goal of this project is not to create a visualization of a particular dataset, but rather to create designs and tools that may be used to visualize a range of datasets.
A Starting Point
Here are three common approaches to this type of data. Technically, the “Spaghetti Plot” is the first one of the three below. I like the term “Spaghetti Plot Data” to refer to the type of data we are considering. It sounds much better than “multiple interval-indexed series of interval values with many points per series.”
By “visual design” I am referring to encoding. There may be other encodings (can you think of some? that’s part of the challenge here). Here are those three designs, generated by the simplest program that I could write, all using the same “fake” data. You can check out my simple implementation (described below). You can also try out Florian’s D3 example, which is also described below.
For each encoding, there are lots of minor variants. You can add interaction to highlight an element in a spaghetti plot; you can scroll and filter to select within small multiples; you can change the colorings of a lasagna plot, … Given how common this kind of data is, you’d expect there to be good solutions. Or at least a well-characterized space of design decisions that gives guidance on how to make informed choices. But I am not aware of any in the literature. So, we have to work it out ourselves in this assignment.
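As a rough illustration of what “encoding” means here: a lasagna plot draws one row per line and maps each value to a color. The sketch below does the same thing in plain text, using a ramp of characters as a stand-in for a color ramp. The function name `lasagna_text`, the ramp, and the sample data are all assumptions made for illustration; this is not the sample implementation referred to above.

```python
from typing import List

RAMP = " .:-=+*#%@"  # denser character = larger value (stand-in for a color ramp)


def lasagna_text(values: List[List[float]], ramp: str = RAMP) -> List[str]:
    """Render each line as one row of characters, one character per sample."""
    lo = min(min(row) for row in values)
    hi = max(max(row) for row in values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    rows = []
    for row in values:
        chars = []
        for v in row:
            # Normalize to [0, 1], then pick a ramp index in [0, len(ramp) - 1].
            t = (v - lo) / span
            idx = min(int(t * len(ramp)), len(ramp) - 1)
            chars.append(ramp[idx])
        rows.append("".join(chars))
    return rows


if __name__ == "__main__":
    data = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]]
    for line in lasagna_text(data):
        print(line)
```

Changing the ramp (or the way values are binned into it) is the text equivalent of changing the coloring of a lasagna plot, which is one of the minor variants mentioned above.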
Note that there is room for a lot of creativity in design: while applying the four “design moves” (remember these from “1: What Is Visualization and How do We Do It?”) to one of these gives a lot of starting points, there may be solutions that are very different from these three.
I have given variants of this assignment in the past (see 2018 and 2017). This year’s is different: in the past, the assignment was made complex by trying to make too many options for students. And this year, we are working harder to provide you with interesting example data.
The Structure of The Assignment
The main objective of the assignment is for us to use a very common and standard chart type to explore visualization principles and design. It’s a real problem that lots of people have, and I don’t think there is consensus in the literature on how to address these challenges. In fact, I don’t think there is much literature on the problem. Often, people just do the basic stuff and users suffer.
The first phase of this assignment asks you to consider examples of scenarios and tasks where this data occurs. This will be useful in the latter phases as it will help you identify problems/tasks worth building tools for. We will share the results of this first phase with the class (to provide everyone with the range of ideas).
The second phase of the assignment asks you to consider designs that address some of the tasks above. We want you to consider what the right design might be, not necessarily implement it. We want you to create designs - preferably by sketching and describing. This should let you explore much more easily than if you had to build each one.
The final part (phases 3 and 4) of the assignment asks you to actually implement a prototype of your design(s). Ideally, this will be a tool that can read in data in a standard form and produce visualizations (possibly with interaction). Given the short time (there are just two weeks), and the range of implementation skills in class, we emphasize the “prototype” aspects.
There is clearly a tradeoff between building a more complex design and a more robust system. Ideally, you would do both, but realistically this might be difficult to achieve. We are providing flexibility here (which makes these things really hard to grade). You will be given the opportunity to explain what you were trying to do, and we will evaluate you accordingly. An interesting design with a brittle implementation (or even one where we need to use our imagination a bit since the implementation doesn’t work perfectly) can be rewarded as well as a solid implementation that applies a more straightforward design but robustly loads a variety of data sets. We can also consider your prior experience: for example, if you decide to make learning a new tool part of the assignment, we will factor that in.
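The assignment does not say what the “standard form” for input data is. As one possibility, the sketch below assumes a CSV layout with a header row followed by one row per line: a name, then M numeric values. The layout, the function name `read_series_csv`, and the sample values are all made up for illustration.

```python
import csv
import io
from typing import Dict, List


def read_series_csv(text: str) -> Dict[str, List[float]]:
    """Parse CSV text with one row per line: a name, then M numeric values."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # e.g. ["name", "0", "1", "2"]
    m = len(header) - 1    # number of samples per line
    series: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, raw = row[0], row[1:]
        if len(raw) != m:
            raise ValueError(f"{name}: expected {m} samples, got {len(raw)}")
        series[name] = [float(x) for x in raw]
    return series


# Made-up sample input in the assumed layout.
SAMPLE = """name,0,1,2
Madison,-5.0,-3.5,1.0
Cairo,18.0,19.5,21.0
"""

if __name__ == "__main__":
    for name, values in read_series_csv(SAMPLE).items():
        print(name, values)
```

Rejecting rows with the wrong number of samples matches the assumption made earlier that every line has the same M samples with no gaps.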
The ground rules for implementation are discussed below (see Implementation Ground Rules). But generally, you may use whatever tools and platform you want (providing they are available to us).
Given the tight time frame, this might not be the best opportunity for you to learn new tools. If you’re not already a D3 programmer, you may not be able to pick up enough of it in the short time frame to implement something fancy.
This assignment gives you a lot of choices in how you proceed. I make no promise that the hardness / amount of effort will be balanced. Your assignment must excel in at least one aspect - and you can choose which one it is. You just need to convince us.
Note: This year, people will work individually. You are not permitted to work with a partner. However, you are welcome to discuss ideas with classmates (see Policy on Collaboration (for DC1 and beyond)).
Implementation Ground Rules
You may use whatever programming language and environment you like. You may use whatever libraries and tools you like, subject to a few ground rules.
We may not be able to run your program. Your documentation must be complete and show off what the program can do. Be sure to describe things well and give pictures. You can even provide a video.
You can use any tools you like. We do not restrict you in terms of languages, libraries, etc. You do need to tell us what you’ve used. You need to make clear as part of your handin what one would need to do to run your program.
You must be able to run your program - at least enough to produce enough sample outputs for documentation.
You do need to turn in everything we would need to run your program (in terms of the source code). However, we understand that we may not have the right environment to run it. Therefore, we may ask you to give a demo on your own computer (if your program requires a demonstration). Preferably, your documentation is so complete that we can understand what your program does without even seeing the demo.
Phases of the Assignment
Warning: the final handin is a big thing. If you view phase 4 as 1 week, this is a hard project. If you view Phase 4 as the culmination of 4 weeks of work, things are much more manageable. While you probably want to figure out your designs before you implement too much, you can think about implementation from the beginning.
Phase 1: Task and Scenario Analysis (Oct 19th)
There are two parts to this: a task analysis (which has two parts: situations and tasks), and critiques.
Task Analysis: You need to come up with lists of:
- (at least) 3-5 concrete situations where this kind of data comes up. I gave 4 above (city/temp, sales, train noise, cables) – don’t pick those. Describe how these problems might scale (will they get large in N, M, or range?). If you can identify real, publicly available data sets, that’s great (but not required).
- (at least) 5-7 tasks. Describe them both in terms of a specific situation but also in a more abstract way (examples below).
I’ll give you a few to start with… you cannot use these in your list for part 1 (although you can use them in your list for part 2). I’ll describe them in terms of the cities/days example above (these are the “concrete” examples).
- On which day was the greatest range in temperatures seen?
- What city had the widest range of temperatures?
- Was there a month in which a city had its temperature rise consistently?
For abstract descriptions, these might be:
- Identify the sample (time) with the greatest range in value.
- Identify the line with the greatest range in value.
- Identify a consistent increasing trend for a line within a time range.
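Stating a task abstractly also makes it easy to express as a computation over the data, which can help when checking whether a design actually supports it. The sketch below gives one possible reading of each of the three abstract tasks above, assuming the data is stored as a list of N lists of M numbers. The function names are hypothetical, and treating a “consistent increasing trend” as the longest strictly increasing run is just one interpretation.

```python
from typing import List, Tuple

Series = List[List[float]]  # values[i][j]: value of line i at sample j


def sample_with_greatest_range(values: Series) -> int:
    """Index of the sample (time) at which the lines are most spread out."""
    def spread(j: int) -> float:
        column = [row[j] for row in values]
        return max(column) - min(column)
    return max(range(len(values[0])), key=spread)


def line_with_greatest_range(values: Series) -> int:
    """Index of the line whose own values cover the widest range."""
    return max(range(len(values)), key=lambda i: max(values[i]) - min(values[i]))


def longest_increasing_run(row: List[float]) -> Tuple[int, int]:
    """(start, end) sample indices, inclusive, of the longest strictly increasing run."""
    best = (0, 0)
    start = 0
    for j in range(1, len(row)):
        if row[j] <= row[j - 1]:
            start = j  # the run is broken; a new one starts here
        if j - start > best[1] - best[0]:
            best = (start, j)
    return best
```

A person reading a chart does not run these functions, of course; the point is that a good design should make the corresponding answer easy to see.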
Note: there is a hard cutoff for this part. After the cutoff date, we will share lists – so you can take ideas from others for your parts 2-4.
You will turn in this part using a Google Form. We recommend that you write out your answers in a file, and copy/paste them into the forms. There are two forms:
- A form to enter your 3-5 concrete situations.
- A form to enter your 5-7 tasks.
Initial Critique: We have a few basic designs above (Spaghetti, SmallMultiples, Lasagna), and a few tasks (from your lists that you’re handing in).
In this part, you need to pick 1-3 tasks (from your list) and critique each of the three basic designs (Spaghetti, SmallMultiples, Lasagna). For each task, explain why you think each design may or may not be appropriate (consider how they might scale as N, M, or range scales). You may wish to sketch out what something might look like to better explain the pros and cons. Remember that a critique isn’t just to say what’s wrong – it’s also to say what’s right. Hopefully, you can identify some things the basic designs are good for as well as some things they are not. You will need to have some situations where the basic designs aren’t good for the later parts – where you will need to come up with something better.
In doing your critique: critique the design (e.g., spaghetti plot), not necessarily the simple example implementations given above. Although, you might use this as an opportunity to point out design elements that you need to get correct if you were to implement one of these baseline designs well.
Turn this part in by uploading a PDF file to Canvas: DC2-1: Tasks and Data (due Mon, Oct 19)
Phase 2: Design Sketches (Oct 26th)
Your goal in this assignment is to create a design (or multiple designs) that address tasks/situations that are not well addressed by the well-known designs (the 3 we’ve seen). By situation, I mean a combination of task and data scale / data properties.
In this phase, we ask you to (1) identify 2-3 problems you would like to solve (i.e., the task(s) and situation), and (2) “sketch out” at least 2 designs for each. We would like you to consider multiple problems and designs, even though you will probably only pick 1 to implement.
By “sketch out” we would like you to describe the design, preferably with sketches to illustrate it. Describe it enough that someone could understand how it would work, and give some rationale for why you think it would address the task that you designed it for.
The expectation is that you will pick one of these sketches to implement (or more than one if you are ambitious). However, if you come up with an even better idea later (or realize that your design isn’t practical to implement in the time allotted), you are not bound to one of the designs that you have.
Your handin for this phase should be a single PDF file uploaded to Canvas. DC2-2: Designs and Sketches (due Mon, Oct 26)
Phase 3: Draft (Nov 2nd)
This is the first of two weeks focused on implementation. We don’t expect you to have everything done, but we want to check that you are well underway. For this phase, all you need to do is turn in a picture of what your program is doing. We just want to check that you’ve at least started to implement things.
Your handin for this phase should be an image file (e.g., screen shot) uploaded to Canvas. DC2-3: Draft (due Mon, Nov 2)
Phase 4: Final Handin (Nov 9th)
(stopped writing here)