Design Exercise 5-1: Experiment with the Flight Data
In Design Exercise 5 (which is split into two parts, this DE 5-1, and Design Exercise 5-2: Visualizations from the Flight Data), you will work with another tabular data set: Flight Delay Data. In DE 5-1, you will sketch a visualization to get you thinking about the data, and then make some basic exploratory visualizations to get a sense of how to work with the data and what is there. In Design Exercise 5-2: Visualizations from the Flight Data you will try to make visualizations to tell stories.
The flights data set is described at Flight Data. We recommend you work with the “combined” data set (Jan, Apr, Jul, Oct 2024), although you are welcome to work with a single month (if the combined data set is too big) or to pull a more complete data set. We also recommend that you work with the “simplified” data (first 28 days of each month, even days only), which makes many other issues easier to deal with.
We expect your “analysis” to be descriptive - your visualizations are describing what happened in the data. You would need to build complex statistical models if you wanted to predict the future or establish causality.
For Design Exercise 5, we want you to focus on looking at this as tabular data (albeit a big data set). In the future, we’ll explore the set and graph aspects of it.
The rules for this assignment are as in previous assignments. You may use whatever tools you like. We will also follow similar grading criteria: your rationales are an important part of the assignment.
There are really 5 questions, but because we need to split them up on Canvas, there will be more “sub-questions”. The questions are:
- Sketch designs for top-10 lists. The goal here is for you to sketch to explore designs for a problem similar to those we’ve seen in class. This will allow you to explore how principles can lead to different designs.
- Create a sanity check visualization. The goal here is to give you an “easy” thing to make to check that you can work with the data. It will also let you consider how to “explore” to address an easy question quickly.
- Create visualizations that address a simple (univariate) question. This will give you a chance to check that you can use different aggregations, and to see which ones might be relevant going forward.
- Create an exploratory visualization that shows patterns across multiple variables. The goal here is to make an exploratory visualization (quickly, showing lots of data in a way that can steer you to better things).
- Explain how you made the visualizations.
Questions 1-3: Top-10 Sketches
(note: because you are sketching, you should consider the same data I used; I used the combined (4 months) simplified set)
In the past, someone made a similar visualization, except that it compared years rather than months. I recreated it…
I wanted to know: in each month, what are the top-10 routes?
Which, of course, leads to a lot more questions, like…
- How many times do they occur?
- Does the ranking change month to month?
- How much do the amounts (for each route) vary?
Note: we are defining a route as an Origin-Dest pair, so the reverse route is a different route.
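If you are working in code rather than Tableau, the top-N-per-month table behind a chart like this could be computed roughly as follows. This is only a minimal pandas sketch on made-up rows: the column names `Month`, `Origin`, and `Dest` and the helper `top_routes_per_month` are assumptions for illustration, so check them against the actual data file. Note that the reverse route is counted separately, as defined above.

```python
import pandas as pd

# Toy stand-in for the flight table. The column names "Month", "Origin",
# and "Dest" are assumptions, not necessarily the real file's schema.
flights = pd.DataFrame({
    "Month":  [1, 1, 1, 1, 1, 4, 4, 4, 4],
    "Origin": ["MSN", "MSN", "ORD", "ORD", "MSN", "ORD", "ORD", "MSN", "ORD"],
    "Dest":   ["ORD", "ORD", "MSN", "LAX", "ORD", "MSN", "MSN", "ORD", "LAX"],
})

def top_routes_per_month(df, n=10):
    """Return the n most frequent Origin-Dest routes within each month."""
    # A route is an ordered Origin-Dest pair, so MSN-ORD and ORD-MSN differ.
    routes = df.assign(Route=df["Origin"] + "-" + df["Dest"])
    counts = (
        routes.groupby(["Month", "Route"])
        .size()
        .reset_index(name="Flights")
    )
    # Sort within each month by count (ties broken alphabetically),
    # then keep the first n rows of each month.
    counts = counts.sort_values(
        ["Month", "Flights", "Route"], ascending=[True, False, True]
    )
    top = counts.groupby("Month").head(n).copy()
    top["Rank"] = top.groupby("Month").cumcount() + 1
    return top.reset_index(drop=True)

top2 = top_routes_per_month(flights, n=2)
print(top2)
```

Ranking within each month (rather than once over the whole table) is what lets the ordering change from month to month.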
Based on an old assignment, I created this visualization in Tableau:
Sketch two different visualization designs that show this Top-N per month data. It is OK to sketch (I recommend it). In your sketch, it’s OK to show fewer routes, but it should be clear how your design will scale up to the top-10.
Try to make designs that are better than my design for at least one task/question. (I’ve listed a few possible questions above, you can pick different questions - try to pick ones that could be answered using my chart, just with more effort).
For each design, turn in a picture (again, sketching is recommended). If your design isn’t clear from the picture, you can explain it as needed. In the explanation, be sure to:
- explain your designs (if they aren’t clear), and how they would scale to 4 top-10 lists (if you don’t show it)
- give a rationale for your design: what does it make easy to see, and why do you think that?
- compare the 3 designs (your two, and mine): which tasks is each one good for? Justify your decisions.
- Question 1: Upload your First Design
- Question 2: Upload your Second Design
- Question 3: Provide your explanation (including the comparison)
Questions 4-5: Sanity Check
The first thing I do with a new data set is create a simple visualization to check that it has what I think it has. For this data set, I want to check that there are flights on the days I expect, and there are a reasonable number of flights on each day.
Here is a really quick thing I made in Tableau…
Not too imaginative, but at a glance (and reading the legend) I can see that I have flights on all the days I expect (even days up to 28) for the 4 months. All days have all 105 airports. I do wonder about those outlier days with low flight counts…
Make a sanity check visualization from the data set you will be working with. If you are using the same data set as me, you will hopefully get the same answers. If you are using a different data set, we’ll see which days you are using.
You might want to check more things (not just flight and origin counts).
You can try to recreate my design. Or you can try a different design (as long as it works). Re-creating my design will not get you extra points.
In your explanation, be sure to explain what you can see.
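If you prefer to check the numbers before drawing anything, the per-day counts behind a sanity check like this could be tabulated as in the sketch below. It uses a few made-up rows, and the column names `FlightDate` and `Origin` are assumptions; the “even days up to 28” test only applies if you are using the simplified data.

```python
import pandas as pd

# Toy rows. "FlightDate" and "Origin" are assumed column names.
flights = pd.DataFrame({
    "FlightDate": ["2024-01-02", "2024-01-02", "2024-01-02",
                   "2024-01-04", "2024-01-04"],
    "Origin":     ["MSN", "ORD", "ORD", "MSN", "LAX"],
})
flights["FlightDate"] = pd.to_datetime(flights["FlightDate"])

# One row per day: how many flights, and how many distinct origin airports.
per_day = flights.groupby("FlightDate").agg(
    n_flights=("Origin", "size"),
    n_origins=("Origin", "nunique"),
)

# Days that break the "even days up to 28" expectation of the simplified set.
day_of_month = per_day.index.day
unexpected = per_day[(day_of_month % 2 == 1) | (day_of_month > 28)]

print(per_day)
print("unexpected days:", len(unexpected))
```

A table like `per_day` can then be plotted directly (for example, flight count per day), which makes low-count outlier days easy to spot.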
- Question 4: Upload your Visualization
- Question 5: Provide your explanation
Questions 6-8: Univariate Aggregations
I want you to make a “simple” visualization showing how the arrival delays vary over one variable. Average delay for each day of the week is an easy (and recommended) one.
A tricky piece is scalability: how are you going to deal with the thousands of flights? The obvious thing is to use a summary statistic (mean, median, 75th percentile, …). Or you could use some more visual summary.
Pick two different ways to deal with the summarization. You could make one visualization that compares two (or more) different aggregations, or two visualizations using different aggregations (you can only upload one or two visualizations).
(for my “sample solution” I compare mean, median, 75th percentile and 90th percentile)
In your explanation, explain the difference. Give a sense of what you think is most appropriate for the data and the domain.
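To see why the choice of aggregation matters, here is a small pandas sketch that computes several summaries side by side. The data is made up (one very late Monday flight), the column names `DayOfWeek` and `ArrDelay` (in minutes) are assumptions, and `pct` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# "DayOfWeek" and "ArrDelay" (minutes) are assumed column names; values are made up.
flights = pd.DataFrame({
    "DayOfWeek": ["Mon"] * 5 + ["Tue"] * 5,
    "ArrDelay":  [-5, 0, 5, 10, 240, -10, -5, 0, 5, 10],
})

def pct(q):
    """Build a named percentile aggregator (hypothetical helper)."""
    def _pct(series):
        return series.quantile(q)
    # The function name becomes the column label in the result.
    _pct.__name__ = f"p{int(round(q * 100))}"
    return _pct

summary = flights.groupby("DayOfWeek")["ArrDelay"].agg(
    ["mean", "median", pct(0.75), pct(0.90)]
)
print(summary)
```

In this toy data, the single 240-minute delay pulls Monday’s mean up to 50 minutes while the median stays at 5, which is the kind of difference worth discussing in your explanation.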
- Question 6: Upload your Univariate Aggregation Visualization
- Question 7: Upload your alternate aggregation visualization. If your first visualization already compares different aggregation strategies, you can leave this blank.
- Question 8: Explain your visualizations, including the different aggregation/summarization strategies. Which do you think is more appropriate for this question and data set?
Questions 9-10: Exploratory Visualization
For this question, we’d like you to make an exploratory visualization that tries to show a lot of data so you can see if there is likely to be a story worth making a good visualization for.
Consider the question: How do delays vary by time of day? Does this vary by day of the week? (or month/season if you are using the combined data set)
Make an exploratory visualization that helps you get a basic sense of what is going on in the data set.
In the explanation, describe your visualization (if it isn’t self-evident), explain how the design choices help patterns emerge (if they are there), and say what you see in this initial visualization.
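One possible starting point for this kind of exploration is an hour-by-weekday grid of a delay summary, which can then be drawn as a heatmap or small multiples in whatever tool you like. The sketch below uses made-up rows; the column names `DayOfWeek`, `CRSDepTime` (scheduled departure as an HHMM integer), and `ArrDelay` are assumptions, and the median is just one reasonable choice of aggregate.

```python
import pandas as pd

# Assumed columns: "DayOfWeek", "CRSDepTime" (scheduled departure, HHMM integer),
# and "ArrDelay" (minutes). Values are made up.
flights = pd.DataFrame({
    "DayOfWeek":  ["Mon", "Mon", "Mon", "Tue", "Tue", "Tue"],
    "CRSDepTime": [605, 650, 1830, 615, 1805, 1845],
    "ArrDelay":   [0, 10, 40, -5, 20, 30],
})

# Bin departures by hour of day: 605 -> 6, 1830 -> 18.
flights["Hour"] = flights["CRSDepTime"] // 100

# Rows are hours, columns are days of the week, cells are median arrival delay.
grid = flights.pivot_table(
    index="Hour", columns="DayOfWeek", values="ArrDelay", aggfunc="median"
)
print(grid)
```

Swapping `DayOfWeek` for a month column gives the month/season version of the same grid if you are using the combined data set.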
- Question 9: Upload your visualization
- Question 10: Upload your explanation
Question 11: How did you make these
- Question 11: Describe how you made the visualizations. What tools did you use (list any libraries)? Did you use our data or get your own? Did you do any data-processing steps? Disclose any AI usage.
Turning it in
You will turn this assignment in as a Canvas Survey: Design Exercise 5-1: Experiment with the Flight Data (due Fri, Nov 07). We strongly recommend completing this assignment before starting on Design Exercise 5-2: Visualizations from the Flight Data. As usual, you will need to upload your answers into the Canvas Survey form (prepare them offline). Canvas will give you 87 points for uploading something, and a human will give you a different grade later.
Rubric
You will lose 10 points per “Question” (of the 5) that you do not provide a complete answer to. You can leave Canvas question 7 (alternate aggregation) blank if your question 6 visualization compares different aggregations already.
Scoring: You will get a “low A” (90) for a “solid” assignment. This means visualizations that meet the requirements and show competent designs. It means that there are complete (and reasonable) textual answers for each question.
We may award 1-2 points per part (1-4) for “exceptional” visualizations or answers. We expect most students to give “solid answers with the common designs” (which is rewarded with a low A). Doing better than this requires doing something non-standard (which is challenging given the constraints of the assignment), such as creative design choices or exceptional insights into the data. Recreating my design is not an “exceptional answer” (although it would have been if you hadn’t seen mine first).
We will deduct points (from 90) for problems. For example:
- Visualizations / Answers that do not fully respond to the prompt
- Visualizations that use inappropriate encodings (for example, using a part-whole design with non-part-whole data)
- Visualizations that make it too hard for the viewer to see what is going on
- Visualizations that “dump data” without making choices to have a story emerge
- Two visualizations for one part that are too similar
Note: for this assignment, we are asking for sketches or explorations. Your visualizations need to be good enough that you (and we) can see things in them, and show some thought as to applying design principles to create good designs.
Your “defense” against us not being able to see things the way you do is by explaining. If you think you’ve made a good choice to make something clear, write it in the explanation.
If we take points off for something like “no clear story” or “inappropriate encoding” or “ineffective design choice” we’re really saying “we don’t see an effective choice here and the student didn’t convince us otherwise.”
GenAI Disclosure:
I asked Gemini for help doing the top-N chart in Tableau. It took two prompts:
I am working in Tableau. I have a data set where each row has a string “route” and a “month”. I would like to make a bar chart grouped by month. Within each month, I would like to list the top 10 most common (highest count) route, showing the count (within the month) as the height of the bar chart.
which didn’t quite work, so I had to add:
The sort order within each group doesn’t appear correct. The same route appears as the first one, even though it has a shorter bar than the third one in some of the months.