Design Exercise 5-1: Experiment with the Flight Data

In Design Exercise 5 (which is split into two parts, this DE 5-1, and Design Exercise 5-2: Visualizations from the Flight Data), you will work with another tabular data set: Flight Delay Data. In DE 5-1, you will sketch a visualization to get you thinking about the data, and then make some basic exploratory visualizations to get a sense of how to work with the data and what is there. In Design Exercise 5-2: Visualizations from the Flight Data you will try to make visualizations to tell stories.

The flights data set is described at Flight Data. We recommend you work with the “combined” data set (Jan, Apr, Jul, Oct 2024) - although, you are welcome to work with a single month (if the combined data set is too big), or pull a more complete data set. We recommend that you work with the “simplified” data (first 28 days of each month, even days only) to make many other issues easier.

We expect your “analysis” to be descriptive - your visualizations are describing what happened in the data. You would need to build complex statistical models if you wanted to predict the future or establish causality.

For Design Exercise 5, we want you to focus on looking at this as tabular data (albeit a big data set). In the future, we’ll explore the set and graph aspects of it.

The rules for this assignment are as in previous assignments. You may use whatever tools you like. We will also follow similar grading criteria: your rationales are an important part of the assignment.

There are really 5 questions, but since we need to split them up on Canvas it will mean more “sub-questions”. The questions are:

  1. Sketch designs for top-10 lists. The goal here is for you to sketch to explore designs for a problem similar to those we’ve seen in class. This will allow you to explore how principles can lead to different designs.
  2. Create a sanity check visualization. The goal here is to give you an “easy” thing to make to check that you can work with the data. It will also let you consider how to “explore” to address an easy question quickly.
  3. Create visualizations that address a simple (univariate) question. This will give you a chance to check that you can use different aggregations, and to see which ones might be relevant going forward.
  4. Create an exploratory visualization that shows patterns across multiple variables. The goal here is to make an exploratory visualization (quickly, showing lots of data in a way that can steer you to better things).
  5. Explain how you made the visualizations.

Questions 1-3: Sketch Designs for Top-10 Lists

(note: because you are sketching, you should consider the same data I used; I used the combined (4 months) simplified set)

In the past someone made a similar visualization - except that it compared years rather than months. I recreated it…

I wanted to know: in each month, what are the top-10 routes?

Which of course, leads to a lot more questions like…

  1. How many times do they occur?
  2. Does the ranking change month to month?
  3. How much do the amounts (for each route) vary?

Note: we are defining a route as an Origin-Dest pair, so the reverse route is a different route.
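If you are working in code rather than Tableau, a minimal sketch of this route definition might look like the following. The airport codes and the `route_name` helper are made up for illustration; the only thing taken from the assignment is that a route is a directed Origin-Dest pair.

```python
from collections import Counter


def route_name(origin: str, dest: str) -> str:
    """Directed route label: origin first, then destination."""
    return f"{origin}-{dest}"


# Made-up (origin, dest) pairs, for illustration only.
flights = [("MSN", "ORD"), ("ORD", "MSN"), ("MSN", "ORD"), ("ORD", "LAX")]

# Count flights per directed route; "MSN-ORD" and "ORD-MSN" are separate keys.
route_counts = Counter(route_name(o, d) for o, d in flights)

for route, count in route_counts.most_common():
    print(route, count)
```

Because the label keeps the direction, the outbound and return legs between the same two airports are counted separately.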

Based on an old assignment, I created this visualization in Tableau:

[Figure: Top-Flights-Each-Month.svg]

How to make this in Tableau (if you want to re-create it)

First, I made a new “calculated field” called “Route Name” which is [Origin]+"-"+[Dest]. (The steps below refer to this field as [Route].)

Doing the top-N and sort per group was beyond my Tableau skills, so I asked Gemini for help. See the GenAI statement below for the prompt (it took 2 tries), but here are the steps…

This is a classic “Top N within a group” problem in Tableau. A simple “Top 10” filter on the [Route] dimension won’t work, as that would give you the overall Top 10 routes, not the Top 10 for each month.

The solution is to use a Table Calculation (specifically, RANK) to determine the rank of each route within its respective month, and then filter on that rank.

Here is the step-by-step process.


1. Build the Basic Bar Chart

First, let’s create a chart showing the counts for all routes, grouped by month.

  1. Drag [Month] to the Columns shelf. Right-click the pill and ensure it’s set to Discrete (it should be a blue pill).
  2. Drag [Route] to the Columns shelf and place it to the right of [Month].
  3. Drag [Route] to the Rows shelf.
  4. Right-click the [Route] pill on the Rows shelf, go to Measure, and select Count ( COUNT([Route]) ).

You should now see a bar chart for every single route, neatly grouped by month.

2. Create the Rank Calculation

Next, we create the calculation that will rank the routes inside each month.

  1. Go to the top menu and select Analysis > Create Calculated Field.
  2. Name the field something clear, like [Route Rank].
  3. Enter the following formula:
    RANK(COUNT([Route]), 'desc')
    
    This formula ranks each item based on the count of routes, in descending order (so the highest count gets rank #1).

3. Filter Using the Rank

Now, we apply this rank as a filter, but we must tell Tableau how to apply it (i.e., “within each month”).

  1. Find your new [Route Rank] calculation in the data pane (on the left) and drag it to the Filters shelf.
  2. A “Filter [Route Rank]” dialog box will pop up. Before you do anything in it, right-click the [Route Rank] pill that you just placed on the Filters shelf and select Edit Table Calculation.
  3. This step is the most critical. In the Table Calculation dialog box:
    • Set Compute Using to [Route].
    • (Alternatively, you can select “Specific Dimensions,” check both [Month] and [Route], and set “Restarting every” to [Month]. But “Compute Using [Route]” is simpler and achieves the same result here. It means: “Group by [Month] and rank the [Route]s.”)
  4. Click “OK” or close the Table Calculation dialog.
  5. Now, the “Filter [Route Rank]” dialog will be active (or reappear). Set the range of values from 1 to 10.
  6. Click OK.

Your chart will now be filtered to show only the top 10 routes for each individual month.

The chart is filtered, but the bars within each month might not be in order. Let’s fix that.

We need to change the sort from a standard sort to a nested sort. This tells Tableau to sort the [Route] labels dynamically based on their count inside each [Month].

  1. Find the blue [Route] pill on your Columns shelf.
  2. Right-click it and select Sort….
  3. In the “Sort [Route]” dialog box that appears:
    • For Sort By, select Nested.
    • For Sort Order, select Descending.
    • For Field Name, select [Route] (or whatever your original measure field is).
    • For Aggregation, ensure it’s set to Count.
  4. Click OK or close the dialog.

This should immediately re-sort the bars within each month from highest to lowest, as you intended.
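For comparison, the same “Top N within a group” idea can be sketched outside Tableau. This is not how the chart above was made; it is a rough Python equivalent under the assumption that each row is a (month, origin, dest) tuple. The function name `top_routes_per_month` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def top_routes_per_month(rows, n=10):
    """Return {month: [(route, count), ...]} with the n most frequent routes per month."""
    counts = defaultdict(Counter)
    for month, origin, dest in rows:
        counts[month][f"{origin}-{dest}"] += 1
    # most_common(n) sorts by count, descending, separately within each month.
    return {month: c.most_common(n) for month, c in counts.items()}


# Made-up rows, for illustration only.
rows = [
    ("Jan", "A", "B"), ("Jan", "A", "B"), ("Jan", "B", "A"), ("Jan", "C", "D"),
    ("Apr", "B", "A"), ("Apr", "B", "A"), ("Apr", "A", "B"),
]

top = top_routes_per_month(rows, n=2)
for month, routes in top.items():
    print(month, routes)
```

Counting inside each month before taking the top n is what makes this “per group”; counting over all rows first would give the overall top routes instead. Note that `most_common` cuts off ties at the boundary arbitrarily, which may differ from how Tableau's RANK treats tied counts.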

Sketch two different visualization designs that show this Top-N per month data. It is OK to sketch (I recommend it). In your sketch, it’s OK to show fewer routes - but it should be clear how your design will scale up to the top-10.

Try to make designs that are better than my design for at least one task/question. (I’ve listed a few possible questions above, you can pick different questions - try to pick ones that could be answered using my chart, just with more effort).

For each design, turn in a picture (again, sketching is recommended). If your design isn’t clear from the picture, you can explain it as needed. In the explanation, be sure to:

  • explain your design (if it isn’t clear), and how it would scale to 4 top-10 lists (if you don’t show it)

  • give a rationale for your design - what does it make easy to see, and why did you think that?

  • compare the 3 designs (your two, and mine) - which tasks is each one good for? Justify your decisions.

  • Question 1: Upload your First Design

  • Question 2: Upload your Second Design

  • Question 3: Provide your explanation (including the comparison)

Questions 4-5: Sanity Check

The first thing I do with a new data set is create a simple visualization to check that it has what I think it has. For this data set, I want to check that there are flights on the days I expect, and there are a reasonable number of flights on each day.

Here is a really quick thing I made in Tableau…

[Figure: flights-per-day.svg]

Not too imaginative, but at a glance (and reading the legend) I can see that I have flights on all the days I expect (even days up to 28) in each of the 4 months. All days have all 105 airports. I do wonder about those outlier days with low flight counts…

Make a sanity check visualization from the data set you will be working with. If you are using the same data set as me, you will hopefully get the same answers. If you are using a different data set, we’ll see which days you are using.

You might want to check more things (not just flight and origin counts).
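Before drawing anything, the same checks can be sketched numerically. The following is a minimal sketch, assuming each row has been reduced to a (date, origin) pair; the helper `daily_summary` and the sample rows are made up. The expected days come from the description of the simplified combined set (even days up to 28 in Jan, Apr, Jul, Oct 2024).

```python
from collections import defaultdict
from datetime import date


def daily_summary(rows):
    """Return {day: (flight_count, distinct_origin_count)}."""
    flights = defaultdict(int)
    origins = defaultdict(set)
    for day, origin in rows:
        flights[day] += 1
        origins[day].add(origin)
    return {d: (flights[d], len(origins[d])) for d in flights}


# Days the simplified combined set should contain.
expected_days = {date(2024, m, d) for m in (1, 4, 7, 10) for d in range(2, 29, 2)}

# Made-up rows, for illustration only.
rows = [
    (date(2024, 1, 2), "MSN"),
    (date(2024, 1, 2), "ORD"),
    (date(2024, 1, 4), "MSN"),
]

summary = daily_summary(rows)
missing = expected_days - set(summary)      # expected days with no flights
unexpected = set(summary) - expected_days   # days that should not be present

print(len(missing), "expected days missing;", len(unexpected), "unexpected days")
```

On the real data, `missing` and `unexpected` should both be empty, and the per-day counts are what you would plot to spot the low-count outlier days.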

You can try to recreate my design. Or you can try a different design (as long as it works). Re-creating my design will not get you extra points.

In your explanation, be sure to explain what you can see.

  • Question 4: Upload your Visualization
  • Question 5: Provide your explanation

Questions 6-8: Univariate Aggregations

I want you to make a “simple” visualization - showing how the arrival delays vary over one variable. Average delay for each day of the week is an easy (and recommended) one.

A tricky piece is scalability: how are you going to deal with the thousands of flights? The obvious thing is to use a summary statistic (mean, median, 75th percentile, …). Or you could use some more visual summary.

Pick two different ways to deal with the summarization. You could make one visualization that compares two (or more) different aggregations, or two visualizations using different aggregations (you can only upload one or two visualizations).

(for my “sample solution” I compare mean, median, 75th percentile and 90th percentile)
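As a rough illustration of how these four aggregations can disagree, here is a minimal Python sketch. It is not the sample solution; the `summarize` helper and the delay values (in minutes, for one hypothetical day of the week) are made up. The percentile method ("inclusive") is one choice among several, and other tools may interpolate differently.

```python
import statistics


def summarize(delays):
    """Mean, median, 75th and 90th percentile of a list of delays (minutes)."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(delays, n=20, method="inclusive")
    return {
        "mean": statistics.mean(delays),
        "median": statistics.median(delays),
        "p75": cuts[14],  # 75% cut point
        "p90": cuts[17],  # 90% cut point
    }


# Made-up arrival delays: mostly early or on time, with a long late tail.
delays = [-10, -8, -7, -5, -5, -4, -3, -2, -2, -1, 0,
          0, 1, 2, 3, 5, 8, 12, 30, 60, 120]

summary = summarize(delays)
print(summary)
```

With a skewed distribution like this one, the median sits at zero while the mean is pulled up by a few very late flights, and the upper percentiles describe the tail directly.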

In your explanation, explain the difference. Give a sense of what you think is most appropriate for the data and the domain.

  • Question 6: Upload your Univariate Aggregation Visualization
  • Question 7: Upload your alternate aggregation visualization. If your first visualization already compares different aggregation strategies, you can leave this blank.
  • Question 8: Explain your visualizations, including the different aggregation/summarization strategies. Which do you think is more appropriate for this question and data set?

Hints on Delay Distributions

With a flight delay, the experienced pain of things going wrong outweighs the experience that usually things are OK. Knowing that the median flight is on time (yes, this seems to be true) is of little consolation when your flight is late.

Similarly, the average may not be a good metric either. If all flights are 5 minutes late, things may be OK. If 19 flights are on time, and one flight is 100 minutes late, there will be a plane full of unhappy people.
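The two scenarios above can be checked with a few lines of Python. This is just the arithmetic from the paragraph, with 20 flights in each scenario (the first scenario's flight count is an assumption made so the two are comparable).

```python
import statistics

# Scenario 1: every flight is 5 minutes late (20 flights assumed).
all_slightly_late = [5] * 20
# Scenario 2: 19 flights on time, one flight 100 minutes late.
one_very_late = [0] * 19 + [100]

for name, delays in [("all 5 min late", all_slightly_late),
                     ("one 100 min late", one_very_late)]:
    print(name,
          "mean:", statistics.mean(delays),
          "median:", statistics.median(delays),
          "max:", max(delays))
```

Both scenarios have the same mean (100 / 20 = 5 minutes), so the mean alone cannot tell them apart; the median and the maximum do.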

It seems like the distributions of delays are generally good (most flights are on time - or early!).

The air travel industry sometimes uses custom metrics (aggregation functions) to help measure delays in meaningful ways. For example, they might measure “what percentage of flights are 15 or fewer minutes late.”
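A threshold metric like this is easy to compute yourself. The sketch below assumes delays are in minutes and treats early flights (negative delays) as on time; the function name `pct_within` and the sample values are made up.

```python
def pct_within(delays, threshold=15):
    """Percentage of flights whose delay is at most `threshold` minutes."""
    on_time = sum(1 for d in delays if d <= threshold)
    return 100.0 * on_time / len(delays)


# Made-up arrival delays in minutes, for illustration only.
delays = [-5, 0, 3, 15, 16, 40, 120, 2]

print(pct_within(delays))  # 5 of the 8 flights are 15 or fewer minutes late
```

Unlike the mean, this metric does not care how late the late flights are, only how many of them there are; whether that is what you want depends on the question.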

Questions 9-10: Exploratory Visualization

For this question, we’d like you to make an exploratory visualization that tries to show a lot of data so you can see if there is likely to be a story worth making a good visualization for.

Consider the question: How do delays vary by time of day? Does this vary by day of the week? (or month/season if you are using the combined data set)

Make an exploratory visualization that helps you get a basic sense of what is going on in the data set.
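One possible starting point is a grid of one summary value per (day of week, hour) cell, which could then be drawn as a heatmap or small multiples. The sketch below is only an assumption about how you might prepare such a grid: it bins by a scheduled departure time and uses the mean, and the `delay_grid` helper, the choice of time field, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
import statistics


def delay_grid(rows):
    """Return {(weekday, hour): mean delay}; weekday 0 is Monday."""
    cells = defaultdict(list)
    for scheduled, delay in rows:
        cells[(scheduled.weekday(), scheduled.hour)].append(delay)
    return {key: statistics.mean(vals) for key, vals in cells.items()}


# Made-up (scheduled departure, arrival delay in minutes) pairs.
rows = [
    (datetime(2024, 1, 2, 6, 15), -4),
    (datetime(2024, 1, 2, 6, 50), 2),
    (datetime(2024, 1, 2, 18, 5), 35),
    (datetime(2024, 1, 4, 18, 30), 20),
]

grid = delay_grid(rows)
for (weekday, hour), mean_delay in sorted(grid.items()):
    print(weekday, hour, mean_delay)
```

Swapping `statistics.mean` for a percentile or a threshold metric changes what patterns show up, so it may be worth trying more than one aggregation in the same layout.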

In the explanation, describe your visualization (if it isn’t self-evident), explain how the design choices support helping the patterns emerge (if they are there), and what you see in this initial visualization.

  • Question 9: Upload your visualization
  • Question 10: Upload your explanation

Question 11: How did you make these

  • Question 11: Describe how you made the visualizations. What tools did you use (list any libraries)? Did you use our data or get your own? Did you do any data-processing steps? Disclose any AI usage.

Turning it in

You will turn this assignment in as a Canvas Survey: Design Exercise 5-1: Experiment with the Flight Data (due Fri, Nov 07). We strongly recommend completing this assignment before starting on Design Exercise 5-2: Visualizations from the Flight Data. As usual, you will need to upload your answers into the Canvas Survey form (prepare them offline). Canvas will give you 87 points for uploading something, and a human will give you a different grade later.

Rubric

You will lose 10 points per “Question” (of the 5) that you do not provide a complete answer to. You can leave Canvas question 7 (alternate aggregation) blank if your question 6 visualization compares different aggregations already.

Scoring: You will get a “low A” (90) for a “solid” assignment. This means visualizations that meet the requirements and show competent designs. It means that there are complete (and reasonable) textual answers for each question.

We may award 1-2 points per part (1-4) for “exceptional” visualizations or answers. We expect most students to give “solid answers with the common designs” (which is rewarded with a low A). Doing better than this requires doing something non-standard (which is challenging given the constraints of the assignment), such as creative design choices or exceptional insights into the data. Recreating my design is not an “exceptional answer” (although it would have been if you hadn’t seen mine first).

We will deduct points (from 90) for problems. For example:

  • Visualizations / Answers that do not fully respond to the prompt
  • Visualizations that use inappropriate encodings (for example, using a part-whole design with non-part-whole data)
  • Visualizations that make it too hard for the viewer to see what is going on
  • Visualizations that “dump data” without making choices to have a story emerge
  • Two visualizations for one part that are too similar

Note: for this assignment, we are asking for sketches or explorations. Your visualizations need to be good enough that you (and we) can see things in them, and show some thought as to applying design principles to create good designs.

Your “defense” against us not being able to see things the way you do is by explaining. If you think you’ve made a good choice to make something clear, write it in the explanation.

If we take points off for something like “no clear story” or “inappropriate encoding” or “ineffective design choice” we’re really saying “we don’t see an effective choice here and the student didn’t convince us otherwise.”

GenAI Disclosure:

I asked Gemini for help doing the top-N chart in Tableau. It took two prompts:

I am working in Tableau. I have a data set where each row has a string “route” and a “month”. I would like to make a bar chart grouped by month. Within each month, I would like to list the top 10 most common (highest count) route, showing the count (within the month) as the height of the bar chart.

which didn’t quite work, so I had to add:

The sort order within each group doesn’t appear correct. The same route appears as the first one, even though it has a shorter bar than the third one in some of the months.