DE02: Design Exercise 2: A First Look at Data

In our first design exercise (it is DE02 because it is in the second week), we will look at our data set and begin to think about why to visualize it. We’ll also practice critique.

For the first few design exercises, we will be working with a data set of US Census data from the USDA. The data set, and information about it, is on the Census Data page. Note that for this assignment (DE02), you might not need to look at the data (looking at the information on the web page may be enough).

You will get more from this assignment if you do the parts in order without peeking at the later parts. Part B has visualizations that you shouldn’t look at until after you’ve done Part A. Part C has hints for Part B. (The hints are “hidden” so you can read over the whole assignment, but please don’t expand the expanders until you’ve done the previous parts.)

This assignment uses 3 visualizations made from the census data set (technically, from the 2020 version of it, using the state-level aggregations). These visualizations are a sample solution to an old assignment that asked students to identify a story in the data and then make a visualization to tell it. Course staff created these examples (they are designed to be used as a prop for critique, rather than as great visualizations).

Details not important for the assignment

The visualizations provided have the data aggregated by state, so that all 3 show the same data points.

The actual data was provided by county. The aggregation of the data (how the counties are averaged to compute the state values) is wrong in these visualizations: the county values are simply averaged, without accounting for their varying populations. As a result, the numbers in these visualizations are off, so consider this “fake data” used to learn about visualization.
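If you are curious how much this can matter, here is a minimal sketch in Python comparing a simple average of county rates with a population-weighted one. The county numbers are invented for illustration (they are not from the data set), the function names are hypothetical, and weighting by total population is itself an assumption (for an unemployment rate, the labor force would be a more natural weight).

```python
# Hypothetical counties in one state: (population, unemployment rate in percent).
# These numbers are invented for illustration; they are not from the data set.
counties = [
    (1_000_000, 3.0),  # one large county with a low rate
    (10_000, 9.0),     # two small counties with a high rate
    (10_000, 9.0),
]


def simple_mean(rows):
    """Average the county rates, ignoring how many people live in each county."""
    return sum(rate for _, rate in rows) / len(rows)


def weighted_mean(rows):
    """Average the county rates, weighting each county by its population."""
    total_population = sum(pop for pop, _ in rows)
    return sum(pop * rate for pop, rate in rows) / total_population


print(f"simple mean:   {simple_mean(counties):.2f}%")
print(f"weighted mean: {weighted_mean(counties):.2f}%")
```

With these made-up numbers, the simple mean is 7.00% while the weighted mean is about 3.12%, because the two small counties count as much as the large one in the simple average.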

The two different variables (high school attainment and unemployment rate) are from different years. You can ignore that for the assignment.

And if you’re wondering: these are (intentionally) not great visualizations, so we can use them in the future for thinking about how to improve them!

Mechanics

This Design Exercise (and many others) will be turned in as a Canvas Survey: Design Exercise 02: Questions from Data (due Fri, Sep 13). Each question will have either a type-in box or a file upload box. We do this to keep the different parts separate.

When you turn in the survey, Canvas will give you “completion points.” This is not your grade. We will grade the assignment and put the grade in a new Canvas column.

Canvas will let you turn in the survey more than once. However, each time it will start from scratch - you will lose what you did previously. Therefore, I recommend you write your answers first (in some text editor) and then copy the answers to the Canvas quiz later.

For many questions, we are asking for a list of questions. Please number them, put each on a separate line, and put your best ones first. We ask for 3-5, but feel free to give more.

Rules for lists
  • This should be a simple numbered list. Each question should be a sentence (or phrase) - it doesn’t need to be much longer.
  • You need to give 3-5 questions for each, but feel free to give a few extras.
  • You cannot use “find more questions” as a question.
  • Try to make your questions diverse.
  • You cannot use the examples.
  • You should ask questions that could conceivably be answered using the data. (you don’t need to check, but make reasonable assumptions)
  • Try to come up with questions that are more interesting and complex than the examples.

Part A (Questions 1-2)

First, look at Visualization 1 (the two maps). It is much clearer as a PDF, but here’s a thumbnail:

census-maps-state-highschool-unemployment.png

High school attainment (2014-2018) and unemployment (2019) 

Note: the visualization is the pair of maps; they are meant to be viewed together. Also, these maps show levels of high school attainment and unemployment rates in specific years (you can pretend they are both for the same year).

Question 1: List 3-5 questions that you can answer (relatively easily) about the data with this visualization (Visualization 1, the maps). Try to pick questions that this visualization is good for answering. See if you can try to guess what questions it was designed to answer.

Question 2: Make a list of questions (at least 3-5) that the visualization (#1, the maps) inspires you to ask next - that you probably would need to make another visualization for. Try to pick questions that would “need” a visualization to answer (rather than ones with a specific textual answer), and that cannot be readily answered with this visualization.

Part B (Questions 3-5)

Visualizations 2 and 3 are two different visualizations made from the same data.

This is a bit of a “comparison” exercise: what is each one potentially good for?

Visualizations Hidden - Don’t peek until you complete Part A

Visualization 2 is a pair of bar charts (better as a PDF)

census-bars-state.png

Visualization 3 is a scatterplot (better as a PDF)

census-scatter-state.png

Note: this is a bit tricky, because it’s hard to know. Use your eyes and intuitions. In the future, we’ll understand some of the science.

Question 3: Make a list of questions (3-5) that would be easier to answer with Visualization 1 (maps) than with Visualization 2 (bars) or 3 (scatter).

Question 4: Make a list of questions (3-5) that would be easier to answer with Visualization 2 (bars) than with Visualization 1 (maps) or 3 (scatter).

Question 5: Make a list of questions (3-5) that would be easier to answer with Visualization 3 (scatter) than with Visualization 1 (maps) or 2 (bars).

Part C (Questions 6-7)

The idea here is to practice critique. Please use the stylized form we discussed in class. (If purpose, specific aspect (choice), information to make the choice (principle)). At this point, you may not know many principles, but use your intuitions.

Example Critique - don’t look at this until after you’ve done Part B

Here’s an example in the form from the book/class (critiquing the Bar Charts - Visualization 2):

If the purpose is to help the viewer compare Wisconsin and Minnesota, highlighting those two states with colors makes them stand out and helps the viewer identify them quickly. The choice of colors helps the viewer identify which state is which. The use of Badger red may be effective in Madison, but the connection of blue to Minnesota is less clear.

Question 6: Make a positive critique of Visualization 1 (the maps). That is, point out a good aspect. (yes, that may be hard). Try to critique the visual choices (not the data problems).

Question 7: Make a critique of any of the visualizations that suggests a way to improve it. Obviously, don’t copy my example.

Part D (Questions 8-9)

In this part, please consider the whole data set: county level details for several variables (population, educational attainment, poverty, unemployment) over different years. You will want to consult the Census Data page that describes the variables, and you might want to look at the data files.

In this exercise, we are not actually asking you to look at the data itself (the numbers) - just the kinds of data contained (the row and column headers). The idea of exploratory data analysis where we look at the data to define appropriate questions is something we’ll do later in the semester.

Question 8: List questions (at least 3-5) that probably don’t “need” a visualization. For example, “what states have the highest unemployment rate?”

Note that “need” here isn’t quite right. For the example question, Visualization 2 answers the question quickly, but also gives a lot of context.

Question 9: List questions (at least 3-5) that would probably benefit from a visualization (beyond a table). For example, “are there geographic patterns in where unemployment is high?”

Food for thought question (you don’t have to answer it): what visualizations might you want to make to help you figure out what are good questions to ask?

Part E (Looking Ahead)

In the next weeks, we will ask you to make visualizations from this data set to answer questions. We will also let you pick some of your own questions to answer with visualizations.