DE04: Pictures from Data

In this assignment, you will make some visualizations from the Census Data: actual visualizations based on data (unlike last time, when we asked you to sketch). You will turn your assignment in using the Canvas survey Design Exercise 04: Pictures from Data (due Fri, Sep 27).

For this assignment, we will specify the questions that your visualizations should answer. This isn’t necessarily social science research, and in some cases a statistical test may be warranted, but try to show data that addresses the questions.

Some of the questions are easy - there is an “obvious” visualization. It’s OK if your visualization is straightforward. But be prepared to argue for why it was a reasonable one to make. Even a “simple” visualization (like a bar chart) should have good design elements, including a title, axis labels, legends, …

Ground rules for this assignment:

  • You need to create static visualizations: things that work in “print” (or just as an image file). If you create something interactive, we won’t see the interactivity.
  • Each visualization must be a single “picture” (one page, preferably PNG - if it’s a PDF it can only be 1 page)
  • Unless the question specifies counties, you can work with the state level data. For all questions, using the county level data will lead to more interesting answers.
  • It is OK to use “approximately correct” years (for example, using the 2003 urban codes to look at 2000 data).

You may use any tools you like, but they need to actually use the data to generate the visualizations. The idea is for you to use data analysis/visualization tools (e.g., Tableau), but you might also choose to write programs (and use a visualization library). Technically, you could use programs to compute things and then plot them manually in a drawing program like Illustrator or Inkscape.

I encourage you to at least try Tableau (see Tableau for CS765 2024 (why tableau)). The questions in this exercise are designed to work well in Tableau (and not to require too much Tableau skill). I admit to “cheating” sometimes… if I can’t figure out how to do something in Tableau, I’ll write a script to re-organize the data into a form that makes it easier to work with. Sometimes, I won’t be able to figure out how to convince Tableau to put charts on the same page, so I’ll take screenshots and put them together in some other program. I’ve also added titles, captions, and legends manually (although I am figuring out how to do that in Tableau).

Because the assignment is for static visualizations, you will lose the nice interactive elements of Tableau.

The question numbers refer to the questions on Canvas. So one “question” has multiple numbers (because of the way Canvas numbers things).

As usual, this assignment will be turned in as a Canvas Quiz (Design Exercise 04: Pictures from Data (due Fri, Sep 27)). Your grade will be posted to a separate Canvas assignment.

There are some hints for this assignment in Hints for Good Figures (how to do well on Design Exercises).

Question 1 and 2

Here is a totally backwards question: I’ll tell you the visualization (a chart type, no less!) and you need to find a way to use it on the Census Data. Yes, I am breaking many of my “rules” - but it’s in the name of pedagogy.

This week, I asked you to learn about treemaps from my video lectures.

Now, I want you to find a valid use for one. Make a treemap from the Census Data. You need to pick variables for which a treemap makes sense/is appropriate. A treemap doesn’t need to be the best solution for your question, but at least a reasonable one.

Tableau is very good at making treemaps (although it will let you make treemaps from variables that don’t make sense). Excel is pretty good for treemaps as well. The key (with either) is to identify appropriate variables to show.

Question 1 is to upload a picture (a treemap). Question 2 is to explain why a treemap is a valid choice for your data, and what you can see in it (say what is easy to see, or what the task is).

Hints on Treemaps for Tableau

It is very easy to make a 1-level treemap using Tableau (pick a dimension and a measure and use “Show Me”).

Making a 2-level treemap can be trickier. With the Census Data, you might want to have states divided into counties. The way I learned to do this: (1) create a treemap with the top level dimension; (2) use the second level dimension as either a detail or for color; (3) re-order the mark specifications so that the upper levels of the hierarchy are on top.

The controls look like:

tableau-treemap-mark-controls.png

And my result is:

tableau-treemap.png
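
If you would rather write a program than use Tableau, the two-level idea above (states divided into counties) can be sketched in a few lines. This is only a minimal illustration, not what Tableau does internally: it uses the simple “slice-and-dice” layout rather than a squarified one, and the state names, county names, and values are made-up placeholders rather than Census Data. It assumes the size variable is a positive quantity whose parts add up to a meaningful whole (for example, a population count).

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def slice_rect(rect: Rect, sizes: List[float], vertical: bool) -> List[Rect]:
    """Cut rect into strips whose areas are proportional to sizes.

    vertical=True cuts with vertical lines (strips sit side by side);
    vertical=False cuts with horizontal lines (strips are stacked).
    """
    x, y, w, h = rect
    total = sum(sizes)
    strips = []
    offset = 0.0  # fraction of the rectangle already used
    for size in sizes:
        frac = size / total
        if vertical:
            strips.append((x + offset * w, y, frac * w, h))
        else:
            strips.append((x, y + offset * h, w, frac * h))
        offset += frac
    return strips


def two_level_treemap(
    data: Dict[str, Dict[str, float]],
    rect: Rect = (0.0, 0.0, 1.0, 1.0),
) -> Dict[Tuple[str, str], Rect]:
    """Lay out top-level groups side by side, then stack children inside each."""
    groups = list(data)
    group_totals = [sum(data[g].values()) for g in groups]
    layout = {}
    for group, group_rect in zip(groups, slice_rect(rect, group_totals, vertical=True)):
        children = list(data[group])
        child_sizes = [data[group][c] for c in children]
        child_rects = slice_rect(group_rect, child_sizes, vertical=False)
        for child, child_rect in zip(children, child_rects):
            layout[(group, child)] = child_rect
    return layout


# Hypothetical values for illustration only (not real Census numbers).
example = {
    "State A": {"County A1": 300.0, "County A2": 100.0},
    "State B": {"County B1": 600.0},
}
layout = two_level_treemap(example)
for key, (x, y, w, h) in layout.items():
    print(key, "area =", round(w * h, 3))
```

Each rectangle's area is proportional to its value, and the rectangles of one state tile that state's rectangle exactly. That is why a treemap only makes sense for variables where parts sum to a whole. The rectangles could be drawn with any plotting library or even by hand in a drawing program.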

Question 3-5: Migration

In this question we are considering the differences between domestic migrants (people who move between counties within the country) and international migrants (people who come to the US). The Census Data has both. For these questions, we’ll consider 2023 (the file has 2020-2023).

Do domestic and international migrants go to the same places?

Create two different visualizations that help us understand the similarities and differences in where migrants go. Upload these as Questions 3 and 4. Your visualizations should not be the same chart types.

In Question 5, describe what you can see in each one. How do your choices make different aspects of the answer clear?

Question 6-8: Unemployment and Education

There is generally a correlation between the percentage of adults with less than a high school diploma and the unemployment rate. You can see this nicely on a scatterplot (this is in Tableau with a best-fit line, R-squared 0.322). This does not consider that the counties are different sizes (each county is weighted the same).

education-unemployement-2000.png
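
For those writing programs, here is a rough sketch of what “each county is weighted the same” means for the best-fit line, and what changes if counties are weighted (for example, by population). It is a plain least-squares fit in standard-library Python, not Tableau's implementation. The function name `linear_fit` and the numbers are hypothetical placeholders, not values from the Census Data.

```python
from typing import Optional, Sequence, Tuple


def linear_fit(
    x: Sequence[float],
    y: Sequence[float],
    w: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) of a least-squares line.

    With w=None every point counts equally. Passing weights (e.g. county
    population) makes heavier points pull the line harder.
    """
    if w is None:
        w = [1.0] * len(x)
    total_w = sum(w)
    # Weighted means of x and y.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted sums of squares and cross products around the means.
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R-squared is the squared (weighted) correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up numbers for illustration only.
pct_no_diploma = [8.0, 12.0, 20.0, 25.0, 30.0]   # hypothetical percentages
unemployment = [3.5, 4.0, 6.5, 5.0, 8.0]          # hypothetical rates
population = [900_000, 50_000, 20_000, 400_000, 10_000]  # hypothetical weights

print("unweighted:", linear_fit(pct_no_diploma, unemployment))
print("weighted:  ", linear_fit(pct_no_diploma, unemployment, population))
```

Comparing the two results is one way to check whether a few large counties tell a different story than the many small ones.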

Add to the richness of the answer. Pick one (or more) of the following…

  • Is the trend similar between rural and urban counties (and the levels in between)?
  • Where are the outliers? What can we say about them?
  • Does the trend vary geographically?
  • Did the trend change over time?
  • Is the trend different if corrected for the number of people in each county?

(or you can come up with your own follow-on questions)

Create two different visualizations that each add to the richness of the question. Upload these as Questions 6 and 7. Your visualizations should not be the same chart types.

In Question 8, describe the questions that each one tries to answer, and how it answers that question.

Question 9-11: Student Choice

Your turn.

Pick some question about the data. (You will enter this as Question 9)

Make a visualization that tries to answer it. (Upload this as Question 10)

Explain how well you can see the answer. (Enter this as Question 11)

Do not use a question that we’ve already asked (either here or in class).

Note: the rubric here involves the “interestingness” of your question and the answer. Good answers will ask questions involving multiple variables and show multiple variables.