DE09: Initial Experiments
This is the first assignment in a series of design exercises where you will work with the ATUS Data for 765-24.
The goals for this assignment are for you to (1) get some experience with the data set, (2) practice making visualizations from it, (3) do some exploratory visualization with it, and (4) think about some of the issues particular to this data set.
If you haven’t already done so, I recommend reading the ATUS Data for 765-24 page to get started. I also recommend that you look through A Quick EDA Example with the ATUS Data (from 2022) - this is an example of me playing with the data sets from 2 years ago (which were different).
A bit about exploratory visualizations…
The goal of exploratory visualization is to quickly make visualizations that give you a rough answer to a question that you are curious about. They may not be “great” visualizations for anything else - they are there to expose something. Ideally, they make something easy to spot.
For example, when I started working with the ATUS07 data set (the 2010-2019 only version of the data), I was wondering if there were relatively similar numbers of samples in each month. So I made this chart:
A bar chart with 120 bars is not normally something I recommend. But it was quick to make - and it quickly exposes that there are positive and negative outliers, and a generally decreasing trend. But even in the lowest month, there are several hundred samples.
Here’s another picture of the same data, encoding the count with size and color:
This one makes the trend a little less obvious, but would probably be good enough for me to get the rough idea that there’s a reasonable number of samples in all months, and there is some variance. The outliers and trends don’t jump out as well - but for an exploration, it could answer my question. This design might scale better if we had a bigger data set (if we tried this on the 20-year data set ATUS06, I suspect the bar chart would be terrible and this would be OK).
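The count behind both charts is simple to compute before drawing anything. Here is a minimal sketch in plain Python, assuming each sample is a row with `year` and `month` fields. Those field names and the rows themselves are made up for illustration; the real ATUS files may name things differently.

```python
from collections import Counter

# Hypothetical rows standing in for the survey file.
# The field names "year" and "month" are assumptions, not the real column names.
rows = [
    {"year": 2010, "month": 1},
    {"year": 2010, "month": 1},
    {"year": 2010, "month": 2},
    {"year": 2011, "month": 1},
]

def samples_per_month(rows):
    """Count how many rows fall in each (year, month) pair."""
    return Counter((r["year"], r["month"]) for r in rows)

counts = samples_per_month(rows)
smallest = min(counts.values())

# Print the counts in calendar order, then the size of the smallest month.
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d}: {n}")
print("smallest month has", smallest, "samples")
```

The table of counts is what the bar chart (or the size/color chart) would encode. The minimum is the quick check that even the lowest month has enough samples.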
A Note About This Assignment
You will turn this assignment in using the Canvas Survey Design Exercise 09: Explorations and Pictures (due Fri, Nov 01). As usual, you will get points for turning in your survey - we will return your grade using another mechanism. We will accept assignments until the following Monday (Nov 4) without penalty. After that, there will be penalties.
The survey to upload your answers won’t be available right away.
We strongly recommend that you prepare your answers ahead of time and then upload them (but people seem to have figured this out).
Canvas will auto number the questions - differently than the numbering I will use. I don’t know how to get around that.
You may use whatever tools you like. Tableau is good for doing the kinds of explorations we are asking for, but it does take time to learn how to use Tableau, and it is hard to make great “final” visualizations with it (although it is possible - look at the Tableau Vis of the Day site for some amazing examples of what Tableau experts can do). I am putting some specific hints for using Tableau with the ATUS data in ATUS Data for 765-24 (tableau hints) - I expect this list to grow over the weeks we work with the data.
When we ask you to make visualizations to “answer a question”:
- We are asking for descriptive/exploratory analysis. Is this true in the samples we have?
- Your visualization should show the data for the question in a clear way.
- A “visualization” might be “compound” (it might be multiple charts together). You should turn in a single image.
- Your visualization should be good enough that it can answer the question on its own. We aren’t expecting “final” visualizations - but the details should be good enough that we can tell that they answer the questions. You shouldn’t need a rationale to explain how you see the answer, but you will provide one anyway.
- Good visualizations should do more than just answer the question - they should provide context and suggest further information.
You might want to look at Last Year’s Assignment - where we asked students to explore and record what they discovered. Last year, there was more of a focus on finding questions.
Part 1 - Warm Up to the Data
You can do this with either the “all years” data set (ATUS06) or the smaller “2010s” data set (ATUS07). Either way, download a data set and familiarize yourself with it.
For the specific questions, it’s OK to consider the broad categories (except for sleep - which is a specific BLS category that is part of “ACT_PCARE”).
Create visualizations that give you a quick answer to the following:
Last year, I found that in the data set there were people who claimed to sleep all 1440 minutes. Confirm that this year, there are people who (claim to) sleep more than 23 hours in a day. How many are there, and which days (of the week) do they do it on?
Consider only people from ages 15-30 (I don’t think there are people in the data set younger than 15). You would expect that over this period, they go from being students to entering the workplace, and this would be reflected in their time usage. Make a visualization that supports (or refutes) this expected trend. (This should be about time usage - not the demographic (education, employment) variables.)
Other than shifting from school to work, are there other trends in how people spend their time as they age from 15-30?
(On Average) People sleep more on the weekends. Is this just for people who are not working or in school?
What else (besides sleep) do people do more of on the weekends?
One I am not asking you to make: as we divide up the data into small groups (20 year olds, in school, in a particular year), the groups may become so small that we don’t have enough samples. Often, when we divide things up, we need to check that each group still has enough samples. I think you are OK for the questions above.
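If you want to run that check yourself, it can be sketched as below. This is only an illustration: the field names `age` and `in_school`, the rows, and the threshold of 30 are all made up, and how many samples is “enough” depends on your question.

```python
from collections import Counter

MIN_SAMPLES = 30  # arbitrary illustrative threshold, not a course requirement

def small_groups(rows, keys, min_samples=MIN_SAMPLES):
    """Return the groups (defined by the given keys) with fewer than min_samples rows."""
    sizes = Counter(tuple(r[k] for k in keys) for r in rows)
    return {group: n for group, n in sizes.items() if n < min_samples}

# Hypothetical rows; "age" and "in_school" are made-up field names for this sketch.
rows = (
    [{"age": 20, "in_school": True}] * 40
    + [{"age": 20, "in_school": False}] * 5
    + [{"age": 21, "in_school": True}] * 35
)

too_small = small_groups(rows, keys=("age", "in_school"))
print(too_small)
```

Note that a combination with no rows at all (here, 21 year olds not in school) never shows up in the counts, so an empty group has to be checked for separately.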
Part 2: Explore…
I’d like you to spend some time “exploring” the data set as practice (for exploring in general, as well as to gain traction with the data). It’s difficult for me to make this into a formal assignment (I tried last year; see 2022 EDA Assignment).
What I’d like you to turn in:
- A visualization that you made to answer an initial question.
- An explanation of what you were looking for in that visualization, and where it led you.
- A second visualization that follows from #2.
- A list of (2-5) interesting things that you found (that you might want to explore deeper, or make a good visualization of).
You shouldn’t use one of my visualizations (or ones that you made for Part 1) for #1. However, they could inspire a question that leads to something. If nothing else, many of the questions look at averages - and there might be interesting “variance” (e.g., differences among the subgroups).
An “interesting finding” might be a failure to see a difference…
For example, the 2010s were a period of great change (economically, politically, technologically, …) - is this reflected in the coarse level time usage statistics?
This is a “null result” - I was expecting something to change, but I don’t see any big changes. This makes me wonder… (by the way, if it makes you curious, that might be a good thing to explore. I think the fact that we’re averaging over so many different people is hiding something.).
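To see how averaging could hide a change, here is a small sketch with made-up numbers (these are not ATUS values, and the field names `year`, `group`, and `minutes` are hypothetical). Two subgroups move in opposite directions, so the overall average stays flat even though each group changed.

```python
from collections import defaultdict
from statistics import mean

# Made-up minutes per day for two hypothetical subgroups; not real ATUS data.
rows = [
    {"year": 2010, "group": "A", "minutes": 100},
    {"year": 2010, "group": "B", "minutes": 200},
    {"year": 2019, "group": "A", "minutes": 150},
    {"year": 2019, "group": "B", "minutes": 150},
]

def mean_by(rows, keys, value="minutes"):
    """Average the value field within each group defined by the given keys."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[tuple(r[k] for k in keys)].append(r[value])
    return {group: mean(vals) for group, vals in buckets.items()}

overall = mean_by(rows, keys=("year",))
by_group = mean_by(rows, keys=("year", "group"))

print("overall by year:", overall)
print("by year and group:", by_group)
```

Whether anything like this happens in the real data is exactly the kind of thing an exploration could check, for example by splitting the yearly averages by a demographic variable.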
Part 3: Questions for Visualizations
Next week (for DE10) I am going to ask you to make some visualizations that answer questions (some of my questions and some of your own questions).
In this part, I want you to preview some of the questions. For people who opted in to the Collaborative Learning Opportunity, you will do a different version of this as part of the “Design Discussion.” If you are in the CLO, put “CLO” for your answer to these questions.
For those of you not doing CLO, here’s a list of 4 questions that I might ask you to make a visualization for:
- What categories have the most differences between sexes? Did this change over the years?
- Do employed people spend their time differently on weekends (non-work days) than unemployed people? (and if so how)
- Do people who socialize/relax more do it consistently (spread across the week), or do they do it more on certain days? (i.e., do high socializers have different patterns across the week than low socializers?) Does this hold across different age groups?
- What states have the biggest differences (in time usage) between 25-35 year olds and 55-65 year olds? Be aware that the trends will be impacted by employment and whether they have kids.
I have to give out DE10 before I get your responses. But I am curious:
- For each, give a 1-2 sentence answer as to (A) whether you think this question will lead to an interesting visualization, and (B) a guess at what you think you might see.
- If you had to pick one, which would you pick?
Note: I am NOT asking you to make visualizations for these questions (yet).
What to turn in…
Yes, I am asking you to turn in 7 visualizations (!)
On the Canvas Survey Design Exercise 09: Explorations and Pictures (due Fri, Nov 01) there are questions that allow you to upload (make sure to prepare your answers ahead of time!):
- Your visualizations for the 5 questions in Part 1. For each, you should upload an image and a “rationale” - A few sentences explaining why you chose your particular design, and what you see in it that answers the question.
- Your 2 visualizations for Part 2 (an initial visualization and a follow-on)
- An explanation of your exploration (what you were looking for and where it led you)
- A list of 2-5 interesting things you found that you might want to make visualizations for from the data set.
- (not for CLO students) Assessments for each of the 4 questions in Part 3
- (not for CLO students) Your choice of which one you would prefer.
- For CLO students, you will be asked for your assignment (which came from the discussion that you had to do).
This design exercise is due on Friday. We will accept submissions until the following Monday without penalty (and later in the week - for a severe penalty).