ATUS Data for 765-24

October 23, 2024 (Last Modified: May 16, 2025)

The American Time Use Survey (ATUS) is a large data collection effort from the U.S. Bureau of Labor Statistics (BLS).

The survey collects detailed time use data (how people spend their time) for a sample of people across the US.

BLS provides the data files in a very detailed form on its data files page. You can get lots of information about who the people are and what they did.

One thing that makes this data interesting is that it is collected and documented with a great deal of statistical care, so that it can be used correctly.

Another interesting thing… it’s a massive data set on a topic that is familiar to everyone.

In the past, working with this data meant joining together many files and summarizing them. The official BLS site will give you a lot of detail about the people surveyed and how they spend their time (the details of the observations).

But this year we will allow you to work with a version of the data provided by IPUMS, an organization that provides convenient access to government data. They put the data into an easy-to-use form that doesn’t have all the details, but is sufficiently interesting for our purposes.

For later stages of the assignment, you are welcome to obtain more detailed data (from either IPUMS or BLS). For the initial stages of the assignment, we prefer you work with the “easy” version of the data we provide.

The Provided Data Set

I created an “extract” (a pull from the database to create a data set) using the IPUMS extract builder. This created a nice, clean CSV file from the data. I got to pick which samples (all) and which variables (both attributes of the person and how they spent their time on the day of the sample).

I retrieved samples from the entire history (2003-2023) (this is “all samples”). I picked an assortment of different attributes about the person being sampled.

The resulting data file has 245139 rows. Each row is a “sample” - it’s a description of what one person did on one day.

There are 69 columns. They tell you when the sample was taken (year, month, day of the week), demographics (age, sex, etc.), some other information (level of education, etc.), and how the person spent their time.
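If you want to confirm the row and column counts outside of Tableau, here is a minimal sketch using only the Python standard library. The helper name `describe_extract` and the two sample rows are made up for illustration; with the real download you would pass the opened `atus_00006.csv` instead.

```python
import csv
import io


def describe_extract(handle):
    """Return (row_count, column_names) for a CSV opened as a text file."""
    reader = csv.DictReader(handle)
    row_count = sum(1 for _ in reader)  # each data row is one person-day
    return row_count, list(reader.fieldnames or [])


# Tiny made-up stand-in for the real file. With the download you would use
# open("atus_00006.csv", newline="") in place of io.StringIO(...).
sample = io.StringIO("YEAR,MONTH,DAY,AGE,SEX\n2003,1,6,60,1\n2003,1,7,41,2\n")
rows, columns = describe_extract(sample)
print(rows, len(columns))
```

On the provided file, the two numbers should match the row and column counts given above.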

For some of the columns, you need to look at the data dictionaries to interpret the codes. The official data dictionaries are at BLS. But you can also find information scattered about the IPUMS ATUS site. Note: be sure to translate the codes in your visualization legends and labels.
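Translating codes is just a table lookup. The sketch below shows the idea for the DAY column. The mapping (1 = Sunday through 7 = Saturday) is my assumption about the coding, and `label_for` is a hypothetical helper, so check the data dictionary before relying on either.

```python
# ASSUMPTION: this lookup is typed in by hand for illustration; confirm every
# code against the IPUMS or BLS data dictionary before using it in a chart.
DAY_LABELS = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}


def label_for(code, labels):
    """Translate a numeric code (int or numeric string) into a readable label."""
    return labels.get(int(code), f"Unknown code {code}")


# Made-up rows in the shape csv.DictReader would produce (values are strings).
rows = [{"DAY": "1"}, {"DAY": "7"}, {"DAY": "3"}]
day_names = [label_for(row["DAY"], DAY_LABELS) for row in rows]
print(day_names)
```

Returning an explicit "Unknown code" label, rather than failing, makes it easy to spot codes you forgot to look up.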

A hint… the ATUS data is part of the “Current Population Survey” (CPS). The “raw” ATUS data from BLS assigns each person a code, and the data about the person is found by looking that code up in the CPS. With the BLS data you can also connect a person to a household and learn about who else lives there. For the extract data, IPUMS has joined the ATUS and CPS data (so each row has the demographics). But, you might need to look up the column definitions in the CPS documentation.

The Columns We Provide (some basic documentation)

YEAR - when the sample was taken
CASEID - ATUS Case ID
SERIAL - Household serial number
HRHHID_CPS8 - Household ID (CPS)
HRHHID2_CPS8 - Household ID part 2 (CPS)
HRSAMPLE_CPS8 - Sample ID (CPS)
HUHHNUM_CPS8 - Household number (CPS)
REGION - (part of the country)
STATEFIP - FIPS code for the state
METRO - Metropolitan and central/principal city status
MSASIZE - Metro Area Size
COUNTY - County (FIPS code)
METAREA - Metro Area Code
FAMINCOME - Family Income
HH_SIZE - Number of people in household
HH_CHILD - Children in household yes/no
HOUSETYPE
HH_NUMKIDS - Number of children under 18
AGEYCHILD - Age of youngest child
HH_NUMADULTS - Number of Adults in Household
LFPROXY_CPS8 - Labor force information collected by self or proxy response
MONTH - when the sample was taken
DAY - day of the week (not the date)
AGE
SEX
RACE
MARST - Marital Status
CITIZEN - Citizenship Status
GENHEALTH - https://www.atusdata.org/atus-action/variables/GENHEALTH
EDUC - Highest level of school completed
EDUCYRS - Years of education
SCHLCOLL - Enrollment in school or college
EMPSTAT - Employment Status
MULTJOBS - Has more than one job
CLWKR - Class of worker, main job
OCC2 - General occupation category, main job
OCC - Detailed occupation category, main job
RETIRED
ACT_CAREHH
ACT_CARENHH
ACT_EDUC
ACT_FOOD
ACT_GOVSERV
ACT_HHACT
ACT_HHSERV
ACT_PCARE
ACT_PHONE
ACT_PROFSERV
ACT_PURCH
ACT_RELIG
ACT_SOCIAL
ACT_SPORTS
ACT_TRAVEL
ACT_VOL
ACT_WORK
BLS_EDUC_CLASS
BLS_EDUC_HWORK
BLS_FOOD
BLS_HHACT_FOOD
BLS_HHACT_HWORK
BLS_HHACT_PET
BLS_LEIS_ATTEND
BLS_LEIS_ATTSPORT
BLS_LEIS_PARTSPORT
BLS_LEIS_SOC
BLS_LEIS_SOCCOM
BLS_LEIS_SOCCOMEX
BLS_LEIS_TV
BLS_PCARE_SLEEP

The ACT columns are the amount of time (in minutes) spent in the 17 “main categories” of activities. These do add up to 1440 minutes. They are described here.
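A quick sanity check on the 1440-minute claim might look like the sketch below. The column names come from the list above; the sample row is made up, and `total_act_minutes` is a hypothetical helper.

```python
ACT_COLUMNS = [
    "ACT_CAREHH", "ACT_CARENHH", "ACT_EDUC", "ACT_FOOD", "ACT_GOVSERV",
    "ACT_HHACT", "ACT_HHSERV", "ACT_PCARE", "ACT_PHONE", "ACT_PROFSERV",
    "ACT_PURCH", "ACT_RELIG", "ACT_SOCIAL", "ACT_SPORTS", "ACT_TRAVEL",
    "ACT_VOL", "ACT_WORK",
]


def total_act_minutes(row):
    """Sum the 17 main-category columns for one person-day."""
    return sum(float(row[name]) for name in ACT_COLUMNS)


# Made-up person-day: every category is 0 except the five set below.
row = {name: "0" for name in ACT_COLUMNS}
row.update({"ACT_PCARE": "600", "ACT_WORK": "480", "ACT_FOOD": "90",
            "ACT_SOCIAL": "210", "ACT_TRAVEL": "60"})

total = total_act_minutes(row)
print(total)
```

Running the same check over every row of the real file is a cheap way to catch a column you dropped by accident.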

The BLS columns are more specific categories of time usage. They are described here. There are a total of 431 categories - they get very detailed. If you want more categories, you can get them in the BLS data, or from IPUMS.

You can download the CSV file on Canvas (atus_00006.csv, 52.5 MB)

If you’re wondering why the data file is “ATUS_00006” - it’s because this was my 6th attempt to create a data set.

A Few Notes on the Data …

For this assignment, we are doing “descriptive” statistics. We are describing the samples. The BLS data has information on how you can generalize from the sample population to the broader US population (basically, this requires applying weighting factors to the samples). But performing such weighting is not required (or recommended) for this class.

While the data may seem big (245000+ samples), it breaks down pretty quickly. If you start to divide people into groups, you quickly get to small sample sizes.
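To see how fast the groups shrink, it helps to count rows per group before you chart anything. Here is a minimal sketch. The rows and their code values are made up, `group_sizes` and `even_cell_size` are hypothetical helpers, and the "10 age bins" in the last line is just an example grouping.

```python
from collections import Counter
from math import prod


def group_sizes(rows, keys):
    """Count how many rows fall into each combination of the given columns."""
    return Counter(tuple(row[key] for key in keys) for row in rows)


def even_cell_size(total_rows, level_counts):
    """Rows per group if the rows were spread perfectly evenly (a best case)."""
    return total_rows // prod(level_counts)


# Made-up rows; the code values are placeholders, not real dictionary entries.
rows = [
    {"YEAR": "2019", "SEX": "1", "EMPSTAT": "1"},
    {"YEAR": "2019", "SEX": "2", "EMPSTAT": "1"},
    {"YEAR": "2019", "SEX": "2", "EMPSTAT": "5"},
    {"YEAR": "2020", "SEX": "1", "EMPSTAT": "1"},
    {"YEAR": "2020", "SEX": "2", "EMPSTAT": "1"},
]
by_year = group_sizes(rows, ["YEAR"])
by_all = group_sizes(rows, ["YEAR", "SEX", "EMPSTAT"])
print(min(by_year.values()), min(by_all.values()))

# 21 survey years x 2 sexes x 10 hypothetical age bins
print(even_cell_size(245139, [21, 2, 10]))
```

Real data is never spread evenly, so the smallest groups will be much smaller than the even-split figure suggests.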

The number of samples is not uniform. Some years sampled far more people than other years. The data sources make a point of saying that sampling was unusual when the pandemic broke out (e.g. April 2020 has no samples). However, the number of samples varies a lot in all years.
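One way to look at this unevenness is to count samples per (year, month) and list any months with no samples at all. The sketch below uses made-up rows for a single year; `samples_per_month` and `missing_months` are hypothetical helpers.

```python
from collections import Counter


def samples_per_month(rows):
    """Count rows for each (year, month) pair."""
    return Counter((int(row["YEAR"]), int(row["MONTH"])) for row in rows)


def missing_months(counts, first_year, last_year):
    """List the (year, month) pairs in the range that have no rows."""
    return [
        (year, month)
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
        if counts[(year, month)] == 0  # a Counter returns 0 for absent keys
    ]


# Toy data: one row for every month of 2020 except April, plus an extra
# January row so that the counts are not all equal.
rows = [{"YEAR": "2020", "MONTH": str(m)} for m in range(1, 13) if m != 4]
rows.append({"YEAR": "2020", "MONTH": "1"})

counts = samples_per_month(rows)
gaps = missing_months(counts, 2020, 2020)
print(counts[(2020, 1)], gaps)
```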

What does the Data Mean?

The time use variables are defined at: https://www.atusdata.org/atus-action/time_use_variables/select_template

The data we provide uses the documented codes: we did not define any “user defined variables”.

There are two sets of variables: the ATUS codes (17 categories, beginning with ACT) and the BLS codes (which are very detailed and form a three-level hierarchy). In the data we provide, we only use the “top level codes” (defined on this page). But you can get very detailed data if you want.

The BLS codes do not nest exactly (so the detailed codes don’t necessarily add up to the category codes).

The ACT codes are enough to do many interesting things, although a few key things (like sleep) are bundled together with other activities. We recommend using the ACT codes for cases where you care about part-whole, and specific BLS codes for details you are interested in (like sleep or housework).
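As an example of mixing the two sets of codes, the sketch below splits personal care into sleep and everything else. It assumes that the sleep column (BLS_PCARE_SLEEP) is contained within ACT_PCARE. Since the BLS codes do not nest exactly in general, treat that as an assumption to verify. The helper name and the sample row are made up.

```python
def split_personal_care(row):
    """Split personal care minutes into sleep and non-sleep parts.

    ASSUMPTION: BLS_PCARE_SLEEP is fully contained in ACT_PCARE.
    """
    personal_care = float(row["ACT_PCARE"])
    sleep = float(row["BLS_PCARE_SLEEP"])
    return {"sleep": sleep, "other_personal_care": personal_care - sleep}


# Made-up person-day: 600 minutes of personal care, 510 of which are sleep.
parts = split_personal_care({"ACT_PCARE": "600", "BLS_PCARE_SLEEP": "510"})
print(parts)
```

If the "other" part ever comes out negative on the real data, the containment assumption does not hold for that row.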

A Historical Note

We’ve used ATUS data in the past - but we didn’t have the nice, easy-to-use IPUMS version. We forced students to use the data directly from BLS, which required doing the joins (Tableau can do it easily). If you’re curious, the 2022 Data Description will give you the details, and serves as a guide if you want to try to use the BLS data yourself (for example, to get more detailed categories, or to get more demographic information).

Tableau Hints…

Note - in the pictures here, I am showing you the whole Tableau interface since I want you to see how I made things - you won’t show the interface in what you turn in (use export, not a screenshot).

Creating New Variables

I find it extremely useful to create new variables - either by calculation, by binning, or by grouping.

Some examples:

I created a variable (dimension) “Employed or in School” - which converts the confusing Employment and School Status variables into a single binary variable. (this is a computed field)
I created a variable (dimension) “Grouped Sleep” - which divides sleep into groups (they aren’t bins since they aren’t equal size) 0-4 hours, 4-6 hours, 6-8 hours, … (this was done with grouping, not binning)
I created a variable “AGE (bin)” that bins the ages into 5 year ranges.
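If you prefer to prepare such variables before loading the data into Tableau, the binning and grouping ideas can be sketched in Python as below. This is not how Tableau does it internally. The function names are hypothetical, the "8+ hours" group is my own placeholder for the groups not listed above, and which side of each boundary is inclusive is also an assumption.

```python
def age_bin(age, width=5):
    """Label the equal-width bin an age falls in, e.g. 37 -> '35-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def grouped_sleep(minutes):
    """Assign sleep minutes to unequal groups (boundaries are assumptions)."""
    hours = float(minutes) / 60
    if hours < 4:
        return "0-4 hours"
    if hours < 6:
        return "4-6 hours"
    if hours < 8:
        return "6-8 hours"
    return "8+ hours"  # placeholder for the remaining groups


print(age_bin(37), grouped_sleep(450))
```

The difference between the two is the one noted above: the age bins all have the same width, while the sleep groups are chosen by hand and need not be equal in size.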

Stacking Bars

One of the most common things to do with this data is to make a stacked bar of the Activity (ACT) measures (since the parts of the day form a part-whole relationship). This is a little tricky, but…

Set the default Aggregation for the ACT measures to “Average” (since you want the average of all the samples)
Drag “Measure Values” to an axis (I am using vertical)
Remove the measures that aren’t averages of the activity. (you might be able to achieve a similar effect by using the “Measure Names” filter)

Here’s a picture of what the basic setup looks like:

Notice that it does add up to (close to) 1440 (the number of minutes in the day). It isn’t exact because of rounding (which gets compounded by the averaging).
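The numbers behind such a stacked bar are just per-column averages, optionally split by a dimension. Here is a minimal sketch of that computation outside Tableau. It uses only three of the ACT columns and three made-up rows (each summing to 1440) to keep it short; the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_minutes(rows, columns):
    """Average each activity column over all rows (one bar's segments)."""
    return {col: mean(float(row[col]) for row in rows) for col in columns}


def average_minutes_by(rows, key, columns):
    """Average each activity column separately for every value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {value: average_minutes(members, columns)
            for value, members in groups.items()}


# Toy data: three columns stand in for all 17, and each row sums to 1440.
columns = ["ACT_PCARE", "ACT_WORK", "ACT_SOCIAL"]
rows = [
    {"SEX": "1", "ACT_PCARE": "600", "ACT_WORK": "480", "ACT_SOCIAL": "360"},
    {"SEX": "1", "ACT_PCARE": "540", "ACT_WORK": "0", "ACT_SOCIAL": "900"},
    {"SEX": "2", "ACT_PCARE": "660", "ACT_WORK": "420", "ACT_SOCIAL": "360"},
]

overall = average_minutes(rows, columns)
by_sex = average_minutes_by(rows, "SEX", columns)
print(sum(overall.values()), by_sex["1"]["ACT_WORK"])
```

Because every row sums to the same total, the averaged segments stack to that total as well, which is why the bar comes out at (about) a full day.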

You can drag “measure names” to colors (although, you get 17 colors!). And you can split on other dimensions… For example:

Note: this is not a great visualization. It is a good exploratory visualization (it gives me a starting point for digging deeper). The amount of personal care time (Act Pcare) definitely goes up as the amount of sleep goes up (Act Pcare includes sleep). If people sleep more, what are they doing less of?

Archive of the Fall 2024 Class

This web page is from the Fall 2024 CS765 (Data Visualization) class.