Design Challenge 1 Data Sets

This is a list of the data sets you may use for Design Challenge 1: One Dataset / Four Stories (Design Exercises 3 and 4).

You must choose one data set to create all of the visualizations in a phase. We strongly encourage you to use the same data set for phases 3 and 4.

For each data set, we have put the file in a Canvas file folder (the DC1 Data Sets folder). We will also create a starter workbook in Tableau Online, so you can copy the workbook (rather than uploading it yourself). See this Piazza Posting for instructions.

Note: for each data set, we’ve checked that it is in decent shape, and done some data cleaning. You may want to do more.

Also, for many of the sources, there is more data available. If you’d like to grab a different set from the same source, that is acceptable. You must choose a set that is at least comparable to what we provide. Please describe what you’ve done somewhere when you hand things in.

Also, note that this year’s list is different than prior years. (there are two that are the same)

1. Spotify

Note: this is a different (and bigger) Spotify dataset than we had in Phases 1 and 2.

The SpotifyFeatures.csv data set is a 32MB CSV file with 230K songs. For each song there are approximately a dozen features, as well as some identifying information.

The data set originally comes from Kaggle. There is some documentation for the features there, as well as in the documentation for the smaller dataset. Unfortunately, the Spotify Documentation provides little explanation of the features.

2. College Scorecard

This data comes from the U.S. Department of Education. This is a huge data set, that has a ton of information about all the degree granting institutions in the US. Some of the files have over 2600 features for each college. Most of these features are quite detailed, and relate to financial aspects.

If you want to get the data (be warned - there’s a 380M+ compressed zip file) and get more information you can download it from here. But, Abhay has processed the data to choose a reasonable subset. It should be more than enough to find interesting stories for this project.

Abhay’s files pick 37 features from the huge files, and put together multiple years. For each college, there is an entry for each year. You can get information about what the columns mean from the data dictionary.

There are two versions of the data: College_2012_2_2020.csv has years 2012-2020 in 2 year increments, while College_2000_5_2020.csv covers 2000-2020 in 5 year increments. You can choose either one.

One warning: this dataset has a bunch of “missing data” (empty entries that may show up as zeros).

3. Census Data by County

This data set aggregates many different quantities of interest over the counties of the US. In past assignments, we’ve provided even more detailed data sets which required even more aggregation.

The USDA provides this data as 4 separate sheets (on This page). Any one of them could tell an interesting story but together, they provide a very rich and complex data set full of stories.

We have (well, Young, the 765 TA in 2020 has) joined the 4 spreadsheets together (joining by the “FIPS Code” column) creating a single file. The four files are also combined (joined by the “FIPS Code” column) into one file, and put in Box. The rows for the states (not counties) are also removed. The data is downloaded and processed on September 13, 2020 from USDA/ERS.

The CSV File census_counties.csv is about 4MB and has 3196 rows and 339 columns)

The variable descriptions can be found on here, and some of them are replicated in the following table:

Variable Descriptions
Column name Description
Births_2019 Births in period 7/1/2018 to 6/30/2019
CENSUS_2010_POP 4/1/2010 resident Census 2010 population
CI90LB017_2018 90% confidence interval lower bound of estimate of people age 0-17 in poverty 2018
CI90LB017P_2018 90% confidence interval lower bound of estimate of percent of people age 0-17 in poverty 2018
CI90LBINC_2018 90% confidence interval lower bound of estimate of median household income 2018
CI90UB017_2018 90% confidence interval upper bound of estimate of people age 0-17 in poverty 2018
CI90UB017P_2018 90% confidence interval upper bound of estimate of percent of people age 0-17 in poverty 2018
CI90UBINC_2018 90% confidence interval upper bound of estimate of median household income 2018
Civilian_labor_force_2018 Civilian labor force annual average, 2018
Deaths_2019 Deaths in period 7/1/2018 to 6/30/2019
DOMESTIC_MIG_2019 Net domestic migration in period 7/1/2018 to 6/30/2019
Economic_typology_2015 County economic types, 2015 edition
Employed_2019 Number employed annual average, 2019
ESTIMATES_BASE_2010 4/1/2010 resident total population estimates base
FIPS_Code State-County FIPS Code
GQ_ESTIMATES_2019 7/1/2019 Group Quarters total population estimate
GQ_ESTIMATES_BASE_2010 4/1/2010 Group Quarters total population estimates base
INTERNATIONAL_MIG_2019 Net international migration in period 7/1/2018 to 6/30/2019
Med_HH_Income_Percent_of_State_Total_2019 County Household Median Income as a percent of the State Total Median Household Income, 2019
MEDHHINC_2018 Estimate of median household income 2018
Median_Household_Income_2019 Estimate of Median household Income, 2019
Metro_2013 Metro nonmetro dummy 0=Nonmetro 1=Metro (Based on 2013 OMB Metropolitan Area delineation)
N_POP_CHG_2019 Numeric Change in resident total population 7/1/2018 to 7/1/2019
NATURAL_INC_2019 Natural increase in period 7/1/2018 to 6/30/2019
NET_MIG_2019 Net migration in period 7/1/2018 to 6/30/2019
PCTPOV017_2018 Estimated percent of people age 0-17 in poverty 2018
POP_ESTIMATE_2019 7/1/2019 resident total population estimate
POV017_2018 Estimate of people age 0-17 in poverty 2018
R_death_2019 Death rate in period 7/1/2018 to 6/30/2019
R_DOMESTIC_MIG_2019 Net domestic migration rate in period 7/1/2018 to 6/30/2019
R_INTERNATIONAL_MIG_2019 Net international migration rate in period 7/1/2018 to 6/30/2019
R_NATURAL_INC_2019 Natural increase rate in period 7/1/2018 to 6/30/2019
R_NET_MIG_2019 Net migration rate in period 7/1/2018 to 6/30/2019
RESIDUAL_2019 Residual for period 7/1/2018 to 6/30/2019
Rural-urban_Continuum_Code_2013 Rural-urban Continuum Code, 2013
State State Abbreviation
Unemployed_2019 Number unemployed annual average, 2019
Unemployment_rate_2019 Unemployment rate, 2019
Urban_Influence_Code_2013 Urban Influence Code, 2013

4. Time Usage Survey

The American Time Usage Survey (ATUS) tracks how people spend their time. There are corresponding international versions. There are actually lots of different surveys with interesting data available from the IPUMS website.

We have done a data pull for you. We are allowed to share a data pull ( see this). We chose to collect data from all of the years (2003-2019), and selected a wide range of different attributes. The data was downloaded on September 13, 2020 from ATUS.

Note: this is pre-pandemic data - we may try to create a new pull from the data set to get more updated information. I may make a newer data set, or a data set in a different form, available. Or, we may use that in a future assignment.

The CSV File atus_data.csv has 210587 rows and 34 columns.

You may create your own data pull if you’d like to try this with different columns (but be sure to document it in your writeup). Getting a data set requires picking from all the options - there are so many options that picking a good set is pretty time consuming. It is actually an interesting exercise to see how they document their data - they are very careful in documenting everything.

You can find out what the “time use codes” mean on this page.

Interpreting the other codes requires some digging, unfortunately. Some are self-explanatory, but others… I tracked down the “FAMINCOME” columns: explanation here. The state codes are here. Here are some of them are replicated in the following table:

Variable Descriptions
Column name Description
RECTYPE Record type
1 Household
2 Person
3 Activity
4 Who
5 Eldercare
REGION Region
1 Northeast
2 Midwest
3 South
4 West
STATEFIP FIPS State Code
01 Alabama
02 Alaska
04 Arizona
05 Arkansas
06 California
08 Colorado
09 Connecticut
10 Delaware
11 District of Columbia
12 Florida
13 Georgia
15 Hawaii
16 Idaho
17 Illinois
18 Indiana
19 Iowa
20 Kansas
21 Kentucky
22 Louisiana
23 Maine
24 Maryland
25 Massachusetts
26 Michigan
27 Minnesota
28 Mississippi
29 Missouri
30 Montana
31 Nebraska
32 Nevada
33 New Hampshire
34 New Jersey
35 New Mexico
36 New York
37 North Carolina
38 North Dakota
39 Ohio
40 Oklahoma
41 Oregon
42 Pennsylvania
44 Rhode Island
45 South Carolina
46 South Dakota
47 Tennessee
48 Texas
49 Utah
50 Vermont
51 Virginia
53 Washington
54 West Virginia
55 Wisconsin
56 Wyoming
METRO Metropolitan/central city status
01 Metropolitan, central city
02 Metropolitan, balance of MSA
03 Metropolitan, not identified
04 Nonmetropolitan
05 Not identified
FAMINCOME Family income
001 Less than $5,000
002 $5,000 to $7,499
003 $7,500 to $9,999
004 $10,000 to $12,499
005 $12,500 to $14,999
006 $15,000 to $19,999
007 $20,000 to $24,999
008 $25,000 to $29,999
009 $30,000 to $34,999
010 $35,000 to $39,999
011 $40,000 to $49,999
012 $50,000 to $59,999
013 $60,000 to $74,999
014 $75,000 to $99,999
015 $100,000 to $149,999
016 $150,000 and over
996 Refused
997 Don’t know
998 Blank
HH_SIZE Number of people in household
001 1
002 2
003 3
004 4
005 5
006 6
007 7
008 8
009 9
010 10
011 11
012 12
013 13
014 14
015 15
016 16
999 NIU (Not in universe)
PERNUM Person number (general)
01 1
02 2
03 3
04 4
05 5
06 6
07 7
08 8
09 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
LINENO Person line number
001 1
002 2
003 3
004 4
005 5
006 6
007 7
008 8
009 9
010 10
011 11
012 12
013 13
014 14
015 15
016 16
017 17
018 18
019 19
999 NIU (Not in universe)
SEX Sex
01 Male
02 Female
99 NIU (Not in universe)
RACE Race
0100 White only
0110 Black only
0120 American Indian, Alaskan Native
0130 Asian or Pacific Islander
0131 Asian only
0132 Hawaiian Pacific Islander only
0200 White-Black
0201 White-American Indian
0202 White-Asian
0203 White-Hawaiian
0210 Black-American Indian
0211 Black-Asian
0212 Black-Hawaiian
0220 American Indian-Asian
0221 American Indian-Hawaiian
0230 Asian-Hawaiian
0300 White-Black-American Indian
0301 White-Black-Asian
0302 White-Black-Hawaiian
0310 White-American Indian-Asian
0311 White-American Indian-Hawaiian
0320 White-Asian-Hawaiian
0330 Black-American Indian-Asian
0331 Black-American Indian-Hawaiian
0340 Black-Asian-Hawaiian
0350 American Indian-Asian-Hawaiian
0398 Other 3 race combinations
0399 2 or 3 races, unspecified
0400 White-Black-American Indian-Asian
0401 White-Black-American Indian-Hawaiian
0402 White-Black-Asian-Hawaiian
0403 Black-American Indian-Asian-Hawaiian
0404 White-American Indian-Asian-Hawaiian
0500 White-Black-American Indian-Asian-Hawaiian
0599 4 or 5 races, unspecified
9999 NIU (Not in universe)
EDUC Highest level of school completed
- Less than HS diploma
010 Less than 1st grade
011 1st, 2nd, 3rd, or 4th grade
012 5th or 6th grade
013 7th or 8th grade
014 9th grade
015 10th grade
016 11th grade
017 12th grade - no diploma
- HS diploma, no college
020 High school graduate - GED
021 High school graduate - diploma
- Some college
030 Some college but no degree
031 Associate degree - occupational vocational
032 Associate degree - academic program
- College degree +
040 Bachelor’s degree (BA, AB, BS, etc.)
041 Master’s degree (MA, MS, MEng, MEd, MSW, etc.)
042 Professional school degree (MD, DDS, DVM, etc.)
043 Doctoral degree (PhD, EdD, etc.)
999 NIU (Not in universe)
EDUCYRS Years of education
100 Grades 1-12
101 Less than first grade
102 First through fourth grade
105 Fifth through sixth grade
107 Seventh through eighth grade
109 Ninth grade
110 Tenth grade
111 Eleventh grade
112 Twelfth grade
200 College
213 College–one year
214 College–two years
215 College–three years
216 College–four years
217 Bachelor’s degree
300 Advanced degree
316 Master’s degree
317 Master’s degree–one year program
318 Master’s degree–two year program
319 Master’s degree–three+ year program
320 Professional degree
321 Doctoral degree
999 NIU (Not in universe)
EMPSTAT Labor force status
01 Employed - at work
02 Employed - absent
03 Unemployed - on layoff
04 Unemployed - looking
05 Not in labor force
99 NIU (Not in universe)