Census Data

We will use “the census data set” for many of the design exercises in class. I put it in scare quotes because there will be different versions. This data set was specifically chosen to be “just about right” for class: not too hard and not too easy to work with, but complex enough to be interesting. We will provide it in “cleaned” form.

The US Department of Agriculture (USDA) provides population data aggregated at the county level. They gather education data, income data, poverty data, and population data. We will also provide the data at the state level. Later in the semester, we might gather more detailed data from other sources.

The USDA provides this data as 4 separate sheets (on USDA Census Data). Any one of them could tell an interesting story, but together they provide a very rich and complex data set full of stories.

To help you get started faster (and focus on visualization, not data cleaning), the TA (Cat Nelson in 2024) has joined the data into one “convenient” large file. We may publish better versions of this data set as the semester goes on.

The CSV file is: (county_census_2023-24_raw.csv 4.8 MB). The state level data is: (state_census_2023-24_raw.csv 0.1 MB). The state level data comes from the original files; we assume it was computed correctly.

Beware: the file has redundant columns, missing data, and other artifacts. We may provide newer versions of the data that are easier to work with.
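A quick way to see how much data is missing is to count the empty cells in each column. Here is a minimal sketch using only the Python standard library. The three-row sample is made up so the example runs on its own (the column names come from the table below, the values do not come from the real file), and `count_missing` is a hypothetical helper, not something we provide.

```python
import csv
import io

# Hypothetical sample standing in for the real CSV; the values are made up.
SAMPLE = """FIPS_Code,State,POP_ESTIMATE_2019,Median_Household_Income_2019
01001,AL,1000,50000
01003,AL,,60000
02013,AK,3000,
"""

def count_missing(rows, columns):
    """Return {column: number of rows whose cell is empty or whitespace}."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if (row.get(col) or "").strip() == "":
                missing[col] += 1
    return missing

# The csv module reads every cell as a string, so FIPS codes keep their leading zeros.
reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
missing = count_missing(rows, reader.fieldnames)
print(missing)
```

To try this on the real file, you could replace `io.StringIO(SAMPLE)` with `open("county_census_2023-24_raw.csv", newline="")`. The real file may mark missing values in other ways too, so treat the empty-string test as a starting point.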

The data set was pulled from USDA Census Data on September 6, 2024. Cat joined the data based on FIPS code (see below) and removed aggregate regions (state and country level).
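If you want a feel for what that join involves, here is a rough sketch (not Cat's actual script). It assumes each sheet has been loaded as a dictionary keyed by FIPS string, that the join keeps only codes present in every sheet, and that aggregate rows can be recognized by a FIPS code ending in "000" ("00000" for the whole country, "SS000" for a state). The function names and all values are hypothetical.

```python
def join_on_fips(*tables):
    """Inner-join several {fips: {column: value}} tables on their FIPS keys."""
    shared = set(tables[0])
    for table in tables[1:]:
        shared &= set(table)
    joined = {}
    for fips in sorted(shared):
        merged = {"FIPS_Code": fips}
        for table in tables:
            merged.update(table[fips])
        joined[fips] = merged
    return joined

def is_aggregate(fips):
    """Assumed convention: codes ending in '000' are state or national totals."""
    return fips.endswith("000")

# Made-up values for two of the sheets, including aggregate rows.
population = {
    "00000": {"POP_ESTIMATE_2019": 3000},
    "01000": {"POP_ESTIMATE_2019": 3000},
    "01001": {"POP_ESTIMATE_2019": 1000},
    "01003": {"POP_ESTIMATE_2019": 2000},
}
income = {
    "00000": {"Median_Household_Income_2019": 55000},
    "01000": {"Median_Household_Income_2019": 55000},
    "01001": {"Median_Household_Income_2019": 50000},
    "01003": {"Median_Household_Income_2019": 60000},
}

joined = join_on_fips(population, income)
counties = {f: row for f, row in joined.items() if not is_aggregate(f)}
print(sorted(counties))
```

Only the two county rows survive the filter. An inner join silently drops counties missing from any sheet, which is one way a joined file can end up with fewer rows than you expect.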

The variable descriptions can be found here, and some of them are replicated in the following table:

Variable Descriptions
Column name | Description
Births_2019 | Births in period 7/1/2018 to 6/30/2019
CENSUS_2010_POP | 4/1/2010 resident Census 2010 population
CI90LB017_2018 | 90% confidence interval lower bound of estimate of people age 0-17 in poverty 2018
CI90LB017P_2018 | 90% confidence interval lower bound of estimate of percent of people age 0-17 in poverty 2018
CI90LBINC_2018 | 90% confidence interval lower bound of estimate of median household income 2018
CI90UB017_2018 | 90% confidence interval upper bound of estimate of people age 0-17 in poverty 2018
CI90UB017P_2018 | 90% confidence interval upper bound of estimate of percent of people age 0-17 in poverty 2018
CI90UBINC_2018 | 90% confidence interval upper bound of estimate of median household income 2018
Civilian_labor_force_2018 | Civilian labor force annual average, 2018
Deaths_2019 | Deaths in period 7/1/2018 to 6/30/2019
DOMESTIC_MIG_2019 | Net domestic migration in period 7/1/2018 to 6/30/2019
Economic_typology_2015 | County economic types, 2015 edition
Employed_2019 | Number employed annual average, 2019
ESTIMATES_BASE_2010 | 4/1/2010 resident total population estimates base
FIPS_Code | State-County FIPS Code
GQ_ESTIMATES_2019 | 7/1/2019 Group Quarters total population estimate
GQ_ESTIMATES_BASE_2010 | 4/1/2010 Group Quarters total population estimates base
INTERNATIONAL_MIG_2019 | Net international migration in period 7/1/2018 to 6/30/2019
Med_HH_Income_Percent_of_State_Total_2019 | County Household Median Income as a percent of the State Total Median Household Income, 2019
MEDHHINC_2018 | Estimate of median household income 2018
Median_Household_Income_2019 | Estimate of Median household Income, 2019
Metro_2013 | Metro nonmetro dummy 0=Nonmetro 1=Metro (Based on 2013 OMB Metropolitan Area delineation)
N_POP_CHG_2019 | Numeric Change in resident total population 7/1/2018 to 7/1/2019
NATURAL_INC_2019 | Natural increase in period 7/1/2018 to 6/30/2019
NET_MIG_2019 | Net migration in period 7/1/2018 to 6/30/2019
PCTPOV017_2018 | Estimated percent of people age 0-17 in poverty 2018
POP_ESTIMATE_2019 | 7/1/2019 resident total population estimate
POV017_2018 | Estimate of people age 0-17 in poverty 2018
R_death_2019 | Death rate in period 7/1/2018 to 6/30/2019
R_DOMESTIC_MIG_2019 | Net domestic migration rate in period 7/1/2018 to 6/30/2019
R_INTERNATIONAL_MIG_2019 | Net international migration rate in period 7/1/2018 to 6/30/2019
R_NATURAL_INC_2019 | Natural increase rate in period 7/1/2018 to 6/30/2019
R_NET_MIG_2019 | Net migration rate in period 7/1/2018 to 6/30/2019
RESIDUAL_2019 | Residual for period 7/1/2018 to 6/30/2019
Rural-urban_Continuum_Code_2013 | Rural-urban Continuum Code, 2013
State | State Abbreviation
Unemployed_2019 | Number unemployed annual average, 2019
Unemployment_rate_2019 | Unemployment rate, 2019
Urban_Influence_Code_2013 | Urban Influence Code, 2013

A few things to note…

(especially for those of you new to the US)

Counties vary across the country. Some states have a few big counties, some states have lots of smaller counties.

Counties vary greatly in population and size.

Note: a FIPS (“Federal Information Processing Standards”) code is a 5 digit string (the leading zeros are important!) that the US Government uses to identify counties. The first 2 digits identify the state and the last 3 identify the county within that state. Each county has a unique code.
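Leading zeros tend to disappear when a tool decides the FIPS column is a number (so "01001" becomes 1001). Here is a small sketch of how you might put them back and split a code into its state and county parts. The helper names `normalize_fips` and `split_fips` are made up for this example.

```python
def normalize_fips(value):
    """Return a 5-character, zero-padded FIPS string from an int or a str."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 5:
        raise ValueError(f"not a county FIPS code: {value!r}")
    return text.zfill(5)

def split_fips(value):
    """Split a county FIPS code into (2-digit state part, 3-digit county part)."""
    code = normalize_fips(value)
    return code[:2], code[2:]

print(normalize_fips(1001))   # 01001
print(split_fips("55025"))    # ('55', '025')
```

It is usually easier to avoid the problem by reading the column as text in the first place and only repairing codes when a tool has already converted them.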

Some statistical things you might know better than me…

Differences in county size and population make a big difference in the “noise” (random effects), especially for uncommon events. If 1 person wins the lottery in a county with 100 people in it, that county will have a very high rate of lottery winners that year, and a huge change year to year (when it goes from very high to very low).
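To put rough numbers on the lottery example, here is a small sketch. It compares the rate per 100,000 people when a single event lands in a tiny county versus a big one, and then uses the standard binomial formula to estimate how noisy an observed rate is. The population sizes and the 1-in-100,000 chance are made-up numbers for illustration, and treating each person as an independent draw is a simplifying assumption.

```python
import math

def rate_per_100k(events, population):
    """Events per 100,000 residents."""
    return events * 100_000 / population

def relative_noise(p, population):
    """Standard deviation of the observed rate divided by the true rate p,
    assuming each resident independently has probability p (binomial model)."""
    return math.sqrt(p * (1 - p) / population) / p

# One winner in a 100-person county vs. one more winner in a big county.
tiny_with_winner = rate_per_100k(1, 100)      # 1000.0 per 100k
tiny_without = rate_per_100k(0, 100)          # 0.0 per 100k
big_ten = rate_per_100k(10, 1_000_000)        # 1.0 per 100k
big_eleven = rate_per_100k(11, 1_000_000)     # 1.1 per 100k

# With a 1-in-100,000 chance per person, the tiny county is far noisier.
p = 1 / 100_000
noise_tiny = relative_noise(p, 100)
noise_big = relative_noise(p, 1_000_000)
print(tiny_with_winner, tiny_without, big_ten, big_eleven)
print(noise_tiny / noise_big)
```

Under this model the noise shrinks with the square root of the population, so the 100-person county is about 100 times noisier than the million-person one. That is worth keeping in mind when a small county shows up as an extreme value on a map or chart.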