A Tale of Two Data Sets
Over the next few weeks (3 modules) we will be working with two different data sets: US Census Data and Life Expectancy around the World. This page describes the data sets and provides access to them.
Modules 2, 3 and 4 involve design exercises where you will work with “real” data sets. These assignments use two different data sets: population data from the US Department of Agriculture (USDA), aggregated at the county level, and a data set of life expectancies from around the world (across many years).
There are two versions of each data set provided.
Quick links:
- GitHub Repo: 765Data
- Census CSV file (county-level): GitHub link
- Census CSV file (state-level): GitHub link
- Census Readme: README
- Life Expectancy World Bank (1960-2023, broken down by sex):
  - CSV File: GitHub link
  - Life Expectancy Readme: README
- Life Expectancy Our World in Data (includes historical data):
  - CSV File: GitHub link
  - Readme: GitHub link
Looking ahead, the assignments are…
- Module 2: Design Exercise 2-1: Critique Practice (due Fri, Sep 26) - You’ll be asked to critique some visualizations we made that use this data (in addition to some others).
- Module 2: Design Exercise 2-2: Questions and Sketches (due Fri, Sep 26) - You’ll be asked to think about what questions come up in the data, and to sketch some designs that use it.
- Module 3: Design Exercise 3-1: Make Pictures (due Fri, Oct 10) - You will be asked to make some visualizations using these data sets (for things specified by us).
- Module 3: Design Exercise 3-2: Explore (due Fri, Oct 10) - You will be asked to make some exploratory visualizations of these data sets.
- Module 4: Design Exercise 4-1: Questions and Drafts (due Fri, Oct 24) - You will be asked to pose questions and make drafts for 4-2.
- Module 4: Design Exercise 4-2: 5 Visualizations (due Fri, Oct 24) - You will be asked to make several visualizations from these data sets.
Some comments on class mechanics and data sets
For these assignments, we are forcing you to use our data sets. Hopefully, they are “general interest” enough to hold your interest.
We chose these data sets because they are simple enough to work for class, but complex enough to be interesting. They are (intentionally) problematic in several ways (we don’t want to make things too easy).
In the past, we’ve only had one data set per exercise. This time, we’re trying two. You will work with these data sets for 6 weeks, so the time spent sorting them out at the beginning will (hopefully) pay off.
We intentionally have chosen one data set in “wide” format and one in “tall” format.
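If the wide/tall distinction is new to you, the following is a minimal sketch using pandas. The table, column names, and values are made up for illustration and are not taken from the class data sets; it only shows what the two shapes look like and how one can be converted into the other.

```python
import pandas as pd

# Toy "wide" table: one row per county, one column per year.
# All names and numbers here are invented for illustration.
wide = pd.DataFrame({
    "county": ["A", "B"],
    "pop_2020": [100, 200],
    "pop_2021": [110, 190],
})

# Wide -> tall: one row per (county, year) observation.
tall = wide.melt(id_vars="county", var_name="variable", value_name="population")
# Pull the year out of the old column name, e.g. "pop_2020" -> 2020.
tall["year"] = tall["variable"].str.replace("pop_", "", regex=False).astype(int)
tall = tall.drop(columns="variable")

# Tall -> wide again: one column per year, indexed by county.
wide_again = tall.pivot(index="county", columns="year", values="population")
```

Some tools are more comfortable with one shape than the other, so it may be worth checking which shape your tool expects before you start.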
To make access easy, we are hosting the data sets on GitHub: https://github.com/uwgraphics/765Data/
Census Data
This data set has a variety of data about the US, broken down by county. It has information such as population, unemployment numbers, and education levels. It covers a number of years, but the years covered differ from variable to variable.
The US Department of Agriculture (USDA) provides population data aggregated at the county level. They gather education data, income data, poverty data, and population data. Later in the semester, we might gather more detailed data from other sources. We will also provide the data at the state level.
The USDA provides this data as 4 separate sheets, but together, they provide a very rich and complex data set full of stories. To help you get started faster (and focus on visualization, not data cleaning), the Cat (the 2025 TA) has joined the data into one “convenient” large file.
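To give a sense of what joining sheets involves, here is a rough sketch with pandas. It is not the script used to build the class file: the two toy tables, the `fips` key column, and the other column names are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for two of the sheets. Column names are hypothetical.
population = pd.DataFrame({
    "fips": ["01001", "01003"],
    "population": [100, 200],
})
education = pd.DataFrame({
    "fips": ["01001", "01005"],
    "pct_college": [25.0, 12.5],
})

# An outer join keeps counties that appear in only one sheet.
# indicator=True adds a "_merge" column recording where each row came from.
joined = population.merge(education, on="fips", how="outer", indicator=True)

# Rows that are not in both sheets are where missing values will show up.
unmatched = joined[joined["_merge"] != "both"]
```

A joined file like this can contain missing values wherever a county appears in one source but not another, which is one reason to read the Readme before trusting a column.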
This year, please get the data (and readme) from the GitHub Repo: https://github.com/uwgraphics/765Data/ (if you don’t have experience working with GitHub, please ask for help).
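If you prefer to load a CSV directly in Python rather than downloading it by hand, something like the sketch below may work. The branch name and the file path are assumptions (check the repo for the real ones), and the `fips` column in the offline demo is a made-up example of an identifier with leading zeros.

```python
import io
import pandas as pd

REPO = "uwgraphics/765Data"  # taken from the repo URL above

def raw_csv_url(path, branch="main"):
    """Build a raw-file URL for a path in the repo.

    Both `branch` and `path` are assumptions: look at the repo to find
    the actual branch name and file locations.
    """
    return f"https://raw.githubusercontent.com/{REPO}/{branch}/{path.lstrip('/')}"

# Hypothetical path, for illustration only.
url = raw_csv_url("census/county.csv")
# df = pd.read_csv(url)  # needs network access and a real path

# Offline demo: reading an ID column as text keeps its leading zeros.
sample = io.StringIO("fips,population\n01001,100\n01003,200\n")
df = pd.read_csv(sample, dtype={"fips": str})
```

Without the `dtype` argument, pandas would parse a value like `01001` as the integer 1001, which can make later lookups fail.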
If you want to see an example of working with this data in Tableau, look at: Tableau Tutorial for CS765: Getting Started with Census Data (although this used last year’s data and assignment).
We will also provide a tutorial on working with this data using standard Python tools. (COMING SOON!)
Life Expectancy Data
I was inspired to use this data set by colleagues at the University of Vienna who used it for an assignment in their class.
There are two versions of this data:
- World Bank Data - This is only 1960 to the present, but has almost all countries for almost all years, and it is broken down by sex.
- Our World in Data - This data set has varying historic data at irregular intervals (there is data for some countries over hundreds of years!), and almost all (current) countries from about 1950-present.
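Because the two versions cover different time spans, it can help to check coverage before comparing them. The sketch below uses a made-up tall table with irregular years; the column names and values are assumptions and do not come from either file.

```python
import pandas as pd

# Toy tall table with irregular years. All values are invented.
owid = pd.DataFrame({
    "country": ["X", "X", "X", "Y", "Y"],
    "year": [1800, 1950, 2000, 1960, 2000],
    "life_expectancy": [35.0, 60.0, 75.0, 50.0, 70.0],
})

# Keep only the years that overlap with a 1960-to-present source.
recent = owid[owid["year"] >= 1960]

# First year, last year, and number of observations per country,
# which makes sparse or irregular coverage easy to spot.
coverage = owid.groupby("country")["year"].agg(["min", "max", "count"])
```

A summary like `coverage` is a quick way to see which countries have long histories and which have only a few data points.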
See the GitHub Repo: https://github.com/uwgraphics/765Data/ for the data and a Readme.
Final Thoughts…
Working with two data sets with different challenges will force you to think about how different tools work with data in different forms. We urge you to try using different tools over the course of the assignments. The assignments will force you to work with both data sets.