Design Challenge 1: One Data Set, 4 Stories

by Mike Gleicher on September 19, 2017

Due Dates:

Kickoff Meeting: September 22nd (Friday, Optional Class)
Data Set Selection: Sunday, September 24th (All data sets must be approved) (Canvas)
Sketches: October 1st (Canvas)
Rough Drafts: October 15th (Canvas)
Designs Due: October 22nd (Canvas)

Objectives: To make some visualizations with real data, and to explore how to tell different “stories” by choosing different encodings of the data. This is a chance to try out using visualization tools.

Overview

In this assignment, you’ll pick one data set to make visualizations from. Then, you will make 4 visualizations – each telling a different “story” about the data. Then you will also make a 5th visualization that re-tells one of the stories from the first 4. The idea is that you should explore the different kinds of visualizations you might make from this data and the different questions/tasks you might want to address for someone, and see how you can match the picture to the story.

We will provide a bunch of choices of data sets. We will check to make sure they are sufficiently challenging (there are good stories in them), yet not too hard in ways unrelated to the class (e.g., they need extensive cleaning or specialized science to interpret them). We encourage you to pick one of our data sets.

For this year, we will allow people to “bring their own data set” subject to a bunch of rules. The data set must be publicly available, must be on a topic of general awareness (i.e., not something that only researchers in a specialized field care about), and must be sufficiently challenging to work with. In order to use a data set not on our “approved” list, you must get our approval. We will have a “bring your own data day” (September 22nd) where you can bring your data set for public critique (and possible approval). If your dataset is approved, it will be added to the “list of approved data sets” so that anyone in class can use it. No new data sets will be approved after September 24.

You may use any tools that you like to create the visualizations – subject to the constraint that you are required to hand in PDFs, and to document your process. It is fine to use Excel or Tableau or R or JMP or some other “tool.” It is also fine to write your own programs that create visualizations in whatever programming language you like. There may be practical issues in getting pictures out of your own programs – at worst, you can use screen capture.

For the final ones, you should make real visualizations with the real data.

If you find that you aren’t able to exactly implement your design (e.g. you can’t figure out how to convince Excel to use the colors that you want), feel free to “cheat” a little (save the picture and open it in Photoshop and paint over it), but part of the idea is to try to make pictures with real data (so don’t just sketch – unless you are doing precise measurements). If you’re really stumped on implementation, you can put a note in your caption “the red dots were supposed to be blue” – but try not to leave too much to the imagination of the viewer.

By September 24th, you must tell us which data set you will be using (on Canvas).

By October 1st, you will upload at least 2 sketches (either as PDF or image files) to Canvas.

By October 15th, you will upload a “rough draft” of your assignment – hopefully better than your initial sketches – to Canvas.

On October 22nd, you will turn in your “final” visualizations (at least 5 – since for one of the stories you need to make 2 visualizations). For each visualization, there should be a good caption, explaining the data and enough of the story (although, if your graph is really great, the reader might figure out the story without reading the caption). Please do not put your name inside the PDF (so that we can send them out for anonymous critique). The PDFs should be 1 page each. It should be clear from the visualization and/or caption what data set it is. Turn in a 6th document that explains how you made the pictures, and what you were trying to show with each one. These will be turned in as an assignment on Canvas.

How to do this?

We are explicitly not specifying how you should make your visualizations. Given the range of skills of students in the class, there isn’t one tool for everyone.

Our main interest is in the results. Good results are visualizations that effectively tell the stories they are trying to tell. How those visualizations are made is less important than how well they work. Well-chosen, basic charts can often tell interesting stories, but we would like you to try to tell richer, more complex stories.

We do encourage you to use this assignment as an excuse to learn about new and different tools. We intentionally added some extra time at the beginning of the assignment for people to do this. That said, this isn’t a time to go overboard: if you’ve never programmed in JavaScript before, now might not be the time to master D3. But, it might be a chance to try out Tableau – even if you decide to make your final pictures some other way.

Part of this assignment will require you to do some quick looking over the data set to see what stories are there – this is “exploration” (in statistics, they might call it Exploratory Data Analysis). The tools you use for this kind of exploration might be different than those you choose for making your final pictures.

Data Sets

We will give you a bunch of data sets to choose from. If you want to pick a data set that isn’t on the list, see the instructions above. See the Data Sets Page.

If there’s a data set you want to see on the list, submit it to us (and bring it to the optional class on September 22nd). If we agree it’s good for the assignment, we will put it on the list for anyone to use (including you).

Examples

Last year’s designs are online: http://graphics.cs.wisc.edu/Courses/Visualization17/design-challenge-1/

Data and Example Questions

Try not to pick questions that can be answered with a single statistic – pick something where the visualization adds value. The richer and more complex the task or story (or set of stories) that the visualization tells, the more interesting (and challenging) it is, and the more opportunities you have to make a particularly cool “story”.

For example with the airline data (a month of flight delay information):

You could give the statistics on the average delay for flights leaving Madison.
You could give the statistics on flight delays leaving Madison, helping someone choose which destination has the least delays, or what time of day you are most/least likely to have a delay, or some combination of both.
You could present information on a bunch of city pairs – for example, to help someone plan a trip between Madison and San Francisco, which hub city is it best to connect through? what time of day should you leave? (if your goal is to avoid delays)
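As a rough illustration of the kind of quick exploration behind the second example, here is a minimal sketch that groups departure delays by destination, by scheduled departure hour, and by both. The column names (`ORIGIN`, `DEST`, `CRS_DEP_TIME`, `DEP_DELAY`) and the toy rows are assumptions made up for this example; check the field descriptions for the real file before relying on them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy rows invented purely for illustration; column names are assumptions.
SAMPLE = """ORIGIN,DEST,CRS_DEP_TIME,DEP_DELAY
MSN,ORD,0600,5
MSN,ORD,1730,45
MSN,MSP,0615,0
MSN,MSP,1745,20
MSN,DTW,0700,
"""

def mean_delay_by(rows, key_fn):
    """Group departure delays by key_fn(row) and return {key: mean delay}."""
    groups = defaultdict(list)
    for row in rows:
        if row["DEP_DELAY"] == "":  # skip rows with a missing delay value
            continue
        groups[key_fn(row)].append(float(row["DEP_DELAY"]))
    return {key: mean(values) for key, values in groups.items()}

def dep_hour(row):
    """Scheduled departure hour, assuming an HHMM string such as '1730'."""
    return int(row["CRS_DEP_TIME"][:2])

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_dest = mean_delay_by(rows, lambda r: r["DEST"])
by_hour = mean_delay_by(rows, dep_hour)
by_dest_hour = mean_delay_by(rows, lambda r: (r["DEST"], dep_hour(r)))

print(by_dest)
print(by_hour)
```

The destination-by-hour table is the one a single statistic cannot replace, which is why it is a better candidate for a visualization than the overall average.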

We’ve picked the data sets (but you get to choose amongst them). You get to pick the stories to tell. Think about stories that someone would care about. Stories that would be interesting.

Grading / Turning Things In

Choosing a data set: you must tell us which data set you are using on Canvas by September 24th.

Sketches: post at least 2 initial sketches (hand drawn) with ideas of what you want to do to the Canvas discussion, due October 1st. Please give feedback to other people in your group.

Rough drafts: due October 15th. Upload (at least) 2 PDF files (or other image files) to Canvas. These should have the same form as the final turn-in. Sketches are OK, but not preferred.

Designs Due: due October 22nd. This is the “main hand in.” This will be turned in as an assignment on Canvas.

You need to upload 5 designs (4 questions, 2 designs for 1 question). You may submit 1 or 2 extras. Each design should be a separate PDF file, and be self-contained with a caption. However, it should not have your name on it (so we can send it out for anonymous critique).

As an additional document (either as a PDF or in the Canvas type-in box), explain how you made the pictures, and the questions that each is meant to address (hopefully it will be clear from the vis and caption). Your peer reviewers will not see this document, but the grader will.

We will assign a grade (unclear if we will use a numeric scale or an A-F scale). The grade will be for the quality of what is turned in (other parts of the assignment, and penalties for being late will be added later). Your “net grade” will be reduced if you failed to do any of the earlier parts of the assignment (e.g., sketches, drafts), or are late.

The things we will consider include:

How good/interesting are the “stories” that you chose? Did you pick a diverse set? Are the things you chose to show multi-variate?
How well chosen are your encodings? Are they effective at communicating the message?
How well “implemented” are the designs? Are the specific detail choices made thoughtfully?

Visual appeal and implementation (beyond what is required for effectiveness) may be rewarded, but are not central.

Note: if your assignment is too late, we won’t grade it.

Peer Review: In the past, peer review was an integral part of the assignment. This year, we will do peer review separately.

Design Challenge 1: Data Sets

by Mike Gleicher on September 19, 2017

Addition: September 22, 2017: New data sets were added at the bottom (see Student suggested Data Sets from 2017 Fall). These data sets seem really interesting – but they may be more challenging.

These are the “approved” data sets for Design Challenge 1. Remember, you must use one of these approved data sets. If you want to use a different data set, you must get it approved (and we’ll put it on this list).

~~This list is in no particular order.~~

Data Sets from Old Classes

The datasets are available in this Box folder (except for ones you need to grab yourself – and even then a copy might be available in the folder).

Everyone who is registered for class should have access to the box folder. If you are having a problem, let me know – Box doesn’t always cooperate.

White House Budget Data

The data used in developing the budgets (back in 2016 and 2017). From the White House github. I recommend going to the 2017 branch and selecting “download ZIP” (look for the green “clone or download” button). There is good documentation, and the data is quite rich – giving historical spending in a lot of categories.

In the past, we considered the “receipts” data as small, and the “budgets and outlays” as harder data sets. Here we’re grouping them together.

Airline On-Time Performance

The Bureau of Transportation Statistics lets you download a lot of data, one month at a time from this page. We’ve downloaded a few months for you – but even if you download our versions, you might want to refer to this page for explanations of all the fields, and look up tables (files that say what the codes mean).

For this data set, you may choose to use the months we downloaded, or download your own (please specify what data you use). You can choose to use just 1 month, or you can pick multiple months to compare (if you want a real challenge).

Nationwide Crime Data

One of the functions of the Federal Bureau of Investigation (FBI) is to compile crime statistics within the US and use this information to help local law enforcement to curtail crime. Every year, the FBI releases this data along with recommendations for communities to stem violent crime. We have downloaded the 2014 year dataset (as well as 2015) of types of crime by area, available on Box.

If you use this dataset, we ask that you resist ranking cities/states or their law enforcement capabilities by their crime, as requested by the FBI. Showing trends and patterns should be your goal here.

Census Data By County

Note: this is aggregated census data – which is much less interesting than the IPUMS “raw” (or sampled) data.

You can get census data in all kinds of forms. This page has 4 spreadsheets. Any one of them could tell an interesting story – but you probably want to put together multiple files. The complication is that it’s a long list of counties (you might just pick some, or try to give a sense of the range of what is going on, or identify unusual things, or …). The files are also in the Box.

The files are:

Population Estimates – has data 2010-2015 (per year) with inflows and outflows. There is a separate sheet in the Excel file that explains the columns.
Education – has data from multiple years (1970, 1980, 1990, 2000, 2015) for different levels of educational attainment.
Unemployment – has data from many different years
Poverty Estimates – mainly 2015 data, explanations for the columns in a separate sheet.

Time Usage Survey

The American Time Usage Survey (ATUS) tracks how people spend their time. There are corresponding international versions. There are actually lots of different surveys with interesting data available from the IPUMS website.

Getting a data set requires picking from all the options. And you can probably pull together an interesting data set in many ways. I grabbed one from the site. I also checked that, despite the scary agreements I had to agree to, sharing it with a class is legal (see this), so I put a grab of how Americans’ time usage has changed over the years into the DataSets Box folder.

You can find out what the “time use codes” mean on this page.

Interpreting the other codes requires some digging, unfortunately. Some are self-explanatory, but others… I tracked down the “FAMINCOME” columns: explanation here. The state codes are here.

Detailed Census Data

You can get detailed census data (as in samples of specific people) from the IPUMS website. This data gets very huge very fast (you can get millions of people) and requires aggregation and clever ways to handle it efficiently (Tableau does surprisingly well).

We will probably use this data for another challenge, but it’s so big and interesting (both in terms of the number of individuals and the number of variables about everyone), that a little redundancy is not bad.

When you create a data set, you have to pick which census to sample (e.g., which years), and which variables you want. The tool will create huge CSV files (gigabytes). It also creates documentation files.

In the box folder, I have a big data grab I got (past 15 years, many variables) – there’s the CSV file and the documentation file. There is also a “reduced file” that I created with a processing script – I decoded some of the columns, and selected a subset of the years. Even this small set is millions of people!
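For a sense of what such a reduction step might look like, here is a minimal sketch that streams rows one at a time (so a multi-gigabyte CSV never has to fit in memory), keeps a subset of years and columns, and decodes one coded column. This is not the actual processing script; the column names, the code table, and the toy rows are assumptions for illustration, and the real codes should be taken from the documentation file.

```python
import csv
import io

# Assumed code table for illustration; confirm against the documentation file.
DECODERS = {"SEX": {"1": "Male", "2": "Female"}}

def reduce_rows(rows, keep_years, keep_columns, decoders=DECODERS):
    """Yield reduced rows one at a time: filter by year, select and decode columns."""
    for row in rows:
        if int(row["YEAR"]) not in keep_years:
            continue
        out = {name: row[name] for name in keep_columns}
        for name, table in decoders.items():
            if name in out:
                # Leave unknown codes untouched rather than guessing.
                out[name] = table.get(out[name], out[name])
        yield out

# Toy rows invented for this example; a real run would pass csv.DictReader(open(path)).
SAMPLE = """YEAR,SEX,AGE,EXTRA
2005,1,34,x
2010,2,51,y
2015,1,19,z
"""
reader = csv.DictReader(io.StringIO(SAMPLE))
reduced = list(reduce_rows(reader, keep_years={2010, 2015},
                           keep_columns=["YEAR", "SEX", "AGE"]))
print(reduced)
```

Because `reduce_rows` is a generator, the output can be written straight to a new file with `csv.DictWriter` instead of being collected into a list as done here.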

Basketball Players

This dataset is relatively small, but should be big enough to be interesting. It was used in the past for Alper (who was the TA) to demonstrate how to use Tableau and Excel for doing class projects. It’s in the Box.

Student Contributed Data Sets (from 2017 Spring)

Beijing Air Quality Data

2 Data Sets about Air Quality in Beijing, joined into a single cohesive table.

From the contributor:

The data comes from two sources:

Air quality data: http://www.stateair.net/web/historical/1/1.html (need to download each .csv separately)
Weather data: https://www.wunderground.com/history/airport/ZBAA (here’s a link to a .csv for 2011)

I first pulled the air quality data (where measurements are taken multiple times a day), and aggregated to be at the daily level. Then I merged the weather data to the air quality data. I have a GitHub repository with the data and R and Python code.
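The aggregate-then-merge step described above can be sketched roughly as follows. This is not the contributor’s code (that is in the repository); the field names and the readings are invented for illustration, and taking the daily mean is only one of several possible aggregation choices.

```python
from collections import defaultdict
from statistics import mean

# Invented readings for illustration: (timestamp, PM2.5 value).
air_quality = [
    ("2011-01-01 00:00", 100.0),
    ("2011-01-01 12:00", 200.0),
    ("2011-01-02 00:00", 50.0),
]
# Invented daily weather rows keyed by date; field name is hypothetical.
weather = {
    "2011-01-01": {"mean_temp_c": -3.0},
    "2011-01-03": {"mean_temp_c": 1.0},
}

def daily_mean(readings):
    """Collapse several readings per day into one mean value per day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp[:10]].append(value)  # 'YYYY-MM-DD' prefix is the day
    return {day: mean(values) for day, values in by_day.items()}

def left_merge(daily, weather_by_day):
    """Attach weather fields to each air quality day; keep days with no weather row."""
    merged = []
    for day in sorted(daily):
        row = {"date": day, "pm25_mean": daily[day]}
        row.update(weather_by_day.get(day, {}))
        merged.append(row)
    return merged

daily = daily_mean(air_quality)
table = left_merge(daily, weather)
print(table)
```

The merge here is a left join on the air quality days, so a day with weather but no air quality reading is dropped, and a day without weather simply lacks those fields.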

Note: the github repo not only has the documentation for the data, and the data conveniently processed into a CSV file, but it also has code for some basic visualizations. I can’t stop you from looking at the code. But, if you are not the author, you cannot turn in these visualizations.

UN Refugee Data

UN-Link: http://popstats.unhcr.org/en/asylum_seekers_monthly

Student suggested Data Sets from 2017 Fall

These data sets were approved in class. They all seem pretty interesting. They may require you to sign up for an account.

midus.wisc.edu – Midlife in the US. Large longitudinal “census like” data. note: this requires an account log in
insideairbnb.com – AirBNB properties and reviews. note: this will require you to join across files
https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset – info on 5000 movies with review information. this data may be noisy
https://www.yelp.com/dataset – a big dataset of yelp reviews and reviewers. it may require more interesting analysis to uncover numerical data worth making pictures of.

Old Datasets that you CANNOT USE

These data sets were suggested in old editions of the class (when we had undergrads as well). They are too simple/small to be interesting. But you can use them for practice.

Metropolitan Area Population Change

Note: this data set is small / easy. If you pick this one, the expectations for what you will need to do with it are much higher. I really dislike the vis on the census bureau website, you should do better (from the visualization, you can link to the data table). But the data is too small, and I’m not sure how many rich stories are to be found in it.

The Week in Vis: Week 3 (Sep 18-Sep 22)

by gleicherapi on September 16, 2017

Week 3 (Mon, Sep 18-Fri, Sep 22) – Abstractions

Mon, Sep 18 : Class: Abstraction
Wed, Sep 20 : Class: DC1 Intro, Tableau, ICE:Abstractions
Fri, Sep 22 : Optional Class (Workshop): DC1 Data
Reading: Week 3 – Abstractions
Discussion 3: Abstractions (first post due 09/19)
Seek and Find 3: Abstractions (due 09/22)
Design Challenge : DC1 – Data (due 09/24)

This past week, we looked at the broad question of “why use visualization,” talked about some historical examples (and Tufte), and did some practicing with critique.

This week, we’ll deal with the topic of abstraction. Which is really two topics: Data Abstractions and Task Abstractions.

On Monday, we’ll have a lecture on abstraction. We’ll also do another critique practice.

On Wednesday, we’ll have a mix of things. We’ll talk a little bit about the first Design Challenge (which should be assigned this weekend – keep your eyes open). We’ll talk a bit more about Tableau both because it may be useful for the first design challenge, but also because it makes use of some of the abstraction concepts (which is why it’s in a reading). And we’ll do an in-class exercise to make sure everyone has a good sense of data abstraction.

On Friday, there will be an optional class to talk about Design Challenge 1 – in particular, if you want to bring your own data set, this is your chance to get it “approved.” (this will make more sense once you see the rules for the assignment)

Learning Goals (for this week)

Understand the value of abstraction in visualization design
Have a sense of task abstraction, its value, and its limitations
Understand data abstraction, and what it means for creating appropriate (and transferable) designs.
Get some encoding basics and see how basic visualization designs come from attaching data variables to visual variables

Getting Tableau

by Mike Gleicher on September 14, 2017

Tableau has generously provided their software for students in this class. The details on obtaining it are on Canvas – note that this is only for students in class this semester (I’m using Canvas as the mechanism to only make it available to students).

Using Tableau is optional, but it is an interesting tool, it may be useful for some of the assignments, and you can learn a lot about visualizations by trying to do things with it (it embeds a lot of knowledge about how to make good visualizations and tries to guide you).

However: Tableau provides free licenses for students that ask. You can apply for a student license on their web page. This may be better than using the class license, since it may last longer (it’s not clear how long the class license will last after class).

The Week in Vis: Week 2 (Sep 11-Sep 15)

by gleicherapi on September 8, 2017

Mon, Sep 11 : Class: Why Visualization
Wed, Sep 13 : Class (ICE): Critique
Fri, Sep 15 : Optional Class: Meet & Greet
Reading: Week 2 – Why Visualize
Discussion 2: Why Visualize (first post due 09/12)
Seek and Find 2: Why Vis That? (due 09/15)

Last week, we got started with some introductory material and a simple design exercise. If you haven’t done last week’s readings & discussion and seek and find, please do those immediately! It’s important that you do them – it’s more than just making sure the class mechanics work.

This week, we’ll have a more regular schedule with class meetings on Monday and Wednesday, and a reading with discussion (first post due on Tuesday), and a seek and find (due on Friday).

The basic topic is to get some foundational perspective on visualization by asking the question “Why” (do visualization, does visualization work, …). We’ll also talk about critique and practice it a little.

Hopefully, all of the enrollment issues will settle out and we’ll figure out who’s in the class and who isn’t.

Learning Goals (for this week)

Appreciate the range of reasons why visualization is useful.
Get a perspective on key foundations (perceptual, cognitive, task effectiveness, …)
Get some historical perspective on visualization
Begin to think in terms of task
Understand what effective critique is and how to do it.

If you missed class today (Friday, 9/8)

by Mike Gleicher on September 8, 2017

Some people may have missed class because they didn’t enroll yet. Or didn’t pay attention to the web site that said that this week’s Friday class is not optional. Or, you just had something else to do.

But, if you missed class (and please, try not to miss class in the future)…

The slides from lecture are on Canvas.

The in-class exercise is from:
45 ways to communicate 2 numbers

Try to generate as many ideas as you can before looking at the 45 he provides.
Then try to see if you can categorize and describe the different designs. You should try to do the exercise on your own – you don’t need to send me your answers! You missed out on some creative designs from your classmates, and some discussion of what this tells us about how to think about visualization.

Also, be sure to read through the “How to Visualize” on the course web (since we’ll talk about the 4 step process), and the 4-design moves example.

Lecture Slides online

by Mike Gleicher on September 8, 2017

I am putting the slides from the lectures online. They are in a folder in the “Files” part of Canvas. Or you can follow this link.

The Week in Vis: Week 1 (Sep 4-Sep 8)

by gleicherapi on September 3, 2017

This is the first of the weekly “Week in Vis” posts – posts that I will make at the beginning of the week to give you a sense of what will be happening.

The main events are available on the Course Schedule (look for the green thing on the menu at the top of the page). But I’ll also repeat it (it’s the same content):

Mon, Sep 4 : No Class: Labor Day
Wed, Sep 6 : Class: Intro Day
Fri, Sep 8 : Class: Kinds & ICE 2 Numbers
Reading: Week 1 – What is Visualization
Discussion 1: Getting Started (first post due 09/07)
Seek and Find 1: Bring Me a Visualization! (due 09/08)

For this first week, we’ll get started by working out some class mechanics and learning a bit about what visualization is. We’ll have the usual weekly stuff (readings, discussions, seek and find) so you can get a sense of what these different assignment types are like (and we can work out the mechanics of using Canvas).

Lectures: Because of Labor Day, our first two lectures are Wednesday and Friday. On Wednesday, I’ll introduce my broad view of what visualization is, and the four step process I like to use in thinking about how to do it. On Friday, we’ll have an in class exercise to try the process out on a very simple visualization problem, and we’ll use that to discuss how we will think about visualization in the class.

Readings: Learning your way around the class web page (and Canvas) is one of the things you’ll need to read (which includes getting a sense of what I think Vis is, and how to do it). But there are some other foundational readings designed to get you started, and to introduce you to some of the characters we’ll be learning from.

Discussion: Each week, we will have an online discussion to give you a chance to think about the issues we’re learning, and re-enforce them by discussing them with classmates. For this week, the main thing is to get the mechanics down for having in-class discussions. See this post about discussion assignments in general.

Seek and Find: Each week we will have a Seek and Find assignment to give you a chance to find a visualization in order to see how the ideas we discuss in class end up as practical things in the world. This first one is good practice to make sure you know the rules. See this post for general information about seek and finds.

Learning Goals: Normally, I’ll put the learning goals for the week in the Week in Vis post. But for this week, I’ll point you at the big long list of the whole semester’s goals since there’s a lot of other stuff (before the Week 1 specific goals) that you should look at.

The First Day of Class

by Mike Gleicher on September 1, 2017

The first class meeting is Wednesday, September 6th, at 11am, in room 3024 Engineering.

Right now the class is full. If you are not enrolled, please do not come to class on the first day. We have as many people enrolled as there are chairs in the room. Once we get a sense of how many people there are, we will let more people in off of the waiting list.

A warning that rooms in Engineering can be hard to find. Room 3024 is in the part near the corner of Randall and Engineering Drive (it’s the part closest to Union South). I find it’s easiest to go in the side door (along Randall), and go up to the third floor – the room is at the end of the hall.

Posting Images to Canvas

by Mike Gleicher on August 23, 2017

Adding Images to Your Canvas Posts

by Alper Sarikaya on January 6, 2017

For some assignments (notably the Seek and Find assignments), we may ask you to post an image alongside your posting. You are free to use an external provider, but to make sure that images persist, we ask you to upload your image through Canvas.

In the class canvas, navigate to the sidebar and click “Account”. In the expanded tab, click “Files”.

Make sure you’re in your own folder, then upload an image. You might consider naming it something descriptive so you can find it later in the semester!

Meanwhile, to use the image in a posting, click the “Embed Image” icon (shown below) to embed the uploaded image in your post.

Under the dialog box that pops up, click the “Canvas” tab, navigate the folder structure to your uploaded image, and select your image. Try to select dimensions smaller than 1000px on the longest edge.

Note: please set the dimensions of your picture (in the attribute thing at the bottom) to be reasonable (no more than 600 or so pixels wide or tall) – if you make your embedded picture too big, it messes things up for everyone else who tries to look at it.

ARCHIVED Course Web for the Fall 2017 edition of CS765 Data Visualization
