Theme 1: Group Populations

In this theme, your objective is to create visualizations that show the distribution of the samples across groupings formed by different variables.

With a large number of variables, we can divide up our total population in many ways. We can pick any variable, or pair of variables, or even more (unemployed people in Wisconsin who have 3 kids and were surveyed on Tuesday). With any variable, there might be many ways to form groups or bins. If it's a numerical variable, you can divide it into ranges in different ways; if it's a discrete variable, you may group its values in different ways.

A problem arises when we start dividing things up into groups: at some point the groups become so specific that we won’t have enough samples to say anything meaningful about a group - for example, to make comparisons. If a group has only a small number of samples (or none), it may not be an interesting grouping to look at.

In this project theme, your job is to design a tool that helps a user explore ways to divide the population into groups such that the groups have enough samples in them (for some user-defined idea of “enough”).
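As a starting point for the "enough samples" check, here is a minimal sketch in Python. It assumes each sample is a dictionary keyed by variable name; the function names `group_sizes` and `undersized_groups`, the sample records, and the `min_samples` threshold are all hypothetical and only illustrate the idea. Combinations of values that never occur together are reported with a count of zero, since empty groups matter as much as small ones.

```python
from collections import Counter
from itertools import product


def group_sizes(records, variables):
    """Count the samples in every group formed by the given variables.

    Every combination of observed values is listed, so groups with
    no samples appear with a count of 0.
    """
    counts = Counter(tuple(r[v] for v in variables) for r in records)
    levels = [sorted({r[v] for r in records}) for v in variables]
    return {combo: counts.get(combo, 0) for combo in product(*levels)}


def undersized_groups(records, variables, min_samples):
    """Return the groups that hold fewer than min_samples samples."""
    sizes = group_sizes(records, variables)
    return {group: n for group, n in sizes.items() if n < min_samples}


# Hypothetical sample data, only for illustration.
records = [
    {"state": "WI", "employed": "yes"},
    {"state": "WI", "employed": "yes"},
    {"state": "WI", "employed": "no"},
    {"state": "MN", "employed": "yes"},
    {"state": "MN", "employed": "yes"},
    {"state": "MN", "employed": "yes"},
]

for group, n in undersized_groups(records, ["state", "employed"], 2).items():
    print(group, n)
```

A tool built on this kind of count could highlight the undersized groups and let the user decide whether to merge them or raise or lower the threshold.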

For a single variable, we could show a histogram that indicates how many samples are in each group. For two variables, we could make a 2D table (maybe encoding each cell with a color or …). Beyond two variables, it gets hard.
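The counts behind such a histogram or 2D table could be computed roughly as follows. This is a sketch under assumptions: samples are dictionaries, the numerical variable is binned by a list of cut points chosen by the user, and the names `histogram`, `count_table`, and the example fields `age` and `employed` are hypothetical. How the counts are drawn (bars, colored cells, or something else) is left open.

```python
from bisect import bisect_right


def histogram(values, cuts):
    """Count values per bin, where cuts are the sorted interior cut points.

    With cuts [30, 50] the bins are: below 30, 30 up to (not including) 50,
    and 50 or more.
    """
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return counts


def count_table(records, row_var, col_var, cuts):
    """Build a 2D table of counts.

    Rows are bins of the numerical variable row_var; columns are the
    observed categories of col_var, in sorted order.
    """
    cols = sorted({r[col_var] for r in records})
    table = [[0] * len(cols) for _ in range(len(cuts) + 1)]
    for r in records:
        row = bisect_right(cuts, r[row_var])
        table[row][cols.index(r[col_var])] += 1
    return cols, table


# Hypothetical sample data, only for illustration.
people = [
    {"age": 22, "employed": "yes"},
    {"age": 30, "employed": "no"},
    {"age": 35, "employed": "yes"},
    {"age": 49, "employed": "yes"},
    {"age": 50, "employed": "no"},
    {"age": 67, "employed": "no"},
]

print(histogram([p["age"] for p in people], [30, 50]))
cols, table = count_table(people, "age", "employed", [30, 50])
print(cols)
for row in table:
    print(row)
```

Changing the cut points and recomputing is cheap, which is what makes interactive adjustment of the bins plausible.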

Even with a single variable there are many choices - you might change the way the bins are divided. When there are multiple variables, each could have its grouping adjusted individually, or together (on weekdays, group unemployed and employed people together; on weekends, split employed and unemployed; so there are 3 total groups).
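The weekday/weekend example can be written as a grouping rule that looks at two variables at once. The sketch below assumes hypothetical fields `day` (a three-letter day name) and `employed` (a boolean); the labels and the function names are likewise only illustrative.

```python
from collections import Counter

WEEKEND = {"Sat", "Sun"}


def group_label(record):
    """Assign a sample to one of three groups.

    Weekday samples form a single group regardless of employment;
    weekend samples are split by employment status.
    """
    if record["day"] in WEEKEND:
        return "weekend, employed" if record["employed"] else "weekend, unemployed"
    return "weekday"


def group_counts(records):
    """Count the samples that fall into each group."""
    return Counter(group_label(r) for r in records)


# Hypothetical sample data, only for illustration.
surveys = [
    {"day": "Mon", "employed": True},
    {"day": "Tue", "employed": False},
    {"day": "Wed", "employed": True},
    {"day": "Sat", "employed": True},
    {"day": "Sat", "employed": False},
    {"day": "Sun", "employed": False},
]

print(group_counts(surveys))
```

A grouping like this cannot be expressed as independent bins on each variable, which is part of why the design space is large.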

Note that there are many potential tasks: a user might have some division in mind and need to check whether there are sufficient samples in each group, or need help adjusting it (e.g., combining groups) to get appropriate sizes; a user might be exploring different ways to divide things up and want to see how the population distributes across multiple dimensions; a user might be seeking even splits to create different groupings.

Note that automation may be a useful tool, but it must be coupled with visual tools to make it “interesting” in terms of this assignment. For example, you might use adaptive histogramming (create bins such that each bin has the same number of samples), but then couple this with tools to see how the samples in each bin are distributed in other dimensions.
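One simple way to approximate adaptive histogramming is to place the cut points at evenly spaced positions in the sorted values. The sketch below does that and then breaks each bin down by a second variable. It is only one possible approach, and the names `equal_frequency_cuts`, `breakdown_by_bin`, and the example fields are hypothetical. When many samples share the same value, the bins can end up uneven or even empty, which a real tool would need to handle.

```python
from bisect import bisect_right
from collections import Counter


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points so bins hold roughly equal numbers of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def breakdown_by_bin(records, num_var, cat_var, n_bins):
    """Bin records on num_var, then count the values of cat_var inside each bin."""
    cuts = equal_frequency_cuts([r[num_var] for r in records], n_bins)
    breakdown = [Counter() for _ in range(n_bins)]
    for r in records:
        breakdown[bisect_right(cuts, r[num_var])][r[cat_var]] += 1
    return cuts, breakdown


# Hypothetical sample data, only for illustration.
samples = [{"hours": h, "employed": h >= 7} for h in range(1, 13)]

cuts, breakdown = breakdown_by_bin(samples, "hours", "employed", 3)
print(cuts)
for counts in breakdown:
    print(dict(counts))
```

The per-bin breakdown is the part that needs a visual design: each bin has the same total, so the interesting information is how the second variable varies from bin to bin.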

It might be possible to do something for this project without interaction. For example, you might come up with a good design that, in static form, allows a viewer to quickly assess the 5-dimensional histogram that arises from 5 variables. If you do this, you should give multiple examples and describe how the design might generalize. (Using the 5-variable example, you might pick a few different sets of 5 variables to show, as well as describe how it might work on other sets of 5.) Static displays of 3 or fewer variables are unlikely to be interesting, as there are standard solutions for them.