Dc2 Phase1 Feedback

November 19, 2021 (Last Modified: November 18, 2024)

I have given people feedback on Design Challenge 2: A Visualization Project (Hard Vis Problems) (phase-1) (assuming that you turned it in). It was done quickly and is brief, but hopefully, it can give you some guidance. Things can be discussed in more detail in office hours or after class.

The grades on this probably don’t matter - if you are putting good work into the project now, it will show up in later phases. In general, I gave everyone a 6. I was going to reserve this for assignments that actually convinced me that the student thought about things, but I was more generous - to not get a 6, you needed to blatantly not cover the prompts. A few notable proposals were given higher scores.

More important are the comments. Rather than typing the comments for each person, I put them on a list and gave you the code (you can see them below). Sometimes seeing the kinds of comments that others received can be useful.

Here are comments that at least one assignment earned. I was not comprehensive: a comment may apply to you even if I didn’t put it down. But seeing comments to others might give you some help thinking about the problems.

Each code has a group. Two groups apply to all assignments (Missing and All); the other groups are based on the problem you chose (Subgroup, Dimensionality Reduction, Tiny Charts).

Missing (or at least seems to be) (M) - used as a reason not to give a score of 6:

1. Which problem
2. Identify data
3. Tasks
4. Problems
5. Specificity of problem

The prompt did require you to describe a solution. But #4 implied that you need to show you’ve thought about the general problem enough to identify more specific things to work on, which comes up as #5. (#1 is trivial, because you had to pick the check box.)

Applies to All Problems (A):

Gives an actual example that really shows problems.
I didn’t take the time to fully understand a proposed solution, but it is nice to see that you have thought enough to have (at least) an initial idea.
With a design, be sure to connect it to the tasks that it will serve.
Good list of many tasks.
Deep exploration of example data.
Task definitions use generic terms like “look at” or “interact”, which can be a sign of a lack of specificity.
Minimally covered the requirements - doesn’t show much thinking about the problem (or how it might be solved).
Unclear what the initial exploration shows beyond that you are at least looking at an example (which is a start!)
Concrete examples of example problems.
Provides thinking about baselines as a starting point.
Brainstorming lots of ideas is great! (although I didn’t look at the ideas too carefully)
Confusing list of high-level problems (poor use of terminology?).
Not specific enough that I can really figure out what problem you are intending to solve.

Subgroup (S):

Distinguish looking for outlier items (or an item with missing data) from groups that are empty (or otherwise outliers). There might be a connection. But this should not be about examining individual items.
The “dig into groups that are big and interesting enough to see how they divide” approach is a strategy, but consider how the viewer will know where to explore further, and maybe preview what they would find so they don’t spend effort on dead ends.
Think about tasks beyond 2 variables (since baseline designs do pretty well for 2 variables). That isn’t to say there isn’t room to do cool things with 2 variables, but think about going beyond.
Exploring what happens with standard tools (like Tableau) is a good start - but hopefully it shows the problems you need to get to.
A large number of variables with small numbers of categories can be hard. Few variables with large numbers of categories are differently hard.
If your focus is on filtering to a small number of groups, consider how to help the user choose those groups.
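
To make the “empty groups” and “many variables” points above concrete, here is a minimal sketch that counts the rows in every combination of category values, including the combinations that never occur. The variable names and rows are made up for illustration; a real data set would be much larger.

```python
from collections import Counter
from itertools import product

def subgroup_sizes(rows, variables):
    """Count rows in every possible combination of category values, including empty ones."""
    # Categories actually observed for each variable.
    levels = [sorted({row[v] for row in rows}) for v in variables]
    # How many rows fall into each observed combination.
    counts = Counter(tuple(row[v] for v in variables) for row in rows)
    # Every possible combination, with 0 for the ones that never occur.
    return {combo: counts.get(combo, 0) for combo in product(*levels)}

# Hypothetical toy data: three variables with two categories each.
rows = [
    {"region": "north", "plan": "basic", "tenure": "new"},
    {"region": "north", "plan": "basic", "tenure": "old"},
    {"region": "south", "plan": "pro", "tenure": "new"},
    {"region": "south", "plan": "basic", "tenure": "new"},
]
sizes = subgroup_sizes(rows, ["region", "plan", "tenure"])
empty = [combo for combo, n in sizes.items() if n == 0]
```

Even here, three variables with two categories each give eight possible groups, and half of them are empty. The number of groups is the product of the category counts, so it grows quickly whether you add variables or add categories.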

Dimensionality Reduction (D):

The recommended problem is comparing a high-dimensional data set with its reduction. I have been discouraging people from other problems (e.g., understanding an HD data set, comparing 2 HD data sets, …). I am open to other problems - but they may not be as amenable to visualization solutions.
One issue is that the HD data set may not be “right” to begin with. This often happens with text or image embeddings: things may be incorrectly placed near or far in the high dimensional space. So, for example, if a document embedding places two very different documents close together in HD document space, a “correct” DR would place these close together in 2D - the DR method can’t say “I think the HD data is wrong so I know to ignore it in this case”.
Making an improved DR method is not a good project idea (unless it has a specific vis connection).
Identifying bias in word embeddings is an important embedding design challenge - it is a different problem than understanding the DR of a word embedding. That doesn’t mean that there aren’t ways to think about DR approaches. There is also literature on the problem.
The problem of “are the dimensions in low-D meaningful” is the responsibility of the DR algorithm itself.
There is a special case of the problem where the high-D dimensions are actually interpretable (this is different than most of the examples we talked about). Your proposal implies this kind of data - which is good, and different. The tasks (and solutions) are likely to be different than the case where the high-D data are embeddings.
The problem of bringing user knowledge into dimensionality reduction (user-guided DR) is actually a common topic in Visualization - but it is different than the focus of this project (which is about interpreting existing DR - see #3 in this list).
Trying to devise new metrics of DR quality isn’t a Vis problem by itself - but could be part of a Vis solution. There is a literature on metrics, and their application for things like cluster assessment (and even visual cluster assessment).
Tasks for using an embedding are not necessarily directly connected with the connection between an embedding and its reduction.
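
As one possible starting point for “comparing a high-dimensional data set with its reduction”, here is a minimal sketch of a per-point neighborhood check: for each point, what fraction of its k nearest neighbors in high-D are still among its k nearest neighbors in the 2D layout? This is only one of many measures in the literature, the function names are my own, and the toy points are made up. A number like this is an input to a visualization, not a visualization by itself.

```python
import math

def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbors (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in dists[:k]})
    return result

def neighbor_preservation(high_d, low_d, k):
    """Per-point fraction of high-D k-nearest neighbors that are also k-nearest in low-D."""
    hd = knn(high_d, k)
    ld = knn(low_d, k)
    return [len(h & l) / k for h, l in zip(hd, ld)]

# Hypothetical toy data: four 3-D points and two candidate 2-D layouts of them.
high_d = [(0, 0, 0), (0, 0, 1), (5, 5, 5), (5, 5, 6)]
good_layout = [(0, 0), (0, 1), (4, 4), (4, 5)]   # keeps each pair together
bad_layout = [(0, 0), (4, 4), (0, 1), (4, 5)]    # splits each pair apart

good_scores = neighbor_preservation(high_d, good_layout, k=1)
bad_scores = neighbor_preservation(high_d, bad_layout, k=1)
```

Note that this only measures agreement with the high-D data. If the high-D data is itself “wrong” (as in the embedding example above), a high score just means the reduction faithfully reproduces the mistake.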

Tiny Charts (T):

Be clear which chart type you are working on. (If you plan to do more than one, I recommend doing one first.)
It is OK to change the chart type up until Phase 2, but after you submit
It is good to show awareness (in the proposal stage) of what kinds of problems come up as things get small. Then you can try to devise methods that suggest this is happening.
It could be that legends have to be dealt with separately. You might choose to just omit them at first (and bring them back if you have time)
“Text is too small” is a bit simplistic. Axes and labels definitely require some thought around adaptation (although, focusing on the content of the chart might be a better start)
Multiple X charts (e.g., a set of pies or bar charts) is a different (in a good way) problem than a single one. But be clear whether the problem is to reduce the ensemble, or to reduce the individual charts to make an ensemble.
The general problem of changing the scale of maps is called “generalization” by cartographers, and it is quite hard. But there is a lot of literature (and hundreds of years of historical examples).
Your problem seems to be a scatterplot - for which there is a rich literature on how to deal with density issues (which can be the dual of the size problem - the ratio of points to space is the problem).
Good thinking about the range of problems that come up.
Not clear why the starting example was chosen. Is there an assumption that the larger chart is effective?
With bar charts… if the data is a histogram, then it can make sense to re-bin (this may or may not be a good idea). If the data is not a histogram, changing the set of bars might require different strategies.
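
For the histogram case in the last comment, re-binning can be as simple as merging adjacent bins. Here is a minimal sketch, assuming equal-width bins over an ordered variable; the function name and the counts are made up for illustration.

```python
def rebin(counts, factor):
    """Merge each run of `factor` adjacent histogram bins by summing their counts.

    If len(counts) is not a multiple of `factor`, the last (partial) run
    is kept as its own, narrower bin.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

counts = [1, 4, 9, 12, 7, 3, 2, 1]  # hypothetical counts for 8 equal-width bins
small = rebin(counts, 2)            # 4 wider bins for a smaller chart
```

This only makes sense because neighboring histogram bins are adjacent ranges of the same variable, so their counts can be added. If the bars are unordered categories, there is no natural “adjacent” bar to merge with, which is why that case needs different strategies.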

Archive of the Fall 2021 Class

This web page is from the Fall 2021 CS765 (Data Visualization) class.
