12/6/ – sample data/code enhanced with more examples and better numbering
11/30 – added link to GitHub with sample data
11/30 – Canvas hand-in links added
The Basic Idea
The purpose of this assignment is to design visualizations that help a user compare the communication networks of groups.
For a group of people, we count the number of messages sent by each person to each other person. For every pair of people, we have the number of messages in each direction. This data is a directed graph, with links between every pair of people in both directions, and weights (counts) for each edge. Examining this data for a group, you can find patterns that tell you about how the group operates. Is the group hierarchical? Is it split into smaller subgroups? Are there people who are more or less connected?
Examining one group to determine the patterns is a graph visualization problem. Which can be tricky unto itself. But, for your assignment, you need to consider how to compare between several (two or more) groups.
Schedule
- Due Sunday, December 3rd – Initial Check In. Put something into the Canvas type-in box to let us know that you’re actually working on the assignment, and if you’re working with a partner.
- Due Sunday, December 10th – Rough Draft. Upload something (to Canvas) to give us an idea of what you plan to turn in and what you’re working on. If you turn this in on time, we will provide you with some feedback. If you’re working with a partner, one partner should turn in the files, and the other should turn in a note reminding us who your partner is.
- Due Wednesday, December 13th – official deadline. No cost extension until Sunday, December 17th (the official deadline). Note: we may not be able to accept assignments past this date. Please turn in your writeup as a PDF, and any code/data that you generated. Please include pictures created with your implementation (if you made one) and enough of a description in case we aren’t able to run your program. If you’re working with a partner, one partner should turn in the files, and the other should turn in a note reminding us who your partner is.
What You Need To Do
You may work with a partner. If you work with the partner, the requirements are the same as if you’re working by yourself. We may have slightly higher expectations of quality for pairs. Both people in a pair will get the same grade.
For this assignment you must:
- Provide a list of tasks – being specific about which ones you are going to focus on for your designs. You will need to provide a list of at least 5 tasks.
- Provide at least 2 designs, with full descriptions and rationales. Be sure to explain how the designs were created to address particular tasks.
- Provide an evaluation that compares the designs you created to the baseline designs for at least 2 tasks.
- (optional) You can provide an implementation of one or more designs that allows you to show of how your design works on varying data. If you turn in an implementation, the expectations for other parts are reduced (discussed below).
Types of Tasks
You can define your own tasks (and you should specify the tasks you are considering). But roughly, your tasks are to compare the networks. In general, you should focus on trying to help make comparisons between the patterns of communications in the network (e.g., is the group hierarchical, is it well-connected), rather than simple measurements of the messages (e.g., which group sends more messages).
The line between a simple task (e.g., which group sent more messages) and a complex one (e.g., do the groups have similar structure) is a fuzzy one. To get a good grade, focus on more complex tasks.
Hardness of Problem
The problem gets harder as you have more networks to compare.
- How big are the networks (how many people in each group)?
- How many groups to compare?
- How complex are the patterns that you are comparing?
- Note: you cannot assume the groups you are comparing are the same size!
You can choose the scale of problem that you want to address. If you pick something that is easier (e.g., comparing two groups of the same small size), you’ll need to make up for it in some other way.
A reasonable expectation is to handle a few (3-6) groups of 6-12 people. Handling larger groups (20-30) is good. Handling very large groups (100 or more) becomes a different (and seemingly harder) problem because you need to find ways to summarize the patterns.
Data
We’ll provide some example data sets as CSV files. Some examples are in a GitHub Repo. There is Python code that reads and writes the files, as well as a directory of examples written with this code (in case you don’t want to run it yourself).
Rows with no commas mark the beginning of a new group. The first token (ending with a space) is the number of people in the group. The rest of the line is a “name” for the group.
Rows with commas provide the outgoing data for a person, in the order of people listed (as the rows). People will always have 0 for their “self-messages”.
For example:
3 Group One
Alice,0,5,10
Bob,6,0,9
Carol,7,8,0
2 Group Two
Debbie, 0, 3
Elaine, 4,0
In this data file, there are two groups (separate networks to compare). The first group has three people (Alice, Bob, and Carol). Alice sent 5 messages to Bob, Carol sent 7 messages to Alice.
It is OK for you to place limitations on the number and size of the networks your implementation considers.
Baseline Design
As a baseline, consider the simplest design I can think of: showing the matrix for each of the networks. You could use color encoding to help make assessing the values a little faster (I use dark for the “max” value, and white for no messages). In the baseline, the people have a fixed order (for example, alphabetical order), and I show the different groups/networks “juxtaposed” (next to each other).
You need to come up with at least two different designs (beyond the baselines). Taking the baselines and tweaking it a little (changing the color, or even using a different encoding for each cell is not really “different”).
Re-ordering the matrix (especially if you come up with clever re-orderings to address tasks) is a different design: especially if you implement a dynamically re-orderable matrix. Coming up with re-orderings only counts as one design. And you need at least two.
A second baseline design is to create a node-line diagram with a force directed layout. Use the number of messages between a pair of people as the “strength” of the connection, and use the force-directed layout to make the strong connections have short links.
Evaluation
You need to evaluate your designs, considering the tasks that you focus on.
Your evaluation can be a critique of the designs – try to give the relative strengths and weaknesses of different designs. You should have at least 4 (the (at least) two you came up with, and the baseline designs. Note: even if you only did a “real” implementation of one design, you should at least sketch a second design to compare against. Your designs should be different enough that they have different pros and cons.
Comparing against the baseline designs is a good way to show that you’ve solved the right problem.
Your evaluation should consider several tasks.
If you build an implementation…
You may build your implementation in any programming language and environment that you like. However, you should assume that we will not be able to run it. Therefore, be sure to include enough pictures of what it looks like (for different data sets), and enough description of what it’s like to use (especially if its interactive). Be sure to explain what tools you used for implementation, and give us instructions on how to tun it in case we decide to try.
If you do an implementation, you still must do all of the other parts of the assignment (task list, 2 designs with rationale, evaluation). However, the implementation gives you another way to excel. If you have a more minimal set of tasks, or your evaluation isn’t as thorough, you can make up for it with the implementation. In fact, showing how your implemented design(s) to address the task on different data sets is a good way to assess your design (you should still discuss the comparison with the baselines).
If you do not build an implementation…
The expectations for the other parts will be much greater. We’ll have greater expectations for your task list and analysis – both in terms of length and depth of description. We’ll expect more from your designs – in terms of number (you may want to have more than the minimum), and how well described they are (since we can’t see the implementation to figure it out). Give sketches of what it would look like for different kinds of data.
Your evaluation will have to be more about reasoning about the different designs – since you can’t provide real examples.
Analytic Measures
One way to compare graphs is to compute some “analytic measure” of them (a number that summarizes something) and compare that. For example, you can compare the total number of messages sent. There are fancier measures for more complex things, and I’m sure you could make up an “analysis measure” that measured a complex concept like “hierarchicalness” or “cliquiness.”
Using analytic measures may be a useful tool in this assignment, but the spirit of the assignment is to create visual representations. Consider: even if you had a measure of “hierarchicalness” that said that one group was more hierarchical than the other, you would need to show this structure to the viewer so they would trust/understand it.
Performing graph analysis (e.g., finding connected components or node centralities) can play a useful role in developing good visualizations. It’s not necessarily required.
Some Hints
This assignment is hard: it’s a hard data type, and the comparison nature of it makes it more challenging. I am aware of one research paper on a very related problem. But it’s for binary comparison of a different kind of weighted graph (it’s for brain connectivity). The tasks are somewhat different, they assume there is correspondences in the nodes, and that there are the same nodes in each graph. I’m not sure if this paper is helpful or not.
Alper, B., Bach, B., Henry Riche, N., Isenberg, T., & Fekete, J.-D. (2013). Weighted graph comparison techniques for brain connectivity analysis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems – CHI ’13 (p. 483). New York, New York, USA: ACM Press. http://doi.org/10.1145/2470654.2470724 (free PDF in French Archive)
I think my strategy would be to come up with tasks by thinking about the domain problem. First think about what things you might want to see in the data for a single group, and then think about what it might mean to compare them.
One challenge in this comparison is that we don’t have correspondences. The groups have different people, and possibly different numbers of them. However, you might be able to define correspondences that will allow you to show groups in similar arrangements that will make the similarities clear.
There are many different ways to show graphs. Think about which ones are most amenable to comparison.