(phase 1 deadline: originally Feb 10, now Feb 12 – although, there is a big reading assignment due Monday)
The objective of this assignment is to get you to try to figure out ways to present some data.
I am intentionally doing this before you’ve learned too many tricks: we’ll come back to this problem once you’ve learned more. This assignment will also serve to give us something to think about when we get to critiques.
Note that I am intentionally not giving you the data. Yes, you can go out and get it yourself. But we want to discourage that at this point. We really want you to explore designs by sketching – not get caught up in implementing.
Picking an appropriate problem for this task is hard, but I wanted everyone to try to have a common problem to work on. There are various reasons why I think this is actually a pretty decent domain/problem.
The initial deadline is Friday, February 10th. For this deadline, each person in class must come up with at least one design. For this design, you should have a sketch (or series of sketches) that show off what it would look like, and how it would work. You can create your sketches however you like (pencil and paper, photographs of whiteboards, things drawn in some computer program, …). For each design, you should give a brief description of what it should look like, and a description of the kinds of questions that it might be good at answering.
For the initial deadline, try to work independently: don’t look at others’ responses until you’ve uploaded yours. Try to come up with something novel and different: don’t just recreate the sample designs. If you think a standard approach applies, be sure to explain why you think it would work and what kinds of questions it may be good at answering.
In upcoming assignments and class activities, we’ll look at each other’s designs. We’ll critique and evolve them, try to cluster and classify them, and try to figure out what it might take to implement them.
The best way to come up with a good design is often to come up with lots of designs. I encourage you to invent many designs. You need to post at least one to the discussion, but there really isn’t a hard limit. I’d recommend that you post your one or two favorites, and then see what everyone else posts – and after that only post additional ones that are unlike anything else you’ve seen.
Details below, but the initial phases:
- Sunday, February 12th (originally Friday, Feb 10) – Initial Designs Posted
- Monday, Feb 13 and Weds, Feb 15 – Design discussions over the web and in class
- beyond that: TBD
I should note that this is fairly unconstrained – on purpose. I want to see what people come up with. I think that the range of people in the class will lead to an interesting range of ideas – which will at least lead to interesting conversation about design, if not some actual great solutions.
The Data
As mentioned in class, the data set/problem comes from the Google books words dataset. You can learn more about the overall project from this page. A lot has been done with this dataset, and the web is full of discussion of it. I recommend you poke around a bit to learn more – especially if you’re stumped for the kinds of questions you might ask.
For now, we’ll focus on the single word count data (1-grams). If it matters, I prefer to think of the data aggregated by decade.
So, the basic data we have is a set of entries:
- word, year, count, number of pages, number of books
So for each word, in each year, we know how many times it appeared, and on how many pages and in how many books it appeared.
Warning: the total number of books scanned in each year varies widely, so you can’t compare directly across years. Google normalizes by dividing word counts by the total number of words in the year. I normalized by ranking the words. I have no strong opinion on whether one is better than the other (to be honest, I did ranking originally since I wasn’t sure how to easily get the total counts to normalize by). There are actually other ways to normalize as well.
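To make the two normalizations concrete, here is a minimal Python sketch on a few made-up entries. The counts, the `Entry` tuple, and the function names are all hypothetical illustrations, not the real dataset or anything Google provides. In particular, the per-year total here is just the sum over the toy entries, and ties in rank are not handled specially.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    year: int
    count: int


# Made-up counts for illustration only (not real Google Books data).
ENTRIES = [
    Entry("the", 1900, 500), Entry("you", 1900, 100), Entry("whale", 1900, 20),
    Entry("the", 2000, 5000), Entry("you", 2000, 2500), Entry("whale", 2000, 50),
]


def relative_frequency(entries):
    """Divide each count by the total word count of its year."""
    totals = defaultdict(int)
    for e in entries:
        totals[e.year] += e.count
    return {(e.word, e.year): e.count / totals[e.year] for e in entries}


def rank_by_year(entries):
    """Rank words within each year; rank 1 is the most frequent word."""
    by_year = defaultdict(list)
    for e in entries:
        by_year[e.year].append(e)
    ranks = {}
    for year, group in by_year.items():
        ordered = sorted(group, key=lambda e: e.count, reverse=True)
        for position, e in enumerate(ordered, start=1):
            ranks[(e.word, year)] = position
    return ranks
```

In this toy data the raw count of “you” grows 25-fold between the two years, but its share of the year’s words only roughly doubles (about 0.16 to about 0.33), and its rank stays at 2. The two normalizations can tell quite different stories about the same word.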
For now, you have 2 example visualizations to look at:
- The Google Ngrams page at http://books.google.com/ngrams gives you one visualization. It lets you look at a few words at a time.
- I tried a different experiment to look at the top-N words in each decade, but it breaks down at scale. (my experiment)
Each of these has its good points and bad points. They are good for some things, and not for others. Different (I hesitate to use the word better) designs may be applicable for different tasks, or differently effective on the tasks these are good for.
I should add that the data before about 1700 is highly problematic for a wide variety of reasons. (not enough books, poor scanning, …)
It might simplify things for you to just decide you are going to think about rank data. And/or just focus on a particular number of words (say the top-1000 in each decade).
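If it helps to picture what “the top-N words in each decade” means as data, here is a small Python sketch. The rows are made up, the helper names are hypothetical, and summing yearly counts within a decade is just one assumed way to aggregate. You do not need to run anything like this for the assignment; it is only meant to pin down the shape of the data you are designing for.

```python
from collections import Counter, defaultdict

# Made-up (word, year, count) rows, for illustration only.
ROWS = [
    ("the", 1901, 300), ("the", 1907, 200), ("you", 1903, 100),
    ("whale", 1905, 20), ("the", 1912, 400), ("you", 1915, 450),
]


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1907 -> 1900."""
    return year - year % 10


def top_words_by_decade(rows, n):
    """Sum counts per decade, then keep the n most frequent words in each."""
    totals = defaultdict(Counter)
    for word, year, count in rows:
        totals[decade_of(year)][word] += count
    return {
        decade: [word for word, _ in counter.most_common(n)]
        for decade, counter in totals.items()
    }
```

Calling `top_words_by_decade(ROWS, 2)` on these toy rows gives an ordered list of two words per decade, which is the kind of rank data a design would then need to display.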
The Challenge
There are a lot of different kinds of questions you might ask for this kind of data. One thing I’ve noticed by giving this data to people with various tools to explore it (yes, I have more than I’ve told you about so far), is that we’re all closet historical linguists. The questions that real domain scholars have may be more plentiful, but just about everyone has something they wonder about. This is especially true if you think of the book set as representative of writing (i.e. that word frequency may be related to the range of topics people are interested in).
Some examples of questions:
- What words have interesting evolutions?
- Are there events that cause words to change?
- Can we find spelling changes? or typography changes? – at some point in time, people started using “v” instead of “u” in words like “have” – when did this happen? was it gradual?
- Do some words go in and out of favor over time?
- Can we connect changes in word usage to world events?
- Can we find pairs/sets of words that move in coordinated ways?
- Can we find unusual patterns worth further exploration? (in my visualization, I notice that the word “you” has really been on the rise over the past 40 years – what could that say about writing/society?)
Notice that there is a wide range of questions. Some may be answered by tools that are good at looking at single words in detail (the Google line graph). Some may be answered by tools that show groups of words (like mine). Part of this assignment is for you to be creative in thinking up questions that lead you to design ideas.
The Google design is good for looking at a few words when you know what words you are interested in.
My design can show a larger number of words at once. (I won’t argue it’s good for anything – but some people like it)
I am particularly interested in finding designs that can look at lots of words, and help with questions that involve figuring out what words might be interesting to look more closely at. But I am mainly interested in this because I haven’t seen any really great displays that scale to lots of words. I am asking you to come up with designs. It’ll be great if you come up with ones that scale well.
Your challenge is to find interesting ways to present this data. As you’ve probably learned, there is a strong connection between “interesting design” and “kinds of things you want to show” – which is why I bring up the idea of questions. In particular, I’d like you to think about the kinds of questions that the existing/obvious solutions aren’t good at helping with. You aren’t necessarily designing visualizations for specific questions: but it’s useful to have questions in mind in trying to motivate new designs.
I am also being vague about the potential users. With one exception, you probably aren’t a domain expert and don’t have access to domain experts. However, in this domain, we can all be “interested amateurs.” You may prefer to target your ideas towards interested amateurs, guess at what a domain specialist might care about (please state those assumptions) – or even better, have a discussion on Piazza (not on the thread for posting the designs, but a separate discussion thread).
In fact, I would encourage some discussion to try and see what kinds of information about the questions, the domain, the users, …
I would like to encourage novelty, with the caveat that what I really want to encourage is innovative ideas that solve the problem. Novelty for novelty’s sake isn’t so great (although, it can inspire other more useful ideas). A good perspective on how to adapt something standard might be really valuable.
I am also aware that the nature of the assignment (sketching) may bias the kinds of designs we see (ones that are easy to sketch). Figuring out how to show things that are hard to sketch in your sketch (like illustrating what the interaction might feel like) is a challenge – be creative! The skill of communicating design ideas quickly is valuable. Learning to sketch ideas, and evaluate them before investing in implementation is really valuable.
If you’re skeptical about sketching (or maybe your ability to do it), check out http://www.alistapart.com/articles/sketching-the-visual-thinking-power-tool/ .
The Assignments
For Friday, February 10, before 11am you must turn in at least one design. You should attach your sketches to the Piazza Page for this design challenge. You should write a followup posting that describes your design and links to your sketches. (I fear if we embed the images, the page will just get unwieldy). The mechanics of handing in designs might change. The Piazza page for handing in is: here.
On Friday, Feb 10, we’ll have an optional class session. (note: coming to class is optional, since it’s Friday – handing in the assignment is not optional). In this class, we’ll use the submitted designs as the starting point for a discussion.
For Monday, Feb 13 and Wednesday Feb 15, I’d like you to look at other people’s designs and comment on them. There will be a lot of reading for Feb 13, but it will be relevant (since we’ll be reading about critique and evaluation).
I am not sure what will happen when 30 people all try to make comments on 30 designs. Part of this is an experiment to see if online discussion/design critique can occur at the scale of this class. I am not sure using Piazza the way we are will work. But we’ll try it.
To help you with these initial discussions, consider: what kinds of questions are the different designs good/bad for? Are there similar designs? (can we find “clusters” in the design space?) Are there things you see as particularly novel or clever? Can you identify things that depart from standard designs? Can you suggest changes that might improve things?
You may suggest ways to elaborate on the description of the design – things that could be clearer in the explanation, or better shown in the sketch. You may add new sketches or update your description in response to feedback (try to be clear that you’ve made changes or clarifications).
Note: we’ll be doing various forms of critique in class. The online discussion is probably not the place for being negative (especially about the details). My hope is that the online discussion will help us group the designs and issues so we can talk about things in groups.
The expectation is that each person should comment on at least 2-3 other designs. You may want to respond to comments made about your design. Please do this before Wednesday, February 15th (so I can look at them to plan for class). You are welcome to continue the discussion after class.
We will probably do some critique/discussion of the designs in class on the 13th and 15th. (Friday classes are optional, but the assignments are not).