Reading 4: Evaluation

February 4, 2010

in Assignments

(reading due Tuesday, February 9th – please post comments before 7am)

One big question we’ll need to ask with anything we do with visualization is: is it any good?

There are many different ways to assess this. In fact, you can ask this question from the different perspectives on visualization (domain science, visualization/CS science, design). I’ve chosen 3 readings that come at evaluation from these different directions:

  • Tamara Munzner. A Nested Model for Visualization Design and Validation. Infovis 2009 (project page with pdf)

Of course, we can’t talk about “what is good” without consulting Tufte for his strong opinions (not that he was ever going to keep those opinions to himself). This “chapter” is kind of split into one on the good and one on the bad.

  • Edward Tufte. The Fundamental Principles of Analytical Design. in Beautiful Evidence. (protected pdf). In hindsight, this Tufte chapter is actually much better at the “how” of making a good visualization, and at trying to distill general principles, than many of the others we’ve read. But it’s Tufte, so it’s still full of his opinions on “what is good.”
  • Edward Tufte. Corruption in Evidence Presentations. in Beautiful Evidence. (protected pdf)

Finally, Chris North at Virginia Tech has been doing some very interesting work on trying to quantify how much “insight” visualizations generate. I recommend reading the actual journal article with the details of the experiments, but the short magazine article might be a good enough taste of the ideas. (Update: I actually recommend reading the shorter “Visualization Viewpoints” article, since it gives a better overview of the basic ideas. If you’re interested, you can go read the longer journal article that details a specific experiment.)

  • Purvi Saraiya, Chris North, Karen Duca, “An Insight-based Methodology for Evaluating Bioinformatics Visualizations”, IEEE Transactions on Visualization and Computer Graphics, 11(4): 443-456, (July 2005). [pdf]
  • Chris North, “Visualization Viewpoints: Toward Measuring Visualization Insight”, IEEE Computer Graphics & Applications, 26(3): 6-9, May/June 2006. [pdf]

Everyone should read all 3 of these (well, at least one chapter of Tufte and at least one of the Chris North papers).

In the comments, share your thoughts on how these different ways to look at evaluation (well, Munzner actually gives several – but I am lumping them together) might relate and help you think about creating visualizations and/or visualization research yourself. What do you think is important for your perspective (e.g. your domain)?

If you have experience in another domain where there are ideas of how things are evaluated, how might these ideas relate to how visualization is evaluated?

Everyone in class must contribute at least one “top level” comment answering the questions above, and preferably add some replies to others to “start up” the class conversation on evaluation.


dalbers February 8, 2010 at 6:03 pm

Reading through Munzner’s paper, the two Tufte readings, and “Visualization Viewpoints,” the authors each seem to place their target audiences at distinct points on the triangular spectrum of Visualization Science – Art – Domain Science.

Munzner appears to focus on ways for the individual to “tell” about a particular visualization. She concentrates on creating evaluation methods for effective reports on visualization rather than on evaluating tools for the purpose of design. Through her work, I learned more about the technical aspects necessary to write a good information visualization paper. She tends toward using evaluation as an expression of the science behind the visualizations.

Tufte’s work on the fundamental principles of design communicates methods for the evaluation of how to “show” data through visual presentation. He lays out methodology more focused on the art of visualization and, in turn, I gained a better sense of how to physically lay out the visual content of data.

North’s work focuses mostly on conducting an accurate evaluation of how to “use” a visualization tool. Through his description of the troubles with current benchmarking analysis techniques, I was forced to consider the biases involved in typical black-and-white statistical evaluation so frequently used in engineering-based fields. Tufte’s work, while seemingly more emotionally motivated, also focuses on the end-user. Unlike North, however, it seems to focus more on informing the end-user how to examine a visualization rather than guiding designers on how to evaluate the value of their design. In both cases, this focus on the end user falls in the domain science category of visualization as the authors describe how to get actual feedback from the domain users in a subjective and reportedly more comprehensive manner.

From my perspective as an individual involved in visualization research, I found that, while Munzner’s paper was most relevant to my own work, Tufte’s chapter on fundamentals and North’s article brought up evaluation considerations that I had not previously taken much time to consider. Given my previous experience in computer science and mathematics, proofs and statistical evidence typically provided a sufficient evaluation of something. However, getting a chance to learn about the other aspects of visualization and how they are evaluated brings a decidedly more qualitative and user-influenced notion to the evaluation of design, one that gives user input more weight than in the more quantitative domains.

turetsky February 8, 2010 at 8:51 pm

I found Munzner’s paper to be accessible and focused on the designer. It’s about how to create and evaluate a good visualization. I enjoyed the fact that she lays out the problems that can occur at each level of creation and evaluation. I was particularly amused that all her examples were other papers, which makes sense considering the topic, but it was a refreshing change of pace.

Tufte focused on how a good visualization can be well read and interpreted by someone else. I disliked how he seemed to put Minard’s visualization on a pedestal, as I found that, to me, at least, some of the visualization was not immediately intuitive. I also was turned off by some grammatical errors (however hypocritical that may be) in the chapter (e.g. “credit is to often absent”).

North’s article brought up some interesting points about how to choose among ready-made evaluation tools for a given set of data. However, I think the fundamental problem in visualization and bioinformatics is that researchers in these fields do not know that they have these tools to choose from. I’ve been to medical conferences, and almost every poster I’ve seen looks the same: they are all basic statistical images (scatter plots, bar graphs, etc.). While it might be the case that the data is not complicated enough to warrant a more creative visualization, I think more effort should be made to promote these tools to researchers in various fields.

While I think

punkish February 9, 2010 at 1:51 am

I read Tufte’s “Corruption” chapter after reading the “Fundamentals” chapter, and I too couldn’t help but see the irony in reading the section on “Punning and overreaching” and his bubbling enthusiasm for Minard’s map.

Nate February 9, 2010 at 10:11 am

Yup. And it’s pretty brazen of him to complain, in his opening paragraph of ‘Corruption’, of rhetorical ploys in presentation.

watkins February 10, 2010 at 11:20 am

I don’t see the irony here. I might have seen some irony while reading the “Cherry-Picking, Evidence Selection, Culled Data” section, where Tufte implies that enthusiasm for one’s work is related to poorly designed studies, but he’s trying to make a different point in the “Punning” section. The author of the excerpt of “Painting Outside the Lines” either intentionally or carelessly uses a word that can be interpreted multiple ways, then uses the meaning that is most convenient for him to make his point. So it’s not necessarily about the tone of the writing, just the vocabulary.

Also, in the interest of not cherry-picking, it’s probably worth pointing out that North’s article had a typo too. (“They place an undo burden on the evaluation designers…”)

Nate February 8, 2010 at 9:38 pm

Ah! Here are some readings that would be right at home in the librarians’ Information Architecture focus — North and Munzner, in particular, read pretty much like usability testing papers. And, as per normal, all these readings approach similar ideas in slightly different ways.

Munzner, as usual, is very “here’s the process; it will generally give you results.” The steps almost have the feel of testable hypotheses, and even if you don’t follow the steps exactly, this is a great primer on how, generally, to go about deciding what to build and how to iterate its design.

I was both very intrigued and a little frightened by the North paper — in this case, it felt like a model-free approach for assessing the effectiveness of tools for analyzing model-free data. I was intrigued in that both the assessment and the tools they were assessing fundamentally harness the power of the human mind in pattern-finding. Biologists know what, in these tools, looks like “signal.” Vis researchers, in these assessments (perhaps with the help of a domain scientist), know what constitutes an “insight.”

The frightening part, however, is that there’s no underlying hypothesis. These tools appear to encourage researchers to look at huge datasets and find patterns, without any obvious method to assess the applicability of these patterns in a larger context. I see this in fMRI analysis, where researchers essentially run great huge numbers of statistical tests, and ultimately tell a story explaining the tests that turned out to be significant. Many, many factors (pretty much everything Tufte rails against in his cherry-picking chapter) combine to make it easy to convince an audience, and even oneself, that this story is well-supported by the evidence.

Of course, there’s an inherent problem in both microarray DNA and fMRI datasets — in both cases, you’re dealing with a whole lot of data that’s still understood relatively poorly. One subject’s worth of fMRI data may reasonably have well over 100 million observations; finding an informative way to show that much data honestly is a serious challenge — and one that Tufte, for all his strengths, doesn’t address.

What I’m getting at is that I feel like vis tools — especially interactive explorers for massive datasets — have the potential to encourage investigators to overfit their models without even realizing they’re doing so. Vis tools need to not only allow researchers to gain insight into their data, but also easily test to ensure they’re not engaging in self-deception.

Ning February 8, 2010 at 9:40 pm

Tufte’s paper is very interesting to me. It mentions some important principles that a good analytical design should follow. The example (the successive losses in men of the French army) was very illustrative and concise. Although those principles may have been proposed long ago, they still sound reasonable today.

North’s paper defines what “insight” is and how to evaluate visualization reasonably. It first mentions controlled experiments on benchmark tasks and lists several drawbacks of that method, which I agree with. Then it proposes two methods to measure insight. Personally, I prefer the second method (eliminating benchmark tasks), since it gives users more space to express their feelings about a visualization, rather than guiding them to make a choice from a fixed answer set.

Munzner’s paper talks about a nested model for visualization design and validation with four levels. What interests me is that the model can be used to analyze previously existing systems, and that three recommendations follow from the model. However, compared with the other two papers, Munzner’s is a little abstract and not easy to follow. I can understand that Munzner wanted to split visualization design into levels, but why choose four?

In my research on visualization of data, I used some principles to evaluate whether a visualization is good or not, such as the following:

(1) Conciseness: visualizations with simple graphs (fewer nodes and edges) are more concise, and hence easier for users to understand.

(2) Coverage: if more information about the data is covered, the resulting visualization is more comprehensive.

(3) Interestingness: the visualization should reveal some information/insight beyond people’s common knowledge; that is to say, people cannot predict or know that information without the visualization.

The above three principles are what I used in my research. I believe Tufte’s chapter is most closely related to them, because we both need principles to evaluate a visualization. However, North’s paper provides me a new perspective: studying user feedback to decide whether a visualization is good or not.
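As a rough illustration, the first two of these principles could be turned into simple quantitative proxies. The function names and formulas below are hypothetical choices for the sketch, not drawn from any of the readings; interestingness resists this kind of simple quantification.

```python
# Illustrative sketch only: toy quantitative proxies for conciseness and
# coverage. A graph is given as a list of nodes and a list of edges.

def conciseness(nodes, edges):
    """Fewer visual elements -> higher score: 1 / (1 + element count)."""
    return 1.0 / (1 + len(nodes) + len(edges))

def coverage(shown_attributes, all_attributes):
    """Fraction of the data's attributes that the visualization encodes."""
    return len(set(shown_attributes) & set(all_attributes)) / len(set(all_attributes))

# A three-node path graph that encodes two of three data attributes:
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
print(round(conciseness(nodes, edges), 3))             # 0.167
print(round(coverage(["x", "y"], ["x", "y", "z"]), 3)) # 0.667
```

Even these toy metrics make the tension visible: adding nodes or encoded attributes raises coverage while lowering conciseness, so a single scalar "goodness" score would require weighting the two.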

Jim Hill February 8, 2010 at 10:10 pm

Of the three readings, I liked Munzner’s the most. Her method for developing and validating visualizations looks a lot like modern software engineering methods. At first glance it would seem that she’s promoting a waterfall development cycle, but she does mention iteration, which makes it seem more agile (a method that I approve of for large software development). I also liked that she provided methods of validation for each stage in the development process. Initially, I thought there should have been a fifth level following algorithm design that was simply labeled “test,” but I think her method of testing at each level is better and would probably produce better results.

I also liked Tufte’s piece, though I feel he’s a little too general. Some of his rules for good visualizations seem vague and lacking in definition; I feel there might be instances where a rule could be followed but Tufte would still think the visualization was done incorrectly. I must agree that the map of Napoleon’s route was very fun to look at. In particular, given how many variables were displayed, it’s amazing how comprehensible the final product was. Having said that, I’m still hesitant about packing as many variables as possible into a visualization. In the interest of giving important data more influence in the final visualization, perhaps fewer variables are better.

I didn’t like North’s paper as much as Munzner’s. He seemed to be in favor of a very rigorous method of comparing visualizations. At the end of the day, I think this is analogous to optimizing source code: there are some things that give huge performance boosts and some things that are almost negligible. Munzner has the right idea: the best feedback is going to come from asking the end users whether they like it, and if not, how to change it. Like software, there are hundreds of different ways to visualize a data set, and I think spending time trying to quantitatively rank them is something of a waste.

Most of my ideas for using visualizations are in the toolsmith category. For that reason I preferred Munzner’s paper over the others. She really seems to nail down a good procedure for putting together good, useful visualizations.

Shuang February 8, 2010 at 10:50 pm

Munzner’s ‘A Nested Model for Visualization Design and Validation’ presents a unified way to design and evaluate visualization. In order to explain how to evaluate visualization systems, the author establishes a model that splits visualization design into four levels. I did not go into the details of the model, but I think the main idea is that it guides designers to choose the right visualizations for scientists. The four-level model is nested, so it is helpful for users/domain scientists to apply it in order.

Tufte’s work evaluates the goodness of visualizations from the viewpoint of readers. The criteria for evaluating visualizations are categorized, and those categories give me a sense of how to visualize data more efficiently. Going through the third principle, multivariate analysis, which I am particularly interested in, I find the method of presenting the fourth dimension, time, interesting.

North’s paper uses microarray study as an example to evaluate the visualization work. The data is time-related, high-dimensional and of large amount. Among those examples, tree plot is used to categorize genes and heat map is used to identify the hot spots among genes. The evaluation is based on comparing it with previous software and interaction techniques are used to get a clear feedback. By quantifying insight generation, the insight of the visualization is presented.

In my previous internship at a bioinformatics company, I worked on a program to make heat maps for microarray data, and I wish I could have read North’s paper at that time. When visualizing microarray data, concentrating on the most important genes is the core problem. For scientists, the most interesting thing is not how the color differences show different genes or other fancy stuff, but identifying the really hot spots at first glance.

I think the key aspects of visualizing statistical genetics include how to pack figures with information and how to select useful information to present. North’s paper gives a vivid example in this domain and can be helpful especially when designing tools.

Adrian Mayorga February 8, 2010 at 11:19 pm

The three papers gave very different perspectives on how to evaluate a particular visualization. Munzner gives a hierarchical way of thinking about visualizations and describes in detail how to evaluate each level. Tufte’s pieces seemed a bit more general and arbitrary: the first reading enumerates principles that a good visualization must have, discussing them through a case study of the map, and the second enumerates things one should never do in visualizations. Finally, the North reading provides a very rigorous framework for conducting user studies and argues for the need to have more than just contrived benchmarks.

While I personally find Munzner’s work to be the most immediately useful, as it describes a way to design AND evaluate visualizations from a visualization scientist’s point of view, the other two perspectives do not go unnoticed. Evaluating visualizations is obviously not an easy task, and perhaps the best way to do it is to put ourselves in each of the three roles (Vis Scientist, Designer, Domain Scientist/user).

Nakho Kim February 9, 2010 at 9:00 am

I strongly agree with the point that all three roles should be taken, though I interpreted them in more straightforward terms: evaluate your visualization by examining your visualization process systematically (Munzner), check whether it tells the right story (Tufte), and evaluate how it fared with the users (North).

ChamanSingh February 8, 2010 at 11:27 pm

Chapter : The Fundamental Principles of Analytical Design.
Book : Beautiful Evidence, Edward Tufte.
Wow factor : 1/10 (10 represents an intellectually stimulating article).
Comments : In this chapter, Tufte explains what visualization must be, and that the fundamental expectation of visualization must be “insight” that augments thinking about the data.

I liked the main theme of the article, but I am very critical of the style of writing, the subjective judgements, and the lack of objectivity in conveying the idea.

Tufte’s “best statistical graphic ever” is the worst graph I have encountered in my professional career.

(1) How is a multivariate analysis of six dimensions useful in knowing what we want to know? How are latitude and longitude useful? (At least it is not clear from his writing.)

(2) Where is the direction of the army’s movement in the graph? Wouldn’t it be simpler to draw two pictures separately on the same graph?

(3) I got completely confused about the direction of the dates. I couldn’t make out who was attacking whom; the dates decrease from left to right.

(4) What is the criss-cross diagram showing?

Well, if the purpose of visualization is to augment understanding of a certain fact or phenomenon, then this graph fails miserably at conveying its meaning in a simple way.

I wish there were some recent datasets from biology, physics, CFD, etc. to convey the idea, rather than one graphic from 1869.

Chapter : Corruption in Evidence Presentations.
Book : Beautiful Evidence, Edward Tufte
Wow Factor : 1/10
Comments : Before we comment on this chapter, we need to decide which camp we belong to:

Camp I : An idealistic, truth-seeking camp (though they probably don’t know what truth is).

Camp II : A group of educated people who use visualization as an augmented memory tool, one that can also be used to suppress the truth for some benefit or to emphasize certain truths more than others.

I personally think all the words (“corruption,” “cherry-picking,” “overreaching,” and “chartjunk”) have been given negative meanings in this article, but the author fails to give any objective meaning for these terms (probably none exists).

In a positive sense, I would consider “cherry-picking” the art of being focused. The chartjunk example given by the author is a great demonstration of the effectiveness of visualization in communicating a simple idea to the masses.

The author’s overemphasis on “visualization as truth seeker” is somewhat unconvincing. Companies, politicians, and the corporate world use visualization to exploit the weaknesses and limitations of human cognitive skills, and I think they do their work very scientifically.

Paper : Toward Measuring Visualization Insight
Authors : Chris North
Wow factor : 3/10

Comments : Until August 1948, humanity used the word “information” liberally without knowing what “information” was. It was Shannon whose brilliance and deep insight gave us a precise definition of “information.”

We are on the same road as the pre-1948 era: we do not know what “insight” is, and this paper doesn’t make it clear either. The stated characteristics of insight (complex, deep, qualitative, unexpected, and relevant) may be acceptable only as long as we do not know the meaning of “insight.”

Over many years, I have developed visualization tools that help scientists and engineers understand the laws of physics better (for example, how turbulence affects the carotid arteries); therefore, I don’t understand the importance of benchmarking tasks. Every problem is different, and so are the solutions. For me, insight is a one-to-one mapping between theory and reality. Visualization, like all other branches of science, needs specialized training, and “training,” not “benchmarking by commoners,” should decide what good visualization is.

The paper makes overly general statements, and its lack of objectivity and results makes it uninteresting to read.

Paper : A Nested Model for Visualization Design and Validation
Authors : Tamara Munzner
Wow factor : 6/10

Comments : This is a nice paper in which Tamara tries to answer one of the most frequently asked questions: how do we evaluate a visualization system?

The concepts are not new; we have read in earlier papers about the importance of (1) characterization of the task, (2) vocabulary, (3) visual encoding, and (4) interaction technique. The good thing Tamara has done is to integrate them vertically into a nested model, which I think makes things simpler. She gives many examples, compares with other models, and notes some limitations, all of which make this paper more scientific than the others.

Perhaps the use of the word “threat,” borrowed from the networking domain, could be avoided; “threat” has far more serious implications than innocuous visualization tools warrant.


Paper : An Insight-based Methodology for Evaluating Bioinformatics Visualizations
Authors : Purvi Saraiya et al.
Comments : This paper is outside my domain of knowledge and too lengthy for me to focus on and draw conclusions from. I do not know what microarray datasets are; my knowledge of biology datasets is close to zero.

punkish February 9, 2010 at 1:30 am

I don’t necessarily agree with all of Chaman’s analysis, but I do applaud him for his bravery in calling bs on Tufte. Tufte, for someone who makes a pretty emphatic deal out of “honesty” in analysis, makes no bones about what he believes to be good or bad. By starting off by declaring Minard’s map the world’s best graphic ever, he makes it hard for himself to ever find fault with it.

lyalex February 9, 2010 at 2:34 am

I still think Minard’s graph is quite good. To my understanding, he tries to show the historical marching map with his own attitude: the losses of a war.

I’m trying to answer some of Chaman’s questions:
(1) How is a multivariate analysis of six dimensions useful in knowing what we want to know? How are latitude and longitude useful? (At least it is not clear from his writing.)

It’s a map. The primary goal of Minard might be to present the historical information about this “death march,” so latitude and longitude are used to mark the French army’s path.

(2) Where is the direction of the army’s movement in the graph? Wouldn’t it be simpler to draw two pictures separately on the same graph?

The direction is indeed not explicit. However, the tan and dark routes are the path of the same entity (the French army); drawing them together clearly shows the losses in battle as well as the losses to the winter.

(3) I got completely confused about the direction of the dates. I couldn’t make out who was attacking whom; the dates decrease from left to right.

The dates correspond only to the dark route. The French were defeated at the gates of Moscow and retreated ever since. The tan route shows the French army’s march to beat the Russians; after they were defeated, they suffered from Russian attacks and the cold winter, so dates are only necessary on the dark route.

Jim Hill February 9, 2010 at 5:26 am

I didn’t notice that there was no visual pertaining to the direction of the army. However I didn’t have any issues determining what the direction was. Could it possibly be that there was enough context to imply the direction?

jeeyoung February 9, 2010 at 12:41 am

Munzner (Vis Science), Tufte (design) and North (domain science) have different perspectives on evaluation. Munzner does evaluations at each layer divided by visualization design model. Tufte presents his fundamental principles to follow for good design and corrupting activities to avoid. North tries to measure “insight” that visualizations produce as an evaluation.

But all three evaluation perspectives have one thing in common: content. North thinks almost exclusively about the highest level (domain threats) of Munzner’s nested model. Tufte also emphasizes content in its quality, relevance, and integrity (principle 6: content counts most of all).

Those three ways of looking at evaluation seem to be entangled with each other. Some of Tufte’s fundamental principles of analytical design (comparisons, integration of evidence) or chartjunk might correspond to Munzner’s encoding threats, but also to higher-level threats of the nested design model. North’s insight measurement method (eliminating benchmark tasks) can be applied at Munzner’s highest level.

I have examined visualizations by some of Tufte’s principles: comparisons, labels, legends, documentation, etc. Tufte’s other principles will help me produce better visualizations, but I think putting in too much information is not always good, because it sometimes takes more time to interpret the visualization. Munzner’s method will be especially helpful when I produce lots of same-format plots. North’s method is interesting, but I think defining metrics on insights will not be easy.

punkish February 9, 2010 at 1:26 am

The readings deal with the topic of evaluating visualization. Munzner develops a theory of evaluation; North deals with the meta-level question of what is that which visualization is trying to convey (“insight”), and proposes pros and cons of being too precise or too vague in measuring insight; and Tufte dives right in, waves Minard’s poster as the “world’s best graphic,” and comes up with 6 working principles of visualization and one grand unifying principle — good visualization does visually what thinking does cognitively.

North’s paper was too light-weight for me to add any substantive remarks here. Tufte, although full of a certain level of bombast, did seem to provide useful principles that, while not necessarily accurate, are certainly helpful for understanding the purpose and efficacy of visualization. Munzner occupies an enviable position between a designer and a tool creator, and it shows. Her four-level model, from characterizing the problem, to mapping it to operations and data, to designing the visual encoding and interaction, to mechanizing the entire process, describes the design and manufacture of a visualization device.

punkish February 9, 2010 at 1:32 am

(was unable to edit the comment — the fancy Ajax edit comments box came up, but failed to load the text. I clicked cancel after waiting a bit).

Wanted to clarify that I read the short paper by North rather than the one on micro-arrays.

watkins February 9, 2010 at 1:36 am

Chris North- Toward Measuring Visualization Insight

North’s paper is very ambitious in that it first characterizes insight as qualitative, then offers methods for quantitatively evaluating it.

The evaluation problem seems to be analogous to the definition problem: Simple benchmark tests are too restrictive to capture the essence of insight, but any evaluation method that does is too general and vague to be useful.

I’m not sure that insight is necessarily qualitative; it just takes on very different meanings depending on context. So maybe the things to take away from this paper are that there is no blanket evaluation technique that will work for all tools, that complexity must be carefully balanced with objectivity, and that context should be clearly established before researchers begin designing evaluation methods.

Edward Tufte- Corruption in Evidence Presentations

Ethics is important in all aspects of research, including presentation of data and results. Of course no one should maliciously or intentionally try to misrepresent their data, but sometimes people make honest mistakes. I don’t think they should be condemned for it. That’s why I think the best advice Tufte offers in this chapter is to “consumers” of presentations, that they should keep an open mind but not an empty head when presented with new information.

On another note, I’m not sure I can stand behind Tufte’s hatred of repackaging. After all, it seems like primary reports are just a repackaging of raw data. It’s true that information can be lost or distorted in a kind of “telephone game” of multiple repackagings, but the goal of these multiple reports, at its most basic level, is to make information more accessible to a wider audience, and I think there’s something to be said for that.

Tamara Munzner- A Nested Model for Visualization Design and Validation

We have taken some time during in-class discussions to comment on evaluation (or validation) methods for visualizations and have come to some vague conclusions about what a “good” evaluation technique is (see my response to the North paper). Munzner’s paper helps us classify different types of validation and couple them with simple, concrete steps for creating a visualization to address a particular domain problem. If this paper encourages vis scientists to consider all these steps in a more organized way, with a greater focus on the more tedious but higher-level layers that are often overlooked, then its value transcends many specific applications and domains, making it one of the most influential papers we’ve read so far.

I read this paper right after the Tufte chapter, and noticed that its vocabulary sections seemed to specifically address the “punning multiplicity of meaning”: it distinguished the technical and everyday meanings of words, and noted the intended meaning of each word for the purpose of the paper.

lyalex February 9, 2010 at 2:21 am

As I’m not yet an expert in VIS (and might never be one), I consider Tufte’s chapter more impressive, as it uses a single example, Minard’s map, to explain the principles of visualization. An example of a good map is really helpful for understanding the author’s points. The principles listed are quite complete; at the least, they cover what matters in my domain science (natural science) fields. Comparison is exactly why we always use a “control group,” and multivariate analysis is why we use 3-D graphs to show our modeling. However, in the part on the relevance of the principles, the author didn’t clearly identify a “hierarchy” among them. In my opinion, the context, documentation, and integration of evidence should be included in every graph in a series of natural science papers, but comparison and causality can be implicit.

Munzner’s paper creates a nested model of four levels: (1) characterization of the task, (2) the abstraction level, (3) visual encoding, and (4) the algorithm for implementation. I almost consider this a review paper, as it lists and cites so many previous papers and studies. Though it lacks detailed examples and is a little hard for me to follow, the nested model and the upstream/downstream validation method are really effective, and the idea of separating the levels will surely be helpful within the visual design process. The only “defects” I can find are these: first, a detailed example that walks through the levels of the model would be very good for helping the audience understand the paper, and would be direct proof of the model’s effectiveness; second, the author lists many previous papers that do not completely fit the model yet still work, which leaves us with this question: is the nested model accurate and concise enough?

North’s paper is kind of trivial to me. My impression is only that the authors attacked the benchmark approach to evaluation, but the alternative evaluation method they proposed seems like merely a more qualitative version with far less feasibility. I learned and accepted the drawbacks of benchmark evaluation, but what can we do to avoid them? I find the authors’ reply ambiguous.

hinrichs February 9, 2010 at 3:14 am

Tufte – on Evidence corruption. One of my all-time favorite quotes is, “the best way to tell a lie is to tell the truth so unconvincingly that no one will believe it.” Tufte hit on that early on, but not in so many words.

It’s interesting that this chapter focused mainly on obfuscating language: double meanings, over-reaching, hiding the evidence by summarizing, and every shade of confirmation bias. While I wholeheartedly agree that language is often ripe for abuse and that one should always demand precision wherever possible, I wonder how many of these ideas translate directly into visualization topics.

Double meanings – Do arrows, lines, colored shapes, or scattered dots admit multiple interpretations? Probably they do in some cases: there are multiple visual channels, some of which encode real data, while the rest have to be populated some other way. Probably the best approach is to keep the data uniform in any channel that is not meant to be interpreted, though that’s not always possible; for instance, if you decide not to use position, you still can’t put all of the graphical objects in your visualization on the same spot!

Summarizing – This is definitely a danger in visualization. One has to take care that the information discarded to fit the data into a two-dimensional space is non-essential.

Confirmation bias – As long as the mode of visualization is chosen before seeing how it looks, one should be free of this bias; however, sometimes you might try something out that doesn’t work for other reasons.

Probably the main reason these kinds of intellectual dishonesty are so prevalent is that no institution can be created that will guarantee perfectly fair and impartial presentation of information. (There are imperfect ones, such as the academic review process.) Nevertheless, Tufte is happy to give some stern warnings. 🙂 Another reason is that the distortion is often unconscious: even with the best of intentions, producing a complex communication free of distortion is a strenuous undertaking.

Chris North, on measuring visualization insight.

A personal opinion – I am generally inclined never to believe self-reports on subjective criteria. Thus, studies that ask users whether they liked a certain visualization, or whether they would use it, are not very convincing to me. Data on whether they actually used one tool over another is far better. Better still is some metric of how successful users are at certain tasks when using a certain tool.

I once read Jef Raskin’s book The Humane Interface, and it had some interesting things to say about quantitatively measuring how good certain interfaces are. These included Shannon information content (for instance, a popup which only asks you to click “OK” has an information content of 0), time taken to perform certain tasks, number of operations required, number of errors made, etc.
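Raskin’s information-content idea can be sketched in a few lines. This is my own toy illustration, not code from the book: assuming a prompt offers n equally likely, meaningful choices, the user’s decision carries log2(n) bits, so an OK-only popup carries none.

```python
import math

def choice_information_bits(n_options: int) -> float:
    """Shannon information content (bits) of a user's choice among
    n_options equally likely, meaningful alternatives."""
    if n_options < 1:
        raise ValueError("need at least one option")
    return math.log2(n_options)

# A popup whose only button is "OK" forces one outcome: 0 bits.
print(choice_information_bits(1))  # 0.0
# A yes/no confirmation carries one bit of user decision.
print(choice_information_bits(2))  # 1.0
```

The equal-likelihood assumption is a simplification; Raskin’s fuller treatment weights choices by their actual frequencies.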

UI is not the same as data visualization, though if the visualization is interactive, then it’s pretty close.

I think Chris North’s point is that, sure, you can measure how effective people are at doing some manual tasks, which is all well and good, but how does the visualization help make someone a better scientist, engineer, or detective? This is a much more ambitious question. While there probably are lots of gainful things to do, I wouldn’t expect a full “reduction to practice” for something of this magnitude. The main pitch is to let users explore the data, and record their insights as they go, either by writing them down, or saying them out loud.
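As a concrete sketch of what “recording insights as they go” might produce, here is a minimal record in the spirit of North’s insight metrics. The field names and the 1–5 value scale are my own assumptions for illustration, not the paper’s actual protocol.

```python
from dataclasses import dataclass

@dataclass
class Insight:
    """One think-aloud observation, to be coded later by domain experts."""
    participant: str
    minute: float      # time into the open-ended session
    text: str          # what the participant said or wrote
    domain_value: int  # expert-coded worth, 1 (trivial) .. 5 (deep); hypothetical scale
    correct: bool      # did experts judge it consistent with the data?

def total_insight(insights):
    """Sum the expert-coded value of correct insights, per participant."""
    totals = {}
    for ins in insights:
        if ins.correct:
            totals[ins.participant] = totals.get(ins.participant, 0) + ins.domain_value
    return totals

log = [
    Insight("p1", 3.5, "gene A spikes at 12h", 4, True),
    Insight("p1", 9.0, "all genes drop at 24h", 2, False),
    Insight("p2", 5.2, "clusters B and C co-vary", 5, True),
]
print(total_insight(log))  # {'p1': 4, 'p2': 5}
```

The point of the structure is that everything subjective (domain_value, correct) is isolated in fields the domain experts fill in, which is exactly where the coding debate below lands.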

Coding – I think this is the heart of it, because this is how you tell how good the insights were. This is basically the entire problem, swept under one giant rug. Ok, we need to measure something complex and subjective – so we’ll have someone assign numbers to it according to their subjective notions of what constitutes depth. North writes,

“Coding converts qualitative data to quantitative and is inherently more subjective, but supports the qualitativeness of insight. Significant objectivity can be maintained through rigorous coding practices.”

Yes, you can have rigorous, but still subjective practices, which leave the door open to all of the drawbacks of purely unquantified measures. Numbers are a good way to represent things, but they have to mean what they say.
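One standard way “rigorous coding practices” keep subjectivity in check is to have two coders code independently and report chance-corrected agreement. A minimal sketch of Cohen’s kappa follows; the “deep”/“shallow” labels are invented for illustration, and the North paper’s actual coding protocol may differ.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders, corrected for the
    agreement expected by chance given each coder's label frequencies."""
    assert codes_a and len(codes_a) == len(codes_b)
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two coders rate six insights as deep or shallow:
a = ["deep", "deep", "shallow", "deep", "shallow", "shallow"]
b = ["deep", "shallow", "shallow", "deep", "shallow", "shallow"]
print(round(cohens_kappa(a, b), 2))  # 0.67
```

A kappa near 1 means the coders’ subjective judgments at least agree with each other; a kappa near 0 means the numbers are indeed “made up,” which is the check the quoted passage is gesturing at.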

Clustering – Who does the clustering, and how?

Errors – Who is to say what is an erroneous insight?

Domain experts – I think this is a good idea that compensates for a lot of the weaknesses above. In any field there are inherent controversies and ambiguities, and resolving them is no easier than deciding whether someone has gained an insight from exploring the data. The solution is to nominate some people as experts whose opinions can arbitrate wherever there is a consensus among them. If that’s good enough for the whole of academia, and for leadership in general, then it’s probably good enough for evaluating visualizations.

In any case, I think it’s worth pursuing, because visualization is definitely NOT the same as UI design. Just the same, I’m wary of drawing comfort from made-up numbers without being extremely cautious. This is, in fact, one of the criticisms of current methods, which use yes/no questions that can lose a lot of nuance.

faisal February 9, 2010 at 7:36 am

The use of qualitative methods often makes us uneasy. But I suppose some coding methods are fairly rigorous for developing deep insights and theories from raw observations such as think-aloud sessions or similar ethnographic data. In this context, I think coding is being used in a very limited sense: whether an insight by a user is correct or not, as judged by domain experts.

I am not sure we can say a visualization tool is not a user interface (or a human-computer interface). These tools have similar usability concerns, and similar concerns for human expectations and values, which make them as much an interface as anything else we might use for interacting with computers.

faisal February 9, 2010 at 6:57 am

The insight characterization used by North is a very useful idea that could actually serve as a validation method, alongside the formal lab experiment, in Munzner’s nested model. Munzner, though, seems to focus more on informal styles of usability study, and highlights a significant problem with lab tests: their failure to capture task mismatch, given that participants will be doing tasks as designed by the experimenter. North’s work addresses this problem by measuring insight on tasks accomplished by domain scientists with a given tool, and so can address some of the concerns regarding lab experiments.

I read the North journal paper describing their experimental methodology. The rigorous experimental methodology adopted for visualization evaluation can be very valuable, given that one has enough resources. It might not be necessary for the toolsmith to actually compare many tools against given datasets (thus reducing the number of participants).

As for the Tufte readings, I read the first chapter, which lays out the different visualization principles using the Minard’s map example, but skimmed through the second one. Unlike the other Tufte readings in this course so far, I found this one to have more rhetorical detail than actual message.

Jeremy White February 9, 2010 at 7:01 am

There were three vastly different approaches to visualization this week, but which one will result in the best visualization?

Tufte would have you believe that proper integration and data density will efficiently drive the message. Essentially, he stresses telling the story by providing the answers.

Munzner provides a framework through which a systematic evaluation of tasks and data establishes direction for design. Nesting the processes ensures that procedural order is covered in its entirety.

North posits that domain expertise will guide the visualization through insight. He would assert that the best person to tell a particular story is someone who has experience or vast knowledge of the subject matter.

If it were obvious which approach was the most effective, I suppose this seminar could be reduced to an afternoon workshop. One thing remains clear, however: visualization requires an understanding of the underlying message and the target audience.
