Critiques

GOOD example:

Newsmap

Newsmap is a treemap visualization tool that depicts which news topics are being covered most intensely at a given moment. The goal of Newsmap is to help news readers find underlying patterns in how news issues are selected. The data is derived from Google News, which provides location, section and time metadata and already clusters similar articles into groups using its own semantic algorithms. It is intended for use by regular web users rather than trained specialists.

The amount of coverage a topic receives is encoded in the size of its box and the font size of its title. The section in which an article cluster appears is encoded in color hue, while the age of the story is encoded in color lightness. The tool supports browsing by letting users select specific sections and countries. It also provides a search box that filters article clusters by keyword.
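The encodings above can be sketched in a few lines. This is a minimal illustration, not Newsmap’s implementation: it uses a simple slice-and-dice layout (Newsmap may use a different treemap algorithm), the section hues in `SECTION_HUES` are hypothetical, and the direction of the lightness ramp (older stories drawn lighter) is an assumption.

```python
import colorsys

# Hypothetical hue per news section (0.0-1.0 around the color wheel).
SECTION_HUES = {"World": 0.0, "Business": 0.33, "Technology": 0.6}


def slice_and_dice(items, x, y, width, height):
    """Split the rectangle (x, y, width, height) into vertical strips whose
    areas are proportional to each item's value.

    items: list of (label, value) pairs, e.g. (topic, number of articles).
    Returns a list of (label, x, y, width, height) tuples.
    """
    total = sum(value for _, value in items)
    rects = []
    offset = 0.0
    for label, value in items:
        strip_width = width * value / total
        rects.append((label, x + offset, y, strip_width, height))
        offset += strip_width
    return rects


def box_color(section, age_hours, max_age_hours=24.0):
    """Hue encodes the section; lightness encodes story age (assumed: older = lighter)."""
    age_fraction = min(age_hours / max_age_hours, 1.0)
    lightness = 0.3 + 0.5 * age_fraction  # 0.3 for fresh stories, 0.8 for old ones
    return colorsys.hls_to_rgb(SECTION_HUES[section], lightness, 0.8)


if __name__ == "__main__":
    topics = [("Election", 2), ("Markets", 1), ("Gadgets", 1)]
    for rect in slice_and_dice(topics, 0, 0, 4, 2):
        print(rect)
```

A full treemap would apply this split recursively (for example, sections first, then topics within each section) and alternate the split direction, or use a squarified layout to keep boxes closer to square.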

I like this visualization because it clicks with its intended audience by keeping the presentation as intuitive as possible. The topics of interest are right there in your face, because the layout guides the eye toward the most prominent items. Moreover, it does not overload you with metadata: you can’t read the topics in the smaller boxes and fonts unless you deliberately hover your mouse over them. With remarkable simplicity, it shows the current landscape of topics, in terms of specific issue, section and age of the news, on a single page.

======================

BAD Example:

ArtDiaspora.viz

ArtDiaspora.viz is an art-meets-cultural-studies tool that visualizes the concept of diaspora by depicting relationships between Korean-born artists, their current residences and their works. It was presented at the 2002 Kwangju Biennale and uses data from the artworks exhibited there.

This visualization ambitiously tries to fit a vast array of data into a circular network diagram. The different classes (title of artwork, artist, country of residence, place of birth, year of birth, year of artwork, form of artwork) are encoded with different colors, and the nodes are connected with colored edges.

Given that this was presented at an art exhibition, the target audience may have been impressed by its beautiful design and by the very idea of visualizing this kind of data. However, I don’t think it achieves its core goal of conceptualizing diaspora, because it is hard to read patterns when so many data categories are crammed into one diagram. Artist’s name, country of residence and year of work would have been enough to show a meaningful pattern of talented artists moving out of Korea. Even then, I don’t see the need for a circular network diagram rather than simple histograms and time-series world maps. On a smaller note, I don’t see why each category title (e.g. “Date of birth”) connects an edge to every single entity within the category. It makes the overloaded picture even messier.

Bad example

The two graphs show the natural rate of population increase for both developed countries and developing countries. The natural rate of population increase is the difference between the birth rate and the death rate. The purpose of this visualization is to reveal the difference in population growth rate (natural rate of population increase) between developed countries and developing countries.

However, this picture is an example of bad visualization because it doesn’t encode the important information in an effective way:

1) To get the natural rate of increase at a particular point in time, the viewer has to estimate the vertical distance between the birth-rate and death-rate curves at that time, which is not intuitive.

2) The author wanted readers to compare the population growth rates and conclude that the growth rate of developed countries is low and stable, while that of developing countries is high. However, because the two graphs are drawn separately, the difference is hard to see.
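Point 1 can be addressed by plotting the quantity of interest directly. The sketch below uses made-up per-1,000 rates (not the data behind the original graphs) and computes the natural rate of increase as birth rate minus death rate for each group, so that both series could be drawn as two lines on one shared set of axes, which also addresses point 2.

```python
def natural_increase(birth_rates, death_rates):
    """Natural rate of increase = birth rate - death rate, element by element.

    Rates are per 1,000 people per year; both lists must cover the same years.
    """
    if len(birth_rates) != len(death_rates):
        raise ValueError("birth and death series must have the same length")
    return [b - d for b, d in zip(birth_rates, death_rates)]


# Hypothetical illustrative numbers, one value per decade.
years = [1950, 1960, 1970]
developed = natural_increase([22, 20, 17], [10, 9, 9])
developing = natural_increase([44, 42, 39], [24, 19, 15])

# One row per year, both groups side by side on a common scale.
for year, dev, dvg in zip(years, developed, developing):
    print(f"{year}: developed {dev:+d}, developing {dvg:+d} per 1,000")
```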

Good example

http://well-formed.eigenfactor.org/radial.html#/?id=4324

This visualization gives an overview of the whole citation network. The colors represent the four main groups of journals, which are further subdivided into fields in the outer ring. The segments of the inner ring represent the individual journals. In the initial view, the top 1000 citation links are plotted. Selecting a single journal (inner ring) or a whole field (outer ring) displays all citation flows coming into or out of the selection.
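The two interactions described above (an initial view limited to the strongest links, and a selection that shows every flow into or out of a journal or field) amount to two simple filters over a weighted edge list. A minimal sketch, assuming citations are stored as hypothetical `(source, target, weight)` records with placeholder journal names; it does not reflect how the eigenfactor.org page is actually implemented.

```python
from typing import Iterable, List, NamedTuple, Set


class Citation(NamedTuple):
    source: str    # citing journal
    target: str    # cited journal
    weight: float  # citation count or flow strength


def top_links(citations: Iterable[Citation], k: int) -> List[Citation]:
    """Initial view: keep only the k heaviest citation links."""
    return sorted(citations, key=lambda c: c.weight, reverse=True)[:k]


def flows_for_selection(citations: Iterable[Citation], selected: Set[str]) -> List[Citation]:
    """Selection view: every link entering or leaving the selected journals.

    Selecting a whole field corresponds to passing all journals in that field.
    """
    return [c for c in citations if c.source in selected or c.target in selected]


# Hypothetical journals A-E.
links = [
    Citation("A", "B", 120),
    Citation("B", "C", 80),
    Citation("D", "E", 40),
    Citation("E", "B", 10),
]
print(top_links(links, 1000))
print(flows_for_selection(links, {"B"}))
```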

Benefits:

(1) The visualization is dynamic. If the viewer clicks a journal, the other journals and their citation links fade away, and the clicked journal is highlighted.

(2) The ring design gives equal emphasis to all journals. With a linear or other layout, information at the ends or corners might be overlooked.

(3) The lines (citation links) between journals run very smoothly, giving viewers a clear view of which journals are related.


Visualizing three-dimensional line data is an extremely tricky problem. Examples of such data include neuron connectivity information and fluid flow simulations. Below is a visualization of DTI fiber tracts in a human brain.

DTI fiber tracts

This visualization correctly conveys the connectivity information while adding depth cues that preserve the three-dimensional context. The paths are also bundled to emphasize general trends. I also like that it uses implicit perceptual cues (the width of the halos around each line) and lets our visual system do the hard work of deciding what is in front of what. Previous approaches have tried realistic rendering, but because we rarely see such structures in real life, we have a hard time extracting the structure from them.

Here’s a link to the project page:

http://www.cs.rug.nl/~isenberg/VideosAndDemos/Everts2009DDH

The paper:

http://www.cs.rug.nl/~isenberg/personal/papers/Everts_2009_DDH.pdf

Additional high-resolution images:

http://www.cs.rug.nl/~isenberg/uploads/VideosAndDemos/Everts_2009_DDH_supplemental_normal.pdf

There are also a couple of links to videos on the project page. I encourage everyone to take a look at them.

I think that one of the most common problems when trying to visualize data is overcrowding. This can happen whenever one tries to display too many data categories, too many data points, or both. Below is a map from the National Weather Service that displays the current national weather warnings and advisories.

Small map of weather warnings

The main problem with this map is that it tries to cram far too many types of warnings and advisories into a single view. There are 30 categories in total, all coded by color, which eventually makes them almost impossible to distinguish. For example, does the coast of Hawaii have a Coastal Flood Warning, a High Surf Warning, or both? Further investigation reveals that it was a High Surf Warning. There are even more problems when one considers how a user might interact with the map. For one, clicking on a state and selecting that state from the drop-down menu produce completely different results.

The current map and others can be found here:

http://www.weather.gov/

One of the better visualizations I’ve come across is the Hierarchical Edge Bundles system built by Danny Holten at Eindhoven University of Technology. In this system, the visualizations are derived from a set of hierarchical data. The data is laid out in clusters using either a radial or a balloon layout. Finally, the relationships between data points are drawn as edges linking them. Each edge is drawn as a color gradient from its source (green) to its target (red), and similar edges are clustered together through a bundling operation whose strength is controlled by a parameter beta.
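The role of beta can be made concrete. In Holten’s approach, each edge is drawn as a spline whose control points follow the path through the hierarchy from one leaf up to the two leaves’ lowest common ancestor and back down to the other leaf; beta then pulls those control points toward the straight line between the endpoints, so beta = 1 gives fully bundled edges and beta = 0 gives straight lines. The sketch below computes only the straightened control polygon (spline rendering is left out); the `parent` and `positions` dictionaries are hypothetical inputs, and details such as whether the common ancestor is kept in the polygon may differ from the original system.

```python
def hierarchy_path(parent, a, b):
    """Nodes from leaf a up to the lowest common ancestor (LCA) and down to leaf b.

    parent: dict mapping each non-root node to its parent.
    """
    up_a = [a]
    while up_a[-1] in parent:
        up_a.append(parent[up_a[-1]])
    up_b = [b]
    while up_b[-1] in parent:
        up_b.append(parent[up_b[-1]])
    ancestors_of_a = set(up_a)
    # The first ancestor of b that is also an ancestor of a is the LCA.
    lca = next(node for node in up_b if node in ancestors_of_a)
    down_to_b = up_b[: up_b.index(lca)]
    return up_a[: up_a.index(lca) + 1] + down_to_b[::-1]


def straighten(points, beta):
    """Blend each control point with its counterpart on the straight line
    from the first to the last point: P'_i = beta * P_i + (1 - beta) * L_i."""
    n = len(points)
    if n < 2:
        return list(points)
    (x0, y0), (xn, yn) = points[0], points[-1]
    result = []
    for i, (x, y) in enumerate(points):
        t = i / (n - 1)
        line_x, line_y = x0 + t * (xn - x0), y0 + t * (yn - y0)
        result.append((beta * x + (1 - beta) * line_x, beta * y + (1 - beta) * line_y))
    return result


# Hypothetical two-level hierarchy with 2-D node positions.
parent = {"a1": "A", "a2": "A", "b1": "B", "A": "root", "B": "root"}
positions = {"a1": (0, 0), "A": (1, 2), "root": (2, 4), "B": (3, 2), "b1": (4, 0)}
path = hierarchy_path(parent, "a1", "b1")
control_points = straighten([positions[node] for node in path], beta=0.85)
print(path, control_points)
```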

This visualization gives a cleaner view of the relationships within hierarchical data. Because edge direction is encoded with color and edges are bundled with varying strength, the user can easily detect patterns across the hierarchy and begin to form a more clustered picture of it. Presenting the data in a radial or balloon layout also lets more information be seen from a single viewpoint than a simple top-down tree layout would. In general, this is a very clean design that uses clear, easy-to-read encodings. The data does not become foggy as a result of the techniques, and relationships within the data are very clearly emphasized.

In general, I really like this visualization because it is very clean and applicable to any sort of hierarchically defined data. Aesthetically, it makes for a very nice visual and also clearly conveys its intended information. More pictures and the paper explaining the system can be found here.

Good

American income tax statistics

“Who is Paying Taxes”, Mint.

http://www.mint.com/blog/wp-content/uploads/2009/11/MINT-TAXES-R4.png

Found at: http://www.visualizingeconomics.com/

Critique

The chart above was found at the website cited above, where a larger version is available. This visualization falls on the InfoVis/Present side of the two perspectives covered in class and attempts to show the relationship between income and the income tax paid by different brackets of the population. The chart would interest anyone in the United States curious about who pays what percentage of income tax. The chart contains both quantitative and ordered data. The quantitative data includes income size and percentage of taxable income. The ordered data includes income levels and percentage of income tax paid. I consider percentage of income tax paid to be ordered data because, although the percentages could be considered quantitative, they are displayed in a pie chart in which slice size is used to create an ordering.

Tasks enabled include:

  1. Determining what percentage of the population falls into a certain tax bracket.
  2. Determining what percentage of income in each tax bracket is considered taxable.
  3. Determining the relative sizes of different tax brackets.
  4. Determining how much total income tax is paid by each tax bracket.

Income size is encoded with plain text for each tax bracket. Percentage of taxable income is encoded with text and also with a horizontal bar shaded with hash marks over the corresponding percentage. Income levels are encoded with different shades of green, lightest for the lowest income and darkest for the highest. The size of each tax bracket is encoded in the height of its green block as well as with percentages. The percentage of income tax paid is encoded in a pie chart to the right of the income blocks, with text in each slice giving the exact value and shading that matches its green counterpart. Each income block is connected to its share of total income tax with lines that do not intersect.

Ordering of the tax brackets is encoded with both position and lightness. The highest earners are at the top and the lowest at the bottom, with the bottom being the lightest and the top the darkest. According to Munzner, position is the best encoding for all data types; for ordered data, lightness is the second-best encoding. The second-best encoding for quantitative data is length, which is used for percentage of taxable income and, to some extent, percentage of income tax paid. Area is used extensively in both the table and the pie chart. Although area is not one of the best encodings for any of the data types, it does allow the viewer to relate the sizes of two data items.

This visualization makes it very easy to answer the question “What percentage of total income tax comes from which tax bracket?” It is clear that even though only 1.8% of the income of the highest tax bracket is taxable, that bracket still pays the majority of the income tax collected. This is important because saying that only 1.8% of income is taxable for those making more than $500,000 per year could be very misleading.

This is also a good example of removing chart junk. No stray tidbits of information clutter the valuable information. There is a proper title and narrative at the top and a signature at the bottom; everything else is information relevant to the visualization.

Bad

Bad visualization of 3D space

Edward J. Haug. Computer Aided Kinematics and Dynamics of Mechanical Systems. Allyn and Bacon, 1989.

Critique

Visualizations of three-dimensional spaces appear in almost every graphics and vector-math textbook. The image above is just one example of how such a visualization is attempted. It tries to show a local reference frame inside a global reference frame and to depict points in each. This visualization method fails because it often provides no way of determining the “true” orientations. Such visualizations are often an afterthought to a larger, more thorough narrative; however, even with a good explanation, the visual can be misleading. The main issue is that an entire dimension of data is thrown away in order to present the data in two-dimensional form.

There are a number of ambiguities. In which direction does the x axis of the global reference frame point? Are z’ and x’ pointing toward or away from the viewer? Where is R? With a little extra information, including knowledge of left- or right-handed conventions, these questions become somewhat easier to answer; however, a number of cues could be added to help extract useful information from the image.
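The lost dimension is easy to demonstrate. The sketch below uses a plain orthographic projection onto the x-y picture plane (an assumption for illustration; the textbook figure may use a different, e.g. oblique, projection): two points that differ only in depth land on the same spot in the drawing, which is exactly why the viewer cannot tell whether z’ points toward or away from them. A perspective projection, with a hypothetical viewing distance `d`, is one cue that separates them again.

```python
def orthographic(point):
    """Drop the depth coordinate: (x, y, z) -> (x, y)."""
    x, y, _z = point
    return (x, y)


def perspective(point, d=10.0):
    """Pinhole projection with the eye at distance d in front of the z = 0 plane.

    Points with larger z are farther away and shrink toward the center.
    Assumes z > -d (the point is in front of the eye).
    """
    x, y, z = point
    scale = d / (d + z)
    return (x * scale, y * scale)


toward_viewer = (1.0, 2.0, -3.0)
away_from_viewer = (1.0, 2.0, 3.0)

print(orthographic(toward_viewer), orthographic(away_from_viewer))  # identical
print(perspective(toward_viewer), perspective(away_from_viewer))    # different
```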


Bank Graph

February 10, 2010

in Critiques,Student Posts

This graph compares the market values of various banks at two different times: once in 2007, and again in early January 2009, after the crash.

This graph is very misleading. Shape is probably not the best way to display the data, especially when only two values are being compared for each bank. If the goal is also to compare different banks, they should be ordered more usefully, for example by original market value (the blue circles) or by later market value (the green circles), depending on what you want to show or compare.

One major problem with this image is that I quantify and compare areas much more readily than diameters, yet the circle sizes are determined by diameter. This is made worse by the fact that no diameter lines are drawn in to compare the two. For example, Goldman Sachs, which dropped from 100 to 35, does not immediately appear to be worth about a third of its original value. Looking at the graph, my first impression is that it is worth a much smaller fraction. This is a very misleading image.
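The arithmetic behind this complaint is worth spelling out. If, as described above, circle diameter is proportional to value, then area grows with the square of the value, so Goldman Sachs’s drop from 100 to 35 shrinks the circle’s area to 0.35² ≈ 0.12 of the original, which is what the eye tends to read. The sketch below (hypothetical helpers, not taken from the original graphic) shows that ratio and the usual fix: scale the diameter by the square root of the value so that area, not diameter, is proportional to the data.

```python
import math


def apparent_area_ratio(new_value, old_value):
    """Area ratio the viewer sees when circle diameter is proportional to value."""
    return (new_value / old_value) ** 2


def diameter_for_value(value, reference_value, reference_diameter):
    """Diameter that makes circle *area* proportional to value."""
    return reference_diameter * math.sqrt(value / reference_value)


print(apparent_area_ratio(35, 100))        # about 0.12, not 0.35
print(diameter_for_value(35, 100, 10.0))   # about 5.9 instead of 3.5
```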

I think this is an elegant way of displaying connections within a huge dataset. To explain: this is a haplotype map with SNPs (single nucleotide polymorphisms) along the top (where it is feathered). The researchers study the genes of a population to determine the probability that SNPs occur together (called linkage disequilibrium). If you take two white squares at the top and draw lines parallel to the sides of the triangle until they intersect, you reach a square that represents that probability. A higher probability (closer to 1) is colored red. A probability that approaches random (.5) is white, with blue being slightly better than random and light red lying between red and blue.

This means that triangles of mostly red, which are marked off in these plots, are those SNPs which are more likely to be passed on together during chromosome crossover. In the Caucasian sample, there are many more of these than in the Yoruba sample, which implies that any mutation that added something new to the gene pool happened more recently for Caucasians than Yoruba, as more crossovers (more generations) means less linkage disequilibrium.

These plots show a 100 kb region of chromosome 8p23.1.

HapMap

The intended audience is people with an understanding of biology and the human genome. It lets you see, at a glance, which genes or SNPs are connected. It encodes probability with color and uses position to fill in the matrix connecting two values. Positions along the top of the triangle are mapped to positions on the chromosome itself. I think it is good because you can clearly see the connections between different SNPs and the regions that are likely to stick together during the recombination required to create gametes. It is also simple to glance between HapMaps of the same chromosome in different populations and see which has had the most recent mutation (and therefore less time to become randomized in the population).
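To make the triangular layout concrete: every cell corresponds to one pair of SNPs (i, j), so filling the triangle means computing one linkage-disequilibrium value per pair. The sketch below uses r², one common LD statistic, computed from phased haplotypes coded as 0/1 alleles; this is an assumption for illustration, and the HapMap plots may use a different measure (such as D′) and color scale.

```python
from itertools import combinations


def r_squared(alleles_a, alleles_b):
    """r^2 between two biallelic SNPs from phased haplotypes (alleles coded 0/1)."""
    n = len(alleles_a)
    p_a = sum(alleles_a) / n
    p_b = sum(alleles_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for x, y in zip(alleles_a, alleles_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNP: no information


def ld_triangle(haplotypes):
    """Map each SNP pair (i, j) with i < j to its r^2 value.

    haplotypes: one row per sampled chromosome, one 0/1 entry per SNP.
    """
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    return {
        (i, j): r_squared(columns[i], columns[j])
        for i, j in combinations(range(n_snps), 2)
    }


sample = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
print(ld_triangle(sample))
```

Each (i, j) key corresponds to the cell reached by following the two diagonals down from SNPs i and j; mapping the values to a color ramp gives the kind of plot described above.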

The Good

February 10, 2010


The Cost of Care

http://blogs.ngm.com/blog_central/2009/12/the-cost-of-care.html

This visualization compares the cost of healthcare per person with average life expectancy for various developed countries. The following text is included in the article linking to the actual image:

“The United States spends more on medical care per person than any country, yet life expectancy is shorter than in most other developed nations and many developing ones. Lack of health insurance is a factor in life span and contributes to an estimated 45,000 deaths a year. Why the high cost? The U.S. has a fee-for-service system—paying medical providers piecemeal for appointments, surgery, and the like. That can lead to unneeded treatment that doesn’t reliably improve a patient’s health. Says Gerard Anderson, a professor at Johns Hopkins Bloomberg School of Public Health who studies health insurance worldwide, “More care does not necessarily mean better care.”  —Michelle Andrews

This visualization encodes four dimensions of data in the following ways:

– Cost of healthcare per person: y position

– Average life expectancy: y position

– Average number of doctor’s visits per person: line thickness

– Type of coverage (universal or otherwise): hue

What Munzner might say:

Cost and life expectancy clearly fall into the quantitative data category and are encoded using position, the strongest visual channel for their data type. Type of coverage is categorical and is encoded using hue, the second-strongest visual channel for its data type (after position, which has already been used). All the visual channels are separable and encode these four dimensions without confusion. Also, cost and life expectancy are connected by lines, so their relationship is encoded by line slope. Relating this data clearly and explicitly makes the US pop out as the country with the steepest downward slope.

What Tufte might say:

First, this graphic is well documented. The creator, his position, the data source, the year the data was collected, the fact that some countries aren’t shown, and the scales for all the numeric data are all clearly written on the image. The lines connecting cost and life expectancy facilitate clear comparisons of all the data.

Edit: Here’s an interesting article where the creator justifies his design choice over a scatterplot: http://blogs.ngm.com/blog_central/2010/01/the-other-health-care-debate-lines-vs-scatterplot.html

The Bad

February 10, 2010


Trends and Technology Timeline 2010+: A roadmap for the exploration of current and future trends

http://nowandnext.com/PDF/trends_and_technology_timeline_2010.pdf

This visualization denotes current trends as well as predictions for future trends, and displays them in a way that is analogous to a subway map. This text is included on the map:

“This map is a broad representation of some of the trends and technologies currently visible. Improvement works are carried out at weekends and travellers should check to see whether lines are still operable before commencing any journeys. Helpful suggestions concerning new routes and excursions are always welcome.”

This visualization encodes five dimensions of data in the following ways:

– “time zones”: radial distance from the center of the map, hue

– phenomena: text labels, position on category lines, connection?

– category of phenomena: hue

– type of phenomena: shape, glyphs

– global risks: bulleted list, containment?

What Munzner might say:

First of all, this “roadmap” doesn’t even use the strongest visual channels: absolute x and y position. Then it uses hue both to distinguish different time zones (even though this is ordered data, for which saturation would be more appropriate) AND to distinguish different categories. And there are 16 different colors for the different categories, even though no more than eight colors should be used.

What Tufte might say:

What is meant by “time zones”? The common meaning of the term is very different from the one it has in this visualization. Also, trends appear on category lines in a particular order. Is there any logic behind this order? Does it imply causality? Finally, the extremely dense text doesn’t make very judicious use of ink.