Introduction

Project Web Page: III: EAGER: Visual Comparison of Machine Learning Outcomes

The goal of this project is to develop better tools for working with Machine Learning systems. Our approach has two core ideas:

  1. We treat the learning systems as black boxes - our tools only look at their inputs, outputs, and metadata. This allows us to build tools that are agnostic to the learning methods, that let users work with (familiar) data rather than internal representations, and that consider performance in meaningful terms. Our approach can complement tools for looking “inside” the black boxes.
  2. We focus on comparison between different classifiers, both because many tasks (such as model selection) involve comparison, and because often the best way to understand something complicated is to compare it with something else.

As a simple example, consider comparing two (or more) classifiers. Typically, one would run each over a testing set, summarize the performance using some metric (e.g., accuracy, F1), and pick the one with the higher score. Our premise is that by looking carefully at this experiment - that is, examining which items each classifier got right or wrong - we can gain better insights into the classifiers. This requires us to develop new tools for examining classifier result data (collections of input/output pairs), as well as potentially developing new strategies for choosing the testing examples such that examining them is more informative.
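To make the premise concrete, the following is a minimal sketch (not any of our project's actual tools) of going beyond a single summary score: tallying, item by item, which of two classifiers got each test example right. The function name and toy data are illustrative only.

```python
from collections import Counter

def compare_outcomes(y_true, pred_a, pred_b):
    """Tally per-item outcomes for two classifiers on the same test set.

    Keys are (A correct?, B correct?) pairs, so the four counts break the
    test set into: both right, only A right, only B right, both wrong.
    """
    counts = Counter()
    for truth, a, b in zip(y_true, pred_a, pred_b):
        counts[(a == truth, b == truth)] += 1
    return counts

# Toy example: two classifiers with identical accuracy (3/5)
# but quite different per-item behavior.
y_true = [1, 0, 1, 1, 0]
pred_a = [1, 0, 1, 0, 1]
pred_b = [0, 1, 1, 1, 0]
counts = compare_outcomes(y_true, pred_a, pred_b)
print(counts[(True, False)])   # items only A got right -> 2
print(counts[(False, True)])   # items only B got right -> 2
print(counts[(True, True)])    # items both got right   -> 1
```

A summary metric would call these two classifiers equivalent; the per-item breakdown shows they succeed on largely disjoint subsets of the data, which is exactly the kind of insight this comparison-centered approach aims to surface.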

Our project initially considers classifiers, but we look forward to exploring other types of machine learning problems (e.g., recommender systems, reinforcement-learned reactive policies, predictive regression).

Projects

This overall project is related to a number of specific technical projects. Note: the technical projects often involve support from a number of sources.

Boxer: Comparison of Discrete Choice Classifiers: The Boxer system is designed to help users compare discrete choice classifiers. It helps users choose appropriate metrics, identify subsets of the testing data to focus on, assess performance over data subsets, and identify instances of interest. We have applied it to tasks including metric selection, model selection, model tuning, and data quality assessment.

CellOViewer: Examination of Cell Ontology Classifiers: CellOViewer is a specialized tool for looking at the results of experiments to build classifiers that label cell types based on genetic information (RNA-Seq data, to be specific). The data comprises a classifier for each cell type in a Cell Ontology, which determines whether an observed gene expression is likely to come from a cell of that type. CellOViewer enables viewers to consider a large set of classifiers to find patterns in which cell types are correlated with which genes (and vice versa).

EmbComp: Comparison of Embeddings: EmbComp is a tool for pairwise comparison of embeddings. It is general: it has been used for applications that embed many different kinds of objects (words, documents, graph nodes, etc.) into high-dimensional vector spaces. It focuses on comparing the distance relationships between different embeddings, rather than the specific values of particular embeddings. For example, it allows understanding whether objects have similar neighbors in different embeddings.
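The core idea of comparing neighbor structure rather than coordinate values can be sketched as follows. This is a minimal illustration, not EmbComp's implementation; the function names and the brute-force distance computation are illustrative assumptions.

```python
import numpy as np

def knn_sets(emb, k):
    """For each row of emb (one object per row), return the set of indices
    of its k nearest neighbors by Euclidean distance (excluding itself)."""
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    order = np.argsort(d, axis=1)[:, :k]
    return [set(row) for row in order]

def neighbor_overlap(emb_a, emb_b, k=2):
    """Mean Jaccard overlap of each object's k-NN set in two embeddings
    of the same objects. 1.0 means identical neighborhoods everywhere."""
    na, nb = knn_sets(emb_a, k), knn_sets(emb_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(na, nb)]
    return float(np.mean(scores))

# Usage: four objects embedded in 2-D; comparing an embedding with a
# uniformly scaled copy of itself preserves all neighbor relationships.
emb = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
print(neighbor_overlap(emb, 3 * emb, k=2))   # -> 1.0
```

Note that the two embeddings' coordinates differ, yet the neighbor-based score reports them as equivalent; this is the sense in which comparing distance relationships is more robust than comparing specific embedding values.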

Project Products and Dissemination

Publications

All publications from the UW graphics group should be available from The Group’s Papers Page.

Online Demos

Several of our projects have online demos. See the descriptions above.

Videos

Research videos are generally available from the project websites or Gleicher’s Video Page.

Talks

Talks related to projects are linked from the specific project web pages. Slides from PI Michael Gleicher’s talks are also available at Gleicher’s Talks List.

Highlights related to this NSF project:

  • What Shakespeare Taught Us About Visualization and Data Science - Invited talk at the University of Arizona TRIPODS Seminar, January 2019
  • Interpreting Embeddings with Comparison - Invited talk at the University of Arizona CS Department Seminar, January 2019

Educational Resources

We continue to develop a graduate-level visualization class designed to serve both CS graduate students and others from around the University. The class focuses on design and principles rather than implementation details. Ideas from the project are connected to the class: we use our research projects as examples and case studies, and the principles developed in the projects are discussed in class.

The course web page provides most of the materials about class operation and content.

NSF Award Information

Award Title: III: EAGER: Visual Comparison of Machine Learning Outcomes
NSF Award Number: 1841349
Official NSF Award Page: link
Duration: January 1, 2019 - December 31, 2020 (two years, plus extensions)
Award Amount: $169,964.00
PI: Michael Gleicher

Original Abstract:

Abstract

This project will explore a new approach to the challenging problems of enabling people to work effectively with machine learning systems. The project will develop new tools that will assist users with a wide range of tasks, including building and tuning models, assessing and diagnosing them to build appropriate levels of trust, and using the learned models to gain insight on the data. In order to meet user needs in the face of increasingly large and complex systems, the project will develop a new approach to interacting with learning systems: providing interactive visualization tools that enable exploration and comparison in sets of model outputs. The project will explore tools for exploring learning experiment results, comparing these results with tests of other models, and helping run specialized tests. These new tools will have broad impact as they will enable a wide range of users to more effectively work with machine learning systems.

This project will explore the viability of the outcome comparison approach by developing it in the context of two applications. In one, we will develop tools for identifying causal insights from complex models. This will allow us to show that the approach scales to state-of-the-art model types and complex experiment designs. A second application will develop tools for diagnosing and interpreting embeddings from text corpora. This application will allow us to demonstrate the approach on complex data types and significant scales. The project is exploratory: there is no evidence that the approach can scale to such challenging scenarios, and the project must explore experiment strategies and visual designs that support large and complex scenarios. Products of the project, including publications, open source software, demonstrations and tutorials will be available from the project website.

Project Participants and Collaborators

  • PI: Michael Gleicher
  • Supported Students: (none yet)
  • UW Graphics Group Contributors: (group members involved in projects, but not (yet) supported by this project) Florian Heimerl (post-doc), Adiya Barve (Graduate Student, RA and TA), Xinyi Yu (Graduate Student, RA and TA), Ainur Ainabekova (Graduate Student, TA), Ruoyu He (Undergraduate Student, directed study and hourly), Jeff Ma (Undergraduate Student, hourly)
  • UW Domain Collaborators: Colin Dewey (faculty, Department of Biostatistics and Medical Informatics), Matt Bernstein

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. 1841349.

The work in this project is also supported in part by other sponsors, including DARPA and the Chan Zuckerberg Foundation.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or our other sponsors.

Last Modified Monday, April 24, 2023 by gleicher (office)
Point of contact: Michael Gleicher