Demo #

Demo website #

Click here to open the online demo in a new browser tab. You can also try the use cases listed below in our system. The dataset and key views for each use case have been included.

Public Datasets #

Dataset Name	Description	Related Use case	Demo url
Imputation	The Mushroom dataset that considers a testing set of 2,000 (of 8,124) randomly selected instances.	Model Selection	demo
IMDB Confidence	The data consists of 5044 movies with 27 features, however, 25% is sequestered for final assessment. A stratified sampling of 200 movies per class is held out from the 3756 for testing.	Model Selection and Tuning	demo
recid	This dataset is used for fair learning: the Broward County recidivism dataset, popularized by ProPublica. The data set contains 6,172 instances and 14 numeric features (created by one-hot encoding the categorical features in the initial seven feature data set). 20% are held for testing.	Fairness Assessment	demo
date-12000-strat	The dataset is the TCP collection of historical documents. It took a random sample of 12,000 documents, and held out 30% using stratified sampling. While the testing set is balanced (1,800 per class), the training set is highly skewed (only 15% before 1642)	Bias and Data Discovery	demo
fuzz-mod-5-02	The data set is a collection of 554 plays written in the Early Modern Period (1470-1660). Five linguistic features are used. It contains four kinds of plays : Comedy, History, Tragedy and Tragicomedy.	Feature Sensitivity Testing	demo
tcp-tree-select-9-10	This dataset considers a corpus of 59,989 documents from a historical literary collection and the data counts the 500 most common English words in each document.	Model Selection and Data Discovery	demo
(continuous) heart disease	The dataset is a standard data set used in machine learning education. Classifiers are trained to predict if a patient is likely to develop a disease (binary decision).	(Continuous) Model Selection and Calibration Analysis	demo
(continuous) wine quality	The dataset is used for wine quality classification, which requires classifying the quality of a wine from its properties.	(Continuous) Hyper parameter Tuning	demo
(continuous) income	The dataset comes from income classification benchmark dataset from that has been downsampled. Classifiers determine whether an individual’s income is above a certain level.	(Continuous) Model Selection	demo
(continuous) cifar-sampled-scaling	The datset is created based on CIFAR 100 computer vision benchmark using Tensorflow. The data set has 100 classes, and the trained classifier produces a distribution over these classes as its decision.A binary classifier has been created for a “meta-class” which combines 5 of the main classes. This datasets aims to classify flowers, which can be any one of 5 of the original classes. Because the test set contains all 100 classes, it is quite imbalanced: flowers are only 5% of the total instances	(Continuous) Model Selection and Detail Examination	demo
(continuous) cdate-2500	This dataset considers a corpus of 59,989 documents from a historical literary collection: Text Creation Partnership (TCP) transcriptions of the Early English Books Online (EEBO). The data counts the 500 most common English words in each document. For the experiment, we took a random sample of 2500 documents, and held out 30% using stratified sampling.	(Continuous) Data Examination	demo