Readings 07: High-Dimensional Data

Last week, we focused on scaling in the number of items. This week, we’ll talk about what to do when we have too many dimensions.

Unfortunately, we can’t discuss the mathematics and algorithms of dimensionality reduction in class. Which is too bad, since its useful and important and (in my mind) interesting. There are enough other classes that discuss it.

  1. (required) High-Dimensional Visualizations. Georges Grinstein, Marjan Trutschl, Urska Cvek. (semantic scholar) (link1)

    This is an old (Circa 2001) paper that I am not sure was actually published at KDD. However, it is a great gallery of old methods for doing “High-Dimensional” (mid-dimensional by modern standards) visualizations. Most of these ideas did not stand the test of time - but it’s amusing to look through the old gallery to get a sense of what people were trying.

  2. (required) The Beginner’s Guide to Dimensionality Reduction, by By: Matthew Conlen and Fred Hohman. An Idyll interactive workbook.

    This is a very basic demonstration of the basic concepts of dimensionality reduction. It doesn’t say much about the “real” algorithms, but you should get a rough idea if you haven’t already.

  3. (required) How to Use T-SNE Effectively

    I wanted to give you a good foundation on dimensionality reduction. This isn’t it. But… it will make you appreciate why you need to be careful with dimensionality reduction (especially fancy kinds of it).

I was going to suggest some optional readings for those of you who want to learn more about dimensionality reduction. There is a lot of great stuff the is visualization specific: techniques for using dimensionality reduction, approaches for user-controlled (supervised) dimensionality reduction, ways to visualize and interpret dimensionality reductions, … But there’s so much I don’t know where to start. If there is some topic that is interesting to you, make a posting on Piazza and I’ll give a recommendation on where to start.