User Guide
We are in the process of developing a user guide to help people use AbstractsViewer.
Jump to:
Load dataset
Find target documents
Find similar documents
Using the heatmap
Find important words
Change vector spaces
Radial Plot
Click the ‘submit’ button without entering anything to view the list of datasets we have.
Choose the dataset that you wish to explore by clicking on it. In this tutorial, we use the ‘VIS CONFERENCE PAPERS’ dataset as an example.
After choosing the dataset, you will be redirected to the following interface.
The scatterplot on the left presents a 2D “map” of all documents. Each document is represented as a dot in the scatterplot, and the dots are shown as a density map. Selecting a region by clicking on one of the boxes in the scatterplot shows a detailed view of the documents within that region; documents that are neither search results nor similar documents are shown in grey.
This will also open up a heatmap below it, with the documents within the region along the top and the salient words within those documents on the left. This helps you understand why these particular documents are grouped together.
There is also a list of all the papers within that region, which can be sorted in different ways.
The table on the right is the searchlist. Without a search, it shows the default list of documents within the database. You can search for target documents by typing key words in the upper-right search panel.
You can search with two options:
- Keyword: searches for abstracts across the database that contain the search terms.
- Neighbors: generates an abstract using the key terms and finds neighboring papers to it.
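As a rough illustration of the Keyword option, the sketch below matches abstracts that contain every search term. It is a minimal sketch under stated assumptions, not AbstractsViewer's actual search code: the function names, the tokenization, and the all-terms-must-match rule are assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(abstracts, query):
    """Return indices of abstracts that contain every query term (assumed rule)."""
    terms = set(tokenize(query))
    if not terms:
        return []
    return [i for i, text in enumerate(abstracts) if terms <= set(tokenize(text))]

# Hypothetical sample abstracts, for illustration only.
abstracts = [
    "Visual analytics of text corpora",
    "Volume rendering on the GPU",
    "Interactive text visualization",
]
matches = keyword_search(abstracts, "text")
print(matches)  # [0, 2]
```

Matching whole tokens rather than substrings keeps a query such as “text” from also matching “texture”; the real tool may behave differently.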
A list of search results will appear in the table, and the results are also shown in the scatterplot as green dots.
In this tutorial, we searched for the VIS papers that have the key word “text” using keyword search. The papers containing “text” are shown in both the table and the scatterplot.
You can also select a document to view its details (title, key words, abstracts, etc). The search terms are highlighted in green. Clicking “yellow highlight on” highlights the single document’s salient words. Clicking “favorited” adds this document to the favorites list, which can be accessed using the button in the top right corner.
Several resources can be used to search for documents and evaluate their similarity. The left side is primarily used for this: it has a scroll bar for moving to whichever view is most useful at the moment, along with collapsible views of helpful comparison graphics. When you are done with the search view, you can collapse it for later use and expand the corpus map and word matrix.
After selecting a document, a list of neighboring documents will appear below the document view. You can also find the neighboring documents in the left column in the scatterplot, shown in orange.
In the screenshot below, the corpus map and word matrix have been expanded, and the right side with the expanded documents is scrolled down to directly compare similar documents to the graphic tools.
Notice the list of neighboring documents has two columns. These are two different vector spaces. Use the drop-down to adjust the vector space used, and then compare directly using the side-by-side columns.
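To make the two-column idea concrete, here is a minimal sketch of how a neighbor list could be computed in two different vector spaces. It assumes cosine distance as the similarity measure, which this guide does not confirm; the function names and the sample vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def nearest_neighbors(vectors, selected, k):
    """Return the k document indices closest to the selected document."""
    others = [i for i in range(len(vectors)) if i != selected]
    others.sort(key=lambda i: cosine_distance(vectors[selected], vectors[i]))
    return others[:k]

# Two hypothetical vector spaces describing the same four documents.
space_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
space_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2], [0.1, 0.9]]

left_column = nearest_neighbors(space_a, selected=0, k=2)
right_column = nearest_neighbors(space_b, selected=0, k=2)
print(left_column, right_column)  # [1, 3] [2, 3]
```

The same selected document gets different neighbors in each space, which is why comparing the columns side by side is informative.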
Compare documents and vector spaces using these columns as well. Click on a similar document from the recommendations or in the heatmap and it will appear to the right of the original paper. To switch back and forth between comparisons on multiple papers, select multiple and use the arrows at the top of the screen to go back and forth between documents.
The stars next to each of the neighboring documents are color coded in alignment with the color coding for the rest of the system. This is as follows:
Green: Contains the search term
Yellow: Similar document of the first selection’s left vector space
Pink: Similar document of the second selection’s left vector space
If it is the first selected document, the background color will be yellow, and if it is the second selected document, the background color will be pink.
This color coding is the same as what is used in the corpus map and scatterplot, which has a provided legend.
Words in the selected document view are also highlighted in color to help you understand why two papers are similar.
Green: a search term
Yellow: a word that has a large influence in the similarity between the selected documents.
Light yellow: a word that has a smaller influence in the similarity between the selected documents.
This highlighting can be toggled on and off by clicking “yellow highlight on”.
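One plausible way to decide which words have a large or small influence is to look at each shared word's contribution to the dot product of the two documents' word weights. The sketch below illustrates that idea only; the scoring rule, the `strong_ratio` cutoff, and the sample weights are assumptions, not the method the tool is known to use.

```python
def highlight_levels(weights_a, weights_b, strong_ratio=0.5):
    """Label shared words by their contribution to the dot product.

    weights_a and weights_b map words to weights (for example Tf-idf values).
    strong_ratio is a hypothetical cutoff relative to the top contribution.
    """
    contributions = {
        word: weights_a[word] * weights_b[word]
        for word in weights_a.keys() & weights_b.keys()
        if weights_a[word] * weights_b[word] > 0
    }
    if not contributions:
        return {}
    top = max(contributions.values())
    return {
        word: "yellow" if value >= strong_ratio * top else "light yellow"
        for word, value in contributions.items()
    }

# Hypothetical word weights for two selected documents.
doc_a = {"text": 0.8, "topic": 0.5, "model": 0.1, "graph": 0.4}
doc_b = {"text": 0.7, "topic": 0.4, "model": 0.2, "volume": 0.6}
levels = highlight_levels(doc_a, doc_b)
for word in sorted(levels):
    print(word, levels[word])
```

Words that appear in only one of the two documents (“graph”, “volume”) contribute nothing to the dot product, so they receive no highlight in this sketch.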
Documents in the scatterplot are color coded according to the legend provided on the website and as described above. Documents that fall under multiple categories are split half and half. The left selected document is identified with a yellow star, and the right selected document is identified with a pink star.
In the image below, the corpus map shows the similar documents displayed below the selected documents.
When many documents are in a small region, it may be difficult to see or click on the individual documents. To view what words define that region at a glance, simply hover over it.
To expand a region and look at its scatterplot, as well as see documents that are not featured in the corpus map, click on a region, or box, on the corpus map. Then, click “show/collapse scatterplot” and scroll down so it’s in view.
4. Word and Regional Matrices and Lists
Example images of these matrices in use are shown above.
Once a document is selected, the word matrix for that document will appear on the left under the tab “show/collapse word matrix”. This matrix has a column for each document that is selected and its neighbors and a row for each salient word. The color indicates the number of occurrences of that salient word in the document. This functionality is also in the region view.
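The structure of the word matrix can be sketched as a small table of counts, with one row per salient word and one column per document. This is an illustrative sketch only: the function name, the tokenization, and the sample documents are hypothetical, and how the tool picks salient words is not shown here.

```python
import re
from collections import Counter

def word_matrix(documents, salient_words):
    """Return one row per salient word and one column per document."""
    counts = [Counter(re.findall(r"[a-z0-9]+", doc.lower())) for doc in documents]
    return [[c[word] for c in counts] for word in salient_words]

# Hypothetical selected document followed by two neighbors.
documents = [
    "Text mining for text corpora",
    "Topic models for text",
    "Graph layout and graph drawing",
]
salient_words = ["text", "graph", "topic"]
matrix = word_matrix(documents, salient_words)
for word, row in zip(salient_words, matrix):
    print(f"{word:>6}", row)
```

Each cell holds the number of occurrences of the word in that document, which is the value the heatmap color would encode.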
To view the matrix for a specific region of the corpus map, simply click on that region and press “show/collapse regional matrix”. Sort the matrix words by choosing from a list of sorting criteria.
You can also reorder the list of documents by using the regional list, which supports systematic exploration of the region in list view. You can sort it using several sorting metrics. Changing the sorting metric here and clicking the button “Sort document by regional list” also changes the ordering of documents in the regional matrix. The example below shows the list sorted by search results first. As you can see, search results, highlighted in green, appear at the top.
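A “search results first” ordering can be sketched as a sort on a two-part key, and the resulting order can then be reused for the matrix columns. The field names and sample entries below are hypothetical, and breaking ties by title is an assumption.

```python
def sort_search_results_first(documents):
    """Order documents so search hits come first, then alphabetically by title."""
    return sorted(documents, key=lambda d: (not d["is_search_result"], d["title"]))

# Hypothetical regional list entries; the field names are illustrative.
regional_list = [
    {"title": "Volume rendering", "is_search_result": False},
    {"title": "Text streams", "is_search_result": True},
    {"title": "Graph layout", "is_search_result": False},
    {"title": "Document cards", "is_search_result": True},
]
ordered = sort_search_results_first(regional_list)
print([d["title"] for d in ordered])

# The same order, expressed as original positions, could reorder matrix columns.
column_order = [regional_list.index(d) for d in ordered]
print(column_order)  # [3, 1, 2, 0]
```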
Use the dropdown to change the vector spaces used to find similar works. The default is Tf-idf on the left and Tf-idf(2D) on the right, but there are several other vector spaces to pick from. The heatmap uses the left vector space, so its default is Tf-idf. Tf-idf(2D) is the default vector space for the scatterplot, and this can be changed right above the scatterplot. Here is a brief description to help you choose which vector space to use.
6. View document similarity using the Radial Plot
Each document is represented as a circular point, color coded in the same way as the rest of the system. Documents closer to the center point are more similar to it according to the chosen vector space. Each document is also given a similarity score, which you can see by hovering over its point. The smaller the score, the closer the neighbor. Advanced settings such as number of rings, type of rings, number of points, and embedding are available. There is also a setting called “pen color”, which lets you single out documents in the radial plot by coloring them; this in turn also colors them on the corpus map. By default the pen color is null, meaning that clicking on a document will not change its color, but other colors are available as well.
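The relationship between score and position can be sketched as follows: the score sets the distance from the center, so smaller scores land on inner rings. This is a minimal sketch, assuming linear scaling and evenly spaced angles; `radial_layout`, its parameters, and the sample scores are hypothetical and do not describe the tool's actual layout code.

```python
import math

def radial_layout(scores, num_rings=3, max_radius=1.0):
    """Place neighbors around a center point; smaller scores sit closer in.

    scores maps a document id to its distance from the center document.
    Returns {doc_id: (x, y, ring)} with ring 1 being the innermost ring.
    """
    if not scores:
        return {}
    largest = max(scores.values()) or 1.0
    ordered = sorted(scores, key=scores.get)
    layout = {}
    for position, doc_id in enumerate(ordered):
        fraction = scores[doc_id] / largest          # 0 at the center, 1 at the edge
        angle = 2 * math.pi * position / len(ordered)
        ring = min(num_rings, max(1, math.ceil(num_rings * fraction)))
        radius = max_radius * fraction
        layout[doc_id] = (radius * math.cos(angle), radius * math.sin(angle), ring)
    return layout

# Hypothetical distance scores for three neighbors.
scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}
layout = radial_layout(scores)
for doc_id, (x, y, ring) in layout.items():
    print(doc_id, round(x, 2), round(y, 2), ring)
```

With these sample scores, doc_a falls on ring 1, doc_c on ring 2, and doc_b on ring 3.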
Tf-idf:
Tf-idf stands for term frequency-inverse document frequency. It evaluates how relevant a word is to a document within a collection of documents. It does this by considering two metrics: the frequency with which a word appears in the document and the inverse document frequency (IDF) of the word across the set of documents. IDF measures how unique the word is to this specific document in comparison to the others. For example, under the common definition IDF = log(N / n), where N is the number of documents and n is the number of documents containing the term, a term that appears in every document has an IDF of 0. The fewer documents the term appears in, the larger its IDF. This helps to make sure terms like “the” that may appear very frequently are not given a high value.
Tf-idf multiplies the word frequency and the IDF. So, it determines that words are more important if they appear more often, but realizes they might not be super relevant if they appear in many documents.
It’s useful in understanding what a document is about and also for retrieving key words.
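The multiplication described above can be sketched in a few lines. This example uses one common variant (term count divided by document length, times log(N / n)); the exact weighting used inside AbstractsViewer may differ, and the function name and sample corpus are hypothetical.

```python
import math
import re
from collections import Counter

def tf_idf(documents):
    """Return one {word: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df), one common
    variant; the weighting inside AbstractsViewer may differ.
    """
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical three-document corpus.
corpus = [
    "the text view shows the text",
    "the graph view",
    "the volume renderer",
]
weights = tf_idf(corpus)
print(weights[0]["the"])             # 0.0, because "the" is in every document
print(round(weights[0]["text"], 3))  # 0.366
```

Even though “the” is as frequent as “text” in the first document, its weight is zero because it appears in every document, while “text” scores highest because it is both frequent and unique to that document.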
Tf-idf(2D):
This is the same as Tf-idf, but projected into a 2D space. This can be useful to see how the scatterplot interprets the similar papers.
Learn more about Tf-idf here:
Tf-idf
Inverse document frequency
Sentence Encoder: The Universal Sentence Encoder encodes text as high-dimensional vectors. Encoding individual words as vectors does a good job of measuring how related two words are by the distance between them, but it fails to capture the context of the whole sentence. The sentence encoder addresses this.
It’s useful in text classification, semantic similarity, and clustering.
Sentence Encoder(2D): Again, these are the same results as the sentence encoder above, projected onto a 2D space. This is how the scatterplot would show the sentence encoder vector space.
Learn more about the Sentence Encoder: Tf-idf
Understanding why things are happening/interpretability
One of the key aspects of AbstractsViewer is its interpretability. This means it is easy to understand what is actually happening to make the computer make the choices it does.
- Interpreting the scatterplot/document map: To uncover how the document map decides on where documents land, click on regions. From this view, you can see what words are similar between a lot of the documents that have also landed in that spot.
- Interpreting why two documents are determined similar: Click on the two documents to compare them side-by-side. From here, you can see the highlighted words in the abstracts between the two. The yellow words highlighted are words that are “important” in both documents, and make them more similar.
- Understanding neighboring documents: To understand why an entire list of documents is similar to a selected document, look at the similar documents heatmap. From here, you can view what words are present in the selected document vs what words are present in the list of similar documents to understand what they have in common.