Walkthrough
This example takes you through a use-case scenario that demonstrates several of the features available in AbstractsViewer. It walks through an exploration of the system and shows how its interpretability features can be used to understand how and why the system makes the decisions it does.
Narrowing the search
The first step is to type a search term to narrow down the results. Here, we type “text” in the search bar and use the keyword search. To sort by the most recent documents, we click the “year” header to change the ordering.
To select a paper, simply click on it in the search results. We can easily see why this paper matched the search, because “text” is highlighted in green in the selected documents view.
To focus more on the visualization tools now that a paper is selected, click the “show/collapse search bar” button.
Quick Evaluation of Papers
To get a high-level overview of the paper's main concepts in seconds, click “show/collapse word matrix”. The words on the left side are the most relevant words of the document according to the currently selected vector space.
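The walkthrough does not say how AbstractsViewer builds its vector spaces. As a rough illustration only, assume a TF-IDF space over stemmed abstracts, where a stem scores highly when it is frequent in this document but rare across the corpus. The hypothetical function `tfidf_top_terms` below ranks a document's stems that way; it is a sketch under that assumption, not AbstractsViewer's code, and the toy corpus is invented.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, doc_index, k=5):
    """Return the k stems with the highest TF-IDF weight in docs[doc_index].

    Hypothetical sketch: assumes each document is a list of already-stemmed
    tokens and that relevance means plain TF-IDF (an assumption, not the
    tool's documented method).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each stem at least once.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    counts = Counter(docs[doc_index])
    total = len(docs[doc_index])
    weights = {}
    for term, c in counts.items():
        tf = c / total                      # term frequency within this document
        idf = math.log(n_docs / df[term])   # 0.0 for stems found in every document
        weights[term] = tf * idf

    # Highest weight first; ties broken alphabetically so the output is stable.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus of stemmed abstracts (invented for illustration).
docs = [
    ["news", "stori", "narr", "stori", "visual", "datadriven", "text"],
    ["layout", "graph", "visual", "text"],
    ["sequenc", "event", "narr", "stori", "visual"],
]
print(tfidf_top_terms(docs, 0, k=3))  # → ['datadriven', 'news', 'stori']
```

Note that “visual” gets a weight of zero here because it appears in every toy document, which is why a word matrix built this way favors distinctive stems over common ones.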
Here we can see that the word stems “news”, “stori”, “narr”, and “datadriven” are prominent in this document. The abstract confirms this: the paper appears to be about finding the best way to represent data-driven stories with visualizations and text, drawing many of these stories from news and media.
Next, look at the similar papers. Hovering over them shows why most were considered similar: they are largely about stories, narration, and news.
Understanding Odd Recommendations
However, it is unclear how one paper is related: “Promoting Insight: A Case Study of How to Incorporate Interaction in Existing Data Visualizations”.
Click the paper to open it as a second selected document. The words highlighted in yellow immediately show why it was judged to be a similar paper. Although this is not apparent from the title, the abstract talks at length about stories, narrative, news, media, and improving the visualization of data stories. This makes it very similar to the first selected document, a connection that would have been unclear from the title alone.
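One common way to explain a similarity score is to split it into per-term contributions. Assuming papers are compared by cosine similarity of term-weight vectors (the walkthrough does not say so), each shared term adds the product of its two weights divided by the product of the vector norms, and the shared terms with the largest contributions play the role of the yellow highlighted words. The function `cosine_with_contributions` and the example weights are hypothetical.

```python
import math


def cosine_with_contributions(vec_a, vec_b, k=3):
    """Cosine similarity of two sparse term-weight vectors, plus the k shared
    terms that contribute most to it (a stand-in for highlighted words).

    Hypothetical sketch: vectors are dicts mapping stem -> weight.
    """
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0, []

    shared = set(vec_a) & set(vec_b)
    # Each shared term adds a_t * b_t / (|a| * |b|); unshared terms add nothing.
    contrib = {t: vec_a[t] * vec_b[t] / (norm_a * norm_b) for t in shared}
    score = sum(contrib.values())
    # Largest contributions first; ties broken alphabetically.
    top = sorted(contrib, key=lambda t: (-contrib[t], t))[:k]
    return score, top


# Invented weights for the two selected papers.
first_paper = {"stori": 3.0, "narr": 2.0, "news": 1.0}
second_paper = {"stori": 2.0, "narr": 1.0, "interact": 2.0}
score, highlighted = cosine_with_contributions(first_paper, second_paper)
print(round(score, 3), highlighted)
```

The decomposition works because cosine similarity is a sum over shared terms, so the per-term pieces add up exactly to the overall score.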
Looking at the corpus map, we can also see that both papers are in the same region (the boxes on the corpus map are called “regions”, and group papers together by their similarity). The corpus map also makes it easy to identify the shared qualities of papers in this region: most, if not all, of them have stories and narration as similar words. To view a regional matrix, click the region of your choice and then click “show/collapse regional matrix”.
Understanding Odd Placements on the Corpus Map
However, a few papers on the corpus map stand out. According to the key, most of the orange and pink papers, or similar papers, fall within the same region, but a few fall outside it. The document that is pink with an orange outline belongs to both groups, and its title seems like it would fit in as well. The orange document is marked as a similar paper, yet it lies far outside the region where most papers fall and has a title unlike the rest.
It is unclear why the orange paper was recommended in the first place, and why the pink paper has not been grouped with the rest. We want to examine both papers and use the interpretability features of AbstractsViewer to answer these questions. To avoid losing track of one of them, we left-click it to save it to “favorites”.
Why is that paper in a completely different region?
We click the pink paper to select it. Here, we can see that this region seems to be defined by the word stems “sequenc” and “event”. We can also see why this paper was recommended, as the two papers share highlighted words such as “narrative” and “stories”.
Looking at the word matrix, we see that the second selected document, which has been placed in the “sequenc” region, also has a high frequency of the stem “sequenc”, along with both “narr” and “stori”.
The interpretability features of AbstractsViewer therefore make it easy to see why this paper was recommended: it shares the word stems “narr” and “stori” with the first selected document. They also show that it was placed in the other region because it uses “sequenc” frequently, the stem that defines that region.
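The placement can be illustrated with a nearest-centroid rule, offered as an assumption rather than AbstractsViewer's documented method: each region is represented by a centroid term-weight vector, a paper goes to the region whose centroid is most cosine-similar, and the shared term with the largest product of weights explains the placement. The function names, region centroids, and weights below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (stem -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign_region(doc_vec, centroids):
    """Hypothetical nearest-centroid assignment.

    Returns (region_name, dominant_term): the region whose centroid is most
    cosine-similar to doc_vec, and the shared term whose weight product
    contributes most to that match (None if nothing is shared).
    """
    best = max(centroids, key=lambda name: cosine(doc_vec, centroids[name]))
    centroid = centroids[best]
    shared = [t for t in doc_vec if t in centroid]
    dominant = max(shared, key=lambda t: doc_vec[t] * centroid[t]) if shared else None
    return best, dominant


# Invented region centroids and paper weights.
centroids = {
    "stori": {"stori": 3.0, "narr": 2.0, "news": 1.0},
    "sequenc": {"sequenc": 3.0, "event": 2.0, "stori": 0.5},
    "layout": {"layout": 3.0, "graph": 1.0},
}
pink_paper = {"sequenc": 4.0, "narr": 1.0, "stori": 1.0}
print(assign_region(pink_paper, centroids))  # → ('sequenc', 'sequenc')
```

With these made-up weights the paper still overlaps the story-focused region through “narr” and “stori”, but its heavy use of “sequenc” pulls it into the “sequenc” region, mirroring the situation described above.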
Why is that paper even recommended?
Since we have selected a new paper, the orange paper singled out earlier is no longer easy to spot on the corpus map. Because it was saved to favorites, we can easily access it from the sidebar, which opens when you click the button on the top right.
Again, using the regional matrix, we can see that this region doesn’t have much in common with our original selected paper, as “layout” is the common word stem defining this region.
However, the yellow highlighted words in the two selected papers make the connection easy to see: even though the orange paper sits in a completely different region, the papers are connected by the words “stori” and “data”.
The word matrix confirms this, showing that the papers share the high-frequency word stems “stori” and “narr”.
Using the scatterplot
Lastly, we will use the scatterplot to expand dense regions. The region containing our first selected paper is extremely dense; hovering over it shows that it contains 41 papers.
To expand this region, click on it and scroll down the left side to view the expanded scatterplot.
Overall, the visualization tools of AbstractsViewer work together to provide a seamless way to explore and discover similar papers and to understand what connects them.