TBD
(Need clarification with Michael; assuming this is using the NYT dataset.)
It seems this region contains both papers relating to China and papers relating to Ukraine/government. A hypothesis: lots of papers relating to China have fallen in this region because it is also characterized by word stems relating to government, and those same stems have pulled papers about Ukraine into the region as well. None of the papers about Ukraine contain any word stems relating to China.
Interestingly, the region with the most search hits for Ukraine is one whose top defining words all relate to China:
“china”, “hong”, “kong”, “chines”, “beij”
Below those are words referring to government: “presid”, “polit”, “econom”, “govern”.
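To illustrate the hypothesis above, here is a minimal sketch of how shared government stems alone can pull a Ukraine paper toward China papers under TF-IDF. Everything in it is an assumption for illustration: the toy documents are made up (not taken from the dataset), the names `tfidf_vectors` and `cosine` are hypothetical, and the smoothed IDF formula is one common variant, not necessarily the one used in the actual pipeline.

```python
import math
from collections import Counter

# Toy stemmed documents (made up for illustration, NOT from the dataset).
docs = {
    "china_1": ["china", "beij", "govern", "presid", "econom"],
    "china_2": ["china", "hong", "kong", "govern", "polit"],
    "ukraine_1": ["ukrain", "govern", "presid", "polit", "econom"],
    "sports_1": ["game", "team", "score", "season", "coach"],
}


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of stem -> weight) for each document."""
    n_docs = len(docs)
    # Document frequency: number of documents each stem appears in.
    df = Counter()
    for stems in docs.values():
        df.update(set(stems))
    vectors = {}
    for name, stems in docs.items():
        tf = Counter(stems)
        vectors[name] = {
            # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
            stem: (count / len(stems)) * (math.log((1 + n_docs) / (1 + df[stem])) + 1)
            for stem, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(stem, 0.0) for stem, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = tfidf_vectors(docs)
for other in ("china_1", "china_2", "sports_1"):
    print(other, round(cosine(vectors["ukraine_1"], vectors[other]), 3))
```

In this toy setup the Ukraine document has zero weight on “china” yet is still closer to both China documents than to the unrelated one, purely through the government stems, which is the behavior the hypothesis describes.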
The Ukraine findings give a good example of Specter being better. The examples above were using TFIDF. Using Specter, the Ukraine search hits mainly clump together in a region around “lead, today, soviet, govern, union, presid, countri, unit, minist, polit”, so Specter seems to have at least done the initial clustering better. Interestingly, an article from 2006 is about the gas crisis from Russia cutting off natural gas from Ukraine’s pipelines. The top recommended paper for it relates directly to this and almost seems to be a follow-up, “Russia and Ukraine Reach Compromise on Natural Gas”, and a bunch more of the neighborhood papers fall in the same region and also relate to gas. This gas example is using TFIDF.
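One rough way to put a number on "the search hits mainly clump together" is the fraction of hits that land in the single most common region. The sketch below assumes each search hit has already been assigned a region label; the function name `hit_concentration` and the label lists are hypothetical placeholders, not measured results.

```python
from collections import Counter


def hit_concentration(region_of_hit):
    """Return (most common region, fraction of hits that fall in it)."""
    if not region_of_hit:
        raise ValueError("need at least one search hit")
    counts = Counter(region_of_hit)
    region, n_hits = counts.most_common(1)[0]
    return region, n_hits / len(region_of_hit)


# Placeholder region labels per search hit (invented, NOT real results).
tfidf_hits = ["china", "china", "gas", "china", "other"]
specter_hits = ["soviet", "soviet", "soviet", "soviet", "gas"]

print("TFIDF:", hit_concentration(tfidf_hits))
print("Specter:", hit_concentration(specter_hits))
```

A higher fraction means the hits are more concentrated in one region. Comparing the two embeddings on the same query would give a simple check on the impression that Specter clusters the Ukraine hits better.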