Guest Post: Data-Mining King Lear

[I am pleased to offer this guest post by Darby Foster, a first-year undergraduate student at Georgia Institute of Technology, majoring in Business Administration/Information Technology Management. Her professor, Dr Sarah Higanbotham, was kind enough to get in touch with me to share Darby’s final paper, which appears in a truncated form here. VEP loves hearing from students whose imaginations have been really taken by the work we do. –hgf]

Darby Foster
Georgia Institute of Technology

first folio 3
First Folio, Emory University, Nov. 2016

When I read King Lear, I became even more curious about this play’s language. The corpus analysis software Ubiqu+ity allowed me to analyze King Lear quantitatively in terms of the play’s tragedy, to gain perspective on just how sad the play really is. My analysis provided substantial evidence against the claims of the literary critic George Steiner about Shakespeare and the genre of tragedy. As a Business Administration/IT Management major, I was not overly eager to take an English Literature course, and especially not a Shakespeare course focusing on the 1623 First Folio. And yet I have never been (and perhaps will never be again) so excited about research as I was when I applied data mining to Shakespeare’s late tragedy, King Lear. It began with Michael Witmore’s podcast on data-mining Shakespeare, which inspired me to experiment with data-mining: first with Hamlet, using the online corpus analysis tool Voyant to isolate word trends in Hamlet’s soliloquies. In particular, I traced relative word frequencies in those soliloquies and found a predominance of comparisons (16 uses of the preposition “like”).

Most people define a genre by its overall narrative structure. To a traditional close reader, genre is “a type of literary work characterized by a particular form, style, or purpose” (“Genre”). But to a computer, “genre is a coordinated set of having things and not having things” (Witmore 2011). Data-mining software takes texts or selections of text and counts the occurrences of specific words and phrases. Certain words play a key role in tragic drama in particular, including doubt, sense, nature, and fortune (Booth 1983, 37). DocuScope’s dictionary categorizes thousands of words into “Positivity,” “Negativity,” “Anger,” “Sad,” and so on. By choosing individual words in each category, I found it surprisingly easy to discover a play’s genre.
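
The counting step described above can be sketched in a few lines of Python. This is only an illustration, not how DocuScope or Ubiqu+ity is actually implemented: the two tiny word sets are a hypothetical stand-in for the real dictionary (they reuse example words mentioned in this post), and matching single lowercase words is a simplifying assumption, since the real tools also match phrases.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; the real DocuScope
# dictionary is far larger and also contains multi-word phrases.
CATEGORIES = {
    "Negativity": {"death", "curse", "torturous"},
    "Positivity": {"trust", "blessing", "hope"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_categories(text, categories=CATEGORIES):
    """Return (hits per category, total number of tokens)."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    return counts, len(tokens)

sample = "I pray you, sir, take patience: I have hope"
counts, total = count_categories(sample)

# Relative frequency: the share of all tokens that fall in a category.
positivity_share = counts["Positivity"] / total
```

Dividing by the total token count is what makes passages of different lengths comparable.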

graph 1
Hope and Witmore 2004, 2010

Shakespeare’s 1623 First Folio divides the plays according to genre: comedies, histories, and tragedies. While the compilers of this collection likely used plot to separate the plays into genres, the same separation can be done using data-mining (Witmore 2011). Unfortunately, at this close level of analysis, the genre of tragedy can be difficult to distinguish. Data-mining software can easily tell comedies and histories apart, but tragedies lie somewhere between these two genres (Hope and Witmore 2004). DocuScope, a sophisticated data-mining tool, counts the occurrences of specific categories of words and phrases in sections of text and creates graphs to display the findings visually. The graph above is a scatterplot of 1,000-word pieces of all of Shakespeare’s plays, color-coded by genre (Witmore 2011). Green dots represent histories, red dots represent comedies, orange dots represent tragedies, and blue dots represent the late plays. This graph shows that what histories have, comedies lack, and vice versa, while tragedies fall in the middle of these two more sharply defined genres. The patterns in the graph suggest that, in addition to having similar plot structures and characters, Shakespeare’s plays within the same genre were written with similar language and style.
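
To make the idea of "1,000-word pieces" concrete, here is a rough Python sketch of how a play's tokens might be cut into fixed-size pieces and scored for one category. The function names are hypothetical, and dropping a short leftover piece at the end is my own assumption; the sources cited here do not say how the final fragment of each play was handled.

```python
def chunk_words(tokens, size=1000):
    """Split a token list into consecutive pieces of exactly `size` words.

    Assumption: a shorter leftover piece at the end is dropped.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

def relative_frequency(chunk, words):
    """Share of the tokens in `chunk` that belong to the word set `words`."""
    if not chunk:
        return 0.0
    return sum(1 for token in chunk if token in words) / len(chunk)

# Toy input: 12 tokens cut into pieces of 5 words (2 pieces, 2 tokens dropped).
tokens = ["hope", "death", "the", "king"] * 3
chunks = chunk_words(tokens, size=5)
negativity = [relative_frequency(chunk, {"death"}) for chunk in chunks]
```

Each piece then becomes one point in a plot like the scatterplot described above, with one coordinate per category.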

One of Shakespeare’s most famous tragedies, King Lear, produces fascinating results when data-mined. DocuScope sorts the words of the text into over 100 categories. Each category contains thousands of words that were individually selected by David Kaufer, an English professor at Carnegie Mellon University, to fit a specific idea. One of the most prominent categories in the text of King Lear is “Negativity.” This category contains words such as death, curse, and torturous and accounts for 798 individual instances of negativity throughout the play (Ishizaki and Kaufer 2012). Such a strong presence of a single emotion greatly influences a work of literature. In this case, it also plays a big role in determining the genre of the play. Data-mining this play clearly reveals its tragic nature.

Anyone who experiences King Lear can likewise tell that the play is a tragedy. From act one, scene one, it is evident that things are going downhill, as the king reveals his “darker purpose” to divide his kingdom into three parts, one for each of his daughters, so they can rule while he takes an “unburdened crawl toward death” (Shakespeare 1997, 1.1.43). From this point forward, the play is filled with pessimism, tragic events, and nihilism. Some argue that the decision to divide the kingdom is the true climax of the story, breaking the mold of traditional Shakespearean tragedies (Bowers 1980, 13). This structure allows no time for introducing the classic narrative fall of Lear; it brings the audience right into the middle of the story, which quickly becomes tragic. The two most loving and loyal characters in the play, Cordelia and Kent, are quickly banished. Not long after, Lear himself is banished from the homes of his daughters and sent out into a terrible storm (Shakespeare 1997, 2.4.295-353). The play becomes less tolerable to the audience as Lear’s mental capacity deteriorates. Rather than the tragedy building slowly over five acts, the audience experiences King Lear’s fall from 1.1. As the play progresses, there is still hope that conflict will be resolved and the protagonist will live on, but Shakespeare refuses to fulfil the desires of his audience (Booth 1983, 17). Cordelia’s death shocks everyone. “Enter Lear, with Cordelia in his arms, and the most terrifying five minutes in literature have begun” (Booth 1983, 11). The play ends, not with poetic justice, but with a father carrying the body of the virtuous young daughter whom he misjudged. And to intensify the tragedy, Lear himself dies just minutes later.

A quantitative perspective on King Lear provides similar results. Graphing the relative frequencies of specific types of language reveals patterns in the data. An interesting example is “Positivity,” which contains words and phrases such as trust, blessing, and hope, as in the line “I pray you, sir, take patience: I have hope” (Shakespeare 1997, 2.4.130). Overall levels of negativity decrease as the play progresses, but so do levels of positivity, which are almost always lower than the levels of negativity.

graph 2

In the graph above, “Negativity” is shown in red and “Positivity” in blue, plotted over the course of the play. The diminishing positivity can be attributed to the nature of tragedy. As more and more tragic events occur, the scenes and characters are filled with less positivity. This increasing level of tragedy correlates with a steadily increasing level of overall sadness. While there are peaks and troughs on the graph of words categorized as “Sad,” the linear regression line shows an overall increase in sadness as the play goes on. This reflects the overall emotions of the characters in the play as well as the mood that the tragedy imposes on the audience. Language categorized as “Anger” follows a similar pattern, increasing in relative frequency as the play progresses. In the overlay of the two graphs below, with the DocuScope categories “Anger” in red and “Sad” in blue, note that the major peaks in both categories roughly align. These two emotions, anger and sadness, are clearly correlated in this play. Both are typically thought of as negative emotions, which are common in tragedies. When tragic events occur, natural responses often include sadness over what happened and anger that it happened at all. In King Lear, characters often experience anger, sadness, or both as a result of events in their lives.

graph 3
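
For readers curious about what the regression line and the correlation between two categories involve, the following Python sketch computes both by hand. The two short series are invented placeholder numbers, not the DocuScope output for King Lear, and the function names are hypothetical.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def trend_line(values):
    """Least-squares line through the points (0, v0), (1, v1), ...

    Returns (slope, intercept); a positive slope means an overall increase.
    """
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Placeholder relative frequencies per section of a play (NOT real data).
sad = [1.0, 2.0, 1.5, 3.0, 2.5]
anger = [0.5, 1.0, 1.5, 2.0, 3.0]

slope, intercept = trend_line(sad)
r = pearson(sad, anger)
```

A slope above zero corresponds to the rising regression line described above, and a correlation coefficient near 1 would correspond to peaks that line up closely.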

Lear is Shakespeare’s most tragic play. It is possibly even “the most devastating tragic apprehension in the whole of Western dramatic literature” (Jackson 1966, 26). As Stephen Booth summarizes, “watching Lear is not unlike waiting for the death of a dying friend; our eagerness for the end makes the friend no less dear” (Booth 1983, 17). This very specific feeling captures the experience of King Lear; it is so depressingly tragic that all the audience wants is for the misery of the play to end. This type of incredibly sad tragedy can be categorized with its own name: absolute tragedy. Absolute tragedy “is immune to hope” (Steiner 2004, 4). It leaves no opportunity for the audience to believe that something good will come from all the negativity; it is unquestionably tragic. Such absolute tragedy “presents men and women who the gods torture and kill ‘for their sport’” (Steiner 2004, 11). This action is directly referenced in King Lear, when Gloucester observes late in the play, “As flies to wanton boys are we to th’ gods. They kill us for their sport” (Shakespeare 1997, 4.1.41-42). By this definition, King Lear is an absolute tragedy.

chart 1

Steiner disagrees. According to him, Shakespeare’s only absolute, and therefore most tragic, tragedy is Timon of Athens (Steiner 2004, 12). He argues that Timon’s utterly bleak plot and motifs make this play more tragic than the rest. A scan through DocuScope provides contrary results. In categories that are critical to the genre of tragedy, King Lear dominates. The chart above shows the percentage of each play that fits into the DocuScope categories of “Negativity,” “Positivity,” “Anger,” and “Sad.” These values show that King Lear is approximately 1.09 times more negative, 1.59 times sadder, and 1.02 times angrier than Timon of Athens, which also happens to be 1.08 times more positive than King Lear. Based on these metrics, King Lear clearly contains higher concentrations of words that are typically found in tragedies. This quantitative analysis provides a more precise technique for determining absolute tragedy, revealing that Lear is not only an absolute tragedy, but even more tragic than Timon of Athens.
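
The "times more" figures are ratios of the two plays' category percentages. The short Python sketch below shows the arithmetic only; the percentages in it are made-up placeholders chosen for easy division, not the values from the chart, and the function name is hypothetical.

```python
def times_as_frequent(pct_a, pct_b):
    """Ratio of two category percentages (how many times as frequent a is as b)."""
    if pct_b == 0:
        raise ValueError("cannot compare against a percentage of zero")
    return pct_a / pct_b

# Placeholder percentages of each play's text per category (NOT the chart's values).
play_a = {"Negativity": 3.0, "Sad": 1.5}
play_b = {"Negativity": 2.0, "Sad": 1.0}

ratios = {
    category: times_as_frequent(play_a[category], play_b[category])
    for category in play_a
}
```

A ratio above 1 means the first play has the higher concentration of that category, and a ratio such as 1.02 means the two plays are nearly identical on that measure.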

Works Cited
Booth, Stephen. (1983). King Lear, Macbeth, Indefinition, and Tragedy.

Bowers, Fredson. (1980). “The Structure of King Lear.” Shakespeare Quarterly 31 (1): 7-20.

“Genre, N.” (2014). OED Online. Oxford University Press. Accessed February 7, 2017.

Hope, Jonathan and Michael Witmore. (2010). “The Hundredth Psalm to the Tune of ‘Green Sleeves’: Digital Approaches to Shakespeare’s Language of Genre.” Shakespeare Quarterly 61 (3): 357-90.

Hope, Jonathan, and Michael Witmore. (2004). “The Very Large Textual Object: A Prosthetic Reading of Shakespeare.” Early Modern Literary Studies 9 (12). Available online:

Ishizaki, Suguru and David Kaufer. DocuScope Dictionary. Created 2012. Accessed 7 November 2016. Available online:

Jackson, Esther Merle. (1966). “King Lear: The Grammar of Tragedy.” Shakespeare Quarterly 17 (1): 25-40.

Shakespeare, William. (1997). King Lear. Ed. R.A. Foakes. London: Arden Shakespeare. Available online:

Steiner, George. (2004). “‘Tragedy,’ Reconsidered.” New Literary History 35 (1): 1-15.

Witmore, Michael. Data-Mining Shakespeare. Created 2011. Accessed 7 September 2016. Available online: