VARD Normalization Errors

Deidre Stuffer
in News
VARD decently standardizes Early Modern English. Sometimes, though, it makes questionable replacements.

| ORIGINAL | NORMALIZATION | SHOULD BE |
|---|---|---|
| all’s | ell’s | all’s |
| caus’d | cause | caused |
| Cicilia | Cicely | Cicilia |
| courtesie | curtsy | courtesy |
| diuers | divers | diverse |
| hir | his | her |
| ile | isle | I’ll |
| ist | first | is’t |
| kild/killd | kilt | killed |
| maister | moister | master |
| maist | moist | mayst |
| *nunc | nuns | nunc |
| Pauls | Pals | Paul’s |
| *qui | queen | qui |
| shees | shoes | she’s / she is |
| weele | weal | we’ll / we will |
| where’s | whores | where’s / where is |

Of course, you will want to check how your VARD installation handles these words.
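One way to run that check is to compare what your installation produced against the forms you expect. The following is only a minimal sketch: the `EXPECTED` dictionary holds a subset of the words listed above, and both `find_questionable` and the `observed` mapping (original spelling to the form found in your VARD output) are hypothetical names introduced for illustration, not part of VARD.

```python
# Acceptable normalizations for a few of the words listed above.
# Each original spelling maps to the set of forms we would accept.
EXPECTED = {
    "courtesie": {"courtesy"},
    "diuers": {"diverse"},
    "hir": {"her"},
    "kild": {"killed"},
    "maister": {"master"},
    "weele": {"we’ll", "we will"},
}


def find_questionable(observed, expected=EXPECTED):
    """Return {original: observed_form} for words whose observed
    normalization is not among the accepted forms.

    `observed` is an assumed mapping from original spelling to the
    form found in your own VARD output; words missing from it are skipped.
    """
    return {
        word: observed[word]
        for word, accepted in expected.items()
        if word in observed and observed[word] not in accepted
    }


if __name__ == "__main__":
    # Hypothetical results copied by hand from a VARD run.
    sample = {"hir": "his", "maister": "master", "weele": "we will"}
    print(find_questionable(sample))
```

Only the words that were normalized to something outside the accepted set are reported, so a short result means the installation is handling the list well.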

Tweaking VARD: Aggressive Rules for Early Modern English Morphemes and Elisions

Having already discussed how VARD behaves with character encoding and symbols, I will now explain how I tweaked VARD to standardize Jonathan Hope’s early modern drama corpus. Given the size of Hope’s corpus, I needed to automate the comparison of VARD’s output with the original play files. Erin Winter wrote a case-sensitive Python script that generated a CSV recording all of VARD’s changes and their frequencies. I then compared the original words to VARD’s normalizations, looking only at the most frequent changes.
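To make the comparison step concrete, here is a minimal sketch of how such a change log could be produced. It is not Erin Winter’s script, which is not shown here. It assumes whitespace tokenization and uses Python’s `difflib.SequenceMatcher` to align the original tokens with the normalized ones; the function names `count_changes` and `write_changes_csv` are hypothetical.

```python
import csv
import io
from collections import Counter
from difflib import SequenceMatcher


def count_changes(original_text, normalized_text):
    """Count case-sensitive (original, normalized) replacements."""
    orig_tokens = original_text.split()
    norm_tokens = normalized_text.split()
    changes = Counter()
    matcher = SequenceMatcher(a=orig_tokens, b=norm_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of tokens on both sides: pair them one to one.
            for orig, norm in zip(orig_tokens[i1:i2], norm_tokens[j1:j2]):
                changes[(orig, norm)] += 1
        else:
            # Uneven spans (for example, one word expanded into two):
            # record the whole span as a single change.
            key = (" ".join(orig_tokens[i1:i2]), " ".join(norm_tokens[j1:j2]))
            changes[key] += 1
    return changes


def write_changes_csv(changes, out_file):
    """Write the changes to CSV, most frequent first."""
    writer = csv.writer(out_file)
    writer.writerow(["original", "normalized", "frequency"])
    for (orig, norm), freq in changes.most_common():
        writer.writerow([orig, norm, freq])


if __name__ == "__main__":
    changes = count_changes("hir maister kild hir", "his moister kilt his")
    buffer = io.StringIO()
    write_changes_csv(changes, buffer)
    print(buffer.getvalue())
```

Because the comparison is case-sensitive, `Hir` and `hir` are counted as separate entries. Sorting by frequency puts the changes with the greatest effect on the corpus at the top of the file, which is where review is most worthwhile.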