VARD Normalization Errors
VARD does a decent job of standardizing Early Modern English spelling. Sometimes, though, it makes questionable replacements.
ORIGINAL | NORMALIZATION | SHOULD BE |
---|---|---|
all’s | ell’s | all’s |
caus’d | cause | caused |
Cicilia | Cicely | Cicilia |
courtesie | curtsy | courtesy |
diuers | divers | diverse |
hir | his | her |
ile | isle | I’ll |
ist | first | is’t |
kild/killd | kilt | killed |
maister | moister | master |
maist | moist | mayst |
*nunc | nuns | nunc |
Pauls | Pals | Paul’s |
*qui | queen | qui |
shees | shoes | she’s / she is |
weele | weal | we’ll / we will |
where’s | whores | where’s / where is |
Of course, you will want to check how your VARD installation handles these words. VARD keeps a running list of the changes it makes, which silently trains the program as it executes, so it is good practice to examine what VARD changes certain words to. Your datasets will be different from mine: these changes are based on blatant errors the VEP team located in the early modern drama corpus, and datasets have different curation needs depending on their content.
*The dataset we use is interspersed with other languages, especially Latin. I had to add foreign words, like nunc and qui, to the dictionary to prevent skewing the frequency of certain vocabulary.
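One way to check how your own installation handles these words is to tally the replacements in VARD's output and flag any that match rows of the table above. The following is a minimal Python sketch, not part of VARD itself. It assumes the output marks each change with a tag of the form `<normalised orig="..." auto="true">...</normalised>`; verify this against your own files and adjust the pattern if it differs. The names `KNOWN_ERRORS`, `tally_replacements`, and `flag_known_errors` are hypothetical helpers introduced only for illustration.

```python
import re
from collections import Counter

# Assumed shape of a VARD replacement tag; adjust to match your own output.
NORMALISED_TAG = re.compile(
    r'<normalised\s+orig="([^"]+)"[^>]*>([^<]+)</normalised>'
)

# (original, questionable normalization) -> preferred form.
# A few rows from the table above; extend this for your own corpus.
KNOWN_ERRORS = {
    ("hir", "his"): "her",
    ("kild", "kilt"): "killed",
    ("maister", "moister"): "master",
    ("maist", "moist"): "mayst",
}


def tally_replacements(text):
    """Count each lowercased (original, normalized) pair found in the text."""
    pairs = ((o.lower(), n.lower()) for o, n in NORMALISED_TAG.findall(text))
    return Counter(pairs)


def flag_known_errors(counts):
    """Return (original, normalized, preferred, count) for pairs in KNOWN_ERRORS."""
    return [
        (o, n, KNOWN_ERRORS[(o, n)], c)
        for (o, n), c in sorted(counts.items())
        if (o, n) in KNOWN_ERRORS
    ]


# Invented sample text, for demonstration only.
sample = (
    '<normalised orig="hir" auto="true">his</normalised> '
    '<normalised orig="maister" auto="true">moister</normalised> said '
    '<normalised orig="Hir" auto="true">his</normalised> '
    '<normalised orig="loue" auto="true">love</normalised>'
)

counts = tally_replacements(sample)
flagged = flag_known_errors(counts)
for orig, norm, preferred, n in flagged:
    print(f"{orig} -> {norm} ({n}x), expected {preferred}")
```

Sorting the full tally by frequency, rather than looking only at the flagged pairs, is also a reasonable way to spot questionable replacements that are not yet in your list.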