VARD Normalization Errors

VARD does a decent job of standardizing Early Modern English spelling, but it sometimes makes questionable replacements.

ORIGINAL    NORMALIZATION  SHOULD BE
all’s       ell’s          all’s
caus’d      cause          caused
Cicilia     Cicely         Cicilia
courtesie   curtsy         courtesy
diuers      divers         diverse
hir         his            her
ile         isle           I’ll
ist         first          is’t
kild/killd  kilt           killed
maister     moister        master
maist       moist          mayst
*nunc       nuns           nunc
Pauls       Pals           Paul’s
*qui        queen          qui
shees       shoes          she’s / she is
weele       weal           we’ll / we will
where’s     whores         where’s / where is

Of course, you will want to check how your own VARD installation handles these words. VARD keeps a running list of changes it makes, which silently trains the program as it executes, so it is good practice to examine what VARD changes particular words to. Your datasets will differ from mine: the corrections listed here are based on blatant errors the VEP team located in the early modern drama corpus, and different datasets have different curation needs depending on their content.
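One way to run that check is to compare VARD's output against replacements you already know to be wrong. The following is a minimal sketch in Python, not part of VARD itself. It assumes you have already extracted (original, normalized) token pairs from your own output by whatever means your setup allows; the names KNOWN_BAD and flag_suspect_replacements are hypothetical, and the entries are a few rows taken from the table above.

```python
from collections import Counter

# Replacements known to be wrong, keyed by (original, VARD output) and
# mapped to the preferred form. Entries are a subset of the table above.
KNOWN_BAD = {
    ("hir", "his"): "her",
    ("ile", "isle"): "I'll",
    ("ist", "first"): "is't",
    ("maister", "moister"): "master",
    ("shees", "shoes"): "she's",
    ("weele", "weal"): "we'll",
}


def flag_suspect_replacements(pairs):
    """Count how often each known-bad replacement occurs in `pairs`.

    `pairs` is an iterable of (original, normalized) token tuples.
    Returns a Counter keyed by (original, normalized, preferred).
    Matching is case-insensitive.
    """
    hits = Counter()
    for original, normalized in pairs:
        key = (original.lower(), normalized.lower())
        if key in KNOWN_BAD:
            hits[(key[0], key[1], KNOWN_BAD[key])] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sample; real pairs would come from your own VARD output.
    sample = [("hir", "his"), ("loue", "love"), ("Hir", "his"), ("ile", "isle")]
    for (orig, norm, preferred), n in flag_suspect_replacements(sample).items():
        print(f"{orig} -> {norm} ({n}x), preferred: {preferred}")
```

Because the lookup is keyed on the pair rather than on the original word alone, a spelling is flagged only when it was normalized to the specific wrong form, so correct normalizations of the same word pass through unreported.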

*The dataset we use is interspersed with other languages, especially Latin. I had to add foreign words, such as nunc and qui, to the dictionary to prevent skewing the frequency of certain vocabulary.
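If editing the dictionary is not practical, a rough alternative is to restore such words after normalization. The sketch below illustrates that post-processing idea only; it is not the dictionary-based approach described in the note above, and it again assumes (original, normalized) token pairs are available. The names PROTECTED and restore_protected are hypothetical.

```python
# Hypothetical list of foreign words that should pass through unchanged.
PROTECTED = {"nunc", "qui"}


def restore_protected(pairs, protected=PROTECTED):
    """Return the normalized tokens, keeping protected originals as-is.

    `pairs` is an iterable of (original, normalized) token tuples.
    A token whose lowercased original is in `protected` keeps its
    original spelling; every other token takes the normalized form.
    """
    return [
        original if original.lower() in protected else normalized
        for original, normalized in pairs
    ]
```

Applied to a pair such as ("nunc", "nuns"), this keeps "nunc", so Latin words are not counted as occurrences of English "nuns" or "queen".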