VEP Shakespeare Collection
The following corpora are released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Before downloading the corpora, read about the format of the text files here, and about our text processing workflow. The corpora are generated from Text Creation Partnership (TCP) XML files. Our downloads do not contain texts from EEBO-TCP Phase II, which will not be in the public domain until five years after the completion of the TCP project for Phase II.
Shakespeare Corpora
Shakespeare TCP (A11954)
A corpus of 36 Shakespeare plays, taken from TCP file A11954.
- Download Shakespeare TCP SimpleText plain text files
- zip contents: 36 SimpleText plain text files, README_SimpleText_files.txt
- size: 1.63 MB zipped; 4.20 MB unzipped
- Download Shakespeare TCP Ubiqu+Ity Tokens Files
- zip contents: 36 Ubiqu+Ity Tokens csvs, TextViewer.html, ShakespeareTCP-ubiq.csv, README_Ubiquity_tokens_files.txt
- size: 3.78 MB zipped; 21.2 MB unzipped
VEP Shakespeare Folger
A corpus of 36 Shakespeare plays, based on Folger Digital Texts.
- Download Shakespeare Folger SimpleText plain text files
- zip contents: 36 SimpleText plain text files, README_SimpleText_files.txt
- size: 1.62 MB zipped; 4.29 MB unzipped
- Download Shakespeare Folger Ubiqu+Ity Tokens Files
- zip contents: 36 Ubiqu+Ity Tokens csvs, TextViewer.html, ShakespeareFolger-ubiq.csv, README_Ubiquity_tokens_files.txt
- size: 3.77 MB zipped; 21.2 MB unzipped
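After downloading, it can be useful to confirm that an archive matches the contents listed above before unpacking it. The following is a minimal Python sketch, not part of the VEP tooling: the function name `summarize_archive` and the archive file name in the usage comment are hypothetical, and the sketch assumes nothing about the files beyond their extensions.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count the files in a zip archive, grouped by lowercase extension.

    Directory entries are skipped. Returns a Counter such as
    Counter({'.txt': 37}) for an archive holding 37 .txt files.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = [
            info.filename
            for info in archive.infolist()
            if not info.is_dir()
        ]
    # Zip member names always use forward slashes, hence PurePosixPath.
    return Counter(PurePosixPath(name).suffix.lower() for name in names)


# Hypothetical usage; substitute the name of the file you downloaded:
# counts = summarize_archive("ShakespeareTCP_SimpleText.zip")
# print(counts)
```

If the play files carry a .txt extension (an assumption), a SimpleText archive should report 37 .txt entries: the 36 plays plus the README. Archives built on some operating systems may include extra metadata entries, so treat a small mismatch as a prompt to inspect the listing rather than as an error.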