The latest version of the corpus of historical Slovak, hist-7.0, contains 981,000 tokens.
It is made up of texts from the pre-codification period and comprises both texts transliterated within the project and printed texts. The texts are neither lemmatized nor morphologically annotated; users can search for a word form or use CQL. Searching is now easier: a user can include or omit diacritical marks and the search results stay the same. The new version has been edited and its annotation unified.
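As an illustration of what diacritics-insensitive matching means, here is a minimal Python sketch. It is not the corpus search engine's actual implementation, which is not described here; the function names `fold_diacritics` and `matches` are hypothetical, and the approach (Unicode NFD decomposition followed by removal of combining marks) is only one common way to get this behaviour.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, word_form: str) -> bool:
    """Compare a query and a word form, ignoring diacritics and case."""
    return fold_diacritics(query).casefold() == fold_diacritics(word_form).casefold()


if __name__ == "__main__":
    # A query typed without diacritics finds the form written with them.
    print(matches("clovek", "človek"))
    # A query typed with diacritics gives the same result.
    print(matches("človek", "človek"))
```

With this kind of folding, a query typed with or without diacritical marks is compared against the same normalized form, which is why the results do not change.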
More information can be found here.
The corpus is accessible after free registration.