Corpus of Texts from 864–1843

The first version of the corpus, named r864az1843-1.0 and consisting of 2.11 million tokens, has been available since February 5, 2015. The corpus contains a selection of publications from the so-called Zlatý fond SME (Gold Fund of SME). It includes texts written before Ľudovít Štúr's attempt to standardize the language. The texts were transcribed according to the grammatical rules in use at the time, as well as the principles of the editors or publishers of that period.

The corpus includes basic bibliographical and style annotation; the texts are neither lemmatized nor morphologically annotated. Users can search for all occurrences of a word form or use CQL (Corpus Query Language).