Corpus of Texts from 1843–1954

The first version of the corpus, r1843az1954-1.0, contains 24 million tokens and has been available since February 5, 2015. It contains a range of publications, mostly from the so-called Zlatý fond SME (Gold Fund of SME). The corpus includes texts written after Ludevít Štúr's attempt to standardize the language. The texts have been transcribed following the grammar rules in use at the time, as well as the principles of the editors or publishers of the period.

The corpus includes basic bibliographic and style annotation; the texts are neither lemmatized nor morphologically annotated. Users can search for all occurrences of a word form or use CQL (Corpus Query Language).
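
Because the texts carry no lemma or morphological tags, CQL queries against this corpus can only match on the word form itself. The following is a minimal sketch of how such queries might be assembled, assuming the corpus exposes the word form under the conventional positional attribute name `word` and that attribute values are treated as regular expressions, as in common CQL implementations. The helper names (`escape_cql`, `word_query`, `sequence_query`) and the set of escaped characters are illustrative assumptions, not part of the corpus interface.

```python
# Characters assumed to have special meaning inside a CQL attribute value,
# which is typically interpreted as a regular expression.
_SPECIAL = set('\\.[]{}()*+?|^$"')


def escape_cql(text):
    """Backslash-escape special characters so the text matches literally."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in text)


def word_query(form):
    """Build a CQL query for one exact word form.

    Only the `word` attribute is used: the corpus is not lemmatized,
    so attributes such as lemma or tag would have nothing to match.
    """
    return '[word="{}"]'.format(escape_cql(form))


def sequence_query(forms):
    """Build a CQL query for several word forms occurring in sequence."""
    return " ".join(word_query(form) for form in forms)


if __name__ == "__main__":
    print(word_query("slovo"))
    print(sequence_query(["na", "svete"]))
```

Since there is no lemmatization, each inflected form of a word has to be queried separately or covered by a regular expression written by hand.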