We have just released a new version of the corpus, r1843az1954-2.0. The corpus contains 3,897,816 tokens, which is 19,987,522 fewer than the previous version. Why? Because the aim was to capture the authentic language of the period in which each text was actually published. The latest version of the corpus of texts from 1843 to 1954 can be found in your SNK account in the section Written Corpora – Time-restricted Corpora.
Even after this selection, the corpus consists of texts that reflect the language and grammatical principles of standard Slovak at the time of their publication, e.g.:
- Andrej Sládkovič: Svätomartiniada. Budín: V tlačiarni kr. uhor. university 1861.
- Sbierka krajinských zákonov z roku 1868. Budín: Uhor. král. ministerstvo pravosudia 1868. (official translation into Slovak)
- Ján Babilon: Prvá kuchárska kniha v slovenskej reči. 1. sväzok, 2. sväzok. Pešť: Vlastným nákladom 1870.
- Franko Chvojnický: Smejme sa! Sbierka vtipov, žartov, smiechot a hádanok. T. S. Martin: Kníhtlačiarsko-účastinársky spolok 1870.
- Jan Fuchs: Počiatky silospytu (fiziky). Naukosklad prostonárodnej školy. Diel štvrtý. Pešťbudín: Majetok Viléma Lauffera 1879.
- Význam kávovinového priemyslu na Slovensku. Bratislava: Franck továreň na kávoviny 1945.
The corpus is currently available free of charge (after registration).
More information can be found here.