SNC corpora

  • Overview of SNC corpora
  • Frequency lists of lemmata, word forms and parts of speech from the publicly available SNC corpora

Monolingual corpus of written texts

The current version, prim-10.0, was created in April 2022 and made available on 15 April 2022. It contains more than 1.68 billion tokens. Access is free, but registration is required.

Access to earlier versions is available on request.

Manually morphologically annotated corpus

Other text corpora

Time-restricted corpora

Learner corpora

Spoken corpora

Corpus of Dialects of the Slovak National Corpus

Corpus of Historical Slovak

Corpus of the Crimean Tatar language