SNC corpora

  • Overview of SNC corpora
  • Frequency lists of lemmata, word forms and parts of speech from the publicly available SNC corpora

Monolingual corpus of written texts

The current version, prim-11.0, was created in February 2025 and made available on 11 April 2025. It contains more than 1.85 billion tokens. Access is free of charge, but registration is required.

Access to earlier versions is available on request.

Manually morphologically annotated corpus

Other text corpora

Time-restricted corpora

Learner corpora

Spoken corpora

Corpus of Dialects of the Slovak National Corpus

Corpus of Historical Slovak

Corpus of the Crimean Tatar language