Corpus Structure

Monolingual corpus of written texts

The current version, prim-6.1, has been available since September 2013. Its publicly available subcorpus contains more than 829 million tokens.

Only two versions of the corpus are publicly available. Several earlier versions also exist, and access to them can be obtained on request.

Manually morphologically annotated corpus