Corpus of Slovak Wikipédia and Necyklopédia
The fourth version, wiki-2017-02, containing 45 109 693 tokens, was made available in March 2017.
It is lemmatized (the lemma is capitalized when it is a proper noun) and morphologically annotated; information on the source is provided.
The third version, wiki-2016-02, containing 42 615 597 tokens, was made available in March 2016.
It is lemmatized and morphologically annotated; information on the source is provided.
The second version, wiki-2015-02, containing 40 million tokens, was released in March 2015. It includes texts from Slovak Wikipédia and Necyklopédia as of February 2015.
The first version, wiki-2014-02, containing 37 548 997 tokens, was released in February 2014.