Slovak-Russian Parallel Corpus

The Slovak-Russian Parallel Corpus contains Slovak and Russian texts that are translations of each other: Slovak texts translated into Russian and vice versa, as well as translations from a third language into both Slovak and Russian. The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated by the tagger Morče, which has been trained and tuned on the tagset developed by the SNK. The Russian texts are tagged with TreeTagger.

The current version, par-skru-2.0, was released in January 2014 and contains 8.45 million tokens (4.2 million in the Slovak part and 4.25 million in the Russian part).

You can query the “fiction” corpus via the NoSketch Engine web interface, starting from either the Russian part or the Slovak part.

Enter the query term (a Slovak or Russian word, or a regular expression) into the Search input field. In the corpus selection box, choose the source to search in (par-skru-*-sk for Slovak texts and par-skru-*-ru for Russian texts). Clicking the leftmost column of a result displays a short bibliographic record for the source text.
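
The corpus naming scheme above can be illustrated with a minimal Python sketch. It is not part of the NoSketch Engine interface. It only shows how the patterns par-skru-*-sk and par-skru-*-ru separate the Slovak and Russian parts, treating the asterisk as a wildcard for the version. The function name corpus_language and the concrete corpus names in the example list are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Patterns quoted from the text; "*" is read as a wildcard for the version part.
PATTERNS = {"sk": "par-skru-*-sk", "ru": "par-skru-*-ru"}


def corpus_language(name):
    """Return "sk" or "ru" if the corpus name matches a pattern, else None."""
    for lang, pattern in PATTERNS.items():
        if fnmatchcase(name, pattern):
            return lang
    return None


# Hypothetical corpus names, used here for illustration only.
candidates = ["par-skru-2.0-sk", "par-skru-2.0-ru", "par-skru-1.0-sk", "other-corpus"]

slovak_parts = [n for n in candidates if corpus_language(n) == "sk"]
russian_parts = [n for n in candidates if corpus_language(n) == "ru"]
print(slovak_parts)
print(russian_parts)
```

Names that match neither pattern, such as the made-up other-corpus above, are simply left out of both lists.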

Version 1.0

Version 1.0, the first release, contains 101 thousand sentences in the Slovak part and 128 thousand sentences in the Russian part (almost 2 million tokens per language).


Developed jointly by: Slovenský národný korpus, Jazykovedný ústav Ľ. Štúra SAV and Кафедра математической лингвистики, Филологический факультет СПбГУ.