Slovak-Polish Parallel Corpus

The first version, par-skpl-1.0, was released on 3 December 2018 and contains about 8.2 million tokens (4 122 236 tokens in the Slovak part, 4 063 598 tokens in the Polish part).
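
As a quick check on the figures above, the rounded total can be derived from the two per-language counts. The snippet below is only a small arithmetic sketch; the variable names are illustrative and not part of the corpus distribution.

```python
# Token counts as stated for par-skpl-1.0.
slovak_tokens = 4_122_236
polish_tokens = 4_063_598

# Combined size of both halves of the corpus.
total_tokens = slovak_tokens + polish_tokens

# Size in millions, rounded to one decimal place.
total_millions = round(total_tokens / 1_000_000, 1)

print(total_tokens)    # 8185834
print(total_millions)  # 8.2
```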

After registration, users can query the corpus in NoSketch Engine through either the Polish half or the Slovak half.

Knowledge of NoSketch Engine and CQL is recommended.
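
For readers new to CQL, the sketch below shows how a simple query string can be assembled in Python before pasting it into the NoSketch Engine query box. It assumes that the corpus exposes the attribute names `word` and `lemma`, which are common in Sketch Engine corpora but are not confirmed here; the helper functions `cql_token` and `cql_sequence` are hypothetical and exist only for this illustration.

```python
def cql_token(**attrs: str) -> str:
    """Build one CQL token expression, e.g. [lemma="dom"].

    Attribute values are treated by CQL as regular expressions.
    The attribute names (word, lemma) are assumptions about the corpus.
    """
    if not attrs:
        return "[]"  # an empty pair of brackets matches any single token
    parts = []
    for name, value in attrs.items():
        if '"' in value:
            raise ValueError("double quotes are not supported in this sketch")
        parts.append(f'{name}="{value}"')
    # Several conditions on the same token are joined with "&".
    return "[" + " & ".join(parts) + "]"


def cql_sequence(*tokens: str) -> str:
    """Join token expressions into a query for consecutive tokens."""
    return " ".join(tokens)


# Lemma "dom", then any one token, then the word form "na".
query = cql_sequence(cql_token(lemma="dom"), cql_token(), cql_token(word="na"))
print(query)  # [lemma="dom"] [] [word="na"]
```

The resulting string is plain CQL and can be adapted by hand; the Python wrapper only helps avoid bracket and quoting mistakes when building longer queries.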

The Slovak-Polish parallel corpus includes 42 translations: from Polish into Slovak (25), from Slovak into Polish (6), and from other languages into both Slovak and Polish (11). It also includes a document about mutual cooperation. The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated by the MorphoDiTa tagger, which was trained with the SNK tagset developed by the Slovak National Corpus; the Polish texts are annotated by TreeTagger.