Slovak-Latin Parallel Corpus

The most recent version, par-skla-3.0, was released on 13 December 2018 and contains about 5 million tokens (2.66 million tokens in the Slovak part and 2.3 million tokens in the Latin part).

The Slovak-Latin parallel corpus includes 36 translations from Latin (14 from Classical Latin, 8 from Medieval Latin, 14 from Modern Latin); two further texts are translations of an originally Italian text and of a combined text.

The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated by the MorphoDiTa tagger, which has been trained on the SNK tagset developed by the Slovak National Corpus; the Latin texts are annotated by TreeTagger.

After registration, users can access the corpus in NoSketch Engine through either the Latin part or the Slovak part.

Knowledge of NoSketch Engine and CQL is recommended.
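
For readers new to CQL, the following is a minimal sketch of how simple token queries are composed. It only builds query strings and does not contact the corpus. The helper names cql_token and cql_sequence are hypothetical, and the attribute names word, lemma and tag are assumptions: they are common in NoSketch Engine corpora, but the attributes actually available in par-skla should be checked in the corpus interface.

```python
def cql_token(**attrs: str) -> str:
    """Build one CQL token expression, e.g. [lemma="rex"].

    Attribute names (word, lemma, tag) are assumed, not confirmed
    for this corpus. With no attributes, [] matches any token.
    """
    if not attrs:
        return "[]"
    # Escape double quotes so the value stays inside the CQL string literal.
    parts = [
        '{}="{}"'.format(name, value.replace('"', '\\"'))
        for name, value in attrs.items()
    ]
    # Several conditions on the same token are joined with "&".
    return "[" + " & ".join(parts) + "]"


def cql_sequence(*tokens: str) -> str:
    """Join token expressions into a query for adjacent tokens."""
    return " ".join(tokens)


# A Latin lemma, then any one token, then a specific word form.
query = cql_sequence(cql_token(lemma="rex"), cql_token(), cql_token(word="est"))
print(query)
```

The resulting string can be pasted into the CQL field of the concordance form in the relevant part of the corpus.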

Version 2.0

Version par-skla-2.0 was made available in 2014. The corpus contains about 1.44 million tokens (780 953 tokens in the Slovak part, 661 612 tokens in the Latin part). Compared with the first version, no new texts were added, but the tokenisation and morphological annotation were improved.

Version 1.0

Version par-skla-1.0 was released at the end of 2012 and contains more than 1.44 million tokens (781 193 tokens in the Slovak part, 661 691 tokens in the Latin part).

Version 0.1

Version par-skla-0.1 was released at the beginning of 2012 and contains about 1.1 million tokens (580 975 tokens in the Slovak part and 516 493 tokens in the Latin part).
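
As a quick cross-check of the rounded totals, the per-language counts given for versions 0.1 to 2.0 can be summed as in the sketch below. The figures are copied from the version notes; the dictionary layout and the name version_totals are illustrative only. Version 3.0 is left out because only rounded figures are given for it.

```python
# (Slovak tokens, Latin tokens) per version, as stated in the version notes.
TOKEN_COUNTS = {
    "par-skla-2.0": (780_953, 661_612),
    "par-skla-1.0": (781_193, 661_691),
    "par-skla-0.1": (580_975, 516_493),
}


def version_totals(counts: dict) -> dict:
    """Return the combined Slovak + Latin token count for each version."""
    return {name: slovak + latin for name, (slovak, latin) in counts.items()}


totals = version_totals(TOKEN_COUNTS)
for name, total in totals.items():
    slovak, _latin = TOKEN_COUNTS[name]
    # Show the exact total and the Slovak share of all tokens.
    print(f"{name}: {total:,} tokens, Slovak share {slovak / total:.1%}")
```

The sums come to 1 442 565, 1 442 884 and 1 097 468 tokens respectively, which matches the rounded figures of about 1.44 million and about 1.1 million.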