Slovak-Romanian Parallel Corpus
The first version, par-skro-fic-1.1, was released on 24 August 2017 as a small experimental corpus of about 1.3 million tokens (603 111 tokens in the Slovak half and 688 867 tokens in the Romanian half).
The Slovak-Romanian Parallel Corpus is a database containing three literary texts translated from Romanian into Slovak and one document about mutual collaboration. The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated by the MorphoDiTa tagger, which has been trained and tuned on the tagset developed by the Slovak National Corpus. The Romanian texts are annotated by TreeTagger.