Slovak-Bulgarian Parallel Corpus

The first version, par-skbg-free-0.1, was released in January 2014 and contains 163 million tokens (78 million in the Slovak half and 85 million in the Bulgarian half).

The corpus can be queried through the NoSketch Engine web interface, in either the Bulgarian half or the Slovak half.

The Slovak-Bulgarian Parallel Corpus contains translations of texts from other languages into Slovak and Bulgarian. The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated with the Morče tagger, which was trained and tuned on the tagset developed by the Slovak National Corpus (SNK). The Bulgarian texts were tagged with TreeTagger.