Slovak-Bulgarian Parallel Corpus
The trial version, par-skbg-free-0.1, was released in January 2014. It contains 163 million tokens: 78 million in the Slovak half and 85 million in the Bulgarian half.
The Slovak-Bulgarian Parallel Corpus contains translations of texts from other languages into Slovak and Bulgarian. The texts are automatically aligned at the sentence level. The Slovak texts are automatically annotated for morphology by the tagger Morče, which has been trained and tuned on the tagset developed by the SNK (Slovak National Corpus). The Bulgarian texts were tagged with TreeTagger.
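The description does not say which tool produced the sentence alignment. To illustrate what automatic sentence alignment involves, here is a minimal sketch of a length-based aligner in the spirit of the Gale-Church method: it assumes that sentences which translate each other have similar character lengths and finds the cheapest sequence of "beads" (1-1, 1-0, 0-1, 2-1, 1-2) by dynamic programming. The function names `align` and `length_cost`, the bead penalties, and the simplified cost function are all hypothetical choices for illustration, not the pipeline actually used for this corpus.

```python
import math

# Allowed bead types: (number of source sentences, number of target sentences)
# mapped to a fixed penalty. These values are illustrative assumptions only.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def length_cost(src_len: int, tgt_len: int) -> float:
    """Simplified, normalised character-length mismatch (not the exact Gale-Church cost)."""
    return abs(src_len - tgt_len) / math.sqrt(src_len + tgt_len + 1)


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Return beads as (source indices, target indices) found by dynamic programming."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] is the best cost of aligning src[:i] with tgt[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(ls, lt)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the final cell to recover the bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, `align(["Prší.", "Je zima."], ["Вали и е студено."])` merges the two short Slovak sentences into one 2-1 bead, because their combined length is close to that of the single Bulgarian sentence. Real aligners typically add lexical cues or a probabilistic length model on top of this idea.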