Slovak-French Parallel Corpus
The current version, par-skfr-2.0, was released in May 2016. The corpus contains 441.5 million tokens (213.3 million in the Slovak part and 228.2 million in the French part).
The corpus consists of two parts: the subcorpus of fiction (par-skfr-fic-2.0) and the free subcorpus. The subcorpus of fiction, which contains 2.3 million tokens, can be queried using NoSketch Engine. Par-skfr-fic-2.0 is identical to its previous version.
The Slovak-French Parallel Corpus is a database of texts that are translations of each other: Slovak texts translated into French or vice versa, as well as translations from a third language into both Slovak and French. The texts are automatically sentence-aligned. The Slovak texts are automatically morphologically annotated by the taggers Morče and MorphoDiTa, which have been trained and tuned on the tagset developed by the Slovak National Corpus. The French texts are annotated by TreeTagger.
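To make the structure described above more concrete, the following minimal Python sketch shows one way a sentence-aligned, annotated pair could be represented and counted. It is an illustration only and does not reflect the corpus's actual storage format: the names `Token`, `AlignedPair` and `token_counts`, the example sentences, and the placeholder tag value `"TAG"` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    form: str   # surface form as it appears in the text
    lemma: str  # base form assigned by a tagger
    tag: str    # morphological tag; "TAG" below is a placeholder, not a real tagset value


@dataclass
class AlignedPair:
    slovak: List[Token]  # tokens of the Slovak sentence
    french: List[Token]  # tokens of the aligned French sentence


def token_counts(pairs: List[AlignedPair]) -> Tuple[int, int]:
    """Return the total number of tokens on the Slovak and French sides."""
    sk_total = sum(len(pair.slovak) for pair in pairs)
    fr_total = sum(len(pair.french) for pair in pairs)
    return sk_total, fr_total


# Hypothetical example pair, not taken from the corpus.
pairs = [
    AlignedPair(
        slovak=[
            Token("Dobrý", "dobrý", "TAG"),
            Token("deň", "deň", "TAG"),
            Token(".", ".", "TAG"),
        ],
        french=[
            Token("Bonjour", "bonjour", "TAG"),
            Token(".", ".", "TAG"),
        ],
    ),
]

sk_total, fr_total = token_counts(pairs)
print(sk_total, fr_total)
```

Counting tokens separately for each language, as in this sketch, corresponds to the way the sizes of the Slovak and French parts are reported for each corpus version.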
Version par-skfr-1.0 was released in October 2015. It included approximately 350 million tokens (167.4 million in the Slovak part and 181.28 million in the French part).
The first, test version of the Slovak-French parallel corpus was released in 2006 as the first parallel corpus of the SNC. It contained approx. 125 million tokens (more than 59 million in the Slovak part and 66 million in the French part). Apart from fiction, it also included free translations of EU texts.