The current version, par-skhu-1.1, was developed on 20 January 2023 and made accessible on 26 January 2023.
Compared to the previous version, the content of the corpus remains unchanged, but metadata on style and genre has been added to the texts. Users can therefore search the corpus by style and genre annotation keys as well as by bibliographic annotation keys.
The corpus consists of two parts: the subcorpus of fiction (4 million tokens, 2 million per language) and the subcorpus of freely available texts. To access the fiction subcorpus, you can use the NoSketch Engine interface to query either the Hungarian texts or the Slovak texts.
The Slovak-Hungarian Parallel Corpus is a database containing texts in both Slovak and Hungarian. The texts are translated either from Slovak into Hungarian or vice versa; the freely available texts were translated from a third language. Texts are automatically aligned at the sentence level. Slovak texts are automatically morphologically annotated by the Morče tagger, trained on the Slovak tagset developed by SNK. The Hungarian texts are annotated by the HUNPOS tagger.
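As an illustration of the structure described above, the following minimal Python sketch shows how sentence-aligned pairs carrying metadata could be represented and filtered. It is not the corpus's actual data format or the NoSketch Engine API: the class `AlignedPair`, the function `filter_pairs`, the metadata keys `genre` and `style`, their values, and the two sample sentence pairs are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedPair:
    """One sentence-level alignment unit (hypothetical structure)."""
    sk: str                                   # Slovak sentence
    hu: str                                   # Hungarian sentence
    meta: dict = field(default_factory=dict)  # assumed metadata keys, e.g. genre, style


def filter_pairs(pairs, **criteria):
    """Return the pairs whose metadata matches every key=value criterion."""
    return [
        p for p in pairs
        if all(p.meta.get(key) == value for key, value in criteria.items())
    ]


# Invented sample data, not taken from the corpus.
PAIRS = [
    AlignedPair("Dobrý deň.", "Jó napot.", {"genre": "novel", "style": "fiction"}),
    AlignedPair("Ďakujem.", "Köszönöm.", {"genre": "legal", "style": "administrative"}),
]

if __name__ == "__main__":
    for pair in filter_pairs(PAIRS, style="fiction"):
        print(pair.sk, "|", pair.hu)
```

Keeping the metadata in a plain dictionary means that additional annotation keys, such as bibliographic ones, could be filtered in the same way without changing the function.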
Version par-skhu-1.0, dated 17 December 2015, contains 99 million tokens (51 million in the Slovak half, 48 million in the Hungarian half).
The earlier version par-skhu-0.2, released in May 2015, contained 4 million tokens (approximately 2 million tokens per language).
The pilot version par-skhu-0.1, released in January 2014, contained 3 million tokens (approximately 1.5 million tokens per language).
Developed jointly by Slovenský národný korpus, Jazykovedný ústav Ľ. Štúra SAV and Magyar Tudományos Akadémia, Nyelvtudományi Intézet.