Language data

The data is jointly released by the Slovak National Corpus, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, and the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.

These and other datasets relevant to machine translation (MT) are also available from the CLARIN ERIC repository, accessible from the LINDAT-Clarin project page.

To get access to the files, please contact us.

Translation tables for the Moses MT system

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

These tables will help you build your own MT system.
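
As a rough illustration of how such a table might be inspected before use, the sketch below parses one line in the plain-text layout commonly used for Moses phrase tables, where fields are separated by `|||`. It assumes the released tables follow that layout, which this page does not state. The names `PhraseEntry` and `parse_phrase_table_line` and the example line are hypothetical and are not taken from the released files.

```python
from typing import List, NamedTuple


class PhraseEntry(NamedTuple):
    """One phrase pair with its translation scores (hypothetical helper type)."""
    source: str
    target: str
    scores: List[float]


def parse_phrase_table_line(line: str) -> PhraseEntry:
    """Split one '|||'-delimited line into source phrase, target phrase and scores.

    Assumes at least three fields; any further fields (for example alignments
    or counts) are ignored in this sketch.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}")
    source, target, raw_scores = fields[:3]
    return PhraseEntry(source, target, [float(s) for s in raw_scores.split()])


# Made-up example line, not taken from the released tables.
example = "dobrý deň ||| good morning ||| 0.5 0.25 0.4 0.2"
entry = parse_phrase_table_line(example)
print(entry.source, "->", entry.target, entry.scores)
```

Real tables are typically large and often compressed, so in practice one would stream them line by line rather than load them into memory.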

You can get more complete language models here.

Parallel corpora (English-Slovak)

Slovak texts are automatically lemmatized and morphologically annotated with the Slovak National Corpus tagset. English texts are lemmatized and part-of-speech tagged with the Penn Treebank Tagset.

Corpus                                 | Source                                 | Sentence pairs
Official Journal of the European Union | http://apertium.eu/data (oj4-ss-1)     | 3272180
OPUS, the open parallel corpus         | http://opus.lingfil.uu.se/             |
  OPUS-EMEA                            |                                        | 1054178
  OPUS-EUconst                         |                                        | 10119
  OPUS-KDE4                            |                                        | 105425
  OPUS-PHP                             |                                        | 31173
JRC-Acquis 3.0                         | http://langtech.jrc.it/JRC-Acquis.html | 1115765
The European Commission webpage        | http://ec.europa.eu/                   | 13050
Europarl v6                            | http://www.statmt.org/europarl/        | 460779

Parallel corpora (Slovak-Czech)

Slovak texts are automatically lemmatized and morphologically annotated with the Slovak National Corpus tagset. Czech texts are automatically lemmatized and morphologically annotated with the Czech National Corpus tagset.

Corpus                                 | Source                                 | Sentence pairs
Official Journal of the European Union | http://apertium.eu/data (oj4-ss-1)     | 3078210
OPUS, the open parallel corpus         | http://opus.lingfil.uu.se/             |
  OPUS-EMEA                            |                                        | 1067905
  OPUS-EUconst                         |                                        | 10630
  OPUS-KDE4                            |                                        | 97260
  OPUS-PHP                             |                                        | 28084
JRC-Acquis 3.0                         | http://langtech.jrc.it/JRC-Acquis.html | 926082
The European Commission webpage        | http://ec.europa.eu/                   | 24190
Europarl v6                            | http://www.statmt.org/europarl/        | 459089

Supported by the EC grant FP7-ICT-2009-5 Bringing Machine Translation for European Languages to the User – Enlarged European Union (EuroMatrixPlus-X).