Corpus of OpenData Texts

OpenData refers to open, modern public administration in which government bodies and other public institutions make their data accessible on the Internet. By its nature, such data consists mainly of statistical and numerical material, although in some cases it also contains enough text to be suitable for corpus processing.

A specialised corpus of judicial decisions, od-justice-1.0, was created at the Slovak National Corpus from texts available through the OpenData project. It was released on 7 December 2018 and contains more than four billion tokens.

The corpus is based on data provided by the Ministry of Justice.

The corpus is lemmatised and morphologically annotated; each text is accompanied by its URL and the place and time of retrieval.
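To illustrate what this annotation might look like to someone processing the data, here is a minimal Python sketch. It assumes a tab-separated "vertical" layout, one token per line with form, lemma and tag, wrapped in a `<doc>` element that carries the metadata. The attribute names (`url`, `place`, `time`), the sample sentence and the placeholder tags (`NOUN`, `VERB`, `PUNCT`) are all hypothetical and chosen only for illustration; they are not taken from od-justice-1.0 itself.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str   # word as it appears in the text
    lemma: str  # dictionary form
    tag: str    # morphological tag (placeholder values in this sketch)


@dataclass
class Document:
    meta: dict                      # attributes from the <doc ...> line
    tokens: list = field(default_factory=list)


# Matches name="value" pairs inside a structural tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_vertical(text):
    """Parse a hypothetical vertical-format string into Document objects."""
    docs = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if "\t" in line:
            # Token line: form, lemma, tag separated by tabs.
            parts = line.split("\t")
            if current is not None and len(parts) >= 3:
                current.tokens.append(Token(*parts[:3]))
        elif line.startswith("</doc"):
            if current is not None:
                docs.append(current)
            current = None
        elif line.startswith("<doc"):
            current = Document(meta=dict(ATTR_RE.findall(line)))
        # Any other structural line is ignored in this sketch.
    return docs


# Invented sample; the URL, attribute names and tags are assumptions.
SAMPLE = (
    '<doc url="https://example.org/decision/1" place="Bratislava" time="2018-11-01">\n'
    "Súd\tsúd\tNOUN\n"
    "rozhodol\trozhodnúť\tVERB\n"
    ".\t.\tPUNCT\n"
    "</doc>\n"
)

docs = parse_vertical(SAMPLE)
for doc in docs:
    print(doc.meta["url"], [t.lemma for t in doc.tokens])
```

The actual distribution format and tagset of the corpus may differ, so the field order and attribute names would need to be adapted to the real data.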

The most recent version is available for users here.