Corpus of Government Administration Texts

The current version of the corpus, gov-vs-2.0, contains 12,363,067 tokens. It was created on 7 April 2020 and released on 22 July 2020. The corpus comprises texts published on gov web domains up to 2019.

It was incorporated into the Slovak National Corpus to support the definition of government administration terms in the Slovak Terminology Database.

The corpus is lemmatized and morphologically annotated, and each text carries metadata on its source URL and time of retrieval. The source texts have been de-duplicated at the paragraph level.
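
Paragraph-level de-duplication can be illustrated with a minimal sketch. The procedure actually used for this corpus is not described here, so everything below is an assumption: the function name `dedupe_paragraphs`, the blank-line paragraph separator, the whitespace and case normalization, and exact matching by hash (rather than near-duplicate detection) are all hypothetical choices made for illustration only.

```python
import hashlib


def dedupe_paragraphs(documents):
    """Drop paragraphs whose normalized text has already been seen.

    `documents` is an iterable of (url, text) pairs. Paragraphs are assumed
    to be separated by blank lines. Returns a list of (url, paragraphs)
    pairs in which each distinct paragraph appears only once overall.
    """
    seen = set()
    result = []
    for url, text in documents:
        kept = []
        for paragraph in text.split("\n\n"):
            # Assumed normalization: collapse whitespace and ignore case.
            normalized = " ".join(paragraph.split()).lower()
            if not normalized:
                continue  # skip empty paragraphs
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # exact duplicate of an earlier paragraph
            seen.add(digest)
            kept.append(paragraph.strip())
        result.append((url, kept))
    return result
```

Under these assumptions, a paragraph repeated across two pages (for example a shared footer) is kept only for the first page on which it occurs, while the URL of every page is preserved.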

Version 1.0

The first version of the specialised corpus, gov-web-1.0, was released on 1 February 2019 with 11,677,058 tokens. It contains texts from government institutions that were available on the .gov and e-gov domains up to the first half of 2017.