Corpus of Annual Reports from Government Institutions

The specialised corpus gov-vs-1.0, made up of texts from annual reports issued by government institutions, was created on 8 April 2020 and released on 22 July 2020. It contains 17,864,463 tokens. The corpus was built from the annual reports of government institutions that had been made available to the public up to early 2018.

It was incorporated into the Slovak National Corpus to support the definition of government administration terms in the Slovak Terminology Database.

The corpus is lemmatized and morphologically annotated, and each text is accompanied by metadata recording its source URL and time of retrieval. The source texts have been de-duplicated at the paragraph level.
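The description above does not say how the paragraph-level de-duplication of gov-vs-1.0 was carried out. As an illustration only, the following is a minimal sketch of one common approach, under these assumptions: paragraphs are separated by blank lines, two paragraphs count as duplicates when they match exactly after whitespace normalization, and only the first occurrence across the whole collection is kept. The function names `normalize` and `dedup_paragraphs` are hypothetical and do not describe the tools actually used to build the corpus.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    # Assumption: collapse runs of whitespace and trim, so that layout-only
    # differences (extra spaces, internal line breaks) do not hide duplicates.
    return re.sub(r"\s+", " ", paragraph).strip()


def dedup_paragraphs(documents: list[str]) -> list[list[str]]:
    """Hypothetical sketch: return each document's paragraphs, dropping any
    paragraph already seen earlier anywhere in the collection."""
    seen: set[str] = set()  # hashes of normalized paragraphs kept so far
    result: list[list[str]] = []
    for doc in documents:
        kept: list[str] = []
        # Assumption: a blank line (possibly containing whitespace) separates paragraphs.
        for para in re.split(r"\n\s*\n", doc):
            norm = normalize(para)
            if not norm:
                continue  # skip empty fragments
            # Store a fixed-size hash instead of the full text to limit memory use.
            digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # exact duplicate of an earlier paragraph: drop it
            seen.add(digest)
            kept.append(norm)
        result.append(kept)
    return result
```

For example, with two reports that repeat the same boilerplate paragraph, only the first copy survives: `dedup_paragraphs(["Annual report 2016.\n\nThe office fulfilled its tasks.", "Annual report 2017.\n\nThe  office fulfilled\nits tasks."])` keeps both paragraphs of the first report but only "Annual report 2017." from the second. Exact matching after normalization is the simplest choice; near-duplicate detection (e.g. shingling with MinHash) would also catch paragraphs that differ by a few words, at higher cost.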