Corpora

What is a corpus?

A text corpus is a structured collection of linguistic data stored in electronic form. It consists of texts, usually of many different styles and genres, accompanied by linguistic information. Search tools make it possible to find and select the linguistic units and information a user is looking for. Corpora are a good source of authentic data, which linguists can use to describe the function and meaning of words accurately, as well as other linguistic phenomena, such as word statistics, collocability, etc. For general users, a corpus offers a way to better understand the language system and to verify or enrich their knowledge of how linguistic units function in real situations. It is not a replacement for linguistic reference books.
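As a rough illustration of the kind of word statistics and collocation data mentioned above, the following minimal sketch counts word frequencies and adjacent word pairs in a tiny invented sample. The sample sentences and the function names (`tokenize`, `word_frequencies`, `bigram_frequencies`) are assumptions made for this example only; they are not part of any corpus tool, and real corpus search tools rely on linguistic annotation and far more refined statistics.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_frequencies(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def bigram_frequencies(texts):
    """Count adjacent word pairs, a very crude proxy for collocations."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical three-sentence "corpus", invented for illustration.
SAMPLE_TEXTS = [
    "The corpus contains written texts.",
    "Written texts of many genres make up the corpus.",
    "Linguists search the corpus for written texts.",
]

words = word_frequencies(SAMPLE_TEXTS)
bigrams = bigram_frequencies(SAMPLE_TEXTS)

print(words.most_common(3))
print(bigrams.most_common(2))
```

Raw pair counts like these only hint at collocability; frequent function words dominate them, which is one reason corpus tools typically use association measures rather than plain frequencies.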

The Slovak National Corpus is a research project for building an electronic text corpus. In its first phase, it focused on contemporary written Slovak texts from the period 1955–2005. In its second and third phases, it was expanded to cover a wider array of texts, including texts from other periods (before 1955 and after 2005). It also covers other language varieties (spoken Slovak and, to a limited extent, dialects). Since 2002, the SNK Department of the Ľ. Štúr Institute of Linguistics at SAS has been carrying out systematic and comprehensive research on the Slovak language and processing it in electronic form.

For a clearer understanding of terms, see Výberový slovník termínov z korpusovej lingvistiky (Selective Dictionary of Corpus Linguistics Terms).