Corpus of Dialects of the Slovak National Corpus

We started to prepare the Corpus of Dialects of the Slovak National Corpus (hereinafter referred to as CD SNC) in 2013. The aim of the initial phase is to gather existing dialect audio recordings and handwritten transcriptions, in particular those already published, to process them using corpus methodology and tools, and to make them available for research.

The new version, dialekt-5.0, containing 980 643 text units, was made accessible in April 2022. The current version contains more than 100 text resources.

The previous version, dialekt-4.0, containing 711 766 tokens, was released on 18 December 2018.

CD SNC is neither lemmatised nor morphologically annotated. Users can browse the corpus by searching for a word or by using CQL (Corpus Query Language). The transcribed texts carry sociolinguistic metadata about respondents, informants, and the origin and content of each recording. Users can access the corpus through the NoSketch Engine web interface, but they must first register for an account.
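
Because the corpus has no lemmas or morphological tags, CQL queries can only match word forms. The following Python sketch shows how such queries might be assembled before pasting them into the CQL field of the web interface. It assumes that the word-form attribute is called `word`, which is the usual NoSketch Engine default but is not confirmed here for CD SNC. The helper names and the example word forms are purely illustrative.

```python
# Characters with a special meaning inside a CQL regular expression or string.
_SPECIAL = set('.^$*+?()[]{}|\\"')


def escape_cql(form: str) -> str:
    """Backslash-escape special characters so the form is matched literally."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in form)


def cql_token(form: str, ignore_case: bool = False, attr: str = "word") -> str:
    """Build a CQL expression matching one token by its word form.

    `attr` is assumed to be "word"; adjust it if the corpus uses another name.
    """
    prefix = "(?i)" if ignore_case else ""
    return f'[{attr}="{prefix}{escape_cql(form)}"]'


def cql_phrase(forms: list[str], ignore_case: bool = False) -> str:
    """Build a CQL expression matching a sequence of adjacent tokens."""
    return " ".join(cql_token(f, ignore_case) for f in forms)


if __name__ == "__main__":
    print(cql_token("dom"))                    # single word form
    print(cql_token("Dom", ignore_case=True))  # case-insensitive match
    print(cql_phrase(["do", "domu"]))          # two adjacent tokens
```

Since dialect transcriptions often contain several spelling variants of the same word, a regular expression inside the quotes (for example `[word="dom.*"]`) may be more useful than an exact form. In that case the pattern should be written by hand rather than passed through `escape_cql`.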