Corpus of Spoken Slovak

The current version, s-hovor-7.0, contains 869 records comprising 851 hours of audio recordings and 7 852 469 tokens.

The first version, s-hovor, was released in December 2008, followed by s-hovor-2.0 in January 2010, s-hovor-3.0 in February 2011, s-hovor-4.0 in August 2012, s-hovor-5.0 in April 2015 and s-hovor-6.0 in November 2017.

Since version s-hovor-6.0, the symbols used for transcription (turn.[ogg|spx|flac]) have been available directly in the NoSketch Engine search tool, and users can also listen to the relevant part of the audio recording. Versions 4.0, 5.0 and 6.0 include two subcorpora: s-hovor-x-upn, which contains transcribed recordings of witnesses from the Oral History project of the Nation’s Memory Institute, and s-hovor-x-sane, which contains the remaining recordings from the primary corpus. The corpus can be queried through a Bonito client (as part of the SNC registration) or through the web interface, where the text transcription is aligned to the audio.

The text transcriptions are lemmatized and morphologically annotated. The transcription metadata contain information about the participant, origin and content of the audio recording.

A user can enter a word, lemma or pronunciation in the input field and the transcription will be displayed.
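For users who prefer to write queries by hand, a search of this kind can be sketched in the CQL notation used by (No)Sketch Engine, in which a token is matched with an expression of the form [attribute="value"]. The helper below is only an illustration: the function name cql_token is made up for this example, and the attribute names word, lemma and pron are assumptions (pron in particular is a hypothetical name for the pronunciation attribute), so the corpus documentation should be checked for the names actually used in s-hovor.

```python
def cql_token(attribute: str, value: str) -> str:
    """Build a single-token CQL expression such as [lemma="dom"].

    Hypothetical helper for illustration; not part of any corpus tool.
    """
    # Escape backslashes and double quotes so the value stays inside the quotes.
    # The search engine treats the value as a regular expression, so other
    # special characters (., *, | ...) keep their regex meaning.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'[{attribute}="{escaped}"]'


# Attribute names below are assumptions; "pron" is a hypothetical
# name for the pronunciation attribute.
print(cql_token("word", "domy"))
print(cql_token("lemma", "dom"))
print(cql_token("pron", "dom"))
```

The resulting strings can be pasted into the CQL query field of the search interface, provided the attribute names match those defined for the corpus.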

Here you can find a list of the audio recording providers.

Contact: katarinag @