ERRKORP – a corpus of texts by foreigners learning Slovak as a foreign language

The project aims to describe the types of language errors observed in the learning of Slovak as a foreign language (SFL) and to explore correlations between these error types and the various factors that influence the learning of SFL, based on a corpus of written texts by non-native speakers.

The corpus errkorp-1.0 contains 347,395 tokens. After signing in to your account, you can find it among the SNK corpora in the section Written Corpus – Acquisition Corpus.

The corpus comprises 1,063 texts written by students learning Slovak as a foreign language, who have different mother tongues and different levels of proficiency in Slovak. Compared to the pilot version, the current version contains data with qualitatively improved manual error annotation, as well as newly added data.

The acquisition corpus is being created by Studia Academica Slovaca (SAS), a centre for SFL at the Faculty of Arts of CU, and the Department of Slovak National Corpus at the Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences in Bratislava. In addition to the SAS centre, the lectorates of Slovak language abroad, which are run by the Ministry of Education, Science, Research and Sport of the Slovak Republic, have been involved since 2017. Professionals from leading Slovak studies institutions also take part in the research itself: the Institute of Slovak and Media Studies at the Faculty of Arts at PU in Prešov and the Department of Slovak Language and Communication of the Faculty of Arts at UMB in Banská Bystrica.

For more information, please contact Katarína Gajdošová (katarina.gajdosova@korpus.juls.savba.sk).