Second version of the Literary Author Corpus released

The second version of the Slovak literary corpus LAK (lak-2.0) contains 379,625 tokens (295,973 words). The corpus currently comprises six works of fiction:

  • four books by the prominent Slovak writer Pavel Vilikovský (1941–2020): Večne je zelený (Forever is Green…), Peší príbeh (A Pedestrian Story), Krutý strojvodca (The Cruel Train Driver), and Posledný kôň Pompejí (The Last Horse of Pompeii);
  • two books by the classic Slovak writer Martin Kukučín (1860–1928): Keď báčik z Chochoľova umrie (When the Old Man from Chochoľov Dies) and Dom v stráni (The House on the Slope).

In the future, the corpus will be expanded to include additional authors and will be used on an ongoing basis to analyze the processed texts, for example by identifying stylistic features of individual characters and narrators or by analyzing motifs and their collocations. With the addition of new authors, it will also be possible to conduct diachronic research, for example on types of narrators or on other stylistic, linguistic, and literary phenomena in a broader context.

More information can be found here.

The corpus is accessible after free registration.

An instructional video (in Slovak) showing how to work with the corpus is available here.

The corpus is co-financed within the DARIAH-SK consortium.