The pilot version of the manually annotated Literary Author Corpus, lak-1.0, was created in 2024. It was made available on 29 January 2025 and contains 177,938 tokens (140,536 words).
The corpus project builds on:
- the intersection of literary scholarship and corpus technologies, referred to as Digital Literary Studies;
- the korpusprozy.cz project developed by Richard Změlík (Department of Bohemistics, Faculty of Arts, Palacký University in Olomouc), which is based on the typology of narrators and other narrative phenomena defined in the diachronic narrative poetics of Alice Jedličková and team;
- the structuralist-oriented theory of narrative semantics by Lubomír Doležel and others.
The pilot version of the LAK corpus consists of three texts by the leading Slovak writer Pavel Vilikovský (1941–2020): Forever is Green… (“Večne je zelený”, 1989), A Pedestrian Story (“Peší príbeh”, 1992), and The Last Horse of Pompeii (“Posledný kôň Pompejí”, 2001).
The texts in the corpus carry style-genre annotation and are automatically lemmatized and morphologically annotated using the MorphoDiTa tagger.
For the purposes of basic literary (narratological) annotation, a tagset has been developed that contains eight tags reflecting three literary annotation keys: narrator, direct speech, and embedded structures.
The keys take the following values:
- narrator: omni (omniscient), pers (personal), char (character, narrator-character) and rhet (rhetorical);
- direct speech: dirs (marked direct speech) and dirs_wq (direct speech without quotation, unmarked direct speech);
- embedded structures: inst (inserted text) and inss (inserted story).
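The tagset above can be written down as a small lookup table, which makes the count of eight tags easy to check. This is only an illustrative sketch: the dictionary layout and the helper name `key_of` are assumptions made for the example and are not part of the corpus tooling.

```python
# Tagset as described in the text: three annotation keys, eight tags in total.
# The dictionary layout and the helper `key_of` are illustrative assumptions.
TAGSET = {
    "narrator": {
        "omni": "omniscient",
        "pers": "personal",
        "char": "character (narrator-character)",
        "rhet": "rhetorical",
    },
    "direct speech": {
        "dirs": "marked direct speech",
        "dirs_wq": "direct speech without quotation (unmarked)",
    },
    "embedded structures": {
        "inst": "inserted text",
        "inss": "inserted story",
    },
}


def key_of(tag: str) -> str:
    """Return the annotation key a tag belongs to, or raise ValueError."""
    for key, values in TAGSET.items():
        if tag in values:
            return key
    raise ValueError(f"unknown tag: {tag!r}")


# Total number of tags across all keys (4 + 2 + 2).
TOTAL_TAGS = sum(len(values) for values in TAGSET.values())
```

For example, `key_of("dirs_wq")` returns `"direct speech"`, while a tag outside the tagset raises `ValueError`.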
The tags of the literary categories appear as structural markers in the corpus. Key values can be displayed as references and quantified, opening up new possibilities for computer-aided literary analysis.
The current version of the corpus can be searched by lemma, word form, substring, phrase, or CQL expression.
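As a rough illustration of how a CQL search might combine a token condition with one of the literary structures, the sketch below builds a query string of the `... within <structure attribute="value" />` form used by Sketch Engine-style CQL. The structure name `narrator` and its attribute `type` are hypothetical; the names actually used in the corpus may differ, so they should be checked in the corpus interface first.

```python
def cql_within(attr: str, value: str,
               structure: str, struct_attr: str, struct_value: str) -> str:
    """Build a CQL string matching one token inside a given structure.

    All names passed in are supplied by the caller; nothing here is
    verified against the actual corpus configuration.
    """
    token = f'[{attr}="{value}"]'
    return f'{token} within <{structure} {struct_attr}="{struct_value}" />'


# Hypothetical example: the lemma "kôň" inside passages told by an
# omniscient narrator. "narrator" and "type" are assumed names.
query = cql_within("lemma", "kôň", "narrator", "type", "omni")
print(query)
```

Keeping query construction in a small helper makes it easy to swap in the real structure and attribute names once they are known.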
Following the example of the Czech project, the author corpus is to be extended in the near future with authors and texts from other periods and with named-entity annotation, which will in turn support diachronically oriented and more exact literary analysis.