Corpus of texts from the daily newspaper SME from 2022 to 2023

The corpus sme01-2022az06-2023, made available on 7 July 2023, contains 34,473,536 text units. The texts come from the daily newspaper SME and were published between January 2022 and June 2023.

The corpus includes bibliographical information as well as style and genre annotation, identical to the annotation used in the prim corpus. The texts are lemmatized and morphologically annotated with MorphoDiTa, trained and tuned for the SNC tagset used in the written corpora. The corpus also contains automatically labeled headings (<h1>) and subheadings (<h2>).