Bibliographic, style and genre annotation is an essential part of the primary processing of corpus texts. Information about a text's identity and basic structure is useful for archiving and citing it, for the statistical evaluation of parameters, and for investigating the distribution of language units and language phenomena in particular texts. The annotation is displayed at the bottom of the Bonito client window after right-clicking the desired line in a concordance list. The annotation consists of keys together with values, which can be either free (e.g. author’s name) or chosen from a fixed set (e.g. genre). Some keys refer to the style and genre characteristics of a text. The main categories are type of text (literary, journalistic, professional, live communication), genre (poem, novel, short story, article, etc.) and domain (subject area, e.g. science, law, politics, economy). These categories can be further divided. Other keys provide the bibliographic details of a source and information about the author and text. Here is the list of keys under which you can find the relevant information.
External annotation
External annotation uses a key-value structure. A value is a string of characters that ends at the end of the line, so multi-line values are not possible. Values may be either free (e.g. name of author) or chosen from a specified set (e.g. genre). Optional flags consist of a set of flags separated by commas. Each flag establishes a particular characteristic of a value. The following values have a special meaning (they are not necessarily meaningful for all keys):
- (an empty space or a whitespace)
- the same as „…“. Default value in the automatic annotation. But we suppose it will appear.
- missing key
- has the same value as the undefined key („…” or empty)
- XXX
- unknown value. It cannot be defined, e.g. author’s name in article.
- YYY
- undefinable value. It cannot be defined or has no meaning, e.g. gender of author (in collaborative work), gender of translator (if not a translation).
- MIX
- mixture. Mixed values, e.g. author is a hermaphrodite.
- MSC
- other. If the value is not defined in the set of values, e.g. author is a eunuch.
- TTT
- unknown value which needs to be defined. The annotation must be completed, the value added.
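The special values listed above could be recognised in a processing script roughly as follows. This is only a minimal sketch: the function names are hypothetical, and representing a missing key as `None` is an assumption about how an annotation record is stored, not something the annotation scheme prescribes.

```python
# Special annotation values, transcribed from the list above.
SPECIAL_VALUES = {
    "XXX": "unknown value",
    "YYY": "undefinable value",
    "MIX": "mixture",
    "MSC": "other",
    "TTT": "unknown value which needs to be defined",
}


def is_undefined(value):
    """True for a missing key (None, by assumption) or an empty/whitespace value."""
    return value is None or value.strip() == ""


def is_special(value):
    """True if the value is undefined or one of the special markers."""
    return is_undefined(value) or value.strip() in SPECIAL_VALUES


def needs_completion(value):
    """True only for TTT, the marker for a value that still has to be added."""
    return value is not None and value.strip() == "TTT"
```

A check such as `needs_completion` could be used to list the documents whose annotation is still unfinished.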
Annotation of the bank
Keys are given in the form title (abbreviation). The meaning of each key is described under it, and its possible values are listed unless they are free.
1. Basic keys with free values:
Author (Auth)
- Author’s name. As listed in resources under the standards for bibliographic records.
Origauthor (OrgA)
- Original author’s name.
Translator (Trnr)
- Name of translator. YYY, if not a translation.
Bibliography (Bibl)
- Bibliography.
BOGOCONG (BOGO)
- Multi-letter record of a conglomerate.
Name (Name)
- Name of text.
Origname (OrgN)
- Original name of text (in translation).
Conglomerate (Cong)
- Identification of conglomerate which the text is a part of.
Comment (Comn)
- Comment. It is used to specify or provide more information about the text.
Date (Date)
- Issue date.
Dateorig (OrgD)
- Original issue date (first issue, it might be identical with “Date”), original issue date of translations.
ISBN (ISBN)
- ISBN number.
ISSN (ISSN)
- ISSN number.
SourceId (ScId)
- ID of the document in the archive (remains the same in the bank).
Id (Id)
- Identification code of the document.
Translation (Trnn)
- Determines whether the text has been translated.
Values:
- trn
- translation
- org
- original text
- ftr
- loosely translated, retold text
- YYY
- combination of a translated and original text (e.g. a collection of short stories)
Rhyme (Rhym)
- It indicates whether the text rhymes in the sense of rhythmic binding or is unrhymed.
Values:
- nrh
- unrhymed
- rhy
- rhymed
- MIX
- partially rhymed
Type (Type)
- Text type. This key is important for classifying texts into more homogeneous groups; it divides texts into individual styles.
Values:
- img
- literary (imaginative) text
- inf
- journalistic (informative) text
- prf
- professional text
- liv
- live communication
Subtype (SubT)
- Subtype of the text. Extended values are used to specify the style (Type) of the text more precisely.
| Type and specification according to Subtype | | | |
|---|---|---|---|
| for Type = img | for Type = inf | for Type = prf | for Type = liv |
| (literary (imaginative) text) | (journalistic (informative) text) | (professional text) | (live communication) |
| poe poetry | pub public press | sci scientific literature, articles, journals, university textbooks | spk spoken |
| pro prose | adv advertising materials, advertising | pop popular science, special interest magazines | wri written (Internet, telex if used interactively, communication of speech-impaired people) |
| dra drama | adm administrative texts | txb primary and high school textbooks | |
| | | enc encyclopedia and similar alphabetically arranged works | |
| | | man manuals, operating instructions, recipes,… | |
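The dependency between Type and Subtype can be checked mechanically. The sketch below is illustrative only: the name `subtype_matches_type` is hypothetical, and placing `enc` and `man` under Type `prf` is an assumption based on their meaning.

```python
# Subtype codes allowed for each Type, transcribed from the table above.
# Assumption: enc and man belong to the professional (prf) column.
SUBTYPES_BY_TYPE = {
    "img": {"poe", "pro", "dra"},
    "inf": {"pub", "adv", "adm"},
    "prf": {"sci", "pop", "txb", "enc", "man"},
    "liv": {"spk", "wri"},
}


def subtype_matches_type(text_type, subtype):
    """Return True if the Subtype code is listed under the given Type."""
    return subtype in SUBTYPES_BY_TYPE.get(text_type, set())
```

For example, `subtype_matches_type("img", "poe")` holds, while `subtype_matches_type("img", "sci")` does not, because `sci` is listed only for professional texts.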
Genre (Genr)
- Genre determines further properties of texts; a large set of fixed values is defined. The properties of artistic texts are also determined by Subgenre. There is a close relationship between the Type and Genre keys, which is illustrated in the table:
Genre in individual styles | ||
---|---|---|
for Type = img | for Type = inf | for Type = prf |
(literary (imaginative) text) | (journalistic (informative) text) | (professional text) |
ver verse | doc (documentary) minutes, protocol, resolution, contract, annual report | mon monograph |
son song, libretto | ann (announce) directive, decree, questionnaire, commercials, announcements, offers | hnd handbook |
scd drama script, drama play | lst (entry-style) lists, programmes, rules, statutes, content, masthead | dis dissertation, rigorous theses |
scf film script, film subtitles | rpt (report) report, interview, announcement, communique | ins instruction |
scr radio script | anl (analytic) editorial, comment, gloss, review, criticism, discussion, polemic, debate, caricature | dpl diploma, bachelor and final works |
nov novel | pbb (belles-lettres) feuilleton, report, feature, column | std study |
col short story, collection of short stories | spc speeches (political, occasional) | abs abstract |
ess essay | dsc discussion/polemic/debate paper | tcl article |
dia dialogues | rfl reflection | |
mem memoirs, biographies, autobiographies | ref paper, term paper | |
let letters | lct lecture | |
chr chronicle | crs characteristics | |
sen short epic genres (sayings, quotes, aphorisms, jokes, etc.) | crt short epic genres (quotes, aphorisms etc.) | |
fac non-fiction | opn opinion |
Subgenre (SubG)
- Subgenre is used with Genre nov, col, ver and fac; for Genre ver and fac it is optional.
Values:
- crm
- crime, detective
- scf
- sci-fi, fantasy, mystery
- adn
- adventure, westerns
- rms
- romance novels
- bel
- belles lettres
- jun
- junior literature
- trv
- travel literature
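The rule linking Genre and Subgenre could be checked as in the following sketch. The function name `check_subgenre` is hypothetical, an absent Subgenre is represented as `None`, and reading the rule as "Subgenre is expected for nov and col" is an assumption; the special values (XXX, YYY, etc.) are not handled here.

```python
# Subgenre codes, transcribed from the list above.
SUBGENRES = {"crm", "scf", "adn", "rms", "bel", "jun", "trv"}

# Assumption: Subgenre is expected for nov and col, optional for ver and fac.
SUBGENRE_EXPECTED = {"nov", "col"}
SUBGENRE_OPTIONAL = {"ver", "fac"}


def check_subgenre(genre, subgenre):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if subgenre is None:
        if genre in SUBGENRE_EXPECTED:
            problems.append("Subgenre expected for Genre " + genre)
        return problems
    if subgenre not in SUBGENRES:
        problems.append("unknown Subgenre " + subgenre)
    if genre not in SUBGENRE_EXPECTED | SUBGENRE_OPTIONAL:
        problems.append("Subgenre is not used with Genre " + genre)
    return problems
```

Returning a list of problems rather than a single boolean makes it possible to report every inconsistency in a record at once.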
Domain (Domn)
- Domain, thematic area (activities or knowledge).
Values:
- ars
- artistic science
- hum
- human science
- law
- law
- nat
- natural science
- tec
- technology
- ecn
- economy, management
- blf
- belief, supernatural
- lif
- life style
- ins
- interdisciplinary science
- plt
- politics
- gov
- state and public administration
Subdomain (SubD)
- Subdomain — a more detailed definition of a thematic, professional area.
For Domain “ins” there is no Subdomain.
| The scope of relationships between Domain and Subdomain | | | | |
|---|---|---|---|---|
| for Domain = ars | for Domain = hum | for Domain = law | for Domain = nat | for Domain = tec |
| mus music, opera, operetta, ballet | his history, archeology | bil bills, statutes, regulations | agr agriculture | tra transport, lines, telecommunication |
| cin cinema, film | psy psychology | jud judicatures | med medicine | ene energetics |
| arc architecture | edu education | jur jurisdiction (other legal texts) | pha pharmacy | ind industry |
| art art, photos, sculpture | soc sociology, communication | | zoo zoology | com computer science |
| the theatre, theatre studies and criticism | phi philosophy, aesthetics | | bot botany | bui building industry |
| lit literature, literature science and criticism | inf library science and information sources | | bio biology | sta normalization, standardization |
| | pol political science | | che chemistry | |
| | lin linguistics | | mat mathematics | |
| | eth ethnology, ethnography, anthropology | | ggr geography | |
| | cul cultural science | | phy physics (including astronomy) | |
| | swo social work | | met meteorology | |
| | mec mass media and marketing communication, media, advertising | | geo geology | |
| | | | env environmental studies, ecology | |
| for Domain = ecn | for Domain = blf | for Domain = lif | for Domain = plt | for Domain = gov |
| eco economy, banking, business | rel religion, belief, sects | hou household (flat, garden, handicraft, kitchen, breeding) | reg (optional) region | uso central authorities; institutions, centers and businesses with nationwide scope |
| mng management, control | teo theology | fsh clothing, fashion | | sam local government and self-government bodies |
| mer merchandising, consumer area | exc the supernatural, occult, magic, astrology | spo sport | | tvs professional texts on public administration and self-government |
| | | sct social life | | |
| | | amu amusement, games, hobbies, free time, travelling | | |
| | | min ethnic minorities | | |
| | | reg region | | |
| | | cnl counselling | | |
| | | clt culture | | |
Medium (Medi)
- Medium, refers to the data carrier or text source.
Values:
- lib
- book
- ebk
- e-book
- nws
- newspaper
- jou
- journal
- ste
- study materials
- net
- the Internet and other (pre-internet) networks. These include specific Internet newspapers, websites, e-mail, usenet contributions, contributions to fora, and live communication. Note that print newspapers downloaded from the Internet are „nws“, electronic books intended primarily for publishing are „lib“, but the e-books primarily intended for on-screen viewing are „net“.
- for
- form
- occ
- occasional (miscellanies)
- npu
- unpublished texts, manuscripts
- tvf
- television, cinema
- rad
- radio
Authsex (AutS)
- Sex of author.
Values:
- msc
- masculine
- fem
- feminine
Transsex (TrnS)
- Sex of translator, see Authsex.
Varieta (Vari)
- Language variant of the document, mostly Slovak.
Values:
- std
- standard Slovak
- nst
- non-standard Slovak
- ost
- old standard / before the orthography reform in 1953
Paragraphs (Para)
- Determines the text division.
Values:
- tru
- true; text divided into paragraphs
- fls
- false; information on text division lost
Emphasis (Emph)
- Indicates whether highlighting from the original text is present.
Values:
- tru
- true
- fls
- false
Diacritics (Dcrt)
- Text with correct or incorrect diacritics.
Values:
- tru
- true; correct diacritic marks
- fls
- false; incorrect or missing diacritic marks
Corrected (Corr)
- Document corrected or not.
Values:
- tru
- yes
- fls
- no
License (Lice)
- Type of licence.
Lang (Lang)
- Language of the work, given as a three-letter ISO 639-2 code. It is always Slovak.
Origlang (OrgL)
- Original language of the work according to ISO 639-3. Translations of already translated texts are marked with „>“, for example: eng>ger.
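An Origlang value that uses the „>“ notation can be split into its individual language codes. The helper below is a minimal sketch with a hypothetical name; it only separates the codes and does not interpret the direction of the translation chain.

```python
def split_origlang(value):
    """Split an Origlang value such as 'eng>ger' into its language codes."""
    return [code.strip() for code in value.split(">") if code.strip()]


print(split_origlang("eng>ger"))  # prints ['eng', 'ger']
print(split_origlang("eng"))      # prints ['eng']
```

A result with more than one code indicates a translation of an already translated text.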