The WordNet contains semantic relations of the most frequent Lithuanian nouns, verbs, adjectives and adverbs, following the general model of the English WordNet. Lithuanian entries (synsets) are mapped to English and Slovak synsets.
The project is still in the development stage. Currently, the database contains 15 000 synsets. It has been made available to give an insight into the data and processing technologies. The file format may change.
The files are encoded in UTF-8 with Unix line endings (LF, \n, U+000A ...). Each synset is a single line consisting of three records separated by the symbol ␞ (U+241E SYMBOL FOR RECORD SEPARATOR). The records are ordered as follows: Lithuanian record, Slovak record and English record.
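The line layout described above can be read with a few lines of Python. This is only a minimal sketch: the function names `split_synset_line` and `read_synsets` and the dictionary keys `lt`, `sk` and `en` are illustrative choices, not part of the distribution, and the file path is whatever file you downloaded.

```python
RECORD_SEPARATOR = "\u241e"  # ␞ SYMBOL FOR RECORD SEPARATOR


def split_synset_line(line):
    """Split one line into its Lithuanian, Slovak and English records."""
    records = line.rstrip("\n").split(RECORD_SEPARATOR)
    if len(records) != 3:
        raise ValueError(f"expected 3 records, found {len(records)}")
    lithuanian, slovak, english = records
    # The keys are illustrative names chosen for this sketch.
    return {"lt": lithuanian, "sk": slovak, "en": english}


def read_synsets(path):
    """Yield one dictionary of raw records per non-empty line of the file."""
    # newline="\n" keeps the LF line endings untranslated.
    with open(path, encoding="utf-8", newline="\n") as handle:
        for line in handle:
            if line.strip():
                yield split_synset_line(line)
```

Lines that do not contain exactly three records raise a `ValueError`, which may be useful for noticing changes while the file format is still unstable.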
Format of the Lithuanian and Slovak records
Each record includes 4 annotations separated by tabs (\t):
Number is a synset identifier.
Part of speech classification:
n for nouns
v for verbs
a for adjectives
r for adverbs.
Words are literals grouped by similarity of meaning. Literals are separated by semicolons; an explanation or further clarification can be given in brackets. A plus sign (+) denotes the semantically ‘most important’ literal in the synset. A minus sign (-) indicates that there is no direct equivalent in the target language. A question mark (?) denotes an unclear synset.
Gloss is an optional comment on the synset; in most cases this annotation remains empty.
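The four tab-separated annotations of a Lithuanian or Slovak record can be unpacked as in the sketch below. The `Record` class and `parse_record` function are hypothetical names introduced for illustration. The sketch assumes that an empty gloss may appear either as an empty fourth field or as a missing one, and that semicolons do not occur inside bracketed explanations. The markers +, - and ? are left untouched inside the literals, because their exact position is not specified here.

```python
from dataclasses import dataclass
from typing import List

# Part-of-speech codes listed in the format description.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


@dataclass
class Record:
    synset_id: str
    pos: str
    literals: List[str]
    gloss: str


def parse_record(record):
    """Parse one Lithuanian or Slovak record into its four annotations."""
    fields = record.split("\t")
    if len(fields) == 3:
        fields.append("")  # assumption: a trailing empty gloss may be omitted
    if len(fields) != 4:
        raise ValueError(f"expected 4 annotations, found {len(fields)}")
    synset_id, pos, words, gloss = (field.strip() for field in fields)
    if pos not in POS_NAMES:
        raise ValueError(f"unknown part of speech: {pos!r}")
    # Naive split on semicolons; markers and bracketed notes stay in the text.
    literals = [w.strip() for w in words.split(";") if w.strip()]
    return Record(synset_id, pos, literals, gloss)
```

The synset identifier is kept as a string so that any leading zeros or non-numeric prefixes survive unchanged.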
A Lithuanian synset can be linked to several English or Slovak synsets.
The Lithuanian WordNet is available under the following licenses: