Diakritik

DIAKRITIK is a tool for reconstructing diacritics in text written without them. It was developed at the SNC and made available on August 18, 2014. It relies on a language model built from a large corpus of Slovak texts.

Reconstruction can use one of the following methods, which differ in their trade-off between error rate and speed:

first	Selects the first reconstruction option found for each word.
random	Where possible, replaces each word with a randomly chosen variant with diacritics.
naïve	For each word, selects its most frequent variant with diacritics.
n-gram	Uses a language model: words are reconstructed in sequences of length n so that the probability of the resulting sentence occurring in Slovak is as high as possible. The higher the n, the better the accuracy, but the greater the computational cost.
remove diacritics	The opposite procedure: the tool removes diacritics from the uploaded text.
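
The methods above can be illustrated with a minimal Python sketch. It is not the DIAKRITIK implementation: the function names, the three-sentence toy corpus, the add-one smoothing and the restriction to n = 2 are all assumptions made for illustration, and the input is assumed to be lowercase and already split into words.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def remove_diacritics(text):
    """'remove diacritics' mode: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Collect candidate variants and unigram/bigram counts from tokenized sentences."""
    candidates = defaultdict(Counter)  # stripped form -> variants with diacritics
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        unigrams[prev] += 1
        for word in sentence:
            candidates[remove_diacritics(word)][word] += 1
            unigrams[word] += 1
            bigrams[(prev, word)] += 1
            prev = word
    return candidates, unigrams, bigrams


def restore_naive(tokens, candidates):
    """'naive' method: pick the most frequent variant of each word, ignoring context."""
    restored = []
    for token in tokens:
        variants = candidates.get(remove_diacritics(token))
        restored.append(variants.most_common(1)[0][0] if variants else token)
    return restored


def restore_bigram(tokens, candidates, unigrams, bigrams):
    """'n-gram' method for n = 2: Viterbi search for the most probable word sequence."""
    vocab = len(unigrams) + 1

    def logp(prev, word):
        # Add-one smoothing so unseen bigrams keep a small non-zero probability.
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

    best = {"<s>": (0.0, [])}  # last word -> (log-probability, best path so far)
    for token in tokens:
        options = list(candidates.get(remove_diacritics(token), {token: 1}))
        new_best = {}
        for word in options:
            score, path = max(
                ((s + logp(prev, word), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus (assumption): "sud" (barrel) is more frequent than "súd" (court).
corpus = [
    ["sud", "je", "plný"],
    ["sud", "je", "prázdny"],
    ["súd", "rozhodol"],
]
candidates, unigrams, bigrams = train(corpus)

print(remove_diacritics("súd je plný"))                  # sud je plny
print(restore_naive(["sud", "rozhodol"], candidates))    # ['sud', 'rozhodol']
print(restore_bigram(["sud", "rozhodol"], candidates, unigrams, bigrams))
# ['súd', 'rozhodol']
```

On this toy data the naive method keeps the more frequent "sud", while the bigram model prefers "súd" because "súd rozhodol" was seen in the corpus. A larger n would need correspondingly more context per search state, which is where the extra computational cost comes from.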

The error rate of the reconstructed text, i.e. the proportion of words with incorrect diacritics, is approximately 0.2%: about one word in five hundred is reconstructed incorrectly. The closer the text is to standard Slovak, the more successful the reconstruction.
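
As a rough illustration of how such a word-level error rate could be measured, the sketch below compares a reconstructed text with a reference word by word. The function name and the placeholder data are assumptions, and the source does not describe how the 0.2% figure was actually obtained.

```python
def word_error_rate(reference, reconstructed):
    """Share of words whose reconstructed form differs from the reference."""
    if len(reference) != len(reconstructed):
        raise ValueError("both texts must contain the same number of words")
    if not reference:
        return 0.0
    wrong = sum(ref != rec for ref, rec in zip(reference, reconstructed))
    return wrong / len(reference)


# Placeholder data: 500 words, exactly one of them reconstructed incorrectly.
reference = ["súd"] * 500
reconstructed = ["súd"] * 499 + ["sud"]
rate = word_error_rate(reference, reconstructed)
print(f"{rate:.1%}")  # 0.2%
```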

To use the DIAKRITIK tool, please click here.
