Morpheme Segmentation Models of the Russian National Corpus

On this page you can test the morpheme segmentation models developed by the team of the Laboratory of Applied Digital Technologies (Novosibirsk State University) in collaboration with the Russian National Corpus.

Try segmentation online!

Please enter the lemma of a word, without spaces, punctuation marks, or digits.

Segmentation: разбор

How does it work?

  • First, we split each word into letters and assigned each letter a BMES label (Begin, Middle, End, Single) combined with the type of the morpheme it belongs to.
  • Then we fine-tuned pretrained BERT-like models on this character-level annotation task for 30 epochs: roberta-small-belarusian for Belarusian, Czert-B-base-cased for Czech, and RuRoberta-large for Russian.
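
The character-level labeling described in the first step can be illustrated with a minimal sketch. It is not the project's actual code: the function names `encode_bmes` and `decode_bmes`, the `POSITION-TYPE` label format, and the morpheme type names `PREF` and `ROOT` are assumptions made for this example, and the segmentation of "разбор" is given only as an illustration.

```python
from typing import List, Tuple

# Hypothetical label format: "<BMES position>-<morpheme type>", e.g. "B-ROOT".
# The real label inventory used by the models may differ.

def encode_bmes(morphemes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (morpheme, type) pairs into per-character (char, label) pairs."""
    labeled = []
    for text, mtype in morphemes:
        if not text:
            continue  # skip empty morphemes (e.g. a zero ending)
        if len(text) == 1:
            positions = ["S"]  # a single-letter morpheme
        else:
            positions = ["B"] + ["M"] * (len(text) - 2) + ["E"]
        for ch, pos in zip(text, positions):
            labeled.append((ch, f"{pos}-{mtype}"))
    return labeled


def decode_bmes(labeled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rebuild (morpheme, type) pairs from per-character labels."""
    morphemes = []
    current = ""
    for ch, label in labeled:
        pos, mtype = label.split("-", 1)
        current += ch
        if pos in ("E", "S"):  # these positions close a morpheme
            morphemes.append((current, mtype))
            current = ""
    return morphemes


# Illustrative segmentation: "разбор" as prefix "раз" + root "бор".
example = [("раз", "PREF"), ("бор", "ROOT")]
labels = encode_bmes(example)
for ch, label in labels:
    print(ch, label)
print("-".join(m for m, _ in decode_bmes(labels)))
```

In this framing, a fine-tuned model predicts one such label per character, and a decoding step like `decode_bmes` turns the predicted label sequence back into morpheme boundaries and types.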

Find out more about our research:


How to cite

Dmitry Morozov, Lizaveta Astapenka, Anna Glazkova, Timur Garipov, and Olga Lyashevskaya. 2025. BERT-like Models for Slavic Morpheme Segmentation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6795–6815, Vienna, Austria. Association for Computational Linguistics.