RNC News

Since August, the new version of the Russian National Corpus has been the only one available for searching the entire corpus. The old version of the corpus has been closed.

The Russian and English-Russian multimedia parallel corpora have been improved, and a number of minor errors in these corpora have been fixed.

More search results can now be downloaded in Excel format from the Main and Media corpora. Up to 5,000 examples can be saved to an Excel table, regardless of how the search results are customized.

In the Multimedia corpus, detailed gesture queries (specifying active vs. passive organs) have been enabled, and some other search errors have been fixed.

The main corpus has reached 375 million tokens. It has been updated with new texts, including but not limited to: diaries and memoirs of the 19th-21st centuries from the «Prozhito» project; pre-revolutionary fiction, journalism and private letters in both old and modern orthography, including mass literature; post-1917 and contemporary prose; a collection of tourist guides; a collection of different academic genres (abstracts, programs, textbooks, problems); and a collection of technical guides and instructions.

The Old East Slavic corpus is now sortable by date, including the date of the manuscript, and by genre.

The RNC website has been redesigned. The start page and the pages with general information on the Corpus are now displayed with a new interface. The project description has been revised and updated. Current information on the structure and composition of the subcorpora and other pages is now available. A FAQ section has been added, explaining the main features of the Corpus.

The English version of the site has also been partially updated. The new website is fully adapted for mobile devices.

The search query and search results pages have not been redesigned yet. All of them will gradually switch to the new interface. Please use the new version of the site and feel free to send us feedback on any errors you notice.

The Old East Slavic corpus has been updated and now contains 655 thousand tokens. It includes texts of the 11th-14th centuries, representing a variety of genres. They include such famous works as the Lives of Boris and Gleb, The Testament of Vladimir Monomakh and The Tale of Igor's Campaign, as well as other hagiographic, didactic and canonical texts. A collection of Old Novgorod business documents (gramoty), on both parchment and paper, has been added. The Old East Slavic metatextual information now contains the date of the text and the date of the surviving copy.

The corpus of birchbark letters is now a parallel corpus: it presents original texts aligned with their translations into Russian and English.

The poetry corpus has also been updated and now contains 13 million tokens. The update includes poems by A. Vertinsky, G. Sapgir and others.

The parallel corpus now contains almost 163 million words. Two new language pairs have been added: Portuguese-Russian and Romanian-Russian. The Finnish-Russian text collection has been significantly expanded and now includes translations of fiction and journalistic texts, as well as the corpus of international treaties (we thank Mikhail Mikhailov, who provided the texts). The collections of English and German texts in Russian translations have also been expanded.

Within the spoken corpus, a new search field 'Region' is now available.
Within the Old East Slavic corpus, it is now possible to search homonyms by semantics. In the Middle Russian corpus, a suggestion list has been attached to the Lemma field.