Morphology
The morphological standard of the RNC
The presentation of morphological information (part of speech, gender, case, aspect, etc.) in the Corpus is mainly based on the morphological model proposed by Zalizniak in the Grammatical Dictionary of Russian (Moscow, 1977; 4th ed. Moscow, 2003). However, because the Corpus is meant to serve as a universal tool for language research, some cases call for different solutions; every departure from Zalizniak's model is motivated by such requirements.
The structure of morphological information
Morphological information assigned to a wordform consists of four fields, or groups of tags:
- Lexeme (a dictionary form of the lexeme and the part of speech to which it belongs)
- A set of the lexeme's grammatical features, known as word-classifying features (for example, gender for nouns and transitivity for verbs)
- A set of the wordform's grammatical features, known as word-altering features (for example, case for nouns and number for verbs)
- Information about the non-standard character of the wordform, orthographic variation, etc.
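The four fields above can be pictured as a small record. The following is only a sketch: the class name `Analysis` and its attribute names are hypothetical and do not reflect the Corpus's internal format; only the tag values (`S`, `m`, `inan`, `sg`, `dat`, `normal`) come from the inventory on this page.

```python
from dataclasses import dataclass

@dataclass
class Analysis:
    """One morphological analysis of a wordform (hypothetical layout)."""
    lemma: str                                # dictionary form of the lexeme
    pos: str                                  # part-of-speech tag, e.g. "S"
    lexeme_features: frozenset = frozenset()  # word-classifying features
    form_features: frozenset = frozenset()    # word-altering features
    flags: frozenset = frozenset()            # markers of non-standard forms

# The wordform "столу": dative singular of "стол", a masculine inanimate noun.
stolu = Analysis(
    lemma="стол",
    pos="S",
    lexeme_features=frozenset({"m", "inan"}),
    form_features=frozenset({"sg", "dat"}),
    flags=frozenset({"normal"}),
)
```

Keeping the lexeme-level and wordform-level features in separate sets mirrors the distinction between word-classifying and word-altering features.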
The morphological analysis (or analyses) assigned to a wordform in the search results is displayed as a tooltip when the mouse cursor hovers over the wordform. In the disambiguated corpus the full analysis is displayed; in the rest of the corpus the lexeme and the part of speech are displayed.
The metalanguage of the grammatical features is based on a set of tags, designed with a foreign audience in mind. It is also possible to search using traditional Russian names of grammatical categories.
The following is the inventory of grammatical tags used in the Corpus, with examples in brackets.
Parts of speech
S – noun
A – adjective
NUM – numeral
A-NUM – numeral adjective
V – verb
ADV – adverb
PRAEDIC – predicative (жаль, хорошо, пора)
PARENTH – parenthesis (кстати, по-моему)
S-PRO – pronoun (она, что)
A-PRO – adjectival pronoun (который, твой)
ADV-PRO – adverbial pronoun (где, вот)
PRAEDIC-PRO – predicative pronoun (некого, нечего)
PR – preposition (под, напротив)
CONJ – conjunction (и, чтобы)
PART – particle (бы, же, пусть)
INTJ – interjection (увы, батюшки)
Grammatical categories:
Gender
m – masculine (работник, стол)
f – feminine (работница, табуретка)
m-f – common (задира, пьяница)
n – neuter (животное, озеро)
Animacy
anim – animate (человек, ангел, утопленник)
inan – inanimate (рука, облако, культура)
Number:
sg – singular (яблоко, гордость)
pl – plural (яблоки, ножницы, детишки)
Case:
nom – nominative (голова, сын, степь, сани, который)
gen – genitive (головы, сына, степи, саней, которого)
dat – dative (голове, сыну, степи, саням, которому)
acc – accusative (голову, сына, степь, сани, который/которого)
ins – instrumental (головой, сыном, степью, санями, которым)
loc – locative ([о] голове, сыне, степи, санях, котором)
gen2 – second genitive (чашка чаю)
acc2 – second accusative (постричься в монахи; по два человека)
loc2 – second locative (в лесу, на оси́)
voc – vocative (Господи, Серёж, ребят)
adnum – “count form”, or adnumerative (два часа́, три шара́)
Short/Full form:
brev – short form (высок, нежна, прочны, рад)
plen – full form (высокий, нежная, прочные, морской)
Degree:
comp – comparative (глубже)
comp2 – prefix по- + comparative (поглубже)
supr – superlative (глубочайший)
Aspect:
pf – perfective (пошёл, встречу)
ipf – imperfective (ходил, встречаю)
Transitivity:
intr – intransitive (ходить, вариться)
tran – transitive (вести, варить)
Voice:
act – active (разрушил, разрушивший)
pass – passive (adjectival participles only: разрушаемый, разрушенный)
med – middle (verbs ending in -ся: разрушился)
Verb form:
inf – infinitive (украшать)
partcp – participle (украшенный)
ger – gerund (украшая)
Mood:
indic – indicative (украшаю, украшал, украшу)
imper – imperative (украшай)
imper2 – 1st person plural imperative ending in -те (идемте)
Tense:
praet – past (украшали, украшавший, украсив)
praes – present (украшаем, украшающий, украшая)
fut – future (украсим)
Person:
1p – first person (украшаю)
2p – second person (украшаешь)
3p – third person (украшает)
Other features:
persn – first name (Иван, Дарья, Леопольд, Эстер, Гомер, Маугли)
patrn – patronymic (Иванович, Павловна)
famn – family name (Николаев, Волконская, Гумбольдт)
0 – indeclinable (шоссе, Седых)
A number of these tags, namely second accusative, vocative, count form, prefix по- + comparative, common gender, transitivity, and indeclinability, are only available for the disambiguated corpus.
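A query that uses any of these tags can only succeed in the disambiguated corpus, so it may be useful to check for them in advance. The following is a minimal sketch: the set is built from the tag inventory above, while the function name `restricted_tags` and the idea of passing a query as a list of tag strings are assumptions made for illustration.

```python
# Tags that, per the note above, exist only in the disambiguated corpus:
# second accusative, vocative, count form, по- + comparative,
# common gender, transitivity (both values), indeclinability.
DISAMBIGUATED_ONLY = {"acc2", "voc", "adnum", "comp2", "m-f", "intr", "tran", "0"}

def restricted_tags(query_tags):
    """Return the tags in query_tags that exist only in the disambiguated corpus."""
    return sorted(set(query_tags) & DISAMBIGUATED_ONLY)

print(restricted_tags(["S", "voc", "sg"]))   # ['voc']
print(restricted_tags(["V", "ipf", "inf"]))  # []
```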
Multiple analyses
In certain cases the tagging will show multiple morphological analyses for one wordform. Such cases are:
- Adjectives that coincide with participles (открытый), where both the adjective lexeme (открытый) and the verb (открыть) are suggested
- Cases where the context does not allow an unambiguous choice of lexeme or grammatical meaning (не видел родного отца – gen/acc, манекену – anim/inan, спазмами – lexemes спазм/спазма)
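One way to work with a wordform that carries several analyses is to treat it as matching a query when at least one of its analyses fits. The sketch below assumes exactly that rule; the dictionary layout, the function name `matches`, and the specific feature sets given for открытый are illustrative assumptions, not the Corpus's actual output.

```python
# Two competing analyses of the wordform "открытый" (illustrative feature sets).
analyses = [
    {"lemma": "открытый", "pos": "A",
     "features": {"plen", "m", "sg", "nom"}},
    {"lemma": "открыть", "pos": "V",
     "features": {"partcp", "pass", "praet", "pf", "plen", "m", "sg", "nom"}},
]

def matches(analyses, pos=None, required=frozenset()):
    """True if at least one analysis has the given POS and all required tags."""
    return any(
        (pos is None or a["pos"] == pos) and set(required) <= a["features"]
        for a in analyses
    )

print(matches(analyses, pos="V", required={"partcp"}))  # True
print(matches(analyses, pos="A"))                       # True
print(matches(analyses, pos="S"))                       # False
```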
Nonstandard forms
The disambiguated Corpus employs a number of tags to signal nonstandard or peculiar wordforms. The absence of such distinguishing features is marked with the tag 'normal'.
anom (Anomalous form) – various morphological anomalies, possible in old or colloquial, non-literary forms (три дни instead of the standard три дня, ляжь instead of the standard ляг)
distort (Distorted form) – orthographic and/or phonetic distortion of a word, often used to show peculiarities of pronunciation (дэвушка, това'ищи, про-хо-ди, низнаю)
ciph (Numeral recording) – notation of a numeral, a numeral adjective or an adjective written (fully or partly) in figures (73, LXXIII, 73-й, 22-летний). In such cases wordforms are assigned to a count form lexeme; number and case are only displayed where an ending is recorded (as in 14-му)
INIT (Initials) – notations of the type “capital letter and a dot” (M., P.). The initials are not expanded in the lexeme field; no grammatical features are given.
abbr (Abbreviation) – an abbreviated notation (тов., гг., ч.). In the lexeme field the abbreviation is expanded (except for initials), and a grammatical form is supplied according to the context. Acronyms such as ООН, вуз and shortened words like зав, зам, which are written without a dot and are not expanded when read aloud, do not receive the abbr tag and are treated like normal words (declinable or indeclinable).
In addition, the non-disambiguated Corpus uses a special tag for non-dictionary forms (forms not included in the dictionary of the parser but derived by analogy). As the dictionary is updated the occurrence of these forms will decrease. To lower the amount of “noise” in searches in the non-disambiguated corpus it may be advisable to exclude these forms from the search; for some tasks, however, the search may be limited to such forms entirely.
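Both strategies (excluding non-dictionary forms, or restricting the search to them) amount to a simple filter over the hits. The sketch below is purely illustrative: this description does not name the tag, so `NON_DICTIONARY = "nondict"` is a hypothetical placeholder, and the layout of a hit as a `(wordform, tags)` pair is likewise an assumption.

```python
# Hypothetical placeholder: the actual tag name is not given in this description.
NON_DICTIONARY = "nondict"

def filter_hits(hits, mode="exclude"):
    """Filter search hits by the non-dictionary marker.

    hits: list of (wordform, set_of_tags) pairs (an assumed layout).
    mode: "exclude" drops non-dictionary forms; "only" keeps just those forms.
    """
    if mode not in ("exclude", "only"):
        raise ValueError("mode must be 'exclude' or 'only'")
    keep_flagged = mode == "only"
    return [h for h in hits if (NON_DICTIONARY in h[1]) == keep_flagged]

hits = [
    ("стол", {"S", "m", "inan", "sg", "nom"}),
    ("гуглить", {"V", "ipf", "inf", NON_DICTIONARY}),
]

print([w for w, _ in filter_hits(hits)])          # ['стол']
print([w for w, _ in filter_hits(hits, "only")])  # ['гуглить']
```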