Corpus lexicography in Russia: recent trends and perspectives
Maria Khokhlova
St.Petersburg State University
Philological Faculty
[email protected]
Prehistory of Russian Corpus Linguistics
Frequency Dictionary of Russian (L.N. Zasorina, 1977): its text database contained about 1 million units. During its compilation, many well-known issues were discussed:
representativeness;
tokenization;
lemmatization...
It was thus the earliest computer corpus of Russian.
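The slide does not describe how the 1977 counts were produced. As a rough illustration only, here is a minimal sketch of where tokenization and lemmatization decisions enter when a frequency list is compiled from raw text. The regex tokenizer, the function names `tokenize` and `frequency_list`, and the sample sentence are hypothetical; no lemmatizer is used, so word forms rather than lemmas are counted. This is not Zasorina's procedure.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption for illustration): a token is a run
# of Cyrillic or Latin letters, optionally joined by hyphens ("кто-то").
# A real corpus project must also decide on numbers, abbreviations, etc.
TOKEN_RE = re.compile(r"[А-Яа-яЁёA-Za-z]+(?:-[А-Яа-яЁёA-Za-z]+)*")


def tokenize(text: str) -> list[str]:
    """Split raw text into lower-cased word tokens."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word form, count) pairs, most frequent first.

    Counts word forms, not lemmas: without a lemmatizer,
    "дом" and "доме" are counted as different items.
    """
    counts = Counter(tokenize(text))
    # Sort by descending frequency, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = "Кто-то вошёл в дом. В доме было тихо, и дом спал."
    for form, n in frequency_list(sample):
        print(form, n)
```

In the sample, "дом" and "доме" get separate entries, which is exactly the gap a lemmatization step would close; treating "кто-то" as one token rather than two is a tokenization decision of the kind the slide lists.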
Prehistory of Russian Corpus Linguistics
«Computer Fund of the Russian Language»
Idea: Acad. Andrey Yershov
Andrey Petrovich Yershov (1931-1988)
Jeršov A.P. "On methodology of constructing dialogue systems: the phenomenon of business prose" (1978)
The idea was formulated as follows: "Any progress in the field of constructing models and algorithms will remain a purely academic exercise, unless a most important problem of creating a Computer fund of the Russian language is solved. We hope that creation of such a Computer fund by linguists, qualified for the task, will precede construction of large systems for application purposes. This would minimize labour costs and simultaneously would protect the Russian language from arbitrary and incompetent intervention."
Russian Corpora (1)
The Uppsala Russian Corpus (1960s), the earliest corpus
The Tübingen Russian Corpus (University of Tübingen, 1999–2004, under the guidance of T. Berger)
The HANCO corpus (Helsinki Annotated Corpus), Helsinki University, Slavic and Baltic Languages Department (2001-2004, A. Mustajoki, M. Kopotev). It is a small teaching corpus with morphological and syntactic annotation.
Russian Corpora (2)
Three large corpora of Russian:
The National Corpus of the Russian Language (NCRL, about 364 million words) (http://ruscorpora.ru)
Corpora at the University of Leeds created by S. Sharoff (about 2,000 million words) (http://corpus.leeds.ac.uk/ruscorpora.html)
A corpus of Russian fiction compiled by the Automatic Text Processing (AOT) initiative team, 680 million words (http://aot.ru).
Russian National Corpus (1)
Over 364 million words
Based on Yandex Search:
Search by exact form(s);
Lexico-grammatical search (see www.yandex.ru – Advanced Search and www.ruscorpora.ru – Search in the Corpus).
Additional options:
morphological features;
semantic features;
metadata.
Russian National Corpus (2)
Subcorpora: Modern Russian corpus, Diachronic corpus (the Church Slavonic language), Syntactic corpus, Spoken corpus, News corpus, Parallel corpora, Poetic corpus, Dialect corpus, Speech corpus, Multimodal corpus.
Dictionaries based on the Russian National Corpus
Grammatical Dictionary of Russian Neologisms;
New Frequency Dictionary of Russian;
The Combinatory Dictionary of Russian Intensifiers;
The Verbal Combinatory Dictionary of Russian Abstract Nouns
http://dict.lang.ru
AOT (1)
AOT (2)
Russian Corpora (Leeds University, Serge Sharoff)
Russian Reference Corpus;
Russian Reference Corpus, another version;
Russian Fiction (disambiguated);
Russian Newspapers;
Russian Internet Corpus;
Russian National Corpus…
Collocations
St.Petersburg Corpus of Hagiographic Texts
Biographies of saints and holy people;
50 manuscripts; 500,000 tokens
http://project.phil.spbu.ru/scat/page.php?page=project
The Fundamental Digital Library of Russian Literature and Folklore
FEB-web accumulates information in text, audio, visual, and other forms on 11th–20th-century Russian literature, Russian folklore, and the history of Russian literary scholarship and folklore studies.
Conference “Corpus Linguistics”
2002, 2004, 2006, 2008, 2011, 2013 (late June)
Saint Petersburg, St.Petersburg State University, Department of Mathematical Linguistics
Thank you for your attention!