CrossLexica: A Large Electronic Dictionary of Collocations and Semantic Links in Russian Igor A. Bolshakov National Polytechnic Institute Mexico City, Mexico

Synopsis

CrossLexica, an electronic dictionary of collocations and semantic links in Russian, has been developed with special emphasis on collocations. It contains a vocabulary of 185,000 entries and a matrix of classified links between these entries. As many as 1.75 million nonempty syntagmatic links represent the same number of collocations, so CrossLexica exceeds any other monolingual dictionary in volume.

CrossLexica’s structure restores and outputs a collocation in its correct grammatical form when a query contains either of its collocates. Thus the problem of two-sided data inversion is easily solved.

CrossLexica generates plausible missing collocations from the available collocates, controls the order and the recall of the output to the screen, and (if needed) filters out unwanted stylistic elements.

The main operational mode is interactive, primarily for creating and editing Russian texts. CrossLexica also has a special external interface for numerous non-interactive applications that available linguistic tools handle poorly or not at all.

Exposition plan
Topic domains covered
General features and features of entries
Various types of links between entries
Collocations
The most frequent collocates
Semantic links
Semantic links support collocations
Other linguistic resources
Tags of idiomaticity and colloquialism
User’s options
Interactive applications
Non-interactive applications
An example of a delivery to the screen
An expert’s opinion on CrossLexica
Dreams about CrossLexica’s future

Topic domains covered
Economy and business
Politics and political science
Engineering and technology (electronics, computers, programming, cars, construction, etc.)
Exact, hard, and natural sciences (mathematics, physics, chemistry, biology, geology, geography, etc.)
Humanities, arts, and religion (linguistics, history, religious denominations, etc.)
Medicine (mainly everyday medicine)
Colloquial language (many purely colloquial and abusive words and expressions)

General features
Total vocabulary size: 185,000 entries
By part of speech: nouns 31%, verbs 21%, adjectives 27%, adverbs 21%
Including 2,300 homonymous groups, with a total of 5,400 different senses
Total number of collocations: 1.75 million
Total number of semantic links: 2 million
Paronymous links of two types: 200,000

Dictionary entries may be of the following types:

Noun entry (nouns: separate entries for singular and plural)

Verb entry (personal forms + infinitive: perfective and imperfective aspects separately)

Adjective entry (adjectives or participles: the two aspects separately)

Adverb entry (adverbs or gerunds: the two aspects separately)

Auxiliary words (prepositions, conjunctions) are built into collocations and usually do not have entries of their own. Predicative utterances like а пошел ты ‘go to hell’ are considered adverbials.

A noun entry may describe

An individual noun: аббревиатура ‘abbreviation’, аберрация ‘aberration’, абзац ‘paragraph’, битва ‘battle’, бифштекс ‘steak’, блага ‘goods’,...

A stable noun group: алкогольные напитки ‘alcoholic drinks’, сельское хозяйство ‘agriculture’, точка зрения ‘point of view’, уровень жизни ‘standard of living’, болеутоляющие средства ‘analgesics’...

A verb entry may describe

An individual verb: говорить ‘to speak’, идти ‘to go’, обсуждать ‘to discuss’, спать ‘to sleep’,...

A verb with a reflexive pronoun: вести себя ‘to conduct oneself’, чувствовать себя ‘to feel (well, ill, etc.)’,...

A verb group: наводить страх ‘to strike fear’, оказывать внимание ‘to pay attention’, испытывать стремление ‘to aspire’...

An adjective entry may describe

An individual adjective: абстрактный ‘abstract’, автономный ‘autonomous’, воздушно-реактивный ‘aerojet’...

An individual participle (possibly adjectivized): агглютинирующий ‘agglutinating’, агонизирующий ‘agonizing’, вдвинутый ‘moved in’, возимый ‘being carried’, коррумпированный ‘corrupt’...

An adjective group: хорошо одетый ‘well-dressed’, большой дальности ‘of long range’, бросающийся в глаза ‘conspicuous’, бывший в употреблении ‘second-hand’, из ряда вон выходящий ‘outstanding’, с маслом ‘dressed with oil’, как бархат ‘like velvet’, как сталь ‘like steel’, без правил ‘without rules’, большого ума ‘of great wisdom’...

An adverb entry may describe

An individual adverb: абсолютно ‘absolutely’, абстрактно ‘in an abstract way’, аляповато ‘garishly’, быстро ‘quickly’, долго ‘for a long time’, плохо ‘badly’, по-хорошему ‘in an amicable way’, удовлетворительно ‘satisfactorily’...

An individual (specifically Russian) gerund: базируясь ‘(while) basing’, надев ‘(after) putting on’, удовлетворившись ‘(after) being satisfied’...

An adverb group: аккуратным образом ‘in an accurate way’, без воодушевления ‘without enthusiasm’, более или менее ‘more or less’, как выжатый лимон ‘like a squeezed lemon’, как лед ‘like ice’, в особой степени ‘to a high degree’, куда попало ‘anywhere’, на цыпочках ‘on tiptoe’, долгое время ‘for a long time’...

Links between entries are divided into

Syntagmatic links (= collocations = word combinations): думать о былом ‘to think about the past’, самолет садится ‘the plane is landing’, хорошо поесть ‘to eat well’, очень длинный ‘very long’, сотрудничество с британцами ‘cooperation with Britons’, предельно внимательно ‘extremely attentively’…

Semantic links:
Synonyms: дурак ‘fool’ – болван ‘blockhead’
Semantic derivatives: Москва1 ‘Moscow’ – москвичи ‘Muscovites’...
Part – whole: террариум ‘terrarium’ – зоопарк ‘zoo’
Genus – species: документ ‘document’ – диплом1 ‘diploma’
Antonyms: длинный ‘long’ – короткий ‘short’

Paronymous links (similarity in letters or morphs): кадка – каска, качка...; бег – бегун, бега, пробежка...

Collocations

A collocation is a pair of content words (collocates) that are syntactically linked and stably compatible in meaning.

The syntactic dependency link between collocates can include an auxiliary word (a preposition or a conjunction):

content word1 → (auxiliary word) → content word2

сотрудничество → ради → мира ‘cooperation → for → peace’

Each collocation is accessible from either of its collocates. Hence the number of unilateral links is twice the number of collocations.
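The two-sided access described above can be pictured as an index that stores each collocation once and registers it under both collocates. The following is a minimal Python sketch under our own assumptions: the `Collocation` record, its field names, and the `CollocationIndex` class are hypothetical illustrations, not CrossLexica's internal format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Collocation:
    """Hypothetical record: head -> (optional auxiliary word) -> dependent."""
    head: str                        # head collocate, dictionary form
    dependent: str                   # dependent collocate, dictionary form
    dependent_form: str              # dependent as inflected in the collocation
    auxiliary: Optional[str] = None  # preposition or conjunction, if any

    def render(self) -> str:
        parts = [self.head, self.auxiliary, self.dependent_form]
        return " ".join(p for p in parts if p)


class CollocationIndex:
    """Stores each collocation once and links it from both of its collocates."""

    def __init__(self) -> None:
        self._by_word: dict[str, list[Collocation]] = defaultdict(list)
        self._count = 0

    def add(self, c: Collocation) -> None:
        self._by_word[c.head].append(c)       # unilateral link from the head
        self._by_word[c.dependent].append(c)  # unilateral link from the dependent
        self._count += 1

    def lookup(self, word: str) -> list[str]:
        return [c.render() for c in self._by_word.get(word, [])]

    def unilateral_links(self) -> int:
        return sum(len(v) for v in self._by_word.values())

    def __len__(self) -> int:
        return self._count


index = CollocationIndex()
index.add(Collocation("сотрудничество", "мир", "мира", "ради"))
index.add(Collocation("ходить", "школа", "школу", "в"))
index.add(Collocation("учиться", "школа", "школе", "в"))

print(index.lookup("школа"))   # ['ходить в школу', 'учиться в школе']
print(index.lookup("ходить"))  # ['ходить в школу']
print(len(index), index.unilateral_links())  # 3 6
```

Storing the rendered form of the dependent alongside its dictionary form is one simple way to return a collocation "in its true grammatical form" whichever collocate was queried.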

The most numerous collocation types

Modificative pairs, noun & adjective or verb / adjective / adverb & adverb: краснокочанная капуста ‘red cabbage’, явный наглец ‘an obvious impudent fellow’, резко высказаться ‘to state bluntly’, полностью ясный ‘completely clear’, ужасно рад ‘awfully glad’...

Verb & directly / indirectly / prepositionally complementing noun: рассмотреть вопрос ‘to analyze a problem’, воротить нос ‘to turn up one’s nose’, остаться из-за погоды ‘to stay because of the weather’…

Noun subject & verbal or adjectival predicate: самолет вылетел ‘the plane took off’, внимание привлечено ‘attention was caught’, доклад (был) краток ‘the talk is (was) short’...

Noun & subordinated noun: сердце матери ‘mother’s heart’, отличия в произношении ‘differences in pronunciation’, борьба против терроризма ‘struggle against terrorism’...

Some other types of collocations

Adjective & directly, indirectly, or prepositionally complementing noun: красный от стыда ‘red with shame’, покрытый навозом ‘covered with manure’, согретый солнцем ‘warmed by the sun’, открытый для публики ‘open to the public’...

Verb & complementing infinitive: собраться поехать ‘to prepare to go’, мечтать выкупаться ‘to dream of taking a bath’, хотеть перекусить ‘to wish to have a snack’...

Noun & complementing infinitive: соблазн сказать ‘temptation to say’, желание уйти ‘wish to leave’, проблема выжить ‘the problem of surviving’...

Verb & complementing adjective: быть нормальным ‘to be normal’, вернуться здоровым ‘to return healthy’, найти мертвым ‘to find dead’ …

Stable coordinate pairs: автобусы и троллейбусы ‘buses and trolleybuses’, ясный и четкий ‘clear and well-defined’, экономический и культурный ‘economic and cultural’, быть или не быть ‘to be or not to be’, взвесить и решить ‘to ponder and to decide’, власть и бизнес ‘government and business’, в срок и в полном объеме ‘in time and in full’, наука и техника ‘science and technology’...

The most frequent collocates (1/4)

Nouns with the maximal number of governing verbs:
513 работа1 ‘work’
456 деньги ‘money’
411 ребенок ‘child’
386 место1 ‘place’
374 дом1 ‘house’
366 дети ‘children’
363 руки ‘hands’
337 дело1 ‘business’
327 книга ‘book’
321 дорога ‘way’
302 глаза ‘eyes’
301 город ‘city’

The most frequent collocates (2/4)

Nouns with the maximal number of modifiers:
1173 человек ‘(hu)man’
709 лицо1 ‘face’
549 работа1 ‘work’
539 глаза ‘eyes’
534 женщина ‘woman’
527 взгляд1 ‘look’
510 вид1 ‘view’
506 режим2 ‘mode’
494 голос1 ‘voice’
433 покрытие1 ‘cover’
408 препараты ‘preparations’
400 анализ ‘analysis/test’

The most frequent collocates (3/4)

Verbs with the maximal number of complements:
2284 быть ‘to be’
2185 иметь ‘to have’
1442 находиться ‘to be (located)’
1270 стать ‘to become’
1095 начать ‘to begin’
1068 получить ‘to get’
963 считать ‘to consider’
959 вести ‘to conduct’
951 оказаться ‘to turn out to be’
918 требовать ‘to require’
916 использовать ‘to use’
910 провести ‘to carry out’

The most frequent collocates (4/4)

The most common adjective modifiers:
2943 большой ‘big’
2037 крупный ‘large’
1739 небольшой ‘small’
1592 новый ‘new’
1456 постоянный ‘stable’
1401 полный1 ‘complete’
1309 явный ‘evident’
1198 огромный ‘huge’
1163 многочисленный ‘numerous’
1149 сильный ‘strong’

Semantic links

Synonyms: 19,000 synsets with 5.6 members on average; 1.2 million unilateral links

Semantic derivatives: groups like {извлечение ‘extraction’; извлекать ‘to extract’; извлеченный ‘extracted’, извлекший ‘having extracted’; извлекая ‘while extracting’, по извлечении ‘after extraction’, путем извлечения ‘by extraction’}; 0.9 million unilateral links

Part (or quantifier) vs. whole: 25,000 unilateral links

Genus vs. species: 14,000 unilateral links

Antonyms: 12,000 unilateral links

Semantic links support collocations

Semantic links help the user understand the meaning of entry keywords. Glosses are absent except for homonymous entries, but English translations are intended to be provided throughout.

Collocations missing from the matrix are generated automatically at runtime, based on the synonymy and hyponymy of the available collocates:

(bunch of flowers) & (asters IS_A flowers) ⇒ (bunch of asters)

The correctness of collocations generated this way is not guaranteed, which is indicated by displaying them with low contrast.
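The inference rule above can be sketched as follows: a missing collocation is proposed when the query word is a species of a genus that already has the collocation, or when a synonym of a stored head can take the same dependent. This minimal Python sketch uses toy English data and hypothetical names (`STORED`, `IS_A`, `SYNONYMS`, `collocations_with`); the tables are not CrossLexica's own, and generated items are flagged rather than trusted, mirroring the low-contrast display.

```python
# Toy, hypothetical data: stored collocations as (head, dependent) pairs,
# species -> genus links, and synonyms of collocates.
STORED = {("bunch", "flowers"), ("bunch", "grapes")}
IS_A = {"asters": "flowers", "roses": "flowers"}
SYNONYMS = {"bunch": ["bouquet"]}


def collocations_with(dependent: str) -> dict[tuple[str, str], bool]:
    """Map (head, dependent) -> True if stored, False if generated at runtime."""
    result = {pair: True for pair in STORED if pair[1] == dependent}
    # Hyponymy: (bunch, flowers) & (asters IS_A flowers) => (bunch, asters)
    genus = IS_A.get(dependent)
    for head, dep in STORED:
        if dep == genus:
            result.setdefault((head, dependent), False)
    # Synonymy: a synonym of a head may take the same dependent (one step only)
    for head, dep in list(result):
        for syn in SYNONYMS.get(head, []):
            result.setdefault((syn, dep), False)
    return result


def render(result: dict[tuple[str, str], bool]) -> list[str]:
    """Generated items are wrapped as '(...?)' to stand in for low contrast."""
    lines = []
    for (head, dep), stored in sorted(result.items()):
        text = f"{head} of {dep}"
        lines.append(text if stored else f"({text}?)")
    return lines


print(render(collocations_with("asters")))
# ['(bouquet of asters?)', '(bunch of asters?)']
print(render(collocations_with("flowers")))
# ['(bouquet of flowers?)', 'bunch of flowers']
```

`setdefault` ensures a stored collocation is never downgraded to "generated" by a later inference step.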

Other linguistic resources

Literal paronyms: кадка: кака, каска, качка, кашка, кладка

Morphemic paronyms (бег is the common stem): бегающий, беглый, беговой, бегучий,...

Morphological paradigms for nearly all inflected keywords

English translations of Russian vocabulary entries, which, taken together, form a separate dictionary to access CrossLexica’s resources
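All five literal paronyms of кадка listed above differ from it by a single letter that is substituted, inserted, or deleted. Assuming that this one-edit criterion is what defines a literal paronym (the slides do not state the rule explicitly), candidates can be found with a simple check. The function names and the toy vocabulary below are hypothetical.

```python
def one_edit_apart(a: str, b: str) -> bool:
    """True if b differs from a by exactly one substitution, insertion or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one letter inserted into a at position i


def literal_paronyms(word: str, vocabulary: list[str]) -> list[str]:
    return sorted(w for w in vocabulary if one_edit_apart(word, w))


# Toy vocabulary: the five paronyms from the slide plus two non-paronyms.
VOCAB = ["кака", "каска", "качка", "кашка", "кладка", "лодка", "кадр"]
print(literal_paronyms("кадка", VOCAB))
# ['кака', 'каска', 'качка', 'кашка', 'кладка']
```

A linear scan is enough for illustration; for a 185,000-entry vocabulary one would more likely precompute the paronym links once, which matches the dictionary storing them as ready-made links.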

Tags of idiomaticity for separate entries or collocations (each tag is shown on the screen as a symbol)

No tag: direct meaning only (идти в школу ‘to go to school’)

(idiom): idiomatic (figurative) meaning only (сесть в галошу ‘to get into a fix’, lit. ‘to sit down into a galosh’)

(mb idiom): direct or idiomatic meaning (сесть в лужу, lit. ‘to sit down in a puddle’ and also ‘to get into a mess’; первая ракетка, lit. ‘the first racket’ and also ‘tennis champion’)

Tags of colloquialism level (style) for separate words or collocations (each tag is shown on the screen as a symbol)

No tag: a common word / collocation; use it without any restrictions (стена ‘wall’, окно ‘window’, книга ‘book’, налоги ‘taxes’...)

●: a special, bookish, or obsolete word / collocation; use it if you are not afraid of being unclear (абсцесс ‘abscess’, парадигма ‘paradigm’, адъективный ‘adjectival’, аутсорсинг ‘outsourcing’, роуминг ‘roaming’...)

●: a purely colloquial word / collocation; do not use it in official documents (мотать нервы ‘to fray someone’s nerves’, жевать сопли ‘to chew one’s snot’...)

●: an abusive word / collocation; do not use it in front of ladies and children, or in an official environment (говно ‘shit’, жопа ‘ass’, мудак ‘asshole’, взять за яйца ‘to catch hold of the nuts’...)

●: common in speech, but not recommended by scholars, so it should be reworded (оплатить за проезд, проплатить операцию)

User’s options

One of the following two options can be chosen for the whole dictionary:

Russian, with menu items, names of delivery sections, and glosses of homonym senses given in Russian, or

English, with all of these given in English.

At runtime, the user can:

Select alphabetical order or statistical order (collocations with more frequent collocates come first) for delivering some types of collocations. The cutoff level for rarer collocates can be adjusted to keep a novice from drowning in special or rare words.

Suppress the display of abusive, colloquial, or special words together with all their collocations.

Enter the query from the keyboard, or select it from the vocabulary list, the History list, or the current collocation list on the screen. The last option takes the selected collocate as a new query, thus starting navigation through the vocabulary.
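The runtime options above (statistical or alphabetical order, a cutoff for rarer collocates, and suppression of stylistically marked words) can be sketched as a filter followed by a sort. In this minimal Python sketch the `Style` values mirror the style tags described earlier in the document, while `Item`, `deliver`, and the frequency numbers are hypothetical; only the fact that оплатить за проезд and проплатить проезд carry the not-recommended mark comes from this document's examples.

```python
from dataclasses import dataclass
from enum import Enum


class Style(Enum):
    COMMON = "no tag"
    SPECIAL = "special, bookish or obsolete"
    COLLOQUIAL = "purely colloquial"
    ABUSIVE = "abusive"
    NOT_RECOMMENDED = "common in speech, not recommended"


@dataclass(frozen=True)
class Item:
    text: str
    frequency: int  # hypothetical frequency score of the collocate
    style: Style = Style.COMMON


def deliver(items, order="statistical", min_frequency=0, banned=frozenset()):
    """Drop banned styles and rare collocates, then sort for the screen."""
    kept = [i for i in items
            if i.style not in banned and i.frequency >= min_frequency]
    if order == "alphabetical":
        kept.sort(key=lambda i: i.text)
    else:  # statistical: more frequent collocates first, ties alphabetical
        kept.sort(key=lambda i: (-i.frequency, i.text))
    return [i.text for i in kept]


ITEMS = [  # frequencies are made up for illustration
    Item("оплатить проезд", 50),
    Item("оплачивать проезд", 30),
    Item("заплатить за проезд", 40),
    Item("оплатить за проезд", 20, Style.NOT_RECOMMENDED),
    Item("проплатить проезд", 5, Style.NOT_RECOMMENDED),
]

print(deliver(ITEMS, banned={Style.NOT_RECOMMENDED}))
print(deliver(ITEMS, order="alphabetical", min_frequency=25))
```

Filtering before sorting keeps the two options independent: the cutoff and the style ban decide what is shown, and the order option only decides how it is arranged.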

Two types of possible applications

Interactive applications: the user queries the dictionary in interactive mode and can use the results, e.g., in parallel with text editing, or for language learning.

Non-interactive applications: an external program queries the dictionary and uses the results for its own purposes.

Interactive application 1: Perfecting a Russian speaker’s skills

Reference to the collocation ходить в школу ‘to go to school’ (two possible ways)

Query 1: ходить ‘to go’. In the delivery:
HAS GOVERNING PATTERNS:
ходить в кого / во что / куда?
ходить в университет, ходить в учреждения, ходить в храм, ходить в церковь, ходить в цирк, ходить в школу

Query 2: школа ‘school’. In the delivery:
GOVERNED BY VERBS:
руководить школой, создать ... при школе, уйти из школы, уходить из школы, учиться в школе, ходить в школу, шефствовать над школой, являться школой ...

Interactive application 1: Perfecting a Russian speaker’s skills

What valencies does the verb забыть ‘to forget’ have?

забыть что / кого? ‘what / whom’? забыть адрес, багаж, вкус, времена, время, вчерашнее,... (101 col.)

забыть о чем / о ком? ‘about what / about whom’? забыть о времени, обо всем, о вчерашнем, о главном,... (37 col.)

забыть про что / про кого? ‘about what / about whom’? забыть про все, про главное, про детей, про диссертацию, про семью,...(22 col.)

забыть … в чем / в ком / где? ‘…in what / where’? забыть ... в вагоне, в гостях, в комнате, в кафе, в ресторане,... (9 col.)

забыть … на чем / на ком / где? ‘…on what / where’? забыть ... на диване, на кресле, на кровати, на окне,... (7 col.)

забыть … при чем / при ком? ‘…while what’? забыть ... при декларировании, при зачтении,.. (3 col.)

забыть … по чему / по кому? ‘…because of what / why’? забыть ... по рассеянности, по невнимательности (2 col.)

забыть … из-за чего / из-за кого / почему? ‘…because of what / why’? забыть ... из-за волнения, из-за спешки (2 col.)

забыть … за чем / за кем? ‘…behind what / because of what’? забыть ... за давностью (1 col.)

забыть … от чего / от кого / откуда? ‘…because of what/ from where’? забыть ... от волнения (1 col.)

Interactive application 1: Perfecting a Russian speaker’s skills. More examples

How can плата за проезд ‘fare payment’ be expressed with a verb? платить / оплатить / оплачивать проезд or заплатить за проезд (проплатить проезд and оплатить за проезд are also included but marked ● as not recommended)

How can бразильские женщины ‘Brazilian women’ be reworded? – бразильянки. And иракские женщины ‘Iraqi women’? – Only this way! (But иракец ‘Iraqi man’ and иракцы ‘Iraqi men’ do exist!)

How can somebody «cause» иск ‘suit’? внести / возбудить / вчинить / подать / предъявить иск, as well as обратиться с иском

What does the abbreviation РФФИ mean? – It has two senses:
Российский фонд федерального имущества ‘Russian Federal Property Fund’
Российский фонд фундаментальных исследований ‘Russian Foundation for Basic Research’

Interactive application 1: Perfecting a Russian speaker’s skills. Distinguishing morphemic paronyms

вероятный ‘probable’ IS MODIFIER FOR: адрес ‘address’, альтернатива ‘alternative’, вариант ‘option’, версия ‘version’, визит ‘visit’, встреча ‘meeting’, выбор ‘choice’, гипотеза ‘hypothesis’, запасы ‘stocks’, изменение ‘change’ ...

вероятностный ‘probabilistic’ IS MODIFIER FOR: автомат ‘automaton’, алгоритм ‘algorithm’, анализ ‘analysis’, анализатор ‘analyzer’, аспекты ‘aspects’, вывод ‘inference’, задача ‘task’, идеи ‘ideas’, контроль ‘control’ ...

Interactive application 1: Perfecting a Russian speaker’s skills. Word sense disambiguation

доменный1 ‘domain’ (in attributive use) IS MODIFIER FOR: адрес ‘address’, аукцион ‘auction’, беспредел ‘violations’, бизнес ‘business’, границы ‘borders’, зона ‘zone’, имена ‘names’, карта ‘map’, контроллер ‘controller’, новости ‘news’, протокол ‘protocol’, регистрация ‘registration’ ...

доменный2 ‘blast furnace’ (in attributive use) IS MODIFIER FOR: воздухонагреватель ‘air heater’, газы ‘gases’, кокс ‘coke’, конструкция ‘construction’, мастера ‘masters’, печи ‘furnaces’, подъемник ‘elevator’, производство ‘production’, процесс ‘process’, стенки ‘sidewalls’ ...

Interactive application 1: Perfecting a Russian speaker’s skills. Disambiguation of quasi-homonyms

личный ‘personal’ IS MODIFIER FOR: автомашина ‘car’, автомобиль ‘automobile’, автотранспорт ‘motor transport’, адъютант ‘adjutant’, амбиции ‘ambitions’, антипатии ‘antipathies’, архив ‘archive’, аспект ‘aspect’, багаж ‘luggage’, безопасность ‘security’, беседа ‘conversation’, библиотека ‘library’ ...

личной ‘face/facial’ IS MODIFIER FOR: карман ‘pocket’, крем ‘cream’, напильник ‘(smooth) file’, нашивки ‘chevrons’, полотенце ‘towel’, пуговицы ‘buttons’, салфетка ‘napkin’, сторона ‘side’ ...

Interactive application 2: Advice for an advanced learner of Russian

All queries typical for a Russian user seem valuable for a foreigner as well, plus:

Getting references on the spelling and morphology of any word. E.g., the noun Христос ‘Christ’ has its own declension pattern (Христа, Христу, Христом...)

Access through the English dictionary. E.g., for the verb pay as many as 11 Russian verbs are obtained: обращать, обратить, окупать, окупить, оплатить, оплачивать, платить, уделить, уделять, уплатить, уплачивать, and relevant information can be obtained through each of them.

Non-interactive application 1

Facilitating text parsing

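Вышеупомянутые механизмы обрушения вызваны перегрузкой антресолей в процессе эксплуатации. ‘The above-mentioned collapse mechanisms are caused by the overloading of mezzanines during operation.’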

Collocations are searched for in the sentence to be parsed; the more correct collocations a given parsing variant contains, the more probable it is.
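The ranking criterion above amounts to counting, for each parsing variant, how many of its dependency pairs are known collocations. This minimal Python sketch assumes each variant is already available as a list of (head lemma, dependent lemma) pairs; the two variants for the example sentence and the small collocation set are hypothetical, chosen only to show how the attachment of в процессе эксплуатации could be decided.

```python
# Hypothetical stand-in for the collocation matrix (lemma pairs).
COLLOCATIONS = {
    ("механизм", "обрушение"),
    ("вызвать", "перегрузка"),
    ("перегрузка", "антресоль"),
    ("перегрузка", "процесс"),
    ("процесс", "эксплуатация"),
}

# Two hypothetical parses of the example sentence, differing in where the
# prepositional phrase "в процессе эксплуатации" is attached.
VARIANTS = {
    "PP attached to перегрузка": [
        ("механизм", "обрушение"), ("вызвать", "перегрузка"),
        ("перегрузка", "антресоль"), ("перегрузка", "процесс"),
        ("процесс", "эксплуатация"),
    ],
    "PP attached to антресоль": [
        ("механизм", "обрушение"), ("вызвать", "перегрузка"),
        ("перегрузка", "антресоль"), ("антресоль", "процесс"),
        ("процесс", "эксплуатация"),
    ],
}


def score(pairs):
    """Number of dependency pairs in a parse that are known collocations."""
    return sum(1 for pair in pairs if pair in COLLOCATIONS)


def best_variant(variants):
    """The variant with the most collocations is taken as the most probable."""
    return max(variants, key=lambda name: score(variants[name]))


for name, pairs in VARIANTS.items():
    print(name, score(pairs))
print(best_variant(VARIANTS))
```

In a real parser this count would more plausibly be one feature among several rather than the sole criterion; the sketch isolates it to show the idea.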

Non-interactive application 2 Word sense disambiguation

служба1 ‘service’ vs. служба2 ‘servicing’

Хамовнический суд ... начнет рассмотрение иска управления Федеральной службы1 исполнения наказаний России по Москве к адвокатам осужденных. As many as ●●●●● compatible neighbors.

Хамовнический суд ... начнет рассмотрение иска управления Федеральной службы2 исполнения наказаний России по Москве к адвокатам осужденных. As few as ●● compatible neighbors.

‘The Khamovniki court … will start to consider the suit of the Moscow Administration of the Russian Federal Penitentiary Service/Servicing against the convicts’ attorneys.’

For each sense of a homonymous word, collocations and semantic links compatible with that sense are searched for in its context. The sense with the greater number of syntagmatically or semantically compatible neighbors is preferred.
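The preference rule above can be sketched as counting, for each sense, the context words that are compatible with it through a collocation or a semantic link. In this minimal Python sketch the compatibility sets for служба1 and служба2 are hypothetical, chosen only so that the counts match the five and two compatible neighbors shown on the slide; real compatibility would come from the dictionary.

```python
# Hypothetical compatibility sets: context lemmas that form a collocation or
# a semantic link with each sense of the homonymous word.
COMPATIBLE = {
    "служба1": {"управление", "федеральный", "исполнение", "Россия", "иск"},
    "служба2": {"исполнение", "федеральный"},
}

CONTEXT = {  # lemmas of the sentence around "службы"
    "суд", "рассмотрение", "иск", "управление", "федеральный",
    "исполнение", "наказание", "Россия", "Москва", "адвокат",
}


def compatible_counts(senses, context):
    """Number of context lemmas compatible with each sense."""
    return {s: len(COMPATIBLE[s] & context) for s in senses}


def disambiguate(senses, context):
    """Prefer the sense with the greatest number of compatible neighbors."""
    counts = compatible_counts(senses, context)
    return max(senses, key=lambda s: counts[s])


print(compatible_counts(["служба1", "служба2"], CONTEXT))  # {'служба1': 5, 'служба2': 2}
print(disambiguate(["служба1", "служба2"], CONTEXT))       # служба1
```

On a tie, `max` returns the first sense listed, so the order of `senses` acts as a default preference.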

Non-interactive application 3: Detection and correction of malapropisms

... посещение истерического центра Москвы ... ‘… visiting the hysterical center of Moscow …’

Paronyms of истерического: истеричного ‘hysteric’, исторического ‘historical’, стерического ‘sterical’; paronyms of центра: центризма ‘centrism’, цента ‘cent’.

The system detects syntactically linked pairs that are not correct collocations. For each word of such a pair, all its paronyms are retrieved, along with all collocations available for them in the dictionary. The collocations found are proposed to the user for verification.
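The procedure above can be sketched as follows: if a syntactically linked pair is not a known collocation, every combination of its words and their paronyms is checked against the collocation set, and the hits are offered to the user. This minimal Python sketch uses lemmas, a hypothetical precomputed paronym table built from the example above, and a toy collocation set standing in for the dictionary.

```python
# Paronyms from the example above (as lemmas); the collocation set is a toy.
PARONYMS = {
    "истерический": ["истеричный", "исторический", "стерический"],
    "центр": ["центризм", "цент"],
}
COLLOCATIONS = {("исторический", "центр"), ("торговый", "центр")}


def suggest_corrections(modifier: str, noun: str) -> list[tuple[str, str]]:
    """Return collocations reachable by swapping in paronyms; [] if pair is fine."""
    if (modifier, noun) in COLLOCATIONS:
        return []  # a correct collocation: nothing to report
    candidates = set()
    for m in [modifier] + PARONYMS.get(modifier, []):
        for n in [noun] + PARONYMS.get(noun, []):
            if (m, n) in COLLOCATIONS:
                candidates.add((m, n))
    return sorted(candidates)  # proposed to the user for verification


print(suggest_corrections("истерический", "центр"))  # [('исторический', 'центр')]
print(suggest_corrections("исторический", "центр"))  # []
```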

Non-interactive application 4

Steganography and steganalysis (electronic watermarking problem): explanation

Collocations and synonyms of the words in a text are used to replace some words with their synonyms in a controlled way, so that independent information is encoded in these replacements. This information is thus transferred covertly by the carrier text without any change to its meaning.
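One way to realize the idea above is mixed-radix encoding: if each slot of the carrier text offers k interchangeable alternatives in an agreed order, the choice made in that slot carries one base-k digit of a secret number. This is our illustrative choice, not necessarily the author's scheme. The sketch assumes the alternatives have already been checked against the collocations of their neighbors, and the slot contents are taken from this document's steganography example; `embed`, `extract`, and `SLOTS` are hypothetical names.

```python
import math

# Alternatives per slot in a fixed order known to sender and receiver.
# Assumed pre-filtered so every choice forms correct collocations (not shown).
SLOTS = [
    ["землетрясений", "подземных толчков"],
    ["замечено", "зарегистрировано", "зафиксировано", "отмечено"],
    ["за 24 часа", "за сутки"],
    ["Алтая", "Республики Алтай"],
]


def capacity(slots) -> int:
    """Number of distinct messages the slots can carry."""
    return math.prod(len(options) for options in slots)


def embed(secret: int, slots) -> list[str]:
    """Choose one alternative per slot; slot i holds the i-th mixed-radix digit."""
    if not 0 <= secret < capacity(slots):
        raise ValueError("secret does not fit into the available slots")
    chosen = []
    for options in slots:
        secret, digit = divmod(secret, len(options))
        chosen.append(options[digit])
    return chosen


def extract(chosen: list[str], slots) -> int:
    """Recover the secret from the alternatives found in the received text."""
    secret = 0
    for word, options in zip(reversed(chosen), reversed(slots)):
        secret = secret * len(options) + options.index(word)
    return secret


print(capacity(SLOTS))   # 32 messages, i.e. 5 bits
print(embed(11, SLOTS))  # ['подземных толчков', 'зарегистрировано', 'за сутки', 'Алтая']
print(extract(embed(11, SLOTS), SLOTS))  # 11
```

Mixed radix uses slots with three or four alternatives fully, whereas one bit per slot would waste part of their capacity.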

Non-interactive application 4

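Steganography and steganalysis: example

Carrier text with synonymous alternatives in each slot (alternatives separated by |):

Пять {землетрясений | подземных толчков} {замечено | зарегистрировано | зафиксировано | отмечено} {за 24 часа | за сутки} на юге {Алтая | Республики Алтай}. {Магнитуда | Мощность | Мощь | Сила} {землетрясений | подземных толчков} составляла от 2,2 до 3,1 балла по шкале Рихтера, сообщили на Акташской {сейсмической станции | сейсмостанции} сегодня {после полудня | во вторую половину дня}.

‘Five earthquakes / underground shocks were registered within 24 hours in the south of Altai / the Altai Republic. Their magnitude was 2.2 to 3.1 on the Richter scale, the Aktash seismic station reported this afternoon.’

On the slide, dots (●) mark the alternatives, and two hidden messages are shown: B Obama and H Clinton.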

Other non-interactive applications

Advanced information search: an Internet query is automatically enriched with some collocates of the query words.

Idiomatic translation of English collocations into Russian: in answer to the query strong woman, CrossLexica currently outputs крепкая баба, сильная женщина...

Automatic splitting of text into paragraphs: the system searches for boundaries between sentences that are crossed by the minimal number of links of any type.

etc.
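The paragraph-splitting criterion above can be sketched by counting, for every boundary between adjacent sentences, how many links (of any type) connect a word before it with a word after it. In this minimal Python sketch links are given as hypothetical pairs of sentence indices, and the boundaries with the fewest crossings are chosen; how the links are obtained from the dictionary is not shown.

```python
def crossing_counts(n_sentences: int, links) -> dict[int, int]:
    """Boundary b lies between sentences b-1 and b; count links spanning it."""
    counts = {b: 0 for b in range(1, n_sentences)}
    for i, j in links:
        lo, hi = min(i, j), max(i, j)
        for b in range(lo + 1, hi + 1):
            counts[b] += 1
    return counts


def paragraph_breaks(n_sentences: int, links, n_breaks: int) -> list[int]:
    """Pick the n_breaks boundaries crossed by the fewest links."""
    counts = crossing_counts(n_sentences, links)
    ranked = sorted(counts, key=lambda b: (counts[b], b))
    return sorted(ranked[:n_breaks])


# Hypothetical links between words of five sentences (sentence indices 0..4).
LINKS = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 4)]

print(crossing_counts(5, LINKS))      # {1: 2, 2: 2, 3: 0, 4: 2}
print(paragraph_breaks(5, LINKS, 1))  # [3]: a new paragraph starts at sentence 3
```

Links inside a single sentence (i == j) cross no boundary and are ignored automatically.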

Example of delivery for чувство ‘emotion’

Translations

Instead of a conclusion: the opinion of Prof. Igor Mel’čuk, Canada

“CrossLexica is unique in its genre. As far as I know, no similar dictionary exists for any language. A few published dictionaries of collocations (English and French) cannot even be compared with CrossLexica as far as the number of phrases described, the wealth of lexicographic information supplied, and the logic of dictionary organization.”

The whole text of the letter of evaluation is presented separately.

My first dream is to supply CrossLexica to several user groups

User group: any Russian who uses a PC, mobile phone, or Internet terminal (officials, businesspersons, scientists, students, etc.). [There are now 50 million Internet users and more than 60 million mobile phones in Russia.] Estimate: up to 1 million.

User group: a resident of countries near Russia (Ukraine, the Baltic States, Poland, Central Asian countries, etc.) wishing to restore or acquire knowledge of modern Russian (businesspersons, potential migrants or students, etc.). [Russia now receives about 7 million migrants. More than 20 million Russian speakers live outside Russia.] Estimate: up to 100,000.

User group: a resident of Western countries (USA, UK, Canada, France, Germany, Italy, Spain, Scandinavian States, etc.) who already knows some Russian but wishes to improve their skills (businesspersons, Russian émigrés, teachers of Russian, Slavists, etc.). [More than 300,000 specialists have left the former USSR for Western countries since 1991.] Estimate: up to 10,000.

My second dream is to see non-interactive applications working

Detection and correction of malapropisms: an algorithm exists and has been tested in limited experiments

Word sense disambiguation: experiments are under preparation

Steganography and steganalysis: a draft algorithm exists

Facilitating text parsing: only an idea

Advanced information search: only an idea

Implementing all these tasks while also bringing CrossLexica to perfection is no longer possible for me. However, I am ready to give suggestions and consultations.

My third dream is to see CrossLexica implemented for English

The structure of CrossLexica-Eng can be basically the same (the differences are rather clear to me).

Russian collocations can be translated word by word in multiple ways, and the junk should then be filtered out automatically through Web search engines and manually by lexicographers.
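The bootstrapping procedure just described can be sketched as generating all word-by-word combinations and keeping those whose Web frequency passes a threshold, leaving the survivors for lexicographers to check. In this minimal Python sketch the translation lists and hit counts are made up, and `hits` is a plain dictionary standing in for queries to a search engine; no real search API is used.

```python
from itertools import product


def candidate_translations(collocation: str, translations) -> list[str]:
    """All word-by-word combinations of the collocates' translations."""
    options = [translations[word] for word in collocation.split()]
    return [" ".join(combo) for combo in product(*options)]


def filter_by_hits(candidates, hits, threshold: int) -> list[str]:
    """Keep candidates whose (stand-in) Web frequency reaches the threshold."""
    return [c for c in candidates if hits.get(c, 0) >= threshold]


TRANSLATIONS = {"сильная": ["strong", "powerful", "heavy"],
                "женщина": ["woman", "female"]}
TOY_HITS = {"strong woman": 1000, "powerful woman": 800,
            "strong female": 200, "heavy woman": 50}  # made-up counts

candidates = candidate_translations("сильная женщина", TRANSLATIONS)
kept = filter_by_hits(candidates, TOY_HITS, threshold=100)
print(candidates)  # six combinations
print(kept)        # ['strong woman', 'strong female', 'powerful woman']
```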

As an initial supply, the collocations from the available academic dictionaries can be taken.

The working group for the task had better be headed by a native English speaker and should include English lexicographers. I estimate the effort at 20 person-years, spread over no less than 3 years. My experience is valuable, but only for consulting.

Thus: The World’s Largest Combinatorial Dictionary

© Database Content, Grammar: I. Bolshakov, 2009

© Uploading Utilities: I. Bolshakov, A. Gelbukh, 2009

© User Interface: A. Gelbukh, 2009

Thank you for your attention!
Any questions?
Want to see CrossLexica in action?

Prof. Igor A. Bolshakov
