Distributional semantics, embeddings (et dong)

Page 1: Distributional semantics, embeddings (et dong)

[email protected]
RALI, Dépt. d'informatique et de recherche opérationnelle
Université de Montréal

March 2016, V0.00002 (unfinished)

Page 2: Plan

- (Before Deep): the vector space model
- Typical tasks
- And then came the "Deep":
  Word2Vec / Analogy / Evaluation / Interesting ideas / The bilingual case


Page 4: The distributional hypothesis in quotes

- "If A and B have almost identical environments... we say that they are synonyms" (Harris, 1954)
- "you shall know a word by the company it keeps!" (Firth, 1957)
- "words which are similar in meaning occur in similar contexts" (Rubenstein & Goodenough, 1965)
- "... In other words, difference of meaning correlates with difference of distribution" (Harris, 1970, p. 786)
- "words with similar meanings will occur with similar neighbors if enough text material is available" (Schütze & Pedersen, 1995)
- "a representation that captures much of how words are used in natural context will capture much of what we mean by meaning" (Landauer & Dumais, 1997)
- "in the proposed model, it will so generalize because 'similar' words are expected to have a similar feature vector, and because the probability function is a smooth function of these feature values, a small change in the features will induce a small change in the probability." (Bengio et al., 2003)

Page 5: The distributional hypothesis [Sahlgren, 2008]

- a reflection on the distributional hypothesis
- syntagmatic relations run across a sentence; paradigmatic relations hold between substitutable words:

      she    adores   green   paint
      he     likes    blue    dye
      they   love     red     colour

  (rows: syntagmatic; columns: paradigmatic)

- a syntagm is an ordered combination of entities
  -> first-order co-occurrences
- a paradigm is a set of substitutable entities (words), which therefore do not co-occur with each other
  -> second-order co-occurrences
- "A distributional model accumulated from co-occurrence information contains syntagmatic relations between words, while a distributional model accumulated from information about shared neighbors contains paradigmatic relations between words."

Page 6: The vector space model

- read [Turney and Pantel, 2010] for an introduction
- read [Baroni and Lenci, 2010] for a generalization (tensors)

Three ingredients:
1. a matrix of co-occurrence "counts"
2. a weighting scheme (PMI, LLR, etc.)
3. a dimensionality-reduction policy:
   - singular value decomposition [Golub and Van Loan, 1996]
   - non-negative matrix factorization [Lee and Seung, 1999]
   - none! (a very strong baseline, even a reference system)
   - etc.

- DISSECT implements steps 2 and 3; a sketch of the full pipeline follows below
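
A minimal sketch of the count -> weight -> reduce pipeline above, using numpy only. The toy corpus, window size, and target dimension are illustrative assumptions, not part of the slides.

```python
import numpy as np

corpus = [["she", "adores", "green", "paint"],
          ["he", "likes", "blue", "dye"],
          ["they", "love", "red", "colour"]]

# 1. co-occurrence counts within a symmetric window of 2
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# 2. weighting: positive PMI (PPMI)
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. reduction: truncated SVD keeping d dimensions
d = 2
U, S, _ = np.linalg.svd(ppmi)
embeddings = U[:, :d] * S[:d]
```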

Page 7: Co-occurrence matrix: term x document

- document similarity
- bag-of-words hypothesis: if a query and a document have similar representations (columns), then they convey the same information [Salton, 1975]
- implemented (for example) in Lucene

(Figure taken from [Jurafsky and Martin, 2015].)

Page 8: Co-occurrence matrix: term x term (context)

- term similarity
- distributional hypothesis: if two terms have similar representations (rows), then they are similar

(Figure taken from [Jurafsky and Martin, 2015].)


Page 10: Co-occurrence matrix: pair of terms x pattern

- relation similarity
- extended distributional hypothesis: if two patterns have similar representations (columns), then they have similar meanings [Lin and Pantel, 2001]
- latent relation hypothesis: if two word pairs have similar representations (rows), then they are similar; e.g. X of Y, Y of X, X for Y, Y for X, X to Y, and Y to X
- a list of 64 words such as of, for or to, yielding 128 patterns (columns) containing the pair (X, Y): [Turney, 2005]

Page 11: Plan

- (Before Deep): the vector space model
- Typical tasks
- And then came the "Deep":
  Word2Vec / Analogy / Evaluation / Interesting ideas / The bilingual case

Page 12: Credit

- most of these resources (and others) are listed here:
  http://www.cs.cmu.edu/~mfaruqui/suite.html

Page 13: Rubenstein and Goodenough

http://www.cs.cmu.edu/~mfaruqui/word-sim/EN-RG-65.txt

 1 gem        jewel      3.94
 2 midday     noon       3.94
 3 automobile car        3.92
 4 cemetery   graveyard  3.88
 5 cushion    pillow     3.84
 6 ...
 7 autograph  shore      0.06
 8 fruit      furnace    0.05
 9 noon       string     0.04
10 rooster    voyage     0.04
11 cord       smile      0.02
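
The standard way to score a model on data like this is rank correlation between human judgments and model similarities. A hedged sketch, assuming `vectors` maps words to numpy arrays and the file has "word1 word2 score" lines as above:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def evaluate(path, vectors):
    human, model = [], []
    with open(path) as f:
        for line in f:
            w1, w2, score = line.split()
            if w1 in vectors and w2 in vectors:
                human.append(float(score))
                model.append(cosine(vectors[w1], vectors[w2]))
    # rank correlation between human judgments and model similarities
    return spearmanr(human, model).correlation
```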

Page 14: WordSimilarity-353

- http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
- set1: 153 word pairs, 13 relatedness judgments each
- set2: 200 word pairs, 16 judgments each

% more set1.csv | sort -t\, -k3,3n
tiger,tiger,10.00,10,10,10,10,10,10,10,10,10,10,10,10,10
coast,shore,9.10,9,9.75,10,10,6,6,10,8,9.5,10,10,10,10
football,soccer,9.03,9,9.9,9,10,8,7,9.5,10,8,10,9,9,9
fuck,sex,9.44,10,9.75,10,10,8,9,9,10,9,10,8,10,10
journey,voyage,9.29,9,9.75,10,10,6,7,9.5,10,9.5,10,10,10,10

- split from http://alfonseca.org/eng/research/wordsim353.html:
  similarity   203 pairs
  relatedness  252 pairs

Page 15: About the split of [Agirre et al., 2009]

- 103 pairs are shared between the two files:

% more wordsim_relatedness_goldstandard.txt \
       wordsim_similarity_goldstandard.txt | \
  sort | uniq -c | awk '$1 == 2 {print $0}' | \
  sort -k4,4gr | head -n 5
2 cup food 5,00
2 monk oracle 5,00
2 journal association 4,97
2 car flight 4,94
2 street children 4,94

- read the paper for more information...

Page 16: MEN
http://clic.cimec.unitn.it/~elia.bruni/MEN.html

- 3000 word pairs with a score (between 0 and 50) indicating their relatedness (2000 TRAIN, 1000 TEST)

  cat-n feline-j 48          gun-n stair-n 2
  copper-n metal-n 48        military-j tomato-n 2
  flower-n garden-n 48       angel-n gasoline-n 1
  grass-n lawn-n 48          feather-n truck-n 1
  pregnancy-n pregnant-j 48  festival-n whisker-n 1

- pairs satisfy frequency constraints in different resources and were sampled so as to yield a balanced task
- scores were collected on Mechanical Turk by binary comparison against 50 other pairs
- task: find the most similar pairs (ranking)

Page 17: TOEFL Tests [Freitag et al., 2005]

- available here:
  http://www.cs.cmu.edu/~dayne/wbst-nanews.tar.gz
- each test instance has the form:

  target : [synonym and 3 distractors]    (the bracketed words are the stimuli)

  e.g.:

  comprehend : [encompass career admire pardon]

- 9887 test instances for nouns, 7398 for verbs

Page 18: TOEFL Tests [Freitag et al., 2005]

force   violence     outburst  encephalopathy  sparta
force   power        plaintif  vomiting        deep
force   personnel    distress  universe        poe
force   strength     adrian    character       atlanta
force   effect       abuja     flora           share

buy     bribe        supply    reinvent        triumph
buy     corrupt      bar       recount         weight
buy     purchase     harp      esteem          forgive
record  commemorate  smack     resort          tremor
record  enter        ripple    subpoena        shirt
record  tape         drink     pare            pearl
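
A sketch of how such items are typically answered with embeddings: pick the candidate closest to the target. `vectors` and `cosine` are the assumptions from the earlier evaluation sketch.

```python
def answer(target, candidates, vectors):
    # choose the candidate whose vector is most similar to the target's
    return max(candidates, key=lambda w: cosine(vectors[target], vectors[w]))

# answer("comprehend", ["encompass", "career", "admire", "pardon"], vectors)
# -> ideally "encompass"
```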

Page 19: Word2Vec datasets (analogies)
http://word2vec.googlecode.com/svn/trunk

questions-words.txt: 19544 equations (with solutions):

: capital-common-countries
Berlin Germany Islamabad Pakistan
Baghdad Iraq Oslo Norway
: capital-world
Algiers Algeria Belmopan Belize
Abuja Nigeria Doha Qatar
: currency
Armenia dram India rupee
Angola kwanza Denmark krone
: city-in-state
Philadelphia Pennsylvania Atlanta Georgia
Chicago Illinois Stockton California
: family
father mother husband wife
brother sister prince princess
: gram1-adjective-to-adverb
cheerful cheerfully efficient efficiently
apparent apparently free freely
: gram2-opposite
clear unclear impressive unimpressive
aware unaware efficient inefficient
: gram3-comparative
bright brighter simple simpler
big bigger bright brighter
: gram4-superlative
bright brightest weird weirdest
big biggest dark darkest
: gram5-present-participle
debug debugging code coding
dance dancing enhance enhancing
: gram6-nationality-adjective
Australia Australian India Indian
Albania Albanian Sweden Swedish
: gram7-past-tense
describing described playing played
dancing danced vanishing vanished
: gram8-plural
bottle bottles man men
bird birds bottle bottles
: gram9-plural-verbs
enhance enhances play plays
describe describes increase increases

Page 20: Word2Vec datasets (analogies)
http://word2vec.googlecode.com/svn/trunk

questions-phrases.txt: 3218 equations (with solutions):

: newspapers
Boston Boston_Globe Salt_Lake Salt_Lake_Tribune
Baltimore Baltimore_Sun Cincinnati Cincinnati_Enquirer
: ice_hockey
Calgary Calgary_Flames Ottawa Ottawa_Senators
Boston Boston_Bruins Chicago Chicago_Blackhawks
: basketball
Dallas Dallas_Mavericks Boston Boston_Celtics
Boston Boston_Celtics New_York New_York_Knicks
: airlines
Finland Finnair Spain Spanair
Austria Austrian_Airlines Turkey Turkish_Airlines
: people-companies
Samuel_J._Palmisano IBM Jeff_Bezos Amazon
Eric_Schmidt Google Larry_Ellison Oracle

Page 21: About the Word2Vec datasets
http://word2vec.googlecode.com/svn/trunk

- 5031 analogies involve the capital-of relation, but only 118 country/capital pairs...
- 870 analogies involve the plural-verbs relation, but only 30 verb pairs...
  - 27 with the +s relation; e.g. eat / eats
  - and only 3 with the +es relation: go / goes, search / searches and vanish / vanishes
  - note: [go : goes :: eat : eats] is not a formal analogy
- 1560 analogies involve the past-tense relation, but only 40 verb pairs...
  - e.g. decreasing / decreased, knowing / knew
  - of which roughly half (23) are irregular verbs; e.g. sitting / sat and sleeping / slept

Page 22: MSR Sentence Completion dataset
http://research.microsoft.com/en-us/projects/scc

- 1040 questions like these:

26) It may be that the solution of the one may ____ to be the solution of the other.
    a) prove  b) learn  c) choose  d) forget  e) succumb

27) We shall just be in time to have a little ____ with him.
    a) breakfast  b) elegance  c) garment  d) basket  e) dog

Page 23: MSR Sentence Completion dataset
http://research.microsoft.com/en-us/projects/scc

- the same questions, with the answers filled in:

26) It may be that the solution of the one may prove to be the solution of the other.
    a) prove  b) learn  c) choose  d) forget  e) succumb

27) We shall just be in time to have a little breakfast with him.
    a) breakfast  b) elegance  c) garment  d) basket  e) dog

Page 24: BLESS
https://sites.google.com/site/geometricalmodels/shared-evaluation

- 26,554 lines like this one:
  frog-n amphibian_reptile attri green-j
  which states that a frog is an amphibian or reptile and that one of its attributes is being green (gasp).
- 200 concepts (reptiles, mammals, birds, fruit, insects, weapons, etc.), 5 relations (attribute, co-hyponym, hypernym, meronym, event)

  On average, there are 14 attri, 18 coord, 19 event, 7 hyper and 15 mero relata per concept.

- 14,400 concept-relation-relatum tuples + 12,154 control tuples (random)
- a suggested task

Page 25: BLESS/frog
https://sites.google.com/site/geometricalmodels/shared-evaluation

mero   blood-n bone-n eye-n foot-n gill-n head-n leg-n lung-n poison-n skin-n tongue-n wart-n
event  breathe-v catch-v chase-v croak-v die-v drink-v eat-v hop-v kill-v lay-v leap-v live-v sit-v swim-v
attri  amphibious-j aquatic-j brown-j colorful-j edible-j green-j little-j loud-j noisy-j old-j poisonous-j small-j ugly-j wild-j
coord  alligator-n bullfrog-n lizard-n snake-n toad-n turtle-n
hyper  amphibian-n animal-n beast-n chordate-n creature-n vertebrate-n

+ 50 random relations, e.g.:
frog-n amphibian_reptile random-n doyen-n

Page 26: SemEval 2010

Cause-Effect (1003)
  He had chest pains and <e1>headaches</e1> from <e2>mold</e2> in the bedrooms.
Component-Whole (941)
  The <e1>provinces</e1> are divided into <e2>counties</e2> (shahrestan), and ...
Entity-Destination (845)
  The prosecutors carried <e1>guns</e1> into the <e2>court room</e2>.
Product-Producer (717)
  Next the <e1>soldiers</e1> put up a <e2>wall</e2> of stakes on the pile of dirt.

+ Entity-Origin (716), Member-Collection (690), Message-Topic (634), Content-Container (540), Instrument-Agency (504)

- Task: identify the relation for 2717 pairs of nouns marked up in sentences

Page 27: SemEval 2012 (pilot) / 2013 (main task)

- on a scale from 0 to 5, rate the similarity of sentence pairs:

MT    When we are faced with a potential risk, it is important to apply the precautionary principle.
5     When we are faced with a potential risk, it is important to put into practice the precautionary principle.

MT    This rule can be applied equally to all recreational but dangerous drugs when no one but the person consuming the drug is likely to be affected.
4.8   This rule can, moreover, apply to all drugs recreationnelles but dangerous provided that the consumer of drugs is the only run the risks.

MSR-VID  A man stares out a window.
4.8      A man looks out the window.

MSR-PAR  But at age 15, she had reached 260 pounds and a difficult decision: It was time to try surgery.
4.2      But at the age of 15, she weighed a whopping 117kg and came to a difficult decision: it was time to try surgery.

Page 28: SemEval 2014 / Task 3

- Phrase2Word (TRAIN: 500, TEST: 500), scale [0, 4]

4  the structural and functional unit of all known living organisms                               DNA        Lexicographic
4  traditionally the land on which a college or university and related institutional buildings are situated   campus     Lexicographic
4  watering hole                                pub        Idiomatic
4  wear out one's welcome                       exhaust    Idiomatic
4  whole ball of wax                            total      Idiomatic
...
0  to make from scratch                         breath     Idiomatic
0  what a vivid imagination                     insipid    Newswire
0  workouts to strengthen abdominal muscles     fractions  Search

Page 29: SemEval 2014 / Task 3

- Sentence2Phrase (TRAIN: 500, TEST: 500), scale [0, 4]

4  We completely agree and there's no daylight between us on the issue.           complete agreement                 Idiomatic
4  do you now where i can watch free older movies online without download ?      streaming vintage movies for free  CQA

- Paragraph2Sentence (TRAIN: 500, TEST: 500), scale [0, 4]

4  so i bought an item a couple weeks ago on eBay and i still havent even been notified that my item has been shipped yet.. what do i do ?    What should I do when my eBay purchase still hasn't shipped after two weeks ?    CQA

Page 30: Plan

- (Before Deep): the vector space model
- Typical tasks
- And then came the "Deep":
  Word2Vec / Analogy / Evaluation / Interesting ideas / The bilingual case

Page 31: The papers to articulate

- star models: Word2Vec [Mikolov et al., 2013a], GloVe [Pennington et al., 2014]
- properties of embeddings: [Mikolov et al., 2013d, Mikolov et al., 2013c]
- results: glory [Baroni et al., 2014], moderation [Levy et al., 2015]
- cool works: [Faruqui and Dyer, 2015, Faruqui et al., 2015b, Faruqui et al., 2015a]
- bilingual models: [Mikolov et al., 2013b, Chandar et al., 2014, Gouws et al., 2015, Coulmance et al., 2016]

Page 32: A revolution among "distributionalists": Word2Vec [Mikolov et al., 2013a]

- a fast toolkit implementing two models
- no deep inside, but better (according to the tests carried out so far)
  - combining it with an RNN gives better performance (on some tasks)
- pre-trained embeddings available, trained on 6B words of Google News (180K words), dimension = 300
- directly usable in many applications

Page 33: The two Word2Vec models [Mikolov et al., 2013a]

- Skip-gram is the more popular one (more reliable on "small" corpora); CBOW is faster on very large corpora

Page 34: Skip-gram [Mikolov et al., 2013a]

- C is a training corpus, which can be viewed as a set D of pairs (w, c) where w is a word of C and c is a word seen in its context (note: the model represents context words separately from vocabulary words)
- Let (w, c) be a word pair. Does it belong to D? Let $p(D = 1 \mid w, c; \theta)$ be the associated probability and $p(D = 0 \mid w, c; \theta)$ its complement
- objective function (optimized by gradient descent):

  $\arg\max_\theta \prod_{(w,c) \in D} p(D = 1 \mid w, c; \theta) \prod_{(w,c) \in D'} p(D = 0 \mid w, c; \theta)$

  with $p(D = 1 \mid w, c; \theta) = \frac{1}{1 + e^{-v_c \cdot v_w}}$

  where $v_c$ (resp. $v_w$) is the vector of c (resp. w)

Page 35: Skip-gram [Mikolov et al., 2013a]

- taking the log and setting $\sigma(x) = \frac{1}{1 + e^{-x}}$:

  $\arg\max_\theta \sum_{(w,c) \in D} \log \sigma(v_c \cdot v_w) + \sum_{(w,c) \in D'} \log \sigma(-v_c \cdot v_w)$

- D' is built by drawing k pairs at random according to the unigram distributions (of words and of context words)
- contexts are defined by a window centered on the word w under consideration, whose size is drawn at random (uniformly over a fixed interval)
- the most frequent words are subsampled (randomly removed from C) and rare words are removed (cut-off)
- but it works!!! (read [Levy and Goldberg, 2014] for an explanation; a toy sketch of the objective follows below)
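
A toy numpy sketch of one stochastic step on the negative-sampling objective above. W holds word vectors, Ctx holds context vectors (kept separate, as the slide notes); the dimensions, learning rate, and k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k, lr = 1000, 50, 5, 0.025
W = rng.normal(scale=0.1, size=(V, d))    # word vectors v_w
Ctx = rng.normal(scale=0.1, size=(V, d))  # context vectors v_c

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(w, c, unigram_probs):
    """One gradient step on log s(v_c.v_w) + sum over negatives of log s(-v_c'.v_w)."""
    negatives = rng.choice(V, size=k, p=unigram_probs)
    # positive pair: push v_w and v_c together
    g = 1.0 - sigmoid(Ctx[c] @ W[w])
    grad_w = g * Ctx[c]
    Ctx[c] += lr * g * W[w]
    # negative pairs: push v_w away from the sampled contexts
    for n in negatives:
        g = -sigmoid(Ctx[n] @ W[w])
        grad_w += g * Ctx[n]
        Ctx[n] += lr * g * W[w]
    W[w] += lr * grad_w
```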

Page 36: Analogical arithmetic on representations [Mikolov et al., 2013d]

- vec(Madrid) - vec(Spain) ~ vec(Paris) - vec(France)
- this makes it possible to solve analogy equations [x : y :: z : ?]:

1. compute the target vector $t = vec(y) - vec(x) + vec(z)$
2. search V for the word $\hat{t}$ closest to t:

   $\hat{t} = \arg\max_w \frac{vec(w) \cdot vec(t)}{\|vec(w)\| \, \|vec(t)\|}$
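
A minimal sketch of this procedure, assuming `vectors` maps words to numpy arrays; the query words are excluded from the answer, as in the word2vec evaluation script.

```python
import numpy as np

def analogy(x, y, z, vectors):
    t = vectors[y] - vectors[x] + vectors[z]
    t /= np.linalg.norm(t)
    best, best_sim = None, -np.inf
    for w, v in vectors.items():
        if w in (x, y, z):
            continue
        sim = (v @ t) / np.linalg.norm(v)  # cosine against the target vector
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# analogy("Spain", "Madrid", "France", vectors) -> ideally "Paris"
```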

Page 37: [Mikolov et al., 2013d]

- RNN trained on 320M words (V = 82k)
- test set of 8k analogies involving the most frequent words

Page 38: [Mikolov et al., 2013c]

- 6B words of Google News, 1M most frequent words
- the syntactic test is the same as in [Mikolov et al., 2013d]

Page 39: [Mikolov et al., 2013c]

- comparison with other proposed models

Page 40: [Mikolov et al., 2013c]

- Big Data (more data, higher dimension)

Page 41: Don't count, predict! [Baroni et al., 2014]

- many tasks, and a study of the meta-parameters of each method

Page 42: Don't count, predict! [Baroni et al., 2014]

- cnt = count vectors, pre = word2vec, dm = [Baroni and Lenci, 2010], cw = [?]

Page 43: Don't count, predict! [Baroni et al., 2014]

"...we set out to conduct this study because we were annoyed by the triumphalist overtones often surrounding predict models, despite the almost complete lack of proper comparison to count vectors. Our secret wish was to discover that it is all hype, and count vectors are far superior to their predictive counterparts. [...] we found that the predict models are so good that, while the triumphalist overtones still sound excessive, there are very good reasons to switch to the new architecture."

Page 44: On the difficulty of unbiased evaluation... [Levy et al., 2015]

- compares 4 approaches: co-occurrence matrix (PMI), SVD, Skip-Gram, and GloVe
- studies their parameters in detail
- adapts choices made in Skip-Gram to the other methods where possible
- Takeaway:
  - a draw in performance (no clear advantage of one approach over another)
  - Skip-Gram behaves better (time/memory) than the other approaches

Page 45: On the difficulty of unbiased evaluation... [Levy et al., 2015]

Page 46: An example analysis... [Levy et al., 2015]

- in the co-occurrence-matrix approach, a word w and its context c are scored by:

  $PMI(w, c) = \log \frac{p(w, c)}{p(w)\,p(c)}$

- one common approach is to set the PMI to 0 when #(w, c) = 0 (rather than $-\infty$)
- another is to take: $PPMI(w, c) = \max(PMI(w, c), 0)$
- adapting choices made in Skip-Gram:
  - k (number of negative examples):

    $SPPMI(w, c) = \max(PMI(w, c) - \log k,\; 0)$

  - sampling of negative examples (smoothed with $\alpha = 0.75$):

    $PMI_\alpha(w, c) = \log \frac{p(w, c)}{p(w)\,P_\alpha(c)}$  with  $P_\alpha(c) = \frac{\#(c)^\alpha}{\sum_c \#(c)^\alpha}$
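
A hedged numpy sketch of these variants, starting from a co-occurrence count matrix C (words x contexts) such as the one built in the earlier pipeline sketch:

```python
import numpy as np

def sppmi(C, k=1, alpha=1.0):
    total = C.sum()
    p_wc = C / total
    p_w = C.sum(axis=1, keepdims=True) / total
    # context distribution, optionally smoothed with exponent alpha
    counts_c = C.sum(axis=0, keepdims=True) ** alpha
    p_c = counts_c / counts_c.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    # shift by log k and clip at zero; zero counts map to 0, not -inf
    return np.maximum(np.where(np.isfinite(pmi), pmi, -np.inf) - np.log(k), 0.0)

# sppmi(C)                  -> plain PPMI
# sppmi(C, k=5, alpha=0.75) -> SGNS-flavoured shifted, smoothed PPMI
```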

Page 47: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- built from linguistic resources (WordNet, PTB, FrameNet, etc.)
- very sparse vectors (99.9%)
- comparable in performance to state-of-the-art distributional models trained on billions of words
- vectors available (for English):
  https://github.com/mfaruqui/non-distributional

Page 48: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

Page 49: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

(binary) features induced for film:

  SYNSET.FILM.V.01
  SYNSET.FILM.N.01
  HYPO.COLLAGEFILM.N.01
  HYPER:SHEET.N.06
  ...

Page 50: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

supersenses  for nouns, verbs and adjectives
             e.g. lioness => SS.NOUN.ANIMAL
color        word-color lexicon built by crowdsourcing [?]
             e.g. blood => COLOR.RED
emotion      lexicon mapping a word to its polarity (positive/negative) and to emotions (joy, fear, sadness, etc.), built by crowdsourcing [?]
             e.g. cannibal => POL.NEG, EMO.DISGUST and EMO.FEAR
pos          PTB part-of-speech tags
             e.g. love => PTB.NOUN, PTB.VERB

Page 51: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- note: hard to do for every language

Page 52: Binary (non-distributional) vector representations [Faruqui and Dyer, 2015]

- Skip-Gram: pre-trained on 300B words [Mikolov et al., 2013a]
- GloVe: pre-trained on 6B words [Pennington et al., 2014]
- LSA: obtained from a co-occurrence matrix computed on 1B words of Wikipedia [Turney and Pantel, 2010]
- Ling Dense: dimensionality reduction with SVD
- tasks: similarity, sentiment analysis (positive/negative), NP-bracketing
  (local (phone company) versus (local phone) company)

Page 53: Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- a post-processing step applicable to any vector representation of words
- fast (5 seconds for 100k words at dimension 300)
- idea: use the lexico-semantic information of a resource to improve an existing representation
- how: encourage words that are linked in the resource (encoded as a graph) to end up close to each other, while each word stays close to its learned representation

Page 54: Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- V: the vocabulary $\{w_1, \ldots, w_{|V|}\}$
- $\hat{Q}$: the collection of input vectors $\hat{q}_i \in R^d$, $i \in [1, \ldots, |V|]$
- $E = \{(i, j) : i, j \in [1, |V|]$ and $w_i\,R\,w_j$ according to the resource$\}$
- find the new representation Q that minimizes:

  $\Psi(Q) = \sum_i \Big( \alpha_i \|q_i - \hat{q}_i\|^2 + \sum_{j:(i,j) \in E} \beta_{ij} \|q_i - q_j\|^2 \Big)$

  where:
  - $\alpha_i$ and $\beta_{ij}$ are meta-parameters,
  - $\Psi$ is convex in Q and is solved iteratively by:

    $q_i = \frac{\sum_{j:(i,j) \in E} \beta_{ij} q_j + \alpha_i \hat{q}_i}{\sum_{j:(i,j) \in E} \beta_{ij} + \alpha_i}$

- 10 iterations suffice in practice (see the sketch below)
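
A hedged sketch of that iterative update. Here `q_hat` is the (n_words x d) matrix of input vectors and `edges[i]` lists the resource-graph neighbors of word i; alpha = 1 and beta = 1/degree follow the defaults reported in the paper.

```python
import numpy as np

def retrofit(q_hat, edges, n_iters=10):
    q = q_hat.copy()
    for _ in range(n_iters):
        for i, neighbors in edges.items():
            if not neighbors:
                continue
            beta = 1.0 / len(neighbors)   # beta_ij
            alpha = 1.0                   # alpha_i
            # q_i = (sum_j beta_ij q_j + alpha_i q_hat_i) / (sum_j beta_ij + alpha_i)
            num = beta * sum(q[j] for j in neighbors) + alpha * q_hat[i]
            den = beta * len(neighbors) + alpha
            q[i] = num / den
    return q
```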

Page 55: Retrofitting vectors to a lexico-semantic resource [Faruqui et al., 2015a]

- SG: Skip-Gram (100B words of GoogleNews, d = 300)
- GC: recursive NN (1B words of English Wikipedia, d = 50)
- Multi: multilingual (360M EN words, WMT-2011, d = 512)
- PPDB: 8M word-level paraphrases; FN: FrameNet

Page 56: Sparsifying vector representations [Faruqui et al., 2015b]

- transform dense representations into (longer and) sparse ones (possibly binary) so as to be more interpretable (linguistically)
- two methods are presented (fairly technical); the "best" (and simplest) seems to be this one:

  $\arg\min_{D,A} \|X - DA\|_2^2 + \lambda \Omega(A) + \tau \|D\|_2^2$

  where $\lambda$ and $\tau$ are meta-parameters controlling an $\ell_1$ and an $\ell_2$ regularization, respectively

- optimized by gradient descent (a version of AdaGrad); a sketch with off-the-shelf sparse coding follows below
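
Not the paper's AdaGrad solver, but scikit-learn's dictionary learning optimizes the same flavour of objective (an l1 penalty on the codes A, with a norm constraint on D standing in for the tau term). X is assumed to be an (n_words x d) matrix of dense embeddings.

```python
from sklearn.decomposition import DictionaryLearning

def sparsify(X, n_components=1000, l1=1.0):
    learner = DictionaryLearning(n_components=n_components, alpha=l1,
                                 transform_algorithm="lasso_lars",
                                 positive_code=True)  # nudges codes toward binary-like patterns
    A = learner.fit_transform(X)   # sparse overcomplete codes, one row per word
    D = learner.components_        # the learned dictionary, so X ~ A @ D
    return A, D
```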

Pages 57-58: Sparsifying vector representations [Faruqui et al., 2015b]

Page 59: A community getting organized [Faruqui and Dyer, 2014]

- pre-trained embeddings
- a suite of tests that can be run (similarity, analogy, completion, etc.)
- a visualization interface
- note: not certain the site is very popular (or kept up to date) at the moment...
- http://wordvectors.org/demo.php

Page 60: Mikolov strikes again [?]

Page 61: Mikolov strikes again [?]

- one can learn a linear transformation (rotation + scaling) from one space to another using a bilingual lexicon $(x_i, z_i)$:

  $\hat{W} = \arg\min_W \sum_i \|W \mathbf{x}_i - \mathbf{z}_i\|^2$

  where $\mathbf{x}_i$ and $\mathbf{z}_i$ denote the source-side vector of $x_i$ and the target-side vector of $z_i$, respectively

- W is optimized by gradient descent on a lexicon of about 5k word pairs
- at test time, translate a word x by the word z:

  $z = \arg\max_{z'} \cos(\mathbf{z}', \hat{W}\mathbf{x})$
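
A hedged sketch using least squares in place of the paper's gradient descent. X and Z are (n_pairs x d_src) and (n_pairs x d_tgt) matrices of source/target vectors for the seed lexicon; `tgt_vectors` maps target words to numpy arrays.

```python
import numpy as np

def fit_mapping(X, Z):
    # W minimizing sum_i ||W x_i - z_i||^2, solved in closed form
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W.T  # shape (d_tgt, d_src), so z ~ W @ x

def translate(x, W, tgt_vectors):
    t = W @ x
    t /= np.linalg.norm(t)
    # nearest target word by cosine to the mapped source vector
    return max(tgt_vectors,
               key=lambda z: (tgt_vectors[z] @ t) / np.linalg.norm(tgt_vectors[z]))
```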

Page 62: Mikolov strikes again [?]

- the 6K most frequent source words, translated with GoogleTrans
- the first 5K entries are used to compute W
- the next 1K for testing
- baselines: edit distance, ε-Rapp

Page 63: Mikolov strikes again [?]

Page 64: More data? (Google News)

- same split: 5K train / 1K test

Page 65: And what about rare words? [?]

              1k-low                   1k-high
            TOP@1  TOP@5  TOP@20    TOP@1  TOP@5  TOP@20
embedding    2.2    6.1   11.9      21.7   34.2   44.9
context      2.0    4.3    7.6      19.0   32.7   44.3
document     0.7    2.3    5.0       -      -      -
oracle       4.6     -    19.0      31.8    -     57.6

- Wikipedia dump of June 2013 (EN: 3.5M articles, FR: 1.3M)
- V_EN = 7.3M, V_FR = 3.6M
- 2 test sets: 1k-low (1k rare words), 1k-high (1k non-rare words)
- rare = freq < 26 (92% of the words of V_EN)

Page 66: And what about rare words? [?]

Pages 67-73: References

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., and Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pages 19-27.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238-247, Baltimore, Maryland. Association for Computational Linguistics.

Baroni, M. and Lenci, A. (2010). Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673-721.

Chandar, A. P. S., Lauly, S., Larochelle, H., Khapra, M. M., Ravindran, B., Raykar, V. C., and Saha, A. (2014). An autoencoder approach to learning bilingual word representations. CoRR.

Coulmance, J., Marty, J., Wenzek, G., and Benhalloum, A. (2016). Trans-gram, fast cross-lingual word-embeddings. CoRR, abs/1601.02502.

Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., and Smith, N. A. (2015a). Retrofitting word vectors to semantic lexicons. In Proceedings of NAACL.

Faruqui, M. and Dyer, C. (2014). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of ACL: System Demonstrations.

Faruqui, M. and Dyer, C. (2015). Non-distributional word vector representations. In Proceedings of ACL.

Faruqui, M., Tsvetkov, Y., Yogatama, D., Dyer, C., and Smith, N. A. (2015b). Sparse overcomplete word vector representations. In Proceedings of ACL.

Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R., and Wang, Z. (2005). New experiments in distributional representations of synonymy. In Proceedings of the Ninth Conference on Computational Natural Language Learning, pages 25-32.

Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations (3rd ed.). Johns Hopkins University Press.

Gouws, S., Bengio, Y., and Corrado, G. (2015). BilBOWA: Fast bilingual distributed representations without word alignments. In ICML.

Jurafsky, D. and Martin, J. H. (2015). Speech and Language Processing (3rd ed. draft).

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791.

Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pages 2177-2185.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211-225.

Lin, D. and Pantel, P. (2001). DIRT: discovery of inference rules from text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 323-328. ACM.

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Mikolov, T., Le, Q. V., and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013c). Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546.

Mikolov, T., tau Yih, W., and Zweig, G. (2013d). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013).

Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543.

Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20(1):33-54.

Salton, G. (1975). Dynamic Information and Library Processing. Prentice-Hall, Englewood Cliffs, N.J.

Turney, P. D. (2005). Measuring semantic similarity by latent relational analysis. CoRR.

Turney, P. D. and Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141-188.