checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1...

29
Checking the adequacy of second-order vector space models of meaning RU Quantitative Lexicology and Variational Linguistics

Upload: others

Post on 27-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Checking the adequacy of second-order vector

space models of meaning

RU Quantitative Lexicology and Variational Linguistics

Page 2: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

On the menu

• narrowly defined:

examine the effect of ppmi weighting of context words (and other

parameter settings) in token-based vector space models

• more broadly:

introduce the Nephological Semantics project to the CL

community

ICLC, Nishinomiya, 2019年8月10日

Page 3: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Nephological Semantics

• a research project within the QLVL team of the university

of Leuven, aimed at developing and testing vector space

models for semantic and lectal analysis

ICLC, Nishinomiya, 2019年8月10日

• staff members: Dirk Geeraerts, Dirk Speelman, Benedikt Smzrecsanyi,

Stefania Marzo

• postdocs: Kris Heylen, Weiwei Zhang, Karlien Franco

• PhD students: Thomas Wielfaert, Stefano De Pascale, Mariana Montes

• informatician: Tao Chen

Page 4: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Nephological Semantics

• theoretically, a corpus-based scaling up of the

categorization model introduced in Geeraerts et al. 1994,

The Structure of Lexical Variation (CLR 5)

-> branches:

• semasiology: sense identification (this talk)

• formal onomasiology: synonym identification and lexical

lectometry (Weiwei Zhang’s talk on Tuesday)

• conceptual onomasiology: construal

ICLC, Nishinomiya, 2019年8月10日

Page 5: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Nephological Semantics

why ‘nephology’?

the study of the shape of clouds

here:

• represent word usages as points in a semantic space

(so that semantically similar tokens belong together)

• see what the shape of the resulting cloud of points tells

us about the polysemy of the word

ICLC, Nishinomiya, 2019年8月10日

Page 6: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Steps to take

• an introduction to the essentials of type-based and token-based

distributional semantics (and the variety of relevant context

features)

• a hands-on demonstration of the tool developed in the

Nephological Semantics project for the visual analytics of vector

models

ICLC, Nishinomiya, 2019年8月10日

Page 7: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

DISTRIBUTIONAL SEMANTICS

The Workflow

ICLC, Nishinomiya, 2019年8月10日

Page 8: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Type level vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member

state

go

Page 9: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Type level vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

Page 10: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Type level vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

PARAMETERS

• Window size

• Corpus

• Association measure (PPMI?)

• Number of context words

• ...

Page 11: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1

church2

church3

church4

church5

Token level vectors

ICLC, Nishinomiya, 2019年8月10日

Page 12: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1 0 0 1 0 1 0

church2 0 0 1 0 0 0

church3 0 0 0 0 0 1

church4 1 0 0 0 0 0

church5 0 0 0 1 0 0

Token level vectors

ICLC, Nishinomiya, 2019年8月10日

Page 13: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Token level vectors

ICLC, Nishinomiya, 2019年8月10日

member steeple state go separation school

church1 0 0 0.0 0 3.27 0

church2 0 0 0.0 0 0 0

church3 0 0 0 0 0 0.65

church4 2.41 0 0 0 0 0

church5 0 0 0 0.29 0 0

Page 14: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Token level vectors

ICLC, Nishinomiya, 2019年8月10日

PARAMETERS

• Window size

• Association measure (PPMI?)

• Number of context words

• Part of speech

• ...

member steeple state go separation school

church1 0 0 0.0 0 3.27 0

church2 0 0 0.0 0 0 0

church3 0 0 0 0 0 0.65

church4 2.41 0 0 0 0 0

church5 0 0 0 0.29 0 0

Page 15: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1 0 0 1 0 1 0

church2 0 0 1 0 0 0

church3 0 0 0 0 0 1

church4 1 0 0 0 0 0

church5 0 0 0 1 0 0

Second order vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

Page 16: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1 0 0 1 0 1 0

church2 0 0 1 0 0 0

church3 0 0 0 0 0 1

church4 1 0 0 0 0 0

church5 0 0 0 1 0 0

Second order vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

Page 17: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1 0 0 0.0 0 3.27 0

church2 0 0 0.0 0 0 0

church3 0 0 0 0 0 0.65

church4 2.41 0 0 0 0 0

church5 0 0 0 0.29 0 0

Second order vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

Page 18: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

member steeple state go separation school

church1 0 0 0.0 0 3.27 0

church2 0 0 0.0 0 0 0

church3 0 0 0 0 0 0.65

church4 2.41 0 0 0 0 0

church5 0 0 0 0.29 0 0

Second order vectors

ICLC, Nishinomiya, 2019年8月10日

protestant tower cloud faculty sovereign beyond join mission

member 0.0 0.0 0.0 4.42 0.0 0.0 0.0 1.2

state 0.0 1.60 0.0 0.0 3.59 0.0 1.08 0.78

go 1.59 0.0 0.99 0.0 0.0 2.81 1.12 1.29

Page 19: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

ICLC, Nishinomiya, 2019年8月10日

PARAMETERS

• Window size

• Association measure (PPMI?)

• Number of context words

• Part of speech

• ...

PARAMETERS

• Window size

• Association measure (PPMI?)

• Number of context words

• Part of speech

• ...

Page 20: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Which context words should we use to represent a token?

ICLC, Nishinomiya, 2019年8月10日

Source: File p01 (Semcor), token 222

Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession.

So, our challenge:

Page 21: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession.

Which context words should we use to represent a token?

ICLC, Nishinomiya, 2019年8月10日

[...] Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession. [...]

The whole text?

Source: File p01 (Semcor), token 222

Page 22: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession.

Which context words should we use to represent a token?

ICLC, Nishinomiya, 2019年8月10日

Source: File p01 (Semcor), token 222

[...] commanded. It went to church on Sunday and one Saturday [...]

A 5:5 window?

Page 23: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession.

Which context words should we use to represent a token?

ICLC, Nishinomiya, 2019年8月10日

Source: File p01 (Semcor), token 222

[...] commanded. It went to church on Sunday and one Saturday [...]

According to their PPMI value?

Page 24: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Youth obeyed when commanded. It went to church on

Sunday and one Saturday a month went to confession.

Which context words should we use to represent a token?

ICLC, Nishinomiya, 2019年8月10日

Source: File p01 (Semcor), token 222

According to their PPMI value?

[...] commanded. It went to church on Sunday and one Saturday [...]

Page 25: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

What does this look like?

ICLC, Nishinomiya, 2019年8月10日

http://delhikiev.github.io/

Page 26: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Discussion

ICLC, Nishinomiya, 2019年8月10日

☁ BETWEEN types, ppmi values are useful

☁WITHIN types, ppmi values might hide the very distinctions

we would like to highlight

☁ Other parameters should be used to select the first order

context words (window size, part-of-speech, dependency

relations...)

Page 27: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Rounding up

we have

• introduced the semasiological workflow of the Nephological

Semantics project

• shown how that workflow functions as a diagnostic tool for

examining the effect of parameter settings (such as the

selection and weighting of context words) in a token-based

distributional approach

ICLC, Nishinomiya, 2019年8月10日

Page 28: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Rounding up

next steps to take include

• using the procedures and visualizations illustrated here to

optimize parameter settings for semasiological analysis

(e.g. in terms of word class)

• further integrate the semasiological and onomasiological lines

of enquiry within the project

ICLC, Nishinomiya, 2019年8月10日

Page 29: Checking the adequacy of second-order vector space models ... · church1 0 0 1 0 1 0 church2 0 0 1 0 0 0 church3 0 0 0 0 0 1 church4 1 0 0 0 0 0 church5 0 0 0 1 0 0 Second order vectors

Thank you!

ご清聴ありがとうございました

[email protected]

[email protected]

http://delhikiev.github.io

ICLC, Nishinomiya, 2019年8月10日