the semantics of color terms. a quantitative cross ...gjaeger/slides/slidesleipzig.pdf · colors...

124
The semantics of color terms. A quantitative cross-linguistic investigation Gerhard J¨ ager [email protected] May 20, 2010 University of Leipzig 1/124

Upload: others

Post on 23-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The semantics of color terms. A quantitativecross-linguistic investigation

Gerhard [email protected]

May 20, 2010

University of Leipzig

1/124

Page 2: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The psychological color space

physical color space has infinite dimensionality — everywavelength within the visible spectrum is one dimensionpsychological color space is only 3-dimensionalthis fact is employed in technical devices like computer screens(additive color space) or color printers (subtractive colorspace)

additive color space subtractive color space

2/124

Page 3: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The psychological color space

psychologically correct color space should not only correctlyrepresent the topology of, but also the distances betweencolors

distance is inverse function of perceived similarity

L*a*b* color space has this property

three axes:

black — whitered — greenblue — yellow

irregularly shaped 3d color solid

3/124

Page 4: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The color solid

4/124

Page 5: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The Munsell chart

for psychological investigations, the Munsell chart is beingused2d-rendering of the surface of the color solid

8 levels of lightness40 hues

plus: black–white axis with 8 shaded of grey in betweenneighboring chips differ in the minimally perceivable way

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

5/124

Page 6: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

pilot study how different languages carve up the color spaceinto categories

informants: speakers of 20 typologically distant languages(who happened to be around the Bay area at the time)

questions (using the Munsell chart):

What are the basic color terms of your native language?What is the extension of these terms?What are the prototypical instances of these terms?

results are not random

indicate that there are universal tendencies in color namingsystems

6/124

Page 7: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Arabic

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

7/124

Page 8: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Bahasa Indonesia

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

8/124

Page 9: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Bulgarian

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

9/124

Page 10: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Cantonese

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

10/124

Page 11: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Catalan

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

11/124

Page 12: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

English

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

12/124

Page 13: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Hebrew

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

13/124

Page 14: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Hungarian

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

14/124

Page 15: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Ibibo

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

15/124

Page 16: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Japanese

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

16/124

Page 17: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Korean

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

17/124

Page 18: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Mandarin

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

18/124

Page 19: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Mexican Spanish

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

19/124

Page 20: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Pomo

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

20/124

Page 21: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Swahili

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

21/124

Page 22: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Tagalog

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

22/124

Page 23: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Thai

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

23/124

Page 24: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Tzeltal

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

24/124

Page 25: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Urdu

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

25/124

Page 26: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

extensions

Vietnamese

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

26/124

Page 27: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Berlin and Kay 1969

identification of absolute and implicational universals, like

all languages have words for black and whiteif a language has a word for yellow, it has a word for redif a language has a word for pink, it has a word for blue...

27/124

Page 28: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Color Survey

B&K was criticized for methodological reasons

in response, in 1976 Kay and co-workers launched the worldcolor survey

investigation of 110 non-written languages from around theworld

around 25 informants per languagetwo tasks:

the 330 Munsell chips were presented to each test person oneafter the other in random order; they had to assign each chipto some basic color term from their native languagefor each native basic color term, each informant identified theprototypical instance(s)

data are publicly available underhttp://www.icsi.berkeley.edu/wcs/

28/124

Page 29: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

distribution of focal colors across all informants:

Distribution of focal colors

Munsell chips

# na

med

as

foca

l col

or

2050

200

1000

29/124

Page 30: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

distribution of focal colors across all informants:

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

30/124

Page 31: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

31/124

Page 32: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

32/124

Page 33: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

33/124

Page 34: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

34/124

Page 35: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

35/124

Page 36: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

36/124

Page 37: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

37/124

Page 38: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

38/124

Page 39: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

39/124

Page 40: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Data digging in the WCS

partition of a randomly chosen informant from a randomlychosen language

40/124

Page 41: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

What is the extension of categories?

data from individual informants are extremely noisy

averaging over all informants from a language helps, but thereis still noise, plus dialectal variation

desirable: distinction between “genuine” variation and noise

41/124

Page 42: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Statistical feature extraction

first step: representation ofraw data in contingencymatrix

rows: color terms fromvarious languagescolumns: Munsell chipscells: number of testpersons who used therow-term for thecolumn-chip

A0 B0 B1 B2 · · · I38 I39 I40 J0

red 0 0 0 0 · · · 0 0 2 0green 0 0 0 0 · · · 0 0 0 0blue 0 0 0 0 · · · 0 0 0 0black 0 0 0 0 · · · 18 23 21 25white 25 25 22 23 · · · 0 0 0 0

......

......

......

......

......

rot 0 0 0 0 · · · 1 0 0 0grun 0 0 0 0 · · · 0 0 0 0gelb 0 0 0 1 · · · 0 0 0 0

......

......

......

......

......

rouge 0 0 0 0 · · · 0 0 0 0vert 0 0 0 0 · · · 0 0 0 0

......

......

......

......

......

further processing:

divide each row by the number n of test persons using thecorresponding termduplicate each row n times

42/124

Page 43: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Principal Component Analysis

technique to reduce dimensionality of data

input: set of vectors in an n-dimensionalspace

first step:

rotate the coordinatesystem, such that

the new n coordinates areorthogonal to each otherthe variations of the dataalong the new coordinatesare stochasticallyindependent

second step:

choose a suitable m < n

project the data on those mnew coordinates where thedata have the highestvariance

43/124

Page 44: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Principal Component Analysis

alternative formulation:

choose an m-dimensional linear sub-manifold of yourn-dimensional spaceproject your data onto this manifoldwhen doing so, pick your sub-manifold such that the averagesquared distance of the data points from the sub-manifold isminimized

intuition behind this formulation:

data are “actually” generated in an m-dimensional spaceobservations are disturbed by n-dimensional noisePCA is a way to reconstruct the underlying data distribution

applications: picture recognition, latent semantic analysis,statistical data analysis in general, data visualization, ...

44/124

Page 45: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Statistical feature extraction: PCA

first 15 principalcomponents jointlyexplain 91.6% of thetotal variance

choice of m = 15 isdetermined by using“Kaiser’s stoppingrule”

principal components

prop

ortio

n of

var

ianc

e ex

plai

ned

0.00

0.05

0.10

0.15

0.20

0.25

0.30

45/124

Page 46: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Statistical feature extraction: PCA

after some post-processing (“varimax” algorithm):

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

46/124

Page 47: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

noise removal: project observed data onto thelower-dimensional submanifold that was obtained via PCA

in our case: noisy binary categories are mapped to smoothedfuzzy categories (= probability distributions over Munsellchips)

some examples:

47/124

Page 48: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

48/124

Page 49: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

49/124

Page 50: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

50/124

Page 51: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

51/124

Page 52: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

52/124

Page 53: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

53/124

Page 54: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

54/124

Page 55: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

55/124

Page 56: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

56/124

Page 57: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

57/124

Page 58: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

58/124

Page 59: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

59/124

Page 60: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

60/124

Page 61: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

61/124

Page 62: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

62/124

Page 63: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

63/124

Page 64: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

64/124

Page 65: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

65/124

Page 66: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

66/124

Page 67: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Projecting observed data on lower-dimensional-manifold

67/124

Page 68: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothing the partitions

from smoothed extensions we can recover smoothed partitions

each pixel is assigned to category in which it has the highestdegree of membership

68/124

Page 69: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

69/124

Page 70: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

70/124

Page 71: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

71/124

Page 72: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

72/124

Page 73: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

73/124

Page 74: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

74/124

Page 75: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

75/124

Page 76: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

76/124

Page 77: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

77/124

Page 78: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Smoothed partitions of the color space

78/124

Page 79: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convexity

note: so far, we only used information from the WCS

the location of the 330 Munsell chips in L*a*b* space playedno role so far

still, apparently partition cells always form continuous clustersin L*a*b* space

Hypothesis (Gardenfors): extension of color terms always formconvex regions of L*a*b* space

79/124

Page 80: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Support Vector Machines

supervised learning techniquesmart algorithm to classify data in a high-dimensional spaceby a (for instance) linear boundaryminimizes number of mis-classifications if the training dataare not linearly separable

gree

nre

d

−3 −2 −1 0 1 2 3

−3

−2

−1

0

1

2

3

o

o oo

o

o

oooo o

o

o

o

o

o

o

ooo

o

oo

o

o

o

oo o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

oo

o o

oo

o

o

o

o o

o

o

oo o

o

o

o

o

o

o

o

o

ooo

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o o

o o

oo

o

o

o

o

o

o

o

ooo

oo

o

o

oo

o

o

o

o

o

o o

o

o

o o

o

o

ooo

o

o

o

o

o

o

o

o

o

o o

oo

o

o

o

o

o

o

oo

o

o

oo

o

oo

o

oo

o

o

o

oo

o

o

o oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo

o

o

o

oo

o

ooo

o

o

SVM classification plot

y

x

80/124

Page 81: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex partitions

a binary linear classifier divides an n-dimensional space intotwo convex half-spaces

intersection of two convex set is itself convex

hence: intersection of k binary classifications leads to convexsets

procedure: if a language partitions the Munsell space into mcategories, train m(m−1)

2 many binary SVMs, one for each pairof categories in L*a*b* space

leads to m convex sets (which need not split the L*a*b*space exhaustively)

81/124

Page 82: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

82/124

Page 83: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

83/124

Page 84: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

84/124

Page 85: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

85/124

Page 86: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

86/124

Page 87: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

87/124

Page 88: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

88/124

Page 89: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

89/124

Page 90: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

90/124

Page 91: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

91/124

Page 92: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

on average, 93.7% of all Munsell chips are correctly classifiedby convex approximation

0.80

0.85

0.90

0.95

prop

ortio

n of

cor

rect

ly c

lass

ified

Mun

sell

chip

s

92/124

Page 93: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

compare to the outcome of the same procedure without PCA,and with PCA but using a random permutation of the Munsellchips

●●●●

●●●●

●●●

●●

●●●

●●

●●●●

●●●●●

●●●●●●●●●●●

●●●

●●●●●●

●●

●●

●●●●●

●●●●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

1 2 3

2040

6080

100

degr

ee o

f con

vexi

ty (

%)

93/124

Page 94: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Convex approximation

choice of m = 10 is somewhat arbitraryoutcome does not depend very much on this choice though

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 10 20 30 40 50

5060

7080

9010

0

no. of principal components used

mea

n de

gree

of c

onve

xity

(%

)

94/124

Page 95: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Implicative universals

first six features correspond nicely to the six primary colorswhite, black, red, green, blue, yellow

according to Kay et al. (1997) (and many other authors)simple system of implicative universals regarding possiblepartitions of the primary colors

95/124

Page 96: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Implicative universals

I II III IV V

whitered/yellowgreen/blueblack

whiteredyellowgreen/blueblack

[white/red/yellowblack/green/blue

] whitered/yellowblack/green/blue

whitered/yellowgreenblack/blue

whiteredyellowgreenblueblack

whiteredyellowblack/green/blue

whiteredyellowgreenblack/blue

whiteredyellow/green/blueblack

whiteredyellow/greenblueblack

whiteredyellow/greenblack/blue

source: Kay et al. (1997)

96/124

Page 97: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of the primary colors

each speaker/term pair can be projected to a 15-dimensionalvector

primary colors correspond to first 6 entries

each primary color is assigned to the term for which it has thehighest value

defines for each speaker a partition over the primary colors

97/124

Page 98: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of the primary colors

for instance: sample speaker(from Piraha):

extracted partition:white/yellowredgreen/blueblack

supposedly impossible, butoccurs 61 times in thedatabase

J

I

H

G

F

E

D

C

B

A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

98/124

Page 99: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of primary colors

most frequent partition types:

1 {white}, {red}, {yellow}, {green, blue}, {black} (41.9%)2 {white}, {red}, {yellow}, {green}, {blue}, {black} (25.2%)3 {white}, {red, yellow}, {green, blue, black} (6.3%)4 {white}, {red}, {yellow}, {green}, {black, blue} (4.2%)5 {white, yellow}, {red}, {green, blue}, {black} (3.4%)6 {white}, {red}, {yellow}, {green, blue, black} (3.2%)7 {white}, {red, yellow}, {green, blue}, {black} (2.6%)8 {white, yellow}, {red}, {green, blue, black} (2.0%)9 {white}, {red}, {yellow}, {green, blue, black} (1.6%)10 {white}, {red}, {green, yellow}, {blue, black} (1.2%)

99/124

Page 100: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of primay colors

87.1% of all speaker partitions obey Kay et al.’s universals

the ten partitions that confirm to the universals occupy ranks1, 2, 3, 4, 6, 7, 9, 10, 16, 18

decision what counts as an exception seems somewhatarbitrary on the basis of these counts

100/124

Page 101: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of primary colors

more fundamental problem:

partition frequencies aredistributed according to powerlaw

frequency ∼ rank−1.99

no natural cutoff point todistinguish regular from exceptionalpartitions

● ●

● ● ●

●●●

●●

●●●●●

●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●

1 2 5 10 20 50

12

510

2050

100

200

500

rank

freq

uenc

y

101/124

Page 102: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of seven most important colors

frequency ∼ rank−1.64

●● ●

● ● ● ●

●●●

●●

●●

●●●●●●●●

●●●●

●●●●

●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

1 2 5 10 20 50 100

12

510

2050

100

200

500

rank

freq

uenc

y

102/124

Page 103: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Partition of eight most important colors

frequency ∼ rank−1.46

●● ●

●●

● ●●

●●●●

●●●●

●●

●●

●●●

●●●●●●

●●●

●●●●●

●●●●

●●

●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

1 2 5 10 20 50 100 200

12

510

2050

100

200

rank

freq

uenc

y

103/124

Page 104: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Power laws

104/124

Page 105: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Power laws

105/124

Page 106: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Power laws

from Newman 2006

106/124

Page 107: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Other linguistic power law distributions

number of vowels

vowel systems and their frequency of occurrence

3

14

4

14 5 4 2

5

97 3

6

26 12 12

7

23 6 5 4 3

8

6 3 3 2

9

7 7 3

(from Schwartz et al. 1997,

based on the UCLA Phonetic Segment Inventory Database)

107/124

Page 108: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Other linguistic power law distributions

frequency ∼ rank−1.06

● ●

● ●

● ●

● ●

● ●

● ●

● ● ●●●

●●

1 2 5 10 202

510

2050

100

rank

freq

uenc

y

108/124

Page 109: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Other linguistic power law distributions

size of language families

source: Ethnologue

frequency ∼ rank−1.32

● ● ●●

●● ●

●●●

●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●

●●●

●●●●●●●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●

●●●●●●●●

●●●●●●●●●●●●●●●●●●●●●

●●●●●●●●

1 2 5 10 20 50 1001

510

5010

050

0

rank

freq

uenc

y

109/124

Page 110: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Other linguistic power law distributions

number of speakers perlanguage

source: Ethnologue

frequency ∼ rank−1.01

● ●

● ● ●

●●

●●●●●●●●●

●●●●●●●●●●●●●●●

●●●●●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

1 2 5 10 20 50 100 2005

1020

5010

020

050

010

00

rank

freq

uenc

y (in

mill

ion)

110/124

Page 111: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

large scale typological database, conducted mainly by the MPIEVA, Leipzig

2,650 languages in total are used

142 features, with between 120 and 1,370 languages perfeature

available online

111/124

Page 112: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

question: are frequency of feature values powerlawdistributed?

problem: number of feature values usually too small forstatistic evaluation

solution:

cross-classification of two (randomly chosen) featuresonly such feature pairs are considered that lead to at least 30non-empty feature value combinations

pilot study with 10 such feature pairs

112/124

Page 113: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1:Consonant-VowelRatio

Feature 2: Subtypesof AsymmetricStandard Negation

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

113/124

Page 114: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: WeightFactors inWeight-SensitiveStress Systems

Feature 2: OrdinalNumerals

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

114/124

Page 115: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: ThirdPerson Zero of VerbalPerson Marking

Feature 2: Subtypesof AsymmetricStandard Negation

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

115/124

Page 116: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1:Relationship betweenthe Order of Objectand Verb and theOrder of Adjectiveand Noun

Feature 2: Expressionof PronominalSubjects

Kolmogorov-Smirnovtest: positive

100

101

102

103

10−2

10−1

100

Pr(

X ≥

x)

x

116/124

Page 117: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Plurality inIndependent PersonalPronouns

Feature 2:AsymmetricalCase-Marking

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

117/124

Page 118: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Locus ofMarking:Whole-languageTypology

Feature 2: Number ofCases

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

118/124

Page 119: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Prefixingvs. Suffixing inInflectionalMorphology

Feature 2: Coding ofNominal Plurality

Kolmogorov-Smirnovtest: positive

100

101

102

103

10−2

10−1

100

Pr(

X ≥

x)

x

119/124

Page 120: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Prefixingvs. Suffixing inInflectionalMorphology

Feature 2: OrdinalNumerals

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

120/124

Page 121: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Coding ofNominal Plurality

Feature 2:AsymmetricalCase-Marking

Kolmogorov-Smirnovtest: positive

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

121/124

Page 122: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

The World Atlas of Language Structures

Feature 1: Position ofCase Affixes

Feature 2: OrdinalNumerals

Kolmogorov-Smirnovtest: negative

100

101

102

10−2

10−1

100

Pr(

X ≥

x)

x

122/124

Page 123: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

Why power laws?

critical states

self-organized criticality

preferential attachment

random walks

...

Preferential attachment

items are stochasticallyadded to bins

probability to end up in binn is linear in number ofitems that are already in binn

123/124

Page 124: The semantics of color terms. A quantitative cross ...gjaeger/slides/slidesLeipzig.pdf · colors distance is inverse function of perceived similarity L*a*b* color space has this property

(Wide) Open questions

Preferential attachment explains power law distribution ifthere are no a priori biases for particular types

first simulations suggest that preferential attachment + biasedtype assignment does not lead to power law

negative message: uneven typological frequency distributiondoes not prove that frequent types are inherently preferredlinguistically/cognitively/socially

unsettling questions:

Are there linguistic/cognitive/social biases in favor of certaintypes?If yes, can statistical typology supply information about this?If power law distributions are the norm, is their any content tothe notion of statistical universal in a Greenbergian sense?

124/124