
The semantics of color terms. A quantitative cross-linguistic investigation

Gerhard [email protected]

May 20, 2010

University of Leipzig


The psychological color space

physical color space has infinite dimensionality: every wavelength within the visible spectrum is one dimension

psychological color space is only 3-dimensional

this fact is employed in technical devices like computer screens (additive color space) or color printers (subtractive color space)

[Figures: additive color space; subtractive color space]

The psychological color space

a psychologically correct color space should correctly represent not only the topology of colors, but also the distances between them

distance is inverse function of perceived similarity

L*a*b* color space has this property

three axes:

black–white
red–green
blue–yellow

irregularly shaped 3d color solid
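A minimal sketch of what "distance" means here, assuming the simplest choice: the Euclidean distance between two L*a*b* triples (the CIE76 color difference). The three coordinate triples are made-up illustrative values, not data from the talk.

```python
import math

def lab_distance(c1, c2):
    """Euclidean distance between two (L*, a*, b*) triples (CIE76 color difference)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))

# Illustrative, made-up coordinates: (lightness, red-green axis, blue-yellow axis)
reddish = (50.0, 60.0, 40.0)
orangish = (60.0, 40.0, 60.0)
bluish = (40.0, 10.0, -50.0)

print(lab_distance(reddish, orangish))  # small distance: perceived as similar
print(lab_distance(reddish, bluish))    # large distance: perceived as dissimilar
```

Under this reading, a smaller distance corresponds to higher perceived similarity.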


The color solid


The Munsell chart

for psychological investigations, the Munsell chart is used

2d-rendering of the surface of the color solid

8 levels of lightness
40 hues

plus: black–white axis with 8 shades of grey in between

neighboring chips differ in the minimally perceivable way

[Figure: the Munsell chart; rows A to J, columns 1 to 40]
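The layout of the chart can be sketched as a list of chip labels. The labeling scheme (row letter plus column number, with column 0 for the achromatic black–white axis) is an assumption, chosen to match the column headers A0, B0, B1, ..., I40, J0 that appear in the contingency matrix later in the talk.

```python
ROWS = "ABCDEFGHIJ"  # A = white end of the achromatic axis, J = black end

def munsell_chips():
    """Labels of all chips: achromatic column 0 (rows A-J) plus 40 hues for rows B-I."""
    chips = []
    for row in ROWS:
        chips.append(f"{row}0")      # achromatic chip: white, a grey, or black
        if row not in "AJ":          # rows A and J contain only white and black
            chips.extend(f"{row}{hue}" for hue in range(1, 41))
    return chips

chips = munsell_chips()
print(len(chips))  # 8 * 40 chromatic chips + 10 achromatic chips
```

This count agrees with the 330 chips used in the World Color Survey.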

Berlin and Kay 1969

pilot study of how different languages carve up the color space into categories

informants: speakers of 20 typologically distant languages (who happened to be around the Bay Area at the time)

questions (using the Munsell chart):

What are the basic color terms of your native language?
What is the extension of these terms?
What are the prototypical instances of these terms?

results are not random

indicate that there are universal tendencies in color naming systems

Berlin and Kay 1969

extensions

[Figures: extensions of the basic color terms on the Munsell chart (rows A to J, columns 1 to 40), one chart per language: Arabic, Bahasa Indonesia, Bulgarian, Cantonese, Catalan, English, Hebrew, Hungarian, Ibibio, Japanese, Korean, Mandarin, Mexican Spanish, Pomo, Swahili, Tagalog, Thai, Tzeltal, Urdu, Vietnamese]

Berlin and Kay 1969

identification of absolute and implicational universals, like

all languages have words for black and white
if a language has a word for yellow, it has a word for red
if a language has a word for pink, it has a word for blue
...
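A small sketch of how such universals can be checked mechanically against a term inventory. Only the three universals listed on the slide are encoded; the two example inventories are hypothetical and invented for illustration.

```python
# Absolute universal: terms every language is claimed to have.
ABSOLUTE = {"black", "white"}
# Implicational universals: (a, b) reads "a word for a implies a word for b".
IMPLICATIONS = [("yellow", "red"), ("pink", "blue")]

def violations(terms):
    """List the universals from the slide that a set of color terms violates."""
    terms = set(terms)
    found = [f"missing {t}" for t in sorted(ABSOLUTE - terms)]
    found += [f"{a} without {b}" for a, b in IMPLICATIONS
              if a in terms and b not in terms]
    return found

# Hypothetical inventories, not real languages.
print(violations({"black", "white", "red"}))     # []
print(violations({"black", "white", "yellow"}))  # ['yellow without red']
```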

The World Color Survey

B&K was criticized for methodological reasons

in response, in 1976 Kay and co-workers launched the World Color Survey

investigation of 110 non-written languages from around the world

around 25 informants per language

two tasks:

the 330 Munsell chips were presented to each test person one after the other in random order; they had to assign each chip to some basic color term from their native language
for each native basic color term, each informant identified the prototypical instance(s)

data are publicly available at http://www.icsi.berkeley.edu/wcs/

Data digging in the WCS

distribution of focal colors across all informants:

[Figure: Distribution of focal colors; x-axis: Munsell chips; y-axis: # named as focal color]

[Figure: distribution of focal colors across all informants, shown on the Munsell chart (rows A to J, columns 1 to 40)]


partition of a randomly chosen informant from a randomly chosen language

[Figures: example informant partitions]

What is the extension of categories?

data from individual informants are extremely noisy

averaging over all informants from a language helps, but there is still noise, plus dialectal variation

desirable: distinction between “genuine” variation and noise

Statistical feature extraction

first step: representation of raw data in contingency matrix

rows: color terms from various languages
columns: Munsell chips
cells: number of test persons who used the row-term for the column-chip

         A0  B0  B1  B2  ...  I38  I39  I40  J0
red       0   0   0   0  ...    0    0    2   0
green     0   0   0   0  ...    0    0    0   0
blue      0   0   0   0  ...    0    0    0   0
black     0   0   0   0  ...   18   23   21  25
white    25  25  22  23  ...    0    0    0   0
...
rot       0   0   0   0  ...    1    0    0   0
grun      0   0   0   0  ...    0    0    0   0
gelb      0   0   0   1  ...    0    0    0   0
...
rouge     0   0   0   0  ...    0    0    0   0
vert      0   0   0   0  ...    0    0    0   0
...

further processing:

divide each row by the number n of test persons using the corresponding term
duplicate each row n times
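The construction of the matrix and the two processing steps can be sketched as follows. The input format (a list of `(term, informant, chip)` triples) and the toy responses are assumptions made for this example; they are not the format of the WCS files.

```python
from collections import Counter, defaultdict

def processed_rows(responses, chips):
    """responses: (term, informant, chip) triples from the naming task.

    Returns (term, row) pairs: one relative-frequency row per term, repeated
    n times, where n is the number of test persons who used that term.
    """
    counts = defaultdict(Counter)  # term -> chip -> number of uses
    users = defaultdict(set)       # term -> informants who used the term
    for term, informant, chip in responses:
        counts[term][chip] += 1
        users[term].add(informant)
    rows = []
    for term in sorted(counts):
        n = len(users[term])
        row = [counts[term][chip] / n for chip in chips]  # divide row by n
        rows.extend((term, list(row)) for _ in range(n))  # duplicate n times
    return rows

# Toy data: two hypothetical informants naming three chips.
chips = ["A0", "B0", "J0"]
responses = [
    ("white", "s1", "A0"), ("white", "s1", "B0"), ("black", "s1", "J0"),
    ("white", "s2", "A0"), ("black", "s2", "B0"), ("black", "s2", "J0"),
]
for term, row in processed_rows(responses, chips):
    print(term, row)
```

Duplicating each row n times weights a term by the number of people who use it, so that frequently used terms have more influence on the subsequent analysis.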

Principal Component Analysis

technique to reduce dimensionality of data

input: set of vectors in an n-dimensional space

first step:

rotate the coordinate system, such that

the new n coordinates are orthogonal to each other
the variations of the data along the new coordinates are uncorrelated (and stochastically independent if the data are Gaussian)

second step:

choose a suitable m < n

project the data on those m new coordinates where the data have the highest variance
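The two steps can be sketched with NumPy, assuming the standard eigendecomposition of the covariance matrix. The toy data are synthetic (points near a line in three dimensions) and have nothing to do with the WCS data.

```python
import numpy as np

def pca(X, m):
    """Project the rows of X onto the m directions of highest variance."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # orthonormal eigenvectors = rotated axes
    order = np.argsort(eigvals)[::-1]       # sort axes by decreasing variance
    components = eigvecs[:, order[:m]]      # keep the m highest-variance axes
    explained = eigvals[order[:m]] / eigvals.sum()
    return Xc @ components, components, explained

rng = np.random.default_rng(0)
# Synthetic data: points near a line in 3 dimensions plus small noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
scores, components, explained = pca(X, m=1)
print(explained)  # close to 1: a single direction carries almost all the variance
```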

Principal Component Analysis

alternative formulation:

choose an m-dimensional linear sub-manifold of your n-dimensional space
project your data onto this manifold
when doing so, pick your sub-manifold such that the average squared distance of the data points from the sub-manifold is minimized

intuition behind this formulation:

data are “actually” generated in an m-dimensional space
observations are disturbed by n-dimensional noise
PCA is a way to reconstruct the underlying data distribution

applications: picture recognition, latent semantic analysis, statistical data analysis in general, data visualization, ...

Statistical feature extraction: PCA

first 15 principal components jointly explain 91.6% of the total variance

choice of m = 15 is determined by using “Kaiser’s stopping rule”

[Figure: proportion of variance explained per principal component; y-axis from 0.00 to 0.30]
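A sketch of the stopping rule, assuming its common formulation: keep the components whose eigenvalue exceeds the average eigenvalue (which is 1 when PCA is run on a correlation matrix). Whether the talk used exactly this variant is an assumption, and the eigenvalues below are made up.

```python
def kaiser_cutoff(eigenvalues):
    """Number of components whose eigenvalue exceeds the mean eigenvalue."""
    mean = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > mean)

eigenvalues = [4.0, 2.5, 1.2, 0.2, 0.1]  # made-up eigenvalues
m = kaiser_cutoff(eigenvalues)
explained = sum(sorted(eigenvalues, reverse=True)[:m]) / sum(eigenvalues)
print(m, explained)  # number of retained components, share of variance explained
```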

Statistical feature extraction: PCA

after some post-processing (“varimax” algorithm):

[Figure: result shown on the Munsell chart (rows A to J, columns 1 to 40)]

Projecting observed data onto a lower-dimensional manifold

noise removal: project observed data onto the lower-dimensional submanifold that was obtained via PCA

in our case: noisy binary categories are mapped to smoothed fuzzy categories (= probability distributions over Munsell chips)

some examples:

[Figures: examples of smoothed categories]
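Noise removal by projection can be sketched as follows, assuming the subspace is taken from the singular value decomposition of the centered data. The data are synthetic (a rank-2 signal plus noise), so the comparison of errors is only an illustration of the idea.

```python
import numpy as np

def denoise(X, m):
    """Project the rows of X onto the m-dimensional PCA subspace and map them back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:m].T
    return mean + Xc @ V @ V.T

rng = np.random.default_rng(1)
# Synthetic stand-in for the data: a rank-2 signal in 20 dimensions plus noise.
signal = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20))
noisy = signal + 0.3 * rng.normal(size=signal.shape)
smoothed = denoise(noisy, m=2)

err_noisy = np.mean((noisy - signal) ** 2)
err_smoothed = np.mean((smoothed - signal) ** 2)
print(err_noisy, err_smoothed)  # the projected data lie closer to the signal
```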

Smoothing the partitions

from smoothed extensions we can recover smoothed partitions

each pixel is assigned to the category in which it has the highest degree of membership
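The assignment rule is an argmax over categories. A minimal sketch, with made-up membership degrees for three chips:

```python
def partition(memberships):
    """memberships: {term: [degree of membership per chip]} -> winning term per chip."""
    terms = sorted(memberships)
    n_chips = len(memberships[terms[0]])
    return [max(terms, key=lambda t: memberships[t][i]) for i in range(n_chips)]

# Made-up membership degrees for three chips.
memberships = {
    "red":    [0.7, 0.4, 0.1],
    "yellow": [0.2, 0.5, 0.2],
    "green":  [0.1, 0.1, 0.7],
}
print(partition(memberships))  # ['red', 'yellow', 'green']
```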

Smoothed partitions of the color space


Convexity

note: so far, we only used information from the WCS

the location of the 330 Munsell chips in L*a*b* space played no role so far

still, apparently partition cells always form continuous clusters in L*a*b* space

Hypothesis (Gärdenfors): extensions of color terms always form convex regions of L*a*b* space

Support Vector Machines

supervised learning technique

smart algorithm to classify data in a high-dimensional space by a (for instance) linear boundary

minimizes number of mis-classifications if the training data are not linearly separable

[Figure: SVM classification plot; axes x and y, each from −3 to 3; the two classes are labeled green and red]
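To make the idea concrete, here is a deliberately simple linear soft-margin SVM trained by subgradient descent on the hinge loss. This is a stand-in written for illustration, not the solver used in the talk; the parameters `lam`, `lr` and `epochs` and the toy points are assumptions. In practice an established SVM library would be used instead.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM trained by subgradient descent on the hinge loss."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:               # point lies inside the margin
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def classify(w, b, x):
    """Side of the linear boundary on which x lies (+1 or -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy 2-d data standing in for two color categories (+1 and -1).
points = [(2, 2), (1, 3), (3, 1), (2.5, 1.5),
          (-2, -2), (-1, -3), (-3, -1), (-2.5, -1.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([classify(w, b, x) for x in points])
```

The regularization weight `lam` trades margin width against margin violations, which is what lets the method cope with data that are not linearly separable.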

Convex partitions

a binary linear classifier divides an n-dimensional space into two convex half-spaces

intersection of two convex sets is itself convex

hence: intersection of k binary classifications leads to convex sets

procedure: if a language partitions the Munsell space into m categories, train m(m−1)/2 binary SVMs, one for each pair of categories in L*a*b* space

leads to m convex sets (which need not split the L*a*b* space exhaustively)
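The procedure can be sketched as follows: a point belongs to a category's cell only if it lies on that category's side of all m−1 boundaries involving it. The boundaries below are hand-set, hypothetical half-planes in two dimensions (not trained SVMs), chosen so that one region belongs to no cell.

```python
from itertools import combinations

def side(w, bias, point):
    """Signed value of the linear boundary w.x + bias at the point."""
    return sum(wi * xi for wi, xi in zip(w, point)) + bias

def region(point, boundaries, categories):
    """Category whose convex cell contains the point, or None if no cell does.

    boundaries[(a, b)] = (w, bias): the point is on a's side if w.x + bias > 0
    and on b's side otherwise.
    """
    for cat in categories:
        in_cell = all(
            (side(w, bias, point) > 0) == (cat == a)
            for (a, b), (w, bias) in boundaries.items()
            if cat in (a, b)
        )
        if in_cell:
            return cat
    return None

categories = ["red", "green", "blue"]
pairs = list(combinations(categories, 2))  # m(m-1)/2 = 3 pairs for m = 3
boundaries = {                             # hypothetical, hand-set boundaries
    ("red", "green"): ((1.0, 0.0), 0.0),
    ("green", "blue"): ((0.0, 1.0), 0.0),
    ("red", "blue"): ((-1.0, -1.0), 1.0),
}
for p in [(0.2, 0.2), (-1.0, 1.0), (2.0, -1.0), (2.0, 2.0)]:
    print(p, region(p, boundaries, categories))
```

The last point wins one pairwise comparison for each category and loses another, so it falls outside all three convex cells, which illustrates why the cells need not cover the space.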

Convex approximation



on average, 93.7% of all Munsell chips are correctly classified by convex approximation

[Figure: proportion of correctly classified Munsell chips; y-axis from 0.80 to 0.95]


compare to the outcome of the same procedure without PCA, and with PCA but using a random permutation of the Munsell chips

[Figure: degree of convexity (%) for the three conditions, labeled 1, 2, 3; y-axis from 20 to 100]


choice of m = 10 is somewhat arbitrary

outcome does not depend very much on this choice though

[Figure: mean degree of convexity (%) against no. of principal components used (0 to 50)]

Implicative universals

first six features correspond nicely to the six primary colors white, black, red, green, blue, yellow

according to Kay et al. (1997) (and many other authors), there is a simple system of implicative universals regarding possible partitions of the primary colors

Implicative universals

[Table with columns I, II, III, IV, V; its cells are the following partitions of the primary colors, listed in extraction order]

white, red/yellow, green/blue, black
white, red, yellow, green/blue, black
[white/red/yellow, black/green/blue]
white, red/yellow, black/green/blue
white, red/yellow, green, black/blue
white, red, yellow, green, blue, black
white, red, yellow, black/green/blue
white, red, yellow, green, black/blue
white, red, yellow/green/blue, black
white, red, yellow/green, blue, black
white, red, yellow/green, black/blue

source: Kay et al. (1997)

Partition of the primary colors

each speaker/term pair can be projected to a 15-dimensional vector

primary colors correspond to first 6 entries

each primary color is assigned to the term for which it has the highest value

defines for each speaker a partition over the primary colors
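The extraction of a partition can be sketched as follows. The order of the primaries within the first six entries and the term vectors themselves are assumptions for illustration; the vectors are truncated to six made-up entries and the term names `t1` to `t4` are placeholders.

```python
# Assumed order of the six primary-color features within each vector.
PRIMARIES = ["white", "black", "red", "green", "blue", "yellow"]

def primary_partition(term_vectors):
    """term_vectors: {term: vector}; assigns each primary to the term with the highest entry."""
    cells = {}
    for i, color in enumerate(PRIMARIES):
        best = max(sorted(term_vectors), key=lambda t: term_vectors[t][i])
        cells.setdefault(best, []).append(color)
    return sorted(cells.values())

# Hypothetical speaker with four terms (made-up values).
term_vectors = {
    "t1": [0.9, 0.0, 0.1, 0.0, 0.0, 0.6],
    "t2": [0.0, 0.8, 0.0, 0.1, 0.2, 0.0],
    "t3": [0.1, 0.0, 0.9, 0.0, 0.0, 0.3],
    "t4": [0.0, 0.1, 0.0, 0.7, 0.6, 0.1],
}
print(primary_partition(term_vectors))
```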

Partition of the primary colors

for instance: sample speaker (from Pirahã):

extracted partition:
white/yellow
red
green/blue
black

supposedly impossible, but occurs 61 times in the database

[Figure: the speaker's partition on the Munsell chart (rows A to J, columns 1 to 40)]

Partition of primary colors

most frequent partition types:

1 {white}, {red}, {yellow}, {green, blue}, {black} (41.9%)
2 {white}, {red}, {yellow}, {green}, {blue}, {black} (25.2%)
3 {white}, {red, yellow}, {green, blue, black} (6.3%)
4 {white}, {red}, {yellow}, {green}, {black, blue} (4.2%)
5 {white, yellow}, {red}, {green, blue}, {black} (3.4%)
6 {white}, {red}, {yellow}, {green, blue, black} (3.2%)
7 {white}, {red, yellow}, {green, blue}, {black} (2.6%)
8 {white, yellow}, {red}, {green, blue, black} (2.0%)
9 {white}, {red}, {yellow}, {green, blue, black} (1.6%)
10 {white}, {red}, {green, yellow}, {blue, black} (1.2%)

Partition of primary colors

87.1% of all speaker partitions obey Kay et al.’s universals

the ten partitions that conform to the universals occupy ranks 1, 2, 3, 4, 6, 7, 9, 10, 16, 18

decision what counts as an exception seems somewhat arbitrary on the basis of these counts

Partition of primary colors

more fundamental problem:

partition frequencies are distributed according to power law

frequency ∼ rank^−1.99

no natural cutoff point to distinguish regular from exceptional partitions

[Figure: log-log plot of frequency against rank]
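One simple way to estimate such an exponent is a least-squares line through the log-log rank-frequency plot. This is a rough sketch and not necessarily how the value −1.99 was obtained; the frequencies below are synthetic and follow an exact power law by construction.

```python
import math

def fit_power_law(frequencies):
    """Least-squares slope and intercept of log(frequency) against log(rank)."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Synthetic frequencies following frequency = 1000 * rank^-2 exactly.
freqs = [1000 * rank ** -2.0 for rank in range(1, 51)]
slope, intercept = fit_power_law(freqs)
print(round(slope, 3), round(math.exp(intercept), 3))
```

On a log-log plot a power law is a straight line whose slope is the exponent, which is why no rank stands out as a natural cutoff.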

Partition of seven most important colors


[Figure: log-log plot of frequency against rank]

Partition of eight most important colors


[Figure: log-log plot of frequency against rank]

Power laws

from Newman 2006

Other linguistic power law distributions

vowel systems and their frequency of occurrence

number of vowels    frequencies of the vowel systems of that size
3                   14
4                   14, 5, 4, 2
5                   97, 3
6                   26, 12, 12
7                   23, 6, 5, 4, 3
8                   6, 3, 3, 2
9                   7, 7, 3

(from Schwartz et al. 1997, based on the UCLA Phonetic Segment Inventory Database)

[Figure: log-log plot of frequency against rank]


size of language families

source: Ethnologue

[Figure: log-log plot of frequency against rank]

number of speakers per language

source: Ethnologue

[Figure: log-log plot of frequency (in million) against rank]

The World Atlas of Language Structures

large-scale typological database, compiled mainly at the MPI EVA, Leipzig

2,650 languages in total are used

142 features, with between 120 and 1,370 languages per feature

available online


question: are the frequencies of feature values power-law distributed?

problem: number of feature values usually too small for statistical evaluation

solution:

cross-classification of two (randomly chosen) features
only such feature pairs are considered that lead to at least 30 non-empty feature value combinations

pilot study with 10 such feature pairs
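Cross-classification can be sketched as counting value combinations over the languages coded for both features. The dictionaries below are hypothetical toy data, not WALS entries, and the function names are made up for this example.

```python
from collections import Counter

def cross_classify(feature1, feature2):
    """feature1, feature2: {language: value}. Counts value combinations over shared languages."""
    shared = feature1.keys() & feature2.keys()
    return Counter((feature1[lang], feature2[lang]) for lang in shared)

def usable(feature1, feature2, min_combinations=30):
    """True if the pair yields at least min_combinations non-empty value combinations."""
    return len(cross_classify(feature1, feature2)) >= min_combinations

# Hypothetical toy features over invented language names.
f1 = {"L1": "a", "L2": "a", "L3": "b", "L4": "b"}
f2 = {"L1": "x", "L2": "y", "L3": "x", "L5": "y"}
print(cross_classify(f1, f2))
print(usable(f1, f2), usable(f1, f2, min_combinations=3))
```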


[Figures: log-log plots of Pr(X ≥ x) against x, one per feature pair]

Feature 1: Consonant-Vowel Ratio; Feature 2: Subtypes of Asymmetric Standard Negation; Kolmogorov-Smirnov test: positive
Feature 1: Weight Factors in Weight-Sensitive Stress Systems; Feature 2: Ordinal Numerals
Feature 1: Third Person Zero of Verbal Person Marking; Feature 2: Subtypes of Asymmetric Standard Negation
Feature 1: Relationship between the Order of Object and Verb and the Order of Adjective and Noun; Feature 2: Expression of Pronominal Subjects
Feature 1: Plurality in Independent Personal Pronouns; Feature 2: Asymmetrical Case-Marking
Feature 1: Locus of Marking: Whole-language Typology; Feature 2: Number of Cases
Feature 1: Prefixing vs. Suffixing in Inflectional Morphology; Feature 2: Coding of Nominal Plurality
Feature 1: Prefixing vs. Suffixing in Inflectional Morphology
Feature 1: Coding of Nominal Plurality; Feature 2: Asymmetrical Case-Marking
Feature 1: Position of Case Affixes; Kolmogorov-Smirnov test: negative
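The quantity on the vertical axis of these plots, Pr(X ≥ x), is the empirical complementary cumulative distribution. A minimal sketch with made-up counts (the Kolmogorov-Smirnov test itself is not reproduced here):

```python
def ccdf(values):
    """Empirical Pr(X >= x) for each distinct x in values."""
    n = len(values)
    return {x: sum(1 for v in values if v >= x) / n for x in sorted(set(values))}

counts = [1, 1, 1, 2, 2, 5, 10]  # made-up frequencies of feature value combinations
for x, p in ccdf(counts).items():
    print(x, p)
```

For power-law distributed data these points fall roughly on a straight line when both axes are logarithmic.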

Why power laws?

critical states

self-organized criticality

preferential attachment

random walks

...

Preferential attachment

items are stochastically added to bins

probability to end up in bin n is linear in number of items that are already in bin n
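A small simulation sketch of preferential attachment. The slide only states that the probability of landing in a bin is linear in its current size; the rule that a new bin is opened with a fixed probability (`new_bin_prob`) is an added assumption, needed so that the number of bins can grow.

```python
import random

def preferential_attachment(n_items, new_bin_prob=0.1, seed=0):
    """Distribute n_items over bins; returns bin sizes sorted from largest to smallest."""
    rng = random.Random(seed)
    bins = [1]                      # start with a single bin holding one item
    for _ in range(n_items - 1):
        if rng.random() < new_bin_prob:
            bins.append(1)          # assumption: occasionally open a new bin
        else:
            # choose a bin with probability proportional to its current size
            i = rng.choices(range(len(bins)), weights=bins)[0]
            bins[i] += 1
    return sorted(bins, reverse=True)

sizes = preferential_attachment(5000)
print(len(sizes), sizes[:5])  # number of bins and the five largest bin sizes
```

Plotting the sorted sizes against their rank on logarithmic axes shows how uneven the distribution becomes even though no bin is preferred a priori.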

(Wide) Open questions

Preferential attachment explains power law distribution if there are no a priori biases for particular types

first simulations suggest that preferential attachment + biased type assignment does not lead to power law

negative message: uneven typological frequency distribution does not prove that frequent types are inherently preferred linguistically/cognitively/socially

unsettling questions:

Are there linguistic/cognitive/social biases in favor of certain types?
If yes, can statistical typology supply information about this?
If power law distributions are the norm, is there any content to the notion of statistical universal in a Greenbergian sense?