the semantics of color terms. a quantitative cross ...gjaeger/slides/slidesleipzig.pdf · colors...
TRANSCRIPT
The semantics of color terms. A quantitativecross-linguistic investigation
Gerhard [email protected]
May 20, 2010
University of Leipzig
1/124
The psychological color space
physical color space has infinite dimensionality — everywavelength within the visible spectrum is one dimensionpsychological color space is only 3-dimensionalthis fact is employed in technical devices like computer screens(additive color space) or color printers (subtractive colorspace)
additive color space subtractive color space
2/124
The psychological color space
psychologically correct color space should not only correctlyrepresent the topology of, but also the distances betweencolors
distance is inverse function of perceived similarity
L*a*b* color space has this property
three axes:
black — whitered — greenblue — yellow
irregularly shaped 3d color solid
3/124
The color solid
4/124
The Munsell chart
for psychological investigations, the Munsell chart is beingused2d-rendering of the surface of the color solid
8 levels of lightness40 hues
plus: black–white axis with 8 shaded of grey in betweenneighboring chips differ in the minimally perceivable way
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
5/124
Berlin and Kay 1969
pilot study how different languages carve up the color spaceinto categories
informants: speakers of 20 typologically distant languages(who happened to be around the Bay area at the time)
questions (using the Munsell chart):
What are the basic color terms of your native language?What is the extension of these terms?What are the prototypical instances of these terms?
results are not random
indicate that there are universal tendencies in color namingsystems
6/124
Berlin and Kay 1969
extensions
Arabic
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
7/124
Berlin and Kay 1969
extensions
Bahasa Indonesia
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
8/124
Berlin and Kay 1969
extensions
Bulgarian
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
9/124
Berlin and Kay 1969
extensions
Cantonese
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
10/124
Berlin and Kay 1969
extensions
Catalan
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
11/124
Berlin and Kay 1969
extensions
English
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
12/124
Berlin and Kay 1969
extensions
Hebrew
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
13/124
Berlin and Kay 1969
extensions
Hungarian
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
14/124
Berlin and Kay 1969
extensions
Ibibo
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
15/124
Berlin and Kay 1969
extensions
Japanese
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
16/124
Berlin and Kay 1969
extensions
Korean
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
17/124
Berlin and Kay 1969
extensions
Mandarin
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
18/124
Berlin and Kay 1969
extensions
Mexican Spanish
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
19/124
Berlin and Kay 1969
extensions
Pomo
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
20/124
Berlin and Kay 1969
extensions
Swahili
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
21/124
Berlin and Kay 1969
extensions
Tagalog
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
22/124
Berlin and Kay 1969
extensions
Thai
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
23/124
Berlin and Kay 1969
extensions
Tzeltal
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
24/124
Berlin and Kay 1969
extensions
Urdu
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
25/124
Berlin and Kay 1969
extensions
Vietnamese
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
26/124
Berlin and Kay 1969
identification of absolute and implicational universals, like
all languages have words for black and whiteif a language has a word for yellow, it has a word for redif a language has a word for pink, it has a word for blue...
27/124
The World Color Survey
B&K was criticized for methodological reasons
in response, in 1976 Kay and co-workers launched the worldcolor survey
investigation of 110 non-written languages from around theworld
around 25 informants per languagetwo tasks:
the 330 Munsell chips were presented to each test person oneafter the other in random order; they had to assign each chipto some basic color term from their native languagefor each native basic color term, each informant identified theprototypical instance(s)
data are publicly available underhttp://www.icsi.berkeley.edu/wcs/
28/124
Data digging in the WCS
distribution of focal colors across all informants:
Distribution of focal colors
Munsell chips
# na
med
as
foca
l col
or
2050
200
1000
29/124
Data digging in the WCS
distribution of focal colors across all informants:
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
30/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
31/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
32/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
33/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
34/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
35/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
36/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
37/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
38/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
39/124
Data digging in the WCS
partition of a randomly chosen informant from a randomlychosen language
40/124
What is the extension of categories?
data from individual informants are extremely noisy
averaging over all informants from a language helps, but thereis still noise, plus dialectal variation
desirable: distinction between “genuine” variation and noise
41/124
Statistical feature extraction
first step: representation ofraw data in contingencymatrix
rows: color terms fromvarious languagescolumns: Munsell chipscells: number of testpersons who used therow-term for thecolumn-chip
A0 B0 B1 B2 · · · I38 I39 I40 J0
red 0 0 0 0 · · · 0 0 2 0green 0 0 0 0 · · · 0 0 0 0blue 0 0 0 0 · · · 0 0 0 0black 0 0 0 0 · · · 18 23 21 25white 25 25 22 23 · · · 0 0 0 0
......
......
......
......
......
rot 0 0 0 0 · · · 1 0 0 0grun 0 0 0 0 · · · 0 0 0 0gelb 0 0 0 1 · · · 0 0 0 0
......
......
......
......
......
rouge 0 0 0 0 · · · 0 0 0 0vert 0 0 0 0 · · · 0 0 0 0
......
......
......
......
......
further processing:
divide each row by the number n of test persons using thecorresponding termduplicate each row n times
42/124
Principal Component Analysis
technique to reduce dimensionality of data
input: set of vectors in an n-dimensionalspace
first step:
rotate the coordinatesystem, such that
the new n coordinates areorthogonal to each otherthe variations of the dataalong the new coordinatesare stochasticallyindependent
second step:
choose a suitable m < n
project the data on those mnew coordinates where thedata have the highestvariance
43/124
Principal Component Analysis
alternative formulation:
choose an m-dimensional linear sub-manifold of yourn-dimensional spaceproject your data onto this manifoldwhen doing so, pick your sub-manifold such that the averagesquared distance of the data points from the sub-manifold isminimized
intuition behind this formulation:
data are “actually” generated in an m-dimensional spaceobservations are disturbed by n-dimensional noisePCA is a way to reconstruct the underlying data distribution
applications: picture recognition, latent semantic analysis,statistical data analysis in general, data visualization, ...
44/124
Statistical feature extraction: PCA
first 15 principalcomponents jointlyexplain 91.6% of thetotal variance
choice of m = 15 isdetermined by using“Kaiser’s stoppingrule”
principal components
prop
ortio
n of
var
ianc
e ex
plai
ned
0.00
0.05
0.10
0.15
0.20
0.25
0.30
45/124
Statistical feature extraction: PCA
after some post-processing (“varimax” algorithm):
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
46/124
Projecting observed data on lower-dimensional-manifold
noise removal: project observed data onto thelower-dimensional submanifold that was obtained via PCA
in our case: noisy binary categories are mapped to smoothedfuzzy categories (= probability distributions over Munsellchips)
some examples:
47/124
Projecting observed data on lower-dimensional-manifold
48/124
Projecting observed data on lower-dimensional-manifold
49/124
Projecting observed data on lower-dimensional-manifold
50/124
Projecting observed data on lower-dimensional-manifold
51/124
Projecting observed data on lower-dimensional-manifold
52/124
Projecting observed data on lower-dimensional-manifold
53/124
Projecting observed data on lower-dimensional-manifold
54/124
Projecting observed data on lower-dimensional-manifold
55/124
Projecting observed data on lower-dimensional-manifold
56/124
Projecting observed data on lower-dimensional-manifold
57/124
Projecting observed data on lower-dimensional-manifold
58/124
Projecting observed data on lower-dimensional-manifold
59/124
Projecting observed data on lower-dimensional-manifold
60/124
Projecting observed data on lower-dimensional-manifold
61/124
Projecting observed data on lower-dimensional-manifold
62/124
Projecting observed data on lower-dimensional-manifold
63/124
Projecting observed data on lower-dimensional-manifold
64/124
Projecting observed data on lower-dimensional-manifold
65/124
Projecting observed data on lower-dimensional-manifold
66/124
Projecting observed data on lower-dimensional-manifold
67/124
Smoothing the partitions
from smoothed extensions we can recover smoothed partitions
each pixel is assigned to category in which it has the highestdegree of membership
68/124
Smoothed partitions of the color space
69/124
Smoothed partitions of the color space
70/124
Smoothed partitions of the color space
71/124
Smoothed partitions of the color space
72/124
Smoothed partitions of the color space
73/124
Smoothed partitions of the color space
74/124
Smoothed partitions of the color space
75/124
Smoothed partitions of the color space
76/124
Smoothed partitions of the color space
77/124
Smoothed partitions of the color space
78/124
Convexity
note: so far, we only used information from the WCS
the location of the 330 Munsell chips in L*a*b* space playedno role so far
still, apparently partition cells always form continuous clustersin L*a*b* space
Hypothesis (Gardenfors): extension of color terms always formconvex regions of L*a*b* space
79/124
Support Vector Machines
supervised learning techniquesmart algorithm to classify data in a high-dimensional spaceby a (for instance) linear boundaryminimizes number of mis-classifications if the training dataare not linearly separable
gree
nre
d
−3 −2 −1 0 1 2 3
−3
−2
−1
0
1
2
3
o
o oo
o
o
oooo o
o
o
o
o
o
o
ooo
o
oo
o
o
o
oo o
o
o
o
o
oo
o
o
o
o
o
o
o
o
o
oo
o
oo
o o
oo
o
o
o
o o
o
o
oo o
o
o
o
o
o
o
o
o
ooo
o
o
oo
o
o
o
o
o
o
o
o
o
oo
o
o
o
o
o
o o
o o
oo
o
o
o
o
o
o
o
ooo
oo
o
o
oo
o
o
o
o
o
o o
o
o
o o
o
o
ooo
o
o
o
o
o
o
o
o
o
o o
oo
o
o
o
o
o
o
oo
o
o
oo
o
oo
o
oo
o
o
o
oo
o
o
o oo
o
o
o
o
o
oo
o
o
o
o
o
o
o
oo
o
o
o
oo
o
ooo
o
o
SVM classification plot
y
x
80/124
Convex partitions
a binary linear classifier divides an n-dimensional space intotwo convex half-spaces
intersection of two convex set is itself convex
hence: intersection of k binary classifications leads to convexsets
procedure: if a language partitions the Munsell space into mcategories, train m(m−1)
2 many binary SVMs, one for each pairof categories in L*a*b* space
leads to m convex sets (which need not split the L*a*b*space exhaustively)
81/124
Convex approximation
82/124
Convex approximation
83/124
Convex approximation
84/124
Convex approximation
85/124
Convex approximation
86/124
Convex approximation
87/124
Convex approximation
88/124
Convex approximation
89/124
Convex approximation
90/124
Convex approximation
91/124
Convex approximation
on average, 93.7% of all Munsell chips are correctly classifiedby convex approximation
●
●
●
●
●
●
●
●
●
●
0.80
0.85
0.90
0.95
prop
ortio
n of
cor
rect
ly c
lass
ified
Mun
sell
chip
s
92/124
Convex approximation
compare to the outcome of the same procedure without PCA,and with PCA but using a random permutation of the Munsellchips
●●●●
●
●●●●
●
●
●
●●●
●●
●
●
●
●
●●●
●
●●
●●●●
●
●●●●●
●
●
●
●
●●●●●●●●●●●
●
●●●
●
●
●●●●●●
●●
●●
●
●●●●●
●
●●●●●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●
1 2 3
2040
6080
100
degr
ee o
f con
vexi
ty (
%)
93/124
Convex approximation
choice of m = 10 is somewhat arbitraryoutcome does not depend very much on this choice though
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0 10 20 30 40 50
5060
7080
9010
0
no. of principal components used
mea
n de
gree
of c
onve
xity
(%
)
94/124
Implicative universals
first six features correspond nicely to the six primary colorswhite, black, red, green, blue, yellow
according to Kay et al. (1997) (and many other authors)simple system of implicative universals regarding possiblepartitions of the primary colors
95/124
Implicative universals
I II III IV V
whitered/yellowgreen/blueblack
whiteredyellowgreen/blueblack
[white/red/yellowblack/green/blue
] whitered/yellowblack/green/blue
whitered/yellowgreenblack/blue
whiteredyellowgreenblueblack
whiteredyellowblack/green/blue
whiteredyellowgreenblack/blue
whiteredyellow/green/blueblack
whiteredyellow/greenblueblack
whiteredyellow/greenblack/blue
source: Kay et al. (1997)
96/124
Partition of the primary colors
each speaker/term pair can be projected to a 15-dimensionalvector
primary colors correspond to first 6 entries
each primary color is assigned to the term for which it has thehighest value
defines for each speaker a partition over the primary colors
97/124
Partition of the primary colors
for instance: sample speaker(from Piraha):
extracted partition:white/yellowredgreen/blueblack
supposedly impossible, butoccurs 61 times in thedatabase
J
I
H
G
F
E
D
C
B
A 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
98/124
Partition of primary colors
most frequent partition types:
1 {white}, {red}, {yellow}, {green, blue}, {black} (41.9%)2 {white}, {red}, {yellow}, {green}, {blue}, {black} (25.2%)3 {white}, {red, yellow}, {green, blue, black} (6.3%)4 {white}, {red}, {yellow}, {green}, {black, blue} (4.2%)5 {white, yellow}, {red}, {green, blue}, {black} (3.4%)6 {white}, {red}, {yellow}, {green, blue, black} (3.2%)7 {white}, {red, yellow}, {green, blue}, {black} (2.6%)8 {white, yellow}, {red}, {green, blue, black} (2.0%)9 {white}, {red}, {yellow}, {green, blue, black} (1.6%)10 {white}, {red}, {green, yellow}, {blue, black} (1.2%)
99/124
Partition of primay colors
87.1% of all speaker partitions obey Kay et al.’s universals
the ten partitions that confirm to the universals occupy ranks1, 2, 3, 4, 6, 7, 9, 10, 16, 18
decision what counts as an exception seems somewhatarbitrary on the basis of these counts
100/124
Partition of primary colors
more fundamental problem:
partition frequencies aredistributed according to powerlaw
frequency ∼ rank−1.99
no natural cutoff point todistinguish regular from exceptionalpartitions
●
●
●
●
● ●
●
●
●
●
● ● ●
●●●
●
●●
●
●●●●●
●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●
1 2 5 10 20 50
12
510
2050
100
200
500
rank
freq
uenc
y
101/124
Partition of seven most important colors
frequency ∼ rank−1.64
●
●
●
●
●
●● ●
● ● ● ●
●●●
●●
●●
●●●●●●●●
●●●●
●●●●
●●●●●●
●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
1 2 5 10 20 50 100
12
510
2050
100
200
500
rank
freq
uenc
y
102/124
Partition of eight most important colors
frequency ∼ rank−1.46
●
●● ●
●●
● ●●
●●●●
●●●●
●●
●●
●●●
●●●●●●
●●●
●●●●●
●●●●
●●
●●●●●●●
●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
1 2 5 10 20 50 100 200
12
510
2050
100
200
rank
freq
uenc
y
103/124
Power laws
104/124
Power laws
105/124
Power laws
from Newman 2006
106/124
Other linguistic power law distributions
number of vowels
vowel systems and their frequency of occurrence
3
14
4
14 5 4 2
5
97 3
6
26 12 12
7
23 6 5 4 3
8
6 3 3 2
9
7 7 3
(from Schwartz et al. 1997,
based on the UCLA Phonetic Segment Inventory Database)
107/124
Other linguistic power law distributions
frequency ∼ rank−1.06
●
●
●
● ●
● ●
● ●
● ●
● ●
● ●
● ● ●●●
●●
1 2 5 10 202
510
2050
100
rank
freq
uenc
y
108/124
Other linguistic power law distributions
size of language families
source: Ethnologue
frequency ∼ rank−1.32
●
●
● ● ●●
●● ●
●
●
●●●
●●●●●●
●●●●●●
●●●●●●●●●●●●●●●●●
●●●
●●●●●●●●●●●●●●●●●●●
●●●●●●●●
●●●●●●●●●●●●●
●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●
1 2 5 10 20 50 1001
510
5010
050
0
rank
freq
uenc
y
109/124
Other linguistic power law distributions
number of speakers perlanguage
source: Ethnologue
frequency ∼ rank−1.01
●
● ●
●
● ● ●
●
●
●●
●●●●●●●●●
●●●●●●●●●●●●●●●
●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
1 2 5 10 20 50 100 2005
1020
5010
020
050
010
00
rank
freq
uenc
y (in
mill
ion)
110/124
The World Atlas of Language Structures
large scale typological database, conducted mainly by the MPIEVA, Leipzig
2,650 languages in total are used
142 features, with between 120 and 1,370 languages perfeature
available online
111/124
The World Atlas of Language Structures
question: are frequency of feature values powerlawdistributed?
problem: number of feature values usually too small forstatistic evaluation
solution:
cross-classification of two (randomly chosen) featuresonly such feature pairs are considered that lead to at least 30non-empty feature value combinations
pilot study with 10 such feature pairs
112/124
The World Atlas of Language Structures
Feature 1:Consonant-VowelRatio
Feature 2: Subtypesof AsymmetricStandard Negation
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
113/124
The World Atlas of Language Structures
Feature 1: WeightFactors inWeight-SensitiveStress Systems
Feature 2: OrdinalNumerals
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
114/124
The World Atlas of Language Structures
Feature 1: ThirdPerson Zero of VerbalPerson Marking
Feature 2: Subtypesof AsymmetricStandard Negation
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
115/124
The World Atlas of Language Structures
Feature 1:Relationship betweenthe Order of Objectand Verb and theOrder of Adjectiveand Noun
Feature 2: Expressionof PronominalSubjects
Kolmogorov-Smirnovtest: positive
100
101
102
103
10−2
10−1
100
Pr(
X ≥
x)
x
116/124
The World Atlas of Language Structures
Feature 1: Plurality inIndependent PersonalPronouns
Feature 2:AsymmetricalCase-Marking
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
117/124
The World Atlas of Language Structures
Feature 1: Locus ofMarking:Whole-languageTypology
Feature 2: Number ofCases
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
118/124
The World Atlas of Language Structures
Feature 1: Prefixingvs. Suffixing inInflectionalMorphology
Feature 2: Coding ofNominal Plurality
Kolmogorov-Smirnovtest: positive
100
101
102
103
10−2
10−1
100
Pr(
X ≥
x)
x
119/124
The World Atlas of Language Structures
Feature 1: Prefixingvs. Suffixing inInflectionalMorphology
Feature 2: OrdinalNumerals
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
120/124
The World Atlas of Language Structures
Feature 1: Coding ofNominal Plurality
Feature 2:AsymmetricalCase-Marking
Kolmogorov-Smirnovtest: positive
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
121/124
The World Atlas of Language Structures
Feature 1: Position ofCase Affixes
Feature 2: OrdinalNumerals
Kolmogorov-Smirnovtest: negative
100
101
102
10−2
10−1
100
Pr(
X ≥
x)
x
122/124
Why power laws?
critical states
self-organized criticality
preferential attachment
random walks
...
Preferential attachment
items are stochasticallyadded to bins
probability to end up in binn is linear in number ofitems that are already in binn
123/124
(Wide) Open questions
Preferential attachment explains power law distribution ifthere are no a priori biases for particular types
first simulations suggest that preferential attachment + biasedtype assignment does not lead to power law
negative message: uneven typological frequency distributiondoes not prove that frequent types are inherently preferredlinguistically/cognitively/socially
unsettling questions:
Are there linguistic/cognitive/social biases in favor of certaintypes?If yes, can statistical typology supply information about this?If power law distributions are the norm, is their any content tothe notion of statistical universal in a Greenbergian sense?
124/124