Visualizing Multivariate Uncertainty:

Some Graphical Methods for Multivariable Spatial Data

[Figure: star map of median ranks by region; rays: Suicides, Infants, Crime_prop, Instruction, Donations, Crime_pers]

Michael Friendly

York University

http://www.math.yorku.ca/SCS/friendly.html

National Academy of Sciences

Washington, DC, March 2005


Outline

Multivariate uncertainty and “moral statistics”
• A. M. Guerry’s Moral Statistics of France
• Guerry’s data and analyses

Multivariate analyses: Data-centric displays
• Bivariate plots and data ellipses
• Biplots
• Canonical discriminant plots
• HE plots for multivariate linear models

Multivariate mapping: Map-centric displays
• Star maps
• Reduced-rank color maps

National Academies of Sciences, March 2005. © Michael Friendly


Multivariate Uncertainty and “Moral Statistics” ∼ 1800

It is a capital mistake to theorize before one has data. (Sherlock Holmes, in "A Scandal in Bohemia")

What to do about crime?
• Liberal view: increase education, literacy
• Conservative view: build more prisons

What to do about poverty?
• Liberal view: increase social assistance
• Conservative view: build more poor-houses

But:
• Little actual data: all armchair theorizing
• No ways to understand or visualize relationships between variables
  – Statistical graphics just invented (Playfair): line graph, bar chart, pie chart
  – All 1D or 1.5D (time series)


The rise of “moral statistics” and modern social science

Political arithmetic: William Petty (and others)
• 1654: first attempt at scientific survey (on Irish estates)
• 1687: idea that wealth and strength of a state depended on its subjects (number and characteristics)

Demography: Johann Peter Süssmilch (1741)
• importance of measuring and analyzing population distributions
• idea that ethical and state policies could encourage growth and wealth (increase birth rate, decrease death rate)
  – discourage alcohol, gambling, prostitution & priestly celibacy
  – encourage state support for medical care, distribution of land, lower taxes

Statistik: Numbers of the state (1800–1820), Germany and France
• collect data on imports, exports, transportation, ...

Guerry & Quetelet
• Quetelet: Concepts of “average man” and “social physics”
• Guerry: First real social data analysis (Guerry, 1833)


Guerry’s data

Compte général de l’administration de la justice criminelle en France
• The first national compilation of official justice data (1825)
  – detailed data on all charges and disposition
  – collected quarterly in all 86 departments
• Other sources: Bureau des Longitudes (illegitimate births); Parent-Duchâtelet (prostitutes in Paris); Compte du ministère de la guerre (military desertions); ...

Moral variables: Scaled so ‘more’ is ‘better’

Variable      Definition
Crime_pers    Population per crime against persons
Crime_prop    Population per crime against property
Donations     Donations to the poor
Infants       Population per illegitimate birth
Literacy      Percent who can read & write
Suicides      Population per suicide

Tried to define these to ensure comparability and representativeness
• Crime: Use number of accused rather than convicted
• Literacy: Reported levels of education unreliable; use data from military draft examinations (% of young men able to read and write)

Other variables: Ranks by department: wealth, commerce, ...


Guerry’s Questions

Should crime and other moral variables be considered as structural, lawful characteristics of society, or simply as indicants of individual behavior?
• Statistical regularity as the key to social science (“social physics”): the social equivalent of the “law of large numbers”
• Guerry showed that rates of crime had nearly invariant distributions over time (1825–1830) when classified by region, sex of accused, type of crime, etc. “We would be forced to recognize that the facts of moral order, like those of physical order, obey invariant laws...” (p. 14)

Relations between crime and other moral variables
• Do crimes against persons and crimes against property show the same or different trends?
• How does crime relate to education and literacy?
  – Some “armchair” arguments had suggested increasing literacy to decrease crime: “The definitive result shows that 67 out of 100 prisoners can neither read nor write. What stronger proof could there be that ignorance is the mother of all vices” (A. Taillandier, 1828)
• Does crime vary coherently over regions of France (C, N, S, E, W)?


Guerry’s maps

[Figure: six choropleth maps of the 86 departments, each shaded by rank (1–86) in nine classes, with each department labeled by its rank. Panels: Population per Crime against persons; Population per Crime against property; Per cent who can Read and Write; Population per Illegitimate birth; Donations to the poor; Population per Suicide]

Guerry’s analyses

Relate variables by comparing maps and ranked lists (the first parallel coordinate plot)

Conclusion: no clear relation between crime and literacy

[Figure: ranked lists of departments, Literacy vs. Crimes against persons]

Similar analyses for other variables (suicide, illegitimate births, ...)


Graphical methods for multivariate data

Bivariate displays: Bivariate displays can be enhanced to show statistical relations more clearly and effectively
• Scatterplots with data (concentration) ellipses and smoothed (loess) curves
• Scatterplot matrices
• Corrgrams and visual thinning

Reduced-rank displays: Multivariate visualization techniques can show the statistical data in simple ways, using dimension reduction techniques.
• Biplots: show variables and observations in space accounting for greatest variance
• Canonical discriminant plots: show variables and observations in space accounting for greatest between-group variation

HE plots: Visualization for Multivariate Linear Models


Bivariate plots: Points and visual summaries

[Figure: scatterplot with linear regression line. x-axis: Percent Read & Write (10–80); y-axis: Pop. per Crime against persons (0–40000). Labeled departments: Indre, Haute-Marne, Meuse, Doubs, Côte-d’Or, Jura, Ariège, Haut-Rhin, Corsica, Creuse, Ardennes]

The Data Ellipse: Galton’s Discovery

Pearson (1920): “... one of the most noteworthy scientific discoveries arising from pure analysis of observations.”


The Data Ellipse: Details

Visual summary for bivariate marginal relations

Shows: means, standard deviations, correlation, regression line(s)

Defined: set of points whose squared Mahalanobis distance $\le c^2$,

$$D^2(\mathbf{y}) \equiv (\mathbf{y} - \bar{\mathbf{y}})^T S^{-1} (\mathbf{y} - \bar{\mathbf{y}}) \le c^2$$

$S$ = sample variance-covariance matrix

Radius: when $\mathbf{y}$ is approx. bivariate normal, $D^2(\mathbf{y})$ has a large-sample $\chi^2_2$ distribution (chi-square with 2 degrees of freedom).

• $c^2 = \chi^2_2(0.40) \approx 1$: 1 std. dev univariate ellipse
  – 1D shadows: $\bar{y} \pm 1s$
• $c^2 = \chi^2_2(0.68) = 2.28$: 1 std. dev bivariate ellipse
• Small samples: $c^2 \approx 2 F_{2,\,n-2}(1 - \alpha)$

Construction: Transform the unit circle, $U = (\sin\theta, \cos\theta)$,

$$\mathcal{E}_c = \bar{\mathbf{y}} + c\, S^{1/2}\, U$$

$S^{1/2}$ = any “square root” of $S$ (e.g., Cholesky)

Robust version: Use robust covariance estimate (MCD, MVE)

Nonparametric version: Use kernel density estimation
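
The construction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code used for the slides: the function name `data_ellipse`, the simulated data, and the parameter names are assumptions. It uses the large-sample radius, for which the chi-square quantile with 2 degrees of freedom has the closed form $-2\ln(1 - \text{level})$.

```python
import numpy as np

def data_ellipse(Y, level=0.68, n_points=100):
    """Boundary points of the data ellipse for an (n x 2) sample Y.

    Hypothetical helper: transforms the unit circle by a Cholesky
    "square root" of the sample covariance matrix S.
    """
    Y = np.asarray(Y, dtype=float)
    center = Y.mean(axis=0)                  # y-bar
    S = np.cov(Y, rowvar=False)              # sample covariance matrix
    # Large-sample radius: chi-square(2 df) quantile = -2 ln(1 - level)
    c = np.sqrt(-2.0 * np.log(1.0 - level))
    L = np.linalg.cholesky(S)                # S = L L^T
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    U = np.column_stack([np.sin(theta), np.cos(theta)])  # unit circle
    return center + c * U @ L.T              # each row: y-bar + c L u

# Simulated (made-up) bivariate data, for illustration only
rng = np.random.default_rng(0)
Y = rng.multivariate_normal([50, 100], [[100, 60], [60, 90]], size=200)
ellipse = data_ellipse(Y, level=0.68)
```

Every returned point lies at squared Mahalanobis distance $c^2$ from the mean, so plotting `ellipse[:, 0]` against `ellipse[:, 1]` over the scatterplot gives the 68% data ellipse. A robust version would swap in a robust estimate for `center` and `S`.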


Bivariate plots: Data ellipse and smoothing

[Figure: scatterplot with 68% data ellipse and smoothed (loess) curve. x-axis: Percent Read & Write (10–80); y-axis: Pop. per Crime against persons (0–40000). Labeled departments: Indre, Haute-Marne, Meuse, Doubs, Côte-d’Or, Jura, Ariège, Haut-Rhin, Corsica, Creuse, Ardennes]

Bivariate plots: Region differences

[Figure: scatterplot of Pop. per Crime against persons (0–40000) vs. Percent Read & Write (10–80), showing differences by region (C, E, N, S, W). Labeled departments: Indre, Haute-Marne, Meuse, Doubs, Côte-d’Or, Jura, Ariège, Haut-Rhin, Corsica, Creuse, Ardennes]

Bivariate plots: Scatterplot matrices

[Figure: scatterplot matrix of Crime_pers (2199–37014), Crime_prop (1368–20235), Literacy (12–74), and Infants (2660–62486)]

Corrgrams— Correlation matrix displays

How to show a correlation matrix for different purposes? (Friendly, 2002)

Render a correlation to depict sign and magnitude (tasks: lookup, comparison, detection)

[Figure: five renderings (Number, Circle, Ellipse, Bars, Shaded) of correlation values (× 100) from −100 to 95 in steps of 15]

Task-specific renderings:

Task        Lookup    Comparison    Detection
Rendering   Number    Circle        Shading


Corrgrams— Rendering

Baseball data: (lower) Patterns vs. (upper) comparison

[Figure: corrgram of the baseball data in PC2/1 order. Variables: Years, logSal, Homer, Putouts, RBI, Walks, Runs, Hits, Atbat, Errors, Assists]

Corrgrams— Variable ordering

Baseball data: (a) alpha vs. (b) correlation ordering (Friendly and Kwan, 2003)

[Figure: two corrgrams of the baseball data. (a) Alpha order: Assists, Atbat, Errors, Hits, Homer, logSal, Putouts, RBI, Runs, Walks, Years. (b) PC2/1 order: Years, logSal, Homer, Putouts, RBI, Walks, Runs, Hits, Atbat, Errors, Assists]

See: http://www.math.yorku.ca/SCS/sasmac/corrgram.html


Corrgrams— Variable ordering

Reorder variables to show similarities: PC1 or angles (PC2/PC1)

[Figure: baseball variables plotted as vectors on Dimension 1 (46.3%) and Dimension 2 (17.4%). Variables: logSal, Years, Homer, Runs, Hits, RBI, Atbat, Walks, Putouts, Assists, Errors]
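
Ordering variables by the angles of their vectors in the PC1/PC2 plane can be sketched as follows. This is a minimal sketch under stated assumptions, not the corrgram macro itself: the function name `pc_angle_order` is hypothetical, and cutting the circular order at the largest angular gap is one simple choice for turning the circle of angles into a linear order.

```python
import numpy as np

def pc_angle_order(R):
    """Order variables by the angle of their loadings on the first two
    principal components of a correlation matrix R (hypothetical helper)."""
    R = np.asarray(R, dtype=float)
    evals, evecs = np.linalg.eigh(R)         # eigenvalues in ascending order
    e1, e2 = evecs[:, -1], evecs[:, -2]      # first two principal axes
    angles = np.arctan2(e2, e1)              # angle of each variable vector
    order = np.argsort(angles)
    # Assumption: cut the circular ordering at the largest angular gap,
    # so that the two ends of the list are the least similar neighbours.
    sorted_angles = angles[order]
    gaps = np.diff(np.append(sorted_angles, sorted_angles[0] + 2 * np.pi))
    start = (np.argmax(gaps) + 1) % len(order)
    return np.roll(order, -start)

# Made-up correlation matrix: variables 0 and 2 form one cluster,
# variables 1 and 3 another.
R_demo = np.array([[1.0, 0.1, 0.9, 0.1],
                   [0.1, 1.0, 0.1, 0.9],
                   [0.9, 0.1, 1.0, 0.1],
                   [0.1, 0.9, 0.1, 1.0]])
order = pc_angle_order(R_demo)
```

Rearranging the rows and columns of the correlation matrix with `R_demo[np.ix_(order, order)]` places similar variables next to each other, which is the effect the PC2/1 ordering aims for.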


Corrgrams— Guerry data

[Figure: corrgram of the Guerry variables: Crime_pers, Desertion, Wealth, Infanticide, Literacy, Instruction, Clergy, Suicides, Lottery, Crime_prop, Infants]

Guerry data— Variable ordering

[Figure: Guerry variables plotted as vectors on Dimension 1 (35.5%) and Dimension 2 (15.8%). Variables: Suicides, Wealth, Lottery, Crime_prop, Infants, Instruction, Clergy, Crime_pers, Desertion, Infanticide, Literacy]

Visual thinning: Minimal summaries for large data sets

Guerry data: schematic scatterplot matrix: 68% data ellipse + loess smooth

[Figure: schematic scatterplot matrix of the Guerry variables, with ranges: Crime_pers (2199–37014), Desertion (1–86), Wealth (1–86), Infanticide (1–86), Literacy (12–74), Instruction (1–86), Clergy (1–86), Suicides (3460–163241), Lottery (1–86), Crime_prop (1368–20235), Infants (2660–62486)]

Multivariate analyses: Reduced rank displays

Multivariate visualization techniques can show the statistical data in simple ways, using dimension reduction techniques.
• Biplots: show variables and departments in space accounting for greatest variance
• Canonical discriminant plots: show variables and departments in space accounting for greatest between-region variation

Can try to show geographic location by color coding or other visual attributes.
• Color code by region
• Show data ellipse to summarize regions

→ Data-centric displays: The multivariate data is shown directly; geographic relations indirectly


Biplots

Biplots represent both variables (attributes) and observations (departments) in the same plot: a low-rank (2D) approximation to a data matrix (Gabriel, 1971)

$$\mathbf{Y}^{*} = \mathbf{Y} - \bar{\mathbf{Y}}_{\cdot\cdot} \approx \mathbf{A}\mathbf{B}^T = \sum_{k=1}^{d} \mathbf{a}_k \mathbf{b}_k^T$$

• Variables are usually represented by vectors from origin (mean)
• Observations are usually represented by points
• Can show clusters of observations by data ellipses

Properties:
• Angles between vectors show correlations ($r \approx \cos(\theta)$)
• Length of variable vectors ∼ % variance accounted for
• $y_{ij} \approx \mathbf{a}_i^T \mathbf{b}_j$: projection of observation on variable vector
• Dimensions are uncorrelated overall (but not necessarily within group)
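
One common way to obtain the factors A and B is the singular value decomposition of the centered data matrix. The sketch below assumes column centering (subtracting each variable's mean) and introduces a hypothetical `alpha` parameter that decides how the singular values are split between observation points and variable vectors; neither choice is stated on the slide.

```python
import numpy as np

def biplot_coords(Y, d=2, alpha=1.0):
    """Rank-d biplot factors of a data matrix Y (hypothetical helper).

    Returns A (observation points), B (variable vectors), and the
    proportion of variance accounted for by each dimension.
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)                   # assumption: column centering
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    A = U[:, :d] * s[:d] ** alpha             # rows a_i: observations
    B = Vt[:d].T * s[:d] ** (1.0 - alpha)     # rows b_j: variables
    explained = s[:d] ** 2 / np.sum(s ** 2)
    return A, B, explained

# Made-up data: 20 observations on 5 variables
rng = np.random.default_rng(1)
Y_demo = rng.normal(size=(20, 5))
A, B, explained = biplot_coords(Y_demo, d=2)
approx = A @ B.T                              # rank-2 approximation of Yc
```

Plotting the rows of `A` as points and the rows of `B` as vectors from the origin gives the display described above; with `d` equal to the full rank, `A @ B.T` reproduces the centered data exactly.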


Biplots: Guerry data


Biplots: Baseball data

[Figure: Biplot of baseball data, players labeled by position (1B, C, DH, IF, OF, UT). Variable vectors: logSal, Years, Homer, Runs, Hits, RBI, Atbat, Walks, Putouts, Assists, Errors. Axes: Dimension 1 (46.3%), Dimension 2 (17.4%)]

[Figure: Biplot of baseball data, with data ellipses by position]

[Figure: Biplot of baseball data, with data ellipses by position (thinned)]

Canonical discriminant plots

Project the variables into a low-rank (2D) space that maximally discriminates among regions (Friendly, 1991)
• Visual summary of a MANOVA
• Canonical dimensions are linear combinations of the variables with maximum univariate F-statistics.
• Vectors from the origin (grand mean) for the observed variables show the correlations with the canonical dimensions

Properties:
• Canonical variates are uncorrelated
• Circles of radius $\sqrt{\chi^2_2(1 - \alpha)/n_i}$ give confidence regions for group means.
• Variable vectors show how variables discriminate among groups
• Lengths of variable vectors ∼ contribution to discrimination
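
The canonical dimensions can be computed from the between-group (H) and within-group (E) sums-of-squares-and-products matrices. The following is a sketch under stated assumptions rather than the SAS implementation cited above: the function name, the simulated data, and the scaling (pooled within-group variance of the scores set to 1, which is what makes circular confidence regions appropriate) are illustrative choices.

```python
import numpy as np

def canonical_discriminant(Y, groups, d=2):
    """Canonical scores, coefficients and latent roots (hypothetical helper)."""
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand = Y.mean(axis=0)
    p = Y.shape[1]
    H = np.zeros((p, p))                      # between-group SSP
    E = np.zeros((p, p))                      # within-group SSP
    for g in labels:
        Yg = Y[groups == g]
        m = Yg.mean(axis=0)
        H += len(Yg) * np.outer(m - grand, m - grand)
        E += (Yg - m).T @ (Yg - m)
    # Solve H v = lambda E v by whitening with the Cholesky factor of E
    Linv = np.linalg.inv(np.linalg.cholesky(E))
    evals, W = np.linalg.eigh(Linv @ H @ Linv.T)
    idx = np.argsort(evals)[::-1][:d]         # largest roots first
    dfe = len(Y) - len(labels)
    V = Linv.T @ W[:, idx] * np.sqrt(dfe)     # V^T (E / dfe) V = I
    scores = (Y - grand) @ V
    return scores, V, evals[idx]

# Made-up data: 3 groups of 30 observations on 4 variables
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 30)
means = np.array([[0, 0, 0, 0], [2, 1, 0, 0], [0, 2, 1, 0]], dtype=float)
Y_demo = rng.normal(size=(90, 4)) + means[groups]
scores, V, lam = canonical_discriminant(Y_demo, groups)

# Radius of a 95% confidence circle for a group mean with n_i = 30,
# using the closed-form chi-square(2 df) quantile -2 ln(alpha)
radius = np.sqrt(-2.0 * np.log(0.05) / 30)
```

Plotting `scores` colored by group, with a circle of this radius around each group mean, gives the basic canonical discriminant plot; variable vectors would be added from the correlations of the original variables with the scores.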


Canonical discriminant plots: Guerry data, by Region


CDA plots: Baseball data, by player position

[Figure: canonical discriminant plot of the baseball data by position (1B, C, DH, IF, OF, UT). Variable vectors: logsal, years, homer, runs, hits, rbi, atbat, walks, putouts, assists, errors. Axes: Canonical Dimension 1 (72.9%), Canonical Dimension 2 (27.1%)]

HE plots: Visualization for Multivariate Linear Models

How are $p$ responses, $\mathbf{Y} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_p)$, related to $q$ predictors, $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_q)$? (Friendly, 2004a)

• MANOVA: X ∼ discrete factors
• MMRA: X ∼ quantitative predictors
• MANCOVA, response surface models, ...

All the same MLM:

$$\underset{(n \times p)}{\mathbf{Y}} = \underset{(n \times q)}{\mathbf{X}}\;\underset{(q \times p)}{\mathbf{B}} + \underset{(n \times p)}{\mathbf{E}}$$

Analogs of univariate tests:
• Explained variation: MSH ↦ ($p \times p$) covariance matrix, H
• Residual variation: MSE ↦ ($p \times p$) covariance matrix, E
• Test statistics: F ↦ $|\mathbf{H} - \lambda \mathbf{E}| = 0$ ↦ $\lambda_1, \lambda_2, \ldots, \lambda_s$

How big is H relative to E?
• Latent roots $\lambda_1, \lambda_2, \ldots, \lambda_s$ measure the “size” of H relative to E in $s = \min(p, df_h)$ orthogonal directions.
• Test statistics: Wilks’ Λ, Pillai trace, Hotelling-Lawley trace, Roy’s maximum root combine these into a single number
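
As a worked illustration of how the latent roots feed the four test statistics, the sketch below fits a multivariate regression by least squares and forms H and E for the overall test of all predictors. It is a minimal sketch assuming a full-rank design with an intercept; the function name `mlm_tests` and the simulated data are hypothetical.

```python
import numpy as np

def mlm_tests(Y, X):
    """H, E, latent roots and multivariate test statistics for the
    overall test of the predictors in Y = XB + E (hypothetical helper)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = len(Y)
    X1 = np.column_stack([np.ones(n), X])     # add an intercept column
    B, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    fitted = X1 @ B
    resid = Y - fitted
    dev = fitted - Y.mean(axis=0)
    H = dev.T @ dev                           # hypothesis SSP matrix
    E = resid.T @ resid                       # error SSP matrix
    lam = np.linalg.eigvals(np.linalg.solve(E, H)).real
    lam = np.sort(np.clip(lam, 0.0, None))[::-1]
    s = min(Y.shape[1], X.shape[1])           # s = min(p, df_h)
    lam = lam[:s]
    return {
        "H": H, "E": E, "lambda": lam,
        "wilks": np.prod(1.0 / (1.0 + lam)),
        "pillai": np.sum(lam / (1.0 + lam)),
        "hotelling_lawley": np.sum(lam),
        "roy": lam[0],
    }

# Made-up data: 2 responses, 3 quantitative predictors
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(60, 3))
B_true = np.array([[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]])
Y_demo = X_demo @ B_true + rng.normal(size=(60, 2))
stats = mlm_tests(Y_demo, X_demo)
```

Wilks’ Λ computed from the roots agrees with the determinant ratio |E| / |H + E|, which is one way to check the decomposition.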


HE plots: Visualization for Multivariate Linear Models

HE plot: for two response variables, (y1, y2), plot an H ellipse and an E ellipse

HE plot matrices: For all p responses, plot an HE scatterplot matrix

→ Shows: size, dimensionality, and effect-correlation of H relative to E .

[Figure: Essential ideas behind multivariate tests, Y2 vs. Y1 (0–40). (a) Data ellipses: individual group scatter, with scatter around group means represented by each ellipse (groups 1–8). (b) H and E ellipses: between and within scatter, showing deviations of group means from grand mean (outer, H matrix) and pooled within-group (inner, E matrix) ellipses. How big is H relative to E?]

Simple example: Iris data

[Figure: Iris data, Sepal length in mm. (40–80) vs. Petal length in mm. (10–70), for Setosa, Versicolor, and Virginica. (a) Data ellipses and (b) H (Hypothesis) and E (Error) ellipses]

H ellipse: Shows 2D covariation of predicted values (means)

E ellipse: Shows 2D covariation of residuals

points: show group means on both variables


Baseball data: Variation by position

How do relations among variables vary with player’s position?

Fit MANOVA model,

(logSal Years Homer Runs Hits RBI Atbat Walks Putouts Assists Errors) = Position

HE plots for selected pairs: (Years, logSal), (Putouts, Assists)

[Figure: HE plot (Hypothesis and Error ellipses) for the model logSal Years ... Errors = Position: logSalary (2.0–3.2) vs. Years in the Major Leagues (0–20); group means labeled by position (1B, C, DH, IF, OF, UT)]

[Figure: HE plot for the same model: Assists (−100 to 300) vs. Put Outs (0–600); group means labeled by position]

Guerry data: Predicting crime

How do rates of crime vary with other variables?

Fit MANCOVA model,

(Crime_pers Crime_prop) = Region + Wealth + Suicides + Literacy + Donations + Infants

HE plots: Overall, plus for Region and covariate effects

[Figure: HE plot for the Region effect (C, E, N, S, W): Pop. per Crime against property (0–20000) vs. Pop. per Crime against persons (0–40000). Labeled departments: Corrèze, Seine, Ain, Haute-Loire, Creuse]

• Overall: Predicted crimes against persons and property are negatively correlated
• Larger variation in crimes against property
• Region variation greater in crimes against persons

[Figure: HE plot for the covariates (Wealth, Suicides, Literacy, Donations, Infants): Pop. per Crime against property (0–20000) vs. Pop. per Crime against persons (0–40000). Labeled departments: Corrèze, Seine, Ain, Haute-Loire, Creuse]

Each quantitative variable (covariate) plots as a 1D ellipse (vector)
• Orientation: relation of $x_i$ to $y_1, y_2$
• Length: strength of relation


Multivariate mapping: Map-centric displays

How to generalize choropleth maps to many variables?

• Star maps: Show multivariate data on the map using star icons, variable ∼ length of ray
• Reduced-rank RGB displays: Factor analysis → (F1, F2, F3) factor scores ↦ (R, G, B) shading
• PREFMAP (x, y) maps: Fit data variables to (Long, Lat) map coordinates. Display variables as vectors in map coordinates.
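
The star-icon idea (one ray per variable, ray length proportional to the value) can be sketched as follows. The function names, the rank scaling, and the example ranks are illustrative assumptions, not values from Guerry's data; placing the glyph at a department's map centroid is left to the plotting code.

```python
import numpy as np

def ranks_to_unit(ranks, n=86):
    """Scale ranks 1..n to [0, 1] (assumed scaling, for illustration)."""
    return (np.asarray(ranks, dtype=float) - 1.0) / (n - 1.0)

def star_glyph(values, center=(0.0, 0.0), radius=1.0):
    """Vertices of a star icon: ray k has length radius * values[k].

    `values` are assumed to be already scaled to [0, 1]; rays are spaced
    evenly around the circle, starting along the positive x-axis.
    """
    values = np.asarray(values, dtype=float)
    k = len(values)
    angles = 2.0 * np.pi * np.arange(k) / k
    x = center[0] + radius * values * np.cos(angles)
    y = center[1] + radius * values * np.sin(angles)
    return np.column_stack([x, y])

# Hypothetical ranks of one department on six variables
demo_ranks = [86, 43, 1, 20, 60, 10]
vertices = star_glyph(ranks_to_unit(demo_ranks), center=(2.0, 3.0), radius=0.5)
```

Drawing the polygon through `vertices`, or line segments from `center` to each vertex, gives one star; repeating this for every department yields the star map.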


Star maps

[Figure: Star map of Guerry Variables (Ranks), departments colored by Region. Rays: Suicides, Infants, Crime_prop, Instruction, Donations, Crime_pers]

Star maps: Medians by region

[Figure: star map of Median Ranks by Region. Rays: Suicides, Infants, Crime_prop, Instruction, Donations, Crime_pers]

Star maps: Multivariate boxplots by region

[Figure: star map by region with stars for Q1, Median, Q3. Rays: Suicides, Infants, Crime_prop, Instruction, Donations, Crime_pers]

How to show unusual depts?

Reduced-rank color-coded displays

Use dimension-reduction technique (PCA, Factor analysis, ...) to produce scores for observations (departments) on 3 dimensions (F1, F2, F3)

Factors: Factor1 (Civil society), Factor2 (Moral values), Factor3 (Crime)

Variable                          Loadings
Pop per Crime against persons     0.97
Pop per Crime against property    0.75  0.39
Percent Read & Write              -0.72
Pop per illegitimate birth        0.62  0.42
Donations to the poor             0.89
Pop per suicide                   0.80

Scale (F1, F2, F3) → [0,1]

Color mapping function, e.g., C(F1, F2, F3) ↦ rgb(Fi, Fj, Fk)
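
The two steps above (scale each factor to [0, 1], then map the three scaled scores to red, green, and blue) can be sketched as follows. The function names and the example scores are hypothetical, and min-max scaling is one simple choice of scaling; it assumes each factor actually varies across departments.

```python
import numpy as np

def scores_to_rgb(F):
    """Min-max scale an (n x 3) matrix of factor scores to [0, 1] per
    column, so that columns can be read as (R, G, B) intensities."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / (hi - lo)               # assumes hi > lo in each column

def to_hex(rgb):
    """Convert one (r, g, b) triple in [0, 1] to a '#rrggbb' string."""
    r, g, b = (int(round(255 * float(v))) for v in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"

# Made-up factor scores for four departments
F_demo = np.array([[-1.2,  0.3,  2.0],
                   [ 0.4, -0.8,  0.1],
                   [ 1.5,  1.1, -0.9],
                   [ 0.0,  0.0,  0.0]])
rgb = scores_to_rgb(F_demo)
colors = [to_hex(row) for row in rgb]
```

Each department is then shaded with its own color, so departments with similar factor-score profiles get similar hues. Permuting the columns of `F_demo` before scaling corresponds to choosing a different assignment (Fi, Fj, Fk) of factors to color channels.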


Reduced-rank color-coded displays

[Figure: RGB 3-factor map: R=f1, G=f2, B=f3. Variables: Crime_pers, Crime_prop, Literacy, Infants, Donations, Suicides. Departments labeled by number; color legend with axes F1, F2, F3]

Summary and future directions

Guerry’s challenge: Understanding uncertainty in multivariate, spatial data
• How to visualize and understand relations among many variables?
• How to relate these to geographic information?

Understanding multivariate variation
• Visual summaries (data ellipses, smoothings) can show statistical relations more clearly and effectively
• Reduced-rank visualization methods can show simpler, approximate views, based on several criteria.
• Multivariate statistical models need their own visualization methods, just beginning: HE plots as an example.

Understanding multivariate, spatial variation
• Multivariate visualizations applied to spatial data can be revealing, but still need work
• Statistical methods for spatial data need to be extended to the multivariate setting.


References

Friendly, M. (1991). SAS System for Statistical Graphics. Cary, NC: SAS Institute, 1st edn.

Friendly, M. (2002). Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4), 316–324.

Friendly, M. (2004a). Graphical methods for the multivariate general linear model. Journal of Computational and Graphical Statistics. (submitted 2/17/04; resubmitted: 10/02/04).

Friendly, M. (2004b). Milestones in the history of data visualization: A case study in statistical historiography. In W. Gaul and C. Weihs, eds., Studies in Classification, Data Analysis, and Knowledge Organization. New York: Springer. (In press).

Friendly, M. and Kwan, E. (2003). Effect ordering for data displays. Computational Statistics and Data Analysis, 43(4), 509–539.

Gabriel, K. R. (1971). The biplot graphic display of matrices with application to principal components analysis. Biometrika, 58(3), 453–467.

Guerry, A.-M. (1833). Essai sur la statistique morale de la France. Paris. English translation: Hugh P. Whitt and Victor W. Reinking, Lewiston, N.Y.: Edwin Mellen Press, 2002.

Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25–45.