
Exploring the multivariate structure of missing values using the R package VIM

Matthias Templ1,2, Andreas Alfons1, Peter Filzmoser1

1 Department of Statistics and Probability Theory, Vienna University of Technology
2 Department of Methodology, Statistics Austria

Rennes, July 8, 2009

Templ, Alfons, Filzmoser (TUW) Exploring the structure of missing values Rennes, July 8, 2009 1 / 18


Content

1 Motivation

2 Visualization of missing values

3 Conclusions


Motivation

Missing values

Real data sets often contain missing values:

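$$
X = \begin{pmatrix}
x_{11} & \cdots & \cdots & x_{1p} \\
\vdots & \mathrm{NA} & & \vdots \\
\mathrm{NA} & & \mathrm{NA} & \vdots \\
x_{n1} & \cdots & \cdots & x_{np}
\end{pmatrix},
$$

with n observations, p variables, and some missing values (NA).

Examples: nonresponse in surveys, element concentration below detection limit in chemical analyses.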


Motivation

Comments on missing values

Most statistical methods can only be applied to complete data.

In order to select an appropriate imputation method (especially for model-based imputation), it is necessary to know the multivariate structure of the missing values beforehand.

Visualizing missing values may not only help to detect the missing value mechanisms, but also to gain insight into the quality and various other aspects of the data.


Motivation

Missing value mechanisms

Three important cases (e.g., Little and Rubin 2002):

MCAR (Missing Completely At Random):

$$P(X_{miss} \mid X) = P(X_{miss})$$

MAR (Missing At Random):

$$P(X_{miss} \mid X) = P(X_{miss} \mid X_{obs})$$

MNAR (Missing Not At Random):

$$P(X_{miss} \mid X) = P(X_{miss} \mid X_{obs}, X_{miss})$$

where $X = (X_{obs}, X_{miss})$ denotes the complete data, and $X_{obs}$ and $X_{miss}$ are the observed and missing parts, respectively.
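The difference between the three cases can be made concrete with a small simulation. The following is a minimal Python sketch, not part of the talk or of VIM: the variables age and income, the missingness probabilities, and the function names simulate and missing_rate_by_age are all illustrative assumptions.

```python
import random


def simulate(mechanism, n=10000, seed=1):
    """Return (age, income) pairs; income is None where it was set to missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)                 # always observed
        income = rng.gauss(1000 + 20 * age, 300)  # may become missing
        if mechanism == "MCAR":
            p = 0.2                               # constant, ignores the data
        elif mechanism == "MAR":
            p = 0.4 if age > 50 else 0.05         # depends only on the observed age
        elif mechanism == "MNAR":
            p = 0.4 if income > 2000 else 0.05    # depends on the value that goes missing
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        rows.append((age, None if rng.random() < p else income))
    return rows


def missing_rate_by_age(rows, cutoff=50):
    """Share of missing incomes among cases with age <= cutoff and age > cutoff."""
    young = [inc is None for age, inc in rows if age <= cutoff]
    old = [inc is None for age, inc in rows if age > cutoff]
    return sum(young) / len(young), sum(old) / len(old)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        young, old = missing_rate_by_age(simulate(mechanism))
        print(mechanism, round(young, 3), round(old, 3))
```

Under MCAR the two age groups show about the same share of missing incomes, while under MAR the shares differ clearly. Under MNAR they can differ as well, because income is correlated with age in this setup, which is one reason the observed data alone cannot always separate MAR from MNAR.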


Visualization of missing values

Visualization of missing values

Well-known books and almost all articles about missing values do not address visualization.

Visualization tools for missing values are rarely or not at all implemented in SAS, SPSS, Stata or even R.

Through linking, missing values can be highlighted in GGobi (Cook and Swayne 2007) and Mondrian (Theus 2002).

MANET (Unwin et al. 1996, Theus et al. 1997) is quite powerful, but only available for older Apple systems with PowerPC architecture and Mac OS.

Visualization tools for missing values need to be available for the R community so that visualization of missing values, imputation and analysis can all be done from within R, without the need for additional software.


Visualization of missing values

Histogram and spinogram

[Plots omitted: histogram and spinogram of age (x-axis), with bars split by missing/observed in py010n (y-axis).]

Figure: Austrian EU-SILC data from 2004 with missings generated in variable age.
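What a spinogram encodes can be sketched in a few lines of Python: for each bin of one variable, the bar width reflects the bin count and the highlighted part reflects the share of cases that are missing in a second variable. This is only an illustration under assumed inputs, not VIM code: the function name spinogram_table, the bin breaks and the toy age and income values are hypothetical.

```python
def spinogram_table(x, y, breaks):
    """For each bin [lo, hi) of x, return (lo, hi, bin count, share of missing y).

    Cases with missing x are skipped; an empty bin gets a share of 0.0.
    """
    table = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        in_bin = [yv for xv, yv in zip(x, y) if xv is not None and lo <= xv < hi]
        n_missing = sum(1 for yv in in_bin if yv is None)
        share = n_missing / len(in_bin) if in_bin else 0.0
        table.append((lo, hi, len(in_bin), share))
    return table


# Toy data (hypothetical): None marks a missing income.
age = [22, 25, 31, 38, 44, 47, 52, 58, 63, 71]
income = [1500, None, 1800, 2100, None, 2500, None, None, 2300, None]

if __name__ == "__main__":
    for lo, hi, count, share in spinogram_table(age, income, [20, 40, 60, 80]):
        print(lo, hi, count, share)
```

A histogram with highlighting would show the counts per bin on the vertical axis instead, so the spinogram is the better choice when the share of missing values per bin is of interest.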


Visualization of missing values

Marginplot

[Plot omitted: marginplot of pek_n (x-axis) against py130n (y-axis).]

Figure: Austrian EU-SILC data from 2004.


Visualization of missing values

Scatterplot matrix

[Plot omitted: scatterplot matrix of pek_n, age and py010n.]

Figure: Austrian EU-SILC data from 2004.


Visualization of missing values

Matrixplot

[Plot omitted: matrixplot of P001000, r007000, py010n, py035n, py050n, py090n, py100n, pek_n, bundesld and age, with the observation index on the vertical axis.]

Figure: Austrian EU-SILC data from 2004.


Visualization of missing values

Parallel coordinate plot

[Plot omitted: parallel coordinate plot of sex, pek_g, P033000, P029000, P014000, P001000, age and bundesld.]

Figure: Austrian EU-SILC data from 2004


Visualization of missing values

Parallel boxplots

[Plot omitted: parallel boxplots of pek_n, grouped by observed (obs.) and missing (miss.) values in py010n, py035n, py050n, py070n, py080n, py090n, py100n, py110n, py130n and py140n.]

Figure: Austrian EU-SILC data from 2004.


Conclusions

General Statements

The detection of missing value mechanisms is quite complex when using models or tests.

Statistical methods frequently lead to only vague statements about the missing value mechanisms.

Non-robust methods lead to erroneous statements about missing value mechanisms for data containing outliers.

Visualization tools are easier to handle and more powerful, but flexible, easy-to-use visualization software is required.


Conclusions

The R package VIM

The R package VIM (Templ and Filzmoser 2008, Templ and Alfons 2009). . .

has all previously shown plots implemented, along with some more.

is a tool for exploratory data analysis of data with missing values.

makes it possible to analyze the multivariate structure of missing values.

comes with a graphical user interface (GUI).

contains interactive features.

allows producing high-quality graphics for publications.

is available on CRAN (http://cran.r-project.org/package=VIM).


Conclusions

Graphical user interface of the R Package VIM

Figure: VIM GUI


Conclusions

Acknowledgments

This work was partly funded by the European Union (represented by the European Commission) within the 7th framework programme for research (Theme 8, Socio-Economic Sciences and Humanities, Project AMELI (Advanced Methodology for European Laeken Indicators), Grant Agreement No. 217322). Visit http://ameli.surveystatistics.net for more information.


Conclusions

References I

D. Cook and D.F. Swayne. Interactive and Dynamic Graphics for Data Analysis: With R and GGobi. Springer, New York, 2007. ISBN 978-0-387-71761-6.

R.J.A. Little and D.B. Rubin. Statistical Analysis with Missing Data. Wiley, New York, 2nd edition, 2002. ISBN 0-471-18386-5.

M. Templ and A. Alfons. VIM: Visualization and Imputation of Missing Values, 2009. URL http://cran.r-project.org/package=VIM. R package version 1.3.

M. Templ and P. Filzmoser. Visualization of missing values using the R-package VIM. Research Report CS-2008-1, Department of Statistics and Probability Theory, Vienna University of Technology, 2008. URL http://www.statistik.tuwien.ac.at/forschung/CS/CS-2008-1complete.pdf.


Conclusions

References II

M. Theus. Interactive data visualization using Mondrian. Journal of Statistical Software, 7(11): 1–9, 2002. URL http://www.jstatsoft.org/v07/i11.

M. Theus, H. Hofmann, B. Siegl, and A. Unwin. MANET - Extensions to interactive statistical graphics for missing values. In New Techniques and Technologies for Statistics II, pages 247–259. IOS Press, 1997.

A. Unwin, G. Hawkins, H. Hofmann, and B. Siegl. Interactive graphics for data sets with missing values: MANET. Journal of Computational and Graphical Statistics, 5(2): 113–122, 1996.
