The Rfwdmv Package

May 19, 2005

Version 0.71

Date 2005-05-01

Title Forward Search for Multivariate Data

Author Anthony Atkinson, Andrea Cerioli, Marco Riani.

Maintainer Kjell Konis

Depends R (>= 1.8.0), MASS

Description Explore Multivariate Data with the Forward Search

License BSD

URL http://www.riani.it

R topics documented:

assign.groups
baby.dat
bank.dat
bb.subset
bigunit.fwdmv
bridge.dat
diabetes.dat
dyestuff.dat
eigenvalues.fwdmv
eigenvectors.fwdmv
electrodes.dat
ellipse.subset
emilia.dat
fondi.dat
fwdmv
fwdmv.init
fwdmv.object
fwdmvChangePlot
fwdmvConfirmPlot
fwdmvCovariancePlot
fwdmvDeterminantPlot
fwdmvDistancePlot
fwdmvEccentricityPlot
fwdmvEigenvectorPlot
fwdmvEllipsePlot
fwdmvEntryPlot
fwdmvGapPlot
fwdmvMinmaxPlot
fwdmvPairsPlot
fwdmvPartitionPlot
fwdmvPrePlot
fwdmvPrincompPlot
fwdmvQuantilePlot
fwdtr
fwdtr.object
fwdtr.test
fwdtr.test.object
fwdtrLrPlot
fwdtrMlePlot
fwdtrProfilePlot
heads.dat
iris.dat
mcd.subset
milk.dat
ms.dat
mssmall.dat
mussels.dat
panel.bb
panel.be
panel.me
partition
plot.fwdmv
plot.fwdtr.test
print.fwdmv
profile.fwdtr
profile.fwdtr.object
quality.dat
record.dat
sixtyeighty.dat
threetwo.dat

Index

assign.groups Tentative Group Assignments

Description

This function is used to assign tentative groups to an fwdmv object containing an initial fit.

Usage

assign.groups(object, groups)


Arguments

object an fwdmv object containing an initial fit.

groups a list containing the tentative groups. Each group is given by a vector of positive integers and the groups must be mutually exclusive.

Value

an fwdmv object containing the tentative groups.

Author(s)

Kjell Konis

See Also

fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

# Create an fwdmv object containing just an initial fit #

fondi.init <- fwdmv(fondi.dat)

# Assign tentative groups #

fondi.tgs <- assign.groups(fondi.init, groups = list(g1, g2))
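
The requirement that each group be a vector of positive integers and that the groups be mutually exclusive can be checked before calling assign.groups. The following is a minimal sketch in base R; the helper name check.groups is hypothetical and is not part of Rfwdmv, and n = 103 is the number of observations documented for fondi.dat.

```r
# Hypothetical helper (not part of Rfwdmv): check that tentative groups
# are positive integer indices within 1..n and mutually exclusive.
check.groups <- function(groups, n) {
  units <- unlist(groups)
  stopifnot(is.numeric(units),
            all(units == round(units)),
            all(units >= 1), all(units <= n))
  # A unit index that occurs twice belongs to more than one group
  # (or is repeated within a group).
  !any(duplicated(units))
}

# The same groups as in the example above, written with ranges.
g1 <- c(1:20, 22:49, 51, 53, 55, 56)
g2 <- c(57:76, 78:88, 90:103)

check.groups(list(g1, g2), n = 103)        # TRUE
check.groups(list(g1, c(g2, 1)), n = 103)  # FALSE: unit 1 is in both groups
```

Writing the groups with ranges also makes the units left unassigned (21, 50, 52, 54, 77 and 89) easier to see.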

baby.dat Babyfood data

Description

The responses are the viscosity in centipoises at the time of manufacture (y1) and when measured 3 (y2), 6 (y3) and 9 months later (y4). There are five explanatory variables.

Usage

data(baby.dat)

Format

A data frame with 27 observations on the following 9 variables.

x1 a numeric vector.

x2 a numeric vector.

x3 a numeric vector.


x4 a numeric vector.

x5 a numeric vector.

y1 a numeric vector, the initial viscosity of the babyfood.

y2 a numeric vector, the viscosity of the babyfood after three months storage.

y3 a numeric vector, the viscosity of the babyfood after six months storage.

y4 a numeric vector, the viscosity of the babyfood after nine months storage.

Details

Box and Draper (1987, p. 572) find a linear model with terms x2, x3 and x5, as well as, surprisingly, the interaction x3:x4 in the absence of x4. This model was suggested for all four responses. It is generally agreed that such models, violating a marginality constraint, are undesirable: if the variables in this model are rescaled, the model will apparently change, a term in x4 appearing.
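
The marginality point can be illustrated with simulated data: when a model contains an interaction without one of its main effects, moving the origin of the other variable brings that main effect into the model. This is only a sketch in base R; the data are generated here and are not baby.dat, and the shift of 10 is an arbitrary choice.

```r
# Simulated data (not baby.dat): the response depends on x3 and on the
# x3:x4 interaction, with no main effect of x4.
set.seed(1)
x3 <- rnorm(27)
x4 <- rnorm(27)
y  <- 2 + 1.5 * x3 + 3 * x3 * x4

# Move the origin of x3.
x3s <- x3 + 10

fit.orig    <- lm(y ~ x3 + x3:x4)         # exact fit
fit.shifted <- lm(y ~ x3s + x3s:x4)       # same form, shifted x3: no longer exact
fit.full    <- lm(y ~ x3s + x4 + x3s:x4)  # main effect of x4 restores the fit

max(abs(resid(fit.orig)))      # numerically zero
max(abs(resid(fit.shifted)))   # clearly non-zero
coef(fit.full)["x4"]           # -3 * 10: the term that "appears"
```

Since x3 * x4 = (x3s - 10) * x4 = x3s * x4 - 10 * x4, the shifted model needs a term in x4 with coefficient -30 to reproduce the original fit.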

Source

Box and Draper (1987), p. 265 present part of a larger data set on the storage of a babyfood.

References

Atkinson, Riani and Cerioli (2004), p. 567; http://www.riani.it/arc.

bank.dat Swiss bank notes

Description

The data are readings on 200 Swiss bank notes, 100 of which are genuine and 100 forged. However, the structure of the samples may not be quite that simple. The forged notes have all been detected and withdrawn from circulation. To provide a useful comparison, the genuine notes are likewise used. So some of the notes in either group may have been misclassified. A second complication is that there may be more than one forger at work.

Usage

data(bank.dat)

Format

A data frame with 200 observations on the following 6 variables.

y1 a numeric vector, the length of bank note near the top.

y2 a numeric vector, the left-hand height of bank note.

y3 a numeric vector, the right-hand height of bank note.

y4 a numeric vector, the distance from bottom of bank note to beginning of patterned border.

y5 a numeric vector, the distance from top of bank note to beginning of patterned border.

y6 a numeric vector, the diagonal distance.


Details

The first three variables are measurements of paper size while the fourth and fifth variables are measurements from the edge of the paper to the printed area. Only y6 is solely on the printed area. It measures the diagonal distance across the frame of the central illustration.

Source

Flury and Riedwyl (1988), pp. 4-8.

References

Atkinson, Riani and Cerioli (2004), pp. 562-566; http://www.riani.it/arc.

Examples

data(bank.dat)
plot(bank.dat)

bb.subset Initial Subset by Bivariate Box Plots

Description

Computes the initial subset using robust bivariate boxplots as described in ARC.

Usage

bb.subset(X, size)

Arguments

X a matrix or data frame containing a multivariate data set.

size the size of the initial subset.

Value

An integer vector containing the indexes of the units in the initial subset.

Author(s)

Aldo Corbellini

References

Zani, S., Riani, M. and Corbellini, A. (1998), "Robust Bivariate Boxplots and Multiple Outlier Detection", Computational Statistics and Data Analysis, pp. 257-270.

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv, fwdmv.init, fwdmvPrePlot


Examples

data(fondi.dat)
fondi.fwdmv <- fwdmv(fondi.dat, bsb = bb.subset)

### start with a subset size m=22
data(fondi.dat)
fondi.fwdmv <- fwdmv(fondi.dat, bsb = bb.subset(fondi.dat, 22))

bigunit.fwdmv Generate the Big Unit Matrix

Description

Returns a logical matrix with one column for each step in the Forward Search and one row for each unit in the data. The (i,j) element is TRUE if unit i is in the subset during step j and FALSE otherwise.

Usage

bigunit.fwdmv(x)

Arguments

x an fwdmv object.

Value

a logical matrix. See the description.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fondi.bigUnit <- bigunit.fwdmv(fondi.1)
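
The layout of the big unit matrix can be sketched without the package. The subsets below are made-up values for five units over three steps, not output from fwdmv, and the way the matrix is built here is only an assumption about how such a matrix could be formed.

```r
# Hypothetical subsets for n = 5 units over 3 steps of a search
# (illustrative values, not output from fwdmv).
n <- 5
subsets <- list(c(2, 4), c(2, 4, 5), c(1, 2, 4, 5))

# One row per unit, one column per step; TRUE marks membership.
big.unit <- sapply(subsets, function(s) seq_len(n) %in% s)
dimnames(big.unit) <- list(unit = seq_len(n), step = seq_along(subsets))

big.unit
colSums(big.unit)       # subset size at each step: 2 3 4
which(!big.unit[, 3])   # units still outside the subset at step 3
```

Column sums give the subset size at each step, and a row shows the steps during which a given unit belongs to the subset.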


bridge.dat Bridge data

Description

A modification of the 60:80 data to reduce the sharpness of the division between groups.

Usage

data(bridge.dat)

Format

A data frame with 170 observations on the following 2 variables.

y1 a numeric vector.

y2 a numeric vector.

Details

In the 60:80 data and in the "three clusters and two outliers" data, the groups of observations were clearly separated. However, such complete separation is rare in real world examples, where clusters often overlap. These data are a modification of the 60:80 data to reduce the sharpness of the division between groups.

Source

Atkinson, Riani and Cerioli (2004), pp. 590-591; http://www.riani.it/arc.

diabetes.dat Diabetes data

Description

This data set consists of 145 observations on diabetes patients. These data have been used in the statistical literature as a difficult example of cluster analysis.

Usage

data(diabetes.dat)

Format

A data frame with 145 observations on the following 3 variables.

y1 a numeric vector, the plasma glucose response to oral glucose.

y2 a numeric vector, the plasma insulin response to oral glucose.

y3 a numeric vector, the degree of insulin resistance.


Details

y1 and y2 are responses to oral glucose, y3 is insulin resistance. The scatter plot matrix of the data shows that there seems to be a central cluster and two "arms" forming separate clusters.

Source

Reaven and Miller (1979), Fraley and Raftery (1998).

References

Atkinson, Riani and Cerioli (2004), pp. 594-595; http://www.riani.it/arc.

dyestuff.dat Dyestuff data

Description

The data arise in a study of the manufacture of a dyestuff. There are 64 observations at the points of a 2^6 factorial and three responses: strength, hue and brightness.

Usage

data(dyestuff.dat)

Format

A data frame with 64 observations on the following 6 variables.

x1 a numeric vector.

x2 a numeric vector.

x3 a numeric vector.

y1 a numeric vector giving the strength.

y2 a numeric vector giving the hue.

y3 a numeric vector giving the brightness.

Details

The original data set contained six explanatory variables. However, only three of them have a significant effect on the three responses. Here we report only the three explanatory variables which have been found to be significant.

Source

Box and Draper (1987), pp. 114-115.

References

Atkinson, Riani and Cerioli (2004), p. 570; http://www.riani.it/arc.

Examples

data(dyestuff.dat)


eigenvalues.fwdmv Compute the Eigenvalues from an fwdmv Object

Description

An accessor method to retrieve the eigenvalues of the covariance matrix estimates from an fwdmv object.

Usage

eigenvalues.fwdmv(x)

Arguments

x an fwdmv object.

Value

a list with one element for each group in the fwdmv object x. Each element is a matrix where row i contains the eigenvalues of the covariance matrix estimate computed during step i of the forward search.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fondi.evals <- eigenvalues.fwdmv(fondi.1)
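
The shape of the returned matrices (one row of eigenvalues per step) can be reproduced in base R for a single group. This is a sketch under stated assumptions: the data are simulated, and the nested subsets 1:10, 1:11, ... merely stand in for the subsets a real search would produce.

```r
# Illustrative data and subsets (not output from fwdmv).
set.seed(2)
X <- matrix(rnorm(40 * 3), ncol = 3)
subsets <- lapply(10:40, function(m) seq_len(m))  # stand-in for search subsets

# Row i holds the eigenvalues of the covariance estimate at step i,
# in decreasing order.
evals <- t(sapply(subsets, function(s) {
  eigen(cov(X[s, , drop = FALSE]), symmetric = TRUE, only.values = TRUE)$values
}))

dim(evals)   # 31 steps by 3 eigenvalues
```

The product of the eigenvalues in a row equals the determinant of the corresponding covariance estimate, which ties this output to the Determinant element of an fwdmv object.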


eigenvectors.fwdmv Retrieve an Eigenvector of the Covariance Matrix Estimate from an fwdmv Object

Description

An accessor method to retrieve an eigenvector of the covariance matrix estimate from an fwdmv object.

Usage

eigenvectors.fwdmv(x, which.vector = 1)

Arguments

x an fwdmv object.

which.vector an integer specifying which eigenvector should be returned.

Value

a list with one element for each group in the fwdmv object x. Each element is a matrix where row i contains the components of the which.vector-th eigenvector of the covariance matrix estimate computed during step i of the forward search.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fondi.evec1 <- eigenvectors.fwdmv(fondi.1, which.vector = 1)
fondi.evec3 <- eigenvectors.fwdmv(fondi.1, which.vector = 3)


electrodes.dat Electrodes data

Description

The data are measurements from machines manufacturing supposedly identical electrodes. The electrodes are shaped rather like nipples. There are five measurements on each: y1, y2 and y5 are diameters, while y3 and y4 are lengths. Fifty electrodes from each machine have been measured.

Usage

data(electrodes.dat)

Format

A data frame with 100 observations on the following 5 variables.

y1 a numeric vector, the first diameter.

y2 a numeric vector, the second diameter.

y3 a numeric vector, the first length.

y4 a numeric vector, the second length.

y5 a numeric vector, the third diameter.

Details

For reasons of commercial secrecy, the data have been transformed by subtracting constants from the variables.

Source

The data come from an unpublished University Ph.D. thesis by Kreuter. The data are given and described by Flury and Riedwyl (1988), pp. 128-132.

References

Atkinson, Riani and Cerioli (2004), pp. 579-580; http://www.riani.it/arc.

Examples

data(electrodes.dat)


ellipse.subset Initial Subset by Robustly Centered Ellipses

Description

Computes the initial subset using robustly centered ellipses as described in ARC.

Usage

ellipse.subset(X, size)

Arguments

X a numeric matrix containing the multivariate data set.

size an integer specifying the size of the initial subset.

Author(s)

Kjell Konis

References

Riani, M. and Zani, S. (1997), "An Iterative Method for the Detection of Multivariate Outliers", Metron, pp. 101-117.

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

fondi.fwdmv <- fwdmv(fondi.dat, bsb = ellipse.subset)

emilia.dat Municipalities in Emilia-Romagna

Description

A large data set with 341 observations on 29 variables. The data are taken from the 1991 Italian census and cover all municipalities. Nearly all the variables are indices in which counts have been divided by municipal population.

Usage

data(emilia.dat)


Format

A data frame with 341 observations on the following 29 variables.

Province a factor with levels BO, FE, FO, MO, PC, PR, RA, RE and RN identifying the province.

Pop.Inf a numeric vector, the percentage of the population aged less than ten (y1 in ARC).

Pop.sen a numeric vector, the percentage of the population aged 75 or more (y2 in ARC).

Unipers a numeric vector, the percentage of the single-member families (y3 in ARC).

Divorzi a numeric vector, the percentage of the residents who are divorced (y4 in ARC).

Vedovi a numeric vector, the percentage of widows and widowers (y5 in ARC).

Laurea a numeric vector, the percentage of the population aged over 25 who are graduates (y6 in ARC).

Notitolo a numeric vector, the percentage of those aged over six having no education (y7 in ARC).

T.attiv a numeric vector, the percentage of those of working age in full-time employment (y8 in ARC).

T.disoc. a numeric vector, the unemployment rate (y9 in ARC).

Mov.nat a numeric vector, the standardized natural increase in population (y10 in ARC).

Mov.mig a numeric vector, the standardized change in population due to migration (y11 in ARC).

Natalit. a numeric vector, the average birth rate from 1992 to 1994 (y12 in ARC).

Fecondi1 a numeric vector, fecundity: the three year average birth rate amongst women of child-bearing age (y13 in ARC).

Dopo.82 a numeric vector, the percentage of occupied houses built since 1982 (y14 in ARC).

xbagni a numeric vector, the percentage of occupied houses with two or more WCs (y15 in ARC).

Impianto a numeric vector, the percentage of occupied houses with fixed heating system (y16 in ARC).

TV.. a numeric vector, the percentage of TV licence holders (y17 in ARC).

Parco a numeric vector, the number of cars per 100 inhabitants (y18 in ARC).

Lusso.Imm a numeric vector, the percentage of luxury cars (y19 in ARC).

Add.H a numeric vector, the percentage of those working in hotels and restaurants (y20 in ARC).

Add.J a numeric vector, the percentage of those working in banking and finance (y21 in ARC).

Imp.cont a numeric vector, the average declared income amongst those filing income tax returns (y22 in ARC).

Cont.res a numeric vector, the percentage of inhabitants filing income tax returns (y23 in ARC).

Add.Ul a numeric vector, the percentage of the residents employed in factories and public services (y24 in ARC).

x0add.Add a numeric vector, the percentage of employees working in factories with more than ten employees (y25 in ARC).

V68 a numeric vector, the percentage of employees working in factories with more than 50 employees (y26 in ARC).

Artigiane a numeric vector, the percentage of those working in artisanal enterprises (y27 in ARC).

Imprendi a numeric vector, the percentage of entrepreneurs and skilled self-employed among those of working age (y28 in ARC).


Details

These 29 variables were selected from 50 available. The first 13 (y1-y13) are demographic variables, the next three (y14-y16) measure housing quality, the succeeding seven (y17-y23) are measures of individual income and wealth and the last five (y24-y28) relate to industrial production.

Source

Zani (1996).

References

A selection of these data is given by Atkinson, Riani and Cerioli (2004), pp. 560-561; the full data are available on the website http://www.riani.it/arc.

Examples

data(emilia.dat)

fondi.dat Investment Funds Data

Description

These data contain information on 103 investment funds operating in Italy since April 1996.

Usage

data(fondi.dat)

Format

A data frame with 103 observations on the following 3 variables.

y1 a numeric vector, the short term (12 months) performance.

y2 a numeric vector, the medium term (36 months) performance.

y3 a numeric vector, the medium term (36 months) volatility.

Details

The scatter plot matrix of the data shows there seem to be two clusters, with a few observations in between.

Source

Zani (2000), p. 194. An introduction to the data in English is given by Cerioli and Zani (2001).

References

Atkinson, Riani and Cerioli (2004), pp. 592-593; http://www.riani.it/arc.

Examples

data(fondi.dat)


fwdmv Multivariate Forward Search

Description

This function computes a multivariate forward search. Several diagnostic statistics are monitored during the search: see fwdmv.object.

Usage

fwdmv(X, groups = NULL, alpha = 0.6, beta = 0.75, bsb = ellipse.subset, balanced = TRUE, scaled = TRUE, constrained = TRUE)

Arguments

X a matrix or data frame containing a multivariate data set.

groups a list of one or more integer vectors specifying the tentative groups. All elements must be unique. Units not belonging to any group are classified as unassigned. If omitted, all of the data are assumed to come from a single multivariate normal population.

alpha a numeric value between 0 and 1 specifying the fraction of the units in each group that will be included in the initial subset.

beta a numeric value between alpha and 1 specifying the fraction of the units in each tentative group that must be included in the subset before the unassigned units are allowed to be included. A large value of beta ensures that the centroid and variance-covariance matrix estimates stabilize before the unassigned units enter the subsets.

bsb a function of two variables: the multivariate data in matrix form X and the number of units in the initial subset size. If bsb = bb.subset the initial subset is found using robust bivariate boxplots. The default is to use robustly centered ellipses (ellipse.subset). Alternatively, the initial subset may be specified directly by providing an integer vector containing the indices of the units to be in the initial subset.

balanced a logical value. If TRUE then units are added to the subset so that the group ratios in the subset stay as close as possible to the group ratio in the data.

scaled a logical value. If TRUE then the Mahalanobis distances are scaled using the 2p-th root of the determinant of the covariance matrix estimate, where p is the number of variables. This is intended to compensate for clusters with significantly different dispersions.

constrained a logical value. If TRUE then the forward search chooses units from the tentative groups until each group is beta full; then unassigned units are allowed into the subset. If FALSE then unassigned units may enter the subset at any time during the forward search. Note that when constrained == FALSE, the argument balanced is ignored.
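
The scaling applied when scaled = TRUE can be sketched in base R. The sketch assumes that "scaled using the 2p root of the determinant" means multiplying each distance by det(S)^(1/(2p)), where S is the covariance estimate from the current subset; the manual does not spell this out, so treat the direction of the scaling as an assumption. The data and the subset are simulated.

```r
set.seed(5)
X <- matrix(rnorm(50 * 3), ncol = 3) %*% diag(c(1, 2, 5))
p <- ncol(X)
subset <- 1:30                          # stand-in for the current subset

centre <- colMeans(X[subset, ])
S <- cov(X[subset, ])

d <- sqrt(mahalanobis(X, centre, S))    # unscaled distances
# Assumption: "scaled" means multiplying by the 2p-th root of det(S).
d.scaled <- d * det(S)^(1 / (2 * p))

# Under this assumption the scaled distances are ordinary Mahalanobis
# distances computed with S / det(S)^(1/p), a matrix with determinant 1.
S1 <- S / det(S)^(1 / p)
all.equal(d.scaled, sqrt(mahalanobis(X, centre, S1)))
```

Read this way, scaling removes the overall size of each group's covariance estimate and keeps only its shape, which is one way to make distances from groups with very different dispersions comparable.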

Details

Initial group subsets of size alpha * nbsb[i] (where nbsb[i] is the number of units assigned to tentative group i) are obtained by running the initialization function on each group. Estimates of the center and covariance matrix are computed for each group using the units currently in the group subset. The Mahalanobis distance for each unit in a tentative group is computed using the center and covariance matrix estimates for that group. The Mahalanobis distance for each unassigned unit is computed by calculating the distance to each group and taking the minimum. If the search is balanced then one unit is added to the subset that is currently the furthest below the population ratio. If the search is not balanced then the unit (not in any subset) with the smallest distance is allocated to the nearest group. If the search is constrained then the unassigned units are not allowed into the group subsets until each group subset contains a fraction beta of the units in the tentative groups. If the search is not constrained then the unassigned units may enter the subset at any time during the search.
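
The rule for unassigned units (distance to each group, then the minimum) can be sketched with base R. The two-group data below are simulated, and for simplicity each group's estimates use all of its units rather than a current subset.

```r
# Illustrative two-group data (not from the package).
set.seed(4)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2),
           c(5.5, 6.2))                 # unit 41 is unassigned
groups <- list(1:20, 21:40)

# Squared Mahalanobis distance of every unit to each group's centre,
# using that group's own mean and covariance estimates.
d2 <- sapply(groups, function(g) {
  mahalanobis(X, colMeans(X[g, ]), cov(X[g, ]))
})

# An unassigned unit takes the minimum over groups.
sqrt(min(d2[41, ]))   # its distance
which.min(d2[41, ])   # index of the nearest group
```

Each column of d2 corresponds to one tentative group, so the row minimum for an unassigned unit identifies both its distance and the group it would join.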

Value

a list with class fwdmv.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmv.init Multivariate Forward Search for Ungrouped Data

Description

This function computes a multivariate forward search for ungrouped data. Several diagnostic statistics are monitored during the search: see fwdmv.object. Note that this function is called by fwdmv when no tentative groups are specified. It is recommended that fwdmv be used for all multivariate forward searches.

Usage

fwdmv.init(X, bsb = ellipse.subset, scaled = TRUE)


Arguments

X a matrix or data frame containing the multivariate data set.

bsb a function of two variables: the multivariate data in matrix form X and the number of units in the initial subset size. If bsb = bb.subset the initial subset is found using robust bivariate boxplots. The default is to use robustly centered ellipses (ellipse.subset). Alternatively, the initial subset may be specified directly by providing an integer vector containing the indices of the units to be in the initial subset.

scaled a logical value. If TRUE then scaled Mahalanobis distances are used during the forward search.
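
Because bsb is described as a function of the data matrix X and the subset size, a user-written function with the same two arguments and an integer-vector result would be a candidate value. The function below is a hypothetical illustration in base R, not part of Rfwdmv, and it has not been tested against the package; it simply picks the units closest to the coordinate-wise median.

```r
# Hypothetical initial-subset function (not part of Rfwdmv): returns the
# indices of the `size` units closest to the coordinate-wise median,
# measuring closeness on variables standardized by their MAD.
med.subset <- function(X, size) {
  X <- as.matrix(X)
  centre <- apply(X, 2, median)
  spread <- apply(X, 2, mad)
  Z <- sweep(sweep(X, 2, centre, "-"), 2, spread, "/")
  d2 <- rowSums(Z^2)
  sort(order(d2)[seq_len(size)])
}

set.seed(3)
X <- rbind(matrix(rnorm(60), ncol = 3), c(10, 10, 10))  # unit 21 is an outlier
med.subset(X, size = 8)
```

The result has the same form as the documented return value of bb.subset: an integer vector of unit indices, which could also be supplied to bsb directly.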

Details

This function computes the Forward Search described in chapter 3 of ARC. The initial subset can be specified directly in the argument bsb or computed from the data. By default bsb is a function for computing the initial subset using robustly centered ellipses. Given a subset of m units the next subset is the m+1 units with smallest Mahalanobis distances calculated using the center and covariance matrix estimates of the units currently in the subset. This process is repeated until the subset contains all of the units and several diagnostic statistics are computed for each subset.
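
The step from m to m+1 units described above can be sketched as a short loop in base R. This is a simplified illustration for ungrouped data, not the Rfwdmv implementation: it uses unscaled distances, takes the initial subset as given, and records only the subsets rather than the diagnostic statistics. The function name forward.search is hypothetical.

```r
# A simplified sketch for ungrouped data (not the Rfwdmv implementation).
forward.search <- function(X, initial) {
  X <- as.matrix(X)
  n <- nrow(X)
  subset <- initial
  steps <- list(subset)
  while (length(subset) < n) {
    m <- length(subset)
    # Distances of all units from the fit to the current subset.
    d2 <- mahalanobis(X, colMeans(X[subset, , drop = FALSE]),
                      cov(X[subset, , drop = FALSE]))
    # The next subset is the m + 1 units with the smallest distances.
    subset <- sort(order(d2)[seq_len(m + 1)])
    steps[[length(steps) + 1]] <- subset
  }
  steps
}

set.seed(6)
# Units 31 and 32 are placed far from the rest.
X <- rbind(matrix(rnorm(60), ncol = 2), c(8, 8), c(9, -9))
steps <- forward.search(X, initial = 1:6)
sapply(steps, length)   # subset sizes 6, 7, ..., 32
# Units that enter during the last two steps:
setdiff(steps[[length(steps)]], steps[[length(steps) - 2]])
```

Because each new subset is chosen from all units, a unit can leave the subset as another enters; units far from the bulk of the data would be expected to enter towards the end of the search.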

Value

an fwdmv object.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object

Examples

data(fondi.dat)
fondi.init <- fwdmv.init(fondi.dat)

data(fondi.dat)
#### find the initial subset using robustly centered ellipses
#### start with an initial subset size of 17 units
fondi.init <- fwdmv.init(fondi.dat, bsb = ellipse.subset(fondi.dat, 17))

data(fondi.dat)
#### find the initial subset using robust bivariate boxplots and
#### start with an initial subset size of 17 units
fondi.init <- fwdmv.init(fondi.dat, bsb = bb.subset(fondi.dat, 17))


fwdmv.object fwdmv.object

Description

An object containing a fitted forward search on multivariate data. The class attribute is set to fwdmv.

Format

An fwdmv.object is a list with the following elements:

call the matched call.

Distances a numeric matrix containing the Mahalanobis distances computed during the forward search.

Center a list of numeric matrices containing the location estimates for each group computed during the forward search.

Cov a list of numeric matrices containing the covariance matrix estimates (in packed storage) for each group computed during the forward search.

Determinant a list of numeric vectors containing the determinants of the covariance matrix estimates for each group computed during the forward search.

Unit a list of lists of integer vectors containing the subsets during each step of the forward search.

groups a list of integer vectors containing the user specified tentative groups.

n an integer, the number of units in the data.

p an integer, the number of variables in the data.

m an integer, the number of units in the subset during the first step of the forward search.

data a numeric matrix containing the data, the dimnames attribute is set toNULL.

data.name the name of the data frame or matrix containing the multivariate data set.

data.namesa list of character vectors containing the row and column names of the data.

group.names a character vector containing the names of the tentative groups.

unassignedan integer vector containing the indices of the units that do not belong to one of thetentative groups.

constrained a loigcal value,TRUEif the forward search was constrained.

scaled a logical value,TRUEis scaled Mahalanobis distances were used during the forward search.

Max a numeric vector contining the maximum Mahalanobis distance in the subset.

Mth a numeric vector containing the mth overall Mahalanobis distance.

Min a numeric vector containing the minimum Mahalanobis distance not in the subset.

Mpo a numeric vector containing the (m+1)th overall Mahalanobis distance.

initial a logical value, TRUE if this object was generated by the function fwdmv.init.
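The Cov element is described as holding covariance matrices "in packed storage". A minimal base R sketch of what packing a symmetric matrix can look like follows. The helper names pack_sym and unpack_sym are hypothetical, and the column-wise lower-triangle ordering is an assumption made for illustration; the manual does not state which layout Rfwdmv uses.

```r
# Pack the lower triangle (including the diagonal) of a symmetric matrix,
# column by column. ASSUMPTION: this ordering is illustrative only; the
# layout used inside an fwdmv object is not documented here.
pack_sym <- function(S) S[lower.tri(S, diag = TRUE)]

# Rebuild the full symmetric p x p matrix from the packed vector.
unpack_sym <- function(v, p) {
  S <- matrix(0, p, p)
  S[lower.tri(S, diag = TRUE)] <- v
  S[upper.tri(S)] <- t(S)[upper.tri(S)]   # mirror the lower triangle
  S
}

S  <- cov(iris[, 1:4])          # built-in data, p = 4
v  <- pack_sym(S)               # p * (p + 1) / 2 stored values
S2 <- unpack_sym(v, ncol(S))    # same matrix, without dimnames
```

Packing stores p(p+1)/2 values per matrix instead of p^2, which matters when one matrix is kept for every step and every group. This is also one reason to prefer the accessor methods over reading the elements directly.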

Details

Objects of class fwdmv are created by the functions fwdmv, fwdmv.init, and partition. The Rfwdmv package contains a variety of plot methods for assessing fwdmv objects. These methods are listed in the See Also section. Note that there are accessor methods for several elements contained in the fwdmv object. When an accessor method exists for a certain element, it should be used to retrieve that element in preference to direct reference, as the structure of the fwdmv object is likely to change.

See Also

fwdmv.init fit an initial multivariate forward search.

fwdmv fit a multivariate forward search with user specified tentative groups.

Plot methods for assessing a fitted forward search stored in an fwdmv object:

fwdmvPairsPlot a pairs-like plot.

fwdmvQuantilePlot plot trajectories of entering units over quantiles of the distances in the subset.

fwdmvEllipsePlot a pairs-like plot with the subsets represented by ellipses.

fwdmvConfirmPlot plots the nearest center for unassigned and misclassified units.

fwdmvCovariancePlot a forward plot of the elements of the covariance matrices.

fwdmvDeterminantPlot a forward plot of the determinants of the covariance matrices.

fwdmvDistancePlot a forward plot of the Mahalanobis distances.

fwdmvEccentricityPlot a forward plot of the eccentricity for one bivariate ellipse.

fwdmvPrincompPlot a forward plot of the principal components.

fwdmvEigenvectorPlot a forward plot of a user specified eigenvector of the covariance matrix.

fwdmvEntryPlot a forward entry plot.

fwdmvGapPlot a gap plot.

fwdmvMinmaxPlot minimum and maximum distances plot.

fwdmvChangePlot a forward plot of change in Mahalanobis distance.

partition graphically assign units to groups.

fwdmvPartitionPlot view tentative group.

Accessor methods:

bigunit.fwdmv subsets represented as a logical matrix.

eigenvalues.fwdmv the eigenvalues of the covariance matrix estimates computed during the forward search.

eigenvectors.fwdmv the eigenvectors of the covariance matrix estimates computed during the forward search.

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)

#fondi.1 is an fwdmv object.

fwdmvChangePlot Change Plot of an fwdmv Object

Description

A matrix with one row for each unit in the data and one column for each step of the forward search. If the distance of unit i increases during step j of the forward search then cell (i,j) is filled; otherwise it is empty.
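As an illustration of the rule above, the following base R sketch builds the logical "increase" matrix from a small, made-up matrix of distances. The matrix D and the name increased are hypothetical; how the package treats the first step or ties is not specified here, so this sketch simply compares each step with the previous one.

```r
# Toy distance matrix: 4 units (rows) by 5 steps of a search (columns).
D <- rbind(c(1.0, 1.2, 1.1, 1.3, 1.3),
           c(2.0, 1.8, 1.9, 1.7, 1.6),
           c(0.5, 0.6, 0.7, 0.8, 0.9),
           c(3.0, 3.0, 2.5, 2.6, 2.4))

# Cell (i, j) is TRUE when the distance of unit i is strictly larger at
# step j + 1 than at step j, so there is one column fewer than steps.
increased <- D[, -1, drop = FALSE] > D[, -ncol(D), drop = FALSE]
increased
```

Unit 3 increases at every step, so its whole row would be filled, while unit 2 is filled only in the second column.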

Usage

fwdmvChangePlot(x, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is invisibly returned.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvChangePlot(fondi.1)

fwdmvConfirmPlot Confirmatory Group Assignment Plot

Description

A plot displaying the nearest group to each of the unassigned units during a final multivariate forward search.

Usage

fwdmvConfirmPlot(x, n.steps, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

n.steps an integer value. The last n.steps steps of the forward search are displayed.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is invisibly returned.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvConfirmPlot(fondi.fwdmv, n.steps = 30)

fwdmvCovariancePlot Plot the Covariance Matrix of an fwdmv Object

Description

Plots the elements of the covariance matrix of each group against the subset size.

Usage

fwdmvCovariancePlot(x, id = FALSE, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

id a logical value. If TRUE then the curves in the plot can be identified interactively with the mouse. Alternatively, id = "all" will identify all of the curves in the plot.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

a list is invisibly returned. If id = TRUE then the list has an element named selected containing the identified curves. The curves are identified by their column index in the fwdmv object. If there is more than one group then the numbers 1 through p are for the first group, p+1 through 2p for the second group and so on.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvCovariancePlot(fondi.fwdmv)

fwdmvCovariancePlot(fondi.fwdmv, id = "all")

# Use 'id = TRUE' for interactive curve labels.

fwdmvDeterminantPlot Plot the Determinant in an fwdmv Object

Description

Plots the determinants of the covariance matrix estimates against the subset size for each group in the fwdmv object.

Usage

fwdmvDeterminantPlot(x, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvDeterminantPlot(fondi.fwdmv)

fwdmvDistancePlot Plot the Mahalanobis Distances in an fwdmv Object

Description

Plots the Mahalanobis distance against the subset size for each unit in the fwdmv object.

Usage

fwdmvDistancePlot(x, group = NULL, id = FALSE, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

group show only the trajectories for the units in this group.

id identify trajectories in the plot. If id = TRUE then trajectories in the plot can be identified interactively with the mouse. If id is an integer vector then the trajectories for those units are identified in the plot. If id = "all" then all of the trajectories are labelled.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

a list is invisibly returned. If id = TRUE then the list has an element named selected containing the indices of the identified trajectories.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)

fwdmvDistancePlot(fondi.1)

# Use 'id = TRUE' for interactive trajectory identification.

fwdmvDistancePlot(fondi.1, id = c(39, 52, 96))

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.2 <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvDistancePlot(fondi.2, group = 2)

fwdmvEccentricityPlot Eccentricity Plot

Description

Plots (for each group in the fwdmv object) the fraction of variance explained by the eigenvalues specified in which, and the eccentricity of the ellipse defined by the same two eigenvalues.
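The two quantities can be sketched in base R for a single covariance matrix as follows. This is only an illustration: the variable names are hypothetical, and the eccentricity formula used here (the standard geometric one, with semi-axes proportional to the square roots of the eigenvalues) is an assumption, since the manual does not spell out the definition used by the package.

```r
X  <- as.matrix(iris[iris$Species == "setosa", 1:4])   # built-in data
ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values

sel <- c(1, 2)                      # plays the role of the `which` argument

# Fraction of the total variance carried by the two selected eigenvalues.
frac <- sum(ev[sel]) / sum(ev)

# Eccentricity sqrt(1 - b^2 / a^2) of an ellipse with semi-axes a >= b
# proportional to sqrt(eigenvalue).
# ASSUMPTION: the package may use a different definition.
lam <- sort(ev[sel], decreasing = TRUE)
ecc <- sqrt(1 - lam[2] / lam[1])
```

An eccentricity near 0 corresponds to a nearly circular ellipse, while values near 1 indicate a strongly elongated one. In a forward plot these values would be recomputed at every subset size.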

Usage

fwdmvEccentricityPlot(x, which = c(1, 2), psfrag.labels = FALSE)

Arguments

x an fwdmv object.

which an integer vector of length 2. Specifies the eigenvalues to use in the plot.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvEccentricityPlot(fondi.1)

fwdmvEigenvectorPlot Plot the Components of an Eigenvector in an fwdmv Object

Description

Plots (for each group in the fwdmv object) the components of the specified eigenvector of the covariance matrix against the subset size.

Usage

fwdmvEigenvectorPlot(x, which.vector = 1, correlation = FALSE, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

which.vector an integer value used to select the eigenvector whose components are to be plotted.

correlation a logical value. If TRUE then the eigenvectors of the correlation matrix are plotted.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvEigenvectorPlot(fondi.1)

fwdmvEllipsePlot Ellipses Plot of an fwdmv Object

Description

A pairs-like plot. The subsets within each group are represented by concentric ellipses drawn to contours of the bivariate normal distribution. Units in the tentative groups that are not in the subsets are plotted as points and the indices of the unassigned units are plotted as text.
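For readers who want to see how a contour of a bivariate normal distribution can be traced from a centre and a covariance matrix, here is a minimal base R sketch. The function ellipse_points is hypothetical and is not part of Rfwdmv; the choice of the 95% level is likewise only an example.

```r
# Points on the contour of a bivariate normal distribution enclosing
# probability `level`: every point lies at squared Mahalanobis distance
# qchisq(level, 2) from the centre.
ellipse_points <- function(center, S, level = 0.95, n = 100) {
  theta  <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))   # 2 x n points on the unit circle
  R <- chol(S)                              # S = t(R) %*% R
  r <- sqrt(qchisq(level, df = 2))
  t(center + r * t(R) %*% circle)           # n x 2 matrix of contour points
}

X   <- as.matrix(iris[iris$Species == "setosa", 1:2])   # built-in data
pts <- ellipse_points(colMeans(X), cov(X), level = 0.95)

# Check: all contour points share the same squared Mahalanobis distance.
d2 <- mahalanobis(pts, colMeans(X), cov(X))
```

Calling ellipse_points with several values of level and the same centre and covariance matrix gives concentric ellipses, which could then be drawn with lines().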

Usage

fwdmvEllipsePlot(x, subset.size, plot.diagonal = TRUE)

Arguments

x an fwdmv object.

subset.size an integer specifying (by its size) which subset to use to draw the ellipses.

plot.diagonal a logical value. If TRUE then univariate boxplots are drawn along the main diagonal.

Value

an empty list is invisibly returned.

Note

For initial fwdmv objects the unassigned units are plotted as points.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvEllipsePlot(fondi.fwdmv, subset.size = 60)

fwdmvEntryPlot Plot the Entry Order in an fwdmv Object

Description

Produces an entry order plot of an fwdmv object.

Usage

fwdmvEntryPlot(x, entry.order = "first", subset.size = -1, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

entry.order a character vector of length 1 specifying how the rows should be ordered. The possibilities are "first", "final", "natural" and "integer". If entry.order == "integer" then subset.size must be supplied as well.

subset.size an integer value giving the subset size to be used when entry.order == "integer".

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Details

An entry order plot is a matrix where the (i,j) cell is black if the unit represented by row i is in the subset during step j and white otherwise. The rows can be ordered in four ways: (1) First entry order: the rows are ordered (from bottom to top) by the first time the unit enters the subset; (2) Final entry order: the rows are ordered (from bottom to top) by the last time the unit enters the subset; (3) Natural entry order: the rows appear (from bottom to top) in the same order as in the data; and (4) Integer entry order: the rows are ordered (from bottom to top) in the same order as the distances in the subset of subset.size units.
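The membership matrix and the "first entry" ordering described above can be sketched in base R with a small made-up search. The list subsets and all variable names are hypothetical; in a real fwdmv object the subsets would be obtained through an accessor such as bigunit.fwdmv rather than written by hand.

```r
# Hypothetical subsets at 4 consecutive steps of a search on 5 units.
# Unit 5 enters at step 2 and leaves again at step 3.
subsets <- list(c(2, 4), c(2, 4, 5), c(1, 2, 4), c(1, 2, 3, 4))
n <- 5

# entry[i, j] is TRUE when unit i belongs to the subset at step j.
entry <- sapply(subsets, function(s) seq_len(n) %in% s)

# First step at which each unit is in the subset (NA if it never enters).
first_step <- apply(entry, 1, function(r) if (any(r)) which(r)[1] else NA)

# "First entry order": rows sorted by first entry, ties kept in data order.
first_order <- order(first_step)
entry[first_order, ]
```

The other orderings only change the sort key: the last entry time for "final", the row index for "natural", and the distances at a chosen subset size for "integer".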

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvEntryPlot(fondi.1)

fwdmvGapPlot Plot the Gap in an fwdmv Object

Description

Plots the minimum Mahalanobis distance among points not in the subset minus the maximum Mahalanobis distance among points in the subset, and the (m+1)th ordered Mahalanobis distance minus the mth ordered distance.
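For one step of a search, the two gaps can be computed in base R as sketched below. The subset used here (the first 40 rows of a built-in data set) is purely hypothetical and is not the result of a forward search; the variable names mirror the Max, Mth, Min and Mpo elements of the fwdmv object only for readability.

```r
X      <- as.matrix(iris[, 1:4])    # built-in data, 150 units
subset <- 1:40                      # hypothetical current subset
m      <- length(subset)

# Mahalanobis distances of all units from the subset's centre and covariance
# (mahalanobis() returns squared distances, hence the sqrt).
d <- sqrt(mahalanobis(X, colMeans(X[subset, ]), cov(X[subset, ])))

Max <- max(d[subset])               # largest distance inside the subset
Min <- min(d[-subset])              # smallest distance outside the subset
ds  <- sort(d)
Mth <- ds[m]                        # mth ordered distance overall
Mpo <- ds[m + 1]                    # (m+1)th ordered distance overall

gap1 <- Min - Max                   # can be negative
gap2 <- Mpo - Mth                   # never negative
```

The first gap is negative whenever some unit outside the subset is closer than the most remote unit inside it; the second gap is always at least as large as the first.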

Usage

fwdmvGapPlot(x, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvGapPlot(fondi.1)

fwdmvMinmaxPlot Plot Minimum and Maximum Distances in an fwdmv Object

Description

A two panel plot. The first panel shows the maximum Mahalanobis distance among units in the subset and the mth ordered Mahalanobis distance. The second panel shows the (m+1)th ordered Mahalanobis distance and the minimum Mahalanobis distance among units in the complement of the subset.

Usage

fwdmvMinmaxPlot(x, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is invisibly returned.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvMinmaxPlot(fondi.1)

fwdmvPairsPlot A Pairs-like Plot of an fwdmv Object

Description

For initial fwdmv objects this function produces a plot almost identical to the function pairs. If tentative groups have been assigned (either through the use of the partition function or directly in the call to fwdmv) then the units in each tentative group are plotted with different symbols. Additionally, the indices of the unassigned units are plotted.

Usage

fwdmvPairsPlot(x)

Arguments

x an fwdmv object.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)

g1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 53, 55, 56)

g2 <- c(57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103)

fondi.fwdmv <- fwdmv(fondi.dat, groups = list(g1, g2))

fwdmvPairsPlot(fondi.fwdmv)

fwdmvPartitionPlot Plot a Partitioned fwdmv Object

Description

Produces a plot of the Mahalanobis distances similar to that produced by fwdmvDistancePlot. The trajectories for units assigned to tentative groups are not drawn. Instead, the median distance is drawn for each group in the fwdmv object.
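The idea of replacing a bundle of trajectories by one median trajectory per group can be sketched in base R as follows. The distance matrix D and the list groups are made up for illustration; they are not taken from an fwdmv object.

```r
# Toy distance matrix: 6 units (rows) by 3 steps of a search (columns).
D <- rbind(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5),
           c(9, 8, 7), c(7, 6, 5), c(5, 5, 5))
groups <- list(g1 = 1:3, g2 = 4:5)    # unit 6 is left unassigned

# One median trajectory per tentative group: the median over the group's
# units, taken separately at every step.
med <- sapply(groups, function(g) apply(D[g, , drop = FALSE], 2, median))
med
```

Only the unassigned unit (row 6 here) would keep its individual trajectory in such a plot, which makes it easier to judge which group's median it follows.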

Usage

fwdmvPartitionPlot(x, pts = NULL, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

pts optionally include x$pts to draw the segment selected in partition.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is invisibly returned.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

partition, fwdmv, fwdmv.object

Examples

# data(fondi.dat)
# fondi.init <- fwdmv.init(fondi.dat)
# p1 <- partition(fondi.init)

# draw a line segment intersecting several trajectories #
# view the allocation #

# fwdmvPartitionPlot(p1)

fwdmvPrePlot Pairs Plot in Rfwdmv

Description

This function produces a pairs plot with superimposed contours.

Usage

fwdmvPrePlot(X, panel = panel.be, plot.diagonal = TRUE)

Arguments

X a matrix or data frame.

panel a function for computing the contours. The Rfwdmv package includes panel.be for bivariate ellipses, panel.me for median centered bivariate ellipses, and panel.bb for bivariate box plots.

plot.diagonal a logical value. If TRUE then univariate boxplots are drawn along the diagonal.

Value

an empty list is invisibly returned.

Note

The bivariate boxplots calculated from B-splines provide a useful tool for a preliminary examination of the data. A non-elliptical shape of the contours is an indication of non-normality.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

panel.be, panel.me, panel.bb

Examples

data(heads.dat)

fwdmvPrePlot(heads.dat, panel = panel.be)
fwdmvPrePlot(heads.dat, panel = panel.me)
fwdmvPrePlot(heads.dat, panel = panel.bb)

fwdmvPrincompPlot Plot the Principal Components from an fwdmv Object

Description

Plots (for each group in the fwdmv object) the principal components against the subset size.

Usage

fwdmvPrincompPlot(x, psfrag.labels = FALSE)

Arguments

x an fwdmv object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Value

an empty list is returned invisibly.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvPrincompPlot(fondi.1)

fwdmvQuantilePlot Plot Trajectories over Quantiles of the Distances

Description

Produces a 6 panel plot. The first panel contains the Mahalanobis distances for the units in the subset of size subset.size. The backgrounds of the 5 remaining plots are quantiles of these distances. The dark lines in the 5 remaining plots are the trajectories of the units entering the subset during each of the next 5 steps of the forward search.

Usage

fwdmvQuantilePlot(x, subset.size, probs = "default", page = 1)

Arguments

x an fwdmv object.

subset.size an integer specifying (by its size) the subset to be used in the first panel.

probs an ordered numeric vector of probabilities used by the quantile function.

page an integer specifying the page. When page == 1 the trajectories for the next 5 steps are drawn, when page == 2 the next 5 steps are skipped and trajectories for the 5 following steps are drawn, etc.

Details

When probs == "default" the following quantiles are computed: 0.025, 0.05, 0.125, 0.25, 0.50, 0.75, 0.875, 0.95, 0.975.
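The default probabilities can be passed straight to the base R quantile function. The sketch below computes the nine quantile curves from a simulated matrix of distances; the matrix D is made up, and the way the package arranges the bands in the plot is not shown here.

```r
probs <- c(0.025, 0.05, 0.125, 0.25, 0.50, 0.75, 0.875, 0.95, 0.975)

# Simulated distances: rows are the units of a subset, columns are steps.
set.seed(1)
D <- matrix(sqrt(rchisq(30 * 8, df = 4)), nrow = 30)

# One column of quantiles per step; the rows correspond to `probs`.
bands <- apply(D, 2, quantile, probs = probs)
dim(bands)    # 9 quantiles by 8 steps
```

Each row of bands is one background curve, so a trajectory that climbs above the 0.975 row is remote relative to the units in the subset.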

Value

an empty list is invisibly returned.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
fwdmvQuantilePlot(fondi.1, subset.size = 45)

fwdtr Maximum likelihood estimates of transformation parameters

Description

This function computes maximum likelihood estimates of transformation parameters. It uses the multivariate version of the parametric family of power transformations introduced by Box and Cox (1964).

Usage

fwdtr(X, bsb = ellipse.subset, n.bsb, lambda = 1, one.lambda = FALSE, col.to.transform = "all", boundaries = c(-3, 3))

Arguments

X a matrix or data frame containing a multivariate data set.

bsb usually a function of two variables: a matrix X containing the multivariate data and the number of units in the initial subset n.bsb. Alternatively, the initial subset may be specified directly by providing an integer vector containing the indices of the units to include in the initial subset.

n.bsb the percentage of units forming the initial subset. For example, n.bsb = 40 implies that we start the search using as.integer(nrow(X) * 0.4) units. The default is n.bsb = 50.

lambda a scalar or a k x 1 vector containing a set of transformation parameters. The ordering of Mahalanobis distances at each step of the forward search uses variables transformed with lambda. If lambda is a scalar, all the variables in col.to.transform are transformed, for ordering Mahalanobis distances, using the common supplied value.

one.lambda a logical value. If TRUE a common value lambda is estimated for all variables specified in col.to.transform.

col.to.transform a k x 1 integer vector specifying the variables which must be transformed. If col.to.transform = "all" all variables (columns of matrix X) are considered for transformation.

boundaries the lower and upper bounds for the estimates of the values of the transformation parameters.

Details

The analysis of data can often be improved by using transformed variables rather than the original variables themselves. There are physical reasons why a transformation might be expected to be helpful in some examples. If the data arise from a counting process, they often have a Poisson distribution and the square root transformation will provide observations with an approximately constant variance, independent of the mean. Similarly, concentrations are nonnegative variables and so cannot strictly be subject to additive errors of constant variance. Unfortunately the estimated transformation and related test statistics may be sensitive to the presence of one, or several, outliers. With this function we use the forward search to see how estimates of the transformation parameters evolve as we move through the ordered data. If a correct value of lambda has been found the parameter estimates will be stable until near the end of the search, where any outliers start to enter.
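The remark about counts can be checked with a short simulation in base R. The sketch below defines the basic univariate Box-Cox power transformation (the helper box_cox is hypothetical and is not the package's implementation, which works with the multivariate version) and compares the variance of simulated Poisson counts before and after taking square roots.

```r
# Box-Cox power transformation for positive y; lambda = 0 gives the log.
# Illustrative helper only, not a function from Rfwdmv.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Variance stabilisation for counts: the variance of Poisson data grows
# with the mean, while the variance of the square roots stays near 1/4.
set.seed(42)
means    <- c(10, 50, 200)
raw_var  <- sapply(means, function(mu) var(rpois(5000, mu)))
sqrt_var <- sapply(means, function(mu) var(sqrt(rpois(5000, mu))))

rbind(mean = means, raw = raw_var, sqrt = sqrt_var)
```

The square root corresponds to lambda = 0.5 up to a linear rescaling, which is why values such as 0.5, 0 and 1/3 appear as natural candidates in the examples for this function.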

Value

a list with class fwdtr.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object, profile.fwdtr.object

Examples

## Forward search on untransformed data
data(mussels.dat)
l.mle <- fwdtr(mussels.dat)
fwdtrMlePlot(l.mle)

## Forward search on transformed data as specified in vector lambda
data(mussels.dat)
l.mle <- fwdtr(mussels.dat, lambda = c(1, 0.5, 1, 0, 1/3))
fwdtrMlePlot(l.mle)

## estimate a common value of lambda for all the variables and use 1/3
## to order Mahalanobis distances in each step of the search
data(mussels.dat)
l.mle <- fwdtr(mussels.dat, lambda = 1/3, one.lambda = TRUE)
fwdtrMlePlot(l.mle)

### Test variables 2 and 5
### The forward search is based on untransformed data for variables 1, 3 and 4
### sqrt for variable 2 and third root for variable 5
l.mle <- fwdtr(mussels.dat, lambda = c(0.5, 1/3), col.to.transform = c(2, 5))
fwdtrMlePlot(l.mle)

fwdtr.object fwdtr.object

Description

An object containing estimates of transformation parameters on multivariate data. The class attribute is set to fwdtr.

Format

A fwdtr.object is a list with the following elements:

call the matched call.

Unit a list containing the units forming the subset in each step of the search.

n an integer, the number of units in the data.

p an integer, the number of variables in the data.

m an integer, the number of units in the initial subset.

data a numeric matrix containing the data, the dimnames attribute is set to NULL.

data.H0 a numeric matrix containing the data transformed using lambda under the null hypothesis.

data.name the name of the data frame or matrix containing the multivariate data set.

data.names a list of character vectors containing the row and column names of the data.

Mle a matrix containing the estimates of transformation parameters in each step of the forward search. The number of columns of matrix Mle is equal to the length of vector col.to.transform or is equal to 1 if one.lambda = TRUE.

H0 a k x 4 matrix. The first column contains the values of lambda (transformation parameters) which have been used to transform the variables to order Mahalanobis distances during the forward search. The second column contains the integers associated with the columns for which we computed maximum likelihood estimates. The third and fourth columns contain, respectively, the constrained lower and upper bounds of the estimates of the values of the transformation parameters.

forced.onepar a logical value. If TRUE a common value of lambda is estimated for all variables specified in col.to.transform.

dof scalar integer value containing the degrees of freedom to be used in the likelihood ratio test.

Details

Objects of class fwdtr are created by the function fwdtr.

See Also

fwdmv.init fit an initial multivariate forward search.

fwdmv fit a multivariate forward search with user specified tentative groups.

Examples

data(mussels.dat)
l.mle <- fwdtr(mussels.dat)

#l.mle is a fwdtr object.

fwdtr.test Multivariate Fan Plot

Description

Confirmatory signed square root likelihood ratio tests of a suggested transformation for specified variables around a series of values of lambda.

Usage

fwdtr.test(X, parameters, n.bsb = 50, col.to.compare = "all", lambda.around = c(-1, -0.5, 0, 0.5, 1), one.lambda = FALSE)

Arguments

X a matrix of dimension n x p, or data frame containing a multivariate data set.

parameters a vector of length p = ncol(X) specifying a reasonable set of transformations for the columns of the multivariate data set.

n.bsb the percentage of units forming the initial subset. For example, n.bsb = 40 implies that we start the search using as.integer(nrow(X) * 0.4) units. The default is n.bsb = 50.

col.to.compare a k x 1 integer vector specifying the variables for which likelihood ratio tests have to be produced. For example, if col.to.compare = c(2, 4), the signed likelihood ratio tests are produced for the second and the fourth column of matrix X. If col.to.compare = "all" then all variables (columns of matrix X) are considered.

lambda.around a numeric vector specifying for which values of lambda to compute the likelihood ratio test. If this argument is omitted, the function produces for each variable specified in col.to.compare the likelihood ratio tests associated with the five most common values of lambda (-1, -0.5, 0, 0.5, 1).

one.lambda a logical value. If TRUE a common value lambda is tested for all variables specified in col.to.compare.

Details

This function produces confirmatory tests of a suggested transformation. We expand each transformation parameter in turn around the five common values of lambda (-1, -0.5, 0, 0.5, 1), using the values of the vector parameters for transforming the remaining variables of the data set. In this way we turn a multivariate problem into a series of univariate ones. In each search we can test the transformation by comparing the likelihood ratio test with a chi-square on 1 degree of freedom. We use the signed square root of the likelihood ratio test in order to learn whether lower or higher values of lambda are indicated. The plot is thus a version of the fan plot for multivariate data.
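To make the test statistic concrete, the following base R sketch computes a signed square root likelihood ratio statistic for a single positive variable, using the univariate Box-Cox profile log-likelihood under a normal model. All function names are hypothetical, the data are simulated, and the sign convention (positive when the estimate lies above the tested value) is an assumption; the package's multivariate computation is not reproduced here.

```r
# Illustrative helpers only; none of these are functions from Rfwdmv.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda for one positive variable under a
# normal model for the transformed data (constant terms dropped).
profile_loglik <- function(lambda, y) {
  z <- box_cox(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

# Signed square root of the likelihood ratio statistic for lambda = lambda0.
# ASSUMPTION: positive values mean a higher lambda is indicated.
signed_sqrt_lr <- function(y, lambda0, bounds = c(-3, 3)) {
  fit <- optimize(profile_loglik, interval = bounds, y = y, maximum = TRUE)
  lr  <- max(0, 2 * (fit$objective - profile_loglik(lambda0, y)))
  sign(fit$maximum - lambda0) * sqrt(lr)
}

set.seed(7)
y     <- exp(rnorm(200, mean = 1, sd = 0.8))   # log-normal: true lambda is 0
stats <- sapply(c(-1, -0.5, 0, 0.5, 1), function(l0) signed_sqrt_lr(y, l0))
crit  <- sqrt(qchisq(0.95, df = 1))            # about 1.96
```

Comparing the squared statistic with a chi-square on 1 degree of freedom is equivalent to comparing the signed statistic with plus or minus crit. In a fan plot, one such statistic per tested value of lambda would be recomputed at every step of the search.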

Value

a list with class fwdtr.test.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object, profile.fwdtr.object

Examples

data(mussels.dat)

## reasonable values of transformation parameters for the data set

lambda.R <- c(0.5, 0, 0.5, 0, 0)
l.rat <- fwdtr.test(mussels.dat, lambda.R)
plot.fwdtr.test(l.rat)

## Produce a fan plot for columns 2 and 4 of dataset mussels.dat

l.rat <- fwdtr.test(mussels.dat, parameters = lambda.R, col.to.compare = c(2, 4))
plot.fwdtr.test(l.rat)

## reasonable values of transformation parameters for the data set

lambda.R <- c(0.5, 0, 0.5, 0, 0)
lambda.around <- c(0, 1/3, 0.5)

## Produce a fan plot for column 2 of dataset mussels.dat
## The values of lambda which are tested are log, third root
## and square root

l.rat <- fwdtr.test(mussels.dat, parameters = lambda.R, col.to.compare = 2, lambda.around = lambda.around)
plot.fwdtr.test(l.rat)

fwdtr.test.object fwdtr.test.object

Description

An object containing the values of the signed likelihood ratio test estimates of transformation parameters for selected variables and selected values of lambda as specified in function fwdtr.test. The class attribute is set to fwdtr.test.

Format

A fwdtr.test.object is a list with the following elements:

call the matched call.

test a list containing a collection of matrices. Each matrix has a number of columns equal to the length of vector lambda.around. The first matrix contains the values of the signed sqrt likelihood ratio test for the first element in vector col.to.compare supplied in function lambda.test.fwdmv. In more detail, the first column contains the values of the signed sqrt likelihood ratio test associated with the first element of lambda.around. The second matrix contains the values of the signed sqrt likelihood ratio test for the second element in vector col.to.compare, etc.

n an integer, the number of units in the data.

m an integer, the number of units in the subset in the first step of the forward search.

col.names a character vector containing the names of the columns of the data set.

col.to.compare integer vector containing the numbers associated with the variables for which the fan plot has been produced.

lambda.around vector specifying the values of lambda for which the signed sqrt likelihood ratio has been computed.

message a message that warns the user about the number of times that optim failed to converge properly. If NULL there were no convergence problems. See optim for details.

Details

An object of class fwdtr.test is created by function fwdtr.test.
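The signed square root likelihood ratio stored in the test element can be illustrated in a much simpler setting. The following is a minimal sketch for a single variable with a Box-Cox transformation, not the package's multivariate computation; the helper names boxcox_loglik and signed_sqrt_lr, the search interval, and the sign convention (sign of the estimate minus the tested value) are assumptions made for illustration.

```r
# Normalized Box-Cox profile log-likelihood for one positive variable
# (mean and variance are profiled out); hypothetical helper.
boxcox_loglik <- function(lambda, y) {
  n <- length(y)
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  sigma2 <- sum((z - mean(z))^2) / n          # ML estimate of the variance
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Signed square root of the likelihood ratio statistic for H0: lambda = lambda0.
# The sign convention is an assumption of this sketch.
signed_sqrt_lr <- function(y, lambda0, interval = c(-2, 2)) {
  opt <- optimize(boxcox_loglik, interval = interval, y = y, maximum = TRUE)
  lr <- 2 * (opt$objective - boxcox_loglik(lambda0, y))
  sign(opt$maximum - lambda0) * sqrt(max(lr, 0))
}

set.seed(1)
y <- exp(rnorm(200, mean = 1, sd = 0.5))   # simulated log-normal data
lambda.around <- c(0, 1/3, 0.5, 1)
stats <- sapply(lambda.around, function(l0) signed_sqrt_lr(y, l0))
print(round(stats, 2))
```

Values far outside roughly plus or minus 2 suggest that the tested lambda is not supported by the data; in a fan plot the same statistic is tracked across the steps of the forward search.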

See Also

fwdtr computes maximum likelihood estimates of transformation parameters in each step of the forward search.

Examples

data(mussels.dat)

## reasonable values of transformation parameters for the data set
## found using procedure fwdtr

lambda.R <- c(0.5, 0, 0.5, 0, 0)
l.rat <- fwdtr.test(mussels.dat, lambda.R, col.to.compare = 1:5)

## l.rat is a fwdtr.test object

plot.fwdtr.test(l.rat)

fwdtrLrPlot Plot method for fwdtr objects

Description

Produces a plot of the likelihood ratio of transformation parameters during all steps of the forward search.

Usage

fwdtrLrPlot(x, psfrag.labels = FALSE)

Arguments

x a fwdtr object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Details

The horizontal lines drawn in the plot refer to the 95% and 99% quantiles of the associated chi-squared distribution.
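As a rough illustration, reference lines of this kind can be computed with qchisq. The degrees of freedom used below (the number of transformation parameters under test, 5 for a data set with five columns such as mussels.dat) are an assumption of this sketch and are not taken from the package source.

```r
# Assumed degrees of freedom: one per transformation parameter tested
p <- 5

# 95% and 99% quantiles of the chi-squared distribution with p degrees of freedom
thresholds <- qchisq(c(0.95, 0.99), df = p)
names(thresholds) <- c("95%", "99%")
print(round(thresholds, 2))

# On an existing plot the lines could be added with:
# abline(h = thresholds, lty = 2)
```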

Value

an empty list is invisibly returned.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object, profile.fwdtr.object

Examples

## Forward search on untransformed data

data(mussels.dat)
l.mle <- fwdtr(mussels.dat)

## plot the likelihood ratio test

fwdtrLrPlot(l.mle)

fwdtrMlePlot Plot method for fwdtr objects

Description

Produces a plot of maximum likelihood estimates of transformation parameters during all steps of the forward search.

Usage

fwdtrMlePlot(x, psfrag.labels = FALSE)

Arguments

x a fwdtr object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Details

Estimates that vary wildly are associated with variables that do not need to be transformed.

Value

an empty list is invisibly returned.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object

Examples

## Forward search on untransformed data

data(mussels.dat)
l.mle <- fwdtr(mussels.dat)

## Plot maximum likelihood estimates of the transformation parameters
## in each step of the forward search

fwdtrMlePlot(l.mle)

## Test variables 2 and 5
## The forward search is based on untransformed data for variables 1, 3 and 4
## sqrt for variable 2 and third root for variable 5

l.mle <- fwdtr(mussels.dat, lambda = c(0.5, 1/3), col.to.transform = c(2,5))

## plot trajectories of maximum likelihood estimates of transformation parameters
## for variables 2 and 5

fwdtrMlePlot(l.mle)

fwdtrProfilePlot Plot method for profile.fwdtr objects

Description

Produces a plot of profile loglikelihoods of transformation parameters in a particular step of the forward search.

Usage

fwdtrProfilePlot(x, psfrag.labels = FALSE)

Arguments

x a profile.fwdtr object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

Details

This plot shows which variables have sharply defined estimates of the transformation parameters and which variables have a value of lambda that is not well determined.

Value

an empty list is invisibly returned.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object

Examples

data(mussels.dat)

## Forward search on untransformed data
## Compute max lik. estimates of transformation parameters

l.mle <- fwdtr(mussels.dat)

## Compute profile loglikelihoods for transformation parameters in the
## last step of the search and create an object of class profile.fwdtr

l.profile.mle <- profile.fwdtr(l.mle)

## plot the profile loglikelihoods of transformation parameters for
## each variable

fwdtrProfilePlot(l.profile.mle)

## Test variables 2 and 5
## The forward search is based on untransformed data for variables 1, 3 and 4
## sqrt for variable 2 and third root for variable 5

l.mle <- fwdtr(mussels.dat, lambda = c(0.5, 1/3), col.to.transform = c(2, 5))

## build profile likelihood for transformed variables.
## Profile function takes into account that we have also untransformed variables.

l.profile.mle <- profile.fwdtr(l.mle)
fwdtrProfilePlot(l.profile.mle)

heads.dat Swiss heads data

Description

These data contain six dimensions in millimetres of the heads of 200 Swiss soldiers.

Usage

data(heads.dat)

Format

A data frame with 200 observations on the following 6 variables.

y1 a numeric vector, the minimal frontal breadth.

y2 a numeric vector, the breadth of angulus mandibulae.

y3 a numeric vector, the true facial height.

y4 a numeric vector, the length from glabella to apex nasi.

y5 a numeric vector, the length from tragion to nasion.

y6 a numeric vector, the length from tragion to gnathion.

Details

The data were collected to determine the variability in size and shape of heads of young men in order to help in the design of a new protection mask for the Swiss army.

Source

The data are described by Flury and Riedwyl (1998), p. 218 and also by Flury (1997), p. 6.

References

Atkinson, Riani and Cerioli (2004), p. 592-593; http://www.riani.it/arc.

Examples

data(heads.dat)

iris.dat Iris data

Description

The data contain measurements on three species of iris. The species are: 1) Iris setosa, 2) Iris versicolor, and 3) Iris virginica. Four measurements of characteristic dimensions of the flowers were made on fifty flowers from each species.

Usage

data(iris.dat)

Format

A data frame with 150 observations on the following 4 variables.

y1 a numeric vector, the sepal length.

y2 a numeric vector, the sepal width.

y3 a numeric vector, the petal length.

y4 a numeric vector, the petal width.

Source

The data were published by Anderson (1935) from measurements taken on plants in the Gaspé Peninsula, Quebec. The three species are blue-flowered water-loving irises, or flags, similar to the European yellow flag. The data were analysed by Fisher (1936) as an example of discriminant analysis and are often known as "Fisher’s Iris data".

References

Atkinson, Riani and Cerioli (2004), p. 576-578; http://www.riani.it/arc.

Examples

data(iris.dat)

mcd.subset Initial Subset by MCD Distances

Description

This function uses the robust distances obtained from cov.mcd to compute the initial subset for the multivariate forward search. It is not intended that users should call this function directly.

Usage

mcd.subset(X, size)

Arguments

X a numeric matrix containing the multivariate data set.

size an integer specifying the size of the initial subset.

Details

The function cov.mcd in package lqs is used to robustly estimate the center and covariance matrix of X. Robust Mahalanobis distances are computed and the initial subset is taken to be the size units with the smallest robust distances.
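The procedure described above can be sketched in a few lines. This is an illustration only, not the package's implementation: the helper name mcd_subset_sketch and the simulated data are assumptions, and cov.mcd is loaded from MASS, where it is available in current R releases.

```r
library(MASS)  # provides cov.mcd in current R releases

# Hypothetical helper: indices of the `size` units with the smallest
# robust Mahalanobis distances based on the MCD estimates.
mcd_subset_sketch <- function(X, size) {
  X <- as.matrix(X)
  fit <- cov.mcd(X)                                        # robust center and covariance
  d2 <- mahalanobis(X, center = fit$center, cov = fit$cov) # squared robust distances
  sort(order(d2)[seq_len(size)])
}

# Simulated data: 95 clean units plus 5 shifted units acting as outliers
set.seed(42)  # cov.mcd relies on random subsampling
clean <- matrix(rnorm(95 * 2), ncol = 2)
outliers <- matrix(rnorm(5 * 2, mean = 10), ncol = 2)
X <- rbind(clean, outliers)

initial <- mcd_subset_sketch(X, size = 10)
print(initial)
```

Because the distances are based on robust estimates, the shifted units should not enter the initial subset, which is the property the forward search relies on at its starting point.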

Value

an integer vector with length size containing the initial subset.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

Examples

data(fondi.dat)

fondi.fwdmv <- fwdmv(fondi.dat, bsb = mcd.subset)

milk.dat Milk data

Description

These data contain measurements on the composition of 85 samples of milk.

Usage

data(milk.dat)

Format

A data frame with 85 observations on the following 8 variables.

y1 a numeric vector, the density.

y2 a numeric vector, the fat content in grams/litre.

y3 a numeric vector, the protein content in grams/litre.

y4 a numeric vector, the casein content in grams/litre.

y5 a numeric vector, the cheese dry substance measured in the factory in grams/litre.

y6 a numeric vector, the cheese dry substance measured in the laboratory in grams/litre.

y7 a numeric vector, the milk dry substance in grams/litre.

y8 a numeric vector, the cheese produced in grams/litre.

Details

The scatter plot matrix shows in several panels a strong rising diagonal structure.

Source

Daudin, Duby and Trecourt (1988).

References

Atkinson, Riani and Cerioli (2004), p. 571-572; http://www.riani.it/arc.

Examples

data(milk.dat)
pairs(milk.dat)

ms.dat Muscular dystrophy data

Description

These data refer to Duchenne muscular dystrophy (DMD), a genetically transmitted disease passed from a mother to her children. Affected female offspring may unknowingly carry the disease, but male offspring with the disease die at a young age. Although carriers of DMD usually have no physical symptoms, they tend to exhibit elevated levels of serum markers. In addition the levels of these enzymes may also depend on age and season. Levels of the enzymes were measured in non-carriers and in a group of carriers using standard laboratory procedures.

Usage

data(ms.dat)

Format

A data frame with 194 observations on the following 6 variables.

y1 a numeric vector, the age.

y2 a numeric vector, the month of the year.

y3 a numeric vector, the level of creatine kinase.

y4 a numeric vector, the level of hemopexin.

y5 a numeric vector, the level of lactate dehydrogenase.

y6 a numeric vector, the level of pyruvate kinase.

Details

The first two serum markers, y3 and y4, may be measured rather inexpensively from frozen serum. The second two, y5 and y6, require fresh serum. An important scientific problem is whether use of the expensive second pair of readings causes an appreciable increase in the detection rate.

Source

Andrews and Herzberg (1985), pp. 223-228.

References

Atkinson, Riani and Cerioli (2004), p. 581-585; http://www.riani.it/arc.

Examples

data(ms.dat)

mssmall.dat Muscular dystrophy data (small)

Description

These data refer to Duchenne muscular dystrophy (DMD), a genetically transmitted disease passed from a mother to her children. Affected female offspring may unknowingly carry the disease, but male offspring with the disease die at a young age. Although carriers of DMD usually have no physical symptoms, they tend to exhibit elevated levels of serum markers. In addition the levels of these enzymes may also depend on age and season. Levels of the enzymes were measured in non-carriers and in a group of carriers using standard laboratory procedures.

Usage

data(mssmall.dat)

Format

A data frame with 73 observations on the following 6 variables.

y1 a numeric vector, the age.

y2 a numeric vector, the month of the year.

y3 a numeric vector, the level of creatine kinase.

y4 a numeric vector, the level of hemopexin.

y5 a numeric vector, the level of lactate dehydrogenase.

y6 a numeric vector, the level of pyruvate kinase.

Details

The first two serum markers, y3 and y4, may be measured rather inexpensively from frozen serum. The second two, y5 and y6, require fresh serum. An important scientific problem is whether use of the expensive second pair of readings causes an appreciable increase in the detection rate.

Source

Rencher (1995), p. 170.

References

Atkinson, Riani and Cerioli (2004), p. 581-585; http://www.riani.it/arc

Examples

data(mssmall.dat)

mussels.dat Horse mussels

Description

These data contain 82 observations on Horse mussels from New Zealand.

Usage

data(mussels.dat)

Format

A data frame with 82 observations on the following 5 variables.

width a numeric vector, the shell length in mm.

height a numeric vector, the shell width in mm.

length a numeric vector, the shell height in mm.

shell a numeric vector, the shell mass in grams.

mass a numeric vector, the muscle mass in grams.

Details

In this example the effect of outliers is masked unless the search uses a suitable transformation.

Source

The data were introduced by Cook and Weisberg (1994), p. 161, who treat them as regression with muscle mass, the edible portion of the mussel, as response.

References

Atkinson, Riani and Cerioli (2004), p. 568-569; http://www.riani.it/arc.

Examples

data(mussels.dat)

panel.bb Bivariate Box Plots Panel Function

Description

This function is intended to be used in the panel argument of fwdmvPrePlot. It computes a bivariate boxplot for the given panel.

Usage

panel.bb(x, y, scale = 1)

Arguments

x a numeric vector.

y a numeric vector with the same length as x.

scale a positive numeric value to scale the boxplot contour.

Value

a list with elements x and y giving the points of the computed contour.

Author(s)

Kjell Konis

See Also

fwdmvPrePlot

Examples

data(heads.dat)

fwdmvPrePlot(heads.dat, panel = panel.bb)

panel.be Bivariate Ellipse Panel Function

Description

This function is intended to be used in the panel argument of fwdmvPrePlot. It computes a bivariate ellipse for the given panel.

Usage

panel.be(x, y, scale = 1)

Arguments

x a numeric vector.

y a numeric vector with the same length as x.

scale a positive numeric value to scale the ellipse.

Value

a list with elements x and y giving the points of the ellipse.

Author(s)

Kjell Konis

See Also

fwdmvPrePlot

Examples

data(heads.dat)

fwdmvPrePlot(heads.dat, panel = panel.be)
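To make the return value concrete, the sketch below computes the contour of a bivariate ellipse from the sample mean and covariance and returns it as a list with elements x and y. It is a hypothetical stand-in written for illustration (the name ellipse_points, the argument n, and the interpretation of scale as a Mahalanobis radius are assumptions), not the code behind panel.be.

```r
# Hypothetical helper: points on the ellipse
# (z - center)' S^{-1} (z - center) = scale^2
ellipse_points <- function(x, y, scale = 1, n = 100) {
  center <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- rbind(cos(theta), sin(theta))         # unit circle, 2 x n
  pts <- center + scale * t(chol(S)) %*% circle   # map the circle onto the ellipse
  list(x = pts[1, ], y = pts[2, ])
}

# Example with a data set shipped with R
e <- ellipse_points(faithful$eruptions, faithful$waiting, scale = 2)

# plot(faithful); lines(e$x, e$y)   # overlay the contour on a scatter plot
```

Every point on the returned contour lies at the same Mahalanobis distance (here 2) from the sample mean, which is what makes the ellipse a natural reference shape for a scatter plot panel.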

panel.me Bivariate Ellipse Panel Function

Description

This function is intended to be used in the panel argument of fwdmvPrePlot. It computes a bivariate ellipse centered at the median for the given panel.

Usage

panel.me(x, y, scale = 1)

Arguments

x a numeric vector.

y a numeric vector with the same length as x.

scale a positive numeric value to scale the ellipse.

Value

a list with elements x and y giving the points of the ellipse.

Author(s)

Kjell Konis

See Also

fwdmvPrePlot

Examples

data(heads.dat)

fwdmvPrePlot(heads.dat, panel = panel.me)

partition Interactive Group Assignment

Description

This function produces a plot of the Mahalanobis distances versus subset size similar to that produced by fwdmvDistancePlot. The function then waits for the user to draw a line segment on the plot (which is done with two consecutive mouse clicks). The units whose trajectories cross this line segment are assigned to the group specified by the group argument.

Usage

partition(x, group = "next")

Arguments

x an fwdmv object.

group an integer value specifying which group the units corresponding to the selected trajectories should be assigned to. Possible values are 1 to n.groups + 1 where n.groups is the number of groups in the fwdmv object. The default "next" is to assign units to a new group.

Value

an fwdmv object with the groups element and the unassigned element updated according to the description.

Author(s)

Kjell Konis

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdmv.object, fwdmv, fwdmvPartitionPlot

Examples

# data(fondi.dat)
# fondi.init <- fwdmv.init(fondi.dat)
# p1 <- partition(fondi.init)

# draw a line segment intersecting several trajectories
# and view the allocation

plot.fwdmv Plot method for fwdmv objects.

Description

This function is the generic plot method for fwdmv objects. It allows the user to select from a subset of the plotting functions in the Rfwdmv package.

Usage

plot.fwdmv(x, ...)

Arguments

x an fwdmv object.

... these arguments will be passed on to the selected plot function.

Details

Plot functions that require additional arguments must be called directly and are thus not available from this function: see fwdmv.object for a list of all available plot methods.

Value

the value returned by the selected plot.

Author(s)

Kjell Konis

See Also

fwdmv, fwdmvChangePlot, fwdmvCovariancePlot, fwdmvDeterminantPlot, fwdmvDistancePlot, fwdmvPrincompPlot, fwdmvEntryPlot, fwdmvGapPlot, fwdmvMinmaxPlot, fwdmvPairsPlot

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
plot(fondi.1)

plot.fwdtr.test Plot method for fwdtr.test

Description

This function is the generic plot method for fwdtr.test objects. It produces a series of fan plots for aset of variables. It allows the user to select for which variables to view the fan plot.

Usage

plot.fwdtr.test(x, psfrag.labels = FALSE, ...)

Arguments

x a fwdtr.test object.

psfrag.labels a logical value. If TRUE then the x, y, and main labels are set to "xlab", "ylab", and "main" for replacement via the psfrag utility.

... these arguments will be passed on to the selected plot.

Details

It is also possible to have the confirmatory fan plots for each variable as panels in a single page.

Value

the value returned by the selected plot.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

fwdtr.object, profile.fwdtr.object

print.fwdmv Print method for fwdmv objects.

Description

Displays a brief summary of the forward search contained in an fwdmv object.

Usage

print.fwdmv(x, ...)

Arguments

x an fwdmv object.

... additional (and unused) arguments.

Value

x is invisibly returned.

Author(s)

Kjell Konis

See Also

fwdmv

Examples

data(fondi.dat)
fondi.1 <- fwdmv(fondi.dat)
print(fondi.1) ## Equivalent to fondi.1

profile.fwdtr Profile log-likelihood estimates of transformation parameters

Description

This function computes the profile loglikelihoods of transformation parameters.

Usage

profile.fwdtr(fitted, step.fwd = NULL, conf = 0.95, bounds = NULL, ...)

Arguments

fitted an object of class fwdtr. This object can be created by function fwdtr.

step.fwd an integer value. The step of the forward search for which profile loglikelihoods must be created. If step.fwd = NULL the profile loglikelihood is computed for the last step of the forward search.

conf scalar between 0 and 1 which defines the marginal confidence interval of lambda for each variable. The default is a 95% confidence interval.

bounds a 2 x 1 numeric vector containing the lower and upper limit of the x axis for each profile loglikelihood. The default is the value of boundaries in fwdtr.object.

... Further parameters for profile method

Details

In order to compute the profile log-likelihoods the values of the parameters which are not being varied are kept at their maximum likelihood estimates. The loglikelihoods are roughly parabolic close to zero although not necessarily log concave further away from the maximum. The confidence interval for each value of lambda is based on the asymptotic chi-square distribution of twice the loglikelihood ratio.
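The chi-square cutoff can be illustrated with a single variable, where the only parameter being varied is lambda and the mean and variance are profiled out. This is a simplified sketch under those assumptions; the helper boxcox_loglik, the grid, and the simulated data are hypothetical and do not reproduce the package's multivariate computation.

```r
# Normalized Box-Cox profile log-likelihood for one positive variable
# (hypothetical helper for illustration)
boxcox_loglik <- function(lambda, y) {
  n <- length(y)
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  sigma2 <- sum((z - mean(z))^2) / n
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

set.seed(2)
y <- exp(rnorm(150, mean = 2, sd = 0.4))      # simulated positive data

lambda_grid <- seq(-2, 2, by = 0.01)          # x-coordinates of the profile
prof <- sapply(lambda_grid, boxcox_loglik, y = y)

conf <- 0.95
# lambda is inside the interval when twice the drop in log-likelihood from
# the maximum does not exceed the chi-square quantile with 1 degree of freedom
cutoff <- max(prof) - qchisq(conf, df = 1) / 2
ci <- range(lambda_grid[prof >= cutoff])
lambda_hat <- lambda_grid[which.max(prof)]

print(c(lower = ci[1], estimate = lambda_hat, upper = ci[2]))
```

Reading the interval off a grid keeps the sketch short; the resolution of the limits is therefore the grid step.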

Value

an object of class profile.fwdtr.

Author(s)

Fabrizio Laurini

References

Atkinson, A. C., Riani, M. and Cerioli, A. (2004) Exploring Multivariate Data with the Forward Search. Springer-Verlag New York.

See Also

profile.fwdtr.object, fwdtrProfilePlot

Examples

data(mussels.dat)

## Forward search on untransformed data
## Compute max lik. estimates of transformation parameters

l.mle <- fwdtr(mussels.dat)

## Compute profile loglikelihoods for transformation parameters in the
## last step of the search and create an object of class profile.fwdtr

l.profile.mle <- profile.fwdtr(l.mle)

## plot the profile loglikelihoods of transformation parameters for
## each variable

fwdtrProfilePlot(l.profile.mle)

## Forward search on transformed data as specified in vector lambda

data(mussels.dat)
l.mle <- fwdtr(mussels.dat, lambda = c(1, 0.5, 1, 0, 1/3))

## Compute profile loglikelihoods for transformation parameters in the
## last step of the search and create an object of class profile.fwdtr

l.profile.mle <- profile.fwdtr(l.mle)

## plot the profile loglikelihoods of transformation parameters for
## each variable

fwdtrProfilePlot(l.profile.mle)

## estimate a common value of lambda for all the variables and use 1/3
## to order Mahalanobis distances in each step of the search

data(mussels.dat)
l.mle <- fwdtr(mussels.dat, lambda = 1/3, one.lambda = TRUE)
l.profile.mle <- profile.fwdtr(l.mle)
fwdtrProfilePlot(l.profile.mle)

## Test variables 2 and 5
## The forward search is based on untransformed data for variables 1, 3 and 4
## sqrt for variable 2 and third root for variable 5

l.mle <- fwdtr(mussels.dat, lambda = c(0.5, 1/3), col.to.transform = c(2, 5))
l.profile.mle <- profile.fwdtr(l.mle)
fwdtrProfilePlot(l.profile.mle)

profile.fwdtr.object profile.fwdtr.object

Description

An object containing the values of the profile log-likelihood for each transformation parameter specified in function fwdtr. The class attribute is set to profile.fwdtr.

Format

A profile.fwdtr.object is a list with the following elements:

call the matched call.

lambda a sequence of values from bounds[1] to bounds[2] with step 0.1. This vector will contain the x-coordinates of the profile log-likelihoods.

profile a list containing the values of the profile log-likelihoods for each variable. First element of the list is associated with first variable and so on.

ci a list containing the lower and upper values of the confidence intervals for each variable. First element of the list is associated with first variable and so on.

x.names a character vector containing the names of the columns which are investigated for transformation.

step.fwd a scalar containing the step of the forward search for which the profile log-likelihoods are computed.

onepar logical. If TRUE a common value lambda has been used for estimation.

p an integer, the number of variables in the data.

Mle a vector containing the maximum likelihood estimates of the transformation parameters at step step.fwd of the forward search.

Details

An object of class profile.fwdtr is created by function profile.fwdtr.

See Also

fwdtr computes maximum likelihood estimates of transformation parameters in each step of the forward search.

fwdtrProfilePlot plots profile loglikelihoods of transformation parameters in a selected step of the forward search.

Examples

data(mussels.dat)

## Forward search on untransformed data
## Compute max lik. estimates of transformation parameters

l.mle <- fwdtr(mussels.dat)

## Compute profile loglikelihoods for transformation parameters in the last step of the search
## and create an object of class profile.fwdtr

l.profile.mle <- profile.fwdtr(l.mle)

## plot the profile loglikelihoods of transformation parameters for each variable

fwdtrProfilePlot(l.profile.mle)

quality.dat Quality of life data

Description

Indices of the quality of life in the provinces of Italy.

Usage

data(quality.dat)

Format

A data frame with 103 observations on the following 6 variables.

y1 a numeric vector, the average amount of bank deposits per inhabitant.

y2 a numeric vector, the number of robberies per 100,000 inhabitants.

y3 a numeric vector, the number of housebreakings per 100,000 inhabitants.

y4 a numeric vector, the number of suicides, committed or attempted, per 100,000 inhabitants.

y5 a numeric vector, the number of gyms per 100,000 inhabitants.

y6 a numeric vector, the average expenditure on theatre and concerts per inhabitant.

Details

The data are not provided on the original variables listed above, but rather on a scaled version of them. Scaling is performed by dividing each response value by the maximum reading for that response.
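The scaling described here amounts to dividing each column by its maximum. A minimal sketch with made-up values (the data frame raw below is invented for illustration and is not part of quality.dat):

```r
# Made-up readings for two responses
raw <- data.frame(y1 = c(10, 20, 40), y2 = c(3, 6, 1.5))

# Divide every column by its own maximum, so each response peaks at 1
scaled <- as.data.frame(lapply(raw, function(v) v / max(v)))
print(scaled)
```

After this scaling all responses share the range (0, 1], which removes differences in measurement units between the indices.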

Source

These data were published in the Italian financial newspaper Il Sole 24 Ore (2001).

References

Atkinson, Riani and Cerioli (2004), p. 573-575; http://www.riani.it/arc.

Examples

data(quality.dat)

record.dat Track records for women

Description

This data set contains women’s athletic records for 55 countries.

Usage

data(record.dat)

Format

A data frame with 55 observations on the following 7 variables.

y1 a numeric vector, 100 metre record in seconds.

y2 a numeric vector, 200 metre record in seconds.

y3 a numeric vector, 400 metre record in seconds.

y4 a numeric vector, 800 metre record in minutes.

y5 a numeric vector, 1500 metre record in minutes.

y6 a numeric vector, 3000 metre record in minutes.

y7 a numeric vector, marathon record in minutes.

Details

These data are taken from a handbook prepared for the 1984 Olympic games in Los Angeles. They therefore come from an interesting period in the history of women’s athletics, when there were, now authenticated, allegations about the treatment of female athletes, especially in communist countries, with male sex hormones.

Source

Johnson and Wichern (1997), pp. 44-45.

References

Atkinson, Riani and Cerioli (2004), p. 558-559; http://www.riani.it/arc.

Examples

data(record.dat)

sixtyeighty.dat 60:80 data

Description

An example with two clusters. The first 80 units form a rather diffuse group, while the remaining 60 units form a tight cluster.

Usage

data(sixtyeighty.dat)

Format

A data frame with 140 observations on the following 2 variables.

y1 a numeric vector.

y2 a numeric vector.

Details

When we fit a single model to these data, neither standard classical methods nor very robust methods yield Mahalanobis distances which unambiguously show that there are two different groups of observations.

Source

Atkinson, Riani and Cerioli (2004), p. 586-587; http://www.riani.it/arc.

Examples

data(sixtyeighty.dat)

threetwo.dat Three clusters two outliers

Description

These data again contain the two clusters of the 60:80 data, but now with the addition of a third cluster, units 141-158, and two outliers, units 159 and 160. The sizes of the groups are therefore 80, 60, 18 and 2.

Usage

data(threetwo.dat)

Format

A data frame with 160 observations on the following 2 variables.

y1 a numeric vector.

y2 a numeric vector.

Details

The scatter plot shows that the second compact cluster of 18 observations is near the longer axis of the dispersed cluster of 80. The two outliers are together, approximately across the centroid of the dispersed group from the cluster of 60.

Source

Atkinson, Riani and Cerioli (2004), p. 588-589; http://www.riani.it/arc.

Examples

data(threetwo.dat)

Index

∗Topic datasets
baby.dat, bank.dat, bridge.dat, diabetes.dat, dyestuff.dat, electrodes.dat, emilia.dat, fondi.dat, fwdmv.object, fwdtr.object, fwdtr.test.object, heads.dat, iris.dat, milk.dat, ms.dat, mssmall.dat, mussels.dat, profile.fwdtr.object, quality.dat, record.dat, sixtyeighty.dat, threetwo.dat

∗Topic hplot
fwdmvChangePlot, fwdmvConfirmPlot, fwdmvCovariancePlot, fwdmvDeterminantPlot, fwdmvDistancePlot, fwdmvEccentricityPlot, fwdmvEigenvectorPlot, fwdmvEllipsePlot, fwdmvEntryPlot, fwdmvGapPlot, fwdmvMinmaxPlot, fwdmvPairsPlot, fwdmvPartitionPlot, fwdmvPrePlot, fwdmvPrincompPlot, fwdmvQuantilePlot, fwdtrLrPlot, fwdtrMlePlot, fwdtrProfilePlot, panel.bb, panel.be, panel.me, partition, plot.fwdmv, plot.fwdtr.test

∗Topic iplot
fwdmvCovariancePlot, fwdmvDistancePlot, partition

∗Topic methods
eigenvectors.fwdmv, print.fwdmv

∗Topic multivariate
fwdmv, fwdmv.init, fwdtr, fwdtr.test, profile.fwdtr

∗Topic utilities
assign.groups, bb.subset, bigunit.fwdmv, eigenvalues.fwdmv, ellipse.subset, mcd.subset
