MixeR Package for CDA -- graphical display of three- and four-part (sub)compositions
Matevž Bren (1,2), Vladimir Batagelj (2,3)
1 University of Maribor, Slovenia
2 Institute of Mathematics, Physics and Mechanics, Slovenia
3 University of Ljubljana, Slovenia
IAMG 2005, August 21-26, Toronto, Canada
Introduction
The groundwork for Compositional Data Analysis (CDA) is John Aitchison's 1986 book The Statistical Analysis of Compositional Data.
From the book we quote: “The properties of many substances or objects, such as gasoline, metal alloys and cakes, depend on the particular mixture, or composition, of their ingredients. The purpose of the experiments with different mixtures is to obtain some understanding of the nature and extent of the dependence of the properties on the composition. In the analysis of such experiments the composition is confined to the role of a covariate.”
Example 1: Glacial data set, from Aitchison (1986)
92 samples of pebbles of glacial tills sorted into four categories: red sandstone, gray sandstone, crystalline and miscellaneous. The percentages by weight of these four categories and the total pebble counts are recorded.
   RedSandstone GraySandstone Crystalline Misc Counts
1          91.8           7.1         1.1  0.0    282
2          88.9          10.1         0.5  0.5    368
...
90         15.9          83.3         0.8  0.0    245
91         16.9          74.3         1.2  5.9    575
92         31.4          65.9         2.7  0.0    698
“The glaciologist is interested in describing the pattern of variability of his data and whether the compositions are in any way related to abundance.”
Compositions (compounds, mixtures, alloys…) can be represented by vectors of the proportions of the individual components. The proportions are nonnegative and have a constant sum, equal to 100 (percentages) or 1 (proportions).
The sample space for compositions is the (unit) simplex S^D.
For D=3 it is represented graphically by a ternary diagram.
For D=4 it is represented graphically by a tetrahedron.
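The constant-sum constraint can be illustrated with a few lines of base R. This is only a sketch: closure is a hypothetical helper name chosen for illustration and is not one of the MixeR routines.

```r
# Hypothetical helper (not part of MixeR): rescale each row of a
# nonnegative matrix so that its parts sum to the constant `total`.
closure <- function(x, total = 1) {
  x <- as.matrix(x)
  stopifnot(all(x >= 0), all(rowSums(x) > 0))
  # rowSums(x) is recycled down the columns, so row i is divided by its own sum
  x / rowSums(x) * total
}

# Two pebble compositions from the glacial data, given in percent
x <- rbind(c(91.8,  7.1, 1.1, 0.0),
           c(88.9, 10.1, 0.5, 0.5))
p <- closure(x)      # proportions: every row now lies in the unit simplex
rowSums(p)
```

Rescaling does not change the ratios between parts, which is why percentages and proportions describe the same composition.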
Left: three-part compositions x = (x1, x2, x3) in a ternary diagram, x1 + x2 + x3 = 1
Right: four-part compositions x = (x1, x2, x3, x4) in a tetrahedron, x1 + x2 + x3 + x4 = 1
R (http://www.r-project.org) is `GNU S', a language and environment for statistical computing and graphics. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering...). Further extensions can be provided as packages.
In 2003 we started the MixeR project, a library of functions in R to support CDA, i.e. the statistical analysis of mixtures:
• operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, computing Aitchison's, Euclidean and Bhattacharyya distances, the compositional Kullback-Leibler divergence etc.
• graphical presentation of three- and four-part (sub)compositions in ternary diagrams and tetrahedrons with additional features: barycentre, geometric mean of the data set, percentile and ratio lines, marking and colouring of subsets of the data set, centring of the data, annotation of individual data points etc.
• logratio transformations of compositions into real vectors that are amenable to standard multivariate statistical analysis etc.
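Some of the operations listed above can be sketched in base R as follows. The definitions follow Aitchison (1986); the function names are hypothetical and do not reproduce the MixeR interface, and all parts are assumed to be strictly positive.

```r
# Hypothetical helper names; not the MixeR routines.
closure <- function(x) x / sum(x)

# Perturbation: component-wise product, followed by closure
perturb <- function(x, y) closure(x * y)

# Power multiplication: component-wise power, followed by closure
power <- function(x, a) closure(x^a)

# Centred logratio transformation: logs of the parts relative to
# the log of their geometric mean (requires strictly positive parts)
clr <- function(x) log(x) - mean(log(x))

# Aitchison distance: Euclidean distance between clr-transformed compositions
aitchison.dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

x <- c(0.2, 0.3, 0.5)
y <- c(0.6, 0.3, 0.1)
perturb(x, y)
power(x, 2)
aitchison.dist(x, y)
```

The clr-transformed vectors are ordinary real vectors (summing to zero), which is what makes standard multivariate methods applicable to them.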
Compositional Data Analysis SW tools
• CoDa 1986 by John Aitchison, written in Quick Basic, available with Aitchison's book
• CoDa upgraded by John Bacon-Shone
• CoDaPack 2001, freeware SW by Santiago Thió and Raimon Tolosana in Excel http://ima.udg.es/Recerca/EIO/inici_cat.html
• attempts in R by Joel Raynolds and Dean Billheimer at http://www.biostat.wustl.edu/archives/html/s-news/2003-12/msg00139.html
• MixeR 2003 by Batagelj and Bren at http://vlado.fmf.uni-lj.si/pub/MixeR
• ‘compositions’ package 2005, by K. Gerald van den Boogaart and Raimon Tolosana Delgado at http://cran.r-project.org/src/contrib/Descriptions/compositions.html
Mixture class in R
The input mixture data object m consists of
m$tit - the title,
m$mat - the data matrix,
m$sum - the value of the row sums, if constant, and
m$sta - the status of the mix object, with values
  -2 - matrix contains negative elements
  -1 - zero row sum exists
   0 - matrix contains zero elements
   1 - matrix contains positive elements, rows with different row sums
   2 - matrix with constant row sum
   3 - normalized mixture, the row sums are equal to 1
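To make the status codes concrete, here is a minimal base-R sketch of how such an object could be assembled. It is not the MixeR source: the names mixture.status and new.mixture are hypothetical, and the order in which the conditions are tested is an assumption.

```r
# Hypothetical sketch of the status codes; not the MixeR implementation.
# Assumption: the checks are applied in the order -2, -1, 0, 1, then 3 or 2.
mixture.status <- function(mat, eps = 1e-6) {
  if (any(mat < 0)) return(-2)                 # negative elements
  rs <- rowSums(mat)
  if (any(rs == 0)) return(-1)                 # a row sums to zero
  if (any(mat == 0)) return(0)                 # zero elements present
  if (max(rs) - min(rs) >= eps) return(1)      # positive, differing row sums
  if (abs(rs[[1]] - 1) < eps) return(3)        # normalized: row sums equal 1
  2                                            # constant row sum other than 1
}

# Assemble a list with the four fields and the class attribute "mixture"
new.mixture <- function(tit, mat, eps = 1e-6) {
  sta <- mixture.status(mat, eps)
  structure(list(tit = tit,
                 sum = if (sta >= 2) rowSums(mat)[[1]] else NA,
                 sta = sta,
                 mat = mat),
            class = "mixture")
}

m0 <- new.mixture("toy data", rbind(c(0.2, 0.8), c(0.5, 0.5)))
m0$sta
```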
Example 1: The glacial mixture object
> m <- mix.Read('glacial.dat')
$tit
[1] "GLACIAL DATA 92 samples of pebbles of glacial tills sorted into four categories percentages by weight"
$sum
[1] NA
$sta
[1] 0
$mat
   RedSandstone GraySandstone Crystalline Misc
1          91.8           7.1         1.1  0.0
2          88.9          10.1         0.5  0.5
...
91         16.9          74.3         1.2  5.9
92         31.4          65.9         2.7  0.0
attr(,"class")
[1] "mixture"
The 'mix' procedures in R
mix.Read(file, eps=1e-6)
  Reads mix data from the file and returns a mix object. If |m$sum - 1| < eps it sets m$sta = 3.
mix.Check(m, eps=1e-6)
  Determines the m$sum and m$sta of a given mixture object m.
mix.Normalize(m, c=1)
  Normalizes a given mixture object m if m$sta > 0. The row sums are normalized to the constant c, with default value c=1.
mix.Random(nr, nc, s=1)
  Constructs a random mix object with nr rows and nc columns and constant row sum s.
Subcompositions of mixture objects
mix.Sub(m, k, Normalize=TRUE)
  subcomposition of m without the k=(k_1,...,k_r) columns, normalized if Normalize=T
mix.Extract(m, k, Normalize=TRUE)
  subcomposition of m with only the k=(k_1,...,k_r) columns, normalized if Normalize=T
mix.ExtractRes(m, k)
  subcomposition with the k=(k_1,...,k_r) columns; all the rest is amalgamated in the residual. The output is the normalized mixture object with r+1 columns.
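The idea of a subcomposition with residual can be sketched in base R on a plain matrix. The helper sub.with.residual is hypothetical and only mirrors the description above; it is not the MixeR code and does not handle mix objects.

```r
# Hypothetical sketch: keep the columns k, amalgamate all other
# columns into one residual part, and normalize the rows to sum 1.
sub.with.residual <- function(mat, k) {
  kept     <- mat[, k, drop = FALSE]
  residual <- rowSums(mat[, -k, drop = FALSE])   # sum of the parts not in k
  out      <- cbind(kept, Residual = residual)
  out / rowSums(out)
}

glac <- rbind(c(91.8,  7.1, 1.1, 0.0),
              c(88.9, 10.1, 0.5, 0.5))
colnames(glac) <- c("RedSandstone", "GraySandstone", "Crystalline", "Misc")

# Keep the two sandstone parts; Crystalline and Misc form the residual
s <- sub.with.residual(glac, 1:2)
s
```

For r selected columns the result always has r+1 columns, which is the shape described for mix.ExtractRes.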
Visualization in ternary diagram routine
mix.Ternary(m, dist, distG, cls, Center, Borders, Gmean)
Draws a ternary diagram of the mixture data m, with optional features: centring of the data (Center), border percentile lines (Borders) and the geometric mean of the data (Gmean).
The default value for Center, Borders and Gmean is FALSE.
dist - additional distances to the numbers marking the percentile lines,
distG - additional distances to the numbers marking the percentile lines of the geometric mean and
cls - colours of the percentile lines.
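A ternary diagram places each three-part composition at the barycentric combination of the triangle's vertices. The base-R sketch below shows one such mapping; ternary.xy is a hypothetical helper, and the vertex layout (part 1 at the top, part 2 at the right, part 3 at the left, unit side length) is an assumption about the plot geometry rather than a description of mix.Ternary internals.

```r
# Hypothetical sketch: map three-part compositions (rows of p) to
# 2D plot coordinates in an equilateral triangle with vertices
#   part 1 -> top   (0.5, sqrt(3)/2)
#   part 2 -> right (1, 0)
#   part 3 -> left  (0, 0)
ternary.xy <- function(p) {
  p <- p / rowSums(p)                      # make sure rows sum to 1
  cbind(x = p[, 2] + p[, 1] / 2,
        y = p[, 1] * sqrt(3) / 2)
}

pts <- rbind(c(1, 0, 0),                   # vertex of part 1
             c(0, 1, 0),                   # vertex of part 2
             c(1, 1, 1) / 3)               # barycentre
ternary.xy(pts)
```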
LEFT: The three-part subcomposition with geometric mean
> mix.Ternary(mix.Sub(m,4),Gmean=T)
RIGHT: The same subcomposition centred for better visualization of the differences between cases, with border percentile lines showing the actual variation.
> mix.Ternary(mix.Sub(m,4),Borders=T,Center=T)
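Centring in CDA is commonly done by perturbing every composition with the inverse of the data set's geometric mean, which moves the centre of the data to the barycentre of the diagram. The sketch below shows that standard operation in base R; whether the Center option of mix.Ternary is implemented exactly this way is an assumption, and center.compositions is a hypothetical name. All parts must be strictly positive.

```r
# Hypothetical sketch of centring a compositional data set.
center.compositions <- function(p) {
  g <- exp(colMeans(log(p)))       # column-wise geometric means
  q <- sweep(p, 2, g, "/")         # perturb each row by the inverse centre
  q / rowSums(q)                   # close the rows back to sum 1
}

p <- rbind(c(0.90, 0.08, 0.02),
           c(0.85, 0.10, 0.05),
           c(0.80, 0.15, 0.05))
pc <- center.compositions(p)

# The closed geometric mean of the centred data is the barycentre
g <- exp(colMeans(log(pc)))
g / sum(g)
```

Data crowded near a vertex or an edge spread out after centring, which is why differences between cases become easier to see.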
Visualization in tetrahedron routine
mix.Q2kin(fkin, m) transforms a four-part mixture m from quadrays into 3-dimensional XYZ coordinates and writes them to the .kin file named by fkin.
The kin file is displayed as a 3D animation with the MAGE viewer, free software available at http://kinemage.biochem.duke.edu/software/software1.html/#mage
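The coordinate transformation can be sketched as a barycentric combination of the four vertices of a regular tetrahedron. The helper tetra.xyz and the particular choice of vertices are assumptions made for illustration; the sketch does not write a kinemage file and is not the mix.Q2kin code.

```r
# Hypothetical sketch: map four-part compositions (rows of p) to XYZ
# coordinates inside a regular tetrahedron centred at the origin.
tetra.xyz <- function(p) {
  # One vertex per part; these four points are mutually equidistant
  V <- rbind(c( 1,  1,  1),
             c( 1, -1, -1),
             c(-1,  1, -1),
             c(-1, -1,  1))
  xyz <- (p / rowSums(p)) %*% V    # weighted average of the vertices
  colnames(xyz) <- c("X", "Y", "Z")
  xyz
}

glac <- rbind(c(91.8,  7.1, 1.1, 0.0),
              c(88.9, 10.1, 0.5, 0.5))
tetra.xyz(glac)
```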
Snapshots of glac.kin: 3D MAGE view of the tetrahedral display of the glacial data (four-part compositions).
> mix.Q2kin("glac.kin", m)
Percentile lines routine
percentile.lines(y, direction, cls, dist, lt) draws percentile lines into an already drawn ternary diagram.
y - percents or portions for the percentile lines
direction - directions for the percentile lines: value 1, percentile lines to vertex No. 1 = top; value 2, to vertex No. 2 = right; value 3, to vertex No. 3 = left. The default value is direction = 1:3 (all directions)
cls - the vector of colours, first for percentile lines to vertex No. 1, second … The default value is cls = c("yellow", "yellow2", "yellow3")
dist - additional distances to the numbers marking the percentile lines, first for percentile lines to vertex No. 1… The default value is dist = c(0.05, 0.05, 0.05)
lt - the vector of line types (values 1, 2,..., 10), first for… The default value is lt = c(4,3,2)
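Geometrically, a percentile line for one part is the segment of the triangle on which that part has a fixed value. The sketch below computes the endpoints of such a segment in plot coordinates. It assumes that "direction" selects the part held constant and that the vertices are laid out as top, right and left with unit side length; perc.endpoints is a hypothetical helper, not the package routine.

```r
# Hypothetical sketch: endpoints of the line on which part `direction`
# equals the proportion `level` (0 < level < 1), in ternary plot coordinates.
perc.endpoints <- function(level, direction = 1) {
  others <- setdiff(1:3, direction)
  a <- numeric(3)
  b <- numeric(3)
  a[direction] <- level; a[others[1]] <- 1 - level   # end on one edge
  b[direction] <- level; b[others[2]] <- 1 - level   # end on the other edge
  ends <- rbind(a, b)
  # Same vertex layout as assumed above: part 1 top, part 2 right, part 3 left
  cbind(x = ends[, 2] + ends[, 1] / 2,
        y = ends[, 1] * sqrt(3) / 2)
}

# The 50% line for part 1 is horizontal, halfway up the triangle
perc.endpoints(0.5, direction = 1)
```

With this layout the lines for part 1 are horizontal, while those for parts 2 and 3 run parallel to the edges opposite the right and left vertices.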
Example 2
mix object m with nine cases and three variables, i.e. a 9x3 matrix having 0.1 to 0.9 values in the first column, ratios between the second and third being ½
$tit
[1] "Deciles values in the first column"
$sum
[1] 1
$sta
[1] 3
$mat
   aa         bb         cc
1 0.1 0.30000000 0.60000000
2 0.2 0.26666670 0.53333330
3 0.3 0.23333330 0.46666670
...
9 0.9 0.03333333 0.06666667
attr(,"class")
[1] "mixture"
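A data matrix with these properties can be generated in a few lines of base R. This is a sketch of one way to build the numbers shown above; it produces a plain matrix, not a mix object, and is not necessarily how the original example was constructed.

```r
# First part takes the decile values 0.1, ..., 0.9
aa   <- (1:9) / 10
rest <- 1 - aa                     # what remains for the other two parts

# Split the remainder 1:2 so that bb / cc = 1/2 in every row
mat <- cbind(aa = aa,
             bb = rest / 3,
             cc = 2 * rest / 3)

rowSums(mat)                       # every row sums to 1
mat[, "bb"] / mat[, "cc"]          # constant ratio between second and third part
```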
We draw a ternary diagram with these nine points in different colours – cls, shapes – pch, and size cex=1
> cls <- c("khaki", "pink", "sienna", "tan", ...,"purple" )
> mix.Ternary(m, col=cls, pch=0:8, cex=1)
> perc.lines(10*1:9,dir=1, cls="cyan", lt=1)
Example 3
> mix.Ternary(mix.Random(22,3))
> perc.lines(10*1:9, cls=c("blue", "blueviolet", "violet"))
LEFT: Three-part compositions with decile values in the first variable and constant ratio ½ between the second and the third variable (simulated data), with decile lines in the first direction.
RIGHT: Ternary diagram with 22 random points and decile lines in all three directions.
Conclusions
We have demonstrated some mix routines and features for visualization of three- and four-part (sub)compositions, available at http://vlado.fmf.uni-lj.si/pub/MixeR
Providing for complementary use of the ‘compositions’ package and the MixeR routines would be a most welcome step. Therefore our future work will be to code transformation routines from the mix object to the objects of the five different classes (rplus, rcomp, acomp, aplus and mult) implemented in the ‘compositions’ package and, of course, transformations from the four classes to the mix objects. With these routines we hope to enable users to apply and benefit from both the ‘compositions’ package and the MixeR library routines.