
Maximising the value of pXRF data

Michael Gazley | Senior Research Scientist

13 November 2015

MINERAL RESOURCES

With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk

• How good is pXRF data?

• How do you make sure your data are good?

• Multivariate data

• Issues with compositional data

• Principal component analysis (PCA)

• The Teapot

• Case studies 1 & 2

• Cluster analysis

• Case studies 3 & 4

• Concluding remarks

Overview

How good is pXRF data?

How do you make sure your data are good?

Instrumentation

Nature of the material to be analysed

Presentation of the sample to the unit

Calibration and reference materials

Validation and presentation of data

Top tips for ensuring good data

1. Ensure the sample is dry.

2. Present the sample as well as you possibly can (i.e. in a sample cup with Mylar film). Reducing the particle size usually gives the best results.

3. Ensure the standards are appropriate – matrix matched – and that there are enough of them.

4. Send a sub-set of samples (5%?) for laboratory analysis.
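One way to choose the laboratory check samples in tip 4 is a reproducible random draw. The sketch below is only an illustration: the slide suggests 5% tentatively and does not say how the subset should be picked, so the random selection, the function name `pick_lab_checks`, the seed and the sample IDs are all assumptions.

```python
import random
from typing import List


def pick_lab_checks(sample_ids: List[str], fraction: float = 0.05, seed: int = 0) -> List[str]:
    """Randomly pick a fraction of samples (at least one) to send to the laboratory."""
    n_checks = max(1, round(fraction * len(sample_ids)))
    # A fixed seed makes the selection repeatable and auditable.
    return sorted(random.Random(seed).sample(sample_ids, n_checks))


sample_ids = [f"S{i:03d}" for i in range(1, 201)]  # hypothetical sample IDs
print(pick_lab_checks(sample_ids))
```

Recording the seed alongside the other metadata lets someone else reproduce exactly which samples were sent.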

Reporting pXRF data (JORC or otherwise)

• Datasets in geology tend to be high-dimensional

• Whatever it is we do, we do it either through space or through time, or both

• Humans are very good at seeing patterns.

• But, sometimes the sheer size of a dataset is overwhelming.

The multivariate problem

Disclaimer

• I am not a statistician.

• I am not a mathematician.

• I am a geologist who has found a need for multivariate methods to help us navigate n-dimensional space.

• Multivariate ordinations are not new. They have been around for a long time; geologists just seem to be slow to adopt them.

Missing Data

• You cannot have missing data.

• You need to substitute or impute missing values.

• For <10% missing: substitute 66% of the limit of detection (LOD)

• For 10 - 30% missing: impute the missing data

• For >30% missing: discard the element
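The missing-data thresholds above can be written as a small decision rule. This is a minimal sketch, assuming missing values are stored as `None` and that each element has a single LOD; the function names and the example values are hypothetical, and the imputation step itself is not shown because the slides do not name a method.

```python
from typing import List, Optional


def missing_data_action(values: List[Optional[float]]) -> str:
    """Choose a treatment for one element from its fraction of missing values."""
    fraction_missing = sum(v is None for v in values) / len(values)
    if fraction_missing < 0.10:
        return "substitute"  # replace gaps with 66% of the LOD
    if fraction_missing <= 0.30:
        return "impute"      # estimate gaps with an imputation method
    return "discard"         # too sparse to use


def substitute_lod(values: List[Optional[float]], lod: float, factor: float = 0.66) -> List[float]:
    """Replace missing values with a fixed fraction of the limit of detection."""
    return [factor * lod if v is None else v for v in values]


sn_ppm = [12.0, 15.0, None, 22.0, 18.0, 30.0, 14.0, 19.0, 25.0, 16.0, 21.0]  # hypothetical
if missing_data_action(sn_ppm) == "substitute":
    sn_ppm = substitute_lod(sn_ppm, lod=10.0)
print(sn_ppm)
```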

• Geochemical data are typically reported as compositions

• They must total 100% or 1,000,000 ppm

• These data are “closed”

• For a composition of n components, only n-1 components are required, because the last is fixed by the total (Buccianti & Grunsky, 2014).

• You can’t do standard statistics on closed data because closure produces spurious correlations

• The log-ratio transform of Aitchison (1982, 1986) converts data into real number space

• Log-ratio transformations allow us to make meaningful statements on compositional data.

• There are a number of log-ratio transforms that have different purposes.

Closure and log-ratio transforms
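As an illustration of one such transform, the sketch below implements the centred log-ratio (clr), which takes the log of each part relative to the geometric mean of the composition. The slides do not say which log-ratio transform was used, so the choice of clr here is an assumption, and the three-part composition is hypothetical.

```python
import math
from typing import List


def close(parts: List[float], total: float = 1.0) -> List[float]:
    """Rescale parts so they sum to `total` (the closure operation)."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts: List[float]) -> List[float]:
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]  # requires strictly positive parts
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


sample_ppm = [700000.0, 200000.0, 100000.0]  # hypothetical 3-part composition
print(clr(sample_ppm))
print(clr(close(sample_ppm)))  # agrees to rounding: clr does not depend on the total
```

Because the transform needs strictly positive values, zeros and missing values have to be dealt with before it is applied.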

Principal component analyses (PCA)

• PCA is an ordination

• All it does is reorient and rescale your data. Point-point relationships are preserved; PCA just makes it easier to see structure.

• PCA does a couple of really useful things.

• It quantifies how much of the variance in the dataset is summarised by each PC axis.

• It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset - it is human readable.
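The two outputs described above, variance explained per axis and loadings, can be sketched without any external library. This is an illustration only: it finds the axes by power iteration on the covariance matrix, which is a swapped-in technique for teaching purposes and not what any particular statistics package does, and the function names and data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def covariance_matrix(rows: Matrix) -> Matrix:
    """Sample covariance matrix of the columns of `rows` (samples x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(m: Matrix, iterations: int = 500) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(m)
    v = [1.0 / math.sqrt(p)] * p  # arbitrary start; fails if orthogonal to the answer
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to describe
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * m[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def pca(rows: Matrix, n_components: int = 2) -> List[Tuple[float, List[float]]]:
    """Return (fraction of variance explained, loadings) for each PC axis."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total_variance = sum(cov[i][i] for i in range(p))
    axes = []
    for _ in range(n_components):
        eigenvalue, v = leading_eigenpair(cov)
        axes.append((eigenvalue / total_variance, v))
        # Deflate: remove the variance already captured by this axis.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return axes


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # hypothetical, perfectly collinear
for explained, loadings in pca(rows, n_components=1):
    print(round(explained, 3), [round(x, 3) for x in loadings])
```

In this toy dataset the second variable is exactly twice the first, so a single axis carries all of the variance and its loadings point along that 1:2 direction.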

What’s the best way to look at a teapot so that you can best understand what shape it is?

Imagine your dataset as a teapot...

Orientating the teapot

• PCA is to ordinations as vanilla is to ice cream flavours

• It works with most things but there are plenty of other ordinations to choose from and some of those might suit you better, or be useful in combination with PCA

• A priori groupings?

• Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA)

• Both categorical and continuous data?

• Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA)

• Variables not normally distributed?

• Independent Components Analysis (ICA)

Other ordinations

• A number of different PCAs (and other ordinations, in some cases) can be run very easily in different programs – various stats software, MATLAB, ioGAS, PAST – and R

• R can do PCA in a multitude of ways

– Base package [stats] has prcomp and princomp

– Also found in additional packages [FactoMineR, ade4, amap, pcaPP] … probably more!

– Also ‘robust PCA’, ‘sparse PCA’ and ‘robust sparse PCA’

Implementation

Barnes et al. (2014); Fisher et al. (2014)

Case study 1 – Agnew gold mine

Au is associated with Ca (calcic amphibole), not biotite

Case study 2 - Dolerites


• What if PCA has done a good job but you’ve still got too much overlap to be able to draw your own lines between groups of data?

• This is where cluster analysis comes in.

• Cluster analysis finds groups by looking at distances between points

• It doesn’t know what your data are and it doesn’t care. It is interested only in point-point relationships.

• So yes, different clustering methods will find different groups!

Cluster analysis
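To make the idea of grouping by point-point distances concrete, here is a minimal k-means sketch. k-means is only one of many clustering methods and the slides do not say which was used, so treat the method, the deterministic choice of starting centroids, and the example scores as assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Group points into k clusters using Euclidean distance (Lloyd's algorithm)."""
    # Deterministic start (an assumption): k points spaced evenly through the input.
    centroids = [list(points[(i * len(points)) // k]) for i in range(k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], centroids


# Hypothetical PC1/PC2 scores forming two well-separated groups.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

Changing `k`, the starting centroids or the distance measure can change the groups returned, which is why the choice of method and its settings should be recorded with the results.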

Clustering the teapot

• There are going to be points that could belong to more than one group

• How you deal with those is dependent on the methods you choose and your own judgement

• Cluster analysis cannot and will not solve this problem for you!

Clustering the teapot

The data analysis work flow

• Whangai/Waipawa/Wanstead Formations

• East Coast of North Island

• Homogeneous, brown, boring – except …

• The Waipawa Fm is a potential hydrocarbon source.

• Sediment provenance is of interest for palaeoenvironmental reasons

Case study 3 – East Coast Basin, NZ

• pXRF dataset from six measured sections along the East Coast.

Case study 3 – East Coast Basin, NZ

Case study 4 – Mozambique soil samples

• Data collected with a Niton XL3t GOLDD pXRF unit on a nominal 40 m x 80 m grid.

• The pXRF unit was used in the field, with a ~20 cm pit dug at each site.

• Ta and Sn are poorly determined by pXRF because of spectral overlaps with Cu/Zn and K/Ca, respectively.

• After anomalism was detected in this survey, a 100 m x 300 m grid was run, with samples sent for lab analysis.

• Both sample sets were estimated into 100 m x 100 m cells in 3DS Surpac.

Conditional probability

• If Sn in the pXRF dataset is >150 ppm, then Sn in the lab dataset is >90 ppm, i.e. truly anomalous.

• Used Fe, Ti, Zr and Mn concentrations, together with the samples in which Sn was >150 ppm (8% of the samples), to predict the probability of anomalous Sn in all samples.

• Left out Rb, Ca and Sr in case they were mobile during weathering
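The slides do not say how the probability was calculated, so the sketch below is not the method used in the case study. It only illustrates the underlying idea with a simple count: the probability that Sn exceeds 150 ppm given that one predictor element exceeds a chosen threshold. The predictor (Zr), its threshold and all values are hypothetical.

```python
from typing import List, Optional, Tuple


def conditional_probability(
    samples: List[Tuple[float, float]],
    predictor_threshold: float,
    sn_threshold: float = 150.0,
) -> Optional[float]:
    """Estimate P(Sn > sn_threshold | predictor > predictor_threshold) by counting."""
    selected = [sn for predictor, sn in samples if predictor > predictor_threshold]
    if not selected:
        return None  # no samples satisfy the condition
    return sum(sn > sn_threshold for sn in selected) / len(selected)


# Hypothetical (Zr ppm, Sn ppm) pairs.
samples = [(100.0, 20.0), (120.0, 30.0), (400.0, 200.0),
           (450.0, 180.0), (500.0, 60.0), (90.0, 10.0)]
baseline = sum(sn > 150.0 for _, sn in samples) / len(samples)
print(baseline, conditional_probability(samples, predictor_threshold=300.0))
```

A conditional probability well above the baseline rate suggests the predictor carries information about Sn anomalism; combining several predictors, as the case study did, needs a multivariate model.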

Conditional Probability based on Fe, Ti, Zr and Mn

Ignore anomaly

Exploration targets

• pXRF data are fit for many purposes.

• You can collect datasets that may contain elements you otherwise would not have paid for.

• But, you must stay on top of recording all of the metadata that tells you (and others) how good (or not) it really is.

• Multivariate methods can reveal underlying structure and provide ways to visualise big data.

• You can formulate hypotheses using PCA and cluster analysis which are then testable using standard statistics.

• pXRF technology allows for the collection of large datasets; ensure that you extract all of the value that you possibly can.

Concluding remarks

Questions?

MINERAL RESOURCES

Thank you

Michael Gazley Senior Research Scientist

t +61 8 6436 8501 e michael.gazley@csiro.au w www.csiro.au/
