Post on 10-Mar-2018
Maximising the value of pXRF data
Michael Gazley | Senior Research Scientist
13 November 2015
MINERALS RESOURCES
With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk
Overview
• How good are pXRF data?
• How do you make sure your data are good?
• Multivariate data
• Issues with compositional data
• Principal component analysis (PCA)
• The Teapot
• Case studies 1 & 2
• Cluster analysis
• Case studies 3 & 4
• Concluding remarks
Top tips for ensuring good data
1. Ensure the sample is dry.
2. Present the sample as well as you possibly can (i.e. in a sample cup with Mylar film). Reducing the particle size usually gives the best results.
3. Ensure the standards are appropriate (matrix-matched) and that there are enough of them.
4. Send a subset of samples (perhaps 5%) for laboratory analysis.
The multivariate problem
• Datasets in geology tend to be high-dimensional.
• Whatever it is we do, we do it either through space or through time, or both.
• Humans are very good at seeing patterns.
• But sometimes the sheer size of a dataset is overwhelming.
Disclaimer
• I am not a statistician.
• I am not a mathematician.
• I am a geologist who has found a need for multivariate methods to help us navigate n-dimensional space.
• Multivariate ordinations are not new; they have been around for a long time. Geologists just seem to have been slow to adopt them.
Missing Data
• You cannot have missing data in a multivariate analysis.
• You need to substitute or impute missing values.
– For <10% missing: substitute 66% of the limit of detection (LOD)
– For 10–30% missing: impute the missing values
– For >30% missing: discard the element
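The rule of thumb above can be sketched in a few lines of Python. This is an illustration only, not the workflow used in the talk. The function name `handle_missing`, the use of `None` to mark a missing reading, the handling of the exact 10% and 30% boundaries and the example values are all assumptions. Imputation itself is left to a dedicated method and is not shown.

```python
def handle_missing(values, lod):
    """Apply the missing-data rule of thumb to one element's readings.

    values: list of concentrations; None marks a missing reading
            (an assumed convention, not from the slides).
    lod:    limit of detection for this element, in the same units.
    Returns (action, values_out).
    """
    n_missing = sum(v is None for v in values)
    frac = n_missing / len(values)

    if frac > 0.30:
        # Too sparse to be useful: discard the element.
        return "discard", None
    if frac >= 0.10:
        # Leave the gaps for a proper imputation method (not shown here).
        return "impute", list(values)
    # Fewer than 10% missing: substitute 66% of the LOD.
    fill = 0.66 * lod
    return "substitute", [fill if v is None else v for v in values]


# Hypothetical Sn readings in ppm with one missing value out of eleven.
sn = [120.0, 95.0, None, 210.0, 88.0, 140.0, 101.0, 99.0, 150.0, 133.0, 90.0]
action, cleaned = handle_missing(sn, lod=30.0)
print(action, cleaned[2])
```

Returning the action alongside the values keeps a record of what was done to each element, which is part of the metadata needed to judge the data later.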
Closure and log-ratio transforms
• Geochemical data are typically reported as compositions.
• They must total 100% or 1,000,000 ppm.
• These data are “closed”.
• For a composition of n components, only n-1 components are required, because the last is fixed by the total (Buccianti & Grunsky, 2014).
• You cannot do standard statistics on closed data because closure produces spurious correlations.
• The log-ratio transform of Aitchison (1982, 1986) converts the data into real-number space.
• Log-ratio transformations allow us to make meaningful statements about compositional data.
• There are a number of log-ratio transforms that have different purposes.
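One of those transforms is the centred log-ratio (clr), which takes the log of each component divided by the geometric mean of the composition. The sketch below is a minimal illustration. The function name `clr` and the sample values are assumptions, not from the talk, and zeros or missing values must be dealt with before the transform is applied.

```python
import math

def clr(composition):
    """Centred log-ratio transform of one composition (all parts > 0)."""
    if any(x <= 0 for x in composition):
        raise ValueError("clr needs strictly positive parts; handle zeros first")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)      # log of the geometric mean
    return [lg - mean_log for lg in logs]


# Hypothetical three-part composition in wt%, closed to 100.
sample = [70.0, 20.0, 10.0]
transformed = clr(sample)
print([round(v, 3) for v in transformed])

# The same sample expressed in ppm gives the same clr values, because the
# transform depends only on ratios between parts, not on the closure constant.
sample_ppm = [x * 10_000 for x in sample]
assert all(abs(a - b) < 1e-9 for a, b in zip(transformed, clr(sample_ppm)))
```

The clr values of a composition always sum to zero, so the transformed variables are still not fully independent. That is one reason other log-ratio transforms exist.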
Principal component analyses (PCA)
• PCA is an ordination
• All it does is reorient and rescale your data. Point-point relationships are preserved; PCA just makes it easier to see structure.
• PCA does a couple of really useful things.
• It quantifies how much of the variance in the dataset is summarised by each PC axis.
• It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset - it is human readable.
[Figure: plot with an axis labelled PC2]
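To make the two outputs of PCA concrete (variance explained per axis, and loadings), here is a minimal standard-library sketch. It is not the implementation used in the talk: it diagonalises the covariance matrix with Jacobi rotations, and the function names and the small dataset are hypothetical. In practice you would use an established routine such as R's prcomp.

```python
import math

def covariance(rows):
    """Sample covariance matrix; rows are samples, columns are variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def jacobi_eigen(matrix, sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix (Jacobi rotations)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-22 * sum(a[i][i] ** 2 for i in range(n)):
            break                       # off-diagonal part is negligible
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors

def pca(rows):
    """Return (fraction of variance per PC, loadings per PC), largest first."""
    values, vectors = jacobi_eigen(covariance(rows))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total = sum(values)
    return [values[i] / total for i in order], [vectors[i] for i in order]


# Hypothetical data: variables 1 and 2 vary together, variable 3 barely varies.
data = [
    [2.0, 1.9, 0.5], [4.0, 4.2, 0.4], [6.0, 5.8, 0.6],
    [8.0, 8.1, 0.5], [10.0, 9.9, 0.4], [12.0, 12.1, 0.6],
]
explained, loadings = pca(data)
for k, (frac, vec) in enumerate(zip(explained, loadings), start=1):
    print(f"PC{k}: {frac:.1%} of variance, loadings {[round(x, 2) for x in vec]}")
```

The loadings here are unit-length eigenvectors, and their overall sign is arbitrary. What matters is which variables have large loadings on an axis and whether they share a sign.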
Imagine your dataset as a teapot...
What’s the best way to look at a teapot so that you can best understand what shape it is?
Other ordinations
• PCA is to ordinations as vanilla is to ice cream flavours.
• It works with most things, but there are plenty of other ordinations to choose from, and some of those might suit you better or be useful in combination with PCA.
• A priori groupings?
– Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA)
• Both categorical and continuous data?
– Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA)
• Variables not normally distributed?
– Independent Components Analysis (ICA)
Implementation
• A number of different PCAs (and other ordinations, in some cases) can be run very easily in different programs – various stats software, MATLAB, ioGAS, PAST – and R.
• R can do PCA in a multitude of ways:
– Base package [stats] has prcomp and princomp
– Also found in additional packages [FactoMineR, ade4, amap, pcaPP] … probably more!
– Also ‘robust PCA’, ‘sparse PCA’ and ‘robust sparse PCA’
Case study 1 – Agnew gold mine
Barnes et al. (2014); Fisher et al. (2014)
Au associated with Ca (calcic amphibole) and not biotite
Cluster analysis
• What if PCA has done a good job but you’ve still got too much overlap to be able to draw your own lines between groups of data?
• This is where cluster analysis comes in.
• Cluster analysis finds groups by looking at distances between points.
• It doesn’t know what your data are and it doesn’t care. It is interested in point-point relationships.
• So yes, different clustering methods will find different groups!
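As one concrete example of grouping by point-point distances, here is a minimal k-means sketch. k-means is only one of many clustering methods and is not necessarily the one used in the case studies. The function names, the example scores and the starting centroids are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, then move the centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(pt, centroids[k]))
            for pt in points
        ]
        if new_labels == labels:
            break                       # assignments are stable: stop
        labels = new_labels
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:                 # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scores on two PC axes for six samples.
scores = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (2.9, 3.1), (3.2, 2.8), (3.0, 3.3)]
labels, centroids = kmeans(scores, centroids=[scores[0], scores[3]])
print(labels)
```

The result depends on the number of clusters and on the starting centroids, which is one practical reason why different runs and different methods can return different groups.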
Clustering the teapot
• There are going to be points that could belong to more than one group.
• How you deal with those depends on the methods you choose and your own judgement.
• Cluster analysis cannot and will not solve this problem for you!
Case study 3 – East Coast Basin, NZ
Hines et al. (2015; in prep)
• Whangai/Waipawa/Wanstead Formations
• East Coast of North Island
• Homogeneous, brown, boring – except …
• The Waipawa Fm is a potential hydrocarbon source.
• The provenance of the sediment is of interest for palaeoenvironmental reasons.
• pXRF dataset from six measured sections along the East Coast.
Case study 4 – Mozambique soil samples
Sterk et al. (in review)
• Data were collected with a Niton XL3t GOLDD pXRF unit on a nominal 40 m x 80 m grid.
• The pXRF unit was used in the field, with each reading taken in a ~20 cm pit dug at the site.
• Ta and Sn are not measured well by pXRF because of overlaps with Cu/Zn and K/Ca, respectively.
• After anomalism was detected in this survey, a 100 m x 300 m grid was run, with samples sent for lab analysis.
• Both sample sets were estimated into 100 m x 100 m cells in 3DS Surpac.
• Where Sn in the pXRF dataset is >150 ppm, Sn in the lab dataset is >90 ppm, i.e. truly anomalous.
• Fe, Ti, Zr and Mn concentrations, together with the set of samples with Sn >150 ppm (8% of the samples), were used to predict the probability of anomalous Sn in all samples.
• Rb, Ca and Sr were left out in case they were mobile during weathering.
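The slides do not say how the probability was calculated. As a rough illustration only, one simple way to estimate a conditional probability is to bin a predictor and count how often Sn exceeds the threshold within each bin. The sketch below uses a single predictor for brevity, whereas the case study used four. The function name, bin edge and data values are all hypothetical.

```python
def conditional_probability(predictor, target, threshold, edges):
    """Estimate P(target > threshold | predictor bin) by counting.

    edges: ascending bin boundaries; bin 0 is below edges[0], the last bin
           is at or above edges[-1].
    Returns one probability per bin, or None for an empty bin.
    """
    counts = [0] * (len(edges) + 1)
    hits = [0] * (len(edges) + 1)
    for x, y in zip(predictor, target):
        b = sum(x >= e for e in edges)   # index of the bin containing x
        counts[b] += 1
        hits[b] += y > threshold
    return [h / c if c else None for h, c in zip(hits, counts)]


# Hypothetical pXRF readings in ppm for eight soil samples.
zr = [100, 120, 150, 300, 320, 350, 400, 90]
sn = [20, 30, 160, 200, 180, 40, 170, 10]
probs = conditional_probability(zr, sn, threshold=150, edges=[200])
print(probs)
```

With only a handful of samples per bin the estimates are very noisy, so a real survey would need far more samples per bin, or a fitted model, before the probabilities could be used to rank targets.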
[Figure: conditional probability based on Fe, Ti, Zr and Mn; annotations: “Ignore anomaly here”, “Exploration targets”]
Concluding remarks
• pXRF data are fit for many purposes.
• You can collect datasets that may contain elements you otherwise would not have paid for.
• But you must stay on top of recording all of the metadata that tell you (and others) how good (or not) the data really are.
• Multivariate methods can reveal underlying structure and provide ways to visualise big data.
• You can formulate hypotheses using PCA and cluster analysis, which can then be tested using standard statistics.
• pXRF technology allows for the collection of large datasets; ensure that you extract all of the value that you possibly can.