Maximising the value of pXRF data

Michael Gazley | Senior Research Scientist

13 November 2015

MINERAL RESOURCES

With contributions from: Katie Collins, Ben Hines, Louise Fisher, June Hill, Angus McFarlane, Jess Robertson & René Sterk


Overview

• How good is pXRF data?

• How do you make sure your data are good?

• Multivariate data

• Issues with compositional data

• Principal component analysis (PCA)

• The Teapot

• Case studies 1 & 2

• Cluster analysis

• Case studies 3 & 4

• Concluding remarks

2 |

How good is pXRF data?

Fisher et al. (2014); Gazley et al. (in prep.)

[Figure panels: Rb, Sr, K, Zn]

3 |

How do you make sure your data are good?

4 |

5 |

Instrumentation

Goodale et al. (2014)

6 |

Nature of the material to be analysed

Gazley & Fisher (2014)

7 |

Nature of the material to be analysed

Parsons et al. (2014)

8 |

Nature of the material to be analysed

Gazley & Fisher (2014)

9 |

Presentation of the sample to the unit

Parsons et al. (2014)

10 |

Calibration and reference materials

Fisher et al. (2014)

11 |

Validation and presentation of data

Gazley & Fisher (2014)

12 |

Top tips for ensuring good data

1. Ensure the sample is dry.

2. Present the sample as well as you possibly can (e.g. in a sample cup with Mylar film). Reducing the particle size usually gives the best results.

3. Ensure the standards are appropriate (matrix-matched) and that there are enough of them.

4. Send a subset of samples (perhaps 5%) for laboratory analysis.
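Tip 4 is easier to audit if the check subset is drawn at random with a fixed seed and the paired results are then compared element by element. Below is a minimal, standard-library-only sketch; the 5% default, the seed, and the use of an ordinary least-squares fit plus r² as the comparison are assumptions for illustration, not the method used in the cited studies.

```python
import math  # noqa: F401  (kept for users who prefer math.ceil over round)
import random


def pick_lab_subset(sample_ids, fraction=0.05, seed=42):
    """Randomly choose ~fraction of the samples (at least one) for lab analysis."""
    ids = list(sample_ids)
    n = max(1, round(fraction * len(ids)))
    # A seeded generator makes the selection reproducible for reporting
    return sorted(random.Random(seed).sample(ids, n))


def compare_to_lab(pxrf, lab):
    """Least-squares fit lab = slope * pxrf + intercept, with r^2, for one element."""
    n = len(pxrf)
    mx, my = sum(pxrf) / n, sum(lab) / n
    sxx = sum((x - mx) ** 2 for x in pxrf)
    syy = sum((y - my) ** 2 for y in lab)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pxrf, lab))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2
```

A slope near 1, an intercept near 0 and a high r² suggest that pXRF and laboratory values agree for that element; large departures point back to the instrument, sample presentation or calibration issues covered in the earlier slides.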

13 |

Reporting pXRF data (JORC or otherwise)

The multivariate problem

• Datasets in geology tend to be high-dimensional.

• Whatever we do, we do it through space, through time, or both.

• Humans are very good at seeing patterns.

• But sometimes the sheer size of a dataset is overwhelming.

14 |

Disclaimer

• I am not a statistician.

• I am not a mathematician.

• I am a geologist who has found a need for multivariate methods to help us navigate n-dimensional space.

• Multivariate ordinations are not new; they have been around for a long time. Geologists just seem to be slow to adopt them.

15 |

Missing Data

• The dataset cannot contain missing values.

• Missing values must be substituted or imputed.

• <10% missing: substitute 66% of the limit of detection (LOD)

• 10-30% missing: impute the missing values

• >30% missing: discard the element
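The thresholds above can be applied column by column. Here is a minimal NumPy sketch. It assumes a samples × elements array in which missing or below-detection values are stored as NaN, and that the LOD of each element is known. The slide does not say which imputation method to use for the 10-30% band. The random draw between 10% and 100% of the LOD used below is a hypothetical placeholder only, and a model-based imputation would normally replace it.

```python
import numpy as np


def handle_missing(X, lod, names, seed=0):
    """Apply the <10% / 10-30% / >30% missing-value rule to each element (column).

    X     : samples x elements array, missing/below-LOD values as NaN
    lod   : limit of detection for each element (same units as X)
    names : element names, one per column
    Returns the cleaned array and the names of the elements that were kept.
    """
    X = np.asarray(X, dtype=float)
    lod = np.asarray(lod, dtype=float)
    rng = np.random.default_rng(seed)
    kept_cols, kept_names = [], []
    for j, name in enumerate(names):
        col = X[:, j].copy()
        missing = np.isnan(col)
        frac = missing.mean()
        if frac > 0.30:
            continue  # >30% missing: discard the element
        if frac < 0.10:
            col[missing] = 0.66 * lod[j]  # <10% missing: substitute 66% of LOD
        else:
            # 10-30% missing: impute. PLACEHOLDER ONLY: random values between
            # 0.1*LOD and LOD; swap in a proper imputation method in practice.
            col[missing] = rng.uniform(0.1 * lod[j], lod[j], size=missing.sum())
        kept_cols.append(col)
        kept_names.append(name)
    if not kept_cols:
        return np.empty((X.shape[0], 0)), kept_names
    return np.column_stack(kept_cols), kept_names
```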

16 |

Closure and log-ratio transforms

• Geochemical data are typically reported as compositions.

• They sum to a constant: 100% or 1,000,000 ppm.

• These data are “closed”.

• For an n-component composition, only n − 1 components are required (Buccianti & Grunsky, 2014).

• Standard statistics cannot be applied to closed data because they produce spurious correlations.

• The log-ratio transforms of Aitchison (1982, 1986) move the data into real-number space.

• Log-ratio transformations allow us to make meaningful statements about compositional data.

• There are several log-ratio transforms (e.g. the additive, centred and isometric log-ratios: alr, clr, ilr), each with a different purpose.
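The centred log-ratio (clr) transform is one of Aitchison's log-ratio transforms. It divides each part by the geometric mean of its sample and then takes the log. Below is a minimal NumPy sketch. It assumes strictly positive data, so zeros and missing values must already have been dealt with, e.g. with the rules on the Missing Data slide.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform: log of each part over the row's geometric mean.

    X is a samples x parts array of strictly positive concentrations.
    """
    X = np.asarray(X, dtype=float)
    if np.any(np.isnan(X)) or np.any(X <= 0):
        raise ValueError("clr needs strictly positive values; treat zeros/missing data first")
    log_x = np.log(X)
    return log_x - log_x.mean(axis=1, keepdims=True)
```

Because the geometric mean cancels any constant factor, the result is the same whether the data are in % or ppm, and each transformed row sums to zero.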

17 |

Principal component analysis (PCA)

• PCA is an ordination.

• All it does is reorient and rescale your data. Point-point relationships are preserved; PCA just makes it easier to see structure.

• PCA does a couple of really useful things:

• It quantifies how much of the variance in the dataset is summarised by each PC axis.

• It gives you a plot of loadings that you can use to understand which of your original variables are driving the variance in the dataset; it is human-readable.
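The two outputs described above, variance per PC and loadings, both come from a singular value decomposition of the column-centred data. Below is a minimal NumPy sketch. The lognormal "ppm" data are synthetic and only there to make the example run, and the clr step assumes the data contain no zeros.

```python
import numpy as np


def pca(Z):
    """PCA of a samples x variables array via SVD of the column-centred data.

    Returns scores (samples on the PC axes), loadings (variables x PCs) and the
    fraction of total variance summarised by each PC.
    """
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # variance fraction per PC, largest first
    scores = U * s                   # equivalent to Zc @ Vt.T
    loadings = Vt.T                  # column k: weight of each variable on PC k
    return scores, loadings, explained


# Illustrative data only: 50 samples x 5 elements, in ppm
rng = np.random.default_rng(1)
ppm = rng.lognormal(mean=5.0, sigma=0.5, size=(50, 5))
Z = np.log(ppm)
Z -= Z.mean(axis=1, keepdims=True)  # clr transform before PCA
scores, loadings, explained = pca(Z)
print(np.round(explained, 3))
```

The SVD only rotates the centred data, so the distances between samples in score space match those in the transformed data. Because clr rows sum to zero, the last PC carries essentially no variance.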


18 |

Imagine your dataset as a teapot...

What’s the best way to look at a teapot so that you can best understand what shape it is?

19 |

Orientating the teapot

20 |

Other ordinations

• PCA is to ordinations as vanilla is to ice-cream flavours.

• It works with most things, but there are plenty of other ordinations to choose from, and some of those might suit you better or be useful in combination with PCA.

• A priori groupings?

– Canonical Variates Analysis (CVA) or Linear Discriminant Analysis (LDA)

• Both categorical and continuous data?

– Canonical Correspondence Analysis (CCA) and Detrended Correspondence Analysis (DCA)

• Variables not normally distributed?

– Independent Component Analysis (ICA)

21 |

Implementation

• Many different PCAs (and, in some cases, other ordinations) can be run very easily in a range of programs: various stats software, MATLAB, ioGAS, PAST and R.

• R can do PCA in a multitude of ways:

– Base package [stats] has prcomp and princomp

– Also found in additional packages [FactoMineR, ade4, amap, pcaPP] … probably more!

– Also ‘robust PCA’, ‘sparse PCA’ and ‘robust sparse PCA’

22 |

Case study 1 – Agnew gold mine

Barnes et al. (2014); Fisher et al. (2014)

23 |

Au is associated with Ca (calcic amphibole), not biotite

Case study 2 – Dolerites

Gazley et al. (2014)

24 |

Case study 2 – Dolerites

Gazley et al. (2014)

25 |

Case study 2 – Dolerites

Gazley et al. (2014)

26 |

Case study 2 – Dolerites

27 |

Case study 2 – Dolerites

Gazley et al. (2014)

28 |

Cluster analysis

• What if PCA has done a good job but you’ve still got too much overlap to draw your own lines between groups of data?

• This is where cluster analysis comes in.

• Cluster analysis finds groups by looking at distances between points.

• It doesn’t know what your data are and it doesn’t care; it is only interested in point-point relationships.

• So yes, different clustering methods will find different groups!
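As an example of a distance-based method, k-means (Lloyd's algorithm) fits in a few lines of NumPy. It is only one clustering method among many, since the slide does not name one. The farthest-first initialisation is an assumption chosen so that the result is deterministic for a given seed. In practice the input would be PCA scores or log-ratio-transformed data.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialisation.

    Returns one integer label per row of X and the k cluster centres.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first start: a random point, then repeatedly the point farthest
    # from all centres chosen so far (deterministic given the seed; outlier-sensitive)
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d2.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

Only distances enter the calculation, which is why a different distance measure or algorithm can return different groups from the same data.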

29 |

Clustering the teapot

30 |

Clustering the teapot

• There will be points that could belong to more than one group.

• How you deal with those depends on the methods you choose and your own judgement.

• Cluster analysis cannot and will not solve this problem for you!
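One simple way to make these borderline points visible, rather than letting a clustering method assign them silently, is to compare each point's distance to its nearest and second-nearest cluster centre. This is a hypothetical NumPy sketch. The 0.8 cut-off is arbitrary and is exactly the kind of judgement call described above.

```python
import numpy as np


def flag_ambiguous(X, centres, ratio=0.8):
    """True where a point's nearest centre is not clearly closer than the next one.

    ratio is a hypothetical cut-off on d_nearest / d_second (1.0 = equidistant).
    Needs at least two centres.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(centres, dtype=float)
    d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, 0] / d_sorted[:, 1] > ratio
```

The centres can come from any clustering run. Flagged points can then be reviewed individually, for example against the geology or sample notes.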

31 |

The data analysis workflow

Gazley et al. (2015)

32 |

Case study 3 – East Coast Basin, NZ

Hines et al. (2015; in prep.)

• Whangai/Waipawa/Wanstead Formations

• East Coast of North Island

• Homogeneous, brown, boring – except …

• The Waipawa Fm is a potential hydrocarbon source rock.

• Sediment provenance is of interest for palaeoenvironmental reasons.

33 |

Case study 3 – East Coast Basin, NZ

Hines et al. (2015; in prep.)

• pXRF dataset from six measured sections along the East Coast.

34 |

Hines et al. (2015; in prep.)

35 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

36 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

• Data were collected with a Niton XL3t GOLDD pXRF unit on a nominal 40 m × 80 m grid.

• The pXRF unit was used in the field after digging a ~20 cm pit.

• Ta and Sn are poorly determined by pXRF because of spectral overlaps with Cu/Zn and K/Ca, respectively.

• After anomalism was detected in this survey, a 100 m × 300 m grid was sampled and the samples were sent for laboratory analysis.

• Both sample sets were estimated into 100 m × 100 m cells in 3DS Surpac.
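The slide does not say which estimator was used in Surpac. As a stand-in only, inverse-distance weighting (IDW) of point values onto 100 m × 100 m cell centres can be sketched in NumPy. The power of 2 and the grid origin (anchored at multiples of the cell size) are assumptions. A geostatistical method such as kriging may well be what was actually used.

```python
import numpy as np


def idw_to_grid(x, y, values, cell=100.0, power=2.0):
    """Inverse-distance-weighted estimates at the centres of square cells.

    x, y, values : sample easting, northing (m) and concentration
    Returns the cell-centre coordinates (cx, cy) and a (len(cy), len(cx)) array.
    """
    x, y, values = (np.asarray(a, dtype=float) for a in (x, y, values))
    # Cell grid anchored at multiples of `cell`, covering all samples
    x0 = np.floor(x.min() / cell) * cell
    y0 = np.floor(y.min() / cell) * cell
    nx = int((x.max() - x0) // cell) + 1
    ny = int((y.max() - y0) // cell) + 1
    cx = x0 + cell * (np.arange(nx) + 0.5)
    cy = y0 + cell * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(cx, cy)
    d = np.hypot(gx[..., None] - x, gy[..., None] - y)  # (ny, nx, n_samples)
    with np.errstate(divide="ignore"):
        w = 1.0 / d**power
    exact = d == 0
    # Where a sample sits exactly on a cell centre, use that sample's value alone
    w = np.where(exact.any(axis=-1, keepdims=True), exact.astype(float), w)
    est = (w * values).sum(axis=-1) / w.sum(axis=-1)
    return cx, cy, est
```

Estimating the pXRF and laboratory sample sets onto the same cells is what allows the two surveys, collected on different grids, to be compared cell by cell.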

37 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

38 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

39 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

[Figure: PC1 vs PC2]

40 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

41 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

[Figure: PC1 vs PC2]

42 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

43 |

Conditional probability

Hill et al. (2014)

44 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

• Sn >150 ppm in the pXRF dataset corresponds to >90 ppm in the laboratory dataset, which is truly anomalous.

• Fe, Ti, Zr and Mn concentrations, together with the samples with Sn >150 ppm (8% of the samples), were used to predict the probability of anomalous Sn in every sample.

• Rb, Ca and Sr were left out in case they were mobile during weathering.
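The slides do not describe how the conditional probability of Hill et al. (2014) is computed. The sketch below is a simple substitute for the idea and is explicitly not their method. It is a k-nearest-neighbour estimate: for each sample, the fraction of anomalous samples (here, pXRF Sn > 150 ppm) among its nearest neighbours in standardised log(Fe, Ti, Zr, Mn) space. The default k = 10 and the log-standardisation are assumptions.

```python
import numpy as np


def knn_conditional_probability(features, is_anomalous, k=10):
    """Estimate P(anomalous | features) for every sample.

    features     : samples x variables array of positive concentrations
                   (e.g. Fe, Ti, Zr, Mn)
    is_anomalous : boolean per sample (e.g. pXRF Sn > 150 ppm)
    The estimate is the fraction of anomalous samples among each sample's k
    nearest neighbours (excluding itself) in standardised log space.
    """
    F = np.log(np.asarray(features, dtype=float))
    F = (F - F.mean(axis=0)) / F.std(axis=0)
    y = np.asarray(is_anomalous, dtype=float)
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y[nearest].mean(axis=1)
```

With hypothetical per-sample arrays `fe`, `ti`, `zr`, `mn` and `sn_pxrf`, the call would be `knn_conditional_probability(np.column_stack([fe, ti, zr, mn]), sn_pxrf > 150)`. Leaving out potentially mobile elements such as Rb, Ca and Sr simply means not including them as columns.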

45 |

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

Conditional probability based on Fe, Ti, Zr and Mn

[Map annotation: “Ignore anomaly here”]

46 |

Exploration targets

Case study 4 – Mozambique soil samples

Sterk et al. (in review)

47 |

Concluding remarks

• pXRF data are fit for many purposes.

• You can collect datasets that may contain elements you otherwise would not have paid for.

• But you must keep on top of recording all of the metadata that tell you (and others) how good (or not) the data really are.

• Multivariate methods can reveal underlying structure and provide ways to visualise big data.

• You can formulate hypotheses using PCA and cluster analysis, which can then be tested using standard statistics.

• pXRF technology allows large datasets to be collected; make sure you extract all the value you possibly can.

48 |

Questions?

49 |

MINERAL RESOURCES

Thank you

Michael Gazley Senior Research Scientist

t +61 8 6436 8501 e [email protected] w www.csiro.au/