

Chemometrics and Intelligent Laboratory Systems xxx (2015) xxx–xxx

Design of experiments and data analysis challenges in calibration for forensics applications

Christine Anderson-Cook, Tom Burr ⁎, Michael S. Hamada, Christy Ruggiero, Edward V. Thomas

30th Anniversary Issue in Chemometrics and Intelligent Laboratory Systems

⁎ Corresponding author. E-mail address: [email protected] (T. Burr).

http://dx.doi.org/10.1016/j.chemolab.2015.07.008
0169-7439/© 2015 Elsevier B.V. All rights reserved.


Article info

Article history: Received 17 April 2015; Received in revised form 7 July 2015; Accepted 8 July 2015; Available online xxxx

Keywords: Design of experiments; Measurement error; Multivariate calibration; Nuclear forensics

Abstract

Forensic science aims to infer characteristics of source terms using measured observables. Our focus is on statistical design of experiments and data analysis challenges arising in nuclear forensics. More specifically, we focus on inferring aspects of experimental conditions (of a process to produce product Pu oxide powder), such as temperature, nitric acid concentration, and Pu concentration, using measured features of the product Pu oxide powder. The measured features, Y, include trace chemical concentrations and particle morphology such as particle size and shape of the produced Pu oxide powder particles. Making inferences about the nature of inputs X that were used to create nuclear materials having particular characteristics, Y, is an inverse problem. Therefore, statistical analysis can be used to identify the best set (or sets) of Xs for a new set of observed responses Y. One can fit a model (or models) such as Y = f(X) + error, for each of the responses, based on a calibration experiment and then "invert" to solve for the best set of Xs for a new set of Ys. This perspectives paper uses archived experimental data to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify a best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions and different correlation structures between the responses. We also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y.


1. Introduction

Forensic science is a broad discipline that aims to infer characteristics of source terms using measured observables. In many cases, the inferred source terms can be traced to an attribution goal. For example, if certain ranges of input Pu concentration could be ruled out, then particular processes that otherwise might have produced the product could be ruled out. Our focus is experiment design and data analysis challenges arising in nuclear forensics. We focus on inferring aspects X of experimental reaction conditions using measured features Y of the reaction product. The word "signature" is a useful qualitative term that conveys shifts in the responses Y as a function of changes in the processing conditions X. Such a task falls within the scope of the Department of Homeland Security National Technical Nuclear Forensics Center, which is sponsoring a Plutonium processing signatures multi-year project to identify signatures of nuclear forensic value in plutonium (Pu) materials that can be related to the processing conditions used to produce them. One initial goal is to identify possible signatures derived from PuO2 produced via the Pu(III) oxalate precipitation process shown in Fig. 1 [11,14,20,61]. The inferred processing conditions could help indicate what facility and settings were used to make the interdicted special nuclear material.

The context we focus on in this perspectives paper is to produce a variety of PuO2 materials via a statistically designed experiment, currently in the planning stages, in which certain process factors are deliberately varied over a course of experimental trials. Other process factors not discussed here will be held constant. Each experimental trial involves specific settings for each process factor. Once produced, the materials are analyzed to characterize their morphological, chemical and physical properties (which may include trace element concentrations, surface area, crystalline phase, and porosity). Collectively, it is expected that the properties of the materials produced will span a wide range. Design of experiments enables a causal relationship between the factors and responses to be systematically explored and functional models describing the relationship between them to be estimated. Ultimately, the goal is to be able to infer the set of experimental conditions used to produce the material, based on its observed properties, using an inverse modeling approach.

In anticipation of the new experimental data in the near future, this perspectives paper uses archived experimental data from a previous similar process to consider aspects of data collection and experiment design for the calibration data to maximize the quality of the predicted Ys in the forward models; that is, we assume that well-estimated forward models are effective in the inverse problem. In addition, we consider how to identify the best solution for the inferred X, and evaluate the quality of the result and its robustness to a variety of initial assumptions and different correlation structures between the responses. We also briefly review recent advances in metrology issues related to characterizing particle morphology measurements used in the response vector, Y, and describe the needed statistical tools that have been developed for chemometrics and other applications. Also, new technical challenges arise in this forensics context, including: the need to characterize measurement error, with possible censoring (data reported as "less than a particular threshold" or "more than a threshold"); making use of interlaboratory sample exchange programs to help estimate measurement biases; new assay protocols, particularly for particle morphology measurements using modern image analysis to characterize particle size and shape; the need to allow for a combination of qualitative and quantitative factors; and the fact that regardless of what functional form is assumed to relate X to Y, once the data are collected, many functional forms will be evaluated.

Fig. 1. PuO2 production via a Pu(III) oxalate process, one of many possible Pu precipitation and crystallization methods for the conversion or recovery of plutonium. Pu(III) oxalate precipitation has been used since the Manhattan Project because of its good Pu recovery and ability to be filtered. Experimental reaction conditions include, for example, the form of the oxalate and the order of addition (Pu to oxalate or vice versa), reagent/reactant concentrations and addition rates, nitric acid concentration, temperature and duration, and stir rates. These Xs create different features Y in the PuO2. The Y features include particle morphology (size, shape, structure, porosity), particle agglomeration, and trace element contaminant concentrations. Literature suggests that the process removes impurities such as Al, Fe, and UO2, but there is less "decontamination" from Na, Ca, K, and none from americium [22]. Different precipitation processes have different decontamination factors, so these trace elements can be a process signature.

We have begun to generate candidate experiment designs following established optimality criteria for the forward model with an approximately known functional form. Simulation is expected to be a key analysis tool to address the metrology, experiment design assessment, calibration, and model selection challenges. This allows candidate designs to be compared across different potential outcomes, incorporating scenarios described in Sections 5 and 6.

This perspectives paper reviews multivariate calibration and experiment design for multivariate calibration, and describes to what extent previous literature helps address the example nuclear forensics application. Section 2 gives more background. Section 3 describes multivariate calibration and experiment design as used in chemometrics, with a focus on the inverse problem. Section 4 describes related metrology issues, focusing on particle morphology measurements (which serve as predictors of X in the inverse problem) and on recent ongoing efforts to improve morphology measurements. Section 5 describes current progress in the context of the motivating nuclear forensics example. Sections 6 and 7 include research directions and a summary.

2. Background

We use the following notation. There are p factors (predictors) and q responses.

Factors = Processing conditions such as temperature, nitric acid and Pu concentrations, X1, X2, …, Xp. Complete factor set: X = {X1, X2, …, Xp}.

Responses = Measurements of processed material such as particle size and shape, Y1, Y2, …, Yq.

Signature = Complete set of responses, Y = {Y1, Y2, …, Yq}.

Forward (causal) models are often developed using the results of a controlled experiment. Forward models are often (but not necessarily) expressed in terms of a low-order polynomial, which can be thought of as a Taylor series approximation to the true underlying relationship. Such models relate a specific response (Yi) to the complete set of factors. Generically, this relationship can be expressed as Yi = fi(X1, X2, …, Xp). The main forensics goal is to use the signature acquired from material of an unknown pedigree to infer the conditions used to process the material. Inverse prediction, generically expressed by $\hat{X}_j = g_j(Y_1, Y_2, \ldots, Y_q)$, can be used to predict the value of the jth process factor based on measured properties. One way to perform inverse prediction is with a collection of fitted forward models from the signature of the unknown material. Alternatively, one might use the data acquired from the experiment to form a training set from which an inverse model can be developed directly, without using forward models [29,30,37,38,46]. While physics and chemistry dictate the causal relationship between factors and responses, experimenters can exert control over the complexity of the fitted forward models via selection of factors, factor levels, and the set of trials performed. Experimenters also influence the effectiveness of the inverse modeling process by selecting a sufficiently informative/discriminating set of response variables (the "signature") that can be used to unambiguously resolve the various factors. For a given sample, the number of responses comprising the signature may be limited by difficulty, expense, and/or availability of material. We believe that there should be at least as many response variables as factors (q ≥ p) to successfully deduce the complete set of conditions used to process an unknown material. With fewer responses (q < p), there is considerable potential for ambiguous results with non-unique solutions.

To illustrate, Fig. 2a is a principal coordinate plot [62] of the distances between the X vectors for each of the 72 samples. The X vector is the percentages of fat, sucrose, dry flour, and water in a cookie recipe. The response vector, Y, is 700 near-infrared (NIR) reflectance measurements from 1100 to 2498 nm in steps of 2 nm. These data are freely available from the R [49] package ppls (which provides functions for linear and nonlinear regression based on partial least squares and penalization techniques; see Section 3.1). From Fig. 2a, we select samples 24 and 51 (in the bottom left corner of the plot), which are close in the X-space, and sample 19 (top right), which is far away. We plot the corresponding three spectra in Fig. 2b. Qualitatively, in this case, a small (large) distance in X-space corresponds to a small (large) distance in Y-space between samples, which indicates that one could anticipate using Y to predict X with reasonable success.
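The plots in Fig. 2 can be reproduced along the following lines in R; this is a minimal sketch, assuming the ppls package is installed and that its cookie data frame stores the 700 NIR channels in the first 700 columns and the four ingredient percentages in the last four columns.

```r
## Principal coordinates of the X vectors and three example spectra;
## the column layout of "cookie" is an assumption stated above.
library(ppls)
data(cookie)
X <- as.matrix(cookie[, 701:704])  # fat, sucrose, dry flour, water
Y <- as.matrix(cookie[, 1:700])    # NIR reflectance, 1100-2498 nm by 2 nm

## Fig. 2a: principal coordinate plot of distances between X vectors
pc <- cmdscale(dist(X), k = 2)
plot(pc, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(pc, labels = seq_len(nrow(X)))

## Fig. 2b: spectra for two close samples (24, 51) and a distant one (19)
wavelength <- seq(1100, 2498, by = 2)
matplot(wavelength, t(Y[c(24, 51, 19), ]), type = "l", lty = 1,
        xlab = "Wavelength (nm)", ylab = "NIR reflectance")
```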

In processes to produce Pu oxide, we estimate the relationship between the chemical engineering processing parameters employed and the physical, chemical, and morphological characteristics of the produced materials. The response Ys used to infer processing conditions include morphological features and trace chemical concentrations, but could also include other analytical chemistry measurements (such as crystallographic phase, surface area, chemical form, heterogeneity, oxidation state, isotopic composition, nitrate/chloride/sulfate/hydroxide/carbide or other anion or organic contaminants present). Inferred processing conditions such as temperature, Pu and nitric acid concentration could indicate the facility used to produce any future interdicted Pu powder. Due to the large number of chemical engineering flow sheets used historically, process variations within each flow sheet, and the potential for even more variations among international and subnational groups, it is not feasible to evaluate all possible combinations in laboratory tests. For this reason, a statistically designed study reduces the number of laboratory tests (runs) conducted. A well-designed study enables modeling of main effects and low-to-mid order interaction effects of each process parameter (factor) on the measured product characteristics (outcomes/responses).

Fig. 2. Analytical chemistry assay methods such as NIR are used to infer trace chemical properties. (a) Principal coordinate plot of the 72 mixtures of four ingredients; (b) example NIR spectra from 3 of the 72 mixtures defined by sucrose, flour, water, and fat percentages. Mixtures 24 and 51 are close in the percentages of the four ingredients (see plot (a)) and both are relatively distant from the mixture 19 percentages. The NIR responses for mixtures 24 and 51 are similar to each other, but both differ markedly from the response for mixture 19.

Morphological properties of the produced PuO2 powder are among the key features used (as predictors in the inverse problem) to infer X. However, not all morphology features are well-defined responses. There are well-known, studied and standardized techniques to measure some defined morphological attributes. For example, there are multiple methods (multiple instrument types) to measure particle size [33,42], which is typically reported as a distribution. Particle distributions are often used to characterize the range of sizes (or other attributes) that a process generates, and can readily be used to predict X in ways similar to Fig. 2 for the example NIR spectra. Microscopy methods (scanning electron, optical, transmission electron or scanning probe/atomic force) are widely used to characterize the size and morphology of precipitated materials; however, despite being one of the most common analytical techniques, microscopy does not typically generate a directly usable morphological "response" (an exception is X-ray analysis techniques). Instead, microscopy generates an image (or ideally, multiple images) from which we derive the Y morphological responses. This process is itself an inverse problem in the sense of using the image(s) to define Y, so we have Image → Y → X.

Fig. 3. Morphology responses as determined from microscopy images. Although there are many best techniques to reduce bias in the first step (sample prep), the subsequent steps of image generation, processing and analysis of that image are large sources of uncertainty in any morphological attribute. Even given a set of images ideally suited for image analysis, there are many possible methods to get quantitative information about the material, and there is unlikely to be a single method universally applicable and useful for all materials.

The quality and quantity of image information generated by microscopy have dramatically increased in the last 10 years. Quantitative image analysis techniques, long developed in the computer vision and pattern recognition fields, can be applied to streams of microscopy and other tomographic and spectroscopic image data [1,26,63].

A typical image analysis pathway is shown in Fig. 3. Several image processing methods can be used to segment out "particles" from the image background. These segmented objects are then used to encode shape. Although this seems straightforward, the encoding can be done in many ways. The underlying representation can be a set of pixels and pixel locations, a vector, or a level set. That representation can be encoded as a set of features, which can be commonly understood "shape" attributes (size, length of longest diameter, aspect ratio, circularity, and fractal dimension), orthogonal basis sets (Zernike polynomials and Hu moments), or other representations [7,35,56,66]. They can be selected a priori, or learned from the data. Alternately, the statistical representation of the image can be used directly, without initial object segmentation. In our experience, chemists are most "comfortable" with a classical segmented-object method and prefer a priori defined conventional features, even though they are highly correlated (see Section 5) [47].
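To make the feature-encoding step concrete, the following is a minimal base-R sketch (not the MAMA software of Fig. 4) that computes a few a priori shape attributes from a binary particle mask; the boundary-pixel perimeter estimate is deliberately crude, so the values are illustrative only.

```r
## Encode a segmented particle (binary 0/1 pixel mask) as a priori
## shape features: area, perimeter, aspect ratio, circularity.
shape_features <- function(mask) {
  px <- which(mask == 1, arr.ind = TRUE)   # pixel locations of the object
  area <- nrow(px)
  ## boundary pixels: those with at least one 4-neighbor outside the object
  padded <- matrix(0, nrow(mask) + 2, ncol(mask) + 2)
  padded[2:(nrow(mask) + 1), 2:(ncol(mask) + 1)] <- mask
  on_boundary <- apply(px, 1, function(p) {
    i <- p[1] + 1; j <- p[2] + 1
    min(padded[i - 1, j], padded[i + 1, j],
        padded[i, j - 1], padded[i, j + 1]) == 0
  })
  perimeter <- sum(on_boundary)            # crude boundary-pixel count
  aspect <- diff(range(px[, 1])) / max(1, diff(range(px[, 2])))
  circularity <- 4 * pi * area / perimeter^2   # 1 for a perfect disk
  c(area = area, perimeter = perimeter,
    aspect_ratio = aspect, circularity = circularity)
}

## e.g., a 5 x 9 rectangular "particle":
mask <- matrix(0, 7, 11); mask[2:6, 2:10] <- 1
shape_features(mask)
```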

Size and shape are only two of the morphological aspects needed to have effective and robust response factors for the inverse problem. These do not capture the morphology information that is in the spatial/structural arrangement of the sub-particles, or the surface features, texture, topology, or other "morphology" attributes. These attributes are harder to use robustly. For example, in Fig. 4, the "bladed" bow-shaped particle is in between the size of the two "rough" particles. Far more information than size, shape, and surface texture would be needed to accurately represent these particle differences and similarities. In some cases, this information can be more directly acquired through 3D imaging (surface topology, SEM-FIB, SEM-stereo imaging, or confocal microscopy) or surface probe measurement (AFM).

Descriptive terminology, using defined lexicons, is also useful and can be put into a statistical framework [6–8], for example by translating a set of descriptive morphology terms into a large feature vector. Some examples of descriptive terms are included in Section 5, where we provide numerical examples using archived morphological measurements from old experimental data. Methodologies for representing particle morphology are an active area of research, and "solutions" depend on the application (e.g., quantifying, categorizing, matching, sorting, ranking, comparing, etc.) [4].

Unlike trace element measurements, which yield scalar values, a chemical process produces a distribution of particles, regardless of the methods used to represent size, shape, or other morphological attributes. Representing the response Y using the estimated distribution itself, rather than using estimated distribution summaries (mean, mode, range, skewness, kurtosis, etc.), may be more effective for inferring the source X. Therefore, once the experiment is designed and run, product material will be measured in many ways, some of which will be similar to the archived measurements described in Section 5.



Fig. 4. PuO2 particle images courtesy of Angelique Wall & Troy Nothwang, MST-16, LANL. The shapes of the particles have been "quantified" using the MAMA software (Morphological Analysis of Materials) that Los Alamos National Laboratory has been developing for quantifying the size, shape, and morphological features of nuclear materials in a consistent and validated framework. This analysis is done by segmenting out the "large" particles in the front and then using a series of vector-based and pixel-based calculations. Although clearly different to the human observer, for most calculated attributes, the larger globular particle and the "wheat sheaf" particle in the second image are more similar. Attributes such as circularity, convexity, and some of the Hu moments can be used to group the particles as we see them [52]. The MAMA software is freely available by contacting the authors. In support of this forensics project, our ongoing research also includes characterizing the uncertainty in morphology measurements.


3. Multivariate calibration and design of experiments

Here we briefly describe multivariate calibration and design of ex-periments, which are two of the essential tools used in chemometricsdata collection and analysis.

3.1. Multivariate Calibration

Many readers are aware of various multivariate calibration optionsfor moderate-to-high dimensional spectral data common in analyticalchemistry to infer species concentration from a measured spectrum.But briefly, a common setting for multivariate calibration is high-dimensional spectral data, such as near-infrared (NIR) spectroscopy(Fig. 2b), consisting of hundreds to thousands of measured responsesfor each sample across an energy range. A common inverse problem isto use a measured spectrum to predict absolute or relative componentconcentrations in mixtures, or to identify which components are pres-ent in a sample.

Measured spectra of m calibration standards, each containing p (or fewer) chemical components, are usually analyzed using one of two approaches [29]. The first approach assumes that each component has an additive impact without interacting with other components, so the measured spectrum is adequately modeled as a linear combination of pure-component spectra, resulting in a model expressed (following suitable data transformation) as Y = Xβ + ε1, where Y is an m × q response matrix, m is the number of calibration standards, q is the number of energy channels (the number of responses in this context), X is the m × p matrix of component concentrations, β is the p × q matrix of absorptivity-path length products (estimated using calibration standards), and ε1 is a random error term that includes modeling and measurement error effects [13,29,34,41]. The classical least squares estimate of β from calibration is $\hat{\beta} = (X'X)^{-1}X'Y$. The least squares predictor for a q × 1 test spectrum $Y_t$ is then $\hat{X} = (\hat{\beta}\hat{\beta}^T)^{-1}\hat{\beta}Y_t$. Note that $(\hat{\beta}\hat{\beta}^T)^{-1}$, which is affected by the set of response variables that is chosen, relates to the selectivity/sensitivity of the signature [57]. In the case of a selective signature, the off-diagonal elements of $(\hat{\beta}\hat{\beta}^T)^{-1}$ will be relatively small. The parameter vector β characterizes which components are present in what amounts, which in some contexts is the main forensics task.

The second approach, inverse least squares, uses the model X = Yα + ε2, one component at a time, to directly fit each component concentration as a function of the measured response matrix, Y. During calibration, the least-squares solution for α for a given component is $\hat{\alpha} = (Y'Y)^{-1}Y'X$. The least-squares predictor for a test spectrum is then $\hat{X} = (\hat{\alpha}'\hat{\alpha})^{-1}\hat{\alpha}'Y$. One disadvantage of the second approach is that the analysis generally has to be restricted to a small number of energy channels, because the matrix which must be inverted has dimension equal to the number of channels, and this number cannot exceed m. This can be exacerbated by near-linear relationships between responses [29]. Ridge regression is sometimes used to overcome these problems [28].
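The two approaches can be contrasted with a small simulation; the following is a minimal sketch on simulated data (all names and dimensions are illustrative), using for approach 2 the direct inverse prediction x̂ = y′α̂ restricted to a few channels so that Y′Y is invertible.

```r
## Classical (approach 1) vs inverse (approach 2) least squares.
set.seed(1)
m <- 30; p <- 2; q <- 50                   # standards, components, channels
X <- matrix(runif(m * p), m, p)            # known component concentrations
beta <- matrix(rnorm(p * q), p, q)         # pure-component response matrix
Y <- X %*% beta + matrix(rnorm(m * q, sd = 0.05), m, q)

## Approach 1: fit Y = X beta + error, then invert for a "test" spectrum
beta_hat <- solve(t(X) %*% X, t(X) %*% Y)                    # p x q
y_t <- Y[1, ]
x_hat1 <- solve(beta_hat %*% t(beta_hat), beta_hat %*% y_t)  # p x 1

## Approach 2: fit X = Y alpha + error on a subset of channels (q' < m)
keep <- seq(1, q, by = 5)                                    # 10 channels
alpha_hat <- solve(t(Y[, keep]) %*% Y[, keep], t(Y[, keep]) %*% X)
x_hat2 <- t(Y[1, keep, drop = FALSE] %*% alpha_hat)          # p x 1

cbind(truth = X[1, ], classical = as.vector(x_hat1),
      inverse = as.vector(x_hat2))
```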


Both approaches use calibration-estimated parameters. The estimated parameters are then used as predictors in the prediction step, so there is a connection to the errors-in-variables literature [17,57–59]. Within the topic of multivariate calibration, there are many other options, such as partial least squares (PLS), that can improve the performance of classical or inverse least squares [13,28,29,41]. Methods such as PLS (which is a version of the inverse approach) can use all available channels and can succeed without specific knowledge of all of the compositional factors involved, and are thus applicable to a greater variety of problems. Such methods rely on model development based on a training set that incorporates variation from sources in a way such that the training set spans all (or most) situations that might arise during the prediction step. The training set can be specified via a passive approach (random calibration) or preferably an active approach [60]. A considerable amount of chemometrics literature exists on multivariate calibration options, usually with a modest number of calibration samples and a large number of energy channels, such as in the cookie example of Fig. 2.

For illustration, we used the m = 72 samples in the cookie experiment that are available in the R package ppls. We applied leave-10-out-at-a-time cross-validation (CV) to estimate the root mean squared prediction error (RMSE) for each of the 4 component relative concentrations, for both classical (approach 1) and inverse (approach 2) least squares. In $10^4$ repeats of leave-10-out-at-a-time CV, the approach 1 RMSEs for components 1 to 4 are 1.4, 7.7, 23.8, and 0.8, respectively (repeatable to the number of digits shown). The approach 2 RMSEs are 5.6, 15.7, 7.8, and 1.0, respectively. These relatively simple NIR data have been used in many chemometrics demonstrations to compare the performance of PLS, classical least squares, and other multivariate calibration options.
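A minimal sketch of this CV scheme follows, with randomly chosen leave-10-out splits and an arbitrary channel subset for the inverse approach (so the RMSEs will not reproduce the values above); the cookie column layout is assumed as in the earlier sketch.

```r
## Leave-10-out cross-validation for inverse least squares on the cookie data.
library(ppls)
data(cookie)
Y <- as.matrix(cookie[, 1:700]); X <- as.matrix(cookie[, 701:704])
keep <- seq(1, 700, by = 140)                  # 5 channels, well below m = 72
set.seed(2)
n_rep <- 100                                   # the text uses 10^4 repeats
sq_err <- matrix(0, n_rep, ncol(X))
for (r in 1:n_rep) {
  test <- sample(nrow(Y), 10)                  # leave 10 samples out
  a <- solve(t(Y[-test, keep]) %*% Y[-test, keep],
             t(Y[-test, keep]) %*% X[-test, ])
  pred <- Y[test, keep] %*% a
  sq_err[r, ] <- colMeans((pred - X[test, ])^2)
}
sqrt(colMeans(sq_err))                         # RMSE per component
```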

Note that in our example nuclear forensics context, Y = Xβ + ε1, where Y is an m × q response matrix, m is the number of experimental runs used for calibration, and q is the number of responses. It is currently unknown how many responses will be collected in the upcoming experiment, but the example in Section 5, using archived data from a similar experiment, assumes we have q = 5 responses. Note that spectral data could be used, for example, to infer trace chemical concentrations, but it would be the estimates of those concentrations used as responses, so any spectral analysis would be a side effort used to estimate chemical concentrations. Also note that in our forensics context (see the example in Section 5) the q responses will not have a smooth functional form as in the NIR spectra in Fig. 2.

3.2. Design of Experiments

Many readers are aware of design of experiment tools for chemometrics. However, some readers may not be aware how restrictive the term "optimality" is regarding designed experiments. Broadly, experiment design helps address what questions to answer from an experiment, and then focuses on efficient designs that extract maximum information within the available budget. The experimental budget relates to the number of experimental runs and conditions. Statistical design of experiment principles needed in the NIR example in Section 3.1 include: the possible need to include nuisance effects such as interfering (unknown or known) components present in some of the samples; the need for designs with good properties of least squares estimators such as $\hat{\beta} = (X'X)^{-1}X'Y$ in approach 1; and strategies such as the use of fractional factorial designs to reduce the number of experimental runs while still being able to estimate the main and low-order effects of each factor. Design of experiment principles utilized in the nuclear forensics context include limiting the number of factors studied with a screening strategy applied to the list of candidate factors, and consideration of the range of situations to be evaluated. For good prediction and to avoid extrapolation beyond the observed ranges, it is important that the set of trials span the range of situations where the inverse model is to be applied. Thus, in order to meet forensics objectives (both forward and inverse modeling), a sensible experimental strategy is one that allows for efficient estimation of forward model parameters such that:

1. the factor space is appropriately spanned where inverse model predictions are to be applied,

2. accurate forward models can be estimated (when the inverse prediction relies on the forward models), and

3. the experimental data provide a rich training set for fitting/evaluating an inverse model directly.

While it is difficult to give prescriptive recommendations about the ideal designed experiment to run across diverse applications, the general approach that we recommend is to use computer-generated designs from statistical software. An intuitive optimization criterion is I-optimality, which focuses on good prediction of new observations for the original model in approach 1. Myers et al. [44] give more details on this criterion. The computer-generated design also has the flexibility to accommodate any number of design runs, to avoid factor combinations that are not experimentally feasible, and to accommodate selected numbers of replications. Including replicates at some of the design locations is important to get a precise estimate of the natural variability of the process and to help calibrate model selection during the analysis phase. We describe additional design of experiment considerations in Section 6.
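For instance, an I-optimal design could be generated along the following lines; this is a minimal sketch assuming the CRAN package AlgDesign is available, with x1–x3 as illustrative coded stand-ins for temperature, nitric acid concentration, and Pu concentration.

```r
## Computer-generated I-optimal design via Federov exchange.
library(AlgDesign)
cand <- expand.grid(x1 = seq(-1, 1, 0.5),      # candidate factor levels
                    x2 = seq(-1, 1, 0.5),
                    x3 = seq(-1, 1, 0.5))
## 20-run I-optimal design for a full quadratic model in three factors
des <- optFederov(~ quad(x1, x2, x3), data = cand,
                  nTrials = 20, criterion = "I")
des$design
```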

4. Metrology

In Section 3.1 we considered the models Y = Xβ + ε1 (approach 1) and X = Yα + ε2 (approach 2). The errors ε1 or ε2 are usually assumed to be purely random, without bias or censoring effects. However, inter-laboratory studies [18] often show evidence of assay bias, and measurement systems often do not report a full instrument reading, opting instead to alter or "censor" instrument data. One common reason for censoring in measurement systems is rounding, where actual values are rounded to instrument resolution (interval censoring). Another common reason is that measurement systems often have a calibrated response only over a certain data range. Near the ends of the range (the lower detection limit at the low end of the range and the upper detection limit at the high end of the range), the system response is often not well characterized, so the protocol is to report a censored value such as "measurand value is less than a specified threshold" or "measurand value is more than a threshold."

Burr and Hamada [15] introduce a multiplicative measurement error model to characterize bias for repeatability and reproducibility (R&R) studies. The model takes the form:

$y_{ijk} = \mu_i (1 + \theta'_j + \varepsilon'_{ijk}) = \mu_i + \mu_i \theta'_j + \mu_i \varepsilon'_{ijk}$  (1)

for the ith item, jth laboratory, and kth repeat for scalar-valued $y_{ijk}$. The prime notation θ′ and ε′ is used to convey the fact that these terms (systematic and random errors, respectively) enter Eq. (1) in a multiplicative way with the true measurand value, μi. By contrast, if, for example, the data suggested an additive model, we would use the model $y_{ijk} = \mu_i + \theta_j + \varepsilon_{ijk}$ without any prime notation. Any estimate of repeatability variance or reproducibility variance requires a model such as Eq. (1). Burr et al. [18] define reproducibility as the sum of between-laboratory variance and repeatability variance within each lab in a sample exchange program, so there is a model (explicitly written or just assumed) invoked for measurement errors similar to Eq. (1). Two cautions are in order: first, estimation of variance components always requires some type of measurement error model such as Eq. (1) to define terms. Second, even minor changes to the model can lead to different conclusions, so model selection and evaluation using subject matter expertise are important.
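A minimal simulation sketch of Eq. (1), with illustrative variance components, shows the defining feature of the multiplicative model: the absolute spread of repeated measurements grows with the true value μi.

```r
## Simulate data from the multiplicative R&R error model of Eq. (1).
set.seed(3)
n_item <- 5; n_lab <- 3; n_rep <- 4
mu    <- runif(n_item, 10, 20)          # true measurand values
theta <- rnorm(n_lab, 0, 0.02)          # relative lab (systematic) biases
y <- array(0, c(n_item, n_lab, n_rep))
for (i in 1:n_item) for (j in 1:n_lab) for (k in 1:n_rep) {
  eps <- rnorm(1, 0, 0.01)              # relative random (repeatability) error
  y[i, j, k] <- mu[i] * (1 + theta[j] + eps)
}
## absolute spread grows with mu_i, the signature of a multiplicative model
apply(y, 1, sd)
```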

Regarding data censoring, Burr and Hamada [14] illustrate a straightforward numerical Bayesian analysis of such censored data based on the likelihood function. They provide a simulation study to show the impact of increased amounts of censoring, and show that if the true likelihood has an unknown form for measurand values near the threshold and another known form away from the threshold, then the common practice for the measurement system to report censored values rather than actual values is prudent. Simple open-source software to analyze censored data is provided, and both simulated and real example data were analyzed. The simulation study indicated how much information is lost due to having varying fractions of measurand values lying near the end of the instrument range. Measurement specialists report censored values not because statistical evaluations such as this suggest it is prudent, but because instrument vendors typically specify an instrument's range on the basis of its calibration range, verifying that instrument calibration is adequate across the specified range. In the context of a sample exchange program, Burr and Hamada [14] illustrated the importance of a well-understood instrument response for all measurand values within the specified instrument range.
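As a minimal sketch (not the exact model of Burr and Hamada [14]), a likelihood for threshold censoring can be written so that each value reported only as "less than the detection limit" contributes its censoring probability; normal measurement error and a lower detection limit dl are assumed here.

```r
## Maximum likelihood with left-censored observations at detection limit dl.
neg_loglik <- function(par, y_obs, n_cens, dl) {
  mu <- par[1]; sigma <- exp(par[2])           # sd parameterized on log scale
  -(sum(dnorm(y_obs, mu, sigma, log = TRUE)) + # fully observed values
      n_cens * pnorm(dl, mu, sigma, log.p = TRUE))  # each "< dl" report
}
set.seed(4)
y <- rnorm(200, 1, 0.5); dl <- 0.5             # true mu = 1, sigma = 0.5
fit <- nlminb(c(0, 0), neg_loglik, y_obs = y[y >= dl],
              n_cens = sum(y < dl), dl = dl)
c(mu = fit$par[1], sigma = exp(fit$par[2]))    # approximately recovers truth
```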


Table 1
Correlation matrix of the 5 responses from Burney and Smith [11].

        y1      y2      y3      y4      y5
y1    1.00   −0.69    0.05    0.86    0.96
y2   −0.69    1.00   −0.20   −0.75   −0.53
y3    0.05   −0.20    1.00    0.10    0.00
y4    0.86   −0.75    0.11    1.00    0.81
y5    0.96   −0.53    0.00    0.81    1.00



4.1. Metrology for Morphology Data

Morphology data, as with all laboratory measurement data, have measurement errors, including bias and censoring. These issues are often magnified during the interim image analysis steps. The error model must account for subsampling bias in sample preparation; magnification and region view bias; a number of instrumentation effects that cause uncertainty and noise in the image (for example, [32,40,55]); typical instrumentation calibration issues (SEMs are often stated to have 5% uncertainty in magnification, as well as many other calibration issues); image setting effects; human biases in imaging (a known tendency to select 'cool' looking particles for further analysis); image processing and analysis errors and uncertainty from both human and algorithm factors; and digitization limitations. Quantifying measurement uncertainty in nanoscale optical dimension measurements is becoming well understood, but does not capture all the uncertainty and bias factors that affect particle morphology assessments [23,36,48]. For applications requiring highly accurate metrology, such as pharmaceutical materials [53], microscopy image-based sizing is recommended only in combination with other direct measurement techniques because of the number of error and bias sources. One common source of error is not analyzing sufficient numbers of particles: it is hard to characterize and quantify enough particles in a set of images (with the particles at enough resolution to accurately reflect shape) to obtain a representative sampling of the particles that were produced, as needed to determine the actual particle distribution. Automated image analysis methods can help with this issue, and calculating uncertainty in particle size distributions [65] and quantifying bias and uncertainty in image segmentation [5,54,64] are ongoing research areas.

Conventional shape factors can be highly redundant [31], and methods for reducing them to a more representative subset are not often implemented. There are currently no certified particle shape standards, although some materials have been suggested. Shape validation is an active area of discussion [10], and validation methods that do not require a certified standard can readily be used. In fact, it has been suggested that image analysis requires validation beyond (i.e., not dependent on) basic metrology concepts [1]. Methods based on quantifying the distance or difference between particles with different morphology, rather than referenced to a metrology standard, are well suited for morphological analysis for inferring source process conditions [21] and allow for more robust representations of particle morphology.

5. Example results

We used archived morphological data collected from 40 runs of a PuO2 production experiment [11] to fit a forward model to each scalar-valued response y (mode size, lath length, width and thickness, and agglomeration), allowing for linear and quadratic terms, with interactions, of the form

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \beta_7 x_1^2 + \beta_8 x_2^2 + \beta_9 x_3^2 + \beta_{10} x_1 x_2 x_3 + \varepsilon,$

which is then assumed to be the true model [19] for each of the five morphological responses y. We created artificial data sets using the estimated full model as the true model and adding random error with the standard deviation estimated from the 40 experimental runs (with varying X values of nitric acid and Pu concentration and temperature).

To address model uncertainty, we used both the Bayesian information criterion (BIC) and cross-validation to assess which of 8 possible models was selected in each artificial data set. The 8 possible models were a subset of all $2^{11}$ possible models, and included using any of x1, x2, or x3 alone, all three of x1, x2, and x3, or all three of x1, x2, and x3 plus the 3 two-way interactions, plus 3 other models. Two important findings are: (1) in 1000 simulated data sets, the best fitting model for a given response yi was not stable; patterns, such as which model was most often selected, depended strongly on which of the 5 ys was being evaluated. (2) BIC often selected a different model than did cross-validation (CV). Assuming the fitted full model was the true model, we used the fitted full model to simulate realistic yi values, which were used to evaluate which of two orthogonal designed experiments performed better at selecting the correct (full) model. Burr et al. [19,20] showed that BIC and CV only agreed sometimes, regardless of design, and that the magnitude of the true model parameter values matters greatly (the true β parameters are unknown, but in the simulation experiments, we assumed the fitted βs were the true βs).
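The following minimal sketch illustrates this kind of model-uncertainty simulation with an assumed true model and a short illustrative candidate list; the coefficients and candidates are ours, not the archived estimates.

```r
## How often does BIC pick each candidate model across simulated data sets?
set.seed(5)
x <- expand.grid(x1 = c(-1, 0, 1), x2 = c(-1, 0, 1), x3 = c(-1, 0, 1))
y_mean <- with(x, 1 + 2*x1 - x2 + 0.5*x3 + 0.8*x1*x2 + 0.3*x1^2)  # "true" model
candidates <- list(y ~ x1,
                   y ~ x1 + x2 + x3,
                   y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3,
                   y ~ x1 * x2 * x3 + I(x1^2) + I(x2^2) + I(x3^2))
picks <- replicate(1000, {
  dat <- cbind(x, y = y_mean + rnorm(nrow(x), sd = 0.5))
  which.min(sapply(candidates, function(f) BIC(lm(f, data = dat))))
})
table(picks)   # how often each candidate model has the smallest BIC
```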

Next we evaluated how well we could infer the 3 xs, using a frequently selected model,

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \beta_6 x_2 x_3 + \varepsilon,$

as the true forward model, with βs estimated from the original 40 real experimental runs assumed to be the true βs. We varied the magnitude of the error standard deviation and computed the root mean squared prediction error in each x for the two designs. For the inverse step, the fitted xs were found by simple numerical minimization of $\sum_{i=1}^{5}(y_i - \hat{y}_i(x_1, x_2, x_3))^2$ using the nlminb function in R. The fit $\hat{y}_i(x_1, x_2, x_3)$ was calculated using $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3 + \hat{\beta}_4 x_1 x_2 + \hat{\beta}_5 x_1 x_3 + \hat{\beta}_6 x_2 x_3$, where the estimated βs were obtained from the two designed experiments. Two important findings are: (1) the RMSE values depend strongly on the true β values, and (2) the RMSE values are reasonably good for both designs, but are very poor (large) if we fit the full 11-parameter model instead. In addition, we also experimented with randomly generated true βs. The fact that the values of the true βs matter is not surprising; for one thing, the true βs determine the extent of correlation among the 5 ys. All these results added simulated random errors only, without censoring or assay bias.
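A minimal sketch of this inverse step follows, with illustrative coefficients in place of the βs estimated from the archived runs.

```r
## Infer x = (x1, x2, x3) from an observed 5-response signature by
## minimizing the sum of squared response residuals with nlminb.
set.seed(6)
B <- matrix(rnorm(7 * 5), nrow = 7)           # 7 coefficients x 5 responses
forward <- function(x, b) {
  f <- c(1, x[1], x[2], x[3], x[1]*x[2], x[1]*x[3], x[2]*x[3])
  as.vector(t(b) %*% f)                       # predicted y1..y5 at x
}
x_true <- c(0.3, -0.5, 0.7)
y_obs <- forward(x_true, B) + rnorm(5, sd = 0.05)
obj <- function(x) sum((y_obs - forward(x, B))^2)
fit <- nlminb(start = c(0, 0, 0), objective = obj,
              lower = rep(-1, 3), upper = rep(1, 3))
rbind(true = x_true, inferred = fit$par)
```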

The 5 × 5 correlation matrix of the 5 responses from the 40 experimental runs in Burney and Smith [11] is given in Table 1. With 40 runs and assuming the ys are normally distributed, correlations larger than approximately 0.30 in magnitude are significant; so, we see from Table 1 that there are significant correlations among some of the response pairs. This is unsurprising, because the ys used by Burney and Smith included the modal particle volume measured by a Coulter counter; agglomeration determined from the weight % of particles in the portion of larger size particles in the bimodal size distribution; and length, width, and height as measured from the average L/W/H of user-selected particles from photomicrographs. Burney and Smith noted many other morphological features that could have been used for a less correlated set of responses (e.g., laths, isomorphous and anisotropic growth, layered growth, crystal branching, dendritic growth, delamination, mud flats, shrinkage, rows of bubbles, and bimodal distribution). However, as described earlier, these are challenging to measure.

Breiman and Friedman [9] introduced the "curds and whey" procedure to improve prediction of y values when there is correlation among some of the ys. Rothman et al. [51] emphasize the correlation among the ys that can arise from correlated errors and/or from shared random predictors. They also provide R software to implement a version of the procedure. The curds and whey procedure performed well compared to partial least squares. Briefly, the procedure works as follows. First, a one-y-at-a-time regression approach is used to estimate each y. Then, good linear combinations of transformations of the ys are found empirically and used to shrink each individual y toward the mean of all the ys. To our knowledge, the procedure has not become widespread in chemometrics, and seems to deserve further study in the context of multivariate calibration. Both Breiman and Friedman [9] and Rothman et al. [51] consider inverse regression, fitting the quantity to be inferred (x in our forensics case) as a function of the ys. We have not yet evaluated inverse regression in our forensics context. For either classical or inverse calibration, the true βs clearly impact the correlations among the ys, which impact how well we can infer the xs in this version of classical calibration.

One source of uncertainty often overlooked is model uncertainty [24]. Many analyses express the uncertainty in model parameters conditional on the chosen model being the correct model. Because random errors arise from multiple sources, including model misspecification, to some extent this is adequate. However, this example illustrates that model uncertainty can be important for influencing results. If in each simulation run we apply model selection criteria to select a model, then we do not always have the same functional form in the fitted value. Because we anticipate using model selection to select a forward model from the calibration data, we should include the uncertainty that arises from which model is selected and fitted.

Because of both the model selection and the anticipated challenging metrology issues, such as censoring and lab bias, we believe a Bayesian approach is compelling [2,3]. The basic idea is straightforward and analogous to minimizing $\sum_{i=1}^{5}(y_i - \hat{y}_i(x_1, x_2, x_3))^2$ as used in [19,20]. The idea is that plausible X values are those for which the observed response vector y is most likely under the fitted forward model and the assumed measurement error structure, as in Fig. 1a. Metrology issues such as laboratory biases, multiplicative measurement errors (having a standard deviation linear in the true value of the measurand), and censoring can all be handled using a suitable likelihood function. Model selection can be handled as in Anderson-Cook et al. [2], who drop any β from the full model if its minimum estimated posterior tail probability is small, say less than 0.025. An important related topic is choosing a measurement error model using model selection criteria applied to calibration data [16]. Not all these effects have been included in any analyses that we are aware of, partly because the appropriate likelihood function is not standard. However, Anderson-Cook et al. [2] use Markov Chain Monte Carlo (MCMC) to address both model selection and multiplicative response error models. MCMC is a tool in Bayesian analysis gaining popularity in many fields, including chemometrics [27,43]. Example results using MCMC that consider the impact of multiplicative measurement error in addition to model error are given in Anderson-Cook et al. [2]. The MCMC approach is useful whenever the likelihood for the data and/or the prior for the model parameters makes it difficult to analytically express the posterior distribution for the model parameters. Also, MCMC makes it straightforward to include all relevant sources of uncertainty, so a more comprehensive analysis is feasible.
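As a minimal sketch of the basic idea (not the sampler of Anderson-Cook et al. [2]), a random-walk Metropolis algorithm can draw from the posterior of X given an observed signature; this reuses forward(), B, and y_obs from the previous sketch, and assumes independent normal response errors and a flat prior on the coded factor ranges.

```r
## Random-walk Metropolis for the posterior of x given y_obs.
log_post <- function(x) {
  if (any(abs(x) > 1)) return(-Inf)           # flat prior on [-1, 1]^3
  sum(dnorm(y_obs, forward(x, B), 0.05, log = TRUE))
}
set.seed(7)
n_iter <- 5000
chain <- matrix(0, n_iter, 3)
x_cur <- c(0, 0, 0); lp_cur <- log_post(x_cur)
for (it in 2:n_iter) {
  x_prop <- x_cur + rnorm(3, 0, 0.1)          # random-walk proposal
  lp_prop <- log_post(x_prop)
  if (log(runif(1)) < lp_prop - lp_cur) { x_cur <- x_prop; lp_cur <- lp_prop }
  chain[it, ] <- x_cur
}
colMeans(chain[-(1:1000), ])                  # posterior means after burn-in
```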


6. Research and development needs for nuclear forensics

This section describes existing methods that can be used or modified in calibration for forensics applications, and where research and development are needed in statistical design of experiments, metrology, and solving inverse problems.

6.1. Statistical Design of Experiments

Design of experiments (DOE) dates back to at least the 1700s, with major advances in statistical DOE occurring from the 1930s to the present, introducing key concepts such as orthogonal factors, blocking and randomization, and "optimal" designs. Traditionally, constructing an "optimal" design meant selecting a criterion of interest, and then assuming that the true functional form relating predictors (chemical engineering process parameters) to responses (PuO2 particle morphology and chemical concentrations) was known in advance of the experiment. For example, suppose one knows from prior related study that the functional form relating the response y to predictors x, y = f(x), is linear in the p model parameters. Then constructing an optimal design begins with the use of the model y = Xβ + ε, where for n experimental runs, the n × p design matrix X is the matrix of predictors for each of the n runs, β is the parameter vector, and ε is an error vector. Common choices of optimality criteria focus on either good estimation of the model parameters (such as D- or A-optimality), or good prediction of new observations in the region of experimentation (such as I- or G-optimality). See Myers et al. [44] for formal definitions of these criteria.

Most DOE literature does not consider the inverse problem as is done in forensics to use y to predict x. Hence, it is not straightforward to make the connection that good precision for the estimate $\hat{\beta}$ of the true parameter β in the known model y = Xβ + ε will translate into a good solution for the identification of a best choice of $\hat{x} = \hat{f}^{-1}(y)$. Similarly, good prediction of ŷ from y = f(x) for a new location in the design space seems helpful, but not necessarily sufficient, for best selection of x. Hence, "optimal" forward models with good properties of $\hat{f}(x)$ do not necessarily imply optimal properties of $\hat{f}^{-1}(y)$.

Note that traditional DOE as currently applied in our forensics setting makes three key assumptions: (a) the functional form is well approximated by a low-degree polynomial (fairly standard in response surface estimation for DOE), (b) an optimal forward model design is effective for the inverse problem, and (c) measurement issues such as lab-to-lab biases and data censoring by reporting "less than threshold" or "more than threshold" rather than actual assay results can be addressed by standard measurement error modeling.

Ongoing work is aimed at studying these assumptions (a)–(c) more deeply to further enhance the solution to the inverse problem by adapting traditional DOE methodology. First, we discuss the assumptions in more detail. The initial assumption (a), that the underlying relationship between the factors and each individual response is smooth, continuous, and can be approximated by a low-order polynomial, is a common assumption for many response surface methodology problems, because it suggests that a Taylor-series expansion of adequate complexity of the true, but unknown, relationship adequately characterizes the important features of the relationship. If there are known changes of state or other discontinuities in the responses, then alternate methods are required. In determining what design to select for the experiment, the order of the assumed model (first, first-order with interactions, second, etc.) suggests the number of levels of each factor to be considered, and the total number of runs that are required to estimate all of the model parameters. If the relationships between factors and each response are of different assumed complexity, the design should be selected based on the most complicated of the relationships, since model reduction from a complicated model can be considered later.


However, if too simple a model is assumed and the design choice is based on this, there may be no opportunity to assess lack of fit or potentially missing features in the model. A very simple illustration of this is to consider estimating a one-factor quadratic function (with 3 model parameters) with only two levels of that factor. No amount of cleverness with the analysis will allow for the curvature to be estimated. However, if we collect data at three levels of that factor, we can estimate a quadratic curve, and if necessary reduce the model to a straight line (first-order model) if no curvature is observed. More formally, suppose that we know y = f(x1) + ε is linear in the single predictor x1, y = β0 + β1x1 + ε. Then, the x1 values selected for the experiment should be well spread out to produce a good estimate of β = (β0, β1). However, if the functional form is unknown and could oscillate with large local variation, then a "space filling" design is more effective, and the selected x1 values should include some that are closer together.
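This two-level limitation is easy to verify numerically; in the minimal sketch below (with illustrative coefficients), the quadratic term is aliased with the intercept at two levels, so lm reports its coefficient as NA, whereas it is estimable at three levels.

```r
## Two levels cannot support a quadratic fit; three levels can.
set.seed(8)
x_two   <- rep(c(-1, 1), each = 4)                 # two-level design
x_three <- rep(c(-1, 0, 1), length.out = 8)        # three-level design
y2 <- 1 + 2 * x_two   + 3 * x_two^2   + rnorm(8, sd = 0.1)
y3 <- 1 + 2 * x_three + 3 * x_three^2 + rnorm(8, sd = 0.1)
coef(lm(y2 ~ x_two + I(x_two^2)))     # quadratic coefficient is NA (aliased)
coef(lm(y3 ~ x_three + I(x_three^2))) # all three parameters estimable
```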

The second assumption (b) is needed because the current state of the literature has not suggested a direct metric for assessing the quality of a design for the inverse problem. This is a needed area of research: first, to formally evaluate to what extent criteria for good estimation and prediction properties of f̂(x) translate into good properties for f̂⁻¹(y).

Preliminary results from simulation suggest that the choice of a best design for the inverse problem is quite closely correlated with optimal design selection for the standard problem, as long as the choice of model is adequate for the true underlying relationship. A second step to formalize these early findings is then to develop a metric that directly assesses the ability of a design to select the best factor combination in the x-space that matches a new observed y signature. Similar to I- and G-optimality, which consider the distribution of prediction variance values across the entire design space, a new criterion for good designs for the inverse problem would consider the quality of prediction across all ranges of signature values.
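
For reference, the standard forward-model criteria mentioned here are easy to compute for a candidate design. The sketch below, assuming an illustrative full quadratic model in two factors and a 3² factorial, evaluates a D-criterion, det(X'X)^(1/p), and an I-style criterion, the average scaled prediction variance over a grid of the design space; both the model and the design are our illustrative choices.

# Sketch of standard forward-model design criteria, assuming a full
# quadratic model in two factors: D uses det(X'X), and I averages the
# scaled prediction variance x(v)'(X'X)^{-1} x(v) over the design space.
import numpy as np
from itertools import product

def expand(pts):
    """Model matrix for a full quadratic model in two factors."""
    out = [[1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2] for x1, x2 in pts]
    return np.array(out)

# Candidate design: 3^2 factorial on [-1, 1]^2.
design = list(product([-1.0, 0.0, 1.0], repeat=2))
X = expand(design)
XtX_inv = np.linalg.inv(X.T @ X)

d_crit = np.linalg.det(X.T @ X) ** (1.0 / X.shape[1])  # D-criterion

# I-criterion: average prediction variance over a grid of the design space.
grid = expand(list(product(np.linspace(-1, 1, 41), repeat=2)))
pred_var = np.einsum("ij,jk,ik->i", grid, XtX_inv, grid)
i_crit = pred_var.mean()

print("D-criterion:", d_crit, " I-criterion (avg pred. var.):", i_crit)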

The final assumption (c) bounds the possible complications when performing the analysis. If there are additional sources of variability, such as lab-to-lab variability in the assay protocol, then identifying the most promising candidate x-location is more difficult, because each laboratory could have a different best factor combination in the x-space associated with a new observation. Another potential complication is the loss of information from censoring, which allows ranges of x-locations to all be plausibly compatible with an observed new signature containing censored values. Research efforts are currently exploring the impact of this information loss on the ability to solve the inverse problem. If the censoring affects only a small fraction of the observations and/or there are many responses in the signature, solutions to the inverse problem may remain possible. However, the exact requirements for the methods to continue to perform adequately are still not well understood.

6.2. Five Areas for Research

There are a number of issues and potential research areas related to design and analysis of experiments and how they are related to the inverse problem: (1) identifying and evaluating appropriate criteria for design selection and evaluation; (2) analysis issues based on model reduction to manage the bias–variance trade-off; (3) quantifying the goodness of the match of the best identified x, when new data may have important differences from the original experiment; (4) the roles of different types of errors on the quality of the solution for x; and (5) the role of robustness as it impacts the inverse problem solution.

6.2.1. Criteria for Design Selection and Evaluation

The first area addresses a formal criterion for comparing and constructing potential designs when the goal of the experiment is to build models for solving the inverse problem. If sequential experimentation is planned with a screening design to determine which factors are active for influencing the relationship between factors and the different responses in the signature, then standard criteria for this initial phase, such as D-optimality, are suitable. In this screening phase, there are no new observations on which to consider the inverse problem, and hence a standard criterion for good estimation of model parameters is appropriate.

In the second phase, where a response surface design is considered and the results of modeling the responses in the signature will be used to identify the best factor combination in the x-space for the inverse problem, an ideal design would provide unbiased estimates of the best factor combination in the x-space with minimal uncertainty. In simulation studies described in Anderson-Cook et al. [2], characteristics of the distance from the true x-value to the estimated solution, summarized across a grid of x-combinations covering the design space and averaged across simulated response values, provide a computationally intensive approach for comparing alternative designs. While capturing the spirit of what a good metric for the designs should encompass, the simulation-based nature of the metric and its dependence on how the analysis was performed make it less practical for easy comparison. Its computationally intensive nature makes it impractical to embed in a design construction search algorithm, such as a genetic algorithm (see [39] for more details on genetic algorithms). Until a more streamlined criterion can be created, or a formal connection developed between D-, G- or I-optimal designs and performance for the inverse problem, classical designs such as central composite designs, factorial or fractional factorial designs (with an appropriate number of levels to match the complexity of the assumed model) are potentially good choices.
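
The flavor of that computationally intensive comparison can be conveyed in a few lines; the sketch below is a compressed, illustrative rendering (a one-factor quadratic truth, an arbitrary noise level, and grid-search inversion), not the actual simulation protocol of [2].

# Compressed sketch of a simulation-based design comparison in the
# spirit of [2]: fit a forward model from each candidate design, invert
# for many true x values, and average the distance |x_hat - x_true|.
# The true model and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
true_beta = np.array([1.0, 2.0, -1.0])          # y = b0 + b1*x + b2*x^2

def mm(x):
    x = np.atleast_1d(x)
    return np.column_stack([np.ones_like(x), x, x ** 2])

def avg_inverse_error(design_x, n_sim=200, sigma=0.1):
    x_grid = np.linspace(-1, 1, 101)            # candidate x solutions
    errs = []
    for _ in range(n_sim):
        # Simulate the calibration experiment and fit the forward model.
        y = mm(design_x) @ true_beta + rng.normal(0, sigma, len(design_x))
        beta_hat, *_ = np.linalg.lstsq(mm(design_x), y, rcond=None)
        # New observation at a random true x; invert by grid search.
        x_true = rng.uniform(-1, 1)
        y_new = mm(x_true) @ true_beta + rng.normal(0, sigma)
        preds = mm(x_grid) @ beta_hat
        x_hat = x_grid[np.argmin((preds - y_new) ** 2)]
        errs.append(abs(x_hat - x_true))
    return np.mean(errs)

for name, d in [("3 levels x 3 reps", [-1, 0, 1] * 3),
                ("5 levels x 2 reps", [-1, -0.5, 0, 0.5, 1] * 2)]:
    print(name, avg_inverse_error(np.array(d, dtype=float)))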

An additional challenge is that there is not a large literature on design of experiments for estimation when the factors include both quantitative and qualitative factors (as we expect in our context) [25]. The basic issue is that it is rarely within the experimental budget to do separate experiments with the quantitative factors at each combination of the qualitative factors.

6.2.2. Managing the Bias–Variance Trade-off During Analysis

After the designed experiment has been performed, a model ŷi = f̂i(x) for each response in the signature is needed for use as part of the inverse problem solution. It remains to be seen whether the curds-and-whey procedure [9] provides meaningful improvement over fitting each yi one at a time. We have commented earlier that the assumed model used to select a best design should contain a generous number of terms, typically including an intercept, main effects, two- and potentially three-way interactions, and quadratic terms. This initial model for each of the responses in the signature assumes maximal complexity to allow for capturing any features in the underlying relationship. However, an intermediate step in the analysis phase involves model selection, where extraneous terms in the model are removed to simplify the model. This model selection is done with the intention of balancing the bias–variance trade-off between under- and over-fitting the true relationship. If too many terms are removed from the model, then important features of the true relationship might be missed, leading to bias in the predicted response at a given factor combination. If unnecessary terms are left in the model, then the prediction variance of the response will be inflated. Ozol-Godfrey and Anderson-Cook [45] illustrated that additional terms in the model can only inflate the prediction variance at any location in the design space. Hence, having additional terms remaining in the model not only complicates the interpretation of the estimated relationship, but also makes prediction of new observation values less certain. Model reduction is therefore an important part of the model fitting process, and can yield substantial improvements in the overall performance of the design. In simulation studies in Anderson-Cook et al. [2] and Burr and Hamada [14,20], the analysis strategies considered included a frequentist approach with the full assumed model, a frequentist approach using simple model reduction based on p-values below a selected threshold, and a Bayesian approach with or without model reduction. We used synthetic data generated from fitting polynomial response surfaces (with interaction terms) to the real data in Burney and Smith [11].

A comparison of the different analysis strategies showed a noticeable improvement in the average and upper percentile distance between the true and estimated x-location for new observations when model selection was used with the frequentist analysis. For the subset of cases considered with the more computationally intensive Bayesian analysis with no model reduction, the performance was at least as good as the model selection frequentist approach, with improvements in some individual cases. In selecting a forward model we can also consider modern alternatives (used in data mining) to low-degree polynomials, such as multivariate adaptive regression splines, Gaussian processes, projection pursuit regression, and regression trees. Model selection methods appear to be helpful for managing the bias–variance trade-off, and the Bayesian approach seems promising for providing additional advantage in the analysis phase.
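
As one concrete rendering of the simple p-value-based reduction strategy mentioned above, the following sketch performs backward elimination on simulated data; the 0.05 threshold, the candidate term set, and the statsmodels-based implementation are all illustrative choices, not the analysis actually used in [2].

# Sketch of simple p-value-based model reduction (one of the frequentist
# strategies mentioned above), using illustrative simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 60
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = 1 + 2 * x1 + 0.5 * x1 * x2 + rng.normal(0, 0.2, n)  # true model

terms = {"x1": x1, "x2": x2, "x1:x2": x1 * x2, "x1^2": x1 ** 2, "x2^2": x2 ** 2}

def backward_eliminate(y, terms, alpha=0.05):
    terms = dict(terms)
    while terms:
        X = sm.add_constant(np.column_stack(list(terms.values())))
        fit = sm.OLS(y, X).fit()
        pvals = dict(zip(terms, fit.pvalues[1:]))   # skip the intercept
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:                   # all terms significant
            return fit, list(terms)
        del terms[worst]                            # drop the weakest term
    return sm.OLS(y, np.ones((len(y), 1))).fit(), []

fit, kept = backward_eliminate(y, terms)
print("retained terms:", kept)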

6.2.3. Quantifying the Goodness of the Match Between a New Observation and the Best Choice from the Original Data

Once the designed experiment has been performed and the data analyzed for each response to estimate the models ŷi = f̂i(x) for all elements of the signature, then the analyst is able to use the models to identify the best factor combination in the x-space for a new observed signature value. This optimization is based on finding the x-location that minimizes the combined distance between each observed response and the response values from the estimated models at that location. For example, we might consider selecting the x-location in the design space that minimizes Σ_{i=1}^{q} |y_oi − ŷi| or Σ_{i=1}^{q} (y_oi − ŷi)², where y_oi is the new observed value of the ith response, and ŷi is the estimated response at a given x-location. Some thought should be given to scaling each of these distances to match the natural variation of each response, to prevent the metric from being dominated by a single response in the signature. Obtaining a solution for this minimization is always possible (although not always unique), but some scenarios could lead to difficulty with the interpretation of the results.
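
A minimal sketch of this scaled minimization follows, with hypothetical fitted models f̂i(x) and assumed response scales s_i standing in for the natural variation of each response; the Nelder-Mead optimizer is an arbitrary choice.

# Sketch of the scaled inverse-problem minimization above, using
# hypothetical forward models and illustrative response scales s_i.
import numpy as np
from scipy.optimize import minimize

def f_hat(x):
    """Hypothetical fitted models for a q = 3 response signature."""
    x1, x2 = x
    return np.array([1 + 2 * x1, 0.5 - x2 + x1 * x2, x1 ** 2 + x2 ** 2])

y_obs = np.array([1.8, 0.9, 0.3])     # new observed signature (illustrative)
s = np.array([0.2, 0.1, 0.05])        # natural variation of each response

def scaled_distance(x):
    # Sum of squared standardized residuals, so no single response dominates.
    return np.sum(((y_obs - f_hat(x)) / s) ** 2)

res = minimize(scaled_distance, x0=np.zeros(2), method="Nelder-Mead")
print("best x-location:", res.x, " criterion value:", res.fun)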

The first outcome is that there are a large number of possible solutions that are all similarly likely. One example of when this might occur is when the responses in the signature are highly correlated; then the existence of several responses does not help disambiguate which location is best. In the simplest case, if all of the responses are essentially measuring the same feature, then the optimal solution might be a region defined by the contour of the response surface for one of the responses whose value matches the observed response. In this case, we would want the optimization result to show the entire collection of solutions as best, instead of arbitrarily selecting a single location in the x-space. Hence, having a metric that quantifies how good a match is between the new observation and the predicted models allows comparisons between the "best" location and other good solutions. If the scores on this metric are very close for multiple locations, then a set of solutions should be reported.

The second outcome to consider is rooted in a measurement error in one or more of the responses in the observed signature. Having a metric to examine the goodness of the match would reveal when a problem with one measurement of a response in the signature is influencing the choice of the solution to the inverse problem. If there was a good match on all but one of the responses, then a robust metric could identify this as a candidate solution even in the presence of a rogue element. Again, examining the top contenders for a match, instead of just a single "black box" match at the optimal location, could provide more meaningful results that can handle data or measurement issues.
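
One simple way to build such a robust diagnostic, sketched below with illustrative numbers, is to recompute the match score leaving out each response in turn: if the score collapses when a particular response is excluded, that response is a candidate rogue measurement. This is an illustration of the idea, not a method from the paper.

# Sketch of one possible robust match diagnostic: recompute the match
# score with each response left out in turn; a large drop when response
# i is excluded flags it as a potential rogue measurement.
import numpy as np

def loo_scores(y_obs, y_pred, s):
    """Standardized full score and leave-one-response-out scores."""
    z2 = ((y_obs - y_pred) / s) ** 2
    full = z2.sum()
    loo = np.array([full - z2[i] for i in range(len(z2))])
    return full, loo

y_obs = np.array([1.8, 0.9, 5.0])      # third response looks suspect
y_pred = np.array([1.75, 0.95, 0.3])   # predictions at the "best" x
s = np.array([0.2, 0.1, 0.05])

full, loo = loo_scores(y_obs, y_pred, s)
print("full score:", full)
print("score excluding each response in turn:", loo)
# If loo[i] is far below the full score, response i dominates the mismatch.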

A final outcome that merits discussion arises when the new observation comes from a different process than the original experiment. Consider the case where the PuO2 production line used for the initial designed experiment considers ranges of a set collection of inputs with some additional factors held fixed. The new observation comes from a different production line where perhaps some of the additional factors were set at different values than what was considered in the designed experiment. Alternatively, the new observation could be associated with material produced in a standard line that has since been altered by aging in some abnormal environment. In these cases, we have no reason to believe that our designed experiment can adequately predict response values that match the new observation. In this case, a metric that quantifies the goodness of the match between a new observation and the best estimated response signature could signal that we are in a different paradigm. It would be helpful to have metrics that allow easy discrimination between a best location in the x-space whose estimated responses are a "strong" match, consistent across all responses in the observed signature, versus a solution that is an artifact of the minimization criterion but bears little or no resemblance to the new observation.

6.2.4. The Role of Different Types of Response Measurement Error on the Analysis

In the Section 6 discussion thus far, we have assumed that the response y has been measured exactly (or is censored). But in practice, the measured response y* is the true response measured with error, a so-called response measurement error. Typically, an additive model is assumed, y* = y + η, where η has a Gaussian or some other specified distribution. However, chemical measurements often are better described by a multiplicative model, in which the measurement error standard deviation increases with the magnitude of the true response. That is, y* = y(1 + η), where η has a Gaussian distribution whose standard deviation is small enough that 1 + η > 0 with probability essentially 1. The parameters associated with the η distribution can be well estimated through measurement system assessment experiments known as gauge R&R studies. Burdick et al. [12] and Burr and Hamada [15] presented analyses of gauge R&R study data for additive and multiplicative error models, respectively.
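
The practical difference between the two error models is easy to see by simulation; in the sketch below (with illustrative σ values), the additive model gives a constant measurement-error standard deviation while the multiplicative model's standard deviation grows proportionally with the true response.

# Sketch contrasting the additive model y* = y + eta with the
# multiplicative model y* = y(1 + eta); the sigma values are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
for y_true in (1.0, 10.0, 100.0):
    additive = y_true + rng.normal(0.0, 0.5, n)                 # constant SD
    multiplicative = y_true * (1.0 + rng.normal(0.0, 0.02, n))  # SD ~ 0.02*y
    print(f"y={y_true:6.1f}  additive SD={additive.std():.3f}  "
          f"multiplicative SD={multiplicative.std():.3f}")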

One approach to response measurement error is to ignore it in the analysis; if the errors are small, ignoring them has little impact. If there are repeat measurements, then using the average of the repeat measurements as the response will reduce the error. As an alternative, a Bayesian approach can properly account for response measurement error, allowing for either an additive or a multiplicative model. Anderson-Cook et al. [2] showed how such a proper accounting improves the power to detect active factors (and effects) in fitting a forward model. The impact of ignoring versus properly accounting for response measurement error in the inverse problem requires further research.

6.2.5. Robustness and its Impact on the Inverse Problem

In traditional response surface methodology and optimization of a response, robustness of the response across different values of one or more factors is considered an advantage. In this case, the response remains relatively unchanged when that factor is varied. Robust parameter design [50] uses strategies to exploit robustness to optimize multiple responses or to select factor combinations that are more practical to implement.

For the inverse problem, there are possible advantages and disadvantages of robustness. If a subset of the responses were robust to the influence of a particular factor, this could allow maximum flexibility to move in this dimension of the factor space to improve the fit of the other responses to the observed value. The robustness would manifest itself as the response surface characterizing the relationship being very flat across different values of the factor. In this way it may be possible to obtain a closer match to the estimated signature, increasing confidence about where in the factor space the new observation was made. However, if all or a majority of the responses in the signature are robust to changes in one (or more) of the factors, then this would lead to flat response surfaces across all ŷi = f̂i(x) relationships in these dimensions. In this case, there is considerable ambiguity about the match, with many possible contenders being plausible and similarly likely. More exploration and research is needed to determine how robustness of responses to factor input values impacts the inverse problem.

Another form of robustness to consider is robustness of results to model form. The simulation studies described in Section 5 that addressed model selection all assumed that the true model was among those evaluated, which of course is never exactly true. As mentioned previously, a key assumption of the design and analysis for the inverse problem is specifying a model that is adequate (although not the "true" model) to characterize the relationship between each response and the factors. If the model form selected is inadequate, then the impact of different choices for the design and analysis is not known. Additional research into the impact of model misspecification on the inverse problem would help provide guidance for practitioners as they consider the choice of models to be used.

7. Discussion and summary

Forensic science involves inverse problems. The contributions of statistical science to inverse problems range from a relatively simple task, such as inferring the mean and variance of a process that generated some collected data, to a comprehensive task, such as inferring component concentrations from spectral data to be used as responses in a forensics setting.

One ongoing experiment specifies the Pu starting material, reagents, methods, and operating conditions for each production run and the resulting chemical, physical, and morphological characteristics to be measured. Each test run will be conducted on a small scale, nominally 1–10 g Pu/run. The project is using statistical design of experiments to address the overall project goal of predicting production conditions for a new sample. There are unique challenges in our context that we have described.

If resources allow, forward models should be developed for a large number of potentially useful response variables. However, the amount of sample material available for testing may limit the number of responses that can be considered (e.g., due to the possible destructive nature of the testing). There may be even more restrictive limitations on the number of measurements that can be acquired from interdicted special nuclear material that will be used for inverse prediction. Thus, in such cases it would be useful to have a strategy for down-selecting (from the available forward models) the most valuable measurements to acquire from interdicted special nuclear material.

This perspectives paper outlines some of the statistical challenges and progress to date toward a comprehensive approach to certain types of nuclear forensics problems. The approach utilizes existing statistical methods, but also requires research and development in calibration for inverse problems, in experiment design, and in options to approach modest-dimensional inverse problems in the presence of challenging metrology issues. Section 5 described some of the re-analysis of archived experimental data from Burney and Smith [11] that fit forward models relating five factors or predictor measurements (morphological properties) to three processing conditions, in anticipation of related and more extensive data from a planned PuO2 production process. Some of the factors will be measured trace chemical concentrations, which will be estimated using multivariate calibration that could involve higher-dimensional data such as the typical NIR spectra in Fig. 2. Responses used in our forward models would then be the output of multivariate calibration applied to the higher-dimensional data. Candidate designed experiments have been created and evaluated with established optimality criteria for good estimation and prediction (D- and I-optimality, respectively) of the forward model, for which the functional form is at least approximately known. Simulation will be a key analysis tool to address the metrology, design of experiment assessment, calibration, and model selection challenges.


Acknowledgments

We acknowledge the Department of Homeland Security National Technical Nuclear Forensics Center for funding this work at Los Alamos National Laboratory and Sandia National Laboratories under contract numbers DE-AC52-06NA25396 and DE-AC04-94AL85000, respectively.

References

[1] F. Adams, Spectroscopic imaging: a spatial odyssey, J. Anal. At. Spectrom. 29 (7) (2015) 1197–1205.

[2] C. Anderson-Cook, T. Burr, M.S. Hamada, E. Thomas, Statistical analysis for nuclear forensics experiments, Stat. Anal. Data Min. (2015) (to appear).

[3] C. Anderson-Cook, T. Burr, M.S. Hamada, The impact of measurement error on the analysis of designed experiments, Los Alamos National Laboratory Unrestricted Release Report LA-UR-15-21982, J. Qual. Technol. (2015) (submitted for publication).

[4] G. Bagheri, C. Bonadonna, I. Manzella, P. Vonlanthen, On the characterization of size and shape of irregular particles, Powder Technol. 270 (Part A) (2015) 141–153.

[5] M. Beneš, B. Zitová, Performance evaluation of image segmentation algorithms on microscopic image data, J. Microsc. 257 (1) (2015) 65–85.

[6] D. Benn, C. Ballantyne, The description and representation of particle shape, Earth Surf. Process. Landf. 18 (7) (1993) 665–672.

[7] S. Blott, K. Pye, Particle shape: a review and new methods of characterization and classification, Sedimentology 55 (1) (2008) 31–63.

[8] S. Blott, K. Pye, Particle size scales and classification of sediment types based on particle size distributions: review and recommended procedures, Sedimentology 59 (7) (2012) 2071–2096.

[9] L. Breiman, J. Friedman, Predicting multivariate responses in multiple linear regression, J. R. Stat. Soc. Ser. B 59 (1) (1997) 3–54.

[10] L. Brown, The Urgent Need for Particle Shape Standards, Part Deux! www.particleimaging.com/the-urgent-need-for-particle-shape-standards-part-deux/, 2014.

[11] G. Burney, P. Smith, Controlled PuO2 Particle Size from Pu(III) Oxalate Precipitation, Savannah River Laboratory, 1984 (DP-1689).

[12] R. Burdick, C. Borror, D. Montgomery, Design and Analysis of Gauge R&R Studies: Making Decisions with Confidence Intervals in Random and Mixed ANOVA Models, ASA-SIAM Series on Statistics and Applied Probability, SIAM, Philadelphia, PA, and ASA, Alexandria, VA, 2005.

[13] T. Burr, H. Fry, Biased regression: the case for cautious application, Technometrics 47 (2005) 284–296.

[14] T. Burr, M.S. Hamada, Analyzing censored data in the analysis of multiplicative repeatability and reproducibility studies, Accred. Qual. Assur. 19 (2014) 75–80.

[15] T. Burr, M.S. Hamada, A multiplicative model for gauge R&R studies, J. Qual. Technol. 31 (2015) 801–809.

[16] T. Burr, G. Hemphill, Multi-component radiation measurement error models, Appl. Radiat. Isot. 64 (3) (2006) 379–385.

[17] T. Burr, P. Knepper, A study of the effect of measurement error in predictor variables in nondestructive assay, Appl. Radiat. Isot. 53 (4–5) (2000) 547–555.

[18] T. Burr, K. Kuhn, L. Tandon, D. Tompkins, Measurement performance assessment of analytical chemistry analysis methods using sample exchange data, Int. J. Chem. 3 (4) (2012), http://dx.doi.org/10.5539/ijc.v3n4p40.

[19] T. Burr, C. Anderson-Cook, M.S. Hamada, E. Thomas, Experiment Design and Data Analysis for an Inverse Problem in Nuclear Forensics, LA-UR-14-21378, Proceedings of the Institute of Nuclear Materials Management Annual Meeting, Atlanta, 2014.

[20] T. Burr, C. Anderson-Cook, M.S. Hamada, E. Thomas, Experiment Design and Data Analysis for an Inverse Problem in Nuclear Forensics, LA-UR-14-21378, Conference on Data Analysis, Santa Fe, 2014.

[21] P. Callahan, J. Simmons, M. De Graef, A quantitative description of the morphological aspects of materials structures suitable for quantitative comparisons of 3D microstructures, Model. Simul. Mater. Sci. Eng. 21 (1) (2013) 015003.

[22] D. Christensen, D. Bowersox, B. McKerley, R. Nance, Wastes from Plutonium Conversion and Scrap Recovery Operations, LA-11069-MS, 1988.

[23] P.J. De Temmerman, E. Verleysen, J. Lammertyn, J. Mast, Size measurement uncertainties of near-monodisperse, near-spherical nanoparticles using transmission electron microscopy and particle-tracking analysis, J. Nanoparticle Res. 16 (10) (2014) 1–17.

[24] D. Draper, Assessment and propagation of model uncertainty, J. R. Stat. Soc. Ser. B Methodol. 57 (1) (1995) 45–97.

[25] N. Draper, J. John, Response-surface designs for quantitative and qualitative variables, Technometrics 30 (4) (1988) 423–428.

[26] L. Duval, M. Moreaud, C. Couprie, D. Jeulin, H. Talbot, J. Angulo, Image Processing for Materials Characterization: Issues, Challenges and Opportunities, Image Processing (ICIP), 2014 IEEE International Conference, 2014.

[27] C. Elster, Bayesian uncertainty analysis compared with the application of the GUM and its supplements, Metrologia 51 (2014) S159–S166.

[28] I. Frank, J. Friedman, A statistical view of some chemometrics regression tools, Technometrics 35 (1993) 109–135.

[29] D. Haaland, E. Thomas, Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information, Anal. Chem. 60 (1988) 1193–1202.

[30] M. Halperin, On inverse estimation in linear regression, Technometrics 12 (1970) 727–736.

[31] M. Hentschel, N. Page, Selection of descriptors for particle shape characterization, Part. Part. Syst. Charact. 20 (1) (2003) 25–38.



[32] JEOL Guide to Scanning Microscope Observation, http://www.jeolusa.com/RESOURCES/ElectronOptics/DocumentsDownloads/tabid/320/Default.aspx?EntryId=5.

[33] A. Jillavenkatesa, S. Dapkunas, L.-S.H. Lum, NIST Recommended Practice Guide, Special Publication 960-1: Particle Size Characterization, 2001.

[34] J. Kalivas, Multivariate calibration, an overview, Anal. Lett. 38 (2005) 2259–2279.

[35] A. Keys, C. Iacovella, S. Glotzer, Characterizing complex particle morphologies through shape matching: descriptors, applications, and algorithms, J. Comput. Phys. 230 (17) (2011) 6438–6463.

[36] T. Klein, E. Buhr, C.G. Frase, TSEM: a review of scanning electron microscopy in transmission mode and its applications, Adv. Imaging Electron Phys. 171 (2012) 297.

[37] R. Krutchkoff, Classical and inverse regression methods of calibration, Technometrics 9 (1967) 425–439.

[38] R. Krutchkoff, Classical and inverse regression methods of calibration in extrapolation, Technometrics 11 (1969) 605–608.

[39] C.F.D. Lin, C.M. Anderson-Cook, M.S. Hamada, L.M. Moore, R.R. Sitter, Using genetic algorithms to design experiments: a review, Qual. Reliab. Eng. Int. 31 (2015) 155–167.

[40] C. Mansilla, V. Ocelik, J. De Hosson, Statistical Analysis of SEM Image Noise, 11th International Conference on Surface Effects and Contact Mechanics: Computational Methods and Experiments, 2013.

[41] H. Martens, T. Naes, Multivariate Calibration, Wiley and Sons, New York, 1991.

[42] H. Merkus, Measurement of particle size, shape, porosity and zeta-potential, in: H.G. Merkus, G.M.H. Meesters (Eds.), Particulate Products, 19, Springer International Publishing, 2014, pp. 59–96.

[43] S. Moussaoui, D. Brie, C. Carteret, A. Mohammad-Djafari, Bayesian analysis of spectral mixture data using Markov Chain Monte Carlo methods, Chemom. Intell. Lab. Syst. 81 (2) (2006) 137–148.

[44] R.H. Myers, D.C. Montgomery, C.M. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley, New York, 2009.

[45] A. Ozol-Godfrey, C. Anderson-Cook, D.C. Montgomery, Fraction of design space plots for examining model robustness, J. Qual. Technol. 37 (2005) 223–235.

[46] P. Parker, G. Vining, S. Wilson, J. Szarka, N. Johnson, The prediction properties of inverse and reverse regression for the simple linear calibration problem, J. Qual. Technol. 42 (4) (2010) 332–347.

[47] R. Porter, C. Ruggiero, Interactive Image Quantification Tools for Nuclear Material Forensics, LA-UR-11-00019, Proceedings of SPIE, San Francisco, 2011.

[48] J. Potzick, E. Marx, Parametric uncertainty in nanoscale optical dimensional measurements, Appl. Opt. 51 (17) (2012) 3707–3717.

[49] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.

[50] T. Robinson, C. Borror, R. Myers, Robust parameter design: a review, Qual. Reliab. Eng. Int. 20 (2003) 81–101.


[51] A. Rothman, E. Levina, J. Zhu, Sparse multivariate regression with covariance estimation, J. Comput. Graph. Stat. 19 (4) (2010) 947–962.

[52] C. Ruggiero, A. Ross, R. Porter, Segmentation and Learning in the Quantitative Analysis of Microscopy Images, Proceedings of SPIE 9405, Image Processing: Machine Vision Applications VIII, 2015, p. 94050L, http://dx.doi.org/10.1117/12.2083776 (February 27, 2015).

[53] B. Shekunov, P. Chattopadhyay, H. Tong, A. Chow, Particle size analysis in pharmaceutics: principles, methods and applications, Pharm. Res. 24 (2) (2007) 203–227.

[54] C. Straehle, U. Koethe, G. Knott, K. Briggman, W. Denk, F. Hamprecht, Seeded Watershed Cut Uncertainty Estimators for Guided Interactive Segmentation, 2012 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2012.

[55] M. Sutton, N. Li, D. Garcia, N. Cornille, J.J. Orteu, S.R. McNeill, H.W. Schreier, X. Li, Metrology in a scanning electron microscope: theoretical developments and experimental validation, Meas. Sci. Technol. 17 (10) (2006) 2613.

[56] M. Taylor, Quantitative measures for shape and size of particles, Powder Technol. 124 (1–2) (2002) 94–100.

[57] E.V. Thomas, Errors-in-Variables Estimation in Multivariate Calibration with Application to Analytical Chemistry (PhD Dissertation), University of New Mexico, 1988, pp. 84–88.

[58] E.V. Thomas, Errors-in-variables estimation in multivariate calibration, Technometrics 33 (1991) 405–413.

[59] E.V. Thomas, Insights into Multivariate Calibration Using Errors in Variables Modeling, in: Recent Advances in Total Least Squares Techniques and Errors in Variables Modeling, SIAM, 1997, pp. 359–371.

[60] E.V. Thomas, N. Ge, Development of robust multivariate calibration models, Technometrics 42 (2000) 168–177.

[61] E.V. Thomas, T. Burr, C. Anderson-Cook, M.S. Hamada, Statistical Strategy for Studying Effects of Processing Factors on PuO2, Los Alamos National Laboratory Unrestricted Release Report, LA-UR-14-22386, 2014.

[62] W. Venables, B. Ripley, Modern Applied Statistics with S-Plus, 4th ed., Springer, New York, 2002.

[63] D. Wildenschild, A. Sheppard, X-ray imaging and analysis techniques for quantifying pore-scale structure and processes in subsurface porous medium systems, Adv. Water Resour. 51 (2013) 217–246.

[64] J. Wu, M. Halter, R. Kacker, J. Elliott, A. Plant, Measurement Uncertainty in Cell Image Segmentation Data Analysis, NISTIR 7954, 2013, http://dx.doi.org/10.6028/NIST.IR.7954.

[65] H. Yoshida, et al., Theoretical calculation of uncertainty region based on the general size distribution in the preparation of standard reference particles for particle size measurement, Adv. Powder Technol. 23 (2) (2012) 185–190.

[66] D. Zhang, G. Lu, Review of shape representation and description techniques, Pattern Recogn. 37 (1) (2004) 1–19.
